EasyML: Ease the Process of Machine Learning with Data Flow
Jun Xu, Institute of Computing Technology, Chinese Academy of Sciences
SOSP AI System Workshop, Shanghai, Oct. 28, 2017


Page 1:

EasyML: Ease the Process of Machine Learning with Data Flow

Jun Xu

Institute of Computing Technology, Chinese Academy of Sciences

SOSP AI System Workshop, Shanghai

Oct. 28, 2017

Page 2:

Applying Machine Learning is not Easy

The barrier comes not only from the advanced algorithms, but also from the complex process of using the algorithms!

• UI: a simple UI is helpful
• Collaborating and sharing: share the data, algorithms, and experience
• Mobility: can access the service everywhere

Page 3:

Large Scale Machine Learning System @ ICT

• Data Storage and Management
– Large scale data management: HDFS
– Structured data management: MySQL
• Distributed Computing: Map-Reduce, Spark, TensorFlow
• LIB: scalable machine learning algorithms
– Conventional ML algorithms
– DL/RL algorithms for ranking & matching
– Data pre-processing, ETL, model evaluation …
• Easy ML: interactive graphical UI
– Designer: ML task creation, editing, submitting, and management
– Monitor: task monitoring, result visualization, and task reusing

Diagram labels: Our focus; Focus of EasyML

Page 4:

Design of Easy Machine Learning

• Interactive GUI (GWT): designer, monitor
– Dataflow DAG. Node: program / data. Edge: dataflow
– Workflow DAG. Node: program / start / end / fork / join. Edge: dependency
• Between GUI and scheduler: submit Oozie job; job status, program status
• Scheduling: Oozie
• Between scheduler and cluster: execute command lines; program status
• Distributed Computing: Map-Reduce, Spark, TensorFlow
• Data Storage and Management
– Large scale data management: HDFS
– Structured data management: MySQL
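The slide names two graphs: a dataflow DAG (program and data nodes, dataflow edges) and a workflow DAG (program, start, end, fork, and join nodes, dependency edges). The talk does not spell out how one is derived from the other, so the following is only a minimal sketch under stated assumptions: data nodes are collapsed into program-to-program dependencies, and programs that become runnable at the same time are wrapped in a fork/join pair. All function names and the example task are hypothetical, not taken from the EasyML source.

```python
from collections import defaultdict


def dataflow_to_dependencies(programs, edges):
    """Collapse data nodes: program B depends on program A when some
    data node is written by A and read by B. Any node that is not in
    `programs` is treated as a data node (an assumption of this sketch)."""
    producers = defaultdict(set)  # data node -> programs that write it
    consumers = defaultdict(set)  # data node -> programs that read it
    for src, dst in edges:
        if src in programs and dst not in programs:
            producers[dst].add(src)
        elif src not in programs and dst in programs:
            consumers[src].add(dst)
    deps = {p: set() for p in programs}
    for data, readers in consumers.items():
        for reader in readers:
            deps[reader] |= producers.get(data, set())
    return deps


def workflow_levels(deps):
    """Group programs into levels; every program in a level has all of
    its dependencies in earlier levels."""
    remaining = {p: set(d) for p, d in deps.items()}
    levels = []
    while remaining:
        ready = sorted(p for p, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle detected; the input is not a DAG")
        levels.append(ready)
        for p in ready:
            del remaining[p]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


def workflow_nodes(levels):
    """Linearize the levels, wrapping parallel programs in fork/join."""
    nodes = ["start"]
    for i, level in enumerate(levels):
        if len(level) > 1:
            nodes += [f"fork-{i}", *level, f"join-{i}"]
        else:
            nodes += level
    nodes.append("end")
    return nodes


# Hypothetical task: split the raw data, then train and compute
# statistics in parallel, then evaluate the model on the test set.
programs = {"split", "train", "stats", "evaluate"}
edges = [
    ("raw", "split"), ("split", "train_set"), ("split", "test_set"),
    ("train_set", "train"), ("train_set", "stats"),
    ("train", "model"), ("model", "evaluate"), ("test_set", "evaluate"),
]
levels = workflow_levels(dataflow_to_dependencies(programs, edges))
plan = workflow_nodes(levels)
```

Level-by-level grouping is a simplification: it makes every program in a level wait for the whole previous level, which is stricter than the dependencies require but maps directly onto fork and join nodes.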

Page 5:

Key Features: Resource Management

• Managing the algorithms, data, and tasks
• Uploading algorithms and data

Screenshot captions: Uploading new algorithms; Managing programs, data, and tasks

Page 6:

Key Features: Task Design

• Creating the task DAG (usually by cloning and editing an existing task) in a drag-and-drop manner
• Setting the parameters for each node
• Submitting for execution
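The deck says that each node carries parameters and that the scheduler executes command lines, but it does not show how one becomes the other. As a rough illustration only, the sketch below assumes each program has a command template with named placeholders that are filled from the node's parameters and its input/output paths. The template syntax, the jar name, the class name, and the paths are all hypothetical.

```python
import shlex


def build_command(template, params, inputs, outputs):
    """Fill a command template from a node's settings.

    The template is split into arguments first and formatted afterwards,
    so a value containing spaces still ends up as a single argument.
    A placeholder with no matching value raises KeyError, which surfaces
    a parameter the user forgot to set.
    """
    values = {**params, **inputs, **outputs}
    return [part.format(**values) for part in shlex.split(template)]


# Hypothetical training node.
template = (
    "spark-submit --class example.Train train.jar "
    "--input {train_set} --output {model} --iterations {iterations}"
)
command = build_command(
    template,
    params={"iterations": 100},
    inputs={"train_set": "/data/train set"},
    outputs={"model": "/models/lr"},
)
```

Returning an argument list rather than a single string avoids a second round of shell quoting when the command is eventually executed.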

Page 7:

Key Features: Task Monitoring

• Monitoring status of tasks and nodes
• Checking/downloading the outputs
• Visualizing the data/models

Screenshot captions: Data/results visualization; Task/node status monitoring
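Monitoring covers both whole tasks and individual nodes. One simple way to relate the two, shown here purely as an assumption and not as the rule EasyML or Oozie applies, is to derive the task status from the statuses of its nodes. The status names below are hypothetical.

```python
from enum import Enum


class Status(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def task_status(node_statuses):
    """Summarize per-node statuses into one task status.

    Any failed node fails the task; the task succeeds only when every
    node has succeeded; partial progress counts as running.
    """
    statuses = set(node_statuses.values())
    if Status.FAILED in statuses:
        return Status.FAILED
    if statuses == {Status.SUCCEEDED}:
        return Status.SUCCEEDED
    if statuses & {Status.RUNNING, Status.SUCCEEDED}:
        return Status.RUNNING
    return Status.WAITING


# Hypothetical snapshot of a three-node task.
snapshot = {
    "split": Status.SUCCEEDED,
    "train": Status.RUNNING,
    "evaluate": Status.WAITING,
}
summary = task_status(snapshot)
```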

Page 8:

Key Features: Task Reusing

• Editing (appending nodes, deleting nodes, and changing parameters) and resubmitting
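The three edit operations named on this slide (appending nodes, deleting nodes, changing parameters) can be illustrated with a small in-memory model. This is a sketch under assumptions: the `Task` class, its fields, and the example task are hypothetical and do not reflect how EasyML stores tasks. The point it shows is that a clone is edited independently, so the original task stays available for reuse.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    nodes: dict = field(default_factory=dict)  # node name -> parameter dict
    edges: set = field(default_factory=set)    # (source, target) pairs

    def clone(self, new_name):
        """Deep-copy the task so edits never touch the original."""
        twin = copy.deepcopy(self)
        twin.name = new_name
        return twin

    def append_node(self, name, params, after=None):
        self.nodes[name] = dict(params)
        if after is not None:
            self.edges.add((after, name))

    def delete_node(self, name):
        """Remove a node together with every edge that touches it."""
        del self.nodes[name]
        self.edges = {(s, t) for s, t in self.edges if name not in (s, t)}

    def set_param(self, node, key, value):
        self.nodes[node][key] = value


# Hypothetical baseline task and an edited variant ready to resubmit.
base = Task(
    "lr-baseline",
    nodes={"split": {"ratio": 0.8}, "stats": {}, "train": {"iterations": 100}},
    edges={("split", "train"), ("split", "stats")},
)
variant = base.clone("lr-more-iterations")
variant.set_param("train", "iterations", 200)
variant.append_node("evaluate", {"metric": "auc"}, after="train")
variant.delete_node("stats")
```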

Page 9:

Deploy as Web Service

http://159.226.40.104:18080/dev

• Advantages
– Sharing: share data/programs/tasks among users
– Collaborating: working together on one task
– Mobility: accessing with web browsers anywhere
– Open: ETL for data import/export; can run third-party algorithms

Diagram labels: Browser; EasyML service; Hadoop/Spark/TensorFlow cluster

Page 10:

Source Shared at GitHub

https://github.com/ICT-BDA/EasyML

• Top 1 Java project on GitHub trending for one week
• 1400+ stars and ~300 forks
• CIKM 2016 best demo candidate [Guo et al., CIKM’16]

Page 11:

Related Systems

• Stanford CodaLab
– A collaborative platform for reproducible research
• Rapid Miner Studio
– Interactive GUI and integrated machine learning algorithms
• Microsoft Azure machine learning
– GUI-based IDE for constructing and operationalizing machine learning workflow on Azure
• Alibaba 御膳房 / DT PAI
• The Fourth Paradigm Prophet

Page 12:

Summary

• Ease the process of using machine learning
– Machine learning process as dataflow DAG
– Interactive GUI for designing and managing ML tasks
– Deployed as a web service
  • Resource management
  • Task design
  • Task monitoring
  • Result reusing
– Source code @ GitHub: https://github.com/ICT-BDA/EasyML

Page 13:

EasyML Team

• Xinjie Chen, ICT CAS
• Zhaohui Li, ICT CAS
• Tianyou Guo, Sogou Inc.
• Jianpeng Hou, Google China
• Ping Li, Tencent Wechat
• Jiashuo Cao, CUIT
• Dong Huang, UCAS
• Xiaohui Yan
• Xueqi Cheng, ICT CAS

Page 14:

Thanks!

[email protected]

www.bigdatalab.ac.cn/~junxu