Open Machine Learning
DESCRIPTION
This talk explores the possibility of turning machine learning research into open science and proposes concrete approaches to achieve this goal.
TRANSCRIPT
![Page 1: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/1.jpg)
The open experiment database
Meta-learning for the masses
Joaquin Vanschoren @joavanschoren
![Page 2: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/2.jpg)
The Polymath story
Tim Gowers
![Page 3: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/3.jpg)
Machine Learning: are we doing it right?
![Page 4: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/4.jpg)
Computer Science
• The scientific method
• Make a hypothesis about the world
• Generate predictions based on this hypothesis
• Design experiments to verify/falsify the prediction
• Predictions verified: hypothesis might be true
• Predictions falsified: hypothesis is wrong
![Page 7: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/7.jpg)
Computer Science
• The scientific method (for ML)
• Make a hypothesis about (the structure of) given data
• Generate models based on this hypothesis
• Design experiments to measure accuracy of the models
• Good performance: It works (on this data)
• Bad performance: It doesn’t work on this data
• Aggregates (it works 60% of the time) not useful
How can data be characterized on which the algorithm works well?
What is the effect of parameter settings?
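To make the point about aggregates concrete, here is a minimal sketch with made-up numbers. The dataset names, the feature counts, the accuracy values, and the 0.8 threshold are all hypothetical; the only claim is that an aggregate win rate can hide a clean split along a data characteristic.

```python
# Hypothetical results: (dataset, number of features, accuracy of one algorithm).
results = [
    ("d1", 5, 0.91),
    ("d2", 8, 0.88),
    ("d3", 120, 0.55),
    ("d4", 300, 0.52),
    ("d5", 10, 0.90),
]

THRESHOLD = 0.8  # arbitrary cut-off for "it works"


def win_rate(rows):
    """Fraction of datasets on which accuracy reaches the threshold."""
    return sum(acc >= THRESHOLD for _, _, acc in rows) / len(rows)


def split_by_feature_count(rows, limit):
    """Group results by a simple data characteristic (feature count)."""
    low = [r for r in rows if r[1] <= limit]
    high = [r for r in rows if r[1] > limit]
    return low, high


aggregate = win_rate(results)  # the "works 60% of the time" figure
low_dim, high_dim = split_by_feature_count(results, limit=50)
print(aggregate, win_rate(low_dim), win_rate(high_dim))
```

In this toy data the aggregate says little, while the split by feature count separates the datasets where the algorithm works from those where it does not.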
![Page 8: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/8.jpg)
Meta-Learning
• The science of understanding which algorithms work well on which types of data
• Hard: thorough understanding of data and algorithms
• Requires good data: extensive experimentation
• Why is this separate from other ML research?
• A thorough algorithm evaluation = a meta-learning study
• Original authors know algorithms and data best, have large sets of experiments, are (presumably) interested in knowing on which data their algorithms work well (or not)
![Page 9: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/9.jpg)
Meta-Learning
With the right tools, can we make everyone a meta-learner?
[Diagram labels: ML algorithm design; meta-learning; large sets of experiments; algorithm selection; algorithm characterization; data characterization; bias-variance analysis; learning curves; data insight; algorithm insight; algorithm comparison; datasets; source code]
![Page 10: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/10.jpg)
Open Machine Learning
![Page 11: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/11.jpg)
Open science
World-wide Telescope
![Page 12: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/12.jpg)
Open science
Microarray Databases
![Page 13: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/13.jpg)
Open science
GenBank
![Page 14: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/14.jpg)
Open machine learning?
• We can also be "open"
• Simple, common formats to describe experiments, workflows, algorithms, ...
• Platform to share, store, query, interact
• We can go (much) further
• Share experiments automatically (open source ML tools)
• Experiment on-the-fly (cheap, no expensive instruments)
• Controlled experimentation (experimentation engine)
![Page 15: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/15.jpg)
Formalizing machine learning
• Unique names for algorithms, datasets, evaluation measures, data characterizations,... (ontology)
• Based on DMOP, OntoDM, KDOntology, EXPO,...
• Simple, structured way to describe algorithm setups, workflows and experiment runs
• Detailed enough to reproduce all experiments
![Page 18: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/18.jpg)
Run
run
Execution of a predefined setup
setup
![Page 23: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/23.jpg)
Run
[Diagram labels: run; setup; machine; data in; data out]
Also: start time, author, status, ...
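As a rough illustration of what a run record could hold, the sketch below collects the labels from the slide into one structure. The class and all field names are assumptions made for this example, not the schema used by the experiment database.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Run:
    """One execution of a predefined setup (all field names are illustrative)."""
    setup_id: str   # which setup was executed
    machine: str    # where it was executed
    input_data: List[str] = field(default_factory=list)   # e.g. dataset ids
    output_data: List[str] = field(default_factory=list)  # e.g. evaluations
    start_time: datetime = field(default_factory=datetime.now)
    author: str = "unknown"
    status: str = "created"

    def finish(self, outputs: List[str]) -> None:
        """Attach output data and mark the run as done."""
        self.output_data.extend(outputs)
        self.status = "done"


run = Run(setup_id="setup-1", machine="local", input_data=["dataset-1"])
run.finish(["evaluations-1", "predictions-1"])
```

Keeping the setup as a reference (`setup_id`) rather than a copy reflects the idea that a run is the execution of a setup defined elsewhere.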
![Page 26: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/26.jpg)
Setup
Plan of what we want to do
[Diagram labels: setup; algorithm setup; function setup; workflow; experiment]
![Page 29: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/29.jpg)
Setup
[Diagram labels: setup; algorithm setup; function setup; workflow; experiment; parameter setting; relation: part of]
Hierarchical
Parameterized
Abstract/concrete
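The three properties on this slide can be sketched in a few lines. This is one possible reading, not the talk's actual data model: the class, its fields, and the convention that an unbound parameter is stored as `None` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Setup:
    """A plan of what to do. Field names are illustrative only."""
    name: str
    kind: str  # "algorithm", "function", "workflow" or "experiment"
    # Parameterized: None marks a parameter that has no value yet.
    parameters: Dict[str, Optional[object]] = field(default_factory=dict)
    # Hierarchical: setups can be part of other setups.
    parts: List["Setup"] = field(default_factory=list)

    def is_concrete(self) -> bool:
        """Concrete means every parameter here and in all parts has a value."""
        own = all(v is not None for v in self.parameters.values())
        return own and all(p.is_concrete() for p in self.parts)


kernel = Setup("kernel", "function", {"G": 0.01})
learner = Setup("learner", "algorithm", {"C": None}, parts=[kernel])
abstract_before = learner.is_concrete()  # C is still unbound
learner.parameters["C"] = 0.01
concrete_after = learner.is_concrete()   # every parameter now has a value
```

Under this reading, an abstract setup is a template and a concrete setup is one that can actually be run.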
![Page 32: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/32.jpg)
Algorithm Setup
Fully defined algorithm configuration
[Diagram labels: algorithm setup; implementation; parameter setting; function setup; relation: part of]
![Page 37: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/37.jpg)
Algorithm Setup
[Diagram labels: algorithm setup; algorithm; implementation; parameter; parameter setting; algorithm quality; function setup; mathematical function; relation: part of]
Unique names
Roles: learner, base-learner, kernel, ...
![Page 38: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/38.jpg)
Setup
[Diagram labels: setup; algorithm setup; function setup; workflow; experiment; relation: part of]
![Page 41: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/41.jpg)
Workflow Setup
[Diagram labels: setup; algorithm setup; workflow; connection (source, target); relation: part of]
Workflow: components, connections, and parameters (inputs)
Also: ports, datatype
![Page 43: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/43.jpg)
Workflow Example
[Diagram: workflow 1:mainFlow with components 2:loadData (Weka.ARFFLoader, location=http://...), 3:crossValidate (Weka.Evaluation, F=10), 4:learner (Weka.SMO, C=0.01), and 5:kernel (Weka.RBF, with parameter settings G=0.01 and S=1 listed alongside). Port labels: url, data, par, eval, pred, evaluations, predictions. Flags: logRuns=true, logRuns=false. Outputs: Evaluations, Predictions. Datatype: Weka.Instances.]
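The workflow on this slide can be approximated as a small graph of components and typed connections. The sketch below is an assumption-laden reading: the classes and method names are invented for illustration, only the one link from loadData to crossValidate is wired up, and how the learner and kernel plug into the evaluation is not recoverable from the transcript, so it is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Component:
    """A workflow step; implementation names are taken from the slide."""
    name: str
    implementation: str
    parameters: Dict[str, object] = field(default_factory=dict)


@dataclass
class Connection:
    """Directed link from a source port to a target port, with a datatype."""
    source: Tuple[str, str]  # (component name, output port)
    target: Tuple[str, str]  # (component name, input port)
    datatype: str


@dataclass
class Workflow:
    name: str
    components: Dict[str, Component] = field(default_factory=dict)
    connections: List[Connection] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def connect(self, source, target, datatype) -> None:
        # Refuse links that mention components the workflow does not contain.
        for comp_name, _port in (source, target):
            if comp_name not in self.components:
                raise KeyError(f"unknown component: {comp_name}")
        self.connections.append(Connection(source, target, datatype))

    def successors(self, name: str) -> List[str]:
        """Names of components fed directly by the given component."""
        return [c.target[0] for c in self.connections if c.source[0] == name]


flow = Workflow("mainFlow")
flow.add(Component("loadData", "Weka.ARFFLoader", {"location": "http://..."}))
flow.add(Component("crossValidate", "Weka.Evaluation", {"F": 10}))
flow.add(Component("learner", "Weka.SMO", {"C": 0.01}))
# Assumed wiring: the loader's data port feeds the evaluation's data port.
flow.connect(("loadData", "data"), ("crossValidate", "data"), "Weka.Instances")
```

Storing connections as explicit source and target ports keeps the description detailed enough to rebuild the workflow, which is the stated goal of the format.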
![Page 44: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/44.jpg)
Setup
[Diagram labels: setup; algorithm setup; function setup; workflow; experiment; relation: part of]
![Page 46: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/46.jpg)
Experiment Setup
[Diagram labels: setup; algorithm setup; workflow; experiment; experiment variable <X>; relation: part of]
Also: experiment design, description, literature reference, author, ...
![Page 48: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/48.jpg)
Experiment Setup
Variables: labeled tuples which can be referenced in setups
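One way to read "labeled tuples which can be referenced in setups" is that each variable has a label and a tuple of values, and a setup refers to it by label. The sketch below follows that reading. The `<label>` reference syntax, the variable names, and the full-factorial expansion are assumptions made for illustration.

```python
from itertools import product

# Hypothetical experiment variables: each has a label and a tuple of values.
variables = {
    "dataset": ("dataset-A", "dataset-B"),
    "C": (0.01, 0.1, 1.0),
}

# A setup template that references variables by label (illustrative format).
template = {"algorithm": "Weka.SMO", "data": "<dataset>", "C": "<C>"}


def resolve(value, binding):
    """Replace a "<label>" reference with its bound value; pass others through."""
    if isinstance(value, str) and value.startswith("<") and value.endswith(">"):
        return binding.get(value[1:-1], value)
    return value


def expand(template, variables):
    """Yield one concrete setup per combination of variable values."""
    labels = list(variables)
    for combo in product(*(variables[label] for label in labels)):
        binding = dict(zip(labels, combo))
        yield {key: resolve(value, binding) for key, value in template.items()}


setups = list(expand(template, variables))
```

Here two datasets and three values of `C` expand into six concrete setups, each of which could then be executed as a run.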
![Page 49: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/49.jpg)
Run
[Diagram labels: run; setup; machine; data in; data out]
Also: start time, author, status, ...
![Page 52: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/52.jpg)
Run
[Diagram labels: data (dataset, evaluation, model, predictions); source; run; data quality]
![Page 53: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/53.jpg)
EXPML
[Diagram: workflow 1:mainFlow with components 2:loadData (Weka.ARFFLoader, location=http://...), 3:crossValidate (Weka.Evaluation, F=10), 4:learner (Weka.SMO, C=0.01), and 5:kernel (Weka.RBF, with parameter settings G=0.01 and S=1 listed alongside). Port labels: url, data, par, eval, pred, evaluations, predictions. Flags: logRuns=true, logRuns=false.]
![Page 54: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/54.jpg)
Demo (preview)
![Page 55: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/55.jpg)
Learning curves
[Plot: predictive accuracy (0.2 to 1) against percentage of original dataset size (10 to 100) for RandomForest, C45, LogisticRegression, RacedIncrementalLogitBoost-Stump, NaiveBayes, and SVM-RBF]
Examples
![Page 56: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/56.jpg)
When does one algorithm outperform another?
Examples
![Page 57: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/57.jpg)
When does one algorithm outperform another?
Examples
![Page 58: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/58.jpg)
Bias-variance profile + effect of dataset size
Examples
![Page 59: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/59.jpg)
Bias-variance profile + effect of dataset size
boosting
bagging
Examples
![Page 60: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/60.jpg)
Bias-variance profile + effect of dataset size
Examples
![Page 61: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/61.jpg)
Taking it further
Seamless integration
• Webservice for sharing, querying experiments
• Integrate experiment sharing in ML tools (WEKA, KNIME, RapidMiner, R, ...)
• Mapping implementations, evaluation measures,...
• Online platform for custom querying, community interaction
• Semantic wiki: algorithm/data descriptions, rankings, ...
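To give a feel for what querying shared experiments might look like, here is a minimal sketch against an in-memory SQLite table. The table layout, column names, algorithm and dataset names, and all values are hypothetical; the real platform's schema and interface are not described on the slides.

```python
import sqlite3

# Hypothetical single-table store of experiment results.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (algorithm TEXT, dataset TEXT, measure TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    [
        ("algoA", "data1", "accuracy", 0.90),
        ("algoB", "data1", "accuracy", 0.85),
        ("algoA", "data2", "accuracy", 0.60),
        ("algoB", "data2", "accuracy", 0.75),
    ],
)


def best_per_dataset(connection, measure):
    """Return {dataset: algorithm} with the highest value for the measure."""
    rows = connection.execute(
        "SELECT dataset, algorithm, value FROM runs WHERE measure = ? "
        "ORDER BY dataset, value DESC",
        (measure,),
    ).fetchall()
    best = {}
    for dataset, algorithm, _value in rows:
        best.setdefault(dataset, algorithm)  # first row per dataset is the best
    return best


winners = best_per_dataset(conn, "accuracy")
```

A ranking per dataset, rather than one overall average, is the kind of question the talk argues a shared store should make cheap to ask.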
![Page 62: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/62.jpg)
Experimentation Engine
• Controlled experimentation (Delve, MLComp)
• Download datasets, build training/test sets
• Feed training and test sets to algorithms, retrieve predictions/models
• Run broad set of evaluation measures
• Benchmarking (Cross-Validation), learning curve analysis, bias-variance analysis, workflows(!)
• Compute data properties for new datasets
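The core loop of such an engine (build training and test sets, feed them to an algorithm, collect predictions, apply evaluation measures) can be sketched with the standard library alone. The function names, the placeholder majority-class learner, and the toy data are assumptions for illustration and do not describe Delve, MLComp, or the proposed engine.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


class MajorityClass:
    """Placeholder learner: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(make_learner, X, y, k=10, measures=None):
    """Train on k-1 folds, test on the held-out fold, average each measure."""
    measures = measures or {"accuracy": accuracy}
    scores = {name: [] for name in measures}
    for held_out in k_fold_indices(len(X), k):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        learner = make_learner().fit([X[i] for i in train], [y[i] for i in train])
        predictions = learner.predict([X[i] for i in held_out])
        truth = [y[i] for i in held_out]
        for name, measure in measures.items():
            scores[name].append(measure(truth, predictions))
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


# Toy data with a single class, so the placeholder learner is always right.
X = [[i] for i in range(20)]
y = ["a"] * 20
result = cross_validate(MajorityClass, X, y, k=10)
```

Because the engine controls the splits and the measures, results from different algorithms stay directly comparable, which is the point of controlled experimentation.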
![Page 63: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/63.jpg)
Why would you use it? (seeding)
• Let the system run the experiments for you
• Immediate, highly detailed benchmarks (no repeats)
• Up to date, detailed results (vs. static, aggregated in journals)
• All your results organized online (private?), anytime, anywhere
• Interact with people (weird results?)
• Get credit for all your results (e.g. citations), including unexpected results
• Visibility, new collaborations
• Check if your algorithm really is the best (e.g. active testing)
• On which datasets does it perform well/badly?
![Page 64: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/64.jpg)
Question
Is open machine learning possible?
![Page 65: Open Machine Learning](https://reader031.vdocuments.net/reader031/viewer/2022020921/554a542fb4c905572f8b4ab6/html5/thumbnails/65.jpg)
http://expdb.cs.kuleuven.be
Thanks
Gracias
Xie Xie
Danke
Dank U
Merci
Efharisto
Dhanyavaad
Grazie
Spasiba
Kia ora
Tesekkurler
Diolch
Köszönöm
Arigato
Hvala
Toda