![Page 1: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/1.jpg)
Towards an Optimizer for MLbase
Ameet Talwalkar
UC Berkeley
Collaborators: Evan Sparks, Michael Franklin, Michael I. Jordan, Tim Kraska
![Page 2: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/2.jpg)
Problem: Scalable implementations difficult for ML Developers…
[Figure: architecture diagram. Labels: User, Declarative ML Task, result (e.g., fn-model & summary), ML Developer, ML Contract + Code, Master Server (Parser, Optimizer, Executor/Monitoring, ML Library, Meta-Data, Statistics), LLP, PLP, Master, Slaves, DMX Runtime (one per slave)]
![Page 3: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/3.jpg)
[Figure: screenshot of the MATLAB product overview page, "The Language of Technical Computing", shown twice]
![Page 5: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/5.jpg)
CHALLENGE: Can we simplify distributed ML development?
![Page 12: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/12.jpg)
Problem: ML is difficult for End Users…
Too many ways to preprocess…
Too many algorithms…
Too many knobs…
Difficult to debug…
Doesn’t scale…
CHALLENGE: Can we automate ML pipeline construction?
![Page 18: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/18.jpg)
MLbase
MLbase aims to simplify development and deployment of ML pipelines
[Figure: MLbase stack diagram. Labels: Apache Spark, MLlib, MLI, MLOpt]
Spark: Cluster computing system designed for iterative computation
MLlib: Spark’s core ML library
MLI: API to simplify ML development
MLOpt: Declarative layer that aims to automate ML pipeline construction via search over feature extractors and models
MLOpt and MLI are experimental testbeds
![Page 19: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/19.jpg)
Vision
MLlib and MLI
MLOpt
![Page 20: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/20.jpg)
MLlib
+ Scalable and fast
+ Simple development environment
+ Part of Spark’s robust ecosystem
[Figure: Spark ecosystem diagram. Labels: Apache Spark, SparkSQL, Spark Streaming, MLlib (machine learning), GraphX (graph)]
![Page 23: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/23.jpg)
Active Development
Initial Release
• Developed by MLbase team in AMPLab (11 contributors)
• Scala, Java
• Shipped with Spark v0.8 (Sep 2013)
11 months later…
• 55+ contributors from various organizations
• Scala, Java, Python
• Improved documentation / code examples, API stability
• Latest release part of Spark v1.0 (May 2014)
![Page 24: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/24.jpg)
Algorithms in v0.8
• classification: logistic regression, linear support vector machines (SVM)
• regression: linear regression
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD)
![Page 25: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/25.jpg)
Algorithms in v1.0
• classification: logistic regression, linear support vector machines (SVM), naive Bayes, decision trees
• regression: linear regression, regression trees
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD), limited-memory BFGS (L-BFGS)
• dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA)
![Page 31: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/31.jpg)
MLlib, MLI and Roadmap
• MLI: Shield ML Developers from low-level details
  • Provide familiar mathematical operators in distributed setting (tables, matrices, optimization primitives)
  • Standard APIs defining ML algorithms and feature extractors
• Many of these ideas are (or soon will be) in MLlib
• Next release of Spark and MLlib being tested now
  • statistical toolbox, Python decision tree API, online logistic regression, …
• Longer term
  • Scalable implementations of standard ML algorithms and underlying optimization primitives
  • Support for ML pipeline development (related to MLOpt)
Feedback and Contributions Encouraged!
![Page 32: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/32.jpg)
Vision
MLlib and MLI
MLOpt
![Page 33: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/33.jpg)
Grand Vision
✦ User declaratively specifies a task
✦ Search through MLlib/MLI to find the best model/pipeline
[Figure: analogy diagram. SQL → Result; ‘MQL’ → Model]
![Page 40: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/40.jpg)
A Standard ML Pipeline
[Figure: Data → Feature Extraction → Model Training → Final Model]
✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines
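The four pipeline stages can be sketched as plain functions composed in order. This is only an illustration, not MLbase or MLlib code: the function names (`extract_features`, `train_model`, `predict`, `run_pipeline`), the toy records, and the nearest-centroid "model" are all assumptions chosen to keep the example self-contained.

```python
from statistics import mean

def extract_features(record):
    """Feature extraction: map a raw text record to a numeric vector.
    Hypothetical featurizer: (number of words, mean word length)."""
    words = record.split()
    return (float(len(words)), mean(len(w) for w in words))

def train_model(features, labels):
    """Model training: a toy nearest-centroid classifier (one centroid per label)."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids

def predict(model, feature):
    """Final model: return the label whose centroid is closest to `feature`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], feature))

def run_pipeline(records, labels):
    """Data -> feature extraction -> model training -> final model."""
    features = [extract_features(r) for r in records]
    return train_model(features, labels)

# Toy data, made up for illustration.
records = ["hi", "ok go",
           "a considerably longer sentence appears here",
           "another fairly long example sentence follows"]
labels = ["short", "short", "long", "long"]
model = run_pipeline(records, labels)
```

In a real workflow each stage would be revisited repeatedly (a different featurizer, a different learner), which is the iterative refinement the slide refers to.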
![Page 45: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/45.jpg)
Training A Model
✦ For each point in dataset
  ✦ compute gradient
  ✦ update model
  ✦ repeat until converged
✦ Requires multiple passes
✦ Common access pattern
  ✦ Naive Bayes, Trees, etc.
✦ Minutes to train an SVM on 200GB of data on a 16-node cluster
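The training loop described above can be sketched on a single machine as per-point stochastic gradient descent for logistic regression. This is a minimal sketch under stated assumptions, not MLlib's distributed implementation: the function name `sgd_logistic`, the convergence test (largest weight change over one pass below `tol`), and the toy data are all illustrative choices.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sgd_logistic(points, labels, learning_rate=0.1, max_passes=200, tol=1e-4, seed=0):
    """Per-point SGD for logistic regression.
    Returns the weights and the number of passes made over the data."""
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    order = list(range(len(points)))
    passes = 0
    for passes in range(1, max_passes + 1):
        rng.shuffle(order)
        before = list(w)
        for i in order:                                    # for each point in dataset
            x, y = points[i], labels[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            grad = [(p - y) * xj for xj in x]              # compute gradient
            w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]  # update model
        if max(abs(a - b) for a, b in zip(w, before)) < tol:
            break                                          # converged; otherwise repeat
    return w, passes

# Toy 1-D data with a constant bias feature (made up for illustration).
points = [(1.0, x) for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
labels = [0, 0, 0, 1, 1, 1]
weights, passes = sgd_logistic(points, labels)

def classify(x):
    return 1 if sigmoid(weights[0] + weights[1] * x) > 0.5 else 0
```

Every pass touches the whole dataset, which is why an in-memory, iteration-friendly engine matters for this access pattern.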
![Page 48: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/48.jpg)
The Tricky Part
✦ Algorithms
  ✦ Logistic Regression, SVM, Tree-based, etc.
✦ Algorithm hyper-parameters
  ✦ Learning Rate, Regularization, etc.
✦ Featurization
  ✦ Text: n-grams, TF-IDF
  ✦ Images: Gabor filters, random convolutions
  ✦ Random projection? Scaling?
[Figure: search space with axes Algorithms, Hyper Parameters, Featurization]
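As one concrete instance of the text featurization options listed above, the sketch below computes word n-grams and TF-IDF weights in plain Python. It assumes one common TF-IDF variant (term frequency normalized by document length, `idf = log(N / document frequency)`); other variants exist, and the helper names `ngrams` and `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter

def ngrams(text, n=1):
    """Word n-grams of a lowercased, whitespace-tokenized string."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(documents, n=1):
    """Return one {term: weight} dict per document."""
    term_lists = [ngrams(doc, n) for doc in documents]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    num_docs = len(documents)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(num_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = ["spark is fast", "spark is scalable", "matlab is interactive"]
unigram_weights = tf_idf(docs, n=1)
bigram_weights = tf_idf(docs, n=2)
```

A term that occurs in every document (here "is") gets weight zero, while rarer terms are weighted up. Choosing `n`, or choosing TF-IDF at all, is one more axis of the search space.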
![Page 51: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/51.jpg)
A Standard ML Pipeline
[Figure: Data → Feature Extraction → Model Training → Final Model]
✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines
✦ Start with one aspect of the pipeline: model selection
Automated Model Selection
![Page 63: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/63.jpg)
One Approach
✦ Try it all!
  ✦ Search over all hyperparameters, algorithms, features, etc.
✦ Drawbacks
  ✦ Expensive to compute models
  ✦ Hyperparameter space is large
✦ Some version of this is still often done in practice!
[Figure: hyperparameter space with axes Learning Rate and Regularization; marked point: Best answer]
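The exhaustive strategy above can be sketched as a grid search over the two hyperparameters named in the figure. This is a minimal illustration, not the MLbase implementation: `validation_error` is a hypothetical stand-in for training a model and measuring its error, and the grid values are made up.

```python
import itertools

def validation_error(learning_rate, regularization):
    # Hypothetical stand-in for "train a model, return its validation error".
    # In a real run this call is the expensive part.
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2

def grid_search(learning_rates, regularizations):
    # Evaluate every combination and keep the one with the lowest error.
    best = None
    for lr, reg in itertools.product(learning_rates, regularizations):
        err = validation_error(lr, reg)
        if best is None or err < best[0]:
            best = (err, lr, reg)
    return best

learning_rates = [0.001, 0.01, 0.1, 1.0]        # assumed grid values
regularizations = [0.0001, 0.001, 0.01, 0.1]    # assumed grid values
best_err, best_lr, best_reg = grid_search(learning_rates, regularizations)
n_models = len(learning_rates) * len(regularizations)  # every cell is one trained model
```

With k values for each of d hyperparameters the grid contains k^d models, which is why the cost of this approach grows so quickly.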
![Page 64: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/64.jpg)
A Better Approach
✦ Better resource utilization
  ✦ through batching
✦ Algorithmic speedups
  ✦ via early stopping
✦ Improved search
  ✦ e.g., via randomization
[Figure: hyperparameter space with axes Learning Rate and Regularization; marked point: Best answer]
![Page 68: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/68.jpg)
A Tale Of 3 Optimizations
Better Resource Utilization
Algorithmic Speedups
Improved Search
![Page 71: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/71.jpg)
Better Resource Utilization
✦ Modern memory is slower than processors
✦ Can read: 0.6 billion doubles/sec/core (4.8 GB/s)
✦ Can compute: 15 billion flops/sec/core
✦ We can do 25 flops per double read
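The 25 flops figure follows directly from the two rates quoted above. A short worked calculation, assuming 8-byte doubles (the variable names are illustrative only):

```python
DOUBLE_BYTES = 8                   # assumption: IEEE 754 double precision

read_bandwidth_bytes = 4.8e9       # 4.8 GB/s per core, from the slide
flops_per_sec = 15e9               # 15 billion flops/sec/core, from the slide

doubles_per_sec = read_bandwidth_bytes / DOUBLE_BYTES   # 0.6 billion doubles/sec
flops_per_double = flops_per_sec / doubles_per_sec      # flops available per double read
```

Any computation that performs fewer flops than this per double it reads leaves the core waiting on memory.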
![Page 75: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/75.jpg)
What Does This Mean For Modeling?
✦ Typical model update requires 2-4 flops/double
  ✦ recall: 25 flops / double read
✦ Can do 7-10 model updates per double we read
  ✦ Assuming that models fit in cache
✦ Train multiple models simultaneously
A  B  C
1  a  Dog
1  b  Cat
2  c  Cat
2  d  Cat
3  e  Dog
3  f  Horse
4  g  Doge
[Figure: the example table above, with labels Model 1 and Model 2]
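The access pattern behind "train multiple models simultaneously" can be sketched as follows: each row is read once per pass and every model is updated from it before moving on. This is a hedged illustration only. It uses plain squared-loss SGD on made-up data, the function name `train_batched` is hypothetical, and pure Python will not show the hardware speedup; it only shows the loop order.

```python
def train_batched(rows, learning_rates, epochs=100):
    """Train one linear model per learning rate, sharing each data read."""
    n_features = len(rows[0][0])
    # One weight vector per model; all are assumed small enough to stay in cache.
    models = [[0.0] * n_features for _ in learning_rates]
    for _ in range(epochs):
        for x, y in rows:                                # read each row once per pass
            for w, lr in zip(models, learning_rates):    # reuse it for every model
                pred = sum(wi * xi for wi, xi in zip(w, x))
                err = pred - y
                for j in range(n_features):
                    w[j] -= lr * err * x[j]              # squared-loss SGD step
    return models

# Toy data generated from y = 2*x0 + 1*x1 (made-up numbers).
rows = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0)]
models = train_batched(rows, learning_rates=[0.01, 0.05, 0.1])
```

Training the three models separately would scan `rows` three times per epoch; here the data is scanned once and the extra work is arithmetic on values already loaded.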
![Page 76: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/76.jpg)
What Do We See In Spark?
✦ 2x and 5x increase in models trained/sec with batching
✦ Overhead from virtualization, network, etc.
![Page 79: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/79.jpg)
What Do We See In Spark?
✦ These numbers are with vector-matrix multiplies
✦ Can do better when rewriting in terms of matrix-matrix multiplies
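To make the rewrite concrete, the sketch below computes predictions for two models first as two separate matrix-vector products and then as one matrix-matrix product with the weight vectors stacked as columns. The helpers `matvec` and `matmul` and all numbers are illustrative assumptions; a real system would hand these operations to an optimized linear algebra library rather than Python loops.

```python
def matvec(X, w):
    # One dot product per row of X.
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]

def matmul(X, W):
    # W holds one column per model; each row of X is read once for all columns.
    cols = list(zip(*W))
    return [[sum(xi * wi for xi, wi in zip(row, col)) for col in cols] for row in X]

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # 3 examples, 2 features (made up)
w_a = [0.5, -1.0]                            # weights of model A
w_b = [2.0, 0.25]                            # weights of model B

# Separate products: one scan of X per model.
preds_a = matvec(X, w_a)
preds_b = matvec(X, w_b)

# Stacked product: a single scan of X yields both models' predictions.
W = [[w_a[0], w_b[0]],
     [w_a[1], w_b[1]]]
preds = matmul(X, W)
```

Both forms produce the same numbers; the stacked form simply does more arithmetic per element of `X` loaded, which is the property the flops-per-double argument rewards.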
![Page 81: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/81.jpg)
A Tale Of 3 Optimizations
Better Resource Utilization
Algorithmic Speedups
Improved Search
![Page 85: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/85.jpg)
Algorithmic Speedups
✦ Each point in hyperparameter space represents a trained model
✦ Sometimes we see early on that a model is no good
✦ So we stop early: give up on models that are not progressing
[Figure: hyperparameter space with axes Learning Rate and Regularization; marked point: Best answer]
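One simple way to express "give up on models that are not progressing" is a checkpoint rule: after a fixed number of iterations, abandon the model if its validation error has barely moved. The slides do not specify the actual stopping criterion, so the rule, the thresholds, and the function name below are assumptions for illustration.

```python
def train_with_early_stopping(errors_per_iter, check_at=5, min_improvement=0.01):
    """Return (iterations used, last error seen, stopped_early).

    errors_per_iter stands in for the validation error observed after each
    training iteration; a real system would compute it while training.
    """
    for i, err in enumerate(errors_per_iter, start=1):
        if i == check_at and errors_per_iter[0] - err < min_improvement:
            return i, err, True      # not progressing: give up on this model
    return len(errors_per_iter), errors_per_iter[-1], False

# Made-up error curves for a promising and an unpromising configuration.
good = [0.50, 0.40, 0.33, 0.28, 0.25, 0.23, 0.22, 0.21, 0.21, 0.20]
bad = [0.50, 0.50, 0.499, 0.499, 0.498, 0.498, 0.498, 0.497, 0.497, 0.497]

good_result = train_with_early_stopping(good)
bad_result = train_with_early_stopping(bad)
```

Here the unpromising configuration is dropped after 5 of its 10 iterations, freeing that budget for other points in the hyperparameter space.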
![Page 89: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/89.jpg)
A Tale Of 3 Optimizations
Better Resource Utilization
Algorithmic Speedups
Improved Search
![Page 92: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/92.jpg)
What Search Method?
✦ Various derivative-free optimization techniques
  ✦ Simple ones (Grid, Random)
  ✦ Classic derivative-free (Nelder-Mead, Powell's method)
  ✦ Bayesian (SMAC, TPE, Spearmint)
✦ What should we do?
  ✦ Tried on 5 datasets, optimized over 4 hyperparameters!
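Of the methods listed, random search is the easiest to sketch. The example below samples 4 hyperparameters uniformly and keeps the best configuration found within a call budget. The objective `validation_error` is a hypothetical stand-in, and the budgets are the "Maximum Calls" values from the comparison figure (16, 81, 256, 625). Those values equal n^4 for n = 2 to 5, which would match a grid with n values per hyperparameter; that reading is an inference, not something the slides state.

```python
import random

def validation_error(params):
    # Hypothetical stand-in objective over 4 hyperparameters in [0, 1].
    target = (0.3, 0.7, 0.5, 0.1)
    return sum((p - t) ** 2 for p, t in zip(params, target))

def random_search(max_calls, n_params=4, seed=0):
    rng = random.Random(seed)
    best_err, best_params = float("inf"), None
    for _ in range(max_calls):
        # Sample every hyperparameter independently and uniformly.
        params = tuple(rng.random() for _ in range(n_params))
        err = validation_error(params)
        if err < best_err:
            best_err, best_params = err, params
    return best_err, best_params

# Best error found under each call budget (same seed, so larger budgets
# extend the same sample sequence).
results = {calls: random_search(calls)[0] for calls in (16, 81, 256, 625)}
```

Because the seed is fixed, each larger budget revisits the earlier samples and then adds new ones, so the best error can only stay the same or improve as the budget grows.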
![Page 96: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/96.jpg)
What Search Method?
[Figure: Comparison of Search Methods Across Learning Problems. Panels by dataset (australian, breast, diabetes, fourclass, splice) and method (GRID, NELDER_MEAD, POWELL, RANDOM, SMAC, SPEARMINT, TPE); x-axis: Method and Maximum Calls (16, 81, 256, 625); y-axis: Dataset and Validation Error (0.0 to 0.5).]
![Page 102: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/102.jpg)
Putting It All Together
✦ First version of MLbase optimizer
✦ 30GB dense images (240K x 16K)
✦ 2 model families, 5 hyperparams
✦ Baseline: grid search
✦ Our method: combination of
  ✦ Batching
  ✦ Early stopping
  ✦ Random or TPE
[Figure: Model Convergence Over Time. x-axis: Time elapsed (m), 0 to 800; y-axis: Best Validation Error Seen So Far (0.25 to 0.75); series by Search Method: Grid - Unoptimized, Random - Optimized, TPE - Optimized.]
20x speedup compared to grid search: 15 minutes vs 5 hours!
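The headline numbers on this slide are internally consistent, as a quick check shows. The only assumption is that the image matrix is stored as 8-byte doubles, which the slides do not state explicitly.

```python
DOUBLE_BYTES = 8                      # assumption: entries stored as 8-byte doubles

rows, cols = 240_000, 16_000          # "240K x 16K" from the slide
size_gb = rows * cols * DOUBLE_BYTES / 1e9    # dataset size in GB

grid_minutes = 5 * 60                 # "5 hours" for unoptimized grid search
optimized_minutes = 15                # "15 minutes" for the combined method
speedup = grid_minutes / optimized_minutes
```

The size works out to roughly 30.7 GB, in line with the quoted 30GB, and the time ratio gives the quoted 20x.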
![Page 105: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/105.jpg)
Does It Scale?
✦ 1.5TB dataset (1.2M x 160K)
✦ 128 nodes, thousands of passes over data
✦ Tried 32 models in 15 hours
  ✦ Good answer after 11 hours
[Figure: Convergence of Model Accuracy on 1.5TB Dataset. x-axis: Time elapsed (h); y-axis: Best Validation Error Seen So Far (0.25 to 0.75).]
![Page 107: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/107.jpg)
Future Work
[Diagram: Data → Feature Extraction → Model Training → Final Model]
Automated Model Selection
![Page 109: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/109.jpg)
Future Work
[Diagram: Data → Feature Extraction → Model Training → Final Model]
Automated ML Pipeline Construction
![Page 110: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/110.jpg)
A Real Pipeline for Image Classification
Inspired by Coates & Ng, 2012
[Diagram: pipeline with nodes Data, Image Parser, Normalizer, Convolver, Symmetric Rectifier, Global Pooling, Pooler, Zipper, Patch Extractor, Patch Whitener, KMeans Clusterer, Feature Extractor, Label Extractor, Linear Solver, Linear Mapper, Model, Test Data, Label Extractor, Feature Extractor, Error Computer, Test Error; node annotations: "sqrt, mean", "ident, abs", "ident, mean"]
Slide courtesy of Evan Sparks
![Page 111: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/111.jpg)
[Diagram: the same image classification pipeline (Data through Test Error), with a legend: No Hyperparameters, A few Hyperparameters, Lotsa Hyperparameters]
Slide courtesy of Evan Sparks
![Page 116: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/116.jpg)
Other Future Work
✦ Ensembling
✦ Leverage sampling
✦ Better parallelism for smaller datasets
✦ Multiple hypothesis testing issues
![Page 117: mlbase stanford sparkrezab/sparkclass/slides/ameet...interactive environment for numerical com-putation, visualization, and programming. Using MATLAB, you can analyze data, develop](https://reader030.vdocuments.net/reader030/viewer/2022041023/5ed498596894a01fe75ded0c/html5/thumbnails/117.jpg)
MLOpt: Declarative layer that aims to automate ML pipeline construction
MLlib: Spark’s core ML library
MLI: API to simplify ML development
Spark: Cluster computing system designed for iterative computation
THANKS! QUESTIONS?
www.mlbase.org
[Diagram labels: MLlib, MLI, MLOpt, Apache Spark]