TRANSCRIPT

Page 1: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

BAYESIAN GLOBAL OPTIMIZATION: Using Optimal Learning to Tune Deep Learning / AI Models

Scott Clark

[email protected]

Page 2: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

OUTLINE

1. Why is Tuning Models Hard?

2. Comparison of Tuning Methods

3. Bayesian Global Optimization

4. Deep Learning Examples

Page 3: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

Deep Learning / AI is extremely powerful

Tuning these systems is extremely non-intuitive

Page 4: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

https://www.quora.com/What-is-the-most-important-unresolved-problem-in-machine-learning-3

What is the most important unresolved problem in machine learning?

“...we still don't really know why some configurations of deep neural networks work in some cases and not others, let alone having a more or less automatic approach to determining the architectures and the hyperparameters.”

Xavier Amatriain, VP Engineering at Quora (former Director of Research at Netflix)

Page 5: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

Photo: Joe Ross

Page 6: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

TUNABLE PARAMETERS IN DEEP LEARNING

Page 7: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

TUNABLE PARAMETERS IN DEEP LEARNING

Page 8: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

Photo: Tammy Strobel

Page 9: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

STANDARD METHODS FOR HYPERPARAMETER SEARCH

Page 10: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

STANDARD TUNING METHODS

[Diagram: parameter configurations (weights, thresholds, window sizes, transformations) are chosen by manual search, grid search, random search, or "?", fed into the ML / AI model, and evaluated with cross validation on training and testing data.]

Page 11: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

OPTIMIZATION FEEDBACK LOOP

[Diagram: new configurations flow through the REST API into the ML / AI model, which is trained and evaluated via cross validation on training and testing data; the resulting objective metric is reported back through the REST API, producing better results with each iteration.]

Page 12: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

BAYESIAN GLOBAL OPTIMIZATION

Page 13: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

… the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive.

Prof. Warren Powell - Princeton

What is the most efficient way to collect information?
Prof. Peter Frazier - Cornell

How do we make the most money, as fast as possible?
Scott Clark - CEO, SigOpt

OPTIMAL LEARNING

Page 14: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

● Optimize objective function
  ○ Loss, Accuracy, Likelihood
● Given parameters
  ○ Hyperparameters, feature/architecture params
● Find the best hyperparameters
  ○ Sample function as few times as possible
  ○ Training on big data is expensive

BAYESIAN GLOBAL OPTIMIZATION
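In symbols (a standard formulation, not spelled out on the slide): find the hyperparameters x that maximize an expensive-to-evaluate objective f over a bounded domain,

x^\ast = \operatorname{arg\,max}_{x \in \mathcal{X}} f(x), \qquad \mathcal{X} \subset \mathbb{R}^d ,

where every evaluation of f(x) means training and validating the model once, so the goal is to get close to x^\ast in as few evaluations as possible.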

Page 15: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

SMBO

Sequential Model-Based Optimization

HOW DOES IT WORK?

Page 16: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

1. Build Gaussian Process (GP) with points sampled so far

2. Optimize the fit of the GP (covariance hyperparameters)

3. Find the point(s) of highest Expected Improvement within parameter domain

4. Return the next best point(s) to sample

GP/EI SMBO
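A minimal sketch of these four steps in Python, assuming scikit-learn and SciPy are available; the random candidate sampling and all names here are illustrative, not SigOpt's implementation:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, best_y):
    # EI for a maximization problem at each candidate point
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # guard against zero predictive variance
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)

def suggest_next(x_obs, y_obs, bounds, n_candidates=2000):
    # x_obs: (n, d) points sampled so far, y_obs: (n,) metrics,
    # bounds: (d, 2) array of per-dimension (min, max).
    # 1. Build a GP with the points sampled so far; fitting it also
    # 2. optimizes the covariance hyperparameters (marginal likelihood).
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True,
                                  n_restarts_optimizer=5)
    gp.fit(x_obs, y_obs)
    # 3. Find the point of highest Expected Improvement within the domain
    #    (here by scoring random candidates; a real system optimizes EI directly).
    candidates = np.random.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_candidates, bounds.shape[0]))
    ei = expected_improvement(candidates, gp, y_obs.max())
    # 4. Return the next best point to sample.
    return candidates[np.argmax(ei)]

In the full loop, the suggested point is evaluated (train and score the model), appended to x_obs / y_obs, and the four steps repeat until the evaluation budget is exhausted.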

Page 17: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

GAUSSIAN PROCESSES

Page 18: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

GAUSSIAN PROCESSES

Page 19: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

[Figure: GP fits illustrating overfit, good fit, and underfit]

GAUSSIAN PROCESSES

Page 20: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

EXPECTED IMPROVEMENT
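For a maximization problem, with GP posterior mean \mu(x), standard deviation \sigma(x), and best observed value f^\ast, Expected Improvement has the standard closed form (a textbook identity, not shown explicitly on the slide):

\mathrm{EI}(x) = \mathbb{E}\big[\max(f(x) - f^\ast, 0)\big] = (\mu(x) - f^\ast)\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{\mu(x) - f^\ast}{\sigma(x)} ,

where \Phi and \phi are the standard normal CDF and PDF. EI is large where the GP either predicts a high mean or is very uncertain, which drives the exploration/exploitation trade-off.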

Page 21: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

DEEP LEARNING EXAMPLES

Page 22: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

● Classify movie reviews using a CNN in MXNet

SIGOPT + MXNET

Page 23: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

TEXT CLASSIFICATION PIPELINE

[Diagram: hyperparameter configurations and feature transformations flow through the REST API into the ML / AI model (MXNet), which is trained on training text and scored on testing text; the validation accuracy is reported back through the REST API, producing better results.]

Page 24: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

TUNABLE PARAMETERS IN DEEP LEARNING

Page 25: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

● Comparison of several RMSProp SGD parametrizations

STOCHASTIC GRADIENT DESCENT
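For reference, the standard RMSProp update (not written out on the slide) is

E[g^2]_t = \gamma\, E[g^2]_{t-1} + (1 - \gamma)\, g_t^2, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t ,

so the parametrizations being compared differ in the learning rate \eta, the decay \gamma, and \epsilon, all of which are tunable hyperparameters in this pipeline.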

Page 26: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

ARCHITECTURE PARAMETERS

Page 27: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

TUNING METHODS

[Animated figure (looping GIF) comparing grid search, random search, and "?"]

Page 28: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

SPEED UP #2: RANDOM/GRID -> SIGOPT

Page 29: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

CONSISTENTLY BETTER AND FASTER

Page 30: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

● Classify house numbers in an image dataset (SVHN)

SIGOPT + TENSORFLOW

Page 31: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

COMPUTER VISION PIPELINE

[Diagram: hyperparameter configurations and feature transformations flow through the REST API into the ML / AI model (TensorFlow), which is trained on training images and scored on testing images; cross validation accuracy is reported back through the REST API, producing better results.]

Page 32: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

METRIC OPTIMIZATION

Page 33: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

● All convolutional neural network
● Multiple convolutional and dropout layers
● Hyperparameter optimization: mixture of domain expertise and grid search (brute force)

SIGOPT + NEON

http://arxiv.org/pdf/1412.6806.pdf

Page 34: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

COMPARATIVE PERFORMANCE

● Expert baseline: 0.8995
  ○ (using neon)
● SigOpt best: 0.9011
  ○ 1.6% reduction in error rate
  ○ No expert time wasted in tuning

Page 35: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

SIGOPT + NEON

http://arxiv.org/pdf/1512.03385v1.pdf

● Explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions
● Variable depth
● Hyperparameter optimization: mixture of domain expertise and grid search (brute force)

Page 36: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

COMPARATIVE PERFORMANCE

[Chart comparing the standard method and SigOpt]

● Expert baseline: 0.9339
  ○ (from paper)
● SigOpt best: 0.9436
  ○ 15% relative error rate reduction
  ○ No expert time wasted in tuning

Page 37: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

SIGOPT SERVICE

Page 38: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

OPTIMIZATION FEEDBACK LOOP

[Diagram: new configurations flow through the REST API into the ML / AI model, which is trained and evaluated via cross validation on training and testing data; the resulting objective metric is reported back through the REST API, producing better results with each iteration.]

Page 39: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

SIMPLIFIED OPTIMIZATION

Client Libraries
● Python
● Java
● R
● Matlab
● And more...

Framework Integrations
● TensorFlow
● scikit-learn
● xgboost
● Keras
● Neon
● And more...

Live Demo
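As a rough sketch of how the Python client was typically driven around this time (method names follow the public sigopt-python client; the parameter names and the train_and_evaluate function are hypothetical stand-ins for the demo):

from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")

# Define the experiment: the metric to maximize and the tunable parameters.
experiment = conn.experiments().create(
    name="CNN text classifier tuning",
    parameters=[
        dict(name="log_learning_rate", type="double", bounds=dict(min=-7.0, max=-1.0)),
        dict(name="dropout", type="double", bounds=dict(min=0.1, max=0.7)),
        dict(name="num_filters", type="int", bounds=dict(min=32, max=256)),
    ],
)

# Optimization feedback loop: get a suggestion, evaluate it, report the result.
for _ in range(60):
    suggestion = conn.experiments(experiment.id).suggestions().create()
    accuracy = train_and_evaluate(suggestion.assignments)  # user-defined training code
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        value=accuracy,
    )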

Page 40: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

DISTRIBUTED TRAINING

● SigOpt serves as a distributed scheduler for training models across workers

● Workers access the SigOpt API for the latest parameters to try for each model

● Enables easy distributed training of non-distributed algorithms across any number of models
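For illustration, each worker could run a loop like the following against the same experiment (a simplified, hypothetical sketch reusing the client pattern above):

def worker(conn, experiment_id):
    # Many copies of this process can run in parallel; each pulls its own
    # suggestion and reports its own observation, and the service decides
    # which configurations to hand out next.
    while True:
        suggestion = conn.experiments(experiment_id).suggestions().create()
        value = train_and_evaluate(suggestion.assignments)  # non-distributed training job
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id, value=value)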

Page 41: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017

https://sigopt.com/getstarted

Try it yourself!