apsis - automatic hyperparameter optimization framework for machine learning
Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation
apsis: Automated Hyperparameter Optimization Using Bayesian Optimization
Frederik Diehl, Andreas Jauch
March 10, 2015
Status as of March 10, 2015
AGENDA
Problem Description
Bayesian Optimization
apsis and its Architecture
Project Organisation
Performance Evaluation
MOTIVATION
Why optimize hyperparameters, and why automate it?
- hyperparameter tuning often leads to a huge performance gain
- tuning is "more of an art than a science"
- reproducibility of published results
- automatic methods might be better than humans
- provide ML algorithms to non-expert users
ML PROCESS OVERVIEW
FORMAL PROBLEM DESCRIPTION
- λ: hyperparameter vector, λ = (λ^(1), ..., λ^(n))
- L(X, f): loss function evaluated for model f and dataset X
- A_λ(X): learning algorithm with hyperparameter vector λ, learning on dataset X
- X_train: training data, X_valid: validation data, X_test: test data
- Ψ(λ): hyperparameter response function/surface
Hyperparameter Optimization Problem
\begin{align}
\hat{\lambda} &\approx \operatorname*{argmin}_{\lambda \in \Lambda} \underbrace{\operatorname*{mean}_{x_i \in X_{valid}} L\bigl(x_i, A_\lambda(X_{train})\bigr)}_{\Psi(\lambda)} \tag{1} \\
&= \operatorname*{argmin}_{\lambda \in \Lambda} \Psi(\lambda) \tag{2}
\end{align}
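The definitions above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than apsis code: the learning algorithm is a one-dimensional ridge regression without intercept, the loss is the squared error, and the names `train`, `loss` and `response` are hypothetical. The regularization strength plays the role of λ, and the argmin is taken over a small hand-picked candidate set.

```python
from statistics import mean

def train(lam, train_data):
    """A_lambda(X_train): 1-D ridge regression without intercept (toy learner)."""
    sxy = sum(x * y for x, y in train_data)
    sxx = sum(x * x for x, _ in train_data)
    w = sxy / (sxx + lam)          # closed-form ridge solution
    return lambda x: w * x         # the trained model f

def loss(point, model):
    """L(x_i, f): squared error of the model on one validation point."""
    x, y = point
    return (y - model(x)) ** 2

def response(lam, train_data, valid_data):
    """Psi(lambda): mean validation loss after training with lambda."""
    model = train(lam, train_data)
    return mean(loss(p, model) for p in valid_data)

# Toy data following y = 2x exactly, so no regularization is optimal here.
train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
valid_data = [(4.0, 8.0), (5.0, 10.0)]
candidates = [0.0, 0.1, 1.0, 10.0]

# Equation (2): pick the candidate with the smallest response value.
best_lam = min(candidates, key=lambda lam: response(lam, train_data, valid_data))
```

Each call to `response` retrains the model, which is why every evaluation of Ψ is expensive for realistic learners.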
KEY PROBLEM PROPERTIES
- unknown, probably non-convex response surface Ψ
- no derivative-based optimization possible
- every evaluation of Ψ is expensive
- evaluation time of Ψ depends on the individual value of λ
- low effective dimensionality of Ψ
- which dimensions are important depends on the dataset
- tree-structured configuration space (not addressed in apsis yet)
STATE OF THE ART
- optimization is still manual in many projects
- grid search is the most common method
  - often the only method provided by ML frameworks
- random search
- Bayesian optimization
  - code by Jasper Snoek et al. (Harvard/Toronto)
  - whetlab: Bayesian optimization in the cloud
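As a point of comparison for the two baseline methods listed above, here is a minimal sketch of grid search and random search over a box-shaped search space. The function names and the toy objective are assumptions made for illustration; this is not how any particular framework implements them.

```python
import itertools
import random

def grid_search(objective, grids):
    """Evaluate every combination of the per-dimension grid values."""
    best = min(itertools.product(*grids), key=objective)
    return list(best), objective(best)

def random_search(objective, bounds, n_iter, seed=0):
    """Evaluate n_iter points drawn uniformly from the given bounds."""
    rng = random.Random(seed)
    best_point, best_value = None, float("inf")
    for _ in range(n_iter):
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        value = objective(point)
        if value < best_value:
            best_point, best_value = point, value
    return best_point, best_value

def toy_objective(p):
    """Stand-in for the response surface: a simple bowl with minimum at 0."""
    return p[0] ** 2 + p[1] ** 2

grid_point, grid_value = grid_search(toy_objective, [[-1.0, 0.0, 1.0]] * 2)
rand_point, rand_value = random_search(toy_objective, [(-1.0, 1.0)] * 2, n_iter=50)
```

Grid search needs a number of evaluations that grows exponentially with the number of dimensions, while random search spends a fixed budget regardless of dimensionality.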
BAYESIAN OPTIMIZATION
- approximate Ψ(λ) by a surrogate function M(λ) = y
- the surrogate function is cheaper to evaluate than Ψ
- interpret the model to find minimization candidates for Ψ
- evaluate Ψ for promising candidates
BAYESIAN OPTIMIZATION FUNDAMENTALS
Two design choices are needed
- Surrogate modelling function: Gaussian processes
  - universal approximation
  - very flexible, with many useful properties
  - closed under sampling
- Acquisition function
  - Probability of Improvement
  - Expected Improvement
ACQUISITION FUNCTION u
- measures the expected utility of evaluating the objective function at a point λ_next
- exploitation vs. exploration trade-off
(picture adapted from [1])
OPTIMIZATION - SUCCESSIVELY UPDATING THE GP
1. Find the maximum of the acquisition function: λ_next = argmax_λ u(λ)
2. Evaluate Ψ at λ_next
3. Update the GP
(picture adapted from [1])
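The three steps can be sketched end to end in plain Python. This is a toy illustration under several assumptions, not the apsis implementation: the search space is one-dimensional on [0, 1], the acquisition function is maximized over a fixed grid instead of with a numerical optimizer, the GP has a zero mean and a fixed squared exponential kernel, and all function names are hypothetical.

```python
import math

def kernel(a, b, length=0.2):
    """Squared exponential kernel for scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    cov = [[kernel(a, b) + (noise if i == j else 0.0)
            for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [kernel(a, x_new) for a in xs]
    alpha = solve(cov, ys)
    v = solve(cov, k_star)
    mu = sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mu, math.sqrt(max(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * cdf + pdf)

def bayes_opt(objective, n_iter=8, grid_size=51):
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    xs = [0.0, 0.5, 1.0]                      # initial design
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = min(ys)
        candidates = [g for g in grid if g not in xs]
        scores = []
        for g in candidates:
            mu, sigma = gp_posterior(xs, ys, g)
            scores.append(expected_improvement(mu, sigma, best))
        x_next = candidates[scores.index(max(scores))]   # step 1
        y_next = objective(x_next)                       # step 2
        xs.append(x_next)                                # step 3: the GP is
        ys.append(y_next)                                # refit on all data
    i = ys.index(min(ys))
    return xs[i], ys[i]

x_best, y_best = bayes_opt(lambda x: (x - 0.3) ** 2)
```

Refitting the GP from scratch in every iteration keeps the sketch short; a real implementation would reuse the factorization of the covariance matrix.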
FITTING THE GP TO THE PROBLEM
by tuning the covariance!
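- Squared Exponential Kernel
\[
K_{SE}(\lambda, \lambda') = \exp\left( -\frac{1}{2 l^2} \cdot \sum_{d=1}^{\dim(D)} (\lambda_d - \lambda'_d)^2 \right)
\]
- use Automatic Relevance Determination (ARD)
\[
K_{SE}(\lambda, \lambda') = \theta_0 \cdot \exp\left( -\frac{1}{2} \cdot \sum_{d=1}^{\dim(D)} \frac{1}{\theta_d^2} (\lambda_d - \lambda'_d)^2 \right)
\]
with ARD vector θ
\[
\theta = ( \underbrace{\theta_0}_{\text{bias}}, \underbrace{\theta_1, \dots, \theta_d}_{\text{dimension weights}} )
\]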
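A direct transcription of the ARD kernel into Python might look like the following sketch. The function name and the list layout of θ (θ_0 first, then one weight per dimension) are assumptions for illustration.

```python
import math

def se_ard_kernel(lam_a, lam_b, theta):
    """Squared exponential kernel with one length scale per dimension.

    theta[0] is the bias term theta_0, theta[1:] are the dimension weights.
    """
    theta0, weights = theta[0], theta[1:]
    scaled_sq_dist = sum(((a - b) / w) ** 2
                         for a, b, w in zip(lam_a, lam_b, weights))
    return theta0 * math.exp(-0.5 * scaled_sq_dist)

# A large weight on the second dimension makes the kernel almost ignore it,
# which is how ARD captures low effective dimensionality.
k_relevant = se_ard_kernel([0.0, 0.0], [1.0, 0.0], [1.0, 1.0, 1000.0])
k_irrelevant = se_ard_kernel([0.0, 0.0], [0.0, 1.0], [1.0, 1.0, 1000.0])
```

With all dimension weights set to the same value l and θ_0 = 1, this reduces to the plain squared exponential kernel.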
WHAT DO ACQUISITION FUNCTIONS LOOK LIKE?
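Expected Improvement (EI)
\[
u_{EI}(\lambda) = \int_{-\infty}^{\infty} \max\bigl(\Psi(\lambda^*) - y,\, 0\bigr) \cdot p_M(y \mid \lambda) \, dy
\]
- closed-form solution for GPs available
\[
u_{EI}(\lambda \mid M_t) = \sigma(\lambda) \cdot \left( \frac{f(\lambda^*) - \mu(\lambda)}{\sigma(\lambda)} \cdot \Phi(\lambda) + \phi(\lambda) \right)
\]
- gradient analytically derived in apsis for more effective optimization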
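The agreement between the integral and the closed form can be checked numerically. The sketch below assumes that the model prediction p_M(y|λ) is a normal distribution with mean µ and standard deviation σ, that Φ and φ are the standard normal CDF and PDF evaluated at the standardized improvement z = (best − µ)/σ, and that `best` stands for the best value found so far. The function names are hypothetical.

```python
import math

def ei_closed_form(mu, sigma, best):
    """Closed-form expected improvement for a normal prediction (minimization)."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * cdf + pdf)

def ei_numeric(mu, sigma, best, steps=20000):
    """Midpoint-rule approximation of the EI integral over mu +/- 8 sigma."""
    lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        density = (math.exp(-0.5 * ((y - mu) / sigma) ** 2)
                   / (sigma * math.sqrt(2.0 * math.pi)))
        total += max(best - y, 0.0) * density * h
    return total

closed = ei_closed_form(mu=0.5, sigma=2.0, best=0.2)
numeric = ei_numeric(mu=0.5, sigma=2.0, best=0.2)
```

The closed form costs a handful of arithmetic operations per candidate, which is what makes maximizing the acquisition function cheap compared with evaluating Ψ.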
THE apsis TOOLKIT
Automated Hyperparameter Optimization Framework for
- random search
- Bayesian optimization
as an open source framework featuring
- flexible architecture, ready to be extended with more optimizers
- ready for use with scikit-learn and theano
- implemented in Python
PROJECT OBJECTIVES
- open source implementation of state-of-the-art research in Bayesian optimization
- extensible project to encourage collaboration with other researchers
- easy integration with existing machine learning frameworks
- multi-core support
apsis ARCHITECTURE OVERVIEW
apsis CORE MODEL COMPONENTS
Parameter Definitions
- define the meta information for each hyperparameter
Candidates
- represent a specific hyperparameter vector and its value
- hold the function value if available
Experiments
- represent an optimization object
- keep track of finished and unfinished Candidates
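How these three components relate can be sketched with a few dataclasses. The class names, fields and methods below are hypothetical and chosen only to mirror the description above; they are not the actual apsis classes or their API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ParamDef:
    """Meta information for one numeric hyperparameter (hypothetical)."""
    low: float
    high: float

    def is_in(self, value: float) -> bool:
        return self.low <= value <= self.high

@dataclass
class Candidate:
    """One hyperparameter vector, plus its function value once evaluated."""
    params: Dict[str, float]
    result: Optional[float] = None

@dataclass
class Experiment:
    """One optimization task, tracking unfinished and finished candidates."""
    param_defs: Dict[str, ParamDef]
    pending: List[Candidate] = field(default_factory=list)
    finished: List[Candidate] = field(default_factory=list)

    def add_pending(self, cand: Candidate) -> None:
        for name, value in cand.params.items():
            if not self.param_defs[name].is_in(value):
                raise ValueError(f"{name}={value} is outside its definition")
        self.pending.append(cand)

    def finish(self, cand: Candidate, result: float) -> None:
        cand.result = result
        self.pending.remove(cand)
        self.finished.append(cand)

    def best(self) -> Optional[Candidate]:
        if not self.finished:
            return None
        return min(self.finished, key=lambda c: c.result)

experiment = Experiment({"learning_rate": ParamDef(0.0, 1.0)})
first = Candidate({"learning_rate": 0.1})
second = Candidate({"learning_rate": 0.5})
experiment.add_pending(first)
experiment.add_pending(second)
experiment.finish(first, 0.3)
```

Keeping unfinished candidates separate from finished ones lets an optimizer hand out several candidates before any result has come back.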
USING apsis - EXPERIMENT ASSISTANTS
- interface for interacting with a single experiment
- provides plots and result bookkeeping
USING apsis - LAB ASSISTANTS
- multiple experiments to compare different optimization techniques
- cross validation
EXTENSIVE EXPERIMENT TRACKING
- automated plot writing
- automated results writing
- write out information at every step
AUTOMATED PLOTTING
- plot function evaluations and best results
- plot confidence bars when using cross validation
- write out plots at every step
PARAMETER REPRESENTATION IN apsis
- different representation by parameter type
- various nominal and numeric types
NUMERIC PARAMETERS IN apsis
Warping Mechanism
- parameters are warped into the [0, 1] interval
- the optimization core can assume a uniform and equal distribution in [0, 1] space
- warping can be user-defined
Provided Warpings for
- normalization of arbitrary intervals [a, b] into [0, 1] space
- asymptotic parameters, e.g. a learning rate asymptotic at 0
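The two kinds of warping can be illustrated as follows. The linear normalization follows directly from the description above. The logarithmic variant is only one plausible way to handle a parameter that is asymptotic at 0 and is an assumption made here, not necessarily the formula apsis uses; all function names are hypothetical.

```python
import math

def warp_in(value, a, b):
    """Map a value from [a, b] linearly into [0, 1]."""
    return (value - a) / (b - a)

def warp_out(u, a, b):
    """Map a value from [0, 1] back into [a, b]."""
    return a + u * (b - a)

def log_warp_in(value, lo, hi):
    """Assumed asymptotic warping: spread [lo, hi] evenly on a log scale."""
    return (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))

def log_warp_out(u, lo, hi):
    """Inverse of log_warp_in."""
    return math.exp(math.log(lo) + u * (math.log(hi) - math.log(lo)))

# The midpoint of the warped space corresponds to very different raw values.
linear_mid = warp_out(0.5, 1e-4, 1.0)
log_mid = log_warp_out(0.5, 1e-4, 1.0)
```

Under the logarithmic warping, half of the [0, 1] space is spent on values below 0.01, which suits parameters such as learning rates whose interesting range lies close to 0.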
NOMINAL PARAMETERS IN apsis
- generally supported in apsis
- no support in Bayesian optimization
- GP kernels are based on distance metrics between parameters
- interesting topic for further research
- no publications on this topic so far
- whetlab claims to deal well with them, but doesn't say how
EXPECTED IMPROVEMENT OPTIMIZATION
- gradient analytically derived (see our paper for the derivation)
- 1000 steps of random search for initialization
- several iterative optimization methods integrated
  - L-BFGS-B: bounded limited-memory quasi-Newton method
  - BFGS: quasi-Newton method
  - Nelder-Mead
  - inexact Newton with conjugate gradient solver
  - ...
DEALING WITH GP HYPERPARAMETERS
- the GP surrogate model introduces new hyperparameters
  - not subject of optimization ⇒ hyper-hyperparameters
- optimization by the maximum likelihood method
- integrating over these parameters in the acquisition function using Hybrid Monte Carlo sampling
apsis PROJECT SET UP
- open source project from the beginning
- MIT License
- active issue tracking
- PEP 8 code style convention
- fully automated Sphinx documentation build on every commit
- 90% test coverage
- clear commit messages
apsis GITHUB REPOSITORY
Check out http://github.com/FrederikDiehl/apsis !
ISSUE TRACKING AND DISCUSSION
UNIT TESTS
- 90% overall test coverage
- 100% in most core components
GOOD CODE DOCUMENTATION
- almost 50:50 ratio of code vs. documentation
- documented according to the Sphinx standard
FULLY AUTOMATED DOCUMENTATION BUILD
- builds on every commit
Visit http://apsis.readthedocs.org !
BRANIN-HOO OPTIMIZATION
(figures: Best Value by Optimizer; Values by Optimizer)
- random search finds a better end result, but Bayesian optimization is more stable
- performance similar to that reported in other Bayesian optimization literature
- no other group publishes a comparison to random search
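For reference, the Branin-Hoo benchmark function itself is short enough to write down. The sketch below uses the standard parameterization with the usual domain x1 in [-5, 10] and x2 in [0, 15]; the function name is chosen for illustration and the slides do not state which variant was used in the experiments.

```python
import math

def branin(x1, x2):
    """Branin-Hoo function in its standard parameterization."""
    a = 1.0
    b = 5.1 / (4.0 * math.pi ** 2)
    c = 5.0 / math.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8.0 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1.0 - t) * math.cos(x1) + s

# The function has three global minima with the same value, about 0.397887.
minima = [(-math.pi, 12.275), (math.pi, 2.275), (9.42478, 2.475)]
values = [branin(x1, x2) for x1, x2 in minima]
```

Because it is cheap to evaluate and its minima are known, the function is a common sanity check before running an optimizer on an expensive learning problem.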
OPTIMIZING ARTIFICIAL NOISE FUNCTION
One-dimensional noise function with several smoothing variances
OPTIMIZING ARTIFICIAL NOISE FUNCTION
Minimization result on 3d noise by smoothing factor.
BREZE MNIST NEURAL NETWORK
Neural Network on MNIST using uniform parameters
BREZE MNIST NEURAL NETWORK (2)
Neural Network on MNIST using asymptotic parameters for learning rate and learning rate decay
TAKING THE PROJECT TO THE NEXT LEVEL - PROGRAM
- implement full multi-core support
- implement a REST web service to offer interoperability with any language
- improve integration of matplotlib
TAKING THE PROJECT TO THE NEXT LEVEL - BAYESIAN OPTIMIZATION
- deal with nominal parameters
- try replacing GPs with Student-t processes
- try to take the tree-structured configuration space into account
- account for evaluation cost depending on the hyperparameter setting
- implement the freeze-thaw optimization idea [3]
- automated learning of input warping [2]
Thank You!
REFERENCES
[1] Eric Brochu, Vlad M. Cora, and Nando de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. IEEE Transactions on Reliability, abs/1012.2, 2010.
[2] Jasper Snoek, Kevin Swersky, Richard S. Zemel, and Ryan P. Adams. Input Warping for Bayesian Optimization of Non-Stationary Functions. arXiv preprint arXiv:1402.0929, 2014.
[3] Kevin Swersky, Jasper Snoek, and Ryan Prescott Adams. Freeze-Thaw Bayesian Optimization. arXiv preprint arXiv:1406.3896, 2014.