apsis - automatic hyperparameter optimization framework for machine learning

42
Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation apsis Automated Hyperparameter Optimization Using Bayesian Optimization Frederik Diehl Andreas Jauch March 10, 2015 State March 10, 2015

Upload: andi1400

Post on 16-Jul-2015

262 views

Category:

Software


1 download

TRANSCRIPT

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

apsisAutomated Hyperparameter Optimization Using Bayesian

Optimization

Frederik Diehl Andreas Jauch

March 10, 2015

State March 10, 2015

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

AGENDA

Problem Description

Bayesian Optimization

apsis and its Architecture

Project Organisation

Performance Evaluation

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

MOTIVATION

Why Hyperparameter Optimization and why automating it?

I hyperparameter tuning often leads to huge performancegain

I ”more of an art than a science”

I reproducibility of published results

I automatic methods might be better than humans

I provide ml algorithms to non-expert users

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

ML PROCESS OVERVIEW

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

FORMAL PROBLEM DESCRIPTION

λ : hyperparameter vector λ = (λ(1), ..., λ(n))

L(X, f ) : loss function evaluated for model f and dataset xAλ(X) : learning algorithm with hyperparameter vector λ

learning on dataset XXtrain : training data, Xvalid : validation data, Xtest : test dataΨ(λ) : hyperparameter response function/surface

Hyperparameter Optimization Problem

λ̂ ≈ argminλεΛ

(mean

Xi∈Xvalid(L(xi,Aλ(Xtrain))

)︸ ︷︷ ︸

Ψ(λ)

(1)

= argminλεΛ

(Ψ(λ)) (2)

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

KEY PROBLEM PROPERTIES

I unknown, probably non-convex response surface Ψ

I no derivative-based optimization possible

I every evaluation of Ψ is expensive

I evaluation time of Ψ depends on individual value of λ

I low effective dimensionality of Ψ

I which dimensions are important is dataset dependent

I tree-structured configuration space1

1not addressed in apsis yet

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

STATE OF THE ART

I optimization still manual in many projects

I grid search most common methodI often the only provided method by many ml frameworks

I random search

I bayesian optimizationI code of Jasper Snoek et. al. Harvard/Toronto

I whetlab - bay opt in the cloud

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

BAYESIAN OPTIMIZATION

I approximate Ψ(λ) by a surrogate function M(λ) = y

I surrogate function cheaper to evaluate than Ψ

I interpret model to find minimization candidates for Ψ

I evaluate Ψ for promising candidates

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

BAYESIAN OPTIMIZATION FUNDAMENTALS

We need two design choices

I Surrogate Modelling Function - Gaussian Processes

I universal approximation

I very flexible and have many useful properties

I closed under sampling

I Acquisition Function

I Probability of Improvement

I Expected Improvement

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

ACQUISITION FUNCTION u

I measures the expected utility of evaluating the objectivefunction at a point λnext

I exploitation vs. exploration trade-off

(picture adopted from [1])

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

OPTIMIZATION - SUCCESSIVELY UPDATING THE GP

1. find

maxλ

(u(λ)) = λnext

max of acquisition

2. Evaluate M at λnext

3. Update the GP

(picture adopted from [1])

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

FITTING THE GP TO THE PROBLEM

by tuning the covariance!

I Squared Exponential Kernel

KSE(λ, λ′) = exp(− 1

2l2·∑

1..dim(D)

(λd − λ′d)2)

I use Automatic Relevance Determination (ARD)

KSE(λ, λ′) = θ0 · exp

−12·∑

1..dim(D)

(1θd

2 (λd − λ′d)2)

with ARD vector θ

θ = ( θ0︸︷︷︸bias

, θ1, . . . , θd︸ ︷︷ ︸dimension weights

)

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

HOW DO ACQUISITION FUNCTIONS LOOK LIKE?

Expected Improvement (EI)

uEI(λ) =

∫−∞

∞max(Ψ(λ∗)− y, 0) · pM(y|λ) dy

I closed form solution for GPs available

uEI(λ|Mt) = σ(λ) ·(

f (λ∗)− µ(λ)

σ(λ)· Φ(λ) + φ(λ)

)I gradient analytically derived in apsis for more effective

optimization

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

THE apsis TOOLKIT

Automated Hyperparameter Optimization Framework for

I random search

I bayesian optimization

as an open source framework featuring

I flexible architecture, ready to be extended for moreoptimizers

I ready for use with scikit-learn and theano

I implemented in Python

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

PROJECT OBJECTIVES

I open source implementation of state of the art research inbayesian optimization

I extendible project to encourage collaboration with otherresearchers

I easy integration with existing machine learningframeworks

I multi core support

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

apsis ARCHITECTURE OVERVIEW

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

apsis CORE MODEL COMPONENTS

Parameter DefinitionsI define the meta information for each hyperparameter

CandidatesI represent a specific hyperparameter vector and its value

I holds function value if available

ExperimentsI represent an optimization object

I keeps track of finished and unfinished Candidates

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

USING apsis - EXPERIMENT ASSISTANTS

I single experiment interaction interface

I provides plots and result bookkeeping

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

USING apsis - LAB ASSISTANTSI multiple experiments to compare different optimization

techniques

I cross validation

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

EXTENSIVE EXPERIMENT TRACKING

I automated plot writing

I automated results writing

I write out information at every step

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

AUTOMATED PLOTTING

I plot function evaluations and best results

I plot confidence bars when using cross validation

I write out plots at every step

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

PARAMETER REPRESENTATION IN apsis

I different representation by parameter type

I various nominal and numeric types

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

NUMERIC PARAMETERS IN apsis

Warping MechanismI parameters are warped into [0, 1] interval

I optimization core can assume a uniform and equaldistribution in [0, 1] space

I warping can be user defined

Provided Warpings forI normalization of arbitrary intervals [a, b] into [0, 1] space.

I asymptotic parameters, e.g. learning rate asymptotic at 0

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

NOMINAL PARAMETERS IN apsis

I generally supported in apsis

I no support in Bayesian Optimization

I GP kernels based on distance metrics between parameters

I interesting topic for further research

I no publications on this topic so far

I whetlab pretends to deal well with them - but doesn’t sayhow

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

EXPECTED IMPROVEMENT OPTIMIZATION

I gradient analytically derived2

I 1000 Steps Random Search for Initialization

I Several iterative optimization methods integratedI L-BFGS-B Bounded Low Memory Quasi Newton Method

I BFGS Quasi Newton Method

I Nelder-Mead

I Inexact Newton with Conjugate Gradient Solver

I ...

2See our paper for derivation.

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

DEALING WITH GP HYPERPARAMETERS

I the GP surrogate model introduces new hyperparameterI not subject of optimization⇒ hyper-hyperparameters

I optimization by maximum likelihood method

I integrating over these parameters in the acquisitionfunction using Hybrid Monte Carlo sampling

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

apsis PROJECT SET UP

I Open-Source project from the beginning

I MIT-License

I active issue tracking

I PEP-8 code styling convention

I Fully automated sphinx documentation build on everycommit

I 90% test coverage

I clear commit messages

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

apsis GITHUB REPOSITORY

Check out http://github.com/FrederikDiehl/apsis !

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

ISSUE TRACKING AND DISCUSSION

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

UNIT TESTS

I 90% overall test coverage

I 100% in most core components

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

GOOD CODE DOCUMENTATION

I almost 50:50 ratio of code vs. doc

I documented according to sphinx standard

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

FULLY AUTOMATED DOCUMENTATION BUILD

I builds on every commit

Visit http://apsis.readthedocs.org !

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

BRANIN HOO OPTIMIZATION

Best Value by OptimizerValues by Optimizer

I random search finds better end result but bay opt is morestable

I similar performance as in other bay opt literature

I no other group publishes comparison to random search

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

OPTIMIZING ARTIFICIAL NOISE FUNCTION

One dimensional noise function with several smoothingvariances

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

OPTIMIZING ARTIFICIAL NOISE FUNCTION

Minimization result on 3d noise by smoothing factor.

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

BREZE MNIST NEURAL NETWORK

Neural Network on MNIST using uniform parameters

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

BREZE MNIST NEURAL NETWORK (2)

Neural Network on MNIST using asymptotic parameters forlearning rate and learning rate decay

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

TAKING THE PROJECT TO THE NEXT LEVEL -PROGRAM

I implement full multicore support

I implement a REST web-service to offer interoperabilitywith any language

I improve integration of matplotlib

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

TAKING THE PROJECT TO THE NEXT LEVEL -BAYESIAN OPTIMIZATION

I deal with nominal parameters

I try replacing GPs with Student-t processes

I try to take tree structured configuration space into account

I account for evaluation cost depending on hyperparametersetting

I implement freeze-thaw optimization idea [3]

I automated learning of input warping [2]

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

Thank You!

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

REFERENCES I

Eric Brochu, Vlad M. Cora, and Nando de Freitas.A Tutorial on Bayesian Optimization of Expensive CostFunctions, with Application to Active User Modeling andHierarchical Reinforcement Learning.IEEE Transactions on Reliability, abs/1012.2, 2010.

Jasper Snoek, Kevin Swersky, Richard S Zemel, and Ryan PAdams.Input warping for bayesian optimization of non-stationaryfunctions.arXiv preprint arXiv:1402.0929, 2014.

Kevin Swersky, Jasper Snoek, and Ryan Prescott Adams.Freeze-thaw bayesian optimization.arXiv preprint arXiv:1406.3896, 2014.

Problem Description Bayesian Optimization apsis and its Architecture Project Organisation Performance Evaluation

REFERENCES II