

Monte Carlo Tree Search for Algorithm Configuration: MOSAIC

Herilalaina Rakotoarison and Michèle Sebag
TAU (TAckling the Underspecified), CNRS - INRIA - LRI - Université Paris-Sud

NeurIPS Meta-Learning Workshop - Dec. 8, 2018

Slides: metalearning.ml/2018/slides/meta_learning_2018_Sebag.pdf


AutoML: Algorithm Selection and Configuration

A mixed optimization problem

Find λ* ∈ arg min_{λ ∈ Λ} L(λ, P)

with λ a pipeline and L the predictive loss of λ on dataset P.
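As a concrete illustration (not part of the original slides), a minimal sketch of evaluating L(λ, P) for a single candidate scikit-learn pipeline λ; the dataset and the pipeline components below are assumptions chosen only for the example.

# Illustrative sketch: estimate L(lambda, P) for one fixed candidate pipeline lambda.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)            # stands in for the dataset P

candidate = Pipeline([                                 # one candidate pipeline lambda
    ("preprocessing", PCA(n_components=10)),
    ("classifier", RandomForestClassifier(n_estimators=100, max_depth=5)),
])

# L(lambda, P): here, 1 - cross-validated accuracy, i.e. an estimate of the predictive loss
loss = 1.0 - cross_val_score(candidate, X, y, cv=5).mean()
print(f"estimated loss of this configuration: {loss:.3f}")

AutoML then searches the space Λ of such pipelines and their hyper-parameters for the configuration minimizing this loss.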

Modes
- Offline hyper-parameter setting
- Online hyper-parameter setting

Approaches
- Bayesian optimization: SMAC, Auto-Sklearn, Auto-WEKA, BOHB (Hutter et al., 11; Feurer et al., 15; Kotthoff et al., 17; Falkner et al., 18)
- Evolutionary computation (Olson et al., 16; Choromanski et al., 18)
- Bilevel optimization (Franceschi et al., 17, 18)
- Reinforcement learning (Andrychowicz et al., 16; Drori et al., 18)


Monte Carlo Tree Search

Kocsis & Szepesvari 06; Gelly & Silver 07

For game playing when there is no good evaluation function and the search space is huge.

- Upper Confidence Tree (UCT): gradually grow the search tree
- Building blocks (sketched in code below):
  - Select next action (bandit-based phase)   Auer et al. 02
  - Add a node (leaf of the search tree)
  - Select further actions at random (random phase)
  - Compute instant reward
  - Update information in all visited nodes
- Returned solution: the path visited most often

(Figure: explored tree vs. search tree, annotated with the bandit-based phase, the new node, and the random phase.)

Within learning:
- Feature selection (Gaudel & Sebag, 10)
- Active learning (Rolet, Teytaud & Sebag, 09)

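To fix ideas, here is a schematic and deliberately generic sketch of one UCT iteration following the building blocks above; the Node class, the legal_actions and rollout_reward callables, and the toy usage are illustrative assumptions, not MOSAIC's implementation.

import math
import random

class Node:
    """One node of the search tree; it stores the action taken from its parent."""
    def __init__(self, action=None, parent=None):
        self.action, self.parent = action, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct_select(node, c=1.4):
    """Bandit-based phase: pick the child maximising the UCB score (Auer et al. 02)."""
    return max(node.children,
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def one_iteration(root, legal_actions, rollout_reward):
    # 1. Selection: descend while the current node is fully expanded.
    node = root
    while node.children and len(node.children) == len(legal_actions(node)):
        node = uct_select(node)
    # 2. Expansion: add a new leaf for one untried action.
    tried = [ch.action for ch in node.children]
    untried = [a for a in legal_actions(node) if a not in tried]
    if untried:
        node.children.append(Node(action=random.choice(untried), parent=node))
        node = node.children[-1]
    # 3. Random phase: complete the sequence of decisions and compute an instant reward.
    reward = rollout_reward(node)
    # 4. Back-propagation: update statistics in all visited nodes.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

# Toy usage: two actions everywhere, reward is simply the last action taken.
root = Node()
for _ in range(50):
    one_iteration(root, legal_actions=lambda n: [0, 1],
                  rollout_reward=lambda n: float(n.action))
best = max(root.children, key=lambda ch: ch.visits)    # path visited most often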


Monte Carlo Tree Search for AutoML

1. Select next action (alg/hyperparameter)

select arg max_i { µ_i + c √(log N / n_i) }

with µ_i the average reward of action i, n_i its number of visits, and N = Σ_i n_i

2. Add a node: new algorithm or hyper-parameter;

3. Random phase: complete the pipeline with default/random choices;

4. Compute reward v: predictive accuracy of the pipeline;

5. Use v to update µ_i and increment n_i in all visited nodes.

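A minimal sketch of steps 3 and 4 above in a scikit-learn setting, i.e. completing a partially specified pipeline at random and scoring it; the algorithms, hyper-parameter ranges, and dataset are assumptions made for illustration, not MOSAIC's actual code.

import random
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)        # assumed dataset, for the example only

def random_completion(algorithm_name):
    """Step 3: fill in the remaining hyper-parameters with default/random values."""
    if algorithm_name == "SVC":
        return SVC(C=10 ** random.uniform(-2, 2), gamma="scale")
    return RandomForestClassifier(n_estimators=random.choice([50, 100, 200]),
                                  max_depth=random.choice([None, 5, 10]))

def instant_reward(algorithm_name):
    """Step 4: reward v = predictive accuracy of the completed pipeline."""
    model = random_completion(algorithm_name)
    return cross_val_score(model, X, y, cv=3).mean()

v = instant_reward("SVC")    # v is then back-propagated to update mu_i, n_i (step 5)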

Mosaic: MCTS for AutoML

Overview

- Search space: { preprocessing algs } × { algorithms }
- Fixed sequence of choices:
  1. Preprocessing algorithm (PCA, random projection, ...)
  2. Hyper-parameters of the preprocessing algorithm (e.g. dimension)
  3. Algorithm (SVM, RF, ...)
  4. Hyper-parameters of the algorithm (C, ε, ...)

Key features

- CPU management: every ∆t, kill unpromising pipelines and increase the evaluation resources of the others
- Sampling continuous hyper-parameters with progressive widening (Gelly & Silver 07): increase the number of sampled values like ⌊√N⌋ (see the sketch below)

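A small sketch of the progressive-widening rule just mentioned, where the number of values sampled for a continuous hyper-parameter grows like ⌊√N⌋ with the number of visits N; the log-uniform sampler stands in for whatever sampler is actually used and is an assumption.

import math
import random

def allowed_values(n_visits):
    """Progressive widening: at most floor(sqrt(N)) sampled values after N visits."""
    return max(1, math.floor(math.sqrt(n_visits)))

def maybe_sample_new_value(sampled, n_visits, low=1e-3, high=1e3):
    """Add a new candidate value for a continuous hyper-parameter (e.g. C of an SVM)
    only when the widening rule allows it; otherwise keep reusing existing values."""
    if len(sampled) < allowed_values(n_visits):
        sampled.append(10 ** random.uniform(math.log10(low), math.log10(high)))
    return sampled

values = []
for n in range(1, 101):            # simulate 100 visits of the same node
    maybe_sample_new_value(values, n)
print(len(values))                 # 10 distinct candidate values after 100 visits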

A MOSAIC Tree


Search space

Preprocessing methods and their parameters:
- PCA: n components
- SelectKBest: k, score func
- Gaussian Random Projection: n components, eps
- No preprocessing: -

Algorithms and their parameters:
- Logistic regression: C, penalty, solver
- SGD classifier: learning rate, penalty, alpha, l1 ratio, loss
- KNN classifier: K, metric, weights
- XGBoost classifier: learning rate, max depth, gamma, subsample, regularization
- LDA: n components, learning decay
- Random forest: criterion, max features, max depth, bootstrap, min sample split

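For illustration only, the table above written as the kind of declarative Python structure an implementation might use; the exact identifiers (scikit-learn / XGBoost parameter names) are assumptions, not taken from the MOSAIC code.

# Hypothetical declaration of the search space of the table above.
SEARCH_SPACE = {
    "preprocessing": {
        "PCA": ["n_components"],
        "SelectKBest": ["k", "score_func"],
        "GaussianRandomProjection": ["n_components", "eps"],
        "NoPreprocessing": [],
    },
    "algorithm": {
        "LogisticRegression": ["C", "penalty", "solver"],
        "SGDClassifier": ["learning_rate", "penalty", "alpha", "l1_ratio", "loss"],
        "KNeighborsClassifier": ["n_neighbors", "metric", "weights"],
        "XGBClassifier": ["learning_rate", "max_depth", "gamma", "subsample", "reg_lambda"],
        "LDA": ["n_components", "learning_decay"],
        "RandomForestClassifier": ["criterion", "max_features", "max_depth",
                                   "bootstrap", "min_samples_split"],
    },
}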

Experiments and Results

AutoML Challenge PAKDD 2018

- Binary classification (final phase)
- 10 or 20 minutes time budget for each dataset
- Metric: balanced accuracy

                Set 1     Set 2     Set 3     Set 4     Set 5
aad freiburg    0.5533    0.2839    0.3932    0.2635    0.6766
mosaic          0.5382    0.3161    0.3376    0.3182    0.6317
narnars0        0.5418    0.2894    0.3665    0.2005    0.6922
W|wang|         0.5655    0.4851    0.2829   -0.0886    0.6840

Final test phase: mosaic ranked second w.r.t. average rank.


Experiment 2: extended pre-processing search space

Pre-processing methods and their parameters:
- PCA: n components, whiten, svd solver, tol, iterated power
- KernelPCA: n components, kernel, gamma, degree, coef0, alpha, eigen solver, tol, max iter
- FastICA: n components, algorithm, max iter, tol, whiten, fun
- Identity: -
- IncrementalPCA: n components, whiten, batch size
- SelectKBest: score func, k
- SelectPercentile: score func, percentile
- LinearSVC pre-processing: C, class weight, max iter
- ExtraTreesClassifier pre-processing: n estimators, criterion, max depth, min samples split, min samples leaf, min weight fraction leaf, max features, max leaf nodes, class weight
- FeatureAgglomeration: n clusters, affinity, linkage
- PolynomialFeatures: degree
- RBFSampler: gamma, n components
- RandomTreesEmbedding: n components, max depth, min samples split, min samples leaf, min weight fraction leaf, max leaf nodes, min impurity decrease


Experiment 2: extended algorithm search space

Algorithms and their parameters:
- LinearDiscriminantAnalysis: solver, shrinkage
- QuadraticDiscriminantAnalysis: reg param
- DummyClassifier: -
- AdaBoostClassifier: base estimator, n estimators, learning rate, algorithm
- ExtraTreesClassifier: n estimators, criterion, max depth, min samples split, min samples leaf, min weight fraction leaf, max features, max leaf nodes, class weight
- RandomForestClassifier: n estimators, criterion, min samples split, min samples leaf, min weight fraction leaf, max features, max leaf nodes, class weight, bootstrap
- GradientBoostingClassifier: loss, learning rate, n estimators, max depth, criterion, min samples split, min samples leaf, min weight fraction leaf, subsample, max features, max leaf nodes
- SGD classifier: learning rate, penalty, alpha, l1 ratio, loss, epsilon, eta0, power t, class weight, max iter


Experiment 2: extended algorithm search space (cont'd)

Algorithms and their parameters:
- Perceptron: penalty, alpha, max iter, tol, shuffle, eta0
- RidgeClassifier: alpha, max iter, class weight, solver
- PassiveAggressiveClassifier: C, max iter, tol, loss, class weight
- KNeighborsClassifier: n neighbors, weights, algorithm, leaf size, p, metric
- MLPClassifier: hidden layer sizes, activation, solver, alpha, batch size, learning rate, learning rate init, power t, max iter, shuffle, warm start, momentum, nesterovs momentum, early stopping, validation fraction, beta 1, beta 2, epsilon
- SVC: C, max iter, tol, loss, class weight, kernel, degree, gamma, coef0
- DecisionTreeClassifier: criterion, splitter, max depth, min samples split, min samples leaf, min weight fraction leaf, max features, max leaf nodes, min impurity decrease, class weight
- ExtraTreeClassifier: criterion, splitter, max depth, min samples split, min samples leaf, min weight fraction leaf, max features, max leaf nodes, min impurity decrease, class weight


Experiment 2

On 133 datasets from the OpenML repository (Vanschoren et al., 14).

1 hour per (dataset, run); 10 runs per dataset.

Average rank (lower is better) of MOSAIC and vanilla Auto-Sklearn across 102 datasets, i.e. the datasets on which the performance of the two methods differs statistically according to a Mann-Whitney rank test with p = 0.05.

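A minimal sketch of the per-dataset significance test described above, assuming two arrays holding the 10 per-run scores of each method on one dataset; the numbers are made up for the example.

# Hypothetical per-dataset comparison: do the two score samples differ significantly?
from scipy.stats import mannwhitneyu

mosaic_scores  = [0.81, 0.83, 0.82, 0.84, 0.80, 0.82, 0.83, 0.81, 0.84, 0.82]   # 10 runs
sklearn_scores = [0.79, 0.80, 0.78, 0.81, 0.79, 0.80, 0.78, 0.80, 0.79, 0.81]   # 10 runs

stat, p_value = mannwhitneyu(mosaic_scores, sklearn_scores, alternative="two-sided")
significant = p_value < 0.05      # keep this dataset in the comparison only if True
print(p_value, significant)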

Discussion

Monte Carlo Tree Search for Algorithm Configuration

- Proof of concept

Limitations

- Order of hyper-parameters
- Time allocation


Perspectives

Short term

- Refine initialization
- Extend to constraint satisfaction

Medium term

- Learn the value of parameters across datasets
- Improve the sampling of continuous hyper-parameter values

Long term

- Learning Meta-Features: a (the) key Meta-Learning task.
