electrochem bo slides - people @ eecs at uc berkeley

59
An Introduction to Bayesian Optimisation and (Potential) Applications in Materials Science Kirthevasan Kandasamy Machine Learning Dept, CMU Electrochemical Energy Symposium Pittsburgh, PA, November 2017

Upload: others

Post on 21-Apr-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Electrochem BO Slides - People @ EECS at UC Berkeley

An Introduction to Bayesian Optimisation and(Potential) Applications in Materials Science

Kirthevasan Kandasamy

Machine Learning Dept, CMU

Electrochemical Energy Symposium

Pittsburgh, PA, November 2017

Page 2: Electrochem BO Slides - People @ EECS at UC Berkeley

Designing Electrolytes in Batteries

1/19

Page 3: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation in Computational Astrophysics

Cosmological Simulator

Observation

E.g:Hubble ConstantBaryonic Density

Likelihood Score

Likelihood computation

1/19

Page 4: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation

Expensive Blackbox Function

Other Examples:- Pre-clinical Drug Discovery- Optimal policy in Autonomous Driving- Synthetic gene design

1/19

Page 5: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only vianoisy evaluations.

Let x? = argmaxx f (x).

x

f(x)

2/19

Page 6: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only vianoisy evaluations.

Let x? = argmaxx f (x).

x

f(x)

2/19

Page 7: Electrochem BO Slides - People @ EECS at UC Berkeley

Black-box Optimisation

f : X → R is an expensive, black-box function, accessible only vianoisy evaluations.Let x? = argmaxx f (x).

x

f(x)

x∗

f(x∗)

2/19

Page 8: Electrochem BO Slides - People @ EECS at UC Berkeley

Outline

I Part I: Bayesian Optimisation

I Bayesian Models for f

I Two algorithms: upper confidence bounds & Thompsonsampling

I Part II: Some Modern Challenges

I Multi-fidelity Optimisation

I Parallelisation

3/19

Page 9: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Functions with no observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 10: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Functions with no observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 11: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Prior GP

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 12: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 13: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Posterior GP given observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 14: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Models for f e.g. Gaussian Processes (GP)

GP: A distribution over functions from X to R.

Posterior GP given observations

x

f(x)

After t observations, f (x) ∼ N (µt(x), σ2t (x) ).

4/19

Page 15: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x)

1) Construct posterior GP. 2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x). 4) Evaluate f at xt .

5/19

Page 16: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x)

1) Construct posterior GP.

2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x). 4) Evaluate f at xt .

5/19

Page 17: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x) ϕt = µt−1 + β1/2t σt−1

1) Construct posterior GP. 2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x). 4) Evaluate f at xt .

5/19

Page 18: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x) ϕt = µt−1 + β1/2t σt−1

xt

1) Construct posterior GP. 2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x).

4) Evaluate f at xt .

5/19

Page 19: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Upper Confidence Bounds

Model f ∼ GP.

Gaussian Process Upper Confidence Bound (GP-UCB)(Srinivas et al. 2010)

x

f(x) ϕt = µt−1 + β1/2t σt−1

xt

1) Construct posterior GP. 2) ϕt = µt−1 + β1/2t σt−1 is a UCB.

3) Choose xt = argmaxx ϕt(x). 4) Evaluate f at xt .5/19

Page 20: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

x

f(x)

6/19

Page 21: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 1x

f(x)

6/19

Page 22: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 2x

f(x)

6/19

Page 23: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 3x

f(x)

6/19

Page 24: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 4x

f(x)

6/19

Page 25: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 5x

f(x)

6/19

Page 26: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 6x

f(x)

6/19

Page 27: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 7x

f(x)

6/19

Page 28: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 11x

f(x)

6/19

Page 29: Electrochem BO Slides - People @ EECS at UC Berkeley

GP-UCB (Srinivas et al. 2010)

t = 25x

f(x)

6/19

Page 30: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

1) Construct posterior GP. 2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x). 4) Evaluate f at xt .

7/19

Page 31: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

1) Construct posterior GP.

2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x). 4) Evaluate f at xt .

7/19

Page 32: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

1) Construct posterior GP. 2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x). 4) Evaluate f at xt .

7/19

Page 33: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

xt

1) Construct posterior GP. 2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x).

4) Evaluate f at xt .

7/19

Page 34: Electrochem BO Slides - People @ EECS at UC Berkeley

Bayesian Optimisation with Thompson Sampling

Model f ∼ GP(0, κ).

Thompson Sampling (TS) (Thompson, 1933).

x

f(x)

xt

1) Construct posterior GP. 2) Draw sample g from posterior.

3) Choose xt = argmaxx g(x). 4) Evaluate f at xt .

7/19

Page 35: Electrochem BO Slides - People @ EECS at UC Berkeley

More on Bayesian Optimisation

Theoretical results: Both UCB and TS will eventually find theoptimum under certain smoothness assumptions of f .

Other criteria for selecting xt :

I Expected improvement (Jones et al. 1998)

I Probability of improvement (Kushner et al. 1964)

I Predictive entropy search (Hernandez-Lobato et al. 2014)

I Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f :

I Neural networks (Snoek et al. 2015)

I Random Forests (Hutter 2009)

8/19

Page 36: Electrochem BO Slides - People @ EECS at UC Berkeley

More on Bayesian Optimisation

Theoretical results: Both UCB and TS will eventually find theoptimum under certain smoothness assumptions of f .

Other criteria for selecting xt :

I Expected improvement (Jones et al. 1998)

I Probability of improvement (Kushner et al. 1964)

I Predictive entropy search (Hernandez-Lobato et al. 2014)

I Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f :

I Neural networks (Snoek et al. 2015)

I Random Forests (Hutter 2009)

8/19

Page 37: Electrochem BO Slides - People @ EECS at UC Berkeley

More on Bayesian Optimisation

Theoretical results: Both UCB and TS will eventually find theoptimum under certain smoothness assumptions of f .

Other criteria for selecting xt :

I Expected improvement (Jones et al. 1998)

I Probability of improvement (Kushner et al. 1964)

I Predictive entropy search (Hernandez-Lobato et al. 2014)

I Information directed sampling (Russo & Van Roy 2014)

Other Bayesian models for f :

I Neural networks (Snoek et al. 2015)

I Random Forests (Hutter 2009)

8/19

Page 38: Electrochem BO Slides - People @ EECS at UC Berkeley

Some Modern Challenges/Opportunities

1. Multi-fidelity Optimisation (Kandasamy et al. NIPS 2016 a&b,

Kandasamy et al. ICML 2017)

2. Parallelisation (Kandasamy et al. Arxiv 2017)

9/19

Page 39: Electrochem BO Slides - People @ EECS at UC Berkeley

1. Multi-fidelity Optimisation(Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

Desired function f is very expensive, but . . .we have access to cheap approximations.

x⋆f

f1, f2, f3 ≈ f whichare cheaper to evaluate.

E.g. f : a real world battery experimentf2: lab experimentf1: computer simulation

10/19

Page 40: Electrochem BO Slides - People @ EECS at UC Berkeley

1. Multi-fidelity Optimisation(Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

Desired function f is very expensive, but . . .we have access to cheap approximations.

x⋆

f1

f2

f3

f

f1, f2, f3 ≈ f whichare cheaper to evaluate.

E.g. f : a real world battery experimentf2: lab experimentf1: computer simulation

10/19

Page 41: Electrochem BO Slides - People @ EECS at UC Berkeley

1. Multi-fidelity Optimisation(Kandasamy et al. NIPS 2016 a&b, Kandasamy et al. ICML 2017)

Desired function f is very expensive, but . . .we have access to cheap approximations.

x⋆

f1

f2

f3

f

f1, f2, f3 ≈ f whichare cheaper to evaluate.

E.g. f : a real world battery experimentf2: lab experimentf1: computer simulation

10/19

Page 42: Electrochem BO Slides - People @ EECS at UC Berkeley

MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound

With 2 fidelities (1 Approximation),

x⋆xt

t = 14

f (1)

f (2)

Theorem: MF-GP-UCB finds the optimum x? with less resourcesthan GP-UCB on f (2).

Can be extended to multiple approximations and continuousapproximations.

11/19

Page 43: Electrochem BO Slides - People @ EECS at UC Berkeley

MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound

With 2 fidelities (1 Approximation),

x⋆xt

t = 14

f (1)

f (2)

Theorem: MF-GP-UCB finds the optimum x? with less resourcesthan GP-UCB on f (2).

Can be extended to multiple approximations and continuousapproximations.

11/19

Page 44: Electrochem BO Slides - People @ EECS at UC Berkeley

MF-GP-UCB (Kandasamy et al. NIPS 2016b)

Multi-fidelity Gaussian Process Upper Confidence Bound

With 2 fidelities (1 Approximation),

x⋆xt

t = 14

f (1)

f (2)

Theorem: MF-GP-UCB finds the optimum x? with less resourcesthan GP-UCB on f (2).

Can be extended to multiple approximations and continuousapproximations.

11/19

Page 45: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Cosmological Maximum Likelihood Inference

I Type Ia Supernovae Data

I Maximum likelihood inference for 3 cosmological parameters:

I Hubble Constant H0

I Dark Energy Fraction ΩΛ

I Dark Matter Fraction ΩM

I Likelihood: Robertson Walker metric (Robertson 1936)

Requires numerical integration for each point in the dataset.

12/19

Page 46: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Cosmological Maximum Likelihood Inference

3 cosmological parameters. (d = 3)Fidelities: integration on grids of size (102, 104, 106). (M = 3)

500 1000 1500 2000 2500 3000 3500-10

-5

0

5

10

13/19

Page 47: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Hartmann-3D

2 Approximations (3 fidelities).We want to optimise the m = 3rd fidelity, which is the mostexpensive. m = 1st fidelity is cheapest.

0 0.5 1 1.5 2 2.5 3 3.50

5

10

15

20

25

30

35

40

Num.of

Queries

Query frequencies for Hartmann-3D

f (3)(x)

m=1m=2m=3

14/19

Page 48: Electrochem BO Slides - People @ EECS at UC Berkeley

2. Parallelising function evaluationsParallelisation with M workers: can evaluate f at M differentpoints at the same time.E.g.: Test M different battery solvents at the same time.

Sequential evaluations with one worker

Parallel evaluations with M workers (Asynchronous)

Parallel evaluations with M workers (Synchronous)

15/19

Page 49: Electrochem BO Slides - People @ EECS at UC Berkeley

2. Parallelising function evaluationsParallelisation with M workers: can evaluate f at M differentpoints at the same time.E.g.: Test M different battery solvents at the same time.

Sequential evaluations with one worker

Parallel evaluations with M workers (Asynchronous)

Parallel evaluations with M workers (Synchronous)

15/19

Page 50: Electrochem BO Slides - People @ EECS at UC Berkeley

2. Parallelising function evaluationsParallelisation with M workers: can evaluate f at M differentpoints at the same time.E.g.: Test M different battery solvents at the same time.

Sequential evaluations with one worker

Parallel evaluations with M workers (Asynchronous)

Parallel evaluations with M workers (Synchronous)

15/19

Page 51: Electrochem BO Slides - People @ EECS at UC Berkeley

2. Parallelising function evaluationsParallelisation with M workers: can evaluate f at M differentpoints at the same time.E.g.: Test M different battery solvents at the same time.

Sequential evaluations with one worker

Parallel evaluations with M workers (Asynchronous)

Parallel evaluations with M workers (Synchronous)

15/19

Page 52: Electrochem BO Slides - People @ EECS at UC Berkeley

Parallel Thompson Sampling (Kandasamy et al. Arxiv 2017)

Asynchronous: asyTS

At any given time,1. (x ′, y ′)← Wait for

a worker to finish.2. Compute posterior GP.3. Draw a sample g ∼ GP.

4. Re-deploy worker atargmax g .

Synchronous: synTS

At any given time,1. (x ′m, y ′m)Mm=1 ← Wait for

all workers to finish.2. Compute posterior GP.3. Draw M samples

gm ∼ GP, ∀m.4. Re-deploy worker m at

argmax gm, ∀m.

16/19

Page 53: Electrochem BO Slides - People @ EECS at UC Berkeley

Parallel Thompson Sampling (Kandasamy et al. Arxiv 2017)

Asynchronous: asyTS

At any given time,1. (x ′, y ′)← Wait for

a worker to finish.2. Compute posterior GP.3. Draw a sample g ∼ GP.

4. Re-deploy worker atargmax g .

Synchronous: synTS

At any given time,1. (x ′m, y ′m)Mm=1 ← Wait for

all workers to finish.2. Compute posterior GP.3. Draw M samples

gm ∼ GP, ∀m.4. Re-deploy worker m at

argmax gm, ∀m.

16/19

Page 54: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Branin-2D M = 4Evaluation time sampled from a uniform distribution

0 10 20 30 40

10 -2

10 -1

17/19

Page 55: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Branin-2D M = 4Evaluation time sampled from a uniform distribution

0 10 20 30 40

10 -2

10 -1

17/19

Page 56: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Branin-2D M = 4Evaluation time sampled from a uniform distribution

synRANDsynHUCBsynUCBPEsynTSasyRANDasyUCBasyHUCBasyEIasyHTSasyTS

0 10 20 30 40

10 -2

10 -1

17/19

Page 57: Electrochem BO Slides - People @ EECS at UC Berkeley

Experiment: Hartmann-18D M = 25Evaluation time sampled from an exponential distribution

synRANDsynHUCBsynUCBPEsynTSasyRANDasyUCBasyHUCBasyEIasyHTSasyTS

0 5 10 15 20 25 30

2.5

3

3.5

4

4.5

55.5

66.5

18/19

Page 58: Electrochem BO Slides - People @ EECS at UC Berkeley

Summary

I Black-box Optimisation methods are used in several scientificand engineering applications.

I Bayesian Optimisation: A method for black-box optimisationwhich uses Bayesian uncertainty estimates for f .

I Some modern challenges

I Multi-fidelity optimisation

I Parallel evaluations

I and several more . . .

Thank you.Slides are up on my website: www.cs.cmu.edu/∼kkandasa

19/19

Page 59: Electrochem BO Slides - People @ EECS at UC Berkeley

Summary

I Black-box Optimisation methods are used in several scientificand engineering applications.

I Bayesian Optimisation: A method for black-box optimisationwhich uses Bayesian uncertainty estimates for f .

I Some modern challenges

I Multi-fidelity optimisation

I Parallel evaluations

I and several more . . .

Thank you.Slides are up on my website: www.cs.cmu.edu/∼kkandasa

19/19