
Page 1: Optimal Learning: an Overview

Optimal Learning: an Overview

Peter I. Frazier

Operations Research & Information Engineering, Cornell University

Thursday, June 12, 2014
Guest Lecture, Operations Research 3 – Decision-making

Tsinghua University

Research supported by AFOSR and NSF


Page 2: Optimal Learning: an Overview

What is optimal learning?

In many applications, we make decisions about which data to collect.

In making these decisions we trade the benefit of information (the ability to make better decisions in the future) against its cost (money, time, or opportunity cost).

Statistical learning is making predictions or decisions based on data.

Optimal learning is making decisions about which data to collect in an optimal way.


Page 3: Optimal Learning: an Overview

Optimal learning overlaps with other fields

Optimal learning overlaps with these fields:

Bayesian statistics, and machine learning.

Decision-making under uncertainty, and dynamic programming.


Page 4: Optimal Learning: an Overview

Outline

1 Example Optimal Learning Problems

2 Bayesian Selection of the Best
   Problem summary
   Bayesian inference
   The Knowledge-Gradient (KG) method
   Optimality analysis using dynamic programming

3 Conclusion


Page 5: Optimal Learning: an Overview

Dynamic Pricing

Our goal is to price airline tickets to maximize revenue.

We learn about demand for a flight as we sell tickets.

The information collected depends on how we price each ticket: we only observe whether the price that the customer was willing to pay was above or below the offered price.

Collecting more information now may provide the ability to improve revenues later.

[Screenshot: airline fare search results (Continental OnePass flights) for a round trip from New York/Newark, NJ (EWR - Liberty) to Los Angeles, CA (LAX) on Sun., Jan. 11, 2009: nonstop flights from $937, flights with stops from $1,182, each fare showing a limited number of tickets at that price.]


Page 6: Optimal Learning: an Overview

AIDS Treatment and Prevention

We would like to treat and prevent AIDS in Africa.

We are uncertain about the effectiveness of experimental treatments and untried prevention methods, but we can learn about them by using them in practice, or by conducting scientific studies.

To which treatment and prevention methods should we allocate our investigative resources?

How should we balance using those methods that appear to be most effective, with those untried methods that may be very good?


Page 7: Optimal Learning: an Overview

Exploration vs. Exploitation in News Feeds

We would like to design an automatic document screening system that forwards documents (e.g., webpages) of interest to a human.

The screening system earns a reward if the forwarded document is of interest, and pays a penalty if not.

Even if the expected immediate reward of forwarding a particular document is negative, the system may still want to do so, because human feedback may allow the system to improve future performance.


Page 8: Optimal Learning: an Overview

Adaptive Web Design (multi-armed bandits)


Page 9: Optimal Learning: an Overview

Product development (optimization of expensive functions)

We have a product whose features we are selecting based on a sequence of focus groups.

We have the time and budget for a fixed number of focus groups, through which we want to learn more about underlying consumer preferences for these features.

After conducting these focus groups, we will choose a particular set of features with which to bring our product to market, and receive a reward based on the resulting sales revenue and manufacturing and development costs.


Page 10: Optimal Learning: an Overview

Other examples

Materials informatics / Designing novel materials

Simulation optimization

Optimization of long-running computer codes

Clinical trials (sequential hypothesis testing)

Inventory control with censored demand

Quality control (changepoint detection)


Page 11: Optimal Learning: an Overview

Outline

1 Example Optimal Learning Problems

2 Bayesian Selection of the Best
   Problem summary
   Bayesian inference
   The Knowledge-Gradient (KG) method
   Optimality analysis using dynamic programming

3 Conclusion


Page 12: Optimal Learning: an Overview

Outline

1 Example Optimal Learning Problems

2 Bayesian Selection of the Best
   Problem summary
   Bayesian inference
   The Knowledge-Gradient (KG) method
   Optimality analysis using dynamic programming

3 Conclusion


Page 13: Optimal Learning: an Overview

We consider an optimal learning problem called Bayesian Ranking & Selection

We consider an optimal learning problem called “Bayesian Ranking & Selection (R&S)” or “Bayesian Selection of the Best”.

In this problem, we wish to know which of a finite number of options is the best.

To figure out the quality of an option, we can sample it (try it out).

When we sample an option, we get a noisy observation of its quality.

We can take a limited number of samples.

We wish to allocate this sampling budget efficiently, so as to best support selecting the best.


Page 14: Optimal Learning: an Overview

Example: Drug Discovery

A pharmaceutical company has a library of millions of compounds that it would like to screen for potential cancer drugs. Robots will do the initial assay by performing a fixed test one or several times on some subset of the compounds.

Sources: http://www.paa.co.uk/img/labauto/inst highres/ssi/mini dispenser.jpg,

http://www.kalyx.com/store/images/Images SW/SW 201442-51.jpg


Page 15: Optimal Learning: an Overview

Example: Queuing Control

We would like to choose a nurse/doctor staffing policy in a hospital to minimize expected patient waiting time.

To figure out the patient waiting time under a particular staffing policy, we can simulate it using a discrete event simulation.

Each simulation takes about 1 minute.

We want to choose the best among 100 possible staffing policies, using at most 24 hours of simulation effort.

[Screenshot: excerpt from Shi, Chen, and Yucesan (1999) on optimal computing budget allocation (OCBA) for a stochastic resource allocation problem (buffer allocation in supply chain management), including Figure 1: a 10-node network in the resource allocation problem.]

Source: Shi, Chen, and Yucesan 1999

Page 16: Optimal Learning: an Overview

Mathematical Model

We consider k alternative options.

The underlying value of alternative x is θ_x ∈ ℝ. We do not observe this, and must try to learn it through sampling. Let θ = (θ_1, …, θ_k).

At each time n = 1, …, N, we choose an alternative to sample, x_n ∈ {1, …, k}. We observe a sample,

y_n | x_n, θ_{1:k} ∼ Normal(θ_{x_n}, λ²).

To keep things simple, we assume that λ² is known and is the same for all options. It is also possible to allow λ² to be unknown, and to vary with x.

At time N, we select an option x̂ ∈ {1, …, k}, which we hope is the best option.

We receive a reward of θ_{x̂}, which is the true value of the selected option x̂.
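A minimal simulation sketch of this sampling model (hypothetical code, not from the talk; k, λ, and the prior used to draw θ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

k = 5       # number of alternatives
lam = 1.0   # known sampling standard deviation (lambda)

# Nature draws the true values theta_1, ..., theta_k once, at time n = 0.
theta = rng.normal(size=k)

def sample(x):
    """Sample alternative x: a noisy observation y ~ Normal(theta_x, lam^2)."""
    return theta[x] + lam * rng.normal()

for x in range(k):
    print(f"x = {x}: y = {sample(x):+.3f}, true theta_x = {theta[x]:+.3f}")
```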


Pages 17–24: Optimal Learning: an Overview

Example, Times 0 through 10

[Animation frames: error-bar plots over alternatives x = 1, …, 5 (vertical axis from −2 to 2) showing the estimates at times 0, 1, 2, 3, 4, 5, and 10 as measurements accumulate; the final frame also shows the true values.]

Page 25: Optimal Learning: an Overview

Outline

1 Example Optimal Learning Problems

2 Bayesian Selection of the Best
   Problem summary
   Bayesian inference
   The Knowledge-Gradient (KG) method
   Optimality analysis using dynamic programming

3 Conclusion


Page 26: Optimal Learning: an Overview

We put a Bayesian prior probability distribution on θ

The underlying value of alternative x is θ_x.

We do not know θ_x, but based on intuition, experience, and data from other similar problems, we may be able to make statements like “The expected waiting time for this nurse staffing policy is probably between 15 minutes and 4 hours.”

We formalize this by supposing that θ_x was drawn by nature at random from a Bayesian prior probability distribution.

Once θ_x is drawn by nature (at time n = 0), it stays fixed (over n = 1, 2, …).

We use a normal prior probability distribution, because it is flexible, and allows easy computation:

θ_x ∼ Normal(μ_{0,x}, σ²_{0,x}).


Page 27: Optimal Learning: an Overview

We can use Bayesian statistics to estimate θ_x, based on noisy samples.

Suppose our first sample is from option x, so x_1 = x.

We observe

y_1 | x_1 = x, θ_{1:k} ∼ Normal(θ_x, λ²).

We can use Bayes' rule to calculate the conditional distribution of θ_x given this sample.

The conditional distribution given the data is called the posterior distribution.


Page 28: Optimal Learning: an Overview

We can use Bayesian statistics to estimate θ_x, based on noisy samples.

Bayes' rule shows us that the posterior distribution on θ_x is

θ_x | x_1, y_1 ∼ Normal(μ_{1,x}, σ²_{1,x}),

where

μ_{1,x} = (σ_{0,x}^{-2} μ_{0,x} + λ^{-2} y_1) / (σ_{0,x}^{-2} + λ^{-2}),

σ²_{1,x} = (σ_{0,x}^{-2} + λ^{-2})^{-1}.

The posterior distribution on θ_{x′}, where x′ ≠ x, does not change.


Page 29: Optimal Learning: an Overview

There is a nice expression for the posterior distribution

In general,

θ_x | x_1, …, x_n, y_1, …, y_n ∼ Normal(μ_{n,x}, σ²_{n,x}),

where μ_{n,x}, σ_{n,x} can be computed recursively. For x = x_n, the posterior is updated via:

μ_{n+1,x} = (σ_{n,x}^{-2} μ_{n,x} + λ^{-2} y_{n+1}) / (σ_{n,x}^{-2} + λ^{-2})

σ²_{n+1,x} = (σ_{n,x}^{-2} + λ^{-2})^{-1}

and the posterior for x ≠ x_n does not change:

μ_{n+1,x} = μ_{n,x} for x ≠ x_n

σ_{n+1,x} = σ_{n,x} for x ≠ x_n
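A sketch of this recursive update in code (a hypothetical helper, not from the slides; it assumes the posterior parameters are stored as NumPy arrays of length k):

```python
import numpy as np

def bayes_update(mu, sigma2, x, y, lam2):
    """Return the time-(n+1) posterior parameters after sampling
    alternative x at time n and observing y ~ Normal(theta_x, lam2).
    Only entry x changes; all other alternatives are left untouched."""
    mu, sigma2 = mu.copy(), sigma2.copy()
    prec = 1.0 / sigma2[x] + 1.0 / lam2        # posterior precision of theta_x
    mu[x] = (mu[x] / sigma2[x] + y / lam2) / prec
    sigma2[x] = 1.0 / prec
    return mu, sigma2
```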


Pages 30–40: Optimal Learning: an Overview

Example of the posterior distribution

[Animation frames: a sequence of plots showing the posterior distribution being updated as successive samples arrive.]

Page 41: Optimal Learning: an Overview

We can use the posterior distribution to choose x̂

Recall that x̂ is our selection of the best, and it is chosen at time N based on all previous samples x_1, …, x_N, y_1, …, y_N.

Based on these samples, the posterior is

θ_x | x_1, …, x_N, y_1, …, y_N ∼ Normal(μ_{N,x}, σ²_{N,x}).

Recall that the reward for choosing x̂ = x is θ_x.

The conditional expected reward for choosing x̂ = x is

E[θ_x | x_1, …, x_N, y_1, …, y_N] = μ_{N,x}.

Thus, the choice that gives the biggest conditional expected reward is argmax_x μ_{N,x}, and it has value max_x μ_{N,x}.
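In code, the final selection is just an argmax over the time-N posterior means (a tiny sketch with made-up numbers, continuing the hypothetical update helper above):

```python
import numpy as np

mu_N = np.array([0.2, -0.5, 1.1, 0.3, 0.0])  # hypothetical posterior means at time N
x_hat = int(np.argmax(mu_N))                 # selection of the best
print(x_hat, mu_N[x_hat])                    # argmax_x mu_{N,x} and max_x mu_{N,x}
```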


Pages 42–43: Optimal Learning: an Overview

Example of choosing x̂

[Plots illustrating the choice of x̂ from the time-N posterior.]

Page 44: Optimal Learning: an Overview

How should we choose the x_n?

Our ability to choose x̂ accurately depends on the choices we make for x_1, …, x_N.

Intuitively, a good way to choose these should spend the first part of the budget exploring the options to figure out which ones are among the best, and then focus the rest of the budget on these options.

But how precisely should we accomplish this?

One way to choose the x_n is through the knowledge-gradient (KG) method for independent beliefs.

Later in the seminar, I will talk about the knowledge-gradient method for correlated beliefs.


Page 45: Optimal Learning: an Overview

Outline

1 Example Optimal Learning Problems

2 Bayesian Selection of the Best
   Problem summary
   Bayesian inference
   The Knowledge-Gradient (KG) method
   Optimality analysis using dynamic programming

3 Conclusion


Page 46: Optimal Learning: an Overview

The knowledge-gradient factor quantifies a sample’s value

The knowledge-gradient method is created via the following thought experiment.

If we were to stop at time n, and select x̂ based on x_{1:n}, y_{1:n}, we would earn an expected reward of

μ*_n = max_x μ_{n,x}.

If we were to take one more sample, x_{n+1}, and observe y_{n+1}, and then select x̂, we would earn an expected reward of

μ*_{n+1} = max_x μ_{n+1,x}.


Page 47: Optimal Learning: an Overview

The knowledge-gradient factor quantifies a sample’s value

Before the new sample, our value was μ*_n. After, it was μ*_{n+1}.

The additional sample x_{n+1}, y_{n+1} has increased our solution’s value by

μ*_{n+1} − μ*_n.

At time n, we don’t know y_{n+1}, so we can’t compute this quantity.

We can, however, compute its expected value,

KG_n(x) = E_n[μ*_{n+1} − μ*_n | x_{n+1} = x].

We call this quantity the knowledge-gradient (KG) factor, because it measures the change in the value of our knowledge.
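This expectation can be estimated by Monte Carlo, simulating how μ*_{n+1} would look under the predictive distribution of the updated mean (derived on the next slide); an illustrative sketch, not how KG is computed in practice:

```python
import numpy as np

def kg_monte_carlo(mu, sigma2, x, lam2, n_mc=200_000, seed=0):
    """Monte Carlo estimate of KG_n(x) = E_n[mu*_{n+1} - mu*_n | x_{n+1} = x].
    Uses mu_{n+1,x} ~ Normal(mu_{n,x}, sigma_tilde^2), where
    sigma_tilde = sigma2[x] / sqrt(sigma2[x] + lam2) (see the next slide)."""
    rng = np.random.default_rng(seed)
    sigma_tilde = sigma2[x] / np.sqrt(sigma2[x] + lam2)
    mu_star_n = mu.max()                 # value if we stop now
    best_other = np.delete(mu, x).max()  # means of the unchanged alternatives
    mu_next_x = mu[x] + sigma_tilde * rng.normal(size=n_mc)
    mu_star_next = np.maximum(mu_next_x, best_other)
    return float((mu_star_next - mu_star_n).mean())
```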


Page 48: Optimal Learning: an Overview

Computing the KG factor requires us to think about how the next measurement will change our posterior.

At time n, suppose we decide to measure x_{n+1} = x.

Before we observe y_{n+1}, it is random.

We can calculate its conditional distribution given x_1, …, x_{n+1}, y_1, …, y_n:

y_{n+1} | x_1, …, x_{n+1}, y_1, …, y_n ∼ Normal(μ_{n,x}, σ²_{n,x} + λ²).

From this, and the formula for μ_{n+1,x} in terms of μ_{n,x}, σ²_{n,x}, and y_{n+1}, we can calculate that

μ_{n+1,x} | x_1, …, x_{n+1}, y_1, …, y_n ∼ Normal(μ_{n,x}, σ̃²_{n,x}),

where σ̃_{n,x} = σ²_{n,x} / √(σ²_{n,x} + λ²).

This distribution is called the “posterior predictive distribution”.


Page 49: Optimal Learning: an Overview

The KG factor has a convenient formula.

The VOI / KG factor for measuring alternative x at time n is

KG_n(x) = σ̃_{n,x} f(−Δ_{n,x} / σ̃_{n,x}),

where

Δ_{n,x} = |μ_{n,x} − max_{x′ ≠ x} μ_{n,x′}|,

f(c) = c Φ(c) + φ(c),

Φ is the normal cdf, and φ is the normal pdf.
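A direct transcription of this formula (a sketch; scipy's standard normal cdf and pdf stand in for Φ and φ, and the function name is hypothetical):

```python
import numpy as np
from scipy.stats import norm

def kg_factor(mu, sigma2, x, lam2):
    """Closed-form KG factor KG_n(x) for alternative x."""
    sigma_tilde = sigma2[x] / np.sqrt(sigma2[x] + lam2)
    delta = abs(mu[x] - np.delete(mu, x).max())          # Delta_{n,x}
    c = -delta / sigma_tilde
    return sigma_tilde * (c * norm.cdf(c) + norm.pdf(c))  # sigma_tilde * f(c)

# The KG policy then samples the alternative with the largest KG factor:
# x_next = max(range(len(mu)), key=lambda x: kg_factor(mu, sigma2, x, lam2))
```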


Pages 50–70: Optimal Learning: an Overview

Animation of the KG method

[Animation frames n = 0 through n = 20: plots (vertical axis from −2 to 2) with legend entries “prior”, “Y_x”, and “yhat”, showing the KG method sampling step by step.]

Page 71: Optimal Learning: an Overview

The KG method works well

[Six histograms: Value(KG)−Value(Boltzmann), Value(KG)−Value(EqualAllocation), Value(KG)−Value(Exploit), Value(KG)−Value(IE), Value(KG)−Value(OCBA), and Value(KG)−Value(LL(S)).]

Histogram of the sampled difference in value for competing policies, aggregated across the 100 randomly generated problems.


Page 72: Optimal Learning: an Overview

Outline

1 Example Optimal Learning Problems

2 Bayesian Selection of the Best
   Problem summary
   Bayesian inference
   The Knowledge-Gradient (KG) method
   Optimality analysis using dynamic programming

3 Conclusion


Page 73: Optimal Learning: an Overview

The knowledge-gradient method is good, but it is not optimal in general

The KG method works well against other algorithms proposed for this problem.

The KG method is optimal if we have only one sample remaining.

But in general, multiple samples remain.

What is the best algorithm in general?


Page 74: Optimal Learning: an Overview

The optimal algorithm is the solution to a dynamic program

The conditional expected value we receive, given what we know at time N, is max_x μ_{N,x}.

Define V_N = V_N(μ_N, σ²_N) = max_x μ_{N,x}.

At time N−1, the optimal choice of x_N is the one that maximizes the expected value of this reward,

argmax_{x_N} E_{N−1}[V_N | x_N],

and this maximal expected value is

V_{N−1} = V_{N−1}(μ_{N−1}, σ²_{N−1}) = max_{x_N} E_{N−1}[V_N | x_N].

Notation: E_n means the conditional expectation with respect to μ_n and σ²_n; μ_N = (μ_{N,x} : x = 1, …, k), and similarly for σ²_N.


Page 75: Optimal Learning: an Overview

In principle, we can repeat this to find the optimal rule for every x_n

We iterate backward over n = N, N−1, N−2, …, 1, where in each stage n:

We computed V_{n+1}(μ_{n+1}, σ²_{n+1}) in the previous stage.

The optimal choice for x_{n+1} is

x_{n+1} ∈ argmax_{x_{n+1}} E_n[V_{n+1}(μ_{n+1}, σ²_{n+1}) | x_{n+1}].

The value of this decision is

V_n(μ_n, σ²_n) = max_{x_{n+1}} E_n[V_{n+1}(μ_{n+1}, σ²_{n+1}) | x_{n+1}].

This is dynamic programming.

This is dynamic programming.


Page 76: Optimal Learning: an Overview

We can solve the DP exactly for small problems

Here is the value function for a Bayesian ranking and selection problem with Bernoulli (0/1) observations, and independent beta prior distributions.

[Plot: value (vertical axis, 0.45 to 0.75) versus # measurements (N) (0 through 8), with curves for k = 2, k = 3, and k = 4.]
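For intuition, here is a minimal backward-induction sketch for this Bernoulli/beta version (hypothetical code, not from the talk; it is exact but exponential, so only tiny k and N are feasible):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def optimal_value(state, n_remaining):
    """Value of the optimal sampling policy for Bernoulli observations with
    independent Beta(a, b) priors; state = ((a_1, b_1), ..., (a_k, b_k))."""
    if n_remaining == 0:
        # Terminal reward: posterior mean of the best-looking alternative.
        return max(a / (a + b) for a, b in state)
    values = []
    for x, (a, b) in enumerate(state):
        p = a / (a + b)  # predictive probability that sampling x succeeds
        up = state[:x] + ((a + 1, b),) + state[x + 1:]
        down = state[:x] + ((a, b + 1),) + state[x + 1:]
        values.append(p * optimal_value(up, n_remaining - 1)
                      + (1 - p) * optimal_value(down, n_remaining - 1))
    return max(values)

# e.g., k = 2 alternatives with uniform Beta(1, 1) priors and N = 4 samples:
print(optimal_value(((1, 1), (1, 1)), 4))
```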


Page 77: Optimal Learning: an Overview

For large problems, this does not work because of the curse of dimensionality

To use dynamic programming, we need to compute and store V_n(μ_n, σ²_n) for each possible value of μ_n and σ²_n. (We need to compute V_n for every n, but at any given time we only need V_n and V_{n+1} in memory.)

There are infinitely many possible values for μ_n. We can discretize, but it is a vector in k dimensions, and so discretizing into m pieces in each dimension allows for m^k possible values.

σ²_n only takes finitely many values, since (σ²_{n,x})^{-1}/λ^{-2} is the number of samples of alternative x, but there are still k^n/n! possible values.

For large values of k (say, k > 10), solving the dynamic program is computationally intractable.

For such large values of k, we recommend using the KG policy.


Page 78: Optimal Learning: an Overview

The KG method has nice optimality properties

The dynamic programming equations can be used to prove certain optimality properties of the KG policy:

The knowledge-gradient policy is optimal when N = 1.

The knowledge-gradient policy is asymptotically optimal as N → ∞.

For other N, the knowledge-gradient policy’s suboptimality is bounded by

V^{KG,n}(S_n) ≥ V^n(S_n) − ((N−n−1)/√(2π)) max_x σ̃_{n,x},

where V^{KG,n} gives the value of the knowledge-gradient policy and V^n the value of the optimal policy, both with N−n measurements remaining.


Page 79: Optimal Learning: an Overview

The KG method has nice optimality properties

If there are exactly 2 alternatives (k = 2), the knowledge-gradient policy is optimal. In this case, the optimal policy reduces to

x_n = argmax_x σ_{n,x}.


Page 80: Optimal Learning: an Overview

The KG method has nice optimality properties

If there is no measurement noise, and alternatives may be reordered so that

μ_{0,1} ≥ μ_{0,2} ≥ … ≥ μ_{0,k},

σ_{0,1} ≥ σ_{0,2} ≥ … ≥ σ_{0,k},

then the knowledge-gradient policy is optimal.


Page 81: Optimal Learning: an Overview

Outline

1 Example Optimal Learning Problems

2 Bayesian Selection of the Best
   Problem summary
   Bayesian inference
   The Knowledge-Gradient (KG) method
   Optimality analysis using dynamic programming

3 Conclusion


Page 82: Optimal Learning: an Overview

Conclusion

We gave an introduction to Bayesian ranking and selection, which is one of many optimal learning problems.

We showed how Bayesian statistics and a one-step optimality analysis can be used to derive the KG policy for this problem.

In the seminar today, we will look at another optimal learning problem: simulation optimization, with correlated Bayesian prior distributions.

Knowledge-gradient methods offer a convenient yet principled way to develop algorithms for a wide variety of optimal learning problems.


Page 83: Optimal Learning: an Overview

For further reading

P.I. Frazier, “Tutorial: Optimization via Simulation with Bayesian Statistics and Dynamic Programming,” Winter Simulation Conference, 2012. (available on my website)

W.B. Powell & I.O. Ryzhov, “Optimal Learning”, 2012. (textbook)

The original paper on the KG method: P.I. Frazier, W.B. Powell, and S. Dayanik, “A Knowledge-Gradient Policy for Sequential Information Collection,” SIAM Journal on Control and Optimization, 2008.

Other introductory materials are available on my website, http://people.orie.cornell.edu/pfrazier/
