advanced topics in search theory pandora’s problem based on: “optimal search for the best...

Advanced Topics in Search Theory

Pandora’s Problem

Based on: “Optimal Search for the Best Alternative”, Econometrica, May 1979, by Martin L. Weitzman

In Today’s Class

Pandora’s problem formal presentation
Difference from “one-sided search”
Solution principle

Pandora’s Problem

There are $n$ closed boxes

Box $i$, $1 \le i \le n$, contains a potential reward $x_i$ with cumulative distribution function $F_i(x_i)$, independent of the other rewards

It costs $c_i$ to open box $i$ and learn its contents, which become known after a time lag of $t_i$

$x_0$ - representing a fallback reward that could always be collected (usually set to 0)

Originally, discount factor – $r$ (but we’ll ignore that)

[Figure: five closed boxes, each with a reward density $f(x)$, with opening costs $c_1, \dots, c_5$]

Pandora can open each of these boxes…

The Decision Problem

At each stage:
– Decide whether or not to open a box
– If stop – collect at that time the maximum reward
– If continue:
• Select next box to open
• Pay the fee for opening
• Wait for the outcome

Notice: the sum of search costs is paid during search, whereas the maximum reward is collected after the search has been terminated

The Goal

Find a sequential decision rule that tells, at each stage, whether or not to continue searching and, if so, which box to open next, in a way that maximizes the expected present discounted value

Example – Buying a Car

[Figure: density $f(x)$ over the value of the car]

Each ad is a box…
– With a distribution of utility
– With an opening fee

Strategy: which car to see first, which second (and so on) and when to stop and buy?

Other Applications

Oil drilling
Investing in new technology
Buying a new DVD:
– Better chance for a good price at the outlet store (but higher cost to get there)

Dynamic-Programming-based Solution

Let the collection of $n$ boxes, denoted $I = \{1, 2, \dots, n\}$, be partitioned into:
– $S$ – set of sampled boxes
– $\bar{S}$ – complementary set of closed boxes, so that $S \cup \bar{S} = I$ and $S \cap \bar{S} = \emptyset$

We use $y$ to represent the maximum sampled reward: $y = \max_{i \in S \cup \{0\}} x_i$

It is enough to know $y$ in order to make a decision (since the rewards are independent, the original distribution of a box carries no further information once the box has been opened)

The state of the system is thus given by: $(\bar{S}, y)$

Define: $\Psi(\bar{S}, y)$ - expected value of following the optimal policy from state $(\bar{S}, y)$ onwards

At stage $(\bar{S}, y)$ Pandora can:
– Terminate search, collecting a reward $y$
– Open a box $i \in \bar{S}$

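Opening box $i \in \bar{S}$

The expected gain in this case is:

$$-c_i + \Psi(\bar{S} - \{i\}, y) \int_{-\infty}^{y} f_i(x_i)\,dx_i + \int_{y}^{\infty} \Psi(\bar{S} - \{i\}, x_i)\, f_i(x_i)\,dx_i$$

[Figure: density $f_i(x_i)$ with the current maximum $y$ marked on the $x_i$ axis]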

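Dynamic Programming

$$\Psi(\bar{S}, y) = \max\left\{ y,\ \max_{i \in \bar{S}} \left[ -c_i + \Psi(\bar{S} - \{i\}, y) \int_{-\infty}^{y} f_i(x_i)\,dx_i + \int_{y}^{\infty} \Psi(\bar{S} - \{i\}, x_i)\, f_i(x_i)\,dx_i \right] \right\}$$

Now solve recursively for $\Psi(\bar{S}, y)$:
– Solve for all cases where one box is closed
– Then for two boxes
– Then three and so on…

Combinatoric task…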
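The recursion over states (closed boxes, best reward so far) can be sketched for discrete rewards as below. This is only an illustration under stated assumptions: the two-box instance in `BOXES` and the name `psi` are hypothetical, discounting and time lags are ignored, and the integrals are replaced by sums over outcomes (an outcome below y continues with y, an outcome above y continues with the outcome itself).

```python
from functools import lru_cache

# Hypothetical instance: box id -> (opening cost, ((reward, probability), ...)).
BOXES = {
    1: (1.0, ((0.0, 0.5), (10.0, 0.5))),
    2: (1.0, ((7.0, 0.5), (9.0, 0.5))),
}

@lru_cache(maxsize=None)
def psi(closed: frozenset, y: float) -> float:
    """Expected value of the optimal policy from state (closed boxes, best reward y)."""
    best = y  # option 1: terminate search and collect y
    for i in closed:
        cost, outcomes = BOXES[i]
        rest = closed - {i}
        # option 2: pay c_i, observe x_i, continue optimally with max(x_i, y)
        gain = -cost + sum(p * psi(rest, max(x, y)) for x, p in outcomes)
        best = max(best, gain)
    return best

value = psi(frozenset(BOXES), 0.0)  # all boxes closed, fallback reward 0
print(value)  # 7.5
```

The number of reachable states grows with every subset of closed boxes and every attainable y, which is the combinatoric burden mentioned above.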

Connection to “one-sided” Search

[Figure: density $f(x)$ with the reservation value $x$ marked; labels: Lifetime Utility, Terminate Search, Resume Search - sample one more]

Main differences (in one-sided search):
– All “boxes” have the same probability distribution function
– Infinite horizon
– No need for recall

The optimal strategy

Suppose there are just 2 boxes:
– One closed box $i$
– The other is an already opened hypothetical box with reward $z_i$

The decision:
– Don’t open box $i$: receive $z_i$
– Open box $i$: receive expected net benefit

$$-c_i + z_i \int_{-\infty}^{z_i} f_i(x_i)\,dx_i + \int_{z_i}^{\infty} x_i f_i(x_i)\,dx_i$$

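The optimal strategy

The indifference (or equivalence) between opening/not opening is when:

$$z_i = -c_i + z_i \int_{-\infty}^{z_i} f_i(x_i)\,dx_i + \int_{z_i}^{\infty} x_i f_i(x_i)\,dx_i$$

Since $f_i$ integrates to 1, the left-hand side can be split over the same two ranges:

$$z_i \int_{-\infty}^{z_i} f_i(x_i)\,dx_i + z_i \int_{z_i}^{\infty} f_i(x_i)\,dx_i = -c_i + z_i \int_{-\infty}^{z_i} f_i(x_i)\,dx_i + \int_{z_i}^{\infty} x_i f_i(x_i)\,dx_i$$

$$c_i = \int_{z_i}^{\infty} (x_i - z_i) f_i(x_i)\,dx_i$$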

The Reservation Value

The value $z_i$ which satisfies the equation

$$c_i = \int_{z_i}^{\infty} (x_i - z_i) f_i(x_i)\,dx_i$$

is called the “reservation value” of box $i$
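For a discrete reward the reservation-value equation can be solved numerically. The sketch below assumes a strictly positive cost and rewards given as (value, probability) pairs; the function names and the sample box are hypothetical. It uses bisection because the right-hand side, E[max(x - z, 0)], is continuous and decreasing in z.

```python
def expected_excess(z, outcomes):
    """E[max(x - z, 0)] for a discrete reward given as (value, prob) pairs."""
    return sum(p * max(x - z, 0.0) for x, p in outcomes)

def reservation_value(cost, outcomes, tol=1e-9):
    """Solve cost = E[max(x - z, 0)] for z by bisection (assumes cost > 0)."""
    values = [x for x, _ in outcomes]
    lo = min(values) - cost   # here the expected excess is at least cost
    hi = max(values)          # here the expected excess is 0 < cost
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_excess(mid, outcomes) > cost:
            lo = mid          # z too small: opening is still worth more than cost
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical box: reward 0 or 10 with equal probability, opening cost 1.
z = reservation_value(1.0, [(0.0, 0.5), (10.0, 0.5)])
print(round(z, 6))  # 8.0
```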

The Optimal Strategy

Pandora’s Rule:
– Selection Rule: if a box is to be opened, it should be that closed box with the highest reservation price
– Stopping Rule: terminate search whenever the maximum sampled reward exceeds the reservation price of every closed box
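The two rules can be sketched as a loop over one realization of the box contents. The function name, the dictionary layout and the sample numbers are assumptions made for illustration; reservation values are taken as given, and a tie between the best reward and the highest remaining reservation price is resolved by stopping.

```python
def pandoras_rule(reservation, cost, rewards, fallback=0.0):
    """Run Pandora's rule on one realization.

    reservation, cost, rewards: dicts keyed by box id; `rewards` holds the
    hidden content of each box. Returns (boxes opened in order, net payoff).
    """
    closed = set(reservation)
    best, paid, opened = fallback, 0.0, []
    while closed:
        nxt = max(closed, key=lambda i: reservation[i])  # selection rule
        if best >= reservation[nxt]:                     # stopping rule
            break
        closed.remove(nxt)
        paid += cost[nxt]
        opened.append(nxt)
        best = max(best, rewards[nxt])
    return opened, best - paid

# Hypothetical instance: reservation values 8 and 7, cost 1 per box.
rv, c = {1: 8.0, 2: 7.0}, {1: 1.0, 2: 1.0}
print(pandoras_rule(rv, c, {1: 0.0, 2: 9.0}))   # ([1, 2], 7.0)
print(pandoras_rule(rv, c, {1: 10.0, 2: 7.0}))  # ([1], 9.0)
```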


Interesting Implications

The entire structure of the optimal policy has been reduced to a simple statement about reservation prices

The reservation value of each box is calculated by equating the hypothetical gain of stopping with the myopic gain of opening the box and then terminating (rather than the full gain of opening the box and continuing on in an optimal manner)

Properties of the Reservation Value

The RV is completely insensitive to the distribution of rewards at the lower end of the tail -> any rearrangement of the probability mass located below $z_i$ leaves $z_i$ unaltered (it does change the expected value though)

[Figure: two densities $f_i(x_i)$ that differ only below $z_i$ and share the same $z_i$]


Other things being equal, it is optimal to sample first from distributions which are more spread out or riskier

These low-probability high-payoff situations should be prime candidates for early investigation even though they have a smaller chance of ending up as the source ultimately yielding the maximum reward when the search ends


RV decreases with:
– Greater search cost

$$c_i = \int_{z_i}^{\infty} (x_i - z_i) f_i(x_i)\,dx_i$$

[Figure: five boxes with opening costs $c_1, \dots, c_5$]
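A small worked instance may help with two of the properties: insensitivity to the lower tail, and the effect of the search cost. The box is hypothetical: it pays 10 with probability 1/2 and some low reward $\ell$ with probability 1/2.

```latex
% Hypothetical box: reward 10 w.p. 1/2, reward \ell w.p. 1/2, opening cost c.
% As long as \ell < z, only the upper outcome enters the reservation-value equation:
c = \int_{z}^{\infty} (x - z)\, dF(x) = \tfrac{1}{2}\,(10 - z)
% With c = 1:
\tfrac{1}{2}\,(10 - z) = 1 \;\Rightarrow\; z = 8 \quad \text{for every } \ell < 8,
% although the expected reward (\ell + 10)/2 changes with \ell.
% With the larger cost c = 2 the reservation value falls:
\tfrac{1}{2}\,(10 - z) = 2 \;\Rightarrow\; z = 6 \quad \text{for every } \ell < 6.
```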

Equivalent?

[Figure: two copies of the five-box problem, with opening costs $c_1, \dots, c_5$]
– Pandora already opened 4 boxes; best prize found so far was 12
– Pandora already opened 1 box; best prize found so far was 12

RV and Expected Benefit

It’s easy to find the optimal policy…
… but difficult to calculate the expected net benefit of that strategy

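Expected Benefit Calculation

With the boxes indexed in the order in which Pandora’s rule opens them ($R_1 \ge R_2 \ge \dots \ge R_N$), $q_i$ the reward found in box $i$, and ${\max}_i = \max(q_1, q_2, \dots, q_i)$:

$$
\begin{aligned}
V(R_1, \dots, R_N) ={}& \int_{q_1 \ge R_2} (q_1 - c_1)\, f_1(q_1)\, dq_1 \\
&+ \sum_{i=2}^{N-1} \int \!\cdots\! \int_{{\max}_{i-1} < R_i,\ {\max}_i \ge R_{i+1}} \Big( {\max}_i - \sum_{k=1}^{i} c_k \Big) f_1(q_1) \cdots f_i(q_i)\, dq_1 \cdots dq_i \\
&+ \int \!\cdots\! \int_{{\max}_{N-1} < R_N} \Big( {\max}_N - \sum_{k=1}^{N} c_k \Big) f_1(q_1) \cdots f_N(q_N)\, dq_1 \cdots dq_N
\end{aligned}
$$

– First term: terminating search on first trial
– Second term: terminating search after the i-th box
– Third term: trying all boxes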
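For discrete rewards the same expected benefit can be obtained by enumerating every joint realization and applying Pandora’s rule to each. The sketch below makes several assumptions: the name `expected_benefit`, the tuple layout, boxes already sorted by decreasing reservation value, and a fallback reward of 0. Unopened boxes do not affect the payoff, so summing over their outcomes only contributes total probability 1.

```python
from itertools import product

def expected_benefit(boxes, fallback=0.0):
    """Exact expected net payoff of Pandora's rule for discrete boxes.

    boxes: list of (reservation value, cost, [(reward, prob), ...]),
    assumed sorted by decreasing reservation value.
    """
    total = 0.0
    # Enumerate every joint realization (q_1, ..., q_N) with its probability.
    for combo in product(*(outcomes for _, _, outcomes in boxes)):
        prob, best, paid = 1.0, fallback, 0.0
        for (r, c, _), (q, p) in zip(boxes, combo):
            prob *= p
            if best < r:          # box is reached, so it is opened
                paid += c
                best = max(best, q)
        total += prob * (best - paid)
    return total

# Hypothetical two-box instance with reservation values 8 and 7, cost 1 each.
boxes = [
    (8.0, 1.0, [(0.0, 0.5), (10.0, 0.5)]),
    (7.0, 1.0, [(7.0, 0.5), (9.0, 0.5)]),
]
print(expected_benefit(boxes))  # 7.5
```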

Limitations of the model

Current assumptions do not consider:
– Adaptive learning about correlated probability distributions
– Parallel search
– Risk aversion
– Incomplete or no recall
– Binding time horizon
– Uncertain search costs or time
– …

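Example

Box 1: reward 0 or 10, each with probability ½
Box 2: reward 7 or 9, each with probability ½
C=1 (cost of opening a box)

Decision tree (payoffs net of opening costs):
– Open box 1 first: expected value 7.5
• Box 1 shows 10: stop -> 10-1=9; open box 2 -> 10-2=8 whether it shows 7 or 9; best is to stop (9)
• Box 1 shows 0: stop -> 0-1=-1; open box 2 -> 7-2=5 or 9-2=7, expected 6; best is to open box 2 (6)
– Open box 2 first: expected value 7.25
• Box 2 shows 7: stop -> 7-1=6; open box 1 -> 7-2=5 or 10-2=8, expected 6.5; best is to open box 1 (6.5)
• Box 2 shows 9: stop -> 9-1=8; open box 1 -> 9-2=7 or 10-2=8, expected 7.5; best is to stop (8)

Solution to the “Pandora” Problem

Alternative 1:

$$c_1 = \int_{z_1}^{\infty} (x_1 - z_1)\, dF_1(x_1) \;\Rightarrow\; 1 = 0.5 \cdot (10 - z_1) \;\Rightarrow\; z_1 = 8$$

Solution to the “Pandora” Problem

Alternative 2:

$$c_2 = \int_{z_2}^{\infty} (x_2 - z_2)\, dF_2(x_2)$$

– $z_2 > 7$: $1 = 0.5 \cdot (9 - z_2) \Rightarrow z_2 = 7$, which is impossible
– $z_2 \le 7$: $1 = 0.5 \cdot (9 - z_2) + 0.5 \cdot (7 - z_2) \Rightarrow z_2 = 7$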

Proof of Optimality

Proof by induction on the number of closed boxes
– Assume that Pandora’s rule is optimal with $m$ closed boxes and value $y$, the best reward found so far
– For $m=1$, optimality follows directly from the definition of the reservation value
– It remains to prove that Pandora’s rule is also optimal for $m+1$ closed boxes and value $y$

$$c_i = \int_{z_i}^{\infty} (x_i - z_i) f_i(x_i)\,dx_i$$

Proof

Consider the box with the highest RV among the $m+1$ boxes (denote its RV by $z$)

If $y>z$, then after opening one box we are left with $m$ boxes, none of which is worth opening under the optimal strategy (induction hypothesis)
– That is, the question is whether it is worth opening exactly one box

$$\underbrace{c_i}_{\text{cost of opening the box}} = \underbrace{\int_{z_i}^{\infty} (x_i - z_i) f_i(x_i)\,dx_i}_{\text{gain from opening the box}}$$

Since $y > z \ge z_i$ for every closed box $i$, the gain $\int_{y}^{\infty} (x_i - y) f_i(x_i)\,dx_i$ from opening a single box is at most $c_i$, so stopping is optimal

Proof (continued)

If $y<z$:
– Clearly it is worth opening at least one box (even if we open only the box with the highest RV and then stop, we get more than if we open nothing at all)
– Suppose it is better to open first some other box, $k$, whose RV is lower than $z$
– If we open it, we are left with $m$ boxes, which we open according to Pandora’s rule (which we assume is optimal)

Proof (continued)

If $y<z$: alternatively, consider the following option:
– First open the box with the highest RV
– If the reward found in it is greater than the second-highest RV in the set of boxes, stop
– Otherwise, open box $k$
– From this point on, continue with Pandora’s rule on the remaining $m-1$ boxes

The rest of the proof is technical – it shows that the expected payoff of the latter option is higher

Problem 1

You are about to purchase an iPod touch over the internet

You estimate the price distribution of the product over the different sellers to be uniform between 200 and 300 dollars

You can search by yourself, by visiting different websites – the cost of time for obtaining a price quote is $1

How will you search? What will be your expected cost? What is the expected number of merchants you’ll visit?

Problem 2

You decide not to wait and so you go to the mall in order to buy the iPod.

The mall’s parking lot charges you $8 per hour

There are 3 stores in the mall selling iPods:

Store   Distribution of Prices   Time you’ll wait for service
A       U(80,100)                20 min
B       U(70,110)                10 min
C       Fixed - $80              1 hour

How will you conduct your search and what will be your overall expected cost?

Solution

This is actually Pandora’s Problem
– The reservation value of Box $i$:

$$c_i = \int_{z_i}^{\infty} (x_i - z_i) f_i(x_i)\,dx_i$$

– …and because we’re looking for minimum price:

$$c_i = \int_{0}^{z_i} (z_i - x_i) f_i(x_i)\,dx_i$$

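Now let’s calculate

Store   Distribution of Prices   Time you’ll wait for service   c     f(x)
A       U(80,100)                20 min                         8/3   1/20
B       U(70,110)                10 min                         8/6   1/40
C       Fixed - $80              1 hour                         8     N/A

$$
\begin{aligned}
c_A &= \int_{80}^{z_A} (z_A - y) f_A(y)\,dy = 0.05 \left[ z_A y - \frac{y^2}{2} \right]_{80}^{z_A} \\
&= 0.05 z_A^2 - 0.05 \cdot 0.5 z_A^2 - 0.05 \cdot 80 z_A + 0.05 \cdot \frac{80^2}{2} = \frac{0.05}{2} (z_A - 80)^2 \\
z_A &= 80 + \sqrt{40 c_A} = 80 + \sqrt{\frac{40 \cdot 8}{3}} = 90.33
\end{aligned}
$$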

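Similarly…

$$
\begin{aligned}
c_B &= \int_{70}^{z_B} (z_B - y) f_B(y)\,dy = 0.025 \left[ z_B y - \frac{y^2}{2} \right]_{70}^{z_B} = \frac{0.025}{2} (z_B - 70)^2 \\
z_B &= 70 + \sqrt{80 c_B} = 70 + \sqrt{\frac{80 \cdot 8}{6}} = 80.33
\end{aligned}
$$

Store C has a fixed price, so its reservation value is the price plus the cost of the visit: 80 + 8 = 88. Store B therefore has the lowest reservation value (80.33 < 88 < 90.33) and is visited first.

Optimal Strategy:
– Sample B. If price below $88 then buy in store B. Otherwise, buy in store C.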
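The arithmetic of Problem 2 can be checked with a few lines of code. This is a sketch under assumptions: the helper `z_uniform` generalizes the closed form used for store A, c = (z - a)^2 / (2(b - a)) for a U(a, b) price, and is valid only while the root stays below b. The expected-cost line follows the strategy "sample B, buy there if the price is below 88, otherwise buy at C"; it is not a figure quoted from the slides.

```python
import math

RATE = 8.0  # parking fee in dollars per hour

def z_uniform(a, b, hours):
    """Reservation price for a U(a, b) price when the visit costs RATE * hours."""
    c = RATE * hours
    return a + math.sqrt(2 * c * (b - a))

z = {
    "A": z_uniform(80, 100, 20 / 60),
    "B": z_uniform(70, 110, 10 / 60),
    "C": 80 + RATE * 1.0,  # fixed price: price plus the cost of the visit
}
order = sorted(z, key=z.get)  # lowest reservation price is visited first

# Expected total cost of the strategy: visit B, buy there if price < 88, else buy at C.
c_b = RATE * 10 / 60
p_buy_b = (88 - 70) / 40            # P(price at B < 88) under U(70, 110)
mean_b_given_buy = (70 + 88) / 2    # mean of a U(70, 88) price
expected_cost = c_b + p_buy_b * mean_b_given_buy + (1 - p_buy_b) * (80 + RATE)

print({k: round(v, 2) for k, v in z.items()}, order, round(expected_cost, 2))
```

After C is visited the best price in hand is 80, which is below the reservation price of A, so store A is never sampled under this strategy.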

Other Properties of Pandora

In many cases, the box we check first has the lowest chance of being the actual box used

Probability of Reaching a Box

What is the probability that we reach box $i$?

1->2->3->4->5…

$$P(\text{reach } i) = P(x_1 < RV_i)\, P(x_2 < RV_i) \cdots P(x_{i-1} < RV_i) = \prod_{j=1}^{i-1} P(x_j < RV_i)$$

Probability of Stopping at Box i

What is the probability that we reach box $i$ and stop there?

1->2->3->4->5…

$$P(\text{stop at } i) = P(\text{reach } i) - P(\text{reach } i+1) = \prod_{j=1}^{i-1} P(x_j < RV_i) - \prod_{j=1}^{i} P(x_j < RV_{i+1})$$
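Both probabilities are easy to tabulate for discrete rewards. The sketch below assumes boxes sorted by decreasing reservation value, a fallback reward below the highest reservation value (so the first box is always opened), and the convention P(reach N+1) = 0 for the last box; the function names and the sample data are hypothetical.

```python
def prob_below(outcomes, t):
    """P(x < t) for a discrete reward given as (value, prob) pairs."""
    return sum(p for x, p in outcomes if x < t)

def reach_and_stop_probs(boxes):
    """boxes: list of (reservation value, outcomes), sorted by decreasing RV.

    Returns two lists: P(reach i) and P(stop at i), for i = 1..N.
    """
    n = len(boxes)
    reach = []
    for i in range(n):
        rv_i = boxes[i][0]
        p = 1.0
        for j in range(i):                  # every earlier reward is below RV_i
            p *= prob_below(boxes[j][1], rv_i)
        reach.append(p)
    # P(stop at i) = P(reach i) - P(reach i+1); the search must end after box N
    stop = [reach[i] - (reach[i + 1] if i + 1 < n else 0.0) for i in range(n)]
    return reach, stop

# Hypothetical two-box instance with reservation values 8 and 7.
boxes = [
    (8.0, [(0.0, 0.5), (10.0, 0.5)]),
    (7.0, [(7.0, 0.5), (9.0, 0.5)]),
]
reach, stop = reach_and_stop_probs(boxes)
print(reach, stop)  # [1.0, 0.5] [0.5, 0.5]
```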

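Probability That Our Prize Ultimately Comes from Box i

What is the probability that we reach box $i$ and that the value found in it is ultimately the one chosen?

$$\int_{x_i} f_i(x_i)\, G(x_i)\, dx_i$$

$G(x_i)$ - the probability that, if the value in box $i$ is $x_i$, this is the value the searcher ultimately chooses ($R_j$ denotes the reservation value of box $j$; each factor is the integral of $f_j$ over the stated range):

$$
G(x_i) =
\begin{cases}
\prod_{j=1}^{i-1} P\big(x_j < \min(x_i, R_i)\big), & x_i \ge R_{i+1} \\
\prod_{j=1}^{i-1} P\big(x_j < \min(x_i, R_i)\big) \prod_{j=i+1}^{k} P(x_j < x_i), & R_{k+1} \le x_i < R_k,\ i < k < N \\
\prod_{j=1}^{i-1} P\big(x_j < \min(x_i, R_i)\big) \prod_{j=i+1}^{N} P(x_j < x_i), & x_i < R_N
\end{cases}
$$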

What Is the Expected Number of Boxes Opened?

Homework exercise.

How Can a Box Influence Its Position?

Influence on the probability that it is reached
Influence on the probability that its value is chosen
Influence on its expected “exploited value”

Influencing the Probability of Being Reached

By changing the cost we affect the RV

$$c_i = \int_{z_i}^{\infty} (x_i - z_i) f_i(x_i)\,dx_i$$

The new probability of reaching the box (search order 1->2->3->4->5…):

$$\prod_{j=1}^{i-1} P(x_j < RV'_i)$$

The cost $c$ is bounded below by 0, so there is a limit to how far the RV can be raised. Sometimes a negative $c$ is feasible (for example, a prize for showing up at the store)

Influencing the Expected “Exploited Value”

Trivial by means of the cost
Also possible by changing the distribution, but:
– Increasing the rewards also lowers the expected “profit of the box” (for example, a store that offers cheaper prices increases the probability that people buy there, but thereby reduces its profit margin)

Probability that the value of box $i$ is the one chosen, with $G(x_i)$ as defined for the probability that the prize comes from box $i$:

$$\int_{x_i} f_i(x_i)\, G(x_i)\, dx_i$$

Expected “exploited value” of box $i$:

$$\int_{x_i} x_i\, f_i(x_i)\, G(x_i)\, dx_i$$

Handling Subsets

What happens when we are allowed to open only some of the boxes? For example, only $k$ out of $N$?

Homework exercise.

Ability to Influence Searchers

A searcher who does not use the optimal search strategy – poor performance

A possible strategy – “manipulating” the search problem in order to “push” the searcher toward the strategy that is optimal for the original problem

Expectation Lovers

Strategy: order the boxes by expectation + opening price

Stopping rule: stop when the expectation + opening price of the next box that is a candidate for opening is greater than the actual value of a box that has already been opened

Expectation Lovers (continued)

Manipulation: since the optimal search order is known and the strategy of the expectation lovers is known, the set of boxes can be manipulated so that these searchers search exactly like the optimal searcher.

For each box, change its distribution function so that the new expectation of the distribution + the opening price equals the threshold (reservation) value of that box, as computed for the optimal search
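A minimal sketch of this manipulation, under several assumptions: prices are being minimized, so an expectation lover visits boxes in ascending order of mean + opening cost; only the mean of each displayed distribution is adjusted (how the rest of the distribution is reshaped is left open); and the numbers reuse the three stores of Problem 2 with their rounded reservation values. The function names are hypothetical.

```python
# name -> (mean price, visit cost, reservation price); values from Problem 2
boxes = {
    "A": (90.0, 8 / 3, 90.33),
    "B": (90.0, 8 / 6, 80.33),
    "C": (80.0, 8.0, 88.0),
}

def naive_order(b):
    """Order used by an expectation lover: ascending mean + opening cost."""
    return sorted(b, key=lambda k: b[k][0] + b[k][1])

def manipulate(b):
    """Replace each mean so that new mean + cost equals the reservation price."""
    return {k: (rv - cost, cost, rv) for k, (mean, cost, rv) in b.items()}

before = naive_order(boxes)              # order on the original problem
after = naive_order(manipulate(boxes))   # order on the manipulated problem
optimal = sorted(boxes, key=lambda k: boxes[k][2])  # ascending reservation price
print(before, after, optimal)
```

On the original data the expectation lover would start at store C, while on the manipulated data the visiting order coincides with the optimal one.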

Before Manipulation

B_1 B_2 B_3

Exp(B_1)+cost(B_1) < Exp(B_2)+cost(B_2) < Exp(B_3)+cost(B_3)

The Correct Order According to Optimal Search

B_3 B_2 B_1

ResVal(B_3) < ResVal(B_2) < ResVal(B_1)

After Manipulation

B_1 B_2 B_3


Search Minimizers

Strategy: pick a box at random and take the value found in that random box

Search Minimizers (continued)

Manipulation: compute expectation + opening price for every box. Present to the searcher only the box with the lowest value so computed

Before Manipulation

B_3 B_2 B_1

After Manipulation

The problem presented to the user: B_1

Search Maximizers

Strategy: query almost every box before making a decision

Search Maximizers (continued)

Manipulation: do not present the boxes that the optimal searcher is unlikely to query. For each box, compute the threshold value and the probability of reaching that box under the optimal search. If this probability is smaller than a preset alpha, do not present the box to the searcher.

The Correct Order According to Optimal Search

B_3 B_2 B_1

ResVal(B_3) < ResVal(B_2) < ResVal(B_1)

The probability that the optimal search queries box B_1 is smaller than alpha

After Manipulation

B_3 B_2
