hierarchical pomdp planning and execution joelle pineau machine learning lunch november 20, 2000

Page 1

Hierarchical POMDP Planning and Execution

Joelle Pineau

Machine Learning Lunch, November 20, 2000

Page 2

Partially Observable MDP

POMDPs are characterized by:
States: $s \in S$
Actions: $a \in A$
Observations: $o \in O$
Transition probabilities: $T(s,a,s') = \Pr(s' \mid s,a)$
Observation probabilities: $O(s',a,o) = \Pr(o \mid s',a)$
Rewards: $R(s,a)$

Beliefs: $b(s_t) = \Pr(s_t \mid o_t, a_{t-1}, \ldots, o_0, a_0)$

[Figure: example state-transition diagram with states S1, S2, S3]
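The belief defined on this slide is maintained by Bayes' rule after each action and observation. A minimal sketch in Python, assuming tabular models indexed action-first (`T[a][s][s2]` and `O[a][s2][o]`); the function name `belief_update` and the two-state numbers are hypothetical, chosen only to show the arithmetic:

```python
from typing import List

Belief = List[float]
# T[a][s][s2] = Pr(s2 | s, a);  O[a][s2][o] = Pr(o | s2, a)
Model = List[List[List[float]]]


def belief_update(b: Belief, a: int, o: int, T: Model, O: Model) -> Belief:
    """Return b'(s2), proportional to O(s2, a, o) * sum_s T(s, a, s2) * b(s)."""
    n = len(b)
    # Prediction step: push the current belief through the transition model.
    predicted = [sum(b[s] * T[a][s][s2] for s in range(n)) for s2 in range(n)]
    # Correction step: weight each successor state by the observation likelihood.
    unnormalized = [O[a][s2][o] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)  # equals Pr(o | b, a)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / total for x in unnormalized]


# Hypothetical 2-state, 1-action, 2-observation model (illustrative numbers only).
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
O = [[[0.7, 0.3],
      [0.1, 0.9]]]
b1 = belief_update([0.5, 0.5], a=0, o=0, T=T, O=O)
print(b1)  # roughly [0.895, 0.105]
```

The normalizing constant is exactly the probability of the observation, so an impossible observation is reported as an error rather than silently producing a zero vector.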

Page 3

The problem

How can we find good policies for complex POMDPs?

Is there a principled way to provide near-optimal policies?

Page 4

Proposed Approach

Exploit structure in the problem domain.

What type of structure? Action set partitioning:

Act
- InvestigateHealth
  - CheckPulse
  - CheckMeds
- Move
  - Navigate
    - Left, Right, Up, Down
  - AskWhere
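The action-set partitioning can be held in a small tree structure: each internal node is a subtask whose local POMDP chooses only among its children, and the leaves are the primitive actions. A minimal Python sketch, assuming the parent/child structure Act → {InvestigateHealth, Move}, InvestigateHealth → {CheckPulse, CheckMeds}, Move → {Navigate, AskWhere}, Navigate → {Left, Right, Up, Down}; the names `HIERARCHY`, `local_action_set` and `primitive_actions` are illustrative, not from the talk:

```python
from typing import Dict, List

# Hypothetical encoding of the action-partitioning tree (parent -> children).
HIERARCHY: Dict[str, List[str]] = {
    "Act": ["InvestigateHealth", "Move"],
    "InvestigateHealth": ["CheckPulse", "CheckMeds"],
    "Move": ["Navigate", "AskWhere"],
    "Navigate": ["Left", "Right", "Up", "Down"],
}


def local_action_set(tree: Dict[str, List[str]], subtask: str) -> List[str]:
    """Actions available to the local POMDP of one subtask: its children only."""
    return list(tree[subtask])


def primitive_actions(tree: Dict[str, List[str]], node: str = "Act") -> List[str]:
    """Leaves below `node`, i.e. the primitive actions it ultimately controls."""
    if node not in tree:
        return [node]
    leaves: List[str] = []
    for child in tree[node]:
        leaves.extend(primitive_actions(tree, child))
    return leaves


for subtask in HIERARCHY:
    print(subtask, "->", local_action_set(HIERARCHY, subtask))
print(primitive_actions(HIERARCHY))
# ['CheckPulse', 'CheckMeds', 'Left', 'Right', 'Up', 'Down', 'AskWhere']
```

Each subtask sees at most four actions here, while the flat problem would have to choose among all seven primitives at once.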

Page 5

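Hierarchical POMDP Planning

What do we start with?
- A full POMDP model: $\{S_0, A_0, O_0, M_0\}$.
- An action set partitioning graph.

Key idea: Break the problem into many "related" POMDPs, each of which has only a subset of $A_0$. Restricting each smaller POMDP to a subset of actions imposes a constraint on the policy.

But why? Exact POMDP value iteration has exponential run-time per iteration, $O(|A|\,|\Gamma_{n-1}|^{|O|})$, where $\Gamma_{n-1}$ is the set of value-function vectors from the previous iteration. Smaller action sets make each sub-POMDP cheaper to solve.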
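To see why shrinking the action set helps, count the vectors an exact backup can generate before pruning, $|\Gamma_n| = |A|\,|\Gamma_{n-1}|^{|O|}$, and iterate. The sketch below assumes a single initial vector and no pruning (real solvers prune heavily), and compares $|A| = 4$, $|O| = 4$ (the sizes of $A_0$ and $O_0$ in the example on the next slide) with a hypothetical subtask that keeps only 2 actions:

```python
from typing import List


def unpruned_vector_counts(num_actions: int, num_obs: int, iterations: int) -> List[int]:
    """Worst-case |Gamma_n| for n = 0..iterations, using |Gamma_n| = |A| * |Gamma_{n-1}|^|O|."""
    counts = [1]  # assume value iteration starts from a single vector
    for _ in range(iterations):
        counts.append(num_actions * counts[-1] ** num_obs)
    return counts


full = unpruned_vector_counts(num_actions=4, num_obs=4, iterations=2)
subtask = unpruned_vector_counts(num_actions=2, num_obs=4, iterations=2)
print(full)     # [1, 4, 1024]
print(subtask)  # [1, 2, 32]
```

Halving the action set cuts the second-iteration count by a factor of 32 in this toy count, because the savings compound through the exponent.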

Page 6

Example

[Figure: state-transition diagram (nodes labeled M, B, K, E) with transition probabilities of 0.8 and 0.1]

POMDP:
$S_0$ = {Meds, Kitchen, Bedroom}
$A_0$ = {ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom}
$O_0$ = {Noise, Meds, Kitchen, Bedroom}

Value Function:

[Figure: value function over the belief space spanned by MedsState, KitchenState and BedroomState; regions labeled GoToKitchen, ClarifyTask, GoToBedroom, CheckMeds]
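A POMDP value function like the one plotted here is piecewise linear and convex: it is the upper surface of a set of vectors, each tagged with an action, and the policy picks the action of the maximizing vector. A sketch with hypothetical vector values over (MedsState, KitchenState, BedroomState); the numbers and the helper `value_and_action` are invented to illustrate the mechanics and are not taken from the slide:

```python
from typing import List, Tuple

# Hypothetical alpha-vectors over (MedsState, KitchenState, BedroomState).
ALPHAS: List[Tuple[str, List[float]]] = [
    ("CheckMeds",   [10.0, -2.0, -2.0]),
    ("GoToKitchen", [-2.0, 10.0, -2.0]),
    ("GoToBedroom", [-2.0, -2.0, 10.0]),
    ("ClarifyTask", [3.0, 3.0, 3.0]),
]


def value_and_action(belief: List[float],
                     alphas: List[Tuple[str, List[float]]]) -> Tuple[str, float]:
    """V(b) = max over alpha of (alpha . b); return the maximizing action and value."""
    best_action, best_value = "", float("-inf")
    for action, alpha in alphas:
        value = sum(a * p for a, p in zip(alpha, belief))
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value


print(value_and_action([1.0, 0.0, 0.0], ALPHAS))      # ('CheckMeds', 10.0)
print(value_and_action([1/3, 1/3, 1/3], ALPHAS)[0])   # ClarifyTask
```

With these made-up vectors, a confident belief selects CheckMeds directly, while a uniform belief selects ClarifyTask.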

Page 7

Hierarchical POMDP

Action Partitioning:

Act
- Move
  - ClarifyTask
  - GoToKitchen
  - GoToBedroom
- CheckMeds
- ClarifyTask

Page 8

Local Value Function and Policy - Move Controller

[Figure: local value function and policy of the Move controller over MedsState, KitchenState and BedroomState; regions labeled ClarifyTask, GoToKitchen, GoToBedroom]

Page 9

Modeling Abstract Actions

Problem: We need model parameters for the abstract action Move.

Solution: Use the local policy of the corresponding low-level controller.

General form: $\Pr(s_j \mid s_i, a_k^{abstract}) = \Pr(s_j \mid s_i, \mathrm{Policy}(a_k^{abstract}, s_i))$

Example: $\Pr(s_j \mid \mathrm{MedsState}, \mathrm{Move}) = \Pr(s_j \mid \mathrm{MedsState}, \mathrm{ClarifyTask})$

Policy(Move, $s_i$):
- MedsState → ClarifyTask
- KitchenState → GoToKitchen
- BedroomState → GoToBedroom
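The substitution in the general form is a per-state table lookup: for each start state $s_i$, the abstract action borrows the transition row of whichever primitive action the local policy selects there. A minimal Python sketch using the Move policy (MedsState → ClarifyTask, KitchenState → GoToKitchen, BedroomState → GoToBedroom); the primitive transition rows are hypothetical placeholder numbers, and `abstract_transition` is an illustrative name:

```python
from typing import Dict, List

STATES = ["MedsState", "KitchenState", "BedroomState"]

# Local policy of the Move controller: Policy(Move, s_i).
MOVE_POLICY: Dict[str, str] = {
    "MedsState": "ClarifyTask",
    "KitchenState": "GoToKitchen",
    "BedroomState": "GoToBedroom",
}

# Hypothetical primitive transition rows: T[a][s_i] = [Pr(s_j | s_i, a) for s_j in STATES].
T: Dict[str, Dict[str, List[float]]] = {
    "ClarifyTask": {s: [1.0 if t == s else 0.0 for t in STATES] for s in STATES},
    "GoToKitchen": {s: [0.1, 0.8, 0.1] for s in STATES},
    "GoToBedroom": {s: [0.1, 0.1, 0.8] for s in STATES},
}


def abstract_transition(policy: Dict[str, str],
                        T: Dict[str, Dict[str, List[float]]],
                        states: List[str]) -> Dict[str, List[float]]:
    """Pr(s_j | s_i, abstract) = Pr(s_j | s_i, policy(s_i)) for every start state s_i."""
    return {s_i: T[policy[s_i]][s_i] for s_i in states}


T_move = abstract_transition(MOVE_POLICY, T, STATES)
print(T_move["MedsState"])     # [1.0, 0.0, 0.0]  (borrowed from ClarifyTask)
print(T_move["KitchenState"])  # [0.1, 0.8, 0.1]  (borrowed from GoToKitchen)
```

The resulting rows can be dropped into the parent (Act) controller's model as if Move were an ordinary action.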

Page 10

Local Value Function and Policy - Act Controller

[Figure: local value function and policy of the Act controller over MedsState, KitchenState and BedroomState; regions labeled Move and CheckMeds]

Page 11

Comparing Policies

[Figure: side-by-side policy maps over the belief space, labeled "Hierarchical Policy" and "Optimal Policy"; legend: ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom]

Page 12

Bounding the value of the approximation

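The value function of the top-level controller is an upper bound on the value of the approximation.

Why? We were optimistic when modeling the abstract action: its parameters assume the low-level controller picks Policy($a_k^{abstract}$, $s_i$) for the true state $s_i$, which it cannot observe directly.

Similarly, we can find a lower bound. How? By taking a "worst-case" view when modeling the abstract action.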

If we partition the action set differently, we will get different bounds.
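One simple way to picture the two bounds: when the abstract action's parameters are filled in from its children, an optimistic model credits it with the best child in each state and a worst-case model with the worst. The sketch below applies that idea to immediate rewards only, with hypothetical reward numbers and an illustrative helper `abstract_reward`; it shows the bracketing idea and is not the exact construction used in the talk:

```python
from typing import Dict, List

# Hypothetical immediate rewards R[a][s] for the children of the abstract action Move.
R: Dict[str, Dict[str, float]] = {
    "ClarifyTask": {"MedsState": -1.0, "KitchenState": -1.0, "BedroomState": -1.0},
    "GoToKitchen": {"MedsState": -5.0, "KitchenState": 10.0, "BedroomState": -5.0},
    "GoToBedroom": {"MedsState": -5.0, "KitchenState": -5.0, "BedroomState": 10.0},
}
CHILDREN: List[str] = ["ClarifyTask", "GoToKitchen", "GoToBedroom"]


def abstract_reward(R: Dict[str, Dict[str, float]], children: List[str],
                    state: str, optimistic: bool = True) -> float:
    """Optimistic model: best child in this state; worst-case model: worst child."""
    values = [R[a][state] for a in children]
    return max(values) if optimistic else min(values)


for s in ["MedsState", "KitchenState", "BedroomState"]:
    upper = abstract_reward(R, CHILDREN, s, optimistic=True)
    lower = abstract_reward(R, CHILDREN, s, optimistic=False)
    # Whatever child the low-level controller actually picks, its reward lies in [lower, upper].
    assert all(lower <= R[a][s] <= upper for a in CHILDREN)
    print(s, lower, upper)
```

Solving the top-level POMDP once with each model would give two value functions that bracket the hierarchical policy's value, under the assumptions stated above.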

Page 13

A real dialogue management example

Act
- Move: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow
- Greet: GreetGeneral, GreetMorning, GreetNight, RespondThanks
- CheckWeather: AskWeatherTime, SayCurrent, SayToday, SayTomorrow
- DoMeds: StartMeds, NextMeds, ForceMeds, QuitMeds
- Phone: AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative
- CheckHealth: AskHealth, OfferHelp
- SayTime

Page 14

Results:

Navigation problem (|S| = 11, |A| = 6, |O| = 6):

                      MDP        H-POMDP   POMDP
CPU time (s)          0.000654   2.84      1119.93
Average reward        0.0        12.2      12.5

Dialogue problem (|S| = 20, |A| = 30, |O| = 27):

                      MDP        H-POMDP   POMDP
CPU time (s)          6.46       77.99     > 24 h
Average reward        53.33      64.43     n/a
% correct actions     80.0       93.2      n/a
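A quick way to read the results is as ratios. The arithmetic below uses only the reported numbers, reading the two dialogue-problem values as MDP and H-POMDP since the exact POMDP did not finish; the variable names are illustrative:

```python
# Navigation problem: CPU time (s) and average reward, from the results table.
pomdp_time, hpomdp_time = 1119.93, 2.84
pomdp_reward, hpomdp_reward = 12.5, 12.2

speedup = pomdp_time / hpomdp_time             # about 394x faster
reward_kept = hpomdp_reward / pomdp_reward     # 0.976, i.e. 97.6% of the POMDP reward

# Dialogue problem: H-POMDP versus MDP (the exact POMDP ran for more than 24 h).
reward_gain = 64.43 - 53.33                    # about 11.1
correct_gain = 93.2 - 80.0                     # about 13.2 percentage points

print(f"speedup {speedup:.1f}x, reward kept {reward_kept:.1%}")
print(f"dialogue: +{reward_gain:.2f} reward, +{correct_gain:.1f} points correct")
```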

Page 15

Final words

We presented: a general framework to exploit structure in POMDPs.

Future work:
- automatic generation of good action partitionings;
- conditions for additional observation abstraction;
- bigger problems!