![Page 1: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/1.jpg)
Fast approximate POMDP planning:
Overcoming the curse of history!
Joelle Pineau, Geoff Gordon and Sebastian Thrun, CMU
Point-based value iteration: an anytime algorithm for POMDPs
Workshop on Advances in Machine Learning - June, 2003
![Page 2: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/2.jpg)
Workshop on Advances in Machine Learning Joelle Pineau
Why use a POMDP?
• POMDPs provide a rich framework for sequential decision-making, which can model:
– varying rewards across actions and goals
– actions with random effects
– uncertainty in the state of the world
![Page 3: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/3.jpg)
Existing applications of POMDPs
– Maintenance scheduling
» Puterman, 1994
– Robot navigation
» Koenig & Simmons, 1995; Roy & Thrun, 1999
– Helicopter control
» Bagnell & Schneider, 2001; Ng et al., 2002
– Dialogue modeling
» Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000
– Preference elicitation
» Boutilier, 2002
![Page 4: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/4.jpg)
POMDP Model
POMDP is an n-tuple { S, A, Ω, T, O, R }:
S = state set
A = action set
Ω = observation set
T(s,a,s’) = state-to-state transition probabilities
O(s,a,o) = observation generation probabilities
R(s,a) = reward function
What goes on: states s_{t-1}, s_t and actions a_{t-1}, a_t
What we see: observations o_{t-1}, o_t
What we infer: beliefs b_{t-1}, b_t
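The step from b_{t-1} to b_t is not spelled out on the slide. A minimal sketch of the standard Bayes-filter update is given below, assuming the observation model is indexed by the state reached after the action, i.e. O(s',a,o). The nested-list layout and the two-state "listen" model are illustrative assumptions, not part of the talk.

```python
def belief_update(b, a, o, T, O):
    """Return the posterior belief after taking action a and observing o.

    Assumed layout (not from the slides): T[s][a][s2] and O[s2][a][o] are
    nested lists, with O indexed by the state reached after the action.
    """
    n = len(b)
    unnormalized = [
        O[s2][a][o] * sum(T[s][a][s2] * b[s] for s in range(n))
        for s2 in range(n)
    ]
    prob_o = sum(unnormalized)  # Pr(o | b, a)
    if prob_o == 0.0:
        raise ValueError("observation o is impossible under belief b and action a")
    return [p / prob_o for p in unnormalized]


# Hypothetical two-state model with a single "listen" action (index 0):
# the state never changes and the observation matches it with probability 0.85.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]
O = [[[0.85, 0.15]], [[0.15, 0.85]]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
print(b1)
```

Starting from a uniform belief, a single observation shifts the probability mass toward the state that most likely produced it.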
![Page 5: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/5.jpg)
Understanding the belief state
• A belief is a probability distribution over states, where Dim(B) = |S|-1
– E.g. Let S={s1, s2}: one axis, P(s1), running from 0 to 1
– E.g. Let S={s1, s2, s3}: two axes, P(s1) and P(s2), each running from 0 to 1
– E.g. Let S={s1, s2, s3, s4}: three axes, P(s1), P(s2) and P(s3)
The first curse of POMDP planning
• The curse of dimensionality:
– dimension of planning problem = # of states
– related to the MDP curse of dimensionality
![Page 9: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/9.jpg)
POMDP value functions
V(b) = expected total discounted future reward starting from b
• Represent V as the upper surface of a set of hyper-planes.
• V is piecewise-linear and convex.
• Backup operator T: V → TV
$$V(b) = \max_{a \in A} \Big[ R(b,a) + \gamma \sum_{b' \in B} T(b,a,b')\, V(b') \Big]$$
[Figure: V(b) plotted against P(s1), with a belief b marked on the axis]
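To make the "upper surface of a set of hyper-planes" concrete, here is a small sketch in which each hyper-plane is a vector with one entry per state and V(b) is the largest dot product with the belief. The three vectors are made-up numbers chosen only for illustration.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def best_hyperplane(b, alphas):
    """Return the hyper-plane that is maximal at belief b."""
    return max(alphas, key=lambda alpha: dot(alpha, b))


def value(b, alphas):
    """V(b) = max over hyper-planes of alpha . b (piecewise-linear, convex)."""
    return dot(best_hyperplane(b, alphas), b)


# Hypothetical value function over two states, made of three hyper-planes.
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    b = [p, 1.0 - p]
    print(p, value(b, alphas))
```

The maximizing vector also serves as the gradient of V at b, which is why storing one vector per belief is enough to generalize to nearby beliefs.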
![Page 10: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/10.jpg)
Exact value iteration for POMDPs
• Simple problem: |S|=2, |A|=3, |Ω|=2

| Iteration | # hyper-planes |
|---|---|
| 0 | 1 |
| 1 | 3 |
| 2 | 27 |
| 3 | 2187 |
| 4 | 14,348,907 |

Many hyper-planes can be pruned away:

| Iteration | # hyper-planes |
|---|---|
| 0 | 1 |
| 1 | 3 |
| 2 | 5 |
| 3 | 9 |
| 4 | 7 |
| 5 | 13 |
| 10 | 27 |
| 15 | 47 |
| 20 | 59 |

[Figure: V_n(b) plotted against P(s1) at successive iterations, with a belief b marked on the axis]
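The unpruned counts on this slide (1, 3, 27, 2187, 14,348,907) are consistent with the recurrence in which each exact backup produces |A| times the previous count raised to the power |Ω|. The sketch below reproduces them; the function name is a placeholder chosen for illustration.

```python
def unpruned_hyperplanes(num_actions, num_observations, iterations):
    """Count hyper-planes per iteration when nothing is pruned.

    Each backup pairs every action with one previous hyper-plane
    per observation: count_next = |A| * count ** |Omega|.
    """
    counts = [1]
    for _ in range(iterations):
        counts.append(num_actions * counts[-1] ** num_observations)
    return counts


print(unpruned_hyperplanes(num_actions=3, num_observations=2, iterations=4))
# [1, 3, 27, 2187, 14348907]
```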
![Page 16: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/16.jpg)
Is pruning sufficient?
|S|=20, |A|=6, |Ω|=8

| Iteration | # hyper-planes |
|---|---|
| 0 | 1 |
| 1 | 5 |
| 2 | 213 |
| 3 | ????? |
| … | |

Not for this problem!
![Page 17: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/17.jpg)
Certainly not for this problem!
[Figure: map of the robot's environment, labelled Physiotherapy, Patient room, Robot home]
|S|=576, |A|=19, |O|=17
State Features: {RobotLocation, ReminderGoal, UserLocation, UserMotionGoal, UserStatus, UserSpeechGoal}
![Page 18: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/18.jpg)
The second curse of POMDP planning
• The curse of dimensionality:
– the dimension of each hyper-plane = # of states
• The curse of history:
– the number of hyper-planes grows exponentially with the planning horizon
Complexity of POMDP value iteration:
$$\underbrace{|S|^2}_{\text{dimensionality}} \; |A| \; \underbrace{|\Gamma_{n-1}|^{|\Omega|}}_{\text{history}}$$
where $|\Gamma_n|$ = # of hyper-planes at iteration n.
![Page 20: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/20.jpg)
Possible approximation approaches
• Ignore the belief: [Littman et al., 1995]
– overcomes both curses
– very fast
– performs poorly in high entropy beliefs
• Discretize the belief: [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 1998; Zhou & Hansen, 2001]
– overcomes the curse of history (sort of)
– scales exponentially with # states
• Compress the belief: [Poupart & Boutilier, 2002; Roy & Gordon, 2002]
– overcomes the curse of dimensionality
• Plan for trajectories: [Baxter & Bartlett, 2000; Ng & Jordan, 2002]
– can diminish both curses
– requires restricted policy class
– local minimum, small gradients
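As an illustration of the "ignore the belief" family, the sketch below follows the QMDP idea usually associated with the Littman et al. citation: solve the fully observable MDP, then weight its Q-values by the current belief. The model layout, the function names and the two-state example are assumptions made for this sketch.

```python
def qmdp_q_values(T, R, gamma, iterations=200):
    """Value iteration on the underlying MDP; returns Q[s][a].

    Assumed layout: T[s][a][s2] transition probabilities, R[s][a] rewards.
    """
    n_states, n_actions = len(T), len(T[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iterations):
        V = [max(q) for q in Q]
        Q = [
            [
                R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return Q


def qmdp_action(b, Q):
    """Pick the action with the highest belief-weighted Q-value."""
    n_actions = len(Q[0])
    return max(
        range(n_actions),
        key=lambda a: sum(b[s] * Q[s][a] for s in range(len(b))),
    )


# Hypothetical model: the state never changes; action i pays 1 in state i.
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
R = [[1.0, 0.0], [0.0, 1.0]]

Q = qmdp_q_values(T, R, gamma=0.9)
print(qmdp_action([0.8, 0.2], Q), qmdp_action([0.2, 0.8], Q))
```

Because the Q-values assume the state becomes known after one step, this rule never picks an action purely to gather information, which fits the slide's remark about poor performance in high entropy beliefs.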
![Page 21: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/21.jpg)
A new algorithm: Point-based value iteration
• Main idea:
– Select a small set of belief points → Focus on reachable beliefs
– Plan for those belief points only → Learn value and its gradient
[Figure: V(b) plotted against P(s1), with belief points b1, b0, b2; b1 and b2 are reached from b0 by (a,o) transitions]
![Page 25: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/25.jpg)
Point-based value update
• Initialize the value function (…and skip ahead a few iterations)
• For each b∈B:
– For each (a,o): Project forward b→b^{a,o} and find best value:
$$\alpha_b^{a,o}(s) \leftarrow V_n(b^{a,o})$$
(the hyper-plane of V_n that is maximal at b^{a,o})
– Sum over observations:
$$\alpha_a^b(s) = R(s,a) + \gamma \sum_{s',o} T(s,a,s')\, O(s',a,o)\, \alpha_b^{a,o}(s')$$
– Max over actions:
$$V_{n+1} \leftarrow \arg\max_{\alpha_a^b,\; a \in A} \; \alpha_a^b \cdot b$$
[Figure: V_n(b) plotted against P(s1); the belief b is projected to b^{a1,o1}, b^{a1,o2}, b^{a2,o1}, b^{a2,o2}, the per-action hyper-planes for a1 and a2 are built at b, and the resulting V_{n+1}(b) keeps one hyper-plane for each of b1, b0, b2]
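The three steps of the update (project, sum over observations, max over actions) can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the nested-list model layout, the function names and the example model are assumptions, and the projection step picks the back-projected vector that is best at b, which selects the same hyper-plane as evaluating V_n at b^{a,o} whenever the observation has non-zero probability.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(b, alphas, T, O, R, gamma):
    """Return the new hyper-plane for belief b.

    Assumed layout: T[s][a][s2], O[s2][a][o], R[s][a]; alphas is the
    list of hyper-planes representing V_n.
    """
    n_states, n_actions, n_obs = len(b), len(R[0]), len(O[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(n_actions):
        alpha_a = [R[s][a] for s in range(n_states)]  # immediate reward
        for o in range(n_obs):
            # Project every hyper-plane of V_n back through (a, o).
            projected = [
                [
                    sum(T[s][a][s2] * O[s2][a][o] * alpha[s2] for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in alphas
            ]
            # Keep the projection that is best at b, then sum over observations.
            g_best = max(projected, key=lambda g: dot(g, b))
            alpha_a = [x + gamma * g for x, g in zip(alpha_a, g_best)]
        val = dot(alpha_a, b)
        if val > best_val:  # max over actions
            best_vec, best_val = alpha_a, val
    return best_vec


def pbvi_update(B, alphas, T, O, R, gamma):
    """One value update: at most one hyper-plane per belief point."""
    return [point_based_backup(b, alphas, T, O, R, gamma) for b in B]


# Hypothetical two-state, two-action, two-observation model.
T = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.85, 0.15]], [[0.15, 0.85], [0.15, 0.85]]]
R = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

alphas = [[0.0, 0.0]]
for _ in range(3):
    alphas = pbvi_update(B, alphas, T, O, R, gamma=0.9)
print(alphas)
```

The value function never holds more than |B| vectors, which is what keeps each update polynomial in the size of the belief set.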
![Page 35: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/35.jpg)
Complexity of value update

| Step | Exact Update | Point-based Update |
|---|---|---|
| I - Projection | S²AΩΓn | S²AΩB |
| II - Sum | SAΓn^Ω | SAΩB² |
| III - Max | SAΓn^Ω | SAB |

where: S = # states, A = # actions, Ω = # observations, Γn = # solution vectors at iteration n, B = # belief points
n+1
![Page 36: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/36.jpg)
A bound on the approximation error
• Bound error of the point-based backup operator.
• Bound depends on how densely we sample belief points.
– Let $\bar{\Delta}$ be the set of reachable beliefs.
– Let B be the set of belief points.
Theorem: For any belief set B and any horizon n, the error of the PBVI algorithm $\epsilon_n = \|V_n^B - V_n^*\|_\infty$ is bounded by:
$$\epsilon_n \le \frac{(R_{\max} - R_{\min})\, \epsilon_B}{(1-\gamma)^2}, \qquad \text{where } \epsilon_B = \max_{b' \in \bar{\Delta}} \min_{b \in B} \|b - b'\|_1$$
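The bound can be evaluated numerically once the density term is known. The sketch below computes that term against a finite sample of reachable beliefs, which is an assumption made for illustration since the full reachable set is generally not enumerable. The function names and numbers are hypothetical.

```python
def l1_distance(b1, b2):
    return sum(abs(x - y) for x, y in zip(b1, b2))


def belief_set_density(B, reachable_sample):
    """Largest L1 distance from any sampled reachable belief to its nearest point in B."""
    return max(min(l1_distance(b, bp) for b in B) for bp in reachable_sample)


def pbvi_error_bound(r_max, r_min, gamma, eps_B):
    """(R_max - R_min) * eps_B / (1 - gamma)^2, as stated in the theorem."""
    return (r_max - r_min) * eps_B / (1.0 - gamma) ** 2


# Hypothetical belief set and reachable sample over two states.
B = [[0.5, 0.5]]
reachable_sample = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]

eps_B = belief_set_density(B, reachable_sample)
print(eps_B, pbvi_error_bound(r_max=1.0, r_min=0.0, gamma=0.5, eps_B=eps_B))
```

Adding [1.0, 0.0] to B would shrink the density term, and with it the bound, which is the motivation for spreading belief points widely.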
![Page 37: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/37.jpg)
Experimental results: Lasertag domain
State space = RobotPosition × OpponentPosition
Observable: RobotPosition - always
OpponentPosition - only if same as Robot
Action space = {North, South, East, West, Tag}
Opponent strategy: Move away from robot w/ Pr=0.8
|S|=870, |A|=5, |Ω|=30
![Page 38: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/38.jpg)
Performance of PBVI on Lasertag domain
Opponent tagged 70% of trials
Opponent tagged 17% of trials
![Page 39: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/39.jpg)
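Performance on well-known POMDPs

Maze33: |S|=36, |A|=5, |Ω|=17

| Method | Reward | Time(s) | B |
|---|---|---|---|
| QMDP | 0.198 | 0.19 | - |
| Grid | 0.94 | n.v. | 174 |
| PBUA | 2.30 | 12166 | 660 |
| PBVI | 2.25 | 3448 | 470 |

Hallway: |S|=60, |A|=5, |Ω|=20

| Method | %Goal | Reward | Time(s) | B |
|---|---|---|---|---|
| QMDP | 47 | 0.261 | 0.51 | - |
| Grid | n.v. | n.v. | n.v. | n.v. |
| PBUA | 100 | 0.53 | 450 | 300 |
| PBVI | 95 | 0.53 | 288 | 86 |

Hallway2: |S|=92, |A|=5, |Ω|=17

| Method | %Goal | Reward | Time(s) | B |
|---|---|---|---|---|
| QMDP | 22 | 0.109 | 1.44 | - |
| Grid | 98 | n.v. | n.v. | 337 |
| PBUA | 100 | 0.35 | 27898 | 1840 |
| PBVI | 98 | 0.34 | 360 | 95 |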
![Page 40: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/40.jpg)
Selecting good belief points
• What can we learn from policy search methods?
– Focus on reachable beliefs.
• How can we avoid including all reachable beliefs?
– Reachability analysis considers all actions, but samples observations stochastically.
• What can we learn from our error bound?
– Select widely-spaced beliefs, rather than near-by beliefs.
$$\epsilon_n \le \frac{(R_{\max} - R_{\min}) \, \max_{b' \in \bar{\Delta}} \min_{b \in B} \|b - b'\|_1}{(1-\gamma)^2}$$
[Figure: belief b on the P(s1) axis with successors b^{a1,o1}, b^{a1,o2}, b^{a2,o1}, b^{a2,o2}; only one sampled successor per action (b^{a1,o2}, b^{a2,o1}) is kept as a candidate]
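One way to read the three points above as a procedure: for every belief already in the set, simulate each action once with a sampled observation, then keep only the successor that lies farthest (in L1 distance) from the current set. The sketch below follows that reading; the function names, the model layout and the example model are assumptions, not taken from the slides.

```python
import random


def l1_distance(b1, b2):
    return sum(abs(x - y) for x, y in zip(b1, b2))


def belief_update(b, a, o, T, O):
    """Bayes filter with assumed layout T[s][a][s2], O[s2][a][o]."""
    n = len(b)
    unnorm = [O[s2][a][o] * sum(T[s][a][s2] * b[s] for s in range(n)) for s2 in range(n)]
    total = sum(unnorm)
    return [p / total for p in unnorm]


def expand_beliefs(B, T, O, rng):
    """Add at most one new, widely spaced, reachable belief per existing point."""
    n_states, n_actions = len(B[0]), len(T[0])
    new_B = list(B)
    for b in B:
        candidates = []
        for a in range(n_actions):  # consider all actions
            s = rng.choices(range(n_states), weights=b)[0]
            s2 = rng.choices(range(n_states), weights=T[s][a])[0]
            o = rng.choices(range(len(O[s2][a])), weights=O[s2][a])[0]  # sampled observation
            candidates.append(belief_update(b, a, o, T, O))
        # Keep the candidate farthest from everything already selected.
        best = max(candidates, key=lambda c: min(l1_distance(c, x) for x in new_B))
        if min(l1_distance(best, x) for x in new_B) > 1e-9:
            new_B.append(best)
    return new_B


# Hypothetical model: action 0 listens (informative), action 1 resets (uninformative).
T = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]]]
O = [[[0.85, 0.15], [0.5, 0.5]], [[0.15, 0.85], [0.5, 0.5]]]

B = expand_beliefs([[0.5, 0.5]], T, O, random.Random(0))
print(B)
```

Each call can at most double the size of the set, since every existing point contributes no more than one successor.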
![Page 43: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/43.jpg)
Validation of the belief expansion heuristic
• Hallway domain: |S|=60, |A|=5, |Ω|=20
![Page 44: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/44.jpg)
Validation of the belief expansion heuristic
• Tag domain: |S|=870, |A|=5, |Ω|=30
![Page 45: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/45.jpg)
The anytime PBVI algorithm
• Alternate between:
– Growing the set of belief points (e.g. B doubles in size every time)
– Planning for those belief points
• Terminate when you run out of time or have a good policy.
• Lasertag results:
– 13 phases: |B|=1334
– ran out of time!
• Hallway2 results:
– 8 phases: |B|=95
– found good policy.
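The alternation described above can be sketched as a generic loop. The expansion, planning and stopping steps are passed in as callables, so this shows only the control flow; all names and the stand-in functions in the example are hypothetical.

```python
import time


def anytime_pbvi(B0, expand, plan, good_enough, time_budget_s, clock=time.monotonic):
    """Alternate between growing the belief set and planning over it.

    expand(B) -> larger belief set, plan(B) -> policy,
    good_enough(policy) -> bool. Returns (policy, B, phases).
    """
    B = list(B0)
    deadline = clock() + time_budget_s
    policy = plan(B)
    phases = 1
    # Stop when time runs out or the policy is judged good.
    while clock() < deadline and not good_enough(policy):
        B = expand(B)
        policy = plan(B)
        phases += 1
    return policy, B, phases


# Stand-in steps: the "policy" is just the belief-set size,
# and it counts as good once the set holds at least 8 points.
policy, B, phases = anytime_pbvi(
    B0=[0],
    expand=lambda B: B + B,           # doubles the set every phase
    plan=lambda B: len(B),
    good_enough=lambda p: p >= 8,
    time_budget_s=60.0,
)
print(policy, len(B), phases)
```

A policy is available after every phase, so the loop can be interrupted at any point and still return the best result so far.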
![Page 48: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/48.jpg)
Summary
• POMDPs suffer from the curse of history
» # of beliefs grows exponentially with the planning horizon
• PBVI addresses the curse of history by limiting planning to a small set of likely beliefs.
• Strengths of PBVI include:
» anytime algorithm;
» polynomial-time value updates;
» bounded approximation error;
» empirical results showing we can solve problems up to 870 states.
![Page 49: Fast approximate POMDP planning: Overcoming the curse of history!](https://reader035.vdocuments.net/reader035/viewer/2022062322/56814691550346895db3afd9/html5/thumbnails/49.jpg)
Recent work
• Current hurdle to solving even larger POMDPs:
PBVI complexity is O(S²AΩB + SAΩB²)
– Addressing S²:
» Combine PBVI with belief compression techniques.
But sparse transition matrices mean: S² → S
– Addressing B²:
» Use ball-trees to structure belief points.
» Find better belief selection heuristics.