generating plans in concurrent, probabilistic, oversubscribed domains li li and nilufer onder...
TRANSCRIPT
![Page 1: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/1.jpg)
Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains
Li Li and Nilufer OnderDepartment of Computer ScienceMichigan Technological University
(Presented by: Li Li)AAAI 08 Chicago
July 16, 2008
![Page 2: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/2.jpg)
Outline
Example domain Two usages of concurrent actions AO* and CPOAO* algorithms Heuristics used in CPOAO* Experiment results Conclusion and future work
![Page 3: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/3.jpg)
A simple Mars rover domain
Locations A, B, C and D on Mars:
A
B
C
D
![Page 4: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/4.jpg)
Main features
Aspects of complex domains Deadlines, limited resources Failures Oversubscription Concurrency
Two types of parallel actions Different goals (“all finish”) Redundant (“early finish”)
Aborting actions When they succeed When they fail
![Page 5: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/5.jpg)
The actions
Action Success probability
Description
Move(L1,L2) 100% Move the rover from Location L1 to location L2
Sample (L) 70% Collect a soil sample at location L
Camera (L) 60% Take a picture at location L
![Page 6: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/6.jpg)
Problem 1
Initial state: The rover is at location A No other rewards have been achieved
Rewards: r1 = 10: Get back to location A r2 = 2: Take a picture at location B r3 = 1: Collect a soil sample at
location B r4 = 3: Take a picture at location C
![Page 7: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/7.jpg)
Problem 1
Time Limit: The rover is only allowed to operate
for 3 time units Actions:
Each action takes 1 time unit to finish Actions can be executed in parallel if
they are compatible
![Page 8: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/8.jpg)
A solution to problem 1
(1) Move (A, B) (2) Camera (B)
Sample (B) (3) Move (B, A)
A
B
D
C R4=3
R1=10
R2=2R3=1
![Page 9: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/9.jpg)
Add redundant actions
Actions Camera0 (60%) andCamera1 (50%) can be executed concurrently.
There are two rewards: R1: Take a picture P1 at location A R2: Take a picture P2 at location A
![Page 10: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/10.jpg)
Two ways of using concurrent actions
All finish: Speed up the executionUse concurrent actions to achieve different goals.
Early finish: Redundancy for critical tasksUse concurrent actions to achieve the same goal.
![Page 11: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/11.jpg)
Example of all finish actions
If R1=10 and R2=10, execute Camera0 to achieve one reward
and execute Camera1 to achieve another.(All finish)
The expected total rewards = 10*60% + 10*50%
= 11
![Page 12: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/12.jpg)
Example of early finish actions
If R1=100 and R2=10, Use both Camera0 and Camera1 to achieve R1. (Early finish)
The expected total rewards = 100*50% + (100-100*50%)*60% = 50 + 30 = 80
![Page 13: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/13.jpg)
The AO* algorithm
AO* searches in an and-or graph (hypergraph)
OR
AND Hyperarcs (compact way)
![Page 14: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/14.jpg)
Concurrent Probabilistic Over-subscription AO* (CPOAO*)
Concurrent action set Represent parallel actions rather than individua
l actions Use hyperarcs to represent them
State Space Resource levels are part of a state Unfinished actions are part of a state
![Page 15: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/15.jpg)
CPOAO* search Example
A
B
C
D
3
4
4
5
6
A Mars Rover problemMap:
Actions: Move (Location, Location) Image_S (Target) 50%, T= 4 Image_L (Target) 60%, T= 5 Sample (Target) 70%, T= 6
Targets: I1 – Image at location B I2 – Image at location C S1 – Sample at location B S2 – Sample at location D
Rewards: Have_Picture(I1) = 3 Have_Picture(I2) = 4 Have_Sample(S1) = 4 Have_Sample(S2) = 5 At_location(A) = 10;
![Page 16: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/16.jpg)
CPOAO* search Example
18.4
S0
A
B
C
D
3
4
4
5
6
Expected reward calculated using the heuristics
T=10
![Page 17: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/17.jpg)
CPOAO* Search Example
15.2
S0
T=10
{Move(A,B)} {Move(A,C)
}
Do-nothing
{Move(A,D)}
3 04 6
S1
S2
S3
S415.
213.2 0 10
T=7 T=4T=6 T=10
A
B
C
D
3
4
4
5
6
Values of terminal nodes
Expected reward calculated using the heuristics
Expected reward calculated from children
Best action
![Page 18: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/18.jpg)
CPOAO* search Example
S1
15.2
{Move(B,D)}
Do-nothing
{Move(A,B)}
{a1,a2,a3}
44
0
a1: Sample(T2)
a3: Image_L(T1)
a2: Image_S(T1)
S5
S6
S7
S815.
814.6
0 0T=3 T=3 T=3 T=750% 50%
Sample(T2)=2 Sample(T2)=2Image_L(T1)=1 Image_L(T1)=1
A
B
C
D
3
4
4
5
6
Values of terminal nodes
Expected reward calculated using the heuristics
Expected reward calculated from children
Best action
S0
15.2
![Page 19: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/19.jpg)
CPOAO* search Example
S1
15.2
{Move(B,D)}
Do-nothing
{Move(A,B)}
{a1,a2,a3}
44
0
a1: Sample(T2)
a3: Image_L(T1)
a2: Image_S(T1)
S5
S6
S7
S815.
814.6
0 0T=3 T=3 T=3 T=750% 50%
Sample(T2)=2 Sample(T2)=2Image_L(T1)=1 Image_L(T1)=1
A
C
D
3
4
4
5
6
Values of terminal nodes
Expected reward calculated using the heuristics
Expected reward calculated from children
Best action
S0
15.2
![Page 20: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/20.jpg)
CPOAO* search Example
S1
11.5
{Move(B,D)}
Do-nothing
{Move(A,B)}
{a1,a2,a3}
44
0
a1: Sample(T2)
a3: Image_L(T1)
a2: Image_S(T1)
S5 S
6 S7
S8
13 10
0 0
T=3
T=2
T=7
50%
Sample(T2)=2Image_L(T1)=1
S10S9
S11
S12
S13 S14
S15
S167 3 13 3 5.8 2.8 010T=1
T=1
T=2T=3T=0 T=0 T=3
a4: Move(B,A)
{a1}
2 1{a1,a2}
a5: Do-nothing
{a4}
{a5}
{a4}
{a5}
3 3 0
0
A
B
C
D
3
4
4
5
6
Values of terminal nodes
Expected reward calculated using the heuristics
Expected reward calculated from children
Best action
S0 13.
2
S213.2
![Page 21: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/21.jpg)
CPOAO* search Example
13.2
S0
T=10
{Move(A,B)} {Move(A,C)
}
Do-nothing
{Move(A,D)}
3 04 6
S1
S2
S3
S4
11.5
13.2 0 10T=7
T=4T=6 T=10
{a1,a2,a3}
a1: Sample(T2)
a3: Image_L(T1)
a2: Image_S(T1)
a4: Move(B,A)a5: Do-nothing
A
B
C
D
3
4
4
5
6
Values of terminal nodes
Expected reward calculated using the heuristics
Expected reward calculated from children
Best action
![Page 22: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/22.jpg)
CPOAO* search Example
11.5
S0
T=10
{Move(A,B)} {Move(A,C)
}
Do-nothing
{Move(A,D)}
3 04 6
S1
S2 S
3
S4
11.5
3.2
0 10T=7
T=4 T=10
{a1,a2,a3}
T=6
S17
S18
S19
S20
T=2 T=2 T=1 T=64 2.4 0 0
a1: Sample(T2)
a3: Image_L(T1)
a2: Image_S(T1)
a4: Move(B,A)a5: Do-nothing
a6: Image_S(T3)
a8: Move(C,D)
{a6,a7} {a8
}
Do-nothing04 5
50% 50%
A
B
C
D
3
4
4
5
6
Values of terminal nodes
Expected reward calculated using the heuristics
Expected reward calculated from children
Best action
a7: Image_L(T3)
![Page 23: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/23.jpg)
CPOAO* search Example
S1
11.5
{Move(B,D)}
Do-nothing
{Move(A,B)}
{a1,a2,a3}
44
0
a1: Sample(T1)
a3: Image_L(T2)
a2: Image_S(T2)
S5
S6 S
7
S8
13
10
0 0
T=3
T=2
T=7
50%
Sample(T2)=2Image_L(T1)=1
S10S9
S11
S12
S13 S14
S15
S165 3 13 3 4.4 1.4 010T=1
T=1
T=2T=3T=0 T=0 T=3
a4: Move(B,A)
{a1}
2 1{a1,a2}
a5: Do-nothing
{a4}
{a5}
{a4}
{a5}
3 3 0
0
11.5
S0
A
B
C
D
3
4
4
5
6
Values of terminal nodes
Expected reward calculated using the heuristics
Expected reward calculated from children
Best action
T=3
![Page 24: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/24.jpg)
CPOAO* search improvements
S1
11.5
{Move(B,D)}
Do-nothing
{Move(A,B)}
{a1,a2,a3}
44
0S5
S6 S
7
S8
13
10
0 0
T=3
T=2
T=7
50%
Sample(T2)=2Image_L(T1)=1
S10S9
S11
S12
S13 S14
S15
S165 3 13 3 4.4 1.4 010T=1
T=1
T=2T=3T=0 T=0 T=3
{a1}
2 1{a1,a2}{a4
}
{a5}
{a4}
{a5}
3 3 0
0
11.5
S0Estimate total expected rewards
Prune branches
T=3
Plan Found: Move(A,B) Image_S(T1) Move(B,A)
![Page 25: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/25.jpg)
Heuristics used in CPOAO*
A heuristic function to estimate the total expected reward for the newly generated states using a reverse plan graph.
A group of rules to prune the branches of the concurrent action sets.
![Page 26: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/26.jpg)
Estimating total rewards A three-step process using an rpgraph
1. Generate an rpgraph for each goal2. Identify the enabled propositions3. Compute the probability of achieving e
ach goal Compute the expected rewards based
on the probabilities Sum up the rewards to compute the val
ue of this state
![Page 27: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/27.jpg)
Heuristics to estimate the total rewards
Have_Picture(I1)
At_Location(B)
Image_S(I1)
Move(A,B)At_Location(A)
At_Location(D)
Reverse plan graph
Start from goals.
Layers of actions and propositions
Cost marked on the actions
Accumulated cost marked on the propositions.
4
5
8
4
8
7
4
5 3
3
Move(A,B)Image_L(I1)
At_Location(B)
Move(D,B)At_Location(D)
At_Location(A)
Move(D,B)4
9
![Page 28: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/28.jpg)
Heuristics to estimate the total rewards
Given a specific state …
At_Location(A)=T
Time= 7 Enabled propositi
ons are marked in blue.
Have_Picture(I1)
At_Location(B)
Image_S(I1)
Move(A,B)At_Location(A)
At_Location(D)4
5
8
4
8
7
4
5 3
3
Move(A,B)Image_L(I1)
At_Location(B)
Move(D,B)At_Location(D)
At_Location(A)
Move(D,B)4
9
![Page 29: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/29.jpg)
Heuristics to estimate the total rewards
Enable more actions and propositions
Actions are probabilistic
Estimate the probabilities that propositions and the goals being true
Sum up rewards on all goals.
Have_Picture(I1)
At_Location(B)
Image_S(I1)
Move(A,B)At_Location(A)
At_Location(D)4
5
8
4
8
7
4
5 3
3
Move(A,B)Image_L(I1)
At_Location(B)
Move(D,B)
Move(D,B)4
9
![Page 30: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/30.jpg)
Rules to prune branches (when time is the only resource)
Include the action if it does not delete anything
Ex. {action-1, action-2, action-3} is better than {action-2,action-3} if action-1 does not delete
anything. Include the action if it can be aborted later Ex. {action-1,action-2} is better than {action-1} if the duration of action2 is longer than the duration
of action-1. Don’t abort an action and restart it again i
mmediately
![Page 31: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/31.jpg)
Experimental work
Mars rover problem with following actions Move Collect-Soil-Sample Camera0 Camera1
3 sets of experiments - Gradually increase complexity Base Algorithm With rules to prune branches only With both heuristics
![Page 32: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/32.jpg)
Results of complexity experimentsProblem Base CPOAO* CPOAO* With Pruning Only CPOAO* With Pruning and rpgraph
Time=20 Time=20 Time=40 Time=20 Time=40
NG ET (s) NG ET (s) NG ET (s) NG ET (s) NG ET (s)
12-12-12 530 <0.1 120 <0.1 2832 1 56 <0.1 662 <0.1
12-12-23 1170 <0.1 287 <0.1 27914 23 86 <0.1 5269 2
12-21-12 501 <0.1 204 <0.1 11315 3 53 <0.1 1391 <0.1
12-21-23 1230 <0.1 380 <0.1 85203 99 116 <0.1 11833 5
15-16-14 1067 <0.1 180 <0.1 6306 2 60 <0.1 1232 1
15-16-31 1941 <0.1 417 <0.1 49954 61 93 <0.1 7946 2
15-28-14 1121 <0.1 347 <0.1 29760 18 71 <0.1 2460 <0.1
15-28-31 2345 <0.1 694 <0.1 --- --- 146 <0.1 19815 8
Problem: locations-paths-rewards; NG: The number of nodes generated; ET: Execution time (sec.)
![Page 33: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/33.jpg)
Ongoing and Future Work
Ongoing Add typed objects and lifted actions Add linear resource consumptions
Future Explore the possibility of using state caching Classify domains and develop domain-specific h
euristic functions Approximation techniques
![Page 34: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/34.jpg)
Related Work LAO*: A Heuristic Search Algorithm that finds soluti
ons with loops (Hansen & Zilberstein 2001) CoMDP: Concurrent MDP (Mausam & Weld 2004) GSMDP: Generalized semi-Markov decision
process (Younes & Simmons 2004) mGPT: A Probabilistic Planner based on Heuristic S
earch (Bonet & Geffner 2005) Over-subscription Planning (Smith 2004; Benton, D
o, & Kambhampati 2005) HAO*: Planning with Continuous Resources in Stoc
hastic Domains (Mausam, Benazera, Brafman, Meuleau & Hansen 2005)
![Page 35: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/35.jpg)
Related Work CPTP:Concurrent Probabilistic Temporal
Planning (Mausam & Weld 2005) Paragraph/Prottle: Concurrent Probabilistic
Planning in the Graphplan Framework (Little & Thiebaux 2006)
FPG: Factored Policy Gradient Planner (Buffet & Aberdeen 2006)
Probabilistic Temporal Planning with Uncertain Durations (Mausam & Weld 2006)
HYBPLAN:A Hybridized Planner for Stochastic Domains (Mausam, Bertoli and Weld 2007)
![Page 36: Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains Li Li and Nilufer Onder Department of Computer Science Michigan Technological University](https://reader030.vdocuments.net/reader030/viewer/2022032516/56649c7c5503460f94930425/html5/thumbnails/36.jpg)
Conclusion
An AO* based modular framework Use redundant actions to increase
robustness Abort running actions when needed Heuristic function using reverse plan gra
ph Rules to prune branches