
Page 1:

15-887

Planning, Execution and Learning

Learning in Planning:

Experience Graphs

Maxim Likhachev

Robotics Institute

Carnegie Mellon University

Page 2:

Robots Often Perform Repetitive Tasks


[Cowley et al., ‘13]

Page 3:

Robots Often Perform Repetitive Tasks


[Cowley et al., ‘13]

Can we re-use prior experiences to accelerate heuristic search?

Page 4:

Learning from Demonstrations


[Phillips et al., ‘13]

Page 5:

Learning from Demonstrations


[Phillips et al., ‘13]

Can we re-use prior demonstrations to accelerate heuristic search?

Page 6:

Experience Graphs [Phillips et al., RSS’12]

Consider the original graph G = {V, E}


Page 7:

Experience Graphs [Phillips et al., RSS’12]

Given a set of previous paths (experiences or demonstrations)…


Page 8:

Experience Graphs [Phillips et al., RSS’12]

Put them together into an E-graph (Experience Graph), G^E = {V^E, E^E}
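A minimal sketch of this union step (my own illustration, not from the slides), assuming states are hashable and cost(u, v) returns the original edge cost c(u, v):

# Build an E-graph G^E = {V^E, E^E} as the union of previously executed paths.
# `paths` is a list of state sequences; `cost` is the original edge-cost function c(u, v).
def build_egraph(paths, cost):
    V_E = set()
    E_E = {}  # maps (u, v) -> c^E(u, v), equal to the original cost c(u, v)
    for path in paths:
        V_E.update(path)
        for u, v in zip(path, path[1:]):
            E_E[(u, v)] = cost(u, v)
            E_E[(v, u)] = cost(v, u)  # assumption: edges are traversable in both directions
    return V_E, E_E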


Page 9:

Experience Graphs [Phillips et al., RSS’12]

Given a new planning query…

Page 10:

Experience Graphs [Phillips et al., RSS’12]

…re-use the E-graph G^E. For repetitive tasks, planning becomes much faster.

[Figure: search from start to goal; states expanded shown in grey]

Page 11:

Experience Graphs [Phillips et al., RSS’12]

…re-use the E-graph G^E. For repetitive tasks, planning becomes much faster.

[Figure: search from start to goal; states expanded shown in grey]

For which domains would this be useful, and for which useless?

Page 12:

Experience Graphs [Phillips et al., RSS’12]

…re-use the E-graph G^E. For repetitive tasks, planning becomes much faster.

[Figure: search from start to goal; states expanded shown in grey]

General idea: instead of using a traditional heuristic that biases the search towards the goal, compute a new heuristic h^E(s) that biases the search towards the set of paths in G^E.

Page 13:

Experience Graphs [Phillips et al., RSS’12]

…re-use the E-graph G^E. For repetitive tasks, planning becomes much faster.

[Figure: search from start to goal; states expanded shown in grey]

For any state s_0 in G, the new heuristic function h^E() is:

h^E(s_0) = min over paths π = ⟨s_0, s_1, …, s_{N-1} = s_goal⟩ of Σ_{i=1}^{N-1} min{ ε^E·h^G(s_{i-1}, s_i), c^E(s_{i-1}, s_i) }

Page 14:

Experience Graphs [Phillips et al., RSS’12]

…re-use the E-graph G^E. For repetitive tasks, planning becomes much faster.

[Figure: search from start to goal; states expanded shown in grey]

The heuristic h^E(s) biases the search towards those paths that can be used to get to the goal with the least amount of traversal outside of G^E.

Page 15:

Planning with Experience Graphs Pseudocode


Given a planning graph G = {V, E}
Initialize the E-graph G^E = {V^E = ∅, E^E = ∅}
Every time a new planning query (s_start, s_goal) comes in:
  re-compute the heuristic h^E
  run weighted A* search with the heuristic h^E inflated by weight w
  execute the found path π
  G^E = G^E ∪ π  // add the found path to the E-graph
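The same loop as a Python sketch; recompute_heuristic, weighted_astar, and execute are hypothetical placeholders for the steps above, not APIs from the slides:

# One planning episode with an E-graph, mirroring the pseudocode above.
def plan_with_egraph(G, V_E, E_E, s_start, s_goal, w, eps_E, cost):
    h_E = recompute_heuristic(G, V_E, E_E, s_goal, eps_E)  # see the h^E slides below
    path = weighted_astar(G, s_start, s_goal, h_E, w)      # weighted A* with h^E inflated by w
    if path is not None:
        execute(path)                                      # hypothetical execution hook
        V_E.update(path)                                   # G^E = G^E ∪ π
        for u, v in zip(path, path[1:]):
            E_E[(u, v)] = cost(u, v)
    return path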

Page 16:

Planning with Experience Graphs Pseudocode


(Pseudocode as above.)

For any state s, h^E(s) is the cost of a least-cost path from s to s_goal in a graph G' = {V, E'}, where E' consists of:
a) all edges (u, v) in E^E, with cost c^E(u, v) (i.e., the original cost c(u, v))
b) all pairs (u, v) in V, with cost ε^E·h^G(u, v)
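For intuition, a made-up numeric example (the numbers are mine): take ε^E = 10, a state s with h^G(s, u) = 1 for some E-graph vertex u, an E-graph path from u to s_goal of cost 20, and h^G(s, s_goal) = 15. In G', routing through the E-graph costs 10·1 + 20 = 30, while the direct pair edge costs 10·15 = 150, so h^E(s) = 30 and the search is pulled strongly onto the stored experience.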

Page 17:

Planning with Experience Graphs Pseudocode


(Pseudocode and the h^E definition as above.)

Time for the whiteboard example

What is h^E() with no prior experiences (G^E = ∅)?

How do you efficiently compute h^E()?

Page 18:

Efficient Computation of h^E


Re-compute the heuristic h^E for all states s' in V^E ∪ {s_goal}:

run a single Dijkstra search on the graph G^f = {V^f, E^f}, where
a) V^f = V^E ∪ {s_goal} and E^f = all pairs of states in V^f
b) cost(u, v) = c(u, v) if (u, v) is in E^E
   cost(u, v) = ε^E·h^G(u, v) otherwise

During the weighted A* search itself, for any state s:

h^E(s) = min over s' in V^f of [ ε^E·h^G(s, s') + d^f(s', s_goal) ]

where d^f(s', s_goal) is the goal distance computed by the Dijkstra search above.
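A minimal Python sketch of this two-phase computation (my own code, assuming hashable states, symmetric edge costs, and a pairwise heuristic h_G(u, v)):

import heapq

def precompute_egraph_distances(V_E, E_E, s_goal, h_G, eps_E):
    # Single Dijkstra search from s_goal over G^f = {V^f, E^f}:
    # V^f = V^E ∪ {s_goal}, E^f = all pairs of states in V^f,
    # cost(u, v) = c(u, v) if (u, v) in E^E, else eps_E * h_G(u, v).
    nodes = set(V_E) | {s_goal}
    dist = {n: float('inf') for n in nodes}
    dist[s_goal] = 0.0
    pq = [(0.0, s_goal)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale priority-queue entry
        for v in nodes:
            if v == u:
                continue
            c = E_E[(u, v)] if (u, v) in E_E else eps_E * h_G(u, v)
            if d + c < dist[v]:
                dist[v] = d + c
                heapq.heappush(pq, (d + c, v))
    return dist  # dist[s'] = d^f(s', s_goal) for every s' in V^f

def h_E(s, dist, h_G, eps_E):
    # During the weighted A* search: jump from s to the best state in V^f and
    # follow its precomputed goal distance. Since dist[s_goal] = 0, the direct
    # option eps_E * h_G(s, s_goal) is covered as well.
    return min(eps_E * h_G(s, sp) + d for sp, d in dist.items())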

Page 19:

Planning with Experience Graphs Pseudocode


(Pseudocode as above.)

The heuristic h^E(s) is guaranteed to be ε^E-consistent.
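For reference, the standard definition (not stated on the slide): a heuristic h is ε-consistent if h(s_goal) = 0 and h(s) ≤ ε·c(s, s') + h(s') for every edge (s, s'). This is the property that yields the suboptimality bound in Theorem 2 on the next slide.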

Page 20:

Planning with Experience Graphs Pseudocode


(Pseudocode and the ε^E-consistency note as above.)

Theorem 1: Planning with E-graphs is complete with respect to the original graph.

Theorem 2: When running weighted A* with inflation w:
cost(solution) ≤ w·ε^E·cost(optimal solution)
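For concreteness, with made-up numbers: running weighted A* with w = 2 and ε^E = 3 guarantees cost(solution) ≤ 2·3·cost(optimal solution), i.e., at most 6× optimal. The stronger the bias towards experience (larger ε^E), the looser the guarantee.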

Page 21:

Effect of ε^E

• ε^E controls how much the search can rely on experiences/demonstrations in the E-graph
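For intuition: with ε^E = 1 and a consistent h^G, h^E reduces to h^G, because the direct pair edge from s to s_goal is never more expensive than any detour; as ε^E grows, E-graph edges become relatively cheaper in G', so the search follows stored experience whenever it plausibly helps.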


Page 22:

Summary

• Planning with Experience Graphs: a method for biasing the search towards the reuse of previously planned paths and demonstrations
  - Based on the idea of re-computing the heuristic to guide the search towards a set of paths rather than towards a goal
  - Not all domains may benefit from reusing prior paths
  - Useful in domains where a robot has a similar workspace across planning instances (e.g., manipulation for manufacturing)
