Knowledge Representation Meets Stochastic Planning

Bob Givan, joint work with Alan Fern and SungWook Yoon

Electrical and Computer Engineering, Purdue University



Page 1: Knowledge Representation Meets Stochastic Planning

Knowledge Representation Meets

Stochastic Planning

Bob Givan, joint work with Alan Fern and SungWook Yoon

Electrical and Computer Engineering

Purdue University

Page 2: Knowledge Representation Meets Stochastic Planning


Overview

We present a form of approximate policy iteration specifically designed for large relational MDPs.

We describe a novel application: viewing entire planning domains as MDPs, we automatically induce domain-specific planners.

Induced planners are state-of-the-art on deterministic planning benchmarks and on stochastic variants of those benchmarks.

Page 3: Knowledge Representation Meets Stochastic Planning


Ideas from Two Communities

Traditional Planning: induction of control knowledge, planning heuristics

Decision-theoretic Planning: policy rollout, approximate policy iteration (API)

Two views of the new technique: iterative improvement of control knowledge; API with a policy-space bias

Page 4: Knowledge Representation Meets Stochastic Planning


Planning Problems

[Figure: current state → ? → goal state/region]

States: First-order Interpretations of a particular language

A planning problem gives: a current state, a goal state, and a list of actions with their semantics (which may be stochastic).

Available actions:

Pickup(x) PutDown(y)
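To make these ingredients concrete, here is a minimal sketch of a blocks-world planning problem in code. The State encoding and the simplified, deterministic Pickup/PutDown semantics are illustrative assumptions, not the relational representation used in the talk (which also allows stochastic action effects).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Minimal blocks-world planning problem (illustrative sketch only).
@dataclass(frozen=True)
class State:
    on: Tuple[Tuple[str, str], ...]      # pairs (x, y): block x sits on block y or on "table"
    holding: Optional[str] = None        # block currently held, if any

def clear(state, x):
    """A block is clear if nothing is on it and it is not being held."""
    return x != state.holding and all(below != x for (_, below) in state.on)

def pickup(state, x):
    """Pick up a clear block x when the hand is empty; otherwise no effect."""
    if state.holding is None and clear(state, x):
        return State(on=tuple(p for p in state.on if p[0] != x), holding=x)
    return state

def putdown(state, y):
    """Put the held block down on y (a clear block or the table); otherwise no effect."""
    if state.holding is not None and (y == "table" or clear(state, y)):
        return State(on=state.on + ((state.holding, y),), holding=None)
    return state

# A planning problem: a current state, a goal test, and the available actions.
start = State(on=(("A", "B"), ("B", "table"), ("C", "table")))
goal = lambda s: ("A", "table") in s.on
print(goal(putdown(pickup(start, "A"), "table")))   # -> True
```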

Page 5: Knowledge Representation Meets Stochastic Planning


Planning Domains

Blocks World Domain

Distributions over problems sharing one set of actions (but with different domains and sizes)


Available actions:

Pickup(x) PutDown(y)

Page 6: Knowledge Representation Meets Stochastic Planning


Traditional planners solve problems, not domains; there is little or no generalization between problems in a domain.

Planning domains are “solved” by control knowledge that prunes some actions, typically eliminating search.

Control Knowledge


e.g. “don’t pick up a solved block”

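Control knowledge of this kind can be read as an action filter. A hedged sketch of that reading (the dictionary encodings and the solved predicate below are illustrative stand-ins, not the paper's representation):

```python
# Sketch: control knowledge as an action filter over a blocks-world state.
# 'state_on'/'goal_on' map each block to what it sits on; actions are (name, block) tuples.
def solved(block, state_on, goal_on):
    """A block is solved if it is where the goal wants it, and so is everything below it."""
    if block == "table":
        return True
    want = goal_on.get(block)
    return want is not None and state_on.get(block) == want and solved(want, state_on, goal_on)

def prune_actions(actions, state_on, goal_on):
    """Rule: "don't pick up a solved block" -- drop Pickup(x) when x is already solved."""
    return [a for a in actions
            if not (a[0] == "Pickup" and solved(a[1], state_on, goal_on))]

# Example: goal is A on B on the table; B is already solved, A is not.
state_on = {"A": "C", "B": "table", "C": "table"}
goal_on  = {"A": "B", "B": "table"}
print(prune_actions([("Pickup", "A"), ("Pickup", "B")], state_on, goal_on))
# -> [('Pickup', 'A')]   (Pickup(B) is pruned)
```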

Page 7: Knowledge Representation Meets Stochastic Planning


Recent Control Knowledge Research

Human-written control knowledge often eliminates search: [Bacchus & Kabanza, 1996] TLPlan

Helpful control knowledge can be learned from “small problems”:

[Khardon, 1996 & 1999] Learning Horn-clause action strategies

[Huang, Selman & Kautz, 2000] Learning action selection & action rejection rules

[Martin & Geffner, 2000] Learning generalized policies in concept languages

[Yoon, Fern & Givan, 2002] Inductive policy selection for stochastic planning domains

Page 8: Knowledge Representation Meets Stochastic Planning


Unsolved Problems

Finding control knowledge without immediate access to small problems: can we learn directly in a large domain?

Improving buggy control knowledge: all previous techniques produce unreliable control knowledge, with occasional fatal flaws.

Our approach: view control knowledge as an MDP policy and apply policy improvement

A policy is a choice of action for each MDP state

Page 9: Knowledge Representation Meets Stochastic Planning


View the domain as one big state space, with each state a planning problem.

This view facilitates generalization between problems.

Planning Domains as MDPs

Blocks World Domain


Available actions:

Pickup(x) PutDown(y)

Pickup(Purple)

Page 10: Knowledge Representation Meets Stochastic Planning


Ideas from Two Communities

Traditional Planning: induction of control knowledge, planning heuristics

Decision-theoretic Planning: policy rollout, approximate policy iteration (API)

Two views of the new technique: iterative improvement of control knowledge; API with a policy-space bias

Page 11: Knowledge Representation Meets Stochastic Planning


Given a policy π and a state s, can we improve π(s)?

If Vπ(s) < Qπ(s,b), then π(s) can be improved to the blue action b.

Can make such improvements at all states at once:

Policy Iteration

[Figure: from state s, the current action o = π(s) yields reward Ro and successors s1…sk; the alternative action b (blue) yields reward Rb and successors t1…tn]

Vπ(s) = Qπ(s,o) = Ro + E_{s′ ∈ {s1,…,sk}} Vπ(s′)

Qπ(s,b) = Rb + E_{s′ ∈ {t1,…,tn}} Vπ(s′)


Policy Improvement

base policy π → improved policy π′
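As a small illustration of this improvement test, here is a sketch with made-up rewards, successor probabilities, and values (purely illustrative numbers, not from the talk):

```python
# One-state policy improvement check (sketch).
def q_value(reward, successors, V_pi):
    """Q_pi(s, a) = R_a + E_{s'} V_pi(s'), with successors given as (probability, state) pairs."""
    return reward + sum(p * V_pi[s2] for p, s2 in successors)

def improve_at_state(v_pi_s, candidates, V_pi):
    """Return an action whose Q value beats V_pi(s), if any; otherwise None (keep pi(s))."""
    best_a, best_q = None, v_pi_s
    for a, (reward, successors) in candidates.items():
        q = q_value(reward, successors, V_pi)
        if q > best_q:
            best_a, best_q = a, q
    return best_a

V_pi = {"s1": 1.0, "s2": 0.0, "t1": 2.0, "t2": 2.0}            # illustrative values
candidates = {"o": (0.0, [(0.5, "s1"), (0.5, "s2")]),           # current action: Q = 0.5
              "b": (0.0, [(0.5, "t1"), (0.5, "t2")])}           # "blue" action:  Q = 2.0
print(improve_at_state(0.5, candidates, V_pi))                  # -> 'b'
```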

Page 12: Knowledge Representation Meets Stochastic Planning


Flowchart View of Policy Iteration

Current Policy π → Compute Vπ at all states → Compute Qπ for each action at all states → Choose best action at each state → Improved Policy π′ (which becomes the next current policy)

Problem: too many states
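For contrast with the approximations that follow, here is a minimal sketch of exact tabular policy iteration; it presumes the whole state set can be enumerated, which is exactly what fails in large relational MDPs. The dictionary-based MDP encoding is an illustrative assumption.

```python
# Exact tabular policy iteration (sketch). Only feasible when the state set is small
# enough to enumerate -- precisely the "too many states" problem noted above.
def policy_iteration(states, actions, R, P, gamma=0.95, eval_sweeps=200):
    """R[s][a]: reward; P[s][a]: list of (probability, next_state) pairs."""
    def q(s, a, V):
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    pi = {s: actions[0] for s in states}          # arbitrary initial policy
    while True:
        # Policy evaluation: approximate V_pi by repeated Bellman backups.
        V = {s: 0.0 for s in states}
        for _ in range(eval_sweeps):
            V = {s: q(s, pi[s], V) for s in states}
        # Policy improvement: switch to the greedy action at every state at once.
        new_pi = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
        if new_pi == pi:
            return pi, V
        pi = new_pi
```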

Page 13: Knowledge Representation Meets Stochastic Planning


Flowchart View of Policy Rollout

Current Policy π → at a single state s: compute Qπ(s,·) by, for each action a, sampling a successor s′ from the successors s1…sk of (s,a) and estimating Vπ(s′) from trajectories under π starting at s′ → choose the best action at s, giving the improved choice π′(s) at s
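A minimal sketch of this procedure, assuming a generative simulator step(s, a) that returns (reward, next_state) and an actions(s) function listing applicable actions (both hypothetical interfaces, not from the talk):

```python
# Policy rollout at a single state s (sketch).
def estimate_value(s, pi, step, horizon, trials):
    """Monte-Carlo estimate of V_pi(s): average return of trajectories under pi from s."""
    total = 0.0
    for _ in range(trials):
        state, ret = s, 0.0
        for _ in range(horizon):
            reward, state = step(state, pi(state))
            ret += reward
        total += ret
    return total / trials

def rollout_action(s, pi, step, actions, horizon=50, trials=20, samples=5):
    """Improved action at s: for each action, sample successors s' and estimate V_pi(s')."""
    def q_estimate(a):
        total = 0.0
        for _ in range(samples):
            reward, s2 = step(s, a)          # sample s' from the successors of (s, a)
            total += reward + estimate_value(s2, pi, step, horizon, trials)
        return total / samples
    return max(actions(s), key=q_estimate)
```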

Page 14: Knowledge Representation Meets Stochastic Planning


Approximate Policy Iteration

As in rollout, compute Qπ(s,·) for each action at a sampled state s (estimating Vπ(s′) from trajectories under π) and choose the best action, giving π′(s). Then: draw a training set of pairs (s, π′(s)); learn a policy; repeat.

Idea: use machine learning to control the number of samples needed

Refinement: use pairs (s, Qπ(s,·)) to define misclassification costs.
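A rough sketch of that loop, with the learner abstracted away. Here fit_policy stands in for any classifier mapping states to actions; the talk instead learns decision-list policies in a taxonomic concept language, and the Q-value refinement above would weight the training examples.

```python
import random

# Approximate policy iteration with a learned policy (sketch).
# 'rollout_action(s, pi)' is the single-state improvement from policy rollout above.
def approximate_policy_iteration(pi0, sample_states, rollout_action, fit_policy,
                                 iterations=5, n_train=500):
    pi = pi0
    for _ in range(iterations):
        batch = random.sample(sample_states, min(n_train, len(sample_states)))
        train = [(s, rollout_action(s, pi)) for s in batch]   # pairs (s, pi'(s))
        pi = fit_policy(train)                                # learn the next policy pi'
    return pi
```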

Page 15: Knowledge Representation Meets Stochastic Planning


Challenge Problem

Consider the following stochastic blocks world problem:

Goal: Clear(A). Assume: block color affects pickup() success.

The optimal policy is compact, but the value function is not: a state's value depends on the set of colors above A.


Page 16: Knowledge Representation Meets Stochastic Planning


Policy for Example Problem

A compact policy for this problem:

1. If holding a block, put it down on the table, else…

2. Pick up a clear block above A.

How can we formalize this policy?


Page 17: Knowledge Representation Meets Stochastic Planning


Action Selection Rules [Martin & Geffner, KR 2000]

Pick up a clear block above block A…

Action selection rules are based on classes of objects: apply action a to an object in class C (if possible), abbreviated C : a.

How can we describe the object classes?


Page 18: Knowledge Representation Meets Stochastic Planning



Formal Policy for Example Problem

English decision list:

1. “blocks being held” : putdown

2. “clear blocks above block A” : pickup

Taxonomic syntax:

1. holding : putdown

2. clear ∩ (on* A) : pickup


We find this policy with a heuristic search guided by the training data.
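A rough sketch of how such a decision-list policy can be read operationally, with the class expression clear ∩ (on* A) evaluated against a relational blocks-world state. The dictionary state encoding and helper names are illustrative assumptions, not the paper's representation:

```python
# Sketch: interpreting the two-rule decision-list policy against a relational state.
# 'on' maps each block to what it sits on; 'holding' is the held block (or None).
def on_star_above(state, target):
    """Blocks above 'target' via the transitive closure of on (the class on* target)."""
    above = set()
    for b in state["on"]:
        x = b
        while x in state["on"]:
            x = state["on"][x]
            if x == target:
                above.add(b)
                break
    return above

def clear_blocks(state):
    below = set(state["on"].values())
    return {b for b in state["on"] if b not in below and b != state["holding"]}

def decision_list_policy(state):
    """1. holding : putdown    2. clear AND (on* A) : pickup"""
    if state["holding"] is not None:
        return ("PutDown", "table")
    candidates = clear_blocks(state) & on_star_above(state, "A")
    if candidates:
        return ("Pickup", sorted(candidates)[0])
    return None  # no rule fires

# Example: C on B on A on the table, nothing held -> pick up C.
s = {"on": {"C": "B", "B": "A", "A": "table"}, "holding": None}
print(decision_list_policy(s))   # -> ('Pickup', 'C')
```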

Page 19: Knowledge Representation Meets Stochastic Planning


Ideas from Two Communities

Traditional Planning: induction of control knowledge, planning heuristics

Decision-theoretic Planning: policy rollout, approximate policy iteration (API)

Two views of the new technique: iterative improvement of control knowledge; API with a policy-space bias

Page 20: Knowledge Representation Meets Stochastic Planning


API with a Policy Language Bias

As before: at a sampled state s, compute Qπ(s,·) for each action, estimate Vπ(s′) from trajectories under π, and choose the best action, giving π′(s). Train a new policy π′ from these examples.

Page 21: Knowledge Representation Meets Stochastic Planning


Incorporating Value Estimates

What happens if the policy can’t find reward?

For learning control knowledge, we use the FF-plan plangraph heuristic

When trajectories under π are cut off before finding reward, use a value estimate at those end states.

Page 22: Knowledge Representation Meets Stochastic Planning


Initial Policy Choice

Policy iteration requires an initial base policy

Options include: a random policy; a greedy policy with respect to a planning heuristic; a policy learned from small problems.

Page 23: Knowledge Representation Meets Stochastic Planning


Experimental Domains

(Stochastic) Blocks World: SBW(n)

(Stochastic) Painted Blocks World: SPW(n)

(Stochastic) Logistics World: SLW(t,p,c)

Page 24: Knowledge Representation Meets Stochastic Planning


API Results

Starting with flawed policies learned from small problems

[Figures: success-rate plots]

Page 25: Knowledge Representation Meets Stochastic Planning


API Results

Starting with a policy greedy with respect to a domain-independent heuristic

We used the heuristic of FF (Hoffmann and Nebel, JAIR 2001)

Page 26: Knowledge Representation Meets Stochastic Planning


How Good is the Induced Planner?

              Success Rate     Avg. Plan Length     Running Time (s)
              FF      API      FF       API         FF       API
BW(10)        1       0.99     33       25          0.1      0.5
BW(15)        0.96    0.99     53       39          4.8      0.9
BW(20)        0.72    0.98     74       55          35.2     1.4
BW(30)        0.11    0.99     112      86          176.1    2.4
LW(4,6,4)     1       1        16       16          0.0      0.5
LW(5,14,20)   1       1        73       74          0.7      3.4

Page 27: Knowledge Representation Meets Stochastic Planning


Conclusions

Using a policy-space bias, we can learn good policies for extremely large structured MDPs.

We can automatically learn domain-specific planners that compete favorably with state-of-the-art domain-independent planners.

Page 28: Knowledge Representation Meets Stochastic Planning


Approximate Policy Iteration

Sample states s, and compute Q values at each:

Form a training set of tuples (s, b, Qπ(s,b)).

Learn a new policy from this training set.

[Figure: from state s, action o yields reward Ro and successors s1…sk; action b yields reward Rb and successors t1…tn]

Computing Qπ(s,b): estimate Rb + E_{s′ ∈ {t1,…,tn}} Vπ(s′) by sampling states ti from t1…tn and drawing trajectories under π from each ti to estimate Vπ(ti).

Page 29: Knowledge Representation Meets Stochastic Planning


Markov Decision Process (MDP)

Ingredients: system state x in state space X; control action a in A(x); reward R(x,a); state-transition probability P(x,y,a)

Find a control policy to maximize the objective function
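As a sketch, these ingredients correspond to a small programming interface like the one below; the field names and the list-of-pairs encoding of P are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Sketch of the MDP ingredients listed above (names are illustrative).
@dataclass
class MDP:
    states: List[str]                                          # state space X
    actions: Callable[[str], List[str]]                        # A(x): actions available in x
    reward: Callable[[str, str], float]                        # R(x, a)
    transition: Callable[[str, str], List[Tuple[float, str]]]  # P(x, y, a) as (prob, y) pairs

def expected_next_value(mdp: MDP, x: str, a: str, V: Dict[str, float]) -> float:
    """Sum_y P(x, y, a) * V(y): the expectation used in the Q and V definitions above."""
    return sum(p * V[y] for p, y in mdp.transition(x, a))
```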

Page 30: Knowledge Representation Meets Stochastic Planning


Control Knowledge vs. Policy

Perhaps the biggest difference between the communities:

Deterministic planning works with action sequences; decision-theoretic planning works with policies.

Policies are needed because uncertainty may carry you to any state. Compare: control knowledge also handles every state.

Good control knowledge eliminates search and defines a policy over the possible state/goal pairs.