
Context-Specific Multiagent Coordination and Planning with Factored MDPs
Carlos Guestrin, Shobha Venkataraman, Daphne Koller
Stanford University

Construction Crew Problem: Dynamic Resource Allocation

Joint Decision Space
Represent as MDP:
Action space: joint action a for all agents
State space: joint state x of all agents
Reward function: total reward r
The action space is exponential: an action is an assignment a = {a1,…, an}.
The state space is exponential in the number of variables, and a global decision requires complete observation.


Context-Specific Structure

Summary: Context-Specific Coordination

Summary of Algorithm

1. Pick local rule-based basis functions hi

2. Single LP algorithm for Factored MDPs obtains Qi’s

3. Variable coordination graph computes maximizing action

Construction Crew Problem

SysAdmin: Rule-based vs. Table-based

Multiagent Coordination Examples
Search and rescue; factory management; supply chain; firefighting; network routing; air traffic control.
Multiple, simultaneous decisions. Limited observability. Limited communication.

Comparing to Apricodd [Boutilier et al. ’96-’99]

Conclusions and Extensions

Multiagent planning algorithm: Variable coordination structure; Limited context-specific communication; Limited context-specific observability.

Solve large MDPs!

Extensions to hierarchical and relational models

Stanford University → CMU

Agent 2: Plumbing, Painting
Agent 1: Foundation, Electricity, Plumbing
Agent 3: Electricity, Painting
Agent 4: Decoration

WANTED: Agents that coordinate to build and maintain houses, but only when necessary!

Foundation → {Electricity, Plumbing} → Painting → Decoration

Local Q-function Approximation


Q(A1,…,A4, X1,…,X4) ≈ Q1(A1,A4, X1,X4) + Q2(A1,A2, X1,X2) + Q3(A2,A3, X2,X3) + Q4(A3,A4, X3,X4)

Q3 is associated with Agent 3, which observes only X2 and X3.

Limited observability: agent i only observes the variables in Qi.
Must choose the action that maximizes ∑i Qi.
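As a concrete illustration, here is a minimal Python sketch of this decomposition; the names LocalQ and global_q are illustrative assumptions, not code from the paper. Each local piece stores only its own scope, which is exactly why agent i needs such limited observability.

```python
# Minimal sketch (illustrative names): each Q_i sees only a few agents'
# actions and a few state variables.

class LocalQ:
    def __init__(self, action_scope, state_scope, table):
        self.action_scope = action_scope   # e.g. ('A2', 'A3') for Q3
        self.state_scope = state_scope     # e.g. ('X2', 'X3') for Q3
        self.table = table                 # (actions + states) tuple -> value

    def value(self, joint_action, joint_state):
        # The agent reads only the variables in its own scope,
        # never the full joint state.
        a = tuple(joint_action[v] for v in self.action_scope)
        x = tuple(joint_state[v] for v in self.state_scope)
        return self.table[a + x]

def global_q(local_qs, joint_action, joint_state):
    # Q(A1..A4, X1..X4) is approximated by the sum of the local pieces.
    return sum(q.value(joint_action, joint_state) for q in local_qs)
```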

Problems with Coordination Graph

Tasks last multiple time steps. Failures cause chain reactions. Multiple houses.

[Plots: running time (seconds) vs. number of agents, comparing csi and non-csi.
Bidirectional Ring: quadratic fit y = 0.53x² - 0.96x - 0.01, R² = 0.990.
Reverse Star (server topology): exponential fit y = 0.000049·exp(2.27x), R² = 0.9992.]

Actual value of resulting policies:

Problem     Optimal   Apricodd   Rule-based
Expon06     530.9     530.9      530.9
Expon08     77.09     77.09      77.09
Expon10     0.034     0.034      0.034
Linear06    531.4     531.4      531.4
Linear08    430.5     430.5      430.5
Linear10    348.7     348.7      348.7

[Plot: Running Times for the 'Linear' Problems; time (seconds) vs. no. of variables (6–20).
Apricodd: y = 0.1473x³ - 0.8595x² + 2.5006x - 1.5964, R² = 0.9997.
Rule-based: y = 0.0254x² + 0.0363x + 0.0725, R² = 0.9983.]

[Plot: Running Times for the 'Expon' Problems; time (seconds, 0–500) vs. no. of variables (6–12), Apricodd vs. Rule-based.]

Context-Specific Coordination Structure

Problems: table size is exponential in the number of variables; messages are tables; agents communicate even when it is not necessary; the coordination structure is fixed.
What we want: use the structure in the tables; a variable coordination structure.
Exploit context-specific independence!

[Figure: coordination graph over agents A1–A4 with local rule sets Q1–Q4]

Local value rules represent context-specific structure:
⟨q1 : Plumbing_not_done ∧ A1 = Plumb ∧ A2 = Plumb : -100⟩
Set of rules Qi for each agent; the agents must coordinate to maximize the total value: max_a ∑i Qi.

Rule-based variable elimination [Zhang and Poole ’99]

Maximizing out A1

Rule-based coordination graph for finding optimal action

A: simplification on instantiation of the state
B: simplification when passing messages
C: simplification on maximization
Simplification by approximation

Variable agent communication structure: the coordination structure is dynamic.
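To make step A concrete, here is a small Python sketch of value rules as (context, value) pairs and of simplification on instantiation of the state; the representation and the rule values are illustrative assumptions, not the authors' code.

```python
# Sketch of value rules and simplification step A (illustrative).
# A rule is (context, value); context maps variables to required values.

def instantiate(rules, evidence):
    """Drop rules inconsistent with the observed state and remove the
    satisfied state literals from the surviving contexts."""
    out = []
    for context, value in rules:
        if any(var in evidence and context[var] != evidence[var]
               for var in context):
            continue                        # rule can never fire in this state
        reduced = {var: val for var, val in context.items()
                   if var not in evidence}
        out.append((reduced, value))
    return out

rules = [({'A2': 1, 'A3': 1, 'X': 1}, 0.1),   # <a2 ^ a3 ^ x : 0.1>
         ({'A3': 1, 'A4': 1, 'X': 1}, 3.0),   # <a3 ^ a4 ^ x : 3>
         ({'A1': 1, 'A6': 1, 'X': 0}, 3.0)]   # <a1 ^ a6 ^ not-x : 3>
print(instantiate(rules, {'X': 1}))           # third rule is pruned
```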

Long-term Utility = Value of MDP
Value computed by linear programming:
One variable V(x) for each state
One constraint for each state x and action a
The number of states and actions is exponential!

minimize: ∑x V(x)
subject to: V(x) ≥ Q(x,a)  ∀ x, a
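As a toy instance of this LP (a made-up 3-state, 2-action Q table, solved with scipy.optimize.linprog purely for illustration), note that the constraint loop enumerates every state-action pair, which is exactly the scalability problem:

```python
import numpy as np
from scipy.optimize import linprog

Q = np.array([[1.0, 2.0],      # Q(x0, a0), Q(x0, a1)  (made-up numbers)
              [0.5, 3.0],
              [2.5, 1.0]])
n_states, n_actions = Q.shape

c = np.ones(n_states)          # minimize sum_x V(x)
A_ub, b_ub = [], []
for x in range(n_states):
    for a in range(n_actions):  # one constraint per (x, a): V(x) >= Q(x, a)
        row = np.zeros(n_states)
        row[x] = -1.0           # rewritten as -V(x) <= -Q(x, a)
        A_ub.append(row)
        b_ub.append(-Q[x, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
print(res.x)                   # -> [2.0, 3.0, 2.5] = max_a Q(x, a)
```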

Decomposable Value Function

Linear combination of restricted-domain basis functions:
Ṽ(x) = ∑i wi hi(x)
Each hi is a rule over a small part of a complex system, e.g. the value of having two agents in the same house, or the value of two agents painting a house together.
Must find w giving a good approximate value function.

Single LP Solution for Factored MDPs

minimize: ∑i wi ∑x hi(x)
subject to: ∑i wi hi(x) ≥ Q(x,a)  ∀ x, a

One variable wi for each basis function: polynomially many LP variables.
One constraint for every state and action.
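A sketch of the same LP in the basis weights, again with scipy.optimize.linprog and illustrative names; the naive loop below still enumerates every (x, a), which is what the rule-based construction later in the deck avoids:

```python
import numpy as np
from scipy.optimize import linprog

def approx_lp(H, Q):
    """H[x, i] = h_i(x); Q[x, a] = long-term utilities (toy, table-based)."""
    n_states, k = H.shape
    c = H.sum(axis=0)           # objective: sum_i w_i * sum_x h_i(x)
    A_ub, b_ub = [], []
    for x in range(n_states):
        for a in range(Q.shape[1]):
            A_ub.append(-H[x])  # sum_i w_i h_i(x) >= Q(x, a)
            b_ub.append(-Q[x, a])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * k)
    return res.x                # fitted weights w, one per basis function
```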

Factored MDP

[DBN figure: state variables Plumbing_i and Painting_i at time t, Plumbing_i' and Painting_i' at time t+1, with reward R and action A2; edges mark required tasks and dependent tasks]

Agent 2: Plumbing, Painting
Agent 1: Foundation, Electricity, Plumbing
Agent 3: Electricity, Painting
Agent 4: Decoration

[Schweitzer and Seidmann ‘85]

[Guestrin et al. ’01]

Rule-based variable elim. Exponentially smaller LP than table-based!

[Worked figure: rule-based coordination graph over agents A1–A6 with value rules
⟨a2 ∧ a3 ∧ x : 0.1⟩, ⟨a3 ∧ a4 ∧ x : 3⟩, ⟨a1 ∧ a2 ∧ a4 ∧ x : 3⟩, ⟨a1 ∧ a2 ∧ x : 5⟩, ⟨a1 ∧ a3 ∧ ¬x : 1⟩, ⟨a6 ∧ x : 7⟩, ⟨a1 ∧ a5 ∧ x : 4⟩, ⟨a5 ∧ a6 ∧ x : 2⟩, ⟨a1 ∧ a6 ∧ ¬x : 3⟩.
A: Instantiate current state, x = true: rules conditioned on ¬x are pruned, and x is dropped from the remaining contexts.
B: Eliminate variable A1: rules mentioning A1 are replaced by new rules over A1's neighbors.
C: Local maximization over the remaining agents.]

Outline

Given long-term utilities ∑i Qi(x,a): local message passing computes the maximizing action; variable coordination structure.
Long-term planning to obtain ∑i Qi(x,a): linear programming approach; exploit context-specific structure.

[Bellman et al. ‘63], [Tsitsiklis & Van Roy ’96], [Koller & Parr ’99,’00], [Guestrin et al. ’01]

Q(x,a) = R(x,a) + γ ∑x' P(x'|x,a) V(x')

Factored value function: V = ∑i wi hi
Factored Q-function: Q = ∑i Qi
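In matrix form the backprojection is a one-liner; the sketch below assumes a tabular R[x, a], per-action transition matrices P[a], and a value vector V (illustrative, not the authors' code):

```python
import numpy as np

def q_from_v(R, P, V, gamma=0.95):
    """Q(x, a) = R(x, a) + gamma * sum_x' P(x'|x, a) V(x').

    Shapes: R (n_states, n_actions); P (n_actions, n_states, n_states);
    V (n_states,).
    """
    # P[a] @ V gives E[V(x') | x, a] for every state x at once.
    expected_next = np.stack([P[a] @ V for a in range(R.shape[1])], axis=1)
    return R + gamma * expected_next
```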

Foundation → {Electricity, Plumbing} → Painting → Decoration

Example 1: 2 agents, 1 house
Agent 1 = {Foundation, Electricity, Plumbing}
Agent 2 = {Plumbing, Painting, Decoration}

Example 2: 4 agents, 2 houses
Agent 1 = {Painting, Decoration}; moves between houses
Agent 2 = {Foundation, Electricity, Plumbing, Painting}, house 1
Agent 3 = {Foundation, Electricity}, house 2
Agent 4 = {Plumbing, Decoration}, house 2


                                   Our rule-based approach          Apricodd
Algorithm based on                 Linear programming               Value iteration
Types of independence exploited    Additive and context-specific    Only context-specific
“Basis function” representation    Specified by user                Determined by algorithm

Introduction
Context-Specific Coordination, Given Qi's
Long-Term Planning, Computing Qi's
Experimental Results

Use Coordination graph [Guestrin et al. ’01]

Use variable elimination for maximization: [Bertele & Brioschi ‘72]

Limited communication for optimal action choice

Comm. bandwidth = induced width of coord. graph

Here we need only 23, instead of 63 sum operations.

[Figure: coordination graph over agents A1–A4 with local Q-functions Q1–Q4]

Computing Maximizing Action: Coordination Graph

max_{A1,A2,A3,A4} Q1(A1,A2) + Q2(A1,A3) + Q3(A3,A4) + Q4(A2,A4)
  = max_{A1,A2,A3} Q1(A1,A2) + Q2(A1,A3) + max_{A4} [Q3(A3,A4) + Q4(A2,A4)]
  = max_{A1,A2,A3} Q1(A1,A2) + Q2(A1,A3) + g1(A2,A3)

g1(A2,A3) gives, for every action of A2 and A3, the maximum value for A4.
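A minimal table-based variable-elimination sketch of this computation, assuming binary actions and factors stored as (scope, table) pairs; names are illustrative, and the rule-based version follows the same pattern on rules instead of tables:

```python
from itertools import product

def eliminate(factors, agent):
    """Max out one agent; returns the remaining factors plus a new factor
    over the agent's neighbours (one message-passing step)."""
    touching = [f for f in factors if agent in f[0]]
    rest = [f for f in factors if agent not in f[0]]
    scope = tuple(sorted({v for s, _ in touching for v in s} - {agent}))
    table = {}
    for assignment in product([0, 1], repeat=len(scope)):
        best = float('-inf')
        for a in (0, 1):
            full = dict(zip(scope, assignment))
            full[agent] = a
            best = max(best, sum(t[tuple(full[v] for v in s)]
                                 for s, t in touching))
        table[assignment] = best
    return rest + [(scope, table)]

def max_joint_value(factors, order):
    """`order` must cover every agent appearing in `factors`."""
    for agent in order:
        factors = eliminate(factors, agent)
    return sum(t[()] for _, t in factors)   # only empty scopes remain

# e.g. factors = [(('A1','A2'), q1), (('A1','A3'), q2),
#                 (('A3','A4'), q3), (('A2','A4'), q4)]
# value = max_joint_value(factors, ['A4', 'A1', 'A2', 'A3'])
```

Recovering the maximizing joint action only requires recording, at each elimination, which local choice achieved the max, then replaying those choices in reverse order.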

hi and Qi depend on small sets of variables and actions

Polynomial-time algorithm generates compact LP

subject to: ∑i wi hi(x) ≥ Q(x,a)  ∀ x, a
equivalently: 0 ≥ max_{x,a} [Q(x,a) - ∑i wi hi(x)]

For example:
0 ≥ max_{A,B,C,D} [f1(A,B) + f2(A,C) + f3(C,D) + f4(B,D)]
becomes
0 ≥ max_{A,B,C} [f1(A,B) + f2(A,C) + g1(B,C)]
g1(B,C) ≥ f3(C,D) + f4(B,D)  ∀ B, C, D
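A quick numerical sanity check, using random toy tables (illustrative only), that eliminating D through the auxiliary function g1 preserves the maximum:

```python
import numpy as np

rng = np.random.default_rng(0)
f1, f2, f3, f4 = (rng.normal(size=(2, 2)) for _ in range(4))

# Direct maximization over all joint assignments of (A, B, C, D).
direct = max(f1[a, b] + f2[a, c] + f3[c, d] + f4[b, d]
             for a in (0, 1) for b in (0, 1)
             for c in (0, 1) for d in (0, 1))

# Eliminate D first: g1(b, c) = max_d [f3(c, d) + f4(b, d)].
g1 = {(b, c): max(f3[c, d] + f4[b, d] for d in (0, 1))
      for b in (0, 1) for c in (0, 1)}
via_g1 = max(f1[a, b] + f2[a, c] + g1[b, c]
             for a in (0, 1) for b in (0, 1) for c in (0, 1))

assert np.isclose(direct, via_g1)
```

In the LP, g1(B,C) becomes a set of new variables with one constraint per (B, C, D), so the number of constraints tracks the induced width of the elimination order rather than the total number of state and action variables.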

++≥

top related