approximate solutions for factored dec-pomdps with many...
TRANSCRIPT
May 8, 2013 1 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Approximate Solutions for FactoredDec-POMDPs with Many Agents
Frans A. Oliehoek, Shimon Whiteson, & Matthijs T.J. Spaan
AAMAS, Wednesday May 8, 2013
May 8, 2013 2 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Visual Roadmap
resultsresults
modelmodelbackground: FSPCbackground: FSPC
Factored FSPCFactored FSPC
practical factored FFSPCpractical factored FFSPC
May 8, 2013 3 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Multiagent decision making under Uncertainty
Outcome Uncertainty
Partial Observability
Multiagent Systems: uncertainty about others
May 8, 2013 4 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Formal Model A Dec-POMDP
n agents S – set of states A – set of joint actions PT – transition function
O – set of joint observations PO – observation function
R – reward function h – horizon (finite)
⟨S , A , PT ,O , PO , R ,h⟩
a=⟨a1,a2, ... ,an⟩
o=⟨o1,o2, ... , on⟩
P(s '∣s , a)
P(o∣a , s ')
R (s , a)=E s ' R(s ,a , s ' )
May 8, 2013 5 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Factored Dec-POMDPs
1 2 3 4
h4
h1
h2
h3
t
state variables or 'factors'
E.g., 'Firefighting Graph' H houses h1...hH
H-1 agents actions: left or right observations: flames or not
May 8, 2013 6 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Factored Dec-POMDPs
h4
h1
h2
h3
h1'
h2'
h3'
h4'
t t+1
CPTs(for each factor)
1 2 3 4
E.g., 'Firefighting Graph' H houses h1...hH
H-1 agents actions: left or right observations: flames or not
May 8, 2013 7 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Factored Dec-POMDPs
h4
h1
h2
h3
a1
h1'
h2'
h3'
h4'
a2
a3
t t+1
CPTs(for each factor)
1 2 3 4
E.g., 'Firefighting Graph' H houses h1...hH
H-1 agents actions: left or right observations: flames or not
May 8, 2013 8 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Factored Dec-POMDPs
h4
h1
h2
h3
a1
h1'
h2'
h3'
h4'
o1
a2o2
a3o3
t t+1
CPTs(for each factor)
1 2 3 4
E.g., 'Firefighting Graph' H houses h1...hH
H-1 agents actions: left or right observations: flames or not
May 8, 2013 9 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Factored Dec-POMDPs
h4
h1
h2
h3
a1
h1'
h2'
h3'
h4'
o1
a2o2
a3o3
t t+1
reward for house 1
R1
1 2 3 4
E.g., 'Firefighting Graph' H houses h1...hH
H-1 agents actions: left or right observations: flames or not
May 8, 2013 10 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Factored Dec-POMDPs
h4
h1
h2
h3
a1
h1'
h2'
h3'
h4'
o1
a2o2
a3o3
t t+1
R1
Expected reward for house 1
1 2 3 4
E.g., 'Firefighting Graph' H houses h1...hH
H-1 agents actions: left or right observations: flames or not
May 8, 2013 11 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Background: FSPC – 1
Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC)
for each t=0,...,h-1 compute best joint decision rule δt
given past joint policy φt
May 8, 2013 12 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Background: FSPC – 1
Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC)
for each t=0,...,h-1 compute best joint decision rule δt
given past joint policy φt
φ0
( )
May 8, 2013 13 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Background: FSPC – 1
Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC)
for each t=0,...,h-1 compute best joint decision rule δt
given past joint policy φt
φ0
δ0
( )
May 8, 2013 14 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Background: FSPC – 1
Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC)
for each t=0,...,h-1 compute best joint decision rule δt
given past joint policy φt
φ1
φ0
( )
(δ0)
May 8, 2013 15 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Background: FSPC – 1
Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC)
for each t=0,...,h-1 compute best joint decision rule δt
given past joint policy φt
φ1
φ0
( )
(δ0)
δ1
May 8, 2013 16 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Background: FSPC – 1
Goal: scalability w.r.t. #agents Forward-sweep policy computation (FSPC)
for each t=0,...,h-1 compute best joint decision rule δt
given past joint policy φt
φ1
φ2
φ0
( )
(δ0)
(δ0,δ1)
May 8, 2013 17 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Background: FSPC – 2
As collaborative Bayesian games (CBGs) agents, actions types θi histories↔
probabilities: P(θ) payoffs: Q(θ,a)
δit(θ⃗i
t)=ait
B(φ1)
B(φ2)
B(φ0)
φ1
φ2
φ0
Problem at stage t select δ=<δ1,...,δn>
δi : histories actions→
May 8, 2013 18 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Background: CBGs
As collaborative Bayesian games (CBGs) agents, actions types θi histories↔
probabilities: P(θ) payoffs: Q(θ,a)
B(φ1)
B(φ2)
B(φ0)
H2 H3
... ...
... ...
H2 H3
H1 +2 -1
H2 ... ...
... ...
... ...
H1 … ...
H2 ... ...
No FlamesFlames
No Flames
Flames
ag. 2
agent 1
1 2 3
May 8, 2013 19 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
This paper: Factored FSPC
if value function factored
→ replace CBGs with Collaborative graphical Bayesian games (CGBGs)
Q( x , θ⃗ , a)=∑eQe(x e , θ⃗e , ae)
Factored FSPC – Basic idea:
exploit independence between agents in FSPC
Factored FSPC – Basic idea:
exploit independence between agents in FSPC
May 8, 2013 20 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
if value function factored
→ replace CBGs with Collaborative graphical Bayesian games (CGBGs)
This paper: Factored FSPC
agent 1 agent 3
agent 2
Total payoff is sum of components
May 8, 2013 21 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Factored FSPC for Many Agents
Improving scalability w.r.t. the number of agents...
1) Approximate structure of Q* Predetermined scope structure
2) Compute CGBG payoff functions Transfer Planning (TP)
3) Inference techniques to construct CGBGs Extension of factored frontier [Murphy&Weiss UAI 2001]
4) Efficient solutions of CGBGs Max-plus to ATI-FG [OWS UAI 2012]
B(φ1)
B(φ2)
B(φ0)
May 8, 2013 22 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Factored FSPC for Many Agents
Improving scalability w.r.t. the number of agents...
1) Approximate structure of Q* Predetermined scope structure
2) Compute CGBG payoff functions Transfer Planning (TP)
3) Inference techniques to construct CGBGs Extension of factored frontier [Murphy&Weiss UAI 2001]
4) Efficient solutions of CGBGs Max-plus to ATI-FG [OWS UAI 2012]
B(φ1)
B(φ2)
B(φ0)
Rest of this talk
see paper
May 8, 2013 23 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Accumulation of Reward over Time
a1
h1
h2
h3
h4
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
t=0 t=1 t=2
R1Qe=1(xe , θ⃗e , ae)
May 8, 2013 24 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Accumulation of Reward over Time
a1
h1
h2
h3
h4
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
t=0 t=1 t=2
R1
May 8, 2013 25 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Accumulation of Reward over Time
a1
h1
h2
h3
h4
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
t=0 t=1 t=2
R1
Q1
Still factored, but components not local!Still factored, but components not local!
May 8, 2013 26 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
1) Predetermined Scope Structure Use smaller scopes
a1
h1
h2
h3
h4
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
t=0 t=1 t=2
Q1
May 8, 2013 27 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
a1
h1
h2
h3
h4
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
t=0 t=1 t=2
Q1
Q3
Q2
Q4
1) Predetermined Scope Structure Use smaller scopes
May 8, 2013 28 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Define a source problem for each component involving fewer agents
Solve those exactly or approximately (QMDP, etc.)
2) Transfer Planning
a1
h1
h2
h3
h4
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
a1
h1
h2
h3
h4
o1
o2
o3
a2
a3
t=0 t=1 t=2
Q3
Q2
1 2 3
2 3 4
May 8, 2013 29 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Results – Compared to Optimal
1 2 3 4
Fact. FSPC + TP + QBG performs optimallyFact. FSPC + TP + QBG performs optimally
May 8, 2013 30 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Results – vs. Approximate
FFSCP: TP vs ADP, NR Non-factored FSPC, DICE [Oliehoek et al. 2008 Informatica]
TP achieves (near-) best value....
TP achieves (near-) best value.... ...and superior
scaling behavior...and superior
scaling behavior
May 8, 2013 31 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Results – Many Agents
May 8, 2013 32 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
Conclusions Factored FSPC with transfer planning:
approximates factored Dec-POMDPs with multiple abstractions involving subsets of agents
Unprecedented scalability for this class results up to 1000 agents
Future work: scale to higher horizon understand such abstractions
empirically verified near-optimal quality formal understanding influence-based abstraction [AAAI' 12]
May 8, 2013 33 / 33Oliehoek, Whiteson & Spaan - Factored Dec-POMDPs with Many Agents
References R. Emery-Montemerlo, G. Gordon, J. Schneider, and S. Thrun. Approximate solutions for
partially observable stochastic games with common payoffs. In AAMAS, 2004. K. P. Murphy and Y. Weiss. The factored frontier algorithm for approximate inference in
DBNs. In UAI, 2001. F. A. Oliehoek, J. F. Kooi, and N. Vlassis. The cross-entropy method for policy search in
decentralized POMDPs. Informatica, 32:341–357, 2008 F. A. Oliehoek, M. T. J. Spaan, S. Whiteson, and N. Vlassis. Exploiting locality of interaction in
factored Dec-POMDPs. In AAMAS, 2008. F. A. Oliehoek, S. Witwicki, and L. P. Kaelbling. Influence-Based Abstraction for Multiagent
Systems. In AAAI, 2012. F. A. Oliehoek, S. Whiteson, and M. T. J. Spaan. Exploiting structure in cooperative Bayesian
games. In UAI, 2012. D. Szer, F. Charpillet, and S. Zilberstein. MAA*: A heuristic search algorithm for solving
decentralized POMDPs. In UAI, 2005.