perfect recall:

Post on 22-Feb-2016


DESCRIPTION

Belief Propagation for Structured Decision Making. Qiang Liu, Alexander Ihler. Department of Computer Science, University of California, Irvine. - PowerPoint PPT Presentation

TRANSCRIPT

Perfect recall:

• Every decision node observes all earlier decision nodes and their parents (along a “temporal” order)

• Sum-max-sum rule (dynamic programming):

• Perfect recall is unrealistic: memory limits, decentralized systems
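As a minimal sketch of the sum-max-sum rule under perfect recall, the Python below alternates sums over chance variables and maxima over decisions along the temporal order c1, d1, c2, d2, and compares the result with brute-force enumeration of all deterministic policies. The model is hypothetical: the variable names, the random tables `p_c1`, `p_c2`, and `util`, and the binary domains are assumptions made for illustration and do not come from the poster.

```python
import itertools
import numpy as np

# Hypothetical toy model (assumption): binary c1, d1, c2, d2 in temporal order.
rng = np.random.default_rng(0)
p_c1 = rng.dirichlet(np.ones(2))               # p(c1)
p_c2 = rng.dirichlet(np.ones(2), size=(2, 2))  # p(c2 | c1, d1), indexed [c1, d1, c2]
util = rng.random((2, 2, 2, 2))                # u(c1, d1, c2, d2)

def meu_sum_max_sum():
    """Eliminate variables in reverse temporal order: max d2, sum c2, max d1, sum c1."""
    joint = p_c1[:, None, None, None] * p_c2[:, :, :, None] * util
    v = joint.max(axis=3)   # max over d2, which observes (c1, d1, c2)
    v = v.sum(axis=2)       # sum over c2
    v = v.max(axis=1)       # max over d1, which observes c1
    return float(v.sum(axis=0))  # sum over c1

def meu_brute_force():
    """Enumerate every deterministic policy pair and keep the best expected utility."""
    best = -np.inf
    for rule1 in itertools.product(range(2), repeat=2):       # d1 = rule1[c1]
        for rule2 in itertools.product(range(2), repeat=8):   # d2 = r2[c1, d1, c2]
            r2 = np.array(rule2).reshape(2, 2, 2)
            eu = 0.0
            for c1 in range(2):
                d1 = rule1[c1]
                for c2 in range(2):
                    d2 = r2[c1, d1, c2]
                    eu += p_c1[c1] * p_c2[c1, d1, c2] * util[c1, d1, c2, d2]
            best = max(best, eu)
    return float(best)

meu_dp = meu_sum_max_sum()
meu_bf = meu_brute_force()
```

The two values agree here because each decision observes everything that precedes it. Without perfect recall, the inner maxima could depend on variables the decision cannot see, and the rule no longer applies directly.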

Variational methods:

• Log-partition function duality:

• Junction graph BP: approximating the marginal polytope and the entropy
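The duality formula did not survive in this transcript. As a hedged illustration, the sketch below checks the standard form, log Z = max over distributions τ of ⟨θ, τ⟩ + H(τ), on a small discrete distribution. The 8-state model and the names `theta` and `objective` are assumptions made for this example.

```python
import numpy as np

def objective(theta, tau):
    """<theta, tau> + H(tau) for a distribution tau over a finite state space."""
    nz = tau > 0
    return float(tau @ theta - np.sum(tau[nz] * np.log(tau[nz])))

rng = np.random.default_rng(1)
theta = rng.normal(size=8)                       # hypothetical log-potentials
log_z = float(np.log(np.sum(np.exp(theta))))     # exact log-partition function
p = np.exp(theta - log_z)                        # the Gibbs distribution

# The gap log Z - objective equals KL(tau || p): zero at tau = p, nonnegative elsewhere.
gap_at_p = log_z - objective(theta, p)
worst_gap = min(
    log_z - objective(theta, rng.dirichlet(np.ones(8))) for _ in range(1000)
)
```

Exact maximization needs the full joint distribution, which is intractable in large models. That is why junction graph BP replaces the feasible set and the entropy with tractable approximations.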

Belief Propagation for Structured Decision Making Qiang Liu Alexander Ihler

Department of Computer Science, University of California, Irvine

Abstract

Variational inference methods such as loopy BP have revolutionized inference on graphical models.

Influence diagrams (or decision networks) are extensions of graphical models for representing structured decision-making problems.

Our contribution:

• A general variational framework for solving influence diagrams

• A junction graph belief propagation algorithm for IDs with an intuitive interpretation and strong theoretical guarantees

• A convergent double-loop algorithm

• Significant empirical improvement over the baseline algorithm

Variational Framework for Structured Decision Making

Influence Diagram

Graphical Models and Variational Methods

Graphical models:

• Factors & exponential family form

• Graphical representations: Bayes nets, Markov random fields …

Inference: answering queries about graphical models

Our Algorithms

Experiments

Junction graph belief propagation for MEU:• Construct junction graph over the augmented distribution

Main result:

• Intuition: the last term encourages policies to be deterministic

• Perfect recall: convex optimization (easier)

• Imperfect recall: non-convex optimization (harder)
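The main result itself is missing from this transcript. Assuming it takes the form log MEU = max over τ of ⟨θ, τ⟩ + H(x; τ) − Σ over decision nodes i of H(x_i | x_pa(i); τ) (an assumption, not recovered from the text above), the sketch below checks it numerically on the smallest possible case: one chance variable c and one decision d with no parents, so that the last term is just −H(d). The table `theta` and all names are hypothetical.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def dual_objective(theta, tau):
    """<theta, tau> + H(c, d) - H(d) for a joint table tau[c, d] (assumed form)."""
    return (float(np.sum(theta * tau))
            + entropy(tau.ravel())
            - entropy(tau.sum(axis=0)))

rng = np.random.default_rng(2)
theta = rng.normal(size=(3, 4))  # hypothetical log of (probability x utility), [c, d]

# Direct definition: pick the decision with the largest summed augmented weight.
scores = np.sum(np.exp(theta), axis=0)
log_meu = float(np.log(scores.max()))

# Candidate maximizer: deterministic decision d*, with c distributed as p(c | d*).
d_star = int(np.argmax(scores))
tau_star = np.zeros_like(theta)
tau_star[:, d_star] = np.exp(theta[:, d_star]) / scores[d_star]

value_at_opt = dual_objective(theta, tau_star)
best_random = max(
    dual_objective(theta, rng.dirichlet(np.ones(12)).reshape(3, 4))
    for _ in range(2000)
)
```

In this small case, subtracting H(d) removes any reward for spreading probability over several decisions. This is consistent with the intuition that the last term pushes policies toward determinism.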

Bethe-Kikuchi approximation: locally consistent polytope

[Figure: loopy junction graph over variables a, b, c, d, e, with node labels abc, bcd, abe, ed, ab, bc, b, d, e]

Influence diagram:

• Chance nodes (C):

Augmented distribution:

Maximum expected utility (MEU):

Imperfect recall:

• No closed-form solution

• Dominant algorithm: single policy updating (SPU), with policy-by-policy optimality

If … is the maximum, the optimal strategy is … (causes policies to be deterministic)

Significance:

• Enables converting arbitrary variational methods to MEU algorithms

• “Integrates” the policy evaluation and policy improvement steps (avoiding expensive inner loops)

[Figure panels: influence diagram; augmented distribution (factor graph); junction graph with clusters c1c4d1, c1c2d2, c3d1, c2c3d3, c4d2d3]

• For each decision node, identify a unique cluster (called a decision cluster) that includes the decision node and its parents

Decision cluster of d1

Normal cluster

• Message passing algorithm:

Sum-messages (from normal clusters):

MEU-messages (from decision clusters):

Optimal policies:

• Strong local optimality: provably better than SPU

Convergent algorithm by proximal point method:

• Iteratively optimize a smoothed objective

Diagnostic network (UAI08 inference challenge):

e.g., calculating (log) partition function:

Decentralized Sensor network:

Conditional probability:

Decision rule:

Global utility function: Local utility function:

• Decision nodes (D):

• Utility nodes (U):

or

[Figure: two influence diagrams over decision nodes d1, d2 and utility node u, labeled "Perfect recall" and "Imperfect recall"]

Additive

Toy example:

d1    d2    utility
+1    +1    2
-1    -1    1
+1    -1    0
-1    +1    0
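The toy utilities over d1, d2 ∈ {+1, −1} give a quick way to see what policy-by-policy optimality means for single policy updating (SPU). The sketch below assumes that neither decision observes anything, so each policy is just a fixed choice. The function name `single_policy_update` is hypothetical, and this is a simplified coordinate-wise update rather than a reproduction of the authors' implementation.

```python
# Utility table from the toy example: (d1, d2) -> utility.
U = {(+1, +1): 2, (-1, -1): 1, (+1, -1): 0, (-1, +1): 0}
CHOICES = (+1, -1)

def single_policy_update(d1, d2, sweeps=10):
    """Coordinate-wise improvement: optimize one decision while holding the other fixed."""
    for _ in range(sweeps):
        new_d1 = max(CHOICES, key=lambda a: U[(a, d2)])
        new_d2 = max(CHOICES, key=lambda b: U[(new_d1, b)])
        if (new_d1, new_d2) == (d1, d2):
            break  # no single-decision change improves the utility
        d1, d2 = new_d1, new_d2
    return d1, d2

stuck = single_policy_update(-1, -1)   # starts and stays at the utility-1 point
global_best = max(U, key=U.get)        # exhaustive search over joint policies
```

Starting from (−1, −1), changing either decision alone lowers the utility from 1 to 0, so the update stops there even though (+1, +1) has utility 2. Reaching the better point requires changing both decisions at once.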

Multiplicative

[Figure: example influence diagram with nodes Weather, Forecast, Activity, Happiness]
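To make the Weather, Forecast, Activity, Happiness example concrete, the sketch below treats Weather and Forecast as chance nodes, Activity as a decision that observes only Forecast, and Happiness as the utility. These roles, and every number in the tables, are assumptions made for illustration. A decision rule is written as a conditional probability table, and expected utility is the sum of probabilities times rule times utility.

```python
import numpy as np

# Hypothetical numbers (assumptions), states ordered [sunny, rainy] / [outdoor, indoor].
p_w = np.array([0.7, 0.3])                 # p(Weather)
p_f_given_w = np.array([[0.8, 0.2],        # p(Forecast | Weather = sunny)
                        [0.1, 0.9]])       # p(Forecast | Weather = rainy)
happiness = np.array([[10.0, 3.0],         # u(Weather = sunny, Activity)
                      [0.0, 6.0]])         # u(Weather = rainy, Activity)

def expected_utility(rule):
    """rule[f, a] = p(Activity = a | Forecast = f); sums over all joint configurations."""
    return float(np.einsum("w,wf,fa,wa->", p_w, p_f_given_w, rule, happiness))

# value[f, a] = sum_w p(w) p(f | w) u(w, a): the payoff of activity a given forecast f.
value = np.einsum("w,wf,wa->fa", p_w, p_f_given_w, happiness)
best_rule = np.eye(2)[value.argmax(axis=1)]   # deterministic rule, one-hot per forecast
meu = expected_utility(best_rule)
uniform_rule = np.full((2, 2), 0.5)           # a randomized rule for comparison
eu_uniform = expected_utility(uniform_rule)
```

With these numbers the best rule is to go outdoors when the forecast says sunny and stay indoors otherwise. Any randomized rule is a mixture of deterministic ones, so it cannot do better.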

[Figure: influence diagram with chance nodes c1, c2, c3, c4, decision nodes d1, d2, d3, and utility node u]
