TRANSCRIPT
-
Probabilistic Graphical Models
Lecture 2: Graphical Models. Belief Propagation
Volkan Cevher, Matthias Seeger
Ecole Polytechnique Fédérale de Lausanne
30/9/2011
(EPFL) Graphical Models 30/9/2011 1 / 30
-
Outline
1 Graphical Models
2 Belief Propagation
-
Graphical Models
Literature
Excellent book about graphical models and belief propagation, written by one of the pioneers in these topics:
Pearl, J. Probabilistic Reasoning in Intelligent Systems (1990)
-
Graphical Models
The Need to Factorize
Variables x_1, x_2, …, x_n

P(x_1) = ∑_{x_2} ··· ∑_{x_n} P(x_1, x_2, …, x_n)

Marginalization: exponential time. Storage: exponential space ⇒ need factorization
Independence? But probabilistic modelling is about dependencies!
Conditional independence: dependencies may have simple structure
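The exponential cost above can be made concrete; a minimal sketch (all weights made up for illustration), computing P(x_1) by brute force from a full joint table over n binary variables, which needs 2^n storage and 2^(n-1) additions per value:

```python
import itertools

# Hypothetical joint over n = 4 binary variables, stored as a full table:
# 2^n entries -- storage and marginalization cost grow exponentially in n.
n = 4
joint = {}
norm = 0.0
for bits in itertools.product([0, 1], repeat=n):
    w = 1.0 + sum(bits)          # arbitrary positive weights for illustration
    joint[bits] = w
    norm += w
for k in joint:
    joint[k] /= norm             # normalize to a proper distribution

# Marginal P(x1) = sum over x2..xn of P(x1, x2, ..., xn).
p_x1 = {0: 0.0, 1: 0.0}
for bits, p in joint.items():
    p_x1[bits[0]] += p
print(p_x1[0] + p_x1[1])  # ≈ 1.0
```

Doubling n squares the table size, which is exactly why a factorization is needed.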
-
Graphical Models
Towards Bayesian Networks: Tracking a fly
Path pretty random
Positions not independent, but conditionally independent (Markovian)
Remember

P(x_1, …, x_n) = P(x_1) P(x_2 | x_1) ··· P(x_n | x_{n-1}, …, x_1) ?

Here: P(x_n | x_{n-1}, …, x_1) = P(x_n | x_{n-1}) ⇒ linear storage
Causal factorization ⇒ Bayesian networks
-
Graphical Models
Bayesian Networks (Directed Graphical Models)
Causal factorization:

P(x_1, …, x_n) = ∏_{i=1}^{n} P(x_i | x_{π_i})

Bayesian network (aka directed graphical model, aka causal network):
Graphical representation of ancestry [DAG]
P(x_i | x_{π_i}): conditional probability table (CPT)
Conditional Independence
A ⊥ B | C ⇔ P(A, B | C) = P(A | C) P(B | C) ⇔ P(A | C, B) = P(A | C)
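The causal factorization can be evaluated directly from CPTs; a minimal sketch with a hypothetical two-node network A → B (all numbers made up):

```python
# Tiny Bayesian network over binary variables: A -> B, so pi_B = {A}.
# CPTs stored as plain dictionaries; the numbers are purely illustrative.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.9, 1: 0.1},   # P(B | A=0)
               1: {0: 0.3, 1: 0.7}}   # P(B | A=1)

def joint(a, b):
    """Causal factorization: P(A, B) = P(A) * P(B | A)."""
    return P_A[a] * P_B_given_A[a][b]

# Because each factor is a proper CPT, the product sums to 1 automatically:
total = sum(joint(a, b) for a in (0, 1) for b in (0, 1))
print(total)  # ≈ 1.0
```

Storage is linear in the number of CPT entries rather than exponential in the number of variables.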
-
Graphical Models
Did It Rain Tonight?
-
Graphical Models
Monty Hall Problem
Let's make a deal!
Door with car (hidden): D
Your first choice (remains closed): F
Host opens a door with a goat: H ≠ F, D
Do you switch?
"Intuition": fifty-fifty. F, H give no information. He would be stupid, wouldn't he?
-
Graphical Models
Winning with Bayes (I)
[Figure: Bayes net with nodes D (door with car), F (first choice), H (host opens), I (D = F?)]
Intuition "H does not tell anything" is correct in principle. But about what?
Add latent I = I{D=F} = I{first choice correct}
Gut feeling: F, H give no information about I. "He will not tell me whether I am correct." P(I | F, H) = P(I). Will use Bayes to see that.
OK, but then P(switch wins) = P(I = 0 | F, H) = P(I = 0) = 2/3!
Bayes makes you switch and double your chance of winning!
-
Graphical Models
Winning with Bayes (II)
[Figure: Bayes net with nodes D (door with car), F (first choice), H (host opens), I (D = F?)]
To show: P(I | H, F) = P(I).
P(I | F) = P(I), because D, F independent.
P(I | H, F) = P(I | F) ⇔ I ⊥ H | F ⇔ H ⊥ I | F ⇔ P(H | I, F) = P(H | F) (independence is symmetric)
P(H | F, I = 1) = (1/2) I{H ≠ F}: if F = D, the host picks a random goat door
P(H | F, I = 0) = (1/2) I{H ≠ F}: D, F independent, and H ≠ D, F
Working with Graphical Models
Intermediate between lots of head scratching and doing all the sums
Powerful division of inference into manageable, local steps
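The conclusion P(switch wins) = 2/3 is easy to check empirically; a minimal simulation sketch (helper name `monty_trial` is hypothetical):

```python
import random

def monty_trial(rng):
    """One round: returns (I, switch_wins), with I = 1 iff first choice correct."""
    doors = [0, 1, 2]
    d = rng.choice(doors)                       # door with car (hidden)
    f = rng.choice(doors)                       # first choice, independent of d
    goats = [h for h in doors if h != f and h != d]
    h = rng.choice(goats)                       # host opens a goat door, h != f, d
    switch = [x for x in doors if x != f and x != h][0]
    return int(d == f), int(switch == d)

rng = random.Random(0)
trials = [monty_trial(rng) for _ in range(100000)]
p_switch_wins = sum(w for _, w in trials) / len(trials)
print(p_switch_wins)  # close to 2/3
```

This matches the Bayes argument above: switching wins exactly when I = 0, which has probability 2/3.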
-
Graphical Models
Why Graphical Models?
1 Easy way of communicating ideas about dependencies, models
2 Precise semantics: conditional independence constraints on distributions. Efficient algorithms for testing these
3 Lead to large savings in computations (belief propagation)
-
Graphical Models
Graphical Models in Practice
Dependency structures, and efficient ways to propagate information or constraints, are fundamental.
Coding / Information Theory
LDPC codes and BP decoding revolutionized this field (resurrection of Gallager codes)
Used from deep space communication (Mars rovers) over satellite transmission to CD players / hard drives
Courtesy MacKay: Information Theory . . . (2003)
-
Graphical Models
Graphical Models in Practice
Dependency structures, and efficient ways to propagate information or constraints, are fundamental.
Expert systems done right
QMR-DT: invert causal network for helping medical diagnoses
Hugin: advanced decision support (Lauritzen)
http://www.hugin.com/
Promedas: medical diagnostic advisory system (SNN Nijmegen)
http://www.promedas.nl/
-
Graphical Models
Graphical Models in Practice
Dependency structures, and efficient ways to propagate information or constraints, are fundamental.
Computer Vision: Markov Random Fields
Denoising, super-resolution, restoration (early work by Besag)
Depth / reconstruction from stereo, matching, correspondences
Segmentation, matting, blending, stitching, inpainting, …
Courtesy MSR
-
Graphical Models
Conditional Independence Semantics
Graphical model formally equivalent to a long (finite) list of conditional independence constraints:
x_{A_1} ⊥ x_{B_1} | x_{C_1}, x_{A_2} ⊥ x_{B_2} | x_{C_2}, … Which do you prefer?
Graphs are not just simpler for us:
Linear-time algorithm to test such constraints (Bayes ball)
Distribution consistent with graph iff all CI constraints are met. P(x_1) P(x_2) ··· P(x_n): consistent with all graphs
How do I see whether x_A ⊥ x_B | x_C from the graph?
Graph separation: if all paths A ↔ B are blocked by C
For Bayesian networks (directed graphical models): d-separation ⇒ You'll find out in the exercises!
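For undirected graphs, the separation test is just reachability while avoiding the blocker set; a minimal sketch (function name and example graph made up; note that directed graphs need d-separation instead, which this does not implement):

```python
from collections import deque

def separated(adj, A, B, C):
    """Undirected graph separation: True iff no path from A to B avoids C."""
    seen = set(A) - set(C)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        if u in B:
            return False          # reached B without crossing C
        for v in adj.get(u, ()):
            if v not in seen and v not in C:
                seen.add(v)
                queue.append(v)
    return True

# Chain 1 - 2 - 3: node 2 separates 1 from 3 (made-up example).
adj = {1: [2], 2: [1, 3], 3: [2]}
print(separated(adj, {1}, {3}, {2}))   # True:  x1 ⊥ x3 | x2
print(separated(adj, {1}, {3}, set())) # False: x1, x3 dependent marginally
```

The search visits each edge at most once, giving the linear-time test mentioned above.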
-
Graphical Models
Undirected Graphical Models (Markov Random Fields)
Bayesian Networks: describe CIs with directed graphs (DAGs)
Markov Random Fields: describe CIs with undirected graphs
CI semantics of undirected models: really just graph separation
-
Graphical Models
Undirected Graphical Models (II)
Why two frameworks?
Each can capture setups the other cannot
More important: in practice, some problems are much easier to parameterize (therefore: to learn) as MRFs, others much easier as Bayes nets
What do distributions P for an MRF graph G look like? Hammersley / Clifford:
Maximal cliques (completely connected parts) C_j of G. P(x) consistent with MRF G ⇔

P(x) = Z^{-1} ∏_j φ_j(x_{C_j}),  Z := ∑_x ∏_j φ_j(x_{C_j})

with potentials φ_j(x_{C_j}) ≥ 0. Z: partition function. Potentials need not normalize to 1.
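The Hammersley/Clifford form can be checked on a toy model; a minimal sketch for a hypothetical chain MRF 1 - 2 - 3 with made-up clique potentials:

```python
import itertools

# Pairwise MRF on binary variables; graph 1 - 2 - 3 has maximal cliques
# {1,2} and {2,3}. Potentials are nonnegative but do NOT normalize to 1.
phi12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
phi23 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

# Z = sum_x prod_j phi_j(x_Cj): the partition function.
Z = sum(phi12[(x1, x2)] * phi23[(x2, x3)]
        for x1, x2, x3 in itertools.product((0, 1), repeat=3))

def P(x1, x2, x3):
    """P(x) = Z^{-1} * phi12(x1, x2) * phi23(x2, x3)."""
    return phi12[(x1, x2)] * phi23[(x2, x3)] / Z

total = sum(P(*x) for x in itertools.product((0, 1), repeat=3))
print(total)  # ≈ 1.0
```

Note that computing Z by brute force is exponential in n in general; that is where belief propagation comes in later.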
-
Graphical Models
Undirected Graphical Models (III)
-
Graphical Models
Directed vs. Undirected
Sampling x ∼ P(x): always simple from a Bayes net; can be very hard for an MRF
Implicit, symmetrical knowledge? Little idea about causal links (pixels of an image, correspondences)? MRFs more useful then
Bottom line: usually, one or the other is much more suitable. Better know both well!
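Why sampling a Bayes net is simple: visit variables in topological order and draw each from its CPT given the already-sampled parents (ancestral sampling). A minimal sketch for a hypothetical net A → B (all numbers made up):

```python
import random

# Ancestral sampling from a tiny Bayes net A -> B.
P_A = {1: 0.25}                       # P(A = 1); made-up number
P_B_given_A = {0: 0.9, 1: 0.2}        # P(B = 1 | A); made-up numbers

def sample(rng):
    # Topological order: A first, then B given the sampled A.
    a = int(rng.random() < P_A[1])
    b = int(rng.random() < P_B_given_A[a])
    return a, b

rng = random.Random(1)
n = 200000
freq_a = sum(sample(rng)[0] for _ in range(n)) / n
print(abs(freq_a - 0.25) < 0.01)  # True
```

No such ordering exists for an MRF: its potentials are symmetric, which is exactly why exact sampling can be hard there.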
-
Belief Propagation
Towards Efficient Marginalization
With sufficient Markovian CI constraints (directed or undirected):

P(x_1, …, x_n) ∝ ∏_j φ_j(x_{N_j}),  |N_j| ≪ n

Can store that. But what about computation?
Short answer: it depends on global graph structure properties, beyond the local factorization
Storage: linear in n. Computation: exponential in n^{1/2} [P ≠ NP]
-
Belief Propagation
Node Elimination
Chain: P(x_1, …, x_7) = φ_1(x_1, x_2) φ_2(x_2, x_3) ··· φ_6(x_6, x_7)

∑_{x_4} P(x_1, …, x_4, …, x_7)
= φ_1(x_1, x_2) φ_2(x_2, x_3) ( ∑_{x_4} φ_3(x_3, x_4) φ_4(x_4, x_5) ) φ_5(x_5, x_6) φ_6(x_6, x_7)
= φ_1(x_1, x_2) φ_2(x_2, x_3) M_{35}(x_3, x_5) φ_5(x_5, x_6) φ_6(x_6, x_7)
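For binary variables, eliminating x_4 is literally one matrix product; a minimal sketch with random (seeded, purely illustrative) potential tables, checked against brute force:

```python
import numpy as np

# Chain phi1(x1,x2) ... phi6(x6,x7) over binary variables.
rng = np.random.default_rng(0)
phi = [rng.random((2, 2)) for _ in range(6)]   # phi[i] = phi_{i+1}(x_{i+1}, x_{i+2})

# M35(x3, x5) = sum_{x4} phi3(x3, x4) * phi4(x4, x5): a single matrix product.
M35 = phi[2] @ phi[3]

def weight(x):
    """Unnormalized chain weight prod_i phi_i(x_i, x_{i+1})."""
    return np.prod([phi[i][x[i], x[i + 1]] for i in range(6)])

# Compare at one configuration of the remaining variables:
x = [0, 1, 0, None, 1, 0, 1]
brute = sum(weight(x[:3] + [x4] + x[4:]) for x4 in (0, 1))
elim = phi[0][0, 1] * phi[1][1, 0] * M35[0, 1] * phi[4][1, 0] * phi[5][0, 1]
print(np.isclose(brute, elim))  # True
```

Each elimination touches only the factors containing x_4, which is the key to the linear-time algorithms below.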
-
Belief Propagation
Tree Graphs
-
Belief Propagation
Factor Graphs
Factor graphs: yet another type of graphical model
Bipartite graph: variable / factor nodes
No probability semantics; just for deriving Markovian propagation algorithms
Factor graph = tree ⇒ fast computation
Undirected GM → factor graph: immediate
Directed GM → factor graph: easy exercise
-
Belief Propagation
Towards Belief Propagation
-
Belief Propagation
What is a Message?
-
Belief Propagation
What is a Message?
Formally: a directed potential over one variable
Intuition: message T_2 → a: what T_2 thinks x_a should be
Naive "definition":
Product: everything in T_2, and the edge → a
Sum: over all variables except x_a
⇒ Real definition is recursive (G a tree!)
Subtle points:
Messages are not conditional / marginal distributions of P. Message μ_{T_2→a}(x_a) has seen T_2 only
Strictly speaking, there are two types of messages: variable → factor and factor → variable
⇒ Understand the idea behind the formalities
-
Belief Propagation
Message Passing: The Recipe
Message μ_{T→a}(x_a) to a:
1 Expand factor: (T_1, b_1), (T_2, b_2)
2 Product: gather potentials. φ_j(x_a, x_{b_1}, x_{b_2}); all μ_{·→b_1}(x_{b_1}) except the one from φ_j; all μ_{·→b_2}(x_{b_2}) ditto
3 Sum: over x_{b_1}, x_{b_2}

μ_{T→a}(x_a) ∝ ∑_{x_{b_1}, x_{b_2}} φ_j(x_a, x_{b_1}, x_{b_2}) ( ∏_{T̃ ∈ T_1 \ b_1} μ_{T̃→b_1}(x_{b_1}) ) ( ∏_{T̃ ∈ T_2 \ b_2} μ_{T̃→b_2}(x_{b_2}) )
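One recipe step is a product followed by a sum; a minimal sketch for a single hypothetical factor φ_j(x_a, x_{b_1}, x_{b_2}) over binary variables, with made-up incoming message products:

```python
import numpy as np

# Product-then-sum for one factor-to-variable message, binary variables.
rng = np.random.default_rng(3)
phi_j = rng.random((2, 2, 2))    # axes: (x_a, x_b1, x_b2); illustrative table
mu_b1 = np.array([0.7, 0.3])     # product of messages into b1, except from phi_j
mu_b2 = np.array([0.4, 0.6])     # same for b2 (numbers made up)

# Product: phi_j * mu_b1 * mu_b2; Sum: over x_b1 and x_b2.
msg = np.einsum('abc,b,c->a', phi_j, mu_b1, mu_b2)
msg /= msg.sum()                 # messages may be normalized at will
print(msg.shape)  # (2,)
```

The `einsum` call is exactly the displayed equation: indices b and c (for x_{b_1}, x_{b_2}) appear only on the right-hand factors, so they are multiplied in and summed out, leaving a potential over x_a alone.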
-
Belief Propagation
Message Passing: The Idea
I can never remember these message passing equations. What I remember:
Messages are partial information, given part of the graph
Message passing is information propagation
Product: predict. Sum: marginalize (cover your tracks)
Directed, like filtering ⇒ we'll see cases where the ∏ is not a plain product, and the ∑ is not a plain sum (e.g., max-product)
Marginal distributions (our goal!) are obtained by combining messages ↔ combining information from all parts
MP works on trees, because information cannot go around in cycles
-
Belief Propagation
Belief Propagation: More than Node Elimination
Marginalization by message passing:

P(x_a) = ∑_{x \ x_a} P(x) ∝ φ_a(x_a) ∏_{j ∈ N_a} μ_{T_j→a}(x_a)

N_a: factor nodes neighbouring a ↔ factors φ_j(x_a, …). φ_a: can be ≡ 1
All marginals P(x_1), P(x_2), …? Do this n times. Right?
⇒ NO! Do this twice only!
⇒ If you understand that, you've understood belief propagation

Belief Propagation on Trees
Message uniquely defined, independent of use and order of computation
Message can be computed once all its inputs are received. Once computed, it does not change anymore
Compute all messages (2 per edge) ⇒ all marginals, O(1) each
-
Belief Propagation
Implementation of Belief Propagation
Belief Propagation (Sum-Product) on Trees
1 Designate a node (any will do!) as root
2 Inward pass: compute messages leaves → root
3 Outward pass: compute messages root → leaves
Messages can be normalized at will:

μ_{T→a}(x_a) = C ∑_{x_{C_j} \ x_a} φ_j(x_{C_j}) ∏ μ_{T̃→b_1}(x_{b_1}) ···
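The two-pass schedule can be written out explicitly for a tiny tree; a minimal sketch on a hypothetical 3-variable chain (a tree) with made-up pairwise potentials, with marginals checked against brute force:

```python
import numpy as np

# Sum-product on the chain x1 - phi12 - x2 - phi23 - x3, binary states.
rng = np.random.default_rng(7)
phi12, phi23 = rng.random((2, 2)), rng.random((2, 2))  # axes (x1,x2), (x2,x3)

# Root the tree at x3. Inward pass (leaves -> root):
m1_to_12 = np.ones(2)            # leaf variable sends the all-ones message
m12_to_2 = phi12.T @ m1_to_12    # factor-to-variable: sum out x1
m2_to_23 = m12_to_2              # x2 forwards its only other incoming message
m23_to_3 = phi23.T @ m2_to_23

# Outward pass (root -> leaves):
m3_to_23 = np.ones(2)
m23_to_2 = phi23 @ m3_to_23
m2_to_12 = m23_to_2
m12_to_1 = phi12 @ m2_to_12

# Marginals: product of all incoming messages at a variable, normalized.
p1 = m12_to_1 / m12_to_1.sum()
p2 = m12_to_2 * m23_to_2
p2 /= p2.sum()
p3 = m23_to_3 / m23_to_3.sum()

# Brute-force check against the normalized joint:
joint = np.einsum('ab,bc->abc', phi12, phi23)
joint /= joint.sum()
print(np.allclose(p2, joint.sum(axis=(0, 2))))  # True
```

Two passes, two messages per edge, and every marginal falls out at O(1) extra cost, as claimed above.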
-
Belief Propagation
Implementation of Belief Propagation
Avoiding underflow / overflow (yes, it does matter):
Renormalize each message to sum to 1
Better: work in the log domain (log-messages, log-potentials): ∏ → +, ∑ → logsumexp [careful with zeros!]

logsumexp(v) := log ∑_{i=1}^{k} e^{v_i} = M + log ∑_{i=1}^{k} e^{v_i − M}  (numerically stable),  M = max_i v_i
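The shifted form is a few lines of code; a minimal sketch (SciPy users can instead reach for `scipy.special.logsumexp`):

```python
import math

def logsumexp(v):
    """log(sum(exp(v_i))), computed stably by shifting with M = max(v)."""
    M = max(v)
    if M == float('-inf'):        # careful with zeros: log 0 = -inf
        return float('-inf')
    return M + math.log(sum(math.exp(x - M) for x in v))

# Naive evaluation of exp(1000) overflows; the shifted form does not:
v = [1000.0, 1000.0]
print(logsumexp(v))  # 1000 + log 2 ≈ 1000.6931
```

The subtraction of M guarantees every exponent is ≤ 0, so no term can overflow, and the largest term contributes exp(0) = 1 exactly.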
-
Belief Propagation
Searching for the Mode: Max-Product
Decoding:

x* ∈ argmax_x P(x)

(max, ∏): same decomposition as (∑, ∏). Better: (max, +) in the log domain
Max-messages:

μ_{T→a}(x_a) = max_{x_{C_j} \ x_a} ( log φ_j(x_{C_j}) + ∑_{b ∈ C_j \ a} μ_{T_b→b}(x_b) )

Back-pointer tables:

δ_{T→a}(x_a) ∈ argmax_{x_{C_j} \ x_a} ( log φ_j(x_{C_j}) + ∑_{b ∈ C_j \ a} μ_{T_b→b}(x_b) )
-
Belief Propagation
Wrap-Up
Belief propagation (sum-product) on trees: all marginals in linear time, by local information propagation
Max-product, max-sum, logsumexp-sum, …: what matters is the graph!
What about general graphs?
Decomposable graphs. Treewidth of a graph
Junction tree algorithm
Interested?
PMR Edinburgh slides: http://www.inf.ed.ac.uk/teaching/courses/pmr/slides/jta-2x2.pdf
Lauritzen, S.; Spiegelhalter, D. Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems. JRSS-B, 50: 157-224 (1988)
Beware (not surprising): inference on general graphs is NP-hard. In general, approximations are a must