Global Approximate Inference
Eran Segal
Weizmann Institute
General Approximate Inference Strategy
Define a class of simpler distributions Q
Search for a particular instance in Q that is “close” to P
Answer queries using inference in Q
Cluster Graph
A cluster graph $K$ for factors $F$ is an undirected graph:
Nodes are associated with a subset of variables $C_i \subseteq U$
The graph is family preserving: each factor $\phi \in F$ is associated with one node $C_i$ such that $\mathrm{Scope}[\phi] \subseteq C_i$
Each edge $C_i$–$C_j$ is associated with a sepset $S_{i,j} = C_i \cap C_j$
A cluster tree over factors $F$ that satisfies the running intersection property is called a clique tree
Clique Tree Inference
Example: student network over variables C, D, I, S, G, L, J, H
Clique tree: [C,D] – [G,I,D] – [G,S,I] – [G,J,S,L] – [H,G,J] (cliques 1-5), with sepsets D, {G,I}, {G,S}, {G,J}
Initial factor assignment: $P(C)P(D|C)$, $P(G|I,D)$, $P(I)P(S|I)$, $P(L|G)P(J|L,S)$, $P(H|G,J)$ to cliques 1-5 respectively
Verify:
Tree and family preserving
Running intersection property
Message Passing: Belief Propagation
Initialize the clique tree:
For each clique $C_i$ set $\beta_i \leftarrow \prod_{\phi : \alpha(\phi) = i} \phi$
For each edge $C_i$–$C_j$ set $\mu_{i,j} \leftarrow 1$
While unset edges exist:
Select $C_i$–$C_j$
Send message from $C_i$ to $C_j$:
Marginalize the clique over the sepset: $\sigma_{i \to j} \leftarrow \sum_{C_i - S_{i,j}} \beta_i$
Update the belief at $C_j$: $\beta_j \leftarrow \beta_j \cdot \sigma_{i \to j} / \mu_{i,j}$
Update the sepset at $C_i$–$C_j$: $\mu_{i,j} \leftarrow \sigma_{i \to j}$
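The update loop above can be sketched in code. Below is a minimal sketch of belief-update message passing on a two-clique tree C1 = {A,B}, C2 = {B,C}; the binary variables, random table factors, and the `send` helper are illustrative assumptions, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
phi_AB = rng.random((2, 2))          # factor assigned to C1, axes (A, B)
phi_BC = rng.random((2, 2))          # factor assigned to C2, axes (B, C)

beta1, beta2 = phi_AB.copy(), phi_BC.copy()   # initial clique beliefs
mu = np.ones(2)                               # sepset S_{1,2} = {B}

def send(beta_src, beta_dst, mu, axis_to_sum, dst_axis):
    """Marginalize the source clique over the sepset and update the target."""
    sigma = beta_src.sum(axis=axis_to_sum)        # sigma_{i->j}
    shape = [1, 1]
    shape[dst_axis] = sigma.size
    beta_dst = beta_dst * (sigma / mu).reshape(shape)
    return beta_dst, sigma                        # new belief, new sepset

beta2, mu = send(beta1, beta2, mu, axis_to_sum=0, dst_axis=0)  # C1 -> C2
beta1, mu = send(beta2, beta1, mu, axis_to_sum=1, dst_axis=1)  # C2 -> C1

# After one pass in each direction the tree is calibrated:
# both cliques agree on the (unnormalized) marginal of B.
print(np.allclose(beta1.sum(axis=0), beta2.sum(axis=1)))  # True
```

After one pass in each direction both cliques agree on the sepset marginal, which is the calibration property the following slides rely on.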
Clique Tree Invariant
Belief propagation can be viewed as reparameterizing the joint distribution
Upon calibration we showed: $P_F(U) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(i,j) \in T} \mu_{i,j}[S_{i,j}]}$
Initially this invariant holds, since $\beta_i = \prod_{\phi : \alpha(\phi) = i} \phi$ and $\mu_{i,j} = 1$, so the ratio equals $\prod_{\phi \in F} \phi = \tilde P_F(U)$
At each update step the invariant is also maintained: a message only changes $\beta_j$ and $\mu_{i,j}$, so most terms remain unchanged
We need to show $\frac{\beta_j'}{\mu_{i,j}'} = \frac{\beta_j}{\mu_{i,j}}$, i.e. $\beta_j' = \beta_j \cdot \frac{\mu_{i,j}'}{\mu_{i,j}} = \beta_j \cdot \frac{\sigma_{i \to j}}{\mu_{i,j}}$
But this is exactly the message passing step
Belief propagation reparameterizes $P$ at each step
Global Approximate Inference
Inference as optimization
Generalized Belief Propagation: define algorithm; constructing cluster graphs; analyze approximation guarantees
Propagation with approximate messages: factorized messages; approximate message propagation
Structured variational approximations
The Energy Functional
Suppose we want to approximate $P$ with $Q$
Represent $P_F$ by factors: $P_F(U) = \frac{1}{Z} \prod_{\phi \in F} \phi$, with $\tilde P_F(U) = \prod_{\phi \in F} \phi$
Define the energy functional: $F[P_F, Q] = E_Q[\ln \tilde P_F(U)] + H_Q(U)$
Then:
$D(Q \| P_F) = E_Q[\ln Q(U)] - E_Q[\ln P_F(U)] = -H_Q(U) - E_Q[\ln \tilde P_F(U)] + \ln Z = \ln Z - F[P_F, Q]$
Minimizing $D(Q \| P_F)$ is equivalent to maximizing $F[P_F, Q]$
$\ln Z \geq F[P_F, Q]$ (since $D(Q \| P_F) \geq 0$)
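The identity on this slide can be checked numerically. A small sketch for a single binary variable with two factors; all numeric values are made up for illustration.

```python
import numpy as np

phi1 = np.array([0.6, 1.4])          # factor over X
phi2 = np.array([2.0, 0.5])          # second factor over X
p_tilde = phi1 * phi2                # unnormalized P~_F
Z = p_tilde.sum()
p = p_tilde / Z                      # P_F

q = np.array([0.3, 0.7])             # an arbitrary approximating Q

energy = (q * np.log(p_tilde)).sum()     # E_Q[ln P~_F]
entropy = -(q * np.log(q)).sum()         # H_Q(U)
F = energy + entropy                     # the energy functional F[P_F, Q]
D = (q * np.log(q / p)).sum()            # D(Q || P_F)

print(np.isclose(np.log(Z), F + D))      # True, and F <= ln Z
```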
Inference as Optimization
We show that inference can be viewed as maximizing the energy functional $F[P_F, Q]$:
Define a distribution $Q$ over clique potentials
Transform $F[P_F, Q]$ to an equivalent factored form $F'[P_F, Q]$
Show that if $Q$ maximizes $F'[P_F, Q]$ subject to constraints under which $Q$ represents calibrated potentials, then there exist factors that satisfy the inference message passing equations
Defining Q
Recall that throughout BP: $P_F(U) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(i,j) \in T} \mu_{i,j}[S_{i,j}]}$
Define $Q = \{\beta_i : C_i \in T\} \cup \{\mu_{i,j} : (C_i\text{–}C_j) \in T\}$ as a reparameterization of $P_F$ such that $Q(U) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(i,j) \in T} \mu_{i,j}[S_{i,j}]}$
Since $D(Q \| P_F) = 0$, we show that calibrating $Q$ is equivalent to maximizing $F[P_F, Q]$
Factored Energy Functional
Define the factored energy functional as: $F'[P_F, Q] = \sum_{C_i \in T} \left( E_{\beta_i}[\ln \psi_i^0] + H_{\beta_i}(C_i) \right) - \sum_{(i,j) \in T} H_{\mu_{i,j}}(S_{i,j})$, where $\psi_i^0$ is the initial potential of $C_i$
Theorem: if $Q$ is a set of calibrated potentials for $T$, then $F[P_F, Q] = F'[P_F, Q]$
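The theorem can be spot-checked numerically on a two-clique chain, where calibrated beliefs are exact marginals; the clique structure, factors, and helper below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
psi1 = rng.random((2, 2))    # initial potential psi1(A,B) of clique 1
psi2 = rng.random((2, 2))    # initial potential psi2(B,C) of clique 2

p_tilde = psi1[:, :, None] * psi2[None, :, :]   # P~_F over (A,B,C)
Z = p_tilde.sum()
P = p_tilde / Z

# Calibrated (normalized) beliefs: exact marginals of P, Q = P here
beta1 = P.sum(axis=2)        # P(A,B)
beta2 = P.sum(axis=0)        # P(B,C)
mu = P.sum(axis=(0, 2))      # sepset belief P(B)

def H(p):
    """Entropy of a (possibly multi-dimensional) distribution table."""
    return -(p * np.log(p)).sum()

F = (P * np.log(p_tilde)).sum() + H(P)               # F[P_F, Q] with Q = P
Fp = (beta1 * np.log(psi1)).sum() + (beta2 * np.log(psi2)).sum() \
     + H(beta1) + H(beta2) - H(mu)                   # factored form F'

print(np.isclose(F, Fp), np.isclose(F, np.log(Z)))   # True True
```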
Inference as Optimization
Optimization task: find $Q$ that maximizes $F'[P_F, Q]$ subject to
$\mu_{i,j}[S_{i,j}] = \sum_{C_i - S_{i,j}} \beta_i[C_i]$ for each $(C_i\text{–}C_j) \in T$
$\sum_{C_i} \beta_i[C_i] = 1$ for each $C_i \in T$
Theorem: the fixed points of the above optimization satisfy $\delta_{i \to j} = \sum_{C_i - S_{i,j}} \psi_i^0 \prod_{k \in N_i - \{j\}} \delta_{k \to i}$, with $\beta_i \propto \psi_i^0 \prod_{j \in N_i} \delta_{j \to i}$ and $\mu_{i,j} = \delta_{i \to j} \cdot \delta_{j \to i}$
Suggests an iterative optimization procedure
Identical to belief propagation!
Global Approximate Inference
Inference as optimization
Generalized Belief Propagation: define algorithm; constructing cluster graphs; analyze approximation guarantees
Propagation with approximate messages: factorized messages; approximate message propagation
Structured variational approximations
Generalized Belief Propagation
Perform belief propagation in a cluster graph with loops
Strategy:
Bayesian network over A, B, C, D
Cluster tree: [A,B,D] – [B,C,D] with sepset {B,D}
Cluster graph: [A,B] – [B,C] – [C,D] – [A,D] in a loop, with sepsets B, C, D, A
Generalized Belief Propagation
Perform belief propagation in a cluster graph with loops
Cluster graph: [A,B] – [B,C] – [C,D] – [A,D] in a loop, with sepsets B, C, D, A
Inference may be incorrect: double counting of evidence
Unlike in BP on trees:
Convergence is not guaranteed
Potentials in the calibrated graph are not guaranteed to be marginals of P
Generalized Cluster Graph
Recall: a cluster graph $K$ for factors $F$ is an undirected graph whose nodes are associated with subsets of variables $C_i \subseteq U$, that is family preserving (each factor $\phi \in F$ is associated with one node $C_i$ such that $\mathrm{Scope}[\phi] \subseteq C_i$), and whose edges $C_i$–$C_j$ carry the sepset $S_{i,j} = C_i \cap C_j$
A generalized cluster graph $K$ for factors $F$ is an undirected graph:
Nodes are associated with a subset of variables $C_i \subseteq U$
The graph is family preserving: each factor $\phi \in F$ is associated with one node $C_i$ such that $\mathrm{Scope}[\phi] \subseteq C_i$
Each edge $C_i$–$C_j$ is associated with a subset $S_{i,j} \subseteq C_i \cap C_j$
Generalized Cluster Graph
A generalized cluster graph obeys the running intersection property if for each $X \in C_i$ and $X \in C_j$, there is exactly one path between $C_i$ and $C_j$ for which $X \in S$ for each sepset $S$ along the path
Equivalently: all edges associated with $X$ form a tree that spans all the clusters that contain $X$
Note: some of these clusters may be connected by more than one path
Example: [A,B] – [B,C] – [C,D] – [A,D] in a loop, with sepsets B, C, D, A
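The spanning-tree form of the running intersection property is mechanical to check. A sketch; the cluster graph below is the four-cluster loop from this slide, and `runs_intersection` is a hypothetical helper name.

```python
clusters = {1: {'A', 'B'}, 2: {'B', 'C'}, 3: {'C', 'D'}, 4: {'A', 'D'}}
sepsets = {(1, 2): {'B'}, (2, 3): {'C'}, (3, 4): {'D'}, (4, 1): {'A'}}

def runs_intersection(clusters, sepsets):
    """For every variable X, the edges whose sepset contains X must form
    a tree spanning all clusters that contain X."""
    variables = set().union(*clusters.values())
    for x in variables:
        nodes = {c for c, scope in clusters.items() if x in scope}
        edges = [e for e, s in sepsets.items() if x in s]
        # a tree over `nodes` has exactly |nodes| - 1 edges ...
        if len(edges) != len(nodes) - 1:
            return False
        # ... and is connected (simple reachability sweep)
        reached = {next(iter(nodes))}
        changed = True
        while changed:
            changed = False
            for i, j in edges:
                if (i in reached) != (j in reached):
                    reached |= {i, j}
                    changed = True
        if reached != nodes:
            return False
    return True

print(runs_intersection(clusters, sepsets))  # True
```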
Calibrated Cluster Graph
A generalized cluster graph is calibrated if for each edge $C_i$–$C_j$ we have: $\sum_{C_i - S_{i,j}} \beta_i[C_i] = \sum_{C_j - S_{i,j}} \beta_j[C_j]$
This is weaker than in clique trees, since $S_{i,j}$ is only a subset of the intersection between $C_i$ and $C_j$
If the cluster graph satisfies the running intersection property, then the marginal on any variable $X$ is the same in every cluster that contains $X$
GBP is Efficient
Markov grid network over $X_{11}, \ldots, X_{33}$
Cluster graph: one cluster per grid edge, e.g. [X11,X12], [X11,X21], ..., with singleton sepsets such as X12, X22
Note: a clique tree in an n x n grid is exponential in n
A round of GBP is O(n)
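To make the cost contrast concrete, here is a sketch that enumerates the edge clusters of an n x n grid; the function name and representation are assumptions. Every cluster touches only two variables, so each GBP message is constant-size, whereas exact clique-tree cliques grow with the grid width.

```python
def grid_edge_clusters(n):
    """Clusters {X_ij, X_kl}, one per edge of the n x n grid."""
    clusters = []
    for i in range(n):
        for j in range(n):
            if j + 1 < n:
                clusters.append(((i, j), (i, j + 1)))   # horizontal edge
            if i + 1 < n:
                clusters.append(((i + 1, j), (i, j)))   # vertical edge
    return clusters

n = 5
clusters = grid_edge_clusters(n)
print(len(clusters))                   # 2*n*(n-1) = 40 edge clusters
print(max(len(c) for c in clusters))   # every cluster has just 2 variables
```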
Global Approximate Inference
Inference as optimization
Generalized Belief Propagation: define algorithm; constructing cluster graphs; analyze approximation guarantees
Propagation with approximate messages: factorized messages; approximate message propagation
Structured variational approximations
Constructing Cluster Graphs
When constructing clique trees, all constructions give the same result, but differ in computational complexity
In GBP, different cluster graphs can vary in both computational complexity and approximation quality
Transforming Pairwise MNs
A pairwise Markov network over a graph $H$ has:
A set of node potentials $\{\phi[X_i] : i = 1, \ldots, n\}$
A set of edge potentials $\{\phi[X_i, X_j] : (X_i, X_j) \in H\}$
Example: the 3 x 3 grid over $X_{11}, \ldots, X_{33}$ is transformed into a cluster graph with one cluster per edge potential (e.g. [X11,X21], [X12,X22], ...) and one singleton cluster per variable, connected whenever the variable appears in the edge cluster
Transforming Bayesian Networks
Example:
One “large” cluster per CPD
A singleton node per variable
Connect a node and a large cluster if the node's variable is in the CPD
The resulting graph obeys the running intersection property
Example: large clusters [A,B,C], [A,B,D], [B,D,F] and singleton nodes A, B, C, D, F
This construction is the Bethe approximation
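The Bethe construction above is easy to mechanize. A sketch using the slide's example scopes {A,B,C}, {A,B,D}, {B,D,F}; the helper name is hypothetical.

```python
factor_scopes = [{'A', 'B', 'C'}, {'A', 'B', 'D'}, {'B', 'D', 'F'}]

def bethe_cluster_graph(factor_scopes):
    """One large cluster per factor scope, one singleton per variable,
    and an edge (sepset {X}) whenever X appears in the large cluster."""
    variables = sorted(set().union(*factor_scopes))
    large = [frozenset(s) for s in factor_scopes]
    small = [frozenset({v}) for v in variables]
    edges = [(c, frozenset({v})) for c in large for v in variables if v in c]
    return large + small, edges

clusters, edges = bethe_cluster_graph(factor_scopes)
print(len(clusters), len(edges))  # 3 large + 5 singleton clusters, 9 edges
```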
Global Approximate Inference
Inference as optimization
Generalized Belief Propagation: define algorithm; constructing cluster graphs; analyze approximation guarantees
Propagation with approximate messages: factorized messages; approximate message propagation
Structured variational approximations
Generalized Belief Propagation
GBP maintains distribution invariance (since message passing maintains invariance): $P_F(U) = \frac{\prod_{C_i \in K} \beta_i[C_i]}{\prod_{(i,j) \in K} \mu_{i,j}[S_{i,j}]}$
Generalized Belief Propagation
If GBP converges ($K$ is calibrated):
Each subtree $T$ is calibrated, with edge potentials corresponding to marginals of $P_T(U) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(i,j) \in T} \mu_{i,j}[S_{i,j}]}$
(since $P_T(U)$ is a calibrated tree)
Generalized Belief Propagation
Calibrated graph potentials are not $P_F(U)$ marginals
Cluster graph (clusters 1-4 in a loop): $P_F(A,B,C,D) = \frac{\beta_1[A,B]\,\beta_2[B,C]\,\beta_3[C,D]\,\beta_4[A,D]}{\mu_{1,2}[B]\,\mu_{2,3}[C]\,\mu_{3,4}[D]\,\mu_{4,1}[A]}$
Subtree over clusters 1-3: $P_T(A,B,C,D) = \frac{\beta_1[A,B]\,\beta_2[B,C]\,\beta_3[C,D]}{\mu_{1,2}[B]\,\mu_{2,3}[C]}$
so that $P_F = P_T \cdot \frac{\beta_4[A,D]}{\mu_{3,4}[D]\,\mu_{4,1}[A]}$
Thus $\beta_1[A,B] = P_T(A,B) \neq P_F(A,B)$
Inference as Optimization (recap)
Optimization task: find $Q$ that maximizes $F'[P_F, Q]$ subject to
$\mu_{i,j}[S_{i,j}] = \sum_{C_i - S_{i,j}} \beta_i[C_i]$ for each $(C_i\text{–}C_j) \in T$
$\sum_{C_i} \beta_i[C_i] = 1$ for each $C_i \in T$
Theorem: the fixed points of the above optimization satisfy $\delta_{i \to j} = \sum_{C_i - S_{i,j}} \psi_i^0 \prod_{k \in N_i - \{j\}} \delta_{k \to i}$
Suggests an iterative optimization procedure
Identical to belief propagation!
GBP as Optimization
Optimization task: find $Q$ that maximizes $F'[P_F, Q]$ subject to
$\mu_{i,j}[S_{i,j}] = \sum_{C_i - S_{i,j}} \beta_i[C_i]$ for each $(C_i\text{–}C_j) \in K$
$\sum_{C_i} \beta_i[C_i] = 1$ for each $C_i \in K$
Theorem: the fixed points of the above optimization satisfy $\delta_{i \to j} = \sum_{C_i - S_{i,j}} \psi_i^0 \prod_{k \in N_i - \{j\}} \delta_{k \to i}$
Note: $S_{i,j}$ is only a subset of the intersection between $C_i$ and $C_j$
The iterative optimization procedure is GBP
GBP as Optimization
Clique trees:
$F[P_F,Q] = F'[P_F,Q]$
Iterative procedure (BP) guaranteed to converge
Convergence point represents marginal distributions of $P_F$
Cluster graphs:
$F[P_F,Q] = F'[P_F,Q]$ does not hold!
Iterative procedure (GBP) not guaranteed to converge
Convergence point does not represent marginal distributions of $P_F$
GBP in Practice
Dealing with non-convergence:
Often small portions of the network do not converge: stop inference and use the current beliefs
Use intelligent message passing scheduling:
Tree reparameterization (TRP) selects entire trees and calibrates them while keeping all other beliefs fixed
Focus attention on uncalibrated regions of the graph
Global Approximate Inference
Inference as optimization
Generalized Belief Propagation: define algorithm; constructing cluster graphs; analyze approximation guarantees
Propagation with approximate messages: factorized messages; approximate message propagation
Structured variational approximations
Propagation with Approximate Messages
General idea:
Perform BP (or GBP) as before, but propagate messages that are only approximate
Modular approach:
The general inference scheme remains the same
Many different approximate message computations can be plugged in
Factorized Messages
Markov network: the 3 x 3 grid over $X_{11}, \ldots, X_{33}$
Clique tree with cliques 1, 2, 3
Keep the internal structure within the clique tree cliques
Calibration involves sending messages that are joint over three variables
Idea: simplify messages using a factored representation
Example: $\delta_{1 \to 2}[X_{11}, X_{21}, X_{31}] \approx \tilde\delta_{1 \to 2}[X_{11}] \cdot \tilde\delta_{1 \to 2}[X_{21}] \cdot \tilde\delta_{1 \to 2}[X_{31}]$
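The factorization in the example can be sketched as projecting a joint message onto the product of its single-variable marginals; the binary variables and random message below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
delta = rng.random((2, 2, 2))            # delta_{1->2}[X11, X21, X31]
delta /= delta.sum()                     # normalize for readability

# Project onto a fully factored form: one marginal per variable
marg = [delta.sum(axis=tuple(a for a in range(3) if a != k)) for k in range(3)]
delta_tilde = (marg[0][:, None, None]
               * marg[1][None, :, None]
               * marg[2][None, None, :])

# Same shape as the joint message, but stored as three length-2 vectors
print(delta_tilde.shape)                  # (2, 2, 2)
```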
Computational Savings
Answering queries in cluster 2:
Exact inference is exponential in the joint space of cluster 2
Approximate inference with factored messages:
Notice that the subnetwork with factored messages is a tree
Perform efficient exact inference on the subtree to answer queries
$\beta_2 = \psi_2^0 \cdot \tilde\delta_{1 \to 2}[X_{11}, X_{21}, X_{31}] \cdot \tilde\delta_{3 \to 2}[X_{12}, X_{22}, X_{32}]$, with each message kept in its factored form
Factor Sets
A factor set $\varphi = \{\varphi_1, \ldots, \varphi_k\}$ provides a compact representation for the high-dimensional factor $\varphi_1 \cdots \varphi_k$
Belief propagation with factor sets:
Multiplication of factor sets is easy: simply the union of the factors in each factor set
Marginalization of a factor set requires inference in the simplified network
Example: compute $\delta_{2 \to 3} = \sum_{C_2 - S_{2,3}} \left( \psi_2^0 \cdot \tilde\delta_{1 \to 2} \right)$ by inference in the tree-structured subnetwork of cluster 2
Global Approximate Inference
Inference as optimization
Generalized Belief Propagation: define algorithm; constructing cluster graphs; analyze approximation guarantees
Propagation with approximate messages: factorized messages; approximate message propagation
Structured variational approximations
Approximate Message Propagation
Input:
Clique tree (or cluster graph)
Assignment of the original factors to clusters/cliques
The factorized form of each cluster/clique: this can be represented by a network for each edge $C_i$–$C_j$ that specifies the factorization (in the previous examples we assumed an empty network)
Two strategies for approximate message propagation:
Sum-product message passing
Belief update messages
Sum-Product Propagation
Same propagation scheme as in exact inference:
Select a root
Propagate messages towards the root: each cluster collects messages from its neighbors and sends an outgoing message when possible
Propagate messages from the root
Each message passing step performs inference within a cluster
Terminates in a fixed number of iterations
Note: the final marginals at each variable are not exact
Message Passing: Belief Propagation
Same as BP, but with approximate messages
Initialize the clique tree:
For each clique $C_i$ set $\tilde\beta_i \leftarrow \prod_{\phi : \alpha(\phi) = i} \phi$
For each edge $C_i$–$C_j$ set $\tilde\mu_{i,j} \leftarrow 1$
While unset edges exist:
Select $C_i$–$C_j$
Send message from $C_i$ to $C_j$:
Marginalize the clique over the sepset: $\tilde\sigma_{i \to j} \leftarrow \sum_{C_i - S_{i,j}} \tilde\beta_i$, computed by approximate inference; this is where the approximation enters
Update the belief at $C_j$: $\tilde\beta_j \leftarrow \tilde\beta_j \cdot \tilde\sigma_{i \to j} / \tilde\mu_{i,j}$
Update the sepset at $C_i$–$C_j$: $\tilde\mu_{i,j} \leftarrow \tilde\sigma_{i \to j}$
The two message passing schemes differ in where approximate inference is performed
Global Approximate Inference
Inference as optimization
Generalized Belief Propagation: define algorithm; constructing cluster graphs; analyze approximation guarantees
Propagation with approximate messages: factorized messages; approximate message propagation
Structured variational approximations
Structured Variational Approximations
Select a simple family of distributions $\mathbf{Q}$
Find $Q \in \mathbf{Q}$ that maximizes $F[P_F, Q]$
Mean Field Approximation
$Q(\mathbf{X}) = \prod_i Q(X_i)$
$Q$ loses much of the information of $P_F$
But the approximation is computationally attractive:
Every query in $Q$ is simple to compute
$Q$ is easy to represent
Example: $P_F$ is a Markov grid network over $X_{11}, \ldots, X_{33}$; $Q$ is the mean field network with the same nodes and no edges
Mean Field Approximation
The energy functional is easy to compute, even for networks where inference is complex:
$F[P_F, Q] = E_Q[\ln \tilde P_F(U)] + H_Q(U)$
$E_Q[\ln \tilde P_F(U)] = \sum_{\phi \in F} E_Q[\ln \phi] = \sum_{\phi \in F} \sum_{\mathbf{u}_\phi} \Big( \prod_{X_i \in \mathrm{Scope}[\phi]} Q(x_i) \Big) \ln \phi(\mathbf{u}_\phi)$
$H_Q(U) = \sum_i H_Q(X_i)$
Mean Field Maximization
Maximizing the energy functional of mean field:
Find $Q(\mathbf{X}) = \prod_i Q(X_i)$ that maximizes $F[P_F, Q]$
Subject to, for all $i$: $\sum_{x_i} Q(x_i) = 1$
Mean Field Maximization
Theorem: $Q(x_i)$ is a stationary point of the mean field optimization given $Q(X_1), \ldots, Q(X_{i-1}), Q(X_{i+1}), \ldots, Q(X_n)$ if and only if $Q(x_i) = \frac{1}{Z_i} \exp\left\{ E_Q[\ln \tilde P_F \mid x_i] \right\}$
Proof: to optimize $Q(X_i)$, define the Lagrangian $L_i = E_Q[\ln \tilde P_F] + H_Q(\mathbf{X}) + \lambda \Big( \sum_{x_i} Q(x_i) - 1 \Big)$
$\lambda$ corresponds to the constraint that $Q(X_i)$ is a distribution
We now compute the derivative of $L_i$
Mean Field Maximization
Writing $\mathbf{V} = \mathbf{X} - \{X_i\}$, expand the energy term of $L_i$:
$E_Q[\ln \tilde P_F] = \sum_{x_i} Q(x_i) \sum_{\mathbf{v}} Q(\mathbf{v}) \ln \tilde P_F(x_i, \mathbf{v})$
Therefore $\frac{\partial}{\partial Q(x_i)} E_Q[\ln \tilde P_F] = \sum_{\mathbf{v}} Q(\mathbf{v}) \ln \tilde P_F(x_i, \mathbf{v}) = E_Q[\ln \tilde P_F \mid x_i]$
Mean Field Maximization
Next expand the entropy term of $L_i$:
$H_Q(\mathbf{X}) = \sum_j H_Q(X_j) = -\sum_j \sum_{x_j} Q(x_j) \ln Q(x_j)$
Therefore $\frac{\partial}{\partial Q(x_i)} H_Q(\mathbf{X}) = -\ln Q(x_i) - 1$, and the constraint term contributes $\lambda$
Combining the terms: $\frac{\partial}{\partial Q(x_i)} L_i = E_Q[\ln \tilde P_F \mid x_i] - \ln Q(x_i) - 1 + \lambda$
Mean Field Maximization
Setting the derivative to zero and rearranging terms, we get: $\ln Q(x_i) = E_Q[\ln \tilde P_F \mid x_i] - 1 + \lambda$
Taking exponents of both sides we get: $Q(x_i) = \frac{1}{Z_i} \exp\left\{ E_Q[\ln \tilde P_F \mid x_i] \right\}$, where $Z_i = e^{1 - \lambda}$ is the normalizing constant
Mean Field Maximization: Intuition
With $\mathbf{V} = \mathbf{X} - \{X_i\}$:
$E_Q[\ln \tilde P_F \mid x_i] = \sum_{\mathbf{v}} Q(\mathbf{v}) \ln \tilde P_F(x_i, \mathbf{v}) = \sum_{\mathbf{v}} Q(\mathbf{v}) \ln \big( P_F(x_i \mid \mathbf{v}) \, Z P_F(\mathbf{v}) \big) = E_Q[\ln P_F(x_i \mid \mathbf{V})] + E_Q[\ln Z P_F(\mathbf{V})]$
We can thus rewrite $Q(x_i)$ as:
$Q(x_i) = \frac{1}{Z_i} \exp\left\{ E_Q[\ln P_F(x_i \mid \mathbf{V})] \right\} \exp\left\{ E_Q[\ln Z P_F(\mathbf{V})] \right\}$
The second exponent does not depend on $x_i$, so it is absorbed into $Z_i$
Mean Field Maximization: Intuition
$Q(x_i) = \frac{1}{Z_i} \exp\left\{ E_Q[\ln P_F(x_i \mid \mathbf{V})] \right\}$
$Q(x_i)$ is the geometric average of $P_F(x_i \mid \mathbf{v})$, relative to the probability distribution $Q$
In this sense, the marginal is “consistent” with the other marginals
Compare with the marginal in $P_F$: $P_F(x_i) = \sum_{\mathbf{v}} P_F(\mathbf{v}) P_F(x_i \mid \mathbf{v}) = E_{P_F}[P_F(x_i \mid \mathbf{V})]$, an arithmetic average with respect to $P_F$
Mean Field: Algorithm
Simplify: $Q(x_i) = \frac{1}{Z_i} \exp\left\{ E_Q[\ln \tilde P_F \mid x_i] \right\}$
To: $Q(x_i) = \frac{1}{Z_i} \exp\left\{ \sum_{\phi : X_i \in \mathrm{Scope}[\phi]} E_Q[\ln \phi(\mathbf{U}_\phi, x_i)] \right\}$
since terms that do not involve $x_i$ can be absorbed into the constant
Note: $Q(x_i)$ does not appear on the right hand side, so we can solve for the optimal $Q(x_i)$ in one step
Note: the step is only optimal given all the other $Q(X_j)$
This suggests an iterative algorithm
Convergence to a local maximum is guaranteed, since each step improves $F[P_F, Q]$
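Putting the update equation and the iterative scheme together, here is a minimal mean-field sketch for a three-variable chain MRF; the potentials, binary variables, and update order are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Pairwise log-potentials ln psi(X_a, X_b) for the chain X0 - X1 - X2
log_psi = {(0, 1): rng.normal(size=(2, 2)), (1, 2): rng.normal(size=(2, 2))}
Q = np.full((3, 2), 0.5)                      # fully factored Q(X_i)

for _ in range(200):                          # coordinate ascent sweeps
    for i in range(3):
        logit = np.zeros(2)
        for (a, b), lp in log_psi.items():    # only factors containing X_i
            if i == a:
                logit += lp @ Q[b]            # E_{Q(X_b)}[ln psi(x_i, X_b)]
            elif i == b:
                logit += Q[a] @ lp            # E_{Q(X_a)}[ln psi(X_a, x_i)]
        Q[i] = np.exp(logit - logit.max())
        Q[i] /= Q[i].sum()                    # normalize: the 1/Z_i term

print(Q.round(3))                             # a local optimum of F[P_F, Q]
```

Each inner step applies the closed-form update above exactly once per variable, so every sweep improves the energy functional until a fixed point is reached.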
Markov Network Approximations
Can use families $\mathbf{Q}$ that are increasingly complex
As long as $Q$ is “easy” (inference in $Q$ is feasible), efficient update equations can be derived
Example: $P_F$ is a Markov grid network over $X_{11}, \ldots, X_{33}$; $Q$ is the mean field network over the same variables