Overview
1 Background
2 Inference in graphical models
3 Variational approximations and linear response
4 A new method based on self-consistent constraints
5 Cluster variational methods, and experiments on models
6 Conclusions
7 References and extra material
Who am I
Discuss today:
* Variational inference and approximation in graphicalmodels.
Reasons I find Statistical Physics inspiring:
LDPC, kSAT, compressed sensing : where are the hardproblems, why does local search fail or slow down, cannucleation help avoid metastability? phase characteristicsand transitions.
My research: Edinburgh University; Martin Evans
Self-propelled particles (non-equilibrium physics on a 1dlattice).
My research: Aston University (NCRG); David Saad
* Code Division Multiple Access (analysis of protocols formulti-user access channels).
Noisy computation (robustness of computation on randomdirected acyclic graphs).
1-in-k SAT on random graphs (a boolean constraintsatisfaction problem).
My research: Hong Kong UST; Michael Wong
Typical case hard optimization (next nearest neighbor Isingmodels on random graphs).
Compressed sensing (extracting a sparse signal of Ncomponents from M < N measurements).
My research: University of Rome; FedericoRicci-Tersenghi
* Loop-correction algorithms (improvements to tree-likeiterative algorithms).
* Linear response for inverse problems (improvements tosimple matrix inverse methods).
Compressed sensing.
Typical case hard optimization (vertex cover/ resourceallocation).
Short summary
I have been to a lot of photogenic places...
I like optimization problems on random graphs.
Next The MARG problem, graphical model representations.
Graphical models
The probability for N interacting variables x can be described:
P (x) =1
Z
∏a
ψa(x)
A factor graph representation is very general:
Interactions
Variables
subgraphGraphical model
The marginalization and maximization problems:
MARG Approximate P (xi),P ({xi, xj}) and other marginalsefficiently.
MAX Approximate argmaxxP (x).
Inference (marg. / max) is NP-hard in the (interesting regimes)of the problems presented.
My research as graphical models
e.g. Code Division Multiple Access
Given signal reconstruct best source (MAX), or best marginalestimates (MARG).
P (x|y) ∝∏µ
exp
[− 1
2σ2(yµ −
∑i
Aµixi)
]︸ ︷︷ ︸
factorized likelihood
∏i
[δ(xi − 1) + δ(xi + 1)]︸ ︷︷ ︸factorized prior
(1)
Working from the free energy for graphical models
The free energy is
F = − log∑x
∏a
ψa(xa)
The sum over N states should not be done naively:
1 Transfer-Matrix method (Junction Tree algorithm), orMarkov Chain Monte Carlo (MCMC).
If too slow:
2 Advanced or simple mean-field methods (variationalmethods).
Graphical models, a restricted example
essential exponential family of probabilities; discrete states.
digestible Pairwise interactions; spin variables xi = ±1, β = 1.
P (x) = exp
∑ij
Jijxixj +∑i
Hixi + F (J,H)
.
The free energy ensures normalization
F (J,H) = − log
∑x
exp
∑ij
Jijxixj +∑i
Hixi
.
Graphical models, a restricted example
P (x) = exp(∑ij
Jijxixj +∑i
Hixi + F (J,H)) . (2)
To do:
1 (Direct) Infer marginal distributions E[xi] and E[xixj ]given couplings J and fields H.
2 (Inverse) Infer couplings and fields from M data samples{x1, . . . , xM} or statistics E[xi] , E[xixj ].
Short summary
Graphical models can represent a range of disorderedproblems, revealing features through graph theoreticproperties.
The MARG problem is to approximate P (xi) from anintractable distribution P (x).
Next Solving the MARG problem from the variational freeenergy or by local iterative schemes.
Relation between the free energy and marginals
The free energy is the moment generating function, themarginals are simple functions of the moments
F (H,J) = − log
∑x
exp
∑ij
Jijxixj +∑i
Hixi
. (3)
Magnetizations (1st order cumulants):
E[xi] = − ∂
∂HiF ; (4)
Connected Correlations (2nd order cumulants, responses):
E[xixj ]− E[xi]E[xj ] = − ∂2
∂Hi∂HjF ; . (5)
Variational principle
VP From a tractable family of probability models Q(x), select amodel minimizing the Kullback-Leibler Divergence to P (x).
KL(Q||P ) =∑
Q(x) log
(Q(x)
P (x)
)≥ 0 (6)
The variational free energy is an upper bound to F (H,J)
FH,J(Q) = EQ[logQ(x)]−∑ij
JijEQ[xixj ]−∑i
HiEQ[xi] (7)
H,J are the quenched parameters, {Q(x) :∑Q(x) = 1} is
variational and easily marginalized.
Local iteration schemes
LI Define local probability approximations (e.g. magnetizationE[xi]), pass information locally to refine the estimatesunder consistency constraints.
NMF Mean-field argument (exact for weak correlations): allvariables have a magnetization, observing neighbor statesand local interactions one can reestimate:
Mi = tanh(Hi +∑j
JijMj) (8)
BP Belief propagation (exact for locally tree like graphs):variables have a set of conditional magnetizations one foreach outward neighbor, which is the magnetizationexcluding the effect of that neighbor (Jij = 0). Theseconditional magnetizations are updated
mi→j = tanh
Hi +∑k\j
atanh[tanh(Jik)mk→i]
(9)
Local iteration versus variational schemes
• Mean field iterative scheme = local minima of Mean fieldfree energy.
• Belief propagation iterative scheme (Gallagher,Bethe-Peiels, . . . ) = local minima of Bethe free energy(Yedidia 2005).
• Expectation propagation schemes (Minka) = Weighted sumof Gaussian and Marginal free energies (Wainwright andJordan, 2008).
Most common iterative schemes represent specifiic ways tominimze non-convex free energies.
Belief propagation (Bethe free energy): it does work
The MARG problem is solved in practice by BP forN ∼ O(100), O(1000), in CDMA, LDPC (channel coding),compressed sensing, community detection models. Large,disordered ”locally tree like” or ”fully connected” networks.
Belief propagation (Bethe free energy): it doesn’t work
L1
L3
L2
L1
L2= =
Locally homogeneous graphs:
1) Insensitive to boundary
2) Insensitive to dimensionality
For locally identical problems coupling J, field H, connectivityk:
MFM = tanh {H + kJM} (10)
BPm→ = tanh {H + (k − 1)atanh[tanh(J)m→]} (11)
Corrections for short and long loops, sensitivity todimensionality and boundary.
*Short loops: Cluster Variational Method (Kikuchi,51)pay an exponential price in cluster width, loops can remain buton larger regions.Long loops: Loop calculus (Chertkov and Chernyak,05)many interesting graphs have a factorial number of loops, howto select a subset?something else...:Moment matching: Minka and Qi (2004), Opper and Winther(2005).*Linear Response: Kappen et al. (1998), Welling and Teh(2004), Montanari and Rizzo (2005), Yasuda et al (2013),Opper and Winther (2001)Variable Clamping: Eaton et al. (2008), Mooij et al (2005)
How to calculate linear response
Using an unconstrained representation of the free energy e.g.
bi(xi) =1 + (M∗i + δMi)xi
2(12)
Free energy about the fixed point (by Taylor expansion)
Fmodel,H,J(M) = Fmodel(M∗)+0+
1
2δMT ∂2Fmodel
∂M∂MT
∣∣∣∣M∗
δM (13)
A small change in H, will lead to a correspondingly smallchange in the magnetization
Fmodel,H+δH,J(M∗ + δM) = Fmodel(M∗) + δHT ∂Fmodel
∂H
∣∣∣∣M∗
+
δHT ∂2Fmodel∂H∂MT
∣∣∣∣M∗
δM +1
2δMT ∂2Fmodel
∂M∂MT
∣∣∣∣M∗
δM (14)
Need to minimize in the variational parameter (theperturbation δM)
How to calculate linear response (continued..)
The new saddle-points
0 = δHT ∂2Fmodel∂H∂MT
∣∣∣∣M∗
+ δMT ∂2Fmodel∂M∂MT
∣∣∣∣M∗[
∂M
∂H
]−1=
∂2Fmodel∂M∂MT
∣∣∣∣M∗
Inverse correlation matrix is a simple function of the Hessian.Infact[∂M
∂H
]−1
=∂2F (M∗, C∗)
∂M∂MT− ∂2F (M∗, C∗)
∂M∂CT
[∂2F (M∗, C∗)
∂C∂CT
]−1∂2F (M∗, C∗)
∂C∂MT
In the case of Bethe and NMF, the estimates obtained dependon the loops. Note we can also use the expression to estimatecouplings:([
Cdata]−1)i,j
=∂2F (Mdata, Cdata)
∂Mi∂Mj− . . . = −Ji,j +
[∂2S(M∗, C∗)
∂Mi∂Mj− . . .
]
Short summary
Choose a variational form Q and minimize.
Possibly improve the estimation of correlations with linearresponse.
Belief Propagation values for M∗, C∗ are ignorant of loops;linear response dM/dH can see loops, but is also inexact.
The linear response expression can be inverted to get J assimple functions of M and C (from data).
Next Recent work with Federico Ricci-Tersenghi; makingvariational approximations self-consistent with linearresponse constraints
A simple idea
1 Clamp out some small subset of troublesome variables.
2 Use cluster methods to coarse grain the problem (deal withshort loops), use expansion methods for important longloops.
3 Calculate message perturbations (linear response) to yieldextra information (connected correlations).
3(NEW) Require consistency between the message information andthe linear response information.
J. Raymond and F. Ricci-Tersenghi, “Mean-field method withcorrelations determined by linear response,” Phys. Rev. E,vol. 87, p. 052111, 2013.
Why should the constraints help?
A variational approach based on trial probability distribution Q
F (Q) =∑
[Q(x) logQ(x)]
−∑i,j
Ji,j∑
[Q(x)xixj ]−∑i
Hi
∑[Q(x)xi] (15)
1 Constraints on the physical quantity make the methodconsistent (in its predictions).
2 In a good variational approximations Q∗ ∼ P + εδP : 1storder derivative errors O(ε), nth order derivative errorsO(εn). (e.g. Opper, 2003).
3 First derivative conditions introduce local consistency. TheHessian related constraints introduces a global consistency(Welling and Teh, 2004).
NEW Fix parameters by second order constraints, reduce errorsand include extra graph wide consistency.
Two ways to get the self response
At the minima of the variational free energy
1−[∂
∂HiFmodel
]2= 1− (Mi)
2 .
If we were using the exact free energy, this would equal theresponse
∂Mi
∂Hi=
∂2
∂Hi∂HiFmodel .
but this is violated in NMF, Bethe and otherapproximations. we introduce a constraint.
Variational approach implemented
Reasonable trial distribution (e.g. NMF, independent variables)
Q(x) =
N∏i=1
Qi(xi) =
N∏i=1
1 +Mixi2
. (16)
Kullback-Leibler divergence (relative entropy) is:
D(Q||P ) = −∑ij
JijEQ[xixj ]−∑i
HiEQ[xi] + EQ[log(Q)]
− F (J,H) ≥ 0 . (17)
EQ is the expectation with respect to the trial distribution. Themodel free energy is
Fmodel(M) = Emodel(M)− Smodel(M) ≥ F . (18)
STD Minimize w.r.t M . ∂MF = 0
NEW Minimize subject to (1−M2i ) = [∂2HF (M)]i,i.
Diagonal and Off-diagonal consistency
Beliefs can always be represented by magnetization (M) andcorrelations (C) in a consistent way:
bi(xi) =1 +Mixi
2bij(xi) = bi(xi)bj(xj) +
Cijxixj4
On-diagonal constraint: Lagrange multiplier λi[1−
(∂Fmodel∂Hi
)2
=
]1− (Mi)
2
[=∂2Fmodel∂H2
i
]Off-diagonal constraint: Lagrange multiplier λi,j[
∂Fmodel∂Jij
− ∂Fmodel∂Hi
∂Fmodel∂Hj
=
]Cij
[=∂2Fmodel∂Hi∂Hj
]NB: derivatives evaluated at M∗, C∗
Diagonal and Off-diagonal consistency
To minimize the constrained variational free energy
∂MF (M,C, λ) = 0 saddle-point
∂CF (M,C, λ) = 0 saddle-point
∂2Hi,HiF (b, λ) = 1−M2i on-diagonal consistency
∂2Hi,HjF (b, λ) = Ci,j off-diagonal consistency
Normally we fix M,C by the first two conditions, with λ = 0.Now we introduce new variables λ and new constraints, andsolve jointly.
Belief propagation
Standard
mi→j = tanh
Hi +∑
k=neigh.\j
atanh (tanh(Jki)mk→i)
vs New (new effective field, effective coupling)
mi→j = tanh
Hi + λiMi +
∑k=neigh
λikMk
+
∑k=neigh.\j
atanh (tanh(Jki − λki)mk→i)
λ are Lagrange multipliers for the constraints, M aremagnetizations to be fixed self-consistently.O(N) algorithm on sparse graph.To calculate all linear response we simply perturb thesemessages O(N2) algorithm.
Schematically
c
f
b
e
dc
f
b
On diagonal only
a
Off (off + on) diagonal
a
With only on-diagonal constraints same as I-SUSPalgorithm (Yasuda et al., 2013).
Take simultaneously messages {u→cav.} and responses{u→cav.,z} incident on the edge/vertex region.
Calculate consistently M ,C and λ on the region, generatenew internal messages.
One choice among many available from variationalformulation.
Weak coupling/ high temperature result
i ji
Bethe
NMFji
(Reaction term error)
−1 −1
i j
Parameter error
Correlation
Ameliorated error sources
[C ] On diagonal error [C ] Off diagonal error
Magnetization and Connected Correlation errors by expansionabout independent variables.
• NMF errors: O(J2, β3).
• λ-NMF errors: mag. O(J3, β4) and c.c. O(J2, β3).
• Bethe errors: O(J3, β4) .
• λ-Bethe errors: O(J4, β5).
Opper Winther (2001): (Ising spins, equivalent toadaptive-TAP)
Recovery and extension of adaptive-TAP framework (Opperand Winther, 2001)
Mi = tanh
Hi +∑j(6=i)
JijMj + λiMi
, (19)
(1−M2i ) =
[(Diag[
1
1−M2i
− λi]− J)−1]
i,i
. (20)
λi is the Onsager reaction term, we needn’t know thedistribution of J in order for its calculation.Exact in (advanced) ”mean-field” models (includes SherringtonKirkpatrick model, Hopfield model, fully connected disorderedmodels).
Short summary
We shift the point of evaluation for the free energy from theminima, to a new minima consistent with linear response.
We gain an order of magnitude in estimate quality, addingconstraints transforms a bad standard approximation(NMF) into an optimal choice (TAP) for a broad range ofmodels.
We can also formulate ”message passing like” algorithms.
Next Brief introduction of a more advance variational method,some applications.
The cluster variational (region based) approximations
Empirically, the entropy on large regions {β} can be understood in terms of contributions from smaller constitutent partsThe unexplained part is S and typically decays exponentiallywith cluster size.
S =∑α
Sα +∑β
Sβ =∑α
Sα + correlation/loop correction.
We truncate the Mobius transform of the entropy (An,88).e.g. {α} = {i} NMF, α = {(i, j)} ∪ {i} Bethe.The model parameters of the approximation are beliefs b(marginal probabilities)
Smodel =∑α
cαSα
The cluster variational (region based) approximations
Si(bi) = −∑xi
bi(xi) log bi(xi)
Sij(bij , bi, bj) = −∑xi,xj
bij(xi, xj) logbij(xi, xj)
bi(xi)bj(xj)
Sijk({b·}) = −∑
bijk(xi, xj , xk) logbijk(xi, xj , xk)bi(xi)bj(xj)bk(xk)
bij(xi, xj)bjk(xi, xj)bik(xi, xj)
Asymptotic phenomena low temperature
Disordered models
0.02
0.1
0.4
0.2 0.25 0.3 0.35 0.4 0.45 0.5
E[(
χij* -χ
ij)2]1
/2
β
8 by 8 2D square lattice Edward Anderson model (J=pm 1,H=0) , 8 samples
BetheBethe (+off-diagonal constraints)
CVM (Lage-Castellanos et al. 2012)CVM (+off-diagonal constraints)
Inverse Temperature
Err
or
in c
orr
elat
ion e
stim
ates
On diagonal constraint
further improves high
temperature behaviour
Significant improvements[better handling of unfrustrated subdomains]
Sampling error floor (specialized monte−carlo (Zhou,2012))
Disordered and frustrated model, complicated phase space.
Disordered models, e.g. Random field Ising model
1e-06
1e-05
0.0001
0.001
0.01
0.1
0 0.05 0.1 0.15 0.2 0.25 0.3
|Err
or
Co
nn
ecte
d C
orr
ela
tio
n|
Exact Connected Correlation
Param: No constaint (BP)Response: No constraint (BP)
Param: Diag. constraint (I-SUSP)Response: Diag. constraint (I-SUSP)
All constraints 1e-05
0.0001
0.001
0.01
0 0.2 0.4 0.6 0.8 1
|Err
or
Ma
gn
etiza
tio
n|
Exact Magnetization
No constaint (BP)Diag. constraint (I-SUSP)
All constraintsβ=0.25
β=0.25 β=0.52
2D square lattice random field Ising model: J=1, H iid Gaussian variables N(0.2,0.5), L=5
β=0.52
• For models dominated by a single pure state (weaklycorrelated) convergence is robust. e.g. small and large β (ifunique ground state).
• Adding constraints boosts performance significantly
and other algorithms?
1e-05
0.0001
0.001
0.01
0 0.2 0.4 0.6 0.8 1
|Err
or M
agne
tizat
ion|
Exact Magnetization
No constaint (CVM-plaquettes)treeEP
treeRWBP
libDAI C++ library of Mooij (standard implementations)
Tree EP (Qi and Minka 2005): A structured moment matchingalgorithm.
Gen. BP (Yedidia et al.2005, Heskes et al. 2003) : A Kikuchi freeenergy (plaquette regions).
TRW BP (Wainwright et al., Wiegerinck et al. 2003): A convexifiedform of the Bethe approximation.
Inverse examples
0.001
0.01
0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55
β(E
[Jij(
CM
C)-
Jije
xa
ct )2
])1
/2
β
NMF/TAP (λ=0)Bethe (λ=0)
Bethe (λ=0) [KR]Bethe
Plaquette
Coupling, J
O(10 ) independent samples6
Kappen−Rodriguezdiagonal heuristic
Square Lattice diluted (0.7) ferromagnet, L=7
1e-05
0.0001
0.001
0.01
0.1
-1.2 -1 -0.8 -0.6 -0.4 -0.2 0
|β|(
E[J
ij(C
exa
ct )-
Jije
xa
ct )2
])1
/2
β
Bethe (λ=0)Bethe
PlaquetteErr
or
in c
ou
pli
ng
str
eng
th
All possible triangular regions
edge regionsAll possible
Ricci−Tersenghi ’11
Sessak−Monasson ’09
Fully frustrated triangular lattice, L=5
Coupling, J
7 5
4J
JJ
Form:
Ji,j = −[C−1data]i,j
Ji,j = −[C−1data]i,j + ΛB.i,j(Mdata, C∗) ; −[C−1data]i,j + ΛB.i,j(Mdata, Cdata)
Similar expressions for magnetization.
• Provably improved scaling for weak-coupling limit.
• From realistic data: Improvement in an intermediate rangeof temperatures
• Works well away from phase transitions.
Inverse examples: Protein
Residue contact maps for protein (length 161 residues [21 states])
Mutual information Bethe (like) free energy NMF free energy
]Blue is true contact structure (well understood special case), [Morcosa et al PNAS 2011)
Yellow indicate true positives (on largest inferred couplings), grey false positives
Comparison of largest inferred couplings and true structure
The sequence of residues (each one of 21 states) in a protein oflength O(100) are correlated through evolutionary function.If residues fold together they coevolve (show strong correlation),we can use a pairwise model to infer functional relationships(i.e. spatial proximity in folding).
Inverse examples: RNA families (telemorase)
State of the art for proteins and RNA (Morcos 2009), simpleinverse; we can add ΛV to determine tertiary structure?
Ji,j = −[C−1data]i,j + ΛV (M,C)
Thanks for attention
• A consistent variational approximation, using informationimplicit to the approximation.
• weak-coupling limit validation.
• Significant results: Montanari-Rizzo approximationfor loop corrections; adaptive-TAP;Sessak-Monasson inverse Ising expression; andextensions.
• Variational framework: algorithmic flexibility andexpansion methods available.
• With distributed message passing O(N2) frameworks (andproven I-SUSP, Yasuda et al.)
J. Raymond and F. Ricci-Tersenghi, “Mean-field method withcorrelations determined by linear response,” Phys. Rev. E,vol. 87, p. 052111, 2013.http://chimera.roma1.infn.it/JACK [email protected]
References
J. Raymond and F. Ricci-Tersenghi, “Mean-field method with correlationsdetermined by linear response,” Phys. Rev. E, vol. 87, p. 052111, 2013.
G. An, “A note on the cluster variation method,” Journal of StatisticalPhysics, vol. 52, pp. 727–734, 1988.
Alessandro Pelizzola.Cluster variation method in statistical physics and probabilistic graphicalmodels.J. Phys. A, 38(33):R309, 2005.
J.S. Yedidia, W.T. Freeman, and Y. Weiss.Constructing free-energy approximations and generalized belief propagationalgorithms.Information Theory, IEEE Transactions on, 51(7):2282–2312, 2005.
M. Welling and Y. Teh, “Linear response algorithms for approximateinference in graphical models,” Neural Comput., vol. 16, pp. 197–221, 2004.
M. Yasuda and K. Tanaka, “Susceptibility propagation by using diagonalconsistency,” Phys. Rev. E, vol. 87, p. 012134, Jan 2013.
M. Opper and O. Winther, “Adaptive and self-averagingthouless-anderson-palmer mean-field theory for probabilistic modeling,”Phys. Rev. E, vol. 64, p. 056131, 2001.
A. Lage-Castellanos, R. Mulet, F. Ricci-Tersenghi, and T. Rizzo.Replica cluster variational method: the replica symmetric solution for the2d random bond ising model.Jour. of Physics A, 46(13):135001, 2013.
A. Montanari and T. Rizzo.How to compute loop corrections to the Bethe approximation.J. Stat. Mech., page P10011, 2005.
V. Sessak and R. Monasson.Small-correlation expansions for the inverse ising problem.
J. Phys. A, 42(5):055001, 2009.
Single loop scheme (Bethe level)
(a)
vertex "cavity"
(b)
edge "cavity"
Out: u, du/dH
In: u, du/dH
A message passing scheme uses u→ and u→,z = ∂u/∂Hz akin toBelief Propagation and Susceptibility Propagation.Knowing {M,λ} we have BP/SP subject to modified fields andcouplings
Hi = Hi + λiMi +∑j
λijMj ; Jij = Jij − λij .
Scheme (a) for on-diagonal constaints (Yasuda et al. I-SUSP)
Mi = tanh(Hi + λiMi +∑j
utj→i)
λi = − 1
1−M2i
∑j
utj→i,i
Single loop scheme (Bethe level)
(a)
vertex "cavity"
(b)
edge "cavity"
Out: u, du/dH
In: u, du/dH
Scheme (b) for on and off-diagonal constraints (i↔ j):
Mi = tanh
Hi + λiMi + λijMj +∑k\j
utk→i + uj→i
.
λi = −[2λij + u
′j→iλj
]u′j→i −
1
(1 −M2i )
∑k\j
uk→i,i + u′j→i
∑k\i
uk→j,i
.
λij = −[λi +
(1 −M2j )
(1 −M2i )λj + u
′i→jλij
]u′i→j −
1
(1 −M2i )
∑k\j
uk→j,i + u′i→j
∑k\j
uk→i,i
.
uj→i = atanh
tanh(Jij) tanh(Hj + λjMj + λijMi +∑k\i
utk→j)
; u′j→i =
∂tj→i
∂Hj
.
Exact marginals
0.2 0.4 0.6 0.8Β
-8
-6
-4
-2
0
LogHÈEnCVM - EnÈL
NMF
Bethe
Exact Mag.
Exact Corr.
Comparison of entropy approximation error when enteringexact or partially exact marginal information (relative tostandard maximum entropy [solid thick line]). A residual errorremains because of the entropy truncation.
Asymptotic phenomena continued..
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
-0.6 -0.4 -0.2 0 0.2 0.4
E[σ
i σj],
ne
are
st
ne
igh
bo
r co
rre
latio
n
uniform coupling on triangular lattice, J
exactBethe, bij
NMFTAP
NMF (10)Bethe
Bethe (10)Bethe (12)
Bethe (10,12)
Fu
ll c
orr
elat
ion
(−
ener
gy
/3)
Recovery of magnetized solutionRecovery of magnetized solution.
Delayed discontinuous transition.Premature continuous transition.
Exact
transitioncontinuous
Parameter curve, f(connectivity)Parameter=LR curve f(lattice,boundary)
Coupling strength (zero field)
Ferromagnetictriangular lattice model
Divergence of lin. response.
Interesting phenomena (i.e. outstanding challenges)
• Absence of solutions in some models at low temperature(e.g. 2D ferromagnetic lattices)
• Absence of continuous transitions where expected
• Absence of message-passing scheme meeting “messagepassing paradigm”
• Non-convexity (concavity/convexity) of free energy (thefeasibility of finding solutions is in general unclear)
The main problem with the algorithm remains convergence andcomplexity... being addressed in libdai “double loop”formulation.
Abstract
Variational free energies when minimized can make inconsistentpredictions of physical quantities, as defined by first and secondderivative identities. Minimizing subject to consistency of thesederivatives can be shown to improve approximation. Theconstraints allow a connection with a number of disparateresults: From a constrained naive mean-field free energy werecover a type of expectation-propagation algorithm. From aconstrained Bethe approximation, applied to the inverseproblem of coupling estimation, we recover and extend theSessak-Monasson expression.
The End