tom minka microsoft research cambridge, uk · 2 message-passing algorithms power ep pep [minka 04]...
Divergence measures and message passing
Tom Minka
Microsoft Research
Cambridge, UK
with thanks to the Machine Learning and Perception Group
Message-Passing Algorithms
• Power EP (PEP) [Minka 04]
• Fractional belief propagation (FBP) [Wiegerinck,Heskes 02]
• Tree-reweighted message passing (TRW) [Wainwright,Jaakkola,Willsky 03]
• Expectation propagation (EP) [Minka 01]
• Loopy belief propagation (BP) [Frey,MacKay 97]
• Mean-field (MF) [Peterson,Anderson 87]
Outline
• Example of message passing
• Interpreting message passing
• Divergence measures
• Message passing from a divergence measure
• Big picture
Estimation Problem
[Figure, built up over three slides: a graph over the variables x, y, z, with labels a, b, c, d, e, f; each of x, y, z is either 0 or 1, value unknown]
Queries: marginals, normalizing constant, argmax
Want to do these quickly
Belief Propagation
[Figure: belief propagation on x, y, z; the last panel is labeled "Final"]
• Marginals: (Exact) vs. (BP)
• Normalizing constant: 0.45 (Exact), 0.44 (BP)
• Argmax: (0,0,0) (Exact), (0,0,0) (BP)
Outline
• Example of message passing
• Interpreting message passing
• Divergence measures
• Message passing from a divergence measure
• Big picture
Message Passing = Distributed Optimization
• Messages represent a simpler distribution q(x) that approximates p(x)
– A distributed representation
• Message passing = optimizing q to fit p
– q stands in for p when answering queries
• Parameters:
– What type of distribution to construct (approximating family)
– What cost to minimize (divergence measure)
How to make a message-passing algorithm
1. Pick an approximating family
• fully-factorized, Gaussian, etc.
2. Pick a divergence measure
3. Construct an optimizer for that measure
• usually fixed-point iteration
4. Distribute the optimization across factors
Outline
• Example of message passing
• Interpreting message passing
• Divergence measures
• Message passing from a divergence measure
• Big picture
Kullback-Leibler (KL) divergence
Let p,q be unnormalized distributions
Alpha-divergence (α is any real number)
Asymmetric, convex
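A sketch of the two definitions, assuming the standard forms for unnormalized distributions (the slide's own formulas are not reproduced in this transcript). When both p and q integrate to 1, the correction term ∫(q − p) vanishes and KL reduces to its usual form:

```latex
\mathrm{KL}(p \,\|\, q) = \int p(x)\,\log\frac{p(x)}{q(x)}\,dx + \int \bigl(q(x) - p(x)\bigr)\,dx

D_\alpha(p \,\|\, q) = \frac{1}{\alpha(1-\alpha)} \int \Bigl(\alpha\,p(x) + (1-\alpha)\,q(x) - p(x)^{\alpha}\,q(x)^{1-\alpha}\Bigr)\,dx
```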
Examples of alpha-divergence
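A small numerical check of some well-known special cases, as a sketch: at α = 0.5 the alpha-divergence equals 2∫(√p − √q)² (a Hellinger-type distance), at α = 2 it equals ½∫(p − q)²/q (a χ² distance), and the limits α → 0 and α → 1 give KL(q||p) and KL(p||q). The definition coded below is the standard one for unnormalized distributions and is assumed rather than copied from the slide; the vectors p and q are made-up values.

```python
import math

def alpha_div(p, q, alpha):
    """Alpha-divergence D_alpha(p || q) for unnormalized discrete p, q (lists of positive numbers)."""
    if alpha == 1:   # limit alpha -> 1: KL(p || q)
        return sum(pi * math.log(pi / qi) + qi - pi for pi, qi in zip(p, q))
    if alpha == 0:   # limit alpha -> 0: KL(q || p)
        return sum(qi * math.log(qi / pi) + pi - qi for pi, qi in zip(p, q))
    total = sum(alpha * pi + (1 - alpha) * qi - pi ** alpha * qi ** (1 - alpha)
                for pi, qi in zip(p, q))
    return total / (alpha * (1 - alpha))

# Hypothetical unnormalized distributions over three states.
p = [0.2, 0.5, 0.6]
q = [0.3, 0.3, 0.5]

# Closed forms for two special cases, computed independently of alpha_div.
hellinger = 2 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
chi2 = 0.5 * sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

for a in (-1, 0, 0.5, 1, 2):
    print(a, alpha_div(p, q, a))
```

The divergence is asymmetric: swapping p and q generally changes the value, except at α = 0.5.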
Minimum alpha-divergence
q is Gaussian, minimizes Dα(p||q)
[Figures, one per slide: the fitted q for α = -∞, 0, 0.5, 1, ∞]
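A rough numerical sketch of the two KL endpoints of this picture (α = 0 and α = 1). Everything specific here is an assumption: the bimodal target is a two-component Gaussian mixture chosen for illustration, not the one on the slides, and a coarse grid search stands in for a proper optimizer. For this target one would expect the α = 0 fit to sit on the left mode (more mass, though less tall) and the α = 1 fit to spread over both modes.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def kl(a, b, dx):
    """Grid approximation of KL(a || b) for possibly unnormalized densities."""
    total = 0.0
    for ai, bi in zip(a, b):
        if ai > 0:
            if bi <= 0:
                return math.inf
            total += ai * (math.log(ai) - math.log(bi))
        total += bi - ai
    return total * dx

def fit_gaussian(alpha, xs, p, dx):
    """Grid search for the Gaussian q minimizing D_alpha(p || q); only alpha = 0 or 1 here."""
    best = (math.inf, None, None)
    for m in [0.25 * k for k in range(-16, 17)]:     # candidate means in [-4, 4]
        for s in [0.25 * k for k in range(2, 17)]:   # candidate std devs in [0.5, 4]
            q = [normal_pdf(x, m, s) for x in xs]
            # alpha = 1 is KL(p || q); alpha = 0 is KL(q || p).
            d = kl(p, q, dx) if alpha == 1 else kl(q, p, dx)
            if d < best[0]:
                best = (d, m, s)
    return best[1], best[2]

dx = 0.05
xs = [-10 + dx * i for i in range(401)]
# Hypothetical target: the left mode holds more mass, the right mode is taller.
p = [0.6 * normal_pdf(x, -2.0, 1.0) + 0.4 * normal_pdf(x, 3.0, 0.5) for x in xs]

mean0, sd0 = fit_gaussian(0, xs, p, dx)   # zero-forcing end
mean1, sd1 = fit_gaussian(1, xs, p, dx)   # inclusive end
print("alpha=0:", mean0, sd0)
print("alpha=1:", mean1, sd1)
```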
Properties of alpha-divergence
• α ≤ 0 seeks the mode with largest mass (not tallest)
– zero-forcing: p(x)=0 forces q(x)=0
– underestimates the support of p
• α ≥ 1 stretches to cover everything
– inclusive: p(x)>0 forces q(x)>0
– overestimates the support of p
[Frey,Patrascu,Jaakkola,Moran 00]
Structure of alpha space
[Figure: the α axis. α ≤ 0 is zero-forcing; α ≥ 1 is inclusive (zero-avoiding). MF sits at α = 0; BP and EP at α = 1; TRW at α > 1; FBP and PEP can use any α.]
Other properties
• If q is an exact minimum of alpha-divergence:
• Normalizing constant:
• If α=1: Gaussian q matches mean, variance of p
– Fully factorized q matches marginals of p
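A short derivation of the normalizing-constant property, as a sketch. It assumes the standard definition Dα(p||q) = ∫(αp + (1−α)q − p^α q^(1−α)) dx / (α(1−α)) and writes q = s·q̃ with ∫q̃ = 1; the symbols s and q̃ are notation introduced here, not taken from the slide. Setting the derivative with respect to the scale s to zero gives:

```latex
\frac{\partial}{\partial s} D_\alpha(p \,\|\, s\tilde q)
  = \frac{(1-\alpha) - (1-\alpha)\,s^{-\alpha} \int p(x)^{\alpha}\,\tilde q(x)^{1-\alpha}\,dx}{\alpha(1-\alpha)} = 0
\;\Longrightarrow\;
s^{\alpha} = \int p(x)^{\alpha}\,\tilde q(x)^{1-\alpha}\,dx
\;\Longrightarrow\;
\int q(x)\,dx = \int p(x)^{\alpha}\,q(x)^{1-\alpha}\,dx
```

In the limit α = 1 the right-hand side is ∫p(x)dx, so q carries the same total mass as p.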
Two-node example
• q is fully-factorized, minimizes α-divergence to p
• q has correct marginals only for α = 1 (BP)
[Figure: two-node graph over x and y]
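A minimal sketch of this two-node setting, under stated assumptions: x and y are binary, the joint table p is a made-up bimodal example, q is kept normalized for simplicity, and a grid search over the two marginal parameters replaces a real optimizer. It illustrates the claim that the factorized q reproduces the true marginals at α = 1 but not at α = 0.

```python
import math

def alpha_div(p, q, alpha):
    """Alpha-divergence D_alpha(p || q) for discrete p, q (lists of positive numbers)."""
    if alpha == 1:   # KL(p || q)
        return sum(pi * math.log(pi / qi) + qi - pi for pi, qi in zip(p, q))
    if alpha == 0:   # KL(q || p)
        return sum(qi * math.log(qi / pi) + pi - qi for pi, qi in zip(p, q))
    total = sum(alpha * pi + (1 - alpha) * qi - pi ** alpha * qi ** (1 - alpha)
                for pi, qi in zip(p, q))
    return total / (alpha * (1 - alpha))

def best_factorized(p, alpha, steps=99):
    """Grid search over q(x, y) = qx(x) * qy(y); returns (qx(1), qy(1)) at the minimum."""
    best = (math.inf, None, None)
    for i in range(1, steps + 1):
        a = i / (steps + 1)            # a = qx(1)
        for j in range(1, steps + 1):
            b = j / (steps + 1)        # b = qy(1)
            q = [(1 - a) * (1 - b), (1 - a) * b, a * (1 - b), a * b]
            d = alpha_div(p, q, alpha)
            if d < best[0]:
                best = (d, a, b)
    return best[1], best[2]

# Hypothetical bimodal joint over (x, y), flattened as [p00, p01, p10, p11].
p = [0.50, 0.02, 0.02, 0.46]
true_x1 = p[2] + p[3]                  # exact marginal p(x = 1)

a1, b1 = best_factorized(p, 1)         # alpha = 1
a0, b0 = best_factorized(p, 0)         # alpha = 0 (MF)
print("true p(x=1):", true_x1)
print("alpha=1 estimate:", a1)
print("alpha=0 estimate:", a0)
```

With this table the α = 0 solution commits to a single mode, so its estimate of p(x = 1) lands far from the true marginal.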
Two-node example: bimodal distribution
α = 1 (BP)
• Good: marginals, mass
• Bad: zeros, peak heights
α = 0 (MF), α ≤ 0.5
• Good: zeros, one peak
• Bad: marginals, mass
α = ∞
• Good: peak heights
• Bad: zeros, marginals
Lessons
• Neither method is inherently superior: it depends on what you care about
• A factorized approximation does not imply matching marginals (only for α=1)
• Adding y to the problem can change the estimated marginal for x (though the true marginal is unchanged)
Outline
• Example of message passing
• Interpreting message passing
• Divergence measures
• Message passing from a divergence measure
• Big picture
Distributed divergence minimization
• Write p as product of factors:
• Approximate factors one by one:
• Multiply to get the approximation:
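In symbols, a sketch of the three steps above, using f_a for the factors and f̃_a for their approximations (this notation is an assumption; the slide's own formulas are not preserved in the transcript):

```latex
p(x) = \prod_a f_a(x), \qquad
f_a(x) \approx \tilde f_a(x), \qquad
q(x) = \prod_a \tilde f_a(x)
```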
Global divergence to local divergence
• Global divergence:
• Local divergence:
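A sketch of the two objectives, assuming p(x) = ∏_a f_a(x) and q(x) = ∏_a f̃_a(x), and writing q^{\a} for the product of all approximate factors except a (notation assumed here, not taken from the slide). The local problem swaps one approximate factor for the exact one while holding the rest fixed:

```latex
\text{Global:}\quad \min_{q}\; D\bigl(p \,\|\, q\bigr)
\qquad
\text{Local:}\quad \min_{\tilde f_a}\; D\bigl(f_a\, q^{\setminus a} \,\|\, \tilde f_a\, q^{\setminus a}\bigr),
\quad q^{\setminus a}(x) = \prod_{b \neq a} \tilde f_b(x)
```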
Message passing
• Messages are passed between factors
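• Messages are factor approximations: f̃_a(x), the approximation to factor f_a(x)
• Factor a receives the messages f̃_b(x) from the other factors b ≠ a
– Minimize local divergence to get f̃_a(x)
– Send f̃_a(x) to other factors
– Repeat until convergence
• Produces all 6 algorithms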
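As one concrete instance of the loop above, here is a minimal sketch of loopy BP (the α = 1, fully factorized case) on a small binary pairwise model. The potentials, the function names, and the pairwise parameterization are all hypothetical; the talk's own example factors are not given in this transcript. On a tree the BP beliefs should coincide with the exact marginals, while on a graph with a loop they are only approximate.

```python
from itertools import product

def neighbors(edges, i):
    return [b if a == i else a for (a, b) in edges if i in (a, b)]

def pair_pot(psi, i, xi, j, xj):
    # psi is keyed by (a, b) and stores table[xa][xb]
    if (i, j) in psi:
        return psi[(i, j)][xi][xj]
    return psi[(j, i)][xj][xi]

def loopy_bp(phi, psi, sweeps=50):
    """Synchronous sum-product on binary variables; returns one belief per variable."""
    edges = list(psi)
    msgs = {}
    for (a, b) in edges:
        msgs[(a, b)] = [0.5, 0.5]
        msgs[(b, a)] = [0.5, 0.5]
    for _ in range(sweeps):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    term = phi[i][xi] * pair_pot(psi, i, xi, j, xj)
                    for k in neighbors(edges, i):
                        if k != j:
                            term *= msgs[(k, i)][xi]   # messages into i, except from j
                    total += term
                out.append(total)
            z = out[0] + out[1]
            new[(i, j)] = [out[0] / z, out[1] / z]
        msgs = new
    beliefs = []
    for i in range(len(phi)):
        b = [phi[i][0], phi[i][1]]
        for k in neighbors(edges, i):
            b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
        z = b[0] + b[1]
        beliefs.append([b[0] / z, b[1] / z])
    return beliefs

def exact_marginals(phi, psi):
    """Brute-force marginals by enumerating all joint states."""
    n = len(phi)
    marg = [[0.0, 0.0] for _ in range(n)]
    for x in product((0, 1), repeat=n):
        w = 1.0
        for i in range(n):
            w *= phi[i][x[i]]
        for (a, b), table in psi.items():
            w *= table[x[a]][x[b]]
        for i in range(n):
            marg[i][x[i]] += w
    z = sum(marg[0])
    return [[m[0] / z, m[1] / z] for m in marg]

# Hypothetical potentials on three binary variables (0 = x, 1 = y, 2 = z).
phi = [[1.0, 0.5], [1.0, 1.0], [0.7, 1.0]]
chain = {(0, 1): [[2.0, 1.0], [1.0, 2.0]], (1, 2): [[2.0, 1.0], [1.0, 2.0]]}
loop = dict(chain)
loop[(0, 2)] = [[2.0, 1.0], [1.0, 2.0]]   # closing the triangle adds a loop

bp_chain, exact_chain = loopy_bp(phi, chain), exact_marginals(phi, chain)
bp_loop, exact_loop = loopy_bp(phi, loop), exact_marginals(phi, loop)
print(bp_loop)
print(exact_loop)
```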
Global divergence vs. local divergence
In general, local ≠ global
• but results are similar
• BP doesn’t minimize global KL, but comes close
[Figure: the α axis. At α = 0 (MF), local = global: no loss from message passing. Elsewhere, local ≠ global.]
Experiment
• Which message passing algorithm is best at minimizing global Dα(p||q)?
• Procedure:
1. Run FBP with various αL
2. Compute global divergence for various αG
3. Find best αL (best alg) for each αG
Results
• Average over 20 graphs, random singleton and pairwise potentials: exp(w_ij x_i x_j)
• Mixed potentials (w ~ U(-1,1)):
– best αL = αG (local should match global)
– FBP with same α is best at minimizing Dα
• BP is best at minimizing KL
Outline
• Example of message passing
• Interpreting message passing
• Divergence measures
• Message passing from a divergence measure
• Big picture
Hierarchy of algorithms
BP
• fully factorized
• KL(p||q)
EP
• exp family
• KL(p||q)
FBP
• fully factorized
• Dα(p||q)
Power EP
• exp family
• Dα(p||q)
MF
• fully factorized
• KL(q||p)
TRW
• fully factorized
• Dα(p||q),α>1
Structured MF
• exp family
• KL(q||p)
Matrix of algorithms
BP
• fully factorized
• KL(p||q)
EP
• exp family
• KL(p||q)
FBP
• fully factorized
• Dα(p||q)
Power EP
• exp family
• Dα(p||q)
[Axes of the matrix: divergence measure in one direction, approximation family in the other]
MF
• fully factorized
• KL(q||p)
TRW
• fully factorized
• Dα(p||q), α>1
Structured MF
• exp family
• KL(q||p)
Other families? (mixtures)
Other divergences?
Other Message Passing Algorithms
Do they correspond to divergence measures?
• Generalized belief propagation [Yedidia,Freeman,Weiss 00]
• Iterated conditional modes [Besag 86]
• Max-product belief revision
• TRW-max-product [Wainwright,Jaakkola,Willsky 02]
• Laplace propagation [Smola,Vishwanathan,Eskin 03]
• Penniless propagation [Cano,Moral,Salmerón 00]
• Bound propagation [Leisink,Kappen 03]
Future work
• Understand existing message passing
algorithms
• Understand local vs. global divergence
• New message passing algorithms:
– Specialized divergence measures
– Richer approximating families
• Other ways to minimize divergence