Approximate Message Passing
Mohsen Bayati, David Donoho, Adel Javanmard, Iain Johnstone, Marc Lelarge, Arian Maleki, Andrea Montanari
Stanford University
December 8, 2012
Statistical estimation

y = f(θ; noise)

θ → Unknown object
y → Observations
f( · ; noise) → Parametric model

Problem: Estimate θ from observations y.
A broad convergence

- Statistics [Genomics, ...]
- Data mining [Collaborative filtering, Predictive analytics, ...]
- Signal processing [Compressive sampling, ...]
- Inverse problems [Medical imaging, Seismographic imaging, ...]

+ Data, + Computation, Exploit hidden structure
How should we think about these problems?

Optimization?

maximize Likelihood(θ | y) − Complexity(θ)

'Separation principle':
- Modeler/statistician proposes a convex cost function.
- Optimization expert proposes a simple iterative algorithm.
- Run for 20 iterations and hope for the best.
How should we think about these problems?

Beyond separation?

y → θ̂₁ → θ̂₂ → θ̂₃ → ...

- Constrained complexity per iteration
- Fixed number of iterations (say 20)
- What is the minimum MSE achievable?
Outline

- A long example (algorithm + heuristics)
- A list of theorems/pointers
A long example
What type of example?

- Image processing (because they make nice figures)
- Compressed sensing (simple)

WILL NOT MENTION SPARSITY!
An image

θ = [image] ∈ C^n

Unknown object (n = 512² ≈ 2.5 × 10⁵)
Noiseless linear measurements

y = Aθ = A · [image]

Want to reconstruct θ
Measurement structure

A = F̃ R,   F̃ = subsampled Fourier matrix

R = diag(+1, −1, −1, +1, +1, −1, ...) = random modulation

→ y ∈ C^m,   m = 0.17 n
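A minimal NumPy sketch of this measurement operator (the uniform-at-random subsampling pattern and the FFT normalization are my assumptions, not specified on the slide):

```python
import numpy as np

def measure(theta, m, rng):
    """y = A theta with A = (row-subsampled Fourier matrix) x (random +/-1 modulation R)."""
    n = theta.size
    signs = rng.choice([1.0, -1.0], size=n)       # diagonal of R
    spectrum = np.fft.fft(signs * theta)          # full (unnormalized) DFT of R.theta
    rows = rng.choice(n, size=m, replace=False)   # keep m of the n frequencies
    return spectrum[rows], signs, rows

rng = np.random.default_rng(0)
n = 512 ** 2
theta = rng.standard_normal(n)                    # stand-in for the image
y, signs, rows = measure(theta, m=int(0.17 * n), rng=rng)
```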
An approach popular in this community

y = Aθ + z,   z ~ N(0, σ² I_{m×m})

p_{θ|Y}(θ | y) ∝ exp{ −(1/(2σ²)) ‖y − Aθ‖₂² } · p_θ(θ)
             ∝ ∏_{a=1}^{m} exp{ −(1/(2σ²)) (y_a − A_aᵀθ)² } · ∏_{i=1}^{n} p_{θ_i}(θ_i)
Factor graph!

[Factor graph: factor nodes a = 1, ..., m on one side, variable nodes i = 1, ..., n on the other]

p_{θ|Y}(θ | y) ∝ ∏_{a=1}^{m} exp{ −(1/(2σ²)) (y_a − A_aᵀθ)² } · ∏_{i=1}^{n} p_{θ_i}(θ_i)

Use BP!
Many issues

- Does anyone know the prior distribution of natural images?
- Computation per iteration, memory: Θ(mn).
- Very loopy graph.
- ...

Let us try something simpler!
Constructing a first estimate

y = Aθ

Matched filter:

θ̂₁ = (1/m) A* y
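Continuing the NumPy sketch above (same hypothetical measure conventions; with NumPy's unnormalized FFT the adjoint is n · ifft, which is my bookkeeping, not the slide's):

```python
def matched_filter(y, signs, rows, n):
    """theta_hat_1 = (1/m) A* y, for A = (row-subsampled DFT) o diag(signs)."""
    m = y.size
    y_full = np.zeros(n, dtype=complex)
    y_full[rows] = y                              # zero-fill the unobserved frequencies
    return signs * (n * np.fft.ifft(y_full)) / m  # A* = diag(signs) . F* . (zero-fill)

theta_hat_1 = matched_filter(y, signs, rows, n)
```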
How good is this?

E θ̂₁ = (one-line calculation) = θ
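A sketch of that one-line calculation, under the assumption (consistent with the construction above) that the rows of A have unit-modulus entries and independent random modulation, so that E[A*A] = m · I:

```latex
\mathbb{E}\,\hat{\theta}_1
  = \frac{1}{m}\,\mathbb{E}\!\left[A^{*}y\right]
  = \frac{1}{m}\,\mathbb{E}\!\left[A^{*}A\right]\theta
  = \frac{1}{m}\,(m\,I)\,\theta
  = \theta .
```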
Check it out

θ̂₁ = A* y = [image]        θ = [image]

Does not look that good!
Idea

[image: θ̂₁] = [image: θ] + 'noise'
How big is the 'noise'?

E{ ‖θ̂₁ − θ‖₂² } = (two-line calculation) = ((1 − δ)/δ) · ‖θ‖₂²   (with δ = m/n)

Matched filter blows up noise

MSE_out = ((1 − δ)/δ) · MSE_in
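A compressed version of that two-line calculation for this measurement ensemble (the middle equality evaluates the fluctuations of (1/m)A*A around the identity; writing δ = m/n is my notation):

```latex
\mathbb{E}\|\hat{\theta}_1-\theta\|_2^2
  = \mathbb{E}\left\|\Big(\tfrac{1}{m}A^{*}A - I\Big)\theta\right\|_2^2
  = \frac{n-m}{m}\,\|\theta\|_2^2
  = \frac{1-\delta}{\delta}\,\|\theta\|_2^2 .
```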
Let's check

[Plot: empirical MSE_out vs. MSE_in for the matched filter (MSE_in 0–4, MSE_out 0–9)]
Denoising

θ̂₁ ≈ θ + τ z,   z_i ~ N(0, 1)

Idea: Treat θ̂₁ as effective observations in denoising
Denoising by nonlocal means

ŷ = θ + τ z,

θ̂_i = ( Σ_j W(i, j) ŷ_j ) / ( Σ_j W(i, j) ),

W(i, j) = 1 if ‖Patch(i; ŷ) − Patch(j; ŷ)‖₂² ≤ c τ², and 0 otherwise

[Buades, Coll, Morel, 2005]

θ̂ ≡ η(ŷ)
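A minimal NumPy sketch of this binary-weight variant on a 1-D signal (the patch size and the threshold constant c are illustrative choices; the talk uses 2-D image patches):

```python
import numpy as np

def nl_means_binary(y_hat, tau, half=3, c=2.0):
    """Nonlocal means with 0/1 weights: average the samples whose surrounding
    patch is within c * tau^2 per pixel of the patch around i."""
    n = y_hat.size
    pad = np.pad(y_hat, half, mode="reflect")
    # patches[i] = y_hat[i-half : i+half+1], with reflected boundary
    patches = np.stack([pad[i:i + 2 * half + 1] for i in range(n)])
    thresh = c * tau**2 * (2 * half + 1)          # scale threshold by patch length
    theta_hat = np.empty(n)
    for i in range(n):
        d2 = np.sum((patches - patches[i]) ** 2, axis=1)
        w = d2 <= thresh                          # W(i, j) in {0, 1}
        theta_hat[i] = y_hat[w].mean()
    return theta_hat
```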
Patches

[Figure: an image with two small patches highlighted, centered at pixels i and j]
Will it work?

θ̂₂ = η(θ̂₁) = η(A* y) ≈ θ ?
Let's try

θ̂₁ = A* y = [image]        θ̂₂ = η(A* y) = [image]

Better than garbage!
How much better?

[Plot: denoiser MSE_out vs. MSE_in (MSE_in 0–9, MSE_out 0–0.1), with reference curves c₁·x and c₂·√x and a question mark: which one fits?]
Let us repeat the denoising experiment: y = θ + τ z

[Figure: top row y, bottom row η(y), for τ = 1, 0.5, 0.25, 0.12]
Quantitatively

[Plot: MSE_out vs. MSE_in with reference curves c₁·x and c₂·√x, linear axes (MSE_in 0–9, MSE_out 0–0.1)]

[The same plot on log–log axes (MSE_in 0.001–10, MSE_out 10⁻⁵–0.1)]
Approximate denoiser characterization

MSE_out = c · √(MSE_in)

(enough for our purposes)

(see also Maleki, Baraniuk, Narayan, 2012; Arias-Castro, Willett, 2012)
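One way to trace such a curve empirically, reusing the nl_means_binary sketch above (toy 1-D signal and parameter values are illustrative, not the talk's image experiment):

```python
# Trace the denoiser's MSE_in -> MSE_out map on a toy signal
theta_true = np.sin(np.linspace(0, 8 * np.pi, 2000))
for tau in [1.0, 0.5, 0.25, 0.12]:
    noise = np.random.default_rng(1).standard_normal(theta_true.size)
    mse_out = np.mean((nl_means_binary(theta_true + tau * noise, tau) - theta_true) ** 2)
    print(f"MSE_in = {tau**2:.4f}   MSE_out = {mse_out:.4f}")
```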
What we achieved so far

[Plots, side by side: matched filter MSE_out vs. MSE_in (0–4 → 0–9) and denoiser MSE_out vs. MSE_in (0–9 → 0–0.1), arranged so that the output of the matched filter feeds the input of the denoiser]
What we achieved so far

[Plot: MSE_new vs. MSE_old for one full round, matched filter followed by denoiser (MSE_old 0–9, MSE_new 0–1.4)]
What we achieved so far

[The same map on log–log axes (MSE_old 0.001–100, MSE_new 0.0001–100)]

What about iterating?
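To see why iterating should help, here is a toy fixed-point iteration composing the two scalar MSE maps above (the values of δ and c are illustrative; this heuristic scalar picture is my reading of the slides, not code from the talk):

```python
def one_round(mse, delta=0.17, c=0.05):
    """Matched filter multiplies MSE by (1 - delta)/delta;
    the denoiser then pulls it down like c * sqrt(.)."""
    return c * ((1.0 - delta) / delta * mse) ** 0.5

mse = 1.0
for t in range(20):
    mse = one_round(mse)
# converges to the fixed point mse* = c**2 * (1 - delta) / delta
print(mse)
```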
How do we iterate?
Will tell you later!
t = 1

θ̂₁ = [image]

t = 2

θ̂₂ = [image]

t = 3

θ̂₃ = [image]

[Log–log plot: MSE_new vs. MSE_old after 3 iterations (MSE_old 0.001–100, MSE_new 0.0001–100)] Non-obvious!

t = 4

θ̂₄ = [image]

[Log–log plot: MSE_new vs. MSE_old after 4 iterations]

t = 5

θ̂₅ = [image]

[Log–log plot: MSE_new vs. MSE_old after 5 iterations]

t = 6

θ̂₆ = [image]

[Log–log plot: MSE_new vs. MSE_old after 6 iterations]

t = 7

θ̂₇ = [image]

[Log–log plot: MSE_new vs. MSE_old after 7 iterations]

t = 8

θ̂₈ = [image]

[Log–log plot: MSE_new vs. MSE_old after 8 iterations]

t = 9

θ̂₉ = [image]

[Log–log plot: MSE_new vs. MSE_old after 9 iterations]

t = 0, 1, 2, 3, ..., 20

[Log–log plot: MSE_new vs. MSE_old over 20 iterations]
How do we iterate?
Approximate Message Passing (AMP)

θ̂_{2t} = η(θ̂_{2t−1})
θ̂_{2t+1} = θ̂_{2t} + A* r_t

r_t = y − A θ̂_{2t} + b_t r_{t−1}
b_t = (1/m) div η(θ̂_{2t−1})

(can be computed explicitly)

[Thouless, Anderson, Palmer, 1977; Kabashima, 2003; Donoho, Maleki, Montanari, 2009; Donoho, Johnstone, Montanari, 2012]
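A generic NumPy sketch of these updates, for an abstract separable denoiser (soft thresholding stands in for nonlocal means, and the Monte Carlo divergence estimate is my addition; the slide notes b_t can be computed explicitly):

```python
import numpy as np

def soft_threshold(x, lam):
    """Stand-in denoiser eta; any denoiser with a computable divergence works."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def amp(A, y, lam, iters=20, seed=0):
    """AMP iteration with a generic denoiser and Monte Carlo divergence estimate."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    theta = np.zeros(n)   # theta_hat_{2t}, the current denoised estimate
    s = np.zeros(n)       # theta_hat_{2t-1}, the last denoiser input
    r_prev = np.zeros(m)
    for t in range(iters):
        # Onsager term b_t = (1/m) div eta(theta_hat_{2t-1})
        probe, eps = rng.standard_normal(n), 1e-4
        b = probe @ (soft_threshold(s + eps * probe, lam)
                     - soft_threshold(s, lam)) / (eps * m)
        r = y - A @ theta + b * r_prev     # corrected residual r_t
        s = theta + A.T @ r                # theta_hat_{2t+1} = theta_hat_{2t} + A* r_t
        theta = soft_threshold(s, lam)     # theta_hat_{2t+2} = eta(theta_hat_{2t+1})
        r_prev = r
    return theta
```

Usage would be e.g. theta_hat = amp(A, y, lam=0.1) for a real sensing matrix A; without the b_t · r_{t−1} term this reduces to plain iterative thresholding.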
Connection with Belief Propagation

m_{i→j} = m_i + ε_{i→j}

Linearize in ε_{i→j}.

Very different from naive mean field!
Connection with Perturbation, Optimization, Statistics?

- Robustness w.r.t. p_X ∈ DistributionClass (η = minimax denoiser in DistributionClass)
- Can rigorously track evolution for A random (What about A = A_det + ε A_rand?)
A list of theorems/pointers
A list of theorems/pointers

- Connection with optimization [Bayati, Montanari, 2011]
- Minimax theory for sparse/block-sparse/TV [Donoho, Johnstone, Montanari, 2012]
- Bayesian reconstruction up to information dimension [Donoho, Javanmard, Montanari, 2012]
- Universality [Bayati, Lelarge, Montanari, 2012]
- Analysis of Generalized AMP [Javanmard, Montanari, 2012]
- Application to sparse PCA [in preparation, 2013]
Related work

- (Non-rigorous) replica method [Tanaka 2002; Guo, Verdú 2005; Kabashima, Tanaka 2009; Rangan, Fletcher, Goyal 2009; Caire, Tulino, Shamai, Verdú 2012; ...]
- Alternative argument for robust regression (e.g. min_θ ‖y − Aθ‖₁) [Bean, Bickel, El Karoui, Lim, Yu 2012]
- Generalized linear models [Rangan 2011]
- Graphical model priors [Schniter et al. 2010–...]
- Low-rank matrices [Rangan, Fletcher 2012]
Conclusion
Conclusion

- Can do message passing/BP without Bayesian assumptions!
- There is something between naive mean field and BP.

Thanks!
Noise distribution?

[Histogram: frequency of the entries of the error (error from −5 to 5, frequency from 0 to 30000)]