Chapter 6: Advanced Algorithms for Inference in Gibbs Fields
In this chapter, we study three advanced MCMC methods for computing in Gibbs fields (flat or two-level models):
1. Swendsen-Wang Cuts for segmentation and labeling
2. DDMCMC and its applications in image segmentation
3. C4: Clustering with Cooperative and Competitive Constraints
Previous algorithms for graph partition
Generic algorithms:
– Gibbs sampler (Geman and Geman '84): inefficient
– Swendsen-Wang (Swendsen and Wang '87)
Specialized algorithms:
– Graph Cuts (Boykov, Veksler, and Zabih '01)
– Belief Propagation (Yedidia et al. '00)
– PDE optimization (e.g., region competition): greedy, prone to local minima
1. Swendsen-Wang Cuts
The original idea of cluster sampling and SW

[Figure: two Potts-model states, state A and state B, with clusters V0, V1, V2.]

Each edge in the lattice e = <s,t> is associated with a probability q = 1 - e^{-β}; the edge is turned "on" with that probability when its two sites carry the same label.
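To make the bond-forming step concrete, here is a minimal Python sketch of one SW sweep on a Potts model; the graph representation and the helper name sw_sweep are illustrative assumptions, not part of the lecture.

```python
import math
import random
from collections import deque

def sw_sweep(labels, edges, beta, num_labels):
    """One Swendsen-Wang sweep on a Potts model (toy sketch).

    labels: dict node -> label; edges: list of (s, t) pairs;
    beta: coupling strength. Each same-label edge is turned "on"
    with probability q = 1 - exp(-beta).
    """
    q = 1.0 - math.exp(-beta)
    # 1. Sample bonds: keep an edge only if both ends agree and the coin succeeds.
    adj = {v: [] for v in labels}
    for s, t in edges:
        if labels[s] == labels[t] and random.random() < q:
            adj[s].append(t)
            adj[t].append(s)
    # 2. Find the connected components of the bond graph (BFS).
    seen = set()
    for v in labels:
        if v in seen:
            continue
        comp, queue = [v], deque([v])
        seen.add(v)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.append(w)
                    queue.append(w)
        # 3. Flip the whole component to a uniformly chosen label.
        new_label = random.randrange(num_labels)
        for u in comp:
            labels[u] = new_label
    return labels
```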
Essential ideas of SW
When two variables are tightly coupled, it is best to move along the direction of their coupling rather than one coordinate at a time.

[Figure: a strongly coupled two-dimensional distribution over (x1, x2); single-site moves crawl along the narrow ridge.]

The clustering process connects the coupled dimensions probabilistically, so the moves become more effective.
Some theoretical results about SW on the Potts model
1. (Gore and Jerrum '97) constructed a "worst case": SW does not mix rapidly if G is a complete graph with n > 2 and a certain β.
2. (Cooper and Frieze '99) had positive results: if G is a tree, the SW mixing time is O(|G|) for any β; if G has constant connectivity O(1), SW has polynomial mixing time for certain β.
The real limits of SW are:
1. It is only valid for Ising/Potts models.
2. It makes no use of the data (external fields) in forming clusters, and it slows down critically in the presence of external fields.
Segmentation and graph partition
Image segmentation:
– Group pixels based on intensity
– For speed, one can use an over-segmentation by edge detection and edge tracing

[Figure pipeline: input image → over-segmentation with atomic regions → adjacency graph → graph partition (labeling) → image segmentation result.]
The graph partition problem
Given:
– A graph Go = (V, E)
  • Nodes V are image elements
  • Edges E represent spatial relationships or similarity
– A probability p(π(V) | I) or energy E(π(V)) defined on partitions π(V)
Find a partition π(V) that maximizes p(π(V) | I).

[Figure: a graph Go and a partition of the graph.]
Improving the clustering step
The edge probability qij is decided by local features Fi, Fj; e.g., in image segmentation, by the KL divergence of the intensity histograms Hi and Hj of the two atomic regions.

[Figure: atomic regions on the input image, their intensity histograms Hi and Hj, and the resulting edge weights.]

In general, qij should approximate a marginal probability of p(W|I), e.g., the probability that the two elements carry the same label.
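A minimal sketch of one plausible choice for qij, assuming normalized histograms and a symmetrized KL divergence with a temperature parameter; the exact functional form used in the lecture may differ.

```python
import math

def kl(p, q, eps=1e-8):
    """KL divergence between two normalized histograms."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def edge_prob(hist_i, hist_j, temperature=1.0):
    """Edge probability q_ij from the symmetrized KL divergence of the two
    regions' intensity histograms: similar regions get q_ij close to 1."""
    d = 0.5 * (kl(hist_i, hist_j) + kl(hist_j, hist_i))
    return math.exp(-d / temperature)
```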
Random connected components
[Figure: three samples (Sample 1, Sample 2, Sample 3) of random connected components at temperatures T = 1, 2, 4, 8.]
The temperature T is used in the marginal probability.
The Swendsen-Wang Cuts algorithm
Swendsen-Wang Cuts: SWC
Input: Go = <V, Eo>, discriminative probabilities qe, e ∈ Eo, and generative posterior probability p(W|I).
Output: Samples W ~ p(W|I).
1. Initialize a graph partition.
2. Repeat, for the current state A = π:
3.   For each subgraph Gl = <Vl, El>, l = 1, 2, ..., n in A:
4.     For e ∈ El, turn e = "on" with probability qe.
5.     Partition Gl into nl connected components gli = <Vli, Eli>, i = 1, ..., nl.
6.   Collect all the connected components in CP = {Vli : l = 1, ..., n, i = 1, ..., nl}.
7.   Select a connected component V0 ∈ CP at random.
8.   Propose to reassign V0 to a subgraph Gl', where l' follows a probability q(l'|V0, A).
9.   Accept the move with probability α(A→B).

[Figure: the initial graph Go; state A with the connected component V0 and cut edges marked ×; state B after V0 is reassigned from V1 to a new subgraph.]
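Below is a minimal Python sketch of one SWC step under simplifying assumptions: a uniform label proposal q(l'|V0, A), so the proposal ratio cancels, and a posterior p(W|I) ∝ exp(-E(W)) supplied as an energy callback. The function name swc_step and the data layout are illustrative; the acceptance ratio in step 9 is the one given by the theorem on the next slide.

```python
import math
import random
from collections import deque

def swc_step(labels, edges, q, num_labels, energy):
    """One Swendsen-Wang Cuts step (toy sketch, uniform label proposal).

    labels: dict node -> label       edges: list of (s, t) pairs
    q:      dict (s, t) -> q_e       energy: callable(labels) -> E(W)
    The posterior is taken as p(W|I) proportional to exp(-E(W)).
    """
    # Steps 3-5: turn on same-label edges with probability q_e, then find
    # the connected components.
    adj = {v: [] for v in labels}
    for s, t in edges:
        if labels[s] == labels[t] and random.random() < q[(s, t)]:
            adj[s].append(t)
            adj[t].append(s)
    comps, seen = [], set()
    for v in labels:
        if v in seen:
            continue
        comp, queue = {v}, deque([v])
        seen.add(v)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    # Steps 6-8: pick a component V0 and propose a new label uniformly.
    V0 = random.choice(comps)
    old_label = labels[next(iter(V0))]
    new_label = random.randrange(num_labels)
    # Step 9: acceptance = (cut weight in state B / cut weight in state A)
    # times the posterior ratio; the uniform proposal ratio cancels.
    def cut_weight(target_label):
        w = 1.0
        for s, t in edges:
            if (s in V0) == (t in V0):
                continue                      # not a boundary edge of V0
            out = t if s in V0 else s
            if labels[out] == target_label:
                w *= 1.0 - q[(s, t)]          # cut edge: must stay "off"
        return w
    proposal = dict(labels)
    for u in V0:
        proposal[u] = new_label
    alpha = min(1.0, (cut_weight(new_label) / cut_weight(old_label))
                     * math.exp(energy(labels) - energy(proposal)))
    if random.random() < alpha:
        labels.update(proposal)
    return labels
```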
SW Cuts: the acceptance probability
By Metropolis-Hastings design:

Theorem (Barbu and Zhu 2003). The acceptance probability for the Swendsen-Wang Cuts algorithm is

α(A→B) = min(1, [∏_{e ∈ C(V0, Vl' \ V0)} (1 − qe)] / [∏_{e ∈ C(V0, Vl \ V0)} (1 − qe)] · [q(l | V0, B) / q(l' | V0, A)] · [p(B|I) / p(A|I)]),

where C(V0, Vl \ V0) is the cut between V0 and the rest of its subgraph Vl in state A, and C(V0, Vl' \ V0) is the corresponding cut in state B.

[Figure: state A and state B.]
Outline of the proof
[Figure: states A and B with component V0 and subgraphs V1, V2; cut edges are marked ×.]

We compute the ratio Q(A→B)/Q(B→A): all configurations of edges that take state A to B must have all edges of the cut C(V0, V1 − V0) turned off.
Outline of the proof
[Figure: states A and B with component V0 and subgraphs V1, V2; cut edges are marked ×.]

Cancellation of the sums occurs because of the symmetry between states A and B: any CP that takes state A to B is also a CP that takes state B to A, and any configuration of "on" edges in state A appears in state B and vice versa.
The reassignment probability
The reassignment probability q(l' | V0, A) can also be data-driven, e.g., by comparing the histogram H(V0) of the selected component with the histograms H(V1 − V0), H(V2), H(V3) of the candidate subgraphs.

[Figure: component V0 with subgraphs V1, V2, V3 and their histograms.]
Comparison with the Gibbs sampler
[Figure: two energy-vs-time(s) plots (energies on the order of 3.2-4.2 × 10^5) comparing the Gibbs sampler with random and uniform initialization against SWC with random and uniform initialization.]
Convergence comparison of SWC and the Gibbs sampler on the cheetah image, starting from a random state or from the state where all nodes have label 0. Right: zoomed-in view of the first 20 seconds.
Examples of segmentation
[Figure: a. input image; b. over-segmentation with atomic regions; c. segmentation result.]
Advantages of the SW-Cuts algorithm
– Generally applicable: allows the use of complex models beyond the scope of the specialized algorithms.
– Computationally efficient: performance comparable with the specialized algorithms.
– Reversible and ergodic: theoretically guaranteed to eventually find the global optimum.
A generalized Gibbs sampler

If we select the probability q(l | V0, A) of reassigning V0 to Vl (obtaining state Al) proportional to the cut weight times the posterior,

q(l | V0, A) ∝ [∏_{e ∈ C(V0, Vl \ V0)} (1 − qe)] · p(Al | I),

we can obtain acceptance probability 1. Then we basically flip the label of the connected subgraph by a generalized Gibbs sampler.
The importance of q(l' | V0, A)
[Figure: energy vs. time(s) (energies on the order of 3.75-4.05 × 10^5) for SWC and for the generalized Gibbs sampler.]
Convergence of SWC with data-driven q(l' | V0, A) (blue) and of the generalized Gibbs sampler (red), starting from a random state.
2. Data-Driven Markov Chain Monte Carlo

The Bayesian formulation for maximizing a posterior probability

Let I be an image and W be a semantic representation of the world:

W* = arg max_W p(W | I) = arg max_W p(I | W) p(W).

In statistics, we sample from the posterior probability to preserve ambiguities:

(W1, W2, ..., Wk) ~ p(W | I).
Example: Image Segmentation
W = (n, {(Ri, li, θi) : i = 1, 2, ..., n})

πn = (R1, R2, ..., Rn) is an n-partition: ∪_{i=1}^{n} Ri = Λ and Ri ∩ Rj = ∅ for i ≠ j.

[Figure: π7 = (R1, R2, ..., R7) is a 7-partition of the lattice.]

The partition space is Ω_π = ∪_{n=1}^{|Λ|} Ω_{πn}, where each Ω_{πn} is defined modulo a permutation group (the regions are unordered).
Likelihood models (no objects or templates)
Grey-scale:
– iid Gaussian for pixel intensities
– non-parametric histograms
– Markov random fields for texture
– spline model for lighting variations
Color:
– iid Gaussian for color (LUV)
– mixture of Gaussians for color
– spline model for smooth color variations (e.g., sky, shading, ...)
Sampling the posterior distribution
To design the transition kernel: we need a Markov chain whose invariant probability is the posterior p(W | I).

[Figure: the solution space, a 7-partition, and its atomic particles.]
Formulating and visualizing the search space
[Figure: a) the solution space Ω; b) a sub-space of 7-partitions π7; c) an atomic space Ωi, with cues C1, C2, C3.]
In Image Segmentation

[Figure: a partition into regions R1, R2, R3; atomic spaces Ωi and parameter spaces for the grey-scale models (flat, clutter, texture, shading) and the color models (flat, shading, texture), with cues C1, C2, C3.]
Basic requirements for MCMC design
We have the following conditions for a valid MCMC design (as in 202C):
1. stochastic: each row of the kernel sums to 1;
2. irreducible: K has one communication class;
3. aperiodic: any power of K has one communication class;
4. globally balanced: the target probability is invariant under K;
5. positive recurrent (not an issue in a finite space).
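As a quick illustration of conditions 1-4 on a finite space, here is a small Python check; the function name check_kernel and the tolerance choices are illustrative.

```python
import numpy as np

def check_kernel(K, tol=1e-9):
    """Sanity-check a finite transition matrix against the conditions above."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    stochastic = bool(np.allclose(K.sum(axis=1), 1.0))
    # Irreducible: every state reaches every state, i.e. (I + K)^(n-1) > 0.
    reach = np.linalg.matrix_power(np.eye(n) + K, n - 1)
    irreducible = bool((reach > tol).all())
    # Irreducible + aperiodic (primitive): K^m > 0 for some m <= (n-1)^2 + 1.
    primitive = bool((np.linalg.matrix_power(K, (n - 1) ** 2 + 1) > tol).all())
    # Invariant distribution: left eigenvector of K with eigenvalue 1.
    vals, vecs = np.linalg.eig(K.T)
    p = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    p = p / p.sum()
    return stochastic, irreducible, primitive, p

# Toy usage: a 3-state kernel.
K = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
print(check_kernel(K))
```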
What is Data-Driven Markov Chain Monte Carlo?
W ~ p(W | I): the complexity of sampling the posterior lies in the Metropolis-Hastings jumps.

Consider a reversible jump WA ⇌ WB:

α(WA → WB) = min(1, [p(WB | I) G(WB → WA)] / [p(WA | I) G(WA → WB)]),
or
α(WA → WB) = min(1, [p(WB | I) q(WB → WA)] / [p(WA | I) q(WA → WB)]).

Without looking at the data, the pre-designed proposal probabilities are often uniform distributions; thus it is a blind (exhaustive) search!

In DDMCMC,

α(WA → WB) = min(1, [p(WB | I) q(WB → WA | I)] / [p(WA | I) q(WA → WB | I)]).

If q(WA → WB | I) ≈ p(WB | I) and q(WB → WA | I) ≈ p(WA | I), then the MC is well-informed: it may converge (hit W*) in a small number of steps!
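A minimal Python sketch of this contrast on a finite state space: an independence-type Metropolis-Hastings chain whose proposal draws states in proportion to a bottom-up score approximating p(y | I). The names p_post and make_proposal, and the score-based proposal itself, are illustrative assumptions.

```python
import random

def mh_chain(p_post, propose, steps=1000, x0=0):
    """Metropolis-Hastings on a finite space.

    p_post:  dict state -> unnormalized posterior p(W|I)
    propose: callable(x) -> (y, q_xy, q_yx): a proposed state with its
             forward and backward proposal probabilities
    """
    x = x0
    for _ in range(steps):
        y, q_xy, q_yx = propose(x)
        if random.random() < min(1.0, (p_post[y] * q_yx) / (p_post[x] * q_xy)):
            x = y
    return x

def make_proposal(scores):
    """Data-driven (independence) proposal: draw y with prob. scores[y]/Z.
    Passing uniform scores recovers the blind, exhaustive-search baseline."""
    states, total = list(scores), float(sum(scores.values()))
    def propose(x):
        r, acc, y = random.random() * total, 0.0, states[-1]
        for s in states:
            acc += scores[s]
            if r <= acc:
                y = s
                break
        return y, scores[y] / total, scores[x] / total
    return propose
```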
The Markov chain consists of many processes
Suppose a Markov chain consists of many sub-chains, and the transition probability is a linear sum:

K(x, y) = Σ_i p(i) Ki(x, y), with Σ_i p(i) = 1.

If each sub-kernel has the same invariant probability, π Ki = π for all i, then the whole Markov chain follows π K = Σ_i p(i) π Ki = π.
The connectivity at each state x
We denote by Ωi(x) the set of states connected to x by the i-th type of moves; that is, x is connected to a set of states. For example, Ki can propose within the set Ωi(x) with probability proportional to π(·).
MCMC Kernels consist of many components
The transition kernel is a mixture of many sub-kernels corresponding to the various operators:

K = Σ_i p(i) Ki.

Each sub-kernel observes the detailed balance equations, but may not be irreducible by itself. Some sub-kernels are symmetric, while others come in asymmetric pairs, e.g., K1l and K1r.

[Figure: sub-kernels K1l, K1r, K2, ... acting around a state WA.]
Metropolized Gibbs sampler
Consider a pair of reversible jumps Jm between x and y, with scopes Ωil(x) and Ωir(y).

Proposal according to the conditional probabilities, like a Gibbs sampler:

Qil(x, y) = π(y) / Σ_{y' ∈ Ωil(x)} π(y'),  for y ∈ Ωil(x);
Qir(y, x) = π(x) / Σ_{x' ∈ Ωir(y)} π(x'),  for x ∈ Ωir(y).

The proposal matrix Q is asymmetric, and each row x is zero (0, 0, ..., 0) outside the scope Ωil(x).
Key issues
1. How do we decide the sampling dimensions, directions, group transforms, and sets Ωi(x) in a systematic and principled way?
2. How do we schedule the visiting order governed by p(i), i.e., how do we choose the moving directions, groups, and sets?
MCMC Moves in image segmentation

K1l: splitting a region into two.
K1r: merging two regions into one.
K2: switching the model type for a region.
K3: diffusion of the region boundary (region competition).

[Figure: the split, merge, switch-model, and diffusion moves.]
Data-Driven Methods in the object spaces (death-birth)
Kmr(x, y) = Qmr(x, y) · min(1, [π(y) Qml(y, x)] / [π(x) Qmr(x, y)]),  for y ∈ Ωmr(x).

We conjecture that the Metropolized Gibbs sampler is the best design strategy on average: it mixes very fast under the constraints of the scopes. But at each step it needs to evaluate the expensive posterior probability over a rather large scope Ωmr(x).

We replace the conditional probability by bottom-up (discriminative) methods, which are estimated locally with lower cost. We show that such approximations indeed reduce the mixing time.
Metropolized Gibbs sampler
Kil(x, y) = Qil(x, y) · min(1, [Qir(y, x) π(y)] / [Qil(x, y) π(x)]),  for y ≠ x.

Proposal according to the conditional probabilities (the signature of a Gibbs sampler), normalized within the set of connected states:

Qil(x, y) = π(y) / Σ_{y' ∈ Ωil(x)} π(y'),  for y ∈ Ωil(x);
Qir(y, x) = π(x) / Σ_{x' ∈ Ωir(y)} π(x'),  for x ∈ Ωir(y).

[Figure: the scopes Ω1l(x), Ω2(x), ... around state x; the row of the proposal matrix Q for x is zero (0, 0, ..., 0) outside its scope.]
Metropolized Gibbs sampler
α(x, y) = min(1, [Qir(y, x) π(y)] / [Qil(x, y) π(x)])
        = min(1, [Σ_{y' ∈ Ωil(x)} π(y')] / [Σ_{x' ∈ Ωir(y)} π(x')]).

In case the sets are symmetric, Ωil(x) = Ωir(y), the move is always accepted:

α(x, y) = 1.

The Gibbs sampler becomes a special case.
Metropolized Gibbs sampler
Mixing Metropolis and Gibbs designs.
One can improve the traditional Gibbs sampler by prohibiting the MC from staying at its current state in the conditional probability. The proposal thus becomes asymmetric and needs a Metropolis acceptance step to "re-balance":

Q(x, y) = π(y) / (1 − π(x)),  for y ≠ x,  and Q(x, x) = 0;

α(x, y) = min(1, (1 − π(x)) / (1 − π(y))).

The diagonal elements in the proposal matrix are set to zero. This is a desirable property of MC design, in order to make the MC "mix fast".
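A minimal Python sketch of this update on a finite distribution; the list representation of π and the helper name metropolized_gibbs_step are illustrative.

```python
import random

def metropolized_gibbs_step(pi, x):
    """One Metropolized Gibbs update on a finite distribution pi (list of probs).

    Proposes y != x with probability pi[y] / (1 - pi[x]), then accepts with
    probability min(1, (1 - pi[x]) / (1 - pi[y])), as on the slide.
    """
    r, acc, y = random.random() * (1.0 - pi[x]), 0.0, x
    for j, pj in enumerate(pi):
        if j == x:
            continue
        acc += pj
        if r <= acc:
            y = j
            break
    if random.random() < min(1.0, (1.0 - pi[x]) / (1.0 - pi[y])):
        return y
    return x

# Toy usage: unlike a plain Gibbs sampler, which would resample the current
# state with probability pi[x], this chain always proposes to move.
pi = [0.5, 0.3, 0.2]
x = 0
for _ in range(10):
    x = metropolized_gibbs_step(pi, x)
```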
Split and Merge
[Figure: a candidate region: split or not?]
Computing the marginal posterior probabilities: Clustering in Color Space
Mean-shift clustering (Cheng 1995; Meer et al. 2001) yields K weighted clusters (θi, ωi), encoded as

q(θ | I) = Σ_{i=1}^{K} ωi g(θ − θi).

[Figure: input image and saliency maps 1-6; the brightness represents how likely a pixel belongs to a cluster.]
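A minimal sketch of evaluating this mixture, assuming Gaussian windows g of a fixed bandwidth (the kernel choice is an assumption, not specified on the slide):

```python
import math

def q_theta(theta, modes, weights, bandwidth=1.0):
    """Evaluate q(theta | I) = sum_i w_i * g(theta - theta_i) with Gaussian g."""
    z = math.sqrt(2.0 * math.pi) * bandwidth
    return sum(w * math.exp(-0.5 * ((theta - t) / bandwidth) ** 2) / z
               for t, w in zip(modes, weights))
```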
Computing marginal posterior probabilities in the partition space π
Edge detection and tracing at three scales of detail:
Proposals by Edge Detection at Different Scales (before SW-cut was invented)

[Figure: partition maps at Scale 1, Scale 2, and Scale 3.]
Super-pixels and connected components
[Figure: samples (Sample 1, Sample 2, Sample 3) of connected components over super-pixels at temperatures T = 1, 2, 4, 8 (Swendsen-Wang Cut).]
Clustering in the Partition Space
An adjacency graph: each vertex is a basic element (pixels, small regions, edges, ...); each link e = <a, b> is associated with a probability/ratio for similarity:

q(e = "on" | F(I(a)), F(I(b)))  vs.  q(e = "off" | F(I(a)), F(I(b))),

where F(I(a)) and F(I(b)) are local features of the two elements.
Clustering in the Partition Space
Sampling the edges independently, we get connected components; these connected sub-graphs are the clusters in the partition space:

sampling C ~ q(C | F(I)) on Ω_π.
Graph Partitioning – Generalizing SW
The red edges are the bridges (the cut).

Theorem. Accepting the label-change proposal with probability

α(A → B) = min(1, [∏_{e ∈ E(Vc, Vl' \ Vc)} (1 − qe)] / [∏_{e ∈ E(Vc, Vl \ Vc)} (1 − qe)] · [q(l | Vc, B) / q(l' | Vc, A)] · [p(B | I) / p(A | I)])

results in an ergodic and reversible Markov chain.

[Figure: graphs GA and GB before and after the label change of the component Vc.]
Diffusion Processes on the boundary
The Markov chains realize reversible jumps between sub-spaces of varying dimensions. Within a subspace of fixed dimension, there are various diffusion processes expressed as partial differential equations. For example, region competition for curve evolution (Zhu, Lee, and Yuille '95):

Let v(s) = (x(s), y(s)) be a point on the boundary between two regions Ra and Rb; its motion is governed by the region-competition equation

dv(s)/dt = ( μ κ(s) + log [ p(I(x,y) | θa) / p(I(x,y) | θb) ] ) n(s),

where κ(s) is the curvature and n(s) the normal of the boundary at v(s).
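A minimal sketch of one explicit Euler step of this motion for a closed polygonal curve; the finite-difference curvature, the normal orientation, and the callback names log_lik_a/log_lik_b are all assumptions of this toy discretization.

```python
import math

def region_competition_step(curve, log_lik_a, log_lik_b, mu=1.0, dt=0.1):
    """One Euler step of dv/dt = (mu*kappa + log-likelihood ratio) * n
    for a closed curve given as a list of (x, y) points."""
    n_pts = len(curve)
    new_curve = []
    for i, (x, y) in enumerate(curve):
        xp, yp = curve[(i - 1) % n_pts]
        xn, yn = curve[(i + 1) % n_pts]
        # Second difference approximates the curvature vector kappa * n.
        kx, ky = xn - 2.0 * x + xp, yn - 2.0 * y + yp
        # Unit normal from the tangent (sign depends on curve orientation).
        tx, ty = (xn - xp) / 2.0, (yn - yp) / 2.0
        norm = math.hypot(tx, ty) or 1.0
        nx, ny = ty / norm, -tx / norm
        speed = log_lik_a(x, y) - log_lik_b(x, y)   # data term along the normal
        new_curve.append((x + dt * (mu * kx + speed * nx),
                          y + dt * (mu * ky + speed * ny)))
    return new_curve
```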
Stochastic Diffusion and PDE
[Figure: regions R1, R2, R3 with evolving boundaries.]

The continuous Langevin equation simulates a Markov chain with stationary density p(x) ∝ exp(−E(x)/T):

dx(t) = −∇E(x) dt + √(2T) dw_t,  with w_t a Brownian motion.

For example, the movement of a changing (boundary) point is driven by such an equation.
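A minimal sketch of the discretized Langevin dynamics; the step size and the 1-D setting are illustrative.

```python
import math
import random

def langevin_sample(grad_E, x0, temperature=1.0, step=1e-3, n_steps=10000):
    """Discretized Langevin dynamics: x' = x - h*grad E(x) + sqrt(2*T*h)*noise.
    For small step sizes, the stationary density approaches exp(-E(x)/T)."""
    x = x0
    for _ in range(n_steps):
        x = (x - step * grad_E(x)
             + math.sqrt(2.0 * temperature * step) * random.gauss(0.0, 1.0))
    return x

# Toy usage: E(x) = x^2 / 2, so samples approach a Gaussian with variance T.
samples = [langevin_sample(lambda x: x, x0=0.0, n_steps=2000) for _ in range(100)]
```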
Running DDMCMC
Starting with 3 different initial segmentations:

[Figure: energy plots of three MCMCs (MC 1, MC 2, MC 3); the input image, a solution W1 with synthesis I1 ~ p(I | W1), and a solution W2 with synthesis I2 ~ p(I | W2).]
Proposals for region models by clustering

[Figure: saliency maps (the brightness represents how likely a pixel belongs to a cluster), computed by clustering color values (L,u,v) and texture.]
A Demo
[Figure: a snapshot of a solution (segmentation and synthesis) sampled by DDMCMC.]
Experiments: Color Image Segmentation
[Figures: input images, segmented regions, and syntheses I ~ p(I | W*).]
Image Segmentation results on the public dataset
[Figure: input images, segmentations, and syntheses I ~ p(I | W*).]
Performance on the Berkeley Benchmark Study
[Figure: test images, DDMCMC results, and manual segmentations, scored by the "error" measure of David Martin et al. 2001; example errors 0.1083, 0.3082, 0.5627.]
Examples of Failure

[Figure: a. input image; b. segmented regions; c. synthesis I ~ p(I | W*).]
Speed Comparison

[Figure: convergence of uninformed MCMC, MCMC with clustering, MCMC with partition, and MCMC with both, compared against the ground truth.]
Running Time Comparison Against Gibbs Sampler
[Figure: energy vs. time(s) curves, with a zoomed-in view; time is measured in #sweeps. Red curve: Gibbs sampler for graph partition and labeling. Blue curve: improved SW algorithm for graph partition.]
Generic image parsing

[Figure: a parse hierarchy from scene to objects, patterns, parts, and textons. Example: image parsing (Tu et al. 2000-2004).]
From segmentation to parsing
[Figure: face images from the FERET dataset; text images of San Francisco street scenes.]
Adaboost in the Label Space
An example from Viola and Jones, 2001.

[Figure: (a) the first two face features; (b) an example of face detection.]

Adaboost is a learning algorithm which makes decisions by combining a number of simple features. As T and the training samples become large enough, it weakly converges to the log ratio of the posterior probability.
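For reference, the standard form of this statement (from Adaboost theory with exponential loss, not spelled out on the slide) is that the boosted score approaches half the posterior log-odds:

```latex
\lim_{T \to \infty} \sum_{t=1}^{T} \alpha_t h_t(x) \;=\; \frac{1}{2}\,
\log \frac{p(y = +1 \mid x)}{p(y = -1 \mid x)}
```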
Image Parsing Results (Tu, Chen, Yuille, and Zhu, ICCV 2003)

[Figures: input, regions, objects, synthesis.]
An example

[Diagram: integrating top-down generative and bottom-up discriminative methods. A Markov kernel combines sub-kernels for faces, text, region models, and model switching (death/birth, split/merge). Generative inference and discriminative inference exchange weighted particles; from the input image, bottom-up processes provide face detection, text detection, edge detection, and model clustering.]
Summary: generative vs. discriminative

[Figure: Bayesian (top-down) vs. data-driven (bottom-up) inference, and their integration.]
Review: MCMC developments related to vision
Metropolis 1946
Hastings 1970
Waltz 1972 (labeling)
Rosenfeld, Hummel, Zucker 1976 (relaxation)
Heat bath
Kirkpatrick 1983
Geman brothers 1984 (Gibbs sampler)
Swendsen-Wang 1987 (clustering)
Miller, Grenander 1994
Green 1995
DDMCMC 2001-2005
Swendsen-Wang Cut 2003
C4 2009