
Chapter 6: Advanced Algorithms for Inference in Gibbs Fields

In this chapter, we study three advanced MCMC methods for computing in Gibbs fields (flat or two-level models):

1. Swendsen-Wang Cuts for segmentation and labeling

2. DDMCMC and its applications in image segmentation

3. C4: Clustering with Cooperative and Competitive Constraints

Previous algorithms for graph partition

Generic algorithms:
– Gibbs sampler (Geman & Geman '84) – inefficient

Specialized algorithms:
– Graph Cuts (Boykov, Veksler, Zabih '01)
– Belief Propagation (Yedidia et al. '00)
– PDE optimization (like region competition) – greedy, local minima

Cluster sampling:
– Swendsen-Wang (Swendsen & Wang '87)


1. Swendsen-Wang Cuts

The original idea of cluster sampling and SW

(Figure: two labeling states, A and B, of a lattice with clusters V0, V1, V2.)

Each edge in the lattice, e = <s,t>, is associated with a probability ρ = 1 − e^(−β).

Essential ideas of SW

When two variables are tightly coupled, it is best to move along the direction of their coupling. The clustering process connects the coupled dimensions probabilistically; the moves are then more effective.

(Figure: a correlated density over (x1, x2); moves along the coupled direction are more effective than single-coordinate moves.)
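To make the clustering step concrete, here is a minimal sketch (not from the slides) of one SW sweep for an Ising/Potts model on a 4-connected lattice; the function name sw_sweep and the dict-based lattice representation are our own choices.

```python
import math
import random

def sw_sweep(labels, beta, num_labels, width, height):
    """One Swendsen-Wang sweep for an Ising/Potts model on a 4-connected
    lattice. labels: dict (x, y) -> label. Bonds between equal-label
    neighbors turn on with probability rho = 1 - exp(-beta); every
    resulting connected component is then relabeled uniformly at random."""
    rho = 1.0 - math.exp(-beta)

    # Union-find over lattice sites.
    parent = {site: site for site in labels}
    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path halving
            s = parent[s]
        return s

    # Sample bonds on edges whose endpoints currently share a label.
    for x in range(width):
        for y in range(height):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height:
                    if labels[(x, y)] == labels[(nx, ny)] and random.random() < rho:
                        parent[find((x, y))] = find((nx, ny))

    # Relabel each connected component with a uniform random label.
    new_label = {}
    for site in labels:
        root = find(site)
        if root not in new_label:
            new_label[root] = random.randrange(num_labels)
        labels[site] = new_label[root]
    return labels

# Usage: a 16x16 lattice with 3 labels, inverse temperature beta = 0.9.
grid = {(x, y): random.randrange(3) for x in range(16) for y in range(16)}
for _ in range(100):
    grid = sw_sweep(grid, beta=0.9, num_labels=3, width=16, height=16)
```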

Page 3: Previous algorithms for graph partition

3

Some theoretical results about SW on Potts models

1. (Gore and Jerrum '97) constructed a "worst case": SW does not mix rapidly if G is a complete graph with n > 2 and a certain β.

2. (Cooper and Frieze '99) had positive results: if G is a tree, the SW mixing time is O(|G|) for any β; if G has constant connectivity O(1), then SW has polynomial mixing time for certain β.

The real limitations of SW are:
1. It is only valid for Ising/Potts models.
2. It makes no use of the data (external fields) in forming clusters, and it slows down critically in the presence of external fields.

Segmentation and graph partition

Image segmentation: group pixels based on intensity. For speed, one can use an over-segmentation obtained by edge detection and edge tracing.

(Figure: input image → over-segmentation with atomic regions → adjacency graph → graph partition (labeling) → image segmentation result.)


The graph partition problem

Given:
– a graph Go = <V, E>, where nodes V are image elements and edges E represent spatial relationships or similarity;
– a probability p(π(V) | I) or energy E(π(V)) defined on partitions π(V).

Find a partition π(V) that maximizes p(π(V) | I).

(Figure: a graph Go and a partition of the graph.)

Improving the clustering step

The edge probability qij is decided by local features Fi, Fj; e.g., in image segmentation, by the KL divergence of the intensity histograms Hi and Hj of the two atomic regions. qij approximates a marginal probability of p(W|I); in general it is estimated from local, bottom-up features, as sketched below.

(Figure: atomic regions on the input image; edge weights computed from intensity histograms.)
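The slide's exact formula was lost to extraction; a common choice consistent with the text is to turn a symmetrized KL divergence of the two histograms into an edge probability via an exponential with a temperature T (the T appears on the next slide). The function below is a hypothetical sketch under that assumption.

```python
import numpy as np

def edge_prob(hist_i, hist_j, T=1.0, eps=1e-8):
    """Hypothetical discriminative edge probability q_ij computed from two
    intensity histograms via a symmetrized KL divergence; the exponential
    form and the temperature T are our assumptions."""
    p = np.asarray(hist_i, dtype=float) + eps
    q = np.asarray(hist_j, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    kl_sym = 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
    return float(np.exp(-kl_sym / T))

# Similar histograms give q_ij near 1; dissimilar ones give q_ij near 0.
print(edge_prob([10, 20, 5], [11, 19, 6]))   # close to 1
print(edge_prob([30, 2, 1], [1, 2, 30]))     # close to 0
```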


Random connected components

(Figure: three samples of random connected components at temperatures T = 1, 2, 4, 8.) The temperature T is used in the marginal probability.

The Swendsen-Wang Cuts algorithm

Swendsen-Wang Cuts: SWC
Input: Go = <V, Eo>, discriminative probabilities qe, e ∈ Eo, and the generative posterior probability p(W|I).
Output: Samples W ~ p(W|I).
1. Initialize a graph partition.
2. Repeat, for the current state A = π:
3.   Repeat for each subgraph Gl = <Vl, El>, l = 1, 2, ..., n in A:
4.     For e ∈ El, turn e "on" with probability qe.
5.     Partition Gl into nl connected components gli = <Vli, Eli>, i = 1, ..., nl.
6.   Collect all the connected components: CP = {Vli : l = 1, ..., n, i = 1, ..., nl}.
7.   Select a connected component V0 ∈ CP at random.
8.   Propose to reassign V0 to a subgraph Gl', where l' follows a probability q(l'|V0, A).
9.   Accept the move with probability α(A→B).

(Figure: the initial graph Go; state A with the components CP and the selected V0 ⊂ V1, cut edges marked ×; state B after V0 is reassigned.)
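Below is a minimal sketch of one SWC step following the listing above; the data structures, the uniform reassignment probability q(l'|V0, A), and the function names are our simplifications, not the authors' implementation.

```python
import random

def swc_step(labels, edges, q, posterior, num_labels):
    """One Swendsen-Wang Cuts step (a sketch).
    labels: dict node -> label; edges: list of (s, t) pairs;
    q: dict (s, t) -> discriminative probability q_e;
    posterior: function(labels) -> unnormalized p(W|I)."""
    # Step 4: turn edges on within subgraphs (equal labels only).
    parent = {v: v for v in labels}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for (s, t) in edges:
        if labels[s] == labels[t] and random.random() < q[(s, t)]:
            parent[find(s)] = find(t)

    # Steps 6-7: pick a connected component V0 at random.
    comps = {}
    for v in labels:
        comps.setdefault(find(v), []).append(v)
    V0 = random.choice(list(comps.values()))
    old = labels[V0[0]]

    # Step 8: propose a new label (uniform reassignment, an assumption).
    new = random.randrange(num_labels)
    if new == old:
        return labels

    # Cut weight: product of (1 - q_e) over edges between V0 and the
    # nodes of a given label outside V0.
    in_V0 = set(V0)
    def cut_weight(label):
        w = 1.0
        for (s, t) in edges:
            if (s in in_V0) != (t in in_V0):
                out = t if s in in_V0 else s
                if labels[out] == label:
                    w *= (1.0 - q[(s, t)])
        return w

    p_A = posterior(labels)
    w_cut_A = cut_weight(old)          # cut to V0's old subgraph (state A)
    # Labels outside V0 are unchanged by the move, so the state-B cut to
    # the new subgraph equals the cut to label `new` measured in state A.
    w_cut_B = cut_weight(new)
    proposal = dict(labels)
    for v in V0:
        proposal[v] = new
    p_B = posterior(proposal)

    # Step 9: Metropolis-Hastings acceptance; uniform q(l'|V0, A) cancels.
    alpha = min(1.0, (w_cut_B / w_cut_A) * (p_B / p_A))
    return proposal if random.random() < alpha else labels
```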


SW Cuts: the acceptance probability

Theorem (Barbu and Zhu 2003). By Metropolis-Hastings, the acceptance probability for the Swendsen-Wang Cuts algorithm is

  α(A→B) = min(1, [∏_{e∈C(V0, Vl'−V0)} (1 − qe) / ∏_{e∈C(V0, Vl−V0)} (1 − qe)] · [q(l|V0, B) / q(l'|V0, A)] · [p(B|I) / p(A|I)]),

where C(V0, Vl−V0) is the cut between V0 and the rest of its subgraph Vl in state A, and C(V0, Vl'−V0) the corresponding cut in state B.

(Figure: state A and state B.)

Outline of the proof

We compute the proposal ratio q(B→A)/q(A→B). All configurations of edges that take state A to B must have all edges of the cut C(V0, V1−V0) turned off.

(Figure: states A and B with the cut edges between V0 and V1−V0 marked ×.)


Outline of the proof

Cancellation of the sums occurs because of the symmetry between states A and B: any CP that takes state A to B is also a CP that takes state B to A, and any configuration of "on" edges in state A appears in state B, and vice versa.

(Figure: states A and B with matching "on"-edge configurations.)

The reassignment probability

The reassignment probability q(l'|V0, A) can also be data-driven, e.g. by comparing the histogram H(V0) against the histograms H(V1−V0), H(V2), H(V3) of the candidate subgraphs.

(Figure: V0 and the subgraphs V1, V2, V3 with their histograms.)


Comparison with the Gibbs sampler

(Figure: energy vs. time plots.) Convergence comparison of SWC and the Gibbs sampler on the cheetah image, starting from a random state or from the state where all nodes have label 0. Right: zoomed-in view of the first 20 seconds.

Examples of segmentation

(Figure: a. input image; b. over-segmentation with atomic regions; c. segmentation result.)


Advantages of the SW-Cuts algorithm

– Generally applicable: allows the use of complex models beyond the scope of the specialized algorithms.
– Computationally efficient: performance comparable with the specialized algorithms.
– Reversible and ergodic: theoretically guaranteed to eventually find the global optimum.

A generalized Gibbs sampler

We can obtain acceptance probability α(A→B) = 1 if we select the probability q(l|V0, A) of reassigning V0 to Vl (obtaining state Al) as

  q(l|V0, A) ∝ ∏_{e∈C(V0, Vl−V0)} (1 − qe) · p(Al | I).

Then we basically flip the label of the connected subgraph by a generalized Gibbs sampler.


The importance of q(l'|V0, A)

(Figure: energy vs. time plot.) Convergence of SWC with data-driven q(l'|V0, A) (blue) and of the generalized Gibbs sampler (red), starting from a random state.

2. Data-Driven Markov Chain Monte Carlo

The Bayesian formulation for maximizing a posterior probability

Let I be an image and W be a semantic representation of the world:

  W* = arg max_W p(W | I) = arg max_W p(I | W) p(W)

In statistics, we sample from the posterior probability to preserve ambiguities:

  (W1, W2, ..., Wk) ~ p(W | I)


Example: Image Segmentation

  W = (n, {(Ri, li, θi) : i = 1, 2, ..., n})

  πn = (R1, R2, ..., Rn),  ∪i Ri = Λ,  Ri ∩ Rj = ∅ for i ≠ j

(Figure: π7 = (R1, R2, ..., R7) is a 7-partition of the lattice.)

The partition space is Ωπ = ∪_{n≥1} πn, with each πn taken modulo a permutation group (the region indices are exchangeable).

Likelihood models (no objects or templates)

Grey-scale:
– iid Gaussian for pixel intensities
– non-parametric histograms
– Markov random fields for texture
– spline model for lighting variations

Color:
– iid Gaussian for color (LUV)
– mixture of Gaussians for color
– spline model for smooth color variations (e.g. sky, shading, ...)


Sampling the posterior distribution

Markov chain: we design a transition kernel to sample the posterior.

Formulating and visualizing the search space

(Figure: a) the solution space; b) a sub-space of 7 regions; c) an atomic space. Sub-spaces are assembled from atomic spaces, e.g. cluster spaces C1, C2, C3, whose elements are atomic particles.)


In Image Segmentation

(Figure: a partition with regions R1, R2, R3; grey-scale attribute spaces (flat, clutter, texture, shading) and color attribute spaces (flat, shading, texture); partition spaces and parameter spaces C1, C2, C3.)

Basic requirements for MCMC design

We have the following conditions for valid MCMC design (from Stat202C):
1. stochastic: each row of K sums to 1;
2. irreducible: K has one communication class;
3. aperiodic: any power of K has one communication class;
4. globally balanced;
5. positive recurrent (not an issue in a finite space).

A small check of these conditions is sketched below.
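A numeric check of conditions 1, 2, and 4 on a toy kernel (a sketch; check_kernel and the example kernel are ours):

```python
import numpy as np

def check_kernel(K, pi, tol=1e-9):
    """Sanity checks for a finite transition matrix K against a target
    distribution pi (a sketch; names are ours)."""
    K = np.asarray(K, dtype=float)
    pi = np.asarray(pi, dtype=float)
    stochastic = np.allclose(K.sum(axis=1), 1.0)            # condition 1
    # Condition 2 (irreducible): (I + A)^n has no zero entry, where A is
    # the adjacency pattern of K.
    A = (K > tol).astype(int)
    closure = np.linalg.matrix_power(A + np.eye(len(K), dtype=int), len(K))
    irreducible = bool((closure > 0).all())
    balanced = np.allclose(pi @ K, pi)                       # condition 4
    # Condition 3 holds here because K has a positive diagonal entry in an
    # irreducible chain; condition 5 is automatic on a finite space.
    return stochastic, irreducible, balanced

# A 2-state Metropolis kernel for pi = (0.25, 0.75): always propose the
# other state, accept with min(1, pi(y)/pi(x)).
pi = np.array([0.25, 0.75])
K = np.array([[0.0, 1.0],
              [1.0 / 3.0, 2.0 / 3.0]])
print(check_kernel(K, pi))   # (True, True, True)
```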


What is Data-Driven Markov Chain Monte Carlo?

W ~ p(W | I). The complexity of sampling the posterior lies in the Metropolis-Hastings jumps.

Consider a reversible jump between WA and WB:

  α(WA→WB) = min(1, [G(WB→WA) p(WB|I)] / [G(WA→WB) p(WA|I)])

or, with proposal probabilities q,

  α(WA→WB) = min(1, [q(WB→WA) p(WB|I)] / [q(WA→WB) p(WA|I)])

In DDMCMC, the proposals are conditioned on the image:

  α(WA→WB) = min(1, [q(WB→WA | I) p(WB|I)] / [q(WA→WB | I) p(WA|I)])

Without looking at the data, the pre-designed proposal probabilities are often uniform distributions; thus the search is blind (exhaustive)!

If q(WA→WB | I) ≈ p(WB|I) and q(WB→WA | I) ≈ p(WA|I), then the MC is well-informed; it may converge (hit W*) in a small number of steps!
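A toy numeric illustration of this point (the three-state posterior and both proposals are made up): with a blind uniform proposal, the acceptance ratio swings with p, while an idealized data-driven proposal q(·|I) ≈ p(·|I) accepts every move.

```python
import numpy as np

# Toy posterior over three candidate interpretations W (made-up numbers).
p = np.array([0.7, 0.2, 0.1])

def alpha(a, b, q):
    """Metropolis-Hastings acceptance for the jump a -> b under an
    independence proposal q (q[j] = probability of proposing state j)."""
    return min(1.0, (q[a] * p[b]) / (q[b] * p[a]))

blind = np.ones(3) / 3      # uniform, data-independent proposal
informed = p.copy()         # idealized data-driven proposal q(.|I) = p(.|I)

for a in range(3):
    for b in range(3):
        if a != b:
            print(a, "->", b,
                  "blind:", round(alpha(a, b, blind), 3),
                  "informed:", round(alpha(a, b, informed), 3))
# The informed proposal yields acceptance 1 for every jump.
```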

The Markov chain consists of many processes

Suppose a Markov chain consists of many sub-chains, and the transition probability is a linear sum:

  K(x, y) = Σi p(i) Ki(x, y)

If each sub-kernel has the same invariant probability, π Ki = π for all i, then the whole Markov chain follows π K = π. A numeric check is sketched below.
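A quick check of this composition property (a sketch; the kernels are toy Metropolis kernels of our own construction, and K2 is deliberately not irreducible on its own):

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])

def metropolis_kernel(proposal):
    """Build a 3-state Metropolis kernel for target pi from a symmetric
    proposal matrix (a sketch)."""
    n = len(pi)
    K = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x != y:
                K[x, y] = proposal[x, y] * min(1.0, pi[y] / pi[x])
        K[x, x] = 1.0 - K[x].sum()   # rejection mass stays at x
    return K

# Two sub-kernels with different symmetric proposals; K2 leaves state 2
# isolated, so it satisfies detailed balance but is not irreducible alone.
P1 = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
P2 = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
K1, K2 = metropolis_kernel(P1), metropolis_kernel(P2)
K = 0.6 * K1 + 0.4 * K2              # scheduling probabilities p(1), p(2)

for M in (K1, K2, K):
    print(np.allclose(pi @ M, pi))   # True for each sub-kernel and the mix
```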


The connectivity at each state x

We denote the set of states connected to x by the i-th type of moves by Ωi(x); x is connected to the set Ω(x) = ∪i Ωi(x). E.g., Ki can be a proposal probability within the set Ωi(x), proportional to π(·).

MCMC kernels consist of many components

The transition kernel is a mixture of many sub-kernels corresponding to the various operators, e.g. K1r, K1l, K2. Each observes the detailed balance equation, but may not be irreducible on its own; some are symmetric, some asymmetric.

(Figure: sub-kernels K1r, K1l, K2 acting around a state WA.)


Metropolized Gibbs sampler

Consider a pair of reversible jumps Jm between x and y: a right move x → y ∈ Ωir(x) and a left move y → x ∈ Ωil(y).

Proposal according to the conditional probabilities, like a Gibbs sampler:

  Qir(x, y) = π(y) / Σ_{y'∈Ωir(x)} π(y'),  y ∈ Ωir(x)
  Qil(y, x) = π(x) / Σ_{x'∈Ωil(y)} π(x'),  x ∈ Ωil(y)

The proposal matrix Q is asymmetric, and its diagonal entries are zero (row x reads 0, 0, ..., 0 at x itself).

Key issues

1. How do we decide the sampling dimensions, directions, group transforms, and sets Ωi(x) in a systematic and principled way?
2. How do we schedule the visiting order governed by p(i), i.e. choose the moving directions, groups, and sets?


MCMC moves in image segmentation

K1l: splitting a region into two.
K1r: merging two regions into one.
K2: switching the model type for a region.
K3: diffusion of region boundaries (region competition).

(Figure: split, merge, switch model, diffusion.)

Data-driven methods in the object spaces (death-birth)

  Kmr(x, y) = Qmr(x, y) · min(1, Σ_{y'∈Ωmr(x)} π(y') / Σ_{x'∈Ωml(y)} π(x')),  y ∈ Ωmr(x)

We conjecture that the Metropolized Gibbs sampler is the best design strategy on average, mixing very fast under the constraints of the scopes. But at each step it needs to evaluate the expensive posterior probability over a rather large scope Ωmr(x). We replace the conditional probability by bottom-up (discriminative) methods which are estimated locally at lower cost. We show that such approximations indeed reduce the mixing time.


Metropolized Gibbs sampler

  Kil(x, y) = Qil(x, y) · min(1, [Qir(y, x) π(y)] / [Qil(x, y) π(x)])

Proposal according to the conditional probabilities (the signature of a Gibbs sampler), normalized within the set of connected states:

  Qil(x, y) = π(y) / Σ_{y'∈Ωil(x)} π(y'),  y ∈ Ωil(x)
  Qir(y, x) = π(x) / Σ_{x'∈Ωir(y)} π(x'),  x ∈ Ωir(y)

The proposal matrix Q is asymmetric, with zero diagonal.

(Figure: the sets Ω1L(x), Ω1r(x), Ω2(x) around the state x.)

Metropolized Gibbs sampler

  α(x, y) = min(1, [Qir(y, x) π(y)] / [Qil(x, y) π(x)])
          = min(1, Σ_{y'∈Ωil(x)} π(y') / Σ_{x'∈Ωir(y)} π(x'))

In case the sets are symmetric, Ωir(y) = Ωil(x), the move is always accepted: α(x, y) = 1. The Gibbs sampler becomes a special case.


Metropolized Gibbs sampler

Mixing Metropolis and Gibbs designs: one can improve the traditional Gibbs sampler by prohibiting the MC from staying at its current state in the conditional probability. The proposal then becomes asymmetric and needs a Metropolis acceptance step to "re-balance":

  Q(x, y) = π(y) / (1 − π(x)),  y ≠ x;  Q(x, x) = 0

  α(x, y) = min(1, (1 − π(x)) / (1 − π(y)))

The diagonal elements in the proposal matrix are set to zero. This is a desirable property of MC design, in order to make the MC "mix fast".
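A minimal sketch of this Metropolized Gibbs update on a finite state space, using the proposal and acceptance formulas above (the function name and the toy target are ours):

```python
import random

def metropolized_gibbs_step(x, pi):
    """One Metropolized Gibbs update (a sketch). pi: list of target
    probabilities. Propose y != x with probability pi[y] / (1 - pi[x]),
    then accept with min(1, (1 - pi[x]) / (1 - pi[y]))."""
    # Sample y from the conditional that excludes the current state.
    r = random.random() * (1.0 - pi[x])
    acc, y = 0.0, -1
    for j, pj in enumerate(pi):
        if j == x:
            continue
        acc += pj
        if r <= acc:
            y = j
            break
    if y < 0:   # numerical guard against float rounding
        y = len(pi) - 1 if x != len(pi) - 1 else len(pi) - 2
    # Metropolis re-balancing step.
    if random.random() < min(1.0, (1.0 - pi[x]) / (1.0 - pi[y])):
        return y
    return x

# Usage: the empirical distribution of the chain approaches pi.
pi = [0.5, 0.3, 0.2]
x, counts = 0, [0, 0, 0]
for _ in range(100000):
    x = metropolized_gibbs_step(x, pi)
    counts[x] += 1
print([c / sum(counts) for c in counts])   # roughly [0.5, 0.3, 0.2]
```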

Split and Merge

(Figure: a segmented region and the question of whether to split it.)


Computing the marginal posterior probabilities: clustering in color space

Mean-shift clustering (Cheng 1995, Meer et al. 2001):

  q(θ | I) = Σ_{i=1}^{K} ωi g(θ − θi)

(Figure: input image and saliency maps 1-6; the brightness represents how likely a pixel belongs to a cluster.)
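A 1-D mean-shift sketch under simplifying assumptions (scalar features, Gaussian kernel, hand-picked bandwidth); the real system clusters in (L, u, v) color space, and all names here are ours.

```python
import numpy as np

def mean_shift_1d(points, bandwidth, iters=50, tol=1e-5):
    """1-D mean-shift with a Gaussian kernel (a sketch): each feature
    value climbs to a mode of the kernel density estimate; the merged
    modes play the role of the cluster centers theta_i."""
    x = np.asarray(points, dtype=float)
    modes = x.copy()
    for _ in range(iters):
        # Kernel weights of all points relative to each current position.
        w = np.exp(-0.5 * ((modes[:, None] - x[None, :]) / bandwidth) ** 2)
        shifted = (w @ x) / w.sum(axis=1)
        done = np.max(np.abs(shifted - modes)) < tol
        modes = shifted
        if done:
            break
    # Merge modes closer than the bandwidth into cluster centers.
    centers = []
    for m in np.sort(modes):
        if not centers or m - centers[-1] > bandwidth:
            centers.append(m)
    return centers

# Toy intensities drawn near two cluster centers.
rng = np.random.default_rng(0)
pts = np.concatenate([rng.normal(50, 2, 100), rng.normal(120, 3, 80)])
print(mean_shift_1d(pts, bandwidth=5.0))   # roughly [50, 120]
```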

Computing marginal posterior probabilities in the partition space π

Edge detection and tracing at three scales of detail.


Partition maps: proposals by edge detection at different scales (before SW-cut was invented)

(Figure: partition maps at scale 1, scale 2, and scale 3.)

Super-pixels and connected components

(Figure: three samples of connected components over super-pixels at temperatures T = 1, 2, 4, 8, obtained by Swendsen-Wang Cut.)


Clustering in the Partition Space

An adjacency graph: each vertex is a basic element (pixels, small regions, edges, ...); each link e = <a, b> is associated with a probability/ratio for similarity:

  q(e = "on" | F(I(a)), F(I(b))),  q(e = "off" | F(I(a)), F(I(b))) = 1 − q(e = "on" | F(I(a)), F(I(b)))

Sampling the edges independently, we get connected components; these connected sub-graphs are the clusters in the partition space:

  sampling c ~ q(C | F(I)) in the partition space π


Graph Partitioning: Generalizing SW

(Figure: states GA and GB; the red edges are the bridges.)

Theorem. Accepting the label-change proposal with probability

  α(A→B) = min(1, [∏_{e∈C(Vc, Vl'−Vc)} (1 − qe) / ∏_{e∈C(Vc, Vl−Vc)} (1 − qe)] · [p(B|I) / p(A|I)])

results in an ergodic and reversible Markov chain.

Diffusion Processes on the boundary

The Markov chains realize reversible jumps between sub-spaces of varying dimensions. Within a subspace of fixed dimension, there are various diffusion processes expressed as partial differential equations.

For example, in region competition for curve evolution (Zhu, Lee, and Yuille '95), let v(s) = (x(s), y(s)) be a point on the boundary between two regions Ra and Rb; its motion is governed by the region-competition equation

  dv(s)/dt = [ −μ κ(s) + log p(I(x, y) | θa) − log p(I(x, y) | θb) ] n(s),

where κ(s) is the curvature and n(s) the normal of the boundary at v(s).


Stochastic Diffusion and PDE

The continuous Langevin equation simulates a Markov chain with stationary density p(x) ∝ exp(−E(x)/T):

  dx(t) = −∇E(x(t)) dt + √(2T) dw(t),  w(t) a Brownian motion.

For example, the movement of a changing point on a region boundary (R1, R2, R3) is driven by this stochastic version of the deterministic flow above.
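A discretized (Euler-Maruyama) sketch of Langevin dynamics under these definitions; for a quadratic energy the samples should match the Gaussian stationary density. The function names and the toy energy are ours.

```python
import numpy as np

def langevin_samples(grad_E, x0, step=0.01, T=1.0, n=100000):
    """Euler-Maruyama discretization of the Langevin equation (a sketch):
    x <- x - grad_E(x)*dt + sqrt(2*T*dt)*N(0, 1). For small step sizes,
    the stationary density approaches exp(-E(x)/T)."""
    rng = np.random.default_rng(0)
    x, out = float(x0), np.empty(n)
    noise_scale = np.sqrt(2.0 * T * step)
    for i in range(n):
        x += -grad_E(x) * step + noise_scale * rng.standard_normal()
        out[i] = x
    return out

# Example: E(x) = x^2 / 2, so exp(-E/T) is a Gaussian with variance T.
samples = langevin_samples(lambda x: x, x0=0.0, T=1.0)
print(samples.mean(), samples.var())   # roughly 0 and 1
```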

Running DDMCMC

Starting with 3 different initial segmentations (MC 1, MC 2, MC 3).

(Figure: energy plots of the three MCMC runs; input image, W1 with synthesis I1 ~ p(I|W1), and W2 with synthesis I2 ~ p(I|W2).)


Proposals for region models by clustering

(Figure: saliency maps for color values (L, u, v) and for texture; the brightness represents how likely a pixel belongs to a cluster.)

A Demo

(Figure: segmentation and synthesis, a snapshot of the solution sampled by DDMCMC.)


Experiments: Color Image Segmentation

(Figures: input, segmented regions, and synthesis I ~ p(I | W*).)


Experiments: Color Image Segmentation

(Figures: a. input image; b. segmented regions; c. synthesis I ~ p(I | W*).)


Image segmentation results on the public dataset

(Figures: input, segmented regions, and synthesis I ~ p(I | W*).)

Performance on the Berkeley Benchmark Study (David Martin et al., 2001)

(Figure: test images with DDMCMC results and manual segmentations; "error" measures 0.1083, 0.3082, 0.5627 for the examples shown.)


Examples of Failure

(Figures: a. input image; b. segmented regions; c. synthesis I ~ p(I | W*).)

Speed Comparison

(Figure: convergence of uninformed MCMC, MCMC with clustering, MCMC with partition, and MCMC with both, against the ground truth.)


Running Time Comparison Against Gibbs Sampler

(Figure: energy vs. time, full view and zoomed-in view; time = #sweeps.)
Red curve: Gibbs sampler for graph partition and labeling. Blue curve: improved SW algorithm for graph partition.

Generic Image Parsing

(Figure: the parse hierarchy: scene, objects, patterns, parts, textons.)

Example: parsing (Tu et al., 2000-2004).


From segmentation to parsing

Face images from the FERET dataset; text images of San Francisco street scenes.

Adaboost in the Label Space

An example from Viola and Jones, 2001. (Figure: (a) the first two face features; (b) an example of face detection.)

Adaboost is a learning algorithm which makes decisions by combining a number of simple features. As T (the number of weak classifiers) and the training sample size become large enough, it weakly converges to the log ratio of the posterior probability.
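A tiny AdaBoost sketch with threshold stumps on 1-D data illustrating the claim (our construction; for two unit-variance Gaussians at ±1 with equal priors, the true half log-odds at x is exactly x):

```python
import numpy as np

def adaboost_stumps(x, y, rounds=50):
    """A tiny AdaBoost with threshold stumps on 1-D data (a sketch).
    Returns H with H(q) = sum_t a_t h_t(q); by the statistical view of
    boosting, H approaches (1/2) log [P(y=+1|x) / P(y=-1|x)]."""
    n = len(x)
    w = np.ones(n) / n
    thresholds = np.quantile(x, np.linspace(0.02, 0.98, 25))
    stumps = []
    for _ in range(rounds):
        best = None
        for thr in thresholds:
            for sign in (1, -1):
                pred = sign * np.where(x > thr, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, thr, sign, pred)
        err, thr, sign, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        a = 0.5 * np.log((1 - err) / err)   # weak classifier weight
        w *= np.exp(-a * y * pred)          # re-weight the training data
        w /= w.sum()
        stumps.append((a, thr, sign))
    return lambda q: sum(a * s * (1 if q > t else -1) for a, t, s in stumps)

# Two unit-variance Gaussian classes at -1 and +1 with equal priors.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-1, 1, 500), rng.normal(1, 1, 500)])
y = np.concatenate([-np.ones(500), np.ones(500)])
H = adaboost_stumps(x, y)
print(H(0.5))   # roughly 0.5, the true half log-odds at x = 0.5
```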


Image Parsing Results (Tu, Chen, Yuille, and Zhu, ICCV 2003)

(Figures: input, regions, objects, synthesis.)


An example

Diagram for integrating top-down generative and bottom-up discriminative methods: from the input image, discriminative inference (face detection, text detection, edge detection, model clustering) produces weighted particles; generative inference applies Markov kernels (birth/death of faces and text regions, region split/merge, model switching) to refine the interpretation.


Summary: generative vs. discriminative

Bayesian (top-down) vs. data-driven (bottom-up): the open question is how to integrate generative and discriminative inference.


Review: MCMC developments related to vision

Metropolis 1946
Hastings 1970
Waltz 1972 (labeling)
Rosenfeld, Hummel, Zucker 1976 (relaxation)
Heat bath; Kirkpatrick 1983
Geman brothers 1984 (Gibbs sampler)
Swendsen-Wang 1987 (clustering)
Miller, Grenander 1994
Green 1995
DDMCMC 2001-2005
Swendsen-Wang Cut 2003
C4 2009