# MRFs UC Berkeley

Post on 08-Apr-2018


8/7/2019 MRFs UC Berkeley


Prof. Trevor Darrell

trevor@eecs.berkeley.edu

Lecture 20: Markov Random Fields


Slides from Yuri Boykov and others, as noted.


Suppose we have a system of five variables; some are observed and some are not. There's some probabilistic relationship between the 5 variables, described by their joint probability P(x1, x2, x3, x4, x5).

If we want to find out what the likely state of variable x1 is (say, the position of the hand of some person we are observing), what can we do?

Two reasonable choices are: (a) find the value of x1 (and of all the other variables) that maximizes the joint probability; this gives the MAP solution.

Or (b) marginalize over all the other variables and then take the mean or the maximum of the marginal over x1. Marginalizing then taking the mean is equivalent to finding the MMSE solution. Marginalizing, then taking the max, is called the max marginal solution and is sometimes a useful thing to do.


To marginalize, we must sum the joint probability over all the other variables:

P(x1) = Σ_{x2, x3, x4, x5} P(x1, x2, x3, x4, x5)

If the system really is high dimensional, that will quickly become intractable. But if there is some modularity in P(x1, x2, x3, x4, x5) then things become tractable again.

Suppose the variables form a Markov chain: x1 causes x2, which causes x3, etc.

x1 → x2 → x3 → x4 → x5
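The tractability gain from the chain structure can be sketched numerically. This is a minimal sketch with made-up probabilities for 5 binary variables; it only illustrates pushing the sums inward, one variable at a time, instead of summing over all joint configurations:

```python
import itertools

# Made-up chain: P(x1..x5) = P(x1) P(x2|x1) P(x3|x2) P(x4|x3) P(x5|x4),
# all variables binary, with one shared transition table for simplicity.
p1 = {0: 0.6, 1: 0.4}                     # P(x1)
cond = {(0, 0): 0.7, (0, 1): 0.3,         # P(x_{i+1} = b | x_i = a)
        (1, 0): 0.2, (1, 1): 0.8}

def joint(xs):
    p = p1[xs[0]]
    for a, b in zip(xs, xs[1:]):
        p *= cond[(a, b)]
    return p

# Brute-force marginal P(x5): sum the joint over all 2^4 settings of x1..x4.
# Cost grows exponentially with the number of summed-out variables.
brute = {x5: sum(joint(rest + (x5,))
                 for rest in itertools.product([0, 1], repeat=4))
         for x5 in [0, 1]}

# Chain structure: push each sum inward so every step sums one variable only.
msg = dict(p1)
for _ in range(4):
    msg = {b: sum(msg[a] * cond[(a, b)] for a in [0, 1]) for b in [0, 1]}

print(brute, msg)  # identical marginals, linear vs. exponential cost
```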


Directed graphical models

A directed, acyclic graph.

Nodes are random variables. Can be scalars or vectors, continuous or discrete.

The direction of the edge tells the parent-child relation:

parent → child

Each node i has a conditional probability given its parent nodes, P(xi | x_pa(i)). The joint probability is the product of those conditional probabilities:

P(x1, ..., xn) = Π_{i=1}^{n} P(xi | x_pa(i))
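The product-of-conditionals rule can be sketched on a tiny made-up DAG (a → b, a → c, all binary); the node names and table entries below are invented for illustration:

```python
# Hypothetical 3-node DAG: a -> b, a -> c, with made-up conditional tables.
parents = {"a": [], "b": ["a"], "c": ["a"]}
cpt = {
    "a": {(): {0: 0.3, 1: 0.7}},                             # P(a)
    "b": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.4, 1: 0.6}},   # P(b|a)
    "c": {(0,): {0: 0.5, 1: 0.5}, (1,): {0: 0.2, 1: 0.8}},   # P(c|a)
}

def joint(assign):
    """Product of each node's conditional given its parents' values."""
    p = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assign[q] for q in pa)
        p *= cpt[node][pa_vals][assign[node]]
    return p

# Because each conditional table is normalized, the joint sums to 1
# over all assignments -- no extra normalizing constant is needed.
total = sum(joint({"a": a, "b": b, "c": c})
            for a in [0, 1] for b in [0, 1] for c in [0, 1])
print(total)
```

Note the contrast with the undirected case below, where the factors are unnormalized compatibilities and a normalizing constant must be computed.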


Another modular probabilistic structure, more common in vision problems, is an undirected graph:

x1 — x2 — x3 — x4 — x5

The joint probability for this graph is given by:

P(x1, x2, x3, x4, x5) = Φ(x1, x2) Φ(x2, x3) Φ(x3, x4) Φ(x4, x5)

where Φ(x1, x2) is called a compatibility function. We can define compatibility functions that give the same joint probability as for the directed graph described in the previous slides; for that example, we could use either form.
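A minimal sketch of this pairwise factorization, with made-up compatibility values: unlike conditional probability tables, the Φ factors need not sum to one, so the product must be divided by a normalizing constant Z to become a probability.

```python
import itertools

# Made-up pairwise compatibility Phi(a, b), shared by all four chain edges.
phi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0}

def unnorm(xs):
    """Unnormalized product of compatibilities along the 5-node chain."""
    p = 1.0
    for a, b in zip(xs, xs[1:]):
        p *= phi[(a, b)]
    return p

# Z sums the unnormalized product over every joint configuration.
Z = sum(unnorm(xs) for xs in itertools.product([0, 1], repeat=5))

def P(xs):
    return unnorm(xs) / Z

total = sum(P(xs) for xs in itertools.product([0, 1], repeat=5))
print(total)  # sums to 1 only after dividing by Z
```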


Undirected graphical models

A set of nodes joined by undirected edges.

The graph makes conditional independencies explicit: If two nodes are

not linked, and we condition on every other node in the graph, then

those two nodes are conditionally independent.

Conditionally independent, because they are not connected by a line in the undirected graphical model.


Undirected graphical models: cliques

Clique: a fully connected set of nodes

(Figure: a clique and a set of nodes that is not a clique.)

A maximal clique is a clique that can't include more nodes of the graph without losing the clique property.

(Figure: a maximal clique and a non-maximal clique.)
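The two definitions can be sketched directly as set tests; the 4-node graph below (a triangle 1-2-3 plus edge 3-4) is made up for illustration:

```python
from itertools import combinations

# Hypothetical graph: triangle 1-2-3 plus the edge 3-4.
edges = {frozenset(e) for e in [(1, 2), (2, 3), (1, 3), (3, 4)]}
nodes = {1, 2, 3, 4}

def is_clique(s):
    """A clique: every pair of nodes in s is joined by an edge."""
    return all(frozenset(p) in edges for p in combinations(s, 2))

def is_maximal(s):
    """A maximal clique: no further node can be added without breaking it."""
    return is_clique(s) and not any(is_clique(s | {v}) for v in nodes - s)

print(is_clique({1, 2, 3}), is_maximal({1, 2, 3}))  # True True
print(is_clique({2, 3}), is_maximal({2, 3}))        # True False: extends to {1,2,3}
print(is_clique({1, 4}))                            # False: no edge 1-4
```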


Undirected graphical models: Hammersley-Clifford theorem

The Hammersley-Clifford theorem addresses the pdf factorization implied by a graph: a distribution has the Markov structure implied by an undirected graph iff it can be represented in the factored form

P(x) = (1/Z) Π_{c ∈ C} Φ_c(x_c)

where Z is a normalizing constant, C is the set of maximal cliques, the Φ_c are potential functions, and x_c are the states of the variables in maximal clique c.

So for this graph, the factorization is ambiguous (something called factor graphs makes it explicit).


Markov Random Fields

Allows rich probabilistic models for images.

But built in a local, modular way. Learn local relationships, get global effects out.


MRF nodes as pixels

Winkler, 1995, p. 32


MRF nodes as image patches

(Figure: image patches; Φ(xi, yi) relates a scene node to the image, and Ψ(xi, xj) relates neighboring scene nodes.)


Network joint probability

P(x, y) = (1/Z) Π_{i,j} Ψ(xi, xj) Π_i Φ(xi, yi)

where x is the scene and y is the image; Ψ(xi, xj) is the scene-scene compatibility function between neighboring scene nodes, and Φ(xi, yi) is the image-scene compatibility function between local observations and scene nodes.
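This joint can be sketched on a tiny made-up example: a 2×2 grid of binary scene nodes with a fixed observed image, smoothness potentials Ψ that favor equal neighbors, and data potentials Φ that favor agreeing with the observation. All numbers below are invented for illustration:

```python
import itertools

# Made-up potentials: Psi rewards equal neighboring scene values,
# Phi rewards a scene value matching its observation.
psi = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}   # scene-scene
phi = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}   # image-scene
pairs = [(0, 1), (2, 3), (0, 2), (1, 3)]   # 4-neighbor edges on a 2x2 grid
y = (0, 0, 1, 1)                           # fixed observed image

def score(x):
    """Unnormalized P(x, y): product of all pairwise and unary factors."""
    s = 1.0
    for i, j in pairs:
        s *= psi[(x[i], x[j])]
    for xi, yi in zip(x, y):
        s *= phi[(xi, yi)]
    return s

Z = sum(score(x) for x in itertools.product([0, 1], repeat=4))
best = max(itertools.product([0, 1], repeat=4), key=score)
print(best, score(best) / Z)  # a MAP scene under these toy potentials
```

With these particular numbers the smoothness term is strong enough that an all-equal scene beats the scene that matches the observation exactly, which is the usual smoothing-versus-data tension in MRF models.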


In order to use MRFs:

Given observations y, and the parameters of the MRF, how do we infer the hidden variables x?

How do we learn the parameters of the MRF?


Outline of MRF section

Inference in MRFs:

- Gibbs sampling, simulated annealing
- Iterated conditional modes (ICM)
- Variational methods
- Belief propagation
- Graph cuts

Vision applications of inference in MRFs.

Learning MRF parameters:

- Iterative proportional fitting (IPF)




Derivation of belief propagation

(Figure: a chain of hidden nodes x1, x2, x3, each with an observed node y1, y2, y3; the compatibilities Φ(x1, y1), Φ(x2, y2), Φ(x3, y3) link observations to hidden nodes, and Ψ(x1, x2), Ψ(x2, x3) link neighboring hidden nodes.)

x1_MMSE = mean_{x1} Σ_{x2} Σ_{x3} P(x1, x2, x3 | y1, y2, y3)


The posterior factorizes

x1_MMSE = mean_{x1} Σ_{x2} Σ_{x3} P(x1, x2, x3 | y1, y2, y3)

= mean_{x1} Σ_{x2} Σ_{x3} Φ(x1, y1) Φ(x2, y2) Φ(x3, y3) Ψ(x1, x2) Ψ(x2, x3)

= mean_{x1} Φ(x1, y1) Σ_{x2} Φ(x2, y2) Ψ(x1, x2) Σ_{x3} Φ(x3, y3) Ψ(x2, x3)


Propagation rules

x1_MMSE = mean_{x1} Σ_{x2} Σ_{x3} P(x1, x2, x3 | y1, y2, y3)

= mean_{x1} Φ(x1, y1) Σ_{x2} Φ(x2, y2) Ψ(x1, x2) Σ_{x3} Φ(x3, y3) Ψ(x2, x3)


Propagation rules

x1_MMSE = mean_{x1} Φ(x1, y1) Σ_{x2} Φ(x2, y2) Ψ(x1, x2) Σ_{x3} Φ(x3, y3) Ψ(x2, x3)

The innermost sum, M_2^3(x2) = Σ_{x3} Φ(x3, y3) Ψ(x2, x3), is the message from node 3 to node 2; folding it in gives the message from node 2 to node 1:

M_1^2(x1) = Σ_{x2} Ψ(x1, x2) Φ(x2, y2) M_2^3(x2)




Belief propagation: the nosey neighbor rule

"Given everything that I know, here's what I think you should think."

(Given the probabilities of my being in different states, and how my states relate to your states, here's what I think the probabilities of your states should be.)


Belief propagation messages

A message: can be thought of as a set of weights on each of your possible states.

To send a message: Multiply together all the incoming messages, except from the node you're sending to, multiply by the compatibility function, and sum over the sender's states:

M_i^j(x_i) = Σ_{x_j} Ψ(x_i, x_j) Π_{k ∈ N(j)\i} M_j^k(x_j)


To find a node's beliefs: Multiply together all the messages coming in to that node:

b_j(x_j) = Π_{k ∈ N(j)} M_j^k(x_j)


b_j(x_j) = Π_{k ∈ N(j)} M_j^k(x_j)

M_i^j(x_i) = Σ_{x_j} Ψ(x_i, x_j) Π_{k ∈ N(j)\i} M_j^k(x_j)

(Figure: pictorial form of the belief and message-update equations.)
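The message and belief equations can be sketched end to end on the 3-node chain from the derivation. The potentials below are made-up; on a chain, the belief computed this way should equal the exact posterior marginal:

```python
# Made-up potentials for the chain x1 - x2 - x3 with fixed observations y:
# Psi(xi, xj) favors equal neighbors; phi[i] plays the role of Phi(xi, yi).
states = [0, 1]
psi = {(0, 0): 1.0, (0, 1): 0.3, (1, 0): 0.3, (1, 1): 1.0}
phi = [{0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.8}]

def send(src_phi, incoming):
    """Message update: sum over the sender's states of Psi * evidence * incoming."""
    return {a: sum(psi[(a, b)] * src_phi[b] * incoming[b] for b in states)
            for a in states}

ones = {0: 1.0, 1: 1.0}        # leaf nodes start with a uniform message
M2 = send(phi[2], ones)        # message from node 3 to node 2
M1 = send(phi[1], M2)          # message from node 2 to node 1

# Belief at node 1: local evidence times all incoming messages, normalized.
b1 = {a: phi[0][a] * M1[a] for a in states}
z = sum(b1.values())
b1 = {a: v / z for a, v in b1.items()}

# Brute-force check against the exact posterior marginal of x1.
def joint(x):
    return (phi[0][x[0]] * phi[1][x[1]] * phi[2][x[2]]
            * psi[(x[0], x[1])] * psi[(x[1], x[2])])

marg = {a: sum(joint((a, b, c)) for b in states for c in states) for a in states}
z = sum(marg.values())
marg = {a: v / z for a, v in marg.items()}

print(b1, marg)  # on a chain, the BP belief equals the exact marginal
```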


Optimal solution in a chain or tree: Belief Propagation

"Do the right thing" Bayesian algorithm.

For Gaussian random variables over time: Kalman filter.

For hidden Markov models: forward/backward algorithm (and its MAP variant is Viterbi).


No factorization with loops!

x1_MMSE = mean_{x1} Φ(x1, y1) Σ_{x2} Φ(x2, y2) Ψ(x1, x2) Σ_{x3} Φ(x3, y3) Ψ(x2, x3) Ψ(x1, x3)

(Figure: hidden nodes x1, x2, x3 connected in a loop, with observations y2, y3 shown.)

The extra factor Ψ(x1, x3) couples the sum over x3 to x1, so the sums no longer factor into independent messages.


Justification for running belief propagation

Experimental results:

- Error-correcting codes: Kschischang and Frey, 1998; McEliece et al., 1998
- Vision applications: Freeman and Pasztor, 1999; Frey, 2000

Theoretical results:

- For Gaussian processes, means are correct: Weiss and Freeman, 1999
- Large neighborhood local maximum for MAP: Weiss and Freeman, 2000
- Equivalent to Bethe approx. in statistical physics: Yedidia, Freeman, and Weiss, 2000
- Tree-weighted reparameterization: Wainwright, Willsky, Jaakkola, 2001


Results from Bethe free energy analysis

Fixed point of belief propagation equations iff Bethe approximation stationary point.

Belief propagation always has a fixed point.

Connection with variational methods for inference: both minimize approximations to the Free Energy:

- variational: usually use primal variables.
- belief propagation: fixed-point equations for dual variables.

Kikuchi approximations lead to improved belief propagation algorithms.

Yu