
CS 3750 Advanced Machine Learning

CS 3750 Machine Learning

Lecture 4

Milos Hauskrecht

[email protected]

5329 Sennott Square

Graphical models: inference


Inferences

Last lecture:

• Exact inferences on chains and trees

• Factor graph representation and inference

[Figure: a chain A–B–C–D–E and a tree over variables A, B, C, D]


Inferences on factor graphs

Slides by C. Bishop


Factor graph

A graphical representation that lets us express a factorization of a function over a set of variables.

A factor graph is a bipartite graph where:

• One layer is formed by variables

• Another layer is formed by factors, i.e., functions on subsets of variables

Example: a function over variables x_1, x_2, …, x_5

g(x_1, x_2, …, x_5) = f_A(x_1) f_B(x_2) f_C(x_1, x_2, x_3) f_D(x_3, x_4) f_E(x_3, x_5)

[Figure: the factor graph with variable nodes x_1, …, x_5 and factor nodes f_A, f_B, f_C, f_D, f_E]
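As a concrete illustration, the sketch below represents this example factor graph as a table of factors over binary variables and evaluates g as the product of the factor values; the tables are random placeholders, not values from the lecture:

```python
import numpy as np

# A minimal sketch: each factor is a (scope, table) pair over binary
# variables x1..x5. The factor tables are made-up numbers for illustration.
rng = np.random.default_rng(0)
factors = {
    "fA": (("x1",), rng.random(2)),
    "fB": (("x2",), rng.random(2)),
    "fC": (("x1", "x2", "x3"), rng.random((2, 2, 2))),
    "fD": (("x3", "x4"), rng.random((2, 2))),
    "fE": (("x3", "x5"), rng.random((2, 2))),
}

def g(assignment):
    """Evaluate g(x1..x5) as the product of all factor values."""
    value = 1.0
    for scope, table in factors.values():
        idx = tuple(assignment[v] for v in scope)
        value *= table[idx]
    return value

print(g({"x1": 0, "x2": 1, "x3": 0, "x4": 1, "x5": 1}))
```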


Inferences

Slides by C. Bishop


Inferences on factor graphs

• Efficient inference algorithms for factor graphs built for trees [Frey, 1998; Kschischang et al., 2001]:

• Sum-product algorithm

• Max-product algorithm


Inferences

Last lecture:

• Exact inferences on chains and trees

• Factor graph representation and inference on factor graphs

Open question:

• Exact inferences on arbitrary MRFs and BBNs

Clique tree algorithm:

• Clique tree = junction tree = tree decomposition of the graph


Tree decomposition of the graph

• A tree decomposition of a graph G:

– A tree T with a set of vertices of G associated with every node of T.

– For every edge {v,w} of G: some node of T has a set containing both v and w.

– For every vertex v of G: the nodes of T whose sets contain v form a connected subtree.

[Figure: a graph on vertices A–H and a tree decomposition of it, drawn as a tree of vertex sets]
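The three properties above are easy to check mechanically. Below is a small sketch (my own illustration, not from the lecture) that verifies them for a candidate decomposition, using the fact that a sub-forest of a tree on k nodes is connected exactly when it has k - 1 edges:

```python
# `bags` maps each node of the tree T to its set of graph vertices;
# `tree_edges` are the edges of T.

def is_tree_decomposition(graph_vertices, graph_edges, bags, tree_edges):
    # 1. every graph vertex appears in some bag
    if not set(graph_vertices) <= set().union(*bags.values()):
        return False
    # 2. every graph edge {v, w} is contained in some bag
    for v, w in graph_edges:
        if not any({v, w} <= bag for bag in bags.values()):
            return False
    # 3. for every vertex v, the tree nodes whose bags contain v must form
    #    a connected subtree: k nodes connected by exactly k - 1 tree edges
    for v in graph_vertices:
        nodes = {n for n, bag in bags.items() if v in bag}
        inside = sum(1 for a, b in tree_edges if a in nodes and b in nodes)
        if inside != len(nodes) - 1:
            return False
    return True
```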


Conversion of the BBN and MRF to a Clique tree

MRF conversion:

Option 1:

• Via triangulation to form a chordal graph

• Cliques in the chordal graph define the clique tree

Option 2:

• From the induced graph built by running the variable elimination procedure

• The cliques are the scopes of the factors generated during the procedure

BBN conversion:

• Convert the BBN to an MRF – a moral graph

• Apply MRF conversion
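A rough sketch of both conversion steps, assuming a dict-of-sets graph representation and a given elimination order (the helper names are mine, not the lecture's):

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a BBN: drop edge directions, marry co-parents.
    Assumes every node appears as a key of `parents`."""
    adj = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:                          # keep each parent-child edge
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):      # connect parents of v
            adj[a].add(b); adj[b].add(a)
    return adj

def triangulate(adj, order):
    """Simulate VE fill-in; the generated factor scopes are the cliques."""
    adj = {v: set(nb) for v, nb in adj.items()}
    cliques, eliminated = [], set()
    for v in order:
        nbrs = adj[v] - eliminated            # remaining neighbours of v
        cliques.append(nbrs | {v})            # scope of the factor created
        for a, b in combinations(nbrs, 2):    # fill-in edges
            adj[a].add(b); adj[b].add(a)
        eliminated.add(v)
    return cliques
```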


Clique tree algorithms

• We have precompiled a clique tree.

• How do we take advantage of the clique tree to perform inferences?


VE on the Clique tree

• Variable elimination on the clique tree

– works on factors; the factor becomes a data structure

– sends and receives messages


Clique trees

• Example clique tree

[Figure: a network over variables C, D, I, G, S, L, J, H, K and its clique tree: [C,D] – [G,I,D] – [G,S,I] – [G,J,S,L], with [H,G,J] and [S,K] also attached to [G,J,S,L]]


Clique tree properties

• Running intersection property

– if Ci and Cj both contain a vertex X, then all cliques on the

unique path between them also contain X

• Sepset

– separation set S_ij = C_i ∩ C_j: variables X on one side of the sepset are separated from the variables Y on the other side of the graph given the variables in S_ij
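For the example clique tree used on the following slides, the sepsets follow directly from the definition; a quick sketch (the tree structure is taken from the slides, the code is my own):

```python
# cliques and tree edges of the example clique tree
cliques = {1: {"C", "D"}, 2: {"G", "I", "D"}, 3: {"G", "S", "I"},
           4: {"G", "J", "S", "L"}, 5: {"H", "G", "J"}, 6: {"S", "K"}}
tree_edges = [(1, 2), (2, 3), (3, 4), (5, 4), (6, 4)]

# sepset on each edge: S_ij = C_i intersect C_j
sepsets = {(i, j): cliques[i] & cliques[j] for i, j in tree_edges}
print(sepsets)  # (1,2)->{'D'}, (2,3)->{'G','I'}, (3,4)->{'G','S'}, ...
```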


Clique trees

• Running intersection: e.g., the cliques containing G form a connected subtree.

[Figure: the clique tree; the cliques containing G, namely [G,I,D], [G,S,I], [G,J,S,L], [H,G,J], form a connected subtree]


Clique trees

• Sepsets: variables X on one side of a sepset are separated from the variables Y on the other side given the variables in S

Sepsets: S_ij = C_i ∩ C_j

[Figure: the clique tree annotated with its sepsets: D between [C,D] and [G,I,D]; G,I between [G,I,D] and [G,S,I]; G,S between [G,S,I] and [G,J,S,L]; G,J between [H,G,J] and [G,J,S,L]; S between [S,K] and [G,J,S,L]]


Clique trees

Initial potentials: assign factors to cliques and multiply them.

π⁰_1(C,D), π⁰_2(G,I,D), π⁰_3(G,S,I), π⁰_4(G,J,S,L), π⁰_5(H,G,J), π⁰_6(S,K)

p(C,D,G,I,S,J,L,K,H) = π⁰_1(C,D) · π⁰_2(G,I,D) · π⁰_3(G,S,I) · π⁰_4(G,J,S,L) · π⁰_5(H,G,J) · π⁰_6(S,K)

[Figure: the clique tree with the initial potential π⁰_i attached to each clique]
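A minimal numpy sketch of building one initial potential, assuming binary variables and made-up factor tables; np.einsum multiplies the assigned factors over named axes (the factor scopes are assumed to lie inside the clique scope):

```python
import numpy as np

def multiply_factors(factor_list, out_scope):
    """factor_list: [(scope_tuple, ndarray)]; returns a table over out_scope."""
    letters = {v: chr(ord("a") + k) for k, v in enumerate(out_scope)}
    spec = ",".join("".join(letters[v] for v in s) for s, _ in factor_list)
    spec += "->" + "".join(letters[v] for v in out_scope)
    return np.einsum(spec, *[t for _, t in factor_list])

# e.g. clique [G, I, D] receives factors over (G,I,D) and (I,D) (invented):
rng = np.random.default_rng(1)
pi0_2 = multiply_factors(
    [(("G", "I", "D"), rng.random((2, 2, 2))),
     (("I", "D"), rng.random((2, 2)))],
    ("G", "I", "D"))
```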


Message Passing VE

• Query for P(J)

– Eliminate C: τ_1(D) = Σ_C π⁰_1(C,D)

Message sent from [C,D] to [G,I,D].

Message received at [G,I,D]; [G,I,D] updates:

π_2(G,I,D) = τ_1(D) · π⁰_2(G,I,D)

[Figure: the clique tree with the message τ_1(D) sent along the edge [C,D] → [G,I,D]]
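A numeric sketch of this first step, with invented binary tables: the message sums out the eliminated variable, and multiplying it into the receiving clique is a broadcast over the shared sepset axis D:

```python
import numpy as np

rng = np.random.default_rng(1)
pi0_1 = rng.random((2, 2))        # pi0_1[C, D]
pi0_2 = rng.random((2, 2, 2))     # pi0_2[G, I, D]

tau_1 = pi0_1.sum(axis=0)         # sum out C -> tau_1[D]
pi_2 = pi0_2 * tau_1              # broadcasts over the last axis, D
```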


Message Passing VE

• Query for P(J)

– Eliminate D: τ_2(G,I) = Σ_D π_2(G,I,D)

Message sent from [G,I,D] to [G,S,I].

Message received at [G,S,I]; [G,S,I] updates:

π_3(G,S,I) = τ_2(G,I) · π⁰_3(G,S,I)

[Figure: the clique tree with the message τ_2(G,I) sent along the edge [G,I,D] → [G,S,I]]


Message Passing VE

• Query for P(J)

– Eliminate I: τ_3(G,S) = Σ_I π_3(G,S,I)

Message sent from [G,S,I] to [G,J,S,L].

Message received at [G,J,S,L]; [G,J,S,L] updates:

π_4(G,J,S,L) = τ_3(G,S) · π⁰_4(G,J,S,L)

But [G,J,S,L] is not ready! It must still wait for the messages from [H,G,J] and [S,K].

[Figure: the clique tree with the message τ_3(G,S) sent along the edge [G,S,I] → [G,J,S,L]]


Message Passing VE

• Query for P(J)

– Eliminate H: τ_4(G,J) = Σ_H π⁰_5(H,G,J)

Message sent from [H,G,J] to [G,J,S,L].

π_4(G,J,S,L) = τ_3(G,S) · τ_4(G,J) · π⁰_4(G,J,S,L)

And the message from [S,K] is still outstanding …

[Figure: the clique tree with the message τ_4(G,J) sent along the edge [H,G,J] → [G,J,S,L]]


Message Passing VE

• Query for P(J)

– Eliminate K: τ_6(S) = Σ_K π⁰_6(S,K)

Message sent from [S,K] to [G,J,S,L].

All messages received at [G,J,S,L]; [G,J,S,L] updates:

π_4(G,J,S,L) = τ_3(G,S) · τ_4(G,J) · τ_6(S) · π⁰_4(G,J,S,L)

[Figure: the clique tree with the message τ_6(S) sent along the edge [S,K] → [G,J,S,L]]

And calculate P(J) from it by summing out G, S, L.
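To finish the query concretely, a small sketch (invented table; the axis order G, J, S, L is an assumption) that sums out G, S, L and normalizes:

```python
import numpy as np

rng = np.random.default_rng(2)
pi4 = rng.random((2, 2, 2, 2))    # pi_4[G, J, S, L], axes in that order

p_J = pi4.sum(axis=(0, 2, 3))     # sum out G, S, L -> unnormalized P(J)
p_J /= p_J.sum()                  # normalize to a distribution over J
```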


Message Passing VE

• The [G,J,S,L] clique potential … is used to finish the inference

[Figure: the clique tree with all five messages (over sepsets D; G,I; G,S; G,J; S) directed toward [G,J,S,L]]


Message passing VE

• Often, many marginals are desired

– Inefficient to re-run each inference from scratch

– One distinct message per edge & direction

• Methods:

– Compute (unnormalized) marginals for any vertex (clique) of the tree

– Results in a calibrated clique tree

• Recap: three kinds of factor objects

– initial potentials π⁰_i and final potentials π_i, each over a clique scope C_i

– messages τ_{i→j} over the sepsets S_ij = C_i ∩ C_j


Two-pass message passing VE

• Choose the root clique, e.g. [S,K]

• Propagate messages to the root

[Figure: upward pass on the clique tree, with all messages directed toward the root [S,K]]


Two-pass message passing VE

• Send messages back from the root

[Figure: downward pass, with messages sent from the root [S,K] back toward the leaves]
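A sketch of the two-pass schedule on the example tree (my own illustration): a DFS from the chosen root yields the upward pass, and reversing those edges gives the downward pass:

```python
from collections import defaultdict

tree_edges = [(1, 2), (2, 3), (3, 4), (5, 4), (6, 4)]  # 6 = [S,K] = root
adj = defaultdict(list)
for a, b in tree_edges:
    adj[a].append(b); adj[b].append(a)

def two_pass_schedule(root):
    """Return the ordered list of directed messages (sender, receiver)."""
    order, visited = [], {root}
    def down(v):                      # DFS; children send before parents
        for u in adj[v]:
            if u not in visited:
                visited.add(u)
                down(u)
                order.append((u, v))  # upward message u -> v
    down(root)
    order += [(b, a) for a, b in reversed(order)]  # downward pass, reversed
    return order

print(two_pass_schedule(6))
```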


Message Passing: BP

• Belief propagation

– A different algorithm, but equivalent to variable elimination in terms of the results

– Asynchronous implementation


Message Passing: BP

• Each node: multiply all the incoming messages and divide by the one coming from the node we are sending the message to

– Clearly the same as VE

– Initialize the messages on the edges to 1

μ_{i→j} = Σ_{C_i ∖ S_ij} π⁰_i · Π_{k ∈ N(i)∖{j}} μ_{k→i} = ( Σ_{C_i ∖ S_ij} π⁰_i · Π_{k ∈ N(i)} μ_{k→i} ) / μ_{j→i}
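A numeric sketch of one such update on the chain used on the next slides, with invented binary tables; all messages start at 1, and the message toward [C,D] multiplies in every incoming message before dividing out the one received from [C,D]:

```python
import numpy as np

# chain of cliques: 1=[A,B] --B-- 2=[B,C] --C-- 3=[C,D]
rng = np.random.default_rng(3)
pi0_2 = rng.random((2, 2))                      # pi0_2[B, C]
mu = {e: np.ones(2) for e in [(1, 2), (2, 1), (2, 3), (3, 2)]}

# mu_{2->3}(C) = sum_B pi0_2(B, C) * mu_{1->2}(B) / mu_{3->2}(C):
belief = pi0_2 * mu[(1, 2)][:, None]            # multiply ALL incoming ...
belief = belief / mu[(3, 2)][None, :]           # ... divide the one from 3
mu[(2, 3)] = belief.sum(axis=0)                 # sum out B
```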


Message Passing: BP

Clique chain with sepsets: [A,B] –B– [B,C] –C– [C,D], with initial potentials π⁰_1(A,B), π⁰_2(B,C), π⁰_3(C,D).

Send a message from [B,C] to [C,D]:

μ_{2→3}(C) = Σ_B π⁰_2(B,C)

[C,D] updates: π_3(C,D) = π⁰_3(C,D) · μ_{2→3}(C) / μ̄_{2,3}(C)

Store the last message on the edge and divide each passing message by the last stored. The stored message is initialized to μ̄_{2,3} = 1, so here π_3(C,D) = π⁰_3(C,D) Σ_B π⁰_2(B,C); the new message is then stored: μ̄_{2,3} = μ_{2→3}(C) = Σ_B π⁰_2(B,C).


Message Passing: BP

Clique chain: [A,B] –B– [B,C] –C– [C,D]. Now send the message back from [C,D] to [B,C]:

μ_{3→2}(C) = Σ_D π_3(C,D)

[B,C] updates, dividing by the stored message:

π_2(B,C) = π⁰_2(B,C) · μ_{3→2}(C) / μ̄_{2,3}(C) = π⁰_2(B,C) Σ_D π⁰_3(C,D)

Store the last message on the edge and divide each passing message by the last stored. The new message is stored on the edge: μ̄_{2,3} = μ_{3→2}(C) = Σ_D π⁰_3(C,D) Σ_B π⁰_2(B,C).


Message Passing: BP

Clique chain: [A,B] –B– [B,C] –C– [C,D]. Expanding the update at [B,C]:

π_2(B,C) = π⁰_2(B,C) · μ_{3→2}(C) / μ̄_{2,3}(C)

= π⁰_2(B,C) · [ Σ_D π⁰_3(C,D) μ_{2→3}(C) ] / μ_{2→3}(C)

= π⁰_2(B,C) Σ_D π⁰_3(C,D)

The same as before: the division by the stored message exactly cancels the information that [B,C] itself sent out earlier, so the result matches what plain variable elimination toward [B,C] would produce.
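A quick numeric check (invented tables) that the division by the stored message reproduces the plain VE result on this chain:

```python
import numpy as np

rng = np.random.default_rng(4)
pi0_2 = rng.random((2, 2))            # pi0_2[B, C]
pi0_3 = rng.random((2, 2))            # pi0_3[C, D]

mu_23 = pi0_2.sum(axis=0)             # mu_{2->3}(C) = sum_B pi0_2(B,C)
pi3 = pi0_3 * mu_23[:, None]          # pi_3(C,D) = pi0_3(C,D) mu_{2->3}(C)
mu_32 = pi3.sum(axis=1)               # mu_{3->2}(C) = sum_D pi_3(C,D)

pi2 = pi0_2 * (mu_32 / mu_23)[None, :]        # divide by stored message
direct = pi0_2 * pi0_3.sum(axis=1)[None, :]   # plain VE: sum_D pi0_3(C,D)
assert np.allclose(pi2, direct)
```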


Loopy belief propagation

• The asynchronous BP algorithm works on clique trees

• What if we run the belief propagation algorithm on a non-tree structure?

• It converges only sometimes

• If it converges, it yields an approximate solution

• Advantage: tractable for large graphs

[Figure: a cycle A–B–C–D and the corresponding loopy cluster graph with clusters [A,B], [B,C], [C,D], [A,D] and sepsets B, C, D, A]
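A compact sketch (my own, with invented pairwise tables) of loopy BP on this cluster cycle; the loop repeats message sweeps until the messages stop changing, which, as noted above, is not guaranteed:

```python
import numpy as np

# Cluster cycle [A,B] -B- [B,C] -C- [C,D] -D- [D,A] (and back via A).
# Cluster i shares its axis 0 with cluster i-1 and its axis 1 with i+1.
rng = np.random.default_rng(5)
pi0 = [rng.random((2, 2)) for _ in range(4)]   # pairwise cluster potentials
fwd = [np.ones(2) for _ in range(4)]           # mu_{i -> i+1}, over axis 1 of i
bwd = [np.ones(2) for _ in range(4)]           # mu_{i -> i-1}, over axis 0 of i

for sweep in range(200):
    delta = 0.0
    for i in range(4):
        # exclude the message that came from the target cluster
        f = (pi0[i] * fwd[i - 1][:, None]).sum(axis=0)        # to cluster i+1
        b = (pi0[i] * bwd[(i + 1) % 4][None, :]).sum(axis=1)  # to cluster i-1
        f, b = f / f.sum(), b / b.sum()        # normalize to avoid overflow
        delta = max(delta, abs(f - fwd[i]).max(), abs(b - bwd[i]).max())
        fwd[i], bwd[i] = f, b
    if delta < 1e-8:                           # converged (not guaranteed!)
        break

belief1 = pi0[1] * fwd[0][:, None] * bwd[2][None, :]  # approx. beliefs on (B,C)
belief1 /= belief1.sum()
```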


Loopy belief propagation

• If the BP algorithm converges, it converges to a stationary point (a local optimum) of the Bethe free energy

See papers:

• Yedidia, J.S., Freeman, W.T., and Weiss, Y. Generalized Belief Propagation, 2000.

• Yedidia, J.S., Freeman, W.T., and Weiss, Y. Understanding Belief Propagation and Its Generalizations, 2001.