
Copyright (c) 2002 by SNU CSE Biointelligence Lab. 1

SURVEY: Foundations of Bayesian Networks

O, Jangmin

2002/10/29

Last modified 2002/10/29

Contents

• From DAG to Junction Tree
• From Elimination Tree to Junction Tree
• Junction Tree Algorithms
• Learning Bayesian Networks

Typical Example of DAG

[Figure: a simple DAG on the vertices A, B, C, D, F, G.]

1. Topological Sort

Algorithm 4.1 [Topological sort]
• Begin with all vertices unnumbered.
• Set counter i := 1.
• While any vertices remain:
  – Select any vertex that has no parents;
  – number the selected vertex as i;
  – delete the numbered vertex and all its adjacent edges from the graph;
  – increment i by 1.

Objective: obtain a well-ordering.
Well-ordering: the predecessors of any node have lower numbers than the node itself.
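The procedure above can be sketched in Python (a minimal Kahn-style implementation; the arc list for the slides' simple DAG is an assumption, since the figure's arc directions are not recoverable from the transcript):

```python
from collections import deque

def topological_sort(nodes, arcs):
    """Algorithm 4.1: repeatedly number and delete a vertex with no parents."""
    parents = {v: set() for v in nodes}
    children = {v: set() for v in nodes}
    for u, v in arcs:                  # arc u -> v
        parents[v].add(u)
        children[u].add(v)
    order, i = {}, 1
    ready = deque(sorted(v for v in nodes if not parents[v]))
    while ready:
        v = ready.popleft()            # a vertex with no remaining parents
        order[v] = i
        i += 1
        for c in sorted(children[v]):  # "delete" v and its outgoing arcs
            parents[c].discard(v)
            if not parents[c]:
                ready.append(c)
    return order

# Assumed arc set, chosen to be consistent with the slides' example.
arcs = [("A","B"), ("A","C"), ("A","D"), ("B","D"), ("B","F"), ("C","F"), ("F","G")]
order = topological_sort(list("ABCDFG"), arcs)
```

With alphabetical tie-breaking this numbers the vertices A=1, ..., G=6, matching the well-ordering property: every arc goes from a lower number to a higher one.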

1. Topological Sort (steps 1–6)

[Figure: the simple DAG with its vertices numbered 1 through 6 in topological order, one vertex per step.]

2. Moral Graph

• Making the moral graph of a DAG:
  – Add an undirected edge between every pair of nodes that share a common child ("marrying" the parents).
  – Drop the directions of all edges.
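A minimal sketch of moralization (the arc list for the slides' DAG is an assumption, as before):

```python
def moralize(nodes, arcs):
    """Moral graph: marry parents that share a child, then drop directions."""
    und = {frozenset(a) for a in arcs}            # drop directions
    for v in nodes:
        ps = [u for u, w in arcs if w == v]       # parents of v
        und |= {frozenset((p, q)) for p in ps for q in ps if p != q}  # marry
    return und

# Assumed arc set consistent with the slides' simple DAG.
arcs = [("A","B"), ("A","C"), ("A","D"), ("B","D"), ("B","F"), ("C","F"), ("F","G")]
moral = moralize(list("ABCDFG"), arcs)
# The marriage B—C is added because B and C share the child F.
```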

2. Moral Graph (example)

[Figure: the simple DAG after moralization — parents sharing a child are joined and all edge directions are dropped.]

Junction Tree

• Definition
  – A tree whose nodes are sets C1, C2, ...
  – For any two nodes C1 and C2, their intersection is contained in every node on the path between them.
• Corollaries
  – For an undirected graph, the following are equivalent: decomposable, chordal, having a junction tree of cliques, and admitting a perfect numbering.

Perfect numbering: ne(vj) ∩ {v1, ..., vj-1} induces a complete subgraph.

3. Maximum Cardinality Search (1)

Algorithm 4.9 [Maximum Cardinality Search]
• Set Output := 'G is chordal'.
• Set counter i := 1.
• Set L := ∅.
• For all v ∈ V, set c(v) := 0.
• While L ≠ V:
  – Set U := V \ L.
  – Select any vertex v ∈ U maximizing c(v) and label it vi.
  – If Λi := ne(vi) ∩ L is not complete in G: set Output := 'G is not chordal'.
  – Otherwise, set c(w) := c(w) + 1 for each vertex w ∈ ne(vi) ∩ U.
  – Set L := L ∪ {vi}.
  – Increment i by 1.
• Report Output.
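Algorithm 4.9 can be sketched as follows (the moral-graph edge list is reconstructed from the example; ties are broken alphabetically so the run reproduces the slides' numbering):

```python
def mcs(nodes, edges):
    """Algorithm 4.9: maximum cardinality search on an undirected graph."""
    ne = {v: set() for v in nodes}
    for a, b in edges:
        ne[a].add(b); ne[b].add(a)
    L, lam, chordal = [], {}, True
    c = {v: 0 for v in nodes}
    while len(L) < len(nodes):
        U = [v for v in sorted(nodes) if v not in L]
        v = max(U, key=lambda x: c[x])       # unnumbered vertex maximizing c(v)
        lam[v] = ne[v] & set(L)              # Λi = ne(vi) ∩ L
        if any(q not in ne[p] for p in lam[v] for q in lam[v] if p != q):
            chordal = False                  # Λi not complete: G is not chordal
        for w in ne[v]:                      # bump counters of unnumbered neighbours
            if w not in L:
                c[w] += 1
        L.append(v)
    return L, lam, chordal

# Edges of the moral graph from the example.
moral_edges = [("A","B"), ("A","C"), ("A","D"), ("B","C"),
               ("B","D"), ("B","F"), ("C","F"), ("F","G")]
order, lam, chordal = mcs(list("ABCDFG"), moral_edges)
```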

3. Maximum Cardinality Search (example)

[Figure: MCS run on the moral graph. The visiting order and Λi = ne(vi) ∩ L at each step:]

  1: A, Λ = {}
  2: B, Λ = {A}
  3: C, Λ = {A, B}
  4: D, Λ = {A, B}
  5: F, Λ = {B, C}
  6: G, Λ = {F}

Output = "G is chordal"

4. Cliques of Chordal Graph (1)

Algorithm 4.11 [Finding the Cliques of a Chordal Graph]
• From the numbering (v1, ..., vk) obtained by maximum cardinality search, let λi = |Λi|, the cardinality of Λi = ne(vi) ∩ {v1, ..., vi-1}.
• Mark the ladder nodes: vi is a ladder node if i = k, or if i < k and λi+1 < 1 + λi.
• Define the cliques: for each ladder node vj, Cj = {vj} ∪ Λj.

C1, C2, ... possess the RIP (running intersection property).
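Reading the cliques off the MCS numbering can be sketched directly (the Λ sets below are the ones from the MCS example):

```python
def cliques_from_mcs(order, lam):
    """Algorithm 4.11: cliques from an MCS numbering.
    order: vertices v1..vk in MCS order; lam[v] = ne(v) ∩ {earlier vertices}."""
    k = len(order)
    lams = [len(lam[v]) for v in order]              # λi = |Λi|
    cliques = []
    for i, v in enumerate(order):
        is_ladder = (i == k - 1) or (lams[i + 1] < 1 + lams[i])
        if is_ladder:
            cliques.append(frozenset({v} | lam[v]))  # Cj = {vj} ∪ Λj
    return cliques

lam = {"A": set(), "B": {"A"}, "C": {"A", "B"},
       "D": {"A", "B"}, "F": {"B", "C"}, "G": {"F"}}
cliques = cliques_from_mcs(list("ABCDFG"), lam)
```

On the example this yields exactly the four cliques shown on the next slide.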

4. Cliques of Chordal Graph (example)

[Figure: cliques read off the MCS numbering of the moral graph:]

  C1 = {A, B, C}
  C2 = {A, B, D}
  C3 = {B, C, F}
  C4 = {F, G}

Running Intersection Property

• RIP: definition
  – Given (C1, C2, ..., Ck),
  – for all 1 < j ≤ k, there is an i < j such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci.

5. Junction Tree Construction (1)

Algorithm 4.8 [Junction Tree Construction]
• Take the cliques (C1, ..., Cp) of a chordal graph, ordered so that they satisfy the RIP.
• Associate a node of the tree with each clique Cj.
• For j = 2, ..., p, add an edge between Cj and Ci, where i is any one value in {1, ..., j-1} such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci.
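A minimal sketch of Algorithm 4.8, run on the example's RIP-ordered cliques (clique indices are 0-based here):

```python
def junction_tree(cliques):
    """Algorithm 4.8: connect each Cj (j >= 2) to an earlier Ci containing
    Cj ∩ (C1 ∪ ... ∪ Cj-1)."""
    edges = []
    for j in range(1, len(cliques)):
        seen = set().union(*cliques[:j])
        sep = cliques[j] & seen
        i = next(i for i in range(j) if sep <= cliques[i])  # RIP guarantees one exists
        edges.append((i, j))
    return edges

cliques = [frozenset("ABC"), frozenset("ABD"), frozenset("BCF"), frozenset("FG")]
tree = junction_tree(cliques)
```

This produces the edges ABC—ABD, ABC—BCF, BCF—FG of the example's junction tree.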

5. Junction Tree Construction (example)

[Figure: the junction tree built from the cliques, adding one edge per step:]

  C1 = {A, B, C}
  C2 = {A, B, D}
  C3 = {B, C, F}
  C4 = {F, G}

  Edges: ABC — ABD, ABC — BCF, BCF — FG

Contents

• From DAG to Junction Tree
• From Elimination Tree to Junction Tree
• Junction Tree Algorithms
• Learning Bayesian Networks

Triangulation (1)

• When is triangulation needed?
  – When MCS (Maximum Cardinality Search) fails, i.e., the graph is not chordal.
• Triangulation
  – introduces fill-in edges.
  – produces a perfect numbering.
• Optimal triangulation is NP-hard.
  – The size of each clique matters...

Triangulation (2)

Algorithm 4.13 [One-step Look Ahead Triangulation]
• Start with all vertices unnumbered; set counter i := k.
• While there are still some unnumbered vertices:
  – Select an unnumbered vertex v to optimize the criterion c(v), or select v := σ(i) [σ is an elimination order].
  – Label it with the number i (call it vi).
  – Form the set Ci consisting of vi and its unnumbered neighbours.
  – Fill in edges where none exist between all pairs of vertices in Ci.
  – Eliminate vi and decrement i by 1.
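Triangulation with a fixed elimination order can be sketched as follows (run on the example's moral graph with σ = (A, B, C, D, F, G); since that graph is already chordal, no fill-in edges are added):

```python
def triangulate(nodes, edges, sigma):
    """Algorithm 4.13 with a fixed order sigma = (v1, ..., vk):
    eliminate from the back, filling in edges among unnumbered neighbours."""
    ne = {v: set() for v in nodes}
    for u, v in edges:
        ne[u].add(v); ne[v].add(u)
    filled = {frozenset((u, v)) for u in nodes for v in ne[u]}
    elim_sets, unnumbered = {}, set(nodes)
    for i in range(len(sigma), 0, -1):
        v = sigma[i - 1]
        C = {v} | (ne[v] & (unnumbered - {v}))   # vi plus unnumbered neighbours
        for p in C:
            for q in C:
                if p != q and q not in ne[p]:    # fill-in edge
                    ne[p].add(q); ne[q].add(p)
                    filled.add(frozenset((p, q)))
        elim_sets[i] = C
        unnumbered.discard(v)                    # eliminate vi
    return elim_sets, filled

moral_edges = [("A","B"), ("A","C"), ("A","D"), ("B","C"),
               ("B","D"), ("B","F"), ("C","F"), ("F","G")]
elim, filled = triangulate(list("ABCDFG"), moral_edges, list("ABCDFG"))
```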

Triangulation (example)

[Figure: one-step look ahead triangulation of the moral graph with σ = (A, B, C, D, F, G), eliminating from the back:]

  6: C6 = {F, G}
  5: C5 = {B, C, F}
  4: C4 = {A, B, D}
  3: C3 = {A, B, C}
  2: C2 = {A, B}
  1: C1 = {A}

Elimination sets:
• Cj contains vj.
• vj ∉ Cl for all l < j.
• (C1, ..., Ck) has the RIP.
• The cliques of the triangulated graph G' are contained in (C1, ..., Ck).

Elimination Tree Construction (1)

Algorithm 4.14 [Elimination Tree Construction]
• Associate a node of the tree with each set Ci.
• For j = 1, ..., k, if Cj contains more than one vertex, add an edge between Cj and Ci, where i is the largest index of a vertex in Cj \ {vj}.
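Algorithm 4.14 can be sketched directly from the elimination sets of the example (edges are returned as pairs of 1-based set indices (j, i)):

```python
def elimination_tree(sigma, elim_sets):
    """Algorithm 4.14: join Cj to Ci, i = largest index of a vertex in Cj \\ {vj}."""
    idx = {v: i + 1 for i, v in enumerate(sigma)}
    edges = []
    for j in range(1, len(sigma) + 1):
        rest = elim_sets[j] - {sigma[j - 1]}
        if rest:                                  # Cj has more than one vertex
            i = max(idx[v] for v in rest)
            edges.append((j, i))
    return edges

sigma = list("ABCDFG")
elim_sets = {1: {"A"}, 2: {"A", "B"}, 3: {"A", "B", "C"},
             4: {"A", "B", "D"}, 5: {"B", "C", "F"}, 6: {"F", "G"}}
tree = elimination_tree(sigma, elim_sets)
```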

Elimination Tree Construction (example)

[Figure: the elimination tree, one edge added per step. Nodes (eliminated vertex : remaining elimination-set members): C1 = A:, C2 = B:A, C3 = C:AB, C4 = D:AB, C5 = F:BC, C6 = G:F.]

From etree to jtree (1)

Lemma 4.16
– Let C1, ..., Ck be a sequence of sets with the RIP.
– Assume that Ct ⊆ Cp for some t ≠ p, and that p is minimal with this property for fixed t. Then:
  (i) If t > p, then C1, ..., Ct-1, Ct+1, ..., Ck has the running intersection property.
  (ii) If t < p, then C1, ..., Ct-1, Cp, Ct+1, ..., Cp-1, Cp+1, ..., Ck has the RIP.

Naively removing a redundant elimination set might destroy the RIP; the lemma shows how to remove it safely.

From etree to jtree (2)

[Figure: applying condition (ii) with t = 1, p = 2 removes the redundant set C1 = A: (C1 ⊆ C2); the tree now consists of C2–C6.]

From etree to jtree (3)

[Figure: applying condition (ii) with t = 2, p = 3 removes C2 = B:A (C2 ⊆ C3); the remaining sets C3–C6 are exactly the cliques, and the tree is a junction tree.]

MST for making jtree (1)

Algorithm
• Start from the elimination sets (C1, ..., Ck).
• Remove the redundant Ci's.
• Make the junction graph:
  – If |Ci ∩ Cj| > 0, add an edge between Ci and Cj.
  – Set the weight of the edge to |Ci ∩ Cj|.
• Construct a maximum-weight spanning tree (MST).

The resulting tree is a junction tree, and the clique set has the RIP.
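The junction-graph route can be sketched with Kruskal's algorithm on descending weights (run on the example's cliques; the union-find helper is a plain implementation, not from the slides):

```python
def mst_junction_tree(cliques):
    """Junction graph with weight |Ci ∩ Cj|, then a maximum-weight spanning tree."""
    n = len(cliques)
    edges = [(len(cliques[i] & cliques[j]), i, j)
             for i in range(n) for j in range(i + 1, n)
             if cliques[i] & cliques[j]]
    parent = list(range(n))
    def find(x):                                  # union-find root lookup
        while parent[x] != x:
            x = parent[x]
        return x
    tree = []
    for w, i, j in sorted(edges, reverse=True):   # heaviest separators first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

cliques = [frozenset("ABC"), frozenset("ABD"), frozenset("BCF"), frozenset("FG")]
tree = mst_junction_tree(cliques)
```

On the example this keeps the weight-2 edges ABC—BCF and ABC—ABD plus BCF—FG, and drops ABD—BCF.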

MST for making jtree (2)

[Figure: junction graph over C1 = ABC, C2 = ABD, C3 = BCF, C4 = FG, with edge weights |Ci ∩ Cj|: ABC—ABD (2), ABC—BCF (2), ABD—BCF (1), BCF—FG (1). The maximum-weight spanning tree drops ABD—BCF, giving the junction tree.]

MST for making jtree (3)

• Optimal jtree (for a fixed elimination ordering)
  – Cost of edge e = (v, w):

      e(v, w) = qv + qw − qv∩w,   where qS = Π_{Xi ∈ S} qi  and  qi = # of discrete values Xi can take.

  – Use the cost of an edge to break ties when constructing the MST (minimum cost preferred).

Contents

• From DAG to Junction Tree
• From Elimination Tree to Junction Tree
• Junction Tree Algorithms
• Learning Bayesian Networks

Collect phase

• From the leaves to the root: each clique Cj first absorbs the messages from its children, then projects onto the separator toward its parent Ck.

    φ̃j = φj · Π_{i ∈ child(j)} μij        (updated potential, from the initial potential φj)
    μjk = Σ_{Cj \ Sjk} φ̃j                 (projection onto the separator Sjk)

[Figure: clique Cj with parent Ck and children Ci, Ci'.]

Distribute phase

• From the root to the leaves: each clique Cj updates its potential by the ratio of the new and old separator messages.

    μ*jk = Σ_{Ck \ Sjk} φ*k
    φ*j = φ̃j · μ*jk / μjk

• φ*j contains the marginal distribution of clique j.

[Figure: clique Cj with parent Ck and children Ci, Ci'.]
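The two phases can be illustrated on the smallest possible junction tree, {A,B}—{B,C} with separator {B} (a hedged sketch: the CPT numbers are made up, and potentials are plain dicts over joint assignments of binary variables):

```python
pA = {0: 0.6, 1: 0.4}
pB_A = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # P(B=b|A=a), keyed (a,b)
pC_B = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}  # P(C=c|B=b), keyed (b,c)

phi1 = {(a, b): pA[a] * pB_A[(a, b)] for a in (0, 1) for b in (0, 1)}  # clique {A,B}
phi2 = {(b, c): pC_B[(b, c)] for b in (0, 1) for c in (0, 1)}          # clique {B,C}

def marg_to_B(phi, axis):
    """Project a two-variable potential onto B (B sits at position `axis`)."""
    out = {0: 0.0, 1: 0.0}
    for k, v in phi.items():
        out[k[axis]] += v
    return out

# Collect: leaf {B,C} sends μ = Σ_C φ2 to the root {A,B}, which absorbs it.
mu = marg_to_B(phi2, 0)
phi1 = {k: v * mu[k[1]] for k, v in phi1.items()}
# Distribute: root sends back μ* = Σ_A φ1*; leaf multiplies by μ*/μ.
mu_star = marg_to_B(phi1, 1)
phi2 = {k: v * mu_star[k[0]] / mu[k[0]] for k, v in phi2.items()}
# phi1 now holds P(A,B) and phi2 holds P(B,C).
```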

Contents

• From DAG to Junction Tree
• From Elimination Tree to Junction Tree
• Junction Tree Algorithms
• Learning Bayesian Networks

Learning Paradigm

• Known structure or unknown structure
• Full observability or partial observability
• Frequentist or Bayesian

Ks, Fo, Fr (1)

(Known structure, full observability, frequentist)

• Given a training set D = {D1, ..., DM}
• MLE of the parameters of each CPD
  – MLE (Maximum Likelihood Estimate)
  – CPD (Conditional Probability Distribution)

    L = Σ_{m=1}^{M} log Pr(Dm | G) = Σ_{m=1}^{M} Σ_{i=1}^{n} log P(Xi | Pa(Xi), Dm)

  (M = # of data cases, n = # of nodes; the log-likelihood decomposes into one term per node.)

Ks, Fo, Fr (2)

• Multinomial distributions
  – For a tabular CPD, define θijk ≜ P(Xi = k | Pa(Xi) = j).
  – Log-likelihood:

      L = Σ_i Σ_m Σ_{j,k} Iijk(m) log θijk = Σ_i Σ_{j,k} Nijk log θijk

    where Iijk(m) ≜ I(Xi = k, Pa(Xi) = j | Dm) and Nijk ≜ Σ_m Iijk(m).
  – MLE (subject to the constraint Σ_k θijk = 1 for all i, j):

      θ̂ijk = Nijk / Σ_{k'} Nijk'
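A minimal sketch of the MLE for one tabular CPD P(X | Pa(X)) — normalized counts over toy data (the data rows, each a (parent-config j, value k) pair, are made up):

```python
from collections import Counter

# Toy complete-data observations for a single node: (j, k) pairs.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]
N = Counter(data)                                   # Nijk as a count table N[j, k]
theta = {(j, k): N[(j, k)] / sum(N[(j, kk)] for kk in (0, 1))
         for j in (0, 1) for k in (0, 1)}           # θ̂ = Njk / Σ_k' Njk'
```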

Ks, Fo, Fr (3)

• MLE of the multinomial distribution
  – Constrained optimization (Lagrangian):

      O = Σ_{ijk} Nijk log θijk + Σ_{ij} λij (1 − Σ_k θijk)

  – Setting the derivative with respect to θijk to zero:

      ∂O/∂θijk = Nijk/θijk − λij = 0   ⇒   Nijk = λij θijk

  – Summing over k gives λij = Σ_k Nijk, hence

      θ̂ijk = Nijk / Σ_{k'} Nijk'


Ks, Fo, Fr (4)

• Conditional linear Gaussian distributions

Ks, Fo, Ba (1)

(Known structure, full observability, Bayesian)

• Frequentist: point estimation
• Bayesian: distributional estimation

Ks, Fo, Ba (2)

• Multinomial distributions
  – Two assumptions on the prior:
    • Global independence: P(θ) = Π_{i=1}^{n} P(θi),  θi = {θijk : j = 1, ..., qi, k = 1, ..., ri}
    • Local independence: P(θi) = Π_{j=1}^{qi} P(θij),  θij = {θijk : k = 1, ..., ri}
  – Global independence + likelihood equivalence leads to a Dirichlet prior, the conjugate prior for the multinomial.

Ks, Fo, Ba (3)

• Remark on the Bayesian approach
  – P(θ|D) ∝ P(D|θ) · P(θ)   (posterior ∝ likelihood × prior)
  – Conjugate priors
    • The posterior has the same form as the prior distribution.
    • Many exponential-family distributions have conjugate priors.

Ks, Fo, Ba (4)

• Multinomial distributions
  – Dirichlet prior on tabular CPDs: θij = P(Xi | Pa(Xi) = j) is a multinomial r.v. with ri possible values.

      θij ~ Dirichlet(αij1, ..., αijri)

      P(θij) = (1 / B(αij1, ..., αijri)) Π_{k=1}^{ri} θijk^(αijk − 1)

      B(α1, ..., αr) = Π_k Γ(αk) / Γ(Σ_k αk),   Γ(n) = (n − 1)! for integer n

  • Posterior distribution:

      θij | D ~ Dirichlet(αij1 + Nij1, ..., αijri + Nijri)

  • Posterior mean:

      E[θijk | D] = (αijk + Nijk) / Σ_{l=1}^{ri} (αijl + Nijl)
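The Dirichlet-multinomial update is a one-liner per row of the CPD — posterior pseudo-counts are prior pseudo-counts plus observed counts (the counts below are made up):

```python
# One row θij of a tabular CPD with ri = 3 values.
alpha = [1.0, 1.0, 1.0]                  # Dirichlet prior (uniform pseudo-counts)
N = [5, 2, 1]                            # observed counts Nij1..Nij3
post = [a + n for a, n in zip(alpha, N)]             # θij | D ~ Dirichlet(α + N)
mean = [p / sum(post) for p in post]                 # E[θijk | D]
```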

Ks, Fo, Ba (5)

• Dirichlet distribution
  – Hyperparameter αijk
    • A positive number.
    • Acts as a pseudo-count: αijk − 1 imaginary cases.
  – Posterior distribution
    • Combines the pseudo-counts with the observed-data counts.
    • A simple sum: θij | D ~ Dirichlet(αij1 + Nij1, ..., αijri + Nijri)


Ks, Fo, Ba (6)

• Gaussian distributions

Ks, Po, Fr (1)

(Known structure, partial observability, frequentist)

• Log-likelihood:

    L = Σ_m log P(Dm) = Σ_m log Σ_h P(H = h, V = Vm)

  (H: hidden variables, V: visible / observed variables)

• Not decomposable into a sum of local terms, one per node.
  – EM algorithm

Ks, Po, Fr (2)

• EM algorithm
  – From Jensen's inequality:  log(Σ_j λj yj) ≥ Σ_j λj log yj,  where Σ_j λj = 1.

      L = Σ_m log Σ_h P(H = h, Vm)
        = Σ_m log Σ_h q(h|Vm) · P(h, Vm) / q(h|Vm)
        ≥ Σ_m Σ_h q(h|Vm) log [ P(h, Vm) / q(h|Vm) ]
        = Σ_m Σ_h q(h|Vm) log P(h, Vm) − Σ_m Σ_h q(h|Vm) log q(h|Vm)

      constraint: Σ_h q(h|Vm) = 1

Ks, Po, Fr (3)

– Maximizing with respect to q (E-step):

    O = Σ_m Σ_h q(h|Vm) log P(h, Vm) − Σ_m Σ_h q(h|Vm) log q(h|Vm) + Σ_m λm (1 − Σ_h q(h|Vm))

    ∂O/∂q(h|Vm) = log P(h, Vm) − log q(h|Vm) − 1 − λm = 0

    ⇒ q(h|Vm) = P(h, Vm) · e^(−1−λm),   and normalization gives  e^(1+λm) = Σ_h P(h, Vm) = P(Vm)

    ⇒ q(h|Vm) = P(h|Vm)

Ks, Po, Fr (4)

– Maximizing with respect to θ (M-step):
  • After q is set to P(h|Vm) in the E-step,
  • maximize the expected complete-data log-likelihood:

      Q(θ'|θ) = Σ_m Σ_h P(h|Vm, θ) log P(H = h, Vm | θ')

      θ* = argmax_{θ'} Q(θ'|θ)

• Iterate until convergence:
  – E-step: compute the expected complete-data log-likelihood.
  – M-step: find θ* maximizing it.

Ks, Po, Fr (5)

• Multinomial distribution
  – E-step: replace the counts Nijk of the complete-data log-likelihood L = Σ Nijk log θijk by their expectations:

      Q(θ'|θ) = Σ_{ijk} E[Nijk] log θ'ijk,   E[Nijk] = Σ_m P(Xi = k, Pa(Xi) = j | Dm, θ)

  – M-step:

      θ' = argmax_{θ'} Q(θ'|θ)   ⇒   θ'ijk = E[Nijk] / Σ_{k'} E[Nijk']
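A minimal EM sketch on an assumed toy model (not from the slides): hidden binary H, observed binary X, network H → X. The E-step computes q(h|x) = P(h|x); the M-step re-estimates the parameters from the expected counts E[N], exactly as above:

```python
data = [0, 0, 0, 1, 1, 0, 0, 1]          # observations of X; H is never observed
pH = [0.5, 0.5]                          # P(H)
pX_H = [[0.6, 0.4], [0.3, 0.7]]          # pX_H[h][x] = P(X=x|H=h), made-up start

for _ in range(50):
    # E-step: posterior q(h | x_m) for each data case
    q = []
    for x in data:
        joint = [pH[h] * pX_H[h][x] for h in (0, 1)]
        z = sum(joint)
        q.append([j / z for j in joint])
    # M-step: expected counts E[N] -> new parameters
    EN_h = [sum(qm[h] for qm in q) for h in (0, 1)]
    pH = [c / len(data) for c in EN_h]
    pX_H = [[sum(qm[h] for qm, x in zip(q, data) if x == v) / EN_h[h]
             for v in (0, 1)] for h in (0, 1)]
```

After the first M-step the learned marginal P(X = 1) = Σ_h P(h) P(X = 1|h) already matches the empirical frequency 3/8 and stays there, as expected for this over-parameterized toy model.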

Ks, Po, Ba (1)

(Known structure, partial observability, Bayesian)

• Gibbs sampling: a stochastic version of EM
• Variational Bayes: approximate P(θ, H|V) ≈ q(θ|V) q(H|V)

Us, Fo, Fr (1)

(Unknown structure, full observability, frequentist)

• Issues
  – Hypothesis space
  – Evaluation function
  – Search algorithm

Us, Fo, Fr (2)

• Search space: DAGs
  – # of DAGs on n nodes ~ O(2^(n^2))
  – 10 nodes → ~O(10^18) DAGs
  – Exhaustively finding the optimal DAG is doomed to failure.

Us, Fo, Fr (3)

• Search algorithm: local search
  – Operators: adding, deleting, or reversing a single arc

    Choose G somehow
    While not converged
        For each G' in nbd(G)
            Compute score(G')
        G* := argmax_{G'} score(G')
        If score(G*) > score(G)
            then G := G*
            else converged := true

  Pseudo-code for hill-climbing. nbd(G) is the neighborhood of G, i.e., the models that can be reached by applying a single local change operator.
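The loop above can be made runnable with stand-in pieces (both nbd() and score() are assumptions for illustration: the "model" is a bit-vector rather than a graph, one local change flips one bit, and the score is a toy function rather than a network score):

```python
def nbd(g):
    """Neighborhood: all models reachable by one local change (one bit flip)."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]

def score(g):
    """Toy stand-in for a network score, peaked at (1, 0, 1, 1)."""
    target = (1, 0, 1, 1)
    return sum(a == b for a, b in zip(g, target))

G = (0, 0, 0, 0)                     # choose G somehow
converged = False
while not converged:
    best = max(nbd(G), key=score)    # G* := argmax over the neighborhood
    if score(best) > score(G):
        G = best
    else:
        converged = True
```

The loop climbs greedily and stops at the first local optimum, which for this unimodal toy score is the global one.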

Us, Fo, Fr (4)

• Search algorithm: PC algorithm
  – Starts with a fully connected undirected graph.
  – CI (conditional independence) tests:
    • If X ⊥ Y | S for some conditioning set S, the arc between X and Y is removed.

Us, Fo, Fr (5)

• Scoring function
  – MLE alone would select the fully connected graph.
  – score(G) ∝ P(D|G) P(G)

      P(G|D) = P(D|G) P(G) / P(D)      (MAP model)

      score(G) ≜ P(D|G) P(G),   P(D|G) = ∫ P(D|G, θ) P(θ|G) dθ

  – The marginal likelihood automatically penalizes overly complex models:
    • a complex model has more parameters, so
    • not much probability mass falls on the region where the data actually lies.

Us, Fo, Fr (6)

• Scoring function
  – Under global parameter independence and conjugate priors, the marginal likelihood factors over the nodes:

      P(D|G) = Π_{i=1}^{n} ∫ P(Xi | Pa(Xi), θi) P(θi) dθi ≜ Π_{i=1}^{n} score(Xi, Pa(Xi))

  – Each integral can be evaluated in closed form.

Us, Fo, Fr (7)

• Scoring function
  – Under non-conjugate priors: approximation is needed.
  – The Laplace approximation leads to the BIC (Bayesian Information Criterion):

      log P(D|G) ≈ log P(D|G, θ̂G) − (d/2) log M

    (d: dimension of the model; θ̂G: ML estimate of the parameters)
  – Case of the multinomial distribution:

      BIC-score(G) = Σ_i Σ_m log P(Xi | Pa(Xi), θ̂i, Dm) − (d/2) log M
                   = Σ_{ijk} Nijk log θ̂ijk − (d/2) log M
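The multinomial BIC can be computed directly from the count table of a CPD (a hedged sketch for a single node; the counts and the parameter count d are toy assumptions):

```python
import math

# Toy counts N[j, k] for one tabular CPD with 2 parent configs and 2 values.
N = {(0, 0): 3, (0, 1): 1, (1, 0): 1, (1, 1): 2}
M = sum(N.values())                                 # number of data cases

# Maximized log-likelihood Σ Njk log θ̂jk with θ̂jk = Njk / Σ_k' Njk'.
loglik = sum(n * math.log(n / sum(N[(j, kk)] for kk in (0, 1)))
             for (j, k), n in N.items() if n > 0)
d = 2                                               # qi * (ri - 1) free parameters
bic = loglik - d / 2 * math.log(M)
```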

Us, Fo, Fr (8)

• Scoring function
  – Advantage of a decomposed score: graphs that differ by a single arc differ in at most two terms of the marginal likelihood.
  – Ex) G1: X1→X2→X3→X4,  G2: X1→X2←X3→X4

      P(D|G1) = score(X1) · score(X2, X1) · score(X3, X2) · score(X4, X3)
      P(D|G2) = score(X1) · score(X2, {X1, X3}) · score(X3) · score(X4, X3)

    Only the X2 and X3 terms differ.

Us, Fo, Fr (9)

• Scoring function
  – Marginal likelihood for the multinomial distribution with a Dirichlet prior: the Bayesian Dirichlet (BD) score.

      P(D|G) = ∫ P(D|θ, G) P(θ|G) dθ,   P(D|θ, G) = Π_{i=1}^{n} Π_{j=1}^{qi} Π_{k=1}^{ri} θijk^Nijk

      P(D|G) = Π_{i=1}^{n} Π_{j=1}^{qi} B(αij1 + Nij1, ..., αijri + Nijri) / B(αij1, ..., αijri)
             = Π_{i=1}^{n} Π_{j=1}^{qi} [ Γ(αij) / Γ(αij + Nij) ] Π_{k=1}^{ri} Γ(αijk + Nijk) / Γ(αijk)

    (αij = Σ_k αijk and Nij = Σ_k Nijk; each factor is a ratio of Dirichlet normalizers, i.e., a posterior mean.)

Us, Fo, Ba (1)

(Unknown structure, full observability, Bayesian)

• The posterior over all models is intractable.
  – Focus on some features instead.
• Bayesian model averaging:

      P(f|D) = Σ_G f(G) P(G|D)       (e.g., f(G) = 1 if G contains a certain edge)

• Needs P(G|D):

      P(G|D) = P(D|G) P(G) / Σ_{G'} P(D|G') P(G')      (the normalizing sum is intractable)

  – Solution: MCMC — the Metropolis-Hastings algorithm.
    • Only the ratio R is needed, so the intractable sum is avoided:

        R = P(G2|D) / P(G1|D) = [P(G2) P(D|G2)] / [P(G1) P(D|G1)]

Us, Fo, Ba (2)

• Calculation of P(G|D): sampling G

    Choose G somehow
    While not converged
        Pick a G' u.a.r. from nbd(G)
        Compute R = P(G'|D) q(G|G') / [P(G|D) q(G'|G)]
        Sample u ~ uniform(0, 1)
        If u < min{1, R}
            then G := G'

  Pseudo-code for the MC3 algorithm. "u.a.r." means uniformly at random.

Us, Po, Fr (1)

(Unknown structure, partial observability, frequentist)

• Partially observable case
  – Computation of the marginal likelihood is intractable:

      P(V|G) = Σ_Z ∫ P(V, Z | θ, G) P(θ|G) dθ      (Z: hidden variables)

  – Not decomposable into a product of local terms.
  – Solutions:
    • Approximating the marginal likelihood
    • Structural EM

Us, Po, Fr (2)

• Approximating the marginal likelihood
  – Candidate's method: for any parameter value θ*G (e.g., the MLE),

      P(D|G) = P(D | θ*G, G) · P(θ*G | G) / P(θ*G | G, D)

    • P(D | θ*G, G): from a BN inference algorithm
    • P(θ*G | G): trivial (the prior)
    • P(θ*G | G, D): from Gibbs sampling

Us, Po, Fr (3)

• Structural EM
  – Idea: decomposition of the expected complete-data log-likelihood (BIC score).
  – Search inside EM (EM inside search is a high-cost process).

      BIC-score(G) = Σ_{ijk} Nijk log θijk − (d/2) log M

      E-BIC-score(G) = Σ_{ijk} E[Nijk] log θ̂ijk − (d/2) log M,
      E[Nijk] = Σ_m P(Xi = k, Pa(Xi) = j | Dm, θ)

    (θ̂: MLE of the parameters)

Us, Po, Ba (1)

(Unknown structure, partial observability, Bayesian)

• Combined MCMC:
  – MCMC for Bayesian model averaging over structures, and
  – MCMC over the values of the unobserved nodes.

Conclusion

• Does learning of structure have important meaning?
  – On paper, yes.
  – In engineering, no.
• What can AI do for humans?
• What can humans do for machine learning algorithms?
