-
Probabilistic Graphical Models (Cmput 651): Hybrid Networks
Matthew Brown
24/11/2008
Reading: Handout on Hybrid Networks
(Ch. 13 from older version of Koller‐Friedman)
Cmput 651 - Hybrid Networks 24/11/2008
-
Space of topics
(grid of topics: Directed vs. Undirected; Discrete vs. Continuous; Semantics, Learning, Inference)
-
Outline
Inference in purely continuous nets
Hybrid network semantics
Inference in hybrid networks
-
Linear Gaussian Bayesian networks (KF Definition 6.2.1)
Definition:
A linear Gaussian Bayesian network satisfies:
• all variables continuous
• all CPDs are linear Gaussians
Example (network: A, B, C are roots; D has parents A, B; E has parents C, D):
P(A) = N(µ_A, σ²_A)
P(B) = N(µ_B, σ²_B)
P(C) = N(µ_C, σ²_C)
P(D|A,B) = N(β_{D,0} + β_{D,1}A + β_{D,2}B, σ²_D)
P(E|C,D) = N(β_{E,0} + β_{E,1}C + β_{E,2}D, σ²_E)
-
Inference in linear Gaussian Bayes nets
Recall: linear Gaussian Bayes nets (LGBN) equivalent to multivariate Gaussian distribution
To marginalize, could convert the LGBN to a Gaussian; marginalization is trivial for a Gaussian
But this ignores structure. Example:
LGBN: 3n−1 parameters; Gaussian: n² + n parameters
bad for large n, eg: n > 1000
X1 ‐> X2 ‐> ... ‐> Xn (chain)
p(Xi | Xi−1) = N(βi + αi Xi−1; σ²_i)
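As an illustration of the equivalence, the chain's parameters can be folded into a single multivariate Gaussian by a forward recursion over the nodes. This is a sketch; the function name and the convention that `alpha[0]` is unused for the root are my own:

```python
import numpy as np

def chain_lgbn_to_gaussian(beta, alpha, sigma2):
    """Convert a chain-structured linear Gaussian Bayes net
    p(X_i | X_{i-1}) = N(beta_i + alpha_i * X_{i-1}; sigma2_i)
    (alpha[0] unused, since X_1 is a root) into the equivalent
    multivariate Gaussian N(mu, Sigma)."""
    n = len(beta)
    mu = np.zeros(n)
    Sigma = np.zeros((n, n))
    mu[0] = beta[0]
    Sigma[0, 0] = sigma2[0]
    for i in range(1, n):
        mu[i] = beta[i] + alpha[i] * mu[i - 1]
        # Cross-covariances: Cov(X_i, X_j) = alpha_i * Cov(X_{i-1}, X_j) for j < i
        Sigma[i, :i] = alpha[i] * Sigma[i - 1, :i]
        Sigma[:i, i] = Sigma[i, :i]
        # Variance: sigma2_i plus the propagated parent variance
        Sigma[i, i] = sigma2[i] + alpha[i] ** 2 * Sigma[i - 1, i - 1]
    return mu, Sigma
```

Note the asymmetry the slide points out: the chain needs 3n−1 numbers, while the dense mean and covariance it produces grow quadratically.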
-
Variable elimination
Marginalize out unwanted X using integration, rather than a sum as in the discrete case
Note:
Variable elimination gives exact answers for continuous nets
(not for hybrid nets)
-
Variable elimination example
(network: X1 and X2 are parents of X3; X3 is the parent of X4)
p(X4) = ∫_{X1,X2,X3} P(X1, X2, X3, X4)
      = ∫_{X1,X2,X3} P(X1) P(X2) P(X3|X1,X2) P(X4|X3)
      = ∫_{X1} P(X1) ∫_{X2} P(X2) ∫_{X3} P(X3|X1,X2) P(X4|X3)
Need a way to represent intermediate factors. Not Gaussian (eg: conditional probabilities are not jointly Gaussian).
Need elimination, product, etc. on this representation
-
Canonical forms (KF Handout Def’n 13.2.1)
Definition: a canonical form over X is
C(X; K, h, g) = exp(−½ XᵀKX + hᵀX + g)
Also written C(K, h, g) when the scope is clear.
-
Canonical forms and Gaussians (KF Handout 13.2.1)
Canonical forms can represent Gaussians: expanding N(µ, Σ) and matching terms gives
K = Σ⁻¹, h = Σ⁻¹µ, g = −½ µᵀΣ⁻¹µ − log((2π)^{n/2} |Σ|^{1/2})
So: N(µ, Σ) = C(Σ⁻¹, Σ⁻¹µ, g)
-
Canonical forms and Gaussians (KF Handout 13.2.1)
Canonical forms can represent:
Gaussians
other things (when K⁻¹ is not defined)
eg: linear Gaussian CPDs
Can also use conditional forms (multivariate linear Gaussian P(X|Y)) to represent linear Gaussian CPDs or Gaussians.
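The Gaussian-to-canonical correspondence can be sketched in code (hypothetical helper name; assumes Σ is invertible, which is exactly what fails for linear Gaussian CPDs):

```python
import numpy as np

def gaussian_to_canonical(mu, Sigma):
    """Express N(mu, Sigma) as a canonical form C(K, h, g), where
    C(x; K, h, g) = exp(-1/2 x^T K x + h^T x + g).
    Requires Sigma invertible; linear Gaussian CPDs correspond to
    canonical forms whose K has no inverse."""
    mu = np.asarray(mu, float)
    Sigma = np.asarray(Sigma, float)
    n = len(mu)
    K = np.linalg.inv(Sigma)          # K = Sigma^-1
    h = K @ mu                        # h = Sigma^-1 mu
    # g absorbs the Gaussian's normalization constant
    g = -0.5 * mu @ h - 0.5 * np.log((2 * np.pi) ** n * np.linalg.det(Sigma))
    return K, h, g
```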
-
Operations on canonical forms (KF Handout 13.2.2)
Factor product (over the same scope):
C(K1, h1, g1) · C(K2, h2, g2) = C(K1 + K2, h1 + h2, g1 + g2)
When scopes don't overlap, must extend them first.
Product of C(X; K1, h1, g1) and C(Y; K2, h2, g2):
1st: extend the first to scope (X, Y) by padding K1 and h1 with zero blocks for Y
similarly for the second (zero blocks for X)
product: add the extended K's and h's, and the g's, as above
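A minimal sketch of the product with scope extension, assuming scopes are lists of variable names (all names here are illustrative):

```python
import numpy as np

def extend(K, h, scope, union):
    """Pad K and h with zero blocks so the canonical form is over `union`."""
    idx = [union.index(v) for v in scope]
    K_big = np.zeros((len(union), len(union)))
    h_big = np.zeros(len(union))
    K_big[np.ix_(idx, idx)] = K
    h_big[idx] = h
    return K_big, h_big

def product(scope1, K1, h1, g1, scope2, K2, h2, g2):
    """Product of canonical forms: extend both to the union scope,
    then add componentwise: C(K1+K2, h1+h2, g1+g2)."""
    union = list(dict.fromkeys(scope1 + scope2))  # ordered union of scopes
    K1e, h1e = extend(np.asarray(K1, float), np.asarray(h1, float), scope1, union)
    K2e, h2e = extend(np.asarray(K2, float), np.asarray(h2, float), scope2, union)
    return union, K1e + K2e, h1e + h2e, g1 + g2
```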
-
Operations on canonical forms (KF Handout 13.2.2)
Factor division (for belief‐update message passing):
C(K1, h1, g1) / C(K2, h2, g2) = C(K1 − K2, h1 − h2, g1 − g2)
Note: multiplying or dividing by the vacuous canonical form C(0, 0, 0) has no effect.
-
Operations on canonical forms (KF Handout 13.2.2)
Marginalization:
given C(K, h, g) over the set of variables {X, Y}
want C(K′, h′, g′) over X with C(X; K′, h′, g′) = ∫_Y C(X, Y; K, h, g) dY
require K_YY positive definite so that the integral is finite
marginal:
K′ = K_XX − K_XY K_YY⁻¹ K_YX
h′ = h_X − K_XY K_YY⁻¹ h_Y
g′ = g + ½ (p log(2π) − log|K_YY| + h_Yᵀ K_YY⁻¹ h_Y), where p = dim(Y)
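The marginalization formulas above can be sketched as follows (the helper name is my own; variables are addressed by index into the canonical form's scope):

```python
import numpy as np

def marginalize(K, h, g, keep, drop):
    """Integrate the canonical form C(K, h, g) over the variables indexed
    by `drop`, keeping those indexed by `keep`. K[drop][:, drop] must be
    positive definite for the integral to be finite."""
    K = np.asarray(K, float)
    h = np.asarray(h, float)
    Kxx = K[np.ix_(keep, keep)]
    Kxy = K[np.ix_(keep, drop)]
    Kyy = K[np.ix_(drop, drop)]
    hy = h[drop]
    Kyy_inv = np.linalg.inv(Kyy)
    # Schur complement gives the marginal's precision block
    K_new = Kxx - Kxy @ Kyy_inv @ Kxy.T
    h_new = h[keep] - Kxy @ Kyy_inv @ hy
    g_new = g + 0.5 * (len(drop) * np.log(2 * np.pi)
                       - np.log(np.linalg.det(Kyy))
                       + hy @ Kyy_inv @ hy)
    return K_new, h_new, g_new
```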
-
Operations on canonical forms (KF Handout 13.2.2)
Conditioning:
given C(K, h, g) over the set of variables {X, Y}
want to condition on Y = y
‐> C(K′, h′, g′) over X with
K′ = K_XX, h′ = h_X − K_XY y, g′ = g + h_Yᵀ y − ½ yᵀ K_YY y
Notice: Y no longer part of canonical form after conditioning (unlike tables).
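Conditioning can be sketched the same way (illustrative helper; indices address the canonical form's variables). As the slide notes, the observed variables drop out of the scope entirely:

```python
import numpy as np

def condition(K, h, g, keep, obs_idx, y):
    """Reduce C(K, h, g) on evidence: the variables at obs_idx are
    observed as y. The result is a canonical form over `keep` only."""
    K = np.asarray(K, float)
    h = np.asarray(h, float)
    y = np.asarray(y, float)
    Kxx = K[np.ix_(keep, keep)]
    Kxy = K[np.ix_(keep, obs_idx)]
    Kyy = K[np.ix_(obs_idx, obs_idx)]
    # Evidence folds into the linear and constant terms
    h_new = h[keep] - Kxy @ y
    g_new = g + h[obs_idx] @ y - 0.5 * y @ Kyy @ y
    return Kxx, h_new, g_new
```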
-
Inference on linear Gaussian Bayesian nets (KF Handout 13.2.3)
Factor operations: simple, closed form
‐> Variable elimination
‐> Sum‐product message passing
‐> Belief‐update message passing
Note on conditioning: conditioned variables disappear from the canonical form,
unlike with factor reduction on table factors
‐> must restrict all factors relevant to inference based on the evidence Y = y before doing inference
-
Inference on linear Gaussian Bayesian nets (KF Handout 13.2.3)
Computational performance: canonical form operations are polynomial in the factor scope size n
product & division O(n2)
marginalization ‐> matrix inversion ≤ O(n3)
‐> inference in LGBNs: linear in # cliques, cubic in max. clique size
for discrete networks
factor operations on table factors exponential in scope size
-
Inference on linear Gaussian Bayesian nets (KF Handout 13.2.3)
Computational performance (cont'd):
‐ for low dimensionality (small # variables), the Gaussian representation can be more efficient
‐ for high dimensionality and low treewidth, message passing on the LGBN is much more efficient
-
Summary
Inference on linear Gaussian Bayesian nets: use canonical forms
variable elimination or clique tree calibration
exact
efficient
-
Outline
Inference in purely continuous nets
Hybrid network semantics
Inference in hybrid networks
-
Hybrid networks (KF 5.5.1)
Hybrid networks combine discrete and continuous variables
-
Conditional linear Gaussian (CLG) models (KF 5.1)
Definition:
Given: continuous variable X with
discrete parents D = {D1, …, Dm}
continuous parents Y = {Y1, …, Yk}
X has a conditional linear Gaussian CPD
if for each assignment d to D,
∃ coefficients β_{d,0}, …, β_{d,k} and variance σ²_d
such that
p(X | d, y) = N(β_{d,0} + Σᵢ β_{d,i} yᵢ ; σ²_d)
-
Conditional linear Gaussian (CLG) models (KF 5.1)
Definition:
A Bayesian network is a
conditional linear Gaussian network
if:
• discrete nodes have only discrete parents
• continuous nodes have conditional linear Gaussian CPDs
‐ continuous parents cannot have discrete children.
‐ the joint over the continuous variables is a mixture (weighted average) of Gaussians
weight = probability of the discrete assignment
-
CLG example
(network: Country, Gender, and Height are parents of Weight)
Weight is CLG with continuous parent Height and discrete parents Country and Gender:
p(W | h, c, g) = N(β_{c,g,0} + β_{c,g,1} h ; σ²_{c,g})
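A sketch of sampling from this CLG CPD: the discrete assignment selects the coefficients, which are then applied to the continuous parent. The coefficient values below are made up for illustration, not taken from the slides:

```python
import numpy as np

# Hypothetical parameters: (country, gender) -> (beta0, beta1, sigma2)
params = {("Canada", "F"): (-50.0, 0.65, 20.0),
          ("Canada", "M"): (-60.0, 0.75, 25.0)}

def sample_weight(country, gender, height, params, rng=None):
    """Sample from p(W | h, c, g) = N(beta0 + beta1 * h; sigma2),
    looking up the linear coefficients by the discrete assignment."""
    rng = rng or np.random.default_rng(0)
    beta0, beta1, sigma2 = params[(country, gender)]
    return rng.normal(beta0 + beta1 * height, np.sqrt(sigma2))
```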
-
Discrete nodes with continuous parents
Option 1 ‐ hard threshold: eg: continuous X ‐> discrete Y
Y = 0 if X is below some threshold, Y = 1 otherwise
-
Linear sigmoid (Logistic or soft threshold)
p(Y = 1 | x) = exp(θᵀx) / (1 + exp(θᵀx))
(plot: P(Y = 1 | x) versus x, an S‐shaped curve)
-
Multivariate logit
Eg: stock trading: buy (red), hold (green), sell (blue) as a function of stock price, with
l_buy = −3(price − 18)
l_hold = 1
l_sell = 3(price − 22)
and P(trade = k | price) ∝ exp(l_k(price))
(plot: P(trade | price) versus Price; network: Price ‐> Trade)
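The three linear functions above, pushed through the multivariate logit (a softmax over the l_k), can be sketched as:

```python
import numpy as np

def trade_probs(price):
    """Multivariate logit for the stock-trading example:
    P(trade = k | price) proportional to exp(l_k(price))."""
    l = np.array([-3.0 * (price - 18.0),   # l_buy
                  1.0,                      # l_hold
                  3.0 * (price - 22.0)])    # l_sell
    e = np.exp(l - l.max())                 # subtract max for numerical stability
    return dict(zip(["buy", "hold", "sell"], e / e.sum()))
```

At low prices l_buy dominates, in the middle l_hold wins, and at high prices l_sell takes over, reproducing the three colored curves.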
-
Discrete node with discrete & continuous parents
The continuous parents' input is filtered through a multivariate logit
The assignment to the discrete parents determines the coefficients for the logit
-
Example hybrid net
stock trade (discrete) = {buy, hold, sell}; parents: price (continuous), strategy (discrete) = {1, 2}
strategy 1 (reddish): l_buy = −3(price − 18), l_hold = 1, l_sell = 3(price − 22)
strategy 2 (blue/green): l_buy = −3(price − 16), l_hold = 1, l_sell = 1(price − 26)
(plot: P(trade | price, strategy) versus Price; network: Price, Strategy ‐> Trade)
-
Outline
Inference in purely continuous nets
Hybrid network semantics
Inference in hybrid networks
Issues:
Non‐linear dependencies in continuous nets
Discrete & continuous nodes: CLGs
General hybrid networks
-
Variable elimination example (Handout Example 13.1.1)
Discrete D1 … Dn; continuous X1 … Xn
p(D1 … Dn, X1 … Xn) = (∏_{i=1..n} p(Di)) p(X1|D1) ∏_{i=2..n} p(Xi|Di, Xi−1)
p(X2) = Σ_{D1,D2} ∫_{X1} p(D1, D2, X1, X2)
      = Σ_{D1,D2} ∫_{X1} p(D1) p(D2) p(X1|D1) p(X2|D2, X1)
      = Σ_{D2} p(D2) ∫_{X1} p(X2|D2, X1) Σ_{D1} p(X1|D1) p(D1)
‐> simple in principle (but see next slide)
-
Difficulties with inference in hybrid nets
1. must restrict the representation (i.e. the factors); this is implicit in the choice to use CLGs, for example
2. marginalization is difficult with arbitrary hybrid nets, especially with non‐linear dependencies among nodes
continuous parent ‐> discrete node requires non‐linearity!
3. intermediate factors hard to represent / work with
eg: mixture of Gaussians from conditional linear Gaussian (CLG) representation
‐> approximation necessary with hybrid nets
-
Difficult marginalization (KF Handout Example 13.1.3)
(network: Y ‐> X)
P(Y) = N(0; 1)
P(X|Y) = N(Y²; 1)   (X is non‐linear in Y)
p(x, y) = (1/Z) exp(−y² − (x − y²)²)
p(x) = ∫_y (1/Z) exp(−y² − (x − y²)²)
‐> No analytic (closed form) solution!
(plots: the joint and the resulting non‐Gaussian marginal)
-
Variable elimination example (Handout Example 13.1.2)
Discrete binary D1 … Dn; continuous X1 … Xn. Want P(X2).
p(X1|d1) = N(β_{1,d1}; σ²_{1,d1})
p(Xi|di, xi−1) = N(β_{i,di} + α_{i,di} xi−1; σ²_{i,di})
P(X1, X2) is a mixture of four Gaussians, one per assignment to {D1, D2}.
Can show P(X2) is also a mixture of four Gaussians: not trivial to represent and work with.
-
Discretization (KF Handout 13.1.3)
What about discretizing continuous variables?
Usually no: typically need a fine‐grained representation of continuous X,
i.e. a large # of bins,
especially where P(X) is large
but we need inference to find where P(X) is large in order to discretize efficiently,
which defeats the purpose
‐> # bins usually excessively huge, AND table factors suffer from the curse of dimensionality
exponential in |Val(X)|
-
Summary
Inference in hybrid networks
Difficulties with variable elimination:
from non‐linear dependencies
‐> non‐Gaussian intermediate factors
from mixing discrete & continuous variables
‐> mixtures of Gaussians
General approach = approximate difficult intermediate factors with Gaussians
-
Outline
Inference in purely continuous nets
Hybrid network semantics
Inference in hybrid networks
Issues:
Non‐linear dependencies in continuous nets
Discrete & continuous nodes: CLGs
General hybrid networks
-
Approximating intermediate factors in VE (KF Handout 13.3.1)
General approach: during variable elimination, when a difficult intermediate factor is encountered, approximate it with a Gaussian
BUT Gaussians cannot represent:
conditional distributions (CPDs)
general (unnormalized) factors
‐> must make sure to approximate only valid distributions with Gaussians
eg: to eliminate X from P(X|Y), must first multiply in a factor P(Y) to give P(X, Y)
‐> CPDs must be multiplied into factors in a topological ordering,
i.e. an ordering with parents always before children
-
Example (KF Handout Example 13.3.2)
Cliques: C1 = {X, Y, Z}, C2 = {Z, W}. Want P(Z | W = w1).
Variable elimination:
Step 0: initialize all cliques to the vacuous canonical form C(0, 0, 0)
i.e. the initial potentials are not the product of the initial factors
‐> C1's initial factors: P(X), P(Y), P(Z|X,Y)
-
Example ‐ cont’d (KF Handout Example 13.3.2)
Cliques: C1 = {X, Y, Z}, C2 = {Z, W}. Want P(Z | W = w1).
Variable elimination:
Step 1: linearize P(X), i.e. approximate it with a Gaussian
represent it as a canonical form
then multiply it into C1's potential (C(0, 0, 0) initially)
Step 2: same for P(Y)
(could equally do P(Y) in step 1, then P(X))
‐> C1's potential = P̂(X, Y)
-
Example ‐ cont’d (KF Handout Example 13.3.2)
Cliques: C1 = {X, Y, Z}, C2 = {Z, W}. Want P(Z | W = w1).
Variable elimination: C1 has P̂(X, Y)
Step 3:
estimate P̂(X, Y, Z) ≈ P(X, Y, Z) = P(X, Y) P(Z|X, Y), with P̂(X, Y, Z) ~ N(·)
(represented as a canonical form)
Note: P̂(X, Y) P(Z|X, Y) is a valid distribution
eliminate X, Y: P̂(Z) = ∫_{X,Y} P̂(X, Y, Z)
pass P̂(Z) as the message to C2
-
Example ‐ cont’d (KF Handout Example 13.3.2)
Cliques: C1 = {X, Y, Z}, C2 = {Z, W}. Want P(Z | W = w1).
Variable elimination: C2 has P̂(Z)
Step 4:
estimate P̂(W, Z) ≈ P(W, Z) = P(Z) P(W|Z), with P̂(W, Z) ~ N(·)
(represented as a canonical form)
Note: P̂(Z) P(W|Z) is a valid distribution
Step 5:
set W = w1, giving P̂(W = w1, Z)
pass the message to C1 (a canonical form)
Step 6:
P̂(Z|W = w1) = P̂(W = w1, Z) / P̂(Z)
(division by the earlier message P̂(Z), as in belief‐update message passing)
-
Definition (KF Handout Def’n 13.3.1)
Definition: A clique tree T with a root clique Cr allows topological incorporation if, for any variable X, the clique to which X's CPD is assigned is upstream of, or equal to, the cliques to which X's parents' CPDs are assigned.
-
Approximating with Gaussians (KF Handout 13.3.2, 13.3.3)
Local approximations:
Taylor series
Numerical integration
Global approximation
-
Outline
Inference in purely continuous nets
Hybrid network semantics
Inference in hybrid networks
Issues:
Non‐linear dependencies in continuous nets
Discrete & continuous nodes: CLGs
General hybrid networks
-
Inference in general hybrid nets (KF Handout 13.4.1)
NP‐hard, even for polytrees:
mixture of exponentially many Gaussians
(one per assignment to the discrete variables)
eg: 2ⁿ assignments for n binary variables
even in the easiest case:
continuous nodes with at most one discrete binary parent,
i.e. a mixture of at most two Gaussians
even for the easiest approximate inference,
on discrete binary nodes with relative error
-
Canonical tables (KF Handout Def’n 13.4.3)
Definition:
A canonical table ϕ over discrete D and continuous X has entries ϕ(d):
one per assignment D = d
entry ϕ(d) = canonical form C(X; K_d, h_d, g_d)
Can represent:
table factors
linear Gaussians
CLGs
-
Canonical table example
(network: Country, Gender, and Height are parents of Weight)
discrete: Country, Gender; continuous: Height, Weight
Female Male
Canada C(KCan,F,hCan,F,gCan,F) C(KCan,M,hCan,M,gCan,M)
USA C(KUSA,F,hUSA,F,gUSA,F) C(KUSA,M,hUSA,M,gUSA,M)
China C(KChi,F,hChi,F,gChi,F) C(KChi,M,hChi,M,gChi,M)
India C(KInd,F,hInd,F,gInd,F) C(KInd,M,hInd,M,gInd,M)
Germany C(KGer,F,hGer,F,gGer,F) C(KGer,M,hGer,M,gGer,M)
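A canonical table like the one above can be sketched as a dictionary from discrete assignments to (K, h, g) triples; the parameter values here are placeholders, not fitted numbers:

```python
import numpy as np

# One canonical form C(K, h, g) per assignment of the discrete parents
# (country, gender); the continuous scope is (height, weight).
canonical_table = {
    ("Canada", "F"): (np.eye(2), np.zeros(2), 0.0),
    ("Canada", "M"): (np.eye(2), np.zeros(2), 0.0),
    ("USA", "F"): (np.eye(2), np.zeros(2), 0.0),
    ("USA", "M"): (np.eye(2), np.zeros(2), 0.0),
}

def table_product(t1, t2):
    """Product of canonical tables over the same discrete scope:
    multiply entrywise, i.e. add (K, h, g) componentwise."""
    return {d: (t1[d][0] + t2[d][0],
                t1[d][1] + t2[d][1],
                t1[d][2] + t2[d][2])
            for d in t1}
```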
-
Operations on canonical tables (KF Handout 13.4.2.1)
Extensions of canonical form operations:
Product
Division
Marginalization over continuous variables
Marginalization over discrete variables
‐> factor not necessarily representable with a canonical table
‐> approximate with Gaussians (in the form of a canonical table) whenever marginalizing
(see next slide)
-
Marginalization example (KF Handout 13.4.5)
Binary D, continuous X; canonical table with one entry per value of D
Two Gaussians (blue, green); red: their sum (marginalization over D)
‐> not Gaussian!
cannot be represented by a canonical table (see next slide)
-
Marginalization example ‐ cont’d (KF Handout 13.4.5)
Binary D, continuous X; canonical table with one entry per value of D
Two Gaussians (blue, green); red: Gaussian approximation to the sum of blue and green
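This Gaussian approximation by moment matching can be sketched for a one-dimensional mixture (the helper name is my own; the matched Gaussian keeps the mixture's exact first two moments):

```python
import numpy as np

def weak_marginal(weights, means, variances):
    """Collapse a 1-D mixture of Gaussians to a single Gaussian by
    moment matching, as in the weak marginalization step used when a
    discrete variable is summed out of a canonical table."""
    w = np.asarray(weights, float)
    mu = np.asarray(means, float)
    var = np.asarray(variances, float)
    m = w @ mu                           # overall mean
    v = w @ (var + mu ** 2) - m ** 2     # law of total variance
    return m, v
```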
-
Marginalization on canonical tables (KF Handout 13.4.2.1)
Weak marginalization:
approximate the marginal as a Gaussian
necessary when marginalizing across a mixture of Gaussians
Note: the canonical table MUST represent a valid mixture
Strong marginalization:
exact; applies when:
marginalizing out continuous variables only
the factor is over discrete variables only
the entries are identical canonical forms
-
Inference in hybrid nets (KF Handout 13.4.2.2)
Cannot always marginalize out discrete variables
‐> must restrict the elimination order
KF Handout Example 13.4.10: A, B, C discrete; X, Y, Z continuous
possible clique tree:
neither leaf clique can start message passing
eg: {B, X, Y} has CPDs for P(B), P(Y|B,X) but not P(X)
‐> the canonical forms over {X, Y} are linear Gaussian CPDs, not Gaussians ‐> cannot marginalize out B
-
Strong rooted clique trees
Definition: A clique Cr in a clique tree is a strong root if, for each clique C1 and its upstream neighbour C2,
C1 − C2 ⊆ {continuous variables}, or
C1 ∩ C2 ⊆ {discrete variables}
In a strongly rooted clique tree, the upward pass toward the strong root does not require any weak marginalization;
in the downward pass, all required factors are present for weak marginalization to proceed.
Example ‐ strongly rooted clique tree (from the example on the previous slide):
the middle clique is the strong root
-
Strong root
Sometimes there exist non‐strongly‐rooted clique trees that still allow inference
(example: refer to the example two slides previous)
Also, there is the issue of building strongly rooted trees: see KF Handout 13.4.2.4
-
Outline
Inference in purely continuous nets
Hybrid network semantics
Inference in hybrid networks
Issues:
Non‐linear dependencies in continuous nets
Discrete & continuous nodes: CLGs
General hybrid networks
-
Inference in general hybrid nets (KF Handout 13.4.3)
Two issues:
non‐linear dependencies
intermediate factors
‐> marginalization on canonical tables ‐> a non‐canonical tabular factor
solution: approximate with Gaussians (in the form of canonical tables)
‐> applies to both issues, as discussed above
‐> allows discrete nodes with continuous parents, eg: can model a thermostat
-
Approximate methods
Above, discussed variable‐elimination‐based methods
Also:
particle based (KF Handout 13.5)
global approximate methods