Introduction to Graphical Models
Introduction to Graphical Models
Brookes Vision Lab Reading Group
Graphical Models
• To build a complex system using simpler parts
• The system should be consistent
• Parts are combined using probability
• Undirected – Markov random fields
• Directed – Bayesian networks
Overview
• Representation
• Inference
• Linear Gaussian models
• Approximate inference
• Learning
Representation
• The running example: a network over Cloudy (C), Sprinkler (S), Rain (R) and WetGrass (W)
• Causality: the sprinkler "causes" the wet grass
Conditional Independence
• A node is independent of its ancestors given its parents
• P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)   (chain rule)
•            = P(C) P(S|C) P(R|C) P(W|S,R)       (using the conditional independences)
• Space required for n binary nodes:
 – O(2^n) without factorization
 – O(n 2^k) with factorization, where k = maximum fan-in
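For the four-node sprinkler network above (binary variables, maximum fan-in k = 2), the full joint table has 2^4 = 16 entries (15 free parameters), whereas the factored form needs only 1 + 2 + 2 + 4 = 9 parameters: one for P(C), two each for P(S|C) and P(R|C), and four for P(W|S,R).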
Inference
• Pr(S=1 | W=1) = Pr(S=1, W=1) / Pr(W=1) = 0.2781 / 0.6471 = 0.430
• Pr(R=1 | W=1) = Pr(R=1, W=1) / Pr(W=1) = 0.4581 / 0.6471 = 0.708
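These numbers can be reproduced by brute-force enumeration of the joint. A minimal sketch, assuming the CPT values of Kevin Murphy's standard sprinkler example (the tables are not shown in this transcript, but these values yield exactly the figures above):

```python
# Inference by brute-force enumeration in the sprinkler network.
# CPT values below are assumed (Kevin Murphy's classic sprinkler example);
# they reproduce the numbers quoted on the slide.
from itertools import product

P_C = {1: 0.5, 0: 0.5}
P_S_given_C = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.1, 0: 0.9}}              # P(S | C)
P_R_given_C = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.8, 0: 0.2}}              # P(R | C)
P_W_given_SR = {(0, 0): 0.0, (1, 0): 0.9, (0, 1): 0.9, (1, 1): 0.99}  # P(W=1 | S, R)

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) from the factored representation."""
    pw1 = P_W_given_SR[(s, r)]
    return P_C[c] * P_S_given_C[c][s] * P_R_given_C[c][r] * (pw1 if w else 1 - pw1)

def posterior(query, evidence):
    """P(query | evidence), by summing the joint over all assignments."""
    num = den = 0.0
    for c, s, r, w in product([0, 1], repeat=4):
        x = {'C': c, 'S': s, 'R': r, 'W': w}
        if all(x[k] == v for k, v in evidence.items()):
            p = joint(c, s, r, w)
            den += p
            if all(x[k] == v for k, v in query.items()):
                num += p
    return num / den

print(posterior({'S': 1}, {'W': 1}))   # ≈ 0.2781 / 0.6471 ≈ 0.430
print(posterior({'R': 1}, {'W': 1}))   # ≈ 0.4581 / 0.6471 ≈ 0.708
```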
Explaining Away
• S and R “compete” to explain W=1
• S and R become dependent once W is observed (conditionally dependent given W)
• Pr(S=1 | R=1, W=1) = 0.1945, compared with Pr(S=1 | W=1) = 0.430
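With the posterior() helper from the enumeration sketch above, the explaining-away effect is just the drop in the sprinkler posterior once rain is also observed:

```python
print(posterior({'S': 1}, {'W': 1}))           # ≈ 0.430
print(posterior({'S': 1}, {'R': 1, 'W': 1}))   # ≈ 0.1945: observing rain "explains away" the sprinkler
```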
Inference
Inference
• Variable elimination
• Choosing the optimal elimination ordering is NP-hard
• Greedy orderings work well in practice
• When computing several marginals, dynamic programming avoids redundant computation
• Sound familiar?
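A minimal sketch of one elimination step on the sprinkler network, reusing the CPT dictionaries assumed in the enumeration example above: C is summed out first into an intermediate factor over (S, R), so the full joint over all four variables is never built.

```python
# One step of variable elimination, reusing P_C, P_S_given_C, P_R_given_C,
# P_W_given_SR from the enumeration sketch above.
# Sum out C first into an intermediate factor over (S, R) ...
phi_SR = {(s, r): sum(P_C[c] * P_S_given_C[c][s] * P_R_given_C[c][r] for c in (0, 1))
          for s in (0, 1) for r in (0, 1)}

# ... then sum out S and R to obtain the marginal P(W=1).
p_w1 = sum(phi_SR[(s, r)] * P_W_given_SR[(s, r)] for s in (0, 1) for r in (0, 1))
print(p_w1)   # ≈ 0.6471, the denominator used on the earlier inference slide
```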
Bayes Balls for Conditional Independence
A Unifying (Re)View
Linear Gaussian Model (LGM) – the basic model, with two families of special cases:
• Continuous-state LGM: factor analysis (FA), SPCA, PCA, linear dynamical systems (LDS)
• Discrete-state LGM: mixture of Gaussians, vector quantization (VQ), hidden Markov models (HMM)
Basic Model
• The state of the system is a k-vector x (unobserved)
• The output of the system is a p-vector y (observed)
• Often k << p
• Basic model:
 – x_{t+1} = A x_t + w
 – y_t = C x_t + v
• A is the k x k transition matrix
• C is the p x k observation matrix
• w ~ N(0, Q), v ~ N(0, R) – zero mean w.l.o.g.
• The noise processes are essential
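A minimal simulation of this generative model; all dimensions and parameter values below are illustrative assumptions rather than values from the slides:

```python
# Simulate the basic linear Gaussian model:
#   x_{t+1} = A x_t + w,  w ~ N(0, Q)
#   y_t     = C x_t + v,  v ~ N(0, R)
import numpy as np

rng = np.random.default_rng(0)
k, p, T = 2, 4, 100                        # state dim, output dim, sequence length
A = np.array([[0.99, -0.1], [0.1, 0.99]])  # k x k transition matrix
C = rng.standard_normal((p, k))            # p x k observation matrix
Q = np.eye(k) * 0.01                       # state noise covariance
R = np.eye(p) * 0.1                        # observation noise covariance

x = np.zeros((T, k))
y = np.zeros((T, p))
x[0] = rng.multivariate_normal(np.zeros(k), np.eye(k))   # x_1 ~ N(mu_1, Q_1); here mu_1 = 0, Q_1 = I
y[0] = C @ x[0] + rng.multivariate_normal(np.zeros(p), R)
for t in range(T - 1):
    x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(k), Q)
    y[t + 1] = C @ x[t + 1] + rng.multivariate_normal(np.zeros(p), R)
```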
Degeneracy in Basic Model
• Structure in Q can be moved into A and C, so w.l.o.g. Q = I
• R cannot be restricted, since the y_t are observed
• Components of x can be reordered arbitrarily; the ordering is fixed by the norms of the columns of C
• x_1 ~ N(µ_1, Q_1)
• A and C are assumed to have rank k
• Q, R and Q_1 are assumed to be of full rank
Probability Computation
• P(x_{t+1} | x_t) = N(x_{t+1}; A x_t, Q)
• P(y_t | x_t) = N(y_t; C x_t, R)
• P({x_1,...,x_T}, {y_1,...,y_T}) = P(x_1) ∏_{t=1}^{T-1} P(x_{t+1} | x_t) ∏_{t=1}^{T} P(y_t | x_t)
• The negative log probability is therefore a sum of quadratic (Mahalanobis) terms
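A sketch of this computation on the sequence simulated above (it reuses x, y, A, C, Q, R, k, T from that snippet and assumes x_1 ~ N(0, I)):

```python
# Negative log probability of the simulated sequence, following the
# factorization P(x_1) * prod_t P(x_{t+1}|x_t) * prod_t P(y_t|x_t).
import numpy as np
from scipy.stats import multivariate_normal

nll = -multivariate_normal.logpdf(x[0], mean=np.zeros(k), cov=np.eye(k))
for t in range(T - 1):
    nll -= multivariate_normal.logpdf(x[t + 1], mean=A @ x[t], cov=Q)
for t in range(T):
    nll -= multivariate_normal.logpdf(y[t], mean=C @ x[t], cov=R)
print(nll)   # each term is a quadratic (Mahalanobis) penalty plus a constant
```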
Inference
• Given the model parameters {A, C, Q, R, µ_1, Q_1}
• Given observations y
• What can be inferred about the hidden states x?
• Total likelihood: P(y_1, ..., y_T)
• Filtering: P(x_t | y_1, ..., y_t)
• Smoothing: P(x_t | y_1, ..., y_T)
• Partial smoothing: P(x_t | y_1, ..., y_{t+t'})
• Partial prediction: P(x_t | y_1, ..., y_{t-t'})
• These quantities appear as intermediate values in the recursive methods for computing the total likelihood
Learning
• Unknown parameters Ө = {A, C, Q, R, µ_1, Q_1}
• Given observations y
• Log-likelihood L(Ө) = log P(y | Ө)
• F(Q, Ө) – the free energy, a lower bound on L(Ө)
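The formula on this slide is not reproduced in the transcript; the standard free-energy decomposition it refers to is

 F(Q, Ө) = E_Q[ log P(x, y | Ө) ] − E_Q[ log Q(x) ]
         = log P(y | Ө) − KL( Q(x) || P(x | y, Ө) ) ≤ log P(y | Ө) = L(Ө),

with equality exactly when Q is the posterior P(x | y, Ө).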
EM Algorithm
• Alternate between maximizing F(Q, Ө) with respect to Q (the E-step) and with respect to Ө (the M-step)
• F = L at the beginning of the M-step, because the E-step sets Q to the exact posterior
• The E-step does not change Ө
• Therefore the likelihood never decreases
Continuous-State LGM
• Static data modelling – no temporal dependence:
 – Factor analysis
 – SPCA
 – PCA
• Time-series modelling – time ordering of the data is crucial:
 – LDS (Kalman filter models)
Static Data Modelling
• A = 0, so x = w
• y = C x + v
• x_1 ~ N(0, Q)
• y ~ N(0, C Q C' + R)
• Degeneracy in the model – R must be restricted
• Learning: EM
• Inference
Factor Analysis
• Restrict R to be diagonal; Q = I
• x – the factors
• C – the factor loading matrix
• R – the uniquenesses
• Learning: EM, or quasi-Newton optimization
• Inference
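A quick illustration on made-up data; scikit-learn's FactorAnalysis fits exactly this model (diagonal R, Q = I), so the loadings and uniquenesses can be recovered directly:

```python
# Factor analysis on synthetic data: y = C x + v, with diagonal R and x ~ N(0, I).
# All dimensions and "true" parameters below are illustrative assumptions.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
k, p, n = 3, 10, 2000
C_true = rng.standard_normal((p, k))          # true factor loadings
R_true = rng.uniform(0.1, 0.5, size=p)        # true uniquenesses (diagonal of R)
X = rng.standard_normal((n, k))               # factors
Y = X @ C_true.T + rng.standard_normal((n, p)) * np.sqrt(R_true)

fa = FactorAnalysis(n_components=k).fit(Y)
print(fa.components_.shape)      # (k, p): estimated loadings (span the same subspace as C_true)
print(fa.noise_variance_[:3])    # estimated uniquenesses, close to R_true[:3]
factors = fa.transform(Y)        # inference: posterior mean of x for each observation
```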
SPCA
• R = εI
• ε – the global noise level
• The columns of C span the principal subspace
• Learning: EM algorithm
• Inference
PCA
• R = lim_{ε→0} εI
• Learning:
 – Diagonalize the sample covariance of the data
 – The leading k eigenvalues and eigenvectors define C
 – EM can find the leading eigenvectors without explicit diagonalization
• Inference:
 – The noise becomes infinitesimal
 – The posterior collapses to a single point
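A sketch of the closed-form learning rule on made-up data (sample covariance eigendecomposition):

```python
# PCA by diagonalizing the sample covariance: the top-k eigenvectors define C.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 1000, 5, 2
Y = rng.standard_normal((n, 3)) @ rng.standard_normal((3, p)) \
    + 0.1 * rng.standard_normal((n, p))      # synthetic data with low-rank structure
Yc = Y - Y.mean(axis=0)                      # zero mean w.l.o.g.

cov = Yc.T @ Yc / n                          # sample covariance
evals, evecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
C = evecs[:, ::-1][:, :k]                    # leading k eigenvectors define C
x_hat = Yc @ C                               # inference: posterior collapses to the projection
```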
Linear Dynamical Systems
• Inference (filtering): Kalman filter
• Smoothing: RTS (Rauch-Tung-Striebel) recursions
• Learning: EM algorithm
 – C known: Shumway and Stoffer, 1982
 – All parameters unknown: Ghahramani and Hinton, 1995
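A minimal sketch of the Kalman filtering recursion (measurement update followed by prediction), using the same parameterization as the basic model above; the RTS smoother and the EM updates are omitted:

```python
# Kalman filter: the filtering recursion for x_{t+1} = A x_t + w, y_t = C x_t + v.
import numpy as np

def kalman_filter(Y, A, C, Q, R, mu1, Q1):
    """Return the filtered means E[x_t | y_1..y_t] for each t."""
    k = A.shape[0]
    mu, V = mu1, Q1                      # prior on x_1
    means = []
    for y in Y:
        # Measurement update: condition on y_t.
        S = C @ V @ C.T + R              # innovation covariance
        K = V @ C.T @ np.linalg.inv(S)   # Kalman gain
        mu = mu + K @ (y - C @ mu)
        V = (np.eye(k) - K @ C) @ V
        means.append(mu)
        # Time update: predict x_{t+1}.
        mu = A @ mu
        V = A @ V @ A.T + Q
    return np.array(means)
```

On the sequence simulated earlier, kalman_filter(y, A, C, Q, R, np.zeros(k), np.eye(k)) returns the filtered state estimates.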
Discrete-State LGM
• x_{t+1} = WTA[ A x_t + w ]
• y_t = C x_t + v
• x_1 = WTA[ N(µ_1, Q_1) ]
• WTA[·] is the winner-take-all nonlinearity: it returns a unit vector with a 1 at the largest component of its argument
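A sketch of this generative process, assuming WTA[·] picks out the largest coordinate as noted above; all parameter values are illustrative:

```python
# Simulate the discrete-state LGM: the hidden state is always a unit basis vector e_j.
import numpy as np

def wta(z):
    """Winner-take-all: unit vector with 1 at the largest coordinate of z."""
    e = np.zeros_like(z)
    e[np.argmax(z)] = 1.0
    return e

rng = np.random.default_rng(0)
k, p, T = 3, 2, 50
A = rng.random((k, k))                    # drives the state transition structure
C = rng.standard_normal((p, k))           # columns of C are the per-state output means
Q, R = np.eye(k) * 0.5, np.eye(p) * 0.1

x = wta(rng.multivariate_normal(np.zeros(k), np.eye(k)))  # x_1 = WTA[N(mu_1, Q_1)], mu_1 = 0 here
Y = []
for t in range(T):
    Y.append(C @ x + rng.multivariate_normal(np.zeros(p), R))  # y_t = C x_t + v
    x = wta(A @ x + rng.multivariate_normal(np.zeros(k), Q))   # x_{t+1} = WTA[A x_t + w]
```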
Discrete-State LGM
• Static data modelling:
 – Mixture of Gaussians
 – VQ
• Time-series modelling:
 – HMM
Static Data Modelling
• A = 0, so x = WTA[w], with w ~ N(µ, Q)
• y = C x + v
• π_j = P(x = e_j)
• A nonzero µ gives nonuniform π_j
• Given x = e_j, y ~ N(C_j, R), where C_j is the j-th column of C
Mixture of Gaussians
• Mixing coefficient of cluster j: π_j
• Means: the columns C_j
• Covariance: R
• Learning: EM (corresponds to maximum-likelihood competitive learning)
• Inference: posterior responsibilities of the clusters
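A quick illustration with scikit-learn's GaussianMixture on made-up data; covariance_type='tied' gives a single shared covariance R across clusters, as in the model above:

```python
# Mixture of Gaussians fitted by EM (illustrative synthetic data).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(200, 2)) for m in (-3, 0, 3)])

gmm = GaussianMixture(n_components=3, covariance_type='tied').fit(X)
print(gmm.weights_)               # mixing coefficients pi_j
print(gmm.means_)                 # cluster means (the columns C_j)
resp = gmm.predict_proba(X[:5])   # inference: posterior responsibilities P(x = e_j | y)
```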
Vector Quantization
• The observation noise becomes infinitesimal
• The inference problem is solved by the 1-nearest-neighbour rule
• Euclidean distance when R is a scaled identity; Mahalanobis distance for general R
• The posterior collapses onto the closest cluster
• Learning with EM = the batch version of k-means
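The corresponding zero-noise limit, with scikit-learn's batch k-means on the same kind of made-up data:

```python
# Vector quantization as the zero-noise limit of the mixture model: batch k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(200, 2)) for m in (-3, 0, 3)])

km = KMeans(n_clusters=3, n_init=10).fit(X)
print(km.cluster_centers_)      # the columns C_j (cluster means / codebook)
codes = km.predict(X[:5])       # inference: hard 1-NN assignment to the closest cluster
```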
Time-series modelling
HMM
• Transition matrix T, with T_{i,j} = P(x_{t+1} = e_j | x_t = e_i)
• For every T, there exist corresponding A and Q
• Filtering: forward recursions
• Smoothing: forward-backward algorithm
• Learning: EM (called Baum-Welch re-estimation)
• MAP state sequences: the Viterbi algorithm
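A sketch of the forward (filtering) recursion, the discrete analogue of the Kalman filter above; T, pi, and the per-state emission density emission_pdf are assumed inputs (emission_pdf(y, j) could be a Gaussian density with mean C_j and covariance R, per the discrete-state LGM):

```python
# HMM filtering by the forward recursion (normalized at each step).
import numpy as np

def forward_filter(Y, T, pi, emission_pdf):
    """Return alpha_t(j) = P(x_t = e_j | y_1..y_t) for each t."""
    n_states = len(pi)
    alpha = pi * np.array([emission_pdf(Y[0], j) for j in range(n_states)])
    alpha /= alpha.sum()
    filtered = [alpha]
    for y in Y[1:]:
        pred = alpha @ T                                                   # predict: P(x_{t+1} = e_j | y_1..y_t)
        alpha = pred * np.array([emission_pdf(y, j) for j in range(n_states)])
        alpha /= alpha.sum()                                               # condition on y_{t+1}
        filtered.append(alpha)
    return np.array(filtered)
```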