Factorial Mixture of Gaussians and the Marginal Independence Model
Ricardo Silva (silva@statslab.cam.ac.uk)
Joint work-in-progress with Zoubin Ghahramani
Goal
• To model sparse distributions subject to marginal independence constraints
Why?
(diagram: latent variables X1 and X2 with observed children Y1, Y2, Y3, Y4, Y5)
Why?
(diagram: latents X1, X2, X3 with children Y1, Y2, Y3, plus hidden common causes H12 and H23 shared by neighbouring pairs)
Why?
How?
(diagram: latents X1, X2, X3 with children Y1, Y2, Y3)
How?
Context
• Yi = fi(X, Y) + Ei, where Ei is an error term
• E is not a vector of independent variables
• Assumed: a sparse structure of marginally dependent/independent variables
• Goal: estimating E-like distributions
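To make the goal concrete, here is a hedged sketch of an "E-like" error vector whose sparse marginal structure is known in advance. The bi-directed graph Y1 ↔ Y2 ↔ Y3 and all numbers are invented for illustration, not taken from the talk:

```python
import numpy as np

# Hedged sketch: an error vector E with a sparse *marginal* covariance.
# Zeros in Sigma encode the marginal independences of the assumed
# bi-directed graph Y1 <-> Y2 <-> Y3 (no Y1 <-> Y3 edge).
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.4],
                  [0.0, 0.4, 1.0]])
assert np.all(np.linalg.eigvalsh(Sigma) > 0)  # a valid covariance

rng = np.random.default_rng(0)
E = rng.multivariate_normal(np.zeros(3), Sigma, size=50_000)
S = np.cov(E, rowvar=False)
print(np.round(S, 2))  # the (1, 3) entry is ~0: E1 and E3 marginally independent
```

The modeling problem of the talk is the converse: estimate such a distribution while keeping the prescribed zeros exact.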
Why not latent variable models?
• Requires further decisions: how many latents? which children?
• Silly when the marginal structure is sparse
• In the Bayesian case:
– they drag down MCMC methods with (sometimes much) extra autocorrelation
– they require priors over parameters that you didn't even care about in the first place
• (Note: this talk is not about Bayesian inference)
Example
Example
Bi-directed models: The story so far
• Gaussian models
– Maximum likelihood (Drton and Richardson, 2003)
– Bayesian inference (Silva and Ghahramani, 2006, 2008)
• Binary models
– Maximum likelihood (Drton and Richardson, 2008)
New model: mixture of Gaussians
• Latent variables: mixture indicators
– The number of levels is assumed to be decided elsewhere
• No “real” latent variables
Caveat emptor
• I think you should buy this, but be warned that speed of computation is not the primary concern of this talk
Simple?
(diagram: a single mixture indicator C with children Y1, Y2, Y3)

Y1, Y2, Y3 jointly Gaussian with a sparse covariance matrix Σc indexed by C
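A hedged sketch of this "simple" construction, with one global indicator C selecting an entire sparse covariance matrix. Both components below are invented; each has a zero in the (1, 3) entry, and with zero component means that marginal zero survives mixing:

```python
import numpy as np

# Assumed toy model: C picks one of two sparse covariance matrices.
# Every Sigma_c has Sigma_c[0, 2] = 0 (Y1, Y3 marginally independent
# within each component).
Sigmas = [np.array([[1.0, 0.6, 0.0],
                    [0.6, 1.0, 0.3],
                    [0.0, 0.3, 1.0]]),
          np.array([[2.0, -0.4, 0.0],
                    [-0.4, 1.5, 0.7],
                    [0.0, 0.7, 2.0]])]
pi = np.array([0.3, 0.7])
rng = np.random.default_rng(1)

def sample(n):
    c = rng.choice(2, size=n, p=pi)
    Y = np.empty((n, 3))
    for k in (0, 1):
        m = c == k
        Y[m] = rng.multivariate_normal(np.zeros(3), Sigmas[k], size=int(m.sum()))
    return Y

Y = sample(20_000)
# With zero component means, cov(Y) = sum_c pi_c * Sigma_c,
# so the (1, 3) entry of the marginal covariance stays ~0.
print(np.round(np.cov(Y, rowvar=False), 2))
```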
Not really
(diagram: the same single indicator C over Y1, Y2, Y3)
Required: a factorial mixture of Gaussians
(diagram: one mixture indicator per clique: c1, c2, c3)
A parameterization under latent variables
Assume Z variables are zero-mean Gaussians, C variables are binary
Implied indexing
σij should be indexed only by those cliques containing both Yi and Yj
Implied indexing
μi should be indexed only by those cliques containing Yi
Factorial mixture of Gaussians and the marginal independence model
• The general case for all latent structures
• Constraints in the indexing: σij^c = σij^c′ whenever c and c′ agree on the clique indicators in the intersection of the cliques containing Yi and Yj
Factorial mixture of Gaussians and the marginal independence model
• The parameter pool (besides mixture probs.):
– {μi^{c[i]}}, the mean vector
– {σii^{c[i]}}, the variance vector
– {σij^{c[ij]}}, the covariance vector
• For Yi and Yj linked in the bi-directed graph (since covariance is zero otherwise): marginal independence constraints
• Given c, we assemble the corresponding mean and covariance
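To make the clique-wise indexing concrete, here is a hypothetical assembly for the chain Y1 ↔ Y2 ↔ Y3 (cliques {Y1, Y2} and {Y2, Y3}, binary indicators c12, c23); all numeric values are invented:

```python
import numpy as np

# Each entry is indexed only by the indicators of the cliques it lives in:
# sigma_12 by c12 only, sigma_23 by c23 only, sigma_13 = 0 (missing edge),
# and Y2's mean/variance by both indicators.
mu1 = {0: 0.0, 1: 1.0}                                      # by c12
mu3 = {0: -1.0, 1: 2.0}                                     # by c23
mu2 = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): -0.5, (1, 1): 1.5}  # by (c12, c23)
s11 = {0: 1.0, 1: 2.0}
s33 = {0: 1.5, 1: 1.0}
s22 = {(0, 0): 1.0, (0, 1): 1.2, (1, 0): 2.0, (1, 1): 1.1}
s12 = {0: 0.4, 1: -0.6}                                     # by c12 only
s23 = {0: 0.3, 1: 0.5}                                      # by c23 only

def assemble(c12, c23):
    """Given a configuration c, assemble the component mean and covariance."""
    mu = np.array([mu1[c12], mu2[(c12, c23)], mu3[c23]])
    Sigma = np.array([[s11[c12], s12[c12],       0.0],
                      [s12[c12], s22[(c12, c23)], s23[c23]],
                      [0.0,      s23[c23],       s33[c23]]])
    return mu, Sigma

mu, Sigma = assemble(1, 0)
print(mu)
print(Sigma)  # sigma_13 = 0 in every configuration
```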
Size of the parameter space
• Let L[ij] be the size of the largest clique intersection, and L[i] that of the largest clique
• Let k be the maximum number of values any mixture indicator can take
• Let p be the number of variables, e the number of edges in bi-directed graph
• Total number of parameters:
O(e·k^L[ij] + p·k^L[i])
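A quick sanity check of the count on an assumed toy case (the chain graph again, not from the slides):

```python
# Y1 <-> Y2 <-> Y3: p = 3 variables, e = 2 edges,
# cliques {Y1, Y2} and {Y2, Y3}, binary indicators (k = 2).
cliques = [{1, 2}, {2, 3}]
k = 2

def n_cliques_with(i):
    # cliques indexing the mean/variance of Yi
    return sum(1 for Q in cliques if i in Q)

def n_cliques_with_both(i, j):
    # cliques indexing the covariance of an edge Yi <-> Yj
    return sum(1 for Q in cliques if i in Q and j in Q)

# one mean and one variance per configuration of Yi's clique indicators
mean_var_params = sum(2 * k ** n_cliques_with(i) for i in (1, 2, 3))
# one covariance per configuration of the shared indicators, per edge
cov_params = sum(k ** n_cliques_with_both(i, j) for i, j in [(1, 2), (2, 3)])
print(mean_var_params, cov_params)  # 16 4
```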
Size of the parameter space
• Notice this is not a simple function of sparseness:
– there is a dependence on the number of clique intersections
– non-sparse models can have few cliques
• In decomposable models, number of clique intersections is given by the branch factor of the junction tree
Maximum likelihood estimation
• An EM framework
Algorithms
• First, solving the exact problem (exponential in the number of cliques)
• Constraints:
– positive definiteness constraints
– marginal independence constraints
• Gradient-based methods:
– moving in all dimensions is quite unstable (violates constraints)
– move over a subset while keeping the rest fixed
Iterative conditional fitting: Gaussian case
• See Drton and Richardson (2003)
• Choose some Yi
• Fix the covariance matrix of Y\i
• Fit the covariances of Yi with Y\i, and its variance
• Marginal independence constraints introduced directly
Iterative conditional fitting: Gaussian case

(diagram: bi-directed graph Y1 ↔ Y2 ↔ Y3; the covariance matrix is partitioned into the fixed block for {Y1, Y2}, with entries σ11, σ12, σ22, and the row being fitted for Y3, with entries σ13, σ23, σ33)

Writing Y3 = b13·Y1 + b23·Y2 + ε3, the missing edge Y1 ↔ Y3 forces σ31 = 0, i.e. the linear constraint b13·σ11 + b23·σ12 = 0, while σ32 = f(b13, b23, σ12, σ22) remains free.
Iterative conditional fitting: Gaussian case

(diagram: Y1 ↔ Y2 ↔ Y3, with Y3 fitted through the residual R2.1)

Y3 = b23·R2.1 + ε3, where R2.1 is the residual of the regression of Y2 on Y1; since R2.1 is uncorrelated with Y1, σ31 = 0 holds by construction for any b23.
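A numerical check of this residual trick (coefficients and sample size are arbitrary choices for the sketch):

```python
import numpy as np

# Regressing Y3 on R2.1, the residual of Y2 on Y1, keeps cov(Y1, Y3) = 0
# automatically: R2.1 is uncorrelated with Y1 by construction, so the
# marginal independence constraint never has to be imposed explicitly.
rng = np.random.default_rng(2)
n = 100_000
Y1 = rng.normal(size=n)
Y2 = 0.5 * Y1 + rng.normal(size=n)

beta = np.cov(Y2, Y1)[0, 1] / np.var(Y1)  # regression of Y2 on Y1
R21 = Y2 - beta * Y1                       # residual, uncorrelated with Y1

b23 = 0.8                                  # any value is feasible
Y3 = b23 * R21 + rng.normal(size=n)

print(round(float(np.cov(Y1, Y3)[0, 1]), 3))  # ~0: constraint holds for free
```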
How does it change in the mixture of Gaussians case?
(diagram: Y1 ↔ Y2 ↔ Y3 with clique indicators C12 and C23; the same fitting step now has to hold across the configurations of the indicators)
Parameter expansion
• Yi = b1^c·R1 + b2^c·R2 + ... + εi^c
• Here c does vary over all mixture indicators
• That is, we create an exponential number of parameters (exponential in the number of cliques)
• Where do the constraints go?
Parameter constraints
• Equality constraints are back:

b1^c·R1j^c + b2^c·R2j^c + ... + bk^c·Rkj^c = b1^c′·R1j^c′ + b2^c′·R2j^c′ + ... + bk^c′·Rkj^c′

• Similar constraints for the variances
Parameter constraints
• The variances γ^c of the error terms ε^c have to be positive
• Positive definiteness of every Σ(c) is then guaranteed (Schur complement)
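The Schur-complement argument can be checked directly: with the block for Y\i fixed and positive definite, the full matrix is positive definite iff the residual variance (the Schur complement) is positive. A toy verification with made-up numbers:

```python
import numpy as np

def pd_via_schur(Sigma_rest, sigma_i, sigma_ii):
    # Schur complement: sigma_ii - sigma_i' Sigma_rest^{-1} sigma_i,
    # i.e. the residual variance gamma of Y_i given the rest.
    gamma = sigma_ii - sigma_i @ np.linalg.solve(Sigma_rest, sigma_i)
    return bool(gamma > 0)

Sigma_rest = np.array([[1.0, 0.5],
                       [0.5, 1.0]])   # fixed, positive definite
sigma_i = np.array([0.3, 0.4])        # covariances of Y_i with the rest

for sigma_ii in (1.0, 0.05):
    full = np.block([[Sigma_rest, sigma_i[:, None]],
                     [sigma_i[None, :], np.array([[sigma_ii]])]])
    brute = bool(np.all(np.linalg.eigvalsh(full) > 0))
    print(sigma_ii, pd_via_schur(Sigma_rest, sigma_i, sigma_ii) == brute)
```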
Constrained EM
• Maximize the expected conditional of Yi given everybody else, subject to:
– an exponential number of constraints
– an exponential number of parameters
– box constraints on γ
• What does this buy us?
Removing parameters
• Covariance equality constraints are linear:

b1^c·R1j^c + b2^c·R2j^c + ... + bk^c·Rkj^c = b1^c′·R1j^c′ + b2^c′·R2j^c′ + ... + bk^c′·Rkj^c′

• Even a naive approach can work:
– choose a basis for b (e.g., one bij^c corresponding to each non-zero σij^{c[ij]})
– the basis is of tractable size (under sparseness)
– rewrite the EM function as a function of the basis only
Quadratic constraints
• Equality of variances introduces quadratic constraints tying the γs and the bs
• Proposed solution:
– fix all σii^{c[i]} first
– fit the bijs with those parameters fixed (inequality constraints, non-convex optimization)
– then fit the γs given b
– the number of parameters is back to tractable
• There is always an exponential number of constraints
• Note: the reparameterization also takes an exponential number of steps
Relaxed optimization

• Optimization is still expensive. What to do?
• Relaxed approach: fit the bs ignoring the variance equalities
– Fix γ, fit b
• a quadratic program with linear equality constraints; I just solve it in closed form
• the γs end up inconsistent: project them back to the solution space without changing b (always possible), which may decrease the expected log-likelihood
– Fit γ given the bs
• nonlinear programming, trivial constraints
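The "fix γ, fit b" step is an equality-constrained quadratic program, which indeed has a closed-form solution via the KKT system; the Q, q, A below are toy stand-ins, not the model's actual expected-likelihood terms:

```python
import numpy as np

# Minimize (1/2) b'Qb - q'b  subject to  A b = 0
# (A plays the role of the linear covariance-equality constraints).
def eq_qp(Q, q, A):
    m = A.shape[0]
    # KKT system: [Q A'; A 0] [b; lam] = [q; 0]
    KKT = np.block([[Q, A.T],
                    [A, np.zeros((m, m))]])
    sol = np.linalg.solve(KKT, np.concatenate([q, np.zeros(m)]))
    return sol[:Q.shape[0]]

Q = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.5, 0.2],
              [0.0, 0.2, 1.0]])        # positive definite toy Hessian
q = np.array([1.0, -0.5, 0.3])
A = np.array([[1.0, -1.0, 0.0]])       # toy constraint: b1 = b2

b = eq_qp(Q, q, A)
print(np.round(b, 3), bool(np.allclose(A @ b, 0)))
```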
Recap
• Iterative conditional fitting: maximize expected conditional log-likelihood
• Transform to the other parameter space
– Exact algorithm: quadratic inequality constraints “instead” of semidefinite ones
– Relaxed algorithm: no constraints
• No constraints?
Approximations
• Taking expectations is expensive. What to do?
• Standard approximations use a “nice” approximate posterior over c, e.g. mean-field methods
• Not enough!
A simple approach
• The Budgeted Variational Approximation
• As simple as it gets: maximize a variational bound that forces most configurations of c to give a zero value to the approximate posterior, up to a pre-fixed budget
• How to choose which values?
• This guarantees positive definiteness only of those Σ(c) with non-zero approximate posterior
– For predictions, project the matrices into the PD cone first
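Projecting an unconstrained Σ(c) into the PD cone can be done by eigenvalue clipping (the nearest PSD matrix in Frobenius norm, then nudged to strict positive definiteness); a sketch with an invented indefinite matrix:

```python
import numpy as np

def project_psd(S, eps=1e-8):
    # Symmetrize, then clip eigenvalues at eps: the Frobenius-nearest
    # PSD matrix, pushed slightly into the strictly PD cone.
    S = (S + S.T) / 2
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.clip(w, eps, None)) @ V.T

S = np.array([[1.0, 0.9, 0.0],
              [0.9, 0.2, 0.4],
              [0.0, 0.4, 1.0]])        # indefinite (negative determinant)
P = project_psd(S)
print(bool(np.all(np.linalg.eigvalsh(P) > 0)))  # True
```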
Under construction
• Implementation and evaluation of algorithms
– (Lesson #1: never, ever, ever use MATLAB quadratic programming methods)
• Control for overfitting– Regularization terms
• The first non-Gaussian directed graphical model fully closed under marginalization?
Under construction: Bayesian methods
• Prior: a product of experts for covariance and variance entries (times a BIG indicator function)
• MCMC method: an M-H proposal based on the relaxed fitting algorithm
– Is that going to work well?
• The problem is “doubly-intractable”
– not because of a partition function, but because of constraints
– are there any analogues to methods such as Murray/Ghahramani/MacKay’s?
Thank You