Mixture of Gaussians Models
Outline
● Inference, Learning, and Maximum Likelihood
● Why Mixtures? Why Gaussians?
● Building up to the Mixture of Gaussians
○ Single Gaussians
○ Fully-Observed Mixtures
○ Hidden Mixtures
Perception Involves Inference and Learning
● Must infer the hidden causes, α, of sensory data, x
○ Sensory data: air pressure wave frequency composition, patterns of electromagnetic radiation
○ Hidden causes: proverbial tigers in bushes, lecture slides, sentences
● Must learn the correct model for the relationship between hidden causes and sensory data
○ Models will be parameterized, with parameters θ
○ We will use quality of prediction as our figure of merit
Generative Models
(Diagram: hidden causes α generate data x — going from causes to data is explanation or prediction; going from data back to causes is inference)
Maximum Likelihood and Maximum a Posteriori
● The model parameters θ (or hidden causes α) that make the data most probable are called the maximum likelihood parameters (or causes)
INFERENCE → finding the most probable hidden causes α given the data x and the parameters θ
LEARNING → finding the most probable parameters θ given the data x
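The equations the two arrows point to are not preserved in this transcript; in standard notation they are presumably:

$$\text{Inference:}\quad \hat{\alpha} = \arg\max_{\alpha}\; p(\alpha \mid x;\, \theta) \qquad\qquad \text{Learning:}\quad \hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta}\; p(x \mid \theta)$$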
In practice, we maximize log-likelihoods
● Taking logs doesn’t change the answer
● Logs turn multiplication into addition
● Logs turn many natural operations on probabilities into linear algebra operations
● Negative log probabilities arise naturally in information theory
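For i.i.d. data, "multiplication becomes addition" is just:

$$\log p(x_1, \dots, x_N \mid \theta) \;=\; \log \prod_{i=1}^{N} p(x_i \mid \theta) \;=\; \sum_{i=1}^{N} \log p(x_i \mid \theta)$$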
The Maximum Likelihood Answer Depends on Model Class
Outline
● Inference, Learning, and Maximum Likelihood
● Why Mixtures? Why Gaussians?
● Building up to the Mixture of Gaussians
○ Single Gaussians
○ Fully-Observed Mixtures
○ Hidden Mixtures
Why Mixtures?
What is a Mixture Model?
DATA →
LIKELIHOOD →
PRIOR →
● This is precisely analogous to using a basis to approximate a vector
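The DATA / LIKELIHOOD / PRIOR equations the arrows point to are not preserved here; a standard way to write a mixture model over components α with mixture weights w_α is:

$$p(\alpha) = w_\alpha, \qquad \sum_\alpha w_\alpha = 1, \qquad p(x \mid \theta) = \sum_\alpha w_\alpha\, p(x \mid \alpha, \theta)$$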
Example Mixture Datasets
● Mixtures of Uniforms
● Mixtures of Gaussians
● Spike-and-Gaussian Mixtures
Why Gaussians?
Why Gaussians?
(Figure source: Wikipedia)
Why Gaussians? An unhelpfully terse answer.
● Gaussians satisfy a particular differential equation (a candidate form is sketched after these bullets)
● From this differential equation, all the properties of the Gaussian family can be derived without solving for the explicit form.
○ Gaussians are isotropic, the Fourier transform of a Gaussian is a Gaussian, the sum of Gaussian RVs is Gaussian, Central Limit Theorem
● See this blogpost for details: http://bit.ly/gaussian-diff-eq
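The equation itself did not survive the transcript; a standard candidate is the first-order ODE satisfied by the Gaussian density with mean μ and variance σ²:

$$\frac{d f(x)}{dx} = -\frac{x - \mu}{\sigma^2}\, f(x)$$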
Why Gaussians?
● Gaussians are everywhere, thanks to the Central Limit Theorem
● Gaussians are the maximum entropy distribution with a given center (mean) and spread (std dev)
● Inference on Gaussians is linear algebra
Central Limit Theorem
● Statistics: adding up many independent random variables with finite variances results in an approximately Gaussian distribution
● Science: if we assume that many small, independent random factors produce the noise in our results, we should see a Gaussian distribution
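As a quick illustration (not from the slides), summing independent uniform draws already looks Gaussian after a handful of terms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum n_terms independent Uniform(0, 1) variables, many times over.
# By the CLT, the standardized sums approach a standard normal.
n_terms, n_samples = 12, 100_000
sums = rng.random((n_samples, n_terms)).sum(axis=1)

# Standardize: Uniform(0, 1) has mean 1/2 and variance 1/12.
z = (sums - n_terms * 0.5) / np.sqrt(n_terms / 12.0)

print("sample mean (should be near 0):", z.mean())
print("sample std  (should be near 1):", z.std())
print("P(|z| < 1.96) (should be near 0.95):", np.mean(np.abs(z) < 1.96))
```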
Central Limit Theorem in Action
Central Limit Theorem in Action
Why Gaussians?
● Gaussians are everywhere, thanks to the Central Limit Theorem
● Gaussians are the maximum entropy distribution with a given center (mean) and spread (std dev)
● Inference on Gaussians is linear algebra
Gaussians are a natural MAXENT distribution
● The principle of maximum entropy (MAXENT) will be covered in detail later
● Teaser: MAXENT maps statistics of data to probability distributions in a principled, faithful manner
● For the most common choice of statistics, mean ± s.d., the MAXENT distribution is a Gaussian
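Concretely (a standard result, not shown in the transcript): among all densities with mean μ and variance σ², the differential entropy

$$H[p] = -\int p(x)\,\log p(x)\,dx$$

is maximized by the Gaussian N(μ, σ²), which achieves H = ½ log(2πeσ²).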
Why Gaussians?
● Gaussians are everywhere, thanks to the Central Limit Theorem
● Gaussians are the maximum entropy distribution with a given center (mean) and spread (std dev)
● Inference on Gaussians is linear algebra
Inference with Gaussians is “just” linear algebra
● The log-probabilities of a Gaussian are a negative-definite quadratic form
● Quadratic forms can be mapped onto matrices
● So solving an inference problem becomes solving a linear algebra problem
● Linear algebra is the Scottie Pippen of mathematics
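The quadratic form in question is the standard multivariate Gaussian log-density:

$$\log p(x \mid \mu, \Sigma) = -\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu) - \tfrac{1}{2}\log\left|2\pi\Sigma\right|$$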
Outline
● Inference, Learning, and Maximum Likelihood
● Why Mixtures? Why Gaussians?
● Building up to the Mixture of Gaussians
○ Single Gaussians
○ Fully-Observed Mixtures
○ Hidden Mixtures
What is a Gaussian Mixture Model?
Example
DATA →
LIKELIHOOD →
PRIOR →
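The equations next to the arrows are not preserved; for a Gaussian mixture with components α ∈ {1, …, K} they are presumably:

$$p(\alpha) = w_\alpha, \qquad p(x \mid \alpha;\, \theta) = \mathcal{N}(x;\, \mu_\alpha, \Sigma_\alpha), \qquad p(x \mid \theta) = \sum_{\alpha=1}^{K} w_\alpha\, \mathcal{N}(x;\, \mu_\alpha, \Sigma_\alpha)$$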
Maximum Likelihood for Gaussian Mixture Models
Plan of Attack:
1. ML for a single Gaussian
2. ML for a fully-observed mixture
3. ML for a hidden mixture
Maximum Likelihood for a Single Gaussian
Maximum Likelihood for a Single Gaussian
Maximum Likelihood for a Single Gaussian
By a similar argument:
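The derivations on these slides are not in the transcript; the resulting maximum likelihood estimates (standard results) are the empirical mean and covariance:

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^{\top}$$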
Maximum Likelihood for Gaussian Mixture Models
Plan of Attack:
1. ML for a single Gaussian
2. ML for a fully-observed mixture
3. ML for a hidden mixture
Maximum Likelihood for Fully-Observed Mixture
● “Observed Mixture” means we receive datapoints (x, α).
● Examples: classification (discrete), regression (continuous)
(Figure: “Cats” and “Dogs” clusters, source: Memming Wordpress blog)
Maximum Likelihood for Fully-Observed Mixture● For each mixture element, the problem is exactly the same - what are the
parameters of a single Gaussian?
● Because we know which mixture each data point came from, we can solve all these problems separately, using the same method as for a single Gaussian.
● How do we figure out the mixture weights w?
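The answer shown on the slide is not preserved; the standard maximum likelihood result is the empirical fraction of points carrying each label:

$$\hat{w}_\alpha = \frac{N_\alpha}{N}, \qquad N_\alpha = \#\{\,i : \alpha_i = \alpha\,\}$$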
Bonus: We Can Now Classify Unlabeled Datapoints
● We can label new datapoints x with a corresponding α using our model
● This is the key idea behind supervised learning approaches in general.
● How do we label them?
○ Maximum likelihood method: find the closest mean (in z-score units); that’s our label
○ Fully Bayesian method: maintain a distribution over the labels, p(α | x; θ)
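A minimal sketch of the max-likelihood labeling rule (nearest mean in z-score units, assuming per-dimension standard deviations, i.e., diagonal Gaussians); the function and array names are mine, not the slides’:

```python
import numpy as np

def ml_label(x, means, stds):
    """Assign x to the component whose mean is nearest in z-score units.

    means, stds: arrays of shape (K, D) holding each component's
    per-dimension mean and standard deviation.
    """
    z = (x[None, :] - means) / stds        # (K, D) z-scores of x under each component
    dists = np.linalg.norm(z, axis=1)      # distance to each component mean
    return int(np.argmin(dists))

# Toy "cats vs. dogs" usage: component 0 = cats, 1 = dogs (illustrative numbers only).
means = np.array([[4.0], [30.0]])   # e.g., body weight in kg
stds  = np.array([[1.0], [8.0]])
print(ml_label(np.array([6.0]), means, stds))   # -> 0 (closer to the cat cluster)
```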
(Figure: “Cats” and “Dogs” clusters, source: Memming Wordpress blog)
Maximum Likelihood for Gaussian Mixture Models
Plan of Attack:
1. ML for a single Gaussian
2. ML for a fully-observed mixture
3. ML for a hidden mixture
Hidden Variables Example: Spike Sorting
Martinez, Quian Quiroga, et al., 2009, Journal of Neuroscience Methods
Maximum Likelihood for Models with Hidden Variables
● p(x | μ, Σ, α) is the same, but now we don’t have the labels α.
● Problem: if we had the labels, we could find the parameters (just as before), and if we had the parameters, we could compute the labels (again, just as before). It’s a chicken-and-egg problem!
● Solution: let’s just “make-believe” we have the parameters.
Our Clustering Algorithm on Spike Sorting
(Figure source: Wikipedia)
(Figure source: Wikipedia)
Our Clustering Algorithm on Spike Sorting
The K-Means Algorithm
1. Make up K values for the means of the clusters
○ Usually initialized randomly
2. Assign datapoints to clusters
○ Each datapoint is assigned to the nearest cluster
3. Update the cluster means to the new empirical means
4. Repeat steps 2-3 until convergence.
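A minimal sketch of the algorithm above in plain NumPy (function and variable names are mine):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Basic K-means. X: (N, D) data array. Returns (means, labels)."""
    rng = np.random.default_rng(seed)
    # 1. Make up K values for the means (here: K randomly chosen datapoints).
    means = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # 2. Assign each datapoint to the nearest cluster mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # (N, K)
        labels = dists.argmin(axis=1)
        # 3. Update each cluster mean to the empirical mean of its points.
        new_means = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else means[k] for k in range(K)])
        # 4. Repeat until the means stop moving.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```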
Issues with K-Means
1. Cluster assignment step (inference) is not Bayesian
2. Small changes in data can cause big changes in behavior
“Soft” Clustering?
(Figure source: Bishop, Pattern Recognition and Machine Learning)
Expectation-Maximization for Means
1. Make up K values for the means
2. (E) Infer p(α|x) for each x and α
3. (M) Update the means via weighted average
a. Weight the contribution of datapoint x by p(α|x)
4. Repeat steps 2-3 until convergence.
Full Expectation-Maximization
1. Make up K values for the means, covariances, and mixture weights
2. (E) Infer p(α|x) for each x and α
3. (M) Update the parameters with weighted averages
a. Weight the contribution of datapoint x by p(α|x)
4. Repeat steps 2-3 until convergence.
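A compact sketch of the full EM loop for a Gaussian mixture, using SciPy’s multivariate normal density; names, initialization, and the small covariance regularizer are my choices, not the slides’:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture. X: (N, D). Returns (weights, means, covs, resp)."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    # 1. Make up initial means, covariances, and mixture weights.
    means = X[rng.choice(N, size=K, replace=False)]
    covs = np.stack([np.cov(X.T).reshape(D, D) + 1e-6 * np.eye(D) for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # 2. (E) Infer responsibilities p(alpha | x) for every point and component.
        lik = np.stack([weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
                        for k in range(K)], axis=1)          # (N, K)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # 3. (M) Update parameters with responsibility-weighted averages.
        Nk = resp.sum(axis=0)                                  # effective counts, (K,)
        weights = Nk / N
        means = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return weights, means, covs, resp
```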
E-Step: Bayes’ Rule for Inference
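The slide’s formula is not preserved in the transcript; Bayes’ rule for the responsibilities is presumably:

$$p(\alpha \mid x;\, \theta) = \frac{w_\alpha\, \mathcal{N}(x;\, \mu_\alpha, \Sigma_\alpha)}{\sum_{\alpha'} w_{\alpha'}\, \mathcal{N}(x;\, \mu_{\alpha'}, \Sigma_{\alpha'})}$$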
M-Step: Direct Maximization
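Again the slide’s equations are not in the transcript; writing r_{iα} = p(α | x_i) for the E-step responsibilities, the standard responsibility-weighted updates are:

$$N_\alpha = \sum_i r_{i\alpha}, \qquad \hat{w}_\alpha = \frac{N_\alpha}{N}, \qquad \hat{\mu}_\alpha = \frac{1}{N_\alpha}\sum_i r_{i\alpha}\, x_i, \qquad \hat{\Sigma}_\alpha = \frac{1}{N_\alpha}\sum_i r_{i\alpha}\,(x_i - \hat{\mu}_\alpha)(x_i - \hat{\mu}_\alpha)^{\top}$$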
Bonus Slides: Information Geometry
A Geometric View
(Diagram: the family of model distributions shown as a small region inside the space of all probability distributions)
Geometry of MLE
Family of Model Distributions (here, Gaussians with different means)
All Probability Distributions
Geometry of EM
Family of Model Distributions
All Probability Distributions
Family of Data Labelings
(same p(x), different p(α|x))
Family of Model Distributions
Family of Data Labelings
E-Step: Inference
M-Step: Learning