
What is it?

When would you use it?

Why does it work?

How do you implement it?

Where does it stand in relation to other methods?

EM algorithm reading group

Introduction & Motivation

Theory

Practical

Comparison with other methods

Expectation Maximization (EM)

• Iterative method for parameter estimation where you have missing data

• Has two steps: Expectation (E) and Maximization (M)

• Applicable to a wide range of problems

• Old idea (late 1950s) but formalized by Dempster, Laird and Rubin in 1977

• Subject of much investigation; see the McLachlan & Krishnan (1997) book

Applications of EM (1)

• Fitting mixture models

Applications of EM (2)

• Probabilistic Latent Semantic Analysis (pLSA): a technique from the text community

p(w, d) = p(d) \sum_z p(w \mid z) \, p(z \mid d)

[Figure: graphical models for pLSA, with a latent topic variable z linking each document d to its words w]

Applications of EM (3)

• Learning parts and structure models

Applications of EM (4)

• Automatic segmentation of layers in video

http://www.psi.toronto.edu/images/figures/cutouts_vid.gif

Motivating example

[Figure: one-dimensional data points scattered along the real line, forming two clusters]

Data: observed points x = \{x_1, \ldots, x_N\}

Model: mixture of Gaussians

Parameters: \theta = \{\pi_c, \mu_c, \sigma_c\}, \; c = 1, \ldots, C

OBJECTIVE: Fit mixture of Gaussian model with C = 2 components, keeping the mixing weights \pi_c and variances \sigma_c^2 fixed, i.e. only estimate the means \mu_c

p(x \mid \theta) = \sum_{c=1}^{C} \pi_c \, \mathcal{N}(x \mid \mu_c, \sigma_c^2)

where \mathcal{N}(x \mid \mu, \sigma^2) is a Gaussian density with mean \mu and variance \sigma^2
Likelihood function

The likelihood is a function of the parameters; the probability is a function of the random variable x. (Different to the last plot.)
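For the mixture model in the motivating example, the log-likelihood being discussed can be written out explicitly; this is a reconstruction rather than the slide's own rendering, assuming N i.i.d. points:

\ell(\theta; x) = \log p(x \mid \theta) = \sum_{n=1}^{N} \log \sum_{c=1}^{C} \pi_c \, \mathcal{N}(x_n \mid \mu_c, \sigma_c^2)

The log of a sum over components does not decouple, which is what makes direct maximization awkward and motivates the latent labels introduced next.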

Probabilistic model

Imagine the model generating the data.

Need to introduce a label, z, for each data point.

The label is called a latent variable (also called hidden, unobserved, or missing).

[Figure: the same 1D data points, now coloured by their component label z_n = c]

Simplifies the problem: if we knew the labels, we could decouple the components and estimate the parameters separately for each one.
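To make the decoupling concrete, the complete-data log-likelihood with known labels can be written as (a reconstruction in the notation of the motivating example):

\log p(x, z \mid \theta) = \sum_{n=1}^{N} \Big[ \log \pi_{z_n} + \log \mathcal{N}(x_n \mid \mu_{z_n}, \sigma_{z_n}^2) \Big]

Each point contributes only to the component named by its label, so with known labels each \mu_c could be fitted from "its" points alone.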

Intuition of EM

E-step: compute a distribution over the labels of the points, using the current parameters.

M-step: update the parameters using the current guess of the label distribution.

[Figures: the fitted components after alternating E and M steps on the 1D data]
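A minimal sketch of this E/M alternation for the motivating example: 1D data, C = 2 components, mixing weights and standard deviations held fixed so only the means are updated. The function name and the example data below are my own, not from the slides:

import numpy as np

def em_two_gaussians(x, mu, pi=(0.5, 0.5), sigma=(1.0, 1.0), n_iters=50):
    """EM for a 1D mixture of two Gaussians, updating only the means.

    x     : (N,) observed data points
    mu    : (2,) initial guesses for the component means
    pi    : fixed mixing weights
    sigma : fixed component standard deviations
    """
    x = np.asarray(x, dtype=float)
    mu = np.array(mu, dtype=float)
    pi = np.asarray(pi, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    for _ in range(n_iters):
        # E-step: responsibilities r[c, n] = p(z_n = c | x_n, current params)
        dens = np.exp(-0.5 * ((x[None, :] - mu[:, None]) / sigma[:, None]) ** 2)
        dens /= np.sqrt(2 * np.pi) * sigma[:, None]
        r = pi[:, None] * dens
        r /= r.sum(axis=0, keepdims=True)      # columns sum to 1

        # M-step: each new mean is a responsibility-weighted average of the data
        mu = (r * x[None, :]).sum(axis=1) / r.sum(axis=1)

    return mu, r

# Example: two clusters around -2 and +3
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])
means, resp = em_two_gaussians(data, mu=[-1.0, 1.0])
print(means)   # should end up near [-2, 3]

The same loop structure carries over to the general case; only the E and M updates (given on the later slides) change when the weights and variances are also estimated.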

Theory

Complete log-likelihood (CLL)

Log-likelihood [Incomplete log-likelihood (ILL)]

Expected complete log-likelihood (ECLL)

Some definitions

Observed data: x = \{x_1, \ldots, x_N\}, continuous, i.i.d.

Latent variables: z = \{z_1, \ldots, z_N\}, discrete, taking values 1, \ldots, C

Iteration index: t
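In this notation the three quantities named above can be written out; a reconstruction of the slides' formulas, with q(z) denoting a distribution over the labels:

Complete log-likelihood (CLL):  \ell_c(\theta; x, z) = \log p(x, z \mid \theta)

Incomplete log-likelihood (ILL):  \ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)

Expected complete log-likelihood (ECLL):  \langle \ell_c(\theta) \rangle_q = \sum_z q(z) \log p(x, z \mid \theta)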

Use Jensen's inequality to obtain a lower bound on the log-likelihood: the AUXILIARY FUNCTION.
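Concretely, for any distribution q over the latent variables (again a reconstruction in the same notation):

\ell(\theta; x) = \log \sum_z q(z) \frac{p(x, z \mid \theta)}{q(z)} \;\ge\; \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)} \;=\; F(q, \theta)

F(q, \theta) is the auxiliary function; the inequality is Jensen's, applied to the concave log.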

Jensen's Inequality

Jensen's inequality: for a real continuous concave function f and weights \alpha_i \ge 0 with \sum_i \alpha_i = 1,

f\Big(\sum_i \alpha_i x_i\Big) \;\ge\; \sum_i \alpha_i f(x_i)

Equality holds when all the x_i are the same.

1. Definition of concavity: consider \alpha \in [0, 1]; then

f(\alpha x_1 + (1 - \alpha) x_2) \;\ge\; \alpha f(x_1) + (1 - \alpha) f(x_2)

2. By induction: the two-point inequality extends to any convex combination of n points.

Recall the key result: the auxiliary function is a LOWER BOUND on the likelihood.

EM is alternating ascent

Alternately improve q, then \theta.

This is guaranteed to improve the likelihood itself.
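In symbols (reconstructed, using the auxiliary function F defined above):

E-step:  q^{t+1} = \arg\max_q F(q, \theta^t)

M-step:  \theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta)

\ell(\theta^{t+1}; x) \;\ge\; F(q^{t+1}, \theta^{t+1}) \;\ge\; F(q^{t+1}, \theta^t) \;=\; \ell(\theta^t; x)

The last equality holds because the E-step makes the bound tight, as the next slide explains.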

E-step: choosing the optimal q(z \mid x, \theta)

It turns out that q(z \mid x, \theta) = p(z \mid x, \theta^t) is the best choice.
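One way to see why, using a standard identity rather than the slide's own derivation: the gap between the log-likelihood and the bound is a KL divergence,

\ell(\theta; x) - F(q, \theta) = \mathrm{KL}\big( q(z) \,\|\, p(z \mid x, \theta) \big) \;\ge\; 0

so F is maximized over q, and the bound touches the log-likelihood, exactly when q is the posterior p(z \mid x, \theta^t).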

E-step: What do we actually compute?

nComponents x nPoints matrix of responsibilities (columns sum to 1):

              Point 1   Point 2   ...   Point 6
Component 1   r_{11}    r_{12}    ...   r_{16}
Component 2   r_{21}    r_{22}    ...   r_{26}

Responsibility of component c for point n, using the current parameters:

r_{cn} = p(z_n = c \mid x_n, \theta^t) = \frac{\pi_c \, \mathcal{N}(x_n \mid \mu_c, \sigma_c^2)}{\sum_{c'=1}^{C} \pi_{c'} \, \mathcal{N}(x_n \mid \mu_{c'}, \sigma_{c'}^2)}

M-Step

The auxiliary function separates into the ECLL and an entropy term:

F(q, \theta) = \underbrace{\sum_z q(z) \log p(x, z \mid \theta)}_{\text{ECLL}} \;+\; \underbrace{\Big(-\sum_z q(z) \log q(z)\Big)}_{\text{entropy term}}

The entropy term does not depend on \theta, so the M-step only needs to maximize the ECLL.

M-Step

From the previous slide: maximize the ECLL with respect to \theta.

Recall the definition of the ECLL; for the mixture model it becomes

\langle \ell_c(\theta) \rangle_q = \sum_{n=1}^{N} \sum_{c=1}^{C} r_{cn} \big[ \log \pi_c + \log \mathcal{N}(x_n \mid \mu_c, \sigma_c^2) \big]

where the responsibilities r_{cn} come from the E-step.

Let's see what happens for the means \mu_c.
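Setting the derivative with respect to \mu_c to zero (a reconstruction for the motivating example, where only the means are free):

\frac{\partial}{\partial \mu_c} \langle \ell_c(\theta) \rangle_q = \sum_{n=1}^{N} r_{cn} \, \frac{x_n - \mu_c}{\sigma_c^2} = 0 \quad\Longrightarrow\quad \mu_c = \frac{\sum_n r_{cn} \, x_n}{\sum_n r_{cn}}

Each new mean is a responsibility-weighted average of the data, which is exactly the M-step used in the earlier code sketch.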

Practical

Initialization: mean of the data + random offset; k-means

Termination: max # iterations; log-likelihood change; parameter change

Convergence: local maxima; annealed methods (DAEM); birth/death of components (SMEM)

Numerical issues: a single point can give infinite likelihood; inject noise into the covariance matrix to prevent blow-up (see the sketch after this list)

Number of components: open problem; minimum description length; Bayesian approach
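A small sketch of the kind of guard meant by the "numerical issues" point, assuming a 1D M-step variance update; the function name and the floor value are my own choices, not from the slides:

import numpy as np

def m_step_variance(x, resp, mu, var_floor=1e-6):
    """Responsibility-weighted variance update with a floor.

    x         : (N,) data points
    resp      : (N,) responsibilities of one component for each point
    mu        : that component's updated mean
    var_floor : lower bound on the variance, so a component collapsing onto
                a single point cannot drive the likelihood to infinity
    """
    weight = resp.sum()
    var = float(resp @ (x - mu) ** 2) / max(weight, 1e-12)
    return max(var, var_floor)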

Practical issues

Local minima

Robustness of EM

What EM won’t do

Pick the structure of the model: # components, graph structure

Find the global maximum

Always have nice closed-form updates: you may have to optimize numerically within the E/M step

Avoid computational problems: you may need sampling methods for computing expectations

Comparison with other methods

Why not use standard optimization methods? In favour of EM:

• No step size

• Works directly in the model's parameter space, so parameter constraints are obeyed

• Fits naturally into the graphical model framework

• Supposedly faster

[Figures: convergence comparison of gradient ascent, Newton's method, and EM]

Acknowledgements

Shameless stealing of figures and equations and explanations from:

Frank Dellaert, Michael Jordan, Yair Weiss
