Expectation Maximization: A “Gentle” Introduction

Scott Morris, Department of Computer Science


Page 1: Expectation Maximization A “Gentle” Introduction

Expectation Maximization: A “Gentle” Introduction

Scott Morris

Department of Computer Science

Page 2: Expectation Maximization A “Gentle” Introduction

Basic Premise

• Given a set of observed data, X, what is the underlying model that produced X?
– Example: distributions – Gaussian, Poisson, Uniform

• Assume we know (or can intuit) what type of model produced the data

• Model has m parameters (Θ1..Θm)
– Parameters are unknown; we would like to estimate them

Page 3: Expectation Maximization A “Gentle” Introduction

Maximum Likelihood Estimators (MLE)

• P(Θ|X) = probability that a given set of parameters is “correct”?

• Instead, define the “likelihood” of the parameters given the data, L(Θ|X)

• What if the data are continuous? (see the note below)
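A clarifying note added here (not in the original slides): for discrete data the likelihood is just the probability of the data under the parameters, and for continuous data the probability density takes its place. For i.i.d. samples x1..xN:

L(Θ|X) = P(X|Θ) = p(x1|Θ) p(x2|Θ) … p(xN|Θ)

so the continuous case poses no real difficulty; the likelihood is a product of densities rather than probabilities.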

Page 4: Expectation Maximization A “Gentle” Introduction

MLE continued

• We are solving an optimization problem

• Often maximize the log of the likelihood instead.
– Why is this the same? (log is monotonically increasing, so the same Θ maximizes both; a short sketch follows below)

• Any method that maximizes the likelihood function is called a Maximum Likelihood Estimator
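A minimal sketch, not from the slides, of MLE by numerically maximizing the log-likelihood of i.i.d. Gaussian data (Python; the toy data and helper name neg_log_likelihood are illustrative assumptions). The numerically found parameters should agree with the closed-form sample statistics:

import numpy as np
from scipy.optimize import minimize

# Toy data: i.i.d. samples from a Gaussian with unknown mean and standard deviation.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_log_likelihood(theta, data):
    # Negative log-likelihood of a Gaussian; theta = (mu, log_sigma).
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)   # parameterize by log(sigma) so sigma stays positive
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (data - mu)**2 / (2 * sigma**2))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# The Gaussian MLE also has a closed form: the sample mean and (biased) sample std.
print(mu_hat, x.mean())      # these should agree closely
print(sigma_hat, x.std())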

Page 5: Expectation Maximization A “Gentle” Introduction

Simple Example: Least Squares Fit

• Input: N points in R^2
• Model: a single line, y = ax + b
– Parameters: a, b
• Origin? Maximum Likelihood Estimator (see the sketch below)
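A minimal sketch, not from the slides, of the least squares fit and its MLE origin: under i.i.d. Gaussian noise on y, minimizing the squared residuals is the same as maximizing the likelihood, so the least-squares solution is the MLE of (a, b). The toy data below is an illustrative assumption:

import numpy as np

# Toy data: points near the line y = 3x + 1, with Gaussian noise on y.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=100)

# Least-squares solution for (a, b) in y = ax + b.
# Minimizing sum_i (y_i - a*x_i - b)^2 is, up to constants, maximizing the
# Gaussian log-likelihood of the residuals, so this is also the MLE of (a, b).
A = np.column_stack([x, np.ones_like(x)])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a_hat, b_hat)   # should be close to 3 and 1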

Page 6: Expectation Maximization A “Gentle” Introduction

Expectation Maximization

• An elaborate technique for maximizing the likelihood function

• Often used when the observed data is incomplete
– Due to problems in the observation process
– Due to unknown or difficult distribution function(s)

• Iterative process
• Still a local technique

Page 7: Expectation Maximization A “Gentle” Introduction

EM likelihood function

• Observed data X; assume missing data Y.

• Let Z be the complete data
– Joint density function:
– P(z|Θ) = p(x,y|Θ) = p(y|x,Θ) p(x|Θ)

• Define a new likelihood function: L(Θ|Z) = p(X,Y|Θ)

• X, Θ are constants, so L() is a random variable dependent on the random variable Y.

Page 8: Expectation Maximization A “Gentle” Introduction

“E” Step of EM Algorithm

• Since L(Θ|Z) is itself a random variable, we can compute its expected value (written out below):

• Can be thought of as computing the expected value of Y given the current estimate of Θ.
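The formula on this slide did not survive the transcript; the standard E-step quantity it refers to is the expected complete-data log-likelihood, often written Q, taken over Y given X and the current parameter estimate Θ(t):

Q(Θ | Θ(t)) = E[ log p(X, Y | Θ) | X, Θ(t) ] = Σy log p(X, y | Θ) p(y | X, Θ(t))

(the sum becomes an integral when Y is continuous).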

Page 9: Expectation Maximization A “Gentle” Introduction

“M” Step of EM Algorithm

Once we have the expectation computed, optimize Θ using the MLE.

Convergence – Various results proving convergence are cited.

Generalized EM – Instead of finding the optimal Θ, choose one that increases the likelihood (see the note below).
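In symbols (added here; not in the transcript), the M step chooses

Θ(t+1) = argmax over Θ of Q(Θ | Θ(t))

while Generalized EM only requires Q(Θ(t+1) | Θ(t)) ≥ Q(Θ(t) | Θ(t)), i.e., any Θ that improves the expected complete-data log-likelihood; either choice never decreases the observed-data likelihood from one iteration to the next.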

Page 10: Expectation Maximization A “Gentle” Introduction

Mixture Models

• Assume a “mixture” of probability distributions (the standard form is written out below):

• The log-likelihood function is difficult to optimize, so use a trick:
– Assume unobserved data items Y whose values inform us which distribution generated each item in X.
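The mixture density on this slide was an image and is missing from the transcript; the standard form it refers to is a weighted sum of M component densities,

p(x | Θ) = α1 p1(x | θ1) + … + αM pM(x | θM),  with αi ≥ 0 and α1 + … + αM = 1,

where the αi are the mixing weights and each pi is a component distribution (e.g. a Gaussian).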

Page 11: Expectation Maximization A “Gentle” Introduction

Update Equations

• After much derivation, estimates for the new parameters in terms of the old ones result:
– Θ = (μ, Σ)

• Where μ is the mean and Σ is the covariance of a d-dimensional normal distribution (a sketch of the standard Gaussian-mixture updates follows below)
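The update equations themselves were images on the original slides and are not in the transcript. As a hedged illustration of what they typically look like for a Gaussian mixture, here is a minimal sketch of one EM iteration in Python (the function name em_step and the array shapes are assumptions, not from the slides):

import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs):
    # One EM iteration for a Gaussian mixture model.
    # X: (N, d) data; weights: (K,); means: (K, d); covs: (K, d, d)
    N, d = X.shape
    K = len(weights)

    # E step: responsibilities r[n, k] = p(component k | x_n, current parameters)
    r = np.zeros((N, K))
    for k in range(K):
        r[:, k] = weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
    r /= r.sum(axis=1, keepdims=True)

    # M step: re-estimate parameters from the responsibilities
    Nk = r.sum(axis=0)                      # effective number of points per component
    new_weights = Nk / N                    # alpha_k = N_k / N
    new_means = (r.T @ X) / Nk[:, None]     # mu_k = sum_n r_nk x_n / N_k
    new_covs = np.zeros_like(covs)
    for k in range(K):
        diff = X - new_means[k]
        new_covs[k] = (r[:, k, None] * diff).T @ diff / Nk[k]   # Sigma_k
    return new_weights, new_means, new_covs

In practice one would initialize the parameters (e.g. from k-means or random assignments) and call em_step repeatedly until the observed-data log-likelihood stops improving.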