Expectation Maximization: A “Gentle” Introduction
DESCRIPTION
Expectation Maximization: A “Gentle” Introduction. Scott Morris, Department of Computer Science. Basic premise: given a set of observed data, X, what is the underlying model that produced X? Example: distributions – Gaussian, Poisson, Uniform.
TRANSCRIPT
Expectation Maximization: A “Gentle” Introduction
Scott Morris
Department of Computer Science
Basic Premise
• Given a set of observed data, X, what is the underlying model that produced X?
– Example: distributions – Gaussian, Poisson, Uniform
• Assume we know (or can intuit) what type of model produced the data
• Model has m parameters (Θ1..Θm)
– Parameters are unknown; we would like to estimate them
Maximum Likelihood Estimators (MLE)
• P(Θ|X) = probability that a given set of parameters is “correct”?
• Instead, define the “likelihood” of the parameters given the data, L(Θ|X)
• What if the data is continuous? Then use the probability density in place of the probability.
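As a sketch of the standard definition (notation mine, not the slide's): for N independent observations the likelihood is the probability of the data, or its density in the continuous case, viewed as a function of the parameters:

L(\Theta \mid X) = p(X \mid \Theta) = \prod_{i=1}^{N} p(x_i \mid \Theta)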
MLE continued
• We are solving an optimization problem: find the Θ that maximizes L(Θ|X)
• Often we maximize the log of the likelihood instead.
– Why is this the same? Because log is monotonically increasing, the Θ that maximizes log L(Θ|X) also maximizes L(Θ|X), and the log turns products into sums.
• Any method that maximizes the likelihood function is called a Maximum Likelihood Estimator
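A minimal sketch in Python (my illustration, not from the deck), assuming i.i.d. samples from a Gaussian; for this model the log-likelihood has a closed-form maximizer, the sample mean and the biased sample standard deviation:

import math

def gaussian_log_likelihood(data, mu, sigma):
    # log L(mu, sigma | data): sum of the log-densities of the points
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

data = [4.9, 5.1, 4.7, 5.3, 5.0]

# Closed-form MLE for a Gaussian: sample mean, biased sample std.
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

# Perturbing the parameters should only lower the log-likelihood.
print(gaussian_log_likelihood(data, mu_hat, sigma_hat))
print(gaussian_log_likelihood(data, mu_hat + 0.5, sigma_hat))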
Simple Example: Least Squares Fit
• Input: N points in R^2
• Model: a single line, y = ax + b
– Parameters: a, b
• Origin? It is the Maximum Likelihood Estimator under i.i.d. Gaussian noise.
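A hedged sketch (not in the original slides) of the closed-form least-squares solution; minimizing the squared error maximizes the likelihood when the y-values carry i.i.d. Gaussian noise:

def least_squares_fit(points):
    # Normal equations for y = a*x + b, minimizing sum of (y - a*x - b)^2
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Points scattered near the line y = 2x + 1
print(least_squares_fit([(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)]))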
Expectation Maximization
• An elaborate technique for maximizing the likelihood function
• Often used when observed data is incomplete
– Due to problems in the observation process
– Due to unknown or difficult distribution function(s)
• Iterative process
• Still a local technique: it finds a local, not necessarily global, maximum
EM likelihood function
• Observed data X; assume missing data Y.
• Let Z = (X, Y) be the complete data
– Joint density function:
– p(z|Θ) = p(x,y|Θ) = p(y|x,Θ) p(x|Θ)
• Define a new likelihood function: L(Θ|Z) = p(X,Y|Θ)
• X and Θ are constants, so L(Θ|Z) is a random variable dependent on the random variable Y.
“E” Step of EM Algorithm
• Since L(Θ|Z) is itself a random variable, we can compute its expected value:
– Q(Θ, Θ^(i−1)) = E[ log p(X,Y|Θ) | X, Θ^(i−1) ]
• Can be thought of as computing the expected value of Y given the current estimate of Θ.
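Expanding the expectation as a sum over the possible values of the missing data (a reconstruction in standard notation, since the slide's equation did not survive extraction):

Q(\Theta, \Theta^{(i-1)}) = \sum_{y} \log p(X, y \mid \Theta) \, p(y \mid X, \Theta^{(i-1)})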
“M” Step of EM Algorithm
• Once we have the expectation computed, optimize Θ using the MLE:
– Θ^(i) = argmax_Θ Q(Θ, Θ^(i−1))
• Convergence – various results proving convergence are cited.
• Generalized EM – instead of finding the optimal Θ, choose any Θ that increases the likelihood.
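A schematic of the whole loop in Python (an illustrative sketch; e_step and m_step are hypothetical callables, not names from the deck):

def em(data, theta, e_step, m_step, max_iters=100, tol=1e-8):
    # e_step: computes the expected complete-data statistics given the
    #         data and the current parameter estimate (the "E" step)
    # m_step: returns the theta maximizing Q; a Generalized EM m_step
    #         may return any theta that merely increases Q
    for _ in range(max_iters):
        stats = e_step(data, theta)
        new_theta = m_step(data, stats)
        if all(abs(n - o) < tol for n, o in zip(new_theta, theta)):
            return new_theta  # parameters stopped moving: local optimum
        theta = new_theta
    return theta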
Mixture Models
• Assume a “mixture” of probability distributions:
– p(x|Θ) = Σ_{i=1..M} α_i p_i(x|θ_i), with mixing weights α_i ≥ 0 and Σ_i α_i = 1
• The log-likelihood function is difficult to optimize because it contains a log of a sum, so use a trick:
– Assume unobserved data items Y whose values inform us which distribution generated each item in X.
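Concretely (a reconstruction in standard mixture notation): the incomplete-data log-likelihood traps a sum inside the log, while conditioning on the labels y_j moves the log inside and decouples the components:

\log L(\Theta \mid X) = \sum_{j=1}^{N} \log \sum_{i=1}^{M} \alpha_i \, p_i(x_j \mid \theta_i), \qquad \log L(\Theta \mid X, Y) = \sum_{j=1}^{N} \log\left( \alpha_{y_j} \, p_{y_j}(x_j \mid \theta_{y_j}) \right)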
Update Equations
• After much derivation, estimates for the new parameters in terms of the old ones result. With p(i|x_j, Θ^old) the posterior probability that component i generated x_j:
– α_i^new = (1/N) Σ_j p(i|x_j, Θ^old)
– μ_i^new = Σ_j x_j p(i|x_j, Θ^old) / Σ_j p(i|x_j, Θ^old)
– Σ_i^new = Σ_j p(i|x_j, Θ^old) (x_j − μ_i^new)(x_j − μ_i^new)^T / Σ_j p(i|x_j, Θ^old)
• Here each component's Θ = (μ, Σ), where μ is the mean and Σ is the covariance matrix of a d-dimensional normal distribution
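A compact one-dimensional sketch of these updates in Python (my illustration; the d-dimensional case on the slide replaces the scalar variance with a covariance matrix Σ):

import math

def normal_pdf(x, mu, var):
    # Density of a normal distribution N(mu, var) evaluated at x
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, alphas, mus, variances, iters=50):
    m, n = len(alphas), len(xs)
    for _ in range(iters):
        # E step: responsibility r[j][i] = p(i | x_j, old parameters)
        r = []
        for x in xs:
            w = [alphas[i] * normal_pdf(x, mus[i], variances[i]) for i in range(m)]
            total = sum(w)
            r.append([wi / total for wi in w])
        # M step: plug the responsibilities into the update equations above
        for i in range(m):
            ni = sum(r[j][i] for j in range(n))
            alphas[i] = ni / n
            mus[i] = sum(r[j][i] * xs[j] for j in range(n)) / ni
            variances[i] = sum(r[j][i] * (xs[j] - mus[i]) ** 2 for j in range(n)) / ni
    return alphas, mus, variances

# Two well-separated clusters; EM should recover means near 1 and 5.
xs = [1.0, 1.2, 0.8, 4.9, 5.1, 5.3]
print(em_gmm_1d(xs, [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]))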