The EM algorithm
LING 572
Fei Xia
03/01/07
What is EM?
• EM stands for “expectation maximization”.
• A parameter estimation method: it falls into the general framework of maximum-likelihood estimation (MLE).
• The general form was given in (Dempster, Laird, and Rubin, 1977), although the essence of the algorithm had appeared previously in various forms.
Outline
• MLE
• EM
– Basic concepts
– Main ideas of EM
– EM for PM models
– An example: Forward-backward algorithm
MLE
What is MLE?
• Given
– A sample X={X1, …, Xn}
– A vector of parameters θ
• We define
– Likelihood of the data: $P(X \mid \theta)$
– Log-likelihood of the data: $L(\theta) = \log P(X \mid \theta)$
• Given X, find $\theta_{ML} = \arg\max_\theta L(\theta)$
MLE (cont)
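• Often we assume that the $X_i$ are independent and identically distributed (i.i.d.):
$$\begin{aligned}
\theta_{ML} &= \arg\max_\theta L(\theta) \\
&= \arg\max_\theta \log P(X \mid \theta) \\
&= \arg\max_\theta \log P(X_1, \dots, X_n \mid \theta) \\
&= \arg\max_\theta \log \prod_i P(X_i \mid \theta) \\
&= \arg\max_\theta \sum_i \log P(X_i \mid \theta)
\end{aligned}$$
• Depending on the form of $P(X \mid \theta)$, solving the optimization problem can be easy or hard.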
An easy case
• Assuming
– A coin has a probability p of being heads, 1-p of being tails.
– Observation: We toss a coin N times, and the result is a set of Hs and Ts, and there are m Hs.
• What is the value of p based on MLE, given the observation?
An easy case (cont)
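$$\begin{aligned}
L(\theta) &= \log P(X \mid \theta) = \log p^m (1-p)^{N-m} \\
&= m \log p + (N-m) \log(1-p)
\end{aligned}$$
$$\frac{dL}{dp} = \frac{d\,(m \log p + (N-m)\log(1-p))}{dp} = \frac{m}{p} - \frac{N-m}{1-p} = 0$$
$$\Rightarrow\ p = m/N$$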
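The closed form p = m/N can be checked numerically. The sketch below is only an illustration: the observation (7 heads in 10 tosses) is a made-up example, and the function names are hypothetical. It compares the closed-form estimate against a coarse grid search over the log-likelihood m log p + (N - m) log(1 - p).

```python
import math

def log_likelihood(p, m, N):
    """Log-likelihood of m heads in N tosses: m log p + (N - m) log(1 - p)."""
    return m * math.log(p) + (N - m) * math.log(1 - p)

def mle_heads_probability(m, N):
    """Closed-form maximizer obtained by setting dL/dp = 0."""
    return m / N

# Hypothetical observation (not from the slides): 7 heads in 10 tosses.
m, N = 7, 10
p_hat = mle_heads_probability(m, N)

# Coarse grid search over the open interval (0, 1) as an independent check.
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, m, N))

print("closed form:", p_hat, "grid search:", p_grid)
```

The grid search is only there to confirm the calculus; in practice the closed form is all that is needed for this easy case.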
EM: basic concepts
Basic setting in EM
• X is a set of data points: observed data
• θ is a parameter vector.
• EM is a method to find $\theta_{ML}$ where
$$\theta_{ML} = \arg\max_\theta L(\theta) = \arg\max_\theta \log P(X \mid \theta)$$
• Calculating $P(X \mid \theta)$ directly is hard.
• Calculating $P(X, Y \mid \theta)$ is much simpler, where Y is “hidden” data (or “missing” data).
The basic EM strategy
• Z = (X, Y)
– Z: complete data (“augmented data”)
– X: observed data (“incomplete” data)
– Y: hidden data (“missing” data)
The “missing” data Y
• Y need not necessarily be missing in the practical sense of the word.
• It may just be a conceptually convenient technical device to simplify the calculation of P(X | θ).
• There could be many possible Ys.
Examples of EM
| | HMM | PCFG | MT | Coin toss |
|---|---|---|---|---|
| X (observed) | sentences | sentences | Parallel data | Head-tail sequences |
| Y (hidden) | State sequences | Parse trees | Word alignment | Coin id sequences |
| θ | $a_{ij}$, $b_{ijk}$ | $P(A \to BC)$ | $t(f \mid e)$, $d(a_j \mid j, l, m)$, … | $p_1$, $p_2$, $\lambda$ |
| Algorithm | Forward-backward | Inside-outside | IBM Models | N/A |
The EM algorithm
• Consider a set of starting parameters
• Use these to “estimate” the missing data
• Use “complete” data to update parameters
• Repeat until convergence
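As a concrete illustration of these four steps, here is a minimal sketch for a coin-toss mixture with parameters p1, p2, λ. Everything specific is an assumption made for the example: each data point is a (heads, tosses) pair produced by one of two coins, coin 1 is chosen with probability `lam`, the hidden data is which coin produced each sequence, and the toy data and starting values are invented.

```python
import math

def log_likelihood(data, lam, p1, p2):
    """Observed-data log-likelihood: sum over sequences of log P(x_i | theta)."""
    total = 0.0
    for h, n in data:
        l1 = lam * p1**h * (1 - p1)**(n - h)
        l2 = (1 - lam) * p2**h * (1 - p2)**(n - h)
        total += math.log(l1 + l2)
    return total

def em_step(data, lam, p1, p2):
    """One pass: 'estimate' the hidden coin ids, then update the parameters."""
    # Posterior probability that each sequence came from coin 1.
    w = []
    for h, n in data:
        l1 = lam * p1**h * (1 - p1)**(n - h)
        l2 = (1 - lam) * p2**h * (1 - p2)**(n - h)
        w.append(l1 / (l1 + l2))
    # Re-estimate the parameters from the fractionally completed data.
    new_lam = sum(w) / len(data)
    new_p1 = (sum(wi * h for wi, (h, n) in zip(w, data))
              / sum(wi * n for wi, (h, n) in zip(w, data)))
    new_p2 = (sum((1 - wi) * h for wi, (h, n) in zip(w, data))
              / sum((1 - wi) * n for wi, (h, n) in zip(w, data)))
    return new_lam, new_p1, new_p2

# Hypothetical data: (number of heads, number of tosses) per sequence.
data = [(9, 10), (8, 10), (2, 10), (1, 10), (7, 10)]
theta = (0.5, 0.6, 0.4)          # starting parameters (lam, p1, p2)
history = [log_likelihood(data, *theta)]
for _ in range(30):              # fixed iteration count stands in for a convergence test
    theta = em_step(data, *theta)
    history.append(log_likelihood(data, *theta))
lam, p1, p2 = theta
print(lam, p1, p2)
```

The `history` list records the log-likelihood after each pass, which makes it easy to watch it level off as the parameters settle.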
Highlights
• General algorithm for missing data problems
• Requires “specialization” to the problem at hand
• Examples of EM:
– Forward-backward algorithm for HMM
– Inside-outside algorithm for PCFG
– EM in IBM MT Models
EM: main ideas
Idea #1: find θ that maximizes the likelihood of training data
$$\theta_{ML} = \arg\max_\theta L(\theta) = \arg\max_\theta \log P(X \mid \theta)$$
Idea #2: find the $\theta^t$ sequence
No analytical solution ⇒ iterative approach: find
$$\theta^0, \theta^1, \dots, \theta^t, \dots$$
s.t.
$$l(\theta^0) \le l(\theta^1) \le \dots \le l(\theta^t) \le \dots$$
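Idea #3: find $\theta^{t+1}$ that maximizes a tight lower bound of $l(\theta) - l(\theta^t)$
$$l(\theta) - l(\theta^t) \ge \sum_{i=1}^n E_{P(y \mid x_i, \theta^t)}\left[\log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right]$$
The right-hand side is a tight lower bound.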
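The slide states the bound without its derivation. One standard route, sketched here under the i.i.d. assumption $l(\theta) = \sum_i \log P(x_i \mid \theta)$, applies Jensen's inequality to the concave logarithm:

```latex
\begin{aligned}
l(\theta) - l(\theta^t)
  &= \sum_{i=1}^n \log \frac{P(x_i \mid \theta)}{P(x_i \mid \theta^t)}
   = \sum_{i=1}^n \log \sum_y \frac{P(x_i, y \mid \theta)}{P(x_i \mid \theta^t)} \\
  &= \sum_{i=1}^n \log \sum_y P(y \mid x_i, \theta^t)\,
       \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}
     && \text{since } P(x_i \mid \theta^t)\, P(y \mid x_i, \theta^t) = P(x_i, y \mid \theta^t) \\
  &\ge \sum_{i=1}^n \sum_y P(y \mid x_i, \theta^t)\,
       \log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}
     && \text{(Jensen's inequality)} \\
  &= \sum_{i=1}^n E_{P(y \mid x_i, \theta^t)}
       \left[ \log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)} \right]
\end{aligned}
```

At $\theta = \theta^t$ both sides equal zero, which is the sense in which the bound is tight.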
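Idea #4: find $\theta^{t+1}$ that maximizes the Q function
$$\begin{aligned}
\theta^{(t+1)} &= \arg\max_\theta \sum_{i=1}^n E_{P(y \mid x_i, \theta^t)}\left[\log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right] && \text{(lower bound of } l(\theta) - l(\theta^t)\text{)} \\
&= \arg\max_\theta \sum_{i=1}^n E_{P(y \mid x_i, \theta^t)}\left[\log P(x_i, y \mid \theta)\right] && \text{(the Q function)}
\end{aligned}$$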
The EM algorithm
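• Start with initial estimate, $\theta^0$
• Repeat until convergence
– E-step: calculate
$$Q(\theta, \theta^t) = \sum_{i=1}^n \sum_y P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta)$$
– M-step: find
$$\theta^{(t+1)} = \arg\max_\theta Q(\theta, \theta^t)$$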
Important classes of EM problems
• Products of multinomial (PM) models
• Exponential families
• Gaussian mixture
• …
The EM algorithm for PM models
PM models
$$p(x, y \mid \theta) = \prod_{i \in \Theta_1} p_i^{c(i,x,y)} \cdots \prod_{i \in \Theta_m} p_i^{c(i,x,y)}$$
Where $(\Theta_1, \dots, \Theta_m)$ is a partition of all the parameters, and for any j
$$\sum_{i \in \Theta_j} p_i = 1$$
HMM is a PM
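$$p(x, y \mid \theta) = \prod_{i,j} p(s_i \to s_j)^{c(s_i \to s_j,\, x,\, y)} \times \prod_{i,j,k} p(s_i \xrightarrow{w_k} s_j)^{c(s_i \xrightarrow{w_k} s_j,\, x,\, y)}$$
$$\sum_j a_{ij} = 1 \qquad \sum_k b_{ijk} = 1$$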
PCFG
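• PCFG: each sample point (x,y):
– x is a sentence
– y is a possible parse tree for that sentence.
$$P(x, y \mid \theta) = \prod_{i=1}^n P(A_i \to \alpha_i \mid A_i)$$
$$P(x, y \mid \theta) = P(S \to NP\ VP \mid S) \times P(NP \to Jim \mid NP) \times P(VP \to sleeps \mid VP)$$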
PCFG is a PM
$$p(x, y \mid \theta) = \prod_{A \to \alpha} p(A \to \alpha)^{c(A \to \alpha,\, x,\, y)}$$
$$\sum_{\alpha} p(A \to \alpha) = 1$$
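To make the product-of-multinomials form concrete, the sketch below computes the joint probability of a sentence and a parse tree as a product of rule probabilities raised to their counts in the tree. The rule probabilities are invented for illustration, and the data layout (rules as `(lhs, rhs)` tuples) and the function name are hypothetical choices, not something given in the slides.

```python
import math
from collections import Counter, defaultdict

# Hypothetical rule probabilities p(A -> alpha); values are illustrative only.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Jim",)): 0.4,
    ("NP", ("Mary",)): 0.6,
    ("VP", ("sleeps",)): 0.3,
    ("VP", ("runs",)): 0.7,
}

def joint_probability(rules_used, rule_prob):
    """P(x, y | theta) = product over rules of p(rule) ** c(rule, x, y)."""
    counts = Counter(rules_used)  # c(A -> alpha, x, y): how often each rule occurs in the tree
    return math.prod(rule_prob[rule] ** c for rule, c in counts.items())

# Rules used in the parse tree of "Jim sleeps".
tree_rules = [("S", ("NP", "VP")), ("NP", ("Jim",)), ("VP", ("sleeps",))]
p = joint_probability(tree_rules, rule_prob)

# Each left-hand side defines one multinomial, so its rule probabilities sum to 1.
lhs_totals = defaultdict(float)
for (lhs, _rhs), prob in rule_prob.items():
    lhs_totals[lhs] += prob

print(p, dict(lhs_totals))
```

Grouping the rules by left-hand side is what makes this a PM model: every group is a separate multinomial with its own sum-to-one constraint.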
Q-function for PM
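$$\begin{aligned}
Q(\theta, \theta^t) &= \sum_{i=1}^n \sum_y P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta) \\
&= \sum_{i=1}^n \sum_y P(y \mid x_i, \theta^t) \log \Big( \prod_{k} \prod_{j \in \Theta_k} p_j^{C(j, x_i, y)} \Big) \\
&= \sum_{i=1}^n \sum_y P(y \mid x_i, \theta^t) \log \prod_{j \in \Theta_1} p_j^{C(j, x_i, y)} + \dots + \sum_{i=1}^n \sum_y P(y \mid x_i, \theta^t) \log \prod_{j \in \Theta_m} p_j^{C(j, x_i, y)} \\
&= \sum_{i=1}^n \sum_y \sum_{j \in \Theta_1} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j + \dots + \sum_{i=1}^n \sum_y \sum_{j \in \Theta_m} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j
\end{aligned}$$
The first term of the last line is $Q_1(\theta, \theta^t)$.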
Maximizing the Q function
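Maximize
$$Q_1(\theta, \theta^t) = \sum_{i=1}^n \sum_y \sum_{j \in \Theta_1} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j$$
Subject to the constraint
$$\sum_{j \in \Theta_1} p_j = 1$$
Use Lagrange multipliers
$$\hat{Q}(\theta, \theta^t) = \sum_{i=1}^n \sum_y \sum_{j \in \Theta_1} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j - \lambda \Big( \sum_{j \in \Theta_1} p_j - 1 \Big)$$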
Optimal solution
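$$\hat{Q}(\theta, \theta^t) = \sum_{i=1}^n \sum_y \sum_{j \in \Theta_1} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j - \lambda \Big( \sum_{j \in \Theta_1} p_j - 1 \Big)$$
$$\frac{\partial \hat{Q}(\theta, \theta^t)}{\partial p_j} = \sum_{i=1}^n \sum_y P(y \mid x_i, \theta^t)\, C(j, x_i, y) / p_j - \lambda = 0$$
$$p_j = \frac{\sum_{i=1}^n \sum_y P(y \mid x_i, \theta^t)\, C(j, x_i, y)}{\lambda}$$
The numerator is the expected count; the denominator $\lambda$ is the normalization factor.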
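In code, this optimal solution amounts to dividing each expected count by the total expected count of its parameter group. The sketch below assumes the expected counts have already been computed in the E-step; the function name, the dictionary layout, and the example numbers are all hypothetical.

```python
from collections import defaultdict

def m_step(expected_counts, group_of):
    """Normalize expected counts within each parameter group.

    expected_counts: dict mapping parameter id -> expected count
    group_of:        dict mapping parameter id -> id of its multinomial (partition block)
    """
    totals = defaultdict(float)
    for j, count in expected_counts.items():
        totals[group_of[j]] += count  # normalization factor (lambda) for that group
    return {j: count / totals[group_of[j]] for j, count in expected_counts.items()}

# Hypothetical expected counts for three grammar rules in two groups.
expected_counts = {"NP->Jim": 1.5, "NP->Mary": 0.5, "VP->sleeps": 2.0}
group_of = {"NP->Jim": "NP", "NP->Mary": "NP", "VP->sleeps": "VP"}

new_params = m_step(expected_counts, group_of)
print(new_params)
```

Because each group is normalized separately, the sum-to-one constraint of every multinomial holds automatically after the update.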
PCFG example
• Calculate expected counts
• Update parameters
EM: An example
Forward-backward algorithm
• HMM is a PM (Product of Multinomials) model
• Forward-backward algorithm is a special case of the EM algorithm for PM models.
• X (observed data): each data point is an output sequence $O_{1T}$.
• Y (hidden data): state sequence $X_{1T}$.
• θ (parameters): $a_{ij}$, $b_{ijk}$, $\pi_i$.
Expected counts
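$$\begin{aligned}
\mathrm{count}(s_i \to s_j) &= \sum_Y P(Y \mid X, \theta) \times \mathrm{count}(X, Y, s_i \to s_j) \\
&= \sum_{X_{1T}} P(X_{1T} \mid O_{1T}, \theta) \times \mathrm{count}(O_{1T}, X_{1T}, s_i \to s_j) \\
&= \sum_{t=1}^T P(X_t = i, X_{t+1} = j \mid O_{1T}, \theta) \\
&= \sum_{t=1}^T \gamma_{ij}(t)
\end{aligned}$$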
Expected counts (cont)
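$$\begin{aligned}
\mathrm{count}(s_i \xrightarrow{w_k} s_j) &= \sum_Y P(Y \mid X, \theta) \times \mathrm{count}(X, Y, s_i \xrightarrow{w_k} s_j) \\
&= \sum_{X_{1T}} P(X_{1T} \mid O_{1T}, \theta) \times \mathrm{count}(O_{1T}, X_{1T}, s_i \xrightarrow{w_k} s_j) \\
&= \sum_{t=1}^T P(X_t = i, X_{t+1} = j \mid O_{1T}, \theta) \times \delta(O_t, w_k) \\
&= \sum_{t=1}^T \gamma_{ij}(t)\, \delta(O_t, w_k)
\end{aligned}$$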
The inner loop for forward-backward algorithm
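Given an input sequence and $(S, K, \Pi, A, B)$:
1. Calculate forward probability:
• Base case: $\alpha_i(1) = \pi_i$
• Recursive case: $\alpha_j(t+1) = \sum_i \alpha_i(t)\, a_{ij}\, b_{ij o_t}$
2. Calculate backward probability:
• Base case: $\beta_i(T+1) = 1$
• Recursive case: $\beta_i(t) = \sum_j a_{ij}\, b_{ij o_t}\, \beta_j(t+1)$
3. Calculate expected counts:
$$\gamma_{ij}(t) = \frac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{\sum_{m=1}^N \alpha_m(t)\, \beta_m(t)}$$
4. Update the parameters:
$$a_{ij} = \frac{\sum_{t=1}^T \gamma_{ij}(t)}{\sum_{j=1}^N \sum_{t=1}^T \gamma_{ij}(t)} \qquad b_{ijk} = \frac{\sum_{t=1}^T \delta(o_t, w_k)\, \gamma_{ij}(t)}{\sum_{t=1}^T \gamma_{ij}(t)}$$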
HMM
• An HMM is a tuple $(S, \Sigma, \Pi, A, B)$:
– A set of states $S = \{s_1, s_2, \dots, s_N\}$.
– A set of output symbols $\Sigma = \{w_1, \dots, w_M\}$.
– Initial state probabilities $\Pi = \{\pi_i\}$.
– State transition prob: $A = \{a_{ij}\}$.
– Symbol emission prob: $B = \{b_{ijk}\}$.
• State sequence: $X_1 \dots X_{T+1}$
• Output sequence: $o_1 \dots o_T$
Forward probability
The probability of producing $O_{1,t-1}$ while ending up in state $s_i$:
$$\alpha_i(t) \stackrel{def}{=} P(O_{1,t-1}, X_t = i)$$
Calculating forward probability
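Initialization: $\alpha_i(1) = \pi_i$
Induction:
$$\begin{aligned}
\alpha_j(t+1) &= P(O_{1,t}, X_{t+1} = j) \\
&= \sum_i P(O_{1,t}, X_t = i, X_{t+1} = j) \\
&= \sum_i P(O_{1,t-1}, X_t = i) \times P(o_t, X_{t+1} = j \mid O_{1,t-1}, X_t = i) \\
&= \sum_i P(O_{1,t-1}, X_t = i) \times P(o_t, X_{t+1} = j \mid X_t = i) \\
&= \sum_i \alpha_i(t)\, a_{ij}\, b_{ij o_t}
\end{aligned}$$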
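The initialization and induction steps translate directly into a short loop. The sketch below is an illustration for an arc-emission HMM, where `b[i][j][k]` is the probability of emitting symbol k on the transition from state i to state j. The toy model (2 states, 2 symbols), the 0-based list layout, and the function names are assumptions for the example; the brute-force enumeration is included only as a sanity check.

```python
import itertools

def forward(pi, a, b, obs):
    """Return alpha, where alpha[t][i] is the slides' alpha_i(t+1) (0-based t).

    pi[i]: initial probability, a[i][j]: transition probability,
    b[i][j][k]: probability of emitting symbol k on the arc i -> j,
    obs: list of symbol indices o_1 .. o_T.
    """
    N = len(pi)
    alpha = [list(pi)]  # initialization: alpha_i(1) = pi_i
    for o in obs:       # induction: alpha_j(t+1) = sum_i alpha_i(t) a_ij b_ij,o_t
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] * b[i][j][o] for i in range(N))
                      for j in range(N)])
    return alpha

def brute_force_prob(pi, a, b, obs):
    """P(O) by summing over every state sequence X_1 .. X_{T+1} (exponential time)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for states in itertools.product(range(N), repeat=T + 1):
        p = pi[states[0]]
        for t, o in enumerate(obs):
            p *= a[states[t]][states[t + 1]] * b[states[t]][states[t + 1]][o]
        total += p
    return total

# Hypothetical toy HMM: 2 states, 2 output symbols.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.1, 0.9]],
     [[0.8, 0.2], [0.3, 0.7]]]
obs = [0, 1, 0]

alpha = forward(pi, a, b, obs)
prob_forward = sum(alpha[-1])  # P(O) = sum_i alpha_i(T+1)
prob_brute = brute_force_prob(pi, a, b, obs)
print(prob_forward, prob_brute)
```

The forward pass needs on the order of N² operations per time step, whereas the enumeration grows exponentially with the sequence length, which is the point of the recursion.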
Backward probability
• The probability of producing the sequence $O_{t,T}$, given that at time t, we are at state $s_i$.
$$\beta_i(t) \stackrel{def}{=} P(O_{t,T} \mid X_t = i)$$
Calculating backward probability
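Initialization: $\beta_i(T+1) = 1$
Induction:
$$\begin{aligned}
\beta_i(t) &\stackrel{def}{=} P(O_{t,T} \mid X_t = i) \\
&= \sum_j P(o_t, O_{t+1,T}, X_{t+1} = j \mid X_t = i) \\
&= \sum_j P(o_t, X_{t+1} = j \mid X_t = i) \times P(O_{t+1,T} \mid X_t = i, X_{t+1} = j, o_t) \\
&= \sum_j P(o_t, X_{t+1} = j \mid X_t = i) \times P(O_{t+1,T} \mid X_{t+1} = j) \\
&= \sum_j a_{ij}\, b_{ij o_t}\, \beta_j(t+1)
\end{aligned}$$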
Calculating the prob of the observation
$$P(O) = \sum_{i=1}^N \pi_i\, \beta_i(1)$$
$$P(O) = \sum_{i=1}^N \alpha_i(T+1)$$
$$P(O) = \sum_{i=1}^N P(O, X_t = i) = \sum_{i=1}^N \alpha_i(t)\, \beta_i(t)$$
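The three expressions for P(O) should agree, which gives a handy consistency check on an implementation. The sketch below computes both tables for an arc-emission HMM and evaluates P(O) all three ways. The toy parameters, the 0-based indexing (`alpha[t][i]` and `beta[t][i]` hold the values for time t+1 in the slides' 1-based notation), and the function names are assumptions made for the example.

```python
def forward(pi, a, b, obs):
    """alpha[t][i] = P(o_1..o_t, X_{t+1} = i) with 0-based t, so alpha[0] = pi."""
    N = len(pi)
    alpha = [list(pi)]
    for o in obs:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] * b[i][j][o] for i in range(N))
                      for j in range(N)])
    return alpha

def backward(pi, a, b, obs):
    """beta[t][i] = P(o_{t+1}..o_T | X_{t+1} = i) with 0-based t, so beta[T] = 1."""
    N = len(pi)
    beta = [[1.0] * N]  # initialization: beta_i(T+1) = 1
    for o in reversed(obs):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[i][j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

# Hypothetical toy HMM: 2 states, 2 output symbols, b[i][j][k] emitted on arc i -> j.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.1, 0.9]],
     [[0.8, 0.2], [0.3, 0.7]]]
obs = [0, 1, 0]
N, T = len(pi), len(obs)

alpha = forward(pi, a, b, obs)
beta = backward(pi, a, b, obs)

prob_from_beta = sum(pi[i] * beta[0][i] for i in range(N))   # sum_i pi_i beta_i(1)
prob_from_alpha = sum(alpha[T])                              # sum_i alpha_i(T+1)
prob_at_each_t = [sum(alpha[t][i] * beta[t][i] for i in range(N))
                  for t in range(T + 1)]                     # sum_i alpha_i(t) beta_i(t)
print(prob_from_beta, prob_from_alpha, prob_at_each_t)
```

If the values disagree, the usual culprit is an off-by-one between the observation index and the time index in one of the two recursions.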
Estimating parameters
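• The prob of traversing a certain arc at time t given O: (denoted by $p_t(i, j)$ in M&S)
$$\begin{aligned}
\gamma_{ij}(t) &= P(X_t = i, X_{t+1} = j \mid O) \\
&= \frac{P(X_t = i, X_{t+1} = j, O)}{P(O)} \\
&= \frac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{\sum_{m=1}^N \alpha_m(t)\, \beta_m(t)}
\end{aligned}$$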
The prob of being at state i at time t given O:
$$\gamma_i(t) = P(X_t = i \mid O) = \sum_{j=1}^N P(X_t = i, X_{t+1} = j \mid O) = \sum_{j=1}^N \gamma_{ij}(t)$$
Expected counts
Sum over the time index:
• Expected # of transitions from state i to j in O:
$$\sum_{t=1}^T \gamma_{ij}(t)$$
• Expected # of transitions from state i in O:
$$\sum_{t=1}^T \gamma_i(t) = \sum_{t=1}^T \sum_{j=1}^N \gamma_{ij}(t) = \sum_{j=1}^N \sum_{t=1}^T \gamma_{ij}(t)$$
Update parameters
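$$\hat{\pi}_i = \text{expected frequency in state } i \text{ at time } t = 1 = \gamma_i(1)$$
$$\hat{a}_{ij} = \frac{\text{expected \# of transitions from state } i \text{ to } j}{\text{expected \# of transitions from state } i} = \frac{\sum_{t=1}^T \gamma_{ij}(t)}{\sum_{t=1}^T \gamma_i(t)} = \frac{\sum_{t=1}^T \gamma_{ij}(t)}{\sum_{j=1}^N \sum_{t=1}^T \gamma_{ij}(t)}$$
$$\hat{b}_{ijk} = \frac{\text{expected \# of transitions from state } i \text{ to } j \text{ with } k \text{ observed}}{\text{expected \# of transitions from state } i \text{ to } j} = \frac{\sum_{t=1}^T \delta(o_t, w_k)\, \gamma_{ij}(t)}{\sum_{t=1}^T \gamma_{ij}(t)}$$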
Final formulae
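$$\gamma_{ij}(t) = \frac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{\sum_{m=1}^N \alpha_m(t)\, \beta_m(t)}$$
$$\hat{a}_{ij} = \frac{\sum_{t=1}^T \gamma_{ij}(t)}{\sum_{j=1}^N \sum_{t=1}^T \gamma_{ij}(t)}$$
$$\hat{b}_{ijk} = \frac{\sum_{t=1}^T \delta(o_t, w_k)\, \gamma_{ij}(t)}{\sum_{t=1}^T \gamma_{ij}(t)}$$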
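Putting the final formulae together, one forward-backward re-estimation pass over a single observation sequence might look like the sketch below. It is an illustration under several assumptions: an arc-emission HMM with `b[i][j][k]`, 0-based indexing, a made-up toy model, hypothetical function names, and the name `gamma` for the arc posterior. It also assumes every parameter is strictly positive so that no denominator is zero.

```python
def forward(pi, a, b, obs):
    """alpha[t][i] = P(o_1..o_t, X_{t+1} = i) with 0-based t."""
    N = len(pi)
    alpha = [list(pi)]
    for o in obs:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * a[i][j] * b[i][j][o] for i in range(N))
                      for j in range(N)])
    return alpha

def backward(pi, a, b, obs):
    """beta[t][i] = P(o_{t+1}..o_T | X_{t+1} = i) with 0-based t."""
    N = len(pi)
    beta = [[1.0] * N]
    for o in reversed(obs):
        nxt = beta[0]
        beta.insert(0, [sum(a[i][j] * b[i][j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def likelihood(pi, a, b, obs):
    """P(O) = sum_i alpha_i(T+1)."""
    return sum(forward(pi, a, b, obs)[-1])

def reestimate(pi, a, b, obs):
    """One EM iteration: expected arc counts, then normalized updates."""
    N, T, M = len(pi), len(obs), len(b[0][0])
    alpha, beta = forward(pi, a, b, obs), backward(pi, a, b, obs)
    # gamma[t][i][j]: posterior probability of taking arc i -> j at step t.
    gamma = []
    for t in range(T):
        norm = sum(alpha[t][m] * beta[t][m] for m in range(N))
        gamma.append([[alpha[t][i] * a[i][j] * b[i][j][obs[t]] * beta[t + 1][j] / norm
                       for j in range(N)] for i in range(N)])
    arc = [[sum(gamma[t][i][j] for t in range(T)) for j in range(N)] for i in range(N)]
    new_pi = [sum(gamma[0][i]) for i in range(N)]          # gamma_i(1)
    new_a = [[arc[i][j] / sum(arc[i]) for j in range(N)] for i in range(N)]
    new_b = [[[sum(gamma[t][i][j] for t in range(T) if obs[t] == k) / arc[i][j]
               for k in range(M)] for j in range(N)] for i in range(N)]
    return new_pi, new_a, new_b

# Hypothetical toy HMM and observation sequence.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.5, 0.5], [0.1, 0.9]],
     [[0.8, 0.2], [0.3, 0.7]]]
obs = [0, 1, 0, 0, 1]

before = likelihood(pi, a, b, obs)
new_pi, new_a, new_b = reestimate(pi, a, b, obs)
after = likelihood(new_pi, new_a, new_b, obs)
print(before, after)
```

A training loop would call `reestimate` repeatedly until the likelihood stops improving; with several training sequences, the expected counts would be summed over sequences before normalizing.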
Summary
• The EM algorithm
– An iterative approach
– L(θ) is non-decreasing at each iteration
– Optimal solution in M-step exists for many classes of problems.
• The EM algorithm for PM models
– Simpler formulae
– Three special cases
  • Inside-outside algorithm
  • Forward-backward algorithm
  • IBM Models for MT
Relations among the algorithms
The generalized EM
– The EM algorithm
  – PM: Inside-Outside, Forward-backward, IBM models
  – Gaussian Mix
Strengths of EM
• Numerical stability: every iteration of the EM algorithm increases (or at least does not decrease) the likelihood of the observed data.
• EM handles parameter constraints gracefully.
Problems with EM
• Convergence can be very slow on some problems and is intimately related to the amount of missing information.
• It is guaranteed to improve (or at least not decrease) the probability of the training corpus, which is different from reducing the errors directly.
• It is not guaranteed to reach a global maximum (it could get stuck at local maxima, saddle points, etc.)
⇒ The initial estimate is important.
Additional slides
The EM algorithm for PM models
// for each iteration
//   for each training example xi
//     for each possible y
//       for each parameter
//   for each parameter