![Page 1: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/1.jpg)
Lirong Xia
Hidden Markov Models
Tue, March 28, 2014
![Page 2: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/2.jpg)
The “Markov”s we have learned so far
• Markov decision process (MDP)
– transition probability depends only on the (state, action) pair in the previous step
• Reinforcement learning
– unknown probabilities/rewards
• Markov models
• Hidden Markov models
![Page 3: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/3.jpg)
Markov Models
• A Markov model is a chain-structured BN
– Conditional probabilities are the same at every time step (stationarity)
– Value of X at a given time is called the state
– As a BN: X1 → X2 → X3 → …
– Parameters: the initial distribution p(X1) and the transition probabilities p(Xt|Xt-1)
![Page 4: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/4.jpg)
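Computing the stationary distribution
• p(X=sun) = p(Xt=sun|Xt-1=sun) p(X=sun) + p(Xt=sun|Xt-1=rain) p(X=rain)
• p(X=rain) = p(Xt=rain|Xt-1=sun) p(X=sun) + p(Xt=rain|Xt-1=rain) p(X=rain)
• Together with p(X=sun) + p(X=rain) = 1, these equations determine the stationary distribution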
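The fixed-point equations for the stationary distribution can be solved numerically by applying the transition model repeatedly until the distribution stops changing. The sketch below is only an illustration: the slide gives no numbers for the sun/rain chain, so the transition probabilities in `T` are assumed values, and the names `step` and `stationary` are hypothetical.

```python
# ASSUMPTION: these transition probabilities are illustrative, not from the slides.
T = {
    "sun":  {"sun": 0.9, "rain": 0.1},   # p(X_t | X_{t-1} = sun)
    "rain": {"sun": 0.3, "rain": 0.7},   # p(X_t | X_{t-1} = rain)
}


def step(dist, T):
    """One time step: p(X_t = x) = sum over x' of p(x | x') * p(X_{t-1} = x')."""
    return {x: sum(T[prev][x] * dist[prev] for prev in dist) for x in dist}


def stationary(T, tol=1e-12, max_iters=10_000):
    """Iterate the update from a uniform start until it (approximately) stops changing."""
    states = list(T)
    dist = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iters):
        new = step(dist, T)
        if max(abs(new[s] - dist[s]) for s in states) < tol:
            return new
        dist = new
    return dist


pi = stationary(T)
print(pi)
```

For a two-state chain the same answer follows in closed form from the two equations plus normalization: p(sun) = p(sun|rain) / (p(rain|sun) + p(sun|rain)).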
![Page 5: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/5.jpg)
Hidden Markov Models
• Hidden Markov models (HMMs)
– Underlying Markov chain over state X
– Effects (observations) at each time step
– As a Bayes’ net: a chain X1 → X2 → X3 → …, with an observation Et attached to each Xt
![Page 6: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/6.jpg)
Example
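• An HMM is defined by:
– Initial distribution: p(X1)
– Transitions: p(Xt|Xt-1)
– Emissions: p(Et|Xt)

Transition table (probability that Rt is true):

| Rt-1 | p(Rt) |
|------|-------|
| t    | 0.7   |
| f    | 0.3   |

Emission table (probability that Ut is true):

| Rt | p(Ut) |
|----|-------|
| t  | 0.9   |
| f  | 0.2   |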
![Page 7: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/7.jpg)
Filtering / Monitoring
• Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time
• B(Xt) = p(Xt|e1:t)
• We start with B(X) in an initial setting, usually uniform
• As time passes, or we get observations, we update B(X)
![Page 8: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/8.jpg)
Example: Robot Localization
Sensor model: never more than 1 mistake
Motion model: with small probability, the action may not be executed
![Page 9: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/9.jpg)
HMM weather example: a question
[Figure: transition diagram over three weather states s, c, r, with edge labels .1, .2, .6, .3, .4, .3, .3, .5, .3]
Emission probabilities: p(w|s) = .1, p(w|c) = .3, p(w|r) = .8
• You have been stuck in the lab for three days (!)
• On those days, your labmate was dry, wet, wet, respectively
• What is the probability that it is now raining outside?
• p(X3 = r | E1 = d, E2 = w, E3 = w)
![Page 10: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/10.jpg)
Filtering
• Computationally efficient approach: first compute p(X1 = i, E1 = d) for all states i
• p(Xt, e1:t) = p(et | Xt)Σxt-1 p(xt-1, e1:t-1) p(Xt | xt-1)
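The recursion above can be sketched on the lab weather question as follows. Two things here are assumptions rather than facts from the slides: the assignment of the diagram's edge labels to specific transitions in `TRANS` (chosen so that each row sums to 1), and the uniform initial distribution `INIT`. The emission probabilities `P_WET` are the ones given on the slide; all identifiers are hypothetical.

```python
STATES = ["s", "c", "r"]

# ASSUMPTION: which edge label belongs to which transition is not recoverable
# from the text; this assignment is one choice whose rows each sum to 1.
TRANS = {
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.3, "r": 0.5},
}
P_WET = {"s": 0.1, "c": 0.3, "r": 0.8}      # p(w | state), from the slide
INIT = {x: 1.0 / 3.0 for x in STATES}       # ASSUMPTION: uniform p(X1)


def emission(e, x):
    """p(e | x) for e in {"w", "d"}; dry is the complement of wet."""
    return P_WET[x] if e == "w" else 1.0 - P_WET[x]


def forward_joint(evidence):
    """Return p(X_t = x, e_1:t) for every state x, with t = len(evidence)."""
    # Base case: p(X1 = i, E1 = e1) for all states i.
    joint = {x: INIT[x] * emission(evidence[0], x) for x in STATES}
    # Recursive case: p(Xt, e1:t) = p(et | Xt) * sum_{xt-1} p(xt-1, e1:t-1) p(Xt | xt-1).
    for e in evidence[1:]:
        joint = {
            x: emission(e, x) * sum(joint[prev] * TRANS[prev][x] for prev in STATES)
            for x in STATES
        }
    return joint


joint = forward_joint(["d", "w", "w"])
total = sum(joint.values())                  # equals p(e_1:3)
posterior = {x: joint[x] / total for x in STATES}
print(posterior["r"])                        # p(X3 = r | d, w, w) under the assumptions above
```

Dividing by the total only at the end is enough here, because the joint values p(X3, e1:3) already carry all the information needed for the conditional.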
![Page 11: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/11.jpg)
Today
• Formal algorithm for filtering
– Elapse of time
• compute p(Xt+1|e1:t) from p(Xt|e1:t)
– Observe
• compute p(Xt+1|e1:t+1) from p(Xt+1|e1:t)
– Renormalization
• Introduction to sampling
![Page 12: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/12.jpg)
Inference Recap: Simple Cases
• Observation only, p(X1|e1):
$$p(x_1 \mid e_1) = \frac{p(x_1, e_1)}{p(e_1)} \propto_{X_1} p(x_1, e_1) = p(x_1)\, p(e_1 \mid x_1)$$
• One time step only, p(X2):
$$p(x_2) = \sum_{x_1} p(x_1, x_2) = \sum_{x_1} p(x_1)\, p(x_2 \mid x_1)$$
![Page 13: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/13.jpg)
Elapse of Time
• Assume we have current belief p(Xt-1|evidence to t-1)
B(Xt-1)=p(Xt-1|e1:t-1)
• Then, after one time step passes:
p(Xt|e1:t-1)=Σxt-1 p(Xt|xt-1)p(xt-1|e1:t-1)
• Or, compactly
B’(Xt)=Σxt-1p(Xt|xt-1)B(xt-1)
• With the “B” notation, be careful about
– what time step t the belief is about,
– what evidence it includes
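The time-elapse update can be written as a small function. This is a minimal sketch: the transition table uses the 0.7 / 0.3 values from the Example slide, while the starting belief and the names `elapse_time` and `TRANSITION` are assumptions made for illustration.

```python
def elapse_time(belief, transition):
    """Return B'(X_t) = sum over x_{t-1} of p(X_t | x_{t-1}) * B(x_{t-1})."""
    states = list(belief)
    return {
        x: sum(transition[prev][x] * belief[prev] for prev in states)
        for x in states
    }


# Transition model from the Example slide: rain persists with probability 0.7.
TRANSITION = {
    "rain":    {"rain": 0.7, "no_rain": 0.3},
    "no_rain": {"rain": 0.3, "no_rain": 0.7},
}

belief = {"rain": 0.8, "no_rain": 0.2}       # ASSUMPTION: a hypothetical B(X_{t-1})
predicted = elapse_time(belief, TRANSITION)  # B'(X_t): includes no new evidence
print(predicted)
```

Note that the result needs no renormalization: each row of the transition table sums to 1, so the predicted belief sums to 1 whenever the input belief does.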
![Page 14: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/14.jpg)
Observe and renormalization
• Assume we have current belief p(Xt| previous evidence):
B’(Xt)=p(Xt|e1:t-1)
• Then:
p(Xt|e1:t)∝p(et|Xt)p(Xt|e1:t-1)
• Or:
B(Xt) ∝p(et|Xt)B’(Xt)
• Basic idea: beliefs reweighted by likelihood of evidence
• Need to renormalize B(Xt)
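The observe step followed by renormalization can be sketched as below. The likelihood values are the p(Ut) entries from the Example slide; the uniform predicted belief and the names `observe` and `LIKELIHOOD_UMBRELLA` are assumptions for illustration.

```python
def observe(predicted, likelihood):
    """Return B(X_t), proportional to p(e_t | X_t) * B'(X_t), renormalized to sum to 1."""
    unnormalized = {x: likelihood[x] * predicted[x] for x in predicted}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence has zero probability under the predicted belief")
    return {x: v / total for x, v in unnormalized.items()}


# p(umbrella seen | state), taken from the Example slide.
LIKELIHOOD_UMBRELLA = {"rain": 0.9, "no_rain": 0.2}

predicted = {"rain": 0.5, "no_rain": 0.5}    # ASSUMPTION: a hypothetical B'(X_t)
belief = observe(predicted, LIKELIHOOD_UMBRELLA)
print(belief)
```

The normalizing constant `total` is p(et | e1:t-1), the probability of the new observation given the earlier evidence.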
![Page 15: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/15.jpg)
Recap: The Forward Algorithm
• We are given evidence at each time and want to know
$$B_t(X) = p(X_t \mid e_{1:t})$$
• We can derive the following updates
$$\begin{aligned}
p(x_t \mid e_{1:t}) &\propto_X p(x_t, e_{1:t}) \\
&= \sum_{x_{t-1}} p(x_{t-1}, x_t, e_{1:t}) \\
&= \sum_{x_{t-1}} p(x_{t-1}, e_{1:t-1})\, p(x_t \mid x_{t-1})\, p(e_t \mid x_t) \\
&= p(e_t \mid x_t) \sum_{x_{t-1}} p(x_t \mid x_{t-1})\, p(x_{t-1}, e_{1:t-1})
\end{aligned}$$
• We can normalize as we go if we want to have p(x|e) at each time step, or just once at the end…
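The remark about normalization can be checked numerically: normalizing after every step and normalizing once at the end give the same final belief. The sketch below uses the rain/umbrella tables from the Example slide; the uniform initial distribution, the evidence sequence, and the function names are assumptions.

```python
TRANS = {True: {True: 0.7, False: 0.3}, False: {True: 0.3, False: 0.7}}  # p(R_t | R_{t-1})
P_UMBRELLA = {True: 0.9, False: 0.2}                                     # p(+u | R_t)
PRIOR = {True: 0.5, False: 0.5}                                          # ASSUMPTION: uniform p(R_1)


def emission(u, r):
    """p(U_t = u | R_t = r)."""
    return P_UMBRELLA[r] if u else 1.0 - P_UMBRELLA[r]


def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}


def forward(evidence, normalize_each_step):
    """Forward algorithm; returns p(R_t | e_1:t) for t = len(evidence)."""
    f = {r: PRIOR[r] * emission(evidence[0], r) for r in PRIOR}
    if normalize_each_step:
        f = normalize(f)
    for u in evidence[1:]:
        f = {r: emission(u, r) * sum(TRANS[prev][r] * f[prev] for prev in f) for r in f}
        if normalize_each_step:
            f = normalize(f)
    return normalize(f)


evidence = [True, True, False]               # ASSUMPTION: a hypothetical observation sequence
as_we_go = forward(evidence, normalize_each_step=True)
at_the_end = forward(evidence, normalize_each_step=False)
print(as_we_go, at_the_end)
```

In practice, normalizing as we go also keeps the numbers from underflowing on long evidence sequences, since the unnormalized joint shrinks with every observation.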
![Page 16: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/16.jpg)
Example HMM
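Transition table (probability that Rt is true):

| Rt-1 | p(Rt) |
|------|-------|
| t    | 0.7   |
| f    | 0.3   |

Emission table (probability that Ut is true):

| Rt | p(Ut) |
|----|-------|
| t  | 0.9   |
| f  | 0.2   |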
![Page 17: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/17.jpg)
Observe and time elapse
Observe
Time elapse and renormalize
• Want to know B(Rain2)=p(Rain2|+u1,+u2)
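The computation of B(Rain2) can be traced step by step with the tables from the Example HMM slide. The slide's figure with the starting belief is not available in the text, so the uniform distribution over Rain1 below is an assumption, as are the function names.

```python
TRANSITION = {True: {True: 0.7, False: 0.3}, False: {True: 0.3, False: 0.7}}  # p(R_t | R_{t-1})
P_UMBRELLA = {True: 0.9, False: 0.2}                                          # p(+u | R_t)


def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}


def observe(belief, umbrella_seen):
    """Reweight by p(u_t | R_t), then renormalize."""
    return normalize({
        r: (P_UMBRELLA[r] if umbrella_seen else 1.0 - P_UMBRELLA[r]) * belief[r]
        for r in belief
    })


def elapse_time(belief):
    """Push the belief through the transition model for one step."""
    return {r: sum(TRANSITION[prev][r] * belief[prev] for prev in belief) for r in belief}


prior_rain1 = {True: 0.5, False: 0.5}   # ASSUMPTION: uniform belief before any evidence
b1 = observe(prior_rain1, True)         # B(Rain1)  = p(Rain1 | +u1)
b2_pred = elapse_time(b1)               # B'(Rain2) = p(Rain2 | +u1)
b2 = observe(b2_pred, True)             # B(Rain2)  = p(Rain2 | +u1, +u2)
print(b1[True], b2_pred[True], b2[True])
```

Under these assumptions the probability of rain rises after the first umbrella, drifts back toward 0.5 as time elapses, and rises again after the second umbrella.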
![Page 18: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/18.jpg)
Online Belief Updates
• Each time step, we start with B(Xt-1) = p(Xt-1 | previous evidence)
• Elapse of time
B’(Xt)=Σxt-1 p(Xt|xt-1)B(xt-1)
• Observe
B(Xt) ∝p(et|Xt)B’(Xt)
• Renormalize B(Xt)
• Problem: space is |X| and time is |X|^2 per time step
– what if the state is continuous?
![Page 19: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/19.jpg)
Continuous probability space
• Real-world robot localization
![Page 20: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/20.jpg)
Sampling
![Page 21: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/21.jpg)
Approximate Inference
• Sampling is a hot topic in machine learning, and it’s really simple
• Basic idea:
– Draw N samples from a sampling distribution S
– Compute an approximate posterior probability
– Show this converges to the true probability P
• Why sample?
– Learning: get samples from a distribution you don’t know
– Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)
![Page 22: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/22.jpg)
Prior Sampling
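p(C):

| +c  | -c  |
|-----|-----|
| 0.5 | 0.5 |

p(S|C):

| C  | +s  | -s  |
|----|-----|-----|
| +c | 0.1 | 0.9 |
| -c | 0.5 | 0.5 |

p(R|C):

| C  | +r  | -r  |
|----|-----|-----|
| +c | 0.8 | 0.2 |
| -c | 0.2 | 0.8 |

p(W|S,R):

| S  | R  | +w   | -w   |
|----|----|------|------|
| +s | +r | 0.99 | 0.01 |
| +s | -r | 0.90 | 0.10 |
| -s | +r | 0.90 | 0.10 |
| -s | -r | 0.01 | 0.99 |

Samples:
+c, -s, +r, +w
-c, +s, -r, +w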
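Prior sampling on this network can be sketched as follows: sample each variable in topological order (C, then S and R, then W) from its conditional distribution given the values already sampled for its parents. The probabilities are the ones in the tables on this slide; the function name, the fixed seed, and the sample size are assumptions chosen for illustration.

```python
import random

# CPT entries from the slide, stored as the probability of the "+" value.
P_C = 0.5                                   # p(+c)
P_S = {True: 0.1, False: 0.5}               # p(+s | C)
P_R = {True: 0.8, False: 0.2}               # p(+r | C)
P_W = {                                     # p(+w | S, R)
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.90,
    (False, False): 0.01,
}


def prior_sample(rng):
    """Draw one full assignment (c, s, r, w); True stands for the '+' value."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w


rng = random.Random(0)                      # fixed seed so the run is repeatable
samples = [prior_sample(rng) for _ in range(100_000)]
estimate_w = sum(1 for s in samples if s[3]) / len(samples)
print(estimate_w)                           # sample-based estimate of p(+w)
```

For comparison, summing the joint over C, S and R with these tables gives p(+w) = 0.65 exactly, so the estimate should land close to that value.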
![Page 23: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/23.jpg)
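Prior Sampling (w/o evidence)
• This process generates samples with probability
$$S_{PS}(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{Parents}(X_i)) = p(x_1, \ldots, x_n)$$
i.e. the BN’s joint probability
• Let the number of samples of an event be $N_{PS}(x_1, \ldots, x_n)$
• Then
$$\lim_{N \to \infty} \hat{p}(x_1, \ldots, x_n) = \lim_{N \to \infty} \frac{N_{PS}(x_1, \ldots, x_n)}{N} = S_{PS}(x_1, \ldots, x_n) = p(x_1, \ldots, x_n)$$
• I.e., the sampling procedure is consistent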
![Page 24: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/24.jpg)
Example
• We’ll get a bunch of samples from the BN:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
• If we want to know p(W)
– We have counts <+w:4, -w:1>
– Normalize to get p(W) = <+w:0.8, -w:0.2>
– This will get closer to the true distribution with more samples
– Can estimate anything else, too
– What about p(C|+w)? p(C|+r,+w)? p(C|-r,-w)?
– Fast: can use fewer samples if less time (what’s the drawback?)
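The counting on this slide can be reproduced directly from the five samples. The helper `estimate` and the tuple encoding of the samples are hypothetical; the samples themselves are the ones listed on the slide.

```python
from collections import Counter

VARS = ("C", "S", "R", "W")
SAMPLES = [
    ("+c", "-s", "+r", "+w"),
    ("+c", "+s", "+r", "+w"),
    ("-c", "+s", "+r", "-w"),
    ("+c", "-s", "+r", "+w"),
    ("-c", "-s", "-r", "+w"),
]


def estimate(query, evidence=()):
    """Estimate p(query | evidence) by counting; None if no sample matches the evidence."""
    idx = VARS.index(query)
    kept = [s for s in SAMPLES if all(value in s for value in evidence)]
    if not kept:
        return None
    counts = Counter(s[idx] for s in kept)
    return {value: n / len(kept) for value, n in counts.items()}


print(estimate("W"))                    # {'+w': 0.8, '-w': 0.2}
print(estimate("C", ("+w",)))           # {'+c': 0.75, '-c': 0.25}
print(estimate("C", ("+r", "+w")))      # {'+c': 1.0}
print(estimate("C", ("-r", "-w")))      # None: no sample has -r and -w
```

The last two queries illustrate the drawback of using few samples: with only three matching samples the estimate of p(C|+r,+w) puts all mass on +c, and p(C|-r,-w) cannot be estimated at all.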
![Page 25: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/25.jpg)
Rejection Sampling
• Let’s say we want p(C)
– No point keeping all samples around
– Just tally counts of C as we go
• Let’s say we want p(C|+s)
– Same thing: tally C outcomes, but ignore (reject) samples which don’t have S=+s
– This is called rejection sampling
– It is also consistent for conditional probabilities (i.e., correct in the limit)
Samples:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
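Rejection sampling for p(C|+s) can be sketched by drawing prior samples from the network and keeping a running tally of C only for samples with S=+s. The CPT values are the ones from the Prior Sampling slide; the function names, seed, and sample size are assumptions.

```python
import random

# CPT entries from the Prior Sampling slide (probability of the "+" value).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.01}


def prior_sample(rng):
    """Draw one full assignment (c, s, r, w) in topological order."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w


def rejection_sample_c_given_s(n, seed=0):
    """Estimate p(C | +s): tally C as we go, rejecting samples without S=+s."""
    rng = random.Random(seed)
    tally = {True: 0, False: 0}
    for _ in range(n):
        c, s, _r, _w = prior_sample(rng)
        if not s:
            continue                     # reject: sample disagrees with the evidence
        tally[c] += 1
    accepted = tally[True] + tally[False]
    return {c: count / accepted for c, count in tally.items()}, accepted


estimate, accepted = rejection_sample_c_given_s(100_000)
print(estimate[True], accepted)
```

With these tables p(+s) = 0.5·0.1 + 0.5·0.5 = 0.3, so roughly 70% of the samples are thrown away, and the exact answer for comparison is p(+c|+s) = 0.05 / 0.3 = 1/6.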
![Page 26: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/26.jpg)
Likelihood Weighting
• Problem with rejection sampling:
– If evidence is unlikely, you reject a lot of samples
– You don’t exploit your evidence as you sample
– Consider p(B|+a)
Samples: (-b, -a), (-b, -a), (-b, -a), (-b, -a), (+b, +a)
• Idea: fix evidence variables and sample the rest
Samples: (-b, +a), (-b, +a), (-b, +a), (-b, +a), (+b, +a)
• Problem: sample distribution not consistent!
• Solution: weight by probability of evidence given parents
![Page 27: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/27.jpg)
Likelihood Weighting
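p(C):

| +c  | -c  |
|-----|-----|
| 0.5 | 0.5 |

p(S|C):

| C  | +s  | -s  |
|----|-----|-----|
| +c | 0.1 | 0.9 |
| -c | 0.5 | 0.5 |

p(R|C):

| C  | +r  | -r  |
|----|-----|-----|
| +c | 0.8 | 0.2 |
| -c | 0.2 | 0.8 |

p(W|S,R):

| S  | R  | +w   | -w   |
|----|----|------|------|
| +s | +r | 0.99 | 0.01 |
| +s | -r | 0.90 | 0.10 |
| -s | +r | 0.90 | 0.10 |
| -s | -r | 0.01 | 0.99 |

Samples:
+c, +s, +r, +w
…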
![Page 28: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/28.jpg)
Likelihood Weighting
• Sampling distribution if z sampled and e fixed evidence
$$S_{WS}(z, e) = \prod_{i=1}^{l} p(z_i \mid \mathrm{Parents}(Z_i))$$
• Now, samples have weights
$$w(z, e) = \prod_{i=1}^{m} p(e_i \mid \mathrm{Parents}(E_i))$$
• Together, weighted sampling distribution is consistent
$$S_{WS}(z, e)\, w(z, e) = \prod_{i=1}^{l} p(z_i \mid \mathrm{Parents}(Z_i)) \prod_{i=1}^{m} p(e_i \mid \mathrm{Parents}(E_i)) = p(z, e)$$
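Likelihood weighting on the cloudy/sprinkler/rain/wet-grass network can be sketched as below. The choice of evidence (S=+s and W=+w) is an assumption made for illustration; it is consistent with the sample shown on the earlier slide but is not stated in the text. The CPT values come from the slides, and the function names, seed, and sample size are hypothetical.

```python
import random

# CPT entries from the slides (probability of the "+" value).
P_C = 0.5
P_S = {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.01}


def weighted_sample(rng):
    """Sample the non-evidence variables C and R; fix S=+s, W=+w and accumulate the weight."""
    weight = 1.0
    c = rng.random() < P_C               # sampled: p(C)
    weight *= P_S[c]                     # evidence S=+s: multiply by p(+s | c)
    r = rng.random() < P_R[c]            # sampled: p(R | c)
    weight *= P_W[(True, r)]             # evidence W=+w: multiply by p(+w | +s, r)
    return c, r, weight


def estimate_c_given_s_w(n, seed=0):
    """Estimate p(C | +s, +w) as a ratio of summed weights."""
    rng = random.Random(seed)
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        c, _r, w = weighted_sample(rng)
        totals[c] += w
    z = totals[True] + totals[False]
    return {c: t / z for c, t in totals.items()}


estimate = estimate_c_given_s_w(100_000)
print(estimate[True])
```

Unlike rejection sampling, every sample contributes here. For comparison, exact enumeration with these tables gives p(+c|+s,+w) = 0.0486 / 0.2781, about 0.175.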
![Page 29: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/29.jpg)
Ghostbusters HMM
– p(X1) = uniform
– p(X|X’) = usually move clockwise, but sometimes move in a random direction or stay in place
– p(Rij|X) = same sensor model as before: red means close, green means far away.
p(X1):
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

p(X|X’=<1,2>):
1/6 1/6 1/2
0 1/6 0
0 0 0
![Page 30: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/30.jpg)
Example: Passage of Time
• As time passes, uncertainty “accumulates”
• Transition model: ghosts usually go clockwise
[Figure: belief over the grid at T = 1, T = 2, T = 5]
![Page 31: Lirong Xia Hidden Markov Models Tue, March 28, 2014](https://reader038.vdocuments.net/reader038/viewer/2022102906/56649cd65503460f9499d58e/html5/thumbnails/31.jpg)
Example: Observation
• As we get observations, beliefs get reweighted, uncertainty “decreases”
$$B(X) \propto p(e \mid X)\, B'(X)$$
[Figure: belief over the grid before observation and after observation]