The University of Manchester · studentnet.cs.manchester.ac.uk/ugt/COMP14112/...
TRANSCRIPT
COMP14112
Lecture 11
Markov Chains, HMMs and Speech
Revision
What have we covered in the speech lectures?
• Extracting features from raw speech data
• Classification and the naive Bayes classifier
• Training
• Sequence data
• Markov models
• Hidden Markov models
1. Features and data
• We have to represent sensory information in a useful way: sound waves and robot sensor data are two examples.
• Good “features” are domain specific, but we often end up with a vector of numbers called a feature vector or data point
• For speech we use MFCC features derived from segmented data
• Methods for processing the feature vectors are general
• Probabilistic approaches are popular – not the only approach, but certainly a leading one
2. Classification
• Given a data point x, what class does it belong to?
• You constructed probabilistic classifiers in Labs 2 and 3 to distinguish between “yes” and “no”
• You should know what makes a good classifier – how would you assess its performance?
• Lots of applications – one of the key AI tools
2.1 Probabilistic classification
• For a data point x …
– Estimate the probability density p(x|Ci) for each class i
– Apply Bayes’ theorem:

p(C1|x) = p(x|C1) p(C1) / Σi p(x|Ci) p(Ci)

– Apply classification rule: for two classes, p(C1|x) > 0.5 ⇒ class of x = C1
• Multiple classes?
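This three-step recipe can be sketched directly in code. A minimal sketch, assuming 1-D normal class densities; the means, variances and priors below are illustrative, not the lab's values:

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(x, params, priors):
    """Bayes' theorem: p(Ci|x) = p(x|Ci)p(Ci) / sum_j p(x|Cj)p(Cj)."""
    joint = [normal_pdf(x, m, v) * p for (m, v), p in zip(params, priors)]
    total = sum(joint)
    return [j / total for j in joint]

params = [(0.0, 1.0), (3.0, 1.0)]   # (mean, variance) for classes C1, C2
priors = [0.5, 0.5]

post = posterior(1.0, params, priors)
# Classification rule: assign x to C1 if p(C1|x) > 0.5
print(post, "class =", "C1" if post[0] > 0.5 else "C2")
```

Note that the normalising sum in the denominator is what makes the posteriors over all classes sum to one.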
2.2 Naïve Bayes classifier
• The naïve Bayes assumption can be used if data are vectors
– Feature vector components are conditionally independent given the class
– See lecture notes and Lab 2 for application to time-averaged MFCC features derived from speech
– Examples sheet 6 for discrete valued data example

p(x|Ci) = p(x1|Ci) p(x2|Ci) … p(xd|Ci)
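The factorisation above is just a product of one-dimensional densities, one per feature component. A minimal sketch with normal component densities (the feature vector and class parameters are illustrative, not the lab's):

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def naive_bayes_likelihood(x, means, variances):
    """p(x|Ci) = p(x1|Ci) * p(x2|Ci) * ... * p(xd|Ci): components are
    treated as conditionally independent given the class."""
    p = 1.0
    for xd, m, v in zip(x, means, variances):
        p *= normal_pdf(xd, m, v)
    return p

x = [0.2, -0.1, 0.4]       # a 3-component feature vector (illustrative)
means = [0.0, 0.0, 0.5]    # class-conditional means for one class
variances = [1.0, 1.0, 1.0]
print(naive_bayes_likelihood(x, means, variances))
```

In practice (e.g. with many MFCC components) this product is usually computed as a sum of log densities to avoid underflow.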
2.3 1-D Classification
• You’ve seen some example classification rules
• For 1-D data, a single feature x
2.4 n-D Classification
• For 2-D data with feature vector x = [x1, x2]
3. Training
• When we fit a probability density or probabilistic model to data, we have an example of training
• In the Labs, you’ve seen data being used to estimate parameters of a normal distribution and an HMM
• The data that’s used for this is training data
• Training is fundamental to machine learning, a large and important area of research in CS
• NB the performance of the Lab classifier would have improved with more training data
4. Sequence data
• In some cases the data arrives in a sequence
– We used speech data
• Other examples
– Video
– Sequential games
• Anything real-time
– DNA sequence data
5. Markov chains
• You should know
– Definition of a first order Markov process:

p(st|s1, s2, …, st−1) = p(st|st−1)

– Parameters are transition probabilities
– Normalisation condition
– Can be represented as a directed graph or a transition matrix
– Can be unfolded in time to show all paths of a fixed length (Examples sheet 7 and past paper)
– How to do a simple probabilistic calculation
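The points above can be sketched concretely: a transition matrix, the normalisation condition (each row sums to 1), and a simple sequence-probability calculation using the first-order Markov property. The states and numbers are illustrative, not the lecture's example:

```python
# A Markov chain as a transition matrix {state: {next_state: prob}}.
states = ["a", "b", "c"]
T = {
    "a": {"a": 0.1, "b": 0.6, "c": 0.3},
    "b": {"a": 0.4, "b": 0.4, "c": 0.2},
    "c": {"a": 0.5, "b": 0.0, "c": 0.5},
}

# Normalisation condition: every row of the transition matrix sums to 1.
for s in states:
    assert abs(sum(T[s].values()) - 1.0) < 1e-9

def sequence_probability(seq, T, initial):
    """p(s1,...,sn) = p(s1) * product of p(st|st-1): first-order Markov."""
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= T[prev][cur]
    return p

initial = {"a": 1.0, "b": 0.0, "c": 0.0}  # chain always starts in "a"
print(sequence_probability(["a", "b", "c"], T, initial))  # 1.0 * 0.6 * 0.2
```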
5. Markov chains
[Diagram: Markov chain with states “hh”, “ay” and “b” between START and END; the transition probabilities shown include 0.5, 0.5, 0.5, 0.5 and 0.25, with some missing.]

• What are the missing numbers?
• Unroll the model for exactly three time steps
• What is the probability that the sequence will be “hi”?
• What is the probability that a sequence of length 3 will be “hi”?
5. Markov chains
• Naïve application of probabilistic calculations is prohibitively slow in Markov chains
• In the lectures we saw a more efficient method based on recursion (Examples sheet 8)
• Don’t need to remember the recursive algorithm used there, but should be able to apply it to a similar example
• Computationally efficient algorithms are very important – imagine what happens when a problem is scaled up.
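The efficiency point can be sketched with a generic forward recursion (not necessarily the exact algorithm from the lectures): instead of summing over all |S|^t paths of length t, carry forward one probability per state at each step. The chain below is illustrative:

```python
def state_probabilities(T, initial, steps):
    """Compute p(st = s) for every state s after `steps` transitions.
    Recursion: p(st=s) = sum over s' of p(st-1=s') * p(s|s'),
    so the cost is one pass over the states per time step rather
    than an enumeration of every path."""
    probs = dict(initial)
    for _ in range(steps):
        probs = {s: sum(probs[prev] * T[prev][s] for prev in T) for s in T}
    return probs

T = {"x": {"x": 0.9, "y": 0.1},   # illustrative 2-state transition matrix
     "y": {"x": 0.2, "y": 0.8}}
initial = {"x": 1.0, "y": 0.0}

print(state_probabilities(T, initial, 2))
```

With n states and t steps this is O(n²t) work, versus O(nᵗ) paths for the naive enumeration; that gap is exactly why scaling up makes the recursive method essential.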
6. Hidden Markov models
• HMMs have two parts
– Markov chain model of states. The parameters of the Markov chain model are the transition probabilities: p(st|st−1)
– Emission probability distribution for feature vectors: p(xt|st)
– In Lab 3 this is a normal density parameterised by mean and variance for each component of x
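The two parts can be captured in a minimal container class. A sketch only, with illustrative state names and parameters (not the lab's model); the emission density is a product of independent per-component normals, as described above:

```python
import math

class HMM:
    def __init__(self, transitions, emissions):
        self.transitions = transitions   # {state: {state: prob}}, p(st|st-1)
        self.emissions = emissions       # {state: (means, variances)}

    def emission_density(self, state, x):
        """p(xt|st): product of independent normal densities,
        one per component of the feature vector x."""
        means, variances = self.emissions[state]
        p = 1.0
        for xd, m, v in zip(x, means, variances):
            p *= math.exp(-(xd - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        return p

hmm = HMM(
    transitions={"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.1, "s2": 0.9}},
    emissions={"s1": ([0.0, 0.0], [1.0, 1.0]), "s2": ([2.0, 2.0], [1.0, 1.0])},
)
print(hmm.emission_density("s1", [0.0, 0.0]))  # density at the s1 mean
```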
6. Hidden Markov models
• In Lab 3 you explored three things
– Training: constructing an HMM from labelled data (what is labelled data?)
– Classification: using the Forward algorithm to calculate p(x1,x2,…,xT|Ci) and plugging it into Bayes’ theorem
– Decoding: using the Viterbi algorithm to find the most likely path through the hidden states
• You should be able to understand the tasks, but don’t have to recall details of the algorithms
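To make the decoding task concrete, here is a Viterbi sketch on a toy HMM with discrete emissions (simplified from the lab's Gaussian emissions; the state names, observation symbols and all probabilities are illustrative). It finds the most likely hidden state path for an observation sequence:

```python
def viterbi(obs, states, start, trans, emit):
    """delta[t][s] = probability of the best path ending in state s at
    time t; back[t][s] remembers which previous state achieved it."""
    delta = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        prev = delta[-1]
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda q: prev[q] * trans[q][s])
            ptr[s] = best_prev
            col[s] = prev[best_prev] * trans[best_prev][s] * emit[s][o]
        delta.append(col)
        back.append(ptr)
    # Trace the best path backwards from the most likely final state.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

states = ["sil", "word"]
start = {"sil": 0.9, "word": 0.1}
trans = {"sil": {"sil": 0.8, "word": 0.2}, "word": {"sil": 0.2, "word": 0.8}}
emit = {"sil": {"quiet": 0.9, "loud": 0.1}, "word": {"quiet": 0.2, "loud": 0.8}}

print(viterbi(["quiet", "loud", "loud"], states, start, trans, emit))
```

The max-and-backpointer structure mirrors the Forward algorithm's sum; replacing `max` with a sum over previous states would give p(x1,…,xT) instead of the best path.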
6. Hidden Markov models
• Simple example of decoding (Lab 3) is removing the silence from speech signals
• The data without silence is easier to classify (as in Lab 2)

[Diagram: HMM with states “sil”, “yes” and “no” between START and STOP; the transition probabilities shown include 1.0, 0.96, 0.04, 0.02, 0.01 and 0.99.]
7. Applications to speech
• Survey of tasks and performance (Examples
sheet 5)
• Segmentation and MFCC features
• Phonemes and phoneme HMMs
• Triphones
• Decoding speech
• Simple language models
Other applications
• These methods can be generalised to many applications
– TrueSkill ranking system in Xbox Live
• http://research.microsoft.com/mlp/trueskill/
– Vision applications
• http://videolectures.net/mlss09uk_blake_cv
– Speech
• http://videolectures.net/mlss09uk_bishop_ibi
– Medicine
• Probabilistic “graphical models” to update probability of illness given symptoms
– Biology
• Standard way to determine gene function and location of genes in DNA sequence
How to revise
• Work through Example class sheets and past paper(s)
• Make sure you understand the relationship between the labs and the notes
• Notes, lectures and example sheet solutions are on the course website:
http://intranet.cs.man.ac.uk/csonly/courses/COMP10412/