Ling 570 Day #3: Stemming, Probabilistic Automata, Markov Chains/Models
TRANSCRIPT
MORPHOLOGY AND FSTS
FST as Translator
FR: ce bill met de le baume sur une blessure
EN: this bill puts balm on a sore wound
Last Class
FST Application Examples
• Case folding: He said → he said
• Tokenization: “He ran.” → “ He ran . “
• POS tagging: They can fish → PRO VERB NOUN
FST Application Examples
• Pronunciation: B AH T EH R → B AH DX EH R
• Morphological generation: Fox + s → Foxes
• Morphological analysis: cats → cat + s
Roadmap
• Motivation:
  – Representing words
• A little (mostly English) Morphology
• Stemming
The Lexicon
• Goal: Represent all the words in a language
• Approach?
  – Enumerate all words?
    • Doable for English
      – Typical for ASR (Automatic Speech Recognition)
      – English is morphologically relatively impoverished
    • Other languages? Wildly impractical
      » Turkish: 40,000 forms/verb, e.g.
        uygarlaştıramadıklarımızdanmışsınızcasına
        “(behaving) as if you are among those whom we could not civilize”
Morphological Parsing
• Goal: Take a surface word form and generate a linguistic structure of component morphemes
• A morpheme is the minimal meaning-bearing unit in a language.
  – Stem: the morpheme that forms the central meaning unit in a word
  – Affix: prefix, suffix, infix, circumfix
    • Prefix: e.g., possible → impossible
    • Suffix: e.g., walk → walking
    • Infix: e.g., hingi → humingi (Tagalog)
    • Circumfix: e.g., sagen → gesagt (German)
Surface Variation & Morphology
• Searching (a la Bing) for documents about:
  – Televised sports
• Many possible surface forms:
  – Televised, television, televise, …
  – Sports, sport, sporting, …
• How can we match?
  – Convert surface forms to a common base form
    • Stemming or morphological analysis
Two Perspectives
• Stemming:
  – writing → write (or writ)
  – Beijing → Beije
• Morphological Analysis:
  – writing → write+V+prog
  – cats → cat+N+pl
  – writes → write+V+3rdpers+Sg
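The analysis direction can be sketched in a few lines. This is a toy illustration, not the FST machinery the course develops; the three-word lexicon and the suffix rules are invented for the example:

```python
# Toy morphological analyzer: a tiny hand-built lexicon plus suffix rules
# that undo spelling changes (e.g. e-restoration: writ -> write).
LEXICON = {"write", "cat", "walk"}

def analyze(word):
    """Return candidate morphological analyses for a surface form."""
    analyses = []
    if word.endswith("ing"):
        stem = word[: -len("ing")]
        for candidate in (stem, stem + "e"):  # try e-restoration
            if candidate in LEXICON:
                analyses.append(candidate + "+V+prog")
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = word[:-1]
        # -s is ambiguous between noun plural and 3rd-person-singular verb
        analyses.append(stem + "+N+pl")
        analyses.append(stem + "+V+3sg")
    return analyses
```

So analyze("writing") yields ["write+V+prog"], while analyze("cats") and analyze("writes") each return both -s analyses: unlike a stemmer, an analyzer must surface the ambiguity rather than pick one.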
Stemming
• Simple type of morphological analysis
• Supports matching using base form
  – e.g. Television, televised, televising → televise
• Most popular: Porter stemmer
• Task: Given surface form, produce base form
  – Typically, removes suffixes
• Model:
  – Rule cascade
  – No lexicon!
Stemming
• Used in many NLP/IR applications
• For building equivalence classes:
  – Connect, Connected, Connecting, Connection, Connections (same class; suffixes irrelevant)
• Porter Stemmer: simple and efficient
  – Website: http://www.tartarus.org/~martin/PorterStemmer
  – On patas: ~/dropbox/12-13/570/porter
Porter Stemmer
• Rule cascade:
  – Rule form: (condition) PATT1 → PATT2
    • E.g. stem contains vowel: ING → ε
    • ATIONAL → ATE
  – Rule partial order:
    • Step 1a: -s
    • Step 1b: -ed, -ing
    • Steps 2-4: derivational suffixes
    • Step 5: cleanup
• Pros: Simple, fast, buildable for a variety of languages
• Cons: Overaggressive and underaggressive
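The cascade idea can be sketched directly. This is a minimal sketch using a hand-picked subset of Porter-style rules, not the real Porter rule set (which has many more rules and a more careful vowel/consonant "measure" condition):

```python
import re

def has_vowel(stem):
    # crude vowel check; the real Porter stemmer also treats 'y' contextually
    return bool(re.search(r"[aeiou]", stem))

# Each rule: (suffix, replacement, condition on the remaining stem).
STEP_1A = [("sses", "ss", None), ("ies", "i", None), ("ss", "ss", None), ("s", "", None)]
STEP_1B = [("ing", "", has_vowel), ("ed", "", has_vowel)]
STEP_2 = [("ational", "ate", None), ("izer", "ize", None)]

def apply_step(word, rules):
    for suffix, replacement, condition in rules:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if condition is None or condition(stem):
                return stem + replacement
            return word  # longest matching suffix decides, even if its condition fails
    return word

def stem(word):
    # cascade: each step's output feeds the next step
    for step in (STEP_1A, STEP_1B, STEP_2):
        word = apply_step(word, step)
    return word
```

With these rules, stem("cats") gives "cat", stem("walking") gives "walk", stem("relational") gives "relate", and stem("sing") is left alone because the remaining stem "s" contains no vowel; the `("ss", "ss", ...)` rule exists only to block the bare `-s` rule on words like "caress".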
STEMMING & EVAL
Evaluating Performance
• Measures of stemming performance rely on metrics used in IR:
  – Precision: the proportion of selected items the system got right
    • precision = tp / (tp + fp)
    • # of correct answers / # of answers given
  – Recall: the proportion of the target items the system selected
    • recall = tp / (tp + fn)
    • # of correct answers / # of possible correct answers
  – Rule of thumb: as precision increases, recall drops, and vice versa
• These metrics are widely adopted in statistical NLP
Precision and Recall
• Take a given stemming task:
  – Suppose there are 100 words that could be stemmed
  – A stemmer gets 52 of these right (tp)
  – But it inadvertently stems 10 others (fp)

Precision = 52 / (52 + 10) = .84
Recall = 52 / (52 + 48) = .52

Note: it is easy to get a precision of 1.0. Why?
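The slide's numbers can be checked directly from the count definitions:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and balanced F-measure from raw counts."""
    precision = tp / (tp + fp)   # correct answers / answers given
    recall = tp / (tp + fn)      # correct answers / possible correct answers
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# The stemming example above: 100 stemmable words, 52 stemmed
# correctly (tp), 48 missed (fn), 10 stemmed that should not have been (fp).
p, r, f = precision_recall_f1(tp=52, fp=10, fn=48)
print(round(p, 2), round(r, 2))  # 0.84 0.52
```

As for the note above: a system that answers only when it is certain gives few answers, almost all correct, so precision approaches 1.0 while recall collapses, which is why the two are reported together (or combined into F).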
Tokenizer comparison: four tokenizers against a baseline tokenization of the sentence beginning “After coming close to a partial settlement a year ago , shareholders who filed civil suits against Ivan F. Boesky & Co. L.P. …” (the tokenizers differ mainly in how they split “F.”, “Co.”, “L.P.”, and “Drexel’s”):

Tokenizer    Precision   Recall      F-Measure
Tokenizer 1  0.827586    0.888889    0.858237548
Tokenizer 2  0.961538    0.925926    0.943732194
Tokenizer 3  0.928571    0.962963    0.945767196
Tokenizer 4  1           1           1
WEIGHTED AUTOMATA & MARKOV CHAINS
PFA Definition
• A Probabilistic Finite-State Automaton is a 6-tuple:
  – A set of states Q
  – An alphabet Σ
  – A set of transitions: δ ⊆ Q × Σ × Q
  – Initial state probabilities: I: Q → R+
  – Transition probabilities: P: δ → R+
  – Final state probabilities: F: Q → R+
PFA Recap
• Subject to constraints:
  – Σ_{q∈Q} I(q) = 1
  – For each state q: F(q) + Σ_{a∈Σ, q′∈Q} P(q, a, q′) = 1
• Computing sequence probabilities: multiply the initial, transition, and final probabilities along each accepting path, and sum over all such paths
PFA Example
• Example:
  – I(q0) = 1, I(q1) = 0
  – F(q0) = 0, F(q1) = 0.2
  – P(q0, a, q1) = 1; P(q1, b, q1) = 0.8
  – P(ab^n) = I(q0) · P(q0, a, q1) · P(q1, b, q1)^n · F(q1) = 0.8^n · 0.2
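A minimal sketch of how a PFA scores a string: a forward pass that sums over all paths (for this example the path is unique, so the sum has one term). The automaton encoded below is the one from this slide:

```python
def pfa_string_prob(string, states, I, F, T):
    """Probability a PFA assigns to a string, summed over all paths.
    I: initial probabilities, F: final probabilities,
    T: {(state, symbol, state): probability} transition table."""
    # alpha[q] = probability of having read the prefix so far and being in q
    alpha = {q: I.get(q, 0.0) for q in states}
    for symbol in string:
        step = {q: 0.0 for q in states}
        for (q, a, r), p in T.items():
            if a == symbol:
                step[r] += alpha[q] * p
        alpha = step
    return sum(alpha[q] * F.get(q, 0.0) for q in states)

# The slide's example automaton:
states = ["q0", "q1"]
I = {"q0": 1.0}
F = {"q1": 0.2}
T = {("q0", "a", "q1"): 1.0, ("q1", "b", "q1"): 0.8}

print(round(pfa_string_prob("ab", states, I, F, T), 4))  # 0.16
```

P("a") = 0.8^0 · 0.2 = 0.2, P("ab") = 0.8 · 0.2 = 0.16, and so on, matching the closed form above.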
Markov Chain
• A Markov Chain is a special case of a PFA in which the sequence uniquely determines which states the automaton will go through.
• Markov Chains cannot represent inherently ambiguous problems
  – They can assign probabilities to unambiguous sequences
Markov Chain for Words
Markov Chain for Pronunciation
• Observations: 0/1
Markov Chain for Walking through Groningen
Markov Chain: “First-order observable Markov Model”
• A set of states
  – Q = q1, q2, …, qN; the state at time t is qt
• Transition probabilities:
  – A set of probabilities A = a01, a02, …, an1, …, ann
  – Each aij represents the probability of transitioning from state i to state j
  – The set of these is the transition probability matrix A
• Distinguished start and final states: q0, qF
• The current state depends only on the previous state
Markov Models
• The parameters of an MM can be arranged in matrices
• The A-matrix for the set of transition probabilities:

      A = [ p11 p12 … p1j
            p21 p22 … p2j
            …             ]

• What’s missing? Starting probabilities.
Markov Models
• Exercise:
  – Build the transition probability matrix over this set of data:
      The duck died.
      The car killed the duck.
      The duck died under her car.
      We duck under the car.
      We retrieve the poor duck.
  – Build the starting probability matrix
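One way to sketch the exercise in code, treating the final period as a token, case-folding, and adding a start symbol <s> so the starting probabilities fall out of the same counts (these modeling choices are mine, not the slide's):

```python
from collections import Counter, defaultdict

SENTENCES = [
    "The duck died .",
    "The car killed the duck .",
    "The duck died under her car .",
    "We duck under the car .",
    "We retrieve the poor duck .",
]

# Count bigram transitions, with <s> marking the start of each sentence.
transition_counts = defaultdict(Counter)
for sentence in SENTENCES:
    tokens = ["<s>"] + sentence.lower().split()
    for prev, nxt in zip(tokens, tokens[1:]):
        transition_counts[prev][nxt] += 1

# Normalize each row into a probability distribution.
A = {
    prev: {nxt: count / sum(row.values()) for nxt, count in row.items()}
    for prev, row in transition_counts.items()
}

print(A["<s>"])  # starting probabilities: {'the': 0.6, 'we': 0.4}
print(A["the"])  # e.g. P(duck | the) = 3/6 = 0.5
```

Each row of A sums to 1, so the <s> row is exactly the starting probability matrix the exercise asks for.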
Markov Models
• Exercise:
  – Given your model, what’s the probability of each of the following sentences?
      The duck died under her car.
      We duck under the car.
      The duck under the car.
      We retrieve killed the duck.
      We the poor duck died.
      We retrieve the poor duck under the car.
  – For a given start state (The, We), which of the above is the most likely string?
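A sketch of the scoring half of the exercise: train bigram counts on the five data sentences, then multiply transition probabilities along each candidate sentence (modeling choices for this sketch: case-folding, "." as a token, and a start symbol <s>):

```python
from collections import Counter, defaultdict

SENTENCES = [
    "The duck died .",
    "The car killed the duck .",
    "The duck died under her car .",
    "We duck under the car .",
    "We retrieve the poor duck .",
]

counts = defaultdict(Counter)
for sentence in SENTENCES:
    tokens = ["<s>"] + sentence.lower().split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def prob(sentence):
    """Chain-rule probability of a sentence under the bigram Markov chain."""
    tokens = ["<s>"] + sentence.lower().split()
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        row = counts[prev]
        if row[nxt] == 0:  # unseen transition: probability 0
            return 0.0
        p *= row[nxt] / sum(row.values())
    return p

print(prob("The duck died under her car ."))  # 0.02
print(prob("We retrieve killed the duck ."))  # 0.0: 'retrieve -> killed' never occurs
```

Any sentence containing an unseen transition gets probability 0, which is exactly why several of the candidate strings above score zero under this chain.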
HMMs
• Next class