NLP
Introduction to NLP
Statistical POS Tagging
Part-of-Speech Tagging Methods
• Rule-based
• Stochastic
  – HMM (generative)
  – Maximum Entropy MM (discriminative)
• Transformation-based
HMM Tagging
• T* = argmax_T P(T|W)
  – where T = t_1, t_2, …, t_n is a tag sequence for the word sequence W = w_1, w_2, …, w_n
• By Bayes' theorem
  – P(T|W) = P(T) P(W|T) / P(W)
• Thus we choose the tag sequence that maximizes the right-hand side of the equation
  – P(W) is the same for every candidate T, so it can be ignored
  – P(T) is called the prior; P(W|T) is called the likelihood
HMM Tagging
• Complete formula (chain rule)
  – P(T) P(W|T) = Π_i P(w_i | w_1 t_1 … w_{i-1} t_{i-1} t_i) P(t_i | t_1 … t_{i-2} t_{i-1})
• Simplification 1: a word depends only on its own tag
  – P(W|T) = Π_i P(w_i | t_i)
• Simplification 2: a tag depends only on the previous tag
  – P(T) = Π_i P(t_i | t_{i-1})
• Bigram approximation (see the sketch below)
  – T* = argmax_T P(T|W) = argmax_T Π_i P(w_i | t_i) P(t_i | t_{i-1})
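Enumerating every tag sequence to compute this argmax is exponential; the standard dynamic-programming solution is the Viterbi algorithm. A minimal sketch, assuming `trans[t_prev][t] = P(t|t_prev)` and `emit[t][w] = P(w|t)` are nested probability dicts and `"<s>"` is a hypothetical start-of-sentence tag (these names are illustrative, not from the slides):

```python
def viterbi(words, tags, trans, emit):
    # best[i][t]: probability of the best tag sequence for words[:i+1] ending in t
    # back[i][t]: previous tag on that best sequence
    best = [{} for _ in words]
    back = [{} for _ in words]
    for t in tags:
        best[0][t] = trans.get("<s>", {}).get(t, 0.0) * emit.get(t, {}).get(words[0], 0.0)
        back[0][t] = None
    for i in range(1, len(words)):
        for t in tags:
            e = emit.get(t, {}).get(words[i], 0.0)
            # choose the best previous tag for each current tag
            prev, score = max(
                ((p, best[i - 1][p] * trans.get(p, {}).get(t, 0.0) * e) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score
            back[i][t] = prev
    # trace back from the best final tag
    t = max(tags, key=lambda t: best[-1][t])
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = back[i][t]
        path.append(t)
    return list(reversed(path))
```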
Maximum Likelihood Estimates
• P(NN|JJ) = C(JJ,NN)/C(JJ) = 22301/89401 = .249
• P(this|DT) = C(DT,this)/C(DT) = 7037/103687 = .068
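These are relative-frequency estimates from a tagged training corpus. A minimal sketch of computing both tables, assuming `corpus` is a list of sentences given as (word, tag) pairs (names are illustrative); the resulting `trans` and `emit` plug directly into the Viterbi sketch above:

```python
from collections import Counter, defaultdict

def mle_estimates(corpus):
    # corpus: list of sentences, each a list of (word, tag) pairs
    tag_count = Counter()                # C(t)
    trans_count = defaultdict(Counter)   # C(t_prev, t)
    emit_count = defaultdict(Counter)    # C(t, w)
    for sent in corpus:
        prev = "<s>"                     # hypothetical start-of-sentence tag
        for word, tag in sent:
            tag_count[tag] += 1
            trans_count[prev][tag] += 1
            emit_count[tag][word] += 1
            prev = tag
    # Relative frequencies, e.g. P(NN|JJ) = C(JJ,NN)/C(JJ); no smoothing.
    trans = {p: {t: c / sum(nexts.values()) for t, c in nexts.items()}
             for p, nexts in trans_count.items()}
    emit = {t: {w: c / tag_count[t] for w, c in words.items()}
            for t, words in emit_count.items()}
    return trans, emit
```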
Example
• The/DT rich/JJ like/VBP to/TO travel/VB ./.
Example

Two candidate tag sequences for the same sentence; they differ only in the tag assigned to "travel" (NN vs. VB):

| The | rich | like | to | travel | . |
|-----|------|------|----|--------|---|
| DT  | NN   | VBP  | TO | NN     | . |
| DT  | NN   | VBP  | TO | VB     | . |
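Under the bigram model, all factors shared by the two sequences cancel, so the choice comes down to the factors involving "travel". A toy comparison with invented probabilities, purely for illustration (none of these numbers come from the slides, and the transition out of "travel" into "." is ignored here):

```python
# Invented toy probabilities, for illustration only.
trans = {"TO": {"VB": 0.8, "NN": 0.01}}                      # P(t | TO)
emit = {"VB": {"travel": 0.0001}, "NN": {"travel": 0.0002}}  # P(travel | t)

score_vb = trans["TO"]["VB"] * emit["VB"]["travel"]  # 8e-05
score_nn = trans["TO"]["NN"] * emit["NN"]["travel"]  # 2e-06
print("VB" if score_vb > score_nn else "NN")         # -> VB, the correct tag
```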
Evaluating Taggers
• Data set
  – Training set
  – Development set
  – Test set
• Tagging accuracy
  – how many tags are right (see the sketch below)
• Results
  – Accuracy around 97% on the PTB, trained on 800,000 words
  – (50-85% on unknown words; 50% for trigrams)
  – Upper bound around 98% because of noise (errors and inconsistencies in the data, e.g., NN vs. JJ)
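Tagging accuracy is just the fraction of tokens whose predicted tag matches the gold tag on the held-out test set. A minimal sketch (names are illustrative):

```python
def tagging_accuracy(gold, predicted):
    # gold, predicted: parallel lists of tag sequences, one per sentence
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        for g, p in zip(gold_sent, pred_sent):
            correct += (g == p)
            total += 1
    return correct / total

# tagging_accuracy([["DT", "JJ", "VBP"]], [["DT", "NN", "VBP"]])  # 2/3 ≈ 0.667
```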
Transformation-Based Learning
• [Brill 1995]
• Example
  – P(NN|sleep) = .9
  – P(VB|sleep) = .1
  – Rule: change NN to VB when the previous tag is TO
• Types of rules (see the sketch after this list):
  – The preceding (following) word is tagged z
  – The word two before (after) is tagged z
  – One of the two preceding (following) words is tagged z
  – One of the three preceding (following) words is tagged z
  – The preceding word is tagged z and the following word is tagged w
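A transformation-based tagger starts from each word's most frequent tag and then applies learned transformation rules in order. A minimal sketch of applying the example rule above; the encoding is illustrative, and Brill's procedure for learning and scoring rules is omitted:

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    # Change from_tag to to_tag wherever the previous word's tag is prev_tag.
    # Conditions are read from the pre-rule tags, for simplicity.
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

# Initial tagging uses each word's most frequent tag, so "sleep" gets NN
# (P(NN|sleep) = .9); the learned rule then repairs it after TO:
tags = ["PRP", "VBP", "TO", "NN"]          # hypothetical sentence "I like to sleep"
print(apply_rule(tags, "NN", "VB", "TO"))  # ['PRP', 'VBP', 'TO', 'VB']
```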
Transformation-Based Tagger
Thoughts about POS Taggers
• New domains
  – Lower performance than on the training domain
• Distributional clustering (see the sketch below)
  – Combine statistics about semantically related words
  – Example: names of companies
  – Example: days of the week
  – Example: animals
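One simple way to get such statistics is to represent each word by counts of its neighboring words and group words whose context vectors are similar. A deliberately minimal sketch using cosine similarity (all names are illustrative; real distributional clustering would run a proper clustering algorithm over much more data):

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=1):
    # Represent each word by counts of the words within `window` positions of it.
    vecs = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = vecs.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[sent[j]] += 1
    return vecs

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(c * b[k] for k, c in a.items() if k in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# "Monday" and "Tuesday" occur in similar contexts ("on ___ ,"), so their
# vectors score high cosine similarity and the words can share tag statistics.
```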
External Link
• Jason Eisner's awesome interactive spreadsheet about learning HMMs
  – http://cs.jhu.edu/~jason/papers/#eisner-2002-tnlp
  – http://cs.jhu.edu/~jason/papers/eisner.hmm.xls
NLP