Persian Part of Speech Tagging
DESCRIPTION
Persian Part of Speech Tagging. Mostafa Keikha, Database Research Group (DBRG), ECE Department, University of Tehran. Decision Trees. Decision Tree (DT): a tree where the root and each internal node is labeled with a question.
TRANSCRIPT
Persian Part Of Speech Tagging
Mostafa Keikha
Database Research Group (DBRG)
ECE Department, University of Tehran
Decision Trees
Decision Tree (DT): a tree where the root and each internal node is labeled with a question. The arcs represent each possible answer to the associated question. Each leaf node represents a prediction of a solution to the problem. Decision trees are a popular technique for classification: the leaf node indicates the class to which the corresponding tuple belongs.
Decision Tree Example
Decision Trees
A decision tree model is a computational model consisting of three parts: the tree itself, an algorithm to create the tree, and an algorithm that applies the tree to data.
Creation of the tree is the most difficult part. Applying the tree is basically a search similar to that in a binary search tree (although a DT need not be binary).
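The structure and the apply-the-tree search can be sketched as follows; the node fields, the questions, and the class labels are invented for illustration, not taken from the slides.

```python
# Minimal sketch of a decision tree as described above: each internal
# node holds a question, arcs hold the possible answers, and each leaf
# holds a predicted class.

class Node:
    def __init__(self, question=None, branches=None, label=None):
        self.question = question        # None for a leaf
        self.branches = branches or {}  # answer -> child Node
        self.label = label              # class prediction at a leaf

def classify(node, tuple_):
    """Walk from the root to a leaf, following the arc whose answer
    matches the tuple; this is the search step described above."""
    while node.question is not None:
        answer = tuple_[node.question]
        node = node.branches[answer]
    return node.label

# Toy tree: "is the word capitalized?" then "does it end in -ed?"
tree = Node("capitalized", {
    True:  Node(label="PROPER_NOUN"),
    False: Node("ends_ed", {
        True:  Node(label="VERB"),
        False: Node(label="NOUN"),
    }),
})

pred = classify(tree, {"capitalized": False, "ends_ed": True})  # "VERB"
```

Because each question has arbitrarily many answers, the tree is not binary in general, which is exactly the caveat noted above.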
Decision Tree Algorithm
Using DT in POS Tagging
Compute ambiguity classes:
- Each term may have different tags.
- The ambiguity class of a term is the set of all its possible tags.
- Compute the number of occurrences of each tag in each ambiguity class.

| Ambiguity class | # of occurrences |
| --- | --- |
| {a, b, c, d} | 10, 20, 25, 40 |
| {b, c, d} | 40, 39, 50 |
| {b, d} | 60, 55 |
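The ambiguity-class computation above can be sketched in a few lines; the tiny tagged corpus and the tag names below are invented for illustration, not the Persian corpus used in the slides.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (word, tag) pairs.
corpus = [("run", "N"), ("run", "V"), ("the", "DET"),
          ("run", "V"), ("walk", "V"), ("walk", "N")]

# Ambiguity class of each term: the set of all tags it appears with.
tags_per_term = defaultdict(set)
for word, tag in corpus:
    tags_per_term[word].add(tag)

# Occurrence count of each tag within each ambiguity class.
class_counts = defaultdict(Counter)
for word, tag in corpus:
    ambiguity_class = frozenset(tags_per_term[word])
    class_counts[ambiguity_class][tag] += 1
```

Here "run" and "walk" share the class {N, V}, so their tag counts are pooled into one row, exactly as in the table above.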
Using DT in POS Tagging
Create a decision tree over the ambiguity classes: at each level, delete the tag with the minimum number of occurrences.

{a, b, c, d}: 10, 20, 25, 40 → delete a
{b, c, d}: 40, 39, 50 → delete c
{b, d}: 60, 55 → delete d
b
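The level-wise deletion rule can be sketched as below. Note that the slides recompute the counts at each level from the corresponding ambiguity class; for simplicity this sketch uses one fixed count table, with numbers invented so that the survivor is b, as on the slide.

```python
# Sketch of the tree-building rule above: repeatedly delete the tag
# with the fewest occurrences; the survivor is the predicted tag.
def resolve(counts):
    counts = dict(counts)
    order = []  # tags in deletion order
    while len(counts) > 1:
        weakest = min(counts, key=counts.get)
        order.append(weakest)
        del counts[weakest]
    survivor = next(iter(counts))
    return order, survivor

# Invented counts: a is weakest, then c, then d, leaving b.
order, survivor = resolve({"a": 10, "b": 60, "c": 39, "d": 55})
```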
Using DT in POS Tagging
Advantages: easy to understand; easy to implement.
Disadvantage: context independent.
Using DT in POS Tagging
Known Tokens Results
| Run | Percent | Tokens | Correct | Accuracy |
| --- | --- | --- | --- | --- |
| 1 | 97.97 | 393923 | 363764 | 92.34% |
| 2 | 98.06 | 355630 | 328965 | 92.50% |
| 3 | 97.96 | 397528 | 367789 | 92.51% |
| 4 | 97.92 | 410561 | 381578 | 92.94% |
| 5 | 97.97 | 403079 | 372305 | 92.36% |
| Average | 97.976 | 392144.2 | 362880.2 | 92.474% |
POS tagging using HMMs
Let W be a sequence of words: W = w1, w2, …, wn
Let T be the corresponding tag sequence: T = t1, t2, …, tn
Task: find the T that maximizes P(T | W)
T' = argmax_T P(T | W)
POS tagging using HMMs
By Bayes' rule,
P(T | W) = P(W | T) * P(T) / P(W)
Since P(W) does not depend on T,
T' = argmax_T P(W | T) * P(T)
Transition Probability,
P ( T ) = P ( t1 ) * P ( t2 | t1 ) * P ( t3 | t1 t2 ) …… * P ( tn | t1 … tn-1 )
Applying Tri-gram approximation,
P ( T ) = P ( t1 ) * P ( t2 | t1 ) * P ( t3 | t1 t2 ) …… * P ( tn | tn-2 tn-1 )
Introducing a dummy tag, $, to represent the beginning of a sentence,
P ( T ) = P ( t1 | $ ) * P ( t2 | $ t1 ) * P ( t3 | t1 t2 ) …… * P ( tn | tn-2 tn-1 )
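Under the trigram approximation, P(T) can be computed as a product of trigram probabilities. In this sketch the probability table is invented, and the sequence is padded with two dummy $ tags so that every factor is a trigram (the slide writes the first two factors with a single $).

```python
from math import prod  # Python 3.8+

# Invented stand-in for trigram probabilities estimated from a corpus:
# key (t_{i-2}, t_{i-1}, t_i) -> P(t_i | t_{i-2}, t_{i-1}).
trigram_prob = {
    ("$", "$", "N"): 0.4, ("$", "N", "V"): 0.3,
    ("N", "V", "DET"): 0.5, ("V", "DET", "N"): 0.6,
}

def sequence_prob(tags):
    """P(T) = product over i of P(t_i | t_{i-2}, t_{i-1})."""
    padded = ["$", "$"] + list(tags)
    return prod(trigram_prob.get(tuple(padded[i:i + 3]), 0.0)
                for i in range(len(tags)))

p = sequence_prob(["N", "V", "DET", "N"])  # 0.4 * 0.3 * 0.5 * 0.6
```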
POS tagging using HMMs
Smoothing Transition Probabilities
Sparse data problem
Linear interpolation method
P'(ti | ti-2, ti-1) = λ1 P(ti) + λ2 P(ti | ti-1) + λ3 P(ti | ti-2, ti-1)
such that the λs sum to 1
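A sketch of the interpolation formula; the λ weights and the probability tables below are invented toy values, not estimates from the corpus.

```python
# Linear interpolation of unigram, bigram, and trigram transition
# probabilities, as in the formula above. Invented weights.
LAMBDAS = (0.1, 0.3, 0.6)  # must sum to 1

def smoothed_transition(t, t1, t2, uni, bi, tri):
    """P'(t | t2, t1) = l1*P(t) + l2*P(t | t1) + l3*P(t | t2, t1)."""
    l1, l2, l3 = LAMBDAS
    return (l1 * uni.get(t, 0.0)
            + l2 * bi.get((t1, t), 0.0)
            + l3 * tri.get((t2, t1, t), 0.0))

p = smoothed_transition("N", "DET", "V",
                        uni={"N": 0.2},
                        bi={("DET", "N"): 0.7},
                        tri={("V", "DET", "N"): 0.9})
# 0.1*0.2 + 0.3*0.7 + 0.6*0.9 = 0.77
```

Because every unseen trigram still gets unigram and bigram mass, no transition probability is exactly zero, which is the point of addressing the sparse-data problem.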
POS tagging using HMMs
Calculation of λs
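A common way to estimate the λs is deleted interpolation (used, for example, in the TnT tagger); whether the slides use exactly this scheme is an assumption. A sketch with invented toy counts:

```python
from collections import Counter

# Hedged sketch of deleted interpolation: each trigram votes, weighted
# by its count, for whichever order of model best predicts it once the
# trigram itself is "deleted" from the counts.
def deleted_interpolation(tri_counts, bi_counts, uni_counts, total):
    l1 = l2 = l3 = 0.0
    for (t2, t1, t), c in tri_counts.items():
        cand = [
            (uni_counts[t] - 1) / (total - 1) if total > 1 else 0.0,
            (bi_counts[(t1, t)] - 1) / (uni_counts[t1] - 1)
                if uni_counts[t1] > 1 else 0.0,
            (c - 1) / (bi_counts[(t2, t1)] - 1)
                if bi_counts[(t2, t1)] > 1 else 0.0,
        ]
        best = cand.index(max(cand))
        if best == 0:
            l1 += c
        elif best == 1:
            l2 += c
        else:
            l3 += c
    s = l1 + l2 + l3
    return (l1 / s, l2 / s, l3 / s)  # normalized so the lambdas sum to 1

# Invented toy counts for demonstration only.
tri = Counter({("N", "V", "N"): 3, ("V", "N", "V"): 2})
bi = Counter({("V", "N"): 4, ("N", "V"): 4})
uni = Counter({"N": 5, "V": 4})
lambdas = deleted_interpolation(tri, bi, uni, total=9)
```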
POS tagging using HMMs
Emission Probability,
P(W | T ) ≈ P(w1 | t1) * P(w2 | t2) * . . . * P(wn | tn)
Context Dependency
To make the model more dependent on context, the emission probability is calculated as:
P(W | T ) ≈ P(w1 | $ t1) * P(w2 | t1 t2) ...* P(wn | tn-1 tn)
POS tagging using HMMs
A similar smoothing technique is applied to the emission probabilities:
P'(wi | ti-1 ti) = θ1 P(wi | ti) + θ2 P(wi | ti-1 ti), where the θs sum to 1
The θs are different for different words.
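A sketch of the smoothed emission probability with per-word θs; all the values, including the fallback θs used for words with no stored weights, are invented assumptions.

```python
# Smoothed, context-dependent emission probability:
# P'(w_i | t_{i-1} t_i) = theta1*P(w_i | t_i) + theta2*P(w_i | t_{i-1} t_i),
# with a per-word pair of thetas summing to 1.
def smoothed_emission(word, t_prev, t, unigram_em, bigram_em, thetas):
    th1, th2 = thetas.get(word, (0.5, 0.5))  # fallback thetas: an assumption
    return (th1 * unigram_em.get((t, word), 0.0)
            + th2 * bigram_em.get((t_prev, t, word), 0.0))

p = smoothed_emission("flower", "ART", "N",
                      unigram_em={("N", "flower"): 0.01},
                      bigram_em={("ART", "N", "flower"): 0.05},
                      thetas={"flower": (0.4, 0.6)})
# 0.4*0.01 + 0.6*0.05 = 0.034
```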
POS tagging using HMMs
Lexicon generation probability
POS tagging using HMMs
P(N V ART N | flies like a flower) = 4.37 × 10^-6
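Finding the tag sequence that maximizes P(W | T) * P(T), as in the example above, is typically done with the Viterbi algorithm. A minimal sketch, using bigram transitions rather than the trigram model of the slides, with invented probabilities:

```python
# Minimal Viterbi sketch for argmax_T P(W|T) P(T). For brevity this
# uses bigram transitions; the slides use a trigram model.
def viterbi(words, tags, trans, emit):
    # best[t] = (probability, tag sequence) of the best path ending in t
    best = {t: (trans.get(("$", t), 0.0) * emit.get((t, words[0]), 0.0), [t])
            for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            p, seq = max(
                ((best[tp][0] * trans.get((tp, t), 0.0), best[tp][1])
                 for tp in tags),
                key=lambda x: x[0])
            new[t] = (p * emit.get((t, w), 0.0), seq + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])

# Invented toy probabilities; $ is the dummy start tag.
trans = {("$", "N"): 0.6, ("$", "V"): 0.4, ("N", "N"): 0.2,
         ("N", "V"): 0.5, ("V", "N"): 0.3, ("V", "V"): 0.1}
emit = {("N", "flies"): 0.1, ("V", "flies"): 0.05,
        ("N", "like"): 0.01, ("V", "like"): 0.2}

prob, seq = viterbi(["flies", "like"], ["N", "V"], trans, emit)
```

Keeping only the best path into each tag at each position is what makes the search linear in sentence length instead of exponential.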
POS tagging using HMMs
Known Tokens Results
| Run | Percent | Tokens | Correct | Accuracy |
| --- | --- | --- | --- | --- |
| 1 | 98.07 | 394290 | 382211 | 96.94% |
| 2 | 98.16 | 345913 | 345913 | 97.18% |
| 3 | 98.04 | 397849 | 343894 | 96.96% |
| 4 | 98.02 | 410970 | 398487 | 96.96% |
| 5 | 98.07 | 403460 | 391475 | 97.03% |
| Average | 98.072 | 390496.4 | 372396 | 97.01% |
Unknown Tokens Results
| Run | Percent | Tokens | Correct | Accuracy |
| --- | --- | --- | --- | --- |
| 1 | 1.93 | 7760 | 5829 | 75.12% |
| 2 | 1.84 | 6689 | 5357 | 80.09% |
| 3 | 1.96 | 7956 | 6153 | 77.34% |
| 4 | 1.98 | 8283 | 6435 | 77.69% |
| 5 | 1.93 | 7945 | 6246 | 78.62% |
| Average | 1.928 | 7726.6 | 6004 | 77.77% |
Overall Results
| Run | Tokens | Correct | Accuracy |
| --- | --- | --- | --- |
| 1 | 402050 | 388040 | 96.52% |
| 2 | 362658 | 351270 | 96.86% |
| 3 | 405805 | 391890 | 96.57% |
| 4 | 419253 | 404922 | 96.58% |
| 5 | 411405 | 397721 | 96.67% |
| Average | 400234.2 | 386768.6 | 96.64% |