Persian Part Of Speech Tagging
Mostafa Keikha
Database Research Group (DBRG)
ECE Department, University of Tehran
Decision Trees
Decision Tree (DT): a tree in which the root and each internal node are labeled with a question. The arcs represent the possible answers to the associated question, and each leaf node represents a prediction of a solution to the problem. Decision trees are a popular technique for classification; the leaf node indicates the class to which the corresponding tuple belongs.
Decision Tree Example
Decision Trees
A decision tree model is a computational model consisting of three parts: the decision tree itself, an algorithm to create the tree, and an algorithm that applies the tree to data.
Creation of the tree is the most difficult part. Applying the tree is basically a search similar to that in a binary search tree (although a DT need not be binary), as sketched below.
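A minimal sketch of the application step, assuming the tree is stored as question nodes with answer-labeled arcs; `Node`, `classify`, and the feature-dictionary interface are illustrative names, not from the slides.

```python
# Sketch of applying a decision tree (names are illustrative).

class Node:
    def __init__(self, question=None, branches=None, prediction=None):
        self.question = question        # feature asked at this node
        self.branches = branches or {}  # answer -> child Node
        self.prediction = prediction    # set only on leaf nodes

def classify(node, example):
    """Walk from the root to a leaf, following the arc whose label
    matches the example's answer to the node's question."""
    while node.prediction is None:
        answer = example[node.question]
        node = node.branches[answer]
    return node.prediction

# Tiny hypothetical tree: outlook -> {sunny: no, rainy: yes}
root = Node(question="outlook", branches={
    "sunny": Node(prediction="no"),
    "rainy": Node(prediction="yes"),
})
print(classify(root, {"outlook": "sunny"}))  # -> "no"
```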
Decision Tree Algorithm
Using DT in POS Tagging
Compute ambiguity classes:
Each term may have different tags.
The ambiguity class of a term is the set of all its possible tags.
Compute the number of occurrences of each tag in each ambiguity class:

Ambiguity class   # of occurrences (per tag)
a b c d           10 20 25 40
b c d             40 39 50
b d               60 55
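A minimal sketch of this counting step, assuming the training data is a list of (word, tag) pairs; all names here are illustrative.

```python
from collections import defaultdict, Counter

def ambiguity_classes(tagged_corpus):
    """tagged_corpus: iterable of (word, tag) pairs.
    Returns {ambiguity class: Counter of tag occurrences}, where an
    ambiguity class is the frozenset of all tags a word can take."""
    tags_of = defaultdict(set)
    for word, tag in tagged_corpus:
        tags_of[word].add(tag)
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[frozenset(tags_of[word])][tag] += 1
    return counts

corpus = [("x", "a"), ("x", "b"), ("y", "b"), ("y", "d")]
for cls, cnt in ambiguity_classes(corpus).items():
    print(sorted(cls), dict(cnt))
```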
Using DT in POS Tagging
Create a decision tree over the ambiguity classes:
at each level, delete the tag with the minimum number of occurrences.

a b c d   10 20 25 40   → delete a (minimum)
b c d     40 39 50      → delete c (minimum)
b d       60 55         → delete d (minimum)
b
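A minimal sketch of this construction, reusing the per-class counts from the previous step; since each level deletes one tag, the "tree" degenerates to a path, and the names are illustrative.

```python
from collections import Counter

def build_path(class_counts):
    """class_counts: {frozenset(tags): Counter} as computed above.
    Starting from the largest ambiguity class, repeatedly delete the
    tag with the fewest occurrences until a single tag remains."""
    tags = max(class_counts, key=len)          # e.g. {a, b, c, d}
    path = []
    while len(tags) > 1:
        counts = class_counts.get(frozenset(tags), Counter())
        victim = min(tags, key=lambda t: counts[t])
        path.append((sorted(tags), victim))    # record the decision
        tags = tags - {victim}
    return path, next(iter(tags))

counts = {
    frozenset("abcd"): Counter(a=10, b=20, c=25, d=40),
    frozenset("bcd"):  Counter(b=40, c=39, d=50),
    frozenset("bd"):   Counter(b=60, d=55),
}
path, winner = build_path(counts)
print(winner)  # -> 'b'
```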
Using DT in POS Tagging
Advantages: easy to understand, easy to implement.
Disadvantage: context independent.
Using DT in POS Tagging
Known Tokens Results
Run | Known % | Tokens   | Correct  | Accuracy
1   | 97.97   | 393923   | 363764   | 92.34%
2   | 98.06   | 355630   | 328965   | 92.50%
3   | 97.96   | 397528   | 367789   | 92.51%
4   | 97.92   | 410561   | 381578   | 92.94%
5   | 97.97   | 403079   | 372305   | 92.36%
Avg | 97.976  | 392144.2 | 362880.2 | 92.474%
POS tagging using HMMs
Let W be a sequence of words: W = w1, w2, ..., wn
Let T be the corresponding tag sequence: T = t1, t2, ..., tn
Task: find the T that maximizes P(T | W)
T' = argmax_T P(T | W)
POS tagging using HMMs
By Bayes' rule,
P(T | W) = P(W | T) * P(T) / P(W)
T' = argmax_T P(W | T) * P(T)
Transition probability:
P(T) = P(t1) * P(t2 | t1) * P(t3 | t1 t2) * ... * P(tn | t1 ... tn-1)
Applying the trigram approximation:
P(T) = P(t1) * P(t2 | t1) * P(t3 | t1 t2) * ... * P(tn | tn-2 tn-1)
Introducing a dummy tag, $, to represent the beginning of a sentence:
P(T) = P(t1 | $) * P(t2 | $ t1) * P(t3 | t1 t2) * ... * P(tn | tn-2 tn-1)
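As a sketch, the trigram transition probabilities can be estimated by maximum likelihood from tag counts; padding each sentence with two $ boundary tags is one common convention, and the function names are illustrative.

```python
from collections import Counter

def transition_probs(tag_sequences):
    """MLE trigram model: P(t3 | t1, t2) = c(t1, t2, t3) / c(t1, t2).
    Each sentence's tags are padded with two '$' boundary tags."""
    tri, bi = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["$", "$"] + list(tags)
        for t1, t2, t3 in zip(padded, padded[1:], padded[2:]):
            tri[(t1, t2, t3)] += 1
            bi[(t1, t2)] += 1
    def p(t3, t1, t2):
        return tri[(t1, t2, t3)] / bi[(t1, t2)] if bi[(t1, t2)] else 0.0
    return p

p = transition_probs([["N", "V", "N"], ["N", "V"]])
print(p("N", "$", "$"))  # P(first tag = N | sentence start) -> 1.0
```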
POS tagging using HMMs
Smoothing Transition Probabilities
Sparse data problem
Linear interpolation method
P'(ti | ti-2, ti-1) = λ1 P(ti) + λ2 P(ti | ti-1) + λ3 P(ti | ti-2, ti-1)
such that the λs sum to 1 (λ1 + λ2 + λ3 = 1)
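A minimal sketch of the interpolated estimate, assuming the three maximum-likelihood distributions are already available as functions; the names are illustrative.

```python
def interpolated(p_uni, p_bi, p_tri, lambdas):
    """Linear interpolation of unigram, bigram, and trigram tag models.
    lambdas = (l1, l2, l3) with l1 + l2 + l3 == 1."""
    l1, l2, l3 = lambdas
    def p(t, t_prev2, t_prev1):
        return (l1 * p_uni(t)
                + l2 * p_bi(t, t_prev1)
                + l3 * p_tri(t, t_prev2, t_prev1))
    return p
```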
POS tagging using HMMs
Calculation of λs
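The slide's content is not recoverable here; one standard way to compute the λs is deleted interpolation, as in Brants' TnT tagger. That these slides used exactly this method is an assumption; the sketch below follows TnT.

```python
from collections import Counter

def deleted_interpolation(tri, bi, uni, n):
    """Deleted interpolation (as in Brants' TnT tagger; its use here is
    an assumption). tri/bi/uni are n-gram tag Counters, n = total tags."""
    l = [0.0, 0.0, 0.0]
    for (t1, t2, t3), c in tri.items():
        # Compare the three estimates with this trigram held out.
        cands = [
            (uni[t3] - 1) / (n - 1) if n > 1 else 0.0,
            (bi[(t2, t3)] - 1) / (uni[t2] - 1) if uni[t2] > 1 else 0.0,
            (c - 1) / (bi[(t1, t2)] - 1) if bi[(t1, t2)] > 1 else 0.0,
        ]
        l[cands.index(max(cands))] += c   # credit the winning order
    total = sum(l)
    return tuple(x / total for x in l)
```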
POS tagging using HMMs
Emission probability:
P(W | T) ≈ P(w1 | t1) * P(w2 | t2) * ... * P(wn | tn)
Context dependency:
To make the emission probability more dependent on the context, it is calculated as
P(W | T) ≈ P(w1 | $ t1) * P(w2 | t1 t2) * ... * P(wn | tn-1 tn)
POS tagging using HMMs
A smoothing technique is applied:
P'(wi | ti-1 ti) = θ1 P(wi | ti) + θ2 P(wi | ti-1 ti)
The θs sum to 1 and are different for different words.
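A sketch of the smoothed, context-dependent emission estimate; a single global θ pair is used here for simplicity, whereas the slides make the θs word-specific, and all names are illustrative.

```python
from collections import Counter

def emission_model(tagged_sentences, theta=(0.3, 0.7)):
    """P'(w | t_prev, t) = θ1 P(w | t) + θ2 P(w | t_prev, t), θ1 + θ2 = 1.
    tagged_sentences: list of sentences, each a list of (word, tag)."""
    wt, t, wtt, tt = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "$"
        for word, tag in sent:
            wt[(word, tag)] += 1
            t[tag] += 1
            wtt[(word, prev, tag)] += 1
            tt[(prev, tag)] += 1
            prev = tag
    def p(word, t_prev, tag):
        uni = wt[(word, tag)] / t[tag] if t[tag] else 0.0
        bi = wtt[(word, t_prev, tag)] / tt[(t_prev, tag)] if tt[(t_prev, tag)] else 0.0
        return theta[0] * uni + theta[1] * bi
    return p
```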
POS tagging using HMMs
[Derivation slides: six numbered equations, not recoverable from this copy.]
POS tagging using HMMs
Lexicon generation probability
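The details of this slide are lost; as an assumption, the lexicon generation probability for unseen words is often estimated from word suffixes (a TnT-style heuristic), sketched below with illustrative names.

```python
from collections import Counter, defaultdict

def suffix_probs(lexicon, max_len=3):
    """Collect tag counts keyed by the last k characters of each word,
    for guessing tags of unknown tokens. A TnT-style heuristic; that
    these slides used it is an assumption.
    lexicon: {word: {tag: count}}."""
    by_suffix = defaultdict(Counter)
    for word, tag_counts in lexicon.items():
        for k in range(1, min(max_len, len(word)) + 1):
            for tag, c in tag_counts.items():
                by_suffix[word[-k:]][tag] += c
    return by_suffix

probs = suffix_probs({"running": {"V": 5}, "walking": {"V": 3}, "king": {"N": 2}})
print(probs["ing"])  # Counter({'V': 8, 'N': 2})
```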
POS tagging using HMMs
Example: P(N V ART N | flies like a flower) = 4.37 × 10^-6
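To show where a number like this comes from, the sketch below scores one candidate tag sequence as the product of transition and emission probabilities; the toy probability tables are hypothetical stand-ins, not the values behind 4.37 × 10^-6.

```python
def score(words, tags, p_trans, p_emit):
    """P(T) * P(W | T) for one candidate tag sequence, with $ boundary
    tags as in the model above. p_trans and p_emit are lookup functions."""
    prob = 1.0
    prev2, prev1 = "$", "$"
    for word, tag in zip(words, tags):
        prob *= p_trans(tag, prev2, prev1) * p_emit(word, prev1, tag)
        prev2, prev1 = prev1, tag
    return prob

# Hypothetical toy tables just to make the sketch runnable.
trans = {("N", "$", "$"): 0.3, ("V", "$", "N"): 0.4,
         ("ART", "N", "V"): 0.5, ("N", "V", "ART"): 0.6}
emit = {("flies", "$", "N"): 0.01, ("like", "N", "V"): 0.1,
        ("a", "V", "ART"): 0.8, ("flower", "ART", "N"): 0.05}
p = score("flies like a flower".split(), ["N", "V", "ART", "N"],
          lambda t, a, b: trans.get((t, a, b), 0.0),
          lambda w, tp, t: emit.get((w, tp, t), 0.0))
print(p)  # product of the four transition * emission factors
```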
POS tagging using HMMs
Known Tokens Results
Run | Known % | Tokens   | Correct  | Accuracy
1   | 98.07   | 394290   | 382211   | 96.94%
2   | 98.16   | 345913   | 345913   | 97.18%
3   | 98.04   | 397849   | 343894   | 96.96%
4   | 98.02   | 410970   | 398487   | 96.96%
5   | 98.07   | 403460   | 391475   | 97.03%
Avg | 98.072  | 390496.4 | 372396   | 97.01%
Unknown Tokens Results
Run | Unknown % | Tokens | Correct | Accuracy
1   | 1.93      | 7760   | 5829    | 75.12%
2   | 1.84      | 6689   | 5357    | 80.09%
3   | 1.96      | 7956   | 6153    | 77.34%
4   | 1.98      | 8283   | 6435    | 77.69%
5   | 1.93      | 7945   | 6246    | 78.62%
Avg | 1.928     | 7726.6 | 6004    | 77.77%
Overall Results
Run | Tokens   | Correct  | Accuracy
1   | 402050   | 388040   | 96.52%
2   | 362658   | 351270   | 96.86%
3   | 405805   | 391890   | 96.57%
4   | 419253   | 404922   | 96.58%
5   | 411405   | 397721   | 96.67%
Avg | 400234.2 | 386768.6 | 96.64%