a connectionist approach to part-of-speech tagging

DESCRIPTION

In this paper, we describe a novel approach to Part-Of-Speech tagging based on neural networks. Multilayer perceptrons are used following corpus-based learning from contextual and lexical information. The Penn Treebank corpus has been used for the training and evaluation of the tagging system. The results show that the connectionist approach is feasible and comparable with other approaches.

TRANSCRIPT

Page 1: A Connectionist approach to Part-Of-Speech Tagging

A Connectionist approach to Part-Of-Speech Tagging

F. Zamora-Martínez, M.J. Castro-Bleda, S. España-Boquera, S. Tortajada, P. Aibar

Departamento de Sistemas Informáticos y Computación
Universidad Politécnica de Valencia, Spain

6-8 October 2009, Funchal, Madeira

A Connectionist approach to POS Tagging () ICNC 2009 5-7 October 2009, Funchal 1 / 30

Page 2: A Connectionist approach to Part-Of-Speech Tagging

Index

1 POS tagging

2 Probabilistic tagging

3 Connectionist tagging

4 The Penn Treebank Corpus

5 The connectionist POS taggers

6 Conclusions

Page 3: A Connectionist approach to Part-Of-Speech Tagging

What is Part-Of-Speech (POS) tagging?

$T = \{\tau_1, \tau_2, \ldots, \tau_k\}$: a set of POS tags
$\Omega = \{\omega_1, \omega_2, \ldots, \omega_m\}$: the vocabulary of the application

The goal of a Part-Of-Speech tagger is to associate each word in a text with its correct lexical-syntactic category (represented by a tag).

The  grand  jury  commented  on  a   number  of  other  topics
DT   JJ     NN    VBD        IN  DT  NN      IN  JJ     NNS

Page 4: A Connectionist approach to Part-Of-Speech Tagging

Ambiguity and applications

Words often have more than one POS tag. For example, "back":
The back door = JJ
On my back = NN
Win the voters back = RB
Promised to back the bill = VB

Ambiguity!!!

Applications: speech synthesis, speech recognition, information retrieval, word-sense disambiguation, machine translation, ...

Page 5: A Connectionist approach to Part-Of-Speech Tagging

How hard is POS tagging? Measuring ambiguity

Penn Treebank (45-tag corpus)
Unambiguous (1 tag)    38,857 (81%)
Ambiguous (2-9 tags)    8,844 (19%)
Details:
2 tags  6,731
3 tags  1,621
4 tags    357
5 tags     90
6 tags     32
7 tags      6 (well, set, round, open, fit, down)
8 tags      4 ('s, half, back, a)
9 tags      3 (that, more, in)

A simple approach that assigns each word its most common tag achieves 90% accuracy!
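The most-common-tag baseline mentioned above can be sketched in a few lines. This is only an illustration: the toy corpus is built from the "back" examples on the earlier slide rather than from the Penn Treebank, and the function names and the `default_tag` fallback for unseen words are assumptions, not part of the original work.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Return a dict mapping each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, most_common_tag, default_tag="NN"):
    """Tag each word with its most frequent training tag.

    default_tag is a hypothetical fallback for words never seen in training.
    """
    return [most_common_tag.get(w, default_tag) for w in words]

# Toy training data (an assumption for illustration, not the real corpus).
toy_corpus = [
    [("The", "DT"), ("back", "JJ"), ("door", "NN")],
    [("On", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("my", "PRP$"), ("back", "NN")],
]
model = train_baseline(toy_corpus)
predicted = tag_baseline(["my", "back", "door"], model)
```

In this toy data "back" is seen twice as NN and once as JJ, so the baseline always answers NN for it, regardless of context. That is exactly the kind of error a context-aware tagger tries to avoid.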

Page 6: A Connectionist approach to Part-Of-Speech Tagging

Unknown Words

How can one assign a tag to a given word if that word is unknown to the tagger?

Unknown words are the hardest problem for POS tagging!

Page 8: A Connectionist approach to Part-Of-Speech Tagging

Probabilistic model

We are given a sentence: what is the best sequence of tags which corresponds to the sequence of words?

Probabilistic view: Consider all possible sequences of tags and, out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of words.

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n) = \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n).$$
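The second equality in the formula above is the usual application of Bayes' rule. The intermediate step is not shown on the slide; written out, with $\hat{t}_1^n$ denoting the selected tag sequence, it would read roughly as follows. The denominator $P(w_1^n)$ does not depend on the tag sequence, so it can be dropped from the maximization.

```latex
\begin{align*}
\hat{t}_1^n
  &= \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n)
   = \operatorname*{argmax}_{t_1^n} \frac{P(w_1^n \mid t_1^n)\, P(t_1^n)}{P(w_1^n)} \\
  &= \operatorname*{argmax}_{t_1^n} P(w_1^n \mid t_1^n)\, P(t_1^n)
\end{align*}
```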

Page 9: A Connectionist approach to Part-Of-Speech Tagging

Probabilistic model: Simplifications

To simplify:

1 Words are independent of each other and a word's identity only depends on its tag → lexical probabilities:

$$P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)$$

2 The probability of a tag depends only on its predecessor tag or tags (bigram, trigram, ...) → contextual probabilities:

$$P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}).$$

Page 10: A Connectionist approach to Part-Of-Speech Tagging

Probabilistic model: Limitations

With these assumptions, a typical probabilistic model is expressed as:

$$\hat{t}_1^n = \operatorname*{argmax}_{t_1^n} P(t_1^n \mid w_1^n) \approx \operatorname*{argmax}_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}),$$

where $\hat{t}_1^n$ is the best estimation of POS tags for the given sentence $w_1^n = w_1 w_2 \ldots w_n$, and considering that $P(t_1 \mid t_0) = 1$.

Limitations:

1 It does not model long-distance relationships.
2 The contextual information takes into account the context on the left, while the context on the right is not considered.

Both limitations can be overcome using ANN models.
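For reference, the bigram model above is usually decoded with the Viterbi algorithm. The slides do not say which decoder was used, so the following is only a minimal sketch under that assumption. The probability tables are made-up toy values, and the function and variable names are hypothetical.

```python
import math

def viterbi(words, tags, lexical, contextual):
    """Most probable tag sequence under prod_i P(w_i|t_i) * P(t_i|t_{i-1}).

    lexical[(word, tag)] = P(word | tag)
    contextual[(tag, prev_tag)] = P(tag | prev_tag)
    """
    if not words:
        return []

    def logp(table, key):
        p = table.get(key, 0.0)
        return math.log(p) if p > 0 else float("-inf")

    # First word: P(t_1 | t_0) is taken to be 1, so only the lexical term counts.
    best = {t: (logp(lexical, (words[0], t)), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for tag in tags:
            # Best predecessor for this tag (left context only).
            prev_tag = max(tags, key=lambda prev: best[prev][0] + logp(contextual, (tag, prev)))
            score = (best[prev_tag][0]
                     + logp(contextual, (tag, prev_tag))
                     + logp(lexical, (word, tag)))
            new_best[tag] = (score, best[prev_tag][1] + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Toy probabilities (assumptions for illustration only).
tags = ["TO", "VB", "NN", "DT"]
lexical = {("to", "TO"): 1.0, ("back", "VB"): 0.1, ("back", "NN"): 0.2,
           ("the", "DT"): 0.5, ("bill", "NN"): 0.1, ("bill", "VB"): 0.05}
contextual = {("VB", "TO"): 0.8, ("NN", "TO"): 0.05, ("DT", "VB"): 0.5,
              ("DT", "NN"): 0.1, ("NN", "DT"): 0.7, ("VB", "DT"): 0.01}
best_tags = viterbi(["to", "back", "the", "bill"], tags, lexical, contextual)
```

Note that each decision only looks at the previous tag. This is the left-context limitation listed above: the tag of "the" cannot influence the choice for "back" except indirectly, through the overall sequence score.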

Page 12: A Connectionist approach to Part-Of-Speech Tagging

Connectionist model

Promised  to  back   the  bill
VB        TO  ?????  DT   NN

MLPs as POS tag classifiers:

MLP input:
  back ($w_i$): the ambiguous input word, locally codified → projection layer
  VB, TO, DT ($c_i$): the tags of the words surrounding the ambiguous word to be tagged (past and future context), locally codified
  JJ, NN, RB, VB ($T_i$): the set of POS tags that have been found to be related to the ambiguous input word $w_i$ in the training corpus, locally codified

MLP output:
  the probability of each tag given the input: Pr(JJ|input) = 0.1, Pr(NN|input) = 0.3, Pr(RB|input) = 0.2, Pr(VB|input) = 0.4

Therefore, the network learnt the following mapping:

$$F(w_i, c_i, T_i, t_i, \Theta) = \Pr_\Theta(t_i \mid w_i, c_i, T_i)$$
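To make the local (one-hot) coding of the input concrete, here is a small sketch for the "back" example. The tag set and vocabulary are toy inventories, and the order in which the blocks are concatenated is an assumption; the slides do not specify it. The helper names are hypothetical.

```python
TAGS = ["DT", "JJ", "NN", "RB", "TO", "VB"]  # toy tag set; the corpus uses 46 tags
VOCAB = ["back", "set", "round"]             # toy vocabulary of ambiguous words

def one_hot(item, inventory):
    """Local coding: a vector with a single 1 at the item's position."""
    return [1.0 if x == item else 0.0 for x in inventory]

def multi_hot(items, inventory):
    """One unit per inventory entry, set to 1 for every member of `items`."""
    return [1.0 if x in items else 0.0 for x in inventory]

def encode(word, past_tags, future_tags, candidate_tags):
    """Concatenate w_i, the context tags c_i, and the candidate tag set T_i."""
    vec = one_hot(word, VOCAB)
    for tag in past_tags + future_tags:
        vec += one_hot(tag, TAGS)
    vec += multi_hot(candidate_tags, TAGS)
    return vec

# "Promised to back the bill": past context VB, TO; future context DT.
x = encode("back", ["VB", "TO"], ["DT"], {"JJ", "NN", "RB", "VB"})
```

With 6 toy tags, 3 toy words, two past tags and one future tag, the vector has 3 + 3·6 + 6 = 27 components, of which 8 are active: one for the word, three for the context tags, and four for the candidate tags.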

Page 13: A Connectionist approach to Part-Of-Speech Tagging

And what about Unknown Words?

When evaluating the model, there are words that have never been seen during training; therefore, they belong neither to the vocabulary of known ambiguous words nor to the vocabulary of known non-ambiguous words → "Unknown words": the hardest problem for the network to tag correctly.

Two solutions:

1 MLPAll: a single model trained with both known ambiguous words and unknown words.
2 MLPCombined: a system composed of two independent MLPs:
    MLPKnow: the MLP specialized in known ambiguous words
    MLPUnk: the MLP specialized in unknown words

Page 14: A Connectionist approach to Part-Of-Speech Tagging

Solution 1: MLPAll
One MLP for tagging known ambiguous words and unknown words

$w_i$: known ambiguous input word, locally codified at the input of the projection layer

$u_i$: unknown words are codified as an additional input unit

Context: two labels of past context and one label of future context

$T_i$: set of POS tags related to the ambiguous input word $w_i$. When the input is an unknown word, every tag in $T_i$ is activated.

Page 15: A Connectionist approach to Part-Of-Speech Tagging

Solution 2: MLPCombined
Two MLPs: MLPKnow + MLPUnk

Page 16: A Connectionist approach to Part-Of-Speech Tagging

MLPCombined: POS tagging system
Two MLPs: MLPKnow + MLPUnk (detail)

MLPUnk: a multilayer perceptron dedicated to the POS tagging of unknown words. The input to the multilayer perceptron is the context of the unknown word at time i.
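One way to picture how the combined system could route each word is sketched below. This is an assumption about the control flow, not the authors' implementation: `mlp_know` and `mlp_unk` are hypothetical stand-in callables returning a tag distribution, and the direct lookup for unambiguous words is assumed from the fact that such words carry a single tag.

```python
def tag_word(word, context, unambiguous, ambiguous, mlp_know, mlp_unk):
    """Route a word to a lookup table, MLPKnow, or MLPUnk.

    unambiguous: dict word -> its single tag
    ambiguous:   dict word -> set of candidate tags T_i
    mlp_know, mlp_unk: hypothetical callables returning {tag: probability}
    """
    if word in unambiguous:
        return unambiguous[word]
    if word in ambiguous:
        probs = mlp_know(word, context, ambiguous[word])
    else:
        # Unknown word: only the context is available as input.
        probs = mlp_unk(context)
    return max(probs, key=probs.get)

# Stand-ins for trained networks (fixed outputs, for illustration only).
def fake_mlp_know(word, context, candidates):
    return {"JJ": 0.1, "NN": 0.3, "RB": 0.2, "VB": 0.4}

def fake_mlp_unk(context):
    return {"NN": 0.6, "JJ": 0.4}

unambiguous = {"the": "DT"}
ambiguous = {"back": {"JJ", "NN", "RB", "VB"}}
context = (["VB", "TO"], ["DT"])  # (past tags, future tags)

known_tag = tag_word("back", context, unambiguous, ambiguous, fake_mlp_know, fake_mlp_unk)
unknown_tag = tag_word("zyzzyva", context, unambiguous, ambiguous, fake_mlp_know, fake_mlp_unk)
```

The stand-in for MLPKnow returns the distribution from the "back" example on the earlier slide, so the selected tag is VB.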

Page 18: A Connectionist approach to Part-Of-Speech Tagging

The Penn Treebank Corpus

This corpus consists of a set of English texts from the Wall Street Journal, distributed in 25 directories containing 100 files, with several sentences each.
The total number of words is about one million, of which 49 000 are distinct.
The whole corpus was automatically labeled with POS tags and a syntactic labeling.
The POS tag labeling consists of a set of 45 different categories. One more tag was added to mark the beginning of a sentence (the end of a sentence is already in the original set of tags), resulting in a total of 46 different POS tags.

Page 19: A Connectionist approach to Part-Of-Speech Tagging

The Penn Treebank Corpus: 46 POS tags

Tag  Description            Example
CC   Coordin. conjunction   and, but, or
CD   Cardinal number        one, two, three
DT   Determiner             a, the
FW   Foreign word           mea culpa
IN   Preposition/sub-conj   of, in, by
JJ   Adjective              yellow
JJR  Adj., comparative      bigger
JJS  Adj., superlative      wildest
NN   Noun, sing. or mass    llama
NNS  Noun, plural           llamas
NNP  Proper noun, singular  IBM
...

Page 20: A Connectionist approach to Part-Of-Speech Tagging

The Penn Treebank Corpus: Partitions

Dataset   Directory  Num. of sentences  Num. of words  Vocabulary size
Training  00-18      38 219             912 344        34 064
Tuning    19-21      5 527              131 768        12 389
Test      22-24      5 462              129 654        11 548
Total     00-24      49 208             1 173 766      38 452

Page 21: A Connectionist approach to Part-Of-Speech Tagging

The Penn Treebank Corpus: Vocabulary (I)

In order to reduce the size of the vocabulary and to obtain some samples of unknown words for training:

1 A cut-off was set at an absolute frequency of 10 repetitions. If a word had a frequency smaller than 10, it was treated as an unknown word.

2 POS tags that appeared with a word 90% less often than that word's most frequent tag, and fewer than 3 times in absolute frequency, were eliminated, since such tags are mostly annotation errors.

Page 22: A Connectionist approach to Part-Of-Speech Tagging

For example, the word "are" is ambiguous in the corpus because it appears

3 639 times as VBP, and
once each as NN, NNP and IN.

The NN, NNP and IN tags are errors, as shown in the sentence:

"Because many of these subskills –the symmetry of geometrical figures, metric measurement of volume, or pie and bar graphs, for example– are only a small part of the total fifth-grade curriculum, Mr. Kaminski says, the preparation kits wouldn't replicate too many, if their real intent was general instruction or even general familiarization with test procedures."

where "are" is labeled as NN.
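The two vocabulary-reduction rules can be sketched as follows, using the counts for "are" given above. This reflects one reading of the rule: "90% less" is taken to mean a count below 10% of the most frequent tag's count, and both conditions must hold for a tag to be dropped. The function names and the strictness of the comparisons are assumptions.

```python
def prune_tags(tag_counts, ratio=0.10, min_count=3):
    """Drop tags that are rare both relative to the top tag and in absolute terms."""
    top = max(tag_counts.values())
    return {tag: n for tag, n in tag_counts.items()
            if not (n < ratio * top and n < min_count)}

def is_unknown(word, word_freq, cutoff=10):
    """Words seen fewer than `cutoff` times are treated as unknown."""
    return word_freq.get(word, 0) < cutoff

# Tag counts for "are" as reported on the slide.
are_counts = {"VBP": 3639, "NN": 1, "NNP": 1, "IN": 1}
kept = prune_tags(are_counts)
```

After pruning, only VBP remains for "are", so the word stops being ambiguous and no longer needs to go through the network.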

Page 23: A Connectionist approach to Part-Of-Speech Tagging

The Penn Treebank Corpus: Vocabulary (II)

Vocabulary in training:
Ambiguous words: 2 563 (from 378 898 ambiguous running words)
Unknown word symbol: 2 826 (from 48 824 unknown running words)

Dataset   Num. of running words  Unambiguous  Ambiguous  Unknown
Training  912 344                484 622      378 898    48 824
Tuning    131 768                65 552       53 956     6 733
Test      129 654                63 679       54 607     5 906
Total     1 173 766              613 853      487 461    61 463

Page 25: A Connectionist approach to Part-Of-Speech Tagging

The connectionist POS taggers

Projection layer.
Error backpropagation algorithm for training.
The topology and parameters of the multilayer perceptrons were selected in previous experimentation.
For the experiments we used a toolkit for pattern recognition tasks developed by our research group.

Parameter             MLPAll                     MLPKnow (MLPCombined)  MLPUnk (MLPCombined)
Input l. size         |T|(p + f + 1) + |Ω| + 1   |T|(p + f + 1) + |Ω|   |T|(p + f)
Output l. size        |T|                        |T|                    |T|
Projection l. size    128                        128                    –
Hidden l. size        100                        100                    100
Hidden l. act. func.  Hyperbolic tangent (all networks)
Output l. act. func.  Softmax (all networks)
Learning rate         0.005 (all networks)
Momentum              0.001 (all networks)
Weight decay          0.0000001 (all networks)
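As a quick check of the input layer formulas, the sizes can be computed for the best configuration (p = 2, f = 1). The values |T| = 46 and |Ω| = 2 563 are taken from the corpus slides, but equating |Ω| with the number of known ambiguous words is an assumption, as is the helper name.

```python
def input_sizes(num_tags, vocab_size, p, f):
    """Input layer sizes for the three networks, following the table's formulas."""
    return {
        # context tags + candidate tag set T_i + word units + one unknown-word unit
        "MLPAll": num_tags * (p + f + 1) + vocab_size + 1,
        # same as MLPAll, without the unknown-word unit
        "MLPKnow": num_tags * (p + f + 1) + vocab_size,
        # context tags only
        "MLPUnk": num_tags * (p + f),
    }

# Assumed values: 46 POS tags, 2 563 ambiguous words, p = 2, f = 1.
sizes = input_sizes(num_tags=46, vocab_size=2563, p=2, f=1)
```

Under these assumptions the word units dominate the input of MLPAll and MLPKnow, which is presumably why a projection layer is placed after the word input. MLPUnk has no word input at all and only receives the context tags.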

Page 26: A Connectionist approach to Part-Of-Speech Tagging

Performance on the tuning set

POS tagging error rate for the tuning set varying the context (p is the past context, and f is the future context). Known refers to the disambiguation error for known ambiguous words. Unk refers to the POS tag error for unknown words. Total is the total POS tag error, with ambiguous, non-ambiguous, and unknown words.

Model        p  f  Known  Unk    Total
MLPAll       2  1  5.8%   37.4%  4.3%
MLPAll       3  1  5.9%   37.2%  4.3%
MLPAll       4  1  6.1%   38.3%  4.5%
MLPCombined  2  1  5.7%   36.8%  4.2%
MLPCombined  3  1  5.9%   37.5%  4.3%
MLPCombined  4  1  6.0%   38.9%  4.4%

Page 27: A Connectionist approach to Part-Of-Speech Tagging

Test POS tagging performance

Best system: MLPCombined system with a past context of two labels and a future context of one label.

POS tagging error rate for the tuning and test sets with the MLPCombined model. Known refers to the disambiguation error for known ambiguous words. Unk refers to the POS tag error for unknown words. Total is the total POS tag error, with ambiguous, non-ambiguous, and unknown words.

Partition  p  f  Known  Unk    Total
Tuning     2  1  5.7%   36.8%  4.2%
Test       2  1  6.1%   36.7%  4.3%
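The Total column can be approximately recovered from the Known and Unk columns together with the running-word counts in the Vocabulary (II) table. The sketch below assumes that unambiguous words contribute no errors, which the slides do not state explicitly. Since the published rates are rounded, the result is only expected to agree to about a tenth of a percentage point.

```python
def total_error(known_err, unk_err, n_ambiguous, n_unknown, n_total):
    """Total error (%) as a count-weighted mix of the two partial error rates.

    Assumes unambiguous words are always tagged correctly.
    """
    errors = known_err * n_ambiguous + unk_err * n_unknown
    return 100.0 * errors / n_total

# Error rates from the table above; word counts from the Vocabulary (II) table.
tuning = total_error(0.057, 0.368, 53956, 6733, 131768)
test = total_error(0.061, 0.367, 54607, 5906, 129654)
print(f"tuning: {tuning:.2f}%  test: {test:.2f}%")
```

Both values land near 4.2%, in line with the reported 4.2% and 4.3%. The decomposition also shows that unknown words, although a small fraction of the running words, account for a large share of the total errors.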

Page 29: A Connectionist approach to Part-Of-Speech Tagging

Conclusions: Comparison with other tagging systems

POS tagging error rates (%):

Model                               Known  Unk   Total
SVMs: Support Vector Machines       6.1    11.0  2.8
MT: Machine Translation techniques  -      23.5  3.5
TnT: Statistical Models             7.8    14.1  3.5
NetTagger                           -      -     3.8
HMM Tagger                          -      -     5.8
RANN: Recurrent NNs                 -      -     8.0
Our approach: MLPCombined           6.1    36.7  4.3

If we focus on the error in known ambiguous words, our model is comparable to SVMs (which obtained a 6.1% POS tagging error rate). The major difference is that unknown word classification is more accurate in the referenced works than in our approach.

Page 30: A Connectionist approach to Part-Of-Speech Tagging

Conclusions: Future works

In this line, our immediate goal is to improve the performance of the MLPUnk network.

When dealing with unknown words, introducing relevant morphological information related to the unknown input word can be useful for POS tagging.
