
October 2006 Advanced Topics in NLP 1

CSA3050: NLP Algorithms

Finite State Transducers for Morphological Parsing


Acknowledgement

• This lecture is largely based on material from Jurafsky & Martin chapter 3


Resumé

• FSAs are equivalent to regular languages

• FSTs are equivalent to regular relations (over pairs of regular languages)

• FSTs are like FSAs, but each arc carries a complex label: a pair of symbols, one for each level.

• We can use FSTs to transduce between surface and lexical levels.


Morphological Parsing

• Given the input cats, we’d like to output cat +N +Pl, telling us that cat is a plural noun.

• Given the Spanish input bebo, we’d like to output beber +V +PInd +1P +Sg, telling us that bebo is the present indicative first person singular form of the Spanish verb beber, ‘to drink’.


Two-Level Paradigm

[Figure: the two-level paradigm, from Jurafsky & Martin]


English Plural

surface   lexical
cat       cat+N+Sg
cats      cat+N+Pl
foxes     fox+N+Pl
mice      mouse+N+Pl
sheep     sheep+N+Pl
          sheep+N+Sg
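The table can be read as a relation between the two levels that runs in either direction. A minimal Python sketch, assuming a hand-written lookup table (the names `ANALYSES`, `analyse` and `generate` are hypothetical; a real analyser would compute the mapping with an FST rather than list it):

```python
# Hypothetical lookup table built from the examples on this slide.
ANALYSES = {
    "cat": ["cat+N+Sg"],
    "cats": ["cat+N+Pl"],
    "foxes": ["fox+N+Pl"],
    "mice": ["mouse+N+Pl"],
    "sheep": ["sheep+N+Pl", "sheep+N+Sg"],  # ambiguous surface form
}

def analyse(surface):
    """Surface -> lexical: return every analysis (empty list if unknown)."""
    return ANALYSES.get(surface, [])

def generate(lexical):
    """Lexical -> surface: run the same relation in the other direction."""
    return sorted(s for s, forms in ANALYSES.items() if lexical in forms)

print(analyse("sheep"))
print(generate("mouse+N+Pl"))
```

Note that `analyse` returns a list: an ambiguous form such as sheep has more than one lexical analysis.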


Morphological Analyser

To build a morphological analyser we need:

• lexicon: the list of stems and affixes, together with basic information about them

• morphotactics: the model of morpheme ordering (e.g. the English plural morpheme follows a noun rather than a verb)

• orthographic rules: spelling rules that model the changes that occur in a word, usually when two morphemes combine (e.g., fly+s = flies)


Lexicon & Morphotactics

• Typically the list of word parts (the lexicon) and the model of ordering can be combined into an FSA which will recognise all the valid word forms.

• For this to be possible the word parts must first be classified into sublexicons.

• The FSA defines the morphotactics (ordering constraints).


Sublexicons to classify the list of word parts

reg-noun   irreg-pl-noun   irreg-sg-noun   plural
cat        mice            mouse           -s
fox        sheep           sheep
           geese           goose


FSA Expresses Morphotactics (ordering model)
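As a rough sketch of how such an FSA could encode the ordering constraints, the Python below labels arcs with sublexicon names. The state names q0 to q2 and the arc layout are assumptions modelled on the noun-inflection automaton in Jurafsky & Martin; the word lists are the small samples from the sublexicon slide.

```python
# Sublexicons (small samples only).
SUBLEXICONS = {
    "reg-noun": ["cat", "fox"],
    "irreg-pl-noun": ["mice", "sheep", "geese"],
    "irreg-sg-noun": ["mouse", "sheep", "goose"],
    "plural": ["s"],
}

# Arcs: (source state, sublexicon label, target state). Assumed layout.
ARCS = [
    ("q0", "reg-noun", "q1"),
    ("q1", "plural", "q2"),
    ("q0", "irreg-pl-noun", "q2"),
    ("q0", "irreg-sg-noun", "q2"),
]
FINAL = {"q1", "q2"}

def recognise(word, state="q0"):
    """True if some sequence of morphemes from `state` consumes all of `word`."""
    if word == "":
        return state in FINAL
    for src, label, dst in ARCS:
        if src != state:
            continue
        for morpheme in SUBLEXICONS[label]:
            if word.startswith(morpheme) and recognise(word[len(morpheme):], dst):
                return True
    return False

for w in ["cats", "geese", "mices", "foxs", "foxes"]:
    print(w, recognise(w))
```

This machine models morpheme ordering only. It rejects mices but accepts foxs and rejects foxes, which is why spelling rules are still needed.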


Towards the Analyser

• We can use lexc or xfst to build such an FSA (see lex1.lexc)

• To augment this to produce an analysis we must create a transducer Tnum which maps between the lexical level and an "intermediate" level that is needed to handle the spelling rules of English.


Three Levels of Analysis


1. Tnum: Noun Number Inflection

• multi-character symbols

• morpheme boundary ^

• word boundary #
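A minimal Python sketch of the lexical-to-intermediate mapping, assuming that a regular plural is written as ^s# and every other form simply ends in #, as in Jurafsky & Martin. The function names `tokenise` and `t_num` and the stem lists are hypothetical, and the function only imitates the behaviour of the transducer rather than building one.

```python
import re

REG_NOUNS = {"cat", "fox"}
# Lexical stem -> irregular plural stem (samples only).
IRREG_PL = {"mouse": "mice", "sheep": "sheep", "goose": "geese"}

def tokenise(lexical):
    """Split into symbols, keeping multi-character symbols such as +N whole."""
    return re.findall(r"\+[A-Za-z]+|.", lexical)

def t_num(lexical):
    """Map a lexical string to its intermediate form, or None if not licensed."""
    symbols = tokenise(lexical)
    tags = [s for s in symbols if s.startswith("+")]
    stem = "".join(s for s in symbols if not s.startswith("+"))
    if stem not in REG_NOUNS and stem not in IRREG_PL:
        return None
    if tags == ["+N", "+Sg"]:
        return stem + "#"                 # +N and +Sg leave only the word boundary
    if tags == ["+N", "+Pl"]:
        if stem in REG_NOUNS:
            return stem + "^s#"           # morpheme boundary, plural s, word boundary
        return IRREG_PL[stem] + "#"       # irregular plural, no suffix
    return None

print(tokenise("cat+N+Pl"))
print(t_num("fox+N+Pl"), t_num("goose+N+Pl"), t_num("cat+N+Sg"))
```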


Towards the Analyser

• We do this by first allowing the lexicon itself to also have two levels. Since surface geese maps to lexical goose, the new lexical entry will be “g:g o:e o:e s:s e:e” (see lex2.lexc)

• We must also add the appropriate morphological labels (see lex3.lexc)
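To make the two-level entry concrete, here is a small Python sketch that reads the pair notation and projects out each side. The helper name `parse_entry` is hypothetical, and this is not lexc syntax, only an illustration of the lexical:surface pairing.

```python
def parse_entry(entry):
    """Parse 'g:g o:e ...' into a list of (lexical, surface) symbol pairs."""
    pairs = []
    for item in entry.split():
        upper, lower = item.split(":")
        pairs.append((upper, lower))
    return pairs

GEESE = parse_entry("g:g o:e o:e s:s e:e")
lexical_side = "".join(upper for upper, _ in GEESE)
surface_side = "".join(lower for _, lower in GEESE)
print(lexical_side, surface_side)
```

Reading the left symbol of each pair gives the lexical stem and reading the right symbol gives the surface stem, so one entry covers both levels.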


Intermediate Form to Surface

• The reason we need an intermediate form is that funny things happen at morpheme boundaries, e.g.
  cat^s → cats
  fox^s → foxes
  fly^s → flies

• The rules which describe these changes are called orthographic rules or "spelling rules".


More English Spelling Rules

• consonant doubling: beg / begging

• y replacement: try/tries

• k insertion: panic/panicked

• e deletion: make/making

• e insertion: watch/watches

• Each rule can be stated in more detail ...


Spelling Rules

• Chomsky & Halle (1968) introduced a rewrite-rule notation for phonological rules, which is also used for spelling rules.

• A very similar notation is embodied in the "conditional replacement" rules of xfst.

E -> F || L _ R

which means replace E with F when it appears between left context L and right context R
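The effect of such a rule can be approximated in Python with regular-expression lookarounds. This is only a sketch under simplifying assumptions: E, L and R are treated as literal strings, whereas xfst allows arbitrary regular expressions, and the function name `replace_in_context` is hypothetical.

```python
import re

def replace_in_context(text, e, f, left, right):
    """Rewrite literal `e` as `f` wherever it sits between `left` and `right`."""
    # Lookarounds test the context without consuming it, so only `e` is replaced.
    lookbehind = "(?<=" + re.escape(left) + ")" if left else ""
    lookahead = "(?=" + re.escape(right) + ")" if right else ""
    pattern = lookbehind + re.escape(e) + lookahead
    # A function replacement avoids backslash interpretation in `f`.
    return re.sub(pattern, lambda match: f, text)

print(replace_in_context("abcabd", "b", "X", "a", "c"))
```

Only the first b is rewritten, because the second one is followed by d rather than c.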


A Particular Spelling Rule

This rule does e-insertion

^ -> e || x _ s#
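A hedged Python sketch of what this rule does to intermediate-level strings. The regular expression stands in for the xfst rule, and the final deletion of leftover ^ and # symbols is an assumed clean-up step that the slide does not spell out; both function names are hypothetical.

```python
import re

def e_insertion(intermediate):
    """Apply  ^ -> e || x _ s#  : rewrite ^ as e between x and s#."""
    return re.sub(r"(?<=x)\^(?=s#)", "e", intermediate)

def to_surface(intermediate):
    """Apply the rule, then delete any remaining boundary symbols (assumed step)."""
    return e_insertion(intermediate).replace("^", "").replace("#", "")

for form in ["fox^s#", "cat^s#", "watch^s#"]:
    print(form, "->", to_surface(form))
```

As written, the rule fires only after x, so watch^s# comes out as watchs; covering watch/watches would need a wider left context.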


e insertion over 3 levels

The rule corresponds to the mapping between surface and intermediate levels


e insertion as an FST


Incorporating Spelling Rules

• Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".

• The set of spelling rules is positioned between the surface level and the intermediate level.

• Parallel execution of FSTs can be carried out:

  – by simulation: in this case the FSTs must first be aligned.

  – by first constructing a single FST corresponding to their intersection.
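The intersection option can be sketched with the standard product construction, assuming each rule is a deterministic machine that reads one aligned symbol pair at a time. Treating each pair as a single symbol is the assumption that makes intersection well defined here. The dictionary layout and the two toy rules are hypothetical, not rules from the lecture.

```python
def intersect(a, b):
    """Product construction for two deterministic machines over one pair alphabet."""
    start = (a["start"], b["start"])
    trans, finals, todo, seen = {}, set(), [start], {start}
    while todo:
        p, q = state = todo.pop()
        if p in a["finals"] and q in b["finals"]:
            finals.add(state)
        for (src, sym), dst_a in a["trans"].items():
            if src != p or (q, sym) not in b["trans"]:
                continue  # both machines must be able to read the same pair
            nxt = (dst_a, b["trans"][(q, sym)])
            trans[(state, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return {"start": start, "finals": finals, "trans": trans}

def accepts(machine, pairs):
    """True if the machine reads the whole pair sequence and ends in a final state."""
    state = machine["start"]
    for sym in pairs:
        if (state, sym) not in machine["trans"]:
            return False
        state = machine["trans"][(state, sym)]
    return state in machine["finals"]

AA, AB, CC = ("a", "a"), ("a", "b"), ("c", "c")
# Toy rule 1: a must always be realised as b (the pair a:a is not allowed).
RULE1 = {"start": 0, "finals": {0}, "trans": {(0, AB): 0, (0, CC): 0}}
# Toy rule 2: at most one a may occur on the upper side.
RULE2 = {"start": 0, "finals": {0, 1},
         "trans": {(0, AA): 1, (0, AB): 1, (0, CC): 0, (1, CC): 1}}
BOTH = intersect(RULE1, RULE2)

print(accepts(BOTH, [CC, AB, CC]), accepts(BOTH, [AB, AB]), accepts(BOTH, [AA]))
```

A pair sequence is accepted by the combined machine only if every individual rule accepts it, which is what running the rules in parallel means.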


Putting it all together

Execution of the FSTi takes place in parallel
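The whole pipeline can be imitated end to end in a few lines of Python. This sketch assumes a toy lexicon and a single spelling rule (e-insertion after x), and it builds the analyser by enumerating every lexical form and inverting the result; that works only because the toy lexicon is finite. A real system would compose the transducers and run them in reverse. All names are hypothetical.

```python
import re

REG_NOUNS = ["cat", "fox"]
IRREG = {"mouse": "mice", "goose": "geese", "sheep": "sheep"}

def lexical_to_intermediate():
    """Enumerate (lexical, intermediate) pairs licensed by the lexicon and Tnum."""
    for stem in REG_NOUNS:
        yield stem + "+N+Sg", stem + "#"
        yield stem + "+N+Pl", stem + "^s#"
    for singular, plural in IRREG.items():
        yield singular + "+N+Sg", singular + "#"
        yield singular + "+N+Pl", plural + "#"

def intermediate_to_surface(form):
    """Apply e-insertion, then drop the boundary symbols (assumed clean-up)."""
    form = re.sub(r"(?<=x)\^(?=s#)", "e", form)
    return form.replace("^", "").replace("#", "")

def build_analyser():
    """Invert generation: map each surface form to all of its lexical forms."""
    table = {}
    for lexical, intermediate in lexical_to_intermediate():
        surface = intermediate_to_surface(intermediate)
        table.setdefault(surface, []).append(lexical)
    return table

ANALYSER = build_analyser()
for word in ["cats", "foxes", "geese", "sheep"]:
    print(word, ANALYSER[word])
```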


Kaplan and Kay
The Xerox View

FSTi are aligned but separate

FSTi intersected together