
Lecture 16:

FSA, Morphology, FST

Ling 1330/2330 Computational Linguistics

Na-Rae Han, 10/22/2019

Outline

FSA: HW5 review

Morphology and FST

Jurafsky & Martin (2nd Ed!) Ch.3 Words and Transducers

Hulden (2011) Morphological analysis with FST

foma!

Deterministic vs. non-deterministic FSA

[Figure: FSA with states 1, 2, 3 (START marked) and arcs labeled a and b]

Which string(s) does this FSA accept? Answer: ab, aab, abb, aaab, aabbb, aaaaabbbbbb, ...

What is its RE equivalent? Answer: a+b+

Non-deterministic: there are two path choices upon reading 'a'.

[Figure: equivalent deterministic FSA with states 1, 2, 3 and arcs labeled a and b]

A non-deterministic FSA can be algorithmically converted (LINK) into an equivalent deterministic FSA.
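The conversion mentioned above is usually done with the subset construction. Below is a minimal sketch of it, assuming a hypothetical non-deterministic FSA for a+b+ (the state names `q0`, `q1`, `q2` and the arcs are illustrative, not a transcription of the slide's diagram) and no epsilon arcs.

```python
from itertools import chain

# Hypothetical NFA for a+b+ (illustrative; not the slide's exact diagram).
NFA = {
    ("q0", "a"): {"q0", "q1"},   # two choices on 'a': the non-deterministic step
    ("q1", "b"): {"q1", "q2"},   # two choices on 'b' as well
}
NFA_START, NFA_FINALS = "q0", {"q2"}
ALPHABET = ("a", "b")

def determinize(nfa, start, finals, alphabet):
    """Subset construction without epsilon arcs.
    Each deterministic state is a frozenset of NFA states."""
    start_set = frozenset([start])
    dfa, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        for sym in alphabet:
            target = frozenset(
                chain.from_iterable(nfa.get((q, sym), ()) for q in current)
            )
            if not target:
                continue  # no arc on this symbol: the string is rejected
            dfa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return dfa, start_set, dfa_finals

def accepts(dfa, start, finals, string):
    """Run the deterministic machine: exactly one path per input string."""
    state = start
    for ch in string:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in finals

DFA, DFA_START, DFA_FINALS = determinize(NFA, NFA_START, NFA_FINALS, ALPHABET)

for s in ["ab", "aab", "abb", "a", "ba", ""]:
    print(repr(s), accepts(DFA, DFA_START, DFA_FINALS, s))
```

Each deterministic state stands for the set of NFA states the machine could be in, which is why the two-way choice on 'a' disappears.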

4.17

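Accepted by this FSA? '' 'a' 'aa' 'aaa' 'aaaa' 'b' 'ab' 'baaa'

Answer: '' 'a' 'aa' 'aaa' 'aaaa' are accepted; 'b' 'ab' 'baaa' are not. Also accepted: 'aaaaa', 'aaaaaa', …

Equivalent regular expression?

/a*/

[Figure: FSA with a single state 1 (START) and an arc labeled a]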


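Accepted by this FSA? '' 'a' 'aa' 'aaa' 'aaaa' 'b' 'ab' 'baaa'

Answer: all of them. Basically any string made of a and b!

Equivalent regular expression?

/(a|b)*/

[Figure: FSA with a single state 1 (START) and arcs labeled a and b]

If a and b are the only symbols in the alphabet, this is equivalent to

/.*/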
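The accept/reject questions for 4.17 can be checked mechanically. This is a minimal sketch that assumes Python's `re` module as a stand-in for the two FSAs; `accepted` is a hypothetical helper name, and the slides themselves do not use Python.

```python
import re

# The candidate strings from the 4.17 slides ('' is the empty string).
CANDIDATES = ["", "a", "aa", "aaa", "aaaa", "b", "ab", "baaa"]

def accepted(pattern, strings):
    """Return the strings that the pattern matches in full (FSA-style acceptance)."""
    return [s for s in strings if re.fullmatch(pattern, s)]

print(accepted(r"a*", CANDIDATES))
print(accepted(r"(a|b)*", CANDIDATES))
```

`re.fullmatch` is used rather than `re.search` because an FSA accepts a string only if it consumes the whole input and ends in a final state.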

4.18

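Accepted by this FSA? '' 'a' 'aa' 'aaa' 'b' 'ab' 'aabb' 'baa' 'bab'

Answer: '' 'a' 'aa' 'aaa' 'b' 'ab' 'aabb' are accepted; 'baa' and 'bab' are not. Also accepted: 'bb', 'abb', 'aaabbbbb', …

Equivalent regular expression?

/a*b*/

[Figure: FSA with states 1 (START) and 2, and arcs labeled a and b]

The FSA shown in the textbook is wrong. This is the correct FSA that matches the regex:

Any number of a's followed by any number of b's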
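An FSA for /a*b*/ can be written down as a transition table and run directly. The sketch below keeps the slide's state numbers 1 and 2, but the arcs and final states are an assumption reconstructed from the regex, since the diagram is not reproduced in this transcript.

```python
# Transition table for a two-state FSA recognizing a*b*.
# Assumed arcs (reconstructed from the regex, not copied from the diagram).
TRANSITIONS = {
    (1, "a"): 1,  # loop on state 1: any number of a's
    (1, "b"): 2,  # the first b moves to state 2
    (2, "b"): 2,  # loop on state 2: any number of b's
}
START, FINALS = 1, {1, 2}  # both states are final, so '' and 'aaa' are accepted

def accepts(string):
    """Follow one arc per symbol; reject as soon as no arc is available."""
    state = START
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, symbol)]
    return state in FINALS

for s in ["", "a", "b", "ab", "aabb", "baa", "bab"]:
    print(repr(s), accepts(s))
```

There is no arc for 'a' out of state 2, which is what rules out 'baa' and 'bab'.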

4.18 (as in the textbook p.115)

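Accepted by this FSA? '' 'a' 'b' 'ab' 'ba' 'aa' 'aab' 'aaabb' 'aaaaaa' 'bbbbbb' 'bbbaa' 'aaaab' 'abbbbbb'

Equivalent regular expression?

/a*(ab*)?/

Answer (per the regex): '' 'a' 'ab' 'aa' 'aab' 'aaabb' 'aaaaaa' 'aaaab' 'abbbbbb' are accepted; 'b' 'ba' 'bbbbbb' 'bbbaa' are not.

[Figure: FSA with states 1, 2, 3 (START marked) and arcs labeled a and b; one part of the diagram is annotated "This part is in fact redundant."]

If b is going to occur at all, it must be preceded by at least one a.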
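One way to see what /a*(ab*)?/ really accepts is to enumerate every short string over {a, b} and compare it against a simpler candidate pattern. The sketch below assumes Python's `re` module; the pattern `(a+b*)?` is a hypothetical simplification introduced here for comparison, not something stated on the slide, and a bounded enumeration is evidence of equivalence rather than a proof.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=6):
    """All strings over the alphabet, up to max_len, that the pattern fully matches."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if re.fullmatch(pattern, "".join(chars))
    }

textbook = language(r"a*(ab*)?")
simpler = language(r"(a+b*)?")   # hypothetical simplification, for comparison only

# Do the two patterns agree on every string up to length 6?
print(textbook == simpler)

# Strings containing b that do not start with a (the slide says there are none).
print(sorted(s for s in textbook if "b" in s and not s.startswith("a")))
```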

HW5, Part2 C.

Accepted by this FSA?

'b' does not affect acceptability.

Accepts any string with an even number of a's, including zero.

Regular expression?

/b*(ab*ab*)*/

/(b*ab*a)*b*/

These two expressions are equivalent.
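The claim that the two expressions are equivalent, and that both describe "an even number of a's", can be spot-checked by brute force. This is a sketch under stated assumptions: `even_as` is a hypothetical two-state machine written from the description above (the homework's actual diagram is not shown here), and the comparison covers only strings up to length 8, so it is a sanity check rather than a proof.

```python
import re
from itertools import product

def even_as(string):
    """Two-state FSA: the state tracks the parity of a's seen; b leaves it unchanged."""
    state = 0                      # 0 = even (start and final), 1 = odd
    for symbol in string:
        if symbol == "a":
            state = 1 - state      # each a flips the parity
        elif symbol != "b":
            return False           # symbol outside the alphabet {a, b}
    return state == 0

RE_1 = r"b*(ab*ab*)*"
RE_2 = r"(b*ab*a)*b*"

def all_strings(alphabet="ab", max_len=8):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# True only if the machine and both regexes agree on every enumerated string.
agree = all(
    even_as(s) == bool(re.fullmatch(RE_1, s)) == bool(re.fullmatch(RE_2, s))
    for s in all_strings()
)
print(agree)
```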

D. English morpho-syntax as FSA

Arc labels (=vocabulary): English morphemes

Set of accepted strings: legitimate English words

Which words are in this language?

Which are not?

What's the corresponding regular expression?

/(thank|joy|taste|thought)((ful|less)(ly)?)?/

[Figure: FSA with states 1 (START), 2, 3, 4 and arcs labeled thank, joy, taste, thought, ful, less, ly]


Can we implement the ENTIRE English morphological grammar this way, i.e., as an FSA?
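To explore which words are in this small language, the slide's regex can be run as is. A minimal sketch, assuming Python's `re` module and spelling each morpheme out as ordinary characters; `in_language` is a hypothetical helper name.

```python
import re

# The slide's regex: a stem, optionally followed by ful/less, optionally followed by ly.
WORD = re.compile(r"(thank|joy|taste|thought)((ful|less)(ly)?)?")

def in_language(word):
    """True if the whole word is a legal stem + suffix sequence."""
    return WORD.fullmatch(word) is not None

for w in ["thankful", "joylessly", "taste", "thoughtly", "fulthank", "joyfulless"]:
    print(w, in_language(w))
```

The nesting of the optional groups mirrors the order of the arcs: "ly" is reachable only after "ful" or "less", so "thoughtly" is rejected.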

Computational morphology

Morphological parsing/analysis

Input: a word

Output: an analysis of the structure of the word

Morphological generation

Input: an analysis of the structure of the word

Output: a word

beg+V+PresPart

begging

FST: Finite-State Transducer

FSA consists of:

A set of states. One state is initial; each state is either final (=accepting) or non-final.

A set of transition arcs between states, each with a label.

The machine starts at the initial state, and then transits to a next state through an arc, reading the label.

FST transition arcs have two levels: UPPER and LOWER. FSTs have two finite alphabets, one for each level.

Transitioning involves reading both upper and lower labels.

Upper side and lower side

Upper side (=underlying form): beg+V+PresPart

Lower side (=surface form): begging

[Figure: an FST relating the upper side to the lower side]

Confusing? Yes, but remember you look up "begging" to find it's a present participle form of "beg".
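The upper/lower pairing can be made concrete with a toy transducer whose arcs each carry two labels. This is only a sketch under loud assumptions: the arc list, the state numbers, and the helpers `transduce`, `generate`, and `analyze` are all hypothetical; labels are matched as plain string prefixes; and the doubled g is hard-coded into one arc rather than derived by a spelling rule as a real system would do.

```python
# Arcs: (source state, UPPER label, LOWER label, target state). "" is the empty label.
# Hypothetical toy machine for the single pair beg+V+PresPart <-> begging.
ARCS = [
    (0, "b", "b", 1),
    (1, "e", "e", 2),
    (2, "g", "g", 3),
    (3, "+V", "", 4),              # the tag has no surface realization
    (4, "+PresPart", "ging", 5),   # assumption: gemination hard-coded in this arc
]
START, FINALS = 0, {5}

def transduce(text, read_side):
    """Walk the arcs reading one side and collect the labels of the other side.
    read_side=1 reads UPPER labels (generation); read_side=2 reads LOWER (analysis).
    Assumes no cycles of empty labels on the side being read."""
    write_side = 3 - read_side
    results = []

    def walk(state, rest, out):
        if state in FINALS and not rest:
            results.append(out)
        for arc in ARCS:
            label = arc[read_side]
            if arc[0] == state and rest.startswith(label):
                walk(arc[3], rest[len(label):], out + arc[write_side])

    walk(START, text, "")
    return results

def generate(analysis):
    """Upper side -> lower side (morphological generation)."""
    return transduce(analysis, read_side=1)

def analyze(word):
    """Lower side -> upper side (morphological parsing)."""
    return transduce(word, read_side=2)

print(generate("beg+V+PresPart"))
print(analyze("begging"))
```

The same arcs serve both directions; only the side being read changes, which is the sense in which one FST does both parsing and generation.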

J&M (ed.2) Ch.3

Lecture continues, based on the book chapter

Make sure to review!

Introducing: foma

https://fomafst.github.io/

A compiler of finite-state machines (FSA and FST)

FSA: you already know

FST: Finite-State Transducer

A modern incarnation of Xerox's classic FST suite: XFST and LEXC.

regex in foma: pitfalls

Foma takes its regular expression syntax from Xerox's FST tools, which incorporate many linguistic rule conventions.

foma's regex syntax differs from the standard (Perl, Python) syntax in some key aspects, most notably:

Optionality: ? (Perl, Python) vs. () (foma)

Grouping: () (Perl, Python) vs. [] (foma)

Additionally, foma adopts multi-character symbols; SPACE is meaningful.

"abc" is a single symbol; "a b c" is three symbols concatenated.

Refer to:

https://github.com/mhulden/foma/blob/master/foma/docs/simpleintro.md#regex-basics
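The notational differences are easier to remember side by side. The sketch below runs only the Python patterns; the foma counterparts are shown as plain strings for comparison and are not executed. The example patterns and the `check` helper are illustrative assumptions, and the third row (any symbol) is added here on the understanding that `?` in foma stands for any single symbol.

```python
import re

# (Python regex, foma counterpart, a string that matches, a string that does not)
CONTRASTS = [
    (r"colou?r", "c o l o (u) r", "color", "colouur"),  # optionality: ? vs. ( )
    (r"(ab)+",   "[a b]+",        "abab",  "aba"),      # grouping: ( ) vs. [ ]
    (r"a.c",     "a ? c",         "abc",   "ac"),       # any symbol: . vs. ?
]

def check(python_pattern, good, bad):
    """True if the Python pattern accepts `good` in full and rejects `bad`."""
    return bool(re.fullmatch(python_pattern, good)) and not re.fullmatch(python_pattern, bad)

for python_pattern, foma_pattern, good, bad in CONTRASTS:
    print(f"{python_pattern:10} ~ {foma_pattern:15} {check(python_pattern, good, bad)}")
```

Note the spaces in the foma column: without them, a run of letters would be read as one multi-character symbol.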

Foma can compile FSA from regex

A regular expression is compiled into a deterministic FSA.

English morph syntax as FSA

Here, "thank", "ful", etc. are construed as multi-character symbols.

When building a morphological parser, we don't normally treat morphemes as such. (WHY?)

geese and mice

"view" might not work on Win and OS X.

Workaround in Ex8.

Wrapping up

Exercise 8 out

Due Thursday

Install foma and try it out

Thursday: more on Morphology and FST

Eva's presentation on Ilocano morphology