1 unsupervised morphological segmentation with log-linear models hoifung poon university of...

72
1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

Upload: kayla-flood

Post on 26-Mar-2015

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

1

Unsupervised Morphological Segmentation

With Log-Linear Models

Hoifung PoonUniversity of Washington

Joint Work with

Colin Cherry and Kristina Toutanova

Page 2: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

2

Machine Learning in NLP

Page 3: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

3

Machine Learning in NLP

Unsupervised Learning

Page 4: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

4

Machine Learning in NLP

Unsupervised Learning Log-Linear Models

?

Page 5: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

5

Machine Learning in NLP

Unsupervised Learning Log-Linear Models

Little work except for a couple of cases

Page 6: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

6

Machine Learning in NLP

Unsupervised Learning Log-Linear Models

Global Features

?

Page 7: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

7

Machine Learning in NLP

Unsupervised Learning Log-Linear Models

Global Features

???

Page 8: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

8

Machine Learning in NLP

Unsupervised Learning Log-Linear Models

Global Features

We developed a method

for Unsupervised Learningof Log-Linear Modelswith Global Features

Page 9: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

9

Machine Learning in NLP

Unsupervised Learning Log-Linear Models

Global Features

We applied it to morphological segmentation

and reduced F1 errorby 10%50%

compared to the state of the art

Page 10: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

10

Outline

Morphological segmentation Our model Learning and inference algorithms Experimental results Conclusion

Page 11: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

11

Morphological Segmentation

Breaks words into morphemes

Page 12: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

12

Morphological Segmentation

Breaks words into morphemesgovernments

Page 13: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

13

Morphological Segmentation

Breaks words into morphemesgovernments govern ment s

Page 14: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

14

Morphological Segmentation

Breaks words into morphemesgovernments govern ment slm$pxtm

(according to their families)

Page 15: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

15

Morphological Segmentation

Breaks words into morphemesgovernments govern ment slm$pxtm l m$pm t m

(according to their families)

Page 16: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

16

Morphological Segmentation

Breaks words into morphemesgovernments govern ment slm$pxtm l m$pm t m

(according to their families)

Key component in many NLP applications Particularly important for morphologically-rich

languages (e.g., Arabic, Hebrew, …)

Page 17: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

17

Why Unsupervised Learning ?

Text: Unlimited supplies in any language Segmentation labels?

Only for a few languages Expensive to acquire

Page 18: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

18

Why Log-Linear Models ?

Can incorporate arbitrary overlapping features

E.g., Al rb (the lord) Morpheme features:

Substrings Al, rb are likely morphemes Substrings Alr, lrb are not likely morphemes Etc.

Context features: Substrings between Al and # are likely morphemes Substrings between lr and # are not likely morphemes Etc.

Page 19: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

19

Why Global Features ?

Words can inform each other on segmentation E.g., Al rb (the lord), l Al rb (to the lord)

Page 20: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

20

State of the Art in Unsupervised Morphological Segmentation

Use directed graphical models Morfessor [Creutz & Lagus 2007]

Hidden Markov Model (HMM) Goldwater et al. [2006]

Based on Pitman-Yor processes Snyder & Barzilay [2008a, 2008b]

Based on Dirichlet processes Uses bilingual information to help segmentation

Phrasal alignment Prior knowledge on phonetic correspondence

E.g., Hebrew w Arabic w, f; ...

Page 21: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

21

Unsupervised Learning with Log-Linear Models

Few approaches exist to this date Contrastive estimation [Smith & Eisner 2005] Sampling [Poon & Domingos 2008]

Page 22: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

22

This Talk

First log-linear model for unsupervised morphological segmentation

Combines contrastive estimation with sampling Achieves state-of-the-art results Can apply to semi-supervised learning

Page 23: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

23

Outline

Morphological segmentation Our model Learning and inference algorithms Experimental results Conclusion

Page 24: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

24

Log-Linear Model

State variable x X Features fi: X R

Weights λi

Defines probability distribution over the states

1( ) exp ( )i i

i

P x f xZ

Page 25: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

25

Log-Linear Model

State variables x X Features fi: X R

Weights λi

Defines probability distribution over the states

1( ) exp ( )i i

i

P x f xZ

'

exp ( ')i i

x X i

Z f x

Page 26: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

26

States for Unsupervised Morphological Segmentation

Words

wvlAvwn, Alrb, … Segmentation

w vlAv wn, Al rb, … Induced lexicon (unique morphemes)

w, vlAv, wn, Al, rb

Page 27: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

27

Features for Unsupervised Morphological Segmentation

Morphemes and contexts Exponential priors on model complexity

Page 28: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

28

Morphemes and Contexts Count number of occurrences Inspired by CCM [Klein & Manning, 2001]

E.g., w vlAv wn

wvlAvwn(##_##)

vlAv(#w_wn)

w(##_vl)

wn(Av_##)

Page 29: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

29

Complexity-Based Priors

Lexicon prior: On lexicon length (total number of characters) Favor fewer and shorter morpheme types

Corpus prior: On number of morphemes (normalized by word length) Favor fewer morpheme tokens

E.g., l Al rb, Al rb l, Al, rb 5 l Al rb 3/5 Al rb 2/4

Page 30: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

30

Lexicon Prior Is Global Feature

Renders words interdependent in segmentation E.g., lAlrb, Alrb

lAlrb ?

Alrb ?

Page 31: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

31

Lexicon Prior Is Global Feature

Renders words interdependent in segmentation E.g., lAlrb, Alrb

lAlrb l Al rbAlrb ?

Page 32: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

32

Lexicon Prior Is Global Feature

Renders words interdependent in segmentation E.g., lAlrb, Alrb

lAlrb l Al rbAlrb Al rb

Page 33: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

33

Lexicon Prior Is Global Feature

Renders words interdependent in segmentation E.g., lAlrb, Alrb

lAlrb l Alrb

Alrb ?

Page 34: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

34

Lexicon Prior Is Global Feature

Renders words interdependent in segmentation E.g., lAlrb, Alrb

lAlrb l Alrb

Alrb Alrb

Page 35: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

35

Probability Distribution

For corpus W and segmentation S

( , ) ( , )

( , )

1( , ) exp ( )

( ) / ( )

c c

Lex W S c Ctxt W S

Lex W S

w W

n n

P W S LenZ

NumMorph w Len w

Morphemes

Contexts

Lexicon Prior

Corpus Prior

Page 36: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

36

Outline

Morphological segmentation Our model Learning and inference algorithms Experimental results Conclusion

Page 37: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

37

Learning withLog-Linear Models

Maximizes likelihood of the observed data

Moves probability mass to the observed data From where? The set X that Z sums over

Normally, X = {all possible states} Major challenge:

Efficient computation (approximation) of the sum Particularly difficult in unsupervised learning

'

exp ( ')i i

x X i

Z f x

Page 38: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

38

Contrastive Estimation

Smith & Eisner [2005]

X = a neighborhood of the observed data Neighborhood Pseudo-negative examples Discriminate them from observed instances

Page 39: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

39

Problem with Contrastive Estimation

Objects are independent from each other Using global features leads to intractable inference In our case, could not use the lexicon prior

Page 40: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

40

Sampling to the Rescue

Similar to Poon & Domingos [2008]

Markov chain Monte Carlo Estimates sufficient statistics based on samples Straightforward to handle global features

Page 41: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

41

Our Learning Algorithm

Combines both ideas Contrastive estimation

Creates an informative neighborhood Sampling

Enables global feature (the lexicon prior)

Page 42: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

42

Learning Objective

Observed: W* (words) Hidden: S (segmentation) Maximizes log-likelihood of observing the words

( *) log ( *, )S

L W P W S

Page 43: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

43

Neighborhood

TRANS1 Transpose any pair of adjacent characters Intuition: Transposition usually leads to a non-word E.g.,

lAlrb Allrb, llArb, …

Alrb lArb, Arlb, …

Page 44: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

44

Optimization

Gradient descent

| * , ( *, )

E [ ] E [ ]S W W Si ii

L W Sf f

Page 45: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

45

Supervised Learning andSemi-Supervised Learning

Readily applicable if there are

labeled segmentations (S*)

Supervised: Labels for all words Semi-supervised: Labels for some words

| *, * , ( *, *)

E [ ] E [ ]S W S W Si ii

L W Sf f

Page 46: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

46

Inference: Expectation

Gibbs sampling ES|W*[fi]

For each observed word in turn, sample next segmentation, conditioning on the rest

EW,S[fi]

For each observed word in turn, sample a word from neighborhood and next segmentation, conditioning on the rest

Page 47: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

47

Inference: MAP Segmentation

Deterministic annealing Gibbs sampling with temperature Gradually lower the temperature from 10 to 0.1

Page 48: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

48

Outline

Morphological segmentation Our model Learning and inference algorithms Experimental results Conclusion

Page 49: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

49

Dataset

S&B: Snyder & Barzilay [2008a, 2008b] About 7,000 parallel short phrases Arabic and Hebrew with gold segmentation

Arabic Penn Treebank (ATB): 120,000 words

Page 50: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

50

Methodology

Development set: 500 words from S&B Use trigram context in our full model Evaluation: Precision, recall, F1 on

segmentation points

Page 51: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

51

Experiment Objectives

Comparison with state-of-the-art systems Unsupervised Supervised or semi-supervised

Relative contributions of feature components

Page 52: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

52

Experiment: S&B (Unsupervised)

Snyder & Barzilay [2008b] S&B-MONO: Uses monolingual features only S&B-BEST: Uses bilingual information

Our system: Uses monolingual features only

Page 53: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

53

Results: S&B (Unsupervised)

50

55

60

65

70

75

80

S&B-MONOS&B-BESTOur System

F1

Page 54: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

54

Results: S&B (Unsupervised)

50

55

60

65

70

75

80

S&B-MONOS&B-BESTOur System

F1

Page 55: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

55

Results: S&B (Unsupervised)

50

55

60

65

70

75

80

S&B-MONOS&B-BESTOur System

F1

Page 56: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

56

Results: S&B (Unsupervised)

50

55

60

65

70

75

80

S&B-MONOS&B-BESTOur System

F1

Reduces F1 error by 40%

Page 57: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

57

Results: S&B (Unsupervised)

50

55

60

65

70

75

80

S&B-MONOS&B-BESTOur System

F1

Reduces F1 error by 21%

Page 58: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

58

Experiment: ATB (Unsupervised)

Morfessor Categories-MAP [Creutz & Lagus 2007]

Our system

Page 59: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

59

Results: ATB

70

72

74

76

78

MOR-MAPOur System

F1

Reduces F1 error by 11%

Page 60: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

60

Experiment: Ablation Tests

Conducted on the S&B dataset Change one feature component in each test

Priors Context features

Page 61: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

61

Results: Ablation Tests

30

40

50

60

70

80

F1

No priorsLexicon prior only

Corpus prior only

Both priors are crucial

FULL NO-PR COR NO-CTXTLEX

Page 62: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

62

Results: Ablation Tests

30

40

50

60

70

80

F1

No context features

Overlapping context features are important

FULL NO-PR COR NO-CTXTLEX

Page 63: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

63

Experiment: S&B (Supervised and Semi-Supervised)

Snyder & Barzilay [2008a] S&B-MONO-S: Monolingual features and labels S&B-BEST-S: Bilingual information and labels

Our system: Monolingual features and labelsPartial or all labels (25%, 50%, 75%, 100%)

Page 64: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

64

Results: S&B (Supervised and Semi-Supervised)

75

80

85

90

F1

S&BMONO-S

Page 65: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

65

Results: S&B (Supervised and Semi-Supervised)

75

80

85

90

F1

S&BMONO-S

S&BBEST-S

Page 66: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

66

Results: S&B (Supervised and Semi-Supervised)

75

80

85

90

F1

S&BMONO-S

S&BBEST-S

Our-S25%

Page 67: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

67

Results: S&B (Supervised and Semi-Supervised)

75

80

85

90

F1

S&BMONO-S

S&BBEST-S

Our-S25%

Our-S50%

Page 68: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

68

Results: S&B (Supervised and Semi-Supervised)

75

80

85

90

F1

S&BMONO-S

S&BBEST-S

Our-S25%

Our-S50%

Our-S75%

Page 69: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

69

Results: S&B(Supervised and Semi-Supervised)

75

80

85

90

F1

S&BMONO-S

S&BBEST-S

Our-S25%

Our-S50%

Our-S75%

Our-S100%

Reduces F1 error by 46% compared to S&B-MONO-S

Reduces F1 error by 36% compared to S&B-BEST-S

Page 70: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

70

Conclusion

We developed a method

for Unsupervised Learning

of Log-Linear Models

with Global Features Applied it to morphological segmentation Substantially outperforms state-of-the-art systems Effective for semi-supervised learning as well Easy to extend with additional features

Page 71: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

71

Future Work

Apply to other NLP tasks Interplay between neighborhood and features Morphology

Apply to other languages Modeling internal variations of morphemes Leverage multi-lingual information Combine with other NLP tasks (e.g., MT)

Page 72: 1 Unsupervised Morphological Segmentation With Log-Linear Models Hoifung Poon University of Washington Joint Work with Colin Cherry and Kristina Toutanova

72

Thanks Ben Snyder …

For his most generous help with S&B dataset