Page 1: An Unsupervised Method for Uncovering Morphological Chainspeople.csail.mit.edu/karthikn/assets/pdf/morphochain... · 2017-10-15 · Uncovering Morphological Chains Karthik Narasimhan

An Unsupervised Method for Uncovering Morphological Chains

Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola

CSAIL, Massachusetts Institute of Technology

Morphological Chains

Chains to model the formation of words.

paint → painting → paintings

Richer representation than traditional scenarios:

• Segmentation

• Paradigms

Our Approach

Core Idea: Unsupervised discriminative model over pairs of words in the chain.

paint → painting

• Orthographic features: Morfessor (Goldwater and Johnson, 2004; Creutz and Lagus, 2007), Poon et al., 2009, Dreyer and Eisner, 2009, Sirts and Goldwater, 2013

• Semantic features: Schone and Jurafsky, 2000; Baroni et al., 2002

• Handle transformations (plan → planning)

Textual Cues

Orthographic: Patterns in the characters forming words.

paint: paints, painted

pain: pains, pained

pain, paint

ran, rant

Semantic: Meaning embedded as vectors.

A      B        cos(A,B)
paint  paints   0.68
paint  painted  0.60
pain   pains    0.60
pain   paint    0.11
ran    rant     0.09
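The cos(A,B) column is the cosine similarity between two word vectors. A minimal sketch of that computation follows. The three-dimensional vectors are invented for illustration and are not the vectors behind the scores on the slide; `cosine` and `toy_vectors` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example only.
toy_vectors = {
    "paint": [0.9, 0.1, 0.2],
    "paints": [0.8, 0.2, 0.3],
    "pain": [0.1, 0.9, 0.1],
}
```

With these toy values, `cosine(toy_vectors["paint"], toy_vectors["paints"])` is much larger than `cosine(toy_vectors["pain"], toy_vectors["paint"])`, which is the same pattern as in the table: orthographically similar pairs can still be far apart semantically.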

Task Setup

Training: Unannotated word list with frequencies

Words: a, ability, able, about

Frequencies: 395134, 17793, 56802, 524355

Word Vector Learning: Large text corpus (Wikipedia)

Multiple chains possible for a word.

nation → national → international → internationally

nation → national → nationally → internationally

Different chains can share word pairs.

nation → national → international → internationally

nation → national → nationalize

Independence Assumption

Treat word-parent pairs separately.

Word (w): national

Candidate (z): Parent (p) = nation, Type (t) = Suffix

$$P(w, z) \propto e^{\theta \cdot \phi(w, z)}$$

Word (w): national

Candidate (z): Parent (p) = nation, Type (t) = Suffix

Types: Prefix, Suffix, Transformations, Stop.
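To make the candidate set concrete, here is a minimal sketch that enumerates (parent, type) candidates for a word and computes the unnormalized score e^{θ·φ(w,z)}. It covers only the Prefix, Suffix, and Stop types; transformations are left out. The function names, the feature names, and the sparse-dictionary representation of θ and φ are assumptions made for this example, not the authors' implementation.

```python
import math


def candidates(word):
    """Enumerate (parent, type) candidates: a STOP candidate plus every
    proper split of the word, read as a suffix or a prefix attachment."""
    cands = [(None, "STOP")]
    for i in range(1, len(word)):
        cands.append((word[:i], "SUFFIX"))  # parent is the left part
        cands.append((word[i:], "PREFIX"))  # parent is the right part
    return cands


def toy_features(word, cand):
    """Hypothetical sparse features: the candidate type and the affix string."""
    parent, kind = cand
    feats = {"type=" + kind: 1.0}
    if kind == "SUFFIX":
        feats["suffix=" + word[len(parent):]] = 1.0
    elif kind == "PREFIX":
        feats["prefix=" + word[:len(word) - len(parent)]] = 1.0
    return feats


def unnormalized_score(theta, features):
    """exp(theta . phi) for sparse weights and features stored as dicts."""
    return math.exp(sum(theta.get(name, 0.0) * value
                        for name, value in features.items()))
```

For example, `("nation", "SUFFIX")` is among `candidates("national")`, and its toy features are `type=SUFFIX` and `suffix=al`. A weight vector that rewards `suffix=al` raises the score of that candidate relative to the others.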

Transformations

• Templates for handling changes in stem during addition of affixes.

• Repetition template: PQ → PQQR (for each Q in alphabet). Ex. plan → planning, with P = pla, Q = n, R = ing.

• Feature template for each transformation.

Transformation types

3 different transformations:

• Repetition (plan → planning)

• Deletion (decide → deciding)

• Modification (carry → carried)

Trade-off between types of transformation and computational tractability.

• These three do well for a range of languages and are computationally tractable: at most $O(|\Sigma|^2)$ for alphabet $\Sigma$
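The three transformation types can be sketched as a generator of candidate parents for a given split of the word into stem and suffix. This is an illustrative sketch under assumptions: a lowercase English alphabet, suffixation only, and a final filter against a known vocabulary. The names `transformed_parents` and `filter_known` are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase English letters


def transformed_parents(word, split):
    """Candidate parents for word = stem + suffix, with stem = word[:split],
    under repetition, deletion, and modification of the stem's last character.
    Each candidate is (parent, transformation type, characters involved)."""
    stem = word[:split]
    out = []
    # Repetition: the parent's last character was doubled (plan -> plann + ing).
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        out.append((stem[:-1], "REPEAT", stem[-1]))
    # Deletion: the parent's last character was dropped (decide -> decid + ing).
    for c in ALPHABET:
        out.append((stem + c, "DELETE", c))
    # Modification: the parent's last character was changed (carry -> carri + ed).
    if stem:
        for c in ALPHABET:
            if c != stem[-1]:
                out.append((stem[:-1] + c, "MODIFY", c + stem[-1]))
    return out


def filter_known(cands, vocabulary):
    """Keep only candidates whose parent is a known word (an assumption here)."""
    return [cand for cand in cands if cand[0] in vocabulary]
```

Deletion tries one character per alphabet symbol, and modification is identified by a pair of characters (original and replacement), so the number of distinct transformation features is bounded by the square of the alphabet size.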

Features φ(w,z)

Orthographic

• Affixes: Indicator feature for top affixes

• Affix Correlation: pairs of affixes sharing set of stems, e.g. (inter-, re-), (under-, over-)

• Word freq. of parent

• Transformation types with character bigrams

Semantic

• Cosine similarity between word vectors of word and parent

[Figure: Cosine similarity with player]
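One plausible way to obtain a list of "top affixes" from an unannotated word list is to count every suffix whose removal leaves another word in the list. The slides do not say how the list is built, so the sketch below is an assumption; `top_suffixes`, `n`, and `max_len` are hypothetical names.

```python
from collections import Counter


def top_suffixes(vocabulary, n=3, max_len=4):
    """Return the n most frequent suffixes (up to max_len characters)
    whose removal leaves a word that is also in the vocabulary."""
    counts = Counter()
    for word in vocabulary:
        start = max(1, len(word) - max_len)
        for i in range(start, len(word)):
            if word[:i] in vocabulary:
                counts[word[i:]] += 1
    return [suffix for suffix, _ in counts.most_common(n)]
```

On a small list such as paint, paints, painted, pain, pains, pained, walk, walks, the counts are s = 3 and ed = 2, but spurious suffixes such as "t" (pain, paint) also appear once. That is the kind of orthographic ambiguity the semantic feature is meant to resolve.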

Learning

• Objective:

$$\prod_{w} P(w) = \prod_{w} \sum_{z} P(w, z) = \prod_{w} \frac{\sum_{z} e^{\theta \cdot \phi(w, z)}}{\sum_{w' \in \Sigma^{*},\, z'} e^{\theta \cdot \phi(w', z')}}$$

• Optimize likelihood using convex optimization: L-BFGS-B (with regularization)

• Not tractable: requires summing over all possible strings over the alphabet to calculate the normalization constant, Z.

Contrastive Estimation

• Instead, we use Contrastive Estimation (Smith and Eisner, 2005):

• Neighborhood of invalid words for each word to take probability mass from.

• Transpose pair of adjacent chars from first and last k chars in word. (Ex. painting → paintnig)

$$P(w, z) = \frac{e^{\theta \cdot \phi(w, z)}}{\sum_{w' \in N(w),\, z'} e^{\theta \cdot \phi(w', z')}}$$
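The neighborhood N(w) and the resulting normalization can be sketched as follows. Several details are assumptions rather than facts from the slides: the default k = 5, the choice to include w itself in the normalizing pool, and the callback signatures `score(word, cand)` and `cands_of(word)`.

```python
def neighbors(word, k=5):
    """Strings obtained by swapping one pair of adjacent characters that
    both lie within the first k or the last k characters of the word."""
    n = len(word)
    result = set()
    for i in range(n - 1):
        in_head = i + 1 < k    # positions i and i+1 are inside word[:k]
        in_tail = i >= n - k   # positions i and i+1 are inside word[-k:]
        if (in_head or in_tail) and word[i] != word[i + 1]:
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            result.add("".join(chars))
    return result


def contrastive_prob(word, cand, score, cands_of, k=5):
    """Score of (word, cand) divided by the summed scores of all (w', z')
    with w' in the neighborhood. The pool includes the word itself
    (an assumption made in this sketch)."""
    pool = neighbors(word, k) | {word}
    denom = sum(score(w2, z2) for w2 in pool for z2 in cands_of(w2))
    return score(word, cand) / denom
```

Because the sum now runs over a handful of neighbors instead of every string over the alphabet, the normalizer is cheap to compute for each training word.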

Prediction

• Predict chain in recursive fashion (argmax parent candidate each time) till stop.

paintings → painting → paint → STOP
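The recursive prediction step can be sketched as a loop that keeps asking for the best parent until the model chooses STOP. Here `best_parent` stands in for the argmax over candidates and is a hypothetical callback that returns `None` for STOP. The `max_steps` bound and the cycle check are safety additions of this sketch.

```python
def predict_chain(word, best_parent, max_steps=20):
    """Follow the highest-scoring parent from the word until STOP (None).
    Returns the chain ordered from the base form to the input word."""
    chain = [word]
    for _ in range(max_steps):
        parent = best_parent(chain[-1])
        if parent is None or parent in chain:  # STOP, or guard against cycles
            break
        chain.append(parent)
    return list(reversed(chain))
```

With a toy lookup table such as `{"paintings": "painting", "painting": "paint", "paint": None}` passed as `table.get`, the call `predict_chain("paintings", table.get)` yields the chain paint, painting, paintings.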

Segmentation Experiments

• Three languages: English, Arabic, Turkish

• Evaluation: Morphological segmentation, with Precision, Recall, F1 over individual segmentation points

• Baselines: Morfessor-Baseline, Morfessor CatMAP, AGMorph (Sirts and Goldwater, 2013) and Lee et al. (2011)
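Scoring over individual segmentation points can be sketched as below. The hyphen-delimited input format and the pooling of counts across all words (micro-averaging) are assumptions of this example. It is not the official MorphoChallenge scorer.

```python
def boundaries(segmentation):
    """Character offsets of the split points in a hyphen-segmented word,
    e.g. 'paint-ing-s' has split points after 5 and 8 characters."""
    cuts = set()
    offset = 0
    morphs = segmentation.split("-")
    for morph in morphs[:-1]:
        offset += len(morph)
        cuts.add(offset)
    return cuts


def segmentation_prf(predicted, gold):
    """Precision, recall, and F1 over segmentation points, with counts
    pooled across all words."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p, g = boundaries(pred), boundaries(ref)
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For instance, predicting `paint-ings` against the gold `paint-ing-s` finds one of the two gold split points and proposes no wrong ones, so the prediction is under-segmented: precision is perfect and recall is one half.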

F1 scores on MorphoChallenge

[Figure: chart of F1 (axis 0 to 90) for English, Turkish, and Arabic, comparing Morfessor-B, Morfessor-C, AGMorph, Lee (M2), and MorphoChain.]

Effect of data size

Affix Analysis

Error analysis

• Errors (on a random subset of 50 words per language):

Language  Over-segment  Under-segment
English   10%           86%
Turkish   12%           78%
Arabic    60%           40%

• Most errors (58%) in Turkish due to parent words not present or having low count.

• Root template morphology of Arabic causes 14% of errors.

Sample segmentations

Morphology in Keyword Spotting

• Keyword Spotting is the task of identifying keywords in speech utterances

• Major issue: Out of vocabulary words

[Figure: ATWV (axis 0 to 0.2) on KWS-Test, Supervised vs. Unsupervised (Morfessor). KWS Results on OOV keywords in Turkish LLP (Narasimhan et al., 2014)]

• Adding morphemes helps KWS.

• Better morphology can lead to better KWS (supervised vs. unsupervised)

• Need for better unsupervised segmentation.

Morfessor vs MorphoChain for KWS

• MorphoChain outperforms state-of-the-art unsupervised morphological system on KWS

[Figure: ATWV scores on Bengali VLLP, Morfessor vs. MorphoChain. Two panels: w/o web data (ATWV axis 20.2 to 20.8) and w/ web data (ATWV axis 23 to 26).]

*in collaboration with Damianos Karakos and Rich Schwartz at BBN

Conclusions

• A new method for unsupervised morphological analysis incorporating both orthographic and semantic features.

• Equals or outperforms state-of-the-art systems on morphological segmentation.

• Works well on downstream tasks.

Code: http://people.csail.mit.edu/karthikn/morphochain/