An Unsupervised Method for Uncovering Morphological Chains
Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola
CSAIL, Massachusetts Institute of Technology
Morphological Chains
Chains to model the formation of words.
  paint → painting → paintings
Richer representation than traditional scenarios: segmentation, paradigms.
Our Approach
Core Idea: Unsupervised discriminative model over pairs of words in the chain.
  paint → painting
• Orthographic features: Morfessor (Goldwater and Johnson, 2004; Creutz and Lagus, 2007), Poon et al. (2009), Dreyer and Eisner (2009), Sirts and Goldwater (2013)
• Semantic features: Schone and Jurafsky (2000), Baroni et al. (2002)
• Handles transformations (plan → planning)
Textual Cues

Orthographic: patterns in the characters forming words.
  paint → paints, painted
  pain → pains, pained
But orthography alone is misleading for pairs like pain/paint and ran/rant.

Semantic: meaning embedded as vectors.

  A      B        cos(A, B)
  paint  paints   0.68
  paint  painted  0.60
  pain   pains    0.60
  pain   paint    0.11
  ran    rant     0.09
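The semantic cue above can be sketched with a plain cosine-similarity computation. The 3-d vectors below are toy stand-ins for real word vectors learned from a large corpus:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d vectors (illustrative only; real vectors come from a large corpus)
vec = {
    "paint":  [0.90, 0.10, 0.20],
    "paints": [0.85, 0.15, 0.25],
    "pain":   [0.10, 0.90, 0.30],
}

print(cosine(vec["paint"], vec["paints"]))  # morphologically related: high
print(cosine(vec["paint"], vec["pain"]))    # orthographically close only: low
```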
Task Setup
Training: unannotated word list with frequencies. Example entries: a, ability, able, about, with frequency counts (395134, 17793, 56802, 524355).
Word vector learning: large text corpus (Wikipedia).
nation → national → international → internationally
nation → national → nationally → internationally
Multiple chains possible for a word.
nation → national → international → internationally
nation → national → nationalize
Different chains can share word pairs.
Independence Assumption
Treat word-parent pairs separately.
[Diagram: word (w) = national; candidate (z) = (parent (p) = nation, type (t) = suffix)]
  P(w, z) ∝ exp(θ · φ(w, z))

[Diagram: word (w) = national; candidate (z) = (parent (p) = nation, type (t) = suffix)]
Types: prefix, suffix, transformation, stop.
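The log-linear form above can be sketched as follows. The weights theta and the feature dictionaries are hypothetical stand-ins for φ(w, z), and the softmax here is taken only over one word's candidate set (the full model's normalization is discussed under Learning):

```python
import math

def score(theta, phi):
    """Unnormalized log-linear score theta · phi(w, z)."""
    return sum(theta.get(f, 0.0) * v for f, v in phi.items())

def candidate_probs(theta, candidates):
    """Softmax over a word's candidate (parent, type) set."""
    scores = [score(theta, phi) for _, phi in candidates]
    m = max(scores)                       # stabilize the exponentials
    exps = [math.exp(s - m) for s in scores]
    Z = sum(exps)
    return {z: e / Z for (z, _), e in zip(candidates, exps)}

# Hypothetical weights and features for the word "national"
theta = {"suffix=al": 1.5, "cos_sim": 2.0, "suffix=onal": -0.5}
candidates = [
    (("nation", "suffix"), {"suffix=al": 1.0, "cos_sim": 0.7}),
    (("nati", "suffix"),   {"suffix=onal": 1.0, "cos_sim": 0.1}),
]
probs = candidate_probs(theta, candidates)
best = max(probs, key=probs.get)
print(best)  # ('nation', 'suffix')
```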
Transformations
• Templates for handling changes in the stem during addition of affixes.
• Repetition template: PQ → PQQR, for each character Q in the alphabet. Example: plan → planning (P = pla, Q = n, R = ing).
• Feature template for each transformation.
Transformation Types
Three different transformations:
• Repetition (plan → planning)
• Deletion (decide → deciding)
• Modification (carry → carried)
Trade-off between types of transformation and computational tractability:
• These three do well for a range of languages and are computationally tractable: at most O(|Σ|²) for alphabet Σ.
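The three transformation types can be inverted to propose parent candidates for a word. The sketch below is a deliberate simplification (it only restores a dropped 'e' for deletion and an i → y change for modification), not the paper's full template set:

```python
def transform_parents(word, suffix):
    """Propose parents for word = stem + suffix by inverting the three
    transformation types (simplified sketch)."""
    if not word.endswith(suffix):
        return []
    stem = word[: len(word) - len(suffix)]
    parents = [stem]                       # plain suffixation: paint + ing
    # Repetition: planning = plan + n + ing, so undo the doubled final char
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        parents.append(stem[:-1])
    # Deletion: deciding = decid + ing; restore the dropped 'e'
    parents.append(stem + "e")
    # Modification: carried = carri + ed; change 'i' back to 'y'
    if stem.endswith("i"):
        parents.append(stem[:-1] + "y")
    return parents

print(transform_parents("planning", "ing"))  # ['plann', 'plan', 'planne']
print(transform_parents("carried", "ed"))    # ['carri', 'carrie', 'carry']
```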
Features φ(w, z)

Orthographic:
• Affixes: indicator feature for top affixes
• Affix correlation: pairs of affixes sharing a set of stems, e.g. (inter-, re-), (under-, over-)
• Word frequency of the parent
• Transformation types with character bigrams

Semantic:
• Cosine similarity between word vectors of word and parent
[Chart: cosine similarity with "player" for neighboring words]
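A sketch of how φ(w, z) might combine these cues. All resources here (top affix set, frequency table, vectors) are toy stand-ins, and the affix-correlation and transformation features are omitted for brevity:

```python
import math

def extract_features(word, parent, affix_type, top_affixes, word_freq, vectors):
    """Sketch of phi(w, z): orthographic and semantic features for one
    (word, parent) candidate."""
    phi = {}
    # Orthographic: indicator for the affix, if it is among the top affixes
    affix = word[len(parent):] if affix_type == "suffix" else word[: len(word) - len(parent)]
    if affix in top_affixes:
        phi[f"{affix_type}={affix}"] = 1.0
    # Orthographic: (log) word frequency of the parent
    phi["parent_logfreq"] = math.log(1 + word_freq.get(parent, 0))
    # Semantic: cosine similarity between word and parent vectors
    wv, pv = vectors.get(word), vectors.get(parent)
    if wv and pv:
        dot = sum(a * b for a, b in zip(wv, pv))
        norm = math.sqrt(sum(a * a for a in wv)) * math.sqrt(sum(b * b for b in pv))
        phi["cos_sim"] = dot / norm
    return phi

phi = extract_features(
    "painting", "paint", "suffix",
    top_affixes={"ing", "ed", "s"},
    word_freq={"paint": 2500},
    vectors={"painting": [0.8, 0.2], "paint": [0.9, 0.1]},
)
print(sorted(phi))  # ['cos_sim', 'parent_logfreq', 'suffix=ing']
```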
Learning
• Objective (likelihood of the word list):

  ∏_w P(w) = ∏_w Σ_z P(w, z) = ∏_w Σ_z exp(θ·φ(w, z)) / Σ_{w′∈Σ*, z′} exp(θ·φ(w′, z′))

• Not tractable as stated: computing the normalization constant Z requires summing over all possible strings in the alphabet.
• Optimize the likelihood using convex optimization: LBFGS-B (with regularization).
Contrastive Estimation
• Instead, we use contrastive estimation (Smith and Eisner, 2005):
• A neighborhood of invalid words for each word to take probability mass from.
• Transpose pairs of adjacent characters from the first and last k characters of the word (e.g. painting → paintnig).

  P(w, z) = exp(θ·φ(w, z)) / Σ_{w′∈N(w), z′} exp(θ·φ(w′, z′))
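The transposition neighborhood can be sketched as follows; the choice k = 3 here is an assumption for illustration, not necessarily the paper's setting:

```python
def neighborhood(word, k=3):
    """Invalid neighbor strings for contrastive estimation: transpose each
    adjacent character pair lying within the first k or last k characters."""
    n = len(word)
    neighbors = set()
    for i in range(n - 1):
        in_prefix = i + 1 <= k - 1   # pair (i, i+1) inside the first k chars
        in_suffix = i >= n - k       # pair inside the last k chars
        if in_prefix or in_suffix:
            swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            if swapped != word:      # skip no-ops from doubled letters
                neighbors.add(swapped)
    return neighbors

print(sorted(neighborhood("painting")))  # includes 'paintnig'
```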
Prediction
• Predict the chain recursively (argmax parent candidate each time) until STOP.
  paintings → painting → paint → STOP
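The recursive prediction loop can be sketched as below; best_candidate stands in for the argmax over the learned model's candidate scores:

```python
def predict_chain(word, best_candidate):
    """Greedily follow argmax parents until the model predicts STOP."""
    chain = [word]
    while True:
        parent, ctype = best_candidate(chain[-1])
        if ctype == "stop":
            break
        chain.append(parent)
    return chain

# Toy stand-in for the trained model's argmax choices
toy = {"paintings": ("painting", "suffix"),
       "painting": ("paint", "suffix"),
       "paint": ("paint", "stop")}
print(predict_chain("paintings", lambda w: toy[w]))  # ['paintings', 'painting', 'paint']
```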
Segmentation Experiments
• Three languages: English, Arabic, Turkish
• Evaluation: morphological segmentation, scored by precision, recall, and F1 over individual segmentation points
• Baselines: Morfessor-Baseline, Morfessor CatMAP, AGMorph (Sirts and Goldwater, 2013), and Lee et al. (2011)
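Precision, recall, and F1 over segmentation points can be computed per word as sketched below (hyphens mark segment boundaries); the actual evaluation aggregates counts over the whole test set:

```python
def seg_points(segmentation):
    """Boundary positions of a segmentation, e.g. 'paint-ing-s' -> {5, 8}."""
    points, pos = set(), 0
    for seg in segmentation.split("-")[:-1]:
        pos += len(seg)
        points.add(pos)
    return points

def prf1(gold, pred):
    """Precision/recall/F1 over individual segmentation points."""
    g, p = seg_points(gold), seg_points(pred)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

print(prf1("paint-ing-s", "paint-ings"))  # precision 1.0, recall 0.5
```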
F1 Scores on MorphoChallenge
[Chart: F1 (0 to 90) for English, Turkish, and Arabic; systems: Morfessor-B, Morfessor-C, AGMorph, Lee (M2), MorphoChain]
Effect of data size
Affix Analysis
Error Analysis
• Errors (on a random subset of 50 words per language):

  Language  Over-segment  Under-segment
  English   10%           86%
  Turkish   12%           78%
  Arabic    60%           40%

• Most errors (58%) in Turkish are due to parent words that are absent or have low counts.
• The root-template morphology of Arabic causes 14% of errors.
Sample segmentations
Morphology in Keyword Spotting
• Keyword spotting (KWS) is the task of identifying keywords in speech utterances.
• Major issue: out-of-vocabulary (OOV) words.

[Chart: ATWV (0 to 0.2) on KWS-Test, supervised vs. unsupervised (Morfessor); KWS results on OOV keywords in Turkish LLP (Narasimhan et al., 2014)]

• Adding morphemes helps KWS.
• Better morphology can lead to better KWS (supervised vs. unsupervised).
• Need for better unsupervised segmentation.
• MorphoChain outperforms a state-of-the-art unsupervised morphological system on KWS.

[Charts: ATWV, Morfessor vs. MorphoChain for KWS, without and with web data; ATWV scores on Bengali VLLP]

*In collaboration with Damianos Karakos and Rich Schwartz at BBN.
Conclusions
• A new method for unsupervised morphological analysis incorporating both orthographic and semantic features.
• Equals or outperforms state-of-the-art systems on morphological segmentation.
• Works well on downstream tasks.

Code: http://people.csail.mit.edu/karthikn/morphochain/