
Page 1: Penalized EP for Graphical Models Over Strings

Ryan Cotterell and Jason Eisner

Page 2: Natural Language is Built from Words

Page 3: Can store info about each word in a table

Index | Spelling | Meaning | Pronunciation    | Syntax
123   | ca       |         | [si.ei]          | NNP (abbrev)
124   | can      |         | [kɛɪn]           | NN
125   | can      |         | [kæn], [kɛn], …  | MD
126   | cane     |         | [keɪn]           | NN (mass)
127   | cane     |         | [keɪn]           | NN
128   | canes    |         | [keɪnz]          | NNS

Page 4: Problem: Too Many Words!

• Technically speaking, # words = ∞
• Really, the set of (possible) words is Σ*
  – Names
  – Neologisms
  – Typos
  – Productive processes: friend → friendless → friendlessness → friendlessnessless → …; hand + bag → handbag (sometimes can iterate)

Page 5: Solution: Don't model every cell separately

(Figure: an analogy to the periodic table, with groups labeled noble gases and positive ions.)

Page 6: Can store info about each word in a table

(Same table as on Page 3.)

Page 7: Can store info about each word in a table

(Same table as on Page 3.)

Ultimate goal: Probabilistically reconstruct all missing entries of this infinite multilingual table, given some entries and some text.

Approach: Linguistics + generative modeling + statistical inference.

Modeling ingredients: Finite-state machines + graphical models.

Inference ingredients: Expectation Propagation (this talk).

Page 8: Can store info about each word in a table

(Same content as Page 7.)

Page 9: Predicting Pronunciations of Novel Words (Morpho-Phonology)

(Figure: a graphical model relating the words damns, damnation, resigns, resignation. Shared underlying morphemes, roughly dæmn, rizajgn, z, and eɪʃən, combine into underlying word forms such as dæmnz, dæmneɪʃən, rizajgnz, and rizajgneɪʃən, which surface as pronunciations such as [dˌæmnˈeɪʃən], [rizˈajnz], and [rˌɛzɪgnˈeɪʃən]. One surface pronunciation is left as ????.)

How do you pronounce this word?

Page 10: Predicting Pronunciations of Novel Words (Morpho-Phonology)

(Figure: the same model, with the missing surface pronunciation now filled in as [dˌæmz].)

How do you pronounce this word?

Page 11: Graphical Models over Strings

• Use the graphical model framework to model many strings jointly!

(Figure: three views of a factor ψ1 relating string-valued variables X1 and X2. First, a small table over just {ring, rang, rung}, e.g. ring–ring 2, ring–rang 4, ring–rung 0.1, together with unary values such as ring 4, rang 3, rung 5. Second, the table extended toward all of Σ*, with rows and columns for aardvark, …, which cannot be stored explicitly. Third, the compact alternative used here: weighted finite-state machines over the strings themselves.)

Page 12: Zooming in on a WFSA

• Compactly represents an (unnormalized) probability distribution over all strings in Σ*

• Marginal belief: how do we pronounce damns?

• Possibilities: /damz/, /dams/, /damnIz/, etc.

(Figure: a small WFSA with arcs d/1, a/1, m/1, then branching into z/.5, s/.25, or n/.25 followed by I/1 and z/1.)
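To make "a WFSA compactly represents a distribution over strings" concrete, here is a toy sketch in Python (mine, not from the paper). The arc list below is only an approximate reading of the machine in the figure, so treat the topology and weights as illustrative assumptions; the point is just that the weight of a string is the sum, over its accepting paths, of the product of arc weights.

```python
from collections import defaultdict

# A tiny weighted FSA: arcs are (state, symbol) -> list of (next_state, weight).
# Topology and weights are illustrative guesses at the machine in the figure.
arcs = defaultdict(list)
for s, sym, t, w in [
    (0, "d", 1, 1.0), (1, "a", 2, 1.0), (2, "m", 3, 1.0),
    (3, "z", 6, 0.5),                                        # /d a m z/
    (3, "s", 6, 0.25),                                       # /d a m s/
    (3, "n", 4, 0.25), (4, "I", 5, 1.0), (5, "z", 6, 1.0),   # /d a m n I z/
]:
    arcs[(s, sym)].append((t, w))
start, final = 0, {6: 1.0}            # final-state weights

def weight(string):
    """Total weight of `string`: sum over accepting paths of the product of arc weights."""
    frontier = {start: 1.0}           # state -> accumulated path weight
    for sym in string:
        nxt = defaultdict(float)
        for state, w in frontier.items():
            for t, aw in arcs[(state, sym)]:
                nxt[t] += w * aw
        frontier = nxt
    return sum(w * final.get(s, 0.0) for s, w in frontier.items())

print(weight("damz"), weight("dams"), weight("damnIz"))      # 0.5 0.25 0.25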

Page 13: Log-Linear Approximation

• Given a WFSA distribution p, find a log-linear approximation q
  – min KL(p || q), the "inclusive" KL divergence
  – q corresponds to a smaller/tidier WFSA

• Two approaches:
  – Gradient-based optimization (discussed here)
  – Closed-form optimization

Page 14: ML Estimation = Moment Matching

(Figure: "Broadcast n-gram counts" from the training data, e.g. fo = 3, bar = 2, az = 4, foo = 1, then "Fit model that predicts same counts", shown as a small table of n-gram weights: foo 1.2, bar 0.5, baz 4.3.)

Page 15: FSA Approx. = Moment Matching

(Figure: the same recipe applied to a WFSA instead of raw data: read expected n-gram counts off the machine, e.g. xx = 0.1, zz = 0.1, fo = 3, bar = 2, az = 4, foo = 1, and fit a model that predicts the same counts, shown as a table of n-gram weights: foo 1.2, bar 0.5, baz 4.3.)

Compute with forward-backward!

Page 16: Gradient-Based Minimization

• Objective: KL(p || q), where q(x) ∝ exp(θ · f(x))

• Gradient with respect to θ: ∇θ KL(p || q) = E_q[f(X)] − E_p[f(X)]

• This is a difference between two expectations of feature counts; the expectation under q is determined by the weighted DFA q

• Features are just n-gram counts!

Arc weights of q are determined by the parameter vector θ, just like a log-linear model.
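Written out in code, the recipe of Pages 14–16 looks like the sketch below (mine, not the authors'). To stay self-contained it makes one big simplification, flagged in the comments: p is given as an explicit distribution over a handful of strings, and q is normalized over a small finite candidate set, whereas the actual method computes both expectations over all of Σ* with forward-backward on WFSAs.

```python
import math
from collections import Counter

def ngrams(x, n_max=3, bos="^", eos="$"):
    """Count all n-grams (orders 1..n_max) of x, padded with boundary symbols."""
    padded = bos + x + eos
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    return feats

# A toy "exact" marginal p, given here as an explicit distribution over a few strings.
p = {"ring": 0.7, "rang": 0.2, "rung": 0.1}
# A small finite candidate set standing in for Sigma*; the real method instead
# computes both expectations on WFSAs with forward-backward.
candidates = ["ring", "rang", "rung", "rin", "rig"]

target = Counter()                                   # E_p[f]: expected n-gram counts under p
for x, px in p.items():
    for g, c in ngrams(x).items():
        target[g] += px * c

theta = Counter()                                    # log-linear parameters, one per n-gram
for step in range(500):
    scores = {x: math.exp(sum(theta[g] * c for g, c in ngrams(x).items()))
              for x in candidates}
    Z = sum(scores.values())
    q = {x: s / Z for x, s in scores.items()}        # q(x) = exp(theta . f(x)) / Z
    expected = Counter()                             # E_q[f]
    for x, qx in q.items():
        for g, c in ngrams(x).items():
            expected[g] += qx * c
    for g in set(target) | set(expected):            # gradient of KL(p||q) is E_q[f] - E_p[f]
        theta[g] -= 0.1 * (expected[g] - target[g])

print({x: round(qx, 3) for x, qx in q.items()})      # approaches 0.7 / 0.2 / 0.1 on ring / rang / rung
```

The gradient step drives E_q[f] toward E_p[f], which is exactly the moment matching pictured on the two previous slides.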

Page 17: Does q need a lot of features?

• Game: what order of n-grams do we need to put probability 1 on a string?

• Word 1: noon
  – Bigram model? No. A trigram model.

• Word 2: papa
  – Trigram model? No. A 4-gram model. Very big!

• Word 3: abracadabra
  – A 6-gram model. Way too big!
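The game can be checked mechanically: an n-gram model trained on a single word can put probability 1 on it exactly when every length-(n−1) context in the boundary-padded word is followed by only one symbol. The helper below (an illustration I added, not the authors' code) finds the smallest such order and reproduces the answers on the slide.

```python
def min_ngram_order(word, bos="^", eos="$", max_order=20):
    """Smallest n such that an order-n model can put probability 1 on `word`:
    every (n-1)-length context in the padded word must have a unique next symbol."""
    for n in range(1, max_order + 1):
        padded = bos * (n - 1) + word + eos
        nexts = {}
        ok = True
        for i in range(n - 1, len(padded)):
            ctx, nxt = padded[i - n + 1:i], padded[i]
            if nexts.setdefault(ctx, nxt) != nxt:    # same context, different continuation
                ok = False
                break
        if ok:
            return n
    return None

for w in ["noon", "papa", "abracadabra"]:
    print(w, min_ngram_order(w))   # noon 3, papa 4, abracadabra 6
```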

Page 18: Variable Order Approximations

• Intuition: in NLP, marginals are often peaked
  – Probability mass mostly on a few similar strings!

• q should reward a few long n-grams
  – also need short n-gram features for backoff

(Figure: the full 6-gram table, with entries such as abraca 5.0 and zzzzzz -500, is too big; the variable-order table, with entries such as abra 5.0, ^a 5.0, b 4.3, is very small.)
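Concretely, a variable-order approximation can be stored as nothing more than a map from n-grams of mixed lengths to weights; a string's unnormalized log-score is the weighted count of the table's n-grams that it contains. A minimal sketch, reusing the illustrative weights from the slide:

```python
from collections import Counter

# Variable-order feature table: a few long n-grams plus short backoff n-grams.
# Weights are the illustrative values shown on the slide.
table = {"abra": 5.0, "^a": 5.0, "b": 4.3}

def score(x, table, bos="^", eos="$"):
    """Unnormalized log-score of x under a log-linear model with variable-order n-gram features."""
    padded = bos + x + eos
    max_len = max(map(len, table))
    counts = Counter(padded[i:i + n]
                     for n in range(1, max_len + 1)
                     for i in range(len(padded) - n + 1))
    return sum(w * counts[g] for g, w in table.items())

print(score("abracadabra", table))   # rewards the two "abra"s, the initial "^a", and the two "b"s
```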

Page 19: Variable Order Approximations

• Moral: Use only the n-grams you really need!
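How are the needed n-grams chosen? The "penalized" in the title refers to adding a penalty that discourages unnecessary n-gram features, so that only useful ones keep nonzero weight. These slides do not spell out the exact penalty, so the soft-thresholding (L1-style proximal) step below is only a stand-in illustrating the idea, not the paper's actual update.

```python
def prox_l1(theta, gradient, step=0.1, penalty=0.05):
    """One proximal-gradient step on (objective + penalty * sum |theta_g|):
    take a gradient step, then soft-threshold; features whose weight hits 0 are dropped."""
    new = {}
    for g in set(theta) | set(gradient):
        w = theta.get(g, 0.0) - step * gradient.get(g, 0.0)
        w = max(abs(w) - step * penalty, 0.0) * (1 if w > 0 else -1)
        if w != 0.0:              # only the surviving n-grams stay in the table
            new[g] = w
    return new

# A weak, barely useful 6-gram gets thresholded away; a strong feature survives.
print(prox_l1({"abraca": 0.004, "abra": 5.0}, {"abraca": 0.0, "abra": 0.0}))
```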

Page 20: Belief Propagation (BP) in a Nutshell

(Figure: a factor graph over string-valued variables X1 through X6, connected by factors.)
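For readers who have not seen belief propagation, here is a generic sum-product sketch on a tiny two-variable model with an ordinary finite domain; nothing in it is specific to this paper, and the numbers are the illustrative ones from Page 11. Over strings the updates are the same in spirit, but the domain is Σ* and every message, factor, and belief is a weighted finite-state machine instead of a small vector or matrix.

```python
import numpy as np

# Tiny model X1 -- psi -- X2, each variable over a 3-value domain {ring, rang, rung}.
domain = ["ring", "rang", "rung"]
psi = np.array([[2.0, 4.0, 0.1],       # psi[x1, x2], illustrative values like the table on Page 11
                [7.0, 1.0, 2.0],
                [8.0, 1.0, 3.0]])
unary1 = np.array([4.0, 3.0, 5.0])     # unary factor on X1 (illustrative)
unary2 = np.array([1.0, 1.0, 1.0])     # unary factor on X2 (illustrative)

# Sum-product messages: a factor-to-variable message marginalizes the factor
# times the incoming variable-to-factor message.
msg_to_x2 = psi.T @ unary1             # sum over x1 of psi[x1, x2] * unary1[x1]
msg_to_x1 = psi @ unary2               # sum over x2 of psi[x1, x2] * unary2[x2]

# Beliefs = (unnormalized) product of all incoming messages at each variable.
belief_x1 = unary1 * msg_to_x1
belief_x2 = unary2 * msg_to_x2
print(dict(zip(domain, belief_x1 / belief_x1.sum())))
print(dict(zip(domain, belief_x2 / belief_x2.sum())))
```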

Page 21: Belief Propagation (BP) in a Nutshell

(Figure: the factor graph from the previous slide, together with the damns-pronunciation WFSA from Page 12.)

Page 22: Belief Propagation (BP) in a Nutshell

(Figure: the factor graph over X1 through X6 again.)

Pages 23–24: Computing Marginal Beliefs

(Figure, shown over two animation steps: a factor graph over variables X1–X5 and X7.)

Page 25: Belief Propagation (BP) in a Nutshell

(Figure: the factor graph over X1 through X6, now with a WFSA message on each of several edges.)

Page 26: Computing Marginal Beliefs

(Figure: the factor graph over X1–X5 and X7, with WFSA messages on several edges.)

Pages 27–28: Computing Marginal Beliefs

(Figure: the belief at one variable is formed by combining many incoming WFSA messages, producing a large, tangled machine.)

Computation of the belief results in a large state space. What a hairball!

Page 29: Computing Marginal Beliefs

(Figure: the same factor graph with WFSA messages.)

Approximation required!

Page 30: BP over String-Valued Variables

• In fact, with a cyclic factor graph, messages and marginal beliefs grow unboundedly complex!

(Figure: a two-variable cyclic factor graph, X1 and X2 connected by factors ψ1 and ψ2; the factors and messages involve short strings such as a, aa, and ε.)

Pages 31–34: BP over String-Valued Variables

(Animation, repeating the bullet above: with each round of message passing around the cycle, the messages grow longer and longer: a, aa, aaa, …)

Pages 35–39: Expectation Propagation (EP) in a Nutshell

(Animation: start from the factor graph over X1–X5 and X7 with a WFSA message on each edge; one by one, each WFSA message is replaced by an approximate message, a small table of n-gram weights such as foo 1.2, bar 0.5, baz 4.3, until every message has this form.)

Page 40: EP In a Nutshell

(Figure: every message at a variable is now a table of n-gram weights, e.g. foo 1.2, bar 0.5, baz 4.3; multiplying four such messages gives a belief table foo 4.8, bar 2.0, baz 17.2, since the weights add.)

Approximate belief is now a table of n-grams.

The point-wise product is now super easy!
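This is the payoff of the log-linear form: each approximate message is proportional to exp(θ_i · f(x)), so a pointwise product of messages is proportional to exp((Σ_i θ_i) · f(x)), and multiplying messages is just adding their weight tables. A tiny sketch (mine) that reproduces the numbers on this slide:

```python
from collections import Counter

def product_of_messages(*messages):
    """Pointwise product of log-linear messages = sum of their n-gram weight tables."""
    total = Counter()
    for m in messages:
        total.update(m)            # Counter.update adds values
    return dict(total)

msg = {"foo": 1.2, "bar": 0.5, "baz": 4.3}
print(product_of_messages(msg, msg, msg, msg))   # foo ~4.8, bar ~2.0, baz ~17.2, as on the slide
```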

Page 41: How to approximate a message?

Minimize KL( · || · ) with respect to the parameters θ of the new approximate message.

(Figure: on the left side of the KL, the exact WFSA message is multiplied by the other approximate n-gram messages, e.g. foo 1.2, bar 0.5, baz 4.3; on the right side, the WFSA is replaced by the new n-gram message with weights θ, e.g. foo 0.2, bar 1.1, baz -0.3, multiplied by the same approximate messages.)
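After the fit, the new approximate message itself is obtained by "dividing out" the other approximate messages from the fitted result, and in log-linear form division is just subtraction of weight tables. A small bookkeeping sketch (the weights below are hypothetical, and the fitting step would reuse the gradient code from Page 16):

```python
def divide_out(fitted_belief, context_messages):
    """New approximate message = fitted belief / product of the other approximate messages.
    In log-linear form, division is subtraction of n-gram weight tables."""
    new_msg = dict(fitted_belief)
    for m in context_messages:
        for g, w in m.items():
            new_msg[g] = new_msg.get(g, 0.0) - w
    return new_msg

belief = {"foo": 1.4, "bar": 1.6, "baz": 4.0}          # hypothetical fitted weights
context = [{"foo": 1.2, "bar": 0.5, "baz": 4.3}]       # the other approximate message(s)
print(divide_out(belief, context))                     # the table that replaces the WFSA message
```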

Page 42: Results

• Question 1: Does EP work in general (comparison to baseline)?

• Question 2: Do variable order approximations improve over fixed n-grams?

(Legend of the results plot:)
• Unigram EP (green): fast, but inaccurate
• Bigram EP (blue): also fast and inaccurate
• Trigram EP (cyan): slow and accurate
• Penalized EP (red): fast and accurate
• Baseline (black, pruning-based): accurate and slow

Page 43: Fin

Thanks for your attention!

For more information on structured models and belief propagation, see the Structured Belief Propagation Tutorial at ACL 2015 by Matt Gormley and Jason Eisner.