TRANSCRIPT
Variational Inference for Adaptor Grammars
Shay Cohen, School of Computer Science, Carnegie Mellon University
David Blei, Computer Science Department, Princeton University
Noah Smith, School of Computer Science, Carnegie Mellon University
Outline
The lifecycle of unsupervised learning:
Outline
We give a new representation to an existing model (adaptor grammars)
This representation leads to a new variational inference algorithm for adaptor grammars
We do a sanity check on word segmentation, comparing to state-of-the-art results
Our inference algorithm makes it possible to do unsupervised dependency parsing with adaptor grammars
Problem 1 - PP Attachment
[Two parse trees for “I saw the boy with the telescope”: in the first, the PP “with the telescope” attaches to the VP (the seeing is done with the telescope); in the second, it attaches to the NP “the boy” (the boy has the telescope).]
Problem 2 - Word Segmentation
Matthewslikeswordfighting → Matthews like sword fighting
Matthewslikeswordfighting → Matthews likes word fighting
What is missing?
Context could resolve this ambiguity
But we want unsupervised learning...
Where do we get the context?
Problem 1 - PP Attachment
(S (NP The boy with the telescope) (V entered) (NP the park))
[The same two parse trees for “I saw the boy with the telescope” as before: PP attached to the VP vs. attached to the NP “the boy”.]
Problem 2 - Word Segmentation
Word fighting is the new hobby of computational linguists.
Mr. Matthews is a computational linguist.
Matthewslikeswordfighting → Matthews like sword fighting
Matthewslikeswordfighting → Matthews likes word fighting
Dreaming Up Patterns
Context helps. Where do we get it? Adaptor grammars (Johnson et al., 2006)
Define a distribution over trees
New samples depend on the history: “rich get richer” dynamics
Dream up “patterns” as we go along
Adaptor Grammars
Use the Pitman-Yor process with a PCFG as the base distribution
To make it fully Bayesian, we also place a Dirichlet prior over the PCFG rules
Originally represented using the Chinese restaurant process (CRP)
The CRP is convenient for sampling, but not for variational inference
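To make the CRP view and its rich-get-richer dynamics concrete, here is a minimal sketch (our illustration, not the authors' implementation) of Pitman-Yor Chinese-restaurant sampling with a pluggable base distribution; the function name and all parameter values are illustrative assumptions.

```python
import random

def pyp_sample(n, discount, concentration, base_sample):
    """Draw n values from a Pitman-Yor process via its Chinese
    restaurant representation: each draw either reuses an existing
    table (with mass count - discount) or opens a new table labeled
    by a fresh draw from the base distribution."""
    tables, labels, draws = [], [], []
    for i in range(n):
        # Total mass after i customers are seated is i + concentration.
        r = random.uniform(0.0, i + concentration)
        new_mass = concentration + discount * len(tables)
        if r < new_mass:                      # open a new table
            tables.append(1)
            labels.append(base_sample())
            draws.append(labels[-1])
        else:                                 # reuse an existing table
            r -= new_mass
            for t, c in enumerate(tables):
                r -= c - discount
                if r < 0.0:
                    tables[t] += 1
                    draws.append(labels[t])
                    break
    return draws

random.seed(0)
# Toy base distribution over "patterns": uniform over five symbols.
print(pyp_sample(25, discount=0.5, concentration=1.0,
                 base_sample=lambda: random.choice("abcde")))
```

Because reused tables gain mass, a few patterns come to dominate the draws; this caching behavior is what adaptor grammars exploit for whole subtrees.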
Variational Inference in a Nutshell
“Posterior inference” requires that we find parse trees z1, ..., zn given raw sentences x1, ..., xn
Mean-field approximation: take all hidden variables z1, ..., zn and parameters θ, and find a posterior of the form q(z1, ..., zn, θ) = q(θ) ∏_{i=1}^n q(zi) (this makes inference tractable)
Makes independence assumptions in the posterior
That’s all! Almost: we need a manageable representation for z1, ..., zn and θ
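As a concrete (and much smaller) illustration of the mean-field recipe, the sketch below runs coordinate-ascent variational inference on a toy conjugate model, a Gaussian with unknown mean and precision; this is our own example, not the paper's adaptor-grammar updates, and the prior values are arbitrary.

```python
import random

# Toy model: x_i ~ Normal(mu, 1/tau), mu ~ Normal(mu0, 1/(lam0 * tau)),
# tau ~ Gamma(a0, b0).  Mean-field posterior: q(mu, tau) = q(mu) q(tau),
# with q(mu) = Normal(m, 1/lam) and q(tau) = Gamma(a, b).
random.seed(0)
x = [random.gauss(2.0, 0.5) for _ in range(50)]
n, xbar = len(x), sum(x) / len(x)
sum_sq = sum(xi * xi for xi in x)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

e_tau = a0 / b0                          # current E_q[tau]
for _ in range(50):                      # coordinate ascent
    # Update q(mu), holding q(tau) fixed.
    m = (lam0 * mu0 + n * xbar) / (lam0 + n)
    lam = (lam0 + n) * e_tau
    e_mu, e_mu2 = m, m * m + 1.0 / lam   # E[mu], E[mu^2]
    # Update q(tau), holding q(mu) fixed.
    a = a0 + (n + 1) / 2.0
    b = (b0 + 0.5 * (sum_sq - 2.0 * n * xbar * e_mu + n * e_mu2)
            + 0.5 * lam0 * (e_mu2 - 2.0 * mu0 * e_mu + mu0 * mu0))
    e_tau = a / b

print("E_q[mu]  =", round(m, 3))      # close to the sample mean
print("E_q[tau] =", round(e_tau, 3))  # close to 1 / sample variance
```

Each factor update is closed-form because it only needs expectations under the other factor; the paper plays the same game with q(θ) over grammar parameters and stick lengths and q(zi) over parse trees.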
Sampling vs. Variational Inference
                 MCMC sampling   variational inference
convergence      guaranteed      local maximum
speed            slow            fast
algorithm        randomized      objective optimization
parallelization  non-trivial     easy
Stick Breaking Representation
Sticks are sampled from the GEM distribution
Everything in this slide that is a number belongs to θ
[A sequence of figures builds up the stick-breaking construction one stick at a time.]
Truncated Stick Approximation
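The slide's figure is not reproduced in this transcript; as a rough stand-in (our sketch, with our own parameter names), here is how a truncated Pitman-Yor stick-breaking (GEM) draw can be generated, with the last stick forced to 1 so the K weights sum to one.

```python
import random

def truncated_pyp_sticks(K, discount, alpha, rng=random.Random(0)):
    """Sample K stick weights from a truncated Pitman-Yor GEM:
    V_k ~ Beta(1 - discount, alpha + (k + 1) * discount) for k < K - 1,
    and V_{K-1} = 1 so the K weights sum to exactly one."""
    weights, remaining = [], 1.0
    for k in range(K):
        v = 1.0 if k == K - 1 else rng.betavariate(
            1.0 - discount, alpha + (k + 1) * discount)
        weights.append(v * remaining)  # this stick's share of what is left
        remaining *= 1.0 - v
    return weights

w = truncated_pyp_sticks(K=10, discount=0.5, alpha=1.0)
print([round(x, 4) for x in w], "sum =", round(sum(w), 6))
```

Truncating at K means anything that would need a stick beyond the K-th gets zero probability under the variational distribution, which is what the recursive-grammars discussion later has to deal with.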
Sanity Check - Word Segmentation
The task is to segment a sequence of phonemes into words
Example: yuwanttulUk&tDIs → yu want tu lUk &t DIs
Models language acquisition in children (using the corpus from Brent and Cartwright, 1996)
The corpus includes 9,790 utterances
Adaptor grammars have been used on this task before, with three different grammars
Baseline: the sampling method of Johnson and Goldwater (2009)
Word Segmentation - Grammars
Unigram Grammar:
Sentence → Word+
Word → Char+
[Example tree: a Sentence dominating two Words, “yu” and “want”.]
“Word” is adapted (hence, if something was a Word constituent previously, it is more likely to appear again)
There are additional grammars, a collocation grammar and a syllable grammar, which take more information about the language into account
Words are segmented according to “Word” constituents
None of the grammars is recursive
Used in Johnson and Goldwater (2009)
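To see how an adapted Word nonterminal changes segmentation scores, here is a toy sketch (ours, not the paper's variational decoder): cached word counts are mixed with a Char+ base distribution in Chinese-restaurant style (discount 0 for simplicity), and a small dynamic program picks the best split. The cache contents, stop probability, and concentration are all made up.

```python
import math
from functools import lru_cache

ALPHABET = 26                          # hypothetical character inventory
STOP = 0.3                             # hypothetical Char+ stop probability
cache = {"yu": 3, "want": 2, "tu": 2}  # hypothetical previously used Words
total = sum(cache.values())
alpha = 1.0                            # hypothetical concentration

def base_prob(word):
    """Word -> Char+ : geometric length, uniform characters."""
    return (1 - STOP) ** (len(word) - 1) * STOP * (1.0 / ALPHABET) ** len(word)

def word_prob(word):
    """Rich-get-richer predictive: cached count plus base mass."""
    return (cache.get(word, 0) + alpha * base_prob(word)) / (total + alpha)

def best_segmentation(s):
    """Maximize the sum of log word probabilities over split points."""
    @lru_cache(maxsize=None)
    def best(i):
        if i == len(s):
            return 0.0, ()
        return max(((math.log(word_prob(s[i:j])) + best(j)[0],
                     (s[i:j],) + best(j)[1])
                    for j in range(i + 1, len(s) + 1)),
                   key=lambda o: o[0])
    return best(0)[1]

print(best_segmentation("yuwanttu"))   # -> ('yu', 'want', 'tu')
```

Cached words are cheap while novel long strings pay the full Char+ cost, so previously seen Words are reused; this is the behavior the unigram grammar relies on (the sketch ignores count updates within a sentence for brevity).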
Word Segmentation - Results
grammar     our paper   J&G 2009
GUnigram    0.84        0.81
GColloc     0.86        0.86
GSyllable   0.83        0.89
J&G 2009: the best result of Johnson and Goldwater (2009)
Reported scores are F1 measures
Variants
Model:Pitman-Yor Process vs. Dirichlet Process (did not have mucheffect)
Inference:Fixed Stick vs. Dynamic Stick Expansion (fixed stick is better)
Decoding:Minimum Bayes Risk vs. Viterbi (MBR does better)
See paper for details!
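A minimal sketch of the MBR idea for segmentation (our illustration, assuming a boundary-level gain): keeping every word boundary whose approximate posterior probability exceeds one half maximizes the expected number of correct boundary decisions, unlike Viterbi, which commits to the single most probable segmentation.

```python
from collections import Counter

def mbr_boundaries(samples, length):
    """samples: segmentations of a length-character string, each given
    as a set of boundary positions (e.g. draws from, or structures
    weighted by, the approximate posterior).  Keep a boundary iff its
    marginal exceeds 1/2: this maximizes expected boundary accuracy."""
    counts = Counter(b for s in samples for b in s)
    return {b for b in range(1, length)
            if counts[b] / len(samples) > 0.5}

# Three hypothetical posterior samples for the 8-character "yuwanttu":
samples = [{2, 6}, {2, 6}, {2, 4, 6}]
print(sorted(mbr_boundaries(samples, 8)))   # -> [2, 6]
```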
Running Time
The running time (clock time) of the sampler and of variational inference is approximately the same (note that the implementations are different)
However, variational inference can be parallelized
Parallelizing on 20 weaker CPUs reduced clock time by a factor of 2.8
Syntax and Power Law
[Plot: log frequency of constituents vs. their log rank for English, Chinese, Portuguese, and Turkish; the roughly linear trend on the log-log scale is a power law, motivating adaptor grammars for unsupervised parsing.]
Recursive Grammars
Recursive Grammars - Solution
Our finite approximation of the stick zeros all “bad” events in thevariational distribution
Equivalent to inference when assuming the model is:
p′(x, z) = p(x, z) · I[(x, z) ∉ bad] / Σ_{(x′,z′) ∉ bad} p(x′, z′)
where p is the original adaptor grammar model, which gives non-zero probability to bad events, and I is a 0/1 indicator
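In miniature (a toy enumeration, purely illustrative), the renormalization looks like this:

```python
# Toy joint distribution p over a tiny enumerated space of (x, z)
# events, some of which are "bad" (e.g. parses the truncated sticks
# cannot generate).
p = {("x1", "z1"): 0.4, ("x1", "z2"): 0.3,
     ("x2", "z1"): 0.2, ("x2", "z2"): 0.1}
bad = {("x2", "z2")}

# p'(x, z) = p(x, z) * I[(x, z) not in bad] / (mass of the good events)
good_mass = sum(v for k, v in p.items() if k not in bad)
p_prime = {k: (v / good_mass if k not in bad else 0.0) for k, v in p.items()}
print(p_prime)   # the remaining mass is renormalized over good events
```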
Unsupervised Parsing Setting
Experiments on the English Penn Treebank
Stripped punctuation and kept only the part-of-speech tags
Used adaptor grammars with the Dependency Model with Valence (Klein and Manning, 2004)
DMV has a PCFG representation (Smith, 2006)
We “adapt” the nonterminals that head noun constituents
Unsupervised Parsing - Results
model                   Viterbi   MBR
non-Bayesian            45.8      46.1
Dirichlet prior         45.9      46.1
with adaptor grammars   48.3      50.2
Results are attachment accuracy: the fraction of parents correctly identified
A gain over the vanilla Dirichlet prior, the prior used on the PCFG rules with adaptor grammars
Other priors (instead of the Dirichlet prior) give performance around 60; using them with adaptor grammars is future work
Summary
We described a variational inference algorithm for adaptor grammars
We showed it can lead to improved performance for various grammars
We showed it can be faster than sampling when parallelization is used
We applied adaptor grammars to unsupervised dependency parsing
Thanks! Questions?
Sampling vs. Variational Inference
[Plot: negative log bound/likelihood against time step for the sampler and for variational inference.]