Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea
Zhonghua Li (Mentor: Jun Lang)
2011-10-21 I2R SMT-Reading Group
Paper info
• Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
• ACL-08 long paper, cited 37 times
• Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea
Core Ideas
• Variational Bayes
• Tic-tac-toe pruning
• Word-to-phrase bootstrapping
Outline
• Paper presentation
  – Pipeline
  – Model
  – Training
  – Parsing (pruning)
  – Results
• Shortcomings
• Discussion
Summary of the Pipeline
• Run IBM Model 1 on sentence-aligned data
• Use tic-tac-toe pruning to prune the bitext cell space
• Train a word-based ITG with Variational Bayes and obtain the Viterbi word alignment
• Apply the non-compositional constraint to restrict the space of candidate phrase pairs
• Train a phrasal ITG with VB and run a Viterbi pass to obtain the phrasal alignment
Phrasal Inversion Transduction Grammar
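As a quick reference, a minimal sketch of the phrasal ITG rule schema the paper builds on, written in the same rule notation used later in this deck (the treatment of empty/null phrases is an assumption here, not something stated on the slide):

X -> [X X]    straight concatenation: the two children appear in the same order in both languages
X -> <X X>    inverted concatenation: the children are swapped on the target side
X -> e/f      emission of a phrase pair; e and f may be multi-word phrases (in the word-based variant they are single words, possibly paired with an empty string)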
Dirichlet Prior for Phrasal ITG
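A hedged sketch of the prior over the rule probabilities of the phrasal ITG, assuming a single Dirichlet per nonterminal with a small shared hyperparameter on the phrase-pair rules (the symbol names below are illustrative, not taken from the slide):

\theta_X = \big( P([X\,X]),\; P(\langle X\,X\rangle),\; P(e_1/f_1), \dots, P(e_m/f_m) \big) \sim \mathrm{Dirichlet}\big( \alpha_{[\,]},\; \alpha_{\langle\rangle},\; \alpha_C, \dots, \alpha_C \big)

Choosing \alpha_C \ll 1 makes the prior sparse: the posterior prefers to explain the data with a small set of reliable phrase pairs rather than spreading mass over every co-occurring span pair.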
Review: Inside-Outside Algorithm
[Figure: an HMM-style chain X1 ... Xn-1, Zn, Xn+1 ... XN beside an inside-outside chart rooted at the full span 0/0-T/V, with a highlighted sub-span s/u-t/v]
The forward-backward algorithm is not only used for HMMs; it applies to any state-space model.
The inside-outside algorithm is a special case of the forward-backward algorithm.
Slide credit: Shujie Liu
VB Algorithm for Training SITGs - E1
• Inside probabilities:
Initialization:
Recursion:
[Figure: the inside probability β_i of span s/u-t/v is built from β_j of span s/u-S/U and β_k of span S/U-t/v]
(Copied from Shujie Liu's slides)
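For reference, a sketch of the inside pass for a bracketing SITG, written with explicit indices: \beta(s,t,u,v) is the inside probability of the span written s/u-t/v on the slide, i.e. source words s+1..t paired with target words u+1..v (the handling of null-aligned words is glossed over here):

Initialization (emission of a word or phrase pair):
\beta(s,t,u,v) \mathrel{+}= P\big(X \to e_{s+1..t}/f_{u+1..v}\big)

Recursion (straight and inverted concatenation, summing over split points S/U):
\beta(s,t,u,v) \mathrel{+}= \sum_{s<S<t}\;\sum_{u<U<v} \Big[ P([\,])\,\beta(s,S,u,U)\,\beta(S,t,U,v) \;+\; P(\langle\,\rangle)\,\beta(s,S,U,v)\,\beta(S,t,u,U) \Big]

The sentence-level value \beta(0,T,0,V) plays the role of the likelihood in the VB E-step.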
VB Algorithm for Training SITGs - E2
• Outside probabilities:
Initialization:
Recursion:
[Figure: the outside probability of span s/u-t/v combines the outside probability of an enclosing span with the inside probability of the sibling span, covering both the left-child and right-child cases]
(Copied from Shujie Liu's slides)
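Analogously, a sketch of the outside pass: \alpha(s,t,u,v) is the outside probability of the same span pair, which can sit inside a larger constituent as either the left or the right child of a straight or an inverted rule (index ranges for S and U are left informal):

Initialization (the whole sentence pair):
\alpha(0,T,0,V) = 1

Recursion:
\alpha(s,t,u,v) = \sum_{S,U} P([\,]) \Big[ \alpha(s,S,u,U)\,\beta(t,S,v,U) + \alpha(S,t,U,v)\,\beta(S,s,U,u) \Big] \;+\; \sum_{S,U} P(\langle\,\rangle) \Big[ \alpha(s,S,U,v)\,\beta(t,S,U,u) + \alpha(S,t,u,U)\,\beta(S,s,v,U) \Big]

Expected rule counts for the M-step are then proportional to \alpha(\text{parent}) \cdot P(\text{rule}) \cdot \beta(\text{left child}) \cdot \beta(\text{right child}) / \beta(0,T,0,V).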
VB Algorithm for Training SITGs - M
• s = 3 is the number of right-hand sides for X
• m is the number of observed phrase pairs
• ψ is the digamma function
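A hedged reconstruction of the M-step update in the style of mean-field variational Bayes for PCFGs (e.g. Kurihara and Sato), with c(X -> r) the expected rule count from the E-step and \alpha_r the matching Dirichlet hyperparameter; how the hyperparameters are shared across rules is an assumption here:

\tilde{P}(X \to r) \;=\; \frac{\exp\big(\psi(c(X \to r) + \alpha_r)\big)}{\exp\big(\psi\big(\sum_{r'} (c(X \to r') + \alpha_{r'})\big)\big)}

With the variables on this slide, the denominator's digamma argument is roughly c(X) + s\,\alpha_X + m\,\alpha_C: the s structural right-hand sides contribute \alpha_X each and the m observed phrase pairs contribute \alpha_C each. Since \exp(\psi(x)) \approx x - 0.5 for large x, the update behaves like normalization with a discount of about half a count, which penalizes rarely used phrase pairs more heavily than frequent ones.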
Pruning
• Tic-tac-toe pruning (Hao Zhang 2005)
• Fast tic-tac-toe pruning (Hao Zhang 2008)
• High-precision alignment pruning (Haghighi ACL 2009)
  – Prune all bitext cells that would invalidate more than 8 of the high-precision alignment links
• 1-1 alignment posterior pruning (Haghighi ACL 2009)
  – Prune all 1-1 bitext cells whose posterior is below 10^-4 in both HMM models
Tic-tac-toe pruning (Hao Zhang 2005)
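A rough sketch of the idea (the exact scoring and thresholding are assumptions, not read off the slide): every bitext cell, i.e. a source span paired with a target span, is scored by the product of an inside score and an outside score, both computed cheaply from IBM Model 1:

\mathrm{score}(s,t,u,v) \;\approx\; V_{\mathrm{inside}}(s,t,u,v)\cdot V_{\mathrm{outside}}(s,t,u,v)

Here V_inside is a Model 1 score for aligning the words inside the two spans to each other, and V_outside is a Model 1 score for the words outside them; the outside region splits into the surrounding blocks of a tic-tac-toe board around the cell, which is where the name comes from. Cells whose score falls below a beam are pruned, and the ITG chart never builds items over them.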
Non-compositional Phrases Constraint
e(i, j): the number of alignment links emitted from the source substring e_i .. e_j
f(l, m): the number of alignment links emitted from the target substring f_l .. f_m
Word Alignment Evaluation
Both models were trained for 10 iterations.
EM: the lowest AER, 0.40, is reached after the second iteration; by iteration 10 the AER rises to 0.42.
VB: with the prior hyperparameter αC = 1e-9, the AER approaches 0.35 by iteration 10.
End-to-end Evaluation
NIST Chinese-English training data.
NIST 2002 evaluation datasets were used for tuning and evaluation:
– the 10-reference development set was used for MERT
– the 4-reference test set was used for evaluation
Shortcomings
• The grammar is not perfect
• ITG ordering decisions are context-independent
• Phrasal pairs are sparse
Grammar is not perfect
• Over-counting problem
• Alternative ITG parse trees can yield exactly the same word alignment; this is called the over-counting problem.
[Figure: several trees in the ITG parse tree space map to a single point in the word alignment space, illustrated on the sentence "I am rich !"]
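For illustration (this specific bracketing is an assumed example, not the figure from the deck): take a purely monotone alignment of the four tokens of "I am rich !". A bracketing ITG that only uses X -> [X X] still admits several derivations of that one alignment, e.g.

[[I am] [rich !]]        [I [am [rich !]]]        [[[I am] rich] !]

All of these trees collapse to the same alignment matrix, so the probability mass of a single alignment is split across parse trees, and the counts collected by EM/VB are distorted accordingly.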
A better-constrained grammar
• A series of nested constituents with the same orientation always has a left-heavy derivation.
• The second parse tree of the previous example will therefore not be generated.

C -> 1/3   C -> 2/4   C -> 3/2   C -> 4/1
A -> [C C]
B -> <C C>
B -> <A B> ?
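One hedged way to write such a canonical form (in the spirit of Wu's normal-form ITG; the exact rule inventory below is an assumption): a straight node may not be the right child of a straight node, and an inverted node may not be the right child of an inverted node, so chains of same-orientation constituents are forced to branch to the left:

A -> [A B] | [A C] | [B B] | [B C] | [C B] | [C C]
B -> <A A> | <A C> | <B A> | <B C> | <C A> | <C C>
C -> e/f

Under this constraint the questioned rule B -> <A B> is excluded, because its right child B has the same inverted orientation as its parent.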
Thanks Q&A