Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea
Presenter: Zhonghua Li    Mentor: Jun Lang
2011-10-21 I2R SMT-Reading Group
Paper info
• Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
• ACL-08 long paper, cited 37 times
• Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea
Core Ideas
• Variational Bayes
• Tic-tac-toe pruning
• Word-to-phrase bootstrapping
Outline
• Paper presentation
  – Pipeline
  – Model
  – Training
  – Parsing (pruning)
  – Results
• Shortcomings
• Discussion
Summary of the Pipeline
• Run IBM Model 1 on sentence-aligned data
• Use tic-tac-toe pruning to prune the bitext space
• Word-based ITG: Variational Bayes training, then a Viterbi pass to get the word alignment
• Apply the non-compositional constraint to restrict the space of phrase pairs
• Phrasal ITG: VB training, then a Viterbi pass to get the phrasal alignment
Phrasal Inversion Transduction Grammar
Dirichlet Prior for Phrasal ITG
Review: Inside-Outside Algorithm

[Figure: a derivation with children X1 … Xn-1, Zn, Xn+1 … XN under the root, and a chart cell spanning (s/u, t/v)]

The forward-backward algorithm is not only used for HMMs, but for any state-space model.
The forward-backward algorithm is a special case of the inside-outside algorithm.

(Slide from Shujie Liu)
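As a toy illustration of the forward-backward computation the slide refers to, here is a 2-state HMM with invented probabilities; the posterior γ = α·β/Z is the quantity that inside × outside / Z generalizes from chains to spans:

```python
# Minimal forward-backward pass for a 2-state HMM (toy numbers, illustration only).
pi = [0.6, 0.4]                      # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]         # transition probabilities A[from][to]
B = [[0.9, 0.1], [0.2, 0.8]]         # emission probabilities B[state][symbol]
obs = [0, 1, 0]

n, S = len(obs), 2
# Forward (alpha): probability of the observed prefix, ending in each state.
alpha = [[0.0] * S for _ in range(n)]
for s in range(S):
    alpha[0][s] = pi[s] * B[s][obs[0]]
for t in range(1, n):
    for s in range(S):
        alpha[t][s] = sum(alpha[t-1][r] * A[r][s] for r in range(S)) * B[s][obs[t]]
# Backward (beta): probability of the observed suffix, given each state.
beta = [[1.0] * S for _ in range(n)]
for t in range(n - 2, -1, -1):
    for s in range(S):
        beta[t][s] = sum(A[s][r] * B[r][obs[t+1]] * beta[t+1][r] for r in range(S))
likelihood = sum(alpha[-1][s] for s in range(S))
# State posteriors: alpha * beta / Z, the chain analogue of inside*outside/Z.
gamma = [[alpha[t][s] * beta[t][s] / likelihood for s in range(S)] for t in range(n)]
```

Each row of `gamma` is a distribution over states at that time step, exactly as inside-outside yields posterior probabilities over chart cells.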
VB Algorithm for Training SITGs - E1

• Inside probabilities:
  – Initialization: 1x1 word-pair cells
  – Recursion: combine two adjacent sub-cells (s/u, S/U) and (S/U, t/v) into (s/u, t/v)

[Figure: inside recursion over the split point S/U]

(Slide from Shujie Liu)
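As a concrete (toy) instance of the inside pass, the sketch below scores a 2x2 bitext cell under a word-based ITG. The lexical table and the rule probabilities `p_straight`, `p_inverted`, `p_term` are invented for illustration, not taken from the paper:

```python
from itertools import product

# Toy inside pass for a word-based ITG over a 2x2 bitext.
E = ["red", "car"]; F = ["voiture", "rouge"]
t = {("red", "rouge"): 0.8, ("car", "voiture"): 0.7,
     ("red", "voiture"): 0.1, ("car", "rouge"): 0.1}
p_straight, p_inverted, p_term = 0.4, 0.3, 0.3   # made-up rule probabilities

# inside[(i,j,l,m)]: inside probability of source span E[i:j], target span F[l:m]
inside = {}
for i, l in product(range(2), range(2)):          # 1x1 terminal cells
    inside[(i, i+1, l, l+1)] = p_term * t[(E[i], F[l])]

# Full 2x2 cell: straight combines (0,1)x(0,1) with (1,2)x(1,2);
# inverted combines (0,1)x(1,2) with (1,2)x(0,1).
straight = inside[(0, 1, 0, 1)] * inside[(1, 2, 1, 2)]
inverted = inside[(0, 1, 1, 2)] * inside[(1, 2, 0, 1)]
total = p_straight * straight + p_inverted * inverted
```

Here the inverted combination dominates, since the good links (red-rouge, car-voiture) cross.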
VB Algorithm for Training SITGs - E2

• Outside probabilities:
  – Initialization: the root cell
  – Recursion: sum over all larger cells in which (s/u, t/v) participates as left or right child, multiplying the parent's outside score by the sibling cell's inside score

[Figure: the two recursion cases — the cell (s/u, t/v) as right child and as left child of a larger span, with sibling span at S/U]

(Slide from Shujie Liu)
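The outside pass can be run on the same 2x2 toy cell (same invented lexical and rule probabilities as in the inside example): the outside score of a sub-cell multiplies the rule probability by its sibling's inside score, and inside × outside / Z gives posterior link probabilities:

```python
# Outside pass on a toy 2x2 ITG bitext cell (made-up probabilities).
E = ["red", "car"]; F = ["voiture", "rouge"]
t = {("red", "rouge"): 0.8, ("car", "voiture"): 0.7,
     ("red", "voiture"): 0.1, ("car", "rouge"): 0.1}
p_straight, p_inverted, p_term = 0.4, 0.3, 0.3

inside = {(i, i+1, l, l+1): p_term * t[(E[i], F[l])]
          for i in range(2) for l in range(2)}
total = (p_straight * inside[(0, 1, 0, 1)] * inside[(1, 2, 1, 2)]
         + p_inverted * inside[(0, 1, 1, 2)] * inside[(1, 2, 0, 1)])

# The root cell's outside score is 1; each sub-cell's outside score is the
# rule probability times the inside score of its sibling in the derivation.
outside = {
    (0, 1, 0, 1): p_straight * inside[(1, 2, 1, 2)],
    (1, 2, 1, 2): p_straight * inside[(0, 1, 0, 1)],
    (0, 1, 1, 2): p_inverted * inside[(1, 2, 0, 1)],
    (1, 2, 0, 1): p_inverted * inside[(0, 1, 1, 2)],
}
# Posterior that "red" aligns to "rouge": inside * outside / Z.
post_red_rouge = inside[(0, 1, 1, 2)] * outside[(0, 1, 1, 2)] / total
```

The inside × outside products of the two cells covering "red" sum back to the total likelihood, the sanity check used when collecting expected counts for the M-step.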
VB Algorithm for Training SITGs - M
• s = 3 is the number of right-hand sides for X
• m is the number of observed phrase pairs
• ψ is the digamma function
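The bullets above describe the mean-field VB M-step with a symmetric Dirichlet prior, θ_{X→r} = exp(ψ(c(r) + α) − ψ(c(X) + s·α)). A minimal sketch — the digamma approximation, the hyperparameter value, and the toy counts are my own, not from the paper:

```python
import math

def digamma(x):
    # Standard approximation: upward recurrence plus asymptotic series.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

def vb_update(counts, alpha, s):
    # theta_r = exp(psi(c_r + alpha) - psi(sum_r c_r + s * alpha))
    tot = sum(counts.values())
    denom = digamma(tot + s * alpha)
    return {r: math.exp(digamma(c + alpha) - denom) for r, c in counts.items()}

# Toy expected counts for the three right-hand sides of X (s = 3).
counts = {"[X X]": 10.0, "<X X>": 4.0, "e/f": 1.0}
theta = vb_update(counts, alpha=1e-2, s=3)
```

Note the characteristic VB behaviour: the updated "probabilities" sum to less than one, discounting mass away from rare events, which is what drives the sparse solutions the paper is after.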
Pruning
• Tic-tac-toe pruning (Hao Zhang 2005)
• Fast tic-tac-toe pruning (Hao Zhang 2008)
• High-precision alignment pruning (Haghighi, ACL 2009)
  – Prune all bitext cells that would invalidate more than 8 of the high-precision alignments
• 1-1 alignment posterior pruning (Haghighi, ACL 2009)
  – Prune all 1-1 bitext cells that have a posterior below 10^-4 in both HMM models
Tic-tac-toe pruning (Hao Zhang 2005)
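A simplified flavor of the idea (not the paper's exact dynamic program): score every bitext cell by a Model-1-style inside score for the words inside it times an outside score for the words it excludes, then keep only cells within a beam of the best. All probabilities and the beam width below are invented:

```python
from itertools import product

# Simplified tic-tac-toe-style cell scoring over a 3x3 bitext (toy table).
E = ["the", "red", "car"]
F = ["la", "voiture", "rouge"]
t = {("the", "la"): .6, ("red", "rouge"): .7, ("car", "voiture"): .6}

def m1_score(e_words, f_words):
    # Crude Model 1 proxy: each target word takes its best source link.
    s = 1.0
    for f in f_words:
        s *= max(t.get((e, f), 1e-4) for e in e_words) if e_words else 1e-4
    return s

def cell_score(i, j, l, m):
    # inside score of the cell times outside score of everything around it
    ins = m1_score(E[i:j], F[l:m])
    out = m1_score(E[:i] + E[j:], F[:l] + F[m:])
    return ins * out

cells = [(i, j, l, m)
         for i, j in product(range(3), range(1, 4)) if i < j
         for l, m in product(range(3), range(1, 4)) if l < m]
scores = {c: cell_score(*c) for c in cells}
best = max(scores.values())
kept = {c for c, s in scores.items() if s >= best * 1e-3}   # beam threshold
```

Good cells like (red, rouge) survive, while cells that pair unrelated words are pruned before the expensive ITG pass ever sees them.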
Non-Compositional Phrase Constraint

• e(i,j): number of links emitted from the source substring e_i … e_j
• f(l,m): number of links emitted from the target substring f_l … f_m
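These counts can be computed directly from the Viterbi word alignment; the sketch below also checks the consistency condition (every link touching either span falls inside the cell) that defines a valid phrase pair. The toy alignment is invented:

```python
# Link counts e(i,j) / f(l,m) over a toy Viterbi alignment.
links = {(0, 1), (1, 0), (2, 2), (3, 2)}   # (source position, target position)

def e_count(i, j):
    # links emitted from source substring positions i .. j-1
    return sum(1 for (s, t) in links if i <= s < j)

def f_count(l, m):
    # links emitted from target substring positions l .. m-1
    return sum(1 for (s, t) in links if l <= t < m)

def inside_count(i, j, l, m):
    return sum(1 for (s, t) in links if i <= s < j and l <= t < m)

def consistent(i, j, l, m):
    # a non-empty cell whose spans emit no links outside the cell
    n = inside_count(i, j, l, m)
    return n > 0 and n == e_count(i, j) == f_count(l, m)
```

For example, the cell covering source 0-1 and target 0-1 is consistent, while splitting off source word 0 alone against target 0-1 is not, because target position 0 links back to source position 1.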
Word Alignment Evaluation
Both models were trained for 10 iterations.
EM: the lowest AER, 0.40, is reached after the second iteration; by iteration 10 the AER increases to 0.42.
VB: with αC = 1e-9, the AER approaches 0.35 at iteration 10.
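For reference, AER is computed from gold sure (S) and possible (P) links as 1 − (|A∩S| + |A∩P|) / (|A| + |S|); the gold sets and hypothesis below are toy data:

```python
# Alignment Error Rate on toy data (P is assumed to include S).
def aer(A, S, P):
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

S = {(0, 0), (1, 1)}           # sure gold links
P = S | {(2, 1)}               # possible gold links (superset of S)
A = {(0, 0), (1, 1), (2, 2)}   # hypothesis alignment
```

With these sets the hypothesis recovers both sure links but adds one link outside P, giving an AER of 0.2.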
End-to-end Evaluation
NIST Chinese-English training data.
NIST 2002 evaluation datasets for tuning and evaluation:
the 10-reference development set was used for MERT; the 4-reference test set was used for evaluation.
Shortcomings
• The grammar is not perfect
• ITG ordering is context-independent
• Phrase pairs are sparse
The grammar is not perfect

• Over-counting problem: alternative ITG parse trees can yield the identical word alignment.

[Figure: mapping from ITG parse-tree space to word-alignment space for "I am rich !" — two distinct trees, one alignment]
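The over-counting can be demonstrated by counting ITG derivations per alignment, encoding an alignment as a permutation of target positions (my own sketch, illustrative permutations):

```python
# Count the distinct ITG derivations that all produce the SAME word alignment.
def is_block(vals):
    # a set of target positions is a block iff it is contiguous
    return max(vals) - min(vals) == len(vals) - 1

def itg_derivations(perm):
    perm = tuple(perm)
    memo = {}
    def count(i, j):
        if j - i == 1:
            return 1
        if (i, j) in memo:
            return memo[(i, j)]
        total = 0
        for k in range(i + 1, j):
            # a split is valid if both halves map to contiguous target blocks
            if is_block(perm[i:k]) and is_block(perm[k:j]):
                total += count(i, k) * count(k, j)
        memo[(i, j)] = total
        return total
    return count(0, len(perm))
```

Already the monotone 3-word alignment (0, 1, 2) has two derivations (left- and right-branching), a 4-word monotone alignment has five, while the non-binarizable "inside-out" permutation (2, 0, 3, 1) has none.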
A better-constrained grammar

• A series of nested constituents with the same orientation will always have a left-heavy derivation
• The second parse tree of the previous example will then not be generated

  C -> 1/3   C -> 2/4   C -> 3/2   C -> 4/1
  A -> [C C]   B -> <C C>
  B -> <A B> ?
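The left-heavy restriction can be verified by counting again with the canonical form enforced — a straight node may not be the right child of a straight node, and likewise for inverted nodes (my own encoding of the constraint, for illustration):

```python
# Count ITG derivations under the left-heavy canonical form.
def is_block(vals):
    return max(vals) - min(vals) == len(vals) - 1

def canonical_derivations(perm):
    perm = tuple(perm)
    memo = {}
    def derivs(i, j):
        # derivation counts by root type: terminal / straight / inverted
        if j - i == 1:
            return {'T': 1, 'S': 0, 'I': 0}
        if (i, j) in memo:
            return memo[(i, j)]
        res = {'T': 0, 'S': 0, 'I': 0}
        for k in range(i + 1, j):
            L, R = perm[i:k], perm[k:j]
            if not (is_block(L) and is_block(R)):
                continue
            dl, dr = derivs(i, k), derivs(k, j)
            left = sum(dl.values())
            if max(L) < min(R):                   # straight combination
                res['S'] += left * (dr['T'] + dr['I'])   # right child not straight
            else:                                 # inverted combination
                res['I'] += left * (dr['T'] + dr['S'])   # right child not inverted
        memo[(i, j)] = res
        return res
    return sum(derivs(0, len(perm)).values())
```

Under this restriction every binarizable alignment gets exactly one derivation — the monotone and fully reversed permutations each count 1 instead of the multiple trees the unrestricted grammar allows, while non-binarizable permutations still count 0.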
Thanks Q&A