Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
Hao Zhang; Chris Quirk; Robert C. Moore; Daniel Gildea
Zhonghua Li (Mentor: Jun Lang)
2011-10-21 I2R SMT-Reading Group
Paper info
• Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
• ACL-08 long paper, cited 37 times
• Authors: Hao Zhang, Chris Quirk, Robert C. Moore, Daniel Gildea
Core Ideas
• Variational Bayes
• Tic-tac-toe pruning
• Word-to-phrase bootstrapping
Outline
• Paper presentation
  – Pipeline
  – Model
  – Training
  – Parsing (pruning)
  – Results
• Shortcomings
• Discussion
Summary of the Pipeline
• Run IBM Model 1 on sentence-aligned data
• Use tic-tac-toe pruning to prune the bitext cell space
• Train a word-based ITG with Variational Bayes and obtain the Viterbi word alignment
• Apply the non-compositional constraint to restrict the space of candidate phrase pairs
• Train a phrasal ITG with VB and run a Viterbi pass to obtain the phrasal alignment
Phrasal Inversion Transduction Grammar
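As a quick reference, a minimal sketch of the phrasal ITG rule schema the paper builds on, written in the same rule notation used later in this deck (the treatment of empty/null phrases is an assumption here, not something stated on the slide):

X -> [X X]    straight concatenation: the two children appear in the same order in both languages
X -> <X X>    inverted concatenation: the children are swapped on the target side
X -> e/f      emission of a phrase pair; e and f may be multi-word phrases (in the word-based variant they are single words, possibly paired with an empty string)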
Dirichlet Prior for Phrasal ITG
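A hedged sketch of the prior over the rule probabilities of the phrasal ITG, assuming a single Dirichlet per nonterminal with a small shared hyperparameter on the phrase-pair rules (the symbol names below are illustrative, not taken from the slide):

\theta_X = \big( P([X\,X]),\; P(\langle X\,X\rangle),\; P(e_1/f_1), \dots, P(e_m/f_m) \big) \sim \mathrm{Dirichlet}\big( \alpha_{[\,]},\; \alpha_{\langle\rangle},\; \alpha_C, \dots, \alpha_C \big)

Choosing \alpha_C \ll 1 makes the prior sparse: the posterior prefers to explain the data with a small set of reliable phrase pairs rather than spreading mass over every co-occurring span pair.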
Review: Inside-Outside Algorithm
[Figure: an HMM-style chain X1 ... Xn-1, Zn, Xn+1 ... XN beside an inside-outside chart rooted at the full span 0/0-T/V, with a highlighted sub-span s/u-t/v]
The forward-backward algorithm is not only used for HMMs; it applies to any state-space model.
The inside-outside algorithm is a special case of the forward-backward algorithm.
Slide credit: Shujie Liu
VB Algorithm for Training SITGs - E1
• Inside probabilities:
Initialization:
Recursion:
[Figure: the inside probability β_i of span s/u-t/v is built from β_j of span s/u-S/U and β_k of span S/U-t/v]
(Copied from Shujie Liu's slides)
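For reference, a sketch of the inside pass for a bracketing SITG, written with explicit indices: \beta(s,t,u,v) is the inside probability of the span written s/u-t/v on the slide, i.e. source words s+1..t paired with target words u+1..v (the handling of null-aligned words is glossed over here):

Initialization (emission of a word or phrase pair):
\beta(s,t,u,v) \mathrel{+}= P\big(X \to e_{s+1..t}/f_{u+1..v}\big)

Recursion (straight and inverted concatenation, summing over split points S/U):
\beta(s,t,u,v) \mathrel{+}= \sum_{s<S<t}\;\sum_{u<U<v} \Big[ P([\,])\,\beta(s,S,u,U)\,\beta(S,t,U,v) \;+\; P(\langle\,\rangle)\,\beta(s,S,U,v)\,\beta(S,t,u,U) \Big]

The sentence-level value \beta(0,T,0,V) plays the role of the likelihood in the VB E-step.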
VB Algorithm for Training SITGs - E2
• Outside probabilities:
Initialization:
Recursion:
[Figure: the outside probability of span s/u-t/v combines the outside probability of an enclosing span with the inside probability of the sibling span, covering both the left-child and right-child cases]
(Copied from Shujie Liu's slides)
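Analogously, a sketch of the outside pass: \alpha(s,t,u,v) is the outside probability of the same span pair, which can sit inside a larger constituent as either the left or the right child of a straight or an inverted rule (index ranges for S and U are left informal):

Initialization (the whole sentence pair):
\alpha(0,T,0,V) = 1

Recursion:
\alpha(s,t,u,v) = \sum_{S,U} P([\,]) \Big[ \alpha(s,S,u,U)\,\beta(t,S,v,U) + \alpha(S,t,U,v)\,\beta(S,s,U,u) \Big] \;+\; \sum_{S,U} P(\langle\,\rangle) \Big[ \alpha(s,S,U,v)\,\beta(t,S,U,u) + \alpha(S,t,u,U)\,\beta(S,s,v,U) \Big]

Expected rule counts for the M-step are then proportional to \alpha(\text{parent}) \cdot P(\text{rule}) \cdot \beta(\text{left child}) \cdot \beta(\text{right child}) / \beta(0,T,0,V).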
VB Algorithm for Training SITGs - M
• s = 3 is the number of right-hand sides for X
• m is the number of observed phrase pairs
• ψ is the digamma function
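A hedged reconstruction of the M-step update in the style of mean-field variational Bayes for PCFGs (e.g. Kurihara and Sato), with c(X -> r) the expected rule count from the E-step and \alpha_r the matching Dirichlet hyperparameter; how the hyperparameters are shared across rules is an assumption here:

\tilde{P}(X \to r) \;=\; \frac{\exp\big(\psi(c(X \to r) + \alpha_r)\big)}{\exp\big(\psi\big(\sum_{r'} (c(X \to r') + \alpha_{r'})\big)\big)}

With the variables on this slide, the denominator's digamma argument is roughly c(X) + s\,\alpha_X + m\,\alpha_C: the s structural right-hand sides contribute \alpha_X each and the m observed phrase pairs contribute \alpha_C each. Since \exp(\psi(x)) \approx x - 0.5 for large x, the update behaves like normalization with a discount of about half a count, which penalizes rarely used phrase pairs more heavily than frequent ones.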
Pruning
• Tic-tac-toe pruning (Hao Zhang 2005)
• Fast tic-tac-toe pruning (Hao Zhang 2008)
• High-precision alignment pruning (Haghighi ACL 2009)
  – Prune all bitext cells that would invalidate more than 8 of the high-precision alignment links
• 1-1 alignment posterior pruning (Haghighi ACL 2009)
  – Prune all 1-1 bitext cells whose posterior is below 10^-4 in both HMM models
Tic-tac-toe pruning (Hao Zhang 2005)
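A rough sketch of the idea (the exact scoring and thresholding are assumptions, not read off the slide): every bitext cell, i.e. a source span paired with a target span, is scored by the product of an inside score and an outside score, both computed cheaply from IBM Model 1:

\mathrm{score}(s,t,u,v) \;\approx\; V_{\mathrm{inside}}(s,t,u,v)\cdot V_{\mathrm{outside}}(s,t,u,v)

Here V_inside is a Model 1 score for aligning the words inside the two spans to each other, and V_outside is a Model 1 score for the words outside them; the outside region splits into the surrounding blocks of a tic-tac-toe board around the cell, which is where the name comes from. Cells whose score falls below a beam are pruned, and the ITG chart never builds items over them.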
Non-compositional Phrases Constraint
e(i, j): the number of alignment links emitted from the source substring e_i .. e_j
f(l, m): the number of alignment links emitted from the target substring f_l .. f_m
Word Alignment Evaluation
Both models were trained for 10 iterations.
EM: the lowest AER, 0.40, is reached after the second iteration; by iteration 10 the AER rises to 0.42.
VB: with the prior hyperparameter αC = 1e-9, the AER approaches 0.35 by iteration 10.
End-to-end Evaluation
NIST Chinese-English training data.
NIST 2002 evaluation datasets were used for tuning and evaluation:
– the 10-reference development set was used for MERT
– the 4-reference test set was used for evaluation
Shortcomings
• The grammar is not perfect
• ITG ordering decisions are context-independent
• Phrasal pairs are sparse
Grammar is not perfect
• Over-counting problem
• Alternative ITG parse trees can yield exactly the same word alignment; this is called the over-counting problem.
[Figure: several trees in the ITG parse tree space map to a single point in the word alignment space, illustrated on the sentence "I am rich !"]
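For illustration (this specific bracketing is an assumed example, not the figure from the deck): take a purely monotone alignment of the four tokens of "I am rich !". A bracketing ITG that only uses X -> [X X] still admits several derivations of that one alignment, e.g.

[[I am] [rich !]]        [I [am [rich !]]]        [[[I am] rich] !]

All of these trees collapse to the same alignment matrix, so the probability mass of a single alignment is split across parse trees, and the counts collected by EM/VB are distorted accordingly.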
A better-constrained grammar
• A series of nested constituents with the same orientation always has a left-heavy derivation.
• The second parse tree of the previous example will therefore not be generated.

C -> 1/3   C -> 2/4   C -> 3/2   C -> 4/1
A -> [C C]
B -> <C C>
B -> <A B> ?
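One hedged way to write such a canonical form (in the spirit of Wu's normal-form ITG; the exact rule inventory below is an assumption): a straight node may not be the right child of a straight node, and an inverted node may not be the right child of an inverted node, so chains of same-orientation constituents are forced to branch to the left:

A -> [A B] | [A C] | [B B] | [B C] | [C B] | [C C]
B -> <A A> | <A C> | <B A> | <B C> | <C A> | <C C>
C -> e/f

Under this constraint the questioned rule B -> <A B> is excluded, because its right child B has the same inverted orientation as its parent.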
Thanks Q&A