Phrase Extraction in PB-SMT
Ankit K Srivastava
NCLT/CNGL Presentation: May 6, 2009
Phrase Extraction | Ankit | 6-May-09
About
Phrase-based statistical machine translation
Methods for phrase extraction
Phrase induction via percolated dependencies
Experimental setup & evaluation results
Other facts & figures
Moses customization
Ongoing & future work
Endnote
PB-SMT Modeling
PB-SMT
Process sequence of words as opposed to mere words
Segment input, translate input, reorder output
Translation model, Language Model, Decoder
ê = argmax_e p(e|f) = argmax_e p(f|e) p(e)   (Bayes' rule; p(f) is constant over e)
Learning Phrase Translations
Extraction I
Input is sentence-aligned parallel corpora
Most approaches use word alignments
Extract (learn) phrase pairs
Build a phrase translation table
Extraction II
Get word alignments (src2tgt, tgt2src)
Apply grow-diag-final heuristics
Extract phrase pairs consistent with the word alignments
Non-syntactic phrases :: STR
[Koehn et al., ’03]
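The consistency criterion above can be sketched in a few lines of Python. This is a simplified illustration, not the Moses extractor: it skips the usual expansion over unaligned boundary words, and `extract_phrases` is a name chosen for this example.

```python
def extract_phrases(src_len, alignment, max_len=7):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (src_idx, tgt_idx) links. A pair of spans is
    consistent when no alignment link crosses either span boundary.
    Simplified vs. Koehn et al. '03: spans are not expanded over
    unaligned target words.
    """
    phrases = set()
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            # target positions linked to the source span [i1, i2]
            tgts = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgts:
                continue
            j1, j2 = min(tgts), max(tgts)
            if j2 - j1 + 1 > max_len:
                continue
            # consistency: every link into [j1, j2] must originate in [i1, i2]
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                phrases.add(((i1, i2), (j1, j2)))
    return phrases
```

For a two-word sentence pair aligned diagonally, `extract_phrases(2, {(0, 0), (1, 1)})` yields the two single-word pairs plus the full-span pair.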
Extraction III
Sentence-aligned and word-aligned text
Monolingual parsing of both SRC & TGT
Align subtrees and extract string pairs
Syntactic phrases
Extraction IV
Parse using a constituency parser
Phrases are syntactic constituents :: CON
[Tinsley et al., ’07]
(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))
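As an illustration of treating constituents as phrase candidates, a bracketed tree like the one above can be parsed and its constituent yields collected. This is a minimal sketch: `parse_sexp` and `constituents` are names invented for this example, and real CON extraction additionally requires aligning the source and target trees.

```python
import re

def parse_sexp(s):
    """Parse a bracketed constituency tree into (label, children) tuples;
    leaves (words) are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    def walk(i):
        label = tokens[i + 1]          # tokens[i] is "("
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = walk(i)
                children.append(child)
            else:
                children.append(tokens[i])
                i += 1
        return (label, children), i + 1
    return walk(0)[0]

def constituents(tree):
    """Return (words, spans): the tree's yield plus every constituent's
    (label, surface string) -- the candidate CON phrases."""
    label, children = tree
    words, spans = [], []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            cw, cs = constituents(child)
            words.extend(cw)
            spans.extend(cs)
    spans.append((label, " ".join(words)))
    return words, spans

# e.g. "(NP (DT the) (NN board))" yields the constituent ("NP", "the board")
```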
Extraction V
Parse using a dependency parser
Phrases have head-dependent relationships :: DEP
[Hearne et al., ’08]
HEAD      DEPENDENT
join      Vinken
join      will
board     the
join      board
join      as
director  a
director  nonexecutive
as        director
29        Nov
join      29
Extraction VI
Numerous other phrase extraction methods:
Estimate phrase translations directly [Marcu & Wong '02]
Use a heuristic other than grow-diag-final
Use marker-based chunks [Groves & Way '05]
String-to-string translation models herein
Head Percolation and Phrase Extraction
Percolation I
It is straightforward to convert a constituency tree to an unlabeled dependency tree [Gaifman '65]
Use head percolation tables to identify the head child in a constituency representation [Magerman '95]
The dependency tree is obtained by recursively applying head-child and non-head-child heuristics [Xia & Palmer '01]
Percolation II
(NP (DT the) (NN board))
Percolation rule for NP: search children from the right for NN / NNP / CD / JJ to find the head
(NP-board (DT the) (NN board))
"the" is dependent on "board"
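A minimal sketch of head percolation, using only the table rows shown on these slides (the function names are invented for this example; real percolation tables such as Magerman '95 cover every category):

```python
# Head percolation table: category -> (search direction, priority list).
# Rows taken from the slides; a production table is much larger.
PERC_TABLE = {
    "NP": ("right", ["NN", "NNP", "CD", "JJ"]),
    "PP": ("left",  ["IN", "PP"]),
    "S":  ("right", ["VP", "S"]),
    "VP": ("left",  ["VB", "VP"]),
}

def find_head_child(label, children):
    """Pick the head child of a constituent using the percolation table."""
    direction, priorities = PERC_TABLE.get(label, ("left", []))
    order = children if direction == "left" else list(reversed(children))
    for cat in priorities:          # try priority categories in order
        for child in order:
            if child[0] == cat:
                return child
    return order[0]                 # fallback: first child in search order

def head_word(tree):
    """Percolate heads up: return the lexical head of a (label, children) tree.
    Preterminals are (POS, word) pairs with the word as a plain string."""
    label, children = tree
    if isinstance(children, str):
        return children
    return head_word(find_head_child(label, children))

# (NP (DT the) (NN board)) -> head is "board", so "the" depends on "board"
np = ("NP", [("DT", "the"), ("NN", "board")])
```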
Percolation III
INPUT (constituency tree):
(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))
Percolation table:
NP  right  NN / NNP / CD / JJ
PP  left   IN / PP
S   right  VP / S
VP  left   VB / VP
OUTPUT (dependencies):
HEAD      DEPENDENT
join      Vinken
join      will
board     the
join      board
join      as
director  a
director  nonexecutive
as        director
29        Nov
join      29
Percolation IV
cf. slide Extraction III (syntactic phrases)
Parse by applying head percolation tables on constituency-annotated trees
Align trees, extract surface chunks
Phrases have head-dependent relations :: PERC
Tools, Resources, and MT System Performance
System setup I

RESOURCE TYPE       NAME               DETAILS
Corpora             JOC                Chiao et al. '06
                    EUROPARL           Koehn '05
Parsers             Berkeley Parser    Petrov et al. '06
                    Syntex Parser      Bourigault et al. '05
                    Head Percolation   Xia & Palmer '01
Alignment Tools     GIZA++             Och & Ney '03
                    Phrase Heuristics  Koehn et al. '03
                    Tree Aligner       Zhechev '09
Lang Modeling       SRILM Toolkit      Stolcke '02
Decoder             Moses              Koehn et al. '07
Evaluation Scripts  BLEU               Papineni et al. '02
                    NIST               Doddington '02
                    METEOR, WER, PER   Banerjee & Lavie '05
System setup II
All 4 “systems” are run with the same configurations (with MERT tuning) on 2 different datasets
They only differ in their phrase tables (# chunks)
Sentence counts:
CORPORA   TRAIN    DEV    TEST
JOC       7,723    400    599
EUROPARL  100,000  1,889  2,000

Phrase table sizes:
CORPORA   STR      CON    DEP    PERC
JOC       236 K    79 K   74 K   72 K
EUROPARL  2,145 K  663 K  583 K  565 K
System setup III
On JOC (7K) data:
SYSTEM  BLEU   NIST  METEOR  WER    PER
STR     31.29  6.31  63.91   61.09  47.34
CON     30.64  6.34  63.82   60.72  45.99
DEP     30.75  6.31  64.12   61.34  46.77
PERC    29.19  6.09  62.12   62.69  48.21

On EUROPARL (100K) data:
SYSTEM  BLEU   NIST  METEOR  WER    PER
STR     28.50  7.00  57.83   57.43  44.11
CON     25.64  6.55  55.26   60.77  46.82
DEP     25.24  6.59  54.65   60.73  46.51
PERC    25.87  6.59  55.63   60.76  46.48
Analyzing Str, Con, Dep, and Perc
Analysis w.r.t. Europarl data only
Analysis I
No. of common & unique phrase pairs
Maybe we should combine the phrase tables…

PHRASE TYPES  COMMON TO BOTH  UNIQUE IN 1st TYPE  UNIQUE IN 2nd TYPE
DEP & PERC    369 K           213 K               195 K
CON & PERC    492 K           171 K               72 K
STR & PERC    127 K           2,018 K             437 K
CON & DEP     391 K           271 K               191 K
STR & DEP     128 K           2,016 K             454 K
STR & CON     144 K           2,000 K             518 K
Analysis II
Concatenate phrase tables and re-estimate probabilities
15 different phrase table combinations: ∑ 4Cr, 1 ≤ r ≤ 4
STR + CON + DEP + PERC
UNI BI TRI QUAD
S SC, SD, SP SCD, SCP, SDP SCDP
C CD, CP CDP -
D DP - -
P - - -
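One way to realize "concatenate and re-estimate" is to pool counts across tables and renormalize per source phrase. The sketch below assumes that scheme (the slides do not spell out the exact re-estimation used), with toy counts for illustration:

```python
from collections import defaultdict
from itertools import combinations

# 15 possible table combinations: 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1
TABLES = ["S", "C", "D", "P"]
COMBOS = [c for r in range(1, 5) for c in combinations(TABLES, r)]

def combine_tables(*tables):
    """Concatenate phrase tables and re-estimate p(tgt | src).

    Each table maps (src, tgt) -> count. Counts are pooled across all
    tables and renormalized per source phrase -- one plausible
    re-estimation scheme, assumed here for illustration.
    """
    pooled = defaultdict(float)
    for table in tables:
        for pair, count in table.items():
            pooled[pair] += count
    totals = defaultdict(float)
    for (src, _), count in pooled.items():
        totals[src] += count
    return {(src, tgt): c / totals[src] for (src, tgt), c in pooled.items()}

# toy example: two tables sharing one phrase pair
str_table = {("the board", "le conseil"): 3.0, ("the board", "la direction"): 1.0}
con_table = {("the board", "le conseil"): 2.0}
combined = combine_tables(str_table, con_table)
# p("le conseil" | "the board") = (3 + 2) / (3 + 2 + 1) = 5/6
```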
Analysis III
All 15 "systems" are run with the same configurations (with MERT tuning)
They only differ in their phrase tables
This is combining at the "translation model" level
Analysis V
REF: Does the commission intend to seek more transparency in this area?
S: Will the commission ensure that more than transparency in this respect?
C: The commission will the commission ensure greater transparency in this respect?
D: The commission will the commission ensure greater transparency in this respect?
P: Does the commission intend to ensure greater transparency in this regard?
SC: Will the commission ensure that more transparent in this respect?
SD: Will the commission ensure that more transparent in this respect?
SP: Does the commission intend to take to ensure that more than openness in this regard?
CD: The commission will the commission ensure greater transparency in this respect?
CP: The commission will the commission ensure greater transparency in this respect?
DP: The commission will the commission ensure greater transparency in this respect?
SCD: Does the commission intend to take to ensure that more transparent commit?
SCP: Does the commission intend to take in this regard to ensure greater transparency?
SDP: Does the commission intend to take in this regard to ensure greater transparency?
CDP: The commission will the commission ensure greater transparency in this respect?
SCDP: Does the commission intend to take to ensure that more transparent suspected?
Analysis VI
Which phrases does the decoder use?
Decoder trace on S+C+D+P
Out of 11,748 phrases: S(5204); C(2441); D(2319); P(2368)
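Tallying phrase provenance from a decoder trace could look like the following sketch, assuming each phrase pair in the combined table carries a label for its origin table (a hypothetical annotation; stock Moses does not record provenance in its trace):

```python
from collections import Counter

def tally_provenance(used_phrases):
    """Count which source table each decoder-chosen phrase came from.

    used_phrases: iterable of (phrase_pair, origin) items, with origin
    one of "S", "C", "D", "P". Both the origin annotation and this
    trace shape are assumptions made for this illustration.
    """
    return Counter(origin for _, origin in used_phrases)

# toy trace: three phrases used in one translation
trace = [(("the board", "le conseil"), "S"),
         (("greater transparency", "plus de transparence"), "P"),
         (("in this area", "dans ce domaine"), "S")]
```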
Analysis VII
Automatic per-sentence evaluation using TER on testset of 2000 sentences [Snover et al., ’06]
C (1120); P (331); D (301); S (248)
Manual per-sentence evaluation on a random testset of 100 sentences using pairwise system comparison
P=C (27%); P>D (5%); SC>SCP (11%)
Analysis VIII
Treat the different phrase table combinations as individual MT systems
Perform system combination using the MBR-CN framework [Du et al., 2009]
This is combining at the "system" level
SYSTEM BLEU NIST METEOR WER PER
STR 29.46 7.11 58.87 56.43 43.03
CON 28.93 6.79 57.34 58.54 44.83
DEP 28.38 6.81 56.59 58.61 44.74
PERC 29.27 6.82 57.72 58.37 44.53
||MBR|| 29.52 6.85 57.84 58.13 44.40
||CN|| 30.70 7.06 58.52 55.87 42.86
Analysis IX
Using Moses baseline phrases (STR) is essential for coverage. SIZE matters!
However, adding any system to STR increases baseline score. Symbiotic!
Hence, do not replace STR, but supplement it.
Analysis X
CON seems to be the best combination with STR (S+C seems to be the best performing system)
Has most common chunks with PERC
Does PERC harm a CON system – needs more analysis (bias between CON & PERC)
Analysis XI
DEP is different from PERC chunks, despite being equivalent in syntactic representation
DEP can be substituted by PERC
Difference between knowledge induced from dependency and constituency. A different aligner?
Analysis XII
PERC is a unique knowledge source. Is it just a simple case of parser combination?
Sometimes, it helps.
Needs more work on finding connection with CON / DEP
Customizing Moses for syntax-supplemented phrase tables
Moses customization
Incorporating syntax (CON, DEP, PERC):
Reordering model
Phrase scoring (new features)
Decoder parameters
Log-linear combination of T-tables
Good phrase translations may be lost by the decoder. How can we ensure they remain intact?
Work in Progress and Future Plans
Ongoing & future work
Scaling (data size, language pair, language direction)
Bias between CON & PERC
Combining phrase pairs
Combining systems
Classify performance into sentence types
Improve quality of phrase pairs in PB-SMT
Endnote…
Endnote
Explored 3 linguistically motivated phrase extractions against Moses phrases
Improves baseline. Highest recorded is 10% relative increase in BLEU on 100K
Rather than pursuing ONE way, combine options
Need more analysis of supplementing phrase table with multiple syntactic T-tables
Phrase Extraction in PB-SMT
Phrase-based Statistical Machine Translation (PB-SMT) models, the most widely researched paradigm in MT today, rely heavily on the quality of phrase pairs induced from large amounts of training data. There are numerous methods for extracting these phrase translations from parallel corpora. In this talk I will describe phrase pairs induced from percolated dependencies and contrast them with three pre-existing phrase extractions. I will also present the performance of the individual phrase tables and their combinations in a PB-SMT system. I will then conclude with ongoing experiments and future research directions.