Phrase Extraction in PB-SMT
Ankit K Srivastava
NCLT/CNGL Presentation: May 6, 2009
Phrase Extraction | Ankit | 6-May-09
About
Phrase-based statistical machine translation
Methods for phrase extraction
Phrase induction via percolated dependencies
Experimental setup & evaluation results
Other facts & figures
Moses customization
Ongoing & future work
Endnote
PB-SMT Modeling
PB-SMT
Process sequence of words as opposed to mere words
Segment input, translate input, reorder output
Translation model, Language Model, Decoder
ê = argmax_e p(e|f) = argmax_e p(f|e) p(e)   (Bayes' rule; p(f) is constant over e)
Learning Phrase Translations
Extraction I
Input is sentence-aligned parallel corpora
Most approaches use word alignments
Extract (learn) phrase pairs
Build a phrase translation table
Extraction II
Get word alignments (src2tgt, tgt2src)
Apply grow-diag-final heuristics
Extract phrase pairs consistent with the word alignments
Non-syntactic phrases :: STR
[Koehn et al., ’03]
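The consistency criterion above can be sketched in a few lines of Python. This is a simplified illustration, not the Moses extractor: it skips the usual expansion over unaligned boundary words, and `extract_phrases` is a name chosen for this example.

```python
def extract_phrases(src_len, alignment, max_len=7):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (src_idx, tgt_idx) links. A pair of spans is
    consistent when no alignment link crosses either span boundary.
    Simplified vs. Koehn et al. '03: spans are not expanded over
    unaligned target words.
    """
    phrases = set()
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            # target positions linked to the source span [i1, i2]
            tgts = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgts:
                continue
            j1, j2 = min(tgts), max(tgts)
            if j2 - j1 + 1 > max_len:
                continue
            # consistency: every link into [j1, j2] must originate in [i1, i2]
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                phrases.add(((i1, i2), (j1, j2)))
    return phrases
```

For a two-word sentence pair aligned diagonally, `extract_phrases(2, {(0, 0), (1, 1)})` yields the two single-word pairs plus the full-span pair.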
Extraction III
Sentence-aligned and word-aligned text
Monolingual parsing of both SRC & TGT
Align subtrees and extract string pairs
Syntactic phrases
Extraction IV
Parse using a constituency parser
Phrases are syntactic constituents :: CON
[Tinsley et al., ’07]
(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))
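As an illustration of treating constituents as phrase candidates, a bracketed tree like the one above can be parsed and its constituent yields collected. This is a minimal sketch: `parse_sexp` and `constituents` are names invented for this example, and real CON extraction additionally requires aligning the source and target trees.

```python
import re

def parse_sexp(s):
    """Parse a bracketed constituency tree into (label, children) tuples;
    leaves (words) are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    def walk(i):
        label = tokens[i + 1]          # tokens[i] is "("
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = walk(i)
                children.append(child)
            else:
                children.append(tokens[i])
                i += 1
        return (label, children), i + 1
    return walk(0)[0]

def constituents(tree):
    """Return (words, spans): the tree's yield plus every constituent's
    (label, surface string) -- the candidate CON phrases."""
    label, children = tree
    words, spans = [], []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            cw, cs = constituents(child)
            words.extend(cw)
            spans.extend(cs)
    spans.append((label, " ".join(words)))
    return words, spans

# e.g. "(NP (DT the) (NN board))" yields the constituent ("NP", "the board")
```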
Extraction V
Parse using a dependency parser
Phrases have head-dependent relationships :: DEP
[Hearne et al., ’08]
HEAD      DEPENDENT
join      Vinken
join      will
board     the
join      board
join      as
director  a
director  nonexecutive
as        director
29        Nov
join      29
Extraction VI
Numerous other phrase extraction methods:
Estimate phrase translations directly [Marcu & Wong '02]
Use a heuristic other than grow-diag-final
Use marker-based chunks [Groves & Way '05]
String-to-string translation models herein
Head Percolation and Phrase Extraction
Percolation I
It is straightforward to convert a constituency tree to an unlabeled dependency tree [Gaifman '65]
Use head percolation tables to identify the head child in a constituency representation [Magerman '95]
The dependency tree is obtained by recursively applying head-child and non-head-child heuristics [Xia & Palmer '01]
Percolation II
(NP (DT the) (NN board))
Percolation rule for NP: search children from the right for NN / NNP / CD / JJ to find the head
(NP-board (DT the) (NN board))
"the" is dependent on "board"
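A minimal sketch of head percolation, using only the table rows shown on these slides (the function names are invented for this example; real percolation tables such as Magerman '95 cover every category):

```python
# Head percolation table: category -> (search direction, priority list).
# Rows taken from the slides; a production table is much larger.
PERC_TABLE = {
    "NP": ("right", ["NN", "NNP", "CD", "JJ"]),
    "PP": ("left",  ["IN", "PP"]),
    "S":  ("right", ["VP", "S"]),
    "VP": ("left",  ["VB", "VP"]),
}

def find_head_child(label, children):
    """Pick the head child of a constituent using the percolation table."""
    direction, priorities = PERC_TABLE.get(label, ("left", []))
    order = children if direction == "left" else list(reversed(children))
    for cat in priorities:          # try priority categories in order
        for child in order:
            if child[0] == cat:
                return child
    return order[0]                 # fallback: first child in search order

def head_word(tree):
    """Percolate heads up: return the lexical head of a (label, children) tree.
    Preterminals are (POS, word) pairs with the word as a plain string."""
    label, children = tree
    if isinstance(children, str):
        return children
    return head_word(find_head_child(label, children))

# (NP (DT the) (NN board)) -> head is "board", so "the" depends on "board"
np = ("NP", [("DT", "the"), ("NN", "board")])
```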
Percolation III
INPUT (constituency tree):
(ROOT (S (NP (NNP Vinken)) (VP (MD will) (VP (VB join) (NP (DT the) (NN board)) (PP (IN as) (NP (DT a) (JJ nonexecutive) (NN director))) (NP (NNP Nov) (CD 29))))))
Percolation table:
NP  right  NN / NNP / CD / JJ
PP  left   IN / PP
S   right  VP / S
VP  left   VB / VP
OUTPUT (dependencies):
HEAD      DEPENDENT
join      Vinken
join      will
board     the
join      board
join      as
director  a
director  nonexecutive
as        director
29        Nov
join      29
Percolation IV
cf. slide Extraction III (syntactic phrases)
Parse by applying head percolation tables on constituency-annotated trees
Align trees, extract surface chunks
Phrases have head-dependent relations :: PERC
Tools, Resources, and MT System Performance
System setup I

RESOURCE TYPE       NAME               DETAILS
Corpora             JOC                Chiao et al. '06
                    EUROPARL           Koehn '05
Parsers             Berkeley Parser    Petrov et al. '06
                    Syntex Parser      Bourigault et al. '05
                    Head Percolation   Xia & Palmer '01
Alignment Tools     GIZA++             Och & Ney '03
                    Phrase Heuristics  Koehn et al. '03
                    Tree Aligner       Zhechev '09
Lang Modeling       SRILM Toolkit      Stolcke '02
Decoder             Moses              Koehn et al. '07
Evaluation Scripts  BLEU               Papineni et al. '02
                    NIST               Doddington '02
                    METEOR, WER, PER   Banerjee & Lavie '05
System setup II
All 4 “systems” are run with the same configurations (with MERT tuning) on 2 different datasets
They only differ in their phrase tables (# chunks)
Sentence counts:
CORPORA   TRAIN    DEV    TEST
JOC       7,723    400    599
EUROPARL  100,000  1,889  2,000

Phrase table sizes:
CORPORA   STR      CON    DEP    PERC
JOC       236 K    79 K   74 K   72 K
EUROPARL  2,145 K  663 K  583 K  565 K
System setup III
On JOC (7K) data:
SYSTEM  BLEU   NIST  METEOR  WER    PER
STR     31.29  6.31  63.91   61.09  47.34
CON     30.64  6.34  63.82   60.72  45.99
DEP     30.75  6.31  64.12   61.34  46.77
PERC    29.19  6.09  62.12   62.69  48.21

On EUROPARL (100K) data:
SYSTEM  BLEU   NIST  METEOR  WER    PER
STR     28.50  7.00  57.83   57.43  44.11
CON     25.64  6.55  55.26   60.77  46.82
DEP     25.24  6.59  54.65   60.73  46.51
PERC    25.87  6.59  55.63   60.76  46.48
Analyzing Str, Con, Dep, and Perc
Analysis w.r.t. Europarl data only
Analysis I
No. of common & unique phrase pairs
Maybe we should combine the phrase tables…

PHRASE TYPES  COMMON TO BOTH  UNIQUE IN 1st TYPE  UNIQUE IN 2nd TYPE
DEP & PERC    369 K           213 K               195 K
CON & PERC    492 K           171 K               72 K
STR & PERC    127 K           2,018 K             437 K
CON & DEP     391 K           271 K               191 K
STR & DEP     128 K           2,016 K             454 K
STR & CON     144 K           2,000 K             518 K
Analysis II
Concatenate phrase tables and re-estimate probabilities
15 different phrase table combinations: ∑ 4Cr, 1 ≤ r ≤ 4
STR + CON + DEP + PERC
UNI BI TRI QUAD
S SC, SD, SP SCD, SCP, SDP SCDP
C CD, CP CDP -
D DP - -
P - - -
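One way to realize "concatenate and re-estimate" is to pool counts across tables and renormalize per source phrase. The sketch below assumes that scheme (the slides do not spell out the exact re-estimation used), with toy counts for illustration:

```python
from collections import defaultdict
from itertools import combinations

# 15 possible table combinations: 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1
TABLES = ["S", "C", "D", "P"]
COMBOS = [c for r in range(1, 5) for c in combinations(TABLES, r)]

def combine_tables(*tables):
    """Concatenate phrase tables and re-estimate p(tgt | src).

    Each table maps (src, tgt) -> count. Counts are pooled across all
    tables and renormalized per source phrase -- one plausible
    re-estimation scheme, assumed here for illustration.
    """
    pooled = defaultdict(float)
    for table in tables:
        for pair, count in table.items():
            pooled[pair] += count
    totals = defaultdict(float)
    for (src, _), count in pooled.items():
        totals[src] += count
    return {(src, tgt): c / totals[src] for (src, tgt), c in pooled.items()}

# toy example: two tables sharing one phrase pair
str_table = {("the board", "le conseil"): 3.0, ("the board", "la direction"): 1.0}
con_table = {("the board", "le conseil"): 2.0}
combined = combine_tables(str_table, con_table)
# p("le conseil" | "the board") = (3 + 2) / (3 + 2 + 1) = 5/6
```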
Analysis III
All 15 "systems" are run with the same configurations (with MERT tuning)
They only differ in their phrase tables
This is combining at the "translation model" level
Analysis V
REF: Does the commission intend to seek more transparency in this area?
S: Will the commission ensure that more than transparency in this respect?
C: The commission will the commission ensure greater transparency in this respect?
D: The commission will the commission ensure greater transparency in this respect?
P: Does the commission intend to ensure greater transparency in this regard?
SC: Will the commission ensure that more transparent in this respect?
SD: Will the commission ensure that more transparent in this respect?
SP: Does the commission intend to take to ensure that more than openness in this regard?
CD: The commission will the commission ensure greater transparency in this respect?
CP: The commission will the commission ensure greater transparency in this respect?
DP: The commission will the commission ensure greater transparency in this respect?
SCD: Does the commission intend to take to ensure that more transparent commit?
SCP: Does the commission intend to take in this regard to ensure greater transparency?
SDP: Does the commission intend to take in this regard to ensure greater transparency?
CDP: The commission will the commission ensure greater transparency in this respect?
SCDP: Does the commission intend to take to ensure that more transparent suspected?
Analysis VI
Which phrases does the decoder use?
Decoder trace on S+C+D+P
Out of 11,748 phrases: S(5204); C(2441); D(2319); P(2368)
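Tallying phrase provenance from a decoder trace could look like the following sketch, assuming each phrase pair in the combined table carries a label for its origin table (a hypothetical annotation; stock Moses does not record provenance in its trace):

```python
from collections import Counter

def tally_provenance(used_phrases):
    """Count which source table each decoder-chosen phrase came from.

    used_phrases: iterable of (phrase_pair, origin) items, with origin
    one of "S", "C", "D", "P". Both the origin annotation and this
    trace shape are assumptions made for this illustration.
    """
    return Counter(origin for _, origin in used_phrases)

# toy trace: three phrases used in one translation
trace = [(("the board", "le conseil"), "S"),
         (("greater transparency", "plus de transparence"), "P"),
         (("in this area", "dans ce domaine"), "S")]
```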
Analysis VII
Automatic per-sentence evaluation using TER on testset of 2000 sentences [Snover et al., ’06]
C (1120); P (331); D (301); S (248)
Manual per-sentence evaluation on a random testset of 100 sentences using pairwise system comparison
P=C (27%); P>D (5%); SC>SCP (11%)
Analysis VIII
Treat the different phrase table combinations as individual MT systems
Perform system combination using the MBR-CN framework [Du et al., 2009]
This is combining at the "system" level
SYSTEM BLEU NIST METEOR WER PER
STR 29.46 7.11 58.87 56.43 43.03
CON 28.93 6.79 57.34 58.54 44.83
DEP 28.38 6.81 56.59 58.61 44.74
PERC 29.27 6.82 57.72 58.37 44.53
||MBR|| 29.52 6.85 57.84 58.13 44.40
||CN|| 30.70 7.06 58.52 55.87 42.86
Analysis IX
Using Moses baseline phrases (STR) is essential for coverage. SIZE matters!
However, adding any system to STR increases baseline score. Symbiotic!
Hence, do not replace STR, but supplement it.
Analysis X
CON seems to be the best combination with STR (S+C seems to be the best performing system)
Has most common chunks with PERC
Does PERC harm a CON system – needs more analysis (bias between CON & PERC)
Analysis XI
DEP is different from PERC chunks, despite being equivalent in syntactic representation
DEP can be substituted by PERC
Difference between knowledge induced from dependency and constituency. A different aligner?
Analysis XII
PERC is a unique knowledge source. Is it just a simple case of parser combination?
Sometimes, it helps.
Needs more work on finding connection with CON / DEP
Customizing Moses for syntax-supplemented phrase tables
Moses customization
Incorporating syntax (CON, DEP, PERC):
Reordering model
Phrase scoring (new features)
Decoder parameters
Log-linear combination of T-tables
Good phrase translations may be lost by the decoder. How can we ensure they remain intact?
Work in Progress and Future Plans
Ongoing & future work
Scaling (data size, language pair, language direction)
Bias between CON & PERC
Combining phrase pairs
Combining systems
Classify performance into sentence types
Improve quality of phrase pairs in PB-SMT
Endnote…
Endnote
Explored 3 linguistically motivated phrase extractions against Moses phrases
Improves baseline. Highest recorded is 10% relative increase in BLEU on 100K
Rather than pursuing ONE way, combine options
Need more analysis of supplementing phrase table with multiple syntactic T-tables
Phrase Extraction in PB-SMT
Phrase-based Statistical Machine Translation (PB-SMT) models, the most widely researched paradigm in MT today, rely heavily on the quality of phrase pairs induced from large amounts of training data. There are numerous methods for extracting these phrase translations from parallel corpora. In this talk I will describe phrase pairs induced from percolated dependencies and contrast them with three pre-existing phrase extractions. I will also present the performance of the individual phrase tables and their combinations in a PB-SMT system. I will then conclude with ongoing experiments and future research directions.