Modified Distortion Matrices for Phrase-Based SMT
Arianna Bisazza & Marcello Federico – FBK (Italy)
[Figure: source-to-source distortion matrix for an 11-word sentence (w0 to w10), with some entries modified]
A. Bisazza & M. Federico – Modified Distortion Matrices for PSMT
PSMT decoding overview
E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali
Freedom of movement must be encouraged while ensuring that career paths …
[Figure: the translation is built up phrase by phrase; partial hypotheses are annotated with TM scores, LM scores and ReoM scores]
Reordering Models
Many solutions have been proposed, with different reordering classes, features, training modes, etc.:
Tillmann 04, Zens & Ney 06, Al-Onaizan & Papineni 06, Galley & Manning 08, Green & al. 10, Feng & al. 10…
No matter what reordering model is used, the permutation search space must be limited!
The power of every reordering model is bounded by the reordering constraints in use
Reordering Constraints
E' necessario incoraggiare tale mobilità garantendo la sicurezza dei percorsi professionali
#perm.=11!≈40,000,000
Source-to-Source distortion
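D(x,y) = |y - x - 1|

| x \ y | w0 | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| <s> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| w0 | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| w1 | 2 | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| w2 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| w3 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| w4 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 | 5 |
| w5 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 |
| w6 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 |
| w7 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 |
| w8 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 |
| w9 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 |
| w10 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - |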
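The distortion formula D(x,y) = |y - x - 1| can be sketched in a few lines of Python. This is only an illustration: the function name `distortion_matrix` and the convention of using -1 for the sentence-start symbol `<s>` are assumptions made here, not part of the talk.

```python
def distortion_matrix(n):
    """Return D as a dict with D[(x, y)] = |y - x - 1|.

    x is the last translated source position (-1 stands for <s>),
    y is the next source position to translate; x == y is left out.
    """
    return {
        (x, y): abs(y - x - 1)
        for x in range(-1, n)
        for y in range(n)
        if x != y
    }


D = distortion_matrix(11)
print(D[(-1, 0)], D[(0, 10)], D[(10, 0)])  # 0 9 11
```

Monotone steps (y = x + 1) cost 0, and a backward jump costs more than a forward jump over the same distance.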
DL=3 #perm.≈7,000
DL: distortion limit
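To see how a distortion limit shrinks the search space, one can count the word orders in which every jump satisfies D(x,y) ≤ DL. The sketch below is a simplified, word-level assumption (the name `count_permutations` is hypothetical). A real phrase-based decoder works on phrases and may apply further checks, so these counts need not match the figures on the slides.

```python
from functools import lru_cache


def count_permutations(n, dl):
    """Count orders of n source words in which each jump has D(x, y) <= dl."""

    @lru_cache(maxsize=None)
    def extend(last, remaining):
        # remaining: frozenset of source positions not yet covered
        if not remaining:
            return 1
        return sum(
            extend(y, remaining - {y})
            for y in remaining
            if abs(y - last - 1) <= dl
        )

    # -1 stands for the sentence-start symbol <s>
    return extend(-1, frozenset(range(n)))


for dl in range(0, 6):
    print(dl, count_permutations(5, dl))
```

With `dl=0` only the monotone order survives. Once `dl` reaches the largest possible jump cost (n, for a jump from the last word back to the first), all n! orders are allowed.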
The problem with DL…
Arabic-English
[Figure: two AR/EN sentence-pair examples, shown with the 11-word distortion matrix]
German-English
[Figure: two DE/EN sentence-pair examples, shown with the 11-word distortion matrix]
Current solution: increase the DLimit
DL=7 #perm.≈7,000,000
Generally leads to worse translations!
Our solution:
modify distortion for each test sentence
DL=3 & modif(D): #perm.≈20,000

| x \ y | w0 | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 | w9 | w10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| <s> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| w0 | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| w1 | 2 | - | 0 | 1 | 2 | 3 | 4 | 0 | 0 | 7 | 8 |
| w2 | 3 | 2 | - | 0 | 1 | 2 | 3 | 0 | 0 | 6 | 7 |
| w3 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| w4 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 | 5 |
| w5 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 | 0 |
| w6 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 |
| w7 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 |
| w8 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 |
| w9 | 10 | 9 | 8 | 2 | 2 | 5 | 4 | 3 | 2 | - | 0 |
| w10 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - |

Simplifies the task of reordering models!
Rest of the talk:
How to modify the distortion matrix?
What effect on translation quality?
What effect on baseline runtimes?
Chunk-based fuzzy reordering rules
Shallow syntax chunking:
• cheaper and easier than deep parsing
• constrains reorderings in a softer way
Fuzzy (non-deterministic) reordering rules:
• generate N permutations for each matching sequence
• final reordering decision is taken during translation, guided by all SMT models (ReoM, LM...)
Few rules per language pair, to capture only long reordering
Arabic-English
“Move verb chunk (and following chunk) to the right by 1 to N chunks”
CC1 VC2 PC3 NC4 PC5 Pct6
w- $Ark fy AltZAhrp E$rAt AlmslHyn mn AlktA}b .
and took part in the march dozens of militants from the Brigades
Permutations generated by the rule:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 PC3 NC4 VC2 PC5 Pct6
CC1 PC3 NC4 PC5 VC2 Pct6
CC1 NC4 VC2 PC3 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6
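The rule “Move verb chunk (and following chunk) to the right by 1 to N chunks” can be sketched as follows. The function name, the tag prefixes, and the choice to stop at the first punctuation chunk are assumptions made for this illustration; the talk does not spell out how N is bounded.

```python
def move_verb_chunk(chunks, verb_tag="VC", stop_tag="Pct", max_shift=None):
    """Generate permutations that move the verb chunk, alone or together
    with the chunk that follows it, 1..N chunks to the right."""
    results = []
    for i, chunk in enumerate(chunks):
        if not chunk.startswith(verb_tag):
            continue
        for block_len in (1, 2):  # verb chunk alone, then verb + next chunk
            block = chunks[i:i + block_len]
            rest = chunks[i + block_len:]
            if len(block) < block_len:
                continue
            if any(c.startswith(stop_tag) for c in block[1:]):
                continue
            # Assumption: the block never moves past a punctuation chunk.
            limit = next(
                (k for k, c in enumerate(rest) if c.startswith(stop_tag)),
                len(rest),
            )
            if max_shift is not None:
                limit = min(limit, max_shift)
            for shift in range(1, limit + 1):
                results.append(chunks[:i] + rest[:shift] + block + rest[shift:])
    return results


for perm in move_verb_chunk(["CC1", "VC2", "PC3", "NC4", "PC5", "Pct6"]):
    print(" ".join(perm))
```

On the example chunk sequence this yields five permutations: three that move VC2 alone and two that move VC2 together with PC3.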
Reordering selection
Reordered source LM scores of the five permutations: 0.9, 0.7, 0.4, 0.1, 0.1
Reorderings to encode in the distortion matrix:
CC1 PC3 VC2 NC4 PC5 Pct6
CC1 NC4 PC5 VC2 PC3 Pct6
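The selection step amounts to keeping the best-scoring candidates. A minimal sketch, assuming the scorer is supplied from outside (the talk uses a reordered source LM; `select_n_best` and the toy scores below are hypothetical and are not the values on the slide):

```python
def select_n_best(candidates, score, n):
    """Keep the n candidates with the highest score (ties keep input order)."""
    return sorted(candidates, key=score, reverse=True)[:n]


# Hypothetical scores, for illustration only.
toy_scores = {
    ("A", "B", "C"): 0.2,
    ("A", "C", "B"): 0.6,
    ("B", "A", "C"): 0.5,
}
best = select_n_best(list(toy_scores), toy_scores.get, 2)
print(best)  # [('A', 'C', 'B'), ('B', 'A', 'C')]
```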
Modifying the distortion matrix
CC1 VC2 PC3 NC4 PC5 Pct6

| chunk | x \ y | w0 | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 |
|---|---|---|---|---|---|---|---|---|---|---|
| | <s> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| CC1 | w0 | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| VC2 | w1 | 2 | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| PC3 | w2 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 | 5 |
| PC3 | w3 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 |
| NC4 | w4 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 |
| NC4 | w5 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 |
| PC5 | w6 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 |
| PC5 | w7 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - | 0 |
| Pct6 | w8 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - |
Entries modified, one step per slide (x = from, y = to):
• D(w0,w2) = D(w0,w3) = 0 (CC1 → PC3)
• D(w3,w1) = 2 (PC3 → VC2)
• D(w1,w4) = D(w1,w5) = 0 (VC2 → NC4)
• D(w0,w4) = D(w0,w5) = 0 (CC1 → NC4)
• D(w6,w1) = D(w7,w1) = 2 (PC5 → VC2)
• D(w2,w8) = D(w3,w8) = 0 (PC3 → Pct6)
Modified distortion matrix:

| chunk | x \ y | w0 | w1 | w2 | w3 | w4 | w5 | w6 | w7 | w8 |
|---|---|---|---|---|---|---|---|---|---|---|
| | <s> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| CC1 | w0 | - | 0 | 0 | 0 | 0 | 0 | 5 | 6 | 7 |
| VC2 | w1 | 2 | - | 0 | 1 | 0 | 0 | 4 | 5 | 6 |
| PC3 | w2 | 3 | 2 | - | 0 | 1 | 2 | 3 | 4 | 0 |
| PC3 | w3 | 4 | 2 | 2 | - | 0 | 1 | 2 | 3 | 0 |
| NC4 | w4 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 | 3 |
| NC4 | w5 | 6 | 5 | 4 | 3 | 2 | - | 0 | 1 | 2 |
| PC5 | w6 | 7 | 2 | 5 | 4 | 3 | 2 | - | 0 | 1 |
| PC5 | w7 | 8 | 2 | 6 | 5 | 4 | 3 | 2 | - | 0 |
| Pct6 | w8 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | - |
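The effect of the modified entries can be checked with a short sketch. The cell values are copied from the final matrix on the slide; the helper names `distortion_matrix` and `max_jump`, and the use of -1 for `<s>`, are assumptions made for this illustration.

```python
def distortion_matrix(n):
    """Standard distortion: D[(x, y)] = |y - x - 1|, with -1 standing for <s>."""
    return {(x, y): abs(y - x - 1) for x in range(-1, n) for y in range(n) if x != y}


def max_jump(D, order):
    """Largest distortion cost along a word order, starting from <s>."""
    prev, worst = -1, 0
    for y in order:
        worst = max(worst, D[(prev, y)])
        prev = y
    return worst


base = distortion_matrix(9)
modified = dict(base)
# (from, to, new cost), as read off the final slide matrix
for x, y, cost in [
    (0, 2, 0), (0, 3, 0), (0, 4, 0), (0, 5, 0),  # CC1 -> PC3, CC1 -> NC4
    (1, 4, 0), (1, 5, 0),                        # VC2 -> NC4
    (3, 1, 2), (6, 1, 2), (7, 1, 2),             # PC3 -> VC2, PC5 -> VC2
    (2, 8, 0), (3, 8, 0),                        # PC3 -> Pct6
]:
    modified[(x, y)] = cost

# CC1 NC4 PC5 VC2 PC3 Pct6, written as word indices
order = [0, 4, 5, 6, 7, 1, 2, 3, 8]
print(max_jump(base, order), max_jump(modified, order))  # 7 2
```

Under the standard matrix this reordering needs a jump of cost 7, which a distortion limit of 3 or 5 would exclude. With the modified entries its largest jump costs 2, so it stays within the limit without raising DL for the rest of the sentence.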
Experiments
• Tasks: NIST-MT09 for Ar-En, WMT10 for De-En
• Systems based on Moses, include state-of-the-art hierarchical lexicalized reordering models [Tillmann 04; Koehn & al 05; Galley & Manning 08]
• Baseline Distortion Limits: 5 in Ar-En, 10 in De-En
• Evaluation by:
  - BLEU for lexical match & local order
  - KRS for global order
Arabic-English:
Test set: eval09-NW
Distortion modified with 3-best reorderings per rule-matching sequence
[Figure: Translation Quality and Translation Time]
+0.9 BLEU, +0.6 KRS (signif.)
German-English:
Test set: newstest10
Distortion modified with 3-best reorderings per rule-matching sequence
[Figure: Translation Quality and Translation Time]
+0.4 BLEU, +0.7 KRS (signif.)
Conclusions
• Modified distortion allows for finer & linguistically motivated definition of search space
• We achieve better translation & faster decoding in language pairs where long reordering is concentrated in a few patterns
• Our method is complementary to reordering modeling
• For now, a few reordering rules are needed to modify distortion
• We are currently working on a fully data-driven approach to replace the rules
THANKS FOR YOUR ATTENTION!