jhu mt lecture_2014-04-15

165
Representing Huge Translation Models

Upload: alopezfoo

Post on 28-Jun-2015

154 views

Category:

Technology


1 download

DESCRIPTION

JHU MT class: Representing huge translation models

TRANSCRIPT

Representing Huge Translation Models

Statistical Machine Translation

parallel text + alignment

Statistical Machine Translation

extractrules

parallel text + alignment

Statistical Machine Translation

extractrules

scorerules

parallel text + alignment

Statistical Machine Translation

extractrules

scorerules

联合国 安全 理事会 的 五个 常任 理事 国都

load rules into memory

decoder

parallel text + alignment

Statistical Machine Translation

extractrules

scorerules

联合国 安全 理事会 的 五个 常任 理事 国都

load rules into memory

decoder

parallel text + alignment

number of rulesdepends on

corpus size...

Statistical Machine Translation

extractrules

scorerules

联合国 安全 理事会 的 五个 常任 理事 国都

load rules into memory

decoder

parallel text + alignment

... and modelcomplexity

Statistical Machine Translation

parallel text + alignment

extractrules

scorerules

联合国 安全 理事会 的 五个 常任 理事 国都

load filtered rules

into memory

decoding algorithm

filter rules for test set

Baseline Translation Model

• Hierarchical Phrase-based translation (Chiang 2007)

• 1M parallel sentences (27M words)

• GIZA++ alignments (Och & Ney 2003, Koehn et al. 2003)

• alignments are dense

• Heuristics used to restrict number of extracted rules

• 67M rules, 6.1Gb of data

• cf. 225M (Zens & Ney 2007), 55M (DeNeefe et al. 2007)

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements

• 3.5M sentences (2.5M out-of-domain), 100M words

• Discriminatively trained alignments (Ayan & Dorr 2006)

• Key difference: alignments are sparse

• Loose phrase extraction (Ayan & Dorr 2006)

Some Possible Improvements

• Rule extraction time: 77 CPU days

• does not include sorting or scoring!

• Rules counted: 20 billion

• 2 orders of magnitude larger than state of the art

• Estimated unique rules: 6.6 billion

• Estimated extract file size: 917Gb

• Estimated phrase table size: 600Gb

The Problem

• Current models are bounded by resource limitations.

• We’re already pushing the edge of what’s possible.

• Parallel data aren’t getting any smaller.

• Models aren’t getting any less complex.

The Solution

• Translation by pattern matching.

• Novel pattern matching algorithms.

• Exploit ideas developed in bioinformatics, IR

• Support for tera-scale translation models.

Idea: Translation by Pattern Matching(Callison-Burch et al. 05, Zhang & Vogel 05)

联合国 安全 理事会 的 五个 常任 理事 国都

decodingalgorithm

patternmatchingalgorithm

parallel text + alignment in memory

sentence-specific

rulesextract andscore

it persuades him and it disheartens him

Exact Pattern MatchingInput Pattern

it persuades him and it disheartens him

Exact Pattern MatchingInput Pattern=Query Pattern

it persuades him and it disheartens him

Pattern Matching for Phrase-Based MTInput Pattern

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartenshim and it disheartens him

Pattern Matching for Phrase-Based MTit persuades him and it disheartens himInput Pattern

Query Patterns

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

it mars him , it sets him on and it takes him off . #4

Text T

Suffix 4

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

it makes him and it mars him , it sets him on and it takes him ...makes him and it mars him , it sets him on and it takes him off . #him and it mars him , it sets him on and it takes him off . #and it mars him , it sets him on and it takes him off . #it mars him , it sets him on and it takes him off . #mars him , it sets him on and it takes him off . #him , it sets him on and it takes him off . #, it sets him on and it takes him off . #

01234567

...

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

31221510604

and it mars him , it sets him on and it takes him off . #and it takes him off . #him and it mars him , it sets him on and it takes him off . #him off . #him on and it takes him off . #him , it sets him on and it takes him off . #it makes him and it mars him , it sets him on and it takes him ...it mars him , it sets him on and it takes him off . #...

Suffix Arraysit makes him and it mars him , it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

and it mars him , it sets him on and it takes him off . #and it takes him off . #him and it mars him , it sets him on and it takes him off . #him off . #him on and it takes him off . #him , it sets him on and it takes him off . #it makes him and it mars him , it sets him on and it takes him ...it mars him , it sets him on and it takes him off . #...

31221510604

...

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Text T

Suffix Array SA

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it

O(|w| log |T |)

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it (Manber & Myers, 93)

O(|w| log |T |)

O(|w| + log |T |)

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it (Manber & Myers, 93)

O(|w| log |T |)

O(|w| + log |T |)

O(|w|) (Abouelhoda et al., 04)

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text T

Suffix Array SA

Query Pattern w

him and it (Manber & Myers, 93)

O(|w| log |T |)

O(|w| + log |T |)

O(|w|) (Abouelhoda et al., 04)

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18

on baseline model: 0.009 seconds/sentence

(not including extraction/scoring)

Problem: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X himSource Phrase

Input

Hierarchical Phrases: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X himSource Phrase

Input

Hierarchical Phrases: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X himSource Phrase

Input

Hierarchical Phrases: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X himSource Phrase

Input

Hierarchical Phrases: Phrases with Gaps

• Hierarchical phrase-based translation (Chiang 2005, 2007)

• Quirk et al. 2005, Simard et al. 2005, DeNeefe et al. 2007

it persuades him and it disheartens him

it X and X himSource Phrase

Input

Given an input sentence, efficiently find all hierarchical phrase-based translation rules for that

sentence in the training corpus.

Problem Statement

it persuades him and it disheartens him

Pattern Matching for Hierachical PBMTInput Pattern

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartenshim and it disheartens him

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern

Query Patterns

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern

Query Patternsit X and

it X itit X disheartens

it X himpersuades X it

persuades X disheartenspersuades X himit persuades X it

it persuades X disheartensit persuades X him

it X and itit X it disheartens

it X disheartens himit X and X him

persuades him X disheartenspersuades him X him

persuades X it disheartenspersuades X disheartens him

him and X himhim X disheartens him

it persuades him X disheartensit persuades him X him

it persuades X it disheartensit persuades X disheartens him

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern

Query Patternsit X and it disheartensit X it disheartens him

persuades him and X himpersuades him X disheartens him

persuades X it disheartens himit persuades him and X him

it persuades him X disheartens himit persuades X it disheartens him

it X and it disheartens him

Pattern Matching for Hierarchical PBMTit persuades him and it disheartens himInput Pattern

Query Patternsit X and it disheartensit X it disheartens him

persuades him and X himpersuades him X disheartens him

persuades X it disheartens himit persuades him and X him

it persuades him X disheartens himit persuades X it disheartens him

it X and it disheartens him

This is a variant of approximate pattern matching (Navarro ‘01)

Pattern Matching with Gaps31221510604813

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...1

him X it

Query pattern

...

!

Pattern Matching with Gaps

him X it

Query pattern !

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps

him X it

Query pattern !

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps

him X it

Query pattern

himit

!

Subpatterns wi

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps

him X it

Query pattern

himit

!

Subpatterns wi

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps

him X it

Query pattern

himit

!

Subpatterns wi

ni Occurrences

312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

Pattern Matching with Gaps312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

215106

04813

Pattern Matching with Gaps

215106

04813

Pattern Matching with Gaps

215106

04813

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

Pattern Matching with Gaps

215106

04813

RILMS (Rahman et al., 06)

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

Pattern Matching with Gaps

215106

04813

RILMS (Rahman et al., 06)

O(!

i

ni)linear in number of occurrences of subpatterns:

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

221 seconds

Baseline Timing Result

per sentence

compare: 0.009 seconds per sentencefor contiguous phrases

137 5 27

!

!=w1X...XwI

I!

i=1

(|wi| + log |T | + ni)

!

w

(|w| + log |T |)

2825 3 5 27 82069

contiguous

discontiguous

Complexity Analysis

137 5 27

!

!=w1X...XwI

I!

i=1

(|wi| + log |T | + ni)

!

w

(|w| + log |T |)

2825 3 5 27 82069

contiguous

discontiguous

Complexity Analysis

Exploiting Redundancyit persuades him and it disheartens himInput Pattern

Query Patternsit X and

it X itit X disheartens

it X himpersuades X it

persuades X disheartenspersuades X himit persuades X it

it persuades X disheartensit persuades X him

it X and itit X it disheartens

it X disheartens himit X and X him

persuades him X disheartenspersuades him X him

persuades X it disheartenspersuades X disheartens him

him and X himhim X disheartens him

it persuades him X disheartensit persuades him X him

it persuades X it disheartensit persuades X disheartens him

Exploiting Redundancyit persuades him and it disheartens himInput Pattern

Query Patternsit X and

it X itit X disheartens

it X himpersuades X it

persuades X disheartenspersuades X himit persuades X it

it persuades X disheartensit persuades X him

it X and itit X it disheartens

it X disheartens himit X and X him

persuades him X disheartenspersuades him X him

persuades X it disheartenspersuades X disheartens him

him and X himhim X disheartens him

it persuades him X disheartensit persuades him X him

it persuades X it disheartensit persuades X disheartens him

Exploiting Redundancy

it persuades X disheartens himQuery Pattern

Exploiting Redundancy

it persuades X disheartens himQuery Patternit persuades X disheartensMaximal Prefix

(Zhang & Vogel 2005)

Exploiting Redundancy

it persuades X disheartens himQuery Patternit persuades X disheartens

persuades X disheartens himMaximal PrefixMaximal Suffix

Prefix Tree with Suffix Links

it

persuadeshim

Xhim

him

persuades

X

him

him

221

Baseline

seconds/sentence

Timing Results

177

221

Baseline PrefixTree

seconds/sentence

Timing Results

137 5 27

!

!=w1X...XwI

I!

i=1

(|wi| + log |T | + ni)

!

w

(|w| + log |T |)

2825 3 5 27 82069

contiguous

discontiguous

Complexity Analysis

137 5 27

!

!=w1X...XwI

I!

i=1

(|wi| + log |T | + ni)

!

w

(|w| + log |T |)

2825 3 5 27 82069

contiguous

discontiguous

Complexity Analysis

computations (ranked by time)

cum

ulat

ive

time

(s)

Empirical Analysis

Distribution of Patterns in Training DataFr

eque

ncy

Pattern types (in descending order of frequency)

Distribution of Patterns in Training DataFr

eque

ncy

Pattern types (in descending order of frequency)

Analysis of Problem

• The expensive computations involve at least one frequent subpattern. There are two cases.

• A frequent pattern paired with an infrequent pattern

• Two frequent patterns paired with each other

Frequent × Infrequent Subpatterns

Frequent × Infrequent Subpatterns

Frequent × Infrequent Subpatterns

Frequent × Infrequent Subpatterns

Double Binary SearchBaeza-Yates, 04

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

Double Binary SearchBaeza-Yates, 04

Queryset Q Dataset D

complexity: |Q| log |D|Upper bound

Obtaining Sorted Sets

Obtaining Sorted Sets

Sort via Stratified Tree(van Emde Boas et al. 1977)

Obtaining Sorted Sets

Problem: complexity increases to O(|Q| log |D| + (|Q| + |D|) log log |T |)

Sort via Stratified Tree(van Emde Boas et al. 1977)

Obtaining Sorted Sets

Problem: complexity increases to

Solution: cache sorted set in prefix

tree

O(|Q| log |D| + (|Q| + |D|) log log |T |)

Sort via Stratified Tree(van Emde Boas et al. 1977)

177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

Obtaining Sorted Sets

Sort via Stratified Tree

Problem: sort complexityis still very high for very

frequent patterns

Obtaining Sorted Sets

Solution: precompute theinverted index for 1000 most

frequent contiguous patterns

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

44

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

+ inverted indices

Frequent × Frequent Subpatterns

Frequent × Frequent Subpatterns

Problem:There is no clever algorithm to

solve this problem

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #

Most Frequent Patterns

it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text

it (4)him (4)

Precomputed Pattern Matchesit X him him X it

it X it him X him

Solution: Precomputationit makes him and it mars him . it sets him on and it takes him off . #

Most Frequent Patterns

it makes him and it mars him . it sets him on and it takes him off . #0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Text

it (4)him (4)

Precomputed Pattern Matchesit X him him X it

it X it him X him

(0, 2)

(0, 4)

(0, 6)

(0, 8)

(2, 4)

(2, 6)

(2, 8)

(2, 10)

(4, 6)

(4, 8)

(4, 10)

(4, 13)

(6, 8)

(6, 10)

(6, 13)

(6, 15)

(8, 10)

(8, 13)

(8, 15)(10, 13)

(10, 15)

(13, 15)

44

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

+ inverted indices

1

44

174177

221

Baseline PrefixTree

+ double binary

seconds/sentence

Timing Results

+ inverted indices

+ precomp

Analysis of Fixed Memory Usage

• Source Text: |T|

• Suffix Array: |T|

• Alignments: |T|

• Target Text: |T|

• Total Cost: 4 |T|

• For 27M words: about 700M

• including indices for 1000 words: about 2.1 Gb

• for 100 words: 1.1Gb, increases time to 1.6 secs/sent

Longer Spans, Longer Phrases

1520253035

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

1520253035

1 2 3 4 5 6 7 8 9 10

Maximum Span Length

Maximum Phrase Length

BLEU

BLEU

The Tera-Scale Translation Model

• Task: NIST Chinese-English 2005

• Baseline Model: 30.7

• Tera-Scale Model: 32.6

• All modifications contribute to overall score

• With better language model and number translation:

• Baseline Model: 31.9

• Tera-Scale Model: 34.5

Open Questions

• Can we improve speed?

• Can we improve memory use? Compressed self-indexes?

• Uses for arbitrarily large translation models?

• Context-sensitive models (Chan et al. 2007, Carpuat & Wu 2007)

• Factored models (Koehn et al. 2007)

• Syntax-based model (DeNeefe et al. 2007)

• What other algorithms can we use from bioinformatics?

Compact Adaptable TMs on GPUs

• Technique: massively parallel string pattern matching algorithms for...

• Streaming MT (Levenberg et al. 2010)

• Interactive MT (Denkowski et al. 2014)

• Low-memory MT (Lopez, 2008)

• Online learning (Simianer et al. 2012)

• New in this talk:

• New translation models

• New algorithms

• State-of-the-art translation accuracy

Streaming Translation Models

• Align new training data.

• Append to existing training data.

• Extract new phrase pairs as needed.

(Levenberg et al. 2010)

Streaming Translation Models

• Align new training data → online EM. FAST!

• Append to existing training data. FAST!

• Extract new phrase pairs as needed. SLOW...

(Levenberg et al. 2010)

For each phrase pair J:For each occurrence of J:

Extract its translation E (if possible)For each phrase pair J,E:

Compute p(E|J)

Streaming Translation Models

• Align new training data → online EM. FAST!

• Append to existing training data. FAST!

• Extract new phrase pairs as needed. SLOW...

(Levenberg et al. 2010)

For each phrase pair J:For each occurrence of J:

Extract its translation E (if possible)For each phrase pair J,E:

Compute p(E|J)

Streaming Translation Models

• Align new training data → online EM. FAST!

• Append to existing training data. FAST!

• Extract new phrase pairs as needed. SLOW...

(Levenberg et al. 2010)

For each phrase pair J:For each occurrence of J:

Extract its translation E (if possible)For each phrase pair J,E:

Compute p(E|J)

... But easily parallelizable. GPUs to the rescue!

Streaming Translation Models(Levenberg et al. 2010)

For each phrase pair J:For each occurrence of J:

Extract its translation E (if possible)For each phrase pair J,E:

Compute p(E|J)

In parallel on GPU://find all

// extract all

GPUs

• Graphics Processing Unit

• Single Instruction, Multiple Data

• Limited Memory

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

6, 10

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 3

0, 2

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 3

0, 2

Parallel pass 1:Find longest substrings

simultaneously

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 6

2, 3

2, 3

0, 2

0, 2

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 6

2, 3

2, 3

0, 2

0, 2

Suffix Arraysit makes him and it mars him . it sets him on and it takes him off . #

3 12 2 15 10 6 0 4 8 13 1 5 16 11 9 14 7 17 18 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

itpersuades

himand

disheartensit persuades

persuades himhim and

and itit disheartens

disheartens him

it persuades himpersuades him and

him and itand it disheartensit disheartens him

it persuades him andpersuades him and it

him and it disheartensand it disheartens himit persuades him and it

persuades him and it disheartens

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

6, 10

2, 6

2, 3

2, 3

0, 2

0, 2

Parallel pass 2:Find all substrings

simultaneously

Pattern Matching with Gaps312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

him _ it

Query pattern:

himit

Subpatterns:

Pattern Matching with Gaps312215106048131

...

and it mars him , it sets him ...and it takes him off . #him and it mars him . it sets ...him off . #him on and it takes him off . #him , it sets him on and it ...it makes him and it mars ...it mars him , it sets him on ...it sets him on and it takes ...it takes him off . #makes him and it mars him ...

215106

04813

Pattern Matching with Gaps

215106

04813

Pattern Matching with Gaps

215106

04813

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

on CPU

Pattern Matching with Gaps

215106

04813

(2, 4)(2, 8)(2, 13)(6, 8)(6, 13)(10, 13)

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

on CPU

Pattern Matching with Gaps

215106

04813

(2, 4)(10, 13)

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

on CPU

Pattern Matching with Gaps on GPU

215106

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Pattern Matching with Gaps on GPU

215106

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Thread 1

Thread 4Thread 3

Thread 2

Pattern Matching with Gaps on GPU

215106

it makes him and it mars him . it sets him on and it takes him off . # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Thread 1

Thread 4Thread 3

Thread 2(2,4)

(10,13)

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

akemasu / open

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

watashi wa / I

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

hako wo / the box

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

hako wo / open the box✘

Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

hako wo akemasu / open the box

Hierarchical Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

_ akemasu / open _

Hierarchical Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

_ akemasu / open _

Hierarchical Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

_1 _2 akemasu/ _1 open _2

Hierarchical Phrase ExtractionI open the box

watashi

wa

hako

wo

akemasu

watashi wa hako wo _ / I _ the box

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

0

0

1 2 3

1

2

3

4

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

watashi wa / ???

0

0

1 2 3

1

2

3

4

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

watashi wa / ???

0

0

1 2 3

1

2

3

4

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

watashi wa / ???

L,R

L’,R’

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

watashi wa / ???

L,R

L’,R’

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

watashi wa / I

L,R

L’,R’

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Phrase Extraction on GPUsI open the box

watashi

wa

hako

wo

akemasu0

0

1 2 3

1

2

3

4

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

wo akemasu / ???

L,R

L’,R’

45

Compute i’,j’ simultaneously for all (sampled) occurrences

Representing L and R ArraysI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

0

0

1 2 3

1

2

3

4

45

Fiction!

Representing L and R ArraysI open the box

watashi

wa

hako

wo

akemasu

0,1

0,1

3,4

3,4

1,2

0,2 4,5 -,- 2,4

L,R

L’,R’

7240736

7240737

7240738

Reality!

7240739

7240740

72407417582971

7582972

7582973

7582974

7582975

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

P 1 2 3 4 5

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

P 1 2 3 4 5

7582971

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

P 1 2 3 4 5

7582971

Representing L and R Arrays

watashi wa hako wo akemasu0,1 0,1 3,4 3,4 1,2

7240736 7240737 7240738 7240739 7240740

wordR,L

i

#7240735

P 1 2 3 4 5

7582971

Experiments

• NIST ’05-’06 Chinese English, train: Xinhua + UN

• Baseline: very fast CPU implementation mostly in C (Lopez 2007, 2008; Chahuneau et al. 2012)

• State-of-the art cdec MT system (Dyer et al. 2010)

• GPU: NVIDIA Tesla K20c (2496 cores, 5Gb memory)

• CPU: dual 2.90GHz Intel Xeon E5-2690

Phrase Extraction Speed

0

750

1500

2250

3000

CPU GPU (2000)

GPU (4000)

GPU (6000)

GPU (8000)

571514454400

5.4

20011794

16131356

14.2

Xinhua (27M words) Xinhua + UN (108M words)

wor

ds p

er se

cond

Translation Accuracy

• Without hierarchical phrases: 30.7

• With hierarchical phrases on CPU: 33.37

• With hierarchical phrases on GPU: 33.46

End-to-end Translation Speed

• With hierarchical phrases on CPU: 180.5s

• With hierarchical phrases on GPU: 98.9s

Conclusions• GPU pattern matching & extraction algorithms

• Fast adaptable MT with streaming models

• Low-memory footprint MT

• State of the art translation accuracy

• Language models on GPUs?

• Decoding on GPUs?