machine translation · 2019-05-13 · rule-based machine translation a. creating rules is long and...

Machine Translation

Kairit’s NLP course · May 8, 2019 · Mark Fishel

Plan

● motivation
● how it works
   ○ rule-based / statistical / neural MT
● NMT in practice
   ○ text domain
   ○ low-resource settings
   ○ our work

Very brief history of MT, beginning:

Warren Weaver, 1947: “When I look at an article in Russian, I say ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”

Very brief history of MT, not so long ago:

I know you are the dream girl of millions of your fans; en → de:

Very brief history of MT, now:

Use-cases:

1. post-editing
   ○ translate automatically, fix manually (“post-edit”)
   ○ works best with “boring” texts (e.g. legal, technical, etc.)
      ■ can be 20%–90% faster than manual translation

What human translators think:

Use-cases:

2. gisting
   ○ an approximate, partly erroneous text in a language you understand is better than a perfect text in a language you don’t
   ○ Google Translate, Bing Translator, etc.

Rule-based MT

Rule-based machine translation

1. Let’s manually create a word/phrase dictionary, reordering and inflection rules, etc. E.g.:

Dictionary:
words:
● house → maja
● car → auto
● a/the → ???
● park → park/parkima?
● cool → jahe/lahe?
phrases:
● of course → muidugi

Rules:
● in X (en) → X in the inessive case (et)
   ○ e.g. “in Tallinn” = “Tallinnas”, “in the box” = “karbis”
● in three hours → kolme tunniga?? (comitative, not inessive)

a. creating rules is long and boring
b. the rules have to be extremely complicated if we want the translation system to handle a variety of texts
c. new language pair = start from scratch (practically no reusability)
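
The dictionary-plus-rules idea can be sketched in a few lines of Python. This is a toy illustration, not the course's system: the lexicon, the hypothetical `GENITIVE_STEM` table (the Estonian inessive is formed by adding -s to the genitive stem, e.g. Tallinna → Tallinnas, karbi → karbis) and the `translate` function are assumptions made for the example, and the single rule only looks at the one word after “in”.

```python
# Hypothetical toy lexicon (English -> Estonian), based on the slide examples.
WORDS = {"house": "maja", "car": "auto", "box": "karp"}
PHRASES = {"of course": "muidugi"}
# Hypothetical genitive stems: the Estonian inessive is genitive stem + "s".
GENITIVE_STEM = {"maja": "maja", "auto": "auto", "karp": "karbi", "Tallinn": "Tallinna"}
# Estonian has no articles, so "a"/"an"/"the" are simply dropped.
ARTICLES = {"a", "an", "the"}


def translate(text: str) -> str:
    """Dictionary lookup plus one rule: 'in X' (en) -> X in the inessive (et)."""
    if text in PHRASES:  # whole-input phrase lookup only, to keep the sketch short
        return PHRASES[text]
    tokens = [tok for tok in text.split() if tok.lower() not in ARTICLES]
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "in" and i + 1 < len(tokens):
            # The rule fires blindly on the next word, whatever it is.
            lemma = WORDS.get(tokens[i + 1], tokens[i + 1])
            out.append(GENITIVE_STEM.get(lemma, lemma) + "s")
            i += 2
        else:
            # Words missing from the dictionary are passed through untranslated.
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)


print(translate("in Tallinn"))      # Tallinnas
print(translate("in the box"))      # karbis
print(translate("of course"))       # muidugi
print(translate("in three hours"))  # threes hours (should be "kolme tunniga")
```

The last call shows drawback b in miniature: the rule also fires on a time expression where Estonian needs a different case, and fixing that means writing yet another hand-made rule.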

Statistical Translation

ET: ?
LV: Vai tev ir labāka ideja? (“Do you have a better idea?”)

Statistical Translation

ET: Kas sul on parem idee?
LV: Vai tev ir labāka ideja?
(“Do you have a better idea?”)

ET: Mul on parem idee.
LV: Man ir labāka ideja.
(“I have a better idea.”)

ET: Kas sul on tõlketekste?
LV: Vai tev ir pārtulkotu tekstu?
(“Do you have any translated texts?”)

ET: Sul peaks olema palju tõlketekste.
LV: Tev jābūt daudz pārtulkotu tekstu.
(“You should have a lot of translated texts.”)

Actual translation:
● segment input
● translate pieces
● reorder
● put in context
● …
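
Learning word translations from parallel text can be illustrated with IBM Model 1, a classic word-alignment model trained with expectation-maximization (EM). The slides do not say which statistical model they have in mind, so this is only a representative sketch: it estimates word translation probabilities t(ET word | LV word) from the three sentence pairs above (lowercased, punctuation removed), with no NULL word and none of the segmentation, reordering or decoding steps listed above.

```python
from collections import defaultdict

# The three example sentence pairs: (Latvian source, Estonian target),
# lowercased and with punctuation removed.
CORPUS = [
    ("man ir labāka ideja", "mul on parem idee"),
    ("vai tev ir pārtulkotu tekstu", "kas sul on tõlketekste"),
    ("tev jābūt daudz pārtulkotu tekstu", "sul peaks olema palju tõlketekste"),
]


def train_ibm1(corpus, iterations=20):
    """Return t[f][e] = P(target word e | source word f), estimated with EM."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    tgt_vocab = {e for _, tgt in pairs for e in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start from uniform translation probabilities.
    t = defaultdict(lambda: defaultdict(lambda: uniform))
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        for src, tgt in pairs:
            for e in tgt:
                # E-step: split one count for e among the source words of the
                # sentence, proportionally to the current t(e | f).
                norm = sum(t[f][e] for f in src)
                for f in src:
                    counts[f][e] += t[f][e] / norm
        # M-step: renormalise the expected counts for every source word.
        t = {}
        for f, row in counts.items():
            total = sum(row.values())
            t[f] = {e: c / total for e, c in row.items()}
    return t


def best_translation(t, f):
    """Most probable target word for source word f."""
    return max(t[f], key=t[f].get)


t = train_ibm1(CORPUS)
for word in ["vai", "ir"]:
    print(word, "->", best_translation(t, word))
```

On a corpus this small the model can single out pairs that co-occur consistently (vai with kas, ir with on), but words that only ever appear together (man, labāka, ideja) end up with identical distributions, and tev gives sul and tõlketekste exactly the same probability. Only more parallel text breaks such ties.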

Neural MT, autoregression
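
In autoregressive decoding the translation is produced one token at a time, each token conditioned on the source sentence and on the tokens generated so far, until an end-of-sentence token appears. A minimal greedy-decoding sketch: `toy_model` is a hypothetical hard-coded lookup table standing in for a trained network (it even ignores the source), and real systems often use beam search instead of picking the single best token at each step.

```python
from typing import Callable, Dict, List

# A "model" maps (source sentence, target prefix) to P(next token | source, prefix).
NextTokenModel = Callable[[str, List[str]], Dict[str, float]]

END = "</s>"  # end-of-sentence token


def greedy_decode(model: NextTokenModel, source: str, max_len: int = 50) -> List[str]:
    prefix: List[str] = []
    for _ in range(max_len):
        probs = model(source, prefix)      # distribution over the next token
        token = max(probs, key=probs.get)  # greedy choice: most probable token
        if token == END:
            break
        prefix.append(token)               # the output is fed back as input
    return prefix


def toy_model(source: str, prefix: List[str]) -> Dict[str, float]:
    # Hypothetical lookup table instead of a neural network; ignores `source`.
    table = {
        (): {"Kas": 0.9, "Mul": 0.1},
        ("Kas",): {"sul": 0.8, "on": 0.2},
        ("Kas", "sul"): {"on": 0.95, END: 0.05},
        ("Kas", "sul", "on"): {"parem": 0.7, "hea": 0.3},
        ("Kas", "sul", "on", "parem"): {"idee?": 0.9, END: 0.1},
    }
    return table.get(tuple(prefix), {END: 1.0})


print(" ".join(greedy_decode(toy_model, "Vai tev ir labāka ideja?")))
# Kas sul on parem idee?
```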

Neural MT, encoder-decoder

Subwords
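
One widely used way to build a subword vocabulary is byte-pair encoding (BPE): start from characters and repeatedly merge the most frequent adjacent pair of symbols. The slide does not say which subword method is meant, so this is a minimal sketch of BPE learning and application under that assumption; the word counts are made up for illustration, and the words come from the jahe/lahe example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Symbols = Tuple[str, ...]
END = "</w>"  # end-of-word marker


def pair_counts(vocab: Dict[Symbols, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Counter = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts


def merge(pair: Tuple[str, str], vocab: Dict[Symbols, int]) -> Dict[Symbols, int]:
    """Replace every occurrence of `pair` with a single joined symbol."""
    merged: Dict[Symbols, int] = {}
    for symbols, freq in vocab.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    vocab = {tuple(word) + (END,): c for word, c in word_counts.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)  # on ties, the first pair seen wins
        merges.append(best)
        vocab = merge(best, vocab)
    return merges


def segment(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Apply the learned merges, in order, to a (possibly unseen) word."""
    symbols: Symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = next(iter(merge(pair, {symbols: 1})))
    return list(symbols)


merges = learn_bpe({"lahe": 5, "jahe": 3, "lahedam": 2}, num_merges=3)
print(merges)                   # [('a', 'h'), ('ah', 'e'), ('ahe', '</w>')]
print(segment("kahe", merges))  # ['k', 'ahe</w>']
```

The unseen word kahe is split into the single character k plus the learned subword ahe</w>, so as long as its characters are known, a word never has to be treated as completely out of vocabulary.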

Neural MT, attention mechanism

Src: lielakeda.lv

Neural MT, self-attention (Transformer)

Src: Illustrated Transformer
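
The core of the Transformer's self-attention is scaled dot-product attention, softmax(Q Kᵀ / √d_k) V, where the queries Q, keys K and values V are all projections of the same input sequence. A dependency-free sketch using plain lists; the input vectors and identity projection matrices are made-up toy values, and multi-head attention, masking and positional encodings are left out.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    # Self-attention: queries, keys and values all come from the same input x.
    return attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))


# Three toy 2-dimensional token vectors and identity projections (illustration only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` says how much one token attends to every token of the same sentence; in a trained model the projection matrices are learned rather than fixed to the identity.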

Text domain

Back-translation
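
Back-translation turns target-language monolingual text into extra training data: a reverse (target→source) model translates it, and each synthetic source sentence is paired with the real target sentence it came from. A minimal sketch of the data step only, for an LV→ET system; `toy_reverse_model` and its lookup table are hypothetical stand-ins for a trained ET→LV system.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(target_mono: List[str], reverse_model: Callable[[str], str]) -> List[Pair]:
    """Build synthetic pairs: machine-translated source, human-written target."""
    return [(reverse_model(tgt), tgt) for tgt in target_mono]


def augment(parallel: List[Pair], target_mono: List[str],
            reverse_model: Callable[[str], str]) -> List[Pair]:
    # The forward (LV -> ET) model is then trained on real + synthetic pairs.
    return parallel + back_translate(target_mono, reverse_model)


# Hypothetical ET -> LV "model": a lookup table, for illustration only.
ET_TO_LV = {"Mul on parem idee.": "Man ir labāka ideja."}


def toy_reverse_model(et_sentence: str) -> str:
    return ET_TO_LV.get(et_sentence, "<unk>")


parallel = [("Vai tev ir labāka ideja?", "Kas sul on parem idee?")]
mono_et = ["Mul on parem idee."]
data = augment(parallel, mono_et, toy_reverse_model)
print(data)
```

The usual rationale for putting the synthetic text on the source side is that the target side stays human-written, so the decoder still learns from fluent target-language text even when the back-translated input contains errors.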

Unsupervised translation

Politeness

Multilinguality

“Monolingual translation”

Practice: QE (quality estimation), PE (post-editing)
Controllability and usability

Technical requirements

Summary:
● RBMT vs SMT vs NMT
● subwords + self-attention
● low-resource tricks
● “bonuses”
● practice ≠ theory

That’s it!