Unsupervised Machine Translation - Stanford University

Mikel Artetxe, Facebook AI Research


Page 1:

Unsupervised Machine Translation

Mikel Artetxe

Facebook AI Research

Pages 2-30: Introduction

WMT 2019 English-German

System               Score
Facebook FAIR        0.347
Microsoft sent-doc   0.311
Microsoft doc-level  0.296
HUMAN                0.240
MSRA-MADL            0.214
UCAM                 0.213
⋮                    ⋮

Super-human performance: human evaluators preferred MT to human translation!

NMT trained on 27.7M parallel sentence pairs, e.g.:

In my opinion, this exposes a serious inconsistency in industrial…          En mi opinión, esto representa una grave inconsistencia entre…
In my opinion the issues mentioned previously in the Council's…             En mi opinión, estos aspectos han sido tenidos debidamente…
In my opinion, it is clearly correct and natural to discuss issues…         En mi opinión es perfectamente correcto y natural debatir los…
In my opinion, however, we could also achieve adequate flood…               En mi opinión también podríamos conseguir una protección…
In my opinion, it is a duty of the Presidency of the European…              A mi juicio, es responsabilidad de la Presidencia de la Unión…
In my opinion, however, this report does not have acceptable…               En mi opinión, no obstante, el presente informe no ofrece…
In my opinion, the new liquidity requirements will help to solve…           En mi opinión, los nuevos requisitos de liquidez contribuirán a…
In my opinion, it is really a very good thing that Mr Potočnik…             En mi opinión, que el señor Potočnik haya sido tan inequívoco…
In my opinion, this social dimension should, in fact, be brought into…      ??? → En mi opinion, …

We would need 48 years to read that!
- Not sample efficient
- Not available for most language pairs

Can we train MT systems with zero parallel data?
Scientific & practical interest!

Previously explored in statistical decipherment (Knight et al., ACL'06; Ravi & Knight, ACL'11; Dou et al., ACL'15; inter alia)
…but strong limitations…

IMPRESSIVE PROGRESS IN THE LAST 2-3 YEARS!

Pages 31-45: Outline

L1 corpus + L2 corpus (non-parallel)  →  "magic"  →  Machine Translation

The "magic", step by step:
  L1 corpus  →  L1 embeddings
  L2 corpus  →  L2 embeddings
  L1 embeddings + L2 embeddings  →  Cross-lingual embeddings
  Cross-lingual embeddings  →  Machine Translation

Pages 46-59: Word embeddings

- Distributed representations of words
- Learned from co-occurrence patterns in a monolingual corpus

[Figure: 2D embedding space containing Moo, Meow, Bark, Dog, cat, Cow, Apple, Banana, Pear, House, Calendar, with semantically related words (e.g. Cow and Moo, cat and Meow) close together]

$\mathrm{sim}(\mathit{cow}, \mathit{cat}) \approx \cos(w_{cow}, w_{cat}) = \dfrac{w_{cow} \cdot w_{cat}}{\lVert w_{cow} \rVert \, \lVert w_{cat} \rVert}$

e.g. skip-gram with negative sampling (Mikolov et al., NIPS'13):

$\log \sigma(w \cdot c) + \sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n} \left[ \log \sigma(-w \cdot c_i) \right]$

Example: "I will go to New York by plane ." (one token as the target word $w$, a nearby token as the context $c$)
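A minimal NumPy sketch of the two formulas above (illustration only, not from the slides): it computes the cosine similarity between two word vectors and evaluates the skip-gram negative-sampling objective for one (word, context) pair. The vectors and the k = 5 negative samples are toy random values.

import numpy as np

def cosine(u, v):
    # sim(cow, cat) ≈ cos(w_cow, w_cat) = (w_cow · w_cat) / (||w_cow|| ||w_cat||)
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def sgns_objective(w, c, negatives):
    # log σ(w·c) + Σ_i E_{c_i ~ P_n}[log σ(-w·c_i)], with the expectation
    # approximated by k sampled negative contexts
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    positive = np.log(sigmoid(w @ c))
    negative = sum(np.log(sigmoid(-w @ ci)) for ci in negatives)
    return positive + negative

rng = np.random.default_rng(0)
dim = 50
w_cow, w_cat = rng.normal(size=dim), rng.normal(size=dim)     # toy word vectors
print("sim(cow, cat) ~", cosine(w_cow, w_cat))

w, c = rng.normal(size=dim), rng.normal(size=dim)             # target and context vectors
negatives = [rng.normal(size=dim) for _ in range(5)]          # k = 5 negative samples
print("SGNS objective:", sgns_objective(w, c, negatives))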

Pages 60-61: Outline (recap)

L1 corpus + L2 corpus (non-parallel)  →  L1 embeddings + L2 embeddings  →  Cross-lingual embeddings  →  Machine Translation

Pages 62-70: Cross-lingual word embedding alignment

[Figure: the source embedding space X contains Basque place names (Donostia, Bilbo, Gasteiz, Iruñea, Maule, D. Garazi, Baiona); the target space Z contains the same places under their Spanish/French names (S. Sebastián, Bilbao, Vitoria, Pamplona, Mauléon, S. J. Pie de Puerto, Bayona). Given a small seed dictionary (Bilbo→Bilbao, Baiona→Bayona, Iruñea→Pamplona), a mapping W is learned so that the mapped space XW lines up with Z.]
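To make the picture above concrete, a minimal sketch (illustration only, not from the slides) of how a learned mapping W would be used: map the source embeddings with XW and read off translations by nearest-neighbour search in Z. The tiny vocabularies and vectors below are invented placeholders, and W is constructed rather than learned.

import numpy as np

rng = np.random.default_rng(0)
dim = 20
src_words = ["Donostia", "Bilbo", "Iruñea"]              # toy source vocabulary
tgt_words = ["S. Sebastián", "Bilbao", "Pamplona"]       # toy target vocabulary

X = rng.normal(size=(len(src_words), dim))               # source embeddings (one row per word)
W = rng.normal(size=(dim, dim))                          # stand-in for a learned mapping
Z = X @ W + 0.01 * rng.normal(size=(len(src_words), dim))  # target space built so that XW ≈ Z

def translate(i):
    # nearest target word of the mapped source vector, by cosine similarity
    mapped = X[i] @ W
    sims = Z @ mapped / (np.linalg.norm(Z, axis=1) * np.linalg.norm(mapped))
    return tgt_words[int(np.argmax(sims))]

for i, word in enumerate(src_words):
    print(word, "->", translate(i))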

Pages 71-85: Cross-lingual word embedding alignment (Mikolov et al., arXiv'13)

[Figure: Basque embeddings X (Zaunka, Miau, Marru, Behi, Katu, Txakur, Banana, Sagar, Udare, Etxe, Egutegi) and English embeddings Z (Moo, Meow, Bark, Cow, cat, Dog, Banana, Apple, Pear, House, Calendar), each trained independently on a monolingual corpus.]

Given a seed dictionary of word pairs (Txakur→Dog, Sagar→Apple, …, Egutegi→Calendar), learn a mapping W that matches the embeddings of the dictionary entries:

$\begin{pmatrix} X_{1,*} \\ X_{2,*} \\ \vdots \\ X_{n,*} \end{pmatrix} W \approx \begin{pmatrix} Z_{1,*} \\ Z_{2,*} \\ \vdots \\ Z_{n,*} \end{pmatrix}$

$\arg\min_{W} \sum_{i} \lVert X_{i*} W - Z_{i*} \rVert^{2}$

[Figure: after mapping, XW is aligned with Z, so that each Basque word lands near its English translation.]
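A minimal sketch of the recipe above (illustration only, not the authors' code): stack the embeddings of the seed dictionary entries and solve the least-squares problem arg min_W Σ_i ||X_i* W − Z_i*||². The dictionary embeddings below are random placeholders; with real embeddings, X_dict and Z_dict would hold the rows of X and Z corresponding to the seed pairs.

import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 100, 5000
X_dict = rng.normal(size=(n_pairs, dim))   # source embeddings of the seed dictionary entries (toy)
Z_dict = rng.normal(size=(n_pairs, dim))   # embeddings of their translations (toy)

# W minimising sum_i ||X_i* W - Z_i*||^2, solved as an ordinary least-squares problem
W, *_ = np.linalg.lstsq(X_dict, Z_dict, rcond=None)

# apply the mapping to the whole source vocabulary
X_full = rng.normal(size=(20000, dim))
X_mapped = X_full @ W
print(W.shape, X_mapped.shape)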

Pages 86-103: Cross-lingual word embedding alignment

[Figure: the same Basque and English embedding spaces X and Z as before, but now without any seed dictionary; a mapping W is still sought such that XW aligns with Z.]

$W^{*} = \arg\min_{W \in O(d)} \sum_{i} \min_{j} \lVert X_{i*} W - Z_{j*} \rVert^{2}$
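With the orthogonality constraint W ∈ O(d) in the objective above, the mapping for a fixed word correspondence has a closed-form solution via SVD (orthogonal Procrustes). A minimal sketch with placeholder matrices (illustration only, not from the slides):

import numpy as np

def procrustes(X_dict, Z_dict):
    # arg min over orthogonal W of ||X_dict W - Z_dict||_F:
    # W = U V^T, where U S V^T is the SVD of X_dict^T Z_dict
    u, _, vt = np.linalg.svd(X_dict.T @ Z_dict)
    return u @ vt

rng = np.random.default_rng(0)
dim = 50
X_dict = rng.normal(size=(1000, dim))      # source rows of the current correspondence (toy)
Z_dict = rng.normal(size=(1000, dim))      # matched target rows (toy)

W = procrustes(X_dict, Z_dict)
print(np.allclose(W @ W.T, np.eye(dim)))   # True: W is orthogonal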

Pages 104-123: Cross-lingual word embedding alignment: self-learning

$W^{*} = \arg\min_{W \in O(d)} \sum_{i} \min_{j} \lVert X_{i*} W - Z_{j*} \rVert^{2}$

Self-learning: iterate  Dictionary → Mapping → Dictionary → Mapping → …

Seed dictionary:
- 25 word pairs
- Numeral list
- Random dict.
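A minimal sketch of the self-learning loop above (illustration only, not the authors' implementation): alternate between solving for the mapping given the current dictionary (here with the orthogonal Procrustes step) and inducing a new dictionary by nearest neighbours in the mapped space. Embeddings and the seed dictionary are random placeholders; in practice the seed could be the 25 word pairs, the numeral list, or the initialisation described on the following slides.

import numpy as np

def procrustes(X_d, Z_d):
    # orthogonal mapping minimising ||X_d W - Z_d||_F
    u, _, vt = np.linalg.svd(X_d.T @ Z_d)
    return u @ vt

def induce_dictionary(X, Z, W):
    # for each source word, the nearest target word (cosine) of its mapped vector
    XW = X @ W
    XW = XW / np.linalg.norm(XW, axis=1, keepdims=True)
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return np.argmax(XW @ Zn.T, axis=1)               # target index for every source index

rng = np.random.default_rng(0)
n, dim = 2000, 50
X = rng.normal(size=(n, dim))                         # source embeddings (toy)
Z = rng.normal(size=(n, dim))                         # target embeddings (toy)
dictionary = rng.integers(0, n, size=n)               # toy seed: source i -> target dictionary[i]

for _ in range(10):                                   # Dictionary -> Mapping -> Dictionary -> ...
    W = procrustes(X, Z[dictionary])
    dictionary = induce_dictionary(X, Z, W)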

Pages 124-145: Cross-lingual word embedding alignment: Intra-lingual similarity distribution + robust self-learning

Intra-lingual similarity distribution: compare each word against every other word of its own language; translation equivalents in different languages should have similar similarity distributions. For example, the English word "two" and the Italian word "due" (two) have similar intra-lingual similarity distributions, while "cane" (dog) does not.

    for x in vocab:
        sim("two", x)

Sort the rows of the monolingual similarity matrices so they can be compared across languages, and use the best matches as the initial dictionary (a minimal sketch follows after the list below):

$$M_X = \operatorname{sorted}(X X^{\top}) \qquad M_Z = \operatorname{sorted}(Z Z^{\top})$$

+ Robust self-learning:
- Stochastic dictionary induction
- Frequency-based vocabulary cutoff
- CSLS retrieval (Conneau et al., ICLR'18)
- Bidirectional dictionary induction
- Final symmetric re-weighting
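A rough sketch of that fully unsupervised initialization: build each language's intra-lingual similarity matrix, sort every row, and match words whose sorted similarity vectors are most alike. This is the heuristic in miniature under my own variable names; the actual method adds further normalization, CSLS retrieval and the robust self-learning steps listed above, and it assumes both vocabularies are truncated to the same size (cf. the frequency-based vocabulary cutoff).

    import numpy as np

    def sorted_similarity(E):
        """M = sorted(E E^T): each row is a word's intra-lingual similarity
        distribution, sorted so it no longer depends on vocabulary order."""
        M = E @ E.T
        return np.sort(M, axis=1)

    def init_dictionary(X, Z):
        """Match each source word to the target word with the most similar
        sorted similarity distribution (plain nearest neighbor for brevity)."""
        MX = sorted_similarity(X)                     # assumes len(X) == len(Z)
        MZ = sorted_similarity(Z)
        MX /= np.linalg.norm(MX, axis=1, keepdims=True)
        MZ /= np.linalg.norm(MZ, axis=1, keepdims=True)
        sims = MX @ MZ.T                              # cosine between distributions
        return np.arange(X.shape[0]), sims.argmax(axis=1)

    # Toy usage with random unit-norm "embeddings".
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 50)); X /= np.linalg.norm(X, axis=1, keepdims=True)
    Z = rng.normal(size=(500, 50)); Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    src, trg = init_dictionary(X, Z)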

Pages 146-156: Cross-lingual word embedding alignment: Results

Supervision      Method                           EN-IT    EN-DE    EN-FI    EN-ES
5k dict.         Mikolov et al. (2013)            34.93†   35.00†   25.91†   27.73†
                 Faruqui and Dyer (2014)          38.40*   37.13*   27.60*   26.80*
                 Shigeto et al. (2015)            41.53†   43.07†   31.04†   33.73†
                 Dinu et al. (2015)               37.7     38.93*   29.14*   30.40*
                 Lazaridou et al. (2015)          40.2     -        -        -
                 Xing et al. (2015)               36.87†   41.27†   28.23†   31.20†
                 Zhang et al. (2016)              36.73†   40.80†   28.16†   31.07†
                 Artetxe et al. (2016)            39.27    41.87*   30.62*   31.40*
                 Smith et al. (2017)              43.1     43.33†   29.42†   35.13†
                 Artetxe et al. (2018a)           45.27    44.13    32.94    36.60
25 dict.         Artetxe et al. (2017)            37.27    39.60    28.16    -
Init. heurist.   Smith et al. (2017), cognates    39.9     -        -        -
                 Artetxe et al. (2017), num.      39.40    40.27    26.47    -
None             Zhang et al. (2017), λ = 1       0.00*    0.00*    0.00*    0.00*
                 Zhang et al. (2017), λ = 10      0.00*    0.00*    0.01*    0.01*
                 Conneau et al. (2018), code‡     45.15*   46.83*   0.38*    35.38*
                 Conneau et al. (2018), paper‡    45.1     0.01*    0.01*    35.44*
                 Artetxe et al. (2018b)           48.13    48.19    32.63    37.33

Pages 157-161: Outline

L1 corpus + L2 corpus (non-parallel)
→ L1 embeddings + L2 embeddings
→ Cross-lingual embeddings
→ Machine Translation
  - Neural approach
  - Statistical approach

Pages 162-181: Unsupervised neural machine translation: Training

- Supervised (for reference; needs parallel data):
  "Une fusillade a eu lieu à l’aéroport international de Los Angeles."
  → "There was a shooting in Los Angeles International Airport."

- Denoising: corrupt a monolingual sentence (e.g. by local word shuffling) and train the model to reconstruct the original:
  "Une lieu fusillade a eu à l’aéroport de Los international Angeles."
  → "Une fusillade a eu lieu à l’aéroport international de Los Angeles."

- Backtranslation: translate a monolingual sentence with the current model, then train the reverse direction to recover the original from that synthetic translation:
  "A shooting has had place in airport international of Los Angeles."
  → "Une fusillade a eu lieu à l’aéroport international de Los Angeles."

(A small sketch of how these training pairs are built follows below.)
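Below is a small Python sketch of how the two unsupervised training signals are constructed. The noise function (local shuffling plus word dropping, as commonly used in this line of work) is concrete; `translate` stands in for the current NMT model and is stubbed out here, so the actual seq2seq update is only indicated by the (source, target) pairs it would train on.

    import random

    def add_noise(words, k=3, p_drop=0.1, rng=random.Random(0)):
        """Corrupt a sentence: drop some words and shuffle the rest locally
        (each word moves at most ~k positions), as in denoising autoencoding."""
        kept = [w for w in words if rng.random() > p_drop] or words
        keys = [i + rng.uniform(0, k) for i in range(len(kept))]
        return [w for _, w in sorted(zip(keys, kept))]

    def translate(words, direction):
        """Placeholder for the current model; a real system would decode here."""
        return words  # identity stub just to keep the sketch runnable

    src = "Une fusillade a eu lieu à l’aéroport international de Los Angeles .".split()

    # Denoising: reconstruct the clean sentence from its corrupted version.
    denoising_pair = (add_noise(src), src)

    # Backtranslation: translate with the current model, then train the reverse
    # direction to recover the original sentence from that synthetic translation.
    synthetic_en = translate(src, direction="fr->en")
    backtranslation_pair = (synthetic_en, src)

    print(denoising_pair)
    print(backtranslation_pair)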

Pages 182-183: Outline (recap): Statistical approach

Pages 184-190: Phrase-based SMT

L1 corpus + L2 corpus → Phrase-based SMT

Log-linear model combining:
- Phrase table
  - Direct/inverse translation probabilities
  - Direct/inverse lexical weightings

Example phrase table entries (a sketch of how such scores are estimated in the standard pipeline follows below):

                                    φ(f̄|ē)   φ(ē|f̄)   lex(f̄|ē)   lex(ē|f̄)
    nire iritziz → in my opinion    0.54      0.63      0.12        0.15
    nire iritziz → in my view       0.32      0.68      0.09        0.16
    nire iritziz → I think          0.11      0.09      0.04        0.02
    opari bat → a present           0.32      0.56      0.21        0.22
    opari bat → one present         0.14      0.73      0.18        0.32
    opari bat → a gift              0.11      0.49      0.11        0.13
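For intuition, here is a hedged sketch of the relative-frequency estimation behind the translation probabilities in the standard supervised pipeline. The toy phrase pairs and counts are invented; in a real system they come from word-aligned parallel sentences, and the lexical weightings (not computed here) are derived from those word alignments.

    from collections import Counter

    # Toy extracted phrase pairs (source phrase, target phrase).
    phrase_pairs = [
        ("nire iritziz", "in my opinion"),
        ("nire iritziz", "in my opinion"),
        ("nire iritziz", "in my view"),
        ("opari bat", "a present"),
        ("opari bat", "a gift"),
    ]

    pair_counts = Counter(phrase_pairs)
    src_counts = Counter(src for src, _ in phrase_pairs)
    trg_counts = Counter(trg for _, trg in phrase_pairs)

    # Direct and inverse translation probabilities by relative frequency.
    p_trg_given_src = {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}
    p_src_given_trg = {(s, t): c / trg_counts[t] for (s, t), c in pair_counts.items()}

    for (s, t) in pair_counts:
        print(s, "->", t,
              round(p_trg_given_src[(s, t)], 2),
              round(p_src_given_trg[(s, t)], 2))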

Pages 191-201: Phrase-based SMT (continued)

Log-linear model combining (continued):
- Language model
  - N-gram frequency counts with back-off and smoothing
- Reordering model
  - Distortion model (distance based)
  - Lexical reordering model
- Word/phrase penalty
  - Fixed score to control the length of the output

All four components (phrase table, language model, reordering model, word/phrase penalty) are weighted and combined in the log-linear model; a minimal sketch of that combination follows below.
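As a minimal illustration of the log-linear combination itself (not of any particular decoder), a translation hypothesis is scored as a weighted sum of log-domain feature scores. The feature values and weights below are invented placeholders.

    import math

    def loglinear_score(features, weights):
        """Score of a hypothesis = sum over components of weight * feature."""
        return sum(weights[name] * value for name, value in features.items())

    # Hypothetical feature values for one candidate translation.
    features = {
        "log_phrase_table": math.log(0.54) + math.log(0.63),  # direct + inverse probs
        "log_lm": -12.3,                                       # language model log-prob
        "distortion": -2.0,                                    # reordering penalty
        "word_penalty": -9.0,                                  # controls output length
    }
    weights = {"log_phrase_table": 1.0, "log_lm": 0.5,
               "distortion": 0.3, "word_penalty": 0.2}

    print(loglinear_score(features, weights))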

Pages 202-219: Unsupervised phrase-based SMT

Learn each component from monolingual corpora only (L1 corpus + L2 corpus, non-parallel):

- Phrase table  TRICKY…
  - Direct/inverse translation probabilities
  - Direct/inverse lexical weightings
- Language model  EASY!!!
  - N-gram frequency counts with back-off and smoothing (a tiny sketch follows below)
- Reordering model  EASY!!!
  - Distortion model (distance based)
  - Lexical reordering model
- Word/phrase penalty  EASY!!!
  - Fixed score to control the length of the output
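The "easy" components really can be estimated from raw monolingual text alone. As a sketch, here is a tiny bigram language model with add-k smoothing; real systems use higher-order models with back-off and modified Kneser-Ney smoothing (e.g. via KenLM), and the corpus below is just a stand-in.

    from collections import Counter
    import math

    corpus = [
        "there was a shooting in los angeles international airport".split(),
        "i will go to new york by plane".split(),
    ]

    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    V = len(unigrams)

    def logprob(sentence, k=0.1):
        """Add-k smoothed bigram log-probability of a tokenized sentence."""
        tokens = ["<s>"] + sentence + ["</s>"]
        lp = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            lp += math.log((bigrams[(prev, word)] + k) / (unigrams[prev] + k * V))
        return lp

    print(logprob("i will go to los angeles".split()))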

Pages 220-224: Unsupervised phrase-based SMT: inducing the phrase table

L1 corpus → L1 n-gram embeddings
L2 corpus → L2 n-gram embeddings
L1 + L2 n-gram embeddings → Cross-lingual mapping → Phrase table

Pages 225-236: Unsupervised phrase-based SMT: phrase translation probabilities

[Figure: the sentence "I will go to New York by plane ." annotated with a word 𝑤, a phrase 𝑝 and a context 𝑐, illustrating how n-gram embeddings are learned from monolingual text.]

For each $\bar{e}$, estimate $\phi(\bar{f} \mid \bar{e})$ over its 100 nearest neighbours:

$$\phi(\bar{f} \mid \bar{e}) = \frac{\exp\big(\cos(\bar{e}, \bar{f}) / \tau\big)}{\sum_{\bar{f}'} \exp\big(\cos(\bar{e}, \bar{f}') / \tau\big)}$$

Page 237: Unsupervised Machine Translation - Stanford UniversityIntroduction WMT 2019 English-German System Score Facebook FAIR 0.347 Microsoft sent-doc 0.311 Microsoftdoc-level 0.296 HUMAN

Unsupervised phrase-based SMTLearn components from monolingual corpora

- Phrase table TRICKY…- Direct/inverse translation probabilities- Direct/inverse lexical weightings

- Language model EASY!!!- N-gram frequency counts with back-off and smoothing

- Reordering model EASY!!!- Distortion model (distance based)- Lexical reordering model

- Word/phrase penalty EASY!!!- Fixed score to control the length of the output

L2 n-gram embeddings

Phrase-based SMT

Distortion model

Phrasetable

Language model

Word/phrase penalty

L1 n-gram embeddings

Cross-lingual mapping

L1 corpus L2 corpus

𝜙 ̅𝑓|�̅� =𝑒 ⁄"#$ &̅,(̅ )

∑(* 𝑒⁄"#$ &̅,(* )

For each �̅�, estimate 𝜙 ̅𝑓|�̅� for 100-NN:

min!$̅#

log𝜙 ̅𝑓|NN ̅$ ̅𝑓 +$̅$

log𝜙 �̅�|NN ̅# �̅�
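To make the induction step concrete, here is a minimal sketch (not the original implementation) that derives the direct translation probabilities from already-mapped cross-lingual n-gram embeddings, keeping the 100 nearest neighbours of each source phrase; the function name, the temperature value and the data layout are assumptions for illustration.

```python
import numpy as np

def induce_phrase_table(src_vecs, tgt_vecs, src_phrases, tgt_phrases,
                        k=100, temperature=0.1):
    """Direct translation probabilities phi(f|e): a softmax over cosine
    similarities of cross-lingual n-gram embeddings, restricted to the k
    nearest target phrases of each source phrase (illustrative sketch)."""
    # Length-normalise so that dot products equal cosine similarities.
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)

    table = {}
    for i, e in enumerate(src_phrases):
        sims = tgt @ src[i]                      # cosine similarity to every target phrase
        nn = np.argsort(-sims)[:k]               # keep the 100 nearest neighbours
        scores = np.exp(sims[nn] / temperature)  # softmax over the retained candidates
        probs = scores / scores.sum()
        table[e] = {tgt_phrases[j]: float(p) for j, p in zip(nn, probs)}
    return table
```

The inverse probabilities are obtained in the same way with the roles of the two languages swapped.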

Unsupervised phrase-based SMT
Learn components from monolingual corpora

- Phrase table: TRICKY… → EASY!!! (induced from the cross-lingual n-gram embeddings as above)
- Language model: EASY!!!
- Reordering model: EASY!!!
- Word/phrase penalty: EASY!!!

Unsupervised phrase-based SMT: tuning

[Diagram: the same pipeline as above, now followed by a tuning step.]

Goal: Adjust the weights of the resulting log-linear model to optimize some evaluation metric (e.g. BLEU)

… but we don't have a parallel development set!

Solutions:
- Default weights without tuning (Lample et al., EMNLP'18)
- Alternating optimization with back-translation (Artetxe et al., EMNLP'18)
- Unsupervised optimization criterion (Artetxe et al., ACL'19):

  L = L_cycle(E) + L_cycle(F) + L_lm(E) + L_lm(F)

  - L_cycle(E) = 1 − BLEU(T_F→E(T_E→F(E)), E)
  - L_lm(E) = max(0, H(F) − H(T_E→F(E)))² · LP
    where LP = LP(E) · LP(F) and LP(E) = max(1, len(T_F→E(T_E→F(E))) / len(E))
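Reading the criterion as code may help. The sketch below is a loose, assumption-laden rendering of the reconstructed formulas above, not the authors' implementation: `t_ef`, `t_fe`, `lm_entropy_e` and `lm_entropy_f` are hypothetical callables for the two translation directions and for per-word entropy under each language model.

```python
import sacrebleu

def unsupervised_tuning_loss(mono_e, mono_f, t_ef, t_fe, lm_entropy_e, lm_entropy_f):
    """Cycle-consistency BLEU plus a language-model entropy term with a length
    penalty, following the reconstructed criterion above (illustrative sketch)."""
    e_roundtrip = t_fe(t_ef(mono_e))   # E -> F -> E
    f_roundtrip = t_ef(t_fe(mono_f))   # F -> E -> F

    # Cycle-consistency losses: 1 - BLEU(round trip, original), with BLEU in [0, 1].
    l_cycle_e = 1 - sacrebleu.corpus_bleu(e_roundtrip, [mono_e]).score / 100
    l_cycle_f = 1 - sacrebleu.corpus_bleu(f_roundtrip, [mono_f]).score / 100

    # Length penalty: discourage round trips that inflate the output length.
    def lp(round_trip, original):
        return max(1.0, sum(len(s.split()) for s in round_trip)
                        / sum(len(s.split()) for s in original))
    length_penalty = lp(e_roundtrip, mono_e) * lp(f_roundtrip, mono_f)

    # LM losses: translations should not be more predictable (lower entropy) than real text.
    l_lm_e = max(0.0, lm_entropy_f(mono_f) - lm_entropy_f(t_ef(mono_e))) ** 2 * length_penalty
    l_lm_f = max(0.0, lm_entropy_e(mono_e) - lm_entropy_e(t_fe(mono_f))) ** 2 * length_penalty

    return l_cycle_e + l_cycle_f + l_lm_e + l_lm_f
```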

Unsupervised phrase-based SMT: iterative refinement

[Diagram: the induced system is refined by alternating back-translation between an L1 → L2 SMT system and an L2 → L1 SMT system, each retrained on the synthetic parallel data produced by the other, before a final tuning step.]

- The initial L1 → L2 system translates the L1 training corpus, producing a synthetic L2' train set.
- An L2 → L1 SMT system is trained on the (L2' train, L1 train) pair. Let's look inside: the phrase table extracted from this pair is used for p(l̄1 | l̄2); estimating p(l̄2 | l̄1) from it would mean predicting machine-generated phrases (spurious targets!), so that direction comes from the symmetric run instead.
- Symmetrically, the L2 → L1 system back-translates the L2 corpus into a synthetic L1' train set, from which the L1 → L2 system is retrained (providing p(l̄2 | l̄1)).
- Each retraining also relearns the lexical reordering model and applies unsupervised tuning, and the whole procedure is iterated; a sketch of the loop follows.
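A minimal sketch of this alternating refinement loop, under the assumption that `translate`, `train_smt` and `unsupervised_tune` are available helpers (they stand in for decoding with the current system, Moses-style training on a synthetic corpus, and the tuning criterion sketched earlier; all three names are hypothetical):

```python
def refine(smt_l1_to_l2, smt_l2_to_l1, mono_l1, mono_l2, iterations=3):
    """Alternating back-translation refinement of two unsupervised SMT systems.
    Illustrative sketch: translate / train_smt / unsupervised_tune are placeholders."""
    for _ in range(iterations):
        # Back-translate L1 with the current L1->L2 system to get synthetic L2'.
        synthetic_l2 = translate(smt_l1_to_l2, mono_l1)
        # Retrain the reverse system on (synthetic L2', real L1): phrase table,
        # lexical reordering, etc. are re-estimated from this synthetic corpus.
        smt_l2_to_l1 = train_smt(src=synthetic_l2, tgt=mono_l1)
        smt_l2_to_l1 = unsupervised_tune(smt_l2_to_l1, mono_l1, mono_l2)

        # Symmetrically, back-translate L2 and retrain the L1->L2 system.
        synthetic_l1 = translate(smt_l2_to_l1, mono_l2)
        smt_l1_to_l2 = train_smt(src=synthetic_l1, tgt=mono_l2)
        smt_l1_to_l2 = unsupervised_tune(smt_l1_to_l2, mono_l1, mono_l2)
    return smt_l1_to_l2, smt_l2_to_l1
```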

Unsupervised phrase-based SMT: the complete pipeline

[Diagram: L1 and L2 corpora → n-gram embeddings → cross-lingual mapping → phrase table, combined with the language model, distortion model and word/phrase penalty into a phrase-based SMT system, followed by tuning and iterative refinement; an NMT component is added next.]

but…

NMT >> SMT

(unsupervised) SMT has a hard ceiling!

NMT hybridization

[Diagram: the refined unsupervised SMT systems bootstrap two NMT models through synthetic parallel data, which is then progressively replaced by the NMT models' own back-translations.]

- The L1 → L2 SMT system translates the L1 train corpus into a synthetic L2' train set, used to train an L2 → L1 NMT model; symmetrically, the L2 → L1 SMT system produces an L1' train set for the L1 → L2 NMT model.
- As training progresses, the NMT models take over back-translation, and the share of SMT-generated synthetic data is annealed away:

  N_SMT = N · max(0, 1 − t/a)
  N_NMT = N − N_SMT
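A small sketch of this annealing schedule, where t is the current training iteration, a the iteration at which SMT back-translations are fully phased out, and N the number of synthetic sentences per iteration (the function and argument names are illustrative):

```python
def synthetic_data_schedule(t, total_sentences, anneal_until):
    """Split each iteration's synthetic data between SMT and NMT back-translations:
    the SMT share decays linearly and vanishes after `anneal_until` iterations."""
    n_smt = int(total_sentences * max(0.0, 1.0 - t / anneal_until))
    n_nmt = total_sentences - n_smt
    return n_smt, n_nmt

# Halfway through the annealing period the mix is even:
# synthetic_data_schedule(t=5, total_sentences=1_000_000, anneal_until=10) -> (500000, 500000)
```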


Machine Translation

Outline
- Neural approach
- Statistical approach

General recipe

[Diagram: non-parallel L1 and L2 corpora → monolingual L1 and L2 embeddings → cross-lingual embeddings → machine translation.]

- Initialization
- Training
  - Denoising (a sketch of a typical noise function follows this list)
  - Back-translation
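Denoising trains the model to reconstruct a sentence from a corrupted copy of itself. A minimal sketch of a typical noise function of the kind used in this line of work (random word dropping plus local shuffling; the probabilities and window size are illustrative choices, not the exact settings of any particular paper):

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3):
    """Corrupt a token sequence for denoising training: randomly drop words and
    locally shuffle the rest within a bounded window (illustrative parameters)."""
    kept = [tok for tok in tokens if random.random() > drop_prob]
    if not kept:                       # never return an empty sentence
        kept = list(tokens[:1])
    # Local shuffle: each token moves at most `shuffle_window` positions.
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(kept))]
    return [tok for _, tok in sorted(zip(keys, kept), key=lambda pair: pair[0])]

# add_noise("I will go to New York by plane .".split())
# might return ['I', 'go', 'will', 'to', 'New', 'by', 'York', 'plane', '.']
```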

Initialization
- Cross-lingual word embeddings (Artetxe et al., ICLR'18; Lample et al., ICLR'18)
- Deep pretraining (Conneau & Lample, NeurIPS'19; Song et al., ICML'19; Liu et al., arXiv'20)
  - Pretrain the full network through masked language modeling (e.g., multilingual BERT, XLM, MASS, mBART)
  - Works well, but very expensive to train
  - Masking can be seen as a form of noise!
- Synthetic parallel corpus (Marie & Fujita, arXiv'18; Artetxe et al., ACL'19)
  - Basic: word-for-word translation using cross-lingual word embeddings (see the sketch after this list)
  - Better: unsupervised statistical machine translation
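The word-for-word baseline simply replaces each source word with its nearest neighbour in the shared cross-lingual embedding space. A minimal sketch, assuming the word embeddings have already been mapped into a common space (the function name and data layout are illustrative):

```python
import numpy as np

def word_for_word_translate(sentence, src_word2vec, tgt_words, tgt_matrix):
    """Translate a sentence word by word via nearest neighbours in a shared
    cross-lingual embedding space (sketch; unknown words are copied through)."""
    tgt_norm = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    output = []
    for word in sentence.split():
        if word not in src_word2vec:
            output.append(word)                        # copy OOV words as-is
            continue
        vec = src_word2vec[word]
        sims = tgt_norm @ (vec / np.linalg.norm(vec))  # cosine similarity to all target words
        output.append(tgt_words[int(np.argmax(sims))])
    return " ".join(output)
```

The resulting synthetic sentence pairs are then used to kick-start the back-translation loop.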

Results

- Languages: French-English, German-English
- Training: WMT-14 News Crawl
- Test set: WMT-14 newstest (BLEU)

                                                           FR-EN  EN-FR  DE-EN  EN-DE
Unsupervised
  Cross-lingual embs. (Artetxe et al., ICLR'18)*            15.6   15.1   10.2    6.6
  + scaling up (Conneau & Lample, NeurIPS'19)*              29.4   29.4      -      -
  Deep pre-training (Conneau & Lample, NeurIPS'19)*         33.3   33.4      -      -
  Unsup SMT + NMT (Artetxe et al., ACL'19)*                 33.5   36.2   27.0   22.5
    detok. SacreBLEU                                        33.2   33.6   26.4   21.2
Supervised
  WMT winner                                                35.0   35.8   29.0   20.6
  Original transformer (Vaswani et al., NIPS'17)*              -   41.0      -   28.4
  Large scale back-translation (Edunov et al., EMNLP'18)       -   43.8      -   33.8

*Tokenized BLEU (about 1-2 points higher)

Conclusions

Unsupervised machine translation is a reality!
- General recipe: back-translation + smart initialization
  - Align monolingual representations
  - Generalize from word-level to sentence-level translation

Strong results
- Competitive with the supervised state of the art from 6 years ago
- …in just 2-3 years!

What's next?

Thank you!

Twitter: @artetxem
Email: [email protected]