Page 1: Multi Language Support for Virtual Assistantsweb.stanford.edu/class/cs294s/slides/project_pitch_multi-language... · Translation Model (e.g. Google Translate) display all review descriptions

가상 어시스턴트를 위한 다국어 지원

April 2020

Soporte multilenguaje para asistentes virtuales

对虚拟助手的多语言支持

Supporto multilingue per assistenti virtuali

پشتیبانی چند زبانه برای دستیاران مجازی

Prise en charge multilingue pour les assistants virtuels

Suporte em vários idiomas para assistentes virtuais

Multi Language Support for Virtual Assistants

वर्चुअल असिस्टेंट के लिए मल्टी लैंग्वेज सपोर्ट

仮想アシスタントの多言語サポート

Overview

Goals:

• Extending the current capabilities of Almond to other languages in a cost- and time-efficient manner

• Avoiding template development for each new language

Solution:

Data collection strategy:

• Using neural machine translation models to produce translated sentences

• Improving translation quality using domain-dependent rules

Training strategies:

• Joint and sequential training

• Enforcing low variance across the encoded outputs of the same sentence in different languages

Data Collection method

English Dataset (Sentence, Program)

display all review descriptions authored by Jennifer .

now => [description] of @restaurant.review, author == " Jennifer ") => notify

Pre-Processing

Pre-processing rules:

- Detokenize punctuation
- Replace NUMBER with actual values
- Lower case all parameter values
- …

Neural Machine Translation Model (e.g. Google Translate)

muestra todas las descripciones de las reseñas creadas por " Jennifer ".

Post-Processing

Parameter Matching

Post-processing rules:

- Replace verbs with their imperative form
- Insert missing prepositions
- Replace translated parameter values with real values from target language
- …

Dataset in target language (Sentence, Program)

muestra todas las descripciones de las reseñas escritas por juan .

now => [description] of @restaurant.review, author == " juan ") => notify

Feedback Collection & Rule Generation
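
The data collection pipeline above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the project's code: the function names are made up here, `translate` is a hypothetical lookup table standing in for a neural machine translation service, and only two of the listed rules are shown (detokenizing punctuation before translation, and replacing the parameter value after translation).

```python
import re


def preprocess(sentence):
    # Pre-processing rule: detokenize punctuation ("Jennifer ." -> "Jennifer.")
    # so the translation model sees natural-looking text.
    return re.sub(r"\s+([.,!?])", r"\1", sentence)


def translate(sentence):
    # Hypothetical stand-in for an NMT service such as Google Translate.
    # A real pipeline would call the service here.
    lookup = {
        "display all review descriptions authored by Jennifer.":
            "muestra todas las descripciones de las reseñas creadas por Jennifer.",
    }
    return lookup[sentence]


def postprocess(sentence, program, source_value, target_value):
    # Parameter matching: the value must appear in both the translated
    # sentence and the program, otherwise the pair cannot be aligned.
    if source_value not in sentence or source_value not in program:
        raise ValueError("parameter value not found in sentence and program")
    # Post-processing rule: replace the parameter value with a value from
    # the target language, in the sentence and the program alike.
    sentence = sentence.replace(source_value, target_value)
    program = program.replace(source_value, target_value)
    # Re-tokenize punctuation so the sentence matches the dataset format.
    sentence = re.sub(r"([.,!?])", r" \1", sentence)
    return sentence, program


def build_example(sentence, program, source_value, target_value):
    translated = translate(preprocess(sentence))
    return postprocess(translated, program, source_value, target_value)


if __name__ == "__main__":
    new_sentence, new_program = build_example(
        "display all review descriptions authored by Jennifer .",
        'now => [description] of @restaurant.review, author == " Jennifer ") => notify',
        "Jennifer",
        "juan",
    )
    print(new_sentence)
    print(new_program)
```

The program is never translated; only its parameter value changes, which is why the same replacement is applied to both the sentence and the program.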

Naive Training

display all review descriptions authored by Jennifer .

muestra todas las descripciones de las reseñas creadas por Jennifer .

显示Jennifer撰写的所有评论描述。

Encoder

Decoder

now => [description] of @restaurant.review, author == " Jennifer ") => notify

Decoder Loss

We are not using the “knowledge” that these sentences are semantically equivalent
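
A minimal sketch of what a decoder-only objective looks like, assuming the decoder loss is the average negative log-likelihood of the gold program tokens (the slides do not state the exact loss, so this is an assumption, as are the function names and toy probabilities). Each language's sentence contributes its own term, and nothing ties the terms together.

```python
import math


def decoder_loss(step_probs, target_tokens):
    # step_probs: one dict per decoding step, mapping token -> probability.
    # Returns the average negative log-likelihood of the gold tokens.
    total = 0.0
    for probs, token in zip(step_probs, target_tokens):
        total += -math.log(probs[token])
    return total / len(target_tokens)


def naive_loss(per_language_step_probs, target_tokens):
    # Every sentence (English, Spanish, Chinese, ...) is treated as an
    # independent example that happens to share the same target program.
    losses = [decoder_loss(p, target_tokens) for p in per_language_step_probs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    target = ["now", "=>", "notify"]
    english = [{"now": 0.9}, {"=>": 0.8}, {"notify": 0.9}]
    spanish = [{"now": 0.7}, {"=>": 0.6}, {"notify": 0.5}]
    print(naive_loss([english, spanish], target))
```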

Training with sentence batching

display all review descriptions authored by Jennifer .

muestra todas las descripciones de las reseñas creadas por Jennifer .

显示Jennifer撰写的所有评论描述。

Batching

Encoder

Encoder Loss

Decoder

now => [description] of @restaurant.review, author == " Jennifer ") => notify

Decoder Loss

We now use both losses to guide the training
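
One way to read the encoder loss is as a variance penalty over the encodings of the same sentence in different languages, added to the decoder loss. The sketch below assumes exactly that: the per-dimension variance formula, the `weight` parameter, and the function names are assumptions for illustration, not the project's implementation.

```python
def encoder_loss(encodings):
    # encodings: one fixed-size vector per language, all for the same sentence.
    # Returns the per-dimension variance across languages, averaged over
    # dimensions; it is zero when every language maps to the same vector.
    count = len(encodings)
    dim = len(encodings[0])
    total = 0.0
    for d in range(dim):
        values = [vector[d] for vector in encodings]
        mean = sum(values) / count
        total += sum((v - mean) ** 2 for v in values) / count
    return total / dim


def combined_loss(decoder_losses, encodings, weight=1.0):
    # Average decoder loss over the batch plus the weighted encoder loss.
    mean_decoder = sum(decoder_losses) / len(decoder_losses)
    return mean_decoder + weight * encoder_loss(encodings)


if __name__ == "__main__":
    # Toy 2-dimensional encodings for the English, Spanish and Chinese sentence.
    batch = [[0.9, 0.1], [1.0, 0.0], [1.1, 0.2]]
    print(combined_loss([0.4, 0.6, 0.8], batch, weight=0.5))
```

Batching the translations of one sentence together is what makes this term computable: the encodings being compared must come from sentences known to be semantically equivalent.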

Experiment results (Farsi)

[Bar chart. Y-axis: Exact Match Accuracy, 0.00% to 90.00%. X-axis labels as extracted: Translated Verified New Params Test. Bar values are not preserved in this transcript.]
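
Exact match accuracy, the metric named on the chart, is commonly computed as the fraction of predicted programs that are identical to the reference program. The sketch below assumes plain string equality; whether the project normalizes programs before comparing them is not stated in the slides, and the function name is hypothetical.

```python
def exact_match_accuracy(predictions, references):
    # A prediction counts only if it equals the reference program exactly.
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(1 for p, r in zip(predictions, references) if p == r)
    return matches / len(references)


if __name__ == "__main__":
    gold = ['now => @restaurant.review => notify', 'now => @weather.current => notify']
    pred = ['now => @restaurant.review => notify', 'now => @weather.forecast => notify']
    print(exact_match_accuracy(pred, gold))
```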

Challenges

• Google Translate is not perfect

• Identifying language-specific traits (singular/plural, missing prepositions, ...)

• Closing the gap between evaluation accuracy and test (real data) accuracy

• Automating and improving collection of natural parameter values for each language

• ...

Bonus:

• Starter code is available free of charge!

• 18/6 project technical support

• Optional happy hours to celebrate our results

• Will be featured as a contributor in our EMNLP paper