TRANSCRIPT
How can Artificial
Intelligence use Big
Data for Translating
Documents?
John Ortega ∙ November 27, 2019 ∙ Vilnius, Lithuania
Who am I and what do I do?
Machine Translation
"There is no need to do more than mention the obvious fact that a multiplicity of languages impedes cultural interchange between the peoples of the earth, and is a serious deterrent to international understanding." -- Warren Weaver
Overview: What I will cover today (1940 to 2019, beginning to present)
• Basic Introduction
• Evolution and Motivation
• Two Paradigms: Rule-Based MT systems (the challenge and first systems) and Example-Based MT systems
• State-of-the-Art: statistical and neural-based systems with big data
• Winding Down
1940: A challenge for a basic need
2019: Quality to achieve human-like translations
Machine Translation: Input and Output
Source Sentence → MT System → Translation
• Source sentence: "The dog barks at night"
• Translation: "El perro ladra por la noche"
The MT system is generally a black box, but we need to know what it does.
Machine Translation: Breaking down the paradigms
• EBMT (Example-Based): analogy
• RBMT (Rule-Based): direct, transfer, interlingual
• SMT (Statistical): word, phrase
• NMT (Neural): CNN, RNN
Machine Translation: Sector use in production is mixed
[Chart: market share by paradigm — high use: RBMT; medium use: SMT; low use: NMT; Hybrid Machine Translation (RBMT with NMT or SMT) at roughly 60%, with 20% and 10% shares for the other paradigms.]
Source: https://www.psmarketresearch.com/market-analysis/machine-translation-market
Machine Translation: Contrasting strengths of each machine translation system
[Chart comparing Example-Based, Rule-Based, Statistical, and Neural systems on punctuation, long sentences, high quality, and fast training.]
Rule-Based Machine Translation: An editing process
1. Read source language input and add a rule
2. System digests rules and includes them in a catalog
3. Apply the rule to the target language and verify
4. System applies a hierarchy according to language morphology
5. System provides a rule-based translation
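The editing process above can be sketched as a toy direct-transfer pass. This is a minimal illustration, not a production RBMT engine: the LEXICON dictionary is a hypothetical rule catalog invented for the example.

```python
# Toy direct-transfer RBMT sketch. The rule catalog below is a
# hypothetical illustration; real systems hold large catalogs of
# lexical and morphological rules per language pair.
LEXICON = {
    "the": "el", "dog": "perro", "barks": "ladra",
    "at": "por", "night": "la noche",
}

def translate(sentence):
    """Apply dictionary rules word by word; fail loudly on a gap so
    an editor knows a new rule must be added to the catalog."""
    out = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            raise KeyError(f"no rule for {word!r}: add one to the catalog")
        out.append(LEXICON[word])
    return " ".join(out)

print(translate("The dog barks at night"))  # el perro ladra por la noche
```

This mirrors the verify step in the slide: a missing rule is detected immediately and sent back to the editing loop rather than silently dropped.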
Rule-Based Machine Translation: Using hand-tailored rules
• Traceable
• Customization
• High quality
• Repeatable
Statistical Machine Translation: Using various statistical methods to choose the best translation
BLEU is the scoring mechanism for translation quality!
Statistical Machine Translation: Comparison to Neural Machine Translation
[Charts: statistical MT based on phrases, with Moses at 53% and a 12% share; BLEU scores for average sentence lengths, SMT vs NMT, showing more than 6 points of difference.]
Statistical Machine Translation: The well-known research paper by Koehn and Knowles, "Six Challenges for Neural Machine Translation"
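BLEU, the scoring mechanism named above, can be approximated in a few lines. This is a simplified single-reference sketch with add-one smoothing, not the exact formulation used in the cited papers or evaluation toolkits.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions times a brevity penalty (one reference, add-one smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Smoothing keeps one missing n-gram order from zeroing the score.
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the dog barks at night", "the dog barks at night"), 3))  # 1.0
```

A perfect match scores 1.0; a truncated candidate is penalized both by lower n-gram precision and by the brevity penalty.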
Neural Machine Translation: The new kid on the block
• Slow training
• Highly complex
• High quality
• Less error
Compared with phrase-based MT, neural MT generally captures semantic details well.
Neural Machine Translation: An encoder-decoder model machine translation system
Encoder-Decoder Model:
• 01 Encode: formulate a hypothesis
• 02 Decode: choose the best hypothesis
• Sentence-by-sentence
• Embeddings on the source are encoded
• Classifications are the target language
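The encode-decode flow above can be illustrated with a toy sketch. The two-dimensional SRC_EMB and TGT_EMB tables are hypothetical stand-ins for learned embeddings and output-classifier weights; a real system learns hundreds of dimensions.

```python
# Toy encode-decode sketch: the encoder compresses the source into a
# context vector; the decoder scores candidate target words against it
# and keeps the best hypothesis. All tables here are hypothetical.
SRC_EMB = {"dog": [1.0, 0.0], "night": [0.0, 1.0]}   # source embeddings
TGT_EMB = {"perro": [0.9, 0.1], "noche": [0.1, 0.9]} # target "classes"

def encode(tokens):
    """Average source embeddings into one context vector."""
    vecs = [SRC_EMB[t] for t in tokens]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def decode(context):
    """Greedy decode: pick the target word whose vector best matches."""
    score = lambda v: sum(c * x for c, x in zip(context, v))
    return max(TGT_EMB, key=lambda w: score(TGT_EMB[w]))

print(decode(encode(["dog"])))    # perro
print(decode(encode(["night"])))  # noche
```

The slide's two phases map directly onto the two functions: encoding formulates the hypothesis space (the context vector), and decoding chooses the best hypothesis from the target classes.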
Neural Machine Translation: OpenNMT – a free neural machine translation system
• Free out-of-the-box system
• Pre-trained word embeddings
• RNN
• Guillaume Klein et al. (2017), from Harvard and Systran
• Predecessor: Nematus
Reference: Klein et al., "OpenNMT: Neural Machine Translation Toolkit"
Neural Machine Translation: Testing neural machine translation performance
A comparison was done with Knowles on various MT paradigms, testing on fuzzy-match repair (the Fuzzy-Match Repair Test).
Neural Machine Translation: The best non-hybrid machine translation system at this moment
Big Data: Training our system by transferring knowledge into it
Data that is typically not easily computed using conventional personal computer mechanisms:
• Books: 20 MB
• Chat data: 5 MB
• News articles: 10 MB
Big Data: High-performance processing of data (CPU, GPU, TPU)
• Billions of words
• Tokenization and cleansing in hours
• Tagging in hours or a day
• Modeling can take weeks or months
Big Data: Transfer knowledge from one source to another
Knowledge:
• Multiple sources and languages
• Transfer from other models
• Performance-based
Computations:
• Pre-calculated models (embeddings)
• Only time to find an index (hash lookup)
• Vector space modeling
Word Embeddings: Capture knowledge to transfer from one source to another
Semantic knowledge transfer:
• Pre-trained
• Based on words
• Represent billions of words
• Knowledge transfer
• Trained with neural networks
• No guarantees
Word Embeddings: Building an input classifier from learned knowledge
• Input to a model (linear or not)
• Use vectors from a previous run
• Provide similarity that is not otherwise easy to measure
• Easy to load
By embedding learned knowledge, we are able to capture semantic context for translating as well.
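The "easy to load" and "provide similarity" points above can be sketched together: a pre-calculated embedding table is just a hash map, and similarity falls out of vector math. The EMBEDDINGS dict and its values here are hypothetical, invented for the example.

```python
import math

# Hypothetical pre-trained embeddings held in a dict: loading a word's
# vector is an O(1) hash lookup, and similarity is plain vector math.
EMBEDDINGS = {
    "dog":   [0.8, 0.1, 0.2],
    "perro": [0.7, 0.2, 0.2],
    "night": [0.1, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def similarity(w1, w2):
    return cosine(EMBEDDINGS[w1], EMBEDDINGS[w2])  # two O(1) lookups

print(similarity("dog", "perro") > similarity("dog", "night"))  # True
```

This is the similarity that is "not easy to measure" otherwise: the numbers come from training, but querying them costs only a lookup.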
Word Embeddings: Steps for creating our own word embeddings
1. Tokenize / Cleanse: stem words or lemmatize; remove stop words
2. Training: a classification task using a neural network; bag-of-words or skip-gram
3. Saving: freeze the model after classification; indexes saved to disk for new words
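The first two steps above can be sketched: cleanse the text, then emit the (center, context) pairs a skip-gram model trains on. The STOP_WORDS set is a tiny illustrative stand-in for a real stop-word list, and no actual network training is shown.

```python
# Sketch of the tokenize/cleanse and skip-gram steps: lowercase, drop
# stop words, then generate (center, context) training pairs within a
# window. The stop-word list is a hypothetical miniature.
STOP_WORDS = {"the", "at", "a", "of"}

def tokenize(text):
    """Lowercase and remove stop words (stemming omitted for brevity)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def skipgram_pairs(tokens, window=1):
    """Emit every (center, context) pair inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = tokenize("The dog barks at night")
print(tokens)                  # ['dog', 'barks', 'night']
print(skipgram_pairs(tokens))
```

These pairs are what the classification task in step 2 consumes: predict the context word from the center word (skip-gram) or the reverse (CBOW).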
Word Embeddings: Types of word embeddings
• Word2Vec: skip-gram or CBOW
• GloVe: log-bilinear probability representation
• FastText: Word2Vec extension
Many word embeddings are freely downloadable for almost any project!
The main idea is that an algorithm builds a co-occurrence matrix.
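The co-occurrence matrix mentioned above can be sketched directly: count how often word pairs appear near each other within a window. The two-sentence corpus is invented for the example; GloVe then fits its log-bilinear model to counts like these.

```python
from collections import defaultdict

# Sketch of the first step behind GloVe-style embeddings: a sparse
# co-occurrence matrix over a (tiny, hypothetical) corpus.
def cooccurrence(sentences, window=2):
    counts = defaultdict(int)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(w, tokens[j])] += 1
    return counts

corpus = ["the dog barks", "the dog sleeps"]
m = cooccurrence(corpus)
print(m[("dog", "the")])  # 2: "dog" co-occurs with "the" in both sentences
```

Storing the matrix as a dict keyed by word pairs keeps it sparse, which matters because real vocabularies make the dense matrix enormous.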
Word2Vec: https://code.google.com/archive/p/word2vec/
OpenNMT Torch loading:
  local vocab_size = 50004
  local embedding_size = 500
  local embeddings = torch.Tensor(vocab_size, embedding_size):uniform()
  torch.save('enc_embeddings.t7', embeddings)
Word Embeddings: Train a model to use word embeddings
Physical appearance:
• Embedding vectors used for each word or words
Training:
• Several convolutions through hidden layers
• Rectified Linear Unit (ReLU) and Softmax activation
Decoding or classification:
• Sentence segments contain words from embeddings
• Classification is done using dense layers to match desired classes
Word Embeddings: Attention-based machine translation using word embeddings
Word embeddings serve as input word vectors to an RNN for source language words.
• Sentence-by-sentence encoding/decoding
• Context captured and carried forward: grammar, syntax, phrasal rules
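The "context captured and carried forward" point can be sketched with plain dot-product attention over hypothetical encoder states. Real attention mechanisms (e.g., Bahdanau or Luong) add learned projections; this minimal version keeps only the scoring and mixing.

```python
import math

# Dot-product attention sketch: the decoder state scores each source
# position, softmax turns scores into weights, and the context vector
# is the weighted mix of encoder states. All vectors are hypothetical.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(decoder_state, encoder_states):
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states))
               for k in range(len(encoder_states[0]))]
    return context, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0]]   # two source positions
context, weights = attend([1.0, 0.0], encoder_states)
print(weights[0] > weights[1])  # True: attention favors position 0
```

Each decoding step recomputes these weights, which is how source context is carried forward selectively rather than squeezed into one fixed vector.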
Neural Machine Translation with Word Embeddings: Comparison of Nematus to OpenNMT with English and German
In-domain:
• Performs best on in-domain data
• Little difference between Nematus and ONMT
Other languages:
• BLEU score measurement improves
• Speed goes down
• Good for low-resource languages
• Good for Romance languages (ES, FR, PT, IT)
• Not so good for some language pairs (EN-RU)
Artificial Intelligence: Can we make our MT system learn like humans?
• Reinforcement Learning: learn by probing
• Transfer Learning: bring in knowledge
• Heuristic Learning: artifact knowledge over time
• Mixed Learning: combine several types
The Best Machine Translation System: Which one should you use for your project?
Consider:
• Training time
• Language pairs
• Cost
• Size on disk
• Human quality
• BLEU scores
• Easy to reproduce
• Featureless
THANKS!