TRANSCRIPT
How can Artificial
Intelligence use Big
Data for Translating
Documents?
John Ortega ∙ November 27, 2019 ∙ Vilnius, Lithuania
Who am I and what do I do?
Machine Translation
"There is no need to do more than mention the obvious fact that a multiplicity of languages impedes cultural interchange between the peoples of the earth, and is a serious deterrent to international understanding." -- Warren Weaver
Overview: What I will cover today (1940 to 2019, beginning to present)
• Basic Introduction
• Evolution and Motivation
• Two Paradigms: Rule-Based MT systems (the challenge and first systems) and Example-Based MT systems
• State-of-the-Art: statistical and neural-based systems with big data
• Winding Down
1940: A challenge for a basic need
2019: Quality to achieve human-like translations
Machine Translation: Input and Output
Source Sentence → MT System → Translation
• Source sentence: "The dog barks at night"
• Translation: "El perro ladra por la noche"
The MT system is generally a black box, but we need to know what it does.
Machine Translation: Breaking down the paradigms
• EBMT (Example-Based): analogy
• RBMT (Rule-Based): direct, transfer, interlingual
• SMT (Statistical): word, phrase
• NMT (Neural): CNN, RNN
Machine Translation: Sector use in production is mixed
[Chart: market share by paradigm — high use: RBMT; medium use: SMT; low use: NMT; Hybrid Machine Translation (RBMT with NMT or SMT) at roughly 60%, with 20% and 10% shares for the other paradigms.]
Source: https://www.psmarketresearch.com/market-analysis/machine-translation-market
Machine Translation: Contrasting strengths of each machine translation system
[Chart comparing Example-Based, Rule-Based, Statistical, and Neural systems on punctuation, long sentences, high quality, and fast training.]
Rule-Based Machine Translation: An editing process
1. Read source language input and add a rule
2. System digests rules and includes them in a catalog
3. Apply the rule to the target language and verify
4. System applies a hierarchy according to language morphology
5. System provides a rule-based translation
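The editing process above can be sketched as a toy direct-transfer pass. This is a minimal illustration, not a production RBMT engine: the LEXICON dictionary is a hypothetical rule catalog invented for the example.

```python
# Toy direct-transfer RBMT sketch. The rule catalog below is a
# hypothetical illustration; real systems hold large catalogs of
# lexical and morphological rules per language pair.
LEXICON = {
    "the": "el", "dog": "perro", "barks": "ladra",
    "at": "por", "night": "la noche",
}

def translate(sentence):
    """Apply dictionary rules word by word; fail loudly on a gap so
    an editor knows a new rule must be added to the catalog."""
    out = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            raise KeyError(f"no rule for {word!r}: add one to the catalog")
        out.append(LEXICON[word])
    return " ".join(out)

print(translate("The dog barks at night"))  # el perro ladra por la noche
```

This mirrors the verify step in the slide: a missing rule is detected immediately and sent back to the editing loop rather than silently dropped.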
Rule-Based Machine Translation: Using hand-tailored rules
• Traceable
• Customization
• High quality
• Repeatable
Statistical Machine Translation: Using various statistical methods to choose the best translation
BLEU is the scoring mechanism for translation quality!
Statistical Machine Translation: Comparison to Neural Machine Translation
[Charts: statistical MT based on phrases, with Moses at 53% and a 12% share; BLEU scores for average sentence lengths, SMT vs NMT, showing more than 6 points of difference.]
Statistical Machine Translation: The well-known research paper by Koehn and Knowles, "Six Challenges for Neural Machine Translation"
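BLEU, the scoring mechanism named above, can be approximated in a few lines. This is a simplified single-reference sketch with add-one smoothing, not the exact formulation used in the cited papers or evaluation toolkits.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions times a brevity penalty (one reference, add-one smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Smoothing keeps one missing n-gram order from zeroing the score.
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)

print(round(bleu("the dog barks at night", "the dog barks at night"), 3))  # 1.0
```

A perfect match scores 1.0; a truncated candidate is penalized both by lower n-gram precision and by the brevity penalty.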
Neural Machine Translation: The new kid on the block
• Slow training
• Highly complex
• High quality
• Less error
Compared with phrase-based MT, neural MT generally captures semantic details well.
Neural Machine Translation: An encoder-decoder model machine translation system
Encoder-Decoder Model:
• 01 Encode: formulate a hypothesis
• 02 Decode: choose the best hypothesis
• Sentence-by-sentence
• Embeddings on the source are encoded
• Classifications are the target language
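The encode-decode flow above can be illustrated with a toy sketch. The two-dimensional SRC_EMB and TGT_EMB tables are hypothetical stand-ins for learned embeddings and output-classifier weights; a real system learns hundreds of dimensions.

```python
# Toy encode-decode sketch: the encoder compresses the source into a
# context vector; the decoder scores candidate target words against it
# and keeps the best hypothesis. All tables here are hypothetical.
SRC_EMB = {"dog": [1.0, 0.0], "night": [0.0, 1.0]}   # source embeddings
TGT_EMB = {"perro": [0.9, 0.1], "noche": [0.1, 0.9]} # target "classes"

def encode(tokens):
    """Average source embeddings into one context vector."""
    vecs = [SRC_EMB[t] for t in tokens]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def decode(context):
    """Greedy decode: pick the target word whose vector best matches."""
    score = lambda v: sum(c * x for c, x in zip(context, v))
    return max(TGT_EMB, key=lambda w: score(TGT_EMB[w]))

print(decode(encode(["dog"])))    # perro
print(decode(encode(["night"])))  # noche
```

The slide's two phases map directly onto the two functions: encoding formulates the hypothesis space (the context vector), and decoding chooses the best hypothesis from the target classes.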
Neural Machine Translation: OpenNMT – a free neural machine translation system
• Free out-of-the-box system
• Pre-trained word embeddings
• RNN
• Guillaume Klein et al. (2017), from Harvard and Systran
• Predecessor: Nematus
Reference: Klein et al., "OpenNMT: Neural Machine Translation Toolkit"
Neural Machine Translation: Testing neural machine translation performance
A comparison was done with Knowles on various MT paradigms, testing on fuzzy-match repair (the Fuzzy-Match Repair Test).
Neural Machine Translation: The best non-hybrid machine translation system at this moment
Big Data: Training our system by transferring knowledge into it
Data that is typically not easily computed using conventional personal computer mechanisms:
• Books: 20 MB
• Chat data: 5 MB
• News articles: 10 MB
Big Data: High-performance processing of data (CPU, GPU, TPU)
• Billions of words
• Tokenization and cleansing in hours
• Tagging in hours or a day
• Modeling can take weeks or months
Big Data: Transfer knowledge from one source to another
Knowledge:
• Multiple sources and languages
• Transfer from other models
• Performance-based
Computations:
• Pre-calculated models (embeddings)
• Only time to find an index (hash lookup)
• Vector space modeling
Word Embeddings: Capture knowledge to transfer from one source to another
Semantic knowledge transfer:
• Pre-trained
• Based on words
• Represent billions of words
• Knowledge transfer
• Trained with neural networks
• No guarantees
Word Embeddings: Building an input classifier from learned knowledge
• Input to a model (linear or not)
• Use vectors from a previous run
• Provide similarity that is not otherwise easy to measure
• Easy to load
By embedding learned knowledge, we are able to capture semantic context for translating as well.
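The "easy to load" and "provide similarity" points above can be sketched together: a pre-calculated embedding table is just a hash map, and similarity falls out of vector math. The EMBEDDINGS dict and its values here are hypothetical, invented for the example.

```python
import math

# Hypothetical pre-trained embeddings held in a dict: loading a word's
# vector is an O(1) hash lookup, and similarity is plain vector math.
EMBEDDINGS = {
    "dog":   [0.8, 0.1, 0.2],
    "perro": [0.7, 0.2, 0.2],
    "night": [0.1, 0.9, 0.0],
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def similarity(w1, w2):
    return cosine(EMBEDDINGS[w1], EMBEDDINGS[w2])  # two O(1) lookups

print(similarity("dog", "perro") > similarity("dog", "night"))  # True
```

This is the similarity that is "not easy to measure" otherwise: the numbers come from training, but querying them costs only a lookup.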
Word Embeddings: Steps for creating our own word embeddings
1. Tokenize / Cleanse: stem words or lemmatize; remove stop words
2. Training: a classification task using a neural network; bag-of-words or skip-gram
3. Saving: freeze the model after classification; indexes saved to disk for new words
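The first two steps above can be sketched: cleanse the text, then emit the (center, context) pairs a skip-gram model trains on. The STOP_WORDS set is a tiny illustrative stand-in for a real stop-word list, and no actual network training is shown.

```python
# Sketch of the tokenize/cleanse and skip-gram steps: lowercase, drop
# stop words, then generate (center, context) training pairs within a
# window. The stop-word list is a hypothetical miniature.
STOP_WORDS = {"the", "at", "a", "of"}

def tokenize(text):
    """Lowercase and remove stop words (stemming omitted for brevity)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def skipgram_pairs(tokens, window=1):
    """Emit every (center, context) pair inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = tokenize("The dog barks at night")
print(tokens)                  # ['dog', 'barks', 'night']
print(skipgram_pairs(tokens))
```

These pairs are what the classification task in step 2 consumes: predict the context word from the center word (skip-gram) or the reverse (CBOW).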
Word Embeddings: Types of word embeddings
• Word2Vec: skip-gram or CBOW
• GloVe: log-bilinear probability representation
• FastText: Word2Vec extension
Many word embeddings are freely downloadable for almost any project!
The main idea is that an algorithm builds a co-occurrence matrix.
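The co-occurrence matrix mentioned above can be sketched directly: count how often word pairs appear near each other within a window. The two-sentence corpus is invented for the example; GloVe then fits its log-bilinear model to counts like these.

```python
from collections import defaultdict

# Sketch of the first step behind GloVe-style embeddings: a sparse
# co-occurrence matrix over a (tiny, hypothetical) corpus.
def cooccurrence(sentences, window=2):
    counts = defaultdict(int)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(w, tokens[j])] += 1
    return counts

corpus = ["the dog barks", "the dog sleeps"]
m = cooccurrence(corpus)
print(m[("dog", "the")])  # 2: "dog" co-occurs with "the" in both sentences
```

Storing the matrix as a dict keyed by word pairs keeps it sparse, which matters because real vocabularies make the dense matrix enormous.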
Word2Vec: https://code.google.com/archive/p/word2vec/
OpenNMT Torch loading:
  local vocab_size = 50004
  local embedding_size = 500
  local embeddings = torch.Tensor(vocab_size, embedding_size):uniform()
  torch.save('enc_embeddings.t7', embeddings)
Word Embeddings: Train a model to use word embeddings
Physical appearance:
• Embedding vectors used for each word or words
Training:
• Several convolutions through hidden layers
• Rectified Linear Unit (ReLU) and Softmax activation
Decoding or classification:
• Sentence segments contain words from embeddings
• Classification is done using dense layers to match desired classes
Word Embeddings: Attention-based machine translation using word embeddings
Word embeddings serve as input word vectors to an RNN for source language words.
• Sentence-by-sentence encoding/decoding
• Context captured and carried forward: grammar, syntax, phrasal rules
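The "context captured and carried forward" point can be sketched with plain dot-product attention over hypothetical encoder states. Real attention mechanisms (e.g., Bahdanau or Luong) add learned projections; this minimal version keeps only the scoring and mixing.

```python
import math

# Dot-product attention sketch: the decoder state scores each source
# position, softmax turns scores into weights, and the context vector
# is the weighted mix of encoder states. All vectors are hypothetical.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(decoder_state, encoder_states):
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states))
               for k in range(len(encoder_states[0]))]
    return context, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0]]   # two source positions
context, weights = attend([1.0, 0.0], encoder_states)
print(weights[0] > weights[1])  # True: attention favors position 0
```

Each decoding step recomputes these weights, which is how source context is carried forward selectively rather than squeezed into one fixed vector.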
Neural Machine Translation with Word Embeddings: Comparison of Nematus to OpenNMT with English and German
In-domain:
• Performs best on in-domain data
• Little difference between Nematus and ONMT
Other languages:
• BLEU score measurement improves
• Speed goes down
• Good for low-resource languages
• Good for Romance languages (ES, FR, PT, IT)
• Not so good for some language pairs (EN-RU)
Artificial Intelligence: Can we make our MT system learn like humans?
• Reinforcement Learning: learn by probing
• Transfer Learning: bring in knowledge
• Heuristic Learning: artifact knowledge over time
• Mixed Learning: combine several types
The Best Machine Translation System: Which one should you use for your project?
Consider:
• Training time
• Language pairs
• Cost
• Size on disk
• Human quality
• BLEU scores
• Easy to reproduce
• Featureless
THANKS!