deep learning applications in natural language...

42
Deep Learning Applications in Natural Language Processing Jindřich Libovický December 5, 2018 B4M36NLP Introduction to Natural Language Processing Charles University Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics unless otherwise stated

Upload: others

Post on 28-May-2020

23 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Deep Learning Applications inNatural Language ProcessingJindřich Libovický

December 5, 2018

B4M36NLP Introduction to Natural Language Processing

Charles UniversityFaculty of Mathematics and PhysicsInstitute of Formal and Applied Linguistics unless otherwise stated

Page 2: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Outline

Information Search

Unsupervised Dictionary Induction

Image Captioning

Deep Learning Applications in Natural Language Processing 1/38

Page 3: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Information Search

Page 4: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Answer Span Selection

Task: Find an answer for a question given question in a coherent text.

http://demo.allennlp.org/machine-comprehension

Deep Learning Applications in Natural Language Processing 2/38

Page 5: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Standard Dataset: SQuAD

• best articles from Wikipedia, of reasonable size (23kparagraphs, 500 articles)

• crowd-sourced more than 100k question-answer pairs• complex quality testing (which got estimate of single

human doing the task)

https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+ questions for machine comprehension of text. In Proceedings of the2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas, November 2016. Association for Computational

Linguistics. URL https://aclweb.org/anthology/D16-1264

Deep Learning Applications in Natural Language Processing 3/38

Page 6: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Method Overview

1. Get text and question representation from• pre-trained word embeddings• character-level CNN

…using your favourite architecture.d2. Compute a similarity between all pairs of words in the text and in

the question.3. Collect all informations we have for each token.4. Classify where the span is.

Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. CoRR, abs/1611.01603,2016. URL http://arxiv.org/abs/1611.01603

Deep Learning Applications in Natural Language Processing 4/38

Page 7: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Method Overview: Image

Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. CoRR, abs/1611.01603,2016. URL http://arxiv.org/abs/1611.01603

Deep Learning Applications in Natural Language Processing 5/38

Page 8: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Representing Words

• pre-trained word embeddings• concatenate with trained character-level representations• character-level representaions allows searching for out-of-vocabulary structured

informations (numbers, addresses)

character embeddings of size 16

1D-convolution to 100 dimensions

max-pooling

Deep Learning Applications in Natural Language Processing 6/38

Page 9: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Contextual Embeddings Layer

• process both question and context with bidirectional LSTM layer→ one state per word

• parameters are shared → representaions share the space

Deep Learning Applications in Natural Language Processing 7/38

Page 10: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Attention Flow

𝑢1𝑢2𝑢3𝑢4𝑢5

quer

yU

ℎ1 ℎ2 ℎ3 ℎ4 ℎ5 ℎ6 ℎ7 ℎ8 ℎ9 ℎ10 ℎ11 ℎ12

context H

S𝑖𝑗 = w𝑇 [ℎ𝑖, 𝑐𝑗, ℎ𝑖 ⊙ 𝑐𝑗]Captures affinity / similarity between pairs of question and context words.

Deep Learning Applications in Natural Language Processing 8/38

Page 11: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Context-to-query Attention

𝑢1𝑢2𝑢3𝑢4𝑢5

quer

yU

ℎ1 ℎ2 ℎ3 ℎ4 ℎ5 ℎ6 ℎ7 ℎ8 ℎ9 ℎ10 ℎ11 ℎ12context H

soft

max

soft

max

soft

max

soft

max

soft

max

soft

max

soft

max

soft

max

soft

max

soft

max

soft

max

soft

max× weighted sum

�̃�1 �̃�2 �̃�3 �̃�4 �̃�5 �̃�6 �̃�7 �̃�8 �̃�9 �̃�10 �̃�11 �̃�12

Deep Learning Applications in Natural Language Processing 9/38

Page 12: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Query-to-Context Attention

𝑢1𝑢2𝑢3𝑢4𝑢5

quer

yU

ℎ1 ℎ2 ℎ3 ℎ4 ℎ5 ℎ6 ℎ7 ℎ8 ℎ9 ℎ10 ℎ11 ℎ12context H

maximum

× × × × × × × × × × × ×

weighted sum

Deep Learning Applications in Natural Language Processing 10/38

Page 13: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Modeling Layer

• concatenate: LSTM outputs for each context word , context-to-query-vectors• copy query-to-context vector to each of them• apply one non-linear layer and bidirectional LSTM

Deep Learning Applications in Natural Language Processing 11/38

Page 14: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Output Layer

1. Start-token probabilities: project each state to scalar → apply softmax over the context2. End-token probabilities:

• Compute weighted averate using the start-token probablities → single vector• Concatenate the vector to each state• Project states to scalar, renormalize with softmax

3. At the end select the most probable span

Deep Learning Applications in Natural Language Processing 12/38

Page 15: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Method Overview: Recap

Deep Learning Applications in Natural Language Processing 13/38

Page 16: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Attention Analysis (1)

Deep Learning Applications in Natural Language Processing 14/38

Page 17: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Attention Analysis (2)

Deep Learning Applications in Natural Language Processing 15/38

Page 18: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Make it 100× Faster!

Replace LSTMs by dilated convolutions.

Deep Learning Applications in Natural Language Processing 16/38

Page 19: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Convolutional Blocks

Deep Learning Applications in Natural Language Processing 17/38

Page 20: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Using Pre-Trained Representations

Just replace the contextual embeddings wiht ELMo or BERT…

Deep Learning Applications in Natural Language Processing 18/38

Page 21: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

SQuAD Leaderboard

method Exact Match F1 ScoreHuman performacne 82.304 91.221BiDAF with BERT 87.433 93.160BiDAF with ELMo 81.003 87.432BiDAF trained from scratch 73.744 81.525

Deep Learning Applications in Natural Language Processing 19/38

Page 22: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Unsupervised Dictionary Induction

Page 23: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Unsupervised Bilingual Dictionary

Task: Get a translation dictionary between two languages using monolignual data only.

• makes NLP accessible for low-resourced languages• basic for unsupervised machine translation• hot research topic (at least 10 research papers on this topic this year)

We will approach: Mikel Artetxe, Gorka Labaka, and Eneko Agirre. A robust self-learningmethod for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings ofthe 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: LongPapers), pages 789–798, Melbourne, Australia, July 2018. Association for ComputationalLinguistics. URL http://www.aclweb.org/anthology/P18-1073

Deep Learning Applications in Natural Language Processing 20/38

Page 24: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

How it is done

1. Train word embeddings on large monolignual corpora.2. Find a mapping between the two languages.

So far looks simple…

Deep Learning Applications in Natural Language Processing 21/38

Page 25: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Dictionary and Common Projection

𝑋, 𝑍 embedding matrices for 2 languages.Dictionary matrix 𝐷𝑖𝑗 = 1 if 𝑋𝑖 is translation of 𝑍𝑗.

Supervised projection between embeddingsGiven existing dictionary 𝐷 (small seed dictionary):

argmax𝑊𝑍,𝑊𝑋

∑𝑖

∑𝑗

𝐷𝑖𝑗 ⋅ similarity (𝑋𝑖∶𝑊𝑋, 𝑍𝑗∶𝑊𝑍) (𝑋∶𝑖𝑊𝑋 (𝑍𝑗∶𝑊𝑍)𝑇 )

…but we need to find all 𝐷, 𝑊𝑋, and 𝑊𝑍.

Deep Learning Applications in Natural Language Processing 22/38

Page 26: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

A Tiny Observation

𝑋𝑋𝑇Question: How would you interpret this matrix?

It is a table of similarities between pairs of words.

Deep Learning Applications in Natural Language Processing 23/38

Page 27: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

If the Vocabularies were Isometric…

• 𝑀𝑋 = 𝑋𝑋𝑇 and 𝑀𝑍 = 𝑍𝑍𝑇 would only have permuted rows and columns• if we sorted values in each row of 𝑀𝑋 and 𝑀𝑍, corresponding words would have the

same vectors

Let’s assume, it is true (at least approximately)

𝐷𝑖,∶ ← 1[argmin𝑗

(𝑀𝑋)𝑖,∶(𝑀𝑍)𝑇𝑗,∶]

Assign nearest neighbor from the other language.……in practice tragically bad but at least good initialization.

Deep Learning Applications in Natural Language Processing 24/38

Page 28: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Self-Learning

Iterate until convergence:1. Optimize 𝑊𝑍 and 𝑊𝑋, w.r.t to current dictionary

argmax𝑊𝑍,𝑊𝑋

∑𝑖

∑𝑗

𝐷𝑖𝑗 ⋅ (𝑋∶𝑖𝑊𝑋 (𝑍𝑗∶𝑊𝑍)𝑇 )

2. Update dictionary matrix 𝐷

𝐷𝑖𝑗 = {1, if 𝑖 is nearest neighbor of 𝑗 or wise versa0, otherwise

Deep Learning Applications in Natural Language Processing 25/38

Page 29: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Accuracy on Large Dictionary

Deep Learning Applications in Natural Language Processing 26/38

Page 30: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Try it yourself!

• Pre-train monolingual word embeddings using FastText / Word2Vec• Install VecMap

https://github.com/artetxem/vecmap

python3 map_embeddings.py --unsupervised SRC.EMB TRG.EMB SRC_MAPPED.EMBTRG_MAPPED.EMB

Deep Learning Applications in Natural Language Processing 27/38

Page 31: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Image Captioning

Page 32: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Image Captioning

Task: Generate a caption in natural language given an image.

Example:

A group of people wearing snowshoes, anddressed for winter hiking, is standing in front of abuilding that looks like it’s made of blocks of ice.The people are quietly listening while the storyof the ice cabin was explained to them.A group of people standing in front of an igloo.Several students waiting outside an igloo.

Deep Learning Applications in Natural Language Processing 28/38

Page 33: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Deep Learning Solution

1. Obtain pre-trained image representation.2. Use autoregressive decoder to generate the caption using the image representation.

Deep Learning Applications in Natural Language Processing 29/38

Page 34: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

2D Convolution over an Image

Basic method in deep learning for computer vision.

RGB image 9 × 9 × 3

convolutional map 4 × 4 × 6

stride 2

filter size 6kernel size

3

Deep Learning Applications in Natural Language Processing 30/38

Page 35: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Convolutional Network for Image Classification

3 @ 224 × 22424 @ 48 × 48

48 @ 27 × 27 60 @ 13 × 13 60 @ 13 × 13 50 @ 13 × 13 1 × 2048 1 × 2048 1 × 1000

RGB image

Flatten +Dense layer

Denselayer

Convolution +Max-Pooling

Convolution +Max-Pooling Convolution Convolution Convolution +

Max-pooling

• Trained for 1k classes classification, millions of training examples• Architecture: convolutions, max-pooling, residual connections, batch normalization,

50–150 layers

Deep Learning Applications in Natural Language Processing 31/38

Page 36: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Reminder: Autoregressive Decoder

<s>

~y1 ~y2 ~y3 ~y4 ~y5

<s> x1 x2 x3 x4

<s>y1 y2 y3 y4

loss

Deep Learning Applications in Natural Language Processing 32/38

Page 37: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Attention Model in Equations (1)

Inputs:decoder state 𝑠𝑖encoder states ℎ𝑗 = [ ⃗⃗⃗ ⃗⃗ ⃗⃗ℎ𝑗; ⃖⃖⃖ ⃖⃖ ⃖⃖ℎ𝑗] ∀𝑖 = 1 … 𝑇𝑥

Attention energies:

𝑒𝑖𝑗 = 𝑣⊤𝑎 tanh (𝑊𝑎𝑠𝑖−1 + 𝑈𝑎ℎ𝑗 + 𝑏𝑎)

Attention distribution:

𝛼𝑖𝑗 = exp (𝑒𝑖𝑗)∑𝑇𝑥

𝑘=1 exp (𝑒𝑖𝑘)

Context vector:

𝑐𝑖 =𝑇𝑥

∑𝑗=1

𝛼𝑖𝑗ℎ𝑗

Deep Learning Applications in Natural Language Processing 33/38

Page 38: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Attention Model in Equations (2)

Output projection:

𝑡𝑖 = MLP (𝑈𝑜𝑠𝑖−1 + 𝑉𝑜𝐸𝑦𝑖−1 + 𝐶𝑜𝑐𝑖 + 𝑏𝑜)

…attention is mixed with the hidden state

Output distribution:

𝑝 (𝑦𝑖 = 𝑘|𝑠𝑖, 𝑦𝑖−1, 𝑐𝑖) ∝ exp (𝑊𝑜𝑡𝑖)𝑘 + 𝑏𝑘

Deep Learning Applications in Natural Language Processing 34/38

Page 39: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Example Outputs: Correct

Deep Learning Applications in Natural Language Processing 35/38

Page 40: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Example Outputs: Incorrect

Deep Learning Applications in Natural Language Processing 36/38

Page 41: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Employing Transformer Decoder

input embeddings

⊕positionencoding

self-attentivesublayer

multiheadattention

keys &values

queries

layer normalization

cross-attentionsublayer

multiheadattention

keys &values

queries

layer normalization

encoder

feed-forwardsublayer

non-linear layer

linear layer

layer normalization

𝑁×

linear

softmax

output symbol probabilities

𝑣1𝑣2𝑣3

𝑣𝑀

𝑞1 𝑞2 𝑞3

𝑞𝑁

………

… …

Queries 𝑄

Valu

es𝑉

−∞

Deep Learning Applications in Natural Language Processing 37/38

Page 42: Deep Learning Applications in Natural Language Processingufal.mff.cuni.cz/.../lect14-deep-learning-applications.pdf · 2018-12-05 · Deep Learning Applications in Natural Language

Quantitative Results

model BLEU scoreRNN + attention (original) 24.3RNN + attention (with better image representation) 32.6Transformer (with better image representation) 33.3

Deep Learning Applications in Natural Language Processing 38/38