
Page 1: Deep Learning & Natural Language processing

Deep Learning & Natural Language Processing
Embeddings, CNNs, RNNs, etc.

Julius B. Kirkegaard, 2019
Snippets at: https://bit.ly/2VWvs3m

Page 2: Deep Learning & Natural Language processing

Deep Neural Networks & PyTorch

Page 3: Deep Learning & Natural Language processing

Deep neural networks

Page 4: Deep Learning & Natural Language processing

Neural networks

Network architecture

Parameters

Data (perhaps preprocessed)

Page 5: Deep Learning & Natural Language processing

Neural networks

Some matrix

Some non-linear function

”Activation Function”

Some vector
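
In symbols, one layer computes: new vector = f(W · x + b), where W is ”some matrix”, x is ”some vector”, and f is the activation function. A minimal PyTorch sketch (the sizes and the choice of ReLU are illustrative, not from the slides):

    import torch

    x = torch.randn(10)          # "some vector": the input
    W = torch.randn(5, 10)       # "some matrix": the layer's weights
    b = torch.randn(5)           # bias
    h = torch.relu(W @ x + b)    # "some non-linear function": the activation, here ReLU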

Page 6: Deep Learning & Natural Language processing

Deep neural networks

Page 7: Deep Learning & Natural Language processing

Requirements for DNN Frameworks

• Optimisation of parameters p

• Take first order derivatives

• Chain rule (backpropagation)

• Process large amounts of data fast

• Exploit GPUs

• Nice to haves:

• Standard functions and operations built-in

• Built-in optimizers

• Spread training across network

• Compile for fast inference

• …

Page 8: Deep Learning & Natural Language processing

PyTorch

• GPU acceleration

• Automatic Error-Backpropagation

(chain rule through operations)

• Tons of functionality built-in

Hard to play with, not good for new ideas and research (IMO)

Easy to play with, but difficult to implement custom and dynamic architectures

Page 9: Deep Learning & Natural Language processing

Requirement 1: Calculate gradients
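
The slide's code is not reproduced in the transcript. As an illustrative sketch of how PyTorch meets this requirement: a tensor created with requires_grad=True records every operation applied to it, and calling .backward() on a scalar result fills in the gradients.

    import torch

    p = torch.tensor([1.0, 2.0], requires_grad=True)
    loss = ((p - 3.0) ** 2).sum()    # some scalar function of p
    loss.backward()                  # chain rule / backpropagation
    print(p.grad)                    # d(loss)/dp = 2 * (p - 3) = tensor([-4., -2.])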

Page 10: Deep Learning & Natural Language processing

Requirement 2: GPU
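
Likewise, a sketch of the GPU requirement (the actual slide code is not in the transcript): tensors are moved to the GPU with .to(device), and subsequent operations run there.

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(1000, 1000).to(device)
    b = torch.randn(1000, 1000).to(device)
    c = a @ b                        # matrix product runs on the GPU if one is available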

Page 11: Deep Learning & Natural Language processing

Simple Neural Network: ”Guess the Mean”

Neural network in three steps:

1. Design the architecture and initialise parameters
2. Calculate the loss
3. Update parameters based on the loss gradient

Warning: This is not the best way to implement it; a better version will follow…
(this version is for understanding)

Snippets at: https://bit.ly/2VWvs3m
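
The real snippet is at the bit.ly link above; the following is a minimal sketch of the same idea, assuming the ”network” is a single scalar parameter that learns the mean of the data by plain gradient descent on a squared loss:

    import torch

    data = torch.randn(100) + 5.0                 # data with true mean around 5

    # 1. architecture + initial parameters: a single scalar parameter
    p = torch.zeros(1, requires_grad=True)

    for step in range(200):
        # 2. calculate the loss
        loss = ((p - data) ** 2).mean()
        loss.backward()

        # 3. update the parameter from the loss gradient (plain gradient descent)
        with torch.no_grad():
            p -= 0.1 * p.grad
        p.grad.zero_()

    print(p.item())                               # close to the mean of the data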

Page 12: Deep Learning & Natural Language processing

Better optimiser stepping

• What if some gradients are much smaller than others?

• What happens when gradients disappear when loss is small?

Solution → Variable learning rates and momentum

• Many algorithms exist; perhaps the most popular is “Adam”

Page 13: Deep Learning & Natural Language processing

Better optimiser stepping

SGD (Stochastic Gradient Descent) vs. Adam (Adaptive Moment Estimation)

simple_nn.py, module_nn.py

Snippets at: https://bit.ly/2VWvs3m
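
A hedged sketch of what the switch from manual updates to a built-in optimiser looks like in PyTorch (the real versions are the simple_nn.py / module_nn.py snippets at the link above); swapping torch.optim.SGD for torch.optim.Adam is a one-line change:

    import torch

    p = torch.zeros(1, requires_grad=True)
    optimizer = torch.optim.Adam([p], lr=0.1)   # or torch.optim.SGD([p], lr=0.1)

    for step in range(200):
        optimizer.zero_grad()
        loss = ((p - 5.0) ** 2).mean()
        loss.backward()
        optimizer.step()                        # Adam adapts per-parameter learning rates and uses momentum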

Page 14: Deep Learning & Natural Language processing

Representing sentences

Page 15: Deep Learning & Natural Language processing

The Trouble

“Hej med dig”

Page 16: Deep Learning & Natural Language processing

Bag of Words

“Hej med dig”

“Hej hej”
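
In a bag-of-words representation each sentence becomes a vector of word counts over a fixed dictionary; order is thrown away. A tiny sketch with the two sentences above (”Hej med dig” is Danish for roughly ”Hi there”):

    from collections import Counter

    dictionary = ["hej", "med", "dig"]

    def bag_of_words(sentence):
        counts = Counter(sentence.lower().split())
        return [counts[word] for word in dictionary]

    print(bag_of_words("Hej med dig"))   # [1, 1, 1]
    print(bag_of_words("Hej hej"))       # [2, 0, 0]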

Page 17: Deep Learning & Natural Language processing

Bag of Words, poor behaviour #1

“I had my car cleaned.”

“I had cleaned my car.” (order ignored)

Page 18: Deep Learning & Natural Language processing

Bag of Words, poor behaviour #2

“Hej med dig”

“Heej med dig”

“Haj medd dig” (typos)

(semantically similar)

Page 19: Deep Learning & Natural Language processing

Bag of Words, poor behaviour #3

“Hej med dig”

Page 20: Deep Learning & Natural Language processing

The idea for a solution

Idea: Represent each word as a vector, with similar words having vectors that are close

Problem: how to choose the vector representing each word?

Page 21: Deep Learning & Natural Language processing

The idea for a solution

The country was ruled by a _____

The bishop anointed the ____ with aromatic oils

The crown was put on the ____

”Context defines meaning”:

King/Queen

Page 22: Deep Learning & Natural Language processing

Continuous Bag of Words

• Input is a ”one-hot” vector

• We force the network to turn each word into a ~200-dimensional vector

• From these vectors we predict the ”focus word”

• When done, keep the ”embeddings”

See e.g. https://github.com/FraLotito/pytorch-continuous-bag-of-words/blob/master/cbow.py for a simple implementation

The bishop anointed the [queen] with aromatic oils

(”queen” is the focus word; the surrounding words are the context)

Page 23: Deep Learning & Natural Language processing

Continuous Bag of Words

”I think therefore I am”

Focus word: ”therefore”; context words: ”I”, ”think”, ”I”, ”am”

Dictionary: [“I”, “think”, “therefore”, “am”]

Context size = 2

Page 24: Deep Learning & Natural Language processing

Continuous Bag of Words

Very simple version:
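
The slide's code is not included in the transcript; the linked cbow.py gives a full implementation. A minimal sketch of the same architecture (embed each context word, sum the vectors, and predict scores over the dictionary; training would use a cross-entropy loss against the focus word's index):

    import torch
    import torch.nn as nn

    class CBOW(nn.Module):
        def __init__(self, vocab_size, embedding_dim=200):
            super().__init__()
            self.embeddings = nn.Embedding(vocab_size, embedding_dim)
            self.linear = nn.Linear(embedding_dim, vocab_size)

        def forward(self, context_ids):
            # context_ids: word indices, e.g. [0, 1, 0, 3] for the context of "therefore"
            vectors = self.embeddings(context_ids)   # (context, embedding_dim)
            summed = vectors.sum(dim=0)              # combine the context vectors
            return self.linear(summed)               # scores over the whole dictionary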

Page 25: Deep Learning & Natural Language processing

Continuous Bag of Words

The output is a probability distribution over all words in the dictionary.

The dictionary can contain more than a million words, so smarter training techniques are typically used: “negative sampling”.

Page 26: Deep Learning & Natural Language processing

Vectors

Page 27: Deep Learning & Natural Language processing

Word2Vec Vectors

Page 28: Deep Learning & Natural Language processing

Word2Vec Vectors

King – Man + Woman = Queen

Page 29: Deep Learning & Natural Language processing

Pretrained word vectors

• GloVe: https://nlp.stanford.edu/projects/glove/

• FastText: https://fasttext.cc/docs/en/crawl-vectors.html

• ELMo: https://github.com/HIT-SCIR/ELMoForManyLangs

Can be used as-is or further trained on a specific corpus

Trained on Wikipedia and “Common Crawl”
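
As a hedged example of using such pretrained vectors (this assumes the gensim library and a downloaded vector file, neither of which is mentioned in the slides), the King – Man + Woman ≈ Queen arithmetic from the previous slide looks like:

    # Assumes gensim and a downloaded word2vec/FastText text-format vector file (path is illustrative).
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format("wiki.en.vec")
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
    # Typically places "queen" at or near the top.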

Page 30: Deep Learning & Natural Language processing

Representing sentences

Using word embeddings, sentences become “pictures”:

“I think therefore I am”

5 x 200 matrix
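
A sketch of how that 5 x 200 matrix arises in PyTorch: look up each of the five tokens in an embedding table of 200-dimensional vectors (vocabulary size and word indices below are illustrative):

    import torch
    import torch.nn as nn

    embedding = nn.Embedding(num_embeddings=10000, embedding_dim=200)
    sentence = torch.tensor([4, 17, 92, 4, 8])     # indices of "I think therefore I am"
    picture = embedding(sentence)
    print(picture.shape)                           # torch.Size([5, 200])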

Page 31: Deep Learning & Natural Language processing

Convolutional Neural Networks

Page 32: Deep Learning & Natural Language processing

CNNs: Convolutional Neural Networks

The kernel (filter) is trainable

Page 33: Deep Learning & Natural Language processing

CNNs: Convolutional Neural Networks

Padded with zeros

Page 34: Deep Learning & Natural Language processing

CNNs: Convolutional Neural Networks

Padded with zeros,

Stride = 2

Page 35: Deep Learning & Natural Language processing

CNNs: Convolutional Neural Networks

Kernels = Filters = Features in CNN language

Page 36: Deep Learning & Natural Language processing

Pooling

Max-pooling 3x3

Pooling = Subsampling in CNN language
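
To make padding, stride and pooling concrete, a hedged PyTorch sketch (channel counts and image size are illustrative, not from the slides):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 28, 28)                  # (batch, channels, height, width)

    conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1, stride=2)
    pool = nn.MaxPool2d(kernel_size=3)

    y = conv(x)                                    # trainable kernels; zero padding, stride 2
    print(y.shape)                                 # torch.Size([1, 8, 14, 14])
    print(pool(y).shape)                           # torch.Size([1, 8, 4, 4]): 3x3 max-pooling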

Page 37: Deep Learning & Natural Language processing

CNNs: Convolutional Neural Networks

Page 38: Deep Learning & Natural Language processing

Text Classification

Standard choices:

• Convolutional Neural Networks

• Recurrent Neural Networks (LSTMs)

Page 39: Deep Learning & Natural Language processing

Classification using CNN

See e.g. https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/4%20-%20Convolutional%20Sentiment%20Analysis.ipynb

1D convolutions with 2D filters

(embedding size x kernel size)
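
A hedged sketch of that idea, simplified from the linked notebook: treat the embedded sentence as a 200-channel signal, slide filters that span the full embedding dimension over three words at a time, then max-pool over the sentence and classify:

    import torch
    import torch.nn as nn

    embedded = torch.randn(1, 5, 200)              # (batch, words, embedding_dim)

    conv = nn.Conv1d(in_channels=200, out_channels=100, kernel_size=3)
    features = conv(embedded.permute(0, 2, 1))     # (batch, 100, 3): one response per position
    pooled = features.max(dim=2).values            # max over the sentence -> (batch, 100)
    scores = nn.Linear(100, 2)(pooled)             # e.g. positive / negative sentiment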

Page 40: Deep Learning & Natural Language processing

Recurrent Neural Networks

Page 41: Deep Learning & Natural Language processing

Language Modelling

Hi mom, I’ll be late for …

Page 42: Deep Learning & Natural Language processing

Neural networks

Network architecture

Parameters

Data (perhaps preprocessed)

Page 43: Deep Learning & Natural Language processing

Recurrent neural networks

Network architecture

Parameters

Data (perhaps preprocessed)

Hidden state

Page 44: Deep Learning & Natural Language processing

What are recurrent neural networks?

Example: (“classic” RNN):
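
The slide's diagram is not in the transcript; the ”classic” RNN update it refers to can be written as h_t = tanh(W_x x_t + W_h h_{t-1} + b): the new hidden state mixes the current input with the previous hidden state. A minimal sketch (sizes are illustrative):

    import torch

    sentence_embeddings = torch.randn(5, 200)      # e.g. the 5 x 200 "picture" of a sentence
    W_x = torch.randn(64, 200)                     # input-to-hidden weights
    W_h = torch.randn(64, 64)                      # hidden-to-hidden weights
    b = torch.zeros(64)

    h = torch.zeros(64)                            # initial hidden state
    for x in sentence_embeddings:                  # process the sentence one word at a time
        h = torch.tanh(W_x @ x + W_h @ h + b)      # new hidden state mixes input and previous state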

Page 45: Deep Learning & Natural Language processing

Language Modelling with RNNs

Hi mom, I’ll be late for …

The hidden state can be used to predict the next word

Page 46: Deep Learning & Natural Language processing

Language Modelling with RNNs

Snippets at: https://bit.ly/2VWvs3m

Page 47: Deep Learning & Natural Language processing

RNN Design choices

“I grew up in France” “Since my mother tongue is ____”

Standard RNN:

Page 48: Deep Learning & Natural Language processing

LSTMs: Long Short-Term Memory

Standard RNN:

LSTM:

See https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 49: Deep Learning & Natural Language processing

LSTMs: Long Short-Term Memory

Standard RNN:

LSTM:

Page 50: Deep Learning & Natural Language processing

LSTM Language Model

“I’ll be late for….”

Sample loop: take the word with the highest probability and repeat

(real models tend to stack many LSTMs)
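
A hedged sketch of such a sample loop using PyTorch's built-in nn.LSTM (untrained weights, illustrative sizes and word index; a real model would be trained on a large corpus first):

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hidden = 10000, 200, 512
    embed = nn.Embedding(vocab_size, emb_dim)
    lstm = nn.LSTM(emb_dim, hidden, num_layers=2)  # stacked LSTMs
    to_vocab = nn.Linear(hidden, vocab_size)

    word = torch.tensor([42])                      # index of the last prompt word (illustrative)
    state = None
    for _ in range(10):                            # sample loop
        out, state = lstm(embed(word).unsqueeze(1), state)   # keep the hidden state between steps
        probs = to_vocab(out[-1, 0])               # scores over the dictionary
        word = probs.argmax().unsqueeze(0)         # greedy: take the most probable word, repeat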

Page 51: Deep Learning & Natural Language processing

Sampling

Shakespeare

PANDARUS:
Alas, I think he shall be come approached and the day
When little srain would be attain’d into being never fed,
And who is but a chain and subjects of his death,
I should not sleep.

Second Senator:
They are away this miseries, produced upon my soul,
Breaking and strongly should be buried, when I perish
The earth and thoughts of many states.

Wikipedia

Naturalism and decision for the majority of Arab countries’ capitalide was grounded by the Irish language by [[John Clair]], [[An Imperial Japanese Revolt]], associated with Guangzham’s sovereignty. His generals were the powerful ruler of the Portugal in the [[Protestant Immineners]], which could be said to be directly in Cantonese Communication, which followed a ceremony and set inspired prison, training. The emperor travelled back to [[Antioch, Perth, October 25|21]] to note, the Kingdom of Costa Rica, unsuccessful fashioned the [[Thrales]], [[Cynth’s Dajoard]], known in western [[Scotland]], near Italy to the conquest of India with the conflict.

XML

<revision><id>15900676</id><timestamp>2002-08-03T18:14:12Z</timestamp><contributor><username>Paris</username><id>23</id></contributor><minor /><comment>Automated conversion</comment></revision>

Obama

Good everybody. Thank you very much. God bless the United States of America, and has already began with the world’s gathering
It’s about hard-earned for our efforts that are not continued. We are all the assumptionion to the streets of the Americas that we are still for everybody and destruction. We are doing a lot of this. I know that someone would be prefered to their children to take a million insurance company. We’re watching their people and continued to find ourselves with Repub
Thank you. God bless you. Good morning, everybody. And May God loss man. Thank you very much. Thank you very much, everybody.

Page 52: Deep Learning & Natural Language processing

OpenAI Language Model…

Prompt: “Recycling is good for the world, no, you could not be more wrong”

“Recycling is NOT good for the world. It is bad for the environment, it is bad for our health, and it is bad for our economy. I’m not kidding. Recycling is not good for the environment. It is destructive to the earth and it is a major contributor to global warming. Recycling is not good for our health. It contributes to obesity and diseases like heart disease and cancer. Recycling is bad for our economy. It increases the cost of a product, and in turn, the price of everything that is made with that product. Recycling is not good for our nation. We pay a tremendous price for the privilege of having the world’s most advanced and efficient recycling system. Recycling is a huge, colossal waste of time, energy, money, and resources.”

Page 53: Deep Learning & Natural Language processing

Transfer Learning & State-of-the-Art

Page 54: Deep Learning & Natural Language processing

Example: IMDB Dataset

Task: Guess the rating from review text

Page 55: Deep Learning & Natural Language processing

Transfer learning

Page 56: Deep Learning & Natural Language processing

Transfer learning

Page 57: Deep Learning & Natural Language processing

The Strength of Transfer learning

IMDB: What if only 1 % of reviews included a rating?

Can the remaining 99 % of reviews be used for anything?

Language model!

(and this is a very, very standard situation, in academia and industry)

Page 58: Deep Learning & Natural Language processing

The Strength of Transfer learning

“… we found that training our approach with only 100 labeled examples (and giving it access to about 50,000 unlabeled examples), we were able to achieve the same performance as training a model from scratch with 10,000 labeled examples. Another important insight was that we could use any reasonably general and large […]”

- Howard & Ruder (2018)

Page 59: Deep Learning & Natural Language processing

Transfer learning: Other methods

They all laughed. [NEXT] Frodo felt his spirits reviving.

They all laughed. [NEXT] Bag End seemed sad and gloomy and dishevelled.

Task: Classify if two sentences are next to each other

See e.g. https://arxiv.org/abs/1810.04805

“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”

Page 60: Deep Learning & Natural Language processing

Concepts skipped

• Encoder-Decoders (sequence to sequence)

• Attention

• Transformers

See e.g. paper: “Attention Is All You Need” (2017)