Language Models are Unsupervised Multitask Learners
Author: Alec Radford
OpenAI
Presenter: Faizan Ahmad
https://qdata.github.io/deep2Read
Outline
1 Introduction
2 Related Work
3 OpenAI Generative Pre-Training (GPT) 2
4 Evaluation and Results
5 Discussion
Introduction
Pre-training and fine-tuning: the current trend in NLP
Can we do without fine tuning?
Can a single model learn multiple tasks?
Language models to the rescue
Related Work: Multitask Learning
Task Specific Architectures
Last 7-10 years
Single Model Finetuned on Different Tasks
BERT by Google, OpenAI GPT
Single Model for Multiple Tasks without Finetuning
Reading Comprehension
OpenAI Generative Pre-Training (GPT) 2
Language modeling at the core
Train a large language model and solve multiple tasks with it
Figure: GPT-1 Model
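To make the core idea concrete, here is a minimal sketch (not the paper's actual code) of solving a task with nothing but a pre-trained language model. It assumes the Hugging Face `transformers` package, and uses the paper's trick of appending "TL;DR:" to a document to induce zero-shot summarization.

```python
# Minimal sketch: one language model, many tasks, no fine-tuning.
# Uses the Hugging Face `transformers` package (an assumption; the
# slides do not name an implementation).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "Scientists confirmed the probe entered orbit after a seven-year journey."
# The paper induces summarization zero-shot by appending "TL;DR:".
prompt = article + "\nTL;DR:"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=40, do_sample=False)
# Decode only the continuation, i.e., the model's "summary".
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:]))
```

Swapping the prompt suffix (e.g., a translation cue or a question) retargets the same frozen model to a different task.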
OpenAI Generative Pre-Training (GPT) 2
Which dataset to use?
Character-based or word-based input?
How many parameters?
OpenAI Generative Pre-Training (GPT) 2: Dataset (WebText)
Existing large-scale datasets are of poor quality
A new, high-quality dataset is needed
How?
Scrape outbound links from Reddit
Keep the ones with more than 3 likes
45 million links, 8 million documents
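A toy sketch of that collection heuristic, using hypothetical in-memory data; the real pipeline fetched and cleaned page text at scale and also excluded Wikipedia.

```python
# Toy version of the WebText heuristic: keep only outbound Reddit links
# whose post earned more than 3 likes, then deduplicate URLs.
# The `posts` list is hypothetical stand-in data.
posts = [
    {"url": "https://example.com/story", "karma": 12},
    {"url": "https://example.com/spam", "karma": 1},    # filtered out
    {"url": "https://example.com/story", "karma": 57},  # duplicate URL
]

kept_urls = sorted({p["url"] for p in posts if p["karma"] > 3})
print(kept_urls)  # ['https://example.com/story']
```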
OpenAI Generative Pre-Training (GPT) 2: Input
Word level input converges slowly
Character level input does not work very well
Need a middle ground
Byte Pair Encoding to the rescue (subwords)
Figure: BPE
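The following is a compact sketch of the BPE merge loop in the classic subword formulation (Sennrich et al.): repeatedly merge the most frequent adjacent symbol pair, so the vocabulary grows from characters toward frequent subwords. Note that GPT-2 actually applies BPE over raw bytes rather than characters.

```python
# Learn BPE merges from a tiny toy vocabulary.
from collections import Counter

def learn_bpe(words, num_merges):
    """words maps space-separated symbol strings to corpus frequencies."""
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Merge the most frequent pair everywhere it occurs.
        (a, b), _ = pairs.most_common(1)[0]
        merged = {}
        for word, freq in words.items():
            new_word = word.replace(f"{a} {b}", f"{a}{b}")
            merged[new_word] = merged.get(new_word, 0) + freq
        words = merged
    return words

vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6}
print(learn_bpe(vocab, 4))  # frequent pieces like "est</w>" emerge
```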
OpenAI Generative Pre-Training (GPT) 2: Parameters
Bigger is better
Figure: Effect of Parameters
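As a back-of-the-envelope check of "bigger is better", the snippet below approximates the parameter counts of the four GPT-2 sizes from the paper. The 12 · n_layer · d_model² formula is a common rough estimate for Transformer block weights, not the paper's own accounting.

```python
# Approximate parameter counts for the four GPT-2 model sizes.
VOCAB = 50257  # GPT-2 BPE vocabulary size

def approx_params(n_layer, d_model):
    blocks = 12 * n_layer * d_model ** 2  # attention + MLP weights per stack
    embeddings = VOCAB * d_model          # token embedding matrix
    return blocks + embeddings

for n_layer, d_model in [(12, 768), (24, 1024), (36, 1280), (48, 1600)]:
    print(n_layer, d_model, f"~{approx_params(n_layer, d_model) / 1e6:.0f}M")
# Lands close to the published sizes, up to the largest ~1.5B model.
```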
Results: Zero-shot Setting
Figure: Summary Results
Results: Children's Book Test
Fill in the blanks
10 Choices
New state-of-the-art results
Figure: CBT Results
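A sketch of how a pure language model can answer CBT's 10-way cloze questions with no task-specific head: fill the blank with each candidate and keep the completion the model scores as most probable. The model choice and example are illustrative assumptions.

```python
# Zero-shot cloze scoring: rank candidate fillers by total log-probability.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def total_logprob(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL over predicted tokens
    return -loss.item() * (ids.shape[1] - 1)

context = "The cat chased the ____ around the garden."
choices = ["mouse", "cloud", "piano"]  # CBT supplies 10 such candidates
best = max(choices, key=lambda c: total_logprob(context.replace("____", c)))
print(best)
```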
Results: LAMBADA
Predict the last word of a passage that is at least 50 tokens long
New state-of-the-art results
Figure: Summary Results
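A matching sketch for LAMBADA's setup: feed the passage without its final word and take the model's greedy next-token guess. The passage here is hypothetical; real LAMBADA items require long-range context, and the target word may span several BPE pieces.

```python
# Greedy final-word prediction from the next-token distribution.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

passage = "George brought the guitar to every party, and tonight he played the"
ids = tokenizer(passage, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]  # distribution over the next token
prediction = tokenizer.decode([int(logits.argmax())]).strip()
print(prediction)  # greedy guess at the final word (its first BPE piece)
```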
Results: Qualitative Results
Discussion
New trend in deep learning for NLP
High quality text generation
Adversarial attacks on these models?