Neural Natural Language Processing

Lecture 1: Introduction to natural language processing and text categorization

Page 1:

Neural Natural Language Processing

Lecture 1: Introduction to natural language processing and text categorization

Page 2:

Plan of the lecture

● Part 1: About the course: logistics, organization, materials, etc.

● Part 2: Motivation for the course: neural NLP models, the “neural revolution” in NLP.

● Part 3: A short introduction to NLP.
● Part 4: The text classification task and a simple Naive Bayes model to solve it.

Page 3:

Lecture 1

Part 1: About the course: logistics, organization, materials, etc.

Page 4:

Acknowledgments

● Based on the materials of the following courses:
– Lectures and assignments are adapted from the “Neural Networks for Natural Language Processing” course by Nikolay Arefyev (Samsung Moscow Research Center and Moscow State University).
– Seminars are adapted from various sources, notably the NLP course of the Yandex School of Data Analysis.

– Additional sources will be indicated as needed.

Page 5:

Instructors

Lectures:

● Prof. Alexander Panchenko, Skoltech

● Dr. Nikolay Arefyev, Samsung / Moscow State University

Seminars, assignments:

● Dr. Artem Shelmanov, Skoltech

● Dr. Varvara Logacheva, Skoltech

● Olga Kozlova, MTS Innovation Center

● Viktoria Chekalina, Skoltech / Philips Innovation Center

● Irina Nikishina, Skoltech

● Daryna Dementieva, Skoltech

Final projects:

● Olga Kozlova, Alexander Panchenko, ...

Page 6:

Tentative schedule of the class

Page 7:

Assignments

● A Kaggle-style competition for the best F-score
● One task (sentiment analysis), different models

Page 8:

Assignments

● Sentiment analysis using Naive Bayes classifier.

● Sentiment analysis using Logistic Regression and a Feedforward Neural Network.

● Sentiment analysis using word and document embeddings.

● Sentiment analysis using RNNs.
● Sentiment analysis using BERT or ELMo.

Page 9:

Assignments

● Sentiment analysis using Naive Bayes classifier.

● Sentiment analysis using Logistic Regression and a Feedforward Neural Network.

● Sentiment analysis using word and document embeddings.

● Sentiment analysis using RNNs.
● Sentiment analysis using BERT or ELMo.

Model complexity, and presumably performance, increase down this list.

Page 10:

Assignments

Evaluation criteria:
● Results: what was the rank of your solution among other submissions?
● Reproducibility: can we get the results by running your script?
● Readability: how easy is it to understand your code?
● Timing: did you deliver on time?

Page 11:

Final project

Various options:
● Find an interesting task and propose a (neural) NLP model to solve it.
● Propose a new NLP task, or a variant of an existing one, and come up with a baseline for its solution.
● Take a recently published NLP paper and replicate its results. Discuss the outcomes.

Page 12:

Final project

● The list of topics can be found here: http://bit.ly/nnlp_topics
– To be further extended.
● Can be done in a group of up to 3 people.
● You can also propose your own topic.
● To claim a topic, enter your name and topic here: http://bit.ly/nnlp_topics_distribution
● It is advised to ask an instructor during a seminar about the suitability of a topic (but this is not a strict requirement).

Page 13:

Final project

Requirements:
● The outcome of a project is a Jupyter notebook which describes the entire experiment:
– It should be readable (with supporting text: task, motivation, discussion);
– It should be executable: we should be able to reproduce your results on the first try.
● Due to time constraints there will be no oral presentation: communicate what you have done in code, text, formulas, tables, and plots.
● Deadline: 19.12.2019 EoD.
● We suggest starting ASAP!

Page 14:

Final project

Evaluation criteria:
● Relevance of the task: are you tackling a relevant research problem? Did you do something that has not been done yet (at least in some aspect), or was a solution already available on GitHub before you started?
● Readability: can we easily understand what has been done?
● Reproducibility: can we get the same numbers and plots?
● Results: did you manage to improve something (or gain some interesting insights from negative results)?
● Originality: how innovative was your approach?
● Timing: did you deliver on time?

Page 15:

Exam

● An exam is not obvious to organize in our case:
– e.g. the Deep Learning course has no exam.
● Mostly questions about various models:
– Structure,
– Applications,
– Training methods,
– Objectives.

Page 16:

Cost of various activities

● Assignments: 40%
● Final project: 40%
● Exam: 20%

● If you have already completed a similar NLP course and/or have a publication of at least workshop level at a major NLP conference, you can do a final project worth 80% and skip the assignments.
– The topic will be provided by the instructor (less freedom in topic choice).
– The load is expected to be the same as assignments + final project.

Page 17:

Prerequisites

● Basic concepts from Calculus, Linear Algebra, Probability, Statistics, and Computer Science.
● Fundamentals of Machine Learning:
– Recommended machine learning courses: https://www.coursera.org/learn/machine-learning, http://cs229.stanford.edu
– … or an analogous course on ML and DL at Skoltech!
● Python programming language:
– Programming assignments are in Python;
– The de facto standard for ML/DL/NLP.
● This is NOT a generic machine learning / deep learning course:
– Some introductory lectures will give a reminder of the basics, though;
– We rather focus on specific architectures of neural networks in NLP.

Page 18:

Outline of the course topics

Page 19:

Lecture logistics

● 45 minutes of lecture
● 10 minutes break
● 45 minutes of lecture
● 10 minutes break
● 45 minutes of lecture

Page 20:

Let us dive right in!

Image source: http://fastml.com/introduction-to-pointer-networks

Page 21:

Lecture 1

● Part 1: About the course: logistics, organization, materials, etc.

● Part 2: Motivation for the course: neural NLP models, the “neural revolution” in NLP.

● Part 3: A short introduction to NLP.
● Part 4: The text classification task and a simple Naive Bayes model to solve it.

Page 22:

Natural Language

● Language is what makes us different from other living beings:
– Allowing the sharing and accumulation of knowledge;
– Allowing us to organize a society in a complex way;
– ...

Image source: Wikipedia

Page 23:

Natural Language

Image source: Wikipedia

Page 24:

Natural Language Processing (NLP)

● NLP is a subfield of Artificial Intelligence (AI) which relies on:
– Computer Science (recently, most notably, machine learning);
– Linguistics.
● The goal is to make computers understand and generate natural language to perform useful tasks, such as:
– Translating a text from one language to another, e.g. Yandex Translate;
– Searching for and extracting information:
● Search engines, e.g. Google;
● Question answering systems, e.g. IBM Watson;
– Dialogue systems:
● Answer questions, execute voice commands, voice typing;
● Samsung Bixby, Apple Siri, Google Assistant, etc.
● Language understanding is an “AI-complete” problem:
– We hope to train computers to extract the signal relevant for a particular task.

Page 25:

More NLP Applications

● Dialog systems for customer support

● Sentiment analysis

● Topic categorization

● Spell checking

● Summarization

● Fact extraction

Page 26:

Traditional NLP Pipeline

Source of the slide: Socher & Manning, cs224n

Page 27:

A glance at the history of Natural Language Processing

A part of the table of contents of the Jurafsky & Martin (2009) textbook, augmented with sections 1.6.7 and 1.6.8

Page 28:

ML vs. DL: Function family F?

Source: Socher, Manning. CS224n, 2017

Page 29:

Good old-fashioned ML

Source: Socher, Manning. CS224n, 2017

Page 30:

Deep Learning

Source: Socher, Manning. CS224n, 2017

Page 31:

Why Deep Learning?

Source: Socher, Manning. CS224n, 2017

Page 32:

Why now?

Source: Socher, Manning. CS224n, 2017

Page 33:

Speech recognition

Source: Hinton, Neural Networks for Machine Learning @ Coursera, 2012 (Lecture 1, slide 13)

>30% WER improvement

Page 34:

Speech recognition

Source: Hinton, Bengio & LeCun, Deep Learning, NIPS’2015 Tutorial, slide 69

Page 35:

ImageNet

● > 1.4M images from the web, 1000 classes

NVIDIA CES 2016 Press Conference, slide 10

● Krizhevsky, Sutskever, Hinton, 2012:
– 74.2% → 83.6% Top 5 accuracy
– 25.8% → 16.4% Top 5 error rate
– 36% error reduction (fixed every third error)

Page 36:

ImageNet Top 5 Error Rate

● Human error:
– 5.1% (trained and patient)
– 15% (non-trained, less patient)

● Best result in 2016: 3.08% (Inception-v4 + 3×ResNet ensemble)

[Fei-Fei Li & Justin Johnson & Serena Yeung, cs231n, 2017. Lecture 1]

[Andrej Karpathy, What I learned from competing against a ConvNet on ImageNet, 2014]

Page 37:

ImageNet – Learnt features

Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks

Page 38:

ImageNet – Learnt features

Matthew D. Zeiler and Rob Fergus, Visualizing and Understanding Convolutional Networks

Page 39:

Source: Jawahar G., Sagot B., Seddah D. What does BERT learn about the structure of language? ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Jul 2019, Florence, Italy

What does BERT learn about the structure of language?

Page 40:

Source: Jawahar G., Sagot B., Seddah D. What does BERT learn about the structure of language? ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Jul 2019, Florence, Italy

What does BERT learn about the structure of language?

Page 41:

The ongoing “neural revolution” in NLP: from Collobert to BERT

What problems Neural NLP is addressing:
● The need for feature engineering.
● The curse of dimensionality:
– SVD and NMF can be used to obtain embeddings, but these algorithms do not scale well to large datasets.
● The need to develop a custom algorithm / model for each task separately.
– Instead, the idea is to develop a single model for any NLP task.

Page 42:

The ongoing “neural revolution” in NLP: from Collobert to BERT

What problems Neural NLP is addressing:
● The need for feature engineering.

Page 43:

The ongoing “neural revolution” in NLP: from Collobert to BERT

What problems Neural NLP is addressing:
● The curse of dimensionality:

Page 44:

A simpler and more generic NLP pipeline

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI

Page 45:

A simpler and more generic NLP pipeline … which yields good results

Step 1: Embed

An embedding table maps long, sparse, binary vectors into shorter, dense, continuous vectors.

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI

Page 46:

A simpler and more generic NLP pipeline … which yields good results

Step 2: Encode

Given a sequence of word vectors, the encode step computes a representation that I'll call a sentence matrix, where each row represents the meaning of each token in the context of the rest of the sentence.

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI

Page 47:

A simpler and more generic NLP pipeline … which yields good results

Step 3: Attend

The attend step reduces the matrix representation produced by the encode step to a single vector, so that it can be passed on to a standard feed-forward network for prediction.

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI

Page 48:

A simpler and more generic NLP pipeline … which yields good results

Step 4: Predict

Once the text or pair of texts has been reduced into a single vector, we can learn the target representation — a class label, a real value, a vector, etc.

Source: https://explosion.ai/blog/deep-learning-formula-nlp?ref=Welcome.AI
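To make the recipe concrete, here is a minimal PyTorch sketch of all four steps (sizes are illustrative, and mean-pooling stands in for a real attention mechanism):

import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)            # Step 1: Embed
        self.encode = nn.LSTM(emb_dim, hidden, batch_first=True)  # Step 2: Encode
        self.predict = nn.Linear(hidden, n_classes)               # Step 4: Predict

    def forward(self, token_ids):            # (batch, seq_len) of token ids
        vectors = self.embed(token_ids)      # (batch, seq_len, emb_dim)
        states, _ = self.encode(vectors)     # the "sentence matrix"
        pooled = states.mean(dim=1)          # Step 3: Attend (here: mean-pooling)
        return self.predict(pooled)          # (batch, n_classes) logits

logits = TextClassifier()(torch.randint(0, 10_000, (4, 12)))
print(logits.shape)  # torch.Size([4, 2])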

Page 49:

Page 50:

Source: Socher, Manning. CS224n, 2017

Page 51:

MT vs. Human translation

https://www.eff.org/ai/metrics#Translation

Page 52:

Google Neural Machine Translation (NMT) System

Source: Socher, Manning. CS224n, 2017

Page 53:

GLUE benchmark

Source: Wang et al. GLUE: A Multi-task benchmark and analysis platform for Natural Language Understanding, 2019

Page 54:

GLUE leaderboard

Source: https://gluebenchmark.com/leaderboard

Page 55:

Source: https://super.gluebenchmark.com/leaderboard

SuperGLUE leaderboard

Page 56:

Lecture 1

● Part 1: About the course: logistics, organization, materials, etc.

● Part 2: Motivation for the course: neural NLP models, the “neural revolution” in NLP.

● Part 3: A short introduction to NLP.
● Part 4: The text classification task and a simple Naive Bayes model to solve it.

Materials in this part are adapted from: Rao, D. & McMahan, B. (2019): Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning. O’Reilly. 1st Edition. ISBN-13: 978-1491978238

Page 57:

A Quick Tour of Traditional NLP

● Natural language processing (NLP) and computational linguistics (CL) are two areas of computational study of human language:
– NLP – how to build a technical system which knows something about (i.e. performs processing of) human language: solving practical problems involving language, such as:
● information extraction;
● automatic speech recognition;
● machine translation;
● sentiment analysis;
● question answering;
● summarization.
– CL – how to learn about some aspect of language using various mathematical and computational methods, models, and algorithms: it employs computational methods to understand properties of human language.
● How do we understand language?
● How do we produce language?
● How do we learn languages?
● What relationships do languages have with one another?


Page 59:

Corpora, Tokens, and Types

● NLP methods, be they classic or modern, begin with a text dataset, also called a corpus (plural: corpora).
– A corpus usually contains raw text (in ASCII or UTF-8) and any metadata associated with the text.
● The raw text is a sequence of characters (bytes), but most of the time it is useful to group those characters into contiguous units called tokens.
● Types are the unique tokens present in a corpus. The set of all types in a corpus is its vocabulary or lexicon.

Page 60:

Corpora, Tokens, and Types

Page 61:

Tokenization

● The process of breaking a text down into tokens is called tokenization.
– There are six tokens in the sentence “Mary slapped the green witch.”; “.” is one of them.
– Tokenization can become more complicated than simply splitting text on non-alphanumeric characters.

Page 62:

Tokenization: the case of Turkish

Page 63:

Tokenization: Twitter data

● Tokenizing tweets involves preserving hashtags and @handles, and segmenting smilies such as :-) and URLs as one unit.

● Those decisions can significantly affect accuracy in practice!

Page 64:

Tokenization

● Using SpaCy

● Using NLTK
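The code screenshots from the original slide are not reproduced here; a minimal sketch of both (assuming nltk and spacy are installed, along with the small English model en_core_web_sm) could be:

import nltk
import spacy

nltk.download("punkt", quiet=True)  # tokenizer data for NLTK

text = "Mary slapped the green witch."

nlp = spacy.load("en_core_web_sm")          # spaCy: tokenization is the first
print([token.text for token in nlp(text)])  # step of its language pipeline

print(nltk.word_tokenize(text))             # NLTK: a standalone word tokenizer
# both print: ['Mary', 'slapped', 'the', 'green', 'witch', '.']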

Page 65:

Feature engineering

● Feature engineering is the process of understanding the linguistics of a language and applying it to solving an NLP problem.
● This is something that we keep to a minimum in neural NLP, for:
– portability of models across languages;
– applicability to more tasks;
– avoiding the need for expert knowledge.
● When building real-world production systems, feature engineering is indispensable, despite recent claims to the contrary.
– Will it change in the future?

Page 66:

Unigrams, Bigrams, Trigrams, …, N-grams

● N-grams are fixed-length (n) consecutive token sequences occurring in the text:
– A bigram has two tokens;
– A unigram has one token, etc.
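In code, extracting n-grams is a one-liner; a minimal sketch (the helper name n_grams is ours, not from the slides):

def n_grams(tokens, n):
    """Return all n-grams (as tuples) of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(n_grams(["mary", "slapped", "the", "green", "witch"], 2))
# [('mary', 'slapped'), ('slapped', 'the'), ('the', 'green'), ('green', 'witch')]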

Page 67:

Unigrams, Bigrams, Trigrams, …, N-grams

● When subword information itself carries useful information, one might want to generate character N-grams:
– For example, the suffix “ol” in “methanol” indicates it is a kind of alcohol.

Page 68:

Lemmas and Stems

● Lemmas are root forms of words.
● The verb fly can be inflected into many different word forms: flies, flew, flown, flying.
● Lemmatization is reducing tokens to their lemmas, e.g. to keep the dimensionality of the vector representation low.

Page 69:

Lemmas and Stems

● Stemming is the use of handcrafted rules to strip the endings of words, reducing them to a common form called stems.
– Cons: quality; the “poor man’s lemmatization”.
– Pros: efficiency; it was (and is) popular in information retrieval for this reason.
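A minimal sketch contrasting the two (assuming spaCy with en_core_web_sm and NLTK are installed; exact outputs depend on the model and stemmer versions):

import spacy
from nltk.stem import PorterStemmer

nlp = spacy.load("en_core_web_sm")
# Lemmatization: model-based reduction to dictionary forms
print([(t.text, t.lemma_) for t in nlp("the witches were flying")])
# e.g. [('the', 'the'), ('witches', 'witch'), ('were', 'be'), ('flying', 'fly')]

stemmer = PorterStemmer()  # stemming: handcrafted suffix-stripping rules
print([stemmer.stem(w) for w in ["flying", "flies", "flowing", "witches"]])
# e.g. ['fli', 'fli', 'flow', 'witch']: crude, hence "poor man's lemmatization"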

Page 70:

Categorizing Sentences and Documents

● One of the earliest applications of NLP:
– Topic categorization, predicting the sentiment of reviews, filtering spam emails, language identification, etc.

Page 71:

Categorizing Sentences and Documents: TF representation

Page 72:

TF-IDF representation: TF(w) ⋅ IDF(w)

● The TF representation weights a word w proportionally to its frequency:
– Common words do not add anything to understanding;
– A rare word is likely to be indicative.
● TF-IDF penalizes common tokens and rewards rare tokens in the vector representation:
– nw is the number of documents containing the word w, and N is the total number of documents.
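The formula from the original slide did not survive extraction; with the common definition IDF(w) = log(N / nw), a toy computation (data invented for illustration) looks like:

import math
from collections import Counter

docs = [["time", "flies", "like", "an", "arrow"],
        ["fruit", "flies", "like", "a", "banana"]]
N = len(docs)

def tf_idf(doc):
    tf = Counter(doc)  # raw term frequencies TF(w)
    return {w: f * math.log(N / sum(1 for d in docs if w in d))  # TF(w) * IDF(w)
            for w, f in tf.items()}

print(tf_idf(docs[0]))  # shared words ("flies", "like") get weight 0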

Page 73:

TF-IDF representation: TF(w) ⋅ IDF(w)

Page 74:

Categorizing Words: POS Tagging

● One can label not only documents but also individual words or tokens:
– Part-of-speech (POS) tagging;
– Morphological analysis, etc.

Page 75:

Categorizing Spans: Chunking and Named Entity Recognition

● Label a span of text, i.e. a contiguous multi-token sequence:
– Chunking:
[NP Mary] [VP slapped] [NP the green witch]
– Named entity recognition:
[PER Mary Johnson] slapped the green witch

Page 76:

Categorizing Spans: Chunking and Named Entity Recognition

● Chunking:

● Named entity recognition:

Page 77:

Structure of sentences: identifying relations between phrases

A constituent parse of the sentence “Mary slapped the green witch.”

Page 78:

Structure of sentences: identifying relations between phrases

A dependency parse of the sentence “Mary slapped the green witch.”

Page 79:

Word Senses and Semantics

● Words can have multiple senses:
– WordNet;
– Automatic discovery of senses from context;
– ...

Page 80:

Lecture 1

● Part 1: About the course: logistics, organization, materials, etc.

● Part 2: Motivation for the course: neural NLP models, the “neural revolution” in NLP.

● Part 3: A short introduction to NLP.
● Part 4: The text classification task and a simple Naive Bayes model to solve it.

Materials in this part are adapted from: Jurafsky & Martin (2019): Speech and Language Processing (3rd edition). https://web.stanford.edu/~jurafsky/slp3/

Page 81:

Who wrote which Federalist papers?

● 1787-8: anonymous essays try to convince New York to ratify the U.S. Constitution: Jay, Madison, Hamilton.
● The authorship of 12 of the letters is in dispute.
● 1963: solved by Mosteller and Wallace using Bayesian methods.

James Madison Alexander Hamilton

Page 82:

Positive or negative movie review?

● Unbelievably disappointing
● Full of zany characters and richly applied satire, and some great plot twists
● This is the greatest screwball comedy ever filmed
● It was pathetic. The worst part about it was the boxing scenes.

Page 83:

What is the subject of this article?

Given a MEDLINE article: which label from the MeSH Subject Category Hierarchy applies?

• Antagonists and Inhibitors
• Blood Supply
• Chemistry
• Drug Therapy
• Embryology
• Epidemiology
• …

Page 84:

Text Classification

● Assigning subject categories, topics, or genres
● Spam detection
● Authorship identification
● Age/gender identification
● Language identification
● Sentiment analysis
● …

Page 85:

Text Classification: definition

Input:
• a document d
• a fixed set of classes C = {c1, c2, …, cJ}

Output: a predicted class c ∈ C

Page 86:

Classification Methods: Hand-coded rules

● Rules based on combinations of words or other features
– spam: black-list-address OR (“dollars” AND “have been selected”)
● Accuracy can be high
– If the rules are carefully refined by an expert
● But building and maintaining these rules is expensive

Page 87:

Classification Methods: Supervised Machine Learning

• Input:
• a document d
• a fixed set of classes C = {c1, c2, …, cJ}
• a training set of m hand-labeled documents (d1, c1), …, (dm, cm)

• Output:
• a learned classifier γ: d → c

Page 88:

Classification Methods: Supervised Machine Learning

● Any kind of classifier:
– Naïve Bayes
– Logistic regression
– Support-vector machines
– k-Nearest Neighbors
– …
– Deep neural networks

Page 89:

Naïve Bayes Intuition

● A simple (“naïve”) classification method based on Bayes rule
● Relies on a very simple representation of the document:
– Bag of words

Page 90:

The Bag of Words Representation
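The illustration from the slide is not preserved; in code, a bag of words is simply a mapping from each token to its count, with word order discarded:

from collections import Counter

def bag_of_words(tokens):
    return Counter(tokens)  # token -> count; positions are ignored

print(bag_of_words("the witch slapped the green witch".split()))
# Counter({'the': 2, 'witch': 2, 'slapped': 1, 'green': 1})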

Page 91:

The Bag of Words Representation

Page 92:

Bayes’ Rule Applied to Documents and Classes

• For a document d and a class c:

P(c | d) = P(d | c) P(c) / P(d)

Page 93:

Naïve Bayes Classifier

MAP is “maximum a posteriori” = the most likely class.

cMAP = argmax_{c∈C} P(c | d)

     = argmax_{c∈C} P(d | c) P(c) / P(d)        (Bayes rule)

     = argmax_{c∈C} P(d | c) P(c)               (dropping the denominator)

     = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)   (document d represented as features x1, …, xn)

Page 94:

Naïve Bayes Classifier

cMAP = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)

● P(c): how often does this class occur? We can just count the relative frequencies in a corpus.
● P(x1, x2, …, xn | c): O(|X|^n ⋅ |C|) parameters; could only be estimated if a very, very large number of training examples was available.

Page 95:

Multinomial Naïve Bayes Independence Assumptions

P(x1, x2, …, xn | c)

• Bag of Words assumption: assume position doesn’t matter.
• Conditional Independence: assume the feature probabilities P(xi | cj) are independent given the class c:

P(x1, …, xn | c) = P(x1 | c) ⋅ P(x2 | c) ⋅ P(x3 | c) ⋅ … ⋅ P(xn | c)

Page 96:

Multinomial Naïve Bayes Classifier

cMAP = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)

cNB = argmax_{c∈C} P(cj) ∏_{x∈X} P(x | c)

Applying Multinomial Naive Bayes Classifiers to Text Classification:

positions ← all word positions in the test document

cNB = argmax_{cj∈C} P(cj) ∏_{i∈positions} P(xi | cj)

Page 97:

Learning the Multinomial Naïve Bayes Model

• First attempt: maximum likelihood estimates
• simply use the frequencies in the data:

P̂(cj) = doccount(C = cj) / Ndoc

P̂(wi | cj) = count(wi, cj) / Σ_{w∈V} count(w, cj)

i.e. the fraction of times word wi appears among all words in documents of topic cj:
• Create a mega-document for topic j by concatenating all docs in this topic
• Use the frequency of w in the mega-document

Page 98:

Problem with Maximum Likelihood

• What if we have seen no training documents with the word fantastic classified in the topic positive (thumbs-up)?

P̂(“fantastic” | positive) = count(“fantastic”, positive) / Σ_{w∈V} count(w, positive) = 0

• Zero probabilities cannot be conditioned away, no matter the other evidence:

cMAP = argmax_c P̂(c) ∏_i P̂(xi | c)

Page 99:

Laplace (add-1) smoothing for Naïve Bayes

Maximum likelihood estimate:

P̂(wi | c) = count(wi, c) / Σ_{w∈V} count(w, c)

Add-1 estimate:

P̂Laplace(wi | c) = (count(wi, c) + 1) / (Σ_{w∈V} count(w, c) + |V|)

Page 100:

Multinomial Naïve Bayes: Learning

• From the training corpus, extract the Vocabulary
• Calculate the P(cj) terms:
– For each cj in C do:
docsj ← all docs with class = cj
P(cj) ← |docsj| / |total # documents|
• Calculate the P(wk | cj) terms:
P(wk | cj) ← (nk + α) / (n + α ⋅ |Vocabulary|)
(nk: occurrences of wk in the mega-document of class cj; n: its total word count)
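Putting the formulas above together, a minimal pure-Python sketch of training and prediction (with add-α smoothing and log-probabilities for numerical stability; the function names are ours, not a reference implementation):

import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: parallel list of class names."""
    vocab = {w for d in docs for w in d}
    prior, counts, total = {}, {}, {}
    for c in set(labels):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        prior[c] = math.log(len(class_docs) / len(docs))       # log P(c)
        counts[c] = Counter(w for d in class_docs for w in d)  # "mega-document"
        total[c] = sum(counts[c].values())

    def log_pw(w, c):  # log P(w|c) with add-alpha smoothing
        return math.log((counts[c][w] + alpha) / (total[c] + alpha * len(vocab)))

    def predict(doc):  # unknown test words are simply skipped here
        scores = {c: prior[c] + sum(log_pw(w, c) for w in doc if w in vocab)
                  for c in prior}
        return max(scores, key=scores.get)

    return predict

predict = train_nb([["great", "plot"], ["boring", "plot"]], ["pos", "neg"])
print(predict(["great", "movie"]))  # -> 'pos'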

Page 101:

Summary: Naive Bayes is Not So Naive

● Very fast, low storage requirements
● Robust to irrelevant features
– Irrelevant features cancel each other out without affecting the results
● Very good in domains with many equally important features
● Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem
● A good, dependable baseline for text classification
● But we will see other classifiers that give better accuracy

Page 102:

Evaluation: Precision and Recall

● The 2-by-2 contingency table:

             | correct | not correct
selected     | tp      | fp
not selected | fn      | tn

● Precision: % of selected items that are correct
● Recall: % of correct items that are selected
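In the notation of the table: Precision = tp / (tp + fp) and Recall = tp / (tp + fn).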

Page 103:

Evaluation: F1 score

• A combined measure that assesses the P/R tradeoff is the F measure (a weighted harmonic mean):

F = 1 / (α ⋅ (1/P) + (1 − α) ⋅ (1/R)) = (β² + 1) ⋅ P ⋅ R / (β² ⋅ P + R)

• The harmonic mean is a very conservative average.
• People usually use the balanced F1 measure, i.e. with β = 1 (that is, α = ½):

F1 = 2PR / (P + R)
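For example, P = 0.5 and R = 1.0 give F1 = 2 ⋅ 0.5 ⋅ 1.0 / 1.5 ≈ 0.67, well below the arithmetic mean of 0.75: the harmonic mean punishes imbalance between P and R.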

Page 104:

Evaluation: Confusion matrix c

• For each pair of classes <c1, c2>: how many documents from c1 were incorrectly assigned to c2?
• c3,2: 90 wheat documents incorrectly assigned to poultry

Docs in test set | Assigned UK | Assigned poultry | Assigned wheat | Assigned coffee | Assigned interest | Assigned trade
True UK          | 95 | 1  | 13 | 0  | 1  | 0
True poultry     | 0  | 1  | 0  | 0  | 0  | 0
True wheat       | 10 | 90 | 0  | 1  | 0  | 0
True coffee      | 0  | 0  | 0  | 34 | 3  | 7
True interest    | -  | 1  | 2  | 13 | 26 | 5
True trade       | 0  | 0  | 2  | 14 | 5  | 10

Page 105:

Evaluation: per class measures

Recall: fraction of docs in class i classified correctly:

Recall_i = cii / Σj cij

Precision: fraction of docs assigned class i that are actually about class i:

Precision_i = cii / Σj cji

Accuracy (1 − error rate): fraction of docs classified correctly:

Accuracy = Σi cii / Σi Σj cij

Page 106:

Development Test Sets and Cross-validation

• Metric: P/R/F1 or Accuracy
• Unseen test set:
– avoids overfitting (“tuning to the test set”)
– a more conservative estimate of performance
• Cross-validation over multiple splits:
– handles sampling errors from different datasets
– pool the results over each split
– compute the pooled dev set performance

[Diagram: the corpus is split into a Training set, a Development (Dev) Test set, and a Test set; in cross-validation the Dev Test fold rotates across splits.]
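As a sketch of pooling scores over splits with scikit-learn (toy data invented for illustration; any classifier with fit/predict works the same way):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great plot", "boring plot", "great movie", "pathetic boxing scenes"]
labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(model, texts, labels, cv=2, scoring="f1_macro")
print(scores.mean())  # pooled performance estimate over the splits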