Natural Language Processing (CSE 490U): Introduction
Noah Smith
© 2017
University of Washington
[email protected]
January 4, 2017
What is NLP?
NL ∈ {Mandarin Chinese, English, Spanish, Hindi, ..., Lushootseed}
Automation of:
- analysis (NL → R)
- generation (R → NL)
- acquisition of R from knowledge and data
What is R?
[Figure: analysis maps NL to R; generation maps R to NL.]
What does it mean to “know” a language?
Levels of Linguistic Knowledge
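[Figure: levels of linguistic knowledge, from "shallower" to "deeper": phonetics and phonology (speech) or orthography (text), then morphology, syntax, semantics, pragmatics, and discourse. The diagram also labels "lexemes".]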
Orthography
ลกศษยวดกระทงยงยอปดถนนทางขนไปนมสการพระบาทเขาคชฌกฏ หวดปะทะกบเจาถนทออกมาเผชญหนาเพราะเดอดรอนสญจรไมได ผวจ.เรงทกฝายเจรจา กอนทชอเสยงของจงหวดจะเสยหายไปมากกวาน พรอมเสนอหยดจดงาน 15 วน....
Morphology
uygarlaştıramadıklarımızdanmışsınızcasına
“(behaving) as if you are among those whom we could not civilize”
TIFGOSH ET HA-YELED BA-GAN
“you will meet the boy in the park”
unfriend, Obamacare, Manfuckinghattan
The Challenges of “Words”
- Segmenting text into words (e.g., Thai example)
- Morphological variation (e.g., Turkish and Hebrew examples)
- Words with multiple meanings: bank, mean
- Domain-specific meanings: latex
- Multiword expressions: make a decision, take out, make up, bad hombres
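The first challenge above, segmenting text into words, can be sketched with greedy maximum matching: scan left to right and take the longest lexicon entry that fits at each position. This is only an illustration and not a method taken from the slides; the function name `max_match` and the tiny English lexicon are assumptions invented for the example.

```python
def max_match(text, lexicon):
    """Greedy left-to-right longest-match segmentation (a toy sketch)."""
    words = []
    i = 0
    max_len = max(len(w) for w in lexicon)
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon entry starts here: emit a single character.
            words.append(text[i])
            i += 1
    return words


# Hypothetical lexicon, chosen only to expose a failure mode.
LEXICON = {"the", "them", "men", "dine", "here"}

# Intended reading: "the men dine here".
print(max_match("themendinehere", LEXICON))
# prints ['them', 'e', 'n', 'dine', 'here']
```

The greedy choice of "them" blocks the intended segmentation, which hints at why segmentation usually needs more context than a word list.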
Example: Part-of-Speech Tagging
ikr smh he asked fir yo last name
so he can add u on fb lololol

token    tag  category      gloss
ikr      !    interjection  I know, right
smh      G    acronym       shake my head
he       O    pronoun
asked    V    verb
fir      P    preposition   for
yo       D    determiner    your
last     A    adjective
name     N    noun
so       P    preposition
he       O    pronoun
can      V    verb
add      V    verb
u        O    pronoun       you
on       P    preposition
fb       ^    proper noun   Facebook
lololol  !    interjection  laugh out loud
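One way to hold the output of a tagger for this example is a list of (token, tag) pairs. The sketch below transcribes the tags shown on the slide and counts how often each category occurs. The names `TAGGED`, `TAG_NAMES`, and `tag_counts` are made up for illustration, "^" is assumed to be the proper-noun symbol, and nothing here does any actual tagging.

```python
from collections import Counter

# (token, tag) pairs transcribed from the slide. This is a fixed
# lookup for illustration, not a tagger.
TAGGED = [
    ("ikr", "!"), ("smh", "G"), ("he", "O"), ("asked", "V"),
    ("fir", "P"), ("yo", "D"), ("last", "A"), ("name", "N"),
    ("so", "P"), ("he", "O"), ("can", "V"), ("add", "V"),
    ("u", "O"), ("on", "P"), ("fb", "^"), ("lololol", "!"),
]

# Category names as labeled on the slide.
TAG_NAMES = {
    "!": "interjection", "G": "acronym", "O": "pronoun", "V": "verb",
    "P": "preposition", "D": "determiner", "A": "adjective",
    "N": "noun", "^": "proper noun",
}


def tag_counts(tagged):
    """Count how often each category occurs in a tagged sequence."""
    return Counter(TAG_NAMES[tag] for _, tag in tagged)


counts = tag_counts(TAGGED)
for name, count in counts.most_common():
    print(f"{name:12} {count}")
```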
Syntax
[NP [NP [Adj. natural] [Noun language]] [Noun processing]]
vs.
[NP [Adj. natural] [NP [Noun language] [Noun processing]]]
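The two analyses of "natural language processing" are the only two binary bracketings of a three-word phrase. A minimal sketch, assuming unlabeled binary trees stored as nested tuples (the helper names `bracketings` and `show` are invented here), enumerates them and shows how quickly the count grows with length:

```python
def bracketings(words):
    """Return every binary bracketing of a word list as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    trees = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                trees.append((left, right))
    return trees


def show(tree):
    """Render a nested-tuple tree in bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(show(child) for child in tree) + "]"


for tree in bracketings(["natural", "language", "processing"]):
    print(show(tree))
# prints:
# [natural [language processing]]
# [[natural language] processing]

# The number of bracketings for phrases of 1 to 6 words.
print([len(bracketings(["w"] * n)) for n in range(1, 7)])
# prints [1, 1, 2, 5, 14, 42]
```

The counts follow the Catalan numbers, so even short phrases admit many candidate structures before any grammar or meaning rules some of them out.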
Morphology + Syntax
A ship-shipping ship, shipping shipping-ships.
Syntax + Semantics
We saw the woman with the telescope wrapped in paper.
- Who has the telescope?
- Who or what is wrapped in paper?
- An event of perception, or an assault?
Semantics
Every fifteen minutes a woman in this country gives birth. Our job is to find this woman, and stop her!
– Groucho Marx
Can R be “Meaning”?
Depends on the application!
- Giving commands to a robot
- Querying a database
- Reasoning about relatively closed, grounded worlds
Harder to formalize:
- Analyzing opinions
- Talking about politics or policy
- Ideas in science
Why NLP is Hard
1. Mappings across levels are complex.
   - A string may have many possible interpretations in different contexts, and resolving ambiguity correctly may rely on knowing a lot about the world.
   - Richness: any meaning may be expressed many ways, and there are immeasurably many meanings.
   - Linguistic diversity across languages, dialects, genres, styles, ...
2. Appropriateness of a representation depends on the application.
3. Any R is a theorized construct, not directly observable.
4. There are many sources of variation and noise in linguistic input.
Desiderata for NLP Methods (ordered arbitrarily)
1. Sensitivity to a wide range of the phenomena and constraints in human language
2. Generality across different languages, genres, styles, and modalities
3. Computational efficiency at construction time and runtime
4. Strong formal guarantees (e.g., convergence, statistical efficiency, consistency, etc.)
5. High accuracy when judged against expert annotations and/or task-specific performance
NLP ≟ Machine Learning
- To be successful, a machine learner needs bias/assumptions; for NLP, that might be linguistic theory/representations.
- R is not directly observable.
- Early connections to information theory (1940s)
- Symbolic, probabilistic, and connectionist ML have all seen NLP as a source of inspiring applications.
NLP ≟ Linguistics
- NLP must contend with NL data as found in the world
- NLP ≈ computational linguistics
- Linguistics has begun to use tools originating in NLP!
Fields with Connections to NLP
- Machine learning
- Linguistics (including psycho-, socio-, descriptive, and theoretical)
- Cognitive science
- Information theory
- Logic
- Theory of computation
- Data science
- Political science
- Psychology
- Economics
- Education
The Engineering Side
- Application tasks are difficult to define formally; they are always evolving.
- Objective evaluations of performance are always up for debate.
- Different applications require different R.
- People who succeed in NLP for long periods of time are foxes, not hedgehogs.
Today’s Applications
- Conversational agents
- Information extraction and question answering
- Machine translation
- Opinion and sentiment analysis
- Social media analysis
- Rich visual understanding
- Essay evaluation
- Mining legal, medical, or scholarly literature
Factors Changing the NLP Landscape (Hirschberg and Manning, 2015)
- Increases in computing power
- The rise of the web, then the social web
- Advances in machine learning
- Advances in understanding of language in social context
Administrivia
Course Website
http://courses.cs.washington.edu/courses/cse490u/17wi/
Your Instructors
Noah (instructor):
- UW CSE professor since 2015, teaching NLP since 2006, studying NLP since 1998, first NLP program in 1991
- Research interests: machine learning for structured problems in NLP, NLP for social science
Joshua (TA):
- Linguistics Ph.D. student
- Research interests: computational resources for Lushootseed
Sam (TA):
- Computer Science Ph.D. student
- Research interests: machine learning for natural language semantics
Outline of CSE 490U
1. Probabilistic language models, which define probability distributions over text passages. (about 1 week)
2. Text classifiers, which infer attributes of a piece of text by “reading” it. (about 1 week)
3. Sequence models (about 1.5 weeks)
4. Parsers (about 2 weeks)
5. Semantics (about 2 weeks)
6. Machine translation (about 1 week)
7. Another advanced topic (about 1 week, time permitting)
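As a preview of the first topic, here is a minimal sketch of a model that assigns probabilities to text passages: a bigram model with unsmoothed maximum-likelihood estimates. The slides do not specify this particular model; the three-sentence corpus, the function names, and the `<s>`/`</s>` boundary markers are assumptions for the example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts, with boundary markers."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def sentence_probability(sentence, bigrams, contexts):
    """Probability of a sentence under unsmoothed bigram estimates."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:
            return 0.0  # context never seen in training
        prob *= bigrams[(prev, cur)] / contexts[prev]
    return prob


# Hypothetical toy corpus.
CORPUS = [
    "we saw the woman",
    "we saw the telescope",
    "the woman saw the telescope",
]
bigrams, contexts = train_bigram(CORPUS)
print(sentence_probability("we saw the woman", bigrams, contexts))
print(sentence_probability("we saw the paper", bigrams, contexts))
```

The second sentence gets probability zero because "paper" never appears in the toy corpus, which is the kind of problem smoothing is meant to address.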
Readings
- Main reference text: Jurafsky and Martin, 2008, some chapters from new edition (Jurafsky and Martin, forthcoming) when available
- Course notes from others
- Research articles
Lecture slides will include references for deeper reading on some topics.
Evaluation
- Approximately five assignments (A1–5), completed individually (50%).
  - Some pencil and paper, mostly programming
  - Graded mostly on attempt, not correctness
- Quizzes (15%), given without warning in class, in quiz sections, or online
- An exam (30%), to take place at the end of the quarter
- Participation (5%)
To-Do List
- Section meetings start next week (January 12), not tomorrow.
- Read: Jurafsky and Martin (2008, ch. 1), Hirschberg and Manning (2015).
- Entrance survey (on Canvas).
- Print, sign, and return the academic integrity statement.
References
Julia Hirschberg and Christopher D. Manning. Advances in natural language processing. Science, 349(6245):261–266, 2015. URL https://www.sciencemag.org/content/349/6245/261.full.
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, second edition, 2008.
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, third edition, forthcoming. URL https://web.stanford.edu/~jurafsky/slp3/.