NLP Tasks - NTU Speech Processing Laboratory
TRANSCRIPT
[Page 1]
NLP Tasks (Hung-yi Lee)
[Page 2]
One slide for this course
[Diagram: many boxes labeled "Model" and "Model class"]
Usually people call them “NLP” tasks.
[Page 3]
Category: one class, or one class per token

w1 w2 w3 w4 w5 → Model → class
(Input: sequence; Output: one class for the whole sequence)

w1 w2 w3 w4 w5 → Model → c1 c2 c3 c4 c5
(Input: sequence; Output: one class per token)
[Page 4]
Category: sequence to sequence (seq2seq)

w1 w2 w3 w4 w5 → Encoder → (attention) → Decoder → w6 w7 w8 w9
(+ copy mechanism?)
[Page 5]
Category: multiple input sequences

What happens when the input is more than one sequence?

Option 1: run a Model over each sequence, then integrate the results (e.g. attention between the sequences):
w1 w2 w3 → Model; w4 w5 → Model; → Integrate → class

Option 2: simply concatenate the sequences with a separator token:
w1 w2 w3 <SEP> w4 w5 → Model → class
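The "simply concatenate" option amounts to a one-liner. The `<SEP>` token name follows the slide; real models use their own conventions (BERT, for example, uses `[SEP]`):

```python
# Feed two sequences to a single model by joining them with a
# separator token, as sketched on the slide.
def join_with_sep(seq_a, seq_b, sep="<SEP>"):
    return seq_a + [sep] + seq_b

tokens = join_with_sep(["w1", "w2", "w3"], ["w4", "w5"])
print(tokens)  # → ['w1', 'w2', 'w3', '<SEP>', 'w4', 'w5']
```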
[Page 6]
| Output \ Input | One Sequence | Multiple Sequences |
|---|---|---|
| One Class | Sentiment Classification, Intent Classification, Dialogue Policy | Stance Detection, Veracity Prediction, NLI, Search Engine, Relation Extraction |
| Class for each Token | POS Tagging, Word Segmentation, Extractive Summarization, Slot Filling, NER | |
| Copy from Input | | Extractive QA |
| General Sequence | Abstractive Summarization, Translation, Grammar Correction, NLG | General QA, Task-Oriented Dialogue, Chatbot |
| Other? | Parsing, Coreference Resolution | |
[Page 7]
Part-of-Speech (POS) Tagging
• Annotate each word in a sentence with a part-of-speech tag (e.g., verb, adjective, noun).

John saw the saw. → Model → PN V D N

Input: sequence; Output: class for each token

The tags are then used by a down-stream task.
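As a sketch of the input/output contract (the lexicon and disambiguation rule below are invented for illustration; a real tagger is a trained sequence model):

```python
# Toy tagger for the slide's example. The hand-written rule only
# shows why context matters: "saw" is a verb after a pronoun but
# a noun after a determiner.
LEXICON = {"john": "PN", "the": "D"}

def toy_tag(tokens):
    tags = []
    for tok in tokens:
        word = tok.lower().strip(".")
        if word in LEXICON:
            tags.append(LEXICON[word])
        elif word == "saw":
            # previous tag decides: determiner → noun, otherwise verb
            tags.append("N" if tags and tags[-1] == "D" else "V")
        else:
            tags.append("N")  # fallback
    return tags

print(toy_tag("John saw the saw.".split()))  # → ['PN', 'V', 'D', 'N']
```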
[Page 8]
Word Segmentation
• Needed for Chinese, which does not put spaces between words.

台 灣 大 學 簡 稱 台 大 → Model → N N N Y N Y N Y
(a plausible labeling in which Y marks the last character of a word)

Input: sequence; Output: class for each token

The segmented result 台灣大學 | 簡稱 | 台大 is passed to a down-stream task.
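Turning the per-character labels back into words is straightforward. This sketch assumes the convention above, i.e. Y marks the last character of a word:

```python
# Decode word segmentation labels: each character gets Y/N,
# where Y closes the current word.
def segment(chars, labels):
    words, current = [], ""
    for ch, lab in zip(chars, labels):
        current += ch
        if lab == "Y":       # word boundary: close the current word
            words.append(current)
            current = ""
    if current:              # flush any trailing characters
        words.append(current)
    return words

print(segment("台灣大學簡稱台大", "NNNYNYNY"))  # → ['台灣大學', '簡稱', '台大']
```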
[Page 9]
五人一湧而進。一人大聲叫了起來:「啊哈,你瞧,這裡不明明寫著『楊公再興之神』,這當然是楊再興了。」說話的是桃枝仙。
桃幹仙搔了搔頭,說道:「這裡寫的是『楊公再』,又不是『楊再興』。原來這個楊將軍姓楊,名字叫公再。唔,楊公再,楊公再,好名字啊,好名字。」……桃根仙道:「那麼『興之神』三個字是甚麼意思?』
桃葉仙道:「興,就是高興,興之神,是精神很高興的意思。楊公再這姓楊的小子,死了有人供他,精神當然很高興了。」
桃幹仙道:「很是,很是。」
『楊公再興之神』 (from 《笑傲江湖》, The Smiling, Proud Wanderer): in the passage above, the characters mis-segment the inscription 楊公再興之神, illustrating why Chinese word segmentation is hard.
[Page 10]
Parsing: an outlier?

Dependency Parsing / Constituency Parsing
Source of image: https://en.wikipedia.org/wiki/Parse_tree
The results of parsing can be used in the downstream tasks.
[Page 11]
Coreference Resolution (指代消解): an outlier?
Paul Allen was born on January 21, 1953. He attended
Lakeside School, where he befriended Bill Gates. Paul and
Bill used a teletype terminal at their high school, Lakeside,
to develop their programming skills on several time-sharing
computer systems.
Source of example: https://demo.allennlp.org/coreference-resolution/
[Page 12]
Summarization
• Extractive summarization
Document: Sentence 1, Sentence 2, Sentence 3, Sentence 4, Sentence 5, …
Summary: Sentence 2, Sentence 4 (sentences selected from the document)

Input: sequence; Output: class for each token (here a "token" is a sentence)
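The select-sentences view can be sketched in a few lines; the binary labels stand in for a trained model's per-sentence output:

```python
# Extractive summarization as per-sentence binary classification:
# a model labels each sentence keep/skip; the summary is the kept ones.
def extract_summary(sentences, labels):
    return [s for s, keep in zip(sentences, labels) if keep]

doc = ["Sentence 1", "Sentence 2", "Sentence 3", "Sentence 4", "Sentence 5"]
labels = [0, 1, 0, 1, 0]  # hypothetical model output
print(extract_summary(doc, labels))  # → ['Sentence 2', 'Sentence 4']
```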
[Page 13]
Summarization
• Abstractive summarization
document (long) → Model → summary (short)

Input: sequence; Output: sequence (copy is encouraged)

Pointer networks encourage copying directly from the input.
[Page 14]
Machine Translation
How are you? → Model → 你好嗎?

Input: sequence; Output: sequence

Unsupervised machine translation is a critical research direction.
[Page 15]
Grammar Error Correction
Source of image: https://www.aclweb.org/anthology/D19-1435.pdf
I are good. → Model → I am good.

Input: sequence; Output: sequence (copy is encouraged)
[Page 16]
Sentiment Classification
柯南劇場版《紺青之拳》還蠻有趣的 ("quite fun": positive)
柯南劇場版《紺青之拳》槽點很多 ("full of flaws": negative)
柯南劇場版《紺青之拳》雖然槽點很多,但還蠻有趣的 ("flawed, but still fun")
柯南劇場版《紺青之拳》雖然還蠻有趣的,但槽點很多 ("fun, but full of flaws")

Input: sequence; Output: class
[Page 17]
Stance Detection
Used in Veracity Prediction
Many systems use the Support, Deny, Query, and Comment (SDQC) labels for classifying replies.

Post (on Twitter or FB): 李宏毅是個型男 ("Hung-yi Lee is a handsome guy")
Reply: 他只是個死臭酸宅 ("He's just a smelly nerd") → Deny

Input: two sequences; Output: a class
[Page 18]
Veracity Prediction
Input: several sequences; Output: a class

Source of image: https://www.aclweb.org/anthology/S17-2006.pdf

Post + Replies + Wikipedia → Model → True/False
[Page 19]
Natural Language Inference (NLI)

premise: A person on a horse jumps over a broken down airplane.
hypothesis: A person is at a diner. → contradiction
hypothesis: A person is outdoors, on a horse. → entailment
hypothesis: A person is training his horse for a competition. → neutral

premise + hypothesis → Model → contradiction / entailment / neutral

Input: two sequences; Output: a class
[Page 20]
Search Engine
Source of image: https://www.blog.google/products/search/search-language-understanding-bert/
Input: two sequences; Output: a class

query + document → Model → relevant or not
[Page 21]
Question Answering (QA)
[Page 22]
Source of image: https://web.stanford.edu/~jurafsky/slp3/25.pdf
Watson
[Page 23]
Question Answering
question + knowledge source → QA model → answer

Reading comprehension: the knowledge source is unstructured documents.
A search engine can first filter out irrelevant documents.

Input: several sequences; Output: sequence
[Page 24]
Question Answering
• Extractive QA: the answer is a span of the document.

question + document → QA model → s, e (start and end positions of the answer)
e.g., s = 17, e = 17 (a single-token answer) or s = 77, e = 79

Input: several sequences; Output: sequence (copied from the input)
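Because the output is copied from the input, decoding is just slicing the document by the predicted span. The sketch below assumes 0-based inclusive token indices; the slide's 17/77/79 refer to positions in its own example document:

```python
# Extractive QA: the model predicts (s, e); the answer is the
# corresponding token span copied verbatim from the document.
def extract_answer(doc_tokens, s, e):
    return " ".join(doc_tokens[s:e + 1])

doc = "Harry Potter is a student of Hogwarts".split()
print(extract_answer(doc, 6, 6))  # → Hogwarts
print(extract_answer(doc, 0, 1))  # → Harry Potter
```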
[Page 25]
Dialogue
Chatting
Task-oriented
[Page 26]
Chatting
User: What is your name?
Cleverbot: Weil ich dich nicht kennw.
User: What is your name?
Cleverbot: Li.
User: Are you Li?
Cleverbot: No i'm Mara.
User: Are you Mara?
Cleverbot: No.
https://www.cleverbot.com/
User: Who you are?
Cleverbot: I just said your name you don't need to know my name.
User: Do you know my name?
Cleverbot: No that is why I am asking you.
User: Can you tell me your name?
Cleverbot: I already told you.
User: You are Mara?
Cleverbot: I'm sorry.
[Page 27]
Chatting
User: Sorry for what?
Cleverbot: For not remembering your name.
User: Do you remember your name?
Cleverbot: Yes I do...
User: What is your name?
Cleverbot: Once again, I can't tell you.
User: You have told me that you are Mara.
Cleverbot: I am a boy.
User: Are you a boy?
Cleverbot: Yes.
User: Are you a girl?
Cleverbot: No, are you?
User: No. How old are you?
Cleverbot: 2000 years old.
User: You are so old!
Cleverbot: No i'm quite young.
User: How old are you?
Cleverbot: Why do you keep asking me that?
User: Because I want to know your age.
Cleverbot: What if I do not want to tell you.
User: Then don't tell me your age.
Cleverbot: My name is Blue Monday.
https://www.cleverbot.com/
[Page 28]
Chatting
Human: Hello → Model → Machine: Hi
[Page 29]
Chatting
Human: Hello / Machine: Hi / Human: how's your day?

The dialogue history so far (H: Hello, M: Hi, H: how's your day?) is fed to the Model, which generates the next reply.
[Page 30]
Chatting
Dialogue history (H: Hello, M: Hi, H: how's your day?) → Model

Desirable properties of a chatbot: Personality, Empathy, Knowledge.
[Page 31]
Task-oriented
(source of image: a sample dialogue from the 科技大擂台 (Formosa Grand Challenge) 2020 guidelines)
Model
[Page 32]
Actions: Greeting, Ask(入住日 check-in date), Ask(退房日 check-out date), …

Natural Language Generation (NLG)

action, e.g. Ask(訂房人數 number of guests) → NLG Model → a natural-language sentence
[Page 33]
Policy & State Tracker
State Tracker → State: what has happened in this dialogue so far
e.g. 連絡人 (contact): 林XX; 入住人數 (number of guests): None; 連絡電話 (phone): None; 房型 (room type): None; 入住日期 (check-in): 9月9日; 退房日期 (check-out): 9月11日

Policy: chooses the next action from the state, e.g. Ask(訂房人數 number of guests)

The user input is preprocessed by NLU before state tracking.
[Page 34]
Natural Language Understanding (NLU)
• Intent Classification

我 打算 9月 9日 入住、 9月 11日 退房 → intent: Provide Information

• Slot Filling (slots: 入住日 check-in date, 退房日 check-out date, …)

我 打算 9月 9日 入住、 9月 11日 退房
N  N  入住日 入住日 N  退房日 退房日 N
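Gathering labeled tokens into slot values might look like the sketch below. The per-token labels follow the slide; a real system would typically use a BIO labeling scheme instead:

```python
# Slot filling as token classification: each token gets a slot label
# (or N for none); tokens sharing a label form that slot's value.
def fill_slots(tokens, labels):
    slots = {}
    for tok, lab in zip(tokens, labels):
        if lab != "N":
            slots[lab] = slots.get(lab, "") + tok
    return slots

tokens = ["我", "打算", "9月", "9日", "入住、", "9月", "11日", "退房"]
labels = ["N", "N", "入住日", "入住日", "N", "退房日", "退房日", "N"]
print(fill_slots(tokens, labels))  # → {'入住日': '9月9日', '退房日': '9月11日'}
```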
[Page 35]
Task-oriented
Pipeline: user input → ASR → NLU → State Tracker → state → Policy → NLG → TTS → system output

Input: several sequences (the dialogue history); Output: a sequence

End-to-end approaches replace the whole pipeline with one model.
[Page 36]
Knowledge Extraction
[Page 37]
A knowledge graph connects entities (e.g. Hogwarts, Ron Weasley, Hermione Granger) by relations.

Step 1: Extract Entities
Step 2: Extract Relations

(warning: the following description oversimplifies the task)
[Page 38]
Named Entity Recognition (NER)

Harry Potter is a student of Hogwarts and lived on Privet Drive.

People, organizations, and places are usually named entities.

[Harry Potter] is a student of [Hogwarts] and lived on [Privet Drive].

Input: sequence; Output: class for each token (just like POS tagging and slot filling)
[Page 39]
Relation Extraction: a classification problem over entity pairs.

(Harry Potter, Hogwarts) in "Harry Potter is a student of Hogwarts and lived on Privet Drive." → Relation Extraction → "is a student of"

(Hogwarts, Privet Drive) in the same sentence → Relation Extraction → "none"
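Framed as classification over entity pairs, the outer loop looks like the sketch below; `classify` is a stub standing in for a trained model:

```python
# Relation extraction: after NER finds the entities, classify every
# ordered entity pair into a relation or "none".
from itertools import permutations

def classify(sentence, head, tail):
    # stub: a trained relation classifier would go here
    if head == "Harry Potter" and tail == "Hogwarts":
        return "is a student of"
    return "none"

def extract_relations(sentence, entities):
    return {(h, t): classify(sentence, h, t)
            for h, t in permutations(entities, 2)}

sent = "Harry Potter is a student of Hogwarts and lived on Privet Drive."
rels = extract_relations(sent, ["Harry Potter", "Hogwarts", "Privet Drive"])
print(rels[("Harry Potter", "Hogwarts")])  # → is a student of
print(rels[("Hogwarts", "Privet Drive")])  # → none
```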
[Page 40]
GLUE
• Corpus of Linguistic Acceptability (CoLA)
• Stanford Sentiment Treebank (SST-2)
• Microsoft Research Paraphrase Corpus (MRPC)
• Quora Question Pairs (QQP)
• Semantic Textual Similarity Benchmark (STS-B)
• Multi-Genre Natural Language Inference (MNLI)
• Question-answering NLI (QNLI)
• Recognizing Textual Entailment (RTE)
• Winograd NLI (WNLI)
General Language Understanding Evaluation (GLUE)
https://gluebenchmark.com/
GLUE also has a Chinese counterpart, CLUE (https://www.cluebenchmarks.com/)
[Page 41]
SuperGLUE (https://super.gluebenchmark.com/)
[Page 42]
[Page 43]
DecaNLP
• 10 NLP tasks
• All solved by the same model
https://decanlp.com/ (the Natural Language Decathlon)
[Page 44]
| Output \ Input | One Sequence | Multiple Sequences |
|---|---|---|
| One Class | Sentiment Classification, Intent Classification, Dialogue Policy | Stance Detection, Veracity Prediction, NLI, Search Engine, Relation Extraction |
| Class for each Token | POS Tagging, Word Segmentation, Extractive Summarization, Slot Filling, NER | |
| Copy from Input | | Extractive QA |
| General Sequence | Abstractive Summarization, Translation, Grammar Correction, NLG | General QA, Task-Oriented Dialogue, Chatbot |
| Other? | Parsing, Coreference Resolution | |