nlp for social media: pos tagging, sentiment analysispawang/courses/sc16/nlp_social3.pdf ·...
TRANSCRIPT
![Page 1: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/1.jpg)
NLP for Social Media: POS Tagging, Sentiment Analysis
Pawan Goyal
CSE, IITKGP
August 05, 2016
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 1 / 23
![Page 2: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/2.jpg)
Part-of-Speech (POS) Tagging
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 2 / 23
![Page 3: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/3.jpg)
Penn Treebank POS Tags
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 3 / 23
![Page 4: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/4.jpg)
Part-of-Speech (POS) Tagging
Words often have more than one POS
POS tagging problem is to determine the POS tag for a particular instance of aword.
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 4 / 23
![Page 5: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/5.jpg)
Part-of-Speech (POS) Tagging
Words often have more than one POS
POS tagging problem is to determine the POS tag for a particular instance of aword.
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 4 / 23
![Page 6: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/6.jpg)
Relevant papers and tool
Kevin Gimpel, Nathan Schneider, Brendan O’Connor, Dipanjan Das,Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, JeffreyFlanigan, and Noah A. Smith. Part-of-Speech Tagging for Twitter:Annotation, Features, and Experiments. In Proceedings of ACL 2011.
Olutobi Owoputi, Brendan O’Connor, Chris Dyer, Kevin Gimpel, NathanSchneider and Noah A. Smith. Improved Part-of-Speech Tagging forOnline Conversational Text with Word Clusters. In Proceedings of NAACL2013.
Parts-of-Speech tagget for twitter -http://www.cs.cmu.edu/~ark/TweetNLP/
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 5 / 23
![Page 7: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/7.jpg)
POS Tagset for Twitter
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 6 / 23
![Page 8: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/8.jpg)
Twitter-specific Tags
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 7 / 23
![Page 9: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/9.jpg)
The case of Hashtags
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 8 / 23
![Page 10: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/10.jpg)
Combined forms
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 9 / 23
![Page 11: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/11.jpg)
Tagging Scheme
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 10 / 23
![Page 12: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/12.jpg)
Tagging Scheme
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 11 / 23
![Page 13: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/13.jpg)
Tagging Scheme
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 12 / 23
![Page 14: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/14.jpg)
Building the POS tagger
CRF model was used. Some insighful features:
Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.
Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.
Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.
Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.
Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23
![Page 15: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/15.jpg)
Building the POS tagger
CRF model was used. Some insighful features:
Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.
Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.
Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.
Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.
Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23
![Page 16: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/16.jpg)
Building the POS tagger
CRF model was used. Some insighful features:
Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.
Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.
Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.
Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.
Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23
![Page 17: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/17.jpg)
Building the POS tagger
CRF model was used. Some insighful features:
Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.
Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.
Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.
Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.
Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23
![Page 18: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/18.jpg)
Building the POS tagger
CRF model was used. Some insighful features:
Twitter orthography: Features for several regular expression-style rulesthat detect at-mentions, hashtags, URLs etc.
Frequently-capitalized tokens: Compiled gazetteers of tokens whichare frequently capitalized. Features would be - membership in the top Nfrequently capitalized items.
Traditional tag dictionary: Features for all coarse-grained tags thateach word occurs with in the Penn Treebank.
Phonetic normalization: Metaphone algorithm to create a coarsephonetic normalization of words to simpler keys.
Distributional similarity: Distributional features from the successor andpredecessor probabilities for the 10,000 most common terms.
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 13 / 23
![Page 19: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/19.jpg)
Related Problems
Entity RecognitionNamed Entity: Names of people, places, organization
Date and time
Can you model it as a sequence labeling problem?
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 14 / 23
![Page 20: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/20.jpg)
Related Problems
Entity RecognitionNamed Entity: Names of people, places, organization
Date and time
Can you model it as a sequence labeling problem?
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 14 / 23
![Page 21: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/21.jpg)
Sentiment Analysis
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 15 / 23
![Page 22: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/22.jpg)
Sentiment Analysis: Relevant Paper
Pak, Alexander, and Patrick Paroubek. “Twitter as a Corpus for SentimentAnalysis and Opinion Mining.” LREC. Vol. 10. 2010.
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 16 / 23
![Page 23: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/23.jpg)
Sentiment Analysis: Examples
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 17 / 23
![Page 24: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/24.jpg)
Data Creation
How to do that without manual labeling?
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 18 / 23
![Page 25: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/25.jpg)
Data Creation
How to do that without manual labeling?
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 18 / 23
![Page 26: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/26.jpg)
Data Creation
How to do that without manual labeling?
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 18 / 23
![Page 27: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/27.jpg)
POS tag Distribution: Subjective vs. Objective
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 19 / 23
![Page 28: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/28.jpg)
POS tag Distribution: Positive vs. Negative
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 20 / 23
![Page 29: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/29.jpg)
Classification Model
Naïve Bayes Model: Main featuresPOS tags, Word n-grams
Using negations in Word n-grams
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 21 / 23
![Page 30: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/30.jpg)
Classification Model
Naïve Bayes Model: Main featuresPOS tags, Word n-grams
Using negations in Word n-grams
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 21 / 23
![Page 31: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/31.jpg)
Disregarding common n-grams
Using entropy and salience
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 22 / 23
![Page 32: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/32.jpg)
Disregarding common n-grams
Using entropy and salience
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 22 / 23
![Page 33: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/33.jpg)
Disregarding common n-grams
Using entropy and salience
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 22 / 23
![Page 34: NLP for Social Media: POS Tagging, Sentiment Analysispawang/courses/SC16/nlp_social3.pdf · Sentiment Analysis: Relevant Paper Pak, Alexander, and Patrick Paroubek. “Twitter as](https://reader033.vdocuments.net/reader033/viewer/2022051913/60040872df6d0e53ee4a5fb8/html5/thumbnails/34.jpg)
N-grams with high salience and low entropy
Pawan Goyal (IIT Kharagpur) NLP for Social Media: POS Tagging, Sentiment Analysis August 05, 2016 23 / 23