alleviating data sparsity for twitter sentiment analysis
Post on 27-Jan-2015
Alleviating Data Sparsity For Twitter Sentiment Analysis
Hassan Saif, Yulan He & Harith Alani
Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
Making Sense of Microblogs, WWW2012 Conference
Lyon, France
Outline
• Hello World
• Motivation
• Related Work
• Semantic Features
• Topic-Sentiment Features
• Evaluation
• Demos
• The Future
Sentiment Analysis
“Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”
• The main dish was delicious (Opinion)
• It is a Syrian dish (Fact)
• The main dish was salty and horrible (Opinion)
Microblogging
• Service which allows subscribers to post short updates online and broadcast them
• Answers the question: What are you doing now?
• Twitter, Plurk, sfeed, Yammer, BlueTwi, etc.
Together!
Sense?
[Figure: bar chart of the UK General Elections Corpus. Objective tweets: 1,143,562; subjective tweets: 713,178.]
AGARWAL et al.
BARBOSA, L., AND FENG, J.
BIFET, A., AND FRANK, E.
DIAKOPOULOS, N., AND SHAMMA, D.
GO et al.
He & Saif
PAK & PAROUBEK
And many others
Why
Because It is Critical
Private Sectors
Public Sectors
Keep In Touch
Related Work
Sentiment Analysis
• Machine Learning Approach
– Text Classification Problem
– Right Features
• Lexical Based Approach
– Word Polarity
– Building Better Dictionary
Twitter Sentiment Analysis
Challenges
– The short length of status updates
– Language variations
– Open social environment
Twitter Sentiment Analysis
• Distant Supervision
– Supervised classifiers trained from noisy labels
– Tweet messages are labeled using emoticons
– Data filtering process
Go et al. (2009), Barbosa and Feng (2010), Pak and Paroubek (2010)
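A minimal sketch of emoticon-based distant supervision, for illustration only. The emoticon lists, the `noisy_label` helper, and the filtering rule (drop tweets with no emoticon or with conflicting emoticons) are assumptions, not the exact procedure of the cited works.

```python
import re

# Hypothetical emoticon lists; the slides do not say which emoticons were used.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", "=("}

# Longest emoticons first so ":-)" is matched before ":)".
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(POSITIVE | NEGATIVE, key=len, reverse=True))
)


def noisy_label(tweet):
    """Return (label, text_without_emoticons), or None if the tweet is filtered out."""
    found = set(_EMOTICON_RE.findall(tweet))
    has_pos = bool(found & POSITIVE)
    has_neg = bool(found & NEGATIVE)
    if has_pos == has_neg:
        # No emoticon at all, or mixed signals: too noisy to use as a label.
        return None
    # Remove the emoticons so the classifier cannot simply memorise them.
    text = " ".join(_EMOTICON_RE.sub(" ", tweet).split())
    return ("positive" if has_pos else "negative", text)


labeled = [noisy_label(t) for t in ["Great game today :)", "so tired :(", "just landed"]]
training_set = [pair for pair in labeled if pair is not None]
```

Stripping the emoticons after labeling is a common choice in this setting, since leaving them in would let the classifier read the label directly from the text.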
Twitter Sentiment Analysis
• Followers Graph & Label Propagation
– Twitter follower graph (users, tweets, unigrams and hashtags)
– Start with a small number of labeled tweets
– Apply a label propagation method throughout the graph
Speriosu et al. (2011)
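The idea of spreading a few known labels through a graph can be sketched as follows. This is a generic iterative-averaging label propagation on an undirected graph, not the specific algorithm used in the cited work; the node names, the score range of -1 to 1, and the `propagate_labels` helper are all hypothetical.

```python
def propagate_labels(edges, seeds, iterations=20):
    """Spread sentiment scores in [-1, 1] from seed nodes to their neighbours."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Unlabeled nodes start neutral; seed nodes start at their known score.
    scores = {node: seeds.get(node, 0.0) for node in neighbours}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbours.items():
            if node in seeds:
                updated[node] = seeds[node]  # labeled nodes stay clamped
            else:
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Toy graph: tweets linked to the hashtags they contain.
edges = [("tweet1", "#win"), ("tweet2", "#win"), ("tweet3", "#fail"), ("tweet4", "#fail")]
seeds = {"tweet1": 1.0, "tweet3": -1.0}
scores = propagate_labels(edges, seeds)
```

In this toy graph the unlabeled `tweet2` ends up with a positive score because it shares a hashtag with a positive seed, and `tweet4` ends up negative for the mirror-image reason.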
Twitter Sentiment Analysis
• Feature Engineering
– Unigrams, bigrams, POS
– Microblogging features: hashtags, emoticons, abbreviations & intensifiers
Agarwal et al. (2011), Kouloumpis et al. (2011)
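A rough sketch of what such a feature extractor might look like. The feature names, the emoticon list, and the use of elongated words (for example "sooo") as a stand-in for intensifiers are assumptions; POS tags are left out because they need an external tagger.

```python
import re
from collections import Counter

# Hypothetical emoticon list used only for this illustration.
EMOTICONS = {":)", ":-)", ":(", ":-(", ":D"}


def extract_features(tweet):
    """Return a bag of unigram, bigram and simple microblogging features."""
    tokens = tweet.lower().split()
    feats = Counter()
    for tok in tokens:
        feats["uni=" + tok] += 1
    for first, second in zip(tokens, tokens[1:]):
        feats["bi=" + first + "_" + second] += 1
    feats["n_hashtags"] = sum(tok.startswith("#") for tok in tokens)
    feats["n_emoticons"] = sum(tok in EMOTICONS for tok in tokens)
    # A character repeated three or more times, as a crude intensifier signal.
    feats["n_elongated"] = sum(bool(re.search(r"(.)\1\1", tok)) for tok in tokens)
    return feats


features = extract_features("Sooo happy :) #win")
```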
So?
What does sparsity mean?
Training data contains many infrequent terms
Word frequency statistics
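One simple way to quantify sparsity is the share of the vocabulary that occurs very rarely. The sketch below assumes whitespace tokenisation and a hypothetical `rare_term_ratio` helper; the threshold of one occurrence is an arbitrary choice for illustration.

```python
from collections import Counter


def rare_term_ratio(tweets, max_count=1):
    """Fraction of distinct terms that occur at most `max_count` times."""
    counts = Counter(tok for tweet in tweets for tok in tweet.lower().split())
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c <= max_count)
    return rare / len(counts)


# "game" and "bad" occur once; "good" and "day" occur twice.
ratio = rare_term_ratio(["good game", "good day", "bad day"])  # → 0.5
```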
How!
Semantic Features
Extract semantically hidden concepts from tweets data and then incorporate them into supervised classifier training by interpolation
Sentiment-Topic Features
Extract latent topics and the associated topic sentiment from the tweets data, which are subsequently added into the original feature space for supervised classifier training
Semantic Features
Shallow Semantic Method
Sushi time for fabulous Jesse's last day on dragons den (Jesse: Person)
@Stace_meister Ya, I have Rugby in an hour (Rugby: Sport)
Dear eBay, if I win I owe you a total 580.63 bye paycheck (eBay: Company)
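The evaluation slides list "semantic replacement" and "semantic augmentation" as two ways of using extracted concepts. A minimal sketch of both follows, assuming a hypothetical entity-to-concept dictionary in place of a real entity extraction service: replacement swaps the entity for its concept, while augmentation keeps the entity and appends the concept as an extra feature.

```python
# Hypothetical lookup table; a real system would call an entity extractor.
CONCEPTS = {"ebay": "COMPANY", "jesse": "PERSON", "rugby": "SPORT"}


def replace_entities(tokens):
    """Semantic replacement: swap each known entity for its concept."""
    return [CONCEPTS.get(tok.lower(), tok) for tok in tokens]


def augment_entities(tokens):
    """Semantic augmentation: keep all tokens and append the concepts."""
    concepts = [CONCEPTS[tok.lower()] for tok in tokens if tok.lower() in CONCEPTS]
    return tokens + concepts


tokens = ["I", "have", "Rugby", "in", "an", "hour"]
replaced = replace_entities(tokens)   # ['I', 'have', 'SPORT', 'in', 'an', 'hour']
augmented = augment_entities(tokens)  # original tokens followed by 'SPORT'
```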
Semantic Features
Interpolation Method
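The slide does not spell out the interpolation formula. One plausible reading, sketched here as an assumption, is a linear mix of the unigram language model of a sentiment class with a concept-based model: P(w|c) = (1 - alpha) * P_unigram(w|c) + alpha * sum over concepts s of P(w|s) * P(s|c). The function name, the dictionary layout, and the value of `alpha` are hypothetical.

```python
def interpolated_prob(word, cls, unigram_lm, concept_given_class, word_given_concept, alpha=0.2):
    """Mix the unigram estimate of P(word | cls) with a concept-based estimate."""
    p_unigram = unigram_lm[cls].get(word, 0.0)
    p_semantic = sum(
        word_given_concept[concept].get(word, 0.0) * p_concept
        for concept, p_concept in concept_given_class[cls].items()
    )
    return (1 - alpha) * p_unigram + alpha * p_semantic


# Toy numbers: "ipad" never appears in positive training tweets, but the
# PRODUCT concept does, so the word still receives some probability mass.
unigram_lm = {"pos": {"great": 0.1}}
concept_given_class = {"pos": {"PRODUCT": 0.5}}
word_given_concept = {"PRODUCT": {"ipad": 0.4}}
p = interpolated_prob("ipad", "pos", unigram_lm, concept_given_class, word_given_concept)
```

This shows why interpolation can help with sparsity: a word unseen in one class borrows evidence from the concept it belongs to instead of getting zero probability.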
Topic-Sentiment Features
Joint Sentiment Topic Model
JST (Lin & He, 2009) is a four-layer generative model which allows the detection of both sentiment and topic simultaneously from text.
The only supervision is word prior polarity information, which can be obtained from the MPQA subjectivity lexicon.
Twitter Sentiment Corpus
Collected: between the 6th of April and the 25th of June 2009
Training Set: 1.6 million tweets (Balanced)
Testing Set: 177 negative tweets & 182 positive tweets
Stanford University
http://twittersentiment.appspot.com/
Our Sentiment Corpus
• Training Set: 60K tweets
• Testing Set: 1000 tweets
• Annotated additional 640 tweets using Tweenator
Evaluation
Extended Test Set
Method                      Accuracy
Unigrams                    80.7%
Semantic replacement        76.3%
Semantic interpolation      84.0%
Sentiment-topic features    82.3%
Sentiment classification results on the 1000-tweet test set
Evaluation
Stanford Dataset
Method                      Accuracy
Unigrams                    81.0%
Semantic replacement        77.3%
Semantic augmentation       80.45%
Semantic interpolation      84.1%
Sentiment-topic features    86.3%
(Go et al., 2009)           83%
(Speriosu et al., 2011)     84.7%
Sentiment classification results on the original Stanford Twitter Sentiment test set
Evaluation
Semantic Features vs. Sentiment-Topic Features
Tweenator
http://tweenator.com
Conclusion
• Twitter sentiment analysis faces a data sparsity problem due to some special characteristics of Twitter
• Semantic and topic-sentiment features reduce the sparsity problem and increase performance significantly
• Sentiment-topic features should be preferred over semantic features for the sentiment classification task, since they give much better results with far fewer features
The Future
Future Work
Semantic Smoothing Model
Statistical replacement
Enhance Entity Extraction Methods
Attaching weight to extracted features
Sentiment-Topic Model
References
[1] AGARWAL, A., XIE, B., VOVSHA, I., RAMBOW, O., AND PASSONNEAU, R. Sentiment analysis of twitter data. In Proceedings of the ACL 2011 Workshop on Languages in Social Media (2011), pp. 30–38.
[2] BARBOSA, L., AND FENG, J. Robust sentiment detection on twitter from biased and noisy data. In Proceedings of COLING (2010), pp. 36–44.
[3] BIFET, A., AND FRANK, E. Sentiment knowledge discovery in twitter streaming data. In Discovery Science (2010), Springer, pp. 1–15.
[4] GO, A., BHAYANI, R., AND HUANG, L. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford (2009).
[5] KOULOUMPIS, E., WILSON, T., AND MOORE, J. Twitter sentiment analysis: The good the bad and the omg! In Proceedings of the ICWSM (2011).
[6] LIN, C., AND HE, Y. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (2009), ACM, pp. 375–384.
[7] PAK, A., AND PAROUBEK, P. Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of LREC 2010 (2010).
[8] SAIF, H., HE, Y., AND ALANI, H. Semantic Smoothing for Twitter Sentiment Analysis. In Proceedings of the 10th International Semantic Web Conference (ISWC) (2011).
Thank You