A Systematic Framework for Sentiment Identification by Modeling User Social Effects
Kunpeng Zhang, Assistant Professor
Department of Information and Decision Sciences
University of Illinois at [email protected]
A World-Class Education, A World-Class City
Agenda
• Introduction
• Problem statement
• Methodology
• Experiments and results
• Conclusion and future work
Co-authors
• Yi Yang, Ph.D. student at Northwestern University
• Aaron Sun, Research Scientist, Samsung Research America
• Hengchang Liu, Assistant Professor at University of Science and Technology of China
Introduction
• User-generated content on social media platforms
• Data analysis for intelligent marketing decisions
• Voice of consumers
– Positive / negative aspects
Problem Statement
• Given a sentence (usually user-generated content on social media platforms, such as comments on Facebook, tweets on Twitter, reviews on Amazon.com, etc.), we classify it into one of three categories:
– Positive: directly or indirectly praises something, e.g. “I love it! (^_^)”
– Negative: directly or indirectly criticizes something, e.g. “We don’t like it at all.”
– Objective: expresses no sentiment, or states a fact, e.g. “Apple will release a new iPhone in the next two months.”
Previous Work
• Bag-of-words approaches
– Collecting keywords [5, 7, 21, 26]
• Rule-based methods
– From the perspective of language characteristics [6, 22]
• Machine learning based methods
– Sentence-level and document-level [7, 8, 10, 29]
• However,
– None of them considers user social effects…
Methodology
• Systematic framework
• Classification problem
• 4 major features:
– Peer influence
– User preference
– User profile
– Textual sentiment
Methodology 1 – User Preference (UserPref)
• User preference can partly reflect user sentiment.
• Item-based collaborative filtering on a user-item matrix
– Row: user (millions)
– Column: brand (thousands)
– The element m_ij is 1 if user i “likes” brand j, otherwise 0
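\[
\begin{pmatrix}
m_{11} & m_{12} & \cdots & m_{1n} \\
m_{21} & m_{22} & \cdots & m_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
m_{m1} & m_{m2} & \cdots & m_{mn}
\end{pmatrix}
\]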
Note: “like” means liking a brand on Facebook, following a brand on Twitter, giving a high rating to a product on Amazon, etc.
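As a minimal sketch of the user-brand matrix described above, the binary entries can be built from observed “like” pairs. The helper `build_like_matrix` and the toy users and brands are hypothetical, not from the talk. A dense NumPy array is used only for illustration; with millions of users a sparse representation would be needed.

```python
import numpy as np

def build_like_matrix(likes, users, brands):
    """Return a binary matrix whose entry [i, j] is 1 if users[i] "likes" brands[j]."""
    user_idx = {u: i for i, u in enumerate(users)}
    brand_idx = {b: j for j, b in enumerate(brands)}
    matrix = np.zeros((len(users), len(brands)), dtype=np.int8)
    for user, brand in likes:
        matrix[user_idx[user], brand_idx[brand]] = 1
    return matrix

# Hypothetical toy data: rows are users, columns are brands
users = ["u1", "u2", "u3"]
brands = ["Apple", "Nike", "DKNY"]
likes = [("u1", "Apple"), ("u1", "Nike"), ("u2", "Nike"), ("u3", "DKNY")]
M = build_like_matrix(likes, users, brands)
```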
Methodology 1 – User Preference (UserPref)
• Two important issues when using collaborative filtering
– Data sparsity
  • Integrate multiple low-level items into fewer high-level items
  – “Mac” and “iPhone” → “Computer and Electronics”
– Similarity calculation and preference prediction
  • Which similarity measure is better?
  – Cosine, Pearson correlation, Tanimoto coefficient, log-likelihood based, Euclidean distance based
  • Weighted sum strategy to approximate user preference
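The similarity calculation and weighted-sum prediction can be sketched as follows. This is an illustrative item-based scheme under assumptions, not the talk's Mahout implementation: the function names, the toy matrix, and the choice to normalize by the sum of absolute similarities are all hypothetical. Only two of the listed measures (cosine and Tanimoto) are shown.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two item columns."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def tanimoto_similarity(a, b):
    """For binary columns: users who like both items / users who like either."""
    both = np.sum((a == 1) & (b == 1))
    either = np.sum((a == 1) | (b == 1))
    return float(both / either) if either else 0.0

def predict_preference(matrix, user, item, similarity):
    """Weighted sum of the user's other entries, weighted by item-item similarity."""
    num, den = 0.0, 0.0
    for other in range(matrix.shape[1]):
        if other == item:
            continue
        sim = similarity(matrix[:, item], matrix[:, other])
        num += sim * matrix[user, other]
        den += abs(sim)
    return num / den if den else 0.0

# Hypothetical toy matrix: 4 users (rows) x 3 brands (columns)
toy = np.array([[1, 1, 0],
                [1, 1, 0],
                [1, 0, 0],
                [0, 0, 1]])
score_tanimoto = predict_preference(toy, user=2, item=1, similarity=tanimoto_similarity)
score_cosine = predict_preference(toy, user=2, item=1, similarity=cosine_similarity)
```

Swapping the `similarity` argument is one simple way to compare measures on held-out entries.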
Methodology 2 – Peer Influence (PeerInf)
• Herding behavior in social psychology.
– We assume that if most of the previous comments in a discussion are positive, a user is likely to give a positive comment, and similarly for the negative case.
– We randomly pick 1,000 posts from 5 different Facebook pages and 1,000 discussion threads from 5 different airlines on the Flyertalk.com forum. The average number of comments per post and per thread is 794 and 32, respectively.
– The sentiments are identified by the state-of-the-art textual algorithm.
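One simple way to turn the herding assumption above into a feature is the share of earlier comments in the same discussion that are positive or negative. The sketch below is an assumption for illustration only; the function name and the label strings `"pos"`, `"neg"`, `"obj"` are hypothetical, and this is not the peer influence model from the talk.

```python
def peer_influence_features(previous_labels):
    """Fractions of earlier comments in the thread labeled positive and negative."""
    total = len(previous_labels)
    if total == 0:
        # The first comment in a thread has no peers to follow
        return 0.0, 0.0
    pos = sum(1 for label in previous_labels if label == "pos")
    neg = sum(1 for label in previous_labels if label == "neg")
    return pos / total, neg / total

# Hypothetical thread, with labels produced by a textual sentiment classifier
thread = ["pos", "pos", "neg", "obj", "pos"]
# Features for each comment, computed only from the comments that precede it
features = [peer_influence_features(thread[:i]) for i in range(len(thread))]
```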
Methodology 2 – Peer Influence
Methodology 2 – Peer Influence Modeling
Methodology 3 – User Profile (GenCat)
• Females are more positive than males, and fashion pages have a higher percentage of positive sentiments than politician pages on Facebook and Twitter.

Name (Topic)                 Gender   Positive ratio   Number of comments + tweets
Barack Obama (Politician)    M        0.61             6,837,096
                             F        0.69
Chicago Bulls (Sports)       M        0.68             462,092
                             F        0.79
DKNY (Fashion)               M        0.94             14,284
                             F        0.96
Methodology 4 – Textual Sentiment (TextSent)
• State-of-the-art textual sentiment identification algorithm
• Ensemble method integrating three individual algorithms
– Semantic rules based on language characteristics
– Numeric strength computing
– Bag-of-words
• Accuracy: ~86%
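The slides do not say how the three individual algorithms are combined. Purely as an illustration, the sketch below assumes majority voting over their three labels, with a hypothetical fallback to "objective" when all three disagree.

```python
from collections import Counter

def majority_vote(labels, fallback="objective"):
    """Return the label at least two classifiers agree on, else the fallback."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > 1 else fallback

# Hypothetical outputs of the rule-based, numeric-strength, and bag-of-words classifiers
agreed = majority_vote(["positive", "positive", "negative"])
split = majority_vote(["positive", "negative", "objective"])
```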
Experiments and Results
• Data collection
– Facebook: posts, comments, likes, user profiles
– Twitter: tweets, followers, user profiles
– Amazon: products and reviews
– Flyertalk (airline discussion forum): discussions
• Data cleaning
– Remove spam users
Experiments and Results
• The features of the learning model for the 4 datasets and their differences. Topic is modified based on the raw Facebook category. “×”: missing; “√”: available.

Data source   TextSent              UserPref                      PeerInf   GenCat (Gender)   GenCat (Topic)
Facebook      Comments              User-post likes on category   √         √                 Predefined category
Twitter       Tweets                User-category following       √         √                 Predefined category
Amazon        Product reviews       User-product rating           √         ×                 Product category
Flyertalk     Airline discussions   ×                             √         ×                 Airline types
Experiments and Results
• Similarity measure check.
– MAE and RMSE are used to compare the average estimation error between real and predicted preference
• Hadoop-based collaborative filtering implemented by Mahout.
– Takes 34 and 21 minutes to approximate user preferences for Facebook and Twitter, respectively
– Cannot complete within 10 hours on a single CPU.
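The MAE and RMSE used in the similarity measure check follow their standard definitions, sketched below in plain Python. The toy `actual` and `predicted` values are hypothetical and are not results from the talk.

```python
import math

def mae(actual, predicted):
    """Mean absolute error between real and predicted preferences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical held-out preferences and the values a similarity measure predicted
actual = [1, 0, 1, 1]
predicted = [0.5, 0.0, 1.0, 0.0]
mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

A similarity measure with lower MAE and RMSE on held-out entries approximates user preference more closely.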
Experiments and Results
• Facebook data
• Twitter data
• Amazon.com data
Experiments and Results
• Classification accuracy (SS: semantic + syntactic features used in [28])
Conclusion and Future Work
• We propose a systematic framework to identify social media sentiments by modeling user social effects: user preference, peer influence, user profile, and textual sentiment itself.
• However,
– More networked data could be incorporated.
– More efficient algorithms are needed to calculate user preference.
Thank you