CS 599: Social Media Analysis
Sentiment Analysis
Kristina Lerman, University of Southern California



How do people feel about movies?

Estimate this… …using only this!

Huge interest, but why?
• Help consumers and brands understand the opinions being expressed about
  – Events: court decisions, protests, acts of Congress
  – Products: movies, consumer electronics
  – People: political candidates, dictators
  – Locations: restaurants, hotels, vacation destinations

Mood and emotion
• Moods are physiological in origin
  – Influenced by levels of neurotransmitters, hormones, …
• Moods also depend on external factors
  – Daily routine, work, commuting, eating, …
  – Products used by a person
• Two dimensions of mood
  – Positive affect: enthusiasm, delight, activeness, alertness, happiness, …
  – Negative affect: distress, fear, anger, guilt, disgust, sadness, …
• Can we accurately measure mood from text?

Main ideas
• Text messages (tweets, blog posts) use distinctive words to convey emotions
  – Identify features (words, linguistic features) that are highly indicative of emotions
• Train a classifier to recognize emotion in text
  – Supervised machine learning
    • Need labeled data to train the classifier
    • Features are noisy. How to filter them to improve classifier performance?
    • What classifier to use?
  – Automatically classify the emotion of a new text message using only the features of the message

Recognizing blog moods

“ESSE: Exploring mood on the Web” by Sood and Vasserman
• Main idea
  – Current search engines are able to find content on the web efficiently but do little to connect the user and content with emotion
    • Searching for a blog post based on keywords simply returns the posts most associated with those words; the emotional nature of the posts is not considered
  – Train a classifier to recognize the emotion in texts
• Contributions
  – Training data collected from LiveJournal, with user-labeled emotions
  – Handling challenges: noisy labels

Google Blog Search

ESSE: Emotional State Search Engine

• Searches an index of blogs for key terms
• Allows the user to search based on one or more emotions as well as choose a source of indexed blogs

Classification
• Train a classifier to recognize the mood of LiveJournal posts
  – Training phase
    • More than 600,000 posts, each represented as a feature vector
    • Each post labeled by its author with one of 130 different moods
    • Naïve Bayes classifier relates a post with its label (user-indicated mood)
  – Test phase
    • Classifier labels a new post with a mood

Training data: LiveJournal

Mood distribution

Classification (cont)
• But, the classifier is prone to overfitting
  – Too many noisy labels, not enough generalization
  – Solution: use clustering to reduce the number of labels by collapsing related labels into one of 4 (happy, sad, angry, other)
    • K-means clustering was used to cluster the posts into 3 groups (happy, sad, or angry), removing the outliers and reducing the data set to 31 moods, or ~130,000 posts

K-means clustering
• Each mood represented as a feature vector
  – Each component of the vector is the number of times that feature occurred in all posts tagged with that mood
  – Moods “happy”, “sad” and “angry” are the initial cluster centroids
[Figure: moods such as “hyper”, “giddy”, “gloomy”, and “irate” plotted around the “happy”, “sad”, and “angry” centroids]

K-means clustering
• Iterate until no further change
  – Moods closest to a cluster centroid are assigned to that cluster
  – Recalculate the cluster centroid
[Figure: “hyper” and “giddy” assigned to the “happy” cluster, “gloomy” to “sad”, “irate” to “angry”]
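The clustering loop above can be sketched as follows. This is a minimal, generic k-means over mood count vectors, not the authors' code; the 2-D vectors and mood names are purely illustrative.

```python
# Minimal k-means over mood feature vectors (illustrative data, not the paper's).
# Each mood is a vector of feature counts; "happy", "sad", "angry" seed the centroids.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans(points, seeds, iters=100):
    """points/seeds: dicts of name -> vector. Returns name -> seed-name assignment."""
    centroids = {name: list(vec) for name, vec in seeds.items()}
    assignment = {}
    for _ in range(iters):
        # Assign each mood to the nearest centroid
        new_assignment = {
            name: min(centroids, key=lambda c: euclidean(vec, centroids[c]))
            for name, vec in points.items()
        }
        if new_assignment == assignment:
            break  # converged: no further change
        assignment = new_assignment
        # Recalculate each centroid as the mean of its cluster's members
        for c in centroids:
            members = [points[n] for n, a in assignment.items() if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical 2-D count vectors for illustration
moods = {"happy": [9.0, 1.0], "sad": [1.0, 9.0], "angry": [1.0, 1.0],
         "hyper": [8.0, 2.0], "giddy": [9.0, 2.0], "gloomy": [2.0, 8.0],
         "irate": [2.0, 1.0]}
seeds = {c: moods[c] for c in ("happy", "sad", "angry")}
labels = kmeans(moods, seeds)
```

On this toy data, “hyper” and “giddy” collapse into “happy”, “gloomy” into “sad”, and “irate” into “angry”, mirroring the figure.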

Reduced labels

Naïve Bayes classifier

• The probability of a post being classified as class c, given a set of features f, equals the prior P(c) times the product of the probabilities of all features, given class c
• Post is classified as the most likely class, i.e., the class c with the highest conditional probability
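In symbols (reconstructing the standard naïve Bayes rule the slide describes in words):

```latex
P(c \mid f_1, \dots, f_n) \;\propto\; P(c) \prod_{i=1}^{n} P(f_i \mid c),
\qquad
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(f_i \mid c)
```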

ESSE query

• After training, the ESSE system is able to search different given indexes
• The index scores higher for words used frequently in a document but less so for words that are used frequently in many documents (TF-IDF weighting)
• Mood classification and filtering is performed on the fly

Evaluation
• R = posts relevant to class C
• P = posts labeled with class C
• TP = true positives, FP = false positives
• TN = true negatives, FN = false negatives
[Figure: diagram of the TP, FP, TN, FN regions for labeled vs relevant posts]

Evaluation
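From these four counts, precision, recall, and F-measure follow directly; a small sketch (the counts at the bottom are illustrative, not from the paper):

```python
def precision(tp, fp):
    return tp / (tp + fp)  # fraction of posts labeled C that are relevant

def recall(tp, fn):
    return tp / (tp + fn)  # fraction of relevant posts that are labeled C

def f_measure(tp, fp, fn):
    # harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Illustrative counts (not from the paper)
f = f_measure(tp=40, fp=10, fn=10)
```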

Sentiment of Twitter posts

“Twitter as a Corpus for Sentiment Analysis and Opinion Mining” by Pak & Paroubek
• Main idea
  – People widely use microblogging platforms (e.g., Twitter) to express opinions. Understanding opinions would be useful for marketing and social sciences
  – But, it is challenging to extract sentiment from microblog posts, because they are very short (e.g., 140 characters)
• Contributions
  – Automatically collect training data from Twitter
  – Use linguistic features to automatically recognize the sentiment of posts
    • Positive, negative, objective

Twitter sentiment
• Twitter posts often express opinions
  – Which posts express positive sentiment? Negative sentiment?
• Posts are short: few words to go by to recognize an opinion

Sentiment classification
• Train a classifier to recognize positive and negative sentiment
• But, need lots of training data containing posts expressing positive and negative opinions, as well as objective posts not expressing an opinion

Training data collection
• Query Twitter for posts containing
  – Happy emoticons :-), :), =), :D, … → positive posts
  – Sad emoticons :-(, :(, =(, ;(, … → negative posts
  – Links to news articles → objective posts
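This distant-supervision labeling can be sketched as follows. The emoticon lists come from the slide; the matching rule itself is illustrative, not the authors' exact pipeline.

```python
# Label tweets by emoticon or news link, as distant supervision (illustrative rule).
HAPPY = (":-)", ":)", "=)", ":D")
SAD = (":-(", ":(", "=(", ";(")

def label_tweet(text):
    if any(e in text for e in HAPPY):
        return "positive"
    if any(e in text for e in SAD):
        return "negative"
    if "http://" in text or "https://" in text:
        return "objective"  # links to news articles -> objective posts
    return None             # unlabeled, not used for training
```

For example, `label_tweet("Great game today :)")` returns `"positive"`, and a tweet containing only a link is labeled `"objective"`.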

Zipf law
• The distribution of word counts in the data set is a power law

Do linguistic features help?

Subjective vs objective posts

• Relative prevalence of POS tags across subjective posts (positive or negative) and objective posts
[Figure: ratio of POS-tag frequencies in subjective vs objective posts, with example words annotated: “Wow”, “OMG”; “I”, “he”; “I found”, “you saw”; common nouns (person, place or thing); “most”, “best”]

Negative vs positive
• Relative prevalence of POS tags across negative and positive posts
• Prevalence has less discriminative power than for objective vs subjective posts
[Figure: ratio of POS-tag frequencies in positive (+ve) vs negative (−ve) posts, with example words annotated: “whose”; “most”, “best”; “missed”, “bored”]

Supervised Machine Learning
[Figure: labeled messages train a sentiment classifier; e.g., input “Trojans Rule!” → output “Positive”]

Classifying the sentiment of tweets
• Train the classifier
  – Features
    • Remove stop words, URLs
    • n-gram: a sequence of n consecutive words from a post
    • Binary (0,1) feature reflecting the presence or absence of an n-gram
  – Filtering
    • Discard common n-grams, which are uniformly distributed across the whole data set. These don’t allow discrimination between sentiments
    • Compute the entropy of an n-gram g across the different sentiments S
    • High entropy: g is evenly distributed across all sentiments
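The entropy filter can be sketched as follows, using plain Shannon entropy over an n-gram's counts per sentiment class; the counts and any cutoff threshold are illustrative, not taken from the paper.

```python
import math

def ngram_entropy(counts):
    """Shannon entropy (bits) of an n-gram's distribution over sentiment classes.
    counts: occurrence counts of the n-gram, one per sentiment class."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Illustrative counts per (positive, negative, objective):
# a uniform n-gram has maximal entropy log2(3) and should be discarded;
# a concentrated n-gram has entropy 0 and is kept as discriminative.
uniform = ngram_entropy([10, 10, 10])
concentrated = ngram_entropy([30, 0, 0])
```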

Given a message M, what is its sentiment s?

Unigram (1-gram)   Positive msg count   Negative msg count   Objective msg count
trojans                          6                    5                     5
rule                            22                    6                    25
great                           40                    1                     2
home                            10                   10                    10
bad                              2                   30                     2
news                             3                    7                    44
Total count                   5000                 5000                  5000

Example of calculating P(s|M)
• P(+|“trojans rule”) = P(+) × product of probabilities P(unigram|+)
                      = P(+) × P(“trojans”|+) × P(“rule”|+)
                      = 0.333 × 6/5000 × 22/5000
• Similarly for P(−|“trojans rule”) and P(obj|“trojans rule”)
• The sentiment with the largest probability wins
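A quick check of this calculation with the counts from the table, using a uniform prior P(s) = 1/3 over the three sentiments:

```python
# Unigram counts from the table: (positive, negative, objective);
# each class has 5000 messages total, and the prior P(s) is uniform, 1/3.
counts = {"trojans": (6, 5, 5), "rule": (22, 6, 25)}
classes = ["positive", "negative", "objective"]

def score(message_unigrams, cls):
    i = classes.index(cls)
    p = 1 / 3  # prior P(s)
    for u in message_unigrams:
        p *= counts[u][i] / 5000  # P(unigram | s)
    return p

scores = {c: score(["trojans", "rule"], c) for c in classes}
best = max(scores, key=scores.get)
```

Here “trojans rule” comes out positive, though only narrowly ahead of objective (6×22 vs 5×25 in the numerators).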

Results
• Classify the sentiment of 200 messages
• Ground truth: messages were manually annotated for their sentiment
[Figure: recall, precision, and F-measure as a function of the number of training samples]

Summary
• Authors of Twitter messages use linguistic features to describe emotions (positive or negative sentiment messages) or state facts (objective messages)
  – Some part-of-speech tags may be strong indicators of emotional text
• Use examples of positive, negative, and objective messages collected from Twitter to train a classifier
  – Recognize the sentiment of a new message based on its words and POS tags

Global mood patterns

“Diurnal and seasonal moods vary with work, sleep and daylength across diverse cultures” by Golder and Macy
• Can automated sentiment analysis be applied to social media data to provide a global picture of human mood?
• Do moods have a time scale: diurnal, seasonal?

Corpus of Twitter tweets
• Up to 400 public messages from each user
• 2.4 million individuals worldwide
• 509 million messages between 2/08 and 1/10
• 84 identified countries
• English only
• Date, time, and country latitude

LIWC
• Linguistic Inquiry and Word Count
• James W. Pennebaker, U. Texas at Austin
  – “Virtually no one in psychology has realized that low-level words can give clues to large-scale behaviors”
  – Recent book: The Secret Life of Pronouns (2011)
• 4,500 words and word stems
  – Each in one or more word categories
    • “cried” in sadness, negative emotion, overall affect, verb, past tense verb
• 0.88 sensitivity and 0.97 specificity
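A toy sketch of LIWC-style scoring: for each category, the fraction of a text's words that fall in it. The three-word lexicon below is a hypothetical stand-in for the real 4,500-word dictionary, and the category names are illustrative.

```python
# Toy LIWC-style scorer (the lexicon is an illustrative stand-in, not real LIWC).
# Each word maps to one or more categories, as "cried" does on the slide.
LEXICON = {
    "cried": {"sadness", "negative_emotion", "affect"},
    "happy": {"positive_emotion", "affect"},
    "hate": {"negative_emotion", "affect"},
}

def category_fractions(text):
    """Fraction of the text's words belonging to each lexicon category."""
    words = text.lower().split()
    hits = {}
    for w in words:
        for cat in LEXICON.get(w, ()):
            hits[cat] = hits.get(cat, 0) + 1
    return {cat: n / len(words) for cat, n in hits.items()}

fracs = category_fractions("i cried and then felt happy")
```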

Testing LIWC online: http://liwc.net/liwcresearch07.php

Methodology
• Examined within-individual Positive Affect (PA) and Negative Affect (NA) independently
  – E.g., the fraction of PA words appearing in an individual’s messages every hour
  – To eliminate between-individual variation, subtract the mean: PA*_u(h) = PA_u(h) − ⟨PA_u(h)⟩
• Additional analysis on 4 English-speaking regions: Africa, India, UK/Aus, US/Can
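The mean-centering step for one user can be sketched directly; the hourly PA fractions below are made up for illustration.

```python
# Within-individual mean-centering: subtract the user's own mean PA, so only
# the diurnal shape remains (PA values below are made up, not from the study).
pa_by_hour = [0.04, 0.05, 0.03, 0.06, 0.07, 0.05]  # PA_u(h) for one user

mean_pa = sum(pa_by_hour) / len(pa_by_hour)
pa_centered = [round(p - mean_pa, 4) for p in pa_by_hour]  # PA*_u(h)
```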

Two peaks in PA daily; PA is higher on weekends

Mood governed by diurnal cycles, not culture

PA is higher when days are growing longer

No significant correlation

Small correlation

Digression: significance testing
• “PA is higher when change in daylength is positive … (r = 0.00121, p < 0.001)”
• Is there a trend? (measure correlation)
  – Calculate the correlation: y vs x
• Is the trend significant? Or can it be observed purely by chance?
  – Null hypothesis: there is no trend
  – Calculate the p-value

P-value
• “Reject the null hypothesis” when the p-value turns out to be less than a predetermined significance level
  – Significance level often set to 0.05 or 0.01
• Correlation values are normally distributed
  – The p-value is the probability of observing a correlation at least as extreme as the one measured, assuming the null hypothesis is true; rejecting at level α bounds the probability of falsely rejecting the null
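One way to obtain such a p-value without distributional assumptions is a permutation test: shuffle y many times and ask how often the shuffled correlation is at least as extreme as the observed one. A minimal sketch (the data are synthetic, not the study's):

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Two-sided p-value for the null hypothesis of no association:
    fraction of shuffles of y whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    ys = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(ys)
        if abs(pearson_r(xs, ys)) >= observed:
            hits += 1
    return hits / trials

xs = list(range(30))
ys = [2 * x + 1 for x in xs]     # a perfect linear trend
p = permutation_p_value(xs, ys)  # very small: trend unlikely by chance
```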

Summary
• Confirms findings from psychological studies
  – Psychology studies are small scale, on homogeneous populations, vs the Twitter study on a large, heterogeneous population
  – Mood changes are associated with diurnal (sleep-wake) cycles
    • PA highest in the morning and before midnight
    • PA highest on weekends
    • Universal and independent of culture
  – Seasonal mood changes
    • PA decreases as days grow shorter (“winter blues”)
• Possible to do psychology through text analysis of social media data