sarcasm & thwarting in sentiment analysis [iit-bombay]

Sarcasm And Thwarting Lekha Deepali Gupta Sagar Ahire {lekha, gdeepali, sagarahire} @ cse.iitb.ac.in

Upload: sagar-ahire

Post on 18-May-2015


DESCRIPTION

Presentation made at IIT Bombay under the course CS626 (Natural Language Processing) on the challenges of Sarcasm & Thwarting in Sentiment Analysis.

TRANSCRIPT

Page 1: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Sarcasm And Thwarting

Lekha, Deepali Gupta, Sagar Ahire

{lekha, gdeepali, sagarahire} @ cse.iitb.ac.in

Page 2: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Roadmap

Irony and Sarcasm

An Algorithm for Sarcasm Detection

Thwarting Detection

Page 3: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Irony and Sarcasm

Lekha | 133050002

Page 4: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Verbal Irony

“An irony is a figure of speech which implicitly displays that the utterance situation was surrounded by an ironic environment.”

There also exists Situational Irony

Page 5: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Ironic Environment

● At time t0: the speaker has an expectation
● At time t1: the expectation fails
● Result: negative emotional attitude of the speaker

Page 6: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Reasons for Expectation to Fail

Expectation E is caused by an action A
1. E failed because A failed or cannot be performed because of another action B
2. E failed because A was not performed

Expectation E is not caused by any action
3. E failed by an action B
4. E accidentally failed

Types 1 and 3 have victims

Sarcasm is irony with definite victims and counterfeited emotions

Page 7: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Properties of an Ironic Environment

An utterance implicitly displays all three conditions for an ironic environment when it:
1. Alludes to the speaker's expectation E
2. Intentionally violates one of the pragmatic principles
3. Implies the speaker's emotional attitude toward the failure of E

Irony is recognized if any 2 of these 3 are recognized.

Irony conveys the third unidentified property.
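The two-out-of-three rule can be written as a small check. This is only an illustrative sketch: the property names and the function `recognize_irony` are hypothetical labels for the three conditions listed above, not part of the cited theory's formalization.

```python
# Hypothetical names for the three properties of an ironic environment.
PROPERTIES = ("allusion", "pragmatic_violation", "emotional_attitude")


def recognize_irony(recognized):
    """Return (is_ironic, inferred), given the set of recognized properties.

    Irony is recognized when at least 2 of the 3 properties are recognized;
    any property that was not recognized is then treated as conveyed.
    """
    recognized = set(recognized) & set(PROPERTIES)
    is_ironic = len(recognized) >= 2
    inferred = [p for p in PROPERTIES if p not in recognized] if is_ironic else []
    return is_ironic, inferred
```

For example, if the hearer recognizes the allusion and the emotional attitude, the utterance is taken as ironic and the pragmatic violation is the property it conveys.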

Page 8: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Allude to Speaker’s Expectation

Deepali baked a pizza to satisfy her hunger. She placed the pizza on the table and in the meantime Sagar came and gobbled up the whole pizza. Deepali said to Sagar:

a. I'm not hungry at all.
b. Have you seen my pizza on the table?
c. I'll sleep well on a full stomach.
d. I'm really satisfied to eat the pizza.
e. Did you enjoy eating the pizza?

Page 9: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Violation of Pragmatic Principles

Sincerity
● You make a statement you believe
● You ask a question whose answer you don’t know
● You offer advice which will benefit the receiver
● You thank when you are really grateful

Propositional content
● You thank for something that has been done for you

Preparatory condition for Offer
● You offer something that you can really give

Other principles
● Maxim of relevance
● Politeness principle
● Maxim of quantity

Page 10: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Emotional Attitude

Tone and expressions

Interjections: “Oh! The weather is so nice”

The context implies the emotional attitude of the speaker

Page 11: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Semi Supervised Recognition of Sarcastic Sentences

Deepali Gupta | 13305R001

Page 12: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Sarcasm

The activity of saying or writing the opposite of what you mean, or of speaking in a way intended to make someone feel stupid or show that you are angry (Macmillan English Dictionary)

Page 13: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Sarcasm manifests in other ways...

● “Love the cover” (book)
● “Be sure to save your purchase receipt” (Smart Phone)
● “Great idea, now try again with a real product development team” (e-reader)
● “Where am I?” (GPS device)

Page 14: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

The Algorithm: Overview

1. Training Set: Sentences manually assigned scores from 1 to 5, where 5 means clearly sarcastic and 1 means absence of sarcasm

2. Create feature vectors from the labelled sentences

3. Use these feature vectors to build a model and assign scores to unlabelled examples

Page 15: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Step 1: Preprocessing of Data

1. Replace each appearance of a product/company/author by generalized [product], [company], [author], etc.

2. Remove all HTML tags and special symbols from review text.
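A minimal sketch of these two preprocessing steps follows. The `entities` mapping (surface name to generalized tag) and the exact set of characters treated as "special symbols" are assumptions made for illustration; the slides do not specify either.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
# Assumption: keep word characters, whitespace, the [tag] brackets and
# common punctuation; everything else counts as a special symbol.
SPECIAL_RE = re.compile(r'''[^\w\s\[\].,!?'"-]''')


def preprocess(review, entities):
    """Generalize known entity names and strip HTML tags and special symbols."""
    text = TAG_RE.sub(" ", review)      # drop HTML tags
    text = html.unescape(text)          # decode entities such as &amp;
    for name, tag in entities.items():  # e.g. "Garmin" -> "[company]"
        text = re.sub(re.escape(name), tag, text, flags=re.IGNORECASE)
    text = SPECIAL_RE.sub(" ", text)    # drop remaining special symbols
    return re.sub(r"\s+", " ", text).strip()
```

Punctuation such as "!", "?" and quotes is deliberately kept here, since it is counted by the punctuation based features later on.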

Page 16: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Step 2: Creating Feature Vectors

Pattern Based Features:

1. Classify words into High Frequency Words (HFWs) and Content Words (CWs). All [product], [company] tags and punctuation marks are HFWs.

2. A pattern is a sequence of HFWs with slots for CWs. Example: “Garmin does not care about product quality or customer support” has patterns “[company] does not CW about CW CW” or “about CW CW or CW CW”, etc.
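The pattern idea can be sketched as follows. The HFW set is passed in by hand here (in practice it would come from corpus frequencies), and the window length limits `min_len` and `max_len` are hypothetical parameters chosen only so that the slide's example patterns are produced.

```python
import re

TOKEN_RE = re.compile(r"\[\w+\]|\w+|[^\w\s]")


def abstract_sentence(sentence, hfws):
    """Replace every content word with the slot marker "CW".

    Generalized tags such as [company] and punctuation marks are kept as
    HFWs, as are the words listed in `hfws`.
    """
    tokens = []
    for tok in TOKEN_RE.findall(sentence):
        is_tag = tok.startswith("[") and tok.endswith("]")
        is_punct = not any(ch.isalnum() for ch in tok)
        tokens.append(tok if is_tag or is_punct or tok.lower() in hfws else "CW")
    return tokens


def extract_patterns(tokens, min_len=3, max_len=7):
    """Collect contiguous windows that mix at least one HFW with one CW slot."""
    patterns = set()
    for i in range(len(tokens)):
        for j in range(i + min_len, min(i + max_len, len(tokens)) + 1):
            window = tokens[i:j]
            if "CW" in window and any(t != "CW" for t in window):
                patterns.add(" ".join(window))
    return patterns
```

With the HFW set {"does", "not", "about", "or"}, the preprocessed Garmin sentence yields both example patterns among its windows.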

Page 17: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Pattern Matching

Feature value of a pattern for a sentence:

1: Exact Match

α: Sparse Match - additional non-matching words can be inserted between pattern components

γ · n/N: Incomplete Match - only n of N pattern components appear in the sentence, while some non-matching words can be inserted in between
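The three match types can be sketched as a scoring function. The weight names `alpha` and `gamma` and their default values are assumptions for this sketch, and using the longest common subsequence to count the n matching components is one possible reading of "incomplete match", not necessarily the one used by the original system.

```python
def contains_contiguous(tokens, pattern):
    """True if the pattern occurs in the tokens with nothing in between."""
    n = len(pattern)
    return any(tokens[i:i + n] == pattern for i in range(len(tokens) - n + 1))


def is_subsequence(tokens, pattern):
    """True if all pattern components occur in order, gaps allowed."""
    it = iter(tokens)
    return all(p in it for p in pattern)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def pattern_feature(tokens, pattern, alpha=0.1, gamma=0.1):
    """Score one pattern against one abstracted sentence."""
    if contains_contiguous(tokens, pattern):
        return 1.0                                    # exact match
    if is_subsequence(tokens, pattern):
        return alpha                                  # sparse match
    n = lcs_length(tokens, pattern)                   # n of N components found
    return gamma * n / len(pattern)                   # incomplete match (0 if n == 0)
```

Both `tokens` and `pattern` are lists of components such as `["[company]", "does", "not", "CW"]`.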

Page 18: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Punctuation Based Features

● Sentence length in words
● Number of “!” characters
● Number of “?” characters
● Number of quotes
● Number of capitalized/all capital words

Features are normalized to lie in [0, 1] by dividing each of them by its maximal observed value
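A small sketch of these features and their normalization is given below. Two details are assumptions: only the straight double quote character is counted as a quote, and a word counts as capitalized when its first character is upper case (which also covers all-capital words).

```python
def punctuation_features(sentence):
    """Raw punctuation based features for one sentence."""
    words = sentence.split()
    return [
        len(words),                              # sentence length in words
        sentence.count("!"),                     # number of "!" characters
        sentence.count("?"),                     # number of "?" characters
        sentence.count('"'),                     # number of quote characters
        sum(1 for w in words if w[0].isupper()), # capitalized / all-capital words
    ]


def normalize(rows):
    """Divide every feature by its maximal observed value (columns with max 0 stay 0)."""
    maxima = [max(col) for col in zip(*rows)]
    return [[v / m if m else 0.0 for v, m in zip(row, maxima)] for row in rows]
```

Calling `normalize([punctuation_features(s) for s in sentences])` gives one vector per sentence with every component in [0, 1].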

Page 19: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Step 3: Data Enrichment

● For each sentence in the training set, perform a search engine query containing this sentence

● Assign each newly extracted sentence a label similar to that of the query sentence

Page 20: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Step 4: Classification

● Construct feature vectors for each sentence in the training and test set
● Compute Euclidean Distance to each of the matching vectors in the training set

Let t_i, i = 1..k, be the k vectors with the lowest Euclidean Distance to v. Then v is assigned a label as follows:

Count(l) = Count of vectors in the training set with label l

Label(v) =
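The right-hand side of the Label(v) formula is not reproduced in this transcript. The sketch below therefore makes an explicit assumption: the score of v is the average of its k nearest neighbours' labels, each weighted by Count of that label in the training set. The function name `classify` and the default `k` are hypothetical, and the restriction to "matching" vectors is left out.

```python
import math
from collections import Counter


def classify(v, training, k=5):
    """kNN-like scoring: count-weighted average of the k nearest labels.

    `training` is a list of (feature_vector, label) pairs with numeric labels.
    The weighting scheme is an assumption, not the slide's exact formula.
    """
    counts = Counter(label for _, label in training)
    # k training vectors with the lowest Euclidean distance to v
    nearest = sorted(training, key=lambda t: math.dist(v, t[0]))[:k]
    weights = [counts[label] for _, label in nearest]
    weighted_sum = sum(w * label for w, (_, label) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

Because labels are scores from 1 to 5, the result is a real-valued sarcasm score that can be thresholded or rounded afterwards.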

Page 21: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Star Sentiment Baseline

● From a set of negative reviews (with 1-3 stars), classify as sarcastic those sentences that exhibit strong positive sentiment.

● Positive sentiment words include, e.g., “great”, “best”, “top”, etc.
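The baseline can be sketched in a few lines. The word list below contains only the three examples from the slide, and treating any occurrence of such a word as "strong positive sentiment" is a simplifying assumption; the input format (star rating plus list of sentences) is also hypothetical.

```python
import re

# Only the example words from the slide; a real lexicon would be larger.
POSITIVE_WORDS = {"great", "best", "top"}


def star_sentiment_baseline(reviews, max_stars=3):
    """Flag positive-sounding sentences inside negative (1-3 star) reviews.

    `reviews` is a list of (stars, sentences) pairs.
    """
    sarcastic = []
    for stars, sentences in reviews:
        if stars > max_stars:
            continue  # only negative reviews are considered
        for sentence in sentences:
            words = set(re.findall(r"[a-z']+", sentence.lower()))
            if words & POSITIVE_WORDS:
                sarcastic.append(sentence)
    return sarcastic
```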

Page 22: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Results

Table 1: 5-fold cross validation results

                Precision   Recall   Accuracy   F-Score
punctuation     0.256       0.312    0.821      0.281
patterns        0.743       0.788    0.943      0.765
pat+punct       0.868       0.763    0.945      0.812
enrich punct    0.4         0.390    0.832      0.395
enrich pat      0.762       0.777    0.937      0.769
all: SASI       0.912       0.756    0.947      0.827

Table 2: Comparison with baseline

                Precision   Recall   False Pos   False Neg   F-score
Baseline        0.5         0.16     0.05        0.44        0.242
SASI            0.766       0.813    0.11        0.12        0.788
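The reported F-scores agree, to within rounding, with the usual harmonic mean of precision and recall. The slides do not state the formula, so the standard F1 definition is assumed in this short check.

```python
def f_score(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows taken from the tables above: (precision, recall)
sasi_cross_validation = f_score(0.912, 0.756)
star_baseline = f_score(0.5, 0.16)
```

Rounded to three decimals these give 0.827 and 0.242, matching the "all: SASI" and "Baseline" rows.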

Page 23: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Thwarting Detection

Sagar Ahire | 133050073

Page 24: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

“The actors were good, the story was great, the screenplay was a marvel of perfection and the music was good too, but the movie couldn’t hold my attention...”

Thwarting?

Page 25: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Detecting Thwarting: The Big Picture

● Ascertain attributes of the entity using an ontology
● Find the sentiment of each attribute in the ontology and of the overall entity
● If there is a contrast, conclude thwarting has occurred

Page 26: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Building the Domain Ontology

1. Identify key features of domain from a corpus

2. Arrange them in a hierarchy

Notes:
● Very human-intensive
● One-time requirement

Page 27: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

An Example Ontology

Level 1: Movie

Level 2: Story Elements, Acting, Music

Level 3: Main Story, Dialogues, Characters, Songs, Background Score

Page 28: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Approaches to Detect Thwarting

● Rule-based Approach
● Machine Learning-based Approach

Page 29: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Rule-based Approach

1. Get dependency parse for adjective-noun dependencies
2. Identify polarities towards all nouns
3. Tag corresponding ontology nodes with found polarities
4. If a contradiction across levels is found, conclude that thwarting has taken place

Page 30: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Rule-based Approach: Example

Level 1
● Movie: negative

Level 2
● Story Elements: positive
● Acting: positive
● Music: positive

Level 3
● Main Story: positive
● Dialogues: positive
● Characters: positive
● Songs: positive
● Background Score: negative
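The cross-level check in this example can be sketched as below. Two things are assumptions: the parent-child edges in `ONTOLOGY` (the transcript preserves only the node names, not the tree edges) and the contradiction rule itself, which here flags a node whose polarity is opposite to the net polarity of its children.

```python
# Assumed edges; only the node names are given in the transcript.
ONTOLOGY = {
    "Movie": ["Story Elements", "Acting", "Music"],
    "Story Elements": ["Main Story", "Dialogues", "Characters"],
    "Music": ["Songs", "Background Score"],
}

# Polarities from the example: +1 positive, -1 negative.
POLARITY = {
    "Movie": -1,
    "Story Elements": +1, "Acting": +1, "Music": +1,
    "Main Story": +1, "Dialogues": +1, "Characters": +1,
    "Songs": +1, "Background Score": -1,
}


def contradicting_nodes(ontology, polarity):
    """Nodes whose polarity is opposite to the net polarity of their children."""
    found = []
    for parent, children in ontology.items():
        net = sum(polarity.get(child, 0) for child in children)
        if polarity.get(parent, 0) * net < 0:
            found.append(parent)
    return found


def is_thwarted(ontology, polarity):
    return bool(contradicting_nodes(ontology, polarity))
```

Under these assumptions the only contradiction is at "Movie": the overall polarity is negative while its attributes are positive, so the document is flagged as thwarted.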

Page 31: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Machine Learning-based Approach

Proceeds in two phases:
1. Learning weights
2. Classifying documents

Page 32: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Learning Weights: Choices

1. Choices for loss function:
a. Linear loss
b. Hinge loss

2. Choices for percolation across ontology levels:
a. Complete percolation
b. Controlled percolation

Page 33: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Classification: Features

● Convert the document into a feature vector.
● Examples:
○ Document polarity
○ No. of flips of sign
○ Longest contiguous subsequence of +ve values
○ Longest contiguous subsequence of -ve values
○ etc.
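The listed features can be computed from a sequence of signed polarity values. This is a sketch under assumptions: the input is taken to be one signed value per sentiment-bearing item in document order, zeros are ignored, and the document polarity is supplied from outside rather than derived.

```python
def longest_run(signs, target):
    """Length of the longest contiguous run of `target` in `signs`."""
    best = current = 0
    for s in signs:
        current = current + 1 if s == target else 0
        best = max(best, current)
    return best


def thwarting_features(values, document_polarity):
    """Build a small feature dictionary from signed polarity values."""
    signs = [1 if v > 0 else -1 for v in values if v != 0]
    flips = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return {
        "document_polarity": document_polarity,
        "sign_flips": flips,
        "longest_positive_run": longest_run(signs, 1),
        "longest_negative_run": longest_run(signs, -1),
    }
```

For a review with four positive statements followed by one negative statement, this gives one sign flip, a positive run of 4 and a negative run of 1.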

Page 34: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Results

● Random baseline: 50%
● Rule-based approach: 56.3%
● ML-based approach (Linear loss, controlled percolation): 81%

Page 35: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

What’s the catch?

Requires sentiment as input!

Current System: Document with Sentiment Information -> Thwarted or Not Thwarted

Ideal System: Document -> Thwarted or Not Thwarted, Document Sentiment

Page 36: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Key Ideas

● Irony indicates presence of an ironic environment, with 3 properties
● 2 of those 3 are enough to recognize irony
● Sarcasm is irony with victims and counterfeited emotions
● A semi supervised pattern based algorithm detects sarcasm well
● Thwarting is the phenomenon of polarity reversal at a higher level of ontology compared to the polarity expressed at the lower level
● Rule based and machine learning based approaches have been attempted for thwarting

Page 37: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

References

● Akira Utsumi (1996) - A unified theory of irony and its computational formalization. In COLING, 962–967.

● Oren Tsur, Dmitry Davidov, Ari Rappoport (2010) - ICWSM – A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews. In Association for the Advancement of Artificial Intelligence.

● Ankit Ramteke, Akshat Malu, Pushpak Bhattacharyya, J. Saketha Nath (2013) - Detecting Turnarounds in Sentiment Analysis: Thwarting. In ACL 2013.

Page 38: Sarcasm & Thwarting in Sentiment Analysis [IIT-Bombay]

Questions?