Extending SASI to Satirical Product Reviews: A Preview

Bernease Herman, University of Michigan
Monday, April 22, 2013

TRANSCRIPT

Page 1: Extending SASI to Satirical Product Reviews: A Preview

Extending SASI to Satirical Product Reviews: A Preview

Bernease Herman, University of Michigan

Monday, April 22, 2013

Page 2: Extending SASI to Satirical Product Reviews: A Preview

Satirical Amazon Reviews

Extending SASI to Satirical Product Reviews: A Preview April 22, 2013 2

For a fun list: http://www.geekosystem.com/funny-amazon-reviews/

Page 3: Extending SASI to Satirical Product Reviews: A Preview

Defining Irony, Sarcasm and Satire

• Irony: “the use of words to convey a meaning that is the opposite of its literal meaning”

• Sarcasm: “a sharply ironical taunt; sneering or cutting remark”

• Satire: “the use of irony, sarcasm, ridicule, or the like, in exposing, denouncing, or deriding vice, folly, etc.”

Page 4: Extending SASI to Satirical Product Reviews: A Preview

Sarcastic Review: Shure SE110 Sound Isolating Earphones

Page 5: Extending SASI to Satirical Product Reviews: A Preview

Satirical Review: BIC Cristal For Her ballpoint pens

Page 6: Extending SASI to Satirical Product Reviews: A Preview

Satirical Review: Zenith Men’s Defy Xtreme Titanium Watch

Page 7: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

• Overview
• Data preprocessing
• Data enrichment
• Pattern features
• Punctuation features
• Additional features
• Classification
• Baseline options
• Summary

The algorithm detects sarcasm in individual sentences using a k-nearest-neighbors-style classifier. Features include pattern matching and punctuation. There are additional features to consider for satire that are not present in the sarcasm model. The classification baseline needs to be determined from multiple options. SASI is a sentence-based sarcasm detector, not a full-document one.

Page 8: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

Jindal and Liu (2008) provide a data set of 66,000 book and product reviews. Filatova (2012) provides corpora of Amazon reviews labeled ironic, sarcastic, both, or regular.

• Specific products, authors, companies, and book titles were replaced with [product], [author], etc.

• HTML and special symbols were removed from text
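The two preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the original pipeline: the `preprocess` function, the `replacements` mapping, and the choice to decode HTML entities are all assumptions made for the example.

```python
import html
import re

def preprocess(text, replacements):
    """Strip HTML and replace known names with bracketed placeholders.

    `replacements` is a hypothetical mapping such as
    {"product": ["Kindle"], "company": ["Amazon"]}.
    """
    # Remove HTML tags, then decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    # Replace each known name with its placeholder, e.g. [product]
    for placeholder, names in replacements.items():
        for name in names:
            text = re.sub(re.escape(name), f"[{placeholder}]", text,
                          flags=re.IGNORECASE)
    # Collapse the whitespace left behind by removed tags
    return re.sub(r"\s+", " ", text).strip()
```

In practice the name lists would probably come from product metadata rather than being written by hand.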

Page 9: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

Tsur et al. (2010) posited that sarcastic sentences co-appear with other sarcastic sentences. They gathered nearby sentences using the Yahoo! BOSS API, with seed sentences as queries.

This assumption seems to hold for satirical reviews, but not for sarcastic ones.

[Figure panels: Sarcasm, Satire]

Page 10: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

Via Davidov and Rappoport (2006, 2008):
• High-frequency words (HFWs)
• Content words (CWs)

What can I say about the 571B Banana Slicer that hasn't already been said about the wheel, penicillin or the iPhone…
• “What can I CW CW the”
• “I CW CW the [product]”
• “[product] that hasn’t CW been CW about”
• “about the CW”
• “CW or the CW”
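A rough sketch of how such patterns could be generated: every word that is not a high-frequency word becomes a CW slot, and short windows that mix fixed words with slots are kept as candidate patterns. The tiny `HFWS` set, the function names, and the window lengths are illustrative assumptions; a real system would derive HFWs from corpus frequency thresholds.

```python
def abstract_tokens(tokens, hfws):
    """Keep high-frequency words; replace everything else with the slot 'CW'."""
    return [t if t.lower() in hfws else "CW" for t in tokens]

def extract_patterns(tokens, hfws, min_len=3, max_len=6):
    """Collect windows that contain at least one HFW and at least one CW slot."""
    abstracted = abstract_tokens(tokens, hfws)
    patterns = set()
    for n in range(min_len, max_len + 1):
        for i in range(len(abstracted) - n + 1):
            window = abstracted[i:i + n]
            if "CW" in window and any(w != "CW" for w in window):
                patterns.add(" ".join(window))
    return patterns

# Hypothetical HFW list, chosen only to reproduce the first pattern above
HFWS = {"what", "can", "i", "the", "that", "or", "been"}
tokens = "What can I say about the wheel".split()
patterns = extract_patterns(tokens, HFWS)
```

With this toy word list, `patterns` contains “What can I CW CW the”, among other windows.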

Page 11: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

Page 12: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

Generic features regarding punctuation, all normalized to [0, 1].

• Sentence length in words
• Number of “!” characters
• Number of “?” characters
• Number of quotes in sentence
• Number of capitalized words or words in all capitals
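The five features listed above could be computed along these lines. The normalization used here, dividing each feature by its maximum over the sentence set, is an assumption; the slides only say that the values end up in [0, 1]. Function names are illustrative, and only straight double quotes are counted.

```python
def raw_punctuation_features(sentence):
    """Return the five raw counts for one sentence."""
    words = sentence.split()
    return [
        len(words),                      # sentence length in words
        sentence.count("!"),             # number of "!" characters
        sentence.count("?"),             # number of "?" characters
        sentence.count('"'),             # number of quote characters
        # words starting with a capital letter (this includes all-caps words)
        sum(1 for w in words if w.lstrip("\"'(")[:1].isupper()),
    ]

def normalize_columns(rows):
    """Scale each feature to [0, 1] by its maximum over all rows (assumed scheme)."""
    maxima = [max(col) or 1 for col in zip(*rows)]
    return [[value / m for value, m in zip(row, maxima)] for row in rows]

sentences = ["Wow, GREAT pen!!", "Is it a pen?"]
features = normalize_columns([raw_punctuation_features(s) for s in sentences])
```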

Page 13: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

• Burfoot and Baldwin (2009) introduced a notion of validity, which models absurdity via a measure close to pointwise mutual information (PMI). It is related to the number of made-up or mismatched named entities. It works well with satire, but not in this setting.

• Absurdity of product
• Relevancy of product
• How often product is reviewed
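For reference, plain PMI between two items (for example, two named entities) can be computed from co-occurrence counts as sketched below. This shows only standard PMI, not Burfoot and Baldwin's validity measure, which the slide describes as merely close to it; the function name and count-based interface are assumptions.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, log2( p(x, y) / (p(x) * p(y)) ), from counts.

    count_xy: number of contexts containing both x and y
    count_x, count_y: number of contexts containing x, respectively y
    total: total number of contexts
    """
    if count_xy == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log2(count_xy * total / (count_x * count_y))
```

A strongly negative value means the pair co-occurs far less often than chance would suggest, which is the kind of signal a mismatched pairing of entities would produce.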

Page 14: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

Classification is done via feature vectors for each pattern in the training set. Use Euclidean distance to each of the matching vectors that share at least one pattern.
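A minimal sketch of this classification step, under several assumptions: each vector stores its pattern features first (`n_pattern_features` of them), a pattern counts as shared when both vectors have a positive value for it, and the score is the plain mean of the labels of the `k` nearest matches. The original algorithm may weight the neighbors differently.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shares_pattern(a, b, n_pattern_features):
    """True if both vectors have a positive value for at least one pattern."""
    return any(x > 0 and y > 0
               for x, y in zip(a[:n_pattern_features], b[:n_pattern_features]))

def knn_score(query, training, n_pattern_features, k=5):
    """Average label of the k nearest training vectors sharing a pattern.

    `training` is a list of (vector, label) pairs. Returns None when no
    training vector shares a pattern with the query.
    """
    candidates = [
        (euclidean(query, vec), label)
        for vec, label in training
        if shares_pattern(query, vec, n_pattern_features)
    ]
    if not candidates:
        return None
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    return sum(label for _, label in nearest) / len(nearest)
```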

Page 15: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

Because SASI is semi-supervised, the classification algorithm takes advantage of the definition of sarcasm. It assumes a low star rating combined with text that has a positive literal meaning.
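The sarcasm assumption above can be illustrated with a toy heuristic: flag a review when its star rating is low but its wording looks positive. The word list, thresholds, and function names are hypothetical and only show the shape of the idea.

```python
# Hypothetical positive-word lexicon, for illustration only
POSITIVE_WORDS = {"great", "love", "perfect", "amazing", "best"}

def positive_word_ratio(text):
    """Fraction of words in the text that appear in the positive lexicon."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in POSITIVE_WORDS for w in words) / len(words)

def baseline_flags_sarcasm(stars, text, max_stars=2, min_ratio=0.2):
    """Low star rating plus literally positive text suggests sarcasm."""
    return stars <= max_stars and positive_word_ratio(text) >= min_ratio
```

For example, a one-star review reading "Great, just great. Best purchase ever!" is flagged, while the same text with five stars is not.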

Not as clear-cut with satire. Options:
• Variation in rating for product
• Purchases vs. page views of product
• People finding review helpful
• Other heuristics

Page 16: Extending SASI to Satirical Product Reviews: A Preview

Semi-supervised Algorithm for Sarcasm Identification (SASI)

Satire seems to have a distinct advantage in the data enrichment phase in comparison to sarcasm.

Satire seems to have a huge disadvantage in the baseline options for classification compared to sarcasm. This is the detail that must be worked out before moving forward with implementation.

Page 17: Extending SASI to Satirical Product Reviews: A Preview

Future Goals

Following the end of the course, I wish to implement SASI, taking the features mentioned today into account. Extend the model to sarcasm in other domains.

Any questions or comments?
