Transcript
Page 1: Great Food, Lousy Service

Great Food, Lousy Service

Topic Modeling for Sentiment Analysis in Sparse Reviews

Robin [email protected]

Dan [email protected]

Page 2: Great Food, Lousy Service

OpenTable.com

Page 3: Great Food, Lousy Service

Short

Characters Words

Page 4: Great Food, Lousy Service

Sparse

“An unexpected combination of Left-Bank Paris and Lower Manhattan in Omaha.

Divine. Inspirational and a great value.”

• Food?
• Ambiance?
• Service?
• Noise?

Page 5: Great Food, Lousy Service

Skewed

Page 6: Great Food, Lousy Service

Correlations

Page 7: Great Food, Lousy Service

SVM + Features, Features, Features!

• tokenize punctuation
• id, neutralize proper nouns
• strip numbers
• contraction splitting
• lower casing
• unigram (Bag of Words)
• bigram
• trigram
• mixed n-grams
• ignore stop words
• stemming
• negation processing
• expanded negation processing
• large training set size
• varying dictionary size
• SVM scaling
• "white list" (only use sentiment words)
• remove stop words
• POS tagging, ADJ only
• POS tagging, add ADV
• Brill tagger
• sentiment "white list" (Harvard lexicon)
• count of sentiment words (pos/neg)
• balanced training set
• binary accuracy
• sub-topic classifiers, hand list
• WordNet topic list expansion
• topic-filtered n-grams
• topic-word proximity filtering
• strict entropy modeling
• frequency-weighted entropy modeling

• 30+ preprocessing and SVM classification features
• ~50 configurations

Page 8: Great Food, Lousy Service

Key Features

• Stemming
  • Porter 1980 via NLTK
  • <fast>, <faster>, <fastest> → <fast>

• Negation processing (enhanced approach from Pang et al. 2002)
  • “Not a great experience.” → NOT_great
  • “They never disappoint!” → NOT_disappoint

• Net sentiment count
  • pos/neg lexicon (Harvard General Inquirer)
  • running +/- count
  • “Incredible(+) food, but our server was rude(-).” → (0)
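The negation and net-sentiment features above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tiny lexicons, the stop-word list, and the rule of tagging only the first non-stop word after a negator are all assumptions chosen to reproduce the slide examples. Pang et al. (2002) tag every word up to the next punctuation mark, and the slides do not say how the "enhanced" variant differs.

```python
import re

# Hypothetical mini-lexicons; the slides use the Harvard General Inquirer lists.
POSITIVE = {"incredible", "great", "divine", "fantastic"}
NEGATIVE = {"rude", "lousy", "mediocre", "worst"}
NEGATORS = {"not", "no", "never"}
STOP_WORDS = {"a", "an", "the", "they", "our", "was", "but"}  # assumed stop list
PUNCTUATION = set(".,!?;:")

TOKEN_RE = re.compile(r"[a-z']+|[.,!?;:]")


def tokenize(text):
    """Lower-case the text and split it into words and punctuation marks."""
    return TOKEN_RE.findall(text.lower())


def mark_negation(tokens):
    """Drop stop words and prefix the first content word after a negator with NOT_.

    Punctuation ends the negation scope (assumed simplification).
    """
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False
        elif tok in NEGATORS:
            negating = True
        elif tok not in STOP_WORDS:
            out.append("NOT_" + tok if negating else tok)
            negating = False
    return out


def net_sentiment(tokens):
    """Running +/- count: +1 per positive lexicon word, -1 per negative one."""
    score = 0
    for tok in tokens:
        if tok in POSITIVE:
            score += 1
        elif tok in NEGATIVE:
            score -= 1
    return score


print(mark_negation(tokenize("Not a great experience.")))
print(mark_negation(tokenize("They never disappoint!")))
print(net_sentiment(tokenize("Incredible food, but our server was rude.")))
```

In a classifier, the NOT_-prefixed tokens would be treated as ordinary n-gram features, and the net count would be one extra numeric feature per review.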

Page 9: Great Food, Lousy Service

Results (so far)

• Trained on 10,000 reviews
• Tested on ~80,000 reviews

• Accuracy
  • Baseline: 50.0%
  • Intermediate model: 56.6% (1.13x)
• abs( average scoring delta ): 0.56

Page 10: Great Food, Lousy Service

Topic Modeling

Hand-seeded topic-word list expanded via WordNet SynSets

1. sub-topic classifiers
2. topic-filtered n-grams
  • <soup_FOOD was fantastic_ADJ>
  • <fantastic_ADJ soup_FOOD was>
3. topic-word proximity filtering
  • both above → <fantastic_ADJ/FOOD>
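A rough sketch of topic-word proximity filtering, under several assumptions: the input is already POS-tagged as (word, tag) pairs, the topic lists are small hypothetical stand-ins for the hand-seeded, WordNet-expanded lists, and the window size of 2 is arbitrary. The slides do not give the actual window or tag set.

```python
# Hypothetical hand-seeded topic lists (the slides expand theirs via WordNet).
TOPIC_WORDS = {
    "FOOD": {"soup", "steak", "dessert"},
    "SERVICE": {"server", "waiter", "service"},
}


def proximity_features(tagged, window=2):
    """Return 'adjective/TOPIC' features for adjectives near a topic word.

    `tagged` is a list of (word, pos) pairs; an adjective is kept only if a
    topic word occurs within `window` tokens on either side of it.
    """
    features = []
    for i, (word, pos) in enumerate(tagged):
        if pos != "ADJ":
            continue
        lo = max(0, i - window)
        hi = min(len(tagged), i + window + 1)
        for j in range(lo, hi):
            neighbor = tagged[j][0].lower()
            for topic, words in TOPIC_WORDS.items():
                if neighbor in words:
                    features.append(f"{word.lower()}/{topic}")
    return features


# Both word orders from the slide collapse to the same feature.
print(proximity_features([("soup", "NOUN"), ("was", "VERB"), ("fantastic", "ADJ")]))
print(proximity_features([("fantastic", "ADJ"), ("soup", "NOUN"), ("was", "VERB")]))
```

Because the feature ignores word order inside the window, it is less sparse than the topic-filtered n-grams, which may matter for short reviews.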

Results:

                                    Food     Ambiance  Service  Noise
1. sub-topic classifiers            39.15%   47.26%    53.70%   48.43%
3. topic-word proximity filtering   40.05%   47.88%    54.92%   50.35%
   ratio (3 vs. 1)                  1.02x    1.01x     1.02x    1.03x

Page 11: Great Food, Lousy Service

Word-Rating Distributions

[Figure: rating distributions for the words “worst”, “mediocre”, “decent”, “solid”, and “exceeded”]

Page 12: Great Food, Lousy Service

Frequency-Weighted Entropy Model

• Accuracy
  • Baseline: 50.0%
  • Intermediate model: 56.6%
  • Best (entropy) model: 58.6% (1.17x)
• abs( average scoring delta ): 0.56 → 0.52
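The slides do not define the entropy model, so the following is only an illustration of the underlying idea: build each word's distribution over review ratings and measure how peaked it is with Shannon entropy. A word like “worst” that appears almost only in low-rated reviews has low entropy and is a strong signal. The function names, whitespace tokenization, and toy reviews are assumptions, and the frequency weighting named in the slide title is not reproduced here because its form is not given.

```python
import math
from collections import Counter, defaultdict


def word_rating_distributions(reviews):
    """Map each word to a Counter of the ratings of the reviews containing it.

    `reviews` is an iterable of (text, rating) pairs; tokenization is a plain
    whitespace split (assumed simplification).
    """
    dist = defaultdict(Counter)
    for text, rating in reviews:
        for word in set(text.lower().split()):
            dist[word][rating] += 1
    return dist


def entropy(counter):
    """Shannon entropy (in bits) of a count distribution."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


# Toy data, invented for illustration only.
reviews = [
    ("worst meal", 1),
    ("worst service", 1),
    ("decent food", 3),
    ("decent soup", 4),
]
dist = word_rating_distributions(reviews)
for word in ("worst", "decent"):
    print(word, dict(dist[word]), entropy(dist[word]))
```

Low-entropy words could then be kept as features or weighted more heavily; how the authors combined entropy with word frequency is not stated.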

