Great Food, Lousy Service


Post on 24-Feb-2016



Great Food, Lousy Service
Topic Modeling for Sentiment Analysis in Sparse Reviews
Robin Melnick, rmelnick@stanford.edu
Dan Preston, dpreston@stanford.edu

OpenTable.com

Short

Characters, Words

Sparse
"An unexpected combination of Left-Bank Paris and Lower Manhattan in Omaha. Divine. Inspirational and a great value."

Food? Ambiance? Service? Noise?

Skewed

Correlations

SVM + Features, Features, Features!
tokenize punctuation
"white list" (only use sentiment words)
id, neutralize proper nouns
remove stop words
strip numbers
POS tagging, ADJ only
contraction splitting
POS tagging, add ADV
lower casing
Brill tagger
unigram (Bag of Words)
sentiment "white list" (Harvard lexicon)
bigram
count of sentiment words (pos/neg)
trigram
balanced training set
mixed n-grams
binary accuracy
ignore stop words
sub-topic classifiers, hand list
stemming
WordNet topic list expansion
negation processing
topic-filtered n-grams
expanded negation processing
topic-word proximity filtering
large training set size
strict entropy modeling
varying dictionary size
frequency-weighted entropy modeling
SVM scaling

30+ preprocessing and SVM classification features, ~50 configurations
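As a rough illustration of the unigram, bigram, trigram, and mixed n-gram features listed on this slide, here is a minimal Python sketch. The function names and the whitespace tokenizer are assumptions made for the example; the slides do not show the authors' actual feature extraction code.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams in a token list, each joined into one string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def mixed_ngram_counts(text, max_n=3):
    """Count unigrams through max_n-grams after lower casing.

    Tokenizing on whitespace is a simplification for this sketch.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

# Unigrams plus bigrams for a four-word review fragment.
features = mixed_ngram_counts("Great food lousy service", max_n=2)
```

Counts like these could then be turned into sparse vectors for an SVM. Setting `max_n=1` gives a plain bag of words.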

Key Features
Stemming: Porter 1980 via NLTK
Negation processing (enhanced approach from Pang et al. 2002)
  "Not a great experience." -> NOT_great
  "They never disappoint!" -> NOT_disappoint
Net sentiment count
  pos/neg lexicon (Harvard General Inquirer)
  running +/- count
  "Incredible(+) food, but our server was rude(-)." (0)
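The negation and net sentiment examples on this slide can be approximated with a short Python sketch. The slides do not spell out the "enhanced" negation rule, so the rule used here (attach NOT_ to the first non-article word after a negation cue, and reset at punctuation) is only an assumption that happens to reproduce the two slide examples. The cue list, skip list, and tiny lexicon are hypothetical stand-ins, not the Harvard General Inquirer.

```python
import re

NEGATORS = {"not", "no", "never"}       # assumed negation cues
SKIP = {"a", "an", "the"}               # assumed words to skip over
PUNCT = set(".,!?;")
POSITIVE = {"incredible", "great"}      # toy stand-in for a positive lexicon
NEGATIVE = {"rude", "lousy"}            # toy stand-in for a negative lexicon

def tokenize(text):
    """Lower-case and split into words and single punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())

def mark_negation(text):
    """Prefix NOT_ to the first non-skipped word after a negation cue."""
    out, pending = [], False
    for tok in tokenize(text):
        if tok in NEGATORS:
            pending = True              # drop the cue, remember it
        elif tok in PUNCT:
            pending = False             # negation scope ends at punctuation
            out.append(tok)
        elif pending and tok not in SKIP:
            out.append("NOT_" + tok)
            pending = False
        else:
            out.append(tok)
    return out

def net_sentiment(text):
    """Running count: +1 per positive word, -1 per negative word."""
    score = 0
    for tok in tokenize(text):
        if tok in POSITIVE:
            score += 1
        elif tok in NEGATIVE:
            score -= 1
    return score

print(mark_negation("Not a great experience."))  # ['a', 'NOT_great', 'experience', '.']
print(net_sentiment("Incredible food, but our server was rude."))  # 0
```

The last example shows why a single net count can be uninformative for mixed reviews: one positive and one negative word cancel to zero.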

Results (so far)
Trained on 10,000 reviews
Tested on ~80,000 reviews

Accuracy
Baseline: 50.0%
Intermediate model: 56.6% (1.13x)

abs( average scoring delta ): 0.56
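The figures on this slide can be related to each other with a small Python sketch. The slides do not define "abs( average scoring delta )". The reading used here, the absolute value of the mean difference between predicted and actual ratings, is an assumption, and the mean of absolute differences is another plausible reading. All function names are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels exactly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def abs_average_delta(predicted, actual):
    """Absolute value of the mean (predicted - actual) difference.

    Assumed reading of the slide's metric; not confirmed by the source.
    """
    deltas = [p - a for p, a in zip(predicted, actual)]
    return abs(sum(deltas) / len(deltas))

def lift(model_accuracy, baseline_accuracy):
    """Improvement factor of a model over the baseline."""
    return model_accuracy / baseline_accuracy

# The slide's multiplier follows from its two accuracy figures.
print(round(lift(56.6, 50.0), 2))  # 1.13
```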

Topic Modeling
Hand-seeded topic-word list expanded via WordNet SynSets

sub-topic classifiers
topic-filtered n-grams
topic-word proximity filtering
both of the above
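One way to picture topic-word proximity filtering is to keep only the tokens that fall within a fixed window around a topic word. The Python sketch below does that. The window size, the function name, and the small service word list are assumptions for illustration. The slides do not give the actual window or the WordNet-expanded lists.

```python
def proximity_filter(tokens, topic_words, window=2):
    """Keep tokens within `window` positions of any topic word."""
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in topic_words:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            keep.update(range(lo, hi))
    return [tokens[i] for i in sorted(keep)]

tokens = "incredible food but our server was rude".split()
service_words = {"server", "waiter", "service"}   # hypothetical topic list

print(proximity_filter(tokens, service_words))
# ['but', 'our', 'server', 'was', 'rude']
```

With a filter like this, the service classifier would see "rude" but not "incredible", which belongs to the food part of the sentence.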

Results:
      Food     Ambiance  Service  Noise
1.    39.15%   47.26%    53.70%   48.43%
3.    40.05%   47.88%    54.92%   50.35%
      1.02x    1.01x     1.02x    1.03x

Word-Rating Distributions

worst, mediocre, decent, solid, exceeded

Frequency-Weighted Entropy Model
Accuracy
Baseline: 50.0%
Intermediate model: 56.6%
Best (entropy) model: 58.6% (1.17x)

abs( average scoring delta ): 0.56 -> 0.52
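The slides do not define the frequency-weighted entropy model, so the following Python sketch only illustrates the underlying idea: a word whose occurrences concentrate on one rating has low entropy and is more informative than a word spread evenly across ratings. The scoring formula (frequency times the gap between maximum and observed entropy) and all names are assumptions, not the authors' method. The reviews are made-up toy data.

```python
import math
from collections import Counter, defaultdict

def word_rating_counts(reviews):
    """Map each word to a Counter of the ratings of reviews containing it."""
    counts = defaultdict(Counter)
    for text, rating in reviews:
        for word in set(text.lower().split()):
            counts[word][rating] += 1
    return counts

def entropy(rating_counts):
    """Shannon entropy (bits) of a word's rating distribution."""
    total = sum(rating_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in rating_counts.values() if c)

def weighted_score(rating_counts, n_ratings=5):
    """Hypothetical weighting: frequency x (max entropy - entropy)."""
    total = sum(rating_counts.values())
    return total * (math.log2(n_ratings) - entropy(rating_counts))

# Toy (text, rating) pairs on a 1-5 scale, invented for this example.
reviews = [
    ("worst meal", 1),
    ("worst service", 1),
    ("food was decent", 3),
    ("food was great", 5),
]
counts = word_rating_counts(reviews)
```

Here "worst" appears only in 1-star reviews, so its entropy is zero and it scores higher than "food", which is split between two ratings.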