Estimating review score from words Işık Barış Fidaner CMPE 545 Artificial Neural Networks


Post on 19-Jan-2016


Page 1: Estimating  review score from  words

Estimating review score from words

Işık Barış Fidaner

CMPE 545 Artificial Neural Networks

Page 2: Estimating  review score from  words

$\text{Metascore} = \frac{1}{N} \sum_{i=1}^{N} \text{score}_i$
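As a small illustration, the Metascore on this slide can be read as the plain arithmetic mean of the N individual review scores. The sketch below assumes that reading; the function name and the example scores are hypothetical.

```python
def metascore(scores):
    """Arithmetic mean of the individual review scores (1/N times their sum)."""
    if not scores:
        raise ValueError("at least one review score is required")
    return sum(scores) / len(scores)


# Hypothetical scores for one product, on a 0-100 scale.
example_scores = [90, 75, 60]
print(metascore(example_scores))  # 75.0
```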

Page 3: Estimating  review score from  words

Score (r^t): The rating given to this product

Reviewer: The source of this review

Quote (x^t = ?): A few sentences that summarize this review

Bag of words representation: existence of some words in the quote

+ exuberant

+ embrace

+ affectionate

Page 4: Estimating  review score from  words

Purposes

1. A new database that relates text to score

(...) An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon. (...)

90?

Page 5: Estimating  review score from  words

Purposes

2. Quantify meaning with machine learning

Review quote:

An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.

x^t (one entry per vocabulary word):

riveting      0
exhilarating  0
affectionate  1
crafted       0
exuberant     1
dull          0
lacking       0
embrace       1

w^T: 73, 70, 65
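The encoding on this slide can be sketched as follows: each position of the vector marks whether one vocabulary word occurs in the quote. The eight-word vocabulary is the one shown on the slide; the tokenizer and the function names are assumptions made for illustration.

```python
import re

# The eight example words from the slide, in slide order.
VOCABULARY = ["riveting", "exhilarating", "affectionate", "crafted",
              "exuberant", "dull", "lacking", "embrace"]

QUOTE = ("An affectionate, exuberant picture that seeks to bring even those "
         "who don't know Klingon from Portuguese into the embrace of a "
         "pop-culture phenomenon.")


def tokenize(quote):
    """Lowercase the quote and split it into words (apostrophes and hyphens kept)."""
    return re.findall(r"[a-z'-]+", quote.lower())


def binary_vector(quote, vocabulary=VOCABULARY):
    """Return 1 for each vocabulary word that appears in the quote, else 0."""
    present = set(tokenize(quote))
    return [1 if word in present else 0 for word in vocabulary]


print(binary_vector(QUOTE))  # [0, 0, 1, 0, 1, 0, 0, 1]
```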

Page 6: Estimating  review score from  words

Purposes

3. Meta-metacritic deductions, such as

Positive words

riveting, exhilarating, crafted, superb, extraordinary, brilliant

Negative words

unfunny, tedious, fails, mess, dull, lacking

Page 7: Estimating  review score from  words

Obtaining the database

• Developed a PHP web crawler

• It ran for a few days

• TV show reviews
  – 8,335 records

• Music album reviews
  – 62,293 records

• Movie reviews
  – 113,456 records

MySQL

PHP

Page 8: Estimating  review score from  words

Bag of words assumption

• Features affect the result independently

An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.

=

phenomenon from an exuberant picture those into a portuguese don’t pop-culture affectionate to embrace bring klingon of who know seeks

• Semantic organization does not matter

Page 9: Estimating  review score from  words

Bag of words assumption

• The problem with modifiers:

This is not good. Is this not good?

• We rely on the information encoded in the vocabulary, not grammar

• Opinions expressed clearly and simply:

Excellent, wonderful! This is dreadful.
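The modifier problem can be checked directly: once word order and punctuation are discarded, the statement and the question above reduce to the same bag of words. This is a minimal sketch with an assumed tokenizer, not the preprocessing used in the project.

```python
import re


def bag_of_words(text):
    """Set of lowercase words in the text; order and punctuation are discarded."""
    return frozenset(re.findall(r"[a-z']+", text.lower()))


statement = "This is not good."
question = "Is this not good?"

# Both sentences contain exactly the words: this, is, not, good.
print(bag_of_words(statement) == bag_of_words(question))  # True
```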

Page 10: Estimating  review score from  words

Word selection

Statistics computed for each word:

1. Quote count (QC)

2. Product count (PC)

3. Score mean (SM)

4. Score stdev (SS)

Selection criteria:

• Meaningful words (SS < SSmax = 20)

• Frequently used words (PC > PCmin = 20)

• Non-grammatical words (PC < PCmax = 100)

~20 thousand words → ~300 words
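The three criteria can be sketched as a simple filter over per-word statistics. The thresholds are the ones on the slide; the record layout, the function name, and every number in the example list are made up for illustration.

```python
from dataclasses import dataclass

SS_MAX = 20   # meaningful: score stdev below this
PC_MIN = 20   # frequently used: product count above this
PC_MAX = 100  # non-grammatical: product count below this


@dataclass
class WordStats:
    word: str
    quote_count: int      # QC
    product_count: int    # PC
    score_mean: float     # SM
    score_stdev: float    # SS


def is_selected(stats):
    """Apply the three thresholds from the slide to one word."""
    return (stats.score_stdev < SS_MAX
            and PC_MIN < stats.product_count < PC_MAX)


# Hypothetical statistics, invented only to exercise each criterion.
candidates = [
    WordStats("the", 50000, 9000, 62.0, 24.0),   # too common and too noisy
    WordStats("unfunny", 60, 45, 35.0, 15.0),    # passes all three tests
    WordStats("klingon", 3, 2, 80.0, 5.0),       # too rare
    WordStats("quirky", 70, 50, 65.0, 26.0),     # score spread too wide
]

selected = [s.word for s in candidates if is_selected(s)]
print(selected)  # ['unfunny']
```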

Page 11: Estimating  review score from  words

Significant words for TV and movies

unfunny

waste, disappointment, supposed, fails

fancy words!

casual words!

Movies are overrated!

TV takes too much time!

Page 12: Estimating  review score from  words

Significant words for music albums

masterpiece, artists

Music is art

date, modern

Music ages quickly

personality

Albums are attached to the musician’s personality

Page 13: Estimating  review score from  words

The input vector and estimation

• Example input vector (divided by quote size)
  – x^t = [1 0 0 1 0 0 0 1 0 0 0 0 ... 0] / 3

• Estimation function

• There is a weight for every selected word

• x^t chooses the subset of contained words

• Estimation is the sum of w_0 and the arithmetic mean of the weights of contained words
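Read together, these bullets suggest an estimate of the form w_0 + w^T x^t, where x^t holds 1/k at the positions of the k selected words found in the quote. The sketch below assumes that form; the function names, the bias, and the weights are hypothetical values chosen for the example.

```python
def input_vector(contained, vocabulary_size):
    """Build x: 1/k at each of the k contained word indices, 0 elsewhere."""
    x = [0.0] * vocabulary_size
    if contained:
        share = 1.0 / len(contained)
        for index in contained:
            x[index] = share
    return x


def estimate(w0, w, x):
    """Bias plus dot product, i.e. w0 plus the mean weight of contained words."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))


# Three of twelve selected words occur in the quote (positions 0, 3 and 7),
# matching the pattern [1 0 0 1 0 0 0 1 0 0 0 0] / 3.
x = input_vector([0, 3, 7], 12)

# Hypothetical parameters: bias 60 and one weight per selected word.
w0 = 60.0
w = [9.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0, -3.0, 0.0, 0.0, 0.0, 0.0]

# 60 + (9 + 6 - 3) / 3
print(round(estimate(w0, w, x), 6))  # 64.0
```

Dividing by the quote size keeps the estimate on the same scale whether a quote contains one selected word or ten.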

Page 14: Estimating  review score from  words

Linear and SVM regression

• Linear regression uses the squared-difference error.

• This implies these update equations:

• SVM regression uses the ε-sensitive error function.

• With these simpler update equations:
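The update equations themselves did not survive in this transcript. As a hedged sketch only, the code below shows the standard stochastic gradient step for a squared error and the corresponding subgradient step for an ε-sensitive error on a linear model; it omits any regularization term and may differ from the equations on the original slide. All names and numbers are assumptions.

```python
def predict(w0, w, x):
    """Linear estimate: bias plus dot product."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))


def squared_error_step(w0, w, x, r, eta):
    """One gradient step on E = 0.5 * (r - y)**2; step size scales with the error."""
    delta = r - predict(w0, w, x)
    new_w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    return w0 + eta * delta, new_w


def eps_sensitive_step(w0, w, x, r, eta, eps):
    """One subgradient step on E = max(0, |r - y| - eps).

    Errors within eps are ignored; larger errors move the weights by a
    fixed amount in the direction of the error's sign.
    """
    delta = r - predict(w0, w, x)
    if abs(delta) <= eps:
        return w0, list(w)
    sign = 1.0 if delta > 0 else -1.0
    new_w = [wi + eta * sign * xi for wi, xi in zip(w, x)]
    return w0 + eta * sign, new_w


# Hypothetical sample: two contained words, true score 90, current estimate 50.
x = [0.5, 0.5]
lin_w0, lin_w = squared_error_step(50.0, [0.0, 0.0], x, r=90.0, eta=0.1)
svm_w0, svm_w = eps_sensitive_step(50.0, [0.0, 0.0], x, r=90.0, eta=0.1, eps=5.0)
print(lin_w0, lin_w)
print(svm_w0, svm_w)
```

In the second rule the size of the error does not enter the step, which is one way to read the slide's remark that the update is simpler.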

Page 15: Estimating  review score from  words

Linear regression learning

Unstable learning in validation set

Error of 17 points

Error of 14 points

Page 16: Estimating  review score from  words

SVM regression learning

Robustness increased, because SVM error function is linear and tolerant to error.

Error of 13 points

Error of 11 points

Better results with SVM!

Page 17: Estimating  review score from  words

Possible improvements

• Non-linear model that actually weighs the importance of words

• Normalization by estimating reviewer parameters

• Adding two-word combinations to the input vector
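The last suggestion can be sketched as extending the feature list with adjacent word pairs, so that a modifier such as "not" stays attached to the word it changes. The tokenizer and function names below are assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercase words of the text, in order."""
    return re.findall(r"[a-z']+", text.lower())


def unigrams_and_bigrams(text):
    """Single words followed by every pair of adjacent words."""
    words = tokens(text)
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


print(unigrams_and_bigrams("This is not good."))
# ['this', 'is', 'not', 'good', 'this is', 'is not', 'not good']
```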

Page 18: Estimating  review score from  words

Estimating review score from words

Işık Barış Fidaner

CMPE 545 Artificial Neural Networks