1
Exploiting Coherence in Reviews for Discovering Latent Facets and
Associated Sentiments
Himabindu Lakkaraju*, Chiranjib Bhattacharyya+, Indrajit Bhattacharya+, Srujana Merugu*
*IBM Research +Indian Institute of Science
2
Outline
• Motivation
• Background
• Our Models
  – Integrating Syntax and Semantics – FACTS Model
  – Incorporating Coherence – CFACTS Model
  – Incorporating Review Ratings – CFACTS-R Model
• Experimental Results
• Conclusions and Future Work
3
Mining Customer Reviews
• Central Problem: Facet based sentiment analysis of customer reviews
• Applications
  – E-commerce: product recommendation for customers
  – Business Analytics: aiding product managers and decision makers in understanding the product's market standing
Facet        Sentiment
Memory       -
Screen       -
Appearance   Positive

Facet        Sentiment
Memory       -
Screen       Negative
Appearance   -
4
Existing Methods
• Feature based sentiment analysis
  – Scaffidi et al. (EC '07), Jin et al. (ICML '09)
• Facet extraction and sentiment analysis treated as separate phases
  – Facet extraction – Titov et al. (WWW '08)
  – Sentiment analysis – Lin et al. (CIKM '09), Li et al. (AAAI '10)
• Rule based ontologies and facets, simple frequency based measures
  – Popescu et al. (EMNLP '05)
Need for:
• Domain independent, unsupervised or weakly supervised techniques
• Joint modeling of facets and sentiments
5
Background
• Latent Dirichlet Allocation (Blei et al., JMLR '03)
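As a reminder of what LDA assumes, here is a minimal sketch of its generative process for a single document. The function name, the toy sizes, and the hyperparameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def generate_lda_document(n_words, alpha, topic_word, rng):
    """Sample one document from the LDA generative process."""
    n_topics, vocab_size = topic_word.shape
    # Per-document topic proportions: theta ~ Dirichlet(alpha).
    theta = rng.dirichlet(np.full(n_topics, alpha))
    # One topic per token, then one word id from that topic's word distribution.
    topics = rng.choice(n_topics, size=n_words, p=theta)
    words = np.array([rng.choice(vocab_size, p=topic_word[z]) for z in topics])
    return theta, topics, words

rng = np.random.default_rng(0)
# Hypothetical setup: 3 topics over a 6-word vocabulary.
topic_word = rng.dirichlet(np.full(6, 0.5), size=3)
theta, topics, words = generate_lda_document(20, alpha=0.5, topic_word=topic_word, rng=rng)
```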
6
Background
• HMM-LDA (Griffiths et al., NIPS '04)
Augments LDA with an HMM – only one syntactic class has topics
7
FACeT Sentiment extraction model (FACTS)
FACTS aims to extract both facets and their associated sentiments from customer reviews
• Captures both syntactic and semantic dependencies
• Loosely based on HMM-LDA
• Facet and sentiment classes consist of topics
8
FACTS Model
Extends HMM-LDA to include topics within another syntactic class – 'sentiments'
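A rough sketch of the idea follows: an HMM chain picks a syntactic class for each word, and two of the classes (facet and sentiment) emit words through topics. This is an illustration under assumptions, not the model as specified by the authors: the three-class inventory, the start state, and all parameter names are hypothetical.

```python
import numpy as np

CLASSES = ["background", "facet", "sentiment"]  # hypothetical class inventory

def generate_review(n_words, trans, facet_mix, sent_mix,
                    facet_words, sent_words, bg_words, rng):
    """Sample (class, topic, word_id) triples for one review."""
    tokens = []
    c = 0  # assumption: the chain starts in the background class
    for _ in range(n_words):
        # HMM step: the next syntactic class depends only on the previous one.
        c = int(rng.choice(len(CLASSES), p=trans[c]))
        if CLASSES[c] == "facet":
            z = int(rng.choice(len(facet_mix), p=facet_mix))
            w = int(rng.choice(facet_words.shape[1], p=facet_words[z]))
        elif CLASSES[c] == "sentiment":
            z = int(rng.choice(len(sent_mix), p=sent_mix))
            w = int(rng.choice(sent_words.shape[1], p=sent_words[z]))
        else:
            z = -1  # background words carry no topic
            w = int(rng.choice(len(bg_words), p=bg_words))
        tokens.append((CLASSES[c], z, w))
    return tokens

rng = np.random.default_rng(0)
V = 8  # hypothetical vocabulary size
trans = np.array([[0.6, 0.3, 0.1],
                  [0.3, 0.2, 0.5],
                  [0.5, 0.3, 0.2]])
tokens = generate_review(
    n_words=15, trans=trans,
    facet_mix=rng.dirichlet(np.ones(3)), sent_mix=rng.dirichlet(np.ones(2)),
    facet_words=rng.dirichlet(np.ones(V), size=3),
    sent_words=rng.dirichlet(np.ones(V), size=2),
    bg_words=rng.dirichlet(np.ones(V)), rng=rng)
```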
9
Coherence based FACTS model (CFACTS)
The pictures i took during my last trip with this camera were absolutely great. The picture quality is amazing and the pics come out clear and sharp. I am also very impressed with its battery life, unlike other cameras available in the market, the charge lasts long enough. However, I am unhappy with the accessories.
'Coherence' is an important aspect of user generated content.
In reviews, facet and sentiment coherence are usually prevalent.
10
CFACTS Model
Modeling coherence
Each review comprises basic units of coherence – windows
Each window is associated with a single facet and a single sentiment
Continuity of topics across windows is governed by a coherence parameter
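The window idea can be sketched as follows. This is a deliberately simplified version: a single probability `p_continue` (a hypothetical name) decides whether a window keeps the previous window's facet and sentiment or draws a fresh pair. The slides do not give the exact form of the coherence parameter, so treat this as an assumption.

```python
import numpy as np

def sample_window_topics(n_windows, p_continue, facet_mix, sent_mix, rng):
    """Assign one (facet, sentiment) topic pair to each window of a review."""
    windows = []
    for x in range(n_windows):
        if x > 0 and rng.random() < p_continue:
            # Coherent continuation: reuse the previous window's pair.
            windows.append(windows[-1])
        else:
            # Topic shift: draw a new facet and a new sentiment.
            f = int(rng.choice(len(facet_mix), p=facet_mix))
            s = int(rng.choice(len(sent_mix), p=sent_mix))
            windows.append((f, s))
    return windows

rng = np.random.default_rng(1)
facet_mix = np.array([0.5, 0.3, 0.2])  # hypothetical per-review facet proportions
sent_mix = np.array([0.6, 0.4])        # hypothetical per-review sentiment proportions
windows = sample_window_topics(6, p_continue=0.7, facet_mix=facet_mix,
                               sent_mix=sent_mix, rng=rng)
```

A higher `p_continue` yields longer runs of windows sharing the same facet and sentiment, which is the behaviour the camera review on the previous slide illustrates.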
11
CFACTS Model
Extends FACTS to incorporate coherence in facets/sentiments
Also enables loose coupling of the facet and sentiment classes
12
Incorporating ratings - CFACTS-R
The flash washes out the photos, and the camera takes very long to turn on. ...
Review ratings are valuable pointers to the sentiments expressed in reviews.
Does incorporating these review ratings help us extract sentiments better?
– Review ratings turn out to be of immense help for 'ordering sentiment topics'
flash washes out photos ??
Negative ?
13
Incorporating ratings - CFACTS-R
This model:
• Provides a complete view of a review, incorporating all the aspects
• Incorporates ratings, which further helps in 'ordering' the sentiments without explicit seed words
The review rating is generated from a normal linear model of the 'individual sentiments'
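One way to read "normal linear model" is sketched below: the rating is drawn from a Gaussian whose mean is a weighted sum of the review's sentiment-topic proportions. The weight vector `eta`, the noise level `sigma`, and the use of empirical proportions are assumptions for illustration; the slides do not spell out the parameterisation.

```python
import numpy as np

def sample_rating(sentiment_assignments, n_sent_topics, eta, sigma, rng):
    """Draw a rating from N(eta . zbar, sigma^2); return (mean, sampled rating)."""
    counts = np.bincount(sentiment_assignments, minlength=n_sent_topics)
    zbar = counts / counts.sum()  # empirical sentiment-topic proportions
    mean = float(eta @ zbar)
    return mean, float(rng.normal(mean, sigma))

rng = np.random.default_rng(0)
# Hypothetical review: six sentiment words assigned to three sentiment topics.
assignments = np.array([0, 0, 1, 2, 2, 2])
eta = np.array([1.0, 3.0, 5.0])  # assumed weights, increasing with positivity
mean, rating = sample_rating(assignments, 3, eta, sigma=0.5, rng=rng)
```

With weights that increase across topics, a higher topic index pushes the expected rating up, which hints at how ratings could induce an ordering of sentiment topics.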
14
Inference
Inference using Gibbs sampling
The samplers for all the models were run for about 1000 iterations
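For orientation, here is a minimal collapsed Gibbs sampler for plain LDA. It is not the CFACTS-R sampler: it has no syntactic classes, windows, or ratings. It only shows the remove, resample, add-back loop that such samplers share. Hyperparameter values and the toy corpus are assumptions.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=1000, seed=0):
    """Collapsed Gibbs sampling for plain LDA over lists of word ids."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))
    topic_word = np.zeros((n_topics, vocab_size))
    topic_total = np.zeros(n_topics)
    assign = []
    # Random initialisation of topic assignments and count tables.
    for d, doc in enumerate(docs):
        z_d = rng.integers(n_topics, size=len(doc))
        assign.append(z_d)
        for w, z in zip(doc, z_d):
            doc_topic[d, z] += 1
            topic_word[z, w] += 1
            topic_total[z] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the current token from the counts.
                doc_topic[d, z] -= 1
                topic_word[z, w] -= 1
                topic_total[z] -= 1
                # Conditional distribution over topics for this token.
                p = ((doc_topic[d] + alpha) * (topic_word[:, w] + beta)
                     / (topic_total + vocab_size * beta))
                z = int(rng.choice(n_topics, p=p / p.sum()))
                # Add the token back under its new topic.
                assign[d][i] = z
                doc_topic[d, z] += 1
                topic_word[z, w] += 1
                topic_total[z] += 1
    return assign, doc_topic, topic_word

# Toy corpus of word ids (hypothetical); fewer iterations keep the demo fast.
docs = [[0, 1, 0, 2], [3, 4, 3, 5], [0, 2, 1, 1]]
assign, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6, n_iters=200)
```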
Update equations for CFACTS-R -
Block Sampling of facet, sentiment topics and coherence parameter -
15
Inference (contd.)
where g1 can be computed as [equations not preserved in this transcript]
Conditional distribution for the class variable -
where g2 can be computed as [equations not preserved in this transcript]
16
Experimental Results
Dataset – Amazon reviews crawled during Nov. 2009
Evaluated the model performance over various tasks -
• Facet Extraction
• Sentiment Identification at multiple granularities (document, sentence, word)
• Facet based opinion summarization
Product Category    # of reviews
Digital Cameras     61482
Laptops             10011
Mobile Phones       6348
Flatpanel TVs       2346
Printers            2397
17
Evaluation – Facet Extraction
Qualitative Evaluation
Quantitative Evaluation
• Facet Coverage – the fraction of extracted facets that actually correspond to product attributes. Benchmarked against Amazon's structured ratings facets.
• Facet Purity – the fraction of the top words in the facet that actually correspond to the product attribute
Digital Camera Corpus
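The two metrics defined on this slide reduce to simple fractions once someone has judged which facets and words are relevant. A small sketch, with made-up facets, words, and judgments used purely for illustration:

```python
def facet_coverage(extracted_facets, matches_attribute):
    """Fraction of extracted facets judged to correspond to a product attribute."""
    hits = sum(1 for f in extracted_facets if matches_attribute.get(f, False))
    return hits / len(extracted_facets)

def facet_purity(top_words, relevant_words):
    """Fraction of a facet's top words that relate to its product attribute."""
    hits = sum(1 for w in top_words if w in relevant_words)
    return hits / len(top_words)

# Hypothetical judgments for four extracted facets.
judged = {"battery": True, "picture quality": True, "lens": True, "misc": False}
coverage = facet_coverage(list(judged), judged)

# Hypothetical top words for a "battery" facet and the subset judged relevant.
purity = facet_purity(["battery", "charge", "life", "great", "lasts"],
                      {"battery", "charge", "life", "lasts"})
```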
18
Evaluation – Prior Knowledge
Prior Knowledge for FACTS and CFACTS – seed words for different sentiment levels
These seed words facilitate:
• Distinction between facet and sentiment classes
• Distinction within the sentiment topics
Prior Knowledge for FACTS-R and CFACTS-R
• Review ratings
• Seed words for the sentiment class as a whole – no seeding of individual sentiment topics
• Seed words for sentiment class – great, amazing, good, easy, like, bad, terrible, poor
19
Evaluation – Sentiment Analysis
Word and Sentence level
– Positive, negative, neutral sentiments
– Ground truth for word level – SentiWordNet
– Ground truth for sentence level sentiment – manually labeled 8,012 sentences
Review level
– Positive, negative
– Five sentiment ratings (on a scale of 1 to 5)
– Ground truth – Amazon review ratings
Baseline – Joint topic sentiment model for sentiment analysis (Lin et al., CIKM '09)
20
Evaluation – facet based sentiment analysis
We manually labelled 1500 reviews with (facet, polarity) pairs
Baselines -
− FIFS: a simple rule-based feature and polarity word extractor that we implemented. To group the feature terms into facets, we use the PMI (pointwise mutual information) metric.
− LFS: an LDA-based algorithm to identify facets and sentiments
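For reference, PMI between two terms is log(p(x, y) / (p(x) p(y))). The sketch below estimates it from sentence-level co-occurrence. How the FIFS baseline actually counted co-occurrences or turned scores into facet groups is not stated in the slides, so the counting unit and the toy data are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(sentences):
    """PMI for every term pair that co-occurs in at least one sentence."""
    n = len(sentences)
    term_count = Counter()
    pair_count = Counter()
    for terms in sentences:
        unique = sorted(set(terms))  # count each term once per sentence
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): math.log((c / n) / ((term_count[a] / n) * (term_count[b] / n)))
        for (a, b), c in pair_count.items()
    }

# Hypothetical feature terms extracted from four sentences.
sentences = [["battery", "charge"], ["battery", "charge"],
             ["screen", "display"], ["screen", "battery"]]
scores = pmi_scores(sentences)
```

Pairs with high PMI (here "battery" and "charge") co-occur more often than chance and are candidates for the same facet; pairs with negative PMI are not.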
21
Summary
Main Contributions
● Introduce the notions of facet and sentiment coherence
● Probabilistic models for facet-based sentiment analysis that discover latent facet topics and the corresponding sentiment ratings
● Domain independent, unsupervised, requiring no expert intervention
Future Work
● Faster inference for the proposed models
● Extending approaches to handle hierarchies of facets
22
Thank You !
Contact Author : Himabindu Lakkaraju ([email protected])
Project Webpage : http://mllab.csa.iisc.ernet.in/downloads/reviewmining.html