cross-domain sentiment tagging using meta-classifier and a high accuracy in-domain classifier...

Cross-Domain Sentiment Tagging using Meta-Classifier and a High Accuracy In-Domain

Classifier

Balamurali A R*, Debraj Manna~ , Pushpak Bhattacharyya~

*IITB-Monash Research Academy, IIT Bombay

~Department of Computer science and Engineering, IIT Bombay

ICON-2010, IIT Kharagpur

Roadmap

• Sentiment Analysis• Sentiment classification• Problem• Solution intuition• Approach

– H-I System• Result • Conclusion

What is Sentiment Analysis (SA)?

• Given a textual portion,– Is the writer expressing sentiment with respect to a

topic?– What is that sentiment?

• Identify the orientation of opinion in a piece of text

The movie was fabulous!

The movie stars Mr. X

The movie was horrible!

Sentiment classification (1/2)

– Classification of document(review) based on sentiment content

• Content being positive/negative

– Supervised and unsupervised approaches exist– Supervised approaches perform better

I loved CWG opening ceremony I really hate the Indian media for reporting only the bad things

happening around in Delhi.

Supervised Approaches: Pang & Lee (2002), Matsumoto et.al(2005)

Unsupervised Approaches: Turney (2002), Wan (2008)

D1 D2 D1 D2 D3

D3 D4 D5 D4 D5

Sentiment classification (2/2)

• Sentiment tagging – labeled data generation• Issues

– Tedious and expensive process– Domain specific nature of Sentiment Analysis– Utility span of labeled data is limited

• More products with short market life span – Ipod, I-touch, I-Pad• Need of people changes – 6 MP cameraGood Annotators are difficult to find

Problem we address

• Problem: A sentiment classifier trained on one domain performs with a lesser accuracy than on a different domain (Aue and Gamon 2005)

• Solution: Cross-Domain Sentiment Tagging via adaptation– A procedure to use labeled data in an existing domain and

use them to create labeled data for new target domains having little or no training data with the help of few hand labeled target data

Intuition

Predict the SentimentBook review: “As usual, Robin Cook keeps you on the edge of your seat �

until the end. Excellent reading”From Movie review (Domain Pertinent): “edge of your seat” � Common sense (Domain Independent): “Excellent” Eureka!

Its positive!!!!You evolve using the information attained, disregarding the unwanted

Approach (1/3)

Step 1: Generate noisy tagged data for the target domain using a Meta-classifier trained on a source domain

Approach (2/3)

Step 2: Select the highly probable correct instances

Approach (3/3)

Step 3: Use them to create a high accuracy in-domain classifier– This classifier then completely classifies the target domain

High Accuracy-Intermediary Tagger System (H-I)

Training Domain

Intermediary Tagger System (ITS)Target Domain

Tagged Target Domain

High Accuracy Classifier (HAC)

T1 T2 Tn

Is model optimal?

M1 M2 MnLabeled Target Data

Intermediary Tagged Target DomainNo Yes

Intermediary Target Domain Models

Intermediary Target Domain Data

H-I system architecture

ITS

HAC

Selection of Optimal Intermediary Tagged Data

ITS

Source Domain

Feature Representation

(All Words - Unigrams)


(Universal Clue Based) SentiWordNet

Base-2 Classifier Model Base-3 Classifier Base-1 Classifier

Model

Pr (POS) Pr (NEG) Pr (POS) Pr (NEG) SWN (POS)

SWN (NEG)

Meta Features

Meta Classifiers

Target domain

Training Procedure

Intermediary Tagged System (ITS)

c

Domain Pertinent View

Common Sense View

Prior Polarity View

ITS - lexicons used

• SentiWordnet (Esuli and Sebastiani, 2006)– Attaches sentiment score to each synset

• Positive , negative & objective score • Scores sum up to 1• E.g. – {small} – “slight or limited”

– pos – 0.125, neg – 0.125 , obj – 0.75

• Subjectivity Lexicon (Wilson, et al., 2005)– Universal Clues– Consists of manually identified 8000 polar words with their

prior polarity– E.g –{wow} – strongly subjective, positive prior polarity

Source Domain


(All Words - Unigrams)


(Universal Clue Based) SentiWordNet

Base-2 Classifier Model Base-3 Classifier Base-1 Classifier

Model

Pr (POS) Pr (NEG) Pr (POS) Pr (NEG) SWN (POS)

SWN (NEG)

Meta Features

Meta Classifiers

Target domain

Training Procedure

Intermediary Tagged System (ITS)

Base-1 Classifier – All words unigram model (All domain pertinent information are recorded using this model)

Base-2 Classifier – Classification model based universal clues as features (Generic model)

Base- 3 Classifier – Rule based classifier based on SentiWordNetClassification Rule : Document is positive, if prior positive polarity terms >> prior negative polarity

terms

c

Training Domain




T1 T2 Tn

Is model optimal?






Selection of Optimal Intermediary Tagged Data

Not all instances of Intermediary Tagged data are correct!

Selection of optimal intermediary tagged data

• Different sets of intermediary tagged target data created based on different confidence level generated by ITS

• Different SVM based models created using each set of intermediary tagged target corpus

• Model tested on few hand labeled target data to find the best confidence threshold

• This intermediary target set categorized as optimal intermediary tagged data

• A High Accuracy Classification model is developed using the selected dataset

Selected all intermediary tagged data having confidence level 1. > 60%,2. > 65%

.......

Training Domain




T1 T2 Tn

Is model optimal?






HAC


• Complex features not necessary for creating a High Accuracy Sentiment Classifier

• Intuition –– In a particular domain, there exist domain specific features

(words/phrases) which have high discriminatory power– People use almost similar vocabulary in expressing their

sentiment (pertinent to a domain)• Process –

– An Information gain based feature selection done to select domain specific features

– A combination of unigram, bigrams and trigrams used as featuresLame movie, Defective kitchen set

Experimental setup

• Dataset used:-– Multi-Domain Sentiment Dataset (Version 2.0) (Blitzer, et al., 2007)

• Consists of 4 domain – Book(BS), Dvd(DD), Electronics(ES) and Kitchen(KN).• Each domain contains 1000 positive and 1000 negative reviews

• Models are trained on 1000 positive and 1000 negative data from the source domain.

• Tested on 950 target data from each class

• Used 50 positive and 50 negative labeled target data

• All the experiments are carried out using libSVM (Chih-chung and Chih-Jen, 2001) and Rapidminer 4.6 (Ingo, et al., 2006).

Results - HAC

• Baseline – All words unigram-based model• Combine – uni+bi+trigram model• Combine (IG) - uni+bi+trigram model after IG• An average increase of 10% in the classification accuracy with

respect to the baseline

Book DVD ELECTRONICS KITCHEN70

75

80

85

90

95

80.281.9

80.6

85.183.2 83.5

85.487.8

91.2 90.1 90.993

BaseLine COMBINE COMBINE (IG)

Results -Top features

DOMAIN TOP IG BASED FEATURES

BOOK (B) waste of, love this, boring, stupid, too many, whatever, ridiculous, two stars

DVD (D) worst, horrible, your money, lame, of the best, sucks, barely, ridiculous, save your, is a great, pathetic, dumb, not worth, ruined

ELECTRONICS (E) return, terrible, waste your, highly, to return, poor, it back, returning, does not work, do not buy

KITCHEN (K) easy to, easy to use. easy to clean, returning, waste your, tried to excellent, defective, horrible, poor, i love it

• Top IGR based features are different in different domains (Blitzer, et al., 2007).

• Domain specific nature of Sentiment Analysis

Results - Comparison

Similarity between domains

• Cosine Similarity

• Similar domains have high cross classification accuracy

BOOK DVD ELECTRONICS KITCHEN

BOOK 1 0.54 0.41 0.4

DVD 0.54 1 0.41 0.41

ELECTRONICS 0.41 0.41 1 0.49

KITCHEN 0.4 0.4 0.49 1

Results - Comparison

Structural Correspondence Learning (SCL) – State of art for Cross-Domain sentiment classification

Dissimilar domains have relatively high increase in accuracy with respect to ITS performance

Results summary

Domain

Best Baseline Cross-Domain Accuracy

Best H-I-MAccuracy

Best SCL Accuracy

Best BaselineV/SH-I-M

Best H-I-M V/SSCL

BS 74.56 79.75 74.07 5.19 5.68

DD 77.5 81.94 78.86 4.44 3.08

ES 78.06 80.44 78.02 2.38 2.42

KN 77.44 83.00 79.93 5.56 3.07

Average 4.39 3.56

Conclusion

• Methodology for cross domain sentiment tagging introduced• An average cross-domain sentiment classification accuracy of

80% achieved• Our system gives a high cross-domain sentiment classification

accuracy with an average improvement of 4.39% over the best baseline accuracy

Reference (1/2)• Anthony Aue and Michael Gamon,2005,Customizing Sentiment Classifiers to New Domains: A Case Study, Proceedings of Recent

Advances in NLP

• Alina Andreevskaia and Sabine Bergler,2008,When Specialists and Generalists Work Together: Overcoming Domain Dependence in Sentiment Tagging, Proceedings of ACL-08: HLT,PP 290-298,Columbus, Ohio

• Blitzer, John and Dredze, Mark and Pereira, Fernando, Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification,Proceedings of the 45th Annual Meeting of ACL,2007,pages = 440--447,Prague, Czech Republic

• Andrea Esuli and Fabrizio Sebastiani,SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining,Proceedings of LREC-06,2006,Genova, Italy

• Theresa Wilson and Janyce Wiebe and Paul Hoffmann,Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis,Proceedings of EMNLP,2005,pages =347-354,Vancouver, Canada

• Bo Pang and Lillian Lee and Shivakumar Vaithyanathan,Thumbs up? Sentiment Classification Using Machine Learning Techniques,Proceedings of EMNLP,2002,pages = 79-86,Philadelphia, Pennsylvania

• David H. Wolpert,Stacked generalization,Neural Networks,Volume 5, 1992, Pages=241-259

• Chih-Chung Chang and Chih-Jen Lin,LIBSVM: a library for support vector machines,2001

• Wu, Ting-Fan and Lin, Chih-Jen and Weng, Ruby C.,Probability Estimates for Multi-class Classification by Pairwise Coupling,Journal of Machine Learning and research,Vol 5,2004},975--1005

Reference (2/2)• Bo Pang and Lillian Lee and Shivakumar Vaithyanathan,Thumbs up? Sentiment Classification Using Machine Learning

Techniques,Proceedings of EMNLP,2002,pages = 79-86,Philadelphia, Pennsylvania

• David H. Wolpert,Stacked generalization,Neural Networks,Volume 5, 1992, Pages=241-259

• Chih-Chung Chang and Chih-Jen Lin,LIBSVM: a library for support vector machines,2001

• Wu, Ting-Fan and Lin, Chih-Jen and Weng, Ruby C.,Probability Estimates for Multi-class Classification by Pairwise Coupling,Journal of Machine Learning and research,Vol 5,2004},975—1005

• Read, Jonathon,Using emoticons to reduce dependency in machine learning techniques for sentiment classification, ACL '05: Proceedings of the ACL Student Research Workshop},2005,pages =43--48,Annrbor, Michigan,Morristown, NJ, USA

• Bo Pang and Lillian Lee, 2005, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, Proceedings of the ACL.

• Wiebe, J., Wilson, T., Bruce, R., Bell, M. & Martin, M. ,Learning Subjective Language, 2004,Computational Linguistics .• Philip J. Stone, Dexter C. Dunphy, Marshall S. Smith, and Daniel M. Ogilvie. The General Inquirer: A Computer Approach to Content

Analysis. MIT Press, 1966.

• Taboada, M. and J. Grieve (2004) Analyzing Appraisal Automatically. American Association for Artificial Intelligence Spring Symposium on Exploring Attitude and Affect in Text. Stanford. March 2004. AAAI Technical Report SS-04-07. (pp.158-161).

Meta-Classifier (1/2)

• An ensemble classifier– Use of multiple models to obtain better predictive

performance than using just a constituent model• Commonly known as stacking

– Base models are called level-0 models– Level -1 model combines level-0 predictions

• If only two level – final level features are called meta-features

• A classifier modeled on meta-features to get final prediction

Meta-Classifier (2/2)

• Motivation– Reduce bias (model)

• A combination of multiple classifiers may learn a more expressive concept class than a single classifier

• Key step• Formation of an meta-classifier of diverse classifiers

from a single training set

Extra slide -Information gain• A term goodness criteria

– how good is the term for classification– IG(C|X) = How many bits on average would it save me if both ends of

the line knew X?– IG(X=t) = H(C) - H(C|X=t)

• How to calculate

• t = term to be considered• P(ci) – probability of class • P(t) – probability of term occurring• P(t’)- probability of term not occurring

cross-domain sentiment tagging using meta-classifier and a high accuracy in-domain classifier...

Documents

target domain slide

source domain slide

sentiment classifier

existing domain

crossdomain sentiment

domain classifier balamurali

target data intermediary

hand labeled target