
Page 1: A Sentimental Education: Sentiment Analysis Using ... · A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts Bo Pang and Lillian Lee

A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts

Bo Pang and Lillian Lee
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
{pabo,llee}@cs.cornell.edu

Abstract

Sentiment analysis seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as “thumbs up” or “thumbs down”. To determine this sentiment polarity, we propose a novel machine-learning method that applies text-categorization techniques to just the subjective portions of the document. Extracting these portions can be implemented using efficient techniques for finding minimum cuts in graphs; this greatly facilitates incorporation of cross-sentence contextual constraints.

Publication info: Proceedings of the ACL, 2004.

1 Introduction

The computational treatment of opinion, sentiment, and subjectivity has recently attracted a great deal of attention (see references), in part because of its potential applications. For instance, information-extraction and question-answering systems could flag statements and queries regarding opinions rather than facts (Cardie et al., 2003). Also, it has proven useful for companies, recommender systems, and editorial sites to create summaries of people’s experiences and opinions that consist of subjective expressions extracted from reviews (as is commonly done in movie ads) or even just a review’s polarity: positive (“thumbs up”) or negative (“thumbs down”).

Document polarity classification poses a significant challenge to data-driven methods, resisting traditional text-categorization techniques (Pang, Lee, and Vaithyanathan, 2002). Previous approaches focused on selecting indicative lexical features (e.g., the word “good”), classifying a document according to the number of such features that occur anywhere within it. In contrast, we propose the following process: (1) label the sentences in the document as either subjective or objective, discarding the latter; and then (2) apply a standard machine-learning classifier to the resulting extract. This can prevent the polarity classifier from considering irrelevant or even potentially misleading text: for example, although the sentence “The protagonist tries to protect her good name” contains the word “good”, it tells us nothing about the author’s opinion and in fact could well be embedded in a negative movie review. Also, as mentioned above, subjectivity extracts can be provided to users as a summary of the sentiment-oriented content of the document.
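The two-step process can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `polarity_via_extract`, `toy_is_subjective`, and `toy_polarity` are hypothetical, the keyword-based stand-ins replace the learned classifiers, and the fallback for an empty extract is an added assumption.

```python
from typing import Callable, List


def polarity_via_extract(
    sentences: List[str],
    is_subjective: Callable[[str], bool],
    classify_polarity: Callable[[List[str]], str],
) -> str:
    """Step (1): drop objective sentences; step (2): classify the extract."""
    extract = [s for s in sentences if is_subjective(s)]
    # Assumption (not from the paper): if nothing is kept, fall back to
    # classifying the full review instead of an empty extract.
    return classify_polarity(extract or sentences)


# Toy stand-ins for the two learned components (hypothetical).
OPINION_CUES = {"loved", "hated", "brilliant", "dull"}


def toy_is_subjective(sentence: str) -> bool:
    return any(w.strip(".,!").lower() in OPINION_CUES for w in sentence.split())


def toy_polarity(extract: List[str]) -> str:
    text = " ".join(extract).lower()
    pos = sum(text.count(w) for w in ("loved", "brilliant", "good"))
    neg = sum(text.count(w) for w in ("hated", "dull"))
    return "positive" if pos > neg else "negative"


review = [
    "The protagonist tries to protect her good name.",
    "Her good friend joins her.",
    "I hated the pacing.",
]
full_label = toy_polarity(review)
extract_label = polarity_via_extract(review, toy_is_subjective, toy_polarity)
```

In this toy review, the two plot sentences contain “good” and mislead the keyword counter when the full text is used, while the extract keeps only the opinionated sentence.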

Our results show that the subjectivity extracts we create accurately represent the sentiment information of the originating documents in a much more compact form: depending on choice of downstream polarity classifier, we can achieve highly statistically significant improvement (from 82.8% to 86.4%) or maintain the same level of performance for the polarity classification task while retaining only 60% of the reviews’ words. Also, we explore extraction methods based on a minimum cut formulation, which provides an efficient, intuitive, and effective means for integrating inter-sentence-level contextual information with traditional bag-of-words features.

2 Method

2.1 Architecture

One can consider document-level polarity classification to be just a special (more difficult) case of text categorization with sentiment- rather than topic-based categories. Hence, standard machine-learning classification techniques, such as support vector machines (SVMs), can be applied to the entire documents themselves, as was done by Pang, Lee, and Vaithyanathan (2002). We refer to such classification techniques as default polarity classifiers.

However, as noted above, we may be able to improve polarity classification by removing objective sentences (such as plot summaries in a movie review). We therefore propose, as depicted in Figure 1, to first employ a subjectivity detector that determines whether each sentence is subjective or not: discarding the objective ones creates an extract that should better represent a review’s subjective content to a default polarity classifier.

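[Figure 1 diagram. Subjectivity extraction: an n-sentence review (s1, ..., s_n) is passed to the subjectivity detector, which answers “subjective sentence?” (yes/no) for each sentence; the sentences marked yes (s1 and s4 in the illustration) form an m-sentence extract (m <= n). The default polarity classifier then answers “positive or negative review?” (+/-) on the extract.]

Figure 1: Polarity classification via subjectivity detection.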

To our knowledge, previous work has not integrated sentence-level subjectivity detection with document-level sentiment polarity. Yu and Hatzivassiloglou (2003) provide methods for sentence-level analysis and for determining whether a document is subjective or not, but do not combine these two types of algorithms or consider document polarity classification. The motivation behind the single-sentence selection method of Beineke et al. (2004) is to reveal a document’s sentiment polarity, but they do not evaluate the polarity-classification accuracy that results.

2.2 Context and Subjectivity Detection

As with document-level polarity classification, we could perform subjectivity detection on individual sentences by applying a standard classification algorithm on each sentence in isolation. However, modeling proximity relationships between sentences would enable us to leverage coherence: text spans occurring near each other (within discourse boundaries) may share the same subjectivity status, other things being equal (Wiebe, 1994).

We would therefore like to supply our algorithms with pair-wise interaction information, e.g., to specify that two particular sentences should ideally receive the same subjectivity label but not state which label this should be. Incorporating such information is somewhat unnatural for classifiers whose input consists simply of individual feature vectors, such as Naive Bayes or SVMs, precisely because such classifiers label each test item in isolation. One could define synthetic features or feature vectors to attempt to overcome this obstacle. However, we propose an alternative that avoids the need for such feature engineering: we use an efficient and intuitive graph-based formulation relying on finding minimum cuts. Our approach is inspired by Blum and Chawla (2001), although they focused on similarity between items (the motivation being to combine labeled and unlabeled data), whereas we are concerned with physical proximity between the items to be classified; indeed, in computer vision, modeling proximity information via graph cuts has led to very effective classification (Boykov, Veksler, and Zabih, 1999).

2.3 Cut-based classification

Figure 2 shows a worked example of the concepts in this section.

Suppose we have $n$ items $x_1, \ldots, x_n$ to divide into two classes $C_1$ and $C_2$, and we have access to two types of information:

• Individual scores $\mathrm{ind}_j(x_i)$: non-negative estimates of each $x_i$’s preference for being in $C_j$ based on just the features of $x_i$ alone; and
• Association scores $\mathrm{assoc}(x_i, x_k)$: non-negative estimates of how important it is that $x_i$ and $x_k$ be in the same class.[1]

We would like to maximize each item’s “net happiness”: its individual score for the class it is assigned to, minus its individual score for the other class. But, we also want to penalize putting tightly-associated items into different classes. Thus, after some algebra, we arrive at the following optimization problem: assign the $x_i$’s to $C_1$ and $C_2$ so as to minimize the partition cost

$$\sum_{x \in C_1} \mathrm{ind}_2(x) \;+\; \sum_{x \in C_2} \mathrm{ind}_1(x) \;+\; \sum_{x_i \in C_1,\; x_k \in C_2} \mathrm{assoc}(x_i, x_k).$$
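To make the partition cost concrete, the sketch below evaluates it by brute force on the three-item example of Figure 2 (items Y, M, N). The dictionary layout and the names `partition_cost` and `best_partition` are illustrative choices, and each separated pair is counted once because the association scores are symmetric.

```python
from itertools import combinations

# Values read off Figure 2: ind[x] = (ind_1(x), ind_2(x)).
ind = {"Y": (0.8, 0.2), "M": (0.5, 0.5), "N": (0.1, 0.9)}
assoc = {("Y", "M"): 1.0, ("M", "N"): 0.2, ("Y", "N"): 0.1}


def partition_cost(c1):
    """Cost of putting the items in c1 into C_1 and all others into C_2."""
    c1 = set(c1)
    cost = sum(ind[x][1] for x in c1)                    # ind_2 over C_1
    cost += sum(ind[x][0] for x in ind if x not in c1)   # ind_1 over C_2
    cost += sum(w for (a, b), w in assoc.items()
                if (a in c1) != (b in c1))               # separated pairs
    return cost


def best_partition():
    """Enumerate all 2^n choices of C_1 and return the cheapest one."""
    items = list(ind)
    candidates = [subset for k in range(len(items) + 1)
                  for subset in combinations(items, k)]
    return min(candidates, key=partition_cost)
```

Exhaustive enumeration is only workable for tiny examples like this one, since the number of candidate partitions doubles with every added item.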

The problem appears intractable, since there are $2^n$ possible binary partitions of the $x_i$’s. However, suppose we represent the situation in the following manner. Build an undirected graph $G$ with vertices $\{v_1, \ldots, v_n, s, t\}$; the last two are, respectively, the source and sink. Add $n$ edges $(s, v_i)$, each with weight $\mathrm{ind}_1(x_i)$, and $n$ edges $(v_i, t)$, each with weight $\mathrm{ind}_2(x_i)$. Finally, add $\binom{n}{2}$ edges $(v_i, v_k)$, each with weight $\mathrm{assoc}(x_i, x_k)$. Then, cuts in $G$ are defined as follows:

Definition 1. A cut $(S, T)$ of $G$ is a partition of its nodes into sets $S = \{s\} \cup S'$ and $T = \{t\} \cup T'$, where $s \notin S'$, $t \notin T'$. Its cost $\mathrm{cost}(S, T)$ is the sum of the weights of all edges crossing from $S$ to $T$. A minimum cut of $G$ is one of minimum cost.

Observe that every cut corresponds to a partition of the items and has cost equal to the partition cost. Thus, our optimization problem reduces to finding minimum cuts.

[1] Asymmetry is allowed, but we used symmetric scores.

[Figure 2 diagram: source s, sink t, and item nodes Y, M, N. Edge weights: ind_1(Y) [.8], ind_2(Y) [.2]; ind_1(M) [.5], ind_2(M) [.5]; ind_1(N) [.1], ind_2(N) [.9]; assoc(Y,M) [1.0], assoc(M,N) [.2], assoc(Y,N) [.1].]

C_1        Individual penalties   Association penalties   Cost
{Y,M}      .2 + .5 + .1           .1 + .2                 1.1
(none)     .8 + .5 + .1           0                       1.4
{Y,M,N}    .2 + .5 + .9           0                       1.6
{Y}        .2 + .5 + .1           1.0 + .1                1.9
{N}        .8 + .5 + .9           .1 + .2                 2.5
{M}        .8 + .5 + .1           1.0 + .2                2.6
{Y,N}      .2 + .5 + .9           1.0 + .2                2.8
{M,N}      .8 + .5 + .9           1.0 + .1                3.3

Figure 2: Graph for classifying three items. Brackets enclose example values; here, the individual scores happen to be probabilities. Based on individual scores alone, we would put Y (“yes”) in $C_1$, N (“no”) in $C_2$, and be undecided about M (“maybe”). But the association scores favor cuts that put Y and M in the same class, as shown in the table. Thus, the minimum cut, indicated by the dashed line, places M together with Y in $C_1$.
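The reduction can be checked on the Figure 2 example with a small max-flow routine. The sketch below uses a plain Edmonds-Karp implementation written for illustration (it is not the code the authors used); `min_cut` and `undirected` are hypothetical helper names, and each undirected edge is modeled as a pair of opposite arcs with the same capacity.

```python
from collections import deque


def undirected(edges):
    """Turn (u, v, weight) triples into symmetric arc capacities."""
    cap = {}
    for u, v, w in edges:
        cap.setdefault(u, {})[v] = w
        cap.setdefault(v, {})[u] = w
    return cap


def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns (cut cost, source-side node set)."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    def bfs_parents():
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0.0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            # Nodes still reachable from the source form the set S.
            return flow, set(parents)
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        push = min(residual[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push


figure2 = undirected([
    ("s", "Y", 0.8), ("s", "M", 0.5), ("s", "N", 0.1),  # ind_1 weights
    ("Y", "t", 0.2), ("M", "t", 0.5), ("N", "t", 0.9),  # ind_2 weights
    ("Y", "M", 1.0), ("M", "N", 0.2), ("Y", "N", 0.1),  # assoc weights
])
cost, source_side = min_cut(figure2, "s", "t")
```

The nodes on the source side other than `s` are the items assigned to $C_1$; on this graph the routine recovers the first row of the Figure 2 table.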

Practical advantages. As we have noted, formulating our subjectivity-detection problem in terms of graphs allows us to model item-specific and pairwise information independently. Note that this is a very flexible paradigm. For instance, it is perfectly legitimate to use knowledge-rich algorithms employing deep linguistic knowledge about sentiment indicators to derive the individual scores. And we could also simultaneously use knowledge-lean methods to assign the association scores. Interestingly, Yu and Hatzivassiloglou (2003) compared an individual-preference classifier against a relationship-based method, but didn’t combine the two; the ability to coordinate such algorithms is precisely one of the strengths of our approach.

But a crucial advantage specific to the utilization of a minimum-cut-based approach is that we can use maximum-flow algorithms with polynomial asymptotic running times (and near-linear running times in practice) to exactly compute the minimum-cost cut(s), despite the apparent intractability of the optimization problem (Cormen, Leiserson, and Rivest, 1990; Ahuja, Magnanti, and Orlin, 1993).[2] In contrast, other graph-partitioning problems that have been previously used to formulate NLP classification problems[3] are NP-complete (Hatzivassiloglou and McKeown, 1997; Agrawal et al., 2003; Joachims, 2003).

[2] Code available at http://www.avglab.com/andrew/soft.html.

[3] Graph-based approaches to general clustering problems are too numerous to mention here.

3 Evaluation Framework

Our experiments involve classifying movie reviews as either positive or negative, an appealing task for several reasons. First, as mentioned in the introduction, providing polarity information about reviews is a useful service: witness the popularity of www.rottentomatoes.com. Second, movie reviews are apparently harder to classify than reviews of other products (Turney, 2002; Dave, Lawrence, and Pennock, 2003). Third, the correct label can be extracted automatically from rating information (e.g., number of stars). Our data[4] contains 1000 positive and 1000 negative reviews all written before 2002, with a cap of 20 reviews per author (312 authors total) per category. We refer to this corpus as the polarity dataset.

Default polarity classifiers. We tested support vector machines (SVMs) and Naive Bayes (NB). Following Pang et al. (2002), we use unigram-presence features: the $i$th coordinate of a feature vector is 1 if the corresponding unigram occurs in the input text, 0 otherwise. (For SVMs, the feature vectors are length-normalized.) Each default document-level polarity classifier is trained and tested on the extracts formed by applying one of the sentence-level subjectivity detectors to reviews in the polarity dataset.
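A unigram-presence representation can be sketched as follows. The tokenization (lowercasing and splitting on whitespace) and the reading of “length-normalized” as scaling to unit Euclidean length are assumptions made for this illustration; the function names are hypothetical.

```python
import math
from typing import Dict, List


def build_vocabulary(texts: List[str]) -> Dict[str, int]:
    """Map each distinct whitespace-delimited token to a coordinate index."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def presence_vector(text: str, vocab: Dict[str, int],
                    length_normalize: bool = False) -> List[float]:
    """Coordinate i is 1.0 if unigram i occurs in the text, else 0.0."""
    vec = [0.0] * len(vocab)
    for token in set(text.lower().split()):
        if token in vocab:
            vec[vocab[token]] = 1.0  # presence only: repeats do not add up
    if length_normalize:
        # Assumed meaning of "length-normalized": unit Euclidean norm.
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            vec = [v / norm for v in vec]
    return vec


vocab = build_vocabulary(["good good movie", "bad movie"])
nb_features = presence_vector("good good film", vocab)
svm_features = presence_vector("bad movie", vocab, length_normalize=True)
```

Presence features ignore how often a word occurs, so the repeated “good” contributes a single 1 to the vector.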

[4] Available at www.cs.cornell.edu/people/pabo/movie-review-data/ (review corpus version 2.0).

Subjectivity dataset. To train our detectors, we need a collection of labeled sentences. Riloff and Wiebe (2003) state that “It is [very hard] to obtain collections of individual sentences that can be easily identified as subjective or objective”; the polarity-dataset sentences, for example, have not been so annotated.[5] Fortunately, we were able to mine the Web to create a large, automatically-labeled sentence corpus.[6] To gather subjective sentences (or phrases), we collected 5000 movie-review snippets (e.g., “bold, imaginative, and impossible to resist”) from www.rottentomatoes.com. To obtain (mostly) objective data, we took 5000 sentences from plot summaries available from the Internet Movie Database (www.imdb.com). We only selected sentences or snippets at least ten words long and drawn from reviews or plot summaries of movies released post-2001, which prevents overlap with the polarity dataset.

Subjectivity detectors. As noted above, we can use our default polarity classifiers as “basic” sentence-level subjectivity detectors (after retraining on the subjectivity dataset) to produce extracts of the original reviews. We also create a family of cut-based subjectivity detectors; these take as input the set of sentences appearing in a single document and determine the subjectivity status of all the sentences simultaneously using per-item and pairwise relationship information. Specifically, for a given document, we use the construction in Section 2.3 to build a graph wherein the source $s$ and sink $t$ correspond to the class of subjective and objective sentences, respectively, and each internal node $v_i$ corresponds to the document’s $i$th sentence $s_i$. We can set the individual scores $\mathrm{ind}_1(s_i)$ to $Pr^{NB}_{sub}(s_i)$ and $\mathrm{ind}_2(s_i)$ to $1 - Pr^{NB}_{sub}(s_i)$, as shown in Figure 3, where $Pr^{NB}_{sub}(s)$ denotes Naive Bayes’ estimate of the probability that sentence $s$ is subjective; or, we can use the weights produced by the SVM classifier instead.[7] If we set all the association scores to zero, then the minimum-cut classification of the sentences is the same as that of the basic subjectivity detector. Alternatively, we incorporate the degree of proximity between pairs of sentences, controlled by three parameters. The threshold $T$ specifies the maximum distance two sentences can be separated by and still be considered proximal. The non-increasing function $f(d)$ specifies how the influence of proximal sentences decays with respect to distance $d$; in our experiments, we tried $f(d) = 1$, $e^{1-d}$, and $1/d^2$. The constant $c$ controls the relative influence of the association scores: a larger $c$ makes the minimum-cut algorithm more loath to put proximal sentences in different classes. With these in hand[8], we set (for $j > i$)

$$\mathrm{assoc}(s_i, s_j) \stackrel{\mathrm{def}}{=} \begin{cases} f(j - i) \cdot c & \text{if } (j - i) \le T; \\ 0 & \text{otherwise.} \end{cases}$$

[5] We therefore could not directly evaluate sentence-classification accuracy on the polarity dataset.

[6] Available at www.cs.cornell.edu/people/pabo/movie-review-data/, sentence corpus version 1.0.

[7] We converted SVM output $d_i$, which is a signed distance (negative = objective) from the separating hyperplane, to non-negative numbers by

$$\mathrm{ind}_1(s_i) \stackrel{\mathrm{def}}{=} \begin{cases} 1 & d_i > 2; \\ (2 + d_i)/4 & -2 \le d_i \le 2; \\ 0 & d_i < -2. \end{cases}$$

and $\mathrm{ind}_2(s_i) = 1 - \mathrm{ind}_1(s_i)$. Note that scaling is employed only for consistency; the algorithm itself does not require probabilities for individual scores.
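The edge weights for one document can be sketched as below, under the reading that a pair of sentences at distance d = j − i receives association f(d)·c when d is at most the threshold T and 0 otherwise, with decay choices 1, e^(1−d), and 1/d². The names `DECAY`, `assoc`, and `sentence_graph_edges` are hypothetical, and the subjectivity probabilities are taken as given inputs rather than computed by a trained Naive Bayes model.

```python
import math
from typing import Callable, List, Tuple

# The three decay functions f(d) tried in the experiments, as read here.
DECAY = {
    "constant": lambda d: 1.0,
    "exponential": lambda d: math.exp(1 - d),
    "inverse_square": lambda d: 1.0 / d ** 2,
}


def assoc(i: int, j: int, f: Callable[[int], float], c: float, T: int) -> float:
    """Association score for sentences i and j (i != j)."""
    dist = abs(j - i)
    if dist == 0:
        raise ValueError("assoc is defined for distinct sentences only")
    return f(dist) * c if dist <= T else 0.0


def sentence_graph_edges(p_sub: List[float], f, c: float, T: int
                         ) -> List[Tuple[object, object, float]]:
    """Weighted edges (u, v, w) of the cut graph for one document."""
    edges = []
    for i, p in enumerate(p_sub):
        edges.append(("s", i, p))         # ind_1: pull toward "subjective"
        edges.append((i, "t", 1.0 - p))   # ind_2: pull toward "objective"
    for i in range(len(p_sub)):
        for j in range(i + 1, len(p_sub)):
            w = assoc(i, j, f, c, T)
            if w > 0:
                edges.append((i, j, w))   # proximity link
    return edges


edges = sentence_graph_edges([0.9, 0.2, 0.6], DECAY["constant"], c=0.3, T=1)
```

Setting `c` to 0 removes every proximity link, which leaves each sentence to be decided by its individual scores alone.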

4 Experimental Results

Below, we report average accuracies computed by ten-fold cross-validation over the polarity dataset. Section 4.1 examines our basic subjectivity extraction algorithms, which are based on individual-sentence predictions alone. Section 4.2 evaluates the more sophisticated form of subjectivity extraction that incorporates context information via the minimum-cut paradigm.

As we will see, the use of subjectivity extracts can in the best case provide satisfying improvement in polarity classification, and otherwise can at least yield polarity-classification accuracies indistinguishable from employing the full review. At the same time, the extracts we create are both smaller on average than the original document and more effective as input to a default polarity classifier than the same-length counterparts produced by standard summarization tactics (e.g., first- or last-N sentences). We therefore conclude that subjectivity extraction produces effective summaries of document sentiment.

4.1 Basic subjectivity extraction

As noted in Section 3, both Naive Bayes and SVMs can be trained on our subjectivity dataset and then used as a basic subjectivity detector. The former has somewhat better average ten-fold cross-validation performance on the subjectivity dataset (92% vs. 90%), and so for space reasons, our initial discussions will focus on the results attained via NB subjectivity detection.

[8] Parameter training is driven by optimizing the performance of the downstream polarity classifier rather than the detector itself because the subjectivity dataset’s sentences come from different reviews, and so are never proximal.

[Figure 3 diagram. Pipeline: n-sentence review (s1, ..., s_n), construct graph, compute min. cut, create extract, m-sentence extract (m <= n). Graph nodes: source s, sink t, and sentence nodes v1, ..., v_n. Legend: individual subjectivity-probability links (weights $Pr^{NB}_{sub}(s_1)$ and $1 - Pr^{NB}_{sub}(s_1)$ shown for v1), proximity links, and edges crossing the cut.]

Figure 3: Graph-cut-based creation of subjective extracts.

Employing Naive Bayes as a subjectivity detector (ExtractNB) in conjunction with a Naive Bayes document-level polarity classifier achieves 86.4% accuracy.[9] This is a clear improvement over the 82.8% that results when no extraction is applied (Full review); indeed, the difference is highly statistically significant ($p < 0.01$, paired t-test). With SVMs as the polarity classifier instead, the Full review performance rises to 87.15%, but comparison via the paired t-test reveals that this is statistically indistinguishable from the 86.4% that is achieved by running the SVM polarity classifier on ExtractNB input. (More improvements to extraction performance are reported later in this section.)
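For readers unfamiliar with the paired t-test over cross-validation folds, the statistic can be computed as in the sketch below. The per-fold numbers are made up purely for illustration and are not results from the paper; the function name `paired_t_statistic` is hypothetical, and only the test statistic is computed (turning it into a p-value requires the t distribution with n − 1 degrees of freedom).

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic for paired samples: mean(d) / (stdev(d) / sqrt(n))."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Made-up per-fold accuracies (percent) for two hypothetical systems.
system_a = [86, 87, 85, 88, 86]
system_b = [83, 82, 84, 83, 82]
t_value = paired_t_statistic(system_a, system_b)
```

Pairing by fold matters because both systems are evaluated on the same test splits, so fold-to-fold difficulty cancels out in the differences.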

These findings indicate[10] that the extracts preserve (and, in the NB polarity-classifier case, apparently clarify) the sentiment information in the originating documents, and thus are good summaries from the polarity-classification point of view. Further support comes from a “flipping” experiment: if we give as input to the default polarity classifier an extract consisting of the sentences labeled objective, accuracy drops dramatically to 71% for NB and 67% for SVMs. This confirms our hypothesis that sentences discarded by the subjectivity extraction process are indeed much less indicative of sentiment polarity.

Moreover, the subjectivity extracts are much more compact than the original documents (an important feature for a summary to have): they contain on average only about 60% of the source reviews’ words. (This word preservation rate is plotted along the x-axis in the graphs in Figure 5.) This prompts us to study how much reduction of the original documents subjectivity detectors can perform and still accurately represent the texts’ sentiment information.

[9] This result and others are depicted in Figure 5; for now, consider only the y-axis in those plots.

[10] Recall that direct evidence is not available because the polarity dataset’s sentences lack subjectivity labels.

We can create subjectivity extracts of varying lengths by taking just the N most subjective sentences[11] from the originating review. As one baseline to compare against, we take the canonical summarization standard of extracting the first N sentences: in general settings, authors often begin documents with an overview. We also consider the last N sentences: in many documents, concluding material may be a good summary, and www.rottentomatoes.com tends to select “snippets” from the end of movie reviews (Beineke et al., 2004). Finally, as a sanity check, we include results from the N least subjective sentences according to Naive Bayes.
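The four N-sentence extraction strategies compared here can be sketched as follows. The per-sentence subjectivity probabilities are assumed to be supplied by the basic NB detector; returning the selected sentences in document order is an illustrative choice (it makes no difference to unigram-presence features), and the function names are hypothetical.

```python
from typing import List


def most_subjective(sentences: List[str], p_sub: List[float], n: int) -> List[str]:
    """The n sentences with the highest subjectivity probability."""
    ranked = sorted(range(len(sentences)), key=lambda i: p_sub[i], reverse=True)
    # Short reviews are returned whole, since ranked[:n] covers everything.
    return [sentences[i] for i in sorted(ranked[:n])]


def least_subjective(sentences: List[str], p_sub: List[float], n: int) -> List[str]:
    """Sanity-check baseline: the n lowest-probability sentences."""
    ranked = sorted(range(len(sentences)), key=lambda i: p_sub[i])
    return [sentences[i] for i in sorted(ranked[:n])]


def first_n(sentences: List[str], n: int) -> List[str]:
    return sentences[:n]


def last_n(sentences: List[str], n: int) -> List[str]:
    return sentences[-n:] if n > 0 else []


sents = ["a", "b", "c", "d"]
probs = [0.1, 0.9, 0.4, 0.8]
top_two = most_subjective(sents, probs, 2)
```

Note that the top-N selection ignores the 50% decision threshold: a sentence can be selected even if the detector would label it objective.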

Figure 4 shows the polarity classifier results as N ranges between 1 and 40. Our first observation is that the NB detector provides very good “bang for the buck”: with subjectivity extracts containing as few as 15 sentences, accuracy is quite close to what one gets if the entire review is used. In fact, for the NB polarity classifier, just using the 5 most subjective sentences is almost as informative as the Full review while containing on average only about 22% of the source reviews’ words.

Also, it so happens that at N = 30, performance is actually slightly better than (but statistically indistinguishable from) Full review even when the SVM default polarity classifier is used (87.2% vs. 87.15%).[12] This suggests potentially effective extraction alternatives other than using a fixed probability threshold (which resulted in the lower accuracy of 86.4% reported above).

[11] These are the N sentences assigned the highest probability by the basic NB detector, regardless of whether their probabilities exceed 50% and so would actually be classified as subjective by Naive Bayes. For reviews with fewer than N sentences, the entire review will be returned.

[12] Note that roughly half of the documents in the polarity dataset contain more than 30 sentences (average=32.3, standard deviation 15).

[Figure 4 plots, two panels: “Accuracy for N-sentence abstracts (def = NB)” and “Accuracy for N-sentence abstracts (def = SVM)”. x-axis: N (1 to 40); y-axis: average accuracy (55 to 90). Curves: N most subjective sentences, last N sentences, first N sentences, N least subjective sentences, and Full review.]

Figure 4: Accuracies using N-sentence extracts for NB (left) and SVM (right) default polarity classifiers.

[Figure 5 plots, two panels: “Accuracy for subjective abstracts (def = NB)” and “Accuracy for subjective abstracts (def = SVM)”. x-axis: % of words extracted (0.6 to 1.1); y-axis: average accuracy (83 to 87). Points: ExtractNB, ExtractSVM, ExtractNB+Prox, ExtractSVM+Prox, and Full Review. Legend: one marking for “difference in accuracy not statistically significant”, another for “indicates statistically significant improvement in accuracy”.]

Figure 5: Word preservation rate vs. accuracy, NB (left) and SVMs (right) as default polarity classifiers. Also indicated are results for some statistical significance tests.

Furthermore, we see in Figure 4 that the N most-subjective-sentences method generally outperforms the other baseline summarization methods (which perhaps suggests that sentiment summarization cannot be treated the same as topic-based summarization, although this conjecture would need to be verified on other domains and data). It’s also interesting to observe how much better the last N sentences are than the first N sentences; this may reflect a (hardly surprising) tendency for movie-review authors to place plot descriptions at the beginning rather than the end of the text and conclude with overtly opinionated statements.

4.2 Incorporating context information

The previous section demonstrated the value of subjectivity detection. We now examine whether context information, particularly regarding sentence proximity, can further improve subjectivity extraction. As discussed in Sections 2.2 and 3, contextual constraints are easily incorporated via the minimum-cut formalism but are not natural inputs for standard Naive Bayes and SVMs.

Figure 5 shows the effect of adding in proximity information. ExtractNB+Prox and ExtractSVM+Prox are the graph-based subjectivity detectors using Naive Bayes and SVMs, respectively, for the individual scores; we depict the best performance achieved by a single setting of the three proximity-related edge-weight parameters over all ten data folds[13] (parameter selection was not a focus of the current work). The two comparisons we are most interested in are ExtractNB+Prox versus ExtractNB and ExtractSVM+Prox versus ExtractSVM.

[13] Parameters are chosen from $T \in \{1, 2, 3\}$, $f(d) \in \{1, e^{1-d}, 1/d^2\}$, and $c \in [0, 1]$ at intervals of 0.1.

We see that the context-aware graph-based subjectivity detectors tend to create extracts that are more informative (statistically significantly so, by paired t-test, for SVM subjectivity detectors only), although these extracts are longer than their context-blind counterparts. We note that the performance enhancements cannot be attributed entirely to the mere inclusion of more sentences regardless of whether they are subjective or not (one counterargument is that Full review yielded substantially worse results for the NB default polarity classifier), and at any rate, the graph-derived extracts are still substantially more concise than the full texts.

Now, while incorporating a bias for assigning nearby sentences to the same category into NB and SVM subjectivity detectors seems to require some non-obvious feature engineering, we also wish to investigate whether our graph-based paradigm makes better use of contextual constraints that can be (more or less) easily encoded into the input of standard classifiers. For illustrative purposes, we consider paragraph-boundary information, looking only at SVM subjectivity detection for simplicity’s sake.

It seems intuitively plausible that paragraph boundaries (an approximation to discourse boundaries) loosen coherence constraints between nearby sentences. To capture this notion for minimum-cut-based classification, we can simply reduce the association scores for all pairs of sentences that occur in different paragraphs by multiplying them by a cross-paragraph-boundary weight $w \in [0, 1]$. For standard classifiers, we can employ the trick of having the detector treat paragraphs, rather than sentences, as the basic unit to be labeled. This enables the standard classifier to utilize coherence between sentences in the same paragraph; on the other hand, it also (probably unavoidably) poses a hard constraint that all of a paragraph’s sentences get the same label, which increases noise sensitivity.[14] Our experiments reveal the graph-cut formulation to be the better approach: for both default polarity classifiers (NB and SVM), some choice of parameters (including $w$) for ExtractSVM+Prox yields statistically significant improvement over its paragraph-unit non-graph counterpart (NB: 86.4% vs. 85.2%; SVM: 86.15% vs. 85.45%).
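The cross-paragraph adjustment amounts to a one-line rescaling of an association score. The sketch below assumes each sentence carries a paragraph index; the function name `assoc_with_paragraphs` and its parameter names are hypothetical.

```python
def assoc_with_paragraphs(base_assoc: float, para_i: int, para_j: int,
                          w: float) -> float:
    """Scale an association score by w when the two sentences lie in
    different paragraphs; leave it unchanged otherwise."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("cross-paragraph weight must lie in [0, 1]")
    return base_assoc * w if para_i != para_j else base_assoc


same_paragraph = assoc_with_paragraphs(0.5, para_i=0, para_j=0, w=0.4)
across_boundary = assoc_with_paragraphs(0.5, para_i=0, para_j=1, w=0.4)
```

With a weight of 1 paragraph boundaries are ignored, and with a weight of 0 sentences in different paragraphs are never tied together; values in between give a soft constraint, in contrast to the hard same-label constraint of the paragraph-unit classifier.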

[14] For example, in the data we used, boundaries may have been missed due to malformed html.

5 Conclusions

We examined the relation between subjectivity detection and polarity classification, showing that subjectivity detection can compress reviews into much shorter extracts that still retain polarity information at a level comparable to that of the full review. In fact, for the Naive Bayes polarity classifier, the subjectivity extracts are shown to be more effective input than the originating document, which suggests that they are not only shorter, but also “cleaner” representations of the intended polarity.

We have also shown that employing the minimum-cut framework results in the development of efficient algorithms for sentiment analysis. Utilizing contextual information via this framework can lead to statistically significant improvement in polarity-classification accuracy. Directions for future research include developing parameter-selection techniques, incorporating other sources of contextual cues besides sentence proximity, and investigating other means for modeling such information.

Acknowledgments

We thank Eric Breck, Claire Cardie, Rich Caruana, Yejin Choi, Shimon Edelman, Thorsten Joachims, Jon Kleinberg, Oren Kurland, Art Munson, Vincent Ng, Fernando Pereira, Ves Stoyanov, Ramin Zabih, and the anonymous reviewers for helpful comments. This paper is based upon work supported in part by the National Science Foundation under grants ITR/IM IIS-0081334 and IIS-0329064, a Cornell Graduate Fellowship in Cognitive Studies, and by an Alfred P. Sloan Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of the National Science Foundation or Sloan Foundation.

References

Agrawal, Rakesh, Sridhar Rajagopalan, Ramakrishnan Srikant, and Yirong Xu. 2003. Mining newsgroups using networks arising from social behavior. In WWW, pages 529–535.

Ahuja, Ravindra, Thomas L. Magnanti, and James B. Orlin. 1993. Network Flows: Theory, Algorithms, and Applications. Prentice Hall.

Beineke, Philip, Trevor Hastie, Christopher Manning, and Shivakumar Vaithyanathan. 2004. Exploring sentiment summarization. In AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications (AAAI tech report SS-04-07).

Blum, Avrim and Shuchi Chawla. 2001. Learning from labeled and unlabeled data using graph mincuts. In Intl. Conf. on Machine Learning (ICML), pages 19–26.

Boykov, Yuri, Olga Veksler, and Ramin Zabih. 1999. Fast approximate energy minimization via graph cuts. In Intl. Conf. on Computer Vision (ICCV), pages 377–384. Journal version in IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI) 23(11):1222–1239, 2001.

Cardie, Claire, Janyce Wiebe, Theresa Wilson, and Diane Litman. 2003. Combining low-level and summary representations of opinions for multi-perspective question answering. In AAAI Spring Symposium on New Directions in Question Answering, pages 20–27.

Cormen, Thomas H., Charles E. Leiserson, and Ronald L. Rivest. 1990. Introduction to Algorithms. MIT Press.

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Asia Pacific Finance Association Annual Conf. (APFA).

Dave, Kushal, Steve Lawrence, and David M. Pennock. 2003. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In WWW, pages 519–528.

Dini, Luca and Giampaolo Mazzini. 2002. Opinion classification through information extraction. In Intl. Conf. on Data Mining Methods and Databases for Engineering, Finance and Other Fields, pages 299–310.

Durbin, Stephen D., J. Neal Richter, and Doug Warner. 2003. A system for affective rating of texts. In KDD Wksp. on Operational Text Classification Systems (OTC-3).

Hatzivassiloglou, Vasileios and Kathleen McKeown. 1997. Predicting the semantic orientation of adjectives. In 35th ACL/8th EACL, pages 174–181.

Joachims, Thorsten. 2003. Transductive learning via spectral graph partitioning. In Intl. Conf. on Machine Learning (ICML).

Liu, Hugo, Henry Lieberman, and Ted Selker. 2003. A model of textual affect sensing using real-world knowledge. In Intelligent User Interfaces (IUI), pages 125–132.

Montes-y-Gomez, Manuel, Aurelio Lopez-Lopez, and Alexander Gelbukh. 1999. Text mining as a social thermometer. In IJCAI Wksp. on Text Mining, pages 103–107.

Morinaga, Satoshi, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. 2002. Mining product reputations on the web. In KDD, pages 341–349. Industry track.

Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. In EMNLP, pages 79–86.

Qu, Yan, James Shanahan, and Janyce Wiebe, editors. 2004. AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications. AAAI technical report SS-04-07.

Riloff, Ellen and Janyce Wiebe. 2003. Learning extraction patterns for subjective expressions. In EMNLP.

Riloff, Ellen, Janyce Wiebe, and Theresa Wilson. 2003. Learning subjective nouns using extraction pattern bootstrapping. In Conf. on Natural Language Learning (CoNLL), pages 25–32.

Subasic, Pero and Alison Huettner. 2001. Affect analysis of text using fuzzy semantic typing. IEEE Trans. Fuzzy Systems, 9(4):483–496.

Tong, Richard M. 2001. An operational system for detecting and tracking opinions in on-line discussion. SIGIR Wksp. on Operational Text Classification.

Turney, Peter. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In ACL, pages 417–424.

Wiebe, Janyce M. 1994. Tracking point of view in narrative. Computational Linguistics, 20(2):233–287.

Yi, Jeonghee, Tetsuya Nasukawa, Razvan Bunescu, and Wayne Niblack. 2003. Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In IEEE Intl. Conf. on Data Mining (ICDM).

Yu, Hong and Vasileios Hatzivassiloglou. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In EMNLP.