![Page 1: Opinion Mining - Hasso Plattner Institutehpi.de/.../FG_Naumann/folien/WS1112/Question_Answering/Opinion_Mining.pdfOpinion Mining Question Answering Seminar January 20, 2012 Nils Rethmeier](https://reader033.vdocuments.net/reader033/viewer/2022041421/5e1efc82b71f227b436e8820/html5/thumbnails/1.jpg)
Opinion Mining
Question Answering Seminar
January 20, 2012
Nils Rethmeier
HPI Potsdam
Overview
Motivation
● Applications and the task at hand
Introduction
● Opinion definition
● Opinion analysis
  ○ sentences, documents, results
● Background (Bayes classification)
● Detection features
Evaluation
● Test sets
  ○ documents, sentences
● Results
Discussion
Application areas
● Information extraction: discard subjective results
  ■ bias in news
● Question Answering: opinion detection
● Summarization: summarizing different points of view
● Content rating via comments, stars
  ■ child protection
  ■ appropriate ad placement
● Business Intelligence: customer support
  ■ product image mining
  ■ help customers find needed information
Introduction
Definition: Opinion :=
Task: Given a text ...
Classification
Document-level classification
● Classifier: Naive Bayes
● Training data: reference text collections, i.e. news and business articles (facts), editorials and letters to the editor (opinions)
Sentence-level classification
● Hypothesis: Opinion documents mostly contain opinion sentences
● Classifier:
  ○ sentence similarity
  ○ 1 or n Naive Bayes
Polarity Classification
Bayes Classification, theorem
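Bayes' theorem written for this task, with $W$ the text and $O$ the class "opinion" (the symbol $O$ is introduced here for illustration):

$$P(O \mid W) = \frac{P(W \mid O)\,P(O)}{P(W)}$$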
Bayes' Classification, steps
Bayes' Classifier (machine learning, ML)
Given: Text $W$, consisting of words $w_i$
Task: Classify whether $W$ is an opinion or a fact.
How likely is an opinion if we know $W$ > ...
Bayes' Classification, steps
Bayes' Classifier (machine learning, ML)
Problem:
Solution:
■ Take a set of reference opinions and facts
■ Assume that words occur independently of each other (Naive Bayes Assumption, NBA)
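Under the Naive Bayes Assumption, the likelihood of a text $W = w_1 \dots w_n$ factorises over its words. Writing $O$ for opinion and $F$ for fact (symbols introduced here for illustration), $W$ is labeled an opinion when the left-hand product below is larger. $P(W)$ cancels because it is the same for both classes:

$$P(W \mid O) \approx \prod_{i=1}^{n} P(w_i \mid O), \qquad \text{opinion if } \; P(O)\prod_{i=1}^{n} P(w_i \mid O) \;>\; P(F)\prod_{i=1}^{n} P(w_i \mid F)$$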
Bayes' Classification, steps
Bayes' Classifier (machine learning, ML)
Summary:
1. Learn features: How likely is a text W, given that it is an opinion?
2. Use the features to classify with Bayes' theorem: How likely is an opinion/fact, given a text W?
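The two steps can be sketched as a small multinomial Naive Bayes classifier. This is a minimal illustration, assuming whitespace-tokenized text and add-one (Laplace) smoothing; the function names and toy training data are hypothetical, and the slides do not specify the smoothing or tokenization used in the presented system.

```python
import math
from collections import Counter


def train_naive_bayes(texts):
    """Step 1: learn P(label) and per-label word counts from (list_of_words, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    text_counts = Counter()   # label -> number of training texts
    vocab = set()
    for words, label in texts:
        text_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    total = sum(text_counts.values())
    priors = {label: n / total for label, n in text_counts.items()}
    return priors, word_counts, vocab


def classify(words, priors, word_counts, vocab):
    """Step 2: pick the label maximising log P(label) + sum_i log P(w_i | label)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        counts = word_counts[label]
        total = sum(counts.values())
        score = math.log(prior)
        for w in words:
            # add-one smoothing keeps unseen words from zeroing the product
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data (not the WSJ collection used in the talk)
TRAIN = [
    ("shares rose two percent on monday".split(), "fact"),
    ("the company reported quarterly earnings".split(), "fact"),
    ("i believe this policy is a terrible mistake".split(), "opinion"),
    ("we think the reform is wonderful".split(), "opinion"),
]
priors, word_counts, vocab = train_naive_bayes(TRAIN)
print(classify("i think this is terrible".split(), priors, word_counts, vocab))  # prints: opinion
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long documents.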
Classifiers: SimFinder
Sentence Similarity:
Idea: Given a fixed topic, opinion sentences are more similar to each other than they are to factual sentences.
Retrieve: All documents $D_t$ for a topic t, e.g. "welfare reforms"
Features: SimFinder similarity score S of each sentence in $D_t$, using
■ words
■ phrases (n-grams)
■ WordNet synsets
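The similarity idea can be illustrated with a rough stand-in. This sketch swaps SimFinder for a plain Jaccard word-overlap score (no phrases or WordNet synsets). It assumes that a sentence is labeled by comparing its mean similarity to known opinion sentences against its mean similarity to known fact sentences on the same topic. That decision rule, the function names, and the example sentences are all hypothetical.

```python
def word_overlap(a, b):
    """Jaccard overlap of the word sets of two sentences (stand-in for SimFinder)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def classify_by_similarity(sentence, opinion_sents, fact_sents):
    """Label a sentence by its mean similarity to known opinion vs. fact sentences of one topic."""
    avg_opinion = sum(word_overlap(sentence, s) for s in opinion_sents) / len(opinion_sents)
    avg_fact = sum(word_overlap(sentence, s) for s in fact_sents) / len(fact_sents)
    return "opinion" if avg_opinion > avg_fact else "fact"


# Hypothetical sentences for the topic "welfare reforms"
OPINION = ["the welfare reform is a cruel and shortsighted policy",
           "this reform is a shameful attack on the poor"]
FACT = ["the senate passed the welfare bill on tuesday",
        "the bill cuts funding by ten billion dollars"]
print(classify_by_similarity("the reform is a cruel attack", OPINION, FACT))  # prints: opinion
```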
Classifier: Naive Bayes 1
1 NB classifier C on sentences
Train: Learn features on opinion/fact articles.
Features: a single classifier C uses all the features
■ n-grams, parts of speech (POS)
■ positive/negative word counts per sentence
■ polarity n-gram magnitude, e.g. "++" for two consecutive positive words
Combination:
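A feature extractor for this single classifier might look like the sketch below. It assumes small hand-made polarity word lists (hypothetical; the talk's word lists are not given). It also omits POS tags, which would require a tagger. The extractor covers unigrams, bigrams, positive/negative counts, and polarity n-grams such as "++".

```python
POSITIVE = {"good", "great", "excellent", "wonderful"}   # hypothetical seed list
NEGATIVE = {"bad", "poor", "terrible", "awful"}          # hypothetical seed list


def polarity_tag(word):
    """Map a word to "+", "-", or None using the seed lists above."""
    if word in POSITIVE:
        return "+"
    if word in NEGATIVE:
        return "-"
    return None


def sentence_features(sentence):
    """Return a feature dict for one sentence (POS-tag features omitted in this sketch)."""
    words = sentence.lower().split()
    feats = {}
    for w in words:                                  # unigrams
        feats["uni=" + w] = 1
    for a, b in zip(words, words[1:]):               # bigrams
        feats["bi=" + a + "_" + b] = 1
    tags = [polarity_tag(w) for w in words]
    feats["n_pos"] = tags.count("+")                 # positive word count
    feats["n_neg"] = tags.count("-")                 # negative word count
    # polarity n-grams: e.g. "++" when two consecutive words are both positive
    for a, b in zip(tags, tags[1:]):
        if a and b:
            feats["pol=" + a + b] = 1
    return feats


f = sentence_features("what a great wonderful film")
print(f["n_pos"], "pol=++" in f)  # prints: 2 True
```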
Classifier: Naive Bayes n
n NB classifiers $C_1 \dots C_n$, each with a different feature set
Problem: The hypothesis that opinion documents contain only opinion sentences is flawed.
Idea: During training, use only the sentences that are likely to be labeled correctly.
Features: as before, but split between the classifiers $C_i$
■ 1-3 grams | POS | +/- words | magnitudes
■ recursive filtering of the training data, using the next $C_i$ at each recursion step
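One reading of the recursive filtering step is sketched below. Sentences start with the label of their document. Classifier $C_i$ is trained on the current data, and only the sentences it labels the same way as their document are passed on to $C_{i+1}$. This reading is an assumption, not the presented system's exact procedure. The unigram and bigram extractors are hypothetical stand-ins for the slide's split of 1-3 grams, POS, +/- words and magnitudes.

```python
import math
from collections import Counter


def train_nb(examples):
    """Train an add-one-smoothed Naive Bayes on (feature_list, label) pairs; return a predictor."""
    label_n = Counter(label for _, label in examples)
    feat_n = {label: Counter() for label in label_n}
    for feats, label in examples:
        feat_n[label].update(feats)
    vocab_size = len({f for counts in feat_n.values() for f in counts})

    def predict(feats):
        def log_score(label):
            total = sum(feat_n[label].values())
            s = math.log(label_n[label] / len(examples))
            for f in feats:
                s += math.log((feat_n[label][f] + 1) / (total + vocab_size))
            return s
        return max(label_n, key=log_score)

    return predict


def unigrams(text):
    return text.split()


def bigrams(text):
    w = text.split()
    return [a + "_" + b for a, b in zip(w, w[1:])]


def recursive_filter(sentences, extractors):
    """sentences: (text, label) pairs, label inherited from the document.
    extractors: one feature function per classifier C_1 .. C_n.
    Each C_i keeps only the sentences it labels like their document."""
    data = list(sentences)
    classifiers = []
    for extract in extractors:
        predict = train_nb([(extract(t), y) for t, y in data])
        classifiers.append((extract, predict))
        kept = [(t, y) for t, y in data if predict(extract(t)) == y]
        if kept:  # never filter the training set down to nothing
            data = kept
    return classifiers, data
```

In a toy run, a factual sentence such as "shares rose on monday" that was taken from an editorial (and therefore labeled "opinion") would be dropped by $C_1$. This happens because its words look like those of the fact class.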
Polarity Classification
Given: A set of polarity words (manually annotated).
Idea: Positive words occur together more often than by chance (word co-occurrence).
Classifier: Is the positive model P(+) more likely than the negative model P(-)?
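The co-occurrence idea can be sketched as follows. A word's score compares how often it appears in sentences containing a positive seed word with how often it appears in sentences containing a negative seed word. Add-one smoothing is applied, and a score above zero means the positive model is more likely. A sentence's polarity is the sign of its summed word scores. This particular log-ratio is an assumed stand-in, not necessarily the statistic used in the talk; the corpus and seed sets are hypothetical.

```python
import math


def word_polarity(word, sentences, pos_seeds, neg_seeds):
    """Log-ratio of P(word | positive-seed sentence) to P(word | negative-seed sentence).

    Co-occurrence is counted per sentence; +1/+2 smoothing avoids log(0).
    """
    tokenized = [set(s.lower().split()) for s in sentences]
    pos_sents = [t for t in tokenized if t & pos_seeds]
    neg_sents = [t for t in tokenized if t & neg_seeds]
    p_pos = (sum(word in t for t in pos_sents) + 1) / (len(pos_sents) + 2)
    p_neg = (sum(word in t for t in neg_sents) + 1) / (len(neg_sents) + 2)
    return math.log(p_pos / p_neg)


def sentence_polarity(sentence, sentences, pos_seeds, neg_seeds):
    """Sum word scores; a positive total means the positive model wins."""
    score = sum(word_polarity(w, sentences, pos_seeds, neg_seeds)
                for w in sentence.lower().split())
    return "positive" if score > 0 else "negative"


# Hypothetical corpus and seed words
CORPUS = [
    "a good and delightful film",
    "good acting and a delightful plot",
    "a bad and dreary film",
    "bad acting and a dreary plot",
]
print(sentence_polarity("a delightful film", CORPUS, {"good"}, {"bad"}))  # prints: positive
```

Words that co-occur equally with both seed sets (here "a", "and", "film") score exactly zero and do not affect the decision. For a large corpus, the tokenized sentences should be computed once rather than on every call.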
Evaluation
Document classification
Gold standard: label of each article
Naive Bayes classifier:
● Training set: 2000 Wall Street Journal (WSJ) articles for each class (= 4000)
  ■ facts from labels "news" and "business articles"
  ■ opinions from labels "editorial" and "Letter to editor"
● Test set: another 2000 WSJ articles for each class
Sentence classification
400 sentences with human annotations
● A = 300 sentences labeled by one annotator
● B = 100 sentences whose type two annotators agree on
Similarity classifier: {recall, precision}
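The reported measures (precision, recall, F-measure) are computed per target class. A minimal sketch follows; the function name and toy labels are hypothetical and are not the talk's results.

```python
def precision_recall_f1(gold, predicted, positive="opinion"):
    """Precision, recall and balanced F-measure for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)  # correctly found
    fp = sum(g != positive and p == positive for g, p in pairs)  # wrongly flagged
    fn = sum(g == positive and p != positive for g, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and predictions
GOLD = ["opinion", "opinion", "fact", "fact", "opinion"]
PRED = ["opinion", "fact", "opinion", "fact", "opinion"]
print(precision_recall_f1(GOLD, PRED))
```

Here two opinions are found correctly, one fact is wrongly flagged, and one opinion is missed. Precision, recall and F-measure therefore all equal 2/3.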
Evaluation
Sentence classification with 1 and n Naive Bayes classifiers, on human annotations (A = 300, B = 100)
● words alone already work well
● word n-grams + POS + polarity work best
● multiple-classifier filtering increases recall
Evaluation
Sentence classification: polarity classifier accuracy
● combining adjectives, adverbs and verbs yields the best polarity classification
Discussion
Opinion Mining
Fact/Opinion Classification
Classifier:
○ document
  ■ Naive Bayes
○ sentences
  ■ similarity
  ■ 1 or n Naive Bayes
  ■ polarity
NB Classifier Evaluation
Documents:
● Naive Bayes achieves a 97% F-measure
Sentences:
● Similarity is less useful
● Naive Bayes already works well on word n-grams (86% precision)
● Polarity classification needs adjectives, adverbs and verbs to work well (90% agreement)