Opinion Mining

Uploaded by: ali-habeeb

Posted on 15-Jan-2015


Page 1: Opinion Mining

Opinion Mining

Page 2

Outline

Definition, Applications, Challenges, Model, Conclusion, References

Page 3

Definition

Opinion mining (also called sentiment mining or opinion/sentiment extraction) is the area of research that attempts to build automatic systems that determine human opinion from text written in natural language.

It seeks to identify the viewpoint(s) underlying a text span; an example application is classifying a movie review as thumbs up or thumbs down.

Page 4

Opinion mining is a new discipline that has recently attracted increased attention in fields such as marketing, recommendation systems and financial market prediction. Although it is often associated with recognizing emotional states from text, opinion mining is an independent area, related to natural language processing and text mining, that deals with the identification of opinions and attitudes in natural language text.

Page 5

Consider, for instance, the following scenario. A major computer manufacturer, disappointed with unexpectedly low sales, finds itself confronted with this question:

Why aren't consumers buying our laptop?

Page 6

What other people think has always been an important piece of information for most of us during the decision-making process.

Opinion mining draws on computational linguistics, information retrieval, text mining, natural language processing, machine learning, statistics and predictive analysis.

Page 7

There are two main types of textual information:

1. Facts

2. Opinions

Most current information processing techniques (e.g., search engines) work with facts (and assume they are true).

Facts can be expressed with topic keywords.

Page 8

In real life, facts are important, but opinions also play a crucial role. A computer manufacturer, disappointed with low sales, asks itself: Why aren’t consumers buying our laptop? The Democratic National Committee, disappointed with the last election, wants to know on an ongoing basis: What is the reaction in the press, newsgroups, chat rooms and blogs to Bush’s latest policy decision?

Page 9

The main advantage is speed.

On average, humans process six articles per hour, against a machine’s throughput of 10 articles per second.
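Taking the slide's figures at face value, the gap works out as follows (a unit conversion of the quoted rates, not a new measurement):

```latex
\[
10\ \tfrac{\text{articles}}{\text{s}} \times 3600\ \tfrac{\text{s}}{\text{h}}
  = 36\,000\ \tfrac{\text{articles}}{\text{h}},
\qquad
\frac{36\,000\ \text{articles/h}}{6\ \text{articles/h}} = 6\,000
\]
```

At those rates, the machine would process roughly 6,000 times as many articles per hour as a human reader.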

Page 10

Applications

Recommendation systems

Summarization

Applications in business: marketing intelligence, and product and service benchmarking and improvement; understanding the voice of the customer as expressed in everyday communications.
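One common form of opinion summarization aggregates opinions per product feature. The sketch below assumes an earlier step has already produced hypothetical `(feature, polarity)` pairs and simply tallies them; the function name `summarize` and the sample data are illustrative, not taken from the slides.

```python
from collections import defaultdict


def summarize(opinions):
    """Aggregate (feature, polarity) pairs into per-feature counts; polarity 0 is skipped."""
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for feature, polarity in opinions:
        if polarity > 0:
            summary[feature]["positive"] += 1
        elif polarity < 0:
            summary[feature]["negative"] += 1
    return dict(summary)


# Hypothetical output of an earlier extraction step: (product feature, polarity).
opinions = [("battery", -1), ("lens", 1), ("battery", -1),
            ("lens", 1), ("screen", 1), ("battery", 1), ("screen", 0)]

for feature, counts in sorted(summarize(opinions).items()):
    print(feature, counts)
```

A summary in this shape (for example, "battery: 1 positive, 2 negative") is what a reader or a business dashboard would consume instead of the raw reviews.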

Page 11

Applications

Politics

As is well known, opinions matter a great deal in politics. Some work has focused on understanding what voters are thinking.

Page 12

Challenges

The difficulty lies in the richness of the language that humans use.

Example:

1. This is a great camera.

2. A great amount of money was spent on promoting this camera.

3. One might think this is a great camera. Well, think again, because...

A single keyword can be used to convey three different opinions: positive, neutral and negative, respectively.
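To make the problem concrete, here is a minimal sketch of a purely keyword-based scorer, assuming a hypothetical one-entry lexicon in which "great" always counts as positive. It gives all three example sentences the same score, even though their intended opinions differ.

```python
import re

# Hypothetical one-entry lexicon: "great" is treated as positive everywhere.
LEXICON = {"great": 1}


def naive_score(sentence: str) -> int:
    """Sum the lexicon scores of a sentence's words, ignoring all context."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(word, 0) for word in words)


examples = [
    ("This is a great camera.", "positive"),
    ("A great amount of money was spent on promoting this camera.", "neutral"),
    ("One might think this is a great camera. Well, think again, because...", "negative"),
]

for sentence, intended in examples:
    # Every sentence scores 1, although the intended opinions differ.
    print(naive_score(sentence), intended)
```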

Page 13

Challenges

In order to arrive at sensible conclusions, sentiment analysis has to understand context. For example, “fighting” and “disease” are negative in a war context but positive in a medical one.

Different domains require different mining approaches.
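One simple way to account for domain is to keep a separate lexicon per domain. The sketch below uses hypothetical, hand-picked scores: the war lexicon marks both “fighting” and “disease” as negative, while the medical lexicon scores “fighting” as positive and leaves “disease” neutral. As a result, the phrase “fighting disease” comes out negative under one lexicon and positive under the other.

```python
import re

# Hypothetical domain-specific lexicons; the scores are illustrative only.
DOMAIN_LEXICONS = {
    "war": {"fighting": -1, "disease": -1},
    "medical": {"fighting": 1},  # "disease" is left neutral here
}


def domain_score(text: str, domain: str) -> int:
    """Score text with the lexicon of the given domain (unknown words count 0)."""
    lexicon = DOMAIN_LEXICONS[domain]
    return sum(lexicon.get(tok, 0) for tok in re.findall(r"[a-z]+", text.lower()))


sentence = "The new programme is fighting disease."
print(domain_score(sentence, "war"))      # -2
print(domain_score(sentence, "medical"))  # 1
```

In practice, domain lexicons are usually learned or adapted from in-domain data rather than written by hand. The point here is only that the same words need different scores in different domains.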

Page 14

Sentiment Analysis Model

Page 15

Data Preparation

The data preparation step performs the necessary preprocessing and cleaning on the dataset for the subsequent analysis. Commonly used preprocessing steps include removing non-textual content and markup tags (for HTML pages) and removing information about the reviews that is not required for sentiment analysis, such as review dates and reviewers’ names.
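Below is a minimal sketch of this step using Python's standard `html.parser` module. It assumes each review arrives as a dictionary with hypothetical keys `date`, `reviewer` and `body_html`. Markup, scripts and styles are stripped, and only the review text is kept.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML fragment, dropping tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_markup(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text pieces and collapse the whitespace left behind by tags.
    return " ".join(" ".join(parser.parts).split())


def prepare(reviews):
    """Keep only the cleaned review text; metadata such as date and reviewer is dropped."""
    return [strip_markup(review["body_html"]) for review in reviews]


reviews = [
    {"date": "2014-12-01", "reviewer": "jdoe",
     "body_html": "<p>This is a <b>great</b> camera.</p><script>track()</script>"},
]
print(prepare(reviews))  # ['This is a great camera.']
```

Because text pieces are joined with spaces, a word split by an inline tag (such as `bat<b>tery</b>`) would come out as two tokens. That is an acceptable trade-off for a sketch.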

Page 16

Review Analysis

The review analysis step analyzes the linguistic features of reviews so that interesting information, including opinions and/or product features, can be identified.

This step often first applies various computational linguistics tasks to the reviews and then extracts opinions and product features from the processed reviews.
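A rough sketch of this step follows, assuming small hypothetical stop-word and opinion-word lists. It splits reviews into sentences, tokenizes them, and treats terms that frequently co-occur with an opinion word as candidate product features. A real system would typically use a part-of-speech tagger to pick out nouns; the frequency heuristic here is a deliberately crude stand-in.

```python
import re
from collections import Counter

# Hypothetical word lists; a real system would use a POS tagger and a larger lexicon.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "and", "but", "this", "it", "i", "in"}
OPINION_WORDS = {"great", "good", "bad", "poor", "sharp", "short"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def candidate_features(reviews, min_count=2):
    """Frequent terms that co-occur with an opinion word in the same sentence."""
    counts = Counter()
    for review in reviews:
        for sentence in split_sentences(review):
            tokens = set(tokenize(sentence))
            if not tokens & OPINION_WORDS:
                continue  # skip sentences that express no opinion
            counts.update(tokens - STOPWORDS - OPINION_WORDS)
    return sorted(term for term, n in counts.items() if n >= min_count)


reviews = [
    "The battery is short. The lens is sharp!",
    "Great lens. The battery was bad but the screen is good.",
    "I bought it in May.",
]
print(candidate_features(reviews))  # ['battery', 'lens']
```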

Page 17

Sentiment Classification

There are two main techniques for sentiment classification:

The symbolic technique uses manually crafted rules and lexicons.

The machine learning approach uses unsupervised or supervised learning to construct a model from a large training corpus.
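A minimal sketch of the symbolic technique follows. Its lexicon is hand-built from the example words on the "Words" slides. It also applies one hypothetical rule: a negator (`not`, `never`, `no`) within the two preceding tokens flips a word's polarity. Real rule sets are considerably richer; this only shows the general shape.

```python
import re

# Hand-crafted lexicon taken from the example words on the "Words" slides.
POSITIVE = {"honest", "important", "mature", "large", "patient",
            "praise", "love", "pleasure", "enjoyment"}
NEGATIVE = {"harmful", "hypocritical", "inefficient", "insecure",
            "blame", "criticize", "pain", "criticism"}
NEGATORS = {"not", "never", "no"}  # hypothetical rule: these flip polarity


def classify(text: str) -> str:
    """Sum word polarities, flipping any word preceded (within two tokens) by a negator."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        if polarity and NEGATORS & set(tokens[max(0, i - 2):i]):
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Ron Paul is the only honest man in Washington."))  # positive
print(classify("Why are they being so inefficient?"))              # negative
print(classify("They are not honest."))                            # negative
```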

Page 18

What?

Find relevant words, phrases and patterns that can be used to express subjectivity.

Determine the polarity of subjective expressions.

Page 19

Words

Adjectives

Positive: honest, important, mature, large, patient

Ron Paul is the only honest man in Washington.

Kitchell’s writing is unbelievably mature and is only likely to get better.

To humour me my patient father agrees yet again to my choice of film.

Negative: harmful, hypocritical, inefficient, insecure

It was a macabre and hypocritical circus.

Why are they being so inefficient?

Page 20

Words

Verbs

Positive: praise, love

Negative: blame, criticize

Nouns

Positive: pleasure, enjoyment

Negative: pain, criticism

Page 21

Phrases

Phrases containing adjectives and adverbs

Positive: high intelligence, low cost

Negative: little variation, many troubles
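Because a modifier such as “low” or “high” has no fixed polarity on its own, phrase polarity is better looked up at the phrase level. The sketch below stores the slide's example phrases as two-word entries and reports every match found in a text. The table and the function name `phrase_polarities` are illustrative assumptions.

```python
import re

# Phrase polarities taken from the examples on this slide.
PHRASES = {
    ("high", "intelligence"): 1,
    ("low", "cost"): 1,
    ("little", "variation"): -1,
    ("many", "troubles"): -1,
}


def phrase_polarities(text):
    """Return (phrase, polarity) for every known two-word phrase in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(" ".join(bigram), PHRASES[bigram])
            for bigram in zip(tokens, tokens[1:])
            if bigram in PHRASES]


print(phrase_polarities("Low cost, but many troubles and little variation."))
# [('low cost', 1), ('many troubles', -1), ('little variation', -1)]
```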

Page 22

Machine Learning

Studies have shown that standard machine learning techniques definitively outperform human-produced baselines.

One approach is to treat sentiment classification simply as a special case of topic-based categorization, with the two “topics” being positive sentiment and negative sentiment.
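Treating the two sentiments as two "topics" means any ordinary text categorizer can be reused. As one example, here is a minimal sketch of multinomial naive Bayes with add-one smoothing, written from scratch. It is trained on an invented four-review corpus labelled `pos` and `neg`, so the data and results are purely illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for d in class_docs for t in tokenize(d))
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def log_posterior(self, doc, c):
        # Unnormalised log P(c | doc); words never seen in training are ignored.
        v = len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            for t in tokenize(doc)
            if t in self.vocab
        )

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


# Invented toy training data; the two "topics" are "pos" and "neg".
docs = [
    "a wonderful moving film",
    "great acting and a great story",
    "dull plot and wooden acting",
    "a boring waste of time",
]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("a great story"))    # pos
print(model.predict("boring and dull"))  # neg
```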

Page 23

Supervised Methods

To train a classifier for sentiment recognition in text, classic supervised learning techniques (e.g. Support Vector Machines, naive Bayes, Maximum Entropy) can be used. A supervised approach uses a labelled training corpus to learn a classification function. In the literature, the method that often yields the highest accuracy is a Support Vector Machine classifier.

Page 24

Support Vector Machine
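A linear SVM can be trained without any library by running stochastic sub-gradient descent on the regularised hinge loss, in the style of the Pegasos algorithm. The sketch below assumes binary bag-of-words features, a constant bias feature, labels of +1/-1, and a small invented training set. The parameters `lam` and `epochs` are illustrative defaults, not tuned values. Production systems would normally use an optimised SVM library instead.

```python
import random
import re


def features(text):
    """Binary bag-of-words features plus a constant bias feature."""
    feats = {token: 1.0 for token in re.findall(r"[a-z']+", text.lower())}
    feats["__bias__"] = 1.0
    return feats


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(docs, labels, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.

    labels must be +1 (positive) or -1 (negative).
    """
    data = [(features(d), y) for d, y in zip(docs, labels)]
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                # decreasing step size
            violated = y * dot(w, x) < 1         # hinge loss is active
            w = {k: (1 - eta * lam) * v for k, v in w.items()}  # regulariser step
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, text):
    return 1 if dot(w, features(text)) >= 0 else -1


# Invented toy training data: +1 = positive review, -1 = negative review.
docs = [
    "a wonderful moving film",
    "great acting and a great story",
    "dull plot and wooden acting",
    "a boring waste of time",
]
labels = [1, 1, -1, -1]
w = train_linear_svm(docs, labels)
print(predict(w, "a great story"))    # 1
print(predict(w, "boring and dull"))  # -1
```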

Page 25

Page 26

Unsupervised Learning

A clustering algorithm partitions the adjectives into two subsets.

[Figure: the adjectives nice, handsome, terrible, comfortable, painful, expensive, fun, scenic and slow, partitioned into two subsets (one marked +)]
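The sketch below illustrates the idea of splitting adjectives into two orientation groups. It substitutes a simpler technique for a real clustering algorithm: a brute-force search over all two-way splits that minimises the number of violated links. The links themselves are hypothetical. "X and Y" is assumed to suggest the same orientation and "X but Y" the opposite one. The seed word `nice` is also assumed to be known positive, which decides which group gets the + label. Brute force only works for a handful of words, since there are 2^n splits.

```python
from itertools import product

ADJECTIVES = ["nice", "handsome", "terrible", "comfortable", "painful",
              "expensive", "fun", "scenic", "slow"]
# Hypothetical evidence, e.g. from conjunctions in a corpus:
# "X and Y" suggests the same orientation, "X but Y" the opposite one.
SAME = [("nice", "handsome"), ("nice", "comfortable"), ("comfortable", "fun"),
        ("fun", "scenic"), ("terrible", "painful"), ("slow", "expensive")]
DIFFERENT = [("nice", "terrible"), ("comfortable", "expensive"), ("scenic", "slow")]


def partition(words, same, different, seed="nice"):
    """Brute-force two-way split minimising violated links; the seed's side is labelled positive."""
    best, best_cost = None, float("inf")
    for bits in product((0, 1), repeat=len(words)):
        side = dict(zip(words, bits))
        cost = (sum(side[a] != side[b] for a, b in same)
                + sum(side[a] == side[b] for a, b in different))
        if cost < best_cost:
            best, best_cost = side, cost
    positive = sorted(w for w in words if best[w] == best[seed])
    negative = sorted(w for w in words if best[w] != best[seed])
    return positive, negative


positive, negative = partition(ADJECTIVES, SAME, DIFFERENT)
print(positive)  # ['comfortable', 'fun', 'handsome', 'nice', 'scenic']
print(negative)  # ['expensive', 'painful', 'slow', 'terrible']
```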

Page 27

Conclusion

An important field of study

A new field

Many applications

Almost no work in this area

Page 28

References

Pang, B. and Lee, L. (2008). “Opinion Mining and Sentiment Analysis”, Foundations and Trends® in Information Retrieval, Vol. 2, Nos. 1–2, pp. 1–135. Ebook available from http://www.cs.cornell.edu/home/llee/omsa/omsa.pdf

Wiebe, J., Cardie, C. and Riloff, E. (2007). “Manual and Automatic Subjectivity and Sentiment Analysis”, Center for Extraction and Summarization of Events and Opinions in Text, University of Utah.