The big data phenomenon has brought about a transformation in data access. Sentiment analysis (SA) is one of the most heavily exploited areas and is used for profit-making purposes through business intelligence applications. This paper reviews the trends in SA and relates the growth of the area to the big data era.
A Review of Sentiment Analysis Approaches in Big Data Era
Nurfadhlina Mohd Sharef
Department of Computer Science
Faculty of Computer Science and Information Technology, Universiti Putra Malaysia
Serdang, Selangor, Malaysia
Sentiment Analysis
Sentiment analysis analyzes people’s sentiments, opinions, appraisals, attitudes, evaluations, and emotions towards entities such as organizations, products, services, individuals, topics, issues, events, and their attributes, as expressed online via text, video and other means of communication.
The sentiment in these communications typically falls into three broad categories: positive, neutral, or negative.
The field also goes by many names covering slightly different tasks, e.g., sentiment analysis, opinion mining, opinion extraction, sentiment mining, subjectivity analysis, customer complaint analysis, affect analysis, emotion analysis, review mining, review analysis, etc.
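The three broad categories above are often obtained by thresholding a numeric strength score. The sketch below is a minimal illustration under stated assumptions: TOY_LEXICON is a hypothetical word-to-score table with values in [-1, 1] (a real system might take such scores from a resource like SentiWordNet), and the 0.1 cut-off in polarity() is an arbitrary, hypothetical choice. It averages the scores of the words it recognizes and maps the result to positive, neutral, or negative.

```python
import re

# Hypothetical toy lexicon: word -> prior polarity in [-1, 1].
# A real system might draw such scores from a resource like SentiWordNet.
TOY_LEXICON = {
    "good": 0.7, "great": 0.9, "love": 0.8,
    "bad": -0.7, "terrible": -0.9, "hate": -0.8,
}


def sentiment_strength(text, lexicon=TOY_LEXICON):
    """Average prior polarity of the words found in the lexicon (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return 0.0
    # Averaging values that each lie in [-1, 1] keeps the result in [-1, 1].
    return sum(scores) / len(scores)


def polarity(text, threshold=0.1):
    """Map the strength score onto positive / neutral / negative.

    The 0.1 threshold is an arbitrary, hypothetical cut-off.
    """
    score = sentiment_strength(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

For example, polarity("The service was terrible") returns "negative", while a sentence with no lexicon words scores 0.0 and is labelled neutral. Averaging rather than summing is a design choice that keeps the strength score bounded regardless of text length.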
Tool: Description
The Hadoop Distributed File System (HDFS): HDFS divides the data into smaller parts and distributes them across the various servers/nodes.
SQL Server Integration Services and Apache Flume: These tools allow posts to be downloaded and loaded into Hadoop.
MapReduce: MapReduce is a process that transforms data loaded into Hadoop into a format that can be used for analysis.
Hive: A runtime Hadoop support architecture that leverages Structured Query Language (SQL) with the Hadoop platform.
Jaql: Jaql converts high-level queries into low-level queries.
Zookeeper: Zookeeper coordinates parallel processing across big clusters.
HBase: HBase is a column-oriented database management system that sits on top of HDFS and uses a non-SQL approach.
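The division of labour between HDFS and MapReduce in the tool list can be illustrated with a single-process simulation. This is not Hadoop code: split_into_blocks merely stands in for HDFS splitting data into parts, and map_phase, shuffle and reduce_phase mimic the MapReduce phases for a count of sentiment labels. The keyword sets POSITIVE and NEGATIVE and the labelling rule in label_post are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical keyword sets used only to label posts in this toy example.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def split_into_blocks(records, block_size):
    """Stand-in for HDFS splitting data into parts stored on different nodes."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def label_post(post):
    """Toy labelling rule: compare counts of positive and negative keywords."""
    words = post.lower().split()
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


def map_phase(block):
    """Map: emit a (label, 1) pair for every post in one block."""
    for post in block:
        yield label_post(post), 1


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(posts, block_size=2):
    blocks = split_into_blocks(posts, block_size)
    # On a cluster each block would be mapped on its own node; here they run in sequence.
    mapped = chain.from_iterable(map_phase(block) for block in blocks)
    return reduce_phase(shuffle(mapped))


posts = ["great phone", "terrible battery", "love it", "arrived today", "bad screen"]
print(run_job(posts))  # {'positive': 2, 'negative': 2, 'neutral': 1}
```

Because each map call only sees its own block, the blocks could be processed in parallel; only the shuffle step needs to bring values with the same key together.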
Problem
Which features to use?
Words (unigrams)
Phrases/n-grams
Sentences
How to interpret features for sentiment detection?
Bag of words (IR)
Annotated lexicons (WordNet, SentiWordNet)
Syntactic patterns
Paragraph structure
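The feature choices listed above (words, n-grams) and the bag-of-words interpretation can be sketched as follows. This is a minimal illustration assuming a simple regular-expression tokenizer; the function names are hypothetical and no particular toolkit is implied.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Contiguous n-token sequences joined by spaces (empty if too few tokens)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_features(text, max_n=2):
    """Bag-of-words style counts over unigrams up to max_n-grams; order is discarded."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features
```

For "not good, not bad" the unigram counts alone cannot tell which word "not" modifies, whereas the bigrams "not good" and "not bad" appear as features of their own.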
Challenges
Harder than topical classification, for which bag-of-words features perform well
Must consider other features due to…
Subtlety of sentiment expression
irony
expression of sentiment using neutral words
Domain/context dependence
words/phrases can mean different things in different contexts and domains
Effect of syntax on semantics
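The effect of syntax on semantics can be seen with negation: a plain bag-of-words lexicon score gives "not good" the same value as "good". The sketch below contrasts that with a common heuristic that flips word polarity within a fixed window after a negator. The lexicon, the negator set and the window size of 3 are hypothetical; this heuristic only approximates scope and is not a full syntactic analysis.

```python
import re

# Hypothetical lexicon and negator list for illustration only.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def is_negator(token):
    # Treat contractions such as "isn't" or "don't" as negators too.
    return token in NEGATORS or token.endswith("n't")


def bag_of_words_score(text):
    """Sum of lexicon values, ignoring word order entirely."""
    return sum(LEXICON.get(t, 0.0) for t in tokenize(text))


def negation_aware_score(text, window=3):
    """Flip the polarity of up to `window` tokens that follow a negator."""
    score = 0.0
    remaining = 0  # tokens still inside the current negation scope
    for tok in tokenize(text):
        if is_negator(tok):
            remaining = window
            continue
        value = LEXICON.get(tok, 0.0)
        score += -value if remaining > 0 else value
        if remaining > 0:
            remaining -= 1
    return score
```

With this sketch, bag_of_words_score("the food was not good") is 1.0, the same as for "the food was good", while negation_aware_score gives -1.0 for the negated sentence.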
Sentiment Analysis Trends
Year | Quantity | Highlighted Topics
2004 | 4 | Affective computing, sentiment classification, polarity
2005 | 10 | Contextual polarity, phrase-level SA, sentiment classification, scores, subject classification
2006 | 10 | Lexicon, feature, summarization, mining, understanding, temporal SA, weighted polarity, user profiling based on SA
2007 | 43 | Lexicon, feature mining, emotion detection, clustering, conjuncts presence
2008 | 72 | Multi-lingual SA, ratings inference, feature mining, word orientation, SentiWordNet, rating weighting, radicalization detection, affective computing, compositional semantics analysis, sentiment-based prediction, concept hierarchy, classification
2009 | 131 | ML approaches for SA, user profiling based on SA, feature association, semantic association, visual SA, cross-linguistic SA, ontology-based SA, polarity lexicon, multi-entity scoring, affective computing
2010 | 216 | Orientation analysis, affective computing, linguistic models, applied visual for SA, semantic role labeling, clustering-based SA, cross-lingual SA, SA-based prediction, Twitter-based SA, global SentiWordNet, intensity classification, cross-domain SA, opinion question-answering, sentiment topic detection, language-specific SA
2011 | 297 | Opinion leader identification, social network-based surveillance, product recommendation, terrorism informatics, affective computing, features clustering, political orientation detection, wish identification, sentiment lexicon, influence detection, personality mining, polarity analysis, graph-based sentiment representation, semantic-based SA, learning models for SA, emotion clustering, ontology-based SA, sentence-level SA, language-specific SA
2012 | 454 | Linguistic features analysis, business and financial forecasting, attitude prediction, sentiment topic detection, verbs polarity disambiguation, SenticNet, semantic orientation, language-specific SA, cross-lingual SA, emotion recognition, social values and group identification
2013 | 562 | Multilingual, ML-based polarity detection, sentiment evolution modeling, aspect-based sentiment classification, social intelligence, SA-based prediction, computational analysis of public voice, emotion mining, SA-based customer care, security-related intelligence, graph extraction, social network-based SA, linguistic features, statistical approaches for SA, concept-level SA, correlational study between financial sentiment and prices in financial markets, subjectivity detection, cross-domain SA, opinion leaders identification
2014 | 216 | Feature-based SA through ontologies, concept-level SA based on dependency rules, word polarity disambiguation, aspect-oriented SA, sentence-level SA, graph clustering for SA, subjectivity analysis, word sentiment in WordNet 3.0, computational analysis of public voice
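The growth described by the trends table can be quantified with simple arithmetic on its Year and Quantity columns (the source does not state the unit of Quantity). The sketch below copies those values and computes the peak year, the growth from 2004 to the peak, and year-over-year ratios; it deliberately does not interpret the lower 2014 figure, which the source does not explain.

```python
# Values copied from the Year and Quantity columns of the trends table.
COUNTS = {2004: 4, 2005: 10, 2006: 10, 2007: 43, 2008: 72, 2009: 131,
          2010: 216, 2011: 297, 2012: 454, 2013: 562, 2014: 216}


def year_over_year(counts):
    """Ratio of each year's count to the previous year's count."""
    years = sorted(counts)
    return {year: counts[year] / counts[prev] for prev, year in zip(years, years[1:])}


peak_year = max(COUNTS, key=COUNTS.get)
growth_to_peak = COUNTS[peak_year] / COUNTS[min(COUNTS)]

print(peak_year, growth_to_peak)  # 2013 140.5
for year, ratio in year_over_year(COUNTS).items():
    print(year, round(ratio, 2))
```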
[Figure labels: text mining techniques, multilingual, linguistics, applied linguistics]
Approaches
Sentiment Analysis
Content-based
Polarity Detection: Positive, Negative, Neutral
Strength Detection
Typically [-1,1]
SentiWordNet
Feature Mining
Unigram, Bigram
Syntactic, Lexical, Structural
Link-based
Stylistic
Affective Computation
Emotion Classification
Social Network
Influencer
Multilingual
Machine Learning
Naïve Bayes
Support Vector Machine
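Of the machine learning approaches listed, Naïve Bayes is the simplest to sketch. The block below is a minimal multinomial Naïve Bayes over unigram counts with add-one (Laplace) smoothing, written from scratch; it is a generic textbook formulation, not the method of any study surveyed here. The class name NaiveBayesSentiment and the four training sentences are hypothetical toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over unigram counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_doc_counts = Counter(labels)   # documents per class, for priors
        self.word_counts = defaultdict(Counter)   # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_doc_counts}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log P(label) + sum of log P(word | label)."""
        logp = math.log(self.class_doc_counts[label] / self.n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokenize(doc):
            if tok not in self.vocab:
                continue  # words never seen in training carry no evidence
            logp += math.log((self.word_counts[label][tok] + 1) / denom)
        return logp

    def predict(self, doc):
        return max(self.class_doc_counts, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy training data.
docs = ["great phone love it", "excellent battery great screen",
        "terrible battery hate it", "awful screen bad phone"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesSentiment().fit(docs, labels)

print(model.predict("love the great screen"))    # pos
print(model.predict("hate this awful battery"))  # neg
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and smoothing prevents a single unseen class/word pair from zeroing out a class.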
Conclusion
This paper has discussed the trends in SA. Although research in the area began before 2004, it has gained even more attention at the height of the big data era, and advances in big data technologies have enabled the area to flourish.
Nevertheless, there is still much room for improvement, such as maturing big data technologies and increasing the alternatives for SA solutions built on the platform.
More infrastructure is also needed so that SA can be exploited in many more applications beyond the existing community-centric, product review-based and influence assessment ones.
Techniques for SA on cross-domain and multilingual datasets should also be explored.
Deeper semantic computation, such as the SenticNet approach, should also be expanded, alongside enriching SentiWordNet toward multilingual, more precise and multi-granular representations.