

Context-Sensitive Classification of Short Colloquial Text

Norbert Blenn, Kassandra Charalampidou, and Christian Doerr TU Delft - Network Architectures and Services (NAS)


Outline

Context-Sensitive Sentiment Classification of Short Colloquial Text Norbert Blenn, Kassandra Charalampidou, and Christian Doerr

“Emotions propagate through a social network like viruses.”
“Some people influence others’ opinions in online social networks.”

1. The data (Short Colloquial Text)
2. Sentiment Classification
   • Objective vs. Subjective
   • Compared to existing tools
3. Automatic Detection of Polarization Intensity
4. Networks of Concepts (Word-Graphs)


A Word about the Dataset: Twitter

Twitter is the largest “free” source of user conversations at this time. Users post what they are doing, share their opinions, ask questions, or simply discuss whatever they want in 140 characters of text.

Our dataset is based on the Twitter stream and comprises a corpus of ca. 500,000,000 tweets. The Twitter sample stream provides 1% of all Tweets, randomly sampled (still 17 per second).

The text of a Tweet may contain meta information:
• “#” as a topic indicator
• “@” + username to mention users
• “RT” indicates that the Tweet is a ReTweet


Tweet
• Text: I like the movie I saw last night but the cinema was bad.
• created at, in reply to user…, Tweet ID, location
• User object: username, real name, timezone, joined Twitter at, URL, profile image URL, description, number of friends & followers, location, language, number of posts …
• If retweeted: the full retweeted Tweet
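
The meta information a Tweet's text may carry (“#” topics, “@” mentions, the “RT” marker) can be pulled out with a few regular expressions. The following is a minimal sketch, not the authors' pipeline: the function name `extract_meta` and the returned keys are hypothetical, and it assumes that “RT” appears at the start of the text and that topics and usernames consist of word characters only.

```python
import re

# Assumption: topics and usernames are runs of word characters.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
# Assumption: a ReTweet is marked by "RT" at the very start of the text.
RETWEET = re.compile(r"RT\b")


def extract_meta(text):
    """Return the hashtags, mentions and retweet flag found in a Tweet text."""
    stripped = text.strip()
    return {
        "hashtags": HASHTAG.findall(stripped),
        "mentions": MENTION.findall(stripped),
        "is_retweet": bool(RETWEET.match(stripped)),
    }


if __name__ == "__main__":
    print(extract_meta("RT @alice: loved #Oscars night"))
    print(extract_meta("I like the movie I saw last night but the cinema was bad."))
```

The second call uses the example text from the slide, which carries no meta tokens, so all fields come back empty or false.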


Sentiment Analysis Objective – Subjective Classification

Examples:
• “I liked The King’s Speech”: subjective
• “The King’s Speech was a long movie”: objective
• “I like you, even when watching The King’s Speech”: subjective, with a reference to “you”


Sentiment Analysis Objective – Subjective Classification

Reference detection (Stanford NLP lexicalized parser):
• Tokens: [I, liked, the, movie]
• Parse tree: (ROOT (S (NP (PRP I)) (VP (VBD liked) (NP (DT the) (NN movie)))))
• Dependencies: nsubj(liked-2, I-1), det(movie-4, the-3), dobj(liked-2, movie-4)
• “I” is the nominal subject of “liked”; “movie” is the direct object of “liked”.

Check whether an adjective or verb refers to the subject of interest (WordNet or the Stanford part-of-speech (PoS) tagger):
• Tagged: I/PRP liked/VBD the/DT movie/NN
• “I” is a personal pronoun, “liked” a verb in the past tense, “the” a determiner, and “movie” a noun.
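
The check described on this slide can be sketched directly on the parser and tagger output strings shown above. This is an illustrative sketch only: it does not call the Stanford tools, the helper names (`parse_dependencies`, `parse_tags`, `opinion_words_for`) are hypothetical, and it assumes that a verb or adjective "refers to" the subject of interest when a single dependency links the two words.

```python
import re

# Matches typed dependencies printed as "relation(head-index, dependent-index)".
DEP_PATTERN = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(text):
    """Return (relation, head, dependent) triples from a dependency string."""
    return [(rel, head, dep) for rel, head, _, dep, _ in DEP_PATTERN.findall(text)]


def parse_tags(tagged):
    """Map each word to its PoS tag from a 'word/TAG word/TAG' string."""
    return dict(token.rsplit("/", 1) for token in tagged.split())


def opinion_words_for(target, deps, tags):
    """Collect verbs (VB*) and adjectives (JJ*) directly linked to `target`."""
    words = []
    for _, head, dep in deps:
        other = head if dep == target else dep if head == target else None
        if other and tags.get(other, "").startswith(("VB", "JJ")):
            words.append(other)
    return words


if __name__ == "__main__":
    deps = parse_dependencies(
        "nsubj(liked-2, I-1), det(movie-4, the-3), dobj(liked-2, movie-4)"
    )
    tags = parse_tags("I/PRP liked/VBD the/DT movie/NN")
    print(opinion_words_for("movie", deps, tags))
```

For the slide's sentence, the only verb or adjective attached to “movie” is “liked”. Deciding whether such a word actually expresses polarity would still need a lexicon such as WordNet, which this sketch leaves out.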


Sentiment Analysis Classification Evaluation

“Subjectivity is mostly based on adjectives or verbs expressing the polarity related to the subject of the message.”

• 1,073 randomly chosen English tweets related to movies of the 83rd Academy Awards (Oscars)
• Evaluation against:
  • Manual sentiment classification
  • “Twitter Sentiment”: SVM trained on tweets containing emoticons
  • “Tweet Sentiments”: SVM trained by users of the service
  • “Lingpipe”: SVM, Maximum Entropy, Naive Bayes trained on a given dataset (IMDB / half of our hand-classified dataset)


Page 8: Context-Sensitive Classification of Short Colloquial Text · Context-Sensitive Classification of Short Colloquial Text ... (Stanford NLP [lexicographical ... Context-Sensitive Sentiment

8/12


Sentiment Analysis Polarity Classification

Possible for all words/languages (252,000 words, only from English tweets from the first 2 weeks in February ’12):

• Sunday: 0.014
• Monday: -0.09
• Good: 0.19
• Bad: -0.59
• Networking: 0.15
• Sweet: 0.26
• Sour: -0.03
• …

• 0: 0.03
• 1: 0.02
• 2: -0.07
• 3: 0.13
• 4: -0.03
• 5: -0.08
• 6: -0.11
• 7: -0.35
• 8: -0.06
• 9: -0.07
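
To illustrate how per-word polarity values like those listed above might be used, the sketch below averages the scores of the known words in a text. The averaging rule and the name `mean_polarity` are assumptions made for illustration; the slides do not state how word scores are derived or combined. The dictionary holds only the word values printed on this slide.

```python
import re

# Per-word polarity values copied from the slide (words only, lower-cased).
WORD_POLARITY = {
    "sunday": 0.014,
    "monday": -0.09,
    "good": 0.19,
    "bad": -0.59,
    "networking": 0.15,
    "sweet": 0.26,
    "sour": -0.03,
}


def mean_polarity(text, scores=WORD_POLARITY):
    """Average the polarity of all words in `text` that have a known score.

    Assumption: unknown words are ignored, and a text without any known
    word is treated as neutral (0.0).
    """
    known = [scores[w] for w in re.findall(r"[a-z]+", text.lower()) if w in scores]
    return sum(known) / len(known) if known else 0.0


if __name__ == "__main__":
    print(mean_polarity("I like the movie I saw last night but the cinema was bad."))
    print(mean_polarity("Good Sunday"))
```

With this small lexicon, the example Tweet from the dataset slide is scored by its single known word, “bad”.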


Networks of Concepts
Graphs generated by word co-occurrences

Creating a graph of words:
• Words are connected if they appear in the same Tweet
• Links are directed and weighted
• The link weight is given by the probability that a word co-occurs with the second one
• The node properties are given by term and document frequencies
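
The construction steps above can be sketched in a few lines. This is one possible reading, not the authors' implementation: the function name `build_word_graph` is hypothetical, tokenization is plain lower-cased whitespace splitting, and the weight of a link a → b is assumed to be the fraction of Tweets containing a that also contain b.

```python
from collections import Counter
from itertools import permutations


def build_word_graph(tweets):
    """Build a directed, weighted word co-occurrence graph from Tweet texts.

    Returns (nodes, edges):
      nodes maps word -> {"tf": term frequency, "df": document frequency}
      edges maps (a, b) -> estimated P(b in Tweet | a in Tweet)
    """
    term_freq = Counter()   # total occurrences of each word
    doc_freq = Counter()    # number of Tweets containing each word
    pair_count = Counter()  # number of Tweets containing both words of a pair

    for text in tweets:
        tokens = text.lower().split()
        term_freq.update(tokens)
        unique = set(tokens)
        doc_freq.update(unique)
        # Every ordered pair of distinct words in the same Tweet is a link.
        pair_count.update(permutations(sorted(unique), 2))

    nodes = {w: {"tf": term_freq[w], "df": doc_freq[w]} for w in term_freq}
    edges = {(a, b): n / doc_freq[a] for (a, b), n in pair_count.items()}
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_word_graph(["good movie", "good cinema", "bad cinema"])
    print(nodes["good"])
    print(edges[("good", "movie")], edges[("movie", "good")])
```

Because the weight is normalized by the document frequency of the source word, the two directions of a link generally differ, which is why the links are directed.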


A Word about the Dataset: Twitter Sentiment Analysis
Typical propagation pattern (positive vs. negative Tweets)


Thank you for your attention.
Questions?

Delft University of Technology Faculty of Electr. Engineering Dept. of Telecommunication

Mekelweg 4 2628 CD Delft

The Netherlands Room: EWI 19.240
