Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media Svitlana Volkova 1 , Theresa Wilson 2 and David Yarowsky 1,2 , 1 Center for Language and Speech Processing, Johns Hopkins University 2 Human-Language technology Center of Excellence


Page 1: Exploring Demographic Language Variations to …svitlana/posters/emnlp13-slides.pdfExploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

Svitlana Volkova¹, Theresa Wilson² and David Yarowsky¹˒²

¹ Center for Language and Speech Processing, Johns Hopkins University
² Human Language Technology Center of Excellence


Motivation

Demographic language variations (DLV) have been studied by socio-linguists for decades (Picard, 1997; Gefen & Ridings, 2005; Holmes & Meyerhoff, 2004; Macaulay, 2006; Tagliamonte, 2006).

DLV have recently been explored in personal email communication, blog posts, and public discussions (Boneva et al., 2001; Mohammad & Yang, 2011; Eisenstein et al., 2010; Bamman et al., 2012).

We propose to study differences in subjective language in social media to support commercial applications:

- personalized recommendation systems and targeted online advertising (Fan & Chang, 2009),
- detecting helpful product reviews (Ott, Choi, Cardie, & Hancock, 2011),
- tracking sentiment in real time (Resnik, 2013),
- large-scale, low-cost, passive polling (O’Connor, Eisenstein, Xing, & Smith, 2010).

S. Volkova, T. Wilson, D. Yarowsky (JHU) Demographic Language Variations in Twitter 2 / 35


Motivation

Male (♂) and female (♀) Twitter users use subjective terms differently:

- ♀ (+): “Chocolate is my weakness”
- ♂ (−): “Clearly they know our weakness. Argggg....”


Goal

I. Explore gender bias in the use of subjective language in Twitter:

- investigate multilingual subjective lexical variations;
- investigate cross-cultural emoticon and hashtag usage.

II. Incorporate gender bias into models to improve sentiment analysis for English, Spanish, and Russian:

- demonstrate that simple, binary features representing author gender are insufficient for gender-dependent sentiment analysis.


Data


Automatic gender label prediction using user first-name morphology (precision is above 0.98 across languages).

Sentiment labels from Mechanical Turk (5 annotations per tweet):

- Positive: Как же приятно просто лечь в постель после тяжелого дня... (It is a great pleasure to go to bed after a long day at work...)
- Negative: Уважаемый господин Прохоров купите эти выборы! (Dear Mr. Prokhorov just buy the elections!)
- Both: Затолкали меня на местном рынке! но зато закупилась подарками для всей семьи :) (It was crowded at the local market! But I got presents for my family :-))
- Neutral: Киев очень старый город (Kiev is a very old city).


Metrics Lexical Evaluation across Genders

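Term $t_i$ subjectivity:

$$p_{t_i}(\mathrm{subj} \mid g) = \frac{c(t_i, P, g) + c(t_i, N, g)}{c(t_i, g)},$$

Term $t_i$ polarity:

$$p_{t_i}(+ \mid g) = \frac{c(t_i, P, g)}{c(t_i, P, g) + c(t_i, N, g)},$$

Polarity change across genders:

$$\Delta p^{+}_{t_i} = \left| p_{t_i}(+ \mid F) - p_{t_i}(+ \mid M) \right| \quad \text{s.t.} \quad \left| 1 - \frac{tf^{\mathrm{subj}}_{t_i}(F)}{tf^{\mathrm{subj}}_{t_i}(M)} \right| \le \lambda, \quad tf^{\mathrm{subj}}_{t_i}(M) \ne 0,$$

where $\lambda$ controls term frequency similarity.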
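The three metrics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: c(t, P, g) and c(t, N, g) are taken to be occurrence counts of term t in positive and negative tweets by authors of gender g, c(t, g) is the count over all tweets by that gender, and the subjective term frequency tf_subj is taken to be the raw count c(t, P, g) + c(t, N, g). The function names and the tweet tuple layout are hypothetical.

```python
from collections import Counter


def count_terms(tweets):
    """Collect counts from (tokens, label, gender) tuples.

    Assumption: label is "P" (positive), "N" (negative), or anything else
    for non-subjective tweets; gender is "F" or "M".
    """
    by_label = Counter()  # (term, label, gender) -> c(t, P/N, g)
    total = Counter()     # (term, gender)        -> c(t, g)
    for tokens, label, gender in tweets:
        for term in tokens:
            total[(term, gender)] += 1
            if label in ("P", "N"):
                by_label[(term, label, gender)] += 1
    return by_label, total


def subjectivity(by_label, total, term, gender):
    """p_t(subj | g); None when the term never occurs for this gender."""
    denom = total[(term, gender)]
    if denom == 0:
        return None
    subj = by_label[(term, "P", gender)] + by_label[(term, "N", gender)]
    return subj / denom


def polarity(by_label, term, gender):
    """p_t(+ | g); None when the term has no subjective occurrences."""
    pos = by_label[(term, "P", gender)]
    subj = pos + by_label[(term, "N", gender)]
    if subj == 0:
        return None
    return pos / subj


def polarity_change(by_label, term, lam):
    """Delta p+ for a term, or None if the frequency constraint fails."""
    tf_f = by_label[(term, "P", "F")] + by_label[(term, "N", "F")]
    tf_m = by_label[(term, "P", "M")] + by_label[(term, "N", "M")]
    if tf_m == 0 or tf_f == 0:
        return None  # polarity is undefined for at least one gender
    if abs(1 - tf_f / tf_m) > lam:
        return None  # term frequencies differ too much across genders
    return abs(polarity(by_label, term, "F") - polarity(by_label, term, "M"))
```

Returning None instead of raising keeps it easy to filter a whole vocabulary: terms that violate the λ constraint, or that have no subjective occurrences for one gender, simply drop out of the ranking.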


Lexical Evaluation across Genders for English

[Figure not recovered. Legend: terms are marked by source: the original lexicon $L_I$, the bootstrapped lexicon $L_B$, and hashtags.]


Lexical Evaluation for Spanish and Russian

Spanish:

- fiasco, triunfar (succeed) and #britneyspears are used F+ but M−;
- horooriza (horrifies), #metallica and #latingrammy are used F− but M+.

Russian:

- мечтайте (dream!), магический (magical) and совет (advice) are used F+ but M−;
- исскушение (temptation), сложны (complicated), #iphones and #spartak (soccer team) are used F− but M+.


How can gender differences in subjective language help subjectivity and polarity classification in social media?


Rule-based Subjectivity Classifiers

Gender-Independent (Riloff & Wiebe, 2003; Volkova et al., 2013):

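$$\mathrm{GIndRB}_{\mathrm{subj}} = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{f} \ge 0.5, \\ 0 & \text{otherwise.} \end{cases}$$

Gender-Dependent:

$$\mathrm{GDepRB}_{\mathrm{subj}} = \begin{cases} 1 & \text{if } \vec{w}^{M} \cdot \vec{f}^{M} \ge 0.5 \wedge M, \\ 0 & \text{otherwise.} \end{cases}$$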


$$\mathrm{GDepRB}_{\mathrm{subj}} = \begin{cases} 1 & \text{if } \vec{w}^{F} \cdot \vec{f}^{F} \ge 0.5 \wedge F, \\ 0 & \text{otherwise.} \end{cases}$$
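The rule-based subjectivity classifiers above can be illustrated with a short Python sketch. It rests on assumptions the slides do not spell out: the feature vector f is taken to be the per-term counts of a tweet, the weight vector w is a dictionary of per-term lexicon weights, and the gender-dependent rule simply selects the weight dictionary that matches the author's gender. All names here are hypothetical.

```python
from collections import Counter


def dot(weights, counts):
    """Sparse dot product w . f over the terms present in the tweet."""
    return sum(weights.get(term, 0.0) * n for term, n in counts.items())


def gind_rb_subj(tokens, weights, threshold=0.5):
    """Gender-independent rule: 1 (subjective) if w . f >= threshold."""
    return 1 if dot(weights, Counter(tokens)) >= threshold else 0


def gdep_rb_subj(tokens, gender, weights_by_gender, threshold=0.5):
    """Gender-dependent rule: same test, using the author's gender lexicon.

    Assumption: weights_by_gender maps "M" and "F" to weight dictionaries.
    """
    weights = weights_by_gender[gender]
    return 1 if dot(weights, Counter(tokens)) >= threshold else 0
```

With this layout the same tweet can receive different labels depending on the author's gender, which is the behaviour the gender-dependent rule is meant to capture.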


Rule-based Subjectivity Classification Results

Start with $L_I$ and incrementally add Emoticons, Adjectives, AdveRbs, Verbs, and Nouns from $L_B$.


Rule-based Polarity Classifiers

Gender-Independent (Riloff & Wiebe, 2003; Volkova et al., 2013):

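$$\mathrm{GIndRB}_{\mathrm{pol}} = \begin{cases} 1 & \text{if } \vec{w}^{+} \cdot \vec{f}^{+} \ge \vec{w}^{-} \cdot \vec{f}^{-}, \\ 0 & \text{otherwise} \end{cases}$$

Gender-Dependent:

$$\mathrm{GDepRB}_{\mathrm{pol}} = \begin{cases} 1 & \text{if } \vec{w}^{M+} \cdot \vec{f}^{M+} \ge \vec{w}^{M-} \cdot \vec{f}^{M-} \wedge M, \\ 0 & \text{otherwise} \end{cases}$$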


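$$\mathrm{GDepRB}_{\mathrm{pol}} = \begin{cases} 1 & \text{if } \vec{w}^{F+} \cdot \vec{f}^{F+} \ge \vec{w}^{F-} \cdot \vec{f}^{F-} \wedge F, \\ 0 & \text{otherwise} \end{cases}$$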
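A minimal Python sketch of the polarity rules above follows. As before, it is an illustration under assumptions rather than the authors' code: the positive and negative feature vectors are taken to be term counts scored against separate positive and negative weight dictionaries, and the gender-dependent variant picks the pair of dictionaries for the author's gender. The names are hypothetical.

```python
from collections import Counter


def dot(weights, counts):
    """Sparse dot product w . f over the terms present in the tweet."""
    return sum(weights.get(term, 0.0) * n for term, n in counts.items())


def gind_rb_pol(tokens, w_pos, w_neg):
    """Gender-independent rule: 1 (positive) if w+ . f+ >= w- . f-."""
    counts = Counter(tokens)
    return 1 if dot(w_pos, counts) >= dot(w_neg, counts) else 0


def gdep_rb_pol(tokens, gender, lexicons):
    """Gender-dependent rule using the author's gender-specific lexicons.

    Assumption: lexicons maps "M" and "F" to (w_pos, w_neg) pairs.
    """
    w_pos, w_neg = lexicons[gender]
    return gind_rb_pol(tokens, w_pos, w_neg)
```

Because the comparison uses ≥, a tweet with no lexicon hits (both scores zero) is labelled positive. In practice the polarity rule would presumably be applied only to tweets already judged subjective, but that ordering is an assumption here.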


Rule-based Polarity Classification Results

Start with LI and incrementally add Emoticons, Adjectives, AdveRbs, Verbs, Nouns from LB.


Experimental Setup

Gender-Independent features: V - unigram counts; L_I, L_B - set-count features from the original and bootstrapped lexicons; E - emoticons.

\[
\vec{f}^{\,GInd}_{subj} = [L_I, L_B, E, V];
\]

\[
\vec{f}^{\,GInd}_{pol} = [L_I^{+}, L_B^{+}, E^{+}, L_I^{-}, L_B^{-}, E^{-}, V].
\]

Gender-Dependent joint features:

\[
\vec{f}^{\,GDep\text{-}J}_{subj} = [L_I^{M}, L_B^{M}, E^{M}, L_I^{F}, L_B^{F}, E^{F}, V];
\]

\[
\vec{f}^{\,GDep\text{-}J}_{pol} = [L_I^{M+}, L_B^{M+}, E^{M+}, L_I^{F+}, L_B^{F+}, E^{F+}, L_I^{M-}, L_B^{M-}, E^{M-}, L_I^{F-}, L_B^{F-}, E^{F-}, V].
\]
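One way to assemble the subjectivity feature vectors above is sketched below. It is a rough illustration only: the function names and toy term sets are hypothetical, and the choice to fill the M or F slots only when they match the author's gender (leaving the other slots at zero) is an assumption, since the slide does not say how the gender-specific slots are populated.

```python
from collections import Counter


def set_count(tokens, term_set):
    """Set-count feature: number of tweet tokens found in the term set."""
    return sum(1 for token in tokens if token in term_set)


def gind_subj_features(tokens, lex_initial, lex_boot, emoticons, vocab):
    """Gender-independent vector [L_I, L_B, E, V]."""
    counts = Counter(tokens)
    lexical = [set_count(tokens, lex_initial),
               set_count(tokens, lex_boot),
               set_count(tokens, emoticons)]
    return lexical + [counts[word] for word in vocab]


def gdep_joint_subj_features(tokens, gender, lex_initial, lex_boot,
                             emoticons, vocab):
    """Joint vector [L_I^M, L_B^M, E^M, L_I^F, L_B^F, E^F, V].

    Assumption: only the slots of the author's gender are non-zero.
    """
    counts = Counter(tokens)
    lexical = [set_count(tokens, lex_initial),
               set_count(tokens, lex_boot),
               set_count(tokens, emoticons)]
    zeros = [0, 0, 0]
    male = lexical if gender == "M" else zeros
    female = lexical if gender == "F" else zeros
    return male + female + [counts[word] for word in vocab]


# Toy inputs, invented for illustration only.
tweet = ["love", "this", "awesome", "song", ":)"]
print(gind_subj_features(tweet, {"love"}, {"awesome"}, {":)"},
                         ["love", "song", "hate"]))
print(gdep_joint_subj_features(tweet, "F", {"love"}, {"awesome"}, {":)"},
                               ["love", "song", "hate"]))
```

The polarity vectors would follow the same pattern with separate positive and negative term sets for each slot.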


Subjectivity Classification Results using SL


Polarity Classification Results using SL


Summary

Empirical study of differences in subjective language between male and female users in Twitter.

Analysis of hashtag and emoticon usage across cultures.

Incorporating author gender as a model component can significantly improve subjectivity and polarity classification for multiple languages in social media.

Data: http://www.cs.jhu.edu/~svitlana/data/data_emnlp2013.zip


References

Bamman, D., Eisenstein, J., & Schnoebelen, T. (2012). Gender in Twitter: styles, stances, and social networks. Computing Research Repository.

Boneva, B., Kraut, R., & Frohlich, D. (2001). Using email for personal relationships: The difference gender makes. American Behavioral Scientist, 45(3), 530-549.

Eisenstein, J., O’Connor, B., Smith, N. A., & Xing, E. P. (2010). A latent variable model for geographic lexical variation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’10) (pp. 1277-1287).

Fan, T. K., & Chang, C. H. (2009). Sentiment-oriented contextual advertising. Advances in Information Retrieval, 5478, 202-215.

Gefen, D., & Ridings, C. M. (2005). If you spoke as she does, sir, instead of the way you do: a sociolinguistics perspective of gender differences in virtual communities. SIGMIS Database, 36(2), 78-92.

Holmes, J., & Meyerhoff, M. (2004). The handbook of language and gender. Blackwell Publishing.

Macaulay, R. (2006). Pure grammaticalization: The development of a teenage intensifier. Language Variation and Change, 18(03), 267-283.

Mohammad, S., & Yang, T. (2011). Tracking sentiment in mail: How genders differ on emotional axes. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA’11) (pp. 70-79).

O’Connor, B., Eisenstein, J., Xing, E. P., & Smith, N. A. (2010). A mixture model of demographic lexical variation. In Proceedings of NIPS Workshop on Machine Learning in Computational Social Science (pp. 1-7).

Ott, M., Choi, Y., Cardie, C., & Hancock, J. T. (2011). Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 309-319).

Picard, R. W. (1997). Affective computing. MIT Press.

Resnik, P. (2013). Getting real(-time) with live polling. (http://vimeo.com/68210812)

Riloff, E., & Wiebe, J. (2003). Learning extraction patterns for subjective expressions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’03) (pp. 105-112).

Tagliamonte, S. A. (2006). Analysing sociolinguistic variation. Cambridge University Press, 1st edition.

Volkova, S., Wilson, T., & Yarowsky, D. (2013). Exploring sentiment in social media: Bootstrapping subjectivity clues from multilingual Twitter streams. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL’13) (pp. 505-510).