
Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

Svitlana Volkova¹, Theresa Wilson² and David Yarowsky¹,²

¹Center for Language and Speech Processing, Johns Hopkins University
²Human Language Technology Center of Excellence

Motivation

Demographic language variations (DLV) have been studied by socio-linguists for decades (Picard, 1997; Gefen & Ridings, 2005; Holmes & Meyerhoff, 2004; Macaulay, 2006; Tagliamonte, 2006).

DLV have recently been explored in personal email communication, blog posts, and public discussions (Boneva et al., 2001; Mohammad & Yang, 2011; Eisenstein et al., 2010; Bamman et al., 2012).

We propose to study differences in subjective language in social media to support commercial applications:

- personalized recommendation systems and targeted online advertising (Fan & Chang, 2009),
- detecting helpful product reviews (Ott, Choi, Cardie, & Hancock, 2011),
- tracking sentiment in real time (Resnik, 2013),
- large-scale, low-cost, passive polling (O’Connor, Eisenstein, Xing, & Smith, 2010).

S. Volkova, T. Wilson, D. Yarowsky (JHU) Demographic Language Variations in Twitter 2 / 35


Motivation

Male ♂ and Female ♀ Twitter users use subjective terms differently:
♀+ “Chocolate is my weakness”
♂− “Clearly they know our weakness. Argggg....”


Goal

I. Explore gender bias in the use of subjective language in Twitter:

- investigate multilingual subjective lexical variations;
- cross-cultural emoticon and hashtag usage.

II. Incorporate gender bias into models to improve sentiment analysis for English, Spanish, and Russian:

- demonstrate that simple, binary features representing author gender are insufficient for gender-dependent sentiment analysis.


Data


Automatic gender label prediction using user first name morphology (precision is above 0.98 across languages).

Sentiment labels from Mechanical Turk (5 annotations per tweet):

- Positive: Как же приятно просто лечь в постель после тяжелого дня... (It is a great pleasure to go to bed after a long day at work...)
- Negative: Уважаемый господин Прохоров купите эти выборы! (Dear Mr. Prokhorov just buy the elections!)
- Both: Затолкали меня на местном рынке! но зато закупилась подарками для всей семьи :) (It was crowded at the local market! But I got presents for my family :-))
- Neutral: Киев очень старый город (Kiev is a very old city).


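Metrics: Lexical Evaluation across Genders

Term $t_i$ subjectivity:

$$p_{t_i}(\mathrm{subj} \mid g) = \frac{c(t_i, P, g) + c(t_i, N, g)}{c(t_i, g)},$$

Term $t_i$ polarity:

$$p_{t_i}(+ \mid g) = \frac{c(t_i, P, g)}{c(t_i, P, g) + c(t_i, N, g)},$$

Polarity change across genders:

$$\Delta p^{+}_{t_i} = \left| p_{t_i}(+ \mid F) - p_{t_i}(+ \mid M) \right| \quad \text{s.t.} \quad \left| 1 - \frac{tf^{\mathrm{subj}}_{t_i}(F)}{tf^{\mathrm{subj}}_{t_i}(M)} \right| \le \lambda, \quad tf^{\mathrm{subj}}_{t_i}(M) \ne 0,$$

where λ controls term frequency similarity.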
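The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `TermCounts` container, the function names, and the toy counts are all hypothetical, and the subjective term frequencies `tf_f` and `tf_m` are assumed to be precomputed by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermCounts:
    """Counts for one term t_i and one gender g (hypothetical container)."""
    pos: int    # c(t_i, P, g): occurrences in positive tweets
    neg: int    # c(t_i, N, g): occurrences in negative tweets
    total: int  # c(t_i, g): occurrences in all tweets by authors of gender g


def subjectivity(c: TermCounts) -> float:
    """p(subj | g) = (c(t,P,g) + c(t,N,g)) / c(t,g); 0.0 if the term is unseen."""
    return (c.pos + c.neg) / c.total if c.total else 0.0


def polarity(c: TermCounts) -> float:
    """p(+ | g) = c(t,P,g) / (c(t,P,g) + c(t,N,g)); 0.0 if never subjective."""
    subjective = c.pos + c.neg
    return c.pos / subjective if subjective else 0.0


def polarity_change(female: TermCounts, male: TermCounts,
                    tf_f: float, tf_m: float, lam: float) -> Optional[float]:
    """|p(+|F) - p(+|M)|, or None when the frequency constraint fails."""
    if tf_m == 0:
        return None  # constraint requires tf_subj(M) != 0
    if abs(1.0 - tf_f / tf_m) > lam:
        return None  # term frequencies differ too much across genders
    return abs(polarity(female) - polarity(male))


# Toy counts, invented for illustration only.
female = TermCounts(pos=3, neg=1, total=8)
male = TermCounts(pos=1, neg=3, total=8)
print(subjectivity(female))                                    # 0.5
print(polarity(female), polarity(male))                        # 0.75 0.25
print(polarity_change(female, male, 0.010, 0.012, lam=0.2))    # 0.5
```

Returning `None` when the constraint fails is a design choice of this sketch; it keeps terms with very different usage frequencies out of the ranking rather than assigning them a zero change.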


Lexical Evaluation across Genders for English

Figure legend: terms from the original lexicon $L_I$, terms from the bootstrapped lexicon $L_B$, and hashtags.


Lexical Evaluation for Spanish and Russian

Spanish:

- fiasco, triunfar (succeed) and #britneyspears used F+ but M−;
- horooriza (horrifies), #metallica and #latingrammy used F− but M+.

Russian:

- мечтайте (dream!), магический (magical) and совет (advice) used F+ but M−;
- исскушение (temptation), сложны (complicated), #iphones and #spartak (soccer team) used F− but M+.


How can gender differences in subjective language help subjectivity and polarity classification in social media?


Rule-based Subjectivity Classifiers

Gender-Independent (Riloff & Wiebe, 2003; Volkova et al., 2013):

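$$\mathrm{GIndRB}_{\mathrm{subj}} = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{f} \ge 0.5, \\ 0 & \text{otherwise.} \end{cases}$$

Gender-Dependent:

$$\mathrm{GDepRB}_{\mathrm{subj}} = \begin{cases} 1 & \text{if } \vec{w}^{M} \cdot \vec{f}^{M} \ge 0.5 \wedge M, \\ 0 & \text{otherwise.} \end{cases}$$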


$$\mathrm{GDepRB}_{\mathrm{subj}} = \begin{cases} 1 & \text{if } \vec{w}^{F} \cdot \vec{f}^{F} \ge 0.5 \wedge F, \\ 0 & \text{otherwise.} \end{cases}$$
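The subjectivity rules can be sketched as a thresholded dot product between lexicon weights and term counts. The sketch below rests on assumptions not stated on the slides: the feature vector is taken to be per-term counts in the tweet, the conditions "∧ M" and "∧ F" are read as "the author is male/female", and every weight and function name is a hypothetical placeholder.

```python
from collections import Counter
from typing import Dict, List

THRESHOLD = 0.5  # decision threshold from the rules above


def weighted_score(weights: Dict[str, float], tokens: List[str]) -> float:
    """Dot product w . f, with f assumed to be per-term counts in the tweet."""
    counts = Counter(tokens)
    return sum(w * counts[term] for term, w in weights.items())


def gind_rb_subj(tokens: List[str], weights: Dict[str, float]) -> int:
    """Gender-independent rule: 1 (subjective) if w . f >= 0.5, else 0."""
    return int(weighted_score(weights, tokens) >= THRESHOLD)


def gdep_rb_subj(tokens: List[str], gender: str,
                 weights_by_gender: Dict[str, Dict[str, float]]) -> int:
    """Gender-dependent rule: score with the weights of the author's gender."""
    return int(weighted_score(weights_by_gender[gender], tokens) >= THRESHOLD)


# Hypothetical weights, invented for illustration only.
W = {"weakness": 0.6}
W_BY_GENDER = {"F": {"weakness": 0.7}, "M": {"weakness": 0.3}}

tweet = "chocolate is my weakness".split()
print(gind_rb_subj(tweet, W))                  # 1
print(gdep_rb_subj(tweet, "F", W_BY_GENDER))   # 1
print(gdep_rb_subj(tweet, "M", W_BY_GENDER))   # 0
```

With these toy weights the same tweet is labeled subjective for a female author and not subjective for a male author, which is the kind of divergence the gender-dependent rule is meant to capture.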


Rule-based Subjectivity Classification Results

Start with $L_I$ and incrementally add Emoticons, Adjectives, AdveRbs, Verbs, and Nouns from $L_B$.


Rule-based Polarity Classifiers

Gender-Independent (Riloff & Wiebe, 2003; Volkova et al., 2013):

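$$\mathrm{GIndRB}_{\mathrm{pol}} = \begin{cases} 1 & \text{if } \vec{w}^{+} \cdot \vec{f}^{+} \ge \vec{w}^{-} \cdot \vec{f}^{-}, \\ 0 & \text{otherwise} \end{cases}$$

Gender-Dependent:

$$\mathrm{GDepRB}_{\mathrm{pol}} = \begin{cases} 1 & \text{if } \vec{w}^{M+} \cdot \vec{f}^{M+} \ge \vec{w}^{M-} \cdot \vec{f}^{M-} \wedge M, \\ 0 & \text{otherwise} \end{cases}$$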


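$$\mathrm{GDepRB}_{\mathrm{pol}} = \begin{cases} 1 & \text{if } \vec{w}^{F+} \cdot \vec{f}^{F+} \ge \vec{w}^{F-} \cdot \vec{f}^{F-} \wedge F, \\ 0 & \text{otherwise} \end{cases}$$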
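The polarity rules compare a positive score against a negative score instead of using a fixed threshold. The following sketch makes the same assumptions as before (count-based features, "∧ M" and "∧ F" read as the author's gender) and uses invented weights; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Weights = Dict[str, float]


def weighted_score(weights: Weights, tokens: List[str]) -> float:
    """Dot product w . f, with f assumed to be per-term counts in the tweet."""
    counts = Counter(tokens)
    return sum(w * counts[term] for term, w in weights.items())


def gind_rb_pol(tokens: List[str], w_pos: Weights, w_neg: Weights) -> int:
    """Gender-independent rule: 1 (positive) if w+ . f+ >= w- . f-, else 0."""
    return int(weighted_score(w_pos, tokens) >= weighted_score(w_neg, tokens))


def gdep_rb_pol(tokens: List[str], gender: str,
                w_pos_by_gender: Dict[str, Weights],
                w_neg_by_gender: Dict[str, Weights]) -> int:
    """Gender-dependent rule: same comparison with the author's gender weights."""
    return gind_rb_pol(tokens, w_pos_by_gender[gender], w_neg_by_gender[gender])


# Hypothetical weights, invented for illustration only.
W_POS = {"weakness": 0.5}
W_NEG = {"weakness": 0.5, "argggg": 0.8}
W_POS_G = {"F": {"weakness": 0.8}, "M": {"weakness": 0.2}}
W_NEG_G = {"F": {"weakness": 0.2}, "M": {"weakness": 0.8}}

tweet = "clearly they know our weakness argggg".split()
print(gind_rb_pol(tweet, W_POS, W_NEG))            # 0
print(gdep_rb_pol(tweet, "M", W_POS_G, W_NEG_G))   # 0
print(gdep_rb_pol(tweet, "F", W_POS_G, W_NEG_G))   # 1
```

Because the rule uses "≥", a tie (including a tweet with no lexicon terms at all) resolves to the positive class in this sketch.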


Rule-based Polarity Classification Results

Start with $L_I$ and incrementally add Emoticons, Adjectives, AdveRbs, Verbs, and Nouns from $L_B$.


Experimental Setup

Gender-Independent features: $V$ - unigram counts; $L_I$, $L_B$ - set-count features from the original and bootstrapped lexicons; and $E$ - emoticons.

$$\vec{f}^{\,GInd}_{subj} = [L_I, L_B, E, V];$$

$$\vec{f}^{\,GInd}_{pol} = [L_I^{+}, L_B^{+}, E^{+}, L_I^{-}, L_B^{-}, E^{-}, V].$$

Gender-Dependent joint features:

$$\vec{f}^{\,GDep-J}_{subj} = [L_I^{M}, L_B^{M}, E^{M}, L_I^{F}, L_B^{F}, E^{F}, V];$$

$$\vec{f}^{\,GDep-J}_{pol} = [L_I^{M+}, L_B^{M+}, E^{M+}, L_I^{F+}, L_B^{F+}, E^{F+}, L_I^{M-}, L_B^{M-}, E^{M-}, L_I^{F-}, L_B^{F-}, E^{F-}, V].$$
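As a rough illustration of the subjectivity feature vectors, the sketch below builds the gender-independent and the joint gender-dependent layouts for one tweet. Two things here are assumptions rather than facts from the slides: a "set-count" feature is taken to be the number of tweet tokens found in a lexicon, and the joint layout is assumed to fill only the block matching the author's gender while leaving the other block at zero. The lexicons, vocabulary, and function names are hypothetical.

```python
from typing import List, Set


def set_count(tokens: List[str], lexicon: Set[str]) -> int:
    """Number of tweet tokens that belong to the given lexicon (assumed meaning)."""
    return sum(1 for token in tokens if token in lexicon)


def unigram_counts(tokens: List[str], vocab: List[str]) -> List[int]:
    """V: count of each vocabulary word in the tweet, in vocabulary order."""
    return [tokens.count(word) for word in vocab]


def gind_subj_features(tokens: List[str], l_i: Set[str], l_b: Set[str],
                       emoticons: Set[str], vocab: List[str]) -> List[int]:
    """Layout [L_I, L_B, E, V]."""
    lexical = [set_count(tokens, l_i), set_count(tokens, l_b),
               set_count(tokens, emoticons)]
    return lexical + unigram_counts(tokens, vocab)


def gdep_j_subj_features(tokens: List[str], gender: str, l_i: Set[str],
                         l_b: Set[str], emoticons: Set[str],
                         vocab: List[str]) -> List[int]:
    """Layout [L_I^M, L_B^M, E^M, L_I^F, L_B^F, E^F, V].

    Assumption: only the block of the author's gender is populated.
    """
    lexical = [set_count(tokens, l_i), set_count(tokens, l_b),
               set_count(tokens, emoticons)]
    zeros = [0, 0, 0]
    male_block = lexical if gender == "M" else zeros
    female_block = lexical if gender == "F" else zeros
    return male_block + female_block + unigram_counts(tokens, vocab)


# Toy lexicons and vocabulary, invented for illustration only.
L_I = {"weakness"}
L_B = {"chocolate"}
E = {":)"}
VOCAB = ["chocolate", "is", "my", "weakness"]

tweet = ["chocolate", "is", "my", "weakness", ":)"]
print(gind_subj_features(tweet, L_I, L_B, E, VOCAB))
# [1, 1, 1, 1, 1, 1, 1]
print(gdep_j_subj_features(tweet, "F", L_I, L_B, E, VOCAB))
# [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

The polarity layouts follow the same pattern with each lexical feature split into a positive and a negative variant.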


Subjectivity Classification Results using SL


Polarity Classification Results using SL


Summary

- Empirical study of differences in subjective language between male and female users in Twitter.
- Analysis of hashtag and emoticon usage across cultures.
- Incorporating author gender as a model component can significantly improve subjectivity and polarity classification for multiple languages in social media.

Data: http://www.cs.jhu.edu/~svitlana/data/data_emnlp2013.zip


References

Bamman, D., Eisenstein, J., & Schnoebelen, T. (2012). Gender in Twitter: styles, stances, and social networks. Computing Research Repository.

Boneva, B., Kraut, R., & Frohlich, D. (2001). Using email for personal relationships: The difference gender makes. American Behavioral Scientist, 45(3), 530-549.

Eisenstein, J., O’Connor, B., Smith, N. A., & Xing, E. P. (2010). A latent variable model for geographic lexical variation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’10) (pp. 1277-1287).

Fan, T. K., & Chang, C. H. (2009). Sentiment-oriented contextual advertising. Advances in Information Retrieval, 5478, 202-215.

Gefen, D., & Ridings, C. M. (2005). If you spoke as she does, sir, instead of the way you do: a sociolinguistics perspective of gender differences in virtual communities. SIGMIS Database, 36(2), 78-92.

Holmes, J., & Meyerhoff, M. (2004). The handbook of language and gender. Blackwell Publishing.

Macaulay, R. (2006). Pure grammaticalization: The development of a teenage intensifier. Language Variation and Change, 18(03), 267-283.

Mohammad, S., & Yang, T. (2011). Tracking sentiment in mail: How genders differ on emotional axes. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA’11) (pp. 70-79).

O’Connor, B., Eisenstein, J., Xing, E. P., & Smith, N. A. (2010). A mixture model of demographic lexical variation. In Proceedings of NIPS Workshop on Machine Learning in Computational Social Science (pp. 1-7).

Ott, M., Choi, Y., Cardie, C., & Hancock, J. T. (2011). Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 309-319).

Picard, R. W. (1997). Affective computing. MIT Press.

Resnik, P. (2013). Getting real(-time) with live polling. (http://vimeo.com/68210812)

Riloff, E., & Wiebe, J. (2003). Learning extraction patterns for subjective expressions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’03) (pp. 105-112).

Tagliamonte, S. A. (2006). Analysing sociolinguistic variation. Cambridge University Press, 1st Edition.

Volkova, S., Wilson, T., & Yarowsky, D. (2013). Exploring sentiment in social media: Bootstrapping subjectivity clues from multilingual Twitter streams. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL’13) (pp. 505-510).
