Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media
Svitlana Volkova1, Theresa Wilson2 and David Yarowsky1,2
1Center for Language and Speech Processing, Johns Hopkins University
2Human Language Technology Center of Excellence
Motivation
Demographic language variations (DLV) have been studied by sociolinguists for decades (Picard, 1997; Gefen & Ridings, 2005; Holmes & Meyerhoff, 2004; Macaulay, 2006; Tagliamonte, 2006).
DLV have been recently explored in personal email communication, blog posts, and public discussions (Boneva et al., 2001; Mohammad & Yang, 2011; Eisenstein et al., 2010; Bamman et al., 2012).
We propose to study differences in subjective language in social media to support commercial applications:
personalized recommendation systems and targeted online advertising (Fan & Chang, 2009),
detecting helpful product reviews (Ott et al., 2011),
tracking sentiment in real time (Resnik, 2013),
large-scale, low-cost, passive polling (O’Connor et al., 2010).
S. Volkova, T. Wilson, D. Yarowsky (JHU) Demographic Language Variations in Twitter 2 / 35
Motivation
Male ♂ and Female ♀ Twitter users use subjective terms differently:
♀+ “Chocolate is my weakness”
♂− “Clearly they know our weakness. Argggg....”
Goal
I. Explore gender bias in the use of subjective language in Twitter:
investigate multilingual subjective lexical variations;
cross-cultural emoticon and hashtag usage.
II. Incorporate gender bias into models to improve sentiment analysis for English, Spanish, and Russian:
demonstrate that simple, binary features representing author gender are insufficient for gender-dependent sentiment analysis.
Data
Automatic gender label prediction using user first name morphology (precision is above 0.98 across languages).
Sentiment labels from Mechanical Turk (5 annotations per tweet):
Positive: Как же приятно просто лечь в постель после тяжелого дня... (It is a great pleasure to go to bed after a long day at work...)
Negative: Уважаемый господин Прохоров купите эти выборы! (Dear Mr. Prokhorov just buy the elections!)
Both: Затолкали меня на местном рынке! но зато закупилась подарками для всей семьи :) (It was crowded at the local market! But I got presents for my family :-))
Neutral: Киев очень старый город (Kiev is a very old city).
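Metrics: Lexical Evaluation across Genders
Term $t_i$ subjectivity:
\[ p_{t_i}(\mathrm{subj} \mid g) = \frac{c(t_i, P, g) + c(t_i, N, g)}{c(t_i, g)}, \]
Term $t_i$ polarity:
\[ p_{t_i}(+ \mid g) = \frac{c(t_i, P, g)}{c(t_i, P, g) + c(t_i, N, g)}, \]
Polarity change across genders:
\[ \Delta p^{+}_{t_i} = \left| p_{t_i}(+ \mid F) - p_{t_i}(+ \mid M) \right| \quad \text{s.t.} \quad \left| 1 - \frac{tf^{\mathrm{subj}}_{t_i}(F)}{tf^{\mathrm{subj}}_{t_i}(M)} \right| \le \lambda, \quad tf^{\mathrm{subj}}_{t_i}(M) \ne 0, \]
where $\lambda$ controls term frequency similarity.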
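The three metrics can be illustrated with a minimal Python sketch. The `TermCounts` container, the function names, and the toy counts are hypothetical and not part of the slides; the subjective term frequencies are passed in directly because the slides do not spell out how they are computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermCounts:
    """Hypothetical counts for one term t_i and one gender g."""
    pos: int    # c(t_i, P, g): occurrences in positive tweets
    neg: int    # c(t_i, N, g): occurrences in negative tweets
    total: int  # c(t_i, g): all occurrences for this gender


def subjectivity(c: TermCounts) -> float:
    """p(subj | g) = (c(P) + c(N)) / c(t_i, g); assumes total > 0."""
    return (c.pos + c.neg) / c.total


def polarity(c: TermCounts) -> float:
    """p(+ | g) = c(P) / (c(P) + c(N)); assumes pos + neg > 0."""
    return c.pos / (c.pos + c.neg)


def polarity_change(f: TermCounts, m: TermCounts,
                    tf_f: float, tf_m: float, lam: float) -> Optional[float]:
    """Return |p(+|F) - p(+|M)|, or None when the frequency constraint fails."""
    if tf_m == 0 or abs(1 - tf_f / tf_m) > lam:
        return None
    return abs(polarity(f) - polarity(m))


# Toy numbers, invented for illustration only.
female = TermCounts(pos=8, neg=2, total=20)
male = TermCounts(pos=3, neg=7, total=20)
delta = polarity_change(female, male, tf_f=10, tf_m=10, lam=0.5)
```

A smaller `lam` restricts the comparison to terms that both genders use with nearly the same subjective frequency.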
Lexical Evaluation across Genders for English
[Figure legend: terms from the original lexicon $L_I$, terms from the bootstrapped lexicon $L_B$, and hashtags.]
Lexical Evaluation for Spanish and Russian
Spanish:
fiasco, triunfar (succeed) and #britneyspears used F+ but M−;
horooriza (horrifies), #metallica and #latingrammy used F− but M+.
Russian:
мечтайте (dream!), магический (magical) and совет (advice) used F+ but M−;
исскушение (temptation), сложны (complicated), #iphones and #spartak (soccer team) used F− but M+.
How can gender differences in subjective language help subjectivity and polarity classification in social media?
Rule-based Subjectivity Classifiers
Gender-Independent (Riloff & Wiebe, 2003; Volkova et al., 2013):
\[ GInd^{RB}_{subj} = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{f} \ge 0.5, \\ 0 & \text{otherwise.} \end{cases} \]
Gender-Dependent, male authors (M):
\[ GDep^{RB}_{subj} = \begin{cases} 1 & \text{if } \vec{w}^{M} \cdot \vec{f}^{M} \ge 0.5 \wedge M, \\ 0 & \text{otherwise.} \end{cases} \]
Gender-Dependent, female authors (F):
\[ GDep^{RB}_{subj} = \begin{cases} 1 & \text{if } \vec{w}^{F} \cdot \vec{f}^{F} \ge 0.5 \wedge F, \\ 0 & \text{otherwise.} \end{cases} \]
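The threshold rule can be sketched in Python as follows. This is only an illustration under stated assumptions: lexicons are represented as term-to-weight dictionaries, the feature vector is taken to be the count of each lexicon term in the tweet, and all names and weights below are invented rather than taken from the slides.

```python
from typing import Dict, List

Lexicon = Dict[str, float]  # term -> weight (hypothetical representation)


def score(tokens: List[str], lexicon: Lexicon) -> float:
    """Dot product w . f, where f counts each lexicon term in the tweet."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def gind_rb_subj(tokens: List[str], lexicon: Lexicon,
                 threshold: float = 0.5) -> int:
    """Gender-independent rule: subjective (1) if the score reaches 0.5."""
    return 1 if score(tokens, lexicon) >= threshold else 0


def gdep_rb_subj(tokens: List[str], gender: str,
                 lexicons: Dict[str, Lexicon],
                 threshold: float = 0.5) -> int:
    """Gender-dependent rule: score with the lexicon of the author's gender."""
    return 1 if score(tokens, lexicons[gender]) >= threshold else 0


# Invented weights, for illustration only.
shared = {"weakness": 0.25}
by_gender = {"M": {"weakness": 1.0}, "F": {"weakness": 0.25}}
tokens = "clearly they know our weakness".split()

label_gind = gind_rb_subj(tokens, shared)
label_male = gdep_rb_subj(tokens, "M", by_gender)
label_female = gdep_rb_subj(tokens, "F", by_gender)
```

With these toy weights the same tweet is labeled subjective only when the male lexicon is applied, which is the kind of difference the gender-dependent rule is meant to capture.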
Rule-based Subjectivity Classification Results
Start with $L_I$ and incrementally add Emoticons (E), Adjectives (A), Adverbs (R), Verbs (V), Nouns (N) from $L_B$.
Rule-based Polarity Classifiers
Gender-Independent (Riloff & Wiebe, 2003; Volkova et al., 2013):
\[ GInd^{RB}_{pol} = \begin{cases} 1 & \text{if } \vec{w}^{+} \cdot \vec{f}^{+} \ge \vec{w}^{-} \cdot \vec{f}^{-}, \\ 0 & \text{otherwise.} \end{cases} \]
Gender-Dependent, male authors (M):
\[ GDep^{RB}_{pol} = \begin{cases} 1 & \text{if } \vec{w}^{M+} \cdot \vec{f}^{M+} \ge \vec{w}^{M-} \cdot \vec{f}^{M-} \wedge M, \\ 0 & \text{otherwise.} \end{cases} \]
Gender-Dependent, female authors (F):
\[ GDep^{RB}_{pol} = \begin{cases} 1 & \text{if } \vec{w}^{F+} \cdot \vec{f}^{F+} \ge \vec{w}^{F-} \cdot \vec{f}^{F-} \wedge F, \\ 0 & \text{otherwise.} \end{cases} \]
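The polarity rule compares a positive-lexicon score against a negative-lexicon score. The Python sketch below illustrates this under assumptions: lexicons are term-to-weight dictionaries, features are term counts, and the names and weights are hypothetical.

```python
from typing import Dict, List

Lexicon = Dict[str, float]  # term -> weight (hypothetical representation)


def score(tokens: List[str], lexicon: Lexicon) -> float:
    """Dot product w . f, where f counts each lexicon term in the tweet."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def gind_rb_pol(tokens: List[str], pos: Lexicon, neg: Lexicon) -> int:
    """Gender-independent rule: positive (1) if w+ . f+ >= w- . f-."""
    return 1 if score(tokens, pos) >= score(tokens, neg) else 0


def gdep_rb_pol(tokens: List[str], gender: str,
                pos: Dict[str, Lexicon], neg: Dict[str, Lexicon]) -> int:
    """Gender-dependent rule: use the polarity lexicons of the author's gender."""
    return gind_rb_pol(tokens, pos[gender], neg[gender])


# Invented weights echoing the "weakness" example; illustration only.
pos_by_gender = {"F": {"weakness": 1.0}, "M": {}}
neg_by_gender = {"F": {}, "M": {"weakness": 1.0}}
tokens = ["chocolate", "is", "my", "weakness"]

label_female = gdep_rb_pol(tokens, "F", pos_by_gender, neg_by_gender)
label_male = gdep_rb_pol(tokens, "M", pos_by_gender, neg_by_gender)
```

Because the rule uses a non-strict inequality, a tweet with no lexicon matches at all (both scores zero) is labeled positive in this sketch.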
Rule-based Polarity Classification Results
Start with $L_I$ and incrementally add Emoticons (E), Adjectives (A), Adverbs (R), Verbs (V), Nouns (N) from $L_B$.
Experimental Setup
Gender-Independent features:
V - unigram counts, $L_I$, $L_B$ - set-count features from the original and bootstrapped lexicons, and E - emoticons
\[ \vec{f}^{GInd}_{subj} = [L_I, L_B, E, V]; \]
\[ \vec{f}^{GInd}_{pol} = [L_I^{+}, L_B^{+}, E^{+}, L_I^{-}, L_B^{-}, E^{-}, V]. \]
Gender-Dependent joint features:
\[ \vec{f}^{GDep\text{-}J}_{subj} = [L_I^{M}, L_B^{M}, E^{M}, L_I^{F}, L_B^{F}, E^{F}, V]; \]
\[ \vec{f}^{GDep\text{-}J}_{pol} = [L_I^{M+}, L_B^{M+}, E^{M+}, L_I^{F+}, L_B^{F+}, E^{F+}, L_I^{M-}, L_B^{M-}, E^{M-}, L_I^{F-}, L_B^{F-}, E^{F-}, V]. \]
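As a rough sketch of how the joint subjectivity feature vector might be assembled, the Python below builds the layout [L_I^M, L_B^M, E^M, L_I^F, L_B^F, E^F, V]. It assumes each set-count feature is the number of tweet tokens found in the corresponding gender-specific term set; that reading, the function names, and the toy lexicons are all assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import Dict, List, Set


def set_count(tokens: List[str], terms: Set[str]) -> int:
    """Number of tokens in the tweet that belong to the given term set."""
    return sum(1 for tok in tokens if tok in terms)


def gdep_joint_subj_features(tokens: List[str],
                             lexicons: Dict[str, Dict[str, Set[str]]],
                             vocab: List[str]) -> List[int]:
    """Build [L_I^M, L_B^M, E^M, L_I^F, L_B^F, E^F, V] for one tweet."""
    features: List[int] = []
    for gender in ("M", "F"):
        for name in ("L_I", "L_B", "E"):
            features.append(set_count(tokens, lexicons[gender][name]))
    counts = Counter(tokens)
    # V: unigram counts over a fixed vocabulary order.
    features.extend(counts[word] for word in vocab)
    return features


# Toy lexicons and vocabulary, invented for illustration only.
lexicons = {
    "M": {"L_I": {"weakness"}, "L_B": set(), "E": {":("}},
    "F": {"L_I": {"weakness"}, "L_B": {"chocolate"}, "E": {":)"}},
}
vocab = ["chocolate", "is", "my", "weakness"]
tokens = ["chocolate", "is", "my", "weakness", ":)"]
features = gdep_joint_subj_features(tokens, lexicons, vocab)
```

The polarity variant would follow the same pattern with separate positive and negative term sets per gender.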
Subjectivity Classification Results using SL
Polarity Classification Results using SL
Summary
Empirical study of differences in subjective language between male and female users in Twitter.
Analysis of hashtag and emoticon usage across cultures.
Incorporating author gender as a model component can significantly improve subjectivity and polarity classification for multiple languages in social media.
Data: http://www.cs.jhu.edu/~svitlana/data/data_emnlp2013.zip
References I
Bamman, D., Eisenstein, J., & Schnoebelen, T. (2012). Gender in Twitter: styles, stances, and social networks. Computing Research Repository.
Boneva, B., Kraut, R., & Frohlich, D. (2001). Using email for personal relationships: The difference gender makes. American Behavioral Scientist, 45(3), 530-549.
Eisenstein, J., O’Connor, B., Smith, N. A., & Xing, E. P. (2010). A latent variable model for geographic lexical variation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’10) (pp. 1277-1287).
Fan, T. K., & Chang, C. H. (2009). Sentiment-oriented contextual advertising. Advances in Information Retrieval, 5478, 202-215.
References II
Gefen, D., & Ridings, C. M. (2005). If you spoke as she does, sir, instead of the way you do: a sociolinguistics perspective of gender differences in virtual communities. SIGMIS Database, 36(2), 78-92.
Holmes, J., & Meyerhoff, M. (2004). The handbook of language and gender. Blackwell Publishing.
Macaulay, R. (2006). Pure grammaticalization: The development of a teenage intensifier. Language Variation and Change, 18(03), 267-283.
Mohammad, S., & Yang, T. (2011). Tracking sentiment in mail: How genders differ on emotional axes. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA’11) (pp. 70-79).
References III
O’Connor, B., Eisenstein, J., Xing, E. P., & Smith, N. A. (2010). A mixture model of demographic lexical variation. In Proceedings of NIPS Workshop on Machine Learning in Computational Social Science (pp. 1-7).
Ott, M., Choi, Y., Cardie, C., & Hancock, J. T. (2011). Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 309-319).
Picard, R. W. (1997). Affective computing. MIT Press.
Resnik, P. (2013). Getting real(-time) with live polling. (http://vimeo.com/68210812)
Riloff, E., & Wiebe, J. (2003). Learning extraction patterns for subjective expressions. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’03) (pp. 105-112).
References IV
Tagliamonte, S. A. (2006). Analysing sociolinguistic variation. Cambridge University Press, 1st Edition.
Volkova, S., Wilson, T., & Yarowsky, D. (2013). Exploring sentiment in social media: Bootstrapping subjectivity clues from multilingual Twitter streams. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL’13) (pp. 505-510).