U.S. Religious Landscape on Twitter



DESCRIPTION

Presentation at SocInfo2014

TRANSCRIPT

Page 1: U.S. Religious Landscape on Twitter

U.S. Religious Landscape on Twitter


Lu Chen

Ingmar Weber

Adam Okulicz-Kozaryn

This work was done while the first author was an intern at Qatar Computing Research Institute.

Page 2: U.S. Religious Landscape on Twitter


Religiosity is a powerful force shaping human societies.

source: http://www.pewforum.org/2012/12/18/global-religious-landscape-exec/#

Page 3: U.S. Religious Landscape on Twitter

Social networking facilitates the replication of religion.

• A key feature of any belief system such as religion is replication.

– Vertically: to new generations

– Horizontally: to new adherents

• As more religious leaders, organizations, and believers start using social networking sites, online activities become important extensions of traditional religious rituals and practices.

“74% of online adults use social networking sites.” January 2014

source: http://bit.ly/1qBBhgq

What can we learn about religion from social media?

Page 4: U.S. Religious Landscape on Twitter


Collecting Twitter users who self-reported their religions in bios

Searching Twitter user bios with religion-specific keywords:

https://followerwonk.com/bio/?q=Christian&q_type=bio

[Screenshot: search results, each listed as Username @screen_name]
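The keyword search over bios can be sketched roughly as follows. This is a minimal illustration, not the collection pipeline used in the study: the keyword lists in `KEYWORDS` and the rule for ambiguous bios are assumptions, and bios are assumed to be already available as plain strings.

```python
import re

# Hypothetical keyword lists; the slides do not give the actual keywords.
KEYWORDS = {
    "Atheist": ["atheist", "atheism"],
    "Buddhist": ["buddhist", "buddhism"],
    "Christian": ["christian", "christianity"],
    "Hindu": ["hindu", "hinduism"],
    "Jew": ["jew", "jewish", "judaism"],
    "Muslim": ["muslim", "islam"],
}

# One case-insensitive whole-word pattern per group.
PATTERNS = {
    group: re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    for group, words in KEYWORDS.items()
}


def matching_groups(bio):
    """Return the set of groups whose keywords appear in the bio."""
    return {group for group, pattern in PATTERNS.items() if pattern.search(bio)}


def assign_group(bio):
    """Return the single matching group, 'Undeclared' if no keyword matches,
    or None when several groups match (an assumed rule for ambiguous bios)."""
    groups = matching_groups(bio)
    if not groups:
        return "Undeclared"
    if len(groups) == 1:
        return next(iter(groups))
    return None
```

Whole-word matching keeps a bio such as "jewelry designer" from being counted as a match for "jew".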

Page 5: U.S. Religious Landscape on Twitter

The dataset comprises 250,840 U.S. Twitter users.

• The “undeclared” user group: a random set of users who do not report any of the six religions/beliefs in their bios.

• The dataset comprises 250,840 U.S. Twitter users, the lists of their friends/followers, and 96,902,499 tweets.

• On average, Atheists appear to be more active than religious users, while the undeclared group generally appears to be less active than the other groups.

Page 6: U.S. Religious Landscape on Twitter


Data Validation

Twitter Bio Examples

Page 7: U.S. Religious Landscape on Twitter

How does the fraction of religious people of any belief within a given state on Twitter correlate with that in surveys?

r = .79 (p < .0001)
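For reference, a Pearson correlation between per-state Twitter fractions and per-state survey fractions can be computed as in the sketch below. The two input lists are placeholders for one value per state; no actual state-level figures from the study are reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder inputs: one fraction per state from each source (made-up values).
twitter_fraction = [0.91, 0.88, 0.95, 0.85]
survey_fraction = [0.84, 0.80, 0.90, 0.78]
r = pearson_r(twitter_fraction, survey_fraction)
```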

Page 8: U.S. Religious Landscape on Twitter

How does the distribution of religious people of a given belief across U.S. states on Twitter correlate with that in surveys?

[Figure: six scatter plots, one per group (Atheist, Christian, Muslim, Jew, Hindu, Buddhist). x-axis: Pew Research; y-axis: Twitter.]

r: Pearson correlation; ρ: Spearman's rank correlation.

Atheist: r = 0.56 ****, ρ = 0.62 ****

Buddhist: r = 0.23, ρ = 0.75 ****

Remaining four panels (Christian, Muslim, Jew, Hindu; panel assignment not preserved):

r = 0.73 ****, ρ = 0.77 ****

r = 0.30 *, ρ = 0.48 ***

r = 0.77 ****, ρ = 0.79 ****

r = 0.16, ρ = 0.49 ***

source: http://religions.pewforum.org/

* significant at p < .05; ** significant at p < .005; *** significant at p < .001; **** significant at p < .0001
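The gap between r and ρ for some groups (for example Buddhist, r = 0.23 vs. ρ = 0.75) is what one would expect when the relationship is monotonic but not linear. The sketch below computes Spearman's ρ as the Pearson correlation of average ranks, then averages the six ρ values shown on this slide. The helper names and the toy data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy data: perfectly monotonic, but one outlier breaks linearity.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 100]
toy_r = pearson_r(xs, ys)
toy_rho = spearman_rho(xs, ys)

# Macro-average of the six rho values reported on the slide.
slide_rhos = [0.62, 0.75, 0.77, 0.48, 0.79, 0.49]
macro_avg_rho = sum(slide_rhos) / len(slide_rhos)
```

On the toy data the rank correlation is perfect while the Pearson correlation is noticeably lower, which mirrors the pattern in the low-r, high-ρ panels.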

Page 9: U.S. Religious Landscape on Twitter


Do various denominations differ in terms of their content?

The top 15 most discriminative words of each denomination based on a chi-square test

Each group is represented by a different color, and the font size of a word is determined by its chi-square score.

• The discriminative words are largely religion-specific.

• Non-religious terms also appear as discriminative features.
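One common way to score a word with a chi-square test is a 2x2 table of users (in the group or not, using the word or not). The sketch below follows that approach. Whether the study counted users or tweets, and restricting the ranking to words over-represented in the group, are assumptions made here for illustration.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: group users who use the word     b: group users who do not
    c: other users who use the word     d: other users who do not
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_discriminative_words(group_users, other_users, k=15):
    """Rank words by chi-square score.

    group_users / other_users: non-empty lists with one set of words per user.
    Only words over-represented in the group are ranked (an assumption).
    """
    in_counts = Counter(w for words in group_users for w in words)
    out_counts = Counter(w for words in other_users for w in words)
    n_in, n_out = len(group_users), len(other_users)
    scores = {}
    for word, a in in_counts.items():
        c = out_counts.get(word, 0)
        if a / n_in > c / n_out:
            scores[word] = chi_square_2x2(a, n_in - a, c, n_out - c)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The same scoring applies unchanged to followed accounts if each user's set holds account names instead of words.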

Page 10: U.S. Religious Landscape on Twitter


Do various denominations differ in terms of their friends?

The top 15 most discriminative Twitter accounts being followed by each denomination based on a chi-square test

Each group is represented by a different color, and the font size of an account is determined by its chi-square score.

• The discriminative Twitter accounts are also largely religion-specific.

Page 11: U.S. Religious Landscape on Twitter


The top 15 most frequent words for each denomination.

The top 15 Twitter accounts being followed by most users of each denomination

• In a sense, people differ more in whom they follow than in what they tweet about.

Page 12: U.S. Religious Landscape on Twitter


Can we build classifiers to accurately identify believers of different religions?

• Tweet-based:

– Each user is represented as a vector of unigrams and bigrams (df >= 100) extracted from their tweets.

– Each entry of the vector is the frequency of that n-gram in the user's tweets.

• Friend-based:

– Each user is represented as a vector of their friends.

– Each entry of the vector indicates whether the user follows a given account.

• Binary classification: each denomination vs. the undeclared user group

– SVM classifiers

– 10-fold cross-validation
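The two user representations can be sketched as below. This is an illustration under assumptions: tweets are taken to be already tokenized, document frequency (df) is counted as the number of users whose tweets contain the n-gram, and the function names are hypothetical. SVM training itself is not shown.

```python
from collections import Counter


def ngrams(tokens):
    """Unigrams plus bigrams of a token list."""
    return list(tokens) + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def build_vocabulary(user_tokens, min_df=100):
    """Keep n-grams that appear for at least min_df users.

    user_tokens: dict mapping user id -> list of tokens from that user's tweets.
    """
    df = Counter()
    for tokens in user_tokens.values():
        df.update(set(ngrams(tokens)))  # count each n-gram once per user
    return sorted(gram for gram, count in df.items() if count >= min_df)


def tweet_vector(tokens, vocabulary):
    """Tweet-based features: frequency of each vocabulary n-gram."""
    counts = Counter(ngrams(tokens))
    return [counts[gram] for gram in vocabulary]


def friend_vector(friends, accounts):
    """Friend-based features: 1 if the user follows the account, else 0."""
    return [1 if account in friends else 0 for account in accounts]
```

For example, with `min_df=2` and two users whose tokens are `["god", "is", "good"]` and `["god", "is", "great"]`, the vocabulary keeps only the n-grams both users share.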

Page 13: U.S. Religious Landscape on Twitter


Can we build classifiers to accurately identify believers of different religions?

From easiest to hardest (based on F1 Score):

• Tweet-based: Atheist, Jew, Christian, Buddhist, Muslim, Hindu

• Friend-based: Muslim, Atheist, Buddhist, Jew, Christian, Hindu

• Network “following” features appear to be superior to content features.
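As a reminder of how the ranking metric is defined, the sketch below computes the F1 score for a binary task and produces simple k-fold train/test splits. The interleaved fold assignment is an assumption for illustration; the slides do not say how folds were drawn.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold i tests items i, i+k, i+2k, ..."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test
```

A per-group score would then be the mean F1 over the ten held-out folds.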

Page 14: U.S. Religious Landscape on Twitter


Does network assortativity exist?

• Assortativity is a preference for a network's nodes to attach to others that are similar in some way. -- Wikipedia

• Connections

– following, being-followed-by, mentioning, and retweeting

• Raw proportions:

– For each user in our dataset, calculate the proportion of in-group connections and the proportion of connections to users from each other group

– Average the in-group and out-group proportions over the users of each group

• Expected proportions:

– Estimated by the fraction of users of a certain religion in a random user sample
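The raw-proportion step can be sketched as follows, for one connection type at a time. The data layout and function name are assumptions, as is the choice to divide by all of a user's connections (including accounts outside the labeled dataset).

```python
from collections import defaultdict


def average_connection_proportions(connections, group_of):
    """Average, per group, the share of each user's connections going to each group.

    connections: dict user -> list of connected accounts (e.g. accounts followed)
    group_of:    dict user -> group label, for users in the dataset
    Returns:     dict source_group -> dict target_group -> average proportion
    """
    sums = defaultdict(lambda: defaultdict(float))
    users_per_group = defaultdict(int)
    for user, others in connections.items():
        if not others:
            continue  # a user without connections contributes nothing
        source = group_of[user]
        users_per_group[source] += 1
        for other in others:
            target = group_of.get(other)  # None for accounts outside the dataset
            if target is not None:
                sums[source][target] += 1 / len(others)
    return {
        source: {t: total / users_per_group[source] for t, total in targets.items()}
        for source, targets in sums.items()
    }
```

The entry for a group with itself is the in-group proportion; the other entries are the out-group proportions.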

Page 15: U.S. Religious Landscape on Twitter

Does network assortativity exist?

The relative difference of the proportion of following a denomination to its expected value

              Atheist   Buddhist  Christian      Hindu        Jew     Muslim
Atheist       46.6788    -0.0955    -0.8012     -0.294     1.0077    -0.7597
Buddhist       0.3621    82.8597    -0.8737     1.6318      0.179    -0.7264
Christian     -0.6304    -0.8647     2.2415     -0.917     0.0458    -0.8206
Hindu          0.1886     0.6009    -0.8651   737.2847     0.7622    -0.8373
Jew           -0.1657    -0.5392     -0.674    -0.7853   392.3228    -0.2192
Muslim        -0.6393    -0.8425    -0.8442    -0.6871     0.9668    60.6716
Undeclared    -0.4572    -0.7357    -0.6621    -0.8843     0.0908    -0.8164

Yes, users are much more likely to follow other users of the same religion/belief than of a different religion/belief.
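The slide does not spell out the formula. A natural reading of "relative difference to the expected value" is (observed - expected) / expected, which is assumed in the sketch below. Under that reading, 0 means the observed proportion equals the expected one, -1 means no such connections at all, and a value such as 46.6788 means the observed proportion is about 47.7 times the expected one.

```python
def relative_difference(observed, expected):
    """(observed - expected) / expected; assumed definition, see lead-in."""
    if expected <= 0:
        raise ValueError("expected proportion must be positive")
    return (observed - expected) / expected


def relative_difference_table(observed, expected):
    """Apply relative_difference to every (source group, target group) cell.

    observed: dict source_group -> dict target_group -> observed proportion
    expected: dict target_group -> expected proportion
    """
    return {
        source: {
            target: relative_difference(value, expected[target])
            for target, value in targets.items()
        }
        for source, targets in observed.items()
    }
```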

Page 16: U.S. Religious Landscape on Twitter

Does network assortativity exist?

[Figure: the proportion of same-religion relations of each religious group. Percentage labels shown in the figure: 0.0466%, 0.0259%, 1.3358%, 0.0013%, 0.0207%, 0.0414%.]

Yes, assortativity exists in all types of connections across all the religious groups.

Page 17: U.S. Religious Landscape on Twitter

Summary

• There is a moderate correlation between survey results and Twitter data.

– The macro-average Spearman's rank correlation over all the denominations is .65, regarding the distribution of religious believers of a given denomination across states.

– The Pearson correlation is .79 (p < .0001), regarding the fraction of religious people of any belief within a given state.

• Twitter users of a particular religion differ from undeclared users in what they discuss or whom they follow.

• The network “following” features are more robust than tweet content features in identifying believers.

• Assortativity exists in all types of connections across all the religious groups.

Note: only Twitter users who publicly declare their religion are included in our data; the vast majority of believers may not disclose their religion in their bios and are thus not included.

Page 18: U.S. Religious Landscape on Twitter


There is interest in the topic!