balancing diversity to counter-measure geographical centralization in microblogging platforms

Balancing Diversity to Counter-measure Geographical Centralization in Microblogging Platforms Eduardo Graells-Garrido Web Research Group Universitat Pompeu Fabra Barcelona, Spain Mounia Lalmas Yahoo Labs London, UK Hypertext Sept. 4, 2014 Santiago, Chile


DESCRIPTION

Presented at Hypertext 2014.

TRANSCRIPT

Page 1: Balancing Diversity to Counter-measure Geographical Centralization in Microblogging Platforms

Balancing Diversity to Counter-measure Geographical Centralization in Microblogging Platforms

Eduardo Graells-Garrido, Web Research Group, Universitat Pompeu Fabra, Barcelona, Spain

Mounia Lalmas, Yahoo Labs, London, UK

Hypertext, Sept. 4, 2014, Santiago, Chile

Page 2

Motivation: Geographical Centralization

Every person behaves in a biased way (homophily, selective exposure, etc.) in both physical and virtual worlds.

Does the same happen with systematic biases?

Chile is a centralized country: public policy, population migration and media are biased towards its capital. This increases the population imbalance, and vice versa!

Page 3

Some Effects of Geographical Centralization

This affects Web users as content is not geographically diverse (mostly related to/from Santiago). Content from other locations is hidden and hard to find.

(I was at WWW when I searched for this. “Everywhere” displays relevant tweets from Santiago only.)

Page 4

Problem Statement

Detect and Measure Geographical Centralization: Is centralization reflected on microblogging platforms?

Tweet Classification into Locations: How to find tweets from other locations in imbalanced contexts?

[Rout et al, HT 2013] studied geolocation in imbalanced populations from a network perspective. We follow a similar approach from a content perspective.

Information Filtering (Geographically Diverse Timeline): How to build a geographically diverse timeline?

We build upon prior work on information diversity filtering: [De Choudhury et al, HT 2011] and [Munson et al, ICWSM 2009].

Page 5

Case Study: Chile, Municipal Elections 2012

Is Geographical Centralization Reflected on Twitter?

Page 6

Frequent Terms

Page 7

Dataset: #municipales2012

Locally Important: denser network discussions; local vocabulary (classification)

National Level: interactions between locations

Query Keywords: hashtags, conjugations of "to vote", candidate names, political institutions, locations

Using self-reported location, 27.95% of users are geolocated at the regional level. They published 42.15% of the tweets in the dataset.

Ideal characteristics, but there is a need to classify tweets.

Page 8

Physical and Virtual Population Distributions

We consider the sample geographically representative.

r = 0.95, p < 0.01. Source: Census 2012*

r = 0.68, p < 0.01. Source: CASEN Survey

Imbalanced Population (Different Orders of Magnitude)

Balanced Representation (Equal Orders of Magnitude)
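The slide does not say how the r values were computed. Assuming they are Pearson correlation coefficients between per-region physical and virtual population counts, a minimal sketch could look like the following. The function name `pearson_r` and the input layout are illustrative, and the p-value is not computed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences.

    Hypothetical usage: xs = population of each region (e.g. from a census),
    ys = number of geolocated users in the same regions, in the same order.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, both without the 1/n factor
    # (the factor cancels in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)
```

Because Pearson's r is sensitive to outliers, one very large region can dominate the coefficient when populations differ by orders of magnitude.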

Page 9

Is the Chilean Virtual Population in Twitter centralized towards the capital Metropolitan Region?

Page 10

Interactions Between Locations

Adjacency Matrix of 1-way interactions. [Quercia et al, 2012]

M(i,j) = mentions(L_i, L_j) + retweets(L_i, L_j)

Each arc in the visualization represents one M(i,j), with L_i on the left and L_j on the right.

Green edges indicate i = j. Brown edges indicate j = Santiago (RM). The rest are gray.
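As a rough illustration of the matrix M and the edge coloring described above, here is a minimal sketch. The input format (a list of `(kind, source_location, target_location)` tuples) and the function names are assumptions made for the example, not the authors' code.

```python
from collections import defaultdict

def interaction_matrix(interactions):
    """Build M, where M[i][j] = mentions(L_i, L_j) + retweets(L_i, L_j).

    `interactions` is an iterable of (kind, source_location, target_location)
    tuples; this layout is hypothetical. Other kinds of interaction are ignored.
    """
    M = defaultdict(lambda: defaultdict(int))
    for kind, src, dst in interactions:
        if kind in ("mention", "retweet"):
            M[src][dst] += 1  # 1-way: counted from source towards target only
    return M

def edge_color(i, j, capital="RM"):
    """Color rule from the slide: green for i = j, brown for j = capital, gray otherwise."""
    if i == j:
        return "green"
    if j == capital:
        return "brown"
    return "gray"
```

Checking `i == j` first means that interactions within the capital itself are drawn green, not brown.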

Page 11

Geographical Centralization

We explain the extreme differences between observations and expectations as geographical centralization towards Santiago (Metropolitan Region).

Observed Centrality: estimated from a graph based on M.

Expected Centrality: estimated from a graph with edge weights based on location populations.
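The slides do not name the centrality measure or the exact population-based weighting. The sketch below therefore swaps in a deliberately simple stand-in: each location's share of incoming edge weight. It also assumes expected edge weights proportional to the product of the two locations' populations. All names and both modelling choices are assumptions for illustration only.

```python
def in_strength_share(weights):
    """weights: dict mapping (i, j) -> edge weight.

    Returns, for each target j, its share of the total incoming weight.
    """
    total = sum(weights.values())
    if total == 0:
        return {}
    shares = {}
    for (_, j), w in weights.items():
        shares[j] = shares.get(j, 0.0) + w / total
    return shares

def expected_weights(populations):
    """Assumed null model: weight of (i, j) is population_i * population_j."""
    return {(i, j): p_i * p_j
            for i, p_i in populations.items()
            for j, p_j in populations.items()}

def centralization_ratio(observed, populations):
    """Observed share divided by expected share, per location.

    Values well above 1 suggest a location attracts more interactions
    than its population alone would predict.
    """
    obs = in_strength_share(observed)
    exp = in_strength_share(expected_weights(populations))
    return {loc: obs.get(loc, 0.0) / exp[loc] for loc in populations}
```

Under this null model the expected share of a location equals its population share, so the ratio reads directly as over- or under-representation.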

Page 12

How to make timelines more Geographically Diverse?

Shannon Entropy with respect to geography
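For reference, Shannon entropy over the locations in a timeline is H = -sum over locations of p(l) * log2 p(l), where p(l) is the fraction of tweets from location l. A minimal sketch follows. The function name and the choice of base 2 are assumptions.

```python
import math
from collections import Counter

def geographic_entropy(locations):
    """Shannon entropy (in bits) of the location labels of a timeline.

    `locations` holds one label per tweet. The result is 0 when every tweet
    comes from the same place and is largest when all locations are equally
    represented.
    """
    counts = Counter(locations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A timeline drawn only from Santiago scores 0, while one split evenly across four regions scores 2 bits.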

Page 13

First: Classifying Tweets into Locations with Diversity

We built a corpus of location documents. For classification, we represent a tweet as a vector of cosine similarities with each location document, weighted using TF-IDF. We evaluate with 10-fold cross-validation.

Similarity features provide more geographical diversity (lost because of population imbalance) and are overall more accurate than bag-of-words approaches.
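The feature extraction described above can be sketched as follows, assuming tweets and location documents are already tokenized. The smoothed IDF formula and all function names are assumptions. The classifier that consumes these features is not shown, since the slides do not specify it.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: dict mapping location name -> list of tokens (one document per location).

    Returns (vectors, idf), where vectors maps each location to a sparse
    TF-IDF vector stored as a dict. The smoothed IDF is an assumed variant.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {name: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for name, tokens in docs.items()}
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_features(tweet_tokens, location_vectors, idf):
    """One feature per location: cosine similarity between tweet and location document."""
    # Tokens unseen in the location corpus have no IDF and are dropped.
    tweet_vec = {t: tf * idf[t] for t, tf in Counter(tweet_tokens).items() if t in idf}
    return {name: cosine(tweet_vec, vec) for name, vec in location_vectors.items()}
```

The feature vector has one dimension per location rather than one per vocabulary term, so its size does not grow with the vocabulary.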

Similarity Features

BOW Features

Page 14

Second: Filtering Tweets to build a Geo. Diverse Timeline

We iteratively add tweets to a timeline T. Each added tweet maximizes T’s information entropy [Choudhury et al, 2011], but we enforce geographical diversity of those additions [Munson et al, 2009].
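The slides do not spell out how the geographical constraint is enforced. One possible reading of this filtering step is a greedy loop that only considers tweets from the locations least represented so far, then picks the one that maximizes the entropy of the timeline's token distribution. The sketch below follows that reading. The tweet layout (`id`, `location`, `tokens`), the use of tokens as the information features, and the function names are all assumptions.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a Counter of feature occurrences."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

def build_timeline(tweets, k):
    """Greedily pick up to k tweets.

    tweets: list of dicts with hypothetical keys 'id', 'location', 'tokens'.
    """
    timeline = []
    token_counts = Counter()
    location_counts = Counter()
    remaining = list(tweets)
    while remaining and len(timeline) < k:
        # Assumed geographic constraint: restrict candidates to the
        # locations that are least represented in the timeline so far.
        fewest = min(location_counts[t["location"]] for t in remaining)
        pool = [t for t in remaining if location_counts[t["location"]] == fewest]
        # Entropy step: choose the candidate whose addition yields the
        # highest entropy of the timeline's token distribution.
        best = max(pool, key=lambda t: entropy(token_counts + Counter(t["tokens"])))
        timeline.append(best)
        token_counts.update(best["tokens"])
        location_counts[best["location"]] += 1
        remaining.remove(best)
    return timeline
```

Ties are broken by input order, so sorting the candidates by popularity beforehand would be one way to favor popular tweets among equally diverse ones.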

Page 15

Empirical Observations: Election results start to appear!

Unexpected results in some locations! Discussion becomes a bit more global. In all cases, geographical diversity exists.

The Proposed Method (PM) is more geographically diverse than the baselines: DIV [Choudhury et al, HT 2011] and POP (top-k popular tweets).

In terms of social voting, PM has more representation of popular tweets than DIV.

Page 16

Overview of Results

Is centralization reflected on microblogging platforms? Yes! As with other behavioral biases (homophily, selective exposure), the systematic bias of geographical centralization is also present and measurable.

How to find tweets from other locations? Consider imbalance-aware features, such as content similarity metrics. This improves the diversity of classifications without losing accuracy.

How to build a geographically diverse timeline? A correct mixture of known techniques can have the desired effects without trade-offs (we gained representation of popular tweets and did not lose information diversity). In contrast to sensitive contexts where selective exposure is crucial, geographical diversity is less likely to generate cognitive dissonance.

Page 17

Future Work

User Evaluation: Is geographical diversity interesting?

Visualization and User Interfaces: Is geographical diversity engaging?

Page 18

Questions?

Thanks for attending!

Contact: @carnby

http://carnby.github.io

Special Thanks: Dany Passarinho, Bárbara Poblete, Diego Sáez-Trumper and Anonymous Reviewers

This work was partially funded by Grant TIN2012-38741 (Understanding Social Media: An Integrated Data Mining Approach) of the Ministry of Economy and Competitiveness of Spain.

https://www.flickr.com/photos/malikaladak/8868491759
https://www.flickr.com/photos/28047774@N04/6312764345
https://www.flickr.com/photos/iron_horses/6274365371
https://www.flickr.com/photos/efimeravulgata/1429969601

Page 19

Additional Data :)
