Mobility analysis from Twitter data NTTS 2015 - satellite Workshop on Big Data

Upload: gerard-walsh

Post on 24-Dec-2015


Page 1

Mobility analysis from Twitter data

NTTS 2015 - satellite Workshop on Big Data

Page 2

Twitter as data source

Diagram: real-time tweets from Twitter, filtered to geo-referenced tweets only (México), stored by INEGI in a NoSQL database.
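The "geo-referenced only" filter can be sketched as follows, assuming tweets arrive as Twitter API v1.1 JSON objects, where the coordinates field is either null or a GeoJSON point in [longitude, latitude] order. The bounding box is an approximate, hypothetical stand-in for Mexico's territory; a rectangle also catches parts of the United States and Central America, so a production filter would more likely test points against official boundary polygons.

```python
from typing import Optional, Tuple

# Approximate bounding box for Mexico (assumed values, for illustration only):
# (min_lon, min_lat, max_lon, max_lat)
MEXICO_BBOX = (-118.4, 14.5, -86.7, 32.7)


def point_of(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) if the tweet carries an exact GeoJSON point, else None."""
    geo = tweet.get("coordinates")
    if not geo or geo.get("type") != "Point":
        return None
    lon, lat = geo["coordinates"]  # GeoJSON order is [longitude, latitude]
    return lon, lat


def in_mexico_bbox(tweet: dict, bbox=MEXICO_BBOX) -> bool:
    """Keep only geo-referenced tweets whose point falls inside the bounding box."""
    point = point_of(tweet)
    if point is None:
        return False  # not geo-referenced: discarded
    min_lon, min_lat, max_lon, max_lat = bbox
    lon, lat = point
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
```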

Page 3

Why Twitter?

• Availability
• 1% of tweets available at no cost
• Around 12 M accounts in Mexico
• 700,000 accounts are geo-referenced
• Collection of 150 M tweets since January 2014
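As a quick check on the figures above, the share of geo-referenced accounts works out to roughly 6% of the estimated Mexican user base (both inputs are the approximate numbers from the slide):

```python
accounts_in_mexico = 12_000_000  # "around 12 M accounts in Mexico"
geo_referenced = 700_000         # "700,000 accounts are geo-referenced"

share = geo_referenced / accounts_in_mexico
print(f"Geo-referenced share: {share:.1%}")  # prints "Geo-referenced share: 5.8%"
```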

Page 4

Devices generating tweets in Mexico

[Chart: Android, iPhone]
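One plausible way to obtain such a breakdown, assuming Twitter API v1.1 tweet JSON, is to read the source field, which names the posting client inside an HTML anchor. The mapping from client name to device family below is a hypothetical simplification: it only recognizes the official Android and iPhone clients and lumps everything else into "Other".

```python
import re
from collections import Counter

# The "source" field holds an HTML anchor naming the client, e.g.
# '<a href="..." rel="nofollow">Twitter for Android</a>'.
ANCHOR_TEXT = re.compile(r">([^<]*)</a>")


def client_name(source: str) -> str:
    """Extract the anchor text; fall back to the raw value (e.g. "web")."""
    match = ANCHOR_TEXT.search(source or "")
    return match.group(1) if match else (source or "unknown")


def device_family(source: str) -> str:
    """Map a client name to a coarse device family (hypothetical rules)."""
    name = client_name(source).lower()
    if "android" in name:
        return "Android"
    if "iphone" in name:
        return "iPhone"
    return "Other"


def device_counts(tweets):
    """Tally tweets per device family."""
    return Counter(device_family(t.get("source", "")) for t in tweets)
```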

Page 5

Tweet collection infrastructure

Big Data layers (proof of concept):
• Operating system: Red Hat (Unix-like)
• NoSQL database: Elasticsearch
• Cluster: Hydra
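A minimal sketch of loading tweets into Elasticsearch, assuming its _bulk endpoint: the request body is newline-delimited JSON with an action line before each document and a trailing newline. The index name "tweets" and the use of id_str as the document ID are assumptions, not details from the slides.

```python
import json


def bulk_payload(tweets, index="tweets"):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    Each tweet becomes two lines: an "index" action naming the target index and
    document ID, followed by the tweet itself. The body must end with a newline.
    """
    lines = []
    for tweet in tweets:
        lines.append(json.dumps({"index": {"_index": index, "_id": tweet["id_str"]}}))
        lines.append(json.dumps(tweet, ensure_ascii=False))  # keep accents readable
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as the body of a POST /_bulk request with Content-Type: application/x-ndjson. Using the tweet ID as the document ID means re-ingesting the same tweet overwrites it instead of creating a duplicate.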

Page 6

General Process

Diagram: every-day collection → store geo-referenced tweets (15M) → ? → set an objective → filter and process → generate outputs

Page 7

Topics

• Mobility
  – Internal flows
  – Tourism
  – Border commuting
  – National road network: road use (planned)
  – Urban influence zones (planned)

• Subjective wellness
  – Based on text
  – Based on emoticons

Page 8

Page 9

Geo-referenced Tweets 2014

Page 10

Page 11

Internal mobility (from-to): where do we go when tweeting?

[Map: flows to Mexico City and from Mexico City; labelled areas: DF (Mexico City) and México State]
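The slides do not describe the method, but a common way to derive from-to flows is to order each user's geo-referenced tweets by time and count a trip whenever two consecutive tweets fall in different regions. The sketch below assumes each tweet has already been reduced to a hypothetical record with user, time and region keys (region assigned from the coordinates, e.g. by state).

```python
from collections import Counter, defaultdict


def od_matrix(tweets):
    """Count (origin, destination) trips from consecutive tweets of each user.

    Each tweet is a dict with "user", "time" (sortable) and "region".
    A trip is recorded whenever a user's next tweet is in a different region.
    """
    by_user = defaultdict(list)
    for t in tweets:
        by_user[t["user"]].append(t)

    flows = Counter()
    for user_tweets in by_user.values():
        user_tweets.sort(key=lambda t: t["time"])
        for prev, curr in zip(user_tweets, user_tweets[1:]):
            if prev["region"] != curr["region"]:
                flows[(prev["region"], curr["region"])] += 1
    return flows
```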

Page 12

Internal Tourism

Origin of tourists visiting Guanajuato (1-3 February 2014)

Page 13

Internal Tourism

Origin of tourists visiting Puebla (1-3 February 2014)
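One hedged way to estimate where visitors come from: infer each user's home as the region where they tweet most often outside the holiday window, then count users who tweeted from the destination during the window and whose home is elsewhere. The record layout (user, day, region) and the most-frequent-location rule are assumptions for illustration, not the method used in the study.

```python
from collections import Counter, defaultdict
from datetime import date

# Holiday window analysed on the slides: 1-3 February 2014.
WINDOW = (date(2014, 2, 1), date(2014, 2, 3))


def home_regions(tweets, window=WINDOW):
    """Infer each user's home as their most frequent region outside the window."""
    start, end = window
    counts = defaultdict(Counter)
    for t in tweets:
        if not (start <= t["day"] <= end):
            counts[t["user"]][t["region"]] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def tourist_origins(tweets, destination, window=WINDOW):
    """Count users who tweeted from `destination` in the window, by inferred home."""
    start, end = window
    homes = home_regions(tweets, window)
    visitors = {
        t["user"] for t in tweets
        if t["region"] == destination and start <= t["day"] <= end
    }
    # Residents (home == destination) and users with no inferred home are skipped.
    return Counter(
        homes[u] for u in visitors if u in homes and homes[u] != destination
    )
```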

Page 14

Use of Twitter on long weekends: displacements to Puebla and Guanajuato before, during and after the 1-3 February period
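The before/during/after comparison can be sketched by counting distinct non-resident users tweeting from the destination in each period. The sketch assumes a homes mapping from user to inferred home region already exists (how it is inferred is left open) and uses the same hypothetical record layout with user, day and region keys.

```python
from collections import defaultdict
from datetime import date

# Long weekend analysed on the slides: 1-3 February 2014.
LONG_WEEKEND = (date(2014, 2, 1), date(2014, 2, 3))


def phase(day, start, end):
    """Label a day relative to the [start, end] holiday window."""
    if day < start:
        return "before"
    if day > end:
        return "after"
    return "during"


def visitors_by_phase(tweets, homes, destination, window=LONG_WEEKEND):
    """Count distinct non-resident users tweeting from `destination` per period.

    Users with no known home, or whose home is the destination, are skipped.
    """
    start, end = window
    users = defaultdict(set)
    for t in tweets:
        home = homes.get(t["user"])
        if t["region"] != destination or home is None or home == destination:
            continue
        users[phase(t["day"], start, end)].add(t["user"])
    return {p: len(users[p]) for p in ("before", "during", "after")}
```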

Page 15

Border commuting

• México

• USA
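A minimal sketch for spotting cross-border users, assuming Twitter API v1.1 tweets where the place object (when present) carries a two-letter country_code, and records flattened so that user holds a user ID and time is sortable. A user is kept if they tweet from both Mexico and the United States, and each switch of country between consecutive tweets counts as one crossing; this definition of commuting is an assumption.

```python
from collections import defaultdict


def cross_border_users(tweets, countries=("MX", "US")):
    """Return {user: number_of_crossings} for users seen in every listed country.

    The country comes from place.country_code; tweets without a place, or from
    other countries, are ignored. A crossing is any change of country between
    two consecutive tweets of the same user (ordered by time).
    """
    wanted = set(countries)
    by_user = defaultdict(list)
    for t in tweets:
        code = (t.get("place") or {}).get("country_code")
        if code in wanted:
            by_user[t["user"]].append((t["time"], code))

    crossings = {}
    for user, visits in by_user.items():
        visits.sort()
        if {code for _, code in visits} == wanted:
            crossings[user] = sum(
                1 for (_, a), (_, b) in zip(visits, visits[1:]) if a != b
            )
    return crossings
```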

Page 16

National Road Network

Page 17

Urban influence zones

Page 18

Subjective Wellness

• Complement to an existing survey
  – Subjective perceived wellness (monthly)

• Two approaches
  – Based on emoticons (possible international comparability)
    • Netherlands experiments
  – Based on text (diversity of analysis, regionalisms)

• Text analysis infrastructure development
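The emoticon-based approach can be sketched as scoring each tweet by its positive and negative emoticons and reporting, for a period, the share of tweets with a non-zero score that lean positive. The emoticon lists and the indicator are hypothetical; they are not the sets used in the Netherlands experiments.

```python
# Hypothetical emoticon lists for illustration only.
POSITIVE = {":)", ":-)", ":D", "=)", ";)"}
NEGATIVE = {":(", ":-(", ":'(", "=("}


def emoticon_score(text):
    """+1 for each positive emoticon token, -1 for each negative one."""
    score = 0
    for token in text.split():
        if token in POSITIVE:
            score += 1
        elif token in NEGATIVE:
            score -= 1
    return score


def positive_share(texts):
    """Share of tweets with a non-zero emoticon score that are positive (None if none)."""
    scores = [emoticon_score(t) for t in texts]
    signed = [s for s in scores if s != 0]
    if not signed:
        return None
    return sum(1 for s in signed if s > 0) / len(signed)
```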

Page 19

Methods and Tools

• Pioanalisis: tool for collecting the training set (crowdsourcing)

• Machine learning (supervised and unsupervised), Support Vector Machines, incremental learning

• Random forest, Latent Dirichlet Allocation (LDA)

• SOM neural networks (SOM: Self-Organizing Map)

• Classification methods: Naive Bayes, Support Vector Machines (SVM), KNN, word count

• Dictionaries: Spanish Emotion Lexicon (SEL), KNN, AFINN, WordNet, ANEW
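Dictionary methods such as AFINN score a text by summing the valence assigned to each word it contains (AFINN uses integers from -5 to +5). A minimal sketch follows; the five-word lexicon and its values are made up for illustration and are not entries from AFINN or the Spanish Emotion Lexicon.

```python
# Tiny hypothetical lexicon in the AFINN style (word -> valence from -5 to +5).
# Entries and values are illustrative only, not taken from AFINN or SEL.
LEXICON = {"feliz": 3, "excelente": 3, "bien": 2, "triste": -2, "terrible": -3}


def normalize(word):
    """Lowercase and strip surrounding punctuation, including Spanish ¡ and ¿."""
    return word.lower().strip(".,;:!?¡¿\"'()")


def lexicon_score(text, lexicon=LEXICON):
    """Sum the valences of the words found in the lexicon; unknown words count 0."""
    return sum(lexicon.get(normalize(w), 0) for w in text.split())
```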

Page 20

Partnerships

• International
  – UNECE
    • ICHEC
  – UNSD
  – LAMBDoop
  – University of Pennsylvania

• National
  – KioNetworks
    • Dattlas
  – TecMilenio
  – INFOTEC
  – Centro Geo
  – CIDE
  – CIMAT
  – Sectur

• Internal
  – INEGI General Directorates

Page 21

Conclusions

• We are in a discovery stage:
  – Findings are moving from 'interesting' to 'valuable'

• A lot of research is needed:
  – … but we are gaining a lot of knowledge and experience

• Partnerships are a must

• Combining other big data sources is an imminent next step

• New challenges and threats will appear:
  – Cost increases?
  – Legal issues?
  – Re-engineering of methodologies and quality frameworks?
  – Evolution of traditional statistics?

• And much more?

Page 22

New statistics production landscape?

Page 23

Conociendo México (Getting to know Mexico)

01 800 111 46 34

www.inegi.org.mx

[email protected]

@inegi_informa INEGI Informa