Mobility analysis from Twitter data

NTTS 2015 - Satellite Workshop on Big Data

Twitter as data source

[Diagram labels: Twitter; Real-time Tweets; Filter by: Geo-referenced Only, México; NoSQL Database]
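The filter step in the diagram could be sketched as follows. This is only an illustration, not the collection code used in the project: it assumes each tweet is a dict whose "coordinates" field holds a GeoJSON point with [longitude, latitude], and it uses a rough bounding box (MEXICO_BBOX, a hypothetical constant) that also covers parts of neighbouring countries, so a real filter would need official boundaries.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Rough bounding box (west, south, east, north) in degrees.
# Assumed for illustration only; it overlaps neighbouring countries.
MEXICO_BBOX = (-118.5, 14.5, -86.5, 32.8)


def tweet_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) if the tweet carries a point geotag, else None."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None
    lon, lat = coords["coordinates"]
    return (lon, lat)


def in_bbox(point: Tuple[float, float], bbox=MEXICO_BBOX) -> bool:
    """True if the point lies inside the bounding box (edges included)."""
    lon, lat = point
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def keep_georeferenced(tweets: Iterable[dict]) -> Iterator[dict]:
    """Yield only tweets that have a point geotag inside the bounding box."""
    for tweet in tweets:
        point = tweet_point(tweet)
        if point is not None and in_bbox(point):
            yield tweet
```

Tweets that pass the filter would then be written to the NoSQL store; that step is left out here because it depends on the database client in use.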

Why Twitter?

• Availability
• 1% of Tweets available without cost
• Around 12 M accounts in Mexico
• 700,000 accounts are geo-referenced
• Collection of 150 M tweets since January 2014

Devices generating tweets in Mexico

Tweet collection infrastructure

Unix “Red Hat”

NoSQL Database “Elasticsearch”

Cluster (Hydra)

Big Data Layers

Proof of Concept

General Process

Daily Collection

Store Geo-Referenced Tweets

Set an Objective

Filter and Process

Generate outputs

Topics

• Mobility
– Internal flows
– Tourism
– Border commuting
– National Roads Network: use of roads (planned)
– Urban influence zones (planned)
• Subjective wellness
– Based on text
– Based on emoticons

Geo-referenced Tweets 2014

Internal mobility (from-to)

To Mexico City

From Mexico
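One way such from-to flows might be derived is sketched below. It is an assumption about the general technique, not the method used for the maps: each record is a hypothetical (user_id, timestamp, region) tuple, and a flow is counted whenever two consecutive tweets by the same user fall in different regions.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def od_flows(records):
    """Count origin-destination transitions per user.

    records: iterable of (user_id, timestamp, region) tuples, where
    timestamps are mutually comparable (numbers or ISO date strings).
    Returns a Counter keyed by (origin_region, destination_region).
    """
    flows = Counter()
    # Sort by user, then by time, so each user's tweets are in order.
    ordered = sorted(records, key=itemgetter(0, 1))
    for _, user_records in groupby(ordered, key=itemgetter(0)):
        previous = None
        for _, _, region in user_records:
            # Only a change of region counts as a movement.
            if previous is not None and region != previous:
                flows[(previous, region)] += 1
            previous = region
    return flows
```

Summing the counter over all origins for one destination, for example Mexico City, gives the inbound flow for that destination.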

Where do we go when tweeting?

Internal Tourism

Origin of Tourists visiting Guanajuato (1-3 February 2014)

Internal Tourism

Origin of Tourists visiting Puebla (1-3 February 2014)
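A count of visitor origins like the ones mapped for Guanajuato and Puebla could be approximated as below. The slides do not say how a user's origin was determined, so this sketch assumes, purely for illustration, that a user's home is the region where they tweet most often; the record layout (user_id, day, region) and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def home_regions(records):
    """Assume each user's home is the region with most of their tweets."""
    per_user = defaultdict(Counter)
    for user, _day, region in records:
        per_user[user][region] += 1
    return {user: counts.most_common(1)[0][0]
            for user, counts in per_user.items()}


def visitor_origins(records, destination, start, end):
    """Count home regions of non-resident users seen in `destination`
    between `start` and `end` (inclusive)."""
    records = list(records)  # the records are scanned twice
    homes = home_regions(records)
    visitors = {user for user, day, region in records
                if region == destination and start <= day <= end}
    return Counter(homes[user] for user in visitors
                   if homes[user] != destination)
```

Calling visitor_origins(records, "Puebla", date(2014, 2, 1), date(2014, 2, 3)) would give the per-state counts for the long weekend under these assumptions.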

Use of Twitter on long weekends

Displacements to Puebla and Guanajuato before, during and after the 1-3 February period

Border commuting

• México

• USA

National Roads Network

Urban Influence zones

Subjective Wellness

• Complement of existing survey
– Subjective perceived wellness (monthly)
• Two approaches
– Based on emoticons (possible international comparability)
  • Netherlands experiments
– Based on text (diversity of analysis, regionalisms)
• Text analysis infrastructure development
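The emoticon-based approach can be illustrated with a very small scorer. The emoticon lists below are hypothetical examples, not the lexicon used in the project, and plain substring counting is a deliberately crude matching rule.

```python
# Hypothetical emoticon lists, chosen only for illustration.
POSITIVE = (":)", ":-)", ":D", "😊")
NEGATIVE = (":(", ":-(", ":'(", "😢")


def emoticon_counts(text):
    """Return (positive, negative) emoticon occurrences in the text."""
    pos = sum(text.count(e) for e in POSITIVE)
    neg = sum(text.count(e) for e in NEGATIVE)
    return pos, neg


def polarity(text):
    """Score in [-1, 1], or None when the text has no known emoticon."""
    pos, neg = emoticon_counts(text)
    if pos + neg == 0:
        return None
    return (pos - neg) / (pos + neg)


def positive_share(texts):
    """Share of scored texts with positive polarity, or None if none scored."""
    scores = [s for s in map(polarity, texts) if s is not None]
    if not scores:
        return None
    return sum(1 for s in scores if s > 0) / len(scores)
```

Because emoticons are largely language independent, an indicator such as positive_share computed per month could in principle be compared across countries, which is the comparability argument made on the slide.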

Methods and Tools

• Pioanalisis: Tool for collection of the training set (crowdsourcing)

• Machine learning (supervised and unsupervised), Support Vector Machines, Incremental Learning
• Random forest, Latent Dirichlet Allocation (LDA)
• SOM Neural Networks (SOM: Self-Organizing Map)
• Classification Methods: Naive Bayes, Support Vector Machines (SVM), KNN, Word Count
• Dictionaries: Spanish Emotion Lexicon (SEL), KNN, AFINN, WordNet, ANEW
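As an illustration of one of the classification methods listed above, here is a minimal multinomial Naive Bayes text classifier with Laplace smoothing, written in plain Python. It is a teaching sketch under simple assumptions (whitespace tokenisation, lowercase text, a hand-labelled training set such as one gathered by crowdsourcing), not the project's implementation; the class name and parameters are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = Counter()           # documents per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        words = [w for w in text.lower().split() if w in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for w in words:
                count = self.word_counts[label][w]
                score += math.log((count + self.alpha)
                                  / (total_words + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With a labelled training set, usage would look like NaiveBayes().fit(texts, labels).predict("happy day"); real tweets would also need handling of regionalisms, hashtags and misspellings before tokenisation.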

Partnerships

• International
– UNECE
  • ICHEC
– UNSD
– LAMBDoop
– University of Pennsylvania
• National
– KioNetworks
  • Dattlas
– TecMilenio
– INFOTEC
– Centro Geo
– CIDE
– CIMAT
– Sectur
• Internal
– INEGI General Directorates

Conclusions

• We are in a discovery stage:
– Findings going from ‘interesting’ to ‘valuable’
• A lot of research needed:
– … but we are getting a lot of knowledge and experience
• Partnerships are a must
• Combining other big data sources is an imminent next step
• New challenges and threats will appear
– Costs increase?
– Legal issues?
– Re-engineering of methodologies and quality frameworks?
– Evolution of traditional statistics?
• A lot of etcetera?

New statistics production landscape?

Conociendo México

01 800 111 46 34

www.inegi.org.mx

atencion.usuarios@inegi.org.mx

@inegi_informa INEGI Informa
