Statistical Physics, Network theory & Big data
An approach to human mobility
Oleguer Sagarra, Dept. Física Fonamental,
University of Barcelona
Statistical Physics &
Big Data
“New Social Sciences”
A killer combination...
Why?
Mobility has deep implications for many processes (contagion, spread of ideas...)
The development of GPS/mobile phone technologies makes gathering data cheap and possible at large scale.
We want to study Human Mobility…
What?
Different scales (Micro/Meso/Macro)
Society is heterogeneous… (Humans are not “monkeys”… in principle!)
(Human) Mobility is a rather complex process…
But we are physicists! So we will try to model it anyway…
But we don’t need modelling…
“Computers are useless, they can only give you answers…” (P. Picasso)
This talk is rather about questions…
“Models push the boundaries of our understanding”
How?
Real (big) Data
Theoretical Empirical
Physics Mathematics
Network Science
The data... (has problems)
Citizens
a) How to get it?
Private companies (Social Media)
Getting the data... Experiments
Smartphones give lots of “sensing opportunities”
Citizen science aims to involve people in data collection, sharing and processing
BeePath: Experiments on human mobility
http://bee-path.net
(Btw: a very interesting project, but we don’t have time for it today)
Getting the data... Social Media
b) Is it biased? (Big data can also mean big errors)
Social media data
Social media data is geolocalized, so we can extract trajectories from it.
But first, is the data representative of the population?
We can compare with the census… Analysis must be done at user level!
(We want info about people, not about “some people that tweet a lot”)
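A minimal sketch of what "analysis at user level" can mean in practice: count each user once per region before comparing with the census, so that a few heavy tweeters do not dominate. The record layout `(user_id, region)`, the region labels and the census figures are all made-up assumptions for illustration, not the data set used in the talk.

```python
import math
from collections import Counter, defaultdict

def unique_users_per_region(tweets):
    """Count distinct users seen in each region (each user counts once)."""
    users = defaultdict(set)
    for user_id, region in tweets:
        users[region].add(user_id)
    return {region: len(ids) for region, ids in users.items()}

def pearson(xs, ys):
    """Plain Pearson correlation; fails if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: user "u1" tweets a lot in region A.
tweets = [("u1", "A"), ("u1", "A"), ("u1", "A"), ("u2", "A"),
          ("u3", "B"), ("u1", "B"),
          ("u4", "C"), ("u5", "C"), ("u6", "C")]
census = {"A": 2000, "B": 2000, "C": 3000}  # hypothetical populations

regions = sorted(census)
users = unique_users_per_region(tweets)
raw = Counter(region for _, region in tweets)

r_users = pearson([users[r] for r in regions], [census[r] for r in regions])
r_raw = pearson([raw[r] for r in regions], [census[r] for r in regions])
print("correlation with census, user level:", r_users)
print("correlation with census, raw tweets:", r_raw)
```

In this toy case the user-level counts track the census while the raw tweet counts do not, which is the bias the slide warns about.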
From points to a network?
The data... is geolocalized, and (too) big!
c) Continuous vs discrete data
(We want only the flows: From where and to where people go, “on average”)
The network approach
Network
Data
Filtering
Aggregation (grid)
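The step from points to a network can be sketched as follows: snap each geolocalized point to a grid cell, then count moves between consecutive cells of each user's trajectory. This is only an illustration. The function names, the cell size and the filtering rule (dropping moves that stay inside one cell) are assumptions, and the points of each user are assumed to be already sorted in time.

```python
import math
from collections import Counter

def cell_of(lat, lon, cell_size_deg):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))

def flows_from_trajectories(trajectories, cell_size_deg):
    """Build a weighted, directed flow network: (origin, destination) -> trips.

    trajectories: dict user_id -> list of (lat, lon), sorted in time.
    """
    flows = Counter()
    for points in trajectories.values():
        cells = [cell_of(lat, lon, cell_size_deg) for lat, lon in points]
        for origin, dest in zip(cells, cells[1:]):
            if origin != dest:  # assumed filter: ignore moves within a cell
                flows[(origin, dest)] += 1
    return flows

# Hypothetical toy trajectories on a coarse 1-degree grid.
trajectories = {
    "u1": [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.2, 2.3)],
    "u2": [(0.1, 0.9), (1.9, 0.1)],
}
flows = flows_from_trajectories(trajectories, cell_size_deg=1.0)
for (origin, dest), trips in sorted(flows.items()):
    print(origin, "->", dest, ":", trips)
```

Each key of `flows` is a directed edge between two cells and each value is its weight, so only the flows survive and the individual points are discarded.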
Network data
(We can now apply network metrics and… data is normalized!)
Sagarra, O. Master Thesis. http://upcommons.upc.edu/pfc/handle/2099.1/13134
Now we know how to deal with the data...
We want to detect “abnormal” patterns...
What is chance, what is not?
What is important, what is not?
Modeling as a physicist…
Take all trivial elements out…
Keep just the “basic” factors in mobility:
- Distance / Cost (a.k.a. laziness)
- Population density (a.k.a. opportunities)
(We look for causality, not correlation)
Macro/Meso level: (urban/regional/national)
Taking inspiration from Statistical Mechanics and Network Theory, one can define flexible
null models.
We need a general model for mobility networks…
Procedure:
1. Fix some hypothesis: “The population leaving or entering each cell is given”
2. Generate predictions: “How do the flows organize?”
3. Compare Data vs Prediction
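The three steps can be illustrated with the simplest null model that respects the stated hypothesis. If only the total outflow s_out(i) and inflow s_in(j) of each cell are fixed, a natural expectation for the flow from i to j is s_out(i) * s_in(j) / T, with T the total number of trips. This is a sketch under that assumption, not necessarily the model derived in the cited paper. The z-score comparison assumes Poisson-like fluctuations, which is a further simplification, and the flow values are invented.

```python
import math

def strengths(flows):
    """Step 1: the fixed quantities, total outflow and inflow per cell."""
    s_out, s_in = {}, {}
    for (i, j), trips in flows.items():
        s_out[i] = s_out.get(i, 0) + trips
        s_in[j] = s_in.get(j, 0) + trips
    return s_out, s_in

def null_expected_flows(flows):
    """Step 2: expected flow s_out(i) * s_in(j) / T for every pair (i, j).

    Pairs with i == j are kept so that the row and column sums of the
    prediction match the observed outflows and inflows exactly.
    """
    s_out, s_in = strengths(flows)
    total = sum(flows.values())
    return {(i, j): s_out[i] * s_in[j] / total for i in s_out for j in s_in}

def z_scores(flows):
    """Step 3: compare data with prediction, assuming Poisson-like noise."""
    expected = null_expected_flows(flows)
    return {pair: (flows.get(pair, 0) - e) / math.sqrt(e)
            for pair, e in expected.items() if e > 0}

# Hypothetical observed flows between three cells.
observed = {("A", "B"): 8, ("A", "C"): 2, ("B", "C"): 5, ("C", "A"): 5}
for pair, z in sorted(z_scores(observed).items()):
    print(pair, round(z, 2))
```

Pairs with a large positive or negative score are candidates for patterns that the fixed in/out populations alone do not explain.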
We need a null model for the data...
(quite a lot of maths….)*
Sagarra, O. et al., Phys. Rev. E 88, 062806 (2013)
Roadmap
Hypothesis... Modelling
Raw data Clean data
Data features Prediction
Experiments, Databases...
Data treatment tools
Null Model predictions
Visualizations
Statistical Validation
(We are here) (Product)
What’s the goal of all this?
Understand what drives human mobility
Discriminate important factors from negligible ones (population density, distance, cost...)
Create tools to study data in an unbiased manner