l1 spatial data - uantwerpen
TRANSCRIPT
Spatial issues in data analysis and model building:
distance, scale and complexity.
Isabelle THOMAS Francqui Chair
March 11th 2015
Spatial analysis
• Visualization Showing interesting patterns (Maps)
• Exploratory Spatial Data Analysis (ESDA) Finding interesting patterns
• Spatial modelling (regression, …) Explaining interesting patterns
Spatial is special
INTRODUCTION Distance Scale Complexity Accidents Conclusions
BAD NEWS
GOOD NEWS
ESDA DESCRIPTION
Spatial STATISTICS
Statistical MAPS
Modeling Spatial statistical
analysis and hypothesis testing
(Spatial) modeling and prediction
LEVEL OF DIFFICULTY
DISTANCE
DISTANCE: adjacency, interaction, and neighborhoods
SCALE: MAUP, spatial autocorrelation, ecological fallacy, edge/border effect
Why is distance so important ? (1)
[Figure: price of land (P1, P2, P3) against quantity of land (Q1, Q2, Q3) and distance to the CBD; high densities towards downtown, low densities towards the periphery]
The core of (transport) geography
Enters most models, many indices
LOCATION
Absolute: latitude, longitude; an address
Relative: distance, directions to other places
Distance
Adjacency
Neighbourhood
Interaction
Why is distance so important? (2)
[Figure: graph with nodes A to F]
Adjacency Distance Interaction Neighbourhood
Adjacency matrix (or adjacency list)
i and j are adjacent
– if they share a common boundary (share = ?)
– if they are within a specified distance (buffer, neighbourhood)
Binary or distance-based weights.
Order of adjacency.
Rook Queen
Source: Briggs, Henan University, 2012
1st order
2nd order
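The rook and queen rules and the order of adjacency can be sketched on a small regular lattice. This is a minimal illustration, not the method used in the lecture: the 3 x 3 grid, the function names and the row-by-row cell numbering are assumptions made for the example.

```python
from itertools import product


def grid_adjacency(rows, cols, rule="rook"):
    """Binary adjacency matrix for a rows x cols lattice (cells numbered row by row)."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r, c in product(range(rows), range(cols)):
        for dr, dc in product((-1, 0, 1), repeat=2):
            if (dr, dc) == (0, 0):
                continue
            if rule == "rook" and dr != 0 and dc != 0:
                continue  # rook: shared edges only, corner contacts do not count
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                w[r * cols + c][rr * cols + cc] = 1
    return w


def second_order(w):
    """Neighbours of neighbours, excluding the cell itself and its 1st-order neighbours."""
    n = len(w)
    w2 = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and not w[i][j] and any(w[i][k] and w[k][j] for k in range(n)):
                w2[i][j] = 1
    return w2


rook = grid_adjacency(3, 3, "rook")
queen = grid_adjacency(3, 3, "queen")
centre, corner = 4, 0
print(sum(rook[centre]), sum(queen[centre]))   # number of 1st-order neighbours of the centre cell
print(sum(second_order(rook)[corner]))         # number of 2nd-order rook neighbours of a corner cell
```

The same matrices can be replaced by distance-based weights by storing a function of dij instead of 0/1.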
[Figure: graph with nodes A to F and link values 66, 24, 41, 68, 68]
Adjacency Distance Interaction Neighbourhood
– dij measures the separation between i and j
– (mathematical) definition:
• dij > 0 if i ≠ j (distinction/separation)
• dij = 0 if i = j (co-location/equivalence): diagonal of the adjacency matrix
• dij + djk ≥ dik (triangle inequality)
• dij = dji (symmetry: is the graph symmetric?)
Measuring distance is not simple …
In spatial analysis:
– objects may not be truly point-like/distinct
– the triangle inequality may not hold
– the symmetry condition may not hold
Source: www.spatialanalysisonline.com
Terrain distances – cross section view
Measuring distance is not simple …
Source: www.spatialanalysisonline.com
Measuring distance
• lp metrics: p = 1 Manhattan; p = 2 Euclidean; ...
• Metrics. NB: spherical coordinates require spherical/ellipsoidal computations:
$$d_{ij} = 2R \sin^{-1}\left(\sqrt{\sin^2 A + \cos\phi_i \cos\phi_j \sin^2 B}\right), \quad \text{where } A = \frac{\phi_i - \phi_j}{2},\ B = \frac{\lambda_i - \lambda_j}{2}$$
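A short sketch of both families of distance measures: the lp (Minkowski) metric for planar coordinates and the great-circle (haversine) distance for latitude/longitude. The function names and the mean Earth radius of 6371 km are choices made for this example.

```python
import math


def minkowski(a, b, p=2):
    """l_p distance between two points: p = 1 Manhattan, p = 2 Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def haversine(lat_i, lon_i, lat_j, lon_j, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    a = (phi_i - phi_j) / 2                    # half the latitude difference
    b = math.radians(lon_i - lon_j) / 2        # half the longitude difference
    h = math.sin(a) ** 2 + math.cos(phi_i) * math.cos(phi_j) * math.sin(b) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


print(minkowski((0, 0), (3, 4), p=1))   # Manhattan
print(minkowski((0, 0), (3, 4), p=2))   # Euclidean
print(haversine(0.0, 0.0, 0.0, 90.0))   # a quarter of the equator
```

For the same pair of points the Manhattan distance is never shorter than the Euclidean one, which is one reason why the choice of metric matters in urban networks.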
[Figure: graph with nodes A to F]
Distance Adjacency Interaction Neighbourhood
Source: www.spatialanalysisonline.com
Distance decay models
– Simple inverse power models: $z_j = f(\{z_i / d_{ij}^{\beta}\}),\ \beta \ge 0$
– Trip distribution models: $T_{ij} = A_i B_j O_i D_j f(d_{ij})$
– Statistical modelling
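A trip distribution model of the form Tij = Ai Bj Oi Dj f(dij) can be sketched as follows. This is a minimal illustration assuming an inverse power deterrence function f(d) = d^(-beta) and the usual iterative balancing of the factors Ai and Bj; the origin totals, destination totals and distances are made up for the example.

```python
def gravity_model(origins, destinations, dist, beta=2.0, iterations=100):
    """Doubly constrained gravity model with f(d) = d ** -beta.

    Assumes sum(origins) == sum(destinations) and strictly positive distances.
    """
    n, m = len(origins), len(destinations)
    f = [[dist[i][j] ** -beta for j in range(m)] for i in range(n)]
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(iterations):
        # A_i makes row i sum to O_i, B_j makes column j sum to D_j
        a = [1.0 / sum(b[j] * destinations[j] * f[i][j] for j in range(m)) for i in range(n)]
        b = [1.0 / sum(a[i] * origins[i] * f[i][j] for i in range(n)) for j in range(m)]
    return [[a[i] * origins[i] * b[j] * destinations[j] * f[i][j] for j in range(m)]
            for i in range(n)]


origins = [100.0, 200.0]            # hypothetical trips produced by each zone
destinations = [150.0, 150.0]       # hypothetical trips attracted by each zone
dist = [[1.0, 2.0], [2.0, 1.0]]     # hypothetical distances, intrazonal distance set to 1
flows = gravity_model(origins, destinations, dist)
for row in flows:
    print([round(t, 1) for t in row])
```

A larger beta concentrates the flows on the nearest destinations; beta = 0 removes the effect of distance altogether.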
[Figure: graph with nodes A to F]
Adjacency Distance Interaction Neighbourhood
Source: Ovtracht, 2014
[Figure] Source: http://www.colorado.edu/geography/
[Figure] Source: http://www.colorado.edu/geography/
Distance – aggregation & scale
[Figure: locations 1 to 5, i and j]
Errors
A: d(2,j) < d(i,j) < d(5,j)
B: d(1,i) = 0
C: i can be allocated to j while closer to j'
Aggregation decreases
– data collection costs
– modeling costs
– computing costs
– confidentiality concerns
– data statistical uncertainty (smaller sample deviations for larger samples)
Aggregation increases
– modeling errors/biases
SCALE
LOCATION
Don’t forget the essence of your problem
SITE
SITUATION
SOCIOECONOMIC ENVIRONMENT
Land, transportation, amenities, …
Labor, materials, energy, …
Capital, subsidies, regulations, …
MACRO (national)
MICRO (local)
MESO (regional)
SCALE
SCALE: cartographically
Large cartographic scale Small cartographic scale
Source: Topomapviewer, NGI/ING
Statistical sectors Communes, provinces, …
Extent constant, different grain
Increasing extent, grain constant
• Extent: spatial dimension of an object (or process) observed/analyzed
• Grain (BSU): level of spatial resolution at which an object (or process) is measured/observed
SCALE: 2 aspects
Source: INS
Authors: Larielle and Thomas, 2014
Land rent (per sq m), 2013
SCALE: Extent
Results obtained at one scale do not necessarily apply at other scales. A pattern may be clustered at one scale but dispersed at another scale
Source: Briggs, Henan University, 2012
Population clustered into cities
City populations are dispersed
Scale is always important in spatial analysis!
SCALE: Extent
Why be concerned about scale?
1. Patterns are dependent upon the scale of observation.
2. The importance of explanatory variables changes with scale.
3. Statistical relationships may change with scale.
4. Patterns are generated by processes acting over various spatial (and temporal) scales.
No unique solution: nested models, power laws, fractals, networks, …
Power laws
• Summarize how relationships change with changes in scale
• Often expressed on a log-log plot
• $Y = c\,X^{n}$ (c constant)
• Similar slopes are thought to have similar structuring processes (n = slope)
• Example: species-area relationships
! However: power laws often lack an explanatory process
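On a log-log plot a power law becomes a straight line, so the exponent n can be estimated as the slope of a least-squares line through (log X, log Y). The sketch below uses made-up species-area values generated from Y = 5 X^0.25; the numbers and the function name are assumptions for illustration only.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(c) + n * log(x); returns (c, n)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return math.exp(my - slope * mx), slope


areas = [1.0, 10.0, 100.0, 1000.0]           # hypothetical areas
species = [5.0 * a ** 0.25 for a in areas]   # hypothetical species counts
c, n = fit_power_law(areas, species)
print(round(c, 3), round(n, 3))
```

A good fit says nothing about the process behind it, which is the caveat raised in the last bullet.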
Fractals
• The same pattern appears across all scales: it is scale invariant.
• The relationship between the size of a box and the pattern in it is constant.
• Fractals follow their own power law, relating the number of boxes needed to cover a shape to the size of the boxes.
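The box-counting power law can be illustrated on a shape whose dimension is known. The sketch below builds a Sierpinski carpet (chosen here only because its dimension, log 8 / log 3, is known exactly), counts the occupied boxes at several box sizes and estimates the dimension as the slope of log N against log(1/size). All names are illustrative.

```python
import math


def carpet_cells(level):
    """Occupied cells of a Sierpinski carpet on a 3**level x 3**level grid."""
    cells = {(0, 0)}
    for _ in range(level):
        cells = {
            (3 * x + dx, 3 * y + dy)
            for x, y in cells
            for dx in range(3)
            for dy in range(3)
            if (dx, dy) != (1, 1)   # remove the central square at every step
        }
    return cells


def box_count(cells, size):
    """Number of boxes of side `size` needed to cover the occupied cells."""
    return len({(x // size, y // size) for x, y in cells})


def box_dimension(cells, sizes):
    """Slope of log N(size) against log(1 / size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(cells, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


cells = carpet_cells(4)
dimension = box_dimension(cells, [1, 3, 9, 27])
print(round(dimension, 4), round(math.log(8) / math.log(3), 4))
```

Real urban patterns are not exact fractals, so the estimated slope depends on the range of box sizes retained.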
Networks
• Can represent relationships at a variety of scales at once.
• Structural properties of networks provide means of understanding how they work.
– Nodes and links
– Degree centrality and betweenness
– Weak versus strong links
– Directional versus non-directional graphs
Fallacies of scale
1. Modifiable Areal Unit Problem (MAUP)
2. Ecological fallacy
3. Edge/border effect
4. Spatial autocorrelation
(…)
1. Modifiable Areal Unit Problem (MAUP)
2. Ecological fallacy
Ecological fallacy: making claims about local-scale phenomena based on broad-scale observations.
Individualistic fallacy: making claims about broad-scale phenomena based on observations conducted at small, local scales.
Do not generalise conclusions to other scales.
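A small numerical illustration of why conclusions do not transfer across scales. The data are invented: three areas in which x and y are negatively related at the individual level, while the area averages are positively related. The sketch simply compares the correlation within areas, for all individuals pooled, and for the aggregated area means.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical individuals (x, y) grouped by area
areas = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

within = {name: pearson(x, y) for name, (x, y) in areas.items()}
all_x = [v for x, _ in areas.values() for v in x]
all_y = [v for _, y in areas.values() for v in y]
pooled = pearson(all_x, all_y)
mean_x = [sum(x) / len(x) for x, _ in areas.values()]
mean_y = [sum(y) / len(y) for _, y in areas.values()]
aggregated = pearson(mean_x, mean_y)

print(within)       # correlation inside each area
print(pooled)       # all individuals together
print(aggregated)   # area means only
```

Reading the aggregated correlation as a statement about individuals is the ecological fallacy; reading the within-area correlation as a statement about areas is the individualistic one.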
3. Edge/Border effects
Points close to the border are closer to locations outside the study area.
Arises when an artificial boundary is imposed on a study, often just to keep it manageable.
Biases > nearest-neighbor distances > (model results)
Solution: ? How to consider “the rest of the world”.
4. Spatial autocorrelation (1)
1) Biased parameter estimates
2) Data redundancy (affecting the calculation of confidence intervals)
3) Moran and Geary
Source: www.spatialanalysisonline.com
4. Spatial autocorrelation (2)
Coefficient
– Coordinate (x,y,z)
– Spatial weights matrix (binary or other), W = {wij}
– Coefficient formulation: desirable properties
• Reflects co-variation patterns
• Reflects adjacency patterns via weights matrix
• Normalised for absolute cell values
• Normalised for data variation
• Adjusts for number of included cells in totals
Source: www.spatialanalysisonline.com
4. Spatial autocorrelation (3)
• Moran’s I
$$I = \frac{1}{p}\,\frac{\sum_i \sum_j w_{ij}(z_i - \bar z)(z_j - \bar z)}{\sum_i (z_i - \bar z)^2}, \quad \text{where } p = \sum_i \sum_j w_{ij} / n$$
• Modification for point data
– Replace weights matrix with distance bands, width h
– Pre-normalise z values by subtracting means
– Count number of other points in each band, N(h)
$$I(h) = N(h)\,\frac{\sum_i \sum_j z_i z_j}{\sum_i z_i^2}$$
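A direct transcription of the global Moran coefficient for a given weights matrix, as a sketch only. The four cells in a row, their rook adjacency and the two value vectors are invented to show one positively and one negatively autocorrelated pattern.

```python
def morans_i(z, w):
    """Global Moran's I for values z and a spatial weights matrix w."""
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]
    s0 = sum(sum(row) for row in w)   # sum of all weights, so p = s0 / n
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


# Four cells in a row, binary rook adjacency (hypothetical layout)
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = [1, 2, 3, 4]         # similar values next to each other
alternating = [1, 4, 1, 4]   # dissimilar values next to each other
print(morans_i(trend, w))
print(morans_i(alternating, w))
```

Significance is not assessed here; in practice it would come from the normal or randomisation models listed in the next slides.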
4. Spatial autocorrelation (4)
Extending SA concepts
– Distance formula weights vs bands
– Lattice models with more complex neighbourhoods and lag models (GeoDa)
– Disaggregation of SA index computations (row-wise) with/without row standardisation (LISA)
– Significance testing
• Normal model
• Randomisation models
• Bonferroni/other corrections
Source: www.spatialanalysisonline.com
4. Spatial autocorrelation (5)
Moran I correlogram
[Figure panels: source data points; lag distance bands, h; correlogram]
4. Spatial autocorrelation (6): causes of spatial dependence / interpretation
• Underlying socio-economic process has led to clustered distribution of variable values
– Grouping, spatial interaction
– Diffusion, dispersal
– Spatial hierarchies
• Mismatch between process and spatial units
– Counties vs retail trade zones
– Census block groups vs neighborhood networks
What is spatial autocorrelation? D. Griffith, 1992, L’Esp. Géo.
Hypotheses
Explore the data
Fit an OLS model
Perform diagnosis
Run adapted model (e.g. GWR)
Compare models
EDA, ESDA
Global autocorrelation, local autocorrelation
Global model, local model
RESULTS, DECISION
Start with OLS and look for
– Positive spatial autocorrelation > dependence between samples exists
– Datasets often non-Normal >> transformations may be required (Log, Box-Cox, Logistic)
– Samples are often clustered >> spatial declustering may be required
– Heteroskedasticity is common (errors are then not i.i.d.)
– Spatial coordinates (x,y) may form part of the modelling process
Source: www.spatialanalysisonline.com
Regression models
Type of spatial effect > Remedies
– Spatial heterogeneity (Koenker-Bassett test)
• Include covariate which accounts for heterogeneity?
• Split region?
– Spatial autocorrelation (Lagrange Multiplier tests)
• Identify missing variables?
• Explore effects of spatially-lagged independent variables?
• Use appropriate spatial regression model?
Source: www.spatialanalysisonline.com
Regression models
• Identify the source (LM tests will help)
– Regression residuals (LM-Error)
• Mismatch of process and spatial units => systematic errors, correlated across spatial units
– Dependent variable (LM-Lag)
• Underlying socio-economic process has led to clustered distribution of variable values => influence of neighboring values on unit values
Source: www.spatialanalysisonline.com
LARGE number of solutions: spatial autoregressive process (SAR), spatial moving average process (SMA), …
COMPLEXITY or COMPLICATION ?
Complexity is hard to define
• Algorithmic complexity
• Deterministic complexity
• Aggregate complexity
Key generic properties
1. Nonlinear relationships
2. Techniques such as artificial intelligence
3. Emerges from relatively simple interactions
Systems change and evolve
Source: Manson, 2001; R. Martin and Sunley.
Property Attributes
Has a distributed nature & representation Multiscalar.
Openness Open system
Non-linear dynamics Path dependence.
Limited functional decomposability
Emergence and self-organisation Emergence
Adaptive behaviour and adaptation Self organization
Non deterministic and non tractability Stochastic
Vocabulary about complexity
Source: Manson, 2001; R. Martin and Sunley.
SYSTEM ANALYSIS
MIT, Jay Forrester (6’), Bertalanffy (67)
General system theory
System’s autonomy
SELF-ORGANIZATION
Prigogine, Haken (1970-80)
Open systems, dissipative structures, unpredictable effects of non-linear micro-interactions on system’s macro structure and dynamics, path dependence (irreversibility)
COMPLEX SYSTEMS
Santa Fe Institute, ISI, ECSS (1990-2000)
Emerging properties
Models: multi-agent systems
Models: differential equations
Urban systems are complex systems
• Urban systems are produced by social interactions (conveying information), according to their range in space and duration in time
• Non-linear interactions occur at micro, meso or macro levels, and between levels
• Emergence of collective properties within cities:
• Hierarchical organisation (« cities as systems within systems of cities »; Reynaud, 1841; Berry, 1964; Pred, 1977)
• Urban « memory » (dynamic path dependence) as a constraint on urban dynamics at both levels
PLACE(S)(Environment)
Road(s) PEOPLE (Roadusers):
(x, y, t)
t-1
t
t+1
VEHICLE(S)
INTERACTIONS
From facts … to geography
Multi-level problem
Explore the data
Fit an OLS model
Perform diagnosis
Run adapted model (e.g. GWR)
Compare models
EDA, ESDA
Global autocorrelation, local autocorrelation
Global model, local model
Step 1: EDA Select variable and describe
Univariate
Bi- and multi- variate
Visualizations
Tables, Charts, Plots, autocorr, hot spot
Maps
Step 2 : ESDA
Test spatial homogeneity
Spatial weights
Global & Local spatial autocorrelation
5.1 Point pattern analyses
• Point pattern analysis
Describing a point pattern. Black spots, black zones
- Density-based point pattern measures
- Distance-based point pattern measures
Assessing point patterns statistically
• Aggregation
- Segments of road
- Communes (stat sectors)
• Explanation/prediction
- Measuring and modeling numbers/risk
Pinpoint location (point)
Black spot
Black road segment (line)
Black « region » (polygon)
Multi-scale, multi-dimensional, multi-disciplinary, multi-causal analysis.
Necessity: to isolate, to control for, in order to avoid badly specified models.
Describe / Understand / Explain / Predict + ACT (Engineering, Enforcement, Education, Environment)
Poisson or not?
• Poisson > Binomial
• Aggregation effects
• Length of segments
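One common first check of the Poisson assumption is the index of dispersion: under a Poisson process the variance of the counts equals their mean, and (n − 1) × variance / mean is approximately chi-square distributed with n − 1 degrees of freedom. The sketch below is not taken from the cited study; the accident counts per road segment are invented.

```python
def dispersion_index(counts):
    """Variance-to-mean ratio and the associated chi-square statistic."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)   # sample variance
    vmr = variance / mean
    return vmr, (n - 1) * vmr   # compare the statistic with chi-square, n - 1 df


# Hypothetical numbers of accidents on ten road segments of equal length
counts = [0, 0, 1, 0, 5, 0, 0, 7, 0, 1]
vmr, statistic = dispersion_index(counts)
print(round(vmr, 3), round(statistic, 3))
```

A ratio well above 1 points to clustering (over-dispersion). The result depends on how the road is cut into segments, which is the aggregation effect listed above.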
Source: Thomas, 1996
Source: Flahaut, 2002
Road accidents N29 Charleroi-Jodoigne
Moran for black segments
[Figure] Source: Eckhart, 2002
Kernel
Source: Steenberghen, Defays, Thomas, Flahaut, 2010
Mechelen
Source: Steenberghen, Defays, Thomas, Flahaut, 2010
5.2 Model for i = hectometres
Logistic regression
Yi = 1 if hectometre (hm) i belongs to a « black segment »
Yi = 0 otherwise
Xi: characteristics of the road (infrastructure & environment)
- Usage
- Physical properties
- Environment (land use, …)
(Official data; numerical digital terrain model; IGN maps)
Source: Flahaut, 2004
[Map: north arrow, scale bar 0 to 250 m] Source: Flahaut, 2004
[Figure] Source: Flahaut, 2004
Source: Vandenbulcke et al., 2011
5.3 Model for i = communes
Objective: explain variations in Y
Controlling spatial biases
2 steps
EXPLORATORY
Identify potential explanatory factors
Statistical tools:
• Graphics (basic statistics)
• Cluster analyses (PCA)
• Correlations (x,y)
STATISTICAL MODELLING
Relative importance of variables?
Statistical tools:
• Statistical models
• Corrections for multicollinearity & spatial effects
[Figure: Factor 1, Factor 2, Factor X?]
[Figure: % cycling (0 to 35) against distance (1 to 20 km), for towns and villages; labels H1, H2, H3, H5, H8; 10 km marked]
Exploratory step
• Commuting distances (< 10 km)
• Town size: regional towns > large towns
• Regional differences (culture + …)
Source: Vandenbulcke et al., 2011
Exploratory step
Source: Vandenbulcke et al., 2011
Correlations (ρxy), on a scale from ρxy = –1 through ρxy = 0 to ρxy = 1:
Dissatisfaction with cycleways: –0.82
Slopes: –0.77
Bad health: –0.58
Commuting distances: –0.54
Accident risk: –0.32
No child, town size: 0.23
Job density: 0.38
Active people < 25 years: 0.54
[Plot axes: commuting distances (km); average slopes (d°)]
BICYCLE USE
Scale: communes (INS 5)
Factors: POLICY-RELATED, ENVIRONMENTAL, INDIVIDUAL
Socio-economic data (NIS): income, education, gender, age, car availability, young children/household
Health data (NIS): subjective health
Physical data (UCL): slopes (d°)
Environmental data (IRCEL-CELINE): air pollution (PM10)
Accident data (NIS): accident risk = f(number of accidents, travel time)
Land-use data (UCL): land use (e.g. urban), city size, job and pop. densities
Trip/local characteristics: satisfaction with cycle paths, traffic volume, commuting distance (km)
Vandenbulcke et al., Transportation Research Part A (2011)
OLS (Ordinary Least Squares): $y = X\beta + \varepsilon$
– Multicollinearity (VIF, …) > uncorrelated X
– Heteroskedasticity (BP tests) > « White correction »
– Spatial autocorrelation (LM tests) > spatial autoregressive model (spatial lag): $y = \rho W y + X\beta + \varepsilon$ (W: Queen matrix)
– Structural instability (Chow tests) > inclusion of spatial regimes (ESDA):
$$y_1 = \rho W_1 y_1 + X_1\beta_1 + \varepsilon_1$$
$$y_2 = \rho W_2 y_2 + X_2\beta_2 + \varepsilon_2$$
SPATIAL AUTOREGRESSIVE MODEL + REGIMES
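In the spatial lag specification y = ρWy + Xβ + ε, the value in one commune depends on the values in its neighbours, so a local change spills over to the whole system. The sketch below is not the estimation procedure of the cited study: it only solves the equation for y, given ρ and the term Xβ + ε, by fixed-point iteration. The four units in a row, the row-standardised W, the vector xb and ρ = 0.6 are assumptions for illustration.

```python
def row_standardise(w):
    """Divide each row of a weights matrix by its sum (rows without neighbours stay 0)."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def solve_spatial_lag(w, xb, rho, tol=1e-12, max_iter=10000):
    """Solve y = rho * W y + xb by iteration (converges for |rho| < 1, W row-standardised)."""
    n = len(xb)
    y = list(xb)
    for _ in range(max_iter):
        wy = [sum(w[i][j] * y[j] for j in range(n)) for i in range(n)]
        new = [rho * wy[i] + xb[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("no convergence: check rho and the weights matrix")


# Four hypothetical spatial units in a row
w = row_standardise([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
xb = [1.0, 2.0, 3.0, 4.0]   # hypothetical values of X beta + epsilon
rho = 0.6                   # illustrative lag coefficient
y = solve_spatial_lag(w, xb, rho)
print([round(v, 3) for v in y])
```

Because of the feedback through Wy, the β coefficients of a lag model cannot be read as marginal effects in the same way as OLS coefficients.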
OLS Model (n = 589)
Italics: ln(x+1)
Y = % commuter cyclists in commune i
Estimation OLS (y)
Intercept 6,4124****
Median income 0,0030
Active men 0,0472****
Age 2 (45-54 years) -0,0460****
Young children -0,0567****
Cycleways unsatisfaction -0,0127****
Commuting distance -0,0114***
Air quality 0,0141****
City size -0,0954****
Bad health -0,0521****
Accident risk -0,1673**
Traffic volume 2 (municipal network) -0,9216****
Age 3 (> 54 years) -0,2054*
Education 3 (university degree) -0,4988****
Slopes -0,4873****
R-squared (R²) 0,879
Log Likelihood -102,43
Moran's I of residuals 0,34 (0,00)
Source: Vandenbulcke et al., 2011
Estimation OLS (y) ML (y)
Intercept 6,4124**** 3,2698****
Median income 0,0030 0,00852
Active men 0,0472**** 0,01673**
Age 2 (45-54 years) -0,0460**** -0,02505***
Young children -0,0567**** -0,0218****
Cycleways unsatisfaction -0,0127**** -0,0049****
Commuting distance -0,0114*** -0,00652**
Air quality 0,0141**** 0,00405
City size -0,0954**** -0,08747****
Bad health -0,0521**** -0,01889****
Accident risk -0,1673** -0,14495***
Traffic volume 2 (municipal network) -0,9216**** -0,46952****
Age 3 (> 54 years) -0,2054* -0,14503*
Education 3 (university degree) -0,4988**** -0,23034***
Slopes -0,4873**** -0,17630****
Lag coefficient (ρ) - 0,6015****
R-squared (R²) 0,879 -
Log Likelihood -102,43 33,68
Moran's I of residuals 0,34 (0,00) 0,01 (0,45)
Y = % commuter cyclists in commune i
Source: Vandenbulcke et al., 2011
SAR Model (LAG)
[Figure: residuals, OLS versus LAG]
Simpson’s paradox
Spatial LAG model + Regimes N-S
North South
Intercept 2,3084* 4,30951****
Median income 0,0311* -0,0027
Active men 0,0296** 0,0008
Age 2 (45-54 years) -0,0417** -0,0205***
Young children -0,0365*** -0,0247***
Cycleways unsatisfaction -0,0052*** -0,0045***
Commuting distance -0,0165*** -0,0047*
Air quality 0,01384**** -0,0054
City size -0,11459**** -0,03615****
Bad health -0,0098 -0,0146**
Accident risk -0,76319**** -0,14892****
Traffic volume 2 (municipal network) -0,2357 -0,4521**
Age 3 (> 54 years) -0,1074 -0,0680
Education 3 (university degree) -0,0968 -0,3132***
Slopes -0,1931** -0,19718****
Lag coefficient (ρ) 0,5362****
N 589 (NNorth = 308; NSouth = 281)
Log Likelihood 93,923
Y = % commuter cyclists in commune i
North = Flanders South = Wallonia & Brussels
Source: Vandenbulcke et al., 2011
Main results
– Demographic factors: e.g. gender, children
– Socio-economic: e.g. education
– Environmental & policy-related factors, e.g.:
• Dissatisfaction with cycle facilities
• Town size
• Accident risk
• Traffic volume
location 2 > location 1
Spatial factors?
Importance of space/location
Network location 1 Network location 2
Bicycle traffic =
? ?
? accident
street network
5.4 Model for i = addresses
• Binary Yi = 0,1 > logistic specification
• Corrections for
– Multicollinearity
– Heteroskedasticity
– Residual spatial autocorrelation > omitted variables? spatial models
• Spatial models (Bayesian framework)
– ICAR model… but fit not improved
– Hierarchical auto-logistic model
Methodology
Models based on accident-only data
Regression methods (e.g. multinomial logit models)
Issues: over-/under-dispersion, underreporting, etc.
Models based on surveys, road trajectories
Regression methods (e.g. logistic models)
Main issue: bias in the selection of road trajectories
Models based on case-controls?
Cases = accidents + controls = generated absences, yi = (0,1)
Regression methods (e.g. logistic models)
Advantage: estimation of risk, reduced statistical bias
Issues: no vehicle & human factors, selection of controls
Case-control strategy:
– Transportation (gravity-based models)
– Epidemiology (case-control studies)
– Ecology (generation of controls)
Data collection
• Accident risk = time-consuming process
– Accidents (cases) to be geocoded/located
– ‘Absences’ (controls) to be generated
• … but no rigorous sampling method > tricky and questionable results!
– Road network: exclude ‘unbikeable’ links
– Risk factors to be collected…
• Software requirements: GIS
Data collection: controls and absences
• Controls = locations without any accident (officially), supposed to be safe
• Generation of controls = random sampling of points along the road network, BUT:
– Proportional to bicycle traffic (stratified sampling)
– Exclude ‘black zones’ (hot spots of accidents) from the bikeable network
Stratified random sampling
Sampling intensity: potential bicycle traffic
1) Negative exponential function
2) 500 impedance functions
3) No edge effect
Sampling region: black zones / black spots (network kernel densities) excluded
Ncontrols = 4*Naccidents
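The generation of controls can be sketched as weighted random sampling. This is a simplified illustration, not the procedure of the cited study: it draws road segments instead of points along the network, and the segment records, the field names and the fixed seed are hypothetical. Segments in black zones are excluded, the remaining ones are drawn with probability proportional to potential bicycle traffic, and four controls are drawn per accident.

```python
import random


def sample_controls(segments, n_accidents, ratio=4, seed=0):
    """Draw ratio * n_accidents control segments, proportionally to bicycle traffic."""
    rng = random.Random(seed)                                # fixed seed for reproducibility
    eligible = [s for s in segments if not s["black_zone"]]  # sampling region
    weights = [s["traffic"] for s in eligible]               # sampling intensity
    chosen = rng.choices(eligible, weights=weights, k=ratio * n_accidents)
    return [s["id"] for s in chosen]


# Hypothetical road segments
segments = [
    {"id": "s1", "traffic": 120.0, "black_zone": False},
    {"id": "s2", "traffic": 30.0, "black_zone": False},
    {"id": "s3", "traffic": 300.0, "black_zone": True},
    {"id": "s4", "traffic": 60.0, "black_zone": False},
]
controls = sample_controls(segments, n_accidents=5)
print(len(controls), sorted(set(controls)))
```

Sampling with replacement is used here for simplicity; a real application would place distinct points along the selected links.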
Data collection: risk factors
Infrastructure factors
• Cycling facilities & contraflow cycling
• Discontinuities
• Parking areas & garages
• Bridge & funnels
• Crossroads & complexity
• Tram railways
• Traffic-calming areas
• Major roads
• Proximity city centre
• Distance to specific points of interest (e.g. schools, bus stops, etc.)
Traffic conditions
• Cars
• Trucks/lorries & buses
• Vans
Environmental factors
• Gradients
• Green blocks (parks, etc.)
Data collection: risk factors
• Advantage of GIS: combination of several datasets
• Accidents/controls
– ‘Attached’ variables
– ‘Crossings’
DATASET
Results: Modelling process
DEPENDENT VARIABLE (BINARY) Accident data (geocoded)
Controls/absences
INDEPENDENT VARIABLES (RISK FACTORS)
Infrastructure factors
Traffic conditions
Environment (physical)
MODELLING PROCESS
FINAL MODEL
Choice of the specification
Convergence diagnostics
Corrections for spatial effects
PREDICTIONS
GIS
Results: robust
Results: Predictions for a trajectory
Schuman’s roundabout
Tram railways
High traffic volume
Exit
High traffic volume
Succession of crossroads on a major road (Wetstraat/Rue de la Loi) + segregated cycling facility
End of a separated cycling facility at the crossroad
Residential ward
Residential ward + contraflow
Take home message
• Location(s) and distance(s)
• Scale: independence of scales; nested.
• COMPLEXITY of spatial processes
• UNCERTAINTY
SPACE BIASES
Spatial statistics
Large data sets
Spatial autocorrelation
Scales
Border/edge effects
MAUP (scale + zoning)
Heterogeneity
…
Readings
Data analysis
• Fotheringham A., Brunsdon C. & Charlton M. (2000), Quantitative Geography: Perspectives on Spatial Data Analysis. London, SAGE.
• Fotheringham A., Brunsdon C. & Charlton M. (2002), Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester.
• Bailey T. & Gatrell A. (1995), Interactive Spatial Data Analysis. Essex, UK, Longman.
• www.spatialanalysisonline.com
Road accidents in Belgium
• Thomas I. (1996), Spatial Data Aggregation. Exploratory Analysis of Road Accidents. AAP, 28:2, 251-264.
• Steenberghen T. et al. (2004), Intra-urban location of road accidents blackzones: a Belgian example. IJGIS, 18:2, 169-181.
• Vandenbulcke G., Thomas I., Int Panis L. (2014), Predicting cycling accident risk in Brussels: an innovative spatial case-control approach. AAP, 62, 341-357.
• Vandenbulcke G. et al. (2011), Bicycle commuting in Belgium: Spatial determinants and re-cycling strategies. TR-A, 45, 118-137.
Your exercise: 10 pages.
Take your own data set (if you do not have one, go to Census11) and « PLAY » with it. Get 3 variables: Y (your choice) + 1 « explanatory » X + a measure of distance.
1. Define/describe them very well; justify the scale (extent and grain) and its limitations.
2. EDA and ESDA + statistical map of the 3 variables. Compute correlations between variables for several extents and/or 2 levels of aggregation and/or 2 subsets.
3. Compute a simple OLS and map the residuals (compute spatial autocorrelation) for both levels of aggregation.
4. If possible, enhance the regression by adopting another method, e.g. correcting for spatial autocorrelation.
5. Critical and strong conclusion (incl. potentials, challenges, …)