applied spatial statistics in r, section 2 - huit sites hostingzhukov/spatial2.pdf · ·...
TRANSCRIPT
Applied Spatial Statistics in R, Section 2Spatial Autocorrelation
Yuri M. Zhukov
IQSS, Harvard University
January 16, 2010
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 1 / 1
Spatial Autocorrelation
Outline
1 IntroductionWhy use spatial methods?The spatial autoregressive data generating process
2 Spatial Data and Basic Visualization in R
PointsPolygonsGrids
3 Spatial Autocorrelation
4 Spatial Weights
5 Point Processes
6 Geostatistics7 Spatial Regression
Models for continuous dependent variablesModels for categorical dependent variablesSpatiotemporal models
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 2 / 1
Spatial Autocorrelation
What is Spatial Autocorrelation?
Spatial autocorrelation measures the degree to which a phenomenonof interest is correlated to itself in space (Cliff and Ord 1973, 1981).
Tests of spatial autocorrelation examine whether the observed valueof a variable at one location is independent of values of that variableat neighboring locations.
Positive spatial autocorrelation indicates that similar values appearclose to each other, or cluster, in space
Negative spatial autocorrelation indicates that neighboring values aredissimilar or, equivalenty, that similar values are dispersed.
Null spatial autocorrelation indicates that the spatial pattern israndom.
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 3 / 1
Spatial Autocorrelation
Global autocorrelation: Moran’s I
The Moran’s I coefficient calculates the ratio between the product ofthe variable of interest and its spatial lag, with the product of thevariable of interest, adjusted for the spatial weights used.
I =n∑n
i=1
∑nj=1 wij
∑ni=1
∑nj=1 wij(yi − y)(yj − y)∑n
i=1(yi − y)2
where yi is the value of a variable for the ith observation, y is thesample mean and wij is the spatial weight of the connection between iand j .
Values range from –1 (perfect dispersion) to +1 (perfect correlation).A zero value indicates a random spatial pattern.
Under the null hypothesis of no autocorrelation, E[I] = −1n−1
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 4 / 1
Spatial Autocorrelation
Global autocorrelation: Moran’s I
Calculating the variance of Moran’s I is a little more involved:
Var(I) =ns1 − s2s3
(n − 1)(n − 2)(n − 3)(∑
i
∑j wij)2
s1 =(n2 − 3n + 3)(1
2
∑i
∑j
(wij + wji )2)
− n(∑
i
(∑j
wij +∑j
wji )2)
+ 3(∑i
∑j
wij)2
s2 =n−1
∑i (yi − x)4
(n−1∑
i (yi − x)2)2
s3 =1
2
∑i
∑j
(wij + wji )2 − 2n
(1
2
∑i
∑j
(wij + wji )2)
+ 6(∑
i
∑j
wij
)2
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 5 / 1
Spatial Autocorrelation
Global autocorrelation: Geary’s C
The Geary’s C uses the sum of squared differences between pairs ofdata values as its measure of covariation.
C =(n − 1)
∑i
∑j wij(yi − yj)
2
2(∑
i
∑j wij)
∑i (yi − y)2
where yi is the value of a variable for the ith observation, y is thesample mean and wij is the spatial weight of the connection between iand j .
Values range from 0 (perfect correlation) to 2 (perfect dispersion). Avalue of 1 indicates a random spatial pattern.
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 6 / 1
Spatial Autocorrelation
Global autocorrelation: Join Counts
When the variable of interest is categorical, a join count analysis canbe used to assess the degree of clustering or dispersion.
A binary variable is mapped in two colors (Black & White), such thata join, or edge, is classified as either WW (0-0), BB (1-1), or BW(1-0).
Join count statistics can show
positive spatial autocorrelation (clustering) if the number of BW joinsis significantly lower than what we would expect by chance,negative spatial autocorrelation (dispersion) if the number of BW joinsis significantly higher than what we would expect by chance,null spatial autocorrelation (random pattern) if the number of BWjoins is approximately the same as what we would expect by chance.
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 7 / 1
Spatial Autocorrelation
Global autocorrelation: Join Counts
By the naive definition of probability, if we have nB Black units andnW = n − nB White units, the respective probabilities of observingthe two types of units are:
PB =nBn
PW =n − nB
n= 1− PB
The probabilities of BB and WW in two adjacent cells are
PBB = PBPB = P2B PWW = (1− PB)(1− PB) = (1− PB)2
The probability of BW in two adjacent cells is
PBW = PB(1− PB) + (1− PB)PB = 2PB(1− PB)
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 8 / 1
Spatial Autocorrelation
Global autocorrelation: Join Counts
The expected counts of each type of join are:
E[BB] =1
2
∑i
∑j
wijP2B E[WW ] =
1
2
∑i
∑j
wij(1− PB)2
E[BW ] =1
2
∑i
∑j
wij2PB(1− PB)
Where 12
∑i
∑j wij is the total number of joins (of any type) on a
map, assuming a binary connectivity matrix.The observed counts are:
BB =1
2
∑i
∑j
wijyiyj WW =1
2
∑i
∑j
wij(1− yi )(1− yj)
BW =1
2
∑i
∑j
wij(yi − yj)2
where yi = 1 if unit i is Black and yi = 0 if White.Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 9 / 1
Spatial Autocorrelation
Global autocorrelation: Join Counts
The variance of BW is calculated as
σ2BW =E[BW 2]− E[BW ]2
=1
4
(2s2nB(n − nB)
n(n − 1)+
(s3 − s1)nB(n − nB)
n(n − 1)
+4(s2
1 + s2 − s3)nB(nB − 1)(n − nB)(n − nB − 1)
n(n − 1)(n − 2)(n − 3)
)− E[BW ]2
s1 =∑i
∑j
wij
s2 =1
2
∑i
∑j
(wij − wji )2
s3 =∑i
(∑j
wij +∑j
wji )2
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 10 / 1
Spatial Autocorrelation
Global autocorrelation: Join Counts
A test statistic for the BW join count is
Z(BW ) =BW − E[BW ]√
σ2BW
The join count statistic is assumed to be asymptotically normallydistributed under the null hypothesis of no spatial autocorrelation.
The test of significance is then provided by evaluating the BWstatistic as a standard deviate (Cliff and Ord, 1981).
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 11 / 1
Spatial Autocorrelation
Local autocorrelation
Global tests for spatial autocorrelation are calculated from localrelationships between observed values at spatial units and theirneighbors.
It is possible to break these measures down into their components,thus constructing local tests for spatial autocorrelation.
These tests can be used to detect
Clusters, or units with similar neighborsHotspots, or units with dissimilar neighbors
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 12 / 1
Spatial Autocorrelation
Local autocorrelation
Below is a scatterplot of county vote for Bush and its spatial lag (averagevote received in neighboring counties). The Moran’s I coefficient is drawnas the slope of the linear relationship between the two. The plot ispartitioned into four quadrants: low-low, low-high, high-low and high-high.
0 20 40 60 80 100
020
4060
80100
Moran Scatterplot
Percent for Bush
Per
cent
for B
ush
(Spa
tial L
ag)
GlacierRolette
St. Louis
San Juan
Roosevelt
Clallam
Jefferson
GarfieldMissoula
Custer
DouglasAshland
SiouxDeer Lodge
Carter
Big Horn
Harding
Multnomah
Dewey
Valley
Ramsey
Yellowstone National Park (Part)
ClintonLamoilleChittenden
Clark
Buffalo
Blaine Teton Madison
Mower
Ada
Shannon Todd
Windham
Bannock
Sioux
Berkshire
Franklin
Cassia
HampshireSuffolk
FranklinBlaineBox ElderRich
Arthur
Geauga
Banner
Cuyahoga
Dukes
WeberLincoln
Nantucket
Morgan
Summit
Mahoning
Bergen
Bronx
Salt Lake
Nassau
Essex
New York
UintahDuchesne
Hudson
Queens
Kings
Hayes
Richmond
Utah
Philadelphia
Marion
Carbon
DenverPitkin
Montgomery
Monroe
Anne ArundelPrince Georges
Washington
St. Louis
Sonoma
Monroe
Marin
Elliott
San Miguel
Garfield
DoloresAlameda
San FranciscoSan Mateo
Costilla
Jackson
KnottSanta Clara
Santa Cruz
San JuanRio Arriba
Apache
TaosHertfordWarren
Person
Ochiltree
OrangeDurham
Kingfisher
Roberts
Santa Fe
McKinley
Los Alamos
San Miguel Tipton
Potter
Oldham
St. FrancisTorrance De Soto
Tunica
Parmer
CatronGrant
Florence
Foard
Fulton
Clarke Jefferson
TaliaferroCarroll
Clayton
ChicotFayette
Hancock
Bamberg
Holmes
DorchesterGlascock
NoxubeeGreene Allendale
West Carroll
Dallas
Sumter
Shackelford
Perry
Warren
MaconEffingham
Marengo
SchleyLowndes
BullockWilcox
Beaufort
Jasper
Claiborne
SterlingGlasscock
Loving
Lee
Jefferson Davis
Crane
Miller
Decatur
Grady
Gadsden Livingston
DuvalJefferson
Orleans
St. Bernard
Alachua
LaFourche
Bexar
ZavalaMcMullen
Dimmit
Duval
BrooksStarr
Willacy
Menominee
Orleans
Monroe
WayneDane
Milwaukee
LivingstonWayneWashtenaw
Broome Cook
Monroe
Lagrange
Baltimore City
Fairfax
ArlingtonAlexandria
HarrisonburgStaunton
Charlottesville
Clifton ForgeRichmond City
Colonial Heights
Petersburg
Poquoson CitySouth Boston
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 13 / 1
Spatial Autocorrelation
Local autocorrelation: Local Moran’s I
A local Moran’s I coefficient for unit i can be constructed as one ofthe n components which comprise the global test:
Ii =(yi − y)
∑nj=1 wij(yj − y)∑n
i=1(yi−y)2
n
As with global statistics, we assume that the global mean y is anadequate representation of the variable of interest.
As before, local statistics can be tested for divergence from expectedvalues, under assumptions of normality.
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 14 / 1
Spatial Autocorrelation
Local autocorrelation: Local Moran’s IBelow is a plot of Local Moran |z |-scores for the 2004 PresidentialElections. Higher absolute values of z scores (red) indicate the presence of“hot spots”, where the percentage of the vote received by President Bushwas significantly different from that in neighboring counties.
Local Moran's I (|z| scores)
0
5
10
15
20
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 15 / 1
Spatial Autocorrelation
Words of Caution
1 Autocorrelation tests are highly sensitive to spatial patterning in thevariable of interest from any source. But by assuming that the meanmodel removes such systematic spatial patterning, spatialautocorrelation tests do not always produce useful insights into theDGP.
2 These tests are also highly sensitive to one’s choice of spatial weights.Where the weights do not reflect the “true” structure of spatialinteraction, estimated autocorrelation (or lack thereof) may actuallystem from misspecification.
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 16 / 1
Spatial Autocorrelation
Words of Caution
1 Autocorrelation tests are highly sensitive to spatial patterning in thevariable of interest from any source. But by assuming that the meanmodel removes such systematic spatial patterning, spatialautocorrelation tests do not always produce useful insights into theDGP.
2 These tests are also highly sensitive to one’s choice of spatial weights.Where the weights do not reflect the “true” structure of spatialinteraction, estimated autocorrelation (or lack thereof) may actuallystem from misspecification.
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 16 / 1
Spatial Autocorrelation
Words of Caution
Below is a correlogram of Moran’s I coefficients for Polity IV countrydemocracy scores in 2008. The x-axis represents distances betweencountry capitals, in kilometers. Here, democracy is significantly (p ≤ .05)spatially autocorrelated only at distances of 3,000 km and below. So,autocorrelation estimates will depend highly on choice of lag distance.
0 2000 6000 10000 14000 18000 22000 26000 30000 34000 38000
-1.0
-0.6
-0.2
d2m(corD1[, 1])
Mor
an's
I C
oeffi
cien
t
0 2000 6000 10000 14000 18000 22000 26000 30000 34000 38000
0.00.20.40.60.81.0
p-value
p ≤ 0.05
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 17 / 1
Spatial Autocorrelation
Words of Caution
1 Autocorrelation tests are highly sensitive to spatial patterning in thevariable of interest from any source. But by assuming that the meanmodel removes such systematic spatial patterning, spatialautocorrelation tests do not always produce useful insights into theDGP.
2 These tests are also highly sensitive to one’s choice of spatial weights.Where the weights do not reflect the “true” structure of spatialinteraction, estimated autocorrelation (or lack thereof) may actuallystem from misspecification.
3 As originally designed, spatial autocorrelation tests assumed there areno neighborless units in the study area. When this assumption isviolated, the size of n may be adjusted (reduced) to reflect the factthat some units are effectively being ignored. Not doing so willgenerally bias the absolute value for the autocorrelation statisticupward and the variance downward.
Yuri M. Zhukov (IQSS, Harvard University) Applied Spatial Statistics in R, Section 2 January 16, 2010 18 / 1