gist 4302/5302: spatial analysis and modeling - point

44
GIST 4302/5302: Spatial Analysis and Modeling Point Pattern Analysis Guofeng Cao www.spatial.ttu.edu Department of Geosciences Texas Tech University [email protected] Spring 2018

Upload: others

Post on 09-Jun-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GIST 4302/5302: Spatial Analysis and Modeling - Point

GIST 4302/5302: Spatial Analysis and ModelingPoint Pattern Analysis

Guofeng Caowww.spatial.ttu.edu

Department of GeosciencesTexas Tech University

[email protected]

Spring 2018

Page 2: GIST 4302/5302: Spatial Analysis and Modeling - Point

Spatial Point Patterns

Characteristics:• set of n point locations with recorded “events”, e.g., locations oftrees, disease or crime incidents S = {s1, . . . , si , . . . , sn}

• point locations correspond to all possible events or to subsets ofthem

• attribute values also possible at same locations, e.g., tree diameter,magnitude of earthquakes (marked point pattern)W = {w1, . . . ,wi , . . . ,wn}

Analysis objectives:

• detect spatial clustering or repulsion, as opposed to completerandomness, of event locations (in space and time)

• if clustering detected, investigate possible relations with nearby“sources”

2/44

Page 3: GIST 4302/5302: Spatial Analysis and Modeling - Point

Spatial Point Patterns

Further issues:• analysis of point patterns over large areas should take into accountdistance distortions due to map projections

• boundaries of study area should not be arbitrary• analysis of sampled point patterns can be misleading• one-to-one correspondence between objects in study area and eventsin pattern

3/44

Page 4: GIST 4302/5302: Spatial Analysis and Modeling - Point

Simple Descriptive StatisticsMean center of a point pattern:

• point with coordinates s = (x , y):

x =∑n

i=1 wixi

∑ni=1 wi

and y =∑n

i=1 wiyi

∑ni=1 wi

• center of point pattern, or point with average x and y -coordinates

Median center of a point pattern:• both of the following two centers are called median centers,although they are essentially different (confusing!)

• the intersection between the median of the x and the y coordinates

• center for minimum distance: sc ∈ {s1, . . . , sn}s.t.minn∑i=1|si − sc |

• the first type of median center is not unique, and there is no closedform for the second type

• p-median problem (a typical problem in spatial optimization ): the

problem of locating p “facilities” relative to a set of “customers” such that the sum of

the shortest demand weighted distance between “customers” and “facilities” is minimized

hoti4/44

Page 5: GIST 4302/5302: Spatial Analysis and Modeling - Point

Simple Descriptive StatisticsChanges of population center (year 1790-2000):

5/44

Page 6: GIST 4302/5302: Spatial Analysis and Modeling - Point

Descriptive Statistics

Standard distance of a point pattern:

• average squared deviations of x and y coordinates from theirrespective mean:

dstd =

√∑n

i=1(xi − x)2 + ∑ni=1(yi − y)2

n− 2

• related to standard deviation of coordinates, a summary circle(centered at s with radius dstd ) of a point pattern

Standard deviational ellipse:

• Taking directional effects into account for anisotropy cases• Please refer to Levine and Associates, 2004 for calculations

6/44

Page 7: GIST 4302/5302: Spatial Analysis and Modeling - Point

Descriptive Statistics

Examples:

605000 610000 615000 620000

4825

000

4830

000

4835

000

4840

000

Easting (m)

Nor

thin

g (m

)

SDD

600000 605000 610000 615000 62000048

2000

048

2500

048

3000

048

3500

048

4000

0

Easting (m)

Nor

thin

g (m

)

SDE

Remarks:• indicates overall shape and center of point pattern• do not suffice to fully specify a spatial point pattern

7/44

Page 8: GIST 4302/5302: Spatial Analysis and Modeling - Point

Point Pattern Analysis Methods

1st order (i.e., intensity): absolute location of events on map:

• Quadrat methods• Density Estimation (KDE)• Moran’s I and Geary’s C

2nd order (i.e., interactions): interaction of events:

• Nearest neighbor distance• Distance functions G, K, F, L• Getis-Ord Gi* and Anselin local Moran’s I

8/44

Page 9: GIST 4302/5302: Spatial Analysis and Modeling - Point

Quadrat methods

Consider a point pattern with n events within a study region A of area |A|

Global intensity:

λ =n

|A| =#of events withinA

|A|

Local intensity via quadrats

1. partition A into L sub-regions Al , l = 1, . . . , L of equal area |Al |(also called quadrats)

2. count number of events n(Al ) in each sub-region Al

3. convert sample counts into estimated intensity rates as:

λ(Al ) =n(Al )

|Al |

9/44

Page 10: GIST 4302/5302: Spatial Analysis and Modeling - Point

Quadrat methods bei

337 608 162 73 105 268

422 49 17 52 128 146

231 134 92 406 310 64

• estimated rates λ(Al ) over set of quadrats• reveal large-scale patterns in intensity variation over A• larger quadrats yield smoother intensity maps; smaller quadrats yield‘spiky’ intensity maps

• size, origin, and shape of quadrats is critical (recall: MAUP)• only first-order effects are captured

10/44

Page 11: GIST 4302/5302: Spatial Analysis and Modeling - Point

Dependence of intensity on a covariate (Inhomogeneous cases) bei slope

0.05

0.1

0.15

0.2

0.25

0.3

reclass of slope

12

34

quadrat based on reclass−ed slope

12

34

271

9841028

1321

0.00 0.05 0.10 0.15 0.20 0.25 0.30

intensity vs. slope

ρ(Z)ρhi(Z)ρlo(Z)

11/44

Page 12: GIST 4302/5302: Spatial Analysis and Modeling - Point

Kernel Density EstimationProcedure of Kernel Density Estimation (KDE)

1. define a kernel K (s; r) of radius (or bandwidth) r centered at anyarbitrary location s

2. estimate local intensity at s as:

λ(s) =1n

n

∑i=1

K (si − s; r)

3. repeat estimation for all points s in the study region to create adensity map

12/44

Page 13: GIST 4302/5302: Spatial Analysis and Modeling - Point

Kernel Density Estimation

An illustration of the KDE procedure in 1D

13/44

Page 14: GIST 4302/5302: Spatial Analysis and Modeling - Point

Kernel Density Estimation

Example for the previous dataset: den

0.00

50.

015

x

y

den

den

14/44

Page 15: GIST 4302/5302: Spatial Analysis and Modeling - Point

Kernel Density EstimationExample with 2km bandwidth

15/44

Page 16: GIST 4302/5302: Spatial Analysis and Modeling - Point

Kernel Density EstimationExample with 10km bandwidth

16/44

Page 17: GIST 4302/5302: Spatial Analysis and Modeling - Point

Kernel Density EstimationExample with 40km bandwidth

17/44

Page 18: GIST 4302/5302: Spatial Analysis and Modeling - Point

Kernel Density Estimation

Comments• Choice of kernel function is not critical (Diggle, 1985)• Choice of bandwidth, or degree of smoothing critical:

• Small bandwidth → spiky results• Large bandwidth → loss of detail

• Multi-scale analyses can use these bandwidth characteristics toinvestigate both broad trends and localized variation

• How to choose bandwidth: choose the degree of smoothingsubjectively, by eye, or by formula (Diggle)

• could define local bandwidth based on function of presence of eventsin neighborhood of s (i.e., adaptive kernel estimation)

What does the output of KDE means?

18/44

Page 19: GIST 4302/5302: Spatial Analysis and Modeling - Point

Distance-based Descriptors of Point Patterns• Distances: accessing second order effects

• Event-to-event distance: distance dij between event at arbitrarylocation si and another event at another arbitrary location sj :

dij =√(xi − xj )2 + (yi − yj )2

• Point-to-event distance: distance dpj between a randomly chosenpoint at location sp and an event at location sj :

dpj =√(xp − xj )2 + (yi − yj )2

• Event-to-nearest-neighbour distance: distance dmin(si ) between anevent at location si and its nearest neighbor event:

dmin(si ) = min{dij , j 6= i , j = 1, . . . , n}

• Point-to-nearest-neighbour distance (i.e., empty space distance):distance dmin(sp) between a randomly chosen point at location spand its nearest neighbor event:

dmin(sp) = min{dpj , j = 1, . . . , n}

19/44

Page 20: GIST 4302/5302: Spatial Analysis and Modeling - Point

Event-to-Nearest-Neighbor Distances

606000 610000 614000 61800048

2800

048

3200

048

3600

048

4000

0

X

Y

1

2

3

4

5

1 2 3 4 51 0.00 11947.70 16042.65 3481.22 10742.982 11947.70 0.00 5126.79 15219.58 1599.073 16042.65 5126.79 0.00 19481.59 6720.594 3481.22 15219.58 19481.59 0.00 13913.705 10742.98 1599.07 6720.59 13913.70 0.00

Table: Euclidean distance matrix

20/44

Page 21: GIST 4302/5302: Spatial Analysis and Modeling - Point

Event-to-Nearest-Neighbor DistancesNearest neighour distances

103 clustering points 103 clustering points

cluster

nndist(cluster)

Fre

quen

cy

0.00 0.05 0.10 0.15 0.20

05

1015

2025

3035

uniform

nndist(unif)

Fre

quen

cy

0.00 0.02 0.04 0.06 0.08 0.10 0.120

24

68

10

• Mean nearest neighbour distance: Average of all dmin(si ) values

dmin =1n

n

∑i=1

dmin(si )

21/44

Page 22: GIST 4302/5302: Spatial Analysis and Modeling - Point

The G function• Definition: nearest neighbour distance function, i.e., proportion ofevent-to-nearest-neighbor distances dmin(si ) no greater than givendistance cutoff d , estimated as:

G (d) =#{dmin(si ) < d , i = 1, . . . , n}

n

• alternative definition: cumulative distribution function (CDF) of alln event-to-nearest-neighbor distances; instead of computing averagedmin of dmin values, compute their CDF

• the G function provides information on event proximity• example for previous clustering point pattern:

Histogram of dm[which(dm > 0)]

dm[which(dm > 0)]

Fre

quen

cy

0.00 0.05 0.10 0.15 0.20

05

1015

2025

3035

0.00 0.05 0.10 0.15 0.20

0.0

0.2

0.4

0.6

0.8

1.0

dm[which(dm > 0)]

Pro

port

ion

<=

x

n:103 m:0

22/44

Page 23: GIST 4302/5302: Spatial Analysis and Modeling - Point

Examples of G function 103 clustering points 103 clustering points

0.00 0.05 0.10 0.15

0.0

0.2

0.4

0.6

0.8

1.0

ghatcluster

r

Gra

w(r

)

0.00 0.05 0.10 0.15

0.0

0.2

0.4

0.6

0.8

1.0

ghatunif

r

Gra

w(r

)

Expected plot:

• for clustered events, G (d) rises sharply at short distances, and thenlevels off at larger d-values

• for randomly-spaced events, G (d) rises gradually up to the distanceat which most events are spaced, and then increases sharply

23/44

Page 24: GIST 4302/5302: Spatial Analysis and Modeling - Point

The K function

Working with pair-wise distances&looking beyond nearest neighours

Concept

1. construct set of concentric circles (of increasing radius d) aroundeach event

2. count number of events in each distance “band”3. cumulative number of events up to radius d around all events

becomes the sample K function K (d)

24/44

Page 25: GIST 4302/5302: Spatial Analysis and Modeling - Point

The K function

Working with pair-wise distances&looking beyond nearest neighours

• Formal definition:

K (d) =1λ

#{dij ≤ d , i , j = 1, . . . , n}n

=|A|n

#{dij ≤ d , i , j = 1, . . . , n}n

= |A|(proportion of event-to-event distance ≤ d)

• In other words, the K (d) is the sample cumulative distributionfunction (CDF) of all n2 event-to-event distances, scaled by |A|

25/44

Page 26: GIST 4302/5302: Spatial Analysis and Modeling - Point

Examples of Event-to-Event Distance Histogram and CDFs 103 clustering points 103 uniform points

cluster histogram

pairdist(cluster)

Fre

quen

cy

0.0 0.2 0.4 0.6 0.8 1.0 1.2

020

040

060

080

0

uniform histogram

pairdist(unif)F

requ

ency

0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4

020

040

060

0

0.0 0.2 0.4 0.6 0.8 1.0 1.2

0.0

0.2

0.4

0.6

0.8

1.0

cluster CDF

pairdist(cluster)

Pro

port

ion

<=

x

n:10609 m:0

0.0 0.2 0.4 0.6 0.8 1.0 1.2

0.0

0.2

0.4

0.6

0.8

1.0

uniform CDF

pairdist(unif)

Pro

port

ion

<=

x

n:10609 m:0

• for clustered events, there are multiple bumps in the CDF of E2Edistances due to the grouping of events in space

26/44

Page 27: GIST 4302/5302: Spatial Analysis and Modeling - Point

Examples of K functions 103 clustering points 103 uniform points

0.00 0.05 0.10 0.15 0.20 0.25

0.00

0.05

0.10

0.15

0.20

Khatcluster

r

Ku

n(r

)

0.00 0.05 0.10 0.15 0.20 0.25

0.00

0.05

0.10

0.15

Khatunif

r

Ku

n(r

)

• the sample K function K (d) is monotonically increasing and is ascaled (by area |A|) version of the CDF of E2E distances

27/44

Page 28: GIST 4302/5302: Spatial Analysis and Modeling - Point

Recap

Spatial point patterns

• set of n point locations with recorded “events”

Describing the first-order effect

• overal intensity• local intensity (quadrat count and kernel density estimation)

Describing the second-order effect• nearest neighbour distances

• the G function• pair-wise distances

• the K function

28/44

Page 29: GIST 4302/5302: Spatial Analysis and Modeling - Point

CaveatsCaveats:

• theoretical G, F, K functions are defined and estimated under theassumption that the point process is stationary (homogeneous)

• these summary functions do not completely characterise the process• if the process is not stationary, deviations between the empirical andtheoretical functions (e.g. K and K) are not necessarily evidence ofinterpoint interaction, since they may also be attributable tovariations in intensity

Example

caveat #2

0.00 0.05 0.10 0.15 0.20 0.25

0.00

0.05

0.10

0.15

0.20

caveat #2

r

K(r

)

Kiso(r)Ktrans(r)Kbord(r)Kpois(r)

caveat #3

●●

●●

●●

●●

●●

●●

●●

0.00 0.05 0.10 0.15 0.20 0.25

0.00

0.05

0.10

0.15

0.20

0.25

0.30

caveat #3

r

K(r

)

Kiso(r)Ktrans(r)Kbord(r)Kpois(r)

29/44

Page 30: GIST 4302/5302: Spatial Analysis and Modeling - Point

Descriptive vs Statistical Point Pattern Analysis

Descriptive analysis:

• set of quantitative (and graphical) tools for characterizing spatialpoint patterns

• different tools are appropriate for investigating first- or second-ordereffects (e.g., kernel density estimation versus sample G function)

• can shed light onto whether points are clustered or evenlydistributed in space

Limitation:• no assessment of how clustered or how evenly-spaced is an observedpoint pattern

• no yardstick against which to compare observed values (or graph) ofresults

30/44

Page 31: GIST 4302/5302: Spatial Analysis and Modeling - Point

Descriptive vs Statistical Point Pattern Analysis

Statistical analysis:

• assessment of whether an observed point pattern can be regarded asone (out of many) realizations from a particular spatial process

• measures of confidence with which the above assessment can bemade (how likely is that the observed pattern is a realization of aparticular spatial process)

Are daisies randomly distributed in your garden?

31/44

Page 32: GIST 4302/5302: Spatial Analysis and Modeling - Point

Complete Spatial Randomness (CSR)Complete Spatial Randomness (CSR)

• yardstick, reference model that observed point patterns could becompared with, i.e., null hypothesis

• = homogeneous (uniform) Poisson point process• basic properties:

• the number of points falling in any region A has a Poissondistribution with mean λ|A|

• given that there are n points inside region A, the locations of thesepoints are i.i.d. and uniformly distributed inside A

• the contents of two disjoint regions A and B are independent

Example: csr example #1 csr example #2

32/44

Page 33: GIST 4302/5302: Spatial Analysis and Modeling - Point

Quadrat counting test for CSR

Quadrat counting test

• partition study area A into L sub-regions (quadrats), A1, . . . ,AL

• count number of events n(Al ) in each sub-region Al

• Under the null hypothesis of CSR, the n(Al ) are i.i.d. Poissonrandom variables with the same expected value

• The Pearson χ2 goodness-of-fit test can be used• test statistics: Pearson residual ∑l ε(Al )

2

ε(Al ) =n(Al )− µ(Al )√

µ(Al ),

where µ(Al ) indicates the expected number of events in Al• ∑l=1 ε(Al )

2 is assumed to follow χ2 distribution

33/44

Page 34: GIST 4302/5302: Spatial Analysis and Modeling - Point

Quadrat counting test for CSR

Example

unif

18 18 11

19 11 26

17.2 17.2 17.2

17.2 17.2 17.2

0.2 0.2 −1.5

0.44 −1.5 2.1

• three values indicate the number of observations, CSR-expectednumber of observations, and the Pearson residuals

• p-value = 0.617

34/44

Page 35: GIST 4302/5302: Spatial Analysis and Modeling - Point

Nearest Neighbour Index (NNI) test under CSR

Nearest neighbour index

• Compares the mean of the distance observed between each pointand its nearest neighbor (dmin) and the expected mean distanceunder CSR E (dmin)

NNI =dmin

E (dmin)

• Under CSR, we have:

E (dmin) =1

2√

λ

σ(dmin) =0.26136√n2/A

35/44

Page 36: GIST 4302/5302: Spatial Analysis and Modeling - Point

Nearest Neighbour Index (NNI) test under CSR

Nearest neighbour index test

• Test statistics:

z =dmin − E (dmin)

σ(dmin),

• z is assumed to follow Gaussian distribution, thus, if z < −1, 96 orz > 1.96, we are 95% confident that the distribution is notrandomly distributed

36/44

Page 37: GIST 4302/5302: Spatial Analysis and Modeling - Point

The G Function under CSR

• The G function is a function of nearest-neighbour distances• For a homogeneous Poisson point process of intensity λ, thenearest-neighour distance distribution (the G function) is known tobe:

G (d) = 1− exp{−λπd2}

Example bei

0 5 10 15 20

0.0

0.2

0.4

0.6

0.8

1.0

Gtest

r (metres)

G(r

)

Graw(r)Gpois(r)

37/44

Page 38: GIST 4302/5302: Spatial Analysis and Modeling - Point

The F Function under CSR• The F function is a function of empty space distances• For a homogeneous Poisson point process of intensity λ, the emptyspace distance distribution (the F function)is known to be:

F (d) = 1− exp{−λπd2}

• Equivalent to the G function• Intuitively, because points (events) of the Poisson process areindependent of each other, the knowledge that a random point is aevent of a point pattern does not affect any other event of theprocess

Example bei

0 5 10 15 20

0.0

0.2

0.4

0.6

0.8

1.0

Ftest

r (metres)

F(r

)

Fraw(r)Fpois(r)

38/44

Page 39: GIST 4302/5302: Spatial Analysis and Modeling - Point

The K Function under CSR• The K function is a function of pair-wise distances• For a homogeneous Poisson point process of intensity λ, thepair-wise distance distribution (the K function) is known to be:

K (d) = πd2

• A commonly-used transformation of K is the L-function:

L(d) =

√K (d)

π= d

Example bei

0 20 40 60 80 100 120

010

000

2000

030

000

4000

050

000

Ktest

r (metres)

K(r

)

Kun(r)Kpois(r)

0 20 40 60 80 100 120

020

4060

8010

012

0

Ltest

r (metres)

L(r

)

Lun(r)Lpois(r)

39/44

Page 40: GIST 4302/5302: Spatial Analysis and Modeling - Point

Monte Carlo test

• because of random variability, we will never obtain perfectagreement between sample functions (say the K function) withtheoretical functions (the theoretical K functions), even with acompletely random pattern

Example

0.00 0.05 0.10 0.15 0.20 0.25

0.00

0.05

0.10

0.15

Kest(rpoispp(50), correction = "none")

r

Ku

n(r

)

40/44

Page 41: GIST 4302/5302: Spatial Analysis and Modeling - Point

Monte Carlo test

• A Monte Carlo test is a test based on simulations from the nullhypothesis

• Basic procedures:• generate M independent simulations of CSR inside the study region A• compute the estimated K functions for each of these realisations, say

K (j)(r) for j = 1, . . . ,M• obtain the pointwise upper and lower envelopes of these simulatedcurves

• not a confidence interval

Example

0.00 0.05 0.10 0.15 0.20 0.25

0.00

0.05

0.10

0.15

0.20

pointwise envelopes

r

K(r

)

Kobs(r)Ktheo(r)Khi(r)Klo(r)

41/44

Page 42: GIST 4302/5302: Spatial Analysis and Modeling - Point

RecapStatitsical analysis of spatial point patterns:

• allows to quantify departure of results obtained via exploratory tools,e.g., G (d), from expected such results derived under specific nullhypotheses, here CSR hypothesis

• can be used to assess to what extent observed point patterns can beregarded as realizations from a particular spatial process (here CSR)

• Same concepts can be applied for hypothesis of other types of pointprocesses (e.g., Poisson cluster process, Cox process)

Sampling distribution of a test statistics

• lies at the heart of any statistical hypothesis testing procedure, andis tied to a particular null hypothesis

• simulation and analytical derivations are two alternative ways ofcomputing such sampling distributions (the latter being increasinglyreplaced by the former)

Edge Effects42/44

Page 43: GIST 4302/5302: Spatial Analysis and Modeling - Point

RecapScale effects

• Wolf pack example

• Nearest neighour distance (NN distance, G,F functions) vs K function

Edge effects

43/44

Page 44: GIST 4302/5302: Spatial Analysis and Modeling - Point

Recap

Extended into line processes

• Line density

44/44