statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/r/st.pdf · statistics...

33
Statistics for spatio-temporal data: an introduction Edzer Pebesma ifgi Institute for Geoinformatics University of Münster Geostat Summer School, Bergen, 15-21 Jun 2014 1 / 30

Upload: others

Post on 14-Jun-2020

16 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Statistics for spatio-temporal data:an introduction

Edzer Pebesma

1. Das neue IfGI-Logo 1.6 Logovarianten

Logo für den Einsatz in internationalen bzw.

englischsprachigen Präsentationen.

Einsatzbereiche: Briefbogen, Visitenkarte,

Titelblätter etc.

Mindestgröße 45 mm Breite

ifgi

ifgi

Institute for GeoinformaticsUniversity of Münster

ifgi

Institut für GeoinformatikUniversität Münster

Logo für den Einsatz in nationalen bzw.

deutschsprachigen Präsentationen.

Einsatzbereiche: Briefbogen, Visitenkarte,

Titelblätter etc.

Mindestgröße 45 mm Breite

Dieses Logo kann bei Anwendungen

eingesetzt werden, wo das Logo besonders

klein erscheint.

Einsatzbereiche: Sponsorenlogo,

Power-Point

Größe bis 40 mm Breite

Geostat Summer School, Bergen, 15-21 Jun 2014

1 / 30

Page 2: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6
Page 3: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6
Page 4: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6
Page 5: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6
Page 6: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6
Page 7: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6
Page 8: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

All data are spatio-temporal

1. There are no pure-spatial data. Maps reflect either

I a snapshot in time (remote sensing image)

I an aggregate over a time period (e.g., interpolated yearlyaverage temperature, or yearly aggregated daily interpolations)

I something that is constant over a period of time (politicalboundary)

I a seemingly non-changing phenomenon (geology)

2. There are no pure-temporal data. Time series reflect either

I spatially aggregated values (global temperature curves)

I a single spatial location (air quality sensor DEUB032, at8.191934E,50.93033N)

I vaguely located, or universal aggregates (world market prices,stock quotes)

8 / 30

Page 9: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Functions

We can write function y = f (x ) as:

f : X → Y

which means that for any X , we have a corresponding Y .

X ×Y

is the Carthesian product, the collection of all ordered pairs (x , y)(Wikipedia): “A function f from X to Y is a subset of theCartesian product X ×Y subject to the following condition: everyelement of X is the first component of one and only one orderedpair in the subset. In other words, for every x in X there is exactlyone element y such that the ordered pair (x , y) is contained in thesubset defining the function f .”X is called the domain, Y the codomain or range

9 / 30

Page 10: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Functions

We can write function y = f (x ) as:

f : X → Y

which means that for any X , we have a corresponding Y .

X ×Y

is the Carthesian product, the collection of all ordered pairs (x , y)(Wikipedia): “A function f from X to Y is a subset of theCartesian product X ×Y subject to the following condition: everyelement of X is the first component of one and only one orderedpair in the subset. In other words, for every x in X there is exactlyone element y such that the ordered pair (x , y) is contained in thesubset defining the function f .”X is called the domain, Y the codomain or range

9 / 30

Page 11: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Functions

We can write function y = f (x ) as:

f : X → Y

which means that for any X , we have a corresponding Y .

X ×Y

is the Carthesian product, the collection of all ordered pairs (x , y)(Wikipedia): “A function f from X to Y is a subset of theCartesian product X ×Y subject to the following condition: everyelement of X is the first component of one and only one orderedpair in the subset. In other words, for every x in X there is exactlyone element y such that the ordered pair (x , y) is contained in thesubset defining the function f .”X is called the domain, Y the codomain or range

9 / 30

Page 12: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Functions

We can write function y = f (x ) as:

f : X → Y

which means that for any X , we have a corresponding Y .

X ×Y

is the Carthesian product, the collection of all ordered pairs (x , y)(Wikipedia): “A function f from X to Y is a subset of theCartesian product X ×Y subject to the following condition: everyelement of X is the first component of one and only one orderedpair in the subset. In other words, for every x in X there is exactlyone element y such that the ordered pair (x , y) is contained in thesubset defining the function f .”X is called the domain, Y the codomain or range

9 / 30

Page 13: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Inverse functions

for a set of values B in the range,

f −1(B) = x ∈ X : f (x ) ∈ B

for a single value b in the range,

f −1(b) = x ∈ X : f (x ) = b

the resulting set may contain any number of elements.Example: f : X → X 2, the range (Y ) value 4 has correspondingdomain values {−2, 2}.

10 / 30

Page 14: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Reference systems

Reference systems are conventions that encode the sharedunderstanding of information. Examples are

I spatial (coordinate) reference systems (where is (52,8)?)

I temporal reference systems (what does

> Sys.time()

[1] "2014-06-18 08:57:55 CEST"

mean?

I attribute reference systems (e.g., UCUM, Unified Code forUnits of Measure)

I semantic reference systems (vocabularies, ontologies, Rfunction index)

11 / 30

Page 15: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Space, Time, Attribute, Identity

We will look at the following four reference system domains:S space 1,2,3-dimensional, e.g. 2D degrees in

WGS84, R2 or R3, continuousT time 1-dimensional or cyclic, R, sometimes 2-

dimensional, continuousQ quality 1-dimensional (UCUM), higher-dimensional:

functional, multivariate, also possibly nomi-nal, ordinal, interval (Stevens’ 1946)

D discrete indicating distinct entities (objects, events);N, IDs, primary key in RDBMS, row numberin data.frame

12 / 30

Page 16: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Fields

functional form:(S × T )→ Q

I Answers: “what is then and there?”

I Inverting answers: “when/where was that?”

I Specialisations: S → Q , T → Q

I Incarnations: points (sampled field: meuse), contour lines,coverage

13 / 30

Page 17: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Field examples: grid, points

log(zinc, ppm), interpolated

5.0

5.5

6.0

6.5

7.0

7.5

zinc (ppm)

●● ●

●●

●●

●●

●●

●●●

●●

●●●●

●●●●

●●

●●● ●

●●●

●●●

●●

●●●

●●●●

●●

●●

●●

●●

●●●

● ●●●●

●●

●●

●●

● ●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●●

●●

●●●

[113,458.2](458.2,803.4](803.4,1149](1149,1494](1494,1839]

Page 18: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Field examples: lines, polygons

x

y

5.05.0

5.05.0

5.05.0

5.0

5.0

5.55.5

5.5

5.55.5 5.5

5.55.5

5.55.5

5.55.5

5.5 5.5

5.5

6.0

6.0

6.0

6.0

6.0

6.5

6.5

6.56.5

6.56.5

6.5

6.5

6.5

6.5

6.5

7.07.0

7.0

7.0

[4,4.5]

(4.5,5]

(5,5.5]

(5.5,6]

(6,6.5]

(6.5,7]

(7,7.5]

(7.5,8]

Page 19: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Field: categorical coverage

16 / 30

Page 20: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Non-Field: choropleth, aggregation

17 / 30

Page 21: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Non-moving Entities (objects, events)

functional form:D → (S × T ×Q)

(for objects without properties, take Q ≡ 1)I Specialisations:

I D → (S ×Q): spatial point pattern,I D → (T ×Q): temporal point pattern

18 / 30

Page 22: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Moving entities (objects, events)

functional form:D → T → (S ×Q)

(for objects without properties, take Q ≡ 1)

I generalization of D → (S × T ×Q)

I specialisations: D → T → Q , D → S → Q

19 / 30

Page 23: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Support and aggregation

1. we cannot make observations of zero duration, or zero spatialsize; the actual size and duration are the measurementsupport (footprint). Think: soil samples, RS cells.

2. often, we want to estimate or compute aggregated values, e.g.over periods over areas.

3. even more often, the data we get were aggregated, forconvenience (size), or privacy concerns (health data).

20 / 30

Page 24: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6
Page 25: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Particulate matter time series, averaged over stationtype

22 / 30

Page 26: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

More complications ...

I “intermediate” phenomena: air quality in street canions(“traffic”)

I true “hybrid”, 1: time events, spatial fieldsI D → ((S → Q)× T )I example: election maps

I true “hybrid”, 2: spatial events, time fieldsI D → (S × (T → Q))I example: emission from power plants

23 / 30

Page 27: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

How to represent, and then store fields?

1. as functions! Interpolation functions return values at arbitrarytimes, moments (gstat::idw in space, zoo::na.approx intime)

2. as evaluated (or observed) functions, atI discretized space, regular raster::raster or irregular

sp::SpatialPoints, orI time, regular: stats::ts, or irregular: zoo::zoo, xts::xts

3. natural would be to use an index that relates to space and/ortime, and records with arbitrarily typed fields → arrays

4. netcdf, HDF5;

5. R: array (and raster?) do not support fields of mixed type

6. R for time: zoo, xts do not support fields of mixed type

7. R for space: sp::SpatialGridDataFrame do

8. R for space/time: spacetime does too,

9. big data array processing engine: SciDB

24 / 30

Page 28: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

How to store objects/events?

Tables are one-dimensional arrays; The Spatial* objects in sp

“behave” like tables (data.frame).Subsetting like x[3,"zinc"] works for all, except forSpatialGridDataFrame.

25 / 30

Page 29: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

I will assume you understand this:

> a = data.frame(varA = c(1,1.5,2),

+ varB = c("a", "a", "b"))

> a[1,]

varA varB

1 1 a

> a[1, drop=FALSE]

varA

1 1.0

2 1.5

3 2.0

> a[,1]

[1] 1.0 1.5 2.0

> a[1]

varA

1 1.0

2 1.5

3 2.0

> a[[1]]

[1] 1.0 1.5 2.0

> a["varA"]

varA

1 1.0

2 1.5

3 2.0

> a[c("varA", "varB")]

varA varB

1 1.0 a

2 1.5 a

3 2.0 b

> a$varA

[1] 1.0 1.5 2.0

> a$varA <- 3:1

> a

varA varB

1 3 a

2 2 a

3 1 b

Page 30: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Functional programming

I do it: learn apply, lapply, do.call,

I program generically, e.g. aggregate

27 / 30

Page 31: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Time, Time Series Data

1. POSIXt, Date, yearmon, yearqtr

2. zoo, xts, ?aggregate

3. forecast, ...

4. see Task View

28 / 30

Page 32: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Space, Spatial Data

1. Spatial*, raster,

2. rgdal, rgeos

3. see Task View

4. selecting records, variables

5. selecting based on spatial match

6. sp::aggregate

7. vignette("over") (or see CRAN page)

8. edit(vignette("over")), run, modify, run

29 / 30

Page 33: Statistics for spatio-temporal data: an introductionpebesma.staff.ifgi.de/R/st.pdf · Statistics for spatio-temporal data: an introduction Edzer Pebesma 1. Das neue IfGI-Logo 1.6

Space-time, Spatiotemporal Data

1. spacetime, ST*, also raster,

2. back ends: PostGIS, TGRASS, SciDB

3. combines sp and xts

4. selection, aggregation

5. go through spacetime vignettes

6. see Task View

30 / 30