applied spatial statistics: spatial count...

21
Applied Spatial Statistics: Spatial count data Douglas Nychka, National Center for Atmospheric Research Supported by the National Science Foundation Boulder, Spring 2013

Upload: others

Post on 03-Dec-2020

21 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Applied Spatial Statistics:Spatial count data

Douglas Nychka,National Center for Atmospheric Research

Supported by the National Science Foundation Boulder, Spring 2013

Page 2: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Outline

D. Nychka Spatial Stats Lecture 10 2

• Poisson distribution

• A hierarchical model

• Gaussian approximations and pseudo data

• Rongelap island

Combine a simple model for counts with the dependence on a spatial

field that controls the parameters

Page 3: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Estimating a curve or surface.

D. Nychka Spatial Stats Lecture 10 3

The additive statistical model:

Given n pairs of observations (xi, yi), i = 1, . . . , n Distribution of yidepends on f(xi)

[yi|f(xi)]

f is an unknown smooth function.

f(x) is a Gaussian process but yi may not be normally distributed.

Page 4: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Some examples

D. Nychka Spatial Stats Lecture 10 4

Rongelap Island157 γ detector counts measuring residual radiation from nuclear tests

2

4

6

8

10

12

14

Page 5: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

D. Nychka Spatial Stats Lecture 10 5

Tornado AlleyReported starting locations for tornados in 2012.

Event locations bin counts for grid

●●●●●●

●●●

●●●

●●●● ●

●●

●●●●●

●●●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●●●●●

●●

●●●

●●●

●●

●●

● ● ●●●

●●●●

●●

●●

●●●

●●

●●●●●●●

●●●

●●●●

●●

●●●●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

●●●

●●●

●●

●●

●●●

●●●

●●

●●

●●●

●●

●●●

●●

●●

●●●●

●●●●●

●●

●●

●●

●●●

●●●● ●

●●●

●●

●●

●●

●●●●

●●

●●

●●●

●●

●●●●●●●●

● ●●

●●

●●●

●●●● ●

●●●

●●●●

●●●●●●

●●

●●

●●

●●●●●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●●●●

●●

●●

●●

●●

●●●●

●●

●●

●●●●

●●●

● ●

●●●

●●

●●

●●

●●

●●●●●

●●

●●

●●●●

●●●

●●● ●

●●●●●●●● ● ●

●●

●●●

●●

●●

●●●●●●

●●

●●●

●●

●●

●●

●●●●●●●●●

●●●●●●

●●●●●●●●●

●●●●●●●

●●●●●●

●●

●●

●●●●●

●●

●●

●●●●●●●

●●●

●●

●●●

●●

●●●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●●●● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●●●●●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●●●●●●●●

●●

●●●●

● ●

●●●

●●

●●●

●●●

●●●

●● ●●

●●●●

●● ●●●

●●

●●

●●●●

1

2

3

4

5

6

7

Severity: 0, 1, 2, 3, 4

Page 6: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Poission distribution

D. Nychka Spatial Stats Lecture 10 6

Distribution of counts for rare events, has parameter α

P (Y = k) =αke−α

k!

• E(Y ) = α and V AR(α) = α

• log Likelihood for a random sample y1, y2, ..., yn

n∑i=1

[log(α)yi − α− log(yi!)]

MLE: α̂ = y

Spatial model:

• yi is Poission with parameter α(xi)

• α(x) follows the usual Gaussian spatial model.

Page 7: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Rongelap data ignoring spatial aspect

D. Nychka Spatial Stats Lecture 10 7

157 counts of differing time duration: (Yi, ti)

Actual data model is Yi ∼ Poisson(αti) assuming no spatial variation.

log likelihood:

n∑i=1

[log(αti)yi − αti − log(yi!)]

Taking derivative and setting equal to zero

n∑i=1

[yi/α− ti] = 0

solving for MLE: α̂ =∑ni=1 yi∑ni=1 ti

Page 8: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

First a review of Normal Kriging

D. Nychka Spatial Stats Lecture 10 8

Observations

[yi|f(xi)] = Normal(f(xi), σ2wi)

wi known set of weights.

Process

[f(x)|ρ,d, θ]

f(x) a Gaussian process

f(x) =∑j

φ(x)di + g(x)

• mean∑j φ(x)di (usual fixed part)

• covariance for g is ρkθ(., .) (usual random part)

Page 9: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

covariance parameters

D. Nychka Spatial Stats Lecture 10 9

In the normal case we know how to find estimates using maximum

likelihood

Distribution of observations:

y ∼MN(Xd, ρK + σ2I)

or

y ∼MN(Xd, ρK + σ2W )

if measurements have different weights.

• Likelihood has a closed form

Page 10: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Poisson data

D. Nychka Spatial Stats Lecture 10 10

Observations:

[yi|f(xi)] = Poisson(f(xi))

implies that

E([yi|f(xi)]) = f(xi) V AR([yi|f(xi)]) = f(xi)

just the Possion distribution.

Process:f(x) is the same as in Gaussian case

Page 11: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

D. Nychka Spatial Stats Lecture 10 11

Main idea:Approximate the distribution of y with a normal assuming the specific

form for the mean and variance.

e.g. If we knew f∗ then use it in the error part assume

y = f(xi) + ei

where V AR(ei) = σ2f∗i

Practical strategy:We don’t really know f∗ so use an iterative method where previous

estimate of f is used for variances and f is reestimated.

Note: ”weights” in fields are the reciprocal of variance. If variance is fithen specify weight 1/fi

Page 12: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Rongelap data set

D. Nychka Spatial Stats Lecture 10 12

Exponential covariance, θ = 1500 and λ = .1

xR<- rongelap$coords

yR<- rongelap$data/rongelap$units.m

wtR<- rongelap$units.m

#

fit.rongelap<- function(theta, lambda, tolerance=1e-6){

fhat.old<- rep(1, length(yR))

for( k in 1:50){

obj<- mKrig( xR, yR, theta= theta, lambda=lambda,

weights= wtR/fhat.old , m=1)

fhat.new<- obj$fitted.values

test.value<- mean( abs(fhat.old- fhat.new) )/ mean( abs( fhat.old))

if( test.value < tolerance){

break}

fhat.old<- c(fhat.new)

}

return(obj)

}

Page 13: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Take a look

D. Nychka Spatial Stats Lecture 10 13

R.fit<- fit.rongelap(400,100)

out.p<-predict.surface(R.fit, nx=200, ny=200, extrap=TRUE)

island<- in.poly.grid( out.p, rongelap$border)

out.p$z[!island]<- NA

image.plot( out.p)

lines( rongelap$border)

−6000 −5000 −4000 −3000 −2000 −1000

−30

00−

2000

−10

000

2

4

6

8

10

12

Page 14: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Searching over ρ and θ

D. Nychka Spatial Stats Lecture 10 14

par.list<- list(

llambda=seq(1,6,,10),

theta= exp(seq( log(40),log(400),,15)))

par.grid<- make.surface.grid( par.list)

NG<- nrow( par.grid)

lnLike<- rep( NA, NG)

for( k in 1:NG){

cat(k," " )

lnLike[k]<- fit.rongelap(theta=par.grid[k,2],

lambda=exp(par.grid[k,1]) )$lnProfileLike

}

image.plot( as.surface( par.grid, lnLike))

Page 15: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Approximate lnProfileLike surface

D. Nychka Spatial Stats Lecture 10 15

• Outer contour at 95% level

1 2 3 4 5 6

5010

015

020

025

030

035

040

0

log lambda

thet

a

−380

−378

−376

−374

−372

−370

−6000 −5000 −4000 −3000 −2000 −1000

−30

00−

2000

−10

000

5

6

7

8

9

Page 16: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Some problems

D. Nychka Spatial Stats Lecture 10 16

Does it really make sense to have a positive mean ( e.g. average counts)

follow a Gaussian distribution?

Better model is to allow a link to enforce positivity of the mean

E.g.

E([yi|f i)]) = µ(f i) V AR([yi|f i]) = µ(f i)

where for example µ(f) = exp f .

Page 17: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Quasi Likelihood solution

D. Nychka Spatial Stats Lecture 10 17

In general

E([yi|f i)]) = µ(f i) V AR([yi|f i]) = γ(f i)

g(f) = µ the inverse relationship

• Consider the pseudo data

yPSi = f̂ i + g′(f̂ i)(yi − µ(f̂ i)

• f̂ is a previous estimate or a ”pilot” estimate for f .

• Analyze the pseudo data as if it from the spatial model

yPSi = f(xi) + ei

where V AR(ei) = g′(f̂ i)2γ(f̂ i)

• Note: the pilot estimate is kept fixed in this approximate model.

• Iterate to update the pilot until it does not change.

Page 18: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

For Possion problem

D. Nychka Spatial Stats Lecture 10 18

µ(f) = ef f = g(µ) = log(µ) g′(µ) = 1/µ

γ(f) = f g′(µ)2γ(µ) = 1/µ

Page 19: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

R code for algorithm

D. Nychka Spatial Stats Lecture 10 19

fit.rongelap.exp<- function( theta, lambda){

mu.old<- rep( mean(yR), length(yR))

fhat.old<- log( mu.old)

for( k in 1:50){

yPS<- fhat.old + (1/mu.old)*( yR- mu.old)

obj <- mKrig( xR, yPS, weights=wtR*mu.old,

lambda=lambda, theta=theta, m=1)

fhat.new <- obj$fitted.values

mu.new<- exp( fhat.new)

#add convergence criterion

fhat.old<- c(fhat.new)

mu.old<- mu.new

}

return(obj)

}

Interpretation is that at convergence one has fit a Gaussian spatial model

– where the weights used are what one gets after fitting the model!

Page 20: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Approximate lnProfileLike surface

D. Nychka Spatial Stats Lecture 10 20

• log link function Outer contour at 95% level

1 2 3 4 5

1000

2000

3000

4000

5000

6000

7000

log lambda

thet

a

−62

−60

−58

−56

−54

−52

−6000 −5000 −4000 −3000 −2000 −1000

−30

00−

2000

−10

000

6.5

7.0

7.5

8.0

8.5

Page 21: Applied Spatial Statistics: Spatial count datacivil.colorado.edu/~balajir/CVEN6833/spatial-resources-DN/SpStats... · Applied Spatial Statistics: Spatial count data Douglas Nychka,

Summary

D. Nychka Spatial Stats Lecture 10 21

• Nongaussian data can be analyzed by relating to a weighted Gaussian

model.

• The concept of pseudo data is used to suggest an iterative algorithm

to find an estimate.

• Not clear exactly what statistical problem we have solved or what we

have approximated.