spatial interpolation, geostatistics and sampling

©2003 All lecture materials by Austin Troy except where noted Lecture 6: Spatial Interpolation, geostatistics and sampling By Austin Troy ------Using GIS-- Introduction to GIS




What is interpolation?

• Three types:

1. Resampling of raster cell size

2. Transforming a continuous surface from one data model to another (e.g. TIN to raster or raster to vector).

3. Creating a surface based on a sample of values within the domain.

• Dense sampling networks

• Sparse sampling networks


Sample-based interpolation

•Process of creating a surface based on values at isolated sample points.

•Sample points are locations where we collect data on some phenomenon and record the spatial coordinates

•We use mathematical estimation to “guess at” what the values are “in between” those points

•We can create either a raster or vector interpolated surface

•Interpolation is used because field data are expensive to collect, and can’t be collected everywhere


How does it look?

•Let's say we have our groundwater pollution samples

This gives us


How does it work

•This can be displayed as a 3D trend surface in 3D Analyst


How does it work


•We can also use interpolation methods to create contours

Also known as “Isolines”


Requirements of interpolation

•Interpolation only works where values are spatially autocorrelated, or spatially dependent, that is, where nearby locations tend to have similar Z values.

•Examples of spatially dependent features: elevation, property value, crime levels, precipitation

•Non-dependent examples: number of drum sets per city block; cheeseburgers consumed per household.

•Where values across a landscape are geographically independent, interpolation does not work because the value at (x,y) cannot be used to predict the value at (x+1, y+1).
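One common way to check for spatial dependence before interpolating is Moran's I, which is not named in the slides and is used here only as an illustration. The sketch below is a minimal pure-Python version with binary neighbor weights; the grid, the two value sets, and the `max_dist` threshold are all made-up assumptions.

```python
import math
import random

def morans_i(points, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 when two points lie within max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0    # sum of w_ij * dev_i * dev_j
    w_sum = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= max_dist:
                num += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)

# Hypothetical 10 x 10 grid of sample locations
points = [(x, y) for x in range(10) for y in range(10)]

# Spatially dependent values: a smooth gradient (elevation-like)
smooth = [float(x + y) for x, y in points]

# Spatially independent values: pure noise (drum-sets-per-block-like)
rng = random.Random(0)
noise = [rng.random() for _ in points]

i_smooth = morans_i(points, smooth, max_dist=1.0)
i_noise = morans_i(points, noise, max_dist=1.0)
print(round(i_smooth, 3), round(i_noise, 3))
```

Values near +1 suggest that neighbors resemble each other, so interpolation is plausible; values near 0 suggest independence.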


Interpolation examples

•Elevation:

•Elevation values tend to be highly spatially autocorrelated because elevation at location (x,y) is generally a function of the surrounding locations

•Except in areas where terrain is very abrupt and precipitous, such as Patagonia or Yosemite

•In this case, elevation would not be autocorrelated at the local (large) scale, but may still be autocorrelated at the regional (small) scale


Interpolation examples

•Elevation:

Source: LUBOS MITAS AND HELENA MITASOVA, University of Illinois


Sample points

•Also known as “control points.”

•These are points where you or someone else has collected data (attributes) for a spatial coordinate (point)

•Any number of attributes can be collected at that point

•E.g. 1: weather stations collect data on temperature, rainfall, wind, humidity, etc.

•E.g. 2: soil invertebrate samples would record the abundance of numerous species at each location


Sampling example

•Imagine this elevation cross section: if each dashed line represented a sample point (in 1-D), this spacing would miss major local sources of variation, like the gorge


Sampling example

•Our interpolated surface (represented in 1-D by the blue line) would look like this


Sampling example

•If we increased the sampling rate, we would pick up that local variation


Sampling example

•Here our interpolated surface is much closer to reality at the local level, but we pay for this in the form of higher data gathering cost
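The gorge example can be reproduced numerically. The sketch below uses an invented 1-D elevation profile (a 100 m plateau with a narrow gorge at x = 6) and plain linear interpolation between samples; the profile, spacings, and function names are illustrative assumptions, not the data behind the slides.

```python
import math

def terrain(x):
    """Hypothetical profile: 100 m plateau with a narrow, 60 m deep gorge at x = 6."""
    return 100.0 - 60.0 * math.exp(-((x - 6.0) / 0.5) ** 2)

def linear_interp(xs, zs, x):
    """Piecewise-linear interpolation between sorted sample points."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return zs[i] + t * (zs[i + 1] - zs[i])
    raise ValueError("x outside sampled range")

def max_error(spacing, length=12.0):
    """Largest gap between the interpolated and true profile, checked every 0.05 units."""
    n = int(round(length / spacing))
    xs = [i * spacing for i in range(n + 1)]
    zs = [terrain(x) for x in xs]
    checks = [i / 20 for i in range(int(length * 20) + 1)]
    return max(abs(linear_interp(xs, zs, x) - terrain(x)) for x in checks)

print("sparse sampling, max error:", round(max_error(4.0), 1))
print("dense sampling, max error:", round(max_error(0.25), 1))
```

With samples every 4 units the gorge falls between sample points and is missed entirely; with samples every 0.25 units it is recovered, at the cost of sixteen times as many observations.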


Interpolation examples

•Weather

•Weather tends to be modeled on a regional level (e.g. your local weather report) because, in most places, weather systems and trends happen over a very large area. Hence the need for sample point density is not so great

•In other places, local climate variability is very great, such as in the SF Bay Area where temperatures can vary 50 degrees within 10 miles due to ocean effects.


Interpolation examples

•Weather

•Weather is also extremely variable over time, so samples must be continually taken. This is why weather stations are usually permanent

Source: LUBOS MITAS AND HELENA MITASOVA, University of Illinois

Example: precipitation varying over a season


Interpolation examples


•Groundwater contamination:

•The needed density of points will depend on the geology and the type of terrain

•Areas where geology allows for free groundwater flows across large areas will have less local variation and need less densely spaced points, while areas with geologic features that inhibit or redirect flow (e.g. karst topography) will need denser points


Where interpolation does not work

•We cannot use interpolation where values are not spatially autocorrelated

•Say we are looking at household income: in an income-segregated city, you could take a small sample of households for income and probably interpolate

•However, in a highly income-integrated city, where a given block has rich and poor, this would not work


Sampling Approaches


•Often a regular gridded sampling strategy is appropriate and can eliminate sampling biases

•Sometimes, though, it can introduce biases if the grid pattern correlates in frequency with something in the landscape, such as trees in a plantation or irrigation lines

•Random sampling can avoid this but introduces other problems including difficulty in finding sample points and uneven distribution of points, leading to geographic gaps.

•This depends partially on the size of the support, or sampling unit
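The difference between gridded and random designs can be sketched in a few lines. This is an illustration only: the 100 x 100 study area, the 25-point sample size, and the "largest gap" measure are assumptions made for the example.

```python
import math
import random

def grid_sample(width, height, nx, ny):
    """Regular grid: one sample at the centre of each of nx * ny cells."""
    dx, dy = width / nx, height / ny
    return [((i + 0.5) * dx, (j + 0.5) * dy) for i in range(nx) for j in range(ny)]

def random_sample(width, height, n, seed=0):
    """Simple random sample: n locations drawn uniformly over the study area."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def largest_gap(samples, width, height, step=5):
    """Farthest any test location (checked every `step` units) lies from its nearest sample."""
    worst = 0.0
    for gx in range(0, width + 1, step):
        for gy in range(0, height + 1, step):
            nearest = min(math.dist((gx, gy), s) for s in samples)
            worst = max(worst, nearest)
    return worst

grid_pts = grid_sample(100, 100, 5, 5)
rand_pts = random_sample(100, 100, 25)
print("grid gap:", round(largest_gap(grid_pts, 100, 100), 1))
print("random gap:", round(largest_gap(rand_pts, 100, 100), 1))
```

A large gap for the random design is the "geographic gaps" problem from the slide: some parts of the study area end up far from any sample point.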


Sampling Approaches


•An intermediate approach is the stratified random sample

•Create geographic or non-geographic subpopulations, from each of which random sample is taken

•Proportional or equal probability SRS: enforce a certain sampling rate, $\pi_{hj} = n_h/N_h$, for each stratum h and observation j.

•Simple SRS: enforce a certain sample size $n_h$

•Disproportionate SRS: where $\pi_{hj}$ varies such that certain strata are oversampled and certain undersampled.
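The three allocation rules can be compared on a toy population. The strata names, stratum sizes, and sample sizes below are hypothetical; the sketch only shows how the sampling rate $n_h/N_h$ differs between designs.

```python
import random

def stratified_sample(strata, sizes, seed=0):
    """Draw sizes[h] units at random, without replacement, from each stratum h."""
    rng = random.Random(seed)
    return {h: rng.sample(units, sizes[h]) for h, units in strata.items()}

# Hypothetical population: unit IDs grouped into three strata of unequal size
strata = {
    "valley": list(range(0, 600)),
    "slope": list(range(600, 900)),
    "ridge": list(range(900, 1000)),
}
N = {h: len(units) for h, units in strata.items()}

proportional = {h: N[h] // 10 for h in strata}                # same rate n_h/N_h = 0.1
simple = {h: 30 for h in strata}                              # same sample size n_h
disproportionate = {"valley": 20, "slope": 30, "ridge": 50}   # oversample the small stratum

for name, sizes in [("proportional", proportional), ("simple", simple),
                    ("disproportionate", disproportionate)]:
    sample = stratified_sample(strata, sizes)
    rates = {h: len(sample[h]) / N[h] for h in strata}
    print(name, rates)
```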


Sampling Approaches

•DSRS is advantageous when subpopulation variances are unequal, which is frequently the case when stratum sizes are considerably different. In DSRS we sample those strata with higher variance at a higher rate. We may also use this when we have an underrepresented subpopulation that would have too few observations to model if sampled with SSRS.

•Proportional samples are self-weighting because the rates are the same for each stratum

•The other two have unequal sampling probabilities (unless a simple SRS has equal $N_h$) and may require weighting
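A minimal sketch of what "may require weighting" means in practice: each observation in stratum h is weighted by $N_h/n_h$, the inverse of its sampling rate. The stratum sizes and sampled values below are invented for illustration.

```python
def weighted_mean(samples, N):
    """Estimate the population mean from a stratified sample.

    Each observation in stratum h is weighted by N_h / n_h, the inverse of its
    sampling rate, so over-sampled strata do not dominate the estimate.
    """
    total = 0.0
    for h, values in samples.items():
        weight = N[h] / len(values)
        total += weight * sum(values)
    return total / sum(N.values())

def unweighted_mean(samples):
    """Naive mean that ignores the unequal sampling rates."""
    values = [v for vals in samples.values() for v in vals]
    return sum(values) / len(values)

# Hypothetical design: equal sample sizes drawn from strata of very different size
N = {"lowland": 900, "upland": 100}
samples = {"lowland": [10.0, 12.0, 11.0, 11.0], "upland": [50.0, 54.0, 52.0, 52.0]}

print("weighted:", weighted_mean(samples, N))
print("unweighted:", unweighted_mean(samples))
```

The unweighted mean is pulled toward the over-sampled upland stratum; under a proportional design the two estimates would coincide, which is what "self-weighting" means.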


Sampling Approaches


•When the stratifying unit is geographical (e.g. county, soil polygon, forest stand), this is called a cluster sample.

•In a one stage cluster sample (OSCS) a series of geographic units are sampled and all observations within are sampled: obviously this does not work for interpolation

•More relevant is a two stage cluster sample (TSCS) in which we take a sample of cluster units and then a subsample of the population of each cluster unit.

•In this type of sample, variance has two components, that between clusters and that between observations


Sampling


•The number of samples we want within each zone depends on the statistical certainty with which we want to generate our surface

•Do we want to be 95% certain that a given pixel is classified right, or 90% or 80%?

•Our desired confidence level will determine the number of samples we need per stratum

•This is a tradeoff between cost and statistical certainty

•Think of other examples where you could stratify….


Sampling

•A common problem with sampling points for interpolation is what is not being sampled

•Very frequently people leave out sample points that are hard to get to or hard to collect data at

•This creates sampling biases and regions whose interpolated values are essentially meaningless

•So the spacing of sample points for interpolation should be based on some meaningful factor: if they are dense in one region and sparse in another, it should be because the values are variable in the first area and homogeneous in the other


Sampling

•Example: let’s say we want to make an average precipitation layer and we find that in our study zone precipitation is highly spatially variable within 10 miles of the ocean

•We’d use a coastline layer to help us sample.

•We’d have a high density of sampling points within 10 miles of the ocean and a much lower density in the inland zones


Sampling


•Say we were looking at an inland area, far from any ocean, and we decided that precipitation varied with elevation. How would we set up our sampling design?

•In this case, flat areas would need fewer sample points, while areas of rough topography would need more

•In our sampling design we would set up zones, or strata, corresponding to different elevation zones and we would make sure that we get a certain minimum number of samples within each of those zones

•This ensures we get a representative sample across, in this case, elevation


Sampling


•The number of zones we use will determine how representative our sample is; if zones are big and broad, we do not ensure that all elevation ranges are represented


Sampling and Scale dependency

•Sampling strategy for interpolation depends on the scale at which you are working and the scale dependency of the phenomenon you are studying

•In many cases interpolation will work to pick up regional trends but lose the local variation in the process

•The density of sample points must be chosen to reflect the scale of the phenomenon you are measuring.


Scale dependency

•If you have a high density of sample points, you will capture local variation, which is appropriate for large-scale (small-area) studies

•If you have a low density of sample points, you will lose sensitivity to local variation and capture only the regional variation; this is more appropriate for small-scale (large-area) studies


How does interpolation work


•In ArcGIS, to interpolate:

•Create or add a point shapefile with some attribute that will be used as a Z value

•Click Spatial Analyst>>Interpolate to Raster and then choose the method


Three methods in ArcGIS

•IDW

•SPLINE

•Kriging


Inverse Distance Weighting


•IDW weights the value of each point by its distance to the cell being analyzed and averages the values.

•IDW assumes that unknown value is influenced more by nearby than far away points, but we can control how rapid that decay is. Influence diminishes with distance.

•IDW has no method of testing for the quality of predictions, so validity testing requires taking additional observations.

•IDW is sensitive to sampling, often producing circular patterns around solitary data points


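Inverse Distance Weighting

•IDW: assumes the value of an attribute z at any unsampled point is a distance-weighted average of sampled points lying within a defined neighborhood around that unsampled point. Essentially it is a weighted moving average:

$$\hat{z}(x_0) = \sum_{i=1}^{n} \lambda_i \, z(x_i)$$

where the $\lambda_i$ are given by some weighting function and

$$\sum_{i=1}^{n} \lambda_i = 1$$

•A common form of weighting function is $d^{-p}$, yielding:

$$\hat{z}(x_0) = \frac{\sum_{i=1}^{n} z(x_i)\, d_{ij}^{-p}}{\sum_{i=1}^{n} d_{ij}^{-p}}$$

where $d_{ij}$ is the distance from sample point $x_i$ to the unsampled location $j$ (here $x_0$).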
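The $d^{-p}$ form of IDW translates almost directly into code. The following is a minimal pure-Python sketch using all sample points (no neighborhood limit); the sample coordinates and values are hypothetical, and this is not the ArcGIS implementation.

```python
import math

def idw(samples, x0, p=2.0):
    """Inverse-distance-weighted estimate at location x0.

    samples: list of ((x, y), z) pairs; p: distance-decay power.
    """
    num = 0.0
    den = 0.0
    for xy, z in samples:
        d = math.dist(xy, x0)
        if d == 0.0:
            return z          # x0 coincides with a sample point
        w = d ** -p           # weighting function d^-p
        num += w * z
        den += w
    return num / den          # dividing by the weight sum makes the weights sum to 1

# Hypothetical sample points at the corners of a 10 x 10 square
samples = [((0.0, 0.0), 10.0), ((10.0, 0.0), 20.0),
           ((0.0, 10.0), 30.0), ((10.0, 10.0), 40.0)]

print(idw(samples, (5.0, 5.0)))   # centre: all four points equally distant
print(idw(samples, (1.0, 1.0)))   # near the first point, so pulled toward its value
```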


IDW-How it works

•The Z value at location ij is a function of the Z value at known point xy times the inverse distance raised to a power P.

•Z value field: numeric attribute to be interpolated

•Power: determines the relationship between weighting and distance; where p = 0, there is no decrease in influence with distance; as p increases, distant points become less influential in interpolating the Z value at a given pixel
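The effect of the power can be seen by printing the normalized weights $d^{-p} / \sum d^{-p}$ for a few values of p. The three distances below are hypothetical.

```python
def idw_weights(distances, p):
    """Normalised IDW weights d^-p / sum(d^-p) for a list of distances."""
    raw = [d ** -p for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

distances = [1.0, 2.0, 4.0]   # hypothetical distances from a cell to three sample points
for p in (0, 1, 2, 4):
    print(p, [round(w, 3) for w in idw_weights(distances, p)])
```

At p = 0 every point gets the same weight (a plain average); as p grows, the nearest point takes an increasing share of the total weight.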


IDW-How it works

•There are two IDW method options, variable and fixed radius:

•1. Variable (or nearest neighbor): User defines how many neighbor points are going to be used to define value for each cell

•2. Fixed Radius: User defines a radius within which every point will be used to define the value for each cell
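The two search options differ only in how the neighboring points are selected before weighting. A minimal sketch, with hypothetical sample points and function names:

```python
import math

def nearest_k(samples, x0, k):
    """Variable radius: use the k sample points closest to x0, however far away."""
    return sorted(samples, key=lambda s: math.dist(s[0], x0))[:k]

def within_radius(samples, x0, radius):
    """Fixed radius: use every sample point within `radius` of x0, however many."""
    return [s for s in samples if math.dist(s[0], x0) <= radius]

# Hypothetical ((x, y), z) sample points
samples = [((1.0, 0.0), 5.0), ((0.0, 2.0), 7.0), ((3.0, 0.0), 9.0), ((0.0, 8.0), 4.0)]
cell = (0.0, 0.0)

print([z for _, z in nearest_k(samples, cell, 3)])
print([z for _, z in within_radius(samples, cell, 2.5)])
```

Note that a fixed radius can return very few points (or none) in sparsely sampled areas, while a fixed count can reach far away for its neighbors.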


IDW-How it works


•Can also define “Barriers”: User chooses whether to limit certain points from being used in the calculation of a new value for a cell, even if the point is near. E.g. wouldn't use an elevation point on one side of a ridge to create an elevation value on the other side of the ridge. User chooses a line theme to represent the barrier


IDW-How it works


•What is the best P to use?

•It is the P where the Root Mean Squared Prediction Error (RMSPE) is lowest, as in the graph on the right

•To determine this, we would need a test, or validation data set, showing Z values in x,y locations that are not included in prediction data and then look for discrepancies between actual and predicted values. We keep changing the P value until we get the minimum level of error. Without this, we just guess.
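The search for the best P can be sketched as a loop over candidate powers, scoring each against a validation set. The training and validation points below are synthetic (a plane z = x + y), and the candidate list is an arbitrary assumption.

```python
import math

def idw(samples, x0, p):
    """Inverse-distance-weighted estimate at x0 from ((x, y), z) samples."""
    num = den = 0.0
    for xy, z in samples:
        d = math.dist(xy, x0)
        if d == 0.0:
            return z
        w = d ** -p
        num += w * z
        den += w
    return num / den

def rmspe(train, validation, p):
    """Root mean squared prediction error at the validation locations."""
    errors = [idw(train, xy, p) - z for xy, z in validation]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def best_power(train, validation, candidates):
    """Candidate power with the lowest RMSPE."""
    return min(candidates, key=lambda p: rmspe(train, validation, p))

# Hypothetical data: z = x + y, sampled on a coarse grid and validated in between
train = [((x, y), float(x + y)) for x in (0, 4, 8) for y in (0, 4, 8)]
validation = [((x, y), float(x + y)) for x in (2, 6) for y in (2, 6)]
candidates = [0.5, 1, 2, 3, 4]

for p in candidates:
    print(p, round(rmspe(train, validation, p), 3))
print("best p:", best_power(train, validation, candidates))
```

On this synthetic plane the error keeps falling as p grows, so the largest candidate wins; with real data the minimum may fall at an intermediate value, which is what the slide's graph describes.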


IDW-How it works


•This can be done in ArcGIS using the Geostatistical Wizard

•You can look for an optimal P by testing your sample point data against a validation data set

•This validation set can be another point layer or a raster layer

•Example: we have elevation data points and we generate a DTM. We then validate our newly created DTM against an existing DTM, or against another existing elevation points data set. The computer determines the optimum P that minimizes our error


IDW-How it works


Optimizing P value


Plot of model fits


The blue line indicates degree of spatial autocorrelation (required for interpolation). The closer to the dashed (1:1) line, the more perfectly autocorrelated.

Where the line is horizontal, this indicates data independence. A mean prediction error near zero means the predictions are unbiased.


Plot of model errors


Spline Method

•Another option for interpolation method

•This fits a curve through the sample data and assigns values to other locations based on their location on the curve

•Thin plate splines create a surface that passes through sample points with the least possible change in slope at all points, that is with a minimum curvature surface.

•Uses piece-wise functions fitted to a small number of data points, but the joins are continuous; hence one part of the curve can be modified without having to recompute the whole

•Overall function is continuous with continuous first and second derivatives.
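For intuition, a basic thin plate spline can be fitted by solving one small linear system. The sketch below is a textbook-style global thin plate spline (radial term $r^2 \log r$ plus a plane), written in pure Python; it is an assumption-laden illustration and not the regularized or tension spline that ArcGIS implements. The sample points are hypothetical.

```python
import math

def tps_kernel(r):
    """Thin-plate radial basis function r^2 log r (taken as 0 at r = 0)."""
    return 0.0 if r == 0.0 else r * r * math.log(r)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_thin_plate_spline(samples):
    """Return a function surface(x, y) that passes through every ((x, y), z) sample."""
    pts = [p for p, _ in samples]
    n = len(pts)
    # Rows 0..n-1: the surface must equal z at each sample point
    a = [[tps_kernel(math.dist(pi, pj)) for pj in pts] + [1.0, pi[0], pi[1]] for pi in pts]
    # Last three rows: side conditions that leave the plane term unconstrained by the radial part
    a.append([1.0] * n + [0.0, 0.0, 0.0])
    a.append([p[0] for p in pts] + [0.0, 0.0, 0.0])
    a.append([p[1] for p in pts] + [0.0, 0.0, 0.0])
    b = [z for _, z in samples] + [0.0, 0.0, 0.0]
    coef = solve(a, b)

    def surface(x, y):
        radial = sum(c * tps_kernel(math.dist(p, (x, y))) for c, p in zip(coef[:n], pts))
        return radial + coef[n] + coef[n + 1] * x + coef[n + 2] * y

    return surface

# Hypothetical elevation samples (must not all lie on one line)
samples = [((0.0, 0.0), 1.0), ((4.0, 0.0), 3.0), ((0.0, 4.0), 2.0),
           ((4.0, 4.0), 6.0), ((2.0, 1.0), 2.5)]
surface = fit_thin_plate_spline(samples)
print(round(surface(2.0, 2.0), 3))
```

Unlike IDW, the fitted surface can rise above or dip below the range of the sample values between points, which is part of why spline surfaces look smooth.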


Spline Method


•SPLINE has two types: regularized and tension

•Tension results in a rougher surface that more closely adheres to abrupt changes in sample points

•Regularized results in a smoother surface that smooths out abruptly changing values somewhat


Spline Method

•Weight: this controls the tautness of the curves. With the Regularized type, a higher weight value will result in an increasingly smooth output surface. Under the Tension type, increases in the weight will cause the surface to become stiffer, eventually conforming closely to the input points.

•Number of points: the number of points around a cell that will be used to fit a polynomial function to a curve


Pros and Cons of Spline Method


•Splines retain smaller features, in contrast to IDW

•Produce clear overview of data

•Continuous, so easy to calculate derivatives for topology

•Results are sensitive to locations of break points

•As with IDW, no estimate of errors

•Can often result in over-smooth surfaces


Kriging Method


•Like IDW interpolation, Kriging forms weights from surrounding measured values to predict values at unmeasured locations. As with IDW interpolation, the closest measured values usually have the most influence. However, the kriging weights for the surrounding measured points are more sophisticated than those of IDW. IDW uses a simple algorithm based on distance, but kriging weights come from a semivariogram that was developed by looking at the spatial structure of the data. To create a continuous surface or map of the phenomenon, predictions are made for locations in the study area based on the semivariogram and the spatial arrangement of measured values that are nearby.

--from ESRI Help


Kriging Method


•Kriging is a geostatistical method and a probabilistic method, unlike the others, which are deterministic. That is, there is a probability associated with each prediction. Kriging has both a deterministic and probabilistic component, respectively Z(s) = μ(s) + ε(s), where both are functions of distance

•Assumes spatial variation in variable is too irregular to be modeled by simple smooth function, better with stochastic surface

•Interpolation parameters (e.g. weights) are chosen to optimize the function

•Assumes that variable in space can be modeled as sum of three components: 1) structure/deterministic part, 2) random but spatially correlated part and 3) spatially uncorrelated random part


Kriging Method


•Hence, foundation of Kriging is notion of spatial autocorrelation, or tendency of values of entities closer in space to be related.

•This violates the assumptions of classical statistical models, which assume that observations are independent.

•Autocorrelation can be assessed using a semivariogram, which plots the difference in pair values (variance) against their distances.

•Where autocorrelation exists, the semivariance should increase until a certain distance where SV = variance around the mean, and then flatten out. That value is called the “sill.” The sloped area, or “neighborhood,” is where values are related to each other


Kriging Method

•Semivariogram(distance h) = 0.5 * average[(value at location i - value at location j)^2], or

$$\hat{\gamma}(h) = \frac{1}{2n} \sum_{i=1}^{n} \{ z(x_i) - z(x_i + h) \}^2$$

•Based on the scatter of points, the computer (Geostatistical Analyst) fits a curve through those points

•The inverse is the covariance matrix, which shows correlation over space
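The empirical semivariogram is simple to compute by brute force: take every pair of points, bin the pairs by separation distance, and average half the squared difference in each bin. The binning scheme (`lag_width`, `n_lags`) and the transect data below are assumptions for illustration.

```python
import math

def semivariogram(samples, lag_width, n_lags):
    """Empirical semivariance per lag: 0.5 * mean of (z_i - z_j)^2 over pairs
    whose separation rounds to k * lag_width, for k = 0 .. n_lags - 1."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (pi, zi), (pj, zj) = samples[i], samples[j]
            k = round(math.dist(pi, pj) / lag_width)
            if k < n_lags:
                sums[k] += (zi - zj) ** 2
                counts[k] += 1
    # None marks lags with no point pairs
    return [0.5 * s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical transect: z rises steadily, so semivariance grows with distance
samples = [((float(x), 0.0), 2.0 * x) for x in range(10)]
gammas = semivariogram(samples, lag_width=1.0, n_lags=4)
print(gammas)
```

A model curve (for example spherical or exponential) would then be fitted through these binned values; that fitting step is not shown here.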


Kriging Method


•We can then use a scatter plot of predicted versus actual values to see the extent to which our model actually predicts the values

•If the blue line and the points lie along the 1:1 line this indicates that the kriging model predicts the data well


Kriging Method


•The fitted variogram results in a series of matrices and vectors that are used in weighting and locally solving the kriging equation.

•Basically, at this point, it is similar to other interpolation methods in that we are taking a weighted moving average, but the weights (λ) are based on statistically derived autocorrelation measures.

•λs are chosen so that the estimate $\hat{z}(x_0)$ is unbiased and the estimation variance is less than for any other possible linear combination of the observed values.
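To make the weighting step concrete, here is a minimal ordinary kriging sketch in pure Python. It assumes an exponential semivariogram model with made-up sill and range (in practice these come from the fitted variogram) and solves the kriging system with a Lagrange multiplier that forces the weights to sum to 1. It is an illustration under those assumptions, not the Geostatistical Analyst implementation.

```python
import math

def gamma(h, sill=1.0, corr_range=5.0):
    """Assumed exponential semivariogram model; parameters are hypothetical."""
    return sill * (1.0 - math.exp(-h / corr_range))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(samples, x0):
    """Return (estimate, weights) at x0 from ((x, y), z) samples."""
    pts = [p for p, _ in samples]
    n = len(pts)
    # Semivariances between all sample pairs, plus a column of 1s for the multiplier
    a = [[gamma(math.dist(pi, pj)) for pj in pts] + [1.0] for pi in pts]
    a.append([1.0] * n + [0.0])        # constraint row: weights sum to 1
    # Semivariances between each sample and the prediction location
    b = [gamma(math.dist(pi, x0)) for pi in pts] + [1.0]
    sol = solve(a, b)
    weights = sol[:n]                  # sol[n] is the Lagrange multiplier
    estimate = sum(w * z for w, (_, z) in zip(weights, samples))
    return estimate, weights

# Hypothetical samples at the corners of a 4 x 4 square
samples = [((0.0, 0.0), 10.0), ((4.0, 0.0), 14.0), ((0.0, 4.0), 12.0), ((4.0, 4.0), 18.0)]
est, w = ordinary_kriging(samples, (1.0, 1.0))
print(round(est, 2), [round(v, 3) for v in w])
```

The estimate is still a weighted average of the samples, as with IDW, but the weights come from the semivariogram model and the configuration of the points rather than from distance alone.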


Kriging Method


•Produces four types of prediction maps:

•Prediction Map: Predicted values

•Probability Map: probability that the value exceeds x

•Prediction Standard Error Map: fit of model

•Quantile maps: probability that the value exceeds a certain quantile


Kriging Method


•Semivariograms measure the strength of statistical correlation as a function of distance; they quantify spatial autocorrelation

•Because Kriging is based on the semivariogram, it is probabilistic, while IDW and Spline are deterministic

•Kriging associates some probability with each prediction, hence it provides not just a surface, but some measure of the accuracy of that surface

•Kriging equations are determined by fitting a line through the points so as to minimize the weighted sum of squares between the points and the line

•These equations are weighted based on spatial autocorrelation, which is determined from the semivariograms


Example


•Here are some sample elevation points from which surfaces were derived using the three methods


Example: IDW

•Done with P = 2. Notice how it is not as smooth as Spline. This is because of the weighting function introduced through P


Example: Spline

•Note how smooth the curves of the terrain are; this is because Spline is fitting a simple polynomial equation through the points


Example: Kriging

•This one is somewhere in between, because it fits an equation through the points but weights it based on probabilities


Other methods of interpolation


•Thiessen polygons

•This method builds polygons, rather than a raster surface, from control points

•It “grows” polygons around sample points; each polygon is supposed to represent an area of homogeneity

Source: Jens-Ulrich Nomme http://www.tu-harburg.de/sb3/pssd/GIS-Methods/thiessen.html
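The idea behind Thiessen polygons is that every location belongs to its nearest control point. The sketch below approximates this on a raster by labeling each cell with the index of its nearest point, rather than building true vector polygons as the method described above does; the control points and grid size are hypothetical.

```python
import math

def thiessen_grid(samples, nx, ny):
    """Label each cell (row, col) with the index of its nearest sample point.

    Cells sharing a label approximate that sample's Thiessen polygon.
    """
    grid = []
    for row in range(ny):
        line = []
        for col in range(nx):
            centre = (col + 0.5, row + 0.5)
            nearest = min(range(len(samples)),
                          key=lambda i: math.dist(samples[i], centre))
            line.append(nearest)
        grid.append(line)
    return grid

samples = [(1.0, 1.0), (6.0, 2.0), (3.0, 6.0)]   # hypothetical control points
grid = thiessen_grid(samples, 8, 8)
for line in grid:
    print("".join(str(i) for i in line))
```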


Density Functions

•We can also use sample points to map out density raster surfaces. This need not require a z value for each point; it can simply be based on the abundance and distribution of points.


Density Functions


•These settings would give us a raster density surface, based just on the abundance of points within a “kernel” or data frame. In this case, a z value for each point is not necessary.
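A simple point density surface can be sketched by counting the points that fall within a search radius of each cell centre and dividing by the area of that circle. This is a uniform-kernel simplification; the point locations, grid size, and radius are assumptions, and the ArcGIS kernel option uses a smoother weighting that is not reproduced here.

```python
import math

def point_density(points, nx, ny, radius):
    """Points per unit area within `radius` of each cell centre (no z value needed)."""
    area = math.pi * radius ** 2
    grid = []
    for row in range(ny):
        line = []
        for col in range(nx):
            centre = (col + 0.5, row + 0.5)
            count = sum(1 for p in points if math.dist(p, centre) <= radius)
            line.append(count / area)
        grid.append(line)
    return grid

# Hypothetical point locations: a cluster of three and one isolated point
points = [(1.2, 1.1), (1.8, 1.4), (1.5, 2.0), (6.5, 6.5)]
density = point_density(points, 8, 8, radius=1.5)
for line in density:
    print(" ".join(f"{d:.2f}" for d in line))
```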