    Environmental and Ecological Statistics 8, 361-377, 2001

    GIS and geostatistics: Essential partners

    for spatial analysis

    P. A. BURROUGH

    Utrecht Centre for Environment and Landscape Dynamics (UCEL),

    Faculty of Geographical Sciences, Utrecht University, Post Box 80.115, 3508 TC Utrecht,

    The Netherlands

    E-mail: [email protected]

    Received June 1999; Revised May 2001

    Initially, geographical information systems (GIS) concentrated on two issues: automated map

    making, and facilitating the comparison of data on thematic maps. The first required high quality

    graphics, vector data models and powerful databases; the second is based on grid cells that can be

    manipulated by suites of mathematical operators collectively termed "map algebra". Both kinds of

    GIS are widely available and are taught in many universities and technical colleges. After more than

    20 years of development, most standard GIS provide both kinds of functionality and good quality

    graphic display, but until recently they have not included the methods of statistics and geostatistics as

    tools for spatial analysis.

    Recently, standard statistical packages have been linked to GIS for both exploratory data analysis

    and statistical analysis and hypothesis testing. Standard statistical packages include methods for the

    analysis of random samples of cases or objects that are not necessarily co-located in space; if the

    results of statistical analysis display a spatial pattern then that is because the underlying data also share that pattern.

    Geostatistics addresses the need to make predictions of sampled attributes (i.e., maps) at

    unsampled locations from sparse, often expensive data. To make up for lack of hard data

    geostatistics has concentrated on the development of powerful methods based on stochastic theory.

    Though there have been recent moves to incorporate ancillary data in geostatistical analyses,

    insufficient attention has been paid to using modern methods of data display for the visualization of

    results.

    GIS can serve geostatistics by aiding geo-registration of data, facilitating spatial exploratory data

    analysis, providing a spatial context for interpolation and conditional simulation, as well as

    providing easy-to-use and effective tools for data display and visualization. The value of

    geostatistics for GIS lies in the provision of reliable interpolation methods with known errors,

    methods of upscaling and generalization, and for supplying multiple realizations of spatial patterns

    that can be used in environmental modeling. These stochastic methods are improving understanding

    of how errors in models of spatial processes accrue from errors in data or incompleteness in the structure of the models.

    New developments in GIS, based on ideas taken from map algebra, cellular automata and image

    analysis are providing high level programming languages for modeling dynamic processes such as

    erosion or the development of alluvial fans and deltas. Research has demonstrated that these models

    need stochastic inputs to yield realistic results. Non-stochastic tools such as fuzzy subsets have been

    shown to be useful for spatial analysis when probabilistic approaches are inappropriate or

    impossible. The conclusion is that in spite of differences in history and approach, the linkage of GIS,

    statistics and geostatistics provides a powerful and complementary suite of tools for spatial analysis

    in the agricultural, earth and environmental sciences.

    1352-8505 © 2001 Kluwer Academic Publishers

    Keywords: geographic information systems, geostatistics, statistical methods, spatial analysis, environmental modeling, map algebra, fuzzy sets

    1. Introduction: GIS, statistics and geostatistics

    Geographical information systems, in the sense of computer tools for handling spatial data

    (Burrough and McDonnell, 1998), have been used since the late 1960s (Coppock and

    Rhind, 1991). Their initial development was mainly in North America, stimulated by the

    need to map, plan and manage large areas of terrain, but major contributions came also

    from Britain and other European countries, and from Japan and Australasia. Initially there

    were two different kinds of GIS. The first kind, dominated by cartographers, aimed at

    automating the map making process: ultimately this was to replace the paper map by the

    much more flexible electronic database. Initially, the essential ingredients of this approach

    were geometrical accuracy and elegant hard copy output. The second approach, pioneered

    by the Harvard Laboratory for Computer Graphics, focused on spatial analysis, in

    particular the overlaying of different thematic maps so that relations and conflicts in land

    use could be resolved. Whereas the first approach was an automated version of the

    cartographer's eye, arm and hand, and insisted on full cartographic design standards, the

    Harvard approach concentrated on the clever combination of data linked to a gridded

    division of space. As the computer output devices of the time were limited to line printers

    having a unit cell measuring 1/6 × 1/10 inch, differences in values could only be indicated

    by overprinting different alphanumeric characters, so the gridded (or raster) maps were not

    at all pretty. GIS around 1980 consisted of two opposing camps, the one with expensive,

    beautiful, but essentially dumb products that were the electronic equivalent of paper maps;

    the other, a sort of mapping spreadsheet, in which spatial analysis could be carried out with

    great mathematical flexibility, but ugly results and huge demands on the then limited

    computer memories. Developments in computer technology and the analysis of remotely

    sensed images have reinforced the gridded approach for environmental study. Initially,

    however, the differences in budgets and apparatus between the remote sensing

    professionals and environmental scientists ensured that raster GIS and the classification

    and display of remotely sensed images remained separate areas of development.

    Technical advances since the 1980s have ensured that the division of GIS practitioners

    into two opposing camps has largely disappeared, and the input of gridded maps and

    remotely sensed images to GIS has now become standard practice. True, there are still

    arguments today as to whether the raster (gridded) or the vector (point, line, polygon)

    approach is better, but the discussion now focuses on the correct choice of spatial

    paradigm for a given application, and not on the limitations of the approaches per se

    (Burrough and McDonnell, 1998). Today, most commercial GIS provide facilities for

    working with raster or vector data, either individually, or in combination. They also

    provide database facilities for storing, retrieving and modifying the attributes of the spatial

    entities that have been recognized for the given application, and many also include their

    own internal programming languages which allow the user to treat the spatial data as

    inputs to a virtually unlimited range of environmental models (Burrough, 1996).

    In brief, GIS are sets of computer tools for the storage, retrieval, analysis and display of spatial data. GIS may also be required to supply data to numerical models of

    environmental processes (e.g., air quality, water quality and quantity, plant-soil-

    environment responses, etc.) and display the results of these models as cartographically

    acceptable screen or hard copy images. By convention, GIS analyses are almost

    exclusively deterministic and data are assumed to be exact. With the exception of specialists (e.g.,

    Heuvelink and Lemmens, 2000), the GIS community has shown little regard for issues of

    uncertainty and spatio-temporal variability other than geometric precision. This is not

    because of computational problems, but because market forces have determined that many

    GIS applications need not address these issues.

    1.1 GIS and statistics

    Statistical theory and practice for describing the average properties of samples, and for

    hypothesis testing are well known in environmental science. Conventionally, the

    geographical location of the individual observations is not taken into account, but if

    these methods are used for attributes of spatially located objects then one may be able to

    set up and test hypotheses as to whether geographically separate, but eponymous objects

    (e.g., instances of soil series, land use classes) really share the same sets of attributes.

    Statistical spatial data analysis (SSDA) (Wise et al., 2001) treats the objects in the spatial

    data base (points, lines, areas, pixels) as though they and their attributes were samples

    from a larger population. As Wise et al. (2001) point out, two main approaches have been

    developed: exploratory spatial data analysis (ESDA) and confirmatory spatial data

    analysis (CSDA). ESDA is a spatial extension of Tukey's (1977) methods for robust and visual analysis of data: the accent is on descriptive univariate and multivariate statistics

    (means, deviations, ranges, correlations, principal components) in which one searches for

    outliers or oddities in the value patterns of the spatial objects under consideration. In

    CSDA, attention is focused on building empirical regression models and/or the testing of

    hypotheses.

    Several standard statistical packages (SPSS, S-plus, etc.) include a wide range of

    methods for EDA and CDA, though they may not include all the hyper data links

    envisaged by the developers of ESDA (e.g., Wise et al., 2001). Nevertheless, today it is

    comparatively easy to link a statistical analysis of tabular attribute data to a set of

    geographical objects in a GIS like ARC-VIEW, either via a dBase file (e.g., using SPSS)

    or embedded links (using S-plus).

    As an example of simple descriptive statistical analysis linked to GIS, consider Fig. 1,

    which shows a soil map with three soil types and 126 sample locations.

    In the study area the soil is usually less than 100 cm thick over bedrock. In a GIS

    analysis we might want to test the hypothesis that there is no significant difference in soil

    thickness between the three soil types so that the map pattern may be simplified without

    loss of information. Visual inspection of the right-hand figure suggests that the different

    soil types do have different soil thickness, and this is easily confirmed by extracting the

    observed thickness data for each site and carrying out an ANOVA for all soil

    types. As Table 1 shows, the mean soil thickness per soil type does differ significantly; the

    analysis returns an F-value of 22.67 with p < 0.001. A post-hoc Scheffé test suggests that all

    three soil types have significantly different means (Table 2) so there is little point in

    simplifying the soil map.
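
    The F-value quoted above can be checked approximately from the summary statistics in Table 1 alone, because the within-group variance of each soil type can be recovered from its standard error. The following is a minimal sketch in Python, not the procedure used in the original analysis (which worked from the raw observations in a linked statistics package); the function name anova_from_summary is an illustrative assumption.

```python
import math

# Summary statistics copied from Table 1: (n, mean, standard error of the mean).
groups = {
    "Ct": (36, 51.15, 4.20),
    "Cr": (31, 67.02, 4.48),
    "Ia": (59, 32.76, 2.79),
}

def anova_from_summary(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA from group summaries."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    # Within-group sum of squares: each group variance is (SE * sqrt(n)) ** 2.
    ss_within = sum((n - 1) * (se * math.sqrt(n)) ** 2 for n, _, se in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

f_value, df_between, df_within = anova_from_summary(groups)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

    With the rounded values of Table 1 the result is an F-value of about 22.7 on 2 and 123 degrees of freedom, which agrees with the reported 22.67 to within the rounding of the tabulated means and standard errors.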

    As another example of straightforward statistical analysis using a linked statistics

    package, Fig. 2 presents the results of carrying out a multivariate discriminant analysis on

    all the 20 attributes of the soil collected at each of the 126 sample sites. This shows

    that although the centroids of the three soil types clearly differ in multivariate space, there is

    considerable overlap.

    1.2 GIS and geostatistics

    As noted, the standard GIS approach to recording and analyzing the attributes of predefined

    objects implies that there is no spatial variation within an object and that all change occurs at

    object boundaries. In many applications (hydrology, oceanography, earth sciences, soil

    Figure 1. Left: Soil profile classes at sample sites (dot is unit Cr, small circle with dot is unit Ct and

    large circle with dot is unit Ia). Right: Soil thickness at sample sites (dot is 0-40 cm, small flag is

    40-80 cm, and large flag is > 80 cm).

    Table 1. Descriptive statistics of soil thickness (cm) for each soil type.

    Soil type      N     Mean    Std. error
    Ct            36    51.15       4.20
    Cr            31    67.02       4.48
    Ia            59    32.76       2.79
    Total        126    46.45       2.42

    science to name but a few), this approach is not always sensible and it is better to consider

    the variation of the attribute in terms of a continuous, but noisy surface. This surface is

    often constructed by interpolation from sets of point data. Though there are many methods

    for interpolation (see Burrough and McDonnell, 1998), most of these treat the data as if

    they can be modeled by a smooth, differentiable surface and no attention is paid to the

    uncertainty of the results. The methods of geostatistics (Matheron, 1965; Journel, 1996;

    Goovaerts, 1997) use the stochastic theory of spatial correlation both for interpolation

    and for apportioning uncertainty.

    Although still unfamiliar to many GIS users, in terms of technical development, the

    Table 2. Post hoc Scheffé test indicates that all three soil types have significantly different thicknesses.

    Subset for alpha = 0.05

    Soil type      N    Subset 1    Subset 2    Subset 3
    3             59    32.7617
    1             36                51.1528
    2             31                            67.0232
    Sig.                 1.000       1.000       1.000

    Figure 2. Plot of discriminant functions for all 126 soil observations compared with map classes.

    methods of geostatistics are of similar age to GIS, but have different roots. Whereas GIS was seen as a way to automate the creation of exact, deterministic models of the world in a

    dominantly cartographic context, geostatistics is about making predictions under

    conditions of uncertainty and limited information. The path of geostatistics from its

    founders Krige and Matheron in the 1960s and 1970s to present day exponents such as

    Journel, Goovaerts and others emphasizes the role of chance in spatial prediction. Where

    GIS ignores statistical variation, geostatistics uses the understanding of statistical variation

    as an important source of information for improving predictions of an attribute at

    unsampled points, given a limited set of measurements. Geostatistics is therefore a very

    useful "add-on" or extension to the GIS toolkit for spatial analysis.

    A central aspect of geostatistics is the use of spatial autocovariance structures, often

    represented by the (semi)variogram, or its cousin the autocovariogram, which differentiate

    different kinds of spatial variation. The semivariance indicates the degree of similarity of

    values of a regionalized variable $Z$ over a given sample spacing or lag, $h$. Semivariograms

    (Fig. 3) are graphs of the semivariance $\gamma(h)$ against sample spacing or lag, $h$: they are defined as:

    \[ \gamma(h) = \frac{1}{2}\,\mathrm{Var}\{Z(x_i) - Z(x_i + h)\} \tag{1} \]

    and estimated from sampled data by:

    \[ \gamma(h) = \frac{1}{2n}\sum_{i=1}^{n}\{z(x_i) - z(x_i + h)\}^2 \tag{2} \]

    where $n$ is the number of pairs of samples, and $z(x_i)$, $z(x_i + h)$ are measurements separated by a

    distance $h$.

    In practice, $\gamma(h)$ is estimated from sets of point samples which can be extracted from the

    GIS data base. Because experimentally derived semivariances do not always follow a

    smooth increase with sample spacing, a theoretical variogram model is fitted to the data

    (Burrough and McDonnell, 1998; Deutsch and Journel, 1998; Goovaerts, 1997). The

    interpolation weights for predicting the value of attribute z at unsampled locations x are

    derived with the help of this fitted model and the method is known as ordinary point

    kriging (OPK) after its first exponent. Predictions can also be computed for units of land

    (blocks) larger than those sampled, thereby smoothing out local variations; this is known

    as block kriging. Much practical geostatistics is concerned with the estimation and fitting

    of variograms to experimental data (Pannatier, 1996) followed by interpolation or

    conditional simulation of gridded surfaces (Pebesma and Wesseling, 1998). Besides

    interpolation, kriging provides information on interpolation errors. Knowledge of the

    spatial correlation structures may also be used to generate sets of equiprobable realizations

    (simulations) of the attribute z that can be of great value for studying error propagation

    through spatial models that may be linked to the GIS.
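
    As an illustration of the estimator in Equation 2, the following minimal Python sketch computes experimental semivariances for a regularly spaced one-dimensional transect. It is a simplification made for clarity: real point samples are irregularly spaced in two dimensions and must be grouped into distance classes. The function name and the example values are illustrative assumptions.

```python
def empirical_semivariogram(values, lags):
    """Estimate the semivariance for each integer lag on a regular 1-D transect.

    values: measurements z(x_1), ..., z(x_m) at equally spaced locations.
    lags:   iterable of positive integer lags, in units of the sample spacing.
    Returns {lag: (semivariance, number_of_pairs)}.
    """
    result = {}
    for h in lags:
        pairs = [(values[i], values[i + h]) for i in range(len(values) - h)]
        if not pairs:
            continue  # lag longer than the transect: no pairs available
        n = len(pairs)
        # Equation 2: half the mean squared difference between paired values.
        gamma = sum((a - b) ** 2 for a, b in pairs) / (2 * n)
        result[h] = (gamma, n)
    return result

# Hypothetical transect with a steady trend, for illustration only.
transect = [1.0, 2.0, 3.0, 4.0, 5.0]
gamma_by_lag = empirical_semivariogram(transect, lags=[1, 2, 3])
```

    The number of pairs is returned alongside each semivariance because, as in Fig. 3, estimates at long lags rest on few pairs and deserve less weight when a model is fitted.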

    For many users of GIS, kriging is no more than an alternative method of interpolation

    (see Burrough and McDonnell, 1998 for references). Indeed, many statisticians and

    geographers use other methods for statistical spatial analysis (cf. Bailey and Gatrell, 1995;

    Cressie, 1991). The general lack of appreciation of geostatistics by the GIS community

    during the seminal years from the mid-1970s to the mid-1990s was due to many factors,

    including the publication of Matheron's original treatise in French (Matheron, 1965),

    which is therefore inaccessible to most native English speakers. Until the mid-1990s, the

    high prices charged for geostatistics software packages and their almost exclusive use by

    mining corporations made it difficult to teach geostatistics in many universities. Of course,

    a contributing factor to the lack of interest in geostatistics by the GIS practitioner is its grounding in mathematical statistics which clearly baffles those of us who have little

    feeling for the statistical treatment of sampling, variance analysis and correlation and

    regression.

    2. The mutual benefits of linking GIS, statistics and geostatistics

    In this Section I present some examples of the ways in which GIS, statistics and

    geostatistics complement each other in spatial analysis.

    2.1 The value of GIS for geostatistics

    Besides acting as a spatial database, GIS provides several benefits to statisticians and

    geostatisticians that are largely concerned with the correct geometric registration of

    sample data, prior data analysis, the linking of hard and soft data, and the presentation of

    results.

    Geo-registration. As with all spatial data, spatial analysis must be carried out on data

    that have been collected with reference to a properly defined coordinate system. GIS can

    Figure 3. Example of a semivariogram fitted to experimental data. The numbers indicate the

    numbers of pairs of points used at each lag.

    provide the means to register the locations of samples directly (via GPS or other methods), or to convert local coordinates to standard coordinates. The use of standard coordinates

    ensures that data collected at different times can be properly combined and overlaid on

    conventional maps. The use of standard coordinate systems is particularly important when

    international databases are created from different sources, such as occurs in Europe, for

    example.

    Exploratory spatial data analysis. As already noted, ESDA is a useful toolkit for

    examining data prior to analysis. For geostatisticians, the presence and location of spatial

    outliers, or other irregularities in the data may have important consequences for the fitting

    of variograms, or for determining whether data should be transformed to logarithms. GIS

    often provide search engines that can be linked to statistical packages to determine

    whether any given data set contains anomalies or unexpected structure. The underlying

    reasons for such anomalies may sometimes be easily seen when these data are displayed

    on a map together with other information. Not all users of ESDA in GIS use conventional

    geostatistics, however, and other measures of spatial autocorrelation such as Moran's I

    statistic are often used (Pereira et al., 1998).
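
    Moran's I, mentioned above as an alternative measure of spatial autocorrelation, is straightforward to compute once a spatial weights matrix has been chosen. The sketch below is a minimal Python illustration using binary adjacency weights along a line of sites; the choice of weights and the helper chain_weights are assumptions made for the example, not a prescription from the cited literature.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between pairs of sites.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)

def chain_weights(n):
    """Binary weights for n sites along a line: 1 for adjacent sites, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

# Hypothetical values: a smooth trend and a strictly alternating pattern.
trend = morans_i([1, 2, 3, 4, 5], chain_weights(5))
alternating = morans_i([1, -1, 1, -1], chain_weights(4))
```

    A smooth trend gives a positive value (neighbouring sites are alike), whereas the alternating pattern gives a strongly negative value (neighbouring sites are unlike).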

    Spatial context and the use of external information. Increasingly, the suite of

    geostatistical methods currently available allows the user to incorporate external

    information that can be used to modify, and possibly improve, the predictions or

    simulations required. Geostatisticians term the external information "secondary",

    because they believe that the "hard data" measured at the sample locations are most

    important. GIS practitioners, however, might prefer to give the name "primary data" to that which

    separates a landscape into its main components (different soils, rock types or land

    cover classes), regarding the sampled data as merely filling in the details that were not

    apparent at the smaller map scale. In any case, GIS makes it possible to incorporate data

    from other aspects of the environment with the geostatistical study of autocorrelation structures, so that differentiated knowledge of different patterns of variation can be used to

    best effect. For example, in the c. 5 × 2 km study area used in Principles of Geographical

    Information Systems (Burrough and McDonnell, 1998) the distribution of heavy metals

    (zinc) in the top soils of the river alluvium was clearly influenced by flooding regime,

    which in turn is affected by factors such as distance from the river and the relative

    elevation of the floodplain. Fig. 4 shows how the extra information may be used in several

    ways. Stratified kriging involves dividing the original set of 155 soil samples into classes

    based on flooding frequency (a simple "point-in-polygon" search in GIS) to yield three

    strata. Variograms were estimated for each stratum, and the strata were interpolated separately to yield a

    single map (Fig. 4b). In a second approach, a multiple regression model was computed

    from the triplets of zinc level, elevation and distance to river measured at all data points

    (Fig. 4c). A third approach, known as "universal kriging", directly incorporates the trend

    in the estimation of the interpolation weights (Fig. 4d), and Fig. 4e illustrates how both stratification

    and trends may be combined.
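
    The "point-in-polygon" search that assigns each sample to a flooding stratum can be sketched with the standard ray-casting test. The Python example below is a minimal illustration, not the routine of any particular GIS; the stratum polygons, sample coordinates and values are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon [(x0, y0), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def stratify(samples, strata):
    """Group (x, y, value) samples by the first stratum polygon containing them."""
    groups = {name: [] for name in strata}
    for x, y, value in samples:
        for name, polygon in strata.items():
            if point_in_polygon(x, y, polygon):
                groups[name].append(value)
                break
    return groups

# Hypothetical strata and samples, for illustration only.
strata = {
    "frequent": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "rare": [(2, 0), (5, 0), (5, 2), (2, 2)],
}
samples = [(1.0, 1.0, 6.1), (3.0, 0.5, 5.2), (4.5, 1.5, 4.9)]
groups = stratify(samples, strata)
```

    Each group of values can then be given its own variogram and interpolated separately, as in stratified kriging.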

    The results clearly show the differences in the patterns obtained with and without the

    ancillary data. The single or combined incorporation of external information through

    stratification and strata-specific trends yielded maps with good levels of prediction and a

    spatial resolution that was better than could have been obtained from ordinary point

    kriging alone. Other examples are given in Goovaerts (1997, 1999).

    Display and visualization: 2D, 3D, plus time. Who is the recipient of a geostatistical

    interpolation? If a geostatistician, or statistician, then simple maps and tables of numbers

    may suffice, but environmental managers need to see how the results relate to other aspects

    of the terrain. Today it is easy to import the results of a kriging interpolation into a GIS and

    display the results in conjunction with a scanned topographic map, or display them in 3D

    over a digital elevation model (DEM) of the landscape from which the samples were taken

    (Fig. 5). Such presentation invites visual interpretation, the re-evaluation of results and the

    discovery of more information, and therefore is an essential part of the spatial analysis

    process.

    Figure 4. Results of interpolating the ln(Zinc) levels of topsoils (0-10 cm) in a frequently flooded

    part of the Maas floodplain, Limburg, NL. a: ordinary point kriging, b: OPK within different flooding

    strata, c: using a regression model based on elevation and distance from the river, d: universal kriging

    with a single trend, e: universal kriging with stratification and different trends for each stratum.

    2.2 The value of geostatistics for GIS

    Besides providing powerful means of interpolating point data to areas, there are many

    useful ways in which statistics and geostatistics can bring major improvements to the

    understanding of uncertainty and error in GIS-based spatial analyses. This is particularly

    so for most kinds of GIS-based environmental modeling where a priori we are dealing

    with incomplete data and uncertainty. Indeed, to pretend, as the standard GIS paradigms

    do, that all data are exactly known, and exactly located, is not to recognize reality.

    Geostatistics provides at least the following attractive options for environmental GIS

    and environmental decision support systems: interpolation from point data and estimates

    of error bounds, estimates of error propagation and uncertainty ranges for spatial and

    temporal modeling, and data reduction and generalization.

    Interpolation errors. Although surfaces interpolated by kriging are smooth, all forms of

    kriging yield estimates of the estimation uncertainty or kriging error. Such values can be

    mapped to provide error surfaces which can be combined with other information. Kriging

    errors depend on the form of the variogram and the disposition of observations: the more

    Figure 5. 3-Dimensional display of interpolation results obtained from stratified kriging on a digital elevation model with shading and transparency floated above a scanned topographic map. Dark gray

    zones indicate heavy metal concentrations.

    data surrounding an unsampled location, and the stronger the autocorrelation structure, the lower the estimation variance.
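
    The dependence of the kriging error on the disposition of observations can be seen in a small worked example. The Python sketch below solves the ordinary kriging system for a one-dimensional transect and returns the prediction, the kriging variance and the weights. The exponential variogram model without nugget, its parameter values and the sample data are assumptions chosen for illustration; a production analysis would use a dedicated package and a variogram fitted to the data.

```python
import math

def semivariogram(h, sill=1.0, range_=1.0):
    """Exponential semivariogram without nugget (an assumed model)."""
    return sill * (1.0 - math.exp(-h / range_))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(xs, zs, x0):
    """Return (prediction, kriging variance, weights) at x0 on a 1-D transect."""
    n = len(xs)
    # Kriging system: semivariances between samples, bordered by the
    # unbiasedness constraint that the weights sum to one.
    a = [[semivariogram(abs(xs[i] - xs[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [semivariogram(abs(xi - x0)) for xi in xs] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = sum(w * z for w, z in zip(weights, zs))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return prediction, variance, weights

# Hypothetical sample locations and values, for illustration only.
xs = [0.0, 1.0, 2.0]
zs = [3.0, 5.0, 4.0]
near = ordinary_kriging(xs, zs, 0.5)   # between two observations
far = ordinary_kriging(xs, zs, 10.0)   # well beyond the data
```

    Under these assumptions the kriging variance at the location between two observations is lower than at the location far from all data, and at a sampled location the prediction reproduces the observed value with zero variance.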

    Error propagation in spatial models. When data from interpolated surfaces are used as

    inputs to numerical models, the error surfaces associated with kriging interpolation may be

    used to understand the propagation of errors through spatial models. Heuvelink (1998) gives

    both theory and examples of using Taylor series expansion on interpolated data to compute

    error propagation through cartographic models; see also Burrough and McDonnell (1998).

    An increasingly popular alternative to the Taylor expansion method is to use methods of

    conditional simulation (Pebesma and Wesseling, 1998) to provide sets of multiple

    realizations of data surfaces for inputs to numerical models like the 3D groundwater model

    ``MODFLOW'', so that error propagation and model sensitivity can be followed using

    Monte Carlo methods (e.g., Bierkens, 1994; Gomez-Hernandez and Journel, 1992).
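
    The two routes to error propagation named above can be compared on a toy problem. The Python sketch below propagates a normally distributed input error through a simple non-linear model, once with a first-order Taylor approximation and once by Monte Carlo simulation. The model, the input mean and standard deviation and the number of runs are illustrative assumptions, and the random draws are independent values rather than the spatially correlated conditional simulations discussed in the text.

```python
import random
import statistics

def model(x):
    """A toy non-linear model standing in for a cartographic or process model."""
    return x ** 2

def taylor_sd(mean, sd):
    """First-order Taylor approximation: sd_y ~ |dy/dx| * sd_x at the mean."""
    derivative = 2.0 * mean  # analytical derivative of x ** 2
    return abs(derivative) * sd

def monte_carlo_sd(mean, sd, n_runs=20000, seed=42):
    """Propagate input error by running the model on random realizations."""
    rng = random.Random(seed)
    outputs = [model(rng.gauss(mean, sd)) for _ in range(n_runs)]
    return statistics.stdev(outputs)

sd_taylor = taylor_sd(10.0, 1.0)
sd_mc = monte_carlo_sd(10.0, 1.0)
```

    For a mildly non-linear model with small input error the two estimates are close; the Monte Carlo route becomes the more attractive as the model becomes too complex to differentiate or the errors too large for a first-order approximation.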

    Monte Carlo techniques using conditional simulation may also be useful for comparing data collected at different times and locations within the same area. Recent work on the

    redistribution of ¹³⁷Cs fallout from the Chernobyl nuclear disaster in 1986 has shown that

    the normal decay of radiocaesium levels and uptake rates in cow's milk can be temporally

    reversed if the cows are grazing on recently ooded, poorly drained peat soils (Burrough

    and McDonnell, 1998; Burrough et al., 1999a). The data for these studies consisted of

    radionuclide determinations made on bulked soil samples taken in 1988 and 1993.

    Unfortunately, the samples were collected at different sites in the two years, so it was

    difficult to use the raw data to test the hypothesis that the flood events had really enhanced

    radiocaesium levels near the rivers. However, by computing the variograms for the data

    sets from both years and using these to compute sets of conditional simulations of the

    normalized differences of radiocaesium in the topsoil between the two sampling times and

    at all sampled sites, it was possible to establish a clear relation between the incidence of

    flooding and flood-induced enhancement of radiocaesium which could enter the food

    chain (Burrough et al., 1999a). Fig. 6 shows clearly that although there seem to be

    systematic differences between the two years (mean values for 1993 exceed those for 1988

    by 0.5-1.0 standard errors) sites within 1.5 km of a flooding river are not only more

    variable, but many have higher levels of radiocaesium.

    Data reduction and spatial generalization. In some applications there may be too much

    data, which may need to be reduced to manageable proportions or common coordinates.

    An example is the need to compare the yields of different crops over several years on the

    same plot when yields have been recorded using data loggers and GPS. For example,

    Burrough and Swindell (1997) report the collection of annual yield data for three

    successive crops on a 5 ha field at the experimental farm of the Royal College of

    Agriculture, Cirencester, UK. Data were collected on wheat, barley and oilseed rape in

    successive years by a combine harvester fitted with a data logger whose location was

    pinpointed by locally referenced GPS. The spatial resolution of the sample was

    approximately 4 m (the width of the harvester) × 2.5 m (along the cut), and each survey

    yielded some 2000 samples or more.

    Because of locational noise in the GPS and errors in the amount of crop cut each 2.5 m

    by the harvester, it was not possible to relate the yields of the three crops directly to

    location in the field nor to investigate links between crop yields and soil conditions. To

    generalize and smooth the data, for each year an isotropic variogram was computed: the

    data were then interpolated to a common grid of 2.5 m resolution using block kriging with

    units of 25 × 25 m. Each annual map was normalized to give a map showing relative

    yield; these three maps were then combined to give a three year, normalized average.

    Comparison of the normalized average yield map with a computer enhanced, scanned

    aerial image of the site (Fig. 7) demonstrates clear relations between site conditions and

    normalized crop yields that otherwise were not apparent.
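
    The normalization and averaging of the annual yield maps amounts to a short map algebra operation on grids that share a common geometry. The Python sketch below assumes that "relative yield" means each cell divided by the mean of its map; the text does not specify the normalization used, so this choice, like the small grids of yield values, is an illustrative assumption.

```python
def normalize(grid):
    """Express each cell as a fraction of the mean of the grid (relative yield)."""
    cells = [v for row in grid for v in row]
    mean = sum(cells) / len(cells)
    return [[v / mean for v in row] for row in grid]

def average_maps(grids):
    """Cell-by-cell mean of several grids of identical shape."""
    n = len(grids)
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[sum(g[r][c] for g in grids) / n for c in range(cols)] for r in range(rows)]

# Hypothetical 2 x 2 yield grids (t/ha) for three crops, for illustration only.
wheat = [[8.0, 6.0], [4.0, 6.0]]
barley = [[6.0, 5.0], [4.0, 5.0]]
rape = [[4.0, 3.0], [2.0, 3.0]]
relative_average = average_maps([normalize(g) for g in (wheat, barley, rape)])
```

    Normalizing before averaging removes the differences in absolute yield between crops, so that cells which are consistently above or below the field mean stand out in the combined map.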

    Figure 6. Plots of conditional simulations for the 1988-1993 normalized differences of ¹³⁷Cs at data

    points, with distance to rivers that flood.

    Figure 7. Comparison between aerial photo image of field A and, displayed on its right, the average,

    standardized crop yields as interpolated using block kriging.

    Geostatistics and remote sensing. The application of geostatistical methods in the analysis of remotely sensed images is a topic in itself. Here I refer the reader to the

    issue of Photogrammetric Engineering and Remote Sensing (January, 1999) for a recent

    compilation of research. Remote sensing applications of geostatistics have less to do with

    interpolation from sparse data (the images are complete unless masked by cloud cover, in

    which case interpolation could be used to fill in the gaps) than with the description and analysis

    of gridded, stochastic surfaces and the simulation of multiscale data sets.

    3. Stochastic inputs to the modeling of spatial processes

    As already indicated, geostatistical methods of conditional simulation are useful for

    following the propagation of errors through spatial models that may be linked to, or run from GIS. Recent research in the modeling of dynamic spatial processes (van Deursen,

    1995; Takeyama and Couclelis, 1997; Wesseling et al., 1996) indicates the value of

    including an understanding of errors and roughness in many models of dynamic spatial

    processes, particularly when processes are non-linear.

    Stability of the topology of drainage nets. The automatic derivation of surface topology

    from a gridded digital elevation model is now a standard operation in GIS that are used for

    hydrological projects (Fig. 8a). The usual procedure is to use thin plate splines to

    interpolate a DEM (digital elevation model) from digitized contours to a fine grid so that

    the resulting topological net is free from discontinuities (Mitasova and Hofierka, 1993).

    Unfortunately, although smooth interpolators guarantee continuity in surface topology,

    they also constrain the topology to a single set of drainage lines, which may result in

    serious artefacts in hydrological derivatives such as wetness indices (see Burrough and

    McDonnell, 1998 for definitions). Simple methods, such as the D8 algorithm, for deriving

    drainage nets from gridded surfaces, produce a unique solution in which the main stream

    line is only one cell wide (e.g., Fig. 8a). Large differences in the size of the upstream

    contributing catchment area between a cell on the main drainage line and its off-line

    neighbor may arise. This is counter-intuitive, because we expect cells close to each other

    to have similar conditions and contributing areas, especially in the bottoms of valleys.

    Figure 8. a: Single realization of a drainage network derived from a smooth DEM; b: average image

    computed from 100 realizations derived from the initial DEM plus 10 cm root mean square (RMS)

    error.

    A better idea of surface water drainage may be obtained by considering the average

    properties of a suite of possible drainage nets that are obtained when surface roughness is

    added to the DEM. The roughness can easily be modeled by a small Gaussian noise which

    is added to each cell (a standard deviation equal to 0.1% of the maximum relief difference

    in the area is enough as a first approximation); the result yields one possible realization of

    the net. Repeating the procedure 100-1000 times with different random values for

    roughness creates an average probability density map of the cumulative contributing area

    (Fig. 8b) which appears to be more realistic than the single deterministic solution. Note

    that one cannot compute Fig. 8b by passing a moving window smoothing function over

    Fig. 8a.
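    The Monte Carlo procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation behind Fig. 8: the D8 routine is a bare-bones version without pit filling, the valley-shaped DEM is invented, and all function names are hypothetical. Because the toy grid is very coarse, the noise is set to 2% of the relief rather than the 0.1% suggested in the text, so that it can actually change flow directions.

```python
import random

# Offsets to the eight neighbours considered by the D8 rule.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_accumulation(dem):
    """Count of upstream cells (including the cell itself) under D8."""
    rows, cols = len(dem), len(dem[0])
    acc = [[1.0] * cols for _ in range(rows)]
    # Visit cells from highest to lowest so donors come before receivers.
    order = sorted(((dem[r][c], r, c)
                    for r in range(rows) for c in range(cols)), reverse=True)
    for z, r, c in order:
        best, target = 0.0, None
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                slope = (z - dem[rr][cc]) / (dr * dr + dc * dc) ** 0.5
                if slope > best:          # steepest strictly downhill neighbour
                    best, target = slope, (rr, cc)
        if target is not None:            # pits and outlets keep their water
            acc[target[0]][target[1]] += acc[r][c]
    return acc


def mean_accumulation(dem, sd, n_runs, seed=0):
    """Average the accumulation maps of n_runs noisy copies of the DEM."""
    rng = random.Random(seed)
    rows, cols = len(dem), len(dem[0])
    total = [[0.0] * cols for _ in range(rows)]
    for _ in range(n_runs):
        # Spatially uncorrelated Gaussian roughness added to every cell.
        noisy = [[z + rng.gauss(0.0, sd) for z in row] for row in dem]
        acc = d8_accumulation(noisy)
        for r in range(rows):
            for c in range(cols):
                total[r][c] += acc[r][c]
    return [[t / n_runs for t in row] for row in total]


# Invented smooth valley sloping toward the centre column and the last row.
size = 15
dem = [[(size - r) * 1.0 + abs(c - size // 2) * 0.5 for c in range(size)]
       for r in range(size)]
relief = max(map(max, dem)) - min(map(min, dem))

single = d8_accumulation(dem)                                   # cf. Fig. 8a
average = mean_accumulation(dem, sd=0.02 * relief, n_runs=100)  # cf. Fig. 8b
```

    On the smooth surface every cell drains to the single outlet at the foot of the valley axis. In the averaged map the contributing area is spread over neighbouring cells, which is the effect that a moving-window filter applied to the single realization cannot reproduce.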

    The effects of small errors on the derived flow paths may be effectively demonstrated by

    displaying the whole set as a movie, when the amplitudes and locations of the swings of

    drainage paths resulting from the minor errors will become very apparent. Though this

    example uses spatially uncorrelated noise for each realization of the DEM surface, one

    could of course examine the effects of spatially correlated noise on the model by first

    creating a set of conditional simulations based on a known or assumed variogram.

    Repeating the analysis for multiple realizations and displaying these using dynamic

    visualization enhances understanding of the results.
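    One simple way to generate spatially correlated noise for such an experiment is sketched below. It is an unconditional simulation by Cholesky factorization of the covariance matrix, which is a simpler stand-in for the conditional simulation mentioned in the text and is practical only for small grids. The exponential covariance model, the parameter values and all names are assumptions made for illustration.

```python
import math
import random


def exponential_cov(h, sill, range_cells):
    """Covariance at lag h for an exponential model (assumed form)."""
    return sill * math.exp(-h / range_cells)


def cholesky(a):
    """Lower-triangular factor L of a positive definite matrix, a = L L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def correlated_noise(rows, cols, sill, range_cells, seed=0):
    """One zero-mean Gaussian field with the chosen covariance."""
    pts = [(r, c) for r in range(rows) for c in range(cols)]
    cov = [[exponential_cov(math.dist(p, q), sill, range_cells) for q in pts]
           for p in pts]
    low = cholesky(cov)
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in pts]
    # Multiplying white noise by L imposes the target covariance.
    field = [sum(low[i][k] * white[k] for k in range(i + 1))
             for i in range(len(pts))]
    return [field[r * cols:(r + 1) * cols] for r in range(rows)]


# A sill of 0.01 m^2 corresponds to the 10 cm RMS error quoted for Fig. 8b.
noise = correlated_noise(6, 6, sill=0.01, range_cells=3.0)
```

    Each call with a different seed gives one realization that can be added to the DEM before the drainage net is derived. For grids of realistic size, sequential simulation tools such as those cited in the references are the usual choice.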

    Adding stochasticity to make a deterministic process model work properly. In certain

    situations it appears to be necessary to add roughness to a surface so that a well-known

    deterministic process can be modeled effectively, and this is illustrated using the example

    of the creation of an alluvial fan. If a hillside is modeled as a smooth inclined plane, then

    the topology consists merely of a set of parallel lines that run from top to bottom, much

    like the way rain falling on the windscreen of a stationary car runs off in parallel streams.

    These streams can be ``forced'' to merge if the initial surface is roughened (e.g., Liverpool

    and Edwards, 1995). In the case of the alluvial fan, each ``event'' by which material falls

    down the slope and is added to the fan modifies the surface roughness in a way that is very

    difficult to predict, but which must not be ignored. So the initial roughness is modified by

    feedback from the sedimentation process so that for each cycle there is a new surface for

    the flow and deposition. If the deposits are sufficiently large, the surface topology changes

    with each cycle.

    The need for initial roughness which is modified but maintained during the development

    of the delta is a nice example of how a better understanding of the physical process may

    arise by linking geostatistics with interactive dynamic modeling. Ongoing research in

    Utrecht and elsewhere is beginning to demonstrate the value of conditional simulation in

    dynamic, as well as static models of landscape change (see Karssenberg et al., in press).

    4. Non-stochastic tools for analyzing uncertainty in spatial data: fuzzy subsets

    In many situations we know there is uncertainty, but we do not know, nor can we construct

    probability distributions. We may also be uncertain how to define the geographical objects

    in the data base (Burrough and Frank, 1996). The development of fuzzy subsets in

    environmental science is increasingly being seen not as a replacement for statistics and

    geostatistics, but as a complementary suite of methods for operating in uncertain

    conditions. The main uses of fuzzy subsets in GIS are for the selection and retrieval of data

    under conditions of uncertainty (e.g., Burrough and McDonnell, 1998; Canters, 1997), and

    in creating multivariate classes that overlap (fuzzy k-means) (Burrough et al., 1999b).

    Data retrieval using fuzzy subsets has been demonstrated to be less error prone than

    conventional Boolean SQL methods (Heuvelink and Burrough, 1993). Fuzzy memberships

    can be interpolated using kriging (de Gruijter et al., 1997; Burrough and McDonnell,

    1998) and the application of fuzzy k-means to derivatives of digital elevation models

    provides convincing and objective methods for classifying terrain (Burrough et al., 2000,

    2001). Fuzzy subsets can also be used to address issues of the crispness of spatial

    boundaries (e.g., Lagacherie et al., 1996) or the intervisibility across 3D surfaces (Fisher, 1995).

    Fuzzy subsets may also be used to define sensible ways to select point data for kriging.
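    How overlapping classes arise in fuzzy k-means can be illustrated with the membership formula of the standard algorithm. The sketch below is a one-attribute example with invented class centres and a fuzziness exponent of 2; it shows only the membership step, not the iterative fitting of the centres, and the function name is hypothetical.

```python
def fuzzy_memberships(value, centres, fuzziness=2.0):
    """Membership of one value in each class (standard fuzzy k-means rule)."""
    dists = [abs(value - c) for c in centres]
    if 0.0 in dists:
        # The value coincides with a class centre: membership is crisp.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (fuzziness - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^power, so memberships sum to one.
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists)
            for d_i in dists]


# Invented slope classes centred on 10% and 20%.
centres = [10.0, 20.0]
near_first = fuzzy_memberships(12.0, centres)   # mostly class 1
halfway = fuzzy_memberships(15.0, centres)      # equally in both classes
```

    A Boolean classification would assign the 15% cell wholly to one class on the basis of an arbitrary threshold, whereas the memberships record that it is equally typical of both. Such membership values are continuous attributes and can therefore be interpolated like any other.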

    5. Conclusions

    This review has demonstrated that GIS, statistics and geostatistics have much to give to

    each other, particularly when GIS are used for environmental analysis. Geostatistics

    benefits from having standard methods of geographical registration, data storage, retrieval

    and display, while GIS benefit by being able to incorporate proven methods for testing

    hypotheses and for handling and understanding errors in data and illustrating their effects

    on the outcomes of models used for environmental management. In some situations,

    geostatistics may be supplemented by non-probabilistic methods of handling uncertainty

    such as provided by fuzzy subsets.

    References

    Bailey, T.C. and Gatrell, A.C. (1995) Interactive Spatial Data Analysis, Longman, Harlow, 413 pp.

    Bierkens, M.F.P. (1994) Complex Confining Layers: A Stochastic Analysis of Hydraulic Properties at

    Various Scales, Royal Dutch Geographical Association (KNAW)/Faculty of Geographical

    Sciences, University of Utrecht, Utrecht, NL.

    Burrough, P.A. (1996) Opportunities and limitations of GIS-based modeling of solute transport at the

    regional scale. In: Application of GIS to the Modeling of Non-Point Source Pollutants in the

    Vadose Zone, SSSA Special Publication 48, Soil Science Society of America, Madison, 19-37.

    Burrough, P.A. and Frank, A. (1996) (eds), Geographic Objects with Indeterminate Boundaries,

    GISDATA Series 2, Taylor and Francis, London.

    Burrough, P.A., van Gaans, P.F.M., and MacMillan, R.A. (2000) High-resolution landform

    classification using fuzzy k-means. Journal of Fuzzy Sets and Systems, 113, 37-52.

    Burrough, P.A., van Gaans, P.F.M., Wilson, J., and Hansen, A.J. (2001) Fuzzy k-means classification

    of topo-climatic data as an aid to forest mapping in the Greater Yellowstone Area, USA.

    Landscape Ecology, 16, 523-46.

    Burrough, P.A. and McDonnell, R.A. (1998) Principles of Geographical Information Systems,

    Oxford, Oxford University Press, 330 pp.

    Burrough, P.A. and Swindell, J. (1997) Optimal mapping of site-specific multivariate soil properties.

    In Precision Agriculture: Spatial and Temporal Variability of Environmental Quality, J. Lake,

    G. Bock, and J. Goode (eds), Proc: CIBA Foundation Symposium 210, John Wiley and Sons,

    Chichester, pp. 208-20.

    Burrough, P.A., van der Perk, M., Howard, B., Prister, B., Sansone, U., and Voitsekhovitch, O.V.

    (1999a) Environmental mobility of Radiocaesium in the Pripyat Catchment, Ukraine/Belarus.

    Water, Air and Soil Pollution, 110, 35-55.

    Canters, F. (1997) Evaluating the uncertainty of area estimates derived from fuzzy land-cover

    classification. Photogrammetric Engineering and Remote Sensing, 63, 403-14.

    Coppock, J.T. and Rhind, D.W. (1991) The history of GIS. In: Geographical Information Systems,

    Vol. 1, Principles, D.J. Maguire, M.F. Goodchild, and D.W. Rhind (eds), Longman Scientific

    and Technical, New York, pp. 21-43.

    Cressie, N. (1991) Statistics for Spatial Data, Wiley, New York, 900 pp.

    De Gruijter, J.J., de Walvoort, D., and van Gaans, P. (1997) Continuous soil maps - a fuzzy set

    approach to bridge the gap between aggregation levels of process and distribution models.

    Geoderma, 77, 169-95.

    Deutsch, C. and Journel, A.G. (1998) GSLIB Geostatistical Handbook, 2nd edition, Oxford.

    Fisher, P.F. (1995) An exploration of probable viewsheds in landscape planning. Environment and

    Planning B: Planning and Design, 22, 527-46.

    Gomez-Hernandez, J.J. and Journel, A.G. (1992) Joint sequential simulation of multigaussian fields.

    In: A. Soares (ed), Proc. Fourth Geostatistics Congress, Troia, Portugal. Quantitative Geology

    and Geostatistics, (5), 85-94, Dordrecht, Kluwer Academic Publishers.

    Goovaerts, P. (1997) Geostatistics for Natural Resources Evaluation, Oxford University Press,

    483 pp.

    Goovaerts, P. (1999) Using elevation to aid the geostatistical mapping of rainfall erosivity. CATENA,

    34, 227-42.

    Heuvelink, G.B.M. (1998) Error Propagation in Environmental Modeling, Taylor and Francis,

    London, 127 pp.

    Heuvelink, G.B.M. and Burrough, P.A. (1993) Error propagation in cartographic modeling using

    Boolean logic and continuous classification. Int. J. Geographical Information Systems, 7, 231-46.

    Heuvelink, G.B.M. and Lemmens, T. (2000) (eds), Accuracy 2000. Proceedings of the 4th

    International Meeting on Accuracy in Spatial Data, Amsterdam, July, Delft University Press,

    Delft.

    Karssenberg, D.J., Torqvist, T., and Bridges, J. (2001) Conditioning a process-based model of

    sedimentary architecture to well data. Journal of Sedimentary Research, 71(6).

    Lagacherie, P., Andrieux, P., and Bouzigues, R. (1996) Fuzziness and uncertainty of soil boundaries:

    from reality to coding in GIS. In: P.A. Burrough and A.U. Frank (eds), Geographical Objects

    with Indeterminate Boundaries, Taylor and Francis, London, pp. 275-86.

    Liverpool, T. and Edwards, S. (1995) Modeling meandering rivers. Physical Review Letters, 75,

    3016.

    Matheron, G. (1965) La Théorie des Variables Régionalisées et ses Applications, Masson, Paris.

    Mitasova, H. and Hofierka, J. (1993) Interpolation by regularized spline with tension: Application to

    terrain modeling and surface geometry analysis. Mathematical Geology, 25, 657-69.

    Pannatier, Y. (1996) Variowin. Software for spatial data analysis in 2D. Statistics and Computing,

    Springer Verlag, Berlin, 91 pp.

    Pebesma, E. and Wesseling, C.G. (1998) GSTAT: A program for geostatistical modeling, prediction

    and simulation. Computers and Geosciences, 24, 17-31.

    Pereira, J.M.C., Carreiras, J.M.B., and Perestrello de Vasconcelos, M.J. (1998) Exploratory data

    analysis of the spatial distribution of wildfires in Portugal 1980-1989. Geographical Systems,

    5, 355-90.

    Takeyama, M. and Couclelis, H.M. (1997) Map dynamics: integrating cellular automata and GIS

    through Geo-Algebra. International Journal of Geographical Information Science, 11, 73-92.

    Tukey, J.W. (1977) Exploratory data analysis, Addison-Wesley, Reading, Massachusetts.

    Van Deursen, W.P.A. and Wesseling, C.G. (1995) PCRaster, Department of Physical Geography,

    Utrecht University.

    Wesseling, C.G., Karssenberg, D., Burrough, P.A., and van Deursen, W.P.A. (1996) Integrating

    dynamic environmental models in GIS: The development of a dynamic modeling language.

    Transactions in GIS 1, 40-8.

    Wise, S., Haining, R., and Ma, J. (2001) Providing spatial statistical data analysis functionality for

    the GIS user. The SAGE project. International Journal of Geographical Information Science,

    15, 239-254.

    Biographical sketch

    Peter A. Burrough, since 1984, is Professor of Physical Geography and Geographical

    Information Systems, Faculty of Geographical Sciences, University of Utrecht. Dr.

    Burrough is also the Director of the Utrecht Centre for Environment and Landscape

    Dynamics (UCEL). He is Chairman of the Interfaculty Centre for Hydrology, Utrecht

    (ICHU). He is a member of the advisory committee on Earth Sciences, Physical

    Geography and Geology for the Dutch National Science Foundation NWO, and a member

    of the Scientific Board for the ``Fonds voor Wetenschappelijk Onderzoek'' (FWO) for

    Vlaanderen, Belgium.
