trodd overlay areas and surfaces

Upload: a-d-prasad

Post on 03-Apr-2018


  • 7/28/2019 Trodd Overlay Areas and Surfaces


    OVERLAY ANALYSIS: AREAS AND SURFACES

    TABLE OF CONTENTS

    1 The importance of overlay in GIS
    2 Learning objectives
    3 Area and surface overlay analysis
    3.1 A brief history
    3.2 Field and feature perspectives
    4 Polygon-on-polygon overlay operations
    4.1 Topological issues
    4.2 Creating new geometries
    4.3 Weighted overlay and the vector data model
    4.4 Problems with overlay on vector data model
    4.4.1 Computational demands
    4.4.2 Sliver polygons
    5 Area-on-area overlay operations on raster data
    5.1 Overlay analysis on raster data is easy!
    e-Tutorial Exercise 1
    5.2 Difficulties of overlay on raster data models
    5.2.1 What do the numbers mean?
    e-Tutorial Exercise 2
    5.2.2 Constructing area entities
    5.2.3 Cell resolution
    6 Some more issues in overlay analysis
    6.1 Scales of measurement
    6.2 Scale and overlay analysis
    7 What have you learnt in this lesson?

    2005 Nigel Trodd


    1 The importance of overlay in GIS

    Overlay is a fundamental spatial operation. It is one of the functions that distinguishes

    GIS from other systems such as CAD and DBMS. The UK Chorley Report (Department of

    the Environment, 1986) illustrates what a GIS should be able to do by giving the example

    of an industrial siting case study that uses overlay to 'sieve' the various siting criteria and

    identify suitable locations.

    Overlay operators combine data from the same entity type or different entity types. In

    both cases they create new geometries and can change entity type and/or attribute value.

    There are four overlay operators in common use:

    point-in-area (also known as point-in-polygon)

    line-in-area

    area-on-area (also known as polygon-on-polygon)

    weighted overlay

    In this lesson we will concentrate on two of these operators, namely area-on-area and

    weighted overlay. These operators process area and surface entity types respectively.

    You will find that overlay techniques vary with the data model employed by your GIS. This

    means that the results of overlay analysis depend on the data model and, in general,

    techniques to analyse vector data are time consuming and computationally intensive

    whereas overlay of raster data is relatively straightforward, quick and efficient.

    2 Learning objectives

    Upon completion of this lesson you should be able to:

    Identify and explain techniques to perform area-on-area overlay on raster and vector data.

    Identify and explain techniques to perform weighted overlay on raster and vector surfaces.

    Understand the main weaknesses of these overlay operations as they are implemented in GIS.


    3 Area and surface overlay analysis

    3.1 A brief history

    The principles of area-on-area overlay pre-date GIS. Until the arrival of GIS, map overlay

    analysis was performed manually by superimposing transparent acetates of map layers on

    a light table. The stack of acetates was used to visually identify sites that met a number of

    criteria.

    In the 1960s the 'quantitative revolution' heralded a new era for spatial analysis. Several
    influential figures emerged, including Ian McHarg, whose work in landscape ecology is best
    known for its attempt to explain the distribution of plants by combining information
    about the environment. His approach was to apply sieve-mapping techniques. This was
    a substantial step forward in computational spatial analysis, allowing considerably more
    work to be performed using the computer than was originally possible from field
    observation and single coverage cartography alone (Simpson, 1989). At the same time
    the first GIS prototypes were being developed and it is not coincidental that early products
    were designed, in part, to automate this 'sieve mapping'. Speedy and efficient analytical
    techniques such as these were of particular interest to governments intent on examining
    spatial relationships of large regions. Work done under the auspices of the Canadian
    government in developing CGIS was largely responsible for increasing the prevalence of
    polygon overlay in GIS.

    3.2 Field and feature perspectives

    Overlay analysis has generally taken the form of either area-on-area overlay or weighted

    overlay depending on your perspective. The former is more concerned with the analysis

    of particular features and adopts a discrete object perspective. The objectives of area-on-

    area overlay are to determine whether two features overlap (the technical term is to

    'intersect') and, if so, to define the identity of areas formed by the overlap as one or more

    new area objects.

    Weighted overlay operations combine two or more complete map layers consisting of

    areas or surfaces. In addition to computing the identity of the new geometries the

    objective of weighted overlay is to compute new attribute values. Because the operation
    processes the complete data set, boundaries from all inputs will be retained but
    broken into shorter fragments by intersections that occur between boundaries in one input
    dataset and boundaries in another.

    In both area-on-area and weighted overlay the output entity type is an area or surface

    respectively but the overlay operation has generated new geometries and new attributes.

    4 Polygon-on-polygon overlay operations

    4.1 Topological issues

    If you are overlaying two vector map layers you need to ensure before you start that the

    input map layers are topologically correct. If this is so then the output maps will also be

    topologically correct (Figure 1).

    Figure 1. Polygon-on-polygon overlay. Panels: polygon-on-polygon overlay; new geometries.

    In polygon overlay it is necessary to add new intersections (nodes) and create new

    polygons to retain topology. Overlaying 2 sets of polygons can produce a large number of

    new polygons and increase the number of nodes and arcs. In Figure 2, for example, the

    number of nodes increased by 75% and the number of arcs by 83%. Warning!! Increasing

    the number of input data sets can rapidly increase the number of output features.

    The algorithms to compute the location of new nodes are the same as those used for line
    intersection. Once these have been identified, the arcs need to be split and then the
    new topology constructed.
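    The node-finding step can be sketched as a line-segment intersection test followed by
    a split of each arc at the new node. This is a minimal illustration, not the algorithm of
    any particular GIS: the function names are hypothetical, arcs are assumed to be single
    straight segments, and parallel or collinear segments are simply reported as not crossing.

```python
def segment_intersection(p1, p2, p3, p4, eps=1e-12):
    """Return the point where segment p1-p2 crosses segment p3-p4, or None.

    Parallel and collinear segments are reported as None in this sketch.
    """
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if abs(denom) < eps:
        return None  # parallel or collinear: no single crossing point
    # t and u locate the crossing along each segment (0 = start, 1 = end)
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def split_arc(start, end, node):
    """Split one arc into two shorter arcs that meet at the new node."""
    return [(start, node), (node, end)]


node = segment_intersection((0.0, 0.0), (2.0, 2.0), (0.0, 2.0), (2.0, 0.0))
if node is not None:
    new_arcs = split_arc((0.0, 0.0), (2.0, 2.0), node)
```

    Every crossing found this way adds one node and turns two arcs into four, which is why
    the node and arc counts grow so quickly when more layers are overlaid.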

    4.2 Creating new geometries

    Once the new set of nodes, arcs and polygons has been created the task is to extract a
    meaningful set of polygons. It may be desirable to retain only that area that is common to
    both input features. For example, a farmer is interested in knowing that part of a field that
    has a loam soil. He is able to overlay the map of loam soil polygons on the field polygon to
    extract a feature that meets both criteria (loam soil AND in-field).
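    The farmer's query can be illustrated with the simplest possible geometry. The sketch
    below assumes both the field and the loam soil unit are axis-aligned rectangles with
    made-up coordinates; real polygons need a general clipping algorithm, but the logic
    (keep only the area common to both inputs) is the same.

```python
def intersect_rectangles(a, b):
    """Intersection of two axis-aligned rectangles given as (xmin, ymin, xmax, ymax).

    Returns the overlapping rectangle, or None when the rectangles share no area.
    """
    xmin = max(a[0], b[0])
    ymin = max(a[1], b[1])
    xmax = min(a[2], b[2])
    ymax = min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def area(rect):
    """Area of a rectangle given as (xmin, ymin, xmax, ymax)."""
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


field = (0.0, 0.0, 100.0, 60.0)         # hypothetical field extent in metres
loam_soil = (40.0, 20.0, 150.0, 90.0)   # hypothetical loam soil polygon

loam_in_field = intersect_rectangles(field, loam_soil)   # loam soil AND in-field
```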


    Figure 2. Creating geometries in overlay operations on the vector data model.

    Figure 3. Polygon overlay: intersection. Polygon a AND polygon b gives the new feature
    geometry (old boundaries dissolved).

    It is worth noting that the variables the farmer is processing are both of categorical (or
    nominal) data type. This matters because mathematicians have developed a suite of algorithms
    to analyse such data, known as Boolean operators, that GIS analysts exploit in area-on-area
    overlay analysis.


    In the example the farmer was analysing 2 criteria and applied an algorithm to create a
    new geometry that met both criteria: the area of intersection of polygon a AND polygon
    b. In other situations you may be more interested in features that meet either criterion
    (polygon a OR polygon b). The algorithm is known as union and the effect is to retain all
    parts of both input polygons in the output feature. Other Boolean operators
    frequently used in GIS are NOT and XOR.

    Figure 4. Polygon overlay: (a) Union, (b) NOT and (c) XOR.

    a) polygon a OR polygon b: union.

    b) polygon a NOT polygon b: only parts of polygon a that are outside polygon b.

    c) polygon a XOR polygon b: the inverse of intersection within the union, i.e.
       (polygon a OR polygon b) NOT (polygon a AND polygon b).

    As well as the mathematical rigour conveyed by the use of Boolean operators, a strength
    of the basic polygon overlay is that it is intuitive when applied to a vector data model
    because we are handling discrete area objects and nominal attributes.
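    The four Boolean operators map directly onto set operations. As a rough illustration
    only, the sketch below stands in for true polygon geometry by describing each area as the
    set of unit cells it covers; the helper name and the two example areas are hypothetical.

```python
def cells(xmin, ymin, xmax, ymax):
    """Set of unit cells (column, row) covered by a rectangular area."""
    return {(x, y) for x in range(xmin, xmax) for y in range(ymin, ymax)}


polygon_a = cells(0, 0, 4, 4)   # 16 cells
polygon_b = cells(2, 2, 6, 6)   # 16 cells, overlapping polygon a in a 2 x 2 block

a_and_b = polygon_a & polygon_b   # intersection: area common to both
a_or_b = polygon_a | polygon_b    # union: all parts of both inputs
a_not_b = polygon_a - polygon_b   # parts of a that are outside b
a_xor_b = polygon_a ^ polygon_b   # in a or in b, but not in both
```

    Checking that a_xor_b equals a_or_b minus a_and_b is a quick way to confirm the
    relationship between XOR, union and intersection.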

    4.3 Weighted overlay and the vector data model

    In the basic area-on-area overlay on a vector data model the objective was to identify one

    or more parts of the new geometry that met simple criteria. Areas that did not meet the


    criteria were discarded. This was processed as a single task.

    The objective of weighted overlay is to calculate a new set of values for the complete

    coverage based on a combination of input values. When working with a vector data

    model there are two tasks to perform (i) create a new set of geometries for the entire

    area and (ii) compute a new set of attributes for those geometries.

    The latter task is a matter of describing a mathematical equation to process the input

    values. The first task, however, requires you to extend the basic polygon overlay

    operation to consider every intersection between all polygons in every data layer. As you

    can imagine this can be computationally demanding, especially if the GIS you are using

    computes topology 'on the fly' and does not store it in the data structure. As we shall see

    this is one of the reasons why weighted overlay is more frequently applied to a raster data

    model.
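    The second task, computing new attributes, might look like the sketch below once the new
    geometries exist. The lesson does not prescribe an equation, so a weighted sum is assumed
    here; the layer names, weights and polygon values are all hypothetical, and the inputs are
    assumed to be numeric scores for which multiplication and addition are meaningful.

```python
def weighted_score(values, weights):
    """Weighted sum of one output polygon's input attribute values."""
    return sum(weights[layer] * values[layer] for layer in weights)


# Hypothetical weights, one per input layer.
weights = {"slope": 0.5, "soil": 0.25, "access": 0.25}

# Hypothetical output polygons, each carrying one value from every input layer.
output_polygons = {
    "p1": {"slope": 4, "soil": 2, "access": 8},
    "p2": {"slope": 2, "soil": 8, "access": 4},
}

scores = {pid: weighted_score(vals, weights) for pid, vals in output_polygons.items()}
```

    The attribute calculation is trivial compared with building the geometries, which is the
    point made above about where the computational effort lies.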

    4.4 Problems with overlay on vector data model

    4.4.1 Computational demands

    The data file produced as a result of polygon overlay may be considerably larger than the

    original because lines have been split into smaller segments and new nodes and polygons

    have been created. Although more file space is required to store the outputs, a more
    common problem is that some implementations of polygon overlay in GIS require large

    amounts of memory or temporary file space to hold intermediate products during the

    processing. The result is that most GIS are limited in the number of polygons that they

    can handle in a polygon overlay operation.

    It is fairly obvious that larger map layers will take longer to process. It is therefore prudent

    to develop a strategy to minimise processing time (and memory use). A data processing

    strategy is particularly important if your GIS has to compute topology 'on the fly', e.g. ESRI
    ArcView 3 and ArcGIS 8, because this increases the computational demands. My advice
    is to design your analysis so that the fewest features are overlaid.

    Example: generate information on the area of coniferous forest in Bavaria.

    Poor strategy:

    1. Intersect all states in Germany with all land cover types (wait three hours....)

    2. Select by attribute to extract coniferous forests in Bavaria.

    Smart strategy:

    1. Select Bavaria.

    2. Reclass by attribute all coniferous forest land cover (and all other
       non-coniferous forest land cover).

    3. Intersect reclassified coniferous forest with Bavaria (wait 5 mins...)

    The smart strategy requires an extra step in processing but will substantially reduce the

    computational load.
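    A back-of-envelope comparison of the two strategies, assuming the worst case in which
    every polygon in one layer is tested against every polygon in the other. The land cover
    counts are invented for illustration, and real software with a spatial index will test
    far fewer pairs, so treat the numbers as an upper bound on the work.

```python
# Hypothetical feature counts, chosen only to illustrate the argument.
states = ["Bavaria"] + [f"state_{i}" for i in range(15)]       # 16 states
land_cover = ["coniferous forest"] * 200 + ["other"] * 1800     # 2000 polygons


def pairwise_tests(layer_a, layer_b):
    """Worst-case number of polygon pairs an overlay has to examine."""
    return len(layer_a) * len(layer_b)


# Poor strategy: intersect everything, select afterwards.
poor = pairwise_tests(states, land_cover)

# Smart strategy: select by attribute first, then intersect what is left.
bavaria = [s for s in states if s == "Bavaria"]
conifer = [c for c in land_cover if c == "coniferous forest"]
smart = pairwise_tests(bavaria, conifer)
```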

    4.4.2 Sliver polygons

    Rogue or spurious polygons that are produced as a result of overlay are commonly known

    as sliver polygons. If you overlay two sets of data with the same area entities that have

    been acquired from different sources or have been digitised twice from the same source

    then you will almost certainly encounter such polygons.

    Figure 5. Sliver polygons caused by digitising the same line twice.

    The two versions of such boundaries will not be coincident and as a result large numbers

    of small sliver polygons will be created by the polygon overlay process.

    Figure 6. Sliver polygons along the boundaries of administrative units.


    There are two approaches to eliminating them:

    1. close them during processing.

    2. eliminate them after processing.

    Removing them automatically during processing is normally done using a user-defined

    tolerance. The analyst adjusts the tolerance to create an optimal solution. If the tolerance

    is too big then lines which are close together, but actually separate, may be joined.

    Figure 7. Setting tolerance to close sliver polygons.
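    A tolerance can be pictured as a snapping distance. The sketch below is one simple way to
    apply it, assuming vertices of the second boundary are moved onto the nearest vertex of
    the first when they lie within the tolerance; the function name and coordinates are
    hypothetical, and GIS packages implement this in their own ways.

```python
import math


def snap_vertices(vertices, reference, tolerance):
    """Move each vertex onto the nearest reference vertex if it lies within tolerance."""
    snapped = []
    for v in vertices:
        nearest = min(reference, key=lambda r: math.dist(v, r))
        snapped.append(nearest if math.dist(v, nearest) <= tolerance else v)
    return snapped


boundary_1 = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
boundary_2 = [(0.0, 0.3), (10.0, -0.2), (20.0, 3.0)]   # second digitising of the same line

careful = snap_vertices(boundary_2, boundary_1, tolerance=0.5)
too_big = snap_vertices(boundary_2, boundary_1, tolerance=5.0)
```

    With the small tolerance the last vertex, which is genuinely 3 units away, is left alone.
    With the large tolerance it is pulled onto the other boundary, which is the risk of joining
    lines that are close together but actually separate.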

    The alternative is to remove slivers after processing. This may speed up the actual

    overlay processing but requires a degree of intelligence for a computer to be able to

    distinguish between real and sliver polygons. There are several differences between

    typical (real) polygons and sliver polygons.

    Figure 8. Real and sliver polygons.

    'Real' polygons                           Sliver polygons
    Size and shape vary                       Generally small, long and thin
    Generally more than two bounding arcs     Generally only two bounding arcs
    Attributes vary randomly between          Attributes may alternate between
    neighbouring polygons                     adjacent polygons
    Usually three arc intersections           Four arc intersections generally

    Once the sliver polygons have been identified they can be closed by replacing them with a

    central line.
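    Two of the differences listed above (small size, long thin shape) are easy to test
    numerically. The sketch below flags candidate slivers using polygon area and a
    compactness ratio; both thresholds are hypothetical and would need tuning to the data,
    and the arc-count and attribute criteria are not covered.

```python
import math


def polygon_area(ring):
    """Shoelace formula; ring is a list of (x, y) vertices without a repeated closing point."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(ring):
    """Sum of the edge lengths, including the closing edge."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:] + ring[:1]))


def looks_like_sliver(ring, max_area=5.0, max_compactness=0.2):
    """Flag small, long, thin polygons. Both thresholds are hypothetical."""
    area = polygon_area(ring)
    perimeter = polygon_perimeter(ring)
    # Compactness is 1.0 for a circle and approaches 0 for a thin strip.
    compactness = 4.0 * math.pi * area / perimeter ** 2
    return area <= max_area and compactness <= max_compactness


sliver = [(0.0, 0.0), (20.0, 0.1), (40.0, 0.0), (20.0, -0.1)]
real_field = [(0.0, 0.0), (30.0, 0.0), (30.0, 20.0), (0.0, 20.0)]
```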

    5 Area-on-area overlay operations on raster data

    5.1 Overlay analysis on raster data is easy!

    If two grids are aligned and have the same grid cell size then it is relatively easy to

    perform overlay operations. A new layer of values is produced from each pair of

    coincident cells. The values of these cells can be added, subtracted, divided or multiplied,

    the maximum value can be extracted, mean value calculated, a logical expression

    computed and so on. The output cell simply takes on a value equal to the result of the

    calculation.
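    The cell-by-cell logic described above fits in a few lines. This is a minimal sketch using
    plain nested lists rather than any GIS library; the function name and the two small grids
    are hypothetical.

```python
def overlay(grid_a, grid_b, operator):
    """Apply operator to each pair of coincident cells of two aligned grids."""
    if len(grid_a) != len(grid_b) or any(len(ra) != len(rb) for ra, rb in zip(grid_a, grid_b)):
        raise ValueError("grids must have the same number of rows and columns")
    return [[operator(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]


A = [[1, 2],
     [2, 1]]
B = [[1, 1],
     [2, 2]]

C = overlay(A, B, lambda a, b: a + b)                          # addition
D = overlay(A, B, lambda a, b: a * b)                          # multiplication
M = overlay(A, B, max)                                         # maximum value
both_2 = overlay(A, B, lambda a, b: int(a == 2 and b == 2))    # logical expression
```

    No geometry is computed at all: the only requirement is that the grids line up, which is
    why the operation is so quick.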


    e-Tutorial Exercise 1

    Time: 20 mins

    Let us return to Klinkenberg's excellent demonstrations of GIS
    operations.

    http://www.geog.ubc.ca/courses/klink/java/java_examples.html

    Use the Binary Overlays demonstration to investigate the

    effects of different Boolean operators on two layers.


    Figure 9. Some mathematical operators for overlay operations on the raster data model.

    Input layers: A and B.

    Output layers:

    Simple addition (output C): A + B = C

    Multiplication (output D): A * B = D

    Unique conditions (output E):

    If A = 1, B = 1 then E = 1
    If A = 2, B = 1 then E = 2
    If A = 1, B = 2 then E = 3
    If A = 2, B = 2 then E = 4
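    The unique conditions rule is a lookup on the pair of input values rather than an
    arithmetic operation. A sketch using the four rules listed above; the grids A and B are
    hypothetical because the cell values in the original figure are not available.

```python
# One output code per unique combination of (A, B), as listed in Figure 9.
UNIQUE_CONDITIONS = {(1, 1): 1, (2, 1): 2, (1, 2): 3, (2, 2): 4}


def unique_conditions(grid_a, grid_b, table=UNIQUE_CONDITIONS):
    """Look up the output code for each pair of coincident cells."""
    return [[table[(a, b)] for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(grid_a, grid_b)]


A = [[1, 2],
     [2, 1]]
B = [[1, 1],
     [2, 2]]

E = unique_conditions(A, B)
added = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
```

    For these grids simple addition gives the value 3 for two different input combinations,
    whereas E keeps them apart.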

    5.2 Difficulties of overlay on raster data models

    The main problems are not technical GIS problems; they are data problems.

    5.2.1 What do the numbers mean?

    The simplicity of the operator makes the overlay process very easy to implement.

    Problems usually start with interpreting the outputs. For example, identifying an area that
    meets criteria on two inputs (intersection) can be done in one of two ways. The most logical
    approach is to reclass the cell values in each layer as either 0 or 1 to indicate whether
    they meet the criteria or not and then multiply. Reclassifying 2 layers takes extra
    effort, so many analysts will seek to multiply the inputs and then reclassify the
    output. This reduces the effort but might not always produce a set of unambiguous output
    values. For example, if the farmer interested in identifying that part of a field that has
    a loam soil has coded loam soil as 3 and the field number as 2, then he should
    look for cells in the output layer with a value of 6. The problem is that other
    combinations of inputs can generate the same value: 2 and 3, or 6 and 1. The problem is
    caused by the analyst and in my experience it happens far too frequently, with the results
    being published without anyone being aware of the consequences. Perhaps this is
    because of the widespread availability of such easy-to-use operators.

    The problem of an output layer not having unique records is not restricted to multiplication.

    Jenks has illustrated how the same problem can be caused by addition and it is easy to

    show the problem arises in all mathematical operators if the analyst is unaware of the

    meaning behind the data.

    Figure 10. Ambiguities in the output of overlay operations on raster data.
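    The ambiguity is easy to demonstrate by listing which input pairs produce each product.
    The code ranges below are hypothetical (soil classes and field numbers both running from
    1 to 6); only the coding of loam as 3 and the field as 2 comes from the example above.

```python
from collections import defaultdict

soil_codes = range(1, 7)    # hypothetical soil classes 1..6, loam = 3
field_ids = range(1, 7)     # hypothetical field numbers 1..6, target field = 2

# Multiply first: record which input pairs produce each output value.
sources = defaultdict(list)
for soil in soil_codes:
    for field in field_ids:
        sources[soil * field].append((soil, field))


# Reclassify first: 1 where the criterion is met, 0 elsewhere, then multiply.
def reclass_then_multiply(soil, field):
    return int(soil == 3) * int(field == 2)


hits = [(s, f) for s in soil_codes for f in field_ids
        if reclass_then_multiply(s, f) == 1]
```

    Here sources[6] holds four different input pairs, while the reclassify-first route
    returns only the pair the farmer wants.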


    5.2.2 Constructing area entities

    A problem in the overlay analysis of raster data models is the correct identification of area

    features in the output because, unlike polygon-on-polygon analysis on the vector data

    model, there is no intuitive geometry. Each cell is processed individually and the analyst

    has to create new geometries based on only the new cell attributes. The operator does

    not distinguish between area entities and surface entities. The analyst is faced with at
    least 2 questions: (i) has the mathematical computation produced unambiguous cell
    values, i.e. does each value have a single, distinctive meaning? and (ii) should diagonally
    adjacent cells with the same value be part of the same area feature in the output, or should
    adjacency be defined in terms of horizontal and vertical neighbouring cells only?

    5.2.3 Cell resolution

    Resolution is the pixel, grid cell or mesh size of spatial data. For example, remotely

    sensed multispectral data from the SPOT satellite has a resolution of 20m. This means

    that each pixel in the image represents a ground area of size 20m by 20m. Imagine you

    wish to overlay a SPOT image with a raster representation of urban fox population

    densities which has been coded with a 10m pixel size. Will your output have a resolution
    of 10m, 20m or 30m?
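    One common way to make the two grids comparable, assumed here purely for illustration, is
    to split each coarse cell into smaller cells that carry the same value. The sketch uses
    hypothetical cell values and stacks the two layers as pairs rather than combining them
    arithmetically.

```python
def refine(grid, factor):
    """Split every cell into factor x factor smaller cells carrying the same value."""
    fine = []
    for row in grid:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend([list(fine_row) for _ in range(factor)])
    return fine


spot_20m = [[5, 7]]                  # hypothetical 20m cells
fox_10m = [[1, 2, 3, 4],
           [5, 6, 7, 8]]             # hypothetical 10m cells over the same ground

spot_10m = refine(spot_20m, 2)
stacked = [list(zip(ra, rb)) for ra, rb in zip(spot_10m, fox_10m)]
```

    The grids now align cell for cell, but the refined SPOT layer holds no more detail than
    it did at 20m, which is worth bearing in mind when deciding what resolution the output
    really has.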


    e-Tutorial Exercise 2

    Time: 20 mins

    Let us visit Klinkenberg's excellent demonstrations of GIS

    operations.

    http://www.geog.ubc.ca/courses/klink/java/java_examples.html

    Can you solve the problem posed in the CROSSTAB and

    reclassification demonstration? This requires an

    understanding of how new output values are created for each

    unique combination of input values.


    6 Some more issues in overlay analysis

    6.1 Scales of measurement

    GIS gives you immense flexibility in the way you can overlay raster data - probably too

    much flexibility for the casual user. There are some computations you can achieve which

    simply do not make sense! For instance imagine you have one map layer coded with

    different soil types given codes such as 1 (clay) or 5 (loam). And you have a second map

    layer with rainfall totals. It is perfectly possible to add, subtract, multiply etc. these two

    map layers, but in all cases the answers are nonsense. Why? Because the two sets of

    data have been collected using different scales of measurement. Rainfall is generally

    measured using values on what is known as a RATIO scale, and soil classes are

    NOMINAL or categorical data. The rainfall values are fine, you can add, subtract, divide

    etc. using ratio numbers, but you cannot apply these operators to nominal scale numbers.

    Using a nominal scale, the numbers allocated are simply labels: they may as well be

    letters A - E. 5 on the soil scale is not five times larger than 1, neither is it one unit more

    than 4. In fact, the only thing we can say about different values on a nominal scale is

    that the property is 'different'. Therefore, multiplying, or adding soil type to rainfall

    produces a meaningless result.

    So, although the GIS will let you perform these operations, it will not tell you when they

    produce meaningless answers. It is important, therefore, to know what scale of

    measurement has been used for the measurement of your data. Scales of measurement

    are summarised below together with the details of the operations possible on each type of

    data. The problem is often associated with analysis on raster data models, because many
    vector-GIS are supported by a database that recognises alphanumeric values, but you
    should be aware that knowledge of measurement scale is fundamental to any
    data processing work.


    The table below summarises the operations possible on the different types of

    measurement (adapted from Unwin, 1981):

    Table 2. Scales of measurement.

    Level     Basic operations                 Examples
    Nominal   Frequency (count),               Name (of person or road), Address (postcode),
              recognition of equality          House type (detached, semi-detached, terrace,
                                               apartment), Colour
    Ordinal   Determination of order (rank)    Grade (A-B-C-D-E), Tax band (High rate,
                                               Standard rate, Basic rate)
    Interval  Addition, subtraction            Temperature (degrees Celsius), date
    Ratio     Addition, subtraction,           Distance, rainfall, income
              multiplication, division

    Scales of Measurement summary

    Nominal: such as I.D. number or soil type. Such numbers have no meaning,
    they simply represent distinct categories. So, cities given a reference number,
    or telephone numbers, are examples of measurements on a nominal scale. The
    only relationship between numbers on a nominal scale is one of identity.

    Ordinal: such as positions in a competition - the order is important. It is
    possible to rank data, but we know nothing about other numerical relationships
    between data. For example, we can rank cities in terms of their population
    totals, with city number 1 having the highest total. However, the city with rank 2
    will not have a population half that of the city with rank 1, but we do know that
    the population of city 2 is smaller than that of city 1.

    Interval: such as temperature measured in centigrade or Fahrenheit. There is
    no real zero but intervals between integers are equal. Temperature data, in
    common with other interval data can be added and subtracted (for example to
    find daily temperature range from the maximum and minimum) but we cannot
    say 20 °C is twice as hot as 10 °C.

    Ratio: such as distance. There is a real zero, negatives are possible, intervals
    between numbers are equal and so is order. Ratio scale data can be added and
    subtracted and have ratio properties. Thus we can say 20/10 equals 30/15.

    Each scale has the property described by its name and, below nominal scale,
    has all the properties of the one above.
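    Since the GIS will not warn you, a simple guard of your own can. The sketch below encodes
    the operations from Table 2 and refuses to combine two layers unless both scales support
    the operation; the operation names and function names are hypothetical labels, not part
    of any GIS package.

```python
SCALES = ["nominal", "ordinal", "interval", "ratio"]   # weakest to strongest

# Operations that first become available at each level (from Table 2).
NEW_OPERATIONS = {
    "nominal": {"count", "equality"},
    "ordinal": {"rank"},
    "interval": {"add", "subtract"},
    "ratio": {"multiply", "divide"},
}


def allowed_operations(scale):
    """Each scale inherits the operations of the weaker scales."""
    allowed = set()
    for level in SCALES[: SCALES.index(scale) + 1]:
        allowed |= NEW_OPERATIONS[level]
    return allowed


def can_combine(operation, scale_a, scale_b):
    """An operation on two layers is meaningful only if both scales support it."""
    return operation in allowed_operations(scale_a) & allowed_operations(scale_b)


soil_plus_rain = can_combine("add", "nominal", "ratio")        # soil type + rainfall
rain_times_rain = can_combine("multiply", "ratio", "ratio")
```

    In effect the weaker of the two scales decides what is permitted, which is why adding soil
    type to rainfall is rejected.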

    6.2 Scale and overlay analysis

    Berry (1991) identified scale as a cause of error that may be incorporated almost

    effortlessly into overlay analysis. Overlay analysis can be implemented on any pair of

    inputs if they cover the same spatial extent. Many analysts, however, ignore the

    consequences of combining data of different scales. Two maps at very different scales

    are frequently the product of very different data modelling exercises e.g. the GB Ordnance

    Survey produces 1:10,000, 1:50,000 and 1:250,000 data products that have been

    maintained separately and are designed for different purposes.

    7 What have you learnt in this lesson?

    The integration of spatial data is at the heart of GIS and area-on-area and weighted

    overlay epitomise the analysis of multiple data layers. They were some of the first
    operators implemented in GIS in its early years and have attracted considerable attention
    both from researchers, who have extended the range of algorithms and investigated the
    consequences of different algorithms, and from software developers, who have improved
    efficiency in the implementation of algorithms.

    Vector algorithms for area-on-area overlay analysis are elegant and intuitive but

    computationally demanding. They also produce sliver polygons in their thousands. These

    problems can be reduced by adopting a set of heuristics and implementing additional

    processing to clean up the unwanted artefacts. It remains highly desirable to design a

    smart strategy when using polygon overlay. Even so, I suspect that more overlay analysis

    is performed on the raster data model.

    Area-on-area and weighted overlay are simple and quick to apply to the raster data model


    if the grids are aligned and of equal cell size. The inherent weaknesses of the raster data

    model become apparent in post-processing when the analyst might be faced with making

    some arbitrary decisions as to the meaning of the output.
