
7/28/2019 Trodd Overlay Areas and Surfaces


OVERLAY ANALYSIS: AREAS AND SURFACES

TABLE OF CONTENTS

1 The importance of overlay in GIS
2 Learning objectives
3 Area and surface overlay analysis
3.1 A brief history
3.2 Field and feature perspectives
4 Polygon-on-polygon overlay operations
4.1 Topological issues
4.2 Creating new geometries
4.3 Weighted overlay and the vector data model
4.4 Problems with overlay on vector data model
4.4.1 Computational demands
4.4.2 Sliver polygons
5 Area-on-area overlay operations on raster data
5.1 Overlay analysis on raster data is easy!
e-Tutorial Exercise 1
5.2 Difficulties of overlay on raster data models
5.2.1 What do the numbers mean?
e-Tutorial Exercise 2
5.2.2 Constructing area entities
5.2.3 Cell resolution
6 Some more issues in overlay analysis
6.1 Scales of measurement
6.2 Scale and overlay analysis
7 What have you learnt in this lesson?

2005 Nigel Trodd



1 The importance of overlay in GIS

Overlay is a fundamental spatial operation. It is one of the functions that distinguish

GIS from other systems such as CAD and DBMS. The UK Chorley Report (Department of

the Environment, 1986) illustrates what a GIS should be able to do by giving the example

of an industrial siting case study that uses overlay to 'sieve' the various siting criteria and

identify suitable locations.

Overlay operators combine data from the same entity type or different entity types. In

both cases they create new geometries and can change entity type and/or attribute value.

There are four overlay operators in common use:

point-in-area (also known as point-in-polygon)

line-in-area

area-on-area (also known as polygon-on-polygon)

weighted overlay

In this lesson we will concentrate on two of these operators, namely area-on-area and

weighted overlay. These operators process area and surface entity types respectively.

You will find that overlay techniques vary with the data model employed by your GIS. This

means that the results of overlay analysis depend on the data model and, in general,

techniques to analyse vector data are time consuming and computationally intensive

whereas overlay of raster data is relatively straightforward, quick and efficient.

2 Learning objectives

Upon completion of this lesson you should be able to:

Identify and explain techniques to perform area-on-area overlay on raster and

vector data.

Identify and explain techniques to perform weighted overlay on raster and vector

surfaces.

Understand the main weaknesses of these overlay operations as they are

implemented in GIS.


3 Area and surface overlay analysis

3.1 A brief history

The principles of area-on-area overlay pre-date GIS. Until the arrival of GIS, map overlay

analysis was performed manually by superimposing transparent acetates of map layers on

a light table. The stack of acetates was used to visually identify sites that met a number of

criteria.

In the 1960s the 'quantitative revolution' heralded a new era for spatial analysis. Several

influential figures emerged including Ian McHarg whose work in landscape ecology is most

well known for its attempt to explain the distribution of plants by combining information

about the environment. His approach was to apply sieve-mapping techniques. This was

a substantial step forward in computational spatial analysis, allowing considerably more

work to be performed using the computer than was originally possible from field

observation and single coverage cartography alone (Simpson, 1989). At the same time

the first GIS prototypes were being developed and it is not coincidental that early products

were designed, in part, to automate this 'sieve mapping'. Speedy and efficient analytical

techniques such as these were of particular interest to governments intent on examining

spatial relationships of large regions. Work done under the auspices of the Canadian government in developing CGIS was largely responsible for increasing the prevalence of

polygon overlay in GIS.

3.2 Field and feature perspectives

Overlay analysis has generally taken the form of either area-on-area overlay or weighted

overlay depending on your perspective. The former is more concerned with the analysis

of particular features and adopts a discrete object perspective. The objectives of area-on-

area overlay are to determine whether two features overlap (the technical term is to

'intersect') and, if so, to define the identity of areas formed by the overlap as one or more

new area objects.

Weighted overlay operations combine two or more complete map layers consisting of areas or surfaces. In addition to computing the identity of the new geometries, the objective of weighted overlay is to compute new attribute values. Because the operation processes the complete data set, boundaries from all inputs are retained but broken into shorter fragments by the intersections that occur between boundaries in one input dataset and boundaries in another.

In both area-on-area and weighted overlay the output entity type is an area or surface

respectively but the overlay operation has generated new geometries and new attributes.

4 Polygon-on-polygon overlay operations

4.1 Topological issues

If you are overlaying two vector map layers you need to ensure before you start that the

input map layers are topologically correct. If this is so then the output maps will also be

topologically correct (Figure 1).

Figure 1. Polygon-on-polygon overlay (panels: polygon-on-polygon overlay; new geometries).

In polygon overlay it is necessary to add new intersections (nodes) and create new

polygons to retain topology. Overlaying 2 sets of polygons can produce a large number of

new polygons and increase the number of nodes and arcs. In Figure 2, for example, the

number of nodes increased by 75% and the number of arcs by 83%. Warning!! Increasing

the number of input data sets can rapidly increase the number of output features.

The algorithms to compute the location of new nodes are the same as those used for line intersection. Once the nodes have been identified, the arcs need to be split and the new topology constructed.
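The node-finding step can be illustrated with a minimal sketch of a line segment intersection test. This is a textbook parametric formulation, not necessarily the algorithm used by any particular GIS; the function name `segment_intersection` and the sample coordinates are assumptions made for illustration, and parallel or collinear segments are deliberately not handled.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]

def segment_intersection(p1: Point, p2: Point, p3: Point, p4: Point) -> Optional[Point]:
    """Return the point where segment p1-p2 crosses segment p3-p4, or None."""
    dx1, dy1 = p2[0] - p1[0], p2[1] - p1[1]
    dx2, dy2 = p4[0] - p3[0], p4[1] - p3[1]
    denom = dx1 * dy2 - dy1 * dx2
    if denom == 0:
        return None  # parallel or collinear: not handled in this sketch
    # t and u are the fractional positions of the crossing along each segment
    t = ((p3[0] - p1[0]) * dy2 - (p3[1] - p1[1]) * dx2) / denom
    u = ((p3[0] - p1[0]) * dy1 - (p3[1] - p1[1]) * dx1) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * dx1, p1[1] + t * dy1)
    return None

# Two arcs that cross: the new node splits each arc into two shorter arcs.
arc_start, arc_end = (0.0, 0.0), (2.0, 2.0)
node = segment_intersection(arc_start, arc_end, (0.0, 2.0), (2.0, 0.0))
split_arcs = [(arc_start, node), (node, arc_end)]
```

In a full overlay this test would be repeated for every candidate pair of arcs, which is one reason the operation is computationally demanding.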

4.2 Creating new geometries

Once the new set of nodes, arcs and polygons has been created the task is to extract a meaningful set of polygons. It may be desirable to retain only the area that is common to both input features. For example, a farmer is interested in knowing which part of a field has a loam soil. He is able to overlay the map of loam soil polygons on the field polygon to extract a feature that meets both criteria (loam soil AND in-field).


Figure 2. Creating geometries in overlay operations on the vector data model.

Figure 3. Polygon overlay: intersection. Polygon a AND polygon b gives the new feature geometry (old boundaries dissolved).

It is worth noting that the variables the farmer is processing are both of categorical (or nominal) data type. This matters because mathematicians have developed a suite of algorithms to analyse such data, known as Boolean operators, which GIS analysts exploit in area-on-area overlay analysis.


In the example the farmer was analysing two criteria and applied an algorithm to create a new geometry that met both criteria: the area of intersection of polygon a AND polygon b. In other situations you may be more interested in features that meet either criterion (polygon a OR polygon b). The algorithm is known as union and the effect is to retain all parts of both input polygons in the output feature. Other Boolean operators frequently used in GIS are NOT and XOR.

Figure 4. Polygon overlay: (a) Union, (b) NOT and (c) XOR.

a) polygon a OR polygon b: union.

b) polygon a NOT polygon b: only the parts of polygon a that are outside polygon b.

c) polygon a XOR polygon b: the inverse of intersection within the union, i.e. (polygon a OR polygon b) NOT (polygon a AND polygon b).

As well as the mathematical rigour conveyed by the use of Boolean operators, a strength of basic polygon overlay is that it is intuitive when applied to a vector data model, because we are handling discrete area objects and nominal attributes.
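To make the four Boolean operators concrete, here is a small sketch that computes the area each one would retain. To stay self-contained it assumes both features are axis-aligned rectangles, which real field and soil polygons are not; the names `field`, `loam` and `overlay_areas` and the coordinates are invented for illustration.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    """Area of a rectangle; zero if the rectangle is empty."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def intersection(a: Rect, b: Rect) -> Rect:
    """Bounding coordinates of the part common to both rectangles."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def overlay_areas(a: Rect, b: Rect) -> Dict[str, float]:
    """Area retained by each Boolean operator."""
    both = area(intersection(a, b))
    return {
        "AND": both,                        # intersection
        "OR": area(a) + area(b) - both,     # union
        "NOT": area(a) - both,              # a outside b
        "XOR": area(a) + area(b) - 2 * both # union minus intersection
    }

field = (0.0, 0.0, 4.0, 3.0)  # hypothetical field polygon
loam = (2.0, 1.0, 6.0, 5.0)   # hypothetical loam soil polygon
result = overlay_areas(field, loam)
```

Note that XOR always equals OR minus AND, which is a quick consistency check on any overlay output.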

4.3 Weighted overlay and the vector data model

In the basic area-on-area overlay on a vector data model the objective was to identify one

or more parts of the new geometry that met simple criteria. Areas that did not meet the


criteria were discarded. This was processed as a single task.

The objective of weighted overlay is to calculate a new set of values for the complete

coverage based on a combination of input values. When working with a vector data

model there are two tasks to perform (i) create a new set of geometries for the entire

area and (ii) compute a new set of attributes for those geometries.

The latter task is a matter of describing a mathematical equation to process the input

values. The first task, however, requires you to extend the basic polygon overlay

operation to consider every intersection between all polygons in every data layer. As you

can imagine this can be computationally demanding, especially if the GIS you are using

computes topology 'on the fly' and does not store it in the data structure. As we shall see

this is one of the reasons why weighted overlay is more frequently applied to a raster data

model.
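Task (ii) can be sketched as follows, assuming task (i) has already produced a set of new polygons that each carry one value from every input layer. The lesson does not prescribe a particular equation; a weighted average is used here only as one plausible choice, and the layer names, scores and weights are all hypothetical. The input values are assumed to be suitability scores on an interval or ratio scale.

```python
from typing import Dict

def weighted_score(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the input-layer values carried by one new polygon."""
    total_weight = sum(weights.values())
    return sum(values[name] * w for name, w in weights.items()) / total_weight

# Output of the geometric step: each new polygon inherits a value per layer.
pieces = [
    {"id": 1, "slope": 3, "soil": 5},
    {"id": 2, "slope": 1, "soil": 2},
]
weights = {"slope": 3, "soil": 2}  # assumed relative importance of each layer

scores = {p["id"]: weighted_score(p, weights) for p in pieces}
```

The attribute calculation is trivial compared with the geometric step that precedes it, which is where the computational cost lies.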

4.4 Problems with overlay on vector data model

4.4.1 Computational demands

The data file produced as a result of polygon overlay may be considerably larger than the

original because lines have been split into smaller segments and new nodes and polygons

have been created. Although more file space is required to store the outputs a more common problem is that some implementations of polygon overlay in GIS require large

amounts of memory or temporary file space to hold intermediate products during the

processing. The result is that most GIS are limited in the number of polygons that they

can handle in a polygon overlay operation.

It is fairly obvious that larger map layers will take longer to process. It is therefore prudent to develop a strategy to minimise processing time (and memory use). A data processing strategy is particularly important if your GIS has to compute topology 'on the fly', e.g. ESRI ArcView 3 and ArcGIS 8, because this increases the computational demands. My advice is to design your analysis so that the fewest features are overlaid.

Example: generate information on the area of coniferous forest in Bavaria.

Poor strategy:

1. Intersect all states in Germany with all land cover types (wait three hours...).

2. Select by attribute to extract coniferous forests in Bavaria.

Smart strategy:

1. Select Bavaria.

2. Reclass by attribute all coniferous forest land cover (and all other non-coniferous forest land cover).

3. Intersect reclassified coniferous forest with Bavaria (wait 5 mins...).

The smart strategy requires an extra step in processing but will substantially reduce the

computational load.
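A rough way to see why the smart strategy pays off is to count the polygon pairs that a naive overlay would have to test. The sketch below assumes no spatial index, so every feature in one layer is compared with every feature in the other; the land cover counts are invented purely for illustration.

```python
# Hypothetical feature counts, chosen only for illustration.
N_STATES = 16         # states in Germany
N_LAND_COVER = 1000   # assumed land cover polygons of all types
N_CONIFEROUS = 120    # assumed coniferous polygons left after the reclass

def candidate_pairs(n_left: int, n_right: int) -> int:
    """Polygon pairs a naive overlay would test, with no spatial index."""
    return n_left * n_right

poor = candidate_pairs(N_STATES, N_LAND_COVER)  # intersect everything first
smart = candidate_pairs(1, N_CONIFEROUS)        # select and reclass first
```

Real software narrows the candidates with bounding-box tests or spatial indexes, so the true ratio will differ, but the direction of the saving is the same.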

4.4.2 Sliver polygons

Rogue or spurious polygons that are produced as a result of overlay are commonly known

as sliver polygons. If you overlay two sets of data with the same area entities that have

been acquired from different sources or have been digitised twice from the same source

then you will almost certainly encounter such polygons.

Figure 5. Sliver polygons caused by digitising the same line twice.

The two versions of such boundaries will not be coincident and as a result large numbers

of small sliver polygons will be created by the polygon overlay process.

Figure 6. Sliver polygons along the boundaries of administrative units.


There are two approaches to eliminating them:

1. close them during processing.

2. eliminate them after processing.

Removing them automatically during processing is normally done using a user-defined

tolerance. The analyst adjusts the tolerance to create an optimal solution. If the tolerance

is too big then lines which are close together, but actually separate, may be joined.

Figure 7. Setting tolerance to close sliver polygons.
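The effect of the tolerance setting can be sketched with a simplified vertex-snapping routine. This is an assumption-laden illustration, not the procedure used by any particular GIS: it only moves vertices onto the nearest vertex of a reference line, whereas real software also snaps to arcs. The function name `snap_vertices` and the coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def snap_vertices(line: List[Point], reference: List[Point], tolerance: float) -> List[Point]:
    """Move each vertex of `line` onto the nearest `reference` vertex within tolerance."""
    snapped = []
    for x, y in line:
        nearest = min(reference, key=lambda r: math.hypot(r[0] - x, r[1] - y))
        if math.hypot(nearest[0] - x, nearest[1] - y) <= tolerance:
            snapped.append(nearest)   # close enough: treat as the same point
        else:
            snapped.append((x, y))    # genuinely separate: leave untouched
    return snapped

first = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]    # boundary digitised once
second = [(0.0, 0.2), (5.0, -0.3), (10.0, 1.5)]  # same boundary, digitised again

modest = snap_vertices(second, first, tolerance=0.5)
too_big = snap_vertices(second, first, tolerance=2.0)
```

With the modest tolerance the last vertex stays separate; with the larger tolerance it is merged as well, which is the risk described above if that vertex really belonged to a different line.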

The alternative is to remove slivers after processing. This may speed up the actual

overlay processing but requires a degree of intelligence for a computer to be able to

distinguish between real and sliver polygons. There are several differences between

typical (real) polygons and sliver polygons.

Figure 8. Real and sliver polygons.

'Real' polygons                                           Sliver polygons
Size and shape vary                                       Generally small, long and thin
Generally more than two bounding arcs                     Generally only two bounding arcs
Attributes vary randomly between neighbouring polygons    Attributes may alternate between adjacent polygons
Usually three arc intersections                           Four arc intersections generally

Once the sliver polygons have been identified they can be closed by replacing them with a

central line.
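The 'small, long and thin' criterion lends itself to a simple automated test. The sketch below flags a polygon as a likely sliver when its area is small and its compactness, 4πA/P², is low (a circle scores 1, a very thin shape scores close to 0). The thresholds are arbitrary assumptions that an analyst would need to tune, and the other criteria in the table (bounding arcs, attributes, arc intersections) are not checked.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def area_and_perimeter(ring: List[Point]) -> Tuple[float, float]:
    """Shoelace area and perimeter of a simple polygon given as a vertex ring."""
    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2, perimeter

def is_sliver(ring: List[Point], max_area: float, max_compactness: float = 0.2) -> bool:
    """True if the polygon is both small and thin (thresholds are assumptions)."""
    a, p = area_and_perimeter(ring)
    compactness = 4 * math.pi * a / (p * p)
    return a <= max_area and compactness <= max_compactness

square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
thin = [(0.0, 0.0), (100.0, 0.0), (100.0, 0.1), (0.0, 0.1)]
```

Here `is_sliver(thin, max_area=20.0)` flags the long thin polygon while the square is left alone.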

5 Area-on-area overlay operations on raster data

5.1 Overlay analysis on raster data is easy!

If two grids are aligned and have the same grid cell size then it is relatively easy to

perform overlay operations. A new layer of values is produced from each pair of

coincident cells. The values of these cells can be added, subtracted, divided or multiplied,

the maximum value can be extracted, mean value calculated, a logical expression

computed and so on. The output cell simply takes on a value equal to the result of the

calculation.
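The cell-by-cell calculation can be sketched in a few lines, assuming each grid is stored as a list of rows and that the two grids are already aligned with the same cell size. The function name `local_overlay` and the sample values are illustrative only.

```python
import operator
from typing import Callable, List

Grid = List[List[float]]

def local_overlay(a: Grid, b: Grid, op: Callable[[float, float], float]) -> Grid:
    """Apply `op` to each pair of coincident cells in two aligned grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same number of rows and columns")
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [2, 1]]
B = [[1, 1], [2, 2]]

added = local_overlay(A, B, operator.add)       # cell-wise sum
multiplied = local_overlay(A, B, operator.mul)  # cell-wise product
maximum = local_overlay(A, B, max)              # larger of the two values
```

Any function of two cell values can be passed as `op`, which is exactly the flexibility, and the danger, discussed in the rest of this section.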


e-Tutorial Exercise 1

Time: 20 mins

Let us return to Klinkenberg's excellent demonstrations of GIS

operations.

http://www.geog.ubc.ca/courses/klink/java/java_examples.html

Use the Binary Overlays demonstration to investigate the

effects of different Boolean operators on two layers.



Figure 9. Some mathematical operators for overlay operations on the raster data model.

Input layers: A and B

Simple addition: A + B = C (output layer C)

Multiplication: A * B = D (output layer D)

Unique conditions (output layer E):

If A = 1, B = 1 then E = 1

If A = 2, B = 1 then E = 2

If A = 1, B = 2 then E = 3

If A = 2, B = 2 then E = 4
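The unique conditions rule in Figure 9 amounts to a lookup table keyed on each combination of input values. A minimal sketch, assuming grids stored as lists of rows; the sample grids and the names `CODES` and `unique_conditions` are illustrative.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]

# One output code per combination of (A, B), exactly as listed in Figure 9.
CODES: Dict[Tuple[int, int], int] = {(1, 1): 1, (2, 1): 2, (1, 2): 3, (2, 2): 4}

def unique_conditions(a: Grid, b: Grid, codes: Dict[Tuple[int, int], int]) -> Grid:
    """Give every cell the code assigned to its combination of input values."""
    return [[codes[(x, y)] for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

A = [[1, 2], [2, 1]]
B = [[1, 1], [2, 2]]
E = unique_conditions(A, B, CODES)
```

Because every combination has its own code, each output value can be traced back to exactly one pair of inputs.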

5.2 Difficulties of overlay on raster data models

The main problems are not technical GIS problems; they are data problems.

5.2.1 What do the numbers mean?

The simplicity of the operators makes the overlay process very easy to implement. Problems usually start with interpreting the outputs. For example, identifying an area that meets criteria on two inputs (intersection) can be done in one of two ways. The most logical approach is to reclass the cell values in each layer as either 0 or 1, to indicate whether or not they meet the criteria, and then multiply. The extra effort to reclassify two layers is time consuming, and many analysts will instead multiply the inputs and then reclassify the output. This reduces the effort but might not always produce a set of unambiguous output values. Take the farmer interested in identifying that part of a field that has a loam soil: if the loam soil is coded 3 and the field number is 2 then he should look for cells in the output layer with a value of 6. The problem is that other combinations of inputs can generate the same value, for example 2 and 3, or 6 and 1. The problem is caused by the analyst and in my experience it happens far too frequently, with the results being published without anyone being aware of the consequences. Perhaps this is because of the widespread availability of such easy-to-use operators.
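The difference between the two approaches can be demonstrated on a toy grid. The soil and field codes below are invented for illustration (loam is coded 3 and the farmer's field is number 2, as in the example above); grids are assumed to be lists of rows.

```python
from typing import List

Grid = List[List[int]]

def reclass(grid: Grid, target: int) -> Grid:
    """1 where the cell equals the target code, 0 elsewhere."""
    return [[1 if v == target else 0 for v in row] for row in grid]

def multiply(a: Grid, b: Grid) -> Grid:
    """Cell-by-cell product of two aligned grids."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

soil = [[3, 3, 1], [6, 2, 3]]   # hypothetical soil codes (3 = loam)
field = [[2, 1, 2], [1, 3, 2]]  # hypothetical field numbers

# Logical approach: reclass each layer to 0/1, then multiply.
safe = multiply(reclass(soil, 3), reclass(field, 2))

# Shortcut: multiply the raw codes, then look for the value 6.
shortcut = reclass(multiply(soil, field), 6)
```

The shortcut marks four cells, but only two of them are loam in field 2; the others come from the combinations 6 and 1, and 2 and 3.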

The problem of an output layer not having unique records is not restricted to multiplication.

Jenks has illustrated how the same problem can be caused by addition and it is easy to

show the problem arises in all mathematical operators if the analyst is unaware of the

meaning behind the data.
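One way to check for this problem before running an overlay is to enumerate every combination of input codes and list the output values that more than one combination produces. This is only a sketch: the function name `collisions` and the code lists are assumptions, and the final encoding (ten times the first code plus the second) is just one possible unambiguous scheme, valid here because the second code is below 10.

```python
import operator
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

def collisions(codes_a: List[int], codes_b: List[int],
               op: Callable[[int, int], int]) -> Dict[int, List[Tuple[int, int]]]:
    """Output values produced by more than one combination of input codes."""
    seen: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for a, b in product(codes_a, codes_b):
        seen[op(a, b)].append((a, b))
    return {value: pairs for value, pairs in seen.items() if len(pairs) > 1}

codes = [1, 2, 3]
by_addition = collisions(codes, codes, operator.add)
by_encoding = collisions(codes, codes, lambda a, b: 10 * a + b)
```

Addition produces three ambiguous output values for these codes, while the encoding produces none.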

Figure 10. Ambiguities in the output of overlay operations on raster data.


5.2.2 Constructing area entities

A problem in the overlay analysis of raster data models is the correct identification of area features in the output because, unlike polygon-on-polygon analysis on the vector data model, there is no intuitive geometry. Each cell is processed individually and the analyst has to create new geometries based only on the new cell attributes. The operator does not distinguish between area entities and surface entities. The analyst is faced with at least two questions: (i) has the mathematical computation produced unambiguous cell values, i.e. does each value have a single, distinctive meaning? and (ii) should diagonally adjacent cells with the same value be part of the same area feature in the output, or should adjacency be defined only in terms of horizontally and vertically neighbouring cells?
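Question (ii) has a direct effect on how many area features come out of the analysis. The sketch below counts groups of connected cells with a given value using a simple flood fill, once with horizontal and vertical neighbours only and once with diagonal neighbours included. The function name `count_regions` and the sample grid are illustrative.

```python
from typing import List

def count_regions(grid: List[List[int]], value: int, diagonal: bool) -> int:
    """Count groups of connected cells equal to `value` (flood fill)."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or (r, c) in seen:
                continue
            regions += 1              # found an unvisited cell: new area feature
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == value and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return regions

grid = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]]
```

For this grid `count_regions(grid, 1, diagonal=False)` finds three separate features, whereas `count_regions(grid, 1, diagonal=True)` finds one.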

5.2.3 Cell resolution

Resolution is the pixel, grid cell or mesh size of spatial data. For example, remotely

sensed multispectral data from the SPOT satellite has a resolution of 20m. This means

that each pixel in the image represents a ground area of size 20m by 20m. Imagine you

wish to overlay a SPOT image with a raster representation of urban fox population

densities which has been coded with a 10m pixel size. Will your output have a resolution

of 10m or 20m or 30m size?
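Before two grids of different cell size can be overlaid, one of them has to be brought to the resolution of the other. The sketch below shows two simple options, neither of which is presented here as the right answer to the question above: splitting each coarse cell into copies (which adds no information), and averaging blocks of fine cells (which only makes sense for interval or ratio data). The function names and sample values are assumptions, and grid dimensions are assumed to be exact multiples of the factor.

```python
from typing import List

def refine(grid: List[List[float]], factor: int) -> List[List[float]]:
    """Split each cell into factor x factor copies of itself."""
    out = []
    for row in grid:
        fine_row = [v for v in row for _ in range(factor)]
        out.extend([list(fine_row) for _ in range(factor)])
    return out

def aggregate_mean(grid: List[List[float]], factor: int) -> List[List[float]]:
    """Replace each factor x factor block of cells by its mean value."""
    out = []
    for r in range(0, len(grid), factor):
        out_row = []
        for c in range(0, len(grid[0]), factor):
            block = [grid[i][j] for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

spot_20m = [[5, 7]]                       # hypothetical 20m cells
foxes_10m = [[1, 3, 0, 0], [2, 2, 4, 0]]  # hypothetical 10m cells, same extent

spot_at_10m = refine(spot_20m, 2)
foxes_at_20m = aggregate_mean(foxes_10m, 2)
```

Either result can then be overlaid cell by cell with the other layer, but the choice changes what the output values mean.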


e-Tutorial Exercise 2

Time: 20 mins

Let us visit Klinkenberg's excellent demonstrations of GIS

operations.

http://www.geog.ubc.ca/courses/klink/java/java_examples.html

Can you solve the problem posed in the CROSSTAB and

reclassification demonstration? This requires an

understanding of how new output values are created for each

unique combination of input values.



6 Some more issues in overlay analysis

6.1 Scales of measurement

GIS gives you immense flexibility in the way you can overlay raster data - probably too

much flexibility for the casual user. There are some computations you can achieve which

simply do not make sense! For instance imagine you have one map layer coded with

different soil types given codes such as 1 (clay) or 5 (loam). And you have a second map

layer with rainfall totals. It is perfectly possible to add, subtract, multiply etc. these two

map layers, but in all cases the answers are nonsense. Why? Because the two sets of

data have been collected using different scales of measurement. Rainfall is generally

measured using values on what is known as a RATIO scale, and soil classes are

NOMINAL or categorical data. The rainfall values are fine, you can add, subtract, divide

etc. using ratio numbers, but you cannot apply these operators to nominal scale numbers.

Using a nominal scale, the numbers allocated are simply labels: they may as well be

letters A - E. 5 on the soil scale is not five times larger than 1, neither is it one unit more

than 4. In fact, the only thing we can say about different values on a nominal scale is

that the property is 'different'. Therefore, multiplying, or adding soil type to rainfall

produces a meaningless result.

So, although the GIS will let you perform these operations, it will not tell you when they

produce meaningless answers. It is important, therefore, to know what scale of

measurement has been used for the measurement of your data. Scales of measurement

are summarised below together with the details of the operations possible on each type of

data. Although the problem is often associated with analysis on raster data models, because many vector GIS are supported by a database that recognises alphanumeric values, you should be aware that knowledge of measurement scale is fundamental to any data processing work.


The table below summarises the operations possible on the different types of measurement (adapted from Unwin, 1981):

Table 2. Scales of measurement.

Level      Basic operations                                    Examples
Nominal    Frequency (count), recognition of equality          Name (of person or road), address (postcode), house type (detached, semi-detached, terrace, apartment), colour
Ordinal    Determination of order (rank)                       Grade (A-B-C-D-E), tax band (high rate, standard rate, basic rate)
Interval   Addition, subtraction                               Temperature (degrees Celsius), date
Ratio      Addition, subtraction, multiplication, division     Distance, rainfall, income

Scales of measurement summary

Nominal: such as ID number or soil type. Such numbers have no meaning, they simply represent distinct categories. So, cities given a reference number, or telephone numbers, are examples of measurements on a nominal scale. The only relationship between numbers on a nominal scale is one of identity.

Ordinal: such as positions in a competition, where the order is important. It is possible to rank data, but we know nothing about other numerical relationships between data. For example, we can rank cities in terms of their population totals, with city number 1 having the highest total. However, the city with rank 2 will not necessarily have a population half that of the city with rank 1, but we do know that the population of city 2 is smaller than that of city 1.

Interval: such as temperature measured in centigrade or Fahrenheit. There is no real zero but intervals between integers are equal. Temperature data, in common with other interval data, can be added and subtracted (for example to find daily temperature range from the maximum and minimum) but we cannot say 20 °C is twice as hot as 10 °C.

Ratio: such as distance. There is a real zero, negatives are possible, intervals between numbers are equal and order is meaningful. Ratio scale data can be added and subtracted and have ratio properties. Thus we can say 20/10 equals 30/15.

Each scale has the property described by its name and, below nominal scale, has all the properties of the one above.
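Because the GIS itself will not stop you, one defensive habit is to record the scale of measurement of each layer and check it before applying an operator. The sketch below encodes the basic operations from Table 2, with each level inheriting the operations of the level before it; the names `ALLOWED` and `can_apply` and the operation labels are assumptions made for illustration.

```python
from typing import Dict, Set

# Operations permitted at each level, following Table 2.
ALLOWED: Dict[str, Set[str]] = {
    "nominal": {"count", "equal"},
    "ordinal": {"count", "equal", "rank"},
    "interval": {"count", "equal", "rank", "add", "subtract"},
    "ratio": {"count", "equal", "rank", "add", "subtract", "multiply", "divide"},
}

def can_apply(operation: str, *scales: str) -> bool:
    """True only if every input layer's scale supports the operation."""
    return all(operation in ALLOWED[scale] for scale in scales)

soil_plus_rainfall = can_apply("add", "nominal", "ratio")
rainfall_plus_rainfall = can_apply("add", "ratio", "ratio")
temperature_times_distance = can_apply("multiply", "interval", "ratio")
```

Adding soil type to rainfall is rejected, as the lesson argues it should be, while adding two rainfall layers is accepted.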

6.2 Scale and overlay analysis

Berry (1991) identified scale as a cause of error that may be incorporated almost

effortlessly into overlay analysis. Overlay analysis can be implemented on any pair of

inputs if they cover the same spatial extent. Many analysts, however, ignore the

consequences of combining data of different scales. Two maps at very different scales

are frequently the product of very different data modelling exercises e.g. the GB Ordnance

Survey produces 1:10,000, 1:50,000 and 1:250,000 data products that have been

maintained separately and are designed for different purposes.

7 What have you learnt in this lesson?

The integration of spatial data is at the heart of GIS, and area-on-area and weighted overlay epitomise the analysis of multiple data layers. They were some of the first operators implemented in GIS in its early years and have attracted considerable attention both from researchers, who have extended the range of algorithms and investigated the consequences of different algorithms, and from software developers, who have improved the efficiency with which the algorithms are implemented.

Vector algorithms for area-on-area overlay analysis are elegant and intuitive but

computationally demanding. They also produce sliver polygons in their thousands. These

problems can be reduced by adopting a set of heuristics and implementing additional

processing to clean up the unwanted artefacts. It remains highly desirable to design a

smart strategy when using polygon overlay. Even so, I suspect that more overlay analysis

is performed on the raster data model.

Area-on-area and weighted overlay are simple and quick to apply to the raster data model


if the grids are aligned and of equal cell size. The inherent weaknesses of the raster data

model become apparent in post-processing when the analyst might be faced with making

some arbitrary decisions as to the meaning of the output.
