GIS Data Capture Hardware and Software

Upload: bernard

Post on 05-Apr-2018


TRANSCRIPT

  • 7/31/2019 Gis Data Capture Hardware and Software

    1/61

GIS DATA CAPTURE HARDWARE AND SOFTWARE


    INTRODUCTION

Progress in the commercial application of GIS technology is in practice more likely to be limited in the foreseeable future by the rate at which GIS databases can be populated rather than by shortcomings in the applications software. It is now widely accepted in the literature, and has been apparent for some time to the practitioners, that the cost of data collection (or data capture) is by far the dominant component of overall GIS project costs. For example, Prof. G. Konecny in his Keynote Paper to the fourth European AM/FM Conference in 1988 (Konecny 1988) analysed a range of mature Land Information System projects and concluded that acquisition of the data for the database constituted the single largest expenditure element, between 38 and 84 per cent of total cost. The larger the project, the less the hardware and software costs mattered.



Certain GIS applications can be supported entirely by data in raster form, but for many GIS purposes data have to be available in the feature-coded, vector form. Increasingly there is a requirement for structured data, either in a link-and-node form or in some object-oriented form. In practice, with the growth of hybrid raster/vector GIS capabilities, both forms of data are required. This presentation describes hardware and software techniques for raster and vector data capture, and the associated issues of data structuring and attribute tagging.


The data capture process can be split into two different operations:

• Primary data collection, for example from aerial photography or from remotely sensed imagery.

• Secondary data collection, for example from conventional cartographic sources.


Once the data are integrated into the database for analysis many issues are raised, for example:

• What was the source of information for the map and what are the characteristics of this source? What was the inherent precision of the source materials?

• What interpretation was applied in the mapping process?

• Were there multiple sources?

• Is the categorization of data defined, for example what constitutes the difference between urban open space on the periphery of an urban area and non-urban land use?

• Is the categorization applicable to the current GIS application?


These and further aspects of secondary data have given many scientists and end application users an instinctive preference towards primary data capture, where greater control and specificity can be applied. While tailored survey is possible in some instances, it is frequently ruled out on the grounds of cost and the elapsed time necessary to undertake the work. This has particularly been the case where comprehensive, large area coverage is required of topography and recourse is made to national generalized map series information. In practice both forms of data are essential to GIS and for practical reasons secondary data dominate. Both approaches are discussed below.


    PRIMARY DATA CAPTURE

The introduction of remote sensing, particularly from the early 1970s with the land resource satellites, produced a climate of expectation that generalized access to primary data sources would be available together with automated techniques of land use and feature extraction. Unduly optimistic attempts at totally automated classification of imagery led to a period of scientifically interesting, but ultimately unsuccessful, research into ever more sophisticated attempts to use remotely sensed data in an isolated image processing environment.


The early approaches to using digital remotely sensed data were preoccupied by the issues of handling and real-time processing of large raster data sets and the desirable full colour display capability needed for visualization. These systems employed dedicated and often specialized pipeline or parallel processing hardware. The software was focused on raster data manipulations and classification of the image data. The image processing environments which emerged were very different from the typically vector graphics-based cartographic and mapping systems.


The evolution of hardware and software for primary data collection has, however, been more rapid in the late 1980s. Direct surveying techniques employing in-the-field digital recording, GPS technology for precision positioning and vehicle location tracking are becoming routine tools. The use of remotely sensed data is gradually offering the theoretical benefits identified in the early 1970s. These benefits are only being achieved, however, by introducing a radical change of thinking in the user community.


The new hardware and software resulting from this trend reflect the need for an open architecture and a powerful integrated data processing environment with good visualization. This is met in hardware terms by the current and emerging generation of desktop workstations. These offer an easily networkable environment for local or wide area processing, yet provide locally to a single user a dedicated fast processor (e.g. more than 20 MIPS), large memory (e.g. more than 24 Mb) and substantial disk storage capacity (e.g. over 1 Gb) in a desktop unit.


The corresponding evolution of software has required a much greater consideration to be given to the overall design of GIS. This has been necessary to ensure access to multiple data structures, to allow greater attention to be given to quality assurance and error train analysis, and to provide a user interface that gives a consistent view of all data and facilities available rather than one specific to a single type of data. Much has been achieved in the interim by developing more efficient links between remote sensing packages and mapping systems, as in the ARC/INFO-ERDAS 'live-link'. This approach is now being replaced, however, by new 'integrated GIS' packages which incorporate the fundamental redesign necessary.



Conceptual design of an Integrated GIS.


    SECONDARY DATA CAPTURE

Despite the perceived benefits of primary data, for the immediate future the largest source for GIS data will continue to be existing maps. In some countries, the scale of map digitizing programmes is considerable, with near term targets of substantial or complete coverage at certain scales.



Factors which affect the capture techniques that can be used include the following:

• Maps are accurate and to scale. The scanner or digitizer, and the processing algorithms, must deliver a high and consistent planimetric accuracy.

• Maps are high resolution and contain fine detail. Line weights of 0.004 in (0.1 mm) are commonplace.

• Maps contain a wide variety of symbolization and linestyle, and different map series differ widely in these respects.

• Maps, in particular small-scale maps, are multicoloured documents.

• Map sheets represent parts of a large continuum and edge matching is generally required.

• Maps are multi-purpose documents and, in consequence, map data formats and quality standards have to support a range of users and applications. The possibilities for redesign of maps in order to improve data capture have been discussed at length by Shiryaev (1987), but generally the needs of data capture have had little impact on map design.

• Maps, like all other documents, come in variable qualities. The paper map is subject to substantial distortion when folded, or when affected by varying humidity.


In addition to the definition of the format and structure of the digital data required, the following factors have to be defined to arrive at an adequate specification of the data capture task:

• Accuracy. A traditional specification is to require that the digital data represent the source to within one line width (or a half line width). It is important that any automated conversion process provides an inbuilt accuracy check to the required tolerance.

• Representation. Data volumes should be minimized, for example a rectangle should be represented by four points. Different classes of features require different representations.

• Abstraction. For some classes of features the task is not to reproduce the geometry on the map, but an abstraction, for example cased roads by centre lines, broken lines by coordinate strings with appropriate codes, point symbols by coordinate pairs with symbol codes, text by ASCII codes.

• Selection/completeness. In many cases, not all the information recorded on the source map is required in the GIS database. None of the required data may be omitted or repeated.

The specification is not complete until the external Quality Assurance procedure is also defined.


HARDWARE

Manual digitizers

The most commonly used equipment for graphical data capture is the manual digitizing table or tablet. The key elements of the technology are the digitizing surface and the puck or cursor used by the operator to record coordinates. The whole may be regarded conceptually as a 'reverse drawing board'. The surface may be translucent, providing a backlit capability which is very useful for film, as opposed to paper, source documents. In the most widely used technologies, the surface contains a precise grid of current-carrying fine wires. The precision of this grid determines the basic accuracy of the table. Accuracy specifications are typically expressed as root-mean-square (RMS) deviations from a true and square grid over the active area of the table. The cursor used by the operator to record coordinate measurements consists of a target (usually cross-hairs), possibly viewed under magnification, embedded in a conveniently held puck. This incorporates buttons used to trigger coordinate measurement and to communicate with the controlling software. Sometimes 'stream digitizing' is used to control the frequency of coordinate measurement, on a time or distance basis. Typically the electrical interface to the receiving computer system is a serial line and the data format is coordinate strings in ASCII format.
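The distance-based form of stream digitizing mentioned above can be sketched as a simple filter over the raw cursor stream. This is an illustration of the principle only; the function name and threshold are not taken from any particular table's firmware:

```python
import math

def distance_stream_filter(points, min_dist):
    """Record a coordinate only when the cursor has moved at least
    min_dist table units since the last recorded point -- the
    distance-based form of 'stream digitizing'."""
    recorded = []
    for x, y in points:
        if not recorded or math.hypot(x - recorded[-1][0],
                                      y - recorded[-1][1]) >= min_dist:
            recorded.append((x, y))
    return recorded
```

A time-based mode would apply the same idea against a clock rather than a distance threshold.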

Operator fatigue is the major issue in manual digitizing, and table ergonomics are of crucial importance. A good modern design such as the Altek table illustrated in Plate 17.1 incorporates fully adjustable height and tilt, variable backlighting and a lightweight cursor with four control buttons and 16 feature coding buttons. Accuracies typically range from 0.003 inch (0.075 mm) to 0.010 inch (0.25 mm). Digitizing tablets are similar in concept to digitizing tables, offering reduced accuracy at lower cost.


HARDWARE

Scanners

A scanner is a piece of hardware for converting an analogue source document into digital raster form. The most commonly encountered scanner in everyday life is the FAX machine. The key characteristics of a scanner reflect the documents it can handle (size, accuracy and speed) and the nature of the data it produces (resolution, greyscale and colour). All scanning involves systematic sampling of the source document, by either transmitted or reflected light. The fundamental technology used in many scanners is the charge-coupled device (CCD) array. CCD arrays are available as one- or two-dimensional regular rectangular structures of light-sensitive elements. Two-dimensional arrays are not at present economically available at resolutions useful for map source documents. A single linear CCD array typically has a resolution of 5000 elements. The key decision in utilizing this in a scanner design is whether to move the document or the scanning element. A low cost arrangement involves scanning a single linear CCD array over a magnified image of the source document - the so-called 'digital camera'. These are extremely rapid in operation (a whole image in a matter of seconds) but are currently restricted in resolution to about 5000 by 6000 elements.

For most GIS applications, a larger information content (resolution) is required than can be provided by digital cameras, and scanners based on multiple linear CCD arrays are necessary. A commonly used arrangement is the 'continuous feed' scanner illustrated in Plate 17.2, in which the document is passed rapidly by a set of, say, five or ten concatenated linear arrays, or CCD cameras. The key elements are the pinch roller document handler (accuracy in the direction of document motion is determined entirely by this component) and the alignment optics (see Plate 17.3). The hardware design must manage the overlap between the individual CCD cameras in a mechanically stable manner, in addition to compensating for differences in sensitivity between them. Continuous feed scanners provide high throughput rates [an A0 sheet at a resolution of 500 dots per inch (dpi) or 20 dots per millimetre in a few minutes] at reasonable cost, with accuracies of the order of 0.02 inches (0.5 mm). Input widths up to 60 inches (150 cm) are available and document length is theoretically unlimited. Documents can be paper, film, vellum, sepia, linen or cardboard.


HARDWARE

Workstations and data compaction

The pace of advances in workstation technology is such that the availability of computing power is rapidly ceasing to be a limiting factor in GIS data capture applications, particularly in distributed systems using local area network (LAN) or cluster technology. Standard platforms with open system architectures and windowing environments need little or no augmentation. Special purpose parallel hardware may have some place, but it is at least arguable that the same results will be achieved by tomorrow's conventional processors. Despite the advances in optical storage technology, the widespread use of raster data is likely to pose a continuing requirement for data compaction implemented in hardware or by software. Since this requirement is particularly associated with GIS applications, it is appropriate to outline some of the principles involved.



The simplest forms of raster data compaction use run length encoding (RLE), based on the observation that it takes fewer bits to say '123 blank pixels' - namely 7 bits for 123 and 1 control bit - than 123 bits each zero (see discussion in Egenhofer and Herring 1991 in this volume and Blakemore 1991 in this volume). PackBits (Aldus Corporation 1988) is a byte-oriented run length scheme modified to handle literal data for areas that do not compress well.
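The run-length idea can be illustrated with a minimal, fully reversible encoder and decoder for one scan line. This is a sketch of the principle only, not of PackBits or of any particular hardware format:

```python
def rle_encode(pixels):
    """Collapse a scan line into (value, run_length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((p, 1))               # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original pixels."""
    return [value for value, length in runs for _ in range(length)]
```

A line of 123 blank pixels followed by 5 set pixels thus collapses from 128 samples to two pairs.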

Where space efficiency is paramount, the CCITT-3 and -4 standards established by the International Telecommunications Union (ITU) for facsimile transmission have become de facto standards (CCITT 1985). Both 1-D (modified Huffman RLE) and 2-D forms exist. The main characteristic of the 2-D formats is that each scan line is described in terms of changes from the previous scan line. Hardware for compression and decompression of large raster data sets is not yet readily available, but tiling techniques can be used to overcome this. LZW (Lempel-Ziv and Welch) is an encoding scheme (Welch 1984) which can handle all kinds of data from binary to full RGB colour (colour defined by its red, green and blue components) at good compression factors, while being fully reversible. Originally designed to be implemented in hardware, it has proved challenging to implement efficiently (Welch 1984).

All the above data compaction schemes are encompassed in TIFF - the Tag Image File Format devised by Aldus/Microsoft for transfer of image data between desktop publishing systems (Aldus Corporation 1988). This is now seeing increased use for map image data and is becoming a de facto standard for scanner output formats. Extensions are also under way to encompass tiled raster images. TIFF defines a standard housekeeping header around the various encodings. It is an example of a standard arising in the wider information technology context, but having relevance to geographical information systems.
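The 'housekeeping header' that TIFF wraps around the various encodings is a fixed eight bytes: a byte-order mark ('II' for little-endian, 'MM' for big-endian), the magic number 42, and the offset of the first Image File Directory (IFD). A minimal sketch of reading it, without touching the tag encodings themselves:

```python
import struct

def read_tiff_header(data):
    """Parse the 8-byte TIFF header; return (byte_order, first_ifd_offset)."""
    order = data[:2]
    if order not in (b"II", b"MM"):
        raise ValueError("not a TIFF file")
    fmt = "<" if order == b"II" else ">"          # II = Intel, MM = Motorola
    magic, ifd_offset = struct.unpack(fmt + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return order.decode("ascii"), ifd_offset
```

Everything beyond these eight bytes (the IFD entries describing compression, tiling, and so on) is carried in the tagged structure the header points to.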


SOFTWARE

Manual data capture

The reader will have already observed incursions across the hardware/software divide, if such can be said to exist. Software for manual data capture using digitizing tables is sufficiently well established not to need detailed description and in any case it is reviewed in Rhind (1974), Marble, Lauzon and McGranaghan (1984) and Yeung and Lo (1985). Efficiency is determined at least as much by operator procedures and flowline design as by software functionality. A macro command language is highly desirable to enable flowlines to be efficiently tailored, and most systems now incorporate on-line digitizing with immediate graphical feedback, including colour display of feature coding. Some protagonists still argue, however, that off-line digitizer operation is more efficient, because of the constant operator distraction caused by viewing the graphics display. Despite the use of pop-up or pull-down menus, complex feature coding schemata are an unavoidable burden on the operator. Some users have reported success in using voice input to alleviate the feature coding burden, but this technique is by no means well established. Further improvements in the cost of voice recognition technology remain to be exploited.


Overlay digitizing

The widespread availability of hybrid vector/raster GIS software, or at least of vector editing/drafting software supporting raster image data as backdrop, has led to new methods of manual data capture.

Such systems were originally developed as 'interim solutions' which allowed many GIS applications, for which map data are required only as a passive background frame of reference, to proceed in the absence of vector map data. They depend on establishing a means of registration between the vector data and the raster image, and on providing fast zoom and pan capabilities. These capabilities also provide the means of 'heads-up' or screen digitizing from raster images of map sources. Vector data, created either by manual point input using a screen cursor or by use of higher level drafting functions, are immediately displayed, superimposed on the raster source image (Plate 17.6). Accuracy is still dependent on manual positioning, augmented, albeit clumsily, by the ability to work at high magnification. The content of the available display window is also a significant limitation. Nevertheless, many protagonists have reported significant gains over the use of digitizing tables, particularly for large-scale maps and plans. As raster storage of source documents becomes more the norm, the small footprint and other advantages of this technique make it increasingly attractive.

It is worthy of note that if greyscale backgrounds are supported, the technique can be applied to the creation of vector data from remote sensing images or scanned aerial photographs. Also, if interactive thresholding of the greyscale background is available, useful data can be captured from poor quality source documents. A recent advance has been the use of 'raster-snapping' to improve accuracy.


Interactive automatic systems

An important alternative to fully automatic raster-to-vector conversion techniques is exemplified by the Laser-Scan VTRAK system (Waters 1989) and the Hitachi CAD-Core Tracer system (Sakashita and Tanaka 1989). These systems involve the extraction of vector data from the raster source on a feature-by-feature basis, with real-time display of the results to operators, who control the overall sequence of data capture, provide the interpretation necessary for feature coding prior to feature capture, and intervene in the case of ambiguity or error. This approach also provides for selective data capture in the frequently occurring case where only some of the features present in the source documents are required in the GIS database.

Coding of features prior to capture provides an invaluable aid to automatic feature extraction in that the extraction algorithm used can be matched to the class of feature. In an ideal system, feature recognition would be automatic, but in practice when working with cartographic sources this goal is rarely achievable.


Since coding has to be done at some stage it is a system advantage to do it early, so that the appropriate automatic feature extraction algorithms can be invoked, and the appropriate data representations created. Thus, using the VTRAK system as an example, for contours and other curvilinear features, a centre-line extraction algorithm and a data point reduction algorithm (based on the Douglas-Peucker algorithm described below) which preserves shape to within prescribed tolerances is appropriate (Plate 17.7). Rectilinear features on the other hand require vertex extraction algorithms and, in the case of buildings, optionally a squaring algorithm (Plate 17.8). Broken lines, the edges of solid areas and the centre lines of cased roads can all be followed and the appropriate vector representation produced. The data produced can be either vector spaghetti or, if junction recognition is invoked, a link-and-node structure. Thus in Plate 17.9 a network of road centre lines is being created from cased roads. Nodes and intermediate data points are differentiated in the data (by colour on the screen). In this mode, nodes and links are measured once only, are given unique values and a topological structure is created for further processing. Symbol measurement is also provided, for example for buildings and cadastral symbols.
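The link-and-node idea (nodes with unique values, links carrying the topology) can be sketched minimally. All names here are illustrative, not taken from VTRAK; the sketch assumes junctions have already been recognized, so that each polyline runs from node to node:

```python
def build_link_and_node(polylines):
    """Assign a unique node id to every polyline endpoint and emit links
    referencing their start/end nodes -- a minimal link-and-node structure."""
    node_ids = {}

    def node(pt):
        # Measure each node once only, giving it a unique id.
        if pt not in node_ids:
            node_ids[pt] = len(node_ids)
        return node_ids[pt]

    links = [{"id": i,
              "from": node(line[0]),
              "to": node(line[-1]),
              "points": line}
             for i, line in enumerate(polylines)]
    return node_ids, links
```

Because shared endpoints map to the same node id, downstream processing (polygon formation, network tracing) can follow the topology without geometric matching.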

The key elements of such systems are: the ability to zoom and pan rapidly across the combined raster source and vector overlay; appropriate local feature extraction algorithms using all the available raster information; and a 'paintout' facility as a visible and logical progress check. As features are captured, their representation in the raster source is changed, so that they are displayed to the operator in a different colour (as 'done'), and so that they are no longer 'visible' to the feature extraction algorithms. This avoids duplication, and also simplifies the data capture task as the whole process is subtractive. In cases where the source document is of variable quality, the source raster image can be held as greyscale. This increases the size of working files (e.g. by a factor of four). However, the ability to vary the threshold according to the context is very powerful and enables clean vector data to be produced from unpromising material (Plate 17.10).


… and for polygon networks, particularly as there is provision for indicating 'no-go' areas. The interactive automatic system software can be installed on a standard workstation platform, together with editing and post-processing software. A typical flowline is a combination of autopass, interactive feature extraction and overlay editing. At all stages there is a continuous visual assessment of the resulting vector data against the raster source, building in data quality checks as the data are created.


An interesting alternative technique for the creation of structured and attributed vector data is exemplified by the SysScan GEOREC system (Egger 1990). In this, the start point is the set of vectors created by an automatic raster-to-vector conversion process. Features are recognized and extracted from this set of vectors by the application of a 'production line' which can utilize combinations of more than 150 algorithms held in a 'method bank'. Algorithms include vector geometry enhancements, methods which handle neighbourhood relationships, a statistical recognition package for text and methods for replacing vectorized geometry with symbol references. Geometrical elements are classed as nodes, symbols, lines, areas and texts. Topological information between these elements is maintained via a set of suitable forward and backward pointers, and groups of geometrical elements which form a logical entity can be combined in sets. Recognition and structuring proceed by sequences of operations under the generic descriptions of 'select', 'grow' and 'apply'. A 'production line' is usually set up interactively, but is controlled by a programming language (GPL), so that once the control structures have been created for the classes of features in a given map series, the whole process of feature extraction can be invoked automatically, with only exceptions needing subsequent manual editing. Knowledge and experience of manual digitizing flowlines are invaluable in the development of GPL programs. Good quality, cost effective results are reported in some instances from good quality, well-behaved source maps.


Data capture and processing algorithms

The resolution of the source raster image must be adequate for the geometry to be accurately extracted by the vectorization algorithms. Typically, there need to be at least 2-3 pixels in the finest lines in order to establish a cartographically acceptable vector representation. On the other hand it is important that the vector representation contains an optimal number of points, approximately the same as would result from an experienced manual digitizer. Superfluous points will clutter up GIS databases for a long time! Data point reduction is therefore an important requirement, and the Douglas-Peucker algorithm (Douglas and Peucker 1973), originally devised in the context of cartographic generalization, is widely used for this purpose. The principle is illustrated in the next figure.



The Douglas-Peucker algorithm to reduce the number of data points required to represent curvilinear features.


The following description is adapted for thinning data points from a dense set representing a line, as they emerge from a line-following algorithm applied to a raster source. The first point on the line is used as an 'anchor' point and the last point of the line segment currently under consideration as a 'floater'. The point with the greatest perpendicular distance from the line joining the anchor to the floater is examined. If this distance is less than the prescribed tolerance, the next point along the line as extracted from the raster image becomes the floater and the process is repeated. If the tolerance is exceeded, the point preceding the floater is passed through to the vector representation and becomes the new anchor point, and the whole process recommences. Intermediate points from the raster image are discarded. If the tolerance is chosen to be half the line width, say, an acceptable representation of the shape is obtained with an optimal number of data points.
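The anchor/floater procedure above can be sketched directly; this is one reading of the description, with illustrative names, and the tolerance is in the same units as the coordinates:

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def thin_line(points, tolerance):
    """Anchor/floater data point reduction as described in the text."""
    if len(points) <= 2:
        return list(points)
    out = [points[0]]                       # the first point is the anchor
    anchor, floater = 0, 2
    while floater < len(points):
        worst = max(perpendicular_distance(points[k],
                                           points[anchor], points[floater])
                    for k in range(anchor + 1, floater))
        if worst < tolerance:
            floater += 1                    # within tolerance: extend the segment
        else:
            anchor = floater - 1            # point preceding the floater becomes
            out.append(points[anchor])      # the new anchor, passed to the output
            floater = anchor + 2
    out.append(points[-1])
    return out
```

Intermediate raster points are simply never copied to the output, matching the 'discarded' behaviour in the description.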

Vertex extraction algorithms hinge on the recognition of changes of direction, and on the fitting of straight line segments to the data points on either side of the putative vertex. Special cases arise when vertices are separated by distances comparable to the line thickness. Squaring algorithms abound, differing in the sophistication of the control parameters they provide.


Software for dealing with source document distortion and with changes of geographical projection (Snyder 1987; see also Maling 1991 in this volume) is well established. It is good practice to ensure before any correction is applied that the vector data are totally congruent with the raster source, except where discrepancies are deliberate. Such checking can be performed on screen, by vector-on-raster overlay, or by the traditional checkplot. Such quality assurance procedures are treated in more detail in the next section. Distortion is typically removed by least-squares fitting to an appropriate number of control points, over and above the corner points required to register the coordinate system. In some instances it may be appropriate to use any orthophoto sources to improve or correct the control on the cartographic sources. Coordinate transformations for projection changes can then be applied before output in the required GIS data format.
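Least-squares fitting to control points can be illustrated with the simplest useful model, an affine transform solved via the normal equations. This is a sketch under that assumption; production flowlines may use higher-order polynomial or rubber-sheeting models instead:

```python
def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_affine(src, dst):
    """Least-squares affine transform mapping src control points to dst:
    x' = a*x + b*y + c,  y' = d*x + e*y + f.
    Needs at least three non-collinear control points."""
    rows = [[x, y, 1.0] for x, y in src]

    def lsq(targets):
        # Normal equations: (A^T A) p = A^T t
        AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
               for i in range(3)]
        Atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
        return solve(AtA, Atb)

    a, b, c = lsq([p[0] for p in dst])
    d, e, f = lsq([p[1] for p in dst])
    return a, b, c, d, e, f
```

With more control points than unknowns, the fit averages out random measurement error, which is exactly why control beyond the four corner points is wanted.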


The problem of creating a seamless continuous map from a set of not necessarily homogeneous map sheets is peculiar to GIS applications. Quasi-automatic edge-matching software is available, but in practice the prevalence of anomalies can dictate a considerable human input to the process if fully edge-matched data are a requirement. Techniques for organizing sheet-based source data to present effectively continuous or 'seamless' cover are now well established. The practical problems arise from source data overlap or inconsistency (see Fisher 1991 in this volume).

The automatic creation of link-and-node structured data greatly facilitates the creation of correct polygon or parcel data, although software is also available to create such data from unstructured 'spaghetti'. Again the issue is considerably complicated by issues of matching across sheet boundaries.


    Quality assurance

The key to the reduction of the burden of data capture costs on a project is data sharing. The most important aspect of data sharing is validation that the data are of a quality acceptable to the needs of a wide community of users. Validation needs to be based on objective tests that can be externally applied however the digital data are captured. Considerable effort by interested parties in the United Kingdom led to the establishment of agreed criteria between the Ordnance Survey and the National Joint Utilities Group (NJUG 1988). Although these criteria are drawn up in terms of large scale (1:1250 and 1:2500) plans, the principles are of general applicability. Eight tests are applied:


    1. Data format - readability
    2. Number of data points - no more than 25 per cent excess
    3. Coding accuracy - colour code, visual check
    4. Positional accuracy - remeasure random samples; mean and standard deviation criteria
    5. Squareness of buildings - tested on a sample basis
    6. Line junction fitting - by visual inspection on a workstation screen
    7. Text - by random sampling in each category
    8. Completeness - using an overlaid check plot
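    Test 4 lends itself to simple automation. The following is a minimal sketch, not part of the NJUG specification itself: it compares a random sample of digitized well-defined points against remeasured survey values and checks the mean and standard deviation of the horizontal displacements against thresholds. The function names and thresholds are illustrative assumptions.

```python
import math

def positional_accuracy(pairs):
    """Given (digitized, surveyed) coordinate pairs for a random sample
    of well-defined points, return the mean and standard deviation of
    the horizontal displacements between them, in map units."""
    errors = [math.hypot(dx - sx, dy - sy)
              for (dx, dy), (sx, sy) in pairs]
    n = len(errors)
    mean = sum(errors) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)
    return mean, std

def passes_positional_test(pairs, max_mean, max_std):
    """Hypothetical acceptance rule: both statistics must fall below
    thresholds taken from the data capture specification."""
    mean, std = positional_accuracy(pairs)
    return mean <= max_mean and std <= max_std
```

    In practice the acceptance thresholds would come from the specification being applied, not from the software.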


    Given a specification and an appropriate validation procedure, how are valid data to be captured in a cost-effective and timely manner? Whether manual or automatic techniques are used, checks and feedback mechanisms must be built in throughout the process, as it is not sufficient to check quality only at the end. Automatic techniques, properly applied and controlled, can produce consistent and reliable data quality much more rapidly and cheaply than manual techniques, but flowlines must be designed so that automatic processes fail safe rather than producing copious errors that are then expensive to correct. Whenever possible, structure inherent in the data (e.g. a link and node structure) should be used to ensure that data are correct.
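    As an illustration of using inherent structure as a check, a link-and-node dataset can be validated by confirming that every link references existing nodes and that its end vertices coincide with the node coordinates within tolerance. The data model below is a hypothetical minimal one, sketched for illustration only:

```python
import math

def validate_links(nodes, links, tol=0.01):
    """Check a link-and-node structure for internal consistency.

    nodes: {node_id: (x, y)}
    links: {link_id: (start_node_id, end_node_id, [(x, y), ...])}
    Returns a list of error messages; an empty list means the
    structure passed the check.
    """
    errors = []
    for link_id, (start_id, end_id, vertices) in links.items():
        for which, node_id, vertex in (("start", start_id, vertices[0]),
                                       ("end", end_id, vertices[-1])):
            if node_id not in nodes:
                errors.append(f"link {link_id}: unknown {which} node {node_id}")
                continue
            nx, ny = nodes[node_id]
            if math.hypot(vertex[0] - nx, vertex[1] - ny) > tol:
                errors.append(f"link {link_id}: {which} vertex does not "
                              f"coincide with node {node_id}")
    return errors
```

    Run routinely inside the capture flowline, a check of this kind catches dangling links and snapping failures long before final inspection.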


    SPATIAL DATA SOURCES AND DATA PROBLEMS


    INTRODUCTION

    Many of the data that are incorporated into GIS are initially in analogue form, most commonly as hardcopy maps. To be used in GIS, however, map data must undergo a conversion process known as digitizing, a labour-intensive task which is time consuming and prone to error. Fortunately, increasing amounts of data are now obtainable directly in digital form.


    ANALOGUE DATA SOURCES

    The most important source of analogue spatial data is the map. Since prehistory, maps have been produced with the specific purpose of recording the spatial relationships observed and measured by the map's compiler. Maps are used to convey spatial information.

    Many of the problems encountered in developing geographical databases from maps have not been hindrances to the use of maps in the past, because conventional uses place less stringent demands on the analogue medium. But users of digital geographical databases are often unaware of the limitations of conventional maps and consequently may make unreasonable or inappropriate assumptions about the data derived from them. The following discussion of the potential limitations of analogue maps is based mainly on Rhind and Clark (1988).


    Map scale

    Scale determines the smallest area that can be drawn and recognized on a paper map (Table 13.1).


    On a topographic map at a scale of 1:50 000, it is not possible to represent accurately any object of dimensions less than one line width, or less than about 25 m across. However, small features can be important, so cartographers have devised methods for selecting and symbolizing small but significant features, even though their physical dimensions on the ground may be less than one line width. Thus many roads and rivers which are less than 25 m across are nevertheless shown on 1:50 000 maps. Scale may determine which rivers are shown in a drainage network or which roads in a road network. Similarly, scale may determine whether the various features in a class, such as roads, are shown as a single feature class or differentiated (e.g. highway, motorway, main road, minor road, etc.).
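    The 25 m figure follows directly from the scale arithmetic: a plotted line corresponds on the ground to its width multiplied by the scale denominator. A one-line sketch, where the 0.5 mm line width is an illustrative assumption rather than a universal standard:

```python
def min_ground_dimension(scale_denominator, line_width_mm=0.5):
    """Smallest ground dimension, in metres, that a single plotted line
    of the given width can represent at the given map scale.
    (The 0.5 mm default line width is an illustrative assumption.)"""
    return line_width_mm / 1000.0 * scale_denominator
```

    So min_ground_dimension(50000) gives the 25 m quoted above, while at 1:24 000 the same line width corresponds to 12 m on the ground.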


    By contrast, a digital database appears, initially, to be independent of scale because it may be portrayed at any scale. If the data were originally collected from a map or maps, then the map scale is important because it determines the size of the minimum mapping area (Table 13.1) and the material included and excluded. As a piece of information attached to the digital data, however, scale is only an identifier of the original map series. In the database, it is more appropriate to identify the map series exactly, and then give the accuracy of the database as a representation of the map. Indeed, this is exactly the approach used by various agencies in producing digital databases for general use (USGS 1987; SCS 1984b). Scale is all too often misused as a measure of accuracy.


    Map audience

    Assumptions about the map's audience determine the intensity of information included, and the need for additional reference material. A map designed for a technical audience will probably have a higher information density compared to one designed for the public or one designed for a 'wide user community'. Those in the latter category may contain more by way of contextual information such as roads, buildings and towns, at the expense of accurate representation of feature position. The cartographer must juggle the conflicting needs of audience and scale. Similarly, the compiler of a database formed by digitizing maps may need to consider the purposes for which those maps were created.


    Currency

    A map is a representation of features in space as they existed at the time they were surveyed. The real world of geographical information changes continuously, but many maps remain static. Thus maps become increasingly inaccurate as a representation of the world over time. The long delay between mapping and publishing often means that most maps are not true records of spatial relations when they are used. Most human users expect this, and compensate for it, although many road-map users still may not understand exactly why some roads are not shown on their maps. Map sheets are revised periodically, of course, and all national mapping agencies maintain a revision programme, but features continue to change.


    Fig. 13.2 The purpose for which a map is designed affects map content and the precision of that content. Here the actual line plan is compared with the same area, at the same scale, in a street atlas. Many roads have exaggerated widths, to accommodate road names and enhance visibility, while many building and block outlines are simplified (Source: Keates 1989).


    Vegetation and land use maps require constant revision. Although soils and geology are less subject to change, even these classes of maps must be updated regularly to accommodate new field work and general improvements in the level of human understanding of soils and geology.


    Map coverage

    The actual geographical area for which a geographical database might be constructed is variable, from less than one to thousands of square kilometres. Therefore, the source-map coverage must be chosen as appropriate to the task in hand. Scales and completeness of map coverage of different geographical areas are, however, highly variable. This point can be illustrated by reference to mapping in two of the world's most advanced countries: the United States and Britain. In the United States the most detailed complete coverage scale of topographic maps is 1:24 000, whereas in Britain it is a combination of 1:1250, 1:2500 and 1:10 000.


    Map accuracy

    As noted above, maps are an abstraction of reality, and so map makers have been concerned to give concise statements of the accuracy of their products. The US National Map Accuracy Standard, issued by the Bureau of the Budget in 1947 and still in force, is perhaps the best known example of these (see Thompson 1988). Some of the major points included are summarized in Table 13.3, and the standard has recently been revised by a committee of the American Society of Photogrammetry and Remote Sensing (Merchant 1987) which specifies acceptable root-mean-square error terms for horizontal locations on various maps (Table 13.4).

    Table 13.3 Summary of important parts of the US National Map Accuracy Standard (US Bureau of the Budget)

    On scales smaller than 1:20 000, not more than 10 per cent of points tested should be more than 1/50 inch in horizontal error, where points refer only to points which can be well defined on the ground.


    On maps with scales larger than 1:20 000 the corresponding error term is 1/30 inch.

    At no more than 10 per cent of the elevations tested will contours be in error by more than one half the contour interval.

    Accuracy should be tested by comparison of actual map data with survey data of higher accuracy (not necessarily with ground truth).

    If maps have been tested and do meet these standards, a statement should be made to that effect in the legend.

    Maps that have been tested but fail to meet the requirements should omit all mention of the standards on the legend.
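    The inch-fraction tolerances translate into ground distance by multiplying by the scale denominator. A small sketch of the conversion; the handling of the 1:20 000 boundary case is an assumption here, not a quotation from the standard:

```python
def nmas_horizontal_tolerance_m(scale_denominator):
    """Ground-distance equivalent, in metres, of the US National Map
    Accuracy Standard horizontal tolerance: 1/30 inch on maps at
    scales larger than 1:20 000, 1/50 inch otherwise.  (Treating
    1:20 000 itself under the 1/50 inch rule is an assumption.)"""
    inch_in_metres = 0.0254
    if scale_denominator < 20000:
        map_error = inch_in_metres / 30
    else:
        map_error = inch_in_metres / 50
    return map_error * scale_denominator
```

    At 1:24 000, the standard US quadrangle scale, this works out to about 12.2 m on the ground.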


    Table 13.4 Planimetric coordinate accuracy requirement of well-defined points


    Map sheets and series

    The traditional paper map series is sometimes designed, drafted and published as a collection of individual map sheets, because each separate map sheet or quadrangle is intended to stand alone as a single entity. This gives the individual paper map an internal coherence and a pleasing appearance. If the reader is interested in an area beyond that covered by the current map, it must be filed and another extracted from the library. There is no guarantee of conformity across the seam of the maps, however. Many researchers and other users have found to their cost that edge-matching between map sheets can be a major problem. In map series with overlap between contiguous sheets (e.g. 1:50 000 OS maps of Britain), large features in the zone of overlap may not conform between sheets. This can be even worse in the case of maps prepared on poorly rectified orthophotomaps, such as those included with county soil reports by the USDA SCS.


    Map users may also experience difficulties when they attempt to compare different map attributes because of variations in scale and projection. In Britain, for example, the use of 1:63 360 base maps for both soil and geology has allowed these attribute themes to be compared, but lack of conformity with the topographic map series of that scale has meant that comparison with topographic information is not necessarily reliable. In the United States, geology is mapped at the various scales of the standard topographic map series, but soils are commonly mapped by county on orthophotomaps at a scale of 1:15 840, almost precluding precise comparison with either topography or geology. The ability to analyse and overlay maps with different attribute data types is, however, integral to GIS. Software has been developed to force data into common scales and projections, either through mathematical transformations or 'rubber-sheet' approximations.
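    The simplest such mathematical transformation is an affine mapping, which accommodates translation, rotation, scaling and shear between two coordinate frames. The sketch below solves the six affine parameters exactly from three non-collinear control points identified on both maps, by Cramer's rule; a real rubber-sheeting package would instead use many control points with a least-squares or piecewise fit:

```python
def affine_from_control_points(src, dst):
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f exactly from three
    non-collinear control points, src[i] -> dst[i], by Cramer's rule.
    Returns a function mapping (x, y) to transformed coordinates."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)

    def solve(v1, v2, v3):
        # Cramer's rule for p*x + q*y + r = v at the three points.
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve(dst[0][0], dst[1][0], dst[2][0])
    d, e, f = solve(dst[0][1], dst[1][1], dst[2][1])
    return lambda x, y: (a * x + b * y + c, d * x + e * y + f)
```

    Fitted once from the control points, the returned function can then be applied to every coordinate in the source dataset.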


    ATTRIBUTE DATA

    Attribute data are complementary to location data and describe what is at a point, along a line, or within a polygon. All spatial features have some immediately associated attribute or attributes, such as building type, soil type, etc. Some, such as low-level census divisions, are no more than a code value to enable association with other attribute information.
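    The role of such a code value as a mere link can be illustrated with a toy join between spatial features and a separately held attribute table; all the geometries, codes and values below are invented for illustration:

```python
# Hypothetical features carrying only a geometry reference and a code.
features = [
    {"geometry": "polygon-1", "code": "ED04"},
    {"geometry": "polygon-2", "code": "ED07"},
]

# Hypothetical attribute table keyed by the same codes.
attribute_table = {
    "ED04": {"population": 412, "households": 170},
    "ED07": {"population": 938, "households": 355},
}

def tag_features(features, attribute_table):
    """Join each spatial feature to its attribute record via the code
    value it carries; features with unknown codes gain no attributes."""
    return [{**f, **attribute_table.get(f["code"], {})} for f in features]
```

    Keeping the attributes in their own table means they can be updated or extended without re-digitizing the geometry.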


    Socio-economic attributes

    Some of the most widely used sets of longitudinal (time series) attribute data are derived from national census offices. Census data are essential to planning by many government agencies. Most census offices prepare a number of different censuses (Bureau of the Census 1982, 1984a, 1984b), but the most common and important is the census of population. All population censuses collect a myriad of social variables wherever people occur and operatives can reach.

    Census results are reported at a number of spatial resolutions and using a variety of media. The results of the 1990 US Census are available in computer-readable form for all 7.5 million census blocks (basic enumeration areas), as well as in printed form for the sub-state local government units and for the smaller block numbering areas (Fulton and Ingold 1989). Data are available as printed tabulations and in computer-readable form on both tape and CD-ROM (Fulton and Ingold 1989). As in most census reporting, some data are published as absolute counts within geographical areas, while others are based on only a sample of households.


    Reported census data are, however, subject to numerous problems of accuracy and reliability. The foremost of these is that individuals are counted in the United States by their 'usual place of residence', which may be different from their legal residence, their voting residence or their domicile (Bureau of the Census 1982). Undercounting is a perennial problem, due to illiteracy, illegal immigration, homelessness and simple unwillingness to complete census returns despite legal inducements (Bureau of the Census 1982). Overcounting is also a problem. In some countries a more systematic bias may be introduced because census takers fail to penetrate particularly inaccessible regions, or have trouble counting all the inhabitants of a village. Furthermore, it is usual in a census to count people by their night-time location and, although location of workplace may be included in the census (Bureau of the Census 1982), only a poor representation of day-time and working-place population distributions may be recorded.


    In some cases the counts for certain geographical areas may be so small that statistical representation is uncertain (Kennedy 1989). This introduces a considerable problem of confidentiality, since if too few individuals are in a sample it may be possible to identify the individuals concerned. Indeed, in the United Kingdom the data are specifically modified by randomly assigning values of +1, 0 and -1 to low counts (Dewdney 1983), while in the United States low counts are simply suppressed (Bureau of the Census 1982).
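    The UK perturbation procedure can be sketched in a few lines. This is an illustration of the idea only: the threshold below which counts are perturbed is an assumption here, not the published rule.

```python
import random

def barnardise(counts, threshold=5, rng=None):
    """Perturb each count below the (assumed) threshold by randomly
    adding -1, 0 or +1, clamping at zero; larger counts pass through
    unchanged.  A sketch of the UK practice, not the published rule."""
    rng = rng or random.Random()
    return [c if c >= threshold else max(0, c + rng.choice((-1, 0, 1)))
            for c in counts]
```

    The perturbation preserves broad totals while making it unsafe to infer the presence of any particular individual from a low count.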