cartographic data structures

15
* Semiannual Journal of the American Congresson Surveyingand Mapping 430 Woodward Building, 733 15th Street, N.W., Washington, D.C. 20005 Cartographic Data Structures ABSTRACT. Efficient and flexible data structures are important to the development of computer mapping. Most current data banks are characterized by 1) structures which are convenient at the input stage rather than at the stages of use within computer programs, 2) separate and uncoordinated files for different types of geographic fea- tures, and 3) a lack of information about neighboring entities. The term "neighbor- hood function" may be used to indicate the relative location of a geographic entity and is a concept which is involved in all three of these characteristics. Ongoing re- search on data structures had led to work on the GEOGRAF system for encoding planar data and the GDS ("Geographic Data Structure") for encoding three-dimen- sional surfaces. Both involve data manipulation between the digitizing stage and the actual use of the data within computer mapping programs. INTRODUCTION A series of ongoing research projects are concerned with efficient and flexible data structures for geographic and carto- graphic analysis. The three main points of concern in the research can be sum- marized as follows: 1) In most cartographic data banks, the arrangement of the data is guided by the input stage. In other words, little ma- nipulation of the data is performed after the data have been input into the system from maps. 2) Cartographers and computer scien- tists have made few attempts to combine different types of cartographic information, for example, height with other cartographic features. Therefore, the different types of cartographic entities are stored in different files and it is usually extremely time-con- suming to combine them. 3) The data structure is usually very simple and lacks one facet in particular which is essential for much geographic and cartographic analysis-an indication of the relative location of a geographic entity, i.e., the position of a geographic entity with respect to its neighboring entities. These three points may be abbreviated with the terms flexibility, comparability, and topology. This paper will characterize types of existing geographic and carto- graphic data systems for planar and three- dimensional surfaces, especially with re- spect to these three points. The paper will also describe attempts which have been made by the authors to produce data sys- tems which eliminate some of the problems of existing ones. The term "neighborhood function" will playa major role throughout the paper and will therefore be explained in more detail in the following section. NEIGHBORHOOD FUNCTION When asked for the location of a city, we will· give the location with respect to a river, a seacoast, a pass, a neighboring larger city, or other feature. Rarely will Thomas K. Peucker is associate professor, hi' . Simon Fraser University, Burnaby, Canada. we use t e geograp llC coordmates of longi- Nicholas Chrisman is senior programmer, Lab- tude or latitude, nor will we use map co- oratory for Compu~erGraphicsand Spatial Anal- ordinates. Weare taught in elementary ysis, Harvard University, Cambridge,Massachu- geography that the geographic coordinates setts. The research for this paper has been will tell us little about either the large-scale supported by ONR contract NOOOl4-73-C-OI09 and the Laboratory for Computer Graphics and (site) or small-scale (situation) char- Spatial Analysis, Harvard University. acteristics of a place. Similarly, if I de- The American Cartographer, Vol. 2, No.1, 1975, pp. 55-69

Category:

Education


0 download

TRANSCRIPT

* Semiannual Journal of the American Congresson Surveyingand Mapping430 Woodward Building, 733 15th Street, N.W., Washington, D.C. 20005

Cartographic Data Structures

ABSTRACT. Efficient and flexible data structures are important to the development ofcomputer mapping. Most current data banks are characterized by 1) structures whichare convenient at the input stage rather than at the stages of use within computerprograms, 2) separate and uncoordinated files for different types of geographic fea-tures, and 3) a lack of information about neighboring entities. The term "neighbor-hood function" may be used to indicate the relative location of a geographic entityand is a concept which is involved in all three of these characteristics. Ongoing re-search on data structures had led to work on the GEOGRAF system for encodingplanar data and the GDS ("Geographic Data Structure") for encoding three-dimen-sional surfaces. Both involve data manipulation between the digitizing stage and theactual use of the data within computer mapping programs.

INTRODUCTIONA series of ongoing research projects

are concerned with efficient and flexibledata structures for geographic and carto-graphic analysis. The three main pointsof concern in the research can be sum-marized as follows:

1) In most cartographic data banks, thearrangement of the data is guided by theinput stage. In other words, little ma-nipulation of the data is performed afterthe data have been input into the systemfrom maps.

2) Cartographers and computer scien-tists have made few attempts to combinedifferent types of cartographic information,for example, height with other cartographicfeatures. Therefore, the different types ofcartographic entities are stored in differentfiles and it is usually extremely time-con-suming to combine them.

3) The data structure is usually verysimple and lacks one facet in particularwhich is essential for much geographic and

cartographic analysis-an indication of therelative location of a geographic entity,i.e., the position of a geographic entity withrespect to its neighboring entities.

These three points may be abbreviatedwith the terms flexibility, comparability,and topology. This paper will characterizetypes of existing geographic and carto-graphic data systems for planar and three-dimensional surfaces, especially with re-spect to these three points. The paper willalso describe attempts which have beenmade by the authors to produce data sys-tems which eliminate some of the problemsof existing ones. The term "neighborhoodfunction" will playa major role throughoutthe paper and will therefore be explained inmore detail in the following section.

NEIGHBORHOOD FUNCTIONWhen asked for the location of a city, we

will· give the location with respect to ariver, a seacoast, a pass, a neighboringlarger city, or other feature. Rarely will

Thomas K. Peucker is associate professor, hi' .Simon Fraser University, Burnaby, Canada. we use t e geograp llC coordmates of longi-Nicholas Chrisman is senior programmer, Lab- tude or latitude, nor will we use map co-oratory for Compu~erGraphics and Spatial Anal- ordinates. Weare taught in elementaryysis, Harvard University, Cambridge,Massachu- geography that the geographic coordinatessetts. The research for this paper has been will tell us little about either the large-scalesupported by ONR contract NOOOl4-73-C-OI09and the Laboratory for Computer Graphics and (site) or small-scale (situation) char-Spatial Analysis, Harvard University. acteristics of a place. Similarly, if I de-The American Cartographer, Vol. 2, No.1, 1975, pp. 55-69

scribe n'y position on a piece of terrain, Iwill not use my map to determine my lo-cation within the UTM-grid; rather I willlook for nearby relief features (peaks,rivers, slopes, roads) as orientation char-acteristics,

In contrast, when a geographic data bankof any kind is created, it utilizes some kindof absolute coordinate system. Usuallyneither the geographic evaluation at thetime of digiti7.in~ nor the mapping systemused with the data allow the inclusion ofsuch cartographic features as streets, rivers,and roads which would give us an indica-tion of relative location.

While the human user can be aided inhis orientation on a map through overlaysor man comparison. the computer has dif-ficultv in determininl! relative location. Ifthe relative location in terms of the closestpoints for each of, say, 5,000 points is tobe calculated. the program literally has tocompute every point's distance to everyother point. Indeed, some widely-usedprOl!rams do this computation several timeswithin one program run.

Some indication of the relative locationof a geographic feature can be very useful:This neighborhood relationship will be re-ferred to as a "neighborhood function."It can be expressed in different ways: asan explicit or implicit function, or as a dis-crete function in the form of a table.

The explicit function can be a polynomialor trigonometric equation set for a discretegrid of surface patches which give the formof the surface at each point within thepatch. Typical for this approach is thework of Junkins, et al. (1973). Two-dimensional spline functions also fall intothis category (Holroyd and Bhattacharyya.1970). A much more frequent way of de-fining a neighborhood is by the explicitfunction in the form of a sort routine whichfinds the closest neighbors. This is donein various interpolation algorithms to pro-duce a regular grid of points (Shepard,1968; Heiskanen and Moritz, 1967). Thecomputations increase close to the squareof the increase in the number of points,since the search has to be repeated forevery point and all points, or at least a

large number of them, must be processedeach time.

This search procedure also applies inthe case of planar surfaces where neigh-boring polygons must be found. For ex-ample, when contiguity constraints are im-posed in problems of factorial ecology andother regional correlations, all polygonpoints must be searched to find those whichare in common for a pair of polygons.Again, the problem increases in complexityaccording to the square of the number ofthe items being searched.

The implicit function expressing neigh-borhood relationships is usually a functionthat describes the coding structure of thegeographic entities. One very good caseis described in Rosenfeld (1969) for dif-ferent types of neighborhood relationshipswithin a regular grid in which the pointPi; has the four neighbors (i + 1, j), (i.i +I), (i-I, j), (i. j-I) and it has the eightneighbors (i + I, j), (i + I, j + I), (1. j + 1),(i-I. j + I), (i - I, j), (i - 1, j - I), (i, j -I ), (i + 1, j - 1) .

Neighborhood relationships in the formof tables are very rarely used. This typerecords the neighborhood function by"pointers" indicating neighboring geo-graphic entities. For example, a structurewhich is built on the basis of ThiessenPolygons could have such a structure bysimply having the labels of the neighboringpoints accompany the record of each point.The most widely-known structure of thistype is the DIME file of the U.S. Censuswhich encodes line segments, the names ofthe polygons to the left and right of eachline segment, and the names of the twonodes at either end. The neighborhoodrelationships used in the DIME develop-ment are derived from the discipline oftopology (Cooke and Maxfield, 1967) . .if.

Neighborhood relationships will be dis-cussed in greater detail in the analysis ofvarious existing data structures. Twotypes of geographic data bases win be dis-cussed: those defining planar surfaces andthose defining three-dimensional surfaces.For both types, a summary of their his-torical development will be presented, andit will be shown that, although presently

at different stages of development, thesetwo types can be treated as special casesof one topological data structure.

DATA STRUCTURES FORPLANAR SURFACESTypes of Structures

The types of geographic entitles onplanar surfaces are points, lines, and area-enclosing lines or polygons. The latter areperhaps the most frequently encoded fea-ture in geographic data systems.

The 'simplestdata base system for planarsurfaces is that of encoding entity by entitywith little or no regard for entity overlapsor adjacencies (Fig. 1). In other words,every polygon in a polygon system is en-coded and stored without any regard forcontiguous polygons, and lines are encodedwithout regard for the fact that they mayintersect or merge with other lines. Theresults of such an encoding are "sliverlines" (duplication of lines in slightly dif-ferent positions). These sliver lines areconfusing and unaesthetic and, hence, it isvirtually impossibleto do anything directlywith such a data base except an extremelycoarse graphic image.

To go beyond the use of such data forthe production of coarse images, editingmust be performed. This alternative hasbeen attempted in several cases, one beingthe MAP-MODEL system (Arms, 1970).The editing in this system is guided by theassumption that every segment has to berepresented twice except for segments onthe outer boundary. For each segment,the editing program sorts through all re-maining segments to find its complement(the identical line of the neighboring poly-gon) . Those segments for which it hasnot found a complement are tagged to in-dicatepotential errors to the user.

To overcome some of the limitations ofindependently encoded entities, systemshave been developed based on a commonlocation dictionary. This dictionary con-tains the coordinates of every boundarypoint on the map. Polygon boundary listsare then compiled which consist of thelabels (location numbers) of these bound-ary points (Fig. 2). Line information can

LOCATION LIST

POLYGON 1 x~' Y',lX~2 Y',2

POLYGON 2 X~,l Y2,'1,", x2,2 Y2,2

X2,m Y2,m

Fig. 1. The simplest encoding is areal en-tity by areal entity. In this system, mostpoints are recorded twice and some (e.g.,PI,.; PI,ll; Pa,T) three times. A point doesnot necessarilyhave identical coordinates inall,recordings.

be handled in the same way. Programsbased on this structure include CAL-FORM from the Laboratory of ComputerGraphics and Spatial Analysis. Otherprograms have subroutines to convert thispoint dictionary structure to the simpleentity-by-entity line list described above.The data can then be used in programssuch as SYMAP (Laboratory for Com-puter Graphics and Spatial Analysis)wh~ch are. compatible with the entity-byentIty structure. Programs have also beendeveloped to simplify data input throughautomated polygon identification (Douglas,1973).

POINT DICTIONARYPOINT

1 x Y2 x Y3 x y

POLYGON

3 1,25, 24,23.22,21,43,44,45,46,47, 48. 30.29. 28. 27, 26, 1

5 44,43,42,41.40.39,38.37, 36,35, 44

Fig. 2. In the second type of polygon sys-tem, the individual points are encoded onlyonce and are stored in the Point Dictionary.For every polygon, a Polygon Boundary Listis then established.

The point dictionary data base has theadvantage that sliver lines do not occur.However, the problem of neighborhoodrelationships is not handled any better thanwith the entity-by-entity approach. Thesearch for common lines is no longer ac-cording to the coordinates of points but bytheir labels; this brings us closer to asolution only by a little less computer time.It also creates difficulties. A point diction-ary can and will be accessed in an arbitraryorder, since there are no restrictions re-garding point placement. The standard

response to this problem is to make thedictionary core resident. Unfortunately,this will limit the complexity of the mapthat can be handled in this manner, sinceall points must be stored throughout theoperation of the program. This short-coming of such sharing of data is aug-mented by the continued independence ofthe entities created by the dictionary; in-stead of n points with their x and y co-ordinates, there are simply n referencesto points.

Some of the objections to the ordinarypoint dictionary approach can be eliminatedby formulating an intermediate object be-tween the entity and the points used as anaddressing scheme (Nake and Peucker,1972; Peucker (ed.) 1973). A geographicentity can be created from a list of line seg-ments; these segments are, in turn, createdfrom references to the point dictionary.This system allows for easy definition ofthe entities with a minimum of pointers,but each entity is still independent in thesense that its neighbors are not known.The direction of access is still from entityto location but not the reverse.

All of the data structures described thusfar are of limited flexibility and utilitybecause neighborhood relationships are notknown. By adding the topological neigh-borhood function of each element to a datastructure, large improvements in flexibilityand scope of applications can be realized.

If one is concerned about the memorycapacity which is needed for the storage ofexplicit neighborhood relationships, onemight consider a system with implicitneighborhood functions by modifying the

. entity form in the encoding stage. Manyexisting geographic information systemsstore land-use data in grids of rectangularcells (Hsu, 1975). However, a seriousproblem is encountered with such a regulardiscrete encoding of planar surfaces. Ac-cording to the sampling theorem, thesampling interval has to be half the size ofthe smallest features to be encoded (e.g.,Tobler, 1969). Hence, either the size ofthe cells has to be very small to enable en-coding of detailed variations (e.g., urbanland uses) or a grosser cell size can be

used for encoding only the very slowlychanging variations (generalized landuse). In the case of the small units, a verylarge volume of redundant information iscreated in an area of uniform land use andmatrix reduction techniques such as run-length encoding (recording in each rowonly those cells differing from an immedi-ately previous cell) create only physicaland not logical compaction (Amidon andAkin, 1970). In the case of the grossercell size, highly varying land use featuresare aggregated to a degree which reducesthe usefulness of the whole system.

Beyond sampling problems, a grid struc-ture imposes a bias toward specificorienta-tions of features. Diagonals, althoughphysically longer, are given the apparentcell relationships of unit distances alongthe major axes. The grid structure alsocreates the illusion of working with a dis-crete point space, rather than a regularareal partitioning.

Recent studies have developed param-eters which determine the degree of in-accuracy of a given sampling mesh(Switzer 1975). In cases where sometypes of geographic features vary oversmall areas and others over very largeareas, such as the case of statewide land-use patterns, the error is averaged over allfeatures and will affect features with ahigh-frequency variation more than others.

'The variation of the mesh size within theinformation system would be of only littlehelp. The data management and the ma-nipulation and display routines would be-come more complex, although probably lessthan many researchers imagine. Exactfigures about the differences cannot begiven since no literature about such·a sys-tem is known to the authors.

Structures Using ExplicitTopological Relationships

One of the first known attempts to in-corporate explicit topological structure intoa geographic data base is the Dual Inde-pendent Map Encoding (DIME) systemof the U.S. Bureau of the Census (Fig. 3).The DIME files were originally developedas an automated topologicalerror detection

CHUICH 9 STIEfT to041 G[

101 • HM STlffT n28

TRACT !19

IIl.OCK LOWADDI'l_

'02 30103 12'04 2'06 ,10tl "

'0' '8

HIGH ADDllES$

4221'0•174'

DIME STREET SEGMENT RECORDS

SlREET NODE NODE TfIACT Il.OCK TflACT -.oat LOW -.STAAT END LEFT un IlIGHT "IGHT ADOIl. ADDI'l.

Mein • 6 , '06 '004 , '0M.in • 7 , '06 '03 11 17U.in 7 • ,

'0' '03,. 2.

Main I • ,'0' '02 30 42

Fig. 3. The DIME-file base. Representedis a portion of Tract 1 in a hypotheticalcity(Cooke and Maxfield, 1967). The smallnumbers at the street intersections are thenodes, the larger numbers on Main Streetare the addresses. Two types of records arecreated as illustrated in the tables.

system for the Address Coding effort ofthe 1970 census.

The basic element of the DIME file is aline segment defined by two end points.It is assumed that the segment is straightand not crossed by any other line. Themetropolitan files usually define this unitas a street block face. Complex lines arerepresented by a series of segments ap-proximating the line. The segment hastwo "node" identifiers, along with the co-ordinates of its two end points and codesfor the polygon on each side of the seg-ment.

While DIME topology makes much in-

formation accessible to urban researchers,neighborhood relationships are not madeexplicit. Segments sharing a node, forexample, must be found by laborious searchprocedures. Search is also required toassemble the outline of a polygon. Moreimportantly the DIME structure is cum-bersome to use for many cartographic ap-plications involving areas made up of com-plex lines. For procedures in a one-timechecking effort, as is the case of AddressCoding and Address Matching in metro-politan areas, it is quite adequate. Forefficient computer storage and retrievaland for many applications, however, im-provements are desirable. For example,the reliance on the individual line segmentmakes the reduction of detail for displaypurposes difficult since line segments can-not be simply deleted without correctingthe reference codes for the affected nodes.

At the Laboratory for Computer Graph-ics and Spatial Analysis, the junior authorhas developed a data structure, POLY-VRT, that is designed to contain all theinformation needed to construct any of thepreviously enumerated planar structures.The basic object of POL YVRT is the"chain." Like a DIME segment, a chainhas nodes at its two ends, separates twoareas, and is assumed to be uncrossed. Itdiffers in that the POLYVRT chain maybe made up of many points whereas thebasic DIME unit has only two points: Aboundary between two polygons can bereferenced by a single chain no matterhow complicated, because line detail istopologically unimportant (Laboratory forComputer Graphics and Spatial Analysis,1974).

The coding of a complicated boundaryas a unit is not unique to POLYVRT.The data bank used in the project "TheInteractive Map in Urban Research"(Nake and Peucker, 1972; Peucker (ed.)1973) as well as the World Data Bank I(Schmidt, 1969) are composed of "lines."In the latter case some of them containover 4.000 points. The chain based systemof POLYVRT, however, is a different

. type of structure because of the topologicalrole assigned to the chain and the subse-

quent construction of a list data structure.Based upon this assignment, the topologi-cal information about a chain resembles theinformation on a DIME record exceptthat the distinction between nodes (i.e.,points used for more ,than one chain) andthe points internal to a chain allows in-ternal points to be eliminated without in-fluencing the neighborhood relationships.The main innovation in developing a chainrepresentation is that areas of significantline detail may be efficiently handled.Topological checking is reduced from de-pendency on the number of points to de-pendency on the number of boundaries.

In addition to the indication of the rela-tive location of the chain with respect toits neighboring polygons, POL YVRT in-formation is stored in separate lists assem-bling the bounding chains for every poly-g-on. Thus, searches can take place in twodirections, from the chain to the polvg-onand from the polygon to the chain. Thisis very important for any type of neighbor-hood manipulation, since neig-hboringenti-ties can be found through their "bounding"or "hounded" complements. In otherwords. to follow along a group of chainsone flips through chain to polygon to thenext chain, etc., whereas to traverse aseries of pOlygons one tests for adiacentpolygons by going through the chain di-rectory for each polygon.

The POL YVRT program places pointinformation in secondary storage. Thethree higher level ohiects (chains, nodes,and polygons) are core resident. Onlythe chain refers directly to the point filein the secondary storage. In addition toindicating the locations of the points in thepoint file, the chain record incorporates thename of the chain. the labels of the startingand ending nodes. and the left and rig-htpolvg-ons. Converselv. the polvgon listconsists of the bounding-chains in propersequence (Figs. 4 and 5). A list linking-nodes to chains could easily be constructed.

DATA STRUCTURES FORTHREE-DIMENSIONAL SURFACES

Boehm (1967) describes in a very de-tailed analysis the advantages and disad-

)(,y, •••• 'X,,)'1I

88lI 5 •• 11 3 •

x,y,.••. ·'.Yo• 55 77 2

2 I'Ol. YGOIl .22 NODE

888 CHAIN 3 POINT WITHINCHAIN

Fig. 4. The external representation of thePOLYVRT chain-file. Every chain has a"name," the numberof inner points (length),the two limiting nodes, the two boundarypolygons,and a series of coordinates for thepoints.

vantages of different ways of encoding sur-faces. He comes to the conclusion that theencoding of surfaces by contours minimizesthe storage capacity, whereas a regulargrid of surface points minimizes the com-puting time necessary for several types ofmanipulations. Boehm did not include inhis study a data system with an irregulardistribution of points but only contour en-codings and regular grid structures of con-stant and variable mesh width. He didnot reflect on the reasons why contoureddata minimize storage capacity whereas aregular grid minimizes computing time. Ifhe had done so it is quite possible that thedevelopment of geographic data baseswould have taken different routes. Sur-

prisingly this topic has produced little dis-cussion despite the fact that extremelylarge data banks of terrain (digital terrainmodels) have been developed.

When encoding surfaces, it is necessaryto adapt the density of points to the varia-tion in the local terrain. The question ofhow dense the points have to be can beanswered by again using the philosophy ofthe sampling theorem. For the terrainwithin a typical map, the variation canchange considerably, resulting in a needfor frequent adjustments of the samplinginterval.

In a contour map the density of contourlines changes with the density of reliefvariation. Therefore, it fulfills the re-quirements set by the sampling theoremfor a "non-stationary surface," i.e., a sur-face with changing terrain. For the regu-lar grid, on the other hand, if the smallestobject one wishes to detect anywherewithin a study area is of size ("wavelength") S, then the grid spacing every-where must be Sj2 or less (note, for ex-ample, Mark, 1974). The regular gridagain tends toward redundancy sincesmooth areas within the study area willcontain far more points than are needed toaccurately portray their form. To improvethe "resolution" of a grid by a factor off, the grid spacing must be decreased by

NODEI- ·22 • Y

os • y

" · y

Fig. 5. The internal representation of thePOLYVRT chain-file. Reference to poly-gon chains are positive or negative accord-ing to the directionof the chain used.

the reciprocal of this factor and the totalnumber of points is increased by f2.

The question has been raised many timesin photogrammetry whether contours arethe best representatives of topographicsurfaces. It has been stated that contoursdo not detect many types of breaks whichare frequent on terrain (e.g., Brandstatter,1957). Therefore, the encoding of sur-faces by vertical profiles has beenattempted many times in photogrammetryin recent years. Points are encoded onlywhen the slope of the surface changes(e.g., Kraus, 1973). In another case thebreak lines are encoded in addition to aregular grid (Sima, 1972), and in anotherapproach only the break lines or only theridges and channel lines of a surface areencoded (Grist, 1972). The amount ofdetail with these approaches depends onthe scale used.

When performing numerical computa-tions on the basis of these digital terrainmodels, the quantity of data involved willbe only one determining factor for theamount of programming and computationsneeded. Most of the numerical computa-tions on surfaces require some type ofneighborhood function, either to computesome surface behavior, such as slope orlocal variation of relief, or to find the nextunit for the drawing of a "Contour·or avertical profile. For a set of contours itis relatively easy to create a directorywhich indicates a sequence of contours ina type of tree in which the surroundingcontour is the base and the other contoursare the branches (Morse, 1968). To findthe neighboring points on the two ad-jacent contours for a given point on anintermediate contour, however, one mustsearch through all the points on the twoadjacent contours and compute the dis-tances to the point in question. This pro-cedure is time-consuming if the contourlines are represented by very many points.A regular grid on the other hand has animplicit neighborhood function and findinga neighbor does not involve search norextra computer time. For a set of irregu-larly-distributed points as they are repre-sented in very simple data structures (e.g.,

SYMAP) the creation of a neighborhoodfunction is usually performed by findingthe closest small number of points wherethe number varies around six.

THE GENERAL CASE

I t has been noted that although planesurfaces and three-dimensional surfaceslook rather different it takes only a fewassumptions to treat one as the subset ofthe other. While three-dimensional sur-faces are always based on interval or ratiodata, planar surfaces often involve ordinaland nominal data. However, it has beenshown that ordinal and nominal data mavbe treated as interval data (N ordbeck andRystedt, 1970; Rosenfeld, 1969) and evenwithout this conversion we can combinethe two types into one general case bysimply using different assumptions aboutneighborhood. .

Given a set of n data points for whichx, y, and z coordinates are known, con-tinuously defined surfaces (i.e., surfacesfor which one and only one value existsat every point) can be created, using anyof three different assumptions about thesurface behavior (Peucker, 1972). Thefirst assumption is that of a stepped sur-face, which says that the surface retainsthe value of a data point within the neigh-borhood of that data point, where neigh-borhood is defined either by a given poly-gon (the choropleth approach) or by thefact that the area is closer to one datapoint than to any other one in the neighbor-hood (the proximal approach). The sec-ond assumption is that each data pointrepresents a sample of a single value on aconstantly changing surface. Neighbor-hood is then a number of closest neighbor-ing data points, and intervening values areinterpolated with different types of in-terpolation procedures. The third assump-tion is that the data point is a sample froma constantly changing surface that maycontain errors; thus the data point is notnecessarily located on the surface, butclose to it. This approach implies thefurther assumption that the actual surfaceis smoother than the surface constructedthrough the sample data points.

With these assumptions about surfacebehavior, any point or areal distributionof a variable can be treated as a continuousfunction z = f (x, y), and planar and three-dimensional surfaces can be combined intoone type. This has already been done insome computer programs, the most notablebeing SYMAP. Many cartographic ap-plications must treat both types of sur-faces and, therefore, data structures mustbe developed which can handle both typesat one time.

THE PROPOSED DATA STRUCTURESPlanar Surfacel

The need to incorporate different types,or hierarchies, of polygons uncovered oneof the limiting assumptions made inPOLYVRT. A chain plays a dual role:first, it is the boundary of two areal enti-ties, and secondly, it is the unbroken unitof point retrieval. This is not a problemif one is limited to a single nonoverlappingset of polygons. The following proposednew system, to bear the name GEOGRAF,is an attempt at greater flexibility.

Because of the addition of many layersof complexity involving multiple polygonsets, the chain cannot remain, to the samedegree, the controlling object of the datastructure. Just as the notion of an un-broken line is important, so is the notionof an unpartitioned space. In a svs~which ~t han.dle overllY?£fngpolYeJ>pnetworks there IS a need root ob-ject which is e ned as an area uncnt bx-any ~rtition~ This object istefii1ed the L~ommon GeographicUnit (LCGU). The LCGU's are con-structed as a POLYVRT polygon directlyfrom chains. The relationship of theLCGU's to all other polygon types is hi-erarchical (Fig. 6).

In turn, the existence of the LCGUallows for the creation of each class ofpolygon. In order to allow simple codingof the boundary relationships at thesehigher levels in the structure, the "chaingroup" was devised. A chain group is aset of chains which form a boundary of twoareal units for a given polygon class.These polygons are constructed of chain

r=l 0LJ--LJ1 1

8-0-r-=l r.::l-D-~-LJ1 1

R ~Fig. 6. The relationship between the vari-ous parts of the data structure.

groups which, in turn, are constructedfrom chains.

Line features can be built up of chainsin the same manner as chain groups.Note that the chain group listing for eachlevel of polygon and each listing for a linefeature reference only the chains them-selves. This allows each system to beconsidered as a separate directory whichis core-resident only when that class ofobjects is retrieved.

The LCGU has other implications andapplicat·ionsthat are useful because .of thetopological data structure. The LCGU,with its coding for each of the polygonsets, can be combined with contiguity in-formation of linear feature types to producean Attribute Cross Reference (ACR).The ACR is a table in which all objects(in polygon and linear systems) are cross-referenced to each other to determinenesting. By using attributes of chains(lengths) and LCGU's (areas, populationdensities, etc.) this cross-referencing cap-ability can assign a string of data, collectedfor one polygon type, to a string of asecond type.

Topological manipulation routines arecentral to the success of this structure.The intersection of geographic featureswill rely on topological knowledge to re-alize economies of scale in processing largefiles. All operations with lines will actu-ally work with bands, built with endpoints

of the line, and the furthest deviants toboth sides (if the bands become too widethe lines are split, etc.). With thisapproach, the windowing process uses non-linear windows (which is often the casewith geographic coordinates and map pro-jections) and becomes quite elegant. Sim-ilarly, intersection procedures such aspoint-in-polygon, line-across-polygon, andpolygon-over-polygon determination allowfor gainful application of the topologicalprinciple. For point-in-polygon searches,chain groups are constructed which bisectthe universe into parts to sort the points{or nodes of polygons) into three groups:left, within the band, and right. Only thesecond group needs more detailed treat-ment. The point set is then recursivelypartitioned until it reaches the level of theLCGU's. For line-oriented problems, thetopologicalconnections of the two sets com-pared would allow intersections to belimited to immediate neighbors.

Another important procedure will be anested chain-intersection routine. Here,again, the chain band and its recursivesegmentation is used. The number ofpoints which define a chain is constantlyincreased until the intersection test can bedetermined. The search of a line througha set of polygons will use a graph-searchalgorithm developed for the GDS (Geo-graphic Data Structure) project discussedin the next section. The neighborhoodsearch routines create records of neighborsfor every point or line or polygon at anyspecified depth of neighborhood.

Three-Dimensional Surfaces

To implement the ideas presented here,the senior author is developing a geo-graphic information system for three-dimensional surfaces. Both systems, thethree-dimensional and the planar, arebased on data structures with explicittopological neighborhood relationships.

The basic philosophy of both approachesis to separate the data base from the appli-cation programs. In the early days ofcomputer cartography, a data set was

. generally tied to the application programthat used it. Now the available data have

become so voluminous and the applicationprograms so varied that extra efforts inthe preparation of data for more efficientcomputations seem to be justified. We arewell in step with modern computer scienceto separate the data base from the applica-tion programs with the data structure be-coming the link between the two.

The creation of a structured data baseis nothing new. The interpolation of anirregular grid to a regular grid of height-points provides a data structure throughthe implicit neighborhood function of thegrid. And in the case where polygons areindependently defined by a series of points,the neighborhood relationship is replacedby a search algorithm which finds theneighbors for every polygon by searchingfor matching segments in the boundaryfiles of the other polygons. The idea is tospend computing time before any applica-tion has been performed in the anticipa.-tion of heavy uses of the data base. Anefficient structure of the data base shouldnot only speed up computations consider-ably, but should also simplify the produc-tion of application programs.

The data structure developed under theworking title "Geographic Data Structure"(GDS) is based on irregularly distributedpoints which are assumed to be samplepoints without sampling errors from asingle-valued surface. Two types of struc-tures form the core of the data bases. Thefirst creates neighborhood relationships by"triangulating" the data set and storing forevery point the labels of all points whichare linked with the point by a triangleedge (Fig. 7). The second structure isproduced by selecting those points of thesurfaces which lie along lines of high in-formation content, such as ridges andchannel lines, and defining them by theirnodes, which are peaks, passes, and pits(Fig. 8). This second data structureserves two purposes: First, it is a generalrepresentation of the surface for roughcomputations; second, it is a "directory"into the more detailed first structure.

In the first structure, the creation of theneighborhood relationship is based on theassumption that data-sets are usually of

16 17353415 3

17 _ l' 36 35 16 3,'

16 X Y z i17 X Y z i+6

l' X Y z j +t3

3_ X Y z j

173534

- 1523

~~~ ~l'

Fig. 7. The GDS-first data structure. Theillustration shows the points on the surfacewith their links to neighbors (edges). Theexternal representation is shown by theneighborhood relationships. The internalrepresentation is composedof the point-fileand the pointer-file.

two types: (a) sets of irregularly-distri-buted points which were digitized with theunderstanding that every point is signif-icant, and (b) sets of regularly or irregu-larly-distributed points where it is knownthat a number of points are redundant andcan be eliminated from the set, e.g., regulargrids of points and encoded contours.

The first type of data set is linked bysome type of triangulation. At least twoapproaches exist. The first (Dueppe and

Gottschalk, 1970) creates all possible links,chooses the shortest, and eliminates alllinks which intersect the shortest. Thisprocedure is repeated with the next short-est links until no links intersect. The re-sult is the set of links with the minimumcumulative distance between neighboringpoints.

The ~rocedure has one disadvantage:since(~ ) links have to be created, the num-ber of points is therefore limited toonly several hundred. The first step inour approach therefore limits the links toa number of "potential neighbors," amongwhich the shortest link is chosen and in-tersected with all other links originatingin these potential neighbors. This pro-cedure limits the number of tests for in-

n'm2

tersections of links to less than - ••.-

where m is the number of potential neigh-bors, an arbitrary number between 8 and14 depending on the density variation ofthese points. The procedure does notguarantee, however, that only triangles areconstructed; polygons with more thanthree sides can result, although they arerelatively rare. The check for such poly-gons and their elimination is very easy andfast.

The second possibility is to create atriangulated structure through use ofThiessen polygons. A published solution(Rhynsburger, 1973) intersects for everypoint the links to every other point midwayand chooses the smallest polygon created

I_. CHA. I

~COID~oImJIIL ..•....• ........ER. CHAIN••••• _

[[]]-D D[ _.T ~TA ~ROCT~ JFig. 8. The GDS-second data structure.Both the node-file and the chain-file haveaccess to the first data structure, the node-file directly and the chain-file through achain-pointer-file.

by the perpendiculars. Every point whichcontributes to the Thiessen polygon is aThiessen neighbor. This procedure canagain be simplified by the assumption of alimited set of "potential neighbors." Thesame checking routines as above have tobe applied.

Another approach which limits the num-ber of necessary tests, but is mathemat-ically correct at the same time, has beendeveloped within the project by KurtBrassel at Harvard University. The pro-cedure is based on "fields of potentialneighbors" which converge very rapidly.

The alternative to triangulation is three-dimensional generalization, i.e., to selectfrom a set of points those which define thestructure with the least deviation from theoriginal surface. The basic concept, de-veloped by Randolph Franklin of HarvardUniversity and T. K. Peucker, is to ap-proximate the surface of a series of tri-angles through a selected set of points,where each additional point included intothe set is the one which deviates the mostfrom the approximated triangles until thedeviations are below a given value (seePeucker, 1974).

In both cases, the triangulation and gen-eralization of the surface, the result is a"linked list" of surface points. The termlinked list means that points are linked withone another through pointers. In otherwords, a point is not only identified by itsx, y, z coordinates, but also by a list of thelabels of the points which form edges oftriangles with the point. In our case eachrecord consists of the x, y, z coordinates ofa point and a reference to the start of thepointers to the neighbors in a pointer list.The reason for not having the neighbor-hood pointers with the point record is thatthe number of points varies considerably(Mark, 1974). Since the record has to belong enough to include all possible num-bers of neighbors, large parts of the pointersections would be empty most of the time.The pointers are sorted, starting with thepointer the least East of North of a point(Fig. 7).

The use of this type of data-structure isvery simple and efficient. For every search

(profile, contour, etc.) a criterion for edge-intersection is developed. For contouring,for example, the critical question iswhether one point of the edge is above thecontour level and the other below. A startis found and one point of the edge is con-sidered a reference point and the other asubpoint. The next subpoint is found bylooking up the next neighbor in the pointerlist. If the test is positive, the intersectionis performed and the process repeated. Ifthe test is negative, the reference and sub-points are switched and the process re-peated.

Other procedures are equally simple.To find a triangle, for example, one hasonly to have a reference and a subpoint.The third point is the next label in thepointer list of the reference point after thesubpoint. To find all triangles one goesthrough the total pointer list leaving -outall those edges connecting reference pointswith subpoints with a smaller label sincethey would create triangles which hadbeen treated when the subpoint was areference point.

Although the first data structure, aspresented above, seems to be efficient interms of storage capacity, it does not pro-vide easy access to the data base which isoften very large. It is for this reasonthat we are developing the second datastructure to represent the general structureof the surface and to serve as a directoryto reach into the first data base (Pfaltz,1975) (Fig. 8).

The first step in the creation of thesecond data structure is to find the ridge,channel, and break lines on the surface. Ifone labels the highest point for every tri-angle, the unlabeled points are membersof the channel line, at least on smoothsurfaces. A subsequent search routinedeals with the irregularities. Points alongridges are found by eliminating the lowestpoint of every triangle, using a routinedeveloped for a regular grid by D. M.Douglas. The detection of break lines issomewhat more difficult.

Some theoretical studies of surfaces byWarntz (1966) show that ridge lines andchannel lines cross at passes, a useful point

of information relative to the developmentof the second data structure. Practicalconsiderations suggest that in terrain andother surfaces, this regularity is not alwayspresent.

Once the topological structure is at hand,it can be used as a directory into the firststructure. The line is therefore treatedas a chain similar to the chain of thePOLYVRT system. The nodes of thechain are the peaks, passes, pits, and otherendpoints of chains on the surfaces. Thesepoints are stored with their coordinatesand the names of the chains which termi-nate at the nodes. The chains are storedwith the labels of the nodes and pointersinto the chain lists which consist of labelsof points in the first data structure (notethat here the chain structure differs fromthat of POLYVRT).

A third component of the "GeographicData Structure" should be mentioned sinceit illustrates very well the logical. a'dapta-tion of a computer problem solution togeographical data. The problem at handis the partitioning of the data set. Sincewith large data sets only portions can bekept in fast memory, the data base is seg-mented into "pages" which are broughtinto memory as units. For the "Geo-graphic Data Structure" the paging sys-tem can solve several problems inherentin a complex geographic information sys-tem.

The boundaries of "patches," as we callthe areal extension of a "page," are chainsalready defined for the second structure.Since detail along the chain is of no topo-logical interest, the density of points alongthe chain can differ for its two sides. Inother words, the density of triangles canchange from patch to patch. This allowsfor very efficient data encoding even interrain with sudden changes in the sur-face behavior as at the change from amountainous area into a plain (Peucker,19i2) .

Another advantage of the paging-systemis the ease of including topographic andplanar information. Linking point, line,and areal data to the triangulated pointswould lead to high definitional redundancy.

The secondary structure could lead to am-biguities where the terrain is very elon-gated. Since an attempt has been made tokeep the shape of the patches as compactas possible, the combination of non-terraindata with patch boundaries seems to bemost appropriate.

Since the patch boundaries are againchains, another virtue comes to light: Thepatches can be treated as polygons of thePOLYVRT system with little difficulty.This link between the two systems lendshope that eventually they may be merged.

It is an appropriate question to ask whatsuch a data structure as the GDS will beable to accomplish. A number of displayroutines have already been developed(Cochrane, 1974) and a series of proce-dures for surface analysis based on heuris-tic searches are underway (Fig. 9). Sinceboth levels of data structure are graphs,we will be able to rely on many of the de-velopments connected with operations re-search, specifically network analysis, forthe manipulative treatment of the data.

As both systems, GDS and GEOGRAF,have topological structures, it is possibleto merge the two. The creation of poly-gons from points is the major link fromthe GDS project to GEOGRAF. Thecreation of a set of centroids for polygonsallows the conversion in the opposite direc-tion. This way, surfaces can be treated aspolygonal sets and can be displayed andmanipulated by the routines of GEO-GRAF. Conversely, polygonal data canbe treated as surfaces for GDS. Theneighborhood routines are what make theproject useful in quantitative geographyand planning. Neighborhood searches areextremely expensive without the topologi-cal data structure, but they are usually amost important part of urban and environ-mental analyses once a general overview isobtained from the data.

Although basic research and applicationdevelopment are two sides of one coin andmust go together to obtain lasting results,this paper has concentrated on the theo-retical parts of the project since their de-velopment is ahead of the application rou-tines, a fact which should be expected.

ContouringBlock DiagramSh8ded ContoursInclined ContouraViaibility TeatRadar Imalle

Fig. 9. The problem flow for GDS from the data base, via thechains for the two types of data structures, to their application.

REGULAR GRID

IGENERALiZA TlON

\

TRIANGULATEDIRREGULAR GRID

Firat Data Structure

!

The quintessence of the research so far isthe hypothesis that topologically-structureddata bases of three-dimensional and planarsurfaces can result in reduced efforts inthe development and execution of applica-

REFERENCESAmidon, E. L. and G. S. Aiken (1970), "Al-

gorithmic Selection of the Best Method forCompressing Map Data Strings," Communica-NOns of the Association for Computing Ma-chinery, Vol. 14, pp. 769-774.

THEORY OFSURFACE SPECiFICPOINTS AND LINES

SURFACE SPECIFICPOINTS AND LINES

Second Data Structure

HEURISTIC SEARCH

1SEARCH

Profile SearchGeographic DiaorientatlonOplimum SearchNeighborhood Analyaia

tion routines. We have some indication

that the hypothesis is correct; the real test

will come when the bulk of the applicationroutines is completed.

Arms, A. (1970), MAP/MODEL System: Sys-tem Description and User's Guide, Bureau ofGovernmental Research and Service, Univer-sity of Oregon, 59 pp.

Boehm, B. M. (1967), "Tabular Representation

of Multivariate Functions with Applicationsto Topographic Modelling," Proceedings, 22ndNational Conference, Association for Comput-ing Machinery, pp. 403-415.

Brandstaetter, L. (1957), "Exakte Schichtlinienund Topographische Gelaendedarstellung,"Oesterreichische Zeitschrift fuer Vermessungs-wesen, Sonderheft 18, Wien, 90 pp.

Cochrane, Douglas (1974), "Cartographic Dis-play Operations on Surface Data Collected inan Irregular Structure," M.A. thesis, SimonFraser University.

Cooke, D. F. and W. F. Maxfield (1967), "TheDevelopment of a Geographic Base File andIts Uses for Mapping," Urban and RegiOJlalInformation Systems for Social Programs, Pa-pers from the 5th Annual Conference of theUrban and Regional Information System As-sociation, pp. 207-218.

Douglas, David H. (1973), "BNDRYNET,"Peucker, T. K. (ed.) , The Interactive Map inUrban Research, Final Report After Year One,University of British Columbia.

Dueppe, R. D. and H. ]. Gottschalk (1970),"Automatische Interpolation von Isolinien beiwillkuerlich verteilten Stuetzpunkten," Allge-meine Vermessungs-Nachrichten, Vol. 129, No.10 (Oct.), pp. 423-426.

Grist, M. W. (1972), "Digital Ground Models:An Account of Recent Research," Photogram-metric Record, Vol. 70, No.4 (Oct.), pp. 424-441.

Heiskanen, W. A. and H. Moritz (1967), Phys-ical Geodesy, W. H. Freeman, San Francisco,Chapter 7.

Holroyd, M. T. and B. K. Bhattacharyya(1970), Automatic Contouring of GeophysicalData Using Bicubic Spline Interpolation, Geo-logical Survey of Canada, Paper No. 70-55.

Hsu, M. L., et al., (1975), "Computer Applica-tions in Land Use Mapping and the MinnesotaLand Management Information System," Da-vis, J. c. and M. McCullagh (eds.), DisPlayand Analysis of Spatial Data, New York, JohnWiley and Sons.

Junkins, ]. L., G. M. Miller and J. R. Jancaitis(1973), "A Weighting Function Approach toModelling of Irregular Surfaces," Journal ofGeoph:!o,sicalResearch, Vol. 78, No. 11 (Apr.),pp. 1794-1803.

Kraus, K. (1973), "A General Digital TerrainModel," translation from an article in Acker-mann, F. (1973), Numerische Photogram-meme, Sammlung Wiechmann, Neue Folge,Buchreihe. Bd. 5, Karlsruhe.

Laboratory for Computer Graphics and Spatial

Analysis, Harvard University (1974), POLY-VRT Manual, Cambridge, Mass.

Mark, David M. (1974), "A Comparison ofComputer-based Terrain Storage MethodsWith Respect to Evaluation of Certain Geo-morphometric Measures," M.A. thesis, Uni-versity of British Columbia.

Morse, S. P. (1968), "A Mathematical Modelfor the Analysis of Contouring Line Data,"Journal of the Association for Computing Ma-chinery, Vol. IS, No.2, pp. 205-220.

Nake, F. and T. K. Peucker (1972), The Inter-active Map in Urba" Research, Report afterYear 0,." University of British Columbia.

Nordbeck, S. and B. Rystedt (1970), "IsarithmicMaps and the Continuity of Reference IntervalFunctions," Geografiska Annaler, Vol. 52, Ser.B., pp. 92-123.

Peucker, T. K. (1972), Computer Cartography,Association of American Geographers, CollegeGeography Commission, Resource Paper No.17, Washington, D.C.

Peucker, T. K. (ed.) (1973), The InteractiveMap in Urban Research, Final Report AfterYear One," University of British Columbia.

Peucker, T. K. (1974), Geographical DataStructures Report After Year One, SimonFraser University.

Pfaltz, J. L. (1975), "Surface Networks," Geo-graphical Analysis (forthcoming).

Rhynsburger, D. (1973), "Analytic Delineationof Thiessen Polygons," Geographical Analysis,Vol. 5, No.2 (Apr.), pp. 133-144.

Rosenfeld, A. (1969), Picture Processing byComputer, New York, Academic Press.

Schmidt, W. (1969), "The Automap System,"Surveying and Mapping, Vol. 29, No. 1(Mar.), pp. 101-106.

Shepard, D. (1968), "A Two-Dimensional Inter-polation Function for Irregularly SpacedData," Proceedings, 23rd National Conference,Association for Computing Machinery, pp. 517-524.

Sima, J. (1972), "Prinzipien des CS digitalenGelaendemodells," Vermessungstechnik, Vol.20, No.2, pp. 48-51.

Switzer, P. (1975) , in: "Sampling of PlanarSurfaces," Davis, J. C. and M. McCullagh(eds.) , Display and Analysis of Spatial Data,John Wiley and Sons, New York.

Tobler, W. R. (1969), "Geographical Filters andtheir Inverses," GeograPhical Analysis, Vol. 1,pp. 234-253.

Warntz, W. (1%6), "The Topology of Socio-Economic Terrain and Spatial Flows," Papersof the Regional Science Association, Vol. 17,pp. 47~1. •