
Modeling Spatial, Temporal and Spatio-Temporal Data

in Object-Relational Database Systems

Dissertation approved by the

Department of Computer Science (Fachbereich Informatik)

of the Universität Hannover

for the award of the degree

Doktor der Naturwissenschaften

Dr. rer. nat.

by

Dipl.-Math. Carsten Kleiner

born on 10.03.1970 in Langenhagen

2003

Referee: Prof. Dr. Udo W. Lipeck

Co-referee: Prof. Dr. techn. Wolfgang Nejdl

Date of doctoral examination: 04.02.2003

Abstract

This thesis investigates the specifics of modeling data for spatial, temporal and spatio-temporal applications (STOSTA) in the presence of object-relational database systems (ORDBS). This work was motivated by actual scientific and commercial applications. Some of those are described in detail in order to show how these applications benefit from modeling their domain data with object-relational database systems.

In particular, all classical stages of the process of data modeling are considered. The conceptual stage naturally shows only marginal changes from traditional development since it is implementation independent. In logical modeling some new concepts are introduced: the combination of data of different domains into new data types is illustrated by the type VT_INTEGER combining numerical and temporal information. This type, which should be used at the conceptual level already, leads to changes on the logical level where concrete definitions of the type are required. VT_INTEGER as well as other user-defined data types required for STOSTA can be constructed by combining information from different domains in a single data type. These new data types can be integrated into object-relational systems smoothly, easily and efficiently on the physical level by using the extensibility features, as will be shown in this work.

As the main contribution, different possibilities of physical design for the aforementioned newly defined data types are investigated. Different index structures and query types are considered and compared with respect to their integration into a commercial ORDBS. The generalized search tree (GiST) approach is used to provide the widest possible portability of the index structures to other domains. For the first time, to the best of my knowledge, GiSTs have been integrated as an indexing method into a commercial extensible ORDBS. Moreover, ideas for query optimization by use of extensible cost and selectivity estimation are presented.

Finally, applications on top of data in the ORDBS are investigated. As examples, the visualization of spatial data, the exchange of non-standard data types in XML and the development of scientific database applications are illustrated. In particular, the topic of XML generation from database schemata reaches far beyond the spatial and temporal domains.

A detailed introduction to all special domains in focus as well as to object-relational database systems in general is given in chapters 2 to 5. Thereafter the newly developed techniques in the different stages of the modeling process are presented in chapters 6 through 8. Finally, chapter 9 shows how these techniques are and could be used in concrete applications.

Keywords: Object-Relational Database Systems, User-Defined Data Types, Spatio-Temporal Databases

Zusammenfassung (Summary)

This thesis examines the special requirements of data modeling for spatial, temporal and spatio-temporal applications (STOSTA) in the context of object-relational database systems (ORDBS). The work is motivated by current scientific and commercial applications, some of which are described in detail to show how they can benefit from modeling their domain information in object-relational database systems.

All stages of the classical data modeling process are considered. At the conceptual level there are, understandably, only minor changes, since this level is implementation independent. At the logical level, by contrast, substantial new concepts are introduced: the combination of data of different types into new data types is illustrated by the data type VT_INTEGER. This type combines classical numerical data with temporal information and should already be used at the conceptual level. Substantial changes then arise at the logical level, since concrete definitions of the type and its operations are required there. The type VT_INTEGER as well as other user-defined types for STOSTA can be constructed by combining information of different types into a new data type. As shown in this thesis, these new types can then be integrated into object-relational database systems at the physical level simply, elegantly and, above all, efficiently by using the extension mechanisms of these systems.

The main topic of this thesis is the physical design of the new data types mentioned above. Different index structures and query types are considered and compared with regard to their embedding into commercial ORDBS. To achieve the greatest possible portability of the index structures to other data types, generalized search trees (GiST) are used. In this thesis they are employed for the first time as an index method in commercial, extensible ORDBS. In addition, proposals are made for using user-defined cost and selectivity estimation in query optimization.

Finally, applications that use the data modeled in this way are examined. As examples, the visualization of spatial data, the exchange of spatial data in XML and the development of scientific database applications are considered. In particular, the generation of XML data for user-defined data types reaches far beyond the level of spatial and temporal information.

A detailed introduction to all areas in focus as well as to object-relational database systems in general can be found in chapters 2 to 5. The newly developed concepts and techniques are then integrated into the different stages of the modeling process in chapters 6 to 8. Finally, chapter 9 shows how these techniques can be used in concrete applications.

Keywords: Object-Relational Database Systems, User-Defined Data Types, Spatio-Temporal Databases

Acknowledgment

This work is of course not the result of the activities and endeavors of a single individual, namely the author. Many others have provided important support, be it mentally, orally or in the form of products that were used in this work. I will briefly name the most important of those supporters.

Firstly I want to thank the referee of this thesis, Prof. Dr. Udo Lipeck, who motivated me to start, continue and conclude the research activities in the first place. We discussed many intermediate results for publications and shared the supervision of student theses, which helped a lot in this work. He also made many helpful remarks in the process of assembling the results in the form of this thesis.

I thank Prof. Dr. Wolfgang Nejdl for unconditionally accepting the duty of co-referee and also for helpful comments on a previous version of this work.

Regina Sebastiani, secretary of our group, has done a lot of important administrative work. My ex-colleague, Dipl.-Math. Thomas Esser, provided an always perfect computer system environment and also helped in some thematic discussions about database tuning. After he had left, our student assistants, Serdar Yapici, Lothar Grall and Christian Stahlhut, administered the system and database with great commitment.

The diploma theses by Dipl.-Math. Sascha Klopp, Dipl.-Math. Ulf Löckmann, Dipl.-Math. Frank Beier and Dipl.-Math. Jens Helge Pfau, which were completed under my supervision, produced important results and products for this thesis. The same holds for the several intermediate theses which were developed under my supervision, namely the ones by Dipl.-Math. Sascha Klopp, Dipl.-Math. Ulf Löckmann, Stefan Falke, Nora Ripperda, Sebastian Schersich, Michael Standke and Thomas Lamping.

The interesting applications in cartography that were used as motivation were introduced and explained to me by the following members of the institute for cartography and geoinformatics: Dipl.-Inf. Hans Koch, Dr.-Ing. Ulrich Lenk, Prof. Dr.-Ing. Dietmar Grünreich (now president of the German federal office of cartography and geodesy), and Prof. Dr.-Ing. Monika Sester. The same holds for former members of the institute for physical geography, Dipl.-Geogr. Markus Neteler and Prof. Dr. Rainer Duttmann, for applications from the field of physical geography.

Last but not least I want to thank my family and friends who have also motivated me to start and continue research and to go (back, as some would say) to academia from industry. But most credit definitely goes to my fiancée, Karin Küker, who was always there to listen to my concerns and never let me stop pursuing the ultimate goal. She never complained about the reduced time we spent together or some of my bad moods because of little things that would not work properly.

Contents

Abstract
Zusammenfassung
Acknowledgment
Table of Contents

I Foundation

1 Introduction

2 Object-Relational Database Systems
    2.1 Types of Database Systems
    2.2 Relational Database Systems
    2.3 Object-Relational Systems
        2.3.1 General Issues
        2.3.2 Implementing Extensible Query Optimization
        2.3.3 Commercial Products

3 Spatial Databases
    3.1 Data Models
        3.1.1 Coordinate Systems
        3.1.2 Spatial Operators
    3.2 Spatial Querying
    3.3 Index Structures
        3.3.1 Z-Code Based Hashing
        3.3.2 R-Tree Indexing

4 Temporal Databases
    4.1 Data Models
    4.2 Query Languages
        4.2.1 Requirements
        4.2.2 VTSQL2 - An Example
    4.3 Index Structures
        4.3.1 One-Dimensional Interval Management Problem
        4.3.2 Two-Dimensional Interval Management Problem

5 Spatio-Temporal Databases
    5.1 Requirements and Data Models
        5.1.1 Modeling Spatio-Temporal Data
        5.1.2 Querying Spatio-Temporal Data
    5.2 Index Structures

II Data Modeling with Object-Relational Databases

6 General Issues and Case Studies
    6.1 General Issues
    6.2 Case Study 1: ATKIS
        6.2.1 Analysis
        6.2.2 Conceptual Data Model
        6.2.3 Logical Model: Object-Relational Database Schema
        6.2.4 Physical Design
        6.2.5 Integration of Elevation Information
    6.3 Case Study 2: Physical Geography
        6.3.1 Requirements Analysis
        6.3.2 External Database Schemata
        6.3.3 Integrated Conceptual Model
        6.3.4 Logical Model
        6.3.5 Physical Design
        6.3.6 Sample Queries

7 Consolidation of Conceptual Modeling in STOSTA
    7.1 ER-based Conceptual Modeling
    7.2 Object-Oriented Conceptual Modeling
        7.2.1 Spatial Data Types
        7.2.2 Temporal Data Types
        7.2.3 Spatio-Temporal Data Types
        7.2.4 Application to ATKIS
    7.3 Standard Logical Modeling in STOSTA
    7.4 Advanced Logical Modeling in STOSTA
        7.4.1 Spatial Data
        7.4.2 Temporal Data
        7.4.3 Spatio-Temporal Data

8 Physical Design in Object-Relational Database Systems
    8.1 Features provided by the DBS for Spatial Data
    8.2 User-Defined Extensions
        8.2.1 Object-Relational Features for Indexing
        8.2.2 Generic Index Structures
        8.2.3 Example: STO-GiST - A Spatio-Temporal Index
    8.3 Physical Model and Index Structures for Selections
        8.3.1 Temporal Data
        8.3.2 Spatial Data
        8.3.3 Spatio-Temporal Data
        8.3.4 Redundancy versus Query Performance
        8.3.5 Spatial Index Creation
    8.4 Index Structures for Joins and other Queries
    8.5 Cost and Selectivity Estimation
        8.5.1 Selectivity Estimation
        8.5.2 Cost Estimation for User-Defined Methods
        8.5.3 Results of the Prototypical Implementation

III Applications on OR Databases and Prospect

9 Applications for Spatial and Temporal Data in ORDBS
    9.1 Visualization of Spatial Data
        9.1.1 Requirements
        9.1.2 Visualizing Spatial Data in Oracle 9i
    9.2 Exchanging Non-Standard Data over the Web
        9.2.1 Case Study: ATKIS
        9.2.2 Generation of XML from OR Schemata
    9.3 Developing Scientific Applications
        9.3.1 Case Study: Physical Geography
        9.3.2 General Model for Spatio-Temporal Applications

10 Summary and Outlook

Bibliography

Part I

Foundation

Chapter 1

Introduction

In recent years the commercial importance of spatial information has been increasing consistently. That is due to the many interesting conclusions that can be drawn from the evaluation of information with spatial content. These conclusions may lead to a considerable performance improvement for several businesses. Since company managers have to make important decisions quickly, spatial data will only be taken into account if the important information is available immediately. At the technical level this requires efficient storage and retrieval of spatial information.

Research has been concerned with efficient retrieval of spatial information for quite a while. This has led to many proposals on how to store spatial data in order to be able to retrieve it efficiently. An important point to note is that the efficiency of any storing method strongly depends on the types of queries that will be used to retrieve the data later on. Consequently different storing methods will perform differently on varying queries. Moreover most methods were designed and evaluated as file-based storing methods. Since nowadays (object-)relational databases are used, which support convenient and hopefully efficient data access, the applicability of the proposed methods for current database management systems has to be investigated. This is especially true in the commercial setting where spatial information is only valuable in conjunction with domain-specific data that has been stored in database systems for several years.

As a more concrete motivating example consider the following problem: the secretary of the Tippecanoe county council is preparing a proposal for next year's budget plan. In order to be able to estimate the cost for road maintenance she needs to know the total length of all roads in Tippecanoe county. She is provided with a map of all roads and zip code polygons in the state of Indiana (figure 1.1), as a visualization of the contents of some database tables containing that information. She is now confronted with the problem of retrieving all roads that run through Tippecanoe county (marked in red in figure 1.1). More generally she needs to determine all roads from a given table that lie within a given rectangle (or window). This is the case of a spatial selection query (or window query). The importance of support for efficient window querying is obvious if we consider that this query without any index takes 1480 seconds or about 25 minutes on the database server used throughout this work¹. This is clearly not acceptable. By using insights gained from this work the time can be reduced to about 82 seconds for selecting the roads required². Additional time is needed for computing the total length, but this time is in the range of a few seconds. This example shows the importance of efficient spatial selections.

Figure 1.1: Map of zip code polygons and roads in Indiana
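The window query described above can be illustrated with a minimal Python sketch. It is not the method developed in this work: it scans all roads linearly and only tests minimum bounding rectangles (the usual filter step), and the names Rect, Road and window_query as well as the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-parallel rectangle, used both as query window and as bounding box."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap iff they overlap in both dimensions.
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)


@dataclass(frozen=True)
class Road:
    """Hypothetical road object: a name and a polyline of (x, y) points."""
    name: str
    points: tuple

    def mbr(self) -> Rect:
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return Rect(min(xs), min(ys), max(xs), max(ys))


def window_query(roads, window):
    """Linear scan: keep every road whose bounding box meets the window."""
    return [r for r in roads if r.mbr().intersects(window)]


roads = [
    Road("A", ((0, 0), (2, 2))),
    Road("B", ((5, 5), (6, 8))),
    Road("C", ((1, 4), (3, 4))),
]
window = Rect(0, 0, 3, 3)
result = [r.name for r in window_query(roads, window)]
print(result)
```

A spatial index replaces the linear scan by a search that visits only the parts of the data set near the window, which is where the reduction in running time reported above comes from.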

On the next day additional costs have to be added for all bridges to be maintained in Tippecanoe county. The secretary has an additional database table of all rivers in Indiana. Suppose for simplicity that smaller tables of all roads and rivers in Tippecanoe county have been generated by the council's database programmer after the problems caused on the previous day (visualized in figure 1.2). Still the secretary cannot obtain a list or the total length of all the road bridges in the county from just seeing them on the screen. Even by zooming to more interesting parts of the map (see figure 1.3) a pure visualization is not sufficient since no electronic processing of the results is possible. She needs to select all road objects from the database that intersect any river in the database and obtain the result in electronically processable form. Required is a spatial join query since spatial information from two tables has to be joined with regard to their spatial properties in order to compute the answer. This is a very computationally intensive task which, even if implemented optimally on the current database system, takes time in the range of a few hours if executed on the entire Indiana state dataset. This work will describe a foundation which should simplify the execution of such complex queries by using user-defined extensions to recent database systems.
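To make the notion of a spatial join concrete, the following Python sketch joins two small sets of line segments. It is a naive nested-loop join under simplifying assumptions (each road and river is a single straight segment, and all names and coordinates are invented); it shows what the query computes, not how an ORDBS or this work evaluates it.

```python
def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def in_box(p, q, r):
    """True if r lies inside the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """Exact test whether segment ab and segment cd share at least one point."""
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear special cases: an endpoint lies on the other segment.
    return ((o1 == 0 and in_box(a, b, c)) or (o2 == 0 and in_box(a, b, d))
            or (o3 == 0 and in_box(c, d, a)) or (o4 == 0 and in_box(c, d, b)))


def mbr(seg):
    (x1, y1), (x2, y2) = seg
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def mbr_overlap(m, n):
    return m[0] <= n[2] and n[0] <= m[2] and m[1] <= n[3] and n[1] <= m[3]


def spatial_join(roads, rivers):
    """Nested-loop join: cheap bounding-box filter, then the exact test."""
    pairs = []
    for road_name, road in roads.items():
        for river_name, river in rivers.items():
            if mbr_overlap(mbr(road), mbr(river)) and segments_intersect(*road, *river):
                pairs.append((road_name, river_name))
    return pairs


roads = {"Main St": ((0, 0), (4, 4)), "Oak Ave": ((0, 5), (4, 5))}
rivers = {"Wabash": ((0, 4), (4, 0)), "Creek": ((5, 0), (5, 3))}
bridges = spatial_join(roads, rivers)
print(bridges)
```

The nested loop compares every road with every river, so its cost grows with the product of the two table sizes; this is why the join on the full state dataset is so expensive without index support.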

Another task to be solved by the secretary could be to determine all roads in the county that have a length between 500 yards and 1500 yards since there is a special state-wide support program for such streets. With current methods one can either spatially index the geometry and then let the system check all retrieved roads for their length, or index the length of the roads and then let the system check which of the roads of the required length lie in Tippecanoe county. Each option is suboptimal, and this work develops the basis³ for a better way by indexing geometry and domain-specific information (length in the example, but in general it may be any kind of domain-specific information) together in the same index structure. Other applications for this method include determining all cities in a given country that have between 10000 and 30000 inhabitants or retrieving all fields in a given area that have a pH-value higher than a particular value.
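One way to picture indexing geometry and a domain-specific attribute together is to treat the attribute as an additional dimension of the key, so that a single box test covers both conditions. The Python sketch below illustrates only this idea; it is an assumption made for illustration and not the index structure developed in chapter 8, and the function names and sample values are hypothetical.

```python
def box_intersects(a, b):
    """a and b are sequences of (low, high) intervals, one per dimension."""
    return all(alo <= bhi and blo <= ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))


def combined_key(mbr, length):
    """Build a three-dimensional key from a 2-D bounding box and a length."""
    xmin, ymin, xmax, ymax = mbr
    # The length is a single value, stored as a degenerate interval.
    return ((xmin, xmax), (ymin, ymax), (length, length))


# Hypothetical roads: bounding box in map units, length in yards.
entries = {
    "Short Ln": combined_key((1, 1, 2, 2), 300.0),
    "Mid Rd": combined_key((1, 1, 3, 3), 900.0),
    "Far Rd": combined_key((50, 50, 60, 60), 1000.0),
}

# Query: inside the window [0,10] x [0,10] AND length between 500 and 1500.
query = ((0, 10), (0, 10), (500.0, 1500.0))
hits = sorted(name for name, key in entries.items()
              if box_intersects(key, query))
print(hits)
```

With such a key a tree-based index can discard entries that fail either condition during the same traversal, instead of retrieving candidates by one criterion and checking the other afterwards.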

Closely related to spatial information is temporal information. Consider new streets being built that replace other streets, or country boundaries being changed. This kind of information includes time-varying spatial information. Therefore efficient management of spatio-temporal data is almost as important as the problems described above. This work will also show how this problem may be solved in the case where geometries change discretely (as opposed to continuously). The basic idea is to extend the spatial data management to spatio-temporal data and reuse as much as possible. The issue of spatio-temporal data will become increasingly important in applications in the future because the longer computer-based data management exists, the more information changing over time will occur.

¹ The server is a recent dual-processor database server with 1 GB main memory running Oracle 8i/9i under Linux.

² If no additional query facilities are required the time can even be reduced to about 2 seconds. Details can be found in chapter 8.3.2.

³ Experiments were carried out on data in the temporal domain with additional domain-specific information, see chapter 8.3.1.

Figure 1.2: Map of rivers and roads in Tippecanoe county

Figure 1.3: Map of rivers and roads in Lafayette, Indiana

The area of managing pure temporal information (without spatial) also greatly benefits from using modern database technology. It will be shown how improved modeling at all modeling stages may be used to efficiently manage temporal information and answer queries of different types effectively.

All work presented here has been carried out with applications in mind. Consequently two extensive case studies, one from an official administrative domain (managing governmental cartographic base data) and one from a scientific domain (physical geography), are presented before going into technical details. To show that the technical part is still beneficial for the applications and is not for theoretical benefit only, another review of the use of the methods proposed for the applications at hand is presented towards the end of the work. Applications using the previously explained kind of data will be called STOSTA in this work (spatial, temporal or spatio-temporal applications).

This work is organized in three parts: the first part reviews basic concepts from the literature and should be sufficient to understand the rest of the work. In particular basic database system concepts are reviewed in chapter 2 before special aspects of databases for spatial (chapter 3), temporal (chapter 4) and spatio-temporal data (chapter 5) are presented. All important concepts for the remainder of this work are explained in detail and references are given to other publications in the respective areas for the interested reader. The second part presents all new concepts that were developed during our work. It starts with two motivating case studies in chapter 6 and is followed by a chapter reviewing conceptual and logical modeling in the presence of object-relational database systems and the domains as mentioned before. As a main contribution, proposals for physical modeling of the newly proposed data types in ORDBS are presented in chapter 8. Many different indexing options for the data types are compared by extensive experiments and the optimal index structures for different operators are determined. The second part closes with ideas to improve physical design for other operators as well as more advanced physical optimizations such as user-defined cost and selectivity estimation. Some of these ideas have not been implemented yet either due to time constraints or due to technical problems that are still present in ORDBS at the time of writing. In the last part applications making use of the newly proposed modeling methods are considered and some prototype implementations are presented in chapter 9. The presentation of more advanced applications leads to a summary of the work and an outlook for future work remaining in this area. An extensive bibliography is also provided for the interested reader.

Chapter 2

Object-Relational Database Systems

The need to store electronic data persistently is almost as old as computers themselves. At least it has existed since computing machines were first employed for useful purposes. Therefore the term database system can be used in a wide sense for any system that is capable of storing data persistently. Over the decades requirements and facilities for persistent storage have changed dramatically. We will present a possible classification which is due to [SB99] in the next section. In essence this classification will show that object-relational database systems, which are currently in the beginning stages of being used in scientific and commercial applications, are the fastest-growing kind of system at the moment. They will probably be used in many areas in the near future and therefore deserve and demand an in-depth treatment, both theoretically and practically. With that in mind they will also be used and evaluated extensively on spatial, temporal and spatio-temporal data in this work. A description of object-relational systems can be found in the remaining sections of this chapter.

2.1 Types of Database Systems

According to [SB99] database systems can be classified by two orthogonal features: the first is whether they store simple or complex data, and the second is whether or not they support arbitrary querying of the stored data. By displaying these two requirements in a two-dimensional coordinate system we obtain four different types of persistent storage systems as depicted in figure 2.1.

In addition to the requirements, the figure shows a name for each type of system and the projected relative importance of this class in commercial applications in the year 2004 in the opinion of [SB99]. In the next paragraphs each of the matrix entries will be described briefly by stating possible applications.

File System Storage

In the case where only data of simple types like integers or characters need to be stored persistently and no query facilities are required because one set of data is always read or written as a whole, the best system to choose is a standard operating system's file system. Example applications for this class include simple text editors (where buffers need to be stored as a whole) or video-on-demand systems (where large files containing video data are read and then transferred to the customer as a whole). For these applications there is absolutely no need to use a different storage system, since current operating systems are fast, robust and do not incur any overhead for these simple tasks.

Figure 2.1: Classifying database systems ([SB99]). Two-by-two matrix with the axes simple data / complex data and simple queries / complex queries: Q I File System, Q II Relational DBMS, Q III Persistent Language (Object-Oriented DBMS), Q IV Object-Relational DBMS. Relative importance values shown: 1, 100 (Relational DBMS), 150 (Object-Relational DBMS).

Relational Systems

In quadrant II in figure 2.1 we find applications that also require persistent storage of simple data, but with the need for arbitrary querying of this data. A well known example for these applications is a database storing information about employees of a company and departments of that company. This information may be captured in a relational database schema consisting of two relations which are defined in SQL by:

    CREATE TABLE emp (
        empno    NUMBER(4),
        ename    VARCHAR2(10),
        job      VARCHAR2(9),
        manager  NUMBER(4),
        hiredate DATE,
        salary   NUMBER(7,2),
        deptno   NUMBER(2)
    );

    CREATE TABLE dept (
        deptno   NUMBER(2),
        dname    VARCHAR2(14),
        location VARCHAR2(13)
    );

Data stored in tables like this can be queried by SQL queries arbitrarily:

1. Find names of employees who were hired after 1995 and who make more than 10000.

    SELECT ename
    FROM emp
    WHERE hiredate > TO_DATE('31-12-1995', 'DD-MM-YYYY')
      AND salary > 10000;

2. Find names of employees who work in Chicago and make more than 10000.

    SELECT ename
    FROM emp e, dept d
    WHERE e.deptno = d.deptno
      AND d.location = 'Chicago'
      AND e.salary > 10000;

3. Find the average and maximum salary of managers.

    SELECT avg(salary), max(salary)
    FROM emp
    WHERE job = 'Manager';
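The three queries can be tried out with the following self-contained Python sketch. It uses SQLite through the standard sqlite3 module as a stand-in for the Oracle system used in this work, so the column types are adapted, dates are stored as ISO strings, and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, location TEXT);
CREATE TABLE emp (
    empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, manager INTEGER,
    hiredate TEXT, salary REAL, deptno INTEGER REFERENCES dept(deptno));
INSERT INTO dept VALUES (10, 'Sales', 'Chicago'), (20, 'Research', 'Dallas');
INSERT INTO emp VALUES
    (1, 'King',  'Manager', NULL, '1994-06-01', 15000, 10),
    (2, 'Adams', 'Clerk',   1,    '1997-03-15', 12000, 10),
    (3, 'Ford',  'Manager', 1,    '1998-09-01', 11000, 20),
    (4, 'Smith', 'Clerk',   3,    '1999-01-20',  8000, 20);
""")

# 1. Hired after 1995 and earning more than 10000.
#    ISO date strings compare correctly as plain text.
q1 = [row[0] for row in conn.execute(
    "SELECT ename FROM emp "
    "WHERE hiredate > '1995-12-31' AND salary > 10000 ORDER BY ename")]

# 2. Working in Chicago and earning more than 10000 (join of both tables).
q2 = [row[0] for row in conn.execute(
    "SELECT e.ename FROM emp e, dept d "
    "WHERE e.deptno = d.deptno AND d.location = 'Chicago' "
    "AND e.salary > 10000 ORDER BY e.ename")]

# 3. Average and maximum salary of managers (aggregation).
q3 = conn.execute(
    "SELECT avg(salary), max(salary) FROM emp WHERE job = 'Manager'").fetchone()

print(q1, q2, q3)
```

The ORDER BY clauses are added only to make the output order deterministic; they are not part of the queries in the text.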

One of the main differences facilitating queries is that data is structured by the definition of tables and is not treated as a black box as in files. Relational database systems are extremely popular in commercial applications and appear in almost any company, usually in systems by different vendors and on different architectures. These systems are well-established and almost perfectly optimized for performance with regard to high throughput. Other features include many sophisticated client tools and user interfaces, concurrency control, consistency checks, transactions, data security, backup and recovery facilities. A detailed description of such systems can be found in any standard textbook on database systems such as [SKS97, EN00]. Moreover relational systems are based on a solid theoretical foundation.

Persistent Languages

In quadrant III of figure 2.1 we find applications that need to store complex data persistently but that require only very basic querying facilities on the data since data is always read and written as a whole. The video-on-demand system mentioned previously would be an example for this class if more sophisticated manipulation of videos is required before they are transferred. Since usually such systems are based on programs in object-oriented programming languages that need to persistently store some of their objects they are also called persistent language systems. For a better illustration of these systems consider a room access system that stores fingerprints of employees who should have access to certain rooms only. The system is running continuously except for certain maintenance periods. Since access control needs to work fast, a sophisticated tree structure in main memory is required to check if a given fingerprint is allowed to access the room. Since it is very costly to set up this structure it should be stored on disk prior to maintenance periods and be read from there thereafter. The system information can also be stored in two tables:

    CREATE TABLE room (
        number   NUMBER(4),
        building VARCHAR2(10),
        allowed  SET-OF(employee)
    );

    CREATE TABLE employee (
        name     VARCHAR2(15),
        fngrprnt IMAGE
    );

Clearly some of these data types are much more complex than in the example for relational systems. For the persistency part there is no querying required since the tree for all fingerprints and access information is to be stored as a whole. This task can best be solved by a persistent programming language: the language the program is written in should offer some way to state that all current information is to be stored on disk for complete rereading after restarting the application. Some programming languages, especially newer ones like Java, offer simple constructs for these purposes, but they facilitate only complete storage of datasets between sessions and do not offer memory control at runtime. Since they are not very comfortable to use and provide only basic features, object-oriented database systems, whose task is to persistently store complex data such as objects of object-oriented programming languages, have been developed recently. These systems still occupy a market niche but are improving consistently, especially by adding support for querying facilities, which moves recent versions of them into quadrant IV of figure 2.1. Commercial products in this area include Objectivity, Versant, ObjectStore/Excelon and Tamino (specialized on XML documents as objects), but they are not discussed here any further since they currently serve only a small market.
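Storing a complete in-memory structure between sessions, as described above, can be sketched in a few lines. The example uses Python's standard pickle module as a stand-in for the persistence constructs mentioned in the text, and a small dictionary as a stand-in for the fingerprint tree; both choices and all names are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for the costly in-memory structure: room -> allowed fingerprints.
access_tree = {
    "room 101": {"fp-alice", "fp-bob"},
    "room 202": {"fp-alice"},
}


def save(structure, path):
    """Write the complete structure to disk, e.g. before maintenance."""
    with open(path, "wb") as f:
        pickle.dump(structure, f)


def load(path):
    """Read the complete structure back after a restart."""
    with open(path, "rb") as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "access.pickle")
save(access_tree, path)
restored = load(path)
print(restored == access_tree)
```

The structure is written and read only as a whole: there is no way to ask the file for a single room without loading everything, which is exactly the limitation that distinguishes this class of systems from those supporting queries.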

Object-Relational Systems

Finally quadrant IV of figure 2.1 contains applications that require persistent storage of complex data but also the possibility to query this data arbitrarily and efficiently. This class of applications is also illustrated by an example. Consider an environmental organization managing digital photographs of biotopes in a certain country as well as a list of villages and cities in that country. Tables storing this kind of data can be created using the following SQL statements:

CREATE TABLE biotope (
  id       NUMBER(6),
  date     DATE,
  caption  TEXT,
  location POINT,
  pic      IMAGE)

CREATE TABLE village (
  name  VARCHAR2(30),
  shape POLYGON)

Since there are many biotopes and villages to be stored and an internet-based information system for citizens is required, classical database features such as multi-user support, data security and efficient querying are required. Also it is clear that simple data types are not sufficient for these applications as documented by the types TEXT, IMAGE, POINT and POLYGON in the tables above. Imagine a new chemical plant is to be built near the village of Isernhagen and the environmental organization wants to prevent the building of the plant. Their strategy is to present photographs of some nice lakes within a forest in one of the biotopes near Isernhagen that would be destroyed by this building. Since decision makers plan to decide immediately they need to act fast in order to be successful. They would need a system that is capable of answering the following query quickly:

SELECT id, caption, pic
FROM biotope b, village v
WHERE lake(b.pic) > 0.8
  AND contains(b.caption, 'forest')
  AND distance(b.location, v.shape) < 10000
  AND v.name = 'Isernhagen'

This query retrieves all biotope pictures, together with their id and caption, that contain a lake with probability higher than 80 percent, have the word forest in their caption and are within 10 km of Isernhagen. Again several features not supported by traditional database systems appear: user-defined functions and operators on the complex data. For instance the function lake applied to a picture returns the probability that the given picture contains a lake. Moreover it would be nice to have a client tool that immediately displays the queried pictures graphically. Finally efficient retrieval of such data requires advanced index structures as well as query optimizers. All these features are specific to object-relational database systems, which will be described in further detail in section 2.3.

Universal Systems

Finally a term should be mentioned that is often used for systems that have requirements from quadrants II and IV: these systems are usually called universal systems. That is due to their almost universal functionality regarding database systems. They combine the advantages of classical relational databases such as good performance and security with the flexibility of object-relational systems. Since most of the commercial and non-commercial vendors of traditional relational systems have changed or are about to change their systems to object-relational databases these systems can be expected to become universal systems. All the advantages of the relational system are still present while adding object-relational functionality over time.

In contrast, systems evolving from quadrant III to IV can only preserve their advantages on complex data. Query support needs to be added completely from scratch; this feature seems much more difficult to add than support for complex data when evolving from quadrant II to IV. While a new design always offers some chance of a better result, one should be pessimistic about how fast, efficient and reliable such products will be on the market. This is another reason why this work focuses on object-relational databases that stem from relational systems as opposed to ones that have their foundation in object-oriented systems.

2.2 Relational Database Systems

In this section we want to give a brief introduction to query optimization in relational database systems only. Details about the theoretical background can be found in any standard database textbook such as [SKS97] or [EN00].

Query Optimization

Queries in a relational database system can be evaluated by the system in many possible ways. If for example we pose the second query from the example on page 11, the optimizer has to choose among several options on how to answer the query as fast as possible. The basic operations to consider are the selections on emp and dept as well as the join on the two tables. The optimizer¹ will generate a heuristic subset of all possible execution plans and estimate their cost by computing

$$\mathit{cost} = E(\#\,\text{records examined}) + \text{read factor} \cdot E(\#\,\text{pages read})$$

where E means the expected value² and then choose the plan with minimum expected cost for execution. In the example above it will perform the selections first and then compute the join since joins are usually considered more expensive than selections because of their I/O intensity. The number of records examined can only be estimated and is a measure for CPU time since relational systems do not offer computationally intensive functions on data; this will have to be modified in ORDBS where user-defined operations on data may be computationally intensive. The number of pages read is also only estimated while the read factor is a system dependent constant measuring how much longer an I/O operation takes compared to operations in main memory.

¹ Most optimizers in relational systems work more or less the same in principle, following rules introduced in [SAC+79].

² Since the size of the query result is not known in advance, estimates (denoted by E) of these values have to be used instead. Selectivity estimation is an important topic of its own, especially in object-relational database systems, and will be treated later.

Within the selections there are also different options to choose from: the restriction on salary can be computed by a sequential scan on relation emp or by using an index scan of a B-Tree index on salary (if one exists). In the first case the selection has the following cost, if n records are in emp stored on N disk pages:

$$\mathit{cost}_{\text{sequential scan}} = n + \text{read factor} \cdot N$$

On the other hand, if an index scan on a clustered index is used and k records have to be retrieved (according to the selection criterion), we obtain:

$$\mathit{cost}_{\text{index scan}} = k + \text{read factor} \cdot \frac{k}{\text{records per page}}$$

In the case of an unclustered index the cost would be $k + \text{read factor} \cdot k$. In any case $\mathit{cost}_{\text{index scan}}$ could be more or less than $\mathit{cost}_{\text{sequential scan}}$ depending on n, N and k. The quantity k of how many records satisfy the selection criterion can be estimated by the optimizer very well, if statistics on the table have been computed. In the example a histogram on the salary values present in the table could be analyzed and a precise estimate could be drawn. If no statistics are present only a heuristic estimate independent of the actual criterion can be used, leading to a potentially very inaccurate estimate.
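The access path formulas above can be evaluated directly. The following Python sketch is only an illustration of that arithmetic: the function names are hypothetical, the number of records per page is assumed to be n / N, and the cost of traversing the index itself is ignored as in the formulas.

```python
def cost_sequential_scan(n, N, read_factor):
    """All n records are examined and all N pages are read."""
    return n + read_factor * N


def cost_index_scan(k, read_factor, records_per_page=None):
    """Cost of fetching k matching records through an index.

    With a clustered index (records_per_page given) the k records are
    stored next to each other; without it every record may need its
    own page access.
    """
    if records_per_page is None:  # unclustered index
        return k + read_factor * k
    return k + read_factor * k / records_per_page


def cheaper_access_path(n, N, k, read_factor, clustered):
    """Return the name of the cheaper plan for the given estimates."""
    # Assumption: records are spread evenly, so a page holds n / N records.
    records_per_page = n / N if clustered else None
    seq = cost_sequential_scan(n, N, read_factor)
    idx = cost_index_scan(k, read_factor, records_per_page)
    return "index scan" if idx < seq else "sequential scan"
```

With hypothetical values n = 100000, N = 1000 and a read factor of 50, an unclustered index pays off for k = 100 but not for k = 50000, whereas a clustered index is still cheaper than the sequential scan in the latter case.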

To process the join of the two tables after the selections have been computed, there are also different techniques that could be used. In the nested loops technique one of the tables is chosen as outer table. Each record resulting from the selection on the outer table is retrieved and then all rows matching the join criterion from the other (inner) table are fetched. This is done iteratively for all result rows from the outer table. We obtain:

$$\mathit{cost}_{\text{nested loops join}} = n + \text{read factor} \cdot N + n \cdot \mathit{cost}_{\text{query inner table}}$$

The cost of the query to the inner table can be estimated as described for selections above. For the given example query with a 2-way join either emp or dept can be chosen as outer table, resulting in two estimates to be computed.

If using a merge join the two participating tables are sorted on the join field (if they are not already in sorted order, as is the case e. g. after an index scan) and then matching records are merged. The CPU cost of the sort is proportional to n log n and the number of pages inspected is similarly proportional to N log N. The cost of the merge operation is linear for both CPU cost and number of pages read. Altogether the cost for a merge join is:

$$\mathit{cost}_{\text{merge join}} = c_1 \cdot (n \log n + m \log m) + c_2 \cdot (N \log N + M \log M) + n + m + \text{read factor} \cdot (N + M)$$

if both tables have to be sorted first³. The numbers c1 and c2 are constants depending on the sorting algorithm (c2 may also depend on the read factor). This strategy is very efficient, if one or both tables are already sorted, e. g. after a selection with a clustered index scan.

As the last alternative considered here the optimizer may also choose to use a hash join. In this strategy one table is hashed on the join field into a hash table of H buckets. After that the second table is read sequentially and for each row the corresponding hash bucket is checked for join candidates. This way we obtain the following total cost:

$$\mathit{cost}_{\text{hash join}} = n + \frac{n \cdot m}{H} + \text{read factor} \cdot (N + M).$$

In the computation we assumed that all H hash buckets fit into main memory, n is the size of the hashed relation and m is the size of the relation read sequentially. If not all buckets fit into main memory more sophisticated algorithms have to be used, see e. g. [DGS+90, GMUW00]. This cost can be smaller or larger than $\mathit{cost}_{\text{merge join}}$ depending on the parameters used. Moreover it may be used with either relation as hash relation.
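To compare the three join strategies for concrete estimates, the cost formulas of this section can be written down as plain functions. This is a sketch under stated assumptions: the function names are hypothetical, the logarithm is taken to base 2 (the base is absorbed by the constants c1 and c2), and both inputs of the merge join are assumed to be unsorted.

```python
import math


def cost_nested_loops_join(n, N, cost_query_inner_table, read_factor):
    """Outer table: n records on N pages; one inner query per outer record."""
    return n + read_factor * N + n * cost_query_inner_table


def cost_merge_join(n, N, m, M, read_factor, c1=1.0, c2=1.0):
    """Both tables are sorted first, then merged in one linear pass."""
    sort_cpu = c1 * (n * math.log2(n) + m * math.log2(m))
    sort_io = c2 * (N * math.log2(N) + M * math.log2(M))
    merge = n + m + read_factor * (N + M)
    return sort_cpu + sort_io + merge


def cost_hash_join(n, N, m, M, H, read_factor):
    """Hash n records into H in-memory buckets, probe with m records."""
    return n + n * m / H + read_factor * (N + M)
```

For example, with hypothetical sizes n = m = 1024 records on N = M = 16 pages, a read factor of 50 and c1 = c2 = 1, the merge join estimate is 24256, while a hash join with H = 256 buckets is estimated at 1024 + 4096 + 1600 = 6720.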

An important restriction for merge and hash joins is that they can be used for equi- and non-equi-joins (i. e. joins with comparison operator θ ≡ = or θ ≡ ≠) only. All other joins on standard types must be computed by using the nested loops technique. If joins of more than two tables are requested the optimizer additionally needs to compute expected costs for any possible join order. In the process the size of a join result on a previous stage has to be estimated in order to compute input sizes of the current join. This is a potentially very inaccurate estimate since it uses other estimates on lower levels and in addition estimates the join selectivity of the previous joins. Therefore the execution of complex joins is very difficult to optimize in general and optimizer hints may be required. In general optimizer hints given by the user tell the DBS which execution plan to use for query evaluation, e. g. by giving names of indexes to be used.
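How estimated result sizes feed from one join stage into the next can be illustrated with a small Python sketch. It is not the algorithm of any particular optimizer: the table names, sizes and join selectivities are hypothetical, only left-deep join orders are enumerated, and the sum of the intermediate result sizes is used as a stand-in for the real cost formulas.

```python
from itertools import permutations


def intermediate_sizes(order, sizes, join_selectivity):
    """Estimated result size after each join step of a left-deep plan."""
    joined = [order[0]]
    size = sizes[order[0]]
    result = []
    for table in order[1:]:
        size *= sizes[table]
        # Apply every join predicate linking the new table to a joined one;
        # pairs without a predicate form a cross product (selectivity 1).
        for other in joined:
            size *= join_selectivity.get(frozenset((table, other)), 1.0)
        joined.append(table)
        result.append(size)
    return result


def best_join_order(sizes, join_selectivity):
    """Pick the order with the smallest sum of intermediate result sizes."""
    return min(
        permutations(sizes),
        key=lambda order: sum(intermediate_sizes(order, sizes, join_selectivity)),
    )


sizes = {"emp": 10000, "dept": 100, "proj": 1000}
join_selectivity = {
    frozenset(("emp", "dept")): 0.01,
    frozenset(("emp", "proj")): 0.0001,
}
best = best_join_order(sizes, join_selectivity)
```

Every size after the first step is a product of earlier estimates, so an error in one selectivity propagates to all later stages, which is the inaccuracy described above.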

2.3 Object-Relational Systems

Some advanced database applications require advanced database systems (cf. [Dat00, EN00]) in order to be useful as illustrated in section 2.1. Therefore general requirements and features of object-relational database systems (ORDBS) are described briefly in section 2.3.1 (for details see [Sar98, DD98, SB99, CZ01]). Comments on the fulfillment in commercial systems currently available can be found in section 2.3.3.

2.3.1 General Issues

Object-relational databases have been described in several books, including [Sar98], [DD98] and [CZ01]. While most descriptions of ORDBS are similar, we will follow the books by Stonebraker ([SB99] and [SM96]) in this work. Information about querying ORDBS can be found in SQL3 which is described e. g. in [For99].

³ Assuming n records in the first table on N pages and m records in the second on M pages.

In the sequel key features that should be offered by systems termed object-relational are described and listed. The description is based on relational database systems and lists extensions to those. Nevertheless it is also possible to translate these requirements to similar ones for other types of DBS (e. g. object-oriented DBS) as basis and define how they have to be extended to be called object-relational. The features explained in more detail will be used later in this work, whereas the others are only mentioned; details can be found in [SB99] for instance.

Complex Types

Complex types in an ORDBS are types that are composed of multiple base or user-defined types. Depending on the particular characteristic they can be either row types (similar to records in traditional programming languages), collection types (sets of objects of simpler type) or references (similar to pointers in programming languages).

Row types combine attributes of different types into a type of a new name like records in procedural programming languages or like attributes of a class in the object-oriented sense.

For set types, also called collections, the requirement is that for every type T in the type system collection(T) is also a valid type, where collections can be (at least) sets, lists or multisets. As an example consider a data type to store small text fragments. It may be defined as a list of words as follows:

CREATE TYPE word AS LIST OF CHAR;
CREATE TYPE smallText AS LIST OF word;

It is important to note that support for collections of collections as well as collections of references is definitely required as can be seen in the above example.

Like all other user-defined types collection types have user-defined functions operating on them. For texts a function checking whether a certain word is contained in a text is helpful for searching:

CREATE FUNCTION contains (smallText txt, word wrd)
RETURNS BOOLEAN AS
  FOR i = 1 TO txt.length() DO
    IF txt(i) = wrd THEN RETURN true;
  RETURN false;

CREATE OPERATOR BINDING textContains TO contains;

Since it is not very efficient to search every single word in every text in a large database in order to retrieve the desired texts, user-defined access methods supporting the operator textContains are required and will be illustrated in section 2.3.2.

References are similar to pointers in programming languages and should be typed, i. e. a reference to a certain object must point to an object of the declared type. In general references to objects and to collections of objects should be supported. Sometimes references to base data types may be required but in general the use of such seems questionable.

Between the different complex types and tables conversion routines for standard conversions are required. The operator TABLE applied to a collection of a row type should result in a relational table for instance. Other conversion operators include REF and DEREF to switch between references and the object referenced.

User-defined Query Optimization

A traditional relational optimizer has to be extended by a plethora of additional functions to become a good object-relational optimizer. In [SB99] many of those are explained. In the following paragraph only the most important functions for this work are explained in detail. All other requirements will only be mentioned briefly.

The first important extension is that the B-Tree index built into relational databases has to be made generic. That means that it must be possible to use it on user-defined types together with user-defined operators. Moreover it should be possible to index values of user-defined functions on data that have results of standard types to be indexed by B-Trees. For instance the table biotope on page 13 should be indexable by the system built-in B-Tree on the brightness of the stored image (which is a float value and thus can be indexed by a B-Tree).

CREATE INDEX idx_biotope_bright ON biotope(pic)
USING b-tree (brightness(pic));

A similar technique would be required to index the distance of the location attribute from a fixed point (e. g. the company headquarters). In addition the generic B-Tree should also operate on user-defined comparison operators: if user-defined types are indexed the operators used on the data will probably also be non-standard operators.

Since the ORDBS cannot know about the selectivity of user-defined operators it must be possible to specify user-defined selectivity functions. In particular for each user-defined function it should be possible to specify a corresponding selectivity function that returns an estimate of the selectivity given the comparison operator used and a constant value with which the attribute is compared. These functions may range from simple estimates to complex histogram analysis functions. For example the operator brightness could be assisted by a histogram storing how many pictures in the table fall into each ten percent brightness range. To estimate the selectivity of a call to brightness for pictures with a value smaller than 40% the numbers of pictures in the first four ranges are summed up and divided by the total number of pictures in the table. This yields the selectivity estimate for the above call of the operator.
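The histogram-based estimate for brightness can be sketched in a few lines of Python. The function name and the bucket counts are hypothetical; thresholds are restricted to bucket boundaries because the text only describes summing whole ranges.

```python
def selectivity_less_than(histogram, threshold_percent, bucket_width=10):
    """Estimate the fraction of rows with brightness below the threshold.

    histogram[i] holds the number of pictures whose brightness lies in
    the i-th range of bucket_width percent.
    """
    if threshold_percent % bucket_width != 0:
        raise ValueError("threshold must lie on a bucket boundary")
    total = sum(histogram)
    if total == 0:
        return 0.0
    full_buckets = threshold_percent // bucket_width
    return sum(histogram[:full_buckets]) / total


# Hypothetical counts for the ranges 0-10%, 10-20%, ..., 90-100%.
brightness_histogram = [5, 10, 20, 15, 20, 10, 10, 5, 3, 2]
estimate = selectivity_less_than(brightness_histogram, 40)
```

Here the first four ranges hold 50 of the 100 pictures, so the estimated selectivity of brightness(pic) < 0.4 is 0.5.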

Histograms may actually require the use of rules during database updates to provide for efficient access to the required information. For example a two-dimensional histogram could be used to estimate how many points are to the west or north of a given point. This histogram needs to be updated frequently in dynamic databases. Therefore the system may even require user-defined statistics to be computed wherever necessary.

Classical relational systems usually execute selections in arbitrary order in query processing under the assumption that all selections are equally fast to compute⁴. In the presence of user-defined functions in ORDBS which may be computationally expensive in terms of CPU cost the assumption no longer holds in general⁵. Consider for instance the following query posed on the table from page 13:

SELECT id, caption
FROM biotope
WHERE date BETWEEN '01-01-1999' AND '30-06-1999'
  AND brightness(pic) > 0.5;

If the first clause to be evaluated on all rows is the computation of the brightness of all images, which is computationally expensive compared to checking if the date is within the given range, the query performance will be poor. If on the other hand the date restriction is checked first, the expensive brightness computation will only be carried out for few images which leads to much better query performance. A good object-relational optimizer should automatically decide on which predicate to check first. Perhaps it is necessary that the user specifies some kind of estimate on how expensive a user-defined function is to assist the system. This information may consist of the CPU cost as well as the I/O cost which may be smaller than that for reading the whole argument since functions may sometimes be computed on only a part of the whole argument.
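The effect of predicate order can be quantified with a small sketch. It assumes that predicates are independent and that evaluation stops at the first predicate a row fails; the per-row costs and selectivities below are hypothetical values, not measurements.

```python
from itertools import permutations


def expected_cost_per_row(predicates):
    """Expected evaluation cost for one row.

    predicates is a sequence of (name, cost, selectivity) tuples that
    are checked in the given order.
    """
    cost = 0.0
    pass_probability = 1.0
    for _name, predicate_cost, selectivity in predicates:
        # A predicate is only evaluated if all earlier ones were passed.
        cost += pass_probability * predicate_cost
        pass_probability *= selectivity
    return cost


def best_predicate_order(predicates):
    """Try all orders and return the names in the cheapest one."""
    best = min(permutations(predicates), key=expected_cost_per_row)
    return [name for name, _cost, _selectivity in best]


date_check = ("date", 1, 0.1)                # cheap, keeps 10% of the rows
brightness_check = ("brightness", 1000, 0.5)  # expensive image function
```

Checking the date first costs 1 + 0.1 · 1000 = 101 units per row on average, whereas computing brightness first costs 1000 + 0.5 · 1 = 1000.5 units.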

The problem of expensive functions carries over to join processing as well. As described in section 2.2 relational optimizers use the heuristic to process selections before joins. This is good enough for traditional systems since selections can be executed fast on standard data types. This is however not true anymore in the presence of expensive functions. The query

SELECT v.name, b.date
FROM village v, biotope b
WHERE v.name = b.vname
  AND area(v.shape) > 10000
  AND brightness(b.pic) > 0.5;

for instance is not processed efficiently this way. Here the join over the standard attribute name should be computed first and then the expensive selections involving user-defined functions should be processed. This technique is called predicate migration. However, it is not true in general that user-defined predicates should be processed after joins. A formal analysis of when to use predicate migration can be found in [Hel98]. The algorithms presented there should be included in ORDBMS.

Furthermore it is important for querying user-defined types that user-defined operators may be defined to be used in WHERE-clauses. User-defined operators are operations on user-defined types whose execution may be assisted by special indexes designed particularly for these operators. Since such user-defined types and operators are typically used in certain application domains only, specialized user-defined indexes are also called domain indexes. The same operator can be assisted by a certain domain indextype⁶ but still be used on different data types. Consequently the operator has to be bound to (possibly multiple) functions. When executing a query the function with matching signature is chosen for execution. In practice usually a function capturing the semantics of the operator to be defined is written and then the operator binding to this function is defined. This important extension will play a major role in the remainder of this work. Obviously for queries that require the efficient computation of e. g. spatial overlap, a standard one-dimensional B-Tree index is not sufficient. The user must be able to specify a set of functions for index creation, update, deletion as well as start, step and closing of an index scan. Moreover the operators and data types that the index may be used for also have to be specified. Access methods for spatial, temporal and spatio-temporal domains and their exemplary implementation will be described in sections 3.3 and 4.3 as well as in part II of this work; section 2.3.2 explains how user-defined query optimization can be implemented in ORDBS. In addition to the basic functionality classic database issues such as locking, recovery and page management should also be addressed by a user-defined access method.

⁴ They may take expected selectivity into account if statistics are present: they would execute the selection with smaller selectivity first.

⁵ The execution cost for a function may also be more important than selectivity considerations.

Other object-relational features that do not need further explanation include external function implementation and the possibility for database callbacks in functions. Object-relational features that are either not used in this work or not supported by current ORDBS implementations include (for details see [SB99]): inline storage of collection types, base type extension, user-defined aggregates, types of arbitrary length, functional notation for operators, user-defined negator and commutator operators, flattening of complex object queries, indexing of attributes of collections, data and function inheritance, multiple inheritance and efficient support of joins on inheritance hierarchies.

Formal Definition

Relational systems have a solid theoretical foundation. Since object-relational database systems are a relatively new concept, as of now no widely accepted formal specification of object-relational databases can be found. It is beyond the scope of this work to provide such a definition. Nevertheless some comments on how to extend the theoretical framework of relational systems seem appropriate.

The set D of valid domains for the system has to be extended. Moreover this set is not static for a particular system since users can define their own types. These user-defined types need to be added to D. Extension to subtypes of standard types and to row types is pretty straightforward (considering that row types only become important when used in a table). For collection and reference types it is much more difficult since one obtains infinitely many domains. All other components of relational systems like constraints and manipulation semantics also have to be extended substantially. The extension is relatively easy as long as objects can be flattened like row types of simple types in tables and is very difficult when this is no longer the case. It will probably take some more time until a real solid foundation for ORDBS is available which could be similar to ideas in the theory of NF2 relations (see e. g. [JS82]).

⁶ An indextype is given by the particular index structure to be used and its specifications.

The remainder of this work will use a hybrid approach of specification of ORDBMS: in concrete applications and implementations the features offered by currently available ORDBMS (mostly Oracle™) will be used as foundation. In more theoretical or abstract sections the ideal object-relational system as described in [SB99] will be assumed. This has the advantage of being able to implement and evaluate certain extensions on the one hand while still providing more general and durable models that could be implemented differently once the advanced features become available on the other hand.

2.3.2 Implementing Extensible Query Optimization

In this section the features for extensible query optimization of the Oracle 9i object-relational database system will be briefly described. While a few technical aspects may be system-dependent, the general principles of extensible object-relational optimizers will be the same regardless of the particular system used as long as it is an ORDBS.

As explained in the previous section user-defined data types in ORDBS need data type-specific query optimization to be used efficiently. In Oracle this is implemented by providing special interfaces for extensible optimization. Functions defined in these interfaces are called automatically by the database server if an implementation of the interface for the particular data type is registered with the server. This way extensible optimization is as directly integrated into the database server as possible for arbitrary data types. The interfaces of the extensible optimizer are described in detail in [Ora99a]. The most important interfaces for this work are the ODCIIndex interface for defining data type-specific indexes and the ODCIStats interface to specify data type-specific estimation of cost and selectivity. As an example in this section consider the data type smallText with operator textContains from the previous section.

User-Defined Indexing

Implementations of the ODCIIndex interface have to be registered with the server for each particular operator that they support:

CREATE INDEXTYPE textContainBTree
FOR textContains(smallText, word) USING contains;

The last part of this definition gives the functional implementation of the operator; this implementation is used by the server if for any reason the index implementation is not chosen for execution. In that case the functional implementation is called for each row of the table in order to evaluate the operator.

The indextype textContainBTree must implement the ODCIIndex interface of the database. In the sequel the most important functions of that interface will be illustrated by using the above example on a table publication having a column id of type NUMBER and a column contents of type smallText. The indextype provides a B-Tree like indexing structure efficiently returning identifiers of documents containing a particular word.

A user-defined index is created by:

CREATE INDEX idx_contain ON publication(contents)
INDEXTYPE IS textContainBTree PARAMETERS('fanout=10');

On execution of this statement the database server automatically calls the function ODCIIndexCreate first. This function should create all structures necessary for using the domain index that are independent of a particular datum. The user can pass a parameter string to this routine in order to control specifics of the index creation. In the above example a table for storing the index contents would be created and meta information about the index would be stored. After that the server performs a call to ODCIIndexInsert for each row already present in the table (cf. figure 2.2). The same function is also called later for each row inserted. It performs the required operations to insert the information about this row into the index structure. Identification of the particular row works on a so-called ROWID which is passed to the routine; a ROWID is a unique identifier of a row in the database and is the fastest way to access a particular row in a database; it may be pictured like a pointer. In an object-based view it corresponds to an object identifier (OID) which uniquely identifies an object over its full lifetime.

[Figure 2.2: Simple sequence of cartridge operations over index lifetime. ODCIIndexCreate sets up the structures (index metadata and index information), ODCIIndexInsert reads and modifies them for each tuple, and ODCIIndexDrop deletes them.]

Users may now issue the following index-assisted query on the table of publications to retrieve all rows which contain the word index:

SELECT id FROM publication
WHERE textContains(contents, 'index') = true;

In this case the server first calls the function ODCIIndexStart to set up a scan of the user-defined index. This function is provided with the operator used, the range of desired operator results and the operator arguments. It has to set up everything that is necessary to execute an index scan. In particular it has to prepare a scan context which stores all information required from one call of ODCIIndexFetch to the next. After that the server calls ODCIIndexFetch to retrieve the next set of result rows. For efficiency this function is called repeatedly until all result rows are retrieved. The maximum number of rows retrieved for each call is passed as an argument. It is not passed any information about the operator, operator arguments or required result values. If these values are necessary for execution they have to be stored in the scan context by the ODCIIndexStart routine. After all results have been retrieved ODCIIndexClose is called to clean up processing of an operator by e. g. freeing memory used for a scan context (cf. figure 2.3).

[Figure 2.3: Sequence of cartridge operations in query execution. ODCIIndexStart sets up the scan context, ODCIIndexFetch reads and modifies it until all tuples are fetched, and ODCIIndexClose deletes it.]
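The division of labour between the index routines can be mimicked by a toy in-memory index. The Python class below is not the Oracle interface: the method names merely echo the roles of ODCIIndexCreate, ODCIIndexInsert, ODCIIndexStart, ODCIIndexFetch, ODCIIndexClose and ODCIIndexDrop, the word-to-row mapping is a simplification of the B-Tree like structure, and the function index_scan plays the part of the database server.

```python
class ToyTextIndex:
    """In-memory stand-in for a domain index on a smallText column."""

    def create(self, parameters=""):
        # Role of ODCIIndexCreate: set up datum-independent structures.
        self.metadata = {"parameters": parameters}
        self.postings = {}  # word -> list of row identifiers

    def insert(self, rowid, words):
        # Role of ODCIIndexInsert: called once per row.
        for word in set(words):
            self.postings.setdefault(word, []).append(rowid)

    def start(self, word):
        # Role of ODCIIndexStart: everything fetch needs later must be
        # placed in the scan context here.
        return {"rowids": self.postings.get(word, []), "position": 0}

    def fetch(self, context, max_rows):
        # Role of ODCIIndexFetch: sees only the context and the batch size.
        begin = context["position"]
        batch = context["rowids"][begin:begin + max_rows]
        context["position"] = begin + len(batch)
        return batch

    def close(self, context):
        # Role of ODCIIndexClose: release the scan context.
        context.clear()

    def drop(self):
        # Role of ODCIIndexDrop: remove index information and metadata.
        self.postings = {}
        self.metadata = {}


def index_scan(index, word, max_rows=2):
    """Drive one scan the way the server would: start, fetch*, close."""
    context = index.start(word)
    rows = []
    while True:
        batch = index.fetch(context, max_rows)
        if not batch:
            break
        rows.extend(batch)
    index.close(context)
    return rows
```

Because fetch receives no operator arguments, the searched word is resolved once in start and only the resulting row list and the current position travel in the context.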

If the index is not required anymore the user may issue a

DROP INDEX idx_contain;

statement. This leads to a call of ODCIIndexDrop which should remove all index information and metadata set up during creation (cf. figure 2.2). Depending on the particular index a modification of certain index parameters may be possible without a complete deletion and insertion of an index. In this case the ALTER INDEX statement can be used and the cartridge function associated with this statement is ODCIIndexAlter.

Similarly, when a row of an indexed table is modified, it may be more efficient to modify the existing index information than to delete and re-insert the corresponding index information. This task can be done by implementing ODCIIndexUpdate which is called automatically on each update of a row of an indexed table.

As with any PL/SQL function all these methods in an indextype may be implemented in either PL/SQL directly⁷ or in other programming languages such as C, C++ or Java. C and Java methods can be invoked directly from PL/SQL, whereas C++ methods have to be wrapped in a C function that is called from the database. Since indexing is usually time-critical, implementation in C is strongly recommended. With the callback facilities it is still possible to store index information inside the database and thus achieve all advantages of a DBS for the domain index also.

⁷ which is not very efficient for indexing large tables, see section 8.2.1


User-Defined Statistics

User-defined statistics may be beneficial for different user-defined schema objects. Thus statistics can be defined for table columns, functions, packages, types, indextypes or domain indexes. The definition is performed by implementing a user-defined type implementing the ODCIStats interface. Statistics for multiple related schema objects may be defined in the same statistics type which is useful for e. g. a data type, its methods and the corresponding indextype on this data type. After the definition the user-defined statistics have to be registered with the server. For the sample indextype in this section this works as follows (if myTCBTstats is the type implementing the interface):

ASSOCIATE STATISTICS WITH INDEXTYPE textContainBTree
USING myTCBTstats;

In a typical setting as described above the user would first issue statements to collect the user-defined statistics:

ANALYZE TABLE publication COMPUTE STATISTICS;
ANALYZE INDEX idx_contain COMPUTE STATISTICS;

In this case the server automatically calls the function ODCIStatsCollect for the particular schema objects and the required information should be assembled and stored appropriately by these user-defined methods. The statistics collected can either be stored in type variables or regular database tables accessible by other functions of this type operating on the statistics.

In the presence of user-defined statistics the aforementioned query

SELECT id FROM publication
WHERE textContains(contents, 'index') = true;

would be executed as follows by the optimizer. The extensible optimizer computes the cost of two query execution plans for the user-defined operator. One plan involves a full table scan and a call to the functional implementation of the operator for each row in the table; its cost is obtained from the ODCIStatsFunctionCost function. In the example the cost of the operator execution may be approximated by the number of publications in the table multiplied by the average number of words in a publication. Consequently these two measures would have been computed during statistics collection on the column.

The other plan involves retrieving all rows satisfying the operator by an index scan, which requires the functions ODCIStatsIndexCost and ODCIStatsSelectivity. In particular, invoking ODCIStatsSelectivity yields an estimate of the percentage of rows that will be retrieved when executing the operator. This is multiplied by the cost resulting from the call to ODCIStatsIndexCost, which computes the average cost of accessing a single row by an index scan8. The plan with the smaller expected cost is then chosen by the optimizer for execution. Since the execution plan is chosen during query execution, the three cost and selectivity functions involved should be implemented very efficiently: their execution time counts towards the total execution time of a query.

8 A previous version of Oracle used this function to compute the total cost of the operator execution with the selectivity passed as a parameter.

Finally, costs are specified in a special type ODCICost which has fields for CPU cost, I/O cost and network cost. Thus costs should be determined for each of these categories. The different components of cost will be combined by the server depending on the particular system used.
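The plan comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's actual algorithm: the class `Cost`, the weights in `combined` and all function names are hypothetical, and scaling the per-row index cost by the expected number of retrieved rows is one reading of the description above.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    """Three cost components as named in the text (hypothetical stand-in type)."""
    cpu: float = 0.0
    io: float = 0.0
    network: float = 0.0

    def combined(self, w_cpu=1.0, w_io=1.0, w_network=1.0):
        # Assumption: a weighted sum; the real combination is system dependent.
        return w_cpu * self.cpu + w_io * self.io + w_network * self.network


def function_plan_cost(num_rows, avg_words):
    """Full table scan: operator evaluated once per row (approximation from the text)."""
    return Cost(cpu=num_rows * avg_words)


def index_plan_cost(num_rows, selectivity_percent, cost_per_row):
    """Index scan: expected number of retrieved rows times the average per-row cost."""
    expected_rows = num_rows * selectivity_percent / 100.0
    return Cost(cpu=expected_rows * cost_per_row.cpu,
                io=expected_rows * cost_per_row.io,
                network=expected_rows * cost_per_row.network)


def choose_plan(num_rows, avg_words, selectivity_percent, cost_per_row):
    """Return the name of the plan with the smaller combined cost."""
    full = function_plan_cost(num_rows, avg_words).combined()
    index = index_plan_cost(num_rows, selectivity_percent, cost_per_row).combined()
    return "index scan" if index < full else "full table scan"
```

With hypothetical statistics of 10000 publications and 500 words on average, a selective predicate favors the index scan, whereas a predicate that matches every row at a high per-row cost favors the full table scan.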

2.3.3 Commercial Products

Most traditional relational database systems have evolved into object-relational systems to some degree in the last few years. On the commercial side the widely used database systems of Oracle™ can be called object-relational from version 8 onwards (see e. g. [HP98] or [Ora99b]). IBM's system DB2™ is also OR to a certain degree beginning with version 5 ([Cha96, CCD99]), and Informix™ ([BM00, Bro01]), which now also belongs to IBM, called its latest database product dynamic server to show its capabilities; it is based on Illustra, which was the first object-relational system available. There are also several other commercial systems that are object-relational to a certain degree or object-oriented with some relational features; they cannot all be listed here. On the open source market the best-known OR system is PostgreSQL9 from version 7 onwards.

None of the systems mentioned above supports all the features specified in this section or in the literature. This is due to the ongoing development and novelty of the field. New versions of all these systems are released frequently and the set of supported features changes with every version. Therefore it is virtually impossible to produce an exact listing of the supported and unsupported features of every system. Apart from the frequent releases, this is also because a system that is sold as providing several features is not guaranteed to provide all of them as expected. If one tries to use them extensively, errors arise that cannot be resolved, or inefficient workarounds are required to be able to really use those features. We encountered this especially when combining several different new features into a single cartridge10 application. Moreover some features are purposely only supported by workarounds such that efficient usage is basically impossible. The reader is referred to the release notes of the particular product and version for a listing of the theoretically available concepts. Before spending too much work on a particular feature, prototypical case studies are strongly encouraged to make sure efficient support for this feature can in fact be expected. These problems are most likely due to the relative novelty of the field and the fact that relational systems are extended by the new features, which leads to some integration problems. They will surely be overcome once OR systems become widely used and settled.

9 More information can be found at http://www.postgresql.org

10 A package containing all required object-relational extensions as described earlier in this chapter for a particular application domain is also called a cartridge. This notion is used since this package may be seen as an extension to the database server that can be attached to it like a cartridge.

Chapter 3

Spatial Databases

The ability to store data that carries any kind of spatial information has been desired for quite a while now. It is so important for many applications that it has developed its own branch in the software industry, commonly known as GIS (Geographic Information System). These systems were developed starting in the 1980s in order to support modeling, storing, managing, evaluating and especially visualizing data in an application area that describes some phenomenon of the real world at some point1 in space. Most of these commercial as well as public domain systems support the management of (mostly two-dimensional) spatial data. They provide a lot of domain-specific functions as well as sophisticated visualization features for certain domains. Users in a lot of different application areas that are inherently spatial (e. g. cartography, power supply) use them for their everyday tasks.

The advent of advanced database systems (see section 2.3) on the one hand and the commercial desire for process optimization by advanced data analysis on the other hand increased the demand for use of spatial data. More and more benefits from using spatial information integrated with other domain-specific information have been discovered in applications. For example a grocery store chain can determine whether a possible new shop location will be successful by analyzing current locations, their sales figures and the population in the area. The important point is that these improved applications need traditional business data (sales figures), which are usually already stored in relational database systems, together with spatial information (be it from the company itself, e. g. shop locations, or from an external supplier, e. g. census data for the population in a certain area). This is enabled by using ORDBS, which can store traditional relational data together with the spatial information by using complex types.

Since spatial information has been considered important for some years now, there are books such as [LT92, RSV01] describing all aspects of spatial information systems. These can be GIS as well as advanced database systems storing spatial data. There is also literature in this area that is either more database-oriented ([Güt94, AG97]) or more oriented towards GIS ([KNRW97, SB97, Bar00]). Requirements of systems for more specialized application areas such as environmental information systems were investigated e. g. in [Gun98]. Efforts were also undertaken to apply advanced database concepts like constraint databases to the spatial domain. More information on this topic can be found e. g. in [GRSS98].

1 The term point in this context describes any subset of the space modeled in the application. It need not be a point in the geometric sense.

In the first section of this chapter some fundamental issues in spatial databases are presented. These focus on the different domains for spatial data and on models for representing them in a database. The second section will focus on the efficient retrieval of spatial data residing in a database. Because of the multidimensionality of the data even seemingly simple queries (like Which highways cross rivers?) can be very time consuming and therefore need support for efficiency. Some fundamental index structures of the many proposed for spatial data in the literature will be discussed.

3.1 Data Models

There are many ways to look at the same piece of spatial information, which in itself can have different properties. Depending on the application and computing resources we can distinguish between the following ways to model spatial information:

• object-based or network view

• vector or raster representation

• two or three dimensions

• layered organization with thematic, temporal or spatial layers

The network or topological view models spatial information (usually in two dimensions) under the assumption that at every point in space there can be exactly one value for each kind of thematic information. Consequently it is also assumed that there exist homogeneous areas (space in 3d) of each theme. Therefore the borders of these homogeneous areas can be described by connected line strings (surfaces in 3d) which form a network. The object-based view on the other hand assumes the existence of some real-world object with thematic attributes whose spatial position or extent can be described by an additional attribute of a spatial data type. The advantage of the latter approach is more flexibility: the definition of how an object is formed is less restrictive. Modeling objects and their properties in different themes and from different domains can be integrated much better this way. Also overlapping thematic information can only be modeled by the object-based view. On the other hand the network view provides inherent spatial constraints which do not have to be checked manually.

The vector representation of spatial information uses lines or spheres to describe the border of the spatial feature of this information. These borders were traditionally described by two-dimensional vectors or line strings, which is where the name comes from. The raster representation on the other hand divides the whole spatial domain into small cells of equal size with each cell carrying values for all thematic attributes. Both representations have their advantages and drawbacks and thus one may wish to store both; an efficient approach for this two-way storing has been presented in [Win98].

Very important for modeling spatial information is the concept of layers. Traditionally maps were created by drawing geometries from a certain theme onto a sheet of translucent paper. By putting all required sheets for a certain region on top of each other the desired map was obtained. Even though maps are constructed differently nowadays, the concept of a layer is still very important. In many geo-scientific applications different thematic attributes are modeled by using different layers. Since at every point in space there can be only one value of the thematic attribute, one obtains a raster-like representation of the spatial thematic attribute. If regions of the same value of a thematic attribute are drawn instead of point-based information, one obtains a network representation by using the borderlines of the regions of homogeneous thematic behavior. Thus the concept of layers is closely related to the raster representation or the network-based view of spatial information. Layers are sometimes also used to model thematic attributes at different points in time or at different depths/heights. This technique tries to model information in a third dimension, either a temporal or a third spatial dimension, by using a layer for each value in the new dimension. While this model may be sufficient for some applications, more advanced models of multidimensional data will be considered in chapter 5. These models are required both by advanced applications and, even more importantly, from a conceptual point of view, since the aforementioned approach is not scalable and also conceptually inappropriate.

3.1.1 Coordinate Systems

Whereas the computer representation of spatial coordinates usually assumes rectangular coordinate systems, the earth unfortunately does not provide this feature. Modeling spatial information on the earth's surface requires using coordinates on a three-dimensional sphere. Since especially maps and older GIS are not capable of displaying or using three-dimensional information, a plethora of methods has been developed to map coordinates on the surface to a rectangular coordinate system with as little error as possible. These methods are developed in cartography and are therefore out of the scope of this presentation. The interested reader is referred to standard books in cartography like [HG94]. The only type of spatial reference system that should be mentioned here, since it is used in the case studies in later chapters, is Gauss-Krüger coordinates. This is a cylindrical mapping system using a Gaussian conformal projection. It is used since it is widely accepted by European governmental authorities and most of the data in the applications investigated in this work were provided in Gauss-Krüger2 coordinate format.

A list of the most important coordinate systems, also known as spatial reference systems, is provided by the OpenGIS Consortium (OGC). Moreover this non-profit membership organization, which draws its members from international businesses, governmental agencies and academic institutions, has issued many specifications which facilitate interoperability between different spatial application systems. Of particular interest for this work is the OpenGIS simple features specification3, which defines the representation of some basic spatial features in different formats. By adhering to these specifications spatial information is easily exchangeable between different systems. OGC specifications are used wherever possible throughout this work.

2 The Gauss-Krüger coordinate system is due to be replaced in Germany by the world-wide quasi-standard UTM. Using a different coordinate system is nevertheless no restriction for the concepts presented in this work.

3.1.2 Spatial Operators

For applications on spatial data specific spatial operators are needed. There are many possible operators, and it depends on the application as well as on the dimensionality and spatial types which ones are most important. In general spatial operators will be from one of the following categories:

• topological operators

• directional operators

• metric operators

The group of topological operators includes operators such as overlap, meet or cover which ask for the topological relation between spatial objects. Directional operators such as north or northeast query for the relative positions of spatial objects. This class of operators is inherently fuzzy in that the definition of these operators is not obvious and is very likely to differ between applications. Directional operators were not required in the applications used in this work. The metric operators query the dataset for some metric property (e. g. the k objects closest to a given object). This class needs a metric to be defined on the spatial objects whereas the other two do not inherently require one. Operators of this class may also be fuzzy in nature. The three different classes are illustrated in figure 3.1.
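To illustrate the metric and the directional class, the following minimal Python sketch treats all objects as points, which is a simplifying assumption. The function names and coordinates are hypothetical. The two `east` variants are merely two of many conceivable definitions and show why directional operators are called fuzzy above.

```python
import heapq
import math


def nearest_k(query, objects, k, dist=math.dist):
    """Metric operator: names of the k objects closest to the query point."""
    return heapq.nsmallest(k, objects, key=lambda name: dist(query, objects[name]))


def east_halfplane(query, obj):
    """Directional operator, variant 1: anything with a larger x coordinate."""
    return obj[0] > query[0]


def east_cone(query, obj):
    """Directional operator, variant 2: inside a 90 degree cone opening eastwards."""
    dx, dy = obj[0] - query[0], obj[1] - query[1]
    return dx > 0 and abs(dy) <= dx


# Hypothetical point objects around a query point q.
q = (0.0, 0.0)
objects = {"o1": (1.0, 5.0), "o2": (2.0, 1.0), "o3": (6.0, 0.5), "o4": (4.0, 4.0)}
```

Here `nearest_k(q, objects, 2)` yields `o2` and `o1`, while `o1` is east of `q` under the half-plane definition but not under the cone definition.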

There are also several spatial functions, needed by some spatial applications, that operate on a spatial reference object and return numerical values. These include size and length of spatial objects. Since they return basic data types and can thus make use of regular indexes, and since they are only relevant for modeling the functional behavior of an application, they are not considered any further. Nevertheless the efficiency of computing these functions can be improved by techniques from computational geometry, see e. g. [PS85, dBvKOS97].

3 Similar to the world-wide web consortium, most documents issued by the OGC are publicly available on their web site www.opengis.org. More information about the OGC and their current specifications can also be found there.


[Figure 3.1 placeholder; labels: query object q, objects o1 to o6, operators overlaps, east and nearest_2.]

Figure 3.1: Examples of spatial operators

Topological Operators

In general topological operators can be considered the most fundamental group and have therefore been studied deeply (e. g. in [PSV99]). Summarizing [EH90] one can say that in two-dimensional space there exist the following five possible relationships between spatial regions:

• disjoint

• isContained

• touch

• equal

• overlap


These stem from comparing all $2^4$ possible combinations of the intersections of the border $\partial S$ and the interior $S^\circ$ of spatial objects $S$ and by eliminating impossible as well as symmetric combinations. This approach has been extended in various ways, especially to also accommodate line and point objects. Since there are too many possible relationships to remember, the best way to define the five operators mentioned above is to use the dimension-extended method of [CDFO93]. It considers the dimension of the possible intersection, e. g. whether it is a line or a region, where important and additionally introduces the cross relationship for e. g. a linear intersection of two regions. In the remainder of this work a restricted set of operators is sufficient, which can be defined formally as follows.

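Definition (Spatial Topological Operators): For two spatial features $S_1$ and $S_2$ there holds:

\begin{align*}
S_1 \;\mathit{anyInteract}\; S_2 &:\Leftrightarrow S_1 \cap S_2 \neq \emptyset \\
S_1 \;\mathit{isContainedBy}\; S_2 &:\Leftrightarrow (S_1 \cap S_2 = S_1) \wedge (S_1^\circ \cap S_2^\circ \neq \emptyset) \\
S_1 \;\mathit{equals}\; S_2 &:\Leftrightarrow (S_1 \cap S_2 = S_1) \wedge (S_1 \cap S_2 = S_2) \\
S_1 \;\mathit{overlaps}\; S_2 &:\Leftrightarrow (\dim(S_1^\circ) = \dim(S_2^\circ) = \dim(S_1^\circ \cap S_2^\circ)) \\
&\qquad \wedge\, (S_1 \cap S_2 \neq S_1) \wedge (S_1 \cap S_2 \neq S_2) \\
S_1 \;\mathit{crosses}\; S_2 &:\Leftrightarrow \dim(S_1^\circ \cap S_2^\circ) = (\max(\dim(S_1^\circ), \dim(S_2^\circ)) - 1) \\
&\qquad \wedge\, (S_1 \cap S_2 \neq S_1) \wedge (S_1 \cap S_2 \neq S_2)
\end{align*}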

The operator anyInteract is true if two objects share any common point. The definition of isContainedBy is only true for real containment, since the second clause ensures that the interiors of the objects are not disjoint, as would be the case for a line string lying on the border of a polygon. Whereas the operator equals has an obvious definition, the operator overlaps deserves a closer look: objects only overlap if the overlap of their interiors has the same dimensionality as their interiors; i. e. for a line crossing a polygon the operator overlaps is never true. For such relationships the operator crosses is used. Only two lines sharing a common segment or two polygons sharing an extended area overlap in this sense4.
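The five definitions can be checked mechanically on a restricted class of features. The following Python sketch assumes axis-parallel features only (points, axis-parallel segments and rectangles), each given as one closed interval per axis; this restriction, the `Box` representation and all function names are assumptions made for illustration. The interior of a degenerate axis is taken to be the single point, i. e. the relative interior is used.

```python
from typing import Optional, Tuple

# One closed interval (lo, hi) with lo <= hi per axis; lo == hi is a degenerate axis.
Box = Tuple[Tuple[float, float], ...]


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Closed intersection of a and b, or None if it is empty."""
    out = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        lo, hi = max(alo, blo), min(ahi, bhi)
        if lo > hi:
            return None
        out.append((lo, hi))
    return tuple(out)


def dim(a: Box) -> int:
    """Dimension of a feature (and of its relative interior)."""
    return sum(1 for lo, hi in a if lo < hi)


def interior_intersection_dim(a: Box, b: Box) -> Optional[int]:
    """Dimension of the intersection of the interiors, or None if it is empty."""
    d = 0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        a_point, b_point = alo == ahi, blo == bhi
        if a_point and b_point:
            if alo != blo:
                return None
        elif a_point:
            if not blo < alo < bhi:
                return None
        elif b_point:
            if not alo < blo < ahi:
                return None
        else:
            if max(alo, blo) >= min(ahi, bhi):
                return None
            d += 1
    return d


def any_interact(a, b):
    return intersection(a, b) is not None


def is_contained_by(a, b):
    return intersection(a, b) == a and interior_intersection_dim(a, b) is not None


def equals(a, b):
    return intersection(a, b) == a and intersection(a, b) == b


def overlaps(a, b):
    inter = intersection(a, b)
    return dim(a) == dim(b) == interior_intersection_dim(a, b) and inter != a and inter != b


def crosses(a, b):
    inter, d = intersection(a, b), interior_intersection_dim(a, b)
    return d is not None and d == max(dim(a), dim(b)) - 1 and inter != a and inter != b
```

For a square `((0, 2), (0, 2))`, the segment `((0, 2), (0, 0))` on its border interacts with it but is not contained by it, and the segment `((-1, 3), (1, 1))` crosses it without overlapping it, in line with the discussion above.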

Metric Operators

Of the many possible and important metric operators that have been proposed for spatial applications only the most fundamental ones will be mentioned here. Additional operators that are important for applications investigated in detail in this work will be explained when they are used.

In two dimensions the basic unary metric operations that are required include length for one-dimensional features such as lines or line strings, and area and borderlength for two-dimensional features such as rectangles or circles. Moreover, among the binary operators distance is the most important one; it operates on two spatial objects of any dimension. It is defined by extending the metric spatial operator dist operating on two points, whose definition follows directly from the metric used.

4 as long as they are not equal or one is contained by the other

Definition (Distance Operator for Spatial Features):

$$\mathit{distance}(S_1, S_2) := \min\{\mathit{dist}(s_1, s_2) \mid s_1 \in \partial S_1,\ s_2 \in \partial S_2\}$$

In three dimensions the distance operator can be used as defined above; the difference is introduced by the different metric in the dist definition. Among the unary operators we have to introduce operators for three-dimensional features such as volume and borderarea. Their formal definitions can be found in the literature.
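A minimal Python sketch of this definition follows. It assumes that each border is approximated by a finite set of sample points, so the result is only an upper bound on the true minimum over continuous borders. The function names are hypothetical. The point metric `dist` is a parameter, which mirrors the remark that the operator itself does not change with the metric or the number of dimensions.

```python
import math


def manhattan(p, q):
    """An alternative point metric (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance(border1, border2, dist=math.dist):
    """Minimum of dist(s1, s2) over all sampled border points s1 and s2."""
    return min(dist(s1, s2) for s1 in border1 for s2 in border2)


# Hypothetical features: the corner points of two axis-parallel squares.
square_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_b = [(3, 2), (4, 2), (4, 3), (3, 3)]
```

The closest pair of sample points is (1, 1) and (3, 2), so the Euclidean metric gives the square root of 5 and the Manhattan metric gives 3. The same function works unchanged for points with three coordinates.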

3.2 Spatial Querying

In contrast to traditional relational database systems, where the core of the query language of the system had to be extended to be able to use spatial operators and functions, ORDBMS allow the user to specify domain-specific data types and functions that can supply the necessary functionality even when using the standard query language of the system. The standard language SQL has been extended in SQL:1999 ([For99]), providing an extensible query language for ORDBMS. We can pose any type of spatial query including spatial selection, spatial join, spatial function application as well as other set operations using this standard query language. A formally complete specialized spatial query language can be found in [GBG99]. That language is naturally more of theoretical interest and will thus not be used in this work.

Nevertheless spatial data is usually more useful if it can be used graphically. That requires graphical query facilities as well as graphical output of the query results. The graphical support will be provided by an intermediate application layer on top of the database system and below the end-user application level. Exemplary work in this area of spatial databases has also been done in case studies in this work and is described together with the application it is used for in section 9.1.

3.3 Index Structures

A database system storing spatial data needs special index structures in order to answer queries efficiently. Since algorithms from computational geometry are usually computationally expensive but essential to answer spatial queries, the so-called filter-and-refine processing is applied (cf. figure 3.2 and [KBS93]). In the first step, the filter step, a set of candidate objects for answers to the query is generated by using an approximation of the spatial objects and a (much simpler) geometric algorithm on these approximations. The candidate set must be a superset of the query answer. In the refinement step each element of the candidate set is examined with its exact geometry by standard geometrical algorithms. The filter step can be supported by a database index: approximations of all geometries are stored in the index so that certain objects can directly (and quickly) be eliminated from being a query result without using the time-consuming exact geometrical algorithm. In most proposed spatial index structures objects are approximated by their minimum bounding rectangle (MBR) in two dimensions. This can easily be extended to three dimensions by using the minimum bounding box (MBB) of an object.

[Figure 3.2 placeholder; labels: Spatial Query, Spatial Index, candidate set, immediate hits, test exact geometry, hits, false drops, query result; stages FILTER and REFINE.]

Figure 3.2: Filter and Refine Evaluation of Queries
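The two steps can be sketched in Python for a simple point query ("which objects contain the point (x, y)?"). The sketch assumes that all objects are disks, so that the exact geometry test is trivial, and it scans a plain list of MBRs where a real system would use a spatial index. The class `Disk` and the function `point_query` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    name: str
    cx: float
    cy: float
    r: float

    def mbr(self):
        """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
        return (self.cx - self.r, self.cy - self.r, self.cx + self.r, self.cy + self.r)

    def contains(self, x, y):
        """Exact geometry test (stands in for an expensive geometric algorithm)."""
        return math.hypot(x - self.cx, y - self.cy) <= self.r


def point_query(disks, x, y):
    """Return (hits, false_drops) for the point (x, y)."""
    # FILTER: cheap test on the approximation; yields a superset of the answer.
    candidates = []
    for d in disks:
        xmin, ymin, xmax, ymax = d.mbr()
        if xmin <= x <= xmax and ymin <= y <= ymax:
            candidates.append(d)
    # REFINE: exact test on the candidates only.
    hits = [d.name for d in candidates if d.contains(x, y)]
    false_drops = [d.name for d in candidates if not d.contains(x, y)]
    return hits, false_drops


disks = [Disk("A", 0.0, 0.0, 1.0), Disk("B", 1.0, 1.0, 0.5), Disk("C", 5.0, 5.0, 1.0)]
```

For the query point (0.9, 0.9), disk C is eliminated by the filter, disk A passes the filter but fails the exact test (a false drop), and disk B is a hit.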

Indexing of spatial objects has been the subject of many research publications. Nevertheless the literature is lacking a concise, exhaustive comparison of the methods proposed. This is probably due to different characteristics of the datasets used in comparisons as well as the usage of different database tuning stages. Most of the index structures proposed are investigated and to a certain degree compared in [GG98, Gün98, HR99]. A theoretical analysis of worst-case optimal indexing can be found in [ASV99]. Since some important and widely used index structures work very well on average but not in the worst case, those results are of limited practical use. In this work index structures with proven good average case behavior are used as a foundation. They are later extended for more complex data types (e. g. spatio-temporal data). In the sequel basic versions of the index structures used later will be presented.

There are two fundamentally different ways to approach indexing of spatial (and multidimensional) objects: either to map the properties to be indexed to one of the standard data types and use standard indexes on these mapped values, or to build new index structures for the specific domain and implement these structures on top of the database system.

In the first approach one can be sure that the fast and robust functionality of the database system is used and will provide the best performance that can be expected from the traditional system on non-traditional domains. The drawback is that the proximity inherently present in the multidimensional data cannot be mapped properly to the traditional one-dimensional data types provided by the system. Therefore performance will degrade since this additional information cannot be used in query evaluation. The methods based on Z-Codes described in section 3.3.1 fall into this category.

In the second approach this drawback is avoided because the index structure can be specifically designed for this data type and thus use all the information included in the data appropriately. In the spatial domain structures in this category are usually based on the R-Tree ([Gut84]) and are described in section 3.3.2. As explained in section 2.3.2, modern commercial database systems allow user-defined index structures to be integrated directly into the database kernel. Later in this work it will be shown that this flexibility can improve the performance for queries on user-defined types. It is nevertheless difficult to achieve the same performance as for traditional, well-established functionality due to a lack of performance in the implementation language or external call overhead. For this reason a relational implementation of advanced indexes, as opposed to an implementation using the extensibility features of ORDBS, has been reported e. g. in [Pöt01]. In this work the more flexible and generally applicable approach of using the extensibility features will be used instead.

There is no clear answer as to which of the two approaches works better in general. Comparative experiments have been carried out in [Lam01]; these and additional experiments are also described in section 8.3.2 of this work. To a certain degree preserving the spatial features of data and using a specific index for extended spatial objects, as in the R-Tree approach, seems to be the better way in the long run. This is due to the extensibility that is gained by using more generalized structures that can be specialized for multiple data types (see the description of generalized search trees in section 8.2.2). Another reason may be that preserving spatial proximity becomes more important the higher the dimension of the data is. In multimedia applications, where feature vectors frequently have dimensions between 15 and 25, a mapping to a one-dimensional data type leads to very bad performance. That is probably the reason why several index structures for such high-dimensional data have been proposed by researchers in the multimedia field. Together with the assumption that integration of user-defined indexes into database systems will be very much improved in the future, it is suggested to use customized index structures for user-defined types.

Also datasets of spatially extended objects may need different index structures than datasets of points only. Since the aforementioned feature vectors are usually points in high-dimensional space, specialized index structures for these domains have been proposed, such as the similarity search tree which will be introduced in section 3.3.2. Later it will be shown that merging techniques of these different structures may lead to improved indexes.

3.3.1 Z-Code Based Hashing

The idea of dividing the spatial domain into rectangles of equal size and assigning an integer number to each of these rectangles was first published long ago in the context of space-filling curves ([Pea90]). The first work that used this idea for indexing spatial data is [OM84]. Many variants of this idea have since been published. The procedure of generating the index entries is described in some detail for two dimensions in the sequel. After that a possible extension to more dimensions is briefly mentioned.

To generate the Z-Code for the objects, the whole area is subdivided into four equally sized rectangles in a first step. These rectangles are coded 00 to 11 (0 to 3 decimal; one bit for each axis) such that the order of the numbers follows a Z geometrically. In the next step these four rectangles are subdivided into four equally sized rectangles each. Each of these $4^2 = 16$ rectangles is again coded 0 to 3 according to the Z shape. In addition it gets the Z-Code of the rectangle it is enclosed in as a prefix. This subdivision is repeated until the rectangles are small enough for the application at hand or the Z-Code reaches a given maximal length. The procedure is illustrated for 3 steps in figure 3.3.

[Figure 3.3 placeholder: the domain subdivided at level 1 (cells 0 to 3), level 2 (cells 00 to 33) and level 3 (cells 000 to 333), each cell labeled with its Z-Code in base 4 notation.]

Figure 3.3: Generation of Z-Codes up to level 3
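The recursive subdivision can be sketched in Python for a single point. The sketch assumes that the domain is the unit square and that y is measured downwards, so that the four quadrants are numbered 0 (upper left), 1 (upper right), 2 (lower left) and 3 (lower right) in Z order; this orientation and the function name `z_code` are assumptions.

```python
def z_code(x, y, level):
    """Z-Code (base 4 digit string) of the level-`level` cell containing (x, y).

    Assumes 0 <= x, y < 1 with y growing downwards.
    """
    code = ""
    x0, y0, size = 0.0, 0.0, 1.0      # upper left corner and edge length of current cell
    for _ in range(level):
        size /= 2                      # subdivide the current cell into four quadrants
        xbit = 1 if x >= x0 + size else 0
        ybit = 1 if y >= y0 + size else 0
        code += str(2 * ybit + xbit)   # one bit per axis; Z order within the cell
        x0 += xbit * size              # descend into the chosen quadrant
        y0 += ybit * size
    return code
```

Because every step appends one digit to the code of the enclosing cell, `z_code(x, y, 2)` is always a prefix of `z_code(x, y, 3)`.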

The properties of Z-Codes which can be easily used to determine a set of candidate objects in the primary filter operation include:

• Rectangles which are spatially close to each other also have Z-Codes that are closely related in the sense that their Hamming distance is small (i. e. the number of differing bits is small). This is valid for all rectangles: the Hamming distance of neighboring rectangles is equal to the number of steps that have to be walked backwards in order to find a rectangle fitting the grid and containing the two neighboring rectangles.

• If rectangles $r_1$ and $r_2$ are such that $r_2$ is contained in $r_1$, then the Z-Code of $r_1$ is a prefix of the Z-Code of $r_2$ by construction. This prefix property is exploited in the following algorithms and is traditionally supported by standard indexes in database systems. Therefore indexes based on this property are very well suited for integration into traditional relational systems. The prefix property does not translate well to spatial neighborhood in all cases though: for rectangles where the Z jumps into a different area of the domain, neighboring rectangles do not possess common prefixes (e. g. rectangles 133 and 311 in figure 3.3 are direct neighbors but have no common prefix).
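The prefix property can be illustrated in Python as follows. A sorted list stands in for a standard one-dimensional index, which is an assumption made for the sketch; the function names and the sample codes are hypothetical. A prefix search then becomes a range scan, which is the kind of access a traditional index supports.

```python
import bisect
import os


def contains(outer, inner):
    """Rectangle `outer` contains rectangle `inner` iff its Z-Code is a prefix."""
    return inner.startswith(outer)


def prefix_scan(sorted_codes, prefix):
    """All codes starting with `prefix`, found by a range scan on the sorted list.

    Z-Code digits are 0..3, so prefix + "4" is a strict upper bound for the range.
    """
    lo = bisect.bisect_left(sorted_codes, prefix)
    hi = bisect.bisect_left(sorted_codes, prefix + "4")
    return sorted_codes[lo:hi]


codes = sorted(["003", "012", "03", "133", "23", "230", "311"])
```

The neighbors 133 and 311 share no prefix, so `os.path.commonprefix(["133", "311"])` is the empty string and no single prefix scan below the root retrieves both.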

The same procedure to compute Z-Codes can alternatively be described as follows:

1. Divide the domain along each axis into $k = \sqrt{R}$ intervals of equal length ($R$ is the given total number of rectangles to be used; it should be a power of 4). Label each interval with a bit string $(b_1, \ldots, b_i)$, $i = \log_2 k$, representing the binary equivalent of the number of intervals preceding it along that axis.

2. The Z-Code of a given rectangle is obtained by bitwise interleaving of the bit strings associated with the intervals describing the borders of the rectangle. For example the Z-Code of the blue rectangle in figure 3.3 is obtained by bitwise interleaving $y = 110$ with $x = 010$ as $z = 101100$, or in base 4 notation $z = 230$.
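The two steps above amount to a few bit operations, sketched here in Python. The function names are hypothetical; following the worked example, each y bit is placed before the corresponding x bit.

```python
def interleave(x_index, y_index, bits):
    """Interleave the `bits`-bit interval numbers, y bit first, most significant first."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((y_index >> i) & 1)
        z = (z << 1) | ((x_index >> i) & 1)
    return z


def to_base4(z, digits):
    """Base 4 notation of z with a fixed number of digits."""
    return "".join(str((z >> (2 * i)) & 3) for i in reversed(range(digits)))


# The example from the text: y = 110, x = 010.
z = interleave(0b010, 0b110, 3)
```

Here `format(z, "06b")` gives `101100` and `to_base4(z, 3)` gives `230`, as stated in the text. The neighboring cells with interval numbers (x, y) = (7, 3) and (7, 4) come out as 133 and 311.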


[Figure 3.4 placeholder: three panels on the level 2 grid, labeled Z-Code {0}, Z-Code {012,013,021,030,031,032,033} and Z-Code {012,013,021,023,03}.]

Figure 3.4: Different possible Z-Codes (level 3) using exact geometry of objects

[Figure 3.5 placeholder: three panels on the level 2 grid, labeled Z-Code {0}, Z-Code {003,012,013,021,023,030,031,032,033} and Z-Code {003,012,013,021,023,03}.]

Figure 3.5: Different possible Z-Codes (level 3) using MBR of objects

To set up the index entries for spatial objects each object can be indexed by one or more Z-Codes. This is achieved by determining a set of rectangles covering the object to be indexed in an appropriate way. There are different ways to determine this coverage: in a first step either the object itself or its MBR can be used to obtain the desired Z-Codes. Using the object itself we can potentially reduce the number of Z-Codes obtained compared to using the MBR. On the other hand determining the Z-Codes is much faster when using MBRs instead of the objects themselves. The method appropriate for an application depends strongly on the shape of the objects to be indexed. For a dataset of diagonal line strings for instance the MBR approach does not work well, whereas it is acceptable for polygons. For the former type of data an approximation by multiple bounding rectangles as opposed to one large rectangle may be beneficial since it approximates the line strings more closely. For each of these methods (exact geometry, MBR or multiple bounding rectangles) we also have the possibility to use different ways to obtain the Z-Code(s) according to the size of the rectangles used (different possibilities are illustrated in figures 3.4 and 3.5):

1. The object is coded using exactly one Z-Code. This code is determined by the Z-Code of the minimal rectangle covering the object to be indexed (MBR or exact geometry). This approach can be implemented using rather easy algorithms and minimizes the amount of internal management. The main disadvantage is that the approximation is very rough and therefore leads to large candidate sets. These result in large inputs to the secondary filter step, degrading index performance.

2. The object is coded using a set of Z-Codes of maximal length (the length depends on the application as explained above). All rectangles of minimal size covering the object to be approximated are computed and their Z-Codes are inserted into the set of codes. This approach needs a lot of internal storage and management but has the advantage of good object approximation, which keeps the candidate set passed to the secondary filter minimal. Together with exact object approximation this is the purest way to use a filter-and-refine strategy.

3. In a hybrid approach one can also use sets of Z-Codes for each object, but the set contains Z-Codes of different lengths. We obtain the index entries by first following the previous step and then merging neighboring rectangles with common prefixes into larger rectangles. We only need to include these larger rectangles in the index. To a certain degree this approach has the advantages of the second approach but, depending on the implementation, may need additional computing to merge the rectangles. This process may be improved by not computing the smaller rectangles first and by using rougher approximations. This approach is closely related to [Ore86].
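The three variants can be compared on the code set given in the caption area of figure 3.5. The following Python sketch is an illustration under assumptions: Z-Codes are base 4 strings, four codes sharing a prefix are merged into that prefix, and the single code of the first variant is taken to be the longest common prefix of all cells. The names `merge_codes` and `single_code` are hypothetical.

```python
import os


def merge_codes(codes):
    """Hybrid variant: replace four sibling cells by their common parent
    until no complete group of siblings is left."""
    codes = set(codes)
    changed = True
    while changed:
        changed = False
        for prefix in {c[:-1] for c in codes if c}:
            siblings = {prefix + digit for digit in "0123"}
            if siblings <= codes:
                codes -= siblings
                codes.add(prefix)
                changed = True
    return codes


def single_code(codes):
    """First variant: the smallest cell containing all given cells,
    i.e. the longest common prefix of their codes."""
    return os.path.commonprefix(sorted(codes))


# Second variant: all cells of maximal length (level 3), as in figure 3.5.
cells = {"003", "012", "013", "021", "023", "030", "031", "032", "033"}

print(sorted(merge_codes(cells)))  # ['003', '012', '013', '021', '023', '03']
print(single_code(cells))          # 0
```

The merged set has six entries instead of nine while covering exactly the same area, whereas the single code covers a whole quadrant of the data space.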

Whether to use a Z-Code based index, and in particular which of the methods described to choose, depends on the application at hand. In that sense, even though traditional indexes inside the database system are used, a customization for the particular application domain is required. For fairly regular shapes one should not use too many index entries for each object, whereas irregular shapes will probably work better with many index entries. Other disadvantages of the methods described are remedied by using improved algorithms based on Z-Codes as proposed e.g. in [ELS97, BKK96]. [Klu98] presents an analysis of some of these methods.

Extension to d > 2 Dimensions

In the previous paragraph the computation of Z-Codes in two dimensions was described. This method can easily be extended to more dimensions by using the same bit interleaving technique. Instead of interleaving just two bitstreams (one for the x-axis and one for the y-axis), $k = \sqrt[d]{R}$ intervals are computed along each of the d axes to be used. The total number R of d-dimensional rectangles (e.g. boxes in 3d) should be a power of $2^d$. After that all d bitstreams of length $i = \log_2 k$ describing the borders of the d-dimensional rectangle are interleaved. A Z-Code of length $d \cdot i$ bits is obtained for each d-dimensional rectangle.
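As a sketch of the d-dimensional case, the following Python fragment interleaves an arbitrary number of bitstreams and derives the grid parameters. It assumes that every axis is divided into k = 2^i intervals, so that the grid has R = k^d cells; the function names and the order in which the axes are passed are illustrative choices only.

```python
def interleave(bitstreams):
    """Interleave d bit-streams of equal length i into one Z-Code of d*i bits."""
    lengths = {len(s) for s in bitstreams}
    if len(lengths) != 1:
        raise ValueError("all bit-streams must have the same length")
    return "".join("".join(bits) for bits in zip(*bitstreams))


def grid_parameters(d, i):
    """Return (k, R, code_length) for d dimensions and i bits per axis."""
    k = 2 ** i          # intervals per axis
    R = k ** d          # number of d-dimensional cells, a power of 2**d
    return k, R, d * i  # the Z-Code has d * i bits


print(interleave(["110", "010"]))         # 101100 (two-dimensional example)
print(interleave(["110", "010", "101"]))  # 101110001
print(grid_parameters(3, 3))              # (8, 512, 9)
```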

While this method can be easily implemented, its drawbacks in two dimensions become even worse in higher dimensions. The problem of preserving spatial proximity in a mapping to a one-dimensional domain, which could not be overcome completely in two dimensions, grows with the dimension of the indexed spatial objects. While this may lead to sufficient results for classical spatial applications, which usually do not use more than three dimensions, more advanced multi-dimensional applications like multimedia and data warehousing should probably rather use R-Tree based indexes, which are described in the next subsection.

3.3.2 R-Tree Indexing

As an alternative to the transformation-based methods, a different class of indexing methods based on the R-Tree ([Gut84]) is studied in the literature. These methods are particularly well-suited for extended objects. They are called overlapping regions methods in [GG98]. The basic R-Tree is briefly described in the next paragraph, followed by descriptions of two advanced structures that are based on the R-Tree. To illustrate the properties of an R-Tree there is a small example tree in figure 3.6.

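[Figure 3.6: spatial layout of the points p1 to p8 and rectangles o1 to o6 together with the query region Q, and the corresponding tree with root entries R1 and R2, where R1 has the children R3, R4, R5 and R2 has the children R6, R7]

Figure 3.6: Example of an R-Tree storing 8 points and 6 rectangles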


    Iterator rangeSearch (Node N, Region Q):
      for all rect in N do
        if Q overlaps rect then
          if N is a leaf then
            add rowid to result iterator
          else
            call rangeSearch (ptr, Q) recursively;
          end if
        end if
      end for
    end

Figure 3.7: Region Search using R-Trees

An R-Tree is a height-balanced search tree corresponding to a hierarchy of d-dimensional rectangles. It has internal and leaf nodes where each node fits into one disk page. Leaf nodes L have entries of the form (int, rowid) where int is the d-dimensional approximation of the object stored at rowid in the database. Internal nodes I have entries of the form (int, ptr) where ptr is a pointer to a tree node on a lower level and int is a d-dimensional interval containing all d-dimensional intervals reachable via ptr. The maximum number M of entries in a node is derived from the size of a disk page such that one node exactly fits into one page. Other properties of an R-Tree include:

• Every node has at least m ≤ M/2 entries unless it is the root.

• The root node has at least two entries unless it is a leaf.

• All leaves of the tree are at the same level. The height of the tree is at most $\lceil \log_m(N) \rceil$ for N index entries.

The lower bound m of entries in a node is maintained in order to guarantee a sufficient fanout for the tree to keep its logarithmic height. In case the number of entries drops below m the tree is condensed by a fuse operation similar to B-Trees. Similarly, during insertion split operations are applied to deal with overflows.

Searching in an R-Tree for all objects that overlap a given query region works in principle as shown in figure 3.7. For example, a search for all objects overlapping Q in figure 3.6 would lead to recursive calls on the nodes pointed to by R1 and R2 since both overlap the query region. At the second level only the rectangles R5 and R6 intersect Q, leading to recursive calls on the nodes containing p1 and o2 as well as p8 and o1. Finally the rowids of the objects with MBRs o1 and o2, which intersect Q, are reported as results. Since these rectangles are only MBRs of the exact geometries, the refinement step has to test for the exact geometric operator. Compared to calling the exact operator on all 14 objects, the index has led to only two exact operator calls.
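A minimal Python sketch of the filter step of this search is given below. The `Node` layout, the rectangle representation as (xmin, ymin, xmax, ymax) tuples and the small example tree are assumptions made for illustration; the coordinates are invented and do not reproduce figure 3.6.

```python
from dataclasses import dataclass


def overlaps(a, b):
    """True if the rectangles a and b (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    entries: list  # (rect, rowid) in leaves, (rect, child Node) otherwise


def range_search(node, query):
    """Yield the rowids of all leaf entries whose rectangle overlaps query."""
    for rect, ref in node.entries:
        if overlaps(rect, query):
            if node.is_leaf:
                yield ref
            else:
                yield from range_search(ref, query)


leaf_a = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf_b = Node(True, [((5, 5, 6, 6), "c"), ((7, 0, 8, 1), "d")])
root = Node(False, [((0, 0, 3, 3), leaf_a), ((5, 0, 8, 6), leaf_b)])

print(list(range_search(root, (2.5, 2.5, 5.5, 5.5))))  # ['b', 'c']
```

The generator plays the role of the result iterator: candidates are produced lazily and can be handed to the refinement step one at a time.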

To insert a new entry into an existing R-Tree the tree has to be searched from top to bottom to find the leaf best suited to hold the new entry. In particular, each rectangle of a node is checked for how much enlargement is necessary if the new entry were inserted into the corresponding subtree, such that the rectangle remains the MBR of all entries in the subtree. This required enlargement is also called a penalty, since every enlargement possibly decreases index performance. At each level the subtree with minimum penalty is chosen for insertion. Once the search has reached the leaf level, the MBR of the new entry is inserted into the leaf together with a pointer to the data object. This insertion may cause an overflow of the current node, in which case it has to be split into two nodes. The strategy for distributing the entries over the two new nodes differs from R-Tree variant to R-Tree variant; possible criteria include minimal sum of MBR areas and minimal sum of MBR boundaries. After that a new entry has to be inserted into the parent node, which in turn may cause another overflow to be propagated upwards, similar to standard balanced search trees.
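The choice of the subtree can be illustrated by the following Python sketch, which uses area enlargement as penalty. Breaking ties in favor of the smaller rectangle follows the original R-Tree proposal [Gut84]; the function names and the example rectangles are hypothetical.

```python
def area(r):
    """Area of a rectangle given as (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """MBR of the two rectangles a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def penalty(rect, new):
    """Area by which rect must grow to remain an MBR after adding new."""
    return area(union(rect, new)) - area(rect)


def choose_subtree(rects, new):
    """Index of the entry with minimum penalty; ties go to the smaller area."""
    return min(range(len(rects)), key=lambda i: (penalty(rects[i], new), area(rects[i])))


node_rects = [(0, 0, 4, 4), (5, 5, 9, 9)]
new_entry = (3, 3, 5, 5)

print([penalty(r, new_entry) for r in node_rects])  # [9, 20]
print(choose_subtree(node_rects, new_entry))        # 0
```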

    Iterator RTreeJoinFilter (RTree R, RTree S):
      candidates.enqueue(Candidate(R.root(), S.root()));
      while !candidates.isEmpty() do
        M = candidates.front().first();
        N = candidates.dequeue().second();
        for all rect in M do loop i
          for all rect in N do loop j
            if M.rect(i) overlaps N.rect(j) then
              if M is a leaf and N is a leaf then
                add Candidate(M.ptr(i), N.ptr(j)) to result
              else
                candidates.enqueue(Candidate(M.ptr(i), N.ptr(j)))
              end if
            end if
          end for
        end for
      end while
      return result
    end

Figure 3.8: Spatial Join using R-Trees

R-Trees may also be used for other spatial queries. One of the most important and, due to its computational complexity, also most challenging queries is the spatial join. Algorithms involving R-Trees have been described in [BKS93] as well as [HJ97]. The algorithm in figure 3.8 is similar to the one in [BKS93]. Its main idea is to traverse the R-Trees in parallel, considering only such pairs of nodes on the next level whose MBRs overlap. In detail, the algorithm starts at the two roots and inserts all pairs of overlapping child nodes, i.e. children whose MBR information in the root overlaps, into a candidate queue. The next candidate pair is removed from the queue and processed like the two roots until all candidates have been examined. Whenever overlapping entries of two leaf nodes are found, they are not inserted into the candidate queue but rather into the result set, which is implemented as an iterator. The exact geometries of the elements of this result iterator have to be checked for exact spatial overlap in a refine step later.

If applied to a spatial self-join on the tree of figure 3.6, in the first iteration all four possible pairs of nodes would be considered, i.e. (R1, R1), (R1, R2), (R2, R1) and (R2, R2), since R1 and R2 overlap. On the next level, when considering candidate (R1, R2), the efficiency of this strategy is illustrated since the only candidate pair that overlaps would be (R5, R6). All other pairs of rectangles from R1 and R2 do not overlap. One can imagine that this strategy pays off considerably for larger R-Trees. The great advantage of this algorithm over Z-Code based spatial join processing is that it works independently of the coordinate range, whereas efficient Z-Code based spatial joins can only be computed on relations indexed using the same range of coordinates and usually also the same Z-Code depth. The algorithm in figure 3.8 and several slight variants have been evaluated in [BKS93], but cannot be directly integrated into current ORDBS to support spatial joins, as will be shown in section 8.4. One alternative approach for processing of spatial joins is presented in [APR+98].
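The filter step of such a join can be sketched in Python as follows. The sketch assumes that both trees have the same height, so that leaves are always paired with leaves; the `Node` type, the function name and the example data are illustrative and not taken from [BKS93].

```python
from collections import deque, namedtuple

# entries: (rect, rowid) in leaves, (rect, child Node) in internal nodes
Node = namedtuple("Node", ["is_leaf", "entries"])


def overlaps(a, b):
    """True if the rectangles a and b (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def rtree_join_filter(r_root, s_root):
    """Return all pairs of rowids whose MBRs overlap (candidates for refinement)."""
    result = []
    candidates = deque([(r_root, s_root)])
    while candidates:
        m, n = candidates.popleft()
        for rect_m, ref_m in m.entries:
            for rect_n, ref_n in n.entries:
                if overlaps(rect_m, rect_n):
                    if m.is_leaf and n.is_leaf:
                        result.append((ref_m, ref_n))
                    else:
                        candidates.append((ref_m, ref_n))
    return result


r_tree = Node(False, [
    ((0, 0, 4, 4), Node(True, [((0, 0, 2, 2), "r1"), ((3.5, 3.5, 4, 4), "r2")])),
    ((8, 8, 9, 9), Node(True, [((8, 8, 9, 9), "r3")])),
])
s_tree = Node(False, [
    ((1, 1, 3, 3), Node(True, [((1, 1, 3, 3), "s1")])),
    ((6, 6, 21, 21), Node(True, [((6, 6, 8.5, 8.5), "s2"), ((20, 20, 21, 21), "s3")])),
])

print(rtree_join_filter(r_tree, s_tree))  # [('r1', 's1'), ('r3', 's2')]
```

Of the four possible pairs of child nodes only two are enqueued, so only four of the nine possible pairs of leaf entries are ever compared.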

    maxheap[k] R;
    priorityQueue C;

    void nnSearch(Node N, Point P, Integer k):
      if N is a leaf then
        for all int in N do
          compute distance to P
          if distance is less than R.maxKey then
            delete max from R
            insert rowid into R with distance as key
          end if
        end for
      else
        add all children of N to C
          with minimum distance from P as key
        remove most promising node PN from C
        call nnSearch(PN, P, k) recursively
      end if
      prune candidate nodes: delete all entries from C whose key is
        greater than R.maxKey
    end

Figure 3.9: Nearest-Neighbor Search in an R-Tree

An R-Tree can also be used to answer metric operator queries like a query for the k nearest neighbors of a given point. The algorithm presented in figure 3.9 is due to [RKV95]. It uses a maximum heap R for the results, which is initialized with empty entries and keys MAXDISTANCE. It is important to note that this heap never grows or shrinks, since at all times only the k currently discovered nearest neighbors need to be stored. In addition a priority queue C of candidate nodes for future investigation is used. The algorithm proceeds as follows: the search starts at the root of the tree. As long as the node is internal, the minimum distance between the rectangle of each entry and the search object Q (the point P in figure 3.9) is computed. The children of this node are inserted into C with this minimum distance as key⁵. The search continues by extracting the most promising node from C and calling the procedure recursively. Once a leaf node is reached, the distance from Q is computed for all objects referenced in the leaf. Whenever this distance is smaller than the distance of the current k-th nearest neighbor (which is the key of the maximum element of heap R), the current k-th nearest neighbor is replaced in the result set R by the object investigated. After a node has been completely investigated, all candidate nodes whose minimal distance from Q (which is the key in C) is greater than the distance of the k-th nearest neighbor already found can be removed from further consideration since they cannot contain answers to the query. This is done by removing them from C. If the priority queue contains no more candidates, the k nearest neighbors of Q are contained in the result heap R. Actually the algorithm in [RKV95] uses a more advanced pruning strategy than this description, but the principal idea is the same.

As an example consider a query for the three nearest neighbors to Q in the tree in figure 3.6. This would lead to entering R1 and R2 as candidate nodes while visiting the root. R1 seems more promising since Q is closer to its center, and visiting it would lead to adding R3 to R5 to the candidate nodes. Of those the most promising would be R5, leading to the first two candidate results p1 and o2. The next candidate node is R2 since it is closer to Q than R3 and R4. This in turn leads to the inspection of R6 and the insertion of o1 into the candidate results. p8 is not included since it is not closer than the current third closest object p1. After that the rectangles R4 and R7 are pruned since they are more distant from Q than p1. Finally R3 is inspected and o3 is found to be closer to Q than p1. After that the list of candidate nodes is empty and therefore the three objects o1 through o3 currently in the candidate result set are the final result.
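The following Python sketch shows a simplified best-first variant of such a search; it is not the exact algorithm of [RKV95]. As assumptions, objects are represented by their rectangles (xmin, ymin, xmax, ymax), the result heap grows up to k entries instead of being pre-filled with MAXDISTANCE keys, and pruning is realized by stopping as soon as the closest remaining candidate node is farther away than the current k-th neighbor. All names and coordinates are hypothetical.

```python
import heapq
import itertools
import math
from collections import namedtuple

Node = namedtuple("Node", ["is_leaf", "entries"])


def mindist(p, r):
    """Minimum Euclidean distance between point p and rectangle r."""
    dx = max(r[0] - p[0], 0, p[0] - r[2])
    dy = max(r[1] - p[1], 0, p[1] - r[3])
    return math.hypot(dx, dy)


def nn_search(root, p, k):
    """Return the k nearest leaf entries as a sorted list of (distance, rowid)."""
    tie = itertools.count()           # keeps the heap from comparing nodes
    result = []                       # max-heap via negated distances
    candidates = [(0.0, next(tie), root)]
    while candidates:
        dist, _, node = heapq.heappop(candidates)
        if len(result) == k and dist > -result[0][0]:
            break                     # every remaining node is too far away
        for rect, ref in node.entries:
            d = mindist(p, rect)
            if not node.is_leaf:
                heapq.heappush(candidates, (d, next(tie), ref))
            elif len(result) < k:
                heapq.heappush(result, (-d, ref))
            elif d < -result[0][0]:
                heapq.heapreplace(result, (-d, ref))
    return sorted((-neg, ref) for neg, ref in result)


leaf_a = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf_b = Node(True, [((5, 5, 6, 6), "c"), ((7, 0, 8, 1), "d")])
root = Node(False, [((0, 0, 3, 3), leaf_a), ((5, 0, 8, 6), leaf_b)])

print([rowid for _, rowid in nn_search(root, (4, 4), 2)])  # ['b', 'c']
```

In the example the node holding c and d is visited first because its rectangle is closer to the query point, and d is later displaced from the result heap by b.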

R*-Tree

To improve the performance of an R-Tree some more sophisticated algorithmic details have been proposed in the literature. Among those the R*-Tree ([BKSS90]) is the one suggested most often in books and research articles. It differs from the basic R-Tree in using different insertion algorithms. The first change is that an overflowing node is not split directly, but rather some of its entries ([BKSS90] suggest about 30%) are removed and newly inserted into the tree. Only if a node on the same level overflows again is it split. This is called forced reinsert. If a split occurs, an improved splitting algorithm is used which tries to optimize metric measures like area, perimeter and overlap between nodes. Only minor changes in the search and deletion algorithms were included. The R*-Tree is widely accepted to be a very efficient index structure; the authors reported performance improvements of up to 50% compared to a standard R-Tree.

5 The underlying heuristic assumption is that nodes with smaller minimum distance are more likely to contain nearest neighbors.


SS-Tree

In multimedia databases feature vectors of images are computed as multidimensional points (dimensions are usually at least around 15) and on these similarity queries have to be supported efficiently by the database system. Similarity on images translates to nearest neighbors of feature vectors in the high-dimensional feature space. The R-Tree and even the R*-Tree show degraded performance as the dimension grows⁶. For that reason researchers in the multimedia area have suggested improved index structures for high-dimensional data. One of the best approaches taken so far is the similarity search tree or SS-Tree ([WJ96]). The main difference to R-Trees is that the tree uses multidimensional spheres as index entries instead of multidimensional intervals. Besides better clustering of the feature vectors this results in higher fanout, since a d-dimensional sphere requires the storage of d + 1 numbers (d for the center, 1 for the radius) compared to 2d for a d-dimensional rectangle. In high dimensions the fanout can almost be doubled this way since more entries fit into one disk page. In the original article all of the traditional algorithms on R-Trees were modified. But [WHL98] showed for two-dimensional spatial data that the most important change is the use of minimal distance between centers in splitting during insertions. Similar observations for the spatio-temporal domain were made by [KL00] and are presented in section 8.2.3. Another advantage is the much faster computation of the distances between node entries and objects to be inserted. It is linear in the number of node entries compared to the quadratic splitting algorithm in the R*-Tree.
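The effect on the fanout can be made concrete with a small calculation. The page size of 8192 bytes, the 8 bytes per coordinate and the 8 bytes per child pointer used in the following Python sketch are assumptions chosen only for illustration.

```python
def fanout(page_bytes, numbers_per_entry, number_bytes=8, pointer_bytes=8):
    """Number of index entries that fit into one disk page."""
    entry_bytes = numbers_per_entry * number_bytes + pointer_bytes
    return page_bytes // entry_bytes


d = 16                                   # dimension of the feature vectors
rectangle_fanout = fanout(8192, 2 * d)   # 2d numbers per rectangle
sphere_fanout = fanout(8192, d + 1)      # d numbers for the center, 1 for the radius

print(rectangle_fanout, sphere_fanout)   # 31 56
```

Under these assumptions the sphere-based node holds about 1.8 times as many entries, and the ratio approaches 2 as d grows because the pointer overhead becomes negligible.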

As mentioned above there are many other index structures proposed in the research literature, each of which may be better than the ones explained above in certain domains or circumstances. The overview article [GG98] should be consulted if further information is desired. For good average case behavior in most applications the index structures presented above should be sufficient.

6 Actually it is not even obvious how to extend the R*-Tree to multiple dimensions.

Chapter 4

Temporal Databases

A temporal database is a database that is able to manage time varying data in general and thus supports some time domain. An overview of temporal databases can be found in [OS95, TCG+93, JCG+92]. Other books containing general introductions to temporal database systems include [ZCF+97, EJS98, BJW00]. One of the early articles on the idea of storing temporal information in a database, in the sense in which temporal databases are understood today, was [SA86], whereas [MJS91] presents an overview of relational algebras for temporal data. Important concepts and terminology of temporal DBS were presented in [JCE+94] with the most recent version in [JD98]. Moreover [Sno95] discusses many modeling and implementation concepts using an exemplary query language and may be used as a reference. Much more literature on temporal databases can also be found on the TimeCenter homepage at http://www.cs.auc.dk/TimeCenter.

4.1 Data Models

Temporal databases have been under investigation for a long time. Thus many different temporal data models have been proposed, each with its own advantages, disadvantages and special features. In general these models can be categorized along three orthogonal dimensions. Firstly, by the type of temporal information attached to a database object, namely a single event (or chronon; this is the smallest entity that a timestamp can represent), an interval (marked by two chronons, one for the beginning and one for the end of the interval) or sets of disjoint intervals, which are also called temporal elements. Secondly, either attributes, tuples or even sets of tuples can be timestamped. Finally, we have to distinguish between data models supporting valid time (i.e. the time that the fact to be stored was true in reality), transaction time (i.e. the time that the fact to be stored was present in the database) or both, called bitemporal models. In [JSS94] a model is presented that unites almost all important other models proposed up to then¹. At that time modern database systems were relational, which is the reason why only the integration of temporal data into such systems was researched (see e.g. [Kou94], [GY88]). An overview on integration of temporal data into relational systems may also be found in [TCG+93]. With the advent of object-relational systems some of the preconditions of that research do not hold anymore. E.g. attribute timestamping was usually not considered in implementations since it was very inefficient and complicated to realize on top of relational databases. ORDBS provide new options for implementing temporal data. Some aspects will be investigated in part II of this work.

1 References for sources on those models may also be found in [JSS94].

Another comprehensive overview of proposed data models with comparison and discussion of individual advantages and weaknesses can be found in [Sno95]. To unify most of these approaches the book develops the bitemporal conceptual data model (BCDM), which is a bitemporal data model and can be mapped to and from all temporally ungrouped models that had been previously proposed. Therefore it is a good reference for all bitemporal data models and it will be introduced briefly in the sequel.

Bitemporal Conceptual Data Model

The BCDM supports linear, discrete and bounded time domains for both valid and transaction time. Both are assumed to use absolute time represented by chronons with a certain basic length. Since chronons can be numbered they are isomorphic to the natural numbers. The domains for valid time and transaction time are $D_{VT} = \{t_1, t_2, \dots, t_k\}$ and $D_{TT} = \{t'_1, t'_2, \dots, t'_j\} \cup \{UC\}$ where $t_i$ and $t'_i$ denote chronons and $UC$ is a distinguished value for until changed. In this model tuples are timestamped with values from both $D_{VT}$ and $D_{TT}$. More formally, if a set of attribute names $A = \{A_1, A_2, \dots, A_m\}$ and a set of attribute domains $D = \{D_1, D_2, \dots, D_n\}$ are given, a bitemporal conceptual relation $R$ consists of an arbitrary number $n$ of attributes from $A$ with domains in $D$ and one timestamp attribute $T$ with domain $D_{VT} \times D_{TT}$. Consequently a tuple $x = (a_1, a_2, \dots, a_n \mid t_b)$ in a relation instance $\sigma(R)$ consists of $n$ attribute values and a timestamp value. For illustration purposes an example can be found in figure 4.1.

To reflect the different semantics of valid and transaction time the explanation of a bitemporal tuple is as follows: each tuple is associated with a subset of $D_{VT}$ representing the time that the tuple was true in this model of the real world. Each valid time chronon is associated with a subset of $D_{TT}$ representing the time that this valid-time tuple was present in the database. In more detail a bitemporal tuple $t$ is of the following form: $t = (a_1, \dots, a_n \mid \{c_{vt}, \{c_{tt}\}\})$ for valid time chronons $c_{vt}$ and transaction-time chronons $c_{tt}$. This asymmetry will also show up in the definition of the update operations.

[Figure 4.1, diagram: the regions (Allen,Sales), (Allen,Accounting) and (Scott,Research) drawn over the axes VT and TT, both with ticks at 20, 40, 60 and 80]

    EMP    DEPT        TIME
    Allen  Sales       {(20,40),...,(20,59),...,(39,40),...,(39,59),
                        (40,20),...,(40,69),...,(59,20),...,(59,69),
                        (60,40),...,(60,59),...,(69,40),...,(69,59)}
    Allen  Accounting  {(40,70),...,(40,85),(40,UC),...,
                        (59,70),...,(59,85),(59,UC)}
    Scott  Research    {(70,70),...,(70,85),(70,UC),...,
                        (79,70),...,(79,85),(79,UC)}

Figure 4.1: Example of a bitemporal relation

To update a bitemporal relation we can use the regular operations insert, update or delete. For an insert where we want to store in the database that a fact is or was valid in the real world for a certain period of time we need to pass the fact to be recorded and its valid time. The transaction time is automatically set by the system to UC (i.e. the fact is current in the database until it is changed). The transaction time is updated by the system automatically as described below. The three cases to be considered (formally defined below) are insertion of a completely new fact into the database, modification of the valid time of a fact already present, or the case in which an already current fact is to be modified. In case (i) the new² fact is simply inserted with the given values and a transaction time value of UC showing that the fact is current. In case (ii), where a tuple with the same non-temporal attributes can be found in the database but is not current, since it does not contain the chronon UC, the already present tuple is modified according to the current information given. This is formally done by deleting the old tuple and inserting a modified tuple. Finally case (iii), where a tuple with the same non-temporal information is present and current in the database, should not change the database state since an insert operation is not permitted: it would introduce contradicting information and an update should have been used instead. Formally we obtain (where $t_v$ denotes a set of valid time chronons and $t_b$ a set of bitemporal chronons):

\[
\mathrm{insert}(\sigma(R), (a_1, \dots, a_n), t_v) =
\begin{cases}
\sigma(R) \cup \{(a_1, \dots, a_n \mid t_v \times \{UC\})\}, & \text{if } \neg\exists t_b : (a_1, \dots, a_n \mid t_b) \in \sigma(R) \quad \text{(i)} \\
\sigma(R) - \{(a_1, \dots, a_n \mid t_b)\} \cup \{(a_1, \dots, a_n \mid t_b \cup (t_v \times \{UC\}))\}, & \text{if } \exists t_b : ((a_1, \dots, a_n \mid t_b) \in \sigma(R) \wedge \neg\exists c_{vt} : (c_{vt}, UC) \in t_b) \quad \text{(ii)} \\
\sigma(R), & \text{otherwise} \quad \text{(iii)}
\end{cases}
\]

2 New in this context means that no tuple with the same non-temporal attributes is already present.


The transaction time has to be updated automatically by the system. In particular, after one instant $t'_i$ of transaction time has passed, all tuples containing a chronon of the form $(c_{vt}, UC)$ have to be updated by inserting a new chronon $(c_{vt}, t'_i)$ into the temporal information of that tuple. This procedure would not be very efficient if implemented straightforwardly, but it is sufficient since this is only a conceptual model.

For deletion one can logically remove a tuple from the current database state, resulting in all $(c_{vt}, UC)$ pairs being removed from its timestamp (if any are present) such that no new chronons are inserted as transaction time passes. Thus the tuple is logically removed. Physically no tuples are removed in bitemporal databases since the goal of transaction time is to store older states of the database also. Formally we obtain:

\[
\mathrm{delete}(\sigma(R), (a_1, \dots, a_n)) =
\begin{cases}
\sigma(R) - \{(a_1, \dots, a_n \mid t_b)\} \cup \{(a_1, \dots, a_n \mid t_b - \{(c_{vt}, UC) \in t_b\})\}, & \text{if } \exists t_b : (a_1, \dots, a_n \mid t_b) \in \sigma(R) \\
\sigma(R), & \text{otherwise}
\end{cases}
\]

Finally for the temporal update operation we simply use deletion followed by an insertion as follows:

\[
\mathrm{update}(\sigma(R), (a_1, \dots, a_n), t_v) = \mathrm{insert}(\mathrm{delete}(\sigma(R), (a_1, \dots, a_n)), (a_1, \dots, a_n), t_v)
\]

This operation updates the valid time of fact $(a_1, \dots, a_n)$ to be $t_v$. The difference in the signature compared to non-temporal updates in standard DBS is due to the fact that only updating valid time information is supported by temporal updates directly. To illustrate the functionality of the update operations, figure 4.2 shows the updates necessary to obtain the bitemporal relation in figure 4.1 together with the transaction time at which they have to occur. The first update operation in figure 4.2 for instance changes the validity of the information (Allen,Sales) from [40,59] to [20,69]. Changing the non-temporal attributes of a tuple has to be carried out by an explicit delete followed by an insert as illustrated by the fourth and fifth operation in figure 4.2, where the information (Allen,Sales) is changed to (Allen,Accounting).
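The update operations and the system-driven advance of transaction time can be replayed in a small Python model. This is a sketch under assumptions: a relation is a dictionary mapping a fact to its set of bitemporal chronons, chronons are written as (valid time, transaction time) pairs as in the formal definitions (the Sales row of figure 4.1 appears to list its pairs in the opposite order), the operations modify the relation in place instead of returning a new state, and the current transaction time 85 is read off figure 4.1. All function names are hypothetical.

```python
UC = "UC"  # distinguished transaction-time value "until changed"


def period(lo, hi):
    """Closed interval [lo, hi] as a set of chronons."""
    return set(range(lo, hi + 1))


def insert(rel, fact, tv):
    tb = rel.get(fact)
    if tb is None:                               # case (i): new fact
        rel[fact] = {(c, UC) for c in tv}
    elif not any(tt == UC for _, tt in tb):      # case (ii): present, not current
        tb |= {(c, UC) for c in tv}
    # case (iii): fact is current, the relation is left unchanged


def delete(rel, fact):
    if fact in rel:                              # logical deletion only
        rel[fact] = {(c, tt) for c, tt in rel[fact] if tt != UC}


def update(rel, fact, tv):
    delete(rel, fact)
    insert(rel, fact, tv)


def tick(rel, t):
    """Transaction-time instant t has passed: record it for all current chronons."""
    for tb in rel.values():
        tb |= {(c, t) for c, tt in tb if tt == UC}


SALES, ACCT, RES = ("Allen", "Sales"), ("Allen", "Accounting"), ("Scott", "Research")
operations = [                                   # the operations of figure 4.2
    (20, lambda r: insert(r, SALES, period(40, 59))),
    (40, lambda r: update(r, SALES, period(20, 69))),
    (60, lambda r: update(r, SALES, period(40, 59))),
    (70, lambda r: delete(r, SALES)),
    (70, lambda r: insert(r, ACCT, period(40, 59))),
    (70, lambda r: insert(r, RES, period(70, 79))),
]

emp_dept = {}
for now in range(20, 86):                        # transaction time 20 .. 85
    for at, op in operations:
        if at == now:
            op(emp_dept)
    tick(emp_dept, now)

print(len(emp_dept[SALES]), (40, UC) in emp_dept[ACCT])  # 1600 True
```

After the replay the Sales tuple holds 20·20 + 50·20 + 20·10 = 1600 chronons and no UC entry, while the two facts inserted at transaction time 70 are still current.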

Summarizing, we can say that the BCDM is a tuple-timestamping bitemporal data model that uses temporal elements as timestamps. Attributes in the BCDM are atomic and the model inherently solves problems with coalescing since all temporal information is stored in the form of chronons. These are stored as elements of a set and can thus be easily retrieved sequentially and returned as a coalesced temporal element if desired.

    Operation                                          Transaction Time
    insert(EmpDept,('Allen','Sales'),[40,59])          20
    update(EmpDept,('Allen','Sales'),[20,69])          40
    update(EmpDept,('Allen','Sales'),[40,59])          60
    delete(EmpDept,('Allen','Sales'))                  70
    insert(EmpDept,('Allen','Accounting'),[40,59])     70
    insert(EmpDept,('Scott','Research'),[70,79])       70

Figure 4.2: Update Operations to obtain the contents of the relation of figure 4.1

Most of the applications we studied only needed valid time support. The BCDM presented above can be greatly simplified if only one temporal dimension is required. The temporal information stored with each tuple, for instance, is no longer taken from $D_{VT} \times D_{TT}$ but only from $D_{VT}$. The system-operated function to update temporal information as transaction time passes is no longer needed; the valid time temporal elements can be stored directly. The update operations operate directly on the tuples to be updated. In particular insertion operates as follows (where $t_{vn}$ denotes the new and $t_{vo}$ the already stored set of valid time chronons):

\[
\mathrm{insert}(\sigma(R), (a_1, \dots, a_n), t_{vn}) =
\begin{cases}
\sigma(R) \cup \{(a_1, \dots, a_n \mid t_{vn})\}, & \text{if } \neg\exists t_{vo} : (a_1, \dots, a_n \mid t_{vo}) \in \sigma(R) \quad \text{(i)} \\
\sigma(R) - \{(a_1, \dots, a_n \mid t_{vo})\} \cup \{(a_1, \dots, a_n \mid t_{vo} \cup t_{vn})\}, & \text{if } \exists t_{vo} : (a_1, \dots, a_n \mid t_{vo}) \in \sigma(R) \quad \text{(ii)}
\end{cases}
\]

Deletion in this case is just regular physical deletion of tuples, whose definition can be found e.g. in [EN00]. The update operation is defined exactly as shown above for the bitemporal case but operates on the valid time versions of insert and delete described here. As above, the first update operation in figure 4.2 can be used to change the validity of the information (Allen,Sales) from [40,59] to [20,69] in this case also. Updating the non-temporal attributes works exactly as in the bitemporal case (see the fourth and fifth operation in figure 4.2 where the information (Allen,Sales) is changed to (Allen,Accounting)). Note though that the execution of an insertion of (Scott,Research),[75,89] on the valid time version of relation EmpDept of figure 4.1 leads to a database state containing the single fact (Scott,Research) with validity [70,89]; this illustrates the fact that the model can be implemented in such a way that it operates inherently time-coalesced in the valid time version also, which is an important feature for implementation purposes.
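The coalescing behavior of the valid time version can be checked with a short Python sketch. It assumes that a temporal element is stored as a plain set of chronons and that intervals are only derived on output; the names `vt_insert` and `coalesce` are hypothetical.

```python
def vt_insert(rel, fact, tvn):
    """Valid-time insert: cases (i) and (ii) both reduce to a set union."""
    rel[fact] = rel.get(fact, set()) | set(tvn)


def coalesce(chronons):
    """Return the maximal closed intervals covered by a set of chronons."""
    intervals = []
    for c in sorted(chronons):
        if intervals and c == intervals[-1][1] + 1:
            intervals[-1][1] = c          # extend the current interval
        else:
            intervals.append([c, c])      # start a new interval
    return [tuple(i) for i in intervals]


emp_dept = {}
vt_insert(emp_dept, ("Scott", "Research"), range(70, 80))   # [70,79]
vt_insert(emp_dept, ("Scott", "Research"), range(75, 90))   # [75,89]

print(coalesce(emp_dept[("Scott", "Research")]))  # [(70, 89)]
```

Because overlapping and adjacent periods collapse in the set union, no separate coalescing pass over the stored tuples is needed.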

The BCDM as unification of most proposed temporal data models, as well as the base models of the BCDM, use tuple timestamping. Tuple timestamping was used since it could be implemented on top of relational systems with acceptable overhead. A more natural way of modeling seems to be attribute timestamping: objects of the real world have several attributes describing them. Some of these attributes are fixed, others vary over time. Therefore it is natural to use timestamping at the attribute level rather than at the object level, which is comparable to the tuple level. But as attribute timestamping was very difficult and inefficient to implement on top of relational systems, it was only considered on the conceptual level so far. Sections 7.4.2 and 8.3.1 will show that by using object-relational systems attribute timestamping may also be used on logical and physical levels and is very efficiently implementable in those systems.


4.2 Query Languages

Many query languages for temporal databases have been proposed so far. They were mostly based on well-known database query languages such as SQL, Datalog or Quel. [RP92] summarizes some of them paying special attention to their semantics, whereas [EWK93] present an integrated EER based model and query language. From a practical point of view a language based on SQL, which is widely used as the standard query language, is most likely to be used for applications. For that reason almost all researchers working on temporal querying jointly designed TSQL2 ([Sno95]) and based it on SQL. Its goal was to unite the advantages of most previously proposed query languages in order to obtain a consensus language. Requirements derived in this process that are fundamental for all temporal query languages will be briefly presented in section 4.2.1. After the inception of TSQL2, research worked towards an even more practical language for the implementation of temporal DBS. One example of those languages is ATSQL ([BJ96]), whose features but also drawbacks are discussed in section 4.2.2. We tried to remedy some of those drawbacks in our work presented in section 8.3.1.

4.2.1 Requirements

In many cases data in temporal database systems will be an extension of existing non-temporal data in a traditional database system. Therefore it is important to extend a well-known and widely used query language to handle temporal data. Another reason is that such an extension is much easier for database experts to learn and is therefore more likely to be widely accepted. Important requirements are defined formally in [BJ96] and will only be explained briefly in this section.

Any statement valid in the traditional query language (i.e. SQL in our case) should still be valid in the temporal query language and yield the same result as the same statement on the non-temporal system. This is called upward compatibility. Queries falling into this requirement category are also called upward compatible. The second important requirement is that for traditional non-temporal relations that are extended to also contain temporal information the database state reflecting current information in the database should be the same regardless of the order of applying update operation and temporal extension. Queries requiring this level of temporal support are called temporal upward compatible.

A further requirement is snapshot reducibility (this term was already used in [Sno87]), which ensures that query results on a temporal table will be the same regardless of the order of applying the query and the projection onto the particular time slice. Queries that satisfy this property are called sequenced. Finally, to be able to use time explicitly in queries, e. g. to answer queries like "Find all employees that got a salary raise in the last three years", non-sequenced queries are required. A temporally complete ([BJS95]) temporal query language offers this query type, which operates on multiple snapshots of the database to answer a single query. In order for the user to use time as desired, certain functions on temporal intervals like contains and overlaps should be provided by such an extension (see e. g. [Ste98] for details).
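Snapshot reducibility can be made concrete with a small experiment. The following Python sketch is not taken from the cited work: the relation layout, the integer chronons and the helper names (timeslice, select_snapshot, select_sequenced) are assumptions chosen for illustration only. It checks that querying first and slicing afterwards gives the same rows as slicing first and querying afterwards.

```python
# Temporal relation with tuple timestamping: (name, salary, vt_start, vt_end).
# Valid time is the half-open interval [vt_start, vt_end) over integer chronons.
EMP = [
    ("Miller", 5000, 3, 7),
    ("Miller", 5500, 7, 12),
    ("Scott", 5300, 4, 12),
]

def timeslice(relation, t):
    """Project a temporal relation onto the snapshot valid at chronon t."""
    return {row[:-2] for row in relation if row[-2] <= t < row[-1]}

def select_snapshot(snapshot, min_salary):
    """Ordinary non-temporal query: names with a salary above min_salary."""
    return {(name,) for name, salary in snapshot if salary > min_salary}

def select_sequenced(relation, min_salary):
    """Sequenced version: same predicate, the result keeps the valid time."""
    return [(name, s, e) for name, salary, s, e in relation if salary > min_salary]

def is_snapshot_reducible(relation, min_salary, chronons):
    """Query-then-slice must equal slice-then-query at every chronon."""
    return all(
        timeslice(select_sequenced(relation, min_salary), t)
        == select_snapshot(timeslice(relation, t), min_salary)
        for t in chronons
    )
```

Calling is_snapshot_reducible(EMP, 5250, range(0, 15)) compares both evaluation orders at every chronon of the sample data.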


4.2.2 VTSQL2 - An Example

The language VTSQL2 was developed in [Bei01] and is based on ATSQL ([BJ96]). It provides valid time support for temporal relations with attributes based on the tuple timestamping approach and satisfies most of the requirements mentioned in the preceding section.

In particular the language VTSQL2 provides not only temporally complete queries, but also temporally enhanced data manipulation and definition operations. Temporal tables are defined as in regular SQL with the addition of the keyword AS VALIDTIME at the end:

CREATE TABLE emp (
    no       NUMBER,
    name     VARCHAR(20),
    salary   NUMBER,
    supvisno NUMBER
) AS VALIDTIME;

Similarly, current non-temporal tables can be supplied with temporal support by using an ALTER TABLE command combined with ADD VALIDTIME. All existing tuples are modified to be valid from now on.

In the group of data manipulation operations the INSERT command is fully temporally complete. Valid time information is inserted into a relation by adding the valid time interval to the other information to be inserted for each tuple.

INSERT INTO emp (
    VALUES (12, 'Miller', 5000, 13) PERIOD [01.03.2000, 01.07.2000),
    VALUES (12, 'Miller', 5500, 13) PERIOD [01.07.2000, FOREVER)
);

If a regular INSERT is used on a temporal table, the temporal information is set to [now, forever] to be temporally upward compatible. Similarly the UPDATE command operates temporally upward compatible if used in the classical SQL syntax. The snapshot reducible version is obtained by adding the keyword VALIDTIME in front of the statement:

VALIDTIME UPDATE emp SET supvisno = 27 WHERE salary = 5000;

This statement operates only on tuples and for times where the WHERE clause is satisfied. The time of operation may additionally be restricted to a certain time interval by specifying that interval after the VALIDTIME keyword. Similar to the UPDATE command, different versions of the DELETE command are also provided.

Queries in VTSQL2 are both upward and temporally upward compatible, since regular SQL queries are executed regularly on non-temporal tables and with respect to current validity on temporal tables. Sequential extension is achieved by using the VALIDTIME keyword:


VALIDTIME SELECT a.name FROM emp a WHERE salary > 5250;

This query retrieves all rows from the temporal table together with their valid time information (the time when the WHERE clause is satisfied) regardless of their validity. Results of sequential queries are always temporal tuples. Since sequential queries operate on one snapshot at a time, no coalescing of identical (except for the valid time) result rows is possible. Consequently results such as the following are possible:

Name    VT
Miller  [01.07.2000, 01.10.2000)
Scott   [01.04.2000, 01.08.2000)
Scott   [01.08.2000, 01.11.2000)
Scott   [01.11.2000, FOREVER)
Head    [01.09.2000, FOREVER)

If this is not desired by the user, an explicit command to coalesce the result intervals can be issued by using the keyword COALESCE:

VALIDTIME COALESCE SELECT a.name FROM emp a WHERE salary > 5250;

This query results in the usually more desired tuples:

Name    VT
Miller  [01.07.2000, 01.10.2000)
Scott   [01.04.2000, FOREVER)
Head    [01.09.2000, FOREVER)
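The effect of coalescing can be reproduced outside the database. The following Python function is only a sketch under assumptions: periods are half-open pairs of month numbers within the year 2000, FOREVER is represented by float("inf"), and the function name coalesce is chosen here for illustration and does not describe how VTSQL2 implements the keyword.

```python
FOREVER = float("inf")

def coalesce(rows):
    """Merge value-equivalent rows whose half-open periods meet or overlap.

    rows: iterable of (value, start, end). The result is sorted by value.
    """
    result = []
    for value, start, end in sorted(rows):
        if result and result[-1][0] == value and start <= result[-1][2]:
            # Same value and no gap in time: extend the previous period.
            prev = result[-1]
            result[-1] = (value, prev[1], max(prev[2], end))
        else:
            result.append((value, start, end))
    return result

# The uncoalesced example result, with months standing in for dates.
rows = [
    ("Miller", 7, 10),
    ("Scott", 4, 8),
    ("Scott", 8, 11),
    ("Scott", 11, FOREVER),
    ("Head", 9, FOREVER),
]
coalesced = coalesce(rows)
```

For the sample rows the three Scott periods collapse into one row from month 4 to FOREVER, which corresponds to the coalesced table above (up to row order).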

Finally non-sequential queries are facilitated by adding the keyword NONSEQUENCED ahead of a sequential query statement. These queries return a non-temporal result, but time can be used explicitly in these queries since they can operate on different snapshots at once. The operator VALIDTIME can be applied to a tuple or single attribute to specify time explicitly in a statement either in the select or in the where clause. The following query retrieves all employees that received a salary raise:

NONSEQUENCED VALIDTIME
SELECT a.name
FROM emp a, emp b
WHERE a.RWO = b.RWO
  AND a.salary < b.salary
  AND VALIDTIME(a) MEETS VALIDTIME(b);
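To illustrate what the non-sequenced query above computes, the following Python sketch evaluates the same self-join on a small in-memory list. The sample rows, the dictionary layout and the reading of MEETS as "the first period ends exactly where the second one starts" are assumptions made for this illustration; the set comprehension also removes duplicate names, which the SQL statement itself does not request.

```python
# Assumed stand-in for the temporal table emp; "rwo" plays the role of the
# implicit time-invariant object identifier, "vt" is a half-open period.
EMP = [
    {"rwo": 1, "name": "Miller", "salary": 5000, "vt": (3, 7)},
    {"rwo": 1, "name": "Miller", "salary": 5500, "vt": (7, float("inf"))},
    {"rwo": 2, "name": "Scott", "salary": 5300, "vt": (4, float("inf"))},
]

def meets(a, b):
    """Period a MEETS period b: a ends exactly where b starts."""
    return a[1] == b[0]

def salary_raises(emp):
    """Names of objects with two consecutive versions and a higher later salary."""
    return sorted({
        a["name"]
        for a in emp
        for b in emp
        if a["rwo"] == b["rwo"]
        and a["salary"] < b["salary"]
        and meets(a["vt"], b["vt"])
    })
```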

The implicit attribute RWO is used as a time-invariant object identifier to ensure that the same real world object is selected. This is one of the weaknesses of the language, which is due to technical implementation details and should be simplified in a future version. As stated before, the VALIDTIME function may also be used in a select clause to retrieve time explicitly; by using an additional DISTINCT an implicit coalescing of result intervals is performed. The semantic meaning of this fusion is questionable since separate real world objects may also be fused.


NONSEQUENCED VALIDTIME
SELECT DISTINCT a.name, a.salary, VALIDTIME(a.salary)
FROM emp a;

All tuples with the same name and salary information will be coalesced into one result tuple, which may not be the desired mode of operation.

In summary the language VTSQL2 is a promising first step towards temporal enhancement of ORDBS. Its main features are temporally complete query facilities and also temporally enhanced DML and DDL statements. But the most important feature is definitely that it has been implemented, and moreover on top of a commercial ORDBS, providing a simple graphical user interface for end users. Details about the language and implementation of VTSQL2 can be found in [Bei01].


As described in section 4.1, the orthogonal dimensions of valid time and transaction time can occur in temporal database applications. In this work index structures for these applications are investigated by showing that they are closely related to spatial index structures, which have been discussed in the previous chapter. A very detailed description of index structures for transaction time databases as well as for bitemporal databases can be found in [MTT00]. This reference also shows how valid time indexes can be reduced to interval indexing; examples of indexing structures for intervals include the interval tree ([Ede83]), the segment tree ([Ben77, Meh84]) and the priority search tree ([McC85]). Details on how these index structures can be used with external storage can also be found in [MTT00].

The problem of efficiently managing valid time data can be reduced to the problem of efficiently managing sets of intervals. This is because the time a fact or object is valid in the real world can in most cases be described by one or more intervals characterizing the respective time(s). An exception is periodic information, which needs to be modeled and stored differently (see footnote 3) and is not considered here. As long as only valid time is considered, the intervals are one-dimensional. The most important type of query is the one-dimensional stabbing query, which finds all intervals in a set that contain a given query point. It is important because it can be used to solve the more general one-dimensional interval management problem, which is to determine all intervals in a set that overlap a given query interval. We will show the relation between the two in the following section.

4.3.1 One-Dimensional Interval Management Problem

An interval is an ordered pair of the form [l, r] where l ≤ r, that specifies the duration of a property. l and r are called the left and right endpoints of the interval, respectively.

Footnote 3: It may e. g. be stored by using base interval and periodicity. It requires a lot of additional application logic to be interpreted correctly.


Intervals in one dimension, i. e. intervals that lie on a line as in our case, have l, r ∈ ℝ.

Definition (One-Dimensional Dynamic Interval Management Problem): Given a set S of one-dimensional intervals and a query interval Q = [q1, q2], find all intervals in S that intersect interval Q.

This problem has been dealt with in computational geometry for quite a while and therefore several proposals for solutions exist. Since in a database environment data stored in the database usually changes over time, the problem has to be solved efficiently for changing sets S, which is why we call the problem dynamic.

To solve the one-dimensional dynamic interval management problem we need to find all intervals I that belong to one of the four groups illustrated in figure 4.3.

Figure 4.3: Answers to the one-dimensional dynamic interval management problem (a query interval Q = [q1, q2] on the time axis and answer intervals [i1, i2] of types 1 to 4)

The intervals of types 1 and 2 have the common characteristic that their left endpoint is contained in the query interval Q. Therefore these intervals can be retrieved efficiently using a regular B+- or B*-Tree on the left endpoints of all intervals in the set S. Since database systems provide this type of index, we can use these standard indexes for finding intervals of types 1 and 2. Intervals of types 3 and 4 have the common characteristic that they contain the left endpoint q1 of the query interval Q. Finding such intervals requires efficiently answering the one-dimensional stabbing query.

Definition (One-Dimensional Stabbing Query): Given a set S of one-dimensional intervals and a query point q1, find all intervals in S that contain q1.
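The reduction described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the implementation used in this work: a sorted list with binary search stands in for the B+-Tree on left endpoints, the stabbing part is answered by a plain scan, and the function name intersecting is hypothetical. Intervals starting exactly at q1 are counted in the range part only, so no interval is reported twice.

```python
from bisect import bisect_left, bisect_right

def intersecting(intervals, q1, q2):
    """Return all closed intervals (l, r) that intersect the query [q1, q2]."""
    by_left = sorted(intervals)            # stands in for a B+-Tree on left endpoints
    lefts = [l for l, _ in by_left]
    # Types 1 and 2: the left endpoint lies inside the query interval (range scan).
    lo, hi = bisect_left(lefts, q1), bisect_right(lefts, q2)
    range_part = by_left[lo:hi]
    # Types 3 and 4: the interval starts before q1 and contains q1 (stabbing query).
    # A linear scan is used here; a stabbing index would replace it in practice.
    stab_part = [(l, r) for l, r in by_left[:lo] if r >= q1]
    return stab_part + range_part
```

The result is the union of a range query on the left endpoints and a stabbing query at q1, which is exactly the decomposition into types 1/2 and types 3/4.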

To efficiently retrieve all intervals that answer the one-dimensional stabbing query we cannot use standard database index structures. As described in [MTT00] there exist three widely used main memory index structures for this problem: the Interval Tree ([Ede83]), the Segment Tree ([Meh84]) and the Priority Search Tree ([McC85]). All these structures answer the stabbing query using O(log_2 n + t) comparisons if there are t answers to the query. They support dynamic updates of S in O(log_2 n) comparisons per update. Therefore these are time-optimal solutions. The Segment Tree needs O(n log_2 n) space, which is suboptimal as the other two only need O(n) space.
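As an illustration of the first of these structures, the following Python class sketches a centered interval tree for stabbing queries. It is a simplified, static variant written for this explanation: it is built once from a given set of closed intervals and does not support the dynamic updates discussed above, and the class and method names are assumptions.

```python
class IntervalTree:
    """Static centered interval tree over closed intervals (l, r)."""

    def __init__(self, intervals):
        intervals = list(intervals)
        self.empty = not intervals
        if self.empty:
            return
        # The median endpoint splits the remaining intervals roughly in half.
        endpoints = sorted(p for iv in intervals for p in iv)
        self.center = endpoints[len(endpoints) // 2]
        here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        # Intervals containing the center, sorted to allow early termination.
        self.by_left = sorted(here)                           # ascending left endpoint
        self.by_right = sorted(here, key=lambda iv: -iv[1])   # descending right endpoint
        self.left = IntervalTree(iv for iv in intervals if iv[1] < self.center)
        self.right = IntervalTree(iv for iv in intervals if iv[0] > self.center)

    def stab(self, q):
        """Return all stored intervals that contain the point q."""
        if self.empty:
            return []
        if q < self.center:
            hits = []
            for iv in self.by_left:
                if iv[0] > q:
                    break
                hits.append(iv)
            return hits + self.left.stab(q)
        if q > self.center:
            hits = []
            for iv in self.by_right:
                if iv[1] < q:
                    break
                hits.append(iv)
            return hits + self.right.stab(q)
        return list(self.by_left)
```

At each node only the intervals that are actually reported are touched (plus one terminating comparison), and the search descends into a single subtree.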

Since we are not dealing with main memory applications but rather with database applications, we need to use efficient external memory index structures. It is shown in [MTT00] that a worst-case optimal external memory solution to the one-dimensional stabbing query uses O(n/B) space and takes O(log_B n) time per update and O(log_B n + t/B) time to answer the query, where B denotes the number of entries per disk block. In the literature many almost optimal index structures were proposed (e. g. [AV96, KS91, BG90, LT98]). These are mostly of theoretical interest since they try to achieve a good worst-case performance and moreover are very complex to implement. They will not be considered any further in this work, since spatial indexing has already shown that good average case behavior is more important. Detailed references may be found in [MTT00].

For good average case behavior, which is most important for many applications, methods based on multidimensional access methods (see section 3.3) should be preferred ([MTT00]). These may include simply using a traditional R-Tree variant on two-dimensional point data (transforming intervals into two dimensions by using one dimension for the left and one for the right endpoint) or using the Segment R-Tree ([KS91]), which is also an R-Tree variant with special features for intervals. Another approach is to use the Time-Polygon Index ([SOL94]), which uses special shapes in the R-Tree nodes and uses left endpoint and interval length as the two dimensions. Among other temporal index structures in the literature that are close to implementation level are the RI-Tree ([KPS00]), the GR-Tree ([BJSS98]) and the 4R-Tree ([KTF98]). Since in this work temporal information will be considered combined with domain-specific information, those pure temporal indexes are not directly applicable and are thus not considered in more detail. For details on pure temporal indexes the original articles or a more comprehensive presentation such as [MTT00] should be consulted.
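The transformation of intervals into two-dimensional points can be illustrated as follows. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, and a linear scan stands in for the window search that an R-Tree variant would perform. An interval [l, r] overlaps the query [q1, q2] exactly if l ≤ q2 and r ≥ q1, which is a (half-unbounded) window in the (l, r) plane.

```python
def to_point(interval):
    """Map the interval [l, r] to the point (l, r) in the plane."""
    l, r = interval
    return (l, r)

def to_time_polygon_point(interval):
    """Alternative mapping to (left endpoint, interval length)."""
    l, r = interval
    return (l, r - l)

def overlap_query_window(q1, q2):
    """Window in (l, r) space for the overlap query: (-inf, q2] x [q1, +inf)."""
    return ((float("-inf"), q2), (q1, float("inf")))

def in_window(point, window):
    """Closed containment test of a 2D point in an axis-parallel window."""
    (x_lo, x_hi), (y_lo, y_hi) = window
    return x_lo <= point[0] <= x_hi and y_lo <= point[1] <= y_hi

def overlapping(intervals, q1, q2):
    """Answer the interval overlap query as a 2D window query.

    The linear scan stands in for an R-Tree window search.
    """
    window = overlap_query_window(q1, q2)
    return [iv for iv in intervals if in_window(to_point(iv), window)]
```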

All in all, standard multidimensional access methods like R-Trees and their variants seem to be a good choice for temporal data as well. Besides good average case behavior, this is also due to the easy extensibility of these methods to more dimensions. This extension can be performed either by introducing a second temporal dimension for transaction time as in section 4.3.2, or by combining the temporal information with domain-specific (see section 7.4.2) or spatial data (see section 7.4.3).

4.3.2 Two-Dimensional Interval Management Problem

In the case of bitemporal information each fact is timestamped not only by one but rather by two intervals (see footnote 4). Since the contents of the valid time information do not depend on the contents of the transaction time information, we obtain two orthogonal dimensions. Obviously bitemporal information can be recorded as a two-dimensional rectangle in the valid-transaction time plane. Thus the problem of managing bitemporal information can be reduced to the problem of managing two-dimensional rectangles. This problem has already been covered in the chapter on spatial databases. This will also show later in this work when modeling issues for bitemporal data are discussed and lead to spatial index structures being used for bitemporal data.

Footnote 4: The simplification of recording time as an interval is used here again. The problems associated with this simplification and remedies were already discussed in section 4.1.
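The view of a bitemporal fact as a rectangle can be sketched as follows. The Python code is a minimal illustration under assumptions: both time intervals are taken as half-open over integer chronons, and the class BitemporalFact as well as the function names are hypothetical. A bitemporal query then becomes a rectangle intersection test, which is the operation that spatial index structures support.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BitemporalFact:
    value: str
    vt: tuple  # valid time interval [vt_start, vt_end)
    tt: tuple  # transaction time interval [tt_start, tt_end)

    def rectangle(self):
        """Rectangle in the valid-transaction time plane: (x_lo, y_lo, x_hi, y_hi)."""
        return (self.vt[0], self.tt[0], self.vt[1], self.tt[1])

def rectangles_intersect(a, b):
    """Half-open rectangles intersect iff they overlap in both dimensions."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def bitemporal_query(facts, vt_range, tt_range):
    """Facts valid during vt_range as recorded in the database during tt_range."""
    window = (vt_range[0], tt_range[0], vt_range[1], tt_range[1])
    return [f for f in facts if rectangles_intersect(f.rectangle(), window)]
```

For example, a fact that was recorded with transaction time [1, 5) and later corrected by a version with transaction time [5, 10) is only matched in its corrected form by a query window whose transaction time range starts at 6.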

An important comment is necessary at this point: transaction time has different characteristics than valid time in some aspects. One is that transaction time always grows since no previous database states may be changed. Moreover in bitemporal databases facts are never deleted from the database since all information that was ever present in the database is kept. In that respect transaction time databases are append-only databases. Finally the current database state is very important and the transaction time information UC has to be stored in a special way to be able to retrieve the current state efficiently. Also updates are frequent on the current facts since they have to be updated with every change in transaction time. All these specialties have led to special treatment of transaction time in some proposals on temporal indexes (e. g. [CDI+97, BJSS98]). In this work a treatment at the physical or application level is preferred: one could e. g. encode UC as now plus two chronons and then fire a trigger to update this information whenever transaction time advances. This way efficient indexing of this information is also achieved.

As a final remark in this introductory section note that most work mentioned above is focused on the use of tuple timestamping as the basic model. In the later sections of this work it will be explained that attribute timestamping should be preferred for the given applications. In particular section 7.4.2 will present the advantages in logical modeling when using attribute timestamping and section 8.3.1 shows how this is integrated easily and supported well in the physical model.

Chapter 5

Spatio-Temporal Databases

Even though the field of spatio-temporal database systems (STDBS) has not been worked on for as long as spatial or temporal databases, there was substantial research on this topic in the late 1980s and early 1990s. This work is comprehensively gathered in [ATSS93]. After that there was a gap of about five years in which not much research on STDBS was published. In recent years it has become a very active and challenging field where many contributions have been made, which is due to a growing interest in such applications ([ACN+99]). There have also been workshops exclusively devoted to this area (e. g. [BJS99]). A comprehensive survey including most concepts developed previously can be found in [AR99], whereas [PFG00] overviews the field more analytically and also gives ideas for future research. Most of the research in STDBS was influenced by either spatial or temporal databases (see footnote 1), mostly not both though. Therefore the structure of this chapter is similar to the previous two: in the first section we describe modeling and querying of spatio-temporal data as well as general requirements and a structure for the whole area. In section 5.2 we describe index structures for querying spatio-temporal data; a new efficient structure to be used in ORDBMS for spatio-temporal data is proposed and evaluated extensively in section 8.2.3.

5.1 Requirements and Data Models

In general many structurally different applications have been termed spatio-temporal over time. In this work spatio-temporal data will be data that changes its spatial properties, and possibly domain-specific information as well, over time. The term spatio-temporal has also been used for many applications that use spatial as well as temporal data, but independently of each other. Since we already devoted sections to spatial and temporal data and applications of those types can be realized by simply putting together material from the last two sections, we will only consider applications with time-varying spatial features as spatio-temporal.

Footnote 1: There are also other approaches to STDBS. [CR99, GRS98] for instance extend the constraint database approach to STDBS.


Two important groups of these applications are formed by differentiating according to the temporal rate of change of the spatial features. If change is continuous over time the area is usually called moving objects. While the requirement for continuous change poses many challenges on modeling as well as index structures and is commercially very interesting in the time of mobile electronic equipment, not all applications require or are appropriately modeled by moving objects ([PFG00]). For many applications we need to model spatial features that change their spatial properties at discrete points in time. This assumption should enable easier integration of knowledge from temporal databases since these usually also assume discrete time steps (as represented by chronons). Therefore most of the work presented in this thesis will not deal with moving objects, but rather with spatial features that change in discrete steps.

Work on representation and querying of moving objects was presented e. g. by [GBEJL+00]. Despite many real-world applications for spatio-temporal data, not much work on applications has been published yet (one of the few exceptions is [ACN+99]). That is probably due to the emerging character of the field. Since application data to be used for testing purposes is rarely available, researchers need to use synthetic data sets. These can be generated in many different facets by using algorithms presented in [TSN99].

5.1.1 Modeling Spatio-Temporal Data

Several spatio-temporal data models have been proposed in the literature. Very few of those models have actually been implemented in database systems ([PFG00]). Among those [TH98] present a logical spatio-temporal data model with implementation ideas on top of relational database systems. The model is based on the requirements and notation specifications in [PT98]. While even bitemporal data is supported, discrete changes in time are assumed. Non-spatial temporal data is also supported. Disadvantages of the model are the missing temporal extent of geographic objects (it uses only snapshots), the missing use of object-relational concepts as well as the use of layers, which leads to restrictions in the modeling flexibility. This work has been further extended to the conceptual level with the proposition of STER in [TJ99]. This is an extension of the entity-relationship model for spatio-temporal data. Certain new annotation constructs are included and illustrated by their usage for a concrete application. As above bitemporal data is supported in discrete steps only and no physical model was proposed. Nevertheless these models show the importance of discretely changing spatio-temporal objects in practical applications. Moreover it becomes obvious that an integration of spatio-temporal components in all steps of the modeling process is required.

In [EGSV98] on the other hand an abstract as well as a discrete implementation model for moving objects is presented. The basic idea of how to introduce time, which also works for non-spatial continuously changing data, is to e. g. use a type mpoint instead of a basic point. mpoint is defined to be a function from the time domain into the basic type point. The same technique can also be applied to integer numbers to obtain a data type for temporally changing integers. The great advantage of these continuous models is that besides support for standard spatio-temporal operators like

at : mpoint × time → point

complex operators like

mdistance : mpoint × mpoint → mreal

which have continuously changing results themselves are supported. Since other operators like

trajectory : mpoint → line

are also provided, this model once implemented should be valuable for applications with moving objects. A comprehensive description of this work can also be found in [GBEJL+00]. This approach has been used for spatio-temporal partitions (which are similar to temporally changing maps) in [ES99]. In that article no individual objects with spatio-temporal features are considered, but rather spatial layers or maps that change continuously over time. Classical spatial layer operations like overlay or clipping are temporally enhanced and also temporal operations like temporal aggregation are introduced. This model is only conceptual and has not been translated into an implementable model to date. Even though the models in this paragraph are designed for moving objects, which will not be the subject of this work, the use of temporal versions of data types for temporal information will be employed in this work as well. Also the formally exact definition of new operations on these types and the treatment of the different stages of the modeling process can be used as good examples for other work.
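The idea of a moving point as a function from time into points can be sketched in Python. The representation below is an assumption made for illustration and is not the discrete model of [EGSV98]: the class MPoint stores timestamped samples with distinct timestamps and interpolates linearly between them; the method names follow the operator signatures at and trajectory given above.

```python
from bisect import bisect_right

class MPoint:
    """Moving point: a function from time to 2D points, stored as samples
    (t, x, y) with linear interpolation in between (assumed representation)."""

    def __init__(self, samples):
        self.samples = sorted(samples)
        self.times = [t for t, _, _ in self.samples]

    def at(self, t):
        """at : mpoint x time -> point (None outside the definition time)."""
        if not self.times[0] <= t <= self.times[-1]:
            return None
        i = bisect_right(self.times, t) - 1
        if i == len(self.samples) - 1:
            return self.samples[i][1:]
        (t0, x0, y0), (t1, x1, y1) = self.samples[i], self.samples[i + 1]
        f = (t - t0) / (t1 - t0)
        return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

    def trajectory(self):
        """trajectory : mpoint -> line, here the polyline through all samples."""
        return [(x, y) for _, x, y in self.samples]
```

A moving integer could be modeled in the same way by replacing the interpolated position with a stepwise constant value.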

A completely different approach, which is worth mentioning since it is closely related to established results from spatial and temporal databases, was undertaken in [Wor94]. It is similar to the realm approach ([GS95]) for spatial data; spatio-temporal objects are defined based on simplicial complexes ([Poi53]) in the spatial component which are annotated with bitemporal elements. The model presented is a conceptual model with many proposed spatial, temporal as well as spatio-temporal operators. It has never been translated into a logical or physical model unfortunately, since simplicial complexes are only rarely used in practical spatial databases. The method of combining previous research results from spatial and temporal database research may be taken as a role model though.

5.1.2 Querying Spatio-Temporal Data

One of the first proposals for a spatio-temporal query language can be found in [TJS98]. In that article a concept is presented that unifies spatial, temporal and spatio-temporal querying on a conceptual level. The language is conceptual since it facilitates queries with slices, ranges, elements or values in each dimension. Traditional temporal query languages can be embedded into the proposed notation. Assume a spatio-temporal precipitation table with domain attributes kind and amount in addition to a two-dimensional spatial attribute and a bitemporal attribute is given. We could ask for this year's rainfall history in Europe and North America by a query of kind S//E/E//R/S. That means we ask for a particular value of the domain attribute (first S) over a set of spatial intervals (E) in both dimensions over a valid time range (R) at the current transaction time (second S). While this is not suited for implementation, the proposal is very interesting for its analytical nature. In the paper several different query types together with examples are presented. The list of query types may be used as a reference for comparisons of spatio-temporal query languages.

An extension of [EGSV98] to querying which focuses on moving objects can be found in [EGSV99]. It focuses on the definition of the operators required to query moving objects and then uses standard SQL (SQL-99) together with these functions as user-defined functions for querying. The great advantage is that by using SQL and user-defined functions, which are available in object-relational databases (cf. section 2.3), the implementation should be fairly simple, although several difficulties which are also discussed have to be overcome (see also [FGNS00]). An example of such a query is:

SELECT a.flightno, b.flightno
FROM flight a, flight b
WHERE a.flightno <> b.flightno
  AND minvalue(mdistance(a.route, b.route)) < 500;

This query retrieves all flights that came closer than 500 meters to each other. It uses the spatio-temporal function mdistance to compute the distance depending on time in continuous temporal space and then minvalue to project a timestamped value to the value itself. This work is also described comprehensively in [GBEJL+00] together with modeling issues.
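How the two functions in the query interact can be illustrated with a small Python sketch. It rests on a strong simplifying assumption that is not part of [EGSV99]: both objects move with constant velocity, so the squared distance is a quadratic function of time and its minimum can be computed in closed form. The function names linear and minvalue_linear are hypothetical.

```python
import math

def mdistance(a, b):
    """Moving real: distance between two moving points as a function of time.

    a, b: callables mapping a time instant to an (x, y) position.
    """
    return lambda t: math.dist(a(t), b(t))

def linear(p0, v):
    """Moving point with start position p0 and constant velocity v (assumed model)."""
    return lambda t: (p0[0] + v[0] * t, p0[1] + v[1] * t)

def minvalue_linear(p0a, va, p0b, vb, t_start, t_end):
    """Minimum distance of two linearly moving points over [t_start, t_end].

    The squared distance is a quadratic in t, so the minimum lies at its
    vertex, clamped to the interval.
    """
    dx, dy = p0a[0] - p0b[0], p0a[1] - p0b[1]
    dvx, dvy = va[0] - vb[0], va[1] - vb[1]
    speed2 = dvx * dvx + dvy * dvy
    t_star = t_start if speed2 == 0 else -(dx * dvx + dy * dvy) / speed2
    t_star = min(max(t_star, t_start), t_end)
    return mdistance(linear(p0a, va), linear(p0b, vb))(t_star)
```

The WHERE condition of the query would then correspond to a comparison such as minvalue_linear(...) < 500 for each pair of distinct flights.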

The approach to define new data types for spatio-temporal objects together with the required functions on these types and then use these functions as user-defined functions in ORDBMS seems promising. It has the advantage of using a widely accepted query language with relatively few extensions. Nevertheless the implementation of the functions and operators may be a difficult task in itself and also efficient query support is required. Therefore the next section presents index structures which are the basis for efficient querying.


Since not even many data models or query requirements for spatio-temporal data have been published yet, it is no surprise that even fewer index structures can be found. That is because one needs to define operators first that can be supported by indexes in querying thereafter. Nevertheless several operators extended from spatial and temporal databases to the spatio-temporal domain will need index support. These are the areas that the index structures in this section address.

Extensions of the R-Tree (cf. section 3.3.2) for discretely moving points are presented briefly and compared in [NST99]. In particular they operate on two-dimensional points with valid time attributes that change their location at distinct points in time. The structures discussed are a regular R-Tree in three dimensions, the 2+3 R-Tree and the HR-Tree. The 2+3 R-Tree uses a two-dimensional R-Tree for the current position of the points and a three-dimensional R-Tree for the historical information. The drawback of having to search in two trees to find answers is remedied in the HR-Tree (proposed for spatial data in [KF94]) where certain paths for points that have changed their location are multiplexed. This structure is found to be optimal for the domain specified above.

A framework for a generic spatio-temporal index is presented in [RKS99]. This R-Tree based index was implemented in the CONCERT system and is generic because it does not depend on particular data types. It rather works on all types for which two particular methods (overlap and split) are provided. It is a very interesting approach because it is universal with regard to data types, which makes it similar to generalized search trees (GiST from [HNP95b]; see also section 8.2.2). The main difference is that GiST are more general whereas this approach is specific for spatio-temporal data. Nevertheless further research is required in order to be able to implement it on top of an ORDBS. Moreover no performance results have been reported yet and it is therefore difficult to judge its efficiency.

Finally [KGT99a] presents index structures for the nearest neighbor query on continuously moving points. The authors evaluate the performance of the different indexes on instantaneous as well as interval nearest neighbor queries. This is currently the only approach published that stores a motion function of the moving objects instead of storing their concrete position at many time instants. This is a very interesting direction for future research on highly mobile datasets. The performance results show that depending on the type of query (instantaneous or interval) either a B+-Tree variant or the hB-Tree performs best. Since this work does not focus on moving objects the reader is referred to [KGT99a] for details.

In summary no index structure for temporally changing extended objects (regions) has been proposed. Therefore in section 8.2.3 we will present a family of those, operating on extended two-dimensional objects that change their position and shape at discrete points in time. A performance evaluation will also be presented together with suggestions on which structure to use in which context.

Part II

Data Modeling with Object-Relational Databases

Chapter 6

General Issues and Case Studies

In this chapter we will explain the data modeling stage on the way to use ORDBMS for spatial, temporal or spatio-temporal applications (STOSTA for short). After some general remarks on object-oriented modeling which can also be applied to STOSTA, the main part of this chapter will feature two case studies: the first describes how to model spatial data from the German cartographic-topographic information system (ATKIS®) to be able to use it in a database system. The most interesting aspects are the integration of elevation data into two-dimensional base data and the usage of object-relational features to store geometric data. The second case study shows how to build the foundation for scientific database applications in the spatio-temporal domain as exemplified by sample data and methods from soil protection in physical geography.

6.1 General Issues

The de-facto standard in information system modeling nowadays is the Unified Modeling Language (UML; [RJB99]). It is important to note though that UML itself has to be understood as a toolbox containing lots of predefined types of presentations (called diagrams) to be used in modeling. UML alone does by no means tell how to model an information system.

We suggest following the process probably most closely related to UML (see footnote 1), which is called the unified software development process and is described in detail in [JBR99]. This process is so general that it can also be used for STOSTA. Since database applications tend to be data-centered, the central product of system analysis and design is the class diagram modeling all classes of real world objects to be stored in the database. Moreover, since individual scientific applications tend to serve only very few use cases (and only very few actors are involved in the use cases), the use case model need not be graphically specified.

After that the classic techniques on how to build a database schema from ER diagrams have to be adapted and employed to create an object-relational database schema from the class diagram. We follow similar steps as compared to traditional database design which is based on ER or EER diagrams. For this task the conceptual model is translated into a logical model which is dependent on the implementation paradigm used but not on the particular system. This model is usually specified in terms of database relations extended by type and method specifications as well as function and operator specifications for querying in object-relational systems. In spatio-temporal applications the kinds of spatio-temporal features supported and the real world object classes using them would be defined.

Footnote 1: This is probably due to the fact that the authors were the same for both the language and process specifications.

Phase                   Product
Requirements Analysis   Input to all other phases
Conceptual Modeling     Class Diagram (ER diagram)
Logical Modeling        Relations, Functions, Datatypes, Operators
Physical Modeling       Definition in database language, Access Structures

Figure 6.1: Steps in data modeling for object-relational DBMS

Finally the logical model is translated into a physical model. This model is now system-dependent and is specified by installation scripts as well as routines for data import and optimization tools. In the context of spatial data, for example, data types for spatial features, supported spatial operators and spatial indexes for query optimization would be declared. Since installation scripts tend to be lengthy and difficult to read, usually only the important changes and additions compared to the logical model are stated explicitly and described in regular writing. The process of database design described here is illustrated in figure 6.1.

The above steps of object-relational data modeling for STOSTA are now illustrated by means of two case studies. The first is a purely spatial application that needs to build on existing data formats, whereas the second is spatio-temporal and completely free in its design since it is the first model of that domain.

6.2 Case Study 1: ATKIS

The first case study, on topographic data, had the objective of developing a database model for these datasets; this section presents an overview. Since the focus was more on using the standard spatial extension of a database system than on developing a new data cartridge, the later stages of the development process, especially physical design, were less detailed and elaborate than usual. Still, as a motivational example this work led to important insights that were used in the physical optimization techniques presented in section 8. A detailed description of this case study can be found in the original work ([Fal00, KFL00]).


6.2.1 Analysis

In Germany cartographic-topographic base data has been recorded in the so-called Amtliches Topographisch-Kartographisches Informationssystem (ATKIS; official topographic-cartographic information system). This is not a physical information system, but rather a collection of formats, rules, and descriptions that define how landscape and terrain data are to be recorded. Attributes of the landscape objects to be recorded and their possible values as well as layers to which these objects belong are standardized in the ATKIS documentation ([KS95]). Moreover, for certain scales a catalog of standard object kinds is defined which states the objects to be recorded at that scale.

Extracts from these data sets, which are maintained by the sixteen federal states of Germany (there is also a large-scale extract which is maintained federally), can be ordered from governmental agencies in these states. Data is usually exchanged in the so-called EDBS² (einheitliche Datenbankschnittstelle; unified database interface) format. This format has been used for several years and was developed in order to minimize file size for exchange; one of its main goals was therefore minimal redundancy in contents. To record agricultural areas, for instance, the borderline geometry between the areas is stored only once, and it carries reference attributes telling which objects are to the left and right of the line. Since the original ATKIS landscape model is itself object-structured, it is difficult to restore the landscape objects from an EDBS data set.
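To illustrate why restoring landscape objects from such a representation takes extra work, the following minimal sketch groups boundary lines by the objects they reference. The record layout (`BoundaryLine` with `left` and `right` identifiers) and the sample values are hypothetical simplifications chosen for illustration; they do not reproduce the actual EDBS grammar.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundaryLine:
    """Hypothetical stand-in for a line record that is stored only once."""
    points: tuple           # sequence of (x, y) vertices
    left: Optional[str]     # identifier of the object left of the line, if any
    right: Optional[str]    # identifier of the object right of the line, if any


def lines_per_object(lines):
    """Collect, for every referenced object, the boundary lines delimiting it."""
    result = defaultdict(list)
    for line in lines:
        # A set avoids adding a line twice if both sides name the same object.
        for obj_id in {line.left, line.right} - {None}:
            result[obj_id].append(line)
    return dict(result)


lines = [
    BoundaryLine(((0, 0), (0, 1)), left=None, right="A"),
    BoundaryLine(((1, 0), (1, 1)), left="A", right="B"),  # shared border, stored once
    BoundaryLine(((2, 0), (2, 1)), left="B", right=None),
]
by_object = lines_per_object(lines)
print(sorted(by_object))
print(len(by_object["A"]), len(by_object["B"]))
```

Each object is only implicitly present as the set of lines referring to it; assembling closed polygons from these lines would be a further step that is not shown here.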

Another interesting aspect of ATKIS is that the landscape model itself is almost completely separated from its presentation: there is an additional signature catalog defining rules on how to generate standard maps from the ATKIS landscape model data. One could, e. g., define different rules on how to create a large-scale map and a small-scale map using the same basic landscape data. Section 9.2.1 presents concepts showing how ATKIS data may be exchanged using XML, and ideas on how XSL can then be used to generate the representation as an implementation of the signature catalog.

Despite its conceptual advantages, obtaining and using ATKIS data is very inefficient. In cooperation with the supplier, one has to define exactly which data extracts from a fixed set of possible ranges are desired, order the desired data sets and then operate on the file-based data. If the data is to be used in a different format, a parser and a converter have to be written. For this task one needs to understand the grammar of the exchanged files in addition to the ATKIS format itself.

In order to increase efficiency in the use of ATKIS data we propose a database schema for an object-relational database system in section 6.2.3. It is influenced by a relational schema for ATKIS data presented in [NK95, Koc94]. The schema enables easy maintenance and extract generation on the supplier side as well as almost object-oriented usage of data on the customer side with full database functionality, especially support for querying. For portability reasons we have implemented an import routine from file-based EDBS data into the proposed database schema.

² There are also other formats used for interchange, but they are semantically weaker in the sense that they contain only geometric shapes and no thematic information.


6.2.2 Conceptual Data Model

To store ATKIS digital landscape data in an object-relational database system we need to create a database schema for all objects to be stored. We present a UML class diagram of all relevant classes in figure 6.2. We include all ATKIS landscape objects from the original standard ([KS95]) together with their attributes, names and layers. We also include complex objects and overpasses.

[Figure 6.2: UML class diagram; only its textual content is reproduced here.]

Classes and attributes:
  ATKISObject      ObjID : VARCHAR2(7), ModelType : VARCHAR2(2), BuildDate : Date, Coordinates : Point
  ObjectPart       ObjectPartID : VARCHAR2(3)
  ObjectGeometry   Geometry : SDO_GEOMETRY
  ObjectName       Type : VARCHAR2(2), Value : VARCHAR2(31), Positions : Geo_Table_Type
  ObjectAttribute  (no attributes of its own)
  Layer            ID : VARCHAR2(4), Name : VARCHAR2(100)
  ObjectKind       ID : VARCHAR2(4), Name : VARCHAR2(100)
  AttributeType    AttributeTypeNumber : VARCHAR2(4), Name : VARCHAR2(100)
  AttributeValue   Value : VARCHAR2(4), Name : VARCHAR2(100)

Associations: belongs to (twice), denotes, consists of, is part of, possesses (twice), defines shape of, defines 2D shape of, is above, is of, is described by, is explained by.

Figure 6.2: Conceptual data model for ATKIS

Central to this model is the class ATKISObject³. It represents any real-world landscape object to be recorded in ATKIS. Standardized attributes of ATKISObject are modeled by associations with the classes modeling this standard information: there is exactly one ObjectKind for each ATKISObject (1 : n-association belongs to in figure 6.2). Similarly, an association between ATKISObject and Layer is defined. Objects can optionally carry different types of names, which is modeled by the association denotes. For each of these names one can define one or more positions where the name should appear on a map; e. g. an interstate highway number will appear in more than one place on a map.

³ We try to emphasize the difference between objects in the object-oriented sense (instances of a class) and in the ATKIS sense (landscape objects of the real world) by denoting the latter ATKISObject. In some cases where the context should be clear, however, we omit the prefix ATKIS even for objects in the ATKIS sense.

Complex objects (objects that themselves consist of several objects, e. g. a railroad network consisting of railroad tracks) are modeled by the n : m-association consists of. Another way of structuring ATKIS objects is by using object parts. The difference between the two is of a thematic nature and is described in detail in [KS95]. The geometric shape of objects can be attached to the object itself or to its object parts; therefore we introduce ObjectGeometry as a class of its own. That is, for a particular object either the association between ObjectGeometry and ATKISObject or the one to ObjectPart is instantiated. At the time of the case study only non-temporal geometries were considered, i. e. the same object could not have different geometries at different times. Data was considered to reflect the current situation⁴. A recently developed successor of ATKIS (cf. [AdV02]) requires management of geometries changing over time. Section 7.2.4 will show how this requirement can be satisfied in the conceptual model.

Similar to the modeling of complex objects, we model overpass information by the n : m-association is above; this association is between ObjectGeometry and itself since this information is attached to the two-dimensional geometric information of objects or object parts.

Finally, information about the ATKIS objects (called attributes⁵) is modeled by class ObjectAttribute and association possesses. Since in the new AAA model ([AdV02]) attributes and also object names are required to be versioned, i. e. time-dependent, section 7.2.4 shows how this can be incorporated into the model.

The classes AttributeType and AttributeValue are used to record standardized attributes and their values (of course, some types like the width of a street do not have standardized values). The associations is of and is described by ensure that attributes of ATKIS objects observe these standards. In summary, the classes Layer, ObjectKind, AttributeType and AttributeValue form the standardized ATKIS catalogue, whereas the other classes correspond to a particular dataset.

6.2.3 Logical Model: Object-Relational Database Schema

The conceptual data model of the last subsection has been implemented in the object-relational database management system Oracle8i and 9i⁶ using the extension for spatial data (Oracle Spatial). The transformation of the conceptual model into an object-relational model follows standard rules for such transformations (see e. g. [SB99] for details). The database schema we obtain is shown in figure 6.3.

⁴ In other words, all information was assumed to have the timestamp now.

⁵ The term attribute is also overloaded in this section. It appears in two contexts but with similar meanings: an attribute in the object-oriented sense is a property of an instance of a class, whereas an ATKIS attribute is a standardized property of an ATKIS landscape object.

⁶ We have chosen Oracle for availability reasons, not based on a comparison with other systems.

Layer (LayerNo:VARCHAR2(3), Name:VARCHAR2(100))
ObjectKind (ObjKindNo:VARCHAR2(4), Name:VARCHAR2(100))
AttributeType (AttTypeNo:VARCHAR2(4), Name:VARCHAR2(100))
AttributeValue (TypeNo→AttributeType, Value:VARCHAR2(7),
    Name:VARCHAR2(100))
ATKISObject (ObjNo:VARCHAR2(7), LayerNo→Layer,
    ObjectKindNo→ObjectKind, Coordinate:PointType,
    Actual:VARCHAR2(2), ModType:VARCHAR2(2), BuildDate:Date)
ObjectGeometry (ObjectNo→ATKISObject, ObjectPartNo:VARCHAR2(7),
    Geometry:SDO_GEOMETRY)
Overpass (TopNo→ATKISObject, TopPartNo:VARCHAR2(3),
    DownNo→ATKISObject, DownPartNo:VARCHAR2(3))
ComplexObject (BigNo→ATKISObject, SmallNo→ATKISObject)
ObjectName (ObjectNo→ATKISObject, TypeNo:VARCHAR2(2),
    Value:VARCHAR2(31), Geo:GEOTABLE_TYPE)
ObjectAttribute (ObjectNo→ATKISObject, ObjectPartNo:VARCHAR2(3),
    Type→AttributeType, Value→AttributeValue)

Figure 6.3: Object-relational database schema for ATKIS

The classes for standardized information are directly transformed into relations of the same name. The association between AttributeValue and AttributeType is realized by a foreign key constraint.

The geometry information in class ObjectGeometry is mapped to relation ObjectGeometry. It can be stored in one single attribute of type SDO_GEOMETRY, which is the type for spatial data supplied by Oracle8i Spatial (it follows the recommendation of the OpenGIS consortium, [OGC99a]).

All other information about a single object can be found in relations ATKISObject, ObjectName, and ObjectAttribute. The class ObjectName carries an attribute of a complex type; this attribute is modeled in the object-relational schema by a user-defined type (GeoTableType). This type is defined to be a table of positions (a nested table type in object-relational terminology) and enables storing multiple positions for each name.

The relation ObjectAttribute models the class of the same name. It has no attributes of its own (except for ObjectPartNo) but rather references the standardized attributes and values via foreign keys.

Finally, the n : m-associations consists of and is above are modeled by their own relations, ComplexObject and Overpass, which denote the thematic information carried by these associations. In order to ensure data integrity we have again used foreign keys (an object overpassing another object must be present in the data set).


In this schema we used complex data types for columns of database tables. These types not only combine different attributes, but also provide type-specific methods. This is close to object-oriented data modeling, where real-world objects described by their attributes are integrated with their behavior described by methods. Moreover, we used a nested table type for storing the positions of object names on a map, since their cardinality is not known in advance. These features are specific to object-relational databases and could not be used in purely relational systems. The use of other object-oriented concepts such as inheritance or polymorphism was not considered since they are not supported by the current version Oracle 9.0.1 of the object-relational database system⁷; thus the schema is still object-relational and not object-oriented. In fact, the necessity and benefits of using these concepts at this stage are not immediately obvious; modeling without further object-oriented concepts seems to be fully sufficient.
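The effect of the foreign keys on the relation ComplexObject can be sketched as follows. This is only an approximation under stated assumptions: it uses SQLite through Python's `sqlite3` module instead of Oracle, replaces VARCHAR2 by TEXT, omits all Oracle-specific types (SDO_GEOMETRY, nested tables) and most columns, and all inserted values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Layer (LayerNo TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE ObjectKind (ObjKindNo TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE ATKISObject (
    ObjNo        TEXT PRIMARY KEY,
    LayerNo      TEXT NOT NULL REFERENCES Layer(LayerNo),
    ObjectKindNo TEXT NOT NULL REFERENCES ObjectKind(ObjKindNo)
);
-- n : m-association "consists of" as a relation of its own
CREATE TABLE ComplexObject (
    BigNo   TEXT NOT NULL REFERENCES ATKISObject(ObjNo),
    SmallNo TEXT NOT NULL REFERENCES ATKISObject(ObjNo),
    PRIMARY KEY (BigNo, SmallNo)
);
""")

# Illustrative catalogue entries and objects (values are invented).
conn.execute("INSERT INTO Layer VALUES ('L1', 'illustrative layer')")
conn.execute("INSERT INTO ObjectKind VALUES ('K1', 'illustrative object kind')")
conn.executemany(
    "INSERT INTO ATKISObject VALUES (?, 'L1', 'K1')",
    [("NET1",), ("TRK1",)],
)
conn.execute("INSERT INTO ComplexObject VALUES ('NET1', 'TRK1')")

# Referencing an object that is not in the data set violates the foreign key.
try:
    conn.execute("INSERT INTO ComplexObject VALUES ('NET1', 'MISSING')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM ComplexObject").fetchone()[0]
print(rejected, row_count)
```

The relation Overpass could be treated analogously, with two references to ATKISObject plus the part numbers.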

6.2.4 Physical Design

Because of the focus on the earlier design stages in this study, the physical design consists mainly of introducing spatial indexes for efficient data management. Since data types of Oracle Spatial were used, the indexes provided with that cartridge can also be used for physical optimization; Oracle Spatial is described in more detail in section 8.1. Data types differing from the logical model were not introduced on the physical level.

With the newly introduced requirements for adding temporal information to ATKIS in the AAA model, the standard database features are no longer sufficient for an appropriate physical model. Therefore section 8, and especially sections 8.2 through 8.4, present detailed research on efficiently modeling such temporally dependent data types by using object-relational features such as user-defined types and query optimization.

6.2.5 Integration of Elevation Information

Elevation information from the digital terrain model makes the two-dimensional landscape model of ATKIS even more valuable. Therefore it has to be integrated into the conceptual as well as the logical model. A concept for the conceptual integration can be found in figure 6.4.

⁷ Due to the closeness of logical and physical design in this study, the logical design is currently system-dependent.

[Figure 6.4: UML class diagram. Classes: ObjectGeometry (Geometry : SDO_GEOMETRY); ElevationModelPoint (GaussKrugerCoordinate, Height); ElevationModelPointType (ID, PointType, Interpolated). Associations: describes elevation of (1..* to 1..*); is classified by (1..1 to 1..*).]

Figure 6.4: Concept for integration of elevation data

The elevation information, which is measured or computed at different scales in the format of a Gauss-Krüger coordinate together with the elevation of the earth's surface at that point, is modeled by a class ElevationModelPoint with attributes for the coordinate and elevation values. Moreover, these points can be classified by their origin (e. g. an exactly measured point on a road or a photogrammetric control point), which is modeled by class ElevationModelPointType. The integration into the conceptual model of the ATKIS data is realized by the n : m-association describes elevation of, which associates elevation model points with landscape model objects and vice versa. This is a spatial association since the spatial information of objects of the two associated classes inherently contains the instances of this association. Therefore it need not be stored explicitly but can rather be computed on demand. Explicit storage may nevertheless be considered for optimization reasons on the physical level, as discussed in section 8.3.4.
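Computing such a spatial association on demand can be illustrated with a small sketch. It is written in plain Python rather than with the spatial operators of the database system, assumes that an object geometry is a simple polygon given as a list of vertices, and uses a standard ray-casting test; the function names and sample coordinates are hypothetical. Points lying exactly on the polygon boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (vertex list)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that cross the horizontal line through y can be hit.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def elevation_points_of(geometry, elevation_points):
    """Derive 'describes elevation of' for one geometry without storing it."""
    return [p for p in elevation_points if point_in_polygon(p[0], p[1], geometry)]


# Illustrative data: a square object geometry and points given as (x, y, height).
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
points = [(5, 5, 101.5), (15, 5, 99.0), (1, 9, 100.2)]
associated = elevation_points_of(square, points)
print(associated)
```

Storing the result explicitly would trade this computation for additional maintenance whenever geometries or elevation points change.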

In the logical model two relations, one for each of the two new classes, will be added. Their definition follows directly from the conceptual model and is therefore omitted.

6.3 Case Study 2: Physical Geography

In this section the database design part of sample applications in physical geography is explained as another example of a STOSTA. The classical stages of requirements analysis, conceptual, logical and finally physical design were followed and are presented below. In particular, following the requirements analysis, external schemata for each layer were designed before being integrated into the conceptual model. More details on this case study can be found in the original documentation ([Pfa00, PKL00]).

6.3.1 Requirements Analysis

Requirements for the application and database system were analyzed by studying literature and file formats and by talking to application experts. Details can be found in [PKL00]. As a result of the analysis four different layers of information (for an explanation of layers see page 29) were identified: soil, land use, climate and relief.

The layer soil in general consists of multiple horizons⁸ (cf. figure 6.5). For a complete description of soil at a certain measuring location, information about the soil itself as well as about the horizons is required. This information is gathered by a vertical broach up to two meters deep. The soil profile gathered this way contains information about every horizon, the relative positions of these horizons as well as general soil information. Thus information about a soil profile can be divided into general soil data and horizon-related soil data. General soil data is valid for a measuring location, whereas horizon-related data is only valid for a certain horizon at that measuring location.

⁸ In the terminology of the Niedersächsisches Landesamt für Bodenkunde horizons are also called layers, which should not be confused with information layers. Therefore we will use the term horizon below.

[Figure 6.5: schematic drawing. Labels: horizon A, horizon B, horizon C, each with upper depth and lower depth; data of general soil; range of validity.]

Figure 6.5: Soil formed by different horizons

The layer land use describes how soil in the examined region is used. The different areas of use are described by non-overlapping polygons. This layer also has a temporal component since land use changes at least once a year. Because information for only a few years is required, the temporal information can be modeled as a simple attribute and need not be treated by methods of a temporal DBS in this application.

The climate layer models climate information that is gathered at discrete points in time over long time periods. Since the examination region is very small, climate is assumed to be constant over the region. Therefore climate is non-spatial but temporal information. For availability reasons, and since users experienced with databases will use the first version of these applications, a classical relational valid time representation is sufficient in this version.

The layer relief describes elevation, slope and exposure of the examined region at different points. These points are distributed equally over the region, and slope and exposure were derived from the elevation information. Since values are gathered at discrete points in space, this layer is spatially described by raster information, which is fundamentally different from the other layers. This will produce several obstacles in the design of domain-specific methods later on.

During the analysis, requirements for methods and operations were also considered since they play an important role in database design. As an example, section 9.3.1 explains the methods required to compute the infiltration water rate, which was the ultimate goal of this case study. The problem of integrating raster- and vector-based information will also be considered there. In brief, operators are required that are capable of retrieving objects with a certain spatial position and certain thematic attribute values at the same time. Therefore it is important to provide user-defined indexes that are capable of indexing multiple (spatial/temporal and thematic) dimensions together. Ideas and performance studies for such index structures will be explained in detail in sections 8.2 and 8.3.

6.3.2 External Database Schemata

In this section external views for the different information layers are developed and explained in more detail, based on the requirements analysis of the preceding section.

Layer Soil

The external schema of layer soil consists of five classes (cf. figure 6.6). Attributes of class BodenStandort (soil location) define the measuring location in Gauss-Krüger coordinates as well as the homogeneous two-dimensional area associated with the measuring location (Geometrie (geometry)). At each measuring location exactly one soil profile is recorded, but the same soil profile may exist at different locations. Therefore the association between BodenProfil (soil profile) and BodenStandort is functional.

An instance of BodenProfil is defined by an instance of Boden (soil) and one or more instances of BodenHorizont (soil horizon). All attributes of objects of BodenProfil are derived from objects of those two classes by the methods listed. Different soil profiles may have the same general soil data, i. e. they reference the same instance of Boden, but in that case they differ either in the horizons associated with the soil profile or in the position of the different horizons within the soil. Information about the position of horizons is modeled as attributes of the association class Lage (tier) between classes BodenProfil and BodenHorizont. This class possesses attributes Otief (upper depth) and Utief (lower depth) modeling the upper and lower depth of the horizon within the soil, respectively.

A single soil profile is associated with one or more horizons and, conversely, a single horizon may occur in different soil profiles. Therefore the association between classes BodenProfil and BodenHorizont is n : m. Moreover, each instance of BodenProfil is associated with one instance of Boden. A single instance of Boden may, on the other hand, occur in different soil profiles. Thus the association between Boden and BodenProfil is functional, i. e. 1 : n.
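The role of the association class Lage can be made concrete with a small sketch. The Python classes below only mirror the class and attribute names of the external schema; the helper functions `horizons_top_down` and `is_consistent`, the consistency rule they check, and all depth values are assumptions made for illustration and are not taken from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BodenHorizont:
    HNr: int
    Horizontbezeichnung: str


@dataclass(frozen=True)
class Lage:
    """Association class: position of one horizon within one soil profile."""
    horizont: BodenHorizont
    Otief: float  # upper depth
    Utief: float  # lower depth


def horizons_top_down(lagen):
    """Names of the horizons of a profile, ordered by increasing upper depth."""
    ordered = sorted(lagen, key=lambda lage: lage.Otief)
    return [lage.horizont.Horizontbezeichnung for lage in ordered]


def is_consistent(lagen):
    """Assumed rule: each horizon has positive thickness and horizons do not overlap."""
    ordered = sorted(lagen, key=lambda lage: lage.Otief)
    if any(lage.Otief >= lage.Utief for lage in ordered):
        return False
    return all(up.Utief <= low.Otief for up, low in zip(ordered, ordered[1:]))


a, b, c = BodenHorizont(1, "A"), BodenHorizont(2, "B"), BodenHorizont(3, "C")
# The same horizon instances occur in two profiles at different positions (n : m).
profile_1 = [Lage(b, 0.3, 0.9), Lage(a, 0.0, 0.3), Lage(c, 0.9, 2.0)]
profile_2 = [Lage(a, 0.0, 0.5), Lage(c, 0.5, 2.0)]
print(horizons_top_down(profile_1), horizons_top_down(profile_2))
print(is_consistent(profile_1), is_consistent([Lage(a, 0.0, 0.5), Lage(b, 0.4, 1.0)]))
```

Because the depths belong to Lage rather than to BodenHorizont, the two profiles can share horizon A although it extends to different depths in each.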

Layer Land Use

Figure 6.6: External schema of layer soil

The schema for layer land use (cf. figure 6.7) consists of only two classes. Class NutzungStandort (land use location) models, similarly to BodenStandort in layer soil, the position of the measuring location and the homogeneous area associated with this location. Class Nutzung (land use) possesses two attributes, NutzNr (use type no) and Jahr (year). The temporal information is modeled simply as a tuple timestamp here. This works in this case since no attributes with temporally independent changes occur. For more complex objects this assumption will no longer hold and attribute timestamping should be used. Section 7.4.2 will introduce the user-defined types required to implement attribute timestamping in the object-relational paradigm; section 8.3.1 will show how user-defined indexing can be used to query these attributes efficiently. Consequently, an instance of Nutzung reflects the use of a certain piece of land in the given year. Even though we may have the same use at different measuring locations, different instances of Nutzung will be associated with each location for different measurements in different years. Therefore an n : m-association between NutzungStandort and Nutzung has been used.
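The difference between the two timestamping variants can be sketched in a few lines of Python. `Nutzung` follows the class of the external schema; `TimestampedAttribute`, the second attribute `parcel_owner` and all values are hypothetical and serve only to show attributes that change independently. The sketch is not the definition of the user-defined types of section 7.4.2.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Nutzung:
    """Tuple timestamping: a single timestamp (Jahr) applies to the whole tuple."""
    NutzNr: int
    Jahr: int


@dataclass
class TimestampedAttribute:
    """Attribute timestamping: every value carries its own validity interval."""
    history: list = field(default_factory=list)  # (valid_from, valid_to, value)

    def set(self, valid_from, valid_to, value):
        # valid_to is exclusive in this sketch.
        self.history.append((valid_from, valid_to, value))

    def value_at(self, t):
        for valid_from, valid_to, value in self.history:
            if valid_from <= t < valid_to:
                return value
        return None


# Tuple timestamping needs one tuple per year.
uses = [Nutzung(NutzNr=3, Jahr=1998), Nutzung(NutzNr=3, Jahr=1999), Nutzung(NutzNr=7, Jahr=2000)]

# With attribute timestamping, two attributes can change independently.
parcel_use = TimestampedAttribute()
parcel_use.set(1998, 2000, 3)
parcel_use.set(2000, 2001, 7)
parcel_owner = TimestampedAttribute()
parcel_owner.set(1998, 2001, "owner X")

print(parcel_use.value_at(1999), parcel_use.value_at(2000), parcel_owner.value_at(2000))
```

With tuple timestamping, a change of the owner alone would force a new tuple that repeats the unchanged use type, which is the redundancy attribute timestamping avoids.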

Figure 6.7: External schema of layer land use


Layer Climate

The schema for layer climate (figure 6.8) is similarly built from two classes. Class KlimaStandort (climate location) models the position of the measuring location as well as the geometry of the area⁹ whose climate is described by this instance. Climate information itself is modeled as attributes of class KlimaWerte (climate values): Datum (date) describes the day of a single measurement, whereas Temperatur (temperature), Niederschlag (precipitation) and RFeuchte (relative humidity) are the recorded measurements. Since different measurements are recorded for each location and, on the other hand, a single measurement may at least theoretically be valid for different locations (imagine that at one location no measurement was conducted on a certain day and therefore information from a nearby location is used), the association is of type n : m. As before, since temporal information is related to the object rather than to its attributes, tuple timestamping is sufficient here.

Figure 6.8: External schema of layer climate

Layer Relief

The external schema of the final layer relief (cf. figure 6.9) consists of only one class, called ReliefPunkt (relief position). It possesses attributes for the two-dimensional position of the relief information in Gauss-Krüger coordinates as well as the relief data consisting of values for attributes Hoehe (elevation), Hangneigung (slope) and Exposition (exposure).

6.3.3 Integrated Conceptual Model

By integrating the external schemata of the different layers we obtain the integrated conceptual model of the database to be implemented, which is depicted in figure 6.10.

⁹ Because of the relatively small size of the examined area there is no need to work with different climates or climate measuring locations. Thus a simpler model would have been possible for the application at hand. With respect to future applications, possibly for larger areas with different climates, the geometrical aspect is nevertheless included.

Figure 6.9: External schema of layer relief

The conceptual model integrates the different layers by use of two abstract classes Punkt (position) and Standort (location). These classes will not be instantiated; they simply serve as superclasses for the integration.

The class Punkt, with attributes for the position of a measuring location in Gauss-Krüger coordinates, integrates the different position classes of all layers. For layer relief it is extended by class ReliefPunkt (relief position), which carries the other attributes of that layer and needs to be differentiated from the other layers since it describes raster information as opposed to the vector information in the other layers. Layers with vector information extend Punkt by means of Standort, which adds the geometry of the area for which the measured information is valid. For reasons of clarity this class is itself extended by the already introduced classes NutzungStandort, BodenStandort and KlimaStandort, which are taken over from the external schemata of the different layers. Another reason for this subdivision is that even though measurements may be taken at the same position, they may be valid for different areas in different layers; in the example, climate information is valid for the whole examination region whereas there are many different land uses.

6.3.4 Logical Model

The object-relational model used here on the logical level consists of the relational model with additional user-defined data types, including nested types as described in section 2.3. We also assume that user-defined functions and data-type-specific query optimization are possible. Thus every database system providing functionality for the object-relational model could be used for implementation. The transformation of the conceptual model into relation schemata is described in this section. For clarity the transformation is described separately for the different layers. In the logical model, data types for attributes are defined, the super- and subclasses are integrated, artificial keys are generated where necessary and some simplifications are used.

Layer Soil

Figure 6.10: Integrated Conceptual Model

By standard transformations known from relational databases the relations depicted in figure 6.11 are obtained from the conceptual model. The only non-relational feature used in the relation schema is the object type called SDO_GEOMETRY for arbitrary two-dimensional objects. Its use is strongly influenced by the availability of such a type in the database system used (which can be expected, cf. section 7.4.1).

BodenStandort (SNr:INTEGER, Rechtswert:FLOAT, Hochwert:FLOAT,
    Geometrie:SDO_GEOMETRY, BPNR→BodenProfil)
BodenProfil (BPNr:INTEGER, BNr→Boden, We:FLOAT, nFKWe:FLOAT,
    KRmm:FLOAT, KRStufe:INTEGER, nFKWeStufe:INTEGER)
Boden (BNr:INTEGER, Bodentyp:VARCHAR2(10), Staunaessestufe:INTEGER,
    Grundwasserstand:VARCHAR2(10), Grundwasserstufe:FLOAT)
BodenHorizont (HNr:INTEGER, Horizontbezeichnung:VARCHAR2(10),
    Bodenart:VARCHAR2(5), Rohdichte:FLOAT, Grobboden:FLOAT,
    Humusgehalt:FLOAT, Carbonatgehalt:FLOAT, Tongehalt:FLOAT,
    pHWert:FLOAT, LD:FLOAT, LDStufe:INTEGER,
    nFK:FLOAT, We:FLOAT)
ProfilHorizont (BPNr→BodenProfil, HNR→Horizont, Otief:FLOAT, Utief:FLOAT)

Figure 6.11: Relations of layer soil

Layer Land Use

Similarly to layer soil, we obtain most of the relations presented in figure 6.12 for land use by standard transformations. The simple use of a tuple timestamp is sufficient here since no other time-dependent attribute is present in this class. If this were the case, as is to be expected in more complex domains, a data type combining thematic and temporal information in one attribute would be required (cf. section 7.4.2).

NutzungStandort (SNr:INTEGER, Rechtswert:FLOAT, Hochwert:FLOAT,
    Geometrie:SDO_GEOMETRY)
Nutzung (NNr:INTEGER, NutzNr:INTEGER, Jahr:DATE)
Anbau (SNr→NutzungStandort, NNr→Nutzung)

Figure 6.12: Relations of layer land use

Layer Climate

Since, for practical reasons, climate data is observed and recorded at certain measuring points, the n : m-association in the conceptual model is changed to cardinality 1 : n in the logical model. The possible data redundancy incurred by this change is acceptable since data insertion is improved in return.

To separate climate measuring locations from measurements, all measurements for a certain location are stored in a relation of their own. The measuring location is then associated with the table of measurements instead of with the individual measurements. This makes use of the object-relational concept of collection types or, more specifically, nested tables. It is realized by a collection type Typ_KlimaDatenTabelle containing the measurement information, whose implementation is explained in section 6.3.5.

Moreover, climate information should not be bound directly to a certain location, since it may be necessary to use the same climate information to derive information at different locations. This may be due to incomplete climate information for technical, financial or other reasons. This cannot be realized by changing the area associated with a measuring location, because the area would then be changed for all measurements at this location, which is not desired: at different times a differentiation between the areas may be required. Thus a relation Klima (climate) is introduced in figure 6.13, which also shows the other relation of layer climate.

KlimaStandort (SNr:INTEGER, KNr→Klima, Rechtswert:FLOAT,
    Hochwert:FLOAT, Geometrie:SDO_GEOMETRY)
Klima (KNr:INTEGER, Klimadaten:Typ_KlimaDatenTabelle)

Figure 6.13: Relations of layer climate
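The indirection through relation Klima can be sketched as follows. The Python classes merely mimic the two relations, with a list standing in for the nested table and the geometry column omitted; the helper `measurements_at` and all sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class KlimaDatum:
    """One row of the nested table of measurements."""
    Datum: date
    Temperatur: float
    Niederschlag: float
    RFeuchte: float


@dataclass
class Klima:
    KNr: int
    Klimadaten: list = field(default_factory=list)  # stands in for the nested table


@dataclass
class KlimaStandort:
    SNr: int
    KNr: int            # reference to Klima
    Rechtswert: float
    Hochwert: float


def measurements_at(standort, klima_by_knr):
    """Follow the reference KNr from a location to its table of measurements."""
    return klima_by_knr[standort.KNr].Klimadaten


klima = Klima(KNr=1, Klimadaten=[
    KlimaDatum(date(1999, 5, 1), 12.5, 0.0, 70.0),
    KlimaDatum(date(1999, 5, 2), 14.0, 3.2, 82.0),
])
klima_by_knr = {klima.KNr: klima}

# Two locations (illustrative coordinates) share the same climate information.
standort_1 = KlimaStandort(SNr=1, KNr=1, Rechtswert=3550000.0, Hochwert=5800000.0)
standort_2 = KlimaStandort(SNr=2, KNr=1, Rechtswert=3551000.0, Hochwert=5801000.0)

shared = measurements_at(standort_1, klima_by_knr) is measurements_at(standort_2, klima_by_knr)
print(shared, len(measurements_at(standort_2, klima_by_knr)))
```

Assigning a different KNr to one location later would change its climate information without touching the area stored for that location.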

Layer Relief

The layer relief consists of a single relation derived from the class in the conceptual model (cf. figure 6.14). The name Relief25 is due to technical reasons concerning the scale of the raster points used; this is explained in detail in [PKL00].

RPNr is an artificial key for raster points, and Ort (place) contains a three-dimensional point describing the two-dimensional location of an elevation point together with its elevation value. We make use of the capability of storing three-dimensional information in the complex type SDO_GEOMETRY (cf. section 8.1). The other attributes contain other measured and derived information.

6.3.5 Physical Design

Based on the logical model the physical database design is undertaken. In the process, some phases of which are also examined in greater detail in section 8, object types are defined, relations and integrity constraints are produced and index structures for access optimization are defined.

Relief25 (RPNr:INTEGER, Ort:SDO_GEOMETRY, Hoehe:FLOAT, Hangneigung:FLOAT, Exposition:FLOAT)

Figure 6.14: Relation of layer relief


Definition of Complex Types

In the logical model presented previously two complex data types were used, namely SDO_GEOMETRY and Typ_KlimaDatenTabelle (type ClimateDataTable). To describe two-dimensional planar regions (type SDO_GEOMETRY) as well as three-dimensional points we make use of the type MDSYS.SDO_GEOMETRY which is provided by the database system Oracle 8i/9i for exactly these purposes. This has the advantage that we can also use optimization features for this data type provided by the ORDBS vendor. Since spatial operations in particular tend to be very costly this is a substantial advantage. Nevertheless three-dimensional indexing is not supported and has to be performed by user-defined indexes with techniques similar to those of section 8.3.1 for temporal data.

For the collection type Typ_KlimaDatenTabelle on the other hand, which was used to describe climate measurements at a certain location, we define our own implementation. A tuple of climate values for a certain day consists of seven attributes. For this we define a new row type Typ_KlimaDatum (type ClimateData) as follows¹⁰:

    CREATE TYPE Typ_KlimaDatum AS OBJECT (
        Datum         DATE,
        Temperatur    FLOAT,
        RFeuchte      FLOAT,
        Niederschlag  FLOAT,
        EWT14         FLOAT,
        ET14          FLOAT,
        ETPhaude      FLOAT
    );

The definition of type Typ_KlimaDatenTabelle follows directly from this:

CREATE TYPE Typ_KlimaDatenTabelle AS TABLE OF Typ_KlimaDatum;

This collection type can then be used as a data type for an attribute of another table. In the definition of table Klima (climate) we make use of this concept by storing climate information in a nested table:

    CREATE TABLE Klima (
        KNr         INTEGER,
        KlimaDaten  Typ_KlimaDatenTabelle,
        CONSTRAINT PK_Klima PRIMARY KEY (KNr)
    )
    NESTED TABLE Klimadaten STORE AS KlimadatenMenge;

KNr is the key of the relation and Klimadaten (climate data) stores all measurement data in one attribute.

¹⁰ EWT14 denotes saturation vapor pressure, ET14 current vapor pressure and ETPhaude the potential evapotranspiration as defined by Haude.


Relations

With one exception the relations implemented are exactly the relations of the logical model described previously. The exception is the storage of geometry information separately from thematic information. This is done since the size of geometric information is much larger than the size of thematic information¹¹. Thus queries for only thematic information are slowed dramatically by the overhead required to read the complex geometric information from secondary storage. This would also have a negative impact on the application-specific methods implemented on top of the ORDBS. Therefore the separate storage of thematic and geometric information has been used for relations BodenStandort, NutzungStandort and KlimaStandort of the logical model. The additional relations like BodenFlaeche contain the geometry attribute together with a foreign key reference to the location in the thematic data relation.

Example:

    BodenStandort(SNr:integer, Rechtswert:float, Hochwert:float, Geometrie:MDSYS.SDO_GEOMETRY, BPNR→BodenProfil)

from the logical model is subdivided physically into relations:

    BodenStandort(SNr:integer, Rechtswert:float, Hochwert:float, BPNR→BodenProfil)

    BodenFlaeche(SNr→BodenStandort, Geometrie:MDSYS.SDO_GEOMETRY)

Indexes

Besides the automatically generated primary key indexes, spatial indexes were created manually. The Z-Code index provided by Oracle 8i/9i was used. To insert the required metadata about the spatial range to be indexed, a function that automatically computes this information for any given dataset was implemented. After that a Z-Code index of a given depth was computed for the MBRs of the geometries with optimization (cf. middle picture of figure 3.5). Details about the performance of Oracle spatial indexes are explained in section 8.3.2. To support combined queries for spatial and thematic properties specialized user-defined indexes should be used. They are developed by techniques similar to the ones presented in sections 8.3.1 (temporal thematic attributes) and 8.2.3 (spatio-temporal attributes).
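The metadata step can be pictured with a short sketch. It is written in Python rather than in the database language purely for illustration; the function name `spatial_extent`, the representation of geometries as vertex lists and the sample coordinates are assumptions and not the function implemented in this work. It only shows the idea of deriving the spatial range of a dataset from its geometries.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def spatial_extent(geometries: Iterable[Sequence[Point]]) -> Tuple[Point, Point]:
    """Return ((min_x, min_y), (max_x, max_y)) over all vertices of all geometries."""
    xs, ys = [], []
    for geometry in geometries:
        for x, y in geometry:
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError("an empty dataset has no spatial extent")
    return (min(xs), min(ys)), (max(xs), max(ys))


# Hypothetical sample data: two small polygons given by their vertices.
areas = [
    [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)],
    [(2.0, -1.0), (6.0, 5.0), (1.0, 2.0)],
]
lower, upper = spatial_extent(areas)
print(lower, upper)
```

In a real system the resulting bounds would be written to the metadata of the spatial column before the index is created.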

6.3.6 Sample Queries

After the database whose schema is depicted in figure 6.10 has been populated with data, a lot of domain-specific functionality in STOSTA may be easily implemented. Some simple computations can be performed directly by SQL queries. Three examples are:

• Compute the minimal and maximal duration of capillary elevation (ta) in the examined region:

    SELECT MIN(ta), MAX(ta) FROM vbn;

       MIN(TA)    MAX(TA)
    ---------- ----------
        28.223        120

¹¹ The polygons used consisted of up to 1042 vertices.

• Compute the number of areas and the total area for each step of the available field capacity of effective rooting depth (nFKWe), sorted by area:

    SELECT COUNT(*) AS Number_Areas,
           ROUND(SUM(MDSYS.SDO_GEOM.SDO_AREA(f.geometrie, meta.diminfo)) / POWER(10,8)) AS Area_in_ha,
           p.nFKWeStufe AS nFKWeStep
    FROM BodenStandort b, BodenProfil p, BodenFlaeche f,
         USER_SDO_GEOM_METADATA meta
    WHERE meta.table_name = 'BodenFlaeche' AND
          meta.column_name = 'Geometrie' AND
          b.SNr = f.SNr AND b.BPNr = p.BPNr
    GROUP BY p.nFKWeStufe
    ORDER BY Area_in_ha ASC;

    NUMBER_AREAS    AREA_IN_HA  NFKWESTEP
    ------------ ------------- ----------
              10            57          3
              13            91          1
              19           150          2
              15           251          4
              21           353          5

• Compute the capillary elevation rate (KRmm) of the five soil profiles closest to point (357135313,576483085):

    SELECT NVL(KRmm,'NULL') AS KRMM, bp.BPNr
    FROM BodenProfil bp, BodenStandort bs, BodenFlaeche bf
    WHERE bp.BPNr = bs.BPNr AND bs.SNr = bf.SNr AND
          SDO_NN(bf.Geometrie,
                 MDSYS.SDO_GEOMETRY(2001, NULL,
                     MDSYS.SDO_POINT_TYPE(357135313,576483085,0),
                     NULL, NULL),
                 'SDO_NUM_RES=5') = 'TRUE'
    ORDER BY 1 asc;

    KRMM             BPNR
    ---------- ----------
    .3                 17
    1.2                20
    5.01               19
    5.01               12
    NULL               22

Since these functions are expressible by SQL queries they can only return a result relation in textual form. Obviously more advanced functions may also be implemented by using a database programming language such as PL/SQL for Oracle 8i/9i. These more advanced computations are linking rules and methods which will be introduced in section 9.3.1. The former are rules on how to compute more internally important attributes and the latter are combinations of linking rules to compute domain-wide important attributes. In many applications both types of methods will simply be called methods.

Earlier in this section it was described that data can be classified into certain information layers and classes within these layers. If methods use and result in data of a single class only, they can be implemented by simple DML statements; the following example computes the storage density for all horizons:

    UPDATE BodenHorizont
    SET Ld = Rohdichte + 0.009*Tongehalt
    WHERE Rohdichte IS NOT NULL AND
          Tongehalt IS NOT NULL;

Of course these statements may become more complex by using more complex functions or even subqueries, but the structure still remains as shown. If data from multiple classes within a single information layer is used and combined, either one or multiple DML statements may be used. Some functions of this category require more complex computations and are thus implemented by PL/SQL functions in the database. This is due to their slightly more complex structure requiring cursors for computation of the desired attributes. More details about these complex methods are given in section 9.3.1 where a general metamodel for methods in scientific applications is developed.

Chapter 7

Consolidation of Conceptual Modeling in STOSTA

Based on the general issues and case studies from the previous chapter the goal of this chapter is to develop a generalized approach to conceptual and later logical modeling of spatial, temporal and spatio-temporal data. Previously proposed approaches, especially on the conceptual level, are integrated with novel ideas on how to use ORDBS for the logical level. The result is a guideline for conceptual and logical modeling of data in these domains.

7.1 ER-based Conceptual Modeling

In conceptual modeling only few approaches are widely used. This is due to the field having been consolidated for quite a while on the one hand and due to the widest possible independence of the application and implementation that is the main characteristic of a conceptual schema on the other hand. The most important means of specifying a conceptual schema is the entity-relationship model ([Che76]) which is described in most basic database texts (e. g. [EN00]). It provides several modeling constructs to model the interesting part of the real world for a database application. With the advent of more advanced applications and, even more important, an object-oriented view as foundation for modeling, the ER model was not sufficient anymore, since it does e. g. not support user-defined types. More advanced modeling constructs were required and proposed in several extensions to the ER model. Important proposals can be found in [TYF86] and [EGH+92]. In particular the latter formally defines rules for construction of new data types by type constructors. Moreover the issue of differentiating between object types and data types is discussed there. The conclusion is that every part of the real world whose related information can be updated without changing the object's identity should be modeled as an object of a certain object type whereas everything else should rather be modeled as (value of a) data type. As an example an existing country may change its attributes such as population, capital or geometric shape but still remains the same country. Consequently the country should be modeled as an object and the geometry, even though consisting of several components of simple types, should be modeled as a new data type since an update would modify its identity; the identity is only manifested by the values of the different components.

In addition to the general extensions leading to the EER model of [EGH+92], specific model extensions have been proposed especially for temporal and spatial application domains. These are helpful if using the model for conceptual schemata in these application domains. For the spatial domain the constructs of [HT97] are very well suited whereas on the temporal side [EEAK91] or [GJ99] are suggested as references. For spatio-temporal applications extensions to the EER model have been recently proposed by [HT97, TJ99] which are very useful for conceptual modeling in that domain. Also [TH98] presents extensions for conceptual and logical design of spatio-temporal applications for the EER model.

All of the above ideas and concepts can be used either in (E)ER-based modeling or in object-oriented modeling. The difference is mainly in notation, where newly defined data types can be more easily integrated into object-oriented notation. Moreover a closer integration of operations on the object types can be achieved in object-oriented notation. Consequently this work uses the object-oriented modeling paradigm and process, but the principal ideas and model extensions are useful in either approach.

7.2 Object-Oriented Conceptual Modeling

If an object-oriented system design and data modeling approach is preferred, the most important analysis artifact is a class model of the application which can be used as conceptual model as well. The class model may be used without specification of data types for the attributes of classes (as shown in figure 6.10). For STOSTA in this case no additional work is necessary or possible. The major drawback of this approach is that the spatial and temporal components are not explicitly specified and thus are not obtainable from the diagram. Consequently no special support for domain-specific data types such as spatial or temporal ones is possible in the later development stages. Therefore usage of data types for attributes of the class model is strongly recommended.

7.2.1 Spatial Data Types

For spatial applications these data types may be any type of spatial feature in any dimension. The type should be chosen as specifically as possible in the conceptual model. In the logical and physical design stage a more general data type (e. g. 2DSpatialFeature or SDO_GEOMETRY as used in the case studies) may be used for optimization purposes. Nevertheless the application-independent conceptual model should contain information as detailed as possible. Thus data types to be used for spatial attributes may for example be any type offered by the ROSE algebra ([GS95, Sch97]) in two dimensions. Extension to three dimensions is also possible but not straightforward. Usually types like 3DPoint, 3DLine, Sphere and 3DRegion should be sufficient. Formal specifications of these data types can be found in the literature, see e. g. [GS95, Sch97] for the specifications of data types in the ROSE algebra. Usage of these types is usually possible without studying the formal specifications but one should be aware of the existence of a formal specification. An example of a conceptual schema using these data types was presented in figure 6.2.

7.2.2 Temporal Data Types

Data types to be used for temporal applications may be temporal versions of any of the basic data types. Temporal data is treated differently from spatial data since that seems to reflect human perception of the real world more closely. Whereas the spatial position or extent of an object is usually perceived as a regular attribute of an object, time is more fundamental. All information (i. e. all attributes, standard as well as spatial) is subject to temporal development. Time seems to be the governing dimension behind all information. Although this position of time is introduced subjectively, the many research publications in temporal modeling suggest that it is common sense.

For temporal data in STOSTA one needs to specify whether a valid-time, transaction-time or bitemporal version of the base type is required. In the spirit of object-oriented modeling, using temporal data types on the attribute level, as opposed to the class level as in many older publications, should be preferred. This facilitates one-to-one modeling of real-world objects. This approach essentially means using attribute-timestamping as opposed to the widely used tuple-timestamping. I even propose to go one step further in modeling and suggest that combinations of attribute values and timestamps should be seen as basic value units; thus it can be said that the data types to be used in temporal applications should be temporal versions of each of the basic data types. The reasons for using attribute-timestamping are the more adequate modeling capabilities and the ease of transforming these into a logical model if using ORDBS as shown in section 7.4.2. Actually this was exactly the reason for using tuple-timestamping in most previous publications: there RDBS were used and an efficient implementation of attribute-timestamping with these systems is difficult, if not impossible, or results in tuple-timestamping in the end. But since ORDBS are now available the stronger and for most applications more realistic attribute-timestamping approach should be used.

A unified approach for many tuple-timestamping data models has been described by [JSS94] under the notion of bitemporal conceptual data model (BCDM, cf. section 4.1). Our main idea, parts of which have been published in [KL01b], is to adjust the BCDM by extending the set of available standard data types $\mathcal{D} = \{D_1, \ldots, D_n\}$ by bitemporal data types: $\mathcal{DD} = \mathcal{D} \cup \{(D \times D_{VT} \times D_{TT}) \mid D \in \mathcal{D}\}$ for valid and transaction time domains $D_{VT}$ and $D_{TT}$. $D_{TT}$ includes the special symbol UC for current validity up to a change (until changed). Thus we consider combinations of attribute values and two timestamps instead of original attribute values. These new data types can be introduced as user-defined types in extensible database systems like ORDBS. As usual we assume linear, discrete, bounded time dimensions and therefore use a countable number of chronons. This yields an intuitive model with a theoretical foundation which is not optimal for implementation though. A way to implement this data model logically will be described in section 7.4.2.
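The construction of a bitemporal data type from a base type can be pictured with a small sketch. It is an illustration only: the class name `Bitemporal`, the sentinel object `UC`, the helper `current` and the use of integers as chronons are assumptions introduced here and are not the user-defined types developed later in this work.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


class _UntilChanged:
    """Sentinel for the special transaction-time symbol UC (until changed)."""

    def __repr__(self) -> str:
        return "UC"


UC = _UntilChanged()
Chronon = int  # assumption: chronons are numbered by integers


@dataclass(frozen=True)
class Bitemporal(Generic[T]):
    """One element of D x D_VT x D_TT for a base type D."""

    value: T
    valid_time: Chronon
    transaction_time: Union[Chronon, _UntilChanged]


def current(entries):
    """Return the entries still marked UC, i.e. current in the database."""
    return {e for e in entries if e.transaction_time is UC}


# A bitemporal INTEGER attribute: salary 3000, valid during chronons 10..29,
# recorded at transaction time 10 and still current.
salary = {Bitemporal(3000, vt, 10) for vt in range(10, 30)}
salary |= {Bitemporal(3000, vt, UC) for vt in range(10, 30)}

print(len(salary), len(current(salary)))
```

Storing one element per chronon pair mirrors the theoretical model directly, which also illustrates why this form is not optimal for implementation.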

Since in addition each object may have different values of an attribute of a standard type at different times, tuples in this model have to store values which are sets of these temporally enhanced standard types. Consequently tuples of a relation with $i$ non-temporal and $n - i$ temporal attributes in this model can be equivalently written as

$$t = (a_1, \ldots, a_i, \{a_{i+1}, \{c_{vt}, \{c_{tt}\}\}\}, \ldots, \{a_n, \{c_{vt}, \{c_{tt}\}\}\})$$

for valid time chronons $c_{vt}$ and transaction time chronons $c_{tt}$. This notation is an abbreviation for sets that consist of 2-tuples where the first component is of the base type and the second component is a set. Similarly elements of this set are 2-tuples consisting of a valid time chronon in the first component and a set of transaction time chronons as second component. The semantics of this kind of temporal information is that each temporal attribute is attached a subset of $D_{VT}$ as valid time information. Each valid time chronon $c_{vt}$ in turn is attached a subset of $D_{TT}$ as transaction time information, recording when this valid time information was present in the database. This asymmetric semantics is also reflected by the update operations.

Possible update operations are as usual insert, delete and update. For an insert the standard attribute values together with their valid time information (for temporal attributes) are passed; the transaction time is set automatically by the system to UC initially.

$$
\mathit{insert}(\sigma(R), (a_1, \ldots, a_i, a_{i+1}, tv_{i+1}, \ldots, a_n, tv_n)) =
\begin{cases}
\sigma(R) \cup \{(a_1, \ldots, a_i, uct(a_{i+1}), \ldots, uct(a_n))\}, & \text{if } \neg\exists (a_1, \ldots, a_i, *) \in \sigma(R) \\[4pt]
\sigma(R) - \{(a_1, \ldots, a_i, a_{i+1}, \ldots, a_n)\} \cup \{(a_1, \ldots, a_i, a_{i+1} \cup uct(a_{i+1}), \ldots, a_n \cup uct(a_n))\}, & \text{if } \exists (a_1, \ldots, a_i, *) \in \sigma(R) \text{ and } \neg\exists l \in [i+1 : n] : UC \in \Pi_2(\Pi_2(a_l)) \\[4pt]
\sigma(R), & \text{else}
\end{cases}
$$

For better readability the shorthand $a_l$ was used for the complete representation of a temporal attribute $\{a_l, \{c_{vt}, \{c_{tt}\}\}\}$. Moreover $uct(a_l) = \bigcup_{c_v \in tv_l} \{a_l, \{c_v, \{UC\}\}\}$ is the union of all valid time chronons of attributes of the specified attribute value marked as currently contained in the database by the existence of a UC tag. Finally $\Pi_2$ denotes the projection onto the second component of pairs as elements of a set. In the first case, where no tuple with the same (non-temporal) key attributes is already present in the database, the tuple is added as a new real-world object. If on the other hand there is already a tuple with the same non-temporal key attributes in the database having no attribute currently marked UC¹, the new values of the time-invariant attributes are added to that tuple together with the given valid-time information. The transaction time information is inserted as UC since the tuple is now current in the database. Transaction time is automatically updated by the system: after a time span $t'_j \in C$ has passed, all tuples with temporal attribute $(c_v, UC)$ are extended by appending the temporal information $(c_v, t'_j)$.

In the above definition we have assumed that exactly the non-temporal attributes of a tuple form its key. This was only done for better readability and is no restriction of the model itself: a generalization to arbitrary key attributes is straightforward. The different cases are then distinguished by the key attributes only, as opposed to all non-temporal attributes as above.

¹ In this case the tuple would be temporally modified and thus an update has to be used instead.

Similarly the operation delete can be defined as follows:

$$
\mathit{delete}(\sigma(R), (a_1, \ldots, a_i, a_{i+1}, \ldots, a_n)) =
\begin{cases}
\sigma(R) - \{(a_1, \ldots, a_i, a_{i+1}, \ldots, a_n)\} \cup \{(a_1, \ldots, a_i, a_{i+1} - uct(a_{i+1}), \ldots, a_n - uct(a_n))\}, & \text{if } \exists (a_1, \ldots, a_i, *) \in \sigma(R) \\[4pt]
\sigma(R), & \text{else}
\end{cases}
$$

In this formula $uct(a_l)$ is defined as above. Hence for deletion a tuple having the corresponding non-temporal attributes is simply modified by deleting all UC entries from the valid time information of temporal attribute values that have the desired value for deletion. Thus when transaction time advances these tuples will not be updated anymore, so that they are no longer current in the database. Since transaction time information is required no physical deletion of tuples takes place. The above definition also works for non-current tuples since in that case the set $uct$ is empty, leading to an unchanged relation $r$ after the deletion, as required. The same simplifications as for insertions were used again.

Finally the update operation is defined as a deletion followed by an insertion:

$$
\mathit{update}(\sigma(R), (a_1, \ldots, a_i, a_{i+1}, tv_{i+1}, \ldots, a_n, tv_n)) = \mathit{insert}(\mathit{delete}(\sigma(R), (a_1, \ldots, a_n)), (a_1, \ldots, a_i, a_{i+1}, tv_{i+1}, \ldots, a_n, tv_n)).
$$
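The three operations and the automatic advance of transaction time can be sketched in a few lines. This is an illustration under stated assumptions and not the implementation developed in this work: a relation is a dictionary from key to temporal attributes, every temporal attribute is a set of triples (value, valid time chronon, transaction time chronon or UC), the names `uct`, `insert`, `delete`, `update` and `advance` are chosen here, and `delete` is read as removing all UC entries of the tuple with the given key. All temporal attributes are passed in every call, so wildcards are not handled.

```python
UC = "UC"  # sentinel for 'until changed' (assumption: chronons are integers)


def uct(value, valid_chronons):
    """All valid time chronons of a value, each tagged as current (UC)."""
    return {(value, cv, UC) for cv in valid_chronons}


def insert(rel, key, attrs):
    """attrs maps attribute name -> (value, iterable of valid time chronons)."""
    new = {name: uct(value, tv) for name, (value, tv) in attrs.items()}
    if key not in rel:
        rel[key] = new                      # case 1: new real-world object
    elif not any(ct == UC for entries in rel[key].values()
                 for (_, _, ct) in entries):
        for name, entries in new.items():   # case 2: tuple exists, not current
            rel[key].setdefault(name, set()).update(entries)
    return rel                              # case 3: relation unchanged


def delete(rel, key):
    """Logical deletion: drop UC entries, keep the recorded history."""
    if key in rel:
        for name in list(rel[key]):
            rel[key][name] = {e for e in rel[key][name] if e[2] != UC}
    return rel


def update(rel, key, attrs):
    """An update is a deletion followed by an insertion."""
    return insert(delete(rel, key), key, attrs)


def advance(rel, now):
    """System step: stamp every entry that is current with chronon `now`."""
    for attrs in rel.values():
        for entries in attrs.values():
            entries |= {(v, cv, now) for (v, cv, ct) in entries if ct == UC}


emp = {}
insert(emp, 1, {"name": ("Miller", range(10, 20)), "salary": (3000, range(10, 30))})
for t in range(10, 25):
    advance(emp, t)
update(emp, 1, {"name": ("Scott", range(20, 50)), "salary": (3000, range(10, 30))})
advance(emp, 25)
print(sorted(e for e in emp[1]["name"] if e[2] == UC)[:2])
```

After these calls the name Miller carries transaction times 10 to 24 only, whereas the entries for Scott start at transaction time 25 and are still marked UC.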

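[Figure 7.1: diagram of a single tuple over a valid time (VT) axis and a transaction time (TT) axis, both with ticks from 10 to 70, showing the regions of the name values Miller and Scott and of the salary values 3000, 4000 and 5000; graphic not reproduced.]

Figure 7.1: Sample tuple with bitemporal attributes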

For illustration purposes a tuple with bitemporal attributes may be found in figure 7.1. In contrast to tuple-timestamping, where a whole relation could be visualized in a single diagram (as in [JSS94]), one diagram per tuple is required for attribute-timestamping. Hence the graph visualizes a single tuple which corresponds to the tuple with ID 1 in the relational representation of an attribute-timestamped relation shown in figure 7.2. There tuples are of the simpler form²

$$t = (a_1, \ldots, a_i, \{a_{i+1}, \{(c_{vt}, c_{tt})\}\}, \ldots, \{a_n, \{(c_{vt}, c_{tt})\}\}).$$

In this relation the key attribute ID of an employee is the only non-temporal attribute, whereas name and salary are changing over time as bitemporal attributes.

Finally figure 7.3 shows the modification operations required to obtain the database state of figure 7.2 from an empty database, together with the transaction times at which these operations have to be issued. Note that it is characteristic for transaction time databases that statements have to be issued at particular transaction times in order to result in a particular database state. That is due to the transparent nature of transaction time, which means that the transaction time information cannot be changed by the end user. Consequently to obtain a certain database state with given transaction times the user must issue the statements at the right time.

ID | Name | Salary
1  | Miller {(10,10),...,(10,24),...,(19,10),...,(19,24),(50,60),...,(50,75),(50,UC),...,(69,60),...,(69,75),(69,UC)} | 3000 {(10,10),...,(10,29),...,(29,10),...,(29,29)}
   |  | 4000 {(30,30),...,(30,49),...,(49,30),...,(49,49),(50,30),...,(50,39),...,(64,30),...,(64,39)}
   | Scott {(20,25),...,(20,59),...,(49,25),...,(49,59)} | 5000 {(50,50),...,(50,75),(50,UC),...,(74,50),...,(74,75),(74,UC)}
2  | Allen {(10,10),...,(10,59),...,(70,10),...,(70,59)} | 6000 {(10,10),...,(10,59),...,(70,10),...,(70,59)}

Figure 7.2: Sample relation with bitemporal attributes

As an example the third operation modifies the name information of the tuple with ID 1 to be (Scott,[20,50)). At that time all entries with name Miller and transaction time information UC are deleted from the database by the deletion part of the update operation. Also entries with name Scott and the given valid times as well as transaction time information UC are inserted. The given database state is the result of the later implicit updates by the database when transaction time advances; this is similar to section 4.1.

Also similar to section 4.1 is the treatment of valid-time databases. Since no transaction time information is required most of the above operations can be greatly simplified.

² This form is obviously equivalent to the conceptual form.


Operation                                        | Transaction Time
insert(Emp,(1,'Miller',[10,20),3000,[10,30))     | 10
insert(Emp,(2,'Allen',[10,70),6000,[10,70))      | 10
update(Emp,(1,'Scott',[20,50),∗)                 | 25
update(Emp,(1,∗,4000,[30,65))                    | 30
update(Emp,(1,∗,4000,[30,50))                    | 40
update(Emp,(1,∗,5000,[50,70))                    | 45
delete(Emp,(2,∗))                                | 60
update(Emp,(1,'Miller',[50,70),∗)                | 60

Figure 7.3: Update operations to achieve the database state of figure 7.2

This can be achieved by a different definition of $uct(a_i)$ in that case: it can be defined as $uct(a_i) = \bigcup_{c_v \in tv_i} \{a_i, \{c_v\}\}$. This way the third case in the insertion operation becomes irrelevant since either the condition of the first or second case is satisfied. Also the deletion is reduced to non-temporal deletion as for standard relational DBS.

7.2.3 Spatio-Temporal Data Types

Spatio-temporal objects continuously changing their spatial information are not the focus of this work; work on this subject can e. g. be found in [GBEJL+00], [FGNS00] or [CZ00]. For discretely changing geometries in STOSTA, which are the focus of this work, the required data types may be easily derived from the previous two paragraphs: if we consider all data types for spatial data presented above as standard types we can easily combine them with the temporal information in exactly the same way as presented for general standard data types above. We obtain new data types for spatio-temporal attributes such as VT_3DPOINT, TT_REGIONS or BT_2DLINE. Theoretically all of those types may be used in conceptual schemata, but in practice any particular schema will mostly contain only few of those types at a time.
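How a spatial type is combined with temporal information can be illustrated by a short sketch of a valid-time version of a three-dimensional point. The class names `Point3D` and `ValidTime`, the method names and the sample coordinates are assumptions for illustration only; the sketch covers valid time for a discretely changing geometry and is not the data type implementation of this work.

```python
from dataclasses import dataclass
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float


class ValidTime(Generic[T]):
    """Valid-time version of a base type T: one value per valid time chronon."""

    def __init__(self) -> None:
        self._history: Dict[int, T] = {}

    def assign(self, value: T, start: int, end: int) -> None:
        """Record `value` as valid during the half-open interval [start, end)."""
        for chronon in range(start, end):
            self._history[chronon] = value

    def at(self, chronon: int) -> Optional[T]:
        """Return the value valid at `chronon`, or None if nothing is recorded."""
        return self._history.get(chronon)


# A hypothetical elevation point whose height is re-measured at chronon 10,
# playing the role of an attribute of type VT_3DPOINT.
summit: ValidTime[Point3D] = ValidTime()
summit.assign(Point3D(1.0, 2.0, 3.0), 0, 10)
summit.assign(Point3D(1.0, 2.0, 5.0), 10, 20)

print(summit.at(5), summit.at(12), summit.at(25))
```

The same wrapper could be applied to any other spatial or standard type, which is the point of treating spatial types as standard types in this construction.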

7.2.4 Application to ATKIS

Since these concepts are novel they have only rarely been used in prior conceptual models. An example of an application for these data types is an improvement of the ATKIS schema of figure 6.2. Over the years landscape objects change their shape and for many commercial and scientific applications recording this temporal development of the spatial information would be beneficial. Therefore the conceptual schema of ATKIS as presented in figure 7.4 would be more appropriate to capture temporal information as well. These requirements have recently also found their way into the official specifications ([AdV02]) where the AAA model as successor of ATKIS is described to require a versioning of all attributes.


Whereas in the AAA model the temporal requirements have not been incorporated in the graphical conceptual model, figure 7.4 uses the newly introduced data types to model the temporal requirements on attribute level. Since the geometry of an object may change over time, the geometry attribute becomes time dependent in the new model. Also objects may change some of their properties (or object attributes in ATKIS terminology) over time; this results in a temporal data type for the attribute of class ObjectAttribute. Other temporal versions were introduced for attributes of classes ATKISObject and ObjectName. It should also be noted that the classes forming the ATKIS catalogue have attributes of non-temporal types only, as expected.

[Figure 7.4: UML class diagram; graphic not reproduced. Classes and attributes recoverable from the diagram:
    AttributeValue (value : STRING, name : STRING)
    AttributeType (AttributeTypeNumber : STRING, Name : STRING)
    ObjectName (Type : VARCHAR2(2), Value : VT_STRING, Positions : VT_Geo_Table_Type)
    Layer (ID : VARCHAR2(4), Name : VARCHAR2(100))
    ObjectKind (ID : VARCHAR2(4), Name : VARCHAR2(100))
    ObjectGeometry (Geometry : VT_GEOMETRY)
    ObjectAttribute (TempValue : VT_STRING)
    ATKISObject (ObjID : VARCHAR2(7), ModelType : VARCHAR2(2), BuildDate : Date, Coordinates : VT_Point)
    ObjectPart (ObjectPartID : VARCHAR2(3))
Association names: is explained by, is above, is described by, is of, denotes, belongs to, consists of, defines ST shape of, defines shape of, vt_posesses, is part of.]

Figure 7.4: Spatio-Temporal Schema for ATKIS data

In the model in figure 7.4³ only valid time was used. This is compliant with the present specifications of the application. Transaction time information could very well be required in a future version of the model. Since data for the same geographical region on different scales is recorded and managed by different authorities, it could become important to be able to reconstruct the information as it was present in the database at a certain time. This is exactly what could be achieved by adding transaction time and consequently using a bitemporal database. Moreover these could be easily integrated into the model by using the concepts presented earlier in this section.

³ Some of the data types used will be defined later in this chapter.

7.3 Standard Logical Modeling in STOSTA

Now the transformation of a conceptual schema into a logical schema has to be considered. In this work the focus is on the logical modeling of spatial, temporal and spatio-temporal information in ORDBS which is modeled conceptually as described previously. The three different kinds of data are considered one by one. Despite not focusing on any particular DBS at this stage the choice of data types is influenced by the object-relational paradigm as well as common characteristics of today's object-relational systems.

We need not distinguish between transferring ER-based schemata and object-oriented schemata in this section since both model the same kind of information and the transformation results in the same products. The only difference at the conceptual stage is notation, thus we use the data types as described in object-oriented modeling in the previous section.

Rules for the transformation of an ER schema into a logical database schema are given in any standard database textbook such as [EN00]. For example entities and m:n relationships are translated into relations in the logical model. Functional relationships in the conceptual model are transformed into foreign key constraints in the participating relations. [EN00] also gives rules on how to translate EER schemata into the relational model. With the additional features of object-relational databases the use of user-defined types for some modeling constructs should be considered and will greatly simplify the modeling process. The use of such features, which is straightforward in most cases, will be explained when they are used. Since the modeling constructs available in the EER model and in UML class diagrams are almost the same, the transformation rules for class diagrams may be adapted from the rules for EER schemata. The rules used become clear when they are applied, and are explained for the domains which are the focus of this work in the next section.

7.4 Advanced Logical Modeling in STOSTA

This section explains in detail the parts of the translation of the conceptual model where spatial, temporal or spatio-temporal data types are involved. Alternative modeling approaches for special applications in the spatial domain are also mentioned.

7.4.1 Spatial Data

The most common way to transform spatial conceptual data types to logical ones is by keeping the object-based view of the conceptual model. In this case spatial information which is an attribute of a conceptual object is kept as a column of the table corresponding to that object. Since spatial information is usually considered complex data rather than an object (cf. [EGH+92]), this modeling is appropriate in most cases. When this object-based view is used the logical level has to be divided into two different aspects. The first is the interface of the logical data type as perceived by the user of the database. The second aspect is the internal implementation of the logical data type as basis for the physical implementation which will be described in section 8 in detail.

[Figure 7.5: UML class diagram; graphic not reproduced. Type hierarchy and methods shown:

    OGCGeometry
        OGCPoint
        OGCCurve
            OGCLineString
                OGCLine
                OGCLinearRing
        OGCSurface
            OGCPolygon
        OGCGeometryCollection
            OGCMultiSurface
                OGCMultiPolygon
            OGCMultiCurve
                OGCMultiLineString
            OGCMultiPoint

    Methods of OGCGeometry:
        +envelope(): OGCGeometry
        +boundary(): OGCGeometry
        +equals(OGCGeometry): boolean
        +disjoint(OGCGeometry): boolean
        +intersects(OGCGeometry): boolean
        +touches(OGCGeometry): boolean
        +crosses(OGCGeometry): boolean
        +within(OGCGeometry): boolean
        +contains(OGCGeometry): boolean
        +overlaps(OGCGeometry): boolean
        +relate(OGCGeometry): boolean
        +distance(OGCGeometry): double
        +buffer(OGCGeometry): OGCGeometry
        +convexHull(OGCGeometry): OGCGeometry
        +intersection(OGCGeometry): OGCGeometry
        +union(OGCGeometry): OGCGeometry
        +difference(OGCGeometry): OGCGeometry

    Methods of OGCCurve:
        +length(): double
        +startPoint(): OGCPoint
        +endPoint(): OGCPoint
        +isClosed(): boolean
        +isRing(): boolean

    Methods of OGCLineString:
        +numPoints(): integer
        +pointN(): OGCPoint]

Figure 7.5: Geometric type hierarchy in [OGC99a] with sample methods

The interface of logical spatial data types should be as simple as possible to provide for easy usability, but it should still offer all required functionality. One comprehensive proposal of different two-dimensional spatial data types can be found in [OGC99a]. The geometric type hierarchy of this so-called simple features specification is illustrated in figure 7.5. As examples, methods are shown only for the inheritance path between OGCLine and OGCGeometry.

The reason for adding many important methods to the superclass OGCGeometry is user-friendliness: this is the class that provides the required interface of geometric data for users. The end user should deal with as few different types as possible to make the system easy to use. For example, Oracle 9i provides only the type SDO_GEOMETRY (see the description in section 8.1), which corresponds to OGCGeometry.

On the other hand, in the internal representation a differentiation between the different subclasses of OGCGeometry can be very beneficial. For example, the method within does not make sense if applied to two points. Moreover, several methods can be implemented much more efficiently for some geometric shapes than for others (consider e.g. equals on points or on polygons). Also, for each geometric data type a different physical representation including index types and selectivity estimation may be chosen. One could e.g. use specialized point access methods on 2DPoint data, R-Tree indexes on 2DPolygons and Z-Codes with multiple approximations for 2DPolyLines. Consequently, the differentiation between the different two-dimensional geometry types on the conceptual level should be kept on the logical level, at least in the internal representation. This facilitates a more efficient physical representation of geometric data later.

For three-dimensional data there is no agreed-upon specification comparable to the simple features specification of the OGC in the two-dimensional case. The type 3DPoint will definitely be required. Inherently two-dimensional shapes in space may be described by simply adding three-dimensional coordinates to the type 2DGeometry. Surfaces of real three-dimensional objects will probably also be described by a single type. Since these objects are usually approximated by a triangulation, the data type will be a list of three-dimensional triangles. A possible data type definition can be found in figure 7.6. Representation of three-dimensional objects is a topic of ongoing research in computer graphics as well as in application disciplines; details can be found in publications in that area. For the applications which are the topic of this work the above specifications are sufficient; more advanced spatial shapes should be defined when agreed-upon recommendations, e.g. by the OGC, have become available.

CREATE TYPE 3DPoint (x NUMBER, y NUMBER, z NUMBER);
CREATE TYPE 3DTriangle (p1 3DPoint, p2 3DPoint, p3 3DPoint);
CREATE TYPE 3DSurface LIST OF 3DTriangle;

Figure 7.6: Definition of a data type for 3D surfaces
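To see how such a triangulated surface might be used, the definitions of figure 7.6 can be mirrored in a general-purpose language. The following Python sketch is an illustration only; the names Point3D, Triangle3D, Surface3D and surface_area are assumptions made for this example and are not part of the type definitions above.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Triangle3D:
    p1: Point3D
    p2: Point3D
    p3: Point3D

    def area(self) -> float:
        # Half the length of the cross product of two edge vectors.
        ux, uy, uz = self.p2.x - self.p1.x, self.p2.y - self.p1.y, self.p2.z - self.p1.z
        vx, vy, vz = self.p3.x - self.p1.x, self.p3.y - self.p1.y, self.p3.z - self.p1.z
        cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
        return 0.5 * sqrt(cx * cx + cy * cy + cz * cz)


# A 3D surface is a list of triangles, as in "3DSurface LIST OF 3DTriangle".
Surface3D = List[Triangle3D]


def surface_area(surface: Surface3D) -> float:
    """Sum of the areas of all triangles of the surface."""
    return sum(t.area() for t in surface)


# Unit square in the plane z = 0, approximated by two triangles.
a, b, c, d = Point3D(0, 0, 0), Point3D(1, 0, 0), Point3D(1, 1, 0), Point3D(0, 1, 0)
square = [Triangle3D(a, b, c), Triangle3D(a, c, d)]
```

Methods such as the surface area then reduce to a simple aggregation over the list of triangles.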

For certain applications the computational accuracy achieved by the modeling described so far is not sufficient. Due to rounding errors it may happen, for instance, that the computed intersection point of two lines does not lie on either line when tested. A mathematically robust foundation for spatial data types is e.g. the ROSE algebra, which is described in detail in [GS95, Sch97]. The core idea is to build geometric objects from objects of a so-called realm in a constructive manner. The realm itself consists of non-intersecting points and lines with integer coordinates, such that geometrically robust operations can be defined. Implementation details can be found in [GdRS95]. Later an important modification was proposed in [LG00], which greatly simplifies the implementation. A prototypical implementation of the latter approach was also part of this work (see [Rip02] for details). The implementation had significant performance problems due to the very large integer numbers that are required for the approach. At present the use of the ROSE algebra can thus only be recommended for applications that really need the mathematical accuracy of this approach. Since most of the performance problems relate to data insertion and not to querying, its use is still definitely possible for certain domains.
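The rounding problem mentioned above can be reproduced in a few lines. The Python sketch below computes the intersection point of two lines and tests whether it lies on both of them, once with floating-point numbers and once with exact rational numbers. It only illustrates the robustness problem: exact rationals serve as a simple stand-in here and are not the realm-based construction of the ROSE algebra. All function names and the sample coordinates are assumptions.

```python
from fractions import Fraction


def intersection(p, q, r, s):
    """Intersection point of line p-q with line r-s (lines must not be parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, r, s
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / den,
            (a * (y3 - y4) - (y1 - y2) * b) / den)


def on_line(pt, p, q):
    """True if pt is exactly collinear with p and q."""
    return (q[0] - p[0]) * (pt[1] - p[1]) == (q[1] - p[1]) * (pt[0] - p[0])


def as_fractions(pt):
    return (Fraction(pt[0]), Fraction(pt[1]))


p, q, r, s = (0, 0), (7, 3), (0, 5), (9, 1)

# Floating-point arithmetic: the result is rounded, so the exact
# collinearity test is not guaranteed to succeed.
float_pt = intersection(*[(float(x), float(y)) for x, y in (p, q, r, s)])

# Exact rational arithmetic: the result lies on both lines by construction.
exact_pt = intersection(*[as_fractions(t) for t in (p, q, r, s)])

print(on_line(float_pt, p, q), on_line(float_pt, r, s))
print(on_line(exact_pt, p, q), on_line(exact_pt, r, s))
```

The price of exactness is visible in the representation: numerators and denominators grow with every constructive step, which is in line with the performance problems caused by very large integers reported above.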

Applications with high requirements for spatial data integrity between objects, such as road networks or political country borders, may on the logical level be modeled by a network rather than object-wise. The network representation stores boundary lines of objects globally (instead of with the objects) and also stores the objects adjacent to a certain boundary line⁴. The boundary lines form a network. The advantage of this model is that updates to the boundary of an object automatically update the boundaries of adjacent objects. This approach only makes sense in the context of non-overlapping (i.e. map-like) spatial information. Also, certain queries (like adjacent objects or line intersections) may be answered extremely fast, since this information is already stored in the database on insertion or update and does not have to be computed at query time. Many systems in geographic and cartographic applications prefer the network view, since they are map-oriented and consequently assume non-overlapping spatial information. Some geographic information systems also use this model internally. Since it would be out of scope, it is not used any further in this work and the required definitions are omitted. It is a non-trivial task to map all the above requirements to ORDBS concepts. This mapping would probably result in many update triggers, which would be computationally complex because of the spatial functions involved.
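Although the network view is not pursued further, its central idea can be sketched briefly. In the following Python example, boundary lines are stored once together with the identifiers of their adjacent objects, so that adjacency queries become lookups and a single update affects all neighbors. The class and method names are hypothetical and chosen for this illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


@dataclass
class BoundaryLine:
    points: List[Point]   # geometry is stored once, not per object
    objects: Set[str]     # identifiers of the objects adjacent to this line


class Network:
    def __init__(self) -> None:
        self.lines: Dict[int, BoundaryLine] = {}

    def add_line(self, line_id: int, points: List[Point], objects: Set[str]) -> None:
        self.lines[line_id] = BoundaryLine(points, set(objects))

    def boundary_of(self, obj: str) -> List[int]:
        """Identifiers of the lines that make up the boundary of obj."""
        return sorted(i for i, ln in self.lines.items() if obj in ln.objects)

    def neighbours(self, obj: str) -> Set[str]:
        """Adjacent objects are read from stored data; no geometry is computed."""
        result: Set[str] = set()
        for ln in self.lines.values():
            if obj in ln.objects:
                result |= ln.objects
        result.discard(obj)
        return result

    def move_line(self, line_id: int, points: List[Point]) -> None:
        """One update changes the boundary of every adjacent object."""
        self.lines[line_id].points = points


net = Network()
net.add_line(1, [(0, 0), (0, 1)], {"A"})
net.add_line(2, [(1, 0), (1, 1)], {"A", "B"})   # shared border of A and B
net.add_line(3, [(2, 0), (2, 1)], {"B"})
net.move_line(2, [(1, 0), (1.2, 0.5), (1, 1)])
```

In an ORDBS the consistency that this in-memory structure provides for free would have to be enforced by triggers, which is the source of the complexity noted above.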

Similarly, details of an implementation of rastered spatial data as opposed to vector-based data are not considered any further in this work. One possible model would be to create a table for each raster layer that stores the raster points using the data types described for the object-based model. All properties of a raster point would then be modeled as further attributes of the layer table. This approach was also used in [PKL00] for its simplicity. It is subject to certain performance deficiencies for raster-specific operations such as map overlay, though, if these are implemented by methods in the object-based model. An alternative implementation making use of the knowledge about which geometrical features are present could greatly improve performance.

7.4.2 Temporal Data

The theoretical conceptual model for temporal information in STOSTA presented in section 7.2.2 would be very inefficient if implemented directly, due to the huge sets of chronons to be stored; instead, sets of temporal intervals should be used. For this reason, and also to make optimal use of the new features of object-relational systems as well as to provide data types for a conceptual class model, a different logical model based on the concept should be used. Since this logical model is still system-independent but designed for the object-relational model, all features of ORDBMS as described in [SB99] may be used. For simplification the first part of this section assumes a valid-time database only. Later on, ideas for an extension to bitemporal information are given.

⁴ The network view was also used in the original specification of ATKIS.

The main idea of providing temporal versions of all data types, which was introduced in section 7.2.2, is very well suited for use in object-relational systems. By introducing appropriate user-defined types it can be implemented directly in the object-relational model. Hence for all standard types (and possibly other user-defined types that need temporal versions such as spatio-temporal types, see section 7.4.3) new data types are generated; they can be identified by the prefix vt in addition to the regular type name. As an example, the definitions required for the derived type vt_integer as temporal version of the standard type integer are presented in figure 7.7. Temporal information is assumed to be stored as temporal elements in that example; alternative representations of the temporal component, such as a function from time into a standard type, could also be used. The only difference would be in the internal structure of the temporal information. The type vt_integer would remain unchanged, which yields an implementation-independent representation of temporal information as required for conceptual models⁵.

CREATE TYPE chronon (time LONGINT);
CREATE TYPE interval (begin chronon, end chronon);
CREATE TYPE tempElement SET OF interval;
CREATE TYPE vt_integer_base (value NUMBER, time tempElement);
CREATE TYPE vt_integer LIST OF vt_integer_base;

Figure 7.7: Data Types required for valid-time attributes

In the definitions in figure 7.7 an integer attribute may be used for chronons from C, since the set is isomorphic to the natural numbers. In a concrete implementation, functions for transforming temporal information into a chronon and vice versa would be required. An interval is given by two chronons marking the beginning and end of the interval. For integrity reasons the constraint begin < end has to be asserted, which could e.g. be done in transformation functions or by using type triggers. Temporal elements can then simply be defined as sets of intervals in the ORDBS, since set types are one of the important object-relational features. Constraints on temporal elements include an empty intersection between any two intervals of a single element and, in order to achieve a normalized representation, that no two intervals of an element touch.
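The constraints on temporal elements can be made concrete by a small normalization routine. The following Python sketch represents chronons as integers and intervals as pairs that are left closed and right open; it checks begin < end and merges intersecting or touching intervals. The function names normalize and is_normalized are assumptions for this illustration and not part of the type definitions in figure 7.7.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (begin, end) in chronons, left closed, right open


def normalize(intervals: Iterable[Interval]) -> List[Interval]:
    """Return an equivalent temporal element whose intervals neither
    intersect nor touch; raise ValueError if begin < end is violated."""
    result: List[Interval] = []
    for begin, end in sorted(intervals):
        if not begin < end:
            raise ValueError(f"invalid interval ({begin}, {end})")
        if result and begin <= result[-1][1]:
            # Intersecting or touching: merge with the previous interval.
            result[-1] = (result[-1][0], max(result[-1][1], end))
        else:
            result.append((begin, end))
    return result


def is_normalized(intervals: Iterable[Interval]) -> bool:
    """True if the given intervals already form a normalized element."""
    items = list(intervals)
    return sorted(items) == normalize(items)
```

A transformation function or type trigger as mentioned above could call such a routine whenever a temporal element is constructed or modified.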

Up to this point the temporal type definitions may be used for all temporal type extensions. In contrast, the last two type definitions are given as an example for the standard type INTEGER. The first part of the temporal version VT_INTEGER of the standard type is a combination of value and temporal element as defined in VT_INTEGER_BASE. For usage as a column of a table, a list of such base types is needed in order to be able to store non-redundant information (cf. salary and name in figure 7.2). As an additional constraint on the type VT_INTEGER, the disjointness of all intervals of the temporal elements of a single list has to be enforced; otherwise storing contradicting information would be possible.
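The disjointness constraint on a whole list can be checked as sketched below in Python. A VT_INTEGER value is modeled as a list of (value, temporal element) pairs with integer chronons and intervals that are left closed and right open; the helper names is_consistent and value_at are hypothetical. If the constraint holds, at most one value is valid at any chronon, so a lookup is well defined.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]                   # left closed, right open
VtIntegerBase = Tuple[int, List[Interval]]   # (value, temporal element)
VtInteger = List[VtIntegerBase]


def is_consistent(attr: VtInteger) -> bool:
    """True if no two intervals of the whole list intersect."""
    all_intervals = sorted(i for _, element in attr for i in element)
    return all(prev[1] <= cur[0]
               for prev, cur in zip(all_intervals, all_intervals[1:]))


def value_at(attr: VtInteger, chronon: int) -> Optional[int]:
    """Value valid at the given chronon, or None if no value is recorded."""
    for value, element in attr:
        if any(begin <= chronon < end for begin, end in element):
            return value
    return None


# Salary history with three successive values.
salary: VtInteger = [(3000, [(10, 30)]), (4000, [(30, 50)]), (5000, [(50, 75)])]
```

Adding an entry such as (6000, [(40, 60)]) to this list would violate the constraint, since two different salaries would then be valid between chronons 40 and 60.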

⁵ The implementation independence no longer holds for the physical optimizations in section 8.3.1, though. The implementation dependency there is acceptable since physical schemata are implementation dependent anyway.

Using the data type vt_integer is illustrated in figure 7.8. Its purpose is to store numbers with attached valid-time information in a single column, as required e.g. for salaries of employees or the number of inhabitants of countries. The figure shows the definition of the table from figure 7.2 restricted to valid-time information. Moreover, insertion of the first tuple contained in that relation is shown by the given insert statement. We have used ⟨ and ⟩ to denote and construct list types as well as { and } for set types. Complex data types are constructed by calling implicit default constructors that have the same name as the type itself. Intervals are, as usual, considered to be left closed and right open. Note the very simple table declaration: by using complex data types as a special feature of ORDBS, the declaration of a table is as simple as in the non-temporal case. One just has to use temporal versions of data types instead of the standard types. Such an easy declaration would not be possible with tuple timestamping.

CREATE TABLE employee (
  id NUMBER PRIMARY KEY,
  name vt_string,
  salary vt_integer);

INSERT INTO employee VALUES (1,
  ⟨ vt_string_base('Miller', {interval(chronon(10),chronon(20)),
                              interval(chronon(50),chronon(70))}),
    vt_string_base('Scott', {interval(chronon(10),chronon(20))}) ⟩,
  ⟨ vt_integer_base(3000, {interval(chronon(10),chronon(30))}),
    vt_integer_base(4000, {interval(chronon(30),chronon(50))}),
    vt_integer_base(5000, {interval(chronon(50),chronon(75))}) ⟩);

Figure 7.8: Definition of table from figure 7.2 (valid time only)

Operators for the new data type can be any combination of operators on valid-time intervals with operators on the base type. Temporal operators on intervals (cf. [TCG+93]) are before, after, during, equal, adjacent, overlap, follows and precedes. Since each of these operators requires different indexes, we will focus on overlap in the sequel; it was the most important one in the applications at hand. Operators on integer values as the base type in this case are well known; in this context between is most important. For practical applications, the combination of the operators from the different domains will be a conjunction. Thus we obtain e.g. the operator vtBetweenOverlap as the conjunction of the temporal overlap operator and the between operator on integers. In this way many application-dependent operators may be defined together with a specialized physical implementation. In this work we will concentrate on the aforementioned operator vtBetweenOverlap on the physical level.
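A possible reading of vtBetweenOverlap is sketched below in Python. It assumes that overlap means that two intervals (left closed, right open) share at least one chronon and that between is inclusive on both ends; the exact semantics in [TCG+93] may differ in detail. The function names and the signature are assumptions for this illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]                      # left closed, right open
VtInteger = List[Tuple[int, List[Interval]]]    # list of (value, temporal element)


def overlap(a: Interval, b: Interval) -> bool:
    """Temporal overlap: the two intervals share at least one chronon."""
    return a[0] < b[1] and b[0] < a[1]


def vt_between_overlap(attr: VtInteger, low: int, high: int,
                       period: Interval) -> bool:
    """Conjunction of between on the value and overlap on the valid time:
    true if some value in [low, high] is valid during part of period."""
    return any(
        low <= value <= high and any(overlap(i, period) for i in element)
        for value, element in attr
    )


salary: VtInteger = [(3000, [(10, 30)]), (4000, [(30, 50)]), (5000, [(50, 75)])]
```

With this data, a query for salaries between 3500 and 4500 during the period from chronon 45 to 60 is satisfied by the value 4000, whereas the same query for the period from 50 to 60 is not.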

An extension of this model to bitemporal information seems straightforward at first: types of the base kind (such as vt_integer_base) could easily be extended to contain a second temporal element for the transaction time information belonging to the valid time information stored (cf. figure 7.9). Bitemporal elements for the example in figure 7.1 correspond to rectangles⁶ in the graphical representation. From a modeling point of view this is appropriate if a different interval type is used for transaction time information to make room for storing the special symbol UC. From an end-user point of view this is probably not sufficient, though: transaction time has to be managed by the system, whereas the other information is managed by the user. In summary, since the transaction time dimension has such special characteristics in terms of updates and querying requirements, a complete layer to make transaction time transparent for the end user would be required. Exact requirements and algorithms remain to be developed and are not the focus of this work. Nevertheless, from a more technical viewpoint the suggested extension could be used in a more physical model of bitemporal data. The above representation, without the special needs of the transaction time dimension, was also used in the physical design as well as the optimization stage of this work, which is described in detail in section 8.3.1.

CREATE TYPE bt_integer_base (value NUMBER, valid tempElement, trans tempElement);
CREATE TYPE bt_integer LIST OF bt_integer_base;

Figure 7.9: Data Types required for bitemporal attributes

Operators in the bitemporal case can be defined similarly to valid-time operators. This time not just one temporal operator is combined with the base-type-specific operator; rather, two temporal operators, one for each temporal dimension, are added by conjunction. Not all possible combinations make sense, but since only the operators which are important for the particular domain are defined anyway, this does not pose any problems. As for vt_integer, we focus on overlap temporal operators and the between operator for integer values in this work, yielding the operator btBetweenOverlap. The techniques presented in chapter 8 on physical design may also be used for many other operators in a similar fashion.

7.4.3 Spatio-Temporal Data

⁶ Non-rectangular polygons such as the one associated with salary 4000 in figure 7.1 are stored as multiple rectangles by dividing them along the transaction time dimension.

In much the same way as at the conceptual stage we primarily consider objects with discretely changing geometry. A proposal for the logical design of moving objects can be found in [FGNS00]. Moreover, similarly to the conceptual level, we obtain the logical design of spatio-temporal data types in STOSTA by combining the previously mentioned spatial types, taken as standard types, with the temporal types from the previous section. As of now we only consider two spatial dimensions except for 3D points. We obtain the types VT_2DPoint, VT_2DGeometry and VT_3DPoint, respectively. As mentioned before, the same concept may be extended to transaction time and bitemporal data if required. The definitions in figure 7.10 can be derived directly from the previous section; temporal versions of 2DPolyLine and 3DSurface could easily be generated.

Similarly, operators for the new data types are defined as conjunctions of corresponding temporal and spatial operators as explained earlier. An example operator that will also be used in physical design in chapter 8 is vtAnyInteractOverlap, which is a conjunction of the anyInteract operator in the spatial domain (cf. section 3.1.2) and the overlap operator for valid-time intervals.
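The conjunction of a spatial and a temporal operator can be sketched in the same style as for numeric values. In the Python example below, axis-parallel rectangles serve as a simplified stand-in for arbitrary two-dimensional geometries, and anyInteract is assumed to mean that the two geometries are not disjoint. Both simplifications, as well as all names, are assumptions made for this illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]                  # left closed, right open
Box = Tuple[float, float, float, float]     # (xmin, ymin, xmax, ymax)
VtBox = Tuple[List[Interval], Box]          # (valid, geometry)


def overlap(a: Interval, b: Interval) -> bool:
    """Temporal overlap: the two intervals share at least one chronon."""
    return a[0] < b[1] and b[0] < a[1]


def any_interact(a: Box, b: Box) -> bool:
    """True if the closed boxes share at least one point (not disjoint)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def vt_any_interact_overlap(obj: VtBox, window: Box, period: Interval) -> bool:
    """Conjunction of the spatial and the temporal condition."""
    valid, geometry = obj
    return any_interact(geometry, window) and any(overlap(i, period) for i in valid)


# An object occupying the square [0,2] x [0,2], valid from chronon 10 to 40.
building: VtBox = ([(10, 40)], (0.0, 0.0, 2.0, 2.0))
```

Since both conditions must hold, either the spatial or the temporal part may be evaluated first; this freedom is what a combined index on the physical level can exploit.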

CREATE TYPE VT_2DPoint (valid tempElement, point 2DPoint);
CREATE TYPE VT_2DGeometry (valid tempElement, geometry 2DGeometry);
CREATE TYPE VT_3DPoint (valid tempElement, point 3DPoint);
CREATE TYPE BT_2DPoint (valid tempElement, trans tempElement, point 2DPoint);
CREATE TYPE BT_2DGeometry (valid tempElement, trans tempElement, geom 2DGeometry);
CREATE TYPE BT_3DPoint (valid tempElement, trans tempElement, point 3DPoint);

Figure 7.10: Spatio-Temporal Data Types in the Logical Model

ATKISObject (ObjNo:VARCHAR2(7), LayerNo→Layer, ObjectKindNo→ObjectKind,
    Coordinate:VT_2DPoint, Actual:VARCHAR2(2), ModType:VARCHAR2(2), BuildDate:Date)
ObjectGeometry (ObjectNo→ATKISObject, ObjectPartNo:VARCHAR2(7),
    Geometry:VT_2DGEOMETRY)
Overpass (TopNo→ATKISObject, TopPartNo:VARCHAR2(3),
    DownNo→ATKISObject, DownPartNo:VARCHAR2(3))
ComplexObject (BigNo→ATKISObject, SmallNo→ATKISObject)
ObjectName (ObjectNo→ATKISObject, TypeNo:VARCHAR2(2),
    Value:VT_STRING, Geo:VT_GEO_TABLE_TYPE)
ObjectAttribute (ObjectNo→ATKISObject, ObjectPartNo:VARCHAR2(3),
    Type→AttributeType, Value→AttributeValue, TempValue:VT_STRING)

Figure 7.11: Logical Schema of spatio-temporally enhanced ATKIS schema

An example of a logical schema of the temporally enhanced ATKIS schema using the novel spatio-temporal data types is given in figure 7.11. The standardized catalog attributes of objects to be referenced are unchanged from figure 6.3, as expected, and are therefore omitted. The newly introduced spatio-temporal data types were used for time-varying geometric information in the relations ATKISObject and ObjectGeometry. Moreover, the relation ObjectName, which stores names of ATKIS objects that may change over time⁷, makes use of time-varying standard types. User-defined data types can also be temporally extended, as shown by the attribute Geo of ObjectName, which stores the time-dependent placement of object names on a map. Finally, the smooth integration of temporal and non-temporal information is illustrated by the relation ObjectAttribute. For non-temporal attributes the structure as in figure 6.3 is kept; for temporal attributes a new attribute of a temporal type is added. The other attribute will then be set to NULL for each tuple. Since our temporal types contain temporal elements and not only intervals, even the key constraints from figure 6.3 can still be kept in the temporal schema. Temporal keys are thus not required for ATKIS, but they could become important in more dynamic applications.

⁷ Consider a road whose name and classification may change after a new road has been built.

Chapter 8

Physical Design in Object-Relational Database Systems

8.1 Features provided by the DBS for Spatial Data

In this work the object-relational database system Oracle 8i/9i was used. This system provides an extension for (currently) two-dimensional spatial data that may be used for pure spatial data in STOSTA. A short description of those features of this extension that are used in the sequel is given here. A detailed description of the spatial extension can be found in [Ora01]. Other object-relational database systems such as Informix ([Inf01]), DB2 ([IBM01]) and Postgres ([Ram01]) also have spatial extensions.

The kernel of the Oracle extension for spatial data is a user-defined type for geometrical data, MDSYS.SDO_GEOMETRY, which may be used to store many different two-dimensional shapes, such as OGCPolygon from section 7.4.1, in a single data type. It is largely OGC compliant and offers the possibility to store point data separately from all other geometric shapes. Points may be three-dimensional in storage, but all other operations ignore the third dimension. All types of two-dimensional shapes, from a simple line up to a mixed collection of lines and polygons, can be stored in a single object of this type. Along with the spatial data type, a large set of geometric functions is provided. It includes functions returning geometries such as SDO_BUFFER, which returns a buffer object around a given object, as well as metric functions like SDO_AREA and boolean functions like WITHIN_DISTANCE. Moreover, some spatial aggregation functions such as SDO_AGGR_CONVEXHULL for computing the convex hull and many functions for handling spatial coordinate systems are provided.

The most important parts for this work are the spatial indexes and operators included in Oracle Spatial. In particular, an operator SDO_RELATE is provided for spatial selections and joins with several different spatial relationships as possible selection or join criteria. Combinations of these relationships are also possible. If only the result of the filter step of a filter-and-refine spatial query (cf. figure 3.2) is required, the operator SDO_FILTER can be used. Moreover, the operators SDO_NN, retrieving a specified number of nearest objects to a given object, and SDO_WITHIN_DISTANCE, retrieving all objects within a given distance of a given object, are provided. The execution of these spatial operators is supported by spatial indexes. Oracle Spatial provides a quadtree index with variable tiling level¹ as well as an R-Tree index with variable fanout and dimensionality. Creating these indexes, and thus using spatial operators, requires that metadata about the coordinate plane used in the particular column be inserted into the view USER_SDO_GEOM_METADATA.
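The difference between the filter step alone and a complete filter-and-refine query can be illustrated independently of Oracle. The Python sketch below is not Oracle's implementation and does not use its API; it merely shows the principle with disks as geometries and their minimum bounding rectangles as approximations. All names are assumptions for this example.

```python
from math import hypot
from typing import Dict, List, Tuple

Disk = Tuple[float, float, float]  # (center x, center y, radius)


def mbr(d: Disk) -> Tuple[float, float, float, float]:
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a disk."""
    x, y, r = d
    return (x - r, y - r, x + r, y + r)


def filter_step(objects: Dict[str, Disk], px: float, py: float) -> List[str]:
    """Cheap test on the approximations; may return false hits."""
    hits = []
    for name, d in objects.items():
        xmin, ymin, xmax, ymax = mbr(d)
        if xmin <= px <= xmax and ymin <= py <= ymax:
            hits.append(name)
    return hits


def refine_step(objects: Dict[str, Disk], candidates: List[str],
                px: float, py: float) -> List[str]:
    """Exact test on the geometry, applied to the candidates only."""
    return [n for n in candidates
            if hypot(px - objects[n][0], py - objects[n][1]) <= objects[n][2]]


objects = {"a": (0.0, 0.0, 1.0), "b": (5.0, 5.0, 1.0)}

# The point (0.9, 0.9) lies in the bounding rectangle of disk "a"
# but not in the disk itself: a false hit removed by the refine step.
candidates = filter_step(objects, 0.9, 0.9)
result = refine_step(objects, candidates, 0.9, 0.9)
```

An application that can tolerate false hits, or that performs its own exact test, only needs the candidate set, which corresponds to using the filter operator alone.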

In the case studies presented in sections 6.2 and 6.3 the Oracle Spatial extension was used to store two-dimensional geometric data. Since they were implemented on earlier versions of Oracle, not all the features described previously were available. Thus, especially in the geography domain, several functions had to be implemented by the user. Nevertheless, in the current version Oracle 9.0.1 a two-dimensional spatial application can benefit greatly from the cartridge provided, even if not all functions described in [Ora01] work correctly in all circumstances.

8.2 User-Defined Extensions

Since one of the most important features of ORDBS is the possibility to extend them by user-defined types, a closer look at the support for defining these extensions is necessary. The possibility to include support for efficient query processing on user-defined types is especially important. This aspect will be investigated in detail in this section, since it is used for STOSTA as discussed earlier.

8.2.1 Object-Relational Features for Indexing

For the ORDBS Oracle the most important features for extensions were described in section 2.3.2. In summary, a pre-defined interface has to be implemented and registered with the database server, which will then call the appropriate routines when executing queries containing user-defined operators or other commands involving user-defined indexes.

The interface is defined in the database programming language PL/SQL, which may also be used for the implementation of user-defined indexes. A first study on the implementation of R*-Trees for spatial data in Oracle 8i, which was done during the development of this work, has shown that this leads to unacceptable query response times (see [Klo99] for details about the implementation and some performance results). Overall the results showed that using an R*-Tree index has benefits over not using an index at all, but that performance is not acceptable if the index is implemented in PL/SQL. Index creation in particular performed extremely badly. This is due to the fact that PL/SQL is not designed to be an efficient language for complex computations, and in particular not for functions that need to pass large arguments around very often. It is rather designed to provide easy database access, but this is not so important during index creation or index usage. Consequently, user-defined indexes should be implemented in a more efficient language such as C or C++, where functions can still use the low-level call interface to interact efficiently with the database. These functions can then be integrated easily into the indexing interface by the external implementation features provided by Oracle 9i. This is the approach that will be followed in the next sections.

¹ The tiling level can also be chosen variable depending on object density.

8.2.2 Generic Index Structures

Since user-defined data types in ORDBS may be of any structure and content, and moreover have specialized operators working on them, no fixed indexing structure will help for all types. On the other hand, the development of an index structure for a data type is a very complex task, which may not be feasible for every single data type. Thus indexing structures are required that can reuse the main parts of indexing methods for different concrete index implementations, for data types in STOSTA as well as for others. In other words: extensible indexing methods are required that can be adapted to different data types with the minimum possible type-specific work. In this work the generalized search tree approach ([HNP95b, Aok98]), which solves exactly this problem, will be used. It is briefly described in the following section. To be able to use an extensible indexing structure together with an ORDBS, however, an integration component is also required. This integration component has to solve all problems on the interface between database system and indexing structure, such as mapping data types and query predicates as well as storing the index structure within the database in order to achieve transaction control and backup and recovery features for the index. Such an integration component for GiST and Oracle 9i has also been developed in conjunction with this work ([Löc01]) and is also described briefly below. A similar high-level component has been described in [CCF+99]. In that work, however, only the database part was considered and no new index structures were integrated. The approach was to map user-defined types to a one-dimensional domain that could then be indexed by a standard DBS index. Even though no new structures were added², the results reported in [CCF+99] show that this way of extending ORDBS for user-defined types is a promising one and may lead to better query performance.

Generalized Search Trees

Generalized search trees (GiST; [HNP95b]) are search trees that combine and implement the common features of tree-based access methods, such as insertion and searching, but still allow the classical operations to be fitted to a particular data type and indexing structure. This is achieved by using certain extensible methods in the general algorithms; these methods afterwards have to be implemented by the user for the desired data type and indexing structure by object-oriented specialization. An overview of class structures for some well-known tree types is shown in figure 8.1.

A GiST is a balanced search tree having variable fanout between kM and M with 2/M ≤ k ≤ 1/2, with the exception of the root, whose fanout is between 2 and M. The constant k is the minimum fill factor of the nodes of the tree. Interior nodes of the tree contain (predicate, pointer) pairs, where the predicate describes the search information and the pointer points to a node on the next level. The predicate should describe all objects reachable by the pointer. Leaf nodes of the tree are similar, with the difference that the pointer points to a tuple of the database relation to be indexed. Details about the parameterized definition of the classical tree methods search, insert and delete can be found in the original article ([HNP95b]).

² This addition seems to be possible, but at a much higher cost than when using the integration component of an extensible index.

Classes and their methods:

GiST: #insert(e:GiSTEntry, level:int), #chooseSubtree(e:GiSTEntry, level:int), #split(n:GiSTNode, e:GiSTEntry), #adjustKeys(n:GiSTNode), #delete(e:GiSTEntry)
OrderedGiST: #findMin(q:predicate), #next(q:predicate, e:GiSTEntry)
UnorderedGiST: #search(q:predicate)
BTreeGiST: #consistent(e:GiSTEntry, q:predicate), #union(l:ListOfGiSTEntry), #penalty(e1:GiSTEntry, e2:GiSTEntry), #pickSplit(l:ListOfGiSTEntry), #compare(e1:GiSTEntry, e2:GiSTEntry)
RTreeGiST: #consistent(e:GiSTEntry, q:predicate), #union(l:ListOfGiSTEntry), #penalty(e1:GiSTEntry, e2:GiSTEntry), #pickSplit(l:ListOfGiSTEntry)
RStarTreeGiST: #penalty(e1:GiSTEntry, e2:GiSTEntry), #pickSplit(l:ListOfGiSTEntry)
SSTreeGiST: #penalty(e1:GiSTEntry, e2:GiSTEntry)
RSSTreeGiST: #penalty(e1:GiSTEntry, e2:GiSTEntry)

Figure 8.1: Class hierarchy for trees implemented in GiST framework

In order to define a concrete instantiation of an abstract GiST, one needs to define a specialization class (such as BTreeGiST in figure 8.1) for the particular GiST extension by supplying (at least) the following methods: consistent(E, q), union(P), penalty(E1, E2) and pickSplit(P). The terminology for the method arguments follows the original article: P represents a set of node entries, E represents a single node entry and q stands for an arbitrary supported query predicate. As earlier research in spatial databases has shown ([GG98], [WHL98]), the original GiST should be extended to support forced reinserts, as documented by the often observed superiority of the R*-Tree over the regular R-Tree for pure spatial access methods. The desirability of forced reinsert as an extension was already mentioned in the original GiST article, and it was used for our STT-GiST.

The method consistent(E, q) implements the search through a tree. It returns true if results for query q may be found by following E.pointer. Observe that for correctness the test only has to be conservative: consistent must return true whenever results can be reached via E.pointer, but if consistent returns true, it is not guaranteed that results are found via E.pointer. The method union(P) returns a search predicate for all entries of P. Finally, penalty(E1, E2) and pickSplit(P) implement the desired insertion methods. penalty returns a measure representing the worsening of the index if entry E2 were inserted in E1. On every level the node with the least penalty is chosen for insertion. If the insertion position has been found and this node would overflow by the insertion, pickSplit has to implement a method that distributes the entries of the node into two new nodes, which are then inserted into the tree instead of the overflowing node. The definition of these methods for R-Trees and the spatial domain can also be found in the original article. The properties of RSS-Trees, which are a combination of R*- and SS-Trees, will be explained in detail in section 8.2.3.
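The roles of the four extension methods can be illustrated for rectangle predicates in the style of an R-Tree. The following Python sketch is not the libgist interface and not the definition from the original article; in particular, pick_split uses a deliberately simple distribution by the lower x coordinate instead of one of the published split heuristics. All names, including the helper choose_subtree, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    predicate: Rect   # bounding rectangle of everything reachable via pointer
    pointer: object   # child node or tuple identifier


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def consistent(e: Entry, q: Rect) -> bool:
    """May the subtree of e contain objects intersecting the window q?"""
    p = e.predicate
    return p[0] <= q[2] and q[0] <= p[2] and p[1] <= q[3] and q[1] <= p[3]


def union(entries: List[Entry]) -> Rect:
    """Smallest rectangle covering the predicates of all entries."""
    ps = [e.predicate for e in entries]
    return (min(p[0] for p in ps), min(p[1] for p in ps),
            max(p[2] for p in ps), max(p[3] for p in ps))


def penalty(e1: Entry, e2: Entry) -> float:
    """Area enlargement of e1 if e2 were inserted below it."""
    return area(union([e1, e2])) - area(e1.predicate)


def pick_split(entries: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Distribute the entries of an overflowing node; here simply by xmin."""
    ordered = sorted(entries, key=lambda e: e.predicate[0])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def choose_subtree(node: List[Entry], new: Entry) -> Entry:
    """On each level the entry with the least penalty is followed."""
    return min(node, key=lambda e: penalty(e, new))


node = [Entry((0, 0, 2, 2), "left"), Entry((10, 10, 12, 12), "right")]
new = Entry((1, 1, 3, 3), "tuple-1")
```

In this example the new entry is directed to the subtree labeled "left", since enlarging its rectangle costs far less area than enlarging the distant one. A rectangle intersection test is conservative in the sense described above: it never misses a subtree containing results, but it may lead into subtrees that contain none.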

The concept of generalized search trees has also been extended to facilitate query types other than selections. This is important for join queries or aggregations such as nearest neighbor queries. Details about this extension can be found in [Aok98].

Generalized Search Trees in Oracle

A file-based implementation of generalized search trees called libgist has been made available under http://gist.cs.berkeley.edu. It is written in C++ and provides the general tree along with several extensions for most of the standard trees proposed in the literature such as B-Tree, R∗-Tree and SS-Tree. It was published in the literature in [Kor99] and a visualization of the index structures was also reported in [KSH98]. Since it operates on index files, it cannot be used directly in conjunction with a commercial ORDBS, where index files should be under control of the DBS as well. Moreover the interfaces need to be linked together in order to be able to use a GiST specialization as index inside the ORDBS. In addition, ORDBS users want to be able to use the given and other GiST-based index structures for their own user-defined types. This requires a mapping of database types and operators to GiST entry objects and predicates.

To simplify this connection process a connector component between Oracle 9i and libgist has been implemented, called OraGiST. An overview of features and functionality of OraGiST is given in figure 8.2. Details about the implementation can be found in [Loc01]. The tool OraGiST mainly consists of two components. The first component (OraGiST library) is independent of the particular data type and user. On the one hand, it provides functionality for calling the appropriate GiST methods inside functions of the Oracle 9i extensible indexing interface (cf. section 2.3.2). On the other hand, it facilitates the storing of index information of a libgist index tree inside an Oracle database. These are the upper two connections in figure 8.2.

The second component of OraGiST (toolbox) is data type and thus GiST specialization dependent. Consequently it cannot be a complete generic implementation. It is rather a support tool providing the user with method prototypes, and taking over all generic parts of this part of the index structure development process. In particular the


[Figure 8.2, UML diagram. libgist library: GiST, GiSTExtension, GiSTEntry, GiSTIndexFile. Oracle ORDBS: DBSExtensibleIndexing, UserDefinedIndexStructure, DBSUserDefinedObject, DBSIndexTable. OraGiST: OraGiST Library (connections labeled "initiates calls" and "is stored in") and OraGiST Toolbox with OraGiSTExtension (getExtension(), getQuery(), needRefine()) and TypeMap (approximate()).]

Figure 8.2: Architecture and Functionality of OraGiST

user has to do only three things himself. Firstly, he has to define which GiST specialization is to be used for a particular ORDBS user-defined data type. Secondly, he has to implement the mapping between ORDBS data type and index entry; this is further simplified by generating a programming language (C++) interface for the Oracle data type. Since sometimes not the exact objects but rather approximations are inserted into an index, this method is called approximate (cf. MBRs in spatial indexes, section 3.3). In that case the tool implements the filter-and-refine strategy of query execution (cf. section 3.2) by means of the method needVerify. Thirdly, the mapping of Oracle operators to libgist methods has to be declared in the method getQuery by the developer.
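The filter-and-refine strategy mentioned above can be sketched as follows. This is a simplified illustration rather than OraGiST code: exact objects are assumed to be circles (cx, cy, r), their approximation is the bounding square, the index lookup is replaced by a linear scan, and all function names are hypothetical.

```python
def approximate(circle):
    """Map an exact object (cx, cy, r) to its MBR (xmin, ymin, xmax, ymax)."""
    cx, cy, r = circle
    return (cx - r, cy - r, cx + r, cy + r)


def mbr_overlaps(a, b):
    """Cheap test on approximations, as it would be evaluated inside the index."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def circle_hits_window(circle, window):
    """Exact test: distance from the circle center to the nearest window point."""
    cx, cy, r = circle
    nx = min(max(cx, window[0]), window[2])
    ny = min(max(cy, window[1]), window[3])
    return (cx - nx) ** 2 + (cy - ny) ** 2 <= r * r


def filter_and_refine(objects, window, need_refine=True):
    # Filter step: a linear scan over approximations stands in for the index.
    candidates = [rid for rid, obj in objects.items()
                  if mbr_overlaps(approximate(obj), window)]
    if not need_refine:
        return candidates  # keys are exact, so candidates are already results
    # Refine step: one exact test per candidate, which is the per-tuple overhead.
    return [rid for rid in candidates if circle_hits_window(objects[rid], window)]
```

In this sketch the refinement step costs one additional call per candidate, so the number of false candidates produced by the filter step directly determines the extra work.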

Due to its framework character not only the provided GiST specializations, but also other indexes based on GiST are supported by OraGiST. This will be illustrated by the performance experiments described in the remainder of this chapter, where the tool was used for standard (R∗-Tree, B∗-Tree) as well as newly developed (RSS-Tree) GiST specializations. Results will show that this approach of extensible indexing is very efficient. This is due to the C/C++ implementation of the index. Results will also show that index performance degrades when object approximations are used, which is caused by the increasing call overhead for the refinement step. The overhead occurs since the Oracle 9i indexing interface operates on tuples, leading to one function call for each candidate.

8.2.3 Example: STO-GiST - A Spatio-Temporal Index

We will now show how the GiST approach can be used to design an access method for spatio-temporal objects. A new GiST extension has to be developed for this domain; it will later be used for physical design of different data types.

As stated in earlier parts already, we focus on discretely changing geometries when talking about spatio-temporal objects. Indexing of continuously moving objects has been researched in other publications (see e. g. [AAE00, KGT99b, NDF99, NNS99, SJLL00]), but no single best method has yet been determined. Only few publications on discretely changing geometries can be found; [TVM98] uses multiple overlapping trees for indexing such objects. This approach becomes increasingly ineffective with increasing frequency of change of the geometries. Thus we prefer to use a single tree for all objects; this approach was also preferred in [SJ99].

We present four different tree-based access structures for indexing two-dimensional extended objects that change their geometry over time. For the applications at hand there was no need to include a transaction time dimension. Thus we only have to model one temporal dimension (valid-time). Transaction time can be added by introducing another dimension similar to the design of bitemporal data types in section 7.4.2. We define the queries supported by our indexing method, which include spatio-temporal, purely spatial and purely temporal queries, and extend the penalty metrics of the well-known R∗-Tree ([BKSS90]) and SS-Tree ([WJ96]) for spatial data to be able to handle spatio-temporal data. Our new tree structures combine advantages of the two tree types for spatial data.

In [TSPM98] spatio-temporal data and access structures were characterized by seven criteria to identify certain groups. Based on this classification our methods fall into the following categories:

Data types supported: regions
Temporal support: valid time
Database mobility: full-dynamic
Handling obsolete entries: NO
Object representation: YES (MBR)
Temporal treatment: static (dynamic)
Query support: YES (temporal, spatial and spatio-temporal)

In terms of temporal treatment we used only static data structures but the methods should in general work with dynamic updating as well. Queries supported by our access methods include pure spatial, pure temporal as well as spatio-temporal queries. In each of these categories the spatially and temporally known predicates for overlaps, contains and equals (cf. section 7.4.1) can be computed efficiently using the developed index structures.

Supported Query Predicates

The objects to be stored in the spatio-temporal tree (STT) are represented by their spatial bounding box and their valid time interval. Formally we will be storing 6-tuples in the tree of the form $STO = (x_{ll}, y_{ll}, x_{ur}, y_{ur}, t_{start}, t_{end})$ representing a spatio-temporal object from a STOSTA with $(x_{ll}, y_{ll})$ as lower left corner of the (spatial) bounding box, $(x_{ur}, y_{ur})$ as upper right corner of the (spatial) bounding box and valid time from $t_{start}$ up to $t_{end}$.

Now we have to declare what predicates will be supported by the STT. The goal should be to efficiently support both real spatio-temporal predicates as well as pure spatial and pure temporal predicates. In this work we restrict ourselves to the most important predicates required by our applications; these are overlap(STO1, STO2), contains(STO1, STO2), contained(STO1, STO2) and equal(STO1, STO2)³. The pure spatial and pure temporal versions are also supported. It should be stressed that all these categories are supported equally well in the sense that we do not favor either the spatial or the temporal component (as would be the case for instance in the RT-Tree, [XHL90], where the spatial attribute dominates the temporal).

The semantics of the predicates should be obvious from their names: predicate contains(STO1, STO2) means STO1 contains STO2 and contained(STO1, STO2) similarly means STO1 is contained in STO2. This follows the definition in section 7.4.3 where spatio-temporal operators are defined as conjunction of the corresponding spatial and temporal operators. A couple of other predicates can be derived from these by the user; others could be easily incorporated by analogy. But representative predicates for each of the temporal as well as spatial and spatio-temporal groups identified by [ST99] are included, so that validity of the results should be sufficiently general.
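The definition of the spatio-temporal predicates as conjunctions of their spatial and temporal counterparts can be sketched as follows. The class and function names are chosen for this illustration only, and closed boundaries in all dimensions are an assumption; the thesis operators from section 7.4.3 may treat interval ends differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STO:
    """Spatio-temporal object: spatial bounding box plus valid time interval."""
    xll: float
    yll: float
    xur: float
    yur: float
    tstart: float
    tend: float


def spatial_overlap(a, b):
    return a.xll <= b.xur and b.xll <= a.xur and a.yll <= b.yur and b.yll <= a.yur


def temporal_overlap(a, b):
    return a.tstart <= b.tend and b.tstart <= a.tend


def spatial_contains(a, b):
    return a.xll <= b.xll and b.xur <= a.xur and a.yll <= b.yll and b.yur <= a.yur


def temporal_contains(a, b):
    return a.tstart <= b.tstart and b.tend <= a.tend


# Spatio-temporal predicates: conjunction of the spatial and the temporal version.
def overlap(a, b):
    return spatial_overlap(a, b) and temporal_overlap(a, b)


def contains(a, b):
    return spatial_contains(a, b) and temporal_contains(a, b)


def contained(a, b):
    return contains(b, a)


def equal(a, b):
    return a == b
```

Two objects whose bounding boxes overlap spatially but whose valid times are disjoint therefore satisfy the spatial predicate only, not the spatio-temporal one.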

A search key in the tree indexing the spatio-temporal objects corresponds to the predicate contains() as described in the original GiST article for spatial objects. Here the key domain consists of the two spatial dimensions and one temporal dimension. On page 112 we will develop different access structures for interpreting these dimensions as standard three-dimensional space or as two-plus-one-dimensional space. The GiST approach enables us to reuse all methods for both approaches, except for the part defining the penalty metric for the insertion policy. The penalty metrics have to be generalized for spatio-temporal objects in order to be able to use the terms area, overlap, margin and distance appropriately. For the generalization of these terms see page 112.

Embedding of R∗-Trees in GiST

The method consistent(E, q), which guides search through the tree to answer a query q (which can be any of the 12 predicates discussed in the previous section) for a spatio-temporal object $STO_q$, returns a boolean value which tells if there might be answers to q below the current entry E. For interior, non-leaf nodes we use overlap for a contained- or overlap-query and contains for a contains- or equal-query. The correctness of this definition is obvious; the same principle was used for the R-Tree extension of GiST in the spatial domain ([HNP95a]). For pure spatial and pure temporal queries we use the spatial and temporal versions of the predicates, respectively. If a node E is on the leaf level we always apply the search predicate q directly.
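The case distinction made by consistent can be sketched as follows for the spatio-temporal versions of the predicates. Keys are plain 6-tuples (xll, yll, xur, yur, tstart, tend); the argument order (stored key first, query object second), the predicate names as strings and the closed boundaries are assumptions of this sketch.

```python
# (low, high) index pairs of the x, y and t dimension inside a 6-tuple key.
AXES = ((0, 2), (1, 3), (4, 5))


def overlap(a, b):
    return all(a[lo] <= b[hi] and b[lo] <= a[hi] for lo, hi in AXES)


def contains(a, b):
    return all(a[lo] <= b[lo] and b[hi] <= a[hi] for lo, hi in AXES)


# On the leaf level the query predicate itself is applied to the stored key.
LEAF_TEST = {
    "overlap": overlap,
    "contains": contains,                       # stored object contains query object
    "contained": lambda e, q: contains(q, e),   # stored object lies inside query object
    "equal": lambda e, q: e == q,
}


def consistent(entry_key, query_key, predicate, is_leaf):
    if is_leaf:
        return LEAF_TEST[predicate](entry_key, query_key)
    if predicate in ("overlap", "contained"):
        # A matching object below this entry must at least touch the query object.
        return overlap(entry_key, query_key)
    # "contains" and "equal": the bounding key must enclose the query object.
    return contains(entry_key, query_key)
```

The interior test is weaker than the leaf test, so subtrees may be visited in vain, but no subtree holding a result is ever pruned.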

³ We use these as shorthands for the spatio-temporal operators such as vtBetweenOverlap from section 7.4.3 in this paragraph.


The method union(P) can be easily defined to return a spatio-temporal object consisting of the spatial bounding rectangle and the temporal bounding interval for all entries in P. The computation can be done by a simple min and max scan through all entries $E_i$ of P for all dimensions.
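A minimal sketch of this min and max scan, again assuming that keys are 6-tuples (xll, yll, xur, yur, tstart, tend):

```python
def union(keys):
    """Smallest spatio-temporal box enclosing all keys (min/max scan per dimension)."""
    keys = list(keys)
    return (
        min(k[0] for k in keys),  # xll
        min(k[1] for k in keys),  # yll
        max(k[2] for k in keys),  # xur
        max(k[3] for k in keys),  # yur
        min(k[4] for k in keys),  # tstart
        max(k[5] for k in keys),  # tend
    )
```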

Finally for the pickSplit(P)-method we proceed as follows (similar to R∗-Trees): along each of the possible split axes $j \in \{x, y, t\}$ (where x and y denote the two spatial and t the temporal dimension) we sort the entries of P by lower value (ties by upper value) along that axis; then over all possible partitions $\{E_1, \ldots, E_i\}, \{E_{i+1}, \ldots, E_{M+1}\}$ of the M + 1 entries of P (M is the maximum, m the minimum number of entries per node) we compute

$$\mathit{marginMeasure}(P, j) = \sum_{i=m}^{M+1-m} \bigl( \mathit{margin}(E_1 \cup \ldots \cup E_i) + \mathit{margin}(E_{i+1} \cup \ldots \cup E_{M+1}) \bigr).$$

We use the axis with minimal value for marginMeasure as split axis. Along that split axis we compute the overlapMeasure for every partition and choose the partition with minimal overlapMeasure. In case of ties we use minimal sizeMeasure to break the tie. These measures are defined for each of the M + 2 − 2m partitions $P^{(i)}$ ($i = m, \ldots, M + 1 - m$) by:

$$\mathit{overlapMeasure}(P^{(i)}) = \mathit{size}\bigl(STO(E_1 \cup \ldots \cup E_i) \cap STO(E_{i+1} \cup \ldots \cup E_{M+1})\bigr)$$

$$\mathit{sizeMeasure}(P^{(i)}) = \mathit{size}\bigl(STO(E_1 \cup \ldots \cup E_i)\bigr) + \mathit{size}\bigl(STO(E_{i+1} \cup \ldots \cup E_{M+1})\bigr)$$

For the definition of size see page 112. The distribution with minimal measure as described above induces the split of the entries, which means $E_1, \ldots, E_i$ remain in the node and the entries $E_{i+1}, \ldots, E_{M+1}$ are deleted and then reinserted into the tree.
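The split procedure described above can be sketched as follows. The sketch assumes 6-tuple keys (xll, yll, xur, yur, tstart, tend) and uses the three-dimensional versions of size and margin; the function names and the choice of the 3d measures are assumptions of this illustration, and the reinsertion of the second group is left to the caller.

```python
# (low, high) index pairs per split axis inside a 6-tuple key.
AXES = {"x": (0, 2), "y": (1, 3), "t": (4, 5)}


def union(keys):
    return (min(k[0] for k in keys), min(k[1] for k in keys),
            max(k[2] for k in keys), max(k[3] for k in keys),
            min(k[4] for k in keys), max(k[5] for k in keys))


def extents(k):
    return [k[hi] - k[lo] for lo, hi in AXES.values()]


def size(k):
    ex, ey, et = extents(k)
    return ex * ey * et


def margin(k):
    return 4 * sum(extents(k))


def intersection_size(a, b):
    prod = 1.0
    for lo, hi in AXES.values():
        side = min(a[hi], b[hi]) - max(a[lo], b[lo])
        if side <= 0:
            return 0.0  # boxes do not intersect in this dimension
        prod *= side
    return prod


def pick_split(entries, m):
    """entries: the M + 1 keys of the overflown node; m: minimum entries per node."""
    n = len(entries)

    def partitions(axis):
        lo, hi = AXES[axis]
        ordered = sorted(entries, key=lambda k: (k[lo], k[hi]))
        return [(ordered[:i], ordered[i:]) for i in range(m, n - m + 1)]

    def margin_measure(axis):
        return sum(margin(union(l)) + margin(union(r)) for l, r in partitions(axis))

    split_axis = min(AXES, key=margin_measure)

    def measures(part):
        left, right = union(part[0]), union(part[1])
        # Primary criterion: overlapMeasure; tie-breaker: sizeMeasure.
        return (intersection_size(left, right), size(left) + size(right))

    return min(partitions(split_axis), key=measures)
```

For entries that form two clusters along the x axis, the margin measure is smallest for that axis, so the returned partition separates the two clusters.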

The penalty-method based on size

A penalty metric determines the insertion policy (to which index entry should a new object be added). Penalties are computed for an existing index entry $E_i$ and a new object with key key to be inserted⁴. The minimum penalty over all entries in a node is computed and the corresponding subtree is chosen for insertion. Using size instead of area for spatio-temporal objects (area tends to lead to a wrong picture in a spatio-temporal domain) we define the penalty metric exactly as proposed in [BKSS90]: it uses least size enlargement for nodes on higher tree levels and least overlap enlargement for nodes directly above the leaves, where

$$\mathit{overlap}(E) = \sum_{E_i \text{ entry of } E\text{'s node}} \mathit{size}\bigl(STO(E) \cap STO(E_i)\bigr).$$

⁴ [BKSS90] and [HNP95a] differ slightly in notation; we use the signature as in [BKSS90].


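Thus we obtain:

$$\mathit{penalty}(E, key) = \begin{cases} \mathit{size}(E \cup key) - \mathit{size}(E), & \text{if } E\text{'s children are interior} \\ \mathit{overlap}(E \cup key) - \mathit{overlap}(E), & \text{if } E\text{'s children are exterior} \end{cases}$$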

In the sequel we will call this penalty size-based since it uses least size enlargement on the higher tree levels. By using these definitions we obtain the canonical extension of the R∗-Tree to three dimensions.
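A sketch of the size-based penalty under stated assumptions: keys are 6-tuples (xll, yll, xur, yur, tstart, tend), size is the three-dimensional product of extents, and the overlap sum runs over the other entries of the same node, excluding E itself as in the R∗-Tree. The parameter names siblings and children_are_leaves are hypothetical.

```python
AXES = ((0, 2), (1, 3), (4, 5))  # (low, high) index pairs for x, y, t


def union(keys):
    return (min(k[0] for k in keys), min(k[1] for k in keys),
            max(k[2] for k in keys), max(k[3] for k in keys),
            min(k[4] for k in keys), max(k[5] for k in keys))


def size(k):
    prod = 1.0
    for lo, hi in AXES:
        prod *= k[hi] - k[lo]
    return prod


def intersection_size(a, b):
    prod = 1.0
    for lo, hi in AXES:
        side = min(a[hi], b[hi]) - max(a[lo], b[lo])
        if side <= 0:
            return 0.0
        prod *= side
    return prod


def penalty_size(entry, key, siblings, children_are_leaves):
    """Size enlargement on higher levels, overlap enlargement directly above the leaves."""
    enlarged = union([entry, key])
    if not children_are_leaves:
        return size(enlarged) - size(entry)
    others = [s for s in siblings if s is not entry]
    before = sum(intersection_size(entry, s) for s in others)
    after = sum(intersection_size(enlarged, s) for s in others)
    return after - before
```

The entry with the minimum penalty among all entries of the current node would then be chosen for descending further.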

The penalty-method based on distance

The penalty metric in [WJ96] used the Euclidean distance between the centroid (the bounding predicates are spheres in the SS-Tree) of an entry and the centroid of the newly inserted key. We can define this metric similarly as:

penalty(E, key) = dist(STO(E), STO(key))

The definition of distance (dist) is given in the next paragraph; thus a new object is inserted into the entry whose midpoint is closest (in the Euclidean distance sense) to the midpoint of the object to be inserted. This penalty is applied on all levels of the tree. The tree family obtained by using this penalty definition will be called RSS-Tree later in this book.

Generalization of Area, Margin, Overlap and Distance for STO

The four different access methods presented depend on the spatial terms area, margin, overlap and distance which need to be appropriately defined in spatio-temporal space. We suggest to consider a three-dimensional and a two-plus-one-dimensional generalization; we denote the former by index 3d and the latter by index 2+1d. We usually speak of size rather than area in the following to indicate the change from spatial to spatio-temporal space.

Three-Dimensional Generalization

For the three-dimensional case we simply extend the common measures from two to three dimensions. We obtain:

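$$\begin{aligned}
\mathit{size}_{3d}(STO) &= (x_{ur} - x_{ll}) \cdot (y_{ur} - y_{ll}) \cdot (t_{end} - t_{start}) \\
\mathit{overlap}_{3d}(STO_1, STO_2) &= \mathit{size}_{3d}(STO_1 \cap STO_2) \\
\mathit{margin}_{3d}(STO) &= 4 \cdot \bigl((x_{ur} - x_{ll}) + (y_{ur} - y_{ll}) + (t_{end} - t_{start})\bigr) \\
\mathit{distance}_{3d}(STO_1, STO_2) &= \sqrt{\sum_{j \in \{x, y, t\}} \mathit{dimDist}(STO_1, STO_2, j)^2}
\end{aligned}$$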


where dimDist(STO1, STO2, j) denotes the distance in dimension j of the midpoints of STO1 and STO2. Using these definitions together with the methods described in the previous section we obtain access methods, either together with the size-based penalty metric (called $STT^{size}_{3d}$) or with the distance-based penalty metric (called $STT^{dist}_{3d}$).
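The three-dimensional measures translate directly into code. The following sketch assumes 6-tuple keys (xll, yll, xur, yur, tstart, tend) whose dimensions have already been scaled to comparable ranges; the function names are chosen for this illustration.

```python
import math

AXES = {"x": (0, 2), "y": (1, 3), "t": (4, 5)}  # (low, high) index pairs


def dim_dist(a, b, axis):
    """Distance of the midpoints of a and b in one dimension."""
    lo, hi = AXES[axis]
    return abs((a[lo] + a[hi]) / 2 - (b[lo] + b[hi]) / 2)


def size_3d(k):
    return (k[2] - k[0]) * (k[3] - k[1]) * (k[5] - k[4])


def overlap_3d(a, b):
    """Size of the intersection box, 0 if the boxes are disjoint."""
    prod = 1.0
    for lo, hi in AXES.values():
        side = min(a[hi], b[hi]) - max(a[lo], b[lo])
        if side <= 0:
            return 0.0
        prod *= side
    return prod


def margin_3d(k):
    return 4 * ((k[2] - k[0]) + (k[3] - k[1]) + (k[5] - k[4]))


def distance_3d(a, b):
    return math.sqrt(sum(dim_dist(a, b, j) ** 2 for j in AXES))
```

With distance_3d as penalty on all levels one obtains the distance-based variant, with size_3d and overlap_3d the size-based variant.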

Two-Plus-One-Dimensional Generalization

In other papers on spatio-temporal data (e. g. [TSN99], [TSPM98]) authors have stated that a representation of time as just another dimension (like a third spatial dimension) is not appropriate. One of the simplest reasons for this is the fact that spatial proximity introduced by using the time as third dimension (as in the previous paragraph) does not reflect the orthogonal nature of this dimension properly. There are more sophisticated reasons (e. g. inefficient use of space for moving points) for treating the time completely separately. For that reason we generalized the measures in a two-plus-one-dimensional way also. For this we propose the following definitions:

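$$\begin{aligned}
\mathit{size}_{2+1d}(STO) &= (x_{ur} - x_{ll}) \cdot (y_{ur} - y_{ll}) + (t_{end} - t_{start})^2 \\
\mathit{overlap}_{2+1d}(STO_1, STO_2) &= \mathit{length}\bigl(\mathit{temporal}(STO_1) \cap \mathit{temporal}(STO_2)\bigr)^2 + \mathit{area}\bigl(\mathit{spatial}(STO_1) \cap \mathit{spatial}(STO_2)\bigr) \\
\mathit{margin}_{2+1d}(STO) &= \mathit{border}(\mathit{spatial}(STO)) + 2 \cdot \mathit{length}(\mathit{temporal}(STO)) \\
\mathit{distance}_{2+1d}(STO_1, STO_2) &= \bigl(\mathit{dimDist}(STO_1, STO_2, x)^2 + \mathit{dimDist}(STO_1, STO_2, y)^2\bigr)^{\frac{1}{2}} + \mathit{dimDist}(STO_1, STO_2, t)
\end{aligned}$$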

In these definitions spatial(STO) denotes the spatial and temporal(STO) the temporal projection of a spatio-temporal object. We assume as in the three-dimensional case that either (from a theoretical point of view) the time dimension has the same range as the two spatial dimensions or (as in the implementation view) the measures in pure-spatial and pure-temporal dimensions have to be appropriately scaled before being combined with one another. One could e. g. scale all values from the domain to a fixed range by a simple multiplication. Whether the extent in every dimension is known in advance has an influence on the performance of the queries, as we will see in section 8.3.3.
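The two-plus-one-dimensional measures can be sketched analogously. The sketch assumes 6-tuple keys (xll, yll, xur, yur, tstart, tend) that have already been scaled to comparable ranges, and it reads border(spatial(STO)) as the perimeter of the spatial bounding rectangle; both points are assumptions of this illustration.

```python
import math


def _side(a_lo, a_hi, b_lo, b_hi):
    """Length of the intersection of two 1-d intervals, 0 if they are disjoint."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def _mid_dist(a, b, lo, hi):
    return abs((a[lo] + a[hi]) / 2 - (b[lo] + b[hi]) / 2)


def size_2p1d(k):
    return (k[2] - k[0]) * (k[3] - k[1]) + (k[5] - k[4]) ** 2


def overlap_2p1d(a, b):
    temporal = _side(a[4], a[5], b[4], b[5])
    spatial = _side(a[0], a[2], b[0], b[2]) * _side(a[1], a[3], b[1], b[3])
    return temporal ** 2 + spatial


def margin_2p1d(k):
    border = 2 * ((k[2] - k[0]) + (k[3] - k[1]))  # perimeter of the spatial rectangle
    return border + 2 * (k[5] - k[4])


def distance_2p1d(a, b):
    spatial = math.sqrt(_mid_dist(a, b, 0, 2) ** 2 + _mid_dist(a, b, 1, 3) ** 2)
    return spatial + _mid_dist(a, b, 4, 5)
```

In contrast to the three-dimensional overlap, this overlap measure can be positive for objects that intersect only spatially or only temporally, since the two components are added rather than multiplied.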

Similar to the previous section we obtain the $STT^{size}_{2+1d}$-Tree by using these generalizations together with the size penalty metric and finally the $STT^{dist}_{2+1d}$-Tree using them together with the distance penalty metric.

8.3 Physical Model and Index Structures for Selections

As described in chapter 2 for general applications two basic database operations may be subject to optimization by indexes: selection and join operations. This is also true for the case of STOSTA. In particular, a selection operation retrieves all rows of a table for which a given selection formula ϕ holds. The classical way of evaluating selections is to retrieve all rows of the table, one page at a time, and then check if ϕ is true for the current rows. This way of processing may be suboptimal, since all rows of a table have to be fetched, which takes a long time if rows are large. Moreover the formula has to be evaluated on each row, which may take a long time since this computation may be complex. This is especially true for the spatial domain or selections on other multidimensional domains. Thus in the next sections efficient index structures for domain-specific selection queries in ORDBS will be developed. Special focus is put on the extensible indexing capabilities. Efficient multidimensional indexing for selection queries in traditional relational databases is described e. g. in [BBKM00]. The support of join queries in multidimensional domains is briefly described in section 8.4 but is not the focus of this work. Joins are a different topic, since such queries retrieve all pairs of rows whose attributes satisfy a certain domain-specific comparison operator θ; with the advent of complex data types in ORDBS these operators have changed and thus join execution has to be revisited in more detail in future work.

8.3.1 Temporal Data

For pure temporal data types in STOSTA the logical model developed in section 7.4.2 needs to be mapped to an implementable physical model at first; parts of the described mapping can also be found in [KL02b]. After a physical model has been developed, the design of that model can be optimized by adding appropriate indexes and, in the object-relational case, index structures to optimally support queries on these data types.

Mapping logical model to implementable physical model

To implement the logical model from section 7.4.2 in a presently available commercial DBS adjustments have to be made, since not all object-relational features used in the logical model are already available (and will probably not be in the near future). In Oracle 8i and 9i, which we have used as an example, it is not allowed to arbitrarily nest collections for instance; also set-valued attributes are not directly supported. Since the developed logical model of vt integer nests a list- and a set-valued attribute, an alternative physical implementation has to be chosen. Figure 8.3 shows an ideal physical implementation of the logical model.

Oracle currently⁵ supports two types of collections: collections of a fixed maximum length are supported by array-like VARRAYs, which in addition allow direct access to their elements, which are stored in-line. For collections whose length is unknown in advance, nested table types are provided. They are not stored within the original table but rather as one separate table for the whole base table. Access to elements is only allowed via the base table. Since for vt integer neither the number of values of a particular temporal column nor the number of intervals in a temporal element can be bounded in advance, both would have to be realized by nested table types. As mentioned above the required nesting of nested tables is not allowed.

⁵ As of version 9.0.1.


emp id | name                      | salary
1      | Miller {[15,20), [50,70)} | 3000 {[15,30)}
       | Scott {[20,50)}           | 4000 {[30,45), [50,75)}
       |                           | 5000 {[45,50)}
2      | Allen {[10,70)}           | 6000 {[10,40), [50,70)}

Figure 8.3: Ideal physical model for table using vt integer and vt string

The problem of nested collections can be solved by replacing one of the collections with storing redundant information by multiplying tuples. The first variant maps the outer collection, which is a list of numerical values with their temporal elements attached, to storing one tuple per list element. In formal notation⁶ a tuple attribute value of the form (〈a, {[cvt1, cvt2]}〉) would be represented by multiple tuples with attribute values of the form (a, {[cvt1, cvt2]}). All other attributes of the base tuple have to be replicated into each of the different simpler tuples (which would be of type vt integer base in that case). By this method a lot of redundancy is introduced which grows proportionally to $n^k$ if k temporal attributes are present in the table. Moreover the distribution of information over the tuples in the case of multiple temporal attributes is not clear. Queries will not be very efficient since for every query all tuples corresponding to one real world object have to be retrieved in order to answer the query.

The second alternative, which is much more promising and will therefore be further evaluated, is to replace the inner collection (i. e. temporal elements) by single attribute values of an interval type. Each temporal attribute is consequently modeled as a nested table within the original table containing the tuple attribute value and valid time interval. Since only intervals can be associated with attribute values, the latter have to be stored redundantly: once for every interval. This leads to growth of redundancy which is additive in the number of temporal attributes as opposed to multiplicative growth in the previous model. Formally one tuple attribute value of the form (〈a, {[cvt1, cvt2]}〉) is represented by multiple tuples with attribute values of the form (〈a, [cvt1, cvt2]〉). By using this physical model temporal joins over the same attribute are efficiently supported, since all entries of the nested table for that attribute are stored in a single physical table. In addition, and as shown in the next paragraph, user-defined index structures can be easily added on these columns providing for efficient query answering, especially when the nested tables are stored inline physically. Along with the timestamped attribute values to be found in the nested table as search information, the ROWID of the outer tuple in the emp relation is stored as location information in the index. This storage model is

⁶ We use 〈a〉 for lists and {a} for sets.


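[Figure 8.4: table emp with columns emp id, name and salary; the name column refers to the nested table e_name and the salary column to the nested table e_sal, each with the columns value and valid. e_name holds the values Miller, Scott and Allen, e_sal the values 3000, 4000, 5000 and 6000, each row with a single valid time interval.]

Figure 8.4: Physical model for nested collections in Oracle 9i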

further illustrated in figure 8.4. Finally it is easily extendible to bitemporal information: the only change is that the nested table contains an additional interval attribute for the transaction time information. This will obviously further increase redundancy.
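The difference between the two mappings can be made concrete with a small sketch. The sample values are illustrative only, the function names are hypothetical, and modeling the first variant as a cross product over the list elements of all temporal attributes is just one possible reading of the multiplicative growth described above.

```python
from itertools import product

# One employee with two temporal attributes; each attribute is a list of
# (value, temporal element), a temporal element being a set of [start, end) intervals.
name = [("Miller", [(15, 20), (50, 70)]), ("Scott", [(20, 50)])]
salary = [(3000, [(15, 30)]), (4000, [(30, 45), (50, 75)]), (5000, [(45, 50)])]


def to_nested_rows(vt_value):
    """Second variant: one nested-table row (value, interval) per interval."""
    return [(value, interval) for value, element in vt_value for interval in element]


def variant1_tuples(*attributes):
    """First variant (one reading): one base tuple per combination of list elements."""
    return list(product(*attributes))


nested_name = to_nested_rows(name)      # rows of the nested table for the name column
nested_salary = to_nested_rows(salary)  # rows of the nested table for the salary column
```

For this employee the second variant stores 3 + 4 nested rows, whereas the first variant already needs 2 · 3 base tuples and grows multiplicatively with every further temporal attribute.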

In [Bei01] this and two more physical storage models have been evaluated. The usage of a global timestamp table which is referenced from the nested tables (by the object-relational feature of references) proved to be very inefficient for several query types. Even though this storage model seems to support temporal joins very well at first glance, the additional overhead incurred for managing the global timestamp table and the currently very inefficient realization of references in the DBMS exclude this model from further consideration. Also querying becomes very complex syntactically with references such that additional user support would be required.

The last physical model evaluated was to use tuple timestamping by introducing one separate table for each temporal attribute. This is the model which most previous publications that favored tuple timestamping suggested if temporal attributes were required. Temporal elements can be stored with each table in a nested table since this is the only nesting required. In [Bei01] it was shown that this model is also not competitive in terms of query performance. Even though it is very well suited for clustered storage, which was in fact used, the evaluation of queries required too many large joins over all the tables corresponding to a logical table or real world object. This could be an alternative for tables with at most two temporal attributes, but an extension to bitemporal data requires additional considerations, and user-defined index structures and query optimization are also difficult to apply. Since these are all reasonable storage models in my opinion, the optimal model illustrated in figure 8.4 and described above will be used in the sequel with a minor implementation modification: for an easier integration into the OraGiST environment the nested tables are stored inline with the emp table in the implementation. This does not significantly affect the performance results, though.


Optimizing the physical design

For temporal data types several different important query types have been identified in the literature (cf. section 4.2). Queries can be differentiated by which dimensions the desired results are restricted to, and whether they query for a range in a dimension or a point. For these query types we can use the notational conventions of [TJS98], e. g. 'range//point/range' means querying for a range in the standard attribute domain, a certain point in valid time and a range in transaction time. In addition, our experiments showed that the size of the range in a range query is an important measure. Therefore experiments on range queries were performed with several different selectivities. In the sequel the most important results for different temporal queries will be presented with the objective of finding the optimal index structure for all possible query types. This is important since in most cases the DBS will only allow to create a single index for one column regardless of the possible query types. Parts of this performance evaluation have been published in [KL02c].

By using the extensibility features of ORDBS like Oracle 9i as described in section 2.3 together with the generalized search tree approach described in section 8.2.2, one can optimize the logical implementation of temporal data types by adding appropriate index structures. Since in temporal data types values from different domains or dimensions are combined into a single column value, the use of index structures from spatial databases seems straightforward. One can view the standard value as one dimension and the valid (as well as the transaction) time information as another dimension⁷. Thus in the case where the standard datum is from a one-dimensional domain we obtain two- or three-dimensional domains for temporal data types which may be indexed by spatial indexes. Since it cannot be guaranteed that the standard datum remains one-dimensional (think of spatio-temporal data and see section 8.3.3 for instance), it is recommended not to use the two-dimensional (or three at best) spatial indexes shipped with current object-relational databases, but rather to use the extensible indexes from section 8.2.2 which may be adapted to as many dimensions as required.

Novel in our approach is that we use attribute timestamping here and implement the index in an extensible framework (GiST). A similar approach for tuple timestamping without using the GiST framework has been described for Informix in [BSSJ99] and [YYW00]. Similarly no extensible framework was used for other temporal index structures such as RI-Tree ([KPS00]), GR-Tree ([BJSS98]) and 4R-Tree ([KTF98]). As for any index structure used for query optimization, the optimal index depends on the query types to be supported by the index. The previously published indexes only use the temporal information but not the thematic information. Thus only temporal queries will be efficiently supported. Thematic and combined queries will perform poorly.

⁷ Special properties of the transaction time dimension may lead to a special treatment in indexing this dimension. This topic is researched in several temporal database publications. Since no consolidated status has been reached, these special properties are neglected here. They should nevertheless be simple to add due to the extensibility features of the index structures used.


Physical Implementation of VT INTEGER

For the temporal data type VT INTEGER as introduced in section 7.4.2 one obtains the dimensions of integer values as well as the valid time dimension. After scaling one of the dimensions appropriately we can interpret these values as two-dimensional intervals in the integer-time-plane. E. g. the temporal integer value (3000,[10,30)) (of type vt integer base⁸) is treated like the 2-dimensional interval [(3000,10),(3000,30)]. A combined Oracle standard index on different attributes is not suitable here, since it is not immediately obvious how to combine attributes of different dimensionality and also performance will degrade significantly with increasing complexity of data to be indexed. Moreover for user-defined indexes there is currently the technical restriction that they can only be built on a single attribute. Thus the following physical implementations for VT INTEGER in STOSTA are suitable:

1. No index at all

2. Built-in B∗-Tree index on integer values

3. User-defined B∗-Tree index on integer values

4. Built-in B∗-Tree index on beginning of valid time interval

5. User-defined 2D R∗-Tree index on temporal interval

6. User-defined 2D R∗-Tree index on two-dimensional intervals

7. User-defined 2D RSS-Tree index on two-dimensional intervals

The tests were conducted with data of three different sizes (between 84735 and 400000 objects) and different structure of contents of the data: two datasets were generated synthetically by methods adapted from [Bei01], the other one was generated using the SPY-TIME benchmark methods ([SZ01]). Only some representative results of all experiments are reported in this work. We focused on the operator vtBetweenOverlap as defined in section 7.4.2; i. e. queries retrieve all rows of the base table where the integer value is in a particular range and this value's validity overlaps a given time interval. More formally it is a selection $\sigma_\phi(R)$ where $\phi$ = ((value between v1 and v2) and (time overlaps int)) for given integer query parameters v1, v2 and an interval int. These queries may for instance be used to answer questions like "Find all employees and their boss who earned between 5000 and 6000 in the 1970s." Note that by using v1 = v2 or a degenerated interval, points in each dimension may also be queried for; thus all query types from [TJS98] may be expressed by using this operator.
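The selection condition can be written down as a small sketch. It is an illustration of the semantics only, not the Oracle operator: rows are assumed to be pairs (value, (start, end)) with half-open valid time intervals [start, end), the function names and sample rows are hypothetical, and a point in time t is expressed as the interval [t, t+1) on a discrete time axis.

```python
def to_2d_interval(value, valid):
    """Treat a timestamped integer as a degenerate rectangle in the integer-time plane."""
    start, end = valid
    return ((value, start), (value, end))


def vt_between_overlap(row, v1, v2, interval):
    """(value between v1 and v2) and (valid time overlaps interval), half-open intervals."""
    value, (start, end) = row
    q_start, q_end = interval
    return v1 <= value <= v2 and start < q_end and q_start < end


rows = [(3000, (10, 30)), (4000, (30, 50)), (5000, (50, 75)), (6000, (10, 70))]

# range//range query: salaries between 5000 and 6000 valid somewhere in [60, 70)
range_range = [r for r in rows if vt_between_overlap(r, 5000, 6000, (60, 70))]

# point//point query: salary exactly 3000 at time 10
point_point = [r for r in rows if vt_between_overlap(r, 3000, 3000, (10, 11))]
```

Evaluated this way the condition is a window query on the two-dimensional intervals, which is why spatial index structures are applicable.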

⁸ The base version of the type was used in the tests for technical reasons; in the same manner the base components could be inserted into the index one at a time, if data of type vt integer was used. Using the MBR of a vt integer value would introduce too much dead space and definitely be inefficient. On the other hand this sacrifices the implementation independence of vt integer as described for the conceptual level in section 7.4.2.


[Figure 8.5: plot of response time (0 to 14) over selectivity (0 to 20) for: Without Index, Oracle B-Tree (salary), B-Tree-GiST (salary), Oracle B-Tree (start validtime), 2D-R*-GiST (validtime), 2D-R*-GiST, 2D-RSS-GiST.]

Figure 8.5: Preselection of suitable of all possible indexes on VT INTEGER

The experiments showed that methods 1 and 2 were unacceptable for almost all query types (cf. figure 8.5). Therefore their results are omitted in the sequel. Also method 4 requires a complex recoding of queries to be able to use the index. A use of the user-defined operator is not possible in this case. Since a complex recoding of queries should not be necessary and moreover since this method does not work for more complex user-defined types, this method is also not considered any further. Its results are average and vary, and they do not justify the overhead of writing different queries.

For the data type VT_INTEGER and spatial indexes the refinement step is not required for query processing, since the objects stored in the index are rectangles and thus not approximated. Consequently all results returned by the filter step from the index are already query results. For the spatial domain or other more complex shapes a refinement step is required as explained in section 3.3. For non-approximated shapes such as VT_INTEGER user-defined selectivity estimation is not required, since index-based evaluation is faster than using no index in any case.

Figures 8.6 and 8.7 show the results for range//range queries. In figure 8.6 the query intervals have the same size in both the value and the time dimension, whereas in figure 8.7 the time range is about three times as big as the value range. In both cases the 2D RSS-Tree index outperforms all other indexes for all selectivities. As expected the R∗-Tree on the valid time dimension performs worse when querying larger time ranges, since its evaluation does not significantly reduce the number of rows that have to be investigated, whereas the B-Tree index on salary benefits from smaller salary selectivity. All response times grow logarithmically with query selectivity, as could be expected from an estimation of the theoretical performance of tree indexes (details will be explained for spatial indexes in section 8.3.2). The idea of using two-dimensional indexes significantly improves query performance, especially for the 2D RSS-Tree, by factors of between 2 and 6. Therefore these indexes are very well suited for range//range queries.

[Figure 8.6: Response Time for R//R query (equal ranges, small dataset). Plot of response time over selectivity; curves: 2D-R*-GiST, 2D-RSS-GiST, B-Tree-GiST (salary), 2D-R*-GiST (valid time)]

[Figure 8.7: Response Time for R//R query (large time range, medium dataset). Plot of response time over selectivity]

Figure 8.8 shows the results for range//point queries; these queries select rows which are valid at a certain point in time only. Consequently the index on valid time performs best in this case. The 2D RSS-Tree performs only slightly worse than the R∗-Tree on valid time, even though it contains the unhelpful salary information together with the time information. The B-Tree index on salary does not help much for this query type (only the range restriction can be used) and thus its response time grows almost linearly with selectivity. If on the other hand point//range queries are considered, where the salary value is one fixed value and the valid time remains a range, results are completely reversed (cf. figure 8.9). The R∗-Tree on valid time does not help much and its performance is so bad that it is omitted from the figure to be able to distinguish between the results of the other indexes. The B-Tree index on salary becomes the best index with increasing selectivity and its response time remains almost constant (retrieving all rows with one particular salary value from the index takes the same time independent of the time range or selectivity). The two-dimensional indexes perform comparably, with the 2D-RSS-Tree again being the more efficient index. Finally, in figure 8.10 the restriction on the salary value was omitted and pure temporal queries were posed (i. e. the selection formula becomes φ = time overlaps int). Again the B-Tree index on salary naturally performs worst. Somewhat surprising is the good performance of the 2D-RSS-Tree, which performs almost exactly as well as the specialized R∗-Tree on valid time. This underlines the overall very good performance of the 2D-RSS-Tree. Even though for certain specialized query types other indexes are slightly better, the 2D-RSS-Tree was the best index overall. Usually users will expect to create one index for one attribute⁹ and get a robust index that helps with most queries that can be expected. Thus the 2D-RSS-Tree should be used, since it provides good performance over all possible query types for the data type VT_INTEGER.

[Figure 8.8: Response Time for R//P query (medium dataset). Plot of response time over selectivity]

[Figure 8.9: Response Time for P//R query (medium dataset). Plot of response time over selectivity; curves include B-Tree-GiST (salary)]

[Figure 8.10: Response Time for *//PR query (large dataset). Plot of response time over selectivity]

The two-dimensional spatial indexes that are included in the ORDBS Oracle 9i can also be used, at the expense of a more complicated translation of the logical into the physical model. This is due to the fact that Oracle geometry types are usable for many different and possibly very complex shapes; since only intervals are required for VT_INTEGER a significant overhead is incurred. The extensibility features are also not available in those indexes; an extension to bitemporal data requires a completely new development, whereas this is easily achieved with the GiST approach as described in the following paragraph. Oracle indexes are an option that may be used for spatial data and are evaluated in section 8.3.2.

In summary, data of type VT_INTEGER that is used in conjunction with the different query types expressible by the operator vtBetweenOverlap¹⁰ as discussed above should be indexed by a GiST-based 2D-RSS-Tree on the physical level. This index structure provides very efficient support for the different query types and is also easily extensible to more complex data types.

⁹ In several commercial DBS only one index is allowed on a single column.

¹⁰ Temporal, thematic and combined queries are possible.


[Figure 8.11: Preselection of suitable indexes among all possible indexes on BT_INTEGER. Plot of response time over selectivity; curves: Without Index, B-Tree-GiST (salary), 2D-R*-GiST (bitemporal), 2D-R*-GiST (validtime)]

Physical Implementation of BT_INTEGER

Similar to VT_INTEGER the bitemporal data type BT_INTEGER in STOSTA as introduced in section 7.4.2 can be interpreted as multidimensional data. In contrast to the preceding paragraph one obtains rectangles in three-dimensional space this time. Therefore different indexing options are available; the following options are considered here:

1. No index at all

2. User-defined B-Tree index on integer values

3. User-defined 2D R∗-Tree index on pure temporal rectangles

4. User-defined 2D R∗-Tree index on valid time intervals

5. User-defined 3D R∗-Tree index on three-dimensional rectangles

6. User-defined 3D RSS-Tree index on three-dimensional rectangles

Other options that already performed poorly on VT_INTEGER are not considered any further in this paragraph. Note also that other Oracle indexes could not be used for this data type, since no indexes for user-defined types are provided in an ORDBS. It would be possible to use two-dimensional spatial indexes for options 3 and 4, but this would require huge implementation overhead, since the internal structure of the user-defined type would have to be changed to use SDO_GEOMETRY internally. Moreover the first option (cf. figure 8.11) is omitted from all diagrams since it is clearly the slowest option except for queries with selectivities close to 100%, where, by using proper cost and selectivity estimation, the indexes will not be used anyway in a production implementation (see also section 8.5).

[Figure 8.12: Response Time for R//R/R query (equal ranges, medium dataset). Plot of response time over selectivity; curves include 3D-R*-GiST, 2D-R*-GiST (bitemporal)]

[Figure 8.13: Response Time for R//PR/PR query (large dataset). Plot of response time over selectivity; curves include 2D-R*-GiST (bitemporal), B-Tree-GiST (salary)]

The tests were again conducted with datasets of three different sizes (between 84157 and 400000 objects) and different content structures. The datasets were generated in the same way as the datasets used for VT_INTEGER. We focused on the btBetweenOverlap operator, which may be used for several different query types depending on the arguments given. The queries may restrict to a point or range in each dimension or may not restrict a dimension at all (denoted by *). All different combinations were evaluated but only some representative results are discussed below.
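How one operator can cover points, ranges and unrestricted dimensions may be sketched as follows. The encoding used here (None for *, a pair (p, p) for a point) and the function names are assumptions for illustration, not the actual signature of btBetweenOverlap.

```python
def overlaps(lo, hi, restriction):
    """Check one dimension; restriction is None ('*') or a (q_lo, q_hi) pair."""
    if restriction is None:
        return True  # dimension is not restricted at all
    q_lo, q_hi = restriction
    return lo <= q_hi and q_lo <= hi


def bt_between_overlap(box, value_q, valid_q, trans_q):
    """box = (value, (valid_lo, valid_hi), (trans_lo, trans_hi))."""
    value, (valid_lo, valid_hi), (trans_lo, trans_hi) = box
    return (overlaps(value, value, value_q)
            and overlaps(valid_lo, valid_hi, valid_q)
            and overlaps(trans_lo, trans_hi, trans_q))


# Salary 5500, valid 1972-1975, recorded in the database 1983-1986.
box = (5500, (1972, 1975), (1983, 1986))

# range//range/range: 5000-6000, valid in the 1970s, stored in the 1980s
rrr = bt_between_overlap(box, (5000, 6000), (1970, 1979), (1980, 1989))

# *//point/range: no salary restriction, valid in 1974, stored 1980-1984
star_pr = bt_between_overlap(box, None, (1974, 1974), (1980, 1984))

# point//*/*: pure value query for salary 7000
p_star_star = bt_between_overlap(box, (7000, 7000), None, None)
```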


[Figure 8.14: Response Time for R//P/P query (medium dataset). Plot of response time over selectivity]

[Figure 8.15: Response Time for P//R/R query (small dataset). Plot of response time over selectivity]

Figure 8.12 shows results for range//range/range queries with similar range size in all dimensions. Similar to the preceding section, queries of this type are simple selections, where the selection formula φ is extended by an additional conjunctive clause restricting the possible transaction time values. Such a query could be: Find all employees for whom the database stored in the 1980s that they earned between 5000 and 6000 in the 1970s. Since results for the 3D R∗-Tree and the 3D RSS-Tree were almost the same, sometimes only one of those indexes is shown. The three-dimensional indexes outperform all other indexes significantly, by factors between 2 and 6. Among the other indexes the B-Tree on salary is better than the temporal indexes, of which the bitemporal index outperforms the valid time index. All response times grow logarithmically with query selectivity, as could be expected from an estimation of the theoretical performance of tree indexes (details are explained for spatial indexes in section 8.3.2).

[Figure 8.16: Response Time for *//PR/PR query (large dataset). Plot of response time over selectivity; curves include 2D-R*-GiST (bitemporal)]

If the temporal query ranges are varied from points to growing ranges, the 3D R∗-Tree is the best index structure again (cf. figure 8.13). The 3D RSS-Tree becomes the second best index with increasing selectivity, but the bitemporal R∗-Tree also performs well. Naturally, since salary selectivity is large, the salary index does not help much in query processing and thus shows the worst performance. Similar results are obtained for range//point/point queries (cf. figure 8.14); the 3D R∗-Tree and the bitemporal 2D R∗-Tree are by far the most efficient index structures for this query type. The difference is even larger for these queries, since the salary range is growing faster than in the previous figure, leading to even more inefficient salary-based indexes.

If point//point/* as well as range//range/* queries (depending on selectivity) are evaluated by dropping the restriction on transaction time, the three-dimensional indexes clearly outperform all other indexes, with a slight edge towards the 3D RSS-Tree this time. If on the other hand queries ask for only a single integer value and ranges in the two temporal dimensions (point//range/range query), the pure temporal indexes perform extremely badly since they do not support the strongest selectivity restriction, the one on integer numbers. In this case the B-Tree index on integers is the most efficient index as expected, but the three-dimensional indexes also perform well (cf. figure 8.15). They are only slightly worse than the B-Tree, but have the advantage of better performance on other query types. The huge gap between these two kinds of indexes and the temporal indexes basically prohibits using the latter if queries of this type may ever occur.

For pure bitemporal queries, as depicted in figure 8.16, the 2D R∗-Tree is the best index with the 3D R∗-Tree close behind. Since the former indexes exactly the two dimensions that are restricted in the query, this result is no surprise. The 3D RSS-Tree is worse but still acceptable, whereas the index on salary is so slow that it is omitted from the figure for clarity. All structures show a logarithmic increase in response time with increasing selectivity (except for the B-Tree, whose response time is actually almost constant since all tuples are always returned by the index). The small difference between the two- and three-dimensional index illustrates the very good overall performance of the 3D R∗-Tree once again.

[Figure 8.17: Overhead for refine step on R//R/R query (large dataset). Plot of response time over selectivity in %; curves: 3D R*-Tree (without Overhead), 3D R*-Tree (with Overhead), No index]

This time the use of spatial index structures provided by the DBMS is only possible for the temporal component and not for the data type as a whole, since currently¹¹ only two-dimensional indexing is available. Those implementations would incur a lot of overhead in the physical implementation, since all user-defined types and operations must be cast to the Oracle-provided spatial type. Also all operators must be translated, which is not feasible for practical applications.

Consequently, since usually only one index will be created for a single database column, in summary one of the three-dimensional indexes should be used. These indexes were among the best or at least close to the best in all different query categories, making them an efficient multi-purpose index structure; this even holds if only some of the three dimensions are restricted in a query. In most comparisons discussed so far the 3D R∗-Tree outperformed the 3D RSS-Tree; this may lead to the conclusion that it is superior. But this depends on the query type again: for pure value queries (point//*/*) and for non-transaction time queries (range//range/*) the 3D RSS-Tree is better. In general the choice of the index structure thus depends on the expected query types that will be used in the system: if more queries without restriction on one or both of the temporal dimensions are posed, the 3D RSS-Tree would be better. The 3D R∗-Tree would be better if other query types are used more often.

Similarly to VT_INTEGER the index entries in the multidimensional index structures are not approximations of the real objects but rather the objects themselves (boxes in the salary-valid time-transaction time space). Thus query results can be taken directly from the index-based filter step and no refinement is necessary. This greatly improves index performance since the call overhead associated with the calls for exact operators can be omitted (cf. figure 8.17). This will change in the spatial domain in the next section, where the refinement step is required.

¹¹ as of Oracle 9.0.1

For data type BT_INTEGER no user-defined selectivity estimation is required, since an index-based query execution is always faster than a full table scan. If a different implementation where only approximated objects are inserted into the index is used for this or other types¹², figure 8.17 shows that a user-defined selectivity estimation should lead to a use of the user-defined index only for queries of selectivities up to around 50%. For higher selectivities a full table scan should be preferred. These results are similar to the results for VT_INTEGER. User-defined selectivity estimation will be further investigated in section 8.5.1.
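The resulting decision rule can be written down as a small sketch. The function name and the default break-even value of 0.5 are assumptions taken from the approximate figure quoted above; a real optimizer would derive the threshold from cost estimation rather than hard-code it.

```python
def choose_access_path(estimated_selectivity, needs_refinement, break_even=0.5):
    """Pick an access path from an estimated selectivity in [0, 1].

    needs_refinement is False when exact objects are stored in the index
    (as for VT_INTEGER and BT_INTEGER), True when only approximations are.
    """
    if not needs_refinement:
        # Filter results are final results; the index scan always wins.
        return "index scan"
    if estimated_selectivity <= break_even:
        return "index scan"
    return "full table scan"


exact_type_plan = choose_access_path(0.9, needs_refinement=False)
approx_low_plan = choose_access_path(0.2, needs_refinement=True)
approx_high_plan = choose_access_path(0.8, needs_refinement=True)
```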

8.3.2 Spatial Data

In contrast to temporal data, the management of spatial data is supported quite well by current versions of commercial ORDBS. This is due to the many GIS applications in need of commercial-strength database support as well as many other spatial applications in use on top of database systems. Thus the physical implementation of spatial data types can make use of DBMS-provided index structures as well as user-defined ones. Up to the current version of Oracle only two-dimensional spatial data is fully supported, though. User-defined index structures therefore have an obvious advantage in terms of flexibility. Nevertheless this section describes extensive tests of the performance of user-defined as well as system-provided index structures for two-dimensional data.

Cost models for basic selections (section 2.2) have to be extended in the presence of user-defined types and operations. For spatial selections the following cost formulas are obtained. A sequential scan of a spatial relation for all objects fulfilling a certain spatial predicate spat_pred (all operations from section 3.1.2 may be used) costs:

$$\mathit{cost}_{\mathit{seq}} = \mathit{time}_{\mathit{spat\_pred}}(n) + \mathit{read\_factor} \cdot N$$

This formula assumes that the $n$ rows of the relation are stored on $N$ pages. Note that the processing of rows takes a certain time which, in contrast to section 2.2, is not negligible, since it involves potentially costly spatial operators.

In the presence of an R-Tree like structure the cost is:

$$\mathit{cost}_{\mathit{R\text{-}Tree\ scan}} = k_R \cdot \log n + \mathit{read\_factor} \cdot K_R + \mathit{time}_{\mathit{spat\_pred}}(k_R)$$

The first term corresponds to retrieving the $k_R$ result candidates from the R-Tree in the filter step, whereas the second term is for fetching the corresponding tuples into main memory before processing them in the refine step, which is described by the third term. Note that the number of disk pages to be fetched, $K_R$, may lie anywhere between $k_R / (\text{records per page})$ and $k_R$, depending on the clustering capabilities of the R-Tree, the query posed and the particular dataset.

¹² If e. g. temporal elements are stored instead of intervals this would be the case.

If on the other hand a Z-Code based index is used, which is assumed to be stored in a main-memory resident table ordered by Z-Codes, the cost is:

$$\mathit{cost}_{\mathit{Z\text{-}Code\ scan}} = z + k_Z \cdot \log k_Z + \mathit{read\_factor} \cdot K_Z + \mathit{time}_{\mathit{spat\_pred}}(k_Z)$$

In this formula $z$ denotes the number of Z-Codes for the particular window queried; the second term sorts the potential candidates for duplicate elimination in order to minimize the result of the filter step. The last two terms are similar to those of the cost function for R-Trees. Note that the number of candidates $k_Z$ to be processed in the retrieval and refinement step may be different from the number of candidates $k_R$ for R-Trees. This again depends on the particular query, the dataset used and the parameters used for index creation. Since an analytical comparison of these cost functions is impossible due to the many different parameters involved on the one hand and data dependencies on the other hand, the only way to obtain comparable results for the different execution plans is by experiments on different datasets. Since data dependencies are present, the datasets should, in contrast to many other publications, be real-world datasets and not synthetic ones. Moreover they should be structurally as close as possible to the datasets that will be used in applications later, which is reflected in the following experiments by using real-world datasets from different sources and regions of the earth.
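To make the three cost formulas easier to compare, they can be transcribed into a small sketch. Two simplifications are assumptions of this sketch and not part of the cost model: the predicate time is taken to be linear in the number of objects tested (pred_time_per_row), and the natural logarithm is used because the base is not stated.

```python
import math


def cost_seq(n, N, read_factor, pred_time_per_row):
    """Sequential scan: test all n rows, read all N pages."""
    return pred_time_per_row * n + read_factor * N


def cost_rtree(n, k_r, K_r, read_factor, pred_time_per_row):
    """R-Tree scan: filter step, page fetches, refine step on k_r candidates."""
    return k_r * math.log(n) + read_factor * K_r + pred_time_per_row * k_r


def cost_zcode(z, k_z, K_z, read_factor, pred_time_per_row):
    """Z-Code scan: z codes for the window, sort for duplicate elimination,
    page fetches, refine step on k_z candidates."""
    sort_cost = k_z * math.log(k_z) if k_z > 1 else 0.0
    return z + sort_cost + read_factor * K_z + pred_time_per_row * k_z


def page_bounds(k, records_per_page):
    """Best and worst case for the number of pages holding k candidates."""
    return math.ceil(k / records_per_page), k


best_pages, worst_pages = page_bounds(1000, 50)
seq = cost_seq(n=100, N=10, read_factor=2.0, pred_time_per_row=0.5)
rtree = cost_rtree(n=100, k_r=10, K_r=5, read_factor=2.0, pred_time_per_row=0.5)
zcode = cost_zcode(z=4, k_z=1, K_z=1, read_factor=2.0, pred_time_per_row=0.5)
```

The helper page_bounds makes the range of the page count explicit: perfectly clustered candidates fill as few pages as possible, while in the worst case every candidate lies on its own page.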

In particular the experiments were conducted for spatial selections with the selection predicate anyInteract, which is very important in STOSTA, using the following spatial index structures:

1. Oracle 9i two-dimensional quadtree indexing

2. Oracle 9i two-dimensional R-Tree indexing

3. User-defined 2D-RSS-Tree based on the GiST approach (cf. section 8.2.3)¹³

4. User-defined Z-Code indexing using MBR approximations and merged cell Z-Codes (cf. rightmost option in figure 3.5)
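For readers unfamiliar with Z-Codes, the following sketch shows the generic bit interleaving that underlies such indexes and how the number of codes for a query window arises. It is a textbook Z-order illustration on a regular grid and an assumption of this sketch; it does not reproduce the merged-cell variant of figure 3.5 or Oracle's quadtree implementation.

```python
def z_code(x, y, bits):
    """Interleave the bits of grid cell (x, y), most significant bit first,
    taking the x bit before the y bit at each level."""
    code = 0
    for i in range(bits - 1, -1, -1):
        code = (code << 1) | ((x >> i) & 1)
        code = (code << 1) | ((y >> i) & 1)
    return code


def codes_for_window(x_lo, y_lo, x_hi, y_hi, bits):
    """Z-Codes of all grid cells covered by a rectangular query window."""
    return sorted(z_code(x, y, bits)
                  for x in range(x_lo, x_hi + 1)
                  for y in range(y_lo, y_hi + 1))


single_cell = z_code(2, 1, bits=2)                   # x = 10b, y = 01b -> 1001b
window_codes = codes_for_window(0, 0, 1, 1, bits=2)  # lower-left 2x2 block
```

The length of window_codes corresponds to the quantity called z in the Z-Code cost formula: the number of codes that have to be looked up for one query window.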

The different datasets used in the experiments were ATKIS® cartographic-topographic data from different regions of Lower Saxony (two-dimensional point, line and polygon data of low to high complexity), elevation information from the German digital landscape model from different regions of Lower Saxony (three-dimensional point data), an extract from the biotope dataset available over the web, as well as certain extracts from the TIGER census datasets of Indiana (2D polygonal census block data and 2D line strings for roads and rivers). Details about the sizes of the datasets and the types of objects can be found in table 8.1. Selections were carried out with rectangular selection windows of different sizes to obtain different selectivities.

¹³ The experiments also included the classical R∗-Tree implemented by the GiST approach. It performed slightly worse than the RSS-Tree in all tests and has a much longer index creation time due to the much more sophisticated split algorithm. Thus those results are omitted from all graphs for readability.

Name          Size     Types        Characteristics
North Sea     15293    All Types    Mixed geometries
Hannover      11184    All Types    Mixed geometries
Biotope       20000    Rectangles   Small rectangles
Elevation     120159   3D Points    Points
Blockdata     5451     Polygons     Medium size polygons
Major Roads   10443    Lines        Line strings of all sizes
Roads         698075   Lines        Line strings of all sizes
Rivers        7942     Lines        Larger line strings

Table 8.1: Spatial datasets used for the experiments

If only approximated objects are inserted into an index and thus a refine step is required in query execution, this step incurs a large call overhead (cf. figure 8.17 for temporal data types, where the additional computation is trivial and thus only the call overhead leads to the worse performance). These calls are executed for each result tuple of the index scan, whereas internal operators can be closely integrated with the system (e. g. use the same memory location for the object to be processed). Thus the Oracle indexes clearly outperform the user-defined indexes for complete query processing. As the experiments will nevertheless show, the filter step can also be implemented efficiently with user-defined methods. If a more efficient implementation of the refine step were possible, e. g. by redesigning the OraGiST implementation to reduce the overhead of the callback to PL/SQL, these indexes could perform much better.
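The two-step processing discussed here can be sketched in a few lines. The objects are modeled as small point sets and the exact predicate as "some point lies in the window"; both are simplifying assumptions of this sketch, chosen only so that MBR and exact geometry can disagree. The counter shows how often the (costly) exact operator is called.

```python
def mbr(points):
    """Minimum bounding rectangle (x_lo, y_lo, x_hi, y_hi) of a point set."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def any_point_in_window(points, window):
    """Stand-in for the exact (refine step) predicate."""
    x_lo, y_lo, x_hi, y_hi = window
    return any(x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in points)


def window_query(objects, window):
    # Filter step: cheap test on the approximation only.
    candidates = [o for o in objects if boxes_intersect(mbr(o), window)]
    # Refine step: one exact operator call per candidate.
    results = [o for o in candidates if any_point_in_window(o, window)]
    return results, len(candidates)


objects = [
    [(0, 0), (10, 10)],    # MBR covers the window, geometry does not: false hit
    [(5, 5), (7, 7)],      # true hit
    [(20, 20), (21, 21)],  # rejected by the filter step
]
results, refine_calls = window_query(objects, (4, 4, 6, 6))
```

Every false hit of the filter step costs one additional exact operator call, which is where the call overhead described above accumulates.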

The graphs in figure 8.18 already give a good idea of the general trends of the experimental results. The Oracle quadtree index was the fastest index structure, followed closely by the Oracle R-Tree. The user-defined indexes performed a little worse, with the user-defined RSS-Tree being faster than the user-defined Z-Code. They were still faster than using no index up to the depicted selectivity of 20%. The RSS-Tree, as described earlier, is a regular R∗-Tree modified to use a distance-based penalty metric, leading to better clustering of objects than in the regular R∗-Tree. The selection time without index is almost constant regardless of selectivity, since a full table scan is required in any case. All other response times increase linearly with selectivity, showing that the I/O time is the most significant term in the cost formulas presented at the beginning of this section and that the set of candidates is proportional to the result size, as expected.

Similar results were obtained for the biotope dataset. The Oracle indexes are again the fastest and show similar performance as before relative to each other. The main difference is that the user-defined Z-Code is at least as fast as the user-defined RSS-Tree this time. This is due to the characteristics of the dataset: the relatively small, almost point-like rectangles can be indexed very efficiently using Z-Codes since their bounding box is equal to the object's shape itself and thus the primary filter has a perfect hit ratio. Consequently only as few objects as possible are retrieved by the filter step and thus need to be tested in the refine step. For the large, extended and irregularly shaped objects in the topographic dataset that was not true. Due to the irregular shape many false hits of the primary filter occur (cf. figure 8.18).

[Figure 8.18: Response Times for spatial selections on North Sea data. Plot of response time over selectivity in %; curves: RSS-GiST, Oracle R-Tree, Oracle Z-Codes, User-defined Z-Codes, Without Index]

[Figure 8.19: Response Times for spatial selections on census block data. Plot of response time over selectivity in %; curves include Without Index]

To determine the selectivity up to which an index scan should be favored over a full table scan, we can also use figure 8.19, where response times are depicted up to 100% selectivity. The ranking among the different indexes is similar to the previous results, with the user-defined Z-Code performing better than the RSS-Tree this time. This is probably due to the regular shape of census blocks: they are almost rectangular. This way the Z-Code approximation is quite good and the clustering property of the RSS-Tree cannot be exploited as well, since the objects are very large, introducing big overlaps in the tree nodes. The break-even point for the full table scan is between 30 and 50% selectivity depending on which user-defined index is used. Note that the response time without index is still larger for 100% selectivity than with the Oracle indexes. This result, which seems strange at first glance, is easily explained: Oracle uses a functional implementation different from the operator implementation when executing spatial queries without Oracle spatial indexes.

[Figure 8.20: Response Times for spatial selections on major road data. Plot of response time over selectivity in %; curves include Without Index]

Almost the same results were obtained on the line datasets (see e. g. figure 8.20). The main differences are that the user-defined RSS-Tree and Z-Code are closer together and that the break-even point for using a full table scan is now around 30% selectivity. The fact that this point lies at lower selectivity for larger datasets is due to the increasing effect of call overhead and is not a general phenomenon of the index structure. Other than that, even on the very large dataset no differences are observed. The Oracle Z-Code outperforms all other indexes by far. The fact that it is much faster even than the Oracle R-Tree comes from using exact geometries for the Z-Code approximation as opposed to MBRs in the R-Tree. This pays off even more since for line data the dead space incurred by using an MBR approximation is particularly large.

To separate the call overhead for the refinement step from the pure index performance in the filter step, other experiments were carried out involving only the filter step of a spatial selection query. For polygonal data the response times for the filter step only are compared in figure 8.21. The significant difference from the previous figures is that this time the user-defined RSS-Tree shows performance which is much closer to the Oracle indexes. The user-defined Z-Code is still worse than all other indexes, but in absolute numbers¹⁴ the difference has become much smaller. This is even more true for the user-defined RSS-Tree, which performs almost as well as the Oracle indexes. This shows that user-defined indexes could benefit a lot overall from an improved implementation of the refine step.

¹⁴ Note that the scale on the response time axis is much different from the previous figures.

[Figure 8.21: Response Times for spatial filtering on census block data. Plot of response time over selectivity in %; curves include User-defined Z-Codes, Oracle Z-Codes]

The already good performance of the user-defined RSS-Tree on polygons carries over to line data of different sizes. The time for the filter step is still very small compared to the time necessary for the refinement step: only about 2% of the total selection time is used for the primary filter. Thus improvements in total performance can only be expected by introducing better approximations of objects in the primary filter, which would lead to fewer objects being passed to the secondary filter. Both R-Tree indexes need only very little time for the filter step compared to the Z-Code, where the filter step takes about 25% of the total time. The good filter performance of the R-Trees is due to special characteristics of the dataset containing line data: since the Oracle Z-Code index does not use the MBR approach, but rather approximates the exact geometry of an object, it creates many more entries in the index. This leads to far fewer candidates than with the MBR-based R-Trees. Consequently the refinement step, which needs most of the total time, is much faster for the Z-Code, since for lines MBR and original geometry differ the most. Therefore we propose a combination of both: first find a better approximation for lines and similarly shaped objects by computing multiple rectangles that cover the geometry (e. g. by clipping the object and determining the MBRs of the clipped parts) and then insert all of those smaller rectangles into the index.
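The effect of the proposed approximation can be seen in a small sketch that compares the area of a single MBR with the total area of the MBRs of clipped parts. Splitting the line at its vertices into runs of consecutive segments is an assumption of this sketch; the clipping actually used in the experiments may differ (e. g. clipping against a grid).

```python
def mbr(points):
    """Minimum bounding rectangle (x_lo, y_lo, x_hi, y_hi) of a point list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def clipped_mbrs(line, pieces):
    """Split a polyline into at most `pieces` runs of consecutive segments
    and return one MBR per run. Neighbouring runs share their end vertex."""
    n_segments = len(line) - 1
    step = max(1, -(-n_segments // pieces))  # ceiling division
    return [mbr(line[start:start + step + 1])
            for start in range(0, n_segments, step)]


# A diagonal line: worst case for a single MBR, which is almost all dead space.
line = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

single_area = area(mbr(line))
two_parts = clipped_mbrs(line, 2)
four_parts = clipped_mbrs(line, 4)
two_area = sum(area(b) for b in two_parts)
four_area = sum(area(b) for b in four_parts)
```

In this example the covered area drops from 16 to 8 to 4 as the number of rectangles grows, at the price of more index entries per object.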

Finally, the scalability of the different indexes is analyzed for the filter step; scalability is an important criterion for the future use of the indexes, since database sizes can be expected to grow significantly. A comparison for similar selectivities on different dataset sizes can be found in figure 8.22. All indexes show a logarithmic growth of the execution time of the filter step with increasing size of the dataset. Thus the filter step can be expected to be very efficient for larger datasets as well. In the case of the R-Tree the logarithmic behavior reflects the logarithmic term in the formula at the beginning of this section, whereas for the Z-Code it is hidden in the term z describing the number of Z-Codes for the query window. For a fixed Z-Code depth as in the experiments this is logarithmically dependent on the size of the query window, which in turn is proportional in size to the dataset. Thus the scalability behavior observed reflects the behavior expected from the theoretical formulas given in this section.

[Figure 8.22: Scalability of indexes for spatial filtering on different datasets. Plot of response time over size of dataset]

All experiments in this section were carried out for overlap queries. Some of the tests were also conducted, by way of example, for other selection queries, e. g. inside. The general results remain the same for other query predicates as well. Similar observations are reported in [KRA02]. Nevertheless some details changed for other query predicates: the Oracle Z-Code index, for example, performed worse than the R-Tree for inside queries on certain datasets. A different Z-Code variant could perhaps be used for such queries.

As a general conclusion for spatial data one can say that, when only dealing with two-dimensional spatial data, one should use the Oracle-provided indexes, since they achieved the best response times for selection queries. Moreover the traditional Z-Code approach outperforms the Oracle R-Tree in almost all categories. Another insight, gained mainly from using the user-defined variants, is that a better approximation of objects in the index structure has great potential for improvements in overall index performance. As an example, the clipping of lines and the use of the MBRs of the clipped segments as approximation has been tested. Even though user-defined index structures did not outperform the system-provided indexes, several important benefits are obtained by using user-defined indexes. Firstly, these structures may be used for user-defined data types as well, especially when implemented on the basis of extensible indexing as was done for trees by using the GiST approach. The Oracle index structures only work for the Oracle-provided type MDSYS.SDO_GEOMETRY. In the previous and the next section we show other important data types whose efficient implementation can be greatly improved by using R-Tree like indexes. Those can easily be implemented on top of the user-defined indexes used in this section. No Oracle indexes can be used directly for these data types since the types are user-defined as well.

We have shown that query performance can be improved vastly by using user-defined indexes as compared to having no indexes at all. Among the user-defined indexes for 2D spatial data, the R-Tree based indexes are better than the Z-Code approach and should be used. Many important insights into spatial query processing presented in this section were only possible by using self-implemented indexes, where small changes can be made and many more observations are possible as compared to the black-box vendor-provided indexes.

8.3.3 Spatio-Temporal Data

Firstly, results of a preliminary performance evaluation based on a simulation are described. These tests were not performed inside a database and are only used for a first comparison. After that, detailed tests inside an ORDBS are presented and evaluated. We focused on the overlap operator, since it is very important for STOSTA and some sample results in the simulation indicated that similar results will hold for equality and containment queries as well.

Preliminary Performance Test (Simulation)

Queries of different selectivity using the overlap operator were used. They were taken from spatio-temporal overlap, pure spatial overlap and pure temporal overlap. We counted the number of node accesses necessary to answer the query and also the number of entries that needed inspection. The former should be a measure proportional to disk I/O time since a new page needs to be loaded for every node. The latter should measure CPU time as all entries are in main memory after a node is loaded. We also monitored total elapsed time. Our experiments showed that computation time is negligible compared to I/O time, as most articles on this subject assume. Total time was proportional to the number of nodes accessed and is therefore omitted as well.

For our tests we used real polygon data from the German topographic-cartographic information system (ATKIS) of parts of the Hannover region¹⁵ (row 2 in table 8.1) as well as polygons from the biotope database in [GPSSO+99] (row 3 in table 8.1). These polygons were transformed to spatio-temporal data by an algorithm derived from [TSN99]. Each polygon changed its shape at discrete points in time, which were drawn from a uniform distribution over the whole time domain.

For the ATKIS data, the objects were relatively large polygons and covered the spatial domain almost completely. They had little spatial overlap, but after the generation of moving regions they had substantial overlap in spatio-temporal space. The biotope dataset differs structurally from the first in that it contains relatively small rectangles covering only a small part of space and having virtually no overlap. Total sizes of the datasets were 121676 spatio-temporal objects for the Hannover dataset and 127420 spatio-temporal objects for the biotopes dataset.

¹⁵ Source: ATKIS®-DLM25 data of LGN (Landesvermessung + Geobasisinformation Niedersachsen)


Figure 8.23: Results for Spatio-Temporal Queries on Biotopes Dataset (plot of tree nodes accessed versus selectivity in %; series: STT dist 2+1D, STT dist 3D, STT size 2+1D, STT size 3D)

Each spatial object was assumed to exist over the whole time interval with changing geometry to generate the spatio-temporal objects. We also experimented with less coverage of the temporal dimension by each spatial object. The results there were the same but less pronounced. Thus, to stress our findings, we used complete coverage of the time interval. From an application point of view this is reasonable since areas in the real world usually do exist for a long time but with changing geometry.

In the first group of experiments we used actual spatio-temporal overlap queries with selectivities between 0.001% and 1%. These values were chosen small compared to traditional spatial access methods, but that seems reasonable for the application, since one spatial object appears with different geometries at different times and might therefore not be queried as often as in pure spatial databases. In the subsequent sections on pure spatial and pure temporal queries we increased the selectivities to values used in previously published research. In figure 8.23 we show the number of nodes accessed to answer the queries.

In the second group of experiments we tried to evaluate the ability to adapt to pure spatial queries. This is also an important measure, as an access method should be as generally applicable as possible, and moreover even spatio-temporal databases might start off with pure spatial data which gain temporal diversity over time. Thus the performance of the access structures for spatial queries is also very important.

Finally we also evaluated the performance of the different tree structures on pure temporal queries. This measure is important since spatio-temporal data may be queried to obtain the database's state at a certain timestamp or in a certain interval. This is needed for valid-pure-timeslice-queries (Show all geometries in the area in December 1956) or for valid-range-queries (Display all streets that existed from 1970 to 1979). Therefore a spatio-temporal access structure should also support pure temporal queries.
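The three query groups can be illustrated with a small sketch. It assumes that every object is approximated by a three-dimensional bounding box (x, y, t) and that a pure spatial or pure temporal query is posed as a box that is unbounded in the dimensions it does not restrict. Both the box representation and the function names are assumptions made for illustration; the text does not prescribe how the queries were encoded.

```python
INF = float("inf")

def overlaps(a, b):
    """True if two boxes ((x1, y1, t1), (x2, y2, t2)) intersect in every dimension."""
    return all(alo <= bhi and blo <= ahi
               for alo, ahi, blo, bhi in zip(a[0], a[1], b[0], b[1]))

def spatio_temporal_query(x1, y1, t1, x2, y2, t2):
    """Query box restricted in space and time."""
    return (x1, y1, t1), (x2, y2, t2)

def spatial_query(x1, y1, x2, y2):
    """Pure spatial query: the temporal dimension is left unbounded."""
    return (x1, y1, -INF), (x2, y2, INF)

def temporal_query(t1, t2):
    """Pure temporal query (timeslice if t1 == t2): space is left unbounded."""
    return (-INF, -INF, t1), (INF, INF, t2)

# Hypothetical object: a region valid from 1950 to 1960.
region = ((0, 0, 1950), (10, 10, 1960))
print(overlaps(region, temporal_query(1956, 1956)))  # timeslice query
print(overlaps(region, temporal_query(1970, 1979)))  # range query
print(overlaps(region, spatial_query(5, 5, 20, 20)))
```

With this encoding a single overlap test serves all three query groups, which is one reason why a spatio-temporal access structure can be evaluated on pure spatial and pure temporal workloads as well.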


Figure 8.24: Results for Spatial Queries on Biotopes Dataset (plot of tree nodes accessed versus selectivity in %)

Figure 8.25: Results for Temporal Queries on Hannover Dataset (plot of tree nodes accessed versus selectivity in %)

Analysis of Results

In general the STT dist 3D- and the STT dist 2+1D-Trees proved to be the most efficient access methods for regions with changing geometry over time in spatio-temporal applications; the differences between the two were only marginal. The experiments also showed that all the access structures were very efficient methods, as the number of objects to be checked for answering queries was very small compared to the size of the dataset. The only method with serious performance problems in some cases (but still far more efficient than naive searching) was the STT size 2+1D-Tree, which should be used with care.

The superiority of the distance-based penalty metric over the size-based penalty metric is due to its better subdivision of the whole domain. Since all access methods use the same split method, one may think at first glance that they should subdivide space equally. But a closer look shows that the different penalty metrics cause different distributions of the spatio-temporal objects over the tree. This results in nodes having different shapes before being split. Thus the same split method produces different divisions of space in the different methods.

In particular, the size-based penalty metric, by using straight multiplication of the size enlargements in each of the three dimensions, favors nodes with a stick-like shape: small extents in two dimensions and a large extent in the third. This has a stronger impact for three-dimensional spatio-temporal objects than it had in pure spatial access methods. This metric was optimal in the original R*-Tree, but in the spatio-temporal domain it leads to choosing mainly one axis as split axis (the one with the largest extent in the original node), resulting in a poor subdivision of space for the other dimensions.

The distance-based penalty metric on the other hand treats all three dimensions equally by clustering the objects in the nodes based on minimizing their distance to the centroid. This leads to cube-like shapes in the nodes, which in turn causes the split method to split along each of the three axes with almost equal probability. This claim is further supported by the fact that the difference between the two penalty metrics is larger for the biotope database: since objects are smaller and have small overlap, the clustering works even better compared to the large objects in the Hannover dataset. The size-based method is more likely to produce stick-like index entries on the small objects since they only have small extent, whereas large objects help to avoid stick-like entries already due to their shape. Thus the distance-based index trees finally show a very good division of space, leading to better query performance. Similar results were also derived formally for the spatial domain in [PSTW93].

In other experiments we examined the behavior of the different structures under varying scaling factors. We discovered that the more densely a dimension is populated, the more weight should be given to it in the distance-based trees (which can be done by adjusting the scaling factor). In the data used in this chapter the temporal dimension is more densely populated than the spatial dimensions (this is reasonable for our applications since objects usually do not cover the whole geographic space, but exist during the whole time period). We were even able to improve the efficiency of the access methods by overscaling the temporal dimension, that is, by giving it a heavier weight than given by the absolute coordinate values. Thus in more static environments one should consider using such a technique to obtain even more efficient structures. On the other hand, in extremely dynamic environments, maybe with virtually no bounds on the spatio-temporal values, one should consider using a size-based tree instead of a distance-based tree. In this area more research is necessary to obtain further insight.
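The contrast between the two penalty metrics can be made concrete with a small sketch. It is only an illustration under stated assumptions: the size-based penalty is modeled as the growth in volume of the node's bounding box (the usual R-Tree style enlargement), the distance-based penalty as the Euclidean distance between the centroids of node and new object, and the `scale` argument stands in for the scaling factors discussed above. The exact definitions used in the STT trees may differ.

```python
from math import prod, sqrt

# A box is (lo, hi), where lo and hi are 3-tuples (x, y, t).

def enlarge(node, obj):
    """Smallest box covering both the node's box and the new object."""
    lo = tuple(min(a, b) for a, b in zip(node[0], obj[0]))
    hi = tuple(max(a, b) for a, b in zip(node[1], obj[1]))
    return lo, hi

def volume(box):
    return prod(h - l for l, h in zip(*box))

def size_penalty(node, obj):
    """Size-based metric (assumed form): growth in volume after insertion."""
    return volume(enlarge(node, obj)) - volume(node)

def centroid(box):
    return tuple((l + h) / 2 for l, h in zip(*box))

def dist_penalty(node, obj, scale=(1.0, 1.0, 1.0)):
    """Distance-based metric (assumed form): scaled distance of the centroids."""
    return sqrt(sum((s * (a - b)) ** 2
                    for s, a, b in zip(scale, centroid(node), centroid(obj))))

# Hypothetical nodes: one stick-like along the time axis, one cube-like.
stick = ((0, 0, 0), (1, 1, 10))
cube = ((2, 0, 8), (4, 2, 10))
new_obj = ((0, 0, 10), (1, 1, 11))

print(size_penalty(stick, new_obj), size_penalty(cube, new_obj))
print(dist_penalty(stick, new_obj), dist_penalty(cube, new_obj))
```

In this made-up configuration the size-based penalty prefers the stick-like node, which thereby grows even longer, whereas the distance-based penalty prefers the cube-like node. Passing a larger third component in `scale` corresponds to overscaling the temporal dimension.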

Extensive Performance Evaluation (Database)

By using the tool OraGiST described in section 8.2.2 the newly introduced index structures for spatio-temporal data could be integrated into the database system Oracle 9i. To compare the different structures with the system-provided two-dimensional spatial index, which may be used to index the spatial component of a spatio-temporal object, we only use the 3D variants of our index structures in this section. The 2+1D variants performed worse (with size metric) or almost equal (with distance metric) in the previous paragraph, so they are not used any further. To be compliant with the notation¹⁶ in the earlier sections we now use 3D-RSS-Tree for STT dist 3D-Trees and 3D-R*-Tree for STT size 3D-Trees; the latter is the same as a 3D extension of the well-known R*-Tree.

In the experiments three different datasets were used; they were much larger than in the previous section, which was facilitated by using a database system. We used TIGER road data of the state of Indiana which was temporally enhanced as described in the previous section; that resulted in a small dataset of 88493 objects and a large dataset of 2966261 objects¹⁷. Moreover the biotope data described in the previous section were used again, but this time the full dataset of 528500 objects was used, which provides the medium-size dataset in the following analysis. Another important difference in the data is that the biotope data consist of regular rectangles in the spatial component, which means that no approximation of the data was necessary for insertion into the index. The road data on the other hand consist of line strings of arbitrary length in the spatial component, and consequently the MBR of these objects has to be used in the index. It was already explained for temporal data (cf. figure 8.17) that this has a big impact on performance.

In figure 8.26 the results for spatio-temporal overlap queries on the medium dataset are shown. Execution times for using no index at all are not shown for clarity, since they are almost constant at 130. As in the experiments in the previous paragraph the 3D-RSS-Tree is the best index, ahead of the 3D-R*-Tree and then the Oracle Spatial index. The system-provided spatial index is faster than the user-defined indexes for higher selectivities (not shown) due to its closer integration into the DBMS. But for smaller selectivities, which are better suited for index usage anyway, the conceptual advantage of the specialized index structures leads to better performance. Moreover the user-defined indexes have the advantage of being adjustable to other data types and higher dimensions, which is not possible for the system-provided index.

For pure spatial overlap queries, as could be expected, the system-provided spatial index is better and also outperforms the user-defined indexes for smaller selectivities (cf. figure 8.27). This is due to it being specialized for indexing the spatial component, which is exactly what is being queried here. As for spatio-temporal queries, among the two user-defined indexes a slight edge for the RSS-Tree (the distance-based tree) is observed again. This underlines similar findings in the previous section.

A completely different result is obtained for pure temporal selection queries (cf. figure 8.28): this time the user-defined indexes are best by far (at least for selectivities subject to index support). The spatial index does not help at all, leading to constant query execution time independent of the selectivity, since queries restrict only the temporal component of objects. Between the two user-defined indexes the 3D RSS-GiST again performs slightly better. The difference between Oracle spatial index and no index is interesting: it illustrates the huge call overhead incurred for using functions defined inside

¹⁶ In the previous paragraph we had to differentiate between more indexes and used a different notation to avoid confusion there.

¹⁷ There exist TIGER data of different resolution leading to different dataset sizes.


Figure 8.26: Response Time for Spatio-Temporal Selection (medium dataset) (plot of response time versus selectivity in %; series: 3D-RSS-GiST, 3D-R*-GiST, Oracle Spatial 2D)

Figure 8.27: Response Time for Pure Spatial Selection (medium dataset) (plot of response time versus selectivity in %; series: RSS-GiST, 3D-R*-GiST, Oracle Spatial 2D, Without Index)

the database as opposed to operators. Whereas the query on the spatial index basically requires a full index scan followed by an invocation of the temporal operator for all tuples, the execution without any index only requires an invocation of the spatio-temporal function on all tuples. Since the spatial overlap function in particular is time-consuming, this way of execution takes twice as long as using the theoretically useless spatial index.

So far results for database storage of user-defined indexes were used. Since the libgist implementation originally operated file-based, another option is to leave the index information in an operating system file instead of inserting it into a DBS table. This approach has the great drawback that control over the index and specific database features such as concurrency control and recovery are not in place for the index; but in


Figure 8.28: Response Time for Pure Temporal Selection (medium dataset) (plot of response time versus selectivity in %; legible series: RSS-GiST, 3D-R*-GiST)

Figure 8.29: Spatio-Temporal Selections for user-defined indexes (medium dataset) (plot of response time versus selectivity in %; series: RSS-GiST File, 3D-R*-GiST File, RSS-GiST Database, 3D-R*-GiST Database)

certain environments it may still be an interesting alternative. The big performance improvements of this file-based implementation over the database implementation are illustrated by example in figure 8.29. Spatio-temporal selections are compared for the two user-defined indexes in the two different variants. The performance improvement is huge (about a factor of 5) for both index types. In the file-based implementation there is basically no difference between size- and distance-based methods, in contrast to the database implementation. Other results show that even for pure spatial queries the file-based user-defined indexes perform better than the specialized Oracle spatial index. Consequently, if database features for the index are not absolutely required¹⁸, this is an interesting option. It can be chosen at index creation time and is therefore dynamically changeable. The relation between database-stored user-defined index, file-based user-defined index and Oracle spatial index is further illustrated in figure 8.30 for the small dataset.

¹⁸ The index has an impact on performance only, not on the correctness of a query result. Therefore sometimes it may be beneficial to rather have very good query performance in most cases and bad performance in the few cases when the index cannot be used by the database than to have a guaranteed pretty good performance at all times.

Figure 8.30: Response Time for Spatio-Temporal Selection (small dataset) (plot of response time versus selectivity in %; series: RSS-GiST Database, RSS-GiST File, Oracle Spatial 2D)

Figure 8.31: Response Time for index-assisted Pure Spatial Selection (large dataset) (plot of response time versus selectivity in %; series: RSS-GiST, 3D-R*-GiST, Oracle Spatial 2D)

For technical reasons, in the current implementation of OraGiST the user-defined indexes for the large dataset could only be used in the file-based implementation. This is due to the fact that the index contents, after being created in a file, are transferred to the database by using conventional insertion statements. Since all those insertions


Figure 8.32: Response Time for Pure Temporal Selection (large dataset) (plot of response time versus selectivity in %; legible series: RSS-GiST File, 3D-R*-GiST File)

Figure 8.33: Response Time for Spatio-Temporal Selections of 2% selectivity (plot of response time versus size of dataset; series: RSS-GiST, 3D-R*-GiST, Oracle Spatial 2D)

must be executed within an index creation and thus cannot be committed frequently, the whole index insertion is subject to storage in a rollback segment. For larger datasets these rollback segments grow very large and consequently slow down index creation. In a future revised implementation of OraGiST this problem should be solved by an alternative insertion strategy. Figure 8.31 shows the results for pure spatial selections on the large dataset. They are similar to the results for the other datasets when file-based storage is used. In figure 8.32 we can see once again that the RSS-GiST has certain advantages over the R*-GiST for smaller selectivities; the other index structures perform badly, as before, on these pure temporal selections.

Finally an analysis of scalability of the different indexing options is important for


Figure 8.34: Response Time for Pure Spatial Selections of 5% selectivity (plot of response time versus size of dataset; series: RSS-GiST, 3D-R*-GiST, Oracle Spatial 2D)

large datasets. In figure 8.33 the response times of the different indexes are shown for spatio-temporal selections of 2% selectivity. All three show a slightly superlinear growth with increasing size of the dataset, where differences in growth rate are negligible. A better behavior cannot quite be achieved due to certain required swaps and page loads that become more frequent with larger datasets. Altogether the growth rate is nevertheless still acceptable and index performance can be expected to scale acceptably. Whereas for pure temporal selections we observed similar growth, pure spatial selections of average selectivity 5% (cf. figure 8.34) show a slightly sublinear growth in response time with increasing size of the dataset. This is a very good result, as we can expect acceptable and predictable index performance for huge datasets as well.

In summary we can say that the user-defined 3D-RSS-Tree is the optimal index structure for overlap queries on discretely changing spatio-temporal objects. It was among the best indexes in all of the categories. This is especially true for the variant that stores the index information in a file; the disadvantage of this method is that no database control over the index information is present. If any pure temporal queries may be posed, using an Oracle-provided 2D spatial index is not acceptable. If that query type is not used, such indexes are an alternative as long as the data type to be indexed does not become more complex. If e. g. thematic information may be queried in combination with spatial and temporal information, the user-defined indexes can easily be adapted, whereas Oracle indexes are fixed and thus their performance will degrade further.

8.3.4 Redundancy versus Query Performance

In some situations storing redundant information leads to improved query performance. Thus the gain in performance has to be weighed against the price to be paid in additional space requirements. Indexes also fall into this category, but the results in the previous section suggest that there the performance gain is worth the additional space.

Another example is the integration of elevation data into the ATKIS landscape schema as described in section 6.2.5. On the physical modeling level an interesting aspect arises: even though information about which elevation points are associated with which geometries can be computed from the relations already present, we suggest materializing this association. As in the translation of a regular non-spatial n:m association, a table elevation should be used. This table has only two foreign key attributes referencing the base tables ObjectGeometry and ElevationModelPoint. The entries need to be updated by a spatial query after each insertion or loading operation. The advantage of materializing this relation is an improved query performance for three-dimensional queries (e. g. find all lakes near Munich which are at an elevation over 1000 meters), since only table scans of the small relation elevation and the relation ElevationModelPoint would be required once the identifiers of all objects near Munich are known. No additional geometric operations have to be computed, just a selection on the height value of the elevation points. To keep the elevation information for each object current without much work, the use of materialized views would be of great help. In that case the database system itself takes care of data integrity. One should keep in mind though that this materialization of inferred information may lead to growing times for updating data, since the complex geometric functions are executed at update rather than query time. For many applications with fairly static data this does not pose any problems, though, and thus redundant information should be materialized.
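The effect of the materialized association can be sketched with an in-memory SQLite database. This is an illustration only, not the Oracle schema of the thesis: the table names follow the text, but all column names (`obj_id`, `point_id`, `height`, `name`), the sample rows and the helper `objects_above` are hypothetical, and the spatial step that finds the objects near Munich is assumed to have produced the candidate identifiers already.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ObjectGeometry (obj_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ElevationModelPoint (point_id INTEGER PRIMARY KEY, height REAL);
-- materialized n:m association consisting of two foreign keys only
CREATE TABLE elevation (
    obj_id   INTEGER REFERENCES ObjectGeometry(obj_id),
    point_id INTEGER REFERENCES ElevationModelPoint(point_id),
    PRIMARY KEY (obj_id, point_id)
);
INSERT INTO ObjectGeometry VALUES (1, 'lake A'), (2, 'lake B');
INSERT INTO ElevationModelPoint VALUES (10, 1250.0), (11, 480.0);
INSERT INTO elevation VALUES (1, 10), (2, 11);
""")

def objects_above(con, candidate_ids, min_height):
    """Filter candidate objects by elevation without any geometric operation."""
    marks = ",".join("?" * len(candidate_ids))
    rows = con.execute(
        f"SELECT DISTINCT e.obj_id FROM elevation e "
        f"JOIN ElevationModelPoint p ON p.point_id = e.point_id "
        f"WHERE e.obj_id IN ({marks}) AND p.height > ? ORDER BY e.obj_id",
        (*candidate_ids, min_height)).fetchall()
    return [r[0] for r in rows]

# Candidate identifiers as they might come out of the spatial selection.
print(objects_above(con, [1, 2], 1000))
```

The query touches only the association table and the elevation points; the costly geometric computation was paid once, when the rows of `elevation` were inserted.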

8.3.5 Spatial Index Creation

Some results on creation times for the spatial indexes from section 8.3.2 will be discussed in this paragraph.

The times to create the spatial indexes for the previously mentioned structures are depicted in figure 8.35 for the different datasets. The fastest creation times were found for the Oracle indexes, where no big difference between the two was observed. Their creation time was also small compared to query time. Thus these indexes are very efficient overall. The user-defined Z-Code index follows behind the two, still with very acceptable creation time. This is due to the implementation in C, which inserts the computed index entries directly into the database via the C-level programming interface (OCI) of the system. The user-defined R-Tree indexes (this time we depict R*-Tree as well as RSS-Tree since their creation times differ significantly) based on the libgist implementation create the trees in an operating system file first. After creation is complete the file is transferred to the database page-wise in order to obtain full database support for the index structure. This insertion is done transactionally, which slows down extremely with growing size of the index structure due to fast-growing rollback segments. Therefore the long index creation times that are already present for small datasets become worse in the case of larger datasets. Bulk loading strategies or an improved implementation concept might significantly improve the creation time. Similar results were also observed for user-defined index structures on temporal and spatio-temporal data; thus details are omitted.

A more detailed analysis of the relative creation times can be found in figure 8.36


Figure 8.35: Index creation times for spatial indexes on different datasets (plot with vertical axis labeled response time versus size of dataset; legible series: R*-GiST, Oracle R-Tree, RSS-GiST)

Figure 8.36: Detailed index creation times for spatial indexes on smaller datasets (plot with vertical axis labeled response time versus size of dataset; legible series: R*-GiST)

where the times for smaller datasets are presented. This figure shows that the Oracle R-Tree is actually created faster than the quadtree except for the largest dataset in this graph; this dataset contains only points and the Z-Code for points can be computed very quickly (since exactly one code of maximum length is required), but the R-Tree follows the same algorithms as for extended objects. We can also see that the user-defined R-Tree indexes are even faster than the Oracle Z-Code for small datasets; this shows the efficient implementation of the insertion algorithms themselves and stresses the previously mentioned reasons for the slowdown. On the other hand index creation is usually only performed once for a given dataset and is thus not as important as query execution time. The user-defined Z-Code shows some data dependency in index creation time, visible in the non-linear changes for smaller datasets. This supports the proposition that data characteristics are the reason for the fast creation of the Oracle Z-Code for the largest dataset in the graph. Consequently an important result from this section is that performance of the Z-Code indexes is more data dependent during index creation than that of R-Tree based indexes.

8.4 Index Structures for Joins and other Queries

Apart from the overlap selection queries that were discussed extensively in the previous section, other query types in STOSTA also require index support, sometimes even more urgently than selections. The basis for other query types can already be found in the relational algebra. One of the more important operations for databases, due to its huge cost, is the join. With the introduction of user-defined types with user-defined operators in ORDBS the need for data type-specific joins also arises. This is due to the fact that the join predicate depends on the types of the attributes to be joined. Consequently, in the spatial domain join queries can involve any of the spatial predicates from section 3.1.2 as join predicate. Since joins operate on at least two relations, a naive implementation of a join involves evaluating the join predicate for the Cartesian product of the base relations. Since spatial predicates are very costly to evaluate, as seen for selections already, the execution time for spatial joins will be higher than the execution time for selections by several orders of magnitude. In more detail, for a spatial join over the overlap predicate that is executed similarly to a nested loops join, i. e. one that loops over all rows (say m) of one table and checks the join predicate against all rows (say n) of the other table, we obtain (cf. section 2.2 for join execution cost on standard types):

\[
\mathit{cost}_{\mathrm{naive}} = m + \mathit{read\_factor} \cdot M + m \cdot \mathit{cost}_{\mathrm{spat\_pred}}(n)
\]

This formula shows that m executions of the computationally intensive spatial predicate have to be carried out, leading to extremely long query response times. The inner spatial predicate may then be evaluated by any of the selection methods with their associated costs as presented in section 8.3.2.
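The nested loops execution described above can be sketched as follows. The sketch is a simplification under stated assumptions: objects are reduced to two-dimensional rectangles (xmin, ymin, xmax, ymax), and the rectangle test `mbr_overlap` stands in for the much more expensive exact geometric predicate; both function names are made up for this illustration.

```python
def mbr_overlap(a, b):
    """Overlap test for 2D rectangles given as (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def nested_loops_join(outer, inner, pred=mbr_overlap):
    """Return the matching index pairs and the number of predicate calls."""
    calls = 0
    result = []
    for i, r in enumerate(outer):        # m rows of the outer table
        for j, s in enumerate(inner):    # n rows of the inner table, each time
            calls += 1
            if pred(r, s):
                result.append((i, j))
    return result, calls

# Hypothetical toy relations.
roads = [(0, 0, 2, 2), (5, 5, 6, 6)]
rivers = [(1, 1, 3, 3), (10, 10, 11, 11), (6, 6, 7, 7)]
pairs, calls = nested_loops_join(roads, rivers)
print(pairs, calls)
```

The predicate is evaluated once per pair of rows, i. e. m · n times in total, which is why replacing the inner loop by an index-assisted selection matters so much for spatial predicates.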

For R-Tree based indexes a special join algorithm has been presented in section 3.3.2. The cost for this method would be:

\begin{align*}
\mathit{cost}_{\text{R-Tree join}} ={}& \#\mathit{overlapping\_nodes} + \mathit{read\_factor} \cdot \#\mathit{overlapping\_node\_entries} \\
& + \mathit{cost}_{\mathrm{spat\_pred}}(\#\mathit{candidates}_R)
\end{align*}

The cost consists of executing the procedure of figure 3.8 for the filter step (cost in the first row) and of executing the geometric algorithm for the candidates (cost in the second row), which is assumed to be much less than the cost $m \cdot \mathit{cost}_{\mathrm{spat\_pred}}(n)$ in the nested loops join execution.

If a Z-Code based index is used, the filter step of the join is processed similarly to a merge join, but on the index tables instead of the base tables. The index tables can be assumed to be sorted since that helps in query execution and could thus be done implicitly on index creation. Altogether the join cost is

\[
\mathit{cost}_{\text{Z-Code join}} = t_1 + t_2 + \mathit{read\_factor} \cdot (T_1 + T_2) + \mathit{cost}_{\mathrm{spat\_pred}}(\#\mathit{candidates}_Z).
\]

In this formula $t_i$ stands for the number of Z-Codes stored for the objects of table $i$; these codes occupy $T_i$ pages. Note that the number of candidates for which the spatial predicate has to be evaluated may be different for Z-Codes and R-Trees. In both cases a sorting of the candidate pairs to eliminate duplicates before executing the refine step would be beneficial. The cost for this sort is not included in the above formulas, since it also depends on the number of candidates and should be small compared to the execution time of the spatial predicate thereafter. An analytical comparison of the above cost models is not possible since many data and algorithm dependencies are included in the formulas¹⁹.
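Although the cost models cannot be compared analytically, plugging in numbers gives a feeling for the magnitudes involved. The following sketch transcribes the naive and the Z-Code formula into two functions; the parameter names mirror the symbols of the formulas, while every input value is made up and not taken from the experiments.

```python
def cost_naive(m, M, read_factor, cost_spat_pred_n):
    """m + read_factor * M + m * cost_spat_pred(n)."""
    return m + read_factor * M + m * cost_spat_pred_n

def cost_zcode_join(t1, t2, T1, T2, read_factor, refine_cost):
    """t1 + t2 + read_factor * (T1 + T2) + cost_spat_pred(#candidates_Z)."""
    return t1 + t2 + read_factor * (T1 + T2) + refine_cost

# Hypothetical inputs: 1000 outer rows on 50 pages, one inner selection
# costing 2000 units; 3000 and 4000 Z-Codes on 30 and 40 pages, and a
# refinement step costing 5000 units.
naive = cost_naive(m=1000, M=50, read_factor=10, cost_spat_pred_n=2000)
zcode = cost_zcode_join(t1=3000, t2=4000, T1=30, T2=40,
                        read_factor=10, refine_cost=5000)
print(naive, zcode)
```

For these invented values the term m · cost_spat_pred(n) dominates the naive cost, whereas the Z-Code join is dominated by the size of the index tables and the refinement of the candidates.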

Hence the area of spatial joins has attracted many researchers to perform experimental comparisons. A more detailed cost model for R-Tree joins has been published in [TSS98] and in [HJR97], whereas analysis results for spatial self-joins can be found in [PF00]. Moreover, due to the high computational complexity of a spatial join, an estimation of its result size and execution time would be helpful; some results have been described in [NP00]. An alternative index structure for spatial joins can be found in [LR98], and [PD96] suggest an alternative join algorithm that could increase processing speed. Some other publications also suggest alternative implementations of spatial joins such as parallel execution ([ZAT98]) or joins based on raster approximations ([ZdS98]).

In conjunction with this work some other experiments with spatial join queries in Oracle 9i have been carried out. User-defined indexes built through the extensible indexing interface of Oracle 9i do not directly support join queries. This is due to the tuple-based interface, which is only capable of retrieving a fixed number of rows from an index at a time. Thus a more global approach to join processing such as the one described in figure 3.8 cannot be implemented on top of the extensible indexing interface. One could only write procedures operating directly on the index tables to evaluate such queries. That has the great disadvantage that they are not transparent to the end user, since the user cannot use regular queries but has to call procedures instead. Therefore join queries on user-defined indexes can only be executed by the nested loops technique. Estimates for execution times can be determined by the cost formula for the naive nested loops join given earlier in this section; since results cannot be expected to be acceptable, such experiments were not executed.

A comparison of spatial join processing for indexes provided by Oracle Spatial has been performed. The documentation ([Ora01]) describes that a particular parameter to the spatial operator registers the join query with the server and thus leads to improved query performance. An analysis of the query execution plans showed nevertheless that only quadtree based indexes used an improved join strategy whereas R-Tree indexes executed a regular nested loops join (cf. figure 8.37).

Figure 8.37: Execution Plan for Spatial Join on Oracle R-Tree

The quadtree based Oracle indexes on the other hand used the merge join-like approach on the Z-Code index tables as described above. This can also be seen from the query execution plan depicted in figure 8.38. Note the dramatically different estimated cost for query execution (292 vs. 6699739). In steps 1 to 3 the Z-Code based execution joins the index tables containing the Z-Codes. This is done by a nested loops join with an index access instead of the merge join technique described above. After that, duplicate candidates are eliminated in step 4 before the result of the filter step is produced as a view in step 5. Finally all candidate rows are fetched from the base tables and joined as expected for the refinement step. A comparison of execution times between Oracle Z-Code and R-Tree was very difficult due to frequent Oracle-internal technical problems caused by the large space and time consumption of the R-Tree join. Results showed that on small tables the R-Tree join took about 2.5 times longer than the quadtree join. The discrepancy increased to about 5 times for larger datasets. The difference was much larger for the filter step only but, since the quadtree generated more candidates, the difference was not too bad for the complete join query. Nevertheless an improved R-Tree join would be very important to provide better spatial functionality. For user-defined indexes an efficient implementation of a join should be able to perform better than the Oracle R-Tree join, since that join took several hours for, e. g., the datasets Roads and Rivers from table 8.1.

Figure 8.38: Execution Plan for Spatial Join on Oracle quadtree

¹⁹ The number of overlapping nodes in the R-Tree formula depends on the particular R-Tree variant and the data distribution, for instance.

Other important spatial queries are nearest-neighbor queries (NN queries). An efficient algorithm for R-Tree based NN queries was proposed in [PM97] (cf. figure 3.9 for another proposal). Since NN queries correspond to aggregations and user-defined aggregates are not supported in the current version of Oracle 9i, no experimental results with user-defined indexes can be given. An alternative procedural implementation would be possible for user-defined indexes, but it cannot be properly integrated into the query processor as described for joins and is therefore not desirable.

Oracle provides an index-assisted operator retrieving a given number of nearest objects to a given object from a table. Since computational complexity is at most linear in the number of rows in the table and in addition execution of this operator is index-based, performance of the SDO_NN operator for all sample queries was very good. Execution time was almost negligible. In the literature nearest neighbor queries are usually researched in much higher dimensions. Applications include image processing and the storage of other complex objects. In these applications certain features of the objects to be indexed, which can usually be specified numerically, are combined into high-dimensional feature vectors. Retrieving nearest neighbors in these domains corresponds to searching for objects similar to a given object. This is a very important query type, and high-dimensional NN queries remain an important research topic, since Oracle 9i supports only two dimensions here, as in all spatial operators.

8.5 Cost and Selectivity Estimation

As explained earlier, it is not always advantageous to use an existing index in query execution. If for instance almost all tuples of a relation are in the query's result set, the overhead for reading candidate tuples from an index as compared to just fetching all rows of a table is too large. In that case execution without using the index would result in better query performance. This can also be seen in the previously described experimental results, e. g. in figure 8.20: for higher selectivities the curve for execution time using no index lies below the curve for the 2D RSS-Tree with refinement step.

Since only one index is allowed on each column of a table, the only choice is between using the index and performing a full table scan. The database server can choose the optimal execution plan only if it is able to compute (at least approximately) the intersection of the two curves, assuming the monotonic behavior is typical²⁰. For this purpose an estimate of the cost of a function execution and of an index scan must be computable and, more importantly, so must the selectivity of the query. Only then is the server able to decide which execution is faster. Consequently user-defined, i. e. data type-specific, cost estimation (for data type-specific indexes and functions) and selectivity estimation (for user-defined operators) are required.

20 Our experiments as well as theoretical derivation show that this will be the case.


8.5.1 Selectivity Estimation

In the domain of spatial data some proposals for estimating the selectivity of queries can be found. [PF98] present a model which only works for line segment datasets; more general for spatial selections is the model in [JAS00]. An alternative approach to selectivity estimation using data correlation is presented in [BF95]. An experimental evaluation of range queries was described in [TP95], which was not performed in the context of ORDBS though. Finally the selectivity of spatial joins can be estimated by a model in [AYS01]. For complex join queries this may even be used for approximate query answering, if an exact answer would take too long to compute. None of these approaches was developed with respect to ORDBS; as a result some of the models can be implemented under the ODCIStats interface while others cannot be used. As a prototypical feasibility study in an ORDBS, an exemplary data type and operator will be chosen and one of the implementable models mentioned will be adapted to this data type in the following paragraph.

Prototype Implementation

For data type VT_INTEGER (cf. section 7.4.2 for its definition) the refinement step is not needed for the operator vtBetweenOverlap, since the information about an object which is stored in a user-defined index is not an approximation of the original object but the exact object (line segment in the time-value plane) itself. VT_INTEGER will nevertheless be used for a prototype implementation here, because its implementation is simple without sacrificing much expressiveness: since the usual call overhead for approximated objects can easily be simulated by using an OraGiST-based implementation of the user-defined index, the results obtained are easily transferable to more complex domains. More research in those domains would then be necessary, but for a prototypical implementation as a feasibility study this simple data type should be sufficient.

The exact definition of the operator vtBetweenOverlap can be found in section 7.4.2. Since as explained in section 8.3.1 the objects of type VT_INTEGER are stored as line segments in the valid time-value plane, the estimation formulas of [PF98] can be used. The exact formulas can be simplified since all line segments are known to be parallel to the time axis. On the other hand the whole domain is no longer the unit square (as in the original article) and thus the estimation formula has to be adapted to arbitrary domains. In summary we obtain the following formula for the selectivity of query q on dataset L:

$$
\mathrm{selectivity}(L, q) = \frac{l_{max} \cdot \frac{N - 1 - \ln N}{\ln N} \cdot q_{value} + q_{value} \cdot q_{time} \cdot N}{(max_{value} - min_{value}) \cdot (max_{time} - min_{time})}.
$$

In this formula l_max denotes the length of the longest segment in L, N the number of objects in L, q_value the size of the query window in the value dimension, q_time the same in the time dimension, and min_value and max_value the minimal and maximal value of the value dimension of all tuples in L; min_time and max_time are defined analogously for the time dimension. The second term would be an estimate for the selectivity if the segments were evenly distributed over the whole space. But since all segments are parallel to the time axis we need a correction term for the skewness of the segment distribution; the first term accounts for that and has been determined in [PF98].
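The estimate can be evaluated directly once the parameters are known. The following minimal sketch assumes that the printed formula is read as a single fraction whose numerator is the sum of the correction term and the uniform term; the function and parameter names are hypothetical and only mirror the symbols explained above.

```python
import math


def estimate_selectivity(l_max, n, q_value, q_time,
                         min_value, max_value, min_time, max_time):
    """Evaluate the adapted estimate for one query window.

    Assumes n > 1 (otherwise ln N is zero) and a non-degenerate domain.
    """
    if n <= 1:
        raise ValueError("formula needs N > 1")
    domain_area = (max_value - min_value) * (max_time - min_time)
    if domain_area <= 0:
        raise ValueError("domain must have positive extent in both dimensions")
    # First term: correction for the segment distribution, as in the formula above.
    correction = l_max * (n - 1 - math.log(n)) / math.log(n) * q_value
    # Second term: estimate for evenly distributed segments.
    uniform = q_value * q_time * n
    return (correction + uniform) / domain_area
```

With l_max = 0 the correction term vanishes and only the uniform term remains, which is a quick plausibility check for an implementation.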

In order to be able to compute this estimate, the dataset-specific parameters l_max, N, min_value, max_value, min_time and max_time have to be stored in a metadata table in advance. This table is filled by function ODCIStatsCollect which is called automatically by the server when an ANALYZE TABLE ... COMPUTE STATISTICS command is issued. The parameters are computed by a simple full table scan which is not time critical since it is executed prior to the query and only once for each table²¹. The other parameters q_value and q_time, which are query dependent, can be derived from the actual arguments of the operator which are passed to the estimation function ODCIStatsSelectivity by the server.
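The single scan that gathers the dataset-specific parameters can be sketched as follows. The row layout (value, start of valid time, end of valid time) is an assumption made for illustration only; the actual definition of the type is given in section 7.4.2, and the function name is hypothetical.

```python
def collect_statistics(rows):
    """One pass over rows of the assumed form (value, t_start, t_end)."""
    stats = None
    for value, t_start, t_end in rows:
        length = t_end - t_start  # segment length along the time axis
        if stats is None:
            stats = {"N": 0, "l_max": length,
                     "min_value": value, "max_value": value,
                     "min_time": t_start, "max_time": t_end}
        stats["N"] += 1
        stats["l_max"] = max(stats["l_max"], length)
        stats["min_value"] = min(stats["min_value"], value)
        stats["max_value"] = max(stats["max_value"], value)
        stats["min_time"] = min(stats["min_time"], t_start)
        stats["max_time"] = max(stats["max_time"], t_end)
    if stats is None:
        raise ValueError("empty dataset")
    return stats
```

The resulting dictionary corresponds to one row of the metadata table; only the query-dependent window sizes have to be supplied at estimation time.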

Since the estimated selectivity alone is not sufficient for the choice of an execution plan, estimations for the details of the possible plans are also needed. The first possible plan would be a full table scan where the user-defined function associated with the user-defined operator would be executed for each row. Consequently an estimation of the execution time of the user-defined function is required to compute a cost estimate. In the second option where rows are retrieved from a user-defined index an estimation of the cost of retrieving one result row by an index scan depending on the particular query predicate and arguments must be supplied. Therefore this estimation has to be computed also and the corresponding functions must be supplied to the server.
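The decision between the two plans can be illustrated with a deliberately simple cost model. It is only a sketch under the assumption that a full table scan costs one function call per row and that an index scan costs a fixed amount per retrieved result row; the names and the linear model are hypothetical and much coarser than what a real optimizer uses.

```python
def plan_costs(n_rows, selectivity, function_cost, index_cost_per_row):
    """Return (full scan cost, index scan cost); selectivity is a fraction in [0, 1]."""
    full_scan = n_rows * function_cost
    index_scan = selectivity * n_rows * index_cost_per_row
    return full_scan, index_scan


def choose_plan(n_rows, selectivity, function_cost, index_cost_per_row):
    full_scan, index_scan = plan_costs(n_rows, selectivity,
                                       function_cost, index_cost_per_row)
    return "index scan" if index_scan < full_scan else "full table scan"


def break_even_selectivity(function_cost, index_cost_per_row):
    """Selectivity at which both plans cost the same in this model."""
    return function_cost / index_cost_per_row
```

In this model the two cost curves intersect at the ratio of the two per-row costs, which shows why both the selectivity and the cost estimates are needed for the decision.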

8.5.2 Cost Estimation for User-Defined Methods

In the previous section the necessity of implementing user-defined cost estimation for user-defined functions and indexes has been shown. The estimate of the cost of applying a user-defined function should be directly derivable from the particular algorithm used for its implementation. In contrast, the estimation of the cost of an index scan is not so obvious. In the literature [TS96] (and [TSS00]) present a very detailed estimation model for the filter step cost in R-Trees which may be interpreted as a much more accurate description of the formula given in section 8.3.2. It is rather complicated to compute and makes use of information about the index used and the dataset. It requires knowledge about fanout and data density in every node of the R-Tree²², for example. To keep the prototypical implementation simple this model was not used here. In more advanced implementations its use can be considered, but it is useful for the filter step only. Another detailed cost estimation technique for spatial selections has been presented in [AN00]; it also takes the refinement step into account, which is very important as the experiments in the previous sections have shown²³. It is a histogram based technique and promising results are presented in the above paper. A more advanced implementation would probably use this model later.

Figure 8.39: Selection on VT_INTEGER with and without index (plot of response time over size of dataset; curves: 2D-RSS-GiST (with overhead), No Index, Theoretically Optimal Execution)

21 To be exact, it is executed once for each ANALYZE command.

22 Even though formulas for the computation of this density from the data density are given, the formula still remains pretty complex.

23 The cost for the refinement step including the call overhead actually dominated the total cost of selection queries.

For the prototypical implementation in this work the cost of the user-defined function implementing the operator vtBetweenOverlap was estimated as a fixed cost. This fixed cost was computed by executing multiple invocations of the function and averaging the cost over the number of calls. To account for the fact that multiple invocations need not load the same data page into main memory every time, queries of different selectivities very close to the queries described in the following experiments were used. The overall average was then used as the estimate and is returned in the CPUCost component of the Oracle cost type. This should be a sufficiently exact estimate of the function cost.

The cost of retrieving results from a scan of the user-defined index in the prototypical implementation is estimated by a formula from [KF93]. This formula is used to compute the average number of pages to be read from an R-Tree index for a given query. It can also be used for R-Tree variants since it only uses static properties of R-Trees. Details can be found in the original article.

8.5.3 Results of the Prototypical Implementation

The aforementioned estimates were implemented as user-defined statistics by using the Oracle 9i user-defined statistics interface described in section 2.3.2. The optimal execution time to be expected for VT_INTEGER would be the minimum of the curves without index and with index (including operator overhead) shown in figure 8.39. How closely this minimum can be approached depends, on the one hand, on the quality of the selectivity estimation function. The function used here has shown its approximation quality in previous publications already and thus we can assume that it is sufficient.

On the other hand we need to know or approximate the point where the curves intersect for the particular dataset and index, in order to decide which execution plan to choose once the selectivity has been estimated. This can be achieved by a good quality cost estimation. Results from previous research as described above were used in this category also. Therefore the quality of these estimates should be sufficient.

An extensive evaluation of the quality of the estimation functions remains to be done. Nevertheless in some preliminary experiments we have found the execution times for queries on VT_INTEGER to be pretty close to the theoretical minimum. The minimum is achieved for low and high selectivities. In the area around 50% selectivity, where it is difficult to choose the optimal execution plan, some minor inaccuracies in estimation led to performance up to 15% worse than the theoretical minimum. This is still a very good result for a prototypical implementation and shows that further investigation of cost and selectivity estimation should be undertaken to provide optimal efficiency for user-defined data types while being transparent to the end-user. Especially for more complex data types and larger datasets the benefits can be even greater.

Part III

Applications on OR Databases and Prospect

Chapter 9

Applications for Spatial and Temporal Data in ORDBS

To show how the principles of object-relational database schema development presented in the previous part may be used in end-user applications (STOSTA), three sample applications are presented in this chapter. The first is a visualizer for spatial data in Oracle 9i which can be seen, for example, as an application for the ATKIS data presented in the first case study in chapter 6. After that web-based exchange of non-standard data from ORDBS is illustrated. This can be seen as another exemplary ATKIS application, but may also be used for a much wider range of applications. Finally the development of geoscientific computing methods using ORDBS based on the second case study of chapter 6 is discussed in some detail.

Figure 9.1: Architecture of ATKIS data management system (components: EDBS file (fixed format), temporary database, topographic database, XML file (webwide portable), visualization tool, World Wide Web; links labeled: using DB program (PL/SQL), using Java program, using XML, XSL and its derivatives)

An overview of the proposed architecture of a cartographic-topographic data management and exchange system is shown in figure 9.1. This acts as a foundation for both the visualization and the web exchange components described in the next two sections. All components have been implemented in our system prototypically according to that figure, except for the visualization which is operating directly on the database rather than on XML files.


9.1 Visualization of Spatial Data

In this section we do not discuss features and functionality of commercial geographic information systems (GIS). Such a discussion may be found in the documentation of the particular vendor (e. g. ESRI or Smallworld). This section is rather concerned with specialized visualization of spatial information inside an ORDBS. After a presentation of the important requirements for a visualizer to be able to work efficiently with spatial information, a prototypical implementation of such a visualizer on top of Oracle 8i/9i is described along with an analytical model of the optimal tool. Even though the prototype is very powerful, it does not provide all required functionality. Its use is illustrated by means of ATKIS data which is already known from the first case study in chapter 6. Possible extensions are also discussed in the last part.

9.1.1 Requirements

The many requirements for data visualization in full-scale geographic information systems (GIS) have been extensively presented in the literature (see e. g. [Slo99, Car02]). We will rather focus on the specific requirements for visualization as a tool to efficiently manage spatial data in an ORDBS and to perform some kind of data analysis that does not require using a GIS (or is only possible by not using one).

The standard database interfaces provided with the system are based on textual interaction. Usually systems provide a command line tool where the user can write or load queries that can then be executed in the ORDBS. The query result is displayed textually on the screen. Obviously this tool is not sufficient for checking the correct spatial placement of a polygon with maybe hundreds of vertices from a dataset. Therefore the first requirement is that a tool is provided that can display user-defined spatial data from an ORDBS graphically, i. e. display the spatial information in human recognizable form.

Unfortunately, given a certain thematic information covering the whole area, visualizing all spatial shapes of that layer where the thematic information is valid will only display one black area. Therefore we need to be able to color the shapes of a layer according to a thematic attribute. An extension of this idea is the usage of a color map for a certain layer whose information is also stored in the database (see section 9.1.2 for details). One should also be able to choose if polygons are displayed as polylines or as solid areas. All the requirements listed so far are among the standard features of regular GIS. The additional requirements listed below can only be satisfied by using an ORDBS.

One of the main advantages of using ORDBS for spatial data is the integration of thematic and spatial information (and other integration aspects like integrating different kinds of spatial information). This leads to an important visualization feature: a correspondence between textual result for thematic information and graphical result for spatial information should be established. Such a feature may for instance be satisfied by supplying a marking feature that marks the textual result row of a query in the tabular representation in the same color and at the same time as the spatial feature in the graphical representation. This way the user is able to relate textual and graphical results.

Within the last paragraph another requirement is implicitly mentioned: a visualization of an arbitrary data extract or even a transformed analyzed data extract should be possible. This feature is provided on top of an ORDBS by allowing the results of arbitrary SQL queries to be displayed. Together with user-defined spatial functions and operators, e. g. as described in section 8.1, that will usually be available together with the spatial data type, this allows for extremely flexible and powerful analysis and visualization.

Finally, a lot of important information cannot be derived by using a single query but rather by overlaying the results of different queries. Important data to be combined may e. g. come from different databases and can thus only be retrieved with different queries. Moreover it may be more expressive to see several query results graphically in their entirety and then overlay them on the screen manually, than to do the overlay within the query with spatial operators and only visualize the result. Therefore we also need the possibility to visualize multiple layers at a time. Note that there is a huge difference between displaying several layers at a time, which can be easily done, and a computationally intensive overlay operation that actually computes the overlay not only in graphical form but analytically. The latter is an important operation but should rather be provided by the data component than by the visualization component.

Of course there are many other features required to be able to work with spatial data in the ORDBS (e. g. zooming, selecting rectangular areas, loading predefined queries), but since most visualizers will provide these features and since they are not specific to database support we omit them here. In the prototypical implementation in the next section the features that were implemented will be explained.

9.1.2 Visualizing Spatial Data in Oracle 9i

At first a concept for a spatial visualizer on top of an ORDBS will be presented which is then further illustrated by the description of a prototypical implementation.

Each visualization application is an instantiation of Visualizer (cf. conceptual model in figure 9.2). A visualization consists of several layers whose display order is stored with the visualization itself. A layer (often also called theme in GIS literature) is a collection of domain-specific information for a certain spatial extent. In conjunction with databases each layer is associated with its own database connection and query that describes how the information forming this layer is extracted from the database. Since SQL queries are used to retrieve data, all the techniques of modeling spatial data types from the preceding chapters of this work are automatically used. Efficient querying and visualization is thus guaranteed, if physical optimizations as described in section 8.3.2 have been applied. The query must retrieve exactly one column of type SDO_GEOMETRY; this column is identified automatically and supplies the spatial information to be displayed.
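The rule that a layer query must return exactly one geometry column can be checked on the result metadata. The following sketch assumes the metadata is available as a list of (column name, type name) pairs; the function name and the metadata format are hypothetical and not taken from the prototype.

```python
def geometry_column(columns):
    """Return the name of the single SDO_GEOMETRY column of a query result.

    columns: list of (column_name, type_name) pairs (assumed metadata format).
    """
    geometry = [name for name, type_name in columns
                if type_name.upper().endswith("SDO_GEOMETRY")]
    if len(geometry) != 1:
        raise ValueError(
            f"layer query must return exactly one geometry column, "
            f"found {len(geometry)}")
    return geometry[0]
```

All remaining columns can then be treated as thematic information and shown in the tabular result.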

After a query is executed the result is stored in a textual representation, which is provided as one instantiation of TextualQueryResult for each result row, as well as in a graphical representation (instance of GeometricQueryResult). The textual result in a tabular display is in a 1:1-relation to a geometric object in the graphical representation. This relation enables the user to relate domain-specific thematic information with the corresponding spatial feature. This can be achieved visually by e. g. highlighting the two in the same color.

Figure 9.2: Concept for Spatial Visualizer for Oracle (classes and attributes: Visualizer (Name, LayerOrder); Layer (DBConnection, DBQuery, Visible?, BorderMode?, LayerName, ResultRowType); ElevationSource (TableName, ColumnName, Scale); TextualQueryResult (ResultID, ResultRow); GeometricQueryResult (ResultID, ResultGeometry); ColorMap (Name, ColumnName); MapElement (ColumnValue, DisplayColor); relationships: consists of, contains, describes, is assigned, displayed using, is part of)

Moreover each layer is displayed using a particular color map. The color map stores which of the result columns of the query is used for determining the color of the corresponding geometry in the spatial display. Moreover it consists of a set of MapElements which provide the coloring information for all possible values of the selected column. The advantage of this approach is that the same color map can be used for different layers. This may be useful for a display of certain domain attributes which are always displayed using the same shades, or for displaying ATKIS data from multiple databases in one window all using the same map.
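The role of the color map can be illustrated by a small sketch. The class and attribute names follow figure 9.2 (ColorMap with Name and ColumnName, MapElement with ColumnValue and DisplayColor); representing the map elements as a dictionary, the default color and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ColorMap:
    name: str
    column_name: str  # result column that determines the color
    # MapElements, stored here as ColumnValue -> DisplayColor
    elements: dict = field(default_factory=dict)

    def color_for(self, row, default="black"):
        """Color of the geometry belonging to one result row (a dict)."""
        return self.elements.get(row.get(self.column_name), default)


# Hypothetical map and rows from two different layers sharing it.
land_use = ColorMap("land use", "OBJECT_KIND",
                    {"forest": "green", "water": "blue"})
layer_a_row = {"OBJECT_KIND": "forest", "NAME": "example forest"}
layer_b_row = {"OBJECT_KIND": "water", "NAME": "example lake"}
colors = [land_use.color_for(layer_a_row), land_use.color_for(layer_b_row)]
```

Because the map only refers to a column name and its values, the same instance can serve any layer whose query returns that column.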

The visualizer is a very flexible tool since each layer is generated by a database query. Thus the complete database functionality is available in the visualization component. One could e. g. display results of complex user-defined functions that are computed and thus optimized by the database system by posing a single query. Main memory usage, which may become critical for larger datasets of complex shapes, is kept as low as possible by this technique. Also physical optimizations as explained in the previous chapter of this work will automatically be used transparently to the end-user.

Finally a special feature for displaying elevation information is also included: each layer can optionally be associated with an instance of ElevationSource which stores information about the database table and column containing the elevation information for the particular layer. This could be used for a three-dimensional visualization or in simpler visualizers for retrieving heights of particular points of the graphical display. The system may e. g. compute the three points in the database closest to the point selected by the user, display the average elevation and mark that point in the graphical display. The closest points in the database could be computed by a database query and only those points are retrieved; this is much more efficient than selecting the whole elevation table and computing the closest points in main memory¹, since the spatial indexes explained in section 8.3.2 can be used by the database system.
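The arithmetic behind the elevation lookup is simple and can be sketched independently of the database. Note that this sketch searches an in-memory list only for illustration; as explained above, the closest points would preferably be determined by an index-assisted database query. The function name and the point format (x, y, elevation) are assumptions.

```python
import heapq
import math


def average_elevation(elevation_points, selected, k=3):
    """Average elevation of the k points closest to the selected (x, y) position.

    elevation_points: iterable of (x, y, elevation) tuples (assumed format).
    """
    nearest = heapq.nsmallest(
        k, elevation_points, key=lambda p: math.dist(p[:2], selected))
    if not nearest:
        raise ValueError("no elevation points available")
    return sum(p[2] for p in nearest) / len(nearest)
```

When the nearest points are delivered by the database, only the final averaging step remains to be done in the visualizer.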

Prototype

A prototypical implementation of a visualization tool on top of Oracle 8i/9i will be described briefly in the sequel. A detailed description and implementation information can be found in [Löc99, Kossl00]. The visualization tool, which is written in Java to provide for good portability, is able to display multiple layers. These layers are managed in the layer list in the left part of the main window (cf. figure 9.3).

The order of the layers displayed in the list corresponds to the order in which the layers are displayed in the graphics window (cf. figure 9.4). An overlay operation is thus not provided: data in one layer is only visible if no layer above it contains information at that point in space. The order of layers may be changed by the Up and Down buttons; layers can be removed or colored by using the other buttons. Each layer can be visible, display polygons by their borderlines and use a given color map. The status of these flags is indicated and can be modified by the check boxes in the list.
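The visibility rule for stacked layers can be sketched as follows. It is assumed here that the first entry of the layer list is the topmost layer and that a layer is represented by a dictionary with a name, a visibility flag and the set of positions it covers; these representations and the function names are hypothetical and not taken from the Java prototype.

```python
def topmost_layer_at(layers, point):
    """Name of the layer whose data is visible at point, or None."""
    for layer in layers:  # layers[0] is assumed to be the topmost layer
        if layer["visible"] and point in layer["points"]:
            return layer["name"]
    return None


def move_up(layers, index):
    """Effect of the Up button: swap a layer with the one above it."""
    if index > 0:
        layers[index - 1], layers[index] = layers[index], layers[index - 1]
```

Changing the order of the list therefore changes which layer hides the others, without any analytical overlay being computed.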

When a layer is chosen in the layer list, the middle text field in the right part of the main window shows the editable query that generates the information for this layer. The upper text field displays the name of the layer while the lower field is used for displaying geometric restrictions on the data of a layer. After a query has been entered, the button Start Query establishes the database connection for this layer and then executes the query. After all data is read from the database it is displayed in the graphics window (figure 9.4).

Non-spatial thematic information for a layer (as explained above a layer should contain a certain kind of spatial thematic information) is displayed in tabular form by means of the button Show Result. By marking objects either in the graphic window or in the tabular result the marked objects are also marked in the other window, enabling an advanced analysis on thematic and spatial properties.

Special coloring support by means of color maps stored in the underlying database is included in the tool as explained in the concept above. These maps are automatically retrieved and used if available. An example of a display with color map using a strongly simplified version of rules for generating the German topographic base map as coloring can be found in figure 9.4. The Restrict function enables the user to select a rectangle in the graphics display; the original query is then automatically expanded by SQL code that only retrieves objects in the selected region. This selection is optimized by the database and thus index assisted (cf. section 8.3.2). Other standard functionality of the tool includes loading and saving queries as well as zoom in, zoom out and scrolling functions in the graphics display.

Figure 9.3: Screenshot of Main Window of Visualization Tool

1 This approach may not even be feasible since the elevation table may not fit into main memory.
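How a query might be expanded by a rectangular restriction can be sketched as follows. The wrapping of the original query into an inline view is an assumption made for illustration and not necessarily how the prototype does it; SDO_FILTER and the SDO_GEOMETRY constructor for an optimized rectangle are standard Oracle Spatial constructs, while the function name, the alias q and the sample query are hypothetical. Whether the spatial index can actually be used through such a wrapper depends on the original query.

```python
import re


def restrict_query(query, geom_column, xmin, ymin, xmax, ymax):
    """Wrap a layer query so that only objects in the selected rectangle remain.

    query must be a single SELECT statement without a trailing semicolon.
    """
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_$#]*", geom_column):
        raise ValueError("unexpected column name")
    # Coerce to float so that only numbers end up in the generated SQL.
    x1, y1, x2, y2 = (float(v) for v in (xmin, ymin, xmax, ymax))
    window = (
        "MDSYS.SDO_GEOMETRY(2003, NULL, NULL, "
        "MDSYS.SDO_ELEM_INFO_ARRAY(1, 1003, 3), "
        f"MDSYS.SDO_ORDINATE_ARRAY({x1}, {y1}, {x2}, {y2}))"
    )
    return (
        f"SELECT * FROM ({query}) q "
        f"WHERE MDSYS.SDO_FILTER(q.{geom_column}, {window}, "
        "'querytype=WINDOW') = 'TRUE'"
    )
```

SDO_FILTER performs only the index-based primary filter, so the result may contain objects whose approximations intersect the rectangle although the exact shapes do not; for display purposes this is usually acceptable.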

The tool is only a prototype, since some important requirements for a complete product are still missing. A complete overlay function is not provided but could be added in the database and then called by an appropriate query within the visualizer. The integration of the Restrict function has not been adapted to multiple databases. Moreover the generation of color maps and their storage is not supported by a graphical interface and is rather cumbersome at the moment. Finally advanced map generation features such as legends or predefined color maps for special domains are still missing.

9.2 Exchanging Non-Standard Data over the Web

The database support described in section 7.2.4 with the physical optimizations from section 8.3 is already useful for managing geospatial data within a single company or organization; the visualizer from section 9.1.2 allows remote data inspection. With the growing demand for exchange of data over the web, however, the system needs to be extended to enable web support for the stored data as another example of a STOSTA. As of now XML seems to be becoming the standard for exchange of data over the web. It allows the definition of the format of documents containing a certain kind of information according to a standard syntax (called Document Type Definition); a proposal for such a DTD for ATKIS data is presented in the next section along with some general ideas on how to create a DTD from an object-relational database schema. For the export and import of XML files according to the proposed DTD we present database programs in section 9.2.1 which finally facilitate the database-supported exchange of cartographic-topographic base data over the web. In the sequel a case study for web-based exchange of the already mentioned ATKIS data is presented, parts of which have been published in [KL01c]. Thereafter a reference to an extended algorithm automating the process of DTD generation, which has been published in [KL01a, KL02a], is given.

Figure 9.4: Screenshot of Graphics Window of Visualization Tool

9.2.1 Case Study: ATKIS

In this section we show how geographic data sets that are available in a proprietary file format only can be web-enabled by using an object-relational database system. In particular we introduce an XML-based document type definition for ATKIS data (cf. section 6.2) in order to exchange cartographic-topographic data sets over the internet. The generation of the XML file makes use of the optimizations on the physical level as described in section 8.3.2 since it operates with regular SQL queries. We describe how to generate files for data exchange according to the proposed DTD and try to derive some general rules on how to design DTDs for strongly structured data according to a database schema.

Motivation for XML-based data exchange

An interesting aspect of topographic-cartographic information in ATKIS, for which an object-relational schema was developed in section 6.2, is that the landscape model itself is almost completely separated from its presentation: there is an additional signature catalogue defining rules on how to generate standard maps from the ATKIS landscape model data. One could e. g. define different rules on how to create a large scale map and a small scale map using the same basic landscape model. This integrates very well with XML and XSL: these two languages can also be used to describe the data itself (XML) and its presentation for a certain audience (XSL). In this view ATKIS is still a conceptually very modern scheme and could therefore be used as a pattern in other countries as well.

Despite its conceptual advantages, obtaining and using ATKIS data is very inefficient. One has to define exactly which data extracts from a fixed set of possible ranges are desired in cooperation with the supplier, order the desired data sets and then operate on the file-based data. If the data is to be used in a different format, a parser and converter has to be written. For this task one needs to understand the grammar of the exchanged files in addition to knowledge about the ATKIS format itself.

In order to increase efficiency in the use of ATKIS data we propose ATKIS-ML and ATKIS-META-ML for future data interchange. These are XML DTDs, based on our database schema as well as the conceptual model from section 6.2. They can be used for interchange of ATKIS data sets and standardized reference information for ATKIS, respectively. In addition along the way we present some general ideas on how to generate DTDs for exchange of data from object-relational database schemata.

A DTD for Standardized Attributes

One of the most important products of the ATKIS working group was the standardization of a so-called object-kind catalogue. It groups all landscape objects into certain layers and object kinds. Moreover it defines certain standard attribute classes and standard values within these classes. Since these standards are very stable, one should not include the information in every single data set that is exchanged. Instead one needs to define an XML format for the exchange of these standards themselves, which can then be used as reference for a particular ATKIS-ML data file.

The derivation of an XML-based language for this part is straightforward from the object-relational schema in figure 6.3. We present its DTD called ATKIS-META-ML in figure 9.5.

The four relations from figure 6.3, Layer, ObjectKind, AttributeType and AttributeValue, contain standardized information. The first three of them are included within top-level elements in the DTD. Since possible values of attributes are different for different attribute types, we include the element StndrdAttributeValue as subelement of StndrdAttributeType. In general it is a good technique to map foreign key associations (as AttributeValue on AttributeType) to subelements in the DTD: the element modeling the primary relation gets subelements for all relations having foreign keys on it. This directly transfers the validity of the foreign key constraint from the database to a structural constraint in the XML document, since without a valid reference the parent element would not exist and therefore the child element could not be included either. That way we can model foreign key constraints within the same XML document, but not across different documents, as we will see later.

<!ELEMENT ATKISStandard (StndrdLayers, StndrdObjectKinds,
                         StndrdAttributeTypes)>

<!ENTITY % primKeyNo "No ID #REQUIRED">
<!ENTITY % nameAtt "Name CDATA #IMPLIED">

<!ELEMENT StndrdLayers ( StndrdLayer* ) >
<!ELEMENT StndrdObjectKinds ( StndrdObjectKind* ) >
<!ELEMENT StndrdAttributeTypes ( StndrdAttributeType* ) >

<!ELEMENT StndrdLayer EMPTY>
<!ATTLIST StndrdLayer %primKeyNo;>
<!ATTLIST StndrdLayer %nameAtt;>

<!ELEMENT StndrdObjectKind EMPTY>
<!ATTLIST StndrdObjectKind %primKeyNo;>
<!ATTLIST StndrdObjectKind %nameAtt;>

<!ELEMENT StndrdAttributeType ( StndrdAttributeValue* ) >
<!ATTLIST StndrdAttributeType %primKeyNo;>
<!ATTLIST StndrdAttributeType %nameAtt;>

<!ELEMENT StndrdAttributeValue EMPTY>
<!ATTLIST StndrdAttributeValue Value CDATA #REQUIRED>
<!ATTLIST StndrdAttributeValue %nameAtt;>

Figure 9.5: DTD for ATKIS-META-ML

The order of the three elements in the top level element is not important; it should be fixed in some way, however, to provide for better readability of the resulting XML file. Since the documents are derived from a database schema, they are very highly structured; this is reflected in the DTD by all lowest level elements being empty elements. That means there is no unstructured data needed in ATKIS-META-ML documents.

Since the meta-information to be exchanged by this DTD is standardized and stable, an XML document with this information should be generated once by a standardizing organization and then made available over the web for reference in particular datasets.
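The mapping of a foreign key association to a subelement can be illustrated by a minimal sketch. It is written in Python with the standard library only, which is an assumption of this illustration and not the language of our implementation; the sample rows, the codes and the letter prefix of the ID values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rows; the real catalogue contents are not reproduced here.
attribute_types = [(1, "building use"), (2, "surface material")]
attribute_values = [  # (value, name, number of the attribute type = foreign key)
    ("1100", "elementary school", 1),
    ("1200", "hospital", 1),
    ("2100", "asphalt", 2),
]

def build_attribute_types(types, values):
    """Nest each value row under the type row its foreign key points to."""
    root = ET.Element("StndrdAttributeTypes")
    by_no = {}
    for no, name in types:
        # ID values must be XML names, so the number gets a letter prefix.
        by_no[no] = ET.SubElement(
            root, "StndrdAttributeType", {"No": f"T{no}", "Name": name})
    for value, name, type_no in values:
        parent = by_no[type_no]  # a KeyError here is a violated foreign key
        ET.SubElement(parent, "StndrdAttributeValue",
                      {"Value": value, "Name": name})
    return root

tree = build_attribute_types(attribute_types, attribute_values)
xml_text = ET.tostring(tree, encoding="unicode")
```

A value row whose foreign key has no matching type row cannot be placed anywhere in the tree, which is the structural counterpart of the constraint in the database.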

<!ELEMENT ATKISContents ( ATKISObject*,
                          (Overpass | ComplexObject)* )>

<!ELEMENT ATKISObject ( ATKISObjectPart*, ObjectKind, Layer,
                        Position, ( ObjectName | ObjectAttribute )* )>

<!ATTLIST ATKISObject ObjectNo ID #REQUIRED>
<!ATTLIST ATKISObject Actuality CDATA #REQUIRED>
<!ATTLIST ATKISObject ModelType CDATA #REQUIRED>
<!ATTLIST ATKISObject BuildDate CDATA #IMPLIED>

Figure 9.6: ATKIS DTD top level part

A DTD for a Particular Cartographic Dataset

In this paragraph we present the most important parts of a DTD for the actual ATKIS data. The definition is generated directly from the database schema shown in figure 6.3. We therefore claim that some of the methods used in this derivation are generally applicable to the derivation of a DTD from a database schema. We will try to highlight these methods in the discussion.

After defining the root element for the DTD we define the document to consist of ATKISObjects, Overpasses and ComplexObjects as shown in figure 9.6. The class ATKISObject was the central class of the data model, which was reflected in the database schema by the relation ATKISObject having no foreign keys on other relations except for the relations Layer and ObjectKind containing values for standardized attributes. Since these are not stored in the ATKIS-ML document of a particular data set itself but rather in a different document providing standard values for ATKIS data as explained in the previous paragraph, we can disregard them for the moment. In general, a good starting point when defining a DTD from a database schema seems to be an element for each of the central relations (i. e. the relations without foreign keys). This is due to the fact that these relations in a sense model the most essential objects of the real world for this application. These objects should thus appear on a higher level in the hierarchical XML document, which means that the corresponding elements should be at a high level in the DTD. The other attributes of relation ATKISObject having standard data types are modeled as attributes of element ATKISObject.

The other two elements on the main level correspond to relations Overpass and ComplexObject. These relations were introduced to model the general n : m-associations of classes Object and ObjectGeometry, respectively. Such relations can only be modeled without redundancy on the top level in ATKIS-ML documents. Since they need to reference ATKISObjects we should include them after ATKISObject elements in the DTD, although this is not necessary if XLinks are used for references. More details can be found at the end of this paragraph.

Figure 9.7 shows how the standardized ATKIS object attributes from the first paragraph of this section are used. We propose to realize the references to the file containing standard attributes by use of XLinks ([W3C01a]). The object attribute type is defined by a simple XLink to the standards file. One would like to ensure that this link really points to an element of type StndrdAttributeType in that file, but to our knowledge that is currently not possible². The values of the object's attributes can be either simple XLinks to the standards file (element AttributeValueReference), in case these values are standardized (e. g. elementary school as value of building use), or any information in CDATA format for non-standardized attribute values (element ValueAttribute), e. g. the width of a canal. Again for integrity reasons one could imagine defining exactly for which attribute types standard values are expected and checking that in these cases a correct XLink is provided. As a general rule for the transformation of foreign key constraints in databases into XML DTDs we suggest using an XLink and/or an XPointer attribute. Validating XML parsers will then be able to check the correct specification of these constraints (but not that the pointer points to an element of the correct type, since such restrictions cannot be specified in a DTD without using additional techniques such as RDF, [W3C99], or XML Schema, [W3C01d, W3C01e]). The ATKISObject subelements Layer and ObjectKind are translated into the markup language in the same way as the object attributes and are therefore omitted.

<!ELEMENT ObjectAttribute
          ( AttributeValueReference | ValueAttribute ) >
<!ATTLIST ObjectAttribute
    xlink:type (simple) #FIXED "simple"
    xlink:href CDATA #REQUIRED
    xlink:role CDATA #FIXED "Reference Data"
    xlink:title CDATA "Object Attribute Type Pointer">

<!ELEMENT AttributeValueReference EMPTY>
<!ATTLIST AttributeValueReference
    xlink:type (simple) #FIXED "simple"
    xlink:href CDATA #REQUIRED
    xlink:title CDATA "Object Attribute Value Pointer">

<!ELEMENT ValueAttribute EMPTY>
<!ATTLIST ValueAttribute Value CDATA #REQUIRED>

Figure 9.7: ATKIS DTD attributes referencing standard data
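As an illustration of the two alternatives for attribute values, the following sketch builds ObjectAttribute elements with Python's standard library. The choice of Python, the helper name object_attribute and the fragment identifiers in the href values are assumptions of this sketch; only the element and attribute names are taken from figure 9.7.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"  # namespace URI of XLink
ET.register_namespace("xlink", XLINK)

def object_attribute(type_href, value=None, value_href=None):
    """Build an ObjectAttribute with either a referenced or a literal value."""
    if (value is None) == (value_href is None):
        raise ValueError("give exactly one of value / value_href")
    attr = ET.Element("ObjectAttribute", {
        f"{{{XLINK}}}type": "simple",
        f"{{{XLINK}}}href": type_href,
    })
    if value_href is not None:
        # standardized value: simple XLink into the standards file
        ET.SubElement(attr, "AttributeValueReference", {
            f"{{{XLINK}}}type": "simple",
            f"{{{XLINK}}}href": value_href,
        })
    else:
        # non-standardized value: plain character data
        ET.SubElement(attr, "ValueAttribute", {"Value": value})
    return attr

# Hypothetical link targets inside the standards file.
standardized = object_attribute("ATKISMeta.xml#T1",
                                value_href="ATKISMeta.xml#V1100")
free_form = object_attribute("ATKISMeta.xml#T7", value="12.5")
```

Note that nothing in this sketch (or in the DTD) checks that a link target exists or has the expected element type; that limitation is exactly the one discussed above.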

The object-relational concept of nested tables was used in the database schema to model the different positions at which a name of an ATKIS object should appear on a map. In the DTD shown in figure 9.8 this is translated as subelement: the element ObjectName consists of subelements Position which contain all information from the nested table Geo of relation ObjectName. In this case we require at least one position for each object name, since that was the way these positions were defined in the ATKIS standard ([KS95]). The other columns of table ObjectName are modeled as CDATA attributes of element ObjectName as before.

² This will change in the next version of our system. There we plan to make use of the XML Schema Language ([W3C01c]). Together with the XLink/XPointer Language ([W3C01a, W3C01b]) and the namespaces defined in XML Schema it will be possible to define such references and to enforce them by use of a validating XML parser.

<!ELEMENT ObjectName ( Position+ )>
<!ATTLIST ObjectName NameType CDATA #REQUIRED>
<!ATTLIST ObjectName NameValue CDATA #REQUIRED>

Figure 9.8: ATKIS DTD for nested table of names

We suggest transforming nested table types by introducing a new element type for the table's contents in general. This element type should be included as subelement (with cardinality * or + as desired) in the element corresponding to the relation containing the nested table type. The reason is that a nested table within another table already forms a hierarchy in itself. The corresponding hierarchical construct in XML is the subelement, which can occur with a cardinality since the nested table can also contain multiple entries for each row of the parent table. Therefore we obtain the translation of nested tables as subelements with cardinality * or + in general.

<!ELEMENT ComplexObject (ObjRef, ObjRef+)>
<!ATTLIST ComplexObject
    xlink:type (extended) #FIXED "extended"
    xlink:title CDATA #FIXED "Complex Object">

<!ELEMENT ObjRef EMPTY>
<!ATTLIST ObjRef
    xlink:type (locator) #FIXED "locator"
    xlink:href CDATA #REQUIRED
    xlink:title CDATA #IMPLIED
    xlink:role CDATA #IMPLIED>

Figure 9.9: ATKIS DTD for relation modeling n : m-association
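The translation rule for nested tables can be sketched as a small generator of DTD declarations. The function nested_table_dtd and its parameters are hypothetical and written in Python for illustration only; called with the names from figure 9.8 it reproduces the declarations shown there.

```python
def nested_table_dtd(parent, columns, nested, at_least_one):
    """Return DTD declarations for a relation with one nested table column.

    parent       -- element name for the relation
    columns      -- list of (column name, required?) for scalar columns
    nested       -- element name for the rows of the nested table
    at_least_one -- True gives cardinality +, False gives cardinality *
    """
    cardinality = "+" if at_least_one else "*"
    lines = [f"<!ELEMENT {parent} ( {nested}{cardinality} )>"]
    for name, required in columns:
        default = "#REQUIRED" if required else "#IMPLIED"
        lines.append(f"<!ATTLIST {parent} {name} CDATA {default}>")
    return "\n".join(lines)

dtd = nested_table_dtd(
    "ObjectName",
    [("NameType", True), ("NameValue", True)],
    "Position",
    at_least_one=True,
)
```

Whether + or * is chosen depends on whether the schema requires at least one row in the nested table, as it does for the positions of object names.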

Finally we need to show how relations in the database schema modeling n : m-associations from the class model are included in the DTD. As an example we present the DTD for relation ComplexObject in figure 9.9. The relation Overpass is modeled similarly. A complex object consists of one reference to the major object and references to the minor objects forming the complex object. Since references to an ATKIS object are used more than once we use a new element ObjRef. The complex object itself is an extended link (cf. [W3C01a]), since it consists of subelements which are links themselves. The object reference is modeled as a locator type XLink, since it is used to locate an ATKISObject element within the same document. In the data file this locator link will be formed using the XPointer ([W3C01b]) language. This has the advantage of minimal redundancy in the document, since the objects are stored only once and just references to them are used thereafter, as well as enforcement of the validity of the referenced objects to a certain degree. We are not able to impose restrictions on the XPointers that are as strong as those available in the database system, but valid references are still reasonably enforced.

The relation ObjectGeometry is included in the DTD in a straightforward manner (cf. figure 9.10): since the only attribute in the conceptual model was the geometry itself we can simply use a subelement of an object part for Geometry. We do so by using the DTD of the GML language ([OGC99b]³) proposed by the OpenGIS Consortium for the exchange of geographic information as additional external subset in the <!DOCTYPE>-clause of the produced XML documents. In particular we use element OGC::GeometryCollection from GML. This can be done as the data type MDSYS.SDO_GEOMETRY in Oracle9i is compliant with the OpenGIS SQL specification ([OGC99a]). In a future version the inclusion of element GeometryCollection from a URL at a standardizing organization could be done by using a different namespace for this element. By doing so one can ensure compliance with the most recent version of the defined standard.

<!ELEMENT ATKISObjectPart
          ( OGC::GeometryCollection, ObjectAttribute* )>

<!ATTLIST ATKISObjectPart ObjectPartNo CDATA #REQUIRED>

Figure 9.10: ATKIS DTD for ATKISObjectPart using GML ([OGC99b])

Generating XML Documents from the Database

After the database schema and the exchange language are fixed, we need tools to produce files in ATKIS-ML from data in the database; a simple sample XML document tree can be found in figure 9.11. Import tools that can read ATKIS-ML files and insert their contents into the corresponding database schema are also needed. Both tasks can be solved easily by using a database programming language or an external language which offers database support (like Java and JDBC). We have only implemented a tool to generate ATKIS-ML files from a database schema so far, but we feel that an import tool should be just as easy to implement if desired.

We decided to use a database programming language (PL/SQL for Oracle 8i/9i) to be able to use the advantages of a tight SQL integration into the language. The export tool for a particular data set works its way through the ATKIS-ML definition (e. g. it starts with ATKISObjects), obtaining the desired data from the database tables using cursors. It generates an output file in text format containing the desired ATKIS-ML tags for the elements detected. The export tool for the standardized information works in exactly the same way on ATKIS-META-ML instead of ATKIS-ML. We checked the correctness of the generated documents by using a validating XML parser; the documents were both well-formed and valid.
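The cursor-driven export can be sketched as follows. This is not the PL/SQL tool described above: Python with sqlite3 serves as a stand-in for Oracle, the sample rows are hypothetical, and only the attributes of ATKISObject from figure 9.6 are emitted. The required subelements are omitted, so the output is well-formed but would not be valid against the ATKIS-ML DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_objects(conn):
    """Walk the object table with a cursor and emit one element per row."""
    root = ET.Element("ATKISContents")
    cur = conn.cursor()
    cur.execute("SELECT ObjectNo, Actuality, ModelType "
                "FROM ATKISObject ORDER BY ObjectNo")
    for object_no, actuality, model_type in cur:
        ET.SubElement(root, "ATKISObject", {
            "ObjectNo": object_no,
            "Actuality": actuality,
            "ModelType": model_type,
        })
    return ET.tostring(root, encoding="unicode")

# In-memory stand-in for the database; values are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ATKISObject "
             "(ObjectNo TEXT PRIMARY KEY, Actuality TEXT, ModelType TEXT)")
conn.executemany("INSERT INTO ATKISObject VALUES (?, ?, ?)",
                 [("O1", "1999-05-01", "DLM25"),
                  ("O2", "2000-11-15", "DLM25")])

document = export_objects(conn)
parsed = ET.fromstring(document)  # raises ParseError if not well-formed
```

ElementTree only checks well-formedness; checking validity as described in the text requires a validating parser and the DTD.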

Both export procedures have a prototype flavor in that they are specialized to work with the database schema from section 6.2 and the markup languages from the preceding subsections. It is probably desirable to design a more general tool that generates markup language files given a database schema and the language DTD, or even a tool that generates the DTD as well from only the database schema. Instead of DTDs, XML Schema descriptions could also be used; ideas on XML Schema generation from database schemata are discussed in [Klo01] for instance. The XML files that we generated could be used by other geographic databases to import the cartographic base data that is supplied. These systems do not need to implement a specialized import tool for a special file format; instead they only have to be able to read the output document tree of a validating XML parser. Tools that perform this task are already available for some database systems.

³ In the future version which uses XML Schema instead of DTDs as explained above, we can use the recent version 2.1.1 of GML ([OGC02]) which is also based on XML Schema.

[Figure 9.11: tree diagram. Node labels include the root ATKISContents; ATKISObject 1, ATKISObject 2 and Overpass 1; ATKISObjectPart 1 and ATKISObjectPart 2; ObjectKind, Layer, Position, ObjectName, ObjAttribute 1, ObjAttribute 2, GeometryCollection, ObjectAttribute, AttributeValueReference and ValueAttribute; LowerObject and UpperObject; XLink edges lead to ATKISMeta.xml.]

Figure 9.11: Simple sample XML document tree conforming to ATKIS-ML DTD

If not the complete data but only data from a specific geographic area is to be exchanged, the queries (or cursors) in the export tool have to be modified. To select data from a certain region, only spatial window queries are necessary. These selections will be performed efficiently by the ORDBS if the methods from section 8.3 were used in the physical design of the STOSTA system for ATKIS data.

Sizes of XML-Files

At first glance one of the main drawbacks of exchanging XML files is their comparatively large size. Because of the annotations by tags and attributes, XML files are considerably larger than files in some proprietary file format containing the same information. Since data files tend to be large and bandwidth nowadays is still a concern (at least for really large datasets) one would like to keep the files to be exchanged as small as possible.

While this seems to favor domain specific formats, we can overcome this difficulty for XML data by using compression techniques. XML files are highly redundant since tag and attribute names are used repeatedly. Therefore they provide a good basis for standard as well as XML specific compression techniques. In table 9.1 we present some example results of file sizes computed on sample ATKIS data sets. We compare the original file size in EDBS (cf. section 6.2.1) to the XML format (using ATKIS-ML as presented in the previous section) as well as compressed sizes using gzip on EDBS and XML and Xmill ([LS00]) on XML files.


File Type            North Sea    Hannover    Nenndorf    Damme

EDBS                 10024 KB     5126 KB     1727 KB     2437 KB
XML                  26225 KB    15357 KB     4750 KB     6356 KB
EDBS + GZIP           1673 KB      744 KB      282 KB      415 KB
XML + GZIP            2182 KB      843 KB      386 KB      547 KB
XML + Xmill           1767 KB      619 KB      317 KB      440 KB

XML Compression       93.3%        96.0%       93.3%       93.1%
Ratio (XML:EDBS)      1.056        0.832       1.124       1.060

Table 9.1: Comparison of compressed and uncompressed Datafiles

The results in table 9.1 show that both EDBS and XML files can be compressed very well. Moreover we can see that the XML compression using Xmill is even greater than using gzip on either format. The compression ratio for gzip on XML files with high redundancy is better than on the proprietary EDBS files. The compression ratio of XML files using the specialized tool Xmill is very high, ranging from 93 % to 96 %. Finally the comparison of sizes of compressed EDBS and XML files shows that the overhead incurred by using XML is never greater than about 12 %; one file even became smaller using XML (by 17 %). Thus we can say that the portability, semantic information, ease of use and exchangeability gained by using XML over a proprietary format outweigh the possible increase in file size, since this increase (if any) is very moderate⁴. Finally we recommend using an XML based file format together with a specialized compression tool (Xmill is available for most widely used platforms) for spatial data exchange as opposed to using specialized formats, since these can usually only be used for very few applications and need conversion routines to be used in others.
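The last two rows of table 9.1 can be recomputed from the file sizes. The following Python sketch assumes, as the numbers suggest, that "XML Compression" is the size reduction of Xmill relative to the uncompressed XML file and that "Ratio (XML:EDBS)" relates the Xmill-compressed XML file to the gzip-compressed EDBS file.

```python
# File sizes in KB, copied from table 9.1.
sizes_kb = {
    "North Sea": {"EDBS": 10024, "XML": 26225, "EDBS+GZIP": 1673,
                  "XML+GZIP": 2182, "XML+Xmill": 1767},
    "Hannover":  {"EDBS": 5126, "XML": 15357, "EDBS+GZIP": 744,
                  "XML+GZIP": 843, "XML+Xmill": 619},
    "Nenndorf":  {"EDBS": 1727, "XML": 4750, "EDBS+GZIP": 282,
                  "XML+GZIP": 386, "XML+Xmill": 317},
    "Damme":     {"EDBS": 2437, "XML": 6356, "EDBS+GZIP": 415,
                  "XML+GZIP": 547, "XML+Xmill": 440},
}

def summarize(row):
    """Return (Xmill size reduction in percent, compressed XML:EDBS ratio)."""
    compression = 100.0 * (1 - row["XML+Xmill"] / row["XML"])
    ratio = row["XML+Xmill"] / row["EDBS+GZIP"]
    return round(compression, 1), round(ratio, 3)

summary = {name: summarize(row) for name, row in sizes_kb.items()}
```

For Nenndorf, for instance, this yields a reduction of 93.3 % and a ratio of 1.124, i. e. the overhead of roughly 12 % mentioned above.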

9.2.2 Generation of XML from OR Schemata

During the process of DTD generation for ATKIS data in the preceding section we have already discovered several general rules on how to generate DTDs from object-relational database schemata. It would be less system-specific if we were able to generate a DTD directly from the conceptual model. Consequently we have extended the ideas from the previous section to an algorithm for DTD generation from (E)ER schemata, i. e. conceptual schemata. Since it is not specific for STOSTA⁵, we do not describe it in detail here. It has been published in [KL01a, KL02a]. The main benefit is that a DTD for a specific domain, preserving as many constraints from the schema in structural constraints in the DTD as possible, can be generated independently of the implementation. While this is due to the algorithm working on a conceptual schema, a concrete implementation of the XML generation has to be adapted to the particular implementation. Details can be found in the above publications.

⁴ Note that the greatest overhead percentage occurred on the smallest file, but the overhead is more important on large files.

⁵ But nevertheless it is applicable to STOSTA since it works on the conceptual level.

9.3 Developing Scientific Applications

In this section an example is presented showing how the data types defined in part II of this work can be used in STOSTA for scientific applications. In particular a meta-model that could be used as a basis for a domain computation method management system in heterogeneous research environments is presented. This model is based on sample methods illustrated in the second case study in chapter 6, but may be used for different domains also.

9.3.1 Case Study: Physical Geography

In section 6.3.6 simple SQL queries on top of the presented database schema as well as other simple methods combining information from a single layer were explained. If data from multiple classes within a single information layer is used and combined, either one or multiple DML statements may be used. Some functions of this category require more complex computations and are thus implemented by PL/SQL functions in the database. This is due to their slightly more complex structure requiring cursors for computation of the desired attributes. The general structure of these methods is usually that first a cursor containing a join over the input tables is opened. Within the cursor loop the required values are selected and possibly aggregated⁶ before being stored in the result table. After that several updates may be required to adjust result attributes. Even though they are structurally simple, details tend to lengthen these methods and therefore only an excerpt is presented in figure 9.12. Due to the simple structure, prototypes can easily be generated by a system and then just have to be completed by an application developer.

More advanced methods may require overlaying information layers. In the case where both layers have geometric information in vector format, overlay can be performed by database functions for intersecting geometries; these in turn use the data types and optimizations presented in part II of this work. Functions performing the overlay are included in the query generating the cursor expression as shown in figure 9.13. Other than that these methods are implemented just as the methods within one information layer. Computations are performed within the cursor loop and result attributes are also written in that loop. Note that a spatial join on the base tables is required to compute the results; thus these methods will be computationally intensive as shown in section 8.4.

Also computationally intensive are methods combining data from rastered and vectorized information layers. To obtain the relationship between input attributes in a vector layer and attributes in a raster layer a spatial join is also required within a cursor statement. Since the result will also be in raster format the cursor declaration now looks as shown in figure 9.14.

⁶ This requires a domain-specific aggregation and is thus not expressible in current simple SQL queries.


CREATE OR REPLACE PROCEDURE calculate_ta10 IS
  CURSOR cur IS
    SELECT ra10.pnr, BP.krmm AS krmm, BP.nfkwe AS nfkwe,
           TA_NK.Bezeichnung AS nart
    FROM RasterAll10 ra10, BodenProfil BP, Nutzung N,
         BodenStandort BS, NutzungStandort NS,
         Anbau A, TA_NutzungKlassen TA_NK
    WHERE ra10.SNrBoden=BS.SNr AND
          ra10.SNrNutzung=NS.SNr AND
          BP.BPNr=BS.BPNr AND
          A.SNr=NS.SNr AND
          N.NNr=A.NNr AND
          N.Jahr=1996 AND
          N.NutzNr>=TA_NK.min AND
          N.NutzNr<=TA_NK.max;
  runner cur%ROWTYPE;
  dauer FLOAT;
BEGIN
  OPEN cur;
  LOOP
    FETCH cur INTO runner;
    EXIT WHEN cur%NOTFOUND;
    IF (runner.nart='AG') THEN -- grain
      IF (runner.krmm<1.5) THEN
        dauer:=0.14*runner.nfkwe+14.3;
      ELSIF ... -- other values for krmm
      ELSE dauer:=60.0;
      END IF;
    ELSIF (runner.nart='MA' OR runner.nart='ZS') THEN -- corn, sugar beets
      IF (runner.krmm<1.5) THEN
        ... -- other types of beneficial use
      END IF;
    END IF;
    UPDATE RasterAll10
    SET ta=dauer
    WHERE pnr=runner.pnr;
  END LOOP;
  CLOSE cur;
END;

Figure 9.12: Method for computation of capillary elevation (excerpt)

CURSOR cur IS
  SELECT B.SNr AS SNrBoden,
         N.SNr AS SNrNutzung,
         SDO_GEOM.SDO_POLY_INTERSECTION(B.Geometrie,D.diminfo,
                                        N.Geometrie,D.diminfo) AS Geometrie
  FROM BodenFlaeche B,
       NutzungFlaeche N,
       USER_SDO_GEOM_METADATA D
  WHERE D.table_name='BodenFlaeche' and
        MDSYS.SDO_FILTER(B.Geometrie,N.Geometrie,
                         'querytype = JOIN')='TRUE';

Figure 9.13: Cursor definition for overlay of vector layers


CURSOR cur IS
  SELECT a.PNr AS PNr, b.SNr AS SNr
  FROM Raster10 a, BodenFlaeche b
  WHERE MDSYS.SDO_RELATE(b.Geometrie, a.Rasterpunkt,
                         'mask=CONTAINS querytype=JOIN')='TRUE';

Figure 9.14: Cursor definition for overlay of raster and vector layer

Actually this cursor generates a table of spatially related objects from the vector and raster layer. If this relationship is used more than once (as is frequently the case in STOSTA), it is better to materialize it than to compute it several times: spatial joins are computationally intensive, so spending space on the materialization is usually preferable to executing the join again.
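The idea of materializing the relationship can be sketched with a small stand-in example. It uses Python with sqlite3 instead of Oracle, and a simple point-in-rectangle test replaces the SDO_RELATE operator; the coordinate columns, the sample rows and the name RasterBoden of the materialized table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Raster10 (PNr INTEGER PRIMARY KEY, x REAL, y REAL);
CREATE TABLE BodenFlaeche (SNr INTEGER PRIMARY KEY,
                           xmin REAL, ymin REAL, xmax REAL, ymax REAL);
INSERT INTO Raster10 VALUES (1, 5, 5), (2, 15, 5), (3, 25, 5);
INSERT INTO BodenFlaeche VALUES (100, 0, 0, 10, 10), (200, 10, 0, 30, 10);

-- Compute the raster/vector relationship once and store it ...
CREATE TABLE RasterBoden AS
  SELECT a.PNr AS PNr, b.SNr AS SNr
  FROM Raster10 a JOIN BodenFlaeche b
    ON a.x >= b.xmin AND a.x < b.xmax
   AND a.y >= b.ymin AND a.y < b.ymax;

-- ... so that later methods use a cheap key lookup instead of the join.
CREATE INDEX RasterBoden_PNr ON RasterBoden (PNr);
""")

pairs = conn.execute(
    "SELECT PNr, SNr FROM RasterBoden ORDER BY PNr").fetchall()
```

The materialized table has to be recomputed whenever one of the two layers changes, which is the price paid for avoiding repeated joins.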

Details and comprehensive examples as well as visualizations of the results for these applications can be found in [PKL00]. They are presented here to motivate the general model for application methods presented in the next section that could be used as basis for a domain method management system. This model will illustrate the benefit of the data types and physical optimizations described in chapters 7 and 8 of this work. While the spatial data types have already been used extensively, the next section will also show the importance of temporal and spatio-temporal data types in STOSTA.

9.3.2 General Model for Spatio-Temporal Applications

For interdisciplinary research activities, e. g. spatial decision processes in coastal zone management, a web-based information system is required. This information system should enable the user to get information about the data and methods supplied by the different participants and should allow authorized users to register new data and methods. To model such an information system we need to model the metadata about data and methods provided. In this sense we deal with a meta-meta-model. A lot of work about metadata has been published recently (e. g. [CHHS97] or [GL98]), but we feel that most of this work does not sufficiently address the modeling of spatio-temporal methods and their internal structure.

We present a conceptual meta-model for spatio-temporal evaluation methods in applications from geography and geobotany. For a future interdisciplinary web-based environmental information system, application specific methods from the field of soil protection (cf. second case study in section 6.3 and description of methods in [Mül97]) were first analyzed with respect to their spatial and temporal behavior. In particular the main entities (method, linking rule, domain attribute and layer) in the application methods were identified and their relationships modeled. After that the spatial and temporal behavior of each entity is examined and layers are used for domain attributes of different spatial or temporal granularity. In a concrete implementation these attributes will use the data types introduced in chapter 7 such as VT_INTEGER for temporal attributes and VT_2DGEOMETRY for spatio-temporal attributes. For an efficient implementation the optimizations from chapter 8 should be added. The model is validated briefly by showing how to implement the computation of infiltration water rate with our model (cf. section 6.3). Details of this and other validations by instantiating the proposed class model can be found in [Kle00]. Finally an extension of the model for applications in different domains or with different spatio-temporal requirements (such as continuous temporal information) is also possible.

Class Model for Evaluation Methods

An overview of the conceptual model showing the main classes of an application system is presented in figure 9.15. The model shows attributes only. This is not surprising since it is a meta-model whose classes model objects which are attributes, linking rules and methods themselves. Thus we cannot expect to have many class methods at that high level of abstraction. In figure 9.16 we present a structured analysis of the classes presented in the model overview. Here we focus on specializations and associations between the classes, resulting in few class attributes and class methods. This is again due to the high level of abstraction of the presented meta-model.

[Figure 9.15: class diagram. Classes and attributes: method (name, contents, planungsinstrument, asOf, source, usedForScale); linking rule (name, contents, asOf, source, commentary); domain attribute (kennwert, unitOfMeasurement); attribute layer (name, range, granularity). Associations: method uses linking rule; domain attribute is result of linking rule; domain attribute is input for linking rule; domain attribute is modeled in attribute layer.]

Figure 9.15: Overview of the class model for domain methods

A geo-application system basically consists of domain attributes, linking rules and methods. Some of the domain attributes are known in advance (e. g. by measurement), others are computed within or as results of the application-specific evaluation methods.

An evaluation method is a complete description of a calculation starting with some initial (often measured) domain attributes and resulting in the computation of an important other domain attribute. The initial attributes often have a different spatial and temporal resolution than the result. One main feature for a calculation to be called a method is that the computed result is of practical importance for the application domain; that means for instance that the domain specialist wants to create a map of the distribution and values of the computed attribute for the examined region.

A linking rule is also a complete description of a calculation but at a smaller granularity. A method is composed of several linking rules. In a linking rule the application system also combines certain domain attributes to obtain other domain attributes. In a regular linking rule there are no sub-computations: it simply combines input attributes in a defined manner to output attributes. This explains the cardinalities in figure 9.15. In the sequel we will analyze the class linking rule in more detail.
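The main classes and associations described above can be written down as a minimal sketch. Python dataclasses are used for illustration only; the class and attribute names follow figure 9.15 (a subset of the attributes), while the two minimum-cardinality checks and the helper measured_inputs are assumptions of this sketch and not part of the model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AttributeLayer:
    name: str
    range: str
    granularity: str

@dataclass
class DomainAttribute:
    kennwert: str
    unit_of_measurement: str
    layer: AttributeLayer           # association "is modeled in"

@dataclass
class LinkingRule:
    name: str
    inputs: List[DomainAttribute]   # association "is input for"
    result: DomainAttribute         # association "is result of"

    def __post_init__(self):
        if not self.inputs:
            raise ValueError("a linking rule needs at least one input")

@dataclass
class Method:
    name: str
    rules: List[LinkingRule]        # association "uses"

    def __post_init__(self):
        if not self.rules:
            raise ValueError("a method uses at least one linking rule")

    def measured_inputs(self):
        """Attributes consumed by some rule but produced by none of them."""
        produced = {id(rule.result) for rule in self.rules}
        seen, found = set(), []
        for rule in self.rules:
            for attr in rule.inputs:
                if id(attr) not in produced and id(attr) not in seen:
                    seen.add(id(attr))
                    found.append(attr)
        return found

# Hypothetical instance: two rules chained into one method.
layer = AttributeLayer("example layer", "study area", "10 m raster")
a, b, c, d = (DomainAttribute(k, "unit", layer) for k in "ABCD")
method = Method("example method",
                [LinkingRule("r1", [a, b], c), LinkingRule("r2", [c], d)])
```

In this instance A and B play the role of initial attributes, C is an intermediate result and D is the attribute the method is computed for.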


Detailed Model for Linking Rule

Structurally we can divide linking rules into classifying linkings, where the computed attribute is output according to a certain classification scheme, and exact linkings, where the results are values from a continuous domain.

[Figure 9.16: class diagram. Class linking rule (name, contents, asOf, source, commentary) with subclasses classifying linking (specialized into classifying computation and classifying table-lookup) and exact linking (specialized into exact computation and exact computation with table-lookup). Class domain attribute (kennwert, unitOfMeasurement) with the classes observed attribute, computed attribute and classified attribute. Associations: is result of, is input for, classifies.]

Figure 9.16: Specialization of linking rules by computational structure

The classifying linkings are a very common form of rules, both in the final stages of a method in order to produce well-visualizable results, and within a method because certain linkings are just not known exactly enough to be able to state mathematically exact formulas. These linkings are computed by using e. g. empirical values for certain input configurations from tables. The two different objectives lead to the two different subclasses in figure 9.16. Classifying Computations usually dominate within the former linkings: after an exact computation according to some formula the results are classified using a certain classification instance. This is useful for presentation purposes or to be able to apply some other non-exact linking rule in the method thereafter. Classifying Table-Lookups obtain the result attribute by reading it from a (possibly multi-dimensional) table using the input attribute values. The main reason for these rules is again domain-specific deficiencies in knowledge about real-world processes. For that reason one uses discrete empirical knowledge instead of formulas.

The exact linkings usually use exact formulas to compute the desired domain attributes. This kind of linking is the classical way to compute new features. Still this class can also be subdivided into pure exact computations and computations which are followed by a table-lookup. This table-lookup may classify the computed attribute to obtain a classified attribute or may also be used to look up certain necessary adjustments for the purely computed value.
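The difference between a classifying computation and a classifying table-lookup can be made concrete with a short sketch in Python. The formula is the one visible in figure 9.12 (grain, krmm below 1.5); the class boundaries, the class labels and the contents of the lookup table are hypothetical values chosen for the example.

```python
from bisect import bisect_right

def exact_duration(nfkwe):
    """Exact computation: formula taken from the excerpt in figure 9.12."""
    return 0.14 * nfkwe + 14.3

def classify(value, boundaries, labels):
    """Classification instance; needs len(labels) == len(boundaries) + 1.

    A value equal to a boundary falls into the upper class.
    """
    return labels[bisect_right(boundaries, value)]

# Hypothetical classification scheme and empirical table.
BOUNDARIES = [20.0, 40.0]
LABELS = ["low", "medium", "high"]
LOOKUP = {
    ("sand", "dry"): "low",
    ("sand", "wet"): "medium",
    ("clay", "dry"): "medium",
    ("clay", "wet"): "high",
}

def classifying_computation(nfkwe):
    """Exact formula first, classification of the result afterwards."""
    return classify(exact_duration(nfkwe), BOUNDARIES, LABELS)

def classifying_table_lookup(soil, moisture):
    """Read the class from a two-dimensional table of empirical values."""
    return LOOKUP[(soil, moisture)]
```

An exact computation with table-lookup would combine both steps in the opposite role: the table then supplies an adjustment for, or a classification of, the computed value.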


Domain attributes can be either some quantity observed in the real world, which we call observed attribute, or the result of some linking rule. In the latter case we use an instance of computed attribute. In either case attributes may be obtained from some classification. Therefore we introduce the class classified attribute. One reason for this new class is that one domain attribute may be classified by different classifications resulting in different classified attributes. The other is that classifications seem to play a major role in geo-applications and therefore their associations should be worked out in as much detail as possible.

Orthogonal to the structural classification of linking rules is a specialization by their spatial and temporal behavior (see figure 9.17). In that perspective there are at least five categories: temporally aggregating, spatially overlaying, spatially aggregating, spatially intersecting and spatio-temporally invariant linkings. In the applications we analyzed so far we found only those categories, but in a more general setting more categories may be important.

The first category of temporally aggregating linking rules transforms input data of high temporal resolution (e. g. time series of climate data) into aggregated data that can be linked with temporally invariant or low-resolution data in subsequent rules. The spatially overlaying rules combine input or derived data of different spatial resolution or distribution but equal spatial dimension into new attributes (cf. vector-vector computations in section 9.3.1). These typically obtain a higher spatial resolution than the input data (or the resolution of the highest-resolution input attribute, if that is a partition of the resolution of the other input attributes).
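As an illustration of a temporally aggregating linking, the following sketch reduces a finely resolved time series to one value per year. The choice of the arithmetic mean and the function name are assumptions made for this example; a concrete method may require sums, extrema or other aggregates instead.

```python
from collections import defaultdict
from datetime import date


def aggregate_yearly_mean(series):
    """Aggregate (date, value) observations into one mean value per year."""
    buckets = defaultdict(list)
    for day, value in series:
        buckets[day.year].append(value)
    # One aggregated value per year, in chronological order.
    return {year: sum(vals) / len(vals) for year, vals in sorted(buckets.items())}


# Hypothetical climate observations.
observations = [
    (date(2000, 1, 1), 2.0),
    (date(2000, 7, 1), 18.0),
    (date(2001, 1, 1), 4.0),
]
yearly = aggregate_yearly_mean(observations)
```

The aggregated result has yearly granularity and can therefore be linked with data such as land use that is updated only once a year.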

A spatially intersecting linking combines attributes over continuously-distributed spatial objects (i. e. vector data) with attributes over discretely-distributed space (i. e. raster data). In most applications the result of these rules is given in a discretely-distributed (meaning lower dimensional) form; this is not necessary, however, and one could also imagine results being spatial objects of uniform result values. Finally, the spatio-temporally invariant linking rules combine domain attributes of the same spatial and temporal resolution and distribution into new attributes. These rules usually occur in the earlier steps of a method when measured base data is transformed into important derived base data of the same granularity to be used in other linking rules. Non-spatial and non-temporal applications use only this type of linking rule.

A layer is used to augment the domain attributes with spatial and temporal information. A layer is a complete description of a domain attribute over a spatial and temporal range. Objects of class layer combine knowledge about a certain domain attribute with spatial and temporal information, resulting in knowledge of where and when a certain attribute has a certain value. A more detailed specialization of layers by the spatial and temporal information contained is shown in figure 9.18. The particular specialization can be used to determine the data types of the corresponding domain attributes. If, e. g., a numerical attribute is modeled in a timestamped⁷, non-spatial layer, the data type VT INTEGER as introduced in chapter 7 should be used. If the layer is timestamped and vectorized, a data type VT 2DGEOMETRY INTEGER should be used for the attribute. This type can be defined by the orthogonal combination of VT INTEGER and 2DGEOMETRY as explained in chapter 7. The physical optimizations for efficient query support can easily be added by using the extensible framework from chapter 8.

⁷ The applications we investigated used valid time only; bitemporal versions may also be used if required in the particular domain.

Figure 9.17: Specialization of linking rules by spatio-temporal behavior (linking rule is specialized into temporally aggregating, spatially overlaying, spatially intersecting, spatially aggregating and spatio-temporally invariant linking)

Figure 9.18: Introducing spatio-temporal data in layers (layer with name, range and granularity, specialized by temporal structure into non-temporal, timestamped and time-continuous layer and by spatial structure into rastered, vectorized and non-spatial layer; association "is modeled in" between domain attribute and layer with multiplicities 1..* and 1..1)
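The derivation of an attribute's data type from the specialization of its layer, as described above, can be sketched as a simple composition of name parts. The underscore-joined type names, the use of a point type for rastered layers and the function itself are assumptions made for illustration; the actual type definitions are those of chapter 7.

```python
def attribute_type(base_type, temporal, spatial):
    """Compose a data type name from the specialization of a layer.

    temporal: "non-temporal" or "timestamped"
    spatial:  "non-spatial", "vectorized" or "rastered"
    """
    parts = []
    if temporal == "timestamped":
        parts.append("VT")  # valid-time component
    elif temporal != "non-temporal":
        # Time-continuous layers are not covered by this sketch.
        raise ValueError(f"unsupported temporal structure: {temporal}")
    if spatial == "vectorized":
        parts.append("2DGEOMETRY")
    elif spatial == "rastered":
        parts.append("2DPOINT")  # hypothetical name for a point type
    elif spatial != "non-spatial":
        raise ValueError(f"unsupported spatial structure: {spatial}")
    parts.append(base_type)
    return "_".join(parts)
```

Because the temporal and the spatial structure are orthogonal, each contributes its name part independently of the other, which mirrors the orthogonal combination of types mentioned in the text.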

A layer can spatially describe domain attribute values which are valid in a certain spatial range with extension. Such layers and their domain attributes are usually modeled by a vector representation of the spatial extents (vectorized layer); this is achieved by using the data type 2DGEOMETRY or similar (cf. chapter 7). A layer could also contain attributes valid at certain discrete points in space, resulting in a rastered layer (e. g. using type 2DPoint), or it may describe a domain attribute whose validity is not spatially dependent, which we call a non-spatial layer.

Orthogonally, in the temporal dimension a layer can assemble domain attributes with values changing only at certain points in time; we call this a timestamped layer. This includes purely timestamped data (e. g. temperature at 12 noon every day) as well as data that is measured only at discrete points in time and is assumed to be constant (or linearly interpolated) in between (e. g. land use being updated every year). For domain attributes in these layers temporal data types such as VT INTEGER will be used. We can also have layers of temporally invariant attributes (non-temporal layer). All these types of layers operate on valid time only due to the requirements of our applications. Extensions to layers with continuously changing values (time-continuous layer) or bitemporal/transaction time information may be required in other applications. The transaction time would lead to a third orthogonal specialization of the class layer. These in turn might lead to additional specializations of linking rule as well.
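The two readings of a timestamped attribute mentioned above (constant between measurements, or linearly interpolated) can be made concrete with a small sketch. The representation as a sorted list of (time, value) pairs with numeric times and the function name are assumptions for this example and do not describe the internal structure of VT INTEGER.

```python
from bisect import bisect_right


def value_at(samples, t, interpolate=False):
    """Return the attribute value valid at time t.

    samples: list of (time, value) pairs sorted by time, with numeric times.
    """
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t) - 1  # index of the last sample at or before t
    if i < 0:
        raise ValueError("t precedes the first timestamp")
    t0, v0 = samples[i]
    if not interpolate or i == len(samples) - 1:
        return v0  # stepwise constant, also used after the last sample
    t1, v1 = samples[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical yearly measurements.
measurements = [(2000, 10.0), (2001, 20.0), (2003, 40.0)]
```

With these measurements, a query for mid-2000 yields the value measured at the start of 2000 under the stepwise reading, while a query for 2002 with interpolation yields the midpoint of the 2001 and 2003 values.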

The above model resulted from an analysis of only part of the applications which should later be included in the web-based domain information system. Even if extensions may be required for the final conceptual model, we claim that the most important features are accounted for. Moreover, the missing parts of the applications can be introduced into our model by the concept of specialization, similar to the above specifications.

An Example: Infiltration Water Rate

The infiltration water rate (cf. section 6.3) can be defined in terms of the proposed class model as shown in figure 9.19. A detailed description of the object model as an instantiation of the class model for this method can be found in [Kle00]; that object model shows the validity of the class model for this domain attribute. Since this is not the main topic of this work, details are omitted.

An overview of all required layers is given in figure 9.19. The three-dimensional information in BasicSoilData is spatially aggregated to obtain two-dimensional information in layer SoilData. This information is vectorized since it is valid for a spatially extended region. After that it is overlaid with the observed layer BeneficialUse, which leads to a new vectorized layer OverlayUseSoil that is now timestamped with the temporal information from BeneficialUse. This new layer is then overlaid with layer PotentialEvapoTranspiration, which was obtained by temporal aggregation from the observed layer Climate. The result of this overlay (BasicInfiltration) finally has to be intersected with the rastered observed layer of slope information. The final result layer InfiltrationWaterRate is consequently also rastered and carries the temporal information of layer BasicInfiltration.
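The way the temporal and spatial structure propagates through this chain of linkings can be traced with a small sketch. The class Layer and the three functions are hypothetical, the structure assigned to the observed layers is read from figure 9.19, and only the structure of each layer is modeled here, not the attribute values or the resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    temporal: str  # "no time" or "timestamped"
    spatial: str   # "vector" or "raster"


def _temporal(*layers):
    # The result is timestamped as soon as one input carries time.
    stamped = any(layer.temporal == "timestamped" for layer in layers)
    return "timestamped" if stamped else "no time"


def aggregate(name, src):
    """Spatial or temporal aggregation: structure kept, resolution not modeled."""
    return Layer(name, src.temporal, src.spatial)


def overlay(name, a, b):
    """Spatially overlaying linking of two vectorized layers."""
    if a.spatial != "vector" or b.spatial != "vector":
        raise ValueError("overlay expects two vectorized layers")
    return Layer(name, _temporal(a, b), "vector")


def intersect(name, vec, ras):
    """Spatially intersecting linking of a vectorized and a rastered layer."""
    if vec.spatial != "vector" or ras.spatial != "raster":
        raise ValueError("intersect expects a vector and a raster layer")
    return Layer(name, _temporal(vec, ras), "raster")


def infiltration_water_rate():
    basic_soil = Layer("BasicSoilData", "no time", "vector")
    use = Layer("BeneficialUse", "timestamped", "vector")
    climate = Layer("Climate", "timestamped", "vector")
    slope = Layer("Slope", "no time", "raster")

    soil = aggregate("SoilData", basic_soil)
    use_soil = overlay("OverlayUseSoil", soil, use)
    pet = aggregate("PotentialEvapoTranspiration", climate)
    basic = overlay("BasicInfiltration", use_soil, pet)
    return intersect("InfiltrationWaterRate", basic, slope)
```

Running the chain yields a timestamped, rastered result layer, which agrees with the description of InfiltrationWaterRate in the text.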

Conclusion

All the above computations belong to one single method whose purpose is the computation of the infiltration water rate. All the details of the linking rules are explained in [Pfa00], where they were implemented on top of Oracle 8i. We also verified our proposed model with other methods taken from [Mül97] as well as with some simple methods from geobotany. The conceptual model presented could be used as a basis for the development of an environmental information system for methods from physical geography, geobotany, land surveying and cartography. In such a future method management system, data types of domain attributes in intermediate and result layers can be automatically determined by the types of attributes in the observed layers and the definition of linking rules. Since spatial, temporal and spatio-temporal attributes are used, many of the data types described in chapter 7 will be used. The physical optimizations of chapter 8, which can to a large degree be added automatically by the system, will provide the end user with an easy-to-use system for defining domain-specific methods that in addition is efficiently supported by the ORDBS.

Figure 9.19: Overview of layers required to compute the infiltration water rate (layers shown: BasicSoilData and SoilData, no time, vector; Climate, BeneficialUse, PotentialEvapoTranspiration, OverlayUseSoil and BasicInfiltration, timestamped, vector; Slope, no time, raster; InfiltrationWaterRate, timestamped, raster; legend: observed, intermediate and result layer; aggregating, overlaying and intersecting linking)

The integration of modeling of dynamic processes as methods is not reflected in our model yet. It will be an important addition that facilitates applicability in a much wider range of applications, since many real-world processes are inherently dynamic. Yet many applications lack a formal description of the processes under research, so that a formal model for these would be difficult to develop.

Chapter 10

Summary and Outlook

This thesis started in part I with introductions to the topics which are the foundation for this work. For the more important parts such as physical modeling, the current status of research was also described. Thereafter part II started with the description of two motivating applications from the area of STOSTA. One was a scientific application, the other a scientific as well as commercial one. The important insights gained in these domains were obtained by close cooperation between computer science and domain experts. For future work such cooperation should be extended to other subjects as well. This requires willingness to cooperate on both sides. For the maximum benefit of both domain and computer science experts, cooperation should be pursued to a greater extent than so far. The application domains used in this work are positive exceptions, as can be seen from the presentation earlier.

After the description of the motivating applications, new data types to be used for STOSTA on the conceptual and logical levels were introduced. The basic idea was to combine temporal with domain-specific or spatial information into a single new data type. The advantage is that all information belonging to a certain feature of a real-world object can then be stored in a single attribute. Even though several different data types were already introduced, other applications will probably require other and more complex data types. The technique of combining different information domains into one data type can nevertheless be applied in a similar manner.

The largest part of this work focused on the physical implementation of the newly introduced data types for STOSTA in ORDBS. Whereas for pure 2D spatial data the system-provided indexes delivered optimal performance, the other data types have to be indexed by user-defined indexes, since no system-provided indexes are available and query performance is not acceptable without them. Since different indexes for each data type were tested and, moreover, indexes for different data types were required, an extensible indexing framework was used. This will also be helpful when even more data types are required for other applications, as explained above. This extensible framework had to be adapted to the interface for user-defined indexes provided by the ORDBS in order to minimize the work for every new data type and index. This connector tool (OraGiST) should be enhanced in a future version to better integrate the filter and refinement steps into query execution in order to get rid of the huge call overhead incurred by the external calls to database functions. This should improve performance in particular for data types with computationally complex operators.

Nevertheless, the results presented earlier already yield very good performance for selection queries on the data types introduced in this work. In particular the combination of the R*-tree with a distance-based penalty metric as known from the SS-tree led to good results for different data types. Consequently this technique seems to be very promising for selection queries. Future work should also experiment with other data types, query types and index structures. Especially the embedding of other highly specialized indexes into GiST and thus ORDBS could lead to better performance for specific data types or operators. Data type-specific joins need to be researched in greater detail, as user-defined indexes could also be used for their computation. But the extensibility features of ORDBS need vendor-side support for this kind of join processing.
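The idea of a distance-based penalty metric can be sketched as follows. This is a simplified illustration under assumptions, not the implementation evaluated in this work: keys are axis-parallel rectangles given as (lows, highs), and the center of the bounding rectangle stands in for the centroid that the SS-tree maintains per node. All function names are hypothetical.

```python
import math


def center(rect):
    """Center of an axis-parallel rectangle given as (lows, highs)."""
    lows, highs = rect
    return tuple((lo + hi) / 2.0 for lo, hi in zip(lows, highs))


def penalty(subtree_rect, new_rect):
    """Distance-based penalty: Euclidean distance between the two centers."""
    return math.dist(center(subtree_rect), center(new_rect))


def choose_subtree(entries, new_rect):
    """Pick the entry with the smallest penalty for inserting new_rect."""
    return min(entries, key=lambda rect: penalty(rect, new_rect))


# Two hypothetical subtree rectangles and a new point-like key.
entries = [((0.0, 0.0), (2.0, 2.0)), ((10.0, 10.0), (12.0, 12.0))]
target = choose_subtree(entries, ((9.0, 9.0), (9.0, 9.0)))
```

In contrast to the area or overlap enlargement commonly used for rectangle-based trees, such a penalty depends only on center distances, which makes it cheap to evaluate for keys of any dimension.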

Since the Z-Code indexes were not implemented in an extensible framework in this work, they could not be used for all data types. An extension of this technique, so that it could be used for different data types and thus operators, could be interesting. Multiple indexes on a single column will also be required for more complex data types in the future, since complex data types need different indexes for the many different operators. Until now commercial DBS do not permit multiple indexes on a single column. Permitting them would be a major addition, since cost and selectivity estimation have to be extended in that case as well: the optimizer would have more execution plans available than just using the one index or a full table scan as of today. Thus the discussion of cost and selectivity estimation will have to be intensified in the future, and possibly completely extensible optimizers will be required.
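For readers unfamiliar with Z-codes, the following sketch shows the common bit-interleaving construction for two non-negative integer coordinates. It illustrates the general principle only; whether the Z-Code indexes of this work use exactly this encoding is not stated here, and the function name and the default of 16 bits per coordinate are assumptions.

```python
def z_code(x, y, bits=16):
    """Interleave the low `bits` bits of x and y.

    Bits of x go to the even positions, bits of y to the odd positions.
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("coordinates out of range")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code
```

Since the resulting codes are plain integers, they can be stored in a conventional one-dimensional index. This also shows why the technique is tied to data types whose keys can be mapped to such grid coordinates.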

In part III of this thesis application development with ORDBS was examined. The data types evaluated in part II were used for the previously introduced domains, and other applications for these domains using ORDBS were also discussed. Many applications could be of great help for each of the different application domains, so it is impossible to discuss them all. Among the more urgent applications are an extension of the visualizer for spatial data to spatio-temporal data (cf. ideas in [CKR00]) and more user-friendly interfaces to interact with temporal data. This could be done on a snapshot basis, where the user can watch the development of the dataset over time or choose to display the database contents at a certain time. As mentioned earlier, transaction time information needs more advanced strategies for a user interface, since it should be transparent to the end user. Another important development would be the integration of the different components (visualization, domain-specific application and data exchange) into an integrated system. Also, based on the foundation presented in section 9.3.2, advancing spatio-temporal domain method management systems could be of great value.

Finally, the ideas on using user-defined data types for STOSTA based on ORDBS need to be better integrated into the whole development process and its tools. The definition of data types in the database can be automated to a great degree; this also holds for the definition of the other user-defined additions (indexes, cost estimation). Thus CASE tools should include the automatable parts in order to make the use of user-defined types as easy as possible for the application developer and consequently lead to a wider range of applications using these promising techniques.

Bibliography

[AAE00] P. K. Agarwal, L. Arge, J. Erickson: Indexing Moving Points. In PODS 2000 - Proceedings of the Nineteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS 2000, Dallas, May 15-17, 2000, ACM Press, Baltimore, MD, 2000, 175–186.

[ACN+99] F. Arcieri, C. Cammino, E. Nardelli, M. Talamo, A. Venza: The Italian Cadastral Information System: a Real-Life Spatio-Temporal DBMS (Extended Abstract). In M. Böhlen, C. Jensen, M. Scholl (eds.), Spatio-Temporal Database Management – Int. Workshop STDBM’99, Edinburgh, Sept. 10-11, 1999, LNCS 1678, Springer-Verlag, Berlin, 1999, 79–99.

[AdV02] Arbeitsgemeinschaft der Vermessungsverwaltungen der Bundesrepublik Deutschland. Dokumentation zur Modellierung der Geoinformationen des amtlichen Vermessungswesens (GeoInfoDok) Version 1.0. 2002.

[AG97] N. R. Adam, A. Gangopadhyay: Database Issues in Geographic Information Systems. The Kluwer Int. Series on Advances in Database Systems, Kluwer Academic Publishers, Boston, 1997.

[AN00] A. Aboulnaga, J. F. Naughton: Accurate Estimation of the Cost of Spatial Selections. In Sixteenth International Conference on Data Engineering (ICDE’00), 28 February - 3 March 2000, San Diego, IEEE Computer Society Press, Los Alamitos, CA, 2000, 123–134.

[Aok98] P. Aoki: Generalizing ’Search’ in Generalized Search Trees. In Fourteenth International Conference on Data Engineering (ICDE’98), Orlando, Febr. 23-27, 1998, IEEE Computer Society Press, Los Alamitos, CA, 1998, 380–391.

[APR+98] L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, J. S. Vitter: Scalable Sweeping-Based Spatial Join. In A. Gupta, O. Shmueli, J. Widom (eds.), VLDB’98 – Proceedings of the Twenty-fourth International Conference on Very Large Data Bases, New York, Aug 24-27, 1998, Morgan Kaufmann Publishers, San Francisco, 1998, 570–581.

[AR99] T. Abraham, J. F. Roddick: Survey of Spatio-Temporal Databases. GeoInformatica 3:1 (1999), 61–99.

[ASV99] L. Arge, V. Samoladas, J. S. Vitter: On Two-Dimensional Indexability and Optimal Range Search Indexing. In PODS 1999 - Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Philadelphia, May 31 - June 2, 1999, ACM Press, Baltimore, 1999, 346–357.

[ATSS93] K. K. Al-Taha, R. T. Snodgrass, M. D. Soo: Bibliography on Spatiotemporal Databases. SIGMOD Record 22:1 (1993), 59–67.

[AV96] L. Arge, J. S. Vitter: Optimal Dynamic Interval Management in External Memory. In I. C. Society (ed.), Proceedings of the 37th Annual Symposium on Foundations of Computer Science, FOCS ’96, IEEE Computer Society Press, 1996, 560–569.

[AYS01] N. An, Z.-Y. Yang, A. Sivasubramaniam: Selectivity Estimation for Spatial Joins. In 17th International Conference on Data Engineering (ICDE’01), 2-6 April 2001, Heidelberg, IEEE Computer Society Press, Los Alamitos, CA, 2001, 368–375.

[Bar00] N. Bartelme: Geoinformatik – Modelle, Strukturen, Funktionen – Third Edition. Springer-Verlag, Berlin, 2000.

[BBKM00] C. Böhm, S. Berchtold, H.-P. Kriegel, U. Michel: Multidimensional Index Structures in Relational Databases. Journal of Intelligent Information Systems 15:1 (2000), 51–70.

[BDF+99] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, N. Widmann: Spatio-Temporal Retrieval with RasDaMan. In M. Atkinson, M. Orlowska (eds.), VLDB’99 – Proceedings of the 25th Intl. Conference on Very Large Data Bases, Edinburgh, Sept 7-10, 1999, Morgan Kaufmann Publishers, San Francisco, 1999, 746–749.

[Bei01] F. Beier: Objekt-relationale Realisierung einer temporalen Datenbanksprache. Master’s thesis, Institut für Informatik, Universität Hannover, 2001.

[Ben77] J. L. Bentley: Algorithms for Klee’s Rectangle Problems. Technical report, Carnegie Mellon University, Pittsburgh, 1977.

[BF95] A. Belussi, C. Faloutsos: Estimating the Selectivity of Spatial Queries Using the ‘Correlation’ Fractal Dimension. In J. Paredaens, P. Peelman, L. Tanca (eds.), Merging Graph Based and Rule Based Computation, 1995.

[BG90] G. Blankenagel, R. H. Güting: Internal and External Algorithms for the Points-in-Regions Problem - the INSIDE Join of Geo-Relational Algebra. Algorithmica 5 (1990), 251–276.

[BJ96] M. H. Böhlen, C. S. Jensen: Seamless Integration of Time into SQL. Technical Report R-96-2049, Aalborg University, Department of Computer Science, 1996.

[BJS95] M. Böhlen, C. Jensen, R. Snodgrass: Evaluating the Completeness of TSQL2. In J. Clifford, A. Tuzhilin (eds.), Recent Advances in Temporal Databases – Proceedings of the International Workshop on Temporal Databases, Sept. 1995, Workshops in Computing, Springer-Verlag, Berlin, 1995, 153–174.

[BJS99] M. Böhlen, C. Jensen, M. Scholl (eds.): Spatio-Temporal Database Management – Int. Workshop STDBM’99, Edinburgh, Sept. 10-11, 1999. LNCS 1678, Springer-Verlag, Berlin, 1999.


[BJSS98] R. Bliujute, C. S. Jensen, S. Saltenis, G. Slivinskas: R-Tree Based Indexing of Now-Relative Bitemporal Data. In A. Gupta, O. Shmueli, J. Widom (eds.), VLDB’98 – Proceedings of the Twenty-fourth International Conference on Very Large Data Bases, New York, Aug 24-27, 1998, Morgan Kaufmann Publishers, San Francisco, 1998, 345–356.

[BJW00] C. Bettini, S. Jajodia, S. X. Wang: Time Granularities in Databases, Data Mining, and Temporal Reasoning. Springer-Verlag, Berlin, 2000.

[BKK96] S. Berchtold, D. Keim, H.-P. Kriegel: The X-tree: An Index Structure for High-Dimensional Data. In T. Vijayaraman, A. Buchmann, C. Mohan, N. Sarda (eds.), VLDB’96 – Proceedings of the 22nd International Conference on Very Large Data Bases, Sept. 3-6, 1996, Bombay, Morgan Kaufmann Publishers, San Francisco, 1996, 28–39.

[BKS93] T. Brinkhoff, H.-P. Kriegel, B. Seeger: Efficient Processing of Spatial Joins Using R-Trees. In P. Buneman, J. Sushil (eds.), Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, SIGMOD Record 2, ACM Press, New York, 1993, 237–246.

[BKSS90] N. Beckmann, H.-P. Kriegel, R. Schneider, B. Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. In H. Garcia-Molina, H. V. Jagadish (eds.), Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, ACM Press, New York, 1990, 322–331.

[BM00] T. Baumann, A. Mielich: Informix Dynamic Server.2000 – Architektur und DataBlade-Technologie. dpunkt-Verlag, 2000.

[Bro01] P. Brown: Object-Relational Database Development: A Plumber’s Guide. Prentice Hall, 2001.

[BSSJ99] R. Bliujute, S. Saltenis, G. Slivinskas, C. S. Jensen: Developing a DataBlade for a New Index. In M. Kitsuregawa, L. Maciaszek, M. Papazoglou, C. Pu (eds.), 15th International Conference on Data Engineering (ICDE’99), March 23-26, 1999, Sydney, IEEE Computer Society Press, Los Alamitos, CA, 1999, 314–327.

[Car02] J. R. Carr: Data Visualization in the Geosciences. Prentice Hall, 2002.

[CCD99] M. Carey, D. Chamberlin, D. Doole: O-O, What Have They Done to DB2? In M. Atkinson, M. Orlowska (eds.), VLDB’99 – Proceedings of the 25th International Conference on Very Large Data Bases, Edinburgh, Sept 7-10, 1999, Morgan Kaufmann Publishers, San Francisco, 1999, 542–553.

[CCF+99] W. Chen, J.-H. Chow, Y.-C. Fuh, J. Grandbois, M. Jou, N. Mattos, B. Tran, Y. Wang: High Level Indexing of User-Defined Types. In M. Atkinson, M. Orlowska (eds.), VLDB’99 – Proceedings of the 25th International Conference on Very Large Data Bases, Edinburgh, Sept 7-10, 1999, Morgan Kaufmann Publishers, San Francisco, 1999, 554–564.

[CDFO93] E. Clementini, P. Di Felice, P. v. Oosterom: A Small Set of Formal Topological Relationships for End-User Interaction. In D. Abel, B. C. Ooi (eds.), Advances in Spatial Databases, LNCS 692, Springer-Verlag, Berlin, 1993, 277–295.


[CDI+97] J. Clifford, C. Dyreson, T. Isakowitz, C. S. Jensen, R. T. Snodgrass: On the Semantics of ’Now’ in Databases. ACM Transactions on Database Systems 22:2 (1997), 171–214.

[Cha96] D. Chamberlin: Using the New DB2 – IBM’s Object-Relational Database System. Morgan Kaufmann Publishers, San Francisco, 1996.

[Che76] P. P. Chen: The Entity-Relationship Model – Towards a Unified View of Data. ACM Transactions on Database Systems 1:1 (1976), 9–36.

[CHHS97] S. Conrad, W. Hasselbring, A. Heuer, G. Saake (eds.): Engineering Federated Database Systems EFDBS’97 - Proceedings of the International CAiSE’97 Workshop, Barcelona, June 16-17, 1997. Preprint Nr. 6, Otto-von-Guericke-Universität Magdeburg, 1997.

[CKR00] M. Cai, D. Keshwani, P. Z. Revesz: Parametric Rectangles: A Model for Querying and Animation of Spatiotemporal Databases. In C. Zaniolo, P. C. Lockemann, M. H. Scholl, T. Grust (eds.), Advances in Database Technology – EDBT 2000 - 7th Int. Conference on Extending Database Technology, Konstanz, March 2000, LNCS 1777, Springer-Verlag, Berlin, 2000, 430–444.

[CR99] J. Chomicki, P. Z. Revesz: Constraint-based Interoperability of Spatiotemporal Databases. GeoInformatica 3:3 (1999), 211–243.

[CZ00] C. X. Chen, C. Zaniolo: SQL ST: A Spatio-Temporal Data Model and Query Language. In A. Laender, S. Liddle, V. Storey (eds.), Conceptual Modeling - ER 2000 - 19th Int. Conference on Conceptual Modeling, Salt Lake City, Utah, USA, October 9-12, 2000, LNCS 1920, Springer-Verlag, Berlin, 2000, 96–111.

[CZ01] A. B. Chaudhri, R. Zicari: Succeeding with Object Databases – A Practical Look at Today’s Implementations with Java and XML. Wiley & Sons, New York, 2001.

[Dat00] C. Date: An Introduction to Database Systems, Seventh Edition. Addison-Wesley, Reading, MA, 2000.

[dBvKOS97] M. de Berg, M. van Kreveld, M. Overmars, O. Schwarzkopf: Computational Geometry – Algorithms and Applications. Springer-Verlag, Berlin, 1997.

[DD98] C. J. Date, H. Darwen: Foundation for Object-Relational Databases – The Third Manifesto. Addison-Wesley, Reading, MA, 1998.

[DGS+90] D. J. DeWitt, S. Ghandeharizadeh, D. A. Schneider, A. Bricker, H. Hsiao, R. Rasmussen: The Gamma Database Machine Project. IEEE Transactions on Knowledge and Data Engineering 2:1 (1990), 44–62.

[Ede83] H. Edelsbrunner: A New Approach to Rectangle Intersection. International Journal of Computer Mathematics 13 (1983), 209–229.

[EEAK91] R. Elmasri, I. El-Assal, V. Kouramajian: Semantics of Temporal Data in an Extended ER Model. In H. Kangassalo (ed.), Proceedings of the 9th International Conference on the Entity-Relationship Approach - ER 1990, Elsevier Science Publishers, Amsterdam, The Netherlands, 1991, 239–254.


[EFKS00] M. Ester, A. Frommelt, H.-P. Kriegel, J. Sander: Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support. Data Mining and Knowledge Discovery 4:2/3 (2000), 193–216.

[EGH+92] G. Engels, M. Gogolla, U. Hohenstein, K. Hülsmann, P. Löhr-Richter, G. Saake, H.-D. Ehrich: Conceptual modelling of database applications using an extended ER model. Data & Knowledge Engineering 9:2 (1992), 157–204.

[EGSV98] M. Erwig, R.-H. Güting, M. Schneider, M. Vazirgiannis: Abstract and Discrete Modeling of Spatio-Temporal Data Types. In R. Laurini (ed.), Proceedings of the 6th international symposium on advances in geographic information systems, ACM Press, 1998, 131–136.

[EGSV99] M. Erwig, R. H. Güting, M. Schneider, M. Vazirgiannis: Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects in Databases. GeoInformatica 3:3 (1999), 269–296.

[EH90] M. Egenhofer, J. Herring: A Mathematical Framework for the Definition of Topological Relationships. In K. Brassel, H. Kishimoto (eds.), Proceedings of the 4th International Symposium on Spatial Data Handling (SDH 1990), International Geographical Union IGU, Columbus, Ohio, 1990.

[EJS98] O. Etzion, S. Jajodia, S. Sripada (eds.): Temporal Databases: Research and Practice. LNCS 1399, Springer-Verlag, Berlin, 1998.

[ELS97] G. Evangelidis, D. Lomet, B. Salzberg: The hB-tree: a multi-attribute index supporting concurrency, recovery and node consolidation. The VLDB Journal 6:1 (1997), 1–25.

[EN00] R. Elmasri, S. B. Navathe: Fundamentals of Database Systems (Third Edition). World Student Series, Addison-Wesley, Reading, MA, 2000.

[ES99] M. Erwig, M. Schneider: The Honeycomb Model of Spatio-Temporal Partitions. In M. Böhlen, C. Jensen, M. Scholl (eds.), Spatio-Temporal Database Management – Int. Workshop STDBM’99, Edinburgh, Sept. 10-11, 1999, LNCS 1678, Springer-Verlag, Berlin, 1999, 39–59.

[ESG99] M. Erwig, M. Schneider, R.-H. Güting: Temporal Objects for Spatio-Temporal Data Models and a Comparison of Their Representation. In Y. Kambayashi, D. Lee, E.-P. Lim, M. Mohania, Y. Masunaga (eds.), Advances in Database Technologies - ER’98 Workshops on Data Warehousing/Data Mining, Mobile Data Access, Collaborative Work Support, Spatio-Temporal Data Management, Singapore, 1998, LNCS 1552, Springer-Verlag, Berlin, 1999.

[EWK93] R. Elmasri, G. T. Wuu, V. Kouramajian: A Temporal Model and Query Language for EER Databases. In A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, R. Snodgrass (eds.), Temporal Databases: Theory, Design, and Implementation, Database Systems and Applications Series, Benjamin/Cummings, Redwood City, CA, 1993, 212–229.


[Fal00] S. Falke: Entwurf und Implementierung einer räumlichen Datenbank für ATKIS-Daten mit Datenimport. Studienarbeit, Institut für Informatik, Universität Hannover, 2000.

[FGNS00] L. Forlizzi, R. H. Güting, E. Nardelli, M. Schneider: A Data Model and Data Structures for Moving Objects Databases. In W. Chen, J. Naughton, P. A. Bernstein (eds.), Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, May 16-18, 2000, SIGMOD Record 2, ACM Press, New York, 2000, 319–330.

[For99] P. J. Fortier: SQL 3 - Implementing the SQL Foundation Standard. McGraw-Hill Enterprise Computing Series, McGraw-Hill, New York, 1999.

[GBEJL+00] R. Güting, M. Böhlen, M. Erwig, C. Jensen, N. Lorentzos, M. Schneider, M. Vazirgiannis: A Foundation for Representing and Querying Moving Objects. ACM Transactions on Database Systems 25:1 (2000), 1–42.

[GBG99] M. Gyssens, J. v. d. Bussche, D. v. Gucht: Complete Geometric Query Languages. Journal of Computer and System Sciences 58:3 (1999), 483–511.

[GdRS95] R.-H. Güting, T. de Ridder, M. Schneider: Implementation of the ROSE Algebra: Efficient Algorithms for Realm-Based Spatial Data Types. In M. J. Egenhofer, J. R. Herring (eds.), Advances in Spatial Databases – 4th Int. Symposium SSD’95, Portland, USA, August 1995, LNCS 951, Springer-Verlag, Berlin, 1995, 216–239.

[GG98] V. Gaede, O. Günther: Multidimensional Access Methods. ACM Computing Surveys 30:2 (1998), 170–231.

[GJ99] H. Gregersen, C. S. Jensen: Temporal Entity-Relationship Models – A Survey. IEEE Transactions on Knowledge and Data Engineering 11:3 (1999), 464–497.

[GL98] S. Göbel, K. Lutze: Development of meta databases for geospatial data in the WWW. In R. Laurini (ed.), Proceedings of the 6th international symposium on advances in geographic information systems, ACM Press, 1998, 94–99.

[GLL98] Y. J. Garcia, M. A. Lopez, S. T. Leutenegger: On Optimal Node Splitting for R-trees. In A. Gupta, O. Shmueli, J. Widom (eds.), VLDB’98 – Proceedings of the Twenty-fourth International Conference on Very Large Data Bases, New York, Aug 24-27, 1998, Morgan Kaufmann Publishers, San Francisco, 1998, 334–344.

[GMUW00] H. Garcia-Molina, J. D. Ullman, J. Widom: Database System Implementation. Prentice Hall, Upper Saddle River, NJ, 2000.

[GPSSO+99] O. Günther, P. Picquet, J.-M. Saglio, M. Scholl, V. Oria: Benchmarking spatial joins a la carte. International Journal of Geographical Information Science (IJGIS) 13:7 (1999), 639–655.

[GRS98] S. Grumbach, P. Rigaux, L. Segoufin: Spatio-Temporal Data Handling with Constraints. In R. Laurini (ed.), Proceedings of the 6th international symposium on advances in geographic information systems, ACM Press, 1998, 106–111.


[GRSS98] S. Grumbach, P. Rigaux, M. Scholl, L. Segoufin: DEDALE, A Spatial Con-straint Database. In S. Cluet, R. Hull (eds.),Database Programming Languages– 6th International Workshop, DBPL-6, Estes Park, August 1997, LNCS 1369,Springer-Verlag, Berlin, 1998, 38–59.

[GS95] R. H. Guting, M. Schneider: Realm-Based Spatial Data Types: The ROSE Alge-bra. VLDB Journal 4 (1995), 243–286.

[Gun98] O. Gunther: Environmental Information Systems. Springer-Verlag, Berlin, 1998.

[Gut84] A. Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. InB. Yormark (ed.),Proceedings of the 1984 ACM SIGMOD International Confer-ence on Management of Data, ACM Press, New York, 1984, 47–57.

[Gut94] R. H. Guting: An Introduction to Spatial Database Systems.The VLDB Jour-nal 3:4 (1994), 357–400.

[GY88] S. K. Gadia, C.-S. Yeung: A Generalized Model for a Relational TemporalDatabase. In H. Boral, P.-A. Larson (eds.),Proceedings of the 1988 ACM SIG-MOD International Conference on Management of Data, SIGMOD Record 3,ACM Press, New York, 1988, 251–259.

[Hel98] J. M. Hellerstein: Optimization Techniques for Queries with Expensive Methods. ACM Transactions on Database Systems 23:2 (1998), 113–157.

[HG94] G. Hake, D. Grünreich: Kartographie (7. Auflage). de Gruyter Lehrbuch, de Gruyter, Berlin, 1994.

[HJ97] Y.-W. Huang, N. Jing: Spatial Joins Using R-trees: Breadth-First Traversal with Global Optimizations. In M. Jarke, M. Carey, K. R. Dittrich, F. Lochovsky, P. Loucopoulos, M. A. Jeusfeld (eds.), VLDB’97 – Proceedings of the Twenty-third International Conference on Very Large Data Bases, Athens, Aug 26-29, 1997, Morgan Kaufmann Publishers, San Francisco, 1997.

[HJR97] Y. W. Huang, N. Jing, E. A. Rundensteiner: A Cost Model for Estimating the Performance of Spatial Joins Using R-trees. In Y. E. Ioannidis, D. M. Hansen (eds.), 9th International Conference on Scientific and Statistical Database Management, August 11-13, 1997, Olympia, Washington, IEEE Computer Society, 1997, 30–38.

[HKP97] J. M. Hellerstein, E. Koutsoupias, C. H. Papadimitriou: On the Analysis of Indexing Schemes. In PODS 1997 - Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Tucson, May 12-14, 1997, ACM Press, Baltimore, MD, 1997, 249–256.

[HNP95a] J. Hellerstein, J. Naughton, A. Pfeffer: Generalized Search Trees for Database Systems. In U. Dayal, P. M. Gray, S. Nishio (eds.), VLDB’95 – Proceedings of the 21th International Conference on Very Large Data Bases, Zurich, Sept. 11-15, 1995, Morgan Kaufmann Publishers, San Francisco, 1995.

[HNP95b] J. M. Hellerstein, J. F. Naughton, A. Pfeffer: Generalized Search Trees for Database Systems. Techn. Report 1274, University of Wisconsin, Madison, 1995.

[HP98] U. Hohenstein, V. Plesser: Oracle8 – Effiziente Anwendungsentwicklung mit objektrelationalen Konzepten. dpunkt-Verlag, Heidelberg, 1998.

[HR99] T. Härder, E. Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung. Springer-Verlag, Berlin, 1999.

[HT97] T. Hadzilacos, N. Tryfona: An Extended Entity-Relationship Model for Geographic Applications. SIGMOD Record 26:3 (1997), 24–29.

[IBM01] IBM Corporation: IBM DB2 Spatial Extender User’s Guide and Reference, Part No SC27-0701-01, 2001.

[Inf01] Informix Corporation: Working with the Geodetic and Spatial DataBlade Modules, 2001.

[JAS00] J. Jin, N. An, A. Sivasubramaniam: Analyzing Range Queries on Spatial Data. In 16th International Conference on Data Engineering (ICDE’00), San Diego, IEEE Computer Society Press, Los Alamitos, CA, 2000, 525–534.

[JBR99] I. Jacobson, G. Booch, J. Rumbaugh: The Unified Software Development Process. Addison-Wesley, Reading, MA, 1999.

[JCE+94] C. Jensen, J. Clifford, R. Elmasri, S. Gadia, P. Hayes, S. Jajodia: A Consensus Glossary of Temporal Database Concepts. SIGMOD Record 23:1 (1994), 52–63.

[JCG+92] C. Jensen, J. Clifford, S. Gadia, A. Segev, R. T. Snodgrass: A Glossary of Temporal Database Concepts. SIGMOD Record 21:3 (1992), 35–43.

[JD98] C. J. Jensen, C. E. Dyreson: The Consensus Glossary of Temporal Database Concepts - February 1998 Version. In O. Etzion, S. Jajodia, S. Sripada (eds.), Temporal Databases: Research and Practice, LNCS 1399, Springer-Verlag, Berlin, 1998, 367–405.

[JS82] G. Jaeschke, H.-J. Schek: Remarks on the Algebra of Non First Normal Form Relations. In ACM (ed.), Proceedings of the ACM Symposium on Principles of Database Systems, Los Angeles, March 1982, ACM, 1982, 124–138.

[JSS94] C. Jensen, M. Soo, R. Snodgrass: Unifying Temporal Data Models via a Conceptual Model. Information Systems 19:7 (1994), 513–547.

[KBS93] H.-P. Kriegel, T. Brinkhoff, R. Schneider: Efficient Spatial Query Processing in Geographic Database Systems. IEEE Data Engineering 16:3 (1993), 10–15.

[KF93] I. Kamel, C. Faloutsos: On packing R-trees. In B. K. Bhargava, T. W. Finin, Y. Yesha (eds.), Proceedings of the Second International Conference on Information and Knowledge Management, ACM Press, 1993, 490–499.

[KF94] I. Kamel, C. Faloutsos: Hilbert R-tree: An Improved R-tree using Fractals. In J. Bocca, M. Jarke, C. Zaniolo (eds.), Proceedings of the 20th Int. Conf. on Very Large Data Bases - VLDB, Sept. 1994, Santiago, Morgan Kaufmann Publishers, Palo Alto, CA, 1994, 500–509.

[KFL00] C. Kleiner, S. Falke, U. W. Lipeck: Verwaltung geographischer Basisdaten durch objekt-relationale Datenbanken am Beispiel von ATKIS und Oracle 8i. Informatik-Berichte DB-02/2000, Institut für Informatik, Universität Hannover, 2000.

[KGT99a] G. Kollios, D. Gunopulos, V. J. Tsotras: Nearest Neighbor Queries in a Mobile Environment. In M. Böhlen, C. Jensen, M. Scholl (eds.), Spatio-Temporal Database Management – Int. Workshop STDBM’99, Edinburgh, Sept. 10-11, 1999, LNCS 1678, Springer-Verlag, Berlin, 1999, 119–134.

[KGT99b] G. Kollios, D. Gunopulos, V. J. Tsotras: On Indexing Mobile Objects. In PODS 1999 - Proceedings of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Philadelphia, May 31 - June 2, 1999, ACM Press, Baltimore, MD, 1999, 261–272.

[KL00] C. Kleiner, U. W. Lipeck: Efficient Index Structures for Spatio-Temporal Objects. In A. Tjoa, R. Wagner, A. Al-Zobaidie (eds.), Eleventh International Workshop on Database and Expert Systems Applications (DEXA 2000; 4-8 September 2000, Greenwich, UK), IEEE Computer Society Press, Los Alamitos, 2000, 881–888.

[KL01a] C. Kleiner, U. W. Lipeck: Automatic Generation of XML DTDs from Conceptual Database Schemas. In K. Bauknecht, W. Brauer, T. Mück (eds.), Informatik 2001 – Wirtschaft und Wissenschaft in der Network Economy - Visionen und Wirklichkeit / GI/OCG-Jahrestagung Sept. 2001, Universität Wien - Band I, Oesterreichische Computer Gesellschaft, Wien, 2001, 396–405.

[KL01b] C. Kleiner, U. W. Lipeck: Temporale Daten in objekt-relationalen Datenbanksystemen. In M. Endig, T. Herstel (eds.), 13. GI-Workshop ’Grundlagen von Datenbanken’, Gommern, 5.-8. Juni 2001, Preprint 10, Otto-von-Guericke-Universität Magdeburg, 2001, 73–77.

[KL01c] C. Kleiner, U. W. Lipeck: Web-Enabling Geographic Data with Object-Relational Databases. In A. Heuer, F. Leymann, D. Priebe (eds.), Datenbanksysteme in Büro, Technik und Wissenschaft – 9. GI-Fachtagung Oldenburg, 7.-9. März 2001, Informatik aktuell, Springer-Verlag, Berlin, 2001, 127–143. BTW 2001.

[KL02a] C. Kleiner, U. W. Lipeck: Automatische Erzeugung von XML-DTDs aus konzeptionellen Datenbankschemata. Datenbank-Spektrum (2002), 14–22.

[KL02b] C. Kleiner, U. W. Lipeck: Natural and Efficient Modeling of Temporal Information with Object-Relational Databases. Informatik Berichte 01-2002, Institut für Informatik, Universität Hannover, 2002.

[KL02c] C. Kleiner, U. W. Lipeck: Performance of Querying Temporal Attributes in Object-Relational Databases. To appear in: Proceedings of TIME-2002.

[Kle00] C. Kleiner: A Metamodel for Spatio-Temporal Evaluation Methods in Geo-Applications. Institut für Informatik, Universität Hannover, 2000. Unpublished internal report.

[KLF00] C. Kleiner, U. W. Lipeck, S. Falke: Objekt-Relationale Datenbanken zur Verwaltung von ATKIS-Daten. In R. Bill, F. Schmidt (eds.), ATKIS - Stand und Fortführung – Beiträge zum 51. DVW-Seminar, 25 - 26 Sept 2000, Universität Rostock, Schriftenreihe des DVW 39, Konrad Wittwer, Stuttgart, 2000, 169–177.

[Klo99] S. Klopp. Implementation von R*-Bäumen als benutzerdefinierte Indexstruktur in Oracle 8i. Studienarbeit, Institut für Informatik, Universität Hannover, 1999.

[Klo01] S. Klopp: Ableitung von XML-formatierten Daten aus objekt-relationalen Datenbanken. Master’s thesis, Institut für Informatik, Universität Hannover, 2001.

[Klu98] G. Klump: Entwicklung, Implementierung und Evaluation von Strategien zur geometrischen Anfragebearbeitung unter Oracle 8. Master’s thesis, Ludwig-Maximilians-Universität München, 1998.

[KM00] M. Klettke, H. Meyer: XML and Object-Relational Database Systems - Enhancing Structural Mappings Based on Statistics. In D. Suciu, G. Vossen (eds.), Proceedings of the Third International Workshop on the Web and Databases, WebDB 2000, Dallas, Texas, USA, May 18-19, 2000, 63–68.

[KMS+00] N. Kitsios, C. Makris, S. Sioutas, A. Tsakalidis, J. Tsaknakis, B. Vassiliadis: 2-D Spatial Indexing Scheme in Optimal Time. In J. Stuller, J. Pokorny, B. Thalheim, Y. Masunaga (eds.), East European Conference on Advances in Databases and Information Systems, Prague, Czech Republic, September 5-9, 2000, LNCS 1884, Springer-Verlag, Berlin, 2000, 108–116.

[KNRW97] M. v. Kreveld, J. Nievergelt, T. Roos, P. Widmayer: Algorithmic Foundations of Geographic Information Systems. LNCS 1340, Springer-Verlag, Berlin, 1997.

[Koc94] H. Koch: Entwurf und Implementierung eines Informationssystems für ATKIS–Daten. Master’s thesis, TU Braunschweig, 1994.

[Kor99] M. Kornacker: High-Performance Extensible Indexing. In M. Atkinson, M. Orlowska (eds.), VLDB’99 – Proceedings of the 25th International Conference on Very Large Data Bases, Edinburgh, Sept 7-10, 1999, Morgan Kaufmann Publishers, San Francisco, 1999, 699–708.

[Kossl00] G. Kossler. Visualisierung räumlicher Daten auf Grundlage des objekt-relationalen DBMS Oracle 8i. Studienarbeit, Institut für Informatik, Universität Hannover, 2000.

[Kou94] V. Kouramajian. Incorporating Time in Databases. Tutorial, Information Technology Development, Rice University, 1994. Presented at ICTL’94, Bonn, Germany.

[KPS00] H.-P. Kriegel, M. Pötke, T. Seidl: Managing Intervals Efficiently in Object-Relational Databases. In E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, K.-Y. Whang (eds.), VLDB 2000 – Proceedings of the 26th International Conference on Very Large Data Bases, September 10-14, 2000, Cairo, Egypt, Morgan Kaufmann Publishers, San Francisco, 2000, 407–418.

[KRA02] R. K. Kothuri, S. Ravada, D. Abugov: Quadtree and R-tree Indexes in Oracle Spatial: A Comparison using GIS Data. To appear in: Proceedings of SIGMOD 2002.

[KS91] C. Kolovson, M. Stonebraker: Segment Indexes: Dynamic Indexing Techniques for Multi-Dimensional Interval Data. In J. Clifford, R. King (eds.), Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data, SIGMOD Record 2, ACM Press, New York, 1991.

[KS95] E. Kophstahl, H. Sellge (eds.): Das Geoinformationssystem ATKIS und seine Nutzung in Wirtschaft und Verwaltung, anlässlich des 2. AdV–Symposiums ATKIS, am 27. und 28. Juni 1995. Kopie, Niedersächsisches Landesverwaltungsamt, Hannover, 1995.

[KS97] N. Katayama, S. Satoh: The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries. In J. M. Peckman (ed.), Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, Arizona, May 13-15, 1997, SIGMOD Record 2, ACM Press, New York, 1997.

[KSH98] M. Kornacker, M. Shah, J. M. Hellerstein: amdb: An Access Method Debugging Tool. In L. Haas, A. Tiwary (eds.), Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, Washington, June 1-4, 1998, SIGMOD Record 2, ACM Press, New York, 1998.

[KTF98] A. Kumar, V. J. Tsotras, C. Faloutsos: Designing Access Methods for Bitemporal Databases. IEEE Transactions on Knowledge and Data Engineering 10:1 (1998), 1–20.

[Lam01] T. Lamping. Systematische Auswertung der Effizienz indexunterstützter räumlicher Anfragen objekt-relationaler Datenbanken. Studienarbeit, Institut für Informatik, Universität Hannover, 2001.

[LG00] J. A. C. Lema, R. H. Güting: Dual Grid: A New Approach for Robust Spatial Algebra Implementation. Informatik-Bericht 268, Fernuni Hagen, 2000.

[Löc99] U. Löckmann. Visualisierung räumlicher Daten auf Grundlage der Oracle Spatial Cartridge. Studienarbeit, Institut für Informatik, Universität Hannover, 1999.

[Löc01] U. Löckmann: Erweiterung eines objektrelationalen Datenbankmanagementsystems um verallgemeinerte Suchbäume als Indexstrukturen. Master’s thesis, Institut für Informatik, Universität Hannover, 2001.

[LR98] M. L. Lo, C. V. Ravishankar: The Design and Implementation of Seeded Trees: An Efficient Method for Spatial Joins. IEEE Transactions on Knowledge and Data Engineering 10:1 (1998), 136–152.

[LS00] H. Liefke, D. Suciu: XMill: An efficient compressor for XML data. In W. Chen, J. Naughton, P. A. Bernstein (eds.), Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, May 16-18, 2000, SIGMOD Record 2, ACM Press, New York, 2000, 153–164.

[LT92] R. Laurini, D. Thompson: Fundamentals of Spatial Information Systems. The APIC Series, Academic Press, London, 1992.

[LT98] C. Lee, T.-M. Tseng: Temporal Grid File: A File Structure for Interval Data. Data & Knowledge Engineering 26:1 (1998), 71–98.

[McC85] E. McCreight: Priority Search Trees. SIAM Journal on Computing 14:2 (1985), 257–276.

[Meh84] K. Mehlhorn: Data Structures and Algorithms, Vol. III: Multi-Dimensional Searching and Computational Geometry. EATCS Bulletin, Springer-Verlag, Berlin, 1984.

[MJS91] L. E. McKenzie Jr., R. T. Snodgrass: Evaluation of Relational Algebras Incorporating the Time Dimension in Databases. ACM Computing Surveys 23:4 (1991), 501–544.

[MTT00] Y. Manolopoulos, Y. Theodoridis, V. J. Tsotras: Advanced Database Indexing. The Kluwer Int. Series on Advances in Database Systems, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000.

[Mul97] U. Müller: Auswertungsmethoden im Bodenschutz. Niedersächsisches Landesamt für Bodenforschung (NLfB), Hannover, 1997.

[NDF99] Y. Nakamura, H. Dekihara, R. Furukawa: Spatio-Temporal Data Management for Moving Objects Using the PMD-Tree. In Y. Kambayashi, D. Lee, E.-P. Lim, M. Mohania, Y. Masunaga (eds.), Advances in Database Technologies - ER’98 Workshops on Data Warehousing/Data Mining, Mobile Data Access, Collaborative Work Support, Spatio-Temporal Data Management, Singapore, 1998, LNCS 1552, Springer-Verlag, Berlin, 1999.

[NK95] K. Neumann, H. Koch: Ein experimentelles Informationssystem für ATKIS-Daten. Nachrichten aus dem Karten- und Vermessungswesen 113:1 (1995), 179–190.

[NNS99] S. Nishida, H. Nozawa, N. Saiwaki: Proposal of Spatio-Temporal Indexing Methods for Moving Objects. In Y. Kambayashi, D. Lee, E.-P. Lim, M. Mohania, Y. Masunaga (eds.), Advances in Database Technologies - ER’98 Workshops on Data Warehousing/Data Mining, Mobile Data Access, Collaborative Work Support, Spatio-Temporal Data Management, Singapore, 1998, LNCS 1552, Springer-Verlag, Berlin, 1999.

[NP00] E. Nardelli, G. Proietti: Size Estimation of the Intersection Join between Two Line Segment Datasets. In J. Stuller, J. Pokorny, B. Thalheim, Y. Masunaga (eds.), Current Issues in Databases and Information Systems – East European Conference on Advances in Databases and Information Systems, Prague, Czech Republic, September 5-9, 2000, LNCS 1884, Springer-Verlag, Berlin, 2000, 229–238.

[NST99] M. A. Nascimento, J. R. O. Silva, Y. Theodoridis: Evaluation of Access Structures for Discretely Moving Points. In M. Böhlen, C. Jensen, M. Scholl (eds.), Spatio-Temporal Database Management – Int. Workshop STDBM’99, Edinburgh, Sept. 10-11, 1999, LNCS 1678, Springer-Verlag, Berlin, 1999.

[OGC99a] OpenGIS Consortium: OpenGIS Simple Features Specification for SQL Revision 1.1. Open GIS Consortium, 1999.

[OGC99b] OpenGIS Consortium: Geography Markup Language (GML) 1.0. OpenGIS Consortium Inc., 1999. OGC Request 11: Request for Comments 13-Dec-1999.

[OGC02] OpenGIS Consortium: OpenGIS Geography Markup Language (GML) Implementation Specification, Version 2.1.1. OpenGIS Consortium Inc., 2002.

[Ora99a] Oracle Corporation: Oracle 8i Documentation: Oracle8i Data Cartridge Developer’s Guide, Part No A76937-01, 1999.

[Ora99b] Oracle Corporation: Oracle 8i Documentation: Concepts, Part No A76965-01, 1999.

[Ora01] Oracle Corporation: Oracle 9i Spatial User’s Guide and Reference, Part No A88805-01, 2001.

[OM84] J. A. Orenstein, T. H. Merrett: A Class of Data Structures for Associative Searching. In Proceedings of the 4th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, New York, 1984, 181–190.

[Ore86] J. A. Orenstein: Spatial Query Processing in an Object-Oriented Database System. In C. Zaniolo (ed.), Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, ACM Press, New York, 1986.

[OS95] G. Özsoyoğlu, R. T. Snodgrass: Temporal and Real–Time Databases: A Survey. IEEE Transactions on Knowledge and Data Engineering 7:4 (1995), 513–532.

[PD96] J. M. Patel, D. J. DeWitt: Partition based spatial-merge join. In H. Jagadish, T. Merrett, I. Mumick (eds.), Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, June 4-6, 1996, SIGMOD Record 2, ACM Press, New York, 1996, 259–270.

[Pea90] G. Peano: Sur une courbe qui remplit toute une aire plane. Mathematische Annalen 36 (1890), 157–160.

[PF98] G. Proietti, C. Faloutsos: Selectivity Estimation of Spatial Queries for Line Segment Datasets. In G. Gardarin, J. C. French, N. Pissinou, K. Makki, L. Bouganim (eds.), Proceedings of the 1998 ACM CIKM International Conference on Information and Knowledge Management, ACM Press, 1998, 340–347.

[PF00] G. Proietti, C. Faloutsos: Analysis of Range Queries and Self-Spatial Join Queries on Real Region Datasets Stored Using an R-Tree. IEEE Transactions on Knowledge and Data Engineering 12:5 (2000), 751–762.

[Pfa00] J. H. Pfau: Entwurf und Implementierung eines Datenmodells für Bodendaten zur datenbankgestützten Integration von Methoden der physischen Geographie. Master’s thesis, Institut für Informatik, Universität Hannover, 2000.

[PFG00] N. Paton, A. Fernandes, T. Griffiths: Spatio-Temporal Databases: Contentions, Components and Consolidation. In A. Tjoa, R. Wagner, A. Al-Zobaidie (eds.), Eleventh International Workshop on Database and Expert Systems Applications (DEXA 2000; 4-8 September 2000, Greenwich, UK), IEEE Computer Society Press, Los Alamitos, 2000, 851–855.

[PKL00] J. H. Pfau, C. Kleiner, U. W. Lipeck: Implementierung von Auswertungsmethoden der physischen Geographie mit Hilfe objekt-relationaler Datenbanksysteme. Informatik-Berichte DB-01/2000, Institut für Informatik, Universität Hannover, 2000.

[PM97] A. Papadopoulos, Y. Manolopoulos: Performance of Nearest Neighbor Queries in R-trees. In F. Afrati, P. Kolaitis (eds.), Database Theory – ICDT ’97, 6th Int. Conference, Delphi, Jan. 8-10, 1997, LNCS 1186, Springer-Verlag, Berlin, 1997.

[Poi53] H. Poincaré: Oeuvres. Gauthier-Villars, Paris, 1953.

[Pot01] M. Pötke: Spatial Indexing for Object-Relational Databases. PhD thesis, Ludwig-Maximilians-Universität, München, 2001.

[PS85] F. Preparata, M. Shamos: Computational Geometry. Springer-Verlag, Berlin, 1985.

[PSTW93] B.-U. Pagel, H.-W. Six, H. Toben, P. Widmayer: Towards an Analysis of Range Query Performance in Spatial Data Structures. In Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems - 1993, ACM Press, Baltimore, MD, 1993.

[PSV99] C. H. Papadimitriou, D. Suciu, V. Vianu: Topological Queries in Spatial Databases. Journal of Computer and System Sciences 58:1 (1999), 29–53.

[PT98] D. Pfoser, N. Tryfona: Requirements, Definitions and Notations for Spatiotemporal Application Environments. In R. Laurini (ed.), Proceedings of the 6th international symposium on advances in geographic information systems, ACM Press, 1998, 124–130.

[Ram01] P. Ramsey: PostGIS Manual, 2001.

[Rip02] N. Ripperda. Realisierung eines numerisch robusten Data Cartridge für räumliche Daten für Oracle 9i auf Basis der ROSE-Algebra. Studienarbeit, Institut für Informatik, Universität Hannover, 2002.

[RJB99] J. Rumbaugh, I. Jacobson, G. Booch: The Unified Modeling Language – Reference Manual. Object Technology Series, Addison-Wesley, Reading, MA, 1999.

[RKS99] L. Relly, A. Kuckelberg, H.-J. Schek: A Framework of a Generic Index for Spatio-Temporal Data in CONCERT. In M. Böhlen, C. Jensen, M. Scholl (eds.), Spatio-Temporal Database Management – Int. Workshop STDBM’99, Edinburgh, Sept. 10-11, 1999, LNCS 1678, Springer-Verlag, Berlin, 1999, 135–151.

[RKV95] N. Roussopoulos, S. Kelley, F. Vincent: Nearest Neighbor Queries. In M. Carey, D. Schneider (eds.), Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, May 22-25, 1995, SIGMOD Record 2, ACM Press, New York, 1995, 71–79.

[RP92] J. F. Roddick, J. D. Patrick: Temporal Semantics in Information Systems – A Survey. Information Systems 17:3 (1992), 249–267.

[RSV01] P. Rigaux, M. Scholl, A. Voisard: Spatial Databases: With Application to GIS. Morgan Kaufmann Publishers, 2001.

[SA86] R. Snodgrass, I. Ahn: Temporal Databases. IEEE Computer 19:9 (1986), 35–42.

[SAC+79] P. Selinger, M. Astrahan, D. Chamberlin, R. A. Lorie, T. Price: Access Path Selection in a Relational Database Management System. In P. A. Bernstein (ed.), Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, ACM Press, New York, 1979, 23–34.

[Sar98] C. M. Saracco: Universal Database Management – A Guide to Object-Relational Technology. Morgan Kaufmann Publishers, San Francisco, 1998.

[SAW98] M. Sester, K.-H. Anders, V. Walter: Linking Objects of Different Spatial Data Sets by Integration and Aggregation. GeoInformatica 2:4 (1998), 335–358.

[SB97] H. Saurer, F.-J. Behr: Geographische Informationssysteme - Eine Einführung. Wiss. Buchgesellschaft, Darmstadt, 1997.

[SB99] M. Stonebraker, P. Brown: Object-Relational DBMSs – Tracking the Next Great Wave (Second Edition). The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, San Francisco, 1999.

[Sch97] M. Schneider: Spatial Data Types for Database Systems – Finite Resolution Geometry for Geographic Information Systems. LNCS 1288, Springer-Verlag, Berlin, 1997.

[SJ99] S. Saltenis, C. S. Jensen: R-Tree Based Indexing of General Spatio-Temporal Data. Technical Report TR-45, TimeCenter, 1999.

[SJLL00] S. Saltenis, C. S. Jensen, S. T. Leutenegger, M. A. Lopez: Indexing the Positions of Continuously Moving Objects. In W. Chen, J. Naughton, P. A. Bernstein (eds.), Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, May 16-18, 2000, ACM Press, New York, 2000.

[SKS97] A. Silberschatz, H. F. Korth, S. Sudarshan: Database System Concepts (3rd Edition). McGraw-Hill, New York, 1997.

[Slo99] T. A. Slocum: Thematic Cartography and Visualization. Prentice Hall, 1999.

[SM96] M. Stonebraker, D. Moore: Object-Relational DBMSs – The Next Great Wave – First Edition. The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, San Francisco, 1996.

[Sno87] R. Snodgrass: The Temporal Query Language TQuel. ACM Transactions on Database Systems 12:2 (1987), 247–298.

[Sno95] R. T. Snodgrass (ed.): The TSQL2 Temporal Query Language. The Kluwer Int. Series in Engineering and Computer Science, Kluwer Academic Publishers, Boston, 1995.

[SOL94] H. Shen, B. C. Ooi, H. Lu: The TP-Index: A Dynamic and Efficient Indexing Mechanism for Temporal Databases. In M. Rusinkiewicz (ed.), Proceedings of the 10th IEEE CS International Conference on Data Engineering - 1994, IEEE Computer Society Press, 1994.

[ST99] B. Salzberg, V. J. Tsotras: A Comparison of Access Methods for Time Evolving Data. ACM Computing Surveys 31:2 (1999), 158–221.

[Ste98] A. Steiner: A Generalisation Approach to Temporal Data Models and their Implementations. PhD thesis, ETH Zürich, 1998.

[SZ01] D. Shasha, Y. Zhu: SpyTime — a Performance Benchmark for Bitemporal Databases. Technical report, New York University, 2001. published online only: http://cs.nyu.edu/cs/faculty/shasha/spytime/spytime.html.

[TCG+93] A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, R. Snodgrass (eds.): Temporal Databases: Theory, Design, and Implementation. Database Systems and Applications Series, Benjamin/Cummings, Redwood City, CA, 1993.

[TH98] N. Tryfona, T. Hadzilacos: Logical Data Modeling of Spatio-Temporal Applications: Definitions and a Model. In B. Eaglestone, B. C. Desai, J. Shao (eds.), Proceedings of the 1998 International Database Engineering and Applications Symposium (IDEAS’98), July 8-10, 1998 - Cardiff, IEEE Computer Society Press, Los Alamitos, 1998, 14–23.

[TJ99] N. Tryfona, C. S. Jensen: Conceptual Data Modeling for Spatiotemporal Applications. GeoInformatica 3:3 (1999), 245–268.

[TJS98] V. J. Tsotras, C. S. Jensen, R. T. Snodgrass: An extensible notation for spatiotemporal index queries. SIGMOD Record 27:1 (1998), 47–53.

[TP95] Y. Theodoridis, D. Papadias: Range Queries Involving Spatial Relations: A Performance Analysis. In A. U. Frank, W. Kuhn (eds.), Spatial Information Theory - A Theoretical Basis for GIS. International Conference, COSIT’95, Semmering, September 1995, LNCS 988, Springer-Verlag, Berlin, 1995, 537–552.

[TS96] Y. Theodoridis, T. Sellis: A model for the prediction of R-tree performance. In PODS 1996 - Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Montreal, June 1996, ACM Press, Baltimore, MD, 1996, 161–171.

[TSN99] Y. Theodoridis, J. R. O. Silva, M. A. Nascimento: On the Generation of Spatiotemporal Datasets. In H. Güting, D. Papadias, F. Lochovsky (eds.), Advances in Spatial Databases – 6th International Symposium, SSD’99, Hong Kong, July 20-23, 1999, LNCS 1651, Springer-Verlag, Berlin, 1999, 147–166.

[TSPM98] Y. Theodoridis, T. Sellis, A. Papadopoulos, Y. Manolopoulos: Specifications for Efficient Indexing in Spatiotemporal Databases. Technical Report CH-98-01, CHOROCHRONONS Technical Reports, 1998.

[TSS98] Y. Theodoridis, E. Stefanakis, T. Sellis: Cost Models for Join Queries in Spatial Databases. In Fourteenth International Conference on Data Engineering (ICDE’98), Orlando, Febr. 23-27, 1998, IEEE Computer Society Press, Los Alamitos, CA, 1998, 476–485.

[TSS00] Y. Theodoridis, E. Stefanakis, T. Sellis: Efficient Cost Models for Spatial Queries Using R-Trees. IEEE Transactions on Knowledge and Data Engineering 12:1 (2000), 19–32.

[TVM98] T. Tzouramanis, M. Vassilakopoulos, Y. Manolopoulos: Overlapping Linear Quadtrees: a Spatio-Temporal Access Method. In R. Laurini (ed.), Proceedings of the 6th international symposium on advances in geographic information systems, ACM Press, 1998, 1–7.

[TYF86] T. J. Teorey, D. Yang, J. P. Fry: A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model. ACM Computing Surveys 18:2 (1986), 197–222.

[W3C99] W3Consortium. Resource Description Framework (RDF) Model and Syntax Specification. World Wide Web Consortium, 1999. W3C Recommendation 22-Feb-1999.

[W3C01a] W3Consortium. XML Linking Language (XLink) Version 1.0. World Wide Web Consortium, 2001. W3C Recommendation 27-Jun-2001.

[W3C01b] W3Consortium. XML Pointer Language (XPointer) Version 1.0. World Wide Web Consortium, 2001. W3C Candidate Recommendation 11-Sep-2001.

[W3C01c] W3Consortium. XML Schema Part 0: Primer. World Wide Web Consortium, 2001. W3C Recommendation 02-May-2001.

[W3C01d] W3Consortium. XML Schema Part 1: Structures. World Wide Web Consortium, 2001. W3C Recommendation 02-May-2001.

[W3C01e] W3Consortium. XML Schema Part 2: Datatypes. World Wide Web Consortium, 2001. W3C Recommendation 02-May-2001.

[WHL98] S. Wang, J. M. Hellerstein, I. Lipkind: Near-Neighbor Query Performance in Search Trees. Technical Report CSD-98-1012, University of California, Berkeley, 1998.

[Win98] S. Winter: Bridging Vector and Raster Representation in GIS. In R. Laurini (ed.), Proceedings of the 6th international symposium on advances in geographic information systems, ACM Press, 1998, 57–62.

[WJ96] D. A. White, R. Jain: Similarity Indexing with the SS-tree. In S. Y. Su (ed.), Twelfth International Conference on Data Engineering (ICDE’96), New Orleans, Feb 26 - Mar 1, 1996, IEEE Computer Society Press, Brüssel, 1996, 516–523.

[Wor94] M. F. Worboys: A Unified Model for Spatial and Temporal Information. The Computer Journal 37:1 (1994), 26–34.

[XHL90] X. Xu, J. Han, W. Lu: RT-tree: An improved R-tree index structure for spatiotemporal databases. In K. Brassel, H. Kishimoto (eds.), Proceedings of the 4th International Symposium on Spatial Data Handling (SDH 1990), International Geographical Union IGU, Columbus, Ohio, 1990, 1040–1049.

[YYW00] J. Yang, H. C. Ying, J. Widom: TIP: A Temporal Extension to Informix. In W. Chen, J. Naughton, P. A. Bernstein (eds.), Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, May 16-18, 2000, SIGMOD Record 2, ACM Press, New York, 2000.

[ZAT98] X. Zhou, D. J. Abel, D. Truffet: Data Partitioning for Parallel Spatial Join Processing. GeoInformatica 2:2 (1998), 175–204.

[ZCF+97] C. Zaniolo, S. Ceri, C. Faloutsos, R. Snodgrass, V. Subrahmanian, R. Zicari: Advanced Database Systems. The Morgan Kaufmann Series in Data Management, Morgan Kaufmann Publishers, San Francisco, 1997.

[ZdS98] G. Zimbrao, J. M. de Souza: A Raster Approximation for the Processing of Spatial Joins. In A. Gupta, O. Shmueli, J. Widom (eds.), VLDB’98 – Proceedings of the Twenty-fourth International Conference on Very Large Data Bases, New York, Aug 24-27, 1998, Morgan Kaufmann Publishers, San Francisco, 1998, 558–569.