final paper - many journeysgis.manyjourneys.com/geog582 - spatial databases... · web view... j. et...


Final Paper

GEOG 582 (Spring 2008)

Matt Stuemky

Submitted: May 12, 2008

University of Southern California

Department of Geography

Geographic Information Science & Technology (GIST) program


To what extent does MySQL 5.0 conform to the Relational Model as defined by Codd’s Twelve Rules?

Introduction

In 1970, IBM research scientist Dr. E.F. Codd introduced the relational data model, which had a

tremendous and immediate impact on the computer science field. Within a decade, his model found its way

into real-world implementations within many fields and industries, transforming the way people store and

access information in databases. Databases built on the fundamental structural, manipulation, and integrity

elements of Codd’s relational model have been commonplace since the 1980s.

In 1985, Codd published a paper listing twelve rules that define the ideal relational database. The

foundations of most database systems are based on Codd’s relational model and these twelve rules. To date,

no single relational database management system (RDBMS) satisfactorily conforms to all twelve of these rules,

including MySQL.

The following is a definition of each of Codd’s Twelve Rules, accompanied by a short analysis of how well MySQL 5.0 conforms to each one.

Rule 1: The Information Rule

All information stored inside a database must be logically represented in only one way: as values inside

tables. Each table consists of a grid of columns and rows. Each column (also known as a field or attribute) is a

property that describes a single, atomic piece of data. Each row (also known as a record or tuple) is a

collection of these columns of data and represents “a single fact” (Kalis, 2003) of information.

All data in a MySQL database is available at the logical level as values in tables. MySQL, like every other RDBMS, therefore fully conforms to Codd’s First Rule, a fundamental, essential aspect of the relational model.

Rule 2: Guaranteed Access Rule

Every atomic value in a database can easily be accessed with a table name, a primary key value, and a

column name. This rule emphasizes the fundamental aspect of the relational model, which requires the use of

primary keys to uniquely identify each row of data in a table.

MySQL fully conforms to Codd’s Second Rule: every value in a database can be uniquely identified and accessed by its table name, primary key value, and column name. Example:

SELECT Name FROM Country WHERE Country_ID = 8;

P a g e 2 | Matt Stuemky – Final Paper – GEOG 582 (Spring 2008)


Rule 3: Systematic treatment of Null values

A NULL value must be used to indicate unknown or missing values in a relational database. Whether

the column is defined to store text, numeric, Boolean, or other data, the absence of a value there must always

be represented by a NULL value.

MySQL 5.0 appears to conform to Codd’s third rule where the use of the NULL keyword literally means

no data. MySQL differentiates a NULL value from values such as 0 for numeric data types or the empty string

('') for string data types. However, several exceptions are noted in Appendix B of the MySQL Reference

Manual. Here it states, “For some data types, MySQL handles NULL values specially. If you insert NULL into a

TIMESTAMP column, the current date and time is inserted. If you insert NULL into an integer or floating-point

column that has the AUTO_INCREMENT attribute, the next number in the sequence is inserted.” (MySQL AB,

2008, B.1.5.3)
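The distinction between NULL, 0, and the empty string can be sketched with a few queries. The example below is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the City table, its columns, and its rows are hypothetical. The TIMESTAMP and AUTO_INCREMENT exceptions quoted above are specific to MySQL and are not reproduced here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
# The City table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE City (City_ID INTEGER PRIMARY KEY, Name TEXT, Population INTEGER)"
)
conn.executemany(
    "INSERT INTO City (City_ID, Name, Population) VALUES (?, ?, ?)",
    [(1, "Alpha", 0), (2, "", 500), (3, "Gamma", None)],
)

def ids(sql):
    """Return the first column of every row produced by the query."""
    return [row[0] for row in conn.execute(sql)]

# 0 and '' are ordinary values and match ordinary comparisons.
zero_population = ids("SELECT City_ID FROM City WHERE Population = 0")
empty_name = ids("SELECT City_ID FROM City WHERE Name = ''")

# A missing value is found only with IS NULL.
unknown_population = ids("SELECT City_ID FROM City WHERE Population IS NULL")

# Comparing with '= NULL' evaluates to unknown, so no row qualifies.
equals_null = ids("SELECT City_ID FROM City WHERE Population = NULL")
```

The row with a NULL population is returned only by the IS NULL query, which is the systematic treatment the rule asks for: a missing value is neither zero nor an empty string.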

Rule 4: Dynamic relational online catalog

Brawley and Fuller (2008) state “The method of interrogating the structure of the data must be

identical to the method of interrogating the data itself.” (Chapter 1) Metadata refers to data that is used to

describe data. In a relational database, this metadata must exist in order to describe the structure of the user-

defined tables and how they are related. The “online catalog” is essentially a data dictionary, and should be

able to be queried as a relational table in the same way as regular, user-defined tables.

MySQL implements Codd’s Fourth Rule by using system-defined tables, known specifically as

INFORMATION_SCHEMA tables, to provide access to database metadata (MySQL AB, 2008, Chapter 22).

These tables contain information on table structures, relationships between tables, and constraints.
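The idea that metadata is queried in the same way as data can be sketched as follows. This is a hedged illustration using Python's built-in sqlite3 module, whose catalog table is called sqlite_master; it is not MySQL's INFORMATION_SCHEMA, and the Country row is hypothetical. The comment shows what I would expect a comparable MySQL query to look like.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the sample row is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country (Country_ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Country VALUES (8, 'Exampleland')")

# An ordinary query against user data.
data_rows = conn.execute(
    "SELECT Name FROM Country WHERE Country_ID = 8"
).fetchall()

# A query against the catalog, written in exactly the same language.
# A comparable MySQL query would be along the lines of:
#   SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'mydb';
# where 'mydb' is a placeholder schema name.
catalog_rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
```

Both result sets come back as rows of a table, which is the point of the rule: no separate interface is needed to inspect the structure of the database.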

Rule 5: Comprehensive Data Sublanguage

There must be at least one comprehensive language for creating table structures, viewing and manipulating the data inside the tables, handling integrity constraints, establishing security, and performing transactional processing.

MySQL 5.0 uses the Structured Query Language (SQL), a non-procedural (declarative) language, to

satisfy the requirements of Codd’s Fifth Rule for a comprehensive data query language. The MySQL

implementation of SQL is very similar to, but has some slightly different syntax than, other RDBMS that also

use SQL. The user manual documentation also provides several specific examples where the MySQL

implementation syntactically differs from industry standard ANSI SQL and ODBC SQL but I don’t think this

violates Codd’s Fifth Rule in any way.


Rule 6: View Updating Rule

In a relational database a user can look at data directly in the tables, or the user can be permitted to

look at the data using views. A view is essentially a “virtual table” that is dynamically created using a query, by

specifying a combination of rows and columns, from one or more tables. A user can use a view to look at data

but the user should also be able to change the data, making the view capable of update operations, which

should automatically update the underlying tables.

Codd’s Sixth Rule seems to be the first rule where some RDBMS, including MySQL 5.0, fail to fully

conform. Although Chapter 21 of the reference manual indicates that views, including updatable views, are

implemented in MySQL 5.0, it seems that MySQL only supports view updates that use a single underlying

table. Digging deeper into the user manual, in the Restrictions on Views section, it states “You cannot use

UPDATE to update more than one underlying table of a view that is defined as a join.” and “You cannot use

DELETE to update a view that is defined as a join.” (MySQL AB, 2008, F.4.)

It seems then that MySQL 5.0 only partially conforms to Codd’s Sixth Rule because updatable views are not possible in all scenarios. This may be a SQL language based limitation more than anything else. The reference manual states that the intent is to comply with Codd’s Sixth Rule “as much as possible”, citing word-for-word his definition: “All views that are theoretically updatable, should in practice also be updatable.” (MySQL AB, 2008, 1.8.5.5.). To me, this implies that they’re giving themselves some wiggle room, a way to allow for the limitations in MySQL with updatable views.

Rule 7: High-level insert, update, and delete

The ability to perform powerful, even complex insert, update, and delete operations on sets of data

must be supported rather than just insert, update, and delete operations for a single record (row) at a time.

That is, data must be able to be retrieved from multiple rows in a single table or multiple rows from multiple

tables.

In MySQL 5.0, as with other RDBMS, SQL syntax provides the means by which full conformity with Codd’s Seventh Rule is achieved. SQL keywords such as JOIN and UNION can be used to combine select queries of multiple tables with insert, update, and delete operations.

Example:

UPDATE Table1 t1
JOIN Table2 t2 ON t1.ID = t2.t1ID
JOIN Table3 t3 ON t2.ID = t3.t2ID
SET t1.Value = 12345
WHERE t3.ID = 54321;


Rule 8: Physical data independence

Users and software applications should not need to see, depend upon, or even necessarily know about

the underlying physical data structure of a database. If the access methods change or underlying physical

storage changes (for example, how and where the data is stored on hard drives and/or optical drives), it

should be completely transparent to the users and applications.

MySQL 5.0 and other RDBMS prevent users and applications from having to worry about how and

where data is physically stored and therefore fully conform to Codd’s Eighth Rule. Indexes in MySQL can be

added or removed without disrupting the physical data structure of a database or affecting the user queries

(other than possible performance-related issues).

Rule 9: Logical Data Independence

The logical structure of a database can change, and most likely will need to change, over time. Users and applications should not be affected by any changes made to the logical structure, whether that means adding columns to or removing columns from a table, making two tables out of one, changing the relations between tables, etc.

In reality, full conformity to Codd’s Ninth Rule is difficult to achieve. In MySQL 5.0, as with other RDBMS, whether users and applications are affected by changes to the logical structure depends on the scope of the changes. For example, removing several columns from a table that are specifically referenced in a SELECT query will most likely cause an error.

Other more modest table structure changes, such as inserting columns between existing ones, or rearranging their order, can generally be done in MySQL 5.0 without causing a SELECT query to fail. Example: A table named Country is currently defined with columns in the following sequence: Country_ID, Name, Hemisphere. That logical structure is then changed by inserting a new column called President after Country_ID and rearranging the order so that Hemisphere comes before Name. A SELECT query that uses the column names in the original sequence, and does not reference the newly added column, should return the same results as before:

SELECT Country_ID, Name, Hemisphere FROM Country;
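The Country example can be tried out in a small sketch. This is only an approximation of the MySQL scenario: it uses Python's built-in sqlite3 module, which cannot reorder columns in place, so the table is rebuilt with the new layout instead (in MySQL the new column could be positioned with ALTER TABLE ... ADD COLUMN ... AFTER). The sample row and the Capital column are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); the sample row is hypothetical.
conn = sqlite3.connect(":memory:")
query = "SELECT Country_ID, Name, Hemisphere FROM Country"

# Original layout: Country_ID, Name, Hemisphere.
conn.execute(
    "CREATE TABLE Country (Country_ID INTEGER PRIMARY KEY, Name TEXT, Hemisphere TEXT)"
)
conn.execute("INSERT INTO Country VALUES (8, 'Exampleland', 'Northern')")
before = conn.execute(query).fetchall()

# Revised layout: President inserted after Country_ID, Hemisphere moved before Name.
# The table is rebuilt because SQLite cannot rearrange columns in place.
conn.execute("ALTER TABLE Country RENAME TO Country_old")
conn.execute(
    "CREATE TABLE Country (Country_ID INTEGER PRIMARY KEY, President TEXT, "
    "Hemisphere TEXT, Name TEXT)"
)
conn.execute(
    "INSERT INTO Country (Country_ID, Name, Hemisphere) "
    "SELECT Country_ID, Name, Hemisphere FROM Country_old"
)
conn.execute("DROP TABLE Country_old")
after = conn.execute(query).fetchall()

# A query that names a column the table does not have (the situation after a
# referenced column is removed) fails instead of adapting.
try:
    conn.execute("SELECT Country_ID, Name, Capital FROM Country")
    missing_column_error = None
except sqlite3.OperationalError as exc:
    missing_column_error = str(exc)
```

The query that refers to columns by name returns the same rows before and after the restructuring, while the query that depends on a missing column raises an error. That contrast is why conformity with this rule depends on the scope of the change.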

Rule 10: Integrity Independence

Data integrity must be part of the database itself, enforcing correctness and consistency of the data.

Any changes made to integrity constraints should not directly affect users or applications.

MySQL 5.0 partially conforms to Codd’s Tenth Rule. For example, basic features are part of the MySQL

RDBMS such as entity integrity to protect primary keys so they cannot have NULL values and referential

integrity to assure that all non-NULL foreign keys have a matching primary key value from the same domain.


However, Brawley and Fuller (2008) state that it remains possible to create tables which bypass both integrity constraints in MySQL 5.0 (although they did not elaborate how). In addition, triggers, which are programmable actions that reside in the RDBMS and are often used to help maintain integrity, are not as fully implemented in MySQL 5.0 as they are in other RDBMS. So while MySQL 5.0 has been substantially improved compared to earlier versions of MySQL for handling integrity constraints, applications may still need to perform their own integrity checks because MySQL 5.0 cannot fully provide them.
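Entity and referential integrity enforced by the database itself, rather than by the application, can be sketched as below. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in; the Country and City tables and their rows are hypothetical, and SQLite only enforces foreign keys once the pragma shown is switched on.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE Country (Country_ID INTEGER NOT NULL PRIMARY KEY, Name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE City ("
    " City_ID INTEGER NOT NULL PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Country_ID INTEGER REFERENCES Country (Country_ID))"
)
conn.execute("INSERT INTO Country VALUES (8, 'Exampleland')")
conn.execute("INSERT INTO City VALUES (1, 'Alpha', 8)")    # foreign key matches a primary key
conn.execute("INSERT INTO City VALUES (2, 'Beta', NULL)")  # a NULL foreign key is permitted

def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Referential integrity: no Country row has the key 99.
orphan_rejected = rejected("INSERT INTO City VALUES (3, 'Gamma', 99)")

# Entity integrity: primary key values must be unique.
duplicate_key_rejected = rejected("INSERT INTO Country VALUES (8, 'Duplicate')")

city_count = conn.execute("SELECT COUNT(*) FROM City").fetchone()[0]
```

The application code never checks the keys itself; the database refuses the bad rows. In MySQL 5.0, foreign keys are enforced only by storage engines that support them, such as InnoDB, which is one reason an application may still end up carrying its own checks.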

Rule 11: Distribution Independence

An RDBMS should be able to work with distributed databases, which includes the joining of data from

tables located on different servers (distributed queries) and from different RDBMS (heterogeneous queries).

Similar to Codd’s physical and logical data independence rules, the user and applications should not be aware

that a database is distributed.

The literature does not make it abundantly clear where MySQL 5.0 stands with regard to conformity with Codd’s Eleventh Rule, although the reference manual refers to the use of MySQL Cluster, in which a group of computers works together to support a distributed database (MySQL AB, 2008, 17.14.). Brawley and Fuller (2008) state that MySQL Version 5.0.3 introduced new storage engines which violate Codd’s rule but provide no further information. Overall, the implication is that MySQL 5.0 has at least some support for distributed databases, and therefore partially conforms to Codd’s rule, but that support is likely incomplete and somewhat “rough around the edges” to implement.

Rule 12: Non-subversion Rule

If the RDBMS supports a low-level language, it should not be allowed to circumvent (subvert) data

integrity and constraints, and otherwise violate any of the previous eleven rules. This non-subversion rule

relates more to database interfaces such as ADO and DAO (among others) that can manipulate a single record

at a time versus the higher level language such as SQL which can operate on multiple records at a time.

I was unable to dig up any substantive information on how MySQL 5.0 implements support for Codd’s

Twelfth Rule. Although not an expert with ADO and DAO interfaces, I have worked with both before, and can

easily believe that they and other low-level interfaces and languages such as C could be used to violate all

sorts of integrity rules and constraints in a relational database, once the connectivity had been established.

My best guess is that MySQL 5.0 does not provide protections from this or, if protection measures do exist,

they are limited at best and could fairly easily be circumvented.


Some Final thoughts

Based on this analysis of the capabilities of MySQL 5.0, it would seem that MySQL conforms to most, but certainly not all, of Codd’s Twelve Rules. The problematic ones for MySQL seem to be Rules 6, 9, 10, 11 and 12.

Codd also defined a “Rule 0” in addition to his famous Twelve Rules. He states, "For any system that is

advertised as, or claimed to be, a relational database management system, that system must be able to

manage databases entirely through its relational capabilities, no matter what additional capabilities the

system may support." (1990, pp. 16-17) In essence, he is saying that no matter what extra features might be

added to a DBMS, it can only be considered truly relational if it fully complies with the other Twelve Rules.

Codd and many others have stated that no fully relational database yet exists. Rather, he has described

features of a pure, idealized relational database. I do not believe any RDBMS will ever fully conform to Codd’s

Twelve Rules.


How have the assumptions of the Relational Model been relaxed to accommodate geographic data?

Introduction

In the strictest sense, geographic data does not lend itself to being stored inside a relational database.

Shekhar and Chawla (2003) discuss the abstraction of real-world spatial information into distinct, identifiable

entities – objects for representing real-world geographic features. Roads, houses, water pipes running

beneath these roads and houses, city parks, rivers, groves of giant Sequoia, National Forest boundaries – all

can be abstracted as objects, then broken down into simple geometries: points, lines, polygons, and

aggregations of these. In a GIS, these geometries can be stored in databases and then retrieved to be used on

digital maps, making them recognizable for the real-world geographic features they represent.

Because spatial data describes objects representing real-world geographic features, storing it inside an object-oriented database makes a lot of sense in theory. Storing spatial data in a relational database, traditionally used for business and transactional type data, does not. And yet, for a variety of reasons, OODBMS are not the favored choice among software vendors and the GIS industry. Instead, the solution has moved forcefully towards enhancing the relational model to accommodate spatial data, resulting in the creation of the object-relational model, which is manifested in object-relational database management systems (ORDBMS).

As a consequence, certain aspects of Codd’s original relational model, as well as his Twelve Rules

(never mind the other 333), have had to undergo amendments in order to allow various object-oriented

concepts such as objects, classes, methods, properties, and inheritance to co-exist within the relational

model’s logical structure of tables comprised of columns and rows. Additionally, those of the original Codd’s

Twelve Rules that are typically difficult to fully conform to in the pure relational model (Rules 6, 9, 10, 11 and

12) are likely even more challenging to conform to in an object-relational model.

For this exercise, a set of seven imaginary new rules, “Codd’s Rules for Spatial ORDBMS”, has been defined. Some are amendments to several of the original Codd’s Twelve Rules. These spatial ORDBMS “rules” incorporate major elements of Open Geospatial Consortium (OGC) standards and specifications for simple features and spatial SQL (Herring, 2006), spatial topologies, multi-dimensional indexes, as well as some other ideas. This set of “rules” is not meant to be complete but merely to identify some of the main elements that must be part of a spatial ORDBMS framework. They are meant to demonstrate how much the relational model has been relaxed in order to accommodate object-oriented spatial data, resulting in more complex RDBMS and the databases they house.

These “rules” will first be identified and defined below, and then three different vendor

implementations – Oracle Spatial, PostGIS, and ArcSDE – will be briefly considered.


“Codd’s Rules for Spatial ORDBMS”

1. Information rule – framework for storing geospatial data

The ORDBMS must be able to support the storage and retrieval of geospatial data as defined in each of the rules below. The ORDBMS must accommodate the storage of spatial objects by using multiple tables and spatial data types, as well as provide support for storing spatial metadata, spatial reference systems (coordinate systems) and transformations, multi-dimensional spatial indexes, and spatial relationships. Spatial data must be allowed to co-exist in the same database with non-spatial data.

2. Simple features and geometry spatial data types

Support for geometry data types, including points, lines, polygons, and aggregations of these, must exist. Composite objects, representing real-world geographic features, can be created by incorporating these base spatial data types. Instances of these objects must be able to be stored and managed within the logical framework of relational tables.

3. Comprehensive spatial data language (spatial SQL)

The ORDBMS must have a procedural language based on the Open Geospatial Consortium (OGC) Simple Features for SQL (SFSQL) specification. There must be methods for creating, updating, deleting, and viewing spatial data in an object-relational database. There must also be methods to perform “spatial analysis” including distance between two points, length of a line, area of a polygon, centroid of a polygon, buffer, etc.

4. Topologies and methods for querying spatial relationships

The ORDBMS must support spatial topologies by storing and managing shared geometry (node, edge, and face elements) and enforcing data integrity rules. The procedural language used in the ORDBMS must support OGC specifications by having operators and methods to evaluate spatial relationships (whether geometries are disjoint, intersect, touch, cross, overlap, are contained by one another, etc.).

5. Spatial access methods (SAMs)

Single attribute index methods such as B-Trees are not suitable for indexing spatial data. Multi-dimensional indexes must be available in the ORDBMS for efficient searches of spatial data.

6. Raster-based geospatial features

The ORDBMS must provide support for the storage and retrieval of raster-based geospatial features. This rule is not directly based on OGC standards or specifications.

7. Seamless storage, retrieval and exchange of spatial data across distributed systems

The ORDBMS must allow storage, retrieval and exchange of data in spatially-enabled databases (queries of both homogeneous and heterogeneous databases) across distributed systems, including network servers and the Internet for web-based systems. If necessary, special interfaces (APIs) should exist to provide seamless exchange of spatial data across these distributed systems.
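To make the “spatial analysis” methods named in rule 3 and the bounding-box idea behind the multi-dimensional indexes of rule 5 more concrete, here is a minimal sketch in plain Python. It assumes planar (x, y) coordinates with no coordinate system handling, all function names are hypothetical, and it is not how any of the products discussed below implement these operations.

```python
import math

def distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def line_length(points):
    """Length of a line given as an ordered list of (x, y) vertices."""
    return sum(distance(a, b) for a, b in zip(points, points[1:]))

def polygon_area(ring):
    """Area of a simple polygon via the shoelace formula.

    The ring is a list of (x, y) vertices and is closed implicitly.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def bbox(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def bbox_intersects(a, b):
    """True if two bounding rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical sample geometries.
square = [(0, 0), (4, 0), (4, 3), (0, 3)]
road = [(0, 0), (3, 4), (3, 10)]

point_gap = distance((0, 0), (3, 4))
road_length = line_length(road)
square_area = polygon_area(square)
square_box = bbox(square)
near = bbox_intersects(square_box, (3, 2, 6, 5))
far = bbox_intersects(square_box, (5, 5, 6, 6))
```

A bounding rectangle test compares two dimensions at once, which is why a single-attribute index such as a B-Tree is a poor fit and structures built around rectangles, such as R-Trees, are used instead.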


Oracle Spatial

Oracle Spatial is the most widely used enterprise spatial DBMS (SDBMS) with over 80% of the enterprise spatial database market (Steiner, et al., 2007, p. 7). It is an add-on product that integrates directly into the main Oracle RDBMS in order to spatially enable databases. It is intended to eliminate the need for separate, proprietary middleware products (ESRI ArcSDE or similar). It uses an object-relational model for storing vector-based spatial data and is fully compliant with the OGC Simple Features Specification, including the use of the geometry base types for creating and storing two-dimensional spatial objects. It allows both spatial and non-spatial data to co-exist within the same database. Although the OGC standard only supports two-dimensional geometries, it is worth noting that Oracle Spatial also supports complex three-dimensional geometries including point clouds (collections of points), surfaces, and solids using the SDO_GEOMETRY base type (Murray, 2007).

Oracle Spatial manages spatial relationships and provides a full set of operators and methods for querying them. In addition, the Oracle Spatial Topology Data Model is used to manage a topology, storing node, edge, and face elements. For multi-dimensional indexes, Oracle uses R-Trees as its primary spatial access method (SAM) but also supports Quadtree-based indexes. With Oracle Spatial, Oracle’s proprietary procedural language PL/SQL, a procedural extension of standard SQL, uses OGC SFSQL based syntax to create and query spatial data, as well as create the R-Tree indexes (Murray, 2007).

Oracle offers another add-on product called GeoRaster, which makes it possible to store geo-

referenced raster data, in addition to geometry-based vector data, in an Oracle database. While the

implementation is somewhat intricate, it definitely relies upon the object-relational model for

implementation, using a proprietary object type called GeoRaster: “An Oracle database containing…

GeoRaster tables, in which each image or raster grid is stored as a GeoRaster object in a row of a GeoRaster

column.” (Xie, 2007, p. 11). A product such as GeoRaster provides a definitive example of how far the

traditional relational model has been stretched beyond its original framework in order to accommodate the

storage and retrieval of complex spatial objects.

PostGIS

One of the most compelling aspects of the PostgreSQL ORDBMS is that it is a completely free, open-source product, which is in sharp contrast to the expensive products from Oracle. However, when it comes to spatially enabling PostgreSQL, the integration is similar to how Oracle Spatial works. PostGIS is the name of the extension (also free) for PostgreSQL, allowing both spatial and non-spatial data to co-exist seamlessly within the same database (Ramsey, 2007).


PL/pgSQL is the procedural language used in PostgreSQL, and it closely resembles the syntax of Oracle’s PL/SQL language. With the PostGIS extension, PL/pgSQL syntax can be used for creating and selecting spatial objects and functions, as specified in the OGC "Simple Features for SQL" (SFSQL) specification. PostGIS also provides the major OGC geometry relationship functions, including ST_Disjoint, ST_Intersects, ST_Touches, ST_Overlaps, and ST_Contains, as well as measurement functions such as ST_Distance. Like Oracle, PostGIS supports the OGC specification for inputting and outputting spatial objects into an ORDBMS using the Well-Known Text (WKT) and the Well-Known Binary (WKB) formats (Ramsey, 2007).
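As a rough illustration of what the Well-Known Text format looks like, the sketch below builds and reads WKT strings in plain Python. The function names are hypothetical and only the POINT and single-ring POLYGON cases are covered; in PostGIS itself the conversion is handled by functions such as ST_GeomFromText and ST_AsText.

```python
def point_wkt(x, y):
    """Format a single coordinate pair as a WKT point."""
    return f"POINT ({x} {y})"

def polygon_wkt(ring):
    """Format one outer ring as a WKT polygon.

    WKT rings must be closed, so the first vertex is repeated at the
    end if the caller did not already do so.
    """
    closed = ring if ring[0] == ring[-1] else ring + [ring[0]]
    coords = ", ".join(f"{x} {y}" for x, y in closed)
    return f"POLYGON (({coords}))"

def parse_point_wkt(text):
    """Read a WKT point back into an (x, y) tuple of floats."""
    body = text.strip()
    if not body.upper().startswith("POINT"):
        raise ValueError("not a WKT point: " + text)
    inner = body[body.index("(") + 1 : body.rindex(")")]
    x, y = inner.split()
    return float(x), float(y)

# Hypothetical sample geometries.
point_text = point_wkt(30, 10)
polygon_text = polygon_wkt([(0, 0), (4, 0), (4, 3), (0, 3)])
point_back = parse_point_wkt("POINT (30 10)")
```

Because WKT is plain text, it works as a vendor-neutral way to move geometries in and out of a database, while WKB carries the same information in a compact binary form.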

PostGIS provides full support for multi-dimensional indexing of spatial data. The PostGIS user manual

(Ramsey, 2007) indicates that PostGIS does not use the native implementation of R-Tree indexes used in

PostgreSQL because it was determined not to be as robust and efficient for indexing spatial data. Instead,

PostGIS uses an R-Tree index implemented on top of a GiST (Generalized Search Trees) index. Evidently GiST

can be used on a wide range of data types, including spatial data types, and is able to quickly evaluate “things

to one side, things which overlap, and things which are inside” (Ramsey, 2007, p. 23).

With regard to support for storing raster-based spatial data in the PostgreSQL ORDBMS, an extension to PostGIS is currently being developed. It is called PGRaster and uses “a generic raster data model that is component-based, logically layered, and multidimensional” (Lin & Keitt, 2007, p. 5). The design itself is apparently very similar to Oracle’s GeoRaster component. The OGC specifications do not currently provide for complex, raster-based spatial objects, and so PGRaster is a data type specific to the PostgreSQL ORDBMS.

The lengths to which vendors such as Oracle and PostgreSQL have gone to allow raster-based spatial data to be stored in ORDBMS were somewhat surprising, as I was unaware that such an implementation was even available in these ORDBMS solutions. The product documentation suggests that both proprietary implementations require a fairly substantial amount of storage space and numerous tables to store the data.

ESRI ArcSDE

ArcSDE technology is a spatial ORDBMS solution that is currently included with ESRI’s ArcGIS Server

and ArcGIS Desktop products. ArcSDE does not integrate directly into an RDBMS like Oracle Spatial and

PostGIS do. It is an external, middleware application. Although it operates outside an RDBMS, it is designed to

spatially-enable it, essentially transforming it into a spatial ORDBMS by taking advantage of the existing BLOB

data type for storing spatial data in relational tables. This spatial-enabling can be done with a variety of

RDBMS products, including ESRI’s own file-based geodatabases, SQL Server Express, and full-fledged

enterprise level RDBMS such as Oracle, DB2, and Informix. Like Oracle Spatial and PostGIS, ArcSDE manages

geographic features by using multiple tables inside the RDBMS for storing the spatial data, indexes, metadata,


and topologies. ArcSDE is the middleware layer that operates between the ArcGIS applications and the

RDBMS. (“What is ArcSDE?”, 2008)

ArcSDE does offer many of the same levels of conformity to “Codd’s Rules for Spatial ORDBMS” that

Oracle Spatial and PostGIS do. For example, ArcSDE Technology allows spatial and non-spatial data to co-exist

within the same database, it is open-standards based including OGC standards for simple features, and it uses

spatial SQL syntax that conforms to the SFSQL standard. (“What is ArcSDE?”, 2008)

However, ArcSDE does not seem to conform as fully to some of the “rules”, primarily by virtue of it

being an external application and not a true integrated solution within the RDBMS. For example, using the

BLOB data type for storing spatial data in an RDBMS that doesn’t have any native spatial handling support is a

workable solution, but these BLOB columns in relational tables are simply generic repositories for spatial data,

and ArcSDE has to manage all aspects of their storage. Regarding spatial access methods (SAMs) and spatial

topologies, ArcSDE provides support for these but to more fully conform to the “rules”, they should ideally be

an integrated part of a spatially-aware RDBMS rather than being managed externally. Server-integrated

solutions like Oracle Spatial and PostGIS probably provide a faster, more efficient means for managing and

using multi-dimensional indexes as well as interpreting spatial relationships.

Distributed systems and spatial data sharing

Conformity with each of the imaginary “Codd’s Rules for Spatial ORDBMS” has been briefly

considered with Oracle Spatial, PostGIS, and ArcSDE except for the last one: Seamless storage, retrieval and

exchange of spatial data across distributed systems.

There are various OGC distributed computing specifications for OLE/COM (Runnion, et al., 1999) and

CORBA (Gottier, et al., 1999). These older specifications define how application programming interfaces (APIs)

should handle the storage and retrieval of simple features (point, line, and polygon) only, allowing the

exchange of spatial data between applications and other RDBMS. In addition to providing support for some of

these specifications, Oracle Spatial, PostGIS, and ArcSDE also provide the important ability to perform

synchronized replication of spatial data between databases over local and wide area networks.

The Internet has rapidly become one of the primary mechanisms for the open exchange of spatial data.

Several newer OGC implementation specifications and standards for distributed computing have emerged to

help make this possible. These include the Web Feature Service (WFS) specification for simple feature

(vector-based) data (Vretanos, 2005), the Web Coverage Service (WCS) standard for raster-based data (Whiteside

& Evans, 2008), and the Geography Markup Language (GML) encoding standard for handling the exchange of

geospatial data (Portele, 2007).
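
As a rough illustration of how a client asks a WFS server for vector features over the Internet, the sketch below assembles a key-value-pair GetFeature request URL of the kind used with WFS 1.1.0. The endpoint, feature type name, and bounding box are hypothetical, and the coordinate order and reference system a real server expects are assumptions that would need to be checked against that server's capabilities document.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import parse_qs, urlencode, urlparse


def build_getfeature_url(
    endpoint: str,
    type_name: str,
    bbox: Sequence[float],
    max_features: Optional[int] = None,
) -> str:
    """Build a WFS 1.1.0 GetFeature request URL using key-value-pair encoding."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        # Four comma-separated values describing the search window.
        "BBOX": ",".join(str(v) for v in bbox),
    }
    if max_features is not None:
        params["MAXFEATURES"] = str(max_features)
    return endpoint + "?" + urlencode(params)


def query_parameters(url: str) -> Dict[str, str]:
    """Return the decoded query parameters of a request URL for inspection."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}


if __name__ == "__main__":
    # Hypothetical server and layer name, used only for illustration.
    url = build_getfeature_url(
        "http://example.com/wfs",
        "demo:roads",
        (-118.5, 33.9, -118.1, 34.2),
        max_features=10,
    )
    print(url)
    print(query_parameters(url))
```

A server that implements the specification would typically answer such a request with the matching features encoded in GML, which is what allows the receiving application to interpret them regardless of the database that stores them.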

To what extent have these OGC specifications for distributed computing been incorporated into the

three spatial ORDBMS solutions? As of version 11g, Oracle Spatial includes support for spatial web services

and data exchange that conforms to the WFS standard (Murray, 2007). PostGIS is actively being used in some

prominent open-source web-based mapping projects, including GeoServer and MapServer, which use the open

WFS specification as well as the popular Web Map Service (WMS) specification (Ramsey, 2007). ArcSDE

also provides interfaces into distributed, web-based systems such as these.

As for how “seamless” this integration is, the OGC standards probably help facilitate the process

considerably, but web-based exchange of geospatial data is still a fairly new technology that likely suffers from

speed and performance issues compared to exchanging data between databases across local networks. For

example, recent real-world tests of transmitting data from databases to web-based servers using WFS and

GML indicated that ArcSDE was considerably slower, and was described as a true performance bottleneck,

compared to direct database integration solutions like Oracle Spatial and PostGIS (Getman & Dollins, 2008, p. 6).

Some final thoughts

The “Codd’s Rules for Spatial ORDBMS” identify major elements that substantially disrupt the

traditional relational model in order to accommodate geospatial data. Of course, it is important to remember

that geospatial data isn’t the only reason that RDBMS have evolved into ORDBMS, but the GIS industry

(vendors, developers, and users) has certainly benefitted tremendously from it.

Oracle Spatial, PostGIS, and ArcSDE are all effective spatial ORDBMS solutions. However, their

object-oriented nature and implementation of many of the OGC standards and specifications have forced RDBMS to

evolve considerably into a framework far more complex and, one might argue, considerably less elegant than

the original relational model. As far as being a satisfactory, long-term answer for storing spatial data, it seems

a safe bet that ORDBMS will prevail for quite a while.

It remains to be seen if another database model will come along to shake things up the way Codd did

back in 1970 with his original, elegant relational model. Perhaps the notion of a true object-oriented

database, and full-fledged OODBMS, will finally gain favor and eventually supplant the recent ORDBMS trend.

In the information technology field, including GIS, it is always important to keep in mind that change is a

constant. Database models will continue to evolve.

References

MySQL AB. (2008). MySQL 5.0 Reference Manual. Retrieved May 6, 2008 from http://dev.mysql.com/doc/refman/5.0/en/

Brawley, P. & Fuller, A. (2008). Get It Done With MySQL 5. Retrieved May 6, 2008 from http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch01.html

Kalis, F. (2003, Dec 10). Codd’s Rules. Retrieved May 6, 2008 from http://www.sqlservercentral.com/articles/Advanced/coddsrules/1208/

Codd, E.F. (1990). The Relational Model for Database Management, Version 2. Addison-Wesley Longman Publishing Company, Inc.

Shekhar, S. & Chawla, S. (2003). Spatial Databases: A Tour. Pearson Education, Inc.

Herring, J.R. (Ed.). (2006, October 5). OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common Architecture. Open Geospatial Consortium, Inc. Retrieved May 8, 2008 from http://www.opengeospatial.org/standards/sfs

Herring, J.R. (Ed.). (2006, October 5). OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 2: SQL option. Open Geospatial Consortium, Inc. Retrieved May 8, 2008 from http://www.opengeospatial.org/standards/sfs

Murray, C. et al. (2007). Oracle Spatial Developer's Guide, 11g Release 1 (11.1). Oracle. Retrieved May 8, 2008 from http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28400.pdf

Xie, Q., Sharma, J., & Ihm, J. (2007). Oracle Spatial 11g GeoRaster: An Oracle Technical White Paper. Oracle. Retrieved May 8, 2008 from http://www.oracle.com/technology/products/spatial/pdf/11g_collateral/spatial11g_georaster_twp.pdf

Steiner, J. et al. (2007). Managing Unstructured Data with Oracle Database 11g – An Oracle White Paper. Oracle. Retrieved May 8, 2008 from http://www.oracle.com/technology/products/database/oracle11g/pdf/database-11g-unstructured-data-whitepaper.pdf

Ramsey, P. (Ed.). (2007). PostGIS Manual. Refractions Research. Retrieved May 8, 2008 from http://postgis.refractions.net/docs/postgis.pdf

Lin, X. & Keitt, T. (2007). PGRaster – A Coverage/Raster Model and Operations for PostGIS. PostgreSQL Global Development Group. Retrieved May 8, 2008 from http://lists.refractions.net/pipermail/postgis-devel/attachments/20070711/b6dfd192/PGRASTER_20070710-0001.pdf

What is ArcSDE? (2008, March 20). ESRI, Inc. Retrieved May 10, 2008 from http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=An_overview_of_ArcSDE_geodatabase_administration

Runnion, E., Beddoe, D., et al. (1999, May 18). OpenGIS Simple Features Specification for OLE/COM – Revision 1.1. Open Geospatial Consortium, Inc. Retrieved May 10, 2008 from http://www.opengeospatial.org/standards/sfo

Gottier, B., Beddoe, D., et al. (1999, June 2). OpenGIS Simple Features Specification for CORBA – Revision 1.1. Open Geospatial Consortium, Inc. Retrieved May 10, 2008 from http://www.opengeospatial.org/standards/sfc

Vretanos, P.A. (Ed.). (2005, May 3). Web Feature Service Implementation Specification. Open Geospatial Consortium, Inc. Retrieved May 10, 2008 from http://www.opengeospatial.org/standards/wfs

Whiteside, A. & Evans, J.D. (Eds.). (2008, March 19). Web Coverage Service (WCS) Implementation Standard. Open Geospatial Consortium, Inc. Retrieved May 10, 2008 from http://www.opengeospatial.org/standards/wcs

Portele, C. (Ed.). (2007, August 27). OpenGIS Geography Markup Language (GML) Encoding Standard. Open Geospatial Consortium, Inc. Retrieved May 10, 2008 from http://www.opengeospatial.org/standards/gml

Getman, D. & Dollins, B. (2008). Mobile, Interoperable, Near Real-time Sensor Networks: Two Consecutive Case Studies in Combining Geospatial Standards with Proprietary Software through Custom Development. Oak Ridge National Laboratory. Retrieved May 9, 2008 from http://gis.esri.com/library/userconf/feduc08/papers/getman_dollins.pdf
