Spatial Databases
ENVE/CE 424/524
Definitions
• Database – an integrated set of data on a particular subject
• Spatial database – a database containing geographic data of a particular subject for a particular area
• Database Management System (DBMS) – software to create, maintain and access databases
GIS and DBMS (diagram: the two systems linked through shared data)
• Geographic Information System – data load, editing, visualization, mapping, analysis
• Database Management System – storage, indexing, security, query
GIS: old and new
GIS used to be monolithic systems
all-in-one, proprietary applications that stored, queried, and visualized data
New systems follow more of a tool-box approach
modularized applications that interoperate
Who can benefit from spatial data management?
Army Commander: Has there been any significant enemy troop movement in the past week?
Insurance Risk Manager: Which houses are most likely to be affected in the next great flood on the Mississippi?
Medical Doctor: Based on this patient’s MRI, have we treated somebody with a similar condition?
Molecular Biologist: Is the topology of the amino acid biosynthesis gene in the genome found in any other sequence feature map in the database?
Astronomer: Find all blue galaxies within 2 arcmin of quasars.
Three classes of users for spatial databases
Major database managers: specialized products for enterprise management
GIS users: analysis of data
Internet users: more general requirements
Advantages of Databases over Files
• Avoids redundancy and duplication
• Reduces data maintenance costs
• Applications are separated from the data
– Applications persist over time
– Support multiple concurrent applications
• Better data sharing
• Security and standards can be defined and enforced
Disadvantages of Databases over Files
• Expense
• Complexity
• Performance – especially with complex data types
• Integration with other systems can be difficult
Types of DBMS Model
• Hierarchical
• Network
• Relational – RDBMS
• Object-oriented – OODBMS
• Object-relational – ORDBMS
Characteristics of DBMS
• Data model support for multiple data types
– e.g., MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No, OLE Object, Hyperlink, Lookup Wizard
• Load data from files, databases and other applications
• Index for rapid retrieval
• Query language – SQL
• Security – controlled access to data
– Multi-level groups
• Controlled update using a transaction manager
• Backup and recovery
Relational DBMS
• Data stored as tuples (pronounced "tup-el"), conceptualized as tables
• Table – data about a class of objects
– Two-dimensional list (array)
– Rows = objects
– Columns = object states (properties, attributes)
Table
Row = object
Column = property
Table = object class
Object classes with geometry are called feature classes
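A minimal sketch of the table/object-class mapping, using Python's built-in sqlite3 module. The table name `parcel`, its columns, and the owner names are hypothetical; storing the geometry as WKT text is an assumption made only to show what turns an object class into a feature class (real spatial DBMSs use a dedicated geometry type).

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")

# Table = object class; each column = one property of the class.
# The hypothetical "geometry" column (WKT text here) makes it a feature class.
conn.execute(
    """CREATE TABLE parcel (
           parcel_id INTEGER PRIMARY KEY,
           owner     TEXT,
           area_m2   REAL,
           geometry  TEXT
       )"""
)

# Each row = one object (one parcel).
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?, ?)",
    [
        (1, "Smith", 1200.0, "POLYGON((0 0, 40 0, 40 30, 0 30, 0 0))"),
        (2, "Jones", 900.0, "POLYGON((40 0, 70 0, 70 30, 40 30, 40 0))"),
    ],
)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(parcel)")]
rows = conn.execute("SELECT parcel_id, owner FROM parcel ORDER BY parcel_id").fetchall()
print(columns)  # ['parcel_id', 'owner', 'area_m2', 'geometry']
print(rows)     # [(1, 'Smith'), (2, 'Jones')]
```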
Relational DBMS
• Most popular type of DBMS
– Over 95% of data in DBMS is in RDBMS
• Commercial systems
– IBM DB2
– Informix
– Microsoft Access
– Microsoft SQL Server
– Oracle
– Sybase
Spatial Database Example
Land parcel with boundary id: 1050
Relational Database Example
Four tables needed in the land parcel relational database
Relational database example #2
Relation Rules (Codd, 1970)
• Only one value in each cell (intersection of row and column)
• All values in a column are about the same subject
• Each row is unique
• No significance in column sequence
• No significance in row sequence
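Two of Codd's rules can be seen directly in a DBMS. The sketch below (sqlite3, with a hypothetical `owner` table) assumes a primary key as the mechanism that enforces "each row is unique", and shows that, because row sequence carries no meaning, a query must ask for an order explicitly when one is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A primary key makes the DBMS reject duplicate rows (hypothetical table).
conn.execute("CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO owner VALUES (1, 'Smith')")

try:
    conn.execute("INSERT INTO owner VALUES (1, 'Smith')")  # same key again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("INSERT INTO owner VALUES (2, 'Jones')")

# Without ORDER BY the result order is not guaranteed, so request one.
names = [r[0] for r in conn.execute("SELECT name FROM owner ORDER BY name")]
print(duplicate_rejected, names)  # True ['Jones', 'Smith']
```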
SQL
• Structured (Standard) Query Language (pronounced "sequel")
• Developed by IBM in the 1970s
• Now the standard for accessing relational databases
• Three types of usage
– Stand-alone queries
– High-level programming
– Embedded in other applications (ArcGIS)
Types of SQL Statements
• Data Definition Language (DDL)
– Create, alter and delete database objects (tables, indexes)
– CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX
• Data Manipulation Language (DML)
– Retrieve and manipulate data
– SELECT, UPDATE, DELETE, INSERT
• Data Control Language (DCL)
– Control security of data
– GRANT, REVOKE, CREATE USER, DROP USER
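The DDL/DML split can be tried with Python's sqlite3 module; the `well` table and its values are hypothetical. SQLite is an embedded database without user accounts, so it has no GRANT or REVOKE, and DCL is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define database objects (a table and an index on it).
conn.execute("CREATE TABLE well (well_id INTEGER PRIMARY KEY, county TEXT, depth_m REAL)")
conn.execute("CREATE INDEX idx_well_county ON well (county)")

# DML: insert, update, delete and retrieve rows.
conn.executemany(
    "INSERT INTO well (well_id, county, depth_m) VALUES (?, ?, ?)",
    [(1, "Adams", 120.0), (2, "Adams", 85.5), (3, "Boone", 200.0)],
)
conn.execute("UPDATE well SET depth_m = 90.0 WHERE well_id = 2")
conn.execute("DELETE FROM well WHERE county = 'Boone'")
result = conn.execute(
    "SELECT well_id, depth_m FROM well WHERE county = ? ORDER BY well_id", ("Adams",)
).fetchall()
print(result)  # [(1, 120.0), (2, 90.0)]
```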
Spatial Types – OGC Simple Features
(Class diagram of the geometry types)
• Geometry (associated with a SpatialReferenceSystem)
– Point
– Curve: LineString, with subtypes Line and LinearRing
– Surface: Polygon
– GeometryCollection: MultiPoint, MultiCurve (MultiLineString), MultiSurface (MultiPolygon)
Data Model: A set of constructs for representing objects and processes in a digital environment
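Spatial Relations
• Equals – are the geometries the same?
• Disjoint – do the geometries share no common point?
• Intersects – do the geometries share at least one point?
• Touches – do the geometries intersect only at their boundaries (their interiors do not meet)?
• Crosses – do the geometries share some but not all interior points, with an intersection of lower dimension than the larger geometry (e.g., a line crossing a polygon)?
• Within – is one geometry completely inside another?
• Contains – does one geometry completely contain another?
• Overlaps – do two geometries of the same dimension share some but not all of their points, with an intersection of that same dimension?
• Relate – are there intersections between the interiors, boundaries or exteriors of the geometries (tested against a specified DE-9IM pattern)?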
Contains Relation
Touches Relation
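To make the predicates concrete, here is a simplified sketch restricted to axis-aligned rectangles with positive width and height (an assumption; a spatial DBMS evaluates them on arbitrary geometries). Crosses and Relate are omitted: Crosses does not apply between two areas, and Relate needs the full DE-9IM matrix.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; assumed to have positive width and height."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def intersects(a: Rect, b: Rect) -> bool:
    # Closed rectangles share at least one point (boundary contact counts).
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax

def disjoint(a: Rect, b: Rect) -> bool:
    return not intersects(a, b)

def interiors_meet(a: Rect, b: Rect) -> bool:
    # Strict inequalities: the open interiors share some area.
    return a.xmin < b.xmax and b.xmin < a.xmax and a.ymin < b.ymax and b.ymin < a.ymax

def touches(a: Rect, b: Rect) -> bool:
    # They meet, but only along boundaries.
    return intersects(a, b) and not interiors_meet(a, b)

def contains(a: Rect, b: Rect) -> bool:
    # No part of b lies outside a.
    return a.xmin <= b.xmin and b.xmax <= a.xmax and a.ymin <= b.ymin and b.ymax <= a.ymax

def within(a: Rect, b: Rect) -> bool:
    return contains(b, a)

def equals(a: Rect, b: Rect) -> bool:
    return a == b

def overlaps(a: Rect, b: Rect) -> bool:
    # Shared area, but neither rectangle lies inside the other.
    return interiors_meet(a, b) and not contains(a, b) and not contains(b, a)

A = Rect(0, 0, 4, 3)
B = Rect(4, 0, 7, 3)      # shares the edge x = 4 with A
C = Rect(1, 1, 2, 2)      # inside A
D = Rect(3, 2, 5, 5)      # partly over A
E = Rect(10, 10, 11, 11)  # far away
print(touches(A, B), within(C, A), overlaps(A, D), disjoint(A, E))  # True True True True
```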
Spatial Methods
• Distance – determines the shortest distance between any point in one geometry and any point in the other
• Buffer – returns a geometry that represents all the points whose distance from the geometry is less than or equal to a user-defined distance
• ConvexHull – returns a geometry representing the smallest convex polygon that encloses another geometry (a polygon without any concave areas)
• Intersection – returns a geometry that contains just the points common to both input geometries
• Union – returns a geometry that contains all the points in both input geometries
• Difference – returns a geometry containing the points of the first geometry that are not in the second
• SymDifference – returns a geometry containing the points that are in either of the input geometries, but not both
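Intersection, Union, Difference and SymDifference are point-set operations. A rough way to see them is to approximate each geometry by the set of unit grid cells it covers (a raster approximation assumed here for illustration; spatial DBMSs compute these on exact vector geometry), so that Python's set operators do the work.

```python
def cells(xmin: int, ymin: int, xmax: int, ymax: int) -> set:
    """Unit cells (i, j) covering the rectangle [xmin, xmax) x [ymin, ymax)."""
    return {(i, j) for i in range(xmin, xmax) for j in range(ymin, ymax)}

a = cells(0, 0, 4, 3)   # 12 cells
b = cells(2, 1, 6, 4)   # 12 cells

intersection = a & b     # points common to both
union = a | b            # points in either
difference = a - b       # points of a that are not in b
sym_difference = a ^ b   # points in exactly one of a, b

print(len(intersection), len(union), len(difference), len(sym_difference))  # 4 20 8 16
```

Note that Difference depends on argument order (`a - b` is not `b - a` in general), while the other three are symmetric.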
Convex Hull and Difference Methods
Convex Hull
Difference
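A convex hull can be computed with Andrew's monotone chain algorithm, one standard choice (not necessarily what any particular DBMS uses). The sketch sorts the vertices and keeps only left turns while building the lower and upper chains; the L-shaped input is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices counter-clockwise, no repeats."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()  # drop points that would make a concave (right) turn
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

# Vertices of an L-shaped (concave) polygon; the hull fills in the notch.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]
print(convex_hull(l_shape))  # [(0, 0), (4, 0), (4, 1), (1, 3), (0, 3)]
```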
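Indexing
• Used to locate rows quickly
• Like a book index, it is a special representation of the content that adds order and makes finding items faster
• Conventional RDBMS use simple one-dimensional indexes (e.g., B-trees)
• Spatial DBMS need two-dimensional, hierarchical indexing
– Grid
– Quadtree
– R-tree
• Multi-level queries are often used for performance: a fast filter on minimum bounding rectangles (MBRs), then an exact geometric test on the remaining candidates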
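The filter-and-refine idea can be sketched in a few lines: a cheap MBR comparison first, then an exact point-in-polygon test (even-odd ray casting, chosen here as one common exact test) only on the survivors. The polygon names and coordinates are hypothetical, and MBRs are computed on the fly here rather than read from an index.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def mbr(poly: List[Point]) -> Box:
    """Minimum bounding rectangle of a polygon's vertices."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)

def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Even-odd ray casting; points exactly on an edge are not handled specially."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def find_containing(pt: Point, polygons: Dict[str, List[Point]]) -> Tuple[List[str], List[str]]:
    boxes = {pid: mbr(poly) for pid, poly in polygons.items()}
    # Filter: cheap MBR test; keeps every true hit but may add false positives.
    candidates = [pid for pid, (x0, y0, x1, y1) in boxes.items()
                  if x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1]
    # Refine: exact (more expensive) test, only on the candidates.
    hits = [pid for pid in candidates if point_in_polygon(pt, polygons[pid])]
    return candidates, hits

parcels = {
    "tri": [(0, 0), (10, 0), (0, 10)],
    "square": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
print(find_containing((8, 8), parcels))  # (['tri'], [])  MBR hit, exact miss
print(find_containing((2, 2), parcels))  # (['tri'], ['tri'])
```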
Grid Index (multi-level)
- Overlay uniform grid
- Assign objects a grid id
Multi-level grids are used for variable-sized objects within a database
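A single-level grid index can be sketched as a dictionary from grid id to object ids. The cell size and object boxes below are hypothetical. Because an object is registered in every cell its bounding box touches, a large object lands in many cells, which is the problem multi-level grids address; the query returns candidates that still need an exact test.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
CELL_SIZE = 10.0  # uniform cell size in map units (an assumed value)

def cell_of(x: float, y: float) -> Tuple[int, int]:
    """Grid id (column, row) of the cell containing (x, y)."""
    return int(x // CELL_SIZE), int(y // CELL_SIZE)

def cells_for_box(box: Box) -> List[Tuple[int, int]]:
    """All grid cells touched by a bounding box."""
    c0, r0 = cell_of(box[0], box[1])
    c1, r1 = cell_of(box[2], box[3])
    return [(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)]

class GridIndex:
    def __init__(self) -> None:
        self.cells: DefaultDict[Tuple[int, int], Set[str]] = defaultdict(set)

    def insert(self, obj_id: str, box: Box) -> None:
        # Register the object in every cell its bounding box touches.
        for cell in cells_for_box(box):
            self.cells[cell].add(obj_id)

    def query(self, box: Box) -> Set[str]:
        # Candidate objects only; an exact geometric test should follow.
        found: Set[str] = set()
        for cell in cells_for_box(box):
            found |= self.cells.get(cell, set())
        return found

idx = GridIndex()
idx.insert("a", (2, 2, 8, 8))      # one cell
idx.insert("b", (15, 5, 25, 12))   # four cells
idx.insert("c", (55, 55, 58, 58))  # one cell
print(sorted(idx.query((0, 0, 12, 3))))  # ['a', 'b']  ('b' is a false positive)
```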
Point and Region Quadtree Indexing
Based on recursive division of space.
(Figure panels: Point Quadtree, Region Quadtree)
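The recursive division of space can be sketched as a region quadtree that stores points (often called a PR quadtree): each full square splits into four equal quadrants. A point quadtree would instead split at the data points themselves. The leaf capacity, depth limit, and sample points are assumptions; the depth limit only guards against endless splitting on duplicate points.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
CAPACITY = 2    # max points in a leaf before it splits (assumed)
MAX_DEPTH = 8   # stop splitting here, e.g. for duplicate points (assumed)

@dataclass
class Node:
    x0: float  # region is [x0, x1) x [y0, y1)
    y0: float
    x1: float
    y1: float
    depth: int = 0
    points: List[Point] = field(default_factory=list)
    children: Optional[List["Node"]] = None

    def covers(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def insert(self, p: Point) -> bool:
        if not self.covers(p):
            return False
        if self.children is None:
            if len(self.points) < CAPACITY or self.depth >= MAX_DEPTH:
                self.points.append(p)
                return True
            self._split()
        return any(c.insert(p) for c in self.children)

    def _split(self) -> None:
        mx, my, d = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2, self.depth + 1
        self.children = [Node(self.x0, self.y0, mx, my, d), Node(mx, self.y0, self.x1, my, d),
                         Node(self.x0, my, mx, self.y1, d), Node(mx, my, self.x1, self.y1, d)]
        old, self.points = self.points, []
        for q in old:  # push existing points down into the quadrants
            any(c.insert(q) for c in self.children)

    def query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[Point]:
        if qx1 < self.x0 or qx0 >= self.x1 or qy1 < self.y0 or qy0 >= self.y1:
            return []  # region misses the query box: prune the whole subtree
        hits = [p for p in self.points if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        for child in self.children or []:
            hits.extend(child.query(qx0, qy0, qx1, qy1))
        return hits

root = Node(0, 0, 100, 100)
for pt in [(10, 10), (20, 20), (80, 80), (60, 70), (15, 85)]:
    root.insert(pt)
print(sorted(root.query(0, 0, 30, 30)))  # [(10, 10), (20, 20)]
```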
R-tree
Uses the minimum bounding rectangle (MBR), or minimum bounding box (MBB), of each object
Inserts a new object into the node whose MBR would need to expand the least to accommodate it
(Figure: study area with minimum bounding rectangles)
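The "expand the least" rule can be sketched as follows. This covers only the choice of node at one level (no node splitting), and breaks ties by smaller current area, as in Guttman's original R-tree insertion. Node names and boxes are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])

def union_box(a: Box, b: Box) -> Box:
    """Smallest rectangle enclosing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(node_mbr: Box, obj: Box) -> float:
    """Extra area the node's MBR would need to take in the object."""
    return area(union_box(node_mbr, obj)) - area(node_mbr)

def choose_node(nodes: Dict[str, Box], obj: Box) -> str:
    # Least enlargement first; ties go to the node with the smaller area.
    return min(nodes, key=lambda n: (enlargement(nodes[n], obj), area(nodes[n])))

nodes = {"N1": (0, 0, 10, 10), "N2": (20, 0, 40, 20)}
obj = (11, 2, 13, 4)
best = choose_node(nodes, obj)            # N1 grows by 30, N2 would grow by 180
nodes[best] = union_box(nodes[best], obj) # expand the chosen MBR
print(best, nodes[best])  # N1 (0, 0, 13, 10)
```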
Order Dependence of a Query
Query: Select all households within 3 km of a store that have an income greater than $100,000
1. Select all households with an income greater than $100,000; from this selected set, select all households within 3 km of a store
2. Select all households within 3 km of a store; from this selected set, select all households with an income greater than $100,000
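Both orders return the same households; what changes is how much work is done. The sketch below uses hypothetical data (a 10 × 10 grid of households 1 km apart, incomes from a fixed formula, two stores) and counts distance calculations: filtering by income first means the more expensive spatial test runs only on the 40 high-income households.

```python
import math
from typing import List, Tuple

Household = Tuple[float, float, int]  # (x_km, y_km, income), hypothetical

households: List[Household] = [
    (float(x), float(y), 50_000 + 10_000 * ((7 * x + 3 * y) % 10))
    for x in range(10) for y in range(10)
]
stores = [(2.0, 2.0), (7.0, 6.0)]  # hypothetical store locations (km)

distance_calls = 0

def near_store(h: Household, radius_km: float = 3.0) -> bool:
    """Spatial test: is the household within radius_km of any store?"""
    global distance_calls
    best = math.inf
    for sx, sy in stores:
        distance_calls += 1
        best = min(best, math.hypot(h[0] - sx, h[1] - sy))
    return best <= radius_km

def rich(h: Household) -> bool:
    return h[2] > 100_000

# Order 1: income first; `and` short-circuits, so distances run only for rich households.
distance_calls = 0
order1 = [h for h in households if rich(h) and near_store(h)]
calls1 = distance_calls

# Order 2: distance first, then income.
distance_calls = 0
order2 = [h for h in households if near_store(h) and rich(h)]
calls2 = distance_calls

print(order1 == order2, calls1, calls2)  # True 80 200
```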
Distributed Databases
www.midcarb.org
Final Few Weeks
Lecture: April 15, Metadata and Interoperability
Lab: April 17 (next Thursday), project/problem set work
I’ll spend a few minutes with each of you to get an update on your progress.
• Article review due April 17
Lab: April 22, project lab session.
Lecture: April 24, GIS in decision-making
Project Presentation: May 8