University of California – Santa Barbara
www.nceas.ucsb.edu
Rick Reeves / March 17, 2005
Geospatial Data and Spatial Data Analysis Tools For Ecologists
Presentation Goals
Overview: Geospatial Data Analysis
Defining and distinguishing between spatial, geospatial, geographic data
Addressing the particular attributes of geospatial data
Inventory of Geospatial Data Types
Primary data types and common sources for data
Survey of Geoprocessing Software Tools
Key issues driving choice of geospatial processing software
A Tour of NCEAS Scientific Computing Web Site
Spatial Datasets, Tools, Tutorials, and Project Archives
Some Examples: Geospatial Data Analysis at NCEAS
From the Annals of the NCEAS Scientific Programmer: ‘Real World’ solutions to Ecological research challenges
Meet the Scientific Programmer
Rick’s Academic and Professional Background
Undergraduate: Environmental Remote Sensing
Graduate: Spatial Operations Research / Location-Allocation Heuristic Development
Spatial Modeling branch of Geographic Data Analysis
Problem Domain: Transportation and Facility Location within networks
Professional: Software Development, geospatial database development, training curriculum development
Spatial Data: A Hierarchical Definition
Spatial Data: Observations distributed in multidimensional space
  X / Y / Z coordinates attached to each data element
Geospatial Data: Spatial data with attached geographic coordinates
  Latitude / Longitude, UTM
  Optional: data subjected to a map projection transformation
Geographic Data: Geospatial data that captures ‘Earth System’ phenomena
  Terrain height
  Drainage network
  Land surface cover or urban land use
  Meteorological / climate data forecasts
Ecologists may work with any or all during a project
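As a small illustration of how latitude / longitude coordinates relate to a projected system such as UTM, the sketch below computes the standard 6-degree UTM zone number from a longitude. The function names `utm_zone` and `utm_hemisphere` are hypothetical, and the sketch ignores the special-case zones around Norway and Svalbard. It does not perform the projection itself.

```python
import math

def utm_zone(lon_deg):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    # Normalize the longitude into the range [-180, 180)
    lon = ((lon_deg + 180.0) % 360.0) - 180.0
    # Zones are 6 degrees wide, starting with zone 1 at 180 W
    return int(math.floor((lon + 180.0) / 6.0)) + 1

def utm_hemisphere(lat_deg):
    """Return 'N' or 'S' for the hemisphere of a latitude in degrees."""
    return "N" if lat_deg >= 0 else "S"

# Approximate coordinates of Santa Barbara, California (illustrative values)
lat, lon = 34.42, -119.70
print(f"UTM zone {utm_zone(lon)}{utm_hemisphere(lat)}")
```

A full conversion to UTM eastings and northings would normally be left to a projection library rather than hand-coded.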
Overview: Geospatial / Geographic Data
Two Broad Primary Categories
Raster: A multidimensional, regularly spaced grid of values (samples)
  Dimensions: Northing, Easting, Altitude, Time
  Examples: Satellite image, digital terrain, land surface cover maps
Vector: Three primary shapes stored in drawing-optimized format
  Point, Line, Polygon, (TIN, vector field)
Thousands of datasets exist in hundreds of formats
  Remote Sensing Imagery / Digital Elevation Models
  Surface Features (political, physiographic) as points/lines/polygons
  Meteorological data (observed / forecasted, short- and long-term)
  File format standards set by industry, government, user community
Data Ingestion: First Step in Geospatial Analysis
  Data input / format conversion / spatial registration
Geospatial Data Analysis
Geospatial Information Analysis: 3 Categories From O’Sullivan & Unwin (2003)
Spatial Data Manipulation: Investigate the relationships between geographic dataset layers
Examples: ‘point-in-polygon’, buffer zones around spatial features
GIS software typically used to view / manipulate / create layers
Spatial/Statistical Data Analysis: Descriptive and Explanatory: What is there? How do we categorize it?
Data points treated as statistical ‘population’, compared to others
Spatial Modeling: Construct models to explore and understand geospatial systems
Based on ‘abstraction’ of domain-specific problem into a systems framework. Some examples:
Predicting network flows; optimizing facility locations among demands
Lessons learned building model as valuable as model’s ‘answers’
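The ‘point-in-polygon’ operation named under Spatial Data Manipulation can be sketched with the classic ray-casting test. This is a minimal illustration, not how any particular GIS implements it; the function name `point_in_polygon`, the square study area, and the site coordinates are all hypothetical. Points lying exactly on the boundary are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical square study area and sample sites
study_area = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
sites = {"A": (5.0, 5.0), "B": (12.0, 3.0), "C": (1.0, 9.5)}

inside_sites = sorted(
    name for name, (sx, sy) in sites.items()
    if point_in_polygon(sx, sy, study_area)
)
print(inside_sites)
```

An odd number of edge crossings means the point is inside; an even number means it is outside.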
The Challenge of Geospatial Analysis
Geospatial Data violate some key statistical assumptions
  Must be addressed in the experimental design and sampling scheme
  Require specialized assessment techniques to factor out effects
Spatial Autocorrelation
  Samples are NOT randomly selected from a normally distributed population
  In fact, nearby samples are more likely to be similar than distant ones
  Autocorrelated data points introduce redundancy into the sample set
Spatial Scaling
  AKA Modifiable Areal Unit Problem
  Statistical relationships in an area may change at different aggregations
  The placement of the sampling grid can introduce artifacts
Nonuniform sampling space, edge effects
Geospatial Data Attributes have explanatory power
  Spatial relationships may be causes for observed phenomena
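One common way to quantify the spatial autocorrelation described above is Moran's I, which the slides do not name; it is used here only as an illustrative measure. The sketch assumes a small raster grid with rook (edge-sharing) neighbors and binary weights. The function name `morans_i` and both sample grids are hypothetical.

```python
def morans_i(grid):
    """Moran's I for a 2-D grid using rook adjacency and binary weights.

    Assumes the grid values are not all identical (otherwise the
    denominator is zero).
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    dev = [[v - mean for v in row] for row in grid]
    denom = sum(d * d for row in dev for d in row)

    num = 0.0    # sum of w_ij * dev_i * dev_j over neighboring pairs
    w_sum = 0    # sum of all weights (each directed neighbor pair counts 1)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += dev[r][c] * dev[rr][cc]
                    w_sum += 1
    return (n / w_sum) * (num / denom)

# Similar values sit next to each other: positive autocorrelation
clustered = [[1, 1, 9, 9]] * 4
# Every neighbor is dissimilar: negative autocorrelation
checkerboard = [[1, 9, 1, 9], [9, 1, 9, 1], [1, 9, 1, 9], [9, 1, 9, 1]]

print(morans_i(clustered), morans_i(checkerboard))
```

Values well above zero indicate that nearby samples resemble each other, which is the redundancy the slide warns about.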
Selecting Geospatial Software Tools
Geospatial software: layered software architecture
  Data layer: Efficiently store geospatial data
    Feature Set + spatial coordinates
  Analytic Layer: Spatial/statistical analysis algorithms
    Statistical packages increasingly contain geospatial analysis tools
  Visualization Layer: Creates data views (AKA maps)
Geospatial tools broadly divided in two categories
  Geographic Information Systems (GIS)
    Three software layers are each extensive, ‘feature rich’
  Geospatial Analysis Packages
    Data layer is ‘thinner’, Analytic layer ‘thicker’
    Visualization layer built on existing data plotting tools
Geospatial Software Tools: GIS ‘Value Added’
Data layer is optimized for efficient geospatial data storage/processing
Raster and Vector Data storage, ‘mixed mode’ operations
Georeferencing tools for data layer projection, spatial registration
Map Algebra tools foster analysis and creation of data layers
Comprehensive cartographic tools for output map design
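The Map Algebra idea above, deriving a new data layer from existing ones, can be sketched as a cell-by-cell ("local") operation on co-registered raster grids. This is a plain-Python illustration rather than any GIS product's Map Algebra syntax; the function name `local_op`, the layer values, and the land cover codes are hypothetical.

```python
def local_op(func, *layers):
    """Apply func cell by cell to raster layers that share one grid shape."""
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("layers must share the same grid shape")
    return [
        [func(*(layer[r][c] for layer in layers)) for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical co-registered layers: elevation in meters and land cover codes
elevation_m = [
    [120, 340, 560],
    [80, 410, 720],
    [60, 290, 650],
]
land_cover = [
    [1, 2, 2],
    [1, 2, 3],
    [1, 1, 3],
]  # assumed codes: 1 = grassland, 2 = forest, 3 = bare rock
FOREST = 2

# New layer: 1 where forest occurs above 300 m, 0 elsewhere
high_forest = local_op(
    lambda elev, cover: 1 if elev > 300 and cover == FOREST else 0,
    elevation_m,
    land_cover,
)
for row in high_forest:
    print(row)
```

The output layer has the same shape and registration as its inputs, so it can feed further overlay steps.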
Geospatial Software Tools: GIS Caveats
Underdeveloped geostatistical processing tools
  Vendors pressured to include them in product
  Yet validation data and algorithm details not available
  Often, these are critical tools for ecological analysis
Steep Learning Curve
  Identifying, mastering ‘essential’ features a challenge
Cost: GIS Software can be expensive
  Upfront purchase and yearly license fees
  Time investment in training and data maintenance
Workload
  If non-GIS software must be used for part of the analysis, time must be spent moving between software packages
Geospatial Software Tools: Choosing
Some Suggested Selection Criteria
Research Objectives should drive choice of tools
  Identify the project’s core geospatial processing needs
Platform Flexibility
  Select tools supported on multiple platforms (hardware / operating system)
  Widely supported/used platforms foster collaboration
Solution ‘Visibility’
  Can you obtain the details of the algorithm?
  Does the community recognize the accuracy of the algorithm?
Costs of implementing your research idea in software
  Scripted solutions using integrated environments are best: R, SAS, MATLAB
  Avoid development in high-level programming languages
Geospatial Software Tools: Choosing
Select GIS for core needs:
  Construct, compare, create multiple spatial data layers
  Simultaneously analyzing vector and raster data
  Creating detailed production-quality study site maps
  Your data is exclusively in the GIS product format
  You require spatial analysis tools unavailable outside GIS
Select Geospatial Analysis tools for core needs:
  Spatial/Statistical data analysis is the focus
  Your mapping requirements are modest
    two-dimensional data plots with geographic coordinates, legend
  You need in-depth understanding of algorithms used
    Or, you wish to extend / modify the algorithms
Sources for Geospatial Software Tools
Commercial Software Products
  For-profit corporations sell or license their software
  Major players produce comprehensive products
    ESRI (ArcGIS) is the dominant GIS vendor
    Their goal: Provide a solution for every geospatial application
  Other vendors offer tailored solutions
    Examples: ENVI / IDL, ERDAS: Remote Sensing oriented GIS
    Example: S-PLUS Spatial Statistics: Geospatial statistics and spatial data visualization enhancements to statistical package
    Example: MATLAB has mapping and image processing toolkits
    Example: SAS offers GIS, geospatial software tools
Commercial products often drive geospatial data formats
  Example: ESRI Shape File, ERDAS IMG file
Sources for Geospatial Software Tools
Open Source Software
  Broad-based effort by worldwide scientific and research community
  Often distributed under the General Public License (GPL)
  Software development and maintenance by the user community
  Most significant geospatial analysis products: R, GRASS GIS
  Examples of others: PostGIS, GDAL libraries
Visit FreeGIS.org, or the open software foundation sites.
Tradeoffs: Commercial GIS Software
Centralized documentation and product support…
  At a price of $100s to $1000s per year
Comprehensive, integrated software product
  Data/Analytic/Visualization layers populated with features
Steep learning curve: Where are my ‘essential features?’
Training always available – at a cost…
Details of proprietary geospatial algorithms usually unavailable
Tradeoffs: Open Source GIS Software
Open Source Software
  Often distributed under the General Public License (GPL)
  Software development and maintenance by the user community
  Most significant geospatial analysis products: R, GRASS GIS
Many applications available via the Internet, but…
  Quality, features, support, and documentation are inconsistent
Algorithms and even source code are freely available
Open Source software drawbacks are shrinking as the user support community evolves and matures
  But active participation in the community is advised for those wishing to stay technically proficient
Sources for Geospatial Data
Government Agencies
  National Mapping and Survey Agencies: surface cover data
    USGS
  Research Centers: Climate forecasting models
    NOAA, NASA, NCDC
For-Profit Corporations
  The highest-quality UNCLASSIFIED imagery now acquired by the private sector
  Sometimes, no-cost government data is resold to public
Data widely available via the Internet
  Many data sets available at no or low cost
    Notable Exception: Satellite Remote Sensing data
    Some discounts available to education and/or research entities
  The best sites allow ‘search by geographic coordinates’
  Examples from NCEAS Scientific Computing web site
Popular Geospatial Data Formats
Meteorological and Climatological Data
  Historical measurements
  Short-term model-based forecasts (3 – 10 days from now)
  Long-term predictions (10 – 100 years): General Circulation Models
  Widely-Used Formats: Gridded Binary (GRIB), NetCDF
Political and Physiographic features
  Country Boundaries
  Road Networks
  Drainage Networks
  Widely-Used Formats: Digital Line Graphs (DLG), ESRI Shape Files (.shp)
Most GIS/Geospatial packages ingest these formats
Or conversion utilities are available to ingest them
Popular Geospatial Data Formats
Remote Sensing Imagery
  Many operational systems provide many kinds of images
    Multispectral Imagery: Landsat, SPOT, IKONOS
  Data Formats tend to be sensor-specific
  Most GIS can ingest most imagery types
  Portal sites:
    Commercial: http://www.vterrain.org/Imagery/commercial.html
    Govt: http://www.nationalgeographic.com/maps/map_links.html
Digital Terrain Models
  Raster Grid datasets containing elevation measurements
  Available for complete Earth land surface
  Primary format: USGS Digital Elevation Model (DEM)
    AKA National Elevation Dataset (NED)
  Portal sites:
    USGS: http://gisdata.usgs.net/Website/Seamless/
    Terrainmap.org: http://www.terrainmap.org/
Tour of the Scientific Computing Web Site
Links to Data Sources
Links to Geospatial Software Sources
Links to Tutorials and Research Papers
Archive of NCEAS Research Projects
http://www.nceas.ucsb.edu/scicomp
Example: Spatial Modeling: Optimization
Route vehicles along network using environmental costs as a metric
Simultaneously locate facilities along shipment routes that mitigate environmental costs
Optimal Location of species reserve sites
Develop and compare performance of alternate solution methods
  Mathematically optimal but operationally impractical
  Heuristically derived: near-optimal, usable solution
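The contrast between a mathematically optimal method and a heuristic one can be sketched on a toy facility location problem. The example below uses a p-median formulation (choose p sites that minimize total distance to demand points) purely as a stand-in; it is an assumption, not the formulation used in the NCEAS project. All function names and coordinates are hypothetical. The exact method enumerates every combination, which is why it becomes impractical as the candidate set grows.

```python
from itertools import combinations
import math

def total_cost(sites, demands):
    """Sum over demand points of the distance to the nearest chosen site."""
    return sum(
        min(math.hypot(d[0] - s[0], d[1] - s[1]) for s in sites)
        for d in demands
    )

def exact_p_median(candidates, demands, p):
    """Exhaustive search over all combinations: optimal, but combinatorial."""
    return min(combinations(candidates, p),
               key=lambda sites: total_cost(sites, demands))

def greedy_p_median(candidates, demands, p):
    """Greedy heuristic: repeatedly add the site that lowers cost the most."""
    chosen = []
    remaining = list(candidates)
    for _ in range(p):
        best = min(remaining,
                   key=lambda c: total_cost(chosen + [c], demands))
        chosen.append(best)
        remaining.remove(best)
    return tuple(chosen)

# Hypothetical demand points and candidate facility sites
demands = [(0, 0), (1, 0), (0, 1), (9, 9), (10, 9), (9, 10), (5, 5)]
candidates = [(0, 0), (5, 5), (9, 9), (1, 1), (10, 10)]
p = 2

exact = exact_p_median(candidates, demands, p)
greedy = greedy_p_median(candidates, demands, p)
print("exact :", exact, round(total_cost(exact, demands), 3))
print("greedy:", greedy, round(total_cost(greedy, demands), 3))
```

The greedy result can never cost less than the exact one; the practical question is how close it gets for how much less computation.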
Spatial Modeling: The Problem Domain
Geospatial Dataset: Routes + Locations
Spatial Model Solution: Alternative Methods
Selecting Species Reserves Locations
Dr. Ross Gerrard, UCSB Biogeography Lab, 1996
Example: Spatial Data Manipulation
Elevation zone threshold calculation
  Digital Elevation Models for selected worldwide sites
  Classify sites into 100 meter ‘wide’ elevation zones
General Circulation Model climate data extraction
  Identify, obtain, import GCM data files
  Import the data into GIS as raster grid
  Overlay point file, extract matching climate values
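The elevation zone step can be sketched as a simple binning of elevations into 100 meter bands. This is an illustration under assumed conventions (lower bound inclusive, upper bound exclusive), not the procedure actually used in the project; the function name `elevation_zone` and the site elevations are hypothetical.

```python
import math
from collections import Counter

def elevation_zone(elev_m, width_m=100):
    """Return (lower, upper) bounds in meters of the zone containing elev_m.

    The lower bound is inclusive and the upper bound exclusive.
    """
    lower = math.floor(elev_m / width_m) * width_m
    return (lower, lower + width_m)

# Hypothetical site elevations in meters, e.g. sampled from a DEM
site_elevations = {
    "site_a": 87.0,
    "site_b": 1234.5,
    "site_c": 300.0,
    "site_d": 1201.0,
    "site_e": -30.0,
}

zones = {name: elevation_zone(e) for name, e in site_elevations.items()}
sites_per_zone = Counter(zones.values())

for zone, count in sorted(sites_per_zone.items()):
    print(f"{zone[0]:>6} to {zone[1]:>6} m: {count} site(s)")
```

Using `math.floor` rather than integer truncation keeps below-sea-level sites in the correct band.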
Digital Elevation Data Ingestion / Clipping
Elevation Zone Data Analysis
General Circulation Model data extraction
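The ‘overlay point file, extract matching climate values’ step amounts to finding which grid cell each point falls in. The sketch below assumes a north-up grid with square cells, where `(x_min, y_max)` is the outer corner of the upper-left cell; the function name `sample_grid`, the grid values, the 2.5 degree cell size, and the point coordinates are all hypothetical.

```python
import math

def sample_grid(grid, x_min, y_max, cell_size, x, y, nodata=None):
    """Return the value of the grid cell containing point (x, y).

    grid[0] is the northernmost row; (x_min, y_max) is the outer corner
    of the upper-left cell. Points outside the grid return nodata.
    """
    col = int(math.floor((x - x_min) / cell_size))
    row = int(math.floor((y_max - y) / cell_size))
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return nodata

# Hypothetical climate grid (e.g. mean temperature in degrees C)
temperature = [
    [10.1, 10.4, 10.9, 11.3],
    [12.0, 12.6, 13.1, 13.5],
    [14.2, 14.8, 15.3, 15.9],
]
X_MIN, Y_MAX, CELL = -125.0, 40.0, 2.5  # degrees; assumed georeferencing

# Hypothetical point file: name -> (longitude, latitude)
points = {"p1": (-119.7, 34.42), "p2": (-124.0, 39.0), "p3": (-100.0, 34.0)}

extracted = {
    name: sample_grid(temperature, X_MIN, Y_MAX, CELL, lon, lat)
    for name, (lon, lat) in points.items()
}
print(extracted)
```

Points that fall outside the grid extent come back as `None`, which makes gaps in coverage easy to spot before analysis.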
Spatial Analysis: ArcGIS and R Platforms
• ESRI Shape files exported to the R programming environment
• R Geostatistical and Spatial Analysis methods can then be applied
A Sampling: R Geospatial Analysis packages
clim.pact: Climate data analysis and downscaling tools
geoR: Geostatistical data analysis: variograms, etc.
maptools: read/manipulate polygon data (ESRI .shp)
shapefiles: read/manipulate ESRI shape files
sgeostat: Geostatistical modeling code
splancs: Spatial and space-time point patterns
spatstat: Spatial point pattern analysis
Concluding thoughts
NCEAS Associates extensively use geospatial data in many creative ways
Geospatial Data Analysis requires specialized techniques
GIS and geospatial analysis tools are available from commercial vendors and the open source community
Choosing geospatial data and tools can be overwhelming and distract from the primary ‘science mission’
Scientific Programming Team has geospatial expertise, and can assist NCEAS Associates in this domain
Coming soon: Short course on the R Programming Language!