A Geospatial Data Catalog and Metadata Management Tools for the U.S. Environmental Protection Agency’s Western Ecology Division David L. Bradford Geosciences Oregon State University


Page 1: A Geospatial Data Catalog and Metadata Management Tools for the U.S. Environmental Protection Agency’s Western Ecology Division David L. Bradford Geosciences

A Geospatial Data Catalog and Metadata Management Tools

for the U.S. Environmental Protection Agency’s

Western Ecology Division

David L. Bradford

Geosciences

Oregon State University

Introduction

• U.S. EPA Summer Internship: Western Ecology Division, Corvallis, OR

• Large amount of GIS data (4 TB) representing 20+ years’ worth of research

• Common national datasets

• Virtually no metadata and no central index

• Hard to know whether/where data exist

• MISSION: come up with a catalog for these geospatial data…

• …with one intern, no budget, no new infrastructure, and do it all in 14 weeks?

Introduction

• Background: the Western Ecology Division (WED) & the need for metadata

• Research questions, hypothesis: give them a fish or teach them to fish?

• Approach: system development life cycle

• Results: EPA Synchronizer, GeoData Gateway, & metadata “harvesting”

• Discussion & Conclusions: automating metadata creation, overcoming institutional inertia

Background

• June through September, 2007

• The WED – laboratory under the National Health & Environmental Effects Research Laboratory (NHEERL)

• EPA Office of Research & Development (ORD)

• Project team: Connie Burdick, Denis White, Randy Comeleo, Patrick Clinton, & yours truly

• Help from: Office of Environmental Information (OEI) GeoData Gateway team

Metadata

• Information about data

• Self-indexing, fitness for purpose, how to manipulate (Green & Bossomaier, 2002; Longley et al., 2005)

• Time-consuming (i.e., expensive) to create (e.g., Ma, 2007)

• A “hassle” for the analyst

• Standard: Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) (FGDC, 1998)

• LINCHPIN: GOOD METADATA

• Objective: Tools to create standards-compliant metadata and automate the process as much as possible

Existing EPA Process

• WED projects launched, GIS data created

• Different PIs, different goals, shared analysts

• Before: informal “over-the-cubicle-wall” communication was sufficient to manage data; could get by without metadata

• Now: informal methods breaking down

• GIS analysts/contractors recently dispersing to different offices, buildings, sites

• Data now require multiple disk volumes

Existing Resources & Infrastructure

• Data storage: Windows NT-based servers (2.5 TB), Linux RAID server (1.5 TB)

• Web server: Windows NT-based (IIS)

• ESRI ArcGIS Suite, ArcObjects Libraries

• EPA Metadata Editor (EME)

• Second Copy (batch file copy utility)

• GeoData Gateway (GDG)

• Microsoft Visual Studio 2005 Integrated Development Environment (IDE)

Other Parameters and Constraints

• Budget: 1 summer intern

• Team: 4 analysts, 1 developer (the intern), 1 GDG administrator, local tech support

• Users: 14 GIS analysts (half contract staff); ~ 50 local GIS data “consumers”

• Data: 4 TB (coverages & shapefiles)

Other Parameters and Constraints (cont.)

• Standards & Policies

– FGDC-CSDGM

– EPA National Geospatial Data Policy

– EPA Metadata Technical Specification v1.0

– GeoData Gateway Governance Structure

• Primary constraint: Don’t relocate the data! Interlinked, interdependent datasets

Challenges

• Can an effective geospatial catalog system be assembled, using existing EPA resources, that has minimal long-term administrative costs?

• Can such a system be more than just a one-time inventory, i.e., can the solution be sustained by the WED GIS community long after the programmer leaves?

Propositions

• A sustainable geospatial catalog solution can be developed using existing or freely available (e.g., open source) tools, software components, and EPA resources

• Regardless of architecture, in order to be self-sustaining, it will require that primary GIS users implement a policy of creating consistent metadata

• The system cannot be fully implemented within 14 weeks

Approach

• System Development Life Cycle

– Identify the need: done

– Requirements Analysis: identify resources, constraints, functionality, user interfaces

– Architectural Design: weigh options, choose strategy, develop “blueprint”

Approach (cont.)

• System Development Life Cycle (cont.)

– Software Development: code missing components, unit test

– Integrated System Testing: implement components and test entire system

– User Training and Implementation: “roll it out”

Results: Requirements Analysis

• Support existing processes

• Use existing infrastructure

• Arcane, “homegrown” solution: No

• Low maintenance solution: Yes

• User interfaces:

– ArcGIS-Integrated

– Web Portal

• Don’t relocate datasets

Results: Architectural Design

1. Metadata creation/maintenance

• GIS analyst responsibility

• But, as automated as possible using EPA Synchronizer - new software tool

• Edit/validate metadata using EPA Metadata Editor (EME) - existing tool

• EPA Synchronizer uses EME Defaults Database (local MS Access database)

• Once this step happens, the rest is magic

Results: Architectural Design

2. Internal “harvesting” of metadata

• Weekly server process that runs automatically (Second Copy)

• Locates all new & modified metadata files contained within specified disk volumes

• Copies metadata files (including their containing directory structure) to a “web accessible folder” (WAF) on the WED’s intranet server
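The harvesting step above was handled by Second Copy, whose configuration is not shown here. As a rough illustration of the same logic only, the following minimal sketch assumes metadata are stored as `.xml` files and that the time of the previous run is known; the function name `harvest_metadata` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def harvest_metadata(volume: Path, waf_root: Path, since: float) -> list[Path]:
    """Copy metadata files modified after `since` into the WAF.

    `since` is a POSIX timestamp (assumed: the time of the previous run).
    The directory structure below `volume` is recreated below `waf_root`.
    """
    copied = []
    for src in sorted(volume.rglob("*.xml")):  # assumption: metadata are .xml files
        if not src.is_file() or src.stat().st_mtime <= since:
            continue  # not new or modified since the last run
        dest = waf_root / src.relative_to(volume)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves the modification time
        copied.append(dest)
    return copied
```

Only the metadata files are copied; the datasets themselves stay where they are, which matches the "don't relocate the data" constraint.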

Results: Architectural Design

3. GeoData Gateway (GDG) metadata harvest

• ESRI GIS Portal Toolkit server (the catalog system)

• maintained by EPA Office of Environmental Information

• Configured to automatically harvest the WED’s metadata from the WAF

• Validates metadata and posts to GDG catalog
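The Gateway's actual validation rules are not described in these slides. As an illustration of the kind of check a harvested record might face, the sketch below assumes FGDC CSDGM XML with the standard short element names and tests only a handful of elements; `check_record` is a hypothetical helper, not part of the GDG.

```python
import xml.etree.ElementTree as ET

# CSDGM short names, relative to the <metadata> root element.
TEXT_PATHS = {
    "title": "idinfo/citation/citeinfo/title",
    "abstract": "idinfo/descript/abstract",
    "purpose": "idinfo/descript/purpose",
}
BOUNDING = "idinfo/spdom/bounding/"
BOUND_TAGS = ("westbc", "eastbc", "northbc", "southbc")


def check_record(xml_text: str) -> list[str]:
    """Return a list of problems found in one metadata record (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    for name, path in TEXT_PATHS.items():
        if not (root.findtext(path) or "").strip():
            problems.append(f"missing {name}")
    for tag in BOUND_TAGS:
        value = root.findtext(BOUNDING + tag)
        try:
            float(value)  # bounding coordinates must be numeric
        except (TypeError, ValueError):
            problems.append(f"bad {tag}")
    return problems
```

A record that returns an empty list passes this simplified check; full CSDGM validation covers many more elements.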

Results: Architectural Design

4. Users search GDG using ArcCatalog or a web browser

• full-text searchable on any metadata element value

• can search using geographic extent (completely within or overlapping)

• results returned include full local path to actual dataset
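The two geographic search modes can be expressed as simple bounding-box tests. This is a minimal sketch, not the GDG's implementation; it assumes extents are given in decimal degrees and do not cross the 180° meridian. The `Extent` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def within(dataset: Extent, query: Extent) -> bool:
    """True if the dataset's extent lies completely inside the query box."""
    return (query.west <= dataset.west and dataset.east <= query.east
            and query.south <= dataset.south and dataset.north <= query.north)


def overlaps(dataset: Extent, query: Extent) -> bool:
    """True if the two boxes share any area (or touch along an edge)."""
    return (dataset.west <= query.east and query.west <= dataset.east
            and dataset.south <= query.north and query.south <= dataset.north)
```

Every extent that is `within` the query also `overlaps` it, so the "overlapping" mode returns a superset of the "completely within" results.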

Results: Software Development

• Synchronization: the term used by ESRI to describe the update of metadata using internal dataset info

© 2002 ESRI

Results: Software Development

• A custom tool, called the EPA Synchronizer, was developed based on ESRI white paper and sample code

• Written in Visual Basic using ArcObjects libraries

• Can automatically create most of the metadata, pulling values from two sources: dataset, and EME defaults database

• User then inserts Title, Abstract, Purpose, & Supplemental Info using EME
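The actual tool was written in Visual Basic with ArcObjects, and its code is not reproduced here. The merge it performs can be sketched in outline as follows, assuming both sources are reduced to simple key/value pairs and that values read from the dataset take precedence over defaults (that precedence is an assumption). The function and field names are hypothetical.

```python
# Fields the analyst still fills in by hand using EME.
MANUAL_FIELDS = ("title", "abstract", "purpose", "supplemental_info")


def synchronize(dataset_values: dict, defaults: dict) -> tuple[dict, list]:
    """Merge the two metadata sources and list the fields left for the analyst.

    Assumption: values derived from the dataset override the defaults.
    """
    record = {**defaults, **dataset_values}
    todo = [field for field in MANUAL_FIELDS if not record.get(field)]
    return record, todo


record, todo = synchronize(
    {"west": -124.6, "east": -116.5, "format": "shapefile"},  # illustrative values
    {"contact_org": "U.S. EPA", "format": "unknown"},
)
```

Here `record["format"]` is `"shapefile"` because the dataset value wins, and `todo` lists all four manual fields because neither source supplies them.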

Results: Unit Testing

Remainder of process is automated.

Results: Integrated System Testing

• Identify major commonly-used national and regional datasets

• Start process of creating metadata for them

• Automated processes for harvesting metadata would be triggered

• Full system test would be enabled

• This step has barely begun

Results: User Training and Implementation

• Implementation has not yet occurred

• Draft of instructional user documentation completed, focused on metadata creation and catalog searching

• Technical instructions detail installation and configuration of software tools, harvesting processes, and GDG administration

• Catalog (create metadata for) select existing datasets

• Create metadata for new datasets

Discussion

• Seemingly monumental challenge at first, but untapped existing resources emerged (GDG, EME, Second Copy, web server)

• Federated approach:

– autonomy in data maintenance

– non-intrusive data access

– no changes to data structure

• An elegant, minimalist solution

Discussion

• But the jury is still out.

• Odds of success would increase with:

– Dedicated permanent staff vs. temporary; GIS service and support requires GIS skills, administrative skills, and IT skills (Longley et al., 2005; Longstreth, 1995)

– A champion in the organization; someone needs to foster a high level of support for the project (Obermeyer, 1995)

– Conscious effort to overcome institutional inertia; turf battles, unwillingness to reorganize can kill a project (Evans and Ferreira, 1995)

– Formalized quality control of digital information

– Less paranoia, less government red tape

Conclusion

• Data used in a shared environment become cleaner – more complete and correct (Craig, 1995)

• Useful legacy datasets will receive new metadata

• Some unseen hurdles remain; will need a champion to see it through

• GDG team has plans to bundle EPA Synchronizer with EPA Metadata Editor

Obermeyer and Pinto, 1994

Literature Cited

Craig, William J. (1995). Why We Can’t Share Data: Institutional Inertia. In: Onsrud, H.J. and G. Rushton (Eds.) Sharing Geographic Information. Rutgers University, Center for Urban Policy Research, New Brunswick, New Jersey: 107-118.

ESRI (2002). Creating a Custom Metadata Synchronizer, An ESRI White Paper. July 2002. ESRI, Redlands, CA. http://www.esri.com, last accessed November 26, 2007.

Evans, John and J. Ferreira Jr. (1995). Sharing Spatial Information in an Imperfect World: Interactions Between Technical and Organizational Issues. In: Onsrud, H.J. and G. Rushton (Eds.) Sharing Geographic Information. Rutgers University, Center for Urban Policy Research, New Brunswick, New Jersey: 448-460a.

FGDC (1998). FGDC-STD-001-1998, Content Standard for Digital Geospatial Metadata, Federal Geographic Data Committee, June 1998.

Green, David and T. Bossomaier (2002). Online GIS and Spatial Metadata. Taylor & Francis, London; New York.

Longley, Paul A., M.F. Goodchild, D.J. Maguire, and D.W. Rhind (2005). Geographic Information Systems and Science, 2nd Ed. John Wiley & Sons, Ltd, Chichester, West Sussex, England.

Longstreth, Karl (1995). GIS Collection Development, Staffing, And Training. Journal of Academic Librarianship, vol. 21 no. 4: 267-275.

Ma, Jin (2007). SPEC Kit 298: Metadata. Association of Research Libraries, Washington, DC.

Obermeyer, Nancy J. (1995). Reducing Inter-Organizational Conflict To Facilitate Sharing Geographic Information. In: Onsrud, H.J. and G. Rushton (Eds.) Sharing Geographic Information. Rutgers University, Center for Urban Policy Research, New Brunswick, New Jersey: 138-148.

Obermeyer, Nancy J. and J.K. Pinto (1994). Managing Geographic Information Systems. The Guilford Press, New York.

Questions?