the earth system grid (esg)

The Earth System Grid (ESG) METADATA SCHEMAS IN ESG DOE SciDAC ESG Project Review Argonne National Laboratory, Illinois May 8-9, 2003

Page 1: The Earth System Grid (ESG)

The Earth System Grid (ESG)

METADATA SCHEMAS IN ESG

DOE SciDAC ESG Project Review

Argonne National Laboratory, Illinois

May 8-9, 2003

Page 2: The Earth System Grid (ESG)

May 8, 2003 Earth System Grid 2

Introduction

• ESG initial focus is on climate model data, particularly PCM/CCSM data (netCDF format).

• Consequently, our work so far has concentrated on developing or evaluating metadata schemas suited to this kind of data, specifically:
  - "ESG schema" for expressing collection-level metadata
  - NcML schema for file-level metadata
  - THREDDS schema for data cataloguing and browsing

Page 3: The Earth System Grid (ESG)

Part I

ESG Schema

Page 4: The Earth System Grid (ESG)

ESG schema: history

• Purpose-built by ESG to fulfill the specific needs of the PCM/CCSM modeling community (through ESG liaison Gary Strand)

• Several other standards were evaluated before we developed our own; none was found to be completely satisfactory:
  - Dublin Core (not rich enough for scientific data)
  - ISO (too complex to be imposed on data providers)
  - CLRC and DIF (almost OK, but not flexible enough to capture some details that are important to PCM/CCSM)

• Initial draft developed in conjunction with the UK eScience office; we are still collaborating towards a common schema or interoperability

Page 5: The Earth System Grid (ESG)

ESG schema: requirements

Information that needed to be captured in the metadata:

• Model run description (including run scenario and time period)
• Model configuration notes
  - Active/inactive components (atmosphere, ocean, ice)
  - Pointers to documentation of model components (usually on the web)
  - Input forcing datasets (which ozone dataset, sulfate dataset, etc.)
  - Site at which the model binary was built, perhaps even the compiler options that were used
  - Site where the model was run
  - Persons who carried out the model integration and submission
• Related model experiments - VERY IMPORTANT!
  - "Sibling" runs (for ensembles of runs)
  - "Parent" run (the run from which this particular experiment started)
  - "Child" runs (runs descended from this run)
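The parent/child/sibling relations listed above can be sketched as a small data structure. This is only an illustration under assumed names: `ModelRun`, its fields, and the run identifiers in `example_runs` are hypothetical and are not taken from the ESG schema itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ModelRun:
    """Hypothetical record for one model run (field names are illustrative)."""
    run_id: str
    parent_id: Optional[str] = None    # run from which this experiment started
    ensemble_id: Optional[str] = None  # runs sharing an ensemble are siblings


def children_of(runs: Dict[str, ModelRun], run_id: str) -> List[str]:
    """Runs that were started directly from `run_id`."""
    return sorted(r.run_id for r in runs.values() if r.parent_id == run_id)


def siblings_of(runs: Dict[str, ModelRun], run_id: str) -> List[str]:
    """Other members of the same ensemble as `run_id`."""
    ensemble = runs[run_id].ensemble_id
    if ensemble is None:
        return []
    return sorted(
        r.run_id
        for r in runs.values()
        if r.ensemble_id == ensemble and r.run_id != run_id
    )


def ancestors_of(runs: Dict[str, ModelRun], run_id: str) -> List[str]:
    """Chain of parent runs, nearest first; stops on a missing parent or a cycle."""
    chain: List[str] = []
    seen = {run_id}
    current = runs[run_id].parent_id
    while current is not None and current in runs and current not in seen:
        chain.append(current)
        seen.add(current)
        current = runs[current].parent_id
    return chain


def example_runs() -> Dict[str, ModelRun]:
    """Made-up runs: a control run, a spin-off, and a two-member ensemble."""
    runs = [
        ModelRun("control"),
        ModelRun("scenario", parent_id="control"),
        ModelRun("member-1", parent_id="scenario", ensemble_id="ens"),
        ModelRun("member-2", parent_id="scenario", ensemble_id="ens"),
    ]
    return {r.run_id: r for r in runs}
```

Storing only the parent and ensemble identifiers keeps each record small; child and sibling lists are derived on demand rather than stored, so they cannot drift out of sync.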

Page 6: The Earth System Grid (ESG)

ESG schema: requirements

• References to visualizations (MPGs and so on) using this model data.
• References to published journal articles/papers/presentations that have used this experiment's data.
• Miscellaneous notes
• Acknowledgment of funding agencies

Page 7: The Earth System Grid (ESG)

ESG schema: description

• Expresses collection-level metadata, i.e. logical metadata that describes a set of logically related data files (for example, a model run).

• Developed following an object model: we defined objects with properties, inheritance between objects, and relations between objects (see following slide)

• Although developed specifically for modeled data, it could be easily extended to express observational, experimental and analysis data.

• Metadata encoded in XML, conforming to an XML schema definition document (metadata syntax)

• XML metadata may be stored directly in a native XML database (Apache Xindice), or may be shredded and stored in a relational database (MySQL) within a set of purpose-built tables.

• Currently developing API for I/O of ESG metadata as XML to/from a transparent database backend
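As a rough illustration of the two storage paths described above, the sketch below encodes a collection-level record as XML and then "shreds" it into relational tables. Several things here are assumptions: the element layout (an `id` attribute with `name`, `description` and `note` children) is only loosely modelled on the property names in the object model, the two-table layout is invented for the example, and Python's built-in `sqlite3` stands in for MySQL so that the sketch is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List


def build_simulation_xml(sim_id: str, name: str, description: str,
                         notes: List[str]) -> str:
    """Encode one hypothetical Simulation record as an XML string."""
    root = ET.Element("Simulation", {"id": sim_id})
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "description").text = description
    for note in notes:
        ET.SubElement(root, "note").text = note
    return ET.tostring(root, encoding="unicode")


def shred(conn: sqlite3.Connection, xml_text: str) -> None:
    """Split an XML record into rows of two illustrative tables."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS object (id TEXT PRIMARY KEY, class TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS property "
        "(object_id TEXT, name TEXT, value TEXT)"
    )
    # One row for the object itself, one row per child element.
    conn.execute("INSERT INTO object VALUES (?, ?)", (root.get("id"), root.tag))
    conn.executemany(
        "INSERT INTO property VALUES (?, ?, ?)",
        [(root.get("id"), child.tag, child.text) for child in root],
    )
    conn.commit()
```

A generic object/property layout like this is easy to load but pushes schema knowledge into the queries; purpose-built tables per class, as the slide mentions, trade that flexibility for simpler SQL.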

Page 8: The Earth System Grid (ESG)

[Diagram: ESG schema object model. The class boxes, their properties and the relationship labels below are recovered from the slide; which classes each arrow connects is not recoverable from the transcript.]

Classes and properties (cardinality in brackets):

Object: [1] id
Activity: [0,1] name; [0,1] description; [0,1] rights; [0,n] date type= encoding=; [0,n] note; [0,n] participant role=; [0,n] reference uri=
Investigation
Project: [0,n] topic type=; [0,1] funding
Ensemble
Campaign
Simulation: [0,n] simulationInput type=; [0,n] simulationHardware
Observation
Experiment
Analysis
Dataset: [0,1] type; [0,1] conventions; [0,n] date type= encoding=; [0,n] format type= uri=; [0,1] timeCoverage; [0,1] spaceCoverage
Person: [0,1] firstName; [0,1] lastName; [0,1] contact
Institution: [0,1] name; [0,1] type; [0,1] contact
Service: [0,1] name; [0,1] description
ParameterList
Parameter: [1] name; [0,1] mapping authority=

Relationship labels: isA, isPartOf, generatedBy, isDerivedFrom, worksFor, participant role=, serviceRef, activityRef, hasParameters, hasParameter

Legend: Class, AbstractClass; inheritance, association
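The cardinality annotations in the object model (for example `[0,1] name` or `[0,n] note` on Activity) lend themselves to a simple mechanical check. The sketch below is an assumption-laden illustration: it treats each property as a child XML element, which the slide does not confirm, and `check_cardinality` is a hypothetical helper, not part of any ESG tool. A real deployment would more likely rely on XML Schema validation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# (min, max) occurrences per property; None stands for the unbounded "n".
# Values are read off the Activity box of the object model.
ACTIVITY_CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "name": (0, 1),
    "description": (0, 1),
    "rights": (0, 1),
    "date": (0, None),
    "note": (0, None),
    "participant": (0, None),
    "reference": (0, None),
}


def check_cardinality(
    element: ET.Element,
    rules: Dict[str, Tuple[int, Optional[int]]],
) -> List[str]:
    """Return a list of human-readable cardinality violations (empty if none)."""
    counts: Dict[str, int] = {}
    for child in element:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    problems: List[str] = []
    for tag, (lowest, highest) in rules.items():
        seen = counts.get(tag, 0)
        if seen < lowest:
            problems.append(f"<{tag}> occurs {seen} times, at least {lowest} required")
        if highest is not None and seen > highest:
            problems.append(f"<{tag}> occurs {seen} times, at most {highest} allowed")
    for tag in counts:
        if tag not in rules:
            problems.append(f"unexpected element <{tag}>")
    return problems
```

For instance, an Activity with two `name` children would produce a single "at most 1 allowed" message, while any number of `note` children passes.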

Page 9: The Earth System Grid (ESG)
Page 10: The Earth System Grid (ESG)
Page 11: The Earth System Grid (ESG)

Part II

NcML

NetCDF Markup Language

Page 12: The Earth System Grid (ESG)

NcML: description

• Developed as an ESG/Unidata collaboration
• XML language for expressing metadata associated with netCDF data (i.e. data following the netCDF model)
• Modular, extensible architecture: built as a set of schema modules, each fulfilling a specific function:
  - Core NcML schema: XML encoding of the file-level metadata associated with any netCDF file (i.e. the same information as contained in the netCDF header). Useful for expressing metadata in a standard encoding (XML), so that it can be processed by a large number of clients; also, metadata may be made immediately available even if the data is not (for example, when it is on remote storage).
  - Coordinate system extension: captures information related to coordinates and coordinate systems (normally encoded through netCDF conventions such as COARDS or CF). This information can be used, for example, by high-level visualization and analysis clients.
  - Dataset extension (under development): allows data aggregation and subsetting, and the definition of derived or virtual data. Aggregation metadata is used to expose a dataset independently of how (in which files) the data is actually stored.
  - Planned extension for openGIS-ISO interoperability
• NcML is automatically generated by parsing the input netCDF file(s)
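To make the core schema idea concrete, the following sketch turns netCDF header information into an NcML-style XML document. It rests on some assumptions: the header is given as a plain Python dictionary rather than read from a real file, the element and attribute names (`netcdf`, `dimension`, `variable`, `attribute`, `location`, `shape`) follow later published NcML and may differ from the 2003 draft, the XML namespace is omitted, and `EXAMPLE_HEADER` is made-up data.

```python
import xml.etree.ElementTree as ET

# Made-up header content standing in for what a netCDF parser would return.
EXAMPLE_HEADER = {
    "location": "example.nc",
    "dimensions": {"time": 12, "lat": 64, "lon": 128},
    "attributes": {"Conventions": "CF-1.0"},
    "variables": [
        {
            "name": "tas",
            "type": "float",
            "shape": ["time", "lat", "lon"],
            "attributes": {"units": "K"},
        }
    ],
}


def header_to_ncml(header: dict) -> str:
    """Encode header-level metadata (no data values) as NcML-style XML."""
    root = ET.Element("netcdf", {"location": header["location"]})
    for name, length in header["dimensions"].items():
        ET.SubElement(root, "dimension", {"name": name, "length": str(length)})
    for name, value in header.get("attributes", {}).items():
        ET.SubElement(root, "attribute", {"name": name, "value": str(value)})
    for var in header["variables"]:
        node = ET.SubElement(
            root,
            "variable",
            {
                "name": var["name"],
                "type": var["type"],
                "shape": " ".join(var["shape"]),
            },
        )
        for name, value in var.get("attributes", {}).items():
            ET.SubElement(node, "attribute", {"name": name, "value": str(value)})
    return ET.tostring(root, encoding="unicode")
```

Because only the header is encoded, the resulting document stays small and can be served to clients even when the data itself sits on remote or offline storage.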

Page 13: The Earth System Grid (ESG)

NcML: schemas architecture

[Diagram: NcML schema modules]

NcML core (generic netCDF data)
NcML Coordinate Systems (netCDF conventions for coordinates and coordinate systems)
NcML dataset (aggregation, operations on data)
openGIS-ISO
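The "aggregation" role of the NcML dataset module can be illustrated with a minimal sketch of a virtual dataset: several files, each holding a stretch of the time axis, are presented as one continuous axis. This is not the NcML aggregation syntax, which was still under development; `TimeAggregation` and the file names in the usage note are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


class TimeAggregation:
    """Presents several files as one logical time axis (illustrative only)."""

    def __init__(self, files: List[Tuple[str, int]]):
        # `files` holds (path, number of time steps) pairs in time order.
        self.paths = [path for path, _ in files]
        # offsets[i] is the global index of the first time step in file i.
        self.offsets = [0] + list(accumulate(steps for _, steps in files))

    def __len__(self) -> int:
        return self.offsets[-1]

    def locate(self, t: int) -> Tuple[str, int]:
        """Map a global time index to (file path, index within that file)."""
        if not 0 <= t < len(self):
            raise IndexError(f"time index {t} out of range")
        i = bisect_right(self.offsets, t) - 1
        return self.paths[i], t - self.offsets[i]
```

With `TimeAggregation([("a.nc", 12), ("b.nc", 12), ("c.nc", 6)])`, a client asks for time step 13 and is directed to the second step of `b.nc` without needing to know how the data is split across files.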

Page 14: The Earth System Grid (ESG)

Part III

THREDDS

Page 15: The Earth System Grid (ESG)

THREDDS

• Project led by Unidata in collaboration with many universities and research groups

• Aimed at developing a standard for hierarchical cataloguing of data and associated metadata

• Allows cross-browsing of catalogs and associated metadata, and federation of data holdings among multiple repositories

• ESG is currently evaluating THREDDS technology: we produced and published on the web THREDDS catalogs for 16 PCM runs

• Ultimately, ESG might decide to produce THREDDS catalogs for all of its data holdings, either as a separate process or by generating them from other metadata sources
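A hierarchical catalog of the kind described above can be sketched as nested `dataset` elements inside a `catalog` root, which is the general shape of THREDDS catalog XML. The sketch leaves out namespaces, service definitions and metadata blocks, and the run and file names used with it are invented; it should be read as an illustration of the nesting, not as a valid THREDDS catalog.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_catalog(name: str, runs: Dict[str, List[str]]) -> ET.Element:
    """Build a catalog with one collection dataset per run and one leaf per file."""
    catalog = ET.Element("catalog", {"name": name})
    top = ET.SubElement(catalog, "dataset", {"name": name})
    for run_name, files in runs.items():
        run = ET.SubElement(top, "dataset", {"name": run_name})
        for file_name in files:
            # Only leaf datasets carry a urlPath, i.e. point at actual data.
            ET.SubElement(
                run,
                "dataset",
                {"name": file_name, "urlPath": f"{run_name}/{file_name}"},
            )
    return catalog


def list_leaves(catalog: ET.Element) -> List[str]:
    """Return the urlPath of every leaf dataset, in document order."""
    return [
        d.get("urlPath")
        for d in catalog.iter("dataset")
        if d.get("urlPath") is not None
    ]
```

Generating such a catalog from another metadata source, one of the options mentioned on the slide, would amount to producing the `runs` mapping from that source instead of writing it by hand.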

Page 16: The Earth System Grid (ESG)
Page 17: The Earth System Grid (ESG)
Page 18: The Earth System Grid (ESG)

Part IV

Conclusions

Page 19: The Earth System Grid (ESG)

Future Development

• Schema conversion: automatic generation of metadata conforming to other standards from ESG collection-level metadata
  - DIF, for publishing to the GCMD discovery system (also, DIF can be converted to ISO)
  - Dublin Core, for publishing to digital libraries

• Aggregation metadata:
  - Finalize the NcML dataset extension
  - Conversion of NcML aggregation metadata into:
    - CDML (for CDAT visualization)
    - LAS (for analysis of data through LAS)

• Ontologies for scientific schemas interoperability
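The schema conversion item can be sketched as a field-by-field mapping from a collection-level record to Dublin Core elements. The Dublin Core element names and namespace are the standard ones; everything else is an assumption made for illustration: the input layout (an `id` attribute plus `name`, `description`, `rights` and `date` children), the `ESG_TO_DC` mapping, and the `metadata` wrapper element.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from record properties to Dublin Core element names.
ESG_TO_DC = {
    "name": "title",
    "description": "description",
    "rights": "rights",
    "date": "date",
}


def to_dublin_core(esg_xml: str) -> str:
    """Convert one collection-level record to a flat Dublin Core record."""
    ET.register_namespace("dc", DC_NS)
    source = ET.fromstring(esg_xml)
    out = ET.Element("metadata")
    identifier = source.get("id")
    if identifier:
        ET.SubElement(out, f"{{{DC_NS}}}identifier").text = identifier
    for child in source:
        dc_name = ESG_TO_DC.get(child.tag)
        if dc_name is not None:  # properties with no counterpart are dropped
            ET.SubElement(out, f"{{{DC_NS}}}{dc_name}").text = child.text
    return ET.tostring(out, encoding="unicode")
```

The conversion is lossy by design: properties without a Dublin Core counterpart, such as simulation inputs, are simply dropped, which matches the earlier observation that Dublin Core is not rich enough for scientific data.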

Page 20: The Earth System Grid (ESG)

Collaborations and Impact

• COLLABORATIONS
  - PCM/CCSM modeling community ("ESG schema")
  - UK eScience office ("ESG schema")
  - Unidata (NcML)

• FEDERATIONS
  - THREDDS servers
  - GCMD search and discovery engine
  - Digital libraries

• IMPACT
  - ESG schema could be adopted by a wide scientific community
  - NcML may become a standard for XML encoding of netCDF data
  - NcML will be used as the standard for the Unidata DODS aggregation server