the earth system curator: integration technology for ...vb/talks/curator2007.pdffunded by nsf and...

25
The Earth System Curator: integration technology for models and data ESMF Community Meeting, Boulder V. Balaji ([email protected] ) 1 1 Princeton University NOAA/GFDL 30 May 2007 Balaji et al (NOAA/GFDL) Curator 30 May 2007 1 / 23

Upload: others

Post on 28-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

The Earth System Curator: integration technologyfor models and data

ESMF Community Meeting, Boulder

V. Balaji ([email protected] )1

1Princeton University

NOAA/GFDL

30 May 2007

Balaji et al (NOAA/GFDL) Curator 30 May 2007 1 / 23

Page 2: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Outline

1 What is a curatorConvergence of models and dataModel and grid metadata

2 Curator use casesDynamically derived data cataloguesOffline modelsBranch runs

3 Curator implementationsGFDL Curator and CDP CuratorFRE: a prototype for CRELinks with ESMF and MAPLLinks to other projects

4 Summary

Balaji et al (NOAA/GFDL) Curator 30 May 2007 2 / 23

Page 3: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Outline

1 What is a curatorConvergence of models and dataModel and grid metadata

2 Curator use casesDynamically derived data cataloguesOffline modelsBranch runs

3 Curator implementationsGFDL Curator and CDP CuratorFRE: a prototype for CRELinks with ESMF and MAPLLinks to other projects

4 Summary

Balaji et al (NOAA/GFDL) Curator 30 May 2007 3 / 23

Page 4: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

The routine use of Earth System models in researchand operations

Let’s declare that 2000-2010 (the “noughties”) is the decade of thecoming-of-age of Earth system models.

Operational forecasting model-based seasonal forecasts delivered tothe public;

Decision support models routinely run for climate policy, energystrategy, risk pricing.

Fundamental research the use of models to develop a predictiveunderstanding of the earth system and to provide a soundunderpinning for all applications above.

This will require a radical shift in the way we do modeling: aninfrastructure for moving the building, running and analysis of modelsand model output data from the “heroic” mode to the routine mode.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 4 / 23

Page 5: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Curators: a noughties technology

The comparative study of climate simulations (e.g IPCC) acrossmany models has spawned efforts to build uniform access tooutput datasets from major climate models, as well as modelingframeworks that will promote uniform access to the modelsthemselves.The key element in the integration will be a curator. A curatorbegins begins with a crucial insight: that the descriptors used forcomprehensively specifying a model configuration are needed fora scientifically useful description of the model output data as well.Thus the same attributes may be used to specify a model as wellas the model output dataset: thus leading to a convergence ofmodels and data.ESC – the Earth System Curator – is a pilot project buildingprototype elements of such a system. The current project isfunded by NSF and brings together NCAR, Princeton, MIT andGA Tech.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 5 / 23

Page 6: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Curators: a noughties technology

The comparative study of climate simulations (e.g IPCC) acrossmany models has spawned efforts to build uniform access tooutput datasets from major climate models, as well as modelingframeworks that will promote uniform access to the modelsthemselves.The key element in the integration will be a curator. A curatorbegins begins with a crucial insight: that the descriptors used forcomprehensively specifying a model configuration are needed fora scientifically useful description of the model output data as well.Thus the same attributes may be used to specify a model as wellas the model output dataset: thus leading to a convergence ofmodels and data.ESC – the Earth System Curator – is a pilot project buildingprototype elements of such a system. The current project isfunded by NSF and brings together NCAR, Princeton, MIT andGA Tech.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 5 / 23

Page 7: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Curators: a noughties technology

The comparative study of climate simulations (e.g IPCC) acrossmany models has spawned efforts to build uniform access tooutput datasets from major climate models, as well as modelingframeworks that will promote uniform access to the modelsthemselves.The key element in the integration will be a curator. A curatorbegins begins with a crucial insight: that the descriptors used forcomprehensively specifying a model configuration are needed fora scientifically useful description of the model output data as well.Thus the same attributes may be used to specify a model as wellas the model output dataset: thus leading to a convergence ofmodels and data.ESC – the Earth System Curator – is a pilot project buildingprototype elements of such a system. The current project isfunded by NSF and brings together NCAR, Princeton, MIT andGA Tech.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 5 / 23

Page 8: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Linking model and data frameworks

Community data frameworks (e.g ESG, an ESC partner) are underdevelopment, at various institutions, informally linked by the GO-ESSP.For model output data to be scientifically useful, the researcher musthave some knowledge of how the data was produced. Model datarequires a model’s eye view description of the data, another layer ofmetadata, which might include:

Description of model components: e.g GEOS-5 atmosphere, landand sea ice coupled to MIT ocean.

Description of grid configurations and resolutions.

Choice of physics packages and input parameters.

Model state and its fields.

ESMF and PRISM are emerging standards that allow the developmentof the model metadata layer, based on the state data structures and itsbase classes. (Think State , Grid , LocStream , . . . )

Balaji et al (NOAA/GFDL) Curator 30 May 2007 6 / 23

Page 9: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Semantic vs. syntactic, discovery vs. use

Descriptive metadata can be succinct, and can be used to discovercertain aspects of the data. But almost any serious use requiresdeeper knowledge. The boundary between discovery and use,semantic and syntactic, is blurred by the use of controlled vocabulariesand ontologies.

Graphics such as this from Heldand Soden (2006) are so routinelyproduced from the IPCC AR4database that we’ve ceased tomarvel at it. This is a composite ofoutput from 20 models worldwide,run with minimal coordination.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 7 / 23

Page 10: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Model and grid metadata

Physical fields: standard vocabulary for describing the relevantphysical quantities (viz. CF standard_name ).

Geospatial information: location information: latitude, longitude,elevation. This set of standards unites a much largercommunity (mobile phones, GIS), in which our communityhas begun to play a role. We can provide some usefulextensions toward 3D and 4D data.

Grid structure: interrelations between grids, between points and grids.With this information available, it is perhaps possible toperform regridding and subsampling of data by userrequest, on the archive servers.

Model metadata: describing data source comprehensively, relativelyeasy for observations, harder for models but canasymptote toward completeness starting from currentPCMDI standard. Two levels of model metadata:components and applications.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 8 / 23

Page 11: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Model metadata

Application metadata:experiment, scenario,institution, contact: currentlycovered by CF/CMOR.

Component metadata:physical description ofcomponent: currently coveredby CMOR, extended by NMM.

Coupler metadata: inventoryof export and import fields,interpolation methods.Currently covered by OASIS4XML, not exported to modeloutput. Associated with anXGrid.

Application

GridComp GridCompCoupler

Grid GridXGrid

Field Field Field

Field Field Field

Balaji et al (NOAA/GFDL) Curator 30 May 2007 9 / 23

Page 12: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Grid metadataThe Mosaic Gridspec is now underconsideration as a draft CFstandard. Contains information fordifferentiation, integration andregridding on generalized grids.Currently implemented for couplingmodels at GFDL (cubed-sphereatmosphere, tripolar ocean, lat-lonland and river grids) and as anXML schema for data web servicesin the European Genie project.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 10 / 23

Page 13: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Outline

1 What is a curatorConvergence of models and dataModel and grid metadata

2 Curator use casesDynamically derived data cataloguesOffline modelsBranch runs

3 Curator implementationsGFDL Curator and CDP CuratorFRE: a prototype for CRELinks with ESMF and MAPLLinks to other projects

4 Summary

Balaji et al (NOAA/GFDL) Curator 30 May 2007 11 / 23

Page 14: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Dynamically derived data catalogues

From Guilyardi (2006): for the most part, tables such as this arelaboriously filled by hand. Currently deployed data frameworks(PCMDI, GFDL, DDC) are becoming capable of dynamicallygenerating these tables.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 12 / 23

Page 15: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Disease vectors in a changing climate

Koelle et al, Nature, 2005: Refractory periods and climate forcing incholera dynamics. Requires monthly forcing data, no feedback. Thisusage is typical of IPCC WG2 users.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 13 / 23

Page 16: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Statistical downscaling of climate change projections

Hayhoe et al, PNAS, 2004: Emissions pathways, climate change, andimpacts on California. Uses daily data for “heat degree days” and otherderived quantities.What if it requires data beyond that provided by IPCC AR4 SOPs(1960-2000)?

Balaji et al (NOAA/GFDL) Curator 30 May 2007 14 / 23

Page 17: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Alternate energy sources

Keith et al, PNAS, 2005: The influence of large-scale wind power onglobal climate.Feedback on atmospheric timescales: but does not require model tobe retuned.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 15 / 23

Page 18: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Outline

1 What is a curatorConvergence of models and dataModel and grid metadata

2 Curator use casesDynamically derived data cataloguesOffline modelsBranch runs

3 Curator implementationsGFDL Curator and CDP CuratorFRE: a prototype for CRELinks with ESMF and MAPLLinks to other projects

4 Summary

Balaji et al (NOAA/GFDL) Curator 30 May 2007 16 / 23

Page 19: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

GFDL Curator and CDP Curator

GFDL Curator currently creates dynamic datacatalogues from metadata (currently not fromthe Curator schema itself, but from metadataembedded in the datasets: integration withschema is underway).

The CDP Curatorprovides an interfaceboth query and archivalof ESMF components.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 17 / 23

Page 20: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Operational use of model frameworks

The next stage in the evolution of frameworks is the addition of aruntime environment.

Source code maintenance across many repositories;

Model configuration, launching and regression testingencapsulated in XML descriptors;

Relational database for archived model results;

Standard and custom diagnostic suites;

Branching: "descent with modification".

Balaji et al (NOAA/GFDL) Curator 30 May 2007 18 / 23

Page 21: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

FRE: a prototype for CRE

The FMS Runtime Environment (FRE) describes all the steps forconfiguring and running a model jobstream; archiving, postprocessingand analysis of model results.fremake, frerun, frepp, frecheck, ...The Regression Test Suite (RTS) is a set of tests that are runcontinuously on a set of FMS models to maintain and verify codeintegrity.FRE was successfully used at GFDL for the development of climatemodels targeted for IPCC (CM2.0 and CM2.1) and management ofGFDL’s IPCC data. We are currently merging FRE and ESC schema tomake FRE serve as a prototype Curator Runtime Environment.http://www.gfdl.noaa.gov/˜fms/fre

Balaji et al (NOAA/GFDL) Curator 30 May 2007 19 / 23

Page 22: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

ESC links to ESMF and MAPL

ESMF data structures (Grid ,Field , LocStream ,ConfigAttr , State , . . . ) areideal containers for holding themetadata. Tools are underdevelopment for extracting theCurator metadata from ESMFcomponents registered underthe Community Data Portal.

MAPL specifications ofcoupling will be encoded inCurator coupler metadata,which itself is being developedon the basis of the OASIS4(PRISM) schema.

Application

GridComp GridCompCoupler

Grid GridXGrid

Field Field Field

Field Field Field

Balaji et al (NOAA/GFDL) Curator 30 May 2007 20 / 23

Page 23: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

ESC links with other projects

Earth System Grid is developing key technology for serving datafrom multiple modeling centers (IPCC, NARCCAP), and arepartners in the metadata definition effort focused on AR5.

Numerical Model Metadata (NMM) based in the University ofReading defines discovery metadata for Earth System Models,and have converged with ESC schema where there is overlap.The Metafor proposal seeks to formalize the relationship further.

The FLUME project at the UK Met Office is developing similarmetadata and is interested in looking at auto-generation of FLUME"glue" using the common information model. Also part of Metafor.

Balaji et al (NOAA/GFDL) Curator 30 May 2007 21 / 23

Page 24: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Outline

1 What is a curatorConvergence of models and dataModel and grid metadata

2 Curator use casesDynamically derived data cataloguesOffline modelsBranch runs

3 Curator implementationsGFDL Curator and CDP CuratorFRE: a prototype for CRELinks with ESMF and MAPLLinks to other projects

4 Summary

Balaji et al (NOAA/GFDL) Curator 30 May 2007 22 / 23

Page 25: The Earth System Curator: integration technology for ...vb/talks/curator2007.pdffunded by NSF and brings together NCAR, Princeton, MIT and GA Tech. Balaji et al (NOAA/GFDL) Curator

Summary

Curators are a natural outgrowth of frameworks. By blurring theedges between models and data, between objects and services,they are a typical "fuzzy boundary" technology. (Roger Sessions,ACM Queue, 2004: Fuzzy Boundaries: Objects, Components,and Web Services.)

The ESC project is drawing up a metadata architecture which canaggregate within a single information model, metadata layers (e.gcomponent, coupler, campaign) to be assigned to differentdomains of expertise.

A functional schema-driven runtime environment allowingcomposition, configuration, running, archival, and analysis ofEarth system model data is already a (limited) reality, and we arecurrently probing its limits and imposing generalizations.

http://www.earthsystemcurator.org

Balaji et al (NOAA/GFDL) Curator 30 May 2007 23 / 23