“curator” DB design Curator meeting, GFDL, Sep 20


Upload: avis-jacobs

Post on 04-Jan-2016

Page 1: “curator” DB design Curator meeting, GFDL, Sep 20

“curator” DB design

Curator meeting, GFDL, Sep 20

Page 2: Why RDBMS

- A lot of information:
  - Model metadata
  - Experiment metadata
  - Institution/user metadata
  - Data metadata
- Most of it is in textual form.
- The information is tightly linked internally, which is easy to express in a relational database.
- Relational databases have well-developed means for searching and extracting data (the SQL query language and program interfaces for any language), for local as well as remote users.
- A very reliable and safe technology.
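To make the point about tightly linked metadata concrete, here is a minimal sketch of two linked tables and one SQL query across them. The table names Coupled_Models and Experiments and the model names CM2.0 and CM2.1 come from later slides; every column name, the experiment names, and the use of in-memory SQLite in place of the MySQL server are assumptions made for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (assumption).
# All column names and experiment names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Coupled_Models (
    model_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE Experiments (
    exp_id   INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    model_id INTEGER NOT NULL REFERENCES Coupled_Models(model_id)
);
INSERT INTO Coupled_Models VALUES (1, 'CM2.0'), (2, 'CM2.1');
INSERT INTO Experiments VALUES
    (10, 'control_run', 1),
    (11, 'scenario_run', 2),
    (12, 'control_run', 2);
""")


def experiments_for_model(conn, model_name):
    """Return the names of all experiments run with the given coupled model."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM Experiments AS e
        JOIN Coupled_Models AS m ON m.model_id = e.model_id
        WHERE m.name = ?
        ORDER BY e.exp_id
        """,
        (model_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(experiments_for_model(conn, "CM2.1"))
```

The link between an experiment and its model is a single foreign key, and the question "which experiments used this model" becomes one join rather than a search through text files.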

Page 3: Desirable Features of Model Data Factory

- Relational database storing metadata, containing descriptions of:
  - model components and model configuration
  - scenarios
  - postprocessing (model output and CMOR) directives
  - experiments
  - variables
  - formalized rules of Quality Control
  - data locations
  - task scheduler
  - user and group accounts
- XML as data exchange format:
  - for compliance with FRE
  - working format of existing third-party software
  - well suited to hierarchical metadata description
  - prevalent worldwide, easy to exchange with other Data Portals
- Model Builder (FMS Runtime Environment in GFDL):
  - checks out available model components from the DB
  - chooses model datasets from the DB
  - sets postprocessing directives
  - checks compatibility of components and configurations
  - builds the executable application and runs it
  - writes metadata about the experiment into the DB (model configuration, scenario, project, organization/user, postprocessing)
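As an illustration of why XML suits hierarchical metadata and how it can still feed relational tables, the sketch below flattens a nested XML description into (parent, child) pairs that could each become one table row. The element names, attribute names, and component names are invented for this example and are not taken from FRE or from the “curator” DB.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are invented for illustration.
SAMPLE = """
<coupled_model name="example_model">
  <component name="atmosphere">
    <component name="radiation"/>
  </component>
  <component name="ocean"/>
</coupled_model>
"""


def flatten(xml_text):
    """Return a (parent_name, child_name) pair for every nested element."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent:
            pairs.append((parent.get("name"), child.get("name")))
            stack.append(child)
    return pairs


for parent, child in flatten(SAMPLE):
    print(parent, "->", child)
```

Each pair maps naturally onto a row with a parent reference, which is one common way to store a hierarchy in a relational table.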

Page 4: Desirable Features of Model Data Factory (continued)

- Climate Model Output Rewriter (CMOR) subsystem:
  - prepares data consistently with specific project requirements
- Data Publisher:
  - transfers data to Data Portal storage in accordance with settings from the DB
- Data Portal Software Package:
  - Configuration Manager (configures Aggregation Server and Data Portal Interface)
  - Search Catalog Engine
  - Data Subsampling Engine
  - Data Computation Engine
  - Data Visualization
  - Data Delivery Manager

Page 5: Standard scenario of a functioning Model Data Factory (ideal picture)

- A scientist builds a model in FRE using available model components, datasets and a forcing scenario.
- FRE puts metadata about the built model, scenario and experiment into the “curator” DB and runs the experiment.
- The postprocessing subsystem extracts metadata about the postprocessing plan from the “curator” DB and executes it; on finishing, it puts metadata about the processed experiment back into the DB.
- The Data Publisher (DP) regularly checks the “curator” DB for new experiments marked as “public” and, if it finds any, invokes CMOR.
- CMOR goes to the “curator” DB for metadata and processes the needed data following the metadata instructions.
- DP calls QAC and then transfers the data to Data Portal storage.
- The Configuration Manager configures the Aggregation Server and Data Portal Interface and puts records about the new public data in the “curator” DB.
- End of process: the data is ready to go.
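The Data Publisher steps in this scenario can be sketched as a single polling pass over the database. This is only an illustration under stated assumptions: the `visibility` and `published` columns, the function name `publish_pending`, and the three callables standing in for CMOR, QAC and the transfer step are all hypothetical, and in-memory SQLite replaces the MySQL server.

```python
import sqlite3


def publish_pending(conn, run_cmor, run_qac, transfer):
    """One polling pass: process experiments marked public but not yet published."""
    pending = conn.execute(
        "SELECT exp_id, name FROM Experiments "
        "WHERE visibility = 'public' AND published = 0 ORDER BY exp_id"
    ).fetchall()
    done = []
    for exp_id, name in pending:
        run_cmor(name)   # rewrite output to the project's requirements
        run_qac(name)    # quality check before anything leaves the site
        transfer(name)   # copy to Data Portal storage
        conn.execute(
            "UPDATE Experiments SET published = 1 WHERE exp_id = ?", (exp_id,)
        )
        done.append(name)
    conn.commit()
    return done


# Hypothetical table layout and experiment names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Experiments ("
    "exp_id INTEGER PRIMARY KEY, name TEXT, "
    "visibility TEXT, published INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO Experiments (name, visibility) VALUES (?, ?)",
    [("exp_a", "public"), ("exp_b", "private"), ("exp_c", "public")],
)

log = []
steps = [lambda n, s=s: log.append((s, n)) for s in ("cmor", "qac", "transfer")]
first = publish_pending(conn, *steps)
second = publish_pending(conn, *steps)
print(first, second)
```

Marking each experiment as published in the same pass keeps a later poll from processing it twice, which is why the second call finds nothing to do.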

Page 6: Common functionality schema of ‘Model Data Factory’

Page 7: Database ‘curator’ design

Database compartments:

- Model Metadata Compartment: contains model descriptions; allows building a coupled model of the needed configuration
- Variables Compartment: list of all related physical variables
- Workflow Compartment: contains information on scenarios, experiments, institutions, projects and users
- Postprocessing Compartment: defines the postprocessing plan for an experiment
- Data Portal Compartment: contains information about experiment data
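One way to picture how the compartments depend on each other is a schema in which tables from different compartments are tied together by foreign keys. The sketch below uses table names that appear on later slides (Coupled_Models, Variables, Experiments, Data_Files); the columns, the sample values and the links between the tables are assumptions, and in-memory SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Model Metadata Compartment (columns hypothetical)
CREATE TABLE Coupled_Models (model_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Variables Compartment
CREATE TABLE Variables (var_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Workflow Compartment
CREATE TABLE Experiments (
    exp_id   INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    model_id INTEGER NOT NULL REFERENCES Coupled_Models(model_id)
);
-- Data Portal Compartment
CREATE TABLE Data_Files (
    file_id INTEGER PRIMARY KEY,
    path    TEXT NOT NULL,
    exp_id  INTEGER NOT NULL REFERENCES Experiments(exp_id),
    var_id  INTEGER NOT NULL REFERENCES Variables(var_id)
);
""")

conn.execute("INSERT INTO Coupled_Models VALUES (1, 'CM2.1')")
conn.execute("INSERT INTO Variables VALUES (1, 'example_variable')")
conn.execute("INSERT INTO Experiments VALUES (1, 'example_experiment', 1)")
conn.execute("INSERT INTO Data_Files VALUES (1, '/hypothetical/path.nc', 1, 1)")


def try_orphan_file(conn):
    """Try to register a file for an experiment that does not exist."""
    try:
        conn.execute(
            "INSERT INTO Data_Files VALUES (2, '/hypothetical/orphan.nc', 99, 1)"
        )
    except sqlite3.IntegrityError:
        return False
    return True


print(try_orphan_file(conn))
```

With the constraints in place the database itself refuses a data file that points at an unknown experiment, so the Data Portal Compartment cannot drift out of step with the Workflow Compartment.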

Page 8: MySQL DB CURATOR

Page 9: Model Metadata Compartment (in development)

Tables shown: Coupled_Models, Model_List, Component_Medias, Models

Links to other compartments: Experiments (Workflow Compartment), Variables (Variables Compartment)

Page 10: Data Samples from Model Compartment

Tables shown: Components_Medias, Coupled_Models, Model_List, Models

Page 11: Variables Compartment

Tables shown: Variables, Variable_Bundles, Variable_Lists, Variable_List_Contents, Proj_Var_Names

Links to other compartments: Projects (Workflow Compartment)
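The table names Variable_Lists and Variable_List_Contents suggest a many-to-many link between lists and variables. The sketch below shows one plausible reading of that layout; the slide does not give the columns, so every column name and sample value here is an assumption, and in-memory SQLite replaces MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names and sample rows are hypothetical.
conn.executescript("""
CREATE TABLE Variables (var_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Variable_Lists (list_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Variable_List_Contents (
    list_id INTEGER NOT NULL REFERENCES Variable_Lists(list_id),
    var_id  INTEGER NOT NULL REFERENCES Variables(var_id),
    PRIMARY KEY (list_id, var_id)
);
INSERT INTO Variables VALUES (1, 'var_a'), (2, 'var_b'), (3, 'var_c');
INSERT INTO Variable_Lists VALUES (1, 'list_one'), (2, 'list_two');
INSERT INTO Variable_List_Contents VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")


def variables_in_list(conn, list_name):
    """Return the names of the variables that belong to the named list."""
    rows = conn.execute(
        """
        SELECT v.name
        FROM Variable_Lists AS l
        JOIN Variable_List_Contents AS c ON c.list_id = l.list_id
        JOIN Variables AS v ON v.var_id = c.var_id
        WHERE l.name = ?
        ORDER BY v.name
        """,
        (list_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(variables_in_list(conn, "list_one"))
print(variables_in_list(conn, "list_two"))
```

Keeping list membership in its own table lets one variable appear in several lists without duplicating its description.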

Page 12: Data Sample from Variables Compartment

Tables shown: Variable_Lists, Variable_List_Contents, Proj_Var_Names, Variables, Variable_Bundles

Page 13: Workflow Compartment (in development)

Tables shown: Institutions, GFDL_USERS, Experiment_Status, Realization, Projects, Experiments, Scenarios

Page 14: Data Samples from Workflow Compartment

Tables shown: Experiments, Scenarios

Page 15: Postprocessing Compartment

Tables shown: Coupled_Models, PP_Units, Post_Proc, PP_Content, Variable_Lists, Projects, GFDL_USERS, Average_Periods

Data Samples from Postprocessing Compartment: PP_Units, PP_Content

Page 16: Data Portal Compartment

Tables shown: MissedData_Descriptors, Data_Grids, Data_Files, Variables, Experiments, Variable_Bundles, Coupled_Models

Page 17: Data Samples from Data Portal Compartment

Tables shown: Data_Files, Data_Grids, MissedData_Descriptors

Page 18: “curator” DB is in use now

- CM2.0
- CM2.1

Page 19: Future Development

- Bring DB terms into line with conventional terminology.
- Set up model metadata schema standards and create tables in the “curator” DB following this schema.
- Fill these tables with real metadata extracted from the models of GFDL, CCSM and MIT, and from the ESMF Component Database.
- Implement tables for observation data metadata.
- Implement DODS aggregated data support.
- Build an XML bridge for transcoding DB input/output to and from XML.
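A rough idea of what such an XML bridge might do is sketched below: export the rows of a table as XML and read them back. The functions `table_to_xml` and `xml_to_rows`, the row-per-element layout, and the columns of the Scenarios table are all hypothetical, and in-memory SQLite stands in for MySQL.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, columns):
    """Serialize the chosen columns as <table><row col="value"/>...</table>."""
    # Identifiers cannot be bound as SQL parameters, so the table and
    # column names must come from trusted code, never from user input.
    query = "SELECT {} FROM {} ORDER BY 1".format(", ".join(columns), table)
    root = ET.Element(table)
    for row in conn.execute(query):
        attrs = {col: str(value) for col, value in zip(columns, row)}
        ET.SubElement(root, "row", attrs)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text):
    """Parse the XML produced by table_to_xml back into a list of dicts."""
    return [dict(row.attrib) for row in ET.fromstring(xml_text)]


# Hypothetical columns and values for the Scenarios table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scenarios (scenario_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Scenarios VALUES (?, ?)", [(1, "scenario_a"), (2, "scenario_b")]
)

xml_text = table_to_xml(conn, "Scenarios", ["scenario_id", "name"])
print(xml_text)
print(xml_to_rows(xml_text))
```

All values come back as strings after the round trip, so a real bridge would also need the schema to restore column types.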

Page 20: END

Questions? Suggestions? Objections?

Thanks!