Metadata Concepts / Use in Climate Research. Stephan Kindermann, Martina Stockhause, German Climate Computing Center (DKRZ), Hamburg, Germany

Upload: moses

Post on 16-Jan-2016



TRANSCRIPT

Page 1: Metadata Concepts / Use  in Climate Research

Metadata Concepts / Use in Climate Research

Stephan Kindermann, Martina Stockhause

German Climate Computing Center (DKRZ)

Hamburg, Germany

Page 2: Metadata Concepts / Use  in Climate Research

Overview

Metadata descriptions: sources, usage

data level, preservation level, model level, domain knowledge level

Metadata standards, IT-principles

Page 3: Metadata Concepts / Use  in Climate Research

A) Metadata descriptions: sources, usage

(I) Data Description Level:

source: model run output

format: GRIB, netCDF3/4 container formats (including basic metadata)

metadata homogenization („Climate and Forecast Convention (CF)“ conformance, CMOR2 compliance, controlled vocabularies)

usage: analysis tools, data access scripts, data search („linked data principle“)

(II) Data Preservation Level:

target: legacy data centers (e.g. WDCC)

format: internal DB, various external formats, e.g. ISO 19139, DIF, ..

usage: long term data storage and access, citation, e.g. using DOIs

Page 4: Metadata Concepts / Use  in Climate Research

A) Metadata descriptions: sources, usage

(III) Model Description Level:

source: researcher interviews, online questionnaire

format: CIM (Common Information Model, from the Metafor FP7 project „Common Metadata for Climate Modelling Digital Repositories“; Con-CIM: UML, APP-CIM: XSD + vocabs)

usage: model intercomparison, scientific portals, information space browsing / search

(IV) Semantic Annotation Level:

source: data metadata, model metadata, domain knowledge

metadata format: OWL (RDF)

usage: user navigation in portals, „faceted search“ etc.

deployments: Earth System Grid CMIP5 portal, IS-ENES portal

Page 5: Metadata Concepts / Use  in Climate Research

.. Short Background Info ..

The Fifth Coupled Model Intercomparison Project (CMIP5)

– Sponsored by the WMO WGCM

– Quality Controlled Data to (eventually) appear in the IPCC Data Distribution Centre…

• World wide data management infrastructure building effort, consistent workflow from producers to consumers...

In Numbers:

Simulations:

~90,000 years

~60 experiments

~20 modelling centres using ~30 major(*) model configurations

~2 million output datasets

~10's of petabytes of output

~2 petabytes of CMIP5 requested output

~1 petabyte of CMIP5 “replicated” output

– Which will be replicated at BADC & DKRZ, to arrive in 2010/2011!

~10 TB of land-biochemistry (from the long term experiments alone).

Page 6: Metadata Concepts / Use  in Climate Research

B) Metadata standards, IT principles

(I) Data Description Level:

GRIB, netCDF data containers (10's of PBytes): metadata, data

File naming convention based on CVs building uniform URIs (DRS, Data Reference Syntax)

Activity/Product/Institute/Model/Exp/frequ/realm/Variable/ensemble

Data servers, MD catalogue servers

wget http://server.org/Activity/Product/../ensemble

Enabling „linked data“
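The DRS idea on this slide, one fixed sequence of controlled-vocabulary components forming a uniform URI, can be sketched in a few lines of Python. This is only an illustration: the component order follows the slide, while the function names, the server name and all facet values in the example are assumptions and are not taken from the CMIP5 DRS document.

```python
from urllib.parse import quote

# Component order as listed on the slide (names spelled out for readability).
DRS_COMPONENTS = (
    "activity", "product", "institute", "model", "experiment",
    "frequency", "realm", "variable", "ensemble",
)


def drs_path(**facets: str) -> str:
    """Join the facet values in DRS order into a slash-separated path."""
    missing = [name for name in DRS_COMPONENTS if name not in facets]
    if missing:
        raise ValueError(f"missing DRS components: {missing}")
    # Percent-encode each value so that a stray '/' cannot add a path level.
    return "/".join(quote(facets[name], safe="") for name in DRS_COMPONENTS)


def drs_url(server: str, **facets: str) -> str:
    """Prefix the DRS path with a data server address (hypothetical here)."""
    return f"{server.rstrip('/')}/{drs_path(**facets)}"


# Illustrative facet values only; real values come from the project's CVs.
example = drs_url(
    "http://server.org",
    activity="CMIP5", product="output", institute="MPI-M",
    model="MPI-ESM-LR", experiment="historical", frequency="mon",
    realm="atmos", variable="tas", ensemble="r1i1p1",
)
print(example)
```

Because every dataset gets a predictable address, a plain `wget` on such a URL is enough to fetch it, which is what makes the "linked data" usage on the slide possible.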

Page 7: Metadata Concepts / Use  in Climate Research

B) Metadata standards, IT principles

(II) Data Preservation Level:

CERA2 DB

schema

OWL conceptual model

Tape Archive

search API

QC, DOI assignment, ..

WDCC Metadata Concept

CERA GUI IS-ENES Portal …

• Scalability

• Sustainability

• Flexibility

• User friendly GUIs

Common CV

OAI-PMH

ISO 19139
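As a rough illustration of how a client could ask an OAI-PMH endpoint for ISO 19139 records, the sketch below only builds the request URL. `ListRecords`, `metadataPrefix` and `set` are standard OAI-PMH request parameters; the endpoint address and the prefix value `iso19139` are assumptions, since each repository advertises its own prefixes via `ListMetadataFormats`.

```python
from typing import Optional
from urllib.parse import urlencode


def oai_list_records_url(base_url: str, metadata_prefix: str,
                         set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional selective harvesting
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and metadata prefix, for illustration only.
url = oai_list_records_url("http://example.org/oai", "iso19139")
print(url)
```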

Page 8: Metadata Concepts / Use  in Climate Research

B) Metadata standards, IT principles

(III) Model Description Level:

Metafor FP7 project: Common Information Model (CIM)

Formal metadata model of the climate modelling process

It includes descriptions of the experiments being undertaken, the

simulations being run in support of these experiments, the software

models and tools being used to implement the simulations and the

data generated by the software.

CMIP5 use case: CV collection, CMIP5 questionnaire
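The relationships named above (experiments, simulations run in support of them, the software used, and the data produced) can be pictured with a few Python dataclasses. This is a toy sketch and not the CIM schema: all class names, field names and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoftwareComponent:
    """A model component; components may be nested hierarchically."""
    name: str
    children: List["SoftwareComponent"] = field(default_factory=list)


@dataclass
class Experiment:
    name: str


@dataclass
class DataObject:
    identifier: str


@dataclass
class Simulation:
    """A run carried out in support of one experiment."""
    name: str
    supports: Experiment
    model: SoftwareComponent
    outputs: List[DataObject] = field(default_factory=list)


def component_names(component: SoftwareComponent) -> List[str]:
    """List a component and all of its sub-components, depth first."""
    names = [component.name]
    for child in component.children:
        names.extend(component_names(child))
    return names


# Hypothetical example values.
model = SoftwareComponent("coupled-model", [
    SoftwareComponent("atmosphere"),
    SoftwareComponent("ocean"),
])
run = Simulation("run-1", Experiment("historical"), model,
                 [DataObject("dataset-001")])
print(component_names(run.model))
```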

Page 9: Metadata Concepts / Use  in Climate Research

Metafor CIM overview

CONCIM (UML)

APPCIM (XSD)

CIM instances (interlinked XML files)

ISO, Geography Markup Language (GML) series

Automatic translation

CMIP5 portal(s), IS-ENES portal

Metafor catalogue

Page 10: Metadata Concepts / Use  in Climate Research

Metadata collection

Page 11: Metadata Concepts / Use  in Climate Research
Page 12: Metadata Concepts / Use  in Climate Research

Automatic XML to RDF translation

CMIP5 gateway(s)

IS-ENES portal (IS-ENES: Infrastructure for the European Network for Earth System Modelling)

ESG OWL instances
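What an automatic XML to RDF translation does can be shown on a very small scale with the Python standard library. The input XML layout below is invented for this sketch and is not the CIM schema; only the `isenes` namespace URI is taken from the RDF log shown later in this deck. Literal escaping is deliberately left out.

```python
import xml.etree.ElementTree as ET

ISENES = "http://www.enes.org/isenes#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical input document; real CIM instances are far richer.
SAMPLE_XML = """
<component id="echam">
  <type>ComponentModel</type>
  <label>ECHAM</label>
</component>
"""


def xml_to_ntriples(xml_text: str) -> list:
    """Translate the toy XML record into N-Triples statement lines."""
    root = ET.fromstring(xml_text)
    subject = f"<{ISENES}{root.get('id')}>"
    type_name = root.findtext("type")
    label = root.findtext("label")
    return [
        f"{subject} <{RDF_TYPE}> <{ISENES}{type_name}> .",
        # No escaping of quotes or backslashes: fine for this sample only.
        f'{subject} <{RDFS_LABEL}> "{label}" .',
    ]


for line in xml_to_ntriples(SAMPLE_XML):
    print(line)
```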

Page 13: Metadata Concepts / Use  in Climate Research

(CON)CIM Overview

Quality

ISO

Shared

Data

Activity: simulations in support of experiments

Software (hierarchical model components, coupled together)

Grids

Page 14: Metadata Concepts / Use  in Climate Research

B) Metadata standards, IT principles

(IV) Semantic Annotation Level

[Diagram labels: CIM XML; RDF; Data object XML; Community content; Content Management System; RDF; Triple Store; Portal(s); ESG Gateways; IS-ENES Portal; Evolving OWL model; Triple Store; Rel. DB]

OWL ontologies: http://ontologies.ucar.edu/owl

Page 15: Metadata Concepts / Use  in Climate Research

CMIP5 Quality Control

[Diagram labels: Files; Data Metadata; CIM Metadata; Data in prescribed DRS syntax; Data Quality Checks L2; MD Quality Checks L2; THREDDS Data Server; MD on data; Metafor / CIM Questionnaire; MD on model + simulation; QC DB; Quality MD; Metadata Repository; Data MD; Information MD]

Page 16: Metadata Concepts / Use  in Climate Research

CMIP5 STD-DOI Publication

[Diagram labels: TIB: DOI Registration Agency; Data Node Metadata; THREDDS Data Server; MD on data; QC DB; Quality MD; Data MD; Information MD; Filesystem; Data; Longterm Archive; Data Quality Checks L3 (double check, cross checks); STD-DOI Catalogue; STD-DOI MD; Information MD; WDCC: DOI Publication Agent; DOI Target Page; access to data and metadata; Metafor / CIM: MD on model + simulation + data + quality]

Page 17: Metadata Concepts / Use  in Climate Research

B) Metadata standards, IT principles

(IV) Semantic Annotation Level

[Diagram labels: CIM XML; RDF; Data object XML; Community content; Content Management System; RDF; Triple Store; Portal(s); ESG Gateways; IS-ENES Portal; Evolving OWL model; Triple Store; Rel. DB]

OWL ontologies: http://ontologies.ucar.edu/owl

Page 18: Metadata Concepts / Use  in Climate Research

IS-ENES Info Portal

Page 19: Metadata Concepts / Use  in Climate Research
Page 20: Metadata Concepts / Use  in Climate Research

2010-07-07 16:49:13 INFO triplestorefill.utility Adding item <ComponentModel at /test7/echam> with ID echam at http://localhost:8080/test7/echam
2010-07-07 16:49:13 INFO triplestorefill.sesameconnector Storing RDF...(1118 byte)
2010-07-07 16:49:13 INFO triplestorefill.sesameconnector RDF data:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix isenes: <http://www.enes.org/isenes#> .
isenes:echam rdf:type isenes:ComponentModel .
isenes:echam foaf:page <http://plone.dkrz.de/test7/echam> .
<http://plone.dkrz.de/test7/echam> foaf:topic isenes:echam .
isenes:echam dc:title "ECHAM" .
isenes:echam rdfs:label "ECHAM" .
isenes:echam rdfs:comment "Global circulation model" .
isenes:dkrz isenes:isResponsibleFor isenes:echam .
isenes:echam isenes:hasResponsible isenes:dkrz .
isenes:joachim-biercamp rdfs:label "Joachim Biercamp" .
isenes:joachim-biercamp rdf:type foaf:Person .
isenes:dkrz rdfs:label "DKRZ" .
isenes:dkrz rdf:type foaf:Organization .
isenes:joachim-biercamp isenes:isMemberOf isenes:dkrz .
isenes:dkrz isenes:hasMember isenes:joachim-biercamp .
isenes:dkrz dc:title "DKRZ" .
isenes:joachim-biercamp foaf:mbox "[email protected]"

„save“

Triple Store

Page 21: Metadata Concepts / Use  in Climate Research

(B) From a user's perspective

Image: Plone page with „related info“ portlet

Page 22: Metadata Concepts / Use  in Climate Research

(B) From a user's perspective

Image: Plone page after clicking the „related“ link: faceted search
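Faceted search over stored triples can be sketched without a real triple store. The triples below are a subset of the RDF shown in the log on the earlier slide; the in-memory list and the two helper functions are stand-ins invented for this illustration and do not reflect how the portal queries Sesame.

```python
from collections import Counter

# Subset of the triples from the RDF log, in prefixed form.
TRIPLES = [
    ("isenes:echam", "rdf:type", "isenes:ComponentModel"),
    ("isenes:echam", "isenes:hasResponsible", "isenes:dkrz"),
    ("isenes:dkrz", "rdf:type", "foaf:Organization"),
    ("isenes:dkrz", "isenes:hasMember", "isenes:joachim-biercamp"),
    ("isenes:joachim-biercamp", "rdf:type", "foaf:Person"),
]


def facet_counts(triples, predicate):
    """Count how many statements carry each value of one facet (predicate)."""
    return Counter(o for _, p, o in triples if p == predicate)


def select(triples, constraints):
    """Return subjects matching every predicate -> object constraint."""
    subjects = {s for s, _, _ in triples}
    for predicate, value in constraints.items():
        subjects &= {s for s, p, o in triples if p == predicate and o == value}
    return sorted(subjects)


print(facet_counts(TRIPLES, "rdf:type"))
print(select(TRIPLES, {"rdf:type": "isenes:ComponentModel",
                       "isenes:hasResponsible": "isenes:dkrz"}))
```

Each facet shown to the user corresponds to one predicate, and each click adds one more constraint that narrows the set of matching subjects.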

Page 23: Metadata Concepts / Use  in Climate Research

Summary

• international CMIP5 / IPCC effort is key driver for collection / standardization of CVs, Metadata, conceptual models (Ontologies)

• Metadata mainly used for model intercomparison, uniform data search / access + data processing

Prepare for Climate Impact Community use cases !!

Page 24: Metadata Concepts / Use  in Climate Research

..workshop reminder..

..therefore we would like the presenters to focus on a few points allowing all of us to draw conclusions at the end:

- Usage and quality of descriptive keyword type of metadata used in your domain to manage data.

- Types of usages of this metadata (management, retrieval, research statistics, machine processing, etc).

- The standards used for your metadata descriptions (structure, elements, vocabularies).

- Adherence to common IT principles (explicit syntax, registered semantics, use of PIDs, etc).

- Compliance with the recommendations to be found in the report of the e-IRG task force on Data Management http://www.e-irg.eu/publications/e-irg-task-force-reports.html

Page 25: Metadata Concepts / Use  in Climate Research

Methodology to create CMIP5 CIM instances

Page 26: Metadata Concepts / Use  in Climate Research

Virtual Earth System Modeling Resource Centre

Motivation

Producers: providers of models, tools, model results, HPC ecosystem, Grid .., community

Consumers: ENES community, impact community

Portal

E-infrastructure components

Governance: Agreements, Commitments, Sociology, ..

Ticketing, Collaboration

Metadata (CIM, ..), Protocols, APIs

AAI, CMIP5/AR5/+data services

Page 27: Metadata Concepts / Use  in Climate Research

IS-ENES vERC Portal

Requirement | E-Infra component | Technology used

(A) Community info presentation (models, tools, descriptions, ..) | Content Management System (CMS, Collab. Tool) | Plone + IS-ENES „content-types“

(B) Community development support | Project Management / Ticketing Tool | Redmine

(C) Data portal to AR5 archives | Web Framework | Zope/Plone plugin(s)

(D) CIM metadata | (external) Metafor service(s), (external) ESG-gateway | Web service (proxies)

(E) External content / metadata collection | Info (XML) harvester | Python info collector using Atom, OAI-PMH, .. protocols

(F) Additional value provisioning, „Cross-selling“ | Semantic interlinking | RDF triple store (Sesame)