M.Lautenschlager (WDCC, Hamburg) / 11.02.05 / 1 Training-Workshop Facilities and Services for Earth System Modelling: Integrated Model and Data Infrastructure (IMDI). Michael Lautenschlager, World Data Center for Climate, Max-Planck-Institut für Meteorologie / Modelle und Daten, Hamburg. Hamburg, 23.02.05


Training-Workshop
Facilities and Services for Earth System Modelling

Integrated Model and Data Infrastructure (IMDI)

Michael Lautenschlager

World Data Center for Climate

Max-Planck-Institut für Meteorologie / Modelle und Daten, Hamburg

Hamburg, 23.02.05


Introduction to IMDI

Roots:

a) Results of the PRISM Project (Program for Integrated Earth System Modeling)
• EU funded project with about 20 partners
• Runtime: Dec. 2001 – Nov. 2004
• Project details: prism.enes.org

b) ICSU World Data Center for Climate (WDCC)
• M&D together with DKRZ has been approved as WDC Climate in Sept. 2002
• Emphasis is on archiving and dissemination of Earth system modeling results and related data products
• Web access to 140 TB of data
• User accounts: 500


Model Standard Environments

• SCE: Standard Compile Environment
• SRE: Standard Run Environment

WDCC Data Infrastructure

• The CERA database system
• User interfaces to integrate and to extract data
• Data processing routines and data formats
• Data visualisation


Towards the Integrated Model and Data Infrastructure (IMDI)

Important aspects for future work:

• The automatic fill process (operational service)

• How to integrate models (definition of user interface)

• How to integrate data (definition of user interface)

• Networking between archives (international cooperation)

• Primary data publication as a new scientific service (data detection and citation)


Automatic Fill Process (AFP)

Creation of application-oriented data storage must be automatic!


Archive Data Flow per month

[Diagram. Components: Compute Server, Global File System, Mass Storage Archive, CERA DB System. Labels: 60 TB/month; 2004: 1 TB/day (peak); Unix files; application-oriented data hierarchy; metadata initialisation.]

Important: The automatic fill process has to be performed before the corresponding files migrate to the mass storage archive.
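The ordering constraint on this slide can be illustrated with a toy model. This is only a sketch: the class `ArchiveStaging` and its methods are hypothetical names invented for illustration and do not describe the actual WDCC or DKRZ software.

```python
class ArchiveStaging:
    """Toy model of the rule 'fill first, migrate afterwards' (hypothetical)."""

    def __init__(self):
        # Paths for which the automatic fill process has already run.
        self._filled = set()

    def fill(self, path):
        # Stand-in for the automatic fill process; a real system would
        # load the file content and its metadata into the database here.
        self._filled.add(path)

    def migrate(self, path):
        # Refuse to migrate a file that has not been filled yet.
        if path not in self._filled:
            raise RuntimeError(f"fill process has not run for {path!r}")
        return path


staging = ArchiveStaging()
staging.fill("run01/output_2004.dat")  # hypothetical file name
print(staging.migrate("run01/output_2004.dat"))
```

Rejecting the migration outright, rather than queuing it, keeps the sketch short; a production service would more likely defer the migration until the fill has finished.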


Networking between Archives

• DLR: WDCC catalogue links to external satellite data

• CEOP: mutual data access for in-situ, model and satellite data

• BADC: distributed data holding (ERA40 and IPCC), certificates for authentication


Web Access to WDCC

Access path: METADATA / DATA

GUI: display in applet (soon: HTML page) / download via JDBC

jblob script (jblob -f …): search / download via JDBC

http: XML download (ISO, DC, …) and HTML display / download via http

DOI, URL: http://…


The Present Applet

Selection via CERA metadata:

• selection of the experiment (= model run)

• display of metadata: experiment, quality, datasets

• selection of the dataset

• display of dataset information

• add datasets to the "process list"

• select time span and download from tape archive to data server and to the client


jblob Command

Retrieval by script:

• Java based, for Unix/Linux and Windows

• http and JDBC connection

• examples:
  jblob -datasetname <name> -options
  jblob -datasetid <id> -options
  jblob -showdatasets "<search_string>" (% as wildcard)
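For scripted retrieval, the three call forms on this slide can be assembled programmatically. The following is a minimal sketch, assuming the flag spellings `-datasetname`, `-datasetid` and `-showdatasets` exactly as shown on the slide; the helper `build_jblob_args` is a hypothetical name, the search string is a made-up example, and the further options are left to the caller because the slide does not list them.

```python
import shlex


def build_jblob_args(*, dataset_name=None, dataset_id=None, search=None,
                     extra_options=()):
    """Return an argument list for one jblob call (hypothetical helper).

    Exactly one of dataset_name, dataset_id or search must be given.
    """
    selectors = [
        ("-datasetname", dataset_name),
        ("-datasetid", dataset_id),
        ("-showdatasets", search),
    ]
    chosen = [(flag, value) for flag, value in selectors if value is not None]
    if len(chosen) != 1:
        raise ValueError("give exactly one of dataset_name, dataset_id, search")
    flag, value = chosen[0]
    return ["jblob", flag, str(value), *extra_options]


# Made-up search string; '%' is the wildcard mentioned on the slide.
args = build_jblob_args(search="%example%")
print(shlex.join(args))
# To execute the command, pass the list to subprocess.run(args, check=True).
```

Passing an argument list instead of a shell string avoids quoting problems with the `%` wildcard and with dataset names that contain spaces.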


http Metadata Access

Retrieval by browser (or Unix command wget):

wget http://mad.dkrz.de/Daten/ … … XML/CERA2WINIq.xsql?&id=50

• XML in ISO format via http

• or: browser display as HTML file
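The same retrieval can be scripted instead of using wget. The sketch below uses only the Python standard library. The full service URL is elided on the slide, so the base URL is a parameter and `example.org` is a placeholder; the function names and the sample XML are made up for illustration and do not reflect the actual CERA metadata schema.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def metadata_url(base_url, entry_id):
    """Append the 'id' query parameter to a caller-supplied base URL."""
    query = urllib.parse.urlencode({"id": entry_id})
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


def fetch_xml(url, timeout=30):
    """Download the raw XML document (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def element_texts(xml_bytes):
    """Map each leaf element's tag (namespace stripped) to its text."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for elem in root.iter():
        if len(elem) == 0 and elem.text and elem.text.strip():
            tag = elem.tag.rsplit("}", 1)[-1]
            result.setdefault(tag, elem.text.strip())
    return result


# Made-up sample document, parsed offline for illustration.
sample = b"<entry><title>Example run</title><id>50</id></entry>"
print(metadata_url("http://example.org/meta.xsql", 50))
print(element_texts(sample))
```

With a real service URL, `element_texts(fetch_xml(metadata_url(base, 50)))` would fetch and flatten one metadata record; when a tag occurs several times, only its first value is kept.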


Data Publication: Problem and Solution

Shortcomings in data provision and interdisciplinary use:

• Rules of good scientific practice are not taken into account in all cases.

• Data sources are widely unknown.

• Data are archived without context.

• Data cannot be cited as independent entities.

Method of solution: publication of primary data as independent entities

• Persistent identifier with global resolving mechanism for data archive and context referencing (scientific data model at archive level)

• Integration into library catalogues in order to find data together with articles

• STD-DOI application profile: metadata kernel + items for electronic publication (interface between scientific data archives and libraries)
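A global resolving mechanism means that a client needs only the identifier, not the archive's internal address. As a minimal sketch, assuming the public DOI resolver at `https://doi.org/`, a DOI can be turned into a resolvable URL as follows. The function name is hypothetical and the DOI in the example is a made-up placeholder, not a registered WDCC dataset.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a resolver URL from a DOI string such as '10.1234/XYZ'."""
    doi = doi.strip()
    # Accept the common 'doi:' prefix.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    prefix, slash, suffix = doi.partition("/")
    # A DOI consists of a prefix starting with '10.' and a non-empty suffix.
    if not (prefix.startswith("10.") and slash and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")


# Placeholder DOI for illustration only.
print(doi_to_url("doi:10.1234/EXAMPLE_DATASET"))
```

The resolver redirects to whatever landing page the registration agency has on record, which is what keeps a citation stable when the archive reorganises its own URLs.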


Data Publication: Credits in Science

"Citation Index": Scientific efficiency is "measured" by publications.

Extra work for data publication is currently not acknowledged:

• Data processing, context documentation, quality assurance.

Recommendation: Data publications should be included in the standard scientific "Citation Index".

• Motivation of the individual scientist.

• Connection between person and primary dataset.

Citable data publications

• support the rules of good scientific practice,

• encourage interdisciplinary data utilisation,

• make data searchable in library catalogues together with articles,

• close the gap between scientific literature and related data sources.


Data Publication: Criteria for Persistent Identifier Allocation

Critical points are securing data quality, a stable connection between identifier and data entity, and the definition of independent data entities.

• Allocation includes data entity definition, metadata implementation (expert data description) and long-term archiving (WDCs).

• Scientific quality assurance is expected from the author and will be reviewed during the allocation process.

• A stable connection between identifier reference and data entity must be ensured (Registration Agency).

• Like published articles, published primary data cannot be changed.


Further information

Project webpage: http://www.std-doi.de

TIB Catalogue: http://tiborder.gbv.de/services/
(Primary Data Search: exk +primaerdaten or WDCC)

WDC Climate: http://www.wdc-climate.de

Assistance: [email protected] [email protected]