M.Lautenschlager (WDCC, Hamburg) / 11.02.05 / 1
Training Workshop: Facilities and Services for Earth System Modelling
Integrated Model and Data Infrastructure (IMDI)
Michael Lautenschlager
World Data Center for Climate
Max-Planck-Institut für Meteorologie / Modelle und Daten, Hamburg
Hamburg, 23.02.05
Introduction to IMDI
Roots:
a) Results of the PRISM Project (Program for Integrated Earth System Modeling)
• EU-funded project with about 20 partners
• Runtime: Dec. 2001 – Nov. 2004
• Project details: prism.enes.org
b) ICSU World Data Center for Climate (WDCC)
• M&D together with DKRZ has been approved as WDC Climate in Sept. 2002
• Emphasis is on archiving and dissemination of Earth system modeling results and related data products
• Web access to 140 TB of data
• User accounts: 500
Model Standard Environments
• SCE: Standard Compile Environment
• SRE: Standard Run Environment
WDCC Data Infrastructure
• The CERA database system
• User interfaces to integrate and to extract data
• Data processing routines and data formats
• Data visualisation
Towards the Integrated Model and Data Infrastructure (IMDI)
Important aspects for future work:
• The automatic fill process (operational service)
• How to integrate models (definition of user interface)
• How to integrate data (definition of user interface)
• Networking between archives (international cooperation)
• Primary data publication as a new scientific service (data detection and citation)
Creation of application-oriented data storage must be automatic!
Automatic Fill Process (AFP)
Archive Data Flow per month
[Figure: archive data flow. Components: Compute Server, Global File System, Mass Storage Archive, CERA DB System. Labels: 60 TB/month; 2004: 1 TB/day (peak); Unix files; application-oriented data hierarchy; metadata initialisation.]
Important: The automatic fill process has to be performed before the corresponding files migrate to the mass storage archive.
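The ordering constraint stated above (fill first, migrate afterwards) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `ArchiveFile` record, the `filled` flag and the function names are illustrative assumptions and do not describe the actual WDCC or DKRZ software.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ArchiveFile:
    """Hypothetical record for one Unix file on the global file system."""
    path: str
    filled: bool = False  # True once the automatic fill process has handled the file


def run_fill(f: ArchiveFile) -> ArchiveFile:
    """Stand-in for the automatic fill process (metadata initialisation etc.)."""
    return ArchiveFile(path=f.path, filled=True)


def split_for_migration(files: List[ArchiveFile]) -> Tuple[List[ArchiveFile], List[ArchiveFile]]:
    """Return (ready, blocked): only filled files may migrate to mass storage."""
    ready = [f for f in files if f.filled]
    blocked = [f for f in files if not f.filled]
    return ready, blocked


# Hypothetical file names, for illustration only.
files = [ArchiveFile("run01/jan.dat"), run_fill(ArchiveFile("run01/feb.dat"))]
ready, blocked = split_for_migration(files)
print("ready:", [f.path for f in ready])
print("blocked:", [f.path for f in blocked])
```

The point of the sketch is only the gate: a file that has not passed the fill step stays on the blocked list and is not handed to the mass storage archive.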
Networking between Archives
• DLR: WDCC catalogue links to external satellite data
• CEOP: mutual data access for in-situ, model and satellite data
• BADC: distributed data holding (ERA40 and IPCC), certificates for authentication
Web Access to WDCC
Access path: METADATA / DATA
• GUI: display in applet (soon: HTML page) / download (JDBC)
• jblob script (jblob –f …): search / download (JDBC)
• http: XML download (ISO, DC, …) and HTML display / download (http)
• DOI: URL: http://…
The Present Applet
Selection via CERA metadata:
• selection of the experiment (= model run)
• display of metadata: experiment, quality, datasets
• selection of the dataset
• display of dataset information
• add datasets to “process list”
• select time span and download from tape archive to data server and to the client
jblob Command
Retrieval by script:
• Java-based, for Unix/Linux and Windows
• http and JDBC connection
• examples:
jblob –datasetname <name> -options
jblob –datasetid <id> -options
jblob –showdatasets “<search_string>” (% as wildcard)
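For scripted retrieval, the jblob calls listed above could be assembled from Python before being handed to the shell. The following is a minimal sketch under stated assumptions: only the three option names shown on the slide are accepted, the option prefix (a single ASCII hyphen here) is a guess because the slide typography does not make it clear, and the search string is a made-up example. The helper `build_jblob_command` is hypothetical and not part of jblob.

```python
import shlex
from typing import List, Sequence

# Option names taken from the slide; nothing else about jblob is assumed.
KNOWN_OPTIONS = {"datasetname", "datasetid", "showdatasets"}


def build_jblob_command(option: str, value: object,
                        prefix: str = "-",
                        extra: Sequence[str] = ()) -> List[str]:
    """Build an argument list such as ['jblob', '-datasetid', '42'].

    `prefix` is an assumption: adjust it if the installed jblob expects
    a different option prefix. `extra` carries any further options verbatim.
    """
    if option not in KNOWN_OPTIONS:
        raise ValueError(f"option not listed on the slide: {option!r}")
    return ["jblob", f"{prefix}{option}", str(value), *extra]


# Hypothetical search string; % acts as the wildcard according to the slide.
cmd = build_jblob_command("showdatasets", "EXAMPLE%")
print(shlex.join(cmd))
```

Building an argument list rather than a single string avoids quoting problems with the % wildcard; the list can be passed to `subprocess.run(cmd, check=True)` on a machine where jblob is installed.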
http Metadata Access
Retrieval by browser (or Unix command wget):
wget http://mad.dkrz.de/Daten/ … … XML/CERA2WINIq.xsql?&id=50
• XML in ISO format via http
• or: browser display as HTML file
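The same http retrieval can be done from a script instead of wget. The sketch below uses only the Python standard library; the function names are illustrative, the full metadata URL must be supplied by the reader (the path is elided on the slide), and the sample XML is invented for the demonstration and does not reflect the actual CERA or ISO schema.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple


def fetch_xml(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw XML document from the given (complete) URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def outline(xml_bytes: bytes) -> Tuple[str, List[str]]:
    """Return the root tag and the tags of its direct children."""
    root = ET.fromstring(xml_bytes)
    return root.tag, [child.tag for child in root]


# Invented sample document, used here so the sketch runs without network access.
sample = b"<entry><title>Example dataset</title><id>50</id></entry>"
print(outline(sample))
```

With a complete URL in hand, `outline(fetch_xml(url))` gives a quick overview of the returned metadata record before any schema-specific parsing is written.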
Data Publication: Problem and Solution
Shortcomings in data provision and interdisciplinary use
• Rules of good scientific practice are not taken into account in all cases.
• Data sources are widely unknown.
• Data are archived without context.
• Data cannot be cited as independent entities.
Method of solution: publication of primary data as independent entities
• Persistent identifier with global resolving mechanism for data archive and context referencing (scientific data model at archive level)
• Integration into library catalogues in order to find data together with articles
• STD-DOI application profile: metadata kernel + items for electronic publication (interface between scientific data archives and libraries)
Data Publication: Credits in Science
"Citation Index": Scientific efficiency is "measured" by publications.
Extra work for data publication is currently not acknowledged:
• Data processing, context documentation, quality assurance.
Recommendation: Data publications should be included in the standard scientific "Citation Index".
• Motivation of the individual scientist.
• Connection between person and primary dataset.
Citable data publications
• support the rules of good scientific practice.
• encourage interdisciplinary data utilisation.
• make data searchable in library catalogues together with articles.
• close the gap between scientific literature and related data sources.
Data Publication: Criteria for Persistent Identifier Allocation
Critical points are securing data quality, a stable connection between identifier and data entity, and the definition of independent data entities.
Allocation includes data entity definition, metadata implementation (expert data description) and long-term archiving (WDCs).
Scientific quality assurance is expected from the author and will be reviewed during the allocation process.
A stable connection between identifier reference and data entity must be ensured (Registration Agency).
Like published articles, published primary data cannot be changed.
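The two requirements above, a stable identifier-to-entity connection and immutability after publication, can be made concrete with a small sketch. It is purely illustrative: binding an identifier to a SHA-256 checksum is an assumption chosen for the example and is not described in the slides, and the class name and the identifier string are hypothetical.

```python
import hashlib
from typing import Dict


class IdentifierRegistry:
    """Toy registry binding a persistent identifier to one fixed data entity."""

    def __init__(self) -> None:
        self._checksums: Dict[str, str] = {}

    def register(self, identifier: str, data: bytes) -> str:
        """Bind identifier to data; refuse a later binding to different content."""
        digest = hashlib.sha256(data).hexdigest()
        known = self._checksums.get(identifier)
        if known is not None and known != digest:
            raise ValueError(f"{identifier} is already bound to different data")
        self._checksums[identifier] = digest
        return digest

    def verify(self, identifier: str, data: bytes) -> bool:
        """Check that data is exactly what was published under identifier."""
        return self._checksums.get(identifier) == hashlib.sha256(data).hexdigest()


registry = IdentifierRegistry()
registry.register("10.0000/example", b"published dataset")  # hypothetical identifier
print(registry.verify("10.0000/example", b"published dataset"))
print(registry.verify("10.0000/example", b"modified dataset"))
```

In this toy model a corrected dataset would have to be published under a new identifier, which mirrors how a revised article appears as a new publication rather than silently replacing the old one.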
Further information
Project webpage: http://www.std-doi.de
TIB Catalogue: http://tiborder.gbv.de/services/ (Primary Data Search: exk +primaerdaten or WDCC)
WDC Climate: http://www.wdc-climate.de
Assistance: [email protected] [email protected]