Intelligent Distributed Data Management in Earth System Science
DESCRIPTION
Intelligent Distributed Data Management in Earth System Science. S. Kindermann, DKRZ, Germany; K. Ronneberger, DKRZ, Germany; T. Brücher, University of Cologne, Germany; H. Ramthun, M&D, Germany; M. Stockhause, MPI-Met, IFM-Geomar, Germany.
![Page 1: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/1.jpg)
INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Intelligent Distributed Data Management in Earth System Science
S. Kindermann, DKRZ, Germany
K. Ronneberger, DKRZ, Germany
T. Brücher, University of Cologne, Germany
H. Ramthun, M&D, Germany
M. Stockhause, MPI-Met, IFM-Geomar, Germany
![Page 2: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/2.jpg)
EGEE User Forum '07, Manchester
Enabling Grids for E-sciencE
INFSO-RI-031688 Deutsches Klimarechenzentrum
Structure
• What is Earth System Science about?
  – Typical workflows
  – Traditional infrastructure
• Why can grid technology help?
  – Limits of the current practice
• How do we use this technology?
  – Conceptual outline of the developing infrastructure
  – Outline of the developed prototype
• Potential impact and vision
  – Next steps and challenges
![Page 3: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/3.jpg)
Motivation: Data in ESS (Earth System Science)
Model Output Data + Observation Data + Analysis Data
Scenario Data
Data related to geo-referenced physical variables
![Page 4: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/4.jpg)
ESS Data Management Nowadays
"I want to correlate model data from DKRZ with observation data from DWD and satellite data from DLR."
[Diagram labels: Model Data, Observation Data, Scenario Data, Analysis Dataset, Result Dataset]
(1) Find & Select
• Contact each data provider
• Learn their data search utilities
• Find and select data
(2) Collect & Prepare
• Get access rights for datasets at each data provider
• Learn their data access / preprocessing services
• Get access to sufficient storage facilities
• Trigger preprocessing and download data
(3) Analyse
At central service provider:
• Start analysis tools
• Produce undocumented data
(4) Visualize
• Copy to local resources
• Create visualization
"Has somebody already done something similar to what I want? Can I reuse data for …?"
![Page 5: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/5.jpg)
Bridging C3Grid and EGEE
[Diagram: the workflow steps (1) Find & Select, (2) Collect & Prepare, (3) Analyse and (4) Visualize, with the labels "DKRZ, DWD", "AWI, GKSS, …", "World Data Centers", "Analysis Dataset", "Result Dataset" and "C3Grid Middleware"]
C3Grid:
• Standardized metadata description
• Uniform discovery of German data providers
• Uniform data access
• Grid based data delivery
EGEE:
• Established international collaboration platform
• Secure data management
Data analysis and data sharing platform
Key component: (ISO) metadata catalog for ESS data in EGEE
![Page 6: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/6.jpg)
C3Grid and EGEE: the components
Workflow steps: Find & select, Collect & prepare, Analyse, Visualize
• Central web portal: single entry point to the common central metadata catalogue (Lucene index) and to the access facility
• Standardized metadata: hierarchical description of discovery aspects and some use aspects of the data (ISO 19115 / ISO 19139)
• Standardized data request interface: hides the complexity of provider-specific data access mechanisms and pre-processing functionality (web service technology)
• Automatic update and republishing of metadata: metadata describing data processing is logged and managed, and can be harvested (AMGA + Java extension, OAI-PMH server)
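To make the "standardized metadata" component more concrete, here is a minimal sketch of reading a few discovery fields from an ISO 19139 record. The namespaces and element paths are those of the ISO 19139 `gmd`/`gco` schemas; the sample record, its identifier and the function name `discovery_fields` are invented for illustration and are not taken from the C3Grid catalogue.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by ISO 19139 (the XML encoding of ISO 19115).
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily trimmed record; real records carry many more elements.
SAMPLE_RECORD = """<gmd:MD_Metadata
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example.dataset.0001</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Example model output</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract>
        <gco:CharacterString>Illustrative abstract.</gco:CharacterString>
      </gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def _text(root, path):
    """Return the stripped text at `path`, or None if the element is missing."""
    node = root.find(path, NS)
    return node.text.strip() if node is not None and node.text else None


def discovery_fields(xml_text):
    """Extract identifier, title and abstract from one ISO 19139 record."""
    root = ET.fromstring(xml_text)
    ident = "gmd:identificationInfo/gmd:MD_DataIdentification"
    title_path = f"{ident}/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString"
    return {
        "id": _text(root, "gmd:fileIdentifier/gco:CharacterString"),
        "title": _text(root, title_path),
        "abstract": _text(root, f"{ident}/gmd:abstract/gco:CharacterString"),
    }
```

Fields like these are the kind of content a discovery index would hold; how the portal actually maps records into its Lucene index is not described in the slides.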
![Page 7: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/7.jpg)
(1) EGEE and C3Grid: Discovery
[Diagram]
Components shown: EGEE (UI, SE, CE with worker nodes (WN), LFC catalog, AMGA metadata catalog), Web Portal C3 (Lucene index), OAI-PMH servers, web service interfaces, C3Grid data interface (climate data, workspace)
Data providers (data resource and metadata): WDC Climate, WDC RSAT, WDC Mare, DWD, AWI, PIK, IFM-Geomar, MPI-Met, GKSS
Steps shown:
(a) Publish (ISO 19115/19139)
(b) Harvest (OAI-PMH)
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
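The harvest steps in this diagram use OAI-PMH. A minimal sketch of a harvesting loop is given below, assuming a `ListRecords` request that is repeated while the server returns a non-empty `resumptionToken`. The verb, argument names and XML namespace come from the OAI-PMH 2.0 protocol; the base URL, the `iso19139` metadata prefix and the injected `fetch` callable are assumptions for illustration, not details of the C3Grid or EGEE servers.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument, so it replaces
    metadataPrefix on follow-up requests.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url, metadata_prefix, fetch):
    """Yield record identifiers, following resumption tokens until exhausted.

    `fetch` is any callable mapping a URL to response XML text (for example a
    thin wrapper around urllib.request.urlopen). It is injected so that the
    sketch can be run without a network connection.
    """
    token = None
    while True:
        xml_text = fetch(list_records_url(base_url, metadata_prefix, token))
        root = ET.fromstring(xml_text)
        for ident in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS):
            yield ident.text
        token_node = root.find(".//oai:resumptionToken", OAI_NS)
        token = token_node.text if token_node is not None else None
        if not token:  # missing or empty token marks the last page
            break
```

Passing the fetch function in keeps the protocol logic separate from transport, so the same loop could be pointed at either OAI-PMH server in the diagram.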
![Page 8: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/8.jpg)
(1) EGEE and C3Grid: Data Discovery
![Page 9: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/9.jpg)
(2) EGEE and C3Grid: Data Upload
![Page 10: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/10.jpg)
(2) EGEE and C3Grid: Data Upload
[Diagram]
Components shown: EGEE (UI, SE, CE with worker nodes (WN), LFC catalog, AMGA metadata catalog), Web Portal C3 (Lucene index), OAI-PMH servers, web service interfaces, C3Grid data interface (climate data, workspace), data resource and metadata
Workflow steps covered: (1) Find & Select, (2) Collect & Prepare
Steps shown:
(a) Request (web service)
(b) Retrieve (JDBC or archive)
(c) Stage & provide
(d) Notify
(e) Request (web service)
(f) Transfer & register (lcg-tools)
(g) Register (Java API)
(f) Publish (ISO 19115/19139)
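The "transfer & register (lcg-tools)" step can be illustrated with the `lcg-cr` command, which copies a local file to a storage element and registers it in the file catalog. The sketch below only builds the argument list and does not run anything. The slides do not say which lcg-tools command or options the prototype uses, so the choice of `lcg-cr`, the VO name, the storage element host and the logical file name are all assumptions.

```python
def lcg_cr_command(local_path, lfn, storage_element, vo):
    """Return the argument list for an lcg-cr copy-and-register call.

    local_path      absolute path of the staged file
    lfn             logical file name to register in the LFC catalog
    storage_element destination SE host (hypothetical in the example below)
    vo              virtual organisation the file belongs to
    """
    return [
        "lcg-cr",
        "--vo", vo,
        "-d", storage_element,
        "-l", f"lfn:{lfn}",
        f"file:{local_path}",
    ]


# Hypothetical values, for illustration only.
EXAMPLE = lcg_cr_command(
    local_path="/workspace/staged/example.nc",
    lfn="/grid/esr/c3grid/example.nc",
    storage_element="se.example.org",
    vo="esr",
)
```

The list could be handed to `subprocess.run` on a machine with the gLite user interface installed. Registering the matching metadata entry in AMGA, step (g), goes through a Java API in the diagram and is not sketched here.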
![Page 11: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/11.jpg)
(3) EGEE and C3Grid: Data Analysis
![Page 12: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/12.jpg)
(3) EGEE and C3Grid: Data Analysis
[Diagram]
Components shown: EGEE (UI, SE, CE with worker nodes (WN), LFC catalog, AMGA metadata catalog), Web Portal C3 (Lucene index), OAI-PMH servers, web service interfaces, C3Grid data interface (climate data, workspace), data resource and metadata
Workflow steps covered: (3) Analyse, (4) Visualize
Steps shown:
(a) Request (web service)
(b) Submit (gLite): qflux
(c) Retrieve (lcg-tools)
(d) Update (Java API)
(e) Return graphic
(f) Publish (ISO 19115/19139)
(g) Harvest (OAI-PMH)
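The "submit (gLite)" step hands a job description to the grid middleware. gLite jobs are described in JDL, so a hedged sketch of generating such a description follows. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are standard JDL attributes; the script name, its argument and the output file name are hypothetical, since the slides do not show the actual qflux job description.

```python
def render_jdl(executable, arguments, input_files, output_files):
    """Render a minimal JDL job description as text."""

    def jdl_list(items):
        # JDL lists are written as {"a", "b", ...}
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    sandbox_out = ["std.out", "std.err"] + list(output_files)
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {jdl_list(input_files)};",
        f"OutputSandbox = {jdl_list(sandbox_out)};",
    ]
    return "\n".join(lines)


# Hypothetical qflux job: a wrapper script that reads one input file
# by logical file name and writes a plot.
QFLUX_JDL = render_jdl(
    executable="qflux.sh",
    arguments="lfn:/grid/esr/c3grid/example.nc",
    input_files=["qflux.sh"],
    output_files=["qflux.png"],
)
```

Written to a file, a description like this could be submitted from the UI, for example with `glite-wms-job-submit -a qflux.jdl`; which submission command the prototype used is not stated in the slides.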
![Page 13: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/13.jpg)
(3) Example Workflow
• Example: Humidity flux (QFLUX)
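The slides do not say how QFLUX computes the humidity flux. Assuming the common definition of a vertically integrated horizontal moisture flux, Q = (1/g) ∫ q·v dp, with specific humidity q, a horizontal wind component v and pressure p, a minimal numerical sketch looks as follows. The function name and the sample values are invented, and the actual workflow may well differ.

```python
G = 9.80665  # standard gravity in m s^-2


def integrated_flux(q, wind, pressure):
    """Trapezoidal estimate of (1/g) * integral of q * wind over pressure.

    q        specific humidity per level (kg/kg)
    wind     one horizontal wind component per level (m/s)
    pressure level pressures in Pa, ordered from the surface upwards
             (that is, decreasing values)
    Returns the flux in kg m^-1 s^-1.
    """
    total = 0.0
    for k in range(len(pressure) - 1):
        f_lower = q[k] * wind[k]
        f_upper = q[k + 1] * wind[k + 1]
        dp = pressure[k] - pressure[k + 1]  # positive layer thickness in Pa
        total += 0.5 * (f_lower + f_upper) * dp
    return total / G
```

Calling the function once with the zonal and once with the meridional wind component gives the two components of the flux vector in a single model column.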
![Page 14: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/14.jpg)
Approach in international context

| | Earth System Grid project (USA) | C3Grid / (EGEE) | NERC data grid (UK) |
|---|---|---|---|
| Scope (project) | High performance access of climate model data | Uniform & effective discovery and access of data of various disciplines & types | Harmonized & detailed search and access of data of various disciplines & types |
| Data stock (status) | Homogeneous; flat-file storage | Heterogeneous; databases & flat-file storage | Heterogeneous; databases & flat-file storage |
| Data description (solution) | Use aspect of data, tools and models; e.g. NcML for netCDF data | Discovery and some use aspects; ISO 19115 / ISO 19139 | Content of the data in great detail; semantic data model (CSML, based on GML) |
| Data access (solution) | Different protocols; intelligence at portal | Uniform access interface; intelligence at data provider / grid | Different protocols; link to data provider |
![Page 15: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/15.jpg)
Potential Impact
Potential impact on the EGEE ESR community:
Provide a framework to easily and consistently exchange and manage ESR data and tools between EGEE and traditional earth science data storage systems.
Potential impact on the international ESR community:
The approach is based on international standards (ISO 19139, OAI-PMH) and uniform interfaces (web services), so other data centers and infrastructures can be integrated uniformly.
![Page 16: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/16.jpg)
Next steps
• Expand the demonstrated prototype into a reliable and stable system
• Port further workflows and some pre-processing functionality to EGEE
• Enlarge the user community
![Page 17: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/17.jpg)
Future challenges or missing bricks
• Comprehensive and consistent security context to control access to (restricted) data with single sign-on
  – Approach: federated AA infrastructure based on Shibboleth
• Description of analysis services to improve the possibilities for discovery, use and sharing
  – Approach: adapt ISO 19119 / ISO 19139 as a common metadata format for analysis-tool descriptions
• Modularized workflows to increase flexibility and enable intelligent scheduling
  – Approach: implement a workflow information service
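As a rough illustration of what "modularized workflows" could mean in practice, the four workflow steps used throughout the talk can be written as a dependency graph from which a scheduler derives a valid execution order. This is only a sketch: the step names and the dictionary layout are assumptions, and the slides do not describe the design of the planned workflow information service.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it depends on (hypothetical names).
WORKFLOW = {
    "find_select": set(),
    "collect_prepare": {"find_select"},
    "analyse": {"collect_prepare"},
    "visualize": {"analyse"},
}


def execution_order(dependencies):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())
```

With the steps described as separate modules, a scheduler could place each one where its input data or computing resources are, rather than running the whole chain at one site.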
![Page 18: Intelligent Distributed Data Management in Earth System Science](https://reader035.vdocuments.net/reader035/viewer/2022070417/568154cc550346895dc2cdc3/html5/thumbnails/18.jpg)
Thank You
kindermann @ dkrz.de