www.eudat.eu

EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065

Processing services in EUDAT

EUDAT GEF status and plans

Christian Pagé, CERFACS

Earth System Science data management Session

RDA 6th Plenary

Paris, 23-25 September 2015



Science Drivers

Data available for scientific analysis: a very large trend

Limitations in data access mean limitations in data analytics and scientific results

Download locally then Analyze: a workflow that cannot be sustained

Climate researchers

Impact researchers

EUDAT Generic GEF ideas: orchestrate / multi-communities / services

GEF Web Service

Generic API

iRODS data access

Abstracting iRODS (flexibility)

or

Community-specific federation: ENES/ESGF or WebLicht, etc.

Hadoop

Data Federation

Common Metadata Semantics: searching across communities

Common AAI: authentication and authorization across communities

Extensions

Processing Services Catalogs: getting information about communities' services

Common or community-specific processing/workflows

Requests using PIDs

PIDs for identification of data products
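The idea of addressing inputs by PID rather than by file path can be sketched as follows. This is only an illustration under stated assumptions: the slides describe the GEF API as still under discussion, so the JSON field names (`service`, `inputs`), the service name `regrid`, and the example PIDs are all hypothetical. The only thing borrowed from practice is the Handle-style `<prefix>/<suffix>` shape of a PID.

```python
import json
import re

# Handle-style PID: "<prefix>/<suffix>". The example values used below are made up.
_HANDLE_RE = re.compile(r"^[0-9A-Za-z.]+/\S+$")


def is_handle(pid):
    """Return True if pid looks like a Handle-style '<prefix>/<suffix>' string."""
    return bool(_HANDLE_RE.match(pid))


def build_processing_request(service_id, input_pids):
    """Build a JSON request body naming a service and its inputs by PID.

    The field names "service" and "inputs" are hypothetical, not the GEF API.
    """
    bad = [p for p in input_pids if not is_handle(p)]
    if bad:
        raise ValueError(f"not Handle-style PIDs: {bad}")
    return json.dumps({"service": service_id, "inputs": list(input_pids)},
                      sort_keys=True)


if __name__ == "__main__":
    print(build_processing_request("regrid", ["11100/abc-123", "11100/def-456"]))
```

Because the request carries identifiers instead of locations, the service side stays free to decide which replica of each input to read.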

EUDAT Generic GEF ideas: orchestrate / multi-communities / services

http://github.com/GEFx/gef

Architecture diagram (labels): User request; GEF web service; GEF Executor; iRODS; Backend (x2); App container (x2)

• Prototype implementation
• Spec in progress
• Unclear API direction, more discussions needed

Thanks to Emanuel Dima, EKUT (CLARIN)

ESGF WPS API: Future computing nodes

Goal: perform data analysis near the data storage

Better data access

Move away from the download/analyze workflow


Develop general APIs for exposing ESGF distributed compute resources to multiple analysis tools

Server-side processing capabilities are not being developed yet: the focus is on the API

First Steps

Use case approach

Used the Goddard Climate Data Services (CDS) API and server-side processing

Compared different APIs


Next Steps

Technology exploration for server-side processing

Continue the exploration of HDFS and other technologies (e.g. Spark)

Exploration of high-performance file systems

ESGF API

Specification of an ESGF WPS API

Test implementation
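For readers unfamiliar with WPS, a request in the generic OGC WPS 1.0.0 key-value style can be sketched as below. This is not the ESGF WPS API, which the slides list as still to be specified; the endpoint URL, the process identifier `timeseries_mean`, and the input names are hypothetical placeholders.

```python
from urllib.parse import urlencode


def wps_execute_url(endpoint, process_id, inputs):
    """Build a WPS 1.0.0 key-value Execute URL.

    In this encoding, DataInputs is a ';'-separated list of key=value pairs.
    urlencode() percent-encodes the '=' and ';' inside that single value.
    """
    data_inputs = ";".join(f"{key}={value}" for key, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,   # hypothetical process name
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    # Hypothetical compute node and inputs, for illustration only.
    print(wps_execute_url(
        "https://compute.example.org/wps",
        "timeseries_mean",
        {"variable": "tas", "operation": "mean"},
    ))
```

The client sends only a small description of the computation; the data stay on the node that holds them, which is the point of moving away from the download/analyze workflow.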

Challenges to orchestrate too!

Federated environment: Orchestration of the calculation from the requested computing node

Advanced scheduler needed

Where should calculations be performed if data is available at multiple data nodes?

Which calculation services are available, if any?

Results from several computing nodes will need to be gathered and combined
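The two scheduling questions and the gather step can be made concrete with a toy sketch. Everything here is an assumption for illustration: the `Node` record, the `load` metric, the "least loaded node wins" policy, and the use of a count-weighted mean as the combined result. A real scheduler for a federation would have to weigh far more (network cost, quotas, authorization).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    datasets: set        # dataset identifiers replicated at this node
    services: set        # calculation services offered by this node
    load: float = 0.0    # assumed metric: 0.0 idle .. 1.0 saturated


def pick_node(nodes, dataset, service):
    """Pick the least-loaded node holding the dataset and offering the service.

    Returns None when no node satisfies both conditions.
    """
    candidates = [n for n in nodes
                  if dataset in n.datasets and service in n.services]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.load)


def combine_means(partials):
    """Combine per-node (mean, sample_count) pairs into one global mean."""
    total = sum(count for _, count in partials)
    if total == 0:
        raise ValueError("no samples to combine")
    return sum(mean * count for mean, count in partials) / total


if __name__ == "__main__":
    nodes = [
        Node("node-a", {"ds1"}, {"mean"}, load=0.8),
        Node("node-b", {"ds1", "ds2"}, {"mean"}, load=0.2),
        Node("node-c", {"ds2"}, set(), load=0.0),
    ]
    chosen = pick_node(nodes, "ds1", "mean")
    print(chosen.name if chosen else "no suitable node")
    print(combine_means([(10.0, 1), (20.0, 3)]))
```

Note that partial means cannot simply be averaged: each node must also report how many samples it used, which is one small example of why gathering results needs to be designed into the API.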

Many challenges ahead!