www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Processing services in EUDAT
EUDAT GEF status and plans
Christian Pagé, CERFACS
Earth System Science data management Session
RDA 6th Plenary
Paris, 23-25 September 2015
Science Drivers
Data available for scientific analysis: a very large trend
Limitations in data access mean limitations in data analytics and scientific results
Download locally then Analyze: a workflow that cannot be sustained
Climate researchers
Impact researchers
EUDAT Generic GEF ideas: orchestrate / multi-communities / services
GEF Web Service
Generic API
iRODS data access
Abstracting iRODS (flexibility)
or
Community-specific
Federation
ENES/ESGF
or
WebLicht, etc.
Hadoop
Data Federation
Common Metadata Semantics: Searching across communities
Common AAI: Authentication and Authorization across communities
Extensions
Processing Services Catalogs: Getting information about communities' services
Common
Communities-specific
Processing/workflows
Requests using PIDs
PIDs for identification of data products
EUDAT Generic GEF ideas: orchestrate / multi-communities / services
http://github.com/GEFx/gef
User request
GEF web service
GEF Executor
iRODS
Backend
Backend
App container
App container
• Prototype implementation
• Spec in progress
• Unclear API direction, more discussions needed
Thanks to Emanuel Dima, EKUT (CLARIN)
ESGF WPS API: Future computing nodes
Goal: perform data analysis near the data storage
Better data access
Move away from the download/analyze workflow
Develop general APIs for exposing ESGF distributed compute resources to multiple analysis tools
Server-side processing capabilities are not being developed yet: the focus is on the API
First Steps
Use Case approach
Used the Goddard Climate Data Services (CDS) API and server-side processing
Compared different APIs
Next Steps
Technology exploration for server-side processing
Continue the exploration of HDFS and other technologies (e.g. Spark)
Exploration of high-performance file systems
ESGF API
Specification of an ESGF WPS API
Test implementation
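To make the idea of a WPS-style compute API concrete, here is a minimal sketch of how a client might build an OGC WPS 1.0.0 Execute request using the standard key-value-pair encoding. The endpoint URL, the process identifier `time_average`, and the input names are hypothetical placeholders; the slides do not define the actual ESGF WPS API, which was still to be specified.

```python
from urllib.parse import urlencode


def build_wps_execute_url(endpoint, process_id, inputs):
    """Build an OGC WPS 1.0.0 key-value-pair Execute URL.

    `inputs` maps input names to literal values. All names used in the
    example below are assumptions for illustration only.
    """
    # WPS 1.0.0 KVP encoding lists inputs as "name=value" pairs joined by ";".
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical compute node and process; not a real ESGF endpoint.
url = build_wps_execute_url(
    "https://compute.example.org/wps",
    "time_average",
    {"variable": "tas", "start": "2000-01", "end": "2009-12"},
)
print(url)
```

The request names only the process and its inputs, so the calculation can run next to the data and only the result needs to travel back to the user.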
Challenges to orchestrate too!
Federated environment: Orchestration of the calculation from the requested computing node
Advanced scheduler needed
Where should calculations be performed if the data is available at multiple data nodes?
Which calculation services are available, if any?
Results from several computing nodes will need to be gathered and combined
Many challenges ahead!
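The scheduling questions above can be sketched in a few lines. This is a toy illustration, not a GEF or ESGF design: the `ComputeNode` fields, the "least-loaded node" rule, and the (sum, count) merge for a mean are all assumptions chosen to show the three steps of node selection, service availability checking, and combining partial results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    name: str
    datasets: frozenset   # dataset identifiers held at this node
    services: frozenset   # processing services offered by this node
    load: float           # current load in [0, 1]; lower is better


def choose_node(nodes, dataset, service):
    """Pick the least-loaded node that holds the dataset and offers the service."""
    candidates = [
        n for n in nodes
        if dataset in n.datasets and service in n.services
    ]
    if not candidates:
        return None  # no node can run this service next to this data
    return min(candidates, key=lambda n: n.load)


def plan(nodes, datasets, service):
    """Map each requested dataset to a node, or to None if none qualifies."""
    return {d: choose_node(nodes, d, service) for d in datasets}


def combine_means(partials):
    """Merge per-node (sum, count) pairs into one global mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical federation: ds1 is replicated on two nodes.
nodes = [
    ComputeNode("node-a", frozenset({"ds1", "ds2"}), frozenset({"mean"}), 0.7),
    ComputeNode("node-b", frozenset({"ds1"}), frozenset({"mean"}), 0.2),
    ComputeNode("node-c", frozenset({"ds2"}), frozenset(), 0.1),
]
assignment = plan(nodes, ["ds1", "ds2"], "mean")
global_mean = combine_means([(10.0, 4), (20.0, 6)])
```

Here `ds1` goes to `node-b` (the less loaded replica), while `ds2` goes to `node-a` because `node-c` holds the data but offers no matching service. Each node returns a partial sum and a count rather than its own mean, so the partial results can be combined correctly.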