xdc (extreme datacloud) · xdc objectives the extreme datacloud is a software development and...

31
eXtreme DataCloud is co-funded by the Horizon2020 Framework Program – Grant Agreement 777367 Copyright © Members of the XDC Collaboration, 2017-2020 Data Management for extreme scale computing XDC (eXtreme DataCloud) Paul Millar [email protected] 2 nd Rucio Community Workshop 28 th February– 1 st March 2019, Oslo Norway https://indico.cern.ch/event/773489/ … with thanks to the many people who contributed material.

Upload: others

Post on 08-Jul-2020

25 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

eXtreme DataCloud is co-funded by the Horizon2020 Framework Program – Grant Agreement 777367Copyright © Members of the XDC Collaboration, 2017-2020

Data Management for extreme scale computing

XDC (eXtreme DataCloud)

Paul [email protected]

2nd Rucio Community Workshop28th February– 1st March 2019, Oslo Norway

https://indico.cern.ch/event/773489/

… with thanks to the many people who contributed material.

Page 2: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

XDC Objectives

The eXtreme DataCloud is a software development and integration project

Develops scalable technologies for federating storage resources and managing data in highly distributed computing environments

Focus efficient, policy driven and Quality of Service based DM

The targeted platforms are the current and next generation e-Infrastructures deployed in Europe

European Open Science Cloud (EOSC)

The e-infrastructures used by the represented communities

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 2

Page 3: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

XDC Consortium

8 partners, 7 countries

7 research communities represented + EGI

XDC Total Budget: 3.07M€

XDC started on Nov 1st 2017 – will run for 27 months until Jan 31st 2020

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 3

ID Partner Country Represented Community Tools and system

1 INFN (Lead)

IT HEP/WLCGINDIGO-Orchestrator, INDIGO-CDMI(*)

2 DESY DE Research with Photons(XFEL)

dCache

3 CERN CH HEP/WLCG EOS, DYNAFED, FTS 4 AGH PL ONEDATA5 ECRIN [ERIC] Medical data6 UC ES Lifewatch7 CNRS FR Astro [CTA and LSST]8 EGI.eu NL EGI communities

Page 4: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Represented research communities

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 4

Page 5: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

XDC Technical TopicsIntelligent & Automated Dataset Distribution

Orchestration to realize a policy-driven data management

Data distribution policies based on Quality of Service (i.e. disks vs tape vs SSD) supporting geographical distributed resources (cross-sites)

Software lifecycle management

Data pre-processing during ingestion

Data management based on access patternsMove to ‘glacier-like’ storage unused data, move to fast storage “hot” data

at infrastructure level

Smart cachingTransparent access to remote data without the need of a-priori copy

Metadata management

Sensitive data handlingsecure storage and encryption

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 5

Page 6: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

XDC Technical TopicsIntelligent & Automated Dataset Distribution

Orchestration to realize a policy-driven data management

Data distribution policies based on Quality of Service (i.e. disks vs tape vs SSD) supporting geographical distributed resources (cross-sites)

Software lifecycle management

Data pre-processing during ingestion

Data management based on access patternsMove to ‘glacier-like’ storage unused data, move to fast storage “hot” data

at infrastructure level

Smart cachingTransparent access to remote data without the need of a-priori copy

Metadata management

Sensitive data handlingsecure storage and encryption

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 6

Focu

sing

on

thes

e to

pics

Page 7: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Production Level Components

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 7

c

Storage

Federation

Orchestration

dCacheINDIGO

Orchestrator

Cache

QoSCDMIHTTP

Cache

Page 8: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Software improvements

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 8

Page 9: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 9

Page 10: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Upload a file

OK

Delete a file

OK

Are these files on disk?

no, no, no, …

Stage files from tape

Request queued

Subscribe to events

OK

Something happened #1

Something happened #2

Something happened #3

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 10

Storage Events: new way to interact

Page 11: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Upload

OK

INDIGOOrchestrator

Subscribe …

OK

File uploaded

● User- and internally triggered events:– Data uploaded

– Data deleted/renamed/moved

– Tape flush/stage operations

● Uses: update catalogue, metadata extraction, data normalisation, build derived data, …

● Two event systems:– Site integration (Kafka)

– Per user events (SSE/inotify)

Upload

OK File uploaded

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 11

New solutions to old problems

Page 12: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

INDIGOOrchestrator

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 12

Page 13: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Data lifecycle and data management

Page 14: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 14

Introduction of “data awareness”

Page 15: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

2019-02-28 P.Millar – eXtreme DataCloud (XDC) 15

Introduction of “data awareness”

Page 16: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Orchestration system architecture

Page 17: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Orchestration system architecture

Page 18: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and
Page 19: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Dynafed integration with OIDC

Integration of support for Dynafed in two relevant OAuth rolesResource server

Dynafed will accept access tokens (such as those issued by IAM) and authorise the client on this basis

Client Dynafed can invoke the Authorization Code Grant flow if accessed with a browser, redirecting the end user to IAM in order to obtain the relevant credentials

NB - Dynafed doesn't speak OAuth to the federated storage systems

Orchestrator integrationOrchestrator now integrates with Dynafed, using an access token, so Dynafed is operating as a resource server

19P.Millar – eXtreme DataCloud (XDC)2019-02-28

Page 20: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Dynafed “Fourth party copy”

Dynafed can now operate in

"fourth party copy" mode. In this scenario it can be instructed to copy files between storage systems, streaming the data through itself.

The reference scenario here is Dynafed colocated with cloud storage, allowing data ingestion or export to/from storage systems that don't support TPC.

KnowsFederatesDrives TPCTunnels TPC

KnowsFederatesDrives TPCTunnels TPC

Users

endpoints stayindependent

xfers to/from anydestinationcross-protocol

endpoints stayindependent

xfers to/from anydestinationcross-protocol

20P.Millar – eXtreme DataCloud (XDC)2019-02-28

Page 21: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and
Page 22: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

FTS integration with OIDC: status

FTS Auth/Authz historically done only with X509 proxy certificates and VOMS groups/Roles

not user-friendly

X509 delegation needed

2 types of OIDC integrations implemented Directly accepting access tokens from users via CLI/REST API

Redirect WebFTS users to IAM in order to acquire a token and using it via the FTS REST API

Tokens are used both to authenticate to FTS and to the storages dCache and Storm are supporting OIDC for now

X509 delegation is not needed anymore! (both to FTS and to storages)

22P.Millar – eXtreme DataCloud (XDC)2019-02-28

Page 23: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Understand how to handle groups/roles for certain REST operationsWith X509 they are based on VOMS groups/roles

Extend REST operations to non-X509 identitiesUser banning now is based on the X509 User DNs

Integration of a Token Translation ServicePresent a token – get an X509 certificateNeeded for EOS in XDC, but of course for all the other storages which do not support OIDC yet

Needed also to use other protocols than HTTP

FTS integration with OIDC: plans

23P.Millar – eXtreme DataCloud (XDC)2019-02-28

Page 24: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

FTS QoS extension: status and plans

FTS extension to steer file’s QoS via CDMI interface supported by dCache

StatusFTS QoS job submission implemented, resulting in a QoS query

Gfal2 CDMI extension implemented (incl python bindings)

PlansFull integration of QoS logicTransfer/Transition logic

Use existing multi-hop logic to serialise transfer-then-QoS-transitionSeparate QoS daemon under development

Validate integration of all QoS methods in gfal2Definition of FTS QoS interface for Rucio/OrchestratorClassification of QoS classes for XDC

24P.Millar – eXtreme DataCloud (XDC)2019-02-28

Page 25: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

EOS

Page 26: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

EOS Caching – XCache

Provide a caching solution to work with EOSEvaluate viability of XCache + EOS storage

Integrate XCache with HTTP access plugin (XrdHttp)

Identity forwarding plugin (Github link) [ requires SSS protocol ]

Upcoming:

Write support

File notifications listener (to help evict stale files)

P.Millar – eXtreme DataCloud (XDC)2019-02-28 26

Page 27: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

EOS - Storage adoption

Use storage hosted externally through different protocols, such as S3, WebDav and XrootD

Operations on the provided storage done exclusively through EOS

Implemented by allowing filesystems to map to these protocols (s3:// , http:// , https://, root://)

External Storage System

Namespace (NS) node

Storage (ST) node

P.Millar – eXtreme DataCloud (XDC)2019-02-28 27

Page 28: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

EOS - File adoption

Discover and import metadata information about files existing on external storage resources

Imported files behave like regular EOS files

From here on, imported files should be managed only from within EOS

External Storage System

NS node

ST node

fs import start 1 /xdc/mypath//eos/xdc/mihai/import/

Scan path=/xdc/mypath/

File=/xdc/mypath/file1File=/xdc/mypath/file2

…File=/xdc/mypath/file99

Register filelpath=/xdc/mypath/file1

…lpath=/xdc/mypath/file99

<uuid>

import statusfs im

port query <uuid>

282019-02-28 P.Millar – eXtreme DataCloud (XDC)

Page 29: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Thanks for listening!

Page 30: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

Backup slides

Page 31: XDC (eXtreme DataCloud) · XDC Objectives The eXtreme DataCloud is a software development and integration project Develops scalable technologies for federating storage resources and

External Storage System

myfile.txt

fid=1024lpath=/eos/xdc/mihai/myfile.txtctime=timestamp(stored as extended attributes)

path=/eos/xdc/mihai/myfile.txt

/eos/xdc/mihai/

NS node

ST node

EOS - Working with an imported file (backup slide)

MGM keeps track of additional information, such as lpath and ctime.

P.Millar – eXtreme DataCloud (XDC)2019-02-28 3131