TRANSCRIPT
Architecture and prototype of a WLCG data lake for HL-LHC
Gavin McCance, Ian Bird, Jaroslava Schovancova, Maria Girone, Simone Campana, Xavier Espinal Curull (CERN)
Motivation
HL-LHC needs exceed what the expected technology evolution (15%/yr) and flat funding can provide.
We need to optimize hardware usage and operational costs
Some ideas for reducing cost
Reduce operational cost: deploy fewer (larger) storage services.
Reduce hardware cost: introduce the concept of QoS (Quality of Service); today we store more than we think.
Co-location of data and processors is not guaranteed.
[Figure: today each site runs its own storage service and redundancy policy, e.g. Site A: EOS, 2 copies; Site B: CEPH, 3 copies; Site C: dCache, RAID-N; network links annotated in Tbps and Gbps.]
Data and Compute Infrastructures
[Figure: the proposed model separates a Data (Lake) Infrastructure from the Compute Infrastructure. The data side groups distributed storage, distributed regional storage and volatile storage behind storage interoperability services, asynchronous data transfer, and content delivering and caching. The compute side comprises grid compute at the sites, cloud compute, HPC compute, @HOME resources and compute provisioning, with high-level services on top.]
The eulake prototype
Goal: test and demonstrate some of those ideas.
We deployed a Distributed Storage prototype based on the EOS technology.
[Map: participating sites: CERN-CH (NS), CERN-HU (NS), RAL-UK, SARA-NL, NIKHEF-NL, PIC-ES, CNAF-IT, Dubna-RU, NRC-KI (PNPI), AARNet-AUS; inter-site RTTs range from 18 ms to 300 ms.]
Quality of Service and Transitions
Three QoS classes: 2 replicas, 5 stripes (n-2) Reed-Solomon (3+2 chunks), single copy.
Transitions between classes are either triggered by a 'namespace' attribute change or pre-established as an automatic conversion after Δt = 1 h, again via a 'namespace' attribute.
Test dataset: 100 files of 1 GB, written by a single client (VM).
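As a rough illustration of how such transitions could be driven from the namespace, here is a minimal Python sketch around the eos CLI. The MGM endpoint, directory paths, layout names and the conversion/LRU attribute values are assumptions made for illustration, not the actual eulake configuration, and the exact command and attribute syntax depends on the EOS version.

#!/usr/bin/env python3
"""Illustrative sketch: driving QoS layout transitions on an EOS instance."""
import subprocess

MGM = "root://eulake.cern.ch"  # hypothetical MGM endpoint for the eulake instance

def eos(*args):
    """Run an eos CLI command against the MGM and return its stdout."""
    result = subprocess.run(["eos", MGM, *args],
                            check=True, capture_output=True, text=True)
    return result.stdout

# Directory ('namespace') attributes define the layout of newly written files,
# e.g. a 2-replica area and an erasure-coded (3 data + 2 parity) area.
eos("attr", "set", "sys.forced.layout=replica", "/eos/eulake/qos/replica2")
eos("attr", "set", "sys.forced.nstripes=2", "/eos/eulake/qos/replica2")
eos("attr", "set", "sys.forced.layout=raid6", "/eos/eulake/qos/rs32")
eos("attr", "set", "sys.forced.nstripes=5", "/eos/eulake/qos/rs32")

# Triggered transition: explicitly ask the converter to rewrite an existing
# file into a different layout (here from 2 replicas to a 5-stripe layout).
eos("file", "convert", "/eos/eulake/qos/replica2/file_001", "raid6:5")

# Pre-established automatic transition: an attribute that the EOS LRU/converter
# engine evaluates periodically, e.g. "convert files older than 1 hour"
# (attribute name and value format are assumptions; check the EOS docs).
eos("attr", "set", "sys.lru.convert.match=*:1h", "/eos/eulake/qos/replica2")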
Integration of eulake with ATLAS Data Management
We exposed eulake to the ATLAS data management system (Rucio) as a storage endpoint.
Data can be transferred from any ATLAS site into eulake (see the sketch below).
We imported 4 input samples into different eulake areas for the next tests.
[Plots: number of files per source site and per destination area.]
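For illustration, a short Python sketch of how a dataset replica could be requested on the eulake endpoint through the Rucio client; the RSE name EULAKE and the dataset scope/name are made-up examples, not the configuration actually used here.

#!/usr/bin/env python3
"""Illustrative sketch: request a dataset replica on the eulake endpoint via Rucio."""
from rucio.client import Client

client = Client()

# Hypothetical dataset and RSE name; in the tests eulake was registered as a
# storage endpoint (RSE) in ATLAS Rucio and input samples were imported from
# existing ATLAS sites.
did = {"scope": "mc16_13TeV", "name": "mc16_13TeV.example.input.sample"}

# Ask for one replica of the dataset on the eulake RSE; Rucio and FTS then
# schedule the transfers from whichever ATLAS sites currently hold the files.
rule_ids = client.add_replication_rule(dids=[did],
                                       copies=1,
                                       rse_expression="EULAKE",
                                       lifetime=7 * 24 * 3600)  # seconds
print("created rules:", rule_ids)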
Activities
1. deployment and commissioning
2. transfer tests and input data replication
3. data access
[Plots: writing into eulake (MB/s); eulake I/O (MB/s).]
HammerCloud
We integrated eulake with the HammerCloud framework.
This allows testing real workflows and data access patterns; the initial focus is on ATLAS.
Four test scenarios. Read from:
1. Local access (no eulake)
2. eulake, data@CERN, WN@CERN
3. eulake, data NOT@CERN, WN@CERN
4. eulake, 4+2 stripes, WN@CERN
Data is copied from storage to the WN (see the timing sketch below).
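As a rough illustration of the "copy to WN" access mode, the sketch below stages an input file in with xrdcp from either local storage or a remote eulake path and times it. The URLs and file names are placeholders, and HammerCloud itself runs real ATLAS jobs rather than a bare copy.

#!/usr/bin/env python3
"""Illustrative sketch: time the stage-in of one input file to the worker node."""
import subprocess
import time

# Placeholder source URLs for the access scenarios (not the real endpoints).
SOURCES = {
    "local (no eulake)":     "root://eosatlas.cern.ch//eos/atlas/some/input.root",
    "eulake, data@CERN":     "root://eulake.cern.ch//eos/eulake/cern/input.root",
    "eulake, data NOT@CERN": "root://eulake.cern.ch//eos/eulake/remote/input.root",
}

def stage_in(url, dest="input.root"):
    """Copy the file to local worker-node disk with xrdcp and return the elapsed time."""
    start = time.time()
    subprocess.run(["xrdcp", "--force", url, dest], check=True)
    return time.time() - start

if __name__ == "__main__":
    for label, url in SOURCES.items():
        print(f"{label}: staged in {stage_in(url):.1f} s")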
Low I/O intensity workflow: ~40 MB input (1 file), 2 events, ~5 mins/event
High I/O intensity workflow: ~6 GB input (1 file), 1000 events, ~2 seconds/event
[Plots: timings for the four scenarios (local access (no eulake); eulake, data@CERN; eulake, data NOT@CERN; eulake, 4+2 stripes). Measured values per panel: 85 s / 186 s / 279 s / 948 s; 22 s / 28 s / 40 s / 40 s; 2.6 / 3.5 / 3.7 / 5.6 s/evt; 545 / 600 / 752 / 686 s/evt.]
Conclusions
We set up a distributed storage instance based on EOS technology: small in terms of storage volume, large in the geographical sense. Deployment varieties: native EOS, EOS on Docker containers, volume export (CEPH).
We integrated this instance with the ATLAS distributed computing services and HammerCloud; the next step is integration with CMS.
We have a prototype in place and understand what we want to measure, but we don't yet have enough statistics to draw any firm conclusions.
The prototype we set up serves to test many ideas in preparation for HL-LHC.
Acknowledgements
This work has been done in collaboration with many participating WLCG sites; credit goes to them for their work.
We would like to thank ATLAS Distributed Computing, and particularly the Rucio team, for the help with the integration.
We thank the EOS experts at CERN.
Special thanks to Ivan Kadochnikov for his help in the final part of this work.