Exa-Scale Data Preservation in HEP
International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics. [email protected] APA/C-DAC Conference, February 2014.
![Page 1: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/1.jpg)
Exa-Scale Data Preservation in HEP
[email protected] APA/C-DAC Conference
February 2014
International Collaboration for Data Preservation and Long Term Analysis in High Energy Physics
![Page 2: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/2.jpg)
Background
• Whilst this talk concerns data from High Energy Physics (HEP) experiments at CERN and elsewhere, many points are generic
• The scale: 100PB today, reaching ~5EB by 2030
  – “Trusted” repositories of this size – and with a lifetime of at least decades – are a sine qua non of our work
• I will also talk about costs, business cases, problems and opportunities…
![Page 3: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/3.jpg)
BEFORE!
![Page 4: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/4.jpg)
[Figure: data flow to permanent storage, 4-6 GB/sec; other rates labelled in the diagram: 200-400 MB/sec, 1.25 GB/sec, 1-2 GB/sec, 1-2 GB/sec]
CERN-JRC meeting Bob Jones
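For a sense of scale, a sustained rate such as the 4-6 GB/sec flow to permanent storage can be converted into a yearly volume. The sketch below assumes decimal units (1 PB = 10^6 GB); `yearly_volume_pb` and its `live_fraction` parameter (the share of the year during which data actually flows) are illustrative names, not figures from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def yearly_volume_pb(rate_gb_per_s, live_fraction=1.0):
    """Convert a sustained rate in GB/sec into PB per year.

    live_fraction is a hypothetical duty factor between 0 and 1:
    the fraction of the year during which data is being written.
    """
    if not 0.0 <= live_fraction <= 1.0:
        raise ValueError("live_fraction must be between 0 and 1")
    gigabytes = rate_gb_per_s * SECONDS_PER_YEAR * live_fraction
    return gigabytes / 1e6  # 1 PB = 10^6 GB (decimal units)


if __name__ == "__main__":
    for rate in (4.0, 6.0):
        print(f"{rate} GB/sec at full duty: {yearly_volume_pb(rate):.1f} PB/year")
```

At full duty, 1 GB/sec corresponds to roughly 31.5 PB per year, so any realistic estimate depends heavily on the assumed live fraction.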
![Page 5: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/5.jpg)
The LHC Computing Grid, February 2010
Tier 0 – Tier 1 – Tier 2
Tier-0 (CERN):
• Data recording
• Initial data reconstruction
• Data distribution
Tier-1 (11 centres):
• Permanent storage
• Re-processing
• Analysis
Tier-2 (~130 centres):
• Simulation
• End-user analysis
Tier-2 centres in India:
• Kolkata (ALICE)
• Mumbai (CMS)
Frédéric Hemmer
![Page 6: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/6.jpg)
Managing 100 PBytes of data
27 January 2014, CERN-JRC meeting, Bob Jones
![Page 7: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/7.jpg)
LHC Schedule
Timeline, 2009 to 2030?: First run, LS1, Second run, LS2, Third run, LS3, HL-LHC
• LHC startup: 900 GeV
• 7 TeV, L = 6x10^33 cm^-2 s^-1, bunch spacing = 50 ns
• Phase-0 Upgrade (design energy, nominal luminosity): 14 TeV, L = 1x10^34 cm^-2 s^-1, bunch spacing = 25 ns
• Phase-1 Upgrade (design energy, design luminosity): 14 TeV, L = 2x10^34 cm^-2 s^-1, bunch spacing = 25 ns
• Phase-2 Upgrade (High Luminosity): 14 TeV, L = 1x10^35 cm^-2 s^-1, spacing = 12.5 ns
![Page 8: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/8.jpg)
ATLAS Higgs Candidates
![Page 9: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/9.jpg)
AFTER!
![Page 10: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/10.jpg)
CERN has ~100 PB archive
![Page 11: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/11.jpg)
LS1 Status Report – 116th LHCC, Frédérick Bordry, 4th December 2013
But it's still early days for the LHC!
[Chart: quarter-by-quarter schedule for the LHC and its injectors, 2015 to 2035, covering Run 2, LS 2, Run 3, LS 3, Run 4, LS 4, Run 5 and LS 5]
• Only EYETS (19 weeks) (no Linac4 connection during Run 2)
• LS2 starting in 2018 (July): 18 months + 3 months BC (Beam Commissioning)
• LS3, LHC: starting in 2023 => 30 months + 3 BC
• LS3, injectors: in 2024 => 13 months + 3 BC
LHC schedule approved by CERN management and LHC experiments spokespersons and technical coordinators, Monday 2nd December 2013
![Page 12: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/12.jpg)
ECFA: European Committee for Future Accelerators
HL-LHC Workshop
October 1, 2013
High Luminosity LHC (HL-LHC)
Update of the European Strategy for Particle Physics, adopted 30 May 2013 in a special session of CERN Council at Brussels. Statement c:
c) The discovery of the Higgs boson is the start of a major programme of work to measure this particle’s properties with the highest possible precision for testing the validity of the Standard Model and to search for further new physics at the energy frontier. The LHC is in a unique position to pursue this programme. Europe’s top priority should be the exploitation of the full potential of the LHC, including the high-luminosity upgrade of the machine and detectors with a view to collecting ten times more data than in the initial design, by around 2030. This upgrade programme will also provide further exciting opportunities for the study of flavour physics and the quark-gluon plasma.
![Page 13: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/13.jpg)
Predrag Buncic, October 3, 2013, ECFA Workshop Aix-Les-Bains
Data: Outlook for HL-LHC
• Very rough estimate of new RAW data per year of running, using a simple extrapolation of current data volume scaled by the output rates.
• To be added: derived data (ESD, AOD), simulation, user data…
[Chart: RAW data volume in PB (axis 0 to 450) for Run 1, Run 2, Run 3 and Run 4, with series for CMS, ATLAS, ALICE and LHCb]
![Page 14: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/14.jpg)
Volume: 100PB + ~50PB/year (+400PB/year from 2020)
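Taken at face value, these figures imply a simple piecewise-linear projection. The following is a minimal sketch, assuming 100 PB at the start of 2014 (the date of this talk), growth booked at the end of each year, and the higher rate applying from 2020 inclusive. The function name and these conventions are assumptions, not taken from the slides.

```python
def projected_volume_pb(year, start_year=2014, start_pb=100.0,
                        early_rate_pb=50.0, late_rate_pb=400.0,
                        switch_year=2020):
    """Archive volume in PB at the end of `year`.

    Adds early_rate_pb for every year before switch_year and
    late_rate_pb for every year from switch_year onwards.
    """
    volume = start_pb
    for y in range(start_year, year + 1):
        volume += late_rate_pb if y >= switch_year else early_rate_pb
    return volume


if __name__ == "__main__":
    for y in (2014, 2019, 2020, 2025, 2030):
        print(y, projected_volume_pb(y), "PB")
```

Under these assumptions the archive reaches 4800 PB (4.8 EB) at the end of 2030, in line with the ~5EB by 2030 quoted on the Background slide.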
![Page 15: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/15.jpg)
1. DPHEP Portal
2. Digital library tools (Invenio) & services (CDS, INSPIRE, ZENODO) + related tools (HepData, RIVET, …)
3. Sustainable software, coupled with advanced virtualization techniques, “snap-shotting” and validation frameworks
4. Proven bit preservation at the 100PB scale, together with a sustainable funding model with an outlook to 2040/50
5. Open Data (“Open everything”)
![Page 16: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/16.jpg)
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
Internet Services
DSS
Case B) increasing archive growth
Start with 10PB, then +50PB/year, then +50% every 3y (or +15% / year)
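The two growth rates quoted for Case B are close but not identical: compounding +15% per year gives about +52% over three years, while +50% over three years corresponds to about +14.5% per year. The sketch below checks this and steps a Case B style archive forward. It assumes the percentage growth applies to the yearly increment rather than to the total, and that it starts after a hypothetical `flat_years` period; neither is stated on the slide.

```python
def compound_growth(annual_rate, years):
    """Total fractional growth after compounding annual_rate for `years`."""
    return (1.0 + annual_rate) ** years - 1.0


def equivalent_annual_rate(total_growth, years):
    """Annual rate that compounds to total_growth over `years`."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


def case_b_sizes(n_years, start_pb=10.0, increment_pb=50.0,
                 flat_years=3, annual_growth=0.15):
    """Archive size in PB at the end of each year, starting with year 0.

    flat_years (hypothetical) is how long the yearly increment stays
    constant before it starts growing by annual_growth per year.
    """
    sizes = [start_pb]
    increment = increment_pb
    for year in range(1, n_years + 1):
        if year > flat_years:
            increment *= 1.0 + annual_growth
        sizes.append(sizes[-1] + increment)
    return sizes


if __name__ == "__main__":
    print(f"+15%/year over 3 years: +{compound_growth(0.15, 3):.1%}")
    print(f"+50% over 3 years: +{equivalent_annual_rate(0.5, 3):.1%} per year")
    print([round(s, 1) for s in case_b_sizes(6)])
```

Changing `flat_years` or applying the growth to the total instead of the increment alters the long-term size considerably, which is why those choices are left as explicit parameters.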
![Page 17: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/17.jpg)
Case B) increasing archive growth
![Page 18: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/18.jpg)
Case B) increasing archive growth
Total cost: ~59.9M$ (~2M$ / year)
![Page 19: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/19.jpg)
Case B) increasing archive growth
![Page 20: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/20.jpg)
Summary
1. DPHEP portal: build in collaboration with other disciplines, including RDA IG and the APA…
2. Digital libraries: continue existing collaborations
3. Sustainable “bit preservation” – certified repositories as part of EINFRA-1-2014
4. “Knowledge capture & preservation”: BIG CHALLENGE not addressed in multi-disciplinary way: next funding round?
5. Open “Big Data”: a Big Opportunity (for RDA?)
![Page 21: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/21.jpg)
Portal Example # 1
![Page 22: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/22.jpg)
Portal Platform – Zenodo?
![Page 23: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/23.jpg)
David South | Data Preservation and Long Term Analysis in HEP | CHEP 2012, May 21-25 2012
Documentation projects with INSPIRE
> The ingestion of other documents is under discussion, including theses, preliminary results, conference talks and proceedings, paper drafts, ...
  – More experiments working with INSPIRE, including CDF, D0 as well as BaBar
> Internal notes from all HERA experiments now available on INSPIRE
  – Experiments no longer need to provide dedicated hardware for such things
  – Password protected now, simple to make publicly available in the future
![Page 24: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/24.jpg)
LEP Cost would be “now” …
• Completely different, of course …
• Direct resource cost is already compatible with zero for LEP experiments
• Total ALEPH DATA + MC (analysis format) = 30 TB
• ALEPH: Shift50 = 320 CernUnit. One of today’s pizza boxes largely exceeds this
• CDF data: O(10 PB), bought today for <400kEur
• CDF CPU ~ 1MSi2k = 4 kHS06 = 40kEur
• Here the main problem is knowledge/support, clearly
• Can you trust a “NP peak” 10 years later, when experts are gone?
• ALEPH reproducibility test (M. Maggi, by NO means a DP solution): ~0.5 FTE for 3 months
Zero!!=0, but decreasing fast
![Page 25: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/25.jpg)
![Page 26: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/26.jpg)
Open Data?
![Page 27: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/27.jpg)
Costs and Scale
• There are 4 (main) collaborations + detectors at the LHC: the largest has 3000 members
• The annual cost of WLCG (infrastructure, operations, services) is ~EUR100M
• The CERN database services cost around 2MCHF per year for materials (licenses, maintenance, hardware) and 2MCHF for personnel
• The central grid Experiment Integration Support team varied between 4 and 10 people, plus significant effort at sites and within experiments
• The DPHEP Full Costs of Curation workshop concluded that a team of ~4 people, with access to experts, could “make significant progress” (be careful with this number!)
![Page 28: Exa -Scale Data Preservation in HEP](https://reader036.vdocuments.net/reader036/viewer/2022062400/56816872550346895ddee1f2/html5/thumbnails/28.jpg)
Conclusions
• Long-term data preservation is a journey, not a destination
• As such, it is best not to venture out alone
• A clear understanding of costs & benefits is necessary to secure funding
• We are eager to share our knowledge and experience (exa-scale “bit preservation”)
• We have learned a lot through collaboration via the APA, and are keen to learn more in the future