big data for big discoveries

18
Big Data for Big Discoveries How the LHC looks for Needles by Burning Haystacks Alberto Di Meglio CERN openlab Head DOI: 10.5281/zenodo.45449, CC-BY-SA, images courtesy of CERN

Upload: govnet-events

Post on 27-Jan-2017

200 views

Category:

Government & Nonprofit


1 download

TRANSCRIPT

Page 1: Big Data for Big Discoveries

Big Data for Big DiscoveriesHow the LHC looks for Needles

by Burning HaystacksAlberto Di Meglio

CERN openlab Head

DOI: 10.5281/zenodo.45449, CC-BY-SA, images courtesy of CERN

Page 2: Big Data for Big Discoveries

The Large Hadron Collider (LHC)

Page 3: Big Data for Big Discoveries

Burning HaystacksThe Detectors: 7000 tons “microscopes” 150 million sensors Data generated 40 million times per second

Low-level Filters and Triggers 100,000 selections per second

High-level filters (HLT) 100 selections per second

Peta Bytes / sec !

Tera Bytes / sec !

Giga Bytes / sec !

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 3

Page 4: Big Data for Big Discoveries

Storage, Reconstruction, Simulation, Distribution

6 GB/s

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 4

Page 5: Big Data for Big Discoveries

Worldwide LHC Computing Grid

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016

Tier-0 (CERN):• Data recording• Initial data reconstruction• Data distribution

Tier-1 (12 centres):• Permanent storage• Re-processing• Analysis

Tier-2 (68 Federations, ~140 centres):• Simulation• End-user analysis

• 525,000 cores• 450 PB

5

Page 6: Big Data for Big Discoveries

6

1 PB/s of data generated by the detectors

Up to 30 PB/year of stored data

A distributed computing infrastructureof half a million cores working 24/7

An average of 40M jobs/month

An continuous data transfer rate of 4-6 GB/sacross the Worldwide LHC Grid (WLCG)

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016

Page 7: Big Data for Big Discoveries

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016

The Higgs Boson completes the Standard Model,but the Model explains only about 5% of our Universe

What is the other 95% of the Universe made of?How does gravity really works?

Why there is no antimatter in nature?Exotic particles? Dark matter? Extra dimensions?

7

Page 8: Big Data for Big Discoveries

LHC Schedule

First run LS1 Second run LS2 Third run LS3 HL-LHC2009 2013 2014 2015 2016 2017 201820112010 2011 2019 2023 2024 2030?20212020 2022 …

LHC startup

900 GeV

7 TeVL=6x1033 cm-2s-1

Bunch spacing = 50 ns

Phase-0 Upgrade(design energy,

nominal luminosity)

14 TeVL=1x1034 cm-2s-1

Bunch spacing = 25 ns

Phase-1 Upgrade(design energy,

design luminosity)

14 TeVL=2x1034 cm-2s-1

Bunch spacing = 25 ns

Phase-2 Upgrade(High Luminosity)

14 TeVL=1x1035 cm-2s-1

Spacing = 12.5 ns

FCC?

50 times more data than today in the next 10 years50 PB/s out of the detectors

5 PB/day to be stored

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 8

Page 9: Big Data for Big Discoveries

Information Technology Research Areas

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016

Data acquisition and filtering

Computing platforms, data analysis, simulation

Data storage and long-term data preservation

Compute provisioning (cloud)

Data analytics

Networks

9

Page 10: Big Data for Big Discoveries

Flat Budget

How can we Address the Challenges?

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 10

TechnologyCollaborationsEducation

Page 11: Big Data for Big Discoveries

Technology Evolution

More performance, less cost Redesign of DAQ

• Less custom hardware, more off-the-shelf CPUs, fast interconnects, optimized SW

Redesign of SW to exploit modern CPU/GPU platforms (vectorization, parallelization)

New architectures (e.g. automata)

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 11

Computing

Storage

Infrastructure

Data Analytics

Page 12: Big Data for Big Discoveries

Technology Evolution

Reduce data to be stored• Move computation from offline to online

More tapes, less disks• Dynamic data placements

New media• Object-disks, shingled

Persistent meta-data, in-memory ops• Flash/SSD, NVRAM, NVDIMM

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 12

Computing

Storage

Infrastructure

Data Analytics

Page 13: Big Data for Big Discoveries

Technology Evolution

Economies of scale Large-scale, joint procurement of

services and equipment Hybrid infrastructure models

• Allow commercial resource procurement• Large-scale cloud procurement models are

not yet clear

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 13

Computing

Storage

Infrastructure

Data Analytics

Page 14: Big Data for Big Discoveries

Technology Evolution

Reduce analysis cost, decrease time More efficient operations of the LHC

systems• Log analysis, proactive maintenance

Infrastructure monitoring, optimization Data reduction, event filtering and

identification on large data sets (~10PB)

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 14

Computing

Storage

Infrastructure

Data Analytics

Page 15: Big Data for Big Discoveries

Collaborations

Alberto Di Meglio – CERN openlab

Industrial Collaboration Frameworks

HEP Community

Joint Infrastructure Initiatives

Laboratories, academia, research

15

Page 16: Big Data for Big Discoveries

New educational requirements

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016

Multicore CPU programming,

graphical processors

(GPU),multithreaded

software

Software & Computing Engineers

Data analysis technologies,

tools,data storage, visualization,monitoring,

security, etc.

Data Scientists

Applications of physics to

other domains (hadron

therapy, etc.),Knowledge

transfer

Multidisciplinary applications

16

Page 17: Big Data for Big Discoveries

Take-Away Messages LHC is a long-term project

• Need to adapt and evolve in time Look for cost-effective solutions

• Continuous technology tracking• Aim for “just as good as necessary” based on

concrete scientific objectives Collaboration and education

• People is the most important asset to make this endeavour sustainable over time

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016 17

Page 18: Big Data for Big Discoveries

Alberto Di Meglio – CERN LHC - HPC & Big Data 2016

EXECUTIVE CONTACT

Alberto Di Meglio, CERN openlab Head

[email protected]

TECHNICAL CONTACT

Maria Girone, CERN openlab CTO

[email protected]

Fons Rademakers, CERN openlab CRO

[email protected]

COMMUNICATION CONTACT

Andrew Purcell, CERN openlab Communication Officer

[email protected]

Mélissa Gaillard, CERN IT Communication Officer

[email protected]

ADMIN CONTACT

Kristina Gunne, CERN openlab Administration Officer

[email protected]

18

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. It includes photos, models and videos courtesy of CERN and uses contents provided by CERN and CERN openlab