BUD17-300K2: High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN

High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN. David Abdurachmanov (CMS), Joshua Wyatt Smith (ATLAS), Jakob Blomer (CERN)

Upload: linaro

Post on 12-Apr-2017


TRANSCRIPT

Page 1: BUD17-300K2: High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN

High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN

David Abdurachmanov (CMS), Joshua Wyatt Smith (ATLAS), Jakob Blomer (CERN)

Page 2:

An International Laboratory

Page 3:

CERN The European Organization for Nuclear Research

Geneva (CH)

CMS Experiment (FR)

CERN Meyrin (CH/FR)

CERN Prévessin (FR)

Geneva Airport (CH)

ATLAS Experiment (CH)

Page 4:

WHERE ARE OUR SCIENTIFIC TOOLS?

..WE HIDE THEM DEEP UNDER GROUND..

Page 5:

28.7 m long, 15 m diameter, 14'000 tons weight, 500+ MCHF (we keep maintaining and upgrading it)

40 countries 172 institutes

Hi Higgs boson!

Compact Muon Solenoid -- CMS

Page 6:

38 countries 174 institutes

46 m long, 25 m wide, 25 m high, 7'000 tons weight, 540+ MCHF (we keep maintaining and upgrading it)

Hi Higgs boson!

A Toroidal LHC ApparatuS -- ATLAS

Page 7:

CMS

Page 8:

CMS EVENT DISPLAY -- Higgs boson

Page 9:

Two types of computing resources

CMS Detector (High Level Trigger): online computing

> Full ownership ✓, single "customer" ✓, high-bandwidth interconnect ✓
> Filtering and selecting data from the detector
> ~16K x86_64 cores in 2016
> Can now be used as an "opportunistic" resource for offline computing via OpenStack
- Can be done between "runs" (~6 hr) and during longer technical stops
- VMs are killed/replaced before the detector comes back to record data

Worldwide LHC Computing Grid (WLCG): offline computing

> Partially owned, multiple "customers", bandwidth varies
> The data is already stored and can be processed later
> ~650K x86_64 cores (changes frequently)
> CERN has ~200K x86_64 cores and will add another 100K in 2017
> 350 PB disk, 400+ PB tape
> Moved >800 PB over the network in 2016

THE FUTURE LOOKS DIFFERENT..

Page 10:

THE GRID NEVER SLEEPS..

Peak at 3 PM was 91222.282 MB/s (about 91 GB/s)

Before ~2000, distributed computing in HEP involved multiple vendors, including special workstations and heterogeneous computing

High Throughput Computing (HTC) converged on x86/Linux at ~2000

Commodity hardware enabled the current model of WLCG:

Build Once, Run Everywhere

This left us with two vendors:

- Intel (dominating)
- AMD

Page 11:

WLCG in RUN 2

Global transfer rates increased to > 40 GB/s (= 2 x Run 1)

Data acquisition >10PB / month

Regular transfers of 80 PB/month, with 100 PB/month during July-Aug (many billions of files)

More than 800 PB transferred across WLCG in 2016
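As a rough cross-check of these volumes, they can be converted into sustained average rates. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a 30-day month; neither is stated on the slide, and the function name is only illustrative.

```python
SECONDS_PER_DAY = 86_400

def average_rate_gb_per_s(petabytes, days):
    """Average transfer rate in GB/s for a volume moved over a period."""
    bytes_total = petabytes * 1e15  # assumption: decimal petabytes
    return bytes_total / (days * SECONDS_PER_DAY) / 1e9

# Volumes quoted on the slide; the period lengths are assumptions.
monthly_regular = average_rate_gb_per_s(80, 30)   # regular month
monthly_peak = average_rate_gb_per_s(100, 30)     # July-Aug
yearly = average_rate_gb_per_s(800, 366)          # 2016 was a leap year

print(f"80 PB/month  -> {monthly_regular:.1f} GB/s on average")
print(f"100 PB/month -> {monthly_peak:.1f} GB/s on average")
print(f"800 PB/year  -> {yearly:.1f} GB/s on average")
```

Under these assumptions the monthly figures correspond to averages of roughly 31 and 39 GB/s, the same order of magnitude as the global rate of > 40 GB/s quoted above.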

2016: 49.4 PB LHC data

58 PB all experiments

73 PB total

LHC performance is above expectations: all factors driving computing have increased above anticipated levels

RUN 2

Page 12:

How does __it__ work?

> No single batch job submission system: LSF, HTCondor, Slurm, ...

> No single storage solution: NFS, GlusterFS, Hadoop, custom solutions developed by the HEP community

> Has 100+ different CPUs from the last 10 years, most are 4-5 years old

> Common operating system: RHEL-based

> HEP SPEC '06 benchmark is used for accounting in WLCG and for procurement; it is based on the all_cpp benchmark set of SPEC CPU 2006

A working group was established to prepare a non-proprietary replacement for HEP SPEC '06

Page 13:

HIGH-LUMINOSITY LHC: BEST ESTIMATE

[Figure: Data estimates for 1st year of HL-LHC (PB), raw and derived, for ALICE, ATLAS, CMS, LHCb]

[Figure: CPU needs for 1st year of HL-LHC (kHS06), for ALICE, ATLAS, CMS, LHCb]

DATA:

- Raw: 50 PB (2016) -> 600 PB (2027)
- Derived (1 copy): 80 PB (2016) -> 900 PB (2027)

CPU: 60x from 2016

At least x10 above what is realistic to expect from technology with reasonably constant cost

Technology at ~20%/year will bring x6-10 in 10-11 years

We need to move from evolution to revolution in our computing model
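The compound-growth arithmetic behind these bullets can be sketched as follows. It takes the ~20%/year improvement at constant cost and the 60x CPU requirement from the slide and assumes the rate stays constant; the function names are illustrative only.

```python
import math

def growth_factor(rate, years):
    """Capacity multiplier after compounding a yearly improvement rate."""
    return (1 + rate) ** years

def years_to_reach(target, rate):
    """Years of compounding needed to reach a target multiplier."""
    return math.log(target) / math.log(1 + rate)

RATE = 0.20        # ~20%/year, from the slide
CPU_NEEDED = 60    # 60x from 2016, from the slide

gain_10 = growth_factor(RATE, 10)
gain_11 = growth_factor(RATE, 11)
shortfall_10 = CPU_NEEDED / gain_10
shortfall_11 = CPU_NEEDED / gain_11
years_for_60x = years_to_reach(CPU_NEEDED, RATE)

print(f"gain after 10 years: x{gain_10:.1f}, after 11 years: x{gain_11:.1f}")
print(f"shortfall versus x60: x{shortfall_11:.1f} to x{shortfall_10:.1f}")
print(f"years to reach x60 at this rate: {years_for_60x:.1f}")
```

With exactly 20%/year this gives about x6.2 after 10 years and x7.4 after 11, leaving a gap of roughly x8 to x10 against the required x60. At that rate technology alone would need more than two decades to close the gap.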

Page 14:

MOTIVATION FOR ARM in HEP?

> Explore new hardware and software platforms that, in the future, may be better suited to our bulk production workloads, i.e. simulation to start

- Performance benchmarks

- Power consumption

- Are results consistent? (i.e. validation)

> Improves overall code quality

> More efficient computing - less energy / computation

> Geopolitics plays a role - server farms might use different architectures in various regions (Russia, Asia, etc.)

> Business model of ARM is very flexible

- Competition, freedom, flexibility

> How will it affect our resource estimates for HL-LHC (in 10 years)?

Page 15:

SOFTWARE STACKS

open-source @ GitHub

Languages: C++14, Python, Fortran

OS (RHEL/CentOS/SL)

Toolchain: GCC, Binutils, GDB, elfutils, LLVM/Clang, ...

Standard: Python, zlib, glibc, OpenSSL, ...

HEP: ROOT, FFTW, Eigen, HepMC, SciPy, ...

CMSSW: CMS Software Bundle

CVMFS

                     CMSSW     Firefox
SLOCs                6M        7M
Initial Release      2005      2002
Contributors         >1300     >1200
Memory Footprint     ~2 GB     ~0.3 GB

Other CERN developed software would increase SLOCs

ROOT6 w/o Clang: 1.7M

GEANT4: 1.1M

The actual application software for "pattern recognition", "simulation", etc.

LCG externals

AtlasExternals

Gaudi

AthSimulation

ATLAS specific

not ATLAS specific

ATLAS CODEBASE (Athena)

Athena is ~6.5 million lines of code:

- ~2400 packages

- AthSimulation is a subset of Athena at ~350 packages

Full list of LCG (LHC Computing Grid) externals: http://lcgsoft.web.cern.ch/lcgsoft/

Page 16:

Finally successful execution achieved!

The first AArch64 based WLCG site (demonstrator)

CMS Dashboard Task Monitoring

On June 26, 2015, CMS successfully executed a CMSSW-based job on an AArch64 worker node via the standard job injection pipeline and received the output files back

Page 17:

VALIDATION

Simulation is a Monte Carlo process:

- NUMERICAL IDENTITY is not expected

- but different trends/histogram shapes are clear warning signs!

[Figure: ATLAS Simulation, hits vs. SCT_x with a ratio panel, comparing Intel Xeon, HP Moonshot, Intel Atom, and Aarch64_Proto]

Reconstructed hits in ATLAS SCT detectors

Reconstruction example from CMSSW (x86_64 vs aarch64): the difference between two architectures

The main question: how significant is this?
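One simple way to put a number on "how significant is this?" is a bin-by-bin comparison of two histograms. The sketch below is not the validation procedure used by ATLAS or CMS. It uses Gaussian toy samples with different random seeds as a stand-in for runs on two platforms, and every name and number in it is hypothetical.

```python
import random

def histogram(samples, lo, hi, nbins):
    """Count samples into nbins equal-width bins on [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in samples:
        if lo <= x < hi:
            index = min(nbins - 1, int((x - lo) / width))
            counts[index] += 1
    return counts

def chi2_per_bin(a, b):
    """Two-sample chi-square per non-empty bin for equally sized samples."""
    terms = [(x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0]
    return sum(terms) / len(terms)

def toy_sample(seed, n, mean=0.0, sigma=200.0):
    """Toy Monte Carlo sample; a different seed mimics a different platform."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]

reference = histogram(toy_sample(1, 20000), -600, 600, 24)
same_physics = histogram(toy_sample(2, 20000), -600, 600, 24)
shifted = histogram(toy_sample(3, 20000, mean=40.0), -600, 600, 24)

chi2_same = chi2_per_bin(reference, same_physics)
chi2_shifted = chi2_per_bin(reference, shifted)

print(f"same distribution, different seed: chi2/bin = {chi2_same:.2f}")
print(f"shifted distribution:              chi2/bin = {chi2_shifted:.2f}")
```

Two statistically compatible samples give a chi-square per bin close to 1 even though no bin content is identical, while a systematic shift in shape pushes the value far above 1. This is the sense in which numerical identity is not expected but a changed histogram shape is a warning sign.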

Page 18:

SAME, SINGLE Event: SPOT THE DIFFERENCES...

Intel Xeon, Aarch64_Proto

Page 19:

How many ARM cores (excl. smartphones)?

<2,000 physical cores for AArch64 in 2017, for porting, benchmarking, optimization, and feedback

Not used for production

Not everything will be powered directly by CentOS 7, but hopefully the majority will be; in some cases we use CentOS 7 via Linux containers

Page 20:

CERN openlab

What is CERN openlab?

"CERN openlab is a unique public-private partnership that accelerates the development of cutting-edge solutions for the worldwide LHC community and wider scientific research. Through CERN openlab, CERN collaborates with leading ICT companies and research institutes." (http://openlab.cern/)

.. -> PHASE V (2015-2017) -> PHASE VI (starts in 2018) -> ..

CERN openlab works in phases, and we are currently in the 5th phase (ends this year), with a focus on:

- data acquisition
- networks and connectivity
- data storage architectures
- compute provisioning and management
- computing platforms
- data analytics

Code Modernization

ARMv8 64-bit Porting, Optimization, and Benchmarking

Three broad areas of R&D for Phase VI:

> Data center technologies and infrastructures

- networks

- Cloud computing

- storage and databases

- data center architectures (disaggregation)

> computing platforms and software

- architectures

- software modernization/acceleration

> data analytics and machine learning

- physics

- engineering (control systems, infrastructure optimization)

- great interest from other communities

Page 21:

THE DIRECTION OF THE COMMUNITY

CERN openlab white paper on computing challenges (September 2017)

HEP SOFTWARE FOUNDATION, Community white paper (CWP) (summer 2017)

A Roadmap for HEP Software and Computing R&D for the 2020s

Multiple working groups: Computing Models, Facilities, and Distributed Computing; Detector Simulation; Software Trigger and Event Reconstruction; Visualization; Data Access and Management; Security and Access Control; Machine Learning; Conditions Database; Event Processing Frameworks; Physics Generators; Math Libraries; Software Development, Deployment and Validation/Verification; Data Analysis and Interpretation; Workflow and Resource Management; Data and Software Preservation; Careers, Staffing and Training; Data Acquisition Software; Various Aspects of Technical Evolution (Software Tools, Hardware, Networking); Monitoring

(http://hepsoftwarefoundation.org/activities/cwp.html)

Page 22:

Scavenging Cycles

Cloud

● Well accounted

● Spot price market

● Elasticity

Offload Peaks

HPC

● Allocation by grants

● Backfill mode

Simulation Bursts

Volunteer Computing

● Opportunistic cycles

● Outreach

Unmanaged Resources

Our applications and systems must adapt!

Page 23:

Cloud Computing in High-Energy Physics

Drivers and Obstacles

1 Cost (partially)

2 Control & trust

3 Specialized applications

4 Learn how to build better distributed systems

Themes

1 Hybrid academic-commercial clouds

2 Offload mainly simulation (up to 50%), i.e. no data lock-in

3 “Private” adoption of cloud technology

∙ OpenStack for virtualization

∙ Ceph/RADOS as a BLOB store

∙ “Data Mining as a Service”

Page 24:

Cycles in the Cloud

Source: Gutsche

Page 25:

Cycles in the Cloud

Source: Gutsche

Page 26:

Cloud Resources in a Global Batch System

[Diagram: an HEP site and several clouds attached to the experiment's global batch system]

- Experiment's File Catalog and Experiment's Task Queue
- HEP Site: Workers, Site Storage, Software Cache
- Each Cloud: VM Factory, Virtual Machines, Agent, Cloud Gateway, Book Keeper, WebAPI, Micro Software Cache
- Arrows: pull jobs, register files, data transfer, write output, monitors, starts & stops
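The "pull jobs" and "register files" arrows describe a pull model: workers ask a central task queue for work and afterwards record their output in a file catalog. The sketch below illustrates that pattern with in-process threads. It is not the code of any experiment's workload management system, and all names in it (`task_queue`, `file_catalog`, `vm-0`, ...) are hypothetical.

```python
import queue
import threading

def worker(name, task_queue, file_catalog, lock):
    """Pull jobs until the queue is empty and register each output file."""
    while True:
        try:
            job = task_queue.get_nowait()   # "pull jobs"
        except queue.Empty:
            return
        output = f"{job}.out"               # stand-in for "write output"
        with lock:
            file_catalog[output] = name     # "register files"
        task_queue.task_done()

# Hypothetical central services of the experiment.
task_queue = queue.Queue()
for i in range(6):
    task_queue.put(f"job-{i}")
file_catalog = {}
lock = threading.Lock()

# Workers may sit at a HEP site or inside cloud VMs; the queue does not care.
workers = [
    threading.Thread(target=worker, args=(f"vm-{n}", task_queue, file_catalog, lock))
    for n in range(3)
]
for thread in workers:
    thread.start()
for thread in workers:
    thread.join()

print(sorted(file_catalog))
```

Because workers only pull, adding or removing cloud virtual machines needs no change on the queue side, which is what lets elastic resources be started and stopped freely.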

Page 27:

OpenStack / KVM

CERN OpenStack

∙ 200 k cores (growing)

∙ 3 PB storage

∙ Spans 2 physical data centers

Our remote data center is here in Budapest!

Page 28:

A HEP Image for Clouds

Twofold system: 𝜇CernVM boot loader + OS delivered by custom network file system (CernVM-FS)

[Diagram of the 𝜇CernVM image layout]

- Boot loader (20 MB): kernel and initrd (CernVM-FS + 𝜇Contextualisation)
- OS + Extras, delivered via Fuse: EL 4, EL 5, EL 6, EL 7; experiment repositories (atlas, alice, ...)
- AUFS writable overlay on the scratch disk
- User data (EC2, OpenStack, CernVM Online, ...)

∼ 30 000 CernVMs booted per day
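The idea of a writable overlay on top of a read-only operating system tree can be illustrated with a lookup chain: reads fall through to the read-only layer, while writes land only in the overlay. The sketch below uses Python's `collections.ChainMap` purely as an analogy. It does not reflect how AUFS or CernVM-FS are implemented, and the paths and contents are made up.

```python
from collections import ChainMap

# Hypothetical read-only OS tree, as if delivered by the network file system.
read_only_os = {
    "/etc/os-release": "EL 7",
    "/usr/bin/python": "python binary",
}

# Writable layer, as if stored on the local scratch disk.
writable_overlay = {}

# Lookups search the overlay first, then the read-only layer.
root = ChainMap(writable_overlay, read_only_os)

root["/etc/hostname"] = "worker-01"                # new file goes to the overlay
root["/etc/os-release"] = "EL 7 (locally edited)"  # edit shadows the original

print(root["/usr/bin/python"])          # served from the read-only layer
print(root["/etc/os-release"])          # served from the overlay
print(read_only_os["/etc/os-release"])  # the read-only layer is untouched
```

Since the lower layer is never modified, many virtual machines can share the same operating system content and keep only their local changes on scratch space.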

Page 29:

Porting CernVM to ARM

[Diagram: boot loader (20 MB) with kernel and initrd (CernVM-FS + 𝜇Contextualisation), Fuse, AUFS]

∙ Re-compile kernel, CernVM-FS etc. for AArch64

∙ Partition table & MBR -> GPT and ESP (UEFI compliant)

Prototypes on X-Gene 1 and on new Cortex-A57 servers

Page 30:

Porting CernVM to ARM

Page 31:

Porting CernVM to ARM

Page 32:

LHC@Home

∙ Serious computing

∙ Workstations & Gaming PCs

∙ > 2 trillion collisions simulated for CERN theory group (biggest computing resource for this group!)

∙ ATLAS’ second biggest “simulation site”

Page 33:

The Future of Volunteer Computing?

Page 34:

The Future of Volunteer Computing?

Page 35:

Revisiting All Areas of Computing in HEP