TRANSCRIPT
Big Science and Big DataDirk Duellmann, CERN
Apache Big Data Europe 28 Sep 2015, Budapest, Hungary
CERN IT Department CH-1211 Genève 23
Switzerland www.cern.ch/it
The ATLAS experiment
7,000 tons, 150 million sensors generating data 40 million times per second
i.e. a petabyte/s
Data Collection and Archiving at CERN
Data flow to permanent storage: 4-6 GB/sec
• ALICE: 4 GB/sec
• ATLAS: 1-2 GB/sec
• CMS: 1-2 GB/sec
• LHCb: 200-400 MB/sec
The Worldwide LHC Computing Grid
An international collaboration to distribute and analyse LHC data. It integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible to all LHC physicists.
• Tier-0 (CERN): data recording, reconstruction and distribution
• Tier-1: permanent storage, re-processing, analysis
• Tier-2: simulation, end-user analysis
> 2 million jobs/day, ~350,000 cores, 500 PB of storage
Nearly 170 sites in 40 countries, 10-100 Gb/s links
LHC – Big Data…
A few PB of raw data becomes ~100 PB:
• Duplicate raw data
• Simulated data
• Derived data products
• Versions as software improves
• Replicas to allow access by more physicists
How do we store/retrieve LHC data? A short history…
• 1st try: all data in a commercial object database (1995)
  – good match for the complex data model and OO language integration
  – but the market predicted by many analysts did not materialise!
• 2nd try: all data in a relational DB with object-relational mapping (1999)
  – PB-scale deployment was far from being proven
  – users code in C++, and rejected data model definition in SQL
• Hybrid between RDBMS and structured files (from 2001 to today)
  – relational DBs for transactional management of metadata (only TB-scale)
    • file/dataset metadata, conditions, calibration, provenance, workflow
    • via DB abstraction (plugins: Oracle, MySQL, SQLite, Frontier/SQUID)
  – open source persistency framework (ROOT)
    • uses C++ "introspection" to store/retrieve networks of C++ objects
    • column store for efficient sparse reading
Processing a TTree
[Diagram: a TTree holds branches, each split into leaves; a TSelector drives the event loop, reading only the needed parts of each event.]
• Begin(): create histograms, define the output list
• Process(): loop over events 1…n, apply the preselection, read only the needed branches
• Terminate(): finalize the analysis (fitting, …)
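The TSelector event loop above can be sketched in plain Python. This is a toy column store standing in for ROOT's TTree, not the ROOT API: the class and branch names are illustrative, but the pattern (Begin/Process/Terminate, and reading only the branches the analysis needs) matches the slide.

```python
# Toy column store: each "branch" is a list of per-event values.
# This mimics how a TTree lets an analysis read only the branches
# it needs, instead of deserialising whole events.
tree = {
    "pt":     [12.0, 45.5, 3.2, 60.1],   # one value per event
    "eta":    [0.1, -1.2, 2.4, 0.3],
    "charge": [1, -1, 1, -1],            # never touched below
}

class ToySelector:
    """Begin/Process/Terminate event loop, in the style of ROOT's TSelector."""
    def __init__(self, branches):
        self.branches = branches          # only these columns are read
        self.selected = []                # the "output list"

    def begin(self):
        self.selected.clear()             # create output containers

    def process(self, tree, i):
        # Sparse read: touch only the requested branches for event i.
        event = {b: tree[b][i] for b in self.branches}
        if event["pt"] > 10.0:            # preselection
            self.selected.append(event)

    def terminate(self):
        # Finalize the analysis (here just a count; real code fits histograms).
        return len(self.selected)

sel = ToySelector(["pt", "eta"])
sel.begin()
for i in range(len(tree["pt"])):          # loop over events
    sel.process(tree, i)
n = sel.terminate()
```

The point of the column layout is visible in `process()`: the `charge` branch is never deserialised because the selector did not request it.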
CERN Disk Storage Overview
              AFS     CASTOR        EOS     Ceph        NFS     CERNBox
Raw Capacity  3 PB    20 PB         140 PB  4 PB        200 TB  1.1 PB
Data Stored   390 TB  86 PB (tape)  27 PB   170 TB      36 TB   35 TB
Files Stored  2.7 B   300 M         284 M   77 M (obj)  120 M   14 M
AFS is CERN's Linux home directory service
CASTOR & EOS are mainly used for the physics use case (data analysis and DAQ)
Ceph is our storage backend for images and volumes in OpenStack
NFS is mainly used by engineering applications
CERNBox is our file synchronisation service based on OwnCloud+EOS
CHEP 2015, Okinawa
Tape at CERN
[Chart: archive write 27 PB; archive read 15 PB and 23 PB]
• Data volume: 100 PB physics archive, 7 PB backup (TSM)
• Tape libraries: 3+2 x IBM TS3500, 4 x Oracle SL8500
• Tape drives: 100 physics archive, 50 backup
• Capacity: 70k slots, 30k tapes
A look into the Future
• LHC upgrades will further increase luminosity
• Computing resource needs will be higher
• Data generated will increase drastically
• Next accelerators
  – Future Circular Collider (80-100 km)
Archive: Large scale media migration
[Timeline: LHC Run 1 data repacked onto new media, with the LHC Run 2 start as the deadline]
• Part 1: Oracle T10000D
• Part 2: IBM TS1150
Smart vs Simple Archive: HSM Issues
• CASTOR had been designed as a Hierarchical Storage Management (HSM) system
  • disk-only and multi-pool support were added later, painfully
  • required rates for namespace access and file-open exceeded earlier estimates
• Around the LHC start, conceptual issues with the HSM model also became visible
  • "a file" is not a meaningful granule for managing data exchange: experiments use datasets
  • dataset parts needed to be "pinned" on disk by users to avoid cache thrashing
  • users had to "trick" the HSM into doing the right thing :-(
EOS Project: Goals & Choices
• Server, media, and file system failures need to be transparently absorbed
  – key functionality: file-level replication and rebalancing
  – data stays available after a failure, with no human intervention
• Fine-grained redundancy within one hardware setup
  – choose & change the redundancy level for specific data
    • either file replica count or erasure encoding
• Support bulk deployment operations
  – e.g. replace hundreds of servers at end of warranty
• In-memory namespace (sparse hash per directory)
  – file stat calls 1-2 orders of magnitude faster
  – write-ahead logging for durability
• Later in addition: transparent multi-site clustering
  – e.g. between Geneva and Budapest
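The redundancy choice above (file replica count vs erasure encoding) is fundamentally a raw-capacity trade-off. A back-of-the-envelope sketch, with illustrative parameters that are not EOS defaults:

```python
def raw_bytes_needed(user_bytes, scheme):
    """Raw capacity needed to store user_bytes under a redundancy scheme.

    scheme is either ("replica", n)     -> n full copies, or
                     ("erasure", k, m)  -> k data + m parity stripes.
    """
    kind = scheme[0]
    if kind == "replica":
        n = scheme[1]
        return user_bytes * n              # n-fold overhead
    if kind == "erasure":
        k, m = scheme[1], scheme[2]
        return user_bytes * (k + m) / k    # (k+m)/k overhead
    raise ValueError(kind)

one_pb = 10**15
# Two full replicas: 2x raw capacity, survives the loss of one copy.
rep = raw_bytes_needed(one_pb, ("replica", 2))
# 10+2 erasure coding: 1.2x raw capacity, survives two lost stripes.
ec = raw_bytes_needed(one_pb, ("erasure", 10, 2))
```

This is why a system that can "choose & change redundancy level for specific data" matters: hot analysis data can afford replica overhead for read throughput, while colder data is cheaper under erasure coding.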
EOS Raw Capacity Evolution
Why do we develop our own open source storage software?
• A large science community is trained to be effective with a set of products
  • the efficiency of this community is our main asset, not just the raw utilisation of CPUs and disks
  • integration and specific support do matter
  • community sharing via tools and formats even more so
• Long-term projects
  • a change of "vendor/technology" is not only likely but expected
  • we carry old but valuable data through time (bit preservation)
  • "loss of data ownership" after the first active project period
Does Kryder’s law still hold?
[Chart: HDD areal density CAGR. Source: "HDD Opportunities & Challenges, Now to 2020", Dave Anderson, Seagate]
Object Disk
• Each disk talks an object storage protocol over TCP
  – replication/failover with other disks in a networked disk cluster
  – open access library for app development
• Why now?
  – shingled media come with constrained (object) semantics: e.g. no updates
• Early stage, with several open questions
  – port price for the disk network vs price gain from reduced server/power cost?
  – standardisation of protocol/semantics to allow app development at low risk of vendor lock-in?
Can we optimise our systems further?
• Infrastructure analytics
  • apply statistical analysis to the complete system: storage, CPU, network, user applications
  • measure/predict the quantitative impact of changes on the real job population
• Easy!
  • looks like physics analysis, with infrastructure metrics instead of physics data
  • … really?
Non-trivial…
• Technically
  • needs consolidated service- and application-side metrics
  • usually: log data written for human consumption, without data design
• Conceptually
  • some established metrics turn out to be less useful for analysing today's hardware than expected
    • CPU efficiency = t_cpu / t_wall? storage efficiency = GB/s?
  • correlation does not imply causal relation
• Sociologically
  • better to observe the "rule of local discovery"
  • the people who quantitatively understand the infrastructure are busy running services. Always…
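The metric question above can be made concrete. The classic batch metric t_cpu / t_wall flagged I/O stalls well for single-threaded jobs, but once jobs use several cores the same ratio stops being interpretable on its own. A toy illustration (numbers invented for the example):

```python
def cpu_efficiency(t_cpu, t_wall):
    """Classic batch metric: CPU seconds consumed per wall-clock second."""
    return t_cpu / t_wall

# Single-threaded job stalled on storage: a low ratio cleanly flags I/O wait.
io_bound = cpu_efficiency(t_cpu=400.0, t_wall=1000.0)    # 0.4

# A 4-thread job with the same per-core stall pattern: the ratio exceeds 1,
# and without knowing the thread count it no longer distinguishes
# "fully busy" from "mostly waiting".
multi = cpu_efficiency(t_cpu=1600.0, t_wall=1000.0)      # 1.6
```

The same caveat applies to GB/s as a storage metric: aggregate throughput says nothing about whether the job population is latency-bound or bandwidth-bound.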
Data Collection and Analysis Repository
[Diagram: sources (eos, ai, lsf) produce monitoring JSON files; a periodic extract & cleaning step loads them into a Hadoop cluster (HDFS, MR nodes); users extract small, binary subsets. Per-set record schema (e.g. Set: EOS): readbytes: number, filename: string, opentime: time]
Ramping up: ~100 nodes, ~100 TB of raw logs
In production: Flume, HDFS, MR, Pig, Spark, Sqoop, {Impala}
Current work items:
• Service: availability (e.g. isolation and rolling upgrades)
• Analytics: workbook support for popular analysis tools: R/Python/ROOT
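The "periodic extract & cleaning" step above can be sketched in plain Python. The field names (`readbytes`, `filename`, `opentime`) follow the record schema on the slide; the data, function names, and cleaning policy are illustrative, not the production Flume/Hadoop pipeline:

```python
import json
from collections import defaultdict

# Monitoring arrives as JSON lines written for human consumption;
# cleaning enforces a typed schema before analysis.
raw_lines = [
    '{"readbytes": 1048576, "filename": "/eos/run1/a.root", "opentime": 1}',
    '{"readbytes": "bad",   "filename": "/eos/run1/a.root", "opentime": 2}',
    '{"readbytes": 2097152, "filename": "/eos/run1/b.root", "opentime": 3}',
]

def clean(lines):
    """Yield records matching the EOS set schema; drop malformed ones."""
    for line in lines:
        rec = json.loads(line)
        try:
            rec["readbytes"] = int(rec["readbytes"])   # enforce the number type
        except (ValueError, TypeError):
            continue                                   # drop the bad record
        yield rec

# Toy aggregation in map-reduce style: total bytes read per file,
# the kind of popularity metric a small binary extract would feed.
bytes_per_file = defaultdict(int)
for rec in clean(raw_lines):
    bytes_per_file[rec["filename"]] += rec["readbytes"]
```

At scale the same shape runs as an MR or Spark job over HDFS; the essential step is the schema enforcement, since log data written without data design cannot be aggregated reliably otherwise.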
Summary
• CERN has a long tradition in deploying large scale storage systems used by a distributed science community world-wide
• During the first LHC run period we have passed the 100 PB mark at CERN and more importantly have contributed to the rapid confirmation of the Higgs boson and many other LHC results
• For LHC Run 2 we have significantly upgraded & optimised the infrastructure in close collaboration between service providers and users
• Adding more quantitative infrastructure analytics to prepare for High-Luminosity-LHC
• CERN is already very active as a user and provider in the open source world, and the overlap with other Big Data communities is increasing.
Thank you!