the challenges of scientific computing

Download The Challenges of Scientific Computing

If you can't read please download the document

Upload: mira

Post on 25-Feb-2016

41 views

Category:

Documents


1 download

DESCRIPTION

The Challenges of Scientific Computing. Alberto Di Meglio CERN openlab CTO. DOI : 10.5281/zenodo.7116. Outline. What is CERN and How it Works Computing and Data Challenges in HEP New Requirements and Future Directions Scientific Collaborations. What is CERN and How Does it Work?. - PowerPoint PPT Presentation

TRANSCRIPT

Slide 1

The Challenges of Scientific ComputingAlberto Di MeglioCERN openlab CTO

DOI: 10.5281/zenodo.71161OutlineWhat is CERN and How it WorksComputing and Data Challenges in HEPNew Requirements and Future DirectionsScientific CollaborationsHuawei IT Leaders Forum - 18 September 2013, Amsterdam2

3Huawei IT Leaders Forum - 18 September 2013, AmsterdamWhat is CERN and How Does it Work?3What is CERN?Huawei IT Leaders Forum - 18 September 2013, Amsterdam4

European Organization for Nuclear Research Founded in 195420 Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom Candidate for Accession: RomaniaAssociate Members in the Pre-Stage to Membership: Israel, Serbia, CyprusApplicant States: Slovenia, TurkeyObservers to Council: India, Japan, the Russian Federation, the United States of America, Turkey, the European Commission and UNESCO ~ 2300 staff ~ 1050 other paid personnel ~ 11000 users Budget (2012) ~1100 MCHF4What is the Universe made of?Huawei IT Leaders Forum - 18 September 2013, Amsterdam5

What gives the particles their masses?

How can gravity be integrated into a unified theory?

Why is there only matter and no anti-matter in the universe?

Are there more space-time dimensions than the 4 we know of?

What is dark energy and dark matter which makes up 95% of the universe ?5

The Large Hadron Collider (LHC)Huawei IT Leaders Forum - 18 September 2013, Amsterdam6

LHC FactsHuawei IT Leaders Forum - 18 September 2013, Amsterdam7Biggest accelerator (largest machine) in the world

Fastest racetrack on EarthProtons circulate at 99.9999991% the speed of lightEmptiest place in the solar systemPressure 10-13 atm (10x less than on the moon)Worlds largest refrigerator -271.3 C (1.9K)Hottest spot in the galaxytemperatures 100 000x hotter than the heart of the sun5.5 Trillion KWorlds biggest and most sophisticated detectorsMost data of any scientific experiment20-30 PB per year (as of today we have about 75 PB)Collisions in the LHCHuawei IT Leaders Forum - 18 September 2013, Amsterdam8

Computing and Data Challenges in HEPHuawei IT Leaders Forum - 18 September 2013, Amsterdam9

The LHC ChallengesHuawei IT Leaders Forum - 18 September 2013, Amsterdam10Signal/Noise: 10-13 (10-9 offline)Data volumeHigh rate * large number of channels * 4 experiments ~30 PB of new data each yearCompute powerEvent complexity * Nb. events * thousands users 300 k CPUs 170 PB of disk storageWorldwide analysis & fundingComputing funding locally in major regions & countriesEfficient analysis everywhere~1.5M jobs/day, 150k CPU-years/year GRID technology

Data Handling and ComputationHuawei IT Leaders Forum - 18 September 2013, Amsterdam11Online Triggers and Filters

Offline ReconstructionOffiine SimulationOffline AnalysisSelection & reconstructionEvent simulation

Event reprocessing

100%(raw data, 6 GB/s)10%(event summary)Batch Physics AnalysisInteractive Analysis1%

Processed data (active tapes)What is this data?Huawei IT Leaders Forum - 18 September 2013, Amsterdam12

Raw dataWas a detector element hit?How much energy?What time?Simulated dataSimulate particle collisions (hard interaction) according to a candidate theorySimulate interaction of primary and secondary particles with detector materialOutcome: detailed response of the detector to a known eventReconstructed data (derived from both of the above)Momentum of tracks (4-vectors)OriginEnergy in clusters (jets)Particle typeCalibration informationHighest data complexity is hereAnalysis data (derived from RECO data)User or group specific data abstractions or selectionsScience happens here

Data StorageHuawei IT Leaders Forum - 18 September 2013, Amsterdam13

CASTOR and EOS are using the same commodity disk serversWith RAID-1 for CASTOR (2 copies in the mirror)JBOD with RAIN for EOSReplicas spread over different disk serversTunable redundancyCASTOR and EOSStorage Systems developed at CERNThe GridHuawei IT Leaders Forum - 18 September 2013, Amsterdam14

Tier-0 (CERN):Data recordingInitial data reconstructionData distribution

Tier-1 (11 centres):Permanent storageRe-processingAnalysisTier-2 (~130 centres): Simulation End-user analysis

15New Requirements and Future Directions(Big Data and Data Analytics)Huawei IT Leaders Forum - 18 September 2013, AmsterdamLHC ScheduleHuawei IT Leaders Forum - 18 September 2013, Amsterdam16First runLS1Second runLS2Third runLS3HL-LHC20092013201420152016201720182011201020112019202320242030?202120202022LHC startup900 GeV7 TeVL=6x1033 cm-2s-2Bunch spacing = 50 nsPhase-0 Upgrade(design energy,nominal luminosity)14 TeVL=1x1034 cm-2s-2Bunch spacing = 25 nsPhase-1 Upgrade(design energy,design luminosity)14 TeVL=2x1034 cm-2s-2Bunch spacing = 25 nsPhase-2 Upgrade(High Luminosity)14 TeVL=1x1035 cm-2s-2Spacing = 12.5 nsOnline DAQ and TriggersHuawei IT Leaders Forum - 18 September 2013, Amsterdam17Most events produced in the detectors describe known physicsOnly 1 in 1013 events are considered interesting now, but discarded raw data is lost foreverFiltering is done using two-level triggers, the first in HW (L1), the second in SW (HLT)L1 triggers use complex custom processors, difficult to program, maintain and reconfigureLS1, LS2 and LS3 require higher and higher data ratesCan the L1 run at the same rate as LHC?

Implement L1 in softwareReplace HW triggers with commodity processors/co-processors (GPGPUs/XeonPhis?) easier to program and upgrade/maintainLong-term, merge L1 and HLT, need lots of fast, low-power, low-cost linksDAQ evolutionmulti Tbit transfersClose integration of network and CPUsRackscale architecture?Multi-Core PlatformsHuawei IT Leaders Forum - 18 September 2013, Amsterdam18Very high-level of parallelism, both online and offlineParallelism exploited sending multiple independent computational jobs on separate physical or virtual nodesMultiple physical cores partitioned across virtual machinesCurrent software not written to exploit multi-coresVery important issue with detector simulations, very CPU intensive tasks

Exploit multi-core platforms using vectorization techniquesBuild experience with data and task parallelism using Cilk and Haswell (CilkPlus and GCC)Requires rewriting the software Geant 5

Cloud InfrastructureHuawei IT Leaders Forum - 18 September 2013, Amsterdam19

New CERN Computer CentreTwo 100GB lines, one academic, one commercialAdditional capacity, redundancyRemote management, dynamic provisioningNo increase in staff, decrease in budgetCloud StorageHuawei IT Leaders Forum - 18 September 2013, Amsterdam20Data storage is criticalCurrently 75/100 PB tapes, 25/40 PB disk, getting close to max capacityNeed to be able to scale both predictably (planning) and dynamically (peak times)Can industrial cloud storage deliver the required quality of service maximizing Reliability, Performance and Cost at the same time?

Slow(latency)ExpensiveUnreliableTapesDisksFlash, Solid State DisksPremium filers, SANs, Partership CERN/IT & Huawei (S3 Storage Appliance, 0.8 PB, 1 year)Shared areas of investigationsReliability (100% despite disk failures) Functional tests on Amazon S3 protocol and interoperability with Grid production environments (tested ok)Performance measurements with sparse data read (comparable to existing LHC production systems)Total Cost of Ownership (TCO): clear model, no hidden costs found

References used for comparisonOpen Source self assembled solutions (based on openstack / swift)Traditional Intel-based file server based on Linux operating system and locally attached disks

Data AnalyticsHuawei IT Leaders Forum - 18 September 2013, Amsterdam21Massive amounts of data fromThe LHC data processingBut alsoThe LHC subsystemsMagnetsCryogenicsControl systemsElectrical systemsLogging data and alerts

How can the data be analyzed efficiently to find patterns or detect problems at early stages?Investigations ofPhysics data analysis in databases exploiting columnar DBs (Oracle Exadata)Hadoop and MapReduce techniquesPhysics event indexing, filtering and searchingMachine Learning techniques

22Scientific CollaborationsHuawei IT Leaders Forum - 18 September 2013, Amsterdam22International Scientific CollaborationsMany scientific projects are global collaborations of 100s of partnersEfficient computing and data infrastructures have become critical as the quantity, variety and rates of data generation keep increasingFunding does not scale in the same wayOptimization and sharing of resourcesCollaboration with commercial IT companies increasingly importantRequirements are not unique anymore

Huawei IT Leaders Forum - 18 September 2013, Amsterdam23

CERN Biomedical Facility

Scientific Computing as a ServiceHuawei IT Leaders Forum - 18 September 2013, Amsterdam24SoftwareRepos and RegistriesDiscoveryRegistrationAuthorshipCitationsReproducibilityPreservationResearchersApplicationsDevelopersPublicationsPatentsDatasetsScientific results

Click n citeon-demandSCaaS

An on-demand computing and data analysis serviceUnique IDs and formal relationships among digital objects (publications, datasets, software, people identities)Reproducibility and preservationResults attribution and recognition24

CERN openlab in a nutshellHuawei IT Leaders Forum - 18 September 2013, Amsterdam25A science industry partnership to drive R&D and innovation with over a decade of successEvaluate state-of-the-art technologies in a challenging environment and improve themTest in a research environment today what will be used in many business sectors tomorrowTrain next generation of engineers/employeesDisseminate results and outreach to new audiences25CERN openlab recipe: key ingredients The extreme demands from CERNs scientific programmeThe alignment of goals between partnersTrustYoung researchers with their talents, expertise and energyEfficient and lightweight structureRegular checkpoints/reviewsOutreach/communications/training26Huawei IT Leaders Forum - 18 September 2013, Amsterdam26ICE-DIP: Intel CERN European Doctorate Industrial ProgramThe project starts today: CERN will recruit the 5 PhD students on 3 year fellowship contracts in autumn 2013Each PhD student will be seconded to Intel for 18 monthsWill work with LHC experiments on future upgrade research themes:usage of many-core processors for data acquisitionfuture optical interconnect technologiesreconfigurable logicdata acquisition networks Associate partners: Nat. Univ. Ireland Maynooth & Dublin City Univ. (recruits will be enrolled in PhD programmes), Xena Networks (SME, Denmark)EC funding: ~ 1.25 million over 4 years

Huawei IT Leaders Forum - 18 September 2013, Amsterdam2727ConclusionsCERN and the LHC program have been among the first to address big data challengesSolutions have been developed and important results obtainedHowever, not unique anymore in both scientific and consumer applicationsNeed to exploit emerging technologies and share expertise with academia and commercial partnersLHC schedule will keep it at the bleeding edge of technology, providing excellent opportunities to companies to test ideas and technologies ahead of the market

Huawei IT Leaders Forum - 18 September 2013, Amsterdam28Huawei IT Leaders Forum - 18 September 2013, Amsterdam29

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. It includes photos, models and videos courtesy of CERN and uses contents provided by CERN and CERN openlab staff29