TRANSCRIPT
Science and Computing at SLAC
Donald R. Lemma, B.Sc., MPA, Ph.D.CIO and Computing Division DirectorSLAC National Accelerator Laboratory
4th Annual XLDB ConferenceOctober 6, 2010
Aligning Computing to Support the SLAC Scientific Mission
“To explore the ultimate structure and dynamics of matter and the properties of energy, space and time—at the smallest and largest scales, in the fastest processes
and at the highest energies”
About SLAC…•One of 17 national laboratories funded by the US Department of Energy and operated by Stanford University for 48 years
•Science-concentric mission: No classified research or weapons work and all research is published
•Nearly 500 acres of land and 3 miles of tunnels
•160 Megawatts of power
•1,600 staff and an equal number of visiting scientists and researchers
•Research at SLAC has led to 6 Nobel Prizes (in both chemistry and physics)
•Discoveries include the Quark, Tau Lepton, and the first direct evidence of dark matter
Scientific Disciplines at SLAC• Particle Physics and Astrophysics
• Linac Coherent Light Source (LCLS)
• Photon Science
• Stanford Synchrotron Radiation Light Source (SSRL)
• Plus other projects, such as:
– Space- and ground-based data acquisition systems
– High-throughput structural biology
Some (Current) Fundamental Scientific Questions
• What are the ultimate Laws of Nature?
– Are there new forces, beyond what we see today? Do the forces unify? At what scale? Why is gravity so different from the other forces?
– What lies beyond the quarks and leptons? What completes the Standard Model?
• What is the structure of space and time?
– Why are there four spacetime dimensions? Are there more? What are their shapes and sizes? What is the quantum theory of gravity?
– What is the origin of mass?
• How did the Universe come to be?
– What is the dark matter and dark energy? What happened to antimatter? What powered the big bang?
– What is the fate of the Universe?
Some (Current) Computing Challenges to Help Scientists Address These Questions
1. How do you capture data from processes lasting only quadrillionths of a second?
2. What hardware/software can most effectively process petascale data?
3. How do you store and access trillions of files in a single file-system?
4. How do you process this volume of data and metadata?
5. How can you get all of this hardware to fit into a conventional datacenter given the limits on power, space, and cooling?
6. How do you deal with latency between the database, disk, and CPU at extreme scales?
But before we look forward, let’s take a quick look back…
Necessity is the Mother of Invention… SLAC’s Rich Computing Heritage: Designing Systems to Meet Scientific Needs
• First web server in North America (set up by Paul Kunz at SLAC after meeting Tim Berners-Lee at CERN)
• First Internet Application in the World (SPIRES)
• Instant Messaging
• Landmark Open Internet Database Ruling (Netscape Communications Corp. v. Konrad, No. C 00-20789 JW (N.D. Cal. April 2, 2001))
• Apple Computer traces some of its roots here (SLAC was a meeting place of the Homebrew Computer Club)
SLAC’s Rich Computing Heritage• First Web Browser in the World (Midas)
• First Web Search Engine in the World
• Largest Scientific Database in the World
• Close ties with the innovative computer engineering coming out of Stanford. So, just for fun…
Google’s First Storage Array
Let’s Start at the Very Beginning…
[Figure: time/scale axis spanning powers of ten from 1×10^0 upward]
Scale: How We Do It… Large Area Telescope (Global and Local Collaboration)
•Disparate teams focused on working together to achieve a shared scientific objective
•Integrate the computing department from inception to design to operation…there is no differentiation between IT and engineering professionals working on the project (overcoming pride and prejudice)
Scaling and Planning•10 years of operations foreseen (build the system for the organization we want to “become”, not the organization that “we are”)
•Hundreds of millions of datasets and processes
•Many hundreds of terabytes of data
•Computers are integral, not G&A overhead…humans cannot analyze the volumes of raw data generated by the Instrument
•Different users want to see different data “Slices”
•Time is Critical…Parallelise processing
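The idea of per-user data “slices” can be sketched with a hypothetical event table: every analysis group selects a different view of the same underlying dataset. A minimal illustration in Python (all field names here are illustrative, not SLAC’s actual schema):

```python
# Hypothetical event records (field names are illustrative only).
events = [
    {"energy": 12.5, "triggered": True},
    {"energy": 98.1, "triggered": False},
    {"energy": 45.0, "triggered": True},
    {"energy": 7.3,  "triggered": False},
]

# Different users want different "slices" of the same dataset.
high_energy = [e for e in events if e["energy"] > 40.0]    # group A's view
triggered_only = [e for e in events if e["triggered"]]     # group B's view

print(len(high_energy), len(triggered_only))  # → 2 2
```

At real scale the filtering happens server-side, of course; the point is that the raw data is stored once and sliced many ways.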
Reliability•Tens of thousands of batch jobs per day (43k in a day is our record, approx. 40k CPU-hours)…it’s a long walk to the data capture device
Scale: Finding a drop of water in an ocean. The technique is not unlike what is done with private-sector financial consolidation systems: transaction → ERP → consolidated reporting
[Diagram: Large Area Telescope data pipeline: downlink, decompress, then raw → digi → recon processing stages (roughly 1 hr and 1.5 hr), producing FITS and Root files and ~6 GB/day of trending data into Oracle]
A completely new type of file system needed to be developed to store billions of files with fault tolerance. SLAC developed “xrootd” and used it alongside conventional database tools and technologies.
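xrootd’s internals are beyond the scope of these slides, but the core problem it addresses, keeping billions of files addressable without overloading any single directory, can be illustrated with a simple hash-sharding scheme (this is an illustration of the general technique, not xrootd’s actual layout):

```python
import hashlib

def shard_path(logical_name: str, levels: int = 2, width: int = 2) -> str:
    """Map a flat logical file name onto a balanced directory tree.

    With 2 levels of 2 hex characters each, files spread over
    256 * 256 = 65,536 directories, so even billions of files
    leave only tens of thousands of entries per directory.
    """
    digest = hashlib.sha256(logical_name.encode()).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts + [logical_name])

print(shard_path("run0042_event981.root"))
```

The mapping is deterministic, so any client can locate a file from its logical name alone, with no central index lookup.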
Scale: How We Do It… Large Synoptic Survey Telescope (LSST)
• 100+ petabytes
• 3,000 megapixel (3 billion pixel) detector
• The device will generate more data in its first 15 minutes of operation than the Hubble has generated since it was launched
• Same management principles mentioned on the previous slide (collaboration, science/IT partnerships, early IT involvement, scaling and planning)
• SLAC is participating in the construction of an entirely new Data Access System, including standards, a database and database language
Scale: A New Tool for the World That is Coming Out of This Project
• Leading-edge database development work
• XLDB
– Petascale databases
– Workshop and (first) open conference (October 5-7 at SLAC)
• SciDB
– Open-source DMAS for scientific research
– Driven by the needs of data-intensive users, with an array data model
– Applications in optical and radio astronomy, geoscience, biology, web, drug discovery, Wall Street, oil and gas
– Designed for complex analyses on large data sets: time series, spatial correlations, matrix operations
– Data provenance
SLAC helped jump-start SciDB, including co-founding as well as chairing the science advisory board
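What makes the array data model different from relational tables is that position carries meaning: a cell’s neighbors are implicit, so windowed and matrix operations fall out naturally. A toy illustration using NumPy (SciDB’s own query interfaces are not shown here):

```python
import numpy as np

# A 2-D "array model" dataset, e.g. a small image or sensor grid.
grid = np.arange(16, dtype=float).reshape(4, 4)

# Spatial operation: average each cell with its right-hand neighbor,
# expressed purely by position rather than by an explicit join.
window_avg = (grid[:, :-1] + grid[:, 1:]) / 2.0

# Matrix operation: multiply the grid by its transpose.
product = grid @ grid.T

print(window_avg.shape, product.shape)  # → (4, 3) (4, 4)
```

In a relational system the neighbor relationship would need to be encoded in keys and joined on; in an array model it is free.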
Scale: This Time, Going Down
[Figure: scale axis of negative powers of ten, from 1×10^-1 down to 1×10^-14]
Atomic-Scale and Time…•SLAC’s X-ray laser captures data on 100-femtosecond timescales (a femtosecond is a quadrillionth of a second)
•There are more femtoseconds in a minute than there are minutes since the beginning of the universe (measured from the Big Bang)
•In 1 second, light travels to the moon and back to Earth…in 100 femtoseconds, light traverses the width of a human hair
•Computing needed innovative tools to capture and assemble this data at a multi-gigabyte-per-second rate
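The femtosecond comparison above checks out numerically; a quick back-of-the-envelope calculation in Python, taking roughly 13.7 billion years as the age of the universe:

```python
FEMTOSECONDS_PER_SECOND = 10**15

# Femtoseconds in one minute:
fs_per_minute = 60 * FEMTOSECONDS_PER_SECOND        # 6.0e16

# Minutes since the Big Bang (~13.7 billion years):
minutes_since_big_bang = 13.7e9 * 365.25 * 24 * 60  # ~7.2e15

print(fs_per_minute > minutes_since_big_bang)  # → True
```

Six times ten to the sixteenth femtoseconds per minute against about seven times ten to the fifteenth minutes of cosmic history: the claim holds with nearly an order of magnitude to spare.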
[Figure: comparing 1/200 second with 1/1,000,000,000,000,000 second]
Fundamental Mysteries
Realm of particle physics until now
Energy: Particle Physics
• Power
• Detection
– Trillions of events
• Data
– Parsing and analysis
– Grid computing
– Refinement and data mining
BaBar Detector / CERN ATLAS Detector
Grid Computing
Parallelism in High Energy Physics (HEP)
HEP data analysis deals with real or simulated “events”, a.k.a. collisions.
Events do not depend on each other. They can be processed in any order: forwards, backwards, in parallel, etc.
Historically ideal for “trivial parallelism”.
e.g. Trivial parallelism at the batch job level:
1,000,000,000 events (100 TB)
→ 1000 × 100 GB batch jobs
Batch workers:
• 1 job per core
• 2 GB per core
• ~1 day per job
• 0.1 (to 1000) seconds per event
→ Concatenate output
Batch system → New derived dataset(s)
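Because events are independent, the batch scheme above maps onto any embarrassingly parallel framework. A minimal sketch using Python’s standard library, with event reconstruction reduced to a placeholder function:

```python
from concurrent.futures import ThreadPoolExecutor

def process_event(event_id: int) -> int:
    # Stand-in for real reconstruction; each event is independent,
    # so events can run in any order, on any worker.
    return event_id * event_id

events = range(1000)  # a tiny stand-in for 10^9 events

with ThreadPoolExecutor(max_workers=4) as pool:  # "1 job per core"
    results = list(pool.map(process_event, events))

# "Concatenate output": map returns results in event order,
# regardless of which worker handled each event.
print(len(results), results[3])  # → 1000 9
```

A real batch system distributes the jobs across thousands of cores and machines, but the ordering guarantee when concatenating output is the same idea.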
Some Computing Projects to Meet These Needs
- Petacache flash memory
- Should be announcing new prototype 100 kW ultra-high-density racks with 20 TFLOPS of computing capacity per rack, saving space and lowering cooling demands
- GPU Computing
GPU vs CPU Computing
[Figure: 10 racks of GPU-based computers = 1 petaflop]
Credit: Professor Todd Martinez, Stanford University
[Diagram: GPU device architecture: N multiprocessors, each containing M processors with their own registers, an instruction unit, and shared memory; constant and texture caches; and device memory holding the global, constant, and texture memories]
• Strict memory hierarchy
• Global and constant memory reside on the device -> slow access (constant memory is cached on chip)
• Strict rules about who (thread/block/device/CPU) can access what memory and how (read/write)
• Access speed varies from 1 clock cycle to 500 clock cycles
• Need 10,000+ threads per GPU
• Single precision 10x faster than double
• Algorithms need to be redesigned, not just recompiled!
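“Redesigned, not just recompiled” usually means restructuring serial dependencies into parallel-friendly ones. For example, a running sum is inherently sequential, while a tree reduction exposes log-depth parallelism, which is the shape a GPU kernel wants. A Python sketch of the access pattern (actual GPU code would be written in CUDA):

```python
def tree_reduce(values):
    """Pairwise (tree) reduction: log2(n) parallel steps instead of
    n sequential additions -- the restructuring a GPU reduction needs."""
    vals = list(values)
    while len(vals) > 1:
        if len(vals) % 2:  # pad odd-length levels with the identity
            vals.append(0)
        # Each pair below could be summed by a separate GPU thread.
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

print(tree_reduce(range(10)))  # → 45
```

The answer is identical to a sequential sum; what changed is the dependency structure, which is exactly the kind of redesign the slide is describing.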
Credit: Professor Todd Martinez, Stanford University
Wrap-Up…