Science and Computing at SLAC. Donald R. Lemma, B.Sc., MPA, Ph.D., CIO and Computing Division Director, SLAC National Accelerator Laboratory. 4th Annual XLDB Conference, October 6, 2010.


Science and Computing at SLAC

Donald R. Lemma, B.Sc., MPA, Ph.D.
CIO and Computing Division Director
SLAC National Accelerator Laboratory

4th Annual XLDB Conference
October 6, 2010


Aligning Computing to Support the SLAC Scientific Mission

“To explore the ultimate structure and dynamics of matter and the properties of energy, space and time, at the smallest and largest scales, in the fastest processes and at the highest energies”


About SLAC…

•One of 17 National Laboratories funded by the US Department of Energy, operated by Stanford University for 48 years

•Science-concentric mission: No classified research or weapons work and all research is published

•Nearly 500 acres of land and 3 MILES of tunnels

•160 Megawatts of power

•1,600 staff and an equal number of visiting scientists and researchers

•Research at SLAC has led to 6 Nobel Prizes (in both chemistry and physics)

•Discoveries include the Quark, Tau Lepton, and the first direct evidence of dark matter


Scientific Disciplines at SLAC

• Particle Physics and Astrophysics

• Linac Coherent Light Source (LCLS)

• Photon Science

• Stanford Synchrotron Radiation Lightsource (SSRL)

• Plus other PROJECTS, such as:
  – Space and ground-based data acquisition systems
  – High throughput structural biology


Some (Current) Fundamental Scientific Questions

• What are the ultimate Laws of Nature?
  – Are there new forces, beyond what we see today? Do the forces unify? At what scale? Why is gravity so different from the other forces?
  – What lies beyond the quarks and leptons? What completes the Standard Model?

• What is the structure of space and time?
  – Why are there four spacetime dimensions? Are there more? What are their shapes and sizes? What is the quantum theory of gravity?
  – What is the origin of mass?

• How did the Universe come to be?
  – What are dark matter and dark energy? What happened to antimatter? What powered the big bang?
  – What is the fate of the Universe?


Some (Current) Computing Challenges to Help Scientists Address These Questions

1. How do you capture data on the timescale of a quadrillionth of a second (a femtosecond)?

2. What hardware/software can most effectively process petascale data?

3. How do you store and access trillions of files in a single file-system?

4. How do you process this volume of data and metadata?

5. How can you get all of this hardware to fit into a conventional datacenter given the limits on power, space, and cooling?

6. How do you deal with latency between the database, disk, and CPU at extreme scales?

But before we look forward, let’s take a quick look back…


Necessity is the Mother of Invention…

SLAC’s Rich Computing Heritage: Designing Systems to Meet Scientific Needs

• First Internet Web Connection in North America (between Tim Berners-Lee at CERN and Paul Kunz at SLAC)

• First Internet Application in the World (SPIRES)

• Instant Messaging

• Landmark Open Internet Database Ruling (Netscape Communications Corp. v. Konrad, No. C 00-20789 JW (N.D. Cal. April 2, 2001))

• Apple Computer traces some of its roots here (SLAC was the location of the “Homebrew Computer Club”)


SLAC’s Rich Computing Heritage

• First Web Browser in the World (Midas)

• First Web Search Engine in the World

• Largest Scientific Database in the World

• Close Ties with the INNOVATIVE Computer Engineering coming out of Stanford. So, just for fun…

Google’s First Storage Array


Let’s Start at the Very Beginning…

Time

Scale


SCALE

1 x 10^n, for n = 0, 1, 3, 5, 8, 12, 21, 22


Scale: How We Do It…Large Area Telescope

Global and Local Collaboration

•Disparate teams focused on working together to achieve a shared scientific objective

•Integrate the Computing Department from inception, through design, to operation…there is no differentiation between IT and Engineering professionals working on the project (overcoming pride and prejudice)

Scaling and Planning

•10 years of operations foreseen (build the system for the organization we want to “become”, not the organization that “we are”)

•Hundreds of millions of datasets and processes

•Many hundreds of terabytes of data

•Computers are integral, not G&A overhead…humans cannot analyze the volumes of raw data generated by the Instrument

•Different users want to see different data “Slices”

•Time is Critical…Parallelise processing

Reliability

•Tens of thousands of batch jobs per day (43k in a day is our record…approx 40k CPU-hrs)…it’s a long walk to the data capture device
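As a rough reading of the record-day figures above, the sketch below converts 43k jobs and roughly 40k CPU-hours into an average job length and an average number of busy cores. It assumes the load was spread evenly over 24 hours, which the slide does not state, and the function name is purely illustrative.

```python
def record_day_summary(jobs: int, cpu_hours: float, hours_in_day: float = 24.0):
    """Return (average CPU-minutes per job, average number of busy cores).

    Assumes the work is spread evenly across the day (an assumption,
    not something stated in the talk).
    """
    cpu_minutes_per_job = cpu_hours * 60.0 / jobs
    average_busy_cores = cpu_hours / hours_in_day
    return cpu_minutes_per_job, average_busy_cores


minutes, cores = record_day_summary(jobs=43_000, cpu_hours=40_000)
print(f"{minutes:.0f} CPU-minutes per job, about {cores:.0f} cores busy on average")
```

Under that even-load assumption the record day works out to a little under one CPU-hour per job and somewhere above 1,600 cores kept busy around the clock.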


Scale: Finding a drop of water in an ocean

The technique is not unlike what is done with private-sector financial consolidation systems: TRANSACTION to ERP to CONSOLIDATED REPORTING

[Figure: data-processing pipeline diagram. Labels: downlink, decompress, raw, digi, recon, Root, Fits; many parallel D and R job nodes; stage times of 1 hr and 1.5 hr]

6 GB/day trending data into Oracle

A completely new type of file system needed to be developed to store billions of files with fault tolerance. SLAC developed “xrootd” and used it with conventional database tools and technologies


Scale: How We Do It…Large Synoptic Survey Telescope

• 100+ petabytes

• 3,000 megapixel (3 billion pixel) detector

• Device will generate more data in the first 15 minutes of operation than the Hubble generated since it was launched

• Same management principles mentioned in the previous slide (collaboration, science/IT partnerships, early IT involvement, scaling and planning)

• SLAC is participating in the construction of an entirely new Data Access System, including standards, a database and database language


Scale: A New Tool for the World That is Coming Out of This Project

• Leading-edge database development work

• XLDB
  – Petascale databases
  – Workshop and (first) open conference (October 5-7 at SLAC)

• SciDB
  – Open source DMAS for scientific research
  – Driven by needs of data-intensive users with array data model
  – Applications in optical and radio astronomy, geoscience, biology, web, drug discovery, Wall Street, oil and gas
  – Designed for complex analyses on large data sets
  – Time series, spatial correlations, matrix operations
  – Data Provenance

SLAC helped jump-start SciDB, including co-founding as well as chairing the science advisory board


Scale: This Time, Going Down

1 x 10^n, for n = -1, -2, -4, -6, -8, -10, -14


Atomic-Scale and Time…

•SLAC’s X-ray laser captures data on a 100-femtosecond timescale (a femtosecond is a quadrillionth of a second)

•There are more femtoseconds in a minute than there are minutes since the beginning of the universe (measured from the Big Bang)

•In about 1.3 seconds, light travels from the Earth to the Moon…in 100 femtoseconds light traverses roughly the width of a human hair

•Computing needed innovative tools to come up with a method of capturing and assembling this data at a multi-gigabyte-per-second rate

1/200 second

1/1,000,000,000,000,000 second
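The femtosecond comparisons on this slide can be checked with a few lines of arithmetic. The sketch below is only a sanity check: the age of the universe (13.8 billion years) and the mean Earth-Moon distance (384,400 km) are assumed round figures that do not appear in the talk.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458      # exact, by definition of the metre
FEMTOSECONDS_PER_SECOND = 10**15
AGE_OF_UNIVERSE_YEARS = 13.8e9            # assumed round figure
MINUTES_PER_YEAR = 365.25 * 24 * 60
EARTH_MOON_DISTANCE_M = 384_400_000       # assumed mean distance

# Femtoseconds in one minute versus minutes since the Big Bang.
femtoseconds_per_minute = 60 * FEMTOSECONDS_PER_SECOND
minutes_since_big_bang = AGE_OF_UNIVERSE_YEARS * MINUTES_PER_YEAR

# Distance light covers in 100 femtoseconds, in metres.
light_path_100_fs_m = SPEED_OF_LIGHT_M_PER_S * 100 / FEMTOSECONDS_PER_SECOND

# One-way light travel time from the Earth to the Moon, in seconds.
earth_to_moon_seconds = EARTH_MOON_DISTANCE_M / SPEED_OF_LIGHT_M_PER_S

print(f"femtoseconds per minute : {femtoseconds_per_minute:.2e}")
print(f"minutes since Big Bang  : {minutes_since_big_bang:.2e}")
print(f"light path in 100 fs    : {light_path_100_fs_m * 1e6:.0f} micrometres")
print(f"Earth to Moon, one way  : {earth_to_moon_seconds:.2f} s")
```

With these inputs a minute holds several times more femtoseconds than there have been minutes since the Big Bang, and 100 femtoseconds of light travel comes to a few tens of micrometres, which is in the range usually quoted for the width of a human hair.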


Fundamental Mysteries

Realm of particle physics until now


Energy - Particle Physics

• Power

• Detection
  – Trillions of events

• Data
  – Parsing and Analysis
  – Grid Computing
  – Refinement and data mining

BaBar Detector

CERN ATLAS Detector


Grid Computing


Parallelism in High Energy Physics (HEP)

HEP data analysis deals with real or simulated “events” (aka collisions).

Events do not depend on each other. They can be processed in any order: forwards, backwards, in parallel, etc.

Historically ideal for “trivial parallelism”.

e.g. Trivial parallelism at the batch job level:

1,000,000,000 events (100 TB)

1000 x 100 GB batch jobs

Batch System

Batch workers:

• 1 job per core

• 2 GB per core

• ~1 day per job

• 0.1 (to 1000) seconds per event

Concatenate Output

New Derived Dataset(s)
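The batch-level pattern on this slide (split the events into jobs, run each job independently, concatenate the outputs) can be sketched in a few lines. This is a toy illustration, not SLAC's batch system: `process_event` is a hypothetical stand-in for a real analysis, events are plain integers, and a thread pool stands in for batch workers that would really be separate processes on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_jobs(events: Sequence[int], events_per_job: int) -> List[Sequence[int]]:
    """Cut the event list into consecutive chunks, one chunk per batch job."""
    return [events[i:i + events_per_job]
            for i in range(0, len(events), events_per_job)]


def process_event(event: int) -> int:
    """Hypothetical per-event analysis: it reads this event and nothing else."""
    return event * event


def run_job(chunk: Sequence[int]) -> List[int]:
    """One batch job: process each event in its chunk independently."""
    return [process_event(event) for event in chunk]


def analyse(events: Sequence[int], events_per_job: int, workers: int = 4) -> List[int]:
    """Fan the jobs out to workers, then concatenate the per-job outputs."""
    jobs = split_into_jobs(events, events_per_job)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(run_job, jobs))  # results come back in job order
    return [result for output in outputs for result in output]


derived_dataset = analyse(list(range(10)), events_per_job=3)
print(derived_dataset)
```

Because no event depends on another, the split size only affects scheduling, not the result. With the slide's numbers, 1,000,000,000 events over 1000 jobs is 1,000,000 events (100 GB, so about 100 KB per event) per job; at 0.1 seconds per event that is roughly 28 hours, consistent with the "~1 day per job" figure.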


Some Computing Projects to Meet These Needs

- Petacache flash memory

- Should be announcing new prototype 100 kW ultra-high density racks with 20 TFLOPS of computing capacity per rack, saving space and lowering cooling requirements

- GPU Computing


GPU vs CPU Computing


10 racks of GPU-based computers = 1 PetaFlop = 3 x

Credit: Professor Todd Martinez, Stanford University


[Figure: GPU device architecture. The device contains Multiprocessors 1 to N; each multiprocessor has Processors 1 to M with their own registers, an instruction unit, shared memory, a constant cache, and a texture cache. Device memory holds the global, constant, and texture memories.]

Strict memory hierarchy

Global and constant memory reside on the device -> slow access (constant memory is cached on chip)

Strict rules about who (thread/block/device/CPU) can access what memory and how (read/write)

Access speed varies from 1 clock cycle to 500 clock cycles

Need 10,000+ threads per GPU

Single precision 10x faster than double

Algorithms need to be redesigned!

Not just recompiling!

Credit: Professor Todd Martinez, Stanford University
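To give a feel for why the 1-to-500-cycle spread matters, here is a deliberately simple two-level cost model. It is an illustrative assumption, not something from the slides: every access is treated as either "fast" (1 cycle) or "slow" (500 cycles), and the function name and parameters are hypothetical.

```python
def mean_access_cycles(fast_fraction: float,
                       fast_cycles: float = 1.0,
                       slow_cycles: float = 500.0) -> float:
    """Average cycles per memory access under a two-level model.

    fast_fraction is the share of accesses served at the fast end
    (registers, shared memory, cache hits); the rest pay slow_cycles.
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be between 0 and 1")
    return fast_fraction * fast_cycles + (1.0 - fast_fraction) * slow_cycles


for fraction in (0.0, 0.5, 0.9, 0.99):
    cycles = mean_access_cycles(fraction)
    print(f"{fraction:.2f} fast -> {cycles:.1f} cycles per access")
```

Even in this crude model, a small share of slow accesses dominates the average, which is one way to read the slide's point that algorithms have to be restructured around the memory hierarchy rather than simply recompiled.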


Wrap-Up…