
Cosmic-Scale Applications for Cyberinfrastructure

NSF MPS Cyberscience Workshop

NSF Headquarters

Arlington, VA

April 21, 2004

Dr. Larry Smarr

Director, California Institute for Telecommunications and Information Technologies

Harry E. Gruber Professor,

Dept. of Computer Science and Engineering

Jacobs School of Engineering, UCSD

Cosmic-Scale Science: Cyberinfrastructure Links Theory with Observation

• Two Examples
– Formation of Structures in the Early Universe
– Black Hole Collisions and Gravitational Radiation

• Common Features Emerge
– $Billions of New Instruments Generating Data
– Much More Powerful Supercomputers Needed
– Sophisticated Software Key, e.g., Automatic Mesh Refinement

• Cyberinfrastructure Required for Data Produced
– Federated Repositories
– Data Grid Middleware
– Local Laboratory Standards-Based Clusters

NASA, ESA, S. Beckwith (STScI) and the HUDF Team

Hubble Ultra Deep Field

Fundamental Physics Challenge: Formation of First Galaxies and Clusters

Faintest galaxies ~ 1 billion years old

Galaxy population is strongly evolving

NASA WMAP: Cosmic Microwave Background at 380,000 Years

Source: Mike Norman, UCSD

Formation & Evolution of Galaxies: $Billions of New Digital Observatories

• Nature and Occurrence of the First Galaxies: “First Light” (JWST, ALMA)

• Properties of High-Z Galaxies (HST, ALMA): Galaxy Building Blocks?

• Source(s) of Early Reionization (WMAP)

• Star Formation History of Galaxies (Spitzer)

• Emergence of the Hubble Types (DEEP2)

• Influence of Environment on Galaxy Type and Large Scale Structure (SDSS)

• Supermassive Black Hole Formation and AGN/QSO Phenomena In Galaxies (SDSS, HST, CXO)

Many Open Questions Are Being Investigated Observationally

Source: Mike Norman, UCSD

Cosmic Simulator with Billion-Zone and Gigaparticle Resolution

Compare with Sloan Survey

Source: Mike Norman, UCSD

SDSC Blue Horizon

Why Does the Cosmic Simulator Need Cyberinfrastructure?

• One Gigazone Run:
– Generates ~10 TeraBytes of Output
– A “Snapshot” is 100 GB
– Need to Visually Analyze as We Create SpaceTimes

• Visual Analysis Daunting
– Single Frame is About 8 GB
– A Smooth Animation of 1000 Frames is 1000 x 8 GB = 8 TB
– Stage on Rotating Storage to High-Res Displays

• Can Run Evolutions Faster than We Can Archive Them
– File Transport Over Shared Internet ~50 Mbit/s
– 4 Hours to Move ONE Snapshot! (see the arithmetic sketch below)
– Many Scientists Will Need Access for Analysis

Source: Mike Norman, UCSD
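
The data-rate claims above can be checked with simple arithmetic. A minimal sketch in Python, using only the numbers on the slide (100 GB snapshots, 1000 frames of 8 GB each, a 50 Mbit/s shared link) and assuming decimal units (1 GB = 1e9 bytes):

```python
# Back-of-the-envelope check of the gigazone-run data figures above.
# Inputs are the slide's numbers; decimal units (1 GB = 1e9 bytes) assumed.

snapshot_gb = 100      # one "snapshot" of the gigazone run
frames = 1000          # frames in a smooth animation
frame_gb = 8           # size of a single rendered frame
link_mbit_s = 50       # shared-Internet transfer rate quoted on the slide

animation_tb = frames * frame_gb / 1000
print(f"Animation size: {animation_tb:.0f} TB")                          # ~8 TB

transfer_hours = (snapshot_gb * 1e9 * 8) / (link_mbit_s * 1e6) / 3600
print(f"One snapshot over the shared link: {transfer_hours:.1f} hours")  # ~4.4 hours
```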

Limitations of Uniform Grids for Complex Scientific and Engineering Problems

Source: Greg Bryan, Mike Norman, NCSA

512x512x512 Run on 512-node CM-5

Gravitation Causes Continuous Increase in Density Until There is a Large Mass in a Single Grid Zone

Solution: Develop Automatic Mesh Refinement (AMR) to Resolve Mass Concentrations

Source: Greg Bryan, Mike Norman, John Shalf, NCSA

64x64x64 Run with Seven Levels of Adaptation on SGI Power Challenge, Locally Equivalent to 8192x8192x8192 Resolution (refinement arithmetic sketched below)

• Background Image Shows Grid Hierarchy Used
– Key to Resolving Physics is More Sophisticated Software
– Evolution is from 10 Myr to Present Epoch

• Every Galaxy > 10^11 Msolar in 100 Mpc/h Volume Adaptively Refined With AMR
– 256^3 Base Grid
– Over 32,000 Grids At 7 Levels Of Refinement
– Spatial Resolution of 4 kpc at Finest
– 150,000 CPU-hr On NCSA Origin2000, Completed In 1999

• 512^3 AMR or 1024^3 Unigrid Now Feasible
– 8-64 Times The Mass Resolution
– Can Simulate First Galaxies

AMR Allows Digital Exploration of Early Galaxy and Cluster Core Formation

Source: Mike Norman, UCSD
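
The “locally equivalent” resolutions quoted above follow from the depth of the refinement hierarchy. A minimal sketch, assuming each AMR level refines its parent mesh by a factor of 2 (a common choice that reproduces the quoted numbers; the slides do not state the factor explicitly):

```python
# Effective local resolution of an AMR hierarchy, assuming each level
# refines its parent mesh by a factor of 2 (an assumption; the slide
# does not state the refinement factor).

def effective_resolution(base_cells: int, levels: int, factor: int = 2) -> int:
    """Finest-level equivalent uniform-grid cell count along one axis."""
    return base_cells * factor ** levels

print(effective_resolution(64, 7))    # 8192 -> the "8192x8192x8192 locally equivalent" run
print(effective_resolution(256, 7))   # 32768 on the 256^3 base grid; 100 Mpc/h divided
                                      # by 32768 is a few kpc, consistent with the
                                      # ~4 kpc finest resolution quoted above
```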

Hydrodynamic Cosmology Simulation of Galaxy Formation Using Parallel Adaptive Mesh Refinement (Enzo)

Image credit: Donna Cox, Bob Patterson (NCSA); Simulation: M. Norman (UCSD)

Cosmic Simulator: Thresholds of Capability and Discovery

• 2000: Formation of Galaxy Cluster Cores (1 TFLOP/s)
• 2006: Properties of First Galaxies (40 TFLOP/s)
• 2010: Emergence of Hubble Types (150 TFLOP/s)
• 2014: Large Scale Distribution Of Galaxies By Luminosity And Morphology (500 TFLOP/s)


Source: Mike Norman, UCSD

Proposed Galaxy Simulation Cyber-Grid

User Grid

• Modelers

• Observers

• Visualizers

Developer Grid

• Enzo Code

• Data Mgmt

• Analysis Tools

• Visualization

• Middleware

Observational Survey Partners

• SDSS

• DEEP2

• SWIRE

Outreach

• Tutorials

• Animations

• PBS Nova

Production Simulated Galaxy Grid

Enzo Data Grid

Enzo Simulation Code

Enzo Data Analysis Tools

Portal Interface

Simulated Galaxy Archive

NSF NMI, PI: M. Norman, UCSD

LIGO, VIRGO, GEO and LISA Search for Gravitational Waves

• $1B Being Spent On Ground-Based LIGO/VIRGO/GEO and Space-Based LISA
– Use Laser Interferometers To Detect Waves

• Matched Filtering of Waveforms Requires Large Numbers of Simulations (illustrated in the sketch below)
– Stored In Federated Repositories

• LISA’s Increased Sensitivity Vastly Opens Parameter Space
– Many Orders Of Magnitude More Parameter Space to be Searched!

LIGO-Hanford

Virgo-Pisa

Source: Ed Seidel, LSU
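
For readers unfamiliar with matched filtering: the detector stream is correlated against each template waveform in the catalog, and a peak in the correlation flags a candidate signal, which is why large banks of simulated waveforms are needed. A toy Python/NumPy illustration with a synthetic chirp standing in for a numerical-relativity waveform (all values here are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "template": a chirp-like burst standing in for a simulated waveform.
t = np.linspace(0.0, 1.0, 4096)
template = np.sin(2 * np.pi * (30 * t + 40 * t**2)) * np.exp(-4 * (t - 0.5) ** 2)

# Toy detector stream: Gaussian noise with the template buried at a known offset.
data = rng.normal(scale=2.0, size=3 * template.size)
true_offset = 5000
data[true_offset:true_offset + template.size] += 0.5 * template

# Matched filter (white-noise case): correlate the stream against the template
# at every lag; the peak marks the best-fitting arrival time.
snr = np.correlate(data, template, mode="valid")
print("peak at offset", int(np.argmax(snr)), "- true offset", true_offset)
```

In practice the correlation is weighted by the detector noise spectrum and run over an entire template bank; the sketch keeps only the core slide-and-correlate step.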

Two-Body Problem in General Relativity - The Collision of Two Black Holes

• Numerical Solution of Einstein Equations Required

• Problem Solution Started 40 Years Ago, 10 More to Go

• Wave Forms Critical for NSF LIGO Gravitational Wave Detector

• A PetaFLOPS-Class Grand Challenge

Oct. 10, 1995: Matzner, Seidel, Shapiro, Smarr, Suen, Teukolsky, Winicour

The Numerical Two Black Hole Problem Spans the Digital Computer Era

Timeline spanning the Kiloflop, Megaflop, Gigaflop, and Teraflop eras: Lichnerowicz; Hahn & Lindquist; DeWitt/Misner (Chapel Hill); DeWitt (LLNL); Cadez Thesis; Eppley Thesis; Smarr Thesis; Modern Era

Relative Amount of Floating Point Operations for Three Epochs of the 2BH Collision Problem

• 1963: Hahn & Lindquist, IBM 7090, One Processor at 0.2 Mflops, 3 Hours
• 1977: Eppley & Smarr, CDC 7600, One Processor at 35 Mflops, 5 Hours (~300x the 1963 operation count)
• 1999: Seidel & Suen, et al., SGI Origin, 256 Processors at 500 Mflops Each, 40 Hours (~30,000x the 1977 count; ~9,000,000x the 1963 count)

10,000x More Required! (factors checked in the sketch below)
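
The relative factors in the chart follow directly from processors x per-processor speed x runtime. A quick check using only the machine figures quoted above:

```python
# Total floating-point operations per epoch of the 2BH problem, from the
# machine figures listed above: processors x Mflops each x runtime.

def total_flops(processors, mflops_each, hours):
    return processors * mflops_each * 1e6 * hours * 3600

runs = {
    "1963 Hahn & Lindquist (IBM 7090)":   total_flops(1,   0.2, 3),
    "1977 Eppley & Smarr (CDC 7600)":     total_flops(1,   35,  5),
    "1999 Seidel & Suen et al. (Origin)": total_flops(256, 500, 40),
}

base = runs["1963 Hahn & Lindquist (IBM 7090)"]
for name, flops in runs.items():
    print(f"{name}: {flops:.2e} flops ({flops / base:,.0f}x the 1963 run)")
# Reproduces the ~300x and ~9,000,000x factors in the chart above.
```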

What is Needed to Finish the Computing Job

• Current Black Hole Jobs
– Grid: 768 x 768 x 384; Memory Used: 250+ GB
– Runtime: ~ a Day Or More; Output: Multi-TB+ (Disk Limited)

• Inspiraling BH Simulations Are Volume Limited
– Scale As N^3-4

• Low-Resolution Simulations of BH Collisions
– Currently Require O(10^15) FLOPS

• High-Resolution Inspiraling Binaries Need:
– Increased Simulation Volume, Evolution Time, And Resolution, And O(10^20+) Flops
– 50-100 TF With Adaptive Meshes Will Make This Possible (run-time estimate sketched below)

Source: Ed Seidel, LSU
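
To see why 50-100 TF is the quoted threshold, a rough rate calculation for the O(10^20)-flop high-resolution runs (a sketch that ignores parallel efficiency, I/O, and queue time):

```python
# Rough wall-clock time for an O(1e20)-flop high-resolution inspiral run at
# the sustained rates quoted above (ignores parallel efficiency and I/O).

total_ops = 1e20
for tflops in (50, 100):
    days = total_ops / (tflops * 1e12) / 86400
    print(f"{tflops} TF sustained: ~{days:.0f} days per spacetime")
# ~23 days at 50 TF, ~12 days at 100 TF; by comparison, today's O(1e15)-flop
# low-resolution collisions would take only seconds at these rates.
```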

Why Black Hole Simulations Need Cyberinfrastructure

• Software Development is Key
– Use Adaptive Meshes to Accurately Resolve Metric
– ~10 Levels Of Refinement
– Several Machine-Days Per Spacetime

• Output
– Minimal 25-100 TB For Full Analysis (Multiple Orbits) of:
– Gravitational Waves
– Event Horizon Structure Evolution

• Real-Time Scheduling Needed Across Multiple Resources For Collaborative Distributed Computing
– Spawning (For Analysis, Steering Tasks), Migration
– Interactive Viz From Distributed Collaborations
– Implies Need for Dedicated Gigabit Light Pipes (Lambdas)

Source: Ed Seidel, LSU

Ensembles Of Simulations Needed for LIGO, GEO, LISA Gravitational Wave Astronomy

• Variations for Internal Approximations
– Accuracy, Sensitivity Analysis To Gauge Parameters, Resolution, Algorithms
– Dozen Simulations Per Physical Scenario

• Variations In Physical Scenarios --> Waveform Catalogs
– Masses, Spins, Orbital Characteristics Varied
– Huge Parameter Space To Survey

• In Total: 10^3 - 10^6 Simulations Needed (storage estimate sketched below)
– Potentially Generating 25 TB Each
– Stored In Federated Repositories

• Data Analysis Of LIGO, GEO, LISA Signals
– Interacting With Simulation Data
– Managing Parameter Space/Signal Analysis

Source: Ed Seidel, LSU
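
The aggregate archive implied by these ensembles is easy to bound from the slide's own numbers (10^3 to 10^6 runs at up to 25 TB each):

```python
# Aggregate storage implied by the waveform-catalog ensembles above,
# assuming every simulation produces the full 25 TB quoted on the slide.

per_sim_tb = 25
for n_sims in (1_000, 1_000_000):
    total_pb = n_sims * per_sim_tb / 1000
    print(f"{n_sims:,} simulations x {per_sim_tb} TB = {total_pb:,.0f} PB")
# 1,000 runs -> 25 PB; 1,000,000 runs -> 25,000 PB (25 exabytes), which is
# why the catalog must live in federated repositories, not a single archive.
```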

To a Grid, “Supercomputers” Are Just High-Performance Data Generators

• Similar to Particle Accelerators, Telescopes, Ocean Observatories, Microscopes, etc.

• All Require:
– Web Portal Access for Real-Time Instrument Control
– Grid Middleware for Security, Scheduling, Reservations
– Federated Repositories for Data Archiving
– Data Grids for Data Replication and Management
– High Performance Networking to Deal With Data Floods
– Local Visualization and Analysis Facilities
– Multi-Site Multi-Modal Collaboration Software

• That is—a Cyberinfrastructure!

NSF Must Increase Funding for Community Software/Toolkit Development

• Major Problem To Enable Community
– Modern Software Engineering
– Training
– User Support

• Require Toolkits For:
– Sharing/Developing Of Community Codes
– Algorithmic Libraries, e.g. AMR
– Local Compute, Storage, Visualization, & Analysis
– Federated Repositories
– Grid Middleware
– Lambda Provisioning

LambdaGrid Required to Support the Distributed Collaborative Teams

• Grand Challenge-Like Teams Involving US and International Collaborations
– Example: GWEN (Gravitational Wave European Network) Involves 20 Groups!

• Simulation Data Stored Across Geographically Distributed Spaces
– Organization, Access, Mining Issues

• Collaborative Data Spaces to Support Interaction with:
– Colleagues, Data, Simulations

• Need Lambda Provisioning For:
– Coupling Supercomputers and Data Grid
– Remote Visualization And Monitoring Of Simulations
– Analysis Of Federated Data Sets By Virtual Organizations

Source: Ed Seidel, LSU

Special Thanks to:

• Ed Seidel
– Director, Center for Computation and Technology, Department of Physics and Astronomy, Louisiana State University
– & Albert-Einstein-Institut, Potsdam, Germany
– Representing dozens of scientists

• Michael Norman
– Director, Laboratory for Computational Astrophysics
– Physics Department, UC San Diego

• Members of the OptIPuter Team