ASCI-00-003.1
ASCI Terascale Simulation Requirements and Deployments
David A. Nowak, ASCI Program Leader; Mark Seager, ASCI Terascale Systems Principal Investigator
Lawrence Livermore National Laboratory, University of California
Oak Ridge Interconnects Workshop, November 1999
Work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.


Page 1

Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551

ASCI-00-003.1

ASCI Terascale Simulation Requirements and Deployments

David A. Nowak, ASCI Program Leader
Mark Seager, ASCI Terascale Systems Principal Investigator
Lawrence Livermore National Laboratory, University of California

*Work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.

Page 2

Overview

• ASCI program background
• Applications requirements
• Balanced terascale computing environment
• Red partnership and CPLANT
• Blue-Mountain partnership
• Blue-Pacific partnership
  - Sustained Stewardship TeraFLOP/s (SST)
• White partnership
• Interconnect issues for future machines

Page 3

A successful Stockpile Stewardship Program requires a successful ASCI

Stockpile systems: B61, W62, W76, W78, W80, B83, W84, W87, W88

(Diagram labels) Primary Certification; Materials Dynamics; Advanced Radiography; Secondary Certification; Enhanced Surety; ICF; Enhanced Surveillance; Hostile Environments; Weapons System Engineering; Directed Stockpile Work

Page 4

ASCI’s major technical elements meet Stockpile Stewardship requirements

(Diagram labels) Defense Applications & Modeling; University Partnerships; Integration; Simulation & Computer Science; Integrated Computing Systems; Applications Development; Physical & Materials Modeling; V&V; DisCom2; PSE; Physical Infrastructure & Platforms; Operation of Facilities; Alliances; Institutes; VIEWS; PathForward

Page 5

Example terascale computing environment in CY00 with ASCI White at LLNL

(Chart: programs / application performance, resource axes versus year from '96, '97, '00 to 2004: computing speed 0.1-100 TFLOPS; memory 0.1-100 TB; memory BW 0.3-300 TB/s; cache BW 1.6-1,600 TB/s; interconnect 0.05-50 TB/s; disk 5.0-5,000 TB; parallel I/O BW 0.2-200 GB/s; archive BW 0.01-10 GB/s; archival storage 0.1-100 PB.)

ASCI is achieving programmatic objectives, but the computing environment will not be in balance at LLNL for the ASCI White platform.

Computational Resource Scaling for ASCI Physics Applications
• 1 FLOPS peak compute
• 16 Bytes/s per FLOPS cache BW
• 1 Byte per FLOPS memory
• 3 Bytes/s per FLOPS memory BW
• 0.5 Bytes/s per FLOPS interconnect BW
• 50 Bytes per FLOPS local disk
• 0.002 Bytes/s per FLOPS parallel I/O BW
• 0.0001 Bytes/s per FLOPS archive BW
• 1000 Bytes per FLOPS archive
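To make the ratios concrete, here is a small sizing sketch (my own worked example, not from the slides) that applies them to an assumed 10 TFLOP/s peak machine, roughly the ASCI White class:

```c
/* Illustrative sizing from the per-FLOPS scaling ratios above,
 * applied to an assumed 10 TFLOP/s peak machine. */
#include <stdio.h>

int main(void)
{
    double peak = 10e12;                                   /* 10 TFLOP/s peak */

    printf("Cache BW        : %7.1f TB/s\n", 16.0    * peak / 1e12);
    printf("Memory          : %7.1f TB\n",    1.0    * peak / 1e12);
    printf("Memory BW       : %7.1f TB/s\n",  3.0    * peak / 1e12);
    printf("Interconnect BW : %7.1f TB/s\n",  0.5    * peak / 1e12);
    printf("Local disk      : %7.0f TB\n",   50.0    * peak / 1e12);
    printf("Parallel I/O BW : %7.1f GB/s\n",  0.002  * peak / 1e9);
    printf("Archive BW      : %7.1f GB/s\n",  0.0001 * peak / 1e9);
    printf("Archive         : %7.1f PB\n", 1000.0    * peak / 1e15);
    return 0;
}
```

At 10 TFLOP/s this gives 160 TB/s cache BW, 10 TB memory, 30 TB/s memory BW, 5 TB/s interconnect, 500 TB disk, 20 GB/s parallel I/O, 1 GB/s archive BW, and 10 PB archive, consistent with the chart above.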

Page 6

SNL/Intel ASCI Red

• Good interconnect bandwidth
• Poor memory/NIC bandwidth

(Diagram)
• Kestrel compute board (~2,332 in system): nodes of 2 CPUs + NIC + 256 MB; Cougar operating system
• Eagle I/O board (~100 in system): nodes of CPU + NIC + 128 MB; Terascale Operating System (TOS)
• Cache coherent on a node, not across a board
• Mesh fabric: 38 switches by 32 switches; 800 MB/s bi-directional per link; shortest hop ~13 µs, longest hop ~16 µs
• Storage: 6.25 TB classified + 6.25 TB unclassified; 2 GB/s sustained each (total system)
• Aggregate link bandwidth = 1.865 TB/s
• Interconnect bandwidth: good. Memory/NIC bandwidth: the bottleneck.
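As a quick cross-check (my arithmetic, under the assumption of roughly one 800 MB/s bi-directional link per Kestrel compute board), the quoted aggregate is reproduced by:

```c
/* Back-of-envelope check of the aggregate link bandwidth, assuming
 * ~2332 links at 800 MB/s bi-directional each (one per Kestrel board). */
#include <stdio.h>

int main(void)
{
    double links = 2332.0;            /* assumed link count             */
    double per_link_MBps = 800.0;     /* bi-directional, from the slide */

    printf("aggregate ~ %.3f TB/s\n", links * per_link_MBps / 1e6);  /* ~1.866 */
    return 0;
}
```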

Page 7

SNL/Compaq ASCI C-Plant

• C-Plant is located at Sandia National Laboratories
• Currently a hypercube, C-Plant will be reconfigured as a mesh in 2000
• C-Plant has 50 scalable units
• Linux operating system
• C-Plant is a “Beowulf” configuration

Scalable unit:
• 16 “boxes”
• 2 x 16-port Myricom switches
• 160 MB/s each direction, 320 MB/s total

“Box”:
• 500 MHz Compaq EV56 processor
• 192 MB SDRAM
• 55 MB NIC
• Serial port
• Ethernet port
• ______ disk space

This is a research project — a long way from being a production system.

Page 8

LANL/SGI/Cray ASCI Blue Mountain: 3.072 TeraOPS Peak

• 48 x 128-CPU SMP = 3 TF
• 48 x 32 GB/SMP = 1.5 TB memory
• 48 x 1.44 TB/SMP = 70 TB disk
• Inter-SMP compute fabric bandwidth = 48 x 12 HIPPI/SMP = 115,200 MB/s bi-directional
• 20 (14) HPSS movers = 40 (27) HPSS tape drives of 10-20 MB/s each = 400-800 MB/s bandwidth
• Link to LAN (with HPSS Meta Data Server): bandwidth = 48 x 1 HIPPI/SMP = 9,600 MB/s bi-directional
• Link to legacy MXs: bandwidth = 8 HIPPI = 1,600 MB/s bi-directional
• HPSS bandwidth = 4 HIPPI = 800 MB/s bi-directional


• RAID bandwidth = 48 x 10 FC/SMP = 96,000 MB/s bi-directional

Aggregate link bandwidth = 0.115 TB/s

Page 9

Blue Mountain Planned GSN Compute Fabric

• 9 separate 32x32 X-bar switch networks
• 3 groups of 16 computers each

Expected improvements:
• Throughput: 115,200 MB/s => 460,800 MB/s (4x)
• Link bandwidth: 200 MB/s => 1,600 MB/s (8x)
• Round-trip latency: 110 µs => ~10 µs (11x)

Aggregate link bandwidth = 0.461 TB/s
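A quick arithmetic check of those factors (my calculation, not from the slide), which also ties the old fabric's throughput to 48 SMPs x 12 HIPPI links at 200 MB/s each:

```c
/* Verify the quoted GSN improvement factors and the old HIPPI fabric
 * aggregate (48 SMPs x 12 HIPPI/SMP x 200 MB/s bi-directional). */
#include <stdio.h>

int main(void)
{
    printf("old HIPPI fabric : %.0f MB/s\n", 48.0 * 12.0 * 200.0);  /* 115,200 */
    printf("throughput gain  : %.0fx\n", 460800.0 / 115200.0);      /* 4x      */
    printf("link BW gain     : %.0fx\n",   1600.0 /    200.0);      /* 8x      */
    printf("latency gain     : %.0fx\n",    110.0 /     10.0);      /* 11x     */
    printf("GSN aggregate    : %.3f TB/s\n", 460800.0 / 1e6);       /* 0.461   */
    return 0;
}
```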

Page 10

LLNL/IBM Blue-Pacific: 3.889 TeraOP/s Peak

(Diagram) Three SP sectors (S, Y, K), each with 24 links into the HPGN; HiPPI and FDDI external connections

Each SP sector is comprised of:
• 488 Silver nodes
• 24 HPGN links

System parameters:
• 3.89 TFLOP/s peak
• 2.6 TB memory
• 62.5 TB global disk

Per-sector configurations:
• 2.5 GB/node memory, 24.5 TB global disk, 8.3 TB local disk
• 1.5 GB/node memory, 20.5 TB global disk, 4.4 TB local disk
• 1.5 GB/node memory, 20.5 TB global disk, 4.4 TB local disk

SST achieved >1.2 TFLOP/s on sPPM, on a problem >70x larger than ever solved before!

Aggregate link bandwidth = 0.439 TB/s

Page 11

I/O Hardware Architecture of SST

(Diagram) System data and control networks; 488-node IBM SP sector = 56 GPFS servers + 432 Silver compute nodes

Each SST sector:
• Has local and global I/O file systems
• 2.2 GB/s delivered global I/O performance
• 3.66 GB/s delivered local I/O performance
• Separate SP first-level switches
• Independent command and control
• Link bandwidth = 300 MB/s bi-directional

Full system mode:
• Application launch over the full 1,464 Silver nodes
• 1,048 MPI/US tasks, 2,048 MPI/IP tasks
• High-speed, low-latency communication between all nodes
• Single STDIO interface

24 SP links to the second-level switch
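A minimal sketch (not from the slides) of what "application launch over the full system with a single STDIO interface" looks like from the application side; task counts are whatever the job launcher provides:

```c
/* Minimal full-system launch sketch: every MPI task participates,
 * but only rank 0 writes to the single STDIO stream. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, ntasks;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);   /* e.g. 1,000+ MPI tasks */

    /* ... application work ... */

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0)
        printf("application ran on %d MPI tasks\n", ntasks);

    MPI_Finalize();
    return 0;
}
```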

Page 12

Partisn (SN-Method) Scaling

(Two charts: scaling versus number of processors, 1 to 10,000, at a constant number of cells per processor. Curves: RED, Blue Pac (1p/n), Blue Pac (4p/n), Blue Mtn (UPS), Blue Mtn (MPI), Blue Mtn (MPI.2s2r), Blue Mtn (UPS.offbox).)
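Constant cells per processor is a weak-scaling study. As an aside (this is not the Partisn data), weak-scaling efficiency is usually reported as T(1)/T(P); the timings below are made-up placeholders purely to show the bookkeeping:

```c
/* Weak-scaling efficiency bookkeeping with hypothetical timings:
 * ideal behaviour is a flat run time as processors are added. */
#include <stdio.h>

int main(void)
{
    int    procs[]  = {   1,    8,   64,  512, 4096 };
    double time_s[] = { 10.0, 10.4, 11.1, 12.3, 14.0 };   /* hypothetical */

    for (int i = 0; i < 5; i++)
        printf("P = %4d  T = %5.1f s  weak-scaling efficiency = %5.1f%%\n",
               procs[i], time_s[i], 100.0 * time_s[0] / time_s[i]);
    return 0;
}
```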

Page 13

The JEEP calculation adds to our understanding of the performance of insensitive high explosives

• This calculation involved 600 atoms (the largest number ever at such high resolution) with 1,920 electrons, using about 3,840 processors.
• This simulation provides crucial insight into the detonation properties of IHE at high pressures and temperatures.
• Relevant experimental data (e.g., shock-wave data) on hydrogen fluoride (HF) are almost nonexistent because of its corrosive nature.
• Quantum-level simulations of HF-H2O mixtures, like this one, can substitute for such experiments.

Page 14

Silver node delivered memory bandwidth is around 150-200 MB/s per process

(Chart: delivered memory bandwidth in MB/s for 1P, 2P, 4P, and 8P runs on E4000, E6000, Silver, AS4100, AS8400, Octane, and Onyx2.)

Silver Peak Bytes:FLOP/S Ratio is 1.3/2.565 = 0.51

Silver Delivered B:F Ratio is 200/114 = 1.75
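For context, delivered per-process memory bandwidth of this kind is typically measured with a STREAM-style loop; here is a rough sketch (illustrative only, not the benchmark behind the plot, with an assumed array size chosen to overflow cache):

```c
/* STREAM-triad-style probe of delivered memory bandwidth per process. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (8 * 1024 * 1024)     /* 8M doubles = 64 MB per array (assumption) */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    clock_t t0 = clock();
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];            /* triad: 2 loads + 1 store */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("triad bandwidth ~ %.0f MB/s\n",
           3.0 * N * sizeof(double) / secs / 1e6);
    free(a); free(b); free(c);
    return 0;
}
```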

Page 15

MPI_SEND/US delivers low latency and high aggregate bandwidth, but counter-intuitive behavior per MPI task

(Chart: performance in MB/s versus message size in bytes, 0 to 3.5e6, for One Pair, Two Pair, Four Pair, Two Agg, and Four Agg.)
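Per-pair curves like these are typically produced with a simple ping-pong test; here is a minimal sketch (illustrative, not the benchmark behind the plot; the message sizes swept are an assumption). Run it with two MPI ranks:

```c
/* Two-rank ping-pong bandwidth probe: rank 0 sends a buffer to rank 1
 * and back, and bandwidth is reported per message size. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 20

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (size_t bytes = 1024; bytes <= 4u * 1024 * 1024; bytes *= 2) {
        char *buf = malloc(bytes);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int r = 0; r < REPS; r++) {
            if (rank == 0) {
                MPI_Send(buf, (int)bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, (int)bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }

        double dt = MPI_Wtime() - t0;
        if (rank == 0)          /* 2 transfers per rep (there and back) */
            printf("%8zu bytes : %7.1f MB/s\n",
                   bytes, 2.0 * REPS * bytes / dt / 1e6);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```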

Page 16

LLNL/IBM White: 10.2 TeraOPS Peak

Aggregate link bandwidth = 2.048 TB/s: five times better than the SST, while peak is three times better. The ratio of Bytes:FLOPS is improving.

MuSST (PERF) System:
• 8 PDEBUG nodes with 16 GB SDRAM
• ~484 PBATCH nodes with 8 GB SDRAM
• 12.8 GB/s delivered global I/O performance
• 5.12 GB/s delivered local I/O performance
• 16 Gigabit Ethernet external network
• Up to 8 HIPPI-800

Programming/usage model:
• Application launch over ~492 NH-2 nodes
• 16-way MuSPPA, shared memory, 32b MPI
• 4,096 MPI/US tasks
• Likely usage is 4 MPI tasks/node with 4 threads/MPI task (see the hybrid sketch below)
• Single STDIO interface
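A minimal sketch (not from the slides) of that hybrid usage model, with OpenMP standing in for the node-level threads; the thread count and the use of OpenMP here are assumptions:

```c
/* Hybrid usage-model sketch: a few MPI tasks per node, each with a few
 * threads.  The threads do not make MPI calls in this example. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nranks;

    MPI_Init(&argc, &argv);                 /* e.g. 4 MPI tasks per node */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    omp_set_num_threads(4);                 /* 4 threads per MPI task    */

    #pragma omp parallel
    {
        printf("MPI task %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```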

Page 17

Interconnect issues for future machines: Why Optical?

• Need to increase the Bytes:FLOPS ratio
• Memory bandwidth (cache-line) utilization will be dramatically lower for codes that use arbitrarily connected meshes and adaptive refinement (indirect addressing)
• Interconnect bandwidth must be increased and latency must be reduced to allow a broader range of applications and packages to scale well
• To reach very large configurations (30, 70, 100 TeraOPS), larger SMPs will be deployed
  - For a fixed B:F interconnect ratio, this means more bandwidth coming out of each SMP
  - Multiple pipes/planes will be used; optical reduces cable count
• Machine footprint is growing; 24,000 square feet may require optical
• Network interface paradigm:
  - Virtual-memory direct memory access
  - Low-latency remote get/put (see the one-sided MPI sketch after this list)
• Reliability, Availability, and Serviceability (RAS)
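The "low-latency remote get/put" style of network interface argued for above can be expressed with MPI-2 one-sided operations; here is a minimal sketch (my illustration, not from the slides), to be run with at least two ranks:

```c
/* One-sided remote put sketch: rank 0 deposits a value directly into
 * rank 1's exposed memory window, with no matching receive. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double local = 0.0;               /* memory exposed for remote access */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win_create(&local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        double v = 3.14;
        MPI_Put(&v, 1, MPI_DOUBLE, 1 /* target rank */, 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1 received %g via remote put\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```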

Page 18

Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551

ASCI-00-003.1

ASCI Terascale Simulation Requirements and Deployments

David A. Nowak, ASCI Program Leader
Mark Seager, ASCI Terascale Systems Principal Investigator
Lawrence Livermore National Laboratory, University of California

*Work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.