Supercomputers(t)
Gordon Bell
Bay Area Research Center, Microsoft Corp.
http://research.microsoft.com/users/gbell
Photos courtesy of The Computer Museum History Center
Please only copy with credit!
http://www.computerhistory.org

Supercomputer

• Largest computer at a given time
• Technical use: science and engineering calculations
• Large government defense, weather, and aero laboratories are the first buyers
• Price is no object
• Market size is 3-5 machines

Growth in Computational Resources Used for UK Weather Forecasting

[Chart: operations per second on a log scale from 10 to 10T, versus years 1950-2000; machines plotted include Leo, Mercury, KDF9, 195, 205, and YMP. The trend: 10^10 in 50 years, i.e., 1.58^50.]
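A quick check of the implied rate (my arithmetic, not the slide's): a 10^10 increase over 50 years corresponds to an annual growth factor of

$$ g = 10^{10/50} = 10^{0.2} \approx 1.585, $$

i.e., about 58% per year, matching the 1.58^50 noted on the chart.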

What a difference 25 years and spending >10x more makes!

LLNL 150 Mflops machine room, c1978
Artist's view of 40 Tflops ESRDC, c2002
Harvard Mark I aka IBM ASCC

"I think there is a world market for maybe five computers."
— Thomas Watson Senior, Chairman of IBM, 1943

The scientific market is still about that size… 3 computers

• When scientific processing was 100% of the industry, it was a good predictor
• $3 billion market: 6 vendors, 7 architectures
• DOE buys 3 very big ($100-$200 M) machines every 3-4 years

Supercomputer price(t)

Time    $M      Structure               Example
1950    1       mainframes              many…
1960    3       instruction //sm        IBM / CDC mainframe SMP
1970    10      pipelining              7600 / Cray 1
1980    30      vectors; SCI            "Crays"
1990    250     MIMDs: mC, SMP, DSM     "Crays" / MPP
2000    1,000   ASCI, COTS MPP          Grid, Legion
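The price column implies its own growth rate (my arithmetic, not the slide's): a 1000x rise over 50 years,

$$ 10^{3/50} = 10^{0.06} \approx 1.148, $$

about 15% per year in the price of the top machine.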

Supercomputing: speed at any price, using parallelism

Intra-processor
• Memory overlap & instruction lookahead
• Functional parallelism (2-4)
• Pipelining (10)
• SIMD a la ILLIAC: a 2D array of 64 PEs, vs. vectors
• Wide instruction word (2-4)
• MTA (10-20)
MIMD… processor replication
• SMP (4-64)
• Distributed Shared Memory SMPs (100)
MIMD… computer replication
• Multicomputers aka MPP aka clusters (10K)
• Grid: 100K
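The slides don't state it, but the standard limit on every one of these degrees of parallelism is Amdahl's law: with parallel fraction p and n processors,

$$ \mathrm{Speedup}(n) = \frac{1}{(1-p) + p/n}, $$

so even at the Grid scale of 100K processors, a code that is 99.9% parallel gains only about 1000x.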

High performance architectures timeline

1950 . 1960 . 1970 . 1980 . 1990 . 2000
Technology: vacuum tubes, transistors, MSI (mini), micro, RISC, nMicro
Processor: overlap, lookahead … "killer micros"
Cray era: 6600, 7600, Cray 1, X, Y, C, T; Vector---SMP--->
SMP: mainframes---> "multis"--->
DSM: KSR, SGI--->
Clusters: Tandem, VAX, IBM, UNIX--->
MPP (n>1000): Ncube, Intel, IBM--->
Networks (n>10,000): NOW, Grid

High performance architectures timeline

1950 . 1960 . 1970 . 1980 . 1990 . 2000
Technology: vacuum tubes, transistors, MSI (mini), micro, RISC, nMicro
Sequential programming---> (throughout)
SIMD, vector---> parallelization--->
Parallel programming (1980s on)
Multicomputers: the MPP era (1990s)
Ultracomputers: 10x in price, 10x MPP
"In situ" resources: 100x in //sm, NOW, VLC
Grid

Time line of HPCC contributions, 1955-2010

Processors
• IBM: interleaving, overlap, instruction lookahead
• CDC/Cray supers: 6600, 7600, vector
• DEC: mini … Alpha
• Intel: 8008, 8086/88, 286, 386, 486, Ppro, P2/3, Merced
• RISC and "the killer micros"
• VLIW: Cydrome & Multiflow (defunct)
• SIMD: Illiac IV, CM1, CM2, Maspar (defunct)
• Multi-threaded architecture: Denelcor(?), Tera MTA (?)
Multiprocessors
• SMP cabinet mainframes: Burroughs, Univac, IBM, etc.
• SMP "multis": Sequent, Encore, etc.
• SMP on a chip
• SMPv: Cray (XMP, YMP, C, T), NEC, Fujitsu, Hitachi (?)
• Distributed Shared Memory: KSR, Origin NUMA (?)
• Shared-address multicomputers: BBN, T3D, T3E
Multicomputers aka clusters aka MPP
• Clusters of minis or mainframes: Tandem, VAX Cluster, Sysplex, UNIX
• MPPs: Intel, Thinking Machines, IBM; CalTech, Ncube, Beowulf
• Workstation clusters: UC/B NOW, etc. (?)
• NOW worldwide: Grid (?)

Time line of HPCC contributions, 1955-2010

Processors
• IBM: Stretch, 360, 370, G
• CDC: 1604, 6600, 7600, Cray 1
• DEC: PDP-8, PDP-11, VAX, Alpha
• Intel: 8008, 8086/88, 286, 386, 486, Ppro, P2/3, Merced
• RISC: MIPS / PowerPC / Sparc
• VLIW: Cydrome & Multiflow (defunct)
• SIMD: Illiac IV, CM1, CM2, Maspar
• Multi-threaded architecture: Denelcor, Tera MTA
Multiprocessors
• SMP: B5000, Univac, etc.; "multis" = Sequent, Encore, etc.; IBM 8090…; SUN 10K
• SMPv: Cray XMP, YMP, C, T; NEC SX 1…5
• DSM: SUN, SUN NUMA; SGI/Cray: KSR, Origin NUMA; T3D, T3E
Multicomputers aka clusters aka MPP
• Clusters: Tandem, VAX Cluster, Sysplex, UNIX
• Multicomputers: CalTech, Ncube, Beowulf
• Intel MPPs: iPSC 1, 2, Paragon, Delta, 1 Tf, 2 Tf
• Thinking Machines: CM1, 2, 5
• IBM MPP: SP1, SP2
• NOW: UC/B NOW, Grid
Lehmer UC/Berkeley pre-computer number sieves

ENIAC c1946
Manchester: the first computer. Baby, Mark I, and Atlas

von Neumann computers: the Rand JOHNNIAC
Gene Amdahl’s Dissertation and first computer
IBM
IBM Stretch c1961 & 360/91 c1965
consoles!
IBM Terabit Photodigital Store c1967
STC Terabytes of storage c1999
Amdahl aka Fujitsu version of the 360 c1975
IBM ASCI Blue Pacific @ LLNL
CDC, ETA, Cray Research, Cray Computer

Seymour Cray, 1925-1996

Circuits and Packaging, Plumbing (bits and atoms) & Parallelism… plus Programming and Problems

• Packaging, including heat removal
• High-level bit plumbing… getting the bits from I/O, into memory, through a processor, and back to memory and I/O
• Parallelism
• Programming: O/S and compiler
• Problems being solved

Seymour Cray computers

• 1951: ERA 1103 control circuits
• 1957: Sperry Rand NTDS; to CDC
• 1959: Little Character, to test transistor circuits
• 1960: CDC 1604 (3600, 3800) & 160/160A
• 1964: CDC 6600 (6xxx series)
• 1969: CDC 7600

Cray Research, Cray Computer Corp., and SRC Computer Corp.

• 1976: Cray 1… (1/M, 1/S, XMP, YMP, C90, T90)
• 1985: Cray 2; Cray Computer spun off from Cray Research for the GaAs Cray 3 (1993) and Cray 4
• 1999: SRC Company: large-scale, shared-memory multiprocessor using x86 microprocessors

Cray contributions…

• Creative and productive during his entire career, 1951-1996
• Creator and undisputed designer of supers, from the c1960 1604 to the c1977 Cray 1, 1S, 1M… the basis for SMPvector: XMP, YMP, T90, C90, 2, 3
• Circuits, packaging, and cooling
• "The mini" as a peripheral computer: use I/O computers, or use the main processor and interrupt it for I/O, versus I/O processors aka IBM Channels

Cray contributions

• Multi-threaded processor (6600 PPUs)
• CDC 6600 functional parallelism leading to RISC… software control
• Pipelining in the 7600 leading to…
• Use of vector registers: adopted by 10+ companies; mainstream for technical computing
• Established the template for vector supercomputer architecture
• SRC Company's use of x86 micros in 1996 that could lead to the largest SMP?

[Chart: "Cray" clock speed (MHz), number of processors, and peak power (Mflops), 1960-2000, on a log scale from 1.E-01 to 1.E+06.]
CDC 1604 & 6600
CDC 7600: pipelining

CDC 8600 prototype: SMP, scalar, discrete circuits; failed to achieve clock speed
CDC STAR… ETA10

CDC 7600 & Cray 1 at Livermore
[Photo labels: Cray 1, CDC 7600, disks]

Cray 1 #6 from LLNL. Located at The Computer Museum History Center, Moffett Field

Cray 1: 150 kW MG set & heat exchanger

Cray XMP/4 proc. c1984
Cray 2 from NERSC/LBL

Cray 3 c1995: 500 MHz processor, 32 modules, 1K GaAs ICs/module, 8 processors

c1970: Beginning the search for parallelism

SIMDs: Illiac IV, CDC STAR, Cray 1

Illiac IV: the first SIMD, c1970s

SCI (Strategic Computing Initiative)

• Funded by DARPA and aimed at a teraflops!
• An era of state computers and many efforts to build high-speed computers… led to HPCC
• Thinking Machines, Intel supers, Cray T3 series

Minisupercomputers: a market whose time never came. Alliant, Convex, Ardent + Stellar = Stardent = 0.

Cydrome and Multiflow: prelude to the wide-word parallelism in Merced

• Minisupers with VLIW attack the market
• Like the minisupers, they are repelled
• It's software, software, and software
• Was it a basically good idea that will now work as Merced?

MasPar…

• A less costly CM1/2, done in silicon chips
• It too is repelled; software is the fatal flaw

Thinking Machines
Thinking Machines: CM1 & CM5 c1983-1993

"In Dec. 1995 computers with 1,000 processors will do most of the scientific processing."
— Danny Hillis, 1990 (1 paper or 1 company)

The Bell-Hillis bet: massive parallelism in 1995

[Diagram: TMC vs. worldwide supers, compared on applications, revenue, and petaflops/mo.]

Bell-Hillis bet: wasn't paid off!

• My goal was not necessarily just to win the bet!
• Hennessy and Patterson were to evaluate what was really happening…
• Wanted to understand the degree of MPP progress and programmability
KSR 1: first commercial DSM NUMA (non-uniform memory access) aka COMA (cache-only memory architecture)

SCI (c1980s): the Strategic Computing Initiative funded…

ATT/Columbia (Non-Von), BBN Labs, Bell Labs/Columbia (DADO), CMU Warp (GE & Honeywell), CMU (Production Systems), Encore, ESL, GE (like Connection Machine), Georgia Tech, Hughes (dataflow), IBM (RP3), MIT/Harris, MIT/Motorola (Dataflow), MIT Lincoln Labs, Princeton (MMMP), Schlumberger (FAIM-1), SDC/Burroughs, SRI (Eazyflow), University of Texas, Thinking Machines (Connection Machine).

Those who gave their lives in the search for parallelism
Alliant, American Supercomputer, Ametek, AMT, Astronautics, BBN Supercomputer, Biin, CDC, Chen Systems, CHOPP, Cogent, Convex (now HP), Culler, Cray Computers, Cydrome, Dennelcor, Elexsi, ETA, E & S Supercomputers, Flexible, Floating Point Systems, Gould/SEL, IPM, Key, KSR, MasPar, Multiflow, Myrias, Ncube, Pixar, Prisma, SAXPY, SCS, SDSA, Supertek (now Cray), Suprenum, Stardent (Ardent+Stellar), Supercomputer Systems Inc., Synapse, Thinking Machines, Vitec, Vitesse, Wavetracer.
NCSA Cluster of 8 x 128 processors SGI Origin c1999

Humble beginning: in 1981… would you have predicted this would be the basis of supers?

Intel's iPSC 1 & Touchstone Delta
Intel Sandia Cluster 9K PII: 1.8 TF
GB with NT, Compaq, HP cluster

The Alliance LES NT Supercluster

• 192 HP 300 MHz + 64 Compaq 333 MHz nodes
• Andrew Chien, CS UIUC-->UCSD; Rob Pennington, NCSA
• Myrinet network, HPVM, Fast Messages
• Microsoft NT OS, MPI API

"Supercomputer performance at mail-order prices" -- Jim Gray, Microsoft

Our Tax Dollars At Work: ASCI for Stockpile Stewardship

• Intel/Sandia: 9000 x 1 node Ppro
• LLNL/IBM: 512 x 8 PowerPC (SP2)
• LANL/Cray: 6144 CPUs
• Maui Supercomputer Center: 512 x 1 SP2

ASCI Blue Mountain: 3.1 Tflops SGI Origin 2000

• 12,000 sq. ft. of floor space
• 1.6 MWatts of power
• 530 tons of cooling
• 384 cabinets to house 6144 CPUs with 1536 GB of memory (32 GB / 128 CPUs)
• 48 cabinets for metarouters
• 96 cabinets for 76 TB of RAID disks
• Cluster interconnect: 36 x HIPPI-800 switches
• 9 cabinets for 36 HIPPI switches
• About 348 miles of fiber cable
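The memory figure is internally consistent (my arithmetic, not the slide's): 6144 CPUs in groups of 128, at 32 GB per group:

$$ \frac{6144}{128} \times 32\ \text{GB} = 48 \times 32\ \text{GB} = 1536\ \text{GB}. $$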
Half of SGI ASCI Computer at LASL c1999

LASL ASCI Cluster Interconnect

[Diagram: 6 groups of 8 computers each, joined by 18 16x16 crossbar switches forming 18 separate networks.]
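Those counts tie back to Blue Mountain's totals (my arithmetic): 6 groups x 8 machines = 48 Origin 2000s, and at 128 CPUs per machine,

$$ 48 \times 128 = 6144\ \text{CPUs}. $$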
LASL ASCI Cluster Interconnect

Typical MCNP BNCT simulation:
• 1 cm resolution (21x21x25)
• 1 million particles
• 1 hour on a 200 MHz PC

ASCI Blue Mountain MCNP simulation:
• 1 mm resolution (256x256x250)
• 100 million particles
• 2 hours on 6144 CPUs

3 TeraOps makes a difference!
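A back-of-envelope comparison of the two runs above; a minimal sketch assuming (my assumption, not the slide's) that run time scales mainly with particle count, while the finer tally mainly costs memory:

```python
# Rough comparison of the two MCNP runs on the slide.

pc_particles, pc_hours = 1e6, 1.0          # 200 MHz PC run
asci_particles, asci_hours = 100e6, 2.0    # 6144-CPU Blue Mountain run

pc_rate = pc_particles / pc_hours          # particles per hour, PC
asci_rate = asci_particles / asci_hours    # particles per hour, ASCI

print(f"particle throughput: {asci_rate / pc_rate:.0f}x")   # -> 50x

# And the tally mesh is ~1,500x finer:
pc_cells = 21 * 21 * 25                    # 1 cm resolution
asci_cells = 256 * 256 * 250               # 1 mm resolution
print(f"tally cells: {asci_cells / pc_cells:.0f}x")         # -> ~1487x
```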

LLNL Architecture

[Diagram: three SP sectors (S, Y, K) joined by HPGN links (24 per sector), with HiPPI and FDDI attachments.]

Each SP sector has:
• 488 Silver nodes
• 24 HPGN links

System parameters:
• 3.89 TFLOP/s peak
• 2.6 TB memory
• 62.5 TB global disk

Per-sector memory and disk:
• One sector: 2.5 GB/node memory, 24.5 TB global disk, 8.3 TB local disk
• The other two: 1.5 GB/node memory, 20.5 TB global disk, 4.4 TB local disk

SST achieved >1.2 TFLOP/s on sPPM, on a problem >70x larger than ever solved before!
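The peak number can be reconstructed, though the slide gives no per-node detail; a minimal sketch assuming (my assumption) each Silver node is a 4-way 332 MHz PowerPC 604e doing 2 flops per clock:

```python
# Sketch: where the 3.89 TFLOP/s peak could come from.

sectors = 3                     # sectors S, Y, K (from the slide)
nodes_per_sector = 488          # from the slide
cpus_per_node = 4               # assumed Silver node configuration
clock_hz = 332e6                # assumed
flops_per_clock = 2             # assumed

peak = sectors * nodes_per_sector * cpus_per_node * clock_hz * flops_per_clock
print(f"peak: {peak / 1e12:.2f} TFLOP/s")   # -> 3.89 TFLOP/s
```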

I/O Hardware Architecture

[Diagram: a 488-node IBM SP sector (432 Silver compute nodes + 56 GPFS servers), system data and control networks, and 24 SP links to a second-level switch.]

Each SST sector:
• Local and global I/O file systems
• 2.2 GB/s global I/O performance
• 3.66 GB/s local I/O performance
• Separate SP first-level switches
• Independent command and control

Full system mode:
• Application launch over the full 1,464 Silver nodes
• 1,048 MPI/us tasks, 2,048 MPI/IP tasks
• High-speed, low-latency communication
• Single STDIO interface

Fujitsu VPP5000 multicomputer (not available in the U.S.)

Computing nodes:
• Speed: 9.6 Gflops vector, 1.2 Gflops scalar
• Primary memory: 4-16 GB
• Memory bandwidth: 76 GB/s (9.6 x 64 Gb/s)
• Inter-processor communication: 1.6 GB/s, non-blocking, with global addressing among all nodes
• I/O: 3 GB/s to SCSI, HIPPI, Gigabit Ethernet, etc.
• 1-128 computers deliver up to 1.22 Tflops
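The peak is just the maximum node count times the per-node vector speed:

$$ 128 \times 9.6\ \text{Gflops} = 1228.8\ \text{Gflops} \approx 1.22\ \text{Tflops}. $$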

NEC SX 5: clustered SMPv (not available in the U.S.)

SMPv computing nodes:
• 4-8 processors per computer
• Processor peak: 8 Gflops
• Memory
• I/O speed

Cluster
NEC Supers

High-performance COTS

Raceway (and RACE++) busses
• ANSI standardized
• Mapped memory, message passing, 'planned direct' transfers
• Circuit switched; the basic bus interface unit is a 6 (8) port bidirectional switch at 40 MB/s (66 MB/s) per port
• Scales to 4000 processors

Skychannel
• ANSI standardized
• 320 MB/s; a crossbar backplane supports up to 1.6 GB/s non-blocking throughput
• Heart of an Air Force $3M / 256 Gflops system

Mercury & Sky Computers - & $

• Rugged system with 10 modules, ~$100K; ~$1K per pound
• Scalable to several K processors; ~1-10 Gflops/ft3
• 10 9U boards x 4 PPC 750s: 440 SPECfp95 in 1 ft3 (18.5 x 8 x 10.75")
• Sky 384-signal-processor system: #20 on the 'Top 500', $3M

[Photos: Mercury VME Platinum System; Sky PPC daughtercard]

Brookhaven/Columbia QCD c1999 (1999 Bell Prize for performance/$)
Brookhaven/Columbia QCD board
HT-MT: What’s 0.55? c1999

HT-MT…

• Mechanical: cooling and signals
• Chips: design tools, fabrication
• Chips: memory, PIM
• Architecture: MTA on steroids
• Storage material

HTMT challenges the heuristics for a successful computer

• Mead 11-year rule: the time between lab appearance and commercial use
• Requires >2 breakthroughs
• The team's first computer or super
• It's government funded… albeit at a university