the cray-1 computer system richard russell communications of the acm january 1978
The CRAY-1 Computer System
Richard Russell
Communications of the ACM, January 1978
“The world’s most expensive love-seat”
A “reasonably trim individual” can gain access to the interior of the machine.
12.5 ns clock
8 MB internal semiconductor memory
4 KB of register storage
Uses ECL throughout
115 kW input power
Simple gates
Memory
  16 banks = 16-way interleaved access
  No bank conflicts except on stride lengths of 8 or 16
  4 clock cycles per access
  Can pull down 16 instructions per cycle
    1 data word if being placed in registers
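
The stride claim on the Memory slide can be checked with a small simulation. This is a minimal sketch, not a model of the real memory controller: it assumes one access is issued per clock cycle, that the bank number is the word address modulo 16, and that a bank stays busy for the 4 cycles quoted above. The function name `stall_cycles` is hypothetical.

```python
NUM_BANKS = 16        # from the slide: 16-way interleaved memory
BANK_BUSY_CYCLES = 4  # from the slide: 4 clock cycles per access


def stall_cycles(stride, num_accesses=64):
    """Count stall cycles when issuing one access per cycle at a fixed stride."""
    free_at = [0] * NUM_BANKS  # first cycle at which each bank is free again
    cycle = 0
    stalls = 0
    for i in range(num_accesses):
        # Assumption: the bank is selected by the word address modulo 16.
        bank = (i * stride) % NUM_BANKS
        if free_at[bank] > cycle:
            # The bank is still busy, so wait until it becomes free.
            stalls += free_at[bank] - cycle
            cycle = free_at[bank]
        free_at[bank] = cycle + BANK_BUSY_CYCLES
        cycle += 1
    return stalls


# Strides from 1 to 16 that cause at least one stall under these assumptions.
conflicting = [s for s in range(1, NUM_BANKS + 1) if stall_cycles(s) > 0]
print(conflicting)  # [8, 16]
```

Under this model a stride of 8 alternates between two banks and a stride of 16 hits the same bank every time, so both revisit a bank before its 4 busy cycles have passed. Every other stride up to 16 returns to a bank no sooner than 4 accesses later.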
Cooling
  Big power + many modules = heat
  Aluminum/steel cooling rods with Freon flow
  Copper connectors pipe heat from chip out to cooling rods
  Freon/oil leak problem on rod construction
  Designed to keep module temperatures under 54 degrees Celsius
Floating Point
  IEEE? No.
    Why? Not written yet! Wouldn’t arrive until 7 years later.
  49-bit signed-magnitude “mantissa”
  15-bit biased exponent
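
To make the field widths concrete, here is a minimal decoding sketch. It reads the 49-bit signed-magnitude “mantissa” as 1 sign bit plus a 48-bit coefficient, which together with the 15-bit exponent fills a 64-bit word. The bias value (octal 40000) and the placement of the binary point to the left of the coefficient are assumptions not stated on the slide, and the helper names `encode_fields` and `decode` are hypothetical.

```python
EXP_BITS = 15       # from the slide: 15-bit biased exponent
COEF_BITS = 48      # 49-bit signed magnitude = 1 sign bit + 48-bit coefficient
EXP_BIAS = 0o40000  # assumption: bias of 2**14, not stated on the slide


def encode_fields(sign, exponent, coefficient):
    """Pack the three fields into one 64-bit word."""
    return (sign << 63) | (exponent << COEF_BITS) | coefficient


def decode(word):
    """Convert a 64-bit word in the assumed layout to a Python float."""
    sign = (word >> 63) & 1
    exponent = (word >> COEF_BITS) & ((1 << EXP_BITS) - 1)
    coefficient = word & ((1 << COEF_BITS) - 1)
    # Assumption: the coefficient is a fraction with the binary point on its left.
    fraction = coefficient / (1 << COEF_BITS)
    value = fraction * 2.0 ** (exponent - EXP_BIAS)
    return -value if sign else value


one = encode_fields(0, EXP_BIAS + 1, 1 << 47)          # 0.5 * 2**1
minus_three = encode_fields(1, EXP_BIAS + 2, 3 << 46)  # -(0.75 * 2**2)
print(decode(one), decode(minus_three))  # 1.0 -3.0
```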
Production plans anticipate shipping one CRAY-1 per quarter.
Topic: Vector Computers
  8 vector registers, each holding 64 elements of 64 bits (8 64x64)
  Process vector elements identically
  Vector Mask register can protect an element
  “Chaining”
    Can use output of one vector operation as input to next before it is done
    Win = don’t have to store to memory then fetch from memory
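
The two ideas on this slide, chaining and masking, can be illustrated in software. This is only an analogy under stated assumptions: Python generators stand in for functional units, so each product is handed to the adder as soon as it is produced and no intermediate vector is stored. The names `vector_multiply`, `vector_add`, and `masked_merge` are hypothetical.

```python
def vector_multiply(a, b):
    """Yield element-wise products one at a time, like a pipelined unit."""
    for x, y in zip(a, b):
        yield x * y


def vector_add(a, b):
    """Yield element-wise sums, consuming operands as they become available."""
    for x, y in zip(a, b):
        yield x + y


def masked_merge(mask, when_set, when_clear):
    """Pick when_set[i] where mask[i] is 1, otherwise keep when_clear[i]."""
    return [s if m else c for m, s, c in zip(mask, when_set, when_clear)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = [5, 5, 5, 5]

# "Chained": the adder pulls each product directly from the multiplier.
chained = list(vector_add(vector_multiply(a, b), c))
print(chained)  # [15, 45, 95, 165]

# The mask protects elements 1 and 3, which keep their old value from c.
mask = [1, 0, 1, 0]
print(masked_merge(mask, chained, c))  # [15, 5, 95, 5]
```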
Benefits of Vector Computing
  Previously needed 100+ elements for vector to be useful over scalar
    CRAY-1 cuts that to 2-4
  Don’t need to store vector elements next to each other in memory
  Max wait time is previous vector length + 4
  Common wait time is functional unit time + 2
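
The point about non-adjacent elements amounts to loading a vector with a fixed stride. A minimal sketch, assuming memory is a flat list of words and using a hypothetical helper `strided_load`: a column of a row-major matrix is fetched as one vector even though its elements are `COLS` words apart.

```python
def strided_load(memory, base, stride, length):
    """Gather `length` words starting at `base`, `stride` words apart."""
    return [memory[base + i * stride] for i in range(length)]


ROWS, COLS = 3, 4
# A 3x4 matrix stored row by row: element (r, c) lives at r * COLS + c.
memory = list(range(ROWS * COLS))

# Column 1 of the matrix: consecutive elements are COLS words apart.
column_1 = strided_load(memory, base=1, stride=COLS, length=ROWS)
print(column_1)  # [1, 5, 9]
```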
Vector Benefits Continued
Compiler
  CFT (the CRAY FORTRAN compiler)
    Automatically vectorizes inner loop if possible
    No need to rewrite code!
    Can’t vectorize loops with control statements.
  Often slower than hand coded assembly.
    Improve instruction scheduling “in the future”
Questions
  The CRAY-1 automatically vectorizes code loops. Current microprocessors usually use smaller vector registers with extensions such as SSE to support SIMD operations. Do modern compilers do these vector optimizations automatically as the CRAY did or is it the explicit use of vector instructions that has dominated and why? Trade offs?
  They say they can eventually make loops with control flow in them vectorizable. Can you come up with a simple method to do so and/or some reasons that make this case difficult?
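
One possible answer to the second question is if-conversion: compute both branches for every element, then use a mask to select which result to keep. The sketch below is an illustration of that general technique, not a description of what CFT does; the function names are hypothetical and the loop body is an arbitrary example.

```python
def scalar_loop(x):
    """Original loop with a control statement in the body."""
    out = []
    for v in x:
        if v < 0:
            out.append(-v)
        else:
            out.append(2 * v)
    return out


def if_converted(x):
    """Same result with no branch in the body: compute both sides, then select."""
    mask = [v < 0 for v in x]         # one mask bit per element
    then_values = [-v for v in x]     # "then" branch for every element
    else_values = [2 * v for v in x]  # "else" branch for every element
    return [t if m else e for m, t, e in zip(mask, then_values, else_values)]


data = [3, -1, 0, -7]
print(scalar_loop(data))   # [6, 1, 0, 7]
print(if_converted(data))  # [6, 1, 0, 7]
```

The sketch also hints at why the case is difficult. Both branches run for every element, so work is wasted, and a branch that was never meant to execute for some element (for example a division guarded by a zero check) may fault or have side effects unless it is suppressed for the masked-off elements.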
Table 3
Registers
  A = 8 address registers
  B = 64 address-save registers
  S = 8 scalar registers
  T = 64 scalar-save registers
  V = 8 64x64 vector registers
  Special Registers
    VM = mask off vector elements to not operate on
    VL = length of vector being processed
    P = parcel address count
    BA = absolute address used as base for indexed memory accesses (helps with dynamic user space migration)
    LA = limits the accessible address space
    XA = supports exchange operation
    F = flag register that holds various “condition codes”
    M = mode register (3 bits)
      Bit 1 = Floating Point Error/Interrupt Enable
      Bit 2 = Uncorrectable memory corruption Interrupt Enable
      Bit 3 = All interrupts disabled.
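
The roles of BA and LA can be sketched as base-and-limit address translation. This is a minimal sketch under assumptions the slide does not spell out: LA is treated as an exclusive upper bound on the absolute address, and the names `translate` and `AddressRangeError` are hypothetical.

```python
class AddressRangeError(Exception):
    """Raised when a program address falls outside the permitted range."""


def translate(program_address, ba, la):
    """Map a program-relative address to an absolute one using BA and LA."""
    absolute = ba + program_address
    # Assumption: LA is an exclusive upper bound on the absolute address.
    if program_address < 0 or absolute >= la:
        raise AddressRangeError(f"address {program_address} is out of range")
    return absolute


# The same program address works before and after the job is moved,
# because only BA and LA change.
print(translate(5, ba=1000, la=2000))  # 1005
print(translate(5, ba=3000, la=4000))  # 3005
```

Because the program only ever sees addresses relative to BA, the system can move a job to a different region of memory by copying it and updating the two registers, which matches the slide's note about dynamic user space migration.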
Front End
  Needs an access terminal minicomputer
  Connects to a “CRAY access channel” to control the computer