![Page 1: CS 152 Computer Architecture and Engineering CS252 ...gamescrafters.berkeley.edu/~cs152/sp19/lectures/L13-VLIW.pdf · §Architecture requires guarantee of: – Parallelism within](https://reader034.vdocuments.net/reader034/viewer/2022042215/5ebb678c1c366d5f7a0a186b/html5/thumbnails/1.jpg)
CS 152 Computer Architecture and Engineering
CS252 Graduate Computer Architecture
Lecture 13 – VLIW
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California at Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.eecs.berkeley.edu/~cs152
Last Time in Lecture 12
§ Branch prediction
– temporal, history of a single branch
– spatial, based on path through multiple branches
§ Branch History Table (BHT) vs. Branch Target Buffer (BTB)
– tradeoff in capacity versus latency
§ Return-Address Stack (RAS)
– specialized structure to predict subroutine return addresses
§ Fetching more than one basic block per cycle
– predicting multiple branches
– trace cache
Superscalar Control Logic Scaling
§ Each issued instruction must somehow check against W*L instructions, i.e., growth in hardware ∝ W*(W*L)
§ For in-order machines, L is related to pipeline latencies and check is done during issue (interlocks or scoreboard)
§ For out-of-order machines, L also includes time spent in instruction buffers (instruction window or ROB), and check is done by broadcasting tags to waiting instructions at write back (completion)
§ As W increases, larger instruction window is needed to find enough parallelism to keep machine busy => greater L
=> Out-of-order control logic grows faster than W² (~W³)
[Figure: an issue group of issue width W is checked against previously issued instructions over lifetime L]
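As a rough illustration of the scaling argument, the sketch below counts dependency checks as W*(W*L). The function names and the assumption that L grows linearly with W (the factor `k`) are illustrative choices, not something stated in the lecture.

```python
def dependency_checks(width: int, lifetime: int) -> int:
    """Checks per cycle: each of W issuing instructions against W*L in flight."""
    return width * (width * lifetime)


def checks_with_growing_window(width: int, k: int = 4) -> int:
    """Hypothetical model: lifetime L scales as k*W, so the total is cubic in W."""
    return dependency_checks(width, k * width)


for w in (1, 2, 4, 8):
    print(w, dependency_checks(w, 8), checks_with_growing_window(w))
```

With a fixed L the count grows as W²; once L is allowed to grow with W, doubling W multiplies the count by eight.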
Out-of-Order Control Complexity: MIPS R10000
[Figure: MIPS R10000 with the control logic area marked. SGI/MIPS Technologies Inc., 1995]
Sequential ISA Bottleneck
[Figure: flow from source code to execution]
Sequential source code: a = foo(b); for (i=0, i<
-> Superscalar compiler: find independent operations, schedule operations
-> Sequential machine code
-> Superscalar processor: check instruction dependencies, schedule execution
VLIW: Very Long Instruction Word
§ Multiple operations packed into one instruction
§ Each operation slot is for a fixed function
§ Constant operation latencies are specified
§ Architecture requires guarantee of:
– Parallelism within an instruction => no cross-operation RAW check
– No data use before data ready => no data interlocks
Instruction format: Int Op 1 | Int Op 2 | Mem Op 1 | Mem Op 2 | FP Op 1 | FP Op 2
– Two Integer Units, Single-Cycle Latency
– Two Load/Store Units, Three-Cycle Latency
– Two Floating-Point Units, Four-Cycle Latency
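The first guarantee can be pictured as a check the compiler performs before packing operations into one instruction. This is a minimal sketch with a made-up operation encoding, `(dest, sources)` tuples listed in original program order; it is not any real toolchain's representation.

```python
def has_intra_instruction_raw(ops):
    """ops: list of (dest, sources) in original program order.

    Returns True if a later operation reads a register written by an
    earlier operation that is packed into the same VLIW instruction.
    """
    for i, (dest, _) in enumerate(ops):
        if dest is None:
            continue
        for _, sources in ops[i + 1:]:
            if dest in sources:
                return True
    return False


# fld f1, 0(x1) with add x1, 8: the add only overwrites x1 after the load read it.
print(has_intra_instruction_raw([("f1", ["x1"]), ("x1", ["x1"])]))        # False
# fld f1, 0(x1) with fadd f2, f0, f1: the fadd needs the loaded value.
print(has_intra_instruction_raw([("f1", ["x1"]), ("f2", ["f0", "f1"])]))  # True
```

The sketch assumes all operations in an instruction read their sources before any of them writes a result, which is why the load and the pointer increment can share an instruction.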
Early VLIW Machines
§ FPS AP120B (1976)
– scientific attached array processor
– first commercial wide instruction machine
– hand-coded vector math libraries using software pipelining and loop unrolling
§ Multiflow Trace (1987)
– commercialization of ideas from Fisher’s Yale group including “trace scheduling”
– available in configurations with 7, 14, or 28 operations/instruction
– 28 operations packed into a 1024-bit instruction word
§ Cydrome Cydra-5 (1987)
– 7 operations encoded in 256-bit instruction word
– rotating register file
VLIW Compiler Responsibilities
§ Schedule operations to maximize parallel execution
§ Guarantee intra-instruction parallelism
§ Schedule to avoid data hazards (no interlocks)
– Typically separates operations with explicit NOPs
Loop Execution
    for (i=0; i<N; i++)
        B[i] = A[i] + C;

Compile:
    loop: fld f1, 0(x1)
          add x1, 8
          fadd f2, f0, f1
          fsd f2, 0(x2)
          add x2, 8
          bne x1, x3, loop

Schedule:
    Cycle  Int1    Int2  M1   M2  FP+   FPx
    1      add x1        fld
    2
    3
    4                             fadd
    5
    6
    7
    8      add x2  bne   fsd

How many FP ops/cycle?
1 fadd / 8 cycles = 0.125
Loop Unrolling
    for (i=0; i<N; i++)
        B[i] = A[i] + C;

    for (i=0; i<N; i+=4)
    {
        B[i] = A[i] + C;
        B[i+1] = A[i+1] + C;
        B[i+2] = A[i+2] + C;
        B[i+3] = A[i+3] + C;
    }
Unroll inner loop to perform 4 iterations at once
Need to handle values of N that are not multiples of unrolling factor with final cleanup loop
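A minimal sketch of the cleanup loop, written in Python for brevity. The function name and the list-based arrays are illustrative; the unrolling factor of 4 matches the slide.

```python
def add_scalar_unrolled(a, c):
    """Compute b[i] = a[i] + c with a 4-way unrolled main loop plus cleanup."""
    n = len(a)
    b = [0.0] * n
    main_end = n - (n % 4)          # largest multiple of 4 that fits in n
    for i in range(0, main_end, 4):
        b[i] = a[i] + c
        b[i + 1] = a[i + 1] + c
        b[i + 2] = a[i + 2] + c
        b[i + 3] = a[i + 3] + c
    for i in range(main_end, n):    # cleanup: at most 3 leftover iterations
        b[i] = a[i] + c
    return b


print(add_scalar_unrolled([1, 2, 3, 4, 5, 6], 10))
```

The cleanup loop runs zero times when N is a multiple of 4, so the common case pays only for the extra bounds computation.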
Scheduling Loop Unrolled Code
Unroll 4 ways:
    loop: fld f1, 0(x1)
          fld f2, 8(x1)
          fld f3, 16(x1)
          fld f4, 24(x1)
          add x1, 32
          fadd f5, f0, f1
          fadd f6, f0, f2
          fadd f7, f0, f3
          fadd f8, f0, f4
          fsd f5, 0(x2)
          fsd f6, 8(x2)
          fsd f7, 16(x2)
          fsd f8, 24(x2)
          add x2, 32
          bne x1, x3, loop

Schedule:
    Cycle  Int1    Int2  M1      M2  FP+      FPx
    1                    fld f1
    2                    fld f2
    3                    fld f3
    4      add x1        fld f4      fadd f5
    5                                fadd f6
    6                                fadd f7
    7                                fadd f8
    8                    fsd f5
    9                    fsd f6
    10                   fsd f7
    11     add x2  bne   fsd f8

How many FLOPS/cycle?
4 fadds / 11 cycles = 0.36
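The cycle counts for the plain and unrolled loops can be reproduced with a small greedy scheduler. This is a sketch under stated assumptions: it models only the floating-point data chain (load, add, store), assumes the pointer updates and the branch fit into free integer slots, and uses the slide's latencies (3-cycle loads, 4-cycle fadd) with two memory slots and one FP adder. All names are illustrative, and the slot placement it finds may differ from the slide's table.

```python
LATENCY = {"mem": 3, "fadd": 4}   # cycles until a result can be used
UNITS = {"mem": 2, "fadd": 1}     # issue slots available per cycle


def schedule(ops):
    """ops: list of (name, unit, dest, sources) in program order.

    Returns {name: issue cycle}, with cycles numbered from 1.
    """
    ready = {}   # register -> first cycle its value may be read
    used = {}    # (cycle, unit) -> slots already taken
    issue = {}
    for name, unit, dest, sources in ops:
        cycle = max([ready.get(s, 1) for s in sources] + [1])
        while used.get((cycle, unit), 0) >= UNITS[unit]:
            cycle += 1                      # structural hazard: try next cycle
        used[(cycle, unit)] = used.get((cycle, unit), 0) + 1
        issue[name] = cycle
        if dest is not None:
            ready[dest] = cycle + LATENCY[unit]
    return issue


def make_ops(unroll):
    """Build the load/add/store chain for `unroll` copies of B[i] = A[i] + C."""
    ops = [(f"fld{k}", "mem", f"a{k}", []) for k in range(unroll)]
    ops += [(f"fadd{k}", "fadd", f"s{k}", ["f0", f"a{k}"]) for k in range(unroll)]
    ops += [(f"fsd{k}", "mem", None, [f"s{k}"]) for k in range(unroll)]
    return ops


def loop_cycles(unroll):
    return max(schedule(make_ops(unroll)).values())


for unroll in (1, 4):
    cycles = loop_cycles(unroll)
    print(unroll, cycles, round(unroll / cycles, 3))
```

Unrolling helps here because the four independent fadds overlap their latencies, while the single-iteration loop leaves the machine idle between the load, the add, and the store.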
Software Pipelining
Unroll 4 ways first:
    loop: fld f1, 0(x1)
          fld f2, 8(x1)
          fld f3, 16(x1)
          fld f4, 24(x1)
          add x1, 32
          fadd f5, f0, f1
          fadd f6, f0, f2
          fadd f7, f0, f3
          fadd f8, f0, f4
          fsd f5, 0(x2)
          fsd f6, 8(x2)
          fsd f7, 16(x2)
          add x2, 32
          fsd f8, -8(x2)
          bne x1, x3, loop

[Figure: schedule on Int1, Int2, M1, M2, FP+, FPx with successive unrolled iterations overlapped. Each iteration contributes fld f1 to f4, fadd f5 to f8, fsd f5 to f8, add x1, add x2, and bne. The schedule is divided into a prolog, the steady-state loop body (iterate), and an epilog.]

How many FLOPS/cycle?
4 fadds / 4 cycles = 1
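The structure of a software-pipelined loop can be sketched at the source level. The version below is a simplified illustration, not the slide's 4-way unrolled schedule: each pass of the loop stores element t-2, adds element t-1, and loads element t. The guards on `t` play the role of the prolog and epilog. Function and variable names are illustrative.

```python
def add_scalar_swp(a, c):
    """Compute b[i] = a[i] + c with load, add, and store overlapped across iterations."""
    n = len(a)
    b = [None] * n
    loaded = added = None
    for t in range(n + 2):
        if t >= 2:              # store stage works on element t-2
            b[t - 2] = added
        if 1 <= t <= n:         # add stage works on element t-1
            added = loaded + c
        if t < n:               # load stage works on element t
            loaded = a[t]
    return b


print(add_scalar_swp([1, 2, 3], 10))
```

During the first two passes only some stages are active (filling the pipeline) and during the last two only the later stages run (draining it); in between, every pass does one load, one add, and one store.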
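Software Pipelining vs. Loop Unrolling
[Figure: two plots of performance versus time. Loop Unrolled: startup overhead and wind-down overhead appear in every loop iteration. Software Pipelined: startup overhead appears once at the start and wind-down overhead once at the end of the loop.]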
Software pipelining pays startup/wind-down costs only once per loop, not once per iteration
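A back-of-the-envelope cycle model makes the difference concrete. It reuses the numbers from the earlier examples (an 11-cycle unrolled body, a new body started every 4 cycles when pipelined); the closed-form expressions and function names are assumptions made for illustration.

```python
import math


def unrolled_cycles(n, unroll=4, body=11):
    """Every unrolled iteration pays the full body latency."""
    return math.ceil(n / unroll) * body


def pipelined_cycles(n, unroll=4, interval=4, body=11):
    """A new body starts every `interval` cycles; fill and drain are paid once."""
    iterations = math.ceil(n / unroll)
    if iterations == 0:
        return 0
    return body + interval * (iterations - 1)


for n in (4, 40, 400):
    print(n, unrolled_cycles(n), pipelined_cycles(n))
```

Under this model the two approaches tie for a single iteration, and the pipelined loop approaches 4 cycles per 4 elements as N grows.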
CS152 Administrivia
§ Lab 2 extension, due Friday March 15
§ PS 3 due Monday March 18
§ Midterm grades will be released today
§ Regrade requests will be through Gradescope
– Window opens Friday, 3/15/19 at 4pm (after section)
– Window closes Friday, 3/22/19 at 12pm (before section)
CS152 Administrivia
Midterm 1 Grades: mean = 41.4, σ = 11.3
[Figure: histogram of Midterm 1 grades; x-axis ticks from 20.00 to 70.00, y-axis ticks from 0 to 12]
CS252
CS252 Administrivia
§ Readings next week on OoO superscalar microprocessors
What if there are no loops?
§ Branches limit basic block size in control-flow intensive irregular code
§ Difficult to find ILP in individual basic blocks
[Figure label: Basic block]
Trace Scheduling [Fisher, Ellis]
§ Pick string of basic blocks, a trace, that represents most frequent branch path
§ Use profiling feedback or compiler heuristics to find common branch paths
§ Schedule whole “trace” at once
§ Add fixup code to cope with branches jumping out of trace
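The trace-picking step can be sketched as a greedy walk over profiled edge counts. This is a simplified illustration, not the algorithm from the Fisher or Ellis work: the control-flow graph format, block names, and counts are all made up.

```python
def pick_trace(entry, edge_counts):
    """Follow the most frequently taken successor from `entry`.

    edge_counts: {block: {successor: profiled execution count}}.
    Stops at a block with no successors or when the path would revisit a block.
    """
    trace = []
    block = entry
    while block is not None and block not in trace:
        trace.append(block)
        successors = edge_counts.get(block, {})
        block = max(successors, key=successors.get) if successors else None
    return trace


# Hypothetical profile: the b0 -> b1 edge is taken 90% of the time.
profile = {
    "b0": {"b1": 90, "b2": 10},
    "b1": {"b3": 90},
    "b2": {"b3": 10},
    "b3": {},
}
print(pick_trace("b0", profile))
```

Here the trace would be b0, b1, b3; the rarely taken path through b2 is what the fixup code has to cover.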
Problems with “Classic” VLIW
§ Object-code compatibility
– have to recompile all code for every machine, even for two machines in same generation
§ Object code size
– instruction padding wastes instruction memory/cache
– loop unrolling/software pipelining replicates code
§ Scheduling variable latency memory operations
– caches and/or memory bank conflicts impose statically unpredictable variability
§ Knowing branch probabilities
– Profiling requires a significant extra step in build process
§ Scheduling for statically unpredictable branches
– optimal schedule varies with branch path
VLIW Instruction Encoding
§ Schemes to reduce effect of unused fields
– Compressed format in memory, expand on I-cache refill
  • used in Multiflow Trace
  • introduces instruction addressing challenge
– Mark parallel groups
  • used in TMS320C6x DSPs, Intel IA-64
– Provide a single-op VLIW instruction
  • Cydra-5 UniOp instructions
[Figure: instruction stream divided into parallel groups: Group 1, Group 2, Group 3]
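One way to picture the compressed-format scheme is a per-instruction bitmask that records which slots hold real operations, with NOPs dropped from memory and reinserted on refill. The encoding below is a hypothetical illustration, not the actual Multiflow format.

```python
NOP = None


def compress(instruction):
    """Return (mask, ops): bit i of mask is set when slot i holds a real operation."""
    mask = 0
    ops = []
    for slot, op in enumerate(instruction):
        if op is not NOP:
            mask |= 1 << slot
            ops.append(op)
    return mask, ops


def expand(mask, ops, width):
    """Rebuild the full-width instruction, reinserting NOPs in the empty slots."""
    remaining = iter(ops)
    return [next(remaining) if mask & (1 << slot) else NOP for slot in range(width)]


word = ["add x1", NOP, "fld f1", NOP, NOP, NOP]
mask, ops = compress(word)
print(bin(mask), ops)
print(expand(mask, ops, len(word)) == word)
```

Because compressed instructions have variable length, the address of an instruction in memory no longer matches its position in the expanded stream, which is the addressing challenge the slide mentions.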
Intel Itanium, EPIC IA-64
§ EPIC is the style of architecture (cf. CISC, RISC)
– Explicitly Parallel Instruction Computing (really just VLIW)
§ IA-64 is Intel’s chosen ISA (cf. x86, MIPS)
– IA-64 = Intel Architecture 64-bit
– An object-code-compatible VLIW
§ Merced was first Itanium implementation (cf. 8086)
– First customer shipment expected 1997 (actually 2001)
– McKinley, second implementation shipped in 2002
– Recent version, Poulson, eight cores, 32nm, announced 2011
Eight Core Itanium “Poulson” [Intel 2011]
§ 8 cores
§ 1-cycle 16KB L1 I&D caches
§ 9-cycle 512KB L2 I-cache
§ 8-cycle 256KB L2 D-cache
§ 32 MB shared L3 cache
§ 544mm² in 32nm CMOS
§ Over 3 billion transistors
§ Cores are 2-way multithreaded
§ 6 instruction/cycle fetch
– Two 128-bit bundles
§ Up to 12 insts/cycle execute
IA-64 Instruction Format
128-bit instruction bundle: Instruction 2 | Instruction 1 | Instruction 0 | Template
§ Template bits describe grouping of these instructions with others in adjacent bundles
§ Each group contains instructions that can execute in parallel
[Figure: a sequence of bundles j-1, j, j+1, j+2 overlaid with groups i-1, i, i+1, i+2]
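A sketch of packing and unpacking a bundle as a 128-bit integer. The field widths are not given on the slide; the code assumes the IA-64 layout of a 5-bit template in the low bits followed by three 41-bit instruction slots. Function names are illustrative and the instruction values are opaque integers.

```python
TEMPLATE_BITS = 5   # assumed width of the template field
SLOT_BITS = 41      # assumed width of each instruction slot (5 + 3 * 41 = 128)


def pack_bundle(template, slots):
    """Pack a template and three instruction slots into one 128-bit integer."""
    assert 0 <= template < (1 << TEMPLATE_BITS) and len(slots) == 3
    word = template
    for i, inst in enumerate(slots):
        assert 0 <= inst < (1 << SLOT_BITS)
        word |= inst << (TEMPLATE_BITS + i * SLOT_BITS)
    return word


def unpack_bundle(word):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = word & ((1 << TEMPLATE_BITS) - 1)
    slots = [
        (word >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        for i in range(3)
    ]
    return template, slots


bundle = pack_bundle(0x10, [1, 2, 3])
print(unpack_bundle(bundle))
```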
IA-64 Registers
§ 128 General Purpose 64-bit Integer Registers
§ 128 General Purpose 64/80-bit Floating Point Registers
§ 64 1-bit Predicate Registers
§ GPRs “rotate” to reduce code size for software pipelined loops
– Rotation is a simple form of register renaming allowing one instruction to address different physical registers on each iteration
CS252
Rotating Register Files
Problems: Scheduled loops require lots of registers, lots of duplicated code in prolog, epilog
Solution: Allocate new set of registers for each loop iteration
CS252
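Rotating Register File
[Figure: physical registers P0 to P7; with RRB=3, logical register R1 is added to RRB to select a physical register]
Rotating Register Base (RRB) register points to base of current register set. Value added on to logical register specifier to give physical register number. Usually, split into rotating and non-rotating registers.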
CS252
Rotating Register File (Previous Loop Example)
    ld f1, () | fadd f5, f4, ... | sd f9, () | bloop

Three cycle load latency encoded as difference of 3 in register specifier number (f4 - f1 = 3)
Four cycle fadd latency encoded as difference of 4 in register specifier number (f9 - f5 = 4)

    RRB=8: ld P9, () | fadd P13, P12, | sd P17, () | bloop
    RRB=7: ld P8, () | fadd P12, P11, | sd P16, () | bloop
    RRB=6: ld P7, () | fadd P11, P10, | sd P15, () | bloop
    RRB=5: ld P6, () | fadd P10, P9,  | sd P14, () | bloop
    RRB=4: ld P5, () | fadd P9, P8,   | sd P13, () | bloop
    RRB=3: ld P4, () | fadd P8, P7,   | sd P12, () | bloop
    RRB=2: ld P3, () | fadd P7, P6,   | sd P11, () | bloop
    RRB=1: ld P2, () | fadd P6, P5,   | sd P10, () | bloop
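The mapping from logical to physical register can be sketched in a few lines. The function name and the optional wraparound parameter `num_rotating` are illustrative assumptions; the slide only states that the RRB value is added to the logical register specifier.

```python
def physical_register(logical, rrb, num_rotating=None):
    """Map a logical register specifier to a physical register number.

    If num_rotating is given, wrap around within the rotating region
    (an assumption; the slide does not show the wraparound case).
    """
    physical = logical + rrb
    return physical if num_rotating is None else physical % num_rotating


# The load in the iteration with RRB=8 writes f1 -> P9. Three iterations
# later (RRB=5) the fadd reads f4 -> P9, the same physical register.
print(physical_register(1, 8), physical_register(4, 5))
# The fadd at RRB=5 writes f5 -> P10; four iterations later (RRB=1)
# the store reads f9 -> P10.
print(physical_register(5, 5), physical_register(9, 1))
```

Because RRB decreases by one per iteration in the example, a producer and its consumer line up on the same physical register when their logical specifiers differ by the producer's latency in iterations.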
IA-64 Predicated Execution
Problem: Mispredicted branches limit ILP
Solution: Eliminate hard to predict branches with predicated execution
– Almost all IA-64 instructions can be executed conditionally under predicate
– Instruction becomes NOP if predicate register false

Four basic blocks:
    b0: Inst 1
        Inst 2
        br a==b, b2      (if)
    b1: Inst 3
        Inst 4
        br b3            (else)
    b2: Inst 5
        Inst 6           (then)
    b3: Inst 7
        Inst 8

Predication => One basic block:
    Inst 1
    Inst 2
    p1,p2 <- cmp(a==b)
    (p1) Inst 3 || (p2) Inst 5
    (p1) Inst 4 || (p2) Inst 6
    Inst 7
    Inst 8

Mahlke et al, ISCA95: On average >50% branches removed
Warning: Complicates bypassing!
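The effect of predication can be simulated with a tiny interpreter in which every instruction carries an optional predicate and is skipped when that predicate is false. The instruction format, register names, and the example computation are all made up for illustration; this is not IA-64 semantics in detail.

```python
def execute(program, regs):
    """Run straight-line code; an instruction with a false predicate acts as a NOP."""
    for predicate, dest, compute in program:
        if predicate is None or regs[predicate]:
            regs[dest] = compute(regs)
    return regs


def if_converted(a, b, x):
    """Branch-free version of: if a == b: x = x + 1, else: x = x * 2."""
    regs = {"a": a, "b": b, "x": x}
    program = [
        (None, "p1", lambda r: r["a"] == r["b"]),   # p1 <- cmp(a == b)
        (None, "p2", lambda r: not r["p1"]),        # p2 <- complement of p1
        ("p1", "x", lambda r: r["x"] + 1),          # (p1) then-side operation
        ("p2", "x", lambda r: r["x"] * 2),          # (p2) else-side operation
    ]
    return execute(program, regs)["x"]


def branchy(a, b, x):
    """Reference version with an explicit branch."""
    return x + 1 if a == b else x * 2


print(if_converted(3, 3, 10), if_converted(3, 4, 10))
```

Both sides of the original branch are fetched and issued every time; only the side whose predicate is true changes state, which is why predication trades extra issue slots for fewer mispredictions.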
CS252
IA-64 Speculative Execution
Problem: Branches restrict compiler code motion
Solution: Speculative operations that don’t cause exceptions

Original code:
    Inst 1
    Inst 2
    br a==b, b2
    Load r1
    Use r1
    Inst 3
Can’t move load above branch because might cause spurious exception

With speculative load:
    Load.s r1
    Inst 1
    Inst 2
    br a==b, b2
    Chk.s r1
    Use r1
    Inst 3
Speculative load never causes exception, but sets “poison” bit on destination register
Check for exception in original home block jumps to fixup code if exception detected
Particularly useful for scheduling long latency loads early
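The poison-bit idea can be sketched with a sentinel value. Here memory is a Python dictionary and a missing address stands in for a faulting access; `load_s`, `chk_s`, and `POISON` are illustrative names modelled on the slide's Load.s and Chk.s, not a real API.

```python
POISON = object()   # stands in for the "poison" bit on the destination register


def load_s(memory, address):
    """Speculative load: never raises; a faulting access yields POISON instead."""
    return memory.get(address, POISON)


def chk_s(value, fixup):
    """Check in the original home block: run the fixup code if the value is poisoned."""
    return fixup() if value is POISON else value


memory = {16: 42}

r1 = load_s(memory, 16)                 # hoisted above the branch
print(chk_s(r1, lambda: memory[16]))    # load succeeded, value used directly

r1 = load_s(memory, 99)                 # would have faulted; no exception yet
print(r1 is POISON)
print(chk_s(r1, lambda: "fixup ran"))   # exception handling deferred to the check
```

If the branch goes the other way, the check is never reached and the poisoned value is simply ignored, so no spurious exception is raised.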
CS252
IA-64 Data Speculation
Problem: Possible memory hazards limit code scheduling
Solution: Hardware to check pointer hazards

Original code:
    Inst 1
    Inst 2
    Store
    Load r1
    Use r1
    Inst 3
Can’t move load above store because store might be to same address

With data speculative load:
    Load.a r1
    Inst 1
    Inst 2
    Store
    Load.c
    Use r1
    Inst 3
Data speculative load adds address to address check table
Store invalidates any matching loads in address check table
Check if load invalid (or missing), jump to fixup code if so
Requires associative hardware in address check table
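The address check table can be modelled as a small map from destination register to the address that was loaded speculatively. This is a behavioural sketch only: the class and method names are illustrative, memory is a dictionary, and the fixup is reduced to redoing the load.

```python
class AddressCheckTable:
    def __init__(self):
        self.entries = {}   # destination register -> speculatively loaded address

    def load_a(self, memory, register, address):
        """Data speculative load: record the address, then perform the load."""
        self.entries[register] = address
        return memory[address]

    def store(self, memory, address, value):
        """Store: write memory and invalidate any matching table entries."""
        memory[address] = value
        self.entries = {r: a for r, a in self.entries.items() if a != address}

    def load_c(self, memory, register, address, speculative_value):
        """Check: keep the early value if the entry survived, else reload."""
        if self.entries.pop(register, None) == address:
            return speculative_value
        return memory[address]   # fixup path: entry invalid or missing


memory = {0: 1, 8: 2}
table = AddressCheckTable()

r1 = table.load_a(memory, "r1", 0)
table.store(memory, 8, 5)                    # different address: entry survives
print(table.load_c(memory, "r1", 0, r1))

r1 = table.load_a(memory, "r1", 0)
table.store(memory, 0, 7)                    # same address: entry invalidated
print(table.load_c(memory, "r1", 0, r1))
```

The dictionary comprehension in `store` is the software analogue of the associative lookup the slide says the hardware needs: every store must be compared against all outstanding speculative load addresses.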
Limits of Static Scheduling
§ Unpredictable branches
§ Variable memory latency (unpredictable cache misses)
§ Code size explosion
§ Compiler complexity
§ Despite several attempts, VLIW has failed in general-purpose computing arena (so far).
– More complex VLIW architectures are close to in-order superscalar in complexity, no real advantage on large complex apps.
§ Successful in embedded DSP market
– Simpler VLIWs with more constrained environment, friendlier code.
Intel Kills Itanium
§ Donald Knuth: “… Itanium approach that was supposed to be so terrific - until it turned out that the wished-for compilers were basically impossible to write.”
§ “Intel officially announced the end of life and product discontinuance of the Itanium CPU family on January 30th, 2019”, Wikipedia
Acknowledgements
§ This course is partly inspired by previous MIT 6.823 and Berkeley CS252 computer architecture courses created by my collaborators and colleagues:
– Arvind (MIT)
– Joel Emer (Intel/MIT)
– James Hoe (CMU)
– John Kubiatowicz (UCB)
– David Patterson (UCB)