Seoul National University
HEIG-VD/SNU Summer University 2017: How Modern Processors Work?
Superscalar Architectures: Part 2
Dynamic (Out-of-Order) Scheduling
Lecture 3.2, August 23rd, 2017
Jae W. Lee ([email protected])
Computer Science and Engineering, Seoul National University
Download this lecture's slides at https://goo.gl/rJPMQU
Slide credits: [COD5e] and [CA:AQA5e] slides from Elsevier Inc.
Outline
Reference: [CA:AQA5e] Ch. 3.4-3.5
• Instruction-Level Parallelism and Dependences
• Dynamic Scheduling with Tomasulo Algorithm
Instruction-Level Parallelism and Dependences
Instruction-Level Parallelism
• ILP is limited by
  - Resource conflicts
  - Dependences
• Three types of dependences
  - (True) data dependences
  - Name dependences
  - Control dependences
Data Dependence
• Instruction j is data dependent on instruction i if
  - Instruction i produces a result that may be used by instruction j, or
  - Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i
• Example: which instruction pairs are data dependent?
  Loop: L.D    F0,0(R1)     # F0 = array element
        ADD.D  F4,F0,F2     # add scalar in F2
        S.D    F4,0(R1)     # store result
        DADDUI R1,R1,#-8    # decrement pointer 8 bytes
        BNE    R1,R2,Loop   # branch R1!=R2
• Dependent instructions cannot be executed simultaneously
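For register operands, the slide's question can be answered mechanically. Here is a minimal sketch. It assumes each instruction is reduced to the registers it writes and reads, and it ignores memory dependences and the branch target. The names `LOOP` and `raw_pairs` are illustrative.

```python
# Each instruction: (mnemonic, registers written, registers read).
# Memory locations are deliberately left out: only register-carried
# dependences are found here.
LOOP = [
    ("L.D",    ["F0"], ["R1"]),        # F0 = array element
    ("ADD.D",  ["F4"], ["F0", "F2"]),  # add scalar in F2
    ("S.D",    [],     ["F4", "R1"]),  # store result
    ("DADDUI", ["R1"], ["R1"]),        # decrement pointer 8 bytes
    ("BNE",    [],     ["R1", "R2"]),  # branch R1 != R2
]


def raw_pairs(instrs):
    """Return (i, j, reg) for every j that reads reg last written by i."""
    last_writer = {}  # register -> index of its most recent writer
    pairs = []
    for j, (_, writes, reads) in enumerate(instrs):
        for reg in reads:  # sources are looked up before the destination is updated
            if reg in last_writer:
                pairs.append((last_writer[reg], j, reg))
        for reg in writes:
            last_writer[reg] = j
    return pairs


print(raw_pairs(LOOP))  # [(0, 1, 'F0'), (1, 2, 'F4'), (3, 4, 'R1')]
```

Transitive dependences follow by chaining pairs. For example, S.D depends on L.D through ADD.D. Calling `raw_pairs(LOOP + LOOP)` treats two iterations as straight-line code. It then also reports the loop-carried dependence from DADDUI to the next iteration's L.D, `(3, 5, 'R1')`.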
Data Dependence
• Dependences are a property of programs
• The pipeline organization determines whether a dependence is detected and whether it causes a stall
  - Read-After-Write (RAW) hazard
• A data dependence conveys:
  - The possibility of a hazard
  - The order in which results must be calculated
  - An upper bound on exploitable instruction-level parallelism
• Dependences that flow through memory locations are difficult to detect
  - The "memory disambiguation" problem
  - Does 100(R4) = 20(R6)?
  - In different loop iterations, does 20(R6) = 20(R6)?
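The compiler cannot answer "Does 100(R4) = 20(R6)?" on its own, because the answer depends on register values that are known only at run time. Here is a minimal sketch with hypothetical register contents and function names.

```python
def effective_address(offset, base, regs):
    """Address accessed by an operand written offset(base)."""
    return regs[base] + offset


def same_location(a, b, regs):
    """a, b: (offset, base register) pairs; True if both touch the same address."""
    return effective_address(*a, regs) == effective_address(*b, regs)


# Hypothetical register contents: only at run time is the answer known.
regs = {"R4": 1000, "R6": 1080}
print(same_location((100, "R4"), (20, "R6"), regs))  # True: both are address 1100

regs["R6"] = 2000
print(same_location((100, "R4"), (20, "R6"), regs))  # False

# Across loop iterations, the same text 20(R6) names different addresses
# when R6 changes in between (here decremented by 8, as a loop pointer would be).
first = effective_address(20, "R6", regs)
regs["R6"] -= 8
second = effective_address(20, "R6", regs)
print(first, second)  # 2020 2012
```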
Name Dependence
• Two instructions use the same name, but there is no flow of information between them
  - Not a true data dependence, but a problem when reordering instructions
  - Antidependence: instruction j writes a register or memory location that instruction i reads
    - The initial ordering (i before j) must be preserved
    - Causes a Write-After-Read (WAR) hazard
  - Output dependence: instruction i and instruction j write the same register or memory location
    - Ordering must be preserved
    - Causes a Write-After-Write (WAW) hazard
• To resolve, use renaming techniques
Data and Name Dependence: Examples
• (True) data dependence
    r3 ← (r1) op (r2)
    r5 ← (r3) op (r4)
• Antidependence
    r3 ← (r1) op (r2)
    r1 ← (r4) op (r5)
• Output dependence
    r3 ← (r1) op (r2)
    r3 ← (r4) op (r5)
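The three example pairs can be classified mechanically. Here is a minimal sketch. It assumes each instruction is reduced to one destination and its source registers. Each dependence is named by the hazard it can cause: RAW for a true data dependence, WAR for an antidependence and WAW for an output dependence. The function name is illustrative.

```python
def dependences(first, second):
    """Classify how `second` depends on the earlier instruction `first`.

    Each instruction is (destination, (source1, source2)).
    Returns the set of hazard names the pair could cause.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 in srcs2:
        kinds.add("RAW")  # true data dependence
    if dest2 in srcs1:
        kinds.add("WAR")  # antidependence
    if dest1 == dest2:
        kinds.add("WAW")  # output dependence
    return kinds


earlier = ("r3", ("r1", "r2"))                       # r3 <- (r1) op (r2)
print(dependences(earlier, ("r5", ("r3", "r4"))))    # {'RAW'}
print(dependences(earlier, ("r1", ("r4", "r5"))))    # {'WAR'}
print(dependences(earlier, ("r3", ("r4", "r5"))))    # {'WAW'}
```

A single pair can carry more than one kind. For example, `r1 ← (r2) op (r3)` followed by `r1 ← (r1) op (r4)` is both RAW and WAW.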
Data Hazards
• A data hazard exists if
  - There is a name or data dependence between instructions, and
  - They are close enough that overlap during execution would change the order of access to the operand involved in the dependence
• Three types of data hazards, corresponding to the three types of dependences
  - Read after write (RAW) hazard: true data dependence
  - Write after write (WAW) hazard: output dependence
  - Write after read (WAR) hazard: antidependence
Control Dependence
• Ordering of instruction i with respect to a branch instruction
  - An instruction that is control dependent on a branch cannot be moved before the branch, where its execution would no longer be controlled by the branch
  - An instruction that is not control dependent on a branch cannot be moved after the branch, where its execution would become controlled by the branch
Control Dependence
• Examples
  Example 1:
        DADDU  R1,R2,R3
        BEQZ   R4,L
        DSUBU  R1,R1,R6
  L:    …
        OR     R7,R1,R8
  - The OR instruction is data dependent on both DADDU and DSUBU
  Example 2:
        DADDU  R1,R2,R3
        BEQZ   R12,skip
        DSUBU  R4,R5,R6
        DADDU  R5,R4,R9
  skip: OR     R7,R8,R9
  - Assume R4 isn't used after skip
  - It is then possible to move DSUBU before the branch
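The move in Example 2 is justified by a liveness check. Hoisting DSUBU above the branch is safe only if its destination is not read on the taken path before being overwritten. Here is a minimal sketch of that check. It ignores exception behavior and any other paths. The function name and the `live_out` parameter are illustrative assumptions.

```python
def is_live(reg, path, live_out=True):
    """Is `reg` read on `path` before it is overwritten?

    path: (destination, sources) tuples in execution order.
    If the path ends without deciding, return live_out; the default True
    is the conservative answer when nothing is known about later code.
    """
    for dest, srcs in path:
        if reg in srcs:     # read before any redefinition: the old value is needed
            return True
        if dest == reg:     # redefined first: the old value is dead
            return False
    return live_out


# Example 2: code at `skip`; the slide assumes R4 is not used afterwards.
after_skip = [("R7", ("R8", "R9"))]                # OR R7,R8,R9
print(is_live("R4", after_skip, live_out=False))   # False: hoisting DSUBU R4,R5,R6 is safe

# Example 1: code at L reads R1 (the elided lines are assumed not to touch R1),
# so DSUBU R1,R1,R6 must stay after BEQZ.
at_L = [("R7", ("R1", "R8"))]                      # OR R7,R1,R8
print(is_live("R1", at_L))                         # True
```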
Dynamic Scheduling with Tomasulo Algorithm
Dynamic Scheduling
• Rearrange the order of instructions to reduce stalls while maintaining data flow
• Advantages:
  - The compiler doesn't need knowledge of the microarchitecture
  - Handles cases where dependences are unknown at compile time
• Disadvantages:
  - Substantial increase in hardware complexity
  - Complicates exceptions
Dynamic Scheduling
• Dynamic scheduling implies:
  - Out-of-order execution
  - Out-of-order completion
• Creates the possibility of WAR and WAW hazards
  - WAR example:
        DIVD F0,F2,F4     // assume this takes a long time
        ADDD F10,F0,F8    // RAW hazard on F0
        SUBD F8,F8,F14    // WAR hazard on F8
• Two popular dynamic scheduling algorithms: Scoreboard and Tomasulo Algorithm
  - Both track when operands are available
  - Tomasulo further introduces register renaming in hardware
    - Minimizes WAW and WAR hazards
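The WAR example above can be simulated to show why out-of-order completion causes trouble. Here is a minimal sketch using made-up register values. If SUBD finishes and writes F8 before the stalled ADDD reads it, ADDD computes with the wrong F8. The second mode copies operands that are already available when instructions issue in program order. This is what Tomasulo's reservation stations do, and it removes the problem. The `run` function and the values are hypothetical.

```python
import operator

# (name, destination, sources, operation) in program order
PROGRAM = [
    ("DIVD", "F0", ("F2", "F4"), operator.truediv),
    ("ADDD", "F10", ("F0", "F8"), operator.add),
    ("SUBD", "F8", ("F8", "F14"), operator.sub),
]


def run(completion_order, buffer_at_issue):
    """Execute PROGRAM, finishing instructions in completion_order.

    completion_order must respect the true (RAW) dependence DIVD -> ADDD.
    With buffer_at_issue, each instruction captures ready operands when it
    issues (in program order) and otherwise waits on its producer's tag.
    """
    regs = {"F2": 8.0, "F4": 2.0, "F8": 1.0, "F14": 0.5}
    operands, producer = {}, {}
    if buffer_at_issue:
        for i, (_, dest, srcs, _) in enumerate(PROGRAM):
            operands[i] = [("tag", producer[s]) if s in producer else ("value", regs[s])
                           for s in srcs]
            producer[dest] = i  # later readers of dest wait for instruction i
    results = {}
    for i in completion_order:
        _, dest, srcs, op = PROGRAM[i]
        if buffer_at_issue:
            args = [results[x] if kind == "tag" else x for kind, x in operands[i]]
        else:
            args = [regs[s] for s in srcs]  # read the register file at execution
        results[i] = op(*args)
        regs[dest] = results[i]
    return regs["F10"]


print(run([0, 1, 2], buffer_at_issue=False))  # 5.0, in-order reference
print(run([2, 0, 1], buffer_at_issue=False))  # 4.5, SUBD overwrote F8 first (WAR)
print(run([2, 0, 1], buffer_at_issue=True))   # 5.0, ADDD kept its own copy of F8
```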
Tomasulo Algorithm
• Best-known dynamic scheduling algorithm
  - Influenced virtually all out-of-order instruction scheduling techniques
  - Alpha 21264, HP 8000, MIPS R10000, Pentium II, PowerPC 604, …
• First introduced for the IBM 360/91 (1966)
• Goal: high performance without special compilers
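Tomasulo Algorithm
• Register renaming to overcome WAR/WAW hazards (1)
  - Example:
        DIV.D F0,F2,F4
        ADD.D F6,F0,F8
        S.D   F6,0(R1)
        SUB.D F8,F10,F14
        MUL.D F6,F10,F8
  - In addition to the true dependences, there are name dependences with F6 and F8:
    - Antidependences (WAR): ADD.D reads F8 before SUB.D writes it; S.D reads F6 before MUL.D writes it
    - Output dependence (WAW): ADD.D and MUL.D both write F6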
Tomasulo Algorithm
• Register renaming to overcome WAR/WAW hazards (2)
  - Example, using two temporary registers S and T:
        DIV.D F0,F2,F4
        ADD.D S,F0,F8
        S.D   S,0(R1)
        SUB.D T,F10,F14
        MUL.D F6,F10,T
  - Now only RAW hazards remain, which can be strictly ordered
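The slide renames only the two conflicting destinations, using S and T. Here is a minimal sketch of the general mechanism. It assumes an unlimited supply of hypothetical names P0, P1, and so on. Every destination gets a new name, and every source is looked up in a rename map. As a result, no two instructions write the same name (no WAW), and no instruction writes a name that an earlier one still reads (no WAR).

```python
from itertools import count


def rename(program):
    """Give every destination a fresh name; sources read the newest name.

    program: (opcode, destination or None, [sources]) in program order.
    Returns the renamed program and the final register -> name map.
    """
    fresh = (f"P{n}" for n in count())  # hypothetical physical names P0, P1, ...
    current = {}                         # architectural register -> newest name
    renamed = []
    for op, dest, srcs in program:
        new_srcs = [current.get(s, s) for s in srcs]  # rename sources first
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            current[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed, current


code = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D", None, ["F6", "R1"]),      # S.D F6,0(R1): reads F6 and R1
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]
renamed, final = rename(code)
for op, dest, srcs in renamed:
    print(op, dest, srcs)
print(final)  # {'F0': 'P0', 'F6': 'P3', 'F8': 'P2'}
```

The final map places the architectural F6 in P3, the MUL.D result. This matches the slide, which keeps F6 as MUL.D's destination.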
Tomasulo Algorithm
• Register renaming is provided by reservation stations (RS) in the Tomasulo Algorithm
  - Each RS contains:
    - The instruction
    - Buffered operand values (when available)
    - The reservation station number of the instruction providing each operand value
  - An RS fetches and buffers an operand as soon as it becomes available (not necessarily from the register file)
  - Pending instructions designate the RS to which they will send their output
    - Result values are broadcast on a result bus, called the common data bus (CDB)
    - Only the last output updates the register file
  - As instructions are issued, the register specifiers are renamed with the reservation station
    - There may be more reservation stations than registers
Tomasulo Algorithm
• Tomasulo organization
  [Figure: Tomasulo organization, with labeled blocks FP Op Queue, FP Registers, Load Buffers (Load1-Load6, from memory), Store Buffers (to memory), Reservation Stations for the FP adders (Add1, Add2, Add3) and FP multipliers (Mult1, Mult2), and the Common Data Bus (CDB)]
Tomasulo Algorithm
• Reservation station (RS) components
  - Op: operation to perform in the unit (e.g., + or -)
  - Vj, Vk: values of the source operands
    - Store buffers have a V field, the result to be stored
  - Qj, Qk: reservation stations producing the source registers (value to be written)
    - Note: Qj, Qk = 0 => ready
    - Store buffers only have Qi, for the RS producing the result
  - Busy: indicates the reservation station or FU is busy
• Register result status: indicates which functional unit will write each register, if one exists; blank when there are no pending instructions that will write that register
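These fields, together with the register result status, are enough to sketch issue-time renaming and the CDB broadcast. Here is a minimal Python sketch with hypothetical class and method names. It simplifies away the Time field and the load buffers: a load appears only as its tag in the register result status. Register values are made up. The demo mirrors Example Two, where loads from two loop iterations both target F0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """One reservation station with the slide's fields (Time omitted)."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # source values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None    # producing unit; None plays the role of 0 (ready)
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


class Core:
    """Register file plus register result status (register -> producing unit)."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.status = {}  # absent means blank: no pending write

    def read(self, reg):
        """Return (value, None) if the register is valid, else (None, tag)."""
        tag = self.status.get(reg)
        return (None, tag) if tag is not None else (self.regs[reg], None)

    def issue(self, st, op, dest, src1, src2):
        st.busy, st.op = True, op
        st.vj, st.qj = self.read(src1)
        st.vk, st.qk = self.read(src2)
        self.status[dest] = st.name  # rename dest to this station

    def broadcast(self, tag, value, stations):
        """Write result: every unit waiting on `tag` grabs the value."""
        for st in stations:
            if st.qj == tag:
                st.vj, st.qj = value, None
            if st.qk == tag:
                st.vk, st.qk = value, None
        for reg in [r for r, t in self.status.items() if t == tag]:
            self.regs[reg] = value  # only the latest writer updates the register
            del self.status[reg]


core = Core({"F0": 0.0, "F2": 3.0, "F4": 0.0})
mult1, mult2 = Station("Mult1"), Station("Mult2")
core.status["F0"] = "Load1"                   # LD F0,0(R1) with R1 = 80
core.issue(mult1, "MULTD", "F4", "F0", "F2")  # waits on Load1
core.status["F0"] = "Load2"                   # next iteration's LD, R1 = 72
core.issue(mult2, "MULTD", "F4", "F0", "F2")  # waits on Load2
core.broadcast("Load1", 10.0, [mult1, mult2]) # made-up value of M[80] on the CDB
print(mult1.ready(), mult2.qj, core.regs["F0"], core.status)
# True Load2 0.0 {'F0': 'Load2', 'F4': 'Mult2'}
```

The register result status for F0 already names Load2 when Load1 broadcasts. The Load1 value therefore reaches Mult1 but never the register file, which is why F0 never sees the load from location 80.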
Tomasulo Algorithm
• Three stages of the Tomasulo Algorithm
  1. Issue: get an instruction from the FP Op Queue
     If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers).
  2. Execution: operate on operands (EX)
     When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
  3. Write result: finish execution (WB)
     Write on the Common Data Bus to all awaiting units; mark the reservation station available.
• Normal data bus: data + destination ("go to" bus)
• Common data bus: data + source ("come from" bus)
  - 64 bits of data + 4 bits of functional unit source address
  - Write if it matches the expected functional unit (the one producing the result)
  - Does the broadcast
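Here is a minimal sketch of the "come from" bus word. It assumes the 4-bit source field and the 64-bit data field are simply concatenated. It also assumes tag 0 is reserved, matching the Qj/Qk = 0 "ready" convention, which leaves tags 1 to 15 for producing units. The function names are hypothetical.

```python
import struct

DATA_BITS = 64  # one double-precision result
TAG_BITS = 4    # source functional-unit / station number


def cdb_encode(tag: int, value: float) -> int:
    """Pack a "come from" CDB transfer: source tag above 64 data bits."""
    if not 1 <= tag < (1 << TAG_BITS):
        raise ValueError("tag must be 1..15; 0 is reserved for 'operand ready'")
    data = struct.unpack("<Q", struct.pack("<d", value))[0]  # raw IEEE-754 bits
    return (tag << DATA_BITS) | data


def cdb_decode(word: int):
    """Split a bus word back into (source tag, value)."""
    tag = word >> DATA_BITS
    data = word & ((1 << DATA_BITS) - 1)
    return tag, struct.unpack("<d", struct.pack("<Q", data))[0]


def snoop(qj: int, word: int):
    """A waiting operand slot takes the value only if the source tag matches."""
    tag, value = cdb_decode(word)
    return value if tag == qj else None


word = cdb_encode(3, 2.5)
print(word.bit_length() <= DATA_BITS + TAG_BITS)  # True: fits in 68 bits
print(cdb_decode(word))                           # (3, 2.5)
print(snoop(3, word), snoop(5, word))             # 2.5 None
```

No destination appears anywhere in the word. Each waiting station compares the tag with its own Qj/Qk, which is what makes a single broadcast reach every consumer at once.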
Tomasulo Example One: A Straight-Line Code (Cycle 0)
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     -      -          -
  LD    F2,45(R3)     -      -          -
  MULTD F0,F2,F4      -      -          -
  SUBD  F8,F6,F2      -      -          -
  DIVD  F10,F0,F6     -      -          -
  ADDD  F6,F8,F2      -      -          -
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations: none busy (Add1, Add2, Add3, Mult1, Mult2)
Register result status: all registers (F0 ... F30) blank
Tomasulo Example One: Cycle 1
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      -          -
  LD    F2,45(R3)     -      -          -
  MULTD F0,F2,F4      -      -          -
  SUBD  F8,F6,F2      -      -          -
  DIVD  F10,F0,F6     -      -          -
  ADDD  F6,F8,F2      -      -          -
Load buffers (Busy Address): Load1 Yes 34+R2; Load2 No; Load3 No
Reservation stations: none busy (Add1, Add2, Add3, Mult1, Mult2)
Register result status: F6=Load1
Tomasulo Example One: Cycle 2
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      -          -
  LD    F2,45(R3)     2      -          -
  MULTD F0,F2,F4      -      -          -
  SUBD  F8,F6,F2      -      -          -
  DIVD  F10,F0,F6     -      -          -
  ADDD  F6,F8,F2      -      -          -
Load buffers (Busy Address): Load1 Yes 34+R2; Load2 Yes 45+R3; Load3 No
Reservation stations: none busy (Add1, Add2, Add3, Mult1, Mult2)
Register result status: F2=Load2, F6=Load1
• Note: can have multiple loads outstanding
Tomasulo Example One: Cycle 3
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          -
  LD    F2,45(R3)     2      -          -
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      -      -          -
  DIVD  F10,F0,F6     -      -          -
  ADDD  F6,F8,F2      -      -          -
Load buffers (Busy Address): Load1 Yes 34+R2; Load2 Yes 45+R3; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F4)   Load2  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: F0=Mult1, F2=Load2, F6=Load1
• Note: register names are removed ("renamed") in reservation stations; MULT issued (vs. scoreboard)
• Load1 completing; what is waiting for Load1?
Tomasulo Example One: Cycle 4
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          -
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      -          -
  DIVD  F10,F0,F6     -      -          -
  ADDD  F6,F8,F2      -      -          -
Load buffers (Busy Address): Load1 No; Load2 Yes 45+R3; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Add1   SUBD   M(A1)   -       -      Load2
  -     Mult1  MULTD  -       R(F4)   Load2  -
  Not busy: Add2, Add3, Mult2
Register result status: F0=Mult1, F2=Load2, F6=M(A1), F8=Add1
• Load2 completing; what is waiting for Load2?
Tomasulo Example One: Cycle 5
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      -          -
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      -      -          -
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  2     Add1   SUBD   M(A1)   M(A2)   -      -
  10    Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add2, Add3
Register result status: F0=Mult1, F2=M(A2), F6=M(A1), F8=Add1, F10=Mult2
Tomasulo Example One: Cycle 6
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      -          -
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      -          -
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  1     Add1   SUBD   M(A1)   M(A2)   -      -
  -     Add2   ADDD   -       M(A2)   Add1   -
  9     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add3
Register result status: F0=Mult1, F2=M(A2), F6=Add2, F8=Add1, F10=Mult2
• Issue ADDD here?
Tomasulo Example One: Cycle 7
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      7          -
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      -          -
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  0     Add1   SUBD   M(A1)   M(A2)   -      -
  -     Add2   ADDD   -       M(A2)   Add1   -
  8     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add3
Register result status: F0=Mult1, F2=M(A2), F6=Add2, F8=Add1, F10=Mult2
• Add1 completing; what is waiting for it?
Tomasulo Example One: Cycle 8
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      -          -
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  2     Add2   ADDD   (M-M)   M(A2)   -      -
  7     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add1, Add3
Register result status: F0=Mult1, F2=M(A2), F6=Add2, F8=(M-M), F10=Mult2
Tomasulo Example One: Cycle 9
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      -          -
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  1     Add2   ADDD   (M-M)   M(A2)   -      -
  6     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add1, Add3
Register result status: F0=Mult1, F2=M(A2), F6=Add2, F8=(M-M), F10=Mult2
Tomasulo Example One: Cycle 10
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      10         -
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  0     Add2   ADDD   (M-M)   M(A2)   -      -
  5     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add1, Add3
Register result status: F0=Mult1, F2=M(A2), F6=Add2, F8=(M-M), F10=Mult2
• Add2 completing; what is waiting for it?
Tomasulo Example One: Cycle 11
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  4     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add1, Add2, Add3
Register result status: F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2
• Write result of ADDD here vs. scoreboard?
• All quick instructions complete in this cycle!
Tomasulo Example One: Cycle 12
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  3     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add1, Add2, Add3
Register result status: F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2
Tomasulo Example One: Cycle 13
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  2     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add1, Add2, Add3
Register result status: F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2
Tomasulo Example One: Cycle 14
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      -          -
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  1     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add1, Add2, Add3
Register result status: F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2
Tomasulo Example One: Cycle 15
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      15         -
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  0     Mult1  MULTD  M(A2)   R(F4)   -      -
  -     Mult2  DIVD   -       M(A1)   Mult1  -
  Not busy: Add1, Add2, Add3
Register result status: F0=Mult1, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2
Tomasulo Example One: Cycle 16
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      15         16
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  40    Mult2  DIVD   M*F4    M(A1)   -      -
  Not busy: Add1, Add2, Add3, Mult1
Register result status: F0=M*F4, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2
Tomasulo Example One (Cont.)
Faster-than-light computation (skip a couple of cycles)…
Tomasulo Example One: Cycle 55
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      15         16
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      -          -
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  1     Mult2  DIVD   M*F4    M(A1)   -      -
  Not busy: Add1, Add2, Add3, Mult1
Register result status: F0=M*F4, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2
Tomasulo Example One: Cycle 56
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      15         16
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      56         -
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  0     Mult2  DIVD   M*F4    M(A1)   -      -
  Not busy: Add1, Add2, Add3, Mult1
Register result status: F0=M*F4, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Mult2
• Mult2 is completing; what is waiting for it?
Tomasulo Example One: Cycle 57
Instruction status:
  Instruction         Issue  Exec comp  Write result
  LD    F6,34(R2)     1      3          4
  LD    F2,45(R3)     2      4          5
  MULTD F0,F2,F4      3      15         16
  SUBD  F8,F6,F2      4      7          8
  DIVD  F10,F0,F6     5      56         57
  ADDD  F6,F8,F2      6      10         11
Load buffers (Busy Address): Load1 No; Load2 No; Load3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult2  DIVD   M*F4    M(A1)   -      -
  Not busy: Add1, Add2, Add3, Mult1
Register result status: F0=M*F4, F2=M(A2), F6=(M-M+M), F8=(M-M), F10=Result
• Once again: in-order issue, out-of-order execution and completion.
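The cycle numbers in Example One follow from a few timing rules. Here is a minimal sketch that recomputes the Issue, Exec comp and Write result columns. It assumes latencies read off the slides and ignores structural and CDB conflicts, which do not occur in this example. The names are hypothetical. Renaming appears as the `producer` map: ADDD's later write to F6 does not disturb DIVD, which recorded LD F6 (Load1) as the source of its F6 at issue.

```python
from dataclasses import dataclass

# Execution latencies implied by Example One's slides:
# loads 2, add/sub 2, multiply 10, divide 40 cycles.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}


@dataclass
class Timing:
    issue: int
    exec_comp: int
    write: int


def schedule(program):
    """Issue / execute-complete / write-result cycles for straight-line code.

    program: (op, dest, sources) in program order. Simplifying assumptions:
    one issue per cycle from cycle 1, a free reservation station is always
    available, and no two results ever compete for the CDB.
    """
    producer = {}  # register -> index of its latest writer (the rename tag)
    timings = []
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1
        # An operand is ready at cycle 0 if it is in the register file,
        # otherwise in the cycle its producer writes it on the CDB.
        ready = [timings[producer[s]].write if s in producer else 0 for s in srcs]
        start = max([issue] + ready) + 1  # execute the cycle after everything is ready
        exec_comp = start + LATENCY[op] - 1
        timings.append(Timing(issue, exec_comp, exec_comp + 1))
        producer[dest] = i  # later readers of dest wait for this instruction
    return timings


example_one = [
    ("LD", "F6", ["R2"]),
    ("LD", "F2", ["R3"]),
    ("MULTD", "F0", ["F2", "F4"]),
    ("SUBD", "F8", ["F6", "F2"]),
    ("DIVD", "F10", ["F0", "F6"]),
    ("ADDD", "F6", ["F8", "F2"]),
]
for (op, dest, _), t in zip(example_one, schedule(example_one)):
    print(f"{op:5} {dest:3} {t.issue:2} {t.exec_comp:2} {t.write:2}")
```

The loop prints 1/3/4, 2/4/5, 3/15/16, 4/7/8, 5/56/57 and 6/10/11, the same cycles as the slides. Code with structural hazards, such as Example Two's third multiply, would need a cycle-by-cycle simulation instead.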
Tomasulo Example Two: A Loop
• Loop example code
  Loop: LD    F0,0(R1)
        MULTD F4,F0,F2
        SD    F4,0(R1)
        SUBI  R1,R1,#8
        BNEZ  R1,Loop
  - Assume multiply takes 4 clocks
  - Assume the first load takes 8 clocks (cache miss) and the second load takes 1 clock (hit)
  - To be clear, clocks for SUBI and BNEZ will be shown
  - In reality, integer instructions run ahead of the floating-point instructions
Tomasulo Example Two: A Loop (Cycle 0)
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   -      -          -
  1     MULTD F4,F0,F2   -      -          -
  1     SD    F4,0(R1)   -      -          -
  2     LD    F0,0(R1)   -      -          -
  2     MULTD F4,F0,F2   -      -          -
  2     SD    F4,0(R1)   -      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 No
Store buffers (Busy Addr Fu): Store1 No; Store2 No; Store3 No
Reservation stations: none busy (Add1, Add2, Add3, Mult1, Mult2)
Register result status: R1 holds 80; F0 ... F30 blank
Tomasulo Example Two: Cycle 1
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      -          -
  1     MULTD F4,F0,F2   -      -          -
  1     SD    F4,0(R1)   -      -          -
  2     LD    F0,0(R1)   -      -          -
  2     MULTD F4,F0,F2   -      -          -
  2     SD    F4,0(R1)   -      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 No; Load3 No
Store buffers (Busy Addr Fu): Store1 No; Store2 No; Store3 No
Reservation stations: none busy (Add1, Add2, Add3, Mult1, Mult2)
Register result status: R1 holds 80; F0=Load1
Tomasulo Example Two: Cycle 2
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      -          -
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   -      -          -
  2     LD    F0,0(R1)   -      -          -
  2     MULTD F4,F0,F2   -      -          -
  2     SD    F4,0(R1)   -      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 No; Load3 No
Store buffers (Busy Addr Fu): Store1 No; Store2 No; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load1  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 80; F0=Load1, F4=Mult1
Tomasulo Example Two: Cycle 3
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      -          -
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   -      -          -
  2     MULTD F4,F0,F2   -      -          -
  2     SD    F4,0(R1)   -      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 No; Load3 No
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 No; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load1  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 80; F0=Load1, F4=Mult1
• Implicit renaming sets up a "DataFlow" graph
Tomasulo Example Two: Cycle 4
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      -          -
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   -      -          -
  2     MULTD F4,F0,F2   -      -          -
  2     SD    F4,0(R1)   -      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 No; Load3 No
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 No; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load1  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 80; F0=Load1, F4=Mult1
• Dispatching the SUBI instruction
Tomasulo Example Two: Cycle 5
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      -          -
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   -      -          -
  2     MULTD F4,F0,F2   -      -          -
  2     SD    F4,0(R1)   -      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 No; Load3 No
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 No; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load1  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 72; F0=Load1, F4=Mult1
• And the BNEZ instruction
Tomasulo Example Two: Cycle 6
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      -          -
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      -          -
  2     MULTD F4,F0,F2   -      -          -
  2     SD    F4,0(R1)   -      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 Yes 72; Load3 No
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 No; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load1  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 72; F0=Load2, F4=Mult1
• Notice that F0 never sees the load from location 80
Tomasulo Example Two: Cycle 7
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      -          -
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      -          -
  2     MULTD F4,F0,F2   7      -          -
  2     SD    F4,0(R1)   -      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 Yes 72; Load3 No
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 No; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load1  -
  -     Mult2  MULTD  -       R(F2)   Load2  -
  Not busy: Add1, Add2, Add3
Register result status: R1 holds 72; F0=Load2, F4=Mult2
• Register file completely detached from computation
• First and second iterations completely overlapped
Tomasulo Example Two: Cycle 8
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      -          -
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      -          -
  2     MULTD F4,F0,F2   7      -          -
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 Yes 72; Load3 No
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 Yes 72 Mult2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load1  -
  -     Mult2  MULTD  -       R(F2)   Load2  -
  Not busy: Add1, Add2, Add3
Register result status: R1 holds 72; F0=Load2, F4=Mult2
Tomasulo Example Two: Cycle 9
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          -
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      -          -
  2     MULTD F4,F0,F2   7      -          -
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 Yes 80; Load2 Yes 72; Load3 No
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 Yes 72 Mult2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load1  -
  -     Mult2  MULTD  -       R(F2)   Load2  -
  Not busy: Add1, Add2, Add3
Register result status: R1 holds 72; F0=Load2, F4=Mult2
• Load1 completing: who is waiting?
• Note: dispatching SUBI
Tomasulo Example Two: Cycle 10
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      10         -
  2     MULTD F4,F0,F2   7      -          -
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 Yes 72; Load3 No
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 Yes 72 Mult2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  4     Mult1  MULTD  M[80]   R(F2)   -      -
  -     Mult2  MULTD  -       R(F2)   Load2  -
  Not busy: Add1, Add2, Add3
Register result status: R1 holds 64; F0=Load2, F4=Mult2
• Load2 completing: who is waiting?
• Note: dispatching BNEZ
Tomasulo Example Two: Cycle 11
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      -          -
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 Yes 72 Mult2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  3     Mult1  MULTD  M[80]   R(F2)   -      -
  4     Mult2  MULTD  M[72]   R(F2)   -      -
  Not busy: Add1, Add2, Add3
Register result status: R1 holds 64; F0=Load3, F4=Mult2
• Next load in sequence
Tomasulo Example Two: Cycle 12
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      -          -
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 Yes 72 Mult2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  2     Mult1  MULTD  M[80]   R(F2)   -      -
  3     Mult2  MULTD  M[72]   R(F2)   -      -
  Not busy: Add1, Add2, Add3
Register result status: R1 holds 64; F0=Load3, F4=Mult2
• Why not issue the third multiply?
Tomasulo Example Two: Cycle 13
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      -          -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      -          -
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 Yes 72 Mult2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  1     Mult1  MULTD  M[80]   R(F2)   -      -
  2     Mult2  MULTD  M[72]   R(F2)   -      -
  Not busy: Add1, Add2, Add3
Register result status: R1 holds 64; F0=Load3, F4=Mult2
Tomasulo Example Two: Cycle 14
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      14         -
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      -          -
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 Yes 80 Mult1; Store2 Yes 72 Mult2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  0     Mult1  MULTD  M[80]   R(F2)   -      -
  1     Mult2  MULTD  M[72]   R(F2)   -      -
  Not busy: Add1, Add2, Add3
Register result status: R1 holds 64; F0=Load3, F4=Mult2
• Mult1 completing. Who is waiting?
Tomasulo Example Two: Cycle 15
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      14         15
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      15         -
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 Yes 80 [80]*R2; Store2 Yes 72 Mult2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  0     Mult2  MULTD  M[72]   R(F2)   -      -
  Not busy: Add1, Add2, Add3, Mult1
Register result status: R1 holds 64; F0=Load3, F4=Mult2
• Mult2 completing. Who is waiting?
Tomasulo Example Two: Cycle 16
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      14         15
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      15         16
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 Yes 80 [80]*R2; Store2 Yes 72 [72]*R2; Store3 No
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load3  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 64; F0=Load3, F4=Mult1
Tomasulo Example Two: Cycle 17
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      14         15
  1     SD    F4,0(R1)   3      -          -
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      15         16
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 Yes 80 [80]*R2; Store2 Yes 72 [72]*R2; Store3 Yes 64 Mult1
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load3  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 64; F0=Load3, F4=Mult1
Tomasulo Example Two: Cycle 18
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      14         15
  1     SD    F4,0(R1)   3      18         -
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      15         16
  2     SD    F4,0(R1)   8      -          -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 Yes 80 [80]*R2; Store2 Yes 72 [72]*R2; Store3 Yes 64 Mult1
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load3  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 64; F0=Load3, F4=Mult1
Tomasulo Example Two: Cycle 19
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      14         15
  1     SD    F4,0(R1)   3      18         19
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      15         16
  2     SD    F4,0(R1)   8      19         -
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 No; Store2 Yes 72 [72]*R2; Store3 Yes 64 Mult1
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load3  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 64; F0=Load3, F4=Mult1
Tomasulo Example Two: Cycle 20
Instruction status:
  Iter  Instruction      Issue  Exec comp  Write result
  1     LD    F0,0(R1)   1      9          10
  1     MULTD F4,F0,F2   2      14         15
  1     SD    F4,0(R1)   3      18         19
  2     LD    F0,0(R1)   6      10         11
  2     MULTD F4,F0,F2   7      15         16
  2     SD    F4,0(R1)   8      19         20
Load buffers (Busy Addr): Load1 No; Load2 No; Load3 Yes 64
Store buffers (Busy Addr Fu): Store1 No; Store2 No; Store3 Yes 64 Mult1
Reservation stations (busy entries):
  Time  Name   Op     Vj      Vk      Qj     Qk
  -     Mult1  MULTD  -       R(F2)   Load3  -
  Not busy: Add1, Add2, Add3, Mult2
Register result status: R1 holds 64; F0=Load3, F4=Mult1
Tomasulo Example Two
• Why can Tomasulo overlap iterations of loops?
  - Register renaming
    - Multiple iterations use different physical destinations for registers (dynamic loop unrolling)
  - Reservation stations
    - Permit instruction issue to advance past integer control-flow operations
  - Another idea: Tomasulo builds a dynamic "DataFlow" graph from the instructions
Summary: Tomasulo Algorithm
• Reservation stations: renaming to a larger set of registers + buffering of source operands
  - Prevents registers from becoming a bottleneck
  - Avoids the WAR and WAW hazards of the Scoreboard
  - Allows loop unrolling in hardware
• Dynamic hardware schemes can unroll loops dynamically in hardware
  - A form of limited dataflow
  - Register renaming is essential
• Lasting contributions of the Tomasulo Algorithm
  - Dynamic scheduling
  - Register renaming
  - Load/store disambiguation
• IBM 360/91 descendants: Pentium II, PPC 604, MIPS R10000, Alpha 21264, and counting...