Superscalar Architectures: Part 2
Dynamic (Out-of-Order) Scheduling

Lecture 3.2, August 23rd, 2017
HEIG-VD/SNU Summer University 2017: How Modern Processors Work?
Jae W. Lee (jaewlee@snu.ac.kr), Computer Science and Engineering, Seoul National University
Download these lecture slides at https://goo.gl/rJPMQU
Slide credits: [COD5e] and [CA:AQA5e] slides from Elsevier Inc.



Outline

Reference: [CA:AQA5e] Ch. 3.4-3.5

• Instruction-Level Parallelism and Dependences
• Dynamic Scheduling with Tomasulo Algorithm

Instruction-Level Parallelism and Dependences

Instruction-Level Parallelism

• ILP is limited by
  - Resource conflicts
  - Dependences
• Three types of dependences
  - (True) Data dependences
  - Name dependences
  - Control dependences

Data Dependence

• Instruction j is data dependent on instruction i if
  - Instruction i produces a result that may be used by instruction j, or
  - Instruction j is data dependent on instruction k and instruction k is data dependent on instruction i
• Example: which instruction pairs are data dependent?

    Loop: L.D    F0,0(R1)    # F0 = array element
          ADD.D  F4,F0,F2    # add scalar in F2
          S.D    F4,0(R1)    # store result
          DADDUI R1,R1,#-8   # decrement pointer by 8 bytes
          BNE    R1,R2,Loop  # branch if R1 != R2

• Dependent instructions cannot be executed simultaneously
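One way to answer the question on this slide is to track, for every register, the last instruction that wrote it. The sketch below does that for a single pass through the loop body; the split of each instruction into a destination and a list of sources is this sketch's own reading of the code, and dependences carried into the next iteration are ignored.

```python
# Each entry: (text, destination register or None, source registers).
LOOP = [
    ("L.D    F0,0(R1)",   "F0", ["R1"]),
    ("ADD.D  F4,F0,F2",   "F4", ["F0", "F2"]),
    ("S.D    F4,0(R1)",   None, ["F4", "R1"]),
    ("DADDUI R1,R1,#-8",  "R1", ["R1"]),
    ("BNE    R1,R2,Loop", None, ["R1", "R2"]),
]

def direct_raw_pairs(instrs):
    """Return (i, j) pairs where j reads the value most recently written by i."""
    last_writer = {}              # register -> index of its latest writer
    pairs = set()
    for j, (_, dest, sources) in enumerate(instrs):
        for reg in sources:
            if reg in last_writer:
                pairs.add((last_writer[reg], j))
        if dest is not None:
            last_writer[dest] = j
    return pairs

def with_chained_pairs(pairs):
    """Add the chained case: j depends on k and k depends on i."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (i, k) in list(closure):
            for (k2, j) in list(closure):
                if k == k2 and (i, j) not in closure:
                    closure.add((i, j))
                    changed = True
    return closure

direct = direct_raw_pairs(LOOP)
closure = with_chained_pairs(direct)
for i, j in sorted(closure):
    kind = "direct" if (i, j) in direct else "chained"
    print(f"{LOOP[j][0]:<18} depends on {LOOP[i][0]:<18} ({kind})")
```

The chained pair (S.D on L.D) comes from the second bullet of the definition rather than from a shared register.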

Data Dependence

• Dependences are a property of programs
• Pipeline organization determines whether a dependence is detected and whether it causes a stall
  - Read-After-Write (RAW) hazard
• Data dependence conveys:
  - Possibility of a hazard
  - Order in which results must be calculated
  - Upper bound on exploitable instruction-level parallelism
• Dependences that flow through memory locations are difficult to detect
  - "Memory disambiguation" problem
  - Does 100(R4) = 20(R6)?
  - From different loop iterations, does 20(R6) = 20(R6)?
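The two questions on this slide cannot be settled from the instruction text alone, because the answer depends on the base register values at the moment each access executes. A minimal sketch, assuming hypothetical register snapshots chosen only for illustration:

```python
def effective_address(offset, base, regs):
    """Address of offset(base) given a snapshot of register values."""
    return offset + regs[base]

def same_location(ref_a, regs_a, ref_b, regs_b):
    """True if two memory references, each evaluated with the register
    values live at its own execution time, compute the same address."""
    return effective_address(*ref_a, regs_a) == effective_address(*ref_b, regs_b)

# Hypothetical register values: 100(R4) and 20(R6) both compute address 100.
regs = {"R4": 0, "R6": 80}
different_text_same_address = same_location((100, "R4"), regs, (20, "R6"), regs)

# Same instruction text in two loop iterations, with R6 stepped by -8.
iter1 = {"R6": 80}
iter2 = {"R6": 72}
same_text_same_address = same_location((20, "R6"), iter1, (20, "R6"), iter2)

print(different_text_same_address, same_text_same_address)
```

With these values, textually different references collide while textually identical ones do not, which is why the check has to happen at run time.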

Name Dependence

• Two instructions use the same name but there is no flow of information
  - Not a true data dependence, but a problem when reordering instructions
  - Antidependence: instruction j writes a register or memory location that instruction i reads
    - Initial ordering (i before j) must be preserved
    - Causes a Write-After-Read (WAR) hazard
  - Output dependence: instruction i and instruction j write the same register or memory location
    - Ordering must be preserved
    - Causes a Write-After-Write (WAW) hazard
• To resolve, use renaming techniques

Data and Name Dependence: Examples

• (True) Data dependence

    r3 ← (r1) op (r2)
    r5 ← (r3) op (r4)

• Antidependence

    r3 ← (r1) op (r2)
    r1 ← (r4) op (r5)

• Output dependence

    r3 ← (r1) op (r2)
    r3 ← (r4) op (r5)
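The three example pairs can be told apart mechanically by comparing destination and source registers. The sketch below is one possible encoding: each instruction is a hypothetical (destination, sources) tuple, and the labels RAW, WAR and WAW name the hazard each dependence can cause.

```python
def classify(first, second):
    """Classify the dependences of `second` on `first` (program order).

    Each instruction is (dest, [sources]). Returns a set of labels.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 in srcs2:
        kinds.add("RAW")   # true data dependence: second reads what first wrote
    if dest2 in srcs1:
        kinds.add("WAR")   # antidependence: second overwrites what first reads
    if dest1 == dest2:
        kinds.add("WAW")   # output dependence: both write the same register
    return kinds

examples = {
    "data":   (("r3", ["r1", "r2"]), ("r5", ["r3", "r4"])),
    "anti":   (("r3", ["r1", "r2"]), ("r1", ["r4", "r5"])),
    "output": (("r3", ["r1", "r2"]), ("r3", ["r4", "r5"])),
}
for name, (first, second) in examples.items():
    print(name, sorted(classify(first, second)))
```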

Data Hazards

• A data hazard exists if
  - There is a name or data dependence between instructions, and
  - They are close enough that overlap during execution would change the order of access to the operand involved in the dependence
• Three types of data hazards, corresponding to three types of dependences
  - Read after write (RAW) hazard: true data dependence
  - Write after write (WAW) hazard: output dependence
  - Write after read (WAR) hazard: antidependence

Control Dependence

• Ordering of instruction i with respect to a branch instruction
  - An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch
  - An instruction that is not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch

Control Dependence

• Examples

  Example 1:

          DADDU R1,R2,R3
          BEQZ  R4,L
          DSUBU R1,R1,R6
      L:  …
          OR    R7,R1,R8

  • OR instruction is data dependent on DADDU and DSUBU

  Example 2:

          DADDU R1,R2,R3
          BEQZ  R12,skip
          DSUBU R4,R5,R6
          DADDU R5,R4,R9
    skip: OR    R7,R8,R9

  • Assume R4 isn't used after skip
    - Possible to move DSUBU before the branch

Dynamic Scheduling with Tomasulo Algorithm

Dynamic Scheduling

• Rearrange the order of instructions to reduce stalls while maintaining data flow
• Advantages:
  - Compiler doesn't need to have knowledge of the microarchitecture
  - Handles cases where dependences are unknown at compile time
• Disadvantages:
  - Substantial increase in hardware complexity
  - Complicates exceptions

Dynamic Scheduling

• Dynamic scheduling implies:
  - Out-of-order execution
  - Out-of-order completion
• Creates the possibility for WAR and WAW hazards
  - WAR example:

      DIVD F0,F2,F4    // assume this takes a long time
      ADDD F10,F0,F8   // RAW hazard on F0
      SUBD F8,F8,F14   // WAR hazard on F8

• Two popular dynamic scheduling algorithms: Scoreboard and Tomasulo Algorithm
  - Both track when operands are available
  - Tomasulo further introduces register renaming in hardware
    - Minimizes WAW and WAR hazards
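The WAR example can be made concrete with a toy model. The sketch below runs the three instructions in a different execution order (SUBD first, because ADDD is stuck waiting for DIVD) under two policies: reading operands from the register file at execution time, and capturing ready operands at in-order issue. The register values and the two function names are assumptions made for this illustration; the second policy only approximates what renaming hardware does.

```python
import operator

# (name, dest, source a, source b, operation)
PROGRAM = [
    ("DIVD", "F0",  "F2", "F4",  operator.truediv),
    ("ADDD", "F10", "F0", "F8",  operator.add),
    ("SUBD", "F8",  "F8", "F14", operator.sub),
]
# Hypothetical initial register values.
INITIAL = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F8": 1.0, "F10": 0.0, "F14": 0.5}

def run_reading_registers_late(exec_order):
    """Operands are read from the register file when the instruction executes."""
    regs = dict(INITIAL)
    for idx in exec_order:
        name, dest, a, b, fn = PROGRAM[idx]
        regs[dest] = fn(regs[a], regs[b])
    return regs

def run_with_operand_capture(exec_order):
    """Operands are captured, or tagged with their producer, at in-order issue."""
    regs = dict(INITIAL)
    producer = {}                      # register -> index of its pending writer
    captured = []
    for idx, (name, dest, a, b, fn) in enumerate(PROGRAM):
        operands = []
        for reg in (a, b):
            if reg in producer:
                operands.append(("tag", producer[reg]))
            else:
                operands.append(("val", regs[reg]))
        captured.append(operands)
        producer[dest] = idx
    results = {}
    for idx in exec_order:
        name, dest, a, b, fn = PROGRAM[idx]
        values = [results[x] if kind == "tag" else x for kind, x in captured[idx]]
        results[idx] = fn(*values)
        regs[dest] = results[idx]
    return regs

in_order = run_reading_registers_late([0, 1, 2])
broken = run_reading_registers_late([2, 0, 1])     # SUBD overwrites F8 too early
fixed = run_with_operand_capture([2, 0, 1])
print(in_order["F10"], broken["F10"], fixed["F10"])
```

In the reordered run without capture, ADDD sees the F8 value produced by SUBD, which is exactly the WAR hazard named on the slide.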

Tomasulo Algorithm

• Best known dynamic scheduling algorithm
  - Influenced virtually all out-of-order instruction scheduling techniques
  - Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, …
• First introduced for IBM 360/91 (1966)
• Goal: high performance without special compilers

Tomasulo Algorithm

• Register renaming to overcome WAR/WAW hazards (1)
  - Example:

      DIV.D F0,F2,F4
      ADD.D F6,F0,F8
      S.D   F6,0(R1)
      SUB.D F8,F10,F14
      MUL.D F6,F10,F8

  - Name dependences with F6 and F8:
    - Antidependence (WAR): ADD.D reads F8 before SUB.D writes it
    - Output dependence (WAW): ADD.D and MUL.D both write F6

Tomasulo Algorithm

• Register renaming to overcome WAR/WAW hazards (2)
  - Example:

      DIV.D F0,F2,F4
      ADD.D S,F0,F8
      S.D   S,0(R1)
      SUB.D T,F10,F14
      MUL.D F6,F10,T

  - Now only RAW hazards remain, which can be strictly ordered
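The renaming step can be sketched as a single pass that gives every destination a fresh name and rewrites later reads to use it. Note that this sketch renames every destination (to hypothetical names P0, P1, ...), whereas the slide renames only F6 and F8 to S and T; the helper `name_dependences` is also an assumption of this example, used to check that no WAR or WAW pairs survive.

```python
from itertools import count

# (opcode, destination or None, sources); the operand split is this sketch's reading.
CODE = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D",   None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]

def rename(code):
    """Give each written register a fresh name; later reads follow the mapping."""
    fresh = (f"P{n}" for n in count())
    mapping = {}                       # architectural name -> current fresh name
    renamed = []
    for op, dest, sources in code:
        new_sources = [mapping.get(reg, reg) for reg in sources]
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed

def name_dependences(code):
    """List (i, j, kind) for WAR and WAW pairs, with i earlier than j."""
    found = []
    for j, (op_j, dest_j, srcs_j) in enumerate(code):
        if dest_j is None:
            continue
        for i in range(j):
            op_i, dest_i, srcs_i = code[i]
            if dest_j in srcs_i:
                found.append((i, j, "WAR"))
            if dest_j == dest_i:
                found.append((i, j, "WAW"))
    return found

before = name_dependences(CODE)
renamed = rename(CODE)
after = name_dependences(renamed)
print(before)
print(renamed)
print(after)
```

The reads that remain after renaming (for example ADD.D reading the renamed result of DIV.D) are the RAW dependences, which still have to be respected.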

Tomasulo Algorithm

• Register renaming is provided by reservation stations (RS) in the Tomasulo Algorithm
  - Each RS contains:
    - The instruction
    - Buffered operand values (when available)
    - Reservation station number of the instruction providing the operand values
  - RS fetches and buffers an operand as soon as it becomes available (not necessarily involving the register file)
  - Pending instructions designate the RS to which they will send their output
    - Result values are broadcast on a result bus, called the common data bus (CDB)
    - Only the last output updates the register file
  - As instructions are issued, the register specifiers are renamed with the reservation station
  - May be more reservation stations than registers

Tomasulo Algorithm

• Tomasulo organization

  [Figure: Tomasulo organization. Blocks shown: FP Op Queue; FP Registers; Load Buffers (Load1 to Load6, from memory); Store Buffers (to memory); Reservation Stations (Add1 to Add3 feeding the FP adders, Mult1 and Mult2 feeding the FP multipliers); Common Data Bus (CDB).]

Tomasulo Algorithm

• Reservation station (RS) components

  Op:     Operation to perform in the unit (e.g., + or –)
  Vj, Vk: Values of the source operands
          - Store buffers have a V field: the result to be stored
  Qj, Qk: Reservation stations producing the source registers (value to be written)
          - Note: Qj, Qk = 0 => ready
          - Store buffers only have Qi, for the RS producing the result
  Busy:   Indicates that the reservation station or FU is busy

  Register result status: indicates which functional unit will write each register, if one exists. Blank when there are no pending instructions that will write that register.
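The fields listed above map naturally onto a small record type. The following is a sketch only: the class name, the use of `None` in place of "Q = 0", and the `issue` helper are assumptions of this example, and the register value is made up. It mirrors the state of Mult1 after MULTD F0,F2,F4 issues while Load2 still owes F2.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None     # value of first source operand, once known
    vk: Optional[float] = None     # value of second source operand, once known
    qj: Optional[str] = None       # station producing vj; None means ready
    qk: Optional[str] = None       # station producing vk; None means ready

def issue(rs, op, dest, src_j, src_k, regs, reg_status):
    """Fill a free station and rename the destination register to it."""
    rs.busy, rs.op = True, op
    for src, v_field, q_field in ((src_j, "vj", "qj"), (src_k, "vk", "qk")):
        pending = reg_status.get(src)
        if pending is None:
            setattr(rs, v_field, regs[src])    # operand is ready: buffer its value
        else:
            setattr(rs, q_field, pending)      # operand pending: remember the producer
    reg_status[dest] = rs.name                 # register result status entry

regs = {"F2": 0.0, "F4": 2.5}                  # hypothetical register contents
reg_status = {"F2": "Load2"}                   # F2 will be written by Load2
mult1 = ReservationStation("Mult1")
issue(mult1, "MULTD", "F0", "F2", "F4", regs, reg_status)
print(mult1)
print(reg_status)
```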

Tomasulo Algorithm

• Three stages of the Tomasulo Algorithm

  1. Issue: get an instruction from the FP Op Queue
     If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers).
  2. Execution: operate on operands (EX)
     When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
  3. Write result: finish execution (WB)
     Write on the Common Data Bus to all awaiting units; mark the reservation station available.

  - Normal data bus: data + destination ("go to" bus)
  - Common data bus: data + source ("come from" bus)
    - 64 bits of data + 4 bits of Functional Unit source address
    - Write if it matches the expected Functional Unit (the one that produces the result)
    - Does the broadcast
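The "come from" behaviour of the common data bus can be sketched as a broadcast of a (source tag, value) pair that every waiting unit compares against its own Q fields. The `Station` class and `broadcast` function below are hypothetical names for this illustration, and values are kept as strings such as "M(A2)" only to echo the notation of the example tables.

```python
class Station:
    """Minimal reservation station: values in v*, producer tags in q*."""
    def __init__(self, name, op, vj=None, vk=None, qj=None, qk=None):
        self.name, self.op = name, op
        self.vj, self.vk, self.qj, self.qk = vj, vk, qj, qk

    def ready(self):
        return self.qj is None and self.qk is None

def broadcast(tag, value, stations, regs, reg_status):
    """Drive (tag, value) on the bus; listeners grab the value if the tag matches."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:            # only the latest pending writer updates the register
            regs[reg] = value
            del reg_status[reg]

# State loosely modelled on the moment Load2 writes its result.
add1 = Station("Add1", "SUBD", vj="M(A1)", qk="Load2")
mult1 = Station("Mult1", "MULTD", vk="R(F4)", qj="Load2")
regs = {}
reg_status = {"F0": "Mult1", "F2": "Load2", "F8": "Add1"}

broadcast("Load2", "M(A2)", [add1, mult1], regs, reg_status)
print(add1.ready(), mult1.ready(), regs, reg_status)
```

A single broadcast wakes up both waiting stations and the register file at once, which is the point of tagging data with its source rather than its destination.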

Tomasulo Example One: A Straight-Line Code

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2
  LD     F2    45+  R3
  MULTD  F0    F2   F4
  SUBD   F8    F6   F2
  DIVD   F10   F0   F6
  ADDD   F6    F8   F2

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
        Mult2  No

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  0      FU

Tomasulo Example One: Cycle 1

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1
  LD     F2    45+  R3
  MULTD  F0    F2   F4
  SUBD   F8    F6   F2
  DIVD   F10   F0   F6
  ADDD   F6    F8   F2

Load buffers:
  Name   Busy  Address
  Load1  Yes   34+R2
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
        Mult2  No

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  1      FU                    Load1

Tomasulo Example One: Cycle 2

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1
  LD     F2    45+  R3   2
  MULTD  F0    F2   F4
  SUBD   F8    F6   F2
  DIVD   F10   F0   F6
  ADDD   F6    F8   F2

Load buffers:
  Name   Busy  Address
  Load1  Yes   34+R2
  Load2  Yes   45+R3
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
        Mult2  No

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  2      FU         Load2      Load1

• Note: can have multiple loads outstanding

Tomasulo Example One: Cycle 3

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3
  LD     F2    45+  R3   2
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2
  DIVD   F10   F0   F6
  ADDD   F6    F8   F2

Load buffers:
  Name   Busy  Address
  Load1  Yes   34+R2
  Load2  Yes   45+R3
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  Yes   MULTD         R(F4)  Load2
        Mult2  No

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  3      FU  Mult1  Load2      Load1

• Note: register names are removed ("renamed") in reservation stations; MULT issued vs. scoreboard
• Load1 completing; what is waiting for Load1?

Tomasulo Example One: Cycle 4

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4
  DIVD   F10   F0   F6
  ADDD   F6    F8   F2

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  Yes   45+R3
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   Yes   SUBD   M(A1)                Load2
        Add2   No
        Add3   No
        Mult1  Yes   MULTD         R(F4)  Load2
        Mult2  No

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  4      FU  Mult1  Load2      M(A1)    Add1

• Load2 completing; what is waiting for Load2?

Tomasulo Example One: Cycle 5

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
  2     Add1   Yes   SUBD   M(A1)  M(A2)
        Add2   No
        Add3   No
  10    Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  5      FU  Mult1  M(A2)      M(A1)    Add1   Mult2

Tomasulo Example One: Cycle 6

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
  1     Add1   Yes   SUBD   M(A1)  M(A2)
        Add2   Yes   ADDD          M(A2)  Add1
        Add3   No
  9     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  6      FU  Mult1  M(A2)      Add2     Add1   Mult2

• Issue ADDD here?

Tomasulo Example One: Cycle 7

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4      7
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
  0     Add1   Yes   SUBD   M(A1)  M(A2)
        Add2   Yes   ADDD          M(A2)  Add1
        Add3   No
  8     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  7      FU  Mult1  M(A2)      Add2     Add1   Mult2

• Add1 completing; what is waiting for it?

Tomasulo Example One: Cycle 8

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
  2     Add2   Yes   ADDD   (M-M)  M(A2)
        Add3   No
  7     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  8      FU  Mult1  M(A2)      Add2     (M-M)  Mult2

Tomasulo Example One: Cycle 9

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
  1     Add2   Yes   ADDD   (M-M)  M(A2)
        Add3   No
  6     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  9      FU  Mult1  M(A2)      Add2     (M-M)  Mult2

Tomasulo Example One: Cycle 10

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6      10

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
  0     Add2   Yes   ADDD   (M-M)  M(A2)
        Add3   No
  5     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  10     FU  Mult1  M(A2)      Add2     (M-M)  Mult2

• Add2 completing; what is waiting for it?

Tomasulo Example One: Cycle 11

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
  4     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  11     FU  Mult1  M(A2)      (M-M+M)  (M-M)  Mult2

• Write result of ADDD here vs. scoreboard?
• All quick instructions complete in this cycle!

Tomasulo Example One: Cycle 12

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
  3     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  12     FU  Mult1  M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example One: Cycle 13

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
  2     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  13     FU  Mult1  M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example One: Cycle 14

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
  1     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  14     FU  Mult1  M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example One: Cycle 15

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3      15
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
  0     Mult1  Yes   MULTD  M(A2)  R(F4)
        Mult2  Yes   DIVD          M(A1)  Mult1

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  15     FU  Mult1  M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example One: Cycle 16

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3      15         16
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
  40    Mult2  Yes   DIVD   M*F4   M(A1)

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  16     FU  M*F4   M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example One (Cont)

Faster than light computation (skip a couple of cycles)…

Tomasulo Example One: Cycle 55

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3      15         16
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
  1     Mult2  Yes   DIVD   M*F4   M(A1)

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  55     FU  M*F4   M(A2)      (M-M+M)  (M-M)  Mult2

Tomasulo Example One: Cycle 56

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3      15         16
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5      56
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
  0     Mult2  Yes   DIVD   M*F4   M(A1)

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  56     FU  M*F4   M(A2)      (M-M+M)  (M-M)  Mult2

• Mult2 is completing; what is waiting for it?

Tomasulo Example One: Cycle 57

Instruction status:
  Instruction  j    k    Issue  Exec Comp  Write Result
  LD     F6    34+  R2   1      3          4
  LD     F2    45+  R3   2      4          5
  MULTD  F0    F2   F4   3      15         16
  SUBD   F8    F6   F2   4      7          8
  DIVD   F10   F0   F6   5      56         57
  ADDD   F6    F8   F2   6      10         11

Load buffers:
  Name   Busy  Address
  Load1  No
  Load2  No
  Load3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
        Mult2  Yes   DIVD   M*F4   M(A1)

Register result status:
  Clock      F0     F2     F4  F6       F8     F10    F12  ...  F30
  57     FU  M*F4   M(A2)      (M-M+M)  (M-M)  Result

• Once again: in-order issue, out-of-order execution and completion.
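The Issue, Exec Comp and Write Result columns of Example One can be reproduced with a very small timing model. This is a sketch under several assumptions that are read off the tables rather than stated in the slides: latencies of 2 cycles for loads and adds/subtracts, 10 for multiply and 40 for divide; one instruction issued per cycle; a reservation station always free at issue; execution starting the cycle after the last operand is written; and no contention for the common data bus. Address registers R2 and R3 are treated as always ready.

```python
# Latencies inferred from the timer column of the example tables (assumption).
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (opcode, destination, FP source registers)
PROGRAM = [
    ("LD",    "F6",  []),
    ("LD",    "F2",  []),
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUBD",  "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADDD",  "F6",  ["F8", "F2"]),
]

def schedule(program):
    """Return (issue, exec_complete, write_result) cycles per instruction."""
    last_writer = {}                  # register -> index of producer at issue time
    rows = []
    for n, (op, dest, sources) in enumerate(program):
        issue = n + 1                 # in-order, one instruction per cycle
        ready = issue
        for reg in sources:
            if reg in last_writer:
                producer_write = rows[last_writer[reg]][2]
                ready = max(ready, producer_write)
        exec_complete = ready + LATENCY[op]
        write_result = exec_complete + 1
        rows.append((issue, exec_complete, write_result))
        last_writer[dest] = n         # renaming: later readers wait on this producer
    return rows

timing = schedule(PROGRAM)
for (op, dest, _), row in zip(PROGRAM, timing):
    print(f"{op:<6}{dest:<4}{row}")
```

Because each source is bound to its producer at issue time, DIVD waits on the first load for F6 rather than on the later ADDD, which is how the model gets away without any WAR check.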

Tomasulo Example Two: A Loop

• Loop example code

    Loop: LD    F0 0 R1
          MULTD F4 F0 F2
          SD    F4 0 R1
          SUBI  R1 R1 #8
          BNEZ  R1 Loop

  - Assume multiply takes 4 clocks
  - Assume the first load takes 8 clocks (cache miss), the second load takes 1 clock (hit)
  - To be clear, will show clocks for SUBI, BNEZ
  - Reality: integer instructions ahead

Tomasulo Example Two: A Loop

Instruction status:
  Iter  Instruction  j    k    Issue  Exec Comp  Write Result
  1     LD     F0    0    R1
  1     MULTD  F4    F0   F2
  1     SD     F4    0    R1
  2     LD     F0    0    R1
  2     MULTD  F4    F0   F2
  2     SD     F4    0    R1

Load and store buffers:
  Name    Busy  Addr  Fu
  Load1   No
  Load2   No
  Load3   No
  Store1  No
  Store2  No
  Store3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
        Mult2  No

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:
  Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
  0      80  Fu

Tomasulo Example Two: Cycle 1

Instruction status:
  Iter  Instruction  j    k    Issue  Exec Comp  Write Result
  1     LD     F0    0    R1   1
  1     MULTD  F4    F0   F2
  1     SD     F4    0    R1
  2     LD     F0    0    R1
  2     MULTD  F4    F0   F2
  2     SD     F4    0    R1

Load and store buffers:
  Name    Busy  Addr  Fu
  Load1   Yes   80
  Load2   No
  Load3   No
  Store1  No
  Store2  No
  Store3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  No
        Mult2  No

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:
  Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
  1      80  Fu  Load1

Tomasulo Example Two: Cycle 2

Instruction status:
  Iter  Instruction  j    k    Issue  Exec Comp  Write Result
  1     LD     F0    0    R1   1
  1     MULTD  F4    F0   F2   2
  1     SD     F4    0    R1
  2     LD     F0    0    R1
  2     MULTD  F4    F0   F2
  2     SD     F4    0    R1

Load and store buffers:
  Name    Busy  Addr  Fu
  Load1   Yes   80
  Load2   No
  Load3   No
  Store1  No
  Store2  No
  Store3  No

Reservation stations (Vj/Vk: S1/S2 values; Qj/Qk: producing RS):
  Time  Name   Busy  Op     Vj     Vk     Qj     Qk
        Add1   No
        Add2   No
        Add3   No
        Mult1  Yes   MULTD         R(F2)  Load1
        Mult2  No

Code: LD F0 0 R1; MULTD F4 F0 F2; SD F4 0 R1; SUBI R1 R1 #8; BNEZ R1 Loop

Register result status:
  Clock  R1      F0     F2  F4     F6  F8  F10  F12  ...  F30
  2      80  Fu  Load1      Mult1

Tomasulo Example Two: Cycle 3

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1                    Load1   Yes   80
1     MULTD F4     F0  F2  2                    Load2   No
1     SD    F4     0   R1  3                    Load3   No
2     LD    F0     0   R1                       Store1  Yes   80    Mult1
2     MULTD F4     F0  F2                       Store2  No
2     SD    F4     0   R1                       Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load1         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
3      80  Fu  Load1         Mult1

•  Implicit renaming sets up "Data Flow" graph
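
As a rough illustration of how issue-time renaming builds that graph, here is a minimal Python sketch. It is not taken from the lecture: the `issue` function and the dictionary layout are assumptions made for this example, while the station names and the three instructions follow the cycle 1 to 3 tables. The integer base register R1 is left out because the slides track producer tags only for the floating-point registers.

```python
# Sketch only: `issue` and `reg_status` are hypothetical names, not from the slides.
# reg_status maps a register to the station/buffer tag that will produce its value.

def issue(reg_status, station, dest, sources):
    """Issue one instruction into `station` and return its dataflow edges."""
    # Every source that is still pending becomes an edge producer -> station.
    edges = [(reg_status[s], station) for s in sources if s in reg_status]
    if dest is not None:
        reg_status[dest] = station  # rename: dest now means "result of station"
    return edges

reg_status = {}
graph = []
graph += issue(reg_status, "Load1", "F0", [])            # cycle 1: LD F0 0 R1
graph += issue(reg_status, "Mult1", "F4", ["F0", "F2"])  # cycle 2: MULTD F4 F0 F2
graph += issue(reg_status, "Store1", None, ["F4"])       # cycle 3: SD F4 0 R1

print(graph)       # [('Load1', 'Mult1'), ('Mult1', 'Store1')]
print(reg_status)  # {'F0': 'Load1', 'F4': 'Mult1'}
```

The two edges correspond to the Qj entry of Mult1 (Load1) and the Fu entry of Store1 (Mult1) in the cycle 3 tables.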

Tomasulo Example Two: Cycle 4

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1                    Load1   Yes   80
1     MULTD F4     F0  F2  2                    Load2   No
1     SD    F4     0   R1  3                    Load3   No
2     LD    F0     0   R1                       Store1  Yes   80    Mult1
2     MULTD F4     F0  F2                       Store2  No
2     SD    F4     0   R1                       Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load1         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
4      80  Fu  Load1         Mult1

•  Dispatching SUBI instruction

Tomasulo Example Two: Cycle 5

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1                    Load1   Yes   80
1     MULTD F4     F0  F2  2                    Load2   No
1     SD    F4     0   R1  3                    Load3   No
2     LD    F0     0   R1                       Store1  Yes   80    Mult1
2     MULTD F4     F0  F2                       Store2  No
2     SD    F4     0   R1                       Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load1         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
5      72  Fu  Load1         Mult1

•  Dispatching BNEZ instruction (R1 has already been decremented to 72)

Tomasulo Example Two: Cycle 6

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1                    Load1   Yes   80
1     MULTD F4     F0  F2  2                    Load2   Yes   72
1     SD    F4     0   R1  3                    Load3   No
2     LD    F0     0   R1  6                    Store1  Yes   80    Mult1
2     MULTD F4     F0  F2                       Store2  No
2     SD    F4     0   R1                       Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load1         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
6      72  Fu  Load2         Mult1

•  Notice that F0 never sees the load from location 80
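
The reason F0 never receives M[80] can be sketched in a few lines of Python. This is an illustrative model, not code from the lecture: `write_back` and the dictionaries are hypothetical, and the sketch stops at the second load (it ignores the third LD, which retags F0 again in cycle 11).

```python
# Sketch only: `write_back` and the dict layout are assumptions for illustration.

def write_back(regs, reg_status, tag, value):
    """Copy a broadcast result into every register still tagged with `tag`."""
    waiting = [r for r, producer in reg_status.items() if producer == tag]
    for reg in waiting:
        regs[reg] = value
        del reg_status[reg]  # register is no longer pending

regs = {"F0": "old F0"}
reg_status = {"F0": "Load1"}   # cycle 1: LD of iteration 1 issues
reg_status["F0"] = "Load2"     # cycle 6: LD of iteration 2 issues, replacing the tag

write_back(regs, reg_status, "Load1", "M[80]")  # no register is tagged Load1
f0_after_load1 = regs["F0"]                     # unchanged

write_back(regs, reg_status, "Load2", "M[72]")  # F0 is tagged Load2
f0_after_load2 = regs["F0"]

print(f0_after_load1, "|", f0_after_load2)      # old F0 | M[72]
```

The value M[80] is not lost: it is still delivered to Mult1, which captured the tag Load1 when it issued.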

Tomasulo Example Two: Cycle 7

•  Register file completely detached from computation

•  First and second iteration completely overlapped

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1                    Load1   Yes   80
1     MULTD F4     F0  F2  2                    Load2   Yes   72
1     SD    F4     0   R1  3                    Load3   No
2     LD    F0     0   R1  6                    Store1  Yes   80    Mult1
2     MULTD F4     F0  F2  7                    Store2  No
2     SD    F4     0   R1                       Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load1         SUBI  R1 R1 #8
      Mult2  Yes   Multd         R(F2)  Load2         BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
7      72  Fu  Load2         Mult2

Tomasulo Example Two: Cycle 8

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1                    Load1   Yes   80
1     MULTD F4     F0  F2  2                    Load2   Yes   72
1     SD    F4     0   R1  3                    Load3   No
2     LD    F0     0   R1  6                    Store1  Yes   80    Mult1
2     MULTD F4     F0  F2  7                    Store2  Yes   72    Mult2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load1         SUBI  R1 R1 #8
      Mult2  Yes   Multd         R(F2)  Load2         BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
8      72  Fu  Load2         Mult2

Tomasulo Example Two: Cycle 9

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9             Load1   Yes   80
1     MULTD F4     F0  F2  2                    Load2   Yes   72
1     SD    F4     0   R1  3                    Load3   No
2     LD    F0     0   R1  6                    Store1  Yes   80    Mult1
2     MULTD F4     F0  F2  7                    Store2  Yes   72    Mult2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load1         SUBI  R1 R1 #8
      Mult2  Yes   Multd         R(F2)  Load2         BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
9      72  Fu  Load2         Mult2

•  Load1 completing: who is waiting?

•  Note: Dispatching SUBI
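
The "who is waiting?" question is answered by matching the completing unit's tag against the Qj/Qk fields. The following Python sketch shows that match for the cycle 9 state; the `broadcast` function and the dictionary layout are assumptions for illustration, not the lecture's code, and only the two multiply stations are modelled.

```python
# Sketch only: `broadcast` is a hypothetical helper modelling the result broadcast.

def broadcast(stations, tag, value):
    """Deliver `value` to every operand waiting on `tag`; return who woke up."""
    woken = []
    for name, rs in stations.items():
        for q, v in (("Qj", "Vj"), ("Qk", "Vk")):
            if rs[q] == tag:
                rs[v] = value  # capture the value
                rs[q] = None   # operand is no longer pending
                woken.append(name)
    return woken

# Reservation-station state copied from the cycle 9 table.
stations = {
    "Mult1": {"Vj": None, "Vk": "R(F2)", "Qj": "Load1", "Qk": None},
    "Mult2": {"Vj": None, "Vk": "R(F2)", "Qj": "Load2", "Qk": None},
}

woken = broadcast(stations, "Load1", "M[80]")
print(woken)                     # ['Mult1']
print(stations["Mult1"]["Vj"])   # M[80]
```

Only Mult1 is waiting on Load1; Mult2 keeps waiting on Load2, and F0 is not updated because its tag is already Load2.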

Tomasulo Example Two: Cycle 10

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2                    Load2   Yes   72
1     SD    F4     0   R1  3                    Load3   No
2     LD    F0     0   R1  6      10            Store1  Yes   80    Mult1
2     MULTD F4     F0  F2  7                    Store2  Yes   72    Mult2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
4     Mult1  Yes   Multd  M[80]  R(F2)                SUBI  R1 R1 #8
      Mult2  Yes   Multd         R(F2)  Load2         BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
10     64  Fu  Load2         Mult2

•  Load2 completing: who is waiting?

•  Note: Dispatching BNEZ

Tomasulo Example Two: Cycle 11

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2                    Load2   No
1     SD    F4     0   R1  3                    Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  Yes   80    Mult1
2     MULTD F4     F0  F2  7                    Store2  Yes   72    Mult2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
3     Mult1  Yes   Multd  M[80]  R(F2)                SUBI  R1 R1 #8
4     Mult2  Yes   Multd  M[72]  R(F2)                BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
11     64  Fu  Load3         Mult2

•  Next load in sequence

Tomasulo Example Two: Cycle 12

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2                    Load2   No
1     SD    F4     0   R1  3                    Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  Yes   80    Mult1
2     MULTD F4     F0  F2  7                    Store2  Yes   72    Mult2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
2     Mult1  Yes   Multd  M[80]  R(F2)                SUBI  R1 R1 #8
3     Mult2  Yes   Multd  M[72]  R(F2)                BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
12     64  Fu  Load3         Mult2

•  Why not issue third multiply?
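
The table suggests the answer: both multiply reservation stations are busy, so the third MULTD has nowhere to go. A minimal Python sketch of that structural check follows; `try_issue` is a hypothetical helper written for this note, and the busy flags are taken from the cycle 12 table.

```python
# Sketch only: `try_issue` is an assumed name, not part of the lecture material.

def try_issue(busy, candidates):
    """Claim the first free station among `candidates`, or return None to stall."""
    for name in candidates:
        if not busy[name]:
            busy[name] = True
            return name
    return None

busy = {"Mult1": True, "Mult2": True}            # cycle 12: both stations occupied
stalled = try_issue(busy, ["Mult1", "Mult2"])    # None: the third MULTD must wait

busy["Mult1"] = False                            # cycle 15: Mult1 writes its result
granted = try_issue(busy, ["Mult1", "Mult2"])    # third MULTD takes Mult1

print(stalled, granted)                          # None Mult1
```

This matches the later tables, where Mult1 is free in cycle 15 and holds the third multiply (waiting on Load3) from cycle 16 on.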

Tomasulo Example Two: Cycle 13

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2                    Load2   No
1     SD    F4     0   R1  3                    Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  Yes   80    Mult1
2     MULTD F4     F0  F2  7                    Store2  Yes   72    Mult2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
1     Mult1  Yes   Multd  M[80]  R(F2)                SUBI  R1 R1 #8
2     Mult2  Yes   Multd  M[72]  R(F2)                BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
13     64  Fu  Load3         Mult2

Tomasulo Example Two: Cycle 14

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2      14            Load2   No
1     SD    F4     0   R1  3                    Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  Yes   80    Mult1
2     MULTD F4     F0  F2  7                    Store2  Yes   72    Mult2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
0     Mult1  Yes   Multd  M[80]  R(F2)                SUBI  R1 R1 #8
1     Mult2  Yes   Multd  M[72]  R(F2)                BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
14     64  Fu  Load3         Mult2

•  Mult1 completing. Who is waiting?
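
The multiply completion cycles in these tables follow from the Time column, which shows 4 when the last operand arrives and counts down to 0. The short Python sketch below reproduces that arithmetic; the `schedule` helper is hypothetical, and it assumes a result is written one cycle after execution completes, which holds for both multiplies in this example but not necessarily in general.

```python
# Sketch only: `schedule` is an assumed helper; the 4-cycle latency is read off
# the Time column (4 at cycle 10, 0 at cycle 14).
MULT_LATENCY = 4

def schedule(operand_write_cycle, latency=MULT_LATENCY):
    """Return (exec_complete, write_result) for an operation whose last
    operand is broadcast in `operand_write_cycle`."""
    exec_complete = operand_write_cycle + latency
    return exec_complete, exec_complete + 1

mult1 = schedule(10)  # Load1 writes its result in cycle 10
mult2 = schedule(11)  # Load2 writes its result in cycle 11
print(mult1, mult2)   # (14, 15) (15, 16)
```

These are the Exec Comp and Write Result entries of the two MULTD rows. When Mult1 writes in cycle 15, the unit waiting on it is Store1, whose Fu field holds the tag Mult1.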

Tomasulo Example Two: Cycle 15

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2      14    15      Load2   No
1     SD    F4     0   R1  3                    Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  Yes   80    [80]*R2
2     MULTD F4     F0  F2  7      15            Store2  Yes   72    Mult2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  No                                       SUBI  R1 R1 #8
0     Mult2  Yes   Multd  M[72]  R(F2)                BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
15     64  Fu  Load3         Mult2

•  Mult2 completing. Who is waiting?

Tomasulo Example Two: Cycle 16

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2      14    15      Load2   No
1     SD    F4     0   R1  3                    Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  Yes   80    [80]*R2
2     MULTD F4     F0  F2  7      15    16      Store2  Yes   72    [72]*R2
2     SD    F4     0   R1  8                    Store3  No

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load3         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
16     64  Fu  Load3         Mult1

Tomasulo Example Two: Cycle 17

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2      14    15      Load2   No
1     SD    F4     0   R1  3                    Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  Yes   80    [80]*R2
2     MULTD F4     F0  F2  7      15    16      Store2  Yes   72    [72]*R2
2     SD    F4     0   R1  8                    Store3  Yes   64    Mult1

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load3         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
17     64  Fu  Load3         Mult1

Tomasulo Example Two: Cycle 18

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2      14    15      Load2   No
1     SD    F4     0   R1  3      18            Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  Yes   80    [80]*R2
2     MULTD F4     F0  F2  7      15    16      Store2  Yes   72    [72]*R2
2     SD    F4     0   R1  8                    Store3  Yes   64    Mult1

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load3         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
18     64  Fu  Load3         Mult1

Tomasulo Example Two: Cycle 19

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2      14    15      Load2   No
1     SD    F4     0   R1  3      18    19      Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  No
2     MULTD F4     F0  F2  7      15    16      Store2  Yes   72    [72]*R2
2     SD    F4     0   R1  8      19            Store3  Yes   64    Mult1

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load3         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
19     64  Fu  Load3         Mult1

Tomasulo Example Two: Cycle 20

Instruction status:               Exec  Write
ITER  Instruction  j   k   Issue  Comp  Result          Busy  Addr  Fu
1     LD    F0     0   R1  1      9     10      Load1   No
1     MULTD F4     F0  F2  2      14    15      Load2   No
1     SD    F4     0   R1  3      18    19      Load3   Yes   64
2     LD    F0     0   R1  6      10    11      Store1  No
2     MULTD F4     F0  F2  7      15    16      Store2  No
2     SD    F4     0   R1  8      19    20      Store3  Yes   64    Mult1

Reservation stations:     S1     S2     RS
Time  Name   Busy  Op     Vj     Vk     Qj     Qk     Code:
      Add1   No                                       LD    F0 0  R1
      Add2   No                                       MULTD F4 F0 F2
      Add3   No                                       SD    F4 0  R1
      Mult1  Yes   Multd         R(F2)  Load3         SUBI  R1 R1 #8
      Mult2  No                                       BNEZ  R1 Loop

Register result status:
Clock  R1      F0     F2     F4     F6  F8  F10  F12  ...  F30
20     64  Fu  Load3         Mult1

Tomasulo Example Two

•  Why can Tomasulo overlap iterations of loops?

    •  Register renaming
        •  Multiple iterations use different physical destinations for registers (dynamic loop unrolling).

    •  Reservation stations
        •  Permit instruction issue to advance past integer control flow operations

    •  Other idea: Tomasulo building dynamic "Data Flow" graph from instructions
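
The "different physical destinations" point can be made concrete with a small Python sketch that lists which buffers and stations each loop iteration lands in. It is an illustration only: `rename_iterations` is a hypothetical helper, and the round-robin assignment is an assumption made for brevity (real hardware takes any free buffer or station). The starting address 80 and the step of 8 come from the example's R1 value and its SUBI instruction.

```python
# Sketch only: round-robin assignment is an assumption made for brevity;
# real hardware would pick any free buffer or station.

def rename_iterations(r1, count, loads, mults, stores, step=8):
    """List (address, load buffer, multiply station, store buffer) per iteration."""
    plan = []
    for i in range(count):
        plan.append((
            r1 - step * i,             # SUBI R1 R1 #8 moves the address each time
            loads[i % len(loads)],     # physical destination standing in for F0
            mults[i % len(mults)],     # physical destination standing in for F4
            stores[i % len(stores)],
        ))
    return plan

plan = rename_iterations(
    80, 3,
    loads=["Load1", "Load2", "Load3"],
    mults=["Mult1", "Mult2"],
    stores=["Store1", "Store2", "Store3"],
)
for row in plan:
    print(row)
# (80, 'Load1', 'Mult1', 'Store1')
# (72, 'Load2', 'Mult2', 'Store2')
# (64, 'Load3', 'Mult1', 'Store3')
```

All three iterations name F0 and F4 in the program text, yet each one works on its own buffer and station, which is why they can be in flight at the same time. The third iteration reuses Mult1 only after the first multiply has written its result.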

Summary: Tomasulo Algorithm

•  Reservation stations: renaming to a larger set of registers + buffering source operands

    •  Prevents registers from becoming a bottleneck
    •  Avoids WAR, WAW hazards of Scoreboard
    •  Allows loop unrolling in HW

•  Dynamic hardware schemes can unroll loops dynamically in hardware

    •  Form of limited dataflow
    •  Register renaming is essential

•  Lasting Contributions of Tomasulo Algorithm

    •  Dynamic scheduling
    •  Register renaming
    •  Load/store disambiguation

•  IBM 360/91 descendants: Pentium II, PPC 604, MIPS R10000, Alpha 21264, and counting...
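
To illustrate the summary's claim that buffering source operands avoids WAR hazards, here is a minimal Python sketch. It is not from the lecture: the numeric values are made up, and the later write to F2 is a hypothetical instruction that does not appear in the example loop.

```python
# Sketch only: values and the later write to F2 are hypothetical.
regs = {"F2": 2.0}

# Issue MULTD F4 F0 F2: F0 is still pending (tag Load1), F2 is ready,
# so its value is copied into the reservation station right away.
mult1 = {"Vj": None, "Vk": regs["F2"], "Qj": "Load1"}

# A younger instruction overwrites F2 before the MULTD has executed (WAR).
regs["F2"] = 99.0

# Load1 finally broadcasts its value; the MULTD can now execute.
mult1["Vj"], mult1["Qj"] = 3.0, None
result = mult1["Vj"] * mult1["Vk"]
print(result)  # 6.0, computed from the old value of F2
```

Because the station holds its own copy of F2, the younger write does not have to wait for the older read, which is the stall a scoreboard would impose.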