Lecture 03 Instruction Set Principles · 2019-01-10

CSCE 513 Computer Architecture
Department of Computer Science and Engineering
Yonghong Yan
cse.sc.edu/~yanyh

Contents
1. Introduction
2. Classifying Instruction Set Architectures
3. Memory Addressing
4. Type and Size of Operands
5. Operations in the Instruction Set
6. Instructions for Control Flow
7. Encoding an Instruction Set
8. Crosscutting Issues: The Role of Compilers
9. RISC-V ISA
• Supplement (not covered)
  – RISC vs CISC
  – Comparison of ISA
• Appendix K

1 Introduction
Instruction Set Architecture: the portion of the machine visible to the assembly-level programmer or to the compiler writer
– To use the hardware of a computer, we must speak its language
– The words of a computer language are called instructions, and its vocabulary is called an instruction set
[Figure: the instruction set sits between software and hardware]
    Instr. #   Operation + Operands
    i          movl -4(%ebp), %eax
    (i+1)      addl %eax, (%edx)
    (i+2)      cmpl 8(%ebp), %eax
    (i+3)      jl L5
               :
    L5:

sum.s for X86
• http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
• https://en.wikibooks.org/wiki/X86_Assembly/SSE
2 operands
-8(%eax): Memory address

sum.s for RISC-V
https://riscv.org/
2 or 3 operands
-20(s0): Memory address

ISA in Real
• A PDF document that defines the model/architecture/interface of the machine
  – X86 and Intel SDM: https://software.intel.com/en-us/articles/intel-sdm
    • Several thousand pages
  – RISC-V ISA Spec: https://riscv.org/specifications/
    • Latest version 2.2, 145 pages
• A specification that provides the ISA details
• Review Chapter 2 of the COD book

2 Classifying Instruction Set Architectures
• Operand storage in CPU: where are the operands kept, other than memory?
• # explicit operands named per instruction: how many? Min, max, average
• Addressing mode: how is the effective address for an operand calculated? Can all operands use any mode?
• Operations: what are the options for the opcode?
• Type & size of operands: how is typing done? How is the size specified?
These choices critically affect number of instructions, CPI, and CPU cycle time

ISA Classification
• Most basic differentiation: internal storage in a processor
  – Operands may be named explicitly or implicitly
• Major choices:
  1. In an accumulator architecture, one operand is implicitly the accumulator => similar to a calculator
  2. The operands in a stack architecture are implicitly on the top of the stack
  3. The general-purpose register architectures have only explicit operands: either registers or memory locations
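
To make the three choices concrete, the sketch below evaluates C = A + B on toy stack, accumulator, and load-store register machines. The mnemonics (`push`, `pop`, `load`, `add`, `store`) and the tiny interpreters are illustrative assumptions, not any real ISA.

```python
# Toy interpreters for C = A + B under three operand-storage models.
# Mnemonics and machine models are illustrative assumptions, not a real ISA.

def run_stack(program, mem):
    stack = []
    for op, *args in program:
        if op == "push":              # push Mem[x] onto the stack
            stack.append(mem[args[0]])
        elif op == "add":             # both operands are implicit: top two entries
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":             # pop top of stack into Mem[x]
            mem[args[0]] = stack.pop()
    return mem

def run_accumulator(program, mem):
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":             # one operand is implicitly the accumulator
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
    return mem

def run_load_store(program, mem):
    regs = {}
    for op, *args in program:
        if op == "load":              # load rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "add":             # add rd, rs1, rs2: all operands explicit
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":           # store rs, addr
            mem[args[1]] = regs[args[0]]
    return mem

stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
acc_prog = [("load", "A"), ("add", "B"), ("store", "C")]
ls_prog = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "R3", "C")]

for run, prog in [(run_stack, stack_prog), (run_accumulator, acc_prog),
                  (run_load_store, ls_prog)]:
    result = run(prog, {"A": 2, "B": 3, "C": 0})
    print(run.__name__, len(prog), "instructions, C =", result["C"])
```

All three programs compute the same result; what differs is how many operands each instruction has to name explicitly.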

Four ISA Classes
• Register-memory: X86 (CISC)
• Register-register: RISC (e.g. ARM, MIPS, RISC-V, Power)

Register Machines
• How many registers are sufficient?
• General-purpose registers vs. special-purpose registers
  • compiler flexibility and hand-optimization
• Two major concerns for arithmetic and logical instructions (ALU)
  1. Two or three operands
     • X + Y → X
     • X + Y → Z
  2. How many of the operands may be memory addresses (0 – 3)
Hence, register architecture classification (# mem, # operands)

| Number of memory addresses | Maximum number of operands allowed | Type of architecture | Examples |
|---|---|---|---|
| 0 | 3 | Load-store | Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32 |
| 1 | 2 | Register-memory | IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x |
| 2 | 2 | Memory-memory | VAX (also has 3-operand formats) |
| 3 | 3 | Memory-memory | VAX (also has 2-operand formats) |

(0,3): Register-Register (RISC)
• ALU is register to register, also known as pure Reduced Instruction Set Computer (RISC)
• Advantages
  – Simple fixed-length instruction encoding
  – Decode is simple since there are few instruction types
  – Simple code generation model
  – Instruction CPI tends to be very uniform
    • Except for memory instructions of course, but there are only 2 of them: load and store
• Disadvantages
  – Instruction count tends to be higher
  – Some instructions are short, wasting instruction word bits

(1,2): Register-Memory (CISC, X86)
• Evolved RISC and also old CISC
  – new RISC machines capable of doing speculative loads
  – predicated and/or deferred loads are also possible
• Advantages
  – data access to ALU immediate without loading first
  – instruction format is relatively simple to encode
  – code density is improved over Register (0,3) model
• Disadvantages
  – operands are not equivalent: source operand may be destroyed
  – need for memory address field may limit # of registers
  – CPI will vary
    • if memory target is in L0 cache then not so bad
    • if not, life gets miserable

(2,2) or (3,3): Memory-Memory
Not used today
• True and most complex CISC model
  – currently extinct and likely to remain so
  – more complex memory actions are likely to appear but not directly linked to the ALU
• Advantages
  – most compact code
  – doesn't waste registers for temporary values
    • good idea for use-once data, e.g. streaming media
• Disadvantages
  – large variation in instruction size: may need a shoe-horn
  – large variation in CPI, i.e. work per instruction
  – exacerbates the infamous memory bottleneck
    • register file reduces memory accesses if reused

Summary: Tradeoffs for the ISA Classes

| Type | Advantages | Disadvantages |
|---|---|---|
| Register-register (0,3) | Simple, fixed-length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute. | Higher instruction count than architectures with memory references in the instructions. More instructions and lower instruction density lead to larger programs. |
| Register-memory (1,2) | Data can be accessed without a separate load instruction first. Instruction format tends to be easy to encode and yields good density. | Operands are not equivalent since a source operand is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary by operand location. |
| Memory-memory (2,2) or (3,3) | Most compact. Does not waste registers for temporaries. | Large variation in instruction size, especially for three-operand instructions. In addition, large variation in work per instruction. Memory accesses create memory bottleneck. (Not used today) |

3 Memory Addressing
• Objects have byte addresses: the number of bytes counted from the beginning of memory
• Object length:
  – bytes (8 bits), half words (16 bits),
  – words (32 bits), and double words (64 bits).
  – The type is implied in the opcode, e.g.,
    • LDB – load byte
    • LDW – load word, etc.
• Byte ordering
  – Little Endian: puts the byte whose address is xx00 at the least significant position in the word. (7,6,5,4,3,2,1,0)
  – Big Endian: puts the byte whose address is xx00 at the most significant position in the word. (0,1,2,3,4,5,6,7)
• Problem occurs when exchanging data among machines with different orderings
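
A small sketch of the two byte orderings, using Python's built-in `int.to_bytes` and `int.from_bytes`. The 32-bit value `0x12345678` is an arbitrary example; index 0 of each byte string corresponds to the lowest address.

```python
value = 0x12345678

# Byte at the lowest address comes first in each bytes object.
little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")

print(little.hex())  # 78563412: least significant byte at the lowest address
print(big.hex())     # 12345678: most significant byte at the lowest address

# The exchange problem: a big-endian reader interpreting little-endian bytes.
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))  # 0x78563412
```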

Interpreting Memory Addresses
• Alignment issues
  – Accesses to objects larger than a byte must be aligned.
    • An access to an object of size s bytes at byte address A is aligned if A mod s = 0.
  – Misalignment causes hardware complications
    • since memory is typically aligned on a word or a double-word boundary
    • Misalignment typically results in an alignment fault that must be handled by the OS
• Hence
  – byte address is anything: never misaligned
  – half word: even addresses, low-order address bit = 0 (XXXXXXX0), else trap
  – word: low-order 2 address bits = 0 (XXXXXX00), else trap
  – double word: low-order 3 address bits = 0 (XXXXX000), else trap
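
The alignment rule can be checked in two equivalent ways for power-of-two sizes, sketched below. The function names are hypothetical and chosen only for this illustration.

```python
def is_aligned(address, size):
    """Definition from the slide: aligned if A mod s == 0."""
    return address % size == 0

def is_aligned_low_bits(address, size):
    """Same test for power-of-two sizes: the low-order log2(size) bits are 0."""
    return (address & (size - 1)) == 0

for size, name in [(1, "byte"), (2, "half word"), (4, "word"), (8, "double word")]:
    aligned = [a for a in range(16) if is_aligned(a, size)]
    print(f"{name:12s} aligned addresses in 0..15: {aligned}")
```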

Memory Alignment

Aligned/Misaligned Addresses

Addressing Modes
• How does an architecture specify the effective address of an object?
  – Effective address: the actual memory address specified by the addressing mode.
    • E.g. Mem[R[R1]] refers to the contents of the memory location whose address is given by the contents of register 1 (R1).
• Addressing modes:
  – Register
  – Immediate
  – Displacement
  – Register indirect, ...
-20(s0): Memory address
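
A minimal sketch of how the four listed modes produce an operand. The register and memory contents are made-up values, and the helper names are hypothetical.

```python
# Made-up machine state for illustration only.
regs = {"R1": 0x1000, "s0": 0x2000}
mem = {0x1000: 42, 0x2000 - 20: 7}

def operand_register(reg):
    return regs[reg]                 # Register: the operand is R[reg], no memory access

def operand_immediate(value):
    return value                     # Immediate: the operand is in the instruction itself

def ea_register_indirect(reg):
    return regs[reg]                 # Register indirect: effective address = R[reg]

def ea_displacement(disp, reg):
    return regs[reg] + disp          # Displacement: effective address = R[reg] + disp

print(mem[ea_register_indirect("R1")])        # Mem[R[R1]]
print(hex(ea_displacement(-20, "s0")))        # address named by -20(s0)
print(mem[ea_displacement(-20, "s0")])        # Mem[R[s0] - 20]
```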

Address Modes

Addressing Mode Impacts
• Instruction counts
• Architecture complexity
• CPI

Summary of Use of Memory Addressing Modes

Displacement Values are Widely Distributed
Impact instruction length

Displacement Addressing Mode
• Benchmarks show
  – 12 bits of displacement would capture about 75% of the full 32-bit displacements
  – 16 bits should capture about 99%
• Remember: optimize for the common case. Hence, the choice is at least 12-16 bits
• For addresses that do fit in displacement size:
    Add R4, 10000(R0)
• For addresses that don't fit in displacement size, the compiler must do the following:
    Load R1, 1000000
    Add R1, R0
    Add R4, 0(R1)
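
The compiler's decision above can be sketched as a range check on a signed displacement field. The 16-bit default width, the scratch register choice, and the function names are assumptions made for this example; the emitted text mirrors the slide's notation.

```python
def fits_signed(value, bits):
    """True if value is representable in a two's complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def emit_add_from_memory(dest, disp, base, bits=16, scratch="R1"):
    """Return the instruction sequence for dest += Mem[base + disp]."""
    if fits_signed(disp, bits):
        return [f"Add {dest}, {disp}({base})"]
    # Displacement too large: build the address in a scratch register first.
    return [f"Load {scratch}, {disp}",
            f"Add {scratch}, {base}",
            f"Add {dest}, 0({scratch})"]

print(emit_add_from_memory("R4", 10000, "R0"))
print(emit_add_from_memory("R4", 1000000, "R0"))
```

With a 12-bit field the first case would no longer fit either, which is one way to see why the field width affects instruction count.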

Immediate Addressing Mode
• Used where we want to get to a numerical value in an instruction
• Around 25% of the operations have an immediate operand
At high level:
    a = b + 3;
    if (a > 17)
        goto Addr
At assembler level:
    Load R2, #3
    Add R0, R1, R2
    Load R2, #17
    CMPBGT R1, R2
    Load R1, Address
    Jump (R1)

About 25% of data transfer and ALU operations have an immediate operand
Impact instruction length

Number of Bits for Immediate
• 16 bits would capture about 80% and 8 bits about 50%.
Impact instruction length

Summary: Memory Addressing
• A new architecture is expected to support at least: displacement, immediate, and register indirect
  – represent 75% to 99% of the addressing modes
• The size of the address for displacement mode should be at least 12-16 bits
  – capture 75% to 99% of the displacements
• The size of the immediate field should be at least 8-16 bits
  – capture 50% to 80% of the immediates
Processors rely on compilers to generate code using those addressing modes

4 Type and Size of Operands
• The type of the operand is usually encoded in the opcode
  – e.g., LDB – load byte; LDW – load word
• Common operand types (which imply their sizes):
  – Character (8 bits or 1 byte)
  – Half word (16 bits or 2 bytes)
  – Word (32 bits or 4 bytes)
  – Double word (64 bits or 8 bytes)
  – Single-precision floating point (4 bytes or 1 word)
  – Double-precision floating point (8 bytes or 2 words)
  ✓ Characters are almost always in ASCII
  ✓ 16-bit Unicode (used in Java) is gaining popularity
  ✓ Integers are two's complement binary
  ✓ Floating points follow the IEEE standard 754
• Some architectures support packed decimal: 4 bits are used to encode the values 0-9; 2 decimal digits are packed into each byte
How is the type of an operand designated?
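
Packed decimal can be illustrated with a short encoder and decoder. This is a sketch of the "two digits per byte" idea only; sign handling, which real packed-decimal formats add, is left out, and the function names are hypothetical.

```python
def pack_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits, two per byte (high nibble first)."""
    if len(digits) % 2:
        digits = "0" + digits        # pad to an even number of digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def unpack_bcd(data: bytes) -> str:
    """Recover the digit string: each byte holds two 4-bit digits."""
    return "".join(f"{b >> 4}{b & 0x0F}" for b in data)

packed = pack_bcd("1234")
print(packed.hex())          # 1234: the hex dump reads like the decimal number
print(unpack_bcd(packed))    # 1234
```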

Distribution of Data Accesses by Size

Summary: Type and Size of Operands
• A 32-bit architecture supports 8-, 16-, and 32-bit integers, and 32-bit and 64-bit IEEE 754 floating-point data.
• A new 64-bit address architecture supports 64-bit integers
• Media processors and DSPs need wider accumulating registers for accuracy.

5 Operations in the Instruction Set
• All computers generally provide a full set of operations for the first three categories
• All computers must have some instruction support for basic system functions
• Graphics instructions typically operate on many smaller data items in parallel

Top 10 Instructions for 80x86

Instruction Encoding
• RISC-V R-format instruction
• RISC-V I-format instruction
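
As a sketch of the two formats, the helpers below pack fields into a 32-bit word using the RV32I field positions (R-format: funct7, rs2, rs1, funct3, rd, opcode; I-format: imm[11:0], rs1, funct3, rd, opcode). The helper names are hypothetical; the `add` and `addi` opcode and funct values are taken from the RISC-V base ISA.

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """R-format: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def encode_i(imm, rs1, funct3, rd, opcode):
    """I-format: imm[31:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

OP = 0b0110011       # register-register ALU opcode
OP_IMM = 0b0010011   # register-immediate ALU opcode

add_x3_x1_x2 = encode_r(funct7=0, rs2=2, rs1=1, funct3=0, rd=3, opcode=OP)
addi_x1_x0_5 = encode_i(imm=5, rs1=0, funct3=0, rd=1, opcode=OP_IMM)

print(f"{add_x3_x1_x2:#010x}")   # 0x002081b3
print(f"{addi_x1_x0_5:#010x}")   # 0x00500093
```

Note that rs1, funct3, rd, and the opcode sit at the same bit positions in both formats.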

6 Instructions for Control Flow
• Control instructions change the flow of control:
  – instead of executing the next instruction, the program branches to the address specified in the branching instructions
• They break the pipeline
  – Difficult to optimize out
  – AND they are frequent
• Four types of control instructions
  – Conditional branches
    • if…else, for/while, switch/case, …
  – Jumps: unconditional transfer
    • goto
  – Procedure calls
    • foo()
  – Procedure returns
    • return

Breakdown of Control Flow Instructions
– Conditional branches
– Jumps: unconditional transfer
– Procedure calls
– Procedure returns
• Issues:
  – Where is the target address? How to specify it? (label)
  – Caller: Where is the return address kept? How are the arguments passed?
  – Callee: Where is the return address? How are the results passed?

Addressing Modes for Control Flow Instructions
• PC-relative (Program Counter)
  – Supply a displacement added to the PC
    • Known at compile time for jumps, branches, and calls (specified within the instruction)
  – The target is often near the current instruction
    • Requiring fewer bits
    • Independent of where it is loaded (position independence)
• Register indirect addressing: dynamic addressing
  – The target address may not be known at compile time
  – Naming a register that contains the target address
    • Case or switch statements
    • Virtual functions or methods in C++ or Java
    • High-order functions or function pointers in C or C++
    • Dynamically shared libraries
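
The position-independence point can be sketched numerically. The sketch assumes the displacement is measured from the branch instruction's own address (some ISAs measure from the following instruction instead); the load addresses and offsets are made-up values.

```python
def pc_relative_offset(branch_pc, target_pc):
    """Displacement the compiler stores in the instruction."""
    return target_pc - branch_pc

def branch_target(pc, offset):
    """Target the hardware computes at run time."""
    return pc + offset

# The same code loaded at two different base addresses (hypothetical values).
for base in (0x1000, 0x40000):
    branch_pc, target_pc = base + 0x20, base + 0x08
    offset = pc_relative_offset(branch_pc, target_pc)
    print(hex(base), offset, hex(branch_target(branch_pc, offset)))

# Register indirect: the target is only known at run time, e.g. from a jump table.
jump_table = {0: 0x1100, 1: 0x1180, 2: 0x1200}   # case label -> code address
case_value = 2
reg = jump_table[case_value]     # load the target into a register
print(hex(reg))                  # then jump to the address held in that register
```

The stored offset is identical for both load addresses, so the encoded instruction does not need to change when the code is relocated.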

Branch Distances

Conditional Branch Options
Figure 2.21 Major methods for evaluating branch conditions

Comparison Type vs. Frequency
• Most loops go from 0 to n.
• Most backward branches are loops: taken about 90%

| Program | % backward branches | % all control instructions that modify PC |
|---|---|---|
| gcc | 26% | 63% |
| spice | 31% | 63% |
| TeX | 17% | 70% |
| Average | 25% | 65% |

Procedure Invocation Options
• Procedure calls and returns
  – control transfer
  – state saving; the return address must be saved
  Newer architectures require the compiler to generate stores and loads for each register saved and restored
• Two basic conventions in use to save registers
  – caller saving: the calling procedure must save the registers that it wants preserved for access after the call
    • the called procedure need not worry about registers
  – callee saving: the called procedure must save the registers it wants to use
    • leaving the caller unrestrained
  most real systems today use a combination of both
• The application binary interface (ABI) sets down the basic rules as to which registers should be caller saved and which should be callee saved

7 Encoding an Instruction Set
• Opcode: specifying the operation
• # of operands
  – addressing mode
  – address specifier: tells what addressing mode is used
  – Load-store computer
    • Only one memory operand
    • Only one or two addressing modes
• The architect must balance several competing forces when encoding the instruction set:
  – # of registers && addressing modes
  – Size of registers && addressing mode fields
  – Average instruction size && average program size
  – Easy to handle in pipeline implementation

Example: x86 and Alpha
• x86:
• Alpha:

Three Basic Variations for Instruction Encoding
The length of 80x86 (CISC) instructions varies between 1 and 17 bytes.
The length of most RISC ISA instructions is 4 bytes.
x86 programs are generally smaller than RISC ISA programs.
To reduce RISC code size

Instruction Length Tradeoffs
• Fixed length: length of all instructions the same
  + Easier to decode a single instruction in hardware
  + Easier to decode multiple instructions concurrently
  -- Wasted bits in instructions (Why is this bad?)
  -- Harder-to-extend ISA (how to add new instructions?)
• Variable length: length of instructions different (determined by opcode and sub-opcode)
  + Compact encoding (Why is this good?)
    Intel 432: Huffman encoding (sort of). 6 to 321 bit instructions. How?
  -- More logic to decode a single instruction
  -- Harder to decode multiple instructions concurrently
• Tradeoffs
  – Code size (memory space, bandwidth, latency) vs. hardware complexity
  – ISA extensibility and expressiveness
  – Performance? Smaller code vs. imperfect decode

Uniform vs Non-uniform Decode
• Uniform decode: same bits in each instruction correspond to the same meaning
  – Opcode is always in the same location
  – immediate values, …
  – Many "RISC" ISAs: Alpha, MIPS, SPARC
  + Easier decode, simpler hardware
  + Enables parallelism: generate target address before knowing the instruction is a branch
  -- Restricts instruction format (fewer instructions?) or wastes space
• Non-uniform decode
  – E.g., opcode can be the 1st-7th byte in x86
  + More compact and powerful instruction format
  -- More complex decode logic

Reduced Code Size in RISCs
• Hybrid encoding: support 16-bit and 32-bit instructions in RISC, e.g. ARM Thumb, MIPS16 and RISC-V
  – Narrow instructions support fewer operations, smaller address and immediate fields, fewer registers, and two-address format rather than the classic three-address format
  – Claim a code size reduction of up to 40%
• Compression in IBM's CodePack
  – Adds hardware to decompress instructions as they are fetched from memory on an instruction cache miss
  – The instruction cache contains full 32-bit instructions, but compressed code is kept in main memory, ROMs, and the disk
  – Claim code reduction of 35% - 40%
  – PowerPC creates a hash table in memory that maps between compressed and uncompressed addresses. Code size reduction 35%~40%
• Hitachi's SuperH: fixed 16-bit format
  – 16 rather than 32 registers
  – fewer instructions
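
A sketch of how a decoder can split a hybrid 16/32-bit instruction stream, using the RISC-V convention that an instruction whose two lowest bits are not `11` is a 16-bit compressed instruction. Longer (48-bit and wider) encodings are ignored here, and the function name and sample stream are assumptions for illustration.

```python
def split_instructions(code: bytes):
    """Split a little-endian RISC-V byte stream into 16-bit and 32-bit instructions."""
    parts = []
    i = 0
    while i < len(code):
        # The low two bits of the first byte select the instruction length.
        length = 4 if (code[i] & 0b11) == 0b11 else 2
        parts.append(code[i:i + length])
        i += length
    return parts

# c.nop (0x0001), addi x1, x0, 5 (0x00500093), c.nop, stored little-endian.
stream = bytes.fromhex("0100" "93005000" "0100")
for part in split_instructions(stream):
    print(len(part) * 8, "bits:", part.hex())
```

Unlike a fixed-length stream, the start of each instruction depends on the length of the one before it, which is the decode cost that hybrid and variable encodings pay for smaller code.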

Summary of Instruction Encoding
• Three choices
  – Variable, fixed and hybrid
  – Note the differences between hybrid and variable
• The choice of instruction encoding is a tradeoff between
  – For performance: fixed encoding
  – For code size: variable encoding
• How hybrid encoding is used in RISC to reduce code size
  – 16-bit and 32-bit
• In general, we see:
  – RISC: fixed or hybrid
  – CISC: variable

8 The Role of Compilers
• Almost all programming is done in high-level languages.
  – An ISA is essentially a compiler target.
• See backup slides for the compilation stages used by most compilers, e.g. gcc
• Compiler goals:
  – All correct programs execute correctly
  – Most compiled programs execute fast (optimizations)
  – Fast compilation
  – Debugging support

Typical Modern Compiler Structure
Figure A.19 Compilers typically consist of two to four passes, with more highly optimizing compilers having more passes. This structure maximizes the probability that a program compiled at various levels of optimization will produce the same output when given the same input. The optimizing passes are designed to be optional and may be skipped when faster compilation is the goal and lower-quality code is acceptable. A pass is simply one phase in which the compiler reads and transforms the entire program. (The term phase is often used interchangeably with pass.) Because the optimizing passes are separated, multiple languages can use the same optimizing and code generation passes. Only a new front end is required for a new language.

Optimization Types
• High level: done at or near source code level
  – If a procedure is called only once, put it in-line and save the CALL
  – more general case: if call-count < some threshold, put them in-line
• Local: done within straight-line code
  – common sub-expressions produce the same value: either allocate a register or replace with a single copy
  – constant propagation: replace a constant-valued variable with the constant
  – stack height reduction: re-arrange the expression tree to minimize temporary storage needs
• Global: across a branch
  – copy propagation: replace all instances of a variable A that has been assigned X (i.e., A = X) with X.
  – code motion: remove code from a loop that computes the same value each iteration of the loop and put it before the loop
  – simplify or eliminate array addressing calculations in loops
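
Two of the listed transformations, code motion and common sub-expression elimination, are shown below as a hand-written before/after pair. The functions are illustrative only: a compiler performs these rewrites on its intermediate representation, not on source text like this.

```python
def before(xs, a, b):
    out = []
    for x in xs:
        scale = a * b                                # same value every iteration
        out.append(x * scale + (x * scale) // 2)     # x * scale computed twice
    return out

def after(xs, a, b):
    scale = a * b              # code motion: invariant hoisted out of the loop
    out = []
    for x in xs:
        t = x * scale          # common sub-expression computed once and reused
        out.append(t + t // 2)
    return out

data = list(range(10))
print(before(data, 3, 4) == after(data, 3, 4))   # both versions agree
```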
![Page 52: Lecture 03 Instruction Set Principles · 2019-01-10 · Hence, register architecture classification (# mem, # operands) Number of memory addresses Maximum number of operands allowed](https://reader030.vdocuments.net/reader030/viewer/2022040906/5e7c0a231279f44e2370dbd0/html5/thumbnails/52.jpg)
Optimization Types
• Machine-dependent optimizations – based on machine knowledge
  – strength reduction – replace multiply by a constant with shifts and adds
    • would make sense if there was no hardware support for MUL
    • a trickier version: 17 × x = arithmetic left shift x by 4, then add x
• Pipeline scheduling
  – reorder instructions to improve pipeline performance
  – dependency analysis
  – branch offset optimization – reorder code to minimize branch offsets
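The strength-reduction bullet can be sketched as follows: a multiply by a positive constant is rewritten as one shift per set bit of the constant, and the shifted values are added. The helper names `shift_add_terms` and `multiply_by_shifts` are hypothetical, chosen only for this illustration.

```python
def shift_add_terms(constant):
    """Shift amounts s such that the sum of (x << s) equals x * constant."""
    if constant <= 0:
        raise ValueError("this sketch only handles positive constants")
    return [bit for bit in range(constant.bit_length()) if (constant >> bit) & 1]


def multiply_by_shifts(x, constant):
    """Multiply x by constant using only shifts and adds."""
    return sum(x << shift for shift in shift_add_terms(constant))


# 17 = 16 + 1, so 17 * x becomes (x << 4) + (x << 0).
print(shift_add_terms(17))        # [0, 4]
print(multiply_by_shifts(5, 17))  # 85
```

Whether this rewrite pays off depends on the machine: a constant with many set bits needs many shifts and adds, which is why the slide ties it to the case where hardware MUL is missing or slow.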
52
![Page 53: Lecture 03 Instruction Set Principles · 2019-01-10 · Hence, register architecture classification (# mem, # operands) Number of memory addresses Maximum number of operands allowed](https://reader030.vdocuments.net/reader030/viewer/2022040906/5e7c0a231279f44e2370dbd0/html5/thumbnails/53.jpg)
Major Types of Optimizations
53
![Page 54: Lecture 03 Instruction Set Principles · 2019-01-10 · Hence, register architecture classification (# mem, # operands) Number of memory addresses Maximum number of operands allowed](https://reader030.vdocuments.net/reader030/viewer/2022040906/5e7c0a231279f44e2370dbd0/html5/thumbnails/54.jpg)
Compiler Optimizations – Change in IC (instruction count)
• L0 – unoptimized
• L1 – local opts, code scheduling, & local reg. allocation
• L2 – global opts and loop transformations, & global reg. allocation
• L3 – procedure integration
gcc -O2 hello.c -o hello
54
![Page 55: Lecture 03 Instruction Set Principles · 2019-01-10 · Hence, register architecture classification (# mem, # operands) Number of memory addresses Maximum number of operands allowed](https://reader030.vdocuments.net/reader030/viewer/2022040906/5e7c0a231279f44e2370dbd0/html5/thumbnails/55.jpg)
Compiler Based Register Optimization
• Compiler assumes a small number of registers (16-32)
  – Optimizing use is up to the compiler
  – HLL programs have no explicit references to registers
• Compiler Approach
  – Assign a symbolic or virtual register to each candidate variable
  – Map (unlimited) symbolic registers to real registers
  – Symbolic registers that do not overlap can share real registers
  – If you run out of real registers, some variables are kept in memory
    • Spilling
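The compiler approach above can be sketched with live ranges. In this minimal example each symbolic register is live over an inclusive range of instruction indices; symbolic registers whose ranges do not overlap share a real register, and a symbolic register is spilled when every real register is busy. The live ranges, the function names, and the first-come greedy policy are all assumptions for illustration, not how any specific compiler does it.

```python
# Hypothetical live ranges: symbolic register -> (first use, last use), inclusive.
live_ranges = {"s1": (0, 3), "s2": (2, 5), "s3": (4, 7), "s4": (6, 9)}


def overlaps(a, b):
    """Two inclusive ranges overlap when each one starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def allocate(ranges, num_real):
    """Map symbolic registers to real register numbers, spilling when none is free."""
    assignment, spilled = {}, []
    # Process symbolic registers in order of where their live range starts.
    for name in sorted(ranges, key=lambda r: ranges[r][0]):
        # Real registers held by already-placed registers that are live at the same time.
        busy = {assignment[other] for other in assignment
                if overlaps(ranges[other], ranges[name])}
        free = [reg for reg in range(num_real) if reg not in busy]
        if free:
            assignment[name] = free[0]
        else:
            spilled.append(name)   # no real register left: keep it in memory
    return assignment, spilled


print(allocate(live_ranges, 2))  # four symbolic registers fit in two real ones
print(allocate(live_ranges, 1))  # with one real register, some must spill
```

With two real registers, `s1` and `s3` share one register and `s2` and `s4` share the other, because their live ranges never overlap.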
55
![Page 56: Lecture 03 Instruction Set Principles · 2019-01-10 · Hence, register architecture classification (# mem, # operands) Number of memory addresses Maximum number of operands allowed](https://reader030.vdocuments.net/reader030/viewer/2022040906/5e7c0a231279f44e2370dbd0/html5/thumbnails/56.jpg)
Graph Coloring
• Given a graph of nodes and edges
  – Assign a color to each node
    • Adjacent nodes have different colors
    • Use the minimum number of colors
• Register allocation
  – Nodes are symbolic registers
  – Two registers that are live in the same program fragment are joined by an edge
  – Try to color the graph with n colors, where n is the number of real registers
  – Nodes that cannot be colored are placed in memory
https://en.wikipedia.org/wiki/Graph_coloring
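A minimal sketch of the register-allocation view of graph coloring follows. Finding a coloring with the minimum number of colors is NP-hard in general, so this example swaps in a simple greedy heuristic (highest-degree node first) rather than an exact method. The interference graph and the name `color_graph` are hypothetical and exist only for this illustration.

```python
def color_graph(interference, n_colors):
    """Greedily color an interference graph; uncolorable nodes are spilled."""
    colors, spilled = {}, []
    # Visit high-degree nodes first (a heuristic, not guaranteed optimal);
    # ties are broken by name so the result is deterministic.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for node in order:
        # Colors already taken by neighbors, i.e. registers live at the same time.
        used = {colors[nb] for nb in interference[node] if nb in colors}
        available = [c for c in range(n_colors) if c not in used]
        if available:
            colors[node] = available[0]
        else:
            spilled.append(node)   # cannot be colored: placed in memory
    return colors, spilled


# Hypothetical interference graph: an edge joins two symbolic registers
# that are live in the same program fragment.
interference = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(color_graph(interference, 3))  # every node gets one of three registers
print(color_graph(interference, 2))  # a, b, c are mutually live, so one is spilled
```

Each color stands for one real register, so calling the function with `n_colors` set to the number of real registers matches the "try to color the graph with n colors" step on the slide.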
56
![Page 57: Lecture 03 Instruction Set Principles · 2019-01-10 · Hence, register architecture classification (# mem, # operands) Number of memory addresses Maximum number of operands allowed](https://reader030.vdocuments.net/reader030/viewer/2022040906/5e7c0a231279f44e2370dbd0/html5/thumbnails/57.jpg)
Iron-code Summary
• Section A.2: Use general-purpose registers with a load-store architecture.
• Section A.3: Support these addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect.
• Section A.4: Support these data sizes and types: 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.
  – Now we see 16-bit FP for deep learning in GPU
    • http://www.nextplatform.com/2016/09/13/nvidia-pushes-deep-learning-inference-new-pascal-gpus/
• Section A.5: Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and shift.
• Section A.6: Compare equal, compare not equal, compare less, branch (with a PC-relative address at least 8 bits long), jump, call, and return.
• Section A.7: Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size.
• Section A.8: Provide at least 16 general-purpose registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist IS.
  – Often use separate floating-point registers.
  – The justification is to increase the total number of registers without raising problems in the instruction format or in the speed of the general-purpose register file. This compromise, however, is not orthogonal.
57
![Page 58: Lecture 03 Instruction Set Principles · 2019-01-10 · Hence, register architecture classification (# mem, # operands) Number of memory addresses Maximum number of operands allowed](https://reader030.vdocuments.net/reader030/viewer/2022040906/5e7c0a231279f44e2370dbd0/html5/thumbnails/58.jpg)
Real World ISA
58
![Page 59: Lecture 03 Instruction Set Principles · 2019-01-10 · Hence, register architecture classification (# mem, # operands) Number of memory addresses Maximum number of operands allowed](https://reader030.vdocuments.net/reader030/viewer/2022040906/5e7c0a231279f44e2370dbd0/html5/thumbnails/59.jpg)
The details in design are all about trade-offs!
59