cs 61c: great ideas in computer architecture single-cycle...
TRANSCRIPT
CS61C:GreatIdeasinComputerArchitecture
Single-CycleCPUDatapath &Control
1
Instructors:VladimirStojanovicandNicholasWeaverhttp://inst.eecs.Berkeley.edu/~cs61c/sp16
StagesofExecutiononDatapath
instruction
memory
+4
rtrsrd
registers
ALU
Data
memory
imm
1.InstructionFetch
2.Decode/RegisterRead
3.Execute 4.Memory 5.RegisterWrite
PC
2
WhyFiveStages?(1/2)
• Couldwehaveadifferentnumberofstages?– Yes,otherISAshavedifferentnaturalnumberofstages• Andthesedays,pipeliningcanbemuchmoreaggressivethanthe"natural"5stagesMIPSuses
• WhydoesMIPShavefiveifinstructionstendtoidleforatleastonestage?– Fivestagesaretheunionofalltheoperationsneededbyalltheinstructions.
– Oneinstructionusesallfivestages:theload3
WhyFiveStages?(2/2)• lw $r3,16($r1)#r3=Mem[r1+16]– Stage1:fetchthisinstruction,incrementPC– Stage2:decodetodetermineitisalw,thenreadregister$r1
– Stage3:add16 tovalueinregister$r1 (retrievedinStage2)
– Stage4:readvaluefrommemoryaddresscomputedinStage3
– Stage5:writevaluereadinStage4intoregister$r3
4
ALU
instruction
memory
+4
registers
Data
memory
imm
31x
reg[1] +16reg[1]
MEM[r1+16]
Example:lw InstructionPC
lw r3,17(r1)
5
16
Clickers/PeerInstruction
• WhichtypeofMIPSinstructionisactiveinthefeweststages?
A:LWB:BEQC:JD:JALE:ADDU
6
ProcessorDesign:5stepsStep1:Analyzeinstructionsettodetermine datapathrequirements
– Meaningofeachinstructionisgivenbyregistertransfers– Datapath mustincludestorageelementforISAregisters– Datapath mustsupporteachregistertransferStep2:Selectsetofdatapath components&establishclockmethodology
Step3:Assembledatapath componentsthatmeettherequirements
Step4:Analyzeimplementationofeachinstructiontodeterminesettingofcontrolpointsthatrealizestheregistertransfer
Step5:Assemblethecontrollogic7
• AllMIPSinstructionsare32bitslong.3formats:
– R-type
– I-type
– J-type
• Thedifferentfieldsare:– op:operation(“opcode”)oftheinstruction– rs,rt,rd:thesourceanddestinationregisterspecifiers– shamt:shiftamount– funct:selectsthevariantoftheoperationinthe“op”field– address/immediate:addressoffsetorimmediatevalue– targetaddress:targetaddressofjumpinstruction
op target address02631
6 bits 26 bits
op rs rt rd shamt funct061116212631
6 bits 6 bits5 bits5 bits5 bits5 bits
op rs rt address/immediate016212631
6 bits 16 bits5 bits5 bits
TheMIPSInstructionFormats
8
• ADDUandSUBU– addu rd,rs,rt– subu rd,rs,rt
• ORImmediate:– ori rt,rs,imm16
• LOADandSTOREWord– lw rt,rs,imm16– sw rt,rs,imm16
• BRANCH:– beq rs,rt,imm16
op rs rt rd shamt funct061116212631
6 bits 6 bits5 bits5 bits5 bits5 bits
op rs rt immediate016212631
6 bits 16 bits5 bits5 bits
op rs rt immediate016212631
6 bits 16 bits5 bits5 bits
op rs rt immediate016212631
6 bits 16 bits5 bits5 bits
TheMIPS-liteSubset
9
• Colloquiallycalled“RegisterTransferLanguage”• RTLgivesthemeaning oftheinstructions• Allstartbyfetchingtheinstructionitself{op , rs , rt , rd , shamt , funct} ← MEM[ PC ]
{op , rs , rt , Imm16} ← MEM[ PC ]
Inst Register Transfers
ADDU R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ORI R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
LOAD R[rt] ← MEM[ R[rs] + sign_ext(Imm16)]; PC ← PC + 4
STORE MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ if ( R[rs] == R[rt] )PC ← PC + 4 + {sign_ext(Imm16), 2’b00}
else PC ← PC + 4
RegisterTransferLevel(RTL)
10
Step1:RequirementsoftheInstructionSet
• Memory(MEM)– Instructions&data(willuseoneforeach)
• Registers(R:32,32-bitwideregisters)– ReadRS– ReadRT– WriteRTorRD
• ProgramCounter(PC)• Extender(sign/zeroextend)• Add/Sub/OR/etc unitforoperationonregister(s)or
extendedimmediate(ALU)• Add4(+maybeextendedimmediate)toPC• Compareregisters?
11
Step2:ComponentsoftheDatapath
• CombinationalElements• StorageElements+ClockingMethodology• BuildingBlocks
32
32
A
B32
Sum
CarryOut
CarryIn
Adder
32A
B32
Y32
Select
MUX
Multiplexer
32
32
A
B32
Result
OP
ALU
ALU
Adder
12
ALUNeedsforMIPS-lite+RestofMIPS• Addition,subtraction,logicalOR,==:
ADDU R[rd] = R[rs] + R[rt]; ...SUBU R[rd] = R[rs] – R[rt]; ... ORI R[rt] = R[rs] | zero_ext(Imm16)...
BEQ if ( R[rs] == R[rt] )...
• Testtoseeifoutput==0foranyALUoperationgives==test.How?
• P&HalsoaddsAND,SetLessThan(1ifA<B,0otherwise)
• ALUfollowsChapter513
StorageElement:IdealizedMemory• “Magic”Memory– Oneinputbus:DataIn– Oneoutputbus:DataOut
• Memorywordisfoundby:– ForRead:AddressselectsthewordtoputonData_Out– ForWrite:SetWrite_Enable =1:addressselectsthememorywordtobewrittenviatheData_In bus
• Clockinput(CLK)– CLKinputisafactorONLYduringwriteoperation– Duringreadoperation,behavesasacombinationallogicblock:Addressvalid⇒ Data_Out validafter“accesstime”
Clk
Data_In
Write_Enable
32 32Data_Out
Address
14
StorageElement:Register(BuildingBlock)
• SimilartoDFlip-Flopexcept– N-bitinputandoutput–Write_Enable input
• Write_Enable:– Negated(ordeasserted)(0):Data_Out willnotchange
– Asserted(1):Data_Out willbecomeData_In onpositiveedgeofclock
clk
Data_In
Write_Enable
N N
Data_Out
15
StorageElement:RegisterFile
• RegisterFileconsistsof32registers:– Two32-bitoutputbusses:
busAandbusB– One32-bitinputbus:busW
• Registerisselectedby:– RA(number)selectstheregistertoputonbusA (data)– RB(number)selectstheregistertoputonbusB (data)– RW(number)selectstheregistertobewritten
viabusW (data)whenWrite_Enable is1• Clockinput(clk)
– Clk inputisafactorONLYduringwriteoperation– Duringreadoperation,behavesasacombinationallogicblock:
• RAorRBvalid⇒ busA orbusB validafter“accesstime.”
Clk
busW
Write_Enable
3232
busA
32busB
5 5 5RW RA RB
32x32-bitRegisters
16
Step3a:InstructionFetchUnit• RegisterTransferRequirements⇒DatapathAssembly
• InstructionFetch• ReadOperandsandExecuteOperation
• CommonRTLoperations– FetchtheInstruction:mem[PC]
– Updatetheprogramcounter:• SequentialCode:PC← PC+4
• BranchandJump:PC← “somethingelse” 32
InstructionWordAddress
InstructionMemory
PCclk
NextAddressLogic
17
• R[rd] = R[rs] op R[rt] (addu rd,rs,rt)– Ra,Rb,andRw comefrominstruction’sRs,Rt,andRdfields
– ALUctr and RegWr:controllogicafterdecodingtheinstruction
• …Alreadydefinedtheregisterfile&ALU
Step3b:Add&Subtract
32Result
ALUctr
clk
busW
RegWr
3232
busA
32busB
5 5 5
Rw Ra Rb32x32-bitRegisters
Rs RtRd
ALUop rs rt rd shamt funct
061116212631
6bits 6bits5bits5bits5bits5bits
18
ClockingMethodology
• Storageelementsclockedbysameedge• Flip-flops(FFs)andcombinationallogichavesomedelays
– Gates:delayfrominputchangetooutputchange– SignalsatFFDinputmustbestablebeforeactiveclockedgetoallow
signaltotravelwithintheFF(set-uptime),andwehavetheusualclock-to-Qdelay
• “Criticalpath” (longestpaththroughlogic)determineslengthofclockperiod
Clk
.
.
.
.
.
.
.
.
.
.
.
.
19
Register-RegisterTiming:OneCompleteCycle(Add/Sub)
Clk
PCRs,Rt,Rd,Op,Func
ALUctr
InstructionMemoryAccessTime
OldValue NewValue
RegWr OldValue NewValue
DelaythroughControl Logic
busA,BRegisterFileAccessTime
OldValue NewValue
busWALUDelay
OldValue NewValue
OldValue NewValue
NewValueOldValue
RegisterWriteOccursHere32
clk
busW32busA
32
busB
5 5
Rw Ra Rb
RegFile
Rs Rt
ALU
5Rd
20
ALUctrRegWr
3c:LogicalOp(or)withImmediate• R[rt]=R[rs]opZeroExt[imm16]
op rs rt immediate016212631
6bits 16bits5bits5bits
immediate016 1531
16bits16bits0000000000000000
WhataboutRtRead?
32
ALUctr
clk
RegWr
32
32busA
32
busB
5 5
Rw Ra Rb
RegFile
Rs
Rt
Rt
Rd
ZeroExt 3216imm16
ALUSrc
01
0
1
ALU5
RegDst
WritingtoRt register(notRd)!!
21
3d:LoadOperations• R[rt]=Mem[R[rs]+SignExt[imm16]]Example:lw rt,rs,imm16
op rs rt immediate016212631
6bits 16bits5bits5bits
32
ALUctr
clk
busW
RegWr
32
32busA
32
busB
5 5
Rw Ra Rb
RegFile
Rs
Rt
Rt
RdRegDst
Extender 3216Imm16
ALUSrcExtOp
MemtoReg
clk
01
0
1
ALU 0
1Adr
DataMemory
5
22
3e:StoreOperations• Mem[R[rs]+SignExt[imm16]]=R[rt]Ex.:sw rt, rs, imm16
op rs rt immediate016212631
6 bits 16 bits5 bits5 bits
32
ALUctr
clk
busW
RegWr
32
32busA
32
busB
5 5
Rw Ra Rb
RegFile
Rs
Rt
Rt
RdRegDst
Extender 3216Imm16
ALUSrcExtOp
MemtoReg
clk
Data_In32
MemWr01
0
1
ALU 0
1WrEn Adr
DataMemory
5
23
3e:StoreOperations• Mem[R[rs]+SignExt[imm16]]=R[rt]Ex.:sw rt, rs, imm16
op rs rt immediate016212631
6 bits 16 bits5 bits5 bits
32
ALUctr
clk
busW
RegWr
32
32busA
32
busB
5 5
Rw Ra Rb
RegFile
Rs
Rt
Rt
RdRegDst
Extender 3216imm16
ALUSrcExtOp
MemtoReg
clk
Data In32
MemWr01
0
1
ALU 0
1WrEn Adr
DataMemory
5
24
3f:TheBranchInstruction
beq rs, rt, imm16–mem[PC]Fetchtheinstructionfrommemory– Equal=(R[rs]==R[rt])Calculatebranchcondition– if(Equal)Calculatethenextinstruction’saddress
• PC=PC+4+(SignExt(imm16)x4)
else• PC=PC+4
op rs rt immediate016212631
6 bits 16 bits5 bits5 bits
25
Datapath forBranchOperationsbeq rs,rt,imm16
Datapath generatescondition(Equal)
op rs rt immediate016212631
6 bits 16 bits5 bits5 bits
Already have mux, adder, need special sign extender for PC, need equal compare (sub?)Imm16
clk
PC
00
4 nPC_sel
PC Ext
Adder
Adder
Mux
Inst Address
32
ALUctr
clk
busW
RegWr
32busA
32
busB
5 5
Rw Ra Rb
RegFile
Rs Rt
ALU
5
=
Equal
26
InstructionFetchUnitincludingBranch
• if(Equal==1)thenPC=PC+4+SignExt[imm16]*4;elsePC=PC+4
op rs rt immediate016212631
• HowtoencodenPC_sel?•DirectMUXselect?•Branchinst./notbranchinst.
• Let’spick2ndoption
Adr
InstMemory
nPC_selInstruction<31:0>
Equal
nPC_sel
Q:Whatlogicgate?
imm16 clk
PC
00
4
PCExt
AdderAdder
Mux
0
1
MUX_ctrl
27
nPC_sel zero? MUX0 x 01 0 01 1 1
Equal MUX_ctrl
PuttingitAllTogether:A SingleCycleDatapath
Imm16
32
ALUctr
clk
busW
RegWr
32
32busA
32
busB
5 5
Rw Ra Rb
RegFile
Rs
Rt
Rt
RdRegDst
Extender
3216Imm16
ALUSrcExtOp
MemtoReg
clk
Data_In32
MemWrEqual
Instruction<31:0><21:25>
<16:20>
<11:15>
<0:15>
Imm16RdRtRs
clk
PC
00
4
nPC_sel & Equal
PC Ext
Adr
InstMemory
Adder
Adder
Mux
01
0
1
=ALU 0
1WrEn Adr
DataMemory
5
28
imm16
32
ALUctr
clk
busW
RegWr
32
32busA
32
busB
5 5Rw Ra Rb
RegFile
Rs
Rt
Rt
RdRegDst
Extender 3216imm16
ALUSrcExtOp
MemtoReg
clk
Data In32
MemWrEqual
Instruction<31:0><21:25>
<16:20>
<11:15>
<0:15>
Imm16RdRtRs
clk
PC
00
4
nPC_sel &Equal
PC Ext
Adr
InstMemory
Adder
Adder
Mux
01
0
1
=ALU 0
1WrEn Adr
DataMemory
5
Clickers/PeerInstructionWhatnewinstructionwouldneednonewdatapath hardware?• A:branchifreg==immediate• B:addtworegistersandbranchifresultzero• C:storewithauto-incrementofbaseaddress:
– sw rt,rs,offset//rs incrementedbyoffsetafterstore
• D:shiftleftlogicalbytwobits
29
Administrivia
• Project2-2released,due3/08(aTuesday!)
• GuerrillaSession:– SynchronousDigitalSystems–Wed3/023- 5PM@241Cory– Sat3/051- 3PM@521Cory
30