lec-4-5 mips isa & processor
MIPS ISA and PROCESSOR
Ajit Pal
Professor
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA-721302
High Performance Computer Architecture
Case Study
MIPS Instruction Set architecture
Operands in MIPS
Registers
– MIPS has 32 architected 32-bit registers
– Effective use of registers is key to program performance
Data Transfer
– 32 words in registers, millions of words in main memory
– Instruction accesses memory via memory address
– Address indexes memory, a large single-dimensional array
Alignment
– Most architectures address individual bytes
– Addresses of sequential words differ by 4
– Words must always start at addresses that are multiples of 4
Register Usage Conventions
r0: always 0
r1: reserved for the assembler
r2, r3: function return values
r4~r7: function call arguments
r8~r15: “caller-saved” temporaries
r16~r23: “callee-saved” temporaries
r24~r25: “caller-saved” temporaries
r26, r27: reserved for the operating system
r28: global pointer
r29: stack pointer
r30: callee-saved temporaries
r31: return address
Data Types in MIPS
Data and instructions are both represented in bits
– 32-bit architectures employ 32-bit instructions
– Combination of fields specifying operations/operands
MIPS operates on:
– 32-bit (unsigned or 2’s complement) integers
– 32-bit (single-precision floating-point) real numbers
– 64-bit (double-precision floating-point) real numbers
32-bit words, bytes and half words can be loaded into GPRs.
– After loading into GPRs, bytes and half words are either zero-extended or sign-extended to fill the 32 bits
Only 32-bit units can be loaded into FPRs.
– 32-bit real numbers are stored in even-numbered FPRs
– 64-bit real numbers are stored in two consecutive FPRs, starting with an even-numbered register
Floating-point numbers follow IEEE standard 754
– float: 8-bit exponent, 23-bit significand
– double: 11-bit exponent, 52-bit significand
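The zero and sign extension of bytes and half words mentioned above can be sketched in a few lines of Python. This is only an illustration: the helper names `zero_extend` and `sign_extend` are hypothetical, and registers are modelled as unsigned 32-bit integers.

```python
def zero_extend(value, bits):
    """Keep the low `bits` bits and fill the upper bits of the 32-bit register with 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Copy the sign bit of a `bits`-wide value into the upper bits of a 32-bit register."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set: the value is negative
        value -= 1 << bits
    return value & 0xFFFFFFFF          # return the 32-bit pattern as an unsigned int


# A loaded byte 0x80 becomes 0x00000080 (zero-extended) or 0xFFFFFF80 (sign-extended).
print(hex(zero_extend(0x80, 8)), hex(sign_extend(0x80, 8)))
```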
MIPS Memory Organization
• MIPS supports byte addressability:
– A byte is the smallest unit with its own address.
– A 32-bit word has to start at a byte address that is a multiple of 4. Thus, the 32-bit word at address 4n includes four bytes with addresses 4n, 4n+1, 4n+2, and 4n+3.
– A 16-bit half word has to start at a byte address that is a multiple of 2. Thus, the 16-bit half word at address 2n includes two bytes with addresses 2n and 2n+1.
– An address is given as a 32-bit unsigned number.
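A small Python sketch of the alignment rules, assuming hypothetical helper names `is_aligned` and `bytes_of_word`; it only restates the 4n and 2n rules as code.

```python
def is_aligned(address, size):
    """True if `address` is a multiple of `size` (4 for words, 2 for half words)."""
    return address % size == 0


def bytes_of_word(address):
    """Byte addresses covered by the 32-bit word that starts at `address`."""
    if not is_aligned(address, 4):
        raise ValueError("a word must start at a multiple of 4")
    return [address + i for i in range(4)]


print(bytes_of_word(12))   # the word at 4n = 12 covers bytes 12, 13, 14, 15
```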
MIPS Instruction Formats
• R-Type Instruction Fields
– op: basic operation of the instruction, called the opcode
– rs, rt: first and second register source operands
– rd: register destination operand
– shamt: shift amount for shift instructions
– funct: specifies a variant of the operation, called the function code
Field layout: op | rs | rt | rd | shamt | funct
• I-Type Instruction Fields
– Opcode specifies instruction format
• J-Type Instruction Fields
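As an illustration of how the fields pack into a 32-bit word, here is a minimal Python sketch. The field widths (op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits; 16-bit immediate; 26-bit jump target) are the standard MIPS32 widths and are not spelled out in the slide text; the function names are hypothetical.

```python
def encode_r_type(rs, rt, rd, shamt, funct, op=0):
    """Pack R-type fields into one 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_fields(word):
    """Split a 32-bit instruction word into every field any format might use."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm16": word & 0xFFFF,          # I-type immediate
        "target26": word & 0x3FFFFFF,    # J-type target
    }


# add $s2, $s0, $s1: rs = $s0 (16), rt = $s1 (17), rd = $s2 (18), funct = 0x20
word = encode_r_type(rs=16, rt=17, rd=18, shamt=0, funct=0x20)
print(hex(word), decode_fields(word)["rd"])
```

The decoder extracts all fields unconditionally, which mirrors how the hardware wires instruction bits to every consumer and lets the opcode decide which ones are used.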
MIPS Addressing Modes
• Register Addressing
– Operand is a register (e.g., add $s2, $s0, $s1)
• Base or Displacement Addressing
– Operand is at the memory location whose address is the sum of a register and a constant (e.g., lw $s1, 8($s0))
• Immediate Addressing
– Operand is a constant within the instruction (e.g., addi $s1, $s0, 4)
• PC-Relative Addressing
– Address is the sum of the PC and a constant within the instruction (e.g., beq $s0, $s1, 16)
• Pseudo-direct Addressing
– Address is the 26-bit constant within the instruction, shifted left by 2, concatenated with the upper 4 bits of the PC (e.g., j 1000)
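The three address computations can be sketched in Python as follows. This is a rough model under stated assumptions: the function names are hypothetical, the branch offset is counted in words relative to PC+4, and the jump target combines the upper 4 bits of PC+4 with the 26-bit field shifted left by 2.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret a 16-bit field as a signed Python integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def base_address(base_value, offset16):
    """Base/displacement: register value plus sign-extended constant."""
    return (base_value + sign_extend16(offset16)) & MASK32


def branch_target(pc, offset16):
    """PC-relative: PC + 4 plus the word offset converted to bytes."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32


def jump_target(pc, target26):
    """Pseudo-direct: upper 4 bits of PC + 4, then the 26-bit field, then 00."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


print(hex(base_address(0x1000, 8)), hex(branch_target(0x400000, 25)), jump_target(0x400000, 2500))
```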
Survey of MIPS Instruction Set
• Arithmetic
• Addition
– add $s1, $s2, $s3 # $s1 = $s2 + $s3, overflow detected
– addu $s1, $s2, $s3 # $s1 = $s2 + $s3, overflow undetected
• Subtraction
– sub $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow detected
– subu $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow undetected
• Immediate/Constants
– addi $s1, $s2, 100 # $s1 = $s2 + 100, overflow detected
– addiu $s1, $s2, 100 # $s1 = $s2 + 100, overflow undetected
• Multiply
– mult $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit signed product
– multu $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit unsigned product
• Divide
– div $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3
– divu $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3, unsigned
• Ancillary
– mfhi $s1 # $s1 = Hi, get copy of Hi
– mflo $s1 # $s1 = Lo, get copy of Lo
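To make “overflow detected” versus “overflow undetected” concrete, here is a hedged Python model of add, addu and mult. Registers are modelled as unsigned 32-bit integers, the trap taken by add on overflow is represented by a Python `OverflowError`, and all function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def add(a, b):
    """Model of add: signed overflow is detected and reported."""
    result = to_signed32(a) + to_signed32(b)
    if not -(1 << 31) <= result < (1 << 31):
        raise OverflowError("signed overflow in add")
    return result & MASK32


def addu(a, b):
    """Model of addu: the result silently wraps around to 32 bits."""
    return (a + b) & MASK32


def mult(a, b):
    """Model of mult: 64-bit signed product returned as the pair (Hi, Lo)."""
    product = (to_signed32(a) * to_signed32(b)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & MASK32


print(hex(addu(0x7FFFFFFF, 1)))   # wraps to 0x80000000 without any exception
```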
Survey of MIPS Instruction Set
• Logical
• Boolean Operations
– and $s1, $s2, $s3 # $s1 = $s2 & $s3, bit-wise AND
– or $s1, $s2, $s3 # $s1 = $s2 | $s3, bit-wise OR
• Immediate/Constants
– andi $s1, $s2, 100 # $s1 = $s2 & 100, bit-wise AND
– ori $s1, $s2, 100 # $s1 = $s2 | 100, bit-wise OR
• Shifting
– sll $s1, $s2, 10 # $s1 = $s2 << 10, shift left
– srl $s1, $s2, 10 # $s1 = $s2 >> 10, shift right
Survey of MIPS Instruction Set
• Data Transfer
• Load Operations
– lw $s1, 100($s2) # $s1 = Mem[$s2+100], load word
– lbu $s1, 100($s2) # $s1 = Mem[$s2+100], load byte unsigned
– lui $s1, 100 # $s1 = 100 * 2^16, load upper imm.
• Store Operations
– sw $s1, 100($s2) # Mem[$s2+100] = $s1, store word
– sb $s1, 100($s2) # Mem[$s2+100] = $s1, store byte
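One common use of lui together with ori is building a full 32-bit constant from two 16-bit immediates. The Python sketch below models that idiom; the helper `load_constant` is hypothetical and simply chains the two operations.

```python
def lui(imm16):
    """Model of lui: place the 16-bit immediate in the upper half, zero the lower half."""
    return (imm16 & 0xFFFF) << 16


def ori(reg_value, imm16):
    """Model of ori: OR with the zero-extended 16-bit immediate."""
    return reg_value | (imm16 & 0xFFFF)


def load_constant(value):
    """Build a 32-bit constant the way a lui/ori pair would."""
    upper = lui(value >> 16)
    return ori(upper, value & 0xFFFF)


print(lui(100) == 100 * 2**16, hex(load_constant(0x12345678)))
```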
Survey of MIPS Instruction Set
• Control Transfer
• Jump Operations
– j 2500 # go to 10000
– jal 2500 # $ra = PC+4, go to 10000, for procedure call
– jr $ra # go to $ra, for procedure return
• Branch Operations
– beq $s1, $s2, 25 # if($s1==$s2), go to PC+4+100, PC-relative
– bne $s1, $s2, 25 # if($s1!=$s2), go to PC+4+100, PC-relative
• Comparison Operations
– slt $s1, $s2, $s3 # if($s2<$s3) $s1=1, else $s1=0, 2’s comp.
– slti $s1, $s2, 100 # if($s2<100) $s1=1, else $s1=0, 2’s comp.
– sltu $s1, $s2, $s3 # if($s2<$s3) $s1=1, else $s1=0, unsigned
– sltiu $s1, $s2, 100 # if($s2<100) $s1=1, else $s1=0, unsigned
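The difference between the 2’s complement and unsigned comparisons shows up when the top bit is set. A minimal Python sketch, with registers modelled as unsigned 32-bit integers and illustrative function names:

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a 2's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(a, b):
    """Model of slt: compare as 2's complement values."""
    return 1 if to_signed32(a) < to_signed32(b) else 0


def sltu(a, b):
    """Model of sltu: compare the same bit patterns as unsigned values."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


# 0xFFFFFFFF is -1 when signed but the largest value when unsigned.
print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))
```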
Survey of MIPS Instruction Set
• Floating Point Operations
• Arithmetic
– {add.s, sub.s, mul.s, div.s} $f2, $f4, $f6 # single-precision
– {add.d, sub.d, mul.d, div.d} $f2, $f4, $f6 # double-precision
– Floating-point registers are used in even-odd pairs, with the even-numbered register as the name of the pair
• Data Transfer
– {lwc1, swc1} $f1, 100($s2)
– Transfer data to/from floating-point register file
• Conditional Branch
– {c.lt.s, c.lt.d} $f2, $f4 # if ($f2 < $f4), cond=1, else cond=0
– {bc1t, bc1f} 25 # if (cond==1/0), go to PC+4+100
Case Study
MIPS Instruction Set Processor
Introduction
Starting point:
■ The specification of the MIPS instruction set drives the design of the hardware.
■ Will restrict design to integer-type instructions
Identify functions common to all instructions, and within instruction classes (easy to do in a RISC architecture):
■ Instruction fetch
■ Access one or more registers
■ Use ALU
Asserted signals: a high or low level of a signal which implies a logically “true” condition, an “action” level. The text will only assert a logically high level, i.e., a “1”.
Clocking
■ Assume “edge triggered” clocking (as opposed to level sensitive).
■ A storage circuit or flip-flop stores a value on the clock transition edge.
■ Model is flip-flops with combinational logic between them
■ Propagation delay through combinational logic between storage elements determines clock cycle length.
■ Single clock cycle vs. multi-clock cycle design approach
Single Versus Multi-clock Cycle Design
Start out with a single “long” clock cycle for each instruction.
■ Entire instruction gets executed in a single clock pulse
■ Controller is pure combinational logic
■ Design is simple
■ You would think that a single clock cycle per instruction would give very high performance, but not so: the slowest instruction determines the speed of all instructions.
■ Some hardware is redundant, another disadvantage of the single-cycle design, because various phases of an instruction need the same kind of hardware resource, and all of it is needed in the same clock pulse.
Examples: 2 memories (instruction and data memory); 2 adders and an ALU
Design Summary
Has a performance bottleneck
■ The clock cycle time is determined by the longest path in the machine
■ The simple jump (j) instruction will take as long as the load word (lw)
■ The instruction which uses the longest data path dictates the time for all others.
What about a variable time clock design?
■ Still a single clock
■ Clock pulse interval is a function of the opcode
■ Average time for instruction theoretically improves
But
■ It is difficult to implement: lots of overhead to overcome
Let’s start with a single clock cycle design for simplicity and later convert to multi-clock cycle.
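The bottleneck can be made concrete with a back-of-the-envelope calculation. In the Python sketch below every number is hypothetical: the unit delays (in picoseconds), the list of units each instruction class passes through, and the instruction mix are illustrative assumptions, not values from this lecture.

```python
# Hypothetical unit delays in picoseconds (illustrative only).
DELAY = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Assumed functional units on the critical path of each instruction class.
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "lw": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw": ["imem", "reg_read", "alu", "dmem"],
    "beq": ["imem", "reg_read", "alu"],
    "j": ["imem"],
}


def path_delay(instr):
    """Total delay along the path used by one instruction class."""
    return sum(DELAY[unit] for unit in PATHS[instr])


def single_cycle_period():
    """A fixed clock must be as long as the slowest instruction."""
    return max(path_delay(instr) for instr in PATHS)


def variable_clock_average(mix_percent):
    """Average time per instruction if the clock could adapt to the opcode."""
    return sum(pct * path_delay(instr) for instr, pct in mix_percent.items()) / 100


mix = {"R-type": 45, "lw": 25, "sw": 10, "beq": 15, "j": 5}   # hypothetical mix in percent
print(single_cycle_period(), variable_clock_average(mix))
```

With these assumed numbers the fixed clock is set by lw, while the variable clock average is noticeably shorter, which is the theoretical improvement the slide refers to.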
Basic Abstract View of the Data Path
Shows common functions for most instructions
[Figure: abstract datapath connecting the PC, instruction memory (address, instruction), register file (register numbers, data), ALU, and data memory (address, data)]
Data Path for Instruction Fetching
[Figure: instruction fetch datapath with the PC, the instruction memory (read address, instruction), and an adder that adds 4 to the PC]
Basic Data Path for R-type Instruction
Red lines are for control signals generated by the controller
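[Figure: R-type datapath. The instruction supplies read register 1, read register 2 and write register to the register file; read data 1 and read data 2 feed the ALU; the ALU result returns as write data; the ALU also produces Zero. Control signals: RegWrite, ALU operation (marked 3 in the figure)]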
Adding the Data Path for lw & sw Instruction
Implements:
lw $t1, offset_value($t2)
sw $t1, offset_value($t2)
The offset value is a 16-bit signed immediate field and must be sign-extended to 32 bits
[Figure: lw/sw datapath. The register file and a sign-extend unit (16 to 32 bits, immediate offset data) feed the ALU; the ALU result is the data memory address; read data 2 is the memory write data; memory read data returns to the register file as write data. Control signals: RegWrite, MemRead, MemWrite, ALU operation (marked 3 in the figure)]
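The lw and sw data flow can be traced with a small Python model. This is a sketch under simplifying assumptions: registers are a list of 32 integers, data memory is a dictionary keyed by word-aligned byte address, and the function names `execute_lw` and `execute_sw` are hypothetical.

```python
def sign_extend16(imm):
    """Interpret the 16-bit offset field as a signed integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(regs, rs, offset16):
    """ALU step: base register plus sign-extended offset."""
    address = (regs[rs] + sign_extend16(offset16)) & 0xFFFFFFFF
    if address % 4 != 0:
        raise ValueError("unaligned word access")
    return address


def execute_lw(regs, mem, rt, rs, offset16):
    """lw: read data memory (MemRead) and write the value back to register rt."""
    regs[rt] = mem.get(effective_address(regs, rs, offset16), 0)


def execute_sw(regs, mem, rt, rs, offset16):
    """sw: write register rt to data memory (MemWrite)."""
    mem[effective_address(regs, rs, offset16)] = regs[rt]


regs, mem = [0] * 32, {}
regs[10], regs[9] = 0x1000, 42          # $t2 = base address, $t1 = data
execute_sw(regs, mem, rt=9, rs=10, offset16=8)
regs[9] = 0
execute_lw(regs, mem, rt=9, rs=10, offset16=8)
print(hex(0x1008), mem[0x1008], regs[9])
```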
Adding the Data Path for beq Instruction
Implements beq $t1, $t2, offset
The offset is a signed 16-bit immediate field and thus must be sign-extended. In addition, it is shifted left by 2 (making the two low bits 00) so that the branch target falls on a word boundary.
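[Figure: beq datapath. The register file supplies read data 1 and read data 2 to the ALU, whose Zero output goes to the branch control logic. The 16-bit offset is sign-extended to 32 bits, shifted left by 2, and added to PC + 4 from the instruction datapath; the sum is the branch target sent to the PC. Control signals: RegWrite, ALU operation (marked 3 in the figure)]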
Putting It All Together
The j instruction is to be added later.
Control circuits are needed to drive the control lines shown in red.
Two control units will be designed: “ALU Control” and “Main Control”.
[Figure: combined single-cycle datapath. A mux controlled by PCSrc selects the next PC: the incremented PC (unsuccessful branch) or the beq branch address (successful branch). Components: PC, instruction memory, two adders, shift left 2, register file, sign extend (16 to 32 bits), ALU with Zero output, data memory, and three muxes. Control signals: RegWrite, ALUSrc, ALU operation (marked 3 in the figure), MemRead, MemWrite, MemtoReg, PCSrc]
Instruction  RegDst  RegWrite  ALUSrc  MemRead  MemWrite  MemToReg  PCSrc   ALU operation
R-format     1       1         0       0        0         0         0       0000 (and), 0001 (or), 0010 (add), 0110 (sub)
lw           0       1         1       1        0         1         0       0010 (add)
sw           X       0         1       0        1         X         0       0010 (add)
beq          X       0         0       0        0         X         1 or 0  0110 (sub)
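The control settings above can be written down as a lookup table. In this Python sketch the don't-care entries (X) are stored as `None`, and two hypothetical helpers show how the signals steer the muxes; treating PCSrc as 1 only when the instruction is beq and the ALU Zero output is asserted is an assumption consistent with the “1 or 0” entry.

```python
CONTROL = {
    "R-format": dict(RegDst=1, RegWrite=1, ALUSrc=0, MemRead=0, MemWrite=0, MemToReg=0),
    "lw": dict(RegDst=0, RegWrite=1, ALUSrc=1, MemRead=1, MemWrite=0, MemToReg=1),
    "sw": dict(RegDst=None, RegWrite=0, ALUSrc=1, MemRead=0, MemWrite=1, MemToReg=None),
    "beq": dict(RegDst=None, RegWrite=0, ALUSrc=0, MemRead=0, MemWrite=0, MemToReg=None),
}


def write_register(instr, rt, rd):
    """RegDst mux: choose rd (1) or rt (0); None when nothing is written."""
    signals = CONTROL[instr]
    if not signals["RegWrite"]:
        return None
    return rd if signals["RegDst"] else rt


def pc_src(instr, zero):
    """PCSrc: take the branch target only for beq with Zero asserted (assumption)."""
    return 1 if instr == "beq" and zero else 0


print(write_register("R-format", rt=9, rd=10), write_register("lw", rt=9, rd=10), pc_src("beq", True))
```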
Control
We next add the control unit that generates
■ write signal for each state element
■ control signals for each multiplexer
■ ALU control signal
Input to control unit: instruction opcode and function code
Control Unit
Divided into two parts
■ Main Control Unit
● Input: 6-bit opcode
● Output: all control signals for Muxes, RegWrite, MemRead, MemWrite and a 2-bit ALUOp signal
■ ALU Control Unit
● Input: 2-bit ALUOp signal generated from Main Control Unit and 6-bit instruction function code
● Output: 4-bit ALU control signal
Truth Table for Main Control Unit
Main Control Unit
ALU Control Unit
Must describe hardware to compute the 4-bit ALU control input given
■ 2-bit ALUOp signal from Main Control Unit
■ function code for arithmetic
Describe it using a truth table (can turn into gates):
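Instruction   ALUOp  Function code  ALU action        ALU Control bits
lw            00     XXXXXX         add               0010
sw            00     XXXXXX         add               0010
beq           01     XXXXXX         subtract          0110
R-type (add)  10     100000         add               0010
R-type (sub)  10     100010         subtract          0110
R-type (and)  10     100100         AND               0000
R-type (or)   10     100101         OR                0001
R-type (slt)  10     101010         set on less than  0111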
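The ALU control truth table can be sketched as a small Python function. The input encodings used here (ALUOp 00 for lw/sw, 01 for beq, 10 for R-type, plus the standard MIPS function codes) follow the common textbook design and are assumptions as far as this transcript goes; the names `alu_control` and `alu` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Standard MIPS function codes mapped to 4-bit ALU control values.
FUNCT_TO_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
}


def alu_control(alu_op, funct):
    """Compute the 4-bit ALU control input from ALUOp and the function code."""
    if alu_op == 0b00:          # lw, sw: always add
        return 0b0010
    if alu_op == 0b01:          # beq: always subtract
        return 0b0110
    if alu_op == 0b10:          # R-type: decided by the function code
        return FUNCT_TO_CONTROL[funct]
    raise ValueError("unsupported ALUOp")


def to_signed32(x):
    """Interpret a 32-bit pattern as a 2's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the five operations in the table."""
    if control == 0b0000:
        result = a & b
    elif control == 0b0001:
        result = a | b
    elif control == 0b0010:
        result = (a + b) & MASK32
    elif control == 0b0110:
        result = (a - b) & MASK32
    elif control == 0b0111:
        result = 1 if to_signed32(a) < to_signed32(b) else 0
    else:
        raise ValueError("unsupported ALU control value")
    return result, int(result == 0)


# beq compares by subtracting: equal operands give result 0 and Zero = 1.
print(alu(alu_control(0b01, 0), 7, 7))
```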
Putting It All Together Again
[Figure: single-cycle datapath with control. Instruction [31-26] feeds the Control unit, which generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite. Instruction [25-21] and [20-16] select the read registers; a mux controlled by RegDst selects Instruction [20-16] or [15-11] as the write register. Instruction [15-0] is sign-extended from 16 to 32 bits and also shifted left by 2 for the branch adder. Instruction [5-0] feeds the ALU control. PCSrc, derived from Branch and the ALU Zero output, selects between PC + 4 and the branch target. A mux controlled by MemtoReg selects the ALU result or the data memory read data as register write data]
Use this datapath for the R-type, memory (lw, sw), and beq instruction scenarios.
Addition of the Unconditional Jump
We now add one more opcode to our single-cycle design:
■ Opcode 2: “j”
■ The format is: op field (bits 31-26) is “2”
■ Remaining 26 low bits are the immediate target address
The full 32-bit target address is computed by concatenating:
■ Upper 4 bits of PC+4
■ 26-bit immediate field of the jump instruction
■ Bits 00 in the lowest positions (word boundary)
■ See text chapter 3, p. 150
An additional control line from the main controller will have to be generated to select this “new” instruction
A two-bit shifter is also added to get the two low-order zeros
Final Design with jump Instruction
[Figure: final single-cycle datapath with jump. Same as the datapath with control, plus: Instruction [25-0] is shifted left by 2 (26 to 28 bits) and concatenated with PC + 4 [31-28] to form Jump address [31-0]; an extra mux controlled by the new Jump signal selects between the jump address and the output of the branch mux as the next PC. Control unit outputs: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite]
Ajit Pal, IIT Kharagpur
Thanks!