lec-4-5 mips isa & processor

MIPS ISA and PROCESSORMIPS ISA and PROCESSOR

Ajit PalAjit Pal

ProfessorProfessorDepartment of Computer Science and EngineeringDepartment of Computer Science and EngineeringIndian Institute of Technology KharagpurIndian Institute of Technology KharagpurINDIA-721302INDIA-721302

High Performance Computer High Performance Computer ArchitectureArchitecture

Case Study

MIPS Instruction Set architecture

Registers– MIPS has 32 architected 32-bit registers– Effective use of registers is key to program performance

Data Transfer– 32 words in registers, millions of words in main memory– Instruction accesses memory via memory address– Address indexes memory, a large single-dimensional array

Alignment– Most architectures address individual bytes– Addresses of sequential words differ by 4– Words must always start at addresses that are multiples of 4

Operands in MIPS

r0: always 0r1: reserved for the assemblerr2, r3: function return valuesr4~r7: function call argumentsr8~r15: “caller-saved” temporariesr16~r23 “callee-saved” temporariesr24~r25 “caller-saved” temporariesr26, r27: reserved for the operating systemr28: global pointerr29: stack pointerr30: callee-saved temporariesr31: return address

Register Usage Conventions

Registers

Data Types in MIPS

Data and instructions are both represented in bits– 32-bit architectures employ 32-bit instructions– Combination of fields specifying operations/operands

MIPS operates on:32-bit (unsigned or 2’s complement) integers,32-bit (single precision floating point) real numbers, 64-bit (double precision floating point) real numbers; 32-bit words, bytes and half words can be loaded into GPRs After loading into GPRs, bytes and half words are either zero or sign bit expanded to fill the 32 bits; Only 32-bit units can be loaded into FPRs and 32-bit real numbers are stored in even numbered FPRs. 64-bit real numbers are stored in two consecutive FPRs, starting with even-numbered register.Floating-point numbers IEEE standard 754 float: 8-bit exponent, 23-bit significand double:11-bit exponent, 52-bit significand

MIPS Memory OrganizationMIPS Memory Organization

• MIPS supports byte addressability:

it implies that a byte is the smallest unit with its

address;

32-bit word has to start at byte address that is

multiple of 4;

Thus, 32-bit word at address 4n includes four bytes

with addresses: 4n, 4n+1, 4n+2, and 4n+3.

16-bit half word has to start at byte address that is

multiple of 2; Thus, 16-bit word at address 2n includes

two bytes with addresses: 2n and 2n+1.

it implies that an address is given as 32-bit unsigned

MIPS Instruction Formats

R-Type Instruction Fields– op: basic operation of instruction, called the opcode– rs, rt: first, second register source operand– rd: register destination operand– shamt: shift amount for shift instructions– funct: specifies a variant of the operation, called function code

rdrtrsop shamt funct

• I-Type Instruction Fields– Opcode specifies instruction format

• J-Type Instruction Fields

MIPS Instruction Formats

MIPS Addressing ModesMIPS Addressing Modes

• Register Addressing– Operand is a register (e.g., add $s2, $s0, $s1)

• Base or Displacement Addressing– Operand is at memory location whose address is sum of register and constant (e.g., lw $s1, 8($s0))

• Immediate Addressing– Operand is constant within instruction (e.g., addi $s1, $s0, 4)


• PC-Relative Addressing– Address is sum of PC and constant within instruction (e.g., beq $s0, $s1, 16)


• Pseudo-direct Addressing– Address is 26 bits of constant within instruction concatenated with upper 6 bits of PC (e.g., j 1000)

Survey of MIPS Instruction Set• Arithmetic

• Addition– add $s1, $s2, $s3 # $s1 = $2 + $s3, overflow detected– addu $s1, $s2, $s3 # $s1 = $s2 x $s3, overflow undetected

• Subtraction– sub $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow detected– subu $s1, $s2, $s3 # $s1 = $s2 - $s3, overflow undetected

• Immediate/Constants– addi $s1, $s2, 100 # $s1 = $s2 + 100, overflow detected– addiu $s1, $s2, 100 # $s1 = $s2 + 100, overflow undetected

• Multiply– mult $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit signed product– multu $s2, $s3 # Hi, Lo = $s2 x $s3, 64-bit unsigned product

• Divide– div $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3– divu $s2, $s3 # Lo = $s2/$s3, Hi = $s2 mod $s3, unsigned

• Ancillary– mfhi $s1 # $s1 = Hi, get copy of Hi– mflo $s1 # $s1 = Lo, get copy of low

• Arithmetic

Survey of MIPS Instruction Set

• Boolean Operations– and $s1, $s2, $s3 # $s1 = $s2 & $s3, bit-wise AND– or $s1, $s2, $s3 # $s1 = $s2 | $s3, bit-wise OR

• Immediate/Constants– andi $s1, $s2, 100 # $s1 = $s2 & 100, bit-wise AND– ori $s1, $s2, 100 # $s1 = $s2 | 100, bit-wise OR

• Shifting– sll $s1, $s2, 10 # $s1 = $s2 << 10, shift left– srl $s1, $s2, 10 # $s1 = $s2 >> 100, shift right


• Logical

• Load Operations– lw $s1, 100($s2) # $s1 = Mem($s2+100), load word– lbu $s1, 100($s2) # $s1 = Mem($s2+100), load byte– lui $s1, 100 # $s1 = 100 * 2^16, load upper imm.

• Store Operations– sw $s1, 100($s2) # Mem[$s2+100] = $s1, store word– sb $s1, 100($s2) # Mem[$s2+100] = $s1, store byte


• Data Transfer

Survey of MIPS Instruction Set• Control Transfer

• Jump Operations– j 2500 # go to 10000– jal 2500 # $ra = PC+4, go to 10000, for procedure call– jr $ra # go to $ra, for procedure return

• Branch Operations– beq $s1, $s2, 25 # if($s1==$s2), go to PC+4+100, PC-relative– bne $s1, $s2, 25 # if($s1!=$s2), go to PC+4+100, PC-relative

• Comparison Operations– slt $s1, $s2, $s3 # if($s2<$s3) $s1=1, else $s1=0, 2’s comp.– slti $s1, $s2, 100 # if($s2<100), $s1=1, else $s1=0, 2’s comp.– sltu $s1, $s2, $s3 # if($s2<$s3), $s1=1, else $s1=0, unsigned– sltiu $s1, $s2, 100 # if($s2<100) $s1=1, else $s1=0, unsigned

Survey of MIPS Instruction Set• Floating Point Operations

• Arithmetic– {add.s, sub.s, mul.s, div.s} $f2, $f4, $f6 # single-precision– {add.d, sub.d, mul.d, div.d} $f2, $f4, $f6 # double-precision– Floating-point registers are used in even-odd pairs, using evennumber register as its name

• Data Transfer– {lwc1, swc1} $f1, 100($s2)– Transfer data to/from floating-point register file

• Conditional Branch– {c.lt.s, c.lt.d} $f2, $f4 # if ($f2 < $f4), cond=1, else cond=0– {bclt, bclf} 25 # if (cond==1/0), go to PC+4+100

Case Study

MIPS Instruction Set Processor

Introduction Starting point:

■ The specification of the MIPS instruction set drives the design of the hardware.

■ Will restrict design to integer type instructions Identify common functions to all instructions, and within instruction

classes – easy to do in a RISC architecture■ Instruction fetch■ Access one or more registers■ Use ALU

Asserted signals – a high or low level of a signal which implies a logically “true” condition … an “action” level. The text will only assert a logically high level, ie., a “1”.

Clocking■ Assume “edge triggered” clocking (as opposed to level sensitive).■ A storage circuit or flip-flop stores a value on the clock transition

edge.■ Model is flip-flops with combinational logic between them■ Propagation delay through combinations logic between storage

elements determines clock cycle length.■ Single clock cycle vs. multi-clock cycle design approach

Single Versus Multi-clock Cycle Design Start out with a single “long” clock cycle for each instruction .

■ Entire instruction gets executed in a single clock pulse

■ Controller is pure combinational logic

■ Design is simple

■ You would think that a single clock cycle per instruction execution would give us super high performance – but not so:

Slowest instruction determines speed of all instructions.

Because various phases of the instructions need the same hardware resource, and all is needed at the same time (clock pulse)

■ Some hardware is redundant – another disadvantage of single phaseExamples:2 memories: instruction and data memory 2 adders and an ALU

Design Summary Has a performance bottleneck

■ The clock cycle time is determined by the longest path in the machine

■ The simple jmp instruction will take as long as the load word (lw)

■ The instruction which uses the longest data path dictates the time for all others.

What about a variable time clock design?■ Still a single clock ■ Clock pulse interval is a function of the opcode ■ Average time for instruction theoretically improves

But■ It difficult to implement - lots of overhead to overcome

Let’s start simple with a single clock cycle design for simplicity reasons and later convert to multi-clock cycle.

Basic Abstract View of the Data Path

Shows common functions for most instructions

Registers

Register #

Da ta

Register #

Datamem ory

Address

Data

Register #

PC Instruction AL U

Instructionm em ory

Ad dress

Data Path for Instruction Fetching

PC

Instruct ionm em ory

Readaddre ss

Ins truct ion

4

A dd

Basic Data Path for R-type Instruction

Red lines are for control signals generated by the controller

Ins truc tionR eg is te rs

W ritereg is te r

R eaddata 1

R eaddata 2

R ea dreg is te r 1

R ea dreg is te r 2

W riteda ta

A L Uresu lt

A LU

Ze ro

R egW rite

ALU operation3

Adding the Data Path for lw & sw Instruction

Implements:lw $t1, offset_value($t2)sw $t1, offset_value($t2)The offset value is a 16-bit signed immediate field & must be sign extended to 32 bits

Immediate offset data

Ins truction

16 32

R egiste rsW ri tereg is te r

Re addata 1

Re addata 2

R ea dregis te r 1

R ea dregis te r 2

Da tam em ory

W ri tedata

Readdata

W ri tedata

S ignex tend

ALUres ul t

Z eroALU

Address

M em Read

M em W rite

R egW rite

ALU operation3

Adding the Data Path for beq Instruction

Implements beq $t1, $t2, offsetOffset is a signed 16 bit immediate field, & thus must be

sign extended. In addition we shift left by 2 (make low bits are 00)to address to a word boundary

To PC

16 32Sign

extend

Z eroALU

Sum

Shiftleft 2

To branchcontrol logic

Branch target

P C + 4 from instruction datapath

Instruction

Add

R egistersW riteregister

Readdata 1

Readdata 2

Readregister 1

Readregister 2

W ritedata

ALU operation3

RegWrite

Putting It All Together

j instruction to be added laterNeed control circuits to drive control lines in red.Two control units will be designd: ALU Control & “Main Control

Incremented PC or beq branch address

unsuccessful branch

Successful branch

PC

Instructio nmemory

Readaddress

Instruction

16 32

Add ALUresult

Mux

Registers

Writeregister

Writedata

Readdata 1

Readdata 2

Readregister 1Readregister 2

S hiftleft 2

4

Mux

ALU operation3

RegWrite

Mem Read

MemW rite

PCSrc

ALU Src

M emtoReg

ALUresult

Z eroALU

D atamemory

Address

Writedata

Readd ata M

ux

Signextend

Add

Instruction RegDst RegWrite ALUSrc MemRea

d

MemWri

te

MemToReg PCSrc ALU

operation

R-format 1 1 0 0 0 0 0 0000

(and)

0001

(or)

0010

(add)

0110

(sub)

lw 0 1 1 1 0 1 0 0010

(add)

sw X 0 1 0 1 X 0 0010

(add)

beq x 0 0 0 0 X 1 or 0 0110

(sub)

Control

We next add the control unit that generates ■ write signal for each state element■ control signals for each multiplexer■ ALU control signal

Input to control unit: instruction opcode and function

code

Control Unit

Divided into two parts■ Main Control Unit

● Input: 6-bit opcode● Output: all control signals for Muxes, RegWrite,

MemRead, MemWrite and a 2-bit ALUOp signal■ ALU Control Unit

● Input: 2-bit ALUOp signal generated from Main Control Unit and 6-bit instruction function code

● Output: 4-bit ALU control signal

Truth Table for Main Control Unit

Main Control Unit

ALU Control Unit

Must describe hardware to compute 4-bit ALU

control input given■ 2-bit ALUOp signal from Main Control Unit ■ function code for arithmetic

Describe it using a truth table (can turn into gates):

ALU Control bits

0010

0010

0110

0010

0110

0000

0001

0111

Putting It All Together Again

PC

Instructionmemory

Readaddress

Instruction[31– 0]

Instruction [20 16]

Instruction [25 21]

Add

Instruction [5 0]

MemtoReg

ALUOp

MemWrite

RegWrite

MemRead

BranchRegDst

ALUSrc

Instruction [31 26]

4

16 32Instruction [15 0]

0

0Mux

0

1

Control

Add ALUresult

Mux

0

1

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Signextend

Mux

1

ALUresult

Zero

PCSrc

Datamemory

Writedata

Readdata

Mux

1

Instruction [15 11]

ALUcontrol

Shiftleft 2

ALUAddress

Use this for R-type, memory, & beq instructions scenarios.

Addition of the Unconditional Jump

We now add one more op code to our single cycle design:■ Op code 2: “j”■ The format is op field 28-31 is a “2”■ Remaining 26 low bits is the immediate target address

The full 32 bit target address is computed by concatenating:■ Upper 4 bits of PC+4■ 26 bit immediate field of the jump instruction■ Bits 00 in the lowest positions (word boundary)■ See text chapter 3, p. 150

An additional control line from the main controller will have to be generated to select this “new” instruction

A two bit shifter is also added to get the two low order zeros

Final Design with jump Instruction

S hiftleft 2

PC

Instructionmemory

Readaddress

Instruction[31– 0]

Datamemory

Readdata

W ritedata

RegistersWriteregister

Writedata

Readdata 1

Readdata 2

Readregister 1

Readregister 2

Instruction [15– 11]



Add

ALUresult

Zero


MemtoReg

ALUOp

MemW rite

RegWrite

MemR ead

Branch

JumpRegDst

ALUSrc


4

Mux

Instruction [25– 0] Jump address [31– 0]

PC +4 [31– 28]

Signextend

16 32Instruction [15– 0]

1

Mux

1

0

Mux

0

1

Mux

0

1

ALUcontrol

Control

Add ALUresult

Mux

0

1 0

ALU

Shiftleft 2

26 28

Address

Ajit Pal, IIT KharagpurAjit Pal, IIT Kharagpur

Thanks!Thanks!

lec-4-5 mips isa & processor

Documents