pipelining basics - bt.nitk.ac.in · data types and sizes signed and unsigned data – 2's...

Pipelining Basics

Outline
● Addressing Modes
● MIPS ISA
● MIPS Pipeline

Addressing Modes
● How are operands specified in instructions?

Mode               Example                   Meaning
Register           Add R4, R3, R2            Regs[R4] <- Regs[R3] + Regs[R2]
Immediate          Add R4, R3, #5            Regs[R4] <- Regs[R3] + 5
Displacement       Add R4, R3, 100(R1)       Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1]]
Register Indirect  Add R4, R3, (R1)          Regs[R4] <- Regs[R3] + Mem[Regs[R1]]
Absolute           Add R4, R3, (0x475)       Regs[R4] <- Regs[R3] + Mem[0x475]
Memory Indirect    Add R4, R3, @(R1)         Regs[R4] <- Regs[R3] + Mem[Mem[Regs[R1]]]
PC Relative        Add R4, R3, 100(PC)       Regs[R4] <- Regs[R3] + Mem[100 + PC]
Scaled             Add R4, R3, 100(R1)[R5]   Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1] + Regs[R5] * 4]
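The addressing modes above can be sketched in Python as a function that returns the second source operand of the Add. This is only an illustration: the register contents, memory contents, PC value, and the helper name `operand` are hypothetical and chosen so that each mode reads a different location.

```python
# Hypothetical machine state, chosen only for illustration.
regs = {"R1": 8, "R2": 6, "R3": 20, "R5": 2}
mem = {8: 40, 40: 7, 100: 3, 108: 5, 116: 9, 0x475: 11}
PC = 0  # hypothetical program counter value


def operand(mode, base=None, disp=0, index=None, scale=4):
    """Return the second source operand for the given addressing mode."""
    if mode == "register":
        return regs[base]
    if mode == "immediate":
        return disp
    if mode == "displacement":
        return mem[disp + regs[base]]
    if mode == "register_indirect":
        return mem[regs[base]]
    if mode == "absolute":
        return mem[disp]
    if mode == "memory_indirect":
        return mem[mem[regs[base]]]
    if mode == "pc_relative":
        return mem[disp + PC]
    if mode == "scaled":
        return mem[disp + regs[base] + regs[index] * scale]
    raise ValueError(f"unknown mode: {mode}")


# Add R4, R3, 100(R1): Regs[R4] <- Regs[R3] + Mem[100 + Regs[R1]]
regs["R4"] = regs["R3"] + operand("displacement", base="R1", disp=100)
```

With this state, the displacement form reads Mem[108] and the scaled form `operand("scaled", base="R1", disp=100, index="R5")` reads Mem[116].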

Data Types and Sizes

● Signed and Unsigned Data
– 2's complement representation

● Real numbers (Floating point)
– IEEE 754 Single precision and Double precision

● Addresses

ISA Encoding
● Fixed Width
– E.g., RISC architectures: MIPS, PowerPC, SPARC, ARM
● Variable Length
– E.g., CISC architectures: IBM 360, x86, Motorola 68K, VAX, …
● Mostly Fixed or Compressed
– E.g., MIPS16, THUMB
● Very Long Instruction Words
– Multiple instructions in a fixed-width bundle
– E.g., Multiflow, HP/ST Lx, TI C6000

x86 (IA-32) Instruction Encoding

Field                      Size
Instruction Prefix         Up to four prefixes (1 byte each)
Opcode                     1, 2 or 3 B
ModR/M                     1 B (if needed)
Scale, Index, Base         1 B (if needed)
Displacement               0, 1, 2, or 4 B (if needed)
Immediate                  0, 1, 2, or 4 B (if needed)

x86 and x86-64 instruction format
Possible instructions are 1 to 15 bytes long (the architectural limit).

Example: REP MOVSB (a one-byte prefix followed by a one-byte opcode)

Example – MIPS64 ISA
● RISC, load-store architecture
● 32-bit instructions, fixed format
● 32 64-bit GPRs, R0-R31; 32 64-bit FPRs, F0-F31
– R0 is hardwired to 0.
– FPRs can also hold 32-bit floats (with the other half unused).
– “SIMD” extensions operate on more floats in one FPR.
● Special registers
– Floating-point status register
● Load/store 8-, 16-, 32-, 64-bit integers
– All sign-extended (or zero-extended, for the unsigned loads) to fill the 64-bit GPR
– Also 32-bit floats and 64-bit doubles

MIPS64 Addressing Modes
● Register (arithmetic and logical ops only)
● Immediate (arithmetic, logical) and Displacement (loads/stores only)
– 16-bit immediate/offset field
– Register indirect: use 0 as displacement offset
– Direct (absolute): use R0 as displacement base
● Byte-addressed memory, 64-bit address
● Software-settable big-endian/little-endian flag
● Alignment required
– E.g., word-aligned addresses: one word occupies bytes 100-103, the next occupies bytes 104-107
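A small Python sketch of the alignment rule and of the two byte orders. The helper name `is_aligned` and the sample value are hypothetical; the rule assumed here is the usual one, that an access of a given size must start at an address that is a multiple of that size.

```python
def is_aligned(address, size_bytes):
    """True if the address is a multiple of the access size."""
    return address % size_bytes == 0


# Words (4 bytes) at addresses 100 and 104 are aligned; 102 is not.
word_ok = [is_aligned(a, 4) for a in (100, 102, 104)]

# The same 32-bit value laid out in memory under each byte order.
value = 0x11223344
big_endian = value.to_bytes(4, "big")        # most significant byte at the lowest address
little_endian = value.to_bytes(4, "little")  # least significant byte at the lowest address
```

Note that address 100 is word-aligned but not doubleword-aligned, since 100 is not a multiple of 8.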

MIPS64 Instructions
DATA TRANSFER INSTRUCTIONS

Instruction  Opcode/Mnemonic                           Examples
Load         LB, LBU, LH, LHU, LW, LWU, LD, L.S, L.D   LD R1, 30(R2); L.S F0, 50(R3)
Store        SB, SH, SW, SD, S.S, S.D                  SH R3, 502(R2); SB R2, R1(R3)

● L: Load
● S: Store
● B: Byte (8b), H: Half Word (16b), W: Word (32b), D: Double Word (64b)
● U: Unsigned (in loads such as LBU); Upper (in LUI)
● I: Immediate

Steps: Decode instruction, fetch operands, effective address calculation, memory access, update RF.

MIPS64 Instructions
ARITHMETIC/LOGICAL INSTRUCTIONS
Logical and arithmetic shift, set less than, …

Mnemonics:
– DADD, DADDI, DADDIU, DSUB, DSUBU, DMUL, DMULU, DDIV, DDIVU
– AND, OR, XOR, ANDI, ORI, XORI
– LUI
– DSLL, DSRL, SLT, SLTI, SLTU

Examples:
– DADDU R1, R2, R3
– ANDI R1, #43
– SLT R1, R2, R3

Steps: Decode instruction, fetch operands, arithmetic operation, update results in RF.

MIPS64 Instructions
CONTROL INSTRUCTIONS
Branch, jump, control transfer

Mnemonics:
– BEQZ, BNEZ
– BEQ, BNE
– J, JR
– JAL, JALR
– ERET

Examples:
– BEQ R1, R2, label
– J label

Steps: Decode instruction, fetch operands, compare condition, update PC.

MIPS Instruction Formats

● R-type: OP rd, rs, rt

  op      rs      rt      rd      shamt   funct
  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

● I-type: OP rt, rs, IMM

  op      rs      rt      immediate
  6 bits  5 bits  5 bits  16 bits

● J-type: OP LABEL

  op      Offset added to PC
  6 bits  26 bits

op: Opcode (class of instruction), e.g. ALU
funct: Which subunit of the ALU to activate?
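The field layouts above can be illustrated with a short Python sketch that packs and unpacks 32-bit instruction words. The function names and the field values used in the examples are hypothetical; only the bit widths and positions come from the formats above.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack an R-type word: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, imm):
    """Pack an I-type word: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode(word):
    """Extract every field; which ones are meaningful depends on the format."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm": word & 0xFFFF,
        "offset": word & 0x03FFFFFF,
    }


r_word = encode_r(op=0, rs=9, rt=10, rd=8, shamt=0, funct=0x20)
i_word = encode_i(op=0x23, rs=2, rt=1, imm=30)
fields = decode(i_word)
```

Because the register fields sit at the same bit positions in every format, `decode` can extract them before the opcode is known, which is what makes fixed-field decoding possible.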

Implementation of RISC ISA - 1
● Instruction Fetch (IF)

[Figure: PC addresses the Instruction Memory, whose output is latched in IR; an adder computes PC + 4 into NPC]

IR <- Mem[PC]
NPC <- PC + 4

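Implementation of RISC ISA - 2
● Instruction Decode/Register Fetch (ID)

[Figure: the rs, rt, and rd fields of IR index the Registers, which output A and B; the 16-bit immediate is sign-extended to 32 bits into Imm]

A <- Regs[rs]
B <- Regs[rt]
Imm <- Sign-extended immediate field of IR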

Implementation of RISC ISA - 3
● Execution/Effective Address (EX)

[Figure: multiplexers select the ALU inputs from NPC, A, B, and Imm; the ALU produces ALUOutput, and a Zero? test produces Cond]

Memory Reference:
ALUOutput <- A + Imm

Register-Register and Register-Immediate Instructions:
ALUOutput <- A func B
ALUOutput <- A func Imm

Branch Instruction:
ALUOutput <- NPC + (Imm << 2)
Cond <- (A == 0)

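Implementation of RISC ISA - 4
● Memory Access/Branch Completion (MEM)

[Figure: ALUOutput addresses the Data Memory, which loads LMD or stores B; a multiplexer controlled by Cond selects NPC or ALUOutput as the next PC]

Memory Reference:
LMD <- Mem[ALUOutput]   (load)
Mem[ALUOutput] <- B     (store)

Branch:
if (Cond) PC <- ALUOutput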

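Implementation of RISC ISA - 5
● Write back (WB)

[Figure: a multiplexer selects ALUOutput or LMD as the value written to the Register File]

Register-Register ALU Instruction:
Regs[rd] <- ALUOutput

Register-Immediate ALU Instruction:
Regs[rt] <- ALUOutput

Load Instruction:
Regs[rt] <- LMD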

Implementation of RISC ISA - Stages
● Instruction Fetch (IF)
● Instruction Decode/Register Fetch (ID)
– Fixed field decoding
● Execution/Effective address (EX)
● Memory Access (MEM)
● Write back (WB)
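The five stages can be traced with a minimal, unpipelined Python sketch. It is an illustration under stated assumptions, not the actual hardware: instructions are represented as already-decoded dictionaries (a hypothetical format with `kind`, `rs`, `rt`, `rd`, `imm`, `func` keys), and the register and memory contents are made up.

```python
import operator


def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def step(state, imem, dmem):
    """Run one instruction through IF, ID, EX, MEM, WB (no pipelining)."""
    regs = state["regs"]
    # IF: fetch the instruction and compute the next sequential PC.
    ir = imem[state["pc"]]
    npc = state["pc"] + 4
    # ID: read the source registers and sign-extend the immediate.
    a = regs[ir["rs"]]
    b = regs[ir.get("rt", 0)]
    imm = sign_extend16(ir.get("imm", 0))
    # EX: effective address, ALU result, or branch target and condition.
    kind = ir["kind"]
    cond = False
    if kind in ("load", "store"):
        alu_output = a + imm
    elif kind == "alu_rr":
        alu_output = ir["func"](a, b)
    elif kind == "alu_imm":
        alu_output = ir["func"](a, imm)
    elif kind == "beqz":
        alu_output = npc + (imm << 2)
        cond = a == 0
    else:
        raise ValueError(f"unknown instruction kind: {kind}")
    # MEM: access data memory, or complete the branch by updating the PC.
    lmd = None
    if kind == "load":
        lmd = dmem[alu_output]
    elif kind == "store":
        dmem[alu_output] = b
    state["pc"] = alu_output if cond else npc
    # WB: write the result to the register file (R0 stays hardwired to 0).
    dest, value = None, None
    if kind == "alu_rr":
        dest, value = ir["rd"], alu_output
    elif kind == "alu_imm":
        dest, value = ir["rt"], alu_output
    elif kind == "load":
        dest, value = ir["rt"], lmd
    if dest:  # skips both "no destination" and register 0
        regs[dest] = value


# Hypothetical four-instruction program and initial state.
imem = {
    0: {"kind": "load", "rs": 2, "rt": 1, "imm": 30},                          # LD R1, 30(R2)
    4: {"kind": "alu_imm", "rs": 1, "rt": 3, "imm": 5, "func": operator.add},  # DADDI R3, R1, #5
    8: {"kind": "store", "rs": 2, "rt": 3, "imm": 38},                         # SD R3, 38(R2)
    12: {"kind": "beqz", "rs": 0, "imm": -4 & 0xFFFF},                         # BEQZ R0, back to 0
}
dmem = {130: 7}
state = {"pc": 0, "regs": [0] * 32}
state["regs"][2] = 100
for _ in range(4):
    step(state, imem, dmem)
```

After the four steps, R1 holds the loaded value, R3 holds R1 + 5, that sum has been stored at address 138, and the taken branch has set the PC back to 0.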

MIPS Datapath

[Figure: complete datapath combining the five stages: Instruction Fetch (IF), Instruction Decode/Register Fetch (ID), Execute/Address Calculation (EX), Memory Access (MEM), and Write Back (WB)]

MIPS Pipeline

[Figure: the five pipeline stages IF, ID, EX, MEM, WB. Source: Hennessy & Patterson, CA-QA, Appendix C, 5ed. MK, 2013]

MIPS Pipeline

Time (clock cycles)
      1    2    3    4    5    6    7    8    9
i1    IF   ID   EX   MEM  WB
i2         IF   ID   EX   MEM  WB
i3              IF   ID   EX   MEM  WB
i4                   IF   ID   EX   MEM  WB
...

Example:
● When will i10000 complete? What is the average clock cycles per instruction (CPI)?
● If the processor were not pipelined, when would i10000 complete? What is the average CPI? (Assume the same clock period for both designs.)
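One way to work the example is the short Python sketch below. It assumes an ideal five-stage pipeline with no stalls, where i1 enters IF in cycle 1, and an unpipelined design that spends five cycles of the same clock on every instruction. The function names are hypothetical.

```python
def completion_cycle_pipelined(i, stages=5):
    """Cycle in which instruction i (1-based) finishes WB, assuming no stalls."""
    return stages + (i - 1)


def completion_cycle_unpipelined(i, stages=5):
    """Cycle in which instruction i finishes when each one takes all stages in turn."""
    return stages * i


n = 10_000
pipelined_cycles = completion_cycle_pipelined(n)
unpipelined_cycles = completion_cycle_unpipelined(n)
cpi_pipelined = pipelined_cycles / n
cpi_unpipelined = unpipelined_cycles / n
```

Under these assumptions i10000 completes in cycle 10004 when pipelined (average CPI of about 1.0004) and in cycle 50000 when not pipelined (average CPI of 5).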

Some Equations
● Unpipelined: time to execute one instruction: $T_{exec} = T + T_{ovh}$
● N-stage pipeline: time per stage: $T_{stage} = \frac{T}{N} + T_{ovh}$
● Total time per instruction: $T_{inst} = N \times \left(\frac{T}{N} + T_{ovh}\right) = T + N \times T_{ovh}$
● Clock cycle time: $T_{clock} = \frac{T}{N} + T_{ovh}$
● Clock speed: $\frac{1}{T_{clock}}$
● Ideal speedup: $Speedup_{ideal} = \frac{T + T_{ovh}}{T/N + T_{ovh}}$
● Cycles to complete one instruction = N
● Average CPI = 1

[Figure: the unpipelined processor performs IF, ID, EX, MEM, WB in time T followed by the overhead Tovh; the pipelined processor spends Tstage per stage, each stage including Tovh]
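Two limiting cases of the ideal speedup formula help in reading it. They follow directly from the expression for $Speedup_{ideal}$ and use no symbols beyond $T$, $N$, and $T_{ovh}$ as defined above.

```latex
\lim_{T_{ovh} \to 0} \frac{T + T_{ovh}}{T/N + T_{ovh}} = \frac{T}{T/N} = N
\qquad\qquad
\lim_{N \to \infty} \frac{T + T_{ovh}}{T/N + T_{ovh}} = \frac{T + T_{ovh}}{T_{ovh}} = 1 + \frac{T}{T_{ovh}}
```

With negligible overhead the speedup approaches the number of stages, while for very deep pipelines the per-stage overhead bounds the achievable speedup.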

Pipeline Performance
An unpipelined processor has a 1 ns clock cycle. ALU operations and branches take 4 cycles, and memory operations take 5 cycles. The relative frequencies of these operations are 40%, 20%, and 40%, respectively. Suppose that, due to clock skew and setup, pipelining adds 0.2 ns of overhead to the clock. What is the speedup?

Average instruction execution time = Clock cycle × Average CPI

$CPI = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction Count}} \times CPI_i$
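The problem above can be worked numerically with the following Python sketch. It assumes the pipelined processor achieves an ideal CPI of 1 and that its clock period is the unpipelined 1 ns plus the 0.2 ns overhead. The variable names are hypothetical.

```python
# (relative frequency, cycles) for each instruction class on the unpipelined processor
mix = {
    "alu": (0.40, 4),
    "branch": (0.20, 4),
    "memory": (0.40, 5),
}

clock_unpipelined_ns = 1.0
overhead_ns = 0.2

# Average CPI is the frequency-weighted sum of the per-class CPIs.
avg_cpi = sum(freq * cycles for freq, cycles in mix.values())

time_unpipelined_ns = clock_unpipelined_ns * avg_cpi
time_pipelined_ns = (clock_unpipelined_ns + overhead_ns) * 1  # ideal CPI of 1
speedup = time_unpipelined_ns / time_pipelined_ns
```

The weighted CPI comes to 4.4, so the average instruction time falls from 4.4 ns to 1.2 ns, a speedup of roughly 3.7.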

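Multiple Issue Integer Pipeline

[Figure: IF, ID, EX, MEM, WB datapath with two instruction registers (IR0, IR1) fed from IM, RF Read, A and B operand latches, a Zero? test, DM, and RF Write]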


Operations and Operands

[Figure: a PROCESSOR containing an ALU (inputs i1 and i2, output o), Control, and a Register File, connected to Memory]

Machine Models

[Figure: four machine models, each built around an ALU: STACK (operands at the top of stack, TOS), ACCUMULATOR, REGISTER-MEMORY, and REGISTER-REGISTER]

Code sequences for C = A + B:

STACK              ACCUMULATOR   REGISTER-MEMORY    REGISTER-REGISTER
Push A             Load A        Load R1, A         Load R1, A
Push B             Add B         Add R3, R1, B      Load R2, B
Add                Store C       Store R3, C        Add R3, R1, R2
Pop C                                               Store R3, C

Machine Models – Comparison
● Number of explicitly named operands
● Number of instructions that can access data from memory
● Code size
● Amount of data transferred between memory and processor
● Complexity of hardware
● Ease of compilation (ease of generation of machine code)

The Stack Machine Model

Example: How is the expression x = (a*b) + (c - (d/e)) evaluated on a stack-based machine?

● What is the sequence of instructions?
● Convert the equation to its Reverse Polish Notation form.
– ab*cde/-+

The Stack Machine Model

Evaluate ab*cde/-+ on a stack-based machine.

[Figure: step-by-step contents of the STACK (addresses 0xFF, 0xFE, ...) and of MEMORY (addresses 0x00 to 0x06, holding a, b, c, d, ... and x) during the evaluation]

What is the minimum size of the stack required to evaluate this expression?
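The evaluation can be sketched in Python with a list used as the stack, recording the deepest point the stack reaches. The operand values and the function name `eval_rpn` are hypothetical, and the postfix string assumed is ab*cde/-+, the Reverse Polish form of (a*b) + (c - (d/e)).

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens, values):
    """Evaluate a postfix expression; return (result, maximum stack depth)."""
    stack = []
    max_depth = 0
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()   # top of stack is the right-hand operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(values[tok])  # "Push" of a named operand
        max_depth = max(max_depth, len(stack))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0], max_depth


values = {"a": 2, "b": 3, "c": 10, "d": 8, "e": 4}  # hypothetical operand values
result, depth = eval_rpn(list("ab*cde/-+"), values)
```

The stack is deepest just after e is pushed, when it holds a*b, c, d, and e, so four entries suffice for this expression.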

Class Work Example

Example: For each machine model, write a code sequence to evaluate the following expressions.

$b = a^3 + 3 \cdot a^2 + 2 \cdot a + 7$
$c = x^3 + 3 \cdot a^2 + 2 \cdot b + 7$

For each machine model, what is (a) the total number of instructions in the code sequence, (b) the execution time in clock cycles, and (c) the CPI?
Given: Load, store, arithmetic, and logic operations take 1 cycle. Multiply completes in 4 clock cycles.

Real World Instruction Sets

Arch           Type         #Oper  #Mem  Data Size    #Regs   Addr Size  Use
Alpha          Reg-Reg      3      0     64b          32      64b        Workstation
ARM            Reg-Reg      3      0     32/64b       16      32/64b     Cell Phone, Embedded
MIPS           Reg-Reg      3      0     32/64b       32      32b/64b    Workstation
SPARC          Reg-Reg      3      0     32/64b       24-32   32b/64b    Workstation
TI C6000       Reg-Reg      3      0     32b          32      32b        DSP
IBM 360        Reg-Mem      2      1     32b          16      24/31/64   Mainframe
x86            Reg-Mem      2      1     8/16/32/64b  4/8/24  16/32/64   Personal Computers (PC)
VAX            Mem-Mem      3      3     32b          16      32b        Minicomputers
Motorola 6800  Accumulator  1      1/2   8b           0       16b        Microcontroller

MIPS64 Instructions
DATA TRANSFER INSTRUCTIONS

Instruction  Opcode/Mnemonic                              Examples
Load         LB, LBU, LH, LHU, LW, LWU, LD, L.S, L.D      LD R1, 30(R2); L.S F0, 50(R3)
Store        SB, SH, SW, SD, S.S, S.D                     SH R3, 502(R2); SB R2, R1(R3)
Move         MOV.S, MOV.D, MFC0, MTC0, MFC1, MTC1         MOV.S F2, F3

● L: Load
● S: Store
● B: Byte (8b), H: Half Word (16b), W: Word (32b), D: Double Word (64b)
● U: Unsigned (in loads such as LBU); Upper (in LUI)
● I: Immediate

MIPS64 Instructions
ARITHMETIC/LOGICAL INSTRUCTIONS
Multiply accumulate, logical and arithmetic shift, set less than, …

Mnemonics:
– DADD, DADDI, DADDIU, DSUB, DSUBU, DMUL, DMULU, DDIV, DDIVU
– AND, OR, XOR, ANDI, ORI, XORI
– LUI
– DSLL, DSRL, DSRA, DSLLV
– SLT, SLTI, SLTU

Examples:
– DADDU R1, R2, R3
– LUI R1, #43
– SLT R1, R2, R3

[Figure: result of LUI R1, #43: the immediate 43 is placed in the upper half of the 32-bit value and all remaining bits of R1 are zeros]
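The effect of LUI can be sketched in Python as a shift of the 16-bit immediate into bits 31 to 16, with the low 16 bits cleared. The function name `lui` is hypothetical; the sketch also assumes the MIPS64 behavior of sign-extending the 32-bit result into the 64-bit register, which matters only when bit 15 of the immediate is set.

```python
def lui(imm16):
    """Return the 64-bit register value produced by LUI with the given immediate."""
    value = (imm16 & 0xFFFF) << 16       # immediate in bits 31..16, zeros in bits 15..0
    if value & 0x8000_0000:              # bit 31 set: sign-extend into bits 63..32
        value |= 0xFFFF_FFFF_0000_0000
    return value


r1 = lui(43)  # LUI R1, #43
```

For the slide's example, `lui(43)` gives 0x2B0000: the value 43 (0x2B) sits in the upper half of the low 32 bits and every other bit is zero.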

MIPS64 Instructions
CONTROL INSTRUCTIONS
Branch, jump, control transfer

Mnemonics:
– BEQZ, BNEZ
– BEQ, BNE
– MOVN, MOVZ
– J, JR
– JAL, JALR
– ERET

Examples:
– BEQ R1, R2, label
– MOVZ R1, R2, R3
– J label

MIPS64 Instructions
FLOATING POINT

FP Arithmetic:
– ADD.D, ADD.S, ADD.PS
– SUB.D, SUB.S, SUB.PS
– MUL.D, MUL.S, MUL.PS
– DIV.D, DIV.S, DIV.PS
– CVT.D.S, CVT.D.L, CVT.D.W, CVT.S._
– C.LT.D, C.GT.D, C.LE.D, C.GE.D, C.EQ.D, C.NE.D, C.__.S
