pipelining - intranet deibhome.deib.polimi.it/santambr/dida/phd/wonderland/... · based on the...

POLITECNICO DI MILANO Parallelism in wonderland: are you ready to see how deep the rabbit hole goes? Pipelining Ver. Jan 14, 2014 Marco D. Santambrogio: [email protected] Simone Campanoni: [email protected]



Outline

•   Processors and Instruction Sets
•   Review of pipelining
•   MIPS
    •   Reduced Instruction Set of MIPS™ Processor
    •   Implementation of MIPS Processor Pipeline


Main Characteristics of MIPS™ Architecture

•   RISC (Reduced Instruction Set Computer) architecture: based on the concept of executing only simple instructions in a reduced basic cycle, to improve performance over CISC CPUs.
•   LOAD/STORE architecture: ALU operands come from the CPU general-purpose registers; they cannot come directly from memory. Dedicated instructions are needed to:
    •   load data from memory to registers
    •   store data from registers to memory
•   Pipeline architecture: a performance optimization technique based on overlapping the execution of multiple instructions derived from a sequential execution flow.


A Typical RISC ISA


•   32-bit fixed-format instructions (3 formats)
•   32 32-bit GPRs (R0 contains zero, DP values take a register pair)
•   3-address, reg-reg arithmetic instructions
•   Single addressing mode for load/store: base + displacement
    •   no indirection
•   Simple branch conditions
•   Delayed branch
•   Examples: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3


Approaching an ISA


•   Instruction Set Architecture
    •   Defines the set of operations, instruction format, hardware-supported data types, named storage, addressing modes, sequencing
•   Meaning of each instruction is described by RTL on architected registers and memory
•   Given technology constraints, assemble an adequate datapath
    •   Architected storage mapped to actual storage
    •   Function units to do all the required operations
    •   Possible additional storage (e.g. MAR, MBR, …)
    •   Interconnect to move information among regs and FUs
•   Map each instruction to a sequence of RTLs
•   Collate sequences into a symbolic controller state transition diagram (STD)
•   Implement controller


Example: MIPS

Register-Register
    bits:    31-26   25-21   20-16   15-11   10-0
    field:   Op      Rs1     Rs2     Rd      Opx

Register-Immediate
    bits:    31-26   25-21   20-16   15-0
    field:   Op      Rs1     Rd      immediate

Branch
    bits:    31-26   25-21   20-16     15-0
    field:   Op      Rs1     Rs2/Opx   immediate

Jump / Call
    bits:    31-26   25-0
    field:   Op      target
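The four formats on this slide can be exercised with a small field extractor. The following is a minimal Python sketch, assuming the bit ranges read off the slide; the names `FORMATS`, `extract`, `decode`, and `encode`, as well as the example field values, are hypothetical and not part of the original material.

```python
# Bit ranges (high, low), inclusive, for each instruction format on the slide.
FORMATS = {
    "register-register": {"op": (31, 26), "rs1": (25, 21), "rs2": (20, 16),
                          "rd": (15, 11), "opx": (10, 0)},
    "register-immediate": {"op": (31, 26), "rs1": (25, 21), "rd": (20, 16),
                           "immediate": (15, 0)},
    "branch": {"op": (31, 26), "rs1": (25, 21), "rs2_opx": (20, 16),
               "immediate": (15, 0)},
    "jump": {"op": (31, 26), "target": (25, 0)},
}


def extract(word, high, low):
    """Return bits high..low (inclusive) of a 32-bit instruction word."""
    width = high - low + 1
    return (word >> low) & ((1 << width) - 1)


def decode(word, fmt):
    """Split a 32-bit word into the named fields of the given format."""
    return {name: extract(word, hi, lo) for name, (hi, lo) in FORMATS[fmt].items()}


def encode(fields, fmt):
    """Pack named field values into a 32-bit word, checking field widths."""
    word = 0
    for name, (hi, lo) in FORMATS[fmt].items():
        width = hi - lo + 1
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lo
    return word


# Hypothetical field values, chosen only to illustrate the round trip.
example = {"op": 0, "rs1": 2, "rs2": 3, "rd": 1, "opx": 32}
word = encode(example, "register-register")
decoded = decode(word, "register-register")
print(f"{word:032b}", decoded)
```

Because every format keeps Op in bits 31-26, a decoder can read the opcode first and only then decide which of the remaining layouts applies, which is what fixed-field decoding relies on.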


Datapath vs Control

•   Datapath: storage, FUs, and interconnect sufficient to perform the desired functions
    •   Inputs are control points
    •   Outputs are signals
•   Controller: state machine to orchestrate operation on the datapath
    •   Based on desired function and signals

[Figure: the Controller drives the Datapath through control points; the Datapath returns signals to the Controller.]


Datapath vs Control

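[Figure: block diagram of a CPU connected to Memory through the Data bus, Address bus, and Control bus. The CPU comprises a Control Unit and a Datapath, with the PC, PSW, and IR registers, the general registers, and the ALU. Memory holds instructions and data.]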


Starting scenario

[Figure: CPU (Control Unit and Datapath with PC, PSW, IR, registers R00 to R05 …, and ALU) connected to Memory through the Data bus, Address bus, and Control bus. PC = 0789.]

Memory (instructions):
    0789   load   R02,4000
    0790   load   R03,4004
    0791   add    R01,R02,R03
    0792   load   R02,4008
    0793   add    R01,R01,R02
    0794   store  R01,4000

Memory (data):
    4000   1492
    4004   1918
    4008   2006

Read Instruction 0789

[Figure: same CPU/memory diagram and memory contents as in the starting scenario. Address bus = 0789, control bus = Reading. IR ← load R02,4000; PC ← 0789 + 1 = 0790.]

Exe Instruction 0789

[Figure: same CPU/memory diagram. PC = 0790, IR = load R02,4000. Address bus = 4000, control bus = Reading, data bus = 1492. R02 ← 1492.]

Read Instruction 0790

[Figure: same CPU/memory diagram. Address bus = 0790, control bus = Reading. IR ← load R03,4004; PC ← 0790 + 1 = 0791. R02 = 1492.]

Exe Instruction 0790

[Figure: same CPU/memory diagram. PC = 0791, IR = load R03,4004. Address bus = 4004, control bus = Reading, data bus = 1918. R03 ← 1918 (R02 = 1492).]

Read Instruction 0791

[Figure: same CPU/memory diagram. Address bus = 0791, control bus = Reading. IR ← add R01,R02,R03; PC ← 0791 + 1 = 0792. R02 = 1492, R03 = 1918.]

Exe Instruction 0791

[Figure: same CPU/memory diagram. PC = 0792, IR = add R01,R02,R03. The ALU receives 1492 (R02) and 1918 (R03), performs the add, and returns 3410. R01 ← 3410.]

Read Instruction 0792

[Figure: same CPU/memory diagram. Address bus = 0792, control bus = Reading. IR ← load R02,4008; PC ← 0792 + 1 = 0793. R01 = 3410, R02 = 1492, R03 = 1918.]

Exe Instruction 0792

[Figure: same CPU/memory diagram. PC = 0793, IR = load R02,4008. Address bus = 4008, control bus = Reading, data bus = 2006. R02 ← 2006 (R01 = 3410, R03 = 1918).]

Read Instruction 0793

[Figure: same CPU/memory diagram. Address bus = 0793, control bus = Reading. IR ← add R01,R01,R02; PC ← 0793 + 1 = 0794. R01 = 3410, R02 = 2006, R03 = 1918.]

Exe Instruction 0793

[Figure: same CPU/memory diagram. PC = 0794, IR = add R01,R01,R02. The ALU receives 3410 (R01) and 2006 (R02), performs the add, and returns 5416. R01 ← 5416.]

Read Instruction 0794

[Figure: same CPU/memory diagram. Address bus = 0794, control bus = Reading. IR ← store R01,4000; PC ← 0794 + 1 = 0795. R01 = 5416, R02 = 2006, R03 = 1918.]

Exe Instruction 0794

[Figure: same CPU/memory diagram. PC = 0795, IR = store R01,4000. Address bus = 4000, control bus = Writing, data bus = 5416. M[4000] ← 5416.]
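The read/execute walkthrough of instructions 0789 to 0794 can be replayed in a few lines. This is a minimal Python sketch, not the authors' model: the tuple encoding of instructions, the function name `run`, and the choice to stop when the PC leaves the program are all assumptions made for illustration. The program and data values are the ones shown on the slides.

```python
def run(program, data, pc):
    """Replay the two-phase loop of the walkthrough: read instruction, then execute it."""
    memory = dict(data)   # data memory: address -> value
    regs = {}             # register name -> value
    trace = []
    while pc in program:  # assumption: stop once the PC leaves the program
        # Read phase: IR <- M[PC], then PC <- PC + 1
        ir = program[pc]
        pc += 1
        op, args = ir
        # Execute phase
        if op == "load":
            reg, addr = args
            regs[reg] = memory[addr]           # memory read
        elif op == "add":
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]  # ALU operation
        elif op == "store":
            reg, addr = args
            memory[addr] = regs[reg]           # memory write
        else:
            raise ValueError(f"unknown operation: {op}")
        trace.append((pc, ir, dict(regs)))
    return pc, regs, memory, trace


PROGRAM = {
    789: ("load", ("R02", 4000)),
    790: ("load", ("R03", 4004)),
    791: ("add", ("R01", "R02", "R03")),
    792: ("load", ("R02", 4008)),
    793: ("add", ("R01", "R01", "R02")),
    794: ("store", ("R01", 4000)),
}
DATA = {4000: 1492, 4004: 1918, 4008: 2006}

final_pc, regs, memory, trace = run(PROGRAM, DATA, pc=789)
for next_pc, ir, snapshot in trace:
    print(next_pc, ir, snapshot)
```

In this sketch each instruction is fully read and executed before the next one is fetched, which is the sequential behavior that pipelining later overlaps.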

Reduced Instruction Set of MIPS Processor

•   ALU instructions:
        add  $s1, $s2, $s3        # $s1 ← $s2 + $s3
        addi $s1, $s1, 4          # $s1 ← $s1 + 4
•   Load/store instructions:
        lw   $s1, offset($s2)     # $s1 ← M[$s2+offset]
        sw   $s1, offset($s2)     # M[$s2+offset] ← $s1
•   Branch instructions to control the program flow:
    •   Conditional branches: the branch is taken only if the condition is satisfied. Examples: beq (branch on equal) and bne (branch on not equal)
        beq  $s1, $s2, L1         # go to L1 if ($s1 == $s2)
        bne  $s1, $s2, L1         # go to L1 if ($s1 != $s2)
    •   Unconditional jumps: the branch is always taken. Examples: j (jump) and jr (jump register)
        j    L1                   # go to L1
        jr   $s1                  # go to the address contained in $s1


Execution of MIPS Instructions

Every instruction in the MIPS subset can be implemented in at most 5 clock cycles as follows:

•   Instruction Fetch Cycle:
    •   Send the content of the Program Counter register to the Instruction Memory and fetch the current instruction from the Instruction Memory. Update the PC to the next sequential address by adding 4 to the PC (since each instruction is 4 bytes).
•   Instruction Decode and Register Read Cycle:
    •   Decode the current instruction (fixed-field decoding) and read from the Register File the one or two registers specified in the instruction fields.
    •   Sign-extend the offset field of the instruction in case it is needed.


Execution of MIPS instructions

•   Execution Cycle: the ALU operates on the operands prepared in the previous cycle, depending on the instruction type:
    •   Register-Register ALU instructions:
        •   The ALU executes the specified operation on the operands read from the RF.
    •   Register-Immediate ALU instructions:
        •   The ALU executes the specified operation on the first operand read from the RF and the sign-extended immediate operand.
    •   Memory reference:
        •   The ALU adds the base register and the offset to calculate the effective address.
    •   Conditional branches:
        •   Compare the two registers read from the RF and compute the possible branch target address by adding the sign-extended offset to the incremented PC.


Execution of MIPS instructions

•   Memory Access (ME):
    •   Load instructions require a read access to the Data Memory using the effective address.
    •   Store instructions require a write access to the Data Memory using the effective address, to write the data from the source register read from the RF.
    •   Conditional branches can update the content of the PC with the branch target address if the conditional test yielded true.
•   Write-Back Cycle (WB):
    •   Load instructions write the data read from memory into the destination register of the RF.
    •   ALU instructions write the ALU result into the destination register of the RF.


Execution of MIPS Instructions

ALU Instructions: op $x,$y,$z
    Instr. Fetch & PC Increm. → Read of Source Regs. $y and $z → ALU Op. ($y op $z) → Write Back of Destinat. Reg. $x

Conditional Branch: beq $x,$y,offset
    Instr. Fetch & PC Increm. → Read of Source Regs. $x and $y → ALU Op. ($x-$y) & (PC+4+offset) → Write PC

Load Instructions: lw $x,offset($y)
    Instr. Fetch & PC Increm. → Read of Base Reg. $y → ALU Op. ($y+offset) → Read Mem. M($y+offset) → Write Back of Destinat. Reg. $x

Store Instructions: sw $x,offset($y)
    Instr. Fetch & PC Increm. → Read of Base Reg. $y & Source $x → ALU Op. ($y+offset) → Write Mem. M($y+offset)


MIPS Data path

[Figure: MIPS datapath organized in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Blocks shown: instruction memory, adder (+4) producing the next sequential PC, register file (RS1, RS2, RD), sign extend (Imm), multiplexers, ALU with Zero? test, data memory, LMD, and the WB data path.]

RTL shown on the figure:
    IR <= mem[PC]
    PC <= PC + 4
    Reg[IRrd] <= Reg[IRrs] opIRop Reg[IRrt]

Instructions Latency

Instruction Type   Instruct. Mem.   Register Read   ALU Op.   Data Memory   Write Back   Total Latency
ALU Instr.         2                1               2         0             1            6 ns
Load               2                1               2         2             1            8 ns
Store              2                1               2         2             0            7 ns
Cond. Branch       2                1               2         0             0            5 ns
Jump               2                0               0         0             0            2 ns
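The Total Latency column is the sum of the per-unit times in each row. A minimal Python sketch that recomputes it from the table values follows; the dictionary layout and the names `UNITS`, `LATENCY_NS`, `totals`, and `slowest` are assumptions made for illustration.

```python
# Per-unit times in ns, in the column order of the table above.
UNITS = ("Instruct. Mem.", "Register Read", "ALU Op.", "Data Memory", "Write Back")

LATENCY_NS = {
    "ALU Instr.":   (2, 1, 2, 0, 1),
    "Load":         (2, 1, 2, 2, 1),
    "Store":        (2, 1, 2, 2, 0),
    "Cond. Branch": (2, 1, 2, 0, 0),
    "Jump":         (2, 0, 0, 0, 0),
}

# Total latency of each instruction type is the sum of the units it uses.
totals = {name: sum(parts) for name, parts in LATENCY_NS.items()}

# The slowest instruction type determines the longest path through the units.
slowest = max(totals, key=totals.get)

for name, total in totals.items():
    print(f"{name:<13} {total} ns")
print("slowest:", slowest)
```

The load is the only instruction type that uses all five units, which is why it comes out as the longest.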


Single-cycle Implementation of MIPS

•   The length of the clock cycle is defined by the critical path, given by the load instruction: T = 8 ns (f = 125 MHz).
•   We assume each instruction is executed in a single clock cycle.
    •   Each module can be used only once in a clock cycle.
    •   The modules used more than once in a cycle must be duplicated.
•   We need an Instruction Memory separate from the Data Memory.
•   Some modules must be duplicated, while other modules must be shared among different instruction flows.
•   To share a module between two different instructions, we need a multiplexer to enable multiple inputs to a module and select one of the inputs based on the configuration of the control lines.


Implementation of MIPS data path with Control Unit

[Figure: single-cycle MIPS datapath with Control Unit. Blocks: PC, +4 Adder, Instruction Memory (Read Address, Instruction), Register File (Register Read 1, Register Read 2, Write Register, Write Data, Content Reg. 1, Content Reg. 2), Sign Extension (16 bit to 32 bit), 2-bit Left Shifter, Adder, ALU (inputs A and B, Result, Zero), ALU Control Unit, Data Memory (Read Address, Write Address, Write Data, Read Data), and multiplexers. Instruction fields: OP [31-26], [25-21], [20-16], [15-11], [15-0], [5-0]. Control signals: RegWR, MemWR, MemRD, Destination Register, Branch, MemToReg, ALU_opB, ALU_op.]

Multi-cycle Implementation

•   The instruction execution is distributed over multiple cycles (5 cycles for MIPS).
•   The basic cycle is shorter (2 ns ⇒ instruction latency = 10 ns).
•   Implementation of multi-cycle CPU:
    •   Each phase of the instruction execution requires a clock cycle.
    •   Each module can be used more than once per instruction, in different clock cycles: possible sharing of modules.
    •   We need internal registers to store the values to be used in the next clock cycles.
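The figures quoted for the single-cycle and multi-cycle implementations can be checked with a few lines of arithmetic. This is a minimal Python sketch using the 8 ns and 2 ns clock periods and the 5 cycles per instruction from the slides; the constant and function names are assumptions made for illustration.

```python
SINGLE_CYCLE_PERIOD_NS = 8   # clock period of the single-cycle implementation
MULTI_CYCLE_PERIOD_NS = 2    # basic cycle of the multi-cycle implementation
CYCLES_PER_INSTRUCTION = 5   # 5 cycles for MIPS, as stated on the slide


def frequency_mhz(period_ns):
    """Clock frequency in MHz for a clock period given in ns."""
    return 1000.0 / period_ns


def total_time_ns(num_instructions, latency_ns):
    """Time to run instructions strictly one after the other."""
    return num_instructions * latency_ns


single_cycle_latency_ns = SINGLE_CYCLE_PERIOD_NS
multi_cycle_latency_ns = MULTI_CYCLE_PERIOD_NS * CYCLES_PER_INSTRUCTION

print(frequency_mhz(SINGLE_CYCLE_PERIOD_NS), "MHz, latency", single_cycle_latency_ns, "ns")
print(frequency_mhz(MULTI_CYCLE_PERIOD_NS), "MHz, latency", multi_cycle_latency_ns, "ns")
print(total_time_ns(100, single_cycle_latency_ns), total_time_ns(100, multi_cycle_latency_ns))
```

Without overlap, the faster clock alone does not help: when every instruction takes all 5 cycles, the multi-cycle latency of 10 ns exceeds the 8 ns of the single-cycle design.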


Pipelining

•   Performance optimization technique based on overlapping the execution of multiple instructions derived from a sequential execution flow.
•   Pipelining exploits the parallelism among instructions in a sequential instruction stream.
•   Basic idea: the execution of an instruction is divided into different phases (pipeline stages), each requiring a fraction of the time necessary to complete the instruction.
•   The stages are connected one to the next to form the pipeline: instructions enter the pipeline at one end, progress through the stages, and exit from the other end, as in an assembly line.


Pipelining

•   Advantage: the technique is transparent to the programmer.
•   The technique is similar to an assembly line: a new car exits the assembly line in the time necessary to complete one of the phases.
•   An assembly line does not reduce the time necessary to complete a car, but it increases the number of cars in production simultaneously and the rate at which cars are completed.


Sequential vs. Pipelining Execution

[Figure: timing diagram along a time axis. Sequential execution: I1 and I2 each go through IF, ID, EX, MEM, WB one after the other, taking 10 ns per instruction. Pipelined execution: I1 to I5 each go through IF, ID, EX, MEM, WB, with a new instruction starting every 2 ns.]

Pipelining

•   The time to advance an instruction by one stage in the pipeline corresponds to one clock cycle.
•   The pipeline stages must be synchronized: the duration of a clock cycle is defined by the time required by the slowest stage of the pipeline (here, 2 ns).
•   The goal is to balance the length of each pipeline stage.
•   If the stages are perfectly balanced, the ideal speedup due to pipelining is equal to the number of pipeline stages.
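The last point can be made explicit with a short derivation. The notation is an assumption introduced here, not taken from the slides: $n$ is the number of instructions, $k$ the number of perfectly balanced stages, and $\tau$ the clock cycle. The unpipelined reference is the multi-cycle CPU that spends $k$ cycles on every instruction, and no stalls are considered.

```latex
\[
T_{\text{seq}} = n \, k \, \tau ,
\qquad
T_{\text{pipe}} = (k + n - 1)\, \tau ,
\qquad
S(n) = \frac{T_{\text{seq}}}{T_{\text{pipe}}}
     = \frac{n \, k}{k + n - 1}
     \;\xrightarrow[\; n \to \infty \;]{}\; k .
\]
```

The first instruction needs $k$ cycles to traverse the pipeline and each of the remaining $n - 1$ instructions completes one cycle later, so the speedup only approaches $k$ for long instruction streams.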


Performance Improvement

•   Ideal case (asymptotically): if we consider the single-cycle unpipelined CPU1 with a clock cycle of 8 ns and the pipelined CPU2 with 5 stages of 2 ns:
    •   The latency (total execution time) of each instruction gets worse: from 8 ns to 10 ns.
    •   The throughput (number of instructions completed per unit of time) improves by a factor of 4: (1 instruction completed every 8 ns) vs. (1 instruction completed every 2 ns).


Performance Improvement

•   Ideal case (asymptotically): if we consider the multi-cycle unpipelined CPU3 composed of 5 cycles of 2 ns and the pipelined CPU2 with 5 stages of 2 ns:
    •   The latency (total execution time) of each instruction does not change (10 ns).
    •   The throughput (number of instructions completed per unit of time) improves by a factor of 5: (1 instruction completed every 10 ns) vs. (1 instruction completed every 2 ns).
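The word "asymptotically" matters: for a finite number of instructions the pipeline first has to fill. The following minimal Python sketch evaluates the speedup of the 5-stage, 2 ns pipeline against both unpipelined CPUs, assuming an ideal pipeline with no stalls; the function names are hypothetical.

```python
def pipelined_time_ns(n, stages=5, cycle_ns=2):
    """Ideal pipeline: the first instruction takes `stages` cycles,
    every following instruction completes one cycle later."""
    return (stages + n - 1) * cycle_ns


def unpipelined_time_ns(n, latency_ns):
    """Instructions executed strictly one after the other."""
    return n * latency_ns


def speedup(n, latency_ns, stages=5, cycle_ns=2):
    """Speedup of the pipelined CPU over an unpipelined CPU with the given latency."""
    return unpipelined_time_ns(n, latency_ns) / pipelined_time_ns(n, stages, cycle_ns)


for n in (1, 5, 100, 1_000_000):
    print(n,
          round(speedup(n, latency_ns=8), 3),    # vs. single-cycle CPU1 (8 ns)
          round(speedup(n, latency_ns=10), 3))   # vs. multi-cycle CPU3 (10 ns)
```

For a single instruction the pipeline gives no gain over the multi-cycle CPU and is slower than the single-cycle one; the factors of 4 and 5 are approached only as the instruction count grows.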


Pipeline Execution of MIPS Instructions

ALU Instructions: op $x,$y,$z
    IF: Instr. Fetch & PC Increm.
    ID: Read of Source Regs. $y and $z
    EX: ALU Op. ($y op $z)
    WB: Write Back Destinat. Reg. $x

Conditional Branches: beq $x,$y,offset
    IF: Instr. Fetch & PC Increm.
    ID: Read of Source Regs. $x and $y
    EX: ALU Op. ($x-$y) & (PC+4+offset)
    ME: Write PC

Load Instructions: lw $x,offset($y)
    IF: Instr. Fetch & PC Increm.
    ID: Read of Base Reg. $y
    EX: ALU Op. ($y+offset)
    ME: Read Mem. M($y+offset)
    WB: Write Back Destinat. Reg. $x

Store Instructions: sw $x,offset($y)
    IF: Instr. Fetch & PC Increm.
    ID: Read of Base Reg. $y & Source $x
    EX: ALU Op. ($y+offset)
    ME: Write Mem. M($y+offset)

Stages: IF = Instruction Fetch, ID = Instruction Decode, EX = Execution, ME = Memory Access, WB = Write Back

Implementation of MIPS pipeline

[Figure: pipelined MIPS datapath. Blocks: PC, +4 Adder, Instruction Memory (Read Address, Instruction), RF (Register Read 1, Register Read 2, Register Write, Write Data, Content register 1, Content register 2), Sign extension (16 bit to 32 bit), 2-bit Left Shifter, Adder, ALU (Result, Zero), Data Memory (Read Address, Write Address, Write Data, Read Data), and multiplexers. Instruction fields: OP, [25-21], [20-16], [15-11], [15-0]. Control signals shown: WR, RD. The pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB divide the datapath into five stages: IF (Instruction Fetch), ID (Instruction Decode), EX (Execution), MEM (Memory Access), WB (Write Back).]

Visualizing Pipelining

[Figure: pipeline diagram. Horizontal axis: time (clock cycles), Cycle 1 to Cycle 7. Vertical axis: instruction order. Four instructions are shown, each passing through Ifetch, Reg, ALU, DMem, Reg, with each instruction starting one cycle after the previous one.]
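A diagram of this kind can be regenerated as text. This is a minimal Python sketch, assuming an ideal pipeline with no stalls; the stage labels are the ones in the figure, while the function names `pipeline_chart` and `render` and the column width are choices made for illustration.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name   # instruction i enters the pipeline in cycle i + 1
        rows.append(row)
    return rows


def render(rows, width=8):
    """Format the chart as text, one line per instruction."""
    header = "".join(f"Cycle {c + 1}".ljust(width) for c in range(len(rows[0])))
    lines = ["".ljust(4) + header]
    for i, row in enumerate(rows):
        lines.append(f"I{i + 1}".ljust(4) + "".join(cell.ljust(width) for cell in row))
    return "\n".join(lines)


chart = pipeline_chart(4)
print(render(chart))
```

Reading a column of the output shows which stage each instruction occupies in a given cycle. Four instructions need 5 + 4 - 1 = 8 cycles to drain completely, one more than the seven cycles labeled in the figure.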

Note: Optimized Pipeline

•   Register File used in 2 stages: read access during ID and write access during WB.
    •   What happens if a read and a write refer to the same register in the same clock cycle?
        •   It is necessary to insert one stall.
•   Optimized Pipeline: the RF write occurs in the first half of the clock cycle and the RF read in the second half.
    •   What happens if a read and a write refer to the same register in the same clock cycle?
        •   It is not necessary to insert a stall.

From now on, this is the pipeline we are going to use.
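The effect of splitting the clock cycle can be illustrated with a toy register file. This is a minimal Python sketch under stated assumptions: the class `RegisterFile`, its `cycle` method, and the `write_first` flag are hypothetical names, and one call to `cycle` stands for one clock cycle in which WB may write while ID reads.

```python
class RegisterFile:
    """Toy register file; one call to cycle() models one clock cycle."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def _write(self, reg, value):
        if reg != 0:            # R0 always contains zero
            self.regs[reg] = value

    def cycle(self, write=None, reads=(), write_first=True):
        """Perform an optional write (reg, value) and some reads in the same cycle.

        write_first=True models the optimized pipeline: the write happens in the
        first half of the cycle and the reads in the second half.
        write_first=False models reads that see the old register contents.
        """
        if write_first:
            if write is not None:
                self._write(*write)
            return [self.regs[r] for r in reads]
        values = [self.regs[r] for r in reads]
        if write is not None:
            self._write(*write)
        return values


optimized = RegisterFile()
new_value = optimized.cycle(write=(1, 42), reads=(1,))                      # sees the new value

plain = RegisterFile()
stale_value = plain.cycle(write=(1, 42), reads=(1,), write_first=False)    # sees the old value
print(new_value, stale_value)
```

With the write ordered first, the instruction in ID already reads the value written by the instruction in WB, which is why no stall is needed in the optimized pipeline.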


5 Steps of MIPS Datapath Figure A.3, Page A-9

[Figure: pipelined MIPS datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. The pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separate the steps; Next SEQ PC and RD are carried forward through them. Blocks shown: instruction memory, +4 adder, register file (RS1, RS2, RD), sign extend (Imm), multiplexers, ALU with Zero? test, data memory, and the WB data path.]

•   Data stationary control: local decode for each instruction phase / pipeline stage

RTL shown on the figure, in pipeline order:
    IR <= mem[PC]; PC <= PC + 4
    A <= Reg[IRrs]; B <= Reg[IRrt]
    rslt <= A opIRop B
    WB <= rslt
    Reg[IRrd] <= WB

Questions
