cs222: pipeline processor · 2017. 4. 12. · pipeline design • single cycle – poor resource...

CS222: Pipeline Processor Design

Dr. A. Sahu

Dept of Comp. Sc. & Engg.

Indian Institute of Technology Guwahati


Outline
• Pipeline processor
• Basic Structure of Pipeline
• Hazards
– Data Hazards (Data dependency)
– Resource Hazards (Same resource used in two stages)
– Control hazards (Branch instruction)


Problems with single cycle design
• Slowest instruction pulls down the clock frequency
• Resource utilization is poor
• There are some instructions which are impossible to implement in this manner
– Think: which instructions are these?

1. Clock period in single cycle design
[Figure: delay bars for the R‐class, lw, sw, beq and j instructions, each built from tI, t+, tR, tA and tM segments, with the clock period marked against the longest bar.]
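The clock-period argument for the single-cycle design can be sketched numerically. This is a minimal illustration only: the delay values are hypothetical, and the component path listed for each instruction class is an assumption based on the usual MIPS single-cycle datapath, not something read off the slide.

```python
# Hypothetical component delays in picoseconds (not from the slides).
DELAY = {"tI": 200, "t+": 100, "tR": 100, "tA": 150, "tM": 200}

# Assumed component path of each instruction class through the datapath.
PATHS = {
    "R-class": ["tI", "tR", "tA", "tR"],
    "lw": ["tI", "tR", "tA", "tM", "tR"],
    "sw": ["tI", "tR", "tA", "tM"],
    "beq": ["tI", "tR", "tA"],
    "j": ["tI", "t+"],
}


def instruction_latency(name):
    """Sum of the component delays along the instruction's path."""
    return sum(DELAY[component] for component in PATHS[name])


def single_cycle_period():
    """The single clock cycle must fit the slowest instruction."""
    return max(instruction_latency(name) for name in PATHS)


if __name__ == "__main__":
    for name in PATHS:
        print(name, instruction_latency(name))
    print("clock period:", single_cycle_period())
```

With these assumed numbers every instruction, including j, is charged the full lw latency.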

1. Clock period in multi‐cycle design
[Figure: the same delay bars for the R‐class, lw, sw, beq and j instructions (segments tI, t+, tR, tA, tM), with the clock period now marked against a single segment rather than a whole instruction.]
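A small sketch of the multi-cycle case, under stated assumptions: the delays are hypothetical, and the cycle counts per instruction class are the ones commonly used for a MIPS multi-cycle design, not values given on the slide.

```python
# Hypothetical component delays in picoseconds (not from the slides).
DELAY = {"tI": 200, "tR": 100, "tA": 150, "tM": 200}

# Assumed number of clock cycles per instruction class.
CYCLES = {"R-class": 4, "lw": 5, "sw": 4, "beq": 3, "j": 3}


def multi_cycle_period():
    """One clock cycle only has to fit the slowest single component."""
    return max(DELAY.values())


def instruction_time(name):
    """Time for one instruction: its cycle count times the clock period."""
    return CYCLES[name] * multi_cycle_period()


if __name__ == "__main__":
    print("clock period:", multi_cycle_period())
    for name in CYCLES:
        print(name, instruction_time(name))
```

Short instructions such as beq now finish in fewer cycles, while the period is set by the slowest component instead of the slowest instruction.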

Single Cycle Datapath
[Figure: single cycle datapath with PC, IM (ad, ins), RF (rad1, rad2, wad, wd, rd1, rd2), ALU, DM (ad, rd, wd), two adders (+4 and branch offset), sx on ins[15‐0], s2 blocks, and multiplexers; RF addresses come from ins[25‐21], ins[20‐16] and ins[15‐11]; the jump address ja[31‐0] is formed from PC+4[31‐28] and ins[25‐0].]

Multi‐Cycle: Resource Utilization
• Merge IM/DM
– Lw: IM/PC++ ‐ R ‐ ALU ‐ DM ‐ R
– Sw: IM/PC++ ‐ R ‐ ALU ‐ DM
• Eliminate 1st Adder and Use ALU
– As 1st adder is used in 1st Cycle and ALU is free in 1st Cycle
• Eliminate 2nd Adder and Use ALU
– As 2nd adder is used in 2nd Cycle and ALU is free in 2nd Cycle


Pipeline Design
• Single Cycle
– Poor Resource Utilization, TC >= longest instruction latency
• Multi Cycle
– TC >= longest stage, better utilization, but performance still needs to improve using a pipeline
– When decoding INSi you can fetch INSi+1
• Pipeline


Instruction Pipeline
[Figure: six instructions, each passing through IF, D, EX, Mem, WB, with each instruction starting one cycle after the previous one.]
Performance: 1 instruction per Cycle
All the stages work in parallel; no resource can be shared between stages.
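The overlap can be sketched in a few lines. This assumes an ideal five-stage pipeline with no stalls; the function names are illustrative only.

```python
STAGES = ["IF", "D", "EX", "Mem", "WB"]


def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to finish n instructions when there are no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one adds a cycle.
    return n_stages + n_instructions - 1


def timing_diagram(n_instructions):
    """One text row per instruction; row i is shifted right by i cycles."""
    width = max(len(stage) for stage in STAGES) + 1
    rows = []
    for i in range(n_instructions):
        cells = [""] * i + STAGES
        rows.append("".join(cell.ljust(width) for cell in cells).rstrip())
    return rows


if __name__ == "__main__":
    for row in timing_diagram(6):
        print(row)
    print("cycles for 6 instructions:", total_cycles(6))
```

Once the pipeline is full, one instruction completes in every cycle, which is where the figure of 1 instruction per cycle comes from.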

Single cycle datapath (abstract)
[Figure: abstract single cycle datapath with PC, +4 adder, branch adder, IM (ad, ins), RF (rad, wad, wd, rd1, rd2), ALU and DM (ad, rd, wd).]

Pipelined datapath
[Figure: the abstract datapath (PC, +4 adder, branch adder, IM, RF, ALU, DM) divided into the stages IF, ID, EX, Mem, WB by the inter‐stage registers IF/ID, ID/EX, EX/Mem, Mem/WB.]

Don’t share resources in Stages
• In Multi Cycle Design
– ALU used for PC++ and Offset Adding
– Used for 1st Adder and 2nd Adder
– Register FILE is used in 2nd and 4th Cycle
• In Pipeline
– Use separate resources: 1st Adder, 2nd Adder & ALU
– Register FILE is accessed in the 1st Half of 2nd Cycle and 2nd Half of 4th Cycle


Put back multiplexers
[Figure: pipelined datapath (IF, ID, EX, Mem, WB) with the multiplexers, sx and s2 blocks restored; components PC, +4 adder, branch adder, IM, RF (rad1, rad2, wad, wd, rd1, rd2), ALU, DM.]

Correction for WB stage
[Figure: the pipelined datapath with multiplexers (IF, ID, EX, Mem, WB), redrawn with a correction to the WB stage; components PC, +4 adder, branch adder, IM, RF, ALU, DM, sx, s2.]

Abstract: Adding control
[Figure: the pipelined datapath with a control block and an ALU control block (Actrl) added; components PC, +4 adder, branch adder, IM, RF, ALU, DM, sx, s2.]

Control signals with delays
[Figure: the pipelined datapath with control and Actrl blocks, showing the control signals delayed to reach the stage that uses them.]

Correction for RF write signal
[Figure: the pipelined datapath with control and Actrl blocks, redrawn with a correction to the RF write signal.]

Types of Pipelined processors
• Degree of overlap
– Serial, Overlapped, Pipelined, Super‐pipelined
• Depth
– Shallow, Deep
• Structure
– Linear, Non‐linear
• Scheduling of operations
– Static, Dynamic

[Figure: degree of overlap (Serial, Overlapped, Pipelined) and depth (Shallow, Deep).]

Pipeline Structure
[Figure: a linear pipeline with stages A, B, C and a non‐linear pipeline with stages A, B, C.]
Sequence: A, B, C, B, C, A, C, A

Scheduling/timing alternatives
• Static
– same sequence of stages for all instructions
– all actions in order
– if one instruction stalls, all subsequent instructions are delayed
• Dynamic
– above conditions are relaxed
– higher throughput is achieved

Dynamic Scheduling

• type 1: beginnings (decode) and endings (put away) in order
• type 2: only beginnings in order
• type 3: no order restrictions except dependencies
• type 1 extended: beginnings in order, references that affect memory state are in order [note that a memory reference may lead to a page fault]

Pipelining and CPI

Type                         CPI
Serial                       5 – 6
Overlapped                   3
Pipelined (static)           1.5 – 2
Pipelined (dynamic)          1.2 – 1.5
Multiple instruction issue   < 1.0
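To see what these CPI values mean for run time, here is a minimal sketch. It uses the midpoint of each range in the table; the value for multiple instruction issue is an assumed example, since the table only gives an upper bound, and the comparison assumes the same instruction count and clock period for every type.

```python
# Midpoints of the CPI ranges above; "multiple issue" is an assumed example
# value because the table only states "< 1.0".
CPI = {
    "serial": 5.5,
    "overlapped": 3.0,
    "pipelined (static)": 1.75,
    "pipelined (dynamic)": 1.35,
    "multiple issue": 0.8,
}


def execution_time(n_instructions, cpi, clock_period_ns):
    """Execution time = instruction count x CPI x clock period."""
    return n_instructions * cpi * clock_period_ns


def speedup_over_serial(kind):
    """Speedup relative to serial execution at equal clock period."""
    return CPI["serial"] / CPI[kind]


if __name__ == "__main__":
    for kind in CPI:
        print(kind, execution_time(1000, CPI[kind], 2.0), "ns",
              "speedup", round(speedup_over_serial(kind), 2))
```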

Hazards in Pipelining

• Data dependencies => Data hazards
– RAW (read after write)
– WAR (write after read)
– WAW (write after write)
• Resource conflicts => Structural hazards
– use of same resource in different stages
• Procedural dependencies => Control hazards
– conditional and unconditional branches, calls/returns
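The three kinds of data dependency can be told apart by comparing the register sets of two instructions. The following is a small sketch; the `Instr` record and the example instructions are illustrative, not taken from the slides.

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    text: str
    writes: FrozenSet[str]  # registers written by the instruction
    reads: FrozenSet[str]   # registers read by the instruction


def data_hazards(earlier: Instr, later: Instr) -> List[str]:
    """Classify the dependencies of `later` on `earlier` in program order."""
    kinds = []
    if earlier.writes & later.reads:
        kinds.append("RAW")  # later reads what earlier writes
    if earlier.reads & later.writes:
        kinds.append("WAR")  # later overwrites what earlier reads
    if earlier.writes & later.writes:
        kinds.append("WAW")  # both write the same register
    return kinds


if __name__ == "__main__":
    lw = Instr("lw $t1,0($s0)", frozenset({"$t1"}), frozenset({"$s0"}))
    add = Instr("add $s1,$t1,$t2", frozenset({"$s1"}),
                frozenset({"$t1", "$t2"}))
    print(data_hazards(lw, add))
```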

Data Hazards

[Figure: read/write phases of the previous instr and the current instr; delay = 3.]

Structural Hazards
Caused by Resource Conflicts
• Use of a hardware resource in more than one cycle
[Figure: successive instructions, each using resources in the order A B A C, started one cycle apart.]
• Different sequences of resource usage by different instructions
[Figure: one instruction using A B C D, another using A C B D.]
• Non‐pipelined multi‐cycle resources
[Figure: two successive instructions, each using F D X X.]
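The first case can be checked mechanically: give every instruction the same resource sequence, start one instruction per cycle, and look for a cycle in which two instructions want the same resource. This is a minimal sketch; the function name and return shape are illustrative.

```python
def conflicts(usage, n_instructions, issue_interval=1):
    """Return (cycle, resource, [instruction indices]) for each clash.

    `usage` lists the resource an instruction occupies in each of its
    cycles, e.g. ["A", "B", "A", "C"].
    """
    busy = {}
    for i in range(n_instructions):
        start = i * issue_interval
        for offset, resource in enumerate(usage):
            busy.setdefault((start + offset, resource), []).append(i)
    return sorted(
        (cycle, resource, users)
        for (cycle, resource), users in busy.items()
        if len(users) > 1
    )


if __name__ == "__main__":
    print(conflicts(["A", "B", "A", "C"], 3))
    print(conflicts(["A", "B", "C", "D"], 4))
```

With the sequence A B A C, instruction 0 needs A again in cycle 2, exactly when instruction 2 needs it for the first time. A sequence that uses each resource once, such as A B C D, produces no clash.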

Handling Data Hazards
[Figure: W of the previous instr and R of the current instr under two remedies: 1. Data Forwarding (value passed between the EX phases), 2. Instruction Reordering.]

Stalls due to data hazards
I: lw $t1,...
I+1: add $s1,$t1,..
[Figure: both instructions pass through IM, RF, ALU, DM, RF; I+1 is shown with repeated IM cycles.]

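Stalls due to control hazards
I: beq ...,L
...
L: add ...
[Figure: I passes through IM, RF, ALU, DM, RF; the inline instructions I+1 (IM, RF) and I+2 (IM) are started before the target instruction L passes through IM, RF, ALU, DM, RF.]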

Control Hazards
[Figure: branch instr with cond eval and target addr gen phases; next inline instr, delay = 2; target instr, delay = 5.]
• the order of cond eval and target addr gen may be different
• cond eval may be done in previous instruction
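The cost of these delays shows up as extra cycles per instruction. The sketch below is a back-of-the-envelope model only; the branch frequency, taken fraction and penalty passed in are hypothetical inputs, not figures from the slides.

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, taken_penalty):
    """Average CPI when every taken branch costs `taken_penalty` extra cycles.

    branch_fraction: fraction of instructions that are branches
    taken_fraction:  fraction of those branches that are taken
    """
    return base_cpi + branch_fraction * taken_fraction * taken_penalty


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, half of them taken, 2 lost cycles.
    print(effective_cpi(1.0, 0.2, 0.5, 2))
```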

Handling hazards
• Data hazards
– detect instructions with data dependence
– introduce nop instructions (bubbles) in the pipeline
– more complex: data forwarding
• Control hazards
– detect branch instructions
– flush inline instructions if branching occurs
– more complex: branch prediction
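The simplest data-hazard remedy, detecting a dependence and inserting nops, can be sketched as follows. The spacing rule is an assumption: with no forwarding, and a register file that is written in the first half of a cycle and read in the second half, a consumer must sit at least three slots behind its producer in a five-stage pipeline. The tuple layout and `insert_nops` are illustrative names.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad a program with nops so every consumer is far enough from its producer.

    program: list of (text, dest_register_or_None, source_registers).
    min_distance: assumed minimum number of slots between producer and consumer.
    """
    out = []
    for text, dest, sources in program:
        needed = 0
        # Inspect the slots that are still too close to this instruction.
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev_dest = out[-back][1]
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((text, dest, sources))
    return out


if __name__ == "__main__":
    program = [
        ("lw $t1,0($s0)", "$t1", ("$s0",)),
        ("add $s1,$t1,$t2", "$s1", ("$t1", "$t2")),
    ]
    for text, _, _ in insert_nops(program):
        print(text)
```

Under this assumption the dependent add is pushed back by two nops; data forwarding exists to remove most of that padding.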

Pipeline Data Hazards
• Stalls due to data hazards
• Control to introduce stall cycles
• Detecting data hazard conditions
• Data forwarding paths
• Data forwarding control
• Stalls with data forwarding

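Stalls due to data hazards (instruction view)
I: lw $t1,...
I+1: add $s1,$t1,..
[Figure: I passes through IM, RF, ALU, DM, RF; for I+1, the schedule that proceeds without waiting is marked ×, and the schedule that repeats its early stages before the ALU is marked √.]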

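Actual forwarding paths
[Figure: the EX, Mem and WB stages with the registers ID/EX, EX/Mem and Mem/WB; multiplexers controlled by fwdA, fwdB and fwdC select between the normal and the forwarded values at the ALU inputs and at the DM write data (wd).]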
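The control for the two ALU-input multiplexers can be sketched along the lines of the usual textbook forwarding unit. This is an illustration under assumptions: the mux encodings 0/1/2, the `StageReg` fields and the function names are not taken from the slide, and fwdC (store data) is not modelled.

```python
from dataclasses import dataclass


@dataclass
class StageReg:
    reg_write: bool  # the instruction held here will write the register file
    dest: int        # its destination register number


# Assumed encodings of the fwdA / fwdB select values.
FROM_RF, FROM_EX_MEM, FROM_MEM_WB = 0, 1, 2


def forward_select(src, ex_mem, mem_wb):
    """Pick the source for one ALU operand that reads register `src`."""
    # The most recent producer (EX/Mem) takes priority over the older one.
    if ex_mem.reg_write and ex_mem.dest != 0 and ex_mem.dest == src:
        return FROM_EX_MEM
    if mem_wb.reg_write and mem_wb.dest != 0 and mem_wb.dest == src:
        return FROM_MEM_WB
    return FROM_RF


def forwarding_control(rs, rt, ex_mem, mem_wb):
    """Return (fwdA, fwdB) for the instruction currently in EX."""
    return (forward_select(rs, ex_mem, mem_wb),
            forward_select(rt, ex_mem, mem_wb))


if __name__ == "__main__":
    ex_mem = StageReg(reg_write=True, dest=9)
    mem_wb = StageReg(reg_write=True, dest=10)
    print(forwarding_control(9, 10, ex_mem, mem_wb))
```

Register 0 is excluded because it is hardwired to zero in MIPS. A load followed immediately by a dependent instruction still needs a stall, since the loaded value only exists after the Mem stage.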

