cs222: pipeline processor · 2017. 4. 12. · pipeline design • single cycle – poor resource...
TRANSCRIPT
-
CS222: Pipeline Processor Design
Dr. A. Sahu
Dept of Comp. Sc. & Engg.
Indian Institute of Technology Guwahati
-
Outline
• Pipeline processor
• Basic Structure of Pipeline
• Hazards
– Data Hazards (Data dependency)
– Resource Hazards (Same resource used in two stages)
– Control hazards (Branch instruction)
-
Problems with single cycle design
• Slowest instruction pulls down the clock frequency
• Resource utilization is poor
• There are some instructions which are impossible to implement in this manner
– Think: which instructions are these?
-
1. Clock period in single cycle design
[Figure: timing bars for R-class, lw, sw, beq and j, built from the component delays tI, tR, tA, tM and t+, with the clock period marked.]
-
1. Clock period in multi-cycle design
[Figure: timing bars for R-class, lw, sw, beq and j, built from the component delays tI, tR, tA, tM and t+, with the clock period marked.]
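The two clock-period slides can be made concrete with a small sketch. The delay values and the per-instruction component paths below are assumptions chosen for illustration (the slides give only the symbols tI, tR, tA, tM), and the adder delay t+ is ignored. The single cycle clock must cover the slowest instruction; the multi-cycle clock only needs to cover the slowest component.

```python
# Assumed component delays in picoseconds (illustrative, not from the slides).
DELAY = {"tI": 200, "tR": 100, "tA": 200, "tM": 200}

# Assumed component path of each instruction class (adder delay t+ ignored).
PATHS = {
    "R-class": ["tI", "tR", "tA", "tR"],
    "lw": ["tI", "tR", "tA", "tM", "tR"],
    "sw": ["tI", "tR", "tA", "tM"],
    "beq": ["tI", "tR", "tA"],
    "j": ["tI"],
}


def latency(instr):
    """Total delay of one instruction through its component path."""
    return sum(DELAY[c] for c in PATHS[instr])


def single_cycle_period():
    """Single cycle design: the clock must fit the slowest instruction."""
    return max(latency(i) for i in PATHS)


def multi_cycle_period():
    """Multi-cycle design: the clock must fit the slowest single step."""
    return max(DELAY.values())


if __name__ == "__main__":
    for name in PATHS:
        print(f"{name:8s} {latency(name)} ps")
    print("single cycle period:", single_cycle_period(), "ps")
    print("multi-cycle period: ", multi_cycle_period(), "ps")
```

With these assumed numbers lw sets the single cycle period, which is the "slowest instruction pulls down the clock" problem from the earlier slide.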
-
Single Cycle Datapath
[Figure: single cycle datapath with PC, IM, RF, ALU and DM, the +4 adder and a second adder, sx and s2 units, multiplexers, register addresses taken from ins[25-21], ins[20-16] and ins[15-11], and the jump address ja[31-0] formed from PC+4[31-28] and ins[25-0].]
-
Multi-Cycle: Resource Utilization
• Merge IM/DM
– Lw: IM/PC++ - R - ALU - DM - R
– Sw: IM/PC++ - R - ALU - DM
• Eliminate 1st Adder and Use ALU
– As 1st adder is used in 1st Cycle and ALU is free in 1st Cycle
• Eliminate 2nd Adder and Use ALU
– As 2nd adder is used in 2nd Cycle and ALU is free in 2nd Cycle
-
Pipeline Design
• Single Cycle
– Poor resource utilization, TC >= longest instruction latency
• Multi Cycle
– TC >= longest stage, better utilization; performance still needs to improve using a pipeline
• Pipeline
– When decoding INSi you can fetch INSi+1
-
Instruction Pipeline
[Figure: successive instructions each pass through IF, D, EX, Mem and WB, each instruction starting one cycle after the previous one.]
Performance: 1 instruction per cycle
All the stages work in parallel; no resource can be shared by stages
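A quick cycle count shows why the pipeline approaches 1 instruction per cycle. This sketch assumes an ideal 5-stage pipeline with no stalls; the function names are illustrative.

```python
def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles for an ideal pipeline: fill the pipe once, then one finish per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions, n_stages=5):
    """Cycles when every instruction occupies all stages before the next starts."""
    return n_instructions * n_stages


def speedup(n_instructions, n_stages=5):
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )


if __name__ == "__main__":
    for n in (1, 6, 1000):
        print(n, pipelined_cycles(n), round(speedup(n), 2))
```

For a long instruction stream the speedup tends towards the number of stages, and the cycles per instruction tend towards 1.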
-
Single cycle datapath (abstract)
[Figure: abstract single cycle datapath with PC, +4 adder, IM, RF, ALU and DM.]
-
Pipelined datapath
[Figure: the datapath (PC, +4 adder, IM, RF, ALU, DM) divided into stages IF, ID, EX, Mem and WB by the pipeline registers IF/ID, ID/EX, EX/Mem and Mem/WB.]
-
Don't share resources in Stages
• In Multi Cycle Design
– ALU used for PC++ and offset adding
– Used for 1st Adder and 2nd Adder
– Register file is used in 2nd and 4th Cycle
• In Pipeline
– Use separate resources: 1st Adder, 2nd Adder & ALU
– Register file is accessed in the 1st half of the 2nd cycle and the 2nd half of the 4th cycle
-
Put back multiplexers
[Figure: pipelined datapath (stages IF, ID, EX, Mem, WB; registers IF/ID, ID/EX, EX/Mem, Mem/WB) with the multiplexers, sx and s2 units put back.]
-
Correction for WB stage
[Figure: the pipelined datapath with multiplexers (stages IF, ID, EX, Mem, WB; registers IF/ID, ID/EX, EX/Mem, Mem/WB), corrected for the WB stage.]
-
Abstract: Adding control
[Figure: pipelined datapath with control and Actrl blocks added.]
-
Control signals with delays
[Figure: pipelined datapath with control and Actrl blocks; control signals shown with delays.]
-
Correction for RF write signal
[Figure: pipelined datapath with control and Actrl blocks, corrected for the RF write signal.]
-
Types of Pipelined processors
• Degree of overlap
– Serial, Overlapped, Pipelined, Super-pipelined
• Depth
– Shallow, Deep
• Structure
– Linear, Non-linear
• Scheduling of operations
– Static, Dynamic
-
[Figure: degree of overlap (Serial, Overlapped, Pipelined) and depth (Shallow, Deep).]
-
Pipeline Structure
[Figure: a linear pipeline and a non-linear pipeline, each with stages A, B, C.]
Sequence: A, B, C, B, C, A, C, A
-
Scheduling/timing alternatives
• Static
– same sequence of stages for all instructions
– all actions in order
– if one instruction stalls, all subsequent instructions are delayed
• Dynamic
– above conditions are relaxed
– higher throughput is achieved
-
Dynamic Scheduling
• type 1 : beginnings (decode) and endings (put away) in order
• type 2 : only beginnings in order
• type 3 : no order restrictions except dependencies
• type 1 extended : beginnings in order, references that affect memory state are in order
[note that a memory reference may lead to page fault]
-
Pipelining and CPI
Type                          CPI
Serial                        5 – 6
Overlapped                    3
Pipelined (static)            1.5 – 2
Pipelined (dynamic)           1.2 – 1.5
Multiple instruction issue    < 1.0
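The CPI figures translate into execution time through time = instruction count × CPI × clock period. The sketch below uses the CPI ranges from the table and assumes, for comparison only, that instruction count and clock period are the same across processor types (in practice the clock period differs). The dictionary and function names are illustrative.

```python
# CPI ranges (low, high) taken from the table above.
CPI_RANGE = {
    "serial": (5.0, 6.0),
    "overlapped": (3.0, 3.0),
    "pipelined (static)": (1.5, 2.0),
    "pipelined (dynamic)": (1.2, 1.5),
}


def exec_time_ns(n_instructions, cpi, clock_ns):
    """Execution time = instruction count x CPI x clock period."""
    return n_instructions * cpi * clock_ns


def speedup_range(base, improved):
    """(worst, best) speedup of `improved` over `base`.

    Assumes equal instruction count and clock period, so only CPI matters.
    """
    base_lo, base_hi = CPI_RANGE[base]
    imp_lo, imp_hi = CPI_RANGE[improved]
    return (base_lo / imp_hi, base_hi / imp_lo)


if __name__ == "__main__":
    print(exec_time_ns(1000, 1.5, 1.0), "ns")
    print(speedup_range("serial", "pipelined (static)"))
```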
-
Hazards in Pipelining
• Data dependencies => Data hazards
– RAW (read after write)
– WAR (write after read)
– WAW (write after write)
• Resource conflicts => Structural hazards
– use of same resource in different stages
• Procedural dependencies => Control hazards
– conditional and unconditional branches, calls/returns
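The three data hazard types can be told apart by comparing the register sets that two instructions read and write. A minimal sketch, assuming each instruction is described by hypothetical `reads` and `writes` sets:

```python
from typing import FrozenSet, NamedTuple, Set


class Instr(NamedTuple):
    text: str
    writes: FrozenSet[str]
    reads: FrozenSet[str]


def data_hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Name the data dependencies of `later` on `earlier` (program order)."""
    hazards = set()
    if earlier.writes & later.reads:
        hazards.add("RAW")  # later reads what earlier writes
    if earlier.reads & later.writes:
        hazards.add("WAR")  # later overwrites what earlier still reads
    if earlier.writes & later.writes:
        hazards.add("WAW")  # both write the same register
    return hazards


if __name__ == "__main__":
    lw = Instr("lw $t1,0($t0)", frozenset({"$t1"}), frozenset({"$t0"}))
    add = Instr("add $s1,$t1,$t2", frozenset({"$s1"}), frozenset({"$t1", "$t2"}))
    print(data_hazards(lw, add))
```

Whether a dependency actually becomes a hazard depends on the pipeline; in a static in-order pipeline such as the one in these slides, RAW is the case that causes stalls.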
-
Data Hazards
[Figure: read/write timing of the previous instr and the current instr; delay = 3.]
-
Structural Hazards
Caused by Resource Conflicts
• Use of a hardware resource in more than one cycle
[Figure: overlapped instructions, each using resources in the order A B A C.]
• Different sequences of resource usage by different instructions
[Figure: one instruction using A B C D, another using A C B D.]
• Non-pipelined multi-cycle resources
[Figure: overlapped instructions with stages F D X X.]
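The first case (a resource used in more than one cycle) can be checked mechanically. This sketch assumes every instruction follows the same resource sequence and a new instruction is issued every `issue_interval` cycles; the function name and return format are illustrative.

```python
def structural_conflicts(usage, n_instructions, issue_interval=1):
    """Find cycles where two instructions need the same resource.

    usage: resource used in each successive cycle, e.g. ["A", "B", "A", "C"].
    Returns tuples (cycle, resource, earlier_instr, later_instr).
    """
    busy = {}  # (cycle, resource) -> index of the instruction holding it
    conflicts = []
    for i in range(n_instructions):
        start = i * issue_interval
        for offset, resource in enumerate(usage):
            key = (start + offset, resource)
            if key in busy:
                conflicts.append((key[0], resource, busy[key], i))
            else:
                busy[key] = i
    return conflicts


if __name__ == "__main__":
    print(structural_conflicts(list("ABAC"), 4))  # A is needed twice per instruction
    print(structural_conflicts(list("ABCD"), 4))  # each resource used once
```

With the sequence A B A C, instruction 2 wants A in the same cycle in which instruction 0 uses A for the second time; with A B C D no two instructions collide.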
-
Handling Data Hazards
[Figure: the write (W) of the previous instr against the read (R) of the current instr, handled in two ways: 1. Data forwarding, 2. Instruction reordering.]
-
Stalls due to data hazards
[Figure: I: lw $t1,... followed by add $s1,$t1,.. and I+1, each passing through IM, RF, ALU, DM, RF; the IM step of I+1 is repeated.]
-
Stalls due to control hazards
[Figure: I: beq ...,L followed by I+1 (IM, RF) and I+2 (IM), then the target L: add ... passing through IM, RF, ALU, DM, RF.]
-
Control Hazards
[Figure: branch instr (cond eval, target addr gen), next inline instr and target instr, with delays marked as delay = 2 and delay = 5.]
• the order of cond eval and target addr gen may be different
• cond eval may be done in previous instruction
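The cost of control hazards is usually folded into an effective CPI. The formula below is the common textbook model, not something stated on the slide; the branch fraction, taken fraction and flush penalty are assumed inputs.

```python
def effective_cpi(base_cpi, branch_fraction, taken_fraction, flush_penalty):
    """Base CPI plus the average flush cycles contributed per instruction.

    branch_fraction: fraction of instructions that are branches (assumed input)
    taken_fraction:  fraction of branches that are taken (assumed input)
    flush_penalty:   cycles lost each time inline instructions are flushed
    """
    return base_cpi + branch_fraction * taken_fraction * flush_penalty


if __name__ == "__main__":
    # Illustrative numbers only: 20% branches, half taken, 2 cycles lost each.
    print(effective_cpi(1.0, 0.2, 0.5, 2))
```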
-
Handling hazards
• Data hazards
– detect instructions with data dependence
– introduce nop instructions (bubbles) in the pipeline
– more complex: data forwarding
• Control hazards
– detect branch instructions
– flush inline instructions if branching occurs
– more complex: branch prediction
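The simple data hazard scheme (detect the dependence, then insert bubbles) can be sketched as a pass over the instruction list. The tuple format and the default `min_distance=3` are assumptions: 3 corresponds to a 5-stage pipeline without forwarding in which a register written in WB can be read in the same cycle.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Insert nops so each consumer is at least `min_distance` slots after its producer.

    program: list of (text, dest_register_or_None, source_registers).
    min_distance is an assumed pipeline parameter, not taken from the slides.
    """
    out = []
    for text, dest, sources in program:
        needed = 0
        # Look back over the slots that are still too close to this instruction.
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev_dest = out[-back][1]
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((text, dest, sources))
    return out


if __name__ == "__main__":
    program = [
        ("lw $t1,0($t0)", "$t1", ("$t0",)),
        ("add $s1,$t1,$t2", "$s1", ("$t1", "$t2")),
        ("sub $s2,$s1,$t3", "$s2", ("$s1", "$t3")),
    ]
    for text, _, _ in insert_nops(program):
        print(text)
```

Data forwarding, the "more complex" option, reduces the number of bubbles needed; this sketch models only the nop approach.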
-
Pipeline Data Hazards
• Stalls due to data hazards
• Control to introduce stall cycles
• Detecting data hazard conditions
• Data forwarding paths
• Data forwarding control
• Stalls with data forwarding
-
Stalls due to data hazards
[Figure (instruction view): I: lw $t1,... followed by add $s1,$t1,.. and I+1, each passing through IM, RF, ALU, DM, RF; two alternative timings are marked × and √.]
-
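Actual forwarding paths
[Figure: EX, Mem and WB stages with pipeline registers ID/EX, EX/Mem and Mem/WB; multiplexers controlled by fwdA, fwdB and fwdC select between the normal and the forwarded values around the ALU and the DM write data (wd).]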
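The control for the fwdA and fwdB multiplexers can be sketched with the usual textbook forwarding conditions. The select encoding (0 = register file value, 1 = EX/Mem result, 2 = Mem/WB result), the signal names, and the treatment of register 0 are assumptions; the slide does not give them, and fwdC is not modelled here.

```python
def forward_select(src_reg, ex_mem_dest, ex_mem_regwrite, mem_wb_dest, mem_wb_regwrite):
    """Mux select for one ALU operand (assumed encoding).

    0: value read from the register file (via ID/EX)
    1: result held in EX/Mem (the most recent producer wins)
    2: result held in Mem/WB
    """
    if ex_mem_regwrite and ex_mem_dest != 0 and ex_mem_dest == src_reg:
        return 1
    if mem_wb_regwrite and mem_wb_dest != 0 and mem_wb_dest == src_reg:
        return 2
    return 0


def load_use_stall(id_ex_memread, id_ex_dest, if_id_rs, if_id_rt):
    """True when a load in EX feeds the instruction in ID, so one stall is still needed."""
    return bool(id_ex_memread and id_ex_dest in (if_id_rs, if_id_rt))


if __name__ == "__main__":
    # Operand register 9 was produced by the instruction now in EX/Mem.
    print(forward_select(9, ex_mem_dest=9, ex_mem_regwrite=True,
                         mem_wb_dest=8, mem_wb_regwrite=True))
    # lw into register 9 immediately followed by a use of register 9.
    print(load_use_stall(True, 9, 9, 10))
```

The second function covers the "stalls with data forwarding" case: a value loaded by lw is not available until after the Mem stage, so forwarding alone cannot remove the stall for an immediately following use.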
-