Chapter 8 - Pipelining
TRANSCRIPT
-
7/29/2019 Chapter 8 - Pipelining
1/38
Pipelining
-
Overview
Pipelining is widely used in modern
processors.
Pipelining improves system performance in terms of throughput.
Pipelined organization requires sophisticated
compilation techniques.
-
Basic Concepts
-
Making the Execution of Programs Faster
Use faster circuit technology to build the
processor and the main memory.
Arrange the hardware so that more than one operation can be performed at the same time.
In the latter way, the number of operations
performed per second is increased.
-
Use the Idea of Pipelining in a Computer
Figure 8.1. Basic idea of instruction pipelining:
(a) Sequential execution of instructions I1, I2, I3 (fetch Fi, then execute Ei).
(b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit.
(c) Pipelined execution over clock cycles 1-4: while I1 executes (E1), I2 is fetched (F2).
This is a two-stage (Fetch + Execution) pipeline.
-
Use the Idea of Pipelining in a Computer
Figure 8.2 (textbook page 457). A 4-stage pipeline:
(a) Instruction execution divided into four steps:
F : Fetch instruction
D : Decode instruction and fetch operands
E : Execute operation
W : Write results
(b) Hardware organization, with interstage buffers between the stages.
This is a Fetch + Decode + Execute + Write pipeline.
-
Role of Cache Memory
Each pipeline stage is expected to complete in one
clock cycle.
The clock period should be long enough for the slowest pipeline stage to complete; faster stages must wait for the slowest one.
Since main memory is very slow compared to execution, if each instruction had to be fetched from main memory, the pipeline would be almost useless.
Fortunately, we have cache memory.
-
Pipeline Performance
The potential increase in performance
resulting from pipelining is proportional to the
number of pipeline stages.
However, this increase is achieved only if all pipeline stages require the same time to complete and there is no interruption throughout program execution. Unfortunately, this is not true.
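The ideal speedup claim above can be made concrete with a rough model (a sketch, not from the textbook): with equal stage delays and no stalls, an n-stage pipeline approaches an n-times throughput improvement as the instruction count grows.

```python
# Ideal pipeline speedup model: k instructions on an n-stage pipeline
# versus a non-pipelined processor with the same total latency.
# Assumes equal stage delays and no stalls.

def sequential_time(k, n, stage_delay=1.0):
    return k * n * stage_delay          # each instruction takes all n stages

def pipelined_time(k, n, stage_delay=1.0):
    return (n + k - 1) * stage_delay    # fill the pipe, then one result per cycle

k, n = 1000, 4
speedup = sequential_time(k, n) / pipelined_time(k, n)
print(round(speedup, 2))  # approaches n = 4 as k grows
```

For k = 1000 and n = 4 the speedup is already close to 4; the limit is exactly n.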
-
Pipeline Performance
Figure 8.3. An instruction taking more than one clock cycle:
instruction I2 needs three cycles in the Execute stage (E2), so the instructions that follow it are delayed.
-
Pipeline Performance
The previous pipeline is said to have been stalled for two clock cycles.
Any condition that causes a pipeline to stall is called a hazard.
Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline, so some operation has to be delayed and the pipeline stalls.
Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.
Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.
-
Pipeline Performance
Figure 8.4. Pipeline stall caused by a cache miss in F2:
(a) Instruction execution steps in successive clock cycles.
(b) Function performed by each processor stage in successive clock cycles.
Stages: F (Fetch), D (Decode), E (Execute), W (Write).
The idle periods are the stalls (bubbles) produced by this instruction hazard.
-
Pipeline Performance
Figure 8.5. Effect of a Load instruction on pipeline timing:
the instruction Load X(R1), R2 requires an extra cycle for the memory access, delaying the instructions that follow it. This is an example of a structural hazard.
-
Pipeline Performance
Again, pipelining does not result in individual
instructions being executed faster; rather, it is the
throughput that increases.
Throughput is measured by the rate at which instruction execution is completed.
Pipeline stall causes degradation in pipeline
performance.
We need to identify all hazards that may cause the
pipeline to stall and to find ways to minimize their
impact.
-
Quiz
Four instructions; I2 takes two clock cycles to execute. Please draw the figure for a 4-stage pipeline, and work out the total number of cycles needed for the four instructions to complete.
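The quiz can be checked with a small cycle-count sketch (my own model, not from the textbook): without stalls, n instructions take (stages + n - 1) cycles, and each extra Execute cycle stalls the pipeline by one more cycle.

```python
# Total clock cycles for a 4-stage pipeline (F, D, E, W) when some
# instructions need extra Execute cycles. Assumes in-order issue and
# that every extra E cycle stalls the instructions behind it.

def total_cycles(n_instructions, execute_cycles):
    """execute_cycles[i] = number of E cycles instruction i needs."""
    stages = 4                                   # F, D, E, W
    base = stages + n_instructions - 1           # stall-free completion time
    extra = sum(c - 1 for c in execute_cycles)   # one stall per extra E cycle
    return base + extra

# Quiz: four instructions, I2 takes two Execute cycles.
print(total_cycles(4, [1, 2, 1, 1]))  # 8 cycles
```

Without the extra E cycle the four instructions would finish in 7 cycles; the one-cycle stall from I2 makes it 8.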
-
Data Hazards
-
Data Hazards
We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.
Hazard occurs:
A ← 3 + A
B ← 4 × A
No hazard:
A ← 5 × C
B ← 20 + C
When two operations depend on each other, they must be
executed sequentially in the correct order. Another example:
Mul R2, R3, R4
Add R5, R4, R6
-
Data Hazards
Figure 8.6. Pipeline stalled by data dependency between D2 and W1:
I1 (Mul R2, R3, R4) writes R4 in W1, so I2 (Add R5, R4, R6) cannot complete its operand-fetch step D2 until W1 finishes.
-
Operand Forwarding
Instead of reading from the register file, the second instruction can get the data directly from the output of the ALU once the previous instruction has completed.
A special arrangement is needed to forward the output of the ALU back to the input of the ALU.
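The forwarding decision can be sketched in a few lines (the function and variable names here are my own illustration, not the textbook's hardware): if the previous instruction's destination register matches the register being read, take the value from the ALU output instead of the register file.

```python
# Operand forwarding sketch: bypass the register file when the value
# being read was just produced by the previous instruction's ALU operation.

def read_operand(reg, regfile, prev_dest, alu_output):
    if reg == prev_dest:          # RAW dependence on the previous instruction
        return alu_output         # forward straight from the ALU output
    return regfile[reg]           # otherwise read the register file as usual

regfile = {"R3": 5, "R4": 0}      # R4 has not yet been written back
# Suppose Mul R2, R3, R4 just produced 15 at the ALU output;
# the following Add reads R4 and gets the forwarded value:
print(read_operand("R4", regfile, prev_dest="R4", alu_output=15))  # 15
```

In hardware this is a multiplexer at the ALU input, selected by comparing register numbers in the interstage buffers.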
-
-
Handling Data Hazards in Software
Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
    NOP
    NOP
I2: Add R5, R4, R6
The compiler can reorder the instructions to perform some useful work during the NOP slots.
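A compiler pass of this kind can be sketched as follows (the tuple format and the fixed two-slot gap are simplifying assumptions of mine, not the textbook's): scan the program and pad with NOPs whenever an instruction reads a register written too recently.

```python
# Compiler-side hazard handling sketch: insert NOPs between an
# instruction and a later one that reads its destination register
# too soon. Instruction format: (opcode, src1, src2, dest).

def insert_nops(program, gap=2):
    out = []
    for instr in program:
        for back in range(1, gap + 1):       # look back up to `gap` slots
            if len(out) >= back:
                prev = out[-back]
                if prev[0] != "NOP" and prev[3] in (instr[1], instr[2]):
                    # pad so the dependent pair is `gap` slots apart
                    out.extend([("NOP",)] * (gap - back + 1))
                    break
        out.append(instr)
    return out

prog = [("Mul", "R2", "R3", "R4"), ("Add", "R5", "R4", "R6")]
print([i[0] for i in insert_nops(prog)])  # ['Mul', 'NOP', 'NOP', 'Add']
```

A real compiler would then try to fill those NOP slots with independent instructions moved from elsewhere.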
-
Side Effects
The previous example is explicit and easily detected.
Sometimes an instruction changes the contents of a register other than the one named as the destination.
When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect. (Example?)
Example: condition code flags:
Add R1, R3
AddWithCarry R2, R4
Instructions designed for execution on pipelined hardware should have few side effects.
-
Instruction Hazards
-
Time lost as a result of a branch instruction is referred to as the branch penalty.
Here the branch penalty is one clock cycle.
For a longer pipeline, the branch penalty may be higher.
Reducing the branch penalty requires the branch address to be computed earlier in the pipeline.
-
The instruction fetch unit should have dedicated hardware to identify branch instructions and compute the branch target address as quickly as possible after the instruction is fetched.
So this task is performed in the decode stage.
Explanation of Figures 8.8, 8.9(a), and 8.9(b).
-
Unconditional Branches
Figure 8.8. An idle cycle caused by a branch instruction.
Figure 8.9. Branch timing (the execution unit is idle while the branch target is fetched).
-
Branch Timing
- Branch penalty
- Reducing the penalty
-
Instruction Queue and Prefetching
Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b:
F : Fetch instruction
D : Dispatch/Decode unit
E : Execute instruction
W : Write results
The instruction fetch unit places fetched instructions in an instruction queue ahead of the dispatch/decode unit.
-
To reduce the effect of cache misses and stalls, processors employ a sophisticated fetch unit.
Fetched instructions are placed in an instruction queue before they are needed.
The instruction queue can store several instructions.
The dispatch unit takes instructions from the front of the queue and sends them to the execution unit.
Explanation: page 468, 2nd paragraph (exam must).
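The fetch/dispatch arrangement above can be sketched as a simple bounded queue (the structure and names below are my own illustration, not the textbook's hardware): the fetch unit keeps filling the queue even while the execution unit is stalled, and the dispatch unit always takes the oldest instruction from the front.

```python
# Instruction queue sketch: fetch unit appends at the back,
# dispatch unit removes from the front, in program order.

from collections import deque

queue = deque(maxlen=8)               # the queue holds several instructions

def fetch(instr):                     # fetch unit fills the back of the queue
    if len(queue) < queue.maxlen:
        queue.append(instr)
        return True
    return False                      # queue full: fetch unit must wait

def dispatch():                       # dispatch unit takes from the front
    return queue.popleft() if queue else None

for i in ["I1", "I2", "I3"]:
    fetch(i)
print(dispatch())  # I1 (oldest instruction is dispatched first)
```

Because fetching and dispatching are decoupled, a stall on the execute side does not stop the fetch unit until the queue fills up.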
-
A branch instruction may be executed concurrently with other instructions. This technique is called branch folding.
Branch folding can occur only if, at the time a branch instruction is encountered, at least one instruction other than the branch is available in the queue.
-
Conditional Branches
A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction.
The decision to branch cannot be made until the execution of that instruction has been completed.
Branch instructions represent about 20% of the dynamic instruction count of most programs.
-
Branch instructions can be handled in several ways to reduce the branch penalty:
Delayed branch
Branch prediction
Dynamic branch prediction
-
DELAYED BRANCH
The location following a branch instruction is called the branch delay slot.
There may be more than one branch delay slot, depending on the time needed to execute a branch instruction.
A technique called delayed branching can minimize the penalty incurred as a result of branch instructions.
-
-
Arrange the instructions so that a useful instruction is placed in the delay slot and is executed whether or not the branch is taken.
If no useful instruction can be placed in the delay slot, the slot is filled with NOP instructions.
The effectiveness of the delayed branch approach depends on how often it is possible to reorder instructions.
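The reordering step can be sketched as a small scheduling pass (a simplified model of mine with an invented instruction format, assuming a single delay slot): move an earlier instruction that the branch does not depend on into the slot, otherwise fall back to a NOP.

```python
# Delayed-branch scheduling sketch: fill the branch delay slot with a
# useful earlier instruction if one can safely be moved past the branch.

def fill_delay_slot(before_branch, branch):
    for i, instr in enumerate(before_branch):
        if instr["dest"] not in branch["uses"]:   # branch does not read it
            moved = before_branch[:i] + before_branch[i + 1:]
            return moved + [branch, instr]        # instr now fills the slot
    # no safe instruction found: the slot is filled with a NOP
    return before_branch + [branch, {"op": "NOP", "dest": None}]

code = [{"op": "Add", "dest": "R2"}, {"op": "Shift", "dest": "R3"}]
branch = {"op": "Branch", "uses": {"R2"}}
print([i["op"] for i in fill_delay_slot(code, branch)])
# ['Add', 'Branch', 'Shift'] -- Shift executes in the delay slot
```

The Add cannot move because the branch condition depends on R2, but the Shift is independent and does useful work whether or not the branch is taken.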
-
BRANCH PREDICTION
In the simplest form of branch prediction, assume that the branch will not take place and continue to fetch instructions in sequential address order.
Speculative execution means that instructions are executed before the processor is certain that they are in the correct execution sequence.
Hence care must be taken that no processor registers or memory locations are updated until it is confirmed that these instructions should indeed be executed.
-
Branch prediction is classified into two types:
1. Static branch prediction
The prediction is the same every time a given branch instruction is executed.
2. Dynamic branch prediction
The prediction decision may change depending on execution history.
-
2-state algorithm
-
4-state algorithm
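The 4-state scheme is commonly implemented as a 2-bit saturating counter; the sketch below shows the standard form of that algorithm (the class name and state encoding are my own choices): states 0-1 predict "not taken", states 2-3 predict "taken", and each outcome moves the state one step, so a single mispredict does not flip a strongly held prediction.

```python
# 4-state (2-bit saturating counter) branch predictor sketch.
# State 0: strongly not taken, 1: weakly not taken,
# state 2: weakly taken,      3: strongly taken.

class TwoBitPredictor:
    def __init__(self):
        self.state = 1                      # start weakly "not taken"

    def predict(self):
        return self.state >= 2              # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)   # saturate at 3
        else:
            self.state = max(0, self.state - 1)   # saturate at 0

p = TwoBitPredictor()
for outcome in [True, True, False, True]:   # e.g. a loop branch
    print(p.predict(), end=" ")
    p.update(outcome)
# prints: False True True True
```

Note the third prediction: even though the previous outcome was "not taken", the counter had only dropped to the weakly-taken state, so the predictor still says "taken". A 2-state (1-bit) predictor would have flipped immediately.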