Chapter 8 - Pipelining


  • 7/29/2019 Chapter 8 - Pipelining


    Pipelining


    Overview

    Pipelining is widely used in modern processors.

    Pipelining improves system performance in terms of throughput.

    A pipelined organization requires sophisticated compilation techniques.


    Basic Concepts


    Making the Execution of Programs Faster

    Use faster circuit technology to build the processor and the main memory.

    Arrange the hardware so that more than one operation can be performed at the same time.

    In the latter approach, the number of operations performed per second is increased.


    Use the Idea of Pipelining in a Computer

    [Figure 8.1. Basic idea of instruction pipelining: (a) sequential execution of instructions I1-I3, each fetched (F) and executed (E) in turn; (b) hardware organization, with an interstage buffer B1 between the instruction fetch unit and the execution unit; (c) pipelined execution, where the fetch of each instruction overlaps the execution of its predecessor over clock cycles 1-4.]
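    The timing in Figure 8.1 can be counted with a short sketch (Python; the function names are illustrative, not from the slides):

```python
# Cycle counts for n instructions on an ideal k-stage pipeline,
# assuming every stage takes exactly one clock cycle and no stalls.

def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    # Each instruction passes through all stages before the next begins.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # The first instruction takes n_stages cycles; each later one
    # completes one cycle after its predecessor.
    return n_stages + (n_instructions - 1)

# Three instructions on the two-stage Fetch/Execute pipeline of Figure 8.1:
print(sequential_cycles(3, 2))  # 6
print(pipelined_cycles(3, 2))   # 4
```

    With many instructions, completion approaches one instruction per clock cycle, which is why it is throughput, not the latency of any single instruction, that improves.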


    Use the Idea of Pipelining in a Computer

    [Figure 8.2. A 4-stage pipeline: (a) instruction execution divided into four steps (F: Fetch instruction, D: Decode instruction and fetch operands, E: Execute operation, W: Write results); (b) hardware organization, with interstage buffers between the stages.]

    Fetch + Decode + Execute + Write

    Textbook page: 457


    Role of Cache Memory

    Each pipeline stage is expected to complete in one clock cycle.

    The clock period should be long enough for the slowest pipeline stage to complete. Faster stages can only wait for the slowest one to complete.

    Since main memory is very slow compared to execution, if each instruction had to be fetched from main memory, the pipeline would be almost useless.

    Fortunately, we have caches.


    Pipeline Performance

    The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.

    However, this increase would be achieved only if all pipeline stages require the same time to complete and there is no interruption throughout program execution. Unfortunately, this is not true.
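    This proportionality can be made concrete. A sketch under exactly the ideal assumptions just stated (equal stage times, no stalls):

```python
def pipeline_speedup(n_instructions: int, n_stages: int) -> float:
    # Ideal speedup: sequential cycles divided by pipelined cycles.
    sequential = n_instructions * n_stages
    pipelined = n_stages + (n_instructions - 1)
    return sequential / pipelined

print(pipeline_speedup(4, 4))     # 16/7, about 2.3
print(pipeline_speedup(1000, 4))  # about 3.99: approaches the stage count
```

    For a long instruction stream the speedup approaches the number of stages, but never quite reaches it, and any stall pushes it further below that bound.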


    Pipeline Performance

    [Figure 8.3. Effect of an execution operation taking more than one clock cycle.]


    Pipeline Performance

    The previous pipeline is said to have been stalled for two clock cycles. Any condition that causes a pipeline to stall is called a hazard.

    Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. Some operation has to be delayed, and the pipeline stalls.

    Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.

    Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.


    Pipeline Performance

    [Figure 8.4. Pipeline stall caused by a cache miss in F2: (a) instruction execution steps in successive clock cycles; (b) function performed by each processor stage in successive clock cycles (F: Fetch, D: Decode, E: Execute, W: Write). The idle periods are stalls, also called bubbles; this is an example of an instruction hazard.]


    Pipeline Performance

    [Figure 8.5. Effect of a Load instruction, Load X(R1), R2, on pipeline timing: an example of a structural hazard.]


    Pipeline Performance

    Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases. Throughput is measured by the rate at which instruction execution is completed.

    Pipeline stalls cause degradation in pipeline performance.

    We need to identify all hazards that may cause the pipeline to stall and find ways to minimize their impact.


    Quiz

    Four instructions are executed, and I2 takes two clock cycles in its execution stage. Please draw the timing figure for a 4-stage pipeline, and work out the total number of clock cycles needed for the four instructions to complete.


    Data Hazards


    Data Hazards

    We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.

    Hazard occurs:

    A ← 3 + A
    B ← 4 × A

    No hazard:

    A ← 5 × C
    B ← 20 + C

    When two operations depend on each other, they must be executed sequentially in the correct order. Another example:

    Mul R2, R3, R4
    Add R5, R4, R6
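    The Mul/Add dependency above can be checked mechanically. A minimal sketch (the (destination, sources) encoding of each instruction is illustrative, not from the text):

```python
# Detect a read-after-write (RAW) data hazard: the second instruction
# reads a register that the first instruction has not yet written.

def raw_hazard(writer, reader) -> bool:
    dest, _ = writer       # each instruction is (destination, sources)
    _, sources = reader
    return dest in sources

mul = ("R4", ("R2", "R3"))   # Mul R2, R3, R4 writes R4
add = ("R6", ("R4", "R5"))   # Add R5, R4, R6 reads R4
print(raw_hazard(mul, add))  # True: Add depends on Mul's result
```

    When the two instructions are independent, the check returns False and no stall is needed.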


    Data Hazards

    [Figure 8.6. Pipeline stalled by data dependency between D2 and W1.]


    Operand Forwarding

    Instead of reading it from the register file, the second instruction can get the data directly from the output of the ALU once the previous instruction has produced it.

    A special arrangement needs to be made to forward the output of the ALU back to its input.
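    The idea can be modeled as a selector in front of the ALU input (Python sketch; the names and register values are illustrative, not from the slides):

```python
# Operand forwarding sketch: if the immediately preceding instruction is
# about to write a register we need, take the value from its ALU output
# instead of from the register file.

def read_operand(reg, register_file, forwarded):
    # 'forwarded' maps a register name to the ALU result of the previous
    # instruction, before that result has been written back.
    if reg in forwarded:
        return forwarded[reg]      # forwarding path
    return register_file[reg]      # normal register-file read

register_file = {"R2": 3, "R3": 5, "R4": 0, "R5": 7}
# Mul R2, R3, R4 has just produced 15 on the ALU output, not yet written:
forwarded = {"R4": register_file["R2"] * register_file["R3"]}

# Add R5, R4, R6 can use the forwarded R4 without stalling:
result = read_operand("R5", register_file, forwarded) + \
         read_operand("R4", register_file, forwarded)
print(result)  # 22
```

    In hardware this selector is a multiplexer; the comparison of register names corresponds to comparing the source and destination fields held in the interstage buffers.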


    Handling Data Hazards in Software

    Let the compiler detect and handle the hazard:

    I1: Mul R2, R3, R4
        NOP
        NOP
    I2: Add R5, R4, R6

    The compiler can reorder the instructions to perform some useful work during the NOP slots.
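    A toy version of this compiler pass (Python; the instruction encoding is illustrative, and the two-cycle gap matches the two NOP slots above):

```python
# Insert NOPs between an instruction and any later instruction that reads
# its destination too soon. A real compiler would instead try to move
# independent instructions into these slots.

def insert_nops(program, gap=2):
    out = []
    for op, dest, sources in program:   # each item: (op, dest, sources)
        for back in range(1, gap + 1):
            if len(out) >= back:
                prev = out[-back]
                # If a recent non-NOP wrote one of our sources, pad enough
                # NOPs so that 'gap' instructions separate the pair.
                if prev[0] != "Nop" and prev[1] in sources:
                    out.extend([("Nop", None, ())] * (gap - back + 1))
                    break
        out.append((op, dest, sources))
    return out

program = [("Mul", "R4", ("R2", "R3")), ("Add", "R6", ("R4", "R5"))]
for ins in insert_nops(program):
    print(ins[0])   # Mul, Nop, Nop, Add
```

    Independent instruction pairs pass through unchanged, which is exactly the property the reordering step exploits.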


    Side Effects

    The previous example is explicit and easily detected.

    Sometimes an instruction changes the contents of a register other than the one named as the destination.

    When a location other than the one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect. (Example?)

    Example: condition code flags:

    Add R1, R3
    AddWithCarry R2, R4

    Instructions designed for execution on pipelined hardware should have few side effects.


    Instruction Hazards


    The time lost as a result of a branch instruction is referred to as the branch penalty.

    Here the branch penalty is one clock cycle; for a longer pipeline, the branch penalty may be higher.

    Reducing the branch penalty requires the branch target address to be computed earlier in the pipeline.
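    The cost is easy to estimate with a back-of-the-envelope sketch (the 20% branch fraction is the figure quoted later in this chapter; the penalties are illustrative):

```python
# Average cycles per instruction (CPI) when a fraction of instructions
# are branches and each branch costs a fixed stall penalty.

def cycles_per_instruction(branch_fraction: float, penalty: int) -> float:
    # One cycle per instruction in the ideal case, plus the stall
    # cycles contributed by branches.
    return 1.0 + branch_fraction * penalty

print(cycles_per_instruction(0.20, 1))  # one-cycle penalty: CPI 1.2
print(cycles_per_instruction(0.20, 3))  # deeper pipeline: CPI 1.6
```

    A three-cycle penalty on every fifth instruction already costs 60% of the ideal throughput, which is why the techniques below are worth their hardware.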


    The instruction fetch unit should have dedicated hardware to identify branch instructions and compute the branch target address as quickly as possible after an instruction is fetched.

    This task is therefore performed in the decode stage.

    Explanation of Figures 8.8, 8.9a, and 8.9b.


    Unconditional Branches

    [Figure 8.8. An idle cycle caused by a branch instruction. Figure 8.9. Branch timing.]


    Branch Timing

    - Branch penalty

    - Reducing the penalty


    Instruction Queue and Prefetching

    [Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b: the instruction fetch unit (F: Fetch instruction) places instructions in an instruction queue, from which a dispatch/decode unit (D) issues them to the execution unit (E: Execute instruction, W: Write results).]


    To reduce the effect of cache misses and stalls, processors employ a sophisticated fetch unit.

    Fetched instructions are placed in an instruction queue before they are needed; the queue can store several instructions.

    A dispatch unit takes instructions from the front of the queue and sends them to the execution unit. Explanation: textbook page 468, second paragraph (exam must).


    A branch instruction can be executed concurrently with other instructions. This technique is called branch folding.

    Branch folding occurs only if, at the time a branch instruction is encountered, at least one instruction other than the branch is available in the queue.


    Conditional Branches

    A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction. The decision to branch cannot be made until the execution of that instruction has been completed.

    Branch instructions represent about 20% of the dynamic instruction count of most programs.


    Branch instructions can be handled in several ways to reduce the branch penalty:

    Delayed branch

    Branch prediction

    Dynamic branch prediction


    DELAYED BRANCH

    The location following a branch instruction is called the branch delay slot.

    There may be more than one branch delay slot, depending on the time needed to execute a branch instruction.

    A technique called delayed branching can minimize the penalty incurred as a result of branch instructions.


    Arrange the instructions so that the instruction in the delay slot is always executed, whether or not the branch is taken.

    Place a useful instruction in the delay slot.

    If no useful instruction can be placed in the delay slot, it is filled with NOP instructions.

    The effectiveness of the delayed branch approach depends on how often it is possible to reorder instructions.
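    A toy version of the reordering step (Python; the instruction encoding is illustrative, and a real compiler must also check the instructions being hopped over for conflicts):

```python
# Fill the branch delay slot: move the latest earlier instruction that
# does not feed the branch condition to the position after the branch,
# where it executes whether or not the branch is taken.

def fill_delay_slot(program):
    # Each item: (op, registers used). Simplified sketch only.
    branch_idx = next(i for i, (op, regs) in enumerate(program)
                      if op.startswith("Branch"))
    branch_regs = set(program[branch_idx][1])
    for i in range(branch_idx - 1, -1, -1):
        op, regs = program[i]
        if not branch_regs & set(regs):   # independent of the condition
            return program[:i] + program[i + 1:branch_idx + 1] + [program[i]]
    return program + [("NOP", ())]        # nothing movable: pad with a NOP

program = [("Shift_left", ("R3",)),
           ("Decrement", ("R2",)),
           ("Branch_if_not0", ("R2",))]
print(fill_delay_slot(program))
# [('Decrement', ('R2',)), ('Branch_if_not0', ('R2',)), ('Shift_left', ('R3',))]
```

    Here the shift neither produces the branch condition nor conflicts with the decrement, so it can legally occupy the delay slot.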


    BRANCH PREDICTION

    In branch prediction, assume that the branch will not take place and continue to fetch instructions in sequential address order.

    Speculative execution means that instructions are executed before the processor is certain that they are in the correct execution sequence.

    Hence, care must be taken that no processor registers or memory locations are updated until it is confirmed that these instructions should indeed be executed.


    Branch prediction is classified into two types:

    1. Static branch prediction: the prediction is the same every time a given branch instruction is executed.

    2. Dynamic branch prediction: the prediction decision may change depending on execution history.


    2-state algorithm


    4-state algorithm
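    The 4-state algorithm is the two-bit saturating counter: two consecutive mispredictions are needed before the prediction flips, so a single loop exit does not retrain the predictor. A minimal sketch (the state encoding is illustrative):

```python
# Two-bit saturating-counter branch predictor.
# States 0 and 1 predict not-taken; states 2 and 3 predict taken.

class TwoBitPredictor:
    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Saturate at the ends so the counter never leaves [0, 3].
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, True, False, True, True]  # a loop-branch pattern
hits = 0
for taken in outcomes:
    hits += (p.predict() == taken)
    p.update(taken)
print(hits)  # mispredicts while warming up, then tracks the loop
```

    After the single not-taken outcome (the loop exit) the counter only drops to state 2, so the very next taken branch is still predicted correctly; a 2-state (one-bit) predictor would mispredict twice per loop in the same pattern.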