vii. process pipelining725g84/include/725g84-7-pipeline.pdf · branch prediction strategies: static...

48
VII. Process Pipelining Computer Architecture and Operating Systems & Operating Systems: 725G84 Ahmed Rezine 1 Computer Architecture & Operating Systems (725G84 - HT16)

Upload: others

Post on 02-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

VII. Process Pipelining� Computer Architecture and

Operating Systems & Operating

Systems: 725G84

Ahmed Rezine

1 Computer Architecture & Operating Systems (725G84 - HT16)

Page 2: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Components of a Computer

Computer Architecture & Operating Systems (725G84 - HT16)

Page 3: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Recap: stages of the datapath

Computer Architecture & Operating Systems (725G84 - HT16)

Page 4: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Recap: the stages of the datapath

  We can look at the datapath as five stages, one step per stage   IF: Instruction fetch from memory   ID: Instruction decode & register read   EX: Execute operation or calculate address   MEM: Access memory operand   WB: Write result back to register

Computer Architecture & Operating Systems (725G84 - HT16)

Page 5: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

The 5 Stages

Computer Architecture & Operating Systems (725G84 - HT16)

Page 6: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Single Cycle: Performance   Assume time for stages is

  100ps for register read or write   200ps for other stages

  What is the clock cycle time ?

Computer Architecture & Operating Systems (725G84 - HT16)

Page 7: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Single Cycle: Performance   Assume time for stages is

  100ps for register read or write   200ps for other stages

Computer Architecture & Operating Systems (725G84 - HT16)

Instr Instr fetch Register read

ALU op Memory access

Register write

Total time

lw

sw

R-format

beq

Page 8: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Single Cycle: Performance   Assume time for stages is

  100ps for register read or write   200ps for other stages

Computer Architecture & Operating Systems (725G84 - HT16)

Instr Instr fetch Register read

ALU op Memory access

Register write

Total time

lw 200ps 100 ps 200ps 200ps 100 ps 800ps

sw 200ps 100 ps 200ps 200ps 700ps

R-format 200ps 100 ps 200ps 100 ps 600ps

beq 200ps 100 ps 200ps 500ps

Page 9: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Single Cycle: Performance

Computer Architecture & Operating Systems (725G84 - HT16)

Page 10: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Performance Issues

  Longest delay determines clock period   Critical path: load instruction   Instruction memory → register file → ALU →

data memory → register file

Computer Architecture & Operating Systems (725G84 - HT16)

Page 11: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Performance Issues   Violates design principle

  Making the common case fast

  Improve performance by pipelining!

Computer Architecture & Operating Systems (725G84 - HT16)

Page 12: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

!  Pipelining = Overlap the stages

Performance Issues

Computer Architecture & Operating Systems (725G84 - HT16)

Page 13: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Pipeline

Computer Architecture & Operating Systems (725G84 - HT16)

Page 14: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Clock Cycle   Even if some stages take only 100ps instead

of 200ps, the pipelined execution clock cycle must have the worst case clock cycle time of 200ps

Computer Architecture & Operating Systems (725G84 - HT16)

Page 15: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Speedup from Pipelining   If all stages are balanced

  i.e., all take the same time

Computer Architecture & Operating Systems (725G84 - HT16)

Time between pipelined instructions = Time between non-pipelined instructions

Number of stages

Page 16: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Speedup from Pipelining   If the stages are not balanced (not equal),

speedup is less. Look at our example:   Time between instructions (non-pipelined) =

800ps   Number of stages = 5   Time between instructions (pipelined) =

800/5 =160 ps   But, what did we get ?

  Also read text in p.334

Computer Architecture & Operating Systems (725G84 - HT16)

Page 17: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Speedup from Pipelining   Recall from Chapter 1: Throughput

versus latency

  What does pipelining improve?

  Speedup in pipelining is due to increased throughput   Latency (time for each instruction) does not

decrease

Computer Architecture & Operating Systems (725G84 - HT16)

Page 18: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Hazards   Situations that prevent starting the

next instruction in the next cycle are called hazards

  There are 3 kinds of hazards   Structural hazards   Data hazards   Control hazards

Computer Architecture & Operating Systems (725G84 - HT16)

Page 19: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Structural Hazards   When an instruction cannot execute in

proper cycle due to a conflict for use of a resource

Computer Architecture & Operating Systems (725G84 - HT16)

Page 20: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Example of structural hazards

  Imagine   a MIPS pipeline with a single data and instruction memory   a fourth additional instruction in the pipeline; when this instruction

is trying to fetch from instruction memory, the first instruction is fetching from data memory – leading to a hazard

  Hence, pipelined datapaths require separate instruction/data memories   Or separate instruction/data caches

Computer Architecture & Operating Systems (725G84 - HT16)

Page 21: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Data Hazards   When an instruction cannot execute in

its proper cycle because it depends on completion of data access by a previous instruction

Computer Architecture & Operating Systems (725G84 - HT16)

Page 22: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Data Hazards   add $s0, $t0, $t1 sub $t2, $s0, $t3

  For the above code, when can we start the second instruction ?

Computer Architecture & Operating Systems (725G84 - HT16)

Page 23: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Data Hazards   add $s0, $t0, $t1 sub $t2, $s0, $t3

Computer Architecture & Operating Systems (725G84 - HT16)

IM Reg DM WB

IM Reg DM WB

Bubble Bubble Bubble

Bubble Bubble Bubble

Bubble Bubble Bubble

Bubble Bubble

Bubble Bubble

Bubble Bubble

Page 24: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Pipeline stalls   Bubble or the pipeline stall is used to resolve

a hazard

  BUT it leads to wastage of cycles = performance deterioration

  Solve the performance problem by ‘forwarding’ the required data

Computer Architecture & Operating Systems (725G84 - HT16)

Page 25: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Forwarding (aka Bypassing)

  Use result when it is computed : don’t wait for it to be stored in a register

 Requires extra connections in the datapath

Computer Architecture & Operating Systems (725G84 - HT16)

Page 26: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Forwarding (aka Bypassing)

  add $s0, $t0, $t1 sub $t2, $s0, $t3

  For the above code, draw the datapath with forwarding

Computer Architecture & Operating Systems (725G84 - HT16)

Page 27: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Forwarding (aka Bypassing)

  add $s0, $t0, $t1�sub $t2, $s0, $t3

  For the above code, draw the datapath with forwarding

Computer Architecture & Operating Systems (725G84 - HT16)

Page 28: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Load-Use Data Hazard   Can’t always avoid stalls by forwarding

  If value not computed when needed

Computer Architecture & Operating Systems (725G84 - HT16)

Page 29: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Code Scheduling to Avoid Stalls

  Reorder code to avoid use of load result in the next instruction

  C code for A = B + E; C = B + F;

Computer Architecture & Operating Systems (725G84 - HT16)

lw $t1, 0($t0)

lw $t2, 4($t0)

add $t3, $t1, $t2

sw $t3, 12($t0)

lw $t4, 8($t0)

add $t5, $t1, $t4

sw $t5, 16($t0)

13 cycles

Page 30: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

lw $t1, 0($t0)

lw $t2, 4($t0)

add $t3, $t1, $t2

sw $t3, 12($t0)

lw $t4, 8($t0)

add $t5, $t1, $t4

sw $t5, 16($t0)

stall

stall

13 cycles

Computer Architecture & Operating Systems (725G84 - HT16)

Code Scheduling to Avoid Stalls

  Reorder code to avoid use of load result in the next instruction

  C code for A = B + E; C = B + F;

Page 31: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

lw $t1, 0($t0)

lw $t2, 4($t0)

add $t3, $t1, $t2

sw $t3, 12($t0)

lw $t4, 8($t0)

add $t5, $t1, $t4

sw $t5, 16($t0)

stall

stall

lw $t1, 0($t0)

lw $t2, 4($t0)

lw $t4, 8($t0)

add $t3, $t1, $t2

sw $t3, 12($t0)

add $t5, $t1, $t4

sw $t5, 16($t0)

11 cycles 13 cycles

Code Scheduling to Avoid Stalls

  Reorder code to avoid use of load result in the next instruction

  C code for A = B + E; C = B + F;

Computer Architecture & Operating Systems (725G84 - HT16)

Page 32: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Control Hazards   When the proper instruction cannot execute

in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed

Computer Architecture & Operating Systems (725G84 - HT16)

Page 33: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Control Hazards   Branch determines flow of control

  Fetching next instruction depends on branch outcome

  Naïve/Simple Solution: Stall if we fetch a branch instruction

Computer Architecture & Operating Systems (725G84 - HT16)

Page 34: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Stalling for Branch Hazards

Computer Architecture & Operating Systems (725G84 - HT16)

beq $4, $0, there

and $12, $2, $5

or ...

add ...

sw ...

IM Reg DM WB

IM Reg

IM Reg DM

IM Reg DM WB

IM Reg DM WB

CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8

Bubble Bubble Bubble

Page 35: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Stalling for Branch Hazards

  Seems wasteful, particularly when the branch isn’t taken.

  What if we predict?

Computer Architecture & Operating Systems (725G84 - HT16)

Page 36: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Branch prediction   Correct branch prediction is very important

and can produce substantial performance improvements.

  Speculative execution means that instructions are executed before the processor is certain that they are in the correct execution path.

  Branch prediction strategies:   Static prediction   Dynamic prediction

Computer Architecture & Operating Systems (725G84 - HT16)

Page 37: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Branch prediction   Correct branch prediction is very important

and can produce substantial performance improvements.

  Speculative execution means that instructions are executed before the processor is certain that they are in the correct execution path.

  Branch prediction strategies:   Static prediction   Dynamic prediction

Computer Architecture & Operating Systems (725G84 - HT16)

If it turns out that the prediction was correct, execution goes on without introducing any branch penalty. If, however, the prediction is not fulfilled, the instruction(s) started in advance and all their associated data must be purged and the state previous to their execution restored.

Page 38: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Static branch prediction   Static prediction techniques do not take into

consideration execution history.

  Static approaches:   Predict never taken: assumes that the branch

is not taken.   Predict always taken: assumes that the

branch is taken.

Computer Architecture & Operating Systems (725G84 - HT16)

Page 39: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Assume Branch Not Taken

  works pretty well when you’re right

Computer Architecture & Operating Systems (725G84 - HT16)

beq $4, $0, there

and $12, $2, $5

or ...

add ...

sw ...

IM Reg DM WB

IM Reg

IM Reg DM

IM Reg DM WB

IM Reg DM WB

CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8

Page 40: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Assume Branch Not Taken

  same performance as stalling when you’re wrong

Computer Architecture & Operating Systems (725G84 - HT16)

beq $4, $0, there

and $12, $2, $5

or ...

add ...

there: sub $12, $4, $2

IM Reg

IM Reg

IM

IM Reg

IM Reg DM Reg

CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8

Flush

Flush

Flush

Wrong Path

Page 41: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Dynamic Branch Prediction

  Dynamic approach : predict based on recent history

  E.g, assume that the branch will do what it did the last time

Computer Architecture & Operating Systems (725G84 - HT16)

Page 42: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

One Bit Predictor   One-bit is used in order to record the last

execution

  The system predicts the same behavior as for the last time.

  - - - - - - - - - -

  LOOP: - - - - - - - - - - -

  - - - - - - - - - - -

  BNE R1, R2, LOOP

  - - - - - - - - - - -

Computer Architecture & Operating Systems (725G84 - HT16)

Page 43: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

1-Bit Predictor: Shortcoming

  Inner loop branches mispredicted twice!

Computer Architecture & Operating Systems (725G84 - HT16)

outer: … … inner: … … beq …, …, inner … beq …, …, outer

  Mispredict as taken on last iteration of inner loop

  Then mispredict as not taken on first iteration of inner loop next time around

Page 44: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

2-Bit Predictor   Only change prediction on two successive

mispredictions

Computer Architecture & Operating Systems (725G84 - HT16)

Page 45: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Branch History Table

Computer Architecture & Operating Systems (725G84 - HT16)

Page 46: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

MIPS with Prediction Schemes

Computer Architecture & Operating Systems (725G84 - HT16)

Prediction correct

Prediction incorrect

No pipeline stalls

Pipeline stalls and must be flushed. The execution has to be restarted.

Page 47: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Recap: Hazards   Situations that prevent starting the next

instruction in the next cycle

  Structural hazards   A required resource is busy

  Data hazards   Need to wait for previous instruction to complete

its data read/write

  Control hazards   Deciding on control action depends on previous

instruction

Computer Architecture & Operating Systems (725G84 - HT16)

Page 48: VII. Process Pipelining725G84/include/725G84-7-pipeline.pdf · Branch prediction strategies: Static prediction Dynamic prediction Computer Architecture & Operating Systems (725G84

Pipeline Summary

Computer Architecture & Operating Systems (725G84 - HT16)

!  Pipelining improves performance by increasing instruction throughput !  Executes multiple instructions in parallel

!  Each instruction has the same latency

!  Subject to hazards !  Structure, data, control

!  Instruction set design affects complexity of pipeline implementation (p.335)