ilp: control flowbojnordi/classes/6810/f19/slides/09-ilp.pdf¤only one entry and one exit point per...

36
ILP: CONTROL FLOW CS/ECE 6810: Computer Architecture Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah

Upload: others

Post on 06-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

ILP: CONTROL FLOW

CS/ECE 6810: Computer Architecture

Mahdi Nazm Bojnordi

Assistant Professor

School of Computing

University of Utah

Page 2: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Overview

¨ Announcement¤ Homework 2 is due tonight (11:59PM)

¨ This lecture¤ Performance bottleneck¤ Program flow¤ Branch instructions¤ Branch prediction

Page 3: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Performance Bottleneck

¨ Key performance limitation¤ Number of instructions fetched per second is limited

Page 4: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Performance Bottleneck

¨ Key performance limitation¤ Number of instructions fetched per second is limited

¨ How to increase fetch performance?

Page 5: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Performance Bottleneck

¨ Key performance limitation¤ Number of instructions fetched per second is limited

¨ How to increase fetch performance?¤ Wider fetch (multiple pipelines)

Page 6: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Performance Bottleneck

¨ Key performance limitation¤ Number of instructions fetched per second is limited

¨ How to increase fetch performance?¤ Wider fetch (multiple pipelines)¤ Deeper fetch (multiple stages)

Page 7: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Performance Bottleneck

¨ Key performance limitation¤ Number of instructions fetched per second is limited

How to handle branches?

¨ How to increase fetch performance?¤ Wider fetch (multiple pipelines)¤ Deeper fetch (multiple stages)

Page 8: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Impact of Branches

¨ Example C code¤ No structural hazards¤ What is fetch rate (IPS)?

do {sum = sum + i;i = i – 1;

} while(i != j);

Page 9: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Impact of Branches

¨ Example C code¤ No structural hazards¤ What is fetch rate (IPS)?

do {sum = sum + i;i = i – 1;

} while(i != j);

Loop: ADD R1, R1, R2ADDI R2, R2, #-1BNEQ R2, R0, Loopstall

Assembly code:¨ Five-stage pipeline

¤ Cycle time = 10ns

Fetch Decode Execute Memory Writeback

Page 10: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Impact of Branches

¨ Example C code¤ No structural hazards¤ What is fetch rate (IPS)?

do {sum = sum + i;i = i – 1;

} while(i != j);

Loop: ADD R1, R1, R2ADDI R2, R2, #-1BNEQ R2, R0, Loopstallstallstall

Assembly code:¨ Ten-stage pipeline

¤ Cycle time = 5ns

Fetch Decode Execute Memory Writeback

Page 11: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Program Flow

¨ A program contains basic blocks¤ Only one entry and one exit point per basic block

…branch

branch

…jump

Page 12: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Program Flow

¨ A program contains basic blocks¤ Only one entry and one exit point per basic block

…branch

branch

…jump

¨ Branches¤ Conditional vs. unconditional

n How to check conditionsn Jumps, calls, and returns

¤ Target addressn Absolute addressn Relative to the program counter

Page 13: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Instructions

¨ Branch penalty due to unknown outcome¤ Direction and target

¨ How to reduce penalty

Inst.Memory

PC +

4

NPC

Inst

ruct

ion

target

clk

clkdirection

Page 14: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Instructions

¨ Branch penalty due to unknown outcome¤ Direction and target

¨ How to reduce penalty

Can we predict what instruction to be fetched? Inst.

Memory

PC +

4

NPC

Inst

ruct

ion

target

clk

clkdirection

Page 15: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction

¨ How to predict the outcome of a branch¤ Profiling the entire program¤ Predict based on common cases

Page 16: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction

¨ How to predict the outcome of a branch¤ Profiling the entire program¤ Predict based on common cases

i = 10000;do {

r = i%4;if(r != 0) {

sum = sum + i;}i = i – 1;

} while(i != 0);

Example C/C++ code:

How many branches?

Page 17: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction

¨ How to predict the outcome of a branch¤ Profiling the entire program¤ Predict based on common cases

i = 10000;do {

r = i%4;if(r != 0) {

sum = sum + i;}i = i – 1;

} while(i != 0);

Example C/C++ code:

How many branches?

=>

=>

Page 18: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction

¨ How to predict the outcome of a branch¤ Profiling the entire program¤ Predict based on common cases

ADDI R1, R0, #10000do:

ANDI R2, R1, #3BEQ R2, R0, skpADD R3, R3, R1

skp: ADDI R1, R1, #-1BNEQ R1, R0, do

Assembly code:

Page 19: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction

¨ How to predict the outcome of a branch¤ Profiling the entire program¤ Predict based on common cases

ADDI R1, R0, #10000do:

ANDI R2, R1, #3BEQ R2, R0, skpADD R3, R3, R1

skp: ADDI R1, R1, #-1BNEQ R1, R0, do

branch-1

branch-2

TAKEN NOT-TAKEN

Assembly code:

Page 20: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction

¨ How to predict the outcome of a branch¤ Profiling the entire program¤ Predict based on common cases

ADDI R1, R0, #10000do:

ANDI R2, R1, #3BEQ R2, R0, skpADD R3, R3, R1

skp: ADDI R1, R1, #-1BNEQ R1, R0, do

branch-1

branch-2

TAKEN NOT-TAKEN

Assembly code:

9999

2500 7500

1

Page 21: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction

¨ The goal of branch prediction¤ To avoid stall cycles in fetch stage

¨ Types¤ Static prediction (based on direction or profile)

n Always not-takenn Target = next PC

n Always takenn Target = unknown

¤ Dynamic predictionn Special hardware using PC

Page 22: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction

¨ The goal of branch prediction¤ To avoid stall cycles in fetch stage

¨ Types¤ Static prediction (based on direction or profile)

n Always not-takenn Target = next PC

n Always takenn Target = unknown

¤ Dynamic predictionn Special hardware using PC

Which ones are influenceda. Performanceb. Energyc. Power

Page 23: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction/Misprediction

¨ Prediction accuracy?¤ A: always not-taken

¤ B: always taken

i = 100;do {

sum = sum + i;i = i – 1;

} while(i != 0);

Page 24: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Branch Prediction/Misprediction

¨ Prediction accuracy?¤ A: always not-taken

¤ B: always taken

i = 100;do {

sum = sum + i;i = i – 1;

} while(i != 0);0.01

0.99

Page 25: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Problem

¨ Compute IPC of a scalar processor when there are¤ no data/structural hazards, only control hazards,¤ every 5th instruction is a branch, and¤ 90% branch prediction accuracy

Page 26: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Problem

¨ Compute IPC of a scalar processor when there are¤ no data/structural hazards, only control hazards,¤ every 5th instruction is a branch, and¤ 90% branch prediction accuracy

¨ IPC = 1/ (1 + stalls per instruction)¨ = 1/(1 + 0.2x0.1x1) = 0.98

Page 27: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Dynamic Branch Prediction

¨ Hardware unit capable of learning at runtime¤ 1. Prediction logic

n Direction (taken or not-taken)n Target address (where to fetch next)

¤ 2. Outcome validation and trainingn Outcome is computed regardless of prediction

¤ 3. Recovery from mispredictionn Nullify the effect of instructions on the wrong path

Page 28: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Simple Dynamic Predictors

¨ One-bit branch predictor¤ Keep track of and use the outcome of last executed

branch

¨ Prediction accuracy

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

Page 29: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Simple Dynamic Predictors

¨ One-bit branch predictor¤ Keep track of and use the outcome of last executed

branch

¨ Prediction accuracy

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

while:ADDI R3, R0, #10JMP chk1

for1: …chk1: BNQ R1, R3, for1

ADDI R3, R0, #20JMP chk2

for2: …chk2: BNQ R2, R3, for2

JMP while** Loop implementation suggested by an student **

Page 30: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Simple Dynamic Predictors

¨ One-bit branch predictor¤ Keep track of and use the outcome of last executed

branch

¨ Prediction accuracy

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

N T

taken

takennot-taken

not-taken

Page 31: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Simple Dynamic Predictors

¨ One-bit branch predictor¤ Keep track of and use the outcome of last executed

branch

¨ Prediction accuracy

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

N T

taken

takennot-taken

not-taken

• A single predictor shared by multiple branches

• Two mispredictions for loops (1 entry and 1 exit)

Page 32: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Bimodal Branch Predictors

¨ One-bit branch predictor¤ Keep track of and use the outcome of last executed

branch

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

N T

taken

takennot-taken

not-taken

Accuracy = 26/30 = 0.86

¨ Shared predictor¨ Two mispredictions per loop

How to improve?

Page 33: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Bimodal Branch Predictors

¨ Two-bit branch predictor¤ Increment if taken¤ Decrement if untaken

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

Page 34: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Bimodal Branch Predictors

¨ Two-bit branch predictor¤ Increment if taken¤ Decrement if untaken

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

01 10

taken

takennot-taken

not-taken

00 11

Page 35: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Bimodal Branch Predictors

¨ Two-bit branch predictor¤ Increment if taken¤ Decrement if untaken

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

01 10

taken

takennot-taken

not-taken

00 11• One misprediction on loop exit

• Accuracy = 28/30 = 0.93

Page 36: ILP: CONTROL FLOWbojnordi/classes/6810/f19/slides/09-ilp.pdf¤Only one entry and one exit point per basic block ... c. Power. Branch Prediction/Misprediction ... 09-ilp Created Date:

Bimodal Branch Predictors

¨ Two-bit branch predictor¤ Increment if taken¤ Decrement if untaken

while(1) {for(i=0; i<10; i++) {}for(j=0; j<20; j++) {}

}

branch-1

branch-2

01 10

taken

takennot-taken

not-taken

00 11• One misprediction on loop exit

• Accuracy = 28/30 = 0.93

• How to improve?• 3-bit predictor?

• Problem?• A single predictor shared

by many branches