CPE 432 Computer Design 6 – ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of California, Berkeley

Page 1: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

CPE 432 Computer Design

6 – ILP: Part II – Branch Prediction

Dr. Gheith Abandah

Adapted from the slides of Prof. David Patterson, University of California, Berkeley

Page 2: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

04/22/23 CPE 432, 6-ILP2 2

Outline
• Static Branch Prediction
• Dynamic Branch Prediction
• Branch History Table
• Correlated Branch Prediction
• Tournament Predictors
• Branch Target Buffer

Page 3: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Static Branch Prediction
• We learned how to schedule code around a delayed branch
• To reorder code around branches, the compiler needs to predict branches statically, at compile time
• Simplest scheme is to predict a branch as taken
    – Average misprediction = untaken branch frequency = 34% for SPEC
• A more accurate scheme predicts branches using profile information collected from earlier runs, and modifies the prediction based on the last run:

[Figure: misprediction rate (0% to 25%) of profile-based prediction]
Integer: compress 12%, eqntott 22%, espresso 18%, gcc 11%, li 12%
Floating Point: doduc 4%, ear 6%, hydro2d 9%, mdljdp 10%, su2cor 15%

Page 4: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Dynamic Branch Prediction
• Why does prediction work?
    – Underlying algorithm has regularities
    – Data that is being operated on has regularities
    – Instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems
• Is dynamic branch prediction better than static branch prediction?
    – Seems to be
    – There are a small number of important branches in programs which have dynamic behavior
• Performance = ƒ(accuracy, cost of misprediction)
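
As a rough illustration of the last bullet, one common first-order model charges a fixed penalty for every mispredicted branch. This model is an assumption here, not something taken from the slides, and the function name and sample numbers are hypothetical.

```python
def branch_stall_cpi(branch_fraction, misprediction_rate, penalty_cycles):
    """Extra cycles per instruction lost to mispredicted branches.

    Assumed first-order model: every misprediction costs the same
    fixed number of cycles, and nothing else overlaps with it.
    """
    return branch_fraction * misprediction_rate * penalty_cycles


# Hypothetical workload: 20% branches, 10% of them mispredicted,
# 10-cycle misprediction penalty.
print(branch_stall_cpi(0.20, 0.10, 10))
```

Under this model, halving either the misprediction rate or the penalty halves the stall cycles, which is why both accuracy and misprediction cost matter.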

Page 5: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Branch History Table
• Use the lower k bits of the PC to index the table

[Diagram: the low k bits of the branch address index a Branch History Table (BHT) of 2^k entries, each n bits wide; the selected entry predicts taken or not taken]
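
The indexing step can be sketched as below. The name `bht_index` and the `offset_bits` parameter are hypothetical; dropping two low bits assumes 4-byte-aligned instructions, which the slide does not state.

```python
def bht_index(pc, k, offset_bits=2):
    """Return the BHT entry selected by a branch at address pc.

    Assumption: instructions are 4-byte aligned, so the lowest
    offset_bits bits carry no information and are skipped before
    taking the next k bits.
    """
    return (pc >> offset_bits) & ((1 << k) - 1)


# Two different branches whose addresses agree in the indexed bits
# share one entry, so each can see the other's history.
print(bht_index(0x1000, 4), bht_index(0x1040, 4))  # 0 0
print(bht_index(0x1004, 4))                        # 1
```

Because only k bits are used and no tag is stored, the table cannot tell such branches apart.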

Page 6: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

BHT, n=1
• Problem: in a loop, a 1-bit BHT will cause two mispredictions (avg is 9 iterations before exit):
    – End of loop case, when it exits instead of looping as before
    – First time through the loop on the next time through the code, when it predicts exit instead of looping

[State diagram: two states. State 0 (predict not taken) stays at 0 on not taken and moves to 1 on taken; state 1 (predict taken) stays at 1 on taken and moves to 0 on not taken]
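
The two mispredictions per loop execution can be checked with a small simulation. This is a minimal sketch: the function name, the trace, and the initial state are assumptions made for illustration.

```python
def one_bit_mispredictions(outcomes, state=True):
    """Count mispredictions of a 1-bit predictor.

    outcomes: branch outcomes in order (True = taken).
    state: initial prediction (assumed to start at "taken").
    """
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken  # 1-bit scheme: remember only the last outcome
    return misses


# A loop branch that is taken 9 times and then falls through.
loop = [True] * 9 + [False]
print(one_bit_mispredictions(loop * 2))  # 3
print(one_bit_mispredictions(loop * 3))  # 5
```

After the first pass, every further execution of the loop adds two mispredictions: one at the exit and one at the first iteration of the next pass.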

Page 7: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

BHT, n=2
• Solution: 2-bit scheme that changes the prediction only after it mispredicts twice
• Red: stop, not taken
• Green: go, taken
• Adds hysteresis to the decision-making process

[State diagram: four states, two that predict taken and two that predict not taken, connected by T (taken) and NT (not taken) transitions]
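
One common realization of a 2-bit scheme is a saturating counter, sketched below. The exact transitions of the slide's state diagram are not recoverable from this transcript, so the counter encoding (0 and 1 predict not taken, 2 and 3 predict taken) and the initial value are assumptions.

```python
def two_bit_mispredictions(outcomes, counter=3):
    """Count mispredictions of a 2-bit saturating counter.

    counter: 0..3; values 2 and 3 predict taken (assumed encoding).
    outcomes: branch outcomes in order (True = taken).
    """
    misses = 0
    for taken in outcomes:
        predict_taken = counter >= 2
        if predict_taken != taken:
            misses += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# A loop branch that is taken 9 times and then falls through.
loop = [True] * 9 + [False]
print(two_bit_mispredictions(loop * 2))  # 2
print(two_bit_mispredictions(loop * 3))  # 3
```

With this trace the sketch mispredicts only the loop exit on each pass, because a single not-taken outcome is not enough to flip the prediction.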

Page 8: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

BHT Accuracy
• Mispredict because either:
    – Wrong guess for that branch
    – Got the branch history of the wrong branch when indexing the table
• 4096-entry table:

[Figure: misprediction rate (0% to 20%) with a 4096-entry BHT]
Integer: eqntott 18%, espresso 5%, gcc 12%, li 10%
Floating Point: spice 9%, doduc 5%, spice 9% (label appears twice in the chart), fpppp 9%, matrix300 0%, nasa7 1%


Page 10: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Correlated Branch Prediction
• Aka Two-Level Predictors
• Motivation

1. if (a==2) a=0;
2. if (b==2) b=0;
3. if (a != b) {…}

Can predict that Condition (3) is not true if Conditions (1) and (2) are true.

Page 11: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Correlated Branch Prediction
• Idea: record the m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
• In general, an (m,n) predictor records the last m branches to select between 2^m history tables, each with n-bit counters
    – Thus, the old 2-bit BHT is a (0,2) predictor
• Global Branch History: m-bit shift register keeping the T/NT status of the last m branches.
• Each entry in the table has 2^m n-bit predictors.

Page 12: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Correlating Branch Prediction

(2,2) predictor
    – Behavior of recent branches selects between four predictions of the next branch, updating just that prediction

[Diagram labels: branch address (4), 2-bits per branch predictor, 2-bit global branch history, prediction]
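
A minimal sketch of an (m,n) correlating predictor follows. The class and method names, the use of saturating counters, the all-zero initial state, and indexing by the low k bits of the PC are assumptions made for illustration.

```python
class CorrelatingPredictor:
    """(m, n) predictor: 2**k rows, each with 2**m n-bit saturating counters."""

    def __init__(self, m, n, k):
        self.m, self.k = m, k
        self.max_count = (1 << n) - 1
        self.history = 0  # m-bit global shift register of recent outcomes
        self.table = [[0] * (1 << m) for _ in range(1 << k)]

    def _row(self, pc):
        return self.table[pc & ((1 << self.k) - 1)]

    def predict(self, pc):
        # The global history picks one counter in the row chosen by the PC.
        return self._row(pc)[self.history] > self.max_count // 2

    def update(self, pc, taken):
        row = self._row(pc)
        c = row[self.history]
        # Update just the counter that made the prediction.
        row[self.history] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_misses(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical branch that strictly alternates taken / not taken.
alternating = [True, False] * 50
print(count_misses(CorrelatingPredictor(0, 2, 10), 0, alternating))  # 50
print(count_misses(CorrelatingPredictor(2, 2, 10), 0, alternating))  # 3
```

In this toy trace the (0,2) configuration never learns the alternation, while the (2,2) configuration mispredicts only during warm-up, since the history bits separate the "last was taken" and "last was not taken" cases.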

Page 13: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Correlated Branch Prediction
Example: Find the size of a (2,2) predictor with k=10

Size = 2^m * n * 2^k
     = 4 * 2 * 1K = 8 Kbits
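
The size formula is easy to check numerically. The helper name `predictor_bits` is hypothetical.

```python
def predictor_bits(m, n, k):
    """Storage of an (m, n) predictor with 2**k entries: 2^m * n * 2^k bits."""
    return (2 ** m) * n * (2 ** k)


print(predictor_bits(2, 2, 10))  # 8192 bits = 8 Kbits
print(predictor_bits(0, 2, 12))  # 8192 bits: a 4096-entry 2-bit BHT
```

A (2,2) predictor with 1024 entries therefore uses the same number of bits as a plain 2-bit BHT with 4096 entries.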

Page 14: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Accuracy of Different Schemes

[Figure: frequency of mispredictions (0% to 20%) for three schemes: 4,096 entries with 2 bits per entry, unlimited entries with 2 bits per entry, and 1,024 entries (2,2). Benchmarks: nasa7, matrix300, doduc, spice, fpppp, gcc, espresso, eqntott, li, tomcatv]


Page 16: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction


Tournament Predictors

• Multilevel branch predictor

• Use n-bit saturating counter to choose between predictors

• Usual choice between global and local predictors

Page 17: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Tournament Predictors

Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between:

• Global predictor
    – 4K entries indexed by the history of the last 12 branches (2^12 = 4K)
    – Each entry is a standard 2-bit predictor
• Local predictor
    – Local history table: 1024 10-bit entries recording the last 10 branches, indexed by branch address
    – The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters
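
The structure described above can be sketched as follows. The table sizes come from the slide; the class and method names, the initial counter values, and the rule of training the chooser only when the two predictors disagree are assumptions, not details given in the source.

```python
def bump(value, up, max_value):
    """Saturating increment (up=True) or decrement (up=False)."""
    return min(value + 1, max_value) if up else max(value - 1, 0)


class TournamentPredictor:
    def __init__(self):
        self.global_history = 0          # outcomes of the last 12 branches
        self.global_table = [1] * 4096   # 2-bit counters, indexed by global history
        self.local_history = [0] * 1024  # 10-bit history per branch address
        self.local_table = [3] * 1024    # 3-bit counters, indexed by local history
        self.chooser = [1] * 4096        # 2-bit counters; >= 2 means trust global

    def _both(self, pc):
        g = self.global_table[self.global_history] >= 2
        l = self.local_table[self.local_history[pc % 1024]] >= 4
        return g, l

    def predict(self, pc):
        g, l = self._both(pc)
        return g if self.chooser[pc % 4096] >= 2 else l

    def update(self, pc, taken):
        g, l = self._both(pc)
        if g != l:
            # Assumed policy: move the chooser toward whichever was right.
            self.chooser[pc % 4096] = bump(self.chooser[pc % 4096], g == taken, 3)
        gh = self.global_history
        self.global_table[gh] = bump(self.global_table[gh], taken, 3)
        lh = self.local_history[pc % 1024]
        self.local_table[lh] = bump(self.local_table[lh], taken, 7)
        # Shift the new outcome into both history registers.
        self.local_history[pc % 1024] = ((lh << 1) | int(taken)) & 0x3FF
        self.global_history = ((gh << 1) | int(taken)) & 0xFFF


# Hypothetical branch at address 0x40 that is always taken.
p = TournamentPredictor()
for _ in range(50):
    p.update(0x40, True)
print(p.predict(0x40))  # True
```

The chooser counters play the role of the selector: each one tracks, per branch address, which of the two component predictors has been more reliable.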

Page 18: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

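Alpha 21264 Branch Predictor

[Diagram: three structures feed the final prediction.
 Selector: 4K entries of 2 bits, indexed by branch address (k=12)
 Global: 4K entries of 2 bits, indexed by the 12-bit branch history register (BHR)
 Local: 1K entries of 10 bits indexed by branch address, feeding 1K entries of 3 bits]

Size = 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29 Kbits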

Page 19: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Comparing Predictors (Fig. 2.8)
• Advantage of a tournament predictor is the ability to select the right predictor for a particular branch
    – Particularly crucial for integer benchmarks.
    – A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks


Page 21: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Branch Target Buffers (BTB)
• Branch target calculation is costly and stalls the instruction fetch.
• BTB stores PCs the same way as caches
• The PC of a branch is sent to the BTB
• When a match is found, the corresponding Predicted PC is returned
• If the branch was predicted taken, instruction fetch continues at the returned predicted PC
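
The lookup described in these bullets can be sketched as a small direct-mapped, tagged table. The class name, the direct-mapped organization, the table size, and the 4-byte instruction alignment are assumptions; the slides only say that the BTB stores PCs the way a cache does.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each entry holds (branch PC, predicted target PC)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.entries = [None] * (1 << index_bits)

    def _index(self, pc):
        return (pc >> 2) & self.mask  # assumes 4-byte-aligned instructions

    def lookup(self, pc):
        """Return the predicted PC on a match, or None on a miss."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:  # tag check, as in a cache
            return entry[1]
        return None

    def record_taken(self, pc, target):
        """Remember the target of a branch that was taken."""
        self.entries[self._index(pc)] = (pc, target)


btb = BranchTargetBuffer()
btb.record_taken(0x400, 0x480)
print(hex(btb.lookup(0x400)))  # 0x480
print(btb.lookup(0x440))       # None: same slot, but the stored PC differs
```

Unlike the untagged BHT, the stored PC acts as a tag, so a different branch that maps to the same slot gets a miss instead of a wrong target.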

Page 22: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Branch Target Buffers

Page 23: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Pentium 4 Misprediction Rate (per 1000 instructions, not per branch)

[Figure: branch mispredictions per 1000 instructions (0 to 14)]
SPECint2000: 164.gzip 11, 175.vpr 13, 176.gcc 7, 181.mcf 12, 186.crafty 9
SPECfp2000: 168.wupwise 1, 171.swim 0, 172.mgrid 0, 173.applu 0, 177.mesa 5

• 6% misprediction rate per branch for SPECint (19% of INT instructions are branches)
• 2% misprediction rate per branch for SPECfp (5% of FP instructions are branches)
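
The per-branch rates and the per-1000-instruction figures are related by the branch frequency. The short sketch below makes that conversion explicit; the function name is hypothetical, and the inputs are the percentages quoted on this slide.

```python
def mispredictions_per_1000(branch_fraction, per_branch_rate):
    """Mispredictions per 1000 instructions from per-branch statistics."""
    return 1000 * branch_fraction * per_branch_rate


# SPECint: 19% branches, 6% of them mispredicted.
print(round(mispredictions_per_1000(0.19, 0.06), 1))  # 11.4
# SPECfp: 5% branches, 2% of them mispredicted.
print(round(mispredictions_per_1000(0.05, 0.02), 1))  # 1.0
```

These results sit in the same range as the chart's values, roughly 7 to 13 for the integer benchmarks and 0 to 5 for the floating-point ones.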

Page 24: CPE 432 Computer Design  6 – ILP: Part II – Branch Prediction

Dynamic Branch Prediction Summary
• Prediction is becoming an important part of execution
• Branch History Table: 2 bits for loop accuracy
• Correlation: recently executed branches are correlated with the next branch
    – Either different branches (GA)
    – Or different executions of the same branch (PA)
• Tournament predictors take the insight to the next level by using multiple predictors, usually one based on global information and one based on local information, and combining them with a selector
    – In 2006, tournament predictors using 30K bits are in processors like the Power5 and Pentium 4
• Branch Target Buffer: includes branch address & prediction