cs222: pipeline: branch performance · 2017. 4. 12. · pipeline: branch performance &...


CS222: Pipeline: Branch Performance

& Superscalar/VLIW

Dr. A. Sahu

Dept of Comp. Sc. & Engg.

Indian Institute of Technology Guwahati

Outline

• Improving Branch Performance
  – Previous class: Branch Elimination, Branch Speed Up
  – Branch Prediction
    • Fixed, Static, Dynamic
  – Branch Target Capture
    • BTB, BTAC, BTIC

• Introduction to VLIW and Superscalar

Improving Branch Performance

• Branch Elimination – replace the branch with other instructions

• Branch Speed Up – reduce the time for computing the condition code (CC) and for the target instruction fetch (TIF)

• Branch Prediction – guess the outcome and proceed, undo if necessary

• Branch Target Capture – make use of history

Branch Elimination

Use conditional instructions (predicated execution): a test C that branches around a statement S is replaced by the guarded form C : S.

With a branch:
  OP1
  BC   CC = Z, * + 2
  ADD  R3, R2, R1
  OP2

With predicated execution:
  OP1
  ADD  R3, R2, R1, NZ
  OP2
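To check that the two sequences really do the same work, here is a minimal Python sketch of a toy interpreter. The instruction tuples are hypothetical, not a real ISA. It assumes `* + 2` means "two instructions ahead" (so the branch skips exactly the ADD), that the first ADD operand is the destination, and that OP1/OP2 are no-ops.

```python
def run(program, regs, z_flag):
    """Execute a toy instruction list and return the final registers.

    Hypothetical encoding (not a real ISA):
      ("BC_Z_SKIP",)             branch to * + 2 (skip next instr) if CC = Z
      ("ADD", dst, a, b)         dst = a + b
      ("ADD", dst, a, b, "NZ")   predicated: executes only when CC != Z
      anything else              treated as a no-op (OP1, OP2)
    """
    regs = dict(regs)  # copy so the caller's registers are untouched
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "BC_Z_SKIP":
            pc += 2 if z_flag else 1
            continue
        if instr[0] == "ADD":
            dst, a, b = instr[1:4]
            predicate = instr[4] if len(instr) > 4 else None
            if predicate is None or (predicate == "NZ" and not z_flag):
                regs[dst] = regs[a] + regs[b]
        pc += 1
    return regs


with_branch = [("OP1",), ("BC_Z_SKIP",), ("ADD", "R3", "R2", "R1"), ("OP2",)]
predicated = [("OP1",), ("ADD", "R3", "R2", "R1", "NZ"), ("OP2",)]

start = {"R1": 5, "R2": 7, "R3": 0}
for z in (True, False):
    assert run(with_branch, start, z) == run(predicated, start, z)
    print(z, run(predicated, start, z)["R3"])  # prints "True 0", then "False 12"
```

The predicated version has no control transfer at all; the ADD always flows down the pipeline and simply has no effect when CC = Z.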

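Branch Speed Up: early target address generation

• Assume each instruction is a branch

• Generate the target address while decoding

• If the target is in the same page, omit address translation

• After decoding, discard the target address if the instruction is not a branch

[Timing diagram: branch BC with target address generation (AG) overlapped with decode (D), so that the target instruction fetch (TIF) can follow immediately.]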
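The four bullets can be sketched as follows, under loudly hypothetical assumptions: fixed 32-bit instructions, a 6-bit opcode in the top bits, a 16-bit signed word offset in the low bits relative to the next instruction, and 4 KiB pages. `BRANCH_OPCODES`, `speculative_target` and `after_decode` are illustrative names, not part of any real design.

```python
PAGE_SIZE = 4096          # hypothetical page size in bytes
INSTR_SIZE = 4            # hypothetical fixed-length 32-bit instructions
BRANCH_OPCODES = {0x04}   # hypothetical opcode(s) of conditional branches


def speculative_target(pc, word):
    """Run for EVERY instruction during decode, as if it were a branch."""
    offset = word & 0xFFFF
    if offset & 0x8000:            # sign-extend the 16-bit offset field
        offset -= 0x10000
    target = pc + INSTR_SIZE + offset * INSTR_SIZE
    same_page = target // PAGE_SIZE == pc // PAGE_SIZE
    return target, same_page


def after_decode(pc, word):
    """Keep the early target only if decode shows the instruction is a branch."""
    target, same_page = speculative_target(pc, word)
    if (word >> 26) not in BRANCH_OPCODES:
        return None                # not a branch: discard the target address
    # same_page=True: the translation already done for pc can be reused
    return target, same_page


branch_fwd = (0x04 << 26) | 0x0003   # offset +3 instructions
branch_back = (0x04 << 26) | 0xFFFE  # offset -2 instructions
not_branch = (0x01 << 26) | 0x0003

print(after_decode(0x1000, branch_fwd))   # (4112, True)
print(after_decode(0x1000, branch_back))  # (4092, False): crosses into the previous page
print(after_decode(0x1000, not_branch))   # None
```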

Branch Speed Up: increase CC‐branch gap

Increase the gap between condition checking and branching:

• Early CC setting

• Delayed branch


Branch Prediction

• Treat conditional branches as unconditional branches (guess taken) or as NOPs (guess not taken)

• Undo if necessary

Strategies:

– Fixed (always guess inline, or always guess target)

– Static (guess on the basis of instruction type)

– Dynamic (guess based on recent history)

Static Branch Prediction

Instr type   % of branches   Guess taken   Actually taken   Correct (% of all branches)
uncond           14.5          always          100%             14.5%
cond             58            never            54%             27%
loop              9.8          always           91%              9%
call/ret         17.7          always          100%             17.7%
Total                                                           68.2%
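The "Correct" column is the branch frequency times the fraction of that type the static guess gets right (the taken fraction for "always", one minus it for "never"). A short Python sketch of that arithmetic, using only the table's numbers:

```python
# (instruction type, % of branches, static guess is "taken", fraction actually taken)
STATIC_TABLE = [
    ("uncond", 14.5, True, 1.00),
    ("cond", 58.0, False, 0.54),
    ("loop", 9.8, True, 0.91),
    ("call/ret", 17.7, True, 1.00),
]


def correct_percent(freq, guess_taken, taken_frac):
    """Share of all branches (in %) that this row predicts correctly."""
    hit_rate = taken_frac if guess_taken else 1.0 - taken_frac
    return freq * hit_rate


total = 0.0
for name, freq, guess_taken, taken_frac in STATIC_TABLE:
    c = correct_percent(freq, guess_taken, taken_frac)
    total += c
    print(f"{name:8s} {c:5.2f}%")
print(f"total    {total:5.2f}%")
```

Without rounding each row first the total is about 67.8% (the cond row gives 26.68% and the loop row 8.92%); the table's 68.2% is the sum of the rounded per-row figures.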

Branch Prediction: guess inline, go inline

[Timing diagram: I‐1 (sets CC): IF IF D AG AG DF DF EX EX; branch I: IF IF D AG AG TIF TIF; I+1 and I+2 are fetched and decoded inline right behind the branch.]

delay = 0

Branch Prediction: guess inline, go to target

[Timing diagram: I‐1 (sets CC): IF IF D AG AG DF DF EX EX; branch I: IF IF D AG AG TIF TIF; the inline instructions are started (D′), then the target instructions T and T+1 are fetched and decoded (IF IF D) once the branch resolves.]

delay = 6

Branch Prediction: guess target, go inline

[Timing diagram: I‐1 (sets CC): IF IF D AG AG DF DF EX EX; branch I: IF IF D AG AG TIF TIF; target T is fetched, and I+1 and I+2 are decoded (D′ D) once the branch resolves inline.]

delay = 5

Branch Prediction: guess target, go to target

[Timing diagram: I‐1 (sets CC): IF IF D AG AG DF DF EX EX; branch I: IF IF D AG AG TIF TIF; target instructions T and T+1 follow the target instruction fetch.]

delay = 4

Same as unconditional branch

Static prediction strategy

Let $p$ = probability of taking the branch.

guess target: $\text{delay}_t = 4p + 5(1 - p) = 5 - p$

guess inline: $\text{delay}_i = 6p + 0(1 - p) = 6p$

⇒ if ($\text{delay}_t < \text{delay}_i$) guess target, else guess inline

$\text{delay}_t < \text{delay}_i \;\Rightarrow\; 5 - p < 6p \;\Rightarrow\; p > 5/7 \approx 0.71$
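The rule can be evaluated directly. This Python sketch uses exact fractions and the four delays from the timing diagrams; `DELAY`, `expected_delay` and `static_guess` are illustrative names. At the break-even point $p = 5/7$ both guesses cost the same, and the slide's strict "<" then picks inline.

```python
from fractions import Fraction

# Branch delay in cycles, indexed [guess][actual outcome], from the timing diagrams
DELAY = {
    "target": {"taken": 4, "not_taken": 5},
    "inline": {"taken": 6, "not_taken": 0},
}


def expected_delay(guess, p):
    """Average delay when the branch is taken with probability p."""
    d = DELAY[guess]
    return d["taken"] * p + d["not_taken"] * (1 - p)


def static_guess(p):
    """Guess target only when it is strictly cheaper, as in the rule above."""
    if expected_delay("target", p) < expected_delay("inline", p):
        return "target"
    return "inline"


for p in (Fraction(1, 2), Fraction(5, 7), Fraction(9, 10)):
    print(p, expected_delay("target", p), expected_delay("inline", p), static_guess(p))
# 1/2 9/2 3 inline
# 5/7 30/7 30/7 inline
# 9/10 41/10 27/5 target
```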

Static prediction strategy – thresholds for different instructions

[Timing diagram: I‐1 (sets CC): IF IF D AG AG DF DF EX EX; conditional branch I: IF IF D AG AG TIF TIF.]

Delay (cycles), guess ↓, actual →:
             T   I
guess T      4   5
guess I      6   0

guess target if $4p + 5(1 - p) < 6p + 0(1 - p)$, i.e. $p > 5/7 \approx 0.71$

Static prediction strategy – thresholds for different instructions: loop control

[Timing diagram: I‐1: IF IF D AG AG DF DF EX EX; loop-control branch I: IF IF D AG AG TIF TIF EX EX.]

Delay (cycles), guess ↓, actual →:
             T   I
guess T      4   6
guess I      7   1

guess target if $4p + 6(1 - p) < 7p + 1(1 - p)$, i.e. $p > 5/8 = 0.625$

Static prediction strategy – thresholds for different instructions: register address

[Timing diagram: I‐1: IF IF D AG AG DF DF EX EX; branch I with register address: IF IF D AG TIF TIF.]

Delay (cycles), guess ↓, actual →:
             T   I
guess T      3   5
guess I      6   0

guess target if $3p + 5(1 - p) < 6p + 0(1 - p)$, i.e. $p > 5/8 = 0.625$
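All three thresholds come from one inequality. Writing $c_{ga}$ for the delay when the guess is $g$ and the actual outcome is $a$, guessing target wins when $c_{TT}p + c_{TI}(1-p) < c_{IT}p + c_{II}(1-p)$, i.e. $p > (c_{TI} - c_{II}) / ((c_{TI} - c_{II}) + (c_{IT} - c_{TT}))$. A Python sketch applying it to the three delay tables above (the names are illustrative):

```python
from fractions import Fraction

# Delay tables c[guess][actual] in cycles; T = target (taken), I = inline (not taken)
TABLES = {
    "conditional branch": {"T": {"T": 4, "I": 5}, "I": {"T": 6, "I": 0}},
    "loop control": {"T": {"T": 4, "I": 6}, "I": {"T": 7, "I": 1}},
    "register address": {"T": {"T": 3, "I": 5}, "I": {"T": 6, "I": 0}},
}


def threshold(c):
    """Probability of taking the branch above which guessing target is cheaper.

    Assumes a correct guess is cheaper than a wrong one for both outcomes,
    so the denominator is positive.
    """
    wrong_target_penalty = c["T"]["I"] - c["I"]["I"]  # guessed T, branch went inline
    wrong_inline_penalty = c["I"]["T"] - c["T"]["T"]  # guessed I, branch went to target
    return Fraction(wrong_target_penalty, wrong_target_penalty + wrong_inline_penalty)


for name, c in TABLES.items():
    t = threshold(c)
    print(f"{name:18s} p > {t} = {float(t):.3f}")
# conditional branch p > 5/7 = 0.714
# loop control       p > 5/8 = 0.625
# register address   p > 5/8 = 0.625
```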

Dynamic Branch Prediction

Dynamic Branch Prediction – basic idea

Predict based on the history of previous executions of the branch.

loop: xxx
      xxx
      xxx
      xxx
      BC loop

If only the last outcome is remembered, BC loop causes 2 mispredictions for every occurrence of the loop: one at loop exit, and one at the first iteration of the next occurrence.
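A quick Python simulation of the loop branch under a 1-bit (last-outcome) predictor reproduces the two mispredictions per occurrence. It assumes the predictor starts at "not taken" and that each loop occurrence runs at least two iterations; the function name is illustrative.

```python
def one_bit_mispredictions(iterations, occurrences, initial_taken=False):
    """Mispredictions of 'BC loop' per loop occurrence with a 1-bit predictor.

    In each occurrence the branch is taken iterations-1 times and falls
    through once at loop exit. The prediction is simply the previous outcome.
    """
    last_taken = initial_taken
    per_occurrence = []
    for _ in range(occurrences):
        misses = 0
        for taken in [True] * (iterations - 1) + [False]:
            if last_taken != taken:
                misses += 1
            last_taken = taken
        per_occurrence.append(misses)
    return per_occurrence


print(one_bit_mispredictions(iterations=5, occurrences=3))  # [2, 2, 2]
```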

Dynamic Branch Prediction – 2-bit prediction scheme

[Figure: four-state diagram (states 0–3) with transitions labelled T (taken) and N (not taken); states 0/1 predict taken, states 3/2 predict not taken.]

Dynamic Branch Prediction – second scheme

Predict based on the history of the previous n branches; e.g., if n = 3:

3 branches taken ⇒ predict taken

2 branches taken ⇒ predict taken

1 branch taken ⇒ predict not taken

0 branches taken ⇒ predict not taken

Dynamic Branch Prediction – bimodal predictor

Maintain saturating counters

[Figure: saturating counter with states 0, 1, 2, 3 and transitions labelled T (taken) and N (not taken), saturating at both ends.]

One counter per branch, or one counter per cache line (merge the results if the line holds multiple branches)
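A minimal Python sketch of one 2-bit saturating counter applied to the same loop branch. It assumes the usual convention (not stated on the slide) that T moves the counter up, N moves it down, and states 2 and 3 predict taken; the counter starts at 0.

```python
def saturating_counter_mispredictions(outcomes, state=0):
    """2-bit saturating counter, states 0..3; predict taken in states 2 and 3.

    Returns (mispredictions, final state) so the state can carry over.
    """
    misses = 0
    for taken in outcomes:
        predict_taken = state >= 2
        if predict_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses, state


# The loop branch from the basic-idea slide: 5 iterations, loop entered 3 times
loop_branch = [True] * 4 + [False]
state = 0
per_occurrence = []
for _ in range(3):
    misses, state = saturating_counter_mispredictions(loop_branch, state)
    per_occurrence.append(misses)
print(per_occurrence)  # [3, 1, 1]
```

After warm-up only the loop exit mispredicts: one not-taken outcome moves the counter from 3 to 2, which still predicts taken when the loop is entered again.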

Dynamic Branch Prediction – history of last n occurrences

Each entry holds the outcomes of the last three occurrences of this branch (0: not taken, 1: taken).

current entry: 1 1 0 → actual outcome 'taken' → updated entry: 1 1 1

Prediction using majority decision
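A small Python sketch of the majority scheme with n = 3. It assumes the newest outcome is stored on the left, the ordering under which 1 1 0 becomes 1 1 1 after a taken outcome; with n = 3 this gives the same rule as the "second scheme" above (2 or 3 taken ⇒ predict taken).

```python
def predict_majority(history):
    """Predict taken if more than half of the stored outcomes were taken."""
    return 2 * sum(history) > len(history)


def update_history(history, taken):
    """Shift the newest outcome in on the left and drop the oldest (rightmost)."""
    return [1 if taken else 0] + history[:-1]


entry = [1, 1, 0]               # outcomes of the last three occurrences
print(predict_majority(entry))  # True: 2 of 3 taken
entry = update_history(entry, taken=True)
print(entry)                    # [1, 1, 1]
```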

Correlation between branches

B1: if (x)
      ...
B2: if (y)
      ...
    z = x && y
B3: if (z)
      ...

• B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2
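To see the correlation pay off, this Python sketch compares a predictor for B3 that selects a 2-bit counter using the outcomes of B1 and B2 with a single counter for B3 alone. To sidestep which way the compiled branch jumps, each branch's "outcome" here is its condition value (1 = true); counters start at 1 and predict 1 in states 2 and 3. Both the encoding and the names are assumptions for illustration.

```python
from itertools import product


def correlated_predictor_misses(pairs):
    """Predict B3 (z = x && y) from a counter selected by B1's and B2's outcomes."""
    counters = {h: 1 for h in product((0, 1), repeat=2)}  # one 2-bit counter per history
    misses = 0
    for x, y in pairs:
        history = (x, y)   # outcomes of B1 and B2, just executed
        z = x & y          # B3's actual outcome
        predict = 1 if counters[history] >= 2 else 0
        if predict != z:
            misses += 1
        c = counters[history]
        counters[history] = min(c + 1, 3) if z else max(c - 1, 0)
    return misses


def local_counter_misses(pairs):
    """Same 2-bit counter scheme for B3, but one counter and no B1/B2 history."""
    c, misses = 1, 0
    for x, y in pairs:
        z = x & y
        if (1 if c >= 2 else 0) != z:
            misses += 1
        c = min(c + 1, 3) if z else max(c - 1, 0)
    return misses


pairs = list(product((0, 1), repeat=2)) * 25  # 100 executions of B1, B2, B3
print(correlated_predictor_misses(pairs))     # 1 (warm-up only)
print(local_counter_misses(pairs))            # 25
```

With the history, the only miss is the first time the (1, 1) pattern is seen; after that B3 is predicted perfectly.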


Branch Target Capture

• Branch Target Buffer (BTB): entry = instr addr | pred stats | target addr

• Target Instruction Buffer (TIB): entry = instr addr | pred stats | target instr

Probability of target change < 5%
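A minimal Python sketch of a direct-mapped BTB with the three fields above. Several details are assumptions, not from the slides: the table size, 4-byte instructions, "pred stats" modelled as a 2-bit saturating counter, and entries allocated only for taken branches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BTBEntry:
    instr_addr: int   # address of the branch instruction (full tag)
    counter: int      # "pred stats": assumed 2-bit saturating counter
    target_addr: int  # last observed target address


class BranchTargetBuffer:
    def __init__(self, num_entries: int = 64, instr_size: int = 4):
        self.num_entries = num_entries
        self.instr_size = instr_size
        self.entries: List[Optional[BTBEntry]] = [None] * num_entries

    def _slot(self, pc: int) -> int:
        return (pc // self.instr_size) % self.num_entries

    def lookup(self, pc: int) -> Optional[int]:
        """Predicted target, or None to go inline (miss or predicted not taken)."""
        entry = self.entries[self._slot(pc)]
        if entry is not None and entry.instr_addr == pc and entry.counter >= 2:
            return entry.target_addr
        return None

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the resolved outcome of the branch at pc."""
        slot = self._slot(pc)
        entry = self.entries[slot]
        if entry is None or entry.instr_addr != pc:
            if taken:  # allocate (or replace) only for taken branches
                self.entries[slot] = BTBEntry(pc, 2, target)
            return
        entry.counter = min(entry.counter + 1, 3) if taken else max(entry.counter - 1, 0)
        if taken:
            entry.target_addr = target  # targets rarely change (< 5%)


btb = BranchTargetBuffer(num_entries=16)
print(btb.lookup(0x400))       # None: BTB miss, go inline
btb.update(0x400, taken=True, target=0x480)
print(hex(btb.lookup(0x400)))  # 0x480: BTB hit, go to target
```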

BTB Performance

BTB lookup     miss (0.4)          hit (0.6)
decision       go inline           go to target
result         inline   target     inline   target
probability    0.8      0.2        0.2      0.8
delay          0        5          4        0

Expected delay = $0.4 \times 0.8 \times 0 + 0.4 \times 0.2 \times 5 + 0.6 \times 0.2 \times 4 + 0.6 \times 0.8 \times 0 = 0.88$ cycles
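The expected delay is a probability-weighted sum over the four leaves of the decision tree. A Python sketch with the table's numbers (`TREE` and `expected_delay` are illustrative names):

```python
# (probability of BTB outcome, decision, [(probability of actual result, delay), ...])
TREE = [
    (0.4, "miss: go inline", [(0.8, 0), (0.2, 5)]),    # actual inline, actual target
    (0.6, "hit: go to target", [(0.2, 4), (0.8, 0)]),  # actual inline, actual target
]


def expected_delay(tree):
    """Sum probability-weighted delays over every leaf of the decision tree."""
    return sum(p_btb * p_result * delay
               for p_btb, _decision, leaves in tree
               for p_result, delay in leaves)


print(round(expected_delay(TREE), 2))  # 0.88
```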

BTC: Structure of Tables

Instruction fetch path with

• BTAC (Branch Target Address Cache)

• BTIC (Branch Target Instruction Cache)

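Compute/fetch scheme (no dynamic branch prediction)

[Figure: IF stage; FAR supplies the instruction fetch address to the I‐cache; the next fetch address is either the next sequential address (IFA incremented) or the branch target address (BTA) computed from the branch; fetched stream: A, I, I+1, I+2, I+3, then BTI, BTI+1, BTI+2, BTI+3.]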

BTAC scheme

[Figure: same fetch path, but the fetch address also looks up a BTAC, which supplies the BTA for a branch at address BA; fetched stream: A, I, I+1, I+2, I+3, then BTI, BTI+1, BTI+2, BTI+3.]

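BTIC scheme ‐ 1

[Figure: the fetch address also looks up a BTIC; for a branch at address BA it supplies the branch target instruction BTI directly to the decoder, and BTA+ is used as the next fetch address; otherwise the next sequential address is used.]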
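The difference between the two tables can be sketched in Python: a BTAC hit yields only the target address, so the target instruction must still be read from the I‐cache, while a BTIC hit yields the target instruction itself plus the address to fetch next. This reads BTA+ in the figure as "the address after the target" and assumes 4-byte instructions; both readings, and all names, are assumptions.

```python
INSTR_SIZE = 4  # hypothetical fixed instruction size in bytes


class BTAC:
    """Branch target address cache: branch address (BA) -> BTA."""

    def __init__(self):
        self.table = {}

    def fill(self, branch_addr, target_addr):
        self.table[branch_addr] = target_addr

    def next_fetch(self, fetch_addr):
        # Hit: fetch from the target next; miss: next sequential address
        return self.table.get(fetch_addr, fetch_addr + INSTR_SIZE)


class BTIC:
    """Branch target instruction cache: BA -> (BTI, BTA+)."""

    def __init__(self):
        self.table = {}

    def fill(self, branch_addr, target_addr, target_instr):
        # BTI goes straight to the decoder; fetching resumes after the target
        self.table[branch_addr] = (target_instr, target_addr + INSTR_SIZE)

    def lookup(self, fetch_addr):
        return self.table.get(fetch_addr)  # None on a miss


btac, btic = BTAC(), BTIC()
btac.fill(0x40, 0x100)
btic.fill(0x40, 0x100, "ADD R3, R2, R1")
print(hex(btac.next_fetch(0x40)))  # 0x100: target still has to be read from the I-cache
print(hex(btac.next_fetch(0x44)))  # 0x48: no entry, sequential fetch
instr, next_addr = btic.lookup(0x40)
print(instr, hex(next_addr))       # ADD R3, R2, R1 0x104
```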

Superscalar/VLIW

• Instruction level parallelism

• EndSem Exam: covers only the post-Midsem part

• VLIW (Intel Itanium, TI OMAP)

• Superscalar (Pentium, Athlon)
  – Parallel Issue, Parallel Decode
  – Dependency Check (Reservation Station, Renaming)
  – Parallel Execute, Serial Commit
