lec jan22 2009
Anshul Kumar, CSE IITD
CSL718 : Pipelined Processors
Improving Branch Performance
22nd Jan, 2009

Improving Branch Performance
• Branch Elimination
  – replace branch with other instructions
• Branch Speed Up
  – reduce time for computing the condition code (CC) and for target instruction fetch (TIF)
• Branch Prediction
  – guess the outcome and proceed, undo if necessary
• Branch Target Capture
  – make use of history

Branch Elimination
Use conditional instructions (predicated execution)
[Figure: flowchart in which condition C (true/false paths) guards statement S, written as the guarded statement C : S]
Branch version:
  OP1
  BC CC = Z, * + 2
  ADD R3, R2, R1
  OP2
Predicated version:
  OP1
  ADD R3, R2, R1, NZ
  OP2
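
The equivalence between the branch version and the predicated version can be sketched in a few lines. This is only an illustration, not the slide's instruction set: the function names are hypothetical, and it assumes `ADD R3, R2, R1` means R3 = R2 + R1 and that the `NZ` suffix means "commit only when the condition code is not zero".

```python
def with_branch(cc_zero: bool, r1: int, r2: int, r3: int) -> int:
    """Branch version: BC skips the ADD when the condition code is Z."""
    if cc_zero:            # BC CC = Z, * + 2 (branch over the ADD)
        return r3          # R3 keeps its old value
    return r2 + r1         # ADD R3, R2, R1 (assumed: R3 = R2 + R1)


def predicated(cc_zero: bool, r1: int, r2: int, r3: int) -> int:
    """Predicated version: the ADD always issues, but commits only on NZ."""
    result = r2 + r1                  # computed unconditionally
    commit = not cc_zero              # predicate NZ (assumed meaning)
    return result if commit else r3   # select the value, no control transfer
```

Both functions return the same new value of R3 for every input. In hardware the second form removes the control transfer, so there is no branch delay to pay.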

Branch Elimination - contd.
[Pipeline timing diagram; column alignment lost in extraction. The CC marker indicates where the condition code is set.]
With branch:
  OP1:        IF IF IF D AG DF DF DF EX EX
  BC:         IF IF IF D AG TIF TIF TIF
  ADD/OP2:    IF IF IF D’ D AG
With conditional instruction:
  ADD(cond):  IF IF IF D AG DF DF DF EX EX

Branch Speed Up : early target address generation
• Assume each instruction is a branch
• Generate target address while decoding
• If target is in the same page, omit translation
• After decoding, discard target address if not a branch
[Pipeline timing diagram] BC: IF IF IF D TIF TIF TIF, with AG overlapped with D

Branch Speed Up : increase CC - branch gap
Increase the gap between condition checking and branching
• Early CC setting
• Delayed branch

Early CC setting: insert n instructions (branch taken)
[Pipeline timing diagrams for n = 0 to 3; column alignment lost in extraction. The CC marker indicates where the condition code is set. The n inserted instructions J, K, L each follow the ordinary sequence IF IF D AG AG DF DF EX EX.]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF
  T:    IF IF D’ D AG
  T+1:  IF IF’ D’ IF IF D
  n = 0: delay = 6
  n = 1: delay = 5
  n = 2: delay = 4
  n = 3: delay = 4
(Delay can be reduced with a larger target buffer)

Early CC setting: insert n instructions (branch not taken)
[Pipeline timing diagrams for n = 0 to 3; column alignment lost in extraction. The CC marker indicates where the condition code is set. The n inserted instructions J, K, L each follow the ordinary sequence IF IF D AG AG DF DF EX EX.]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF
  I+1:  IF IF D’ D AG
  I+2:  IF IF’ D’ IF D
  n = 0: delay = 5
  n = 1: delay = 4
  n = 2: delay = 3
  n = 3: delay = 2

Delayed Branch : insert n instructions (branch taken)
[Pipeline timing diagrams for n = 0 to 3; row order and column alignment lost in extraction. The CC marker indicates where the condition code is set. The n inserted instructions J, K, L each follow the ordinary sequence IF IF D AG AG DF DF EX EX.]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF
  T:    IF IF D’ D AG
  T+1:  IF IF’ D’ IF IF D
  n = 0: delay = 6
  n = 1: delay = 5
  n = 2: delay = 4
  n = 3: delay = 3

Delayed Branch : insert n instructions (branch not taken)
[Pipeline timing diagrams for n = 0 to 3; row order and column alignment lost in extraction. The CC marker indicates where the condition code is set. The n inserted instructions J, K, L each follow the ordinary sequence IF IF D AG AG DF DF EX EX.]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF
  I+1:  IF IF D’ D AG
  I+2:  IF IF’ D’ IF D
  n = 0: delay = 5
  n = 1: delay = 4
  n = 2: delay = 3
  n = 3: delay = 2

Summary - Branch Speed Up
Delay versus number of inserted instructions n (T = branch goes to target, I = branch goes inline)

Early CC setting
              n=0  n=1  n=2  n=3  n=4  n=5
  uncond       4    4    4    4    4    4
  cond (T)     6    5    4    4    4    4
  cond (I)     5    4    3    2    1    0

Delayed branch
              n=0  n=1  n=2  n=3  n=4  n=5
  uncond       4    3    2    1    0    0
  cond (T)     6    5    4    3    2    1
  cond (I)     5    4    3    2    1    0
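
The two halves of the summary table follow a simple pattern, which the sketch below captures. The closed-form expressions are fitted to the table's entries for n = 0 to 5 and are not stated on the slides. The function names and the keys `uncond`, `cond_T`, `cond_I` are hypothetical.

```python
def early_cc_delay(kind: str, n: int) -> int:
    """Delay with early CC setting after inserting n instructions (fitted to the table)."""
    if kind == "uncond":
        return 4                 # inserting before the branch does not help
    if kind == "cond_T":
        return max(4, 6 - n)     # saturates at the unconditional-branch delay
    if kind == "cond_I":
        return max(0, 5 - n)
    raise ValueError(f"unknown branch kind: {kind}")


def delayed_branch_delay(kind: str, n: int) -> int:
    """Delay with a delayed branch and n filled delay slots (fitted to the table)."""
    base = {"uncond": 4, "cond_T": 6, "cond_I": 5}[kind]
    return max(0, base - n)      # each useful slot hides one cycle of delay


for kind in ("uncond", "cond_T", "cond_I"):
    print(kind,
          [early_cc_delay(kind, n) for n in range(6)],
          [delayed_branch_delay(kind, n) for n in range(6)])
```

Read this way, early CC setting helps only conditional branches and cannot bring a taken branch below the unconditional delay of 4, whereas every filled delay slot reduces the delay of all three cases.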

Branch Prediction
• Treat conditional branches as unconditional branches / NOP
• Undo if necessary
Strategies:
  – Fixed (always guess inline)
  – Static (guess on the basis of instruction type)
  – Dynamic (guess based on recent history)

Prediction based on statistics
(% = share of all branch instructions; Branch = fraction of that type that actually branches; Correct = contribution to the overall percentage of correct guesses)

  Instr      %      Branch   Guess    Correct   Guess    Correct
  uncond     14.5   100%     always   14.5%     always   14.5%
  cond       58     54%      never    27%       always   31%
  loop       9.8    91%      always   9%        always   9%
  call/ret   17.7   100%     always   17.7%     always   17.7%
  Total                               68.2%              72.2%
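
The totals in the statistics table can be recomputed from the per-type figures. The sketch below uses hypothetical names and only the numbers from the table. It sums unrounded products and gives about 67.8% and 72.4%; the slide's 68.2% and 72.2% are what you get by adding the rounded per-row entries.

```python
# share of all branches (%), fraction of that type that is actually taken
BRANCH_MIX = {
    "uncond":   (14.5, 1.00),
    "cond":     (58.0, 0.54),
    "loop":     (9.8,  0.91),
    "call/ret": (17.7, 1.00),
}


def correct_rate(guess_taken: dict) -> float:
    """Overall % of correct guesses for a per-type 'always'/'never' strategy."""
    total = 0.0
    for kind, (share, p_taken) in BRANCH_MIX.items():
        # guessing "always" is right with probability p_taken, "never" with 1 - p_taken
        p_correct = p_taken if guess_taken[kind] else 1.0 - p_taken
        total += share * p_correct
    return total


never_for_cond = {"uncond": True, "cond": False, "loop": True, "call/ret": True}
always_for_all = {"uncond": True, "cond": True, "loop": True, "call/ret": True}

print(round(correct_rate(never_for_cond), 1))
print(round(correct_rate(always_for_all), 1))
```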

Branch Prediction (guess inline, go inline)
delay = 0
[Pipeline timing diagram; column alignment lost in extraction. The CC marker indicates where the condition code is set.]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF
  I+1:  IF IF D
  I+2:  IF IF D

Branch Prediction (guess inline, goto target)
delay = 6
[Pipeline timing diagram; column alignment lost in extraction.]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF
  T:    IF IF D’ D AG
  T+1:  IF IF’ D’ IF IF D

Branch Prediction (guess target, go inline)
delay = 5
[Pipeline timing diagram; column alignment lost in extraction. The diagram also carries the label TD.]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF
  I+1:  D’ D
  I+2:  D’ D

Branch Prediction (guess target, goto target)
delay = 4 (same as unconditional branch)
[Pipeline timing diagram; column alignment lost in extraction.]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF
  T:    IF IF D’ D AG
  T+1:  IF IF’ D’ IF IF D

Static prediction strategy
Let $p$ = probability of taking the branch
guess target: $\mathrm{delay}_t = 4p + 5(1 - p) = 5 - p$
guess inline: $\mathrm{delay}_i = 6p + 0(1 - p) = 6p$
⇒ if $\mathrm{delay}_t < \mathrm{delay}_i$, guess target; else guess inline
$\mathrm{delay}_t < \mathrm{delay}_i \Rightarrow 5 - p < 6p \Rightarrow p > 5/7 \approx 0.71$

Static prediction strategy - thresholds for different instructions
Delay (rows: guess, columns: actual outcome)
              actual T   actual I
  guess T        4          5
  guess I        6          0
guess target if $4p + 5(1 - p) < 6p + 0(1 - p)$, i.e. $p > 5/7 \approx 0.71$
[Pipeline timing diagram; column alignment lost in extraction]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF

Static prediction strategy - thresholds for different instructions (loop control)
Delay (rows: guess, columns: actual outcome)
              actual T   actual I
  guess T        4          6
  guess I        7          1
guess target if $4p + 6(1 - p) < 7p + 1(1 - p)$, i.e. $p > 5/8 = 0.625$
[Pipeline timing diagram; column alignment lost in extraction]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG AG TIF TIF EX EX

Static prediction strategy - thresholds for different instructions (register address)
Delay (rows: guess, columns: actual outcome)
              actual T   actual I
  guess T        3          5
  guess I        6          0
guess target if $3p + 5(1 - p) < 6p + 0(1 - p)$, i.e. $p > 5/8 = 0.625$
[Pipeline timing diagram; column alignment lost in extraction]
  I-1:  IF IF D AG AG DF DF EX EX
  I:    IF IF D AG TIF TIF
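
All three thresholds come from the same inequality, so they can be computed from the 2 x 2 delay matrix directly. In the sketch below the function and argument names are hypothetical. It assumes a wrong guess always costs more than the corresponding right guess, which holds for the three matrices in the slides.

```python
from fractions import Fraction


def target_threshold(tt: int, ti: int, it: int, ii: int) -> Fraction:
    """Return p* such that guessing target gives the lower expected delay when p > p*.

    tt: guess target, actual target    ti: guess target, actual inline
    it: guess inline, actual target    ii: guess inline, actual inline
    """
    # tt*p + ti*(1-p) < it*p + ii*(1-p)   <=>   p > (ti - ii) / ((ti - ii) + (it - tt))
    return Fraction(ti - ii, (ti - ii) + (it - tt))


print(target_threshold(4, 5, 6, 0))  # general conditional branch: 5/7
print(target_threshold(4, 6, 7, 1))  # loop control: 5/8
print(target_threshold(3, 5, 6, 0))  # register address: 5/8
```

Exact fractions make the rounding visible: 5/7 is about 0.714, and 5/8 is exactly 0.625, which the slides show as .62.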

Delayed Branch with Nullification
(Also called annulment)
• Delay slot is used optionally
• Branch instruction specifies the option
• Option may be exercised based on correctness of branch prediction
• Helps in better utilization of delay slots

Variants of Nullification
[Figure: four branch (bc) diagrams with delay slots D, one per variant]
1. No annulment (branch-with-execute)
2. Annul if not taken (branch-or-skip)
3. Annul if taken (branch-with-skip)
4. Annul always
Examples
• SPARC: 1, 2
• MC88100: 1, 4
• i860: 2, 4
• HP PA: 1, 2, 3

Annulment illustration
[Figure: two code fragments, each a branch (bc) with a delay slot D; one is labelled "use branch-or-skip", the other "use branch-with-skip"]

Dynamic Branch Prediction - basic idea
Predict based on the previous outcome of the same branch
  loop: xxx
        xxx
        xxx
        xxx
        BC loop
2 mispredictions for every occurrence of the loop
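
The "2 mispredictions for every occurrence" figure can be reproduced with a tiny simulation of a last-outcome (1-bit) predictor. This is a sketch: the function names, the 5-iteration loop and the initial "not taken" state are assumptions made for illustration.

```python
def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions when each branch is predicted to repeat its last outcome."""
    last = initial
    misses = 0
    for taken in outcomes:
        if taken != last:
            misses += 1
        last = taken          # remember only the most recent outcome
    return misses


def loop_outcomes(iterations, occurrences):
    """BC loop is taken on every iteration except the last, for each run of the loop."""
    one_pass = [True] * (iterations - 1) + [False]
    return one_pass * occurrences


print(one_bit_mispredictions(loop_outcomes(iterations=5, occurrences=10)))  # 20
```

Each run of the loop mispredicts twice: once on entry, because the previous run ended with "not taken", and once on exit.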

Dynamic Branch Prediction - 2 bit prediction scheme
[Figure: state diagram with four states 0, 1, 2, 3 and transitions labelled T (taken) and N (not taken); states 0/1 predict taken, states 3/2 predict not taken]

Dynamic Branch Prediction - Bimodal predictor
Maintain saturating counters
[Figure: state diagram of a saturating counter with states 0, 1, 2, 3 and transitions labelled T (taken) and N (not taken)]
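
A 2-bit saturating counter can be simulated in a few lines. The encoding used here is an assumption, not taken from the slides' diagrams: the counter counts up on taken, down on not taken, and predicts taken when its value is 2 or 3.

```python
def two_bit_mispredictions(outcomes, counter=0):
    """Count mispredictions of one 2-bit saturating counter (assumed encoding)."""
    misses = 0
    for taken in outcomes:
        predict_taken = counter >= 2
        if predict_taken != taken:
            misses += 1
        # saturate at 0 and 3 instead of wrapping around
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# A loop-closing branch: taken 4 times, then not taken, for 10 runs of the loop.
loop = ([True] * 4 + [False]) * 10
print(two_bit_mispredictions(loop))  # 12
```

After the warm-up in the first run (3 mispredictions from a cold counter), only the loop exit is mispredicted. That is 1 misprediction per run, compared with 2 for a last-outcome predictor.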

Dynamic Branch Prediction - History of last n occurrences
Each entry holds the outcome of the last three occurrences of this branch (0 : not taken, 1 : taken)
Prediction using majority decision
Example: current entry 1 1 0, actual outcome ‘taken’ → updated entry 1 1 1
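
The history entry behaves like a small shift register with a majority vote. In the sketch below the function names are hypothetical. The shift direction (newest outcome enters on the left, oldest drops off the right) is an assumption chosen because it reproduces the slide's example 1 1 0 → 1 1 1.

```python
def predict(history):
    """Majority decision over the stored outcomes (1 = taken, 0 = not taken)."""
    return sum(history) * 2 > len(history)


def update(history, taken):
    """Shift in the newest outcome on the left and drop the oldest on the right."""
    return (1 if taken else 0,) + tuple(history[:-1])


entry = (1, 1, 0)
print(predict(entry))        # True: two of the last three were taken
print(update(entry, True))   # (1, 1, 1)
```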

Dynamic Branch Prediction - storing prediction counters
Store in separate buffer or in cache directory
[Figure: cache with directory and storage; a counter is attached to each cache line]
• One counter per branch, or
• One counter per cache line - merge results if multiple branches

Correct guesses vs. history length
(correct guesses in %)
  n   Compiler   Business   Scientific   Supervisor
  0     64.1       64.4        70.4         54.0
  1     91.9       95.2        86.6         79.7
  2     93.3       96.5        90.8         83.4
  3     93.7       96.6        91.0         83.5
  4     94.5       96.8        91.8         83.7
  5     94.7       97.0        92.0         83.9

Two-Level Prediction
• Uses two levels of information to make a direction prediction
  – Branch History Table (BHT) - last n occurrences
  – Pattern History Table (PHT) - saturating 2 bit counters
• Captures patterned behavior of branches
  – Groups of branches are correlated
  – Particular branches have particular behavior

Correlation between branches
  B1: if (x)
        ...
  B2: if (y)
        ...
      z = x && y
  B3: if (z)
        ...
• B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2

Some Two-level Predictors
[Figure, Global Predictor: a global branch history register (GBHR, e.g. 1 0 1 1 0) indexes the PHT, which gives T/NT]
[Figure, Local Predictor: the PC selects a per-branch history in the BHT (e.g. 1 1 0 1 0, 1 1 1 0 0, 0 0 1 1 1, 0 1 1 1 1), which indexes the PHT, which gives T/NT]
Bits from PC and BHT can be combined to index PHT
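
A global two-level predictor can be sketched as a history register that indexes a table of 2-bit counters. Everything below is an illustrative assumption rather than the slides' design: the class and method names, the 2-bit history length, the "weakly not taken" initial counters and the counter encoding (predict taken when the counter is 2 or 3).

```python
class GlobalTwoLevelPredictor:
    """GBHR (level 1) indexes a PHT of 2-bit saturating counters (level 2)."""

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.gbhr = 0                           # global branch history register
        self.pht = [1] * (1 << history_bits)    # counters start weakly not taken

    def predict(self):
        return self.pht[self.gbhr] >= 2

    def update(self, taken):
        c = self.pht[self.gbhr]
        self.pht[self.gbhr] = min(3, c + 1) if taken else max(0, c - 1)
        # shift the new outcome into the history register
        self.gbhr = ((self.gbhr << 1) | int(taken)) & self.mask


def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


alternating = [True, False] * 50   # T, N, T, N, ...
print(count_mispredictions(GlobalTwoLevelPredictor(), alternating))  # 2
```

The alternating pattern defeats a single counter but is learned here after two mispredictions, because each history value selects its own counter. A local predictor would replace the single `gbhr` with a per-branch history selected by the PC.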

Two-level Predictor Classification
• Yeh and Patt 3-letter naming scheme
  – Type of history collected
    • G (global), P (per branch), S (per set)
  – PHT type
    • A (adaptive), S (static)
  – PHT organization
    • g (global), p (per branch), s (per set)
• Examples - GAs, PAp etc.

Branch Target Capture
• Branch Target Buffer (BTB)
• Target Instruction Buffer (TIB)
Entry format: instr addr | pred stats | target (target addr or target instr)
prob of target change < 5%
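
A branch target buffer can be pictured as a small table keyed by the branch's instruction address. The sketch below is a deliberately simplified model with hypothetical names. It stores only the target address (the pred stats field is omitted) and evicts the oldest entry when full, which is an assumed replacement policy rather than one given in the slides.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch's instruction address to its last target address."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}      # instr addr -> target addr (dict keeps insertion order)

    def lookup(self, instr_addr):
        """Return the captured target address, or None on a BTB miss."""
        return self.entries.get(instr_addr)

    def record_taken(self, instr_addr, target_addr):
        """Capture (or refresh) the target of a taken branch."""
        if instr_addr not in self.entries and len(self.entries) >= self.capacity:
            oldest = next(iter(self.entries))   # assumed FIFO replacement
            del self.entries[oldest]
        self.entries[instr_addr] = target_addr


btb = BranchTargetBuffer(capacity=2)
btb.record_taken(0x100, 0x200)
print(hex(btb.lookup(0x100)))   # 0x200
print(btb.lookup(0x104))        # None (miss)
```

Because the probability of a target change is below 5%, the captured address is almost always the right place to fetch from on a hit.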

BTB Performance
[Figure: decision tree]
  BTB miss (0.4) → decision: go inline
      result inline (0.8): delay 0
      result target (0.2): delay 6
  BTB hit (0.6) → decision: go to target
      result inline (0.2): delay 5
      result target (0.8): delay 0
Expected delay = .4*.8*0 + .4*.2*6 + .6*.2*5 + .6*.8*0 = 1.08
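
The expected-delay calculation generalises to other hit rates and penalties. In this sketch the function and parameter names are hypothetical, and the defaults (penalties of 6 and 5 cycles, zero delay for a correct decision) are the values used in the slide's example.

```python
def expected_btb_delay(p_hit, p_target_given_hit, p_target_given_miss,
                       inline_guess_wrong=6, target_guess_wrong=5):
    """Expected branch delay when a BTB hit means 'go to target' and a miss 'go inline'."""
    p_miss = 1.0 - p_hit
    # miss -> go inline: costs a delay only if the branch actually goes to the target
    miss_cost = p_miss * p_target_given_miss * inline_guess_wrong
    # hit -> go to target: costs a delay only if the branch actually goes inline
    hit_cost = p_hit * (1.0 - p_target_given_hit) * target_guess_wrong
    return miss_cost + hit_cost


print(round(expected_btb_delay(p_hit=0.6, p_target_given_hit=0.8,
                               p_target_given_miss=0.2), 2))  # 1.08
```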

Dynamic information about branch
• Previous branch decisions
  – Explicit prediction
  – Stored in cache directory
  – Branch History Table (BHT)
• Previous target address / instruction
  – Implicit prediction
  – Stored in separate buffer
  – Branch Target Buffer (BTB) / Br Target Addr Cache (BTAC)
  – Target Instr Buffer (TIB) / Br Target Instr Cache (BTIC)
These two can be combined

Storing prediction info
• In cache: [Figure: cache directory and storage, with a counter attached to each cache line]
• In separate buffer: entry format instr addr | pred stats | target

Combined prediction mechanism
• Explicit : use history bits
• Implicit : use BTB hit/miss
  – hit ⇒ go to target, miss ⇒ go inline
• Combined : BTB hit/miss followed by explicit prediction using history bits
  – commonly used : hit ⇒ go to target, miss ⇒ explicit prediction
  – alternatively : miss ⇒ go inline, hit ⇒ explicit prediction
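
The three decision rules above differ only in when the history bits are consulted. The sketch below writes them as small functions that return `True` for "go to target" and `False` for "go inline". The function names are hypothetical, and `history_says_taken` stands for whatever the explicit predictor outputs.

```python
def implicit(btb_hit: bool, history_says_taken: bool) -> bool:
    """Implicit: hit => go to target, miss => go inline (history bits unused)."""
    return btb_hit


def combined_common(btb_hit: bool, history_says_taken: bool) -> bool:
    """Commonly used: hit => go to target, miss => explicit prediction."""
    return True if btb_hit else history_says_taken


def combined_alternative(btb_hit: bool, history_says_taken: bool) -> bool:
    """Alternative: miss => go inline, hit => explicit prediction."""
    return history_says_taken if btb_hit else False


for hit in (True, False):
    for hist in (True, False):
        print(hit, hist,
              implicit(hit, hist),
              combined_common(hit, hist),
              combined_alternative(hit, hist))
```

The commonly used rule is the logical OR of the two sources and the alternative rule is the logical AND, so the first favours "target" and the second favours "inline".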

Combined prediction
[Figure: decision trees that branch on BTB hit / BTB miss, then on the explicit prediction (expl predict) where one is used, and finally on the actual outcome]
Prediction ⇒ T: Target, I: Inline
Actual outcome ⇒ T: Target, I: Inline

Structure of Tables
Instruction fetch path with
• BHT
• BTAC
• BTIC

Compute/fetch scheme
(no dynamic branch prediction)
[Figure: block diagram of the instruction fetch path. Labels: I-cache, IFAR (instruction fetch address), + (next sequential address, IIFA), Compute BTA. The I-cache supplies instructions I, I + 1, I + 2, I + 3 and target instructions BTI, BTI+1, BTI+2, BTI+3]

BHT (Branch History Table)
[Figure: the instruction fetch address indexes both the I-cache (16 K, 4-way set assoc, 128 x 4 lines, 8 instr/line, 4 instr/cycle) and the BHT (128 x 4 entries, each holding four 2-bit history fields). The I-cache feeds the decode queue and the issue queue (4 x 1 instr each). The BHT history bits feed the prediction logic, which outputs taken / not taken and the BTA for a taken guess]

BTAC scheme
[Figure: block diagram of the instruction fetch path. Labels: I-cache, IFAR (instruction fetch address), + (next sequential address, IIFA), and a BTAC whose entries pair a branch address (BA) with a branch target address (BTA). The I-cache supplies instructions I, I + 1, I + 2, I + 3 and target instructions BTI, BTI+1, BTI+2, BTI+3]

BTIC scheme - 1
[Figure: block diagram of the instruction fetch path. Labels: I-cache, IFAR (instruction fetch address), + (next sequential address, IIFA), BTA, and a BTIC whose entries hold BA, BTI and BTA+. The BTI goes to the decoder]

BTIC scheme - 2
[Figure: block diagram of the instruction fetch path. Labels: I-cache, IFAR (instruction fetch address), + (next sequential address, IIFA), a computed BTA+, and a BTIC whose entries hold BA, BTI and BTI+1. The target instructions go to the decoder]