A Penalty-Sensitive Branch Predictor
Yue Hu, David M. Koppelman, Lu Peng
Department of Electrical and Computer Engineering, Louisiana State University
1. Motivation
A typical branch predictor aims to decrease the misprediction rate (MR), e.g., two-level adaptive (Yeh & Patt), neural (Vintan & Jimenez), and LTAGE (Seznec).
However, even if total MR doesn't decrease, performance could still be improved.
[Figure: the same program run on the same computer with two different branch predictors (Run 1 and Run 2), plotted over time; each mark is the time that a mispredicted branch is on the wrong path, for high-penalty (HP) and low-penalty (LP) branches.]
Why not favor HP branches to decrease their MR?
2. Design Overview
Figure 1. Overall structure of our predictor
[Main predictor: the two-class TAGE predictor (inputs: PC, history). Assistant predictor: the loop predictor (input: PC). The penalty predictor (inputs: PC, resolve cycles) classifies each branch for the two-class TAGE predictor. If the loop predictor is enabled (Yes), its output can serve as the final prediction; otherwise (No), the two-class TAGE output is used.]
1: Predict each branch: HP or LP?
2: Based on TAGE, favor HP branches while providing only normal operation for LP branches;
3: Enable the loop predictor only when beneficial.
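The selection in Figure 1 can be sketched as below. The `Lookup` record and `final_prediction` function are hypothetical, and representing "loop predictor has no prediction" as `None` is an assumption; the slides only show the "Loop enabled?" decision. The penalty class does not change the direction chosen here, because the favoring mechanism the slides describe (double-entry allocation for HP branches) acts when the two-class TAGE predictor is updated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lookup:
    """Hypothetical per-branch lookup results from the three components."""
    tage_taken: bool            # (2) two-class TAGE, the main predictor
    loop_taken: Optional[bool]  # (3) loop predictor; None when it has no prediction
    loop_enabled: bool          # (3) loop predictor turned on only when beneficial
    predicted_hp: bool          # (1) penalty predictor: HP or LP?


def final_prediction(lookup: Lookup) -> bool:
    """Pick the final direction following Figure 1's 'Loop enabled?' decision."""
    if lookup.loop_enabled and lookup.loop_taken is not None:
        return lookup.loop_taken
    # predicted_hp is not used for the direction itself: the two-class TAGE
    # predictor consumes it at update time (entry allocation).
    return lookup.tage_taken
```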
2.1 Penalty Predictor
Penalty table: each entry holds an 8-bit penalty counter (CNT) and a 1-bit penalty state (STA).
Initially, CNT = 0 and STA = LP. When a branch resolves:
1. If its penalty >= 120 cycles, CNT += 8; otherwise, CNT--.
2. Then, if CNT >= 192, STA = HP; else if CNT == 0, STA = LP; otherwise, STA is unchanged.
The high-penalty state persists for at least hundreds of executions, so the branch's subsequent high-penalty executions benefit.
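The counter rules above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the class names, the table size, PC-modulo indexing, and saturation of CNT at 0 and 255 are assumptions; the 120-cycle threshold, the +8/-1 steps, and the 192/0 state thresholds come from the slide.

```python
from dataclasses import dataclass

HP_PENALTY_CYCLES = 120  # penalty threshold from the slide
CNT_MAX = 255            # assumed: 8-bit counter saturates at 255
HP_THRESHOLD = 192       # CNT >= 192 -> STA = HP


@dataclass
class PenaltyEntry:
    cnt: int = 0      # 8-bit penalty counter (CNT)
    hp: bool = False  # 1-bit penalty state (STA): True = HP, False = LP

    def update(self, penalty_cycles: int) -> None:
        """Update CNT and STA after the branch resolves with this penalty."""
        if penalty_cycles >= HP_PENALTY_CYCLES:
            self.cnt = min(CNT_MAX, self.cnt + 8)
        else:
            self.cnt = max(0, self.cnt - 1)  # assumed: saturates at 0
        if self.cnt >= HP_THRESHOLD:
            self.hp = True
        elif self.cnt == 0:
            self.hp = False
        # otherwise STA keeps its previous value (hysteresis)


class PenaltyPredictor:
    """Hypothetical PC-indexed penalty table; size and indexing are assumptions."""

    def __init__(self, entries: int = 1024) -> None:
        self.table = [PenaltyEntry() for _ in range(entries)]

    def _entry(self, pc: int) -> PenaltyEntry:
        return self.table[pc % len(self.table)]

    def is_hp(self, pc: int) -> bool:
        return self._entry(pc).hp

    def update(self, pc: int, penalty_cycles: int) -> None:
        self._entry(pc).update(penalty_cycles)


if __name__ == "__main__":
    pp = PenaltyPredictor()
    for _ in range(24):       # 24 x (+8) = 192 -> HP
        pp.update(0x400, 150)
    print(pp.is_hp(0x400))    # True
    for _ in range(191):      # CNT falls to 1, state still HP
        pp.update(0x400, 50)
    print(pp.is_hp(0x400))    # True
    pp.update(0x400, 50)      # CNT reaches 0 -> LP
    print(pp.is_hp(0x400))    # False
```

Because STA returns to LP only when CNT reaches 0, a branch that has just become HP (CNT = 192) needs 192 consecutive low-penalty executions before it is classified LP again, which is where the "at least hundreds of executions" comes from.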
2.2 Two-class TAGE Predictor
Banks: Bank 0 is a 2-bit bimodal predictor indexed by the PC; each entry of Banks 1-6 holds a 3-bit prediction counter (pred), a 2-bit useful counter (U), and a 9- to 16-bit tag.
Index: Hash(History, PC) directs to one entry in each of Banks 1-6.
Tag: check whether each bank hits (H) or misses (M). Higher banks use longer history and wider tags, so their predictions are more accurate.
Prediction: the hitting bank with the longest history provides the final prediction. [Only a rough idea of TAGE is given here.]
[Figure: example lookup across Banks 0-6, showing each bank's hit/miss result and U value.]
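The lookup described above can be sketched as a longest-match search. This is a minimal sketch under assumptions: the hashing that selects `indexed` and `computed_tags` is left out, `TaggedEntry` and `tage_predict` are hypothetical names, and the taken thresholds (pred >= 4 for the 3-bit counter, >= 2 for the 2-bit bimodal counter) are the usual convention rather than something stated on the slide. Like the slide, it gives only the rough idea; full TAGE adds refinements such as the alternate-prediction rule for newly allocated entries.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaggedEntry:
    tag: int   # 9- to 16-bit partial tag
    pred: int  # 3-bit prediction counter, 0..7; >= 4 means taken
    u: int     # 2-bit useful counter, 0..3


def tage_predict(bimodal_ctr: int,
                 indexed: List[Optional[TaggedEntry]],
                 computed_tags: List[int]) -> Tuple[bool, int]:
    """Return (taken, provider_bank).

    bimodal_ctr: 2-bit counter read from Bank 0 (>= 2 means taken).
    indexed[i]: entry selected by Hash(History, PC) in Bank i + 1 (None if empty).
    computed_tags[i]: tag computed for Bank i + 1 from the same PC and history.
    """
    # Search from the longest-history bank down; the first hit is the provider.
    for bank in range(len(indexed), 0, -1):
        entry = indexed[bank - 1]
        if entry is not None and entry.tag == computed_tags[bank - 1]:
            return entry.pred >= 4, bank
    # No tagged bank hits: fall back to the bimodal predictor in Bank 0.
    return bimodal_ctr >= 2, 0
```

For example, if Banks 2 and 4 both hit, Bank 4 is the provider and its counter alone decides the direction, even when Bank 2 disagrees.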
Update:
When a branch is mispredicted, new entries are allocated in higher banks. An LP branch gets only one new entry; an HP branch gets a second entry, subject to two limitations:
1. The bank must have a useless entry (U0);
2. The last two allocations in that bank must have been one-entry allocations.
[Figure: example update. Bank 2 hits (U2) and mispredicts. Bank 3 is occupied (U1), so it is not used; the first allocation goes to Bank 4 (U0). Bank 5 is occupied (U1), so it is not used; the second allocation for the HP branch goes to Bank 6 (U0).]
HP's double-entry allocation does not harm LP's allocation too much.
Double-entry allocation favors HP branches so that their new entries can survive longer and establish their usefulness.
There are two cases for U0:
1. The entry itself has not been useful recently, if ever;
2. The entry is a new allocation whose usefulness has not been established yet.
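The two-class allocation rule can be sketched as below. The search order (lowest eligible bank above the provider first), the per-bank record of the last two allocation kinds, its initial value of two one-entry allocations, and all names are assumptions for illustration; the U0 requirement, the single entry for LP branches, and the second, limited entry for HP branches follow the slides. Standard TAGE details such as decrementing U counters when no entry can be allocated are omitted.

```python
from collections import deque
from typing import Deque, Dict, List

NUM_BANKS = 7  # Bank 0 = bimodal, Banks 1..6 = tagged


def new_history() -> Dict[int, Deque[str]]:
    """Per-bank kinds of the last two allocations ("one" or "double").
    Assumed to start as two one-entry allocations."""
    return {b: deque(["one", "one"], maxlen=2) for b in range(1, NUM_BANKS)}


def allocate(provider: int, useful: List[int], is_hp: bool,
             history: Dict[int, Deque[str]]) -> List[int]:
    """Return the banks that receive new entries after a misprediction.

    useful[b] is the U value of the entry indexed in bank b (useful[0] unused).
    """
    allocated: List[int] = []
    for bank in range(provider + 1, NUM_BANKS):
        if useful[bank] != 0:
            continue  # occupied by an entry with U > 0: not used
        if not allocated:
            allocated.append(bank)  # first allocation (LP and HP)
            if not is_hp:
                break  # LP branches get only one entry
        elif all(kind == "one" for kind in history[bank]):
            allocated.append(bank)  # second allocation, HP only (limitation 2)
            break
    kind = "double" if len(allocated) == 2 else "one"
    for bank in allocated:
        history[bank].append(kind)
    return allocated
```

With the figure's example (provider Bank 2; U values 2, 1, 0, 1, 0 in Banks 2 to 6), `allocate(2, [0, 0, 2, 1, 0, 1, 0], False, new_history())` gives `[4]` for an LP branch, while an HP branch gets `[4, 6]`. A second HP misprediction right afterwards gets only Bank 4, because Bank 6's most recent allocation was double-entry.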
3. Performance Analysis
3.1 Penalty Predictor
[Figure: per-benchmark percentages and their average (Ave) for the three quantities below.]
1. Branches predicted to be HP: 50.2%;
2. Actual HP branches among all branches: 27%;
3. Branches predicted LP that turn out to be HP: 1.3%.
The penalty predictor covers 98.7% of actual HP branches.
Average penalty of branches predicted LP: 121 cycles; predicted HP: 212 cycles.
3.2 Two-class TAGE Predictor: MR
[Figure: MR of high-penalty branches (left) and low-penalty branches (right) for LTAGE and PSLTAGE at storage budgets of 8K, 16K, 32K, 64K, 128K, and 256K (MR axis 0.030-0.039). Annotated MR differences: HP: -5E-5, -4E-5, -6E-5, -4E-5, -8E-5, -7E-5 (all negative); LP: +7E-5, +4E-5, +3E-6, +3E-5, +2E-5, -9E-5.]
1. MR of HP branches is about 10% higher (loop branches; branches with cache misses);
2. The penalty-sensitive (PS) method effectively favors HP branches;
3. At 64KB: HP, -6E-5; LP, +3E-5. Overall, it is beneficial.
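One rough way to read "overall, it is beneficial" is to weight each class's MR change by its average penalty. This back-of-the-envelope estimate is not from the slides; it assumes that the per-class MR changes at 64KB apply to the predicted HP and LP classes (50.2% and 49.8% of branches, per Section 3.1) and that every misprediction costs its class's average penalty (212 and 121 cycles).

```python
# Assumed class mix and costs (see lead-in); MR changes are the 64KB values.
HP_SHARE, LP_SHARE = 0.502, 0.498
HP_PENALTY, LP_PENALTY = 212, 121  # average penalty in cycles
D_MR_HP, D_MR_LP = -6e-5, +3e-5    # MR change, PSLTAGE vs. LTAGE


def wrong_path_cycles_delta() -> float:
    """Estimated change in wrong-path cycles per executed branch."""
    hp_term = HP_SHARE * D_MR_HP * HP_PENALTY  # saving from fewer HP mispredictions
    lp_term = LP_SHARE * D_MR_LP * LP_PENALTY  # cost of extra LP mispredictions
    return hp_term + lp_term


print(f"{wrong_path_cycles_delta():.6f}")  # -0.004578
```

Under these assumptions the net change is about -0.0046 wrong-path cycles per branch: the HP saving (about -0.0064) outweighs the LP cost (about +0.0018) because HP mispredictions cost more.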
4. Summary
Our penalty-sensitive branch predictor works.
Penalty predictor: 50.2% of branches predicted HP, covering 98.7% of actual HP branches; average penalty, HP vs. LP: 212 vs. 121 cycles.
Two-class TAGE predictor: favors HP branches and is globally beneficial, but the benefit is limited.
Limited favoring mechanism: double-entry allocation for HP branches increases the chance that their new entries survive long enough to establish usefulness.
Future work: a more helpful favoring mechanism is needed.
Conclusion:
1. Mispredicted HP branches are more harmful;
2. Even if total MR doesn't decrease, performance could still be improved by favoring HP branches;
3. The approach can be applied to any predictor once an effective favoring mechanism is found.
Thanks!
Backup Slides: Penalty Predictor
[Figure: average penalty in cycles (0-300) of branches predicted LP (Lo_AvgPen) and predicted HP (Hi_AvgPen) for each benchmark trace (CLIENT01-16, INT01-06, MM01-07, SERVER01-05, WS01-06) and their average.]
Backup Slides: Two-class TAGE Predictor, MR
[Figure: the HP and LP MR charts from Section 3.2, repeated.]
$$\frac{-6\times10^{-5}}{-4.7\times10^{-4}} \approx 12.8\%$$
The penalty-sensitive method achieved 12.8% of the MR improvement on HP branches that doubling the storage budget would achieve.
Backup Slides: Loop Predictor
[Figure: MPPKI per benchmark trace (Client01-16, int01-06, mm01-07, server01-05, ws01-06) and average for PSTAGE (without loop) and PSLTAGE, with the average MPPKI normalized to 1000 (y-axis 0-1000). Bar labels (PSTAGE / PSLTAGE): 1643 / 1643, 2208 / 2208, 2839 / 2839, 8204 / 7920, 6624 / 6630, 2592 / 2515, 3673 / 3596; Average: 1000 / 987.]
The loop predictor gives a 1.3% improvement with only 0.53KB: very efficient.