Idealized Piecewise Linear Branch Prediction
Daniel A. Jiménez
Department of Computer Science, Rutgers University
and
Departament d'Arquitectura de Computadors
Universitat Politècnica de Catalunya
This Talk
Brief introduction to conditional branch prediction: some motivation, some background
Introduction to neural branch prediction: the perceptron predictor, mathematical intuition, some pictures and movies
Piecewise linear branch prediction: the algorithm and why it’s better
Idealized piecewise linear branch prediction: my entry into the championship branch predictor contest
Pipelining and Branches
Pipeline stages: instruction fetch, instruction decode, execute, memory access, write back.
Pipelining overlaps instructions to exploit parallelism, allowing the clock rate to be increased. Branches cause bubbles in the pipeline, where some stages are left idle behind an unresolved branch instruction.
Branch Prediction
Pipeline stages: instruction fetch, instruction decode, execute, memory access, write back.
A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path (speculative execution).
Branch predictors must be highly accurate to avoid mispredictions!
Branch Predictors Must Improve
The cost of a misprediction is proportional to pipeline depth.
As pipelines deepen, we need more accurate branch predictors.
The Pentium 4 pipeline has 31 stages!
Deeper pipelines allow higher clock rates by decreasing the delay of each pipeline stage.
Decreasing the misprediction rate from 9% to 4% results in a 31% speedup for a 32-stage pipeline (simulations with SimpleScalar/Alpha).
Previous Work on Branch Prediction
The architecture literature is replete with branch prediction papers
Most refine two-level adaptive branch prediction [Yeh & Patt 91]
A 1st-level table records recent global or per-branch pattern histories.
A 2nd-level table learns correlations between histories and outcomes.
Refinements focus on reducing destructive interference in the tables.
Some of the better refinements (not an exhaustive list):
gshare [McFarling '93]
agree [Sprangle et al. '97]
hybrid predictors [Evers et al. '96]
skewed predictors [Michaud et al. '97]
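For reference, the two-level idea as refined by gshare can be sketched as below. This is a minimal sketch and not taken from the talk: the table size `GSHARE_BITS` and the names `gshare_t`, `gshare_predict`, and `gshare_update` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GSHARE_BITS 12                      /* assumed: 4096-entry table */
#define GSHARE_SIZE (1u << GSHARE_BITS)

typedef struct {
    uint8_t  counters[GSHARE_SIZE];         /* 2-bit saturating counters, 0..3 */
    uint32_t history;                       /* global history, newest outcome in bit 0 */
} gshare_t;

/* gshare index: branch address XOR global history, truncated to the table size. */
static uint32_t gshare_index(const gshare_t *p, uint32_t pc) {
    return (pc ^ p->history) & (GSHARE_SIZE - 1);
}

/* Predict taken (1) when the selected counter is in its upper half. */
int gshare_predict(const gshare_t *p, uint32_t pc) {
    assert(p != 0);
    return p->counters[gshare_index(p, pc)] >= 2;
}

/* Move the counter toward the outcome, then shift the outcome into the history. */
void gshare_update(gshare_t *p, uint32_t pc, int taken) {
    assert(p != 0);
    uint8_t *c = &p->counters[gshare_index(p, pc)];
    if (taken) { if (*c < 3) (*c)++; }
    else       { if (*c > 0) (*c)--; }
    p->history = ((p->history << 1) | (taken ? 1u : 0u)) & (GSHARE_SIZE - 1);
}
```

Two branches whose address and history XOR to the same index share a counter; that sharing is the destructive interference the refinements above try to reduce.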
Conditional Branch Prediction is a Machine Learning Problem
The machine learns to predict conditional branches, so why not apply a machine learning algorithm?
Artificial neural networks are a simple model of the networks of brain cells; they learn to recognize and classify patterns.
We used fast and accurate perceptrons [Rosenblatt '62, Block '62] for dynamic branch prediction [Jiménez & Lin, HPCA 2001].
We were the first to use single-layer perceptrons and to achieve accuracy superior to PHT techniques. Previous work used LVQ and MLP for branch prediction [Vintan & Iridon '99].
Input and Output of the Perceptron
The inputs to the perceptron are branch outcome histories, just like in 2-level adaptive branch prediction.
The histories can be global, local (per-branch), or both (alloyed).
Conceptually, branch outcomes are represented as +1 for taken and -1 for not taken.
The output of the perceptron is non-negative if the branch is predicted taken and negative if it is predicted not taken.
Ideally, each static branch is allocated its own perceptron.
Branch-Predicting Perceptron
Inputs (x’s) are from branch history and are -1 or +1.
n + 1 small integer weights (w’s) are learned by on-line training.
Output (y) is the dot product of the x’s and w’s; predict taken if y ≥ 0.
Training finds correlations between history and outcome.
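The output computation described above can be sketched as follows. This is a minimal sketch: the function names and the array layout (index 0 holds the bias weight, and `x[0]` is unused because the bias input is always +1) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Compute y = w[0] + sum over i = 1..n of x[i] * w[i].
   w has n + 1 entries (w[0] is the bias weight); x has n + 1 entries with
   x[0] unused and x[1..n] each -1 (not taken) or +1 (taken). */
int perceptron_output(const int *w, const int *x, size_t n) {
    assert(w != NULL && x != NULL);
    int y = w[0];
    for (size_t i = 1; i <= n; i++) {
        assert(x[i] == 1 || x[i] == -1);
        /* Multiplying by +1 or -1 is just an add or a subtract. */
        y += (x[i] > 0) ? w[i] : -w[i];
    }
    return y;
}

/* Predict taken (1) when the output is non-negative, not taken (0) otherwise. */
int perceptron_predict(const int *w, const int *x, size_t n) {
    return perceptron_output(w, x, n) >= 0;
}
```

Because the inputs are only ever -1 or +1, no multiplier is needed: each weight is either added or subtracted.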
Training Algorithm
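The slide's figure is not part of this transcript. The sketch below follows the training rule published for the perceptron predictor [Jiménez & Lin, HPCA 2001]: update the weights when the prediction was wrong or when the magnitude of the output does not exceed the threshold θ. The function names and the weight limits `W_MAX` and `W_MIN` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define W_MAX 127     /* assumed: 8-bit signed weights */
#define W_MIN (-128)

static int clamp_weight(int v) {
    return v > W_MAX ? W_MAX : (v < W_MIN ? W_MIN : v);
}

/* Train on one resolved branch.
   y:     output computed at prediction time
   t:     +1 if the branch was taken, -1 if not
   theta: training threshold
   x:     history inputs, x[1..n] each -1 or +1 (x[0] unused) */
void perceptron_train(int *w, const int *x, size_t n, int y, int t, int theta) {
    assert(w != NULL && x != NULL);
    assert(t == 1 || t == -1);
    int predicted = (y >= 0) ? 1 : -1;
    if (predicted != t || abs(y) <= theta) {
        w[0] = clamp_weight(w[0] + t);            /* bias input is always +1 */
        for (size_t i = 1; i <= n; i++)
            /* +1 when outcome and history bit agree, -1 when they disagree */
            w[i] = clamp_weight(w[i] + t * x[i]);
    }
}
```

A correct prediction whose output magnitude already exceeds θ leaves the weights untouched, which is what stops the weights from growing without bound.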
What Do The Weights Mean?
The bias weight, w0:
Proportional to the probability that the branch is taken.
Doesn’t take into account other branches; just like a Smith predictor.
The correlating weights, w1 through wn:
wi is proportional to the probability that the predicted branch agrees with the ith branch in the history.
The dot product of the w’s and x’s:
wi × xi is proportional to the probability that the predicted branch is taken, based on the correlation between this branch and the ith branch.
The sum takes into account all estimated probabilities.
What’s θ?
The training threshold: it keeps the predictor from overtraining, so that it adapts quickly to changing behavior.
Organization of the Perceptron Predictor
Keeps a table of m perceptron weights vectors.
The table is indexed by branch address modulo m.
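The table organization can be sketched as follows. This is a minimal sketch: the sizes `NUM_PERCEPTRONS` (m) and `HISTORY_LENGTH` (n) and the names `perceptron_table_t` and `select_weights` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PERCEPTRONS 64   /* m, assumed */
#define HISTORY_LENGTH  16   /* n, assumed */

typedef struct {
    /* One row per perceptron: a bias weight plus n correlating weights. */
    int8_t w[NUM_PERCEPTRONS][HISTORY_LENGTH + 1];
} perceptron_table_t;

/* Select the weights vector for a branch: branch address modulo m. */
int8_t *select_weights(perceptron_table_t *t, uint32_t branch_address) {
    assert(t != 0);
    return t->w[branch_address % NUM_PERCEPTRONS];
}
```

Two branches whose addresses are congruent modulo m share one weights vector, so a finite table falls short of the ideal of one perceptron per static branch.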
[Jiménez & Lin, HPCA 2001]
Mathematical Intuition
A perceptron defines a hyperplane in (n+1)-dimensional space:
y = w0 + w1·x1 + w2·x2 + ... + wn·xn
For instance, in 2D space we have:
y = w0 + w1·x1
This is the equation of a line, the same as y = mx + b.
Mathematical Intuition continued
In 3D space, we have
y = w0 + w1·x1 + w2·x2
Or you can think of it as
z = ax + by + c
i.e. the equation of a plane in 3D space.
This hyperplane forms a decision surface separating predicted-taken from predicted-not-taken histories. The surface intersects the feature space. It is a linear surface, e.g. a line in 2D, a plane in 3D, a hyperplane in 4D, etc.
Example: AND
Here is a representation of the AND function.
White means false, black means true for the output.
-1 means false, +1 means true for the input.
-1 AND -1 = false
-1 AND +1 = false
+1 AND -1 = false
+1 AND +1 = true
Example: AND continued
A linear decision surface (i.e. a plane in 3D space) intersecting the feature space (i.e. the 2D plane where z=0) separates false from true instances
Example: AND continued
Watch a perceptron learn the AND function:
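The movie is not part of this transcript. As a stand-in, here is a minimal sketch of a two-input perceptron learning AND. It assumes the simplest variant of the training rule (update only on a misprediction, i.e. θ = 0) and zero initial weights; the function name `learn_and` is hypothetical.

```c
#include <assert.h>

/* Train a 2-input perceptron on the AND truth table until one full pass
   (epoch) makes no updates. Returns the number of epochs used, or -1 if
   max_epochs is reached first. w[0] is the bias weight. */
int learn_and(int w[3], int max_epochs) {
    static const int in[4][2]  = {{-1, -1}, {-1, +1}, {+1, -1}, {+1, +1}};
    static const int target[4] = {-1, -1, -1, +1};   /* -1 = false, +1 = true */
    assert(max_epochs > 0);
    w[0] = w[1] = w[2] = 0;
    for (int epoch = 1; epoch <= max_epochs; epoch++) {
        int updates = 0;
        for (int k = 0; k < 4; k++) {
            int y = w[0] + w[1] * in[k][0] + w[2] * in[k][1];
            int predicted = (y >= 0) ? 1 : -1;
            if (predicted != target[k]) {
                /* Nudge every weight toward the correct answer. */
                w[0] += target[k];
                w[1] += target[k] * in[k][0];
                w[2] += target[k] * in[k][1];
                updates++;
            }
        }
        if (updates == 0) return epoch;   /* every input classified correctly */
    }
    return -1;
}
```

With these assumptions a single update is enough: the weights settle at w0 = -1, w1 = 1, w2 = 1, giving the line x1 + x2 - 1 = 0, which puts only the input (+1, +1) on the non-negative side.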
Example: XOR
Here’s the XOR function:
-1 XOR -1 = false
-1 XOR +1 = true
+1 XOR -1 = true
+1 XOR +1 = false
Perceptrons cannot learn such linearly inseparable functions
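A short derivation shows why. Suppose weights w0, w1, w2 existed such that the output is non-negative exactly for the true rows of the XOR table. The four rows would require:

```latex
\begin{aligned}
(-1,-1)\mapsto\text{false}: &\quad w_0 - w_1 - w_2 < 0 \\
(+1,+1)\mapsto\text{false}: &\quad w_0 + w_1 + w_2 < 0 \\
(-1,+1)\mapsto\text{true}:  &\quad w_0 - w_1 + w_2 \ge 0 \\
(+1,-1)\mapsto\text{true}:  &\quad w_0 + w_1 - w_2 \ge 0
\end{aligned}
```

Adding the first two inequalities gives 2·w0 < 0, while adding the last two gives 2·w0 ≥ 0. Both cannot hold, so no single line separates the true inputs from the false ones.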
My Previous Work on Neural Predictors
The perceptron predictor uses only pattern history information.
The same weights vector is used for every prediction of a static branch.
The ith history bit could come from any number of static branches.
So the ith correlating weight is aliased among many branches.
The newer path-based neural predictor uses path information.
The ith correlating weight is selected using the ith branch address.
This allows the predictor to be pipelined, mitigating latency.
This strategy improves accuracy because of path information.
But there is now even more aliasing, since the ith weight could be used to predict many different branches.
Piecewise Linear Branch Prediction
Generalization of perceptron and path-based neural predictors
Ideally, there is a weight giving the correlation between each static branch b and each pair of a branch and a history position i in b’s history.
b might have 1000s of correlating weights or just a few, depending on the number of static branches in b’s history.
First, I’ll show a “practical version”
The Algorithm: Parameters and Variables
GHL – the global history length
GHR – a global history shift register
GA – a global array of previous branch addresses
W – an n × m × (GHL + 1) array of small integers
The Algorithm: Making a Prediction
Weights are selected based on the current branch and the ith most recent branch
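The slide's pseudocode is not part of this transcript. The sketch below reconstructs the prediction step from the parameters listed earlier (GHL, GHR, GA, W) and from the published description of piecewise linear branch prediction [Jiménez, ISCA 2005]. The sizes `N`, `M`, and `GHL` are assumed values, and the names `pwl_t`, `pwl_output`, and `pwl_predict` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N   8     /* assumed: rows, selected by the current branch address */
#define M   16    /* assumed: columns, selected by a previous branch address */
#define GHL 8     /* assumed global history length */

typedef struct {
    int8_t   W[N][M][GHL + 1];  /* weights; W[i][0][0] is the bias weight of row i */
    uint8_t  GHR[GHL + 1];      /* GHR[k] = 1 if the kth most recent branch was taken */
    uint32_t GA[GHL + 1];       /* GA[k] = address of the kth most recent branch */
} pwl_t;                        /* index 0 of GHR and GA is unused */

/* Sum the bias weight and one correlating weight per history position.
   The weight for position k is chosen by (current branch, address of the
   kth most recent branch, k); it is added if that branch was taken and
   subtracted otherwise. */
int pwl_output(const pwl_t *p, uint32_t address) {
    assert(p != 0);
    uint32_t i = address % N;
    int output = p->W[i][0][0];
    for (int k = 1; k <= GHL; k++) {
        uint32_t j = p->GA[k] % M;
        output += p->GHR[k] ? p->W[i][j][k] : -p->W[i][j][k];
    }
    return output;
}

/* Predict taken (1) when the output is non-negative. */
int pwl_predict(const pwl_t *p, uint32_t address) {
    return pwl_output(p, address) >= 0;
}
```

Compared with the perceptron predictor, the only change is the extra index j: the same branch can use different weights depending on which branches led up to it.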
The Algorithm: Training
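The slide's pseudocode is not part of this transcript. The sketch below reconstructs the training step from the published description [Jiménez, ISCA 2005]: train when the prediction was wrong or the output magnitude is below the threshold, then shift the outcome and address into the histories. It repeats its declarations so that it stands alone. `N`, `M`, `GHL`, and `THETA` are assumed values, and `pwl_t`, `bump`, and `pwl_train` are hypothetical names; `MAX_WEIGHT` and `MIN_WEIGHT` match the 7-bit weights in the talk's parameter list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define N   8            /* assumed */
#define M   16           /* assumed */
#define GHL 8            /* assumed */
#define THETA 20         /* assumed training threshold */
#define MAX_WEIGHT 63    /* 7-bit signed weights */
#define MIN_WEIGHT (-64)

typedef struct {
    int8_t   W[N][M][GHL + 1];
    uint8_t  GHR[GHL + 1];   /* GHR[k] = 1 if the kth most recent branch was taken */
    uint32_t GA[GHL + 1];    /* GA[k] = address of the kth most recent branch */
} pwl_t;

/* Saturating increment (up != 0) or decrement (up == 0). */
static void bump(int8_t *w, int up) {
    if (up) { if (*w < MAX_WEIGHT) (*w)++; }
    else    { if (*w > MIN_WEIGHT) (*w)--; }
}

/* output: value computed at prediction time; taken: resolved outcome. */
void pwl_train(pwl_t *p, uint32_t address, int output, int taken) {
    assert(p != 0);
    uint32_t i = address % N;
    int t = (taken != 0);
    if (abs(output) < THETA || (output >= 0) != t) {
        bump(&p->W[i][0][0], t);                    /* bias follows the outcome */
        for (int k = 1; k <= GHL; k++) {
            uint32_t j = p->GA[k] % M;
            bump(&p->W[i][j][k], p->GHR[k] == t);   /* up on agreement */
        }
    }
    /* Shift the new address and outcome into the histories. */
    for (int k = GHL; k > 1; k--) {
        p->GA[k]  = p->GA[k - 1];
        p->GHR[k] = p->GHR[k - 1];
    }
    p->GA[1]  = address;
    p->GHR[1] = (uint8_t)t;
}
```

The histories are updated on every branch, whether or not the weights were trained.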
Why It’s Better
Forms a piecewise linear decision surface.
Each piece is determined by the path to the predicted branch.
Can solve more problems than the perceptron.
Perceptron decision surface for XOR: doesn’t classify all inputs correctly.
Piecewise linear decision surface for XOR: classifies all inputs correctly.
Learning XOR
From a program that computes XOR using if statements
Panels: perceptron prediction, piecewise linear prediction
A Generalization of Neural Predictors
When m = 1, the algorithm is exactly the perceptron predictor.
W[n,1,h+1] holds n weights vectors.
When n = 1, the algorithm is the path-based neural predictor.
W[1,m,h+1] holds m weights vectors.
It can be pipelined to reduce latency.
The design space in between contains more accurate predictors.
If n is small, the predictor can still be pipelined to reduce latency.
Generalization Continued
Perceptron and path-based are the least accurate extremes of piecewise linear branch prediction!
Idealized Piecewise Linear Branch Prediction
Presented at the CBP workshop at MICRO 2004.
Hardware budget limited to 64K + 256 bits
No other limitations
Get rid of n and m
Allow 1st and 2nd dimensions of W to be unlimited
Now branches cannot alias one another; accuracy much better
One small problem: unlimited amount of storage required
How to squeeze this into 65,792 bits for the contest?
Hashing
The 3 indices of W (i, j, and k) index arbitrary numbers of weights.
Hash them into 0..N-1 to select weights in an array of size N.
Collisions will cause aliasing, but more uniformly distributed.
The hash function uses three primes, H1, H2, and H3.
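The hash formula itself is not part of this transcript. The sketch below shows one plausible way to fold three indices into a single array slot using three primes. The multiply-and-add mixing is an assumption, not the talk's formula, and the function name `weight_index` is hypothetical; the constants are the values from the talk's parameter list.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_WEIGHTS  8590u      /* N, from the talk's parameter list */
#define HASH_PRIME_1 511387U
#define HASH_PRIME_2 660509U
#define HASH_PRIME_3 1289381U

/* Map the three logical indices of W to one slot in 0..NUM_WEIGHTS-1.
   ASSUMPTION: each index is multiplied by its own prime and the products
   are summed (wrapping modulo 2^32) before reduction modulo N. */
uint32_t weight_index(uint32_t i, uint32_t j, uint32_t k) {
    uint32_t h = i * HASH_PRIME_1 + j * HASH_PRIME_2 + k * HASH_PRIME_3;
    uint32_t slot = h % NUM_WEIGHTS;
    assert(slot < NUM_WEIGHTS);
    return slot;
}
```

Whatever the exact mixing function, the point is the same: distinct (i, j, k) triples can land on the same slot, but the collisions are spread across the array instead of clustering on particular rows or columns.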
More Tricks
Weights are 7 bits; elements of GA are 8 bits.
Separate arrays for bias weights and correlating weights.
Using global and per-branch history: an array of per-branch histories is kept, alloyed with global history.
Slightly bias the predictor toward not taken.
Dynamically adjust history length, based on an estimate of the number of static branches.
Extra weights:
Extra bias weights for each branch
Extra correlating weights for more recent history bits
Inverted bias weights that track the opposite of the branch bias
Parameters to the Algorithm
#define NUM_WEIGHTS 8590
#define NUM_BIASES 599
#define INIT_GLOBAL_HISTORY_LENGTH 30
#define HIGH_GLOBAL_HISTORY_LENGTH 48
#define LOW_GLOBAL_HISTORY_LENGTH 18
#define INIT_LOCAL_HISTORY_LENGTH 4
#define HIGH_LOCAL_HISTORY_LENGTH 16
#define LOW_LOCAL_HISTORY_LENGTH 1
#define EXTRA_BIAS_LENGTH 6
#define HIGH_EXTRA_BIAS_LENGTH 2
#define LOW_EXTRA_BIAS_LENGTH 7
#define EXTRA_HISTORY_LENGTH 5
#define HIGH_EXTRA_HISTORY_LENGTH 7
#define LOW_EXTRA_HISTORY_LENGTH 4
#define INVERTED_BIAS_LENGTH 8
#define HIGH_INVERTED_BIAS_LENGTH 4
#define LOW_INVERTED_BIAS_LENGTH 9
#define NUM_HISTORIES 55
#define WEIGHT_WIDTH 7
#define MAX_WEIGHT 63
#define MIN_WEIGHT -64
#define INIT_THETA_UPPER 70
#define INIT_THETA_LOWER -70
#define HIGH_THETA_UPPER 139
#define HIGH_THETA_LOWER -136
#define LOW_THETA_UPPER 50
#define LOW_THETA_LOWER -46
#define HASH_PRIME_1 511387U
#define HASH_PRIME_2 660509U
#define HASH_PRIME_3 1289381U
#define TAKEN_THRESHOLD 3
All determined empirically with an ad hoc approach
Per-Benchmark Accuracy
I used several highly accurate predictors to compete with my predictor.
I measured the potential of my technique using an unlimited hardware budget.
Scores for the 6 Finalists (out of 18 entries)
1. Hongliang Gao, Huiyang Zhou 2.574
2. André Seznec 2.627
3. Gabriel Loh 2.700
4. Daniel A. Jiménez 2.742
5. Pierre Michaud 2.777
6. Veerle Desmet et al. 2.807
Scores are average MPKI (mispredictions per 1000 instructions), corrected, over a suite of 20 traces from Intel.
5 of the 6 finalists used ideas from the perceptron predictor in their entries
References
Jiménez and Lin, HPCA 2001 (perceptron predictor)
Jiménez and Lin, TOCS 2002 (global/local perceptron)
Jiménez, MICRO 2003 (path-based neural predictor)
Jiménez, ISCA 2005 (piecewise linear branch prediction)
Juan, Sanjeevan, Navarro, SIGARCH Comp. News, 1998 (dynamic history length fitting)
Skadron, Martonosi, Clark, PACT 2000 (alloyed history)
The End
Extra Slides
Following this slide are some slides cut from the talk to fit within time constraints.
Program to Compute XOR
#include <stdlib.h>   /* rand */

int f () {
    int a, b, x, i, s = 0;
    for (i = 0; i < 100; i++) {
        a = rand () % 2;
        b = rand () % 2;
        if (a) {
            if (b)
                x = 0;
            else
                x = 1;
        } else {
            if (b)
                x = 1;
            else
                x = 0;
        }
        if (x) s++; /* this is the branch */
    }
    return s;
}
Example: XOR continued
Watch a perceptron try to learn XOR