ece 486/586 computer architecture lecture # 16web.cecs.pdx.edu/~zeshan/ece586_lec16.pdf ·...
![Page 1: ECE 486/586 Computer Architecture Lecture # 16web.cecs.pdx.edu/~zeshan/ece586_lec16.pdf · 2019-05-28 · Computer Architecture Lecture # 16 Spring 2019 Portland State University](https://reader034.vdocuments.net/reader034/viewer/2022042914/5f4f54ff2afa395c63034989/html5/thumbnails/1.jpg)
ECE 486/586
Computer Architecture
Lecture # 16
Spring 2019
Portland State University
Lecture Topics
• Branch Prediction
Reference:
• Chapter 3: Section 3.3
Why Predict Branches?
• The decision about control flow (where to fetch the next instruction from?) is made in the fetch stage
• The branch penalty is non-zero because by the time the processor computes the branch outcome (in the decode stage), a useless instruction may already have been fetched and must be discarded
• To prevent the fetching of a useless instruction, the processor needs to know the branch outcome in the fetch stage
• This involves the following steps:
– Anticipating that the instruction being fetched is a branch instruction
– Predicting whether the branch instruction will be taken or not taken
– Predicting the branch target address (for a taken branch)
Basic Branch Prediction
• Branch prediction buffer (branch history table)
• Memory indexed by low order bits of branch instruction address
• Stores previous branch outcomes to predict next outcome
• Memory is not tagged (unlike cache)
• Consequence: entry may reflect a different branch (aliasing)
• Index: PC[11:2] (10 bits), giving $2^{10}$ = 1K entries
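The indexing and aliasing described above can be sketched in a few lines. This is only an illustration: it assumes 4-byte, word-aligned instructions (which is why the two lowest PC bits are dropped), and the two branch addresses are hypothetical values chosen to collide.

```python
INDEX_BITS = 10                      # PC[11:2] gives 10 index bits, as on the slide
NUM_ENTRIES = 1 << INDEX_BITS        # 2^10 = 1K entries

def bht_index(pc: int) -> int:
    """Return the table index formed from PC[11:2]."""
    return (pc >> 2) & (NUM_ENTRIES - 1)

# Two hypothetical branch addresses that differ only above bit 11.
pc_a = 0x0000_1040
pc_b = 0x0000_5040

# Because the table is untagged, both branches share one entry (aliasing).
print(bht_index(pc_a), bht_index(pc_b))
```

A tagged structure such as a cache would detect that the two addresses differ; the untagged table cannot, so each branch may see history left behind by the other.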
Static Branch Prediction
• In static branch prediction, the prediction made for a conditional branch remains constant (static) throughout the execution of a program
• Example 1: Always-predict-not-taken
– Simplest form of prediction, always fetch next instruction in the sequential order
– In case of a misprediction, the incorrectly fetched instruction is discarded and branch penalty is incurred
– Low prediction accuracy because many branches in the program are taken
• Typically, branch outcomes are not completely random
• In a loop with many iterations, forward branches (beginning of loop) are mostly not taken and backward branches (end of loop) are mostly taken
• Example 2: Predict not-taken for forward branches and taken for backward branches
– Improves prediction accuracy as compared to the always-not-taken prediction
– Mispredictions still happen during the last loop iteration
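A minimal sketch of the second static scheme follows. It assumes a branch counts as "backward" when its target address is lower than its own address; the function names and the addresses are hypothetical and serve only as an illustration.

```python
def predict_static(branch_pc: int, target_pc: int) -> bool:
    """Return True (predict taken) for backward branches, False for forward ones."""
    return target_pc < branch_pc

def loop_accuracy(iterations: int) -> float:
    """Accuracy on a loop-closing backward branch that is taken on every
    iteration except the last one."""
    branch_pc, target_pc = 0x120, 0x100   # hypothetical addresses; target is earlier
    outcomes = [True] * (iterations - 1) + [False]
    prediction = predict_static(branch_pc, target_pc)   # never changes: static
    correct = sum(prediction == actual for actual in outcomes)
    return correct / iterations

print(loop_accuracy(10))
```

The single misprediction per loop execution is the one on the final iteration, when the backward branch falls through.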
Dynamic Branch Prediction
• Outcomes for a branch instruction often change during program execution
– Static prediction may result in a high misprediction rate
• But, outcomes for a particular branch often follow a predictable pattern
• Key idea behind dynamic branch prediction:
– Track the past outcomes for a branch instruction to make predictions about future outcomes
• In its simplest form, a dynamic prediction algorithm can use the result of the most recent execution of a branch instruction
– This result can be captured in a single bit (e.g., “0” if the branch was taken and “1” if the branch was not taken)
– The processor assumes that the next time the branch instruction is executed, its outcome will be the same as the last time
1-bit Branch Prediction
• The algorithm is implemented by a 2-state state machine:
– LT: Branch is likely to be taken
– LNT: Branch is likely not to be taken
• The prediction for a branch is based on the current state of the state machine
• The state transitions are based on the actual outcome computed after the branch has been executed
Example
• Consider a branch instruction which is executed 6 times in a program. The actual outcomes of the branch are T, T, NT, T, T, NT where “T” = Taken and “NT” = Not taken. Assume that the 1-bit branch predictor starts in the LNT state. What predictions will it make for each instance of the branch?
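| Instance | Current State | Prediction | Actual Outcome | Next State |
|---|---|---|---|---|
| 1 | LNT | NT | T | LT |
| 2 | LT | T | T | LT |
| 3 | LT | T | NT | LNT |
| 4 | LNT | NT | T | LT |
| 5 | LT | T | T | LT |
| 6 | LT | T | NT | LNT |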
Prediction Accuracy = 2/6
Mispredictions happen during both the first and last iterations of the loop, so one bit of state is not enough to capture the branch outcome pattern accurately
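The table can be reproduced with a short simulation of the 2-state machine. This is a sketch under the assumptions stated in the example (start state LNT, outcomes T, T, NT, T, T, NT); the function name `simulate_1bit` is introduced here purely for illustration.

```python
def simulate_1bit(outcomes, state="LNT"):
    """Run a 1-bit predictor over a list of 'T'/'NT' outcomes.

    Returns (rows, correct), where each row is
    (instance, current_state, prediction, actual, next_state).
    """
    rows, correct = [], 0
    for i, actual in enumerate(outcomes, start=1):
        prediction = "T" if state == "LT" else "NT"
        next_state = "LT" if actual == "T" else "LNT"   # remember only the last outcome
        correct += prediction == actual
        rows.append((i, state, prediction, actual, next_state))
        state = next_state
    return rows, correct

rows, correct = simulate_1bit(["T", "T", "NT", "T", "T", "NT"])
for row in rows:
    print(*row)
print(f"Prediction accuracy = {correct}/{len(rows)}")
```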
2-bit Branch Prediction
• ST: Strongly likely to be taken
• LT: Likely to be taken
• LNT: Likely not to be taken
• SNT: Strongly likely not to be taken
• Branch predicted as Taken in the ST and LT states
• Branch predicted as Not taken in the LNT and SNT states
Example
• Consider a branch instruction which is executed 6 times in a program. The actual outcomes of the branch are T, T, NT, T, T, NT where “T” = Taken and “NT” = Not taken. Assume that the 2-bit branch predictor starts in the LT state. What predictions will it make for each instance of the branch?
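| Instance | Current State | Prediction | Actual Outcome | Next State |
|---|---|---|---|---|
| 1 | LT | T | T | ST |
| 2 | ST | T | T | ST |
| 3 | ST | T | NT | LT |
| 4 | LT | T | T | ST |
| 5 | ST | T | T | ST |
| 6 | ST | T | NT | LT |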
Prediction Accuracy = 4/6
Mispredictions happen only during the last iteration of the loop => fewer mispredictions than 1-bit prediction
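The same trace can be run through a 2-bit predictor. The state diagram itself is not reproduced in this transcript, so the sketch below assumes the common saturating-counter transitions (SNT, LNT, LT, ST as counter values 0 to 3, moving one step per outcome). Other 2-bit schemes differ in the LT-on-NT and LNT-on-T transitions, but those are not exercised by this example. The function name is illustrative.

```python
STATES = ["SNT", "LNT", "LT", "ST"]   # assumed encoding: counter values 0..3

def simulate_2bit(outcomes, state="LT"):
    """Run a 2-bit saturating-counter predictor over 'T'/'NT' outcomes.

    Returns (rows, correct), where each row is
    (instance, current_state, prediction, actual, next_state).
    """
    counter = STATES.index(state)
    rows, correct = [], 0
    for i, actual in enumerate(outcomes, start=1):
        prediction = "T" if counter >= 2 else "NT"       # LT and ST predict taken
        if actual == "T":
            nxt = min(counter + 1, 3)                    # saturate at ST
        else:
            nxt = max(counter - 1, 0)                    # saturate at SNT
        correct += prediction == actual
        rows.append((i, STATES[counter], prediction, actual, STATES[nxt]))
        counter = nxt
    return rows, correct

rows, correct = simulate_2bit(["T", "T", "NT", "T", "T", "NT"])
for row in rows:
    print(*row)
print(f"Prediction accuracy = {correct}/{len(rows)}")
```

A single not-taken outcome moves the state from ST to LT but does not flip the prediction, which is why the first iteration of the next loop execution is still predicted correctly.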
Prediction Accuracy of 4K 2-bit Predictor
Having more Entries Isn’t the Solution
Correlating Branch Predictors
• Simple 2-bit prediction schemes use the branch history of a single branch to predict the future behavior of that branch. This is called local branch prediction
• The behavior of other branches may have an impact on the current branch
• Outcomes of different branches are often correlated
Example:
if (a == 2)
    a = 0;
if (b == 2)
    b = 0;
if (a == b) {
}
If the first two branches are not taken, then the third one is taken. Local branch prediction cannot capture this behavior
    DADDI R3, R1, -2
    BNEZ R3, L1 ; branch taken if a != 2
    DADD R1, R0, R0 ; a = 0
L1: DADDI R3, R2, -2
    BNEZ R3, L2 ; branch taken if b != 2
    DADD R2, R0, R0 ; b = 0
L2: DSUB R3, R1, R2 ; R3 = a - b
    BEQZ R3, L3 ; branch taken if a == b
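The correlation can be checked by mirroring the three branches in a small model. This is a sketch only: the labels `b1`, `b2`, `b3` for the three branches and the function name are introduced here for illustration, and "taken" follows the assembly (BNEZ is taken when the compared value is not 2, BEQZ when a equals b).

```python
def branch_outcomes(a: int, b: int):
    """Return (b1, b2, b3); True means the corresponding branch is taken."""
    b1 = a != 2          # first BNEZ: taken when a != 2, skipping a = 0
    if not b1:
        a = 0
    b2 = b != 2          # second BNEZ: taken when b != 2, skipping b = 0
    if not b2:
        b = 0
    b3 = a == b          # BEQZ: taken when a == b
    return b1, b2, b3

# Whenever the first two branches are not taken, the third must be taken.
for a in range(4):
    for b in range(4):
        b1, b2, b3 = branch_outcomes(a, b)
        if not b1 and not b2:
            assert b3
        print(a, b, b1, b2, b3)
```

A predictor that looks only at the third branch's own history has no way to see that both earlier branches fell through, which is the information that determines its outcome here.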
Correlating Branch Predictor with 2-bit Global History Register
• Correlating (or 2-level) predictors use the behavior of other branches (global branch history) to make branch predictions
• The branch history can be extended to m bits, recording the outcomes of the last m branches
• Requires $2^m$ tables of length $2^{\text{branch address bits used}}$
• Global branch history is implemented as an m-bit shift register where each bit records whether a branch was taken or not taken
Figure: the branch address selects a set of 2-bit per-branch predictors, and the 2-bit global branch history selects which one of them supplies the prediction
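One possible way to model this structure is sketched below. It is not taken from the lecture: the class name, the choice of saturating counters initialised to 0, and the use of PC bits above bit 1 as the index are all assumptions made for illustration.

```python
class CorrelatingPredictor:
    """Global-history predictor: an m-bit history selects one of 2^m tables
    of n-bit saturating counters, indexed by low-order branch address bits."""

    def __init__(self, m: int, n: int, index_bits: int):
        self.m, self.n, self.index_bits = m, n, index_bits
        self.history = 0   # m-bit global shift register, newest outcome in bit 0
        # table[history][index] holds one n-bit counter (assumed to start at 0)
        self.table = [[0] * (1 << index_bits) for _ in range(1 << m)]

    def _index(self, pc: int) -> int:
        return (pc >> 2) & ((1 << self.index_bits) - 1)

    def predict(self, pc: int) -> bool:
        counter = self.table[self.history][self._index(pc)]
        return counter >= 1 << (self.n - 1)   # upper half of the range means taken

    def update(self, pc: int, taken: bool) -> None:
        row, i = self.table[self.history], self._index(pc)
        top = (1 << self.n) - 1
        row[i] = min(row[i] + 1, top) if taken else max(row[i] - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

def count_correct(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number of correct predictions."""
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct

pattern = [True, False] * 20   # a hypothetical branch that strictly alternates
print(count_correct(CorrelatingPredictor(0, 2, 10), 0x40, pattern))
print(count_correct(CorrelatingPredictor(2, 2, 10), 0x40, pattern))
```

With no history bits the model degenerates to a plain 2-bit predictor, which never settles on an alternating branch. With two history bits, each history value gets its own counter, so the mispredictions in this model are confined to the warm-up period.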
Correlating Branch Predictor with m-bit Global History Register
An (m,n) correlating predictor uses the behavior of the last m branches to choose from $2^m$ branch predictors, each of which is an n-bit predictor
Total number of bits = $2^m \times n \times$ (number of entries in each prediction table) = $2^m \times n \times 2^{\text{branch address bits used}}$
For a predictor that does not use any global history, m = 0, e.g., a (0,2) predictor is a 2-bit predictor with no global history
Figure: the branch address selects a set of n-bit per-branch predictors, and the m-bit global branch history selects which one of them supplies the prediction
Correlating Predictor Examples
Question: How many bits are in the (0,2) branch predictor with 4K entries? How many entries are in a (2,2) predictor with the same number of bits?
Solution:
Number of bits = $2^m \times n \times 2^{\text{branch address bits used}}$
For the (0,2) predictor:
Number of bits = $2^0 \times 2 \times 4\text{K}$ = 8K bits
For the (2,2) predictor:
Number of bits = 8K
8K = $2^2 \times 2 \times$ (number of predictor entries)
=> Number of predictor entries = 1K
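The arithmetic in this solution can be checked with two small helper functions. Their names are hypothetical, and K is taken as 1024.

```python
K = 1024

def predictor_bits(m: int, n: int, entries: int) -> int:
    """Total storage of an (m, n) predictor with the given entries per table."""
    return (2 ** m) * n * entries

def entries_for_budget(m: int, n: int, total_bits: int) -> int:
    """Entries per table that an (m, n) predictor can afford within total_bits."""
    return total_bits // ((2 ** m) * n)

bits = predictor_bits(0, 2, 4 * K)          # the (0,2) predictor with 4K entries
entries = entries_for_budget(2, 2, bits)    # a (2,2) predictor with the same budget
print(bits // K, "K bits;", entries // K, "K entries")
```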
Comparison of 2-bit Predictors