Hardware based Speculation
● Execute instructions along predicted execution paths, but only commit the results if the prediction was correct
● Instruction commit: allowing an instruction to update the register file when the instruction is no longer speculative
● Need an additional piece of hardware to prevent any irrevocable action until an instruction commits
– Reorder Buffer (ROB)
  ● In-order commit
  ● Stores instruction results before the instruction commits
  ● Clear ROB on misprediction
  ● Exceptions
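The reorder buffer idea above can be sketched in a few lines of Python. This is only a minimal illustration, not a model of any real processor: the names `RobEntry`, `ReorderBuffer`, `issue`, `write_result`, and `commit` are hypothetical, and stores, exceptions, and renaming tags are left out. Results may arrive out of order, but the register file is only updated at in-order commit, and a mispredicted branch clears every younger entry.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    instruction: str
    dest: Optional[str]          # destination register, None for a branch
    value: Optional[int] = None  # result is held here until commit
    done: bool = False           # True once the result has been written
    mispredicted: bool = False   # only meaningful for branches


class ReorderBuffer:
    """Hypothetical, simplified ROB: head of the deque is the oldest instruction."""

    def __init__(self) -> None:
        self.entries = deque()
        self.regs = {}  # architectural register file, updated only at commit

    def issue(self, instruction: str, dest: Optional[str]) -> RobEntry:
        entry = RobEntry(instruction, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry: RobEntry, value=None, mispredicted=False) -> None:
        entry.value = value
        entry.mispredicted = mispredicted
        entry.done = True

    def commit(self) -> list:
        """Commit finished instructions from the head, in program order."""
        committed = []
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            committed.append(head.instruction)
            if head.dest is not None:
                self.regs[head.dest] = head.value
            if head.mispredicted:
                self.entries.clear()  # discard all younger, speculative work
                break
        return committed


if __name__ == "__main__":
    rob = ReorderBuffer()
    load = rob.issue("L.D F0, 0(R1)", "F0")
    mul = rob.issue("MUL.D F4, F0, F2", "F4")
    rob.write_result(mul, 6)   # finishes first, but is not at the head
    print(rob.commit())        # nothing commits yet
    rob.write_result(load, 3)
    print(rob.commit())        # both commit, in program order
```

In the demo, the multiply finishes before the load but cannot update `F4` until the load at the head of the buffer has committed.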
ROB – Loop Based Example
ROB
Entry  Busy  Instruction          State         Destination  Value
1      no    L.D F0, 0(R1)        Commit        F0           Mem[0+Regs[R1]]
2      no    MUL.D F4, F0, F2     Commit        F4           #1 * Regs[F2]
3      yes   S.D F4, 0(R1)        Write Result  0+Regs[R1]   #2
4      yes   DADDIU R1, R1, #-8   Write Result  R1           Regs[R1]-8
5      yes   BNE R1, R2, LOOP     Write Result  -            -
6      yes   L.D F0, 0(R1)        Write Result  F0           Mem[#4]
7      yes   MUL.D F4, F0, F2     Write Result  F4           #6 * Regs[F2]
8      yes   S.D F4, 0(R1)        Write Result  0+Regs[R1]   #7
9      yes   DADDIU R1, R1, #-8   Write Result  R1           #4 - #8
10     yes   BNE R1, R2, LOOP     Write Result  -            -
Multiple Issue and Static Scheduling
To achieve CPI < 1, need to complete multiple instructions per clock
● Statically scheduled superscalar processors
● VLIW (Very Long Instruction Word) processors
● Dynamically scheduled superscalar processors
Dynamic Scheduling + Multiple Issue + Speculation
Limit the number of instructions of a given class that can be issued in a “bundle”, i.e., one FP, one integer, one load, one store
Examine all the dependencies among the instructions in the bundle
Also need multiple completion/commit
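As a rough illustration of the two issue-time checks above, the following Python sketch forms a bundle under per-class limits and then lists the dependences between instructions inside it. The limits follow the slide's example (one FP, one integer, one load, one store). The names `Instr`, `CLASS_LIMITS`, and `form_bundle` are hypothetical, and real issue logic performs these checks in parallel hardware rather than in a loop.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed per-bundle limits, taken from the example on the slide.
CLASS_LIMITS = {"fp": 1, "int": 1, "load": 1, "store": 1}


@dataclass
class Instr:
    text: str
    cls: str                 # instruction class, e.g. "load" or "int"
    dest: Optional[str]      # register written, None if there is none
    srcs: Tuple[str, ...]    # registers read


def form_bundle(window: List[Instr]):
    """Take instructions in program order until a class limit is reached.

    Returns the bundle and the read-after-write dependences inside it as
    (producer index, consumer index, register) triples.
    """
    used = {}
    bundle = []
    for ins in window:
        # Assumption: a class not listed in CLASS_LIMITS allows one per bundle.
        limit = CLASS_LIMITS.get(ins.cls, 1)
        if used.get(ins.cls, 0) >= limit:
            break  # in-order issue stops at the first instruction that does not fit
        used[ins.cls] = used.get(ins.cls, 0) + 1
        bundle.append(ins)

    deps = []
    last_writer = {}  # register -> index of the latest writer in the bundle
    for i, ins in enumerate(bundle):
        for reg in ins.srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return bundle, deps


if __name__ == "__main__":
    loop = [
        Instr("LD R2, 0(R1)", "load", "R2", ("R1",)),
        Instr("DADDIU R2, R2, #1", "int", "R2", ("R2",)),
        Instr("SD R2, 0(R1)", "store", None, ("R2", "R1")),
        Instr("DADDIU R1, R1, #8", "int", "R1", ("R1",)),
    ]
    bundle, deps = form_bundle(loop)
    print([i.text for i in bundle])
    print(deps)
```

With this loop body the second `DADDIU` does not fit, because the bundle already holds one integer instruction, and the dependences found are the two chains through `R2`.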
Dynamic Scheduling + Multiple Issue
Iteration  Instruction          Issues at clock  Executes at clock  Mem Access at clock  Write CDB at clock
1          LD R2, 0(R1)         1                2                  3                    4
1          DADDIU R2, R2, #1    1                5                  -                    6
1          SD R2, 0(R1)         2                3                  7                    -
1          DADDIU R1, R1, #8    2                3                  -                    4
1          BNE R2, R3, L        3                7                  -                    -
2          LD R2, 0(R1)         4                8                  9                    10
2          DADDIU R2, R2, #1    4                11                 -                    12
2          SD R2, 0(R1)         5                9                  13                   -
2          DADDIU R1, R1, #8    5                8                  -                    9
2          BNE R2, R3, L        6                13                 -                    -
3          LD R2, 0(R1)         7                14                 15                   16
3          DADDIU R2, R2, #1    7                17                 -                    18
3          SD R2, 0(R1)         8                15                 19                   -
3          DADDIU R1, R1, #8    8                14                 -                    15
3          BNE R2, R3, L        9                19                 -                    -
2-way Superscalar
Dynamic Scheduling + Multiple Issue + Speculation
Iteration  Instruction          Issues at clock  Executes at clock  Mem Access at clock  Write CDB at clock  Commits at clock
1          LD R2, 0(R1)         1                2                  3                    4                   5
1          DADDIU R2, R2, #1    1                5                  -                    6                   7
1          SD R2, 0(R1)         2                3                  7                    -                   -
1          DADDIU R1, R1, #8    2                3                  -                    4                   8
1          BNE R2, R3, L        3                7                  -                    -                   8
2          LD R2, 0(R1)         4                5                  6                    7                   9
2          DADDIU R2, R2, #1    4                8                  -                    9                   10
2          SD R2, 0(R1)         5                6                  10                   -                   -
2          DADDIU R1, R1, #8    5                6                  -                    7                   11
2          BNE R2, R3, L        6                10                 -                    -                   11
3          LD R2, 0(R1)         7                8                  9                    10                  12
3          DADDIU R2, R2, #1    7                11                 -                    12                  13
3          SD R2, 0(R1)         8                9                  13                   -                   -
3          DADDIU R1, R1, #8    8                9                  -                    10                  14
3          BNE R2, R3, L        9                13                 -                    -                   14
2-way Superscalar
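A short calculation makes the difference between the two timing tables concrete. The sketch below takes only the clock at which the `LD` of each iteration begins execution (2, 8, 14 without speculation; 2, 5, 8 with speculation) and derives the cycles per loop iteration and the resulting instructions per clock. The function names are hypothetical, and three iterations is a small sample, so the figures are indicative only.

```python
# Clock at which the LD of iterations 1, 2, 3 begins execution,
# copied from the "Executes at clock" column of each table.
LD_EXECUTE = {
    "no speculation": [2, 8, 14],
    "speculation": [2, 5, 8],
}
INSTRUCTIONS_PER_ITERATION = 5  # LD, DADDIU, SD, DADDIU, BNE


def cycles_per_iteration(clocks):
    """Average distance between the same instruction in successive iterations."""
    return (clocks[-1] - clocks[0]) / (len(clocks) - 1)


def summarize():
    """Map each scheme to (cycles per iteration, instructions per clock)."""
    result = {}
    for name, clocks in LD_EXECUTE.items():
        per_iter = cycles_per_iteration(clocks)
        result[name] = (per_iter, INSTRUCTIONS_PER_ITERATION / per_iter)
    return result


if __name__ == "__main__":
    for name, (cycles, ipc) in summarize().items():
        print(f"{name}: {cycles:.0f} cycles/iteration, IPC = {ipc:.2f}")
```

Without speculation each iteration waits for the previous branch to resolve and takes 6 clocks (IPC about 0.83). With speculation an iteration starts every 3 clocks (IPC about 1.67), which is the CPI < 1 target that multiple issue aims for.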
Literature on Processors
● Efficient Reading of Papers in Science and Technology
● Yeager, The MIPS R10000 Processor, MICRO, 1996.
● Hinton et al., The Microarchitecture of the Pentium 4 Processor, Intel Technology Journal Q1, 2001.
● Smith and Sohi, Microarchitecture of Superscalar Processors, Proc. of IEEE, 1995.
● Kahle et al., Introduction to the Cell multiprocessor, IBM J. RES. & DEV., 2005.
● Hammarlund et al., Haswell: The fourth generation Intel Processor, MICRO, 2014.
Tomasulo's - Loop based Example
Clock cycle: 1
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      -        -
MUL.D F4, F0, F2   √      -        -
S.D F4, 0(R1)      √      -        -
L.D F0, 0(R1)      √      -        -
MUL.D F4, F0, F2   √      -        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   yes   Load   -                  -                  -      -      Regs[R1] + 0
Load2   yes   Load   -                  -                  -      -      Regs[R1] - 8
Add1    no    -      -                  -                  -      -      -
Mult1   yes   MUL    -                  Regs[F2]           Load1  -      -
Mult2   yes   MUL    -                  Regs[F2]           Load2  -      -
Store1  yes   Store  -                  -                  -      Mult1  Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 2
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        -
MUL.D F4, F0, F2   √      -        -
S.D F4, 0(R1)      √      -        -
L.D F0, 0(R1)      √      -        -
MUL.D F4, F0, F2   √      -        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   yes   Load   -                  -                  -      -      Regs[R1] + 0
Load2   yes   Load   -                  -                  -      -      Regs[R1] - 8
Add1    no    -      -                  -                  -      -      -
Mult1   yes   MUL    -                  Regs[F2]           Load1  -      -
Mult2   yes   MUL    -                  Regs[F2]           Load2  -      -
Store1  yes   Store  -                  -                  -      Mult1  Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 3
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      -        -
S.D F4, 0(R1)      √      -        -
L.D F0, 0(R1)      √      √        -
MUL.D F4, F0, F2   √      -        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   yes   Load   -                  -                  -      -      Regs[R1] - 8
Add1    no    -      -                  -                  -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]  Regs[F2]           -      -      -
Mult2   yes   MUL    -                  Regs[F2]           Load2  -      -
Store1  yes   Store  -                  -                  -      Mult1  Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2  -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 4
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        -
S.D F4, 0(R1)      √      -        -
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      -        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]  Regs[F2]           -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]  Regs[F2]           -      -      -
Store1  yes   Store  -                  -                  -      Mult1  Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 5
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        -
S.D F4, 0(R1)      √      -        -
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      -        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]  Regs[F2]           -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]  Regs[F2]           -      -      -
Store1  yes   Store  -                  -                  -      Mult1  Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 6
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        -
S.D F4, 0(R1)      √      -        -
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]  Regs[F2]           -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]  Regs[F2]           -      -      -
Store1  yes   Store  -                  -                  -      Mult1  Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 7
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        -
S.D F4, 0(R1)      √      -        -
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   yes   MUL    Mem[Regs[R1] + 0]  Regs[F2]           -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]  Regs[F2]           -      -      -
Store1  yes   Store  -                  -                  -      Mult1  Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 8
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      -        -
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   no    -      -                  -                  -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]  Regs[F2]           -      -      -
Store1  yes   Store  -                  Mul[F4]            -      -      Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 9
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        -
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        -
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   no    -      -                  -                  -      -      -
Mult2   yes   MUL    Mem[Regs[R1] - 8]  Regs[F2]           -      -      -
Store1  yes   Store  -                  Mul[F4]            -      -      Regs[R1] + 0
Store2  yes   Store  -                  -                  -      Mult2  Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   Mult2  -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 10
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      -        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   no    -      -                  -                  -      -      -
Mult2   no    -      -                  -                  -      -      -
Store1  no    -      -                  -                  -      -      -
Store2  yes   Store  -                  Mem[Regs[R1] - 8]  -      -      Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   -      -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 11
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        -
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   no    -      -                  -                  -      -      -
Mult2   no    -      -                  -                  -      -      -
Store1  no    -      -                  -                  -      -      -
Store2  yes   Store  -                  Mem[Regs[R1] - 8]  -      -      Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   -      -   -   -    -    ...  -
Tomasulo's - Loop based Example
Clock cycle: 12
Instruction Status
Instruction        Issue  Execute  Write result
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        √
L.D F0, 0(R1)      √      √        √
MUL.D F4, F0, F2   √      √        √
S.D F4, 0(R1)      √      √        √
Reservation Stations
Name    Busy  Op     Vj                 Vk                 Qj     Qk     A
Load1   no    -      -                  -                  -      -      -
Load2   no    -      -                  -                  -      -      -
Add1    no    -      -                  -                  -      -      -
Mult1   no    -      -                  -                  -      -      -
Mult2   no    -      -                  -                  -      -      -
Store1  no    -      -                  -                  -      -      -
Store2  yes   Store  -                  Mem[Regs[R1] - 8]  -      -      Regs[R1] - 8
M:4
Register Status
Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     -      -   -      -   -   -    -    ...  -
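The write-result step that this example walks through can be sketched as a broadcast on the common data bus (CDB). The Python below is a simplified illustration under stated assumptions: `Station` and `broadcast` are hypothetical names, values are kept as strings such as `Mem[Regs[R1] + 0]` to mirror the tables, and execution latency and the address field are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Station:
    busy: bool = False
    op: str = ""
    vj: Optional[str] = None  # operand values, once available
    vk: Optional[str] = None
    qj: Optional[str] = None  # names of the stations that will produce them
    qk: Optional[str] = None


def broadcast(tag: str, value: str,
              stations: Dict[str, Station],
              reg_status: Dict[str, str]) -> None:
    """Write result: station `tag` puts `value` on the CDB and is freed."""
    # Every station waiting on this tag captures the value.
    for rs in stations.values():
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    # A register is updated only if it is still waiting on this tag.
    for reg in [r for r, q in reg_status.items() if q == tag]:
        del reg_status[reg]
    stations[tag] = Station()  # the producing station becomes free


if __name__ == "__main__":
    stations = {
        "Load1": Station(True, "Load"),
        "Load2": Station(True, "Load"),
        "Mult1": Station(True, "MUL", vk="Regs[F2]", qj="Load1"),
        "Mult2": Station(True, "MUL", vk="Regs[F2]", qj="Load2"),
    }
    reg_status = {"F0": "Load2", "F4": "Mult2"}
    broadcast("Load1", "Mem[Regs[R1] + 0]", stations, reg_status)
    print(stations["Mult1"])
    print(reg_status)
```

When `Load1` writes its result, `Mult1` captures the value, but `F0` stays mapped to `Load2`: the second iteration's load has already renamed `F0`, so the older result never reaches the register file. This renaming is what lets the two loop iterations overlap.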