Post on 06-Apr-2020


Hardware based Speculation

● Execute instructions along predicted execution paths, but only commit the results if the prediction was correct
● Instruction commit: allowing an instruction to update the register file once the instruction is no longer speculative
● Need an additional piece of hardware to prevent any irrevocable action until an instruction commits

– Reorder Buffer (ROB)
  ● In-order commit
  ● Stores instruction results before the instruction commits
  ● Clear ROB on misprediction
  ● Exceptions
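The reorder buffer behaviour listed above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real processor: the class names `RobEntry` and `ReorderBuffer`, the `mispredicted` flag and the dictionary used as a register file are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: Optional[str]          # register to update at commit, None for a branch
    value: Optional[int] = None  # result, filled in at write-result time
    ready: bool = False          # True once the result has been written
    mispredicted: bool = False   # set when a branch resolves against its prediction


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # leftmost entry is the oldest instruction
        self.regs = {}           # committed (non-speculative) register file

    def issue(self, dest):
        entry = RobEntry(dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        entry.value, entry.ready, entry.mispredicted = value, True, mispredicted

    def commit(self):
        """Commit ready instructions from the head, strictly in program order."""
        committed = 0
        while self.entries and self.entries[0].ready:
            head = self.entries.popleft()
            committed += 1
            if head.mispredicted:
                self.entries.clear()   # squash every younger instruction
                break
            if head.dest is not None:
                self.regs[head.dest] = head.value
        return committed


rob = ReorderBuffer()
load = rob.issue("F0")
mul = rob.issue("F4")
branch = rob.issue(None)
wrong_path = rob.issue("F0")

rob.write_result(mul, 6)          # finishes first, but cannot commit yet
early = rob.commit()              # nothing commits: the older load is pending
rob.write_result(load, 3)
rob.write_result(wrong_path, 99)  # speculative result stays inside the ROB
rob.write_result(branch, mispredicted=True)
late = rob.commit()               # load, mul and branch commit; wrong path is flushed
print(early, late, rob.regs, len(rob.entries))
```

The wrong-path write of 99 never reaches `regs`, which is the point of holding results in the ROB until commit.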

Tomasulo's Algorithm with Speculation

ROB – Loop Based Example

Entry  Busy  Instruction          State         Destination  Value
1      no    L.D F0, 0(R1)        Commit        F0           Mem[0+Regs[R1]]
2      no    MUL.D F4, F0, F2     Commit        F4           #1 * Regs[F2]
3      yes   S.D F4, 0(R1)        Write Result  0+Regs[R1]   #2
4      yes   DADDIU R1, R1, #-8   Write Result  R1           Regs[R1]-8
5      yes   BNE R1, R2, LOOP     Write Result
6      yes   L.D F0, 0(R1)        Write Result  F0           Mem[#4]
7      yes   MUL.D F4, F0, F2     Write Result  F4           #6 * Regs[F2]
8      yes   S.D F4, 0(R1)        Write Result  0+Regs[R1]   #7
9      yes   DADDIU R1, R1, #-8   Write Result  R1           #4 - #8
10     yes   BNE R1, R2, LOOP     Write Result
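The "#n" tags in the Value column name the ROB entry that will produce an operand. A rough sketch of how such tags could be assigned at issue time is shown below. The function `rename` and the tuple encoding of the instructions are hypothetical, and the sketch assumes that no entry has committed yet, so every earlier writer is still in flight.

```python
def rename(program):
    """Map each source register to the ROB entry that will produce it, if any."""
    producer = {}   # register -> ROB entry number of its latest in-flight writer
    renamed = []
    for entry, (op, dest, srcs) in enumerate(program, start=1):
        operands = [f"#{producer[s]}" if s in producer else f"Regs[{s}]"
                    for s in srcs]
        renamed.append((entry, op, operands))
        if dest is not None:
            producer[dest] = entry   # later readers of dest wait on this entry
    return renamed


# One loop iteration as (opcode, destination register, source registers).
loop_body = [
    ("L.D", "F0", ["R1"]),
    ("MUL.D", "F4", ["F0", "F2"]),
    ("S.D", None, ["F4", "R1"]),
    ("DADDIU", "R1", ["R1"]),
    ("BNE", None, ["R1", "R2"]),
]

table = rename(loop_body * 2)   # two iterations, ten ROB entries
for entry, op, operands in table:
    print(entry, op, operands)
```

For the second iteration the load reads R1 from entry #4 and the multiply reads F0 from entry #6, which corresponds to the Mem[#4] and #6 * Regs[F2] values in the table.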

Multiple Issue and Static Scheduling

To achieve CPI < 1, the processor needs to complete multiple instructions per clock

● Statically scheduled superscalar processors
● VLIW (Very Long Instruction Word) processors
● Dynamically scheduled superscalar processors

Multiple Issue Processors

Dynamic Scheduling + Multiple Issue + Speculation

Limit the number of instructions of a given class that can be issued in a “bundle”, i.e. one FP, one integer, one load, one store

Examine all the dependencies among the instructions in the bundle

Also need multiple completion/commit
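The two issue-time checks mentioned above (per-class limits and dependences inside the bundle) might look as follows. This is only a sketch: the dictionary encoding of an instruction, the `CLASS_LIMITS` table and both function names are assumptions for illustration, and only read-after-write dependences are reported.

```python
from collections import Counter

# Hypothetical per-bundle limits, following the example mix on the slide.
CLASS_LIMITS = {"fp": 1, "int": 1, "load": 1, "store": 1}


def fits_class_limits(bundle):
    """True if no instruction class exceeds its per-bundle limit."""
    counts = Counter(ins["cls"] for ins in bundle)
    return all(n <= CLASS_LIMITS.get(cls, 0) for cls, n in counts.items())


def raw_dependences(bundle):
    """Return (producer index, consumer index, register) for each RAW pair."""
    deps = []
    last_writer = {}   # register -> index of the latest writer in the bundle
    for idx, ins in enumerate(bundle):
        for src in ins["srcs"]:
            if src in last_writer:
                deps.append((last_writer[src], idx, src))
        if ins["dest"] is not None:
            last_writer[ins["dest"]] = idx
    return deps


bundle = [
    {"op": "LD", "cls": "load", "dest": "R2", "srcs": ["R1"]},
    {"op": "DADDIU", "cls": "int", "dest": "R2", "srcs": ["R2"]},
]
print(fits_class_limits(bundle))
print(raw_dependences(bundle))
```

Here the bundle fits the limits, but the DADDIU depends on the LD issued alongside it, so its operand has to be supplied through a tag rather than a register value.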

Dynamic Scheduling + Multiple Issue

2-way Superscalar (all columns are clock numbers)

Iter  Instruction          Issues  Executes  Mem Access  Write CDB
1     LD R2, 0(R1)         1       2         3           4
1     DADDIU R2, R2, #1    1       5                     6
1     SD R2, 0(R1)         2       3         7
1     DADDIU R1, R1, #8    2       3                     4
1     BNE R2, R3, L        3       7
2     LD R2, 0(R1)         4       8         9           10
2     DADDIU R2, R2, #1    4       11                    12
2     SD R2, 0(R1)         5       9         13
2     DADDIU R1, R1, #8    5       8                     9
2     BNE R2, R3, L        6       13
3     LD R2, 0(R1)         7       14        15          16
3     DADDIU R2, R2, #1    7       17                    18
3     SD R2, 0(R1)         8       15        19
3     DADDIU R1, R1, #8    8       14                    15
3     BNE R2, R3, L        9       19

Dynamic Scheduling + Multiple Issue + Speculation

2-way Superscalar (all columns are clock numbers)

Iter  Instruction          Issues  Executes  Mem Access  Write CDB  Commits
1     LD R2, 0(R1)         1       2         3           4          5
1     DADDIU R2, R2, #1    1       5                     6          7
1     SD R2, 0(R1)         2       3         7
1     DADDIU R1, R1, #8    2       3                     4          8
1     BNE R2, R3, L        3       7                                8
2     LD R2, 0(R1)         4       5         6           7          9
2     DADDIU R2, R2, #1    4       8                     9          10
2     SD R2, 0(R1)         5       6         10
2     DADDIU R1, R1, #8    5       6                     7          11
2     BNE R2, R3, L        6       10                               11
3     LD R2, 0(R1)         7       8         9           10         12
3     DADDIU R2, R2, #1    7       11                    12         13
3     SD R2, 0(R1)         8       9         13
3     DADDIU R1, R1, #8    8       9                     10         14
3     BNE R2, R3, L        9       13                               14
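A small calculation makes the difference between the two schedules concrete. It takes the execute clocks of the LD that starts each iteration (2, 8, 14 without speculation and 2, 5, 8 with speculation) and derives a steady-state CPI. The function names are hypothetical, and reading the spacing of the first load as the per-iteration cost is an assumption of this sketch.

```python
# Execute clocks of the first instruction (LD) of iterations 1, 2 and 3.
ld_exec_no_spec = [2, 8, 14]   # dynamic scheduling + multiple issue
ld_exec_spec = [2, 5, 8]       # the same loop with speculation added
INSTRUCTIONS_PER_ITERATION = 5


def clocks_per_iteration(exec_clocks):
    """Spacing between successive iterations, assumed constant."""
    gaps = [later - earlier for earlier, later in zip(exec_clocks, exec_clocks[1:])]
    assert len(set(gaps)) == 1, "expected a steady-state spacing"
    return gaps[0]


def cpi(exec_clocks):
    return clocks_per_iteration(exec_clocks) / INSTRUCTIONS_PER_ITERATION


print("no speculation:", clocks_per_iteration(ld_exec_no_spec), "clocks/iter, CPI", cpi(ld_exec_no_spec))
print("speculation:   ", clocks_per_iteration(ld_exec_spec), "clocks/iter, CPI", cpi(ld_exec_spec))
```

Without speculation each load waits for the previous branch to execute, so an iteration takes 6 clocks. With speculation the spacing drops to 3 clocks, which is what brings the CPI below 1 on this loop.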

Literature on Processors

● Efficient Reading of Papers in Science and Technology
● Yeager, The MIPS R10000 Processor, MICRO, 1996.
● Hinton et al., The Microarchitecture of the Pentium 4 Processor. Intel Technology Journal Q1, 2001.
● Smith and Sohi. Microarchitecture of Superscalar Processors. Proc. of IEEE. 1995.
● Kahle et al. Introduction to the Cell multiprocessor. IBM J. RES. & DEV. 2005.
● Hammarlund et al., Haswell: The fourth generation Intel Processor, MICRO 2014.

Extra

Tomasulo's - Loop based Example (clock 1)

Instruction Status

Instruction          Issue  Execute  Write result
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)
L.D F0, 0(R1)
MUL.D F4, F0, F2
S.D F4, 0(R1)

Marks on the slide: √√√√√

Reservation Stations

Name    Busy  Op     Vj  Vk        Qj     Qk     A
Load1   yes   Load                               Regs[R1] + 0
Load2   yes   Load                               Regs[R1] - 8
Add1    no
Mult1   yes   MUL        Regs[F2]  Load1
Mult2   yes   MUL        Regs[F2]  Load2
Store1  yes   Store                       Mult1  Regs[R1]+0
Store2  yes   Store                       Mult2  Regs[R1]-8

Register Status

Field  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi     Load2      Mult2
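The Qj/Qk fields in the reservation stations hold the name of the station that will produce a missing operand. When that station writes its result, every waiting consumer picks the value up. The following sketch shows that wake-up step only; the `Station` class, the `broadcast` function and the use of strings as values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    name: str
    op: str
    vj: Optional[str] = None   # operand values, once available
    vk: Optional[str] = None
    qj: Optional[str] = None   # names of the stations still producing operands
    qk: Optional[str] = None

    def ready(self):
        return self.qj is None and self.qk is None


def broadcast(stations, register_status, tag, value):
    """Deliver a result to every station and register waiting on `tag`."""
    for st in stations:
        if st.qj == tag:
            st.vj, st.qj = value, None
        if st.qk == tag:
            st.vk, st.qk = value, None
    for reg, producer in register_status.items():
        if producer == tag:
            register_status[reg] = None   # the register file now holds the value


stations = [
    Station("Mult1", "MUL", vk="Regs[F2]", qj="Load1"),
    Station("Mult2", "MUL", vk="Regs[F2]", qj="Load2"),
    Station("Store1", "Store", qk="Mult1"),
    Station("Store2", "Store", qk="Mult2"),
]
register_status = {"F0": "Load2", "F4": "Mult2"}

broadcast(stations, register_status, "Load1", "Mem[Regs[R1] + 0]")
ready_now = [st.name for st in stations if st.ready()]
print(ready_now, register_status)
```

Only Mult1 becomes ready, and F0 stays mapped to Load2 because the second iteration's load has already renamed it.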

Tomasulo's - Loop based Example (clock 2)

[Instruction Status, Reservation Stations and Register Status tables. Entries as on the previous slide: Load1, Load2, Mult1, Mult2, Store1 and Store2 busy, Add1 not busy, Qi holding Load2 and Mult2.]

Tomasulo's - Loop based Example (clock 3)

[Instruction Status, Reservation Stations and Register Status tables. New on this slide: one Busy entry changes to no, and the value Mem[Regs[R1] + 0] appears.]

Tomasulo's - Loop based Example (clock 4)

[Instruction Status, Reservation Stations and Register Status tables. The Load1 entry (Load, Regs[R1] + 0) and the Load1 tag no longer appear; Mem[Regs[R1] + 0] remains. New on this slide: the marks √√ and √ √, the label M:4, and another Busy entry of no.]

Tomasulo's - Loop based Example (clock 5)

[Instruction Status, Reservation Stations and Register Status tables. The Load2 entry (Load, Regs[R1] - 8) no longer appears. New on this slide: the value Mem[Regs[R1] - 8].]

Tomasulo's - Loop based Example (clock 6)

[Instruction Status, Reservation Stations and Register Status tables. The Load2 tag no longer appears; the other entries are unchanged.]

Tomasulo's - Loop based Example (clock 7)

[Instruction Status, Reservation Stations and Register Status tables. Entries unchanged from the previous slide.]

Tomasulo's - Loop based Example (clock 8)

[Instruction Status, Reservation Stations and Register Status tables. New on this slide: the value Mul[F4] and another Busy entry of no.]

Tomasulo's - Loop based Example (clock 9)

[Instruction Status, Reservation Stations and Register Status tables. One MUL entry, the Mult1 tag and Mem[Regs[R1] + 0] no longer appear. Still shown: one MUL entry (Regs[F2]), two Store entries (Regs[R1]+0 and Regs[R1]-8), the Mult2 tag, Mem[Regs[R1] - 8] and Mul[F4]; Qi holds Mult2.]

Tomasulo's - Loop based Example (clock 10)

[Instruction Status, Reservation Stations and Register Status tables. New on this slide: a second Mem[Regs[R1] - 8] value and two more Busy entries of no.]

Tomasulo's - Loop based Example (clock 11)

[Instruction Status, Reservation Stations and Register Status tables. Qi is empty. The MUL entry, the Mult2 tag, Regs[R1]+0 and Mul[F4] no longer appear; the only busy entry left is a Store with Regs[R1]-8. New marks: √√.]

Tomasulo's - Loop based Example (clock 12)

[Instruction Status, Reservation Stations and Register Status tables. Entries as on the previous slide. New marks: √ √.]

VLIW Example

● Performance?
● Overhead?