COSC 6385 Computer Architecture - Tomasulo’s Algorithm

Edgar Gabriel

Spring 2012

Analyzing a short code-sequence

DIV.D F0, F2, F4
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F8

• 3 True data dependencies (RAW hazards)
  – DIV.D -> ADD.D on F0
  – ADD.D -> S.D on F6
  – SUB.D -> MUL.D on F8
• 2 Anti-dependencies (WAR hazards)
  – ADD.D reads F8, which SUB.D later overwrites
  – S.D reads F6, which MUL.D later overwrites
• 1 Output dependency (WAW hazard)
  – ADD.D and MUL.D both write F6

• Renaming some registers can remove the WAR and WAW hazards
  – S and T are temporary registers that replace F6 and F8

DIV.D F0, F2, F4
ADD.D S, F0, F8
S.D S, 0(R1)
SUB.D T, F10, F14
MUL.D F6, F10, T

  – Any subsequent use of F8 must be replaced by T
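The hazard counts for this sequence can be checked mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` tuple, `find_hazards`, and `count_by_kind` are hypothetical helpers, and the scan is a simple pairwise comparison that ignores intervening redefinitions of a register, which is enough for a sequence this short.

```python
from collections import namedtuple

# Illustrative encoding (an assumption, not from the slides):
# destination register (or None) and the registers an instruction reads.
Instr = namedtuple("Instr", ["text", "dest", "srcs"])


def find_hazards(program):
    """Pairwise hazard scan; returns tuples (kind, earlier, later, register)."""
    hazards = []
    for j, later in enumerate(program):
        for earlier in program[:j]:
            if earlier.dest is not None and earlier.dest in later.srcs:
                hazards.append(("RAW", earlier.text, later.text, earlier.dest))
            if later.dest is not None and later.dest in earlier.srcs:
                hazards.append(("WAR", earlier.text, later.text, later.dest))
            if later.dest is not None and later.dest == earlier.dest:
                hazards.append(("WAW", earlier.text, later.text, later.dest))
    return hazards


def count_by_kind(hazards):
    counts = {"RAW": 0, "WAR": 0, "WAW": 0}
    for kind, *_ in hazards:
        counts[kind] += 1
    return counts


original = [
    Instr("DIV.D F0, F2, F4", "F0", ("F2", "F4")),
    Instr("ADD.D F6, F0, F8", "F6", ("F0", "F8")),
    Instr("S.D F6, 0(R1)", None, ("F6", "R1")),
    Instr("SUB.D F8, F10, F14", "F8", ("F10", "F14")),
    Instr("MUL.D F6, F10, F8", "F6", ("F10", "F8")),
]

# Same sequence after renaming F6 -> S and F8 -> T as on the slide.
renamed = [
    Instr("DIV.D F0, F2, F4", "F0", ("F2", "F4")),
    Instr("ADD.D S, F0, F8", "S", ("F0", "F8")),
    Instr("S.D S, 0(R1)", None, ("S", "R1")),
    Instr("SUB.D T, F10, F14", "T", ("F10", "F14")),
    Instr("MUL.D F6, F10, T", "F6", ("F10", "T")),
]

print(count_by_kind(find_hazards(original)))  # {'RAW': 3, 'WAR': 2, 'WAW': 1}
print(count_by_kind(find_hazards(renamed)))   # {'RAW': 3, 'WAR': 0, 'WAW': 0}
```

Renaming leaves the three true dependencies untouched, since those carry real data, and removes only the name conflicts.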

Tomasulo’s Algorithm

• Register renaming is provided by reservation stations, which
  – Buffer the operands of instructions waiting to be issued
  – Fetch and buffer an operand as soon as it is available
  – Eliminate the need to get that operand from a register
  – Let pending instructions designate the reservation station that will provide their input
• For overlapping successive writes to the same register, only the last one actually updates the register

• There are typically more reservation stations than registers
• Hazard detection is distributed (instead of centralized as in the scoreboard)
• Results are passed directly from the reservation stations, where they are buffered, to the functional units over a common data bus (CDB), rather than going through the registers
• Each reservation station holds the opcode of the pending instruction and either the operand values or the names of the reservation stations that will provide them
• Load and store buffers hold data and addresses for memory accesses

[Figure: block diagram of the floating-point unit using Tomasulo’s algorithm. Labeled components: instruction queue (fed from the instruction unit), FP registers, address unit, load buffers, store buffers (data and address), reservation stations in front of the FP adders and the FP multipliers, memory unit, and the common data bus. The paths leaving the instruction queue are labeled load-store operations and FP operations.]

• Load and store buffers:
  – Hold the components of the effective address until it is computed
  – Hold the destination memory address (= effective address)
  – Hold the value being loaded or stored

• Only three steps per instruction; each step can take an arbitrary number of clock cycles
  – Issue:
    • Get the next instruction from the FIFO instruction queue
    • Search for a matching empty reservation station
      – If found: issue the instruction together with the operand values that are already in registers
      – If not found: structural hazard -> the instruction stalls
      – If operands are not yet in registers: keep track of the functional units (via their reservation stations) that will produce them
  – Execute:
    • If operands are not yet available: monitor the common data bus (CDB)
    • When all operands are available: execute the operation
  – Write result:
    • Write the result on the CDB, and from there into the registers and into any reservation stations waiting for it
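The three steps can be illustrated with a very small model. This is a sketch under stated assumptions, not the course's implementation: the dictionaries `regs`, `reg_status`, and `stations` and the functions `issue`, `can_execute`, and `write_result` are hypothetical names, only two multiply stations exist, and timing is ignored entirely.

```python
# Hypothetical machine state; names follow the slide vocabulary (V, Q, CDB).
regs = {"F2": 2.0, "F4": 4.0, "F6": 6.0}   # register file values
reg_status = {}                            # register -> tag of the station that will write it
stations = {"Mult1": None, "Mult2": None}  # None marks a free reservation station


def issue(op, dest, src1, src2):
    """Issue: claim a free station, copy ready operands, record producers otherwise."""
    tag = next((t for t, entry in stations.items() if entry is None), None)
    if tag is None:
        return None                        # structural hazard: the instruction stalls
    entry = {"op": op, "Vj": None, "Vk": None, "Qj": None, "Qk": None}
    for v, q, src in (("Vj", "Qj", src1), ("Vk", "Qk", src2)):
        if src in reg_status:
            entry[q] = reg_status[src]     # operand pending: remember the producing station
        else:
            entry[v] = regs[src]           # operand available: copy the value now
    stations[tag] = entry
    reg_status[dest] = tag                 # later readers of dest wait for this station
    return tag


def can_execute(tag):
    """Execute may start once no operand is still awaited from the CDB."""
    entry = stations[tag]
    return entry["Qj"] is None and entry["Qk"] is None


def write_result(tag, value):
    """Write result: broadcast (tag, value) on the CDB to stations and registers."""
    stations[tag] = None                   # the producing station becomes free
    for entry in stations.values():
        if entry is None:
            continue
        if entry["Qj"] == tag:
            entry["Vj"], entry["Qj"] = value, None
        if entry["Qk"] == tag:
            entry["Vk"], entry["Qk"] = value, None
    for reg in [r for r, t in reg_status.items() if t == tag]:
        regs[reg] = value
        del reg_status[reg]


mul = issue("MUL.D", "F0", "F2", "F4")      # both operands ready
div = issue("DIV.D", "F10", "F0", "F6")     # F0 is pending, produced by mul
stalled = issue("MUL.D", "F8", "F2", "F4")  # no free station left: returns None
write_result(mul, regs["F2"] * regs["F4"])  # DIV.D captures F0 from the CDB
```

After the broadcast, the divide holds the product in `Vj` and `can_execute(div)` is true, although the divide never read F0 from the register file.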

Data fields for reservation stations

• Op: operation to perform on source operands S1 and S2
• Qj, Qk: reservation stations that will produce the source operands (empty if the value is already available in Vj or Vk)
• Vj, Vk: value of each source operand
• A: holds information for the memory address calculation (first the immediate field, later the effective address)
• Busy: indicates that the reservation station and its functional unit are occupied
• Qi (register status): number of the reservation station that will produce the data to be stored in this register
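As a data structure, these fields map naturally onto a small record type. The sketch below is an assumption-laden illustration: the class `ReservationStation`, its methods, and the use of symbolic strings for operand values (mirroring the notation of the worked example) are choices made here, not something the slides prescribe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None   # operand values, kept symbolic as on the slides
    vk: Optional[str] = None
    qj: Optional[str] = None   # names of the stations producing missing operands
    qk: Optional[str] = None
    a: Optional[str] = None    # immediate field, later the effective address

    def operands_ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None

    def capture(self, tag: str, value: str) -> None:
        """Take a value broadcast under `tag` if this station is waiting for it."""
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None


# Register result status: register -> producing station (Qi); absent means no pending write.
qi = {"F0": "Mult1", "F2": "Load2", "F6": "Load1"}

# MUL.D F0, F2, F4 right after issue: F4 is available, F2 is still being loaded by Load2.
mult1 = ReservationStation("Mult1", busy=True, op="Mult", vk="Regs[F4]", qj="Load2")
waiting = mult1.operands_ready()            # False: Qj still names Load2
mult1.capture("Load2", "Mem[45+Regs[R3]]")  # Load2 broadcasts its result
```

Keeping either a value (V) or a producer name (Q) per operand, never both, is what lets a station decide locally when it may start executing.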

The same example as for scoreboarding

L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Assumptions:
ADD and SUB take 2 clock cycles
MULT takes 10 clock cycles
DIV takes 40 clock cycles
2 Load/Store, 3 ADD and 2 Mult functional units/reservation stations

The following slides are based on a lecture by Jelena Mirkovic, University of Delaware
http://www.cis.udel.edu/~sunshine/courses/F04/CIS662/class12.pdf

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      -        -
L.D F2, 45(R3)       -      -        -
MUL.D F0, F2, F4     -      -        -
SUB.D F8, F6, F2     -      -        -
DIV.D F10, F0, F6    -      -        -
ADD.D F6, F8, F2     -      -        -

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  Yes   Load  Regs[R2]          -                 -      -      34
Load2  No
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      -      -      Load1  -      -      -

Time=1: Issue first load

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        -
L.D F2, 45(R3)       x      -        -
MUL.D F0, F2, F4     -      -        -
SUB.D F8, F6, F2     -      -        -
DIV.D F10, F0, F6    -      -        -
ADD.D F6, F8, F2     -      -        -

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  Yes   Load  Regs[R2]          -                 -      -      +34
Load2  Yes   Load  Regs[R3]          -                 -      -      45
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      Load2  -      Load1  -      -      -

Time=2: First load calculates address; second load issued

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        -
L.D F2, 45(R3)       x      x        -
MUL.D F0, F2, F4     x      -        -
SUB.D F8, F6, F2     -      -        -
DIV.D F10, F0, F6    -      -        -
ADD.D F6, F8, F2     -      -        -

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  Yes   Load  -                 -                 -      -      Regs[R2]+34
Load2  Yes   Load  Regs[R3]          -                 -      -      +45
Add1   No
Add2   No
Add3   No
Mult1  Yes   Mult  -                 Regs[F4]          Load2  -      -
Mult2  No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  Load2  -      Load1  -      -      -

Time=3: First load reads from memory; second load calculates address; Mult issued

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        -
MUL.D F0, F2, F4     x      -        -
SUB.D F8, F6, F2     x      -        -
DIV.D F10, F0, F6    -      -        -
ADD.D F6, F8, F2     -      -        -

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  No
Load2  Yes   Load  -                 -                 -      -      Regs[R3]+45
Add1   Yes   Sub   Mem[34+Regs[R2]]  -                 -      Load2  -
Add2   No
Add3   No
Mult1  Yes   Mult  -                 Regs[F4]          Load2  -      -
Mult2  No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  Load2  -      -      Add1   -      -

Time=4: First load writes result; second load reads memory; Mult stalled; Sub issued

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      -        -
SUB.D F8, F6, F2     x      -        -
DIV.D F10, F0, F6    x      -        -
ADD.D F6, F8, F2     -      -        -

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   Yes   Sub   Mem[34+Regs[R2]]  Mem[45+Regs[R3]]  -      -      -
Add2   No
Add3   No
Mult1  Yes   Mult  Mem[45+Regs[R3]]  Regs[F4]          -      -      -
Mult2  Yes   Div   -                 Mem[34+Regs[R2]]  Mult1  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  -      -      -      Add1   Mult2  -

Time=5: Second load writes result; Mult stalled; Sub stalled; Div issued

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        -
SUB.D F8, F6, F2     x      x        -
DIV.D F10, F0, F6    x      -        -
ADD.D F6, F8, F2     x      -        -

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   Yes   Sub   Mem[34+Regs[R2]]  Mem[45+Regs[R3]]  -      -      -
Add2   Yes   Add   -                 Mem[45+Regs[R3]]  Add1   -      -
Add3   No
Mult1  Yes   Mult  Mem[45+Regs[R3]]  Regs[F4]          -      -      -
Mult2  Yes   Div   -                 Mem[34+Regs[R2]]  Mult1  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  -      -      Add2   Add1   Mult2  -

Time=6: Mult executes (1/10); Sub executes (1/2); Div stalled; Add issued

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        -
SUB.D F8, F6, F2     x      x        -
DIV.D F10, F0, F6    x      -        -
ADD.D F6, F8, F2     x      -        -

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   Yes   Sub   Mem[34+Regs[R2]]  Mem[45+Regs[R3]]  -      -      -
Add2   Yes   Add   -                 Mem[45+Regs[R3]]  Add1   -      -
Add3   No
Mult1  Yes   Mult  Mem[45+Regs[R3]]  Regs[F4]          -      -      -
Mult2  Yes   Div   -                 Mem[34+Regs[R2]]  Mult1  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  -      -      Add2   Add1   Mult2  -

Time=7: Mult executes (2/10); Sub executes (2/2); Div stalled; Add stalled

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        -
SUB.D F8, F6, F2     x      x        x
DIV.D F10, F0, F6    x      -        -
ADD.D F6, F8, F2     x      -        -

Reservation stations
Name   Busy  Op    Vj                                 Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   No
Add2   Yes   Add   Mem[34+Regs[R2]]-Mem[45+Regs[R3]]  Mem[45+Regs[R3]]  -      -      -
Add3   No
Mult1  Yes   Mult  Mem[45+Regs[R3]]                   Regs[F4]          -      -      -
Mult2  Yes   Div   -                                  Mem[34+Regs[R2]]  Mult1  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  -      -      Add2   -      Mult2  -

Time=8: Mult executes (3/10); Sub writes result; Div stalled; Add stalled

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        -
SUB.D F8, F6, F2     x      x        x
DIV.D F10, F0, F6    x      -        -
ADD.D F6, F8, F2     x      x        -

Reservation stations
Name   Busy  Op    Vj                                 Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   No
Add2   Yes   Add   Mem[34+Regs[R2]]-Mem[45+Regs[R3]]  Mem[45+Regs[R3]]  -      -      -
Add3   No
Mult1  Yes   Mult  Mem[45+Regs[R3]]                   Regs[F4]          -      -      -
Mult2  Yes   Div   -                                  Mem[34+Regs[R2]]  Mult1  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  -      -      Add2   -      Mult2  -

Time=9: Mult executes (4/10); Div stalled; Add executes (1/2)

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        -
SUB.D F8, F6, F2     x      x        x
DIV.D F10, F0, F6    x      -        -
ADD.D F6, F8, F2     x      x        -

Reservation stations
Name   Busy  Op    Vj                                 Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   No
Add2   Yes   Add   Mem[34+Regs[R2]]-Mem[45+Regs[R3]]  Mem[45+Regs[R3]]  -      -      -
Add3   No
Mult1  Yes   Mult  Mem[45+Regs[R3]]                   Regs[F4]          -      -      -
Mult2  Yes   Div   -                                  Mem[34+Regs[R2]]  Mult1  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  -      -      Add2   -      Mult2  -

Time=10: Mult executes (5/10); Div stalled; Add executes (2/2)

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        -
SUB.D F8, F6, F2     x      x        x
DIV.D F10, F0, F6    x      -        -
ADD.D F6, F8, F2     x      x        x

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   No
Add2   No
Add3   No
Mult1  Yes   Mult  Mem[45+Regs[R3]]  Regs[F4]          -      -      -
Mult2  Yes   Div   -                 Mem[34+Regs[R2]]  Mult1  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Mult1  -      -      -      -      Mult2  -

Time=11: Mult executes (6/10); Div stalled; Add writes result

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        x
SUB.D F8, F6, F2     x      x        x
DIV.D F10, F0, F6    x      -        -
ADD.D F6, F8, F2     x      x        x

Reservation stations
Name   Busy  Op    Vj                                 Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  Yes   Div   Mem[45+Regs[R3]]*Regs[F4]          Mem[34+Regs[R2]]  -      -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      -      -      -      -      Mult2  -

Time=16: Mult writes result; Div stalled

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        x
SUB.D F8, F6, F2     x      x        x
DIV.D F10, F0, F6    x      x        -
ADD.D F6, F8, F2     x      x        x

Reservation stations
Name   Busy  Op    Vj                                 Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  Yes   Div   Mem[45+Regs[R3]]*Regs[F4]          Mem[34+Regs[R2]]  -      -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      -      -      -      -      Mult2  -

Time=17: Div executes (1/40)

Instruction status
Instruction          Issue  Execute  Write result
L.D F6, 34(R2)       x      x        x
L.D F2, 45(R3)       x      x        x
MUL.D F0, F2, F4     x      x        x
SUB.D F8, F6, F2     x      x        x
DIV.D F10, F0, F6    x      x        x
ADD.D F6, F8, F2     x      x        x

Reservation stations
Name   Busy  Op    Vj                Vk                Qj     Qk     A
Load1  No
Load2  No
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      -      -      -      -      -      -

Time=57: Div writes result
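The cycle numbers of this walkthrough can be reproduced with a few lines of arithmetic. The sketch below is a simplified timing model, not a full simulator: the function `schedule` is a hypothetical helper, one instruction issues per cycle, structural hazards and contention for the common data bus are ignored, the integer base registers are assumed ready, and the load latency of 2 cycles (address calculation plus memory read) is read off the slides rather than stated in the assumptions.

```python
# Execution latencies in cycles; the value for L.D is inferred from the walkthrough.
LATENCY = {"L.D": 2, "ADD.D": 2, "SUB.D": 2, "MUL.D": 10, "DIV.D": 40}

# (opcode, destination, floating-point source registers)
program = [
    ("L.D", "F6", []),
    ("L.D", "F2", []),
    ("MUL.D", "F0", ["F2", "F4"]),
    ("SUB.D", "F8", ["F6", "F2"]),
    ("DIV.D", "F10", ["F0", "F6"]),
    ("ADD.D", "F6", ["F8", "F2"]),
]


def schedule(program, latency):
    """Return issue / first execute / last execute / write cycle per instruction."""
    last_writer = {}  # register -> index of the most recently issued producer
    rows = []
    for i, (op, dest, srcs) in enumerate(program):
        issue = i + 1
        # An operand is usable in the cycle after its producer wrote the CDB.
        ready = max(
            (rows[last_writer[s]]["write"] for s in srcs if s in last_writer),
            default=0,
        )
        start = max(issue, ready) + 1
        end = start + latency[op] - 1
        rows.append({"op": op, "dest": dest, "issue": issue,
                     "start": start, "end": end, "write": end + 1})
        last_writer[dest] = i  # renaming: later readers wait for this instruction
    return rows


timeline = schedule(program, LATENCY)
for row in timeline:
    print(row["op"], row["dest"], row["issue"], row["start"], row["end"], row["write"])
```

Because `last_writer` is consulted at issue time, DIV.D picks up F6 from the first load and not from the later ADD.D. ADD.D therefore writes F6 in cycle 11, long before DIV.D finishes in cycle 57, without causing a WAR hazard.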

Some remarks

• To preserve exception behavior, no instruction is allowed to initiate execution until all branches preceding the instruction have completed
• Loads and stores can be executed in a different order if they access different addresses
  – Not easy to verify, since 100(R3) can point to the same effective address as 0(R5)!
  -> A load must wait for any uncompleted earlier stores to the same effective memory address
  -> A store must wait until there are no unexecuted earlier loads/stores to the same memory address

Some remarks (II)

• Effective memory address calculations have to be executed in program order
• For a load operation:
  – Calculate the effective memory address
  – Check for conflicts with all active (= pending) store buffers
  – If there is a conflict: the load stalls
    • Often the data is instead taken directly from the store buffer to the load buffer, bypassing memory
  – Else: execute the load
• For a store operation:
  – Similarly, check for conflicts with both active load buffers and active store buffers
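The conflict checks reduce to comparing effective addresses, which is why the address has to be computed before the check can be made. A minimal sketch follows; the function names and the register contents are hypothetical, chosen so that 100(R3) and 0(R5) alias as in the remark above, and buffers are modeled as plain sets of addresses.

```python
def effective_address(regs, offset, base):
    """Effective address of offset(base), e.g. 100(R3)."""
    return regs[base] + offset


def load_may_execute(addr, pending_store_addrs):
    """A load stalls while an earlier, uncompleted store targets the same address."""
    return addr not in pending_store_addrs


def store_may_execute(addr, pending_load_addrs, pending_store_addrs):
    """A store stalls on earlier pending loads or stores to the same address."""
    return addr not in pending_load_addrs and addr not in pending_store_addrs


# Hypothetical register contents chosen so that 100(R3) and 0(R5) alias.
regs = {"R3": 1000, "R5": 1100}

# An earlier S.D ..., 0(R5) is still waiting in a store buffer.
pending_stores = {effective_address(regs, 0, "R5")}

aliasing_load = effective_address(regs, 100, "R3")   # same address as 0(R5)
independent_load = effective_address(regs, 8, "R3")  # different address

print(load_may_execute(aliasing_load, pending_stores))     # False: must wait or forward
print(load_may_execute(independent_load, pending_stores))  # True: may bypass the store
```

The two operands `100(R3)` and `0(R5)` look unrelated in the instruction text; only the computed addresses reveal the conflict.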

A loop-based example

Loop: LD F0, 0(R1)
      MULTD F4, F0, F2
      SD F4, 0(R1)
      SUBI R1, R1, #8
      BNEZ R1, Loop

• This time assume that a multiply takes 4 clocks
• Assume the 1st load takes 8 clocks in total (1 for the effective address + 7 for the memory access, L1 cache miss) and the 2nd load takes 1 clock (hit)
• To be clear, the clocks for SUBI and BNEZ are shown as well
  – In reality, integer instructions run ahead of the floating-point instructions
• Show 2 iterations

Slide based on a lecture by David A. Patterson, University of California, Berkeley
http://www.cs.berkeley.edu/~pattrsn/252S01

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      -             -
MUL.D F4, F0, F2     -      -             -
S.D F4, 0(R1)        -      -             -
L.D F0, 0(R1)        -      -             -
MUL.D F4, F0, F2     -      -             -
S.D F4, 0(R1)        -      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   Regs[R1]    -         -      -      0
Load2   No
Store1  No
Store2  No
Add1    No
Mult1   No
Mult2   No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load1  -      -      -      -      -      -

Time=1: Issue first load

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      -             -
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        -      -             -
L.D F0, 0(R1)        -      -             -
MUL.D F4, F0, F2     -      -             -
S.D F4, 0(R1)        -      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   Regs[R1]    -         -      -      +0
Load2   No
Store1  No
Store2  No
Add1    No
Mult1   Yes   Mult   -           Regs[F2]  Load1  -      -
Mult2   No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load1  -      Mult1  -      -      -      -

Time=2: First load calculates effective address; issue Mult

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      -             -
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        -      -             -
MUL.D F4, F0, F2     -      -             -
S.D F4, 0(R1)        -      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   -           -         -      -      Regs[R1]+0
Load2   No
Store1  Yes   Store  Regs[R1]    -         -      Mult1  0
Store2  No
Add1    No
Mult1   Yes   Mult   -           Regs[F2]  Load1  -      -
Mult2   No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load1  -      Mult1  -      -      -      -

Time=3: First load memory access (1/7); Mult stalled; issue store

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      -             -
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        -      -             -
MUL.D F4, F0, F2     -      -             -
S.D F4, 0(R1)        -      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   -           -         -      -      Regs[R1]+0
Load2   No
Store1  Yes   Store  Regs[R1]    -         -      Mult1  +0
Store2  No
Add1    No
Mult1   Yes   Mult   -           Regs[F2]  Load1  -      -
Mult2   No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load1  -      Mult1  -      -      -      -

Time=4: First load memory access (2/7); Mult stalled; store calculates effective address; SUBI calculated (not shown)

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      -             -
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        -      -             -
MUL.D F4, F0, F2     -      -             -
S.D F4, 0(R1)        -      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   -           -         -      -      Regs[R1]+0
Load2   No
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  No
Add1    No
Mult1   Yes   Mult   -           Regs[F2]  Load1  -      -
Mult2   No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load1  -      Mult1  -      -      -      -

Time=5: First load memory access (3/7); Mult stalled; store stalled; BNEZ (not shown)

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      -             -
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      -             -
MUL.D F4, F0, F2     -      -             -
S.D F4, 0(R1)        -      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   -           -         -      -      Regs[R1]+0
Load2   Yes   Load   Regs[R1]    -         -      -      0
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  No
Add1    No
Mult1   Yes   Mult   -           Regs[F2]  Load1  -      -
Mult2   No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load2  -      Mult1  -      -      -      -

Time=6: First load memory access (4/7); Mult stalled; store stalled; issue second load

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      -             -
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      -             -
MUL.D F4, F0, F2     7      -             -
S.D F4, 0(R1)        -      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   -           -         -      -      Regs[R1]+0
Load2   Yes   Load   Regs[R1]    -         -      -      +0
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  No
Add1    No
Mult1   Yes   Mult   -           Regs[F2]  Load1  -      -
Mult2   Yes   Mult   -           Regs[F2]  Load2  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load2  -      Mult2  -      -      -      -

Time=7: First load memory access (5/7); Mult stalled; store stalled; second load calculates effective address; issue Mult2

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      -             -
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      -             -
MUL.D F4, F0, F2     7      -             -
S.D F4, 0(R1)        8      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   -           -         -      -      Regs[R1]+0
Load2   Yes   Load   -           -         -      -      Regs[R1]+0
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  Yes   Store  Regs[R1]    -         -      Mult2  0
Add1    No
Mult1   Yes   Mult   -           Regs[F2]  Load1  -      -
Mult2   Yes   Mult   -           Regs[F2]  Load2  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load2  -      Mult2  -      -      -      -

Time=8: First load memory access (6/7); Mult, store and Mult2 stalled; second load executes; issue store2

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      9             -
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      -             -
MUL.D F4, F0, F2     7      -             -
S.D F4, 0(R1)        8      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   Yes   Load   -           -         -      -      Regs[R1]+0
Load2   Yes   Load   -           -         -      -      Regs[R1]+0
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  Yes   Store  Regs[R1]    -         -      Mult2  +0
Add1    No
Mult1   Yes   Mult   -           Regs[F2]  Load1  -      -
Mult2   Yes   Mult   -           Regs[F2]  Load2  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load2  -      Mult2  -      -      -      -

Time=9: First load memory access (7/7); Mult, store and Mult2 stalled; second load executes; store2 calculates effective address

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      9             10
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      10            -
MUL.D F4, F0, F2     7      -             -
S.D F4, 0(R1)        8      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   No
Load2   Yes   Load   -           -         -      -      Regs[R1]+0
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  Yes   Store  -           -         -      Mult2  Regs[R1]+0
Add1    No
Mult1   Yes   Mult   Mem[Load1]  Regs[F2]  -      -      -
Mult2   Yes   Mult   -           Regs[F2]  Load2  -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    Load2  -      Mult2  -      -      -      -

Time=10: First load writes result; Mult, store and Mult2 stalled; second load finishes; store2 stalled

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      9             10
MUL.D F4, F0, F2     2      -             -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      10            11
MUL.D F4, F0, F2     7      -             -
S.D F4, 0(R1)        8      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   No
Load2   No
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  Yes   Store  -           -         -      Mult2  Regs[R1]+0
Add1    No
Mult1   Yes   Mult   Mem[Load1]  Regs[F2]  -      -      -
Mult2   Yes   Mult   Mem[Load2]  Regs[F2]  -      -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      -      Mult2  -      -      -      -

Time=11: Second load writes result; Mult1 executes (1/4); Mult2, store1 and store2 stalled

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      9             10
MUL.D F4, F0, F2     2      14            -
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      10            11
MUL.D F4, F0, F2     7      -             -
S.D F4, 0(R1)        8      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   No
Load2   No
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  Yes   Store  -           -         -      Mult2  Regs[R1]+0
Add1    No
Mult1   Yes   Mult   Mem[Load1]  Regs[F2]  -      -      -
Mult2   Yes   Mult   Mem[Load2]  Regs[F2]  -      -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      -      Mult2  -      -      -      -

Time=14: Mult1 executes (4/4); Mult2 executes (3/4); store1 and store2 stalled

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      9             10
MUL.D F4, F0, F2     2      14            15
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      10            11
MUL.D F4, F0, F2     7      15            -
S.D F4, 0(R1)        8      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   No
Load2   No
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  Yes   Store  -           -         -      Mult2  Regs[R1]+0
Add1    No
Mult1   No
Mult2   Yes   Mult   Mem[Load2]  Regs[F2]  -      -      -

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      -      Mult2  -      -      -      -

Time=15: Mult1 writes result; Mult2 executes (4/4); store1 executes; store2 stalled

Instruction status
Instruction          Issue  Execute done  Write result done
L.D F0, 0(R1)        1      9             10
MUL.D F4, F0, F2     2      14            15
S.D F4, 0(R1)        3      -             -
L.D F0, 0(R1)        6      10            11
MUL.D F4, F0, F2     7      15            16
S.D F4, 0(R1)        8      -             -

Reservation stations
Name    Busy  Op     Vj          Vk        Qj     Qk     A
Load1   No
Load2   No
Store1  Yes   Store  -           -         -      Mult1  Regs[R1]+0
Store2  Yes   Store  -           -         -      Mult2  Regs[R1]+0
Add1    No
Mult1   No
Mult2   No

Register result status
      F0     F2     F4     F6     F8     F10    F12 ... F30
Qi    -      -      -      -      -      -      -

Time=16: store1 and store2 execute

Tomasulo’s Algorithm

• Please note:
  – F0 never sees the data from the first load
  – The register file is completely detached from the computation
  – The first and second iterations overlap completely
  – With only two Mult units, a third mult operation for the next iteration of the loop could not have been issued
    -> no third store instruction could be issued either
• In-order issue, out-of-order execution, out-of-order completion

Slide based on a lecture by David A. Patterson, University of California, Berkeley
http://www.cs.berkeley.edu/~pattrsn/252S01

Why can Tomasulo overlap iterations of loops?

• Register renaming
  – Multiple iterations use different physical destinations for the same registers (dynamic loop unrolling)
• Reservation stations
  – Permit instruction issue to advance past integer control flow operations
  – Also buffer old values of registers, totally avoiding the WAR stall that we saw in the scoreboard
• Other perspective: Tomasulo builds the data flow dependency graph on the fly
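The "data flow graph on the fly" view can be made concrete for the two loop iterations. The sketch below is a simplification with hypothetical names (`build_dataflow`, `stations_for`): instructions are issued in order, each one takes the next free station of its kind, stations are never released, and the integer register R1 is ignored.

```python
def build_dataflow(instructions, stations_for):
    """Issue in order, renaming each destination to the tag of its station.

    Returns the producer -> consumer edges and the final register status (Qi).
    """
    reg_status = {}  # register -> station that will produce its value
    free = {kind: list(tags) for kind, tags in stations_for.items()}
    edges = []
    for kind, dest, srcs in instructions:
        tag = free[kind].pop(0)  # simplification: stations are never freed
        for src in srcs:
            if src in reg_status:
                edges.append((reg_status[src], tag))
        if dest is not None:
            reg_status[dest] = tag  # overwrites the mapping of the previous iteration
    return edges, reg_status


iteration = [
    ("Load", "F0", []),             # L.D   F0, 0(R1)
    ("Mult", "F4", ["F0", "F2"]),   # MUL.D F4, F0, F2
    ("Store", None, ["F4"]),        # S.D   F4, 0(R1)
]
stations_for = {
    "Load": ["Load1", "Load2"],
    "Mult": ["Mult1", "Mult2"],
    "Store": ["Store1", "Store2"],
}

edges, qi = build_dataflow(iteration * 2, stations_for)
print(edges)  # [('Load1', 'Mult1'), ('Mult1', 'Store1'), ('Load2', 'Mult2'), ('Mult2', 'Store2')]
print(qi)     # {'F0': 'Load2', 'F4': 'Mult2'}
```

The two iterations form two independent chains although both use the names F0 and F4, and the final status maps F0 to Load2, which matches the observation that F0 never sees the data of the first load.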

Slide based on a lecture by David A. Patterson, University of California, Berkeley
http://www.cs.berkeley.edu/~pattrsn/252S01