john kubiatowicz electrical engineering and computer sciences university of california, berkeley

59
CS252 Graduate Computer Architecture Lecture 7 ILP 1: Loop-Level parallelism extraction, Data Flow, Explicit Register Renaming. John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252 http://www-inst.eecs.berkeley.edu/~cs252

Upload: ellard

Post on 06-Feb-2016

40 views

Category:

Documents


0 download

DESCRIPTION

CS252 Graduate Computer Architecture Lecture 7 ILP 1: Loop-Level parallelism extraction, Data Flow, Explicit Register Renaming. John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

CS252Graduate Computer Architecture

Lecture 7

ILP 1:Loop-Level parallelism extraction,

Data Flow, Explicit Register Renaming.John Kubiatowicz

Electrical Engineering and Computer SciencesUniversity of California, Berkeley

http://www.eecs.berkeley.edu/~kubitron/cs252http://www-inst.eecs.berkeley.edu/~cs252

Page 2: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 2

Review: Hardware techniques for out-of-order execution

• HW exploitation of ILP– Works when can’t know dependence at compile time.– Code for one machine runs well on another

• Scoreboard (ala CDC 6600 in 1963)– Centralized control structure– No register renaming, no forwarding– Pipeline stalls for WAR and WAW hazards.– Are these fundamental limitations??? (No)

• Reservation stations (ala IBM 360/91 in 1966)– Distributed control structures– Implicit renaming of registers (dispatched pointers)– WAR and WAW hazards eliminated by register renaming– Results broadcast to all reservation stations for RAW

Page 3: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 3

Review: Scoreboard Architecture(CDC 6600)

Func

tion

al U

nits

Regi

ster

s

FP MultFP Mult

FP Divide

FP Add

Integer

MemorySCOREBOARD

Page 4: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 4

Review: Four Stages of Scoreboard Control• Issue—decode instructions & check for structural

hazards (ID1)– Instructions issued in program order (for hazard checking)– Don’t issue if structural hazard– Don’t issue if instruction is output dependent on any previously

issued but uncompleted instruction (no WAW hazards) • Read operands—wait until no data hazards, then read

ops (ID2)– All real dependencies (RAW hazards) resolved in this stage,

since we wait for instructions to write back data.– No forwarding of data in this model!

• Execution—operate on operands (EX)– The functional unit begins execution upon receiving operands.

When the result is ready, it notifies the scoreboard that it has completed execution.

• Write result—finish execution (WB)– Stall until no WAR hazards with previous instructions:

Page 5: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 5

Review: Tomasulo Organization

FP adders

Add1Add2Add3

FP multipliers

Mult1Mult2

From Mem FP Registers

Reservation Stations

Common Data Bus (CDB)

To Mem

FP OpQueue

Load Buffers

Store Buffers

Load1Load2Load3Load4Load5Load6

Page 6: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 6

Review: Three Stages of Tomasulo Algorithm1. Issue—get instruction from FP Op Queue

If reservation station free (no structural hazard), control issues instr & sends operands (renames registers).

2.Execution—operate on operands (EX) When both operands ready then execute;

if not ready, watch Common Data Bus for result

3.Write result—finish execution (WB) Write on Common Data Bus to all awaiting units;

mark reservation station available

• Normal data bus: data + destination (“go to” bus)• Common data bus: data + source (“come from” bus)

– 64 bits of data + 4 bits of Functional Unit source address– Write if matches expected Functional Unit (produces result)– Does the broadcast

Page 7: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 7

Review: Comparison Cycle 62

Instruction status: Read Exec Write Exec WriteInstruction j k Issue Oper Comp Result Issue ComplResultLD F6 34+ R2 1 2 3 4 1 3 4LD F2 45+ R3 5 6 7 8 2 4 5MULTD F0 F2 F4 6 9 19 20 3 15 16SUBD F8 F6 F2 7 9 11 12 4 7 8DIVD F10 F0 F6 8 21 61 62 5 56 57ADDD F6 F8 F2 13 14 16 22 6 10 11

• Why take longer on scoreboard/6600?•Structural Hazards•Lack of forwarding

Page 8: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 8

Tomasulo Loop ExampleLoop: LD F0 0 R1 MULTD F4 F0 F2 SD F4 0 R1 SUBI R1 R1 #8 BNEZ R1 Loop

• Assume Multiply takes 4 clocks• Assume first load takes 8 clocks (cache miss), second

load takes 1 clock (hit)• To be clear, will show clocks for SUBI, BNEZ• Reality: integer instructions ahead

Page 9: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 9

Loop ExampleInstruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 Load1 No1 MULTD F4 F0 F2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

0 80 Fu

Page 10: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 10

Loop Example Cycle 1Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

1 80 Fu Load1

Page 11: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 11

Loop Example Cycle 2Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 Load3 No2 LD F0 0 R1 Store1 No2 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

2 80 Fu Load1 Mult1

Page 12: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 12

Loop Example Cycle 3Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

3 80 Fu Load1 Mult1

• Implicit renaming sets up “DataFlow” graph

Page 13: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 13

Loop Example Cycle 4Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

4 80 Fu Load1 Mult1

• Dispatching SUBI Instruction

Page 14: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 14

Loop Example Cycle 5Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

5 72 Fu Load1 Mult1

• And, BNEZ instruction

Page 15: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 15

Loop Example Cycle 6Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F4) Load1 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

6 72 Fu Load2 Mult1

• Notice that F0 never sees Load from location 80

Page 16: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 16

Loop Example Cycle 7Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 No2 SD F4 0 R1 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

7 72 Fu Load2 Mult2• Register file completely detached from computation• First and Second iteration completely overlapped

Page 17: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 17

Loop Example Cycle 8Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

8 72 Fu Load2 Mult2

Page 18: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 18

Loop Example Cycle 9Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 Load1 Yes 801 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load1 SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

9 72 Fu Load2 Mult2• Load1 completing: who is waiting?• Note: Dispatching SUBI

Page 19: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 19

Loop Example Cycle 10Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 Yes 721 SD F4 0 R1 3 Load3 No2 LD F0 0 R1 6 10 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

4 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #8Mult2 Yes Multd R(F2) Load2 BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

10 64 Fu Load2 Mult2• Load2 completing: who is waiting?• Note: Dispatching BNEZ

Page 20: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 20

Loop Example Cycle 11Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

3 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #84 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

11 64 Fu Load3 Mult2

• Next load in sequence

Page 21: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 21

Loop Example Cycle 12Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

2 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #83 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

12 64 Fu Load3 Mult2

• Why not issue third multiply?

Page 22: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 22

Loop Example Cycle 13Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

1 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #82 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

13 64 Fu Load3 Mult2

Page 23: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 23

Loop Example Cycle 14Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 Mult12 MULTD F4 F0 F2 7 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1

0 Mult1 Yes Multd M[80] R(F2) SUBI R1 R1 #81 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

14 64 Fu Load3 Mult2

• Mult1 completing. Who is waiting?

Page 24: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 24

Loop Example Cycle 15Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 Store2 Yes 72 Mult22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 No SUBI R1 R1 #8

0 Mult2 Yes Multd M[72] R(F2) BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

15 64 Fu Load3 Mult2

• Mult2 completing. Who is waiting?

Page 25: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 25

Loop Example Cycle 16Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 No

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

16 64 Fu Load3 Mult1

Page 26: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 26

Loop Example Cycle 17Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 Yes 64 Mult1

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

17 64 Fu Load3 Mult1

Page 27: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 27

Loop Example Cycle 18Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 Yes 80 [80]*R22 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 Store3 Yes 64 Mult1

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

18 64 Fu Load3 Mult1

Page 28: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 28

Loop Example Cycle 19Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 19 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 No2 MULTD F4 F0 F2 7 15 16 Store2 Yes 72 [72]*R22 SD F4 0 R1 8 19 Store3 Yes 64 Mult1

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

19 64 Fu Load3 Mult1

Page 29: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 29

Loop Example Cycle 20Instruction status: Exec Write

ITER Instruction j k Issue CompResult Busy Addr Fu1 LD F0 0 R1 1 9 10 Load1 No1 MULTD F4 F0 F2 2 14 15 Load2 No1 SD F4 0 R1 3 18 19 Load3 Yes 642 LD F0 0 R1 6 10 11 Store1 No2 MULTD F4 F0 F2 7 15 16 Store2 No2 SD F4 0 R1 8 19 20 Store3 Yes 64 Mult1

Reservation Stations: S1 S2 RS Time Name Busy Op Vj Vk Qj Qk Code:

Add1 No LD F0 0 R1Add2 No MULTD F4 F0 F2Add3 No SD F4 0 R1Mult1 Yes Multd R(F2) Load3 SUBI R1 R1 #8Mult2 No BNEZ R1 Loop

Register result statusClock R1 F0 F2 F4 F6 F8 F10 F12 ... F30

20 64 Fu Load3 Mult1

Page 30: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 30

Why can Tomasulo overlap iterations of loops?

• Register renaming– Multiple iterations use different physical destinations for registers

(dynamic loop unrolling).

• Reservation stations – Permit instruction issue to advance past integer control flow operations

• Other idea: Tomasulo building dynamic “DataFlow” graph from instructions.

Page 31: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 31

• Basic Idea: Hardware respresents direct encoding of compiler dataflow graphs:

• Data flows along arcs in“Tokens”.

• When two tokens arrive atcompute box, box “fires” andproduces new token.

• Split operations produce copiesof tokens

Data-Flow Architectures

Input: a,b y:= (a+b)/x x:= (a*(a+b))+boutput: y,x

+

*/

A B

+X(0)

Y X

Page 32: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 32

Paper by Dennis and Misunas

Instruction Cell 0

Instruction Cell 1

Instruction Cell n-1

Memory

OperationUnit 0

OperationUnit m-1

Instruction Cell

Instruction

Operand 1

Operand 2 Ope

rati

onPa

cket

Dat

a Pa

cket

s

“Reservation Station?”

Page 33: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 33

Explicit Register Renaming• Make use of a physical register file that is larger than

number of registers specified by ISA• Keep a translation table:

– ISA register => physical register mapping– When register is written, replace table entry with new register from

freelist.– Physical register becomes free when not being used by any

instructions in progress.• Pipeline can be exactly like “standard” DLX pipeline

– IF, ID, EX, etc….• Advantages:

– Removes all WAR and WAW hazards– Like Tomasulo, good for allowing full out-of-order completion– Allows data to be fetched from a single register file– Makes speculative execution/precise interrupts easier:

» All that needs to be “undone” for precise break pointis to undo the table mappings

Page 34: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 34

Question: Can we use explicit register renaming with scoreboard?

RenameTable

Func

tion

al U

nits

Regi

ster

sFP MultFP Mult

FP Divide

FP Add

Integer

MemorySCOREBOARD

Page 35: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 35

Scoreboard ExampleInstruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 NoMult1 NoAdd NoDivide No

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

FU P0 P2 P4 P6 P8 P10 P12 P30

• Initialized Rename Table

Page 36: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 36

Renamed Scoreboard 1Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1LD F2 45+ R3MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 Yes Load P32 R2 YesInt2 NoMult1 NoAdd NoDivide No

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

1 FU P0 P2 P4 P32 P8 P10 P12 P30

• Each instruction allocates free register • Similar to single-assignment compiler

transformation

Page 37: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 37

Renamed Scoreboard 2Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2LD F2 45+ R3 2MULTD F0 F2 F4SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 Yes Load P32 R2 YesInt2 Yes Load P34 R3 YesMult1 NoAdd NoDivide No

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

2 FU P0 P34 P4 P32 P8 P10 P12 P30

Page 38: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 38

Renamed Scoreboard 3Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3LD F2 45+ R3 2 3MULTD F0 F2 F4 3SUBD F8 F6 F2DIVD F10 F0 F6ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 Yes Load P32 R2 YesInt2 Yes Load P34 R3 YesMult1 Yes Multd P36 P34 P4 Int2 No YesAdd NoDivide No

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

3 FU P36 P34 P4 P32 P8 P10 P12 P30

Page 39: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 39

Renamed Scoreboard 4Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4MULTD F0 F2 F4 3SUBD F8 F6 F2 4DIVD F10 F0 F6ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 Yes Load P34 R3 YesMult1 Yes Multd P36 P34 P4 Int2 No YesAdd Yes Sub P38 P32 P34 Int2 Yes NoDivide No

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

4 FU P36 P34 P4 P32 P38 P10 P12 P30

Page 40: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 40

Renamed Scoreboard 5Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3SUBD F8 F6 F2 4DIVD F10 F0 F6 5ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 NoMult1 Yes Multd P36 P34 P4 Yes YesAdd Yes Sub P38 P32 P34 Yes YesDivide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

5 FU P36 P34 P4 P32 P38 P40 P12 P30

Page 41: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 41

Renamed Scoreboard 6Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6DIVD F10 F0 F6 5ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

10 Mult1 Yes Multd P36 P34 P4 Yes Yes2 Add Yes Sub P38 P32 P34 Yes Yes

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

6 FU P36 P34 P4 P32 P38 P40 P12 P30

Page 42: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 42

Renamed Scoreboard 7Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6DIVD F10 F0 F6 5ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

9 Mult1 Yes Multd P36 P34 P4 Yes Yes1 Add Yes Sub P38 P32 P34 Yes Yes

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

7 FU P36 P34 P4 P32 P38 P40 P12 P30

Page 43: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 43

Renamed Scoreboard 8Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6 8DIVD F10 F0 F6 5ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

8 Mult1 Yes Multd P36 P34 P4 Yes Yes0 Add Yes Sub P38 P32 P34 Yes Yes

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

8 FU P36 P34 P4 P32 P38 P40 P12 P30

Page 44: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 44

Renamed Scoreboard 9Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

7 Mult1 Yes Multd P36 P34 P4 Yes YesAdd NoDivide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F30

9 FU P36 P34 P4 P32 P38 P40 P12 P30

Page 45: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 45

Renamed Scoreboard 10Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 10

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

6 Mult1 Yes Multd P36 P34 P4 Yes YesAdd Yes Addd P42 P38 P34 Yes YesDivide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3010 FU P36 P34 P4 P42 P38 P40 P12 P30

WAR Hazard gone!

• Notice that P32 not listed in Rename Table– Still live. Must not be reallocated by

accident

Page 46: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 46

Renamed Scoreboard 11Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 10 11

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

5 Mult1 Yes Multd P36 P34 P4 Yes Yes2 Add Yes Addd P42 P38 P34 Yes Yes

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3011 FU P36 P34 P4 P42 P38 P40 P12 P30

Page 47: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 47

Renamed Scoreboard 12Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 10 11

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

4 Mult1 Yes Multd P36 P34 P4 Yes Yes1 Add Yes Addd P42 P38 P34 Yes Yes

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3012 FU P36 P34 P4 P42 P38 P40 P12 P30

Page 48: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 48

Renamed Scoreboard 13Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 10 11 13

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

3 Mult1 Yes Multd P36 P34 P4 Yes Yes0 Add Yes Addd P42 P38 P34 Yes Yes

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3013 FU P36 P34 P4 P42 P38 P40 P12 P30

Page 49: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 49

Renamed Scoreboard 14Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 10 11 13 14

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

2 Mult1 Yes Multd P36 P34 P4 Yes Yes Add No

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3014 FU P36 P34 P4 P42 P38 P40 P12 P30

Page 50: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 50

Renamed Scoreboard 15Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 10 11 13 14

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

1 Mult1 Yes Multd P36 P34 P4 Yes Yes Add No

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3015 FU P36 P34 P4 P42 P38 P40 P12 P30

Page 51: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 51

Renamed Scoreboard 16Instruction status: Read Exec Write

Instruction j k Issue Oper CompResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6 16SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 10 11 13 14

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 No

0 Mult1 Yes Multd P36 P34 P4 Yes Yes Add No

Divide Yes Divd P40 P36 P32 Mult1 No Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3016 FU P36 P34 P4 P42 P38 P40 P12 P30

Page 52: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 52

Renamed Scoreboard 17Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6 16 17SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5ADDD F6 F8 F2 10 11 13 14

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 NoMult1 No

Add NoDivide Yes Divd P40 P36 P32 Mult1 Yes Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3017 FU P36 P34 P4 P42 P38 P40 P12 P30

Page 53: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 53

Renamed Scoreboard 18Instruction status: Read Exec Write

Instruction j k Issue Oper Comp ResultLD F6 34+ R2 1 2 3 4LD F2 45+ R3 2 3 4 5MULTD F0 F2 F4 3 6 16 17SUBD F8 F6 F2 4 6 8 9DIVD F10 F0 F6 5 18ADDD F6 F8 F2 10 11 13 14

Functional unit status: dest S1 S2 FU FU Fj? Fk?Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk

Int1 NoInt2 NoMult1 No

Add No40 Divide Yes Divd P40 P36 P32 Mult1 Yes Yes

Register Rename and ResultClock F0 F2 F4 F6 F8 F10 F12 ... F3018 FU P36 P34 P4 P42 P38 P40 P12 P30

Page 54: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 54

Explicit Renaming Support Includes:

• Rapid access to a table of translations• A physical register file that has more registers than

specified by the ISA• Ability to figure out which physical registers are

free.– No free registers stall on issue

• Thus, register renaming doesn’t require reservation stations. However:

– Many modern architectures use explicit register renaming + Tomasulo-like reservation stations to control execution.

Page 55: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 55

What about Precise Exceptions/Interrupts?• Both Scoreboard and Tomasulo have:

– In-order issue, out-of-order execution, out-of-order completion

• Recall: An interrupt or exception is precise if there is a single instruction for which:

– All instructions before that have committed their state– No following instructions (including the interrupting

instruction) have modified any state.

• Need way to resynchronize execution with instruction stream (I.e. with issue-order)

– Easiest way is with in-order completion (i.e. reorder buffer)– Other Techniques (Smith paper): Future File, History Buffer

Page 56: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 56

Relationship between precise interrupts and speculation:

• Speculation is a form of guessing.• Important for branch prediction:

– Need to “take our best shot” at predicting branch direction.– If we issue multiple instructions per cycle, lose lots of potential

instructions otherwise:» Consider 4 instructions per cycle» If take single cycle to decide on branch, waste from 4 - 7

instruction slots!

• If we speculate and are wrong, need to back up and restart execution to point at which we predicted incorrectly:

– This is exactly same as precise exceptions!

• Technique for both precise interrupts/exceptions and speculation: in-order completion or commit

Page 57: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 57

HW support for precise interrupts• Concept of Reorder Buffer (ROB):

– Holds instructions in FIFO order, exactly as they were issued» Each ROB entry contains PC, dest reg, result, exception status

– When instructions complete, results placed into ROB» Supplies operands to other instruction between execution

complete & commit more registers like RS» Tag results with ROB buffer number instead of reservation station

– Instructions commit values at head of ROB placed in registers– As a result, easy to undo

speculated instructions on mispredicted branches or on exceptions

ReorderBufferFP

OpQueue

FP Adder FP AdderRes Stations Res Stations

FP Regs

Commit path

Page 58: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 58

Four Steps of Speculative Tomasulo Algorithm

1. Issue—get instruction from FP Op Queue If reservation station and reorder buffer slot free, issue instr & send

operands & reorder buffer no. for destination (this stage sometimes called “dispatch”)

2. Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch CDB for

result; when both in reservation station, execute; checks RAW (sometimes called “issue”)

3. Write result—finish execution (WB) Write on Common Data Bus to all awaiting FUs

& reorder buffer; mark reservation station available.4. Commit—update register with reorder result

When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called “graduation”)

Page 59: John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley

2/12/2007 CS252-s07, lecture 7 59

Summary• DataFlow view:

– Data triggers execution rather than instructions triggering data• Dynamic hardware schemes can unroll loops

dynamically in hardware– Form of limited dataflow– Register renaming is essential

• Explicit Renaming: more physical registers than needed by ISA.

– Rename table: tracks current association between architectural registers and physical registers

– Uses a translation table to perform compiler-like transformation on the fly

• Precise Interrupts:– Must commit things back in order– Reorder buffer: temporarily holds results until commit possible– Toss out things to achieve precise interrupt point