eecs 452 – lecture 3 pipelining and...

72
EECS 452 – Lecture 3 Pipelining and Interrupts Instructor: Gokhan Memik EECS Dept., Northwestern University

Upload: others

Post on 20-Oct-2019

12 views

Category:

Documents


0 download

TRANSCRIPT

EECS 452 – Lecture 3Pipelining and Interrupts

Instructor: Gokhan MemikEECS Dept., Northwestern University

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 2

Pipelining

Principles of pipeliningSimple pipeliningStructural HazardsData HazardsControl HazardsInterruptsMulticycle operationsPipeline clocking

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 3

Sequential Execution Semantics

We will be studying techniques that exploit the semantics of Sequential Execution.Sequential Execution Semantics:

instructions appear as if they executed in the program specified order and one after the other

AlternativelyAt any given point in time we should be able to identify an instruction so that:1. All preceding instructions have executed2. None following has executed

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 4

Exploiting Sequential Semantics

The “it should appear” is the keyThe only way one can inspect execution order is via the machine’s state

This includes registers, memory and any other named storageWe will looking at techniques that relax execution order while preserving sequential execution semantics

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 5

Steps of Instruction Execution

Instruction execution is not amonolithic action

There are multiplemicroactionsinvolved

Fetch

Decode

Read Operand

Operation

Writeback Result

Determine Next Instruction

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 6

Pipelining: Partially Overlap Instructions

Ideally:

This ignores fill and drain times

1/Throughput

Latency

timeinstructions

Unpipelined

Pipelined 1/Throughput

Latency

timeinstructions

pthPipelineDeTime

Time sequentialpipeline =

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 7

Sequential Semantics Preserved?

Two execution states:1. In-progress: changes not visible to outsiders2. Committed: changes visible

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 8

Principles of Pipelining: Ideal Case

Let T be the time to execute an instructionInstruction execution requires n stages, t1…tn taking T =

W/O pipelining: TR = 1/T Latency = T = 1/TR

W/ n-stage pipeline: TR = Latency=n x max(ti) ≥ T

if all ti are equal, Speedup is nIdeally: Want higher Performance? Use more pipeline stages!!!

∑ it

Tn

ti

≤)max(

1

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 9

Principles of Pipelining: Example

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 10

Why Can’t Improve Performance that easily

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 11

Impact of Clock Skew and Latch Overheads

let X be extra delay per stage forlatch overheadclock/data skew

X limits the useful pipeline depthWith n-stage pipeline (all ti equal)

throughput =

latency = n (X + t) = n X + T

speedup =

Real pipelines usually do not achieve this due to Hazards

Tn

tXp

+1

ntX

T≤

+

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 12

Simple pipelines

F- fetch, D - decode, X - execute, M - memory, W –writeback

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 13

Pipelining as Datapaths in Time

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 14

Hazards

Hazardsconditions that lead to incorrect behavior if not fixed

Structural Hazardtwo different instructions use same resource in same cycle

Data Hazardtwo different instructions use same storagemust appear as if the instructions execute in correct order

Control Hazardone instruction affects which instruction is next

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 15

Structural Hazards

When two or more different instructionswant to use the same hardware resource in the same cycle

e.g., load and stores use the same memory port as IF

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 16

Dealing with Structural Hazards

Stall:+ low cost, simple– decrease IPCuse for rare case

Pipeline Hardware Resource:useful for multicycle resources+ good performance- sometimes complex e.g., RAM – Example 2-stage cache pipeline: decode, read or write data (wave pipelining - generalization)

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 17

Dealing with Structural Hazards

Replicate resource+ good performance– increases cost– increased interconnect delay ?use for cheap or divisible resources

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 18

Impact of ISA on Structural Hazards

Structural hazards are reducedIf each instruction uses a resource at most onceAlways in same pipeline stageFor one cycle

Many RISC ISAs designed with this in mind

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 19

Data Hazards

When two different instructions use the same storage locationIt must appear as if they executed in sequential order

read-after-write (RAW, true dependence) – realwrite-after-read (WAR, anti-dependence) – artificialwrite-after-write (WAW, output-dependence) – artificialread-after-read (no hazard)

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 20

Examples of RAW

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 21

Simple Solution to RAW

Hardware detects RAW and stalls

+ low cost, simple– reduces IPC

Maybe we should try to minimize stalls

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 22

Stall Methods

Compare ahead in pipeif rs1(EX) == Rd(MEM) || rs2(EX) == Rd (MEM) then stallassumes MEM instr is a load, EX instr is ALUUse register reservation bits

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 23

Minimizing RAW stalls

Bypass or Forward or Short-Circuit

Use data before it is in register+ reduces/avoids stalls– complex– Deeper pipelines -> more places to bypass fromcrucial for common RAW hazards

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 24

Bypass

Interlock logicdetect hazardbypass correct result to ALU

Hardware detection requires extra hardwareinstruction latches for each stagecomparators to detect the hazards

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 25

Bypass: Control Example

Mux controlif insn(EX) uses immediate then select IMMelse if rs2(EX) == rd(MEM) then ALUOUT(MEM)else if rs2(EX) == rd(WB) then ALUOUT(WB)else select B

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 26

RAW solutions

Hybrid (i.e., stall and bypass) solutions required sometimes

DLX has one cycle bubble if load result used in next instructionTry to separate stall logic from bypass logic

avoid irregular bypasses

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 27

Pipeline Scheduling - Compilers can help

Instructions scheduled by compiler to reduce stallsa = b + c; d = e + f -- Prior to scheduling

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 28

Pipeline Scheduling

After scheduling

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 29

Delayed Load

Avoid hardware solutions - Let the compiler deal with itInstruction Immediately after load can’t/shouldn’t see load resultCompiler has to fill in the delay slot - NOP might be necessary

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 30

Other Data Hazards

WAR

Not possible in DLX - read early write lateConsider late read then early write

ALU ops writeback at EX stageMEM takes two cycles and stores need source reg after 1 cycle

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 31

Other Data Hazards

WAWNot in DLX : register writes are in orderconsider slow then fast operation

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 32

Control Hazards

When an instruction affects which instruction execute next or changes the PC

sw $4, 0($5)bne $2, $3, loopsub -, - , -

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 33

Control Hazards

Handling control hazards is very importantVAX e.g.,

Emer and Clark report 39% of instr. change the PCNaive solution adds approx. 5 cycles every timeOr, adds 2 to CPI or ~20% increase

DLX e.g.,H&P report 13% branchesNaive solution adds 3 cycles per branchOr, 0.39 added to CPI or ~30% increase

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 34

Handling Control Hazards

Move control point earlier in the pipelineFind out whether branch is taken earlierCompute target address fast

Both need to be done

e.g., in ID stagetarget := PC + immediateif (Rs1 op 0) PC := target

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 35

Handling Control Hazards

Implies only one cycle bubble butspecial PC adder requiredHow about register file?

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 36

ISA and Control Hazard

Comparisons in ID stagemust be fastcan’t afford to subtractcompares with 0 are simplegt, lt test sign-biteq, ne must OR all bits

More general conditions need ALUDLX uses conditional sets

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 37

Handling Control Hazards

Branch predictionguess the direction of branchminimize penalty when rightmay increase penalty when wrong

Techniquesstatic - by compilerdynamic - by hardwareMORE ON THIS LATER ON

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 38

Handling Control Hazards

Static techniquespredict always not-takenpredict always takenpredict backward takenpredict specific opcodes takendelayed branches

Dynamic techniquesDiscussed with ILP

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 39

Handling Control Hazards

Predict not-taken always

if taken then squash (aka abort or rollback)will work only if no state change until branch is resolvedDLX - ok - why?VAX - autoincrement addressing?

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 40

Handling Control Hazards

Predict taken always

For DLX must know target before branch is decoded!can use predictionspecial hardware for fast decode

Execute both paths

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 41

Handling Control Hazards

Delayed branch - execute next instruction whether taken or noti: beqz r1, #8i+1: sub --, --, --. . . . .i+8 : or --, --, -- (reused by RISC invented by microcode)

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 42

Filling in Delay slots

Fill with an instr before branchWhen? if branch and instr are independent.Helps? Always

Fill from target (taken path)When? if safe to execute target, may have to duplicate codeHelps? on taken branch, may increase code size

Fill from fall-through (not-taken path)when? if safe to execute instructionhelps? when not-taken

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 43

Filling in Delay Slots cont.

From Control-Independent code:code that will be eventually visited no matter where the branch goes

Nullifying or Canceling or Likely Branches:Specify when delay slot is execute and when is squashedWhy? Increase fill opportunitiesMajor Concern w/ DS: Exposes implementation optimization

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 44

Comparison of Branch Schemes

Cond. Branch statistics – DLX14%-17% of all insts (integer)3%-12% of all insts (floating-point)Overall 20% (int) and 10% (fp) control-flow insts.About 67% are taken

Branch-Penalty = %branches x (%taken x taken-penalty + %not-taken x not-taken-penalty)

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 45

Comparison of Branch Schemes

Assuming: branch% = 14%, taken% = 65%, 50% delay slots are filled w/ useful workideal CPI is 1

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 46

Impact of Pipeline Depth

Assume that now penalties are doubledFor example we double clock frequency

Delayed Branches need special support for interrupts

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 47

Interrupts

Examples:power failing, arithmetic overflowI/O device request, OS call, page faultInvalid opcode, breakpoint, protection violation

Interrupts (aka faults, exceptions, traps) often require

surprise jump (to vectored address)linking return addresssaving of PSW (including CCs)state change (e.g., to kernel mode)

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 48

Classifying Interrupts

1a. Synchronousfunction of program state (e.g., overflow, page fault)

1b. Asynchronousexternal device or hardware malfunction

2a. user requestOS call

2b. Coercedfrom OS or hardware (page fault, protection violation)

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 49

Classifying Interrupts

3a. User MaskableUser can disable processing

3b. Non-MaskableUser cannot disable processing

4a. Between InstructionsUsually asynchronous

4b. Within an instructionUsually synchronous - Harder to deal with

5a. ResumeAs if nothing happened? Program will continue execution

5b. Termination

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 50

Restartable Pipelines

Interrupts within an instruction are not catastrophicMost machines today support thisNeeded for virtual memorySome machines did not support thisWhy?

CostSlowdown

Key: Precise InterruptsFirst let’s consider a simple DLX-style pipeline

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 51

Handling Interrupts

Precise interrupts (sequential semantics)Complete instructions before the offending instrSquash (effects of) instructions afterSave PC (& next PC with delayed branches)Force trap instruction into IF

Must handle simultaneous interruptsIF, M - memory access (page fault, misaligned, protection)ID - illegal/privileged instructionEX - arithmetic exception

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 52

Interrupts

E.g., data page fault

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 53

Interrupts

Preceding instructions already completeSquash succeeding instructions

prevent them from modifying state (registers, CC, memory)Trap instruction jumps to trap handlerhardware saves PC in IARtrap handler must save IAR

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 54

Interrupts

E.g., arithmetic exception

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 55

Interrupts

E.g., Instruction fetch page fault

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 56

Interrupts

Let preceding instructions completeNo succeeding instructionsWhat happens if i+3 causes a data page fault?

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 57

Interrupts

Out-of-order interruptswhich page fault should we take?

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 58

Out-of-Order Interrupts

Post interruptscheck interrupt bit on entering WBprecise interruptslonger latency

Handle immediatelynot fully preciseinterrupt may occur in order different from sequential CPUmay cause implementation headaches!

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 59

Interrupts

Other complicationsodd bits of state (e.g., CC)early-writes (e.g., autoincrement)instruction buffers and prefetch logicdynamic schedulingout-of-order execution

Interrupts come at random timesBoth Performance and Correctness

frequent case not everythingrare case MUST work correctly

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 60

Delayed Branches and Interrupts

What happens on interrupt while in delay slot next instruction is not sequential

Solution #1: save multiple PCssave current and next PCspecial return sequence, more complex hardware

Solution #2: single PC plusbranch delay bitPC points to branch instructionSW Restrictions

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 61

Multicycle operations

Not all operations complete in 1 cycleFP slower than integer2-4 cycles multiply or add20-50 cycles divide

Extend DLX pipelineEX stage repeated multiple timesmultiple, parallel functional units

not pipelined for now

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 62

Handling Multicycle Operations

Four functional unitsEX: integer E*: FP/integer multiplierE+: FP adder E/: FP/integer divider

AssumeEX takes 1 cycle and all FP take 4 cyclesseparate integer and FP registersall FP arithmetic in FP registers

Worry about hazardsstructural, RAW (forwarding), WAR/WAW (between I & FP)

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 63

Simple Multicycle Example

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 64

Simple Multicycle Example

Notes:(1) - no WAW but complicates interrupts(2) - no WB conflict(3) - stall forced by structural hazard(4) - stall forced by in-order issue

Different FP operation times are possibleMakes FP WAW hazards possibleFurther complicates interrupts

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 65

FP Instruction Issue

Check for structural hazardswait until functional unit is free

Check for RAW - wait untilsource regs are not used as destinations by instrs in Exi

Check for forwardingbypass data from MEM or WB if needed

What about overlapping instructions?contention in WBpossible WAR/WAW hazardsinterrupt headaches

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 66

Overlapping Instructions

Contention in WBstatic prioritye.g., FU with longest latencyinstructions stall after issue

WAR hazardsalways read registers at same pipe stage

WAW hazardsdivf f0, f2, f4 followed by subf f0, f8, f10stall subf or abort divf’s WB

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 67

Multicycle Operations

Problems with interruptsDIVF f0, f2,f4ADDF f2,f8, f10SUBF f6, f4, f10

ADDF completes before DIVFOut-Of-Order completionPossible imprecise interrupts

What if divf excepts after addf/subf complete?Precise Interrupts Paper

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 68

Precise InterruptsSimple solution: Modify state only when all preceding insts. are known to be exception free.Mechanism: Result Shift Register

Reserve all stages for the duration of the instructionMemory: Either stall stores at decode or use dummy store

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 69

Reorder Buffer

Out-of-order completionCommit: Write results to register file or memoryReorder buffer holds not yet committed state

Reorder buffer

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 70

Reorder Buffer Complications

State is kept in the reorder bufferMay have to bypass from every entryNeed to determine the latest writeIf read not at same stage need to determine closest earlier write

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 71

History Buffer

Allow out-of-order register file updatesAt decode record current value of target register in reorder buffer entry.On commit: do nothingOn exception: scan following reorder buffer entries restoring register values

EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 72

Future File

Two register files:One updated out-of-order (FUTURE)

assume no exceptions will occurOne updated in order (ARCHITECTURAL)

Advantage: No delay to restore state on exception