EECS 452 – Lecture 3: Pipelining and Interrupts
Instructor: Gokhan Memik
EECS Dept., Northwestern University
EECS 452 © Hill, Wood, Sohi, Smith, Vijaykumar, Moshovos, and Memik http://www.eecs.northwestern.edu/~memik/courses/452 2
Pipelining
Principles of pipelining
Simple pipelining
Structural Hazards
Data Hazards
Control Hazards
Interrupts
Multicycle operations
Pipeline clocking
Sequential Execution Semantics
We will be studying techniques that exploit the semantics of Sequential Execution.
Sequential Execution Semantics:
  instructions appear as if they executed in the program-specified order, one after the other
Alternatively:
At any given point in time we should be able to identify an instruction so that:
  1. All preceding instructions have executed
  2. None following has executed
Exploiting Sequential Semantics
The “it should appear” is the key
The only way one can inspect execution order is via the machine’s state
  This includes registers, memory and any other named storage
We will be looking at techniques that relax execution order while preserving sequential execution semantics
Steps of Instruction Execution
Instruction execution is not a monolithic action
There are multiple microactions involved
Fetch
Decode
Read Operand
Operation
Writeback Result
Determine Next Instruction
Pipelining: Partially Overlap Instructions
Ideally:
$\text{Time}_{\text{pipeline}} = \dfrac{\text{Time}_{\text{sequential}}}{\text{PipelineDepth}}$
This ignores fill and drain times
[Figure: instructions vs. time for unpipelined and pipelined execution, each annotated with Latency and 1/Throughput]
Sequential Semantics Preserved?
Two execution states:
  1. In-progress: changes not visible to outsiders
  2. Committed: changes visible
Principles of Pipelining: Ideal Case
Let $T$ be the time to execute an instruction
Instruction execution requires $n$ stages, $t_1 \ldots t_n$, taking $T = \sum_i t_i$
W/O pipelining: $TR = 1/T$, Latency $= T = 1/TR$
W/ n-stage pipeline: $TR = \dfrac{1}{\max(t_i)} \le \dfrac{n}{T}$, Latency $= n \times \max(t_i) \ge T$
If all $t_i$ are equal, Speedup is $n$
Ideally: Want higher Performance? Use more pipeline stages!!!
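The ideal-case relations above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `pipeline_metrics` and the stage times are hypothetical.

```python
def pipeline_metrics(stage_times):
    """Return (throughput, latency, speedup) for an ideal pipeline.

    stage_times holds t_1..t_n; no latch overhead or hazards are modeled.
    """
    total = sum(stage_times)      # T = sum of t_i (unpipelined latency)
    cycle = max(stage_times)      # clock period is set by the slowest stage
    n = len(stage_times)
    throughput = 1.0 / cycle      # TR = 1 / max(t_i) <= n / T
    latency = n * cycle           # n * max(t_i) >= T
    speedup = total / cycle       # pipelined TR divided by unpipelined TR = 1/T
    return throughput, latency, speedup

# Hypothetical stage times: same total T = 10, different balance.
balanced = pipeline_metrics([2, 2, 2, 2, 2])
unbalanced = pipeline_metrics([1, 2, 4, 2, 1])
```

With balanced stages the speedup equals the number of stages; with the unbalanced stages the slowest stage sets the clock, so the speedup drops and the latency grows beyond T.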
Principles of Pipelining: Example
Why Can’t We Improve Performance That Easily?
Impact of Clock Skew and Latch Overheads
Let $X$ be the extra delay per stage for
  latch overhead
  clock/data skew
$X$ limits the useful pipeline depth
With n-stage pipeline (all $t_i$ equal to $t$):
  throughput $= \dfrac{1}{X + t} \le \dfrac{n}{T}$
  latency $= n(X + t) = nX + T$
  speedup $= \dfrac{T}{X + t} \le n$
Real pipelines usually do not achieve this due to Hazards
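To see how the per-stage overhead limits useful depth, the speedup expression can be evaluated for increasing n. This is an illustrative sketch; the values of T and X are hypothetical and not taken from the slides.

```python
def speedup_with_overhead(T, n, X):
    """Speedup of an n-stage pipeline with equal stages and per-stage overhead X."""
    t = T / n                 # time of each (equal) stage
    return T / (X + t)        # speedup = T / (X + t) <= n

# Hypothetical numbers: 10 time units of logic, 0.5 units of latch/skew overhead.
T, X = 10.0, 0.5
speedups = {n: speedup_with_overhead(T, n, X) for n in (5, 10, 20, 40)}
```

As n grows, t shrinks toward zero and the speedup approaches T/X rather than n, which is why adding stages gives diminishing returns.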
Simple pipelines
F - fetch, D - decode, X - execute, M - memory, W - writeback
Pipelining as Datapaths in Time
Hazards
Hazards
  conditions that lead to incorrect behavior if not fixed
Structural Hazard
  two different instructions use same resource in same cycle
Data Hazard
  two different instructions use same storage
  must appear as if the instructions execute in correct order
Control Hazard
  one instruction affects which instruction is next
Structural Hazards
When two or more different instructions want to use the same hardware resource in the same cycle
e.g., loads and stores use the same memory port as IF
Dealing with Structural Hazards
Stall:
  + low cost, simple
  – decreases IPC
  use for rare case
Pipeline Hardware Resource:
  useful for multicycle resources
  + good performance
  – sometimes complex, e.g., RAM
  Example: 2-stage cache pipeline: decode, read or write data (wave pipelining - generalization)
Dealing with Structural Hazards
Replicate resource
  + good performance
  – increases cost
  – increased interconnect delay?
  use for cheap or divisible resources
Impact of ISA on Structural Hazards
Structural hazards are reduced
  If each instruction uses a resource at most once
  Always in same pipeline stage
  For one cycle
Many RISC ISAs designed with this in mind
Data Hazards
When two different instructions use the same storage location
It must appear as if they executed in sequential order
  read-after-write (RAW, true dependence) – real
  write-after-read (WAR, anti-dependence) – artificial
  write-after-write (WAW, output-dependence) – artificial
  read-after-read (no hazard)
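The four cases above can be told apart by comparing the read and write sets of two instructions in program order. A minimal sketch follows; the function name, the (reads, writes) encoding, and the register names are assumptions made for illustration.

```python
def classify_hazards(first, second):
    """Classify data dependences between two instructions in program order.

    Each instruction is a (reads, writes) pair of sets of storage names.
    """
    reads_1, writes_1 = first
    reads_2, writes_2 = second
    hazards = set()
    if writes_1 & reads_2:
        hazards.add("RAW")    # second reads what first writes (true dependence)
    if reads_1 & writes_2:
        hazards.add("WAR")    # second overwrites what first reads (anti-dependence)
    if writes_1 & writes_2:
        hazards.add("WAW")    # both write the same location (output dependence)
    return hazards            # read-after-read alone yields the empty set

# Hypothetical instructions: add r1 <- r2, r3 followed by sub r4 <- r1, r5
raw = classify_hazards(({"r2", "r3"}, {"r1"}), ({"r1", "r5"}, {"r4"}))
war = classify_hazards(({"r1"}, {"r2"}), ({"r3"}, {"r1"}))
waw = classify_hazards(({"r2"}, {"r1"}), ({"r3"}, {"r1"}))
rar = classify_hazards(({"r1"}, {"r2"}), ({"r1"}, {"r3"}))
```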
Examples of RAW
Simple Solution to RAW
Hardware detects RAW and stalls
+ low cost, simple
– reduces IPC
Maybe we should try to minimize stalls
Stall Methods
Compare ahead in pipe
  if rs1(EX) == Rd(MEM) || rs2(EX) == Rd(MEM) then stall
  assumes MEM instr is a load, EX instr is ALU
Use register reservation bits
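The compare-ahead check can be written out as a small function. This is a sketch only: the `Instr` record and its field names are hypothetical, and the slide's assumption (MEM holds a load, EX holds an ALU instruction) is turned into an explicit test on the opcode.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    """Hypothetical pipeline-latch contents for one instruction."""
    op: str
    rd: Optional[int] = None
    rs1: Optional[int] = None
    rs2: Optional[int] = None

def must_stall(ex: Instr, mem: Instr) -> bool:
    """Stall if the instruction in EX reads the register a load in MEM will write."""
    if mem.op != "load" or mem.rd is None:
        return False          # the slide assumes MEM holds a load; checked here
    return ex.rs1 == mem.rd or ex.rs2 == mem.rd

load = Instr(op="load", rd=1, rs1=2)        # load r1 <- 0(r2)
dependent = Instr(op="add", rd=3, rs1=1, rs2=4)
independent = Instr(op="add", rd=3, rs1=5, rs2=4)
```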
Minimizing RAW stalls
Bypass or Forward or Short-Circuit
Use data before it is in register
  + reduces/avoids stalls
  – complex
  – Deeper pipelines -> more places to bypass from
  crucial for common RAW hazards
Bypass
Interlock logic
  detect hazard
  bypass correct result to ALU
Hardware detection requires extra hardware
  instruction latches for each stage
  comparators to detect the hazards
Bypass: Control Example
Mux control
  if insn(EX) uses immediate then select IMM
  else if rs2(EX) == rd(MEM) then ALUOUT(MEM)
  else if rs2(EX) == rd(WB) then ALUOUT(WB)
  else select B
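The mux control above translates directly into a priority chain. A minimal sketch, with a hypothetical function name and string labels standing in for the mux inputs:

```python
def select_alu_b(ex_uses_imm, ex_rs2, mem_rd, wb_rd):
    """Choose the source for the ALU's second operand, following the slide's order."""
    if ex_uses_imm:
        return "IMM"
    if ex_rs2 == mem_rd:
        return "ALUOUT(MEM)"   # youngest producer is checked first
    if ex_rs2 == wb_rd:
        return "ALUOUT(WB)"
    return "B"                 # value read from the register file

from_mem = select_alu_b(False, ex_rs2=3, mem_rd=3, wb_rd=3)
from_wb = select_alu_b(False, ex_rs2=3, mem_rd=7, wb_rd=3)
from_rf = select_alu_b(False, ex_rs2=3, mem_rd=7, wb_rd=8)
```

Checking MEM before WB matters when both stages target the same register: the instruction in MEM is later in program order, so its result is the one the consumer should see.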
RAW solutions
Hybrid (i.e., stall and bypass) solutions required sometimes
DLX has one cycle bubble if load result used in next instruction
Try to separate stall logic from bypass logic
avoid irregular bypasses
Pipeline Scheduling - Compilers can help
Instructions scheduled by compiler to reduce stalls
a = b + c; d = e + f -- Prior to scheduling
Pipeline Scheduling
After scheduling
Delayed Load
Avoid hardware solutions - Let the compiler deal with it
Instruction immediately after load can’t/shouldn’t see load result
Compiler has to fill in the delay slot - NOP might be necessary
Other Data Hazards
WAR
  Not possible in DLX - read early, write late
  Consider late read then early write
    ALU ops writeback at EX stage
    MEM takes two cycles and stores need source reg after 1 cycle
Other Data Hazards
WAW
  Not in DLX: register writes are in order
  consider slow then fast operation
Control Hazards
When an instruction affects which instruction executes next or changes the PC
  sw $4, 0($5)
  bne $2, $3, loop
  sub -, -, -
Control Hazards
Handling control hazards is very important
VAX e.g.,
  Emer and Clark report 39% of instr. change the PC
  Naive solution adds approx. 5 cycles every time
  Or, adds 2 to CPI or ~20% increase
DLX e.g.,
  H&P report 13% branches
  Naive solution adds 3 cycles per branch
  Or, 0.39 added to CPI or ~30% increase
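The CPI figures above follow from multiplying the fraction of PC-changing instructions by the penalty. The sketch below reproduces that arithmetic. The base CPI values used for the percentage increase (about 10 for the VAX, about 1.3 for DLX) are assumptions chosen to be consistent with the quoted percentages; they do not appear on the slide.

```python
def added_cpi(change_pc_fraction, penalty_cycles):
    """Extra cycles per instruction caused by control hazards."""
    return change_pc_fraction * penalty_cycles

def relative_increase(extra_cpi, base_cpi):
    """Fractional CPI increase relative to an assumed base CPI."""
    return extra_cpi / base_cpi

vax_extra = added_cpi(0.39, 5)    # about 1.95, i.e. roughly 2
dlx_extra = added_cpi(0.13, 3)    # about 0.39

# ASSUMPTION: base CPIs are not given on the slide.
vax_increase = relative_increase(vax_extra, 10.0)
dlx_increase = relative_increase(dlx_extra, 1.3)
```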
Handling Control Hazards
Move control point earlier in the pipeline
  Find out whether branch is taken earlier
  Compute target address fast
Both need to be done
e.g., in ID stage
  target := PC + immediate
  if (Rs1 op 0) PC := target
Handling Control Hazards
Implies only one cycle bubble but
  special PC adder required
  How about register file?
ISA and Control Hazard
Comparisons in ID stage
  must be fast
  can’t afford to subtract
  compares with 0 are simple
  gt, lt test sign-bit
  eq, ne must OR all bits
More general conditions need ALU
  DLX uses conditional sets
Handling Control Hazards
Branch prediction
  guess the direction of branch
  minimize penalty when right
  may increase penalty when wrong
Techniques
  static - by compiler
  dynamic - by hardware
  MORE ON THIS LATER ON
Handling Control Hazards
Static techniques
  predict always not-taken
  predict always taken
  predict backward taken
  predict specific opcodes taken
  delayed branches
Dynamic techniques
  Discussed with ILP
Handling Control Hazards
Predict not-taken always
if taken then squash (aka abort or rollback)
  will work only if no state change until branch is resolved
  DLX - ok - why?
  VAX - autoincrement addressing?
Handling Control Hazards
Predict taken always
For DLX must know target before branch is decoded!
  can use prediction
  special hardware for fast decode
Execute both paths
Handling Control Hazards
Delayed branch - execute next instruction whether taken or not
  i: beqz r1, #8
  i+1: sub --, --, --
  . . . . .
  i+8: or --, --, --
(reused by RISC, invented by microcode)
Filling in Delay slots
Fill with an instr before branch
  When? if branch and instr are independent
  Helps? Always
Fill from target (taken path)
  When? if safe to execute target, may have to duplicate code
  Helps? on taken branch, may increase code size
Fill from fall-through (not-taken path)
  When? if safe to execute instruction
  Helps? when not-taken
Filling in Delay Slots cont.
From Control-Independent code:
  code that will be eventually visited no matter where the branch goes
Nullifying or Canceling or Likely Branches:
  Specify when delay slot is executed and when it is squashed
  Why? Increase fill opportunities
Major Concern w/ DS: Exposes implementation optimization
Comparison of Branch Schemes
Cond. Branch statistics – DLX
  14%-17% of all insts (integer)
  3%-12% of all insts (floating-point)
  Overall 20% (int) and 10% (fp) control-flow insts.
  About 67% are taken
Branch-Penalty = %branches x (%taken x taken-penalty + %not-taken x not-taken-penalty)
Comparison of Branch Schemes
Assuming:
  branch% = 14%, taken% = 65%, 50% delay slots are filled w/ useful work
  ideal CPI is 1
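The comparison table from this slide did not survive extraction. As a hedged illustration of how the Branch-Penalty formula is applied under the stated assumptions, the sketch below uses per-scheme penalties that are assumptions, not values recovered from the slide: 3 cycles for always stalling (matching the naive DLX figure quoted earlier), 1 cycle on a taken branch for predict-not-taken, and one delay slot that is wasted half the time for delayed branches.

```python
def branch_penalty(branch_frac, taken_frac, taken_penalty, not_taken_penalty):
    """Branch-Penalty = %branches x (%taken x taken-penalty
                                     + %not-taken x not-taken-penalty)."""
    not_taken_frac = 1.0 - taken_frac
    return branch_frac * (taken_frac * taken_penalty
                          + not_taken_frac * not_taken_penalty)

BRANCH, TAKEN, IDEAL_CPI = 0.14, 0.65, 1.0   # from the slide's assumptions

# ASSUMED penalties (cycles); the original table is not available.
schemes = {
    "stall":             (3.0, 3.0),
    "predict not-taken": (1.0, 0.0),
    "delayed branch":    (0.5, 0.5),   # one slot, filled usefully 50% of the time
}
cpi = {name: IDEAL_CPI + branch_penalty(BRANCH, TAKEN, t, nt)
       for name, (t, nt) in schemes.items()}
```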
Impact of Pipeline Depth
Assume that now penalties are doubled
  For example we double clock frequency
Delayed Branches need special support for interrupts
Interrupts
Examples:
  power failing, arithmetic overflow
  I/O device request, OS call, page fault
  Invalid opcode, breakpoint, protection violation
Interrupts (aka faults, exceptions, traps) often require
  surprise jump (to vectored address)
  linking return address
  saving of PSW (including CCs)
  state change (e.g., to kernel mode)
Classifying Interrupts
1a. Synchronous
  function of program state (e.g., overflow, page fault)
1b. Asynchronous
  external device or hardware malfunction
2a. User request
  OS call
2b. Coerced
  from OS or hardware (page fault, protection violation)
Classifying Interrupts
3a. User Maskable
  User can disable processing
3b. Non-Maskable
  User cannot disable processing
4a. Between Instructions
  Usually asynchronous
4b. Within an instruction
  Usually synchronous - Harder to deal with
5a. Resume
  As if nothing happened? Program will continue execution
5b. Termination
Restartable Pipelines
Interrupts within an instruction are not catastrophic
  Most machines today support this
  Needed for virtual memory
Some machines did not support this
  Why?
    Cost
    Slowdown
Key: Precise Interrupts
First let’s consider a simple DLX-style pipeline
Handling Interrupts
Precise interrupts (sequential semantics)
  Complete instructions before the offending instr
  Squash (effects of) instructions after
  Save PC (& next PC with delayed branches)
  Force trap instruction into IF
Must handle simultaneous interrupts
  IF, M - memory access (page fault, misaligned, protection)
  ID - illegal/privileged instruction
  EX - arithmetic exception
Interrupts
E.g., data page fault
Interrupts
Preceding instructions already complete
Squash succeeding instructions
  prevent them from modifying state (registers, CC, memory)
Trap instruction jumps to trap handler
  hardware saves PC in IAR
  trap handler must save IAR
Interrupts
E.g., arithmetic exception
Interrupts
E.g., Instruction fetch page fault
Interrupts
Let preceding instructions complete
No succeeding instructions
What happens if i+3 causes a data page fault?
Interrupts
Out-of-order interrupts
  which page fault should we take?
Out-of-Order Interrupts
Post interrupts
  check interrupt bit on entering WB
  precise interrupts
  longer latency
Handle immediately
  not fully precise
  interrupt may occur in order different from sequential CPU
  may cause implementation headaches!
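The difference between the two policies can be illustrated with a toy timing model. This is a sketch under simplifying assumptions that are not from the slides: a five-stage pipeline with no stalls, where instruction k occupies stage s in cycle k + index(s). Function names and the fault encoding are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def first_interrupt_immediate(faults):
    """Handle immediately: the fault that is raised earliest in time wins.

    faults maps instruction index (program order) to the stage where it faults.
    """
    return min(faults, key=lambda k: (k + STAGES.index(faults[k]), k))

def first_interrupt_posted(faults):
    """Post interrupts: faults are only examined on entering WB, and
    instructions reach WB in program order, so the oldest faulting one wins."""
    return min(faults)

# Instruction 0 takes a data page fault in MEM (cycle 3);
# instruction 1 takes an instruction-fetch page fault in IF (cycle 1).
faults = {0: "MEM", 1: "IF"}
taken_immediate = first_interrupt_immediate(faults)
taken_posted = first_interrupt_posted(faults)
```

Handling immediately takes the later instruction's fault first, which differs from sequential order; posting until WB restores program order at the cost of a longer interrupt latency.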
Interrupts
Other complications
  odd bits of state (e.g., CC)
  early-writes (e.g., autoincrement)
  instruction buffers and prefetch logic
  dynamic scheduling
  out-of-order execution
Interrupts come at random times
Both Performance and Correctness
  frequent case not everything
  rare case MUST work correctly
Delayed Branches and Interrupts
What happens on interrupt while in delay slot?
  next instruction is not sequential
Solution #1: save multiple PCs
  save current and next PC
  special return sequence, more complex hardware
Solution #2: single PC plus branch delay bit
  PC points to branch instruction
  SW Restrictions
Multicycle operations
Not all operations complete in 1 cycle
  FP slower than integer
  2-4 cycles multiply or add
  20-50 cycles divide
Extend DLX pipeline
  EX stage repeated multiple times
  multiple, parallel functional units
  not pipelined for now
Handling Multicycle Operations
Four functional units
  EX: integer
  E*: FP/integer multiplier
  E+: FP adder
  E/: FP/integer divider
Assume
  EX takes 1 cycle and all FP take 4 cycles
  separate integer and FP registers
  all FP arithmetic in FP registers
Worry about hazards
  structural, RAW (forwarding), WAR/WAW (between I & FP)
Simple Multicycle Example
Simple Multicycle Example
Notes:
  (1) - no WAW but complicates interrupts
  (2) - no WB conflict
  (3) - stall forced by structural hazard
  (4) - stall forced by in-order issue
Different FP operation times are possible
  Makes FP WAW hazards possible
  Further complicates interrupts
FP Instruction Issue
Check for structural hazards
  wait until functional unit is free
Check for RAW - wait until
  source regs are not used as destinations by instrs in EXi
Check for forwarding
  bypass data from MEM or WB if needed
What about overlapping instructions?
  contention in WB
  possible WAR/WAW hazards
  interrupt headaches
Overlapping Instructions
Contention in WB
  static priority
  e.g., FU with longest latency
  instructions stall after issue
WAR hazards
  always read registers at same pipe stage
WAW hazards
  divf f0, f2, f4 followed by subf f0, f8, f10
  stall subf or abort divf’s WB
Multicycle Operations
Problems with interrupts
  DIVF f0, f2, f4
  ADDF f2, f8, f10
  SUBF f6, f4, f10
ADDF completes before DIVF
  Out-Of-Order completion
  Possible imprecise interrupts
What if divf excepts after addf/subf complete?
Precise Interrupts Paper
Precise Interrupts
Simple solution: Modify state only when all preceding insts. are known to be exception free.
Mechanism: Result Shift Register
  Reserve all stages for the duration of the instruction
  Memory: Either stall stores at decode or use dummy store
Reorder Buffer
Out-of-order completion
Commit: Write results to register file or memory
Reorder buffer holds not yet committed state
Reorder buffer
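The commit discipline of a reorder buffer can be sketched in a few lines. This is a simplified model, not the design from the lecture or the referenced paper: class and method names are hypothetical, only register results are modeled, and bypassing from buffer entries is omitted.

```python
from collections import deque

class ReorderBuffer:
    """Toy reorder buffer: results complete in any order, commit in program order."""

    def __init__(self):
        self.entries = deque()        # program order, oldest entry on the left

    def allocate(self, dest):
        entry = {"dest": dest, "value": None, "done": False, "exception": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        entry["value"] = value        # held in the buffer, not yet architectural
        entry["done"] = True
        entry["exception"] = exception

    def commit(self, regfile):
        """Retire finished entries from the head; return True if an exception is taken."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["exception"]:
                self.entries.clear()  # squash all younger instructions
                return True
            regfile[head["dest"]] = head["value"]
        return False

# The DIVF / ADDF example: ADDF finishes first, then DIVF raises an exception.
regs = {"f0": 0.0, "f2": 1.0, "f6": 2.0}
rob = ReorderBuffer()
divf = rob.allocate("f0")
addf = rob.allocate("f2")
rob.complete(addf, value=9.0)
committed_early = rob.commit(regs)    # nothing retires: DIVF is still pending
rob.complete(divf, exception=True)
took_exception = rob.commit(regs)     # exception is precise: regs untouched
```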
Reorder Buffer Complications
State is kept in the reorder buffer
  May have to bypass from every entry
  Need to determine the latest write
  If read not at same stage need to determine closest earlier write
History Buffer
Allow out-of-order register file updates
At decode record current value of target register in reorder buffer entry.
On commit: do nothing
On exception: scan following reorder buffer entries restoring register values
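A history buffer can be sketched as an undo log. The model below is a simplification with hypothetical names; it assumes no two in-flight instructions write the same register (so the value recorded at decode is the correct one to restore) and that the excepting instruction has not written its own result.

```python
class HistoryBuffer:
    """Toy history buffer: registers are updated out of order, old values are logged."""

    def __init__(self, regfile):
        self.regfile = regfile
        self.log = []                 # (dest, old_value) in program order

    def decode(self, dest):
        """Record the current value of the target register; return the entry index."""
        self.log.append((dest, self.regfile[dest]))
        return len(self.log) - 1

    def write(self, dest, value):
        self.regfile[dest] = value    # out-of-order register file update

    def exception(self, index):
        """Restore registers for the excepting entry and all following entries."""
        for dest, old in reversed(self.log[index:]):   # youngest first
            self.regfile[dest] = old
        del self.log[index:]

# DIVF f0 excepts after ADDF f2 and SUBF f6 have already written their results.
hb_regs = {"f0": 0.0, "f2": 1.0, "f6": 2.0}
hb = HistoryBuffer(hb_regs)
divf_idx = hb.decode("f0")
hb.decode("f2")
hb.decode("f6")
hb.write("f2", 9.0)
hb.write("f6", 7.0)
hb.exception(divf_idx)
```

Commit needs no action in this scheme; the cost is moved to the exception path, which has to walk the log before the handler can run.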
Future File
Two register files:
  One updated out-of-order (FUTURE)
    assume no exceptions will occur
  One updated in order (ARCHITECTURAL)
Advantage: No delay to restore state on exception
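The two-file arrangement can be sketched as follows. Names are hypothetical, and copying the architectural file back into the future file on an exception is one assumed way of resynchronizing; the slides do not specify how that is done.

```python
class FutureFile:
    """Toy future file: one register file updated out of order, one in order."""

    def __init__(self, initial):
        self.architectural = dict(initial)   # precise state, updated in order
        self.future = dict(initial)          # speculative state, updated out of order

    def execute(self, dest, value):
        self.future[dest] = value            # written as soon as the result is ready

    def commit(self, dest, value):
        self.architectural[dest] = value     # written in program order

    def exception(self):
        # The architectural file already holds the precise state;
        # ASSUMPTION: the future file is resynchronized by copying it back.
        self.future = dict(self.architectural)

ff = FutureFile({"f0": 0.0, "f2": 1.0})
ff.execute("f2", 9.0)        # ADDF finishes early, visible to later readers
speculative_f2 = ff.future["f2"]
ff.exception()               # older DIVF excepts before ADDF commits
```

Later instructions read operands from the future file, so they see the newest values without searching a buffer, while the handler sees the architectural file.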