In-Line Interrupt Handling for Software Managed TLBs
Aamer Jaleel and Bruce Jacob
Electrical and Computer Engineering
University of Maryland at College Park
{ajaleel, blj}@eng.umd.edu
International Conference on Computer Design (ICCD) 2001
September 24 – September 26, Austin, Texas
Outline
- Reorder buffers
- Interrupt handling
  - Traditional method
  - In-lined method (novel solution)
- Performance of in-lining TLB interrupts
- Conclusions
Reorder Buffer (ROB)
- Hardware data structure (queue)
- “Holding tank” for instructions & their pipeline state
- New instructions are queued at the tail and retired from the head
- Allows for interrupts to be handled in-order
[Figure: Reorder Buffer (ROB), a hardware data structure. A 16-entry queue (ROB0 to ROB15) with HEAD and TAIL pointers. New instructions are queued at the tail, instructions retire from the head, and the remaining slots are empty. Each entry holds the instruction type, pipeline state, exception flags, etc.]
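The queue behaviour on this slide can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design from the talk: the class and field names (`RobEntry`, `ReorderBuffer`, `done`) are assumptions, and the 16-entry size simply mirrors the figure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RobEntry:
    op: str                      # instruction type
    done: bool = False           # finished execution
    has_exception: bool = False  # exception flag checked at retire


class ReorderBuffer:
    """Circular queue: enqueue at the tail, retire in order from the head."""

    def __init__(self, size: int = 16):
        self.slots: List[Optional[RobEntry]] = [None] * size
        self.head = 0
        self.tail = 0
        self.count = 0

    def free_slots(self) -> int:
        return len(self.slots) - self.count

    def enqueue(self, entry: RobEntry) -> bool:
        if self.count == len(self.slots):
            return False  # ROB full, fetch must stall
        self.slots[self.tail] = entry
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def retire(self) -> Optional[RobEntry]:
        entry = self.slots[self.head]
        if entry is None or not entry.done:
            return None  # the head must finish before anything retires
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return entry
```

Because only the head may retire, an instruction that finishes early still waits behind older instructions, which is what lets exceptions be recognised in program order.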
Handling an Interrupt (Traditional Method)
- Interrupts handled at retire stage
- If ROB[head].has_exception = true
  - Save state & exceptional PC
  - Flush ROB
  - Set PC to appropriate handler
  - Handle exception with privileges enabled
  - Restore exceptional PC and continue executing user code
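A rough sketch of the traditional sequence, to make the cost visible. Everything here is an assumption made for illustration (the `Core` record, the 4-byte instruction size, the handler being modelled as a simple loop); the slides only give the steps themselves.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Core:
    rob: List[str] = field(default_factory=list)  # rob[0] is the head
    pc: int = 0
    privileged: bool = False
    trace: List[str] = field(default_factory=list)


def handle_traditional(core: Core, exceptional_pc: int,
                       handler_pc: int, handler_len: int) -> int:
    """Run the handler the traditional way; return the number of flushed entries."""
    saved_pc = exceptional_pc        # save exceptional PC
    flushed = len(core.rob)
    core.rob.clear()                 # flush ROB: all in-flight work is discarded
    core.pc = handler_pc             # set PC to the handler
    core.privileged = True           # handle exception with privileges enabled
    for i in range(handler_len):
        core.trace.append(f"handler {i + 1}")
        core.pc += 4                 # assumed 4-byte instructions
    core.privileged = False
    core.pc = saved_pc               # restore exceptional PC, resume user code
    return flushed
```

The returned count is the work that has to be re-fetched and re-executed after the handler, which is the overhead in-lining tries to avoid.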
A Novel Approach – In-lining
- Hardware knows length of handler (LHC)
- If ROB[head].has_exception = true
  - If empty slots in ROB >= LHC & enough resources
    - Save current tail pointer & nextPC (PC of next instruction to fetch)
    - Set mode to INLINE and reset head and tail pointers
    - Fetch handler; once done, unset INLINE mode, restore tail pointer, continue fetching user code
    - When TLB updated, undo all TLB misses in ROB
  - Else, handle interrupt by traditional method
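The in-lining decision and the save/restore of fetch state might look like the sketch below. The names (`FetchState`, `try_inline`, `finish_handler_fetch`, `handler_fetch_cycles`) and the `handler_slot` argument are hypothetical: the slides do not say where the pointers are reset to, so that position is passed in, and the head-pointer reset is left out.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class FetchState:
    tail: int
    next_pc: int
    inline_mode: bool = False
    saved_tail: int = -1
    saved_next_pc: int = -1


def try_inline(state: FetchState, free_slots: int, lhc: int,
               handler_pc: int, handler_slot: int,
               resources_ok: bool = True) -> bool:
    """Enter INLINE mode if the handler fits; otherwise report failure."""
    if free_slots < lhc or not resources_ok:
        return False                      # caller falls back to the traditional method
    state.saved_tail = state.tail         # save current tail pointer
    state.saved_next_pc = state.next_pc   # save nextPC, not the exceptional PC
    state.inline_mode = True
    state.tail = handler_slot             # assumed reset position for handler fetch
    state.next_pc = handler_pc
    return True


def finish_handler_fetch(state: FetchState) -> None:
    """Handler fully fetched: leave INLINE mode and resume user fetch."""
    state.inline_mode = False
    state.tail = state.saved_tail
    state.next_pc = state.saved_next_pc


def handler_fetch_cycles(lhc: int, fetch_width: int) -> int:
    return ceil(lhc / fetch_width)
```

With 9 free slots, LHC = 6, and a fetch width of 2, `try_inline` succeeds and `handler_fetch_cycles(6, 2)` gives 3 cycles of handler fetch.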
Interrupt In-lining… (An Example)
Assuming length of handler LHC = 6, instruction fetch/retire per cycle = 2

[Figure: ROB contents per cycle, showing user instructions (LW, SHIFT, SUB, MISC, ADDI, MUL, SW, ADD) and handler instructions (Handler 1 to Handler 5, TLB WR), with head, tail, and saved tail pointers. Legend: done executing; empty ROB slot; ready to execute; privilege bit set; instruction had exception.]

- TLB interrupt! Can I inline? Space available = 9 >= LHC, so CAN INLINE.
- CYCLE 1: TLB miss detected. INLINE mode set, tail pointer & nextPC saved, head & tail pointers reset, PC = handler code.
- CYCLE 2 & 3: Handler fetched. User & handler instructions execute (ROB 2, 7, 8).
- CYCLE 4: Done fetching handler. INLINE mode unset, tail pointer & nextPC restored. Resume fetching user code.
- CYCLE 5: Fetch user code from where it last stopped.
- CYCLE 6: Fetch & execute more user code.
- CYCLE 7: TLB refilled in execute stage. TLB updated: undo all TLB misses.
- CYCLE 8: Handler “vanishes”. Re-access TLB.
Issues With Interrupt In-lining
- Hardware knows handler length
- There should be a privilege bit per ROB entry
- When done fetching handler, fetch nextPC
  - Save nextPC, NOT exceptionalPC (add MUX)
- When done updating TLB
  - Undo all instructions w/TLB miss and set them as “ready to execute”
- Branch mispredictions must be handled
  - If mispredict occurs while in-lining, replace nextPC
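The "undo" step after the TLB write can be pictured as a pass over the ROB. This is a minimal sketch under assumed names (`Entry`, `tlb_miss`, `ready`); real hardware would do this in parallel rather than with a loop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    op: str
    tlb_miss: bool = False  # instruction was marked with a TLB miss
    ready: bool = False     # "ready to execute"


def undo_tlb_misses(rob: List[Entry]) -> int:
    """Clear every TLB-miss mark and make those entries ready again."""
    undone = 0
    for entry in rob:
        if entry.tlb_miss:
            entry.tlb_miss = False
            entry.ready = True  # will re-access the TLB when re-issued
            undone += 1
    return undone
```

Entries without a miss mark are left untouched, so instructions that already executed do not repeat their work.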
Experimental Methodology
Simulation Tool: Alpha 21264
- 4-way OOO
- 80 instructions in flight
- FA I/D TLBs w/NMRU policy, 128-entry
- 8 KB page size
- 150 renaming registers
- 22 instruction D-TLB handler
Benchmarks: Small scientific kernels
- Red Black
- Jacobi
- Matrix Multiply
- Quicksort
Why not SPEC2000?
- TLB miss rates are not realistic
[Figure: Real Life Apps compared with Spec FP 2000; points marked * are our benchmarks.]
Results - In-lining Limitations
- Benefit from in-lining: 80-90% of TLB miss interrupts
- Cannot in-line because:
  - Not enough space in ROB (< 2%)
  - Pipeline already stalled due to lack of resources (free registers)
# of User Instructions Flushed
- Reorder buffer 50–55% full when interrupt occurs
- In-lining reduces # instr flushed by 40–80%
Interrupt Overhead
- Cost of re-fetching and executing flushed instr
- In-lining reduces cost of TLB miss by 10–40%
Performance of Benchmarks
- Execution time reduced by 5–25% for same size TLB
- Can get the performance of a traditional TLB with an in-lined TLB of ¼ size
Speedup Vs Miss Rate
- As TLB management overhead increases, the benefit from in-lining increases
- Applications that will benefit from in-lining are those that need it the most
[Figure: speedup vs. miss rate for MATRIX MULT, RED BLACK, JACOBI, and QUICKSORT]
Conclusions
- In-lining can be used for ALL types of software handled “transparent interrupts”
  - Avoids unnecessary flushing of pipeline
- In-line interrupt handling for TLB misses
  - Cuts # of instr flushed by 55-80%
  - Reduces overhead by 10-40%
  - Improves performance by 5-25%
Future Work
- Speculative in-lining
  - No need to check for ROB space, check for deadlock
- Energy savings by not re-fetching and re-executing instructions
Related Work
- Save entire internal state of the pipeline and restore after completion of handler (Cyber 200 for VM interrupts)
- Save instruction window (ROB) as part of machine state, restore when done (Torng & Day)
- A new thread fetches handler code while existing thread continues fetching user code (Zilles, Emer, & Sohi)
Miscellaneous Slides
If Time Permitting
Interrupts
- Interrupt: exceptional condition
  - Perform behind the scenes work
  - Transparent to the user application
  - e.g. unaligned memory access, instruction emulation, TLB miss handling, etc.
- Two types
  - Software handled (privileged code)
  - Hardware handled (special hardware)
Precise Interrupts
- Precise interrupt
  - Everything before excepted instruction has finished execution and has committed
  - Everything after excepted instruction has NOT committed
  - The excepted instruction may or may not have finished execution
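The definition above can be restated as a small check. The representation (a list of per-instruction "committed" flags in program order, plus the index of the excepted instruction) and the function name `is_precise` are assumptions made for this sketch.

```python
from typing import List


def is_precise(committed: List[bool], excepted: int) -> bool:
    """True if machine state is precise with respect to instruction `excepted`."""
    older_all_committed = all(committed[:excepted])
    younger_none_committed = not any(committed[excepted + 1:])
    # The excepted instruction itself may or may not have finished,
    # so its own flag is deliberately ignored.
    return older_all_committed and younger_none_committed
```

For example, `is_precise([True, True, False, False], 2)` holds, while a state in which a younger instruction has already committed does not.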
Software-Managed TLBs Vs Hardware-Managed TLBs
- Hardware managed TLBs outperform software managed TLBs
- Software managed used because of flexibility
- Software managed TLBs
  - E.g. MIPS, Alpha, SPARC, PA-RISC
- Hardware managed TLBs
  - E.g. IA-32, PowerPC
Disadvantages…
- Two sources of performance loss:
  - No user code executes while exception is handled
  - Instructions are re-fetched and re-executed
- Solution: avoid pipeline & ROB flushes
- Why is the pipeline flushed?
  - Ensure privileges? Attach a privilege bit to ROB entry (Henry)
  - Have enough space for interrupt handler?
Interrupt In-lining – An Example
Assuming length of handler LHC = 6, instruction fetch/retire per cycle = 2
[Figure: ROB snapshot with head pointer, tail pointer, and old tail pointer. Legend: empty ROB slots; waiting to be issued; had TLB interrupt; finished execution; had MISC interrupt; privilege bit set.]
In-lining TLB Interrupts
- First level handler length is short
- TLB miss handlers are the most commonly executed OS primitives
- TLB miss handling accounts for more than 40% of total run time and 80% of the kernel's computation time
- TLB miss interrupts occur once every 100–1000 instructions in applications ranging from databases to engineering workloads
Issues With Interrupt In-lining
- In-lined instructions shouldn’t affect state of user registers
- Problem w/conventional method of register renaming
  - First handler instruction should receive a mapping of the current state of register file
  - A user instruction should receive mapping of the previous user instruction mapped
Alpha 21264 Pipeline
Fetch, Decode & Map, Execute, Write back, Retire