iGPU: Exception Support and Speculative Execution on GPUs
Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam
Vertical Research Group, University of Wisconsin-Madison. Presented at ISCA 2012.
Department of Computer Science
Executive Summary
- Compiler/hardware co-design for efficient, general-purpose GPUs
- Exception support with 1.5% overhead (no more than 4%)
- Context switch (demand paging) support with 2.5% overhead (no more than 4%)
- Exploiting speculation provides > 10% energy savings
Outline
- Motivation and Background
- iGPU Mechanisms
  - General exception handling
  - Context switching
  - Speculation support
- iGPU Architecture
  - Software
  - Hardware
- Evaluation
- Conclusion
CPU Evolution Retrospective
- IBM 360 era: precise exceptions as a performance tradeoff
- However, two key shifts in processor design:
  - Virtual memory no longer optional
  - Speculative execution on ILP processors
Precise exception handling and speculation were key enablers for modern CPUs
GPU Architectural Trends
- Significant interest in supporting demand paging
  - A single unified CPU-GPU address space
- Emerging necessity for supporting speculation
  - More workloads: “irregular” workloads
  - Handling reliability problems
Need general purpose exception and speculation support for GPUs
Why not just borrow CPU ideas?
- CPUs use buffering to preserve architectural state
  - Future file, history file, re-order buffer, …
- But GPUs have 1000x as many registers
  - Not practical!
Fundamental Challenges
1. Well-defined restart point in program
  - GPU pipeline and SIMT model make this hard
2. Preserving architectural state prior to restart
  - Need to save 1000s of registers
Key Ideas of our Solution
1. Well-defined restart point in program (creation of restart points)
  - Idempotent code regions: restartable regions producing the same effect
2. Preserving architectural state prior to restart (preservation of necessary state)
  - Regions constructed with small live state: 1 to 3 registers
  - Save only this live state
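To make the notion of an idempotent region concrete, here is a minimal Python sketch. It is not taken from the talk: the dictionary standing in for memory, the `Fault` exception and all function names are illustrative assumptions. A region behaves idempotently if re-running it from its entry after a partial first execution gives the same result as a clean run; overwriting one of the region's own inputs breaks that property.

```python
# Illustrative model only: "memory" is a dict and a "region" is a function.

class Fault(Exception):
    """Stands in for an exception (e.g. a page fault) raised mid-region."""


def idempotent_region(mem, fail=False):
    # Reads x and y and writes only z: the inputs are never overwritten.
    t = mem["x"] + mem["y"]
    if fail:
        raise Fault()
    mem["z"] = t * 2


def non_idempotent_region(mem, fail=False):
    # Overwrites its own input x before the region has finished.
    mem["x"] = mem["x"] + mem["y"]
    if fail:
        raise Fault()
    mem["z"] = mem["x"] * 2


def run_clean(region, mem):
    """Run a region once with no fault."""
    region(mem, fail=False)
    return mem["z"]


def run_with_restart(region, mem):
    """Run a region; on a fault, simply re-execute it from its entry."""
    try:
        region(mem, fail=True)   # first attempt faults part-way through
    except Fault:
        region(mem, fail=False)  # restart from the region entry
    return mem["z"]


if __name__ == "__main__":
    start = {"x": 1, "y": 2}
    # prints "6 6": restart gives the same result as a clean run
    print(run_clean(idempotent_region, dict(start)),
          run_with_restart(idempotent_region, dict(start)))
    # prints "6 10": the clobbered input makes the restart diverge
    print(run_clean(non_idempotent_region, dict(start)),
          run_with_restart(non_idempotent_region, dict(start)))
```

In this toy model no state needs to be saved for the first region, because everything it needs at restart is still intact.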
Exception Support
- Idempotent regions mark restart points
- Register file provides all the required state!
- Idempotence guarantees correctness
Implicit checkpoints using idempotence
[Figure: region A, region B, exception handler, then region B again (creation idea)]
Context Switch
Exception is a page fault. Page-fault handling:
1. Cleanly remove process 1 ?
2. Start another process and execute
3. Get page from disk concurrently
4. Restore process 1 ?
5. Restart process 1
[Figure: regions A and B, with B marked "?"]
Context Switch
- Must save and restore architectural state
- But GPUs have megabytes of register state
  - Save only live state
  - Save state at points of minimal live state
Implicit minimum-live-state checkpoints using idempotence
[Figure: regions A and B annotated with the number of live registers at each candidate cut point; region B, entered with 2 live registers, runs again after the exception handler (preserve idea)]
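The idea of cutting regions where few registers are live can be sketched with a standard backward liveness pass over straight-line code. This is only an illustration under stated assumptions: the instruction encoding, the register names and the helpers `live_before` and `best_cut` are hypothetical, and the real compiler additionally restricts cut points so that the resulting regions are idempotent.

```python
def live_before(instrs, live_out=()):
    """Backward liveness for straight-line code.

    instrs is a list of (dest, sources) pairs; dest may be None for an
    instruction that writes no register (e.g. a store). Returns a list
    where entry i is the set of registers live just before instruction i.
    """
    live = set(live_out)
    result = [None] * len(instrs)
    for i in range(len(instrs) - 1, -1, -1):
        dest, srcs = instrs[i]
        if dest is not None:
            live.discard(dest)   # dest is redefined here, so not live above
        live.update(srcs)        # sources must be live on entry
        result[i] = set(live)
    return result


def best_cut(instrs, candidates, live_out=()):
    """Pick the candidate cut point with the fewest live registers."""
    live = live_before(instrs, live_out)
    return min(candidates, key=lambda i: (len(live[i]), i))


# Hypothetical six-instruction kernel fragment.
PROGRAM = [
    ("r1", ["r0"]),        # 0: r1 = load r0
    ("r2", ["r0"]),        # 1: r2 = load r0 + 4
    ("r3", ["r1", "r2"]),  # 2: r3 = r1 + r2
    ("r4", ["r3"]),        # 3: r4 = r3 * r3
    ("r5", ["r4", "r3"]),  # 4: r5 = r4 + r3
    (None, ["r5"]),        # 5: store r5
]

if __name__ == "__main__":
    print([len(s) for s in live_before(PROGRAM)])  # [1, 2, 2, 1, 2, 1]
    print(best_cut(PROGRAM, candidates=[1, 2, 3, 4]))  # 3
```

Cutting before instruction 3 means only `r3` would have to be preserved across the region boundary in this example.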
Speculation
- Speculation generates state that is wrong
  - Need even more buffers
  - Recall: buffers are impractical for GPUs
- Use idempotence!
  - Reduce re-execution cost by sub-dividing regions
Implicit checkpoints with low re-execution overhead using idempotence (tuning the creation idea)
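Why sub-dividing regions lowers re-execution cost can be shown with a small calculation. The model below is an assumption made for illustration, not the talk's evaluation method: a misspeculation is detected after one instruction chosen uniformly at random, and execution restarts at the entry of the enclosing sub-region. The function name `mean_reexecuted` is hypothetical.

```python
from fractions import Fraction


def mean_reexecuted(region_len, sub_len):
    """Average number of instructions re-executed per misspeculation.

    The region has region_len instructions and is cut into sub-regions of
    sub_len instructions. If the misspeculation is detected after
    instruction i, everything since the enclosing sub-region's entry
    (i % sub_len + 1 instructions) has to run again.
    """
    total = sum((i % sub_len) + 1 for i in range(region_len))
    return Fraction(total, region_len)


if __name__ == "__main__":
    print(mean_reexecuted(12, 12))  # 13/2: one undivided 12-instruction region
    print(mean_reexecuted(12, 4))   # 5/2: three sub-regions of 4 instructions
```

Smaller sub-regions reduce the wasted work per misspeculation, but each extra boundary is another restart point whose live state has to be preserved, so the region size is a tuning choice.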
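Speculation
[Figure: region A, then region B (# live registers: 2) sub-divided into B1 and B2, then region C; a misspeculation is marked, after which B2 and C appear again]
* Region construction details: Idempotent Processing, PLDI ‘12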
iGPU Architecture
Compiler
Hardware
Application
iGPU Architecture - Software
- Form regions (creation idea)
  - Region formation: region marker instructions
- Preserve state (preserve idea)
  - State preservation: register re-assignment, moves and spills
  - Register pressure
iGPU Architecture - Software
- Baseline flow: Kernel Source Code, Source Code Compiler, Device Code Generator, Device Code
- iGPU flow: region formation and state preservation are added, and the Device Code Generator emits Idempotent Device Code
iGPU Architecture - Hardware
[Figure (not to scale): several cores sharing an L2 cache; each core contains a fetch unit, decode, a SIMD processor, general-purpose registers, an L1 cache & TLB, and RPCs (creation idea)]
iGPU Architecture - Hardware
[Figure (to scale): general-purpose registers next to the restart PC (RPC) registers]
- 2 RPCs per warp, one each for Sparse and Short regions
- Compare to 1024 GPRs per warp (32 x 32)
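The size comparison on this slide is simple arithmetic, spelled out below. Reading "32 x 32" as 32 threads per warp times 32 registers per thread is an assumption; the slide gives only the product. The constant names are illustrative.

```python
# Assumption: "32 x 32" means 32 threads per warp x 32 registers per thread.
THREADS_PER_WARP = 32
GPRS_PER_THREAD = 32
RPCS_PER_WARP = 2  # from the slide: one each for Sparse and Short regions

gprs_per_warp = THREADS_PER_WARP * GPRS_PER_THREAD
rpc_fraction = RPCS_PER_WARP / gprs_per_warp

if __name__ == "__main__":
    print(gprs_per_warp)            # 1024
    print(f"{rpc_fraction:.2%}")    # 0.20%
```

Counted in registers, the added restart state is about a fifth of a percent of the per-warp register count.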
iGPU Architecture - Hardware
- State preservation handled purely by the compiler! (preserve idea)
- Not the hardware's responsibility
Evaluation
[Figure: iGPU Runtime Overhead. Y-axis: % overhead, 0 to 4.5. X-axis: AES, BFS, BSC, CP, DG, LIB, LPS, MUM, NN, NQU, RAY, STO, WP, GeoMean. Series: Region Creation; Context Switch and Speculation support overhead.]
Evaluation – Voltage Speculation
[Figure: Energy Savings on iGPU with Voltage Emergency Prediction. Y-axis: % energy savings, 0 to 20. X-axis: AES, BFS, BSC, CP, DG, LIB, LPS, MUM, NN, NQU, RAY, STO, WP, GeoMean.]
Vdd reduction: 10%. Error rate: 0.01%.
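A rough back-of-the-envelope model shows how these two parameters interact. Everything beyond the two slide values is an assumption: dynamic energy per instruction is taken to scale with the square of Vdd, the error rate is treated as a per-instruction probability, and each error is charged a hypothetical re-execution cost of 100 instructions. Static energy and other overheads are ignored, so this is not expected to reproduce the measured results.

```python
def energy_saving(vdd_reduction, error_rate, reexec_cost):
    """Idealized fractional energy saving from running at reduced Vdd.

    vdd_reduction: fractional voltage reduction (0.10 for 10%)
    error_rate:    errors per instruction (assumed interpretation)
    reexec_cost:   instructions re-executed per error (hypothetical)
    """
    energy_per_instr = (1.0 - vdd_reduction) ** 2  # assumes E proportional to V^2
    relative_work = 1.0 + error_rate * reexec_cost  # extra re-executed work
    return 1.0 - energy_per_instr * relative_work


if __name__ == "__main__":
    # Slide values: Vdd reduction 10%, error rate 0.01%.
    saving = energy_saving(0.10, 0.0001, reexec_cost=100)
    print(f"{saving:.1%}")  # 18.2%
```

With errors this rare, the re-execution term barely dents the saving from the lower voltage in this model, which is why cheap restart matters more than avoiding every error.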
Conclusions
- Exception support for GPUs is practical
  - Enables better integration with CPUs in CPU-GPU architectures
- Speculative execution on GPUs
  - Both for performance and reliability
  - Presents interesting possibilities in the context of “irregular” workloads
Questions