UW-Madison Computer Sciences Vertical Research Group © 2010
Relax: An Architectural Framework for Software Recovery of Hardware Faults
Marc de Kruijf, Shuou Nomura, Karthikeyan Sankaralingam
ISCA 2010 - 3
Executive Summary
Problem: technology is driving simple hardware, but fault recovery requires complex hardware
Software recovery: enables simple hardware and high energy efficiency
Relax: An Architectural Framework for Software Recovery
ISA: a well-defined interface for software recovery
Software: support to use the ISA
Hardware: support to implement the ISA
Applications trend: data-intensive, error-tolerant applications (search, computer vision, data mining, media processing, scientific computing, …)
Architecture trend: energy efficiency, hardware simplification
CMOS trend: device variability, wear-out, soft errors
Hardware recovery or software recovery for data-intensive, error-tolerant applications?
Hardware recovery: inefficient; no flexibility; conservative checkpoints
Software recovery: efficient; error tolerance; natural recovery points
Recovery Support Is Needed
Simple hardware: no speculative state
Complex hardware: speculative state
?
Figure: application / ISA / simple hardware layers (labels: error tolerance, software-defined recovery, flexibility, simplicity, energy efficiency)
Software defines the recovery handler.
Hardware detects a fault and jumps to the handler, and it is allowed to commit corrupted state.*
    rlx RECOVER
    ...
RECOVER:
    ...
*Details in paper
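As a rough software analogy of this contract (not the Relax hardware itself), the C sketch below uses `setjmp` to record the recovery point, standing in for the recovery PC that `rlx RECOVER` sets, and `longjmp` to play the part of the hardware jumping to the handler. The fault detector, the fault budget, and the names `fault_detected`, `region_body`, and `run_with_retry` are hypothetical; faults are simulated rather than detected, and the handler here always retries.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Stands in for the recovery PC register set by "rlx RECOVER" (analogy only). */
static jmp_buf recovery_point;
/* Hypothetical fault-injection budget: how many faults the "detector" reports. */
static int faults_to_inject;

/* Hypothetical detector: reports a fault while the budget is nonzero. */
static bool fault_detected(void) {
    if (faults_to_inject > 0) {
        --faults_to_inject;
        return true;
    }
    return false;
}

/* Body of a relax region; it may be abandoned part-way through. */
static int region_body(int x) {
    int result = x * 2;             /* work that a fault may have corrupted */
    if (fault_detected())
        longjmp(recovery_point, 1); /* "hardware" transfers control to RECOVER */
    return result;
}

/* Runs the region with a retry handler; reports how many attempts it took. */
int run_with_retry(int x, int injected_faults, int *attempts) {
    volatile int tries = 0;         /* volatile: modified between setjmp and longjmp */
    faults_to_inject = injected_faults;
    setjmp(recovery_point);         /* entry and RECOVER both resume here */
    ++tries;                        /* RECOVER: jmp ENTRY ("retry") */
    int r = region_body(x);
    *attempts = tries;
    return r;
}
```

With three injected faults, `run_with_retry(21, 3, &n)` still returns 42, with `n == 4` attempts; the price of software recovery is re-execution time rather than hardware checkpoint state.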
-- WARNING -- SOURCE CODE AHEAD
Software
#include <stdlib.h>   /* abs */

int sad(int *left, int *right, int len) {
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        sum += abs(left[i] - right[i]);
    }
    return sum;
}
SAD (Sum of Absolute Differences) example (adapted from an H.264 video encoder)
SAD with a relax region. relax and recover are the Relax language extensions (language support implemented using LLVM), not standard C.

#include <limits.h>   /* INT_MAX, used by the "discard" handler */
#include <stdlib.h>   /* abs */

int sad(int *left, int *right, int len) {
    relax {
        int sum = 0;
        for (int i = 0; i < len; ++i) {
            sum += abs(left[i] - right[i]);
        }
        return sum;
    } recover { retry; }   /* "retry": re-execute the region */
}

"Discard" handler instead: } recover { return INT_MAX; }

Compiled code (pseudo-assembly):
ENTRY:
    rlx RECOVER                      # Relax on
    mv 0 -> $sum
    ble $len, 0, EXIT
LOOP_PREHEADER:
    mv 0 -> $i
LOOP:
    ld [$left + $i * 4] -> $tmp1
    ld [$right + $i * 4] -> $tmp2
    abs $tmp1, $tmp2 -> $tmp3        # |$tmp1 - $tmp2|
    add $sum, $tmp3 -> $sum
    add $i, 1 -> $i
    blt $i, $len, LOOP
EXIT:
    rlx 0                            # Relax off
    ret $sum
RECOVER:
    jmp ENTRY                        # "retry"
    # or, for "discard": ret 0x7FFFFFFF (INT_MAX)
Figure panels: raw, encoded
1. No writes to memory
2. Idempotent
3. Recoverable by re-execution
SIMPLE + INTUITIVE + FLEXIBLE
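The retry and discard handlers of the SAD example can be mimicked in standard C to show why the three region properties matter. This is a sketch under stated assumptions: the fault source (`pending_faults`, `region_faulted`) is hypothetical, a fault is reported only after the region finishes rather than mid-region, and `sad_relaxed` is an illustrative name, not what the Relax compiler generates. Retry is safe here only because `sad_region` writes nothing but local variables.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* The two recovery behaviors from the slide. */
typedef enum { POLICY_RETRY, POLICY_DISCARD } recovery_policy;

/* Hypothetical fault source: the next `pending_faults` executions are faulty. */
static int pending_faults = 0;

static bool region_faulted(void) {
    if (pending_faults > 0) {
        --pending_faults;
        return true;
    }
    return false;
}

/* The relax region: reads left/right and writes only locals, so it is idempotent. */
static int sad_region(const int *left, const int *right, int len) {
    int sum = 0;
    for (int i = 0; i < len; ++i)
        sum += abs(left[i] - right[i]);
    return sum;
}

/* SAD with a software recovery policy applied whenever a fault is reported. */
int sad_relaxed(const int *left, const int *right, int len,
                recovery_policy policy, int faults) {
    pending_faults = faults;
    for (;;) {
        int sum = sad_region(left, right, len);
        if (!region_faulted())
            return sum;     /* no fault detected: keep the result */
        if (policy == POLICY_DISCARD)
            return INT_MAX; /* recover { return INT_MAX; } */
        /* recover { retry; }: loop and re-execute; safe since no memory was written */
    }
}
```

For example, with `left = {1, 5, 3}` and `right = {4, 2, 3}`, retry returns 6 however many faults are injected, while discard returns `INT_MAX` as soon as one fault is reported; an encoder can treat that sentinel as "worst match" and move on.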
Hardware
Microarchitecture:
1. Fine-grained hardware detection (e.g., Argus)
2. Recovery PC register + control logic
SIMPLE MICROARCHITECTURE
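A minimal model of what the recovery PC register and its control logic have to do, under assumptions: the struct and function names are hypothetical, a recovery PC of 0 means Relax is off (matching `rlx 0`), and the sketch leaves the recovery PC unchanged when it redirects execution, since the slide does not say whether hardware clears it on that transfer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-core state added by Relax in this model (names are hypothetical). */
typedef struct {
    uint32_t pc;          /* current program counter */
    uint32_t recovery_pc; /* set by "rlx RECOVER"; 0 means Relax is off */
} core_state;

/* "rlx target" turns Relax on with a handler address; "rlx 0" turns it off. */
void exec_rlx(core_state *core, uint32_t target) {
    core->recovery_pc = target;
}

/* Next-PC selection after the detector reports on an instruction.
   Returns false for a fault outside any relax region, which this model
   assumes some other mechanism must handle. */
bool next_pc(core_state *core, uint32_t fallthrough_pc, bool fault_detected) {
    if (!fault_detected) {
        core->pc = fallthrough_pc;    /* normal sequential execution */
        return true;
    }
    if (core->recovery_pc != 0) {
        core->pc = core->recovery_pc; /* jump to the software recovery handler */
        return true;
    }
    return false;                     /* unrecoverable in this model */
}
```

The point of the model is its size: one register and a multiplexer on the next PC, with no checkpoint buffers or speculative state.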
Hardware Organization
"Relaxed" cores: no hardware recovery
Normal cores: hardware recovery
Homogeneous Relax: all cores without hardware recovery support
Statically Heterogeneous Relax: some cores with hardware recovery, some without
Dynamically Heterogeneous Relax: hardware recovery adaptively disabled
FLEXIBLE DESIGN
Is it Useful?
Application             Percent of execution time spent in the function
BarnesHut (Lonestar)    >99.9%
bodytrack (PARSEC)      21.9%
canneal (PARSEC)        89.4%
ferret (PARSEC)         15.7%
kmeans (MineBench)      83.3%
raytrace (PARSEC)       49.4%
x264 (PARSEC)           49.2%
Language support using LLVM
One relax region per application (the most dominant function)
Retry and discard behavior
7 applications
IT WORKS!
Methodology
Instruction-level fault injection
Execution time model: Statically Heterogeneous architecture
Energy model: energy-delay product (EDP); analytical model for hardware efficiency
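Both modeling ingredients fit in a few lines of C. This is a sketch, not the paper's injector or energy model, and it assumes one instruction per cycle (so the per-cycle error rate is applied per instruction) and a fault model of a single flipped bit in a 32-bit instruction result. The xorshift32 generator and the names `maybe_inject_fault` and `normalized_edp` are illustrative choices. Normalized EDP divides the Relax energy-delay product by the baseline's, so values below 1.0 are improvements.

```c
#include <stdint.h>

/* xorshift32 PRNG: small and deterministic, so injection runs are repeatable. */
static uint32_t rng_state = 2463534242u;

static uint32_t xorshift32(void) {
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* With probability error_rate, flip one randomly chosen bit of an
   instruction's result (a single-bit fault). */
uint32_t maybe_inject_fault(uint32_t result, double error_rate) {
    double u = xorshift32() / 4294967296.0;   /* uniform in [0, 1) */
    if (u < error_rate)
        result ^= UINT32_C(1) << (xorshift32() % 32);
    return result;
}

/* Energy-delay product, EDP = E * D, in any consistent units. */
double edp(double energy, double delay) {
    return energy * delay;
}

/* Relax EDP relative to a baseline: (E_relax * D_relax) / (E_base * D_base). */
double normalized_edp(double energy_relax, double delay_relax,
                      double energy_base, double delay_base) {
    return edp(energy_relax, delay_relax) / edp(energy_base, delay_base);
}
```

For instance, hypothetical inputs of 0.75x baseline energy and 1.25x baseline delay give a normalized EDP of 0.9375: a small slowdown can still be a net win if the energy saving is large enough.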
Results – Execution Time
Figure: execution time per application (barneshut, bodytrack, canneal, ferret, kmeans, raytrace, x264), retry vs. discard; y axis 0 to 1.2
*Error rates range from 10^-3 to 10^-6 errors/cycle
Execution time overhead is below 10%, and typically around 1%
Discard performance is comparable to retry
Results – Energy-delay
Figure: normalized EDP per application (barneshut, bodytrack, canneal, ferret, kmeans, raytrace, x264), retry vs. discard; y axis -0.2 to 1.2
*Error rates range from 10^-3 to 10^-6 errors/cycle
Relax achieves energy improvements for timing speculation
Future Work
Better software support: compiler automation? Binary instrumentation? Nesting relax blocks?
Hardware support: what are the chip-level area and power savings? Is Relax hardware truly simpler?
Other domains: software rollback for hardware transactional memory?
Tools to assist analysis of "discard": discard is hard to reason about and non-deterministic
Summary
Emerging architectures: many-core architectures are simple; hardware fault recovery is complex
Emerging applications: error tolerant; large idempotent regions
Software recovery is a natural fit
Relax: an architectural framework for software recovery
ISA: an interface to define it
Software: support for applications to use it
Hardware: hardware that enables it
ISA Semantics
Errors must be "spatially contained" to the target resources of a relax block: misdirected stores and misdirected register writes are not recoverable by Relax!
Errors must be "temporally contained" to the scope of a relax block: ECC (or another technique) is necessary for memory, and cache coherence, cache writeback, etc. require other mechanisms.
Control flow must be "legal" (follow static control-flow edges). This includes hardware exceptions: a trap must wait on fault detection.
Atomic operations (e.g., atomic increment) are problematic and are not supported (sorry).
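One way to see why atomic increments, and read-modify-write regions in general, conflict with recovery by re-execution: retrying such a region applies its update twice. The sketch below, with hypothetical names and a plain global standing in for shared memory, contrasts that with an idempotent region that overwrites memory with a value computed only from its inputs. It illustrates the idempotence requirement and is not a description of how Relax itself detects or rejects such code.

```c
/* Memory location updated by the regions below (illustrative stand-in). */
static int counter = 0;

/* Non-idempotent: reads and writes the same location, like an atomic increment. */
static void increment_region(void) {
    counter = counter + 1;
}

/* Idempotent: the value written depends only on the input, not on counter. */
static void assign_region(int start) {
    counter = start + 1;
}

/* Executes a region once, then re-executes it `retries` more times, as a
   retry handler would after detected faults. Returns the final memory value. */
int run_region(int start, int retries, int idempotent) {
    counter = start;
    for (int attempt = 0; attempt <= retries; ++attempt) {
        if (idempotent)
            assign_region(start);
        else
            increment_region();
    }
    return counter;
}
```

With two retries starting from 0, the increment region leaves 3 in memory instead of 1, while the idempotent region leaves 1 regardless of how often it runs.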
Fault Detection
Short latencies are important for detecting misdirected stores and misdirected register writes.
Otherwise, the tolerable latency depends on region size: 50-cycle regions + 5-cycle latency = 10% overhead.
Average region sizes in the paper are 1000 cycles, so a 10-cycle latency = 1% overhead.
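Both overhead figures follow if the detection latency is treated as a fixed cost paid once per relax region, so overhead is roughly latency divided by region length. That per-region cost model is an assumption read off the slide's numbers; the helper name below is hypothetical.

```c
/* Approximate overhead of waiting for fault detection: assuming each region
   pays the detection latency once, overhead ~= latency / region length. */
double detection_overhead(double latency_cycles, double region_cycles) {
    return latency_cycles / region_cycles;
}

/* Slide examples: detection_overhead(5, 50) gives 0.10 (10%),
   detection_overhead(10, 1000) gives 0.01 (1%). */
```

The same ratio explains why larger relax regions make slower, simpler detectors acceptable.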