1
A Cost-effective Substantial-impact-filter Based Method to Tolerate Voltage Emergencies
Songjun Pan1,2, Yu Hu1, Xing Hu1,2, and Xiaowei Li1
1Key Laboratory of Computer System and Architecture
Institute of Computing Technology
Chinese Academy of Sciences
2Graduate University of Chinese Academy of Sciences
2
Outline
• Background and Motivation
• Voltage Emergency Analysis
• Substantial-impact-filter Based Method
• Experimental Results
• Conclusions
3
Background
• Shrinking feature sizes are affecting transistor behavior
[Figure: variations divided into static and dynamic categories; the sources shown are process, temperature, and voltage]
4
Voltage Emergencies
• Voltage emergencies (VEs)
  • Slow down logic operations
  • Cause timing violations and affect system reliability
• Traditional tolerance techniques
  • Set a conservative timing margin [2]
  • Trigger a program rollback when a VE occurs [5]
[Figure: supply voltage over time, marking the nominal voltage, the operating margin, and voltage emergencies below the threshold Vth]
[2] N. James et al., “Comparison of Split- Versus Connected-Core Supplies in the POWER6™ Microprocessor,” in ISSCC, 2007.
[5] M. Gupta et al., “DéCoR: A Delayed Commit and Rollback Mechanism for Handling Inductive Noise in Processors,” in HPCA, 2008.
5
Motivation
• Key observation: not all voltage emergencies will affect program execution
• Basic idea: handle only the voltage emergencies that adversely affect program execution, i.e., those with a substantial impact
6
Voltage Emergency Analysis
• Voltage emergencies → intermittent timing faults
• Substantial-impact faults:
  • Propagate to storage cells
  • Change architecturally correct execution (ACE) bits
[Figure: timing diagram over cycles 1 to N. A VE-induced timing violation appears, persists for several cycles, and then disappears; while it persists, the clock edge captures wrong data (D_wrong instead of D_correct), which shows up as an error.]
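The timing diagram can be mimicked with a toy model. This is an illustration under stated assumptions, not the authors' simulator: it assumes per-cycle path delays are already known (the delay values and the names `latch_trace` and `CycleResult` are hypothetical) and treats any cycle whose path delay exceeds the clock period as one in which the storage cell latches D_wrong instead of D_correct.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CycleResult:
    cycle: int       # 1-based cycle index, as in the diagram
    violated: bool   # did the path miss the clock edge in this cycle?
    captured: int    # value latched by the storage cell


def latch_trace(path_delays_ns: Sequence[float], clock_period_ns: float,
                d_correct: int, d_wrong: int) -> List[CycleResult]:
    """Hypothetical model of an intermittent (VE-induced) timing fault.

    A cycle whose path delay exceeds the clock period is a timing violation,
    so the storage cell captures d_wrong; otherwise it captures d_correct.
    """
    results = []
    for cycle, delay in enumerate(path_delays_ns, start=1):
        violated = delay > clock_period_ns
        results.append(CycleResult(cycle, violated, d_wrong if violated else d_correct))
    return results


# Hypothetical trace: a voltage droop slows the path in cycles 2 and 3 only.
trace = latch_trace([0.90, 1.10, 1.20, 0.95], clock_period_ns=1.0,
                    d_correct=1, d_wrong=0)
for r in trace:
    print(r.cycle, "violation" if r.violated else "ok", r.captured)
```

The violation appears and disappears without any intervention, which is what makes VE-induced faults intermittent; whether the wrong value matters is decided afterwards, by whether it reaches storage and changes ACE bits.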
7
Voltage Emergency Analysis
• Quantitative analysis
  • IVF: Intermittent Vulnerability Factor, extended from [12]
  • Percentage of substantial-impact VEs in different structures

$$\mathrm{IVF}_{itf} = \frac{P_{num} - (N_{dead} + N_{un\text{-}ACE})}{NUM_{total}}$$

• NUM_total: total number of VEs
• P_num: number of VEs that propagate to storage structures
• N_dead: number of propagated VEs that affect dead values (masked)
• N_un-ACE: number of propagated VEs that do not change ACE bits (masked)
[12] S. Pan, Y. Hu, and X. Li, “IVF: Characterizing the Vulnerability of Microprocessor Structures to Intermittent Faults,” in DATE, 2010.
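The IVF formula can be evaluated either directly from counts or from a per-VE classification. In this sketch the outcome categories and function names are hypothetical, and it assumes each propagated VE falls into exactly one of dead, un-ACE, or substantial, so that N_dead and N_un-ACE are disjoint and the subtraction does not double count.

```python
from collections import Counter
from enum import Enum, auto
from typing import Iterable


class VEOutcome(Enum):
    # Hypothetical, mutually exclusive outcome buckets for one VE.
    NOT_PROPAGATED = auto()  # fault never reaches a storage cell
    DEAD = auto()            # reaches storage but only affects a dead value
    UN_ACE = auto()          # reaches storage but changes no ACE bit
    SUBSTANTIAL = auto()     # reaches storage and changes ACE bits


def ivf_itf(p_num: int, n_dead: int, n_un_ace: int, num_total: int) -> float:
    """IVF_itf = (P_num - (N_dead + N_un-ACE)) / NUM_total."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    if not 0 <= n_dead + n_un_ace <= p_num <= num_total:
        raise ValueError("counts must satisfy N_dead + N_un-ACE <= P_num <= NUM_total")
    return (p_num - (n_dead + n_un_ace)) / num_total


def ivf_from_outcomes(outcomes: Iterable[VEOutcome]) -> float:
    outcomes = list(outcomes)
    counts = Counter(outcomes)  # missing buckets count as 0
    p_num = len(outcomes) - counts[VEOutcome.NOT_PROPAGATED]
    return ivf_itf(p_num, counts[VEOutcome.DEAD], counts[VEOutcome.UN_ACE], len(outcomes))


# Hypothetical sample: 10 VEs, 4 reach storage, 2 of those are masked.
sample = ([VEOutcome.NOT_PROPAGATED] * 6 + [VEOutcome.DEAD, VEOutcome.UN_ACE]
          + [VEOutcome.SUBSTANTIAL] * 2)
print(ivf_from_outcomes(sample))  # 0.2
```

Because every VE lands in exactly one bucket here, the result equals the share of SUBSTANTIAL outcomes, i.e. the percentage of substantial-impact VEs the slide describes.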
8
Substantial-impact-filter Based Method
• Floorplan of our method
  • Delay sensor: does a VE occur?
  • Fault filter: is it a substantial-impact VE?
[Figure: processor floorplan with REG, LSQ, ROB, ALU, IQ, IL1, DL1, DL2, and TLB, plus a timing-insensitive zone; signals labeled timing violation, error, and rollback connect the delay sensor, the fault filter, and the rollback controller]
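The two questions in the floorplan can be read as a per-cycle gate in front of the rollback controller. A minimal sketch, assuming the delay sensor and fault filter can be modeled as boolean predicates over the cycle number; `RollbackController`, `delay_sensor`, and `fault_filter` are hypothetical names, not the authors' interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RollbackController:
    # Hypothetical predicates standing in for the hardware blocks.
    delay_sensor: Callable[[int], bool]  # "does a VE occur in this cycle?"
    fault_filter: Callable[[int], bool]  # "is it a substantial-impact VE?"
    rollbacks: List[int] = field(default_factory=list)

    def step(self, cycle: int) -> bool:
        """Return True (and record the cycle) when a rollback is issued."""
        # The filter is only consulted after the sensor fires (short-circuit `and`).
        if self.delay_sensor(cycle) and self.fault_filter(cycle):
            self.rollbacks.append(cycle)
            return True
        return False


# Hypothetical example: VEs in cycles 3, 7 and 9; only the one in cycle 7 matters.
ve_cycles = {3, 7, 9}
substantial_cycles = {7}
ctrl = RollbackController(lambda c: c in ve_cycles, lambda c: c in substantial_cycles)
for c in range(1, 11):
    ctrl.step(c)
print(ctrl.rollbacks)  # [7]
```

Passing `fault_filter=lambda c: True` turns the same controller into a once-occur-then-rollback scheme, which would roll back in all three VE cycles.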
9
Fault Filter
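[Figure: filter circuit. Inputs data_in1 … data_inn of the structure under analysis pass through master/slave DFF pairs clocked by clk; filters (filter1, …) use a delayed clock clk_delay adjusted by tuning bits; other labels include write enables (w_e1, w_e2, w_en), reset, data_out, r1, r2, timing violation, and rollback.]
• Filter structure
• Architecture-level masking
[Figure: under a VE, two sampled values (E) separated by Δt = 1/2 cycle are compared; if they differ (≠) on a WRITE, a ROLLBACK is triggered]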
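The architecture-level masking idea can be expressed behaviorally. This is a hedged reading of the figure, not a description of the actual circuit: it assumes each filter re-samples its input with clk_delay (Δt = 1/2 cycle after clk), treats a mismatch between the two samples as a timing violation, and requests a rollback only when the corresponding write enable is asserted, since a wrong value that is never written does not reach the structure. The function names are hypothetical.

```python
from typing import Sequence


def filter_rollback(sample_at_clk: int, sample_at_clk_delay: int,
                    write_enable: bool) -> bool:
    """Hypothetical filter entry: flag a rollback only for a violation that is written."""
    timing_violation = sample_at_clk != sample_at_clk_delay  # data still changing at clk
    return timing_violation and write_enable                  # unwritten values are masked


def structure_rollback(samples_clk: Sequence[int], samples_delayed: Sequence[int],
                       write_enables: Sequence[bool]) -> bool:
    """OR-reduce the per-input filters (data_in1 ... data_inn) of one structure."""
    if not len(samples_clk) == len(samples_delayed) == len(write_enables):
        raise ValueError("all inputs need the same number of entries")
    return any(filter_rollback(a, b, w)
               for a, b, w in zip(samples_clk, samples_delayed, write_enables))


# Input 0 is stable; input 1 changed between clk and clk_delay.
print(structure_rollback([0b1010, 0b0110], [0b1010, 0b0111], [True, False]))  # False: masked
print(structure_rollback([0b1010, 0b0110], [0b1010, 0b0111], [True, True]))   # True: rollback
```

The design choice this illustrates is that the comparison alone detects a timing violation, while the write-enable gating is what filters out VEs without substantial impact.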
10
Experimental Setup
• Wattch: power estimation
• Matlab: power delivery subsystem model (a second-order linear model)
• Synopsys Design Compiler: area overhead analysis
• Alpha-power model [21]: path delay computation
• Workload
  • 16 SPEC2000 benchmarks (10 INT, 6 FP)
  • 100M instructions simulated with SimPoint
[21] T. Sakurai and A. R. Newton, “Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas,” IEEE Journal of Solid-State Circuits, 1990.
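The setup couples a second-order linear power-delivery model with the alpha-power law. The sketch below uses hypothetical parameters (resonance frequency, damping ratio, effective resistance, load step, V_th and α are illustrative values, not the paper's): the supply droop follows an underdamped second-order response to the load current, and the alpha-power proportionality t_d ∝ V_DD / (V_DD - V_th)^α converts the lowest supply voltage into a relative path-delay increase.

```python
import math
from typing import List, Sequence


def simulate_droop(current_a: Sequence[float], dt_s: float, r_eff_ohm: float = 0.002,
                   f_res_hz: float = 100e6, zeta: float = 0.2) -> List[float]:
    """Hypothetical second-order linear power-delivery model.

    The supply droop x(t) in volts obeys
        x'' + 2*zeta*w*x' + w**2 * x = w**2 * r_eff_ohm * i(t),
    an underdamped resonance at f_res_hz whose steady-state droop is r_eff_ohm * i.
    Integrated with semi-implicit Euler, so dt_s must be much smaller than 1/f_res_hz.
    """
    w = 2.0 * math.pi * f_res_hz
    x, v = 0.0, 0.0
    droop = []
    for i in current_a:
        accel = w * w * (r_eff_ohm * i - x) - 2.0 * zeta * w * v
        v += dt_s * accel
        x += dt_s * v
        droop.append(x)
    return droop


def alpha_power_delay_ratio(vdd: float, vdd_nom: float = 1.0, vth: float = 0.3,
                            alpha: float = 1.3) -> float:
    """Delay at vdd relative to vdd_nom, from t_d proportional to vdd / (vdd - vth)**alpha."""
    if min(vdd, vdd_nom) <= vth:
        raise ValueError("supply voltage must exceed vth")

    def t_d(v: float) -> float:
        return v / (v - vth) ** alpha

    return t_d(vdd) / t_d(vdd_nom)


dt = 10e-12                        # 10 ps step
current = [20.0] * 20_000          # hypothetical 20 A load step held for 200 ns
droop = simulate_droop(current, dt)
v_min = 1.0 - max(droop)           # worst-case supply voltage (1.0 V nominal)
slowdown = alpha_power_delay_ratio(v_min)
print(f"min V = {v_min:.3f} V, path delay x{slowdown:.3f}")
```

Because the system is underdamped, the peak droop overshoots the steady-state value r_eff_ohm * i; a critical path whose nominal delay leaves less slack than this slowdown would see a timing violation during the droop.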
11
Experimental Results
• IVF for the load/store queue and the register file
  • Upper-bound IVF (includes N_dead and N_un-ACE): 16.6% and 36.4%
  • Refined IVF (excludes N_dead and N_un-ACE): 14.8% and 31.7%

$$\mathrm{IVF}_{itf} = \frac{P_{num} - (N_{dead} + N_{un\text{-}ACE})}{NUM_{total}}$$
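Assuming the upper-bound IVF is P_num / NUM_total (that is, it counts N_dead and N_un-ACE as vulnerable) and that the two reported pairs are listed in the same order, subtracting the refined formula isolates the share of VEs that reach storage but are masked:

```latex
\begin{aligned}
\mathrm{IVF}_{upper} - \mathrm{IVF}_{itf}
  &= \frac{P_{num}}{NUM_{total}} - \frac{P_{num} - (N_{dead} + N_{un\text{-}ACE})}{NUM_{total}}
   = \frac{N_{dead} + N_{un\text{-}ACE}}{NUM_{total}} \\
16.6\% - 14.8\% &= 1.8\%, \qquad 36.4\% - 31.7\% = 4.7\%
\end{aligned}
```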
12
Experimental Results
• Comparison of three methods
  • Once-occur-then-rollback method
  • DéCoR method [5]
  • Our proposed method
[Figure: performance comparison of the three methods; highlighted value: 57%]
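The once-occur-then-rollback method and the proposed method differ only in which VEs trigger a rollback, so their overheads can be compared on a shared trace. This sketch assumes a fixed, hypothetical rollback penalty and a made-up VE trace; it does not model DéCoR [5], and its output is not the 57% figure reported on the slide.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VoltageEmergency:
    cycle: int
    substantial: bool  # fault-filter verdict for this VE (hypothetical input)


def rollback_overhead(events: Sequence[VoltageEmergency], penalty_cycles: int,
                      filtered: bool) -> int:
    """Cycles lost to rollbacks, assuming each rollback costs penalty_cycles.

    filtered=False: once-occur-then-rollback (every VE rolls back).
    filtered=True:  only substantial-impact VEs roll back.
    """
    rollbacks = sum(1 for e in events if e.substantial or not filtered)
    return rollbacks * penalty_cycles


def recovered_fraction(baseline_loss: int, improved_loss: int) -> float:
    """Share of the baseline's lost cycles that the improved scheme gets back."""
    return 1.0 - improved_loss / baseline_loss if baseline_loss else 0.0


# Made-up trace: 10 VEs, 3 of which the filter judges substantial.
trace = [VoltageEmergency(c, c in (120, 450, 900)) for c in
         (15, 120, 230, 310, 450, 520, 610, 700, 810, 900)]
once_occur = rollback_overhead(trace, penalty_cycles=100, filtered=False)
with_filter = rollback_overhead(trace, penalty_cycles=100, filtered=True)
print(once_occur, with_filter, round(recovered_fraction(once_occur, with_filter), 2))  # 1000 300 0.7
```

Under this cost model the recovered share is simply the fraction of VEs the filter classifies as masked; real overheads also depend on the rollback distance and any filter latency, which the sketch ignores.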
13
Conclusions
• We observe that fewer than 40% of voltage emergencies affect program execution
  • IVF: quantitative analysis
• We propose a substantial-impact-filter based method to tolerate voltage emergencies
  • Structure independent
  • Significantly reduces performance overhead
  • Recovers 57% of the performance loss
14
• Thank You for Your Attention
• Questions?