02.02.20
Sybille Hellebrand
Computer Engineering Group
University of Paderborn, Germany
Are Robust Circuits Really Robust?
2 Erlangen – January 31, 2011
Outline
Motivation
“Robustness Checking”
Self-Checking Circuits – Theory and Practice
Technical Challenges and Solutions
Yield and Quality
“Fault Tolerant Yield”
“Quality Binning”
Conclusions
Electronics are everywhere: more than 80 processors control various functions (ABS, ..., infotainment, ...)
Major Problem so far
“Spot defects”, “random defects” during manufacturing
[Figure: defect causing a short. Source: http://www.icyield.com]
Nanoscale Integration
Potential for integrating highly complex innovative
products into single chip (SoC) or package (SiP)
Problems
Soft errors
Parameter variations
cf. Borkar, IEEE Micro 2005
Soft Errors
Caused by
Alpha particles, cosmic radiation
Measures
SER (Soft Error Rate) given in
FIT (Failure in Time)
1 FIT = 1 failure in 10^9 hours (≈ 114,155 years)
Example
SER for Processor with embedded SRAM is 50,000 FIT
(1 soft error every 2 years)
But: Multiprocessor system with 100 chips has 1 failure per week
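The FIT arithmetic above can be reproduced with a short sketch. The helper name is illustrative, and the 100-chip figure assumes that the failure rates of the individual chips simply add up.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

def mean_hours_between_failures(fit: float) -> float:
    """Convert a FIT rate (failures per 10^9 device hours) to hours per failure."""
    return 1e9 / fit

# 1 FIT: one failure in 10^9 hours
one_fit_years = mean_hours_between_failures(1) / HOURS_PER_YEAR

# Processor with embedded SRAM: 50,000 FIT
chip_years = mean_hours_between_failures(50_000) / HOURS_PER_YEAR

# Assumption: the rates of 100 independent chips add up
system_days = mean_hours_between_failures(100 * 50_000) / 24

print(f"1 FIT      -> one failure every {one_fit_years:,.0f} years")
print(f"50,000 FIT -> one failure every {chip_years:.1f} years")
print(f"100 chips  -> one failure every {system_days:.1f} days")
```

Under these assumptions a single chip fails about every 2.3 years and the 100-chip system about every 8 days, which matches the rounded numbers on the slide.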
SER for Latches/Flipflops in Random Logic
[Figure: normalized soft error rate (10^-5 to 10) over technology in nm (ticks at 1000 and 100); curves: SRAM bit SER, SRAM bit SER with error-correcting code, logic SER (data), logic SER (simulation). Source: Baumann, IEEE Design&Test 2005]
Parameter Variations
Static variations
Systematic
Random
Dynamic variations
Variations over time (aging)
Example: Random Dopant Fluctuations
Threshold voltage Vth
Determined by the concentration of dopant atoms in the channel
Only a few dopant atoms in nanoscale transistors
Law of large numbers is no longer valid, quantum effects must be considered [Borkar, IEEE Micro 2005]
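A small sketch of why a low dopant count matters. It assumes that the number of dopant atoms in the channel is Poisson distributed, so the relative spread of the count is 1/sqrt(N). The atom counts are hypothetical and not taken from the talk.

```python
import math

def relative_spread(mean_dopants: float) -> float:
    """Relative standard deviation of a Poisson-distributed dopant count."""
    return math.sqrt(mean_dopants) / mean_dopants  # equals 1 / sqrt(N)

# Hypothetical mean dopant counts, from large devices down to nanoscale devices
for n in (10_000, 1_000, 100, 10):
    print(f"{n:>6} dopant atoms -> count varies by about {relative_spread(n):.1%}")
```

With 10,000 atoms the count varies by about 1%, with 10 atoms by more than 30%, so the threshold voltage of nominally identical transistors can differ noticeably.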
Dynamic Parameter Variations
“Power density” in a processor chip
Problems
Hot spots
Varying supply voltage
...
[Figure: power density in W/cm2 (0 to 250) over the chip coordinates x and y. Source: Borkar, IEEE Micro 2005]
Consequences
Most parameter variations result in timing variations
Traditional view: nominal or worst-case delay
Now: probability density functions (PDF) for delay
[Figure: example circuit with signals a to g and gate delays of 1 ns and 2 ns]
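The difference between the worst-case view and the statistical view can be sketched with a small Monte Carlo experiment. Everything here is an assumption for illustration: a path of three gates in series with nominal delays of 1 ns, 2 ns and 2 ns, independent Gaussian variations with a sigma of 10% of the nominal delay, and a worst-case corner at nominal plus 3 sigma per gate.

```python
import random
import statistics

NOMINAL_NS = [1.0, 2.0, 2.0]   # hypothetical path: three gates in series
SIGMA_FRACTION = 0.10          # assumed: sigma is 10% of each nominal delay

def worst_case_delay() -> float:
    """Classical corner: every gate is at nominal + 3 sigma simultaneously."""
    return sum(d * (1 + 3 * SIGMA_FRACTION) for d in NOMINAL_NS)

def sample_path_delay(rng: random.Random) -> float:
    """One Monte Carlo sample with independent Gaussian gate delays."""
    return sum(rng.gauss(d, SIGMA_FRACTION * d) for d in NOMINAL_NS)

rng = random.Random(0)  # fixed seed for reproducibility
samples = sorted(sample_path_delay(rng) for _ in range(20_000))

mean = statistics.fmean(samples)
p999 = samples[int(0.999 * len(samples))]   # empirical 99.9% quantile

print(f"worst-case corner : {worst_case_delay():.2f} ns")
print(f"mean of the PDF   : {mean:.2f} ns")
print(f"99.9% quantile    : {p999:.2f} ns")
```

Under these assumptions the 99.9% quantile stays below the worst-case corner, because independent variations rarely reach their extremes at the same time.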
Variation-Aware and Robust Design
Statistical timing analysis
More and more commercial EDA support
Redundancy
Hardware
Time
Information
Algorithmic
Self-calibrating architectures
[Figure: example circuit with signals a to g]
Example: Razor [D. Ernst et al., IEEE Micro, 2004]
Razor – Error Rates
[D. Ernst et al., IEEE Micro, 2004]
Robust Systems
Classical fault tolerant architectures
(Self-checking circuits, TMR, …)
New self-calibrating, self-adaptive solutions
System: robust implementation compensates static and/or dynamic parameter variations and/or soft errors
You're kidding, guys?
Challenges
Design validation/verification must take into account fault tolerance and robustness properties (“robustness checking”)
How much robustness is left after manufacturing?
Fault tolerant yield
Quality binning
Self-Checking Circuits
[Figure: encoded inputs → CUT → encoded outputs → checker → error indication]
An error is detected if and only if it produces an erroneous output outside the output code (non-code word)
[Figure: f maps each x in the input code to f(x) in the output code]
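A minimal sketch of this detection condition, assuming an even-parity output code. The code choice and the example words are illustrative and not taken from the talk.

```python
def is_code_word(word: tuple[int, ...]) -> bool:
    """Assumed output code: all words with even parity."""
    return sum(word) % 2 == 0

def classify(expected: tuple[int, ...], observed: tuple[int, ...]) -> str:
    """Classify an observed output relative to the fault-free output."""
    if observed == expected:
        return "correct"
    # An erroneous output is detected only if it leaves the output code.
    return "undetected error" if is_code_word(observed) else "detected error"

expected = (1, 0, 1)                       # fault-free output, a code word
print(classify(expected, (1, 0, 1)))       # correct
print(classify(expected, (1, 1, 1)))       # detected error (single bit flip)
print(classify(expected, (0, 1, 1)))       # undetected error (double bit flip)
```

The double bit flip maps one code word to another code word, so the checker cannot see it. This is the case that the fault-secure property has to rule out.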
Properties
Goal: totally self-checking (TSC)
Faults must be detected when they produce the first erroneous output
Fault-secure (FS)
Faults are detected or do not propagate to outputs
Self-testing (ST)
Every fault can be detected with at least one input
Avoid fault accumulation
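The FS and ST properties can be checked exhaustively for small circuits. The sketch below is only an illustration: the circuit, the even-parity output code and the three faults are hypothetical, and faults are modelled simply as faulty versions of the circuit function.

```python
from itertools import product

def good(a: int, b: int) -> tuple[int, int, int]:
    """Hypothetical CUT: two data outputs plus a separately computed parity bit."""
    return (a & b, a | b, a ^ b)

def in_output_code(word: tuple[int, int, int]) -> bool:
    return sum(word) % 2 == 0          # even parity over (y0, y1, p)

INPUT_CODE = list(product((0, 1), repeat=2))   # assumed: all inputs are code inputs

# Faults modelled as faulty versions of the circuit function (illustrative only)
FAULTS = {
    "y0 stuck-at-0": lambda a, b: (0, a | b, a ^ b),
    "input a stuck-at-0": lambda a, b: good(0, b),
    "redundant fault": lambda a, b: good(a, b),
}

def fault_secure(faulty) -> bool:
    """FS: every erroneous output under the fault is a non-code word."""
    return all(
        faulty(*x) == good(*x) or not in_output_code(faulty(*x))
        for x in INPUT_CODE
    )

def self_testing(faulty) -> bool:
    """ST: at least one code input produces a non-code output under the fault."""
    return any(not in_output_code(faulty(*x)) for x in INPUT_CODE)

for name, faulty in FAULTS.items():
    print(f"{name:20s} FS={fault_secure(faulty)} ST={self_testing(faulty)}")
```

In this example the output fault is both FS and ST, the input fault is neither (it produces wrong code words), and the redundant fault is FS but not ST. A fault of the last kind can stay in the circuit unnoticed and later combine with a second fault.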
Problem
Design strategies for self-checking circuits are well known
But: synthesis may destroy self-checking properties, e.g. by logic sharing
[Figure: CUT with inputs x, c(x) and output y; prediction of c(y); generation of c(y)’ from y; comparator (=) produces the error indication]
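How logic sharing can break fault security is easy to reproduce on a toy example. The circuit below is hypothetical: it has outputs y0 = a AND b and y1 = c plus a predicted parity bit, and the only faults considered are stuck-at faults on the AND gate output n.

```python
from itertools import product

def parity_ok(y0: int, y1: int, p: int) -> bool:
    """Checker: regenerate the parity from the outputs and compare with the prediction."""
    return (y0 ^ y1) == p

def circuit(a, b, c, shared: bool, n_stuck_at=None):
    """Hypothetical CUT with outputs y0 = a AND b, y1 = c and predicted parity p.

    n is the AND gate driving y0. With shared=True the parity predictor reuses n;
    otherwise it has its own copy of the AND gate. n_stuck_at injects a stuck-at
    fault on n only.
    """
    n = a & b if n_stuck_at is None else n_stuck_at
    n_for_prediction = n if shared else (a & b)
    return (n, c, n_for_prediction ^ c)

def fault_secure(shared: bool) -> bool:
    """FS w.r.t. both stuck-at faults on n, checked exhaustively."""
    for stuck, (a, b, c) in product((0, 1), product((0, 1), repeat=3)):
        good = circuit(a, b, c, shared)
        bad = circuit(a, b, c, shared, n_stuck_at=stuck)
        if bad != good and parity_ok(*bad):
            return False      # erroneous output passes the checker
    return True

print("separate prediction logic:", fault_secure(shared=False))
print("shared AND gate          :", fault_secure(shared=True))
```

With separate prediction logic every error on y0 violates the parity. Once the predictor reuses n, a fault on n flips y0 and the predicted parity together, and the erroneous output is again a code word.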
Consequences
Analysis of circuit robustness is required to
check robustness properties after synthesis
identify critical nodes / regions
compare different circuit implementations
Formal Robustness Checking
[G. Fey et al. 2008, 2009]
[Figure: robustness checking with N faults]
ATPG-Based Analysis
Reuse efficient tools for manufacturing test
Automatic Test Pattern Generation (ATPG) can
Generate test patterns
Identify redundant faults
[Figure: CUT with test input (Testin) and test output (Testout)]
Example: Self-Testability
Self-testable = every fault is detectable
Use test bench to constrain ATPG
Only input code words as patterns
Detection only for non-code outputs
[Figure: Testin → Code-Generator → CUT → Code-Check → Testout]
Strongly Fault-Secure Circuits (SFS)
Discussed so far
Fault-secure (FS)
Self-testing (ST) → avoid fault accumulation
Secure fault accumulation → strongly fault-secure (SFS)
Circuit is SFS w.r.t. fault set F:
For all f in F
ST and FS w.r.t. {f} or
FS for {f} and SFS for all sequences {<f, g>}
Challenges
Multiple fault analysis required
How to compare circuits which are not 100% SFS?
Iterative Robustness Grading
Classify multiple fault f as
insecure (!FS)
secure (FS & ST) or
unknown (else)
If fault f is unknown, study its combinations with further faults from F (sequences <f, g>)
At each iteration compute upper and lower bounds
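One plausible way to turn the classification into bounds is sketched below. This is an assumption, not necessarily the computation used in the talk: secure faults count as SFS, insecure faults do not, and unknown faults may still turn out either way. The (FS, ST) results are hypothetical.

```python
from collections import Counter

def classify(fs: bool, st: bool) -> str:
    """Classification from the slide: insecure (!FS), secure (FS & ST), else unknown."""
    if not fs:
        return "insecure"
    return "secure" if st else "unknown"

def sfs_bounds(labels: list[str]) -> tuple[float, float]:
    """Assumed bounds on the SFS fraction: unknown faults may turn out either way."""
    counts = Counter(labels)
    total = len(labels)
    lower = counts["secure"] / total
    upper = (counts["secure"] + counts["unknown"]) / total
    return lower, upper

# Hypothetical (FS, ST) results for five faults
results = [(True, True), (True, True), (True, False), (False, True), (True, False)]
labels = [classify(fs, st) for fs, st in results]
print(labels)
print(sfs_bounds(labels))
```

Each further iteration resolves some of the unknown faults by analysing their combinations with other faults, which narrows the gap between the two bounds.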
Multiple Fault Analysis
Unconstrained multiple fault analysis:
Rules to determine detectability of multiple faults from properties of single faults
E.g. faults f and g with disjoint output cones:
DT(<f, g>) = DT(f) or DT(g)
[Figure: circuit between input and outputs with faults f and g in disjoint output cones]
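The rule for disjoint output cones can be verified exhaustively on a toy circuit. The circuit and the two faults are hypothetical, and DT is interpreted here as the set of detecting inputs, so that "or" becomes a set union.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=4))

def circuit(a, b, c, d, f=False, g=False):
    """Hypothetical circuit with two disjoint output cones.

    o1 = a AND b (fault f: o1 stuck-at-0), o2 = c OR d (fault g: o2 stuck-at-1).
    """
    o1 = 0 if f else (a & b)
    o2 = 1 if g else (c | d)
    return (o1, o2)

def detecting_inputs(**faults) -> set:
    """DT: all inputs for which the faulty output differs from the fault-free one."""
    return {x for x in INPUTS if circuit(*x, **faults) != circuit(*x)}

dt_f = detecting_inputs(f=True)
dt_g = detecting_inputs(g=True)
dt_fg = detecting_inputs(f=True, g=True)

print(len(dt_f), len(dt_g), len(dt_fg))
print(dt_fg == dt_f | dt_g)
```

Because neither fault can influence the output of the other cone, the double fault is detected by exactly those inputs that detect at least one of the single faults.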
Problem
The rules cannot be applied directly; new code-specific rules are applied instead
Example: dual-rail circuit
[Figure: dual-rail circuit with inputs i1, i1’, i2, i2’ and outputs o, o’]
Experimental Results
Unordered input and output encoding & inverter-free implementation
Parity output encoding
Thread-parallel SAT-based ATPG tool TIGUAN (Freiburg)
Unordered Input and Output Coding
[Figure: % SFS (single faults); series: lower bound, upper bound, weighted LB, weighted UB]
Parity Output Coding
[Figure: % SFS (single faults); series: lower bound, upper bound, weighted LB, weighted UB]
Precision for Single and Double Faults
[Figure: precision and precision (weighted bounds) for the circuits C17, C432, C499, C880, C1355, C1908, C2670, C3540, C5315, C6288, C7552]
Run Time for Double Faults (Seconds)
[Figure: run times; series: advanced multiple fault analysis, standard]
Example: Triple Modular Redundancy
Can compensate both permanent and transient faults
Used both for yield and reliability improvement
[Figure: input i feeds modules M1, M2, M3; a voter combines their outputs into output o]
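A minimal sketch of the TMR principle with a bitwise 2-out-of-3 majority voter. The module function is a hypothetical placeholder, and faults are modelled as bit flips on a module output.

```python
def vote(m1: int, m2: int, m3: int) -> int:
    """Bitwise 2-out-of-3 majority of the three module outputs."""
    return (m1 & m2) | (m1 & m3) | (m2 & m3)

def module(i: int) -> int:
    """Placeholder module function (hypothetical): 8-bit increment."""
    return (i + 1) & 0xFF

i = 0x2A
good = module(i)

# One faulty module is outvoted, whether the fault is permanent or transient
assert vote(good, good ^ 0x10, good) == good

# Two modules failing in the same bit defeat the voter
assert vote(good ^ 0x10, good ^ 0x10, good) != good
```

Because the voter works bit by bit, faults in two modules are still masked as long as they corrupt different output bits.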
Yield of a TMR System
[Equation: terms labeled “i faults occur” and “i faults tolerated”]
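As a rough baseline, and not necessarily the formula on the slide, the classical TMR estimate can be sketched as follows. The assumptions are a Poisson defect model per module, a perfect voter, and a system that works if at least two of the three modules are defect-free. The module size is hypothetical.

```python
import math

def module_yield(defects_per_gate: float, gates: int) -> float:
    """Poisson yield model (assumed): probability that a module has no defect."""
    return math.exp(-defects_per_gate * gates)

def tmr_yield(y_module: float) -> float:
    """At least two of three modules defect-free, voter assumed perfect."""
    return y_module**3 + 3 * y_module**2 * (1 - y_module)

GATES = 1_000   # hypothetical module size
for density in (1e-5, 1e-4, 1e-3):
    y = module_yield(density, GATES)
    print(f"D={density:.0e}: single module {y:.3f}, TMR {tmr_yield(y):.3f}")
```

In this simple model TMR only improves the yield while the module yield is above 0.5; below that, two or more defective modules become the likely case.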
Fault Tolerance?
[Figure: classes of manufactured TMR systems, labeled “perfect TMR”, “still tolerates certain faults”, “error detection still possible”, “working”]
“Fault Tolerant” Yield
Necessary: refined yield estimation
k additional faults tolerated
[Figure: TMR example with inputs i1 to i3, outputs o1 to o4, a voter, and faults f1 and f2]
Example b13
[Figure: two plots over defect density in defects/gate; curves: YFT(2) lower bound, YFT(2) upper bound, TMR upper bound, TMR lower bound, single module]
Quality Binning
Enhanced manufacturing test must classify chips according to quality levels
Two steps
“Functional” test: Go/NoGo
Diagnostic test with DfT
Reveals “functionally redundant” faults
Critical faults must be distinguished from tolerable faults
[Figure: TMR example with inputs i1 to i3, outputs o1 to o4, a voter, and faults f1 and f2]
DFG-Project RealTest
Topics
Variation-Aware Testing
Design and Test of Robust Systems
Partners
IIS-EAS Dresden (Straube, Vermeiren), U. Freiburg (Becker), U. Stuttgart (Wunderlich), U. Paderborn (Hellebrand), U. Passau (Polian)
Industrial Board
Mentor Graphics Hamburg, Infineon München
Conclusions
Soft errors and parameter variations require a robust system design
Robust circuit design comes with new challenges in
design validation/verification
yield estimation (traditional vs. “fault tolerant yield”)
testing (pass/fail vs. “quality binning”)