Embedded Systems Laboratory, Informatics Institute
Federal University of Rio Grande do Sul
Porto Alegre – RS – Brazil
SRC TechCon 2005
Portland, Oregon, USA
Dealing with Multiple Simultaneous Faults in Future Technologies
Carlos A. L. Lisbôa, Erik Schüler, Luigi Carro
Carlos A. L. Lisbôa SRC TechCon 2005 - October, 26, 2005 - Paper # 20.4 2
Why Multiple Simultaneous Faults?
• Future technologies (2010 and beyond)
  • very small transistors and fewer electrons to form the channel (SETs)
  • transient pulses due to radiation attack will last longer than the propagation delays of gates
  • devices will be more sensitive to the effects of electromagnetic noise, neutrons and alpha particles
Single Event Upset Origin
[Figure: bit patterns illustrating the origin of a single event upset]
Why Should One Study Multiple Faults?
Change in paradigm: gates will behave statistically, producing correct outputs only a fraction of the time.
How to Deal with Multiple Faults?
• New paradigm: multiple simultaneous faults
  • new fault tolerance techniques will be required (TMR will no longer provide enough protection)
• How to deal with this problem?
  • new materials and manufacturing technologies must be developed
  OR
  • new design approaches must be taken (our bet!)
Research Approaches
• Use of stochastic operators
• Use of bit stream operators
• Ensuring voter reliability to use n-MR while dealing with multiple simultaneous faults
• Next steps: 2005 - 2007 time frame
Research Evolution
[Diagram labels: Stochastic Operators; OK for some DSP applications; looking for more speed; Bit Stream Operators; small footprint and fast; looking for tolerant converter; Analog Voter; tolerant to multiple faults in n-MR solutions]
Using Stochastic Operators
• SEU induced transient errors are of random nature
• Stochastic operators rely on randomness to produce approximate results
• The injection of random faults in the input signals processed by stochastic operators did not impact the precision of the results
% Errors in 1,000 additions
Stochastic Adder: 0 faults 0.1412; 2 faults 0.2580; 4 faults 0.1768; 8 faults 0.2196
Conventional: 0.0000
• Several application areas (DSP) can deal with approximate values and still produce acceptable results (outputs)
Using Stochastic Operators
• Benefit: reduced area of the operators
[Figure: stochastic multiplier circuit operating on input bit streams]
[Figure: stochastic adder circuit with bit stream signals S1, S2, S3 and Sum]
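The slides do not spell out the circuits, so the following is only a minimal sketch under the classical unipolar stochastic computing formulation, which is an assumption here: a value in [0, 1] is the fraction of 1s in a random bit stream, an AND gate multiplies two streams, and a 2-to-1 multiplexer with a 50% select stream produces a scaled sum. The helper names (`to_stream`, `inject_faults`, and so on) are hypothetical and chosen only for illustration.

```python
import random


def to_stream(p, length, rng):
    """Encode p in [0, 1] as a random bit stream whose fraction of 1s approximates p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(stream):
    """Decode a bit stream back to a value: the fraction of 1s."""
    return sum(stream) / len(stream)


def stochastic_multiply(s1, s2):
    """Bitwise AND of two independent streams approximates p1 * p2."""
    return [a & b for a, b in zip(s1, s2)]


def stochastic_add(s1, s2, select):
    """Multiplexer: with a 50% select stream the output approximates (p1 + p2) / 2."""
    return [b if s else a for a, b, s in zip(s1, s2, select)]


def inject_faults(stream, count, rng):
    """Flip `count` randomly chosen bits to mimic transient faults (hypothetical helper)."""
    faulty = list(stream)
    for i in rng.sample(range(len(faulty)), count):
        faulty[i] ^= 1
    return faulty


rng = random.Random(2005)          # fixed seed so the run is repeatable
n = 4096                           # stream length, an arbitrary choice
s1 = to_stream(0.5, n, rng)
s2 = to_stream(0.25, n, rng)
sel = to_stream(0.5, n, rng)

print(from_stream(stochastic_multiply(s1, s2)))   # close to 0.125
print(from_stream(stochastic_add(s1, s2, sel)))   # close to 0.375

# Eight flipped input bits shift the decoded product by at most 8 / n.
print(from_stream(stochastic_multiply(inject_faults(s1, 8, rng), s2)))
```

Because every bit carries the same weight, a flipped bit changes the decoded value by only 1/n, which is one way to read the fault injection results reported earlier.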
Using Stochastic Operators
How does it work?
Come and see the posters!
No free drinks, but the answer to this question is guaranteed!
Using Bit Stream Operators
• Computation principles similar to those of the stochastic adder and multiplier
• Operators can produce bit streams which represent the exact results of the operation
Proposed Multiplication Algorithm - bit stream product
(the count of 1’s in the stream is equal to the product value)

                          F12       F11       F10
                   x      F22       F21       F20
                          F20.F12   F20.F11   F20.F10
                F21.F12   F21.F11   F21.F10
      F22.F12   F22.F11   F22.F10
      b48..b33  b32..b17  b16..b5   b4..b1    b0
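The table above can be read as follows: each partial product bit is repeated as many times as its binary weight, so the number of 1s in the 49-bit stream equals the product of two 3-bit operands. A minimal sketch of that reading is shown below; the function name `bit_stream_product` and the order of bits inside the stream are assumptions, since only the count of 1s matters.

```python
def bit_stream_product(a: int, b: int, width: int = 3) -> list[int]:
    """Return a bit stream whose count of 1s equals a * b.

    Bit i of `a` ANDed with bit j of `b` has weight 2**(i + j), so that
    partial product bit is repeated 2**(i + j) times in the stream.
    """
    stream = []
    for j in range(width):
        for i in range(width):
            partial = ((a >> i) & 1) & ((b >> j) & 1)
            stream.extend([partial] * (1 << (i + j)))
    return stream


stream = bit_stream_product(5, 6)
print(len(stream))   # 49 bits for 3-bit operands (b48 .. b0)
print(sum(stream))   # 30, the product of 5 and 6
```

The stream length is the sum of all weights, (2^3 - 1)^2 = 49, which matches the b48 .. b0 range in the table.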
• Redundancy is added to the bit streams in order to withstand multiple bit flips
Adding robustness to the bit stream through redundancy
b48 .. b48   b47 .. b47   ...   b0 .. b0   1 1 1 1 0 0 0
  8 times      8 times           8 times       +4
total count of 1’s = 8 * product + 4
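A small sketch of how this redundancy could be exploited, assuming the product is recovered by an integer division of the count of 1s by 8 (the slides state only the count relation, so the decoding rule and the names `add_redundancy` and `decode` are assumptions). With the +4 offset, the count can lose up to four 1s or gain up to three 1s and still decode to the same product.

```python
REPEAT = 8                      # each bit is sent 8 times
PAD = [1, 1, 1, 1, 0, 0, 0]     # fixed tail that adds 4 to the count of 1s


def add_redundancy(stream: list[int]) -> list[int]:
    """Repeat every bit REPEAT times and append the fixed tail."""
    out = []
    for bit in stream:
        out.extend([bit] * REPEAT)
    return out + PAD


def decode(redundant: list[int]) -> int:
    """Assumed decoding rule: count of 1s divided by REPEAT, remainder dropped."""
    return sum(redundant) // REPEAT


def flip(stream: list[int], positions) -> list[int]:
    """Flip the bits at the given positions (fault injection helper)."""
    faulty = list(stream)
    for i in positions:
        faulty[i] ^= 1
    return faulty


base = [1] * 30 + [0] * 19      # a 49-bit stream representing the product 30
protected = add_redundancy(base)

print(sum(protected))                           # 244 = 8 * 30 + 4
print(decode(protected))                        # 30
print(decode(flip(protected, [240, 241, 242]))) # three 0 -> 1 flips: still 30
print(decode(flip(protected, [0, 1, 2, 3])))    # four 1 -> 0 flips: still 30
```

A fourth 0 -> 1 flip would push the count to the next multiple of 8 and decode to 31, so the tolerated imbalance is limited to this window.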
• Conversion of bit streams to binary coded values is delayed as much as possible, and conversion circuits must use TMR or n-MR for protection against faults
• Issues to be further investigated: size of bit streams and area of the conversion circuits
Using Bit Stream Operators
How does it work?
Come and see the posters!
No free food, but some more info on this subject will be provided!
What is Wrong with TMR?
• TMR protects only against single faults in one of the modules
Module 1: correct output
Module 2: correct output
Module 3: correct output
VOTER: correct output
What is Wrong with TMR?
• TMR protects only against single faults in one of the modules
Module 1: correct output
Module 2: wrong output
Module 3: correct output
VOTER: correct output
What is Wrong with TMR?
• TMR does not protect against double faults in different modules
Module 1: wrong output
Module 2: correct output
Module 3: wrong output
VOTER: wrong output
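The three cases above can be reproduced with a one-bit majority function. This is a generic sketch of a digital TMR voter, not the specific circuit used by the authors; the name `majority_vote` is chosen here for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """One-bit majority: the output follows at least two agreeing inputs."""
    return (a & b) | (a & c) | (b & c)


correct = 1
wrong = correct ^ 1   # a faulty module delivers the inverted bit

print(majority_vote(correct, correct, correct))  # 1: no fault
print(majority_vote(correct, wrong, correct))    # 1: single fault is masked
print(majority_vote(wrong, correct, wrong))      # 0: double fault wins the vote
```

The voter itself is assumed fault free in this sketch, which is exactly the assumption questioned next.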
What is Wrong with TMR?
• When a single fault occurs in the voter circuit, the voter output may be wrong
Module 1: correct output
Module 2: correct output
Module 3: correct output
VOTER: correct output?
Making TMR (n-MR) more reliable
• Known solutions imply
  • area, performance and/or power penalties
  • deadlock: how to protect the output generator?
• Proposed solution:
  • use TMR to cope with single faults in the modules
  • replace the digital voter by an analog voter that
    • uses a comparator to generate the output
    • can tolerate some noise while still producing the correct result
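The slides state only that the analog voter uses a comparator and tolerates some noise. As a behavioral sketch, assume the voter averages the module output voltages and the comparator checks the average against half the supply voltage; both the averaging model and the 3.3 V supply are assumptions made for illustration, and `analog_vote` is a hypothetical name.

```python
def analog_vote(voltages: list[float], vdd: float = 3.3) -> int:
    """Behavioral model: average the inputs, then compare against vdd / 2."""
    average = sum(voltages) / len(voltages)
    return 1 if average > vdd / 2 else 0


# TMR, expected logic 1: one faulty module near 0 V, the others slightly noisy.
print(analog_vote([3.1, 0.2, 3.4]))              # 1

# 5-MR, expected logic 1: two simultaneous faulty modules.
print(analog_vote([3.3, 0.0, 3.3, 0.0, 3.3]))    # 1

# 5-MR, expected logic 0: two simultaneous faulty modules stuck high.
print(analog_vote([0.0, 3.3, 0.0, 3.3, 0.0]))    # 0
```

In this model, noise on an input only moves the average by a fraction of its amplitude, so the decision changes only when the accumulated disturbance crosses the comparator threshold.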
The Analog Voter
Injection of faults in the comparator (*)
Minimum Area Comparator
(*) using CMOS 0.35 µm
Electrical Simulation: Multiple Faults (SPICE and CMOS 0.35 µm)
Dealing with Multiple Simultaneous Faults: n-MR
The Analog Voter with 5 Inputs (for 5-MR)
Simulations with injection of 2 simultaneous faults also succeeded
The Analog Voter ... Oops!
Does this work???
The Analog Voter
Let’s see the posters!
Future Work - Short Term (2005-2006)
• use of signal redundancy with other number representation forms, such as Sigma-Delta
• use of the analog voter as an efficient way to implement robust n-MR circuits
• investigate the application of statistical methods and neural networks to the design of fault tolerant circuits with minimum redundancy
Future Work - Long Term (2006-2007)
• use of logic properties to develop signal redundancy with low cost
• apply the developed techniques to actual processors with DSP and VLIW architectures
• discuss the architectural impact of new technologies together with fault tolerance
Research Evolution
[Timeline diagram: previous work (2004-2005), 2005, 2006, 2007. Labels: Stochastic Operators; Bit Stream Operators; Analog Voter; Sigma Delta; Logic Properties; low cost redundancy; DSP / VLIW; application to actual DSP and VLIW processors]
Questions?
Looking forward to answering them at the poster booth!
(# 20.4)
Contact: [email protected]
Thank You!
No free anything, but a nice chat about these matters will be a pleasure!