Design of Soft Error Tolerant Memory and Logic Circuits
Shah M. Jahinuzzaman, PhD Student
http://vlsi.uwaterloo.ca/~smjahinu
Graduate Student Research Talks, E&CE, January 16, 2006
CMOS Design and Reliability Group
http://www.ece.uwaterloo.ca/~cdr/
Outline
• What is soft error
• Soft error sources and mechanism
• Soft error in logic circuits
• Soft error in memories
• Effect of technology scaling on soft error
• Mitigation techniques
• Summary
What is Soft Error?
• Transient data upset due to a particle strike: ‘1’ → ‘0’ or ‘0’ → ‘1’
• Minimum charge required for an upset: Qcrit
• No damage to hardware: a REWRITE or RESET can restore the changed data
• Random in time and space
• Affects latches, flip-flops, memory blocks, and even combinational logic circuits
Soft Error Rate
• Typically expressed in FIT (failures in time): 1 FIT = 1 failure per 10^9 device-hours
• Sum of typical hard failure rates (oxide breakdown, latch-up, etc.) ≈ 50-200 FIT
• Soft failure rate in an unprotected chip ≈ 50,000 FIT
• Critical reliability concern for: microprocessors with large caches (e.g., in servers), SRAM-based FPGAs and ASICs, aircraft controllers, space-borne electronics, and life-support devices such as cardiac defibrillators
Source: www.fda.gov
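The FIT figures quoted for hard and soft failures can be turned into a mean time between failures with simple arithmetic. The following is a minimal sketch that assumes a constant failure rate; the function names and the 1000-chip fleet are illustrative and do not come from the talk.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def fit_to_mtbf_hours(fit: float) -> float:
    """Mean time between failures in hours for a rate given in FIT.

    1 FIT = 1 failure per 10^9 device-hours (constant failure rate assumed).
    """
    return 1e9 / fit


def expected_failures(fit: float, devices: int, hours: float) -> float:
    """Expected number of failures for `devices` parts running `hours` each."""
    return fit * devices * hours / 1e9


if __name__ == "__main__":
    soft_fit = 50_000  # unprotected-chip soft failure rate from the slide
    hard_fit = 200     # upper end of the hard failure range from the slide
    print("soft MTBF (years):", fit_to_mtbf_hours(soft_fit) / HOURS_PER_YEAR)
    print("hard MTBF (years):", fit_to_mtbf_hours(hard_fit) / HOURS_PER_YEAR)
    # Hypothetical fleet: 1000 chips, each running for one year
    print("soft failures per year:", expected_failures(soft_fit, 1000, HOURS_PER_YEAR))
```

At 50,000 FIT a single chip sees one soft failure roughly every 20,000 hours (about 2.3 years), which becomes a frequent event once many chips are deployed.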
Sources of soft error
• High-energy (~MeV) particles
  – Alpha particles (~4-9 MeV)
  – Cosmic neutrons (~10-200 MeV)
  – Thermal neutrons interacting with 10B in borophosphosilicate glass (BPSG)
• Only 3.6 eV is required to create one electron-hole pair (EHP) in Si
• BPSG is no longer a concern after the 0.25 μm technology node
• Alpha particles come from chip packaging materials
• Neutrons come from cosmic rays and are ever present (background radiation)
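The 3.6 eV per electron-hole pair figure gives an upper bound on the charge a particle can generate. The sketch below assumes, as a simplification, that all of the particle's energy goes into ionization inside the silicon; the function names are illustrative.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron
EHP_ENERGY_EV = 3.6                    # energy per electron-hole pair in Si (from the slide)


def ehp_count(energy_mev: float) -> float:
    """Electron-hole pairs created if all the energy goes into ionization."""
    return energy_mev * 1e6 / EHP_ENERGY_EV


def generated_charge_fc(energy_mev: float) -> float:
    """Charge of one carrier type, in femtocoulombs."""
    return ehp_count(energy_mev) * ELEMENTARY_CHARGE_C * 1e15


if __name__ == "__main__":
    for energy in (4.0, 9.0):  # alpha-particle energy range quoted on the slide
        print(energy, "MeV ->", round(generated_charge_fc(energy), 1), "fC")
```

Under this assumption each MeV corresponds to about 44.5 fC. Only part of that charge is deposited near a sensitive node and collected.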
Alpha Particle
• Doubly ionized helium atom (4He2+)
• Sources: Pb in solders, and U and Th in IC packaging materials; major concern: solder balls in flip-chip packages
• Penetrates 25 μm in Si
• Can be shielded by an ‘epoxy layer’ (not in flip-chip packages)
Cosmic Neutron
• Comes from the sun or inter-galactic rays
• Generates EHPs indirectly through Si recoil
• Cannot be shielded: 1 ft of concrete lowers the neutron flux by only 1.4x
Relative Influence of SE Sources
• Alpha particle deposits 4-16 fC/μm; neutron (Si recoil) deposits 25-150 fC/μm
• Rate-limiting SE source in scaled-down devices with high-purity materials: cosmic neutrons
Basic Mechanism of SE
• Qcoll = ηQdep, η being the collection efficiency
• Qcoll generates a current transient
• Qcoll depends on doping, collection volume, node voltage, carrier mobility, etc.
• Qcoll > Qcrit → soft error
R. Baumann, IEEE Design and Test of Computers, pp. 258-266, May-June 2005
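The upset condition can be sketched in a few lines. This is an illustrative model only: it assumes the deposited charge is the per-micron deposition rate quoted earlier in the talk multiplied by a path length through the sensitive volume, and the path length, η, and Qcrit values are hypothetical.

```python
def collected_charge_fc(deposit_fc_per_um: float, path_um: float, eta: float) -> float:
    """Qcoll = eta * Qdep, with Qdep = (charge per unit length) * (path length)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("collection efficiency must lie in [0, 1]")
    q_dep = deposit_fc_per_um * path_um
    return eta * q_dep


def causes_soft_error(q_coll_fc: float, q_crit_fc: float) -> bool:
    """Upset condition from the slide: Qcoll > Qcrit."""
    return q_coll_fc > q_crit_fc


if __name__ == "__main__":
    # Hypothetical values: 1 um path, eta = 0.5, Qcrit = 5 fC
    path_um, eta, q_crit = 1.0, 0.5, 5.0
    for label, rate in (("alpha, low", 4.0), ("alpha, high", 16.0),
                        ("neutron, low", 25.0), ("neutron, high", 150.0)):
        q = collected_charge_fc(rate, path_um, eta)
        print(f"{label}: Qcoll = {q} fC, upset = {causes_soft_error(q, q_crit)}")
```

With these made-up parameters only the low-end alpha strike stays below Qcrit. Real values depend on the process and on the cell design.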
Soft Error in Logic Circuits
• Also referred to as Single Event Transient (SET)
• Less troublesome than in memories: lower density and activity dependent
• Naturally masked by three mechanisms
  – Logical masking
  – Electrical masking
  – Latching window masking
• Increasing concern with scaling (90 nm onward): “Robust enterprise platforms in sub-65nm technologies require design with built-in logic soft error protection,” S. Mitra, Intel Corp.
Logical Masking
• A transient on a struck node passes from gate input to output only if that node controls the output, i.e., the other gate inputs hold non-controlling values
• For an error to propagate, there must be a sensitized path along the logic chain

A  B  NAND
0  0  1
0  1  1
1  0  1
1  1  0

When A = 0, the output does not depend on B; A is in its controlling state
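Logical masking on the NAND gate can be checked by flipping one input and seeing whether the output changes. This is a small illustrative sketch; the helper names are hypothetical.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """Two-input NAND gate."""
    return 0 if (a and b) else 1


def propagates(gate, inputs: tuple, struck: int) -> bool:
    """True if flipping input number `struck` changes the gate output."""
    flipped = list(inputs)
    flipped[struck] ^= 1
    return gate(*inputs) != gate(*flipped)


if __name__ == "__main__":
    # Strike input B (index 1) for every input combination of the NAND gate.
    for a, b in product((0, 1), repeat=2):
        status = "propagates" if propagates(nand, (a, b), struck=1) else "masked"
        print(f"A={a} B={b}: transient on B is {status}")
```

A transient on B is masked whenever A = 0 and propagates whenever A = 1, matching the truth table.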
Electrical Masking
• Digital circuits have finite bandwidth and finite rise and fall times
• Transients with frequency content above the cut-off frequency are attenuated (amplitude ↓, rise and fall time ↑) and eventually disappear
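A rough feel for electrical masking comes from treating one logic stage as a first-order low-pass filter. This is a deliberately crude sketch, not a gate model: real gates are nonlinear, and the time constant, supply, and threshold below are hypothetical values.

```python
import math


def peak_after_rc_stage(v_in: float, width_ps: float, tau_ps: float) -> float:
    """Peak output of a first-order low-pass stage driven by a rectangular pulse.

    The output charges toward v_in for width_ps, so the peak is
    v_in * (1 - exp(-width / tau)).
    """
    return v_in * (1.0 - math.exp(-width_ps / tau_ps))


def electrically_masked(v_in: float, width_ps: float, tau_ps: float,
                        v_threshold: float) -> bool:
    """True if the filtered pulse never reaches the switching threshold."""
    return peak_after_rc_stage(v_in, width_ps, tau_ps) < v_threshold


if __name__ == "__main__":
    vdd, tau, vth = 1.0, 20.0, 0.5  # hypothetical: 1 V swing, 20 ps stage, VDD/2 threshold
    for width in (5.0, 10.0, 20.0, 50.0):
        peak = peak_after_rc_stage(vdd, width, tau)
        print(f"{width} ps pulse: peak {peak:.3f} V, "
              f"masked = {electrically_masked(vdd, width, tau, vth)}")
```

In this toy model a pulse narrower than τ·ln 2 (about 13.9 ps for τ = 20 ps) never reaches half the supply and is filtered out, while wider pulses get through.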
Latching Window Masking
• A transient cannot be latched into a FF/register unless it occurs within the clock window
[Figure labels: "Transient has to occur here to be latched"; "Transient is not latched"]
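The chance that a transient lines up with the latching window can be estimated with a simple timing model. The sketch below assumes the transient arrives at a uniformly random time within the clock period and is captured only if it spans the whole latching window (setup plus hold time); the model and all numbers are illustrative assumptions, not taken from the talk.

```python
def latch_probability(pulse_ps: float, window_ps: float, period_ps: float) -> float:
    """Probability that a randomly timed transient is latched.

    Assumption: the pulse must cover the entire latching window around one
    clock edge, which leaves (pulse - window) of valid start times per period.
    """
    if pulse_ps < window_ps:
        return 0.0
    return min(1.0, (pulse_ps - window_ps) / period_ps)


if __name__ == "__main__":
    pulse, window = 100.0, 20.0  # hypothetical pulse width and latching window, in ps
    for period in (1000.0, 500.0, 250.0):  # 1 GHz, 2 GHz, 4 GHz clocks
        print(f"period {period} ps: P(latched) = {latch_probability(pulse, window, period):.2f}")
```

In this model, shortening the clock period raises the capture probability in direct proportion, which is why faster clocks weaken latching window masking.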
Soft Error in Memories
• No masking effects and high density → most susceptible to soft error
• Memories to consider:
  – Main memory (DRAM)
  – Cache memory (SRAM)
• Soft errors change the stored bits, which may lead to catastrophic failures of microprocessors, SRAM-based FPGAs, etc.
Soft Error in DRAM
• Higher capacitance (3D, trench-like), smaller charge collection area, periodic refresh
  → decreasing bit error rate, constant system error rate
[Figure labels: word lines, bit lines, trench capacitors, metal bit line, poly word line]
R. Baumann, IEEE Design and Test of Computers, pp. 258-266, May-June 2005
Soft Error in SRAM
• Larger area per bit than DRAM; signal charge is stored by two cross-coupled inverters
• Two nodes are prone to particle hits; one node is more sensitive
• Critical reliability issue
Scaling Trends of SE
[Figure: System SER in Semiconductor Memory (FITs)]
Source: Semico Research Inc. (June 2002)
R. Baumann, IEEE Design and Test of Computers, pp. 258-266, May-June 2005
SE Sensitivity with Scaling
• Signal charge is reduced: Q = CV, and both C and V are scaled
• Particles with lower energy can cause soft errors

Process Node      Application             Soft Error Protection Required
180nm to 130nm    Consumer                None
180nm to 130nm    Networking and storage  Memory
180nm to 130nm    Military and aerospace  Memory and logic
90nm              Consumer                Memory and logic
90nm              Networking and storage  Memory and logic
90nm              Military and aerospace  Memory and logic
65nm and below    Consumer                Memory and logic
65nm and below    Networking and storage  Memory and logic
65nm and below    Military and aerospace  Memory and logic

Source: iRoC Technologies & www.edn.com
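The Q = CV relation shows how quickly the stored signal charge shrinks when both terms scale. This is a minimal sketch that assumes C and V both shrink by the same factor 1/s; the capacitance and voltage values are hypothetical.

```python
def signal_charge_fc(capacitance_ff: float, voltage_v: float) -> float:
    """Q = C * V; femtofarads times volts gives femtocoulombs."""
    return capacitance_ff * voltage_v


def scaled_charge_fc(capacitance_ff: float, voltage_v: float, s: float) -> float:
    """Signal charge after scaling both C and V by 1/s (assumed scaling rule)."""
    return signal_charge_fc(capacitance_ff / s, voltage_v / s)


if __name__ == "__main__":
    c_ff, v = 4.0, 2.0  # hypothetical node capacitance (fF) and supply voltage (V)
    for s in (1.0, 2.0, 4.0):
        print(f"s = {s}: Q = {scaled_charge_fc(c_ff, v, s)} fC")
```

Under this assumption the signal charge falls as 1/s², so each scaling step lowers the charge a particle must deposit to upset the node.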
Existing Mitigation Techniques
• Layout level
  – Reduction of sensitive area, use of an extra doping layer (an epitaxial layer can help), SOI, etc.
• Circuit level
  – Circuit techniques to reduce sensitivity to transients
• System level
  – Space and time redundancy, parity protection (error detection only), Error Correction Code (ECC), Error Detection and Correction Code (EDAC)
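The difference between parity (detection only) and an error correction code can be shown on a 4-bit word. The sketch uses even parity and a Hamming(7,4) single-error-correcting code as one example of an ECC; the talk does not say which codes are used in practice, so this choice is an assumption.

```python
def parity_bit(bits):
    """Even-parity bit: makes the total number of 1s even."""
    return sum(bits) % 2


def parity_ok(bits, stored_parity):
    """True if the word still matches its stored parity bit."""
    return parity_bit(bits) == stored_parity


def hamming74_encode(d):
    """Encode 4 data bits as the codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(c):
    """Return (corrected data bits, 1-based error position or 0 if none)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome equals the position of a single flipped bit
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    p = parity_bit(data)
    upset = [1, 0, 0, 1]                 # single-bit upset in the stored word
    print("parity detects upset:", not parity_ok(upset, p))
    code = hamming74_encode(data)
    code[4] ^= 1                         # single-bit upset in the codeword
    print("Hamming result:", hamming74_correct(code))
```

Parity only flags that an odd number of bits changed, whereas the Hamming code locates and repairs any single flipped bit, at the cost of three check bits per four data bits.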
Circuit Level Mitigation
[Figure: circuit schematics; labels VDD, WL, BL, BLB, R]
Cypress Semiconductors
Ootsuka et al., IEDM 1998
P. Roche et al., IRPS 2004
T. M. Mnich et al., IEEE Trans. Nucl. Sci., p. 4620, 1983
System Level Mitigation
• Redundancy, majority voting
• Parity protection, EDAC/ECC
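Redundancy with majority voting can be illustrated with three copies of a stored word. The sketch assumes triple redundancy with a bitwise 2-of-3 voter, one common form of space redundancy; the word value is arbitrary.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    word = 0b10110100
    upset_copy = word ^ 0b00001000  # single-bit upset in one of the three copies
    voted = majority_vote(word, upset_copy, word)
    print(bin(voted), "recovered:", voted == word)
```

The voter hides an upset in any one copy, but the same bit flipped in two copies outvotes the correct value, and the scheme triples the storage cost.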
Design of SE Tolerant SRAMs
• SRAM in SoC: currently 50%; expected to reach ~90% by the end of the decade
• SE tolerance of SRAM will determine the system reliability
• Scaling and low-power approaches for SRAMs are making SE immunity harder to achieve
• Circuit and system level hardening within area-power-performance constraints is essential: this is the motivation of my research
Summary
• Soft errors cause silent data corruption; the probability increases with technology scaling
• Both memory and logic circuits are susceptible
• Logic circuits have inherent masking mechanisms
  – Higher frequency makes them vulnerable
• Memories, e.g., SRAM, are the most vulnerable
• Layout, circuit and system level mitigation techniques are used
• Mitigation techniques incur cost and degrade performance