FLASH Mitigation Strategies for Space Applications
Charles Howard, Southwest Research Institute
Abstract
The MMS mission requires a high-density, non-volatile solid-state recorder (SSR). The SSR will be implemented with screened commercial FLASH devices that have been characterized for radiation effects (both TID and SEE). In an extensive collaborative effort by NEPP and SWRI, multiple manufacturers and devices have been characterized. The additional SEU failure modes exhibited by FLASH devices compel mitigation techniques that extend beyond traditional bit error correction. A discussion of mitigation techniques and the tradeoffs between FPGA complexity/utilization, bandwidth, and total memory will be presented.
Why FLASH?
“I am your density” – George McFly, Back to the Future
SDRAM
– 512Mx8 in an MCM?
SRAM
– Yeah, right…
FLASH
– 512Mx8 discrete parts (1Gx8 available)
– 4Gx8 MCMs (8Gx8 possible)
– NON-VOLATILE…
Why NOT FLASH?
Space qualified parts?
– General availability sorely lacking
– No Rad foundry providing FLASH
Legacy / Lack thereof
– Radiation testing of commercial products is a strenuous process…
– Each wafer lot must be tested
– “Long term” availability for commodity parts? NOT!
NEPP/SWRI testing of FLASH
SEE response is generally excellent for all flash products
– Error cross-sections orders of magnitude lower than for standard volatile memories
None of the parts suffered SEL
– There were other destructive effects, usually failure of the erase circuit.
The SEFI rate is a concern with flash memories.
– What do you call a SEFI that won’t clear after a power cycle?
FLASH Memory in Space Environment
“The SEFI (Single Event Functional Interrupt) rate is of greater concern for space applications than the bit error rate”
– TID and SEE Response of Advanced 4G NAND Flash Memories (NSREC08), T. R. Oldham
Mitigation Considerations
Class of Error
– SEUs
– SEL
– SEFI
– “Permanent” SEFI
Cost of implementation/mitigation
– Area
– Mass
– Power
– Required FPGA logic
Error Classes
SEU
– Address to satisfy MAR
  – Some form of ECC
SEL
– Sufficiently low to neglect
  – Component design issue
SEFI (part becomes nominal after power cycle/reset)
– More likely than SEU, must address
  – Detect & power cycle/reset
Permanent SEFI
– More likely than SEU, must address
– Different mitigation approach!
  – ???
Module Topology
[Figure: 4 GByte module (FLASH 4Gx8) built from eight 512Mx8 FLASH devices, with Power Control and Ctrl+Data connections.]
CAVEAT STATEMENTS
I am not doing the probability calculations
Consider a DWORD storage system for reference
Permanent SEFIs are not recoverable:
– Loss of Erase, Write or Read Circuit
– Can approximate the loss of a component
Block based failures and permanent SEFIs are roughly equivalent
– Lose a “unit” of data (BLOCK x 4 x n) ~ “component”
Simple addressing and memory management
– No exotic stuff like linked lists
Design Options
Unmitigated
SEC/DED (Traditional EDAC)
Reed-Solomon
Parallel Reed-Solomon
TMR
Redundancy
ECC “Plus”
Unmitigated
0% more memory
– Area / Power / Mass: 1x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Simple
Utilization (logic required to implement)
– I/O count: 1x
– Gates: Baseline
Susceptibility
– Bit: Any Single Bit Error
– Byte or component: NOPE…
SEC/DED
25% more memory
– Area / Power / Mass: 1.25x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Simple
Utilization (logic required to implement)
– I/O count: 1.25x
– Gates: Hamming cost
Susceptibility (Immunity)
– Bit: Any Single Bit Error
– Byte or component: NOPE…
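To make the "Hamming cost" concrete, here is a minimal Python sketch of SEC/DED over one 32-bit DWORD. It assumes an extended Hamming (39,32) layout: 6 Hamming check bits plus 1 overall parity bit, which fits in the 8 extra bits that a fifth x8 device would provide (the 25% figure). The bit layout and the names `encode`, `decode`, `PARITY_POS` and `DATA_POS` are illustrative assumptions, not the flight FPGA design.

```python
CODE_BITS = 39                      # bit 0 = overall parity, bits 1..38 = Hamming codeword
PARITY_POS = [1, 2, 4, 8, 16, 32]   # Hamming check bits sit at power-of-two positions
DATA_POS = [p for p in range(1, CODE_BITS) if p not in PARITY_POS]  # 32 data positions


def encode(dword: int) -> int:
    """Return the 39-bit SEC/DED codeword for a 32-bit data word."""
    code = 0
    for i, pos in enumerate(DATA_POS):
        if (dword >> i) & 1:
            code |= 1 << pos
    # Set each check bit so the positions it covers have even parity.
    for p in PARITY_POS:
        parity = 0
        for pos in range(1, CODE_BITS):
            if pos & p and (code >> pos) & 1:
                parity ^= 1
        if parity:
            code |= 1 << p
    # Overall parity bit makes the total number of ones even.
    if bin(code).count("1") & 1:
        code |= 1
    return code


def decode(code: int):
    """Return (dword, status); status is 'ok', 'corrected', 'double' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, CODE_BITS):
        if (code >> pos) & 1:
            syndrome ^= pos
    odd_weight = bin(code).count("1") & 1
    if odd_weight:
        # Odd weight means a single-bit error; the syndrome is its position
        # (syndrome 0 means the overall parity bit itself flipped).
        if syndrome >= CODE_BITS:
            return None, "uncorrectable"
        code ^= 1 << syndrome
        status = "corrected"
    elif syndrome:
        return None, "double"       # even weight but nonzero syndrome: two bits flipped
    else:
        status = "ok"
    dword = 0
    for i, pos in enumerate(DATA_POS):
        if (code >> pos) & 1:
            dword |= 1 << i
    return dword, status
```

Note what the sketch cannot do: if a whole byte lane or component goes bad, up to eight bits of the codeword are wrong at once, which is outside what SEC/DED can correct. That is the "NOPE…" on the slide.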
Reed Solomon (Block)
25% more memory
– Area / Power / Mass: 1.25x
Implementation concerns
– Addressing scheme: Straightforward
– Memory management metrics: Simple
Utilization (logic required to implement)
– I/O count: 1.25x
– Gates: Encoder/Decoder/RAM
– Bandwidth: Likely Adverse
Susceptibility (Immunity)
– Bit, byte: Many/codeblock
– Component failures: NOPE…
Parallel Reed Solomon
50% more memory
– Area / Power / Mass: 1.5x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics
Utilization (logic required to implement)
– I/O count: 1.5x
– Gates: Encoder/Decoder
Susceptibility (Immunity)
– Bit, byte, byte “plus”: YEAH!
– SOME component failures: 2/3 (NOT IN THE RS)
TMR
200% more memory
– Area / Power / Mass: 3x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Simple
Utilization (logic required to implement)
– I/O count: 3X or TDM
– Bus loading / signal integrity: Ouch…
– Gates: Voters (plus)
Susceptibility (Immunity)
– Bit, byte or component: OH, YEAH! We can handle anything!
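The "Voters" entry can be illustrated with a bitwise 2-of-3 majority vote. This is a minimal sketch, assuming three full copies of each word are read back; the names `vote` and `vote_with_flags` are hypothetical, and a real design would implement the same logic in FPGA fabric rather than software.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


def vote_with_flags(a: int, b: int, c: int):
    """Return the voted word plus a flag per copy that disagreed with the result."""
    voted = vote(a, b, c)
    return voted, (a != voted, b != voted, c != voted)
```

For example, if one copy comes from a component lost to a permanent SEFI and reads back as all ones, `vote_with_flags(0x12345678, 0xFFFFFFFF, 0x12345678)` still yields `0x12345678` and flags the second copy. A copy that keeps getting flagged is a candidate for a power cycle or for being written off.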
Redundant Memory
X% more memory
– Area / Power / Mass: X
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Simple
Utilization (logic required to implement)
– I/O count: X
– Gates: Minimal
Susceptibility (Immunity)
– Bit, byte or component: Nope.
ECC with Warm Spare
25-50% more memory per dword
– Area / Power / Mass: 1.5x
Implementation concerns
– Addressing scheme: Simple
– Memory management metrics: Straightforward
Utilization (logic required to implement)
– I/O count: 1.5x
– Bus loading / signal integrity
– Gates: ECC & steering
Susceptibility (Immunity)
– Bit, byte or component: OH, YEAH! We can handle anything!
Memory Topology
[Figure, revised 07/25/09: Mass Memory flash array (48GB + EDAC) made up of three power sectors (16GB + EDAC each), each with its own Power Control. Ctrl+Data lanes: BYTE0, BYTE1, BYTE2, BYTE3, ECC, SPARE.]
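The capacity figures in the topology can be cross-checked with a little arithmetic. The sketch below assumes binary units, that each byte lane is one 4 GByte module of eight 512Mx8 devices (as on the Module Topology slide), and that a power sector holds four data lanes plus one ECC lane and one spare lane. The lane-to-module pairing is an inference from the figures, not something the slides state outright.

```python
MIB = 1 << 20
GIB = 1 << 30

DEVICE_BYTES = 512 * MIB        # one 512Mx8 part stores 512 Mi bytes
DEVICES_PER_MODULE = 8          # eight devices per module (Module Topology slide)
DATA_LANES, ECC_LANES, SPARE_LANES = 4, 1, 1   # assumed lanes per power sector
SECTORS = 3                     # three power sectors in the flash array

module_bytes = DEVICE_BYTES * DEVICES_PER_MODULE
sector_data_bytes = DATA_LANES * module_bytes
array_data_bytes = SECTORS * sector_data_bytes
array_raw_bytes = SECTORS * (DATA_LANES + ECC_LANES + SPARE_LANES) * module_bytes

print(module_bytes // GIB, "GiB per module")
print(sector_data_bytes // GIB, "GiB of data per power sector")
print(array_data_bytes // GIB, "GiB of data in the array")
print(array_raw_bytes / array_data_bytes, "x raw-to-usable footprint")
```

Under these assumptions the numbers line up with the slides: 4 GByte per module, 16 GB per power sector, 48 GB for the array, and a 1.5x footprint once the ECC and spare lanes are counted.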
Failure 1
[Figure, revised 07/25/09: same Mass Memory topology (three power sectors of 16GB + EDAC, flash array of 48GB + EDAC). Ctrl+Data lanes: BYTE0, BYTE1, BYTE2, BYTE3, ECC, with the former SPARE lane now labeled BYTE0.]
Failure 2
[Figure, revised 07/25/09: same Mass Memory topology (three power sectors of 16GB + EDAC, flash array of 48GB + EDAC). Ctrl+Data lanes: BYTE0, BYTE1, BYTE2, BYTE3, ECC, with the former SPARE lane now labeled ECC.]
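The two failure cases amount to re-steering one logical lane (BYTE0 in Failure 1, ECC in Failure 2) onto the spare. Here is a minimal sketch of that steering, assuming six physical byte lanes per power sector with lane 5 as the spare. The class `LaneSteering` and its methods are hypothetical names for illustration; the sketch covers routing only, not how the retired lane's existing contents get rebuilt.

```python
LOGICAL_LANES = ("BYTE0", "BYTE1", "BYTE2", "BYTE3", "ECC")


class LaneSteering:
    """Routes five logical lanes onto six physical byte lanes, one of them a spare."""

    def __init__(self):
        # Physical lanes 0..4 start out carrying the logical lanes; lane 5 is the spare.
        self.route = {name: i for i, name in enumerate(LOGICAL_LANES)}
        self.spare = 5              # becomes None once the spare is in use

    def retire(self, name: str) -> None:
        """Steer logical lane `name` onto the spare after a permanent failure."""
        if self.spare is None:
            raise RuntimeError("no spare lane left in this sector")
        self.route[name] = self.spare
        self.spare = None

    def scatter(self, dword: int, ecc: int) -> list:
        """Return the six bytes driven onto the physical lanes for one write."""
        lanes = [0] * 6
        for i in range(4):
            lanes[self.route[f"BYTE{i}"]] = (dword >> (8 * i)) & 0xFF
        lanes[self.route["ECC"]] = ecc & 0xFF
        return lanes

    def gather(self, lanes: list):
        """Reassemble (dword, ecc) from six physical-lane bytes, skipping retired lanes."""
        dword = 0
        for i in range(4):
            dword |= lanes[self.route[f"BYTE{i}"]] << (8 * i)
        return dword, lanes[self.route["ECC"]]
```

After `retire("BYTE0")`, whatever the dead device drives on physical lane 0 is simply never read. A second permanent failure in the same sector raises an error here, because this sketch models only a single spare per sector.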
Observations
ECC covers SEU errors
Warm Spare compensates for SEFIs and block errors
ECC with Warm Spare is a superior option
– Susceptibility to permanent SEFIs plummets
– Memory availability remains near 100%
  – Block based errors mapped to spare
  – SEFI based errors map to spare
ECC with Warm Spare is roughly equivalent to full TMR at half the power, mass, area, and cost
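The "half" in that last observation follows directly from the Area / Power / Mass multipliers quoted on the option slides. The sketch below just tabulates those multipliers. The dictionary and helper names are illustrative, the Redundant Memory option is left out because its multiplier is given only as X, and cost is assumed to scale with the same footprint.

```python
# Area / Power / Mass multipliers as quoted on the individual option slides.
FOOTPRINT = {
    "Unmitigated": 1.0,
    "SEC/DED": 1.25,
    "Reed-Solomon (block)": 1.25,
    "Parallel Reed-Solomon": 1.5,
    "TMR": 3.0,
    "ECC with Warm Spare": 1.5,
}


def extra_memory_percent(option: str) -> int:
    """Additional memory over the unmitigated baseline, in percent."""
    return round((FOOTPRINT[option] - 1.0) * 100)


def relative_to(option: str, reference: str = "TMR") -> float:
    """Footprint of `option` as a fraction of the reference option's footprint."""
    return FOOTPRINT[option] / FOOTPRINT[reference]


for name in FOOTPRINT:
    print(f"{name:24s} {FOOTPRINT[name]:.2f}x  +{extra_memory_percent(name)}% memory")
print("ECC with Warm Spare vs TMR:", relative_to("ECC with Warm Spare"))
```

With the slide values, ECC with Warm Spare comes out at 1.5x against 3x for TMR, a ratio of 0.5.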
Summary
Memory modules allow highest density/area
Mitigation is user’s choice depending upon design goals but must cover SEFI and SEU
ECC with Warm Spare is roughly equivalent to full TMR at half the power, mass, area, and cost
TID and SEE Response of an Advanced Samsung 4Gb NAND Flash Memory (NSREC07); T. R. Oldham, M. Friendlich, J. W. Howard, Jr., M. D. Berg, H. S. Kim, T. L. Irwin, and K. A. LaBel
TID and SEE Response of Advanced 4G NAND Flash Memories (NSREC08); T. R. Oldham, M. Suhail, M. R. Friendlich, M. A. Carts, R. L. Ladbury, H. S. Kim, M. D. Berg, C. Poivey, S. P. Buchner, A. B. Sanders, C. M. Seidleck, and K. A. LaBel
SEE and TID of Emerging Non-Volatile Memories; D. N. Nguyen and L. Z. Scheick, Jet Propulsion Laboratory, California Institute of Technology, http://parts.jpl.nasa.gov/docs/PID16621.pdf
A Case Study of Single Event Functional Interrupts (SEFIs) in COTS SDRAMs (NSREC08); Joe Benedetto and George Ott, Radiation Assured Devices