LaRC MAPLD 2005 / A208Ng
Radiation Tolerant Intelligent Memory Stack (RTIMS)
Tak-kwong Ng, Jeffrey Herath
Electronics Systems Branch
Systems Engineering Directorate
NASA Langley Research Center
[email protected]@nasa.gov
757-864-1097 (Tak)
757-864-1098 (Jeff)
Agenda
• What is it?
• Goals
• Components selection
• FPGA SEU mitigation
• XTMR tools
• Status
• Future work
• Points to ponder
What is it?
• Radiation tolerant
  – Use commercial-off-the-shelf (COTS) components
    • Reprogrammable FPGA
    • High performance
    • Lower cost
  – Pick parts with applicable mitigation techniques
    • Shielding, over-current protection, triple modular redundancy (TMR), FPGA configuration scrubbing
• Intelligent
  – Reprogrammable FPGA
    • SDRAM controller
    • Capacity to add custom logic
• Memory
  – Large capacity
    • SDRAM
• Stack
  – 3D vs 2D, board space saving
Goals
• Large memory capacity
  – 256 MB EDAC
• Single +3.3 V power supply
• Simple interface, LVTTL compatible
• Throughput
  – 32 MWord write
  – 16 MWord read
• Reprogram via the JTAG interface
• Spare FPGA gate capacity for user application
• Radiation characteristics
  – Total ionizing dose of 100 krad (Si) at 25 °C
  – SEU: best practice
  – SEL of 60 MeV-cm²/mg requirement
• Operating temperature: -40 °C to +85 °C
Components Selection (1/3)
• FPGA
  – Reprogrammable
  – Xilinx Virtex, Virtex-II
    • XQR2V1000
      – Total ionizing dose of 200 krad (Si) (data sheet)
      – SEL of 160 MeV-cm²/mg (data sheet)
      – Current limiters
    • Limited SEFI
      – POR, SelectMAP, JTAG
      – 1.5E-6 upsets/device/day (data sheet)
    • SOFT
      – Mitigation techniques: TMR, configuration scrubbing
  – XQ2V1000-4BG575
    • Military version for lower cost
      – SEL may not be as good as XQR2V1000
      – SEL of 124 MeV-cm²/mg
    • Capacity of 1 M gates
    • 328 signal I/Os
Components Selection (2/3)
• EEPROM
  – Xilinx XQR18V04
    • Total ionizing dose of 10 krad (Si) (data sheet)
      – 30 krad (Si) for read only (data sheet)
    • SEL of 120 MeV-cm²/mg (data sheet)
    • SEU of 120 MeV-cm²/mg (data sheet)
• SDRAM
  – Elpida EDS5108ABTA (512 Mb)
    • Total ionizing dose of 50 krad (Si)
    • SEL of 80 MeV-cm²/mg at 85 °C, 100 °C, 125 °C
    • SEU
      – Bit error rate of 6.96E-12 errors/bit-day
      – SEFI error rate of 1.3E-4 errors/device-day
• Linear regulator
  – Texas Instruments TPS75715 (1.5 V LDO regulator)
    • Total ionizing dose of 10 krad (Si)
    • SEL of 60 MeV-cm²/mg
Components Selection (3/3)
• Current limiters
  – Maxim-IC MAX893L (1.2 A), MAX891L (0.5 A)
    • Total ionizing dose SEL of 30 krad (Si)
• Power-on-reset circuit
  – Maxim-IC MAX803
    • Total ionizing dose of 20 krad (Si)
• Stacking technology
  – Provided by 3D Plus
Radiation Mitigation
• Total ionizing dose
  – Local shielding
  – Package shielding; thickness depends on the requirement
• SEL
  – Current-limiting device
• SEU
  – Memory contents
    • TMR, EDAC
  – FPGA SEU
    • Configuration scrubbing, TMR
• SEFI
  – Best effort to minimize the SEFI rate
  – Mitigate at a higher level
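As an illustration of the TMR entry above, the sketch below shows a bitwise two-of-three majority vote over three redundant copies of a word. It is a software model for intuition only; the function names are hypothetical and it is not the RTIMS voter logic.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_copies(a: int, b: int, c: int) -> list:
    """Return the indices (0, 1, 2) of copies that differ from the voted value."""
    voted = majority_vote(a, b, c)
    return [i for i, copy in enumerate((a, b, c)) if copy != voted]


# One copy has a single upset bit; the vote masks it and points at the bad copy.
good = 0b1010
upset = good ^ 0b1000
print(bin(majority_vote(good, good, upset)))    # voted word
print(disagreeing_copies(good, good, upset))    # which copy disagreed
```

A single upset in any one copy is masked. Two upsets on the same bit position in different copies are not, which is why scrubbing is paired with TMR.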
Block Diagram
FPGA SEU Mitigation (1/5)
• Input
  – Xilinx recommendation
    • Use 3 pins per signal, connected on the board
    • Bus signals: use one pin per signal, add EDAC, save pins
      – The sending side must generate EDAC check bits
    • Pins can be used up quickly
  – Implementation
    • Module interface
      – Use 3 pins per signal for address/controls
      – Use 1 pin per signal for Din
        • EDAC is optional
        • Single point failure rate increases without EDAC
FPGA SEU Mitigation (2/5)
• Output
  – Xilinx recommendation
    • Use 3 pins per signal, connected on the board
      – Not glitch-free
      – Signal integrity
    • Bus signals: use one pin per signal, add EDAC, save pins
      – The receiving side must also implement EDAC
    • Pins can be used up quickly
  – Implementation
    • Module interface
      – Use 3 pins per signal for controls
      – Use 1 pin per signal for Dout
        • EDAC is optional
        • Single point failure rate increases without EDAC
FPGA SEU Mitigation (3/5)
• Output
  – Implementation (continued)
    • SDRAM interface
      – Clock, address
        • 3 sets; equivalent signals are not connected together on the board
        • Each set drives two SDRAMs
      – Controls
        • 4 sets; equivalent signals are not connected together on the board
        • Two of the sets each drive two SDRAMs
        • The other two sets each drive one SDRAM
        • Allows SDRAMs to be switched between EDAC and TMR configurations
FPGA SEU Mitigation (4/5)
• Bi-directional
  – Xilinx recommendation
    • Use 1 pin per signal
    • Path from the voter to the pin becomes a possible single point of failure
  – Implementation
    • SDRAM interface
      – TMR-configured SDRAMs
        • 3 sets of data bus
      – EDAC-configured SDRAMs
        • Use 1 pin per signal
FPGA SEU Mitigation (5/5)
• Implication on data integrity of the SDRAM contents
  – EDAC-configured SDRAMs
    • 256 MB
    • Output drivers and input receivers are possible single points of failure
  – TMR-configured SDRAMs
    • 128 MB
    • No single point of failure
    • Background scrubbing of SDRAM contents
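The two capacities can be cross-checked with a little arithmetic. The sketch below assumes six 512 Mb SDRAM devices (inferred from "3 sets, each set drives two SDRAMs") and, for EDAC, a split of four data devices plus two check devices. Both figures are assumptions chosen because they reproduce the 256 MB and 128 MB numbers on the slide; the slides do not state the device count or the EDAC code width.

```python
DEVICE_MBIT = 512   # Elpida EDS5108ABTA density, from the parts list
NUM_DEVICES = 6     # ASSUMPTION: inferred from "3 sets, each set drives two SDRAMs"


def usable_mbytes(mode: str) -> int:
    """Usable capacity in MB if all devices are used in a single mode."""
    raw_mbytes = DEVICE_MBIT * NUM_DEVICES // 8   # 384 MB of raw storage
    if mode == "tmr":
        # Every word is stored three times.
        return raw_mbytes // 3
    if mode == "edac":
        # ASSUMPTION: 4 of the 6 devices hold data, 2 hold check bits.
        return raw_mbytes * 4 // 6
    raise ValueError("mode must be 'tmr' or 'edac'")


print(usable_mbytes("edac"), usable_mbytes("tmr"))
```

Under these assumptions, EDAC trades the single-point-failure exposure at the I/O pins for twice the usable capacity of TMR.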
XTMR Tool (1/4)
• Fairly fast
• Gates utilized
  – Average utilization cost of TMR is ~3.2x
  – RTIMS actual
    • 4.3x
    • Gates multiplier = 3 + 3 * (fraction of flops + fraction of I/Os)
      – It is closer to 3x for a design that is mostly gates
      – It is closer to 6x for a design that is mostly flops
      – RTIMS actual: 36% flops
    • Additional multiplier for designs with SRL16
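The gates multiplier rule of thumb is easy to evaluate directly. The sketch below plugs in the 36% flop fraction quoted for RTIMS; the function name is hypothetical, and the I/O fraction is left at zero because the slides do not give it.

```python
def tmr_gate_multiplier(flop_fraction: float, io_fraction: float = 0.0) -> float:
    """Gates multiplier = 3 + 3 * (fraction of flops + fraction of I/Os)."""
    return 3 + 3 * (flop_fraction + io_fraction)


print(tmr_gate_multiplier(0.0))    # mostly gates: lower bound of the rule
print(tmr_gate_multiplier(1.0))    # mostly flops: upper bound of the rule
print(tmr_gate_multiplier(0.36))   # RTIMS flop fraction, I/O fraction unknown
```

With flops alone the rule gives about 4.1x; the I/O term and the SRL16 multiplier presumably account for the rest of the reported 4.3x, although the slides do not break this down.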
XTMR Tool (2/4)
• Internal performance degradation
  – Average performance impact of TMR is ~10%
  – RTIMS actual
    • ~20%
    • 6 logic levels original
      – Add a voter, 7 levels
      – ~15% performance impact
    • Longer routing
      – 3.8x gates
      – ~5% performance impact
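The ~15% figure for the extra voter level is consistent with a simple model in which every logic level contributes the same delay, so the maximum clock rate scales as 1 / (number of levels). That equal-delay model is an assumption made here for illustration, not something stated on the slides.

```python
def clock_rate_loss(levels_before: int, levels_added: int = 1) -> float:
    """Fractional loss in maximum clock rate, assuming equal delay per logic level."""
    levels_after = levels_before + levels_added
    return 1 - levels_before / levels_after


# 6 original levels plus one voter level.
print(round(100 * clock_rate_loss(6), 1), "% slower")
```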
XTMR Tool (3/4)
• I/O performance degradation
• Input pin
  – TMR
    • Voters after the FF
    • Lock the FF in the IOB
  – No TMR on input pin
    • 3 FFs after the input receiver
    • Can’t lock the FF in the IOB
    • Performance penalty
    • RTIMS actual: increased from 1.8 ns to 3.6 ns
XTMR Tool (4/4)
• Output pin
  – Triplicated pin, tied together on the board
    • Add voter before the output driver
    • Glitch
    • Can’t lock the FF in the IOB
    • Performance penalty
    • Signal integrity
  – Pin not triplicated
    • Add voter before the output driver
    • Glitch
    • Can’t lock the FF in the IOB
    • Performance penalty
  – RTIMS actual: increased from 4.5 ns to 6.4 ns
Storage state
• Correct an SEU in the storage state before the next SEU makes it uncorrectable
• Memory content
  – Scrubbing
• Flop state
  – Basic Xilinx flop: FDCPE(PRE, D, CE, C, CLR, Q)
  – Inputs of the flop are corrected
  – Unless CE is active, the flop state is not corrected
  – 3 minority voters and 3 OR gates can be added to force CE when an error is detected
  – Expensive to apply this universally
  – For “almost” static flops, the following flop is used
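The CE-forcing idea can be modeled in a few lines. The sketch below simulates one clock edge for three redundant one-bit flops: each copy's clock enable is the normal CE ORed with a minority-voter output that is true when that copy disagrees with the other two. It assumes the D input still presents the value the flop is meant to hold (the "almost static" case); the function names are hypothetical and this is a behavioral model, not the RTIMS netlist.

```python
def majority(a: int, b: int, c: int) -> int:
    """2-of-3 majority of three bits."""
    return (a & b) | (a & c) | (b & c)


def clock_tmr_flops(q, d, ce, force_ce_on_error):
    """Return the next state of three redundant flops after one clock edge.

    q: current states of the three copies (list of 0/1)
    d: corrected (voted) data input, assumed equal to the intended state
    ce: normal clock enable
    force_ce_on_error: model the added minority voters and OR gates
    """
    voted = majority(*q)
    next_q = []
    for qi in q:
        minority = qi != voted                        # minority voter output
        ce_eff = ce or (force_ce_on_error and minority)  # OR gate into CE
        next_q.append(d if ce_eff else qi)
    return next_q


upset = [1, 0, 1]   # copy 1 was flipped by an SEU
print(clock_tmr_flops(upset, d=1, ce=False, force_ce_on_error=False))  # upset persists
print(clock_tmr_flops(upset, d=1, ce=False, force_ce_on_error=True))   # upset repaired
```

Without the forced CE, the upset copy keeps its bad value until CE next goes active, leaving a window in which a second upset defeats the vote.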
A few other things (1/4)
• Digital Clock Manager (DCM)
  – Use 3 DCMs for each DCM in the original design
  – DCM is a unit
    • SEU on a flop in the DCM
      – Corrected by configuration scrubbing
      – Reset only
  – 3 counters, each clocked by a DCM
  – When one counter value differs from the other two, we know which DCM is operating differently from the others
  – Each counter is TMR so that an SEU on the counter, other than on the clock path, will not produce an error
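The counter comparison can be sketched as below. The function takes the three counter values sampled at the same instant and returns the index of the DCM whose counter disagrees. Exact equality of healthy counters is an assumption of this model (it ignores sampling skew between clock domains), and the function name is hypothetical.

```python
def find_faulty_dcm(counts):
    """Return the index (0, 1, 2) of the odd counter out, or None if all agree.

    counts: three counter values, one per DCM, sampled together.
    Raises ValueError if no two counters agree, since no majority exists.
    """
    a, b, c = counts
    if a == b == c:
        return None
    if a == b:
        return 2
    if a == c:
        return 1
    if b == c:
        return 0
    raise ValueError("no two counters agree; cannot identify a single faulty DCM")


print(find_faulty_dcm((1000, 1000, 1000)))  # all DCMs healthy
print(find_faulty_dcm((1000, 1000, 973)))   # third DCM has drifted or stopped
```

The returned index tells the recovery logic which DCM to reset.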
A few other things (2/4)
• Configuration scrubbing
  – Similar to Virtex
  – Virtex-II
    • Whole configuration is loaded with one type 2 command
    • The order of configuration loading is
      – GCLK, CLB and IOB, memory content, and memory control
  – Script to split the loading into three type 2 commands
    • GCLK, CLB, IOB
    • Memory control
    • Memory content
  – On power-up, the whole configuration is loaded
  – On scrubbing, only GCLK, CLB, IOB, and memory control are loaded
A few other things (3/4)
• Configuration scrubbing
  – Scrubber logic is TMR and is part of the FPGA code
  – Master SelectMAP is used for configuration, with the configuration clock continuing to run after the initial load
  – Scrubber logic is clocked by the configuration clock
    • The generation of the configuration clock becomes a possible single point of failure
    • Can switch to Slave SelectMAP and add an external oscillator
A few other things (4/4)
• SelectMAP interface SEFI detection
  – Implement a 16x1 distributed memory as an SRL16 with an initial value of all zeros
  – Instruct XTMR not to convert it to registers
  – Write a signature into this memory prior to configuration scrubbing
  – This memory should be cleared by the reloading of the CLB during configuration scrubbing
  – Read the memory content after configuration scrubbing
  – A non-zero content indicates scrubbing failure
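The detection sequence above can be walked through with a toy model. `FakeFpga` below is a hypothetical stand-in that only mimics the one behavior that matters here (a successful scrub reloads the CLB and returns the SRL16 to its all-zero initial value); it is not a real device or vendor API, and the signature value is arbitrary.

```python
SIGNATURE = 0xA5C3  # hypothetical non-zero 16-bit pattern


class FakeFpga:
    """Toy stand-in for the FPGA; not a real device API."""

    def __init__(self, scrub_works: bool):
        self.scrub_works = scrub_works
        self.srl16 = 0  # initial value of all zeros

    def write_srl16(self, value: int) -> None:
        self.srl16 = value & 0xFFFF

    def scrub(self) -> None:
        # A working scrub reloads the CLB frames, restoring the initial zeros.
        # A SelectMAP SEFI is modeled as a scrub that silently does nothing.
        if self.scrub_works:
            self.srl16 = 0

    def read_srl16(self) -> int:
        return self.srl16


def scrub_and_check(fpga: FakeFpga) -> bool:
    """Return True if the scrub pass demonstrably reloaded the CLB."""
    fpga.write_srl16(SIGNATURE)    # 1. write signature before scrubbing
    fpga.scrub()                   # 2. run configuration scrubbing
    return fpga.read_srl16() == 0  # 3. non-zero content means scrub failed


print(scrub_and_check(FakeFpga(scrub_works=True)))
print(scrub_and_check(FakeFpga(scrub_works=False)))
```

The check works only because the SRL16 is excluded from XTMR's register conversion, so its contents live in configuration memory that the scrub overwrites.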
Stack
[Figure: stack diagram; labels: SDRAM, MISC]
Status
• 20 modules
  – Related paper: "Radiation Tolerant and Intelligent Memory for Space" (P1025)
  – 144-lead QFP package
  – Dimensions: 42.5 mm x 42.5 mm x 13.0 mm
  – Mass: 70 g with radiation shielding
  – Power: ~4.0 W peak
  – To be verified / analyzed
    • Total ionizing dose > 100 krad (Si)
    • SEU in GEO less than 1.5E-6 per day
    • Latch-up immune to 60 MeV-cm²/mg
Future Work
• VHDL and place & route
  – Work in progress
    • Minimize SEFI
    • Error detection and recording
    • Error recovery
• What is the SEFI rate of RTIMS?
• Environmental testing
  – Life test (accelerated component life testing)
  – 100 krad (Si) TID radiation tests
  – SEL and SEU radiation tests
  – Vacuum and temperature tests
  – Mechanical stress tests
  – Electrostatic discharge tests
Points to ponder
• XTMR
  – Not a turnkey process
    • Scrub memory content
    • Almost static flop
    • DCM failure detection and reset
    • Glitch-free output is no longer glitch-free
    • Signal integrity with dotted output
  – IO
    • 3 pins for one signal, EDAC
    • Tie the triplicated IO together vs carry three signals on the board with the voter implemented on the receiving side
  – One size does not fit all