LaRC MAPLD 2005 / A208 1 Ng Radiation Tolerant Intelligent Memory Stack (RTIMS) Tak-kwong Ng, Jeffrey Herath Electronics Systems Branch Systems Engineering Directorate NASA Langley Research Center [email protected] [email protected] 757-864-1097 (Tak) 757-864-1098 (Jeff)


Radiation Tolerant Intelligent Memory Stack (RTIMS)

Tak-kwong Ng, Jeffrey Herath
Electronics Systems Branch
Systems Engineering Directorate
NASA Langley Research Center

[email protected] [email protected]

757-864-1097 (Tak)
757-864-1098 (Jeff)


Agenda

• What is it?
• Goals
• Components selection
• FPGA SEU mitigation
• XTMR tools
• Status
• Future work
• Points to ponder


What is it?

• Radiation tolerant
  – Use commercial-off-the-shelf (COTS) components
    • Reprogrammable FPGA
    • High performance
    • Lower cost
  – Pick parts with applicable mitigation techniques
    • Shielding, over-current protection, triple modular redundancy (TMR), FPGA configuration scrubbing
• Intelligent
  – Reprogrammable FPGA
    • SDRAM controller
    • Capacity to add custom logic
• Memory
  – Large capacity
    • SDRAM
• Stack
  – 3D vs 2D, board space saving


Goals

• Large memory capacity
  – 256 MB EDAC
• Single +3.3 V power supply
• Simple interface, LVTTL compatible
• Throughput
  – 32 MWord write
  – 16 MWord read
• Reprogram via the JTAG interface
• Spare FPGA gate capacity for user application
• Radiation characteristics
  – Total ionizing dose of 100 krad (Si) at 25 °C
  – SEU: best practice
  – SEL of 60 MeV-cm²/mg requirement
• Operating temperature: -40 °C / +85 °C


Components Selection (1/3)

• FPGA
  – Reprogrammable
  – Xilinx Virtex, Virtex-II
    • XQR2V1000
      – Total ionizing dose of 200 krad (Si) (data sheet)
      – SEL of 160 MeV-cm²/mg (data sheet)
      – Current limiters
    • Limited SEFI
      – POR, SelectMAP, JTAG
      – 1.5E-6 upsets/device/day (data sheet)
    • SOFT
      – Mitigation techniques: TMR, configuration scrubbing
  – XQ2V1000-4BG575
    • Military version for lower cost
      – SEL may not be as good as XQR2V1000
      – SEL of 124 MeV-cm²/mg
    • Capacity of 1 M gates
    • 328 signal I/Os


Components Selection (2/3)

• EEPROM
  – Xilinx XQR18V04
    • Total ionizing dose of 10 krad (Si) (data sheet)
      – 30 krad (Si) for read only (data sheet)
    • SEL of 120 MeV-cm²/mg (data sheet)
    • SEU of 120 MeV-cm²/mg (data sheet)
• SDRAM
  – Elpida EDS5108ABTA (512 Mb)
    • Total ionizing dose of 50 krad (Si)
    • SEL of 80 MeV-cm²/mg at 85 °C, 100 °C, 125 °C
    • SEU
      – Bit error rate of 6.96E-12 errors/bit-day
      – SEFI error rate of 1.3E-4 errors/device-day
• Linear regulator
  – Texas Instruments TPS75715 (1.5 V LDO regulator)
    • Total ionizing dose of 10 krad (Si)
    • SEL of 60 MeV-cm²/mg
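
To put the SDRAM bit error rate in per-device terms, the arithmetic below scales the quoted rate by the device capacity. It is only a sketch: it assumes 512 Mb means 512 x 2^20 bits and that upsets are independent, and the variable names are illustrative rather than taken from the presentation.

```python
# Rates quoted on the slide for the 512 Mb SDRAM.
BIT_ERROR_RATE = 6.96e-12        # errors per bit-day
SEFI_RATE = 1.3e-4               # errors per device-day

# Assumption: 512 Mb is taken as 512 * 2**20 bits.
BITS_PER_DEVICE = 512 * 2**20

# Expected bit upsets in one device per day (before EDAC/TMR correction).
upsets_per_device_day = BIT_ERROR_RATE * BITS_PER_DEVICE

# Mean time between bit upsets in one device, in days.
days_between_upsets = 1.0 / upsets_per_device_day

print(f"bit upsets per device-day: {upsets_per_device_day:.3e}")
print(f"mean days between upsets:  {days_between_upsets:.0f}")
print(f"SEFI per device-day:       {SEFI_RATE:.1e}")
```

Under these assumptions the raw bit upset rate per device works out to roughly 3.7E-3 per day, well above the quoted SEFI rate, which is consistent with the deck's emphasis on EDAC, TMR and scrubbing for memory contents.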


Components Selection (3/3)

• Current limiters
  – Maxim-IC MAX893L (1.2 A), MAX891L (0.5 A)
    • Total ionizing dose SEL of 30 krad (Si)
• Power-On-Reset circuit
  – Maxim-IC MAX803
    • Total ionizing dose of 20 krad (Si)
• Stacking technology
  – Provided by 3D Plus


Radiation Mitigation

• Total ionizing dose
  – Local shielding
  – Package shielding, thickness depends on requirement
• SEL
  – Current limiting device
• SEU
  – Memory contents
    • TMR, EDAC
  – FPGA SEU
    • Configuration scrubbing, TMR
• SEFI
  – Best effort to minimize the SEFI rate
  – Mitigate at higher level


Block Diagram


FPGA SEU Mitigation (1/5)

• Input
  – Xilinx recommendation
    • Use 3 pins per signal, connected on the board
    • Bus signals: use one pin per signal, add EDAC, save pins
      – The sending side must generate EDAC check bits
    • Pins can be used up quickly
  – Implementation
    • Module interface
      – Use 3 pins per signal for address/controls
      – Use 1 pin per signal for Din
    • EDAC is optional
    • Single point failure rate increases without EDAC


FPGA SEU Mitigation (2/5)

• Output
  – Xilinx recommendation
    • Use 3 pins per signal, connected on the board
      – Not glitch-free
      – Signal integrity
    • Bus signals: use one pin per signal, add EDAC, save pins
      – The receiving side must also implement EDAC
    • Pins can be used up quickly
  – Implementation
    • Module interface
      – Use 3 pins per signal for controls
      – Use 1 pin per signal for Dout
    • EDAC is optional
    • Single point failure rate increases without EDAC


FPGA SEU Mitigation (3/5)

• Output
  – Implementation …
    • SDRAM interface
      – Clock, Address
        • 3 sets, equivalent signals are not connected together on the board
        • Each set drives two SDRAMs
      – Controls
        • 4 sets, equivalent signals are not connected together on the board
        • Two of the sets each drive two SDRAMs
        • The other two sets each drive one SDRAM
        • Switch EDAC/TMR configured SDRAM


FPGA SEU Mitigation (4/5)

• Bi-directional
  – Xilinx recommendation
    • Use 1 pin per signal
    • Path from voter to the pin becomes a possible single point failure
  – Implementation
    • SDRAM interface
      – TMR configured SDRAMs
        • 3 sets of data bus
      – EDAC configured SDRAMs
        • Use 1 pin per signal


FPGA SEU Mitigation (5/5)

• Implication on data integrity of the SDRAM contents
  – EDAC configured SDRAMs
    • 256 MB
    • Output drivers and input receivers are possible single points of failure
  – TMR configured SDRAMs
    • 128 MB
    • No single point failure
    • Background scrubbing of SDRAM contents
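
As an illustration of background content scrubbing in the TMR configuration, the sketch below votes the three copies of each word bit by bit and writes the voted value back to any copy that disagrees. It is a behavioral model in Python, not the RTIMS VHDL; the bank layout, word values and function names are assumptions made for the example.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three words."""
    return (a & b) | (a & c) | (b & c)


def scrub_tmr(banks: list) -> int:
    """Walk every address, vote the three copies, repair any copy that differs.

    `banks` is a list of three equal-length lists of integer words.
    Returns the number of words that had to be rewritten.
    """
    corrected = 0
    for addr in range(len(banks[0])):
        voted = vote(banks[0][addr], banks[1][addr], banks[2][addr])
        for bank in banks:
            if bank[addr] != voted:
                bank[addr] = voted      # write-back clears the accumulated upset
                corrected += 1
    return corrected


# Three identical copies, then one simulated upset in two different copies.
banks = [[0x12345678, 0xDEADBEEF] for _ in range(3)]
banks[1][0] ^= 1 << 7     # upset in copy 1, address 0
banks[2][1] ^= 1 << 30    # upset in copy 2, address 1

fixed = scrub_tmr(banks)
print(fixed, [hex(w) for w in banks[0]])
```

The point of scrubbing in the background is that each address is repaired before a second upset can land in another copy of the same bit, at which point the vote would no longer recover the data.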


XTMR Tool (1/4)

• Fairly fast
• Gates utilized
  – Average utilization cost of TMR is ~3.2x
  – RTIMS actual
    • 4.3x
    • Gates multiplier = 3 + 3 * (fraction of flops + fraction of I/Os)
      – It is closer to 3x for a design that is mostly gates
      – It is closer to 6x for a design that is mostly flops
      – RTIMS actual: 36% flops
    • Additional multiplier for designs with SRL16
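
The gates-multiplier rule of thumb above can be evaluated directly. The helper below is a minimal sketch of that formula; the function name is made up for the example, and the I/O fraction defaults to zero because the slide does not state it for RTIMS.

```python
def tmr_gate_multiplier(flop_fraction: float, io_fraction: float = 0.0) -> float:
    """Gates multiplier = 3 + 3 * (fraction of flops + fraction of I/Os)."""
    if not (0.0 <= flop_fraction <= 1.0 and 0.0 <= io_fraction <= 1.0):
        raise ValueError("fractions must be between 0 and 1")
    return 3.0 + 3.0 * (flop_fraction + io_fraction)


mostly_gates = tmr_gate_multiplier(0.0)       # lower bound of the rule
mostly_flops = tmr_gate_multiplier(1.0)       # upper bound quoted on the slide
rtims_flops_only = tmr_gate_multiplier(0.36)  # RTIMS flop fraction, I/O term omitted

print(mostly_gates, mostly_flops, round(rtims_flops_only, 2))
```

With only the 36% flop fraction the formula gives about 4.08x; the remaining gap to the reported 4.3x would come from the I/O term and the SRL16 multiplier, neither of which is quantified on the slide.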


XTMR Tool (2/4)

• Internal performance degradation
  – Average performance impact of TMR is ~10%
  – RTIMS actual
    • ~20%
    • 6 logic levels original
      – Add a voter, 7 levels
      – ~15% performance impact
    • Longer routing
      – 3.8x gates
      – ~5% performance impact
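
The ~15% figure for the added voter level is roughly what a simple model predicts. The sketch below assumes the critical-path delay scales with the number of logic levels and that every level, including the voter, costs the same; that uniform-delay assumption is ours, not a claim from the presentation.

```python
def clock_rate_loss(levels_before: int, levels_added: int = 1) -> float:
    """Fractional drop in maximum clock rate when logic levels are added.

    Assumes delay is proportional to the number of logic levels.
    """
    levels_after = levels_before + levels_added
    return 1.0 - levels_before / levels_after


voter_loss = clock_rate_loss(6)   # 6 levels originally, 7 with the voter
print(f"{voter_loss:.1%}")
```

Going from 6 to 7 levels costs 1 - 6/7 of the clock rate under this model, in line with the ~15% on the slide; the further ~5% is attributed there to longer routing.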


XTMR Tool (3/4)

• I/O performance degradation
• Input pin
  – TMR
    • Voters after the FF
    • Lock the FF in the IOB
  – No TMR on input pin
    • 3 FFs after the input receiver
    • Can't lock the FF in the IOB
    • Performance penalty
    • RTIMS actual: increased from 1.8 ns to 3.6 ns


XTMR Tool (4/4)

• Output pin
  – Triplicate pin, tied together on board
    • Add voter before the output driver
    • Glitch
    • Can't lock the FF in the IOB
    • Performance penalty
    • Signal integrity
  – Not triplicating pin
    • Add voter before the output driver
    • Glitch
    • Can't lock the FF in the IOB
    • Performance penalty
  – RTIMS actual: increased from 4.5 ns to 6.4 ns


Storage state

• Correct an SEU on storage state before the next SEU makes it uncorrectable
• Memory content
  – Scrubbing
• Flop state
  – Basic Xilinx flop: FDCPE(PRE, D, CE, C, CLR, Q)
  – Inputs of the flop are corrected
  – Unless CE is active, the flop state is not corrected
  – 3 minority voters and 3 OR gates can be added to force a CE when an error is detected
  – Expensive to apply this universally
  – For an "almost" static flop, the following flop is used
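
The minority-voter idea can be modeled behaviorally. The Python sketch below represents one triplicated flop: each copy gets a minority voter that flags disagreement with the other two, and that flag is ORed into the copy's clock enable. It assumes the (already voted) D input carries the value the flop should hold when the forced CE fires; the function names and that assumption are illustrative, not taken from the RTIMS design.

```python
def majority(a: int, b: int, c: int) -> int:
    """2-of-3 majority of three single bits."""
    return (a & b) | (a & c) | (b & c)


def minority(own: int, other1: int, other2: int) -> int:
    """1 when `own` disagrees with the majority of the three copies."""
    return own ^ majority(own, other1, other2)


def clock_tmr_flop(q: list, d: int, ce: int, force_ce_on_error: bool) -> list:
    """Advance the three flop copies by one clock edge.

    q  -- current state of the three copies
    d  -- voted data input (assumed correct)
    ce -- clock enable requested by the design
    """
    next_q = []
    for i in range(3):
        err = minority(q[i], q[(i + 1) % 3], q[(i + 2) % 3])
        ce_i = ce | (err if force_ce_on_error else 0)   # the added OR gate
        next_q.append(d if ce_i else q[i])
    return next_q


upset_state = [1, 0, 1]   # copy 1 has taken an SEU while CE is inactive

held = clock_tmr_flop(upset_state, d=1, ce=0, force_ce_on_error=False)
repaired = clock_tmr_flop(upset_state, d=1, ce=0, force_ce_on_error=True)
print(held, repaired)
```

Without the forced CE the upset copy keeps its wrong value until the design next asserts CE, which for a nearly static register may be never; with it, only the disagreeing copy is reloaded on the next clock.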


A few other things (1/4)

• Digital Clock Manager
  – Use 3 DCMs for each DCM that is in the original design
  – DCM is a unit
    • SEU on a flop in the DCM
      – Corrected by configuration scrubbing
      – Reset only
  – 3 counters, each counter is clocked by a DCM
  – When one of the counter values differs from the other two, we know which DCM is operating differently from the others
  – Each counter is TMR so that an SEU on the counter, other than on the clock path, will not produce an error
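
The counter comparison described above amounts to an odd-one-out check. The sketch below shows that check in Python; it assumes the three counters are sampled at the same instant and should match exactly, so any tolerance needed for skew between DCM outputs is left out, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def find_odd_dcm(counts: Sequence[int]) -> Optional[int]:
    """Return the index of the DCM whose counter disagrees with the other two.

    Returns None when all three counters agree.
    Raises ValueError when no two counters agree, since the faulty DCM
    cannot then be identified by comparison alone.
    """
    a, b, c = counts
    if a == b == c:
        return None
    if a == b:
        return 2
    if a == c:
        return 1
    if b == c:
        return 0
    raise ValueError("no two counters agree")


healthy = find_odd_dcm((1000, 1000, 1000))
faulty = find_odd_dcm((1000, 1000, 973))   # third DCM has dropped clock edges
print(healthy, faulty)
```

Once the odd DCM is identified it can be reset on its own, which matches the "reset only" recovery noted on the slide.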


A few other things (2/4)

• Configuration scrubbing
  – Similar to Virtex
  – Virtex-II
    • Whole configuration is loaded with one type 2 command
    • The order of configuration loading is
      – GCLK, CLB and IOB, Memory Content, and Memory Control
  – Script to split the loading into three type 2 commands
    • GCLK, CLB, IOB
    • Memory control
    • Memory content
  – On power up, the whole configuration is loaded
  – On scrubbing, only GCLK, CLB, IOB, and memory control are loaded
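
The split into three loads makes the power-up and scrubbing cases differ only in which segments are sent. The sketch below captures that selection; the segment labels follow the slide, while the list order and the function name are assumptions for illustration, and no actual bitstream handling is modeled.

```python
# The three separately loadable pieces produced by the splitting script.
SEGMENTS = ["GCLK, CLB, IOB", "Memory control", "Memory content"]


def segments_to_load(power_up: bool) -> list:
    """Pick the configuration segments to send to the FPGA.

    Power-up loads everything; a scrub pass skips the memory content
    segment and reloads only the rest.
    """
    if power_up:
        return list(SEGMENTS)
    return [s for s in SEGMENTS if s != "Memory content"]


initial_load = segments_to_load(power_up=True)
scrub_pass = segments_to_load(power_up=False)
print(initial_load)
print(scrub_pass)
```

Leaving the memory content segment out of the scrub pass presumably avoids overwriting live block-memory data with its initial values.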


A few other things (3/4)

• Configuration scrubbing
  – Scrubber logic is TMR and is part of the FPGA code
  – Master SelectMAP is used for configuration, and the configuration clock continues to run after the initial load
  – Scrubber logic is clocked by the configuration clock
    • The generation of the configuration clock becomes a possible single point failure
    • Can switch to Slave SelectMAP and add an external oscillator


A few other things (4/4)

• SelectMAP interface SEFI detection
  – Implement a 16x1 distributed memory as an SRL16 with an initial value of all zeros
  – Instruct XTMR not to convert it to registers
  – Write a signature into this memory prior to configuration scrubbing
  – This memory should be cleared by the reloading of the CLB during configuration scrubbing
  – Read the memory content after configuration scrubbing
  – A non-zero content indicates scrubbing failure
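
The detection sequence above can be walked through with a small behavioral model. In the Python sketch below the 16x1 memory is a 16-bit integer, and a scrub pass is a callable that either restores the all-zero initial value or, in the SEFI case, does nothing. The signature value, class and function names are hypothetical; the model only illustrates the check, not the SelectMAP protocol.

```python
SIGNATURE = 0xA5A5   # hypothetical pattern; any non-zero 16-bit value works


class CanaryMemory:
    """16x1 distributed memory whose configured initial value is all zeros."""

    def __init__(self) -> None:
        self.bits = 0

    def write(self, value: int) -> None:
        self.bits = value & 0xFFFF

    def read(self) -> int:
        return self.bits


def scrub_and_check(canary: CanaryMemory, scrub) -> bool:
    """Return True when the scrub pass demonstrably rewrote the CLB contents."""
    canary.write(SIGNATURE)      # mark the memory before scrubbing
    scrub(canary)                # configuration scrubbing pass
    return canary.read() == 0    # non-zero means the reload never happened


def working_scrub(canary: CanaryMemory) -> None:
    canary.bits = 0              # CLB reload restores the initial value


def stuck_scrub(canary: CanaryMemory) -> None:
    pass                         # SelectMAP SEFI: nothing is written


ok = scrub_and_check(CanaryMemory(), working_scrub)
failed = scrub_and_check(CanaryMemory(), stuck_scrub)
print(ok, failed)
```

Keeping the memory as an SRL16 rather than letting the tool convert it to registers matters here because the check relies on its contents living in configuration memory that the scrub pass overwrites.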


Stack

SDRAM MISC


Status

• 20 modules
  – Related paper: "Radiation Tolerant and Intelligent Memory for Space" (P1025)
  – 144-lead QFP package
  – Dimensions: 42.5 mm x 42.5 mm x 13.0 mm
  – Mass: 70 g with radiation shielding
  – Power: ~4.0 W peak
  – To be verified / analyzed
    • Total ionizing dose > 100 krad (Si)
    • SEU in GEO less than 1.5E-6 per day
    • Latch-up immune to 60 MeV-cm²/mg


Future Work

• VHDL and Place & Route
  – Work in progress
    • Minimize SEFI
    • Error detection and recording
    • Error recovery
• What is the SEFI rate of RTIMS?
• Environment testing
  – Life test (accelerated component life testing)
  – 100 krad (Si) TID radiation tests
  – SEL and SEU radiation tests
  – Vacuum and temperature tests
  – Mechanical stress tests
  – Electrostatic discharge tests


Points to ponder

• XTMR
  – Not a turnkey process
    • Scrub memory content
    • Almost static flop
    • DCM failure detection and reset
    • Glitch-free output is no longer glitch-free
    • Signal integrity with dotted output
  – I/O
    • 3 pins for one signal, EDAC
    • Tie the triplicated I/O together vs. carry three signals on the board with the voter implemented on the receiving side
  – One size does not fit all