Electronic Parts Engineering
Ramin Roosta, Jet Propulsion Laboratory
Office 514
Xilinx SRAM Based FPGA Testing, Testability, and Reliability Issues
New Electronic Technologies and Insertion into Flight Programs Workshop
January 30 - February 1, 2007 at NASA/GSFC in Greenbelt, MD
2-1-2007 Ramin Roosta
Table of Contents
● FPGA Testing
  - FPGA Test Goals
  - FPGA Testing Phases
  - Why Are FPGA Testing and Testability Analysis Difficult?
  - FPGA Testing Approach Requirements
● Virtex Product Test Flow
● Application-Independent Testing
  - Problems of Application-Independent Testing
● Application-Dependent Testing
  - Interconnect Testing
  - Configurable Logic Block (CLB) Testing
  - The Fault Models in FPGA Testing
  - Related Works
● Deep Submicron Process and Its Effects on Testing
● FPGA Power Reduction (Virtex-4)
● FPGA Reliability Analysis/Concerns
● Conclusion
FPGA Test Goals
● Fully verify all parameters and the functionality of all features and resources to ensure full compliance with the data sheet
● Product features are fully characterized across temperature and voltages, with key parameters measured to guarantee performance
FPGA Testing Phases
● Design verification phase: behavioral simulation, static timing analysis, post-layout functional and timing simulation, back-annotated testing, and prototype testing. This phase relies heavily on design automation tools (simulation, logic/physical synthesis, and place-and-route tools), which raises the question: how accurate are these tools?
● Production phase test: includes screening tests such as burn-in test, functional test, fault coverage analysis, internal speed test, at-speed test, external speed test (including verification of the setup/hold and delay characteristics of the IC), I/O level test, and finally analog parametric tests including gain, noise, delay, time constants, precision, and margins.
Why Are FPGA Testing and Testability Analysis Difficult?
● To Xilinx, the FPGA looks like an ASIC; to the customer, it is an FPGA. This distinction should be kept in mind when testing the device
● Today's FPGAs are practically systems on a chip (SoCs), so testing the chip thoroughly is a daunting task, especially without the benefit of design-for-test (DFT) structures
● The reprogrammability of Xilinx FPGAs should be exploited to build several different configuration images, each testing specific resources of the FPGA
FPGA Testing Approach Requirements
● Test methodology must be generic, uniform, and application-independent
● Test methodology must be scalable and independent of array size
● Test methodology must be reusable and lend itself to automation
● Test methodology must have measurable test quality metrics
Source: [1]
Xilinx Test Flow
FPGA Functional Test Descriptions
● I/O testing
  - Opens and shorts
  - Icc and leakage
  - I/O parametric
● Functional tests
  - CLB test
  - BRAM memory test
  - Configuration memory test
● Router-driven test methods
● Layout-driven metal test methods
● Speed tests
FPGA Architecture (Virtex-II Fabric)
FPGA Architecture (Virtex-II Fabric)
● Every instance of LUT, SelectRAM, flip-flop, TBUF, BRAM, DCM, global clock, carry logic, etc. is tested
Virtex-4 Architecture
Testing the Slice Using a Serial Shift Register
● Easy to understand and document
● Quick diagnostics of a failure
● Consistency across the array
● Few I/Os required for test environment
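The slide lists the benefits of a serial shift-register test but not its mechanics. Below is a minimal Python sketch, assuming the slice flip-flops are chained into one shift register whose final state can be read back; the functions `shift_chain` and `locate_fault` and the single stuck-at fault injection are hypothetical illustrations, not the Xilinx test program.

```python
from typing import Callable, List, Optional


def shift_chain(n_stages: int, pattern: List[int],
                stuck_stage: Optional[int] = None, stuck_value: int = 0) -> List[int]:
    """Clock `pattern` into an n-stage flip-flop chain and return the final
    state of every stage (what a readback would capture). The optional
    stuck_stage/stuck_value pair injects a hypothetical stuck-at fault."""
    state = [0] * n_stages
    for bit in pattern:
        # One clock edge: stage 0 takes the input bit, stage i takes stage i-1.
        state = [bit] + state[:-1]
        if stuck_stage is not None:
            state[stuck_stage] = stuck_value
    return state


def first_mismatch(observed: List[int], expected: List[int]) -> Optional[int]:
    for i, (o, e) in enumerate(zip(observed, expected)):
        if o != e:
            return i
    return None


def locate_fault(n_stages: int,
                 device: Callable[[List[int]], List[int]]) -> Optional[int]:
    """Return the index of the first faulty stage, or None if the chain passes.
    Stages upstream of a stuck stage are unaffected, and with two complementary
    patterns the stuck stage itself mismatches in at least one of them."""
    hits = []
    for start in (0, 1):  # alternating pattern and its inverse
        pattern = [(start + i) % 2 for i in range(n_stages)]
        expected = shift_chain(n_stages, pattern)
        idx = first_mismatch(device(pattern), expected)
        if idx is not None:
            hits.append(idx)
    return min(hits) if hits else None
```

For example, `locate_fault(8, lambda p: shift_chain(8, p, stuck_stage=3, stuck_value=1))` points at stage 3, which illustrates the "quick diagnostics" and "consistency across the array" points: the same chain and patterns work for any slice position.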
BlockRAM Memory
Testing Configuration Memory
● Readback: the process of reading back the contents of configuration memory
● Four readback test patterns for configuration memory:
  - All zeros, for stuck-at-1 faults
  - All ones, for stuck-at-0 faults
  - Checkerboard, for coupling (AND/OR) faults
  - Inverted checkerboard, for coupling (AND/OR) faults
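To show why each pattern targets the fault listed beside it, here is a minimal Python sketch that writes the four patterns into a simulated memory array and compares the readback. The array model and the `stuck_at` and `and_bridge` fault injectors are hypothetical; real readback goes through the device's configuration port.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]


def make_patterns(rows: int, cols: int) -> Dict[str, Grid]:
    """The four readback patterns from the slide."""
    return {
        "all_zeros": [[0] * cols for _ in range(rows)],
        "all_ones": [[1] * cols for _ in range(rows)],
        "checkerboard": [[(r + c) % 2 for c in range(cols)] for r in range(rows)],
        "inv_checkerboard": [[1 - (r + c) % 2 for c in range(cols)] for r in range(rows)],
    }


def readback_test(write_and_readback: Callable[[Grid], Grid],
                  rows: int, cols: int) -> Dict[str, List[Tuple[int, int]]]:
    """Write each pattern, read it back, and list the mismatching cells per pattern."""
    failures = {}
    for name, pattern in make_patterns(rows, cols).items():
        observed = write_and_readback(pattern)
        bad = [(r, c) for r in range(rows) for c in range(cols)
               if observed[r][c] != pattern[r][c]]
        if bad:
            failures[name] = bad
    return failures


# Hypothetical fault models for a simulated configuration memory.
def stuck_at(cell: Tuple[int, int], value: int):
    def device(pattern: Grid) -> Grid:
        out = [row[:] for row in pattern]
        out[cell[0]][cell[1]] = value
        return out
    return device


def and_bridge(a: Tuple[int, int], b: Tuple[int, int]):
    def device(pattern: Grid) -> Grid:
        out = [row[:] for row in pattern]
        shorted = pattern[a[0]][a[1]] & pattern[b[0]][b[1]]  # 0-dominant short
        out[a[0]][a[1]] = out[b[0]][b[1]] = shorted
        return out
    return device
```

A stuck-at-1 cell is caught by the all-zeros pattern (and by whichever checkerboard drives that cell to 0), while an AND bridge between neighbors survives the uniform patterns and is only exposed by the checkerboards, which put opposite values on adjacent cells.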
Memory Fault Models
● Address Fault (AF): caused by defects in the address lines and address decoder
● Stuck-at Fault (SAF): the logic value of a stuck-at memory cell is always 0 or 1
● Transition Fault (TF): a faulty cell or line with a rising (falling) transition fault fails to undergo a 0-1 (1-0) transition when written
● Stuck-Open Fault (SOF): the cell cannot be accessed, so a read returns the previously read value (typically caused by an open in the word line or the access transistors)
● Coupling Fault (CF): shorts and crosstalk between memory cells or lines; idempotent (forces a cell), inversion (flips a cell), or bridging (AND/OR)
● Passive Neighborhood Pattern Sensitive Fault (PNPSF): the contents of a memory cell cannot be changed due to a certain neighborhood pattern
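The slides do not name a specific memory test algorithm. As one illustration, March C- is a standard March test that detects SAFs, TFs, AFs, and many coupling faults from the list above. The sketch below runs it against a toy bit-wide memory; the `Memory` class and its `stuck`/`no_rise` fault-injection parameters are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


class Memory:
    """Toy bit-wide memory with hypothetical fault injection."""

    def __init__(self, size: int, stuck: Optional[Dict[int, int]] = None,
                 no_rise: Optional[Set[int]] = None):
        self.cells = [0] * size
        self.stuck = stuck or {}          # addr -> forced value (SAF)
        self.no_rise = no_rise or set()   # addrs that cannot go 0 -> 1 (TF)

    def write(self, addr: int, value: int) -> None:
        if addr in self.no_rise and self.cells[addr] == 0 and value == 1:
            return  # the rising transition fails
        self.cells[addr] = value

    def read(self, addr: int) -> int:
        return self.stuck.get(addr, self.cells[addr])


# March C-: each element is (address order, operations applied at each address).
MARCH_C_MINUS: List[Tuple[str, List[Tuple[str, int]]]] = [
    ("any", [("w", 0)]),
    ("up", [("r", 0), ("w", 1)]),
    ("up", [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("any", [("r", 0)]),
]


def run_march(mem: Memory, size: int,
              elements=MARCH_C_MINUS) -> List[Tuple[int, int]]:
    """Return (element index, address) for every read that mismatched."""
    errors = []
    for step, (order, ops) in enumerate(elements):
        addrs = range(size - 1, -1, -1) if order == "down" else range(size)
        for addr in addrs:
            for op, value in ops:
                if op == "w":
                    mem.write(addr, value)
                elif mem.read(addr) != value:
                    errors.append((step, addr))
    return errors
```

A stuck-at-0 cell at address 5 fails the two "read 1" elements, and a cell with a rising transition fault fails the same elements, because its 0-to-1 writes never take effect. The ascending and descending passes are what let March tests catch coupling faults between cells at different addresses.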
Router Driven Test Method
- Patterns are routed using the same software as customer designs (Xilinx PAR)
- The most favorable interconnects are routed first (not all interconnects are routed equally)
- Interconnect utilization is <3% for a single design, customer or test
- 99% interconnect coverage when compared to customer design utilization
Test pattern generation flow:
- Input the design to be routed into PAR
- Route the design using PAR
- Routing information enters a database (DB)
- PAR references the DB to route the next design
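The feedback loop above (route, record in the DB, steer the next design toward unused resources) can be sketched as a toy model in Python. Everything here is an assumption for illustration: `plan_test_designs` is hypothetical, resources are just names, and "routing" simply takes the next uncovered resources rather than modeling PAR.

```python
from typing import List, Set


def plan_test_designs(resources: List[str], per_design: int,
                      target_pct: int = 99) -> List[List[str]]:
    """Toy model of the router-driven flow: each new test design routes up to
    per_design resources, preferring ones the database has not covered yet."""
    if per_design <= 0:
        raise ValueError("per_design must be positive")
    resources = list(dict.fromkeys(resources))  # drop duplicates, keep order
    covered: Set[str] = set()  # stands in for the routing-information DB
    designs: List[List[str]] = []
    # Integer comparison avoids floating-point rounding at the threshold.
    while 100 * len(covered) < target_pct * len(resources):
        uncovered = [r for r in resources if r not in covered]
        routed = uncovered[:per_design]  # hypothetical stand-in for PAR's choice
        covered.update(routed)
        designs.append(routed)
    return designs
```

With 1000 resources and 3% utilization (30 per design), even perfect steering needs 33 designs to reach 99% and 34 to reach 100%. Real routes overlap and are constrained by the fabric, so this is a lower bound rather than a prediction.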
Configurable Logic Block Tile
Routing Phases
FPGA Routing Resources
● The nature and availability of routing resources ultimately dictate the interconnect scenarios within the FPGA
● Interconnect is a major impediment to performance and power consumption: wires consume power, threatening chip performance
Main routing components are:
  - Wire segments
  - Switching matrices
  - Low-skew (clock) distribution
Source: IEEE Spectrum, June 2006
FPGA Test Coverage Claims (Xilinx)
● Feature coverage
  - 100% of FPGA features are tested
  - Every instance of LUT RAM, flip-flop, carry logic, TBUF, BlockRAM, DCM, etc. is tested
● Interconnect coverage
  - Overall interconnect coverage is >99.7%
  - For customer designs, coverage is >99.9%
  - Interconnect SAF and TF coverage is achieved by pattern construction, a structured (BIST) method
● Today’s (Xilinx) test program contains 1800+ test configurations
● Zero customer returns related to missing interconnect coverage
● Coupling fault (CF) coverage is in development
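One plausible reading of why the customer-design figure (>99.9%) can exceed the overall figure (>99.7%) is that the few undetected resources are rarely used by real designs. The sketch below contrasts raw coverage with usage-weighted coverage; the function names and the usage weights are hypothetical, and this is not Xilinx's definition of either metric.

```python
from fractions import Fraction
from typing import Dict


def raw_coverage(detected: Dict[str, bool]) -> Fraction:
    """Fraction of interconnect resources whose faults the test set detects."""
    return Fraction(sum(detected.values()), len(detected))


def usage_weighted_coverage(detected: Dict[str, bool],
                            usage: Dict[str, int]) -> Fraction:
    """Same metric, but each resource is weighted by how often customer designs use it."""
    total = sum(usage[r] for r in detected)
    hit = sum(usage[r] for r, ok in detected.items() if ok)
    return Fraction(hit, total)
```

With four resources where only a rarely used one (`d`, weight 1 against 10 for the others) is undetected, raw coverage is 3/4 but usage-weighted coverage is 30/31.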
Layout Driven Metal Test (LDMT)
● Test metal lines based on the physical layout
● Attach test logic to each line
● Proven to be very effective at detecting metal shorts
The Fault Models in FPGA Testing
● Bridging fault: a short between a group of signals. The logic value of the shorted net may be:
  - 1-dominant (OR bridge)
  - 0-dominant (AND bridge)
  - Indeterminate
● Stuck-at fault: a signal line in the circuit is fixed at 0 or 1; single stuck-at faults are the most popular form (the classical fault model)
● Delay fault: a fault that causes the combinational delay of a circuit to exceed the clock period
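To make the bridging and stuck-at models concrete, here is a minimal Python sketch that enumerates which input values expose each fault on tiny examples (two shorted nets, and a 2-input AND gate). The helper names are hypothetical; delay faults need a timed model and are not covered here.

```python
from itertools import product
from typing import Callable, List, Tuple


def bridge(a: int, b: int, kind: str) -> Tuple[int, int]:
    """Values observed on two shorted nets driven to a and b."""
    if kind == "AND":    # 0-dominant bridge
        v = a & b
    elif kind == "OR":   # 1-dominant bridge
        v = a | b
    else:
        raise ValueError(f"unknown bridge kind: {kind}")
    return v, v


def bridge_tests(kind: str) -> List[Tuple[int, int]]:
    """Driven value pairs that expose the bridge (observed values differ from driven ones)."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if bridge(a, b, kind) != (a, b)]


def stuck_at_tests(gate: Callable[..., int], n_inputs: int,
                   line: int, value: int) -> List[Tuple[int, ...]]:
    """Input vectors for which a stuck-at fault on input `line` changes the gate output."""
    tests = []
    for vec in product((0, 1), repeat=n_inputs):
        faulty = list(vec)
        faulty[line] = value
        if gate(*vec) != gate(*faulty):
            tests.append(vec)
    return tests


def and2(a: int, b: int) -> int:
    return a & b
```

Both bridge types are exposed only when the shorted nets are driven to opposite values, and a stuck-at-0 on an AND input is exposed only by the all-1s vector. That second property is what the AND-configured interconnect test in the Multi-Configuration Strategy relies on.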
The Functional Defects
● Interconnect defect: modeled by bridging faults and/or stuck-at faults
● CLB defect: a faulty CLB can be detected through a functional test of the CLB
● IOB defect: information exchange with other components in the system may be impossible or unreliable
CLB: configurable logic block, IOB: input/output block
Problems of Application-Independent Testing
● Low efficiency in detecting timing-related faults: it is impossible to test even a small fraction of all possible interconnection patterns that may occur in user-defined configurations
● Decreased yield for FPGA vendors: some chips rejected as defective could still be used in some designs, because those designs do not use the defective resources
Application-Dependent (Application-Specific) Testing
● Only the resources used by a specific configuration (design) are tested
● Avoids the disadvantages of application-independent FPGA testing
● Saves test time
● Increases yield for FPGA vendors
● Xilinx uses this approach
“A Dynamic Platform for Reliability and Environmental Test of Re-programmable Xilinx Virtex-II 3000 FPGA”; Ramin Roosta, Ph.D. et al, Electronic Parts Engineering, NASA/JPL (Sponsored By NASA Electronic Parts and Packaging Program (NEPP))
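The yield argument can be made concrete with a small Python sketch: a chip rejected by application-independent testing is still acceptable for a given design if none of its defects fall in that design's used resources. The defect maps and resource names below are hypothetical.

```python
from typing import List, Set


def accepted_independent(chips: List[Set[str]]) -> int:
    """Application-independent test: any defect rejects the chip."""
    return sum(1 for defects in chips if not defects)


def accepted_dependent(chips: List[Set[str]], used: Set[str]) -> int:
    """Application-dependent test: reject only if a defect hits a used resource."""
    return sum(1 for defects in chips if defects.isdisjoint(used))
```

For four chips with defect sets `set()`, `{"clb_7"}`, `{"wire_3"}`, and `{"clb_1", "wire_9"}`, and a design that uses `{"clb_1", "clb_2", "wire_3"}`, the application-independent test accepts one chip while the application-dependent test accepts two.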
Multi-Configuration Strategy (MCS)
MCS uses three test configurations:
● Interconnect testing (2)
- All the LUTs in the used CLBs are reconfigured to implement logic AND or logic OR functions (all-1 or all-0 test vectors applied at the PIs)
- All the flip-flops in the application must be preset to 1 or 0
● CLB testing (1)
- The interconnect network is reprogrammed so that each used CLB is controlled by the primary inputs (PIs)
MCS: Multi-configuration strategy, CLB: configurable logic block, LUT: Look-Up Table
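The two interconnect configurations can be illustrated with a minimal Python sketch, assuming a purely combinational netlist (the flip-flop presetting is not modeled) in which each net is named after its driver. `simulate` and `mcs_detects` are hypothetical helpers, not the implementation from [1].

```python
from typing import Callable, Dict, List, Optional, Tuple

Netlist = Dict[str, List[str]]  # LUT name -> input nets, listed in topological order


def simulate(netlist: Netlist, pis: List[str], pos: List[str], pi_value: int,
             lut_func: Callable[[List[int]], int],
             stuck: Optional[Tuple[str, int]] = None) -> List[int]:
    """Drive every PI to pi_value, evaluate each LUT with lut_func, return PO values."""
    values: Dict[str, int] = {pi: pi_value for pi in pis}

    def read(net: str) -> int:
        # A stuck-at fault overrides whatever drives the net.
        if stuck is not None and net == stuck[0]:
            return stuck[1]
        return values[net]

    for lut, inputs in netlist.items():
        values[lut] = lut_func([read(n) for n in inputs])
    return [read(po) for po in pos]


def and_lut(bits: List[int]) -> int:
    return int(all(bits))


def or_lut(bits: List[int]) -> int:
    return int(any(bits))


def mcs_detects(netlist: Netlist, pis: List[str], pos: List[str],
                net: str, mode: str) -> bool:
    """Does the AND (or OR) configuration expose a stuck-at-0 (or stuck-at-1) on `net`?"""
    if mode == "AND":    # all-1s at the PIs exposes stuck-at-0
        pi_value, func, fault = 1, and_lut, 0
    elif mode == "OR":   # all-0s at the PIs exposes stuck-at-1
        pi_value, func, fault = 0, or_lut, 1
    else:
        raise ValueError(mode)
    good = simulate(netlist, pis, pos, pi_value, func)
    bad = simulate(netlist, pis, pos, pi_value, func, stuck=(net, fault))
    return good != bad
```

In the AND configuration every net carries 1, so a stuck-at-0 anywhere propagates to the outputs, but a stuck-at-1 is invisible. That is why a second (OR, all-0s) configuration is needed, matching the "(2)" interconnect configurations above.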
General Model of the Interconnect Test Configuration
[Figure: a combinational part built from AND-configured LUTs feeding clocked flip-flops (1D, C1); all-1s applied at the primary inputs (PIs) and observed at the primary outputs (POs)]
An Example of Interconnect Testing
[Figure: interconnects L1 to L9 joining AND-configured LUTs and a flip-flop (node n1), with every input driven to logic 1]
Original Application Configuration
[Figure: original application configuration with functions F1 to F9]
Modified Configuration
Deep Submicron Process and Its Effects on Testing
Xilinx Triple-Oxide Technology (90 nm)
● Law of physics: leakage current increases as channel length and gate oxide thickness decrease
● Two oxide thicknesses are commonly used:
  - Thin oxide in the fast core logic
  - Thick oxide in the versatile I/O
● Virtex-4 adds a third, medium-thickness oxide to reduce leakage current without compromising performance
Nanometer-Scale CMOS Technology Challenges
● Modeling, simulation, and verification of system components
● Accurate prediction of timing and power dissipation
● Design robustness and fault tolerance in the presence of highly unpredictable device behavior
● Physical design issues such as floorplanning and routing give rise to challenges in system timing and signal integrity
Nanometer-Scale CMOS Technology Issues
● Design size and complexity
● Timing based on signal integrity and IR drop
● IR drop
● Crosstalk and inductance
● Electromigration
● Digital/analog integration
● Power consumption
● System signal transmission
● Manufacturing rules
● Yield optimization
Deep Submicron Process and Its Effects on Testing
● Increased on-chip variability
● Decreased reliability
● Leakage current, power consumption
● Loss of operating margin
● Junction temperature and thermal issues (thermal runaway)
● Signal integrity issues (caused by faster I/O) at chip/board level
● Accuracy of design entry and power prediction tools
Static Power Variation & Saving
● At the 90 nm process node, static power becomes the dominant power factor (when the I/Os draw minimal power)
  - Some FPGAs offer a low-power mode that disables the I/Os, putting them into a sleep mode that further reduces static power
● Static power from leakage increases exponentially with temperature
  - Proportional to voltage (0.3 VCCINT/1.2)
  - Increases exponentially due to source-to-drain leakage
  - At VCCINT = 1.26 V, static power is ~20% higher than at 1.2 V (try to keep VCCINT close to 1.2 V)
  - Keep junction temperature as low as possible
● Static power scales linearly with part size
  - Use the smallest part that fits to reduce leakage (LX60 has 40% less leakage power than LX100)
● Static power increases with process variation [VT and gate length (2.5x)]
  - Look at both worst-case and typical values at a given temperature
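The size and voltage rules of thumb above can be combined into a rough relative estimate. This is a minimal Python sketch under loudly stated assumptions: size scaling is linear (as the slide states), and the voltage dependence is an exponential curve fitted through the slide's single data point (+20% from 1.2 V to 1.26 V). Temperature and process corner are deliberately left out, and `relative_static_power` is a hypothetical helper, not a Xilinx power model.

```python
def relative_static_power(size_ratio: float, vccint: float,
                          v_ref: float = 1.2, v_step: float = 0.06,
                          gain_per_step: float = 1.20) -> float:
    """Static power relative to a reference part running at v_ref.

    size_ratio: linear scaling with part size (e.g. 60/100 for LX60 vs. LX100).
    Voltage: ASSUMED exponential fit through the slide's one data point
    (+20% per 60 mV above 1.2 V); not a device model.
    Temperature and process variation are not modeled."""
    return size_ratio * gain_per_step ** ((vccint - v_ref) / v_step)
```

Under this fit, an LX60 at 1.2 V draws 0.6x the static power of an LX100 at 1.2 V, and dropping from 1.26 V to 1.2 V cuts static power by about 17% (1 - 1/1.2), consistent with the "better than 10%" figure on the next slide.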
Dynamic Power Variation & Saving
● Dynamic power consumption (and performance) is very sensitive to switched capacitance (mainly routing capacitance in Xilinx FPGAs):
  $P_{\mathrm{dyn}} = N \cdot C \cdot V^2 \cdot f$
  where N = number of nodes switching, C = capacitive load, V = voltage swing, f = switching rate
  - Dynamic power varies linearly with frequency
● Tighten VCCINT and run at the center of its range, or down to 5% below center (running at 1.2 V instead of 1.26 V reduces leakage by more than 10%)
● Run non-critical functions with a low-speed clock (rather than an arbitrary high-speed clock present in the design)
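The dynamic power formula above is easy to sanity-check numerically. A minimal Python sketch follows; the example values (1000 switching nodes, 10 fF per node, 1.2 V, 100 MHz) are hypothetical and chosen only to show the scaling.

```python
def dynamic_power(n_nodes: int, c_farads: float, v_volts: float, f_hz: float) -> float:
    """P = N * C * V^2 * f, with C taken as the capacitive load per switching node."""
    return n_nodes * c_farads * v_volts ** 2 * f_hz


# Hypothetical operating point: 1000 nodes, 10 fF each, 1.2 V, 100 MHz.
base = dynamic_power(1000, 10e-15, 1.2, 100e6)        # about 1.44 mW
half_clock = dynamic_power(1000, 10e-15, 1.2, 50e6)   # linear in f: half the power
high_v = dynamic_power(1000, 10e-15, 1.26, 100e6)     # quadratic in V: about 10% more
```

Halving the clock of non-critical logic halves its dynamic power, while raising VCCINT from 1.2 V to 1.26 V raises dynamic power by (1.26/1.2)^2 = 1.1025, on top of the static-power increase.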
Overall Power Minimization in Virtex-4
● Power minimization falls into a few areas:
  - Static & dynamic power (adjustment to operating environment)
  - Design code optimization
  - Interconnect transistors
    -- Bump up the performance target for XST router (may gain a 5-10% power improvement)
    -- Minimize path length (capacitance and power are lowered)
    -- Minimize interconnect hops (capacitance and power are lowered)
  - Interconnect capacitance
    -- Use Relationally Placed Macros (RPMs) or another placement method to guide tighter placement and help reduce routing length, especially on repeated macros
  - Number of nodes switching into a capacitive load
    -- Minimize logic levels; try to pack logic where possible
  - Clocks driving loads
    -- Use BUFGMUX to reduce switching at target flip-flops, a common practice in ASIC design
FPGA Reliability Analysis/Concerns
● Transient errors due to increased complexity and feature-size reduction in FPGAs, mitigated through redundancy-based techniques
● The rate of degradation due to accelerated aging phenomena depends on supply voltage, temperature, switching activity, and leakage currents
● Different aging phenomena result in permanent failures of FPGA components and interconnect circuitry, such as TDDB (time-dependent dielectric breakdown; reduces as gate leakage increases), HCE (hot-carrier effects; a function of switching activity), and EM (electromigration; interconnect)
● An analysis of the aging impact of TDDB, EM, and HCE on Xilinx-style (SRAM-based) FPGAs using a set of benchmarks shows that a significant portion of the FPGA resources (LUTs) may fail within the first 3 to 5 years of operation (commercial) [8]
Conclusion (FPGA Testing)
● The Multi-Configuration Strategy (MCS) provides a simple way to perform interconnect and CLB testing for application-dependent testing
● FPGAs are really SoCs requiring some DFT to be built in
● Use memory BIST (MBIST) for megacells (ROMs, RAMs, FIFOs)
● Embedded scan to improve manufacturability
● IDDQ measurement as a substitute for, or complement to, burn-in
● Signal integrity related issues will dominate
● Power consumption, junction temperature, thermal runaway
● Some design code and place/route optimization is required
● Leave plenty of margin
References
[1] M. B. Tahoori, E. J. McCluskey, M. Renovell, P. Faure, “A Multi-Configuration Strategy for an Application Dependent Testing of FPGAs,” Proc. VLSI Test Symp., 2004.
[2] M. B. Tahoori, “Application-Dependent Testing of FPGA Interconnects,” Proc. Int’l Symp. on Defect and Fault Tolerance, 2003.
[3] C. Jordan, W. P. Marnane, “Incoming inspection of FPGAs,” Proc. European Test Conf., pp. 371-377, 1993.
[4] W. K. Huang, F. J. Meyer, X.-T. Chen, F. Lombardi, “Testing Configurable LUT-Based FPGAs,” IEEE Trans. on VLSI Systems, pp. 276-283, June 1998.
[5] M. Abramovici, C. Stroud, “BIST-Based Detection and Diagnosis of Multiple Faults in FPGAs,” Proc. Int’l Test Conf., 2000.
[6] A. Krasniewski, “Application-Dependent Testing of FPGA Delay Faults,” Proc. 25th EUROMICRO Conf., vol. 1, pp. 260-267, 1999.
[7] Das, N. A. Touba, “A Low Cost Approach for Detecting, Locating, and Avoiding Interconnect Faults in FPGA-Based Reconfigurable Systems,” Proc. Int’l Conf. on VLSI Design, 1999.
[8] S. Srinivasan, N. Vijaykrishnan, K. Sarpatvari, “FLAW: FPGA Lifetime Awareness,” Proc. DAC 2006, July 24-28, 2006, San Francisco.
[9] S. Mahapatra, V. R. Rao, B. Cheng, M. Khare, C. D. Parikh, J. C. S. Woo, and J. M. Vasi, “Performance and hot-carrier reliability of 100 nm channel length jet vapor deposited Si3N4 MNSFETs,” IEEE Trans. on Electron Devices, vol. 48, no. 4, pp. 679-684, April 2001.
[10] S. M. Alam, C. L. Gan, D. E. Troxel, and C. V. Thompson, “Circuit-Level Reliability Analysis of Cu Interconnects,” Proc. Int’l Symp. on Quality Electronic Design (ISQED), 2004.
[11] J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers, “The Impact of Technology Scaling on Lifetime Reliability,” Proc. Int’l Conf. on Dependable Systems and Networks (DSN), 2004.
[12] X. Xuan, A. Chatterjee, and A. D. Singh, “Local Redesign for Reliability of CMOS Digital Circuits Under Device Degradation,” Proc. Int’l Reliability Physics Symp. (IRPS), 2004.
[13] F. N. Najm, “Transition density, a stochastic measure of activity in digital circuits,” Proc. ACM/IEEE Design Automation Conf., 1991.
[14] J. H. Anderson, F. Najm, and T. Tuan, “Active leakage power optimization for FPGAs,” Proc. ACM/SIGDA Int’l Symp. on Field-Programmable Gate Arrays, 2004.
[15] “Critical Reliability Challenges for the International Technology Roadmap for Semiconductors,” International Sematech Technology Transfer 03024377A-TR, 2003.
[16] S. Srinivasan, A. Gayasen, N. Vijaykrishnan, and T. Tuan, “Leakage control in FPGA routing fabric,” Proc. Asia-Pacific Design Automation Conf. (ASP-DAC), 2005.