tima lab. research reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9....

14
ISSN 1292-862 TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet, 38000 Grenoble France

Upload: others

Post on 09-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

ISSN 1292-862

TIMA Lab. Research Reports

CNRS INPG UJF

TIMA Laboratory, 46 avenue Félix Viallet, 38000 Grenoble France

Page 2: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

Online Testing Embedded Systems: Adapting Automatic Control Techniques to Microelectronic Testing

Emmanuel SIMEU, Salvador MIR, Libor RUFER

TIMA Laboratory

46 Av. Félix Viallet 38031 Grenoble FRANCE

Abstract

This paper is aimed at exploiting Fault Detection and Isolation (FDI) techniques widely known in automatic control for solving online test problem in embedded Integrated Circuits (ICs). Before reaching this aim, we will briefly review the field of microelectronic testing, introducing basic concepts and technique. We will next introduce FDI model-based approaches and their application for online testing of embedded ICs considering linear systems with potential faults and disturbances. The parity relation-based residual is specially suitable for this type of application. As an example, we will apply it to concurrent fault detection in a digital embedded filter. The proposed scheme will then be illustrated for a linear digital pass-band elliptic filter.

1. Introduction Embedded systems are equipment made up of hardware and software incorporated in consumer products or other devices to perform some application specific functions. Generally, the product user is not even aware of the existence of these systems. From toys to medical devices, from ovens to automobiles, the range of products incorporating microprocessor-based embedded controlled systems has expanded rapidly since the introduction of the microprocessor in 1971. Embedded systems (we will use the term ‘systems’ most often used in this paper) promise previously impossible functions enhancing the performance of people or machines.

As these systems gain sophistication, manufacturers are using them in increasingly critical applications that can result in injury, economic loss, or unacceptable inconvenience when they do not perform as required. Embedded systems can contain a variety of computing devices, such as microcontrollers, application-specific integrated circuits, and digital and analogue signal processors. A key requirement is that these computing devices continuously respond to external events in real time. Makers of embedded systems take many measures to ensure safety and reliability throughout the lifetime of products incorporating the systems.

Modern microelectronic manufacturing technologies and software tools make it feasible to integrate tens of millions of transistors, or even a complex system, into a single chip (SoC) able to hold all the components and functions that historically required a hardware board. In addition, in order to reduce development time, designers increasingly embed pre-designed and pre-verified complex functional modules in their SoC designs. These reusable modules are known as cores. Moreover, each core can serve diverse scenarios, so are reusable in different designs. Designers can also integrate cores from different vendors. When a certain combination of cores becomes common, a system integrator or core provider can create a new core from that combination. Hence, today SoCs could become the cores of tomorrow in more complex SoCs. Cores can be very different in nature, including digital, analogue, and even radio-frequency cores on the same chip.

Test development for such large, core-based SoCs poses major challenges; test access is one of them. Typically, a core is deeply embedded in the SoC and due to its surrounding circuitry direct access from the SoC pins to the core terminals is not possible. In order to be able to test an embedded core as a stand-alone unit, it should be isolated from its surrounding circuitry and electrical test access needs to be provided (Rajsuman, 1999). Traditional constrains for IC design cost (expressed as silicon area),

Page 3: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

latency (expressed as the time required to generate an output), etc. must of necessity be completed with evaluation of testability and cost of testing. Algorithm-specific architectures seldom inherently exhibit those characteristics (such as regularity, ease of controllability and observability, possibility of partitioning, etc.) that facilitate testing. Adding design-for-testability features to a complete logic level design may become excessively costly in terms of both area and performance. In recent years, a number of authors have advocated introduction of test-related techniques since the first synthesis steps. In the same line, introduction of Built-In-Self Test (BIST) features allows us to achieve autonomous testing. BIST techniques correspond to an off-line testing philosophy, suitable for end-of-production testing or for periodic life-time testing. Strong reliability constraints have contributed to the evolution of highly reliable digital components (Nicolaidis, 1996). The new design philosophy, based on the hierarchical reuse of cores, requires system-level test architectures. For demanding applications, requiring high reliability of the embedded system and correctness of the results, some form of techniques for identifying faults during normal operation of the product must be introduced. That is, online testing techniques. Such solutions often exploit characteristics of the application (as specified by the algorithm implemented by the module) to achieve the required performance while limiting redundancy.

In automatic control area, the problem of fault detection and isolation (FDI) in dynamical systems has attracted a good deal of attention (Gertler, 1988, Chen et al., 1996, De Persis and Isidori, 2001). FDI deals with the generation of diagnostic signals sensitive to the occurrence of faults. Regarding a fault as an input acting on the system, a diagnostic signal must be able to detect its occurrence as well as to isolate this particular input from all other inputs (disturbances, other faults) affecting the system behaviour. One specific diagnostic signal (also called residual) must be generated per each fault to be detected, each diagnostic signal being sensitive only to one particular fault.

It seems attractive to adapt some of the results that are abundantly available in automatic control research to deal with the problem of online fault detection in electronic embedded systems. However, only some techniques will be applicable to electronic systems because the design and implementation constrains are very different in both research fields. In this paper, we will illustrate efficient online test architecture for digital embedded systems with respect to electronic design constrains. Our solution is built on the model-based parity space approach which, compared to other known FDI architectures, allows not only for efficient fault coverage but also for efficient implementation facilities.

The sequel of this paper is organized as follows. The next section provides an overview of research activities in Microelectronics testing. Section 3 introduce typical FDI approaches and discusses their applicability to online testing of embedded ICs section 4 studies in particular the parity space method applied to concurrent fault detection in a linear digital filter. Results are provided for a linear digital pass-band elliptic filter. Finally, conclusions are discussed in Section 5. 2. Integrated circuit testing The Semiconductors Industry has always recognized the importance and the challenges of testing ICs. These are produced in large volumes, with ever increasing levels of integration that put together in a single chip more and more transistors (in addition to other components such as resistors, capacitors or diodes). The development of methodologies and tools for generating and applying test stimuli (known often as test vectors) for detecting device malfunction, and locating faults or design errors, keeps on requiring a vast amount of research effort as semiconductor technologies evolve.

Test activities are generally carried out at different stages of an IC life cycle. Validation tests are already performed during the design of the product, ensuring that the target specifications are being met. For the first sample prototypes of an IC, characterization tests are performed in the laboratory and malfunctioning prototypes are studied in detail, with failure analysis and fault diagnosis tasks leading often to the redesign of the prototype. Successful characterization tests open the way for large volume fabrication and production testing. Chips are tested on the wafers before being diced and packaged.

Page 4: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

Packaged chips are tested again before being embarked on a board or shipped out for commercialization. In addition, chips are often tested in the field while running in a given application.

Economics has always been the driving force for test research. Finding problems as early as possible in the life cycle of an IC can save large amounts of time and money. Obviously, it is much costlier to detect a malfunctioning operation in the field that during manufacturing, characterization or validation tests. Furthermore, test costs rise rapidly as the level of integration regularly increases. As an order of magnitude, the cost of production testing today SoC devices embedding tens of millions of transistors fabricated with deep submicron technologies (e.g. 0.12 µm or 0.09 µm CMOS) can represent above 50 % of the overall device cost. These costs are split between a variable part that depends on the type of chip under test and the volume of fabrication, and a fixed part related to the cost of test equipment. Fig. 1 shows how test chip quality and variable costs are directly linked (Williams and Hawkins, 1990). The percentage of devices that pass the tester quality control is referred as the product yield. On one hand, among the failing devices some will actually be good devices. On the other hand, among the passing devices some will be actually bad devices (test escapes). In both cases, these are erroneous decisions due to the imprecision of the tests. The tighter the tests, the lower the yield and the lower the number of passing devices obtained. This has associated an economic impact and a quality level determined by the test escapes (measured normally in parts per million). Driving down these ever increasing testing costs has given rise to different test approaches that can be classified in several ways.

Figure. 1: Test quality and economic impact.

A first classification of test approaches depends on the way test vectors are generated, as shown in Fig. 2 (Sachdev and Soma, 1998). Classically, functional tests have been performed to verify that an IC meets the specifications (Fig. 2a). This is still the basic approach for Analogue and Mixed-Signal (AMS) chips that are relatively small. For digital circuits, as the functionality has become more and more complex, it has appeared practically impossible to exhaustively verify the functionality of a chip, in particular large digital components such as microprocessors.

(a) (b) Figure 2: IC test approaches according to the test vector generation: (a) functional testing and (b) structural testing.

Structural (or fault-based) testing has been largely adopted for digital circuits (Fig. 2b). In this approach, tests are directed to find out if faults exist in the circuit, rather than verifying the circuit

Tester

Devices to be tested

Pass

Fail

Good device : OK

Bad device : error

Bad device : OK

Good device : error

Customer defect level

Economicimpact

Functional testing

Test matches the functionality

Simple test vector generation

Test complexity increases exponentiallywith device size

Lengthy test time due to redundant tests

Tester

PassSpecs

FailSpecs

Specifications(DC, AC, …)

Test program

Pass

Fail

Devices to be

tested

Functional testing

Test matches the functionality

Simple test vector generation

Test complexity increases exponentiallywith device size

Lengthy test time due to redundant tests

Tester

PassSpecs

FailSpecs

Specifications(DC, AC, …)

Test program

Pass

Fail

Devices to be

tested

Structural testing

Tests targeting structural defects

Based on the use of fault models

CAD tools for test generation (ATPG, fault simulation)

Does not test exhaustively for functionality

Less expensive test

Tester

No faults

Faulty

Test program

Pass

Fail

Devices to be tested

Fault model

Fault simulation

Test vectors

Page 5: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

functionality. A comprehensive fault list is then necessary, which requires in-depth knowledge of potential defects and failure mechanisms and the mapping of these into fault models that describe the circuit misbehavior. Fault simulation and Automatic Test Pattern Generation (ATPG) are then used to derive the tests required for a given circuit to detect the expected faults (or, more precisely, to achieve a certain fault coverage). In general, structural tests are much harder to derive than functional tests but they lead to an optimized test set and minimal test time, thus effectively reducing test costs. For AMS circuits, functional testing is still preferred since it is difficult to provide adequate fault models and fault simulation tools to deal with the more complex analogue functions. As an intermediate approach between functional and structural testing, parametric tests are often applied to a circuit to measure a parameter of major concern.

Another way of classifying test approaches depends on the way the test vectors are applied to the device under test. Most typically, an external test equipment (or tester) downloads a programmed set of test vectors into a chip and uploads the responses of the chip to these input stimuli. The comparison with the expected responses allows a decision on the device performance. Testers are very expensive equipment since they must deal with very different chips, with a variable and often very large number of interconnecting pins of different bandwidths. For interfacing with the tester, the chip under test is plugged into a customized tester board (Device Interface Board). Due to the different nature of the signals, test equipment for AMS or RF chips is much more expensive.

Driving down the fixed test costs that arise from the use of expensive test equipment demands the incorporation of test features onto the chip. This is commonly known as Design-For-Test (DFT) since designers must be involved in including them. The most widely adopted DFT technique is defined by the IEEE 1149.1 Boundary Scan Standard (Maunder, 1993). This architecture, shown in Fig. 3, facilitates the application of test patterns and the reading of the test responses in digital chips. Scan cells are placed at each input or output pin and they are interconnected to form a shift-register between the TDI (Test Data In) and the TDO (Test Data Out) pins under the control of the test clock signal TCK and the test mode signal TMS. Thus, just a few pins are required to scan a test pattern into all chip inputs and to scan out the chip response. This is an example of a structured DFT technique since it has a general application. The technique has the advantage of simplifying the interface with the test equipment and facilitates also the testing of the interconnections between the chips of a board. An extension of this technique for AMS circuits is known as the IEEE 1149.4 Standard (Wilkins, 2000). In addition to structured approaches, other ad-hoc (circuit dependent) DFT techniques are often proposed in order to facilitate controllability and observability of deeply embedded nodes in the design.

Figure 3: DFT for digital circuits: the IEEE 1149.1 Boundary Scan Test Standard.

Further inclusion of test features ultimately leads to the chip being able to test itself. Three basic ways of achieving this exist, namely, concurrent, non-concurrent and semi-concurrent testing. In concurrent

Design-For-Test : IEEE 1149.1

Boundary Scan standard

Facilitates the testof digital chips

Test patterns are scanned in

Test responses are scanned out for verification

Page 6: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

testing, the chip is able to perform its normal operation while testing is under way. This requires some sort of functional redundancy for performing comparisons, like hardware or time redundancy (Lubazewski et al., 1995). In the other extreme, non-concurrent testing requires the circuit operation to be halted while testing is under way. In this approach, referred as Built-In-Self-Test (BIST), an on-chip test circuitry generates the required test vectors and analyzes the test responses, thus in fact incorporating on-chip the actual test equipment (Mir et al., 1995, L. Rufer et al., 2003). On one hand, the generation of test vectors can follow a deterministic or a pseudo-random approach. In the deterministic case, test vectors are known a priori (normally derived by means of an ATPG) and stored on chip. Alternatively, a non deterministic approach can be used in which pseudo-random test patterns are used to excite the circuit under test as shown in Fig. 4. These are normally generated by means of Linear-Feedback-Shift-Registers (LFSRs). On the other hand, the analysis of the responses normally requires the generation of a signature: the actual test responses are compressed to form a signature that is compared on-chip with the expected result. Between concurrent and non-concurrent testing, semi-concurrent approaches exist. These exploit the idle times of the circuit under test (or the idle times of its constitutive functional units).

Figure. 4 : Generic BIST architecture BIST is of course a very desirable feature. However, it has also its impact on costs due to increased silicon surface required to accommodate the BIST circuitry, possible penalties in circuit performance and difficulties of carrying out a fully comprehensive test. BIST techniques for digital cores of SoC devices are today getting wide acceptance. But BIST techniques for AMS parts are still in a very early state of development. This is a major concern since in today complex SoCs, AMS parts represent a small part (typically below 10 % of the chip). However, the costs of testing these parts can well represent over 90 % of the total test costs, given the need of using extremely expensive AMS test equipment with very lengthy test times.

In general, a specific DFT technique is used for each kind of block. But DFT becomes also necessary at the overall system level. Large SoC design usually includes different types of cores. Some of them are given as Intellectual Property (IP) for which only the overall core function and its interfaces are known to the SoC design team. The internal description remains confidential to the core provider who must also provide the information for testing it. For this reason, a new standard called IEEE P1500 is being developed as a core test description language (Zorian et al., 1999). In order to access the cores from the SoC pins, on-chip hardware infrastructure known as Test Access Mechanism (TAM) is used to transport test stimuli and test responses between the SoC and the embedded cores. In addition, each core is surrounded by a thin shell called a wrapper that provides the interface to other chip cores and the TAM. In this way, observability and controllability of the SoC cores is ensured, each core employing its own DFT technique. In fact, To cover all the operational fault types, test engineers often use two both modes of online testing: nonconcurrent and concurrent. Nonconcurrent testing is either event-triggered (sporadic) or time-triggered (periodic) and is characterized by low space and time redundancy. Even-triggered testing is initiated by key events or state changes such as start-up or shutdown, and its goal is to detect permanent faults. Nonconcurrent testing cannot detect transient or intermittent faults whose effects disappear quickly. Concurrent testing, on the other hand, continuously checks for errors due to such faults. However, concurrent testing is not particularly useful for diagnosing the source of errors, so test designers often combine it

Mux

Text generator

Test pattern

sequence

Response monitor

Test control

Circuit Under Test

(CUT)

Inputs Outputs

Error

Page 7: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

with diagnostic software. They may also combine concurrent and nonconcurrent testing to detect or diagnose complex faults of all types. 3. Online testing of embedded ICs with Model-based FDI techniques 3.1. Online testing of ICs

In developing an online BIST methodology for embedded systems, we must consider four primary parameters related to those listed earlier for online-testing techniques: i) error coverage: the fraction of model errors detected, usually expressed as a percentage. Critical

and highly available systems require very good error coverage to minimize the probability of system failure.

ii) error latency: the difference between the first time an error becomes active and the first time it is detected. Error latency depends on the time taken to perform a test and how often tests are executed. A related parameter is fault latency, the difference between the onset of the fault and its detection. Clearly, fault latency is greater than or equal to error latency, so when error latency is difficult to determine, test designers often consider fault latency instead.

iii) hardware overhead: the extra hardware needed for BIST. In most embedded systems, high hardware overhead is not acceptable. In any case, a hardware overhead equivalent to a duplication nominal system size should be avoided.

iv) performance penalty: the impact of BIST hardware on normal circuit performance, such as worst-case (critical) path delays. This includes the extra time needed for online testing. Overhead of this type is sometimes more important than hardware overhead.

The ideal online-testing scheme would have 100% error coverage, error latency of 1 clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT and impose no functional or structural restrictions on it. Most BIST methods meet some of these constraints without addressing others. Considering all four parameters in the design of an online-testing scheme may create conflicting goals. High coverage requires high error latency, space redundancy, and/or time redundancy. Schemes with immediate detection (error latency equalling 1) minimize time redundancy but require more hardware.

In a typical automatic control FDI implementation, the residuals are usually computed using specific algorithms embedded in controller computers. In this case, only two of the four constraints required for online testing of ICs have to be taken into account in the FDI design techniques: error coverage and error latency. Hardware overhead and performance penalty constrains become pointless. FDI implementation does not require any additional hardware and do not generate any disturbance on the performances of the nominal system. These additional design constrains should be taken into consideration when applying FDI technique to embedded microelectronic devices. 3.2. Model-based FDI techniques

Previous fault detection techniques were restricted to check directly measurable variables for upward or downward transgression of fixed limits or trends (Abdelhay and Simeu, 2000). This technique could be automated by using a simple limit-value monitor. Various faults in the plant could then be recognised only when the controlled signal exceeds some predefined thresholds. For so called voting techniques, a fault occurrence is determined by a sensor when some redundant equipment allowing a comparison between two or more measures points out a possible mismatch. This technique (often referred as hardware redundancy) is a more powerful method, but it has the disadvantage of requiring a lot of expensive fault-detection devoted equipment.

In the last twenty years, the use of process computers has allowed the development of new methods based on advanced signal processing techniques (Frank, 1990, Chen et al., 1996)). In addition to model-free (MF) methods several model-based (MB) techniques for fault diagnosis have been

Page 8: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

proposed in the literature (Patton and Chen, 1991, Patton et al., 1989). A large class of MB diagnosis schemes relies on analytical models such as differential and difference equations. For example, linear discrete-time systems described by

x(k + 1) =Ax(k) + Bu(k) + Edd(k) + Ef f(k) (1) y(k) =Cx(k) + Du(k) + Fdd(k) + Fff(k) (2)

where x ∈ Rn denotes the state vector, u ∈ Rp the input vector, y ∈ Rm the output vector, d ∈ Rk the unknown disturbance vector, and f ∈ Rq the fault vector to be detected. A, B, C, D, Ed, Ef , Fd, and Ff are known constant matrices of appropriate dimensions. The first step to successful fault detection is residual generation. For this purpose, various model-based methods have been developed, including unknown input observers, parameter identification and parity relation approaches. 3.2.1. Unknown input observer

The unknown input observer approach is based on the design of a full-state observer given as follows:

)(t)xcK(y(t)(t)x̂A(t)x̂ )& −+= (3)

where nRx ∈ˆ is the estimated state, K is a matrix to be designed such that (t)x̂ asymptotically converges to x(t), when no fault and no disturbance are considered. The residual is designed using the output estimation error derived from the observer

( ) ( ) )()()(ˆ)()(ˆ)()( tfQFtdQFtxtxQCtxCtyQtr fd ++−=−= (4)

where Q is a r x m matrix and r is the size of the residual.

In the diagnosis schemes, a bank of observers may be used instead of a single observer-based scheme. Each observer residual is designed to be sensitive to a single fault while remaining insensitive to the other faults and disturbances. This approach requires a total implementation of the system model and will need more hardware overhead required than duplication. For this reason, observer-based FDI is not appropriate for adaptation to online testing of embedded ICs. 3.2.2. Parameter estimation method

The parameter estimation method makes of the fact that the faults of a dynamic system are reflected in a physical parameters as, for example, resistance, capacitance, inductance or friction, mass, viscosity, etc. The idea of the parameter estimation approach is to detect the faults via identification of the parameters of the mathematical model. Deviations of the process are evaluated or tracked through a recursive estimation of its parameters. Therefore, the available input-output data must be processed nonlinearly. The input signal must satisfy several conditions concerning process excitation with respect the dynamics and the nonlinearity which have to be estimated. The test can then be performed using a residual measure of form:

r aT

a= − −( $ $ ) ( $ $ )θ θ θ θ0 0 (5)

Where $θ0 and $θa denoted the estimates of nominal and actual parameters respectively. Finally, in any case, the residual is compared with a fixed threshold rmax and fault free hypothesis is accepted if r < rmax. The parameter estimation method requires online monitoring of the system parameters. Complex calculation is developed in the monitoring algorithm, including large matrix inversion or application of matrix inversion theorem in the case of recursive algorithm. The hardware overhead required is widely prohibitive for an application to online testing of ICs.

Page 9: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

3.2.3. Parity relation-based method

The basic principle of this method is as follows: As input of concurrent fault detection scheme, only available measurable signals must be used (input vector u and output vector y). Since the state vector x is not supposed to be directly measurable from the system under test, it is assumed unknown and must not be used in the concurrent fault detection scheme. Hence, the vector of state variables x(t) has to be eliminated from Equation (2). The goal is achieved by expressing the output vector y at the successive clock time (t+1, t+2, …, t+k) in terms of state vector x(t) and the input sequence, (u(t+1), u(t+2), …, u(t+k)). In the fault free operation, the k+1 successive expressions of the output vector are given by the following equations:

y(t)=Cx(t)+Du(t) y(t+1)=CAx(t)+CBu(t)+Du(t+1) y(t+2)=CA2x(t)+CABu(t)+CBu(t+1)+Du(t+2) … y(t+k)=CAkx(t)+CAk-1Bu(t)+CAk-2Bu(t+1)+…+CBu(t+k-1)+ Du(t+k).

This set of equations can be group and presented in the following condensed matrix form:

Y[k] (t) = OB[k]x(t)+H[k]U(t) (6)

where:

[ ] ;

)(:

)2()1(

)(

)(

+

++

=

kty

tyty

ty

tY k [ ] ;:

2

=

k

k

CA

CACAC

OB

[ ] ;

)(:

)2()1(

)(

)(

+

++

=

ktu

tutu

tu

tU k [ ] .

..0..:::::00:00..0

21

=

−− DCBBCABCAD

CBCABDCB

D

H

kk

k

The integer k will define the order of the residual detection scheme, i.e. the number of unit delay that will be needed for each signal connected in the residual generation scheme. If n is the order of the system, then OB[k] is the observability matrix of the system when k=n-1. Elimination of the state vector x(t) from Equation (6) gives a redundancy relation which is a linear combination of present and lagged values of input and output sequences. The residual associated with the redundancy relation equals zero if no failure occurs in the system. Elimination of unknown variables from Equation (6) can be achieved if there is a vector v orthogonal to OB[k] such that vT.OB[k] = 0. The subspace Pk of all the vectors orthogonal to OB[k] is defined by:

Pk = { v | vT.OB[k] = 0 } (7)

Pk is the parity space of order k, and any vector v in Pk is a parity vector. Equation (7) can be verified if the rows of matrix OB[k] are linearly dependent. In another way, Pk exists if the OB[k] matrix rank is lower than the number of its rows (p×(k+1)). Every parity vector v of the subspace Pk can be associated at any time (t) with a redundancy relation to generate a residual or a parity check r(t), like:

r(t)=vT[Y[k]-H[k]U[k]] (8)

Equation (8) defines a redundancy relation and in the fault free operation:

r(t)=vTOB[k]x(t) = 0 (9)

From the above discussion, it is clear that the problem of concurrent fault detection in LDSs using only available measurable variables can be reformulated as follows: find parity vectors v=[v1,…, vn]

Page 10: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

belonging to Pk such that Equation (8) can be verified; then, redundancy relations can be determined and so concurrent fault detection circuits can be constructed.

The implementation of the parity residual given by Equation (8) seems relatively simple. Moreover, as opposed to the automatic control processes on which any measurement is obtained using expensive sensors on few and far connections, a lot of measurement possibilities generally exist in electronic ICs. This enables the design of efficient parity residual of order 1 or 2 enabling the simplification of the implementation scheme. These advantages make the parity state approach more suitable for online testing of embedded systems than observer-based and parameter identification techniques (Simeu et al., 2001). The next section illustrates the application of the parity space method for fault detection in an embedded digital elliptic filter. 4. Application of parity based technique for concurrent testing To illustrate the parity space technique described above, the linear digital pass-band elliptic filter in Fig. 5 is used. This system has one external input u, one external output y0, two connectable internal signals y1, y2 and four state variables x0, …, x3. The state-space matrices for this system are:

−−−−−

−−−−

=

1221.01831.09565.01561.01831.00747.01561.07887.09565.01561.01847.01331.01561.07887.01331.03277.0

A ,

−−

=

0831.04198.0

0708.03578.0

B,

,10000100

3641.00208.03103.00177.0

=C .

00

1094.0

=D

4. 1. Concurrent residual design

The parameters of the fault detection circuit for this system have been computed using the procedures described above. These parameters are the following: Redundancy relation order: k = 1 Optimal parity vector: [ ]14.4471.442.4916.876.699.73 −−−=Τv .

[ ] [ ]41.593.61 −=ΤHv

Figure 5. Linear digital pass-band elliptic filter

M2

M6

M10

M14

M3

M7

M11

M15

M4

M8

M16

M19

M17

M25

-0.3277-0.1331

0.7887-0.1561

-0.1331-0.1847

0.15610.9565

-0.78870.1561

0.07470.1831

-0.1561-0.9565

-0.1831-0.1221

0.3578

-0.41980.0708

-0.0831

0.1094

0.0177

0.3103

0.0208

0.3641

u(t)

y0(t)

y1(t)

y2(t)

x0

x1

x2

x3

z-1

z-1

z-1

z-1

M21

M22

M23

M24

M20

M18

M1

M9

M5

M13

M12

Page 11: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

The optimal robust redundancy relation is:

[ ]

−−

−−−

= ΤΤ

)()1(

.

)()()(

)1()1()1(

.)( 1

2

1

0

2

1

0

tutu

Hv

tytytytytyty

vtr

r(t)={-73.99y0(t-1)-6.76y1(t-1)+8.16y2(t-1)+49.42y0(t)+4.71y1(t)-44.14y2(t)}- {-6.93u(t-1)+5.41u(t)}. The scheme of this circuit is given in Fig. 6.

Figure 6. Fault detection circuitry.

4.2. Experimental results The filter was simulated with the error (fault) detection circuit indicated in Fig. 6 and the error detection threshold was chosen (by simulation) to be the largest residual response value in the case of fault-free operation. Generally, faults are simulated using fault models. Fault models are means of describing the effects of faults that may cause an error in the output of the circuit under test. For complex microelectronic circuits, circuit level fault models (such as at transistor and gate level fault model) and functional fault models are very difficult to be applied. The fault models to be considered here and used to simulate faults are the behavioural, i.e. the faults are expressed by means of deviations from the nominal parameters included in the model describing the system under test. The filter to be simulated consists of constant-multipliers, adders and registers. The faults of a constant-multiplier can be modelled as deviations from its nominal constant value, so changing the nominal constant can simulate the presence of faults. Adder and register fault are assumed to be covered by faults on its input constant-multiplier. These faults can be then simulated using the constant-multiplier fault models. 4.2.1. Efficiency in robust concurrent detection To illustrate the efficiency of the proposed scheme to detect faults concurrently, the fault detection circuit in Fig. 6 was simulated with the filter circuit indicated in Fig. 5 under. A sinusoidal signal of frequency 30 kHz and amplitude 1v was sampled and applied into the system input, u. During the simulation, a deviation of +50% from the nominal value of constant- multiplier M24 was made. The calculation of the temporal residual response indicated in Fig. 7 shows how the residual changes immediately when the deviation (fault) occurs.

r(t)

System Under Test

u(t) y1(t) y2(t) y0(t)

M

z-1

+M

M

z-1

+M

M

z-1

+M

M

z-1

+M

+ + +

49.42 4.71 -44.14 5.41

8.16 -6.93-6.76-73.99

-

Page 12: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

Figure 7. Fault detection circuitry 4.2.2. Concurrent detector fault coverage

The filter was simulated for various frequencies of the input signal. The fault simulation corresponds to parametric deviations from their nominal value of the multiplier constant inputs. For each single fault, the corresponding residual response was evaluated.

For each multiplier constant, Table 1 gives the maximal deviation tolerated by the system behaviour and the minimal deviation detected by the detector output. The second column of this table represents (in percent) the minimal deviation of the corresponding constant-multiplier required to observe the error manifestation on the output of the system. Any deviation less than this value will not have a significantly observable effect on the system behaviour. So the system can be considered as fault-secure for any deviation less than this limit value. The third column represents the minimal parameter deviation detectable by the robust detector scheme implemented by the circuit of Fig. 6. The error detector output oversteps the threshold significantly when the corresponding value of deviation is simulated. The following results may be deduced from Table 1: 23 over 25 (92 %) of deviations are detected by the robust detector scheme before their effect is observable on the nominal system behaviour. This is not the case for M1 and M2.

Table 1: Robust detector fault coverage

Deviation of the constant value

Deviation of the constant value

Constant value of

multiplier Fault-secure maximal deviation

in %

Minimal detectable deviation

in %

Constant value of multiplier

Fault-secure maximal deviation

in %

Minimal detectable deviation

in % M1 0.5 0.8 M14 0.06 0.01 M2 2.5 2 M15 0.7 0.08 M3 1.5 0.5 M16 0.3 0.1 M4 1.8 1.5 M17 1.3 2 M5 0.6 0.2 M18 0.9 0.5 M6 0.3 0.1 M19 0.5 0.2 M7 0.5 0.2 M20 0.2 0.2 M8 0.1 0.02 M21 0.3 0.2 M9 0.1 0.06 M22 0.01 0.01 M10 0.7 0.3 M23 0.55 0.2 M11 0.9 0.8 M24 0.04 0.01 M12 0.6 0.3 M25 0.09 0.04 M13 0.15 0.08

Page 13: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

Fig. 8 presents the complete spectral plot of the corresponding residual responses for deviation value of +100% for each constant-multiplier. These simulation results show that all faults are detectable in a large band of operational frequency.

4.2.3. Efficiency of the robust scheme

Fig. 9 shows the efficiency of the robust fault detection scheme and its insensitivity to process noise. For simulation, a sinusoidal signal of frequency 30 kHz and amplitude 1v was sampled and applied into the system input. A digital noise signal of maximal amplitude 0.0015 was first injected into the filter circuit during a short period of time. Next, a fault inducing a small deviation of the constant of multiplier M25 was integrated. It is clear on the plot of the detector output signal that the robust residual scheme is sensitive to fault and insensitive to a noise.

a) b)

c) d) Figure 8. Spectral plot of the faulty detector response

Figure 9. Robustness of the detector response.

Robustness of residual in fault an in residual

Page 14: TIMA Lab. Research Reportstima.univ-grenoble-alpes.fr/publications/files/rr/ote... · 2004. 9. 30. · TIMA Lab. Research Reports CNRS INPG UJF TIMA Laboratory, 46 avenue Félix Viallet,

5. Conclusion This paper prospects the possibilities for adapting model-based FDI techniques that are developed in the area of automatic control for online testing of embedded electronic systems. A new low-overhead online testing technique is deduced from FDI parity based residual. This technique is illustrated for online concurrent testing of a digital embedded filter. The proposed scheme ensures a high sensitivity against faults while the robust fault detection circuit presents a negligible sensitivity towards system noise. The proposed technique is applicable to any type of linear digital system and the test circuitry required for the implementation of the concurrent fault detector is still very reasonable. References A. Abdelhay. and E. Simeu (2000) ‘‘Analytical Redundancy Based Approach for Concurrent Fault

Detection in Linear Digital Systems’’ 6th IEEE International OnLine Testing Workshop (IOLTW), July 2000.

J. Chen, R. J. Patton, and H. Y. Zhang, (1996) “Design of unknown input observers and robust fault detection filters,” Int. J. Control, vol. 63, no. 1, pp. 85–105, 1996.

C. De Persis and A. Isidori, (2001) A Geometric Approach to Nonlinear Fault Detection and Isolation, IEEE Trans on automatic control vol.. 46, n°. 6, JUNE 2001

P. M. Frank, (1990) Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy—A survey, Automatica, vol. 26, pp. 459–474, 1990

P. M. Frank,(1996) “Analytical and qualitative model-based fault diagnosis—A survey and some new results,” Eur. J. Control, vol. 2, pp. 6–28, 1996.

J. J. Gertler,(1988) “Survey of model-based failure detection and isolation in complex plants,” IEEE Contr. Syst. Mag., vol. 3, pp. 3–11, 1988.

M. Lubaszewski., S. Mir, A. Rueda and J.L. Huertas (1995). Concurrent error detection in analogue and mixed-signal integrated circuits. IEEE 38th Midwest Symp. on Circuits and Systems, Rio de Janeiro, Brazil, August, pp. 1151-1156.

C. M Maunder (1993). Standard Test Access Port and Boundary-Scan Architecture. IEEE Std 1149.1-1993a, (Maunder. (Ed)), IEEE Standards Board, New York.

Mir S., M. Lubaszewski, V. Liberali and B. Courtois (1995). Built-in self-test approaches for analogue and mixed-signal integrated circuits. IEEE 38th Midwest Symp. on Circuits and Systems, Rio de Janeiro, Brazil, August, pp. 1145-1150.

M. Nicolaidis, (1996) “Theory of Transparent BIST for RAMs,” IEEE Trans. Computers, Vol. 45, No. 10, Oct. 1996, pp. 1141-1156.

R. J. Patton and J. Chen (1991) “Parity space approach to model-based fault diagnosis—A tutorial survey and some new results,” in Proc. IFAC/IMACS Symp. SAFEPROCESS, Baden–Baden, 1991.

R. J. Patton, P. M. Frank, and R. N. Clark, Eds.(1989), Fault Diagnosis in Dynamic Systems, Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, 1989.

R. Rajsuman, (1999) “Testing a System-on-a-Chip with Embedded Microprocessor,” in Proc. Int. Test Conf., 1999, pp. 499–508.

L. Rufer, S. Mir, E. Simeu and C. Domingues (2003). On-chip testing of MEMS using pseudo-random test sequences. SPIE Symp. on Design, Test, Integration and Packaging of MEMS/ MOEMS, (DTIP'03), Cannes, France. Cannes, France, May 5-7.

M. Sachdev and M. Soma (1998). Deep sub-micron test challenges for mixed-signal ICs. VLSI Test Symp., Monterey, USA, (tutorial presentation).

E. Simeu, A. Abdelhay and M. A. Naal (2001). Robust Concurrent Self Test of Linear Digital Systems’’ Tenth Asian Test Symposium November 19-21, 2001, Rihga Royal Hotel Kyoto, Kyoto, Japan.

B. Wilkins (2000). IEEE Standard for a Mixed-Signal Test Bus. IEEE Std 1149.4-1999, (Wil-kins. (Ed)), IEEE Standards Board, New York.

R.H. Williams and C.F. Hawkins (1990). Errors in testing. Proceedings of the Int. Test Conference, Washington DC, USA, pp. 1018-1027.

Y. Zorian., E.J. Marinissen and S. Dey (1999). Testing embedded core-based system chips. IEEE Computer, vol. 32, n. 6, June, pp. 52-60.