


IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 52, NO. 5, OCTOBER 2005 1555

On the Use of Model Checking for the Verification of a Dynamic Signature Monitoring Approach

Bogdan Nicolescu, Nicolas Gorse, Yvon Savaria, El Mostapha Aboulhamid, and Raoul Velazco

Abstract—Consequences of transient faults represent a significant problem for today's electronic circuits and systems. As the probability of such errors increases, incorporation of error detection and correction mechanisms is mandatory. It is well known that traditional techniques that validate a system's reliability do not cover the whole spectrum of fault scenarios, because fault models are linked to target architectures. Therefore, validating the completeness of robust fault tolerance techniques is a major issue when assessing the reliability improvements these techniques can produce. In this paper, we propose an original approach, based on model-checking principles, to evaluate system reliability with respect to Single Event Upset (SEU) errors. In addition, a signature analysis technique is evaluated. This technique was previously validated using a simulation-based fault injection approach. Simulation results showed that no error escapes detection. However, simulation-based fault injection cannot guarantee that all fault consequences have been investigated. This limitation motivates us to explore a formal verification approach that targets a complete validation. Model checking has a fundamental advantage over classic fault-injection techniques: it can cover all possible SEU fault scenarios from a predefined class. Results reported in this paper demonstrate the efficiency of this validation approach over usual simulation-based techniques.

Index Terms—Fault injection, formal verification, model checking, signature analysis.

I. INTRODUCTION

Continual scaling in VLSI technologies has produced improved performance but led to an increased sensitivity to transient errors. These errors are often produced by energetic particles present in the environment (e.g., cosmic rays produced by sun activity). Transient errors often cause soft errors that modify the content of memory cells. These errors are already a significant problem for present electronic circuits and systems. As scaling continues, it is also expected that future deep submicron circuits, operating at very high frequencies, will be subject to transient errors in combinational elements. This phenomenon already constitutes a significant source of errors for circuits operating in harsh environments. Moreover, it is a growing concern for digital equipment operating at ground level [1].

As the probability of transient fault occurrence increases, incorporation of error detection and correction mechanisms

Manuscript received January 11, 2005; revised April 12, 2005.

B. Nicolescu and Y. Savaria are with the Ecole Polytechnique de Montréal, Montréal, QC H3C 3J7, Canada (e-mail: [email protected]; [email protected]).

N. Gorse and E. M. Aboulhamid are with the Université de Montréal, Montréal, QC H3C 3J7, Canada (e-mail: [email protected]; [email protected]).

R. Velazco is with the TIMA Laboratory, Circuits Qualification, 38031 Grenoble, France (e-mail: [email protected]).

Digital Object Identifier 10.1109/TNS.2005.855819

is mandatory and represents one of the major industry concerns. The International Technology Roadmap for Semiconductors (ITRS) predicts that concurrent error detection and correction techniques will be an important challenge for VLSI circuits implemented with technologies with feature sizes below 90 nm [2].

Solutions to mitigate the effects of transient errors have been proposed in the past. They can be mainly classified into hardware and software techniques. Hardware approaches involve the transformation of an integrated circuit design, allowing the manufacturing of reliable circuits using commercial CMOS processes. Proposed design hardening solutions range from the use of Error Correction Codes (ECC) [3] and Triple Modular Redundancy (TMR) [4] to the development of hardened cell libraries [5]. In another family of fault tolerance techniques, redundancy is introduced by duplicating instructions [6], [7]. Another category of software-based approaches relies on signature analysis techniques [8]–[12] that detect errors corrupting the program control flow.

Evaluating a system's behavior in the presence of faults constitutes another significant problem for deep submicron circuits. Fault injection techniques have emerged as a key method for evaluating fault tolerant systems. An interesting fault injection approach has been proposed in [13], in which faults are injected through interrupts, available on most programmable devices. Another family of fault injection techniques is based on dedicated simulation tools. In [14], a fault injection method targeting a complex digital signal processor through its Instruction Set Simulator (ISS) is investigated. Other simulation-based fault injection techniques use built-in commands available in VHDL simulators. Representative works can be found in [15]–[17].

As a general drawback, simulation-based fault injection techniques tend to be very time consuming. Usually, complete exploration of all fault scenarios is not feasible. Therefore, they cannot be used to investigate all fault consequences on system behavior.

Other reliability evaluation approaches, not related to soft errors, use model checking [24]. Model checking is successfully used to identify conceptual errors in design specifications. The main idea relies on verifying whether a desired property is satisfied by the model under consideration. Earlier works propose evaluating system dependability using model checking techniques [18], [19].

This article presents two main contributions addressing the issues mentioned above:

• the evaluation of the efficiency of the Dynamic Signature Monitoring (DSM) technique, initially presented in [20];




• the development of a new validation paradigm based on model checking principles.

DSM introduces new principles that permit detection of all illegal interblock transitions. This technique has been validated by simulation-based fault injection experiments. However, simulation-based fault injection techniques cannot cover all fault scenarios. Therefore, a new validation technique, relying on model checking, has been developed. The fundamental advantages of model checking over simulation-based fault injection are that model checking allows coverage of all fault scenarios and that, if some property of the system is not satisfied, a counterexample can be obtained. Model checking validation formally proved the tightness of the DSM technique with respect to all execution scenarios.

The paper is structured as follows. Section II describes principles of the evaluated error detection technique. The conceptual framework of the proposed validation technique is depicted in Section III. Validation results are analyzed and discussed in Section IV. Finally, Section V presents our concluding remarks and proposed future work.

II. DYNAMIC SIGNATURE MONITORING TECHNIQUE

Faults resulting in illegal branches are grouped in two classes according to the location of the resulting branch:

• illegal intrablock branches—incorrect jumps within a block that corrupt the program data segment. In some cases, they may also affect the program control flow (e.g., nonexecution or re-execution of instructions that compute a test condition);

• illegal interblock transitions—incorrect jumps to other blocks that corrupt the program control flow.

As a general characteristic, signature analysis techniques are designed to detect illegal interblock transitions. Illegal intrablock transitions are out of scope for this class of error detection techniques. Note that illegal branches that do not cross a block boundary are often detected using techniques that tolerate data segment faults. Representative works can be found in [20]–[22].

The basic principle of classic signature analysis techniques consists in associating a unique reference signature to each basic block of a program during the precompilation phase. During program execution, an on-line signature (OS) is computed and compared with a reference signature (RS).

Unlike classic approaches, DSM associates signatures to interblock transitions. With the adopted signature monitoring strategy, on-line signatures depend on identifiers of the source and destination blocks. In addition, control flow is checked by comparing a signature computed on-line to a reference signature computed at compile time and injected by the last executed block prior to an interblock transition. Fig. 1 depicts the fundamental difference between DSM and classic approaches.

In addition to the on-line and reference signatures, local cumulative signatures are introduced. They are precompiled integer numbers that are unique to each basic block. In the absence of errors, the condition expressed by (1) is satisfied by the design. This means that every time control is transferred from one block to another, the sum of the local signatures must be

Fig. 1. DSM versus classic approach.

Fig. 2. Examples of precompiled values for local cumulative signatures.

Fig. 3. Basic block with checking instructions.

equal to zero. Examples of precompiled values for local cumulative signatures are pictured in Fig. 2. Denoting the local cumulative signatures of a block i by α_i, β_i, and γ_i, the condition is

α_i + β_j + γ_j = 0    (1)

where:

• α_i, β_i, and γ_i are local cumulative precompiled signatures that belong to block i;

• relation (1) holds whatever the source block i and the destination block j are.

The locations where the components of the signature are combined ensure that: 1) a source block transfers control to the first instruction of the destination block, and 2) the signature checking instructions are correctly executed. At the end of each basic block, the block's local cumulative signature is added to both the on-line and reference signatures. Fig. 3 illustrates how control transfer is checked.
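As a rough sketch of this checking scheme, the Python fragment below mimics DSM-style transition checking: a source block injects a precompiled reference signature for the branch it is about to take, and the destination recomputes the on-line signature from the source and destination identifiers. The block IDs, the signature derivation, and all values are hypothetical stand-ins, not the paper's actual precompiled constants.

```python
# Illustrative DSM-style interblock transition check (hypothetical
# signature derivation; the real DSM encoding differs).

def online_signature(src_id, dst_id):
    # The on-line signature depends on the identifiers of the source
    # and destination blocks, as DSM prescribes.
    return (src_id * 31) ^ dst_id

class Block:
    def __init__(self, block_id, successors):
        self.block_id = block_id
        # Reference signature for each legal successor, "precompiled"
        # here by reusing the same derivation applied at run time.
        self.ref = {d: online_signature(block_id, d) for d in successors}

def transfer(src, dst):
    # The source injects the reference signature for the taken branch;
    # the destination recomputes the on-line signature and compares.
    rs = src.ref.get(dst.block_id)   # None on an illegal transition
    os_ = online_signature(src.block_id, dst.block_id)
    return rs == os_                 # False => control-flow error detected

b1, b2, b3 = Block(1, [2]), Block(2, [3]), Block(3, [])
legal = transfer(b1, b2)    # expected interblock transition
illegal = transfer(b1, b3)  # jump b1 -> b3 was never precompiled
```

The essential point the sketch preserves is that the reference value is attached to the transition, not to the destination block alone.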

Another significant difference from traditional techniques is that the control flow is checked by branch prediction. With our technique, each source block knows in advance the identity of the destination block. When control is transferred to a destination node, the on-line signature is computed and compared



Fig. 4. Checking control flow for a certain transition.

Fig. 5. Checking control flow for a conditional transition.

with the reference signature corresponding to the last branch executed. In order to predict which basic block will be executed, three types of interblock transitions are identified:

• certain transition—a source block transfers the control flow to only one destination block;

• conditional transition—a source block transfers control to one of two basic blocks. The test condition is reevaluated in order to predict which destination block will receive control;

• current state dependent transition—a source block transfers control to one of several destination blocks according to the current system state (e.g., returns from a function). A test on a control variable (EO) is performed to predict the next destination block.

Fig. 4 illustrates how control flow is checked in the case of certain transitions.

In the case of a conditional transition, the test is reevaluated before the branch is taken. This permits predicting which destination block will receive control. According to the value of the test condition, the on-line signature will be compared to the reference signature corresponding to the branch that will be executed. Fig. 5 illustrates how the control flow is checked. Note that different values are assigned to RS on the true and false branches at run time, based on values produced at compile time.

For a current state dependent transition, the source block transfers control to one of several destinations. In order to predict which destination block is the right one, an execution order variable (EO) has been introduced. A test on EO is performed

Fig. 6. Checking control flow for a current state dependent transition.

to predict the next destination block. Fig. 6 illustrates how the control transfer is checked. EO is already assigned when the source node transfers control.

When the DSM technique is applied, some transformations are introduced to the target program in order to check the control flow. These transformations introduce an overhead in terms of increased code size and performance degradation.

In previous work [23], a method combining the DSM technique with a software redundancy method was investigated, and the overhead it puts on a program was characterized. For the DSM technique, the average overhead is a factor of 1.7 on code size and 1.5 on execution time.

III. VALIDATION THROUGH MODEL CHECKING TECHNIQUES

This section describes the conceptual framework of the validation technique based on model checking principles. It aims at formally proving the capacity of the DSM technique to detect errors over a general class of possible applications.

A. Conceptual Framework

In simulation-based fault injection techniques, error scenarios depend on the target architecture and on the assumed fault model. The class of observable errors is typically limited by features of the architecture (e.g., the organization of instructions in memory).

Fig. 7(a) illustrates an example of an error that is not observable under the single bit-flip fault model. Instructions A and B are mutually sensitive in the sense that the program fails if [as shown in Fig. 7(a)] a Single Event Upset (SEU) produces an incorrect jump from one to the other. However, to produce an incorrect jump from physical address A to physical address B, an SEU must satisfy the condition that the XOR distance between the two considered memory locations is one bit. In the example of Fig. 7(a), the illegal jump is not possible: the XOR distance between the addresses of instructions A and B is two bits. In this example, the way instructions are organized in memory prevents this class of errors from being observable. Accordingly, the consequences of such a class of faults cannot be modeled as a single bit-flip.
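The observability condition reduces to a Hamming-distance test on the two addresses, which can be sketched as follows; the addresses used below are illustrative, not those of the paper's Fig. 7.

```python
# XOR distance between two addresses: the number of differing bits.
# Under the single bit-flip fault model, an erroneous jump from
# address a to address b is observable only if that distance is 1.

def xor_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def single_bit_flip_observable(a: int, b: int) -> bool:
    return xor_distance(a, b) == 1

# Fig. 7(a)-like case: addresses differing in two bits, not observable.
not_obs = single_bit_flip_observable(0b1000, 0b0001)
# Fig. 7(b)-like case after shifting: addresses differing in one bit.
obs = single_bit_flip_observable(0b1000, 0b1001)
```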

However, the locations of instructions A and B can be shifted in physical memory, as illustrated in Fig. 7(b). In that case, the



Fig. 7. (a) Example of unobservable error. (b) Example of observable error.

addresses of instructions A and B differ in only one bit. In these circumstances, an SEU can create an illegal jump between the considered instructions, and the incorrect jump can be modeled as a single bit-flip.

Model checking permits the formal description and validation of a model. The proposed model-checking-based approach allows exploring all scenarios where an illegal interblock transition is performed. Our approach does not focus on the causes of SEU errors but on their consequences. Indeed, the execution of an illegal instruction may be caused by an SEU occurring in the PC register, as well as in the instruction register, memory, or even in pipeline stages. Moreover, it may also be caused by multiple SEUs.

For example, an SEU whose physical effect is confined to a single register would only change one bit in the 32-bit PC register, thus inducing only 32 possible illegal destinations. Obviously, multiple SEUs or SEUs affecting multiple bits in a register will induce other possible illegal destinations. Focusing on consequences instead of causes allows capturing all classes of errors for the targeted fault model, since several causes can have the same consequence, as mentioned above.
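The count of 32 destinations follows directly from flipping each bit of the register in turn, as the following sketch shows (the PC value is an arbitrary example):

```python
# Destinations reachable from a single bit-flip in a 32-bit PC:
# flipping each of the 32 bits yields exactly 32 distinct values.

def single_bit_flip_destinations(pc: int, width: int = 32):
    return [pc ^ (1 << bit) for bit in range(width)]

dests = single_bit_flip_destinations(0x0000_1000)
count = len(dests)  # 32 possible illegal destinations
```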

B. Generic Model

Fig. 8 depicts the control graph of the considered generic target program, which is a representative model over a general class of all possible applications. The objective was to verify that the DSM technique is able to detect all illegal interblock transitions. Therefore, the target program must contain all types of interblock transitions (conditional, certain, and state dependent). The adopted control graph meets this requirement; accordingly, the target program can be seen as a possible generic model.

The generic model comprises seven states and a set of 13 interblock transitions. Each state models a basic block that consists of a sequence of assembly-like instructions in which the DSM algorithm is implemented. Since DSM targets errors affecting only the control flow (illegal interblock transitions),

Fig. 8. Considered generic model.

operations performed on the data segment are abstracted from the generic model. In terms of data processing, all basic blocks contain nothing but a single NOP instruction. C1, C2, C3, and C4 are test conditions. Conditional variables are exhaustively modified by the model checking tool during program execution, thus permitting the use of appropriate values in order to explore all execution paths.

The consequence of a SEU does not necessarily involve a modification of the program flow. Indeed, SEUs affecting data that is not used in conditional evaluations can definitely corrupt data without affecting the execution flow. For this reason, the validation we perform only focuses on SEUs resulting in illegal interblock transitions. The consequence of a SEU is one of the following:

• erroneous evaluation of a test condition, thus engendering a mispredicted conditional branching;

• erroneous branching due to corruption of an immediate operand of a branch instruction;

• the address of the next instruction to be executed differs from the expected one. This engenders the execution of another instruction.

C. Validation Principle

The basic principle of model checking consists in ensuring that the model under validation satisfies a desired property. The property, P1, characterizing the DSM technique is:

P1: If an error corrupts the control flow execution, it willultimately be detected.
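P1 is a response property, written in LTL as G(error -> F detected): globally, an error implies eventual detection. Its meaning over a finite execution trace can be sketched in Python as below; this only illustrates the property's semantics, not SPIN's verification algorithm, and the trace encoding is hypothetical.

```python
def satisfies_p1(trace):
    # Check G(error -> F detected) over a finite trace, where each
    # step is a pair (error, detected) of booleans: once an error
    # corrupts the control flow, a detection event must occur at the
    # same step or later.
    for i, (error, _) in enumerate(trace):
        if error and not any(det for _, det in trace[i:]):
            return False
    return True

ok = satisfies_p1([(False, False), (True, False), (False, True)])
bad = satisfies_p1([(False, False), (True, False), (False, False)])
```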

All consequences of a SEU, mentioned in Section III-B, lead to the fact that the next instruction to be executed will differ from the expected one. Hence, modeling SEUs consists in introducing an erroneous transition to a next instruction that differs from the expected one.

Let us consider a program as an oriented graph where nodes represent instructions and edges represent the links between instructions. If an instruction i is followed by an instruction j, then a directed edge will link node i to node j. Given the initial graph representing the program, covering all consequences of SEUs affecting the control flow consists in adding edges from each node to all nodes (including the source itself) such that 1)



Fig. 9. Validation principle.

the resulting graph is complete, and 2) there is a difference between regular and faulty edges, thus permitting to know whether or not a SEU occurred.
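The graph-completion step described above can be sketched as follows; the node names and the tiny three-instruction program are illustrative only.

```python
# Covering all SEU consequences on control flow: starting from the
# program's regular edges, add a "faulty" edge from every node to
# every node (including self-loops), tagging edges so that regular
# and faulty transitions remain distinguishable.

def complete_with_faulty_edges(nodes, regular_edges):
    edges = {(s, d): "regular" for (s, d) in regular_edges}
    for s in nodes:
        for d in nodes:
            edges.setdefault((s, d), "faulty")
    return edges

nodes = ["i1", "i2", "i3"]
regular = [("i1", "i2"), ("i2", "i3")]
g = complete_with_faulty_edges(nodes, regular)
# The resulting graph is complete: 3 x 3 = 9 edges, 2 regular, 7 faulty.
```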

As explained, the generic model presented in Fig. 8 is implemented as a sequence of assembly-like instructions. SEU consequences are implemented in the following manner: at the end of an instruction, once the next instruction has been set for execution, a special random-execution routine is called. This routine modifies the address of the next instruction to be executed, setting it to a random value amongst all instruction addresses. Since the new target address may be the same as the original one, a comparison of the old and new addresses permits knowing whether or not a SEU did occur. The program then continues its execution from the newly determined address. This mechanism ensures that all edges of the graph can be covered, thus permitting consideration of all faulty paths. Implementation details are discussed in Section IV-A.

Part of the SEUs introduced by the random-execution mechanism does not affect the execution flow. Let us consider for instance that the last instruction executed regards data not involved in test conditions. A SEU leading to the immediate repetition of this instruction will obviously lead to incorrect data, but will not corrupt the execution flow of the program.

Following the definition of property P1, validation must focus on consequences leading to a modification of the execution flow. Hence, we must not only identify that a SEU occurred, but also that the flow was corrupted. The detection of a corrupted execution flow is illustrated by Fig. 9.

The reference model, not subject to SEUs, runs in parallel with an erroneous model subject to SEUs. When entering a new block, each model communicates the ID of its current block to the arbiter and waits for a signal to continue its execution. The arbiter waits for the block ID of each model and compares them. If they have the same value, the execution flows of both models still correspond. Signals are then sent to both models to indicate that they can continue their respective execution. If the values differ, the arbiter sets a flag indicating the occurrence of an error in the execution flow. From this point, a model checking tool will verify whether or not the DSM technique satisfies property P1.
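This lockstep arrangement can be sketched compactly; the fragment below is a Python stand-in for the Promela setup of Fig. 9, and the block sequence and the injected jump are hypothetical.

```python
# A reference model and an erroneous model report the ID of each
# block they enter; the arbiter flags the first divergence.

def run(block_ids, seu_at=None, seu_target=None):
    # Yield the IDs of the blocks entered; optionally inject one
    # erroneous jump (a single SEU consequence) at position seu_at.
    for i, b in enumerate(block_ids):
        yield seu_target if i == seu_at else b

def arbiter(reference, erroneous):
    # Compare block IDs pairwise; a mismatch means the execution
    # flows no longer correspond, so the error flag is set.
    for ref_id, err_id in zip(reference, erroneous):
        if ref_id != err_id:
            return True
    return False

path = [1, 2, 4, 5, 7]
no_fault = arbiter(run(path), run(path))
fault = arbiter(run(path), run(path, seu_at=2, seu_target=6))
```

The real setup is richer (the models block on the arbiter's signal before proceeding), but the comparison logic is the same.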

IV. VALIDATION RESULTS

This section presents the experimental setup used in this paper and the validation results obtained. The model checking

Fig. 10. First macro-instruction of first state.

tool used is SPIN [24]. SPIN uses the Promela high-level language [25] to describe systems. It allows checking the logical consistency of a system's specification and can be used to prove whether or not the specification satisfies a given Linear Temporal Logic (LTL) [26] formula. Our experimental setup consists of a system composed of the reference and erroneous models as well as the arbiter. The validation itself consists in formalizing P1 in LTL and using SPIN to verify whether or not this property is satisfied by this system.

A. Implementation of the Framework Under Spin

Blocks of the generic model consist of a sequence of assembly-like instructions. Each block abstracts operations on data segments with a single NOP instruction and implements the DSM detection mechanism.

This generic model is translated into two Promela models: reference and erroneous. Both models consist of the exact set of instructions from which they are derived. Each instruction is represented in Promela as a macro-instruction composed of a set of instructions containing the instruction itself and specific additional instructions used for model-checking purposes. The additional instructions implement:

• message sequence chart [24] generation, used to illustrate counter-example scenarios;

• arbiter synchronization, when the considered original instruction is the first of a block;

• the setting of model checking tool control variables that correspond to source and destination macro-instructions;

• a jump to the next macro-instruction or to the random-execution routine, depending on whether the generated model is the reference or the erroneous model.

Fig. 10 shows the Promela code for the first macro-instruction of state 1. The text of the instruction itself is formatted in bold font. The remaining instructions correspond to the additional instructions enumerated above.
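Since Fig. 10 is not reproduced here, the Python sketch below only approximates the structure of such a macro-instruction; every field name is hypothetical, and the real artifact is Promela code, not a Python record.

```python
# Each original instruction is wrapped with the bookkeeping steps
# enumerated above: MSC generation, arbiter synchronization, control
# variables, the instruction body, and the outgoing jump.

def make_macro(index, instr, first_of_block=False, erroneous_model=False):
    return {
        "trace_msc": f"msc: step {index}",   # message-sequence-chart hook
        "sync_arbiter": first_of_block,      # report block entry to arbiter
        "set_src_dst": (index, index + 1),   # model-checker control variables
        "body": instr,                       # the instruction itself
        # Jump target: the next macro-instruction, or the
        # random-execution routine in the erroneous model.
        "next": "random_execution" if erroneous_model else index + 1,
    }

ref_macro = make_macro(0, "NOP", first_of_block=True)
err_macro = make_macro(0, "NOP", first_of_block=True, erroneous_model=True)
```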

Differences arise between the reference and erroneous models: macro-instructions specific to the DSM mechanism are absent from the reference model. This considerably reduces the interleaving of instruction executions and results in a significant reduction of the computation time without modifying the execution flow of the reference model, which is compatible with the purpose of the synchronizing arbiter.

In addition, the reference model is not subject to SEUs, and it respects the execution flow by executing the macro-instructions in the proper order. The erroneous model is subject to SEUs; thus, at the end of a macro-instruction, a call to the random-execution routine determines the next address to be targeted.

The random-execution routine exhaustively chooses a destination over all possible macro-instructions in the program, thus



implementing all possible consequences of a SEU on the considered model. In an execution path, once a SEU has occurred, no further SEU will occur; the routine will thus go to the expected destination macro-instruction.

The translation into a Promela model consists in constructing the macro-instructions by encapsulating each original instruction with additional ones. The macro-instructions are executed atomically such that their internal instruction sequence is kept transparent from the execution's point of view. Therefore, the resulting Promela model has the exact same behavior as the original assembly-like program. In addition, it enables exploration of all SEU consequences and formal validation of LTL properties via model checking.

The translation phase should not be seen as a limitation. Although it requires basic knowledge of assembly and Promela, this translation is straightforward, as it simply consists in translating instructions into macro-instructions containing specific predetermined instructions, as mentioned above. Promela is easy to learn, and the translation time definitely remains a matter of hours.

The translation phase can be automated by means of a lexical analyzer that would transform assembly programs into a Promela model, while proper scripts could generate the random-execution routine. Such automation is currently under investigation for further automation of the proposed validation methodology. This improvement would raise the proposed approach to the same accessibility level as classic fault-injection techniques with respect to the implementation of the model to test.

B. Formal Validation

The analyzed generic model implements the DSM technique described in Section III-A. The assembly implementation consists of 96 instructions. The Promela model thus consists of 96 atomic macro-instructions composed of sequences of transparent instructions (NOPs). The random-execution routine permits exploration of SEU consequences, i.e., jumps from any macro-instruction to any other. This results in 96 × 96 = 9216 possible consequences to explore.

Let us consider an arbitrary SEU consequence, i.e., a jump from one instruction to another. While enumerating all SEU consequences is possible, since their number is the square of the number of instructions, exploring all execution scenarios that correspond to these consequences cannot be performed in a reasonable time using simulation-based methods. Not only may the number of paths be extremely large, but the existence of loops can lead to infinite exploration [26].
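The square-product bound mentioned above can be made concrete with a back-of-envelope sketch; the helper name below is an assumption, and the figure of 96 instructions comes from the generic model of Section IV-B.

```python
# With n macro-instructions, a single control-flow SEU can divert
# execution from any of the n instructions to any of the n
# instructions, giving n * n consequences to enumerate.
def seu_consequences(n_instructions: int) -> int:
    return n_instructions * n_instructions

count = seu_consequences(96)  # the 96-instruction generic model
```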

Model checking is not based on simulation paradigms. It aims at performing a mathematical verification in order to determine whether or not a given model satisfies a given property. Therefore, it overcomes the limitations of simulation with respect to the exploration of all scenario paths. The SPIN model checker is based on LTL and on graph theory. Given an LTL formula and an FSM model, it is first possible to compute a special FSM representation of the formula. From this, a specific synchronized product of the model's FSM with the formula's FSM makes it possible to determine whether or not the given formula is satisfied by the model. Although such computation is PSPACE-complete [27], it ensures that the result obtained is certain.
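As background, the synchronized-product check described above is usually stated as a language-emptiness condition; the notation below is ours, not the paper's:

```latex
% M is the model's automaton, A_{\neg\varphi} the automaton built
% from the negated LTL formula, and \otimes the synchronized product.
M \models \varphi
\quad\Longleftrightarrow\quad
\mathcal{L}\bigl(M \otimes A_{\neg\varphi}\bigr) = \emptyset
```

In other words, the model satisfies the formula exactly when the product with the negated formula's automaton accepts no execution, which is why a verified result covers all paths rather than a sampled subset.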

The validation task consists in formally verifying that the detection property is satisfied for all execution paths. This property has been formalized in LTL and exercised against the complete model. The validation took about 6 min on a 1.5 GHz Centrino processor using 1.3 GBytes of RAM. The number of bit-flip consequences to explore is the square of the number of instructions in the model. The model-checking computation time depends on the number of consequences and on the scenarios that lead to and derive from those consequences. The number of scenarios grows moderately with the number of consequences. Hence, the model-checking computation time increases at a polynomial rate as instructions are added to the model.

To draw a parallel with simulation-based techniques, the total number of fault scenarios verified is approximately 16 million. Since instructions performed on the data segment are abstracted in the generic program, consequences of SEUs are modeled as:

• SEUs that change the control flow execution;
• SEUs that affect the signature control instructions.

According to our formal validation, the number of fault scenarios in which the DSM technique detects a SEU is approximately the same as the total number of fault scenarios. Only a negligible fraction of the total number of fault scenarios represents SEUs that modify neither the control flow execution nor the signature control instructions. These particular fault scenarios are encountered in the signature assignment instructions introduced by the DSM technique. Note that since these faults do not change the program execution flow, DSM is not designed to catch them, and formal validation confirms that DSM still satisfies the detection property.

V. CONCLUSION

In this paper we presented an original approach for the reliability evaluation of the DSM technique. Simulation-based validation showed that no error escapes detection. However, a main drawback of simulation-based fault injection techniques is the practical impossibility of covering all fault scenarios. The major advantage of model checking over simulation-based fault injection motivated us to develop a new validation approach.

Based on generic models that are independent of the target architecture, the proposed validation approach allows exploring all fault scenarios. In addition, it focuses on consequences instead of causes, thus permitting the capture of all classes of errors relative to a targeted fault model.

As the computation of the synchronized product is PSPACE-complete, the amount of memory needed may be a major limitation for large models. However, the principle of testing error-detection techniques on a generic model considerably reduces the size of the model under validation, thus lowering the memory usage to an affordable amount.

We are currently investigating the complete automation of the formal validation approach presented in this paper.


REFERENCES

[1] E. Normand, “Single event effects in avionics,” IEEE Trans. Nucl. Sci., vol. 43, no. 2, pp. 461–474, Apr. 1996.

[2] “Design,” International Technology Roadmap for Semiconductors, 2003. [Online]. Available: http://public.itrs.net/Files/2003ITRS/Home2003.htm

[3] N. A. Touba and E. J. McCluskey, “Logic synthesis of multilevel circuits with concurrent error detection,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 16, no. 7, pp. 783–789, Jul. 1997.

[4] E. Dupont, M. Nicolaidis, and P. Rohr, “Embedded robustness IPs for transient-error-free ICs,” IEEE Des. Test Comput., pp. 56–70, May–Jun. 2002.

[5] R. Velazco, D. Bessot, R. Eccofet, and S. Duzellier, “Two CMOS memory cells suitable for the design of SEU tolerant VLSI circuits,” IEEE Trans. Nucl. Sci., vol. 6, no. 6, pp. 2229–2234, Dec. 1994.

[6] N. Oh, P. Shirvani, and E. J. McCluskey, “Error detection by duplicated instructions in super-scalar processors,” IEEE Trans. Reliab., vol. 51, no. 1, pp. 63–75, Mar. 2002.

[7] P. Cheynet, B. Nicolescu, R. Velazco, M. Rebaudengo, M. S. Reorda, and M. Violante, “Experimentally evaluating an automatic approach for generating safety-critical software with respect to transient errors,” IEEE Trans. Nucl. Sci., vol. 47, no. 6, pp. 2231–2236, Dec. 2000.

[8] N. Oh, P. P. Shirvani, and E. J. McCluskey, “Control-flow checking by software signatures,” IEEE Trans. Reliab., vol. 51, no. 1, pp. 111–122, Mar. 2002.

[9] G. Miremadi, J. T. J. Ohlsson, M. Rimen, and J. Karlsson, “Use of time, location and instruction signatures for control flow checking,” presented at the DCCA-5 Int. Conf., 1995.

[10] S. S. Yau and F. C. Chen, “An approach to concurrent control flow checking,” IEEE Trans. Softw. Eng., vol. SE-6, no. 2, pp. 126–137, Mar. 1980.

[11] O. Goloubeva, M. Rebaudengo, M. Sonza-Reorda, and M. Violante, “Soft-error detection using control flow assertions,” in Proc. 18th IEEE Int. Symp. Defect and Fault Tolerance in VLSI Systems, Nov. 2003, pp. 581–588.

[12] Z. Alkhalifa, V. S. S. Nair, N. Krishnamurthy, and J. A. Abraham, “Design and evaluation of system-level checks for on-line control flow error detection,” IEEE Trans. Parallel Distrib. Syst., vol. 10, no. 6, pp. 627–641, Jun. 1999.

[13] R. Velazco et al., “Predicting error rate for microprocessor-based digital architectures through C.E.U. injection,” IEEE Trans. Nucl. Sci., vol. 47, no. 6, pp. 2405–2411, Dec. 2000.

[14] R. Velazco, A. Corominas, and P. Ferreyra, “Injecting bit flip faults by means of a purely software approach: A case studied,” in Proc. 17th IEEE Int. Symp. Defect and Fault Tolerance in VLSI Systems, Nov. 2002, pp. 108–116.

[15] D. Gil, R. Martinez, J. V. Busquets, J. C. Baraza, and P. J. Gil, “Fault injection into VHDL models: Experimental validation of a fault tolerant microcomputer system,” in Proc. Dependable Computing EDCC-3, Sep. 1999, pp. 191–208.

[16] B. Parrotta, M. Rebaudengo, M. S. Reorda, and M. Violante, “New techniques for accelerating fault injection in VHDL descriptions,” in Proc. IEEE Int. On-Line Test Workshop, Jul. 2000, pp. 61–66.

[17] E. Jenn, J. Arlat, M. Rimén, J. Ohlsson, and J. Karlsson, “Fault injection into VHDL models: The MEFISTO tool,” in Proc. Fault-Tolerant Computing, FTCS-24, Austin, TX, 1994, pp. 66–75.

[18] M. H. Chehely, M. Gasser, G. A. Huff, and K. Millen, “Verifying security,” ACM Comput. Surv., vol. 13, pp. 279–339, Sep. 1981.

[19] C. E. Landwehr, “Formal models for computer security,” ACM Comput. Surv., vol. 13, pp. 247–278, Sep. 1981.

[20] B. Nicolescu, Y. Savaria, and R. Velazco, “Software detection mechanisms providing full coverage against single bit-flip faults,” IEEE Trans. Nucl. Sci., vol. 51, no. 6, pp. 3510–3518, Dec. 2004.

[21] N. Oh, P. Shirvani, and E. J. McCluskey, “Error detection by duplicated instructions in super-scalar processors,” IEEE Trans. Reliab., vol. 51, no. 1, pp. 63–75, Mar. 2002.

[22] J. G. Holm and P. Banerjee, “Low cost concurrent error detection in a VLIW architecture using replicated instructions,” in Proc. Int. Conf. Parallel Processing, 1992, pp. 192–195.

[23] B. Nicolescu, Y. Savaria, and R. Velazco, “Performance evaluation and failure rate prediction for the soft implemented error detection technique,” in Proc. 10th IEEE Int. On-Line Testing Symp., Madeira Island, Portugal, Jul. 2004, pp. 233–238.

[24] B. Bérard et al., Systems and Software Verification: Model-Checking Techniques and Tools. New York: Springer-Verlag, 2001.

[25] T. Kropf, Introduction to Formal Hardware Verification. New York: Springer-Verlag, 1999.

[26] G. J. Holzmann, The SPIN Model Checker: Primer and Reference Manual. Reading, MA: Addison-Wesley, 2003.