scan chain hold-time violations: can they be tolerated?


IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 6, JUNE 2009 815

Scan Chain Hold-Time Violations: Can They be Tolerated?

Ozgur Sinanoglu and Philip Schremmer

Abstract—Scan chain hold-time violations may occur due to manufacturing defects or to errors in the timing closure process during the physical design stage. The latter type of violations prohibits the test of manufactured chips, leading to a zero yield, although these chips with scan hold-time violations may be perfectly functional. In this paper, we propose a suite of techniques which enable the diagnosis and the tolerance of scan hold-time violations. The proposed diagnosis technique can be utilized for any scan chain hold-time violation in order to pinpoint, in minimal diagnosis application time, the cause of the violation. The proposed tolerance technique is more targeted towards violations that lead to systematic failure of parts; it enables the generation of test patterns to screen out the defective parts in the presence of scan hold-time violations, perfectly restoring the yield in a cost-effective manner. The techniques that we propose are non-intrusive, as they utilize only basic scan capabilities, and thus impose no design changes. We also extend this discussion for fast-to-rise and fast-to-fall errors, intermittent scan hold-time violations, and functional hold-time violations.

Index Terms—Diagnosis, hold-time violations, scan chain errors, tolerance.

I. INTRODUCTION

In very complex or custom macro designs, automation in the generation of timing models does not exist. Timing models are an abstraction of the design indicating timing dependencies for the physical design (PD) flow. Rather than being automated, generating timing models may be a manually intensive process that involves characterization across many corners (variables). Therefore, it is likely that a timing model may be invalid upon creation. Also, due to tool capacity limitations, especially for very large designs, PD and timing analysis typically must be completed hierarchically with timing models of numerous cores. A timing model by itself has a slight inaccuracy, and timing paths that intersect with multiple timing models will have a larger potential deviation. Consequently, in large hierarchical designs that consist of hard macros, the probability of invalid timing model usage increases. Thus, the risk of unknowingly closing chip timing in PD with invalid timing models does exist, potentially resulting in manufactured chips with hold-time violations.

Furthermore, incorrect modeling of certain silicon characteristics, such as clock skew, or overlooking timing constraints in test mode may lead to hold-time violations in this mode. These violations oftentimes remain uncaught, as thorough verification processes, such as timing-annotated simulations, are typically too time consuming, and are thus dropped, especially during the later stages of the tape-out process, due to fast approaching time-to-market deadlines. As a result, hold-time violations on scan paths may show up in manufactured integrated circuits.

Manuscript received December 08, 2007; revised February 21, 2008. First published May 02, 2009; current version published May 20, 2009. A preliminary version of this work was presented at the Design Automation and Test in Europe (DATE) Conference, Nice, France, April, 2007.

O. Sinanoglu is with the Department of Mathematics and Computer Science, Kuwait University, Safat 13060, Kuwait (e-mail: [email protected]).

P. Schremmer is with the QCT Department, Qualcomm, San Diego, CA 92121-1714 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TVLSI.2009.2013984

Other reasons, such as true manufacturing defects, or process variations on a die, may also result in integrated circuits exhibiting scan path hold-time violations. These types of failures, however, manifest sporadically in manufactured chips, resulting in process yield degradations only. The aforementioned physical design errors, namely, the timing-related errors in the back-end design process, result in systematic IC fails on the tester, and thus require special attention, as catastrophic yield loss ensues otherwise.

An interesting observation about scan chain hold-time violations is that they do not interfere with the circuit’s functional operation. Rather, these violations affect the test mode; the impact is the invalidation of the test procedure. Scan chain integrity tests are able to detect scan chain hold-time violations. If such violations are detected, structural automatic test pattern generation (ATPG) patterns cannot be applied subsequently, as they too would fail on all the functional chips with scan chain hold-time violations.

Even if a manufactured chip that has scan hold-time violations is perfectly functional, it cannot be shipped out to customers due to the inability to apply a proper manufacturing test. In the case of systematic failures due to design errors, the outcome is a zero yield. Expensive solutions to this type of problem consist of fibbing, namely, the application of focused ion beams in order to change the material characteristics and fix the problem on the chip, or metal revision, either of which enables the manufacturing test. However, fibbing is a very costly process that requires dedicated equipment. Furthermore, many locations on the die are not feasible fibbing locations. Also, fibbing is a destructive process which requires thinning and removal of silicon. While it is quite a useful tool for debug and failure analysis, it cannot be used for yield recovery.

A more feasible solution is the control and observation of only the scan chains that are known to be error-free, namely, the masking out of the problematic scan chains. Such an approach enables test application; however, significant fault coverage loss may be suffered due to the loss of controllability and observability, resulting in poor test quality. This approach even becomes inapplicable if the scan architecture includes a response compactor with no masking capability. To compensate for the coverage loss due to scan chain masking, functional tests may also be utilized in production testing; the coverage gain with functional tests will be very small, however, as they are not as effective as structural tests applied through scan chains.

1063-8210/$25.00 © 2009 IEEE

In this paper, we propose a suite of techniques mainly for the diagnosis and tolerance of scan chain hold-time violations. The diagnosis part of the proposed suite of techniques can be utilized not only for the hold-time violations that are induced by timing-related design errors in the physical design phase, but also for the hold-time violations caused by manufacturing defects and/or process variations. The particular focus of the paper, however, is on hold-time violations that are induced by timing-related design errors, and their tolerance in addition to diagnosis. We provide cost-effective solutions that enable the generation of valid test data for chips that have scan chain hold-time violations, paving the way for the capability to apply ATPG tests through these hold-time violating chains and to screen out chips that truly fail in mission mode. Slight degradations do occur in test quality, however, due to the controllability and observability loss induced by the scan hold-time violations. The methodology we propose helps identify the chips that are functional in mission mode, which can still be shipped out despite the hold-time violations in the scan chains. The original yield is perfectly restored without resorting to alternative expensive methods, or to scan chain masking techniques that suffer from test quality loss. The maximal possible resolution attained by the proposed diagnosis technique is crucial, since the overall goal is to accurately model the scan chain hold-time violations and to tolerate them during the structural testing of the chips.

For the violations induced by manufacturing defects, the diagnosis part of the proposed suite of techniques can still be utilized in order to pinpoint the failures, and thus improve the manufacturing process. The diagnosis technique that we propose is based on minimizing the diagnosis application time by generating a minimum number of diagnostic patterns with a maximal resolution. Minimum diagnosis application time may become crucial depending on the number of chips identified to have hold-time violations, which can be quite significant, especially during the yield ramp-up phase wherein the process is quite immature. The importance of diagnosis time minimization has been emphasized in [2] and [3] among others, wherein techniques for reducing diagnosis time in particular have been proposed.

Production test patterns can sometimes be utilized for diagnostic purposes as well. In the case of scan chain hold-time violations, however, diagnostics via the application of production test patterns is quite challenging, as the violations render the inserted and the delivered stimuli different. As a result, the captured response differs from the expected response. Also, during the shift-out process, the captured response is further modified due to the violations, resulting in quite a complex relationship between the observed response and the expected response; it becomes quite challenging to decipher this complex relationship and pinpoint the location of the violations. In addition to these diagnostic challenges, the production test set size may be much larger than that of a test set targeted towards differentiating the violations, resulting in prolonged diagnosis application.

A preliminary version [1] of this work has been presented earlier, wherein the discussion has been limited to the diagnosis and tolerance of permanent hold-time violations. In this paper, we provide extensions in many directions. First, we provide an improvement for the diagnostic pattern generation process in order to minimize the diagnosis application time, and the associated costs. Second, we extend the discussion in our earlier work to also handle other timing-related violations, such as fast-to-rise and fast-to-fall errors in scan chains. For these types of violations, we discuss how they can be diagnosed and tolerated. Third, we provide a discussion on intermittent hold-time violations on scan paths. Intermittent hold-time violations can be diagnosed by the proposed techniques, while their intermittent nature renders their tolerance possible only under certain assumptions. We thus provide an analysis regarding the use of the proposed technique in diagnosing intermittent faults, and explicitly present a set of assumptions that must hold for tolerating them. Finally, we further extend the analysis to hold-time violations that impact the mission mode of the circuit, and cover diagnosis techniques for handling these violations when they manifest along with scan chain hold-time violations. In all these techniques, the existing scan capabilities are utilized rather than any design changes or improvements.

To summarize, we provide the following suite of techniques:

• a diagnosis technique capable of pinpointing hold-time violating scan cells in minimal diagnosis application time;

• incorporating, in the form of a netlist modification, the impact of the accurately identified hold-time violations, in order to be able to generate hold-time tolerant test patterns and responses, which can be utilized in production test on chips with scan chain hold-time violations;

• extending the diagnosis, modeling and tolerance techniques for fast-to-rise and fast-to-fall errors in scan chains;

• extending the discussion for intermittent and functional hold-time errors.

The remainder of this paper is organized as follows. In Section II, we provide a brief literature survey on scan chain diagnosis. In Section III, we take a closer look into scan chain hold-time violations, and present an analysis that captures their manifestation behavior. Subsequently in Section IV, by using the outcome of the analysis of Section III, we present the proposed diagnosis procedure. In Section V, we outline how the diagnosed hold-time violations can be modeled and tolerated. Section VI extends this discussion to other types of timing errors, namely, fast-to-rise and fast-to-fall errors in scan chains. We elaborate on the challenging aspect of intermittent scan hold-time violations in Section VII, while we cover in Section VIII the impact of functional hold-time violations on the diagnosis process. Finally, in Section IX, the experimental results are presented.

II. PREVIOUS WORK

A significant amount of research work has been conducted in the area of scan chain diagnosis. Most of the work in the literature has focused on sporadic defects rather than systematic failures, however; defects that exhibit a stuck-at, transition, or hold-time violation behavior have constituted the underlying fault model in most of the papers that have been published in this area.

Various techniques [4]–[7] have been based on improving the scan capabilities at the expense of increased area overhead so as to attain better scan chain diagnosis. Improved scan capabilities consist of set/reset and toggle features added to scan cells by inserting multiplexers or XOR gates on the scan path. The area cost incurred by these approaches limits their practicality.

Kundu proposes the utilization of the functional path so as to justify the scan cells to deterministic values [8], thus eliminating the ambiguity induced during scan-in operations in the defective scan chain. Sequential ATPG techniques are incorporated into this technique.

A technique that generates diagnostic patterns in order to differentiate scan cell faults has been proposed in [9], [10]. In these patterns, a single hold-time error is sensitized into a scan cell by inserting a stimulus with a single transition into the corresponding chain. All the other problematic chains are filled in with constant values. The error-free scan chains are filled in with deterministically generated data that enables the propagation of the hold-time error through functional logic into a cell in an error-free chain. As a single pattern needs to be generated for each scan cell in all the faulty chains, the resulting size of the diagnostic pattern set may be too large. Furthermore, multiple hold-time errors within the same chain complicate the analysis, as the generation of diagnostic patterns for a hold-time error may have to account for the other ones.

Simulation and scoring-based approaches [11]–[14] have also been proposed for diagnosing scan chain failures. Based on a certain underlying fault model, individual faults are simulated and scored by comparing against expected responses. These techniques produce a range of scan cells as the suspect defective scan cells, subsequent to extensive fault simulation.

In another technique proposed in [15], numerous functional pattern sequences, with each sequence followed by scan-out operations, are applied to a chip. For every scan cell, the frequency of 1-bits scanned out is computed and plotted. The plotted graph is compared to the expected graph in order to pinpoint the problem on the scan path.

A differential dictionary-based approach is utilized in [16], wherein an observation regarding adjacent cells producing similar scan-out response signatures is utilized in order to reduce the size of the dictionary. The diagnosis process becomes slightly complicated, however, due to the need to reconstruct the required signature out of differential ones.

In [17], the impact of scan hold-time errors is modeled, and a diagnosis procedure is described. The technique overlooks the impact of the hold-time violations on the stimulus insertion process, however, resulting in an incomplete solution.

Our approach can be differentiated from the previously proposed techniques on many fronts. First, we aim at handling scan chain hold-time violations that result in systematic failures, in addition to those induced by manufacturing defects. Thus, we propose a suite of techniques wherein diagnosis constitutes only the first step; the subsequent step is the modeling and the tolerance of the accurately identified hold-time violations, enabling the generation of test patterns that can be utilized in production testing of the manufactured chips with scan hold-time violations. To the best of our knowledge, this latter step (modeling and tolerance) of the proposed approach is completely novel.

Fig. 1. Scan chain hold-time violation.

Even the hold-time violation diagnosis step of the proposed approach differs from the previously proposed techniques. A careful analysis of the scan hold-time violation shows that these errors need not be sensitized via the insertion of special patterns as in [9], [10]. On the contrary, their diagnosis is simplified with stimuli that ensure their insensitization, as the scan-out of the captured responses may potentially sensitize them anyway. A much more compact set of diagnostic patterns with a perfect resolution can hence be generated based on this observation, resulting in a computationally efficient scheme that requires no more than basic scan and ATPG capabilities.

III. UNDERSTANDING THE IMPACT OF SCAN CHAIN HOLD-TIME VIOLATIONS

In order to cope with scan chain hold-time violations, their impact needs to be studied and understood. In simple terms, a hold-time violation occurs if the intended data input changes before the active clock edge hits the flip-flop; this typically happens due to clock skew. In the context of scan shift operations, a hold-time violation results in the violator scan cell acting as the shadow of its preceding scan cell. As a result, the scan chain becomes effectively one bit shorter due to the hold-time violation.

A timing diagram is provided in Fig. 1 for a scan chain fragment of four positive-edge scan cells; the third scan cell from the left is a hold-time violator. While the other three scan cells latch their scan input on time, the third scan cell latches its scan data late. As a result, the scan data that gets latched into the second scan cell also gets latched into the third scan cell erroneously; the third cell acts as a shadow of the second throughout the shift operations. The violation reduces the effective length of the fragment by one bit; the scan data in the first scan cell reaches the fourth scan cell in two cycles due to the hold-time violation, while this operation normally takes three cycles in a scan chain fragment of four cells.
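The shadowing behavior can be mimicked with a very small behavioral model. The sketch below is only an illustration under stated assumptions: cells are indexed from the scan-in side, the function name `shift` and the symbolic bit labels are hypothetical, and the cell next to the scan-in pin is assumed never to be a violator.

```python
def shift(state, scan_in, violators):
    """Return the chain contents after one shift cycle.

    state[0] is the cell next to the scan-in pin, state[-1] the cell next to
    the scan-out pin. A violator latches late, so it picks up the value its
    predecessor has just latched rather than the predecessor's old value.
    Assumption: index 0 is never a violator.
    """
    new = [scan_in]
    for i in range(1, len(state)):
        new.append(new[i - 1] if i in violators else state[i - 1])
    return new


# Fragment of four cells; the third cell from the left (index 2) is the violator.
fragment = ["a", "b", "c", "d"]
after_one = shift(fragment, "x", violators={2})
after_two = shift(after_one, "y", violators={2})
print(after_one)  # ['x', 'a', 'a', 'c']
print(after_two)  # ['y', 'x', 'x', 'a']
```

In this model the bit "a" held by the first cell reaches the fourth cell after two shift cycles instead of three, and the bit "b" is lost.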

Scan chain hold-time violations have implications that perturb both the stimulus insertion and the response collection processes.

A. Disparity Between Captured and Observed (Scanned-Out) Responses

During the capture cycle, the scan cells latch their D-input; consequently, the hold-time violations on the scan path have no impact on the response capture operation. During the first shift cycle subsequent to the capture cycle, the response bit captured by the violator’s preceding cell gets overwritten upon being shifted into the violator cell. Thus, in this cycle, the same response bit shifted into the violator’s preceding cell is also copied into the violator cell. This bit-skipping impact results in a disparity between the observed and the captured responses. As the effective length of the scan chain is shorter due to the scan hold-time violations, each response bit captured in a scan flop between the hold-time violating flop and the scan-in pin is scanned out one cycle earlier than expected, with the exception of the skipped bit. The response bits captured in the scan flops between the hold-time violating cell and the scan-out pin, including the hold-time violating scan cell, are scanned out intact.

Fig. 2. Captured versus observed responses.

Fig. 2 illustrates a scan chain with five scan cells, with the middle cell a hold-time violator. In this case, the second captured response bit from the left is skipped, and the leftmost captured response bit is observed one cycle earlier than expected. The last bit observed is the leading bit of the subsequent stimulus.

The disparity between the captured and observed response bits should be analyzed to drive the diagnosis process; specifically, the skipped captured response bits can be identified, pinpointing the scan cells that precede the violators. However, as the hold-time violations on the scan path also impact the stimulus insertion process, a disparity between expected and captured responses exists as well. Consequently, the relation between the expected and the observed responses is far more complicated than bit-skipping.

B. Disparity Between Inserted (Intended) and Delivered Stimuli

As a hold-time violator behaves as a shadow of its preceding scan cell, these two cells together act like a single-bit storage unit, shortening the scan chain length by one bit. Such an impact introduces a disparity between the inserted (or intended) stimulus and the delivered stimulus. Upon the completion of the shift cycles in a scan chain with a hold-time violation, the scan cells between the violator and the scan-in pin all receive the intended stimulus bits, while each one of the remaining cells receives the stimulus bit that is supposed to be delivered into its preceding cell. The resulting impact can be summarized as the falling out of the leading stimulus bit, and the repetition of the stimulus bit, which is delivered into the violator’s preceding cell, twice in the delivered stimulus bit sequence.
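The five-cell example can be replayed with a short behavioral sketch. It assumes cells are indexed from the scan-in side (leftmost cell is index 0), that the scan-out pin is sampled before each shift, and that the next stimulus is shifted in while the response is shifted out; the names `shift`, `scan_out`, and the labels r0..r4 and s0..s4 are hypothetical.

```python
def shift(state, scan_in, violators):
    """One shift cycle; a violator copies the value its predecessor just latched."""
    new = [scan_in]
    for i in range(1, len(state)):
        new.append(new[i - 1] if i in violators else state[i - 1])
    return new


def scan_out(captured, next_stimulus, violators):
    """Return the bits seen at the scan-out pin, one per shift cycle."""
    state = list(captured)
    observed = []
    for bit in next_stimulus:
        observed.append(state[-1])  # sample the pin, then shift
        state = shift(state, bit, violators)
    return observed


captured = ["r0", "r1", "r2", "r3", "r4"]        # r0 is the leftmost cell
next_stimulus = ["s0", "s1", "s2", "s3", "s4"]   # s0 is shifted in first
print(scan_out(captured, next_stimulus, violators={2}))
# ['r4', 'r3', 'r2', 'r0', 's0']
```

The bit r1, captured by the violator's preceding cell, never reaches the pin; r0 arrives one cycle early, and the final observed bit is the leading bit of the next stimulus.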

Fig. 3. Inserted versus delivered stimuli.

Fig. 3 illustrates the stimulus insertion into a scan chain of five scan cells, with the middle cell a hold-time violator. In this case, the leading bit of the inserted stimulus vector falls off the scan chain, which becomes effectively four bits long due to the violation. Furthermore, the bit that the violator’s preceding cell receives is also delivered into the violator.

The disparity between the inserted and the delivered stimuli translates into a disparity between the expected and the captured responses. Thus, a simple bit-skipping impact can no longer account for the relationship between the expected and the observed responses.
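A minimal sketch of the stimulus insertion follows, under the same assumed behavioral model: cells are indexed from the scan-in side, the bit intended for the cell nearest the scan-out pin is shifted in first, and the chain initially holds unknown values. The names `shift`, `scan_in`, and the labels t0..t4 are hypothetical.

```python
def shift(state, scan_in_bit, violators):
    """One shift cycle; a violator copies the value its predecessor just latched."""
    new = [scan_in_bit]
    for i in range(1, len(state)):
        new.append(new[i - 1] if i in violators else state[i - 1])
    return new


def scan_in(intended, violators):
    """Shift in a stimulus; intended[i] is the bit meant for cell i."""
    state = ["?"] * len(intended)          # unknown initial contents
    for bit in reversed(intended):         # leading bit (for the last cell) goes first
        state = shift(state, bit, violators)
    return state


intended = ["t0", "t1", "t2", "t3", "t4"]
print(scan_in(intended, violators={2}))
# ['t0', 't1', 't1', 't2', 't3']
```

The leading bit t4 falls off the chain, and t1, the bit delivered into the violator's preceding cell, appears twice in the delivered stimulus.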

IV. DIAGNOSIS OF SCAN CHAIN HOLD-TIME VIOLATIONS

This procedure is a four-stage process applied on the chips that have failed the scan chain integrity tests. The individual steps are listed as follows.

• Identification of the number of hold-time violations in each chain: The first stage of the proposed procedure aims at verifying that the chip failure is indeed due to scan hold-time violations. A sequence of 0’s is inserted into each failing scan chain for as many cycles as the number of scan cells $L$ in the chain, in order to fill the entire scan chain with 0’s. Subsequently, all 1’s are inserted while the scan-out pin is observed. In a chain with $h$ hold-time violating cells, the first 1-bit should be observed during the $(L-h+1)$th cycle; furthermore, a total of $h$ consecutive 1’s should be observed within the $L$ shift cycles. In order to rule out the possibility of stuck-at defects, the same procedure is applied by replacing 0’s with 1’s and vice versa. At the end of the first stage, not only is the scan hold-time violation behavior of the chip verified, but the number of such violations on every chain is also identified.

• Generation and application of a minimal set of diagnostic patterns: As many diagnostic patterns as it takes to attain a perfect diagnostic resolution should be applied in this stage. The observed responses are stored for analysis.

• Pinpointing hold-time violator cells per pattern;

• Intersection of candidate scan cell sets.

The following sections provide a detailed explanation for each of the last three steps above.
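The first stage can be sketched on a behavioral model as follows. This is an illustration under assumptions, not tester code: `shift`, `flush`, and `count_violations` are hypothetical names, the chain length and violator positions are made up, and the scan-out pin is assumed to be sampled before each shift.

```python
def shift(state, scan_in, violators):
    """One shift cycle; a violator copies the value its predecessor just latched."""
    new = [scan_in]
    for i in range(1, len(state)):
        new.append(new[i - 1] if i in violators else state[i - 1])
    return new


def flush(length, violators, fill, flush_bit):
    """Start from a chain full of `fill`, shift in `flush_bit` for `length`
    cycles, and record the scan-out pin before each shift."""
    state = [fill] * length
    observed = []
    for _ in range(length):
        observed.append(state[-1])
        state = shift(state, flush_bit, violators)
    return observed


def count_violations(observed, fill, flush_bit):
    """Number of flush bits that arrived early, or None when the response is
    not a run of fill bits followed by a run of flush bits."""
    count = observed.count(flush_bit)
    expected = [fill] * (len(observed) - count) + [flush_bit] * count
    return count if observed == expected else None


zeros_then_ones = flush(5, violators={1, 3}, fill=0, flush_bit=1)
ones_then_zeros = flush(5, violators={1, 3}, fill=1, flush_bit=0)
print(zeros_then_ones)                           # [0, 0, 0, 1, 1]
print(count_violations(zeros_then_ones, 0, 1))   # 2
print(count_violations(ones_then_zeros, 1, 0))   # 2
```

With five cells and two violators, the first 1 appears in the fourth cycle and two 1's are observed in total. Agreement between the two complementary runs is what distinguishes this signature from a stuck-at chain, which would show a constant scan-out value in both runs.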

A. Generation of a Minimal Set of Diagnostic Patterns

While the diagnosis of the hold-time violating scan cells can be driven by a comparison between the expected responses and the observed responses, this process is complicated by the fact that inserted stimuli differ from the delivered ones. Thus, stimuli that are immune to any scan path hold-time violation should be used in this diagnosis process; a pattern that is immune to hold-time violations is one that is delivered intact despite any hold-time violations. A stimulus of all identical bits (all 0’s or all 1’s) is hold-time immune; the repetition of a bit in two scan cells, which are expected to receive identical bits anyway, does not create any discrepancy between the inserted and the delivered vectors when a stimulus of all identical bits is utilized.

Even with inversions on the scan path, such a stimulus is hold-time immune. Insertion of all identical bits is effected by holding the scan-in pin at a constant logic value throughout the shift cycles. In the case of a chain with scan hold-time violations, the resulting impact is the filling in of the chain earlier than expected, yet with the expected values. If an inverter exists between a violator and its preceding cell, they receive complementary values, which is also the expected case in the absence of a hold-time violation. If no inverter exists between a violator and its preceding cell, they always receive identical values, which is again the expected case in the absence of a hold-time violation. Consequently, the delivered vector will be identical to the intended stimulus. Thus, the application of hold-time immune patterns results in identical expected and captured responses, simplifying the disparity between the expected and observed responses to simple bit-skipping.
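The immunity argument can be checked exhaustively on a small behavioral model. The sketch below is an assumption-laden illustration: `shift`, `load`, and `is_immune` are hypothetical names, `inverted` holds the indices of cells that have an inverter on their scan input, the chain starts from all 0's, and the cell next to the scan-in pin is assumed never to be a violator.

```python
from itertools import combinations


def shift(state, scan_in, violators=frozenset(), inverted=frozenset()):
    """One shift cycle with optional inverters on the scan path."""
    new = [scan_in]
    for i in range(1, len(state)):
        src = new[i - 1] if i in violators else state[i - 1]
        new.append(src ^ 1 if i in inverted else src)
    return new


def load(bits, length, violators=frozenset(), inverted=frozenset()):
    """Shift `bits` (in shift order) into a chain that starts at all 0's."""
    state = [0] * length
    for bit in bits:
        state = shift(state, bit, violators, inverted)
    return state


def is_immune(bits, length, inverted=frozenset()):
    """True if every non-empty set of violators delivers the fault-free vector."""
    golden = load(bits, length, inverted=inverted)
    cells = range(1, length)
    return all(
        load(bits, length, set(v), inverted) == golden
        for k in range(1, length)
        for v in combinations(cells, k)
    )


print(is_immune([1] * 5, 5, inverted={2}))   # True: constant scan-in, one inverter
print(is_immune([1, 0, 1, 1, 0], 5))         # False: transitions get corrupted
```

Holding the scan-in pin constant passes the check for every violator placement tried here, with or without inverters, while a stimulus containing transitions does not.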

The number of all possible hold-time violation immune test vectors is determined by the number of chains with hold-time violations and the number of scan cells in the scan chains with no hold-time violations. A series of identical bit values needs to be shifted into the scan chains with hold-time violations, while the scan cells in the remaining chains can be freely filled in with arbitrary values. For $N$ scan chains with $m$ of them having hold-time violations, and for a scan depth of $d$, each of the $m$ chains can be filled in with either all 0’s or all 1’s, resulting in $2^{m}$ possibilities. Furthermore, each of the scan cells ($(N-m)d$ of them) on error-free scan chains can be filled in arbitrarily, resulting in $2^{(N-m)d}$ possibilities. Consequently, $2^{m+(N-m)d}$ hold-time immune stimuli exist. Clearly, a minimum number of these should be selected in order to minimize the diagnosis time.

A hold-time immune pattern delivers heightened resolution of violations if the corresponding captured response bits are as alternating as possible. On one extreme, a single hold-time immune pattern that leads to a captured response of “0101…01” or “1010…10” provides a perfect diagnostic resolution for a hold-time violation in any position in the scan chain.

It is quite unlikely that a single hold-time immune pattern leads to a perfectly alternating captured response. Thus, a minimal set of hold-time immune patterns that provides a perfect diagnostic resolution should be identified. This way, the time expended on the tester is minimized for the diagnosis of the hold-time violating scan cells.

In order to attain perfect differentiation of any possible scan hold-time violation, every pair of consecutive captured response bits should have complementary values for at least one hold-time immune pattern; complementary values in consecutive response bit positions help pinpoint violations, as the location of bit-skipping is easily identified in such a case. At the same time, we would like a minimal number of hold-time immune patterns to deliver such a capability.

Fig. 4. Forcing ATPG tool to generate a minimal set of hold-time immune patterns.

We map this computationally hard problem to the ATPG problem, wherein we rely on the test pattern compaction capabilities of the ATPG tool. We force the ATPG tool to generate a minimal set of hold-time immune patterns by providing the tool with a modified netlist and a small fault list. It should be noted that this modified netlist is solely utilized for the purpose of hold-time immune pattern generation.

We incorporate two sets of modifications into the netlist. First, we add new primary outputs driven by XOR gates; the lines that drive the D-input of every pair of consecutive scan cells are XORed together to feed a newly added primary output (PO) as in Fig. 4. Thus, the number of newly added POs (or XOR gates) equals one less than the number of scan cells in the hold-time violating scan chain; all of the other scan chains remain intact. Second, we connect the scan-in pin of the problematic chain to the scan-in ports of all the scan cells on this chain.

The ATPG tool is provided with the modified netlist along with a fault list that consists of only the stuck-at-0 faults on the newly added POs. In order to detect the stuck-at-0 fault on a newly added PO, the ATPG tool has to force the two inputs of the corresponding XOR gate to complementary values, thus capturing complementary values in consecutive scan cells of the problematic chain. Furthermore, in doing so, the problematic chain will receive all identical bits. As a result, the ATPG tool is forced to generate a set of hold-time immune patterns. If all the stuck-at-0 faults are detected, these patterns ensure a perfect diagnostic resolution. Furthermore, by switching on the compaction options of the ATPG tool, the size of this set of patterns can be minimized.
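The coverage criterion that the added XOR outputs enforce can be illustrated outside of any ATPG tool. The following sketch is not the ATPG-based flow described above: it substitutes a plain greedy set-cover heuristic for the tool's compaction, and the names `covered_pairs` and `greedy_select` as well as the sample responses are hypothetical. It merely checks which adjacent cell pairs capture complementary values and picks responses until every pair is covered.

```python
def covered_pairs(response):
    """Indices i such that captured bits i and i+1 are complementary."""
    return {i for i in range(len(response) - 1) if response[i] != response[i + 1]}


def greedy_select(responses):
    """Greedy set cover over adjacent-pair indices (a stand-in heuristic,
    not the compaction performed by an ATPG tool).

    Returns (indices of chosen responses, pair indices left uncovered).
    """
    uncovered = set(range(len(responses[0]) - 1))
    chosen = []
    while uncovered:
        best = max(range(len(responses)),
                   key=lambda k: len(covered_pairs(responses[k]) & uncovered))
        gain = covered_pairs(responses[best]) & uncovered
        if not gain:            # no remaining response helps; resolution is imperfect
            break
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical captured responses of three immune patterns on a 6-cell chain.
responses = ["011001", "001100", "000111"]
print(greedy_select(responses))
```

An empty set of uncovered pairs corresponds to all stuck-at-0 faults on the added POs being detected, that is, to perfect diagnostic resolution.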

B. Expected-Observed Response Disparity Analysis

The generated hold-time immune patterns are applied to the chips and the observed responses are collected for diagnosis. The skipped bits that are identified upon the comparison of expected and observed response vectors (corresponding to immune patterns) correspond to the cells that precede the hold-time violators.

We provide two algorithms for identifying the set of hold-time violator scan cells. The first algorithm is a linear complexity algorithm that fails in certain corner cases, while the second algorithm is of quadratic complexity and can handle any corner case. The first algorithm may pinpoint a wrong set of scan cells as the violating cells in the case of consecutive hold-time violating scan cells. It should be noted, however, that this is a highly unlikely case, as for two consecutive scan cells to be both hold-time violating, the violation on the succeeding scan cell must be a gross violation. The clock arrival to the succeeding scan cell should occur later than the data arrival from the scan-out port of the second preceding scan cell; only a clock skew that is greater than the sum of two scan-out to scan-in path delays may result in such a scenario, which is quite unlikely.

In the first algorithm, a bit by bit comparison is effected between the expected and the observed responses starting from the first bit that is shifted out. The first bit location wherein there is a mismatch hints at the location of the potentially hold-time violating scan cell; the scan cell corresponding to the mismatch is potentially the preceding cell of the violator. It is important to note that there may be multiple scan cells that can account for the same hold-time violation; this happens when consecutive bits of the expected response are identical. In this case, the skipping of any one of these consecutive identical bits results in the bit mismatch. All the associated cells are thus included in the set of candidate hold-time violating scan cells.

The comparison continues with a single bit offset; each expected response bit is compared against the response bit scanned-out one cycle earlier. Every time a mismatch occurs, a hold-time violating scan cell is identified, and the offset is increased by one. The algorithm terminates when all the expected response bits are compared against the appropriate observed response bits. The execution of this algorithm is illustrated in Fig. 5; first, the right-most bits of the expected and the scanned-out responses are compared, which is a match in this example. Similarly, the comparison of the second right-most bits results in a match as well. A mismatch occurs, however, when the third response bits are compared; starting from that point on, expected response bits and scanned-out response bits are compared with a single bit offset. The fourth right-most expected response bit matches the third right-most bit of the scanned-out response, while there is a mismatch between the fifth right-most bit of the expected response and the fourth right-most scanned-out response bit. Since this is the second bit mismatch, the offset is incremented by one; thus, the left-most bit of the expected response is compared against the third left-most bit of the scanned-out response, which is a match. The algorithm terminates by reporting mismatches in the third and the fifth right-most bits of the expected response. As the third and the second right-most bits of the expected response are identical, the skipping of either bit accounts for the bit mismatch. Thus, either one of the second or the third right-most cells must be the scan cell that precedes the first hold-time violating scan cell; the first hold-time violator should either be the first or the second scan cell from the right. As the fifth right-most bit in the expected response is different from the fourth right-most bit, the fifth scan cell is the only candidate preceding scan cell; the second hold-time violator should be the fourth scan cell, consequently.

Fig. 5. Pinpointing hold-time violating cells.
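The offset-based comparison walked through above can be sketched as follows. This is a minimal illustration and not the authors' C implementation: the function name `locate_skipped_bits` is hypothetical, responses are given as strings in shift-out order (index 0 leaves the chain first, so the right-most bit of the figure is index 0), and the bit values are chosen only to reproduce the match/mismatch sequence of the example, since the actual values of Fig. 5 are not available here.

```python
def locate_skipped_bits(expected, observed):
    """Linear-time comparison with a growing offset.

    Both strings are in shift-out order. Returns one candidate list per
    detected skip; each list holds the indices of expected bits whose
    skipping would explain the mismatch (a run of identical bits).
    The violator candidates are the cells one position closer to scan-out.
    An 'X' in `expected` is treated as a match.
    """
    offset = 0
    last_mismatch = -1
    candidates = []
    for i, exp_bit in enumerate(expected):
        obs_bit = observed[i - offset]
        if exp_bit == "X" or exp_bit == obs_bit:
            continue
        # Expected bit i (or an identical neighbour shifted out just before it)
        # was skipped; extend the candidate run back over identical bits.
        start = i
        while start - 1 > last_mismatch and expected[start - 1] == exp_bit:
            start -= 1
        candidates.append(list(range(start, i + 1)))
        last_mismatch = i
        offset += 1
    return candidates


# Mismatches at the third and fifth bits out, as in the walk-through above.
print(locate_skipped_bits("011010", "010000"))
```

For this input the first skip has two candidate preceding cells (the second and third bits out are identical), while the second skip has a single candidate, mirroring the outcome described for Fig. 5.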

While the computational complexity of this algorithm is linear in the number of scan cells, the algorithm may fail to handle certain corner cases, wherein hold-time violations occur in consecutive scan cells. In such a case, the algorithm, in all likelihood, will point to a number of hold-time violations that is different than expected, in which case the malfunctioning of the algorithm will be detected and the second algorithm should be utilized instead. For instance, if the expected response is “010101” and the scanned-out response of the chain that is known to have two hold-time violations is “110101”, the first algorithm will fail to pinpoint two hold-time violating cells. Actually in this case, either the second and the third right-most cells or the second and the third left-most cells are hold-time violating, which the first algorithm fails to identify.

The second proposed algorithm is based on a dynamic programming technique that is commonly known as the longest common subsequence problem. The expected and the scanned-out responses are taken through this analysis so as to pinpoint the hold-time violating scan cells. Its complexity is quadratic.
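A longest-common-subsequence alignment of the two responses can be sketched as below. The paper does not spell out its exact formulation, so this is only one plausible reading: the function name `lcs_skipped_bits` and the optional `fill_bits` parameter (the number of trailing observed bits assumed to come from the scan-in pin rather than from the captured response) are assumptions, strings are in shift-out order, and the backtracking returns just one optimal alignment even when several exist.

```python
def lcs_skipped_bits(expected, observed, fill_bits=0):
    """Quadratic-time LCS alignment; returns indices of expected bits that
    are left unmatched (i.e., presumed skipped) in one optimal alignment."""
    if fill_bits:
        observed = observed[:len(observed) - fill_bits]
    n, m = len(expected), len(observed)

    def same(i, j):
        return expected[i] == "X" or expected[i] == observed[j]

    # table[i][j] = LCS length of expected[i:] and observed[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if same(i, j):
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    skipped, i, j = [], 0, 0
    while i < n and j < m:
        if same(i, j):
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            skipped.append(i)       # expected bit i has no counterpart
            i += 1
        else:
            j += 1
    skipped.extend(range(i, n))     # expected bits never reached
    return skipped


print(lcs_skipped_bits("011010", "010000"))
# The "010101" / "110101" corner case, written in shift-out order, with the
# two trailing observed bits treated as scan-in fill (an assumption).
print(lcs_skipped_bits("101010", "101011", fill_bits=2))
```

Because alignments are generally not unique, a fuller implementation would enumerate all optimal alignments to build the candidate sets; the sketch reports a single one.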

It should also be noted that both algorithms are capable of handling unknown bits in the expected responses. Bit comparisons are still effected in the same manner as outlined above; a bit match is assumed whenever an unknown bit in the expected response is compared against a bit in the scanned-out response.

C. Candidate Set Refinement

Based on the response analysis outlined above, the application of every hold-time immune pattern pinpoints a number of candidate scan cells for each violator. Identical consecutive bits in an expected response vector lead to multiple candidate cells for a single violator, as the skipping of any one of these identical bits yields the same observed response.

To attain a perfect diagnostic resolution, the candidate sets pointed to by the application of each hold-time immune pattern should be intersected; the intersection operations must be effected for each violator. A singleton set for each violator signifies perfect diagnostic resolution.
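The per-violator intersection can be written in a few lines. The sketch below is illustrative: the function name `intersect_candidates` and the sample candidate sets are hypothetical and are not the values of Fig. 6.

```python
def intersect_candidates(per_pattern):
    """per_pattern[p][k] is the candidate cell set for the k-th violator
    as reported by pattern p; returns the intersected set per violator."""
    result = [set(cands) for cands in per_pattern[0]]
    for pattern_sets in per_pattern[1:]:
        for k, cands in enumerate(pattern_sets):
            result[k] &= set(cands)
    return result


# Two patterns, two violators (hypothetical candidate sets).
final = intersect_candidates([
    [{1, 2}, {4}],       # pattern 1
    [{2, 3}, {4, 5}],    # pattern 2
])
print(final)
print(all(len(s) == 1 for s in final))   # singleton sets: perfect resolution
```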

The intersection operations are illustrated in Fig. 6. In this example, as there are two patterns, and two hold-time violators, two intersection operations are effected, one for each hold-time violator. Both hold-time violators are identified with perfect diagnostic resolution, as a single scan cell exists in the two final candidate hold-time violator sets.

Fig. 6. Identification of hold-time violating cells via set intersection operations.

V. TOLERANCE OF SCAN CHAIN HOLD-TIME VIOLATIONS

The accurate pinpointing of the hold-time violating scan cells enables the tolerance of these violations in a production environment. Scan chain hold-time violations can be tolerated when perfect diagnostic resolution is attained by the proposed diagnosis procedure. The judicious generation of the hold-time immune patterns, which we have discussed in the previous section, helps achieve this goal. Once the violations are accurately pinpointed by utilizing the techniques we have presented in the previous section, the goal is the generation of a proper manufacturing test set that, despite these violations, helps screen out defective chips, which we present in this section.

The impact of pinpointed hold-time violations can be communicated to the ATPG tool via the modification of the netlist, on which the ATPG tool is to be executed. Should the netlist modifications capture the impact of hold-time violations on stimulus insertion and response collection, which we have discussed in Section III, the ATPG tool provides a set of test patterns and responses which will be valid in the presence of these violations. The resulting test data can thus be utilized in production. The only negative implication is a slight degradation in fault coverage in testing the chips with scan hold-time violations, as controllability and observability are slightly hampered due to the scan hold-time violation.

The netlist modification should be in the form of leaving the scan-out pin of the violator’s preceding scan cell unconnected, and of shorting the scan-in pins of the violator and the preceding scan cells together as in Fig. 7; such a modification captures the impact of the violation on test application. The same modification should be repeated for every identified hold-time violation to obtain a perfect match between the netlist and the manufactured silicon. This way, the ATPG tool becomes aware of the violation, reporting an accurate fault coverage for the chips with scan chain hold-time violations and delivering test data that will be valid for these chips.

An ATPG tool executed on a netlist that is modified in the aforementioned manner produces test vectors and expected responses that will be identical to the scanned-in stimulus and scanned-out responses in the presence of the associated scan chain hold-time violations.¹

Fig. 7. Modeling hold-time violations for ATPG.

Fig. 8. Impact of an FTR error.

VI. EXTENSION TO OTHER ERROR TYPES

The proposed modeling, diagnosis, and tolerance techniques can be extended to handle other types of errors in scan chains. In this section, we present an extension to single sided hold-time errors, namely, fast-to-rise (FTR) and fast-to-fall (FTF) errors.

A. Understanding the Impact of FTR and FTF Errors

FTR (FTF) errors manifest as hold-time violations only in the form of flipping a 0 (1) to a 1 (0) in the problematic scan cell. The resulting impact on both the delivered stimulus and the observed response is the flipping of each 0-bit (1-bit), which has an immediate successor of a 1-bit (0-bit), and which passes through the FTR (FTF) scan cell.

Fig. 8 illustrates the impact of an FTR error on the stimulus insertion and response collection process on a scan chain of five scan cells, wherein the second rightmost cell is the FTR violator. As the second left-most bit of the response, which is a 0-bit that has a successor of a 1-bit, passes through the violator cell, it flips to a 1 during the second shift cycle, resulting in a discrepancy between the captured and the observed responses. In this example, the stimulus is delivered intact however, as the only 0-bit that passes through the violator has a successor of another 0-bit.

B. Diagnosis of FTR and FTF Errors

A stimulus of all identical bits, which consists of neither rises nor falls, is also FTR-immune and FTF-immune. Thus, the insertion of all 0’s or all 1’s ensures the delivery of intended stimulus bits into problematic scan chains that contain FTR or FTF errors. These patterns can be safely used in the diagnosis of FTR and FTF errors, enabling the analysis of the discrepancy between expected and observed responses.

¹The presence of inverters between the scan cells will break this equivalence relationship; however, any commercial ATPG tool will account for these inverters, still producing valid test data, as the netlist modifications perfectly represent the impact of scan chain hold-time violations.

Fig. 9. Forcing ATPG tool to generate a minimal set of immune patterns for FTR and FTF errors.

In order to minimize the number of immune patterns that need to be applied to reach a perfect diagnostic resolution, they can be judiciously generated, similar to the manner that they can be for hold-time violations as presented in Section IV-A. The objective is again the capture of an alternating sequence of response bits, while this time, the requirement is stricter. To pinpoint the exact position of an FTR or an FTF, both alternating sequences, “0101…01” and “1010…10”, need to be captured in the chain that has been identified as problematic. Thus, the immune pattern set generation is enforced by modifying the netlist in a slightly different manner than the one in Fig. 4; to enforce the stricter requirement of capturing each of “01” and “10” in every adjacent pair of scan cells in at least one pattern, the netlist is modified by inserting two implication gates per adjacent pair of cells as in Fig. 9, rather than a single XOR gate. By targeting the stuck-at-0 fault on the newly added POs, the ATPG tool is forced to generate patterns for which adjacent scan cells capture complementary values.

The expected and observed responses can be compared to pinpoint the FTR or FTF scan cells in a chain that is known to be problematic. The FTR violator cell must reside between the right-most flipped 0-bit and the leftmost intact 0-bit that has an immediate successor of a 1-bit; the candidate set excludes the rightmost flipped bit, while it includes the leftmost bit that has remained intact. Similarly, a range of candidate cells can also be defined for FTF errors by interchanging 0’s with 1’s, and vice versa.

The example in Fig. 10 illustrates the aforementioned response comparison analysis for an FTR violation. In this example, the right-most flipped bit is identified to be the one in bit position 5, while the left-most 0-bit that has remained intact despite a successor 1-bit is identified to be the one in bit position 3. Thus, scan cell 3, scan cell 4, or both cells account for the FTR violator(s).

Fig. 10. Pinpointing FTR violator cells.

Fig. 11. Modeling FTR and FTF violations for ATPG.

The application of each immune pattern thus pinpoints a range of scan cells; a subsequent intersection operation yields the violator cell(s).
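The range rule for FTR errors can be expressed compactly. The following sketch relies on conventions that are assumptions rather than statements from the paper: bit positions are 1-based and counted from the scan-out end, the "successor" of a bit is the bit at the next higher position (the one shifted out right after it), and the name `ftr_candidate_range` is hypothetical.

```python
def ftr_candidate_range(expected, observed):
    """Candidate positions for an FTR violator from one immune pattern.

    expected[p-1] / observed[p-1] hold the bit of position p, with
    position 1 nearest to scan-out (assumed convention).
    """
    n = len(expected)
    flipped = [p for p in range(1, n + 1)
               if expected[p - 1] == 0 and observed[p - 1] == 1]
    if not flipped:
        return []                       # the pattern did not excite the error
    hi = min(flipped)                   # flipped bit nearest to scan-out
    intact = [p for p in range(1, hi)
              if expected[p - 1] == 0       # a 0-bit ...
              and expected[p] == 1          # ... whose successor is a 1-bit ...
              and observed[p - 1] == 0]     # ... and which was observed intact
    lo = max(intact) if intact else 1
    return list(range(lo, hi))          # includes lo, excludes hi


# Flipped bit at position 5, intact vulnerable 0-bit at position 3.
print(ftr_candidate_range([1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 1, 1]))
```

With the flipped bit at position 5 and the intact 0-bit at position 3, the candidate range comes out as positions 3 and 4, in line with the example of Fig. 10. FTF errors would be handled by interchanging 0's and 1's.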

C. Tolerance of FTR and FTF Errors

Upon the application of the judiciously generated immune patterns and the identification of the violator cell(s), the tolerance of FTR and FTF errors can be effected in a manner similar to that of hold-time violations; ATPG patterns to be applied to the chips with FTR and FTF errors, in order to identify the ones that are functional in mission mode, can be generated on a modified netlist. The modification to the netlist as in Fig. 11 reflects the behavior of an FTR or an FTF error. For an FTR violator, an OR gate is inserted on the scan path right on the scan-in input of the violator, while an AND gate is inserted for each FTF violator. The OR gate models the flipping of each passing 0-bit that has a successor of 1, while the AND gate models the flipping of each passing 1-bit that has a successor of 0. The patterns generated on such a netlist can be utilized in the production test of chips with scan chain errors in the form of FTR or FTF.
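The gate-level model can be exercised with a short scan-out simulation. The text does not state which signals feed the inserted gate, so the sketch assumes that the gate combines the output of the preceding cell with the input of that same cell; this assumption, the function name `scan_out_one_sided`, and the sample chain contents are all hypothetical. Cell 0 is next to scan-in, there are no inverters, and a single violator with index of at least 1 is modeled.

```python
def scan_out_one_sided(captured, violator, kind, scan_in_bit=0):
    """Shift out `captured` (cell 0 next to scan-in) through one FTR/FTF cell.

    Assumed model: the violator latches gate(prev_out, prev_in), where the
    gate is OR for "FTR" and AND for "FTF". Returns bits in shift-out order.
    """
    gate = (lambda a, b: a | b) if kind == "FTR" else (lambda a, b: a & b)
    cells = list(captured)
    out = []
    for _ in range(len(cells)):
        out.append(cells[-1])                       # bit leaving the chain
        nxt = [scan_in_bit] + cells[:-1]            # ideal one-position shift
        prev_out = cells[violator - 1]
        prev_in = cells[violator - 2] if violator >= 2 else scan_in_bit
        nxt[violator] = gate(prev_out, prev_in)
        cells = nxt
    return out


captured = [1, 0, 1, 0, 0]
print(list(reversed(captured)))                     # fault-free scan-out order
print(scan_out_one_sided(captured, 3, "FTR"))       # one 0-bit flips to 1
```

In this run, only the 0-bit that is followed by a 1-bit and passes through the violator is flipped, which is the behavior the OR gate is meant to capture.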

SINANOGLU AND SCHREMMER: SCAN CHAIN HOLD-TIME VIOLATIONS 823

VII. INTERMITTENT SCAN HOLD-TIME VIOLATIONS

An intermittent hold-time violation is a violation that may manifest depending on the logical conditions imposed by the particular test vector being applied, and on the environmental conditions during the test, such as the voltage and the temperature, in addition to many other factors. As a result, such a violation may manifest for only a subset of the test patterns, or for some parts but not for others. The unpredictability of intermittent hold-time violations renders the diagnosis and tolerance of these violations quite challenging. In this section, we briefly discuss the changes in the application of the proposed techniques to tackle these challenges, as well as the limitations that hamper the applicability of the proposed techniques.

A. Diagnosis of Intermittent Scan Hold-Time Violations

The proposed diagnosis technique can still be applied with a slight modification. Upon the application of a hold-time immune pattern, the proposed response comparison analysis to pinpoint the skipped response bits is still effected in an identical manner as in the permanent hold-time violation case. The intersection operations should be carefully conducted, however, as the absence of a bit-skipping impact for a violation should not prune the candidate set identified for this violation thus far, as such a behavior may still be consistent with the intermittency of the violation, which may manifest for a test pattern, but not for another pattern.

Making no assumption regarding the intermittence of the violation inevitably results in degraded diagnostic resolution. Lack of any assumption dictates that the response to every diagnostic pattern is analyzed independently. Thus, every pattern points to a set of violators, some of which are the true intermittent hold-time violations. No further pruning is enabled due to lack of any assumption.

For instance, violation characteristic variation even for the same pattern can be observed across different chips due to issues such as process variation, or IR drop. Although the violation occurs in the exact same location, it may manifest during different shift cycles for different chips. Such scenarios are quite challenging to handle, as not only the violation location but also the particular shift cycle in which this violation manifests needs to be analyzed to understand the exact nature of the violation on a per chip basis, adding one more dimension to the diagnosis problem.

Unless an assumption is made for the intermittent violation(s), the diagnostic resolution cannot be improved. For instance, the following assumption leads to further pruning of the candidate set identified for intermittent violation(s): an intermittent violation manifests in the same position, when it manifests for a test pattern. If this “location-consistency” assumption holds, candidate set intersection operations can be effected to improve diagnostic resolution.

Example: In a scan chain with five scan cells, the first hold-time immune pattern yields a candidate set of , no bit skipping occurs for the second pattern, and the third hold-time immune pattern yields a candidate set of . The intermittent hold-time violator cell is thus identified to be the scan cell 3 by intersecting the only two non-empty sets.
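Under the location-consistency assumption, the intersection simply ignores patterns for which no bit skipping was observed. The sketch below illustrates this; the function name `intersect_location_consistent` and the two non-empty candidate sets are hypothetical values chosen so that cell 3 is the outcome, and they are not taken from the paper.

```python
def intersect_location_consistent(candidate_sets):
    """Intersect per-pattern candidate sets for one intermittent violator,
    skipping empty sets (patterns under which the violation did not manifest)."""
    non_empty = [set(s) for s in candidate_sets if s]
    if not non_empty:
        return set()                    # the violation never manifested
    result = non_empty[0]
    for s in non_empty[1:]:
        result &= s
    return result


# Pattern 1 and pattern 3 manifest the violation; pattern 2 does not.
print(intersect_location_consistent([{2, 3}, set(), {3, 4}]))
```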

Thus, the proposed diagnosis technique can still be applied to pinpoint intermittent violations. The diagnostic resolution depends on the accompanying set of intermittence assumptions.

B. Tolerance of Intermittent Scan Hold-Time Violations

The tolerance of the intermittent scan hold-time violations becomes even more challenging. The underlying reason is twofold. First, diagnostic resolution needs to be perfect to be able to tolerate a violation; in the case of intermittent faults, the diagnostic resolution depends on the assumptions regarding intermittency. Second, even if perfect resolution is attained for the intermittent violation, whether the violation will manifest for a test vector hinges on the conditions imposed by that test vector, and may vary from one part to another, and from a set of conditions to another. A single modified netlist no longer captures the behavior of a chip that contains intermittent hold-time violations.

Let us suppose that a single intermittent violation has been accurately identified (based on a certain set of assumptions). The two possibilities for this intermittent violation behavior are simply the presence or the absence of this violation, which may vary from one part to another and from one vector to another. Regardless of this intermittent behavior variation, under the location-consistency assumption that has led to the pinpointing of the violation, two versions of the netlist capture both possibilities; one version is the original netlist, while the other one is the netlist with the change that represents the bit-skipping impact of the violation, as described in Section V. For a part and for a test vector on this part, one of the two netlists will be valid. For this purpose, both netlists can be utilized to generate production test patterns by obtaining feedback from the tester. The patterns generated from both netlists are applied on a sufficiently large number of parts on the tester. The test patterns that pass on the majority of the parts can be collected to form the test set. The parts which fail on the resulting test set are either defective parts, or functional parts with an intermittency behavior that differs from the majority of the parts (due to process variation, for instance). Depending on the extent of intermittency behavior variation among parts, the resulting yield deviates from the perfect level; yet it is superior to the yield level obtained when all the parts with intermittent violations are simply masked out.

If one accepts the aforementioned deviation from perfect yield, the following iterative procedure that aims at generating the minimal set of passing patterns with the highest possible coverage can be employed. One version of the netlist is used in ATPG to generate a number of patterns, wherein this number is pre-specified. The generated ATPG patterns can be classified as passing or failing by applying these patterns on the tester; the patterns that consistently fail are eliminated. The remaining passing patterns can then be fault simulated to drop the detected faults from the fault list. For the remaining faults, ATPG is executed on the other version of the netlist until the same number of patterns are generated. This iterative procedure can be executed by generating patterns on the two versions of the netlist and filtering them on the tester in an alternating fashion until a satisfactory fault coverage is reached.

The iterative procedure outlined above applies to the case wherein there is a single intermittent hold-time violation. In the case of multiple intermittent faults, the number of netlist versions increases; for $k$ intermittent hold-time violations, the iterative procedure is to be applied on $2^{k}$ different netlist versions. For a large number of intermittent hold-time violations, while the proposed diagnosis technique can still be applied, the outlined modeling scheme via multiple netlists may become impractical.

The iterative procedure above can also be applied when multiple sets of violation locations are identified. For each set of violation locations, a hold-time violation tolerant pattern set is generated, resulting in a test set that accounts for this set of violation locations. The production test is effected by first applying the original test set; if a chip fails this test set, then the tolerant test sets are applied. If the chip passes the original test set or one of the tolerant tests, it is deemed functional. If the chip fails all the tests, then the chip is screened out. Also in this case, the number of distinct violation location sets determines the applicability of such an approach.

VIII. DISCUSSION ON FUNCTIONAL HOLD-TIME VIOLATIONS

In this section, we extend the analysis that we have outlined so far to hold-time violations that occur on functional paths, impacting the operation of the chip in mission mode. In particular, we consider the case wherein both functional and scan hold-time violations occur. We only consider the diagnosis of functional hold-time violations and not their tolerance.

Various techniques have been proposed on the test generation for or diagnosis of hold-time errors in functional logic. In [18], hold-time errors are modeled as faults, for which test generation is performed. In [19] and [20], static timing analysis results are utilized in order to either generate diagnostic patterns [20] for functional hold-time violations, or do a logic back-trace analysis to pinpoint the source of the path containing the violation [19].

For a functional hold-time violation, it is quite straightforward to locate the violator cell based on the expected and the observed response mismatch; the position of the mismatch directly pinpoints the violator cell. Further diagnosis is also possible by locating the problematic combinational path through which the functional hold-time violation manifests. The techniques described in [18]–[20] can be utilized for this purpose.

In the case where both functional and scan hold-time violations occur, the diagnosis of these violations becomes slightly complicated, while the techniques outlined in this paper can still be utilized with a slight modification. As the scan chain integrity is not fulfilled, hold-time immune patterns should be utilized in the diagnosis of the mixture of functional and scan hold-time violations. The disparity between the expected and the observed responses will be slightly more complicated than a simple bit-skipping impact; there will be additional bit mismatches, due to the functional hold-time violations. For this reason, the proposed response comparison analysis, which we have presented in Section IV-B, should be slightly modified, in order to account for the additional bit mismatches. Once the scan hold-time violators are identified, the techniques described in [18]–[20] can be utilized; the hold-time immune patterns must be used in the diagnosis process by these techniques however.

TABLE I
JUDICIOUS GENERATION OF HOLD-TIME IMMUNE PATTERNS

IX. EXPERIMENTAL RESULTS

We have implemented the proposed scan chain hold-time violation diagnosis tool and the netlist modification tool, both in the C programming language. We have executed these tools on ISCAS89 benchmark circuits [21]. We have used ATALANTA [22] as the ATPG tool in our experiments. In all these experiments, we assume that the circuit has 10 internal scan chains.

In the generation of the hold-time immune patterns with perfect diagnostic resolution, we assume that only one of the 10 chains contains hold-time violation(s). Thus, for each benchmark circuit, we generate 10 sets of hold-time immune patterns. In Table I, we report the average size of the hold-time immune pattern sets for each benchmark and compare it against the production test pattern set size. It can be seen from the results that only a few judiciously generated hold-time immune patterns suffice to provide a perfect differentiation capability (perfect diagnostic resolution of 1.0) for any scan hold-time violation in a chain.

A comparison against another elegant hold-time violation diagnosis technique [9] can be conducted. This diagnosis technique, which is based on exciting the hold-time error in a particular scan cell by inserting a single transition stimulus vector to the corresponding chain, and on propagating the error through combinational logic into a scan cell on an error-free scan chain, attains near-perfect diagnostic resolutions. As each scan cell in the problematic scan chain is handled individually in the diagnostic pattern generation process, however, the number of diagnostic patterns equals the number of scan cells in the problematic scan chain. For , for instance, as the scan depth is 145, the number of diagnostic patterns expected in [9] is at least 145; as a different set of benchmark circuits is utilized, we cannot provide an exact comparison, while the presented comparison is optimistic in favor of the technique in [9]. The proposed approach simplifies the overall diagnosis process by benefiting from the observation that a proper captured response sensitizes the violation anyway. Thus, a compact diagnostic pattern set is generated towards this objective, leading to 9.1 diagnostic patterns on the average for .

In Table II, we present the results of the proposed modeling and tolerance of hold-time violations. In this table, the first column denotes the circuit name, while the number of hold-time violations injected to the circuit is provided in the second column; similar to the previous experiment, we assume that only one of the 10 chains contains all the randomly injected hold-time violation(s). In all the cases, the actual scan cells with a hold-time violation are accurately identified by using the hold-time immune patterns generated by the proposed technique. The third and the fourth columns show the ATPG results for the original netlist, while the fifth and the sixth columns depict the ATPG results for the modified netlists that account for the impact of hold-time violations. ATPG run-times for the original and modified netlists are almost identical in all cases.

TABLE II
IMPACT OF HOLD-TIME VIOLATIONS

The perfect diagnostic resolution attained by the hold-time immune patterns enables the accurate modeling of these hold-time violations via netlist changes. While the fault coverage levels degrade slightly, the structural ATPG patterns generated on the modified netlist can be utilized in a production environment, enabling the manufacturing test of chips with scan hold-time violations.
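To make the idea of modeling a violation's effect on scan shift concrete, the following is a minimal behavioral sketch. It assumes the commonly used abstraction that a scan cell with a hold-time violation captures the value its upstream neighbor has just latched, so data races through one extra cell per shift cycle. This is an illustrative assumption, not necessarily the exact netlist modification used here; the names `shift_once` and `scan_in` are hypothetical.

```python
def shift_once(chain, scan_in_bit, violating):
    """Apply one shift cycle to the chain; index 0 is nearest the scan-in pin.

    Assumed model: a cell listed in `violating` captures the NEW value of its
    upstream neighbor instead of the old one (the data races through).
    """
    new = [scan_in_bit]
    for i in range(1, len(chain)):
        if i in violating:
            new.append(new[i - 1])    # hold violation: sees the already-updated value
        else:
            new.append(chain[i - 1])  # normal shift: sees the previous-cycle value
    return new


def scan_in(stimulus, length, violating=frozenset()):
    """Shift a stimulus (first bit first) into an all-zero chain of given length."""
    chain = [0] * length
    for bit in stimulus:
        chain = shift_once(chain, bit, violating)
    return chain


good = scan_in([1, 0, 1, 1], 4)                 # fault-free chain
bad = scan_in([1, 0, 1, 1], 4, violating={2})   # hold violation at cell 2
print(good, bad)
```

Under this assumed model, the violating cell always holds the same value as its upstream neighbor, and every cell downstream of it receives its data one position early. An ATPG run that is aware of this constraint can still generate patterns that load as intended.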

To the best of our knowledge, the proposed technique is unique in its capability to tolerate scan hold-time violations; thus, no comparison against any previously proposed technique can be provided. However, we can provide comparisons against the scan chain masking approach, wherein the scan chains with hold-time violations are simply masked during test application. While a stimulus of all identical bits can be utilized for the problematic chains, this pessimistic approach deems the responses collected from these chains invalid, disabling the observation of these chains. Thus, controllability is very poor, and observability is completely lost on the problematic scan chains; in these experiments, we gauge the impact of a single hold-time violation, and thus of a single failing chain. We provide the results in Table III, wherein the second column denotes the fault coverage on the original design with no violations whatsoever, the third column denotes the degraded coverage of complete scan chain masking, and the fourth column denotes the coverage attained by the proposed tolerance technique. The loss of controllability and observability in the scan chain masking approach translates into sharp drops in fault coverage compared to the original coverage and to that of the proposed tolerance approach, confirming the expected quality degradation of the pessimistic scan chain masking approach.

TABLE III
COMPARISON OF THE FAULT COVERAGES: PROPOSED TOLERANCE TECHNIQUE AGAINST COMPLETE SCAN CHAIN MASKING

In the final set of experiments, the impact of intermittent scan hold-time violations is gauged. A single intermittent violation, which is presumably location-consistent, with a manifestation probability of 0.5, is injected into a randomly chosen scan cell in the benchmark circuit, and the proposed diagnosis technique is applied. In all the benchmark circuits, the intermittent violation is accurately diagnosed by utilizing the hold-time immune patterns; in other words, the diagnostic resolution is consistently a perfect 1.0 in all cases. Subsequently, we construct another version of the netlist, wherein we modify only a single scan cell, modeling the impact of the single injected intermittent fault, and we apply the proposed iterative ATPG procedure on the two versions (original and modified) of the netlist. Throughout the ATPG process, we switch from one version of the netlist to the other one, when patterns are generated for the remaining faults; is set to one tenth of the number of patterns generated on the original netlist. We utilize a simple coin toss that mimics the pass/fail outcome of the generated test vector² when applied to parts on the tester. The "passing" test vectors are collected (and the "failing" ones are simply discarded) in this manner until all the faults are detected by them throughout the iterative ATPG procedure.
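The iterative procedure described above can be sketched as a small simulation. This is a hedged illustration only: `toy_atpg` is a hypothetical stand-in for a real ATPG call, faults are plain integers, and `switch_every` is an assumed name for the switching threshold whose symbol is not legible in the text. The coin toss and the rule that a failing vector is never generated again follow the description above.

```python
import random


def toy_atpg(version, remaining, rng, banned):
    """Hypothetical ATPG stand-in: returns a pattern and the faults it detects."""
    while True:
        target = rng.choice(sorted(remaining))
        fill = rng.getrandbits(16)          # random fill keeps patterns distinct
        pattern = (version, target, fill)
        if pattern in banned:               # a failing vector is never regenerated
            continue
        # Arbitrary toy rule for which remaining faults are detected "for free".
        extra = {f for f in remaining if (f ^ fill) % 4 == 0}
        return pattern, extra | {target}


def iterative_atpg(faults, switch_every, seed=0, p_pass=0.5):
    """Alternate between two netlist versions, keeping only 'passing' vectors."""
    rng = random.Random(seed)
    remaining = set(faults)
    kept, banned = [], set()
    version, generated = "original", 0
    while remaining:
        pattern, detected = toy_atpg(version, remaining, rng, banned)
        generated += 1
        if rng.random() < p_pass:           # coin toss mimicking a tester pass
            kept.append(pattern)
            remaining -= detected
        else:                               # tester fail: discard and ban
            banned.add(pattern)
        if generated % switch_every == 0:   # switch netlist version
            version = "modified" if version == "original" else "original"
    return kept, banned


kept, banned = iterative_atpg(range(50), switch_every=5, seed=1)
print(len(kept), "patterns kept;", len(banned), "discarded")
```

Because only passing vectors count toward fault detection, the loop ends with every fault covered by a vector that passed, at the cost of some discarded vectors and therefore extra ATPG effort.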

Table IV provides a comparison between the ATPG metrics of the original netlist and those of the proposed iterative process. The results show that the proposed iterative procedure incurs a negligible coverage loss, while the pattern count increases slightly compared to the ATPG results on the original netlist.

X. CONCLUSION

In this paper, we propose a suite of techniques for the diagnosis and tolerance of scan chain hold-time violations. The proposed diagnosis technique can be utilized for any scan chain hold-time violation, induced by a manufacturing defect or a physical design timing closure error, in order to accurately pinpoint the location of the violation. The diagnosis time is minimized by generating and utilizing a minimal set of diagnostic

² Once a test vector is deemed failing upon a coin toss, we make sure that it is never generated again.


TABLE IV
IMPACT OF A SINGLE INTERMITTENT SCAN HOLD-TIME VIOLATION

patterns. For permanent scan hold-time violations and FTR and FTF errors, even when they occur along with functional hold-time violations, the diagnostic resolution can be expected to be near-perfect. For intermittent scan hold-time violations, the diagnostic resolution depends on the assumptions regarding the intermittency of the violation; an assumption such as location-consistency helps attain near-perfect resolutions even in the case of intermittent violations. The scan chain hold-time violation diagnosis technique that we propose is based on the utilization of existing scan capabilities, imposing no design changes whatsoever.

The proposed tolerance technique is targeted more towards scan chain hold-time violations that are induced by physical design process errors, which lead to systematic chip failures on the tester, and thus to chips that are functional but cannot be tested. Upon the exact pinpointing of the violations, the impact of the hold-time violating scan cells on scan shift operations can be modeled, enabling a subsequent ATPG process to account for these violations, and thus to generate structural ATPG patterns that can be utilized to test chips with scan chain hold-time violations. The proposed tolerance technique perfectly restores yield for permanent hold-time violations and FTR and FTF errors, while some deviation from perfect yield levels may be suffered in the case of intermittent violations, depending on the variation of intermittent behavior among parts. In either case, the proposed techniques constitute a cost-effective alternative to other expensive techniques such as fibbing or scan chain masking.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their constructive criticism and help in extending the discussion towards practical issues faced in industry.

REFERENCES

[1] O. Sinanoglu and P. Schremmer, “Diagnosis, modeling, and tolerance of scan chain hold-time violations,” in Proc. Des. Autom. Test Eur., 2007, pp. 516–521.

[2] I. Bayraktaroglu and A. Orailoglu, “Diagnosis for scan-based BIST: Reaching deep into the signatures,” in Proc. Des. Autom. Test Eur., 2001, pp. 102–109.

[3] D. S. Kim and P. H. Seong, “Optimal testing input sets for reduced diagnosis time of nuclear power plant digital electronic circuits,” IEEE Trans. Nucl. Sci., vol. 41, no. 1, pp. 307–316, Jan. 1994.

[4] S. Edisooriya and G. Edisooriya, “Diagnosis of scan path failures,” in Proc. VLSI Test Symp., 1995, pp. 250–255.

[5] Y. Wu, “Diagnosis of scan chain failures,” in Proc. Int. Symp. Defect Fault Tolerance, 1998, pp. 217–222.

[6] S. Narayanan and A. Das, “An efficient scheme to diagnose scan chains,” in Proc. Int. Test Conf., 1997, pp. 704–713.

[7] J. Schafer, F. Policastri, and R. Mcnulty, “Partner SRLs for improved shift register diagnostics,” in Proc. VLSI Test Symp., 1992, pp. 198–201.

[8] S. Kundu, “On diagnosis of faults in a scan chain,” in Proc. VLSI Test Symp., 1993, pp. 303–308.

[9] J. C.-M. Li, “Diagnosis of single stuck-at faults and multiple timing faults in scan chains,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 6, pp. 708–718, Jun. 2005.

[10] J. C.-M. Li, “Diagnosis of multiple hold-time and setup-time faults in scan chains,” IEEE Trans. Comput., vol. 54, no. 11, pp. 1467–1472, Nov. 2005.

[11] L. Cheney and N. Sheils, “A method for isolating defects in scannable sequential elements,” presented at the Intel Des. Test Technol. Conf., Hillsboro, OR, 2000.

[12] R. Guo and S. Venkataraman, “A technique for fault diagnosis of defects in scan chains,” in Proc. Int. Test Conf., 2001, pp. 268–277.

[13] Y. Huang, W. T. Cheng, C. J. Hsieh, H. Y. Tseng, A. Huang, and Y. T. Hung, “Efficient diagnosis for multiple intermittent scan chain hold-time faults,” in Proc. Asian Test Symp., 2003, pp. 198–201.

[14] Y. Huang, W. T. Cheng, S. M. Reddy, C. J. Hsieh, and Y. T. Hung, “Statistical diagnosis for intermittent scan chain hold-time fault,” in Proc. Int. Test Conf., 2003, pp. 198–201.

[15] J.-S. Yang and S.-Y. Huang, “Quick scan chain diagnosis using signal profiling,” in Proc. Int. Conf. Comput. Des., 2005, pp. 157–160.

[16] R. Guo, Y. Huang, and W.-T. Cheng, “Fault dictionary based scan chain failure diagnosis,” in Proc. Asian Test Symp., 2007, pp. 45–50.

[17] E. Hsu, S.-Y. Huang, and C.-W. Tzeng, “A new robust paradigm for diagnosing hold-time faults in scan chains,” in Proc. Int. Symp. VLSI Des., Autom. Test, 2006, pp. 171–174.

[18] S. M. Reddy, I. Pomeranz, S. Kajihara, A. Murakami, S. Takeoka, and M. Ohta, “On validating data hold times for flip-flops in sequential circuits,” in Proc. Int. Test Conf., 2000, pp. 317–325.

[19] Z. Wang, M. M. Sadowska, K.-H. Tsai, and J. Rajski, “Diagnosis of hold time defects,” in Proc. Int. Conf. Comput. Des., 2004, pp. 192–199.

[20] A. Majumdar, W.-Y. Chen, and J. Guo, “Hold time validation on silicon and the relevance of hazards in timing analysis,” in Proc. Des. Autom. Conf., 2006, pp. 326–331.

[21] F. Brglez, D. Bryan, and K. Kozminski, “Combinational profiles of sequential benchmark circuits,” in Proc. IEEE ISCAS, May 1989, vol. 14, no. 2, pp. 1929–1934.

[22] H. K. Lee and D. S. Ha, “On the generation of test patterns for combinational circuits,” Dept. Elect. Eng., Virginia Polytechnic Inst. State Univ., Blacksburg, Tech. Rep. 12-93, 1993.

Ozgur Sinanoglu received the B.S. degree in computer engineering, and another B.S. degree in electrical and electronics engineering, from Bogazici University, Turkey, in 1999, and the M.S. and Ph.D. degrees in computer science and engineering from the University of California, San Diego, in 2001 and 2004, respectively.

Between 2004 and 2006, he worked as a senior design for testability engineer with Qualcomm, San Diego, CA. Since Fall 2006, he has been a faculty member in the Mathematics and Computer Science Department, Kuwait University. His research field is in the design for testability of VLSI circuits.

Philip Schremmer received the B.S. degree in electrical engineering from the University of Rochester, Rochester, NY, in 1997.

He has been with Qualcomm CDMA Technologies, San Diego, CA, since 2003, where he is currently a staff engineer, primarily responsible for the overall design of on-chip test clock generation architecture and timing of test circuitry. Previously, he was with Cadence Design Systems, rising to the position of lead design engineer in the Digital Integrated Circuits Group, responsible for the design and test of various digital ASICs. His research interests include high performance VLSI design and test with application to high-speed and low-power wireless devices.