14. built_in_ self_detection correction architecture for motion estimation computing arrays

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 18, NO. 2, FEBRUARY 2010 319

begin

Let �� be the logical subarrays of ��

divided by the � � � excluded rows;

GP_GCR(�� );

/* results in an �� target array �� */

if� �� then

begin /*update lower bound and current MLA*/

� �� ;

� �� ;

end of if;

� �� ; /* reset the number of the subarrays for �� */

end;

end;

end.

REFERENCES

[1] R. Negrini, M. G. Sami, and R. Stefanelli, Fault Tolerance Through Re-configuration in VLSI and WSI Arrays. Cambridge, MA: MIT Press,1989.

[2] S. Y. Kuo and I. Y. Chen, “Efficient reconfiguration algorithms fordegradable VLSI/WSI arrays,” IEEE Trans. Comput.-Aided Des., vol.11, no. 10, pp. 1289–1300, Oct. 1992.

[3] C. P. Low and H. W. Leong, “On the reconfiguration of degradableVLSI/WSI arrays,” IEEE Trans. Comput.-Aided Des. Integr. CircuitsSyst., vol. 16, no. 10, pp. 1213–1221, Oct. 1997.

[4] C. P. Low, “An efficient reconfiguration algorithm for degradableVLSI/WSI arrays,” IEEE Trans. Comput., vol. 49, no. 6, pp. 553–559,Jun. 2000.

[5] W. Jigang and T. Srikanthan, “Reconfiguration algorithms for powerefficient VLSI subarrays with 4-port switches,” IEEE Trans. Comput,vol. 55, no. 3, pp. 243–253, Mar. 2006.

[6] W. Jigang and T. Srikanthan, “Integrated row and column re-routingfor reconfiguration of VLSI arrays with 4-port switches,” IEEE Trans.Comput., vol. 56, no. 10, pp. 1387–1400, Oct. 2007.

[7] M. Fukushi, Y. Fukushima, and S. Horiguchi, “A genetic approach forthe reconfiguration of degradable processor arrays,” in Proc. 20th IEEEInt. Symp. Defect Fault Tolerance VLSI Syst., 2005, pp. 63–71.

[8] W. Jigang and T. Srikanthan, “An improved reconfiguration algorithmfor degradable VLSI/WSI arrays,” J. Syst. Archit., vol. 49, no. 1/2, pp.23–31, Jul. 2003.

[9] W. Jigang and T. Srikanthan, “On the reconfiguration algorithm forfault-tolerant VLSI arrays,” in Proc. Int. Conf. Comput. Sci. 2003(ICCS 2003), Melbourne, Australia, Jun. , vol. 3, Lecture Notes inComputer Science 2659, pp. 360–366.

[10] W. Jigang and T. Srikanthan, “Accelerating reconfiguration of degrad-able VLSI arrays,” IEE Proc. Circuits Devices Syst., vol. 153, no. 4, pp.383–389, Aug. 2006.

Built-in Self-Detection/Correction Architecture for MotionEstimation Computing Arrays

Chun-Lung Hsu, Chang-Hsin Cheng, and Yu Liu

Abstract—This paper develops a built-in self-detection/correction(BISDC) architecture for motion estimation computing arrays (MECAs).Based on the error detection/correction concepts of biresidue codes, anysingle error in each processing element in an MECA can be effectivelydetected and corrected online using the proposed BISD and built-inself-correction circuits. Performance analysis and evaluation demonstratethat the proposed BISDC architecture performs well in error detectionand correction with minor area overhead and timing penalty.

Index Terms—Area overhead, built-in self-correction (BISC), built-inself-detection (BISD), motion estimation computing array (MECA).

I. INTRODUCTION

The new Joint Video Team (JVT) video coding standard has garneredincreased attention recently. Generally, motion estimation computingarray (MECA) performs up to 50% of computations in the entire videocoding system, and is typically considered the computationally mostimportant part of video coding systems. Thus, integrating the MECAinto a system-on-chip (SOC) design has become increasingly importantfor video coding applications [1], [2].

Although advances in VLSI technology allow integration of a largenumber of processing elements (PEs) in an MECA into an SOC, thisincreases the logic-per-pin ratio, thereby significantly decreasing theefficiency of chip logic testing. For a commercial chip, a video codingsystem must introduce design for testability (DFT), especially in anMECA. The objective of DFT is to increase the ease with which a de-vice can be tested to guarantee high system reliability. Many DFT ap-proaches have been developed. These approaches can be divided intothree categories: ad hoc (problem oriented), structured, and built-inself-test (BIST) [3], [4]. Among these techniques, BIST has an obviousadvantage in that expensive test equipment is not needed and tests arelow cost. Moreover, BIST can generate test simulations and analyzetest responses without outside support, making tests and diagnoses ofdigital systems quick and effective. However, as the circuit complexityand density increases, the BIST approach must detect the presence offaults and specify their locations for subsequent repair. The extendedtechniques of BIST are built-in self-diagnosis [5] and built-in self-re-pair (BISR) [6]. Although BIST and BISR are utilized in many studies,most studies focused on memory testing. Nowadays, the computationalcomplexity of modern video coding systems has increased; thus, effi-cient self-detection and self-correction techniques are needed to im-prove reliability.

Based on the concepts of BIST and biresidue codes, this paperpresents a built-in self-detection/correction (BISDC) architecturethat effectively self-detects and self-corrects PE errors in an MECA.Notably, any array-based computing structure, such as the discretecosine transform (DCT), iterative logic array (ILA), and finite-impulse

Manuscript received March 05, 2008; revised September 05, 2008. Firstpublished April 14, 2009; current version published January 20, 2010. Thiswork was supported by the National Science Council of the Republic of China,Taiwan, under Contract NSC 97-2221-E-259-031.

The authors are with the Department of Electrical Engineering, NationalDong Hwa University, Shou-Feng 974, Taiwan (e-mail: [email protected];[email protected]; [email protected]).

Digital Object Identifier 10.1109/TVLSI.2008.2009452

1063-8210/$26.00 © 2009 IEEE

320 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 18, NO. 2, FEBRUARY 2010

Fig. 1. Proposed MECA BISDC implementation.

filter (FIR), is suitable for the proposed method to detect and correcterrors based on biresidue codes.

II. ERROR DETECTION/CORRECTION CODES

The use of residue codes to detect error is a useful approach in com-puter arithmetic [7]. Residue codes are separable arithmetic codes thatcalculate a residue for data, and then apply this residue to data. Forinstance, we assume � denotes an integer, �� and �� represent datawords, and � is the modulus. A separate residue code of interest isone in which � is coded as a pair �� . Notably, �� is theresidue of � modulo �. Error detection logic for operations is typi-cally derived using a separate residue code such that detection logicis simply and easily implemented. However, error correction cannot beperformed effectively using residue codes. The arithmetic code, namelybiresidue codes, can be supported to realize error detection and errorcorrection.

The biresidue codes separate residue coding using two residuedetectors with respect to two suitable moduli. Consider integer �coded as a triple �� , where � and � are two rela-tively prime integers. Let moduli � � �� and � � �� such that �� . The set of all single errors denoted by� � �� will have distinct syndromes withrespect to � and B provided � is not greater than �. Additionally, thetriple � �� is considered a biresidue code word with respect tomoduli � and � if and only if � � � �� and � � � �� . Moreover,the syndrome for the triple � �� with respect to moduli � and�, denoted as �� , is a pair �� where �� and �� . Thus, a triple � �� of integers is a biresiduecode word with respect to moduli � and � if and only if its syndrome�� with respect to moduli � and � equals zero (�� ,�� ). In other words, the error in any component is detected and lo-cated based on the form of its corresponding syndrome. In accordancewith the error detection and error correction concepts in biresiduecodes, this paper proposes a BISDC architecture to self-detect andself-correct PE errors in an MECA.

III. PROPOSED METHODOLOGY

A. MECA BISDC Implementation

To demonstrate the feasibility of the proposed BISDC architecture,this paper adopts the MECA as a CUT [8]. The MECA consists ofmany PEs connected into a 1-D or 2-D array for video encoding appli-cations. Generally, a PE is made up of two adders (an 8-bit adder and a12-bit adder) and an accumulator. The PE in an MECA computes theabsolute difference between one pixel of the search area and one pixelof the current macroblock. Thus, by utilizing PEs, the sum of absolutedifferences (SADs) shown in (1) between the current macroblock andeach search position can be evaluated

� �

��

��

��

��

�� (1)

where �� and �� are the luminance pixel value of currentpixel (Cur.pixel) and reference pixel (Ref.pixel), respectively. Themacroblock size is � � � . The 2-D H.264/Advanced Video Coding(AVC) motion estimation architecture published in [9] is a clearexample of MECA operations. The best motion position of a 4 � 4block from the previous frame to the current frame can be capturedeasily using MECA operations in the video encoding system.

According to MECA characteristics, Fig. 1 shows the correspondingBISDC implementation. Signals �� and �� are utilized to selectdata paths from Cur.pixel and Ref.pixel, respectively. The output of aspecific �� can be delivered to a detector for detecting errors usingthe �� signal. Moreover, the selector circuit is controlled by signals�� and �� that receive data from a specific ��, and then exportthese data to the next specific �� or syndrome analysis and cor-rector (SAC) for error correction. The self-detection and self-correctionoperations (Fig. 1) are simply described as follows. First, the input dataof Cur.pixel and Ref.pixel for a specific �� in the MECA are sent tothe test code generator (TCG) to generate the corresponding test codes.Second, the test codes from the TCG and output data from the specific �� are detected and verified in detector and selector (DAS) circuitsto determine whether the specific �� has an error. In other words, theself-detection capability uses the detector circuit in DAS. Third, the


Fig. 2. Example of the self-detection/correction operations.

Fig. 3. Block diagram of a coder.

selector circuit in DAS delivers the error signal to SAC for error cor-rection. Finally, the error correction data from SAC, or error-free datafrom the selector circuit in DAS, are passed to the next specific ��

for subsequent testing.

B. Fault Model

The PEs are important building blocks and are connected in a reg-ular manner to construct an MECA. Generally, PEs are surrounded bysets of adders and accumulators that determine how data flows throughthem. Thus, PEs can be considered the class of circuits called ILAs,whose testing assignment can be easily achieved using the fault modelcalled as cell fault model (CFM) [10]. The use of the CFM is currentlyof considerable interest due to the rapid growth in the use of high-levelsynthesis and the parallel increase in complexity and density of ICs.Using the CFM allows tests to be independent of the adopted synthesistool and vendor library. Arithmetic modules, like adders (the primaryelement in a PE), due to their regularity, are designed in a very denseconfiguration.

Moreover, the use of a relatively more comprehensive fault model,the single stuck-at (SSA) model, is required to cover actual failures inthe interconnect data bus between PEs. The SSA fault is a well-knownstructural fault model that assumes faults cause a line in the circuit tobehave as if it were permanently at logic “0” [stuck-at 0 (SA0)] or logic“1” [stuck-at 1 (SA1)]. The SSA fault in an MECA architecture canresult in errors in computed SAD values. This paper refers to this asa distorted computational error; its magnitude is � � ��

� � ��,where �� is the computed SAD value with an SSA fault.

C. BISDC Processes

Fig. 2 shows an example of a specific �� to describe explicitlythe self-detection and self-correction of errors in an MECA using theproposed BISDC architecture. The TCG circuit uses two coders (coders�� and ��) to generate test codes. The following definitions, based

on the biresidue codes, are applied to verify the feasibility of the twocoders in the TCG.

Definition 1:

�� (2)

Definition 2: Let �� then

�� (3)

Based on Definitions 1 and 2, the design of the coder �� (or coder ��)circuit can be realized and shown in Fig. 3.

Fig. 4 shows the timing chart for a specific �� in an MECA to de-scribe the operation of coder �� circuit in the TCG. If data �� and ��,obtained from Cur.pixel and Ref.pixel, are sent to the �� and�� circuits at the first clock, then values �� and �� canbe created at the second clock. When the third clock is triggered, thesummation of modulus values, � � �� , is generated bythe adder. Additionally, the following two data, �� and ��, are simul-taneously passed into the �� and �� circuits for modulooperations. The �� circuit executes the operation of ��

�� , and then stores the computational results in theregister at the fourth clock. Moreover, the summation of the modulusvalue � � �� can be captured by an adder at this time.Notably, the two selected signals � and � in multiplexers � and� are set to 0 to sum the modulus values at clocks 1–4. At the fifthclock, the next two data, �� and ��, are transmitted to the coder �� cir-cuit, and signals� and� are changed from 0 to 1 to deliver the values�� and �� to the adder for addition. Fig. 4 clearly demonstratesthat the regularity processes will continue after the fifth clock. Since a4 � 4 macroblock in a specific �� of the MECA contains 16 pixels,the accumulated input data from coder �� and coder �� circuits in theTCG are exported to the DAC and SAC circuits for error detection anderror correction, respectively, after 50 clocks.


Fig. 4. Timing chart of a coder circuit.

TABLE ISYNDROME CORRESPONDING TO ALL SINGLE BIT ERRORS

1) Self-Detection Operation: The self-detection operation can beachieved using the DAS circuit. The detector circuit is utilized to com-pare the outputs between a specific �� and the TCG for determiningwhether an error has occurred (Fig. 2). A selector circuit in DAS is thenenabled to place the error in the SAC circuit for error correction or toexport the error-free results to the output directly.

A mathematical statement is presented herein to verify the self-de-tection operation. According to Definition 2, the residue of the ��

modulo � is �� when the specific�� has � pixels. Moreover, based on the biresidue codes theorem, atriple �� with respect to moduli �� and �� is given by

� � �� (4)

� � �� (5)

Thus, the syndrome can be represented by the pair �� . Additionally, we assume that pixel valueis adjusted to � �

� � �� when an error bit is present in the specific��. According to Definition 2, the residue of � �

� modulo �� and ��

is given by

��

� �� (6)

��

� �� (7)

Thus, the single error bit in the specific�� can be detected if and onlyif (4) �� (6) and/or (5) �� (7).

2) Self-Correction Operation: In the self-correction operation, theSAC circuit plays an important role in correcting errors in a specific

��. The SAC circuit (Fig. 2) receives data from the TCG and DAScircuits to start error correction. The syndrome decoder and correctorcircuits in the SAC are employed to diagnose single error and furthercorrect error signal, respectively. In other words, the syndrome decoderin the SAC generates syndromes �� and �� by adopting the error cor-rection concepts of biresidue codes. Table I and Fig. 5 show the syn-dromes corresponding to all cases of single bit error and the correctorcircuit, respectively, i.e., any single bit error of a specific �� of theMECA can be obtained by comparing the syndrome �� withTable I, and then the bit error is corrected using the circuit in Fig. 5.

For instance, based on (4)–(7), syndromes �� and �� can be ex-pressed as

��

� ��

� ��

� � ��

� �� (8)

Here, the specific �� is error-free when the syndrome �� . However, a single error bit can be detected when the syndrome�� and the error bit can be located and correctedusing the syndrome listed in Table I and the circuit shown in Fig. 5.

IV. PERFORMANCE EVALUATION

The proposed BISDC architecture was generated using VHDL andsynthesized using the Synopsys Design Compiler with TSMC 0.18 �m1P6M CMOS technology. The MECA was selected to act as the CUTto demonstrate the effectiveness of the proposed BISD and built-in self-correction (BISC) procedures. The area overhead, timing penalty, andthroughput are also utilized to demonstrate the good performance ofthe proposed BISDC architecture.


Fig. 5. Circuit of corrector in SAC.TABLE II

AREA OVERHEAD ESTIMATION

TABLE IIITIMING PENALTY AND THROUGHPUT ESTIMATION

Table II summarizes synthesis results of area estimation of thecircuit design. The area is estimated based on the number of cells.Considering 16 PEs in an MECA, the area overhead introduced isgiven by

��

�� (9)

The self-detection and self-correction functionalities can be achievedusing the proposed BISD and BISC with about 0.86% and 2.44% areaoverhead, respectively (Table II). Based on (9), Table II also shows thetotal area overhead of the proposed BISDC architecture, only 2.80%,which is reasonable for designing a circuit for testing.

The timing penalty and throughput must also be demonstrated toverify the flexibility of the proposed BISDC architecture. Table III


shows the operating time estimation of a specific �� and each com-ponent in the proposed BISDC architecture. Each �� in an MECA istested in succession; thus, a 4 � 4 macroblock (a �� with 16 pixels)is used to estimate timing penalty. A specific �� requires 1313.76and 1328.95 ns for self-detection and self-correction operations, re-spectively (Table III). Notably, the proposed BISDC architecture isa noncurrent online testing scheme for error detection/correction offaulty PEs. In other words, error detection/correction operations foreach faulty PE in an MECA is executed in sequence. Thus, if the pro-posed BISDC architecture is embedded into the MECA for testing, theentire timing penalty is equal to that for testing a single PE, i.e., about1.55% and 2.72% (Table III) for BISD and BISC, respectively. The testprocesses for the TCG circuit and tested �� are operated in parallel(Fig. 2). Preparation time of the TCG circuit is faster than that of theone-pixel operation of the tested �� (Table III). Therefore, the op-eration time of the TCG circuit can be neglected because the timingpenalty of the TCG is covered by the pixel operation. Table III alsoshows the total number of accessing macroblocks per second. Obvi-ously, when the MECA is combined with the proposed BISDC archi-tecture, complexity and throughput will be increased and decreased,respectively, i.e., complexity increases to 2.6% when an error occurs.Although the throughput of the MECA with the proposed BISDC ar-chitecture is lower than that of the MECA without the BISDC architec-ture, fewer data access is indicative of reduced demand on input/outputbandwidth. Generally, a low memory bandwidth is desirable in the pro-gressive high-definition television (HDTV) video format.

V. CONCLUSION

This paper proposes a BISDC architecture for self-detection andself-correction of errors of PEs in an MECA. Based on the error de-tection/correction concepts of biresidue codes, this paper presents thecorresponding definitions used in designing the BISD and BISC cir-cuits to achieve self-detection and self-correction operations. Perfor-mance evaluation reveals that the proposed BISDC architecture effec-tively achieves self-detection and self-correction capabilities with min-imal area overhead and a small timing penalty.

REFERENCES

[1] Z. L. He, C. Y. Tsui, K. K. Chan, and M. L. Liou, “Low-power VLSIdesign for motion estimation using adaptive pixel truncation,” IEEETrans. Circuits Syst. Video Technol., vol. 10, no. 5, pp. 669–678, Aug.2000.

[2] C. G. Peng, D. S. Yu, X. X. Cao, and S. M. Sheng, “Efficient VLSI de-sign and implementation of integer motion estimation for H.264 SDTVencoder,” in Proc. IEEE Int. Conf. Solid-State Integr. Circuits, 2006,pp. 2019–2021.

[3] P. Gallagher, V. Chickermane, S. Gregor, and T. S. Pierre, “A buildingblock BIST methodology for SOC designs: A case study,” in Proc. Int.Test Conf., Oct. 2001, pp. 111–120.

[4] T. H. Wu, Y. L. Tsai, and S. J. Chang, “An efficient design-for-testa-bility scheme for motion estimation in H.264/AVC,” in Proc. Int. Symp.VLSI Des. Autom. Test, Apr. 2007, pp. 25–27.

[5] J. C. Yeh, K. L. Cheng, Y. F. Chou, and C. W. Wu, “Flash memorytesting and built-in self-diagnosis with march-like test algorithms,”IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no.6, pp. 1101–1113, Jun. 2007.

[6] X. Xiong, Y. L. Wu, and W. B. Jone, “Reliability analysis of self-re-pairable MEMS accelerometer,” in Proc. IEEE Int. Symp., Defect FaultTolerance VLSI Syst., Oct. 2006, pp. 236–244.

[7] R. J. Higgs and J. F. Humphreys, “Two-error-location for quadraticresidue codes,” Proc. Inst. Electr. Eng. Commun., vol. 149, no. 3, pp.129–131, Jun. 2002.

[8] J. C. Tuan, T. S. Chang, and C. W. Jen, “On the data reuse and memorybandwidth analysis for full-search block-matching VLSI architecture,”IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp. 61–72,Jan. 2002.

[9] C. Wei and M. Z. Gang, “A novel VLSI architecture for VBSME inMPEG-4 AVC/H.264,” in Proc. IEEE Int. Symp. Circuits Syst., May2005, pp. 1794–1797.

[10] J. F. Li and C. C. Hsu, “Efficient testing methodologies for conditionalsum adders,” in Proc. Asian Test Symp., 2004, pp. 319–324.

A Novel -ploiting Strategy for Improving Performance ofTest Data Compression

Maoxiang Yi, Huaguo Liang, Lei Zhang, and Wenfa Zhan

Abstract—A precomputed core test set contains a large number ofdon’t cares (�’s) that can be effectively exploited to improve test datacompression (TDC). Extending pattern run-length coding, we present anovel strategy that propagates the �’s of a reference pattern to a newreference pattern in such a way that the reference pattern is XOR-edwith the pattern to be encoded. The �-propagating strategy can increasethe probability of a reference pattern being coding-compatible with thepattern to be encoded, and its validity can be established by filling some�’s of the already encoded patterns in backtracing way. How our strategyis used for TDC is demonstrated. Experimental results for large ISCAS89benchmarks show that, compared to the recently proposed schemes,our technique can effectively improve compression and simplify on-chipdecoder, and work better when used for core-unified TDC.

Index Terms—Backtracing, test data compression (TDC), �-filling,�-propagating.

I. INTRODUCTION

For large circuits, especially for today’s system-on-chips (SoCs), testdata volume becomes huge due to growth in design complexity and theintegration of embedded cores in SoCs, which results in long test appli-cation time and high tester memory requirement. Test data compression(TDC) is one of most effective techniques in terms of reducing test ap-plication time and memory requirement. Various TDC techniques havebeen presented in the literature [1]–[8], [10], [11], [13], [14].

For today’s SoCs, the percentage of �-bit in a test set �� can bequite high [8]. Properly filling these �’s can improve compression for��. The classical run-length coding scheme [1] fills �’s with 0 or 1so that a specified bit continuously runs as long as possible. In the sta-tistical-coding scheme [2], �’s are filled to encode each data block inas short codeword as possible, decreasing the statistically coded datavolume. Jas et al. pointed out in [3] that filling �’s in an optimum waymakes the occurrence frequency of some patterns in �� skew, whichhelps to improve compression rate based on the selective Huffmancoding (SHC). Further, in the literature [9], the relationship betweenentropy and compression of test data was studied, and the�-filling algo-rithms presented in [3] were extended. The two-word Huffman codingscheme [10] designed an enumeration-based algorithm to fill the �’s

Manuscript received July 08, 2008; revised September 24, 2008. First pub-lished June 16, 2009; current version published January 20, 2010. This workwas supported in part by the Natural Science Fund of China (NSFC) under Grant60633060 and Grant 60876028.

M. Yi and W. Zhan are with the Department of Electronic Science andTechnology, Hefei University of Technology, Hefei 230009, China (e-mail:mxyi126@ 126.com; [email protected]).

H. Liang and L. Zhang are with the School of Computer and Information,Hefei University of Technology, Hefei 230009, China (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TVLSI.2008.2009873

1063-8210/$26.00 © 2009 IEEE

14. built_in_ self_detection correction architecture for motion estimation computing arrays

Documents