Built-In Self-Test for Multipliers

Mary Pulukuri
Dept. of Electrical & Computer Engineering
Auburn University

M. Pulukuri, 9/09, VLSI D&T Seminar

Outline of Presentation
- Overview of multiplier architectures
- History of Digital Signal Processor (DSP) architectures in FPGAs
  - Overview of Virtex-4 DSP
- Prior testing R&D for multipliers
- Our approach
- Analysis methodology
- Simulation results
- Application to Virtex-4 & 5 DSPs
- Summary and conclusions


Overview of Multipliers: Array Multiplier
- Final product calculated using an array of full adders & AND gates
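As a rough illustration of that structure (not the gate-level model used in this study), the Python sketch below forms every partial-product bit with an AND and sums the rows with chains of full adders. For brevity each row is added across the full 2n-bit width, so it uses more adder cells than a real array multiplier; the function names are hypothetical.

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    """One full-adder cell: returns (sum, carry_out)."""
    s = x ^ y ^ c
    carry = (x & y) | (x & c) | (y & c)
    return s, carry


def ripple_add(x_bits: list[int], y_bits: list[int]) -> list[int]:
    """Add two little-endian bit lists of equal length with a full-adder chain."""
    out, carry = [], 0
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)
        out.append(s)
    # The final carry is dropped: it is always 0 here because every
    # partial sum of an n x n product stays below 2**(2n).
    return out


def array_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n x n multiply built from AND gates and rows of full adders."""
    width = 2 * n
    acc = [0] * width
    for i in range(n):
        b_i = (b >> i) & 1
        row = [0] * width
        for j in range(n):
            row[i + j] = ((a >> j) & 1) & b_i  # AND gate: partial-product bit
        acc = ripple_add(acc, row)             # one row of full adders
    return sum(bit << k for k, bit in enumerate(acc))


print(array_multiply(11, 13, 4))  # 143
```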


Overview of Multipliers: Signed Array (Baugh-Wooley) Multiplier
- Final product calculated using an array of full adders, AND gates & NAND gates
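The NAND gates come from handling the sign bits. In one common (modified) Baugh-Wooley formulation, the partial-product bits that involve exactly one sign bit are complemented (NAND instead of AND), and constant 1s are added at weights 2^n and 2^(2n-1), with the sum taken modulo 2^(2n). The Python sketch below checks that identity at the bit level; it is an assumption-level illustration, and the multiplier models in this study may arrange the terms differently.

```python
def baugh_wooley_multiply(a: int, b: int, n: int) -> int:
    """Signed n x n product of two n-bit two's-complement bit patterns.

    Returns the 2n-bit two's-complement pattern of the product.
    """
    def bit(x: int, k: int) -> int:
        return (x >> k) & 1

    total = 0
    for i in range(n - 1):
        for j in range(n - 1):
            total += (bit(a, i) & bit(b, j)) << (i + j)            # AND terms
    total += (bit(a, n - 1) & bit(b, n - 1)) << (2 * n - 2)       # sign x sign: AND
    for k in range(n - 1):
        total += (1 - (bit(a, n - 1) & bit(b, k))) << (n - 1 + k)  # NAND terms
        total += (1 - (bit(a, k) & bit(b, n - 1))) << (n - 1 + k)  # NAND terms
    total += (1 << n) + (1 << (2 * n - 1))                         # correction constants
    return total & ((1 << (2 * n)) - 1)


# -8 x 1 with 4-bit operands gives the 8-bit pattern of -8 (0xF8).
print(hex(baugh_wooley_multiply(-8 & 0xF, 1, 4)))  # 0xf8
```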


Overview of Multipliers: Modified Booth Multipliers
- Partial products calculated using the modified Booth algorithm
  - The modified Booth algorithm uses a binary encoder to calculate partial products using a series of shift operations
- Summation of partial products done using carry look-ahead (CLA) adders

A.R. Cooper, “Parallel architecture modified Booth multiplier” IEEE Proc. Electronic Circuits and Systems, vol. 135, no. 3, pp. 125-128, 1998
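The encoding step can be sketched as radix-4 Booth recoding, the usual form of the modified Booth algorithm. Assuming an even operand width n and two's-complement inputs, each overlapping 3-bit group of the multiplier becomes a digit in {-2, -1, 0, 1, 2}, so every partial product is 0, ±A or ±2A, i.e. a shift plus an optional negation. In this Python sketch, ordinary integer addition stands in for the CLA summation, and the function names are hypothetical.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier (n even) into n/2 Booth digits."""
    assert n % 2 == 0, "radix-4 recoding needs an even operand width"
    digits, prev = [], 0              # prev is the implicit bit b[-1] = 0
    for k in range(0, n, 2):
        b0 = (b >> k) & 1
        b1 = (b >> (k + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, n: int) -> int:
    """Signed product of two n-bit two's-complement patterns via radix-4 Booth."""
    a_signed = a - (1 << n) if a >> (n - 1) else a
    product = 0
    for k, d in enumerate(booth_radix4_digits(b, n)):
        # Partial product d * A is 0, +/-A or +/-2A, weighted by 4**k (a shift).
        product += (d * a_signed) << (2 * k)
    return product


print(booth_radix4_digits(0b0110, 4))  # [-2, 2], i.e. -2 + 2*4 = 6
print(booth_multiply(-3 & 0xF, 6, 4))  # -18
```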


Overview of Multipliers: Modified Booth/Wallace Tree Multipliers
- Summation of partial products done using a Wallace tree
  - Each column of partial products is summed using a multi-stage arrangement of half and full adders
  - Each multi-stage adder circuit generates a sum and a carry, which form the two final partial products
- The two final-stage partial products from the Wallace tree are added using a CLA adder
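The reduction can be sketched at word level. A row of full adders acting as 3:2 compressors turns three partial-product rows into a sum row and a carry row without propagating carries, and the stages repeat until two rows remain for one carry-propagate addition (the CLA above). This Python sketch is a simplification: a real Wallace tree reduces each column bit by bit with half and full adders, and the names here are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """A row of full adders (3:2 compressors): x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_sum(partials: list[int]) -> int:
    """Reduce partial-product rows to two, then add them once."""
    rows = list(partials)
    while len(rows) > 2:
        reduced = []
        # Each group of three rows becomes a sum row and a carry row.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # 0-2 leftover rows pass through
        rows = reduced
    # Final two rows: a single carry-propagate addition (a CLA in hardware).
    return sum(rows)


a, b = 0b1011, 0b1101
rows = [(a << i) if (b >> i) & 1 else 0 for i in range(4)]
print(wallace_sum(rows))  # 143
```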


Xilinx FPGA Architectures
- 4000/Spartan
  - N×N array of unit cells
  - Unit cell = CLB + routing
  - Fast carry logic in CLBs for adders
- Virtex/Spartan-2
  - M×N array of unit cells
  - Carry logic + AND gate for array multipliers
  - 4K block RAMs at edges
- Virtex-2/Spartan-3
  - 18K block RAMs in array
  - 18×18-bit multipliers with each RAM
    - "based on modified Booth architecture"
- Virtex-4/Virtex-5
  - Added 48-bit DSP cores w/ multipliers



Virtex-4 DSP Architecture
- 2 DSP slices per tile
- 16-256 tiles in 1-8 columns
- Each DSP includes:
  - 18×18-bit 2's-complement multiplier (w/o adder)
  - 3-input, 48-bit adder/subtractor: P = Z ± (X + Y + Cin)
  - Optional accumulator register
- User-controlled operational modes for the X, Y & Z MUXes
- Configuration bits control the other MUXes
  - Pipelining registers and accumulator register are easily tested

[Figure: Virtex-4 DSP tile with two slices. Each slice has inputs A(18), B(18) and C(48), X, Y and Z multiplexers, and output P(48); cascade inputs and outputs use dedicated routing.]
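A behavioral sketch of the slice arithmetic, written in Python under stated assumptions: the multiplier ("w/o adder") is taken to deliver two partial products that the X and Y multiplexers pass to the 48-bit adder, and the result wraps modulo 2^48. The way the product is split into two partial products below is hypothetical (the real internal split is not documented here), and `dsp_adder` and `multiplier_partials` are illustrative names, not Xilinx primitives.

```python
MASK48 = (1 << 48) - 1


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_adder(x: int, y: int, z: int, cin: int = 0, subtract: bool = False) -> int:
    """48-bit adder/subtractor: P = Z + (X + Y + Cin) or Z - (X + Y + Cin), mod 2**48."""
    inner = x + y + cin
    return (z - inner if subtract else z + inner) & MASK48


def multiplier_partials(a: int, b: int) -> tuple[int, int]:
    """HYPOTHETICAL split of an 18x18 signed product into two partial products.

    B is cut into a low 9-bit unsigned part and a high signed part, so
    pp_x + pp_y == A * B; the actual hardware split is an unknown.
    """
    a_s, b_s = to_signed(a, 18), to_signed(b, 18)
    b_low = b_s & 0x1FF
    b_high = b_s >> 9  # arithmetic shift keeps the sign
    return a_s * b_low, (a_s * b_high) << 9


pp_x, pp_y = multiplier_partials(-1234 & 0x3FFFF, 567)
p = dsp_adder(pp_x, pp_y, 0)   # multiply: Z = 0
print(to_signed(p, 48))        # -699678
p = dsp_adder(pp_x, pp_y, p)   # accumulate: Z = previous P
print(to_signed(p, 48))        # -1399356
```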


BIST Approach for Virtex-5 DSP
- Larger multiplier


Multiplier Architectures
- Test algorithm depends on architecture
  - But architecture is not specified in data sheets
  - Eliminate sequential logic architectures
  - "Based on modified Booth"
- Multiplier choices include:
  - Array
  - Booth
  - Modified Booth
  - Modified Booth/Wallace tree
    - Our assumption, based on area/performance analysis
- Our goal: find/develop architecture-independent test algorithm(s)


Modified Booth Test Algorithms
- Test algorithm uses an 8-bit counter (256 vectors)
  - "Effective Built-In Self-Test for Booth Multipliers," Gizopoulos, Paschalis & Zorian, IEEE Design & Test of Computers, pp. 105-111, 1998
  - Claim fault coverage ~99.8%
- 4×4 connections to multiplier inputs
  - Order of the bits does not matter
- Algorithm used in Srinivas Garimella's MS thesis for Virtex-2 multipliers

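[Figure: 4×4 algorithm. An 8-bit counter (MSB to LSB) is split into two 4-bit fields that drive the two n-bit inputs of an n×n multiplier with Booth encoding; the product is 2n bits.]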
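A pattern-generator sketch in Python for one such algorithm. The slides give only the field widths and say bit order does not matter, so the wiring here is an assumption: the upper k_booth counter bits drive the Booth-encoded input, the lower k_other bits drive the other input, and each k-bit field is replicated across the n operand bits (operand bit i is driven by field bit i mod k). `replicate` and `bist_vectors` are hypothetical names.

```python
def replicate(field: int, k: int, n: int) -> int:
    """HYPOTHETICAL wiring: operand bit i <- field bit (i mod k)."""
    return sum(((field >> (i % k)) & 1) << i for i in range(n))


def bist_vectors(n: int, k_booth: int, k_other: int):
    """Yield (booth_operand, other_operand) for one 256-vector k_booth x k_other test."""
    assert k_booth + k_other == 8, "the two fields split one 8-bit counter"
    for count in range(256):                          # free-running 8-bit counter
        field_booth = count >> k_other                # upper k_booth counter bits
        field_other = count & ((1 << k_other) - 1)    # lower k_other counter bits
        yield replicate(field_booth, k_booth, n), replicate(field_other, k_other, n)


# 4x4 algorithm applied to an 18x18 multiplier: 256 operand pairs.
vectors_4x4 = list(bist_vectors(18, 4, 4))
print(len(vectors_4x4))  # 256
```

Because the generator only re-routes counter bits, switching to 5×3 or 3×5 changes the field widths, not the counter.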



- "An Effective BIST Architecture for Fast Multiplier Cores," Paschalis, Kranitis, Psarakis, Gizopoulos & Zorian, Proc. Design, Automation and Test in Europe Conf., pp. 117-121, 1999
  - Claim fault coverage ~99.8%
- 5×3 connections with 5 inputs to Booth encoding
  - But this was not explicit in the paper; it is only shown in a figure
  - Order of the bits does not matter
- Note that this paper is from 1999

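[Figure: 5×3 algorithm. The 8-bit counter is split into 5-bit and 3-bit fields that drive the two n-bit inputs of an n×n multiplier with Booth encoding; the product is 2n bits.]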



- "Low Power BIST for Wallace Tree-based Fast Multipliers," Bakalis, Kalligeros, Nikolos, Vergos & Alexiou, Proc. Int. Symp. on Quality of Electronic Design, pp. 433-438, 2000
  - Claim fault coverage > 99%
- 5×3 connections with 5 inputs to Booth encoding
  - Specifically stated in the paper
  - But no data to back up the claim that 5×3 is better than 3×5
  - Did they just observe it in the Zorian paper?
- Note that this paper was published a year later than the Zorian paper


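[Figure: 5×3 algorithm. The 8-bit counter is split into 5-bit and 3-bit fields that drive the two n-bit inputs of an n×n multiplier with Booth encoding; the product is 2n bits.]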



- But which side is the Booth encoding? Xilinx does not specify
- Our original approach:
  - Run the 5×3 algorithm (256 vectors)
  - and run the 3×5 algorithm (512 vectors in total)
  - Include 4×4 if fault coverage improves (768 vectors in total)
- Additional algorithms only require multiplexers to change inputs
  - Use the same 8-bit counter

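[Figure: 5×3 and 3×5 algorithms. Each splits the 8-bit counter into 5-bit and 3-bit fields that drive the two n-bit inputs of the Booth-encoded n×n multiplier (2n-bit product), with the fields assigned to the inputs in opposite order.]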
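A Python sketch of how the algorithms could be sequenced from one extended counter: the low 8 bits are the shared 8-bit counter, and the extra upper bits select the multiplexer setting, giving 256 vectors per algorithm in the order 5×3, 3×5, 4×4. The (k_booth, k_other) field assignment and the replicated-field wiring are the same hypothetical assumptions as in the single-algorithm generator, and `combined_test` is an illustrative name.

```python
# (bits to the Booth-encoded input, bits to the other input), in test order.
ALGORITHMS = [(5, 3), (3, 5), (4, 4)]


def replicate(field: int, k: int, n: int) -> int:
    """HYPOTHETICAL wiring: operand bit i <- field bit (i mod k)."""
    return sum(((field >> (i % k)) & 1) << i for i in range(n))


def combined_test(n: int, num_algorithms: int):
    """Yield (algorithm, booth_operand, other_operand) for 256 * num_algorithms cycles."""
    if not 1 <= num_algorithms <= len(ALGORITHMS):
        raise ValueError("one to three algorithms are supported")
    for count in range(256 * num_algorithms):  # stop after the last algorithm
        select = count >> 8                    # extra counter bits drive the mux selects
        low = count & 0xFF                     # shared 8-bit counter
        k_booth, k_other = ALGORITHMS[select]
        field_booth = low >> k_other
        field_other = low & ((1 << k_other) - 1)
        yield ((k_booth, k_other),
               replicate(field_booth, k_booth, n),
               replicate(field_other, k_other, n))


schedule = list(combined_test(18, 3))
print(len(schedule))  # 768
```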


Methodology for Analysis
- Multipliers evaluated:
  - Unsigned array
  - Signed array (Baugh-Wooley)
  - Modified Booth
    - Carry look-ahead adders sum partial products in every stage
  - Modified Booth/Wallace tree
    - Carry look-ahead adder sums final-stage partial products
    - Carry select adder sums final-stage partial products
    - Ripple carry adder sums final-stage partial products


Methodology for Analysis
- Designed 8-bit models of the multipliers
- Fault model: collapsed single stuck-at gate-level faults
- Exhaustive testing
  - To determine undetectable faults
- Test algorithms evaluated:
  - 4×4
  - 5×3
  - 3×5
  - 5×3 & 3×5
  - 4×4, 5×3 & 3×5
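The methodology can be illustrated on a much smaller circuit. This Python sketch injects single stuck-at faults into a gate-level full adder, uses exhaustive simulation to find the detectable faults, and then scores a reduced vector set against that detectable set. The netlist, names and vector subset are hypothetical; only stem faults are modeled and the list is not collapsed, unlike the collapsed gate-level fault lists of the 8-bit multiplier models.

```python
from itertools import product

# Gate-level full adder: (output net, gate type, input nets), in topological order.
NETLIST = [
    ("x1", "xor", ("a", "b")),
    ("s", "xor", ("x1", "cin")),
    ("g1", "and", ("a", "b")),
    ("g2", "and", ("x1", "cin")),
    ("cout", "or", ("g1", "g2")),
]
GATES = {"xor": lambda p, q: p ^ q, "and": lambda p, q: p & q, "or": lambda p, q: p | q}
INPUTS = ("a", "b", "cin")
NETS = list(INPUTS) + [out for out, _, _ in NETLIST]


def simulate(vector, fault=None):
    """Return (s, cout); fault = (net, value) holds that net stuck at value."""
    values = dict(zip(INPUTS, vector))
    if fault is not None and fault[0] in values:
        values[fault[0]] = fault[1]
    for out, gate, (p, q) in NETLIST:
        values[out] = GATES[gate](values[p], values[q])
        if fault is not None and fault[0] == out:
            values[out] = fault[1]
    return values["s"], values["cout"]


def detected(vectors, faults):
    """Faults whose outputs differ from the fault-free outputs for some vector."""
    return {f for f in faults if any(simulate(v, f) != simulate(v) for v in vectors)}


ALL_FAULTS = [(net, value) for net in NETS for value in (0, 1)]
detectable = detected(list(product((0, 1), repeat=3)), ALL_FAULTS)  # exhaustive test
subset = detected([(0, 0, 0), (1, 1, 1)], ALL_FAULTS)              # reduced test
print(len(detectable), len(subset), 100 * len(subset) / len(detectable))  # 16 14 87.5
```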


Number of faults detected by each test algorithm (effective fault coverage, %)

Multiplier | Total faults | Exhaustive | 4×4 | 5×3 | 3×5 | 5×3 & 3×5 | 5×3, 3×5 & 4×4
Unsigned array | 1648 | 1644 (100) | 1644 (100) | 1644 (100) | 1621 (98.60) | 1644 (100) | 1644 (100)
Signed array | 1648 | 1644 (100) | 1644 (100) | 1644 (100) | 1644 (100) | 1644 (100) | 1644 (100)
Mod-Booth | 2499 | 2196 (100) | 2180 (99.27) | 2168 (98.72) | 2179 (99.23) | 2182 (99.36) | 2193 (99.86)
Mod-Booth Wall-Tree CLA | 2184 | 2090 (100) | 2061 (98.61) | 2068 (98.95) | 2070 (99.04) | 2071 (99.09) | 2074 (99.23)
Mod-Booth Wall-Tree CSA | 2422 | 2243 (100) | 2215 (98.75) | 2217 (98.84) | 2218 (98.89) | 2222 (99.06) | 2228 (99.33)
Mod-Booth Wall-Tree RCA | 2021 | 1962 (100) | 1937 (98.73) | 1944 (99.08) | 1944 (99.08) | 1944 (99.08) | 1947 (99.24)

Final-stage adder: CLA = carry look-ahead, CSA = carry select, RCA = ripple carry.
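The percentages in the table are effective fault coverage: faults detected by an algorithm divided by faults detected by the exhaustive test (undetectable faults excluded), not by the total fault count. For example, 1621/1644 = 98.60%, whereas 1621/1648 would be 98.36%. A short Python check recomputes every percentage from the table's counts; the dictionary layout and function name are ours.

```python
# Counts from the results table: (exhaustive, 4x4, 5x3, 3x5, 5x3 & 3x5, all three).
RESULTS = {
    "Unsigned array": (1644, 1644, 1644, 1621, 1644, 1644),
    "Signed array": (1644, 1644, 1644, 1644, 1644, 1644),
    "Mod-Booth": (2196, 2180, 2168, 2179, 2182, 2193),
    "Mod-Booth Wall-Tree CLA": (2090, 2061, 2068, 2070, 2071, 2074),
    "Mod-Booth Wall-Tree CSA": (2243, 2215, 2217, 2218, 2222, 2228),
    "Mod-Booth Wall-Tree RCA": (1962, 1937, 1944, 1944, 1944, 1947),
}
COLUMNS = ("4x4", "5x3", "3x5", "5x3 & 3x5", "5x3, 3x5 & 4x4")


def effective_coverage(detected: int, detectable: int) -> float:
    """Effective fault coverage (%): detected / faults detectable by exhaustive test."""
    return round(100 * detected / detectable, 2)


for name, (exhaustive, *counts) in RESULTS.items():
    row = {col: effective_coverage(c, exhaustive) for col, c in zip(COLUMNS, counts)}
    print(name, row)
```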


Application to Virtex-4 & 5 DSPs
- In Virtex-4 & 5 DSPs:
  - The final-stage carry look-ahead adder (CLA) is separated from the multiplier
  - 5×3 & 3×5 give the same fault coverage for the multiplier alone
  - Separate test algorithm for the CLA
- Run both 5×3 and 3×5 to test for bridging faults on the cascade routing between adjacent slices

Mode (Test) | First 256 clock cycles | Second 256 clock cycles | Third 256 clock cycles | Fourth 256 clock cycles
00 (multiply) | P = A×B | P = A×B | P = A×B + C | P = A:B + C
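An output response analyzer needs the fault-free P for each of these four 256-cycle phases. A golden-reference sketch in Python follows, assuming Virtex-4 port widths (18-bit A and B, 48-bit C and P), treating A:B as a zero-extended concatenation, and not modeling which test algorithm supplies A, B and C in each phase. `expected_p` is a hypothetical name.

```python
MASK48 = (1 << 48) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def expected_p(cycle: int, a: int, b: int, c: int, a_bits: int = 18, b_bits: int = 18) -> int:
    """Fault-free 48-bit P pattern for test clock cycle 0..1023 in mode 00."""
    phase = cycle // 256
    if phase in (0, 1):
        p = sext(a, a_bits) * sext(b, b_bits)                 # P = A x B
    elif phase == 2:
        p = sext(a, a_bits) * sext(b, b_bits) + sext(c, 48)   # P = A x B + C
    elif phase == 3:
        # ASSUMPTION: A:B is the bit concatenation, zero-extended to 48 bits.
        concat = ((a & ((1 << a_bits) - 1)) << b_bits) | (b & ((1 << b_bits) - 1))
        p = concat + c                                         # P = A:B + C
    else:
        raise ValueError("the schedule has four 256-cycle phases")
    return p & MASK48


print(expected_p(600, 3, 5, 10))  # 25
```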


Summary and Conclusion
- If the architecture of the multiplier is not known:
  - The 3×5 algorithm gives the best overall fault coverage for most multipliers
    - Contradicting the claim of the authors who proposed 5×3
  - Running 3×5 & 5×3 gives better fault coverage for all multipliers
  - Running all three algorithms (3×5, 5×3 and 4×4) provides the best fault coverage for all multipliers
    - Architecture-independent testing
- Virtex-4 & Virtex-5 multipliers:
  - Original approach was 3×5 and 5×3
  - Better approach would be 3×5 and 4×4


Summary and Conclusion
- For multipliers in Virtex-2 FPGAs:
  - The adder is not separated from the multiplier
  - Run both the 3×5 and 5×3 algorithms
    - These give the highest fault coverage for the multiplier & CLA
- The 3×5 and 4×4 BIST algorithms should be applied to multipliers in:
  - Spartan-3A (similar to multipliers in Virtex-4)
  - Spartan-6 (similar to multipliers in Virtex-4)
  - Virtex-6 (similar to multipliers in Virtex-5)
  - If only 2 algorithms can be applied
  - Best results if all 3 can be applied


Summary and Conclusion
- Area overhead for different approaches
  - In addition to the 8-bit counter, the maximum area overhead for an N-bit multiplier is:
    - One test algorithm: 2N 2:1 multiplexers
    - Two test algorithms: 2N 3:1 multiplexers, plus 1 additional counter bit for control
    - All three test algorithms: 2N 4:1 multiplexers, plus 2 additional counter bits for control
  - This is the worst case, since synthesis tools may reduce the multiplexers
    - Particularly for two and three test algorithms, because duplicate counter bits drive the same multiplexers
  - Regardless, this is an area-efficient BIST approach
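The worst-case overhead figures follow a simple pattern, captured in the Python sketch below under our reading of the slide: each of the 2N operand bits gets one multiplexer with one input for normal operation plus one per test algorithm, and ceil(log2(algorithms)) extra counter bits select the algorithm. The 2:1-equivalent count (a k:1 multiplexer taken as k-1 2:1 multiplexers) is a rough estimate of ours, and `bist_overhead` is a hypothetical name.

```python
def bist_overhead(n: int, num_algorithms: int) -> dict:
    """Worst-case extra logic for an N-bit multiplier, beyond the 8-bit counter."""
    if num_algorithms not in (1, 2, 3):
        raise ValueError("the slides cover one, two or three test algorithms")
    mux_inputs = num_algorithms + 1  # normal operand + one input per test algorithm
    return {
        "muxes": 2 * n,                                      # one per bit of each operand
        "mux_size": f"{mux_inputs}:1",
        "extra_counter_bits": (num_algorithms - 1).bit_length(),
        "equivalent_2to1_muxes": 2 * n * (mux_inputs - 1),  # rough estimate
    }


for k in (1, 2, 3):
    print(k, bist_overhead(18, k))  # 18x18 multiplier, as in Virtex-4
```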

- Paper almost finished for JETTA Letter or Trans. IE Corr.
- Brad is using the 3×5, 5×3 & 4×4 algorithms in a test bench for multipliers in the Output Response Analyzer (ORA) for mixed-signal BIST
