
https://doi.org/10.1007/s10836-020-05913-1

Library Characterization of Arithmetic Circuits for Reliability-Aware Designs in SRAM-Based FPGAs

Akin Gokalan1 · Suleyman Tosun1 · Deniz Dal2

Received: 17 January 2020 / Accepted: 16 November 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Designing an application in hardware under inversely competing constraints such as area and performance with different objective functions such as power consumption and reliability of the circuits is a cumbersome task. Having different versions of the same resource type during the design process may ease this burden since there can be several alternative resources to meet the given constraints. In this paper, we characterize a library of some commonly used arithmetic circuits in FPGAs in terms of the speed, area, power consumption, and vulnerability to error propagation as the reliability parameter. Specifically, we implemented four well-known adders and two multipliers in an SRAM-based FPGA that is a part of Xilinx’s Zynq-7000 SoC platform. We then injected errors into the configuration bits of the circuits to evaluate the error propagation. The results show that different versions of the same resources can have different reliability values in addition to the area, latency, and power values.

Keywords Soft error · FPGA · Reliability · Arithmetic circuits

1 Introduction

1.1 Motivation

A Field Programmable Gate Array (FPGA) is an electronic device that consists of a large number of configurable logic blocks (CLBs), programmable routing switches that connect the CLBs, and input-output (IO) pins [26]. In FPGAs, logic functions are realized utilizing CLBs, which in turn are composed of look-up tables (LUTs) that store the truth tables of the functions, multiplexers, and flip-flops. FPGAs’ prevalence is growing in both industry and academia due to their advantages, such as reduced time-to-market, reconfigurability, and parallel processing ability. They are also preferred as the choice of the design platform in aerospace industries [20]. However, under ionizing radiation, the configuration bits of the FPGAs tend to flip, which is called SEU (single event upset) [1]. An SEU may propagate through the circuit and cause unexpected behavior. On the other hand, it may not change the output of the design due to the error-masking capabilities of the combinational circuits. In other words, the internal structure of the circuit and the input pattern applied determine whether or not the behavior changes in the presence of an SEU.

Responsible Editor: A. Orailoglu

1 Department of Computer Engineering, Hacettepe University, Ankara, Turkey

2 Department of Computer Engineering, Ataturk University, Erzurum, Turkey

SEU, by definition, is a change of state in a semiconductor that does not permanently change the circuit’s behavior. If such a non-persistent error causes the data stored in memory to be erroneous even for a short period, all operations using that data will yield erroneous results until the data is updated. These types of errors in digital systems are called transient errors or soft errors (SEs). Since the behavior is determined by the configuration bits in SRAM-based FPGAs, an SEU causes a permanent change of behavior until restart.
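The dependence of error masking on the input pattern can be illustrated with a small behavioral sketch of a look-up table. It assumes a 2-input LUT whose truth table is stored as an integer; the helper names `lut_eval` and `flip_bit` are hypothetical and are not part of any vendor tool.

```python
def lut_eval(truth_table: int, inputs: int) -> int:
    """Return the LUT output, i.e. the truth-table bit addressed by `inputs`."""
    return (truth_table >> inputs) & 1


def flip_bit(truth_table: int, index: int) -> int:
    """Model an upset by inverting one stored truth-table bit."""
    return truth_table ^ (1 << index)


AND2 = 0b1000               # 2-input AND: output is 1 only for inputs 0b11
upset = flip_bit(AND2, 1)   # upset in the entry addressed by inputs 0b01

# Input patterns for which the upset changes the output.
differing = [i for i in range(4) if lut_eval(AND2, i) != lut_eval(upset, i)]
```

In this sketch only the input pattern that addresses the flipped entry exposes the upset; the other three patterns mask it.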

A typical FPGA design may consist of several arithmetic circuits, such as adders and multipliers. Additionally, there are several ways to implement an arithmetic circuit, each of which may have different area, latency, and power consumption parameters. This difference comes from the fact that each design uses varying logic in its implementation. Different logic functions may also have a different response to SEs due to their propagation behaviors. Therefore, a design may favor a specific arithmetic circuit under given constraints and an objective function. On the other hand, having a library of arithmetic circuits with different parameters offers better design choices to the designer. Thus, a library characterization of several commonly used arithmetic circuits helps the FPGA designers to meet their design objectives.

Published online: 2 December 2020

Journal of Electronic Testing (2020) 36:743–756

1.2 Contributions

In this paper, we present a new method for testing the SE vulnerability of the arithmetic circuits implemented in SRAM-based FPGAs. In this respect, we realized four well-known adders and two multipliers by using Xilinx Zynq-7000 SoC as our FPGA platform. In the design flow, we first characterize the adders and the multipliers in terms of the occupied area in the FPGA, speed (i.e., latency), power consumption, and the vulnerability to SEs as a reliability metric. We then employ different adders and multipliers to implement the same custom generated functions. We use these functions as a case study to illustrate how the different versions of the same resource affect the overall reliability and the other metrics (i.e., area, latency, and power) on the final design. In the last step, we inject errors into the FPGA configuration bits to observe if they propagate through the data-path of the arithmetic circuits and affect the results.

We can summarize the contributions of our work as follows:

– We propose a new methodology that can test the vulnerability of arithmetic circuits against SEs on FPGA-based implementations.

– We characterize a library that consists of four adders and two multipliers. The resources in this library are characterized in terms of the area, latency, power consumption, and the SE vulnerability metric that represents the reliability. We strongly believe that such a library will be exceedingly useful for further research on high-level synthesis (HLS) of integrated circuits on FPGAs and ASICs (Application-Specific Integrated Circuits).

– We present a case study that shows that utilizing different versions of the same resource in a design helps meet the constraints and yield better-optimized circuits. We used a custom generated function that uses addition operations for our case study and implemented it with various adder types to show how different design parameters are affected.

1.3 Organization

The rest of the paper is organized as follows. In the next section, we overview the related work about SE mitigation and its detection methods in FPGAs. In Section 3, we explain the system architecture and the technological components used in our evaluations. We present the details of our test methodology in Section 4. In Section 5, we discuss the test results for the selected adders and the multipliers, along with a case study that uses the characterized library. Finally, we conclude this paper in Section 6.

2 Related Work

There are some methods to mitigate or detect soft errors (SEs) in FPGAs. One of them is known as data scrubbing (DS) [23]. DS is the process of scanning all device memory at regular time intervals to correct the detected errors. It requires storing the original configuration or part of the original configuration data to replace the corrupted frames. Even though every frame of the FPGA bitstream has error correction code (ECC) bits, this may not be enough since ECC codes are only capable of correcting a single bit or two adjacent bits in a frame. Apart from the ECC codes, the bitstream is also protected by cyclic redundancy check (CRC) codes; however, CRC codes are only beneficial for integrity checks and are not useful for any error correction operation [28]. Triple modular redundancy (TMR), combined with a voter mechanism, is another well-known method to increase the reliability and detect SEs [6]. Although this method drastically improves the mean time to failure (MTTF), the area and the power consumption are increased threefold compared to the original design. Therefore, some prior research endeavors to optimize the TMR solution. Pratt et al. replace only highly vulnerable parts with TMR while leaving other logic parts as they are [17]. They compare the reliability results of a non-mitigated design against partial TMR and full TMR designs. The results show that the partial TMR can be a design choice to increase reliability while sacrificing little on resources.

Some previous research has aimed to increase the reliability by modifying the circuit itself. In [9], the authors improve the overall reliability by adding minimal extra hardware. They investigate the full adder circuit targeting application-specific integrated circuits (ASICs). By calculating the results with the original inputs and their complements in the full adder, they manage to use different paths on the full adder circuit resulting in the same output. They finally check the results and determine if a transient error emerged during one of the calculations. With the addition of the negligible extra hardware, they manage to get reliability results as if there were dual modular redundancy. However, they require two clock cycles to produce two distinct results since the two different paths have some common elements. Thus, to prevent the time redundancy, they propagate the result of the regular full adder path as soon as it is available so that the remaining circuit does not wait. Afterward, they calculate the result of the alternative path. When the results are compared, they inform the rest of the circuit if an error occurred. Therefore, the circuit does not suffer from the time redundancy in normal working conditions, but only when an error emerges. In [11], the authors propose a self-repairing full adder design, which is capable of repairing itself using extra logic to validate the result. The extra logic is capable of managing only a single fault at a time. They significantly improved the full adder’s reliability while sacrificing very little on the area when compared to TMR. In [5], the authors present a method that can make defective hardware operate correctly. To show the use case of their method, they propose a redesigned version of the Kogge-Stone adder, which can run accurately even if a fabrication defect exists in the circuit. Using the mutually exclusive nature of even and odd bits of the Kogge-Stone adder, they reconfigure the hardware to make it use only the healthy bit set of the adder if a manufacturing fault is detected in even or odd bits. This approach does not decrease the throughput since it can run at the same or even higher clock rates; however, it requires two clock cycles to produce a result if the circuit is reconfigured to use only even or odd bit sets.

Some prior studies test the effects of SEUs in SRAM-based FPGAs. For example, Ostler et al. model the reliability of the FPGAs under harsh orbital environments [16]. They adopt a statistical method to estimate the reliability of the FPGAs at different orbit levels. Although the reliability results of the tested FPGAs are very promising, the authors conclude that the FPGAs are not suitable devices to be used in every environment without additional precautions. In another work [19], the researchers used reliability estimator tools to infer the reliability of the hardware components in FPGAs. This work shows how each component contributes to the overall system failure rate. It also states that adding redundancy to a hardware component that significantly contributes to the system failure rate increases the system’s overall reliability while sacrificing little on the resources. Another study aims to find a model to measure the reliability of designs at the hardware description language (HDL) level [13]. Since the existing reliability models cover only the standard programming languages, they do not adapt well to the concurrent nature of the HDLs. The method presented in [13] improves the reliability model of a standard programming language to handle the error propagation cases that stem from the parallel nature of the HDLs. Several studies in the literature also investigate the performance of arithmetic circuits. Daphni et al. compared commonly used parallel prefix adders (PPAs), including the Kogge-Stone adder and the Brent-Kung adder on FPGAs, in terms of power, speed, and area. In this respect, they presented their results for 16-bit and 64-bit variations of the circuits. Vitoroulis et al. [24] expanded the study introduced in [4]. They presented the area and the speed results of commonly used PPAs. They implemented adders with input sizes of 16, 32, 64, 128, and 256 bits. They also produced two sets of results: the first is obtained by synthesis with the area optimization setting, while the second is obtained with the speed optimization setting. Jayanthi et al. [7] and SaiKumar et al. [21] did not restrict their research to PPAs and further analyzed high-speed VLSI adders. Mohanapriya et al. [15] investigated multiplexer-based adders using Cadence for 180 nm technology. They reported the performance of the circuits in terms of speed, area, and power dissipation. In [22], the authors implemented various types of adders at 90 nm, 130 nm, and 180 nm technology. They extracted all the nets in the netlist and injected transient errors into all of them. They checked whether the injected error causes an error in the final result. By doing so, they tried to find the architectural vulnerabilities of the commonly used adder topologies targeted for ASIC designs. However, to the best of our knowledge, there is no previous work in the literature that compares the different implementations of the same arithmetic circuits using SRAM-based FPGAs as far as the reliability is concerned.

There are also commercially available antifuse-based FPGAs that are generally preferred in high-radiation environments, such as space applications, due to their endurance against SEs [18, 25]. However, they can be programmed only once, and they are more expensive than SRAM-based FPGAs. These disadvantages make the methods proposed in this paper more valuable since SRAM-based FPGAs are abundant and reprogrammable.

3 System Architecture

In this section, we present the necessary hardware and software components of the experimental setup for the proposed method. We specifically explain the architecture of the FPGA board, the FPGA configuration bitstream, the essential bits for testing, and the intellectual property (IP) that is used for the SE propagation tests.


3.1 Zynq-7000 SoC

We selected Xilinx’s Zynq-7000 SoC (System-on-Chip) board as our implementation platform [3]. The reason for this selection is that Zynq-7000 SoC has dual Arm Cortex A9 cores and 28 nm Artix-7 based SRAM-based programmable logic (FPGA), allowing us to easily program and control the behavior of the reconfigurable logic via the Arm cores. The test method proposed in this paper is not restricted to the selected FPGA platform. It is rather a general method that can be employed for all types of SRAM-based FPGAs.

The architectural overview of this board is given in Fig. 1. As can be seen in this figure, the board has two main components connected via AXI buses: the processing system (PS) and the programmable logic (PL). Zynq-7000 also has two ports that allow us to access and change the FPGA configuration bits: the processor configuration access port (PCAP) and the internal configuration access port (ICAP). While PCAP is used to access the FPGA configuration bits from the PS, ICAP is utilized to access the FPGA configuration bits from the PL part.

3.2 Bitstream Structure

A bitstream is the collection of the configuration bits loaded into the FPGA to realize the desired logic. As a result of its technology protection policy, Xilinx does not reveal the relationship between the logical placement and the bitstream. Instead, it provides a linear address scheme. A linear address does not represent a physical address; however, it provides the flexibility for a user to pinpoint a specific logic in the design. The linear addressing scheme uses frames and words. The whole bitstream is first divided into frames. Frames are then divided into words, where each word consists of a specific number of bits. For example, the bitstream in Zynq-7000 consists of 7948 frames. Every frame consists of 101 words, and every word has 32 bits. Therefore, 25,687,936 bits in total exist in the bitstream destined for Zynq-7000. At this point, it must be noted that the number of frames and the internal structure of the frames change among different FPGA models.

Fig. 1 Architectural overview of the Zynq-7000 SoC
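The bit count quoted above follows directly from the frame, word, and bit figures. The short Python check below reproduces it; the constant names are chosen here for illustration only.

```python
# Bitstream geometry quoted in the text for the Zynq-7000 device.
FRAMES = 7948
WORDS_PER_FRAME = 101
BITS_PER_WORD = 32

bits_per_frame = WORDS_PER_FRAME * BITS_PER_WORD   # 3232 bits per frame
total_bits = FRAMES * bits_per_frame               # all configuration bits

print(total_bits)
```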

3.3 Xilinx’s Essential Bits Technology

The essential bits of a design are defined as the bits that contain important information about the design. Xilinx uses a special algorithm to identify the essential bits among the entire configuration bits. If an SEU strikes an essential bit, the configuration of the circuit changes as a result of the upset. However, this erroneous configuration bit might not change the functionality of the design. Xilinx’s essential bits technology aims to provide the users with the flexibility to mark the valuable bits for the design. The prioritized essential bits technology takes it one step further. It allows the users to mark the essential bits only for the selected parts of the design. After a successful implementation, Xilinx’s synthesis tools create three classes of data files for the essential bits: files with the extensions .ebc and .ebd, and an additional bit file. The bit file is the configuration file loaded into the FPGA. The .ebd file stores the configuration bits in the ASCII format. The .ebc file is a marking file of the essential bits in the ASCII format. The .ebc and .ebd files have the same size. If a bit is 1 in the .ebc file, then the corresponding configuration bit in the .ebd file is marked as essential. The essential bits provide a favorable opportunity for designers to reduce the failure in time (FIT) by taking extra precautions. Thus, they can be employed for analyzing the SE susceptibility of a circuit.
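The pairing of a marking file with a configuration file can be sketched as follows, using the roles of the two files as described above. This is a simplified illustration: real files contain header lines and 32-bit words, whereas the sketch assumes header-free lists of equal-length ASCII strings. The function name and the 4-bit sample words are hypothetical.

```python
def essential_positions(config_words, mask_words):
    """Return (word index, bit index, configuration value) for every marked bit.

    Both arguments are lists of ASCII strings of '0' and '1' with the same
    shape; a '1' in the marking data flags the matching configuration bit.
    The order of bits within a line is an assumption of this sketch.
    """
    if len(config_words) != len(mask_words):
        raise ValueError("marking and configuration data must have the same size")
    positions = []
    for word_index, (data, mask) in enumerate(zip(config_words, mask_words)):
        for bit_index, (value, marked) in enumerate(zip(data, mask)):
            if marked == "1":
                positions.append((word_index, bit_index, int(value)))
    return positions


# Shortened 4-bit words for readability; device words are 32 bits wide.
sample_config = ["1010", "0110"]
sample_mask = ["0100", "0011"]
sample_essential = essential_positions(sample_config, sample_mask)
```

Only the marked positions are returned, which is what keeps the later injection campaign small compared to the full bitstream.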

3.4 Soft Error Mitigation IP

Xilinx provides a soft error mitigation intellectual property (SEM IP). It detects and corrects SEs occurring in the configuration bits of the FPGAs using error correction codes (ECCs). The ECCs carry enough information to correct a one-bit flip or two adjacent bit flips in a frame. If more than one non-adjacent bit flips, the ECCs cannot correct the errors. In such a case, the frame must be reloaded utilizing the ICAP or PCAP. The SEM IP is capable of reloading a frame partially, as shown in [8]. The SEM IP can also classify errors by comparing the locations of the errors using the essential bits map.

Another essential feature of the SEM IP is that it allows the designer to inject errors into the configuration memory and emulate them as SEs. In this research, we only used the error injection capabilities of the SEM IP. We basically configured the SEM IP to accept commands through the UART interface for the error injection and tested the error mitigation capabilities of different implementations.

3.5 Pblocks

Pblocks are user-drawn areas in the FPGA logic that are used to supply specific place and route constraints to a specific module in the design. We used Pblocks to implement different circuits in a specified FPGA location.

4 Library Characterization

In this study, we only focused on adders and multipliers for our library characterization since they are the most commonly used resources. Additionally, several other arithmetic operations can be performed on them after small modifications. For instance, the subtraction and comparison circuits can be implemented using adders. Other arithmetic circuits and the different versions of adders and multipliers can also be added to the characterized library by following the same methodology described below.

We included the four most frequently used adders from the literature. They are listed from the slowest to the fastest as follows:

– Ripple-carry adder [14]

– Carry-lookahead adder [14]

– Brent-Kung adder [2]

– Kogge-Stone adder [10]

We implemented the adders in the PL part using the CLBs of the Zynq-7000 board.
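As a behavioral illustration of the slowest and the fastest adder in the list above, the following Python sketch models a ripple-carry adder and a Kogge-Stone adder at the bit level. It is not the HDL used for the FPGA implementations; the function names and the default 32-bit width are assumptions made for the example.

```python
def ripple_carry_add(a: int, b: int, width: int = 32) -> int:
    """Add bit by bit; each stage waits for the carry of the previous one."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result


def kogge_stone_add(a: int, b: int, width: int = 32) -> int:
    """Compute all carries with a parallel-prefix (Kogge-Stone) network."""
    mask = (1 << width) - 1
    generate = a & b
    propagate = a ^ b
    half_sum = propagate            # per-bit sum before carries are applied
    distance = 1
    while distance < width:         # log2(width) prefix levels
        generate |= propagate & (generate << distance)
        propagate &= propagate << distance
        generate &= mask
        distance <<= 1
    carries = (generate << 1) & mask  # carry into bit i comes from bits below i
    return half_sum ^ carries
```

Both functions return the sum modulo 2^width. The ripple-carry loop has one dependent step per bit, while the prefix network needs only five levels for 32 bits, which mirrors the latency ordering of the list.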

We also implemented two different multipliers: a carry-lookahead multiplier (CLM) and a DSP-based multiplier. We implemented the CLM in the FPGA, whereas we utilized the pre-existing DSP fabric for the DSP unit-based multiplier. One may argue that it is unfair to compare these two multiplier implementations since one is realized in hardware and the other in software on the existing DSP. However, our motivation in this selection is to demonstrate how software and hardware implementations mitigate SEs.

4.1 Overview of the Test Setup

The block diagram of our test setup is given in Fig. 2. In the FPGA design, we placed five exact copies of each circuit for testing since each implementation may behave differently on the FPGA because of the non-deterministic behavior of the place-and-route algorithms. We implemented all inputs and outputs as 32-bit wires, applied the same inputs to all five circuits, and obtained their outputs. While the input and the output sizes of the adders are 32 bits, the inputs and outputs of the multipliers are 16 and 32 bits, respectively. Interconnection wires are driven by another module, which is called the intercommunication module. The intercommunication module is a clock-synchronous AXI slave. The task of the intercommunication module is to obtain the test data inputs from the PS via the AXI interface and drive the wires that are connected to the unit under test (UUT). The outputs of the five circuits are sent back to the intercommunication module. The PS also reads all outputs from the intercommunication module via the AXI interface.

Fig. 2 Block diagram of the test setup. UUT stands for unit under test

The PS is connected to the SEM IP via the UART. The instruction parsing interface of the SEM IP is called the monitoring interface. The PS generates the instructions and sends them to the SEM IP via the UART pins. A dedicated PS-UART0 is used for communication. There are also two addressing schemes: linear addressing and physical addressing. The linear addressing scheme does not give any information about a physical element or its location; however, the ebc file is generated according to the linear addressing scheme. Therefore, we employ the linear addressing scheme for the error injection. A sample error injection instruction is given in Fig. 3. It consists of 40 bits. It is designed to fully identify a specific bit in the whole bitstream. In the instruction bits, the L, W, and B bits represent the frame number, word number, and bit number, respectively. The S bits are used to indicate whether the device is a stacked silicon interconnect (SSI) device. A zero value indicates that the device is a non-SSI device, while a value of one indicates that it is an SSI device. Since the hardware used in this research is a non-SSI device, we set the S bits to zero in this study.
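The frame, word, and bit numbers of a linear address can be derived from a position in the essential-bit listing, as in the minimal sketch below. It assumes that bits are listed frame by frame and word by word, and it uses the 101-word, 32-bit frame geometry given for the Zynq-7000. Packing the fields into the 40-bit instruction is deliberately left out because the field widths are not stated here. The function names are hypothetical.

```python
WORDS_PER_FRAME = 101
BITS_PER_WORD = 32
BITS_PER_FRAME = WORDS_PER_FRAME * BITS_PER_WORD


def to_linear_fields(flat_index: int):
    """Split a flat bit position into (frame L, word W, bit B).

    Assumes frame-major ordering of the flat index (an assumption of
    this sketch, not a documented file layout).
    """
    frame, remainder = divmod(flat_index, BITS_PER_FRAME)
    word, bit = divmod(remainder, BITS_PER_WORD)
    return frame, word, bit


def to_flat_index(frame: int, word: int, bit: int) -> int:
    """Inverse of to_linear_fields under the same ordering assumption."""
    return frame * BITS_PER_FRAME + word * BITS_PER_WORD + bit
```

For example, the last bit of a 7948-frame bitstream maps to frame 7947, word 100, bit 31 under this ordering.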

4.2 Error Injection and Testing Method

The PS flowchart of the error injection and the critical bit identification is given in Fig. 4. PS first injects the errors into the configuration bits and feeds the test data. It then compares the results generated by the FPGA logic with the correct results. To perform these steps, it first searches the pre-loaded ebc file to determine the essential bit locations. Whenever an essential bit is found, it infers the bit’s linear address and injects an error into that location. Error injection is simply flipping the value of the configuration bit. After flipping, PS feeds the test data to the intercommunication module and collects the results. If any of the results are erroneous, PS treats the bit as a candidate critical bit and collects all the statistics from the test. Afterward, PS corrects the configuration bit. To confirm that the injected error caused the miscalculations, PS feeds the same test data to the intercommunication module again. If all the returned results are valid, then PS accepts the bit as a critical bit. If an erroneous result is still detected from any of the circuits, PS marks that configuration bit as a fake critical bit and increments the unexpected situation counter. Observing a fake error means that there is an unexpected error in the FPGA, and the bitstream should be reloaded to continue the tests. Since we did not encounter any fake errors in our tests, we did not change the program’s behavior.
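The inject, test, correct, and re-test sequence described above can be mimicked in software as follows. This is a behavioral mock rather than the PS program: `MockFabric` is a hypothetical stand-in for the FPGA that models five 8-bit adder copies, each of which can be broken by a known configuration bit, and all names are illustrative.

```python
class MockFabric:
    """Hypothetical device model: maps a critical bit address to the copy it breaks."""

    def __init__(self, critical_bits, copies=5):
        self.critical_bits = dict(critical_bits)
        self.copies = copies
        self.flipped = set()

    def flip(self, address):
        """Toggle one configuration bit (used both to inject and to correct)."""
        self.flipped ^= {address}

    def run(self, a, b):
        """Return the outputs of all copies for one input pair."""
        outputs = [(a + b) & 0xFF] * self.copies
        for address in self.flipped:
            if address in self.critical_bits:
                outputs[self.critical_bits[address]] ^= 1  # corrupt that copy
        return outputs


def count_errors(fabric, vectors):
    """Count outputs that differ from the expected sum over all test vectors."""
    return sum(out != ((a + b) & 0xFF)
               for a, b in vectors
               for out in fabric.run(a, b))


def check_bit(fabric, address, vectors):
    """Classify one essential bit as not critical, critical, or fake critical."""
    fabric.flip(address)                      # inject the error
    errors = count_errors(fabric, vectors)
    fabric.flip(address)                      # correct the error
    if errors == 0:
        return "not critical"
    residual = count_errors(fabric, vectors)  # re-test after the correction
    return "critical" if residual == 0 else "fake critical"


fabric = MockFabric({3: 0, 7: 4})
vectors = [(1, 2), (200, 100)]
results = {address: check_bit(fabric, address, vectors) for address in (1, 3, 5, 7)}
```

The second pass over the test data is what separates a critical bit from a fake one: a genuine critical bit stops producing errors as soon as it is corrected.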

We used Xilinx’s prioritized essential bits technology to mark the essential bits of the five circuits that are under test. The maximum clock frequency of the SEM IP is 100 MHz. At the maximum clock frequency, the latency of the error detection is 8.0 ms [28]. Since we used 50 MHz as the clock frequency in the design to satisfy the timing requirements across the whole design, the latency doubles to 16 ms for the error detection. The error detection mechanism scans all configuration bits and verifies the ECC codes of the frames. Therefore, the PS program must wait 16 ms for the injection to propagate through the circuit after every error injection. We increased the waiting time to 40 ms, which is our safe latency duration. Feeding the test data and collecting the results also take another 40 ms. After correcting the error injected into the configuration bit, PS waits 40 ms again for the correction to propagate through the circuit. Therefore, testing a single bit takes 120 ms if the bit is not critical. If the bit is critical, then it takes 160 ms since the test data are fed twice. If every configuration bit were tested, testing the whole configuration memory would take around 40 days, considering that there are 25,687,936 bits in the device. For this reason, Xilinx’s essential bits technology plays a vital role in making the testing feasible.

Fig. 3 Structure of linear frame instruction bits. The numbers on the top, from 0 to 39, represent the bit index numbers

Fig. 4 PS flowchart

Fig. 5 Hierarchy of the configuration bits

Fig. 6 An example of design placement in FPGA
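The per-bit timings above give a quick bound on the duration of an exhaustive campaign. The following sketch recomputes it; the variable names are illustrative, and the two results bracket the case where no bit is critical and the case where every bit is critical.

```python
TOTAL_BITS = 25_687_936     # configuration bits in the device
WAIT_MS = 40                # safe latency after each injection or correction
TEST_MS = 40                # feeding test data and collecting results
MS_PER_DAY = 24 * 60 * 60 * 1000

non_critical_ms = WAIT_MS + TEST_MS + WAIT_MS   # 120 ms per bit
critical_ms = non_critical_ms + TEST_MS         # 160 ms, test data fed twice

days_if_none_critical = TOTAL_BITS * non_critical_ms / MS_PER_DAY
days_if_all_critical = TOTAL_BITS * critical_ms / MS_PER_DAY

print(f"{days_if_none_critical:.1f} to {days_if_all_critical:.1f} days")
```

The two bounds are roughly 36 and 48 days, which is consistent with the estimate of around 40 days and shows why restricting the campaign to essential bits matters.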

As stated before, essential bits are the bits that are important to the design. Prioritized essential bits are the bits that are essential in the marked area, and critical bits are the bits that actually change the behavior of the circuit. In the light of these definitions, the configuration bits can be classified as shown in Fig. 5, which is also reported in [12].

We select the ISE Design Suite as our design and synthesis platform since it supports the prioritized essential bit flow. For the placement of the circuits in the FPGA, we use Pblocks to restrain the tested circuits in a specific location. We employ the constraint placement methodology since it guarantees that only the specified circuit can be placed or routed in the Pblock area [27]. Figure 6 shows how five implementations of the same function are constrained to the specified locations of the FPGA in the place-and-route phase. Since place-and-route algorithms are not deterministic, each implementation of the same function may result in a different configuration. In other words, each implementation may give us a different set of error propagation results. Therefore, we implement five exact copies of the circuits to be tested and take averages of the detected errors and essential bits in our evaluations.

Fig. 7 Classification of the circuits by the number of affected implementations by a bit flip

We store the statistics of the tests in a log file. These statistics include the number of miscalculations for each circuit for every essential bit. We then use this log file for the classification and statistical analysis of each arithmetic circuit. We evaluate the log data utilizing the algorithm whose flowchart is given in Fig. 7. If a bit creates an error for only one circuit, it is classified as an essential bit for the related circuit. If a bit creates errors for more than one circuit but not for all circuits, that particular bit is considered a strangely affecting bit. Strangely affecting bits might exist in a circuit since the input nets of all five circuits are common. In other words, the input nets arrive at the region of the tested circuits as one net and start leaving the main net as they reach the related circuit’s location. The net distribution continues until the inputs arrive at the last circuit. As a result, some bits do affect all circuits. In this study, we assume that only the errors affecting a single circuit are meaningful for the results. Therefore, we ignore the effects of the wires in calculating the SE vulnerability. If a bit affects all five circuits, it means that this particular bit affects the net tree before any of the nets have left the net tree to reach their target circuit. In the same way, if a bit affects more than one circuit, that bit is also considered related to the net tree, which does not carry valuable information about the circuit itself.
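The classification rule described above can be written down compactly. The sketch below assumes that each log entry holds one miscalculation count per circuit copy; the function name and the label strings are illustrative, not taken from the log format used in the tests.

```python
def classify_bit(error_counts):
    """Classify a bit by how many circuit copies produced at least one error."""
    affected = sum(1 for count in error_counts if count > 0)
    if affected == 0:
        return "no effect"
    if affected == 1:
        return "affecting one circuit"      # attributed to the circuit itself
    if affected == len(error_counts):
        return "affecting all circuits"     # shared net tree before any branch
    return "strangely affecting"            # shared net tree after some branches


log = {
    "bit_a": [0, 0, 412, 0, 0],
    "bit_b": [10, 7, 0, 0, 0],
    "bit_c": [5, 5, 5, 5, 5],
    "bit_d": [0, 0, 0, 0, 0],
}
labels = {name: classify_bit(counts) for name, counts in log.items()}
```

Only bits in the first non-trivial category would count toward a circuit's vulnerability; the other two categories are attributed to the shared input wiring.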

To obtain the minimum number of test data required in terms of the accuracy and the test time, we fed 1000, 5000, and 10000 inputs to the ripple-carry adder implementation. We then stored the number of critical bits and the number of erroneous results in a log file. Later, we counted the number of erroneous results for each test. If this number was less than five, we incremented our counter parameter, which we named the critically low error count. We want this value to be as close to zero as possible to have accurate test results. The test results for the ripple-carry adder when the size of the test data is 1000, 5000, and 10000 are given in Table 1. In our tests, 382 bits create fewer than five erroneous results out of 1000 test inputs. This number decreases to two when the test data size is increased to 5000, while it becomes zero when the test data size is 10000. As a result, we selected 10000 as the size of the test input data since it decreases the critically low error count to zero. Furthermore, increasing the number of test data from 5000 to 10000 does not change the detected number of critical bits.

5 Results and Case Study

5.1 Results of Library Characterization

As mentioned above, we implemented four adders and two multipliers in our resource library. We characterized them in terms of area, latency, power consumption, and reliability. In a separate project, we realized only one instance of each arithmetic circuit on the same FPGA platform to obtain the area, latency, and power consumption values. Since we did not apply pipelining to the circuits, we measured the delay of the critical path as the value of the latency. As our area parameter, we used the total number of slice LUTs utilized by the circuits. Since DSP is a special resource in PL that does not require any additional logic, we considered its area as zero. We took only the dynamic power consumption of the logic circuits. As we did not give a clock restriction to the synthesis tool, dynamic power values were calculated under the maximum switching frequency that the tool can handle. Therefore, they only represent the relative power consumption values in each circuit, not the absolute power consumption values under a specific clock frequency. We depict the area, latency, and power values of each circuit in Table 2. This table indicates the correlation between the adder circuits’ speed and their area as expected. Since the DSP-based multiplier is not an FPGA-based implementation, its parameters do not fit other circuits’ behavior.

Table 1 Error counts of the ripple carry adder

Number of Test Data           1000    5000    10000
Number of Critical Bits       1815    1836     1836
Critically Low Error Count     382       2        0

Table 2 Comparison of circuits in terms of power, speed, and area

Circuit                        Area*   Latency (ns)   Power (W)
Ripple-Carry Adder                31         20.913       0.253
Carry-Lookahead Adder             47         15.476       0.236
Brent-Kung Adder                 117          9.213       0.359
Kogge-Stone Adder                185          8.982       0.737
Carry-Lookahead Multiplier       383         36.425       7.445
DSP-Based Multiplier               0          3.884       0.961

* Unit of area is the total number of slice LUTs
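To illustrate how such a characterized library could drive a design choice, the sketch below picks the smallest adder that meets a latency budget using the area and latency values of Table 2. The selection rule, the dictionary layout, and the function name are assumptions made for this example; they are not the case-study procedure.

```python
# (area in slice LUTs, latency in ns, relative dynamic power in W) from Table 2.
ADDERS = {
    "Ripple-Carry Adder": (31, 20.913, 0.253),
    "Carry-Lookahead Adder": (47, 15.476, 0.236),
    "Brent-Kung Adder": (117, 9.213, 0.359),
    "Kogge-Stone Adder": (185, 8.982, 0.737),
}


def smallest_adder_within(latency_budget_ns):
    """Return the lowest-area adder whose latency fits the budget, or None."""
    feasible = [(area, name)
                for name, (area, latency, _power) in ADDERS.items()
                if latency <= latency_budget_ns]
    return min(feasible)[1] if feasible else None


choice_relaxed = smallest_adder_within(16.0)
choice_tight = smallest_adder_within(9.0)
```

A relaxed budget allows a small adder, whereas a tight budget forces the largest one, which reflects the speed and area correlation noted for Table 2.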

As the reliability metric, we count the number of erroneous results for every essential bit, as explained in the previous section. We test five identical copies of each circuit together to determine the vulnerability. We give the total number of essential bits and critical bits (i.e., the bits that result in an error at the output when flipped) in the second and third columns of Table 3 for each adder and multiplier instance. In the last three columns, we list the categorized bits for each circuit. The bits affecting only one circuit are the bits causing an error at the output of exactly one of the five identical copies. Strangely affecting bits are the bits that affect more than one circuit but not all circuits. As expected, the number of essential bits generally grows with the area of the circuits since bigger circuits with a large number of logic gates occupy more LUTs in the FPGA. However, critical bits vary for each arithmetic circuit since each circuit may have different error masking capabilities due to the differences in their logic functions. As can be seen from Table 4, the proportion of critical bits over essential bits tends to decrease as the total number of essential bits increases. The average essential and critical bits for each implementation are listed in the second and third columns of this table, respectively. The ratio, listed in the fourth column, indicates how vulnerable the circuit is to SEs. In other words, the bigger the ratio, the lower the reliability of the circuit. One can use this ratio as the vulnerability or 1 − ratio as the reliability metric.

Table 3 Comparison of the circuits in terms of vulnerability (reliability), given as number of bits

| Circuit Name | Essential | Critical | Strangely Affecting | Affecting All Circuits | Affecting One Circuit |
|---|---|---|---|---|---|
| Ripple-Carry Adder | 13040 | 6796 | 33 | 4 | 6759 |
| Carry-Lookahead Adder | 12200 | 6124 | 23 | 0 | 6101 |
| Brent-Kung Adder | 20427 | 8282 | 49 | 0 | 8233 |
| Kogge-Stone Adder | 96443 | 34338 | 317 | 10 | 34011 |
| Carry-Lookahead Multiplier | 439202 | 169848 | 95 | 1 | 169752 |
| DSP-Based Multiplier | 735 | 55 | 0 | 0 | 55 |

Table 4 Error rates of the circuits on average

| Circuit | Essential Bits | Critical Bits | Ratio |
|---|---|---|---|
| Ripple-Carry Adder | 2608 | 1359 | 0.521 |
| Carry-Lookahead Adder | 2440 | 1224 | 0.501 |
| Brent-Kung Adder | 4085 | 1656 | 0.405 |
| Kogge-Stone Adder | 19228 | 6867 | 0.357 |
| Carry-Lookahead Multiplier | 87840 | 33969 | 0.386 |
| DSP-Based Multiplier | 147 | 11 | 0.074 |
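The bit categories of Table 3 and the ratio of Table 4 can be expressed as a small sketch. The input format is an assumption: `affected_copies_per_bit` is a hypothetical list holding, for each essential bit, the number of the five circuit copies whose output was corrupted when that bit was flipped. The function names are likewise illustrative and not taken from the authors' tool flow.

```python
from collections import Counter

NUM_COPIES = 5  # five identical copies of a circuit are tested together


def classify_bits(affected_copies_per_bit, num_copies=NUM_COPIES):
    """Count critical bits per category, based on how many copies each bit corrupts."""
    counts = Counter()
    for affected in affected_copies_per_bit:
        if affected == 0:
            continue                      # essential but not critical
        if affected == 1:
            counts["one"] += 1            # affects exactly one copy
        elif affected == num_copies:
            counts["all"] += 1            # affects every copy
        else:
            counts["strange"] += 1        # more than one copy, but not all
    return counts


def vulnerability(essential_bits, critical_bits):
    """Ratio of critical bits to essential bits; 1 - ratio serves as reliability."""
    return critical_bits / essential_bits


# Synthetic input shaped like the ripple-carry adder row of Table 3.
bits = [1] * 6759 + [3] * 33 + [5] * 4 + [0] * (13040 - 6796)
categories = classify_bits(bits)
critical = sum(categories.values())
print(categories, critical, round(vulnerability(len(bits), critical), 3))
```

The three categories always sum to the number of critical bits, which is a convenient consistency check on injection logs.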

Our results demonstrate that the ripple-carry adder has the smallest area; however, it is the least reliable and slowest resource among its counterparts. On the other hand, the Kogge-Stone adder is the most reliable and fastest adder, while it occupies the biggest area compared to the other three adders. The other two adders, namely carry-lookahead and Brent-Kung, lie between these two adders in terms of area, latency, and reliability. These variations among the four adders allow the designer to meet the given design constraints and improve the objective function. The last two rows of Tables 3 and 4 give the reliability of the multiplier implementations. Although the DSP-based multiplier has much more error resilience than the carry-lookahead multiplier (CLM), it is not suitable for designs in which area is a concern. If the design needs several multipliers, the DSP-based multiplier can be a good choice.
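As an illustration of how such a library might be queried during design, the following sketch picks the least vulnerable adder that satisfies area and latency constraints. The adder values are copied from Tables 2 and 4; the `Adder` record, the function `most_reliable`, and the constraint values in the example are hypothetical and only serve to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adder:
    name: str
    area_luts: int        # slice LUTs (Table 2)
    latency_ns: float     # critical-path delay (Table 2)
    power_w: float        # relative dynamic power (Table 2)
    vulnerability: float  # critical/essential ratio (Table 4)


LIBRARY = [
    Adder("Ripple-Carry", 31, 20.913, 0.253, 0.521),
    Adder("Carry-Lookahead", 47, 15.476, 0.236, 0.501),
    Adder("Brent-Kung", 117, 9.213, 0.359, 0.405),
    Adder("Kogge-Stone", 185, 8.982, 0.737, 0.357),
]


def most_reliable(library, max_area, max_latency):
    """Return the feasible adder with the lowest vulnerability, or None."""
    feasible = [a for a in library
                if a.area_luts <= max_area and a.latency_ns <= max_latency]
    return min(feasible, key=lambda a: a.vulnerability, default=None)


# Hypothetical constraints: at most 120 LUTs and 16 ns per adder.
choice = most_reliable(LIBRARY, max_area=120, max_latency=16.0)
print(choice.name if choice else "no feasible adder")
```

The same pattern extends to other objectives, for example minimizing power under a vulnerability bound, by swapping the filter and the key function.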

It is reported in [29] that the FIT (Failure in Time) rate for Zynq-7000 families is 76 FIT/Mb. The test results are obtained in $10^{9}$ device operation hours. Therefore, the FIT of one bit is calculated as $76 \cdot 10^{-6}$. If one of the critical bits fails, the circuit produces incorrect results. Therefore, we can calculate the FIT of a circuit using the following simple equation:

$$\mathrm{FIT} = (\text{FIT rate per bit}) \times (\text{number of critical bits}) \tag{1}$$

The FIT rates of our arithmetic circuits are listed in Table 5. The FIT rates can also be used in designing circuits either as a constraint or as an objective function.

Table 5 FIT rates

| Circuit | FIT Rate |
|---|---|
| Ripple-Carry Adder | $1.03 \cdot 10^{-1}$ |
| Carry-Lookahead Adder | $9.3 \cdot 10^{-2}$ |
| Brent-Kung Adder | $1.25 \cdot 10^{-1}$ |
| Kogge-Stone Adder | $5.2 \cdot 10^{-1}$ |
| Carry-Lookahead Multiplier | $2.58$ |
| DSP-Based Multiplier | $8.36 \cdot 10^{-4}$ |
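Equation (1) is simple enough to check numerically. The sketch below assumes that 1 Mb corresponds to 10^6 bits, which is consistent with the per-bit value of 76 · 10^-6 stated in the text, and that the critical-bit counts are the per-circuit averages of Table 4. The constant and function names are illustrative.

```python
FIT_PER_MBIT = 76.0      # reported FIT rate for Zynq-7000 devices, per Mb
BITS_PER_MBIT = 1e6      # assumption: 1 Mb = 10^6 bits
FIT_PER_BIT = FIT_PER_MBIT / BITS_PER_MBIT   # 76e-6 FIT per bit


def circuit_fit(critical_bits, fit_per_bit=FIT_PER_BIT):
    """Equation (1): FIT of a circuit = FIT rate per bit x number of critical bits."""
    return fit_per_bit * critical_bits


# Average critical bits per circuit, taken from Table 4.
critical_bits = {
    "Ripple-Carry Adder": 1359,
    "Carry-Lookahead Adder": 1224,
    "DSP-Based Multiplier": 11,
}
for name, bits in critical_bits.items():
    print(f"{name}: {circuit_fit(bits):.3e}")
```

With these inputs the computed values agree with the corresponding rows of Table 5 to the precision given there.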

5.2 Case Study

In this subsection, we demonstrate how using different versions of the same resource in an application can result in different reliability and power consumption values under the given area and latency constraints. For our case study, we generated a custom data-flow graph (DFG), which is depicted in Fig. 8 and consists of six additions. We implemented three variations of the topology: using only a carry-lookahead adder (CLA), using only a Brent-Kung adder (BKA), and using a mixed circuit containing both CLA and BKA. For the mixed circuit, we used BKA for nodes one, two, three, and four, while we utilized CLA for nodes five and six. The schedule of the nodes for the circuits that use only CLA or only BKA is shown in Fig. 9. This schedule takes four clock cycles, and each cycle is denoted as a step in the figure. However, the clock cycle of each circuit's implementation is determined by taking the maximum delays of CLA and BKA into account. We give the schedule of the mixed circuit in Fig. 10. In this implementation, we select the clock rate based on the delay of BKA. Therefore, we pipelined the CLA into two clock cycles to synchronize it with the BKA under the four-clock-cycle delay. Between every step of the schedule in Figs. 9 and 10, we place intermediary registers to store the result of each addition. In the mixed circuit, the critical path uses only BKA so as not to increase the latency of the circuit. Since the CLA is used on the non-critical paths, it does not increase the latency although it takes two clock cycles to finish its execution. Note that we do not use resource sharing in our implementations since it adds extra steering and control logic that affects the overall area, power consumption, and the number of critical bits. Moreover, the simulation results for the implementation become unreliable as it is difficult to calculate the contribution of this extra logic to the overall simulation results.

Fig. 8 Custom-designed data flow graph for the case study

We show the performance characteristics (speed, area, and power) of the three implementations in Table 6. The third and fourth columns in this table give the clock cycle times and the latencies of the circuits, respectively. Column five gives the area of each implementation. Finally, the last column of the table lists the power consumption values. Since the dynamic power consumption is proportional to the switching frequency, we collected power results without giving timing constraints to the synthesis tool. Therefore, power consumption values are generated under the maximum switching activity that can be handled by the synthesis tool. Thus, the circuits' power consumption values give only the proportional relation among the circuits, not the absolute values.

When we analyze the results given in Table 6, circuits C2 and C3 are the fastest circuits since they use BKA in the critical path. On the other hand, C3 consumes less area than C2 due to replacing BKA with CLA for nodes five and six in our DFG. This replacement does not affect the total latency since nodes five and six are not on the circuit's critical path. The power consumption of the three implementations seems directly proportional to their areas, as expected.

To show the effectiveness of available resource library parameters in the design process, we used our library parameters to calculate the reliability values of our three implementations. We also used our test method on these three circuits to obtain the simulation results. This comparison is performed to see if the designs that use our


Fig. 9 Schedule for the CLA-only and BKA-only implementations. Note that the clock rates are different as a result of the adder delays

Fig. 10 Schedule for the mixed adder implementation

Table 6 Latency, area, and power consumption of three different implementations

| Circuit | Adders | Clock Cycle (ns) | Latency (ns) | Area (LUT) | Power (W) |
|---|---|---|---|---|---|
| C1 | CLA | 17 | 67 | 282 | 2.952 |
| C2 | BKA | 11 | 44 | 671 | 6.224 |
| C3 | CLA-BKA | 11 | 44 | 564 | 5.940 |


Table 7 Error propagation values (critical bits in all essential bits) from the calculation (i.e., estimation) using the resource library and from the simulation

| Circuit | Calculation: Essential (1) | Calculation: Critical (2) | Ratio Rc (2/1) | Simulation: Essential (3) | Simulation: Critical (4) | Ratio Rs (4/3) | Diff (%) in Rc and Rs |
|---|---|---|---|---|---|---|---|
| C1 | 14640 | 7344 | 0.501 | 13417 | 6518 | 0.485 | 3.19 |
| C2 | 24510 | 9936 | 0.405 | 23050 | 8737 | 0.379 | 6.40 |
| C3 | 17930 | 8208 | 0.457 | 18924 | 7969 | 0.421 | 7.87 |

Last column gives the error of estimation

resource library give similar or close results to the real implementations. In Table 7, we give the error propagation results obtained from the calculations (i.e., estimation) by using our resource library and from our simulations. Columns two and three show the essential bits and critical bits, respectively, which are determined by adding the essential and critical bits of each adder. Column four of Table 7 is the ratio of the critical bits to the essential bits, which gives the estimated vulnerability. Similarly, columns five, six, and seven give the essential bits, critical bits, and their ratio obtained by our simulation. The last column shows the difference (i.e., Diff) in percentage between the vulnerabilities of the estimation (Rc) and simulation (Rs). We determine this value using the percentage reduction formula given in Equation (2). This column indicates that our resource library parameters estimate the total vulnerability (or reliability) with a deviation of less than 8% from the results obtained by simulations. Therefore, our resource library is a useful source for designing better circuits in FPGAs in terms of area, latency, power, and reliability. While some of these parameters can be used as design constraints, others can be the objective function parameters.

$$\mathrm{Diff} = \frac{R_c - R_s}{R_c} \times 100 \tag{2}$$
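The library-based estimation and the comparison of Equation (2) can be sketched as follows. The per-adder values are the averages from Table 4; the dictionary layout and the function names `estimate` and `percent_diff` are assumptions made for illustration. The example covers the two single-adder designs, each of which instantiates six adders; a mixed design is handled the same way by passing a count for each adder type.

```python
# Average (essential, critical) bits per adder instance, from Table 4.
LIBRARY_BITS = {
    "CLA": (2440, 1224),
    "BKA": (4085, 1656),
}


def estimate(adder_counts):
    """Sum library values over all adder instances of a design.

    adder_counts maps an adder type to its number of instances.
    Returns (essential_bits, critical_bits, ratio Rc).
    """
    essential = sum(LIBRARY_BITS[kind][0] * n for kind, n in adder_counts.items())
    critical = sum(LIBRARY_BITS[kind][1] * n for kind, n in adder_counts.items())
    return essential, critical, critical / essential


def percent_diff(rc, rs):
    """Equation (2): relative difference between estimated and simulated ratios."""
    return (rc - rs) / rc * 100


print(estimate({"CLA": 6}))   # design C1: six carry-lookahead adders
print(estimate({"BKA": 6}))   # design C2: six Brent-Kung adders
print(round(percent_diff(0.501, 0.485), 2))  # Rc and Rs of C1 as listed in Table 7
```

The essential and critical bit totals printed for C1 and C2 match the calculation columns of Table 7.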

6 Conclusion

Different arithmetic circuits may have different area, latency, and power consumption values on FPGAs. They can even respond differently to transient errors. Having a characterized resource library for high-level synthesis eases the design process, and it helps to satisfy the design constraints while optimizing the parameters in question. In this study, we address the need for such a resource library and characterize it with four commonly used adders and two multipliers. We present a methodology for the error propagation simulations to test the vulnerability and reliability of the circuits. Our method can easily be applied to different arithmetic and logic circuits. We also test our resource library's effectiveness on a custom-generated application by comparing the estimation and simulation results. The estimation results are in agreement with the results obtained by the simulations.

Acknowledgments This work was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under grant number 116E095.

References

1. Alderighi M, Casini F, D’Angelo S, Pastore S, Sechi GR, Weigand R (2007) Evaluation of single event upset mitigation schemes for sram based fpgas using the flipper fault injection platform. In: Proc. 22nd IEEE international symposium on defect and fault-tolerance in VLSI systems (DFT 2007), pp 105–113

2. Brent RP, Kung HT (1982) A regular layout for parallel adders. IEEE Trans Comput C-31(3):260–264

3. Crockett LH, Elliot RA, Enderwitz MA, Stewart RW (2014) The zynq book: Embedded processing with the arm cortex-a9 on the xilinx zynq-7000 all programmable soc. Strathclyde Academic Media, Glasgow

4. Daphni S, Grace KSV (2017) A review analysis of parallel prefix adders for better performance in vlsi applications. In: Proc. 2017 IEEE international conference on circuits and systems (ICCS), pp 103–106

5. Ghosh S, Ndai P, Roy K (2008) A novel low overhead fault tolerant kogge-stone adder using adaptive clocking. In: Proc. design, automation and test in Europe conference, pp 366–371

6. Graham PS, Rollins N, Wirthlin MJ, Caffrey MP (2003) Evaluating tmr techniques in the presence of single event upsets. All Faculty Publications 1:1–5

7. Jayanthi AN, Ravichandran CS (2013) Comparison of performance of high speed vlsi adders. In: Proc. 2013 International conference on current trends in engineering and technology (ICCTET), pp 99–104

8. Keshk ME, Asami K (2018) Fault injection in dynamic partial reconfiguration design based on essential bits. Journal of Aeronautics and Space Technologies 11(2):25–34

9. Khedhiri C, Karmani M, Hamdi B, Man KL (2011) Concurrent error detection adder based on two paths output computation. In: Proc. 2011 IEEE ninth international symposium on parallel and distributed processing with applications workshops, pp 27–32

10. Kogge PM, Stone HS (1973) A parallel algorithm for the efficient solution of a general class of recurrence equations. IEEE Trans Comput C-22(8):786–793

11. Kumar P, Sharma RK (2017) Double fault tolerant full adder design using fault localization. In: Proc. 2017 3rd international conference on computational intelligence communication technology (CICT), pp 1–6

12. Le R (2012) Soft error mitigation using prioritized essential bits

13. Machmur B, Hayek A, Boercsoek J (2013) Practical application of the reliability model for hdl in safety related systems. In: Proc. 7th WSEAS international conference CSST13, pp 226–232

14. Mano MM, Ciletti MD (2015) Digital design. Pearson Education Inc, New Jersey

15. Mohanapriya D, Saravanakumar DN, BIT E (2016) A comparative analysis of different 32-bit adder topologies with multiplexer based full adder. Int J Eng Sci 1(1):4850–4854

16. Ostler PS, Caffrey MP, Gibelyou DS, Graham PS, Morgan KS, Pratt BH, Quinn HM, Wirthlin MJ (2009) Sram fpga reliability analysis for harsh radiation environments. IEEE Trans Nucl Sci 56(6):3519–3526

17. Pratt B, Caffrey M, Graham P, Morgan K, Wirthlin M (2006) Improving fpga design robustness with partial tmr. In: Proc. 2006 IEEE international reliability physics symposium proceedings, pp 226–232

18. Qin X, Feng C, Zhang D, Miao B, Zhao L, Hao X, Liu S, An Q (2013) Development of a high resolution tdc for implementation in flash-based and anti-fuse fpgas for aerospace application. IEEE Transactions on Nuclear science 60(5):3550–3556

19. Radu M (2014) Reliability and fault tolerance analysis of fpga platforms. In: Proc. IEEE Long island systems, applications and technology (LISAT) conference 2014, pp 1–4

20. Ratter D (2004) Fpgas on mars. Xcell J 50:8–11

21. SaiKumar M, Punniakodi S (2013) Design and performance analysis of various adders using verilog. International Journal of Computer Science and Mobile Computing 2(9):128–138

22. Salehi M, Azarpeyvand A, Aboutalebi AH (2018) Vulnerability analysis of adder architectures considering design and synthesis constraints. J Electron Test 34(1):7–14

23. Sari A, Psarakis M (2011) Scrubbing-based seu mitigation approach for systems-on-programmable-chips. In: Proc. 2011 international conference on field-programmable technology, pp 1–8

24. Vitoroulis K, Al-Khalili AJ (2007) Performance of parallel prefix adders implemented with fpga technology. In: Proc. 2007 IEEE northeast workshop on circuits and systems, pp 498–501

25. Wang J-J, Cronquist B, Sin B, Moriarta J, Katz R (1997) Antifuse fpga for space applications. In: Proc. fourth european conference on radiation and its effects on components and systems, pp 1–6

26. Xilinx Inc. (2012) 7 series fpgas configuration user guide, ug470 (v. 1.3)

27. Xilinx Inc. (2013) Constraints guide, ug625 (v. 14.5)

28. Xilinx Inc. (2015) Soft error mitigation controller v4.1, logicore ip product guide

29. Xilinx Inc. (2015) Device reliability report

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Akin Gokalan received his B.Sc. from the Electrical and Electronics Department at Sabanci University, Istanbul, Turkey in 2015. He is currently pursuing an M.Sc. degree in the Department of Computer Engineering at Hacettepe University, Ankara, Turkey. His research focuses on FPGA design, microprocessors, and programming.

Suleyman Tosun received his B.Sc. in Electrical and Electronics Engineering from Selcuk University, Turkey, in 1997 and his M.Sc. and Ph.D. degrees in computer engineering from Syracuse University, NY, in 2001 and 2005, respectively. He is currently a Professor with the Department of Computer Engineering, Hacettepe University, Ankara, Turkey. His current research interests include electronic design automation, high-level synthesis of digital circuits, network-on-chips, and computer architecture.

Deniz Dal received his B.Sc. degree in Electrical Engineering from Istanbul Technical University, Istanbul, Turkey, in 1996 and his M.Sc. and Ph.D. degrees in Computer Engineering from Syracuse University, Syracuse, NY, in 2001 and 2006, respectively. Since 2007, he has been with the Department of Computer Engineering at Ataturk University, Erzurum, Turkey. His current research interests include high-level synthesis of digital circuits, combinatorial optimization, metaheuristics, and high-performance computing.

