carry select adder

International Journal of Computer Science and Electrical Engineering (IJCSEE) ISSN No. 2315-4209, Vol-1, Iss-2, 2012

99

A NOVEL DESIGNING APPROACH FOR HIGH SPEED CARRY SELECT ADDER

SHALEMRAJU KOLASANI, K.HANUMANTHA RAO & T.MALYADRI

Department Of Electronics and Communication Engineering, Prakasam Engineering College, Kandukuru.

Abstract: In this paper we present a carry select adder (CSL) has been an eminent technique in the space-time tug-of-war of CPA design. It exhibits the advantage of logarithmic gate depth as in any structure of the distant-carry adder family. Conventionally, CSL is implemented with dual ripple-carry adder (RCA) with the carry-in of 0 and 1, respectively. Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for optimization in the vlsi constraints in the CSLA. This work uses a simple and efficient gate-level modification to significantly reduce the Parameters of the CSLA. Based on this modification 32-b CSLA (CSLA) architecture have been developed and compared with the regular CSLA architecture. The proposed design has reduced delay compared with the regular CSLA. This work evaluates the performance of the proposed designs in terms of delay and area. The results analysis shows that the proposed CSLA structure is better than the regular CSLA. Keywords: Regular CSLA, Proposed CSLA, Area, Delay.

1. INTRODUCTION

This paper proposes a new add-one circuits using the fast all-one finding circuit and low-delay multiplexers to reduce the area and accelerate the speed of CSA, and no restrictions are imposed on the design of the adder blocks. For bit length n=16, this new carry-select adders requires approximate 38 percent fewer transistors and16 percent shorter delay than the original dual ripple-carry carry-select adder.

Adders are critical components of the ALU's (Arithmetic Logic Unit) or DSP (Digital Signal Processing) chips[l]. Among various adders, the carry-select adder (CSA) is intermediate regarding speed and area and widely used in mobile applications [2]. CSAs with very large sizes can be constructed hierarchically by combining smaller 'block' adders [3]. Fig. 1 shows the conventional CSA which consists of two ripple carry adders (RCAs) in each block(except blockl) , one for Cin(from lower block) = 0 and the other for Cin(from lower block) =1. Instead of using dual RCAs, Chang proposed a 29.2% reduced area CSA with 5.9% speed penalty by replacing one RCA to an add-one circuit [4]. And many work has been done to reduce large area of CSA with speed penalty or reduce smaller area with negligible speed penalty[5-8].In this paper we propose a further reduced area CSA with shorter delay. What has not been conceived in earlier designs of CSL is the power consumption.

Figure 1 Carry Select adder with ripple carry adder

Due to the relentless drive for smaller and versatile mobile and portable electronics, power has now become a premier concern in DSP design. From power perspective, gate output load which is an aggregate of circuit fan-out and wire capacitance is as important as the gate depth. The significance of wire capacitance to gate delay and power consumption is particularly pronounced in today deep sub-micron regime. Therefore, it is imperative to combine logic structure with circuit technique to further reduce the transistor count of CSL so as to decrease the wire length and simplify the layout. Very often, area and power optimization are ensued from sensible reduction of transistor count. In this paper, a square root scheme with a new add-one circuit using oneinverter instead of two-inverter buffer has been proposed for the design of an area efficient CSL.

2. CSA WITH ADD-ONE APPROACH

The carry-select adder partitions the adder into several groups, each of which performs two additions in parallel. Therefore, two copies of ripple-carry adder act as carry evaluation block per select stage. One copy evaluates the carry chain assuming the block carry-in is zero, while the other assumes it to be one. Once the carry signals are finally computed, the correct sum and carry-out signals will be simply selected by a set of multiplexers. A typical block of conventional CSL is shown in Fig. 2. FA and HA are abbreviations for full adder and half adder, respectively, and HA’ is a full adder with a constant carry-in of logic 1.

A Novel Designing Approach for High Speed Carry Select Adder


100

Figure 2 Conventional CSA

Figure 3 Example of first zero detection

The main drawback of the conventional CSL is the doubling of the area cost to duplicate another adder. Assume S0 = (Sn−1

0, Sn−20,…, S0

0) and S1 = (Sn−1

1, Sn−21,…, S0

1) are the sum outputs of these two copies of RCA with block carry-in c−1 0 = 0 and c−1

1 = 1, respectively. The add one circuit proposed by Chang [3] mitigates the resource overhead of CSL by replacing one copy of the RCA by S1 = S0 +1.

Let k denote the position that the first bit of “0” is detected in Si

0, starting from the least significant bit (LSB). Then, from

0

0

0, [0, 1)k

ii

s k n=

= ∈ −∏

where the single quotation mark on the

superscript implies Boolean complement. From the above derivation, the addone circuit is in essence, based on a “first” zero detection logic. It generates S1 by inverting each bit in S0 starting from the LSB until the first zero is encountered as shown in Fig. 2(a).

However, if no zero is detected in S0 as illustrated in Fig. 2(b), i.e., k [0, n −1] , S1 = (1, (Sn−1

0)’, ((Sn−20)’,… ,((S0

0)’). In other words, the carry-out signal for the add-

one circuit is one if and only if all the sum outputs from the n bit block are one. As all sums equal one, the first zero detection circuit generates one at the final node.

3. PROPOSED AOC ALGORITHM

Since the speed of a linear CSL is linearly proportional to the bit length n, thus, to optimize the worst-case delay, square root scheme will be used in this design of CSL with variable-sized blocks and ripple-carry addition in each block [5]. HA and HA’ are built with transmission gates to speed up the worst-case delay. The delay time of MUX (sel) refers to the delay of the multiplexer from the select signal to the output signal and MUX (thru) refers to the delay from the input signals to be selected to the output signal. FA (sum), HA (sum) and HA’ (sum) refer to the delays from the input to the sum output. The delays from the input to the carry output are similarly annotated with “(Cout)”. According to these basic gate latencies, it is evident that there will be mismatch of arrival time between the carry-select signal and the sum signals to the MUX in a square root CSL. The equalization of the delays through both paths can be achieved by progressively adding more bits to the subsequent stages of adder groups, so that more time is required for the generation of carry signals. Starting from two-bit RCA per group for the first two groups, the bits beyond the fifth bit are grouped in such a way that the number of bits in the group increases by one progressively. In this way the discrepancy in arrival time at the MUX nodes will be minimized. As the block delay of the conventional square root CSL is very similar to ours, the same configuration of CSL block sizes has been adopted in our proposed design.

The worst case delay happens when the carry propagates from the LSB to MSB. Each CSL block contains one copy of RCA and an add one circuit. The actual sum is selected using 2-to-1 MUX by the carry of previous block.

This carry-out signal from the previous block also selects the actual carry-out for the next block. Due to the capacitive loading, the carry-out signal of the CSL block has to be buffered before it is applied to the next block. 4. DELAY AND AREA EVALUATION

The structure of the proposed 16-b SQRT CSLA using BEC for RCA with cin-1 to optimize the area and power is shown in Fig. 4. We again split the structure into five groups. The delay and area estimation of each group are shown in Fig. 5. The steps leading to the evaluation are given here.

1) The group2 [see Fig. 5(a)] has one 2-b RCA which has 1 FA and 1 HA for Cn=0 Instead of another 2-b RCA with a 3-b BEC is used which adds one to the output from 2-b RCA.



101

Based on the consideration of delay values of

Table I, 2) For the remaining group’s the arrival time of mux selection input is always greater than the arrival time of data inputs from the BEC’s. Thus, the delay of the remaining groups depends on the arrival time of mux selection input and the mux delay

Figure 4 Modified 16 bit CSA

Figure 5 Delay and Area Evaluation Group-2 - Group-5

5. CONCLUSIONS

This work showed The design proposed in this paper has been developed using Verilog- HDL and synthesized in Cadence RTL compiler using typical libraries of TSMC 0.18 um technology.

The synthesized Verilog net list and their

respective design constraints file (SDC) are imported to Cadence SoC Encounter and are used to generate automated layout from standard cells and placement and routing [7]. Parasitic extraction is performed using Encounter’s Native RC extraction tool and the extracted parasitic RC (SPEF format) is back annotated to Common Timing Engine in Encounter platform for static timing analysis. For each word size of the adder, the same value changed dump (VCD) file is generated for all possible input conditions and imported the same to Cadence Encounter Power Analysis to perform the power simulations.

The area indicates the total cell area of the design and the total power is sum of the leakage power, internal power and switching power. The percentage reduction in the cell area, total power, power-delay product and the area–delay product as function of the bit size are calculated. ACKNOWLEDGEMENTS

The authors would like to thank the anonymous reviewers for their comments which were very helpful in improving the quality and presentation of this paper. REFERENCES: [1] M. Swanson. M. Kobayashi and A. Tewfik.

“Multimedia Data-Embedding and Watermarking Technologies”, Proceeding of the IEEE, vol. 86, no.6, June 1998. [3]N. Memon and P. W. Wong, “Protecting Digital Media Content”, Comm. of the ACM, pp. 35-43, vol. 41, no. 7, July, 1998.

[2] M.J. Tsai, K.Y. Yu, and Y.Z. Chen, “Joint Wavelct and Spatial Transformation for Digital Watermarking” IEEE Tvansaclions on Consumer Electronics, pp. 241-245, vol. 46, no. I , Feb. 2000.

[3] M.J. Tsai, K.Y. Yu, and Y.Z. Chen, “Wavelet packet

and adaptive spatial transformation ofwatcrmark for digital image authentication”, IEEE ICIPZOOO, pp. 450 -453,volume: I, Sep, Vancouver, Canada.

[4] Yiwei Wang, John F. Doherty and Robert E. Van Dyck,

“A Wavelct-Based Watermarking Algorithm for



102

Ownership Verification of Digital Images”. IEEE Transacrions on Imoge Processing, pp. 77-87, vol. 11, no. 2, Feb. 2002.

[5] Lin, W.-H., Horng, S.-J., Kao, T.-W., Fan, P., Lee, C.-L.,

Pan, Y.: ‘An efficient watermarking method based on significant difference of wavelet coefficient quantization’, IEEE Trans. Multimedia, 2008, 10, (5), pp. 746–757

[6] Ramanjaneyulu, K., Rajarajeswari, K. ‘An oblivious

image watermarking scheme using multiple description coding and genetic algorithm’, IJCSNS, Int. J. Comput. Sci. Netw. Secur., 2010, 10, (5), pp. 167–174

[7] Ramanjaneyulu, K., Rajarajeswari, K.: ‘An oblivious and

robust multiple image watermarking scheme using genetic algorithm’, Int. J. Multimedia Appl. (IJMA), 2010, 2, (3), pp. 19–38 12 Yongdong, Wu: ‘On the security of an SVD based ownership watermarking’, IEEE Trans. Multimedia, 2005, 7, (4), pp. 624–627

[8] “An Area Efficient 32-bit Carry-select Adder for Low Power

Applications”,IJCCT Vol 3- Issue 4 www.interscience.in

Authors Profile:

carry select adder

Documents