Journal of Computing and Security
January 2015, Volume 2, Number 1 (pp. 43–54)
http://www.jcomsec.org

A High Speed Residue-to-Binary Converter for Balanced 4-Moduli Set

MohammadReza Taheri a, Nasim Shafiee a, Mohammad Esmaeildoust b, Zhale Amirjamshidi c, Reza Sabbaghi-nadooshan c, Keivan Navi a,*
a Faculty of Computer Science and Engineering, Shahid Beheshti University, GC, Tehran, Iran.
b Faculty of Marine Engineering, Khorramshahr University of Marine Science and Technology, Khuzestan, Iran.
c Electronic Engineering Department, Islamic Azad University, Central Tehran Branch, Tehran, Iran.
* Corresponding author.
Email addresses: moh [email protected] (MR. Taheri), [email protected] (N. Shafiee), m [email protected] (M. Esmaeildoust), [email protected] (Z. Amirjamshidi), r [email protected] (R. Sabbaghi-nadooshan), [email protected] (K. Navi)
ISSN: 2322-4460 © 2015 JComSec. All rights reserved.

ARTICLE INFO
Article history: Received: 6 January 2014; Revised: 7 November 2015; Accepted: 7 December 2015; Published Online: 7 February 2016
Keywords: Mixed Radix Conversion, Residue Arithmetic, Residue Number System, Residue-to-Binary Converter

ABSTRACT
The moduli set $\{2^{n-1}-1,\ 2^{n+1}-1,\ 2^n,\ 2^n-1\}$ has recently been proposed in the literature for the class of 4n-bit dynamic range in the residue number system (RNS). Because it uses only moduli of the form $2^k-1$ besides the modulus $2^n$, this moduli set enjoys an efficient arithmetic unit (AU) in its architecture. The efficiency of an RNS depends not only on the residue arithmetic unit but is also limited by the residue-to-binary converter. In this paper, a new two-level residue-to-binary converter architecture based on Mixed Radix Conversion (MRC) is presented for the aforementioned moduli set. The proposed converter includes two levels of design based on MRC properties. First, the 3-moduli subset $\{2^{n-1}-1,\ 2^{n+1}-1,\ 2^n-1\}$ is organized so that several intermediate values need not be calculated, which reduces the hardware cost. Then, the two-moduli set $\{(2^{n-1}-1)(2^{n+1}-1)(2^n-1),\ 2^n\}$ is formed to compute the binary equivalent of the RNS number. The proposed architecture is shown to be more efficient, both in terms of hardware cost and conversion delay, in comparison with the related state-of-the-art works.

1 Introduction

The carry-free nature of the residue number system (RNS) makes it suitable for use at the arithmetic level in VLSI design to achieve parallelism [1], [2]. In RNS, a weighted number is decomposed into a set of residues. Since arithmetic operations on residues can be performed without carry propagation between them, RNS results in high-speed addition, subtraction and multiplication [3], [4], which is appropriate for digital signal processing (DSP) [5], [6], image processing [7], cryptography [8], [9] and communication systems [10]. However, arithmetic operations like division, sign detection and comparison are difficult in RNS.

An RNS includes three main parts: the binary-to-residue (forward) converter, the arithmetic unit and the residue-to-binary (reverse) converter. The forward converter transforms a weighted binary number into residue numbers, based on the moduli set. The arithmetic unit generally contains a modular adder, subtractor and multiplier. The reverse converter transforms residue numbers into a weighted binary number [11]. An appropriate choice of moduli set determines the efficiency of forward conversion, arithmetic operation




Table 1. Comparison of arithmetic operation for different moduli sets for high dynamic range applications

| Moduli set | Design | Critical modulus | Delay |
|---|---|---|---|
| $\{2^n-1,\ 2^n,\ 2^n+1,\ 2^{n+1}-1\}$ | [13, 14] | $2^n+1$ | $2\log_2 n + 6$ |
| $\{2^n-1,\ 2^n,\ 2^n+1,\ 2^{n+1}+1\}$ | [13, 15] | $2^{n+1}+1$ | $2\log_2(n+1) + 6$ |
| $\{2^n-3,\ 2^n-1,\ 2^n+1,\ 2^n+3\}$ | [16] | $2^n+3$ | $2\log_2(n-1) + 7$ |
| $\{2^n,\ 2^{n+1}-1,\ 2^n-1,\ 2^{n-1}-1\}$ | [17] | $2^{n+1}-1$ | $2\log_2(n+1) + 3$ |

and reverse conversion. A reverse converter has a more complex architecture, and its complexity grows with the number of moduli. Therefore, an effective design of the reverse converter is needed in order to benefit from the RNS [12].

Many works have been reported on balanced 4-moduli sets such as $\{2^n-1,\ 2^n,\ 2^n+1,\ 2^{n+1}-1\}$ [13, 14, 16], $\{2^n-1,\ 2^n,\ 2^n+1,\ 2^{n+1}+1\}$ [13, 15], $\{2^n-3,\ 2^n-1,\ 2^n+1,\ 2^n+3\}$ [16] and $\{2^n,\ 2^{n+1}-1,\ 2^n-1,\ 2^{n-1}-1\}$ [17]. The efficiency of arithmetic operations is restricted by the critical modulus. The critical moduli in [13–17] are shown in Table 1. The unit gate delays of the parallel prefix adders modulo $2^k-1$, $2^k+1$ and $2^k+3$ are $2\log_2 n + 3$, $2\log_2 n + 6$ and $2\log_2(n-1) + 7$, respectively [18–20]. Therefore, as shown in Table 1, the moduli set $\{2^n,\ 2^{n+1}-1,\ 2^n-1,\ 2^{n-1}-1\}$ [17] provides a more efficient arithmetic unit. However, a more efficient reverse converter for the moduli set $\{2^n,\ 2^{n+1}-1,\ 2^n-1,\ 2^{n-1}-1\}$, with lower hardware requirements and delay than [17] and other moduli sets in the literature, is needed. Therefore, in this paper, a new design of the reverse converter for this 4-moduli set is presented. The proposed converter achieves less delay and more desirable hardware requirements compared to the state-of-the-art converters.

This paper consists of a background on RNS in Section 2, the design of the proposed RNS to binary converter in Section 3, an evaluation of the hardware requirements and critical path delay of the proposed reverse converter in Section 4, a comparison of the performance of the proposed RNS to binary converter with other moduli sets in Section 5 and, finally, the conclusions of the paper in Section 6.

2 Background

A residue number system is defined in terms of a relatively prime moduli set $\{P_1, P_2, \ldots, P_n\}$, that is, $\gcd(P_i, P_j) = 1$ for $i \neq j$. An integer $X$ in the range $[0, M-1]$ can be represented as $X = (x_1, x_2, \ldots, x_n)$, where $x_i = X \bmod P_i$, $0 \le x_i < P_i$, and $M = P_1 \times P_2 \times \cdots \times P_n$ is the dynamic range of the RNS [21].
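As a small illustration of this definition, the following Python sketch performs the forward conversion for the paper's moduli set. The function name `to_rns` and the choice $n = 6$ are illustrative assumptions, not part of the original design; the value 9876543 is the one used in the numerical example of Section 3.3.

```python
from math import gcd, prod

def to_rns(x, moduli):
    """Forward conversion: residues of x for a pairwise coprime moduli set."""
    # The RNS definition requires gcd(P_i, P_j) = 1 for i != j.
    assert all(gcd(a, b) == 1
               for i, a in enumerate(moduli) for b in moduli[i + 1:])
    return tuple(x % p for p in moduli)

n = 6  # illustrative choice of n
moduli = (2**n - 1, 2**(n + 1) - 1, 2**(n - 1) - 1, 2**n)  # (63, 127, 31, 64)
dynamic_range = prod(moduli)         # M = P1 * P2 * P3 * P4
residues = to_rns(9876543, moduli)   # one residue per modulus
```

Each residue is strictly smaller than its modulus, and any integer below `dynamic_range` has a unique residue tuple.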

Reverse conversion algorithms are principally based on the Chinese remainder theorem (CRT), mixed-radix conversion (MRC) and the new Chinese remainder theorems (New CRTs) [11]. Through the MRC, the number $X$ can be calculated using

$$X = v_n \prod_{i=1}^{n-1} P_i + \cdots + v_3 P_2 P_1 + v_2 P_1 + v_1 \tag{1}$$

The coefficients $\{v_1, v_2, \ldots, v_n\}$ can be obtained from the following formulas:

$$v_1 = x_1 \tag{2}$$

$$v_2 = \left| (x_2 - v_1) \left|P_1^{-1}\right|_{P_2} \right|_{P_2} \tag{3}$$

$$v_3 = \left| \left( (x_3 - v_1) \left|P_1^{-1}\right|_{P_3} - v_2 \right) \left|P_2^{-1}\right|_{P_3} \right|_{P_3} \tag{4}$$

In general,

$$v_n = \left| \left( \left( (x_n - v_1) \left|P_1^{-1}\right|_{P_n} - v_2 \right) \left|P_2^{-1}\right|_{P_n} - \cdots - v_{n-1} \right) \left|P_{n-1}^{-1}\right|_{P_n} \right|_{P_n} \tag{5}$$

where $\left|P_i^{-1}\right|_{P_j}$ is the multiplicative inverse of $P_i$ modulo $P_j$ [11].
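Equations (1) to (5) can be sketched in software as follows. This is a minimal arithmetic model under stated assumptions, not the hardware architecture of the paper: the function names `mrc_digits` and `mrc_to_int` are hypothetical, and the modular inverses are obtained with Python's built-in `pow(a, -1, m)` (available from Python 3.8).

```python
def mrc_digits(residues, moduli):
    """Mixed-radix digits v_1..v_n following Equations (2)-(5)."""
    digits = []
    for k, (x, p) in enumerate(zip(residues, moduli)):
        t = x
        # Subtract each earlier digit, then multiply by |P_j^{-1}|_{P_k}.
        for j in range(k):
            t = (t - digits[j]) * pow(moduli[j], -1, p) % p
        digits.append(t)
    return digits

def mrc_to_int(residues, moduli):
    """Weighted value X from Equation (1)."""
    value, weight = 0, 1
    for v, p in zip(mrc_digits(residues, moduli), moduli):
        value += v * weight   # v_k * P_1 * ... * P_{k-1}
        weight *= p
    return value
```

For the residues (33, 7, 5, 63) and the moduli (63, 127, 31, 64), the sketch returns 9876543, the value used in the numerical example of Section 3.3. The nested loop makes the serial nature of MRC visible: each digit depends on all previous digits.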

Three types of adders are used to realize the hardware architecture of the reverse converter: the carry save adder (CSA), the carry propagate adder (CPA) and the modular adder (MA). A regular CSA is used for operations modulo $2^n$, and a CSA with end-around carry (EAC) for operations modulo $2^k-1$. For the MA modulo $2^k-1$, a CPA with EAC is used, which has similar area and double the delay of a regular CPA [22]. These adders are explained further in Section 4.

3 Proposed RNS to Binary Converter

The two-level architecture, realized by the MRC method, can lead to an efficient implementation of the RNS to binary converter for the moduli set $\Psi = \{2^{n-1}-1,\ 2^{n+1}-1,\ 2^n-1,\ 2^n\}$. In the first step, the number $Y$ is calculated from the residues in the subset $\Gamma = \{2^{n-1}-1,\ 2^{n+1}-1,\ 2^n-1\}$ by using MRC in a parallel manner. In the second step, the MRC method is applied to the superset $\Lambda = \{(2^{n-1}-1)(2^{n+1}-1)(2^n-1),\ 2^n\}$ and the final result is realized. The proposed reverse converter scheme is composed of two parts, as shown in Figure 1. The details are presented in the next subsections.

[Figure 1. Proposed schema for residue-to-binary conversion. First step: MRC on the subset $\{2^n-1,\ 2^{n+1}-1,\ 2^{n-1}-1\}$ with inputs $x_1$, $x_2$, $x_3$ and output $Y$. Second step: MRC on the superset $\{(2^n-1)(2^{n+1}-1)(2^{n-1}-1),\ 2^n\}$ with inputs $Y$, $x_4$ and output $X$.]

3.1 First Step Design

In the first step, the reverse converter of the subset $\Gamma$ is designed. In order to decrease the delay caused by the serial nature of the MRC method, the approach proposed in [23] is utilized. Using this approach, more parallelism is obtained without noteworthy hardware redundancy. Also, to reduce the total delay of the architecture, the first step handles all moduli of the form $2^k-1$, and the modulus $2^n$ is included in the next step. Utilizing modulo $2^n$ in the second step leads to a significant improvement in terms of delay, because arithmetic in this modulus is faster than in moduli of the form $2^k-1$. The first step of the design is described as follows. The weighted number $Y$ can be calculated as

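$$Y = Z_1 + Z_2 P_1 + Z_3 P_1 P_2 \tag{6}$$

where

$$Z_1 = x_1 \tag{7}$$

$$Z_2 = \left| (x_2 - x_1) \left|P_1^{-1}\right|_{P_2} \right|_{P_2} \tag{8}$$

$$Z_3 = \left| \left( (x_3 - x_1) \left|P_1^{-1}\right|_{P_3} - Z_2 \right) \left|P_2^{-1}\right|_{P_3} \right|_{P_3} \tag{9}$$

and $P_1 = 2^n - 1$, $P_2 = 2^{n+1} - 1$ and $P_3 = 2^{n-1} - 1$.

Proposition 1. The multiplicative inverse of $P_1$ modulo $P_2$ is $\left|P_1^{-1}\right|_{P_2} = -2$.

Proof. By the definition of the multiplicative inverse we have:

$$\left| (2^n - 1) \times \left|P_1^{-1}\right|_{P_2} \right|_{2^{n+1}-1} = 1$$

$$\rightarrow \left| (2^n - 1) \times (-2) \right|_{2^{n+1}-1} = \left| 2 - 2^{n+1} \right|_{2^{n+1}-1} = \left| 1 - \left(2^{n+1} - 1\right) \right|_{2^{n+1}-1} = 1$$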

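Proposition 2. The multiplicative inverse of $P_1$ modulo $P_3$ is $\left|P_1^{-1}\right|_{P_3} = 1$.

Proof. Based on the definition of the multiplicative inverse, it is clear that:

$$\left| (2^n - 1) \times \left|P_1^{-1}\right|_{P_3} \right|_{2^{n-1}-1} = 1$$

$$\rightarrow \left| 2^n - 1 \right|_{2^{n-1}-1} = \left| 2 \times \left(2^{n-1} - 1\right) + 1 \right|_{2^{n-1}-1} = 1$$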

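Proposition 3. The multiplicative inverse of $P_2$ modulo $P_3$ is $\left|P_2^{-1}\right|_{P_3} = \sum_{i=0}^{\frac{n}{2}-1} 2^{2i}$.

Proof. Based on the definition of the multiplicative inverse:

$$\left| \left(2^{n+1} - 1\right) \times \left|P_2^{-1}\right|_{P_3} \right|_{2^{n-1}-1} = 1$$

$$\left| \left(2^{n+1} - 1\right) \times \sum_{i=0}^{\frac{n}{2}-1} 2^{2i} \right|_{2^{n-1}-1} = \left| \left(2^{n+1} - 1\right) \times \frac{1 - 2^n}{-3} \right|_{2^{n-1}-1} = \left| \left(4 \times \left(2^{n-1} - 1\right) + 3\right) \times \frac{2^n - 1}{3} \right|_{2^{n-1}-1} = \left| 2^n - 1 \right|_{2^{n-1}-1} = \left| 2 \times \left(2^{n-1} - 1\right) + 1 \right|_{2^{n-1}-1} = 1$$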
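Propositions 1 to 3 can be checked numerically with a short sketch. The helper names below are hypothetical, and $n$ is assumed to be even and at least 4: the upper summation limit $\frac{n}{2}-1$ in Proposition 3 presupposes an even $n$, and for odd $n$ the moduli $2^{n+1}-1$ and $2^{n-1}-1$ are both divisible by 3, so no inverse exists.

```python
def claimed_inverses(n):
    """Closed-form inverses from Propositions 1-3 for even n."""
    p1, p2, p3 = 2**n - 1, 2**(n + 1) - 1, 2**(n - 1) - 1
    inv_p1_mod_p2 = -2 % p2                              # Proposition 1
    inv_p1_mod_p3 = 1                                    # Proposition 2
    inv_p2_mod_p3 = sum(2**(2 * i) for i in range(n // 2)) % p3  # Proposition 3
    return (p1, p2, p3), (inv_p1_mod_p2, inv_p1_mod_p3, inv_p2_mod_p3)

def inverses_hold(n):
    """True when each claimed inverse multiplies to 1 in its modulus."""
    (p1, p2, p3), (a, b, c) = claimed_inverses(n)
    return (p1 * a) % p2 == 1 and (p1 * b) % p3 == 1 and (p2 * c) % p3 == 1
```

For $n = 6$ the third inverse is $1 + 4 + 16 = 21$, and $127 \times 21 \equiv 1 \pmod{31}$.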

After realizing the multiplicative inverses, $Z_2$ can be calculated as follows:

$$Z_2 = \left| (x_2 - x_1) \times (-2) \right|_{2^{n+1}-1} \tag{10}$$

Lemma 1. If $V$ is an $n$-bit number in the interval $[0, 2^n-1]$, the residue of $(-V)$ modulo $2^n-1$ equals the one's complement of $V$ [24].

Lemma 2. If $V$ is an $n$-bit number in the interval $[0, 2^n-1]$, the multiplication of $V$ by $2^p$ modulo $2^n-1$ equals its $p$-bit circular left shift [24].

By multiplying $x_2 - x_1$ by $-2$, based on Lemma 2, $Z_2$ results as:

$$Z_2 = \left| L_1 - L_2 \right|_{2^{n+1}-1} \tag{11}$$

where

$$L_1 = x_{1,n-1} \cdots x_{1,0}\,0 \tag{12}$$

$$L_2 = x_{2,n-1} \cdots x_{2,0}\,x_{2,n} \tag{13}$$

that is, $L_1$ and $L_2$ are the $(n+1)$-bit values of $2x_1$ and $2x_2$ modulo $2^{n+1}-1$.
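Lemmas 1 and 2 can be illustrated with a few lines of Python. The helper names `ones_complement` and `cls` are assumptions introduced for this sketch. Results are compared modulo $2^n-1$, because in this arithmetic the all-ones word is a second representation of zero.

```python
def ones_complement(v, n):
    """Bitwise complement of the n-bit word v."""
    return v ^ ((1 << n) - 1)

def cls(v, p, n):
    """p-bit circular left shift of the n-bit word v."""
    p %= n
    mask = (1 << n) - 1
    return ((v << p) | (v >> (n - p))) & mask

n = 5
m = 2**n - 1
# Lemma 1: -V is congruent to the one's complement of V modulo 2^n - 1.
lemma1 = all((-v) % m == ones_complement(v, n) % m for v in range(2**n))
# Lemma 2: V * 2^p is congruent to the p-bit circular left shift of V.
lemma2 = all((v * 2**p) % m == cls(v, p, n) % m
             for v in range(2**n) for p in range(2 * n))
```

Both checks evaluate to `True` for every 5-bit word, which is why negation and multiplication by a power of two cost only inverters and wiring in modulo $2^n-1$ hardware.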

$$Z_2 = \begin{cases} L_1 - L_2 & \text{if } L_1 - L_2 \ge 0 \\ L_1 - L_2 + \left(2^{n+1} - 1\right) & \text{if } L_1 - L_2 < 0 \end{cases} \tag{14}$$

To calculate $Z_3$, after calculating $\left|P_1^{-1}\right|_{P_3}$ and $\left|P_2^{-1}\right|_{P_3}$, the results are substituted in Equation (9) as follows:

$$Z_3 = \left| \left( (x_3 - x_1) \times 1 - Z_2 \right) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \right|_{2^{n-1}-1} = \left| (x_3 - x_1 - Z_2) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \right|_{2^{n-1}-1} \tag{15}$$

To eliminate the computation of $Z_2$ modulo $2^{n+1}-1$ when computing $Z_3$, the following method can be utilized. The result of subtracting $L_2$ from $L_1$ is either a non-negative number smaller than $2^{n+1}-1$ or a negative number greater than $1-2^{n+1}$. In the first case the result is already reduced modulo $2^{n+1}-1$; however, adding $2^{n+1}-1$ to the result of $L_1 - L_2$ is required when $L_1 - L_2$ is negative. The outgoing carry of the adder utilized for the subtraction of $L_2$ from $L_1$ can distinguish the two cases indicated in Equation (14).

If $L_1 - L_2 \ge 0$, $Z_3$ can be obtained as in Equation (16):

$$Z_3 = \left| (x_3 - x_1 - L_1 + L_2) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \right|_{2^{n-1}-1} \tag{16}$$

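For simplicity, $x_3 - x_1 - L_1 + L_2$ is rewritten in its bit-level representation and then split into numbers of length $n-1$ bits, to ease the application of its coefficient $\left(2^0 + 2^2 + \cdots + 2^{n-2}\right)$:

$$Z_3 = \left| \begin{array}{l} \big( x_{3,n-2} \cdots x_{3,0} - \underbrace{0 \cdots 0}_{n-2} x_{1,n-1} - x_{1,n-2} \cdots x_{1,0} \\ \quad - \underbrace{0 \cdots 0}_{n-3} L_{1,n} L_{1,n-1} - L_{1,n-2} \cdots L_{1,0} \\ \quad + \underbrace{0 \cdots 0}_{n-3} L_{2,n} L_{2,n-1} + L_{2,n-2} \cdots L_{2,0} \big) \\ \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \end{array} \right|_{2^{n-1}-1} \tag{17}$$

and, using Lemma 1:

$$Z_3 = \left| \begin{array}{l} \big( x_{3,n-2} \cdots x_{3,0} + \underbrace{1 \cdots 1}_{n-2} \bar{x}_{1,n-1} + \bar{x}_{1,n-2} \cdots \bar{x}_{1,0} \\ \quad + \underbrace{1 \cdots 1}_{n-3} \bar{L}_{1,n} \bar{L}_{1,n-1} + \bar{L}_{1,n-2} \cdots \bar{L}_{1,0} \\ \quad + \underbrace{0 \cdots 0}_{n-3} L_{2,n} L_{2,n-1} + L_{2,n-2} \cdots L_{2,0} \big) \\ \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \end{array} \right|_{2^{n-1}-1} \tag{18}$$

Equation (18) can be simplified as follows:

$$Z_3 = \left| \left( Z_{3,1} + Z_{3,2} + Z_{3,3} + Z_{3,4} + Z_{3,5} + Z_{3,6} + Z_{3,7} \right) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \right|_{2^{n-1}-1} \tag{19}$$

where

$$\begin{aligned} Z_{3,1} &= x_{3,n-2} \cdots x_{3,0} \\ Z_{3,2} &= \underbrace{1 \cdots 1}_{n-2} \bar{x}_{1,n-1} \\ Z_{3,3} &= \bar{x}_{1,n-2} \cdots \bar{x}_{1,0} \\ Z_{3,4} &= \underbrace{1 \cdots 1}_{n-3} \bar{L}_{1,n} \bar{L}_{1,n-1} \\ Z_{3,5} &= \bar{L}_{1,n-2} \cdots \bar{L}_{1,0} \\ Z_{3,6} &= \underbrace{0 \cdots 0}_{n-3} L_{2,n} L_{2,n-1} \\ Z_{3,7} &= L_{2,n-2} \cdots L_{2,0} \end{aligned}$$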

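In the other case, when $L_1 < L_2$,

$$Z_3 = \left| \left( x_3 - x_1 - L_1 + L_2 - \left(2^{n+1} - 1\right) \right) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \right|_{2^{n-1}-1}$$

and since $\left| -\left(2^{n+1} - 1\right) \right|_{2^{n-1}-1} = \left| -3 \right|_{2^{n-1}-1}$, the following expression results:

$$\left| -\left(2^{n+1} - 1\right) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \right|_{2^{n-1}-1} = \left| -1 \right|_{2^{n-1}-1} = \underbrace{1 \cdots 1}_{n-2} 0 \tag{20}$$

Therefore, $Z_3$ can be rewritten as

$$Z_3 = \left| \left( Z_{3,1} + Z_{3,2} + Z_{3,3} + Z_{3,4} + Z_{3,5} + Z_{3,6} + Z_{3,7} \right) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) + Z_{3,8} \right|_{2^{n-1}-1} \tag{21}$$

where $Z_{3,8} = \underbrace{1 \cdots 1}_{n-2} 0$.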

In Figure 2, the operands of $Z_3$ are prepared by Operand Preparation Unit 1 (OPU1), with $x_1$, $x_2$ and $x_3$ as its inputs. The values of $Z_{3,1}$, $Z_{3,2}$, $Z_{3,3}$, $Z_{3,4}$, $Z_{3,5}$, $Z_{3,6}$ and $Z_{3,7}$ are then reduced to $S_1$ and $C_1$ by CSA1, CSA2, CSA3, CSA4 and CSA5.

$Z_3$ is then obtained as

$$Z_3 = \begin{cases} \left| (S_1 + C_1) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) \right|_{2^{n-1}-1} & \text{if } L_1 - L_2 \ge 0 \\ \left| (S_1 + C_1) \times \left(2^0 + 2^2 + \cdots + 2^{n-2}\right) + Z_{3,8} \right|_{2^{n-1}-1} & \text{if } L_1 - L_2 < 0 \end{cases} \tag{22}$$

Based on Lemma 2,

$$Z_3 = \begin{cases} \left| \varphi + \theta \right|_{2^{n-1}-1} & \text{if } L_1 - L_2 \ge 0 \\ \left| \varphi + \theta + Z_{3,8} \right|_{2^{n-1}-1} & \text{if } L_1 - L_2 < 0 \end{cases} \tag{23}$$

$$\varphi = \sum_{i=0}^{\frac{n-2}{2}} \mathrm{CLS}(S_1, 2i) \tag{24}$$

$$\theta = \sum_{i=0}^{\frac{n-2}{2}} \mathrm{CLS}(C_1, 2i) \tag{25}$$

where $\mathrm{CLS}(x, y)$ denotes the $y$-bit circular left shift of $x$.
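The replacement of the multiplication by $\left(2^0 + 2^2 + \cdots + 2^{n-2}\right)$ with sums of circular shifts, as in Equations (23) to (25), can be sketched as follows for the case $L_1 - L_2 \ge 0$. The function names are hypothetical, $n$ is assumed even, and $S_1$, $C_1$ are treated as plain $(n-1)$-bit integers rather than hardware signals.

```python
def cls(v, p, width):
    """p-bit circular left shift of a width-bit word."""
    p %= width
    mask = (1 << width) - 1
    return ((v << p) | (v >> (width - p))) & mask

def z3_no_borrow(s1, c1, n):
    """|phi + theta| modulo 2^(n-1) - 1, Equations (23)-(25), first case."""
    width = n - 1
    m = 2**width - 1
    phi = sum(cls(s1, 2 * i, width) for i in range(n // 2))    # Equation (24)
    theta = sum(cls(c1, 2 * i, width) for i in range(n // 2))  # Equation (25)
    return (phi + theta) % m
```

For $n = 6$ the result always agrees with `((s1 + c1) * 21) % 31`, where $21 = 2^0 + 2^2 + 2^4$ is the inverse from Proposition 3.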

Operand Preparation Unit 2 (OPU2), with $S_1$ and $C_1$ as its inputs, is used to implement $Z_3$. OPU2 generates multiple $(n-1)$-bit outputs: $\mathrm{CLS}(C_1, n-2), \ldots, \mathrm{CLS}(C_1, 0)$ and $\mathrm{CLS}(S_1, n-2), \ldots, \mathrm{CLS}(S_1, 0)$. To obtain a compact final output, all of these outputs must be reduced by a CSA tree. The output of this CSA tree and the output of the MUX are connected to the CSA block and then to the modulo $2^{n-1}-1$ adder. The final result of MA2 is the $Z_3$ signal, which will be used in the next step.

$Z_2$ is the other signal that should be prepared for the next step. To decrease the delay of the proposed design, $Z_2$ is computed in parallel with $Z_3$. Based on Equation (11), $Z_2$ is obtained by a modular adder modulo $2^{n+1}-1$, named MA1. $Z_2$ is also needed in the second step of the design; therefore, the output of MA1 goes to Operand Preparation Unit 3 (OPU3), which consists of $n+1$ inverters, to produce the complement $\bar{Z}_2$. In the first step, all of the above-mentioned calculations are done in parallel with the computation of $Z_3$. Thus the delay of computing $Z_2$ is not considered in the critical path delay. The hardware implementations of $Z_2$ and $Z_3$ are shown in Figure 2.

After the calculation of $Z_2$ and $Z_3$, $Y$ can be obtained from its residues in the 3-moduli set $\Gamma$ as

$$Y = Z_1 + Z_2 P_1 + Z_3 P_1 P_2 \tag{26}$$

$$Y = Z_1 + Z_2 \times (2^n - 1) + Z_3 \times (2^n - 1) \times \left(2^{n+1} - 1\right) \tag{27}$$

At the next level of design, $Z_1$, $Z_2$ and $Z_3$ are used with consideration of the value of $Y$. Furthermore, $Z_2$ and $Z_3$ are computed at the first stage for a more parallel architecture. There is no need to compute the final value of $Y$ at the first stage. Only some arrangements of $Z_1$, $Z_2$ and $Z_3$, which are needed in computing $Y$ and are denoted by $y_i$, $i = 1, \ldots, 7$, are utilized at the next level of design. Therefore the delay of computing $Y$ is omitted. The $y_i$ signals are expressed by the following expressions, where a value followed by a run of zeros denotes bit concatenation:

$$\begin{aligned} Y &= Z_1 + Z_2 (2^n - 1) + Z_3 (2^n - 1)\left(2^{n+1} - 1\right) \\ &= Z_1 + Z_2 \underbrace{0 \cdots 0}_{n} - Z_2 + Z_3 \underbrace{0 \cdots 0}_{2n+1} - Z_3 \underbrace{0 \cdots 0}_{n+1} - Z_3 \underbrace{0 \cdots 0}_{n} + Z_3 \end{aligned} \tag{28}$$

$$Y = y_1 + y_2 + y_3 + y_4 + y_5 + y_6 + y_7 \tag{29}$$

where $y_1 = Z_1$, $y_2 = Z_2 \underbrace{0 \cdots 0}_{n}$, $y_3 = -Z_2$, $y_4 = Z_3 \underbrace{0 \cdots 0}_{2n+1}$, $y_5 = -\big(Z_3 \underbrace{0 \cdots 0}_{n+1}\big)$, $y_6 = -\big(Z_3 \underbrace{0 \cdots 0}_{n}\big)$ and $y_7 = Z_3$.
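The first step as a whole, Equations (10) to (16), (20), (21) and (26), can be modelled arithmetically as below. This is a behavioural sketch under stated assumptions, not the CSA-based hardware: `first_step` is a hypothetical name, $n$ is assumed even, the residues are assumed to be already reduced, and $L_1$, $L_2$ are taken as $2x_1$ and $2x_2$ modulo $2^{n+1}-1$, which matches the values used in the numerical example of Section 3.3.

```python
def first_step(x1, x2, x3, n):
    """Return (Z1, Z2, Z3, Y) for residues mod 2^n-1, 2^(n+1)-1, 2^(n-1)-1."""
    p1, p2, p3 = 2**n - 1, 2**(n + 1) - 1, 2**(n - 1) - 1
    inv23 = sum(4**i for i in range(n // 2))  # |P2^-1|_P3 from Proposition 3
    l1 = 2 * x1                   # x1 shifted left by one bit
    l2 = (2 * x2) % p2            # 1-bit circular left shift of x2
    diff = l1 - l2
    borrow = diff < 0             # plays the role of the adder's carry-out
    z2 = diff + p2 if borrow else diff            # Equation (14)
    z3_8 = 2**(n - 1) - 2         # bit pattern 1...10, congruent to -1 mod P3
    # Z3 uses L1 and L2 directly; the constant Z3,8 corrects the borrow case.
    z3 = ((x3 - x1 - l1 + l2) * inv23 + (z3_8 if borrow else 0)) % p3
    y = x1 + z2 * p1 + z3 * p1 * p2               # Equation (26)
    return x1, z2, z3, y
```

The point of the design choice is visible in the code: `z3` never reads `z2`, so in hardware both can be evaluated in parallel and only a selection between two precomputed cases depends on the borrow.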

3.2 Second Step Design

After the computation of $y_i$, $i = 1, \ldots, 7$, the two-moduli superset $\Lambda$ is considered for obtaining the weighted number $X$. The residues of the weighted number $X$ modulo $P_{123}$ and $P_4$ are equal to $Y$ and $x_4$, respectively, where $P_{123} = (2^n - 1) \times \left(2^{n+1} - 1\right) \times \left(2^{n-1} - 1\right)$ and $P_4 = 2^n$. The MRC method for a moduli set with two moduli is utilized to calculate $X$ as follows:

$$X = v_1 + v_2 P_{123} \tag{30}$$

where

$$v_1 = Y \tag{31}$$

$$v_2 = \left| (x_4 - Y) \left|P_{123}^{-1}\right|_{P_4} \right|_{P_4} \tag{32}$$

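Proposition 4. The multiplicative inverse of $P_{123}$ modulo $P_4$ is equal to $-2^{n-1} - 1$.

Proof. According to the definition of the multiplicative inverse, we have:

$$\left| (2^n - 1) \times \left(2^{n+1} - 1\right) \times \left(2^{n-1} - 1\right) \times \left|P_{123}^{-1}\right|_{P_4} \right|_{2^n} = 1$$

$$\rightarrow \left| (2^n - 1) \times \left(2^{n+1} - 1\right) \times \left(2^{n-1} - 1\right) \times \left(-2^{n-1} - 1\right) \right|_{2^n} = \left| (-1) \times (-1) \times \left(-2^{2n-2} + 1\right) \right|_{2^n} = \left| (-1) \times (-1) \times 1 \right|_{2^n} = 1$$

thus,

$$v_2 = \left| (x_4 - Y) \times (-1) \times \left(2^{n-1} + 1\right) \right|_{2^n} = \left| (Y - x_4) \times \left(2^{n-1} + 1\right) \right|_{2^n} \tag{33}$$

By replacing $Y$ based on Equation (29), and writing $-x_4$ as $\bar{x}_4 + 1$ modulo $2^n$, $v_2$ can be rewritten as:

$$v_2 = \left| \left( y_1 + y_2 + y_3 + y_4 + y_5 + y_6 + y_7 + \bar{x}_4 + 1 \right) \times \left(2^{n-1} + 1\right) \right|_{2^n} \tag{34}$$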
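Proposition 4 and Equation (33) can be checked with a short sketch. The function names are hypothetical, and the sample values $Y = 203334$, $x_4 = 63$, $n = 6$ correspond to the residues (33, 7, 5, 63) of the moduli set {63, 127, 31, 64}.

```python
def inverse_holds(n):
    """Check that -2^(n-1) - 1 is the inverse of P123 modulo 2^n."""
    p123 = (2**n - 1) * (2**(n + 1) - 1) * (2**(n - 1) - 1)
    return (p123 * (-2**(n - 1) - 1)) % 2**n == 1

def v2_digit(y, x4, n):
    """Second mixed-radix digit, |(Y - x4)(2^(n-1) + 1)| modulo 2^n."""
    return ((y - x4) * (2**(n - 1) + 1)) % 2**n

v2 = v2_digit(203334, 63, 6)
x = 203334 + v2 * 63 * 127 * 31   # Equation (30)
```

Because the modulus is a power of two, the reduction in `v2_digit` amounts to keeping the lowest $n$ bits, which is the reason the paper defers the modulus $2^n$ to the second step.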

The digits whose weights are greater than or equal to $2^n$ are not considered in operations modulo $2^n$. Accordingly, only the $n$ least significant bits are used in the operations. Therefore, $v_2$ can be expressed as:

[Figure 2. Hardware schema for the first step design. Operand Preparation Unit 1 takes $x_1$, $x_2$ and $x_3$ and produces $Z_{3,1}, \ldots, Z_{3,7}$, $L_1$ and $L_2$. CSA1 to CSA5 ($(n-1)$-bit CSAs with EAC) reduce $Z_{3,1}, \ldots, Z_{3,7}$ to $S_1$ and $C_1$; Operand Preparation Unit 2 produces $\mathrm{CLS}(S_1, 0), \ldots, \mathrm{CLS}(S_1, n-2)$ and $\mathrm{CLS}(C_1, 0), \ldots, \mathrm{CLS}(C_1, n-2)$ for the $(n-1)$-input CSA tree with EAC. A comparator on $L_1$ and $L_2$ drives, through its Cout, a MUX that selects $Z_{3,8}$ or 0; CSA6 and the modulo $2^{n-1}-1$ adder (MA2) produce $Z_3$. In parallel, the modulo $2^{n+1}-1$ adder (MA1) computes $Z_2$ from $L_1$ and $L_2$, and Operand Preparation Unit 3 prepares it for the next step.]

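$$v_2 = \left| Z_{1,0} \underbrace{0 \cdots 0}_{n-1} + \bar{Z}_{2,0} \underbrace{0 \cdots 0}_{n-1} + Z_{3,0} Z_3 + \bar{x}_{4,0} \underbrace{0 \cdots 0}_{n-1} + \bar{x}_4 + Z_1 + \bar{Z}_2 + \mu \right|_{2^n} \tag{35}$$

$$v_2 = \left| k_1 + k_2 + Z_1 + \bar{Z}_2 + \mu \right|_{2^n} \tag{36}$$

where an overbar denotes the bitwise complement and

$$k_1 = \mathrm{XOR}\langle \bar{x}_{4,0}, \bar{x}_{4,n-1} \rangle\, \bar{x}_{4,n-2} \cdots \bar{x}_{4,1} \bar{x}_{4,0} \tag{37}$$

$$k_2 = \mathrm{XOR}\langle Z_{1,0}, \bar{Z}_{2,0}, Z_{3,0} \rangle\, Z_3 \tag{38}$$

$$\mu = \underbrace{0 \cdots 00}_{n-2} 10 \tag{39}$$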

The structure of OPU4 is implemented based on the above equations for $v_2$. Its inputs are $Z_1$, $Z_2$, $Z_3$ and $x_4$, and its outputs are $k_1$ and $k_2$. CSA7 neglects the $n$th bit of its outputs and puts a 0 in the least significant bit of the carry. It also omits the most significant bit of $Z_2$, in accordance with Equation (29) of the previous subsection. The same procedure is followed by CSA8 and CSA9. Finally, the outputs of the CSAs go to modular adder 3 (MA3) as its inputs to compute $v_2$. The hardware implementation of $v_2$ is shown in Figure 3.

[Figure 3. Calculation of $v_2$. Operand Preparation Unit 4 takes $x_4$, $Z_1$, $Z_2$ and $Z_3$ and produces $k_1$ and $k_2$; these, together with $Z_1$, $Z_2$ and $\mu$, are reduced by three $n$-bit CSAs (CSA7, CSA8, CSA9) and added by the modulo $2^n$ adder (MA3) to give $v_2$.]

The value of the weighted number $X$, based on the computed value of $v_2$, is calculated by Equation (40).

$$X = v_1 + v_2 \times (2^n - 1) \times \left(2^{n+1} - 1\right) \times \left(2^{n-1} - 1\right) \tag{40}$$

Since $v_1 = Y$, $v_1$ is replaced by $Y$ as follows:

$$X = Y + v_2 \times (2^n - 1) \times \left(2^{n+1} - 1\right) \times \left(2^{n-1} - 1\right) \tag{41}$$

Equation (41) can be simplified as

$$X = v_2 Z_3 Z_2 Z_1 - \underbrace{0 \cdots 0}_{n-1} v_2 \underbrace{0 \cdots 0}_{2n+1} - \underbrace{0 \cdots 0}_{n} v_2 Z_3 Z_2 - \underbrace{0 \cdots 0}_{n+1} v_2 Z_3 v_2 + v_2 \underbrace{0 \cdots 0}_{n+1} + v_2 \underbrace{0 \cdots 0}_{n} + v_2 Z_3 \tag{42}$$

where juxtaposition denotes bit concatenation, with $Z_1$, $Z_2$, $Z_3$ and $v_2$ being $n$-bit, $(n+1)$-bit, $(n-1)$-bit and $n$-bit words, respectively. Replacing each subtracted $4n$-bit operand by its one's complement plus one gives

$$X = v_2 Z_3 Z_2 Z_1 + \underbrace{1 \cdots 1}_{n-1} \bar{v}_2 \underbrace{1 \cdots 1}_{2n+1} + \underbrace{1 \cdots 1}_{n} \bar{v}_2 \bar{Z}_3 \bar{Z}_2 + \underbrace{1 \cdots 1}_{n+1} \bar{v}_2 \bar{Z}_3 \bar{v}_2 + v_2 \underbrace{0 \cdots 0}_{n-1} 11 + v_2 \underbrace{0 \cdots 0}_{n} + v_2 Z_3 \tag{43}$$

$X$ is the summation of seven values, $\sum_{k=1}^{7} X_k$, where $X_1 = v_2 Z_3 Z_2 Z_1$, $X_2 = \underbrace{1 \cdots 1}_{n-1} \bar{v}_2 \underbrace{1 \cdots 1}_{2n+1}$, $X_3 = \underbrace{1 \cdots 1}_{n} \bar{v}_2 \bar{Z}_3 \bar{Z}_2$, $X_4 = \underbrace{1 \cdots 1}_{n+1} \bar{v}_2 \bar{Z}_3 \bar{v}_2$, $X_5 = v_2 \underbrace{0 \cdots 0}_{n-1} 11$, $X_6 = v_2 \underbrace{0 \cdots 0}_{n}$ and $X_7 = v_2 Z_3$. Each $X_k$, with a bit-length of $4n$ bits, enters the carry save adder tree, and the outputs of the carry save adder tree are connected to the inputs of a $4n$-bit CPA to compute the weighted number $X$. Figure 4 depicts this architecture.

[Figure 4. Hardware implementation for calculation of $X$. Operand Preparation Unit 5 takes $v_2$, $Z_1$, $Z_2$ and $Z_3$ and produces $X_1, \ldots, X_7$, which are reduced by CSA10 ($(2n+1)$-bit), CSA11 ($(3n+1)$-bit), CSA12 ($(3n+2)$-bit), CSA13 ($(3n+3)$-bit) and CSA14 ($4n$-bit) and finally added by a $4n$-bit CPA to give $X$.]
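The operand preparation for $X$ can be modelled as a sum of seven $4n$-bit words modulo $2^{4n}$. The sketch below is an assumption-laden illustration: the helper `concat` and the function `reconstruct` are hypothetical names, subtracted operands are represented by their one's complements padded with leading ones, and the three resulting "+1" corrections are placed in the two free low bits of $X_5$.

```python
def concat(*fields):
    """Concatenate (value, width) pairs, most significant field first."""
    out = 0
    for value, width in fields:
        out = (out << width) | (value & ((1 << width) - 1))
    return out

def reconstruct(z1, z2, z3, v2, n):
    """X as the sum of seven 4n-bit operands modulo 2^(4n)."""
    ones = -1  # masked to all-ones by concat; ~v gives a complemented field
    x1 = concat((v2, n), (z3, n - 1), (z2, n + 1), (z1, n))
    x2 = concat((ones, n - 1), (~v2, n), (ones, 2 * n + 1))
    x3 = concat((ones, n), (~v2, n), (~z3, n - 1), (~z2, n + 1))
    x4 = concat((ones, n + 1), (~v2, n), (~z3, n - 1), (~v2, n))
    x5 = concat((v2, n), (0, n - 1), (3, 2))   # v2 * 2^(n+1) plus the three +1s
    x6 = v2 << n
    x7 = concat((v2, n), (z3, n - 1))
    return (x1 + x2 + x3 + x4 + x5 + x6 + x7) % 2**(4 * n)
```

For $Z_1 = 33$, $Z_2 = 52$, $Z_3 = 25$, $v_2 = 39$ and $n = 6$ the sketch returns 9876543. Every operand is built from wiring and inverters only, so the arithmetic cost is confined to the CSA tree and the final CPA.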

3.3 Numerical Example

Considering the moduli set {63, 127, 31, 64}, which is derived from the moduli set $\Psi$ when $n = 6$, the RNS number (33, 7, 5, 63) can be converted to its equivalent weighted number $X$ as follows.

First stage:

$x_1 = 33_{10} = 100001_2$
$x_2 = 7_{10} = 0000111_2$
$x_3 = 5_{10} = 00101_2$

By substituting these values in Equations (6), (11), (19) and (21), the following results are obtained:

$Z_1 = 33_{10} = 100001_2$
$L_1 = 1000010_2$
$L_2 = 0001110_2$
$Z_2 = 0110100_2$
$Z_3 = 11001_2$

Second stage: by considering Equations (31), (36) and (43), the desired values in the second step are obtained:

$v_1 = Y = 33 + 52 \times 63 + 25 \times 63 \times 127 = 203334_{10} = 110001101001000110_2$
$k_1 = 0$
$k_2 = 57_{10} = 111001_2$
$v_2 = 39_{10} = 100111_2$
$X = 203334 + 39 \times 63 \times 127 \times 31 = 9876543$

Thus $X = 9876543$, and the verification can be simply done as

$x_1 = |9876543|_{63} = 33$
$x_2 = |9876543|_{127} = 7$
$x_3 = |9876543|_{31} = 5$
$x_4 = |9876543|_{64} = 63$

4 Hardware Cost and the Delay of Proposed Converter

Table 2. Different conditions of a full adder cell according to constant inputs

| Number of constant inputs | Constant value | Reduced gates |
|---|---|---|
| 1 | 1 | A pair of two-input XNOR and OR gates |
| 1 | 0 | A pair of two-input XOR and AND gates |
| 2 | Same value (both 0 or both 1) | Wire |
| 2 | Different values (one input 0, the other 1) | Inverter gate |

The hardware requirements were indicated briefly in Section 2. In this section, the hardware cost and the critical path delay of the proposed reverse converter are evaluated in detail. In such an evaluation, the wire loads are usually assumed to be negligible. The hardware cost is based on the number of primitive logic components utilized in the reverse converter of the moduli set $\Psi$. To calculate the delay of the whole circuit, the critical path, which is shown in Figures 2, 3 and 4 with the red dashed line, should be determined. In the evaluation, the delay of an $n$-bit CPA equals the time in which two numbers are added in a ripple structure. The logic function of a full adder is described by the following equations:

$$\mathrm{Sum} = \mathrm{XOR}(x, y, z) = \bar{x}\bar{y}z + \bar{x}y\bar{z} + x\bar{y}\bar{z} + xyz \tag{44}$$

$$\mathrm{Carry} = xy + xz + zy \tag{45}$$

According to Equations (44) and (45), if one of the inputs of the full adder equals 1 (for instance $z = 1$), the Sum and Carry are equivalent to $xy + \bar{x}\bar{y}$ and $x + y$, respectively. If one of the inputs equals 0 (for instance $z = 0$), the Sum and Carry are equivalent to $\bar{x}y + x\bar{y}$ and $xy$, respectively. In these two cases only two-input gates are used. If two inputs have constant values, the simplification process is the same as above. Table 2 shows the different conditions of a full adder cell according to constant inputs.
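The simplifications of a full adder cell with constant inputs can be confirmed exhaustively. The sketch below is only a truth-table check with hypothetical names; it says nothing about the transistor-level cost of the gates.

```python
def full_adder(x, y, z):
    """Sum and carry of a full adder, Equations (44) and (45)."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (z & y)
    return s, c

bits = (0, 1)
# One constant input equal to 1: sum is XNOR, carry is OR.
one_const_1 = all(full_adder(x, y, 1) == (1 - (x ^ y), x | y)
                  for x in bits for y in bits)
# One constant input equal to 0: sum is XOR, carry is AND.
one_const_0 = all(full_adder(x, y, 0) == (x ^ y, x & y)
                  for x in bits for y in bits)
# Two equal constants: sum is the remaining input, carry is the constant.
two_same = all(full_adder(x, k, k) == (x, k) for x in bits for k in bits)
# Two different constants: sum is the inverted input, carry is the input.
two_diff = all(full_adder(x, 0, 1) == (1 - x, x) for x in bits)
```

All four flags evaluate to `True`, matching the XNOR/OR pair, the XOR/AND pair, the wire and the inverter cases of a full adder with fixed inputs.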

In the first step of the design of the reverse converter, the modular adders (MA1, MA2) are implemented by CPAs with EAC (Figure 2). The modulo $2^k-1$ adder has similar area and double the delay of a $k$-bit CPA. The last modular adder (MA3) is implemented by a regular $n$-bit CPA, neglecting its carry-out (Figure 3). The CSAs used in the design of the reverse converter are divided into regular CSAs and CSAs with EAC [13].

(1) In the first level of design, the CSAs of the reduction tree are CSAs with EAC. The output of the final CSA enters the modulo $2^{n-1}-1$ adder (MA2).

(2) The summation of five operands modulo $2^n$ computes $v_2$ at the start of the second step of the design. The reduction tree used for computing $v_2$ employs three regular CSAs. Because the output of the reduction tree must be in modulo $2^n$, the carry-out signal of every CSA in the architecture shown in Figure 3 is neglected, which improves the hardware cost efficiency.

(3) In the last part of the second step of the design, for computing $X$, CSAs whose inputs are signals with different bit-lengths are used.

The only difference between a regular CSA and a CSA with EAC is the carry vector $c$ generated by the CSA. Figure 5 demonstrates the basic architecture of a CSA and a CSA with EAC. The delay of an $n$-bit CSA equals the delay of one full adder cell. In a CSA, as in a CPA, the hardware cost can be reduced according to the constant input values. In general, the hardware cost of an $n$-bit CSA is equal to the hardware cost of $n$ full adder cells. Table 3 shows the hardware cost and the delay of the various components in the proposed reverse converter.
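The difference between the two CSA variants can be illustrated with a word-level model. This is a sketch with hypothetical function names: in the regular CSA the carry-out of the top bit is dropped, which is correct modulo $2^k$, while in the CSA with EAC it wraps around to bit 0, which is correct modulo $2^k-1$.

```python
def csa(x, y, z, k):
    """Regular k-bit carry save adder: s + c = x + y + z (mod 2^k)."""
    mask = (1 << k) - 1
    s = x ^ y ^ z
    carry = (x & y) | (x & z) | (y & z)
    c = (carry << 1) & mask          # top carry discarded, LSB of c is 0
    return s, c

def csa_eac(x, y, z, k):
    """k-bit CSA with end-around carry: s + c = x + y + z (mod 2^k - 1)."""
    mask = (1 << k) - 1
    s = x ^ y ^ z
    carry = (x & y) | (x & z) | (y & z)
    c = ((carry << 1) | (carry >> (k - 1))) & mask   # top carry re-enters bit 0
    return s, c
```

In both cases each bit position is an independent full adder, so the delay does not grow with the word length; only the routing of the most significant carry differs.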

5 Comparison

This section presents the comparison of the proposed reverse converter architecture for the moduli set $\Psi$ with other balanced 4-moduli sets in the same dynamic range class, such as the 4-moduli sets $\{2^n-1,\ 2^n,\ 2^n+1,\ 2^{n+1}-1\}$ [13, 14], $\{2^n-1,\ 2^n,\ 2^n+1,\ 2^{n+1}+1\}$ [13, 15], $\{2^n-3,\ 2^n-1,\ 2^n+1,\ 2^n+3\}$ [16] and $\{2^n,\ 2^{n+1}-1,\ 2^n-1,\ 2^{n-1}-1\}$ [17]. The comparisons are done in terms of the delay and the area of the reverse converter. Table 4 shows the comparison between the proposed reverse converter and its state-of-the-art counterparts. In order to achieve a fair comparison, the delay and the area of the modular adders and carry save adders are considered the same as in [24]. As shown in Table 4, the proposed reverse converter for the moduli set $\Psi$ achieves the highest speed compared to the reverse converters for $\{2^n-1,\ 2^n,\ 2^n+1,\ 2^{n+1}-1\}$ [13, 14], $\{2^n-1,\ 2^n,\ 2^n+1,\ 2^{n+1}+1\}$ [13, 15], $\{2^n-3,\ 2^n-1,\ 2^n+1,\ 2^n+3\}$ [16] and $\{2^n,\ 2^{n+1}-1,\ 2^n-1,\ 2^{n-1}-1\}$ [17]. It is worth mentioning that the proposed reverse converter is the fastest adder-based reverse converter in the balanced 4-moduli class [17].

The unit gate delay and the unit gate area are models for evaluating the hardware requirements and the critical path delay of the different adder-based reverse converters. In this model, an FA has an area of seven gates and a delay of four gates. XOR/XNOR gates have an area and a delay of two gates, and each two-input monotonic gate has an area and a delay of one gate [22]. For a fairer comparison, the unit gate delay and the unit gate area of the different adder-based reverse converters are included in Table 5,

Page 9: AHighSpeedResidue-to-BinaryConverterforBalanced4-Moduli Setjcomsec.ui.ac.ir/article_21874_ae1328488436df6666e684861e83e600.pdfAdder (CSA) for operations in modulo 2n, CSA with End

January 2015, Volume 2, Number 1 (pp. 43–54) 51

Figure 5. Basic architecture for (a) a Carry Save Adder, which reduces three operands $x$, $y$ and $z$ to a sum vector $s$ and a carry vector $c$ with $x + y + z = 2c + s$, and (b) a CSA with End Around Carry, in which the carry-out bit $c_{n-1}$ is fed back to the least significant position.
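The two adder types in Figure 5 can be modelled in a few lines of software. The sketch below is only a behavioural illustration, not the hardware described in the paper: operands are assumed to be Python integers holding n-bit values, and the function names `csa` and `csa_eac` are hypothetical.

```python
def csa(x, y, z, n):
    """n-bit carry save adder: returns (s, c) with x + y + z == s + 2*c."""
    mask = (1 << n) - 1
    s = (x ^ y ^ z) & mask                     # sum bit of each full adder
    c = ((x & y) | (x & z) | (y & z)) & mask   # carry bit (majority) of each full adder
    return s, c


def csa_eac(x, y, z, n):
    """CSA with end-around carry: returns (s, c_rot) with
    x + y + z congruent to s + c_rot modulo 2**n - 1."""
    s, c = csa(x, y, z, n)
    mask = (1 << n) - 1
    # Doubling the carry vector modulo 2**n - 1 is a 1-bit left rotation,
    # so the top carry bit c_{n-1} re-enters at bit position 0.
    c_rot = ((c << 1) & mask) | (c >> (n - 1))
    return s, c_rot


if __name__ == "__main__":
    n = 4
    s, c = csa(9, 7, 12, n)
    print(s + 2 * c)                   # plain CSA keeps the exact sum
    s, c_rot = csa_eac(9, 7, 12, n)
    print((s + c_rot) % (2**n - 1))    # EAC version keeps the sum modulo 2**n - 1
```

The rotation in `csa_eac` relies on the identity $2^n \equiv 1 \pmod{2^n - 1}$, which is why a modulo $2^n - 1$ channel needs no extra correction logic after each carry save stage.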

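Table 3. Hardware and delay of various components in the proposed reverse converter.

| Component | Area | Delay |
|---|---|---|
| OPU1 | $(2n+1)A_{Inv}$ | $1D_{Inv}$ |
| Comparator | $3nA_{AND} + nA_{OR3}$ | $nD_{AND} + nD_{OR3}$ |
| CSA1 | $1A_{FA} + (n-2)(A_{XNOR} + A_{OR})$ | $1D_{FA}$ |
| CSA2 | $2A_{FA}$ | $1D_{FA}$ |
| CSA3 | $(n-1)A_{FA}$ | $1D_{FA}$ |
| CSA4 | $(n-1)A_{FA}$ | $1D_{FA}$ |
| CSA5 | $(n-1)A_{FA}$ | $1D_{FA}$ |
| OPU2 | 0 | 0 |
| CSA Tree | $(n^2-n)A_{FA}$ | $pD_{FA}$ |
| CSA6 | $(n-2)A_{FA} + A_{XOR} + A_{AND}$ | $1D_{FA}$ |
| MA1 | $(n+1)A_{FA}$ | $(2n+2)D_{FA}$ |
| MA2 | $(n-1)A_{FA}$ | $(2n-2)D_{FA}$ |
| OPU3 | $(n+1)A_{Inv}$ | $1D_{Inv}$ |
| CSA7 | $nA_{FA}$ | $1D_{FA}$ |
| CSA8 | $(n-1)A_{FA} + A_{XOR} + A_{AND}$ | $1D_{FA}$ |
| CSA9 | $(n-2)A_{XOR} + 1A_{XNOR} + (n-2)A_{AND} + 1A_{OR}$ | $1D_{Inv}$ |
| MA3 | $(n-3)A_{FA} + 1A_{XOR} + 1A_{AND}$ | $(n-3)D_{FA} + 1D_{HA}$ |
| OPU5 | $(2n-1)A_{Inv}$ | $1D_{Inv}$ |
| CSA10 | $(n-2)A_{FA} + 2A_{XOR} + 2A_{AND}$ | $1D_{FA}$ |
| CSA11 | $(n-2)A_{FA} + (2n+2)(A_{XNOR} + A_{OR})$ | $1D_{FA}$ |
| CSA12 | $2nA_{FA} + (n+1)(A_{XOR} + A_{AND})$ | $1D_{FA}$ |
| CSA13 | $2nA_{FA} + (n+1)(A_{XOR} + A_{AND})$ | $1D_{FA}$ |
| CSA14 | $(3n+1)A_{FA} + 2(A_{XOR} + A_{AND})$ | $1D_{FA}$ |
| CPA | $(4n-2)A_{FA} + 2(A_{XOR} + A_{AND})$ | $(4n-2)D_{FA} + 1D_{HA}$ |
| MUX 2:1 | 0 | 0 |
| OPU4 | $2A_{XOR} + nA_{Inv}$ | $1D_{XOR3}$ |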


Table 4. Hardware requirements and delay of reverse converters.

| Moduli Set | Design | Hardware requirements | Delay |
|---|---|---|---|
| $\{2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$ | [13]-1 | $(9n+5+((n-4)(n+1)/2))A_{FA} + 2nA_{XNOR} + 2nA_{OR} + (6n+1)A_{INV}$ | $((23n+12)/2)D_{FA}$ |
| $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$ | [13] | $2n^2+11n+3$ | $11.5nD_{FA}$ |
| $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$ | [13]-2 | $(6n+7)A_{INV} + (n^2+12n+12)A_{FA} + 2n(A_{XNOR}+A_{OR}) + (4n+8)A_{2:1MUX}$ | $(16n+22)D_{FA}$ |
| $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$ | [15] | $(58n+23+\log_2(c+1))A_{FA}$ | $(24n+17+\log_2(c+1))D_{FA}$ |
| $\{2^n-3, 2^n-1, 2^n+1, 2^n+3\}$ | [16]-C1 CE | $(25.5n+12+2.5n^2)A_{FA} + 5nA_{HA} + 3n(A_{XNOR}+A_{OR})$ | $(18n+23)D_{FA}$ |
| $\{2^n-3, 2^n-1, 2^n+1, 2^n+3\}$ | [16]-C2 CE | $(20n+17)A_{FA} + (3n-4)A_{HA} + 2^n(5n+2)A_{ROM}$ | $(13n+22)D_{FA} + 3D_{ROM}$ |
| $\{2^n-3, 2^n-1, 2^n+1, 2^n+3\}$ | [16]-C3 CE | $(23n+11)A_{FA} + (2n-2)A_{HA} + (6n+4)2^nA_{ROM}$ | $(16n+14)D_{FA} + D_{ROM}$ |
| $\{2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$ | [14]-3stage-CE | $(n^2+10n+3)A_{FA} + A_{HA} + (3n+2)A_{INV} + 2A_{2:1MUX}$ | $(9n+6+m)D_{FA}$ |
| $\{2^n, 2^{n+1}-1, 2^n-1, 2^{n-1}-1\}$ | [17]-D1-C-I | $(n^2+16n+6)A_{FA} + 4nA_{INV} + (n+2)(A_{XNOR}+A_{OR}) + (3n-5)(A_{XOR}+A_{AND})$ | $(12n+9+q)D_{FA}$ |
| $\{2^n, 2^{n+1}-1, 2^n-1, 2^{n-1}-1\}$ | [17]-D1-C-III | $(n^2+24n+24)A_{FA} + (2n+3)A_{HA} + 2(A_{XNOR}+A_{OR}) + (2n-5)(A_{XOR}+A_{AND}) + (2n+1)A_{3:1MUX} + 4nA_{INV}$ | $(8n+11+q)D_{FA}$ |
| $\{2^n, 2^{n+1}-1, 2^n-1, 2^{n-1}-1\}$ | [17]-D1-C-II | $(n^2+22n+22)A_{FA} + (2n+2)A_{HA} + 10(2n+1)A_{ROM} + 2(A_{XNOR}+A_{OR}) + (2n-5)(A_{XOR}+A_{AND}) + (2n+1)A_{2:1MUX} + 4nA_{INV}$ | $(8n+11+q)D_{FA}$ |
| $\{2^n, 2^{n+1}-1, 2^n-1, 2^{n-1}-1\}$ | Proposed | $(n^2+21n-11)A_{FA} + (3n+1)(A_{XNOR}+A_{OR}) + (6n+9)A_{AND} + (3n+9)A_{XOR} + nA_{OR3} + (5n+1)A_{INV}$ | $(7n+1)D_{FA} + n(D_{OR3}+D_{AND}) + 2D_{HA} + 4D_{INV}$ |

∗ m and q are the number of levels in the CSA tree with (n+2) and (n+1) inputs, respectively.

which confirms the remarkable improvement in terms of speed of the reverse converter. Also degraded hardware resources are achieved compared to [13–17].
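As a worked illustration of the unit gate model, the sketch below converts the Table 4 entry for the proposed converter into unit gate figures. The weights for FA, XOR/XNOR and two-input gates are those quoted in the text. The weights for OR3, inverters and half adders are not stated in the text; the values used here (OR3 counted as one gate, inverters and the half-adder delay term counted as zero) are assumptions, chosen only because they reproduce the Table 5 entry for the proposed design. The function names are hypothetical.

```python
# Unit gate weights. FA, XOR/XNOR, AND and OR follow the model in the text;
# OR3, INV and HA are ASSUMED values (not stated in the source).
AREA = {"FA": 7, "XOR": 2, "XNOR": 2, "AND": 1, "OR": 1, "OR3": 1, "INV": 0}
DELAY = {"FA": 4, "AND": 1, "OR3": 1, "HA": 0, "INV": 0}


def proposed_area(n):
    """Unit gate area of the proposed converter from its Table 4 gate counts."""
    counts = {
        "FA": n * n + 21 * n - 11,
        "XNOR": 3 * n + 1,
        "OR": 3 * n + 1,
        "AND": 6 * n + 9,
        "XOR": 3 * n + 9,
        "OR3": n,
        "INV": 5 * n + 1,
    }
    return sum(AREA[gate] * k for gate, k in counts.items())


def proposed_delay(n):
    """Unit gate delay of the proposed converter from its Table 4 critical path."""
    counts = {"FA": 7 * n + 1, "OR3": n, "AND": n, "HA": 2, "INV": 4}
    return sum(DELAY[gate] * k for gate, k in counts.items())


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, proposed_area(n), proposed_delay(n))
```

Under these assumed weights the two functions agree with the closed forms $7n^2 + 169n - 47$ and $30n + 4$ listed for the proposed design in Table 5. With different weights for the unspecified gates, the constants would shift accordingly.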

6 Conclusion

In this paper, the quadruple moduli set Ψ was the focus of study in reducing the computational intensity of the reverse converter design. Ψ has a dynamic range of 4n bits and utilizes moduli only of the form $(2^k - 1)$ besides modulo $2^n$, which provides efficient arithmetic operations in the RNS channels. The new reverse converter eliminates extra intermediate calculations. For each level of the design, the moduli subsets are selected to make the design more efficient in both delay and hardware cost. In a nutshell, the overall area and time complexity analysis indicates that the proposed reverse converters are more efficient than the existing converters for the 4-moduli set $\{2^{n-1}-1, 2^{n+1}-1, 2^n, 2^n-1\}$.
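The two-level organisation summarised above can be checked with a small software reference model. The sketch below is not the hardware architecture of the paper: it uses a generic textbook mixed radix conversion, first on the 3-moduli subset and then on the resulting two-moduli set. All function names are hypothetical, and n is assumed to be even so that the moduli are pairwise coprime (for odd n, $2^{n-1}-1$ and $2^{n+1}-1$ share the factor 3). It needs Python 3.8 or later for `pow` with a negative exponent.

```python
from math import gcd


def moduli(n):
    """The set (2^(n-1)-1, 2^(n+1)-1, 2^n, 2^n-1); n is assumed even."""
    return (2 ** (n - 1) - 1, 2 ** (n + 1) - 1, 2 ** n, 2 ** n - 1)


def pairwise_coprime(ms):
    return all(gcd(a, b) == 1 for i, a in enumerate(ms) for b in ms[i + 1:])


def mrc(residues, mods):
    """Generic mixed radix conversion: returns X in [0, product of mods)."""
    x, weight = 0, 1
    for r, m in zip(residues, mods):
        # Choose the next mixed radix digit so that x + digit*weight matches r mod m.
        digit = ((r - x) * pow(weight, -1, m)) % m
        x += digit * weight
        weight *= m
    return x


def two_level_convert(residues, n):
    p1, p2, p3, p4 = moduli(n)          # p3 is the power-of-two modulus 2^n
    r1, r2, r3, r4 = residues
    # Level 1: the 3-moduli subset {2^(n-1)-1, 2^(n+1)-1, 2^n-1}.
    x_sub = mrc((r1, r2, r4), (p1, p2, p4))
    # Level 2: the two-moduli set {p1*p2*p4, 2^n}.
    return mrc((x_sub, r3), (p1 * p2 * p4, p3))


if __name__ == "__main__":
    n = 4
    ms = moduli(n)                       # (7, 31, 16, 15)
    X = 12345
    print(two_level_convert(tuple(X % m for m in ms), n))   # prints 12345
```

A model of this kind is mainly useful as a golden reference when verifying a hardware implementation, since it makes the intermediate value of the first level explicit.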


Table 5. Unit gate area and delay of reverse converters.

| Moduli Set | Design | Unit gate area | Unit gate delay |
|---|---|---|---|
| $\{2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$ | [13]-1 | $3.5n^2+72.5n+23$ | $46n+24$ |
| $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$ | [13]-2 | $7n^2+128n+146$ | $64n+88$ |
| $\{2^n-3, 2^n-1, 2^n+1, 2^n+3\}$ | [16]-C1 CE | $17.5n^2+210.5n+84$ | $72n+92$ |
| $\{2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$ | [14]-3Stage-CE | $7n^2+76n+41$ | $36n+24+4m$∗ |
| $\{2^n, 2^{n+1}-1, 2^n-1, 2^{n-1}-1\}$ | [17]-D1-C-I | $7n^2+136n+30$ | $48n+36+4q$∗ |
| $\{2^n, 2^{n+1}-1, 2^n-1, 2^{n-1}-1\}$ | [17]-D1-C-II | $7n^2+204n+174$ | $32n+44+4q$∗ |
| $\{2^n, 2^{n+1}-1, 2^n-1, 2^{n-1}-1\}$ | Proposed | $7n^2+169n-47$ | $30n+4$ |

∗ m and q are the number of levels in the CSA tree with (n+2) and (n+1) inputs, respectively.
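To see how the unit gate delay expressions of Table 5 compare for a concrete word length, they can be evaluated numerically. The sketch below copies the delay column of Table 5. The way m and q are obtained is an assumption: the CSA tree is taken to be a Wallace-style tree in which every level turns each full group of three operands into two, which may differ from the trees used in [14] and [17]. The function names are hypothetical.

```python
def csa_tree_levels(k):
    """Levels of a carry save adder tree reducing k operands to 2
    (ASSUMED Wallace-style grouping, three operands into two per level)."""
    levels = 0
    while k > 2:
        k -= k // 3
        levels += 1
    return levels


def unit_gate_delays(n):
    """Unit gate delay of each design in Table 5 for a given n."""
    m = csa_tree_levels(n + 2)
    q = csa_tree_levels(n + 1)
    return {
        "[13]-1": 46 * n + 24,
        "[13]-2": 64 * n + 88,
        "[16]-C1 CE": 72 * n + 92,
        "[14]-3Stage-CE": 36 * n + 24 + 4 * m,
        "[17]-D1-C-I": 48 * n + 36 + 4 * q,
        "[17]-D1-C-II": 32 * n + 44 + 4 * q,
        "Proposed": 30 * n + 4,
    }


if __name__ == "__main__":
    delays = unit_gate_delays(8)
    for design, d in sorted(delays.items(), key=lambda item: item[1]):
        print(f"{design:16s} {d}")
```

For n = 8 this assumed tree gives m = 5 and q = 4, and the proposed design has the smallest unit gate delay of the listed converters, in line with the comparison in the text.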

Acknowledgements

The authors are grateful for the anonymous reviewers' valuable comments and suggestions, which improved the quality of the manuscript. The authors would also like to thank Dr. B. Yoberd and Ms. F. Shaker for their literature contributions.

References

[1] MA Bayoumi and P Srinivasan. Parallel arithmetic: from algebra to architecture. In Circuits and Systems, 1990., IEEE International Symposium on, pages 2630–2633. IEEE, 1990.

[2] T Stouraitis and V Paliouras. Considering the alternatives in low-power design. Circuits and Devices Magazine, IEEE, 17(4):22–29, 2001.

[3] Behrooz Parhami. Computer arithmetic: algorithms and hardware designs. Oxford University Press, Inc., 2009.

[4] Mi Lu. Arithmetic and logic in computer systems, volume 169. John Wiley & Sons, 2005.

[5] Richard Conway and John Nelson. Improved RNS FIR filter architectures. Circuits and Systems II: Express Briefs, IEEE Transactions on, 51(1):26–28, 2004.

[6] Ricardo Chaves and Leonel Sousa. RDSP: A RISC DSP based on residue number system. In Digital System Design, 2003. Proceedings. Euromicro Symposium on, pages 128–135. IEEE, 2003.

[7] Wei Wang, MNS Swamy, and M Omair Ahmad. RNS application for digital image processing. In System-on-Chip for Real-Time Applications, 2004. Proceedings. 4th IEEE International Workshop on, pages 77–80. IEEE, 2004.

[8] Sung-Ming Yen, Seungjoo Kim, Seongan Lim, and Sang-Jae Moon. RSA speedup with Chinese remainder theorem immune against hardware fault cryptanalysis. Computers, IEEE Transactions on, 52(4):461–472, 2003.

[9] Mohammad Esmaeildoust, Dimitrios Schinianakis, Hamid Javashi, Thanos Stouraitis, and Keivan Navi. Efficient RNS implementation of elliptic curve point multiplication over. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 21(8):1545–1549, 2013.

[10] Javier Ramírez, Antonio García, U Meyer-Baese, and A Lloris. Fast RNS FPL-based communications receiver design and implementation. In Field-Programmable Logic and Applications: Reconfigurable Computing Is Going Mainstream, pages 472–481. Springer, 2002.

[11] Keivan Navi, Amir Sabbagh Molahosseini, and Mohammad Esmaeildoust. How to teach residue number system to computer scientists and engineers. Education, IEEE Transactions on, 54(1):156–163, 2011.

[12] MohammadReza Taheri, Elham Khani, Mohammad Esmaeildoust, and Keivan Navi. Efficient reverse converter design for five moduli set $\{2^n, 2^{2n+1}-1, 2^{n/2}-1, 2^{n/2}+1, 2^n+1\}$. Journal of Computations & Modelling, 2(1):93–108, 2012.

[13] PV Ananda Mohan and AB Premkumar. RNS-to-binary converters for two four-moduli sets $\{2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$ and $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$. Circuits and Systems I: Regular Papers, IEEE Transactions on, 54(6):1245–1254, 2007.

[14] B Cao, T Srikanthan, and CH Chang. Efficient reverse converters for four-moduli sets $\{2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$ and $\{2^n-1, 2^n, 2^n+1, 2^{n-1}-1\}$. IEE Proceedings-Computers and Digital Techniques, 152(5):687–696, 2005.

[15] M Bhardwaj, Thambipillai Srikanthan, and Christopher T Clarke. A reverse converter for the 4-moduli superset $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$. In Computer Arithmetic, 1999. Proceedings. 14th IEEE Symposium on, pages 168–175. IEEE, 1999.


[16] PV Ananda Mohan. New reverse converters for the moduli set $\{2^n-3, 2^n-1, 2^n+1, 2^n+3\}$. AEU-International Journal of Electronics and Communications, 62(9):643–658, 2008.

[17] Mohammad Esmaeildoust, Keivan Navi, MohammadReza Taheri, Amir Sabbagh Molahosseini, and Siavash Khodambashi. Efficient RNS to binary converters for the new 4-moduli set $\{2^n, 2^{n+1}-1, 2^n-1, 2^{n-1}-1\}$. IEICE Electronics Express, 9(1):1–7, 2012.

[18] Lampros Kalampoukas, Dimitris Nikolos, Costas Efstathiou, Haridimos T Vergos, and John Kalamatianos. High-speed parallel-prefix modulo $2^n-1$ adders. IEEE Transactions on Computers, (7):673–680, 2000.

[19] Costas Efstathiou, Haridimos T Vergos, and Dimitris Nikolos. Fast parallel-prefix modulo $2^n+1$ adders. Computers, IEEE Transactions on, 53(9):1211–1216, 2004.

[20] Riyaz Patel, Mohammed Benaissa, Neil Powell, Said Boussakta, et al. Novel power-delay-area-efficient approach to generic modular addition. Circuits and Systems I: Regular Papers, IEEE Transactions on, 54(6):1279–1292, 2007.

[21] W Kenneth Jenkins and Benjamin J Leon. The use of residue number systems in the design of finite impulse response digital filters. Circuits and Systems, IEEE Transactions on, 24(4):191–201, 1977.

[22] Mohammad Esmaeildoust, Keivan Navi, and MohammadReza Taheri. High speed reverse converter for new five-moduli set $\{2^n, 2^{2n+1}-1, 2^{n/2}-1, 2^{n/2}+1, 2^n+1\}$. IEICE Electronics Express, 7(3):118–125, 2010.

[23] Keivan Navi, Mohammad Esmaeildoust, and Amir Sabbagh Molahosseini. A general reverse converter architecture with low complexity and high performance. IEICE Transactions on Information and Systems, 94(2):264–273, 2011.

[24] Amir Sabbagh Molahosseini, Keivan Navi, Chitra Dadkhah, Omid Kavehei, and Somayeh Timarchi. Efficient reverse converter designs for the new 4-moduli sets and based on new CRTs. Circuits and Systems I: Regular Papers, IEEE Transactions on, 57(4):823–835, 2010.

MohammadReza Taheri received his B.Sc. in Computer Hardware Engineering from Isfahan University, Isfahan, Iran. He obtained his M.Sc. degree in Computer System Architecture from the Science and Research Branch of Islamic Azad University, Tehran, Iran. He is currently pursuing his Ph.D. degree in Computer Architecture at Shahid Beheshti University, Tehran, Iran. He has also been a member of the Nanotechnology and Quantum Computing Laboratory of Shahid Beheshti University since 2009. His current research interests include residue number systems, low power arithmetic, approximate computing, and circuit techniques for emerging technologies.

Nasim Shafiee received the B.Sc. degree in computer hardware engineering from Shahid Beheshti University, Tehran, Iran, in 2014. She has been a member of the Nanotechnology and Quantum Computing Laboratory of Shahid Beheshti University since 2013. Her research interests include low power computer arithmetic, approximate computing, and robotics.

Mohammad Esmaeildoust received his M.Sc. degree in computer architecture from Shahid Beheshti University of Technology, Tehran, Iran, in 2008. He also received the Ph.D. degree in computer architecture from Shahid Beheshti University of Technology, Tehran, Iran, in 2012. He is currently an Assistant Professor in the Faculty of Marine Engineering, Khorramshahr University of Marine Science and Technology. His research interests include VLSI design, cryptography, network security, and computer arithmetic.

Zhale Amirjamshidi earned her M.Sc. in electronic engineering from the Central Branch of Islamic Azad University, Tehran, Iran. She is currently pursuing the Ph.D. degree in electronic engineering at Iran University of Science and Technology, Tehran, Iran. Her research interests mainly focus on low power digital arithmetic and renewable energy.

Reza Sabbaghi-nadooshan received his B.Sc. and M.Sc. in electrical engineering from the Iran University of Science and Technology, Tehran, Iran, in 1991 and 1994, and his Ph.D. in Electrical Engineering from the Science and Research Branch, Islamic Azad University, Tehran, Iran, in 2010. Since 1998, he has been a faculty member of the Department of Electronics at the Central Tehran Branch, Islamic Azad University, Tehran, Iran. His current research interests include nanocomputing and networks-on-chips. He is a member of IEEE.

Keivan Navi received the B.Sc. and M.Sc. degrees in computer hardware engineering from Beheshti University, Tehran, Iran, in 1987 and Sharif University of Technology, Tehran, Iran, in 1990, respectively. He also received the Ph.D. degree in computer architecture from Paris XI University, Paris, France, in 1995. He is currently a Professor in the Faculty of Electrical and Computer Engineering of Beheshti University. His research interests include VLSI design, single electron transistors (SET), carbon nanotubes, computer arithmetic, interconnection networks and quantum computing.