Design of a low power processor for trigonometric functions for hearing aids

Anders Torp

Kongens Lyngby 2008
IMM-Master Thesis-2008-33

Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone +45 45253351, Fax +45

Abstract

Some of the functionalities in the next generation of hearing aids rely on trigonometric functions. This calls for special purpose processors with dedicated hardware focusing on low power. These functions can be implemented with either the CORDIC [30] algorithm or function approximation [9]. Function approximation requires multiplication and is therefore not suitable for a low power implementation. The CORDIC algorithm, on the other hand, is extremely flexible, as it can evaluate a large portfolio of functions [31] with little hardware. However, most of the work done so far has focused on implementations suitable for high-speed applications. Little work has focused on the low power capabilities of the CORDIC algorithm.

The first part of this thesis accounts for the different CORDIC algorithms that can be used in a low power processor. Based on this work, the thesis proposes a new modified version of the CORDIC algorithm called the radix-8 D-CORDIC algorithm. The new algorithm solves some of the drawbacks of the original CORDIC algorithm, such as the dependence on a scaling factor and the high number of iterations. In the new algorithm, the scaling factor is easy to evaluate because it does not require additional logic. In addition, the required number of iterations is reduced by 67%. The algorithm is not restricted to functions in the circular coordinate system but has been extended to the hyperbolic and linear systems as well.

The algorithm has been implemented in VHDL and synthesized at a clock frequency of 50 MHz in a 1.0 V, 90 nm process. The processor uses only 5,650 gates and has a power consumption of 0.879 mW in rotation mode. This is a reduction of 32% compared with the CORDIC algorithm.

Preface

This thesis was prepared at the Department of Informatics and Mathematical Modelling, at the Technical University of Denmark, in partial fulfillment of the requirements for acquiring the M.Sc. degree in engineering.

The thesis deals with the design of a low power numerical processor that evaluates trigonometric functions and is suitable for hearing aids. The main focus has been on the analysis of different CORDIC algorithms and the presentation of a new radix-8 D-CORDIC algorithm.

The thesis was completed in the period from the 1st of September 2007 to the 31st of March 2008, with Associate Professor Alberto Nannarelli as supervisor.

Lyngby, March 2008

Anders Torp

Acknowledgements

First and foremost, I would like to thank my supervisor Alberto Nannarelli for his support and guidance during the thesis period and for helping me find an exciting and challenging project.

Also thanks to my fellow students Michael Reibel Boesen and Niels Brown Villumsen for constructive and valuable critique of the report.

A special thanks to the woman of my life, Camilla, for her patience and understanding for all the time I have spent on this thesis instead of with her, particularly in the stressful final months of the project.

Contents

Abstract
Preface
Acknowledgements
1 Introduction
  1.1 Thesis objective
  1.2 Thesis overview
2 CORDIC algorithms
  2.1 CORDIC
  2.2 Support for arccos and arcsin
  2.3 Radix-4 D-CORDIC
  2.4 Radix-8 D-CORDIC
  2.5 Scaling Free
  2.6 Summary
3 Design
  3.1 Numerical processor architecture
  3.2 CORDIC
  3.3 D-CORDIC
  3.4 Radix-4 D-CORDIC
  3.5 Radix-8 D-CORDIC
  3.6 Scaling free
  3.7 Summary
4 Optimization for low power
  4.1 Gray coding
  4.2 Toggle prevention
  4.3 Other circuit transformations
5 Results
  5.1 Results from low power optimization
6 Future work
  6.1 Increased flexibility
  6.2 Further optimization of the algorithm
  6.3 The system in perspective
7 Conclusion
A Symbols and Notations
B Radix-8 D-CORDIC
C Matlab code
D VHDL
  D.1 CORDIC
  D.2 D-CORDIC
  D.3 Radix-4 D-CORDIC
  D.4 Radix-8 D-CORDIC
  D.5 Scaling free

Chapter 1

Introduction

The change from analog to digital programmable hearing aids has given hearing aid manufacturers new ways to make their products more flexible. An analog hearing aid receives sound through a microphone, which transforms the acoustic energy into an electrical signal. This signal passes through an amplifier, which increases the signal strength (makes the sound louder), before it is sent to a transducer. The transducer converts the electrical signal back into sound, which is passed into the ear. Constant development in DSP technology has made digital hearing aids possible. In a digital hearing aid the sound is still analog, but digital programming allows greater flexibility because the sound processing is done in the digital domain. As a result, the hearing aid can offer preset programs for different situations that adapt the output sound to the environment. For instance, programs for a quiet conversation and for a noisy restaurant can make daily use a more pleasant experience for hearing impaired patients. An analog hearing aid can perform many of the same functions as a digital one, but the digital hearing aid performs them significantly better. The advantage of digital hearing aids is that complex processing can be implemented with less power and circuit area.

Some of the most common functionalities in digital hearing aids are noise reduction, compression and feedback elimination [12]:


• Noise reduction can remove some of the undesirable noise from the surroundings, which could come, for instance, from a car or background music. For hearing impaired patients this noise can be a particularly big problem. In analog hearing aids the noise problem is solved by low-frequency filtering, which unfortunately also cuts out some of the low-frequency speech [12]. Digital hearing aids, on the other hand, attempt to differentiate the noise from the speech. This is possible because noise is typically a steady, constant input, whereas speech is modulated.

• Compression is used as sound compensation for the hearing impaired. Since most hearing impaired people have a reduced dynamic hearing range, the output must be limited so that the sound remains comfortable. In older hearing aids this was solved by peak clipping, where the peaks of the sound waves are chopped off [12]. The technique is effective, but it also distorts the sound. Compression in a digital hearing aid means that softer sounds are amplified more than louder sounds. The principle is to split the input into different frequency bands and then manipulate each band individually.

• Feedback elimination is used to control acoustic feedback, which occurs when there is a loop between an audio input and an audio output. People using hearing aids often experience this annoying sound as whistling, howling or screeching. In analog devices the problem is solved by reducing the maximum system gain or the acoustic leakage from the housing¹ [16]. In a digital hearing aid, the DSP can produce a counter-phase signal that cancels the feedback signal. This allows a higher system gain, which helps profoundly impaired patients.

The features listed above require complex computations on a low power processor. From an engineering point of view, this is challenging. The processor must be small enough to fit into an ear, work at a low voltage, and run on a standard battery for at least 100 hours [16]. To keep improving hearing aids, the limited power supply must be used even more efficiently so that more complex algorithms can be implemented. These algorithms rely on elementary functions. In a device where any increase in power consumption limits further functionality, the hearing aid must balance high flexibility against low power consumption.

Elementary functions have previously been implemented with lookup tables, but this is not feasible here because the precision is limited to that of the stored values. Research in dedicated hardware focusing on low power is therefore crucial to further improve and enhance hearing aids. A better way to implement elementary functions with a focus on low power is important, as it would permit functionalities that offer more convenience for the user.

¹ The housing is the plastic shell that contains the components.

Trigonometric functions are among these elementary functions and can be used, for instance, in tone generation. They can be implemented with methods such as function approximation with lookup tables [22] or the CORDIC [30] algorithm. Function approximation with tables requires large tables and multipliers, which makes it infeasible for a low power implementation.

The CORDIC algorithm presented by Volder [30] in 1959 made it possible to evaluate trigonometric functions iteratively, using only shift and add operations [33]. This makes it very suitable for VLSI implementation. CORDIC is considered the most appropriate algorithm for trigonometric functions when no multiplier is available and area must be kept to a minimum. The algorithm has found its way into numerous products, for example calculators [1] and 3D graphics [9]. The basic algorithm evaluates sin and cos. With some modifications and extra logic it can also evaluate acos, asin, hyperbolic functions, logarithms, exponentials, square root, division and multiplication [9]. This makes the algorithm extremely useful for implementing multiple functions when little area is available.

1.1 Thesis objective

The objective of this thesis is to find the algorithm best suited for a low power numerical processor for trigonometric functions. The processor should have a good ratio between power consumption and flexibility. The target application for the processor is tone generation in digital hearing aids.

CORDIC is the most common algorithm for designing a numerical processor for trigonometric functions [1], and it is the starting point for this thesis. However, the algorithm described in 1959 is far from perfect. Its main drawbacks are low throughput and the dependence on a scaling factor, and there have been many attempts to solve them since the algorithm was developed. As a result, numerous architectures exist that each tackle these problems in their own way. Worth mentioning are the scaling free CORDIC [17], which relies on Taylor series expansion, redundant versions making double CORDIC rotations [28], and radix-4 implementations that decrease the required number of iterations. Most of this work has focused on decreasing the delay to make the CORDIC algorithm more beneficial for high-speed applications. Little research, however, has focused on algorithms appropriate for low power solutions. When designing for low power, most of the power savings are achieved at the algorithmic level [26]. This thesis therefore starts with a thorough analysis of different CORDIC algorithms to find which one should be the foundation for the final implementation. The analysis is then used to choose the final architecture, which is optimized for low power.

Function   Exact input range   Exact output range   Reduced input range
cos, sin   [−∞; ∞]             [−1; 1]              [−π; π]
acos       [−1; 1]             [0; π]               n/a
asin       [−1; 1]             [−π/2; π/2]          n/a
atan       [−∞; ∞]             ]−π/2; π/2[          [−1; 1]

Table 1.1: Input and output ranges for the five most common trigonometric functions.

The CORDIC algorithm is very flexible and can evaluate trigonometric functions, logarithms, division, square root, etc. [31]. However, some of these functions are more complex to implement, so this thesis also examines the trade-off of adding them. To fulfill the objective, the thesis is divided into three parts:

1. Analysis of different algorithms appropriate for a low power implementation of trigonometric functions. This includes analysis in Matlab to study the accuracy and behavior of the algorithms (Chapter 2).

2. VHDL implementation of the algorithms and a discussion of which algorithm to optimize further for low power (Chapter 3).

3. Low power optimization of the chosen algorithm (Chapter 4).

1.1.1 Design constraints

The numerical processor should be able to run at a clock frequency of 50 MHz in a 1.0 V, 90 nm process. A 16-bit result should be ready within 4 cycles, and a 24-bit result within 6 cycles. Multipliers are not allowed, so function approximation cannot be used.

The data representation is two's complement fixed point, where the most significant bit (MSB) is the sign bit, followed by the integer and the fractional bits.
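To make the representation concrete, the following Python sketch quantizes a real value to a two's complement word and converts it back. The word has a sign bit, `int_bits` integer bits and `frac_bits` fractional bits. The helpers `to_fixed` and `from_fixed` are hypothetical and not part of the thesis code. Round-to-nearest and saturation on overflow are assumptions for the example, and the 16-bit split used below is only illustrative.

```python
def to_fixed(value: float, int_bits: int, frac_bits: int) -> int:
    """Quantize value to a (1 + int_bits + frac_bits)-bit two's complement word.

    Returns the raw bit pattern as a non-negative int. Rounds to nearest and
    saturates on overflow (both are assumptions for this sketch).
    """
    width = 1 + int_bits + frac_bits
    scaled = round(value * (1 << frac_bits))
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    scaled = max(lo, min(hi, scaled))          # saturate to the representable range
    return scaled & ((1 << width) - 1)         # two's complement bit pattern


def from_fixed(word: int, int_bits: int, frac_bits: int) -> float:
    """Interpret a two's complement bit pattern as a real value."""
    width = 1 + int_bits + frac_bits
    if word & (1 << (width - 1)):              # MSB is the sign bit
        word -= 1 << width
    return word / (1 << frac_bits)


# A 16-bit word with one integer bit, as needed for the output of cos:
print(hex(to_fixed(-0.5, 1, 14)))              # 0xe000
print(from_fixed(to_fixed(1.0, 1, 14), 1, 14)) # 1.0
print(from_fixed(to_fixed(1.0, 0, 14), 0, 14)) # saturates just below 1.0
```

The last line shows why the output of cos needs one integer bit: without it, the value 1.0 cannot be represented exactly.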


Figure 1.1: The use of argument reduction to extend the convergence rangefrom [0;π/4] to [−π;π]. The unit circle is divided into 8 partitions.

The bit lengths of the integer and fractional parts depend on which functions are implemented. In addition, the input and output lengths might differ between functions. Table 1.1 shows the required data representation for the five trigonometric functions cos, sin, acos, asin and atan. It shows, for instance, that the output from cos is always in the range [−1; 1], so the implementation of cos requires only one integer bit.

In reality there are no restrictions on the input range for cos, sin and atan. For simplicity, however, the input ranges of these functions are reduced, as shown in the right column of table 1.1. The input range for cos and sin can be made even smaller by argument reduction, which exploits the symmetry of the unit circle illustrated in figure 1.1. It also relies on the fact that the CORDIC algorithm evaluates cos and sin in parallel (explained in detail in section 2.1). As a result, a result in partitions 2-8 (see figure 1.1) can be linked directly to an equivalent value in partition 1. In the figure, xf = − sin means that the final value of the x variable equals − sin of the corresponding angle in the first partition. For instance, cos(7π/8), located in the 4th partition, equals −sin(3π/8) = −cos(π/8), and π/8 lies in the 1st partition. The figure also shows how a positive and a negative angle differ: the only difference is that sin becomes negative. With argument reduction, a convergence range of [0; π/4] is enough to guarantee convergence in the entire unit circle.
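The argument reduction described above can be sketched as follows. The helper `reduce_and_evaluate` and the callback `core` are hypothetical names. `core(phi)` stands in for the CORDIC datapath and only ever receives angles in [0, π/4]. The octant mapping is one possible choice based on standard trigonometric identities; the exact mapping drawn in figure 1.1 may differ in detail.

```python
import math


def reduce_and_evaluate(theta, core):
    """Return (cos theta, sin theta) for theta in [-pi, pi].

    core(phi) must return (cos phi, sin phi) and is only called with
    phi in [0, pi/4]; it stands in for the CORDIC datapath.
    """
    if not -math.pi <= theta <= math.pi:
        raise ValueError("theta must lie in [-pi, pi]")
    a = abs(theta)                               # a negative angle only flips sin
    octant = min(int(a // (math.pi / 4)), 3)     # partitions 1-4 -> 0-3
    if octant == 0:                              # [0, pi/4]: use directly
        cos_a, sin_a = core(a)
    elif octant == 1:                            # [pi/4, pi/2]: cos and sin swap
        cos_p, sin_p = core(math.pi / 2 - a)
        cos_a, sin_a = sin_p, cos_p
    elif octant == 2:                            # [pi/2, 3pi/4]: x_f = -sin
        cos_p, sin_p = core(a - math.pi / 2)
        cos_a, sin_a = -sin_p, cos_p
    else:                                        # [3pi/4, pi]: x_f = -cos
        cos_p, sin_p = core(math.pi - a)
        cos_a, sin_a = -cos_p, sin_p
    if theta < 0:
        sin_a = -sin_a
    return cos_a, sin_a


def reference_core(phi):
    # Stand-in for the CORDIC core, used only to check the reduction.
    return math.cos(phi), math.sin(phi)


c, s = reduce_and_evaluate(7 * math.pi / 8, reference_core)
print(c, -math.cos(math.pi / 8))   # both are cos(7pi/8)
```

In hardware, the octant selection would correspond to inspecting the most significant bits of the angle, and the final step to swapping or negating xf and yf.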

Even though the input and output ranges are known, the internal bit width cannot be chosen at this stage. Instead, it is found by Matlab analysis in section 2.6. The following assumptions have been made about the input and output vectors:

• Positive and negative values occur equally often.

• Knowledge about the distribution of the input vector might make some low power optimizations possible. Since this information is not available, the input vector is assumed to be uniformly distributed over the entire range.

• Even though the CORDIC algorithm often evaluates two functions in parallel (for instance cos and sin), results can only be extracted one at a time.

1.2 Thesis overview

The rest of the report is structured in the following way:

Chapter 2 introduces the algorithms, starting with the basic CORDIC algorithm, followed by the D-CORDIC, the radix-4/8 D-CORDIC and finally the scaling free algorithm. Matlab analysis is used to verify and clarify the behavior of the algorithms, which form the mathematical background for the VHDL implementations.

Chapter 3 describes the most important design considerations. The implementations have been synthesized with Synopsys to obtain power, timing and area characteristics. The chapter summarizes the results and the most important observations, and ends by choosing the best solution.

Chapter 4 shows different low power techniques and their effect on the chosen algorithm. Some of the techniques used are Gray coding, clock gating and toggle prevention.

Chapter 5 presents the results obtained from synthesis and discusses the most important observations.


Chapter 6 extends the CORDIC algorithm to support hyperbolic and linear coordinates. It also discusses how to further optimize the CORDIC algorithm and gives a perspective on applications that might benefit from this thesis.

Chapter 7 contains the conclusion of this thesis.

Appendix A lists the most important symbols and notations used in this thesis.

Appendix B shows a summary of the radix-8 D-CORDIC algorithm for the circular, hyperbolic and linear coordinate systems.

Appendix C contains the Matlab files used in chapter 2 to analyze and verify the behavior of the different algorithms.

Appendix D contains the VHDL files used for synthesis.


Chapter 2

CORDIC algorithms

The COordinate Rotation DIgital Computer (also known as the CORDIC algorithm) was first described in 1959 by Volder [30] as an elegant way to evaluate trigonometric functions. It was originally developed to replace the analog navigation computer on the B-58 bomber, due to a need for higher accuracy and performance [29]. In 1971 Walther [31] extended the CORDIC algorithm to hyperbolic functions. Today the algorithm is used in many applications, such as calculators and robotics [1]. The algorithm belongs to the class of linear convergence algorithms and can be implemented using only shift and add operations, which makes it suitable for VLSI implementations.

This chapter begins with a description of the basic CORDIC algorithm presented by Volder. Since the basic CORDIC algorithm does not support acos and asin, section 2.2 presents the Double-CORDIC algorithm. Unfortunately, supporting these two functions comes at a high cost in complexity, which must be taken into account when choosing the final algorithm. Another problem with the CORDIC and Double-CORDIC algorithms is the high number of iterations, since they resolve only one bit per iteration. A higher radix is an obvious solution and has been proposed in [3]. However, it complicates the calculation of the scaling factor and requires a final multiplication. In sections 2.3 and 2.4 this thesis presents two new unified algorithms that extend the Double-CORDIC algorithm to radix 4 and radix 8. They take advantage of the simplified scaling factor introduced by the D-CORDIC algorithm. Section 2.5 presents a scaling free CORDIC algorithm, which does not depend on a scaling factor. It is very suitable for a low power implementation because it can skip unnecessary iterations; a Matlab analysis shows that 50% of the iterations are skipped on average.

Figure 2.1: General rotation of a vector (a), rotation of vector v0 = [1 0] leading to the evaluation of cos θ and sin θ (b).

The algorithms presented in this chapter form the mathematical background for the VHDL implementations described in chapter 3. The most important observations are summarized in section 2.6.

2.1 CORDIC

The CORDIC algorithm is based on vector rotation. It permits the evaluation of trigonometric functions, hyperbolic functions, multiplication and division. With small modifications, the square root, the exponential and the logarithm can also be derived. For simplicity, this chapter focuses on the trigonometric functions cos, sin, acos, asin and atan¹. Figure 2.1(a) illustrates a rotation of a vector v, where v = [x y] is the starting vector and v′ = [x′ y′] is the same vector rotated by an angle θ. Mathematically, a vector rotation can be expressed as a Givens rotation [30]:

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\tag{2.1}
$$

which is equivalent to

$$
\begin{aligned}
x' &= x\cos\theta - y\sin\theta \\
y' &= y\cos\theta + x\sin\theta
\end{aligned}
\tag{2.2}
$$

¹ This means that the algorithm is restricted to circular coordinates. See chapter 6 for a discussion of the hyperbolic and linear coordinates.


i   αi (degrees)   αi (radians)
0   45°            0.78540
1   26.565°        0.46365
2   14.036°        0.24498
3   7.1250°        0.12435

Table 2.1: Precomputed values of $\alpha_i = \operatorname{atan}(2^{-i})$ stored in a lookup table, with the corresponding values in degrees.
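The entries of table 2.1 can be reproduced with a few lines of Python. The sketch also shows a fixed-point version of the lookup table with n entries of `frac_bits` fractional bits. The function names and the rounding-to-nearest choice are assumptions for illustration.

```python
import math


def atan_table(n):
    """alpha_i = atan(2**-i) in radians, for i = 0 .. n-1."""
    return [math.atan(2.0 ** -i) for i in range(n)]


def atan_table_fixed(n, frac_bits):
    """The same table as integers with frac_bits fractional bits (round to nearest)."""
    return [round(a * (1 << frac_bits)) for a in atan_table(n)]


for i, a in enumerate(atan_table(4)):
    print(f"{i}  {math.degrees(a):7.4f} deg  {a:.5f} rad")
# 0  45.0000 deg  0.78540 rad
# 1  26.5651 deg  0.46365 rad
# 2  14.0362 deg  0.24498 rad
# 3   7.1250 deg  0.12435 rad
```

A table for n iterations and word length W holds n·W bits, matching the size estimate given in the text.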

If the starting vector is v0 = [1 0], as in figure 2.1(b), the rotation takes place on the unit circle. The final values of x′ and y′ then correspond to cos θ and sin θ after rotating v0 by the angle θ.

Equation 2.2 describes a perfect rotation, but it cannot be implemented directly. It requires cos θ and sin θ (the very values we want to compute), four multiplications and two additions. At this point nothing is simplified. However, suppose the complete rotation angle θ is partitioned into smaller rotations αi, so that $\theta = \sum_{i=0}^{\infty} \pm\alpha_i$. Equation 2.2 can then be rewritten as an iterative algorithm using $\tan x = \sin x / \cos x$:

$$
\begin{aligned}
x_{i+1} &= \cos\alpha_i\,(x_i - y_i\tan\alpha_i) \\
y_{i+1} &= \cos\alpha_i\,(y_i + x_i\tan\alpha_i)
\end{aligned}
\tag{2.3}
$$

Because a rotation can be either positive or negative, σi is introduced as the rotation direction. For a radix-2 implementation σi ∈ {−1, 1}. If the rotation angles are restricted to $\alpha_i = \operatorname{atan}(2^{-i})$, the multiplication inside the parentheses reduces to a simple shift-and-add operation:

$$
\begin{aligned}
x_{i+1} &= K_i\,(x_i - \sigma_i y_i 2^{-i}) \\
y_{i+1} &= K_i\,(y_i + \sigma_i x_i 2^{-i})
\end{aligned}
\tag{2.4}
$$

Here each iteration is multiplied by $K_i = \cos(\operatorname{atan}(2^{-i}))$, which is called the scaling factor. Removing the scaling factor yields an iterative shift-and-add algorithm for vector rotation, which is easily implemented in hardware. The total scaling factor is constant as long as the number of iterations is fixed. It can therefore be compensated for in other parts of the circuit, for example in an existing multiplier or by extra logic after the CORDIC iterations. Even so, the scaling factor remains one of the major drawbacks of the CORDIC algorithm, particularly when optimizations or extensions of the algorithm are considered. Section 2.1.2 explains this further.

Figure 2.2: First 3 rotations for rotation mode (a) and vectoring mode (b).

A third iterative component is needed to keep track of the rotated angle. The rotations are accumulated using a lookup table holding the values $\alpha_i = \operatorname{atan}(2^{-i})$. Table 2.1 lists the values for the first four rotations, and it shows that each rotation αi+1 is smaller than the previous rotation αi. The required size of the lookup table is the total number of iterations n multiplied by the internal word length W. The accumulation of the residual angle requires only one extra adder:

$$z_{i+1} = z_i - \sigma_i \alpha_i \tag{2.5}$$

In summary, one CORDIC iteration requires two shifters and two adders for equation 2.4, plus a lookup table and one adder for equation 2.5. Equation 2.6 shows a complete CORDIC iteration:

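$$
\begin{aligned}
x_{i+1} &= x_i - \sigma_i y_i 2^{-i} \\
y_{i+1} &= y_i + \sigma_i x_i 2^{-i} \\
z_{i+1} &= z_i - \sigma_i \alpha_i
\end{aligned}
\tag{2.6}
$$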
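In integer fixed point, one iteration of equation 2.6 needs only the hardware listed above. In the following illustrative sketch (the function name and argument order are assumptions), multiplication by $2^{-i}$ becomes an arithmetic right shift. σi only selects between addition and subtraction, and `alpha_i` is the lookup-table entry for iteration i.

```python
def cordic_step(x, y, z, i, sigma, alpha_i):
    """One radix-2 CORDIC iteration (equation 2.6) on integer fixed-point values.

    x, y, z and alpha_i share the same fixed-point scaling. Python's >> on
    negative integers rounds towards minus infinity, like an arithmetic right
    shift of a two's complement word.
    """
    x_shift = y >> i                  # y_i * 2**-i  (shifter 1)
    y_shift = x >> i                  # x_i * 2**-i  (shifter 2)
    if sigma > 0:                     # sigma_i = +1: counterclockwise rotation
        return x - x_shift, y + y_shift, z - alpha_i
    return x + x_shift, y - y_shift, z + alpha_i   # sigma_i = -1
```

For example, with 14 fractional bits, `cordic_step(16384, 0, 0, 0, 1, 12868)` rotates [1 0] by 45° without scaling and returns `(16384, 16384, -12868)`.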

The CORDIC algorithm can run in either rotation mode or vectoring mode, which yield different outputs. Rotation mode is the most common, since it evaluates cos and sin. In rotation mode the initial vector is rotated by an angle z0 = θ, and the objective is to find the coordinates [x′ y′], which correspond to cos θ and sin θ. At each iteration, the recurrence zi+1 keeps track of the residual angle (the rotation still missing). Because the algorithm performs a rotation in every iteration, the residual angle can become negative. The sign of zi therefore determines σi, i.e. whether the next rotation is counterclockwise (σi = 1) or clockwise (σi = −1). The algorithm ends when zi = 0, but a hardware implementation simply uses a constant number of iterations, as explained in section 2.1.1. Figure 2.2(a) shows an example of the first 3 rotations. The first rotation v1 is always positive; in this example it is followed by a negative rotation (v2) and a positive rotation (v3). The final values of the CORDIC algorithm in rotation mode are

$$
\begin{aligned}
x_f &= K^{-1}(x_0\cos\theta - y_0\sin\theta) \\
y_f &= K^{-1}(x_0\sin\theta + y_0\cos\theta) \\
z_f &= 0
\end{aligned}
\tag{2.7}
$$


With a starting point of v0 = [1 0], the requested values of cos θ and sin θ arecomputed after compensation for the scaling factor.
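Rotation mode can be modelled in a few lines of floating-point Python, in the spirit of the Matlab analysis used in this chapter (this is not the Matlab code of appendix C). The sketch runs the unscaled iterations of equation 2.6 from v0 = [1 0]. It compensates for the scaling factor once at the end, by multiplying by $K = \prod_i \cos(\operatorname{atan}(2^{-i}))$. The iteration count n = 25 follows section 2.1.1; the function name is an assumption.

```python
import math


def cordic_rotate(theta, n=25):
    """Rotation mode: return (cos theta, sin theta) for |theta| below about 1.743 rad.

    Runs the unscaled iterations of equation 2.6 from v0 = [1, 0] and
    compensates for the scaling factor K once at the end.
    """
    x, y, z = 1.0, 0.0, theta
    K = 1.0
    for i in range(n):
        sigma = 1 if z >= 0 else -1          # rotation direction from the sign of z_i
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)    # residual angle, equation 2.5
        K *= math.cos(math.atan(2.0 ** -i))  # accumulate K = prod cos(alpha_i)
    return K * x, K * y


print(cordic_rotate(math.pi / 6), (math.cos(math.pi / 6), math.sin(math.pi / 6)))
```

The remaining error is bounded by the last rotation angle, so 25 iterations give results within roughly $2^{-24}$ of the exact values.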

In vectoring mode the initial vector is rotated towards the x-axis (yf = 0). Vectoring mode can be seen as the inverse of rotation mode: the y variable determines the rotation direction, and σi is now the opposite of the sign of yi. Figure 2.2(b), which shows the first 3 rotations, illustrates why. When yi is negative, the vector lies below the x-axis and a counterclockwise rotation is needed, so σi must be positive. xf is now proportional to the magnitude of the vector, and the total angle rotated after iteration i is accumulated in zi:

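$$
\begin{aligned}
x_f &= K^{-1}\sqrt{x_0^2 + y_0^2} \\
y_f &= 0 \\
z_f &= z_0 + \operatorname{atan}\!\left(\frac{y_0}{x_0}\right)
\end{aligned}
\tag{2.8}
$$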

If the initial values are x0 = 1 and z0 = 0, atan(y0) is accumulated in zf. Equation 2.8 also shows that zf needs no scaling factor compensation.
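A matching floating-point sketch of vectoring mode drives y towards zero with σi = −sign(yi) and returns xf and zf from equation 2.8. The function name and n = 25 are assumptions. With x0 = 1 and z0 = 0, zf approximates atan(y0), while xf is still affected by the scaling factor.

```python
import math


def cordic_vectoring(x0, y0, z0=0.0, n=25):
    """Vectoring mode: rotate [x0, y0] towards the x-axis.

    Returns (x_f, z_f). x_f is not compensated for the scaling factor;
    z_f = z0 + atan(y0 / x0) needs no compensation.
    """
    x, y, z = x0, y0, z0
    for i in range(n):
        sigma = -1 if y >= 0 else 1            # sigma_i = -sign(y_i)
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)      # z_{i+1} = z_i - sigma_i * alpha_i
    return x, z


_, z_f = cordic_vectoring(1.0, 0.5)
print(z_f, math.atan(0.5))
```

Multiplying xf by K recovers the magnitude $\sqrt{x_0^2 + y_0^2}$.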

2.1.1 Convergence range and precision

The convergence range depends on the number of iterations. Assuming an infinite number of iterations, the maximum angle is the sum of all rotation angles [9]:

$$\theta_{max} = \sum_{i=0}^{\infty} \operatorname{atan}(2^{-i}) \approx 1.743 \text{ rad} \approx 99.9^\circ \tag{2.9}$$
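The value in equation 2.9 is easy to check numerically. The hypothetical helper below sums the first n rotation angles, which gives the convergence range for n iterations.

```python
import math


def theta_max(n):
    """Sum of the first n rotation angles atan(2**-i), i = 0 .. n-1."""
    return sum(math.atan(2.0 ** -i) for i in range(n))


print(round(theta_max(25), 4), round(math.degrees(theta_max(25)), 1))  # 1.7433 99.9
```

Since π/2 ≈ 1.571 is below this limit, the reduced range [0, π/4] from chapter 1 is covered with a wide margin.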

Therefore, convergence can only be guaranteed for |θ| < θmax. In addition, the set of rotation angles must allow the residual angle to be reduced to within the last rotation. This is expressed by equation 2.10 [7]:

$$\alpha_i - \sum_{j=i+1}^{n-1} \alpha_j \le \alpha_{n-1} \tag{2.10}$$

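The worst case occurs when zi = 0. Iteration i still performs a rotation of αi, so the remaining rotations must sum to enough to bring the residual angle back to zero, or within the desired accuracy. Equation 2.10 is equivalent to

$$
\begin{aligned}
\operatorname{atan}(2^{-i}) - \sum_{j=i+1}^{n-1} \operatorname{atan}(2^{-j}) &\le \operatorname{atan}(2^{-(n-1)}) \\
\operatorname{atan}(2^{-i}) &\le \operatorname{atan}(2^{-(n-1)}) + \sum_{j=i+1}^{n-1} \operatorname{atan}(2^{-j})
\end{aligned}
\tag{2.11}
$$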

and since n can always be made large enough to satisfy equation 2.11, the algorithm converges as long as the rotation angle is smaller than θmax.
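Equation 2.10 can be checked mechanically for any candidate set of rotation angles. The sketch below uses a hypothetical helper and assumes a small relative tolerance for floating-point rounding. It confirms that the angles atan(2^−i) satisfy the condition, while a sequence that shrinks by a factor of 3 per iteration does not.

```python
import math


def satisfies_convergence(alpha, rel_tol=1e-6):
    """Check equation 2.10: alpha_i - sum_{j=i+1}^{n-1} alpha_j <= alpha_{n-1} for all i."""
    n = len(alpha)
    bound = alpha[n - 1] * (1 + rel_tol)       # tolerance for rounding errors
    return all(alpha[i] - sum(alpha[i + 1:]) <= bound for i in range(n))


atan_angles = [math.atan(2.0 ** -i) for i in range(25)]
print(satisfies_convergence(atan_angles))                      # True
print(satisfies_convergence([3.0 ** -i for i in range(10)]))   # False
```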

Two types of errors occur in the CORDIC algorithm [13]. The first is the approximation error, caused by the finite number of iterations. The second is the truncation error, caused by the finite precision of any datapath. Because the algorithm uses a fixed number of iterations, the rotation angle θ is only approximated, with an error

$$\delta = \theta - \sum_{i=0}^{n-1} \sigma_i \operatorname{atan}(2^{-i}) \tag{2.12}$$

where δ is the approximation error. The accuracy of the output is therefore limited by the magnitude of the last rotation, $\operatorname{atan}(2^{1-n})$, which is bounded by

$$2^{-n} < \operatorname{atan}(2^{1-n}) < 2^{1-n} \tag{2.13}$$

Consequently, n-bit accuracy requires n + 1 iterations. Moreover, an internal word length of n bits is not enough to guarantee an n-bit result. It has been proven that a precision of b bits requires an internal datapath of (b + log b + 2) bits [13]. Under the design constraints of this thesis, a 24-bit result would require a word length of 29 bits and 25 iterations.

2.1.2 Scaling factor

At first glance, the scaling factor Ki introduced in equation 2.4 does not seem to require much extra effort. It can easily be compensated for by a multiplication at the end of the CORDIC iterations. As equation 2.14 shows, it is constant and does not depend on the rotation angle or direction:

$$K = \prod_{i=0}^{\infty} \cos(\operatorname{atan}(2^{-i})) = \prod_{i=0}^{\infty} \frac{1}{\sqrt{1 + \sigma_i^2\, 2^{-2i}}} \tag{2.14}$$
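Numerically, the scaling factor converges quickly. The sketch below evaluates equation 2.14 for radix 2, where σi² = 1 so the rotation directions drop out. The well-known limits are K ≈ 0.60725 and 1/K ≈ 1.64676. The function name is an assumption.

```python
import math


def scaling_factor(n):
    """K for n radix-2 iterations: prod_{i<n} 1/sqrt(1 + 2**(-2i)) (sigma_i**2 = 1)."""
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k


for n in (1, 4, 8, 25):
    print(n, f"{scaling_factor(n):.8f}")
```

After a handful of iterations the product barely changes. This is why K can be treated as a single constant when the number of iterations is fixed.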

However, the scaling factor is only constant if exactly the same iterations are evaluated. This is not a problem for the CORDIC algorithm itself. It becomes one, however, if some rotations are skipped to reduce power consumption (because the required accuracy is reached before i = n). The scaling factor then becomes variable and must either be evaluated together with the CORDIC iterations or stored in a lookup table. Given the number of possible scaling factors, the lookup table would quickly grow large, so it is not a feasible solution². Evaluating a variable scaling factor is complicated and is estimated to require 1/3 of the effort of the whole CORDIC algorithm [7].

The scaling factor can, however, be compensated for without much extra logic. Since it is a constant multiplication, the compensation can be performed before the CORDIC rotations begin. Instead of initializing x0 = 1 and y0 = 0, the processor initializes x0 = 1 · K = K and y0 = 0 · K = 0. This would not work if the numerical processor had to rotate arbitrary vectors rather than only [1 0]. But the purpose of this thesis is the evaluation of trigonometric functions, not vector rotation in general, so this is not an obstacle.
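The sketch below combines the pieces of this section into an integer model of rotation mode. The loop uses only shifts, additions and table lookups, and the scaling factor is compensated up front with x0 = K, as described above. K and αi are computed in floating point purely for convenience; in hardware they would be precomputed constants. The default of 27 fractional bits assumes a 29-bit word split into a sign bit, one integer bit and 27 fractional bits. That split and the function name are assumptions.

```python
import math


def cordic_cos_sin_fixed(theta, frac_bits=27, n=25):
    """Rotation mode in integer fixed point with the scaling factor folded into x0."""
    one = 1 << frac_bits
    # Precomputed constants (would be hard-wired in hardware):
    K = math.prod(math.cos(math.atan(2.0 ** -i)) for i in range(n))
    alpha = [round(math.atan(2.0 ** -i) * one) for i in range(n)]
    x, y, z = round(K * one), 0, round(theta * one)   # x0 = K, y0 = 0, z0 = theta
    for i in range(n):
        if z >= 0:   # sigma_i = +1
            x, y, z = x - (y >> i), y + (x >> i), z - alpha[i]
        else:        # sigma_i = -1
            x, y, z = x + (y >> i), y - (x >> i), z + alpha[i]
    return x / one, y / one   # no multiplication needed after the iterations


print(cordic_cos_sin_fixed(math.pi / 6))
```

The truncation of the shifted operands is the source of the truncation error discussed in section 2.1.1. It is why the datapath needs more bits than the required output precision.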

The numerical processor should resolve 4 bits per cycle, so four CORDIC iterations are needed per cycle. This is expensive, and one solution could be a higher radix CORDIC algorithm. However, as explained at the beginning of this chapter, the scaling factor becomes a problem in variations of the CORDIC algorithm, particularly in higher radix implementations. One important reason why the scaling factor in equation 2.14 is constant is σi. For a radix-2 implementation σi ∈ {−1, 1}, so σi² = 1 and the scaling factor is constant. For a higher radix this no longer holds, as equation 2.15 shows:

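$$
\begin{aligned}
\sigma_i^{\text{radix-2}} &\in \{-1, 1\} \\
\sigma_i^{\text{radix-4}} &\in \{-2, -1, 0, 1, 2\} \\
\sigma_i^{\text{radix-8}} &\in \{-4, -3, -2, -1, 0, 1, 2, 3, 4\}
\end{aligned}
\tag{2.15}
$$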

Consequently, compensating for the scaling factor is no longer enough; it must also be evaluated in parallel with the CORDIC iterations. Since equation 2.14 involves both division and square root, this becomes expensive and complicated. Radix-4 CORDIC algorithms have been proposed for high-speed processors because they halve the number of required iterations. The scaling factor can be evaluated in parallel using a lookup table, shifters and adders, as presented in [3]. However, this solution still requires a multiplication after the CORDIC iterations, so it is not considered suitable for a low power implementation. A combined radix-2/radix-4 CORDIC algorithm [2][15] has a constant scaling factor, because the scaling factor only depends on the first n/2 iterations [15]. However, this approach reduces the number of iterations by only 25% compared to the original CORDIC algorithm.

² The number of possible scaling factors for 25 iterations is 2^25 = 33,554,432.

Figure 2.3: Detection of the rotation direction: (a) incorrect comparison xi ≥ t due to the scaling factor, and (b) correct comparison, since ti is corrected by scaling factor compensation.

The Double-CORDIC algorithm presented in the next section has the benefit of a simpler scaling factor, and that property is enhanced for a unified radix-4/8 Double-CORDIC algorithm in sections 2.3 and 2.4.
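To see why the radix matters, the following sketch counts how many different values the scaling factor can take over all digit sequences of length four. It assumes that equation 2.14 has the form K = ∏ 1/√(1 + σ_i² r^(−2i)) for radix r. The text only states that σ_i appears squared. The helper names `scaling_factor` and `distinct_factors` are hypothetical.

```python
import math
from itertools import product


def scaling_factor(sigmas, radix: int) -> float:
    """K = prod 1/sqrt(1 + sigma_i^2 * radix^(-2i)), the assumed form of eq. 2.14."""
    k = 1.0
    for i, s in enumerate(sigmas):
        k /= math.sqrt(1.0 + (s * float(radix) ** -i) ** 2)
    return k


def distinct_factors(digit_set, radix: int, n: int = 4) -> int:
    """Count the distinct K values reachable over all digit sequences of length n."""
    return len({round(scaling_factor(seq, radix), 12)
                for seq in product(digit_set, repeat=n)})


radix2 = distinct_factors([-1, 1], 2)
radix4 = distinct_factors([-2, -1, 0, 1, 2], 4)
print(radix2, radix4)
```

For radix 2 every sequence gives the same K. The radix-4 digit set yields many different values, so K would have to be tracked alongside the iterations.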

2.2 Support for arccos and arcsin

To support evaluation of the inverse trigonometric functions acos and asin, the CORDIC algorithm has to be extended, which unfortunately increases the complexity significantly. Two different approaches exist: one based on Double-CORDIC rotations [19], and another using the basic CORDIC module with extra computation for the rotation direction [14]. The latter will not be described, as it requires a multiplier. For simplicity, the following sections only focus on the evaluation of acos, since the relation

$$\operatorname{asin}(t) = \frac{\pi}{2} - \operatorname{acos}(t) \qquad (2.16)$$

means that asin can be obtained with only one subtraction [25]. Even though the evaluation of acos could be considered the inverse of the evaluation of cos, the algorithm becomes much more complicated. The reason for the extra complexity is the presence of the scaling factor. Before looking at the algorithm, the main problem with the computation of acos is introduced.

Recall from section 2.1 that in rotation mode the next rotation direction σ_i was found from the sign of z_i. Such a simple test is not possible for acos, as illustrated in figure 2.3(a). After the rotation z_i, the question is how the algorithm knows whether the next rotation should be positive or negative. Intuitively, a comparison could be made between t and x_i, where t is the input value and x_i is the first coordinate of the vector after iteration i. In the example in figure 2.3(a) this would result in a positive rotation, since x_i > t. The problem is that the scaling factor is present in equation 2.4. The actual value after the rotation is x_i = K_i cos z_i, so the comparison x_i ≥ t is incorrect, since an exact value cannot be compared with a value affected by the scaling factor. Unfortunately, the problem with the scaling factor is not as simple to solve as in section 2.1.2. There, initializing x_0 = K compensates for the total scaling before the first iteration. Here that would be wrong, since the scaling factor is distributed among all the iterations. It is therefore not possible to compensate for the total scaling factor in one single step. Instead, the compensation has to be done after each iteration. In other words, in rotation mode the total scaling was compensated for in one step for the variables x and y. This is only possible because neither of these values is used to find σ_i. For the evaluation of acos this is different, because x_j can only be compensated by the scaling factor of iterations 0...j. Compensating for the scaling factor in each iteration of the CORDIC algorithm is not a feasible solution, and that is the motivation for using Double-CORDIC rotations instead.

2.2.1 Double CORDIC

The Double-CORDIC (D-CORDIC) algorithm performs a double rotation in each iteration. Unfortunately, this does not reduce the number of iterations. Instead, each rotation is twice as large. The algorithm has the advantage of a much simpler scaling factor, but the trade-off is a more complex iterative algorithm. With D-CORDIC rotations, the scaling factor from equation 2.14 can be replaced by the far simpler scaling factor in equation 2.17 [19].

$$K_i = \left(\frac{1}{\sqrt{1 + 2^{-2i}}}\right)^2 = \frac{1}{1 + 2^{-2i}} \qquad (2.17)$$

This can be implemented with only a shift and an addition. If t is the input argument, equation 2.17 can be used to evaluate t_i = t/K_i, the same argument after compensation for the scaling factor in iteration i. So instead of compensating the rotation vector (x and y), the input argument t is scaled in each iteration. The comparison x_i ≥ t_i is now valid, as illustrated in figure 2.3(b), where t_{i+1} = t_i/K_i = t_i(1 + 2^{−2i}). The rotation direction can therefore be found by a simple comparison. As mentioned, the price to pay is an increase in complexity of the iterative algorithm, which is found via matrix multiplication as shown in equation 2.18:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -\sigma_i 2^{-i} \\ \sigma_i 2^{-i} & 1 \end{bmatrix}^2 \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 - 2^{-2i} & -\sigma_i 2^{-i+1} \\ \sigma_i 2^{-i+1} & 1 - 2^{-2i} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (2.18)$$

and the basic CORDIC iteration in equation 2.6 is now rewritten to

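$$\begin{aligned}
x_{i+1} &= (1 - 2^{-2i})\,x_i - \sigma_i y_i 2^{-i+1} \\
y_{i+1} &= (1 - 2^{-2i})\,y_i + \sigma_i x_i 2^{-i+1} \\
z_{i+1} &= z_i - 2\sigma_i \alpha_i \\
t_{i+1} &= t_i + t_i 2^{-2i}
\end{aligned} \qquad (2.19)$$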

The variable t_i is only used for the evaluation of acos(t) [11], as the 'compensated' input argument. In terms of complexity, the D-CORDIC algorithm requires 6 adders, 5 shifters and a lookup table, compared to CORDIC's 3 adders, 2 shifters and a lookup table. This must be taken into consideration when choosing the final architecture. When evaluating acos(t), the result is accumulated in z_f, t_f ≈ x_f and y_f = K²√(1 − t²) [19].

The D-CORDIC algorithm works in exactly the same way as the CORDIC algorithm in rotation and vectoring mode. The only difference is that it can also operate in an extra mode for the evaluation of acos(t), which will be denoted invert-rotation mode in the rest of the report. The convergence range of the D-CORDIC algorithm is twice that of the CORDIC algorithm. However, this property is not needed, since the convergence range of the CORDIC algorithm is already sufficient. The D-CORDIC algorithm requires 26 iterations for a 24-bit result. This follows directly from the z variable in equation 2.19, where a double rotation is achieved by a multiplication by 2. We know from the CORDIC algorithm that 25 iterations are enough. Since each iteration rotates twice as far in the D-CORDIC algorithm, an extra iteration is required to guarantee the same accuracy³.

³ The precision is controlled by the magnitude of the last rotation.

2.3 Radix-4 D-CORDIC

Recall from section 2.1.2 that a radix-4 version of the CORDIC algorithm has the disadvantage of a variable scaling factor. For this reason it is considered inappropriate for a low power implementation, because it requires a multiplication. Additionally, recall from section 2.2.1 that the D-CORDIC algorithm has a simpler scaling factor than the basic CORDIC algorithm. With this in mind, the following section combines those two variations of the CORDIC algorithm into a unified radix-4 D-CORDIC algorithm. The motivation for combining the two algorithms was that the result would also support the evaluation of acos and asin. However, as described in section 2.3.3, this is not possible. The reason for still considering this algorithm is that it reduces the required number of iterations to only b/2 + 1 for the evaluation of sin, cos and atan.

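A radix-4 implementation of the CORDIC algorithm is shown in equation 2.20 [3].

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -\sigma_i 4^{-i} \\ \sigma_i 4^{-i} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (2.20)$$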

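Here σ_i ∈ {−2, −1, 0, 1, 2} is provided by a selection function. When σ_i = 0 the rotation is skipped, and σ_i ∈ {−2, 2} means that two rotations are performed in one single step. Using matrix multiplication, the radix-4 algorithm can be extended to double rotations like equation 2.18:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -\sigma_i 4^{-i} \\ \sigma_i 4^{-i} & 1 \end{bmatrix}^2 \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 - \sigma_i^2 4^{-2i} & -2\sigma_i 4^{-i} \\ 2\sigma_i 4^{-i} & 1 - \sigma_i^2 4^{-2i} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (2.21)$$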

Again, the result is an iterative algorithm that can be implemented with only shifters and adders. One complete iteration of the radix-4 D-CORDIC is shown below.

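$$\begin{aligned}
x_{i+1} &= (1 - \sigma_i^2 4^{-2i})\,x_i - 2\sigma_i y_i 4^{-i} \\
y_{i+1} &= (1 - \sigma_i^2 4^{-2i})\,y_i + 2\sigma_i x_i 4^{-i} \\
z_{i+1} &= z_i - 2\alpha_i
\end{aligned} \qquad (2.22)$$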

In terms of complexity, it is slightly more complex than the D-CORDIC, but the number of iterations is reduced by 50 percent⁴. The size of the lookup table is identical to the one used for the D-CORDIC algorithm, since α_i = atan(σ_i 4^{−i}). The extra entries needed for the different values of σ_i are offset by the 50% reduction in iterations⁵. The next three sections explain the required modifications to equation 2.22 to support rotation and vectoring mode, and why invert-rotation mode is not supported.

⁴ This follows directly from the last rotation: atan(4^{−i}).
⁵ Since atan(−σ_i) = −atan(σ_i) and atan(0) = 0.

2.3.1 Rotation mode

σ_i   L[σ]    U[σ]    Interval
 2     2.67    –       3 ≤ w_i
 1     0.67    3.3     1 ≤ w_i < 3
 0    −1.3     1.3    −1 ≤ w_i < 1
−1    −3.3    −0.67   −3 ≤ w_i < −1
−2     –      −2.67        w_i < −3

Table 2.2: Selection interval for σ_i in rotation mode for the radix-4 algorithm.

The CORDIC algorithm has already been extended to radix-4 in rotation mode, as proposed in [21]. In this section the same principles are used for the radix-4 D-CORDIC algorithm. In rotation mode, z_i accumulates the residual angle. This angle determines the direction of the rotation and, for the radix-4 D-CORDIC algorithm, also its magnitude, which can be 2, 1 or 0. To simplify the selection function, a change of variable for z_i is proposed [21], which makes the selection of σ_i similar to the selection of q in the selection function for digit-recurrence division [9]. The selection variable w_i = 4^i z_i and the lookup table A_i[σ_i] = 4^i atan(σ_i 4^{−i}) are introduced. The recurrence for z_i can then be expressed in terms of w_i as

$$w_{i+1} = 4\,\big(w_i - 2A_i[\sigma_i]\big) \qquad (2.23)$$

A selection function has to satisfy two conditions: containment and continuity. Containment guarantees that all residuals are bounded and determines the selection interval. Continuity means that for any value of w_i there must exist a valid choice of σ_i. These conditions will now be explained. The selection intervals are chosen such that

$$L_i[\sigma] = 2A_i[\sigma] - \tfrac{4}{3}A_i[1], \qquad U_i[\sigma] = 2A_i[\sigma] + \tfrac{4}{3}A_i[1], \qquad L_i[\sigma] \le w_i \le U_i[\sigma] \qquad (2.24)$$

To simplify the selection function, it should be independent of the index i, so that the intervals used to select σ_i are identical in every iteration. The selection of σ_i is then given by the criterion

$$L[\sigma] \le w_i \le U[\sigma] \qquad (2.25)$$

where L[σ] = max_i L_i[σ] and U[σ] = min_i U_i[σ]. If, as i → ∞, L_∞[σ+1] < U_1[σ] and L_1[−σ] < U_∞[−σ−1], there is an overlapping area. The selection intervals are then independent of i and the continuity condition is satisfied. Inserting these definitions in equation 2.25 yields the intervals for the selection function shown in table 2.2. Since there is an overlap, the most suitable intervals are chosen. An important property of the selected intervals is that they only depend on bits 2^1 and 2^0 (the LSBs of the integer part) and the sign bit. The intervals are suitable for redundant arithmetic as in [3], but redundant arithmetic will not be used in this implementation, since reducing power is more important than speed. The maximum required number of iterations is 13 for a 24-bit result. In addition, Matlab simulations show that on average 3 iterations are skipped.


Figure 2.4: Difference between the approximation and the exact value of the scaling factor compensation.

2.3.1.1 Scaling factor

The scaling factor for the radix-4 D-CORDIC is simpler than for the normal radix-4 algorithm, since the square root is removed from the equation. It can now be written as

$$K_i = \frac{1}{1 + \sigma_i^2 4^{-2i}} \qquad (2.26)$$

The division still makes the variable scaling factor unfeasible to implement directly as written in equation 2.26, but for i > 3 the scaling factor can be approximated as K_i ≈ 1 − σ_i² 4^{−2i}. The difference between the approximation and the exact scaling factor is shown in figure 2.4, and it can be seen that for i > 3 the difference becomes very small. This means that for iterations with i > 3 the scaling factor can be compensated for within the normal radix-4 D-CORDIC rotations. Additionally, the scaling is only affected by the first n/2 iterations⁶, which makes it simple to implement and not a problem for the radix-4 D-CORDIC algorithm. Compensation for the first 4 iterations is achieved with the same approach as for the normal CORDIC algorithm, by initializing x_0 to the compensated scaling factor. The only difference is that for this implementation the initialization can take 3^4 = 81 values, since these are the possible outcomes of the first four iterations. These values can be stored in a lookup table and selected by computing the first 4 iterations of w_i before starting the rotations of x_i and y_i. The concept is shown below:

$$K = \underbrace{K_0 \cdot K_1 \cdot K_2 \cdot K_3}_{\text{initialization}} \cdot \underbrace{K_4 \cdot K_5 \cdots K_{n/2}}_{\text{approximation}} \qquad (2.27)$$

where the first 4 scaling factors are handled by initialization and the remaining ones by internal approximation.

⁶ For larger i the shift 4^{−2i} = 2^{−4i} exceeds the W-bit word length, so the term has no effect.
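The two-pass structure described above can be modelled in floating point. This Python sketch rests on several assumptions and is not the author's design. All digits σ_i are computed from the w_i recurrence first, whereas the hardware only needs the first four before rotating. The 81-entry initialization table is replaced by computing K_0 K_1 K_2 K_3 directly. The residual is updated as w_{i+1} = 4(w_i − 2A_i[σ_i]), which follows from z_{i+1} = z_i − 2α_i and w_i = 4^i z_i. Finally, the approximate factors 1 − σ_i² 4^(−2i) for i > 3 are applied as an explicit multiplication, which hardware would fold into the shift-and-add datapath. The function names are hypothetical.

```python
import math


def select_sigma(w: float) -> int:
    """Selection intervals of Table 2.2 (boundaries at the odd integers)."""
    if w >= 3:
        return 2
    if w >= 1:
        return 1
    if w >= -1:
        return 0
    if w >= -3:
        return -1
    return -2


def radix4_dcordic_rotate(z0: float, n: int = 13) -> tuple[float, float]:
    # Pass 1: digits depend only on w_i = 4^i z_i, so they can be found first.
    sigmas, w = [], z0
    for i in range(n):
        s = select_sigma(w)
        sigmas.append(s)
        a_i = 4.0 ** i * math.atan(s * 4.0 ** -i)  # A_i[sigma]
        w = 4.0 * (w - 2.0 * a_i)  # from z_{i+1} = z_i - 2 alpha_i
    # Initialization: exact K_0..K_3 (stands in for the 81-entry lookup table).
    x, y = 1.0, 0.0
    for i, s in enumerate(sigmas[:4]):
        x /= 1.0 + s * s * 4.0 ** (-2 * i)
    # Pass 2: double rotations of eq. 2.22.
    for i, s in enumerate(sigmas):
        c = 1.0 - s * s * 4.0 ** (-2 * i)
        d = 2.0 * s * 4.0 ** -i
        x, y = c * x - d * y, c * y + d * x
        if i >= 4:
            # K_i approximated as 1 - sigma^2 4^-2i for i > 3
            x, y = c * x, c * y
    return x, y  # approximately (cos z0, sin z0)
```

Splitting the digit selection from the rotations mirrors the argument in the text. In rotation mode σ_i never depends on x or y, so the scaling factor can be fixed before the vector is touched.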

2.3.2 Vectoring mode

A radix-4 implementation of vectoring mode has already been proposed in [5], and this section extends that approach to the radix-4 D-CORDIC algorithm. Recall that in vectoring mode the variable y_i determines the rotation direction as it approaches zero. The selection variable w_i = 4^i y_i is therefore introduced, and the variables y_i and x_i are rewritten as

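$$\begin{aligned}
x_{i+1} &= (1 - \sigma_i^2 4^{-2i})\,x_i + 2\sigma_i w_i 4^{-2i} \\
w_{i+1} &= 4\big(w_i (1 - \sigma_i^2 4^{-2i}) - 2\sigma_i x_i\big)
\end{aligned} \qquad (2.28)$$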

Unlike the selection function in rotation mode, where the selection of σ_i depends only on w_i, the selection function is now a function of both w_i and x_i. This is similar to the selection function for division, which has the form w_{i+1} = 4(w_i − q_i d), where the selection function depends on d. One important difference is that in division d is constant, whereas in CORDIC x_i varies in every iteration. The selection intervals are therefore similar to those of the division algorithm:

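$$L_\sigma[x_i] = \left(\sigma - \tfrac{2}{3}\right) x_i, \qquad U_\sigma[x_i] = \left(\sigma + \tfrac{2}{3}\right) x_i, \qquad L_\sigma[x_i] \le w_i \le U_\sigma[x_i] \qquad (2.29)$$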

Figure 2.5: Selection intervals and comparison points.

It is not possible to make the selection intervals identical for each iteration, since x_i varies. However, it has been proven in [5] that for i > 2 the variable x_i can be considered constant, and therefore the selection intervals only have to be computed for the first three iterations. The selection intervals are illustrated in figure 2.5. Because of the overlap L_σ[x_i] ≤ U_{σ−1}[x_i], the continuity condition is satisfied. The selection function can be read directly from figure 2.5. An appropriate comparison point to distinguish between σ_i = 0 and σ_i = 1 is P_i(1) = x_i, and likewise P_i(2) = 3x_i to distinguish between σ_i = 1 and σ_i = 2. Similar comparison points can of course be made for −1 and −2. The final selection function is therefore

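$$\sigma_i = \begin{cases} +2 & 3x_i < w_i \\ +1 & x_i < w_i \le 3x_i \\ 0 & -x_i < w_i \le x_i \\ -1 & -3x_i < w_i \le -x_i \\ -2 & w_i \le -3x_i \end{cases} \quad \text{for } i \le 1, \qquad \sigma_i = \begin{cases} +2 & 3x_2 < w_i \\ +1 & x_2 < w_i \le 3x_2 \\ 0 & -x_2 < w_i \le x_2 \\ -1 & -3x_2 < w_i \le -x_2 \\ -2 & w_i \le -3x_2 \end{cases} \quad \text{for } i > 1 \qquad (2.30)$$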

This might seem expensive to implement because of the multiplication by 3 (one left shift and one addition). However, it was proven in [5] that only 5 fractional bits are necessary to guarantee a correct selection.
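The selection function of equation 2.30 can be exercised with a similar floating-point model of vectoring mode. The sketch follows equation 2.28 directly and, as in the text, freezes the comparison reference at x_2 for i > 1. It uses full floating-point comparisons instead of the 5 fractional bits mentioned above. The name `radix4_dcordic_atan` is hypothetical, and z accumulates +2α_i so that it converges to atan(y_0/x_0).

```python
import math


def select_vectoring(w: float, x: float) -> int:
    """Selection function of eq. 2.30 with comparison points +-x and +-3x."""
    if w > 3 * x:
        return 2
    if w > x:
        return 1
    if w > -x:
        return 0
    if w > -3 * x:
        return -1
    return -2


def radix4_dcordic_atan(x0: float, y0: float, n: int = 13) -> float:
    x, w, z = x0, y0, 0.0  # w_0 = 4^0 y_0
    x_ref = x0
    for i in range(n):
        if i <= 2:
            x_ref = x  # x_i for i <= 1, then x_2 is kept for all i > 1
        s = select_vectoring(w, x_ref)
        c = 1.0 - s * s * 4.0 ** (-2 * i)
        # eq. 2.28; the right-hand side uses the old x_i and w_i
        x, w = c * x + 2.0 * s * w * 4.0 ** (-2 * i), 4.0 * (c * w - 2.0 * s * x)
        z += 2.0 * math.atan(s * 4.0 ** -i)  # accumulated rotation angle
    return z
```

Freezing x_ref only shifts the comparison points by a few percent, which the overlap between neighbouring intervals in figure 2.5 absorbs.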

2.3.3 Invert-rotation mode

Recall from section 2.2.1 that in invert-rotation mode the rotation direction σ_i is determined by comparing x_i with the input argument t_i after compensation for the scaling factor. The same principle can be used for the radix-4 algorithm.


Figure 2.6: Same rotation on two vectors giving two different angles.

The selection intervals seem easy to find, but in reality this is far from the truth. The problem is the unit circle. As the vector rotates towards either the x-axis or the y-axis, the effect of a rotation varies in size. Figure 2.6 illustrates the problem, where the same rotation β applied to vectors v1 and v2 corresponds to two different angles (α1 < α2)⁷. This means that the selection intervals depend on the location x on the unit circle and on the iteration index i. Moreover, the dependency on x is not linear: as x approaches 1, the difference grows, as can be seen in figure 2.7. This is a problem in the radix-4 D-CORDIC algorithm but not in the D-CORDIC, because both the rotation direction and the magnitude of the rotation have to be selected (σ_i ∈ {−2, −1, 0, 1, 2}). Implementing invert-rotation mode would therefore require an extremely complex selection function, since the intervals would be a function of both x_i and i. For this reason, this mode has not been implemented for this algorithm.


2.4 Radix-8 D-CORDIC

Since the previous section showed that it is possible to extend the D-CORDIC to radix-4, it is natural to see what happens when the radix is increased to 8. The principle of the radix-8 D-CORDIC algorithm is the same, and the algorithm is shown below.

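$$\begin{aligned}
x_{i+1} &= (1 - \sigma_i^2 8^{-2i})\,x_i - 2\sigma_i y_i 8^{-i} \\
y_{i+1} &= (1 - \sigma_i^2 8^{-2i})\,y_i + 2\sigma_i x_i 8^{-i} \\
z_{i+1} &= z_i - 2\alpha_i
\end{aligned} \qquad (2.31)$$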

⁷ For instance, a rotation by 0.01 yields acos(0.2) − acos(0.2 + 0.01) ≈ 0.01 and acos(0.9) − acos(0.9 + 0.01) ≈ 0.024.


Figure 2.7: When x → 1, the rotation size depends on the location on the unit circle (acos(x) − acos(x + 0.0001)).

One of the main differences between the two algorithms is that the magnitude of σ_i is no longer a power of two, since σ_i ∈ {−4, −3, −2, −1, 0, 1, 2, 3, 4}. This might make it infeasible to extend the algorithm to a higher radix. The problem is that when σ_i = 3, the shifted terms in the parentheses must be multiplied by 9, and the second terms of the equations in 2.31 must be multiplied by 3. In terms of hardware, this can no longer be implemented with only a shifter, but needs an addition.

Additionally, the required number of iterations is not reduced by a further 50 percent. The smallest value of the last rotation is controlled by the following equation for 24-bit precision:

$$\operatorname{atan}\!\left(8^{1-n}\right) < 2^{-24} \qquad (2.32)$$

To satisfy this equation, n ≥ 9. Compared to the CORDIC algorithm the required number of iterations is therefore reduced by 64%, but only by 30% compared to the radix-4 D-CORDIC implementation. Increasing to radix-16 would only reduce the required number of iterations further to 7, or 72% compared to the CORDIC algorithm. It therefore does not make sense to increase the radix further, since the complexity of the algorithm also increases [4]. The optimal choice should therefore be the radix-2, radix-4 or radix-8 D-CORDIC algorithm.
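Equation 2.32 can be evaluated mechanically to check the iteration counts quoted above. The sketch assumes that the same criterion, atan(r^(1−n)) < 2^(−24), applies to any radix r. The helper name `min_iterations` is hypothetical.

```python
import math


def min_iterations(radix: int, bits: int = 24) -> int:
    """Smallest n with atan(radix^(1-n)) < 2^-bits, i.e. eq. 2.32 for a general radix."""
    n = 1
    while math.atan(float(radix) ** (1 - n)) >= 2.0 ** -bits:
        n += 1
    return n


for r in (2, 4, 8, 16):
    print(f"radix-{r}: {min_iterations(r)} iterations")
```

Under that assumption it reproduces 9 iterations for radix 8 and 7 for radix 16. It also gives 13 for radix 4 and 25 for radix 2, matching the counts used elsewhere in this chapter.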

The radix-8 algorithm can be extended to rotation and vectoring mode in a similar way as the radix-4 algorithm. In rotation mode, the scaling factor can be approximated by 1 − σ_i² 8^{−2i} already for i > 2. For i ≤ 2, the scaling factor compensation for the first three iterations is solved by initialization, in the same way as for radix-4. The possible outcomes of the first three iterations yield an index size of 5^3 = 125 for the lookup table. The selection intervals for rotation mode are shown in table 2.3. Since there is an overlap, the most suitable intervals are selected such that they only depend on the integer part.

σ_i   L[σ]    U[σ]    Interval
 4     6.67    –       7 ≤ w_i
 3     4.67    7.3     5 ≤ w_i < 7
 2     2.67    5.3     3 ≤ w_i < 5
 1     0.67    3.3     1 ≤ w_i < 3
 0    −1.3     1.3    −1 ≤ w_i < 1
−1    −3.3    −0.67   −3 ≤ w_i < −1
−2    −5.3    −2.67   −5 ≤ w_i < −3
−3    −7.3    −4.67   −7 ≤ w_i < −5
−4     –      −6.67        w_i < −7

Table 2.3: Selection interval for σ_i in rotation mode for the radix-8 algorithm.

For vectoring mode, the selection function is given below⁸. The selection only needs to be updated in the first two iterations, as was the case for the radix-4 D-CORDIC algorithm.

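$$\sigma_i = \begin{cases} +4 & 7x_i < w_i \\ +3 & 5x_i < w_i \le 7x_i \\ +2 & 3x_i < w_i \le 5x_i \\ +1 & x_i < w_i \le 3x_i \\ 0 & -x_i < w_i \le x_i \\ -1 & -3x_i < w_i \le -x_i \\ -2 & -5x_i < w_i \le -3x_i \\ -3 & -7x_i < w_i \le -5x_i \\ -4 & w_i \le -7x_i \end{cases} \qquad (2.33)$$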

On average, 2 and 3 of the required 9 iterations are skipped in rotation and vectoring mode respectively, which has been verified with Matlab analysis.

⁸ It is possible to make a graph of the selection intervals similar to figure 2.5 for radix-4.

2.5 Scaling Free

So far the focus has been on the flexible D-CORDIC algorithm and its possibilities. The algorithm uses the same principles as the original CORDIC algorithm and therefore naturally inherits its drawbacks: the compensation for a scaling factor and the high number of iterations. This section presents recent work that tries to overcome those drawbacks with a scaling free CORDIC algorithm. Even though the presented algorithm is less flexible than the D-CORDIC, the motivation for including it in the analysis is that it performs particularly well for the evaluation of cos and sin and is therefore useful as a reference.

The scaling free CORDIC algorithm [17] removes the need for a scaling factor, thereby making it possible to skip iterations that are not needed. This decreases the number of iterations and, most importantly, the power consumption. Implementing an algorithm with a variable number of iterations of course means that the worst case must still fit within the design constraints, so the required 25 iterations must be possible within the available 6 clock cycles. Nevertheless, it has been shown that an average reduction of 50 percent in the required iterations is possible [17], and the effect of this is discussed in section 2.5.1.

The starting point for the scaling free algorithm is [8], which uses a Taylor series expansion of cos x and sin x.

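$$\begin{aligned}
\cos x &= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\, x^{2n} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots \\
\sin x &= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots
\end{aligned} \qquad (2.34)$$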

If the angle, here denoted x, is partitioned into smaller rotations α_i as in the original CORDIC, and the rotations are small enough, then one scaling free rotation can be approximated as

$$\cos\alpha_i \approx 1 - \frac{\alpha_i^2}{2!} \approx 1 - 2^{-(2i+1)}, \qquad \sin\alpha_i \approx \alpha_i \approx 2^{-i} \qquad (2.35)$$

Since the rotations are now based on an approximation of cos α_i and sin α_i, an error is introduced. The error is equal to the largest term neglected in the approximation of sin, which is x³/3! = 2^{−(3i+2.58)} [8]. With an internal word length of W bits, if 3i + 2.58 ≥ W, a multiplication with the neglected factor becomes machine zero. An example is illustrated in table 2.4. The right column lists the most significant bit (MSB) after a shift of 2^{−(3i+2.58)}. For example, a 10-bit accuracy would not allow i < 3, since the approximation error would become too large, whereas i ≥ 3 has zero effect and is allowed. The lower limit i_lower can be expressed as

$$3i + 2.58 \ge W \;\Rightarrow\; i \ge \frac{W - 2.58}{3} \qquad (2.36)$$


i   Shift sequence    MSB
1   2^−(3+2.58)       0.000001
2   2^−(6+2.58)       0.000000001
3   2^−(9+2.58)       0.000000000001
4   2^−(12+2.58)      0.000000000000001
5   2^−(15+2.58)      0.000000000000000001
6   2^−(18+2.58)      0.000000000000000000001
7   2^−(21+2.58)      0.000000000000000000000001

Table 2.4: Values of the largest neglected term in the approximation of sin.

which basically means that a rotation not satisfying 2.36 introduces an approximation error too large for the desired accuracy. The upper limit of i is W − 1, since a right shift of W bits results in machine zero. If the approximation from 2.35 is inserted in 2.1, the result is a scaling free CORDIC algorithm:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 - 2^{-(2i+1)} & -2^{-i} \\ 2^{-i} & 1 - 2^{-(2i+1)} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \qquad (2.37)$$

Like the original algorithm, equation 2.37 can be implemented with only shift and add operations. However, unlike the original CORDIC algorithm, the scaling free algorithm only performs rotations in one direction. The accumulation of the angle does not need a lookup table, because a rotation is approximated as 2^{−i}. This means that z_i is implemented with only a shifter and an adder. The iterative algorithm is very similar to the original CORDIC, and equation 2.38 shows a complete scaling free iteration.

$$\begin{aligned}
x_{i+1} &= x_i\,(1 - 2^{-(2i+1)}) - y_i 2^{-i} \\
y_{i+1} &= y_i\,(1 - 2^{-(2i+1)}) + x_i 2^{-i} \\
z_{i+1} &= z_i + 2^{-i}
\end{aligned} \qquad (2.38)$$

Because the algorithm only performs rotations in one direction, a rotation is skipped if the residual angle Δθ is smaller than the next rotation α_i. A situation where Δθ < α_i is illustrated in figure 2.8, where the CORDIC algorithm is sketched on the left and the scaling free algorithm on the right. For the CORDIC algorithm this would lead to a rotation resulting in a negative angle Δθ_{i+1} = Δθ_i − α_i. The scaling free algorithm, on the other hand, skips iterations until Δθ ≥ α_i. Similarly, if the desired accuracy is reached before i = b − 1, the remaining iterations are skipped by the scaling free algorithm.

One property that the scaling free algorithm shares with other modified CORDIC algorithms is that no optimization comes for free. For the scaling free algorithm, the problem is a much smaller convergence angle compared to the 99.9° from equation 2.9. The reason is the restriction on the lower limit for i. For a 24-bit result and an internal word length of 29 fractional bits, the allowed set of iterations is i = [8, 9, 10, ..., 28]. This means that the convergence angle is only 0.0078 rad, a remarkable reduction. Such a poor convergence range is practically useless for a general purpose implementation and is far from the required π/4. Nevertheless, a solution has been presented that extends the convergence range and thereby makes the algorithm very suitable for low power implementations [33].

Figure 2.8: Rotation where Δθ < α_i for the original CORDIC (a) and scaling free (b).

2.5.1 Extending convergence range

Even though argument reduction can restrict the input range to [0; π/4] (see section 1.1.1), the scaling free algorithm still needs to converge from 0 up to π/4. So far the algorithm only guarantees convergence up to 0.0078 for 24-bit precision.

The reason for the poor convergence range is the restriction on the lower limit i_lower, which was shown to depend on the accuracy. A solution is presented in [17], where the scaling free algorithm is allowed to repeat some of the iterations. This relaxes the assumption that a rotation of 2^{−i} is followed by a smaller rotation of 2^{−i−1}. The idea is to stop updating i in each iteration and select it adaptively instead. If iterations can be repeated, all angles are in principle within reach of the scaling free algorithm. Without repeated iterations, the maximum angle would be the sum of all possible angles, Σ 2^{−i} for i = i_lower, ..., b − 1, which for 24 bits is 0.0078. But if, for instance, the first rotation is applied four times, the convergence angle becomes 0.0195. Without any restriction on the number of repetitions, a final convergence angle of π/4 would be possible. However, it would require more than 200 iterations, which is not feasible.


Partition size   Max # iterations   Average # iterations   Table size (entries, bits)
2^−1             144                72                     1 (58 bit)
2^−2             80                 40                     3 (174 bit)
2^−3             48                 24                     7 (406 bit)
2^−4             32                 16                     13 (754 bit)
2^−5             24                 12                     25 (1450 bit)
2^−6             20                 10                     50 (2900 bit)

Table 2.5: Matlab simulation of different partition sizes.

A solution is to implement a lookup table and divide the remaining convergence range (π/4) into smaller partitions. For each partition, the lookup table holds the sin and cos values of its starting angle. The angle is then approximated by the nearest partition below it before the scaling free algorithm begins. For example, if the starting angle is 0.20, then instead of starting the rotations from 0.00, the algorithm could start from 0.17 if that angle had been stored in the lookup table. This method reduces the number of iterations dramatically, but at the cost of a larger area for the lookup table. It also raises an important question: what is the optimal number of partitions? The first step to answer the question is to review what is known at this point.

1. The biggest rotation that is allowed is i = 8.

2. The smallest rotation needed is i = 25.

3. The only rotation that should be repeated is i = 8 [17]. The reason is that if the residual angle is smaller than 2^−8, then it is also smaller than 2^−9 + 2^−9.

If each rotation were performed only once, a total of 18 iterations would have to be implemented. Table 2.5 lists the number of required iterations for different partition sizes. For each partition, the corresponding cos and sin values have to be stored in a lookup table. For instance, if a partition size of 2^−3 were chosen, the lookup table would need 2 · 7 · 29 = 406 bits with an internal word length of 29 bits. The table does not give a clear answer to the optimal solution. The motivation for analysing this algorithm was to reduce the average number of iterations compared to the CORDIC algorithm. With this in mind, a partition size of 2^−5 is chosen, since on average it reduces the required number of iterations by 50 percent compared to the basic CORDIC algorithm. So why not choose a smaller partition? The answer is that a partition size of 2^−6 saves on average only 2 iterations compared to 2^−5, while requiring a table twice as large.
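A floating-point sketch of the chosen configuration shows how the table lookup and the one-directional, skipping rotations of equation 2.37 combine. It uses a partition size of 2^−5 and rotations from i = 8 down to i = 25, with only i = 8 repeated. The ROM contents are computed with `math.cos` and `math.sin` as a stand-in, and the name `scaling_free_rotate` is hypothetical. The rotation count it returns is only meant for comparison with the maximum of 24 in table 2.5.

```python
import math

PARTITION = 2.0 ** -5  # partition size chosen in the text
I_FIRST, I_LAST = 8, 25  # largest allowed and smallest needed rotation


def scaling_free_rotate(theta: float) -> tuple[float, float, int]:
    """cos/sin of theta in [0, pi/4] via a partition table plus scaling free rotations."""
    base = math.floor(theta / PARTITION) * PARTITION  # nearest partition below theta
    x, y = math.cos(base), math.sin(base)  # stands in for the sin/cos ROM
    residual = theta - base
    i, rotations = I_FIRST, 0
    while i <= I_LAST:
        step = 2.0 ** -i
        if residual >= step:
            c = 1.0 - 2.0 ** -(2 * i + 1)  # cos(2^-i) ~ 1 - 2^-(2i+1)
            x, y = c * x - step * y, c * y + step * x  # eq. 2.37
            residual -= step
            rotations += 1  # only i = 8 can end up being used repeatedly
        else:
            i += 1  # skip this rotation size
    return x, y, rotations
```

Because the residual after the table lookup is below 2^−5, the first rotation can be used at most 7 times, and every later rotation at most once. This gives the worst case of 24 rotations.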


Algorithm / mode            Conv. range   Iterations¹   Error²
CORDIC [30]
  - Rotation mode           [0; π/2]      25            8.76e-08
  - Vectoring mode          [0; 1]        25            8.95e-08
D-CORDIC [11] [19]
  - Rotation mode           [0; π/2]      26            7.75e-08
  - Vectoring mode          [0; 1]        26            8.64e-08
  - Invert-rotation mode³   [0; 1[        26            8.76e-08
Radix-4 D-CORDIC
  - Rotation mode           [0; π/2]      13 (9)        7.21e-08
  - Vectoring mode          [0; 1]        13 (10)       6.91e-08
Radix-8 D-CORDIC
  - Rotation mode           [0; π/2]      9 (7)         5.23e-08
  - Vectoring mode          [0; 1]        9 (6)         7.29e-08
Scaling free [17]
  - Rotation mode           [0; π/4]      24 (12)       8.13e-08

¹ Number of iterations for 24-bit precision; average in parentheses.
² Largest error from Matlab simulation.
³ Internal bit width of 31 fractional bits.

Table 2.6: Summary of Matlab simulations for the presented algorithms with 24-bit precision.

2.6 Summary

This chapter has presented different algorithms that implement trigonometric functions. This section summarizes the most important properties of each algorithm, and the next chapter describes how these algorithms have been implemented in VHDL. The characteristics of the algorithms are summarized in table 2.6.

The CORDIC algorithm is the simplest algorithm and should be considered a reference model, since the purpose of this thesis is to determine which algorithm is best suited for a low power implementation. The D-CORDIC algorithm has the highest flexibility among the algorithms, but this flexibility comes with a large trade-off: one D-CORDIC iteration requires 7 adders, whereas a CORDIC iteration only requires 3. The trade-off could end up being too costly for it to be a feasible solution. The radix-4 and radix-8 algorithms decrease the required number of iterations by 50% and 64% respectively, compared to the CORDIC algorithm. Some of these iterations can be skipped, as listed in table 2.6. However, there is no guarantee that a skipping technique is feasible to implement, since it would require extra logic.

The last presented algorithm was the scaling free algorithm, which only supports rotation mode. It is the only algorithm that has been presented as a low power version by others (for instance in [17] and [33]), so it is included as a reference to see how a dedicated low power algorithm compares to the D-CORDIC and radix algorithms. The fact that it only supports one operation mode gives it an advantage when comparing the synthesis results for rotation mode in the next chapter. Reducing some of the other algorithms to only run in one mode might eliminate the argument for choosing the scaling free algorithm. The algorithm only requires 12 iterations on average. However, the worst case still has to be feasible within the final architecture, meaning that the scaling free algorithm still has to provide 24 iterations within the 6 clock cycles for each computation.

In terms of complexity, the D-CORDIC algorithm is by far the most expensive. Seven adders are required in each iteration, which could make the functions acos and asin unfeasible to implement. The designer should therefore weigh how much he or she is willing to pay for these functions. The Matlab programs used in the simulations are located in appendix C, beginning on page 95.

Chapter 3

Design

The previous chapter presented the theory behind the various CORDIC algorithms, which are the foundation for the low power implementation of the numerical processor for trigonometric functions. This chapter explains how to translate that theory into hardware. Five different implementations are presented: the basic CORDIC algorithm, the D-CORDIC, the radix-4 and radix-8 D-CORDIC, and finally the scaling-free algorithm. Many of the hardware building blocks are identical in the various implementations, and they are explained as they appear in the sections. This chapter does not examine the low power properties of the various implementations. It should be considered a preliminary step towards the low power optimization in chapter 4.

The algorithms are implemented in VHDL and synthesized for a 1.0 V 90 nm process library with both standard threshold voltage implant (SVT) and high threshold voltage implant (HVT). The simulation is carried out in Modelsim and verified through an exhaustive test over the entire convergence range. The test vectors and expected results are generated in Matlab and compared to the output from the Modelsim simulation. Once the simulation has passed, the implementation is synthesized in Synopsys' RTL synthesis tool Design Vision to obtain area, power and timing results. These results and important observations are analyzed in section 3.7.


3.1 Numerical processor architecture

The numerical processor should be able to evaluate a 16-bit or 24-bit result in 4 or 6 cycles, respectively. For simplicity, the rest of the chapter focuses only on the 24-bit implementation, because a 16-bit result can always be extracted after 4 cycles. An iterative algorithm can be implemented as either an unfolded pipelined or a word-serial architecture. The difference between the two alternatives is explained below.

Pipelined implementation The benefit of the unfolded pipelined implementation is a high throughput, since multiple evaluations can be computed in parallel and one result is produced in each cycle. In addition, each iteration shifts by a fixed amount that is known in advance, so the shifters can be hardwired and require no additional logic. This is a big advantage, because combinatorial shifters are very costly to implement. However, the most important property of the numerical processor is low power; high throughput is not really a concern, and this makes the unfolded version less beneficial. The disadvantage of a pipelined implementation is an increase in hardware, since the hardware is not reused. For a 24-bit CORDIC implementation, this would mean a total of 75 adders.¹

Word-serial implementation The word-serial implementation, on the other hand, reuses the hardware in each iteration. This reduces the required number of adders from 75 to 12, since the processor still needs to compute 4 bits per cycle, i.e. four iterations with three adders each. In addition to the decreased area, the static power dissipation is reduced. The word-serial architecture has been successfully used in [33] for the scaling-free algorithm. The required shifters become more complex than in the unfolded pipelined version; however, it will be shown later that this is not very costly.
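The adder counts quoted for the two architectures follow from simple arithmetic. The Python sketch below reproduces them, assuming three adders per CORDIC iteration and that all iterations evaluated within one cycle get their own block; the function name `adder_count` is hypothetical. Note that the word-serial figure of 12 assumes the 24 iterations obtained in section 3.2 by starting at i = 1; with 25 iterations the same formula gives 15.

```python
import math

ADDERS_PER_ITERATION = 3  # x, y and z update of one CORDIC iteration


def adder_count(iterations: int, cycles: int, pipelined: bool) -> int:
    """Adders needed to complete `iterations` CORDIC iterations.

    pipelined=True:  every iteration has its own hardware (unfolded).
    pipelined=False: word-serial, the blocks are reused in every cycle.
    """
    if pipelined:
        blocks = iterations
    else:
        blocks = math.ceil(iterations / cycles)  # iterations needed per cycle
    return blocks * ADDERS_PER_ITERATION


print(adder_count(25, 6, pipelined=True))   # 75
print(adder_count(24, 6, pipelined=False))  # 12
```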

The word-serial implementation is chosen because it is the most appropriate architecture for a low power implementation. The overall architecture is almost identical for all CORDIC versions and is shown in figure 3.1. It consists of the following blocks: argument reduction, output selection, register, multiplexer, control, initialization and iteration blocks. The argument reduction maps the input vector into a vector within the supported convergence range; for instance, cos(7π/8) can be evaluated as −sin(3π/8). This means that if the input to the first iteration block is 3π/8 instead of 7π/8, the same result is obtained up to the sign. The output selection block compensates for this mapping to guarantee a correct output; in the example above, the result is taken from the y variable instead of x and negated. For simplicity, the argument reduction and output selection blocks (white boxes in figure 3.1) are not discussed, as they are identical for the different implementations and therefore have no effect when comparing the algorithms. The rest of the chapter instead focuses on the iteration blocks and the initialization block (blue boxes in figure 3.1). Each iteration block executes one iteration and is an exact implementation of the algorithms presented in chapter 2; these blocks account for most of the variation between the implementations. Some of the implementations require 4 iteration blocks, while, for instance, the radix-4 D-CORDIC only requires 2.

Figure 3.1: Word-serial architecture. This chapter explains how the blue boxes are implemented.

1 Each iteration uses 3 adders, and with 25 iterations this means a total of 75 adders.
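The argument reduction and output selection blocks are not specified further in this chapter, so the following is only a minimal Python sketch of the idea for cos on [0, π]. The function names are hypothetical, and `math.cos`/`math.sin` merely stand in for the CORDIC core that would produce x and y for the reduced angle.

```python
import math


def reduce_cos(theta: float) -> tuple[float, str, int]:
    """Map cos(theta), 0 <= theta <= pi, onto a rotation angle in [0, pi/2].

    Returns (reduced_angle, output_variable, sign); the result is
    sign * (x or y of a rotation by reduced_angle).
    """
    if theta <= math.pi / 2:
        return theta, "x", +1            # cos(theta) read directly from x
    reduced = theta - math.pi / 2        # cos(theta) = -sin(theta - pi/2)
    return reduced, "y", -1


def output_selection(reduced: float, var: str, sign: int) -> float:
    x, y = math.cos(reduced), math.sin(reduced)  # stand-in for the CORDIC core
    return sign * (x if var == "x" else y)


angle, var, sign = reduce_cos(7 * math.pi / 8)
print(round(angle / math.pi, 3), var, sign)  # 0.375 y -1
print(output_selection(angle, var, sign))    # approx. cos(7*pi/8) = -0.9239
```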

The internal data representation has a fractional length of 29 bits (as explained in section 2.1.1), which has been verified through Matlab analysis to be adequate. The length of the integer part depends on the implemented modes. In rotation mode, x and y will always be in the range [0; 1], and the input range is restricted to π/2, so z will never exceed π/2. However, z can become negative, and rotation mode therefore requires 31 internal bits.

In vectoring mode there is in reality no upper bound for the input vector, because the result approaches π/2 as the input rises towards infinity. However, for simplicity the input value is restricted to [0; 1], as explained in section 1.1.1. With this restriction y → 0, z → π/2 and $x \to K\sqrt{1^2 + 1^2} \approx 2.32892$ (taken from expression 2.8), which is the highest possible number within the circuit. This implies that an internal bit length of 32 bits is enough to guarantee a correct result in vectoring mode.
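The internal word lengths above follow from the fractional length plus enough integer bits for the largest magnitude plus a sign bit. A small Python sketch of that bookkeeping, assuming a two's complement word with one sign bit (the helper name `word_length` is hypothetical), reproduces the 31-bit and 32-bit figures for rotation and vectoring mode:

```python
import math


def word_length(frac_bits: int, max_magnitude: float, signed: bool = True) -> int:
    """Bits needed for a fixed-point word holding magnitudes up to max_magnitude."""
    int_bits = max(1, math.floor(math.log2(max_magnitude)) + 1)  # integer part
    return frac_bits + int_bits + (1 if signed else 0)


print(word_length(29, math.pi / 2))  # rotation mode, z <= pi/2 and may be negative: 31
print(word_length(29, 2.32892))      # vectoring mode, x -> K*sqrt(2): 32
```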


Figure 3.2: Logic for initializing variables to i = 1.

Invert-rotation mode requires 31 fractional bits, which the Matlab simulation found to be appropriate (table 2.6). The largest integer part occurs for the x and y variables, which can reach approximately 2.890 [19]. Therefore, an internal bit length of 33 bits is required in the implementation of invert-rotation mode.

3.2 CORDIC

To guarantee a 24-bit result, the CORDIC algorithm requires 25 iterations partitioned into the 6 available cycles. Since four blocks per cycle only give 4 × 6 = 24 iterations, 25 iterations would require 4 + 1 CORDIC blocks. However, since the first rotation is always positive, the algorithm can start with iteration i = 1 instead of i = 0. This reduces the number of iterations to 24, and the initialization changes to

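$$
\begin{aligned}
x_{1,\mathrm{rot}} &= K, & x_{1,\mathrm{vec}} &= 1 + a,\\
y_{1,\mathrm{rot}} &= K, & y_{1,\mathrm{vec}} &= a - 1,\\
z_{1,\mathrm{rot}} &= a - \operatorname{atan}(2^{0}), & z_{1,\mathrm{vec}} &= \operatorname{atan}(2^{0})
\end{aligned}
\tag{3.1}
$$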

The initialization for rotation mode is shown on the left and for vectoring mode on the right. K is the constant scaling factor and a is the input. An implementation of this initialization is shown in figure 3.2, and a detailed explanation follows below.

Variable x: In rotation mode, x is initialized to the scaling factor, which is a constant and therefore easy to implement in hardware. In vectoring mode, the initialization step seems to require an adder; however, this can be avoided by looking at the input range for a, which is restricted to [0; 1[. The initialization can be implemented by concatenation, since a is represented by $00.XXXX_{\mathrm{bin}}$ and 1 is represented by $01.0000_{\mathrm{bin}}$.³ This means that in vectoring mode x is initialized to $01.XXXX_{\mathrm{bin}}$. If the restriction on a were relaxed and input values larger than 1 were allowed, only the integer part would need an addition; the fractional part would remain unchanged.

Figure 3.3: Schematic of one CORDIC block.

3 The bits before the dot are the integer bits, and the remaining bits are the fractional part.

Variable y: In rotation mode, the initialization value is the constant scaling factor, just as for the x variable. In vectoring mode, the subtraction a − 1 is also simple to implement, since it only changes the integer part of the input. This can be seen by treating it as an addition of −1 instead of a subtraction of 1: the input $00.XXXX_{\mathrm{bin}}$ added to $11.0000_{\mathrm{bin}}$ (−1) gives $11.XXXX_{\mathrm{bin}}$. As for the x variable, relaxing the restriction on the input value would not require much extra logic, since it would again only require a subtraction on the integer part.

Variable z: In rotation mode, the z variable requires an addition with a constant, since the angle is reduced by one positive rotation, $a - \operatorname{atan}(2^{0})$. In vectoring mode this is not necessary, because the rotation starts from zero ($0 + \operatorname{atan}(2^{0})$).

The hardware required to start with the first rotation at i = 1 is one adder, three multiplexers and memory containing the constants K and $\operatorname{atan}(2^{0})$. At the same time, however, the required number of CORDIC blocks is reduced to four. The scaling factor only affects rotation mode, and because it is handled by the initialization, it does not require additional hardware.
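The concatenation trick for the vectoring-mode initialization in (3.1) can be checked on integers. This is a sketch under stated assumptions: 29 fractional bits, a two-bit integer field as in the $00.XXXX_{\mathrm{bin}}$ notation, and hypothetical helper names. Setting the integer field to 01 yields 1 + a, and setting it to 11 yields a − 1 in two's complement, with no adder involved.

```python
FRAC = 29             # internal fractional bits
WIDTH = FRAC + 2      # two integer bits, as in the 00.XXXX notation
ONE = 1 << FRAC


def to_signed(v: int) -> int:
    """Interpret a WIDTH-bit pattern as a two's complement number."""
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v


def init_vectoring(a_fix: int) -> tuple[int, int]:
    """x1 = 1 + a and y1 = a - 1 for 0 <= a < 1, using concatenation only."""
    assert 0 <= a_fix < ONE, "a must be in [0; 1["
    x1 = (0b01 << FRAC) | a_fix   # 01.XXXX
    y1 = (0b11 << FRAC) | a_fix   # 11.XXXX, i.e. a + (-1) in two's complement
    return x1, to_signed(y1)


a = int(0.3 * ONE)
x1, y1 = init_vectoring(a)
print(x1 == a + ONE, y1 == a - ONE)  # True True
```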

One CORDIC block should be able to run in both rotation and vectoring mode. A CORDIC block is shown in figure 3.3; it is an exact implementation of expression 2.6. The block can be implemented with three carry-propagate adders (CPA), two shifters, four multiplexers, three inverters and a lookup table. The first multiplexer (located at the bottom) is used to distinguish between rotation and vectoring mode, as the sign of either z or y determines the rotation direction in the next iteration. The rotation direction is input to the three other multiplexers in the subsequent iteration to decide whether a positive or negative rotation is requested. The adders are simple CPAs and are not discussed in more detail in this chapter. The synthesis tool decides whether a carry ripple adder (CRA), carry lookahead adder (CLA) or prefix adder is implemented to meet the timing constraints. Besides the adders, the shifters are the most important components of a fully functional CORDIC block. They can be implemented either as combinatorial shifters or hardwired. A combinatorial shifter is the most flexible, as it can shift any n-bit number by up to n bits. The alternative is to use hardwired shifters, as in an unfolded pipelined implementation. This is feasible because each of the four CORDIC blocks is used at fixed iterations, as illustrated in figure 3.4(b): CORDIC block one implements iterations 1, 5, 9, 13, 17 and 21, CORDIC block two implements iterations 2, 6, 10, 14, 18 and 22, and likewise for blocks three and four. Each shifter is therefore implemented as a 6-to-1 multiplexer, as can be seen in figure 3.4(a).

(b)
Cycle     1    2    3    4    5    6
Block 1   1    5    9   13   17   21
Block 2   2    6   10   14   18   22
Block 3   3    7   11   15   19   23
Block 4   4    8   12   16   20   24

Figure 3.4: (a) Schematic of a ($2^{-i}$) shifter for CORDIC block two, (b) the ordering of the fixed iterations at each cycle for the CORDIC implementation.
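To make the word-serial schedule and the initialization in (3.1) concrete, here is a bit-accurate Python model of rotation mode: four blocks over six cycles cover iterations 1 to 24, and each block only ever shifts by six distinct amounts, which is what allows the 6-to-1 multiplexer shifter. It is a sketch, not the thesis' VHDL. Assumptions: 29 fractional bits, Python's arithmetic right shift in place of the hardware shifter, and K taken as the reciprocal CORDIC gain (about 0.6073) so that the outputs are cos and sin directly. The names `schedule` and `cordic_rotation` are hypothetical.

```python
import math

FRAC = 29                      # internal fractional bits (section 3.1)
ONE = 1 << FRAC
BLOCKS, CYCLES = 4, 6          # four CORDIC blocks, six cycles -> iterations 1..24

# Lookup table atan(2^-i), i = 1..24, quantized to FRAC fractional bits.
ATAN = {i: round(math.atan(2.0 ** -i) * ONE) for i in range(1, BLOCKS * CYCLES + 1)}
# K is taken here as the reciprocal CORDIC gain over i = 0..24 (about 0.6073),
# so that the results are cos and sin without further scaling.
K = math.prod(1 / math.sqrt(1 + 4.0 ** -i) for i in range(BLOCKS * CYCLES + 1))


def schedule(block: int, cycle: int) -> int:
    """Iteration executed by `block` (1..4) in `cycle` (1..6), cf. figure 3.4(b)."""
    return block + BLOCKS * (cycle - 1)


def cordic_rotation(a: float) -> tuple[float, float]:
    """Word-serial CORDIC, rotation mode, for 0 <= a <= pi/2. Returns (cos a, sin a)."""
    # Initialization (3.1): iteration i = 0 is always a positive rotation.
    x = y = round(K * ONE)
    z = round(a * ONE) - round(math.atan(1.0) * ONE)
    for cycle in range(1, CYCLES + 1):
        for block in range(1, BLOCKS + 1):
            i = schedule(block, cycle)   # each block sees only 6 distinct shifts
            xs, ys = x >> i, y >> i      # arithmetic right shift = multiply by 2^-i
            if z >= 0:                   # positive rotation
                x, y, z = x - ys, y + xs, z - ATAN[i]
            else:                        # negative rotation
                x, y, z = x + ys, y - xs, z + ATAN[i]
    return x / ONE, y / ONE


print(sorted({schedule(2, c) for c in range(1, CYCLES + 1)}))  # [2, 6, 10, 14, 18, 22]
print(cordic_rotation(math.pi / 6))  # close to (0.8660254, 0.5)
```

The 24 lookup table entries of 31 bits each give the 744 bits mentioned below.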

The lookup table contains the values of $\operatorname{atan}(2^{-i})$; with 24 entries of 31 bits each, this amounts to 744 bits. The complete CORDIC architecture is shown in figure 3.5. Each CORDIC block accounts for almost 20% of the total power consumption, and within each block 40% of this is used by the three adders. The initialization, the multiplexers, the register and the controller (not shown in figure 3.5) together use the remaining 20%.

Figure 3.5: Architecture for CORDIC implementation.

3.3 D-CORDIC

The D-CORDIC algorithm is the most flexible algorithm, but unfortunately it requires one extra iteration compared to the CORDIC algorithm. This could be a problem, since it means that one extra D-CORDIC block is needed. For a low power implementation, this would make the D-CORDIC algorithm a bad choice, but fortunately the problem can be fixed. Recall from section 2.2.1 that the D-CORDIC algorithm performs rotations that are double the size of a normal rotation. This means that each rotation is $2\operatorname{atan}(2^{-i})$ instead of $\operatorname{atan}(2^{-i})$ as in the CORDIC algorithm. This is of great significance, because the convergence range is doubled. The impact can be seen from the following calculations:

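$$
\begin{aligned}
\sum_{i=0}^{\infty}\operatorname{atan}(2^{-i}) &= \operatorname{atan}(2^{0}) + \operatorname{atan}(2^{-1}) + \cdots \approx 1.7433\\
\sum_{i=0}^{\infty}2\operatorname{atan}(2^{-i}) &= 2\operatorname{atan}(2^{0}) + 2\operatorname{atan}(2^{-1}) + \cdots \approx 3.4866
\end{aligned}
\tag{3.2}
$$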

Since a convergence range of [0; π/2] is enough, there is no reason to perform the first rotation; the remaining rotations still cover [0; π/2]. However, this significant property of the D-CORDIC algorithm can be exploited even further. If the convergence range is reduced to [0; π/4], the algorithm still converges when the first rotation is i = 2. So for the D-CORDIC algorithm it is safe to start with i = 2 instead of i = 0 in rotation mode. The price to pay is a smaller convergence range, but as explained in section 1.1.1 this has no effect. In vectoring mode, only the first iteration can be skipped, but the second iteration is simple to compensate for, because it is always positive. The new initialization values for rotation and vectoring mode are listed below.
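The convergence argument can be checked numerically by summing the rotation angles from a given starting iteration. The Python sketch below does this with 64 terms, which is effectively the infinite sum; the helper `rotation_sum` is hypothetical. The infinite sums are about 1.7433 and 3.4866 (the first four terms alone give 1.6184). The sketch also shows that the D-CORDIC rotations from i = 2 onward still cover π/4, whereas starting at i = 3 would not.

```python
import math


def rotation_sum(start: int, double: bool = False, terms: int = 64) -> float:
    """Sum of atan(2^-i) (or 2*atan(2^-i)) for i = start, start + 1, ..."""
    factor = 2.0 if double else 1.0
    return sum(factor * math.atan(2.0 ** -i) for i in range(start, start + terms))


print(round(rotation_sum(0), 4))                    # CORDIC: 1.7433
print(round(rotation_sum(0, double=True), 4))       # D-CORDIC: 3.4866
print(rotation_sum(1, double=True) >= math.pi / 2)  # skip i = 0: True
print(rotation_sum(2, double=True) >= math.pi / 4)  # start at i = 2: True
print(rotation_sum(3, double=True) >= math.pi / 4)  # start at i = 3: False
```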

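$$
\begin{aligned}
x_{2,\mathrm{rot}} &= K, & x_{2,\mathrm{vec}} &= 0.75 + a,\\
y_{2,\mathrm{rot}} &= 0, & y_{2,\mathrm{vec}} &= a(1 - 2^{-2}) - 1,\\
z_{2,\mathrm{rot}} &= a, & z_{2,\mathrm{vec}} &= 2\operatorname{atan}(2^{0})
\end{aligned}
\tag{3.3}
$$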

The last obstacle to letting the algorithm start at iteration 2 is the invert-rotation mode. Recall from section 2.2.1 that an extra variable t is introduced in invert-rotation mode. It is not possible to skip any iterations in this mode, but it is possible to predict the rotation directions in the first two iterations and compensate for this prediction relatively easily. The first rotation is always positive and causes no problem. The values of the variables after the first iteration are shown on the left below.

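$$
\begin{aligned}
x_1 &= 0, & x_{2,\text{inv-rot}} &= 2,\\
y_1 &= 1, & y_{2,\text{inv-rot}} &= 1.5,\\
z_1 &= 2\operatorname{atan}(2^{0}), & z_{2,\text{inv-rot}} &= 2\operatorname{atan}(2^{0}) - 2\operatorname{atan}(2^{-2}),\\
t_1 &= 2a, & t_{2,\text{inv-rot}} &= (2a) + (2a\,2^{-2})
\end{aligned}
\tag{3.4}
$$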

The next rotation direction is found by examining whether x ≥ t, and since this never happens (except for a = 0), the second rotation is always negative. This means that the initialization can compensate for the first two iterations with the values on the right in equation 3.4. The implementation of the initialization is shown in figure 3.6, and a description of vectoring and invert-rotation mode follows below. Rotation mode is not described, as it is straightforward to implement.

Figure 3.6: Logic for initializing variables to i = 2.

Variable x: In vectoring mode, the input value is added to 0.75 ($00.1100_{\mathrm{bin}}$), and since a is smaller than 1 ($00.xxxx_{\mathrm{bin}}$), this can be implemented with a small addition on the first two fractional bits. In invert-rotation mode, the x variable has the constant value 2.

Variable z: The z variable is easily compensated for in both vectoring and invert-rotation mode, since it can be implemented by two constants.

Variable y: In invert-rotation mode, variable y is initialized to 1.5. In vectoring mode, the compensation requires two additions for the evaluation of $a(1 - 2^{-2}) - 1$. This expression can be rewritten as $a - (a2^{-2} + 1)$, which only requires one adder. The reason is that the input value a < 1, so the addition of one can be implemented by concatenation ($01.0000_{\mathrm{bin}}$ & $00.xxxx_{\mathrm{bin}}$). The remaining subtraction requires an adder.

Variable t: The t variable is only used in invert-rotation mode and is therefore initialized to zero in both rotation and vectoring mode. In invert-rotation mode, the compensation requires an addition for the evaluation of $2a + 2a2^{-2}$, which can be rewritten as $2(a + a2^{-2})$.

The adder required for the y and t variables is never needed by both in the same mode, so it can be shared. To sum up, the logic required to start the D-CORDIC algorithm with i = 2 instead of i = 0 is only five multiplexers and one adder.
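The two vectoring-mode tricks for starting at i = 2 can be checked on integers: 0.75 + a only needs a small adder on the integer bits and the two leading fractional bits, and a(1 − 2⁻²) − 1 needs one subtractor once the "+ 1" is done by concatenation. A minimal Python sketch, assuming 29 fractional bits, 0 ≤ a < 1 and hypothetical function names:

```python
FRAC = 29                      # internal fractional bits
ONE = 1 << FRAC
LOW = FRAC - 2                 # fractional bits below the two leading ones
LOW_MASK = (1 << LOW) - 1


def x2_vec(a_fix: int) -> int:
    """0.75 + a for 0 <= a < 1: only the top bits go through a small adder."""
    top = a_fix >> LOW                                   # 00.xx part of a (0..3)
    return ((top + 0b11) << LOW) | (a_fix & LOW_MASK)    # 3-bit add, low bits pass


def y2_vec(a_fix: int) -> int:
    """a(1 - 2^-2) - 1 rewritten as a - (a*2^-2 + 1)."""
    quarter = a_fix >> 2                   # a * 2^-2 (< 0.25, integer bits are 00)
    one_plus_quarter = ONE | quarter       # "+ 1" by concatenation: 01.xxxx
    return a_fix - one_plus_quarter        # the single remaining adder


a = int(0.6 * ONE)
print(x2_vec(a) / ONE)   # 0.75 + 0.6 = 1.35, up to quantization
print(y2_vec(a) / ONE)   # 0.6 * 0.75 - 1 = -0.55, up to quantization
```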

After removing the need for the first two iterations, the architecture requires four D-CORDIC blocks, and table 3.1 illustrates the ordering among them. A D-CORDIC block supporting rotation, vectoring and invert-rotation mode is illustrated in figure 3.7. The logic in the dashed box should be ignored at this point; it is explained later. The block consists of five adders, four multiplexers, four shifters and some memory holding the values of $2\operatorname{atan}(2^{-i})$. Again, the focus will not be on the adders, as they are implemented as simple CPAs; they are discussed in section 4.3.2. The block contains two different types of shifters ($2^{-i+1}$ and $2^{-2i}$), taken directly from expression 2.19. The second type is of most interest, since it is only used in the first half of the iterations: it does not make sense to shift a value by more than the internal bit accuracy. With an internal fractional length of 29 bits, this means that after iteration 15, two of the four shifters are unnecessary, and, more importantly, so are two of the adders. Unfortunately, they cannot be removed from the design, but they can be skipped to save power. The shifters are implemented in the same way as in the CORDIC version. The first type of shifter is a six-input multiplexer, and the second is a four-input multiplexer, since its output after iteration 15 is zero.

Figure 3.7: Schematic of a D-CORDIC block. The t signal in the dashed box is only used in invert-rotation mode.
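The point at which the $2^{-2i}$ shifter output becomes zero depends on how many bits the operand has. The Python sketch below computes the first iteration for which every non-negative operand shifts out completely. It assumes non-negative magnitudes; a two's complement arithmetic shift of a negative value yields −1 rather than 0, which a real design would have to handle. The helper name is hypothetical. With 29 fractional bits and operands below 4 (two integer bits), the shift is zero from i = 16 on, i.e. after iteration 15.

```python
def first_zero_iteration(frac_bits: int, int_bits: int) -> int:
    """Smallest i for which v >> (2*i) is zero for every non-negative operand v
    with `int_bits` integer bits and `frac_bits` fractional bits."""
    width = frac_bits + int_bits      # magnitude bits of the operand
    i = 0
    while 2 * i < width:              # v >> s == 0 for all v < 2**width iff s >= width
        i += 1
    return i


print(first_zero_iteration(29, 1))  # operands below 2: 15
print(first_zero_iteration(29, 2))  # operands below 4: 16, i.e. after iteration 15
```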

The logic in the dashed box in figure 3.7 adds support for invert-rotation mode. It is drawn within a dashed box to illustrate that it is only used in invert-rotation mode and can be skipped in the other modes to reduce power consumption. The extra logic consists of two adders and a shifter for each block, which might seem a huge overhead for adding support for two extra functions. In reality, the logic only supports the evaluation of acos, but because of the relation in expression 2.16, one extra adder at the end can evaluate asin. The first adder compensates for the scaling factor, and the second is used to find the difference between $t_i$ and $x_i$. This difference is used to find the next rotation direction in invert-rotation mode.

Table 3.1: Ordering of the fixed iterations at each cycle for the D-CORDIC implementation.

An illustration of the final D-CORDIC architecture can be seen in figure 3.8. The architecture looks relatively complex, but it is important to remember that the D-CORDIC algorithm has the highest flexibility among the implemented algorithms. The t path used in invert-rotation mode occupies 15% of the total area, so one might conclude that the extra flexibility comes at a price of 15% extra area. However, the D-CORDIC actually uses 67% more area than the CORDIC implementation, so the price of the extra flexibility is considerable.


3.4 Radix-4 D-CORDIC

The advantage of the radix-4 D-CORDIC algorithm is a 50% reduction in the number of iterations. In principle, this means a reduction in hardware, but each block is more complicated, and this might even out the savings from fewer iterations. The radix-4 D-CORDIC algorithm requires 13 iterations, as listed in table 2.6, to support a convergence range of [0; π/2]. If this convergence range is reduced to [0; π/4], the outcome of the first rotation is always zero. This follows directly from the selection function, where a rotation size of 0 is chosen in the interval [−1; 1]. Since the first rotation is always zero, there is no need to implement it, and 12 iterations are sufficient for a 24-bit result.⁴ In addition to reducing the number of iterations, the initialization is extremely simple, as can be seen below.

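$$
\begin{aligned}
x_{1,\mathrm{rot}} &= K, & x_{1,\mathrm{vec}} &= 1,\\
y_{1,\mathrm{rot}} &= 0, & y_{1,\mathrm{vec}} &= 4a,\\
z_{1,\mathrm{rot}} &= 4a, & z_{1,\mathrm{vec}} &= 0
\end{aligned}
\tag{3.5}
$$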

The only problem here is the scaling factor K. Recall from section 2.1.2 that in a higher radix implementation the scaling factor depends on the magnitude of the rotations. If $x_{1,\mathrm{rot}}$ had to be initialized to the total scaling factor, all the rotations would have to be known in advance. This is of course not feasible, but because the scaling factor can be approximated as $K_i = 1 - \sigma_i^2 4^{-2i}$ in iterations 4, 5 and 6,⁵ it is only necessary to know the rotations in the first three iterations. In other words, the idea is to compensate for the first three iterations by initialization and for the remaining iterations by simple logic.

4 The same argument holds for vectoring mode.
5 Recall that the scaling factor only affects the first n/2 iterations.

Figure 3.8: Architecture for D-CORDIC implementation.

Figure 3.9: Initialization for the radix-4 implementation.

One question remains: how is it possible to know, or predict, the first three rotations? The simplest solution is to perform the first three iterations twice, which sounds wasteful. However, it can be implemented with only 2 adders and 3 small lookup tables, which seems a small price to pay for the radix-4 implementation. The reason is that in rotation mode the selection of $\sigma_i$ is only affected by the $z_i$ variable, so only this variable has to be computed to predict the rotations (these extra iterations are called pre-iterations in the rest of the report). The required logic is illustrated in figure 3.9. The input to the first lookup table is 4a, whose maximum value is 4(π/4) ≈ 3.14. If the selection interval from table 2.2 were used, the first iteration would have 3 different outcomes, $\sigma_i \in \{0, 1, 2\}$. However, closer examination of the selection interval for $\sigma_i = 1$ shows that the interval can be extended from [1; 3] to [1; 3.2]. This still guarantees convergence, and the benefit is that the possible outcomes of the first iteration are reduced to $\sigma_i \in \{0, 1\}$. In terms of hardware, this reduces the number of lookup table entries from 27 to 18. This technique is only used for the first iteration; for the remaining iterations, it is still easier to have an interval boundary of 3 instead of 3.2. The selection between σ = 0 and σ = 1 is performed by examining the integer part of the input value and is implemented with a NOR gate. The output of the first multiplexer is then the input to the first adder, which performs the first rotation. The same principle is used for the other two iterations. However, there is no reason to perform the last addition, since its result is not needed.
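With the extended interval [1; 3.2], the first pre-iteration only has to decide whether the integer part of 4a is zero, which is a single NOR over the integer bits. A small Python sketch of that decision, assuming 29 fractional bits and two integer bits for 4a ≤ π (the function name is hypothetical):

```python
import math

FRAC = 29          # fractional bits
INT_BITS = 2       # 4a <= pi < 4, so two integer bits hold the integer part


def sigma_first(w_fix: int) -> int:
    """First pre-iteration: sigma = 0 if the integer part of w is zero, else 1.

    With the extended interval [1; 3.2] for sigma = 1, no w = 4a <= pi needs
    sigma = 2, so a NOR over the integer bits decides between the two outcomes.
    """
    int_part = (w_fix >> FRAC) & ((1 << INT_BITS) - 1)
    nor = int_part == 0            # NOR of the two integer bits
    return 0 if nor else 1


for a in (0.1, 0.3, math.pi / 4):
    w = 4 * a
    print(round(w, 3), sigma_first(int(w * (1 << FRAC))))  # 0.4 0 / 1.2 1 / 3.142 1
```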

After the last pre-iteration, the first three rotation directions are known, and this information is used to select among the precomputed scaling factors in a lookup table. For example, if the pre-iterations show that the first three rotations are $\sigma_1 = 1$, $\sigma_2 = 0$ and $\sigma_3 = 2$, then $x_1 = 0.9403$.⁶ The compensation for the scaling factor in the remaining iterations can be implemented by adding extra scaling blocks, as illustrated in figure 3.10(a), each consisting of two adders and two shifters. However, these blocks can actually be removed with a clever optimization, which will become clear once the radix-4 D-CORDIC block has been introduced.

Figure 3.10: Illustration of a scaling block (a), and a radix-4 block (b).
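The precomputed scaling-factor table can be generated directly from the products in footnote 6. The Python sketch below builds it for the reduced set of outcomes. It assumes that only the magnitudes $\{0, 1\}$ for the first rotation and $\{0, 1, 2\}$ for the next two matter, since only $\sigma_i^2$ enters the product; this matches the 18 entries mentioned above. The names `compensation` and `TABLE` are hypothetical.

```python
from itertools import product


def compensation(sigmas: tuple[int, ...], radix: int = 4) -> float:
    """Product of 1 / (1 + sigma_i^2 * radix^(-2i)) over the rotations i = 1, 2, ..."""
    value = 1.0
    for i, s in enumerate(sigmas, start=1):
        value /= 1 + s * s * radix ** (-2 * i)
    return value


# First rotation limited to {0, 1} by the extended interval; the next two to {0, 1, 2}.
TABLE = {sig: compensation(sig) for sig in product((0, 1), (0, 1, 2), (0, 1, 2))}

print(len(TABLE))                   # 18 entries instead of 3**3 = 27
print(round(TABLE[(1, 0, 2)], 4))   # 0.9403, cf. footnote 6
```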

One radix-4 D-CORDIC block is illustrated in figure 3.10(b). Most of its components are identical to the D-CORDIC version presented in the previous section. The major difference between the two blocks is caused by the magnitude of the rotations. In the other algorithms, every shift was known in advance, because the algorithms always perform one rotation in each iteration. This is different now, where not only the direction but also the magnitude of the rotation varies, since a rotation can be $\sigma_i \in \{-2, -1, 0, 1, 2\}$. First of all, it is now possible to skip a rotation. The normal solution in this situation would be to skip the entire block and reduce power consumption. However, for simplicity this is not implemented at this point, as it might not be as beneficial as it seems, since on average only 2 of the 12 iterations are skipped. Instead, the situation is handled by adding zero to the previous value.⁷ This is done by the extra-shift blocks, which set their output to zero when a rotation should be skipped. This could also have been implemented with a simple AND gate, but the extra-shift block has another task as well. When a rotation of $\sigma_i \in \{-2, 2\}$ is selected, a double rotation is performed, and equation 2.22 shows that the shifted value must then be multiplied by 2 or $2^2$ (depending on the first or second part of the equation). This means that the input has to be left-shifted by either one or two bits. When $\sigma_i \in \{-1, 1\}$ is selected, a simple rotation is performed, and the input is just passed through the extra-shift blocks. Four of these blocks are necessary, as $\sigma_i$ occurs twice in each of the expressions for the x and y variables.

6 Compensation for the scaling factor in the first three iterations: $\frac{1}{1 + 1^2 \cdot 4^{-2}} \cdot \frac{1}{1 + 0^2 \cdot 4^{-4}} \cdot \frac{1}{1 + 2^2 \cdot 4^{-6}} = 0.9403$.
7 For example, $x_{i+1} = x_i(1 - 0) - 0 \cdot y_i$.

Cycle     1    2    3    4       5        6
Block 1   1    3    5    7       9        11
Block 2   2    4    6    8(s4)   10(s5)   12(s6)

Table 3.2: Ordering of the fixed iterations for the radix-4 implementation. s4 means that the scaling compensation for i = 4 is evaluated in the 4th cycle in block 2.
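The behaviour of an extra-shift block can be summarised as "multiply by $|\sigma_i|$ or $\sigma_i^2$ without a multiplier". The Python sketch below is under stated assumptions: `power` selects the linear or squared term (our reading of equation 2.22), the sign of $\sigma_i$ is handled by the following adder/subtractor, and the function name is hypothetical.

```python
def extra_shift(v: int, sigma: int, power: int) -> int:
    """Multiply v by |sigma|**power for sigma in {-2, ..., 2} without a multiplier.

    power = 1 for the term linear in sigma, power = 2 for the sigma^2 term.
    The sign of sigma is handled elsewhere (add or subtract in the adder).
    """
    magnitude = abs(sigma)
    if magnitude == 0:
        return 0              # skipped rotation: the adder just adds zero
    if magnitude == 1:
        return v              # simple rotation: pass through
    if magnitude == 2:
        return v << power     # double rotation: left shift by one or two bits
    raise ValueError("radix-4 digits are limited to -2..2")


print([extra_shift(5, s, 2) for s in (-2, -1, 0, 1, 2)])  # [20, 5, 0, 5, 20]
```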

Another important addition in the radix-4 block is the opmode-shifters at the end of the y and z variables. These two blocks depend on the operation mode: in rotation mode z has to be multiplied by 4, and in vectoring mode y is multiplied by 4.

Placing the scaling block (figure 3.10(a)) and the radix-4 block (figure 3.10(b)) next to each other is not a coincidence. After iteration 7, two of the $2^{-4i}$ shifters and two of the adders are no longer needed in the radix-4 block. This is a useful property, because the exact same shift is used in the scaling block. This means that, for the last three iterations, the otherwise unused shifters are used for scaling compensation. The ordering is shown in table 3.2, where s4 denotes the scaling compensation for iteration 4. Of course, this only affects rotation mode; in vectoring mode, the shifters are still unnecessary after iteration 7. The logic requirement is therefore reduced, since the compensation for the scaling factor does not require additional logic.

3.4.1 Selection function

Since $\sigma_i$ decides not only the rotation direction but also the magnitude of the rotation, a selection function is needed. The selection functions for vectoring and rotation mode are different, and both are described below.

In rotation mode, the selection is extremely simple to implement. Recall from equation 2.23 that the z variable is multiplied by 4 in each iteration. As a result, the selection intervals are identical in each iteration. The selection intervals are given by the values in table 2.2, and with these intervals the selection of $\sigma_i$ only depends on the integer part of the z variable.


If z is negative, the value is negated before σ is selected. Negating a two's complement value normally requires an addition because of the carry, but that is not necessary in this step; inverting the bits is enough. The worst thing that could happen is that a wrong selection interval is chosen, but this has no effect because of the overlap in the selection intervals. For instance, if z = −3, which in binary is $101.00000_{\mathrm{bin}}$, then $|\sigma_i| = 2$ should be selected. If z is only inverted, the result is the binary value $010.11111_{\mathrm{bin}}$, which means that $|\sigma_i| = 1$ is selected instead. However, this has no effect, since that selection does not violate the selection intervals in table 2.2. The "wrong" selection is compensated for in the following iteration.
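The carry-free negation can be reproduced with Python's bitwise NOT, which computes the one's complement $-z - 1$. The sketch uses 5 fractional bits and 3 integer bits to match the binary example above. The interval rule in `select_magnitude` is an assumption inferred from the boundaries quoted in the text (integer part 0 gives 0, 1 or 2 gives 1, 3 or more gives 2), since table 2.2 itself is not reproduced here; all names are hypothetical.

```python
FRAC = 5                      # few fractional bits, matching the example in the text
WIDTH = FRAC + 3              # three integer bits, including the sign
MASK = (1 << WIDTH) - 1
ONE = 1 << FRAC


def bits(v: int) -> str:
    """Two's complement bit pattern of v, formatted as 'iii.fffff'."""
    s = format(v & MASK, f"0{WIDTH}b")
    return s[:WIDTH - FRAC] + "." + s[WIDTH - FRAC:]


def select_magnitude(z_fix: int) -> int:
    """|sigma| from the integer part of |z|; a negative z is only bit-inverted.

    Assumed interval rule: integer part 0 -> 0, 1 or 2 -> 1, 3 or more -> 2.
    """
    mag = ~z_fix if z_fix < 0 else z_fix   # ~z = -z - 1: no carry chain needed
    int_part = mag >> FRAC
    return 0 if int_part == 0 else 1 if int_part <= 2 else 2


z = -3 * ONE
print(bits(z), bits(~z))      # 101.00000 010.11111
print(select_magnitude(z))    # 1 instead of 2, still inside the overlapping intervals
```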

In vectoring mode, the selection function is slightly more complicated, because the selection now also depends on the x value. It requires two adders, as explained in section 2.3.2, where it was also shown that only 5 fractional bits of x are required. It is only necessary to compute the intervals for the first three iterations; in the subsequent iterations, the multiplication would be constant. However, for simplicity, this feature is not exploited at this point, and the intervals are therefore computed in each iteration. After the intervals are computed, they are compared to the y value to select the appropriate value of $\sigma_i$. This is done by a subtraction between the computed intervals and y.

The radix-4 algorithm has the ability to skip an iteration if $\sigma_i = 0$. In theory this would reduce the power consumption; however, extra logic has to be inserted (either latches/multiplexers or AND gates), and this extra logic might equalize or even increase the power consumption. This trade-off therefore has to be examined before implementing the feature. It is also important to remember that a skipping technique only works if the circuit knows the value of $\sigma_i$ before the cycle in which it is needed. If $\sigma_i$ has not settled before the other signals arrive at the block, the logic still toggles even though the iteration should be skipped. Since two radix-4 blocks are used in one cycle, $\sigma_i$ is only known in the previous cycle for the first radix-4 block. For the second block, $\sigma_i$ is evaluated in the same cycle in which it is used. For these reasons, the skipping technique was not implemented for the radix-4 algorithm.

The architecture for the radix-4 implementation is shown in figure 3.11. One radix-4 block is 120% larger than a CORDIC block, and therefore the 50% reduction in iterations does not pay off. In fact, the total area is increased by 20%, and the power consumption is only reduced by 2%. So going from radix-2 to radix-4 has little effect on the power consumption.

Figure 3.11: Architecture for radix-4 implementation.


Cycle            1    2    3    4    5       6
Initialization   1    -    -    -    -       -
Block 1          2    3    4    5    6(s3)   7(s4)
Last iteration   -    -    -    -    -       8

Table 3.3: Ordering of the iterations for the radix-8 implementation.


3.5 Radix-8 D-CORDIC

The radix-8 D-CORDIC algorithm requires 8 iterations to guarantee a 24-bit result with a convergence range of [0; π/4].⁸ A radix-8 block is naturally more complicated than a radix-4 block, so it is debatable whether a solution with two radix-8 blocks could compete with the radix-4 implementation. It is possible to reduce the required number of full iterations to 6, at the cost of some extra logic to evaluate the initialization and the last iteration.

The ordering of the iterations is shown in table 3.3. The first iteration is evaluated by the initialization, which is explained below. Iterations 2 to 7 are evaluated in the radix-8 block, and the final iteration is evaluated with an extra minor iteration. The idea behind an extra minor iteration at the end is simple: there is no need to evaluate all the variables in the last iteration. For instance, to compute atan (variable $z_8$) it is not necessary to evaluate $x_8$ and $y_8$. Therefore, the last iteration can be evaluated with significantly less logic than a complete radix-8 iteration. $x_8$ and $y_8$ would only be used for the next rotation, and since i = 9 is not necessary for 24-bit precision, they can be omitted. The extra logic is shown in figure 3.14, and it can be seen that two adders and four multiplexers are needed. It is important to remember that the other implementations evaluate both cos and sin in rotation mode, whereas this implementation only updates either cos or sin in the last iteration.

For the initialization, the idea is that the first rotation is relatively simple to evaluate and can be computed with less logic than a complete iteration. The new initialization values are shown below.

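$$
\begin{aligned}
x_{2,\mathrm{rot}} &= x_{\mathrm{table}}, & x_{2,\mathrm{vec}} &= (1 - \sigma^2 2^{-6}) - \sigma a 2^{-2},\\
y_{2,\mathrm{rot}} &= y_{\mathrm{table}}, & y_{2,\mathrm{vec}} &= 8\left(a(1 - \sigma^2 2^{-6}) + 2\sigma\right),\\
z_{2,\mathrm{rot}} &= 8\left(a - 2^{4}\operatorname{atan}(\sigma 2^{-3})\right), & z_{2,\mathrm{vec}} &= 2\operatorname{atan}(\sigma 2^{-6})
\end{aligned}
\tag{3.6}
$$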

It might seem complicated, but it can be implemented with 5 adders, 2 multiplexers and a lookup table. The circuit is shown in figure 3.12, and an explanation follows below. The left part of the figure is for rotation mode and the right part is for vectoring mode.

Figure 3.12: Initialization for the radix-8 implementation.

8 With the reduced convergence range, the first iteration i = 0 is always zero.

Variable x: In rotation mode, the initialization value is stored in a lookup table. The index is found by performing two pre-iterations. In vectoring mode, the initialization requires one adder to evaluate the right-hand side of equation 3.6 in the case σ = 3. The parenthesized term in the equation can be stored in a lookup table, as there are only five different outcomes. The subtraction also requires an adder, but the size of this adder can be reduced by looking at the range of the parenthesized term. For instance, if σ = 1, the term equals $(1 - 1^2 2^{-6}) = 0.984375 = 0.1111110_{\mathrm{bin}}$, which only requires a 7-bit adder.
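The values of the parenthesized term $(1 - \sigma^2 2^{-6})$ and their short binary forms can be tabulated exactly with Python's `fractions` module. The sketch below covers σ = 1, 2, 3 only; σ = 0 gives exactly 1, which needs an integer bit. The helper names are hypothetical.

```python
from fractions import Fraction


def parenthesis(sigma: int) -> Fraction:
    """(1 - sigma^2 * 2^-6) from equation 3.6, as an exact fraction."""
    return 1 - Fraction(sigma * sigma, 64)


def to_binary(value: Fraction, frac_bits: int = 7) -> str:
    """Binary string '0.bbbbbbb' of a value in [0, 1) with frac_bits fractional bits."""
    scaled = value * (1 << frac_bits)
    assert scaled.denominator == 1, "value not representable with frac_bits bits"
    return "0." + format(int(scaled), f"0{frac_bits}b")


for s in (1, 2, 3):
    print(s, float(parenthesis(s)), to_binary(parenthesis(s)))
# 1 0.984375 0.1111110
# 2 0.9375 0.1111000
# 3 0.859375 0.1101110
```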

Variable y: In rotation mode, the initialization value is found in a lookup table by pre-iterations, as for the x variable. From the equation, it might seem that three adders are needed in vectoring mode, but one of them can be removed. One of the adders is only required when $\sigma = 3$, and since that value is shifted 6 bits, it can be concatenated with the integer $2\sigma$, which removes one adder.

Variable z: In rotation mode, one adder is required, and three values of $2^4 \operatorname{atan}(\sigma 2^{-3})$ are stored in a lookup table. In vectoring mode, the initialization value is read from a lookup table storing the values of $2 \operatorname{atan}(\sigma 2^{-6})$.
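The arithmetic behind these initialization tables can be checked with a short sketch. It assumes the radix-8 digit set $\sigma \in \{-4, \dots, 4\}$, so that $\sigma^2$, and hence the parenthesis $1 - \sigma^2 2^{-6}$, takes exactly five values, and it models fixed-point words as Python integers with a chosen number of fractional bits. All helper names are hypothetical. `concat_integer` illustrates why adding the integer $2\sigma$ to a term lying entirely below the binary point needs no adder: no carry can cross the binary point, so the sum is a plain concatenation (a bitwise OR), provided the fractional term is non-negative.

```python
from fractions import Fraction


def x_vec_parenthesis(sigma: int) -> Fraction:
    """(1 - sigma^2 * 2^-6), the parenthesis in x_{2,vec} of equation 3.6."""
    return 1 - Fraction(sigma * sigma, 2 ** 6)


def to_binary(v: Fraction, frac_bits: int) -> str:
    """Binary string 'i.ffff' of a non-negative value, truncated to frac_bits."""
    integer = int(v)
    rest = v - integer
    bits = []
    for _ in range(frac_bits):
        rest *= 2
        bit = int(rest)
        bits.append(str(bit))
        rest -= bit
    return f"{integer}." + "".join(bits)


def concat_integer(int_word: int, frac_word: int, frac_bits: int) -> int:
    """Add an integer (raw fixed-point word whose low frac_bits are zero) to a
    non-negative value that lives entirely below the binary point.
    No carry can occur, so the 'addition' is just wiring the fields together."""
    low_mask = (1 << frac_bits) - 1
    assert (int_word & low_mask) == 0, "integer part must have empty fraction bits"
    assert 0 <= frac_word <= low_mask, "fraction must be non-negative and below 1"
    return int_word | frac_word


# The five distinct parenthesis values (sigma and -sigma give the same sigma^2).
for s in range(5):
    p = x_vec_parenthesis(s)
    print(s, float(p), to_binary(p, 7))
```

For $\sigma = 1$ the sketch reproduces the value $0.1111110_{\text{bin}}$ quoted above; the concatenation check holds for negative integers $2\sigma$ as well, because their two's-complement fraction bits are also zero.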

The radix-8 version only requires two pre-iterations, whereas the radix-4 version requires three. The reason is that the compensation for the scaling factor can be approximated as $K_i = 1 - \sigma_i^2 8^{-2i}$ already from iteration $i = 3$ in the radix-8 implementation. The scaling iterations for $i = 3$ and $i = 4$ are inserted at iterations 6 and 7, as shown in table 3.3.
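A sketch of why the linear compensation is sufficient from $i = 3$. It assumes that one radix-8 double rotation grows the vector magnitude by $1 + \sigma_i^2 8^{-2i}$ (the squared magnitude of $1 + j\sigma_i 8^{-i}$), so the exact compensation is $1/(1 + t)$ with $t = \sigma_i^2 8^{-2i}$, and the error of $K_i = 1 - t$ is $t^2/(1 + t)$. It further assumes $|\sigma_i| \le 4$ and a 24-bit target; exact rational arithmetic avoids rounding effects. The function names are illustrative only.

```python
from fractions import Fraction

TARGET_BITS = 24        # precision target from the text
DIGITS = range(1, 5)    # assumed radix-8 digit magnitudes |sigma| = 1..4


def growth(i: int, sigma: int) -> Fraction:
    """t = sigma^2 * 8^(-2i): assumed magnitude growth term of one double rotation."""
    return Fraction(sigma * sigma, 8 ** (2 * i))


def linear_error(i: int, sigma: int) -> Fraction:
    """Error of K_i = 1 - t against the exact compensation 1 / (1 + t)."""
    t = growth(i, sigma)
    return abs(1 / (1 + t) - (1 - t))  # equals t^2 / (1 + t)


def first_linear_iteration(max_i: int = 8) -> int:
    """Smallest i for which the linear form is below 2^-TARGET_BITS for every digit."""
    bound = Fraction(1, 2 ** TARGET_BITS)
    for i in range(1, max_i + 1):
        if all(linear_error(i, s) < bound for s in DIGITS):
            return i
    raise ValueError("no iteration meets the bound")


print(first_linear_iteration())  # 3
```

For $i = 2$ and $\sigma_2 = 4$ the error is about $2^{-16}$, well above $2^{-24}$, while for $i = 3$ it is about $2^{-28}$.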

One radix-8 block is shown in figure 3.13. The block consists of 9 adders, 11 shifters, 5 multiplexers and a selection function. Since $\sigma_i$ is no longer a power of two, extra adders are needed for the case $\sigma_i \in \{3, -3\}$. In this situation the variables x and y have to be multiplied by 3 and 9. Multiplication by 3 is solved by a one-bit left shift and an addition, and multiplication by 9 by a three-bit left shift and an addition. Therefore, the radix-8 block requires four more adders than a radix-4 block. Apart from the extra adders and an additional shift sequence, the radix-8 block is implemented in the same way as the radix-4 block. The selection function is similar to the one in the radix-4 implementation; the difference is that the number of intervals increases from two to four.

Figure 3.13: Schematic of a radix-8 block.
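The shift-and-add multiplications needed for the radix-8 digits can be sketched as follows, assuming integer (fixed-point) operands and the digit set $\{-4, \dots, 4\}$; the function names are hypothetical. Only $|\sigma| = 3$ (for $\sigma x$ and $\sigma y$) and $\sigma^2 = 9$ need an adder, which matches the four extra adders mentioned above (two per variable).

```python
def times_sigma(v: int, sigma: int) -> int:
    """sigma * v for a radix-8 digit sigma in {-4, ..., 4}.
    Powers of two are pure shifts; |sigma| = 3 needs one adder."""
    m = abs(sigma)
    if m == 0:
        r = 0
    elif m == 3:
        r = (v << 1) + v               # 3v = 2v + v: one shift, one adder
    else:
        r = v << (m.bit_length() - 1)  # 1, 2, 4: shift by 0, 1, 2 bits
    # The sign is applied separately (e.g. add vs. subtract in the next adder).
    return -r if sigma < 0 else r


def times_sigma_squared(v: int, sigma: int) -> int:
    """sigma^2 * v with sigma^2 in {0, 1, 4, 9, 16}: only 9 needs an adder."""
    sq = sigma * sigma
    if sq == 0:
        return 0
    if sq == 9:
        return (v << 3) + v            # 9v = 8v + v: three-bit shift, one adder
    return v << (sq.bit_length() - 1)  # 1, 4, 16: shift by 0, 2, 4 bits


print([times_sigma(5, s) for s in range(-4, 5)])
print([times_sigma_squared(5, s) for s in range(0, 5)])
```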

The final architecture for the radix-8 implementation is shown in figure 3.14. Its power consumption is 22% lower than that of the CORDIC implementation, which also makes the radix-8 implementation significantly better than the radix-4 implementation.

3.6 Scaling free

The scaling free algorithm relies on a different mathematical concept, but the idea behind the VHDL implementation is the same. The algorithm requires 24 iterations to guarantee a 24-bit result, but unlike the other implementations, the magnitudes of the first 7 iterations are identical. The ordering of the iterations is listed in figure 3.15(b), where it can be seen that the largest rotation is $i = 8$. The initialization of the variables in the scaling free algorithm is found via a lookup table, as explained in section 2.5.1. The idea is to start a rotation not from zero, but from the nearest pre-computed angle. An appropriate index size for the pre-computed angles was found to be $2^{-5}$ (table 2.5).

Figure 3.14: Architecture for radix-8 implementation.

Figure 3.15: Decoding the z variable (a) and ordering of the iterations for the scaling free implementation (b).

No extra logic is needed to find the nearest pre-computed angle, because it can be read directly from the first 5 bits of the input value. This would give $2^5 = 32$ index values; however, since the convergence range is limited to $[0, \pi/4]$, there is no reason to store values above $0.78125$ ($0.11001_{\text{bin}}$), and therefore only 25 angles have to be stored. With the initialization of the x and y variables in place, the remaining step is the z variable. Since this algorithm only supports rotation mode, the z variable can be removed. The z variable is responsible for selecting the iterations that have to be performed, and for the scaling free algorithm this is extremely simple: the rotations are always positive, and the residual angle is found by $z_{i+1} = z_i - 2^{-i}$, which is easy to calculate. To see whether a rotation is needed in iteration $i$, it is therefore sufficient to look at the logical value of bit $i$ of z. Consequently, the z path can be removed from the design, making it less complex.

The first 7 iterations are special, since they are all performed with $i = 8$; consequently, looking at the 8th bit of z is not enough to decide whether a rotation is needed in each of them. Instead, we know that the first 5 bits are used as an index to the lookup table, and from bit number 9 onwards the bits directly indicate the rotations for $i > 8$. The only remaining bits are bits 6, 7 and 8. Since $2^{-6} = 4 \cdot 2^{-8}$ and $2^{-7} = 2 \cdot 2^{-8}$, these three bits encode a count from 0 to 7 of rotations by $2^{-8}$, and therefore control the first 7 iterations. By decoding these 3 bits, the whole z path can be removed. The idea is illustrated in figure 3.15(a): the first 5 bits of a are used to index the lookup table, the next 3 bits are decoded to control the first 7 iterations, and the remaining bits control the iterations for $i > 8$. For example, the input value $0.01101\,011\,011001\ldots_{\text{bin}}$ means that the index $01101_{\text{bin}}$ is used to find the nearest pre-computed angle. The next 3 bits ($011_{\text{bin}}$) indicate that four of the first seven iterations are skipped, and the last bits indicate that iterations 9, 12, 13, ... are skipped. The main point is that all the required iterations are known in advance, before the rotations begin.

Figure 3.16: Implementation of a scaling free block.
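The decoding of figure 3.15(a) can be sketched by splitting the fractional bits of the input angle into three fields. The sketch assumes the angle is given as a string of fractional bits (bit 1 first) and that each bit from position 9 onwards selects a rotation by $2^{-i}$; the function names are hypothetical. It reproduces the example above and checks that the three fields add back up to the original angle, i.e. that no z datapath is needed.

```python
from fractions import Fraction


def decode_angle(bits: str):
    """Split the fractional bits of the input angle as in figure 3.15(a)."""
    index = int(bits[0:5], 2)    # nearest pre-computed angle, step 2^-5
    coarse = int(bits[5:8], 2)   # number of 2^-8 rotations among the first seven
    fine = [i for i, b in enumerate(bits[8:], start=9) if b == "1"]  # rotations i > 8
    return index, coarse, fine


def reassemble(index: int, coarse: int, fine: list) -> Fraction:
    """Angle covered by the table entry plus all rotations that are performed."""
    return (Fraction(index, 2 ** 5) + Fraction(coarse, 2 ** 8)
            + sum((Fraction(1, 2 ** i) for i in fine), Fraction(0)))


bits = "01101" "011" "011001"
index, coarse, fine = decode_angle(bits)
print(index, coarse, fine)  # 13 3 [10, 11, 14]
```

With three of seven coarse rotations performed, four are skipped, and the fine list leaves out iterations 9, 12 and 13, as in the text.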

The implementation of a scaling free block is shown in figure 3.16. The block consists of four adders, four shifters, an inverter and two multiplexers. The multiplexers control the skipping of the iterations. This could also be implemented with latches and demultiplexers, possibly decreasing the power consumption, but to keep the circuit simple, multiplexers are used at this point. As in the other implementations, two of the shifters and adders are not used after $i = 15$.
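A minimal model of one scaling-free micro-rotation is sketched below. It assumes the second-order Taylor form $\cos 2^{-i} \approx 1 - 2^{-(2i+1)}$ and $\sin 2^{-i} \approx 2^{-i}$ (an assumption here; the exact formulation is the one in section 2.5) and a hypothetical 30-bit fraction. Each output needs two shifts and two additions or subtractions, which is consistent with the four adders and four shifters of the block; the skipping multiplexers and the inverter are not modelled.

```python
import math

FRAC = 30  # assumed internal word: 30 fractional bits


def sf_micro_rotation(x: int, y: int, i: int) -> tuple:
    """One scaling-free rotation by 2^-i with the assumed Taylor terms:
    x' = x - x*2^-(2i+1) - y*2^-i,  y' = y - y*2^-(2i+1) + x*2^-i."""
    x_new = x - (x >> (2 * i + 1)) - (y >> i)
    y_new = y - (y >> (2 * i + 1)) + (x >> i)
    return x_new, y_new


one = 1 << FRAC
x, y = sf_micro_rotation(one, 0, 8)
print(abs(x / one - math.cos(2 ** -8)) < 2 ** -24,
      abs(y / one - math.sin(2 ** -8)) < 2 ** -24)
```

Under these assumptions the largest neglected term of the sine approximation is about $2^{-3i}/6$, which drops below $2^{-24}$ at $i = 8$ but not at $i = 7$; this fits with the largest rotation being $i = 8$ and the coarser part of the angle coming from the lookup table.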

The final architecture is shown in figure 3.17. The scaling free version has the lowest power consumption: compared to the CORDIC algorithm, the power consumption is reduced by 34%. However, it is important to remember that the scaling free algorithm only supports rotation mode. Some of the other algorithms might perform better if they were optimized to support only one mode. It is therefore difficult to say whether it is the best algorithm when only rotation mode is required.

3.7 Summary

The synthesis was performed in Synopsys to obtain timing, area and power characteristics. The top level of the architecture under synthesis is inserted into a testbench, which reads a randomly generated input every 6th cycle. The input is passed through a register and into the top level. The synthesis tool compiles the design to run with a clock frequency of 50 MHz and optimizes for low power. This section contains a brief summary of the results; for details, see chapter 5. The obtained results are listed in tables 3.4 and 3.5 for power and area, respectively. The dynamic power in table 3.4 is the sum of the internal and switching power reported by the synthesis tool. The difference between the two is explained at the beginning of the next chapter.

Figure 3.17: Architecture for scaling free implementation.

| Implementation | Mode | Dynamic power (mW) | Leakage power (µW) | Total power (mW) |
|---|---|---|---|---|
| CORDIC [30] | Rotation | 1.294 | 8.54 | 1.302 (1.00) |
| CORDIC [30] | Vectoring | 1.262 | 8.57 | 1.271 (0.98) |
| D-CORDIC [11] [19] | Rotation | 1.883 | 21.15 | 1.904 (1.46) |
| D-CORDIC [11] [19] | Vectoring | 1.897 | 21.15 | 1.918 (1.47) |
| D-CORDIC [11] [19] | Invert rotation | 2.019 | 21.16 | 2.040 (1.57) |
| Radix-4 D-CORDIC | Rotation | 1.273 | 7.37 | 1.280 (0.98) |
| Radix-4 D-CORDIC | Vectoring | 1.283 | 7.32 | 1.290 (0.99) |
| Radix-8 D-CORDIC | Rotation | 1.009 | 5.14 | 1.014 (0.78) |
| Radix-8 D-CORDIC | Vectoring | 1.002 | 5.09 | 1.007 (0.78) |
| Scaling free [17] | Rotation | 0.820 | 3.47 | 0.824 (0.64) |

Table 3.4: Power results at 50 MHz. Ratio in parentheses.

The power results show that the flexibility of the D-CORDIC algorithm is expensive, as it uses almost 50% more power than the original CORDIC algorithm. The flexibility costs an extra variable and two adders per block, which seems very expensive. In addition, the total area is increased by 67%. The power gain for the radix-4 implementation is very small, as its power consumption is almost identical to that of the CORDIC algorithm. This seems strange, since the number of blocks is reduced from 4 to 2, but the higher radix significantly increases the complexity of each radix-4 block. In fact, the area of the radix-4 implementation is increased by 14%, even though it uses a total of 12 adders compared to the 13 required in the CORDIC implementation. This indicates that the shifters needed for the radix-4 implementation consume a significant amount of area and power. Increasing the radix to 8 does have a positive effect on the power consumption, which is reduced by 22% compared to the CORDIC algorithm. As expected, the scaling free implementation has the best performance in terms of power. However, as explained in section 2.5, this algorithm only supports rotation mode.

| Implementation | Area (µm²) | SVT cells | HVT cells | CPA |
|---|---|---|---|---|
| CORDIC [30] | 22438 (1.00) | 287 | 2529 | 13 |
| D-CORDIC [11] [19] | 37489 (1.67) | 625 | 3726 | 29 |
| Radix-4 D-CORDIC | 25592 (1.14) | 232 | 3107 | 12 |
| Radix-8 D-CORDIC | 26531 (1.19) | 133 | 2985 | 14 |
| Scaling free [17] | 20112 (0.89) | 61 | 2370 | 16 |

Table 3.5: Area results. Ratio in parentheses.

The radix-4/8 and the scaling free algorithms have the property that they can skip unnecessary rotations. As explained, no skipping technique has been implemented in any of the implementations, so the question is what the impact would be if such a technique were implemented. For the radix-4/8 implementations roughly 20% of the rotations are skipped, and for the scaling free implementation 50%. Skipping a rotation means that the logic retains its value from the previous cycle. What is important to remember is that the internal bits are getting closer to the final result, so passing the signals through an unnecessary rotation does not cause many extra transitions. In addition, extra logic in the form of latches, multiplexers or and-gates is needed to control the skipping technique. Given these arguments, a skipping technique does not seem to be an advantage. Additionally, for a skipping technique to be feasible, the enable signal should be known before the cycle where it is needed; this property is satisfied in the radix-8 and scaling free implementations. If this property is not satisfied, extra activity might occur before the skipping signal ($\sigma_i = 0$) is stable.

Recall that the iterations of the scaling free implementation are ordered as shown in figure 3.15(b). Therefore, the probabilities that two or three iterations within the same block are all skipped are only 25% and 12.5%, respectively. This means that the probability that the same block should be skipped in two subsequent cycles is small. This analysis, together with the fact that the internal bits get closer to the final result, means that a skipping technique is not required. Skipping could, however, be useful in a fully iterative architecture with only one block. In that situation, skipping iterations could make the algorithm execute on average 50% faster.

The purpose of implementing all the algorithms was to see whether an optimal solution exists. It is difficult to conclude at this point, since they do not have the same flexibility. However, the results show that supporting acos and asin means a significant overhead in terms of area and power: in rotation mode, the D-CORDIC implementation consumes 87% more power than the radix-8 implementation. The radix-8 algorithm presented in section 2.4 is therefore considered a good choice for a low power CORDIC algorithm. In the next chapter, the radix-8 D-CORDIC algorithm is optimized for low power.

The timing characteristics obtained from the synthesis are not that important, since the 50 MHz clock constraint is easily met by the synthesis tool. One important observation, however, is that most of the inserted cells are HVT cells, as can be seen in table 3.5. This is also visible in the static power consumption: in the D-CORDIC implementation, for instance, the static power only accounts for 1% of the total power consumption, even though D-CORDIC is the architecture with the highest percentage of SVT cells (14%).


Chapter 4

Optimization for low power

The purpose of this chapter is to examine different low power techniques and their effect on the radix-8 D-CORDIC architecture. Designing for low power is not a trivial task, as there is no guarantee that every available low power technique has a positive effect on the power consumption of the radix-8 implementation. Before the low power techniques are presented, this section accounts for the power dissipation in a CMOS circuit. The dissipation comes from three components [32].

$$
P_{\text{total}} = P_{\text{static}} + \underbrace{P_{\text{load}} + P_{\text{sc}}}_{\text{dynamic}}
\tag{4.1}
$$

Static dissipation: Due to sub-threshold conduction, gate leakage current and junction leakage current. The static power is consumed even when the logic is inactive, so it cannot be reduced by lowering the activity; the most direct way to remove it is to shut down the circuit completely. This can be seen from equation 4.2, which does not depend on the activity.

$$
P_{\text{static}} = V_{DD} I_{\text{leakage}}
\tag{4.2}
$$

Another way is to increase the threshold voltage. Since the library used in this thesis supports both SVT and HVT cells, using HVT cells instead of SVT cells reduces the leakage power. HVT cells are slower than SVT cells, so they can only be used where they do not violate the timing constraints. The Synopsys synthesis tool already inserts HVT cells where possible, as is visible in table 3.5, where about 96% of the cells in the radix-8 implementation are HVT cells. The only way to use a larger percentage of HVT cells is to shorten the critical path. The static power consumption can also be reduced by decreasing the area, but since every inserted cell serves a purpose, it might be difficult to reduce the number of cells.

Load capacitance: Due to charging and discharging of the load capacitance. $P_{\text{load}}$ is proportional to the activity, the output load, the square of the supply voltage and the clock frequency, as seen in equation 4.3. Since the supply voltage and the clock frequency are fixed, the only way to reduce this part of the dynamic power consumption is to reduce the activity and the output load.

$$
P_{\text{load}} = \alpha C V_{DD}^2 f
\tag{4.3}
$$

The activity can, for instance, be reduced by disabling unnecessary parts of the circuit, by clock gating and by decreasing the logical depth.
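As a rough worked example of equation 4.3, the radix-8 rotation-mode dynamic power from table 3.4 (1.009 mW) can be converted into a lumped switched capacitance $\alpha C$, using $V_{DD} = 1.0$ V and $f = 50$ MHz from the synthesis setup. This treats all dynamic power as $P_{\text{load}}$ and ignores $P_{\text{sc}}$, so it is only an estimate; the function name is hypothetical.

```python
def effective_switched_capacitance(p_dyn_w: float, vdd_v: float, f_hz: float) -> float:
    """Solve P = alpha * C * VDD^2 * f (equation 4.3) for the product alpha * C, in farads."""
    return p_dyn_w / (vdd_v ** 2 * f_hz)


# Radix-8 D-CORDIC, rotation mode: 1.009 mW dynamic power at 1.0 V and 50 MHz.
alpha_c = effective_switched_capacitance(1.009e-3, 1.0, 50e6)
print(f"alpha*C = {alpha_c * 1e12:.2f} pF")
```

The result is roughly 20 pF. With $V_{DD}$ and $f$ fixed, the techniques in this chapter can only act on this product, either on the activity $\alpha$ or on the switched capacitance $C$.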

Short-circuit: Due to a current pulse from $V_{DD}$ to GND for a short period. Since the input rise and fall times are greater than zero, both the pMOS and nMOS networks are temporarily on for a short period. Equation 4.4 shows the short-circuit power

$$
P_{\text{sc}} = t_{\text{sc}} V_{DD} I_{\text{peak}} f
\tag{4.4}
$$

and one way to reduce the short-circuit power is to reduce the activity.

The Synopsys synthesis tool does not distinguish between $P_{\text{load}}$ and $P_{\text{sc}}$ in the dynamic power. Instead, the power results from the synthesis show the internal power and the switching power. The internal power is the power used for charging and discharging the load capacitance within the cell, plus the short-circuit power. The switching power is the power dissipated by charging and discharging the load capacitance at the output of the cell.

This chapter focuses on decreasing the dynamic power consumption by reducing the switching activity of the radix-8 architecture. The following techniques are used: Gray coding, toggle prevention, clock gating and other circuit transformations. Since the power consumption in rotation and vectoring mode is almost identical, the chapter only considers rotation mode unless otherwise specified. All results are obtained with a clock frequency of 50 MHz.


| State | Binary counter | Gray counter |
|---|---|---|
| 0 | 000 | 000 |
| 1 | 001 | 001 |
| 2 | 010 | 011 |
| 3 | 011 | 010 |
| 4 | 100 | 110 |
| 5 | 101 | 100 |
| # transitions | 10 | 6 |

Table 4.1: Number of transitions for a binary and a Gray counter, counting from 0 to 5.

4.1 Gray coding

A simple way to reduce the power consumption in the controller is to use Gray coding [18]. The controller was not described in the previous chapter, because it is a simple 3-bit counter counting from 0 to 5 to distinguish between the 6 clock cycles. The usual way to implement this is a binary counter that increments the 3 bits from 000 to 101. However, this is an expensive way to increment the counter, because it causes unnecessary transitions. For instance, incrementing the counter from 3 to 4 results in 3 transitions, since every bit changes its logical value (011 → 100). Reordering the counting sequence can reduce the total number of transitions, as illustrated in table 4.1.

The middle column in table 4.1 shows the transitions for a binary counter that counts from 0 to 5. The total number of transitions is 10, of which 8 come from counting from 0 to 5 and the last 2 from resetting the counter to zero. The Gray coding in the right column reduces the number of transitions to 6. Since the counter only occupies 1.2% of the total area and uses 2% of the total power consumption, it might seem optimistic to expect a noticeable difference from removing four transitions in the counter. However, the coding also has a positive effect in other parts of the circuit. The output of the counter is, for instance, used in the shifters to determine the number of bits to shift the input. With a binary counter, a shifter has to wait until the counting signal settles, meaning that all 3 bits have changed to their final values. The question is what happens if one of the 3 bits arrives later than the other two. For example, in the change from cycle 3 to 4 the counting signal experiences 3 transitions, and in the worst case none of the bits arrive simultaneously. A shifter would then start toggling and send a wrong output through the subsequent logic. The result is unnecessary activity (glitches) not only in the shifter but also in a large part of the circuit. With Gray coding this situation is avoided, since only one bit changes from one cycle to the next.

| Input | Enable | and-gate output | Multiplexer output |
|---|---|---|---|
| 010111 | 1 | 010111 | 010111 |
| 110001 | 0 | 000000 | 010111 |

Table 4.2: Deactivating blocks with either an array of and-gates or a multiplexer.

With Gray coding, a shifter experiences less unwanted activity due to late arrival of one of the counting bits. The change from binary to Gray coding reduces the power consumption within the shifters by 29% and the total power consumption in rotation mode by 5.1%. It is notable that merely reordering the three bits of the controller reduces the total power consumption by 5.1%.
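The transition counts of table 4.1 and the late-arrival argument can be reproduced with a small sketch (helper names are hypothetical). `transient_codes` enumerates the intermediate codes a shifter may briefly decode when the changing counter bits arrive in an arbitrary order, which is a simple model of the glitch scenario described above, not a timing simulation.

```python
from itertools import combinations

BINARY = [0b000, 0b001, 0b010, 0b011, 0b100, 0b101]
GRAY = [0b000, 0b001, 0b011, 0b010, 0b110, 0b100]  # sequence from table 4.1


def transitions(seq):
    """Bit flips over one full count cycle, including the wrap back to state 0."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:] + seq[:1]))


def transient_codes(a, b, width=3):
    """Intermediate codes that may appear when the counter goes a -> b
    and the changing bits arrive in an arbitrary order."""
    changing = [k for k in range(width) if ((a ^ b) >> k) & 1]
    seen = set()
    for r in range(1, len(changing)):
        for subset in combinations(changing, r):
            seen.add(a ^ sum(1 << k for k in subset))
    return seen


print(transitions(BINARY), transitions(GRAY))
print(sorted(transient_codes(0b011, 0b100)), sorted(transient_codes(0b010, 0b110)))
```

The binary step 3 → 4 can expose six spurious codes to the shifters, while every Gray step exposes none. Note also that the standard reflected Gray code ($n \oplus \lfloor n/2 \rfloor$) would end at 111 and cost three flips when wrapping back to 000 (8 transitions in total); the sequence in table 4.1 ends at 100 instead.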

4.2 Toggle prevention

Toggle prevention is about reducing the activity of blocks that are not needed. This section describes how to disable some of the inactive blocks to reduce the power consumption.

4.2.1 Disable inactive blocks

Some blocks in the architecture are only used in a limited number of the 6 cycles that one computation takes. Deactivating these blocks when they are not used can therefore reduce the dynamic power consumption. The blocks that can be deactivated are the initialization block and the last-iteration block (see figure 3.14). The initialization is split into two blocks (one for rotation mode and one for vectoring mode); since only one of them is used in a computation, the other block can be deactivated. In addition, the initialization block that is used is only needed in the first clock cycle.

Logic can be deactivated either with an array of and-gates or with multiplexers. Multiplexers are more expensive but also more flexible, since they are not disturbed by changing inputs when the enable signal is low. An and-gate, on the other hand, does not hold the value from the previous cycle, which might give some extra activity when the input changes. An example is illustrated in table 4.2, where the and-gate generates extra transitions when the deactivated block is reset to zero.

| | Last-iteration area (µm²) | Last-iteration power (µW) | Initialization area (µm²) | Initialization power (µW) |
|---|---|---|---|---|
| No toggle prevention | 2024 (1.00) | 74 (1.00) | 4343 (1.00) | 50.3 (1.00) |
| and-gates | 2072 (1.02) | 39 (0.53) | 4813 (1.11) | 13.6 (0.27) |
| Multiplexer | 5071 (2.50) | 119 (1.61) | 5705 (1.31) | 24.3 (0.48) |

Table 4.3: Results from synthesis of different ways to implement toggle prevention. Ratio in parentheses.

For the two initialization blocks this is not a problem, since it is assumed that the input to the numerical processor is stored in a register before the initialization blocks. This register holds the input value throughout the computation until a new computation is requested. Therefore, the initialization blocks are not disturbed by changing inputs, and it is sufficient to use and-gates at their inputs. This also means that there is no reason to deactivate the block in use during the cycles where it is idle, because with an and-gate this would reset its input to zero and generate extra activity. For example, in rotation mode the enable signal for the rotation block is held high through the entire evaluation while the vectoring block is deactivated; in vectoring mode it is the opposite.
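A behavioural sketch of the two gating styles in table 4.2 counts the bit flips seen by the downstream block. The names are hypothetical, and the multiplexer is modelled as re-selecting the last value it passed. The and-gate array costs transitions every time the block is forced to zero and released, while a constant input with the enable held high costs nothing, which is why the initialization block in use is simply left enabled.

```python
def gated(inputs, enables, style):
    """Values seen by the downstream block for an 'and' gate array or a holding 'mux'."""
    out, prev = [], 0
    for value, en in zip(inputs, enables):
        if en:
            cur = value
        elif style == "and":
            cur = 0          # and-gates force the block input to zero
        else:
            cur = prev       # mux keeps selecting the previously passed value
        out.append(cur)
        prev = cur
    return out


def toggles(seq):
    """Number of bit flips between consecutive values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))


inputs, enables = [0b010111, 0b110001], [1, 0]  # the two rows of table 4.2
print(toggles(gated(inputs, enables, "and")), toggles(gated(inputs, enables, "mux")))
# Constant register input (initialization): keep enabled vs. pulse the enable.
print(toggles(gated([5] * 4, [1, 1, 1, 1], "and")), toggles(gated([5] * 4, [1, 0, 1, 0], "and")))
```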

The block for the last iteration is only used in the last clock cycle and can be disabled in the first five cycles. For this block, the input signal is not constant within the 6 cycles, so the best way to deactivate it is not obvious, and synthesis is needed to find the best solution in terms of power and area. Table 4.3 shows the synthesis results for the initialization and last-iteration blocks. The results show that the best solution is to use and-gates for both blocks. Surprisingly, for the last-iteration block the implementation with multiplexers uses more power than the implementation without toggle prevention. In terms of area, the multiplexer solution also requires a significant amount of extra logic.

4.2.2 Clock gating

The Synopsys synthesis tool can insert clock gating automatically. This requires that the clock gating style is set with set_clock_gating_style before the design is elaborated, and that the registers have an enable signal. The automatically inserted clock gating can be either latch-based or latch-free. Choosing latch-free requires that the enable signal is constant from the active rising edge of the clock to the inactive falling edge of the clock. If this property is not satisfied, a latch-based clock gating style should be used; if a latch-free solution is used anyway, extra activity on the gated clock signal might result in wrong values being stored in the register. The most important decision when implementing clock gating is therefore where the enable signal should come from. For the radix-8 architecture, the enable signal could come either from a skipping technique (using σ) or from the clock cycle number. Both solutions are discussed below.
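The difference between latch-free and latch-based clock gating can be illustrated with a discrete-time model. It assumes an AND-type gate and a latch that is transparent while the clock is low; the function names and the waveform are hypothetical, and this is a behavioural sketch rather than the cell that Synopsys inserts. A glitch on the enable during the high clock phase produces a spurious gated clock pulse without the latch but is blocked with it, while a well-behaved enable gives the same result in both styles.

```python
def gated_clock(clk, en, latch_based):
    """Gated clock of an AND-type clock gate, one sample per time step."""
    out, held = [], 0
    for c, e in zip(clk, en):
        if latch_based:
            if c == 0:       # latch is transparent while the clock is low ...
                held = e
            gate = held      # ... and holds its value while the clock is high
        else:
            gate = e         # latch-free: the enable drives the AND gate directly
        out.append(c & gate)
    return out


def rising_edges(wave):
    """Number of 0 -> 1 transitions, starting from a low level."""
    return sum(1 for prev, cur in zip([0] + wave, wave) if prev == 0 and cur == 1)


clk = [1, 1, 1, 1, 0, 0, 0, 0] * 2   # two clock periods, high phase first
glitch = [0] * 16
glitch[2] = 1                         # enable glitches while the clock is high
print(rising_edges(gated_clock(clk, glitch, latch_based=False)),
      rising_edges(gated_clock(clk, glitch, latch_based=True)))
```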

Skipping technique: Since the radix-8 algorithm has the property that some of the iterations can be skipped, the $\sigma_i$ signal could be used to disable the registers. An advantage of this solution is that the enable signal is easy to obtain, since it is computed in the previous cycle. However, only 2 of the 8 iterations are skipped on average, and only two of the three variables can be skipped¹. In addition, there is a problem in rotation mode, where the compensation for the scaling factor is carried out in the last two iterations. These iterations cannot be clock-gated if, for instance, the iteration should be skipped but a scaling compensation is required. Consequently, the variables can only be clock-gated in the last two iterations if the iteration should be skipped and no compensation for the scaling factor is needed. It is therefore doubtful that clock gating with $\sigma_i$ as the enable signal is a feasible solution.

Cycle number: Using the cycle number to control the enable signal relies on the fact that each iteration computes a fixed number of bits of the result. At least, that is the principle of the CORDIC algorithm. Unfortunately, this does not hold entirely for a higher radix implementation, because when a rotation is skipped, the internal bits do not come closer to the final result. This is best explained by an example. Consider the evaluation of $\cos(2^{-8})$, where $x_1$ is initialized to $1.00000\ldots_{\text{bin}}$. If the first rotation is skipped, then $x_2 = 1.00000\ldots_{\text{bin}}$. If the first 4 bits were now clock-gated, the final result would end up as $1.000\ldots_{\text{bin}}$, which is only correct for $\cos(0)$. For $\cos(2^{-8})$, the first 4 bits of the result should be $0.111\ldots_{\text{bin}}$. Therefore, clock gating does not work with this approach, because iterations can be skipped.
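The bit patterns in this example can be checked by truncating both values to a few fractional bits, assuming an unsigned format with one integer bit (the helper name is hypothetical).

```python
import math


def fixed_point_bits(v: float, frac_bits: int) -> str:
    """Truncate a non-negative value to frac_bits fractional bits, as 'i.fff...'."""
    scaled = math.floor(v * (1 << frac_bits))
    integer, frac = divmod(scaled, 1 << frac_bits)
    return f"{integer}." + format(frac, f"0{frac_bits}b")


print(fixed_point_bits(1.0, 3))              # 1.000: value kept after a skipped rotation
print(fixed_point_bits(math.cos(2 ** -8), 3))  # 0.111: leading bits of the correct result
```

Freezing the leading bits after the skipped first rotation would therefore lock in the wrong integer bit.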

Another way to use the cycle number to disable the clock signal is to exploit the fact that there is no reason to store any values in the register in the last clock cycle: in the last cycle, the result is sent to the last-iteration block and stored in the output register. What does this mean in practice? It can be assumed that trigonometric functions are not evaluated frequently in a hearing aid, so the entire CORDIC unit is shut down (properly clock-gated) when it is not needed. Instead of clock gating the trigonometric processor in the cycle after the result is retrieved, the circuit can therefore be clock-gated one clock cycle earlier. It is assumed that the trigonometric processor is clock-gated when it is not used, and that there is no cost in implementing clock gating within the processor; disabling the clock within the trigonometric processor simply happens one clock cycle earlier.

¹ Because the z variable (rotation mode) or the y variable (vectoring mode) is multiplied by 8 even when the iteration should be skipped.

Figure 4.1: The logic automatically inserted by the synthesis tool with a latch-based clock gating style.

| | Total power (µW) | Register power (µW) |
|---|---|---|
| Before | 926 (1.00) | 180 (1.00) |
| After | 905 (0.98) | 151 (0.84) |

Table 4.4: The power results from synthesis obtained before and after clock gating has been implemented.

Since the enable signal comes from a combinational path, a latch-based solution is used². The logic inserted automatically for clock gating is shown in figure 4.1. The latch and the and-gate consume 37 µW, which is about the same amount of power as is saved in the registers. However, clock gating also results in less activity in the following cycle, and therefore the total power consumption is reduced by 2%. The results are listed in table 4.4, which shows the total power consumption and the power consumed in the registers before and after clock gating.

4.3 Other circuit transformations

It is possible to reduce the unnecessary activity further by reducing the logical depth of the architecture and changing the behavior of the adders. This is explained in this section.

² With the command mentioned above: set_clock_gating_style -sequential_cell latch.


4.3.1 Reducing the logical depth

The final architecture in figure 3.14 has a large logical depth. This affects both the critical path and the total switching activity, since small changes at the beginning of the circuit can propagate through a large part of the circuit. One way to reduce the logical depth is to combine some of the multiplexers. For instance, the multiplexer for the initialization can be combined with the multiplexer at the register. In addition, one of the multiplexers at the x variable can be removed: the first multiplexer determines whether one of the inputs is taken directly from the register or from the second shifter, and combining this multiplexer with the shifter reduces the logical depth. This affects not only the logical depth but also the switching activity, because the input is now only shifted when required.

Another way to reduce the logical depth of the circuit is to look at the selection function for vectoring mode. Recall from equation 2.33 that the selection function depends on the x variable. However, this dependency is only visible within the first two iterations. After iteration 2, the intervals remain constant, because only the first 5 fractional bits of the x variable are needed. This means that the intervals calculated in the initialization block can be reused in the subsequent iterations, and these extra computations can be removed from the selection function.

The two opmode-shifters are used to multiply the y or z variable by 8. However, this multiplication can be moved to after the register. This has a positive effect not only on the logical depth but also on the size of the register, because in the architecture in figure 3.14 the register stores the shifted values. Moving the multiplication to after the register reduces the size of the register for the y and z variables by 3 bits, which is a reduction of 6% in the number of flip-flops in the register.

4.3.2 Adder

The x and y variables each use 4 adders. It is possible to reduce the power consumption within the adders by looking at the size of the shift sequence. For simplicity, only the changes for the x variable are described in the following, but exactly the same observation can be made for the y variable. The smallest shift sequence occurs in the first iteration ($i = 2$) when $\sigma_2 = 4$, which produces the shift sequence for the x variable illustrated in table 4.5. Since 2 of the 5 values are shifted by at least 8 bits, there is no reason to use four 30-bit adders; instead, one of the adders can be reduced to a smaller CPA of only 22 bits.

| Mathematical expression | Shift sequence |
|---|---|
| $x_i$ | x.xxxxxxx xxxxxxxxxxxxxxxxxxxxxx |
| $\sigma^2 2^{-6i} x_i$ | 0.0000000 xxxxxxxxxxxxxxxxxxxxxx |
| $\sigma^2 2^{-6i} x_i$ ¹ | 0.0000000 000xxxxxxxxxxxxxxxxxxx |
| $2\sigma 2^{-3i} y_i$ | s.ssyyyyy yyyyyyyyyyyyyyyyyyyyyy |
| $2\sigma 2^{-3i} y_i$ ¹ | s.ssssyyy yyyyyyyyyyyyyyyyyyyyyy |

¹ Only used for σ = 3; otherwise the signal is set to zero.

Table 4.5: Shift sequence for $x_i(1 - \sigma^2 2^{-6i}) - 2\sigma y_i 2^{-3i}$ for $i = 2$.

Figure 4.2: Implementation of an FA with gates (a), and the FA cell from the 90 nm library (b).

Another observation is that the synthesis tool implements the inserted CPAs as carry-ripple adders (CRA). This is possible because the CRAs do not violate the timing constraints. Since a CRA propagates the carry all the way from the LSB to the MSB, using carry-save adders (CSA) instead might reduce the dynamic power consumption. This optimization turned out to be trickier than expected, because of the way Synopsys interprets the VHDL code. Writing a CSA is simple, as it consists of full adders (FA), but when Synopsys compiles the design and optimizes for power, it implements the FA with gates, as shown in figure 4.2(a). However, the 90 nm library used in this thesis contains a special purpose FA cell, illustrated in figure 4.2(b). When Synopsys compiles and optimizes a CPA, it uses the FA cells, whereas it uses the FA built from gates when implementing a CSA. The implementation with gates consumes on average 50%³ more power than the FA cell. Additionally, the FA with gates occupies an area of 27.44 µm², whereas the FA cell from the library only occupies 18.65 µm². It was therefore necessary to force the synthesis tool not to optimize the CSA adder⁴ before proper synthesis results could be obtained.

³ Values obtained from power analysis.
⁴ This is done with the set_dont_touch command.

Figure 4.3: Schematic of 3 different adder trees.

| Adder tree | Dynamic power (µW) | Leakage power (µW) | Total power (µW) |
|---|---|---|---|
| 4 CPA | 93.94 | 0.26 | 94.2 (1.00) |
| 2 CPA, 2 CSA | 115.42 | 0.26 | 115.7 (1.22) |
| 1 CPA, 3 CSA | 127.22 | 0.28 | 127.5 (1.35) |

Table 4.6: Comparison between implementations of 3 different adder trees (ratio in parentheses).

With that problem solved, the next step is to decide how the adder tree should be combined. The adder tree has 5 inputs, where 3 of them are 30-bit and the last 2 are 22-bit. Figure 4.3 illustrates 3 different ways to implement the adder tree, where the first one is the original design. The second implementation uses 2 CPAs and 2 CSAs, where the first CPA adds the two 22-bit values. The last implementation uses 3 CSAs and one CPA.
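The CSA-based tree can be sketched at bit level to confirm that three 3:2 carry-save adders followed by one carry-ripple adder compute the same five-operand sum as a chain of carry-propagate additions. The 30-bit width follows the text; modelling the 4-CPA design as a simple chain is an assumption, since the exact arrangement is only given in figure 4.3, and the function names are hypothetical. The sketch says nothing about power, which table 4.6 shows is worse for the CSA versions.

```python
WIDTH = 30                   # adder width from the text
MASK = (1 << WIDTH) - 1


def csa(a, b, c):
    """3:2 carry-save adder: one full adder per bit position, no carry propagation."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, carry & MASK


def ripple_add(a, b):
    """Carry-ripple adder (the CPA); also reports the longest run of carry-outs."""
    result, carry, run, longest = 0, 0, 0, 0
    for k in range(WIDTH):
        x, y = (a >> k) & 1, (b >> k) & 1
        result |= (x ^ y ^ carry) << k
        carry = (x & y) | (x & carry) | (y & carry)
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return result, longest


def tree_1cpa_3csa(a, b, c, d, e):
    """Five operands reduced to two by three CSAs, then one CPA."""
    s, cy = csa(a, b, c)
    s, cy = csa(s, cy, d)
    s, cy = csa(s, cy, e)
    return ripple_add(s, cy)[0]


def tree_4cpa(a, b, c, d, e):
    """Four carry-propagate additions in a chain (assumed arrangement)."""
    total = a
    for operand in (b, c, d, e):
        total = ripple_add(total, operand)[0]
    return total


ops = (0x2AAAAAAA, 0x15555555, 0x3FFFFFFF, 0x000FFFFF, 0x0001F0F0)
print(tree_1cpa_3csa(*ops) == tree_4cpa(*ops) == sum(ops) & MASK)
```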

The different designs have been synthesized, and the results from power analysis are shown in table 4.6. The results are surprising, since the power consumption increases by up to 35% when three of the CRAs are replaced by CSAs. In theory, going from CRA to CSA should come at no cost, since they use exactly the same logic, which is supported by the almost identical leakage power. The leakage power is slightly higher for the last design because four 30-bit adders are used⁵. The results in table 4.6 show that the original design is the best solution for a low power architecture. A possible explanation is that in each iteration the values within the adders get closer to the final result, which means that the probability of a carry propagating through a large portion of the bits in the CRAs is limited. At the same time, a CSA is exposed to more toggling, since all 3 of its inputs can now change value.

⁵It is not possible to reduce one of the adders, since the previously reduced 22-bit CRA is changed to a CSA.


The optimizations concerning the logical depth and the adders have further reduced the power consumption by 4.7%. The synthesis results are presented in the next chapter, and the final architecture is shown in figure 4.4.


Figure 4.4: Final architecture for the radix-8 D-CORDIC implementation after low power optimization.

Chapter 5

Results

This chapter presents the results obtained from synthesis and discusses the important observations. A brief summary of the results was presented in section 3.7, and figures 5.1 and 5.2 illustrate them graphically. Figure 5.1 shows the total power consumption for each of the 6 implementations. One important observation is that the power consumption in rotation and vectoring mode is almost identical. For the invert-rotation mode in the D-CORDIC implementation, the power consumption deviates from the other modes. This is expected, because this mode contains an extra variable that requires two adders, which are initialized to zero in rotation and vectoring mode. Figure 5.2 shows the area used by each of the implementations, and the most important observation is that a high percentage of HVT cells is used in all implementations. Another observation is that the radix-8 implementation is larger than the radix-4 implementation (figure 5.2(a)), but the radix-4 implementation uses more cells (figure 5.2(b)). This can be explained by the fact that the radix-8 implementation uses more adders (and therefore more FA cells), which are larger than regular logic cells¹.

The CORDIC implementation is the least complex implementation and should be considered a reference model, as it can be implemented directly from the algorithm that Volder [30] presented. It is a very elegant and flexible algorithm, but for a low power implementation, this thesis has shown that other variations of the algorithm can perform significantly better.
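As a reference point, the basic CORDIC rotation mode can be sketched in a few lines of Python. This floating-point sketch mirrors the structure of cordic_rot.m in appendix C (25 iterations, x initialized to 1/K), but it omits the fixed-point quantization controlled by `pre` there; the function name `cordic_rotation` is hypothetical.

```python
import math

def cordic_rotation(theta, n=25):
    """Basic radix-2 CORDIC in rotation mode: returns (cos(theta), sin(theta)).

    Converges for |theta| <= sum(atan(2**-i)) ~ 1.743 rad, which covers [-pi/2, pi/2].
    """
    # Aggregate gain of the n micro-rotations (the `scaling` variable in cordic_rot.m)
    gain = 1.0
    for i in range(n):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta            # pre-compensate the gain in x
    for i in range(n):
        sigma = 1 if z >= 0 else -1             # rotate towards z = 0
        # shift-and-add update; the tuple assignment uses the old x and y
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)       # atan(2**-i) is a table constant in hardware
    return x, y

c, s = cordic_rotation(math.pi / 6)
print(f"{c:.6f} {s:.6f}")  # 0.866025 0.500000
```

Each iteration resolves roughly one bit of the result, which is why 25 iterations are used for a 24-bit result.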

¹For instance, the size of an FA cell is 18.659 µm², while the size of an AND gate is 5.48 µm².


Figure 5.1: Obtained power results from synthesis.

The acos and asin functions are extremely expensive to implement, and whether it is feasible to implement them in a hearing aid or any other low power device should be carefully considered. It was not possible to find other ways to implement these functions without a multiplier, so further work is needed to develop alternative implementations. If acos and asin are not required, the D-CORDIC algorithm should be avoided. Even though it contains an extra variable t only to support evaluation of acos and asin, removing this variable and using the D-CORDIC for rotation and vectoring mode alone would not help. The reason is that the logic needed for the t variable accounts for only 4% of the total power consumption. Removing this extra logic would therefore still leave a significant overhead compared to the CORDIC, which would perform better. The power overhead from CORDIC to D-CORDIC is 1.47, so the cost of adding support for acos and asin is a 47% increase in power. In fact, the overhead is even larger when compared with the radix-8 implementation: in that case, it is 1.87.

The new radix versions of the D-CORDIC algorithm presented in this thesis perform particularly well. Comparing the radix-4/8 implementations with the D-CORDIC reveals a distinct difference, which remains remarkable even when the extra variable t is neglected in the power results. Comparing them with the CORDIC implementation, however, gives a different picture. In terms of power, the radix-4 is identical to the CORDIC implementation, and it comes with an area overhead of 1.14. Therefore, it must be concluded that the CORDIC algorithm is better than the radix-4 algorithm. The natural explanation is that a radix-4 block uses 0.587 µW, while a CORDIC block only uses 0.260 µW. So even though the number of blocks is reduced by 50%, that cannot justify the power overhead within one block, which is 1.45. The radix-8 implementation shows a 22% reduction, and for a low power implementation the radix-8 should be considered. One radix-8 block is significantly more complex than a CORDIC block, but since only one block is needed in each iteration, this outweighs the extra complexity.

Figure 5.2: Obtained area results from synthesis (a), and the distribution between HVT and SVT cells (b).

The scaling-free algorithm has the lowest flexibility, so the question is why it is included in the analysis. The reason is that it is the only algorithm that has been successfully explored as a low power implementation of the CORDIC algorithm. It is therefore a useful point of comparison for the radix-8 implementation, which was concluded to be the best algorithm for rotation and vectoring mode. The radix-8 implementation uses 23% more power than the scaling-free implementation. However, since the scaling-free algorithm only supports rotation mode, the difference would become smaller if the radix-8 were restricted to one mode. The poor convergence range of the scaling-free algorithm has been solved by argument reduction, and if only cos and sin are required, this algorithm should be selected.

Instead of using CORDIC (or other variations) for implementing trigonometric functions, it is possible to use function approximation with multiplication and large lookup tables. One method is presented in [23], where the expression in equation 5.1 can evaluate a large portfolio of functions.

$$f(X) \approx C_0 + C_1 X_2 + C_2 X_2^2 \tag{5.1}$$

Here $C_0$, $C_1$ and $C_2$ are obtained from a lookup table, and $X_2$ is the 17 LSBs of the input value. The 7 MSBs are used as an index into the tables. The method requires 3 multiplications and a lookup table, and must be considered very flexible, since the only requirement for adding extra functions is a larger table. However, since each function requires a unique table, the system would become large. For instance, evaluation of sin requires a table of 3.625 Kb for 24-bit precision [23]. This is about three times the size of the table required for scaling approximation in the radix-8 algorithm (1.21 Kb). A coarse approximation therefore suggests that implementing function approximation with support for cos, sin and atan would require a table roughly equivalent in size to a radix-8 block². In addition, a multiplier is required, so function approximation should not be used when little area and power are available. Another important issue is the design constraint that a result with 16/24-bit precision should be available within 4/6 cycles. A 16-bit result would still require the same logic (3 multiplications and a lookup table), so the natural choice would be a single implementation that yields 24-bit precision in 4 cycles. Such an architecture would almost require one complete multiplication in each cycle. Of course, the conclusion would be different if an existing multiplier were available on the processor and could be used for function approximation.
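Equation 5.1 can be illustrated with a small Python sketch that indexes a coefficient table with the 7 MSBs of a 24-bit input and uses the 17 LSBs as $X_2$. How [23] derives its coefficients is not described here, so this sketch assumes a second-order Taylor expansion at the start of each segment and uses sin on inputs in [0, 1) as the example function; all names are hypothetical. It makes the three multiplications per evaluation explicit.

```python
import math

FRAC_BITS = 24                       # input X: unsigned 24-bit fraction in [0, 1)
INDEX_BITS = 7                       # the 7 MSBs of X select the table entry
LOW_BITS = FRAC_BITS - INDEX_BITS    # the 17 LSBs of X form X2

def build_table(f, df, d2f):
    """One (C0, C1, C2) triple per segment.

    Assumption: second-order Taylor coefficients at the start of each segment,
    not the coefficient method used in [23].
    """
    table = []
    for k in range(1 << INDEX_BITS):
        x0 = k / (1 << INDEX_BITS)
        table.append((f(x0), df(x0), d2f(x0) / 2.0))
    return table

def evaluate(table, x_fixed):
    """f(X) ~ C0 + C1*X2 + C2*X2**2 (eq. 5.1) for a 24-bit integer input."""
    index = x_fixed >> LOW_BITS                                 # 7 MSBs
    x2 = (x_fixed & ((1 << LOW_BITS) - 1)) / (1 << FRAC_BITS)   # 17 LSBs as offset in segment
    c0, c1, c2 = table[index]
    x2_sq = x2 * x2                                             # multiplication 1
    return c0 + c1 * x2 + c2 * x2_sq                            # multiplications 2 and 3

sin_table = build_table(math.sin, math.cos, lambda t: -math.sin(t))
x = int(0.7 * (1 << FRAC_BITS))
print(abs(evaluate(sin_table, x) - math.sin(x / (1 << FRAC_BITS))) < 2 ** -23)  # True
```

With 128 segments the Taylor remainder is bounded by $(2^{-7})^3/6 \approx 7.9 \cdot 10^{-8}$, just inside 24-bit precision; adding another function means adding another table of the same shape.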

5.1 Results from low power optimization

This section presents the results of the low power optimization. The motivation for experimenting with low power optimization was to see which techniques could be used and how much the extra time spent on it would gain. Of course, the same techniques used for the radix-8 implementation could be applied to the other versions of the CORDIC algorithm, so it would be unfair to compare the optimized radix-8 implementation with the other implementations. However, comparing the CORDIC and the radix-8 algorithm shows that the power has been reduced by 32%.

The synthesis results for power consumption are listed in table 5.1, which shows the power before and after the low power optimization in rotation mode. Similar results are obtained in vectoring mode, but since the power consumption is very similar, this section focuses only on rotation mode³. The most important observation is that the power has been reduced by 14% and the area is now 7% smaller. The power reduction for each of the VHDL blocks is listed in table 5.1, and it is no surprise that most of the power is consumed in the radix-8 block, which accounts for 64% of the total power consumption. The only block that uses more power after the optimization is the register, because the implemented clock gating requires extra logic to control the clock signal (a latch and an AND gate). This extra logic experiences a lot of activity, since one of its inputs is the clock signal. At the same time, the gated signal is fed to the entire register, so the load capacitance at its output is very high. This explanation matches the synthesis results, where the switching power of the clock gating logic is 10 times larger than its internal power. Compared with the entire circuit, the output capacitance of the latch and the AND gate accounts for 3.5% of the total power consumption. However, what is not visible in the table is that, after clock gating, the logic located after the register experiences less activity when the enable signal is low. The total power consumption is therefore reduced by clock gating, which is visible, for instance, in the multiplexer.

²The size of the table used in the radix-8 implementation is 1240 µm². A rough approximation would therefore give a table size of 1240 · 3 · 3 = 11,160 µm², which is 86% of the size of one radix-8 block.

³The power consumption in vectoring mode has been reduced from 1.002 mW to 0.855 mW, which is a reduction of 15%.

| Block | Before (mW) | After (mW) | Dif. | % of total power |
|---|---|---|---|---|
| Initialization | 0.061 | 0.020 | −67% | 2% |
| Multiplexer | 0.069 | 0.056 | −19% | 6% |
| Radix-8 block | 0.640 | 0.562 | −12% | 64% |
| Register | 0.185 | 0.186 | +1% | 21% |
| last-iteration | 0.041 | 0.039 | −5% | 4% |
| Controller | 0.021 | 0.013 | −38% | 1% |
| Total | 1.014 | 0.879 | −14% | 100% |

Table 5.1: Power results from synthesis in rotation mode.

The area of the circuit has also been reduced, and the results are listed in table 5.2. Most of the reduction is within the radix-8 block, which accounts for 53% of the total area. The radix-8 block is reduced by 13%, and this reduction comes from four of the adders, which were each reduced by 8 bits. In addition, the re-evaluation of the intervals in the selection function in each cycle has been removed, since the intervals can be obtained directly from those evaluated in the initialization step. The multiplexer is the only block where the area increased, because the 2-input multiplexer was changed to a 3-input multiplexer.

One NAND gate in the 90 nm HVT library occupies 4.39 µm². Converting the required area to gate equivalents means that roughly 5,650 gates are required to implement the low power optimized radix-8 architecture.
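The gate-count conversion is a single division; the short Python sketch below reproduces it together with the relative change in total area from table 5.2. The helper names are hypothetical, and the NAND area is the 4.39 µm² quoted above.

```python
NAND_AREA_UM2 = 4.39   # one NAND gate in the 90 nm HVT library (value from the text)

def gate_equivalents(area_um2, gate_area_um2=NAND_AREA_UM2):
    """Express a synthesized cell area as an equivalent number of NAND gates."""
    return area_um2 / gate_area_um2

def relative_change(before, after):
    """Signed relative change, e.g. -0.07 for a 7% reduction."""
    return (after - before) / before

print(round(gate_equivalents(24_800)))            # 5649, i.e. roughly 5,650 gates
print(f"{relative_change(26_531, 24_800):+.1%}")   # -6.5%, rounded to 7% in table 5.2
```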


| Block | Before (µm²) | After (µm²) | Dif. | % of total area |
|---|---|---|---|---|
| Initialization | 4,381 | 4,389 | 0% | 18% |
| Table | 892 | 868 | 0% | 4% |
| Multiplexer | 930 | 1,269 | +36% | 5% |
| Radix-8 block | 14,981 | 13,026 | −13% | 53% |
| Register | 2,097 | 1,941 | −7% | 8% |
| last-iteration | 2,070 | 2,060 | 0% | 8% |
| Controller | 336 | 315 | −6% | 1% |
| Total | 26,531 | 24,800 | −7% | 100% |

Table 5.2: Area results from synthesis.

The critical path has not been a concern in this thesis, since at a clock frequency of 50 MHz the synthesis tool can easily synthesize the design without violating the timing constraints. This is also visible in the high number of inserted HVT cells: in the optimized radix-8 implementation, 97% of the cells are HVT.

Chapter 6

Future work

The objective of this thesis was to implement a numerical processor that could evaluate trigonometric functions. For that purpose, the radix-8 D-CORDIC algorithm was shown to be the best solution. However, the CORDIC algorithm is not only suitable for trigonometric functions; it can also evaluate multiplication, division, and hyperbolic, logarithmic and exponential functions. The concept is simple: instead of working in the circular coordinate system (the unit circle), the algorithm is extended to the hyperbolic and the linear coordinate systems, as presented by Walther [31] in 1971. The purpose of this chapter is to see whether the radix-8 D-CORDIC algorithm developed in this thesis can be extended to support all three coordinate systems.

The chapter will also discuss other ideas to further optimize the radix-8 implementation and introduce possible applications that might benefit from the work done in this thesis.

6.1 Increased flexibility

Rotation in the linear and hyperbolic coordinate systems is shown in figure 6.1. In the figure, blue illustrates the linear coordinate system and green the hyperbolic coordinate system. The required extensions are explained in the following sections.

Figure 6.1: Rotation in the linear and hyperbolic coordinate systems.

6.1.1 Linear coordinates

A simple modification to equation 2.6 permits the computation of linear functions, and the modified CORDIC algorithm for the linear system is shown in equation 6.1.

$$\begin{aligned}
x_{i+1} &= x_0 \\
y_{i+1} &= y_i + \sigma_i x_i 2^{-i} \\
z_{i+1} &= z_i - \sigma_i 2^{-i}
\end{aligned} \tag{6.1}$$

In the linear coordinate system it is important to notice that the algorithm is not affected by a scaling factor, and that it computes $y_f = y_0 + x_0 z_0$ in rotation mode and $z_f = z_0 + y_0/x_0$ in vectoring mode [9]. The extension of the radix-8 algorithm into the linear coordinate system is shown in equation 6.2, with rotation mode to the left and vectoring mode to the right. Comparing the CORDIC and radix-8 algorithms shows that much of the complexity of the circular system disappears when linear coordinates are used.

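$$\begin{aligned}
x_{i+1} &= x_0 & x_{i+1} &= x_0 \\
y_{i+1} &= y_i + 2\sigma_i x_i 8^{-i} & w_{i+1} &= 8(w_i + 2\sigma_i x_i) \\
w_{i+1} &= 8(w_i - 2\sigma_i) & z_{i+1} &= z_i - 2\sigma_i 8^{-i}
\end{aligned} \tag{6.2}$$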

The algorithm is simple to implement, and it shares the selection function used in the circular coordinate system. Therefore, it should be possible to extend the radix-8 architecture of chapter 4 to the linear coordinate system with a couple of multiplexers.
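The following Python sketch, with hypothetical function names and floating-point instead of fixed-point arithmetic, runs equation 6.1 in rotation and vectoring mode and equation 6.2 in rotation mode (the radix-8 vectoring mode is not sketched). For the radix-8 version it assumes that the rotation-mode selection intervals of table B.1 (σ = k for 2k − 1 ≤ w < 2k + 1) carry over to the linear system, as stated above, and that iterations start at i = 0. It shows the radix-8 recursion reaching the product in 9 iterations, compared with the 24 used for the radix-2 version.

```python
import math

def linear_rotation_radix2(x0, y0, z0, n=24):
    """Eq. (6.1), rotation mode: returns y ~ y0 + x0*z0 (requires |z0| <= 2)."""
    x, y, z = x0, y0, z0
    for i in range(n):
        sigma = 1 if z >= 0 else -1          # drive z towards zero
        y += sigma * x * 2.0 ** -i           # shift-and-add; x stays constant
        z -= sigma * 2.0 ** -i
    return y

def linear_vectoring_radix2(x0, y0, z0, n=24):
    """Eq. (6.1), vectoring mode: returns z ~ z0 + y0/x0 (requires x0 > 0, |y0/x0| <= 2)."""
    x, y, z = x0, y0, z0
    for i in range(n):
        sigma = -1 if y >= 0 else 1          # drive y towards zero
        y += sigma * x * 2.0 ** -i
        z -= sigma * 2.0 ** -i
    return z

def linear_rotation_radix8(x0, y0, z0, n=9):
    """Eq. (6.2), rotation mode, tracking the scaled residual w_i = 8**i * z_i.

    Requires |z0| < 9; after n iterations the residual satisfies |z_n| <= 8**(1 - n).
    """
    x, y, w = x0, y0, z0
    for i in range(n):
        # Table B.1 intervals: sigma = k when 2k - 1 <= w < 2k + 1, saturated to [-4, 4]
        sigma = max(-4, min(4, math.floor((w + 1) / 2)))
        y += 2 * sigma * x * 8.0 ** -i       # 2*sigma*x is a small shift-and-add of x
        w = 8 * (w - 2 * sigma)              # keeps |w| <= 8 in every iteration
    return y

print(linear_rotation_radix2(0.75, 0.1, 0.6))   # ~0.55 after 24 iterations
print(linear_rotation_radix8(0.75, 0.1, 0.6))   # ~0.55 after 9 iterations
print(linear_vectoring_radix2(0.8, 0.3, 0.0))   # ~0.375
```

Because the residual shrinks by a factor of 8 per iteration, 9 iterations leave $|z_9| \le 8^{-8} = 2^{-24}$, in line with the 9 iterations quoted for a 24-bit result.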

Using the linear CORDIC algorithm as a multiplier or divider is not an optimal solution [1], since a special-purpose multiplier or divider will always outperform the CORDIC algorithm¹. Nevertheless, it is important to remember the flexibility that the CORDIC algorithm offers. In a system where trigonometric functions are a prerequisite, it might be worth considering whether an existing CORDIC processor should be extended to support multiplication and division, since the extra flexibility comes at a relatively small area overhead.

6.1.2 Hyperbolic coordinates

There is a close relationship between the trigonometric and the hyperbolic functions. Because of this, the CORDIC algorithm can be extended to the hyperbolic coordinate system with minor changes, as described in this section. As was the case for the circular rotations in equation 2.1, a hyperbolic vector rotation can be described by equation 6.3 [9].

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{6.3}$$

This gives an iterative algorithm that is very similar to equation 2.6, as shown below.

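$$\begin{aligned}
x_{i+1} &= x_i + \sigma_i y_i 2^{-i} \\
y_{i+1} &= y_i + \sigma_i x_i 2^{-i} \\
z_{i+1} &= z_i - \sigma_i \operatorname{atanh}(2^{-i})
\end{aligned} \tag{6.4}$$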

Notice that the only differences are that the second term of the x expression is now added instead of subtracted, and that the angles stored in the lookup table are $\operatorname{atanh}(2^{-i})$ instead of $\operatorname{atan}(2^{-i})$. The change of sign also affects the scaling factor, which is now given by

$$K = \prod_{i=0}^{\infty} \frac{1}{\sqrt{1 - \sigma_i^2 2^{-2i}}} \tag{6.5}$$

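and because division by zero is undefined (and at the same time $\operatorname{atanh}(2^0) = \infty$), the rotations have to start at i = 1 instead of i = 0. A general problem with rotations in the hyperbolic coordinate system is that the basic iteration no longer converges: a rotation is performed in every iteration, and the remaining angles $\operatorname{atanh}(2^{-j})$, j > i, do not add up to $\operatorname{atanh}(2^{-i})$, so the residual angle cannot always be driven to zero. A solution is to repeat iterations 4, 13, 40, ... as shown in [9].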
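A minimal Python sketch of equation 6.4 in rotation mode, including the repeated iterations 4, 13, 40, ..., is shown below. It is a floating-point illustration with hypothetical names, not the thesis's Matlab code. The start value x = K uses equation 6.5 with σ_i² = 1, evaluated over the iteration indices that are actually executed (repeats included).

```python
import math

def hyperbolic_schedule(n):
    """Iteration indices from i = 1, with i = 4, 13, 40, ... executed twice (next = 3k + 1)."""
    seq, i, repeat = [], 1, 4
    while len(seq) < n:
        seq.append(i)
        if i == repeat:
            seq.append(i)
            repeat = 3 * repeat + 1
        i += 1
    return seq[:n]

def hyperbolic_rotation(theta, n=28):
    """Eq. (6.4) in rotation mode: returns (cosh(theta), sinh(theta)) for |theta| up to ~1.118."""
    seq = hyperbolic_schedule(n)
    k_factor = 1.0
    for i in seq:
        k_factor /= math.sqrt(1.0 - 2.0 ** (-2 * i))   # eq. (6.5) with sigma_i**2 = 1
    x, y, z = k_factor, 0.0, theta                      # start at [K, 0] to cancel the shrinking
    for i in seq:
        sigma = 1 if z >= 0 else -1                     # drive z towards zero
        x, y = x + sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atanh(2.0 ** -i)
    return x, y

c, s = hyperbolic_rotation(0.5)
print(c, s, c + s)   # approximately cosh(0.5), sinh(0.5) and exp(0.5)
```

Each micro-rotation multiplies the vector by $\sqrt{1 - 2^{-2i}} < 1$, so the hyperbolic recursion shrinks the vector instead of growing it, which is why the compensation starts from x = K > 1.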

¹This comparison is for the CORDIC vs. a multiplier/divider. Whether the same holds for the radix-8 algorithm is not certain.

| Coordinate system | Rotation mode | Vectoring mode |
|---|---|---|
| Circular | $x_f = K\cos\theta$, $y_f = K\sin\theta$, $z_f = 0$ | $x_f = K\sqrt{x^2 + y^2}$, $y_f = 0$, $z_f = \theta + \operatorname{atan}(y/x)$ |
| Linear | $x_f = x$, $y_f = y + xz$, $z_f = 0$ | $x_f = x$, $y_f = 0$, $z_f = z + y/x$ |
| Hyperbolic | $x_f = K\cosh\theta$, $y_f = K\sinh\theta$, $z_f = 0$ | $x_f = K\sqrt{x^2 - y^2}$, $y_f = 0$, $z_f = \theta + \operatorname{atanh}(y/x)$ |

Table 6.1: Complete list of the functions that can be evaluated with the extended radix-8 algorithm.

Extending the radix-8 D-CORDIC algorithm to the hyperbolic coordinate system is slightly more complex than extending it to the linear system. Nevertheless, it is possible, and the algorithm is presented below. Rewriting equation 6.3 in a similar way as equation 2.18 yields a radix-8 D-CORDIC algorithm.

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & \sigma_i 8^{-i} \\ \sigma_i 8^{-i} & 1 \end{bmatrix}^2 \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 + \sigma_i^2 8^{-2i} & 2\sigma_i 8^{-i} \\ 2\sigma_i 8^{-i} & 1 + \sigma_i^2 8^{-2i} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{6.6}$$

Here the change of sign is noticeable, since the algorithm has $1 + \sigma_i^2 8^{-2i}$ instead of $1 - \sigma_i^2 8^{-2i}$ as in the circular system. The final algorithm for rotation and vectoring mode is shown in equation 6.7, with rotation mode to the left and vectoring mode to the right.

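$$\begin{aligned}
x_{i+1} &= x_i(1 + \sigma_i^2 8^{-2i}) + 2\sigma_i y_i 8^{-i} & x_{i+1} &= x_i(1 + \sigma_i^2 8^{-2i}) + 2\sigma_i w_i 8^{-2i} \\
y_{i+1} &= y_i(1 + \sigma_i^2 8^{-2i}) + 2\sigma_i x_i 8^{-i} & w_{i+1} &= 8\left(w_i(1 + \sigma_i^2 8^{-2i}) + 2\sigma_i x_i\right) \\
w_{i+1} &= 8\left(w_i - 2 \cdot 8^{i} \operatorname{atanh}(\sigma_i 8^{-i})\right) & z_{i+1} &= z_i - 2\operatorname{atanh}(\sigma_i 8^{-i})
\end{aligned} \tag{6.7}$$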

The extended algorithm seems complex, but again the modifications to the circular algorithm are limited and can be implemented with multiplexers. In fact, the only changes are the sign of the $\sigma_i^2 8^{-2i}$ terms, the sign of the $2\sigma_i$ term in the x update, and atanh replacing atan. In addition, the scaling factor can be compensated for in the same way as in the circular system, and the selection intervals are identical to those proposed for circular coordinates.

Table 6.1 lists the functions that can be evaluated in each of the three coordinate systems. In addition to the functions in table 6.1, the direct output from the radix-8 D-CORDIC algorithm can also be used to calculate functions like

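$$\begin{aligned}
\tan\theta &= \frac{\sin\theta}{\cos\theta} & \tanh\theta &= \frac{\sinh\theta}{\cosh\theta} \\
e^{\theta} &= \sinh\theta + \cosh\theta & \ln\theta &= 2\operatorname{atanh}\left(\frac{y}{x}\right) \\
\sqrt{\theta} &= \sqrt{x^2 - y^2}
\end{aligned} \tag{6.8}$$

where, for $\ln\theta$, the inputs are $x = \theta + 1$ and $y = \theta - 1$, and for $\sqrt{\theta}$ they are $x = \theta + 1/4$ and $y = \theta - 1/4$ (with $x_f$ corrected for K).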
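The logarithm and square-root identities of equation 6.8 can be checked with a vectoring-mode sketch of the radix-2 hyperbolic recursion (equation 6.4). It repeats iterations 4, 13, 40, ..., compensates the shrinking of x with the same factor K, and uses the standard operand choices x = θ + 1, y = θ − 1 for ln θ and x = θ + 1/4, y = θ − 1/4 for √θ. All function names are hypothetical, and the arithmetic is floating point.

```python
import math

def hyperbolic_schedule(n):
    """Iteration indices from i = 1, with i = 4, 13, 40, ... executed twice."""
    seq, i, repeat = [], 1, 4
    while len(seq) < n:
        seq.append(i)
        if i == repeat:
            seq.append(i)
            repeat = 3 * repeat + 1
        i += 1
    return seq[:n]

def hyperbolic_vectoring(x0, y0, n=28):
    """Eq. (6.4) in vectoring mode: returns (sqrt(x0**2 - y0**2), atanh(y0/x0)).

    Requires x0 > |y0| and |atanh(y0/x0)| up to about 1.118.
    """
    x, y, z = x0, y0, 0.0
    k_factor = 1.0
    for i in hyperbolic_schedule(n):
        sigma = -1 if y >= 0 else 1                     # drive y towards zero
        x, y = x + sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atanh(2.0 ** -i)              # accumulates atanh(y0/x0)
        k_factor /= math.sqrt(1.0 - 2.0 ** (-2 * i))
    return x * k_factor, z                              # undo the shrinking of x

def cordic_ln(a):
    """ln(a) = 2*atanh((a - 1)/(a + 1)); valid for roughly 0.11 < a < 9.3."""
    return 2.0 * hyperbolic_vectoring(a + 1.0, a - 1.0)[1]

def cordic_sqrt(a):
    """sqrt(a) = sqrt((a + 1/4)**2 - (a - 1/4)**2); valid for roughly 0.03 < a < 2.3."""
    return hyperbolic_vectoring(a + 0.25, a - 0.25)[0]

print(cordic_ln(2.0), cordic_sqrt(1.5))   # approximately 0.693147 and 1.224745
```

The limited convergence range shows up directly in the valid input intervals; a practical implementation would combine this with argument reduction.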


Since the purpose of this thesis was to implement and synthesize a processor for trigonometric functions, the focus has been on the VHDL implementation of the algorithms described in chapter 2. Therefore, there has not been enough time to also experiment with VHDL implementations of the extensions into the hyperbolic and the linear coordinate systems. However, Matlab files for the extended radix-8 D-CORDIC algorithm have been created, and they can be found in appendix C. In addition, a summary of the radix-8 algorithm for all three coordinate systems is given in appendix B.

6.2 Further optimization of the algorithm

Even though the radix-8 D-CORDIC algorithm was intended for a low power implementation of the CORDIC algorithm, it would be interesting to see how it performs as a high-speed processor, since most of the work done so far has been on high-speed radix-4 versions of the basic CORDIC algorithm. It would therefore be interesting to see how the radix-8 D-CORDIC competes with these solutions. Of course, the architecture presented in chapter 3 cannot be used, since its critical path is long, but one way to increase the speed is pipelining, which naturally shortens the critical path. Another thing worth studying is what happens if a redundant number system is used instead. The purpose of the D-CORDIC algorithm in [11] was to develop a redundant version of the CORDIC algorithm, to make it more suitable for high-speed systems; the fact that it supports evaluation of acos and asin was a side effect. However, the D-CORDIC algorithm requires n + 1 iterations for an n-bit result. This means that even though it might have a high throughput, its latency is still a problem. This is where the radix-8 algorithm would probably perform better. The radix-8 algorithm requires a significant number of extra adders per block, but they can be combined into an adder tree of redundant adders. In addition, recall that the radix-8 algorithm only requires 9 iterations, compared to 26 for the D-CORDIC. One application that might benefit from such a high-speed algorithm is real-time image rotation for computer graphics [27].

If the purpose of the algorithm were vector rotation and not only evaluation of trigonometric functions, the scaling factor would become more difficult to compensate for. The reason is that for evaluation of trigonometric functions the starting vector is [1, 0]. With this vector, the compensation for the scaling factor in the first two iterations is solved by initialization from pre-calculated values². However, this is only possible when the starting vector is [1, 0], since multiplying the pre-calculated scaling factor by 1 and 0 is trivial. For vector rotation, the starting vector is variable and the compensation therefore becomes more complex. It would therefore be interesting to see whether there are other ways to handle the scaling factor in the first two iterations.

One fascinating aspect of the algorithm is its high flexibility at little area overhead. The architecture could be optimized even further if one extra cycle were available: with an extra clock cycle, the initialization block can be simplified and the last-iteration block can be removed completely. With these modifications and the extensions to the linear and hyperbolic coordinate systems, the radix-8 algorithm would be extremely useful in multiprocessor systems, since it could be implemented with even less area, making it feasible to insert more than one unit.

6.3 The system in perspective

The thesis has shown that a large portfolio of elementary functions can be implemented with little area and power. After all this work, one big question remains unanswered: which applications in a digital hearing aid might benefit from this work? This section presents three potential examples of such applications.

The first generation of hearing aids that in some way communicate with the environment has already been developed. One such hearing aid is Epoq from Oticon³. The hearing aid is part of a system that includes a streamer with a built-in Bluetooth device. This allows the user to listen to music from, for instance, an iPod via the streamer and the hearing aid. This is of course only the beginning, and there is no reason to think that the development will end here. We are already in the wireless age, where every possible device should be able to communicate with its environment using technologies such as Bluetooth and wireless internet connections. This of course opens up many extra possibilities for hearing aid manufacturers. In terms of wireless internet connections, the authors of [24] have successfully implemented a low power CORDIC processor that can evaluate advanced wireless communication algorithms that increase the bandwidth.

²Recall that for i > 2 compensation for the scaling factor is solved by $1 - \sigma_i^2 8^{-2i}$.

3http://www.oticon.dk/dk da/OurProducts/ConsumerProducts/Epoq/Overview/index.htm.


Another application could be speech recognition [20]. A hearing aid capable of recognizing speech could execute commands based on the user's voice, which would, for example, give the user a higher level of control. A system with speech recognition is presented in [10], where a CORDIC module is used to evaluate the logarithm.

In addition, speech synthesis could be a possible application. A speech synthesizer, also known as a text-to-speech (TTS) engine [6], reads normal language text and converts it into understandable speech; in other words, it artificially produces human speech. This could, for instance, help hearing impaired children with reading disabilities interpret the words they are reading.

Whether the described applications are feasible in the next generation of hearing aids cannot be answered by the work in this thesis alone. However, one thing is certain: the next generation of hearing aids will support functionalities that we previously would not even have dreamed of.


Chapter 7

Conclusion

The CORDIC algorithm is an incredibly flexible algorithm that can evaluate a large number of functions, such as trigonometric, hyperbolic, exponential and logarithmic functions, division and multiplication. The algorithm is based on vector rotation and can be implemented elegantly with only shift and add operations. Particularly important is that no multiplication is required, in contrast to, for instance, function approximation, which uses a full multiplier and large tables [22]. However, like many things in life, it has some drawbacks. The algorithm only evaluates one bit per iteration, and therefore most of the work done so far has focused on ways to decrease the latency and make it more suitable for high-speed systems. Little research has focused on the low power aspect of the algorithm, and this is where this thesis stands out. The thesis contains an analysis of the low power capabilities of the CORDIC [30], D-CORDIC [28], radix-4/8 D-CORDIC and scaling-free [17] algorithms.

In addition, the work on this thesis has led to the presentation of a new algorithm, which can be considered a hybrid between the D-CORDIC algorithm and the radix-4 CORDIC algorithm. The new algorithm combines the best of the two existing algorithms, resulting in a radix-8 D-CORDIC algorithm that, among other things, has the following properties.

• The number of iterations is reduced by 64% in comparison with the CORDIC algorithm. This means that for a 24-bit result, only 9 iterations are required.


• One of the major problems with the CORDIC algorithm is the scaling factor. In the radix-8 D-CORDIC algorithm, the variable scaling factor is easy to calculate. For the first 3 iterations, the scaling factor is handled by initialization, and for i > 2 the compensation for the scaling factor can be evaluated within the normal rotations. This is possible even without adding extra logic.

• The algorithm can operate in both rotation and vectoring mode. In addition, it can be extended to the hyperbolic and linear coordinate systems. This gives an extremely flexible algorithm that can evaluate trigonometric, hyperbolic, exponential and logarithmic functions as well as multiplication and division, just like the original CORDIC algorithm.

The presented radix-8 algorithm has been implemented in VHDL and compared with 3 other versions of the CORDIC algorithm. Compared with the CORDIC algorithm, the radix-8 algorithm reduces the power consumption by 22%. The processor can evaluate a 24-bit result in 6 clock cycles and operates in both rotation and vectoring mode.

The overhead associated with the functions acos and asin has been shown to be immense. The D-CORDIC algorithm was used to implement these functions, and the price is a power overhead of 1.87 compared to the radix-8 algorithm. This means that, for instance, evaluation of cos and sin consumes 87% more power if the D-CORDIC algorithm is chosen. Unfortunately, it was not possible to extend the presented radix-8 algorithm to support acos and asin. The reason is that in a higher radix implementation both the direction and the magnitude of the rotation must be chosen by a selection function, and this selection function would become too complex, as it would depend on both the iteration number and the location on the unit circle.

Low power optimization of the radix-8 implementation yields a further reduction of 14%. This was achieved with techniques such as Gray coding, clock gating and toggle prevention. One of the surprising results was that replacing the binary coding with Gray coding reduced the total power consumption by 5%. The final VHDL implementation has a power consumption of 0.879 mW in rotation mode and 0.855 mW in vectoring mode. The processor runs at a clock frequency of 50 MHz with a 1.0 V 90 nm library. The area requirement is 24,800 µm², which is equivalent to a design of 5,650 gates. In the numerical processor, 97% of the inserted cells are HVT.
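The text does not state which signals were Gray coded. As a hypothetical illustration, the Python sketch below counts bit flips for a cyclic 6-state counter (assumed here to step through the 6 cycles of one computation) in plain binary and in Gray code; fewer flips mean less switching activity in the register and in the logic it drives. The function names are hypothetical.

```python
def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def bit_toggles(sequence):
    """Total number of bit flips when a register steps through `sequence` and wraps around."""
    total = 0
    for current, nxt in zip(sequence, sequence[1:] + sequence[:1]):
        total += bin(current ^ nxt).count("1")
    return total

states = list(range(6))                       # hypothetical 6-state cycle counter
gray_coded = [gray(s) for s in states]
print(bit_toggles(states), bit_toggles(gray_coded))  # 10 8
```

For a 6-state cycle the wrap-around from 5 back to 0 still flips three bits in Gray code; a full 8-state cycle would give 14 flips in binary against 8 in Gray code.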

The work in this thesis has shown that no matter what the target application is in the next generation of hearing aids, the radix-8 D-CORDIC algorithm can provide a large portfolio of the functions that are needed.

Appendix A

Symbols and Notations

Figure A.1: Unit circle


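| Symbol | Meaning |
|---|---|
| $i$ | Iteration number |
| $n$ | Total number of iterations |
| $i_{\text{upper}}$ | Upper limit for $i$ |
| $i_{\text{lower}}$ | Lower limit for $i$ |
| $v_0$ | Starting vector |
| $v_f$ | Final vector |
| $[x_i\ y_i]$ | Vector coordinates |
| $\theta$ | Total rotation angle |
| $\alpha_i$ | Micro-rotation |
| $\sigma_i$ | Rotation direction |
| $\mu$ | Lookup table value |
| $K_i$ | Scaling factor in iteration $i$ |
| $K$ | Total scaling factor |
| $t$ | Input vector for evaluation of acos |
| $W$ | Internal word length |
| $b$ | External word length (precision) |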

Appendix B

Radix-8 D-CORDIC

The purpose of this appendix is to summarize the presented radix-8 D-CORDIC algorithm. The algorithm works with the circular, hyperbolic and linear coordinate systems. In rotation mode the selection function depends on the z variable, and in vectoring mode on the y variable. To clarify this, the variables used by the selection functions have been renamed to w. In equations B.1, B.3 and B.4, rotation mode is written to the left and vectoring mode to the right. The rotation direction and magnitude are controlled by σ, which for the radix-8 algorithm can take the following values: $\sigma_i \in \{-4, -3, -2, -1, 0, 1, 2, 3, 4\}$.

Circular coordinate system:

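$$\begin{aligned}
x_{i+1} &= x_i(1 - \sigma_i^2 8^{-2i}) - 2\sigma_i y_i 8^{-i} & x_{i+1} &= x_i(1 - \sigma_i^2 8^{-2i}) - 2\sigma_i w_i 8^{-2i} \\
y_{i+1} &= y_i(1 - \sigma_i^2 8^{-2i}) + 2\sigma_i x_i 8^{-i} & w_{i+1} &= 8\left(w_i(1 - \sigma_i^2 8^{-2i}) + 2\sigma_i x_i\right) \\
w_{i+1} &= 8\left(w_i - 2 \cdot 8^{i} \operatorname{atan}(\sigma_i 8^{-i})\right) & z_{i+1} &= z_i - 2\operatorname{atan}(\sigma_i 8^{-i})
\end{aligned} \tag{B.1}$$

Scaling factor: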

$$K = \prod_{i=0}^{\infty} \frac{1}{1 + \sigma_i^2 8^{-2i}} \tag{B.2}$$


Linear coordinate system:

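$$\begin{aligned}
x_{i+1} &= x_0 & x_{i+1} &= x_0 \\
y_{i+1} &= y_i + 2\sigma_i x_i 8^{-i} & w_{i+1} &= 8(w_i + 2\sigma_i x_i) \\
w_{i+1} &= 8(w_i - 2\sigma_i) & z_{i+1} &= z_i - 2\sigma_i 8^{-i}
\end{aligned} \tag{B.3}$$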

In the linear coordinate system the algorithm is not affected by a scaling factor.

Hyperbolic coordinate system:

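$$\begin{aligned}
x_{i+1} &= x_i(1 + \sigma_i^2 8^{-2i}) + 2\sigma_i y_i 8^{-i} & x_{i+1} &= x_i(1 + \sigma_i^2 8^{-2i}) + 2\sigma_i w_i 8^{-2i} \\
y_{i+1} &= y_i(1 + \sigma_i^2 8^{-2i}) + 2\sigma_i x_i 8^{-i} & w_{i+1} &= 8\left(w_i(1 + \sigma_i^2 8^{-2i}) + 2\sigma_i x_i\right) \\
w_{i+1} &= 8\left(w_i - 2 \cdot 8^{i} \operatorname{atanh}(\sigma_i 8^{-i})\right) & z_{i+1} &= z_i - 2\operatorname{atanh}(\sigma_i 8^{-i})
\end{aligned} \tag{B.4}$$

Scaling factor: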

$$K = \prod_{i=0}^{\infty} \frac{1}{1 - \sigma_i^2 8^{-2i}} \tag{B.5}$$

Selection function: the selection intervals for rotation mode are shown in table B.1. For vectoring mode, the selection intervals are listed below; recall that for iterations i > 2 the intervals are constant and do not need to be re-evaluated.

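$$
\sigma_i =
\begin{cases}
-4 & 7x_i < w_i\\
-3 & 5x_i < w_i \le 7x_i\\
-2 & 3x_i < w_i \le 5x_i\\
-1 & x_i < w_i \le 3x_i\\
\phantom{+}0 & -x_i < w_i \le x_i\\
+1 & -3x_i < w_i \le -x_i\\
+2 & -5x_i < w_i \le -3x_i\\
+3 & -7x_i < w_i \le -5x_i\\
+4 & w_i \le -7x_i
\end{cases}
\tag{B.6}
$$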
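A floating-point Python sketch of circular vectoring mode follows. It applies (B.1) to compute atan(t). It assumes the sign convention of radix8_vec.m, where σ_i = −4 when w_i lies above 7x_i. It also assumes that listing's special first-digit thresholds on t (0.125, 0.375, 0.6875). The 6-bit truncated comparison of the Matlab model is replaced by an exact one. The helper names are hypothetical.

```python
import math


def select_sigma_vec(w, x):
    """Interval selection: sigma is negative when w lies above x (sign convention of radix8_vec.m)."""
    for sigma, k in ((-4, 7), (-3, 5), (-2, 3), (-1, 1), (0, -1), (1, -3), (2, -5), (3, -7)):
        if w > k * x:
            return sigma
    return 4


def radix8_atan(t, n=10):
    """Sketch of circular vectoring mode: returns approximately atan(t) for 0 <= t <= 1."""
    x, w, z = 1.0, 8.0 * t, 0.0        # x_1 = 1, w_1 = 8 * y_1 with y_1 = t
    for i in range(1, n + 1):
        if i == 1:
            # first digit taken directly from t (thresholds as in radix8_vec.m)
            sigma = -3 if t >= 0.6875 else -2 if t >= 0.375 else -1 if t >= 0.125 else 0
        else:
            sigma = select_sigma_vec(w, x)
        s = sigma * 8.0 ** -i
        # right-hand column of (B.1); both updates use the old x and w
        x, w = (x * (1 - s * s) - 2 * sigma * w * 8.0 ** (-2 * i),
                8.0 * (w * (1 - s * s) + 2 * sigma * x))
        z -= 2 * math.atan(s)
    return z
```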


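$$
\begin{array}{rrrl}
\sigma_i & L[\sigma] & U[\sigma] & \text{Interval}\\
\hline
4 & 6.67 & & 7 \le w_i\\
3 & 4.67 & 7.3 & 5 \le w_i < 7\\
2 & 2.67 & 5.3 & 3 \le w_i < 5\\
1 & 0.67 & 3.3 & 1 \le w_i < 3\\
0 & -1.3 & 1.3 & -1 \le w_i < 1\\
-1 & -3.3 & -0.67 & -3 \le w_i < -1\\
-2 & -5.3 & -2.67 & -5 \le w_i < -3\\
-3 & -7.3 & -4.67 & -7 \le w_i < -5\\
-4 & & -6.67 & w_i < -7
\end{array}
$$

Table B.1: Selection intervals for $\sigma_i$ in rotation mode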


Appendix C

Matlab code

This appendix contains the Matlab programs used in the thesis. The first files relate to the CORDIC algorithms described in Chapters 2 and 6, and the test pattern generators are listed at the end. The appendix includes the following files:

CORDIC: cordic_rot.m, cordic_vec.m

D-CORDIC: dcordic_rot.m, dcordic_vec.m, dcordic_inv_rot.m

Radix-4: radix4_rot.m, radix4_vec.m

Radix-8: radix8_rot.m, radix8_vec.m, radix8_linear_rot.m, radix8_linear_vec.m, radix8_hyper_rot.m, radix8_hyper_vec.m

Scaling free: sf_rot.m

Test generators: exhaustive_rot.m, exhaustive_vec.m, exhaustive_inv_rot.m

cordic_rot.m

function [] = cordic_rot()
% Implementation of the basic CORDIC algorithm for
% rotation mode. Scaling compensation is achieved by
% initializing x(1) = 1/K. The internal bit length is
% controlled by the "pre" variable.
%
% @ Author: Anders Torp (Master thesis)
% @ Date: 19 dec 07 (Final version)

increase = 2^-16;
pre = 29;
maxerror = 0;

% evaluate the constant scaling factor
scaling = 1;
for n = 0:24
    scaling = scaling*((1+2^(-2*n))^(1/2));
end

% for loop begin
for angle = 0:increase:pi/2
    x = zeros(1, 26);
    y = zeros(1, 26);
    z = zeros(1, 26);
    z(1) = angle;
    j = 1;
    x(1) = 1/(2^-pre*fix(scaling*2^pre));

    % CORDIC iterations
    for i = 0:24
        if z(j) < 0
            d = -1;
        else
            d = 1;
        end
        x(j+1) = 2^-pre*fix((x(j)-d*y(j)*2^(-i))*2^pre);
        y(j+1) = 2^-pre*fix((y(j)+d*x(j)*2^(-i))*2^pre);
        z(j+1) = 2^-pre*fix((z(j)-d*atan(2^(-i)))*2^pre);
        j = j+1;
    end

    % check result
    cosdif = abs(x(j) - cos(angle));
    sindif = abs(y(j) - sin(angle));

    % largest error (cos and sin are both compared)
    maxerror = max([maxerror, cosdif, sindif]);
end % for loop end

fprintf('************ Results **************\n');
maxerror

cordic_vec.m

function [] = cordic_vec()
% Implementation of the basic CORDIC algorithm for
% vectoring mode. No scaling is needed.
% The internal bit length is controlled by the "pre" variable.
%
% @ Author: Anders Torp (Master thesis)
% @ Date: 19 dec 07 (Final version)

increase = 2^-16;
pre = 29;
maxerror = 0;

% for loop begin
for t = 0:increase:1
    x = zeros(1, 26);
    y = zeros(1, 26);
    z = zeros(1, 26);
    y(1) = t;
    j = 1;
    x(1) = 1;

    % CORDIC iterations
    for i = 0:24
        if y(j) < 0
            d = 1;
        else
            d = -1;
        end
        x(j+1) = 2^-pre*fix((x(j)-d*y(j)*2^(-i))*2^pre);
        y(j+1) = 2^-pre*fix((y(j)+d*x(j)*2^(-i))*2^pre);
        z(j+1) = 2^-pre*fix((z(j)-d*atan(2^(-i)))*2^pre);
        j = j+1;
    end

    % check result
    atandif = abs(z(j) - atan(t));

    % largest error
    if atandif > maxerror
        maxerror = atandif;
    end
end % for loop end

fprintf('************ Results **************\n');
maxerror

dcordic_rot.m

function [] = dcordic_rot()
% Implementation of the double CORDIC algorithm for
% rotation mode. Scaling compensation is achieved by
% initializing x(1) = 1/K. The internal bit length is
% controlled by the "pre" variable.
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 19 dec 2007 (Final version)

increase = 2^-16;
pre = 29;
maxerror = 0;

% evaluate the constant scaling factor
scaling = 1;
for n = 0:25
    scaling = scaling*(1+2^(-2*n));
end

% for loop begin
for angle = 0:increase:pi/4
    x = zeros(1, 26);
    y = zeros(1, 26);
    z = zeros(1, 26);
    j = 1;
    z(1) = angle;
    x(1) = 2^-pre*fix((1/scaling)*2^pre);

    % DCORDIC iterations
    for i = 0:25
        if z(j) < 0
            d = 1;
        else
            d = -1;
        end
        x(j+1) = 2^-pre*fix((x(j)*(1-2^(-2*i))+d*y(j)*(2^(-i+1)))*2^pre);
        y(j+1) = 2^-pre*fix((y(j)*(1-2^(-2*i))-d*x(j)*(2^(-i+1)))*2^pre);
        z(j+1) = 2^-pre*fix((z(j)+d*2*atan(2^(-i)))*2^pre);
        j = j+1;
    end

    % check result
    cosdif = abs(x(j) - cos(angle));
    sindif = abs(y(j) - sin(angle));

    % largest error
    maxerror = max([maxerror, cosdif, sindif]);
end % for loop end

fprintf('************ Results **************\n');
maxerror
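The scaling loop in dcordic_rot.m multiplies by 1 + 2^(−2n) instead of the square root used in cordic_rot.m. A double rotation applies the same micro-rotation twice, so for σ = ±1 its matrix and gain are

$$
\begin{pmatrix}1 & -\sigma 2^{-i}\\ \sigma 2^{-i} & 1\end{pmatrix}^{2}
=
\begin{pmatrix}1-2^{-2i} & -\sigma 2^{1-i}\\ \sigma 2^{1-i} & 1-2^{-2i}\end{pmatrix},
\qquad
\left(\sqrt{1+2^{-2i}}\right)^{2} = 1+2^{-2i}.
$$

The right-hand matrix is the update used in the listing with σ = −d. Each double rotation therefore has the gain 1 + 2^(−2i), which is why x(1) is initialized to 1/∏(1 + 2^(−2n)).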

dcordic_vec.m

function [] = dcordic_vec()
% Implementation of the double CORDIC algorithm for
% vectoring mode. The internal bit length is
% controlled by the "pre" variable.
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 19 dec 2007 (Final version)

increase = 2^-16;
pre = 29;
maxerror = 0;

% for loop begin
for t = 0:increase:1
    x = zeros(1, 26);
    y = zeros(1, 26);
    z = zeros(1, 26);
    j = 1;
    y(1) = t;
    x(1) = 1;

    % DCORDIC iterations
    for i = 0:25
        if y(j) < 0
            d = -1;
        else
            d = 1;
        end
        x(j+1) = 2^-pre*fix((x(j)*(1-2^(-2*i))+d*y(j)*(2^(-i+1)))*2^pre);
        y(j+1) = 2^-pre*fix((y(j)*(1-2^(-2*i))-d*x(j)*(2^(-i+1)))*2^pre);
        z(j+1) = 2^-pre*fix((z(j)+d*2*atan(2^(-i)))*2^pre);
        j = j+1;
    end

    % check result
    atandif = abs(z(j) - atan(t));

    % largest error
    if atandif > maxerror
        maxerror = atandif;
    end
end % for loop end

fprintf('************ Result **************\n');
maxerror

dcordic_inv_rot.m

function [] = dcordic_inv_rot()
% Implementation of the double CORDIC algorithm for
% inverse rotation mode. The internal bit length is
% controlled by the "pre" variable.
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 30 dec 2007 (final version)

increase = 2^-16;
pre = 31;
maxerror = 0;

% for loop begin
for tt = 0:increase:1
    x = zeros(1, 26);
    y = zeros(1, 26);
    z = zeros(1, 26);
    t = zeros(1, 26);
    j = 1;
    x(1) = 1;
    t(1) = tt;

    % DCORDIC iterations (NOTE: iteration range assumed, as in dcordic_rot.m)
    for i = 0:25
        if x(j) >= t(j)
            % angle still too small: rotate away from the x-axis
            if y(j) < 0
                d = -1;
            else
                d = 1;
            end
        else
            % angle too large: rotate back towards the x-axis
            if y(j) < 0
                d = 1;
            else
                d = -1;
            end
        end

        x(j+1) = 2^-pre*fix((x(j)*(1-2^(-2*i))-d*y(j)*(2^(-i+1)))*2^pre);
        y(j+1) = 2^-pre*fix((y(j)*(1-2^(-2*i))+d*x(j)*(2^(-i+1)))*2^pre);
        z(j+1) = 2^-pre*fix((z(j)+d*2*atan(2^(-i)))*2^pre);
        % scale t by the same gain (1 + 2^-2i) as the vector
        t(j+1) = 2^-pre*fix((t(j)+t(j)*2^(-2*i))*2^pre);
        j = j+1;
    end

    % check result
    acosdif = abs(z(j) - acos(tt));

    % largest error
    if acosdif > maxerror
        maxerror = acosdif;
    end
end % for loop end

fprintf('************ Result **************\n');
maxerror

radix4_rot.m

function [] = radix4_rot()
% Implementation of the radix-4 DCORDIC algorithm for
% rotation mode. The internal bit length is
% controlled by the "pre" variable.
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 9 jan 2008 (final version)

increase = 2^-16;
pre = 29;
maxerror = 0;
iter = 0;        % active iterations per computation
totaliter = 0;   % total active iterations
numberTest = 0;  % # tests
h = 3;           % selection interval
l = 1;           % selection interval

% for loop begin
for angle = 0:increase:pi/4
    x = [1 zeros(1, 12)];
    y = zeros(1, 13);
    w = zeros(1, 13);
    numberTest = numberTest + 1;
    iter = 0;
    j = 1;
    w(1) = angle;

    % predict sigma for i = 0..3 to compute the scaling factor;
    % the first rotation will never be 2
    if w(j) >= l
        d = 1;
        scaling = .5;
        w(j) = 2^-pre*fix((w(j)-2*atan(1))*2^pre);
    else
        d = 0;
        scaling = 1;
    end

    w(j) = 4*(w(j));
    if w(j) >= h
        d = 2;
    elseif w(j) >= l && w(j) < h
        d = 1;
    elseif w(j) > -l && w(j) < l
        d = 0;
    elseif w(j) > -h && w(j) <= -l
        d = -1;
    else
        d = -2;
    end

    scaling = 2^-pre*fix((scaling*1/(1+d^2*4^-(2*1)))*2^pre);
    w(j) = 2^-pre*fix((4*(w(j)-2*(4^1)*atan(d*4^(-1))))*2^pre);
    if w(j) >= h
        d = 2;
    elseif w(j) >= l && w(j) < h
        d = 1;
    elseif w(j) > -l && w(j) < l
        d = 0;
    elseif w(j) > -h && w(j) <= -l
        d = -1;
    else
        d = -2;
    end

    scaling = 2^-pre*fix((scaling*1/(1+d^2*4^-(2*2)))*2^pre);
    w(j) = 2^-pre*fix((4*(w(j)-2*(4^2)*atan(d*4^(-2))))*2^pre);
    if w(j) >= h
        d = 2;
    elseif w(j) >= l && w(j) < h
        d = 1;
    elseif w(j) > -l && w(j) < l
        d = 0;
    elseif w(j) > -h && w(j) <= -l
        d = -1;
    else
        d = -2;
    end

    x(1) = 2^-pre*fix((scaling*1/(1+d^2*4^-(2*3)))*2^pre);
    w(j) = 4*angle;

    % Radix-4 D-CORDIC iterations
    for i = 1:12
        % selection function
        if w(j) >= h
            d = 2;
        elseif w(j) >= l && w(j) < h
            d = 1;
        elseif w(j) > -l && w(j) < l
            d = 0;
        elseif w(j) > -h && w(j) <= -l
            d = -1;
        else
            d = -2;
        end
        if d ~= 0
            iter = iter+1;
        end

        x(j+1) = 2^-pre*fix((x(j)*(1-d*d*4^(-2*i))-d*y(j)*2*(4^(-i)))*2^pre);
        y(j+1) = 2^-pre*fix((y(j)*(1-d*d*4^(-2*i))+d*x(j)*2*(4^(-i)))*2^pre);
        w(j+1) = 2^-pre*fix((4*(w(j)-2*(4^i)*atan(d*4^(-i))))*2^pre);

        % scaling compensation for i = 4, 5, 6
        if d ~= 0 && i > 3 && i < 7
            x(j+1) = 2^-pre*fix((x(j+1)*(1-d^2*4^-(2*i)))*2^pre);
            y(j+1) = 2^-pre*fix((y(j+1)*(1-d^2*4^-(2*i)))*2^pre);
        end

        j = j+1;
    end

    % check result
    cosdif = abs(x(j) - cos(angle));
    sindif = abs(y(j) - sin(angle));

    % largest error
    maxerror = max([maxerror, cosdif, sindif]);

    % total # iterations
    totaliter = totaliter + iter;
end % for loop end

fprintf('************ Results **************\n');
maxerror
averageiterations = totaliter/numberTest

radix4_vec.m

function [] = radix4_vec()
% Implementation of the radix-4 DCORDIC algorithm for
% vectoring mode. The internal bit length is
% controlled by the "pre" variable. The variable pre2
% controls the bit length for the selection function.
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 7 feb 2008 (final version)

increase = 2^-16;
pre = 29;
pre2 = 4;
maxerror = 0;
iter = 0;
totaliter = 0;
numberTest = 0;

for angle = 0:increase:1
    x = [1 zeros(1, 12)];
    w = zeros(1, 13);
    z = zeros(1, 13);
    j = 1;
    numberTest = numberTest + 1;
    iter = 0;
    w(1) = angle;

    % Radix-4 D-CORDIC iterations
    for i = 0:12
        % selection function on w and x truncated to pre2 fractional bits
        wt = 2^-pre2*fix(w(j)*2^pre2);
        xt = 2^-pre2*fix(x(j)*2^pre2);
        if wt >= xt*3
            d = -2;
        elseif wt > xt && wt < xt*3
            d = -1;
        elseif wt > -xt && wt <= xt
            d = 0;
        elseif wt > -xt*3 && wt <= -xt
            d = 1;
        else
            d = 2;
        end
        if d ~= 0
            iter = iter+1;
        end

        x(j+1) = x(j)*(1-d*d*4^(-2*i))-d*w(j)*2*(4^(-2*i));
        w(j+1) = 4*(w(j)*(1-d*d*4^(-2*i))+d*x(j)*2);
        z(j+1) = z(j) - 2*atan(d*4^(-i));
        j = j+1;
    end

    % check result
    dif = abs(z(j) - atan(angle));

    % largest error
    if dif > maxerror
        maxerror = dif;
    end

    % total # iterations
    totaliter = totaliter + iter;
end % for loop end

fprintf('************ Results **************\n');
maxerror
averageiterations = totaliter/numberTest
radix8_rot.m

function [] = radix8_rot()
% Implementation of the radix-8 DCORDIC algorithm for
% rotation mode. The internal bit length is
% controlled by the "pre" variable.
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 7 feb 2008 (final version)

increase = 2^-16;
pre = 29;
maxerror = 0;
iter = 0;        % active iterations per computation
totaliter = 0;   % total active iterations
numberTest = 0;  % # tests

a = 1;  % selection interval
b = 3;  % selection interval
c = 5;  % selection interval
e = 7;  % selection interval

% for loop begin
for angle = 0:increase:pi/4
    numberTest = numberTest+1;
    x = [1 zeros(1, 12)];
    y = zeros(1, 13);
    w = zeros(1, 13);
    j = 1;
    iter = 0;
    scaling = 1;

    % predict sigma for i = 1, 2 to compute the scaling factor;
    % the first iteration (i = 0) is always zero
    w(1) = 8*angle;

    if w(j) >= c
        d = 3;
    elseif w(j) >= b && w(j) < c
        d = 2;
    elseif w(j) >= a && w(j) < b
        d = 1;
    else
        d = 0;
    end

    scaling = 2^-pre*fix((scaling*1/(1+d^2*8^-(2*1)))*2^pre);
    w(j) = 2^-pre*fix((8*(w(j)-2*(8^1)*atan(d*8^(-1))))*2^pre);

    if w(j) >= e
        d = 4;
    elseif w(j) >= c && w(j) < e
        d = 3;
    elseif w(j) >= b && w(j) < c
        d = 2;
    elseif w(j) >= a && w(j) < b
        d = 1;
    elseif w(j) > -a && w(j) < a
        d = 0;
    elseif w(j) > -b && w(j) <= -a
        d = -1;
    elseif w(j) > -c && w(j) <= -b
        d = -2;
    elseif w(j) > -e && w(j) <= -c
        d = -3;
    else
        d = -4;
    end

    x(1) = 2^-pre*fix((scaling*1/(1+d^2*8^-(2*2)))*2^pre);
    w(1) = 1*angle;

    % Radix-8 D-CORDIC iterations
    % (NOTE: iteration range assumed; i = 8 applies the final 2^-23 step)
    for i = 0:8
        % selection function
        if w(j) >= e
            d = 4;
        elseif w(j) >= c && w(j) < e
            d = 3;
        elseif w(j) >= b && w(j) < c
            d = 2;
        elseif w(j) >= a && w(j) < b
            d = 1;
        elseif w(j) > -a && w(j) < a
            d = 0;
        elseif w(j) > -b && w(j) <= -a
            d = -1;
        elseif w(j) > -c && w(j) <= -b
            d = -2;
        elseif w(j) > -e && w(j) <= -c
            d = -3;
        else
            d = -4;
        end
        if d ~= 0
            iter = iter+1;
        end

        if i < 8
            x(j+1) = 2^-pre*fix((x(j)*(1-d*d*2^(-6*i))-d*y(j)*2*(2^(-3*i)))*2^pre);
            y(j+1) = 2^-pre*fix((y(j)*(1-d*d*2^(-6*i))+d*x(j)*2*(2^(-3*i)))*2^pre);
            w(j+1) = 2^-pre*fix((8*(w(j)-2*(2^(3*i))*atan(d*2^(-3*i))))*2^pre);

            % scaling compensation for i = 3, 4
            if i > 2 && i < 5
                x(j+1) = 2^-pre*fix((x(j+1)*(1-d^2*2^-(6*i)))*2^pre);
                y(j+1) = 2^-pre*fix((y(j+1)*(1-d^2*2^-(6*i)))*2^pre);
            end
        else
            x(j+1) = 2^-pre*fix((x(j)-d*y(j)*(2^(-23)))*2^pre);
            y(j+1) = 2^-pre*fix((y(j)+d*x(j)*(2^(-23)))*2^pre);
        end

        j = j+1;
    end

    % check result
    cosdif = abs(x(j) - cos(angle));
    sindif = abs(y(j) - sin(angle));

    % largest error
    maxerror = max([maxerror, cosdif, sindif]);

    % total # iterations
    totaliter = totaliter + iter;
end % for loop end

fprintf('************ Results **************\n');
maxerror
averageiterations = totaliter/numberTest
radix8_vec.m

function [] = radix8_vec()
% Implementation of the radix-8 DCORDIC algorithm for
% vectoring mode. The internal bit length is
% controlled by the "pre" variable.
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 7 feb 2008 (final version)

increase = 2^-16;
pre = 29;
pre2 = 6;
maxerror = 0;
iter = 0;        % active iterations per computation
totaliter = 0;   % total active iterations
numberTest = 0;  % # tests

a = 1;  % selection interval
b = 3;  % selection interval
c = 5;  % selection interval
e = 7;  % selection interval

% for loop begin
for angle = 0:increase:1
    x = [1 zeros(1, 12)];
    w = zeros(1, 13);
    z = zeros(1, 13);
    j = 1;
    numberTest = numberTest + 1;
    iter = 0;
    w(1) = 8*angle;

    % Radix-8 D-CORDIC iterations
    % (NOTE: iteration range assumed; i = 8 only updates z)
    for i = 1:8
        if i == 1
            % first digit selected directly from the input
            if angle >= 0.6875
                d = -3;
            elseif angle >= 0.375
                d = -2;
            elseif angle >= 0.125
                d = -1;
            else
                d = 0;
            end
        else
            % selection function on w and x truncated to pre2 fractional bits
            wt = 2^-pre2*fix(w(j)*2^pre2);
            xt = 2^-pre2*fix(x(j)*2^pre2);
            if wt >= xt*e
                d = -4;
            elseif wt >= xt*c && wt < xt*e
                d = -3;
            elseif wt >= xt*b && wt < xt*c
                d = -2;
            elseif wt >= xt*a && wt < xt*b
                d = -1;
            elseif wt > -xt*a && wt < xt*a
                d = 0;
            elseif wt > -xt*b && wt <= -xt*a
                d = 1;
            elseif wt > -xt*c && wt <= -xt*b
                d = 2;
            elseif wt > -xt*e && wt <= -xt*c
                d = 3;
            else
                d = 4;
            end
            if d ~= 0
                iter = iter+1;
            end
        end
        dd(j) = d;   % record the selected digits

        if i < 8
            x(j+1) = 2^-pre*fix((x(j)*(1-d*d*2^(-6*i))-d*w(j)*2*(2^(-6*i)))*2^pre);
            w(j+1) = 2^-pre*fix((8*(w(j)*(1-d*d*2^(-6*i))+d*x(j)*2))*2^pre);
            z(j+1) = 2^-pre*fix((z(j)-2*atan(d*8^(-i)))*2^pre);
        else
            z(j+1) = 2^-pre*fix((z(j)-2*atan(d*8^(-i)))*2^pre);
        end

        j = j+1;
    end

    % check result
    dif = abs(z(j) - atan(angle));

    if dif > maxerror
        maxerror = dif;
    end

    % total # iterations
    totaliter = totaliter + iter;
end % for loop end

fprintf('************ Results **************\n');
maxerror
averageiterations = totaliter/numberTest
sf_rot.m

function [] = sf_rot()
% Implementation of the scaling free DCORDIC algorithm
% for rotation mode. The internal bit length is
% controlled by the "pre" variable.
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 7 feb 2008 (final version)

increase = 2^-16;
pre = 29;
maxerror = 0;
iter = 0;        % active iterations per computation
totaliter = 0;   % total active iterations
numberTest = 0;  % # tests
k = 8;           % lower limit of i

% for loop begin
for angle = 0:increase:pi/4
    numberTest = numberTest+1;
    iter = 0;
    x = zeros(1, 26);
    y = zeros(1, 26);
    z = zeros(1, 26);
    i = k;
    j = 1;
    z(1) = angle;
    p = 0.78125;   % largest partition point (25/32)

    % compute partition: nearest lower point p on a 2^-5 grid
    for t = 0:32
        if angle > p
            z(1) = 2^-pre*fix((angle-p)*2^pre);
            x(1) = 2^-pre*fix(cos(p)*2^pre);
            y(1) = 2^-pre*fix(sin(p)*2^pre);
            break
        end
        p = p - 2^-5;
    end

    % sf iterations, i = 8 (repeated while the residual allows)
    while z(1) > 2^(-i)
        x(j+1) = 2^-pre*fix((x(j)*(1-2^-(2*i+1))-y(j)*2^-i)*2^pre);
        y(j+1) = 2^-pre*fix((y(j)*(1-2^-(2*i+1))+x(j)*2^-i)*2^pre);
        z(j+1) = 2^-pre*fix((z(j)-2^-i)*2^pre);
        iter = iter + 1;
        x(j) = x(j+1);
        y(j) = y(j+1);
        z(j) = z(j+1);
    end

    % sf iterations, i > 8
    for i = k+1:25
        if z(j) >= 2^(-i)
            iter = iter+1;
            x(j+1) = 2^-pre*fix((x(j)*(1-2^-(2*i+1))-y(j)*2^-i)*2^pre);
            y(j+1) = 2^-pre*fix((y(j)*(1-2^-(2*i+1))+x(j)*2^-i)*2^pre);
            z(j+1) = 2^-pre*fix((z(j)-2^-i)*2^pre);

            if z(j+1) == 0
                j = j+1;
                break
            end
        else
            z(j+1) = z(j);
            x(j+1) = x(j);
            y(j+1) = y(j);
        end
        j = j+1;
    end

    % check result
    cosdif = abs(x(j) - cos(angle));
    sindif = abs(y(j) - sin(angle));

    % largest error
    maxerror = max([maxerror, cosdif, sindif]);

    % total # iterations
    totaliter = totaliter + iter;
end % for loop end

fprintf('************ Results **************\n');
maxerror
averageiterations = totaliter/numberTest
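The scaling-free iterations in sf_rot.m replace cos and sin of the micro-angle 2^(−i) by truncated Taylor series. As a result, the rotation matrix is almost length preserving:

$$
\cos 2^{-i} = 1 - 2^{-(2i+1)} + O\!\left(2^{-4i}\right), \qquad \sin 2^{-i} = 2^{-i} + O\!\left(2^{-3i}\right),
$$

$$
\det\begin{pmatrix}1-2^{-(2i+1)} & -2^{-i}\\ 2^{-i} & 1-2^{-(2i+1)}\end{pmatrix} = 1 + 2^{-4i-2}.
$$

Each iteration therefore scales the vector by about 1 + 2^(−4i−3), which for i ≥ k = 8 is at most about 1 + 2^(−35). The angle actually rotated is atan(2^(−i)/(1 − 2^(−(2i+1)))), which exceeds 2^(−i) by about 2^(−3i)/6. The partition step leaves z(1) ≤ 2^(−5), so the loop at i = 8 runs at most eight times.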

radix8_linear_rot.m

function [] = Radix8_linear_rot(y1, x1, z1)
% Implementation of the radix-8 algorithm for
% rotation mode in the linear space. For the linear
% space there is no need for a scaling factor.
% The algorithm evaluates y1 + x1*z1.
%
% @ Author: Anders Torp (Master thesis)
% @ Date: 11 mar 08 (Final version)

a = 1;  % selection interval
b = 3;  % selection interval
c = 5;  % selection interval
e = 7;  % selection interval

x = [1 zeros(1, 12)];
y = zeros(1, 13);
w = zeros(1, 13);
j = 1;

x(1) = x1;
y(1) = y1;
w(1) = z1;

% Radix-8 iterations (NOTE: iteration range assumed)
for i = 0:11
    if w(j) >= e
        d = 4;
    elseif w(j) >= c && w(j) < e
        d = 3;
    elseif w(j) >= b && w(j) < c
        d = 2;
    elseif w(j) >= a && w(j) < b
        d = 1;
    elseif w(j) > -a && w(j) < a
        d = 0;
    elseif w(j) > -b && w(j) <= -a
        d = -1;
    elseif w(j) > -c && w(j) <= -b
        d = -2;
    elseif w(j) > -e && w(j) <= -c
        d = -3;
    else
        d = -4;
    end

    x(j+1) = x(1);
    y(j+1) = y(j) + d*x(j)*2*(8^(-i));
    w(j+1) = 8*(w(j) - 2*d);
    j = j+1;
end

% check result
err = abs(y(j) - (y1 + x1*z1));

fprintf('************ Results **************\n');
y(j)
err

radix8_linear_vec.m

function [] = Radix8_linear_vec(z1, x1, y1)
% Implementation of the radix-8 algorithm for
% vectoring mode in the linear space. For the linear
% space there is no need for a scaling factor.
% The algorithm evaluates z1 + y1/x1.
%
% @ Author: Anders Torp (Master thesis)
% @ Date: 11 mar 08 (Final version)

a = 1;  % selection interval
b = 3;  % selection interval
c = 5;  % selection interval
e = 7;  % selection interval

x = [1 zeros(1, 12)];
w = zeros(1, 13);
z = zeros(1, 13);
j = 1;

w(1) = y1;
x(1) = x1;
z(1) = z1;

% Radix-8 iterations (NOTE: iteration range assumed)
for i = 0:11
    if w(j) >= x(j)*e
        d = -4;
    elseif w(j) >= x(j)*c && w(j) < x(j)*e
        d = -3;
    elseif w(j) >= x(j)*b && w(j) < x(j)*c
        d = -2;
    elseif w(j) >= x(j)*a && w(j) < x(j)*b
        d = -1;
    elseif w(j) > -x(j)*a && w(j) < x(j)*a
        d = 0;
    elseif w(j) > -x(j)*b && w(j) <= -x(j)*a
        d = 1;
    elseif w(j) > -x(j)*c && w(j) <= -x(j)*b
        d = 2;
    elseif w(j) > -x(j)*e && w(j) <= -x(j)*c
        d = 3;
    else
        d = 4;
    end

    x(j+1) = x(1);
    w(j+1) = 8*(w(j) + d*x(j)*2);
    z(j+1) = z(j) - 2*d*8^(-i);
    j = j+1;
end

% check result
err = abs(z(j) - (z1 + y1/x1));

fprintf('************ Results **************\n');
err
z(j)

radix8_hyper_rot.m

function [] = Radix8_hyper_rot(x1)
% Implementation of the radix-8 algorithm for
% rotation mode in the hyperbolic space.
% The algorithm evaluates cosh(x1) and sinh(x1).
% The convergence range is restricted by atan(8^-1).
%
% @ Author: Anders Torp (Master thesis)
% @ Date: 11 mar 08 (Final version)

a = 1;  % selection interval
b = 3;  % selection interval
c = 5;  % selection interval
e = 7;  % selection interval

x = [1 zeros(1, 12)];
y = zeros(1, 13);
w = zeros(1, 13);
j = 1;

w(1) = 8*x1;

% Radix-8 iterations (NOTE: iteration range assumed)
for i = 1:12
    if w(j) >= e
        d = 4;
    elseif w(j) >= c && w(j) < e
        d = 3;
    elseif w(j) >= b && w(j) < c
        d = 2;
    elseif w(j) >= a && w(j) < b
        d = 1;
    elseif w(j) > -a && w(j) < a
        d = 0;
    elseif w(j) > -b && w(j) <= -a
        d = -1;
    elseif w(j) > -c && w(j) <= -b
        d = -2;
    elseif w(j) > -e && w(j) <= -c
        d = -3;
    else
        d = -4;
    end

    x(j+1) = x(j)*(1+d*d*2^(-6*i)) + d*y(j)*2*(2^(-3*i));
    y(j+1) = y(j)*(1+d*d*2^(-6*i)) + d*x(j)*2*(2^(-3*i));
    w(j+1) = 8*(w(j) - 2*(8^i)*atanh(d*8^(-i)));
    % scaling compensation, see (B.5)
    x(j+1) = x(j+1)/(1-d^2*8^-(i*2));
    y(j+1) = y(j+1)/(1-d^2*8^-(i*2));
    j = j+1;
end

% check result
xerror = abs(x(j) - cosh(x1));
yerror = abs(y(j) - sinh(x1));

fprintf('************ Results **************\n');
xerror
yerror
x(j)
y(j)

radix8_hyper_vec.m

function [] = Radix8_hyper_vec(x1)
% Implementation of the radix-8 algorithm for
% vectoring mode in the hyperbolic space.
% The algorithm evaluates atanh(x1).
%
% @ Author: Anders Torp (Master thesis)
% @ Date: 11 mar 08 (Final version)

a = 1;  % selection interval
b = 3;  % selection interval
c = 5;  % selection interval
e = 7;  % selection interval

x = [1 zeros(1, 12)];
w = zeros(1, 13);
z = zeros(1, 13);
j = 1;
w(1) = 8*x1;

% Radix-8 iterations (NOTE: iteration range assumed)
for i = 1:12
    if w(j) >= x(j)*e
        d = -4;
    elseif w(j) >= x(j)*c && w(j) < x(j)*e
        d = -3;
    elseif w(j) >= x(j)*b && w(j) < x(j)*c
        d = -2;
    elseif w(j) >= x(j)*a && w(j) < x(j)*b
        d = -1;
    elseif w(j) > -x(j)*a && w(j) < x(j)*a
        d = 0;
    elseif w(j) > -x(j)*b && w(j) <= -x(j)*a
        d = 1;
    elseif w(j) > -x(j)*c && w(j) <= -x(j)*b
        d = 2;
    elseif w(j) > -x(j)*e && w(j) <= -x(j)*c
        d = 3;
    else
        d = 4;
    end

    x(j+1) = x(j)*(1+d*d*2^(-6*i)) + d*w(j)*2*(2^(-6*i));
    w(j+1) = 8*(w(j)*(1+d*d*2^(-6*i)) + d*x(j)*2);
    z(j+1) = z(j) - 2*atanh(d*8^(-i));
    j = j+1;
end

% check result
err = abs(z(j) - atanh(x1));

fprintf('************ Results **************\n');
err
z(j)

exhaustive_rot.m

function [] = exhaustive_rot()
% Test files for the exhaustive test in rotation
% mode. Convergence range [0; pi/4] and [0; pi/2]
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 11 jan 2008 (final version)

fid_pi4 = fopen('test_exhaustive_rot_pi4.dat', 'w');
fid_res_pi4 = fopen('res_exhaustive_rot_pi4.dat', 'w');
fid_pi2 = fopen('test_exhaustive_rot_pi2.dat', 'w');
fid_res_pi2 = fopen('res_exhaustive_rot_pi2.dat', 'w');

step = 2^-15;

% range from [0:pi/4]
for n = 0:step:pi/4
    nfi = fi(n, 1, 36, 29, 'roundmode', 'floor');
    nfihex = hex(nfi);
    fprintf(fid_pi4, '1\n');   % rotation mode
    fprintf(fid_pi4, '%s\n', nfihex);

    fprintf(fid_pi2, '1\n');   % rotation mode
    fprintf(fid_pi2, '%s\n', nfihex);

    cosfi = fi(cos(n), 0, 36, 29, 'roundmode', 'floor');
    sinfi = fi(sin(n), 0, 36, 29, 'roundmode', 'floor');
    cosfihex = hex(cosfi);
    sinfihex = hex(sinfi);
    fprintf(fid_res_pi4, '%s\n', cosfihex);
    fprintf(fid_res_pi4, '%s\n', sinfihex);

    fprintf(fid_res_pi2, '%s\n', cosfihex);
    fprintf(fid_res_pi2, '%s\n', sinfihex);
end

% range from [pi/4:pi/2]
for n = pi/4:step:pi/2
    nfi = fi(n, 1, 36, 29, 'roundmode', 'floor');
    nfihex = hex(nfi);
    fprintf(fid_pi2, '1\n');   % rotation mode
    fprintf(fid_pi2, '%s\n', nfihex);

    cosfi = fi(cos(n), 0, 36, 29, 'roundmode', 'floor');
    sinfi = fi(sin(n), 0, 36, 29, 'roundmode', 'floor');
    cosfihex = hex(cosfi);
    sinfihex = hex(sinfi);
    fprintf(fid_res_pi2, '%s\n', cosfihex);
    fprintf(fid_res_pi2, '%s\n', sinfihex);
end

fclose(fid_pi4);
fclose(fid_res_pi4);
fclose(fid_pi2);
fclose(fid_res_pi2);
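The test vectors store each value as the hexadecimal bit pattern of a 36-bit fixed-point number with 29 fractional bits, rounded towards −∞ ('floor'). A small Python sketch of that encoding is shown below. It assumes saturation on overflow (the default overflow action for fi objects) and two's complement for signed values. `to_fixed_hex` is a hypothetical helper; its output should match the bit pattern written by hex(fi(...)), although the case of the hex digits may differ.

```python
import math


def to_fixed_hex(value, word_len=36, frac_len=29, signed=True):
    """Floor-quantize value to a word_len-bit fixed-point pattern and return it as hex."""
    raw = math.floor(value * 2 ** frac_len)          # 'roundmode', 'floor'
    if signed:
        lo, hi = -(1 << (word_len - 1)), (1 << (word_len - 1)) - 1
    else:
        lo, hi = 0, (1 << word_len) - 1
    raw = max(lo, min(hi, raw))                      # saturate on overflow
    raw &= (1 << word_len) - 1                       # two's complement bit pattern
    digits = (word_len + 3) // 4                     # 36 bits -> 9 hex digits
    return format(raw, "0{}x".format(digits))


print(to_fixed_hex(math.pi / 4))   # input pattern for the angle pi/4
```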

exhaustive_vec.m

function [] = exhaustive_vec()
% Test files for the exhaustive test in vectoring
% mode. Convergence range [0; 1]
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 11 jan 2008 (final version)

fid = fopen('test_exhaustive_vec.dat', 'w');
fid_res = fopen('res_exhaustive_vec.dat', 'w');
step = 2^-15;

for n = 0:step:1
    nfi = fi(n, 1, 36, 29, 'roundmode', 'floor');
    nfihex = hex(nfi);
    fprintf(fid, '2\n');   % vectoring mode
    fprintf(fid, '%s\n', nfihex);

    atanfi = fi(atan(n), 0, 36, 29, 'roundmode', 'floor');
    atanfihex = hex(atanfi);
    fprintf(fid_res, '%s\n', atanfihex);
end

fclose(fid);
fclose(fid_res);

exhaustive_inv_rot.m

function [] = Exhaustive_inv_rot()
% Test files for the exhaustive test in inverse rotation
% mode. Convergence range [0; 1]
%
% @ Author: Anders Torp - s021884@student.dtu.dk
% @ Date: 13 jan 2008 (final version)

fid = fopen('test_exhaustive_inv_rot.dat', 'w');
fid_res = fopen('res_exhaustive_inv_rot.dat', 'w');

step = 2^-15;

% range from [0:1]
for n = 0:step:1
    nfi = fi(n, 1, 36, 29, 'roundmode', 'floor');
    nfihex = hex(nfi);
    fprintf(fid, '3\n');   % inverse rotation mode
    fprintf(fid, '%s\n', nfihex);

    acosfi = fi(acos(n), 0, 36, 29, 'roundmode', 'floor');
    acosfihex = hex(acosfi);
    fprintf(fid_res, '%s\n', acosfihex);
end

fclose(fid);
fclose(fid_res);


Appendix D

VHDL

This appendix contains the VHDL files for

CORDIC from page 126.

D-CORDIC from page 156.

Radix-4 D-CORDIC from page 200.

Radix-8 D-CORDIC from page 248.

Scaling free from page 298.


D.1 CORDIC

VHDL files for the implementation of the CORDIC algorithm. The architecture is described in Section 3.2, beginning on page 36. The hierarchy of the files is listed below.

• Top level

– CORDIC block 1

∗ Shift block 1

∗ Z path 1

∗ CPA

∗ Inverter

– CORDIC block 2

∗ Shift block 2

∗ Z path 1

∗ CPA

∗ Inverter

– CORDIC block 3

∗ Shift block 3

∗ Z path 3

∗ CPA

∗ Inverter

– CORDIC block 4

∗ Shift block 4

∗ Z path 4

∗ CPA

∗ Inverter

– Register

– Multiplexer

– Controller

Used VHDL types

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

package cordic_types is

  subtype precision_out is std_logic_vector(23 downto 0);
  subtype precision_in is std_logic_vector(23 downto 0);
  subtype word_length is std_logic_vector(29 downto 0);
  subtype word_length_x is std_logic_vector(32 downto 0);

  -- constants
  constant word_length_zero : word_length := X"0000000" & "00";
  constant word_length_x_zero : word_length_x := X"0000000" & "00000";
  constant word_length_one : word_length := "10" & X"0000000";

  constant scale_factor : word_length_x := "000010011011011101001110110110101";

  constant tan_table_0 : word_length_x := "000011001001000011111101101010100";










  constant tan_table_10 : word_length_x := "000000000000001111111111111111111";






















end package cordic_types;
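The constants in cordic_types can be regenerated to check them. The Python sketch below assumes, as the listed values suggest, that the 33-bit words carry 29 fractional bits, that tan_table_i holds atan(2^-i) rounded down, and that scale_factor is the product over i = 0..24 (iteration 0 in the top level plus the 24 iterations implied by shift amounts 1 to 24). The helper names are illustrative.

```python
import math

WIDTH = 33  # word_length_x is std_logic_vector(32 downto 0)
FRAC = 29   # assumed number of fractional bits

def to_bits(value, frac=FRAC, width=WIDTH):
    """Floor-quantize a non-negative value and format it as a VHDL bit string."""
    return format(math.floor(value * (1 << frac)), "0{}b".format(width))

def tan_table(i):
    """Assumed content of tan_table_<i>: atan(2**-i), rounded down."""
    return to_bits(math.atan(2.0 ** -i))

def scale_factor(iterations=25):
    """K = prod_{i=0}^{iterations-1} 1 / sqrt(1 + 2**(-2*i)), rounded down."""
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return to_bits(k)

for i in range(25):
    print('constant tan_table_{} : word_length_x := "{}";'.format(i, tan_table(i)))
```

Under these assumptions the generated tan_table_0, tan_table_10 and scale_factor strings coincide with the package entries; the other printed lines can be compared entry by entry in the same way.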

Top level

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work.cordic_types.all;

entity top_level is
  port (clk    : in  std_logic;
        reset  : in  std_logic;
        z_in   : in  std_logic_vector(31 downto 0);
        opmode : in  std_logic_vector(1 downto 0);
        x_out  : out std_logic_vector(29 downto 0);
        y_out  : out std_logic_vector(29 downto 0));
end top_level;

architecture behavioral of top_level is

  component counter
    port (clk    : in  std_logic;
          reset  : in  std_logic;
          opmode : in  std_logic_vector(1 downto 0);
          i_out  : out std_logic_vector(2 downto 0));
  end component;

  component reg
    port (clk       : in  std_logic;
          reset     : in  std_logic;
          x_in      : in  word_length_x;
          y_in      : in  word_length_x;
          z_in      : in  word_length_x;
          sigma_in  : in  std_logic;
          x_out     : out word_length_x;
          y_out     : out word_length_x;
          z_out     : out word_length_x;
          sigma_out : out std_logic);
  end component;

  component mux1
    port (x1              : in  word_length_x;
          y1              : in  word_length_x;
          z1              : in  word_length_x;
          sigma1          : in  std_logic;
          x_iterative     : in  word_length_x;
          y_iterative     : in  word_length_x;
          z_iterative     : in  word_length_x;
          sigma_iterative : in  std_logic;
          sel             : in  std_logic_vector(2 downto 0);
          x               : out word_length_x;
          y               : out word_length_x;
          z               : out word_length_x;
          sigma           : out std_logic);
  end component;

  component cordic_block1
    port (x_in       : in  word_length_x;
          y_in       : in  word_length_x;
          z_in       : in  word_length_x;
          sigma      : in  std_logic;
          i          : in  std_logic_vector(2 downto 0);
          opmode     : in  std_logic_vector(1 downto 0);
          x_out      : out word_length_x;
          y_out      : out word_length_x;
          z_out      : out word_length_x;
          next_sigma : out std_logic);
  end component;


          y_in       : in  word_length_x;
          z_in       : in  word_length_x;
          sigma      : in  std_logic;
          i          : in  std_logic_vector(2 downto 0);
          opmode     : in  std_logic_vector(1 downto 0);
          x_out      : out word_length_x;
          y_out      : out word_length_x;
          z_out      : out word_length_x;
          next_sigma : out std_logic);

  end component;



end component ;



end component ;

  signal i : std_logic_vector(2 downto 0);
  signal x_mux, y_mux, z_mux : word_length_x;
  signal x_out_1, y_out_1, z_out1 : word_length_x;
  signal x_out_2, y_out_2, z_out2 : word_length_x;
  signal x_out_3, y_out_3, z_out3 : word_length_x;
  signal x_out_4, y_out_4, z_out4 : word_length_x;
  signal x_reg, y_reg, z_reg, x_res, y_res : word_length_x;
  signal sigma_1, sigma_mux, sigma_out1, sigma_out2, sigma_out3,
         sigma_out4, sigma_reg : std_logic;
  signal x_1, y_1, z_1 : word_length_x;

begin

  sigma_1 <= z_1(32) when opmode = "01" else not(y_1(32));

  process (opmode, z_in)
  begin
    if opmode = "01" then
      x_1 <= scale_factor;
    else
      x_1 <= "0001" & z_in(28 downto 0);
    end if;
  end process;


    if opmode = "01" then
      y_1 <= scale_factor;
    else
      y_1 <= "1111" & z_in(28 downto 0);



    if opmode = "01" then
      z_1 <= signed('0' & z_in) - signed(tan_table_0) + '0';
    else
      z_1 <= tan_table_0;  -- '0' & z_in;
    end if;
  end process;

  count_i : counter
    port map(clk, reset, opmode, i);

  block1 : cordic_block1
    port map(x_mux, y_mux, z_mux, sigma_mux, i, opmode,
             x_out_1, y_out_1, z_out1, sigma_out1);

  block2 : cordic_block2
    port map(x_out_1, y_out_1, z_out1, sigma_out1, i, opmode,
             x_out_2, y_out_2, z_out2, sigma_out2);





  mux : mux1
    port map(x_1, y_1, z_1, sigma_1, x_reg, y_reg, z_reg,
             sigma_reg, i, x_mux, y_mux, z_mux, sigma_mux);

  reg_xyz : reg
    port map(clk, reset, x_out_4, y_out_4, z_out4, sigma_out4,
             x_reg, y_reg, z_reg, sigma_reg);

  x_out <= x_out_4(29 downto 0) when opmode = "01" else z_out4(29 downto 0);

  y_out <= y_out_4(29 downto 0) when opmode = "01" else word_length_zero;

end behavioral;
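Reading the top level, iteration 0 is folded into the start values. In rotation mode (opmode "01") x and y both start at scale_factor and z at z_in - tan_table_0, which equals a first rotation by +π/4. In vectoring mode (opmode "10") the start vector is (1 + t, t - 1) with z = π/4, where t is the fractional part of z_in. The floating-point sketch below ignores all fixed-point effects and assumes 24 further iterations. It shows why these start values yield (cos θ, sin θ) and atan(t). The names `iterate`, `rotation_mode` and `vectoring_mode` are hypothetical.

```python
import math

N = 24  # iterations 1..24 after the folded iteration 0 (assumption)
ATAN = [math.atan(2.0 ** -i) for i in range(N + 1)]
K = 1.0
for i in range(N + 1):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))   # includes the i = 0 factor 1/sqrt(2)

def iterate(x, y, z, vectoring):
    """Circular CORDIC iterations 1..N in floating point."""
    for i in range(1, N + 1):
        if vectoring:
            d = 1 if y < 0 else -1      # sigma = not(y(32)): rotate y towards 0
        else:
            d = -1 if z < 0 else 1      # sigma = z(32): rotate z towards 0
        x, y, z = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i, z - d * ATAN[i]
    return x, y, z

def rotation_mode(theta):
    """opmode "01": start at x = y = K, z = theta - atan(1); returns (cos, sin)."""
    x, y, _ = iterate(K, K, theta - ATAN[0], vectoring=False)
    return x, y

def vectoring_mode(t):
    """opmode "10": start at (1 + t, t - 1) with z = atan(1); returns approx atan(t)."""
    _, _, z = iterate(1.0 + t, t - 1.0, ATAN[0], vectoring=True)
    return z

print(rotation_mode(0.5), vectoring_mode(0.5))
```

The pre-rotated start vector (1 + t, t - 1) has the angle atan(t) - π/4, so starting z at π/4 leaves atan(t) in z once y has been driven to zero.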

CORDIC block 1

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

use work.cordic_types.all;

entity cordic_block1 is
  port (x_in       : in  word_length_x;
        y_in       : in  word_length_x;
        z_in       : in  word_length_x;
        sigma      : in  std_logic;
        i          : in  std_logic_vector(2 downto 0);
        opmode     : in  std_logic_vector(1 downto 0);
        x_out      : out word_length_x;
        y_out      : out word_length_x;
        z_out      : out word_length_x;
        next_sigma : out std_logic);
end cordic_block1;

architecture behavioral of cordic_block1 is

  component inverter
    port (x_in  : in  word_length_x;
          x_out : out word_length_x);
  end component;

  component shift1
    port (x_in    : in  word_length_x;
          i       : in  std_logic_vector(2 downto 0);
          x_shift : out word_length_x);
  end component;

  component z_path1
    port (z_in  : in  word_length_x;
          i     : in  std_logic_vector(2 downto 0);
          sigma : in  std_logic;
          z_out : out word_length_x);
  end component;

  component cpa
    port (x   : in  word_length_x;
          y   : in  word_length_x;
          neg : in  std_logic;
          s   : out word_length_x);
  end component;

  signal x_shift_out, y_shift_out, y_cpa_out : word_length_x;
  signal inv_x_shift_out, inv_y_shift_out : word_length_x;
  signal y_cpax_in, x_cpay_in, z_path_out : word_length_x;
  signal sigma_inv : std_logic;

begin

  shift_x : shift1
    port map(x_in, i, x_shift_out);

  shift_y : shift1
    port map(y_in, i, y_shift_out);

  invert_y_shift : inverter
    port map(y_shift_out, inv_y_shift_out);

  invert_x_shift : inverter
    port map(x_shift_out, inv_x_shift_out);

  cpax : cpa
    port map(x_in, y_cpax_in, sigma_inv, x_out);

  cpay : cpa
    port map(y_in, x_cpay_in, sigma, y_cpa_out);

  sigma_inv <= not(sigma);

  process (sigma, y_shift_out, inv_y_shift_out)
  begin
    if sigma = '0' then
      y_cpax_in <= inv_y_shift_out;
    else
      y_cpax_in <= y_shift_out;


  process (sigma, x_shift_out, inv_x_shift_out)
  begin
    if sigma = '1' then
      x_cpay_in <= inv_x_shift_out;
    else
      x_cpay_in <= x_shift_out;


  z : z_path1
    port map(z_in, i, sigma, z_path_out);

  y_out <= y_cpa_out;

  next_sigma <= z_path_out(32) when opmode = "01" else not(y_cpa_out(32));

  z_out <= z_path_out;

end behavioral;
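In cordic_block1 the subtraction comes from the inverter together with the carry input of the cpa. With sigma = '0' the x adder sees not(y >> s) plus carry sigma_inv = '1', and the y adder sees x >> s plus carry '0'. With sigma = '1' the roles swap. The Python sketch below models the x and y datapath bit-accurately on 33-bit words. It assumes the shifter performs an arithmetic right shift, as the sign-extending concatenations in the shift blocks do. It shows that sigma = '0' gives x - y·2^-s and y + x·2^-s. The function names are illustrative.

```python
WIDTH = 33
MASK = (1 << WIDTH) - 1

def to_signed(v):
    """Interpret a 33-bit pattern as a two's-complement integer."""
    v &= MASK
    return v - (1 << WIDTH) if v >> (WIDTH - 1) else v

def asr(v, s):
    """Arithmetic right shift of a 33-bit pattern, as done by the shift blocks."""
    return (to_signed(v) >> s) & MASK

def cordic_block_xy(x_in, y_in, sigma, s):
    """x/y datapath of cordic_block1 for shift amount s (sigma is 0 or 1)."""
    x_sh, y_sh = asr(x_in, s), asr(y_in, s)
    y_cpax_in = (~y_sh & MASK) if sigma == 0 else y_sh   # inverter chosen by sigma
    x_cpay_in = (~x_sh & MASK) if sigma == 1 else x_sh
    x_out = (x_in + y_cpax_in + (1 - sigma)) & MASK      # cpax carry-in: sigma_inv
    y_out = (y_in + x_cpay_in + sigma) & MASK            # cpay carry-in: sigma
    return x_out, y_out

xo, yo = cordic_block_xy(3 << 20, (-5 << 18) & MASK, 0, 1)
print(to_signed(xo), to_signed(yo))
```

Using one adder per path with a selectable inverter avoids a separate subtractor, and the carry input supplies the "+1" of the two's-complement negation.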

CORDIC block 2












end component ;



end component ;

  component cpa
    port (x   : in  word_length_x;


end component ;

  signal x_shift_out, y_shift_out, y_cpa_out : word_length_x;
  signal inv_x_shift_out, inv_y_shift_out : word_length_x;
  signal y_cpax_in, x_cpay_in, z_path_out : word_length_x;
  signal sigma_inv : std_logic;

begin



















end behavioral;

CORDIC block 3











end component ;



end component ;




end component ;


begin





















end behavioral;

CORDIC block 4











end component ;




end component ;



end component ;


begin





















end behavioral;

Controller

library IEEE;
use IEEE.std_logic_1164.all;
use work.cordic_types.all;
use IEEE.std_logic_arith.all;

entity counter is
  port (clk : in std_logic;


end counter;

architecture behavioral of counter is

  signal next_i, i : std_logic_vector(2 downto 0);

begin

  i_out <= i;

  process (i, reset, opmode)
  begin
    case i is
      when "000" =>
        if reset = '1' and (opmode = "01" or opmode = "10") then
          next_i <= "001";
        else
          next_i <= "000";
        end if;
      when "001" =>
        next_i <= "010";
      when "010" =>
        next_i <= "011";
      when "011" =>
        next_i <= "100";
      when "100" =>
        next_i <= "101";
      when "101" =>
        next_i <= "000";
      when others =>
        next_i <= "000";
    end case;
  end process;

  process (clk, reset)
  begin
    if reset = '0' then
      i <= "000";
    elsif clk'event and clk = '1' then
      i <= next_i;


end behavioral;
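The controller is a small state machine. The counter i waits at "000" until reset is released (reset is active low here) and opmode selects rotation ("01") or vectoring ("10"). It then steps through "001" to "101" and returns to "000". Together with the four cascaded blocks, this suggests six passes of four iterations per result. The Python sketch below is a behavioral model of the two processes; `next_i` and `run` are illustrative names.

```python
def next_i(i, reset, opmode):
    """Combinational next-state logic of the counter (i is 0..7)."""
    if i == 0:
        # leave the idle state only outside reset and in opmode "01" or "10"
        return 1 if reset == 1 and opmode in ("01", "10") else 0
    if 1 <= i <= 4:
        return i + 1
    return 0  # "101" and "others" go back to "000"

def run(cycles, reset=1, opmode="01", i=0):
    """States of i after successive rising clock edges; reset is active low."""
    states = []
    for _ in range(cycles):
        i = 0 if reset == 0 else next_i(i, reset, opmode)
        states.append(i)
    return states

print(run(8))
```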

CPA


entity cpa is
  port (x   : in  word_length_x;
        y   : in  word_length_x;
        neg : in  std_logic;
        s   : out word_length_x);
end cpa;

architecture behavioral of cpa is

begin

  s <= signed(x) + signed(y) + neg;

end behavioral;

Inverter

library IEEE;
use IEEE.std_logic_1164.all;
use work.cordic_types.all;

entity inverter is
  port (x_in  : in  word_length_x;
        x_out : out word_length_x);
end inverter;

architecture behavioral of inverter is

begin
  x_out <= not(x_in);

end behavioral;

Multiplexer


entity mux1 is
  port (x1              : in  word_length_x;
        y1              : in  word_length_x;
        z1              : in  word_length_x;
        sigma1          : in  std_logic;
        x_iterative     : in  word_length_x;
        y_iterative     : in  word_length_x;
        z_iterative     : in  word_length_x;
        sigma_iterative : in  std_logic;
        sel             : in  std_logic_vector(2 downto 0);
        x               : out word_length_x;
        y               : out word_length_x;
        z               : out word_length_x;
        sigma           : out std_logic);
end mux1;

architecture behavioral of mux1 is

begin

  process (x1, y1, z1, sigma1, x_iterative, y_iterative,
           z_iterative, sel, sigma_iterative) is
  begin
    if sel = "000" then
      x <= x1;
      y <= y1;
      z <= z1;
      sigma <= sigma1;
    else
      x <= x_iterative;
      y <= y_iterative;
      z <= z_iterative;
      sigma <= sigma_iterative;


end behavioral;

Register


entity reg is
  port (clk       : in  std_logic;
        reset     : in  std_logic;
        x_in      : in  word_length_x;
        y_in      : in  word_length_x;
        z_in      : in  word_length_x;
        sigma_in  : in  std_logic;
        x_out     : out word_length_x;
        y_out     : out word_length_x;
        z_out     : out word_length_x;
        sigma_out : out std_logic);
end reg;

architecture behavioral of reg is

begin
  process (clk, reset)
  begin
    if reset = '0' then
      x_out <= word_length_x_zero;
      y_out <= word_length_x_zero;
      z_out <= word_length_x_zero;
      sigma_out <= '0';
    elsif clk'event and clk = '1' then
      x_out <= x_in;
      y_out <= y_in;
      z_out <= z_in;
      sigma_out <= sigma_in;


end behavioral;

Shift block 1



entity shift1 is
  port (x_in : in word_length_x;


end shift1;

architecture behavioral of shift1 is

begin

  process (x_in, i) is
  begin
    case i is
      when "000" =>
        x_shift <= x_in(32) & x_in(32 downto 1);
      when "001" =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32 downto 5);
      when "010" =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32 downto 9);

                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32 downto 13);

                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32 downto 17);
      when others =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32 downto 21);
    end case;
  end process;

end behavioral;
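Each shift block implements an arithmetic right shift by repeating the sign bit x_in(32) in front of the kept bits x_in(32 downto s). Reading the four listings, shift block b uses the shift 4·i + b for counter values "000" to "100" and 20 + b for "others". The four blocks therefore cover shifts 1 to 24 over six cycles. The Python sketch below checks the concatenation against Python's arithmetic shift on 33-bit patterns; `shift_by_concat` and `shift_amount` are illustrative names.

```python
WIDTH = 33
MASK = (1 << WIDTH) - 1

def shift_by_concat(x_in, s):
    """x_in(32) repeated s times & x_in(32 downto s), on a 33-bit pattern."""
    bits = format(x_in & MASK, "0{}b".format(WIDTH))  # bits[0] is x_in(32)
    return int(bits[0] * s + bits[:WIDTH - s], 2)

def shift_amount(block, i):
    """Shift of shift block <block> (1..4) for counter value i; i >= 5 is "others"."""
    return 4 * min(i, 5) + block

print([shift_amount(1, i) for i in range(6)])
```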

Shift block 2





end shift2;


begin


        x_shift <= x_in(32) & x_in(32) & x_in(32 downto 2);
      when "001" =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32 downto 6);
      when "010" =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32 downto 10);

                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32 downto 14);

                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32 downto 18);
      when others =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32 downto 22);


end behavioral;

Shift block 3





end shift3;


begin


        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 3);
      when "001" =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 7);

                   (32) & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 11);

                   (32) & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 15);

                   (32) & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 19);
      when others =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 23);


end behavioral;


Shift block 4





end shift4;


begin


        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 4);
      when "001" =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 8);

                   x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 12);

                   x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 16);

                   x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 20);
      when others =>
        x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32)
                   & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 24);


end behavioral;

Z path block 1


entity z_path1 is
  port (z_in : in word_length_x;


end z_path1;

architecture behavioral_block of z_path1 is


  signal tan, tan_add, tan_inverted : word_length_x;

begin

  process (i)
  begin
    case i is
      when "000" =>
        tan <= tan_table_1;


      when "011" =>
        tan <= tan_table_13;

      when others =>
        tan <= tan_table_21;


  invert_tan : inverter
    port map(tan, tan_inverted);

  process (tan_inverted, tan, sigma)
  begin
    if sigma = '0' then
      tan_add <= tan_inverted;
    else
      tan_add <= tan;


  z_out <= signed(z_in) + signed(tan_add) + sigma;

end behavioral_block;
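Z path block 1 selects tan_table_1, tan_table_13 and tan_table_21 for i = "000", "011" and "others". Z path block 2 selects tan_table_2, tan_table_10, tan_table_18 and tan_table_22. This is the same index as the shift in the matching shift block. The Python sketch below checks that the resulting schedule visits each table index 1..24 exactly once. It also models the z update z_in + tan_add + sigma on 33-bit words. As written, the carry input equals sigma, so the effective step is one LSB larger than tan_table_i in both directions. The names are illustrative.

```python
WIDTH = 33
MASK = (1 << WIDTH) - 1

def table_index(block, i):
    """tan_table index chosen by z path <block> (1..4) for counter value i."""
    return 4 * min(i, 5) + block

def z_path(z_in, tan, sigma):
    """z_out <= z_in + tan_add + sigma, where tan_add = not(tan) if sigma = '0'."""
    tan_add = (~tan & MASK) if sigma == 0 else tan
    return (z_in + tan_add + sigma) & MASK

schedule = [table_index(b, i) for i in range(6) for b in range(1, 5)]
print(schedule)
```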

Z path block 2




end z_path2;


begin

  process (i)
  begin

        tan <= tan_table_2;
      when "001" =>

        tan <= tan_table_10;
      when "011" =>

        tan <= tan_table_18;
      when others =>
        tan <= tan_table_22;
    end case;
  end process;








Z path block 3




end z_path3;


begin

  process (i)
  begin








end process ;









Z path block 4




end z_path4;


begin

  process (i)
  begin








end process ;


  process (tan_inverted, tan, sigma)
  begin
    if sigma = '0' then
      tan_add <= tan_inverted;
    else
      tan_add <= tan;
    end if;
  end process;




D.2 D-CORDIC

VHDL files for the implementation of the D-CORDIC algorithm. The architecture is described in section 3.3 on page 38.

• Top level

– D-CORDIC block 1

∗ Shift block 1.a

∗ Shift block 1.b

∗ Z path 1

∗ T path 1

∗ CPA

∗ Inverter

– D-CORDIC block 2

∗ Shift block 2.a

∗ Shift block 2.b

∗ Z path 2

∗ T path 2

∗ CPA

∗ Inverter

– D-CORDIC block 3

∗ Shift block 3.a

∗ Shift block 3.b

∗ Z path 3

∗ T path 3

∗ CPA

∗ Inverter

– D-CORDIC block 4

∗ Shift block 4.a

∗ Shift block 4.b

∗ Z path 4

∗ T path 4

∗ CPA

∗ Inverter

– Register

– Multiplexer

– Controller

D.2 D-CORDIC 157

Used VHDL types


  subtype precision_out is std_logic_vector(23 downto 0);
  subtype precision_in is std_logic_vector(23 downto 0);
  subtype word_length is std_logic_vector(32 downto 0);
  subtype word_length_x is std_logic_vector(32 downto 0);

  -- constants
  constant word_length_zero : word_length := X"0000000" & "00000";
  constant word_length_one : word_length := X"fffffff" & "11111";
  constant word_length_1 : word_length := "00010" & X"0000000";
  constant x_2_init : std_logic_vector(5 downto 0) := "000011";
  constant x_2_75 : std_logic_vector(2 downto 0) := "011";

  constant scale_factor : unsigned(32 downto 0) := "000010011011011101001110110110101";

  constant tan_table_0 : word_length := "000110010010000111111011010101000";










  constant tan_table_10 : word_length := "000000000000011111111111111111111";






















Top level

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work.cordic_types.all;







end component ;


          reset     : in  std_logic;
          x_in      : in  word_length;
          y_in      : in  word_length;
          z_in      : in  word_length;
          t_in      : in  word_length;
          sigma_in  : in  std_logic;
          x_out     : out word_length;
          y_out     : out word_length;
          z_out     : out word_length;
          t_out     : out word_length;
          sigma_out : out std_logic);

  end component;

  component mux1
    port (x1              : in  word_length;
          y1              : in  word_length;
          z1              : in  word_length;
          t1              : in  word_length;
          sigma1          : in  std_logic;
          x_iterative     : in  word_length;
          y_iterative     : in  word_length;
          z_iterative     : in  word_length;
          t_iterative     : in  word_length;
          sigma_iterative : in  std_logic;
          sel             : in  std_logic_vector(2 downto 0);
          x               : out word_length;
          y               : out word_length;
          z               : out word_length;
          t               : out word_length;
          sigma           : out std_logic);
  end component;

  component dcordic_block1
    port (x_in       : in  word_length;
          y_in       : in  word_length;
          z_in       : in  word_length;
          t_in       : in  word_length;
          sigma_in   : in  std_logic;
          i          : in  std_logic_vector(2 downto 0);
          opmode     : in  std_logic_vector(1 downto 0);
          x_out      : out word_length;
          y_out      : out word_length;
          z_out      : out word_length;
          t_out      : out word_length;
          next_sigma : out std_logic);
  end component;



end component ;


          y_in       : in  word_length;
          z_in       : in  word_length;
          t_in       : in  word_length;
          sigma_in   : in  std_logic;
          i          : in  std_logic_vector(2 downto 0);
          opmode     : in  std_logic_vector(1 downto 0);
          x_out      : out word_length;
          y_out      : out word_length;
          z_out      : out word_length;
          t_out      : out word_length;
          next_sigma : out std_logic);

  end component;



end component ;

  signal i : std_logic_vector(2 downto 0);
  signal x_mux, y_mux, z_mux, t_mux : word_length;
  signal x_out_1, x_out_2, x_out_3, x_out_4 : word_length;
  signal y_out_1, y_out_2, y_out_3, y_out_4 : word_length;
  signal z_out_1, z_out_2, z_out_3, z_out_4 : word_length;
  signal t_out_1, t_out_2, t_out_3, t_out_4 : word_length;
  signal x_reg, y_reg, z_reg, t_reg : word_length;
  signal sigma_2, sigma_mux, sigma_out1, sigma_out2, sigma_out3,
         sigma_out4, sigma_reg : std_logic;

  signal x_2, y_2, z_2, t_2 : word_length;
  signal x_2_temp : std_logic_vector(5 downto 0);

  signal a_mux, a : std_logic_vector(31 downto 0);
  signal c : std_logic;

begin

  -- opmode added to the sensitivity list: sigma_2 depends on it
  process (opmode, c, z_2, y_2)
  begin
    if opmode = "01" then
      sigma_2 <= z_2(32);
    elsif opmode = "10" then
      sigma_2 <= not(y_2(32));
    else
      sigma_2 <= c;


  c <= not(t_2(30));

  x_2_temp <= "000" & unsigned(z_in(29 downto 27)) + unsigned(x_2_75);

  a <= unsigned(z_in) + unsigned(a_mux) + not(opmode(0));
  a_mux <= "0000" & z_in(29 downto 2) when opmode = "11" else
           not("00100" & z_in(28 downto 2));

  process (opmode, z_in, x_2_temp)
  begin
    if opmode = "01" then
      x_2 <= "000011101100000000010000000011111";
    elsif opmode = "10" then
      x_2 <= x_2_temp & z_in(26 downto 0);
    else
      x_2 <= "001000000000000000000000000000000";


  process (opmode, a)
  begin
    if opmode = "01" then
      y_2 <= word_length_zero;
    elsif opmode = "10" then
      y_2 <= a(31) & a;
    else
      y_2 <= "000110000" & x"000000";


    if opmode = "01" then
      z_2 <= '0' & z_in;
    elsif opmode = "10" then
      z_2 <= "000011101101011000110011100000101";
    else
      z_2 <= "000010100100101111000111110100010";


  process (opmode, a)
  begin
    if opmode = "11" then
      t_2 <= a(31 downto 0) & '0';
    else
      t_2 <= word_length_zero;


  count : counter
    port map(clk, reset, opmode, i);

  block1 : dcordic_block1
    port map(x_mux, y_mux, z_mux, t_mux, sigma_mux, i, opmode,
             x_out_1, y_out_1, z_out_1, t_out_1, sigma_out1);

  block2 : dcordic_block2
    port map(x_out_1, y_out_1, z_out_1, t_out_1, sigma_out1, i,
             opmode, x_out_2, y_out_2, z_out_2, t_out_2, sigma_out2);


  mux : mux1
    port map(x_2, y_2, z_2, t_2, sigma_2, x_reg, y_reg, z_reg,
             t_reg, sigma_reg, i, x_mux, y_mux, z_mux, t_mux, sigma_mux);

  reg_xyz : reg
    port map(clk, reset, x_out_4, y_out_4, z_out_4, t_out_4,
             sigma_out4, x_reg, y_reg, z_reg, t_reg, sigma_reg);

  x_out <= x_out_4(29 downto 0) when opmode = "01" else
           z_out_4(29 downto 0) when opmode = "10" else
           z_out_4(29 downto 0);

  y_out <= y_out_4(29 downto 0) when opmode = "01" else X"0000000" & "00";

end behavioral;
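The D-CORDIC constants appear to use one more fractional bit than the CORDIC package. Under that reading, tan_table_0 and tan_table_10 match floor(atan(2^-i)·2^30). The start value z_2 loaded in vectoring mode (opmode "10") matches floor(atan(1/2)·2^30). The small Python check below is a sketch under that assumption, with `to_bits` as an illustrative helper.

```python
import math

def to_bits(value, frac=30, width=33):
    """Floor-quantize a non-negative value into a width-bit string with frac fractional bits."""
    return format(math.floor(value * (1 << frac)), "0{}b".format(width))

print(to_bits(math.atan(0.5)))   # compare with z_2 in vectoring mode
```

With frac=29 the same helper reproduces the CORDIC tan_table_0 entry, which is the only difference between the two packages for that constant.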


D-CORDIC block 1


entity dcordic_block1 is
  port (x_in : in word_length;


end dcordic_block1;

architecture behavioral of dcordic_block1 is

  component inverter
    port (x_in  : in  word_length;
          x_out : out word_length);
  end component;

  component shift1
    port (x_in    : in  word_length;
          i       : in  std_logic_vector(2 downto 0);
          x_shift : out word_length);
  end component;

  component shift1_2
    port (x_in : in word_length;


  end component;

  component z_path1
    port (z_in  : in  word_length;
          i     : in  std_logic_vector(2 downto 0);
          sigma : in  std_logic;
          z_out : out word_length);
  end component;

  component cpa
    port (x   : in  word_length;
          y   : in  word_length;
          neg : in  std_logic;
          s   : out word_length);
  end component;

  component t_path1
    port (t_in  : in  word_length;
          x_in  : in  word_length;
          i     : in  std_logic_vector(2 downto 0);
          sigma : out std_logic;
          t_out : out word_length);
  end component;

  signal sigma, zero, one, sigma_inv, sigma_t : std_logic;
  signal cpa2x_out, x_shift1_out, x_shift2_out, cpax1_out,
         x_mux, x_shift1_out_inv : word_length;
  signal y_shift1_out, y_shift2_out, cpay1_out, y_mux,
         y_shift1_out_inv : word_length;
  signal z_path_out, y_cpa_out : word_length;

begin

  zero <= '0';
  one <= '1';

  shift1_x : shift1
    port map(x_in, i, x_shift1_out);

  shift2_x : shift1_2
    port map(x_in, i, x_shift2_out);

  cpax1 : cpa
    port map(x_in, x_shift2_out, one, cpax1_out);

  cpax2 : cpa
    port map(cpax1_out, x_mux, sigma_inv, cpa2x_out);

  x_out <= cpa2x_out;

  process (sigma, y_shift1_out, y_shift1_out_inv)
  begin
    if sigma = '0' then
      x_mux <= y_shift1_out_inv;
    else
      x_mux <= y_shift1_out;
    end if;
  end process;

  shift1_y : shift1
    port map(y_in, i, y_shift1_out);

  shift2_y : shift1_2
    port map(y_in, i, y_shift2_out);

  cpay1 : cpa
    port map(y_in, y_shift2_out, one, cpay1_out);

  cpay2 : cpa
    port map(cpay1_out, y_mux, sigma, y_cpa_out);

  process (sigma, x_shift1_out, x_shift1_out_inv)
  begin
    if sigma = '1' then
      y_mux <= x_shift1_out_inv;
    else
      y_mux <= x_shift1_out;


  invert_x_shift1 : inverter
    port map(x_shift1_out, x_shift1_out_inv);

  invert_y_shift1 : inverter
    port map(y_shift1_out, y_shift1_out_inv);

  z : z_path1
    port map(z_in, i, sigma_in, z_path_out);

  t : t_path1
    port map(t_in, cpa2x_out, i, sigma_t, t_out);


  process (z_path_out, sigma_t, y_cpa_out, opmode)
  begin
    if opmode = "01" then
      next_sigma <= z_path_out(30);
    elsif opmode = "10" then
      next_sigma <= not(y_cpa_out(30));
    else
      next_sigma <= sigma_t;


  sigma <= not(sigma_in) when opmode = "11" else sigma_in;
  sigma_inv <= sigma_in when opmode = "11" else not(sigma_in);

end behavioral;
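Judging from the shift amounts, dcordic_block1 combines a shift by k - 1 (shift block 1.a: 1, 5, 9, ...) with a shift by 2k (shift block 1.b: 4, 12, 20), i.e. k = 2, 6, 10. With d = ±1 selected by sigma, one pass then appears to compute x' = x - x·2^-2k - d·y·2^-(k-1) and y' = y - y·2^-2k + d·x·2^-(k-1). That is a rotation by 2·atan(2^-k), scaled by (1 + 2^-2k). The Python check below verifies this identity in floating point. It is an interpretation of the listing, not code from the thesis, and the function names are hypothetical.

```python
import math

def double_step(x, y, k, d):
    """Update read from dcordic_block1: shifts by k - 1 and 2k, direction d = +1/-1."""
    x_new = x - x * 2.0 ** (-2 * k) - d * y * 2.0 ** (-(k - 1))
    y_new = y - y * 2.0 ** (-2 * k) + d * x * 2.0 ** (-(k - 1))
    return x_new, y_new

def scaled_double_rotation(x, y, k, d):
    """Reference: rotate by 2*d*atan(2**-k), then scale by (1 + 2**-2k)."""
    phi = 2 * d * math.atan(2.0 ** -k)
    g = 1.0 + 2.0 ** (-2 * k)
    return (g * (x * math.cos(phi) - y * math.sin(phi)),
            g * (x * math.sin(phi) + y * math.cos(phi)))

print(double_step(0.7, -0.2, 2, 1), scaled_double_rotation(0.7, -0.2, 2, 1))
```

The identity follows from cos(2θ) = (1 - t²)/(1 + t²) and sin(2θ) = 2t/(1 + t²) with t = 2^-k, so the gain of each double step is the simple factor (1 + 2^-2k).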

D-CORDIC block 2











end component ;




end component ;



end component ;



end component ;



end component ;

  signal sigma, zero, one, sigma_inv, sigma_t : std_logic;
  signal cpa2x_out, x_shift1_out, x_shift2_out, cpax1_out, x_mux,
         x_shift1_out_inv : word_length;
  signal y_shift1_out, y_shift2_out, cpay1_out, y_mux,


begin

  zero <= '0';
  one <= '1';









    else
      x_mux <= y_shift1_out;













  z_out <= z_path_out;
  y_out <= y_cpa_out;






  sigma <= not(sigma_in) when opmode = "11" else sigma_in;
  sigma_inv <= sigma_in when opmode = "11" else not(sigma_in);
end behavioral;

D-CORDIC block 3







  component inverter
    port (x_in  : in  word_length;
          x_out : out word_length);
  end component;



end component ;



end component ;



end component ;



end component ;



end component ;




begin

  zero <= '0';
  one <= '1';





























  sigma <= not(sigma_in) when opmode = "11" else sigma_in;
  sigma_inv <= sigma_in when opmode = "11" else not(sigma_in);

end behavioral;

D-CORDIC block 4




        y_in       : in  word_length;
        z_in       : in  word_length;
        t_in       : in  word_length;
        sigma_in   : in  std_logic;
        i          : in  std_logic_vector(2 downto 0);
        opmode     : in  std_logic_vector(1 downto 0);
        x_out      : out word_length;
        y_out      : out word_length;
        z_out      : out word_length;
        t_out      : out word_length;
        next_sigma : out std_logic);







end component ;



end component ;



end component ;



end component ;




end component ;




begin

  zero <= '0';
  one <= '1';





























  sigma <= not(sigma_in) when opmode = "11" else sigma_in;
  sigma_inv <= sigma_in when opmode = "11" else not(sigma_in);

end behavioral;

Controller

library IEEE;
use IEEE.std_logic_1164.all;
use work.cordic_types.all;
use IEEE.std_logic_arith.all;



end counter ;



begin

  i_out <= i;

  process (i, reset, opmode)
  begin
    case i is
      when "000" =>
        if reset = '1' and not(opmode = "00") then
          next_i <= "001";
        else
          next_i <= "000";
        end if;
      when "001" =>
        next_i <= "010";
      when "010" =>
        next_i <= "011";
      when "011" =>
        next_i <= "100";
      when "100" =>
        next_i <= "101";
      when others =>
        next_i <= "000";
    end case;
  end process;


    if reset = '0' then
      i <= "000";
    elsif clk'event and clk = '1' then
      i <= next_i;


end behavioral;

CPA


entity cpa is
  port (x : in word_length;


end cpa;


begin
  s <= unsigned(x) + unsigned(y) + neg;

end behavioral;

Inverter


entity inverter is
  port (x_in  : in  word_length;
        x_out : out word_length);
end inverter;


begin
  x_out <= not(x_in);

end behavioral;

Multiplexer

library IEEE;
use IEEE.std_logic_1164.all;
use work.cordic_types.all;
use IEEE.std_logic_arith.all;

entity mux1 is
  port (x1              : in  word_length;
        y1              : in  word_length;
        z1              : in  word_length;
        t1              : in  word_length;
        sigma1          : in  std_logic;
        x_iterative     : in  word_length;
        y_iterative     : in  word_length;
        z_iterative     : in  word_length;
        t_iterative     : in  word_length;
        sigma_iterative : in  std_logic;
        sel             : in  std_logic_vector(2 downto 0);
        x               : out word_length;
        y               : out word_length;
        z               : out word_length;
        t               : out word_length;
        sigma           : out std_logic);
end mux1;


begin

  process (x1, y1, z1, t1, x_iterative, y_iterative, z_iterative,
           t_iterative, sel, sigma1, sigma_iterative) is
  begin
    if sel = "000" then
      x <= x1;
      y <= y1;
      z <= z1;
      t <= t1;
      sigma <= sigma1;
    else
      x <= x_iterative;
      y <= y_iterative;
      z <= z_iterative;
      t <= t_iterative;
      sigma <= sigma_iterative;


end behavioral;


Register



        reset     : in  std_logic;
        x_in      : in  word_length;
        y_in      : in  word_length;
        z_in      : in  word_length;
        t_in      : in  word_length;
        sigma_in  : in  std_logic;
        x_out     : out word_length;
        y_out     : out word_length;
        z_out     : out word_length;
        t_out     : out word_length;
        sigma_out : out std_logic);
end reg;


      x_out <= word_length_zero;
      y_out <= word_length_zero;
      z_out <= word_length_zero;
      t_out <= word_length_zero;
      sigma_out <= '0';
    elsif clk'event and clk = '1' then
      x_out <= x_in;
      y_out <= y_in;
      z_out <= z_in;
      t_out <= t_in;
      sigma_out <= sigma_in;


end behavioral;

Shift block 1.a

l ibrary IEEE ;

D.2 D-CORDIC 181

use IEEE . s t d l o g i c 1 1 6 4 . a l l ;use IEEE . s t d l o g i c a r i t h . a l l ;use work . c o r d i c t y p e s . a l l ;

entity s h i f t 1 i sport ( x in : in word length ;


end s h i f t 1 ;


begin



  x_shift <= x_in(32) & x_in(32 downto 1);
when "001" =>
  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32 downto 5);
when "010" =>
  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32 downto 9);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32 downto 13);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32 downto 17);
when others =>
  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32 downto 21);
end case;

end process ;


end behavioral;
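Each branch of Shift block 1.a fills the vacated upper bits with copies of the sign bit x_in(32). The block is therefore an arithmetic right shift of the 33-bit operand by 1, 5, 9, 13, 17 or 21 positions. The sketch below shows the same behaviour written with shift_right from the IEEE numeric_std package in place of the hand-written concatenations. It is an illustration only. The entity name shift1_sketch is hypothetical, the port width assumes word_length is std_logic_vector(32 downto 0), and the case labels "011" and "100", which are missing from the listing above, are assumed to select the shifts of 13 and 17.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical compact equivalent of Shift block 1.a (entity shift1).
-- Assumes a 33-bit operand and that i = "011"/"100" select 13/17.
entity shift1_sketch is
  port (x_in    : in  std_logic_vector(32 downto 0);
        i       : in  std_logic_vector(2 downto 0);
        x_shift : out std_logic_vector(32 downto 0));
end shift1_sketch;

architecture rtl of shift1_sketch is
begin
  process (x_in, i)
    variable amount : natural range 0 to 21;
  begin
    -- shift distance 4*i + 1, with every code above "100" giving 21
    case i is
      when "000"  => amount := 1;
      when "001"  => amount := 5;
      when "010"  => amount := 9;
      when "011"  => amount := 13;
      when "100"  => amount := 17;
      when others => amount := 21;
    end case;
    -- shift_right on a signed value replicates bit 32, which is what
    -- the x_in(32) & x_in(32) & ... chains in the listing do by hand
    x_shift <= std_logic_vector(shift_right(signed(x_in), amount));
  end process;
end rtl;
```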

Shift block 1.b



entity shift12 is
  port (x_in : in word_length;


end shift12;

architecture behavioral of shift12 is

begin



  x_shift <= not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32 downto 4));

when "001" =>
  x_shift <= not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32 downto 12));


    (32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32 downto 20));

when "011" =>
  x_shift <= word_length_one;

when others =>
  x_shift <= word_length_one;



end behavioral;

Shift block 2.a




end shift2;

architecture behavioral of shift2 is
begin



  x_shift <= x_in(32) & x_in(32) & x_in(32 downto 2);
when "001" =>
  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32 downto 6);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32 downto 10);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32 downto 14);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32 downto 18);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32 downto 22);


end behavioral;

Shift block 2.b





end shift22;


begin



  x_shift <= not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32 downto 6));


    (32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32 downto 14));


    (32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32 downto 22));

when "011" =>
  x_shift <= word_length_one;



end behavioral;

Shift block 3.a




end shift3;


begin



  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 3);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 7);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 11);
when "011" =>
  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 15);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 19);

    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 23);
end case;

end process ;

end behavioral;

Shift block 3.b





end shift32;


begin



  x_shift <= not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32 downto 8));


    (32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32 downto 16));
when "010" =>
  x_shift <= not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32 downto 24));



end behavioral;

Shift block 4.a




end shift4;


begin



  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 4);

when "001" =>
  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 8);


    (32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 12);


    (32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 16);


    (32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 20);

when others =>
  x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32)
    & x_in(32) & x_in(32) & x_in(32) & x_in(32) & x_in(32 downto 24);


end behavioral;

Shift block 4.b



  i : in std_logic_vector(2 downto 0);
  x_shift : out word_length);
end shift42;


begin



  x_shift <= not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32 downto 10));
when "001" =>


    & not(x_in(32)) & not(x_in(32 downto 18));
when "010" =>


    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32)) & not(x_in(32)) & not(x_in(32))
    & not(x_in(32)) & not(x_in(32 downto 26));



end behavioral;

Z path block 1


use IEEE.std_logic_arith.all;

entity z_path1 is
  port (z_in : in word_length;


end z_path1;




signal tan, tan_add, tan_inverted : word_length;
signal sigma_inv : std_logic;

begin

process (i)
begin








end process ;




else

  tan_add <= tan;
end if;

end process ;

z_out <= signed(z_in) + signed(tan_add) + sigma_inv;
sigma_inv <= not (sigma);
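The last assignment in Z path block 1 lets one adder both add and subtract the table angle. The carry-in is sigma_inv = not sigma, so the result is exact only if the one's complement tan_inverted is the operand selected when sigma is '0'. That follows from the two's complement identity z + (not t) + 1 = z - t. The condition that makes this selection is missing from the listing, so the sketch below assumes exactly that mapping: sigma = '1' adds and sigma = '0' subtracts. The entity z_addsub_sketch and its generic WIDTH are hypothetical, and IEEE numeric_std is used in place of std_logic_arith.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch of the z-path update in Z path block 1.
--   sigma = '1': z_out = z_in + tan
--   sigma = '0': z_out = z_in + (not tan) + 1 = z_in - tan
entity z_addsub_sketch is
  generic (WIDTH : positive := 33);
  port (z_in  : in  std_logic_vector(WIDTH-1 downto 0);
        tan   : in  std_logic_vector(WIDTH-1 downto 0);
        sigma : in  std_logic;
        z_out : out std_logic_vector(WIDTH-1 downto 0));
end z_addsub_sketch;

architecture rtl of z_addsub_sketch is
  signal tan_add : std_logic_vector(WIDTH-1 downto 0);
  signal carry   : signed(1 downto 0);
begin
  -- pass the table value through, or its one's complement
  tan_add <= tan when sigma = '1' else not tan;
  -- not sigma as a 2-bit signed value (0 or +1) supplies the "+1"
  carry <= '0' & (not sigma);
  z_out <= std_logic_vector(signed(z_in) + signed(tan_add) + carry);
end rtl;
```

Sharing one adder between the two cases is what allows each z-path block to use a single adder and an inverter instead of a separate adder and subtractor.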


Z path block 2




end z_path2;




signal tan, tan_add, tan_inverted : word_length;
signal carry : std_logic;
signal sigma_inv : std_logic;

begin

process (i)
begin





  tan <= tan_table15;

when others =>
  tan <= tan_table23;







z_out <= signed(z_in) + signed(tan_add) + sigma_inv;
sigma_inv <= not (sigma);


Z path block 3




end z_path3;






begin

process (i)
begin








end process ;






z_out <= signed(z_in) + signed(tan_add) + sigma_inv;
sigma_inv <= not (sigma);
end behavioral_block;

Z path block 4




  sigma : in std_logic;
  z_out : out word_length);

end z_path4;





begin

process (i)
begin








end process ;






z_out <= signed(z_in) + signed(tan_add) + sigma_inv;



T path block 1


entity t_path1 is
  port (t_in : in word_length;


end t_path1;

architecture behavioral_block of t_path1 is

signal shift, compare, t_out2 : word_length;

begin

process (i, t_in)
begin


  shift <= "0000" & t_in(32 downto 4);
when "001" =>
  shift <= "000000000000" & t_in(32 downto 12);
when "010" =>
  shift <= "00000000000000000000" & t_in(32 downto 20);

when "100" =>
  shift <= "0000000000000000000000000000" & t_in(32 downto 28);
when others =>
  shift <= word_length_zero;
end case;

end process ;

t_out2 <= signed(t_in) + signed(shift);
compare <= signed(t_out2) - signed(x_in);
t_out <= t_out2;
sigma <= (compare(32));
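T path block 1 updates t to t_in + (t_in >> k), padding the shifted copy with zeros rather than sign bits. It then takes the sign bit of t_out2 - x_in as sigma. That bit is '1' exactly when t_out2 < x_in, provided the subtraction does not overflow. The sketch below restates the step with IEEE numeric_std. It rests on several assumptions. The entity t_step_sketch, the generic WIDTH and the integer port k are hypothetical. k stands for the shift that i selects in the listing (4, 12, 20 or 28). The others branch, which adds zero, is not modelled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch of one T-path step (T path block 1):
--   t_out = t_in + (t_in >> k), shifted in with zeros
--   sigma = sign bit of (t_out - x_in), i.e. '1' when t_out < x_in
entity t_step_sketch is
  generic (WIDTH : positive := 33);
  port (t_in  : in  std_logic_vector(WIDTH-1 downto 0);
        x_in  : in  std_logic_vector(WIDTH-1 downto 0);
        k     : in  natural range 0 to WIDTH-1;
        t_out : out std_logic_vector(WIDTH-1 downto 0);
        sigma : out std_logic);
end t_step_sketch;

architecture rtl of t_step_sketch is
  signal t_next, diff : signed(WIDTH-1 downto 0);
begin
  -- shift_right on unsigned inserts zeros, like "0000" & t_in(32 downto 4)
  t_next <= signed(t_in) + signed(shift_right(unsigned(t_in), k));
  -- wraps modulo 2**WIDTH, as a fixed-width hardware subtractor does
  diff   <= t_next - signed(x_in);
  t_out  <= std_logic_vector(t_next);
  sigma  <= diff(WIDTH-1);
end rtl;
```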



T path block 2




end t_path2;



begin



  shift <= "000000" & t_in(32 downto 6);
when "001" =>


  shift <= "0000000000000000000000" & t_in(32 downto 22);

when "100" =>
  shift <= "000000000000000000000000000000" & t_in(32



end process ;




T path block 3




end z_path3;





begin

process (i)
begin








end process ;

invert_tan : inverter
  port map(tan, tan_inverted);





z_out <= signed(z_in) + signed(tan_add) + sigma_inv;
sigma_inv <= not (sigma);
end behavioral_block;

T path block 4




end t_path4;



begin





  shift <= "00000000000000000000000000" & t_in(32 downto 26);

when others =>


end process ;




D.3 Radix-4 D-CORDIC

VHDL files for the implementation of the radix-4 D-CORDIC algorithm. The architecture is described in section 3.4, beginning on page 43. The relationship between the files is listed below.

• Top level

– Radix-4 CORDIC block 1

∗ Shift block 1.a

∗ Shift block 1.b

∗ Shift block 1.c

∗ Shift block 1.d

∗ Z path 1

· Z path shift block 1

· Z path table block 1

∗ Selection function

∗ Operation mode shifter x path

∗ Operation mode shifter y path

∗ Multiplexer x path

∗ Multiplexer y path

∗ Multiplexer selection function

∗ Extra shift block 1.a

∗ Extra shift block 1.b

∗ CPA

∗ Inverter

– Radix-4 CORDIC block 2

∗ Shift block 2.a

∗ Shift block 2.b

∗ Shift block 2.c

∗ Shift block 2.d

∗ Z path 2




∗ Operation mode shifter x path

∗ Operation mode shifter y path







∗ CPA

∗ Inverter

– Initialization

– Register

– Multiplexer

– Controller

Used VHDL types


subtype precision_out is std_logic_vector(23 downto 0);
subtype precision_in is std_logic_vector(23 downto 0);
subtype word_length is std_logic_vector(29 downto 0); -- first bit has the value 1
subtype word_length_z is std_logic_vector(32 downto 0);
subtype word_length_x is std_logic_vector(30 downto 0);
subtype word_length_y is std_logic_vector(32 downto 0);

-- constants
constant word_length_zero : word_length := X"0000000" & "00";
constant word_length_one : word_length := X"fffffff" & "11";
constant scale_factor : unsigned(29 downto 0) := "010011011011101001110110110101";

constant tan_table10 : word_length_z := "000110010010000111111011010101000";








constant tan_table110 : word_length_z := "000111111111111111111111111111111";

















Top level








  reset : in std_logic;
  opmode : in std_logic_vector(1 downto 0);
  sigma_1 : in std_logic_vector(2 downto 0);
  sigma_2 : in std_logic_vector(2 downto 0);
  i_out : out std_logic_vector(2 downto 0);
  scale_factor : out std_logic_vector(2 downto 0));

end component ;


  reset : in std_logic;
  x_in : in word_length_x;
  y_in : in word_length_y;
  z_in : in word_length_z;
  sigma_in : in std_logic_vector(2 downto 0);
  x_out : out word_length_x;
  y_out : out word_length_y;
  z_out : out word_length_z;
  sigma_out : out std_logic_vector(2 downto 0));

end component ;

component mux1
  port (x1 : in word_length_x;
  y1 : in word_length_y;
  z1 : in word_length_z;
  sigma1 : in std_logic_vector(2 downto 0);
  x_iterative : in word_length_x;
  y_iterative : in word_length_y;
  z_iterative : in word_length_z;
  sigma_iterative : in std_logic_vector(2 downto 0);
  sel : in std_logic_vector(2 downto 0);
  x : out word_length_x;
  y : out word_length_y;
  z : out word_length_z;
  sigma : out std_logic_vector(2 downto 0));
end component;

component cordic_block1
  port (x_in : in word_length_X;
  y_in : in word_length_y;
  z_in : in word_length_z;
  sigma : in std_logic_vector(2 downto 0);
  i : in std_logic_vector(2 downto 0);
  opmode : in std_logic_vector(1 downto 0);
  x_out : out word_length_x;
  y_out : out word_length_y;
  z_out : out word_length_z;
  next_sigma : out std_logic_vector(2 downto 0));

end component ;

component cordic_block2
  port (x_in : in word_length_X;
  y_in : in word_length_y;
  z_in : in word_length_z;
  sigma : in std_logic_vector(2 downto 0);
  i : in std_logic_vector(2 downto 0);
  opmode : in std_logic_vector(1 downto 0);
  scale_factor : in std_logic_vector(2 downto 0);
  x_out : out word_length_x;
  y_out : out word_length_y;
  z_out : out word_length_z;
  next_sigma : out std_logic_vector(2 downto 0));

end component ;

component latch
  port (x_in : in word_length;
  enable : in std_logic;
  x_out : out word_length);

end component ;

component init
  port (z_in : in word_length_z;
  d : out std_logic_vector(4 downto 0));
end component;

component scaling_table
  port (d : in std_logic_vector(4 downto 0);



signal scale_factor, i, sigma_1, sigma_mux, sigma_out1, sigma_out2, sigma_reg,
  sigma_reg_scale2, sigma_scale, sigma_reg_scale : std_logic_vector(2 downto 0);

signal z_out_1, z_out_2, z_reg, z_0, z_mux, z_in2, z_in3 : word_length_z;

signal d : std_logic_vector(4 downto 0);
signal x_1, x_scaling, x_out_1, x_out_2, x_reg, x_mux : word_length_x;
signal y_1, y_out_1, y_out_2, y_reg, y_res, y_mux, x_res : word_length_y;
signal sigma_sel : std_logic_vector(1 downto 0);

begin

z_in3 <= '0' & z_in;

ini : init port map(z_in3, d);

lut : scaling_table port map(d, x_scaling);

sigma_1(2) <= '1' when sigma_sel = "00" else '0';

sigma_1(0) <= '1' when sigma_sel = "11" else -- size
              '0';

process (opmode, z_in, x_scaling, d)
begin

if opmode = "01" then
  y_1 <= word_length_zero & "000";
  x_1 <= x_scaling;
  z_in2 <= z_in(30 downto 0) & "00";
  sigma_sel <= '0' & d(4);
  sigma_1(1) <= '0';
else
  x_1 <= "01" & "00000000000000000000000000000";
  y_1 <= z_in(30 downto 0) & "00";
  z_in2 <= word_length_zero & "000";
  sigma_sel <= z_in(28 downto 27);
  sigma_1(1) <= '1';
end if;
end process;


count_i : counter
  port map(clk, reset, opmode, sigma_out1, sigma_out2, i, scale_factor);

mux : mux1 port map(x_1, y_1, z_in2, sigma_1, x_reg, y_reg, z_reg,
  sigma_reg, i, x_mux, y_mux, z_mux, sigma_mux);

block1 : cordic_block1 port map(x_mux, y_mux, z_mux, sigma_mux, i, opmode, x_out_1,
  y_out_1, z_out_1, sigma_out1);

block2 : cordic_block2 port map(x_out_1, y_out_1, z_out_1, sigma_out1, i, opmode,
  scale_factor, x_out_2, y_out_2, z_out_2, sigma_out2);

reg_xyz : reg port map(clk, reset, x_out_2, y_out_2, z_out_2, sigma_out2,
  x_reg, y_reg, z_reg, sigma_reg);

process (opmode, x_out_2, y_out_2, z_out_2)
begin

if opmode = "01" then
  x_res <= "00" & x_out_2;
  y_res <= y_out_2;
else
  x_res <= z_out_2;
  y_res <= word_length_zero & "000";
end if;
end process;


sigma_scale <= sigma_reg_scale when i = "011" else sigma_out1;

sigma_reg_scale2 <= sigma_out2 when i = "001" else sigma_reg_scale;

x_out <= x_res(29 downto 0);
y_out <= y_res(29 downto 0);


if reset = '0' then
  sigma_reg_scale <= "000";
elsif clk'event and clk = '1' then
  sigma_reg_scale <= sigma_reg_scale2;
end if;



end behavioral;

Radix-4 D-CORDIC block 1


entity cordic_block1 is
  port (x_in : in word_length_X;
  y_in : in word_length_y;
  z_in : in word_length_z;
  sigma : in std_logic_vector(2 downto 0);
  i : in std_logic_vector(2 downto 0);
  opmode : in std_logic_vector(1 downto 0);
  x_out : out word_length_x;
  y_out : out word_length_y;
  z_out : out word_length_z;
  next_sigma : out std_logic_vector(2 downto 0));



component inverter_x
  port (x_in : in std_logic_vector(32 downto 0);
  x_out : out std_logic_vector(32 downto 0));
end component;

component inverter_y
  port (x_in : in word_length_y;
  x_out : out word_length_y);
end component;

component shift_y1_b1
  port (x_in : in word_length_y;
  i : in std_logic_vector(2 downto 0);
  x_shift : out std_logic_vector(34 downto 0));

end component;


  i : in std_logic_vector(2 downto 0);
  x_shift : out word_length_y);

end component ;

component shift_x1_b1
  port (x_in : in word_length_x;


end component ;



end component ;

component z_path1
  port (z_in : in word_length_z;
  i : in std_logic_vector(2 downto 0);
  sigma : in std_logic_vector(2 downto 0);
  opmode : in std_logic_vector(1 downto 0);
  z_out : out word_length_z);
end component;

component selection
  port (selection_in : in word_length_z;
  x_in : in std_logic_vector(5 downto 0);
  opmode : in std_logic_vector(1 downto 0);
  sigma : out std_logic_vector(2 downto 0));
end component;

component extra_shift_x1
  port (x_in : in std_logic_vector(32 downto 0);
  sigma : in std_logic_vector(2 downto 0);
  x_out : out word_length_x);
end component;

component extra_shift_y1
  port (y_in : in std_logic_vector(34 downto 0);
  sigma : in std_logic_vector(2 downto 0);
  y_out : out word_length_y);
end component;

component mux_x
  port (y_in1 : in word_length_y;
  y_in2 : in word_length_y;
  sigma : in std_logic_vector(2 downto 0);
  x_out : out word_length_x;
  carry : out std_logic);

end component ;


component mux_y
  port (x_in1 : in word_length_y;
  x_in2 : in word_length_y;
  sigma : in std_logic_vector(2 downto 0);
  y_out : out word_length_y);
end component;

component opmode_x
  port (x_in1 : in word_length_x;
  x_in2 : in word_length_x;
  opmode : in std_logic_vector(1 downto 0);
  x_out : out word_length_y);
end component;

component opmode_y
  port (y_in1 : in word_length_y;
  y_in2 : in std_logic_vector(34 downto 0);
  opmode : in std_logic_vector(1 downto 0);
  y_out : out word_length_y);
end component;

component extra_shift_x2
  port (x_in : in word_length_y;
  sigma : in std_logic_vector(2 downto 0);
  x_out : out word_length_y);
end component;

component extra_shift_y2
  port (y_in : in word_length_y;

end component;

component cpa_x
  port (x : in word_length_x;
  y : in word_length_x;
  carry : in std_logic;
  s : out word_length_x);

end component ;

component cpa_y
  port (x : in word_length_y;
  y : in word_length_y;
  carry : in std_logic;
  s : out word_length_y);
end component;

component shift_mode
  port (y_in : in word_length_y;
  opmode : in std_logic_vector(1 downto 0);
  y_shift : out word_length_y);
end component;

component mux_selection
  port (y_in : in word_length_y;
  z_in : in word_length_y;
  opmode : in std_logic_vector(1 downto 0);
  selection_in : out word_length_y);
end component;

signal shift_x2_out, extra_shift_x1_out, cpa_x2_out : word_length_x;
signal x_out2, mux_x_out : word_length_x;
signal shift_x1_out : std_logic_vector(32 downto 0);
signal opmode_x_out, extra_shift_x2_out, x_invert : word_length_y;
signal y_invert, extra_shift_y2_out, shift_y2_out : word_length_y;
signal extra_shift_y1_out, cpa_y2_out, y_out2 : word_length_y;
signal mux_y_out, opmode_y_out, y_out_shift : word_length_y;
signal sigma_cpa1, inv_sigma : std_logic;
signal z_out2, selection_in, z_out_shift : word_length_z;
signal shift_y1_out : std_logic_vector(34 downto 0);
signal opmode_inv : std_logic_vector(1 downto 0);

begin

inv_sigma <= not (sigma(2));

-- x path
shift_x1 : shift_x1_b1 port map(x_in, i, shift_x1_out);

shift_x2 : shift_x2_b1 port map(x_in, i, shift_x2_out);

extra_shift_x1 : extra_shift_x1 port map(shift_x1_out, sigma, extra_shift_x1_out);

cpa_x1 : cpa_x port map(extra_shift_x1_out, cpa_x2_out, inv_sigma, x_out2);

cpa_x2 : cpa_x port map(x_in, mux_x_out, sigma_cpa1, cpa_x2_out);

opmode_mux_x : opmode_x port map(shift_x2_out, x_in, opmode, opmode_x_out);

extra_shift_x2 : extra_shift_x2 port map(opmode_x_out, sigma, extra_shift_x2_out);

invert_x : inverter_x port map(extra_shift_x2_out, x_invert);

mux_x1 : mux_x port map(y_invert, extra_shift_y2_out, sigma, mux_x_out,
  sigma_cpa1);

x_out <= x_out2;

-- y path
shift_y1 : shift_y1_b1 port map(y_in, i, shift_y1_out);

shift_y2 : shift_y2_b1 port map(y_in, i, shift_y2_out);

extra_shift_y1 : extra_shift_y1 port map(shift_y1_out, sigma, extra_shift_y1_out);

cpa_y1 : cpa_y port map(extra_shift_y1_out, cpa_y2_out, inv_sigma, y_out2);

cpa_y2 : cpa_y port map(y_in, mux_y_out, sigma_cpa1, cpa_y2_out);

opmode_mux_y : opmode_y port map(shift_y2_out, shift_y1_out, opmode, opmode_y_out);

extra_shift_y2 : extra_shift_y2 port map(opmode_y_out, sigma, extra_shift_y2_out);

invert_y : inverter_y port map(extra_shift_y2_out, y_invert);

mux_y1 : mux_y port map(x_invert, extra_shift_x2_out, sigma, mux_y_out);

shift_vec : shift_mode port map(y_out2, opmode, y_out_shift);

y_out <= y_out_shift;

-- z path
z : z_path1 port map(z_in, i, sigma, opmode, z_out2);

shift_rot : shift_mode port map(z_out2, opmode_inv, z_out_shift);

opmode_inv <= opmode(0) & opmode(1);
z_out <= z_out_shift;

-- selection
sel : selection port map(selection_in, x_out2(30 downto 25), opmode,
  next_sigma);

mux_sel : mux_selection port map(y_out_shift, z_out_shift, opmode, selection_in);

end behavioral;

Radix-4 D-CORDIC block 2

entity cordic_block2 is
  port (x_in : in word_length_X;
  y_in : in word_length_y;
  z_in : in word_length_z;
  sigma : in std_logic_vector(2 downto 0);
  i : in std_logic_vector(2 downto 0);
  opmode : in std_logic_vector(1 downto 0);
  scale_factor : in std_logic_vector(2 downto 0);
  x_out : out word_length_x;
  y_out : out word_length_y;
  z_out : out word_length_z;
  next_sigma : out std_logic_vector(2 downto 0));



component inverter_x
  port (x_in : in std_logic_vector(32 downto 0);
  x_out : out std_logic_vector(32 downto 0));
end component;

component inverter_y
  port (x_in : in word_length_y;
  x_out : out word_length_y);
end component;



end component ;



end component ;



end component ;



end component ;

component z_path2
  port (z_in : in word_length_z;


end component ;



end component ;




end component ;



end component ;



end component ;



end component ;

component opmode_x
  port (x_in1 : in word_length_x;
  x_in2 : in word_length_x;
  opmode : in std_logic_vector(1 downto 0);
  x_out : out word_length_y);
end component;

component opmode_y
  port (y_in1 : in word_length_y;

end component;


  sigma : in std_logic_vector(2 downto 0);
  x_out : out word_length_y);
end component;


component extra_shift_y2
  port (y_in : in word_length_y;


end component ;



end component ;



end component ;

component shift_mode
  port (y_in : in word_length_y;


end component ;



end component ;

signal shift_x2_out, extra_shift_x1_out, cpa_x2_out : word_length_x;
signal x_out2, mux_x_out : word_length_x;
signal shift_x1_out : std_logic_vector(32 downto 0);
signal opmode_x_out, extra_shift_x2_out, x_invert : word_length_y;
signal y_invert, extra_shift_y2_out, shift_y2_out : word_length_y;
signal extra_shift_y1_out, cpa_y2_out, y_out2 : word_length_y;
signal mux_y_out, opmode_y_out, y_out_shift : word_length_y;
signal sigma_cpa1, inv_sigma : std_logic;
signal z_out2, selection_in, z_out_shift : word_length_z;
signal shift_y1_out : std_logic_vector(34 downto 0);
signal opmode_inv : std_logic_vector(1 downto 0);
signal scale_select : std_logic_vector(2 downto 0);

begin

scale_select <= sigma when i = "000" else
                sigma when i = "001" else
                sigma when i = "010" else
                scale_factor;

inv_sigma <= not (scale_select(2));


shift_x2 : shift_x2_b2 port map(x_in, i, shift_x2_out);

extra_shift_x1 : extra_shift_x1 port map(shift_x1_out, scale_select, extra_shift_x1_out);

cpa_x1 : cpa_x port map(extra_shift_x1_out, cpa_x2_out, inv_sigma, x_out2);

cpa_x2 : cpa_x port map(x_in, mux_x_out, sigma_cpa1, cpa_x2_out);

opmode_mux_x : opmode_x port map(shift_x2_out, x_in, opmode, opmode_x_out);

extra_shift_x2 : extra_shift_x2 port map(opmode_x_out, sigma, extra_shift_x2_out);

invert_x : inverter_x port map(extra_shift_x2_out, x_invert);

mux_x1 : mux_x port map(y_invert, extra_shift_y2_out, sigma, mux_x_out,
  sigma_cpa1);

x_out <= x_out2;

-- y path
shift_y1 : shift_y1_b2
  port map(y_in, i, shift_y1_out);


extra_shift_y1 : extra_shift_y1 port map(shift_y1_out, scale_select, extra_shift_y1_out);

cpa_y1 : cpa_y port map(extra_shift_y1_out, cpa_y2_out, inv_sigma, y_out2);

cpa_y2 : cpa_y port map(y_in, mux_y_out, sigma_cpa1, cpa_y2_out);

opmode_mux_y : opmode_y port map(shift_y2_out, shift_y1_out, opmode, opmode_y_out);

extra_shift_y2 : extra_shift_y2 port map(opmode_y_out, sigma, extra_shift_y2_out);

invert_y : inverter_y port map(extra_shift_y2_out, y_invert);

mux_y1 : mux_y port map(x_invert, extra_shift_x2_out, sigma, mux_y_out);

shift_vec : shift_mode port map(y_out2, opmode, y_out_shift);

y_out <= y_out_shift;

-- z path
z : z_path2 port map(z_in, i, sigma, opmode, z_out2);

shift_rot : shift_mode port map(z_out2, opmode_inv, z_out_shift);

opmode_inv <= opmode(0) & opmode(1);
z_out <= z_out_shift;

-- selection
sel : selection port map(selection_in, x_out2(30 downto 25), opmode,
  next_sigma);

mux_sel : mux_selection port map(y_out_shift, z_out_shift, opmode, selection_in);

end behavioral;

Controller



  reset : in std_logic;
  opmode : in std_logic_vector(1 downto 0);
  sigma_1 : in std_logic_vector(2 downto 0);
  sigma_2 : in std_logic_vector(2 downto 0);
  i_out : out std_logic_vector(2 downto 0);
  scale_factor : out std_logic_vector(2 downto 0));

end counter ;


signal next_i, i : std_logic_vector(2 downto 0);
signal sigma_4_save, sigma_5_save, sigma_6_save : std_logic_vector(1 downto 0);
signal aa, sigma_6_reg, sigma_5_reg, sigma_4_reg : std_logic_vector(1 downto 0);

begin

i_out <= i;

process (opmode, i, reset, sigma_4_reg, sigma_5_reg, sigma_6_reg, sigma_2, sigma_1)
begin
case i is
  when "000" =>
    if reset = '1' and (opmode = "01" or opmode = "10") then
      next_i <= "001";
    else
      next_i <= "000";
    end if;
    sigma_4_save <= sigma_4_reg;
    sigma_5_save <= sigma_5_reg;
    sigma_6_save <= sigma_6_reg;
    aa <= "10";
  when "001" =>
    next_i <= "010";
    sigma_5_save <= sigma_2(2) & sigma_2(0);
    sigma_6_save <= sigma_6_reg;
    sigma_4_save <= sigma_1(2) & sigma_1(0);
    aa <= "10";
  when "010" =>
    next_i <= "011";
    sigma_4_save <= sigma_4_reg;
    sigma_5_save <= sigma_5_reg;
    sigma_6_save <= sigma_1(2) & sigma_1(0);
    aa <= "10";
  when "011" =>
    next_i <= "100";
    sigma_4_save <= sigma_4_reg;
    sigma_5_save <= sigma_5_reg;
    sigma_6_save <= sigma_6_reg;
    aa <= sigma_4_reg;
  when "100" =>
    next_i <= "101";
    sigma_4_save <= sigma_4_reg;
    sigma_5_save <= sigma_5_reg;
    sigma_6_save <= sigma_6_reg;
    aa <= sigma_5_reg;
  when others =>
    next_i <= "000";
    sigma_4_save <= sigma_4_reg;
    sigma_5_save <= sigma_5_reg;
    sigma_6_save <= sigma_6_reg;
    aa <= sigma_6_reg;


scale_factor(2) <= aa(1) or opmode(1);
scale_factor(1) <= '0'; -- sign bit
scale_factor(0) <= aa(0);


if reset = '0' then
  i <= "000";
  sigma_4_reg <= "10";
  sigma_5_reg <= "10";
  sigma_6_reg <= "10";
elsif clk'event and clk = '1' then
  i <= next_i;
  sigma_4_reg <= sigma_4_save;
  sigma_5_reg <= sigma_5_save;
  sigma_6_reg <= sigma_6_save;
end if;


end behavioral;

CPA x


entity cpa_x is
  port (x : in word_length_x;


end cpa_x;

architecture behavioral of cpa_x is

begin

s <= unsigned(x) + unsigned(y) + carry;

end behavioral;
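cpa_x computes unsigned(x) + unsigned(y) + carry with the std_logic_arith package. That package comes from Synopsys and is not part of the IEEE standard. For readers porting the design, the sketch below shows a possible equivalent based on the standard IEEE numeric_std package. The entity cpa_sketch and its generic WIDTH (31 for word_length_x, 33 for word_length_y) are hypothetical. The carry is wrapped in a one-bit unsigned vector because the VHDL-93 version of numeric_std defines no "+" between unsigned and std_logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical numeric_std version of cpa_x / cpa_y:
-- s = x + y + carry, wrapping modulo 2**WIDTH like the original.
entity cpa_sketch is
  generic (WIDTH : positive := 31);
  port (x, y  : in  std_logic_vector(WIDTH-1 downto 0);
        carry : in  std_logic;
        s     : out std_logic_vector(WIDTH-1 downto 0));
end cpa_sketch;

architecture rtl of cpa_sketch is
  signal cin : unsigned(0 downto 0);
begin
  cin(0) <= carry;  -- carry-in as a one-bit unsigned operand
  -- "+" resizes cin to WIDTH bits and the result keeps WIDTH bits
  s <= std_logic_vector(unsigned(x) + unsigned(y) + cin);
end rtl;
```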

CPA y


entity cpa_y is
  port (x : in word_length_y;


end cpa_y;


architecture behavioral of cpa_y is

begin


end behavioral;

Inverter x


entity inverter_x is
  port (x_in : in std_logic_vector(32 downto 0);
  x_out : out std_logic_vector(32 downto 0));
end inverter_x;

architecture behavioral of inverter_x is

begin

x_out <= not (x_in);

end behavioral;

Inverter y


entity inverter_y is
port ( x_in : in word_length_y;

x_out : out word_length_y );
end inverter_y;

architecture behavioral of inverter_y is
begin


end behavioral;


Multiplexer


entity mux1 is
port ( x1 : in word_length_x;

y1 : in word_length_y;
z1 : in word_length_z;
sigma1 : in std_logic_vector(2 downto 0);
x_iterative : in word_length_x;
y_iterative : in word_length_y;
z_iterative : in word_length_z;
sigma_iterative : in std_logic_vector(2 downto 0);
sel : in std_logic_vector(2 downto 0);
x : out word_length_x;
y : out word_length_y;
z : out word_length_z;
sigma : out std_logic_vector(2 downto 0) );

end mux1;


begin

process (x1, y1, z1, x_iterative, y_iterative, z_iterative, sel, sigma1, sigma_iterative) is
begin

if sel = "000" then
x <= x1;
y <= y1;
z <= z1;
sigma <= sigma1;

else
x <= x_iterative;
y <= y_iterative;
z <= z_iterative;
sigma <= sigma_iterative;


end behavioral;

Register




reset : in std_logic;
x_in : in word_length_x;
y_in : in word_length_y;
z_in : in word_length_z;
sigma_in : in std_logic_vector(2 downto 0);
x_out : out word_length_x;
y_out : out word_length_y;
z_out : out word_length_z;
sigma_out : out std_logic_vector(2 downto 0) );

end reg;


x_out <= '0' & word_length_zero;
y_out <= "000" & word_length_zero;
z_out <= "000" & word_length_zero;
sigma_out <= "000";

elsif clk'event and clk = '1' then
x_out <= x_in;
y_out <= y_in;
z_out <= z_in;
sigma_out <= sigma_in;


end behavioral;
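A hedged sketch of the same kind of pipeline register, written with `rising_edge` and a width-independent reset aggregate. The reset condition of this listing is not visible, so the sketch assumes the active-low asynchronous reset used by the controller listing (`reset = '0'`); `reg_sketch` and `WIDTH` are hypothetical names, and only one data port stands in for the x, y, z and sigma ports.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical register with active-low asynchronous reset (assumed polarity).
entity reg_sketch is
  generic ( WIDTH : positive := 31 );
  port ( clk   : in  std_logic;
         reset : in  std_logic;
         d     : in  std_logic_vector(WIDTH-1 downto 0);
         q     : out std_logic_vector(WIDTH-1 downto 0) );
end reg_sketch;

architecture behavioral of reg_sketch is
begin
  process (clk, reset)
  begin
    if reset = '0' then
      q <= (others => '0');   -- clears every bit whatever WIDTH is
    elsif rising_edge(clk) then
      q <= d;
    end if;
  end process;
end behavioral;
```

`rising_edge(clk)` behaves like `clk'event and clk = '1'` except that it also requires the previous value to be '0' (or 'L'), so a simulation transition such as 'X' to '1' is not treated as a clock edge.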

Shift block 1.a


entity shift_x1_b1 is
port ( x_in : in word_length_x;



end shift_x1_b1;

architecture behavioral of shift_x1_b1 is

begin


x_shift <= x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 2));


& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 10));


& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 18));


& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 26));

when others =>
x_shift <= word_length_zero & "000";


end behavioral;
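Each branch of this shift block copies the sign bit `x_in(30)` into the vacated high-order positions and drops low-order bits, which is an arithmetic right shift; the first branch, for instance, equals `x_in` sign-extended to 33 bits and shifted right by 2. A hedged sketch using `ieee.numeric_std` follows. The names `arith_shift_sketch`, `IN_W`, `OUT_W` and the integer input `amount` are hypothetical, and the sketch does not reproduce the listing's `when others` branch, which outputs zero rather than a sign-filled word.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: sign-extend x_in to OUT_W bits, then shift it
-- arithmetically right by 'amount' bit positions.
entity arith_shift_sketch is
  generic ( IN_W  : positive := 31;
            OUT_W : positive := 33 );
  port ( x_in    : in  std_logic_vector(IN_W-1 downto 0);
         amount  : in  natural range 0 to OUT_W;
         x_shift : out std_logic_vector(OUT_W-1 downto 0) );
end arith_shift_sketch;

architecture behavioral of arith_shift_sketch is
begin
  -- shift_right on a signed operand replicates the sign bit
  x_shift <= std_logic_vector(shift_right(resize(signed(x_in), OUT_W), amount));
end behavioral;
```

Driving `amount` from a small case statement on `i` would keep the listing's set of shift distances; whether a synthesis tool then builds the same hard-wired multiplexer rather than a general barrel shifter is tool dependent.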

Shift block 1.b








begin



x_shift <= '0' & x_in(30 downto 1);
when "001" =>

x_shift <= "00000" & x_in(30 downto 5);
when "010" =>

x_shift <= "000000000" & x_in(30 downto 9);
when "011" =>


x_shift <= "00000000000000000" & x_in(30 downto 17);
when others =>

x_shift <= "000000000000000000000" & x_in(30 downto 21);


end behavioral;

Shift block 1.c


entity shift_y1_b1 is
port ( x_in : in word_length_y;


end shift_y1_b1;

architecture behavioral of shift_y1_b1 is


begin



x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32) & (x_in(32 downto 2));


& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32) & (x_in(32 downto 10));


& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32) & (x_in(32 downto 26));

when others =>
x_shift <= word_length_zero & "00000";


end behavioral;

Shift block 1.d





x_shift : out word_length_y );
end shift_y2_b1;


begin


x_shift <= '0' & x_in(32 downto 1);
when "001" =>


end behavioral;

Shift block 2.a






begin




x_shift <= x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 6));
when "001" =>


x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 14));


& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 22));


& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 14));


& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 18));
when others =>


x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30)
& x_in(30) & x_in(30) & x_in(30) & x_in(30) & (x_in(30 downto 22));


end behavioral;

Shift block 2.b







begin








when others =>
x_shift <= "00000000000000000000000" & x_in(30 downto 23);
end case;

end process;

end behavioral;

Shift block 2.c





x_shift : out std_logic_vector(34 downto 0) );
end shift_y1_b2;


begin



x_shift <= x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32) & (x_in(32 downto 6));
when "001" =>


x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32) & (x_in(32 downto 14));


& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32) & (x_in(32 downto 22));


& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32) & (x_in(32 downto 14));


& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32)
& x_in(32) & x_in(32) & x_in(32) & x_in(32) & (x_in(32 downto 22));


end behavioral;

Shift block 2.d






begin










end process ;

end behavioral;


Z path block 1


entity z_path1 is
port ( z_in : in word_length_z;


end z_path1;


component z_path_shift1
port ( shift_select : in std_logic_vector(2 downto 0);

tan_d : in std_logic_vector(30 downto 0);
tan_shift : out std_logic_vector(30 downto 0) );

end component;

component z_path_lut1
port ( table_select : in std_logic_vector(3 downto 0);

tan_d : out std_logic_vector(30 downto 0) );
end component;

signal tan_add : word_length_z;
signal carry : std_logic;
signal table_select : std_logic_vector(3 downto 0);
signal shift_select : std_logic_vector(2 downto 0);
signal tan_d, tan_shift : std_logic_vector(30 downto 0);

begin

table_select <= i & sigma(0);

atan_table : z_path_lut1
port map(table_select, tan_d);

shift_select <= "111" when sigma(2) = '1' else
"110" when opmode = "01" else
i;

shift : z_path_shift1
port map(shift_select, tan_d, tan_shift);


process (tan_shift, sigma)
begin

if sigma(1) = '0' then
tan_add <= "11" & not (tan_shift);
carry <= '1';

else
tan_add <= "00" & tan_shift;
carry <= '0';


z_out <= signed(z_in) + signed(tan_add) + carry;
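The add/subtract step of the z path can be checked directly. Write $t$ for the 31-bit value `tan_shift`, assumed to be a non-negative table value, $\|$ for concatenation and an overline for bitwise inversion. With the 33-bit `tan_add` (the width implied by the assignment `"11" & not (tan_shift)`) and the carry-in, the output is, modulo $2^{33}$:

$$
z_{\mathrm{out}} \equiv
\begin{cases}
z_{\mathrm{in}} + (11 \,\|\, \overline{t}) + 1 = z_{\mathrm{in}} + \overline{(00 \,\|\, t)} + 1 \equiv z_{\mathrm{in}} - t, & \sigma(1) = 0,\\
z_{\mathrm{in}} + (00 \,\|\, t) = z_{\mathrm{in}} + t, & \sigma(1) = 1,
\end{cases}
\pmod{2^{33}}
$$

since $\overline{x} + 1 \equiv -x \pmod{2^{33}}$ for any 33-bit word $x$. The inversion and the carry-in therefore negate the table value in two's complement without a separate subtractor.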


Z path block 2


entity z_path2 is
port ( z_in : in word_length_z;


end z_path2;


end component;


signal tan_add : word_length_z;
signal carry : std_logic;
signal table_select : std_logic_vector(3 downto 0);


signal shift_select : std_logic_vector(2 downto 0);
signal tan_d, tan_shift : std_logic_vector(30 downto 0);

begin

table_select <= i & sigma(0);


shift_select <= "111" when sigma(2) = '1' else
"110" when opmode = "01" else
i;


if sigma(1) = '0' then
tan_add <= "11" & not (tan_shift);
carry <= '1';

else
tan_add <= "00" & tan_shift;
carry <= '0';


z_out <= signed(z_in) + signed(tan_add) + carry;


Z path shift block 1


entity z_path_shift1 is
port ( shift_select : in std_logic_vector(2 downto 0);


end z_path_shift1;

architecture behavioral of z_path_shift1 is


begin

process (shift_select, tan_d)
begin

case shift_select is
when "000" =>

tan_shift <= "00" & tan_d(30 downto 2);
when "001" =>


tan_shift <= "0000000000" & tan_d(30 downto 10);
when "011" =>


tan_shift <= "000000000000000000" & tan_d(30 downto 18);

when "101" =>
tan_shift <= "0000000000000000000000" & tan_d(30 downto 22);
when "110" =>

tan_shift <= tan_d;
when others =>

tan_shift <= '0' & word_length_zero;
end case;

end process;

end behavioral;







begin





tan_shift <= "00000000" & tan_d(30 downto 8);
when "010" =>


tan_shift <= "000000000000000000000000" & tan_d(30 downto 24);

when "110" =>
tan_shift <= tan_d;

when others =>
tan_shift <= '0' & word_length_zero;


end behavioral;

Z path table block 1


entity z_path_lut1 is
port ( table_select : in std_logic_vector(3 downto 0);

tan_d : out std_logic_vector(30 downto 0) );
end z_path_lut1;

architecture behavioral of z_path_lut1 is

begin

process (table_select)
begin

case table_select is
when "0000" =>

tan_d <= tan_table_1_1(30 downto 0);
when "0001" =>


tan_d <= tan_table_1_9(30 downto 0);
when others =>

tan_d <= tan_table_2_9(30 downto 0);
end case;

end process;

end behavioral;






begin




tan_d <= tan_table_2_2(30 downto 0);


when "0010" =>
tan_d <= tan_table_1_4(30 downto 0);


when others =>
tan_d <= tan_table_2_8(30 downto 0);


end behavioral;

Extra shift block 1.a


entity extra_shift_x1 is
port ( x_in : in std_logic_vector(32 downto 0);


end extra_shift_x1;

architecture behavioral of extra_shift_x1 is

begin

process (sigma, x_in)
begin

if sigma(2) = '1' then
x_out <= word_length_zero & '0';

elsif sigma(0) = '0' then
x_out <= not (x_in(32 downto 2));

else
x_out <= not (x_in(30 downto 0));


end behavioral;

Extra shift block 1.b


entity extra_shift_y1 is
port ( y_in : in std_logic_vector(34 downto 0);


end extra_shift_y1;

architecture behavioral of extra_shift_y1 is

begin

process (sigma, y_in)
begin

if sigma(2) = '1' then
y_out <= word_length_zero & "000";

elsif sigma(0) = '0' then
y_out <= not (y_in(34 downto 2));

else
y_out <= not (y_in(32 downto 0));


end behavioral;



entity extra_shift_y2 is
port ( y_in : in word_length_y;





begin


if sigma(0) = '0' then
y_out <= y_in;

else
y_out <= y_in(31 downto 0) & '0';


end behavioral;



entity extra_shift_y2 is
port ( y_in : in word_length_y;


begin


if sigma(0) = '0' then
y_out <= y_in;

else
y_out <= y_in(31 downto 0) & '0';


end behavioral;

Multiplexer x path



entity mux_x is
port ( y_in1 : in word_length_y;


end mux_x;

architecture behavioral of mux_x is

begin

process (sigma, y_in2, y_in1)
begin

if sigma(2) = '1' then
x_out <= word_length_zero & '0';
carry <= '0';

elsif sigma(1) = '0' then -- if one then negative
x_out <= y_in1(30 downto 0);
carry <= '1';

else
x_out <= y_in2(30 downto 0);
carry <= '0';


end behavioral;

Multiplexer y path


entity mux_y is
port ( x_in1 : in word_length_y;


end mux_y;

architecture behavioral of mux_y is

begin

process (sigma, x_in2, x_in1)
begin



elsif sigma(1) = '1' then -- if one then negative
y_out <= x_in1;

else
y_out <= x_in2;

end if;
end process;
end behavioral;

Multiplexer selection function


entity mux_selection is
port ( y_in : in word_length_y;


end mux_selection;

architecture behavioral of mux_selection is

begin

process (opmode, z_in, y_in)
begin

if opmode = "01" then
selection_in <= z_in;

else
selection_in <= y_in;


end behavioral;

Operation mode shifter x path


entity opmode_x is


port ( x_in1 : in word_length_x;
x_in2 : in word_length_x;
opmode : in std_logic_vector(1 downto 0);
x_out : out word_length_y );

end opmode_x;

architecture behavioral of opmode_x is

begin

process (opmode, x_in1, x_in2)
begin

if opmode = "01" then -- if zero then no shift
x_out <= "00" & x_in1;

else
x_out <= '0' & x_in2 & '0';


Operation mode shifter y path


entity opmode_y is
port ( y_in1 : in word_length_y;


end opmode_y;

architecture behavioral of opmode_y is

begin

process (opmode, y_in1, y_in2)
begin

if opmode = "01" then
y_out <= y_in1;

else
y_out <= y_in2(33 downto 1);



Scaling table


entity scaling_table is
port ( d : in std_logic_vector(4 downto 0);

x_out : out word_length_x );
end scaling_table;

architecture behavioral of scaling_table is

begin

process (d) is
begin

case d is
when "00000" =>
x_out <= "0100000000000000000000000000000";
when "00001" =>
x_out <= "0011111111111100000000000011111";
when "00010" =>
x_out <= "0011111111110000000000111111111";
when "00100" =>
x_out <= "0011111111000000001111111100000";
when "00101" =>
x_out <= "0011111110111100010000111111101";
when "00110" =>
x_out <= "0011111110110000010100111010101";
when "01000" =>
x_out <= "0011111100000011111100000011111";
when "01001" =>
x_out <= "0011111100000000000000000011111";
when "01010" =>
x_out <= "0011111011110100001100110011001";
when "10000" =>
x_out <= "0011110000111100001111000011110";
when "10001" =>
x_out <= "0011110000111000011110001011001";
when "10010" =>
x_out <= "0011110000101101001100001110111";
when "10100" =>
x_out <= "0011110000000000001110111111111";
when "10101" =>
x_out <= "0011101111111100011111000011011";
when "10110" =>
x_out <= "0011101111110001001111111010111";
when "11000" =>
x_out <= "0011101101001111000000000011101";
when "11001" =>
x_out <= "0011101101001011010010111000010";
when others =>
x_out <= "0011101101000000001100000010111";


end behavioral;

Initialization


entity init is
port ( z_in : in word_length_z;

d : out std_logic_vector(4 downto 0) );
end init;

architecture behavioral of init is

signal mux_1_carry, mux_2_carry : std_logic;
signal cpa_0_out, mux_0, z_0, z_1, mux_1, cpa_1_out, z_2,
mux_2, cpa_2_out : word_length_z;

begin

z_0 <= z_in(32) & z_in(29 downto 0) & "00";

process (z_0)
begin

if z_0(32 downto 29) = "0000" then
mux_1 <= "000" & word_length_zero;
mux_1_carry <= '0';
d(4) <= '0';

else
mux_1 <= not (tan_table_1_1);
mux_1_carry <= '1';
d(4) <= '1';



cpa_1_out <= unsigned(z_0) + unsigned(mux_1) + mux_1_carry;
z_1 <= cpa_1_out(32) & cpa_1_out(29 downto 0) & "00";


if z_1(32) = '0' then
if z_1(32 downto 29) = "0000" then

mux_2 <= "000" & word_length_zero;
mux_2_carry <= '0';
d(3 downto 2) <= "00";

elsif z_1(32 downto 29) = "0010" or z_1(32 downto 29) = "0001" then
mux_2 <= not (tan_table_1_2);
mux_2_carry <= '1';
d(3 downto 2) <= "01";

else
mux_2 <= not (tan_table_2_2);
mux_2_carry <= '1';
d(3 downto 2) <= "10";

end if;
else

if z_1(32 downto 29) = "1111" then
mux_2 <= "000" & word_length_zero;
mux_2_carry <= '0';
d(3 downto 2) <= "00";

elsif z_1(32 downto 29) = "1101" or z_1(32 downto 29) = "1110" then

mux_2 <= tan_table_1_2;
mux_2_carry <= '0';
d(3 downto 2) <= "01";

else
mux_2 <= tan_table_2_2;
mux_2_carry <= '0';
d(3 downto 2) <= "10";

end if;
end if;

end process;

cpa_2_out <= unsigned(z_1) + unsigned(mux_2) + mux_2_carry;
z_2 <= cpa_2_out(32) & cpa_2_out(29 downto 0) & "00";


if z_2(32) = '0' then


if z_2(32 downto 29) = "0000" then
d(1 downto 0) <= "00";

elsif z_2(32 downto 29) = "0010" or z_2(32 downto 29) = "0001" then

d(1 downto 0) <= "01";
else

d(1 downto 0) <= "10";
end if;

else
if z_2(32 downto 29) = "1111" then

d(1 downto 0) <= "00";
elsif z_2(32 downto 29) = "1101" or z_2(32 downto 29) = "1110" then
d(1 downto 0) <= "01";

else
d(1 downto 0) <= "10";

end if;
end if;

end process;

end behavioral;

Selection function


entity selection is
port ( selection_in : in word_length_z;


end selection;

architecture behavioral of selection is
signal z : word_length_z;
signal sigma_z, sigma_y : std_logic_vector(2 downto 0);
signal a1, a2, temp_x : std_logic_vector(7 downto 0);
signal temp_add, x_add : std_logic_vector(7 downto 0);

begin

temp_x <= "00" & x_in;
temp_add <= (unsigned('0' & x_in & '0') + unsigned("00" & x_in));


x_add <= not (temp_add);
a1 <= unsigned(z(32 downto 25)) + unsigned((x_add));
a2 <= unsigned(z(32 downto 25)) + unsigned("11" & (not (x_in)));

sigma_y(2) <= a1(7) and a2(7);
sigma_y(1) <= not (selection_in(32));
sigma_y(0) <= not (a1(7));

sigma_z(2) <= (not (z(31) nor z(30))) nor z(29);
sigma_z(1) <= selection_in(32);
sigma_z(0) <= (z(30) xnor z(29)) or z(31);

process (selection_in)
begin

if selection_in(32) = '1' then
z <= not (selection_in);

else
z <= (selection_in);


sigma <= sigma_z when opmode = "01" else sigma_y;

end behavioral;

D.4 Radix-8 D-CORDIC

VHDL files for the implementation of the optimized radix-8 D-CORDIC algorithm. The architecture is described in section 3.5, beginning on page 50. The hierarchy of the files is listed below.

• Top level


∗ Shift block 1.a

∗ Shift block 1.b

∗ Shift block 1.c

∗ Shift block 1.d

∗ Z path 1










∗ Extra shift block 1.c

∗ Extra shift block 1.d

∗ CPA

∗ CPA 22-bit

∗ CPA 26-bit

∗ Inverter

– Initialization rotation mode

– Initialization vectoring mode

– Operation mode shifter x path

– Operation mode shifter y path

– Last iteration block

– Register

– Multiplexer

– Gray controller

Used VHDL types


subtype word_length is std_logic_vector(29 downto 0); -- first bit has the value 1

subtype word_length_x is std_logic_vector(30 downto 0);
subtype word_length_y is std_logic_vector(34 downto 0);
subtype word_length_z is std_logic_vector(34 downto 0);

-- constants
constant word_length_zero : word_length := X"0000000" & "00";
constant word_length_one : word_length := X"fffffff" & "11";

constant x_vec1 : std_logic_vector(30 downto 0) := "0011111100000000000000000000000";




constant y_vec1 : std_logic_vector(34 downto 0) := "11111000000000000000000000000000000";



constant tan_table_1_1 : std_logic_vector(32 downto 0) := "000111111101010110111010100110101";
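The constants above spell out every bit of `word_length_zero` and `word_length_one`, so they must be edited by hand whenever `word_length` changes width. A hedged alternative uses `(others => ...)` aggregates, which take their width from the subtype; the package name `cordic_types_sketch` is hypothetical, and only the subtypes and the two constants from the listing are reproduced.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; the subtypes mirror the listing above.
package cordic_types_sketch is
  subtype word_length   is std_logic_vector(29 downto 0);
  subtype word_length_x is std_logic_vector(30 downto 0);
  subtype word_length_y is std_logic_vector(34 downto 0);
  subtype word_length_z is std_logic_vector(34 downto 0);

  -- The aggregates fill every bit of the subtype, so these constants
  -- stay correct if word_length is resized.
  constant word_length_zero : word_length := (others => '0');
  constant word_length_one  : word_length := (others => '1');
end package cordic_types_sketch;
```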


































Top level



reset : in std_logic;
z_in : in std_logic_vector(31 downto 0);
enable : in std_logic;
opmode : in std_logic_vector(1 downto 0);
result : out std_logic_vector(29 downto 0) );




component controller
port ( clk : in std_logic;

reset : in std_logic;
opmode : in std_logic_vector(1 downto 0);
sigma_1 : in std_logic_vector(3 downto 0);
i_out : out std_logic_vector(2 downto 0);
scale_factor : out std_logic_vector(3 downto 0) );

end component;


reset : in std_logic;
enable : in std_logic;
x_in : in word_length_x;
y_in : in std_logic_vector(31 downto 0);
z_in : in std_logic_vector(31 downto 0);
sigma_in : in std_logic_vector(3 downto 0);
x_out : out word_length_x;
y_out : out std_logic_vector(31 downto 0);
z_out : out std_logic_vector(31 downto 0);
sigma_out : out std_logic_vector(3 downto 0) );

end component;

component mux1
port ( x_rot : in word_length_x;

y_rot : in word_length_y;
z_rot : in word_length_z;
sigma_rot : in std_logic_vector(3 downto 0);
x_vec : in word_length_x;
y_vec : in word_length_y;
z_vec : in word_length_z;
sigma_vec : in std_logic_vector(3 downto 0);
x_iterative : in word_length_x;
y_iterative : in word_length_y;
z_iterative : in word_length_z;
sigma_iterative : in std_logic_vector(3 downto 0);
i : in std_logic_vector(2 downto 0);
opmode : in std_logic_vector(1 downto 0);
x : out word_length_x;
y : out word_length_y;
z : out word_length_z;
sigma : out std_logic_vector(3 downto 0) );

end component;

component radix8_block


port ( x_in : in word_length_x;
y_in : in word_length_y;
z_in : in word_length_z;
sigma : in std_logic_vector(3 downto 0);
i : in std_logic_vector(2 downto 0);
opmode : in std_logic_vector(1 downto 0);
scale_factor : in std_logic_vector(3 downto 0);
x_out : out word_length_x;
y_out : out word_length_y;
z_out : out word_length_z;
next_sigma : out std_logic_vector(3 downto 0);
d1_out : in std_logic_vector(10 downto 0);
d3_out : in std_logic_vector(10 downto 0);
d5_out : in std_logic_vector(10 downto 0);
d7_out : in std_logic_vector(10 downto 0) );

end component;

component init_rot
port ( z_in : in std_logic_vector(31 downto 0);

z_out : out word_length_z;
d : out std_logic_vector(5 downto 0) );

end component;

component scaling_table
port ( d : in std_logic_vector(5 downto 0);

x_out : out std_logic_vector(30 downto 0);
y_out : out std_logic_vector(30 downto 0) );

end component;

component init_vec
port ( y_in : in std_logic_vector(31 downto 0);

x_out : out word_length_x;
y_out : out word_length_y;
z_out : out word_length_z;
d1_out : out std_logic_vector(10 downto 0);
d3_out : out std_logic_vector(10 downto 0);
d5_out : out std_logic_vector(10 downto 0);
d7_out : out std_logic_vector(10 downto 0);
sigma : out std_logic_vector(3 downto 0) );

end component;

component last_iteration
port ( x_in : in word_length_x;

y_in : in word_length_y;
z_in : in word_length_y;
i : in std_logic_vector(2 downto 0);
opmode : in std_logic_vector(1 downto 0);


sigma_in : in std_logic_vector(3 downto 0);
x_out : out std_logic_vector(29 downto 0) );

end component;

component shift_mode
port ( y_in : in std_logic_vector(31 downto 0);


end component;

component shift_modez
port ( y_in : in std_logic_vector(31 downto 0);

opmode : in std_logic_vector(1 downto 0);
y_shift : out word_length_z );

end component;

signal scale_factor, sigma_1, sigma_mux, sigma_out1, sigma_reg : std_logic_vector(3 downto 0);

signal z_out_1, z_mux, z_in2, z_1, z_vec, z_rot, y_reg_shift, z_reg_shift : word_length_z;

signal d : std_logic_vector(5 downto 0);
signal i : std_logic_vector(2 downto 0);
signal x_scaling, y_scaling, x_out_1, x_reg, x_mux, x_vec, x_rot : word_length_x;
signal y_out_1, y_mux, y_vec, y_rot, latch_in : word_length_y;
signal y_reg, z_reg, z_enable_rot, z_enable_vec, z_in3 : std_logic_vector(31 downto 0);
signal d1, d3, d5, d7 : std_logic_vector(10 downto 0);
signal opmode_inv : std_logic_vector(1 downto 0);
signal sigma_vec, sigma_rot : std_logic_vector(3 downto 0);
signal enable_rot, enable_vec : std_logic;

begin

enable_rot <= '1' when opmode(0) = '1' else '0';
enable_vec <= '1' when opmode(1) = '1' else '0';
opmode_inv <= opmode(0) & opmode(1);

process (enable_rot, z_in)
begin
for i in 0 to 31 loop
z_enable_rot(i) <= z_in(i) and enable_rot;
end loop;
end process;

process (enable_vec, z_in)
begin
for i in 0 to 31 loop
z_enable_vec(i) <= z_in(i) and enable_vec;
end loop;
end process;

y_rot <= "0000" & y_scaling;
x_rot <= x_scaling;
z_rot <= z_1;
sigma_rot(3) <= d(2);
sigma_rot(2) <= z_1(34);
sigma_rot(1 downto 0) <= d(1 downto 0);

init_rotation : init_rot
port map(z_enable_rot, z_1, d);

init_vectoring : init_vec
port map(z_enable_vec, x_vec, y_vec, z_vec, d1, d3, d5, d7, sigma_vec);

lut : scaling_table
port map(d, x_scaling, y_scaling);

Gray_controller : controller
port map(clk, reset, opmode, sigma_reg, i, scale_factor);

mux : mux1
port map(x_rot, y_rot, z_rot, sigma_rot, x_vec, y_vec, z_vec,
sigma_vec, x_reg, y_reg_shift, z_reg_shift, sigma_reg, i, opmode,
x_mux, y_mux, z_mux, sigma_mux);

block1 : radix8_block
port map(x_mux, y_mux, z_mux, sigma_mux, i, opmode,
scale_factor, x_out_1, y_out_1, z_out_1, sigma_out1, d1, d3, d5, d7);

reg_xyz : reg
port map(clk, reset, enable, x_out_1, y_out_1(31 downto 0),
z_out_1(31 downto 0), sigma_out1, x_reg, y_reg, z_reg, sigma_reg);

shift_rot : shift_mode
port map(z_reg, opmode_inv, z_reg_shift);

shift_vec : shift_mode
port map(y_reg, opmode, y_reg_shift);


out_selection : last_iteration
port map(x_out_1, y_out_1, z_out_1, i, opmode, sigma_out1,
result);

end behavioral;
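The two processes in the top level AND every bit of `z_in` with `enable_rot` or `enable_vec`, presumably so that the initialization block of the unused mode sees constant zero inputs and does not switch. A hedged sketch of the same gating as a single concurrent assignment; the entity `operand_gate_sketch` is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-alone version of the operand gating in the top level.
entity operand_gate_sketch is
  port ( z_in   : in  std_logic_vector(31 downto 0);
         enable : in  std_logic;
         z_out  : out std_logic_vector(31 downto 0) );
end operand_gate_sketch;

architecture behavioral of operand_gate_sketch is
begin
  -- AND every bit of z_in with the same enable bit (VHDL-93 compatible):
  -- the aggregate builds a vector with z_in's range, filled with enable.
  z_out <= z_in and (z_in'range => enable);
end behavioral;
```

In VHDL-2008, which defines logical operators between a vector and a scalar, the assignment can be shortened to `z_out <= z_in and enable;`.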



entity radix8_block is
port ( x_in : in word_length_x;

y_in : in word_length_y;
z_in : in word_length_z;
sigma : in std_logic_vector(3 downto 0);
i : in std_logic_vector(2 downto 0);
opmode : in std_logic_vector(1 downto 0);
scale_factor : in std_logic_vector(3 downto 0);
x_out : out word_length_x;
y_out : out word_length_y;
z_out : out word_length_z;
next_sigma : out std_logic_vector(3 downto 0);
d1_out : in std_logic_vector(10 downto 0);
d3_out : in std_logic_vector(10 downto 0);
d5_out : in std_logic_vector(10 downto 0);
d7_out : in std_logic_vector(10 downto 0) );

end radix8_block;

architecture behavioral of radix8_block is



end component ;



end component ;




end component ;


i : in std_logic_vector(2 downto 0);
opmode : in std_logic_vector(1 downto 0);
x_shift : out std_logic_vector(34 downto 0) );

end component ;

component z_path
port ( z_in : in word_length_z;


end component ;


opmode : in std_logic_vector(1 downto 0);
sigma : out std_logic_vector(3 downto 0);
d1_out : in std_logic_vector(10 downto 0);
d3_out : in std_logic_vector(10 downto 0);
d5_out : in std_logic_vector(10 downto 0);
d7_out : in std_logic_vector(10 downto 0) );

end component ;


sigma : in std_logic_vector(3 downto 0);
x_add1 : out std_logic_vector(22 downto 0);
x_add2 : out std_logic_vector(22 downto 0) );

end component ;


sigma : in std_logic_vector(3 downto 0);
y_add1 : out std_logic_vector(26 downto 0);
y_add2 : out std_logic_vector(26 downto 0) );

end component ;


sigma : in std_logic_vector(3 downto 0);
x_out : out word_length_x;


carry : out std_logic );
end component;



end component ;

component opmode_x
port ( x_in1 : in std_logic_vector(32 downto 0);

x_in2 : in word_length_x;
opmode : in std_logic_vector(1 downto 0);
i : in std_logic_vector(2 downto 0);
x_out : out word_length_y );

end component ;

component opmode_y
port ( y_in1 : in std_logic_vector(35 downto 0);

y_in2 : in std_logic_vector(26 downto 0);
opmode : in std_logic_vector(1 downto 0);
i : in std_logic_vector(2 downto 0);
y_out : out word_length_y );

end component ;


sigma : in std_logic_vector(3 downto 0);
x_add1 : out word_length_y;
x_add2 : out word_length_y );

end component ;

component extra_shift_y2
port ( y_in : in word_length_x;

sigma : in std_logic_vector(3 downto 0);
y_add1 : out word_length_x;
y_add2 : out word_length_x );

end component ;



end component ;

component cpa_x_22bit


port ( x : in std_logic_vector(22 downto 0);
y : in std_logic_vector(22 downto 0);
carry : in std_logic;
s : out std_logic_vector(22 downto 0) );

end component ;



end component ;

component cpa_y_26bit
port ( x : in std_logic_vector(26 downto 0);

y : in std_logic_vector(26 downto 0);
carry : in std_logic;
s : out std_logic_vector(26 downto 0) );

end component ;



end component ;

signal extra_shift_x1_out, cpa_x2_out, extra_shift_y2_out : word_length_x;

signal shift_x2_out : std_logic_vector(32 downto 0);
signal x_out2, mux_x_out, cpa_y2_in1, cpa_y2_in2 : word_length_x;
signal shift_x1_out : std_logic_vector(22 downto 0);
signal opmode_x_out, extra_shift_x2_out, x_invert : word_length_y;
signal y_invert : word_length_y;
signal extra_shift_y1_out, cpa_y2_out, y_out2, cpa_x2_in1, cpa_x2_in2 : word_length_y;
signal mux_y_out, opmode_y_out, y_out_shift : word_length_y;
signal sigma_cpa1, inv_sigma, sigma_cpa1_inv : std_logic;
signal z_out2, selection_in, z_out_shift : word_length_z;
signal shift_y1_out, cpa_y26_out : std_logic_vector(26 downto 0);
signal opmode_inv : std_logic_vector(1 downto 0);
signal shift_y2_out : std_logic_vector(35 downto 0);


signal zero : std_logic;
signal scale_select : std_logic_vector(3 downto 0);
signal cpa_y_in1, cpa_y_in2 : std_logic_vector(26 downto 0);
signal cpa_x_in1, cpa_x_in2, cpa_x18_out : std_logic_vector(22 downto 0);
signal csa_x3_out1, csa_x3_out2, csa_x1_out1, csa_x1_out2,
csa_x2_out1, csa_x2_out2, cpa_x_in2_extended, cpa_x_in1_extended : word_length_x;

signal csa_y1_out1, csa_y1_out2, csa_y2_out1, csa_y2_out2 : word_length_y;

begin

s c a l e s e l e c t <= sigma when i = ”000” elsesigma when i = ”001” elsesigma when i = ”011” elsesigma when i = ”010” elses c a l e f a c t o r ;

inv s igma <= not ( s c a l e s e l e c t (3 ) ) ;ze ro <= ’ 0 ’ ;


  shift_x2 : shift_x2_b2
    port map (x_in, i, opmode, opmode_x_out);

  extra_shift_x1 : extra_shift_x1
    port map (shift_x1_out, scale_select, cpa_x_in1, cpa_x_in2);

  extra_shift_x1_out <= cpa_x18_out(22) & cpa_x18_out(22) & cpa_x18_out(22) & cpa_x18_out(22) &
                        cpa_x18_out(22) & cpa_x18_out(22) & cpa_x18_out(22) & cpa_x18_out(22) &
                        cpa_x18_out;

  cpa_x3 : cpa_x_22bit
    port map (cpa_x_in1, cpa_x_in2, zero, cpa_x18_out);

  cpa_x2 : cpa_x
    port map (extra_shift_y2_out, x_in, inv_sigma, cpa_x2_out);

  cpa_y4 : cpa_x
    port map (cpa_y2_in1, cpa_y2_in2, zero, extra_shift_y2_out);

  cpa_x1 : cpa_x
    port map (cpa_x2_out, extra_shift_x1_out, sigma_cpa1, x_out2);

  extra_shift_x2 : extra_shift_x2
    port map (mux_y_out, sigma, cpa_x2_in1, cpa_x2_in2);

  mux_x1 : mux_x
    port map (opmode_y_out, sigma, mux_x_out, sigma_cpa1);

  x_out <= x_out2;

  -- y path
  shift_y1 : shift_y1_b2
    port map (y_in, i, shift_y1_out);

  extra_shift_y1 : extra_shift_y1
    port map (shift_y1_out, scale_select, cpa_y_in1, cpa_y_in2);

  cpa_x4 : cpa_y
    port map (cpa_x2_in1, cpa_x2_in2, zero, extra_shift_x2_out);

  extra_shift_y1_out <= cpa_y26_out(26) & cpa_y26_out(26) & cpa_y26_out(26) & cpa_y26_out(26) &
                        cpa_y26_out(26) & cpa_y26_out(26) & cpa_y26_out(26) & cpa_y26_out(26) &
                        cpa_y26_out;

  cpa_y3 : cpa_y_26bit
    port map (cpa_y_in1, cpa_y_in2, zero, cpa_y26_out);

  cpa_y1 : cpa_y
    port map (extra_shift_x2_out, y_in, inv_sigma, cpa_y2_out);

  cpa_y2 : cpa_y
    port map (cpa_y2_out, extra_shift_y1_out, sigma_cpa1_inv, y_out2);

  sigma_cpa1_inv <= (sigma_cpa1);

  opmode_mux_y : opmode_y
    port map (shift_y2_out, shift_y1_out, opmode, i, opmode_y_out);

  extra_shift_y2 : extra_shift_y2
    port map (mux_x_out, sigma, cpa_y2_in1, cpa_y2_in2);

  mux_y1 : mux_y
    port map (opmode_x_out, sigma, mux_y_out);

  y_out <= y_out2;

  -- z path
  z : z_path
    port map (z_in, i, sigma, opmode, z_out2);

  opmode_inv <= opmode(0) & opmode(1);
  z_out      <= z_out2;

  -- selection
  sel : selection
    port map (selection_in, opmode, next_sigma, d1_out, d3_out, d5_out, d7_out);

  mux_sel : mux_selection
    port map (y_out2, z_out2, opmode, selection_in);

end behavioral;
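The listing above builds extra_shift_x1_out by concatenating bit 22 of cpa_x18_out eight times in front of the 23-bit value. extra_shift_y1_out does the same thing with bit 26 of cpa_y26_out. Both are sign extensions, to 31 and 35 bits respectively. The sketch below shows a more compact way to express the x-path case. It assumes the IEEE numeric_std package is available. The entity sign_extend_23_to_31 and its ports are hypothetical and are not part of the thesis design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper (not from the thesis): sign-extends a 23-bit two's
-- complement value to 31 bits, equivalent to prepending eight copies of bit 22.
entity sign_extend_23_to_31 is
  port (d_in  : in  std_logic_vector(22 downto 0);
        d_out : out std_logic_vector(30 downto 0));
end sign_extend_23_to_31;

architecture behavioral of sign_extend_23_to_31 is
begin
  -- resize on a signed operand replicates the sign bit into the new MSBs.
  d_out <= std_logic_vector(resize(signed(d_in), 31));
end behavioral;
```

Writing the extension with resize keeps the target width in one place. Whether a given synthesis tool produces the same netlist as the explicit concatenation is an assumption worth checking.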

Gray controller


entity controller is
  port (clk          : in  std_logic;
        reset        : in  std_logic;
        opmode       : in  std_logic_vector(1 downto 0);
        sigma_1      : in  std_logic_vector(3 downto 0);
        i_out        : out std_logic_vector(2 downto 0);
        scale_factor : out std_logic_vector(3 downto 0));
end controller;

architecture behavioral of controller is

  signal next_i, i                    : std_logic_vector(2 downto 0);
  signal sigma_4_save, sigma_3_save   : std_logic_vector(2 downto 0);
  signal aa, sigma_3_reg, sigma_4_reg : std_logic_vector(2 downto 0);

begin

  i_out <= i;

  process (opmode, i, reset, sigma_4_reg, sigma_3_reg, sigma_1)
  begin


      if reset = '1' and opmode = "01" then
        next_i <= "001";
      elsif reset = '1' and opmode = "10" then
        next_i <= "001";
      else
        next_i <= "000";
      end if;
      sigma_4_save <= sigma_4_reg;
      sigma_3_save <= sigma_3_reg;
      aa           <= "100";

    when "001" =>
      next_i       <= "011";
      sigma_3_save <= sigma_1(3) & sigma_1(1 downto 0);
      sigma_4_save <= sigma_4_reg;
      aa           <= "100";

    when "011" =>
      next_i       <= "010";
      sigma_3_save <= sigma_3_reg;
      sigma_4_save <= sigma_1(3) & sigma_1(1 downto 0);
      aa           <= "100";

    when "010" =>
      next_i       <= "110";
      sigma_4_save <= sigma_4_reg;
      sigma_3_save <= sigma_3_reg;
      aa           <= "100";

    when "110" =>
      next_i       <= "100";
      sigma_4_save <= sigma_4_reg;
      sigma_3_save <= sigma_3_reg;
      aa           <= sigma_3_reg;

    when others =>
      next_i       <= "000";
      sigma_4_save <= sigma_4_reg;
      sigma_3_save <= sigma_3_reg;
      aa           <= sigma_4_reg;


  scale_factor(3) <= aa(2) or opmode(1);
  scale_factor(2) <= '0';  -- sign bit
  scale_factor(1) <= aa(1) and opmode(0);
  scale_factor(0) <= aa(0) and opmode(0);


    if reset = '0' then
      i           <= "000";
      sigma_3_reg <= "100";
      sigma_4_reg <= "100";
    elsif clk'event and clk = '1' then
      i           <= next_i;
      sigma_4_reg <= sigma_4_save;
      sigma_3_reg <= sigma_3_save;

end behavioral;
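The controller above steps the iteration index i through 000, 001, 011, 010, 110, 100 and then back to 000. Each clock edge therefore toggles exactly one bit of the i register. This is presumably why it is called a Gray controller: fewer register transitions mean less switching activity. The stand-alone sketch below isolates only that counter and drops the sigma bookkeeping and the scale-factor logic. The entity name gray_iteration_counter and the start input are hypothetical. start stands in for the controller's test `reset = '1' and (opmode = "01" or opmode = "10")`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch (not from the thesis): only the Gray-coded iteration
-- counter of the controller above.
entity gray_iteration_counter is
  port (clk   : in  std_logic;
        reset : in  std_logic;                      -- active low, as in the controller
        start : in  std_logic;                      -- assumed start condition
        count : out std_logic_vector(2 downto 0));
end gray_iteration_counter;

architecture behavioral of gray_iteration_counter is
  signal i, next_i : std_logic_vector(2 downto 0);
begin
  count <= i;

  -- Next-state logic: 000 -> 001 -> 011 -> 010 -> 110 -> 100 -> 000.
  -- Every transition flips exactly one bit of i.
  process (i, start)
  begin
    case i is
      when "000" =>
        if start = '1' then
          next_i <= "001";
        else
          next_i <= "000";
        end if;
      when "001"  => next_i <= "011";
      when "011"  => next_i <= "010";
      when "010"  => next_i <= "110";
      when "110"  => next_i <= "100";
      when others => next_i <= "000";
    end case;
  end process;

  -- State register with asynchronous active-low reset.
  process (clk, reset)
  begin
    if reset = '0' then
      i <= "000";
    elsif rising_edge(clk) then
      i <= next_i;
    end if;
  end process;
end behavioral;
```

A binary counter over the same six states would toggle two or three bits on some transitions, for example 011 to 100. The Gray sequence avoids this at no extra hardware cost.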

CPA x


entity cpa_x is
  port (x : in word_length_x;

end cpa_x;

architecture behavioral of cpa_x is

begin

end behavioral;

CPA y



entity cpa_y is
  port (x : in word_length_y;

end cpa_y;

architecture behavioral of cpa_y is

begin

end behavioral;

CPA x 22-bit


entity cpa_x_22bit is
  port (x : in std_logic_vector(22 downto 0);

end cpa_x_22bit;

architecture behavioral of cpa_x_22bit is

begin

end behavioral;

CPA y 26-bit


entity cpa_y_26bit is
  port (x : in std_logic_vector(26 downto 0);

end cpa_y_26bit;

architecture behavioral of cpa_y_26bit is

begin

end behavioral;

Inverter x


entity inverter_x is
  port (x_in  : in  std_logic_vector(34 downto 0);
        x_out : out std_logic_vector(34 downto 0));
end inverter_x;

architecture behavioral of inverter_x is

begin

end behavioral;

Inverter y


entity inverter_y is
  port (x_in  : in  word_length_y;
        x_out : out word_length_y);
end inverter_y;

architecture behavioral of inverter_y is

begin

end behavioral;

Multiplexer


entity mux1 is
  port (x_rot           : in  word_length_x;
        y_rot           : in  word_length_y;
        z_rot           : in  word_length_z;
        sigma_rot       : in  std_logic_vector(3 downto 0);
        x_vec           : in  word_length_x;
        y_vec           : in  word_length_y;
        z_vec           : in  word_length_z;
        sigma_vec       : in  std_logic_vector(3 downto 0);
        x_iterative     : in  word_length_x;
        y_iterative     : in  word_length_y;
        z_iterative     : in  word_length_z;
        sigma_iterative : in  std_logic_vector(3 downto 0);
        i               : in  std_logic_vector(2 downto 0);
        opmode          : in  std_logic_vector(1 downto 0);
        x               : out word_length_x;
        y               : out word_length_y;
        z               : out word_length_z;
        sigma           : out std_logic_vector(3 downto 0));
end mux1;

  signal sel : std_logic_vector(1 downto 0);

begin

  sel <= opmode when i = "000" else "00";

  process (x_rot, y_rot, z_rot, sigma_rot, x_vec, y_vec, z_vec,
           sigma_vec, x_iterative, y_iterative, z_iterative, sel,
           sigma_iterative) is
  begin
    if sel = "00" then
      x     <= x_iterative;
      y     <= y_iterative;
      z     <= z_iterative;
      sigma <= sigma_iterative;
    elsif sel = "01" then
      x     <= x_rot;
      y     <= y_rot;
      z     <= z_rot;
      sigma <= sigma_rot;
    else
      x     <= x_vec;
      y     <= y_vec;
      z     <= z_vec;
      sigma <= sigma_vec;

end behavioral;

Register



        reset     : in  std_logic;
        enable    : in  std_logic;
        x_in      : in  word_length_x;
        y_in      : in  std_logic_vector(31 downto 0);
        z_in      : in  std_logic_vector(31 downto 0);
        sigma_in  : in  std_logic_vector(3 downto 0);
        x_out     : out word_length_x;
        y_out     : out std_logic_vector(31 downto 0);
        z_out     : out std_logic_vector(31 downto 0);
        sigma_out : out std_logic_vector(3 downto 0));
end reg;

begin

    if reset = '0' then
      x_out     <= '0' & word_length_zero;
      y_out     <= "00" & word_length_zero;
      z_out     <= "00" & word_length_zero;
      sigma_out <= "0000";
    elsif clk'event and clk = '1' then
      if enable = '1' then
        x_out     <= x_in;
        y_out     <= y_in;
        z_out     <= z_in;
        sigma_out <= sigma_in;
      end if;
    end if;
  end process;

end behavioral;

Shift block 1.a





architecture behavioral of shift_x1_b2 is
  signal sel : std_logic_vector(1 downto 0);
begin

  sel <= "00" when i = "000" else
         "10" when i = "010" else
         "01" when i = "110" else
         "01" when i = "001" else
         "11";

  process (x_in, sel) is
  begin
    case sel is
      when "00" =>
        x_shift <= x_in(30 downto 8);
      when "10" =>
        x_shift <= "000000000000000000" & x_in(30 downto 26);
      when "01" =>
        x_shift <= "000000" & x_in(30 downto 14);
      when others =>
        x_shift <= "000000000000" & x_in(30 downto 20);
    end case;
  end process;

end behavioral;

Shift block 1.b



        i       : in  std_logic_vector(2 downto 0);
        opmode  : in  std_logic_vector(1 downto 0);
        x_shift : out std_logic_vector(34 downto 0));

begin

  sel(2) <= opmode(1) or i(2);
  sel(1) <= opmode(1) or i(1);
  sel(0) <= opmode(1) or i(0);

        x_shift <= "0000000000000000000" & x_in(30 downto 15);
      when "111" =>
        x_shift <= '0' & x_in & "000";
      when others =>

end behavioral;

Shift block 1.c







begin

  sel <= "00" when i = "000" else
         "10" when i = "010" else
         "01" when i = "110" else
         "01" when i = "001" else
         "11";

        x_shift(26 downto 0) <= x_in(34 downto 8);
      when "10" =>
        -- 18 sign bits + 9 data bits = 27 bits
        x_shift(26 downto 0) <= x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) &
                                x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) &
                                x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) &
                                x_in(34 downto 26);
      when "01" =>
        x_shift(26 downto 0) <= x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) &
                                x_in(34 downto 14);
      when others =>
        x_shift(26 downto 0) <= x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) &
                                x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) & x_in(34) &
                                x_in(34 downto 20);
    end case;
  end process;

end behavioral;

Shift block 1.d






begin

        x_shift <= "00000000" & x_in(34 downto 7);
      when "011" =>

    end case;
  end process;

end behavioral;

Z path block 1


entity z_path is
  port (z_in : in word_length_z;

end z_path;

architecture behavioral_block of z_path is

end component;

  signal tan_add, tan_shift, tan_d : std_logic_vector(31 downto 0);
  signal table_select : std_logic_vector(4 downto 0);
  signal shift_select : std_logic_vector(2 downto 0);

begin

  table_select <= i & sigma(1 downto 0);
  shift_select <= "111" when sigma(3) = '1' else
                  "101" when opmode = "01" else
                  i;

    if sigma(2) = '0' then
      tan_add <= not (tan_shift);
    else
      tan_add <= tan_shift;

  z_out <= unsigned(z_in) + unsigned(tan_add(31) & tan_add(31) & tan_add(31) & tan_add) + not (sigma(2));








begin

        tan_shift <= "000000000000000" & tan_d(31 downto 15);
      when "110" =>

        tan_shift <= tan_d;
      when others =>
        tan_shift <= "00" & word_length_zero;
    end case;
  end process;

end behavioral;






begin

        tan_d <= tan_table_33(31 downto 0);

      when others =>
        tan_d <= tan_table_47(31 downto 0);

end behavioral;



entity extra_shift_x1 is
  port (x_in   : in  std_logic_vector(22 downto 0);
        sigma  : in  std_logic_vector(3 downto 0);
        x_add1 : out std_logic_vector(22 downto 0);
        x_add2 : out std_logic_vector(22 downto 0));

begin

    if sigma(3) = '1' then
      x_add1 <= X"00000" & "000";
    elsif sigma(1 downto 0) = "11" then
      x_add1 <= not (x_in(22 downto 0));
    elsif sigma(1 downto 0) = "01" then
      x_add1 <= not ("00" & x_in(22 downto 2));
    else
      x_add1 <= not ("0000" & x_in(22 downto 4));

    if sigma(3) = '0' and sigma(1 downto 0) = "10" then
      x_add2 <= not ('0' & x_in(22 downto 1));
    else
      x_add2 <= X"00000" & "000";

end behavioral;






begin

    if sigma(3) = '1' then
      y_add1 <= X"000000" & "000";
    elsif sigma(1 downto 0) = "11" then
      y_add1 <= not (y_in(26 downto 0));
    elsif sigma(1 downto 0) = "01" then
      y_add1 <= not (y_in(26) & y_in(26) & y_in(26 downto 2));
    else
      y_add1 <= not (y_in(26) & y_in(26) & y_in(26) & y_in(26) & y_in(26 downto 4));

    if sigma(3) = '0' and sigma(1 downto 0) = "10" then
      y_add2 <= not (y_in(26) & y_in(26 downto 1));
    else
      y_add2 <= X"000000" & "000";

end behavioral;

Extra shift block 1.c


entity extra_shift_x2 is
  port (x_in   : in  word_length_y;
        sigma  : in  std_logic_vector(3 downto 0);
        x_add1 : out word_length_y;
        x_add2 : out word_length_y);

begin

    if sigma(1 downto 0) = "01" then
      x_add1 <= x_in(34) & x_in(34 downto 1);
    elsif sigma(1 downto 0) = "11" then
      x_add1 <= x_in(34 downto 0);
    else
      x_add1 <= x_in(34) & x_in(34) & x_in(34 downto 2);

    if sigma(1 downto 0) = "10" then
      x_add2 <= x_in(34) & x_in(34 downto 1);
    else
      x_add2 <= X"00000000" & "000";

end behavioral;

Extra shift block 1.d






begin

    if sigma(3) = '1' then
      y_add1 <= X"000000" & "000";
    elsif sigma(1 downto 0) = "11" then
      y_add1 <= not (y_in(26 downto 0));
    elsif sigma(1 downto 0) = "01" then
      y_add1 <= not (y_in(26) & y_in(26) & y_in(26 downto 2));
    else
      y_add1 <= not (y_in(26) & y_in(26) & y_in(26) & y_in(26) & y_in(26 downto 4));

    if sigma(3) = '0' and sigma(1 downto 0) = "10" then
      y_add2 <= not (y_in(26) & y_in(26 downto 1));
    else
      y_add2 <= X"000000" & "000";

end behavioral;

Multiplexer x path


entity mux_x is
  port (y_in1 : in  word_length_y;
        sigma : in  std_logic_vector(3 downto 0);
        x_out : out word_length_x;
        carry : out std_logic);
end mux_x;

architecture behavioral of mux_x is
begin

  process (sigma, y_in1)
  begin
    if sigma(3) = '1' then
      x_out <= word_length_zero & '0';
      carry <= '0';
    elsif sigma(2) = '0' then  -- if one then negative
      x_out <= not (y_in1(30 downto 0));
      carry <= '1';
    else
      x_out <= y_in1(30 downto 0);
      carry <= '0';

end behavioral;

Multiplexer y path


entity mux_y is
  port (x_in1 : in word_length_y;

end mux_y;

architecture behavioral of mux_y is
begin

  process (sigma, x_in1)
  begin

    elsif sigma(2) = '1' then
      y_out <= not (x_in1);
    else
      y_out <= x_in1;

end behavioral;

Multiplexer selection function


entity mux_selection is
  port (y_in         : in  word_length_z;
        z_in         : in  word_length_z;
        opmode       : in  std_logic_vector(1 downto 0);
        selection_in : out word_length_z);
end mux_selection;

architecture behavioral of mux_selection is
begin

  process (opmode, z_in, y_in)
  begin
    if opmode = "01" then
      selection_in <= z_in;
    else
      selection_in <= y_in;

end behavioral;

Operation mode shifter x path


entity opmode_x is
  port (x_in1  : in  std_logic_vector(32 downto 0);
        x_in2  : in  word_length_x;
        opmode : in  std_logic_vector(1 downto 0);
        i      : in  std_logic_vector(2 downto 0);
        x_out  : out word_length_y);
end opmode_x;

architecture behavioral of opmode_x is
begin

  process (opmode, x_in1, x_in2)
  begin
    if opmode = "01" then
      x_out <= "00" & x_in1;
    else
      x_out <= '0' & x_in2 & "000";

end behavioral;

Operation mode shifter y path


entity opmode_y is
  port (y_in1  : in  std_logic_vector(35 downto 0);
        y_in2  : in  std_logic_vector(26 downto 0);
        opmode : in  std_logic_vector(1 downto 0);
        i      : in  std_logic_vector(2 downto 0);
        y_out  : out word_length_y);
end opmode_y;

architecture behavioral of opmode_y is
begin

  process (opmode, y_in1, y_in2, i)
  begin
    if opmode = "01" then
      y_out <= y_in1(35 downto 1);
    elsif i = "100" or i = "101" then
      y_out <= X"00000000" & "000";
    else
      y_out <= y_in2(26) & y_in2(26) & y_in2(26) & y_in2(26) & y_in2(26) & y_in2(26) &
               y_in2(26) & y_in2(26) & y_in2(26) & y_in2(26) & y_in2(26) &
               y_in2(26 downto 3);
    end if;
  end process;

end behavioral;

Scaling table


entity scaling_table is
  port (d     : in  std_logic_vector(5 downto 0);
        x_out : out std_logic_vector(30 downto 0);
        y_out : out std_logic_vector(30 downto 0));
end scaling_table;

architecture behavioral of scaling_table is
begin

  process (d) is
  begin
    case d is
      when "100100" =>
        x_out <= "0100000000000000000000000000000";
        y_out <= "0000000000000000000000000000000";
      when "100000" =>
        x_out <= "0011111111111100000000000011111";
        y_out <= "0000000000000000000000000000000";
      when "100001" =>
        x_out <= "0011111111110000000000111111111";
        y_out <= "0000000000000000000000000000000";
      when "100010" =>
        x_out <= "0011111111011100000101000011010";
        y_out <= "0000000000000000000000000000000";
      when "100011" =>
        x_out <= "0011111111000000001111111100000";
        y_out <= "0000000000000000000000000000000";
      when "000100" =>
        x_out <= "0011111000000111111000000111111";
        y_out <= "0000111111000000111111000000111";
      when "000000" =>
        x_out <= "0011111000000100000000000011110";
        y_out <= "0000111111000000000000000000111";
      when "000001" =>
        x_out <= "0011110111111000011000100110010";
        y_out <= "0000111110111101000011001100110";
      when "000010" =>
        x_out <= "0011110111100101000011111010010";
        y_out <= "0000111110111000001001000111101";
      when "000011" =>
        x_out <= "0011110111001010000101100110011";
        y_out <= "0000111110110001010010101100010";
      when "001100" =>
        x_out <= "0011100001111000011110000111100";
        y_out <= "0001111000011110000111100001111";
      when "001000" =>
        x_out <= "0011100001110100111100010010100";
        y_out <= "0001111000011100001111000101100";
      when "001001" =>
        x_out <= "0011100001101010010111011110000";
        y_out <= "0001111000010110100110000111011";
      when "001010" =>
        x_out <= "0011100001011000110001101000100";
        y_out <= "0001111000001101001101101010111";
      when "001011" =>
        x_out <= "0011100001000000001110000011111";
        y_out <= "0001111000000000000111011111111";
      when "010100" =>
        x_out <= "0011000000111000000111000000111";
        y_out <= "0010101000010101000010101000010";
      when "010000" =>
        x_out <= "0011000000110101000110001011101";
        y_out <= "0010101000010010011010010101111";
      when "010001" =>
        x_out <= "0011000000101100000100010000100";
        y_out <= "0010101000001010100001111110001";
      when "010010" =>
        x_out <= "0011000000011101000010111011011";
        y_out <= "0010100111111101011010111111100";
      when others =>
        x_out <= "0011000000001000000100111111100";
        y_out <= "0010100111101011000111110110010";

end behavioral;

Initialization rotation mode


entity init_rot is
  port (z_in  : in  std_logic_vector(31 downto 0);
        z_out : out word_length_z;
        d     : out std_logic_vector(5 downto 0));
end init_rot;

architecture behavioral of init_rot is
  signal mux_1_carry                : std_logic;
  signal z_0, z_1, mux_1, cpa_1_out : word_length_z;
begin

  z_0 <= z_in(31 downto 0) & "000";

    if z_0(33 downto 29) = "00000" then
      mux_1         <= "000" & X"00000000";
      mux_1_carry   <= '0';
      d(5 downto 3) <= "100";
    elsif z_0(33 downto 29) = "00001" or z_0(33 downto 29) = "00010" then
      mux_1         <= "11" & not (tan_table_11);
      mux_1_carry   <= '1';
      d(5 downto 3) <= "000";
    elsif z_0(33 downto 29) = "00011" or z_0(33 downto 29) = "00100" then
      mux_1         <= "11" & not (tan_table_21);
      mux_1_carry   <= '1';
      d(5 downto 3) <= "001";
    else
      mux_1         <= "11" & not (tan_table_31);
      mux_1_carry   <= '1';
      d(5 downto 3) <= "010";

  cpa_1_out <= unsigned(z_0(34 downto 1) & mux_1_carry) + unsigned(mux_1);

  z_1   <= cpa_1_out(32) & cpa_1_out(30 downto 0) & "000";
  z_out <= z_1;

    if z_1(34) = '0' then
      if z_1(33 downto 29) = "00000" then
        d(2 downto 0) <= "100";
      elsif z_1(33 downto 29) = "00001" or z_1(33 downto 29) = "00010" then
        d(2 downto 0) <= "000";

        29) = "00110" then
        d(2 downto 0) <= "010";
      else
        d(2 downto 0) <= "011";
      end if;
    else
      if z_1(33 downto 29) = "11111" then
        d(2 downto 0) <= "100";

        d(2 downto 0) <= "000";

        29) = "11001" then
        d(2 downto 0) <= "010";
      else
        d(2 downto 0) <= "011";
      end if;
    end if;
  end process;

end behavioral;

Initialization vectoring mode


entity init_vec is
  port (y_in   : in  std_logic_vector(31 downto 0);
        x_out  : out word_length_x;
        y_out  : out word_length_y;
        z_out  : out word_length_z;
        d1_out : out std_logic_vector(10 downto 0);
        d3_out : out std_logic_vector(10 downto 0);
        d5_out : out std_logic_vector(10 downto 0);
        d7_out : out std_logic_vector(10 downto 0);
        sigma  : out std_logic_vector(3 downto 0));
end init_vec;

architecture behavioral of init_vec is
  signal d_0                        : std_logic_vector(3 downto 0);
  signal d1, d3, d5, d7             : std_logic_vector(10 downto 0);
  signal y_inv                      : std_logic_vector(9 downto 0);
  signal sel_1, sel_3, sel_5, sel_7 : std_logic_vector(10 downto 0);
  signal x_sel                      : word_length_x;
  signal x_add1                     : std_logic_vector(29 downto 0);
  signal x_add2                     : std_logic_vector(6 downto 0);
  signal c                          : std_logic;
  signal y_add1, y_add2, y_sel      : word_length_z;
begin

  process (y_in)
  begin
    if y_in(28 downto 26) = "000" then
      d_0   <= "1000";
      z_out <= X"00000000" & "000";
    elsif y_in(28 downto 26) = "001" or y_in(28 downto 26) = "010" then
      d_0   <= "0000";
      z_out <= "00000" & tan_table_11(32 downto 3);
    elsif y_in(28 downto 26) = "011" or y_in(28 downto 26) = "100" or y_in(28 downto 25) = "1010" then
      d_0   <= "0001";
      z_out <= "00000" & tan_table_21(32 downto 3);
    else
      d_0   <= "0010";
      z_out <= "00000" & tan_table_31(32 downto 3);

  process (d_0, y_in)
  begin
    if d_0(3) = '1' then
      x_add1 <= "100000000000000000000000000000";
      x_add2 <= "0000000";
    elsif d_0(1 downto 0) = "00" then
      x_add1 <= '0' & y_in(30 downto 2);
      x_add2 <= x_vec1(29 downto 23);
    elsif d_0(1 downto 0) = "01" then
      x_add1 <= y_in(30 downto 1);
      x_add2 <= x_vec2(29 downto 23);
    else
      x_add1 <= unsigned(y_in(30 downto 1)) + unsigned('0' & y_in(30 downto 2));
      x_add2 <= x_vec3(29 downto 23);

  x_sel <= '0' & (unsigned(x_add1(29 downto 23)) + unsigned(x_add2)) & x_add1(22 downto 0);

  process (d_0, y_in)
  begin
    if d_0(3) = '1' then
      y_add2 <= X"00000000" & "000";
      c      <= '0';
    elsif d_0(1 downto 0) = "00" then
      y_add2 <= "111101" & not (y_in(31 downto 3));
      c      <= '1';
    elsif d_0(1 downto 0) = "01" then
      y_add2 <= "1110" & not (y_in(31 downto 1));
      c      <= '1';
    else
      y_add2 <= unsigned("111001" & not (y_in(31 downto 3)))
                + unsigned(not ("000" & y_in(31 downto 0)));
      c      <= '1';

  y_add1 <= y_in(31 downto 0) & "00" & c;
  y_sel  <= unsigned(y_add1) + unsigned(y_add2);
  y_out  <= y_sel(31 downto 0) & "000";
  x_out  <= x_sel;

  d1 <= not ("0000" & x_sel(29 downto 23));
  d3 <= not ("00" & ((unsigned('0' & x_sel(29 downto 23) & '1')
                      + unsigned("00" & x_sel(29 downto 23)))));
  d5 <= not ('0' & ((unsigned('0' & x_sel(29 downto 23) & "01")
                     + unsigned("000" & x_sel(29 downto 23)))));
  d7 <= not ((unsigned('0' & x_sel(29 downto 23) & "010")
              + unsigned(not ("0000" & x_sel(29 downto 23)))));

  d1_out <= d1;
  d3_out <= d3;
  d5_out <= d5;
  d7_out <= d7;

  sel_1 <= unsigned(y_inv) + unsigned(d1) + '1';
  sel_3 <= unsigned(y_inv) + unsigned(d3);
  sel_5 <= unsigned(y_inv) + unsigned(d5);
  sel_7 <= unsigned(y_inv) + unsigned(d7);

  sigma(3)          <= sel_1(10);
  sigma(2)          <= not (y_sel(34));
  sigma(1 downto 0) <= "11" when sel_7(10) = '0' else
                       "10" when sel_5(10) = '0' else
                       "01" when sel_3(10) = '0' else
                       "00";

  process (y_sel)
  begin
    if y_sel(34) = '1' then
      y_inv <= not (y_sel(29 downto 20));
    else
      y_inv <= (y_sel(29 downto 20));

end behavioral;

Selection function


entity selection is
  port (selection_in : in  word_length_z;
        opmode       : in  std_logic_vector(1 downto 0);
        sigma        : out std_logic_vector(3 downto 0);
        d1_out       : in  std_logic_vector(10 downto 0);
        d3_out       : in  std_logic_vector(10 downto 0);
        d5_out       : in  std_logic_vector(10 downto 0);
        d7_out       : in  std_logic_vector(10 downto 0));
end selection;

architecture behavioral of selection is
  signal z                          : word_length_z;
  signal sigma_z, sigma_y           : std_logic_vector(3 downto 0);
  signal sel_1, sel_3, sel_5, sel_7 : std_logic_vector(10 downto 0);
begin

  sel_1 <= unsigned(z(29 downto 20)) + unsigned(d1_out) + '1';
  sel_3 <= unsigned(z(29 downto 20)) + unsigned(d3_out);
  sel_5 <= unsigned(z(29 downto 20)) + unsigned(d5_out);
  sel_7 <= unsigned(z(29 downto 20)) + unsigned(d7_out);

  sigma_y(3)          <= sel_1(10);
  sigma_y(2)          <= not (selection_in(31));
  sigma_y(1 downto 0) <= "11" when sel_7(10) = '0' else
                         "10" when sel_5(10) = '0' else
                         "01" when sel_3(10) = '0' else
                         "00";

  sigma_z(2) <= selection_in(31);

  process (z)
  begin
    if z(30 downto 26) = "00000" then
      sigma_z(3)          <= '1';
      sigma_z(1 downto 0) <= "00";
    elsif z(30 downto 26) = "00001" or z(30 downto 26) = "00010" then

    else
      sigma_z(3)          <= '0';
      sigma_z(1 downto 0) <= "11";

  process (selection_in)
  begin
    if selection_in(31) = '1' then
      z <= not (selection_in);
    else
      z <= (selection_in);

  sigma <= sigma_z when opmode = "01" else sigma_y;

end behavioral;

Last iteration block


entity last_iteration is
  port (x_in     : in  word_length_x;
        y_in     : in  word_length_y;
        z_in     : in  word_length_y;
        i        : in  std_logic_vector(2 downto 0);
        opmode   : in  std_logic_vector(1 downto 0);

        sigma_in : in  std_logic_vector(3 downto 0);
        x_out    : out std_logic_vector(29 downto 0));
end last_iteration;

architecture behavioral of last_iteration is
  signal z_invert, iter_8_2, xy_invert, xy_extra, z_lut, iter_8,
         iter_8_mux, iter_8_out : word_length_y;
  signal sigma_8                : std_logic_vector(3 downto 0);
  signal carry_out, enable      : std_logic;
  signal x_and_gate             : word_length_x;
  signal y_and_gate, z_and_gate : word_length_y;
begin

  enable  <= '1' when i = "100" else '0';
  sigma_8 <= sigma_in when i = "100" else "1000";

  process (enable, x_in)
  begin
    for i in 0 to 30 loop
      x_and_gate(i) <= x_in(i) and enable;

  process (enable, y_in)
  begin
    for i in 0 to 34 loop
      y_and_gate(i) <= y_in(i) and enable;

  process (enable, z_in)
  begin
    for i in 0 to 34 loop
      z_and_gate(i) <= z_in(i) and enable;

  process (x_and_gate, y_and_gate, z_and_gate, opmode)
  begin
    if opmode = "01" then
      iter_8   <= "0000" & x_and_gate;
      iter_8_2 <= y_and_gate;
    elsif opmode = "11" then
      iter_8   <= y_and_gate;
      iter_8_2 <= "0000" & x_and_gate;
    elsif opmode = "10" then
      iter_8   <= z_and_gate;
      iter_8_2 <= X"00000000" & "000";
    else
      iter_8   <= X"00000000" & "000";
      iter_8_2 <= X"00000000" & "000";

  iter_8_out <= unsigned(iter_8) + unsigned(iter_8_mux);
  x_out      <= iter_8_out(29 downto 0);

  process (sigma_8, opmode)
  begin
    if sigma_8(3) = '1' or opmode = "11" or opmode = "01" then
      z_lut <= X"00000000" & "000";
    elsif sigma_8(1 downto 0) = "00" then
      z_lut <= "00" & tan_table_18;

    else
      z_lut <= "00" & tan_table_48;

  process (sigma_8, z_lut)
  begin
    if sigma_8(2) = '0' then
      z_invert <= not (z_lut);
    else
      z_invert <= z_lut;

  process (xy_invert, z_invert, opmode)
  begin
    if opmode = "10" then
      iter_8_mux <= z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) &
                    z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) &
                    z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) &
                    z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) & z_invert(34) &
                    z_invert(34 downto 24);
    else
      iter_8_mux <= xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) &
                    xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) &
                    xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) &
                    xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) & xy_invert(34) &
                    xy_invert(34 downto 24);

  process (xy_extra, sigma_8, opmode)
  begin
    if sigma_8(2) = '1' and opmode = "01" then
      xy_invert <= xy_extra;
      carry_out <= '0';
    elsif sigma_8(2) = '0' and opmode = "11" then
      xy_invert <= xy_extra;
      carry_out <= '0';
    else
      xy_invert <= not (xy_extra);
      carry_out <= '1';

  process (iter_8_2, sigma_8)
  begin
    if sigma_8(3) = '1' then
      xy_extra <= X"00000000" & "000";
    elsif sigma_8(1 downto 0) = "00" then
      xy_extra <= iter_8_2(33 downto 0) & '0';
    elsif sigma_8(1 downto 0) = "01" then
      xy_extra <= iter_8_2(32 downto 0) & "00";
    elsif sigma_8(1 downto 0) = "10" then
      xy_extra <= unsigned(iter_8_2(33 downto 0) & '0') + unsigned(iter_8_2(32 downto 0) & "00");
    else
      xy_extra <= iter_8_2(31 downto 0) & "000";
    end if;
  end process;

end behavioral;


Shift y (multiplication by 8)


entity shift_mode is
  port (y_in : in std_logic_vector(31 downto 0);

end shift_mode;

architecture behavioral of shift_mode is
begin

  process (y_in, opmode)
  begin
    if opmode = "01" then
      y_shift <= "000" & y_in;
    else
      y_shift <= y_in & "000";

end behavioral;

Shift z (multiplication by 8)


entity shift_modez is
  port (y_in    : in  std_logic_vector(31 downto 0);
        opmode  : in  std_logic_vector(1 downto 0);
        y_shift : out word_length_z);
end shift_modez;

architecture behavioral of shift_modez is
begin

  process (y_in, opmode)
  begin
    if opmode = "01" then
      y_shift <= "000" & y_in;
    else
      y_shift <= y_in & "000";

end behavioral;


D.5 Scaling free

VHDL files for the implementation of the scaling-free algorithm. The architecture is described in Section 3.6, beginning on page 52. The relationship between the files is listed below.

• Top level
  – Scaling free block 1
    ∗ Shift block 1.a
    ∗ Shift block 1.b
    ∗ CPA
    ∗ Inverter
  – Scaling free block 2
    ∗ Shift block 2.a
    ∗ Shift block 2.b
    ∗ CPA
    ∗ Inverter
  – Scaling free block 3
    ∗ Shift block 3.a
    ∗ Shift block 3.b
    ∗ CPA
    ∗ Inverter
  – Scaling free block 4
    ∗ Shift block 4.a
    ∗ Shift block 4.b
    ∗ CPA
    ∗ Inverter
  – Register
  – Multiplexer
  – Controller
  – Z decoder
  – Lookup table

Used VHDL types



  subtype precision_out is std_logic_vector(15 downto 0);
  subtype precision_in  is std_logic_vector(15 downto 0);
  subtype word_length   is std_logic_vector(29 downto 0);

  -- constants
  constant word_length_zero : word_length := "000000000000000000000000000000";
  constant scale_factor     : unsigned(21 downto 0) := "0010011011011101001111";
end package cordic_types;

Top level






  component lut
    port (z : in  std_logic_vector(4 downto 0);
          x : out word_length;
          y : out word_length);
  end component;

  component mux1
    port (x1          : in  word_length;
          y1          : in  word_length;
          z1          : in  std_logic_vector(3 downto 0);
          x_iterative : in  word_length;
          y_iterative : in  word_length;
          z_iterative : in  std_logic_vector(3 downto 0);
          sel         : in  std_logic_vector(2 downto 0);
          x           : out word_length;
          y           : out word_length;
          z           : out std_logic_vector(3 downto 0));
  end component;


  component reg
    port (clk   : in  std_logic;
          reset : in  std_logic;
          x_in  : in  word_length;
          y_in  : in  word_length;
          z_in  : in  std_logic_vector(3 downto 0);
          x_out : out word_length;
          y_out : out word_length;
          z_out : out std_logic_vector(3 downto 0));
  end component;

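  component cordic_block1
    port (x_in   : in  word_length;
          y_in   : in  word_length;
          enable : in  std_logic;
          i      : in  std_logic_vector(2 downto 0);
          x_out  : out word_length;
          y_out  : out word_length);
  end component;

  component cordic_block2
    port (x_in   : in  word_length;
          y_in   : in  word_length;
          enable : in  std_logic;
          i      : in  std_logic_vector(2 downto 0);
          x_out  : out word_length;
          y_out  : out word_length);
  end component;

  component cordic_block3
    port (x_in   : in  word_length;
          y_in   : in  word_length;
          enable : in  std_logic;
          i      : in  std_logic_vector(2 downto 0);
          x_out  : out word_length;
          y_out  : out word_length);
  end component;

  component cordic_block4
    port (x_in   : in  word_length;
          y_in   : in  word_length;
          enable : in  std_logic;
          i      : in  std_logic_vector(2 downto 0);
          x_out  : out word_length;
          y_out  : out word_length);
  end component;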


  component counter
    port (clk       : in  std_logic;
          reset     : in  std_logic;
          z_decoded : in  std_logic_vector(19 downto 0);
          opmode    : in  std_logic_vector(1 downto 0);
          i_out     : out std_logic_vector(2 downto 0);
          z_out     : out std_logic_vector(3 downto 0));
  end component;

  component z_decode
    port (z_in  : in  std_logic_vector(3 downto 0);
          z_out : out std_logic_vector(7 downto 0));
  end component;

  signal z_decoded : std_logic_vector(23 downto 0);
  signal x1, y1, x_mux_out, y_mux_out, argument_reduction_out : word_length;
  signal y_reg, x_reg : word_length;
  signal x_block1_out, y_block1_out, x_block2_out, y_block2_out : word_length;
  signal x_block3_out, y_block3_out, x_block4_out, y_block4_out : word_length;
  signal count : std_logic_vector(2 downto 0);
  signal z_reg, z_mux_out, next_z : std_logic_vector(3 downto 0);

begin

  argument_reduction_out <= z_in(29 downto 0);
  z_decoded(15 downto 0) <= argument_reduction_out(19 downto 4);

  lookup_table : lut
    port map (argument_reduction_out(28 downto 24), x1, y1);

  z_decoder : z_decode
    port map (argument_reduction_out(23 downto 20), z_decoded(23 downto 16));

  multiplexer : mux1
    port map (x1, y1, z_decoded(23 downto 20), x_reg, y_reg, z_reg,
              count, x_mux_out, y_mux_out, z_mux_out);

  block1 : cordic_block1
    port map (x_mux_out, y_mux_out, z_mux_out(3), count,
              x_block1_out, y_block1_out);

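  block2 : cordic_block2
    port map (x_block1_out, y_block1_out, z_mux_out(2), count,
              x_block2_out, y_block2_out);

  block3 : cordic_block3
    port map (x_block2_out, y_block2_out, z_mux_out(1), count,
              x_block3_out, y_block3_out);

  block4 : cordic_block4
    port map (x_block3_out, y_block3_out, z_mux_out(0), count,
              x_block4_out, y_block4_out);

  register1 : reg
    port map (clk, reset, x_block4_out, y_block4_out, next_z,
              x_reg, y_reg, z_reg);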

  x_out <= x_block4_out;
  y_out <= y_block4_out;

  cycle_counter : counter
    port map (clk, reset, z_decoded(19 downto 0), opmode, count, next_z);

end behavioral;

Scaling free block 1



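entity cordic_block1 is
  port (x_in   : in  word_length;
        y_in   : in  word_length;
        enable : in  std_logic;
        i      : in  std_logic_vector(2 downto 0);
        x_out  : out word_length;
        y_out  : out word_length);
end cordic_block1;

architecture behavioral of cordic_block1 is

  component shift1
    port (x_in    : in  word_length;
          i       : in  std_logic_vector(2 downto 0);
          x_shift : out word_length);
  end component;

  component shift21
    port (x_in    : in  word_length;
          i       : in  std_logic_vector(2 downto 0);
          x_shift : out word_length);
  end component;

  component cpa
    port (x   : in  word_length;
          y   : in  word_length;
          neg : in  std_logic;
          s   : out word_length);
  end component;

  component inverter
    port (x_in  : in  word_length;
          x_out : out word_length);
  end component;

  signal i_out_latch : std_logic_vector(2 downto 0);
  signal zero, one : std_logic;
  signal x_shift_out, y_shift_out, x_shift2_out, y_shift2_out : word_length;
  signal x_cpa1_out, y_cpa1_out, inv_x_shift_out,
         inv_y_shift_out, inv_y_shift2_out : word_length;
  signal x_cpa2_out, y_cpa2_out, inv_x_shift2_out, x_out_latch,
         y_out_latch : word_length;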

begin

  zero <= '0';
  one  <= '1';

  shift_x : shift1
    port map (x_out_latch, i_out_latch, x_shift_out);

  shift_y : shift1
    port map (y_out_latch, i_out_latch, y_shift_out);

  shift2_x : shift21
    port map (x_out_latch, i_out_latch, x_shift2_out);

  shift2_y : shift21
    port map (y_out_latch, i_out_latch, y_shift2_out);

  cpax1 : cpa
    port map (x_out_latch, inv_x_shift2_out, one, x_cpa1_out);

  cpay1 : cpa
    port map (y_out_latch, inv_y_shift2_out, one, y_cpa1_out);

  cpax2 : cpa
    port map (x_cpa1_out, inv_y_shift_out, one, x_cpa2_out);

  cpay2 : cpa
    port map (y_cpa1_out, x_shift_out, zero, y_cpa2_out);

  invert_y_shift2 : inverter
    port map (y_shift2_out, inv_y_shift2_out);

  invert_x_shift2 : inverter
    port map (x_shift2_out, inv_x_shift2_out);

  invert_y_shift : inverter
    port map (y_shift_out, inv_y_shift_out);

  process (x_in, enable, x_cpa2_out)
  begin
    if enable = '1' then
      x_out <= x_cpa2_out;
    else
      x_out <= x_in;
    end if;
  end process;

  process (y_in, enable, y_cpa2_out)
  begin
    if enable = '1' then
      y_out <= y_cpa2_out;
    else
      y_out <= y_in;
    end if;
  end process;

  x_out_latch <= x_in;
  y_out_latch <= y_in;
  i_out_latch <= i;

end behavioral;
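Reading the port maps of scaling free block 1, and assuming inv_y_shift_out carries the inverted y_shift_out, an enabled block (enable = '1') computes the update below; with enable = '0' it passes x_in and y_in through unchanged. Here S_1 and S_2 stand for the right shifts applied by shift1 and shift21 for the current value of i, and each subtraction is an inverter followed by a cpa with neg = '1'. If S_1 scales by σ = 2^(-k) and S_2 by σ²/2 = 2^(-(2k+1)), the update is the second-order approximation cos σ ≈ 1 - σ²/2, sin σ ≈ σ. This interpretation is ours rather than a statement in the listing, although it fits the shift amounts shown: for i = "011", shift1 shifts by 14 bits and shift21 by 29 = 2·14 + 1 bits.

$$
\begin{aligned}
x_{\mathrm{out}} &= x_{\mathrm{in}} - S_2(x_{\mathrm{in}}) - S_1(y_{\mathrm{in}}) \approx \left(1 - \tfrac{\sigma^2}{2}\right) x_{\mathrm{in}} - \sigma\, y_{\mathrm{in}},\\
y_{\mathrm{out}} &= y_{\mathrm{in}} - S_2(y_{\mathrm{in}}) + S_1(x_{\mathrm{in}}) \approx \left(1 - \tfrac{\sigma^2}{2}\right) y_{\mathrm{in}} + \sigma\, x_{\mathrm{in}}.
\end{aligned}
$$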












Scaling free block 2

  end component;

  end component;

  component cpa
    port (x   : in  word_length;
          y   : in  word_length;
          neg : in  std_logic;
          s   : out word_length);
  end component;

begin

  zero <= '0';
  one  <= '1';

    else
      x_out <= x_in;

    else
      y_out <= y_in;

end behavioral;













Scaling free block 3

  end component;

  end component;

  end component;

begin

  zero <= '0';
  one  <= '1';

  shift_y : shift3
    port map (y_out_latch, i_out_latch, y_shift_out);

    else
      x_out <= x_in;

    else
      y_out <= y_in;

end behavioral;












Scaling free block 4

  end component;

  end component;

  end component;

begin

  zero <= '0';
  one  <= '1';

    else
      x_out <= x_in;

    else
      y_out <= y_in;

end behavioral;

Controller



entity counter is
  port (clk       : in  std_logic;
        reset     : in  std_logic;
        z_decoded : in  std_logic_vector(19 downto 0);
        opmode    : in  std_logic_vector(1 downto 0);
        i_out     : out std_logic_vector(2 downto 0);
        z_out     : out std_logic_vector(3 downto 0));
end counter;

architecture behavioral of counter is
  signal i, next_i : std_logic_vector(2 downto 0);
begin

  i_out <= i;

  process (i, reset, z_decoded, opmode)
  begin
    case i is
      when "000" =>
        if reset = '1' and opmode = "01" then
          next_i <= "001";
        else
          next_i <= "000";
        end if;
        z_out <= z_decoded(19 downto 16);
      when "001" =>
        next_i <= "010";
        z_out  <= z_decoded(15 downto 12);
      when "010" =>
        next_i <= "011";
        z_out  <= z_decoded(11 downto 8);
      when "011" =>
        next_i <= "100";
        z_out  <= z_decoded(7 downto 4);
      when "100" =>
        next_i <= "101";
        z_out  <= z_decoded(3 downto 0);
      when "101" =>
        next_i <= "000";
        z_out  <= "0000";
      when others =>
        next_i <= "000";
        z_out  <= "0000";
    end case;
  end process;

  process (clk, reset)
  begin
    if reset = '0' then
      i <= "000";
    elsif clk'event and clk = '1' then
      i <= next_i;
    end if;
  end process;

end behavioral;


CPA


entity cpa is
  port (x   : in  word_length;
        y   : in  word_length;
        neg : in  std_logic;
        s   : out word_length);
end cpa;

architecture behavioral of cpa is
  signal carry : word_length;
begin

  s <= unsigned(x) + unsigned(y) + neg;

end behavioral;

Inverter


entity inverter is
  port (x_in  : in  word_length;
        x_out : out word_length);
end inverter;

architecture behavioral of inverter is
begin

  x_out <= not x_in;

end behavioral;
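The cpa and inverter listings together implement two's-complement subtraction: a - b = a + (not b) + 1, where the + 1 comes from tying neg to '1'. The stand-alone testbench below is a hypothetical sketch (the entity tb_sub_by_inversion and the function cpa_model are illustrative names, not part of the thesis sources). It models cpa behaviorally with ieee.numeric_std and checks the identity on a 30-bit word for one positive result and one result that wraps around.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone check, not part of the thesis design:
-- a - b computed as a + (not b) + 1, i.e. an inverter on b feeding
-- a cpa whose neg input is tied to '1'.
entity tb_sub_by_inversion is
end entity tb_sub_by_inversion;

architecture sim of tb_sub_by_inversion is
  constant W : natural := 30;  -- width of the word_length subtype

  -- Behavioral model of cpa: s = x + y + neg, carry-out discarded
  function cpa_model(x, y : unsigned(W - 1 downto 0); neg : std_logic)
    return unsigned is
    variable cin : unsigned(W - 1 downto 0) := (others => '0');
  begin
    cin(0) := neg;
    return x + y + cin;
  end function cpa_model;
begin
  process
    variable a, b : unsigned(W - 1 downto 0);
  begin
    a := to_unsigned(1000, W);
    b := to_unsigned(24, W);
    -- inverter output is "not b"; the carry-in '1' completes the two's complement
    assert cpa_model(a, not b, '1') = a - b
      report "a + not b + 1 differs from a - b" severity error;
    -- a negative result wraps modulo 2**W, as in the 30-bit datapath
    assert cpa_model(b, not a, '1') = b - a
      report "wrap-around subtraction failed" severity error;
    report "subtraction-by-inversion checks finished" severity note;
    wait;
  end process;
end architecture sim;
```

In a VHDL-93 simulator only the final note should appear; an assertion failure would mean the identity does not hold for that operand pair.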

Multiplexer

library IEEE;
use IEEE.std_logic_1164.all;
use work.cordic_types.all;
use IEEE.std_logic_arith.all;

entity mux1 is
  port (x1          : in  word_length;
        y1          : in  word_length;
        z1          : in  std_logic_vector(3 downto 0);
        x_iterative : in  word_length;
        y_iterative : in  word_length;
        z_iterative : in  std_logic_vector(3 downto 0);
        sel         : in  std_logic_vector(2 downto 0);
        x           : out word_length;
        y           : out word_length;
        z           : out std_logic_vector(3 downto 0));
end mux1;

architecture behavioral of mux1 is
begin

  process (x1, y1, z1, x_iterative, y_iterative, z_iterative, sel) is
  begin
    if sel = "000" then
      x <= x1;
      y <= y1;
      z <= z1;
    else
      x <= x_iterative;
      y <= y_iterative;
      z <= z_iterative;
    end if;
  end process;

end behavioral;

Register



entity reg is
  port (clk   : in  std_logic;
        reset : in  std_logic;
        x_in  : in  word_length;
        y_in  : in  word_length;
        z_in  : in  std_logic_vector(3 downto 0);
        x_out : out word_length;
        y_out : out word_length;
        z_out : out std_logic_vector(3 downto 0));
end reg;

architecture behavioral of reg is
begin

  process (clk, reset)
  begin
    if reset = '0' then
      x_out <= word_length_zero;
      y_out <= word_length_zero;
      z_out <= "0000";
    elsif clk'event and clk = '1' then
      x_out <= x_in;
      y_out <= y_in;
      z_out <= z_in;
    end if;
  end process;

end behavioral;

Shift block 1.a




entity shift1 is
  port (x_in    : in  word_length;
        i       : in  std_logic_vector(2 downto 0);
        x_shift : out word_length);
end shift1;

architecture behavioral_block1 of shift1 is
begin

  process (x_in, i)
  begin
    case i is
      when "000" =>
        x_shift <= "00000000" & x_in(29 downto 8);
      when "001" =>
        x_shift <= "00000000" & x_in(29 downto 8);
      when "010" =>
        x_shift <= "0000000000" & x_in(29 downto 10);
      when "011" =>
        x_shift <= "00000000000000" & x_in(29 downto 14);
      when "100" =>
        x_shift <= "000000000000000000" & x_in(29 downto 18);
      when others =>
        x_shift <= word_length_zero;
    end case;
  end process;

end behavioral_block1;

Shift block 1.b




end shift21;

architecture behavioral_block1 of shift21 is
begin

      when "011" =>
        x_shift <= "00000000000000000000000000000" & x_in(29);
      when others =>
        x_shift <= word_length_zero;



Shift block 2.a




end shift2;

begin

  end process;



Shift block 2.b




end shift22;

begin

        x_shift <= "00000000000000000000000" & x_in(29 downto 23);




Shift block 3.a




end shift3;

begin

  end process;


Shift block 3.b




end shift23;

begin

        x_shift <= "0000000000000000000000000" & x_in(29 downto 25);




Shift block 4.a




end shift4;

begin

          downto 25);



Shift block 4.b




end shift24;

begin

      when "010" =>
        x_shift <= "000000000000000000000000000" & x_in(29 downto 27);
      when others =>
        x_shift <= word_length_zero;
    end case;

  end process;


Z decoder



entity z_decode is
  port (z_in  : in  std_logic_vector(3 downto 0);
        z_out : out std_logic_vector(7 downto 0));
end z_decode;

architecture behavioral of z_decode is
begin

  process (z_in)
  begin
    case z_in(3 downto 1) is
      when "000" =>
        z_out(7 downto 1) <= "0000" & "000";
      when "001" =>
        z_out(7 downto 1) <= "1000" & "000";
      when "010" =>
        z_out(7 downto 1) <= "1100" & "000";
      when "011" =>
        z_out(7 downto 1) <= "1110" & "000";
      when "100" =>
        z_out(7 downto 1) <= "1111" & "000";
      when "101" =>
        z_out(7 downto 1) <= "1111" & "100";
      when "110" =>
        z_out(7 downto 1) <= "1111" & "110";
      when others =>
        z_out(7 downto 1) <= "1111" & "111";
    end case;
    z_out(0) <= z_in(0);
  end process;

end behavioral;
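The case table in z_decode maps the 3-bit value k = z_in(3 downto 1) to a thermometer code on z_out(7 downto 1): the k most significant of those seven bits are set and the rest cleared, while z_out(0) simply copies z_in(0). As an illustrative cross-check, the hypothetical testbench below (entity tb_thermometer and function thermometer are not from the thesis) shows that a loop-based encoder reproduces all eight rows of the table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical cross-check: the z_decode case table equals a thermometer
-- code that sets the k most significant bits of z_out(7 downto 1),
-- where k is the unsigned value of z_in(3 downto 1).
entity tb_thermometer is
end entity tb_thermometer;

architecture sim of tb_thermometer is
  -- Loop-based thermometer encoder (an alternative formulation)
  function thermometer(k : natural range 0 to 7) return std_logic_vector is
    variable t : std_logic_vector(7 downto 1) := (others => '0');
  begin
    for j in 0 to 6 loop
      if j < k then
        t(7 - j) := '1';  -- fill from the MSB downwards
      end if;
    end loop;
    return t;
  end function thermometer;

  -- z_out(7 downto 1) values copied from the z_decode case statement
  type table_t is array (0 to 7) of std_logic_vector(7 downto 1);
  constant listing : table_t := ("0000000", "1000000", "1100000", "1110000",
                                 "1111000", "1111100", "1111110", "1111111");
begin
  process
  begin
    for k in 0 to 7 loop
      assert thermometer(k) = listing(k)
        report "mismatch for k = " & integer'image(k) severity error;
    end loop;
    report "thermometer checks finished" severity note;
    wait;
  end process;
end architecture sim;
```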

Lookup table


entity lut is
  port (z : in  std_logic_vector(4 downto 0);
        x : out word_length;
        y : out word_length);
end lut;

architecture behavioral of lut is
begin

  process (z) is
  begin
    case z is
      when "00000" =>
        x <= "100000000000000000000000000000";
        y <= "000000000000000000000000000000";
      when "00001" =>
        x <= "011111111111000000000000010101";
        y <= "000000111111111111010101010101";
      when "00010" =>
        x <= "011111111100000000000101010101";
        y <= "000001111111111010101010101110";
      when "00011" =>
        x <= "011111110111000000011010111111";
        y <= "000010111111101110000000100000";
      when "00100" =>
        x <= "011111110000000001010101010010";
        y <= "000011111111010101010111011101";
      when "00101" =>
        x <= "011111100111000011010000001010";
        y <= "000100111110101100110001001011";
      when "00110" =>
        x <= "011111011100000110101111011111";
        y <= "000101111101110000010000001011";
      when "00111" =>
        x <= "011111001111001100011111000011";
        y <= "000110111100011011111000010011";
      when "01000" =>
        x <= "011111000000010101010010011111";
        y <= "000111111010101011101110110101";
      when "01001" =>
        x <= "011110101111100010000101001111";
        y <= "001000111000011011111010110010";
      when "01010" =>
        x <= "011110011100110011111010100000";
        y <= "001001110101101000100101001011";
      when "01011" =>
        x <= "011110001000001011111101001001";
        y <= "001010110010001101111001001111";
      when "01100" =>
        x <= "011101110001101011011111101011";
        y <= "001011101110001000000100101010";
      when "01101" =>
        x <= "011101011001010011111100000111";
        y <= "001100101001010011010111110100";
      when "01110" =>
        x <= "011100111111000110110011111010";
        y <= "001101100011101100000110000010";
      when "01111" =>
        x <= "011100100011000101101111111000";
        y <= "001110011101001110100101110001";
      when "10000" =>
        x <= "011100000101010010100000000110";
        y <= "001111010101110111010000111010";
      when "10001" =>
        x <= "011011100101101110111011101111";
        y <= "010000001101100010100100111001";
      when "10010" =>
        x <= "011011000100011101000001000001";
        y <= "010001000100001101000011000100";
      when "10011" =>
        x <= "011010100001011110110101000011";
        y <= "010001111001110011010000110010";
      when "10100" =>
        x <= "011001111100110110100011101101";
        y <= "010010101110010001110111101010";
      when "10101" =>
        x <= "011001010110100110011111100000";
        y <= "010011100001100101100101110100";
      when "10110" =>
        x <= "011000101110110001000001011010";
        y <= "010100010011101011001110000001";
      when "10111" =>
        x <= "011000000101011000101000110001";
        y <= "010101000100011111100111111101";
      when "11000" =>
        x <= "010111011010011111111011000110";
        y <= "010101110011111111110000010110";
      when others =>
        x <= "010110101110001001100011111010";
        y <= "010110100010001000101001001110";
    end case;
  end process;

end behavioral;
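The constants in lut appear to be cos(k/32) and sin(k/32) for k = 0 to 24, stored as 30-bit unsigned values in which 2^29 represents 1.0; the others branch (z from "11001" to "11111") seems to hold the k = 25 entry, about 0.781 rad, just below π/4. This reading is an assumption, spot-checked by hand against a few entries, not a statement from the thesis; round-to-nearest is likewise assumed (truncation gives the same bits for the entries checked below). Under that assumption the table can be regenerated at elaboration time with ieee.math_real. The hypothetical testbench below (entity tb_lut_gen and function lut_entry are illustrative names) rebuilds entries 0 and 1 and compares them bit for bit with the listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

-- Hypothetical generator for the lut constants, ASSUMING entry k holds
-- round(2**29 * cos(k/32)) and round(2**29 * sin(k/32)) as 30-bit
-- unsigned values (2**29 representing 1.0).
entity tb_lut_gen is
end entity tb_lut_gen;

architecture sim of tb_lut_gen is
  constant W    : natural := 30;  -- word_length width
  constant FRAC : natural := 29;  -- assumed number of fractional bits

  function lut_entry(k : natural; want_sin : boolean) return std_logic_vector is
    constant angle : real := real(k) / 32.0;  -- assumed angle step of 1/32 rad
    variable v     : real;
  begin
    if want_sin then
      v := sin(angle);
    else
      v := cos(angle);
    end if;
    -- scale to fixed point, round to nearest, encode as 30-bit unsigned
    return std_logic_vector(to_unsigned(integer(round(v * 2.0 ** FRAC)), W));
  end function lut_entry;
begin
  process
  begin
    assert lut_entry(0, false) = "100000000000000000000000000000"
      report "cos entry 0 differs" severity error;
    assert lut_entry(0, true) = "000000000000000000000000000000"
      report "sin entry 0 differs" severity error;
    assert lut_entry(1, false) = "011111111111000000000000010101"
      report "cos entry 1 differs" severity error;
    assert lut_entry(1, true) = "000000111111111111010101010101"
      report "sin entry 1 differs" severity error;
    report "lut spot checks finished" severity note;
    wait;
  end process;
end architecture sim;
```

Looping lut_entry over k = 0 to 25 would be a quick way to re-check the remaining rows against the same assumption.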


Bibliography

[1] Ray Andraka. A survey of CORDIC algorithms for FPGA based computers. In Proceedings of the ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays (FPGA-98), pages 191–200, New York, February 22–24, 1998. ACM Press.

[2] E. Antelo, J.D. Bruguera, and E.L. Zapata. Unified mixed radix 2-4 redundant CORDIC processor. IEEE Transactions on Computers, 45(9):1068–1073, 1996.

[3] E. Antelo, J. Villalba, J.D. Bruguera, and E.L. Zapata. High performance rotation architectures based on the radix-4 CORDIC algorithm. IEEE Transactions on Computers, 46(8):855–870, 1997.

[4] J.D. Bruguera, E. Antelo, and E.L. Zapata. Design of a pipelined radix 4 CORDIC processor. Parallel Computing, 19(7):729–744, 1993.

[5] J.D. Bruguera, J. Villalba, E. Antelo, and E.L. Zapata. Radix-4 vectoring CORDIC algorithm and architectures. The Journal of VLSI Signal Processing, 19(2):127–147, 1998.

[6] Li-Ping Chu, Jia-Ching Wang, and Jhing-Fa Wang. VLSI architecture design for concatenative speech synthesizer. TENCON 2005, 2005 IEEE Region 10, pages 1–5, 2005.

[7] H. Dawid and H. Meyr. Chapter 24 - CORDIC Algorithms and Architectures.

[8] A.S. Dhar and S. Banerjee. An array architecture for fast computation of discrete Hartley transform. IEEE Transactions on Circuits and Systems, 38(9):1095–1098, 1991.

[9] M. D. Ercegovac and T. Lang. Digital Arithmetic. Morgan Kaufmann Publishers, 2004.

[10] J. C. Gomes, R. P. Nunes, D. Barone, and S. Bampi. Design of functional blocks for a speech recognition portable system. 2001.

330 BIBLIOGRAPHY

[11] Shen-Fu Hsiao and Jen-Yin Chen. Design, implementation and analysis of a new redundant CORDIC processor with constant scaling factor and regular structure. The Journal of VLSI Signal Processing, 20(3):267–278, 1998.

[12] Harold H. Kim and David M. Barrs. Hearing aids: A review of what’s new. Otolaryngology - Head and Neck Surgery, 134(6):1043–1050, 2006.

[13] K. Kota and J.R. Cavallaro. Numerical accuracy and hardware tradeoffs for CORDIC arithmetic for special-purpose processors. IEEE Transactions on Computers, 42(7):769–779, 1993.

[14] T. Lang and E. Antelo. CORDIC-based computation of arccos and arcsin. Application-Specific Systems, Architectures and Processors, 1997. Proceedings., IEEE International Conference on, pages 132–143, 1997.

[15] J.-A. Lee and T. Lang. Constant-factor redundant CORDIC for angle calculation and rotation. IEEE Transactions on Computers, 41(8):1016–1025, 1992.

[16] H. Luo and H. Arndt. Digital signal processing technology and applications in hearing aids. Signal Processing, 2002 6th International Conference on, 2:1727–1730, 2002.

[17] K. Maharatna, A. Troya, S. Banerjee, and E. Grass. Virtually scaling-free adaptive CORDIC rotator. Computers and Digital Techniques, IEE Proceedings-, 151(6):448–456, 2004.

[18] A. Maheshwari, W. Burleson, and R. Tessier. Trading off reliability and power-consumption in ultra-low power systems. Quality Electronic Design, 2002. Proceedings. International Symposium on, pages 361–366, 2002.

[19] C. Mazenc, X. Merrheim, and J.-M. Muller. Computing functions cos⁻¹ and sin⁻¹ using CORDIC. IEEE Transactions on Computers, 42(1):118–122, 1993.

[20] U. Meyer-Base, A. Meyer-Base, J. Mellott, and F. Taylor. A fast modified CORDIC-implementation of radial basis neural networks. The Journal of VLSI Signal Processing, 20(3):211–218, 1998.

[21] R.R. Osorio, E. Antelo, J.D. Bruguera, J. Villalba, and E.L. Zapata. Digit on-line large radix CORDIC rotator. Application Specific Array Processors, 1995. Proceedings., International Conference on, pages 246–257, 1995.

[22] J.-A. Pineiro, S.F. Oberman, J.-M. Muller, and J.D. Bruguera. High-speed function approximation using a minimax quadratic interpolator. IEEE Transactions on Computers, 54(3):304–318, 2005.

[23] J.-A. Pineiro, S.F. Oberman, J.-M. Muller, and J.D. Bruguera. High-speed function approximation using a minimax quadratic interpolator. IEEE Transactions on Computers, 54(3):304–318, 2005.

[24] Konstantinos Sarrigeorgidis and Jan Rabaey. Ultra low power CORDIC processor for wireless communication algorithms. The Journal of VLSI Signal Processing, 38(2):115–130, 2004.

[25] M. R. Spiegel. Mathematical Handbook of Formulas and Tables. McGraw-Hill, 1999.

BIBLIOGRAPHY 331

[26] A. Stammermann, L. Kruse, W. Nebel, A. Pratsch, E. Schmidt, M. Schulte, and A. Schulz. System level optimization and design space exploration for low power. System Synthesis, 2001. Proceedings. The 14th International Symposium on, pages 142–146, 2001.

[27] S. Suchitra, S.K. Lam, and T. Srikanthan. Novel schemes for high-throughput image rotation. Signals, Systems and Computers, 2004. Conference Record of the Thirty-Eighth Asilomar Conference on, 2:1884–1888, 2004.

[28] N. Takagi, T. Asada, and S. Yajima. A hardware algorithm for computing sine and cosine using redundant binary representation. Systems and Computers in Japan, 18(8):1–9, 1987.

[29] Jack E. Volder. The birth of CORDIC. The Journal of VLSI Signal Processing, 25(2):101–105, 2000.

[30] J.E. Volder. CORDIC trigonometric computing technique. Institute of Radio Engineers – Transactions on Electronic Computers, EC-8(3):330–334, 1959.

[31] J. S. Walther. A unified algorithm for elementary functions. Volume 38, pages 379–385, 1971.

[32] Neil H. Weste and David Harris. CMOS VLSI Design: A Circuits and Systems Perspective. Addison-Wesley, 2004.

[33] Ruiqi Zhang, Jong Hun Han, A.T. Erdogan, and T. Arslan. Low power CORDIC IP core implementation. Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, 3:III–III, 2006.
