fpga frequency estimation

Digital Signal Processing 18 (2008) 1029–1044

Contents lists available at ScienceDirect

Digital Signal Processing

www.elsevier.com/locate/dsp

FPGA-based system for frequency detection of the main periodic component in time series information

E. Cabal-Yepez a,*, T.D. Carozzi b, R. de J. Romero-Troncoso a, M.P. Gough c, N. Huber c

a Digital Systems Group, FIMEE, Universidad de Guanajuato, Mexico
b Astronomy and Astrophysics Group, University of Glasgow, United Kingdom
c Space Science Group, University of Sussex, United Kingdom

Article history: Available online 9 April 2008

Keywords: FPGA; Frequency estimation; Time series; Fast computation engine

Abstract

This paper presents a novel algorithm, called the DFSWT, and its FPGA-based hardware processing unit for frequency estimation of the main periodic component of a time series. Since the DFSWT uses just additions and subtractions, it is simpler to compute than the FFT, and since its spectrum is a function of frequency, it is more intuitive than the Walsh transform. The results show that the proposed algorithm is very efficient in detecting the frequency of the main periodic component, even in low SNR. The proposed hardware processing unit is 3 orders of magnitude faster than its respective software implementation and presents advantages regarding power consumption, footprint, and computation speed against highly optimized commercially available FFT cores.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

The analysis of information to find hidden periodicities has enabled scientists to better understand the phenomena taking place in the surrounding world, and it has contributed to the improvement of technology by making possible the development of faster and more reliable systems. Determining the frequency of the main periodic component of a signal embedded in noise is a problem in many common applications, including space science analysis, and it is at the heart of signal processing for solving problems of control, communications, instrumentation, etc. For instance, Martin and Johnston [1] used frequency detection in control systems to reduce disturbance effects acting on some linear and nonlinear systems. The process consists of finding the frequency of the external disturbance and designing a controller accordingly. Sheu et al. [2] used frequency detection of the main periodic signal (or carrier) to increase the intelligibility of the transmitted information in communication systems. In this case, the process consists of suppressing noise from the incoming periodic signal and then finding its frequency by a multiple filter array. Homs-Corbera et al. [3] used fundamental frequency detection of sounds produced during breathing for diagnosing respiratory illnesses. This process requires computing a normalized power spectral density that is split into subsets and compared against constant threshold values to locate possible wheezing frequencies. In space science applications, frequency detection can provide knowledge that helps in the understanding of events happening in the near-Earth environment. Frequency detection can be used for detecting wave-particle interactions in space plasma scenarios, as described in the work of Gough et al. [4]. Carozzi et al. [5] use the digital wave processing (DWP) instrument onboard the Cluster-II space exploration mission for detecting electron waves in the space plasma. In this process the information is taken and sent to an on-ground station to be Fourier transformed in order to find the electron disturbance frequency.

* Corresponding author. E-mail address: [email protected] (E. Cabal-Yepez).

1051-2004/$ – see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.dsp.2008.04.002



The hidden periodicity problem is quite important, as shown above in [1–5]; therefore, its re-examination to make the most of technological advances is justified. Many methods have been proposed for frequency estimation of periodic signals embedded in noise, most of them relying on the fast Fourier transform (FFT). Ganesan [6] proposed an algorithm to estimate complex sinusoid frequencies at low signal-to-noise ratio (SNR); unfortunately, his algorithm requires (besides the FFT) the computation of a covariance matrix and the modelling of the signal eigenvectors. Yang et al. [7] presented a carrier recovery method that requires the FFT and a Kalman filter in an open-loop system, which increases the complexity of its hardware implementation. Park et al. [8] proposed an algorithm to detect the maximum Doppler frequency using a windowed FFT, an average function and the difference between adjacent power spectra; besides its complexity, this algorithm requires long sequences of data to work properly in low SNR, which generates an estimation delay and degrades its effectiveness. Although the FFT offers a fast processing engine, it requires the computation of transcendental functions like sin(x) and cos(x), which undermines its performance; moreover, as has been thoroughly studied, its throughput depends heavily on the SNR of the analysed information, as shown by Palmer [9], Kanai et al. [10] and Calvo et al. [11]. In order to ease some of the complexities of the FFT, different types of transforms have been introduced for frequency estimation. For instance, the Walsh functions, which consist of a set of irregular asymmetric rectangular waveforms with only two amplitude values (1 and −1), are easy to implement since, in principle, they require no multiplications or trigonometry. Unfortunately the Walsh transform, also known as the Hadamard transform or binary Fourier representation (BIFORE) as described in the work of Ahmed et al. [12], suffers from the fact that its spectrum is difficult to interpret during frequency estimation, since it uses the concept of sequency rather than frequency, as shown by Beauchamp [13,14], requiring additional effort during its utilization.

Field programmable gate arrays (FPGAs) and hardware description languages (HDLs) have motivated the development of new architectures for frequency estimation focusing on the improvement of the FFT shortcomings, especially those regarding its throughput. Sukhsawas and Benkrid [15] carried out an FPGA implementation of a scalable pipelined 1024-point FFT described in HDL that processes about 34 MSPS by implementing a radix-4 algorithm, but using radix-2 butterfly structures. Vite-Frias et al. [16] described and implemented a 1024-point radix-4 FFT core under HDL for FPGA, which processes about 6 MSPS by simplifying the computational structures of the complex multiplication and the butterfly units involved in the FFT calculation. Benhamid and Othman [17] proposed an 8-point FFT hardware architecture under HDL and implemented it into an FPGA, which processes about 229 MSPS by replacing complex multiplications with shift-and-add operations, and Lin et al. [18] proposed and implemented in an ASIC a 64-point FFT/IFFT processor for communications based on double-rate orthogonal frequency-division multiplexing (OFDM), which processes about 8 MSPS. Although there has been a great effort to increase the FFT throughput, there are still applications with tougher constraints that demand faster computation engines with smaller footprints. On the other hand, the popularity of HDLs comes from the fact that they are used for writing executable specifications that provide system designers with the ability to model a piece of hardware before it is created physically in an FPGA or an ASIC. FPGAs are rapid-prototyping vehicles whose computations are done spatially and can be programmed by the end-user (using HDLs) to perform the desired functionality. Given its large amount of fine-grain parallelism, an FPGA-based system achieves its performance improvement by executing many functions concurrently.

This paper presents the development of the discrete Fourier square-wave transform (DFSWT) in an open architecture and its hardware implementation in an FPGA for frequency estimation of the main periodic component of a discrete signal with low SNR. The approach is shown to be highly efficient for this task, and its hardware implementation unit is highly parallel, reaching processing rates of around 251 MSPS, which by far exceeds the previously reported performances, with low power consumption. The proposed DFSWT improves the square wave transform introduced by Haweel and Alhasan in [19] by facilitating its implementation. It is significantly simpler to compute than the FFT, since it requires neither trigonometry nor complex arithmetic, and in contrast to the Walsh transform, its spectrum is a function of frequency, making it more intuitive. All of these make the DFSWT suitable for applications where a fast engine with low power consumption and small area utilization is required (e.g., on-board instrumentation on space science missions [5]) for estimating the frequency of the main periodic component of signals embedded in noise. In order to test its effectiveness, the DFSWT FPGA implementation was compared against a highly optimized FFT core regarding resource consumption and computing power. The results show that the DFSWT algorithm can be implemented in a single chip with a highly parallel architecture which, in contrast to the FFT, just requires additions, subtractions and accumulations, making it a powerful computation engine, able to detect periodicity in discrete signals with low SNR while consuming low power.

The remainder of this paper is organized as follows. Section 2 formally introduces the DFSWT by giving its mathematical background, carrying out an analysis of its harmonic distortion, and describing its algorithmic computation. Section 3 describes each component of the FPGA-based DFSWT hardware processing unit. In Section 4 two different cases of study for testing the DFSWT effectiveness are defined. The results of the FPGA-based DFSWT hardware implementation, its efficiency and its performance comparison against a highly optimised commercially available FFT core are given in Section 5. Conclusions and remarks are presented in Section 6.

2. Discrete Fourier square-wave transform

2.1. Mathematical background of the DFSWT

A general discrete transform $Z(k)$ of a discrete-time signal $z(n)$ with sampling period $N$ is obtained by applying a transformation kernel $K_N^{nk}$, as shown in the equation

$$Z(k) = \frac{1}{N}\sum_{n=0}^{N-1} z(n)\,K_N^{nk} \quad \text{for } 0 \le k < N \text{ and } 0 \le n < N. \tag{1}$$

The well-known discrete Fourier transform (DFT) $Z_F(k)$ of a discrete-time signal $z(n)$, which is shown in the equation

$$Z_F(k) = \frac{1}{N}\sum_{n=0}^{N-1} z(n)\,W_N^{nk} \quad \text{for } 0 \le k < N \text{ and } 0 \le n < N, \tag{2}$$

has a transformation kernel $W_N^{nk}$ defined by the equation

$$W_N^{nk} = \cos\left(\frac{k2\pi n}{N}\right) + j\sin\left(\frac{k2\pi n}{N}\right) = I_F + jQ_F. \tag{3}$$

From (3), the real components of the transformation kernel $W_N^{nk}$ are known as the in-phase components $I_F$ of $W_N^{nk}$, and the imaginary components are known as the quadrature components $Q_F$ of $W_N^{nk}$. The FFT is obtained by applying a divide-and-conquer approach to the DFT: an $N$-point DFT is decomposed into successively smaller DFTs in order to exploit the symmetry and periodicity of the phase factor $W_N$. The number of complex multiplications and additions required by a radix-2 FFT is given by (4) and (5), respectively, as shown in the work of Proakis and Manolakis [20].

$$\left(\frac{N}{2}\right)\log_2 N, \tag{4}$$

$$N\log_2 N. \tag{5}$$
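To put (4) and (5) into numbers, the short sketch below evaluates both counts for a few transform sizes. The function name `radix2_fft_op_counts` is hypothetical and only illustrates the arithmetic; it does not model any particular FFT core.

```python
def radix2_fft_op_counts(n_points):
    """Return (complex multiplications, complex additions) from (4) and (5)."""
    stages = n_points.bit_length() - 1          # log2(N) for a power of two
    if n_points < 2 or 2 ** stages != n_points:
        raise ValueError("a radix-2 FFT needs N to be a power of two")
    multiplications = (n_points // 2) * stages  # (N/2) log2 N, Eq. (4)
    additions = n_points * stages               # N log2 N, Eq. (5)
    return multiplications, additions


for n_points in (8, 64, 1024):
    print(n_points, radix2_fft_op_counts(n_points))
```

For N = 1024 this gives 5120 complex multiplications and 10240 complex additions, which is the cost the multiplier-free transform discussed next tries to avoid.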

Haweel and Alhasan [19] considered basis sets of discrete waves $H_N^{nk}$ and linear transforms $Z_H(k)$ based on them. The discrete square-wave basis sets are obtained by limiting the Fourier transform basis set of sinusoids $W_N^{nk}$, allowing $I_H$ and $Q_H$ just to take the values of 1 and −1, which relaxes the computational load by trivializing the multiplication operations. The discrete transform $Z_H(k)$ and its kernel $H_N^{nk}$ are given by the following equations:

$$Z_H(k) = \frac{1}{N}\sum_{n=0}^{N-1} z(n)\,H_N^{nk} \quad \text{for } 0 \le k < N/2 \text{ and } 0 \le n < N, \tag{6}$$

$$H_N^{nk} = \operatorname{sgn}\left[\cos\left(\frac{k2\pi n}{N}\right)\right] + j\,\operatorname{sgn}\left[\sin\left(\frac{k2\pi n}{N}\right)\right] = I_H + jQ_H. \tag{7}$$

In (6) and (7), $N$ is the size of the transform, $n$ is the discrete time index, $k$ is the discrete frequency index, and $\operatorname{sgn}(x)$ is the signum function defined by Haweel and Alhasan as in the equation

$$\operatorname{sgn}(x) = \begin{cases} 1, & \text{if } x > 0 \text{ or } x = 0^{+},\\ -1, & \text{if } x < 0 \text{ or } x = 0^{-}. \end{cases} \tag{8}$$

Though $Z_H(k)$ reduces the number of operations at the expense of introducing distortion, the definition provided by Haweel and Alhasan is difficult to implement in a digital system, since the system precision is finite and the values of the functions sin(x) and cos(x) cannot be represented exactly (except for zero). The evaluation of the functions sin(x) and cos(x) at approximate roots will in general be zero, which will produce a seemingly random assignment of 1 or −1 during the discrete square-wave basis set generation; thus, in practice, the assignment of 1 or −1 in (7) must be done carefully when the argument of the functions sin(x) and cos(x) is equal to $m\pi/2$ for $m = 1, 2, 3, \ldots$. Although it is possible to work around the problem in the Haweel and Alhasan proposal by checking and treating the zeros, the proposal in this paper introduces a completely different definition for the square-wave basis set, called the discrete Fourier square-wave transform (DFSWT) $Z_S(k)$, which eliminates the seemingly random assignment of 1 or −1 by avoiding the zero-crossing points with perfectly symmetric square-wave basis sets taking the values of 1 or −1 only. Similar to $Z_H(k)$, the DFSWT $Z_S(k)$ computation involves neither trigonometric functions nor complex arithmetic operations such as multiplication or division. It just requires simple arithmetic operations such as addition and subtraction, which makes it easier and faster to calculate. The DFSWT $Z_S(k)$ and its transformation kernel $S_N^{nk}$ are defined by the following equations:

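$$Z_S(k) = \frac{1}{N}\sum_{n=0}^{N-1} z(n)\,S_N^{nk} \quad \text{for } 0 \le k < N/2 \text{ and } 0 \le n < N, \tag{9}$$

$$S_N^{nk} = \operatorname{sgn}\left[\cos\left(\frac{k2\pi n}{N} + \phi_0\right)\right] + j\,\operatorname{sgn}\left[\sin\left(\frac{k2\pi n}{N} + \phi_0\right)\right] = I_S^T + jQ_S^T. \tag{10}$$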

In (10), the introduction of a small phase $\phi_0$ eliminates the zero crossings of the transposed square-wave basis sets: in-phase set $I_S^T$ and quadrature set $Q_S^T$. The introduced phase $\phi_0$ is defined by

$$\phi_0 = \frac{1}{2}\left(\frac{2\pi}{N}\right) = \frac{\pi}{N}. \tag{11}$$
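The effect of the half-sample phase in (11) can be checked with exact integer arithmetic. The sketch below is an illustration only: it writes the kernel argument as p·π/N with integer p (p = 2kn for the kernel in (7), p = 2kn + 1 once the phase of (11) is added) and counts how often a sine or cosine lands exactly on a root. The function name `count_zero_crossings` is hypothetical, and N is assumed to be a power of two with N ≥ 4.

```python
def count_zero_crossings(n_points, use_phase_offset):
    """Count (k, n) pairs whose kernel argument is an exact root of sin or cos.

    The argument is p*pi/N with integer p, so no floating point is needed:
    sin is zero when p is a multiple of N, and cos is zero when p - N/2 is a
    multiple of N.
    """
    half = n_points // 2
    zeros = 0
    for k in range(half):                       # 0 <= k < N/2
        for n in range(n_points):               # 0 <= n < N
            p = 2 * k * n + (1 if use_phase_offset else 0)
            if p % n_points == 0:               # sin(p*pi/N) == 0
                zeros += 1
            if (p - half) % n_points == 0:      # cos(p*pi/N) == 0
                zeros += 1
    return zeros


print(count_zero_crossings(8, use_phase_offset=False))  # ambiguous points in (7)
print(count_zero_crossings(8, use_phase_offset=True))   # none left with (10)-(11)
```

With the offset, p is always odd, so it can never equal a multiple of N or of N/2, which is why no sample of the shifted kernel needs special treatment.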

2.2. Distortion introduced by the DFSWT

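This section presents an estimate of the distortion introduced during the DFSWT utilization for frequency estimation of the main periodic component of a discrete signal $z(n)$. By using the variable proposed in

$$r = \frac{k2\pi n}{N} + \phi_0 = \frac{k2\pi n}{N} + \frac{\pi}{N} = \frac{(k2n + 1)\pi}{N} = \frac{p\pi}{N}, \tag{12}$$

with

$$p = 2kn + 1 = \begin{cases} 1 & \text{for } k = 0,\\ 1, 3, 5, \ldots, 2N + 1 & \text{for } k = 1,\\ 1, 5, 9, \ldots, 4N + 1 & \text{for } k = 2, \end{cases}$$

and substituting it into (10), $I_S^T$ and $Q_S^T$ can be stated as

$$I_S^T = \begin{cases} 1 & \text{for } 0 < r < \pi/2 \text{ and } 3\pi/2 < r < 2\pi,\\ -1 & \text{for } \pi/2 < r < 3\pi/2, \end{cases} \tag{13}$$

$$Q_S^T = \begin{cases} 1 & \text{for } 0 < r < \pi,\\ -1 & \text{for } \pi < r < 2\pi. \end{cases} \tag{14}$$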
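As an illustration of (12)–(14), the sketch below builds one row of each basis set using only the integer p = 2kn + 1, reduced modulo 2N so that r = pπ/N falls in one period. The function name `dfswt_basis_rows` is hypothetical, and N is assumed to be a power of two.

```python
def dfswt_basis_rows(n_points, k):
    """Return (in-phase row, quadrature row) of +1/-1 values for index k."""
    in_phase, quadrature = [], []
    for n in range(n_points):
        # r = p*pi/N taken modulo 2*pi, i.e. p modulo 2N (p is always odd).
        p = (2 * k * n + 1) % (2 * n_points)
        # Eq. (13): +1 for 0 < r < pi/2 or 3*pi/2 < r < 2*pi, otherwise -1.
        in_phase.append(1 if (2 * p < n_points or 2 * p > 3 * n_points) else -1)
        # Eq. (14): +1 for 0 < r < pi, otherwise -1.
        quadrature.append(1 if p < n_points else -1)
    return in_phase, quadrature


i_row, q_row = dfswt_basis_rows(8, 1)
print(i_row)  # [1, 1, -1, -1, -1, -1, 1, 1]
print(q_row)  # [1, 1, 1, 1, -1, -1, -1, -1]
```

For N = 8 and k = 1 the two rows are the same one-cycle square wave shifted by a quarter period, which is the relation between the in-phase and quadrature sets used in the rest of this section.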

It must be noticed that $r$ never takes the values 0, $\pi/2$, $\pi$, $3\pi/2$ or $2\pi$, which would make $I_S^T$ or $Q_S^T$ zero, due to the introduction of the small phase $\phi_0$, for $N > 2$ and $N = 2^m$; therefore, $I_S^T$ and $Q_S^T$ are symmetric square functions.

From (13) and (14), it is easy to see that $I_S^T(r) = Q_S^T(r + \pi/2)$ and, $Q_S^T$ being an odd function with symmetric amplitude, it can be established that there is a sine series representation for $Q_S^T$ as shown in (15), where the average value during one period is zero ($a_0 = 0$) and $b_i$ are the sine expansion coefficients.

$$Q_S^T(r) = b_1\sin(r) + b_2\sin(2r) + b_3\sin(3r) + \cdots + b_k\sin(kr), \quad k < N/2. \tag{15}$$

From (15), it can be obtained

$$I_S^T(r) = b_1\cos(r) + b_2\cos(2r) + b_3\cos(3r) + \cdots + b_k\cos(kr), \quad k < N/2, \tag{16}$$

and the DFSWT kernel $S_N^{nk}$ can be written as

$$S_N^{nk} = \left[b_1\cos(r) + b_2\cos(2r) + b_3\cos(3r) + \cdots + b_k\cos(kr)\right] + j\left[b_1\sin(r) + b_2\sin(2r) + b_3\sin(3r) + \cdots + b_k\sin(kr)\right]. \tag{17}$$

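By replacing $r$ in (17) with the value obtained in (12) and rearranging terms, $S_N^{nk}$ can be rewritten as

$$\begin{aligned} S_N^{nk} ={} & b_1\left[\cos\frac{(2kn + 1)\pi}{N} + j\sin\frac{(2kn + 1)\pi}{N}\right] + b_2\left[\cos\frac{2(2kn + 1)\pi}{N} + j\sin\frac{2(2kn + 1)\pi}{N}\right] \\ & + b_3\left[\cos\frac{3(2kn + 1)\pi}{N} + j\sin\frac{3(2kn + 1)\pi}{N}\right] + \cdots + b_k\left[\cos\frac{k(2kn + 1)\pi}{N} + j\sin\frac{k(2kn + 1)\pi}{N}\right], \end{aligned} \tag{18}$$

$$\cos\frac{(2kn + 1)\pi}{N} + j\sin\frac{(2kn + 1)\pi}{N} = e^{j\frac{(2kn+1)\pi}{N}} = e^{j\frac{k2\pi n}{N}}\,e^{j\frac{\pi}{N}} = e^{j\frac{\pi}{N}}\,W_N^{nk}. \tag{19}$$

Utilizing the Euler identity in (19) and substituting it in (18), (20) is obtained.

$$S_N^{nk} = b_1 e^{j\frac{\pi}{N}} W_N^{nk} + b_2 e^{j\frac{2\pi}{N}} W_N^{2nk} + b_3 e^{j\frac{3\pi}{N}} W_N^{3nk} + \cdots + b_k e^{j\frac{k\pi}{N}} W_N^{knk}. \tag{20}$$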

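Consequently, from (20) the proposed DFSWT can be stated as

$$\begin{aligned} Z_S(k) &= \frac{1}{N}\sum_{n=0}^{N-1} z(n)\left[b_1 e^{j\frac{\pi}{N}} W_N^{nk} + b_2 e^{j\frac{2\pi}{N}} W_N^{2nk} + b_3 e^{j\frac{3\pi}{N}} W_N^{3nk} + \cdots + b_k e^{j\frac{k\pi}{N}} W_N^{knk}\right], \\ Z_S(k) &= \frac{1}{N} b_1 e^{j\frac{\pi}{N}} \sum_{n=0}^{N-1} z(n) W_N^{nk} + \frac{1}{N} b_2 e^{j\frac{2\pi}{N}} \sum_{n=0}^{N-1} z(n) W_N^{2nk} + \cdots + \frac{1}{N} b_k e^{j\frac{k\pi}{N}} \sum_{n=0}^{N-1} z(n) W_N^{knk}. \end{aligned} \tag{21}$$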

From (21), $Z_S(k)$ can be seen as a version of the DFT scaled by $b_1 e^{j\pi/N}$, plus some harmonic distortion introduced by the terms $b_2, b_3, \ldots, b_k$ located at higher-order harmonics; therefore, using (2), (21) can be rewritten as

$$Z_S(k) = b_1 e^{j\frac{\pi}{N}} Z_F(k) + \frac{1}{N} b_2 e^{j\frac{2\pi}{N}} \sum_{n=0}^{N-1} z(n) W_N^{2nk} + \cdots + \frac{1}{N} b_k e^{j\frac{k\pi}{N}} \sum_{n=0}^{N-1} z(n) W_N^{knk}. \tag{22}$$

From (22), it can be seen that the proposed DFSWT $Z_S(k)$ is a good approximation to the DFT $Z_F(k)$ when the only frequency of interest is that of the main periodic component of the signal. Also, from (22) it can be seen that the introduced distortion is found in the harmonics of higher order than the fundamental frequency.

Table 1
Percent value of $HD_i$ for different N-point DFSWT

| N | $HD_1$ | $HD_2$ | $HD_3$ | $HD_4$ |
|------|-------|-------|-------|-------|
| 8 | 41.42 | | | |
| 16 | 35.12 | 23.46 | 19.89 | |
| 32 | 33.77 | 20.79 | 15.45 | 12.68 |
| 64 | 33.44 | 20.19 | 14.56 | 11.48 |
| 1024 | 33.33 | 20.00 | 14.28 | 11.11 |

The harmonic distortion (HD) of the DFSWT, compared with the DFT, can be obtained for each $b_i$ by the equation

$$HD_i = \frac{b_i}{b_1} \times 100\%. \tag{23}$$

Table 1 shows different numerical values for $HD_i$ at different $N$.

From the above, it can be said that if the only interest is to find the frequency of the main periodic component in a signal, then the DFSWT is equivalent to the DFT for this purpose, since the distortion introduced by the DFSWT kernel $S_N^{nk}$ is found in the harmonics of higher order than the fundamental frequency, which do not affect the detectability of the fundamental component.
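The figures in Table 1 can be approximated numerically. The sketch below rests on two assumptions that the text does not state explicitly: the coefficients are taken as the DFT magnitudes of the sampled one-cycle quadrature row of (14), and the table columns are read as the successive odd harmonics 3, 5, 7 and 9 below N/2. The function name `harmonic_distortion_percent` is hypothetical.

```python
import cmath
import math


def harmonic_distortion_percent(n_points, max_terms=4):
    """Percent ratio of odd-harmonic to fundamental magnitude, as in (23)."""
    # One-cycle quadrature square wave from (14): +1 on the first half period.
    row = [1 if n < n_points // 2 else -1 for n in range(n_points)]

    def dft_magnitude(m):
        return abs(sum(value * cmath.exp(-2j * math.pi * m * n / n_points)
                       for n, value in enumerate(row)))

    fundamental = dft_magnitude(1)
    harmonics = list(range(3, n_points // 2, 2))[:max_terms]
    return [100.0 * dft_magnitude(m) / fundamental for m in harmonics]


for n_points in (8, 16, 32, 64, 1024):
    print(n_points, [round(hd, 2) for hd in harmonic_distortion_percent(n_points)])
```

Under these assumptions the computed ratios agree with Table 1 to within about 0.01, and for large N they approach the 1/3, 1/5, 1/7, 1/9 ratios of an ideal square wave.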

2.3. Algorithmic computation of the DFSWT

An N-point DFSWT $Z_S(k)$ of a sampled signal $z(n)$ is obtained by computing its discrete-frequency components in-phase $X(k)$ and quadrature $Y(k)$, which in turn are computed by multiplying the $N \times N/2$ transposed in-phase $I_S^T$ and quadrature $Q_S^T$ matrices by the $1 \times N$ signal vector $z(n) = [z(0), z(1), \ldots, z(N-1)]$, as shown in the following equations:

$$X(k) = \sum_{n=0}^{N-1} z(n)\,I_S^T, \tag{24}$$

$$Y(k) = \sum_{n=0}^{N-1} z(n)\,Q_S^T. \tag{25}$$

In (24) and (25), $k \in [0, (N/2) - 1]$ corresponds to the discrete frequency index. The in-phase matrix $I_S$ and the quadrature matrix $Q_S$ consist of a set of $N/2$ square waves (discrete frequency components) with amplitude 1 or −1 and length $N$ samples, which create the discrete square-wave basis set of the DFSWT. The new basis set definition is completely different from that proposed by Haweel and Alhasan in (7), and eliminates the seemingly random assignment of 1 or −1. The algorithmic definitions for the in-phase $I_S$ and quadrature $Q_S$ discrete square-wave basis sets are given by the following equations:

$$I_S = \operatorname{sgn}\left(\frac{N - 1}{2} - nk \bmod N\right) \quad \text{for } \begin{cases} k = 0, \ldots, N/2\\ n = 0, \ldots, N - 1 \end{cases} \ (N \text{ even}), \tag{26}$$

$$Q_S = \operatorname{sgn}\left[\frac{N - 1}{2} - \left(nk + \frac{N}{4}\right) \bmod N\right] \quad \text{for } \begin{cases} k = 1, \ldots, N/2\\ n = 0, \ldots, N - 1 \end{cases} \ (N \text{ even}), \tag{27}$$

$$Q_S = 0 \quad \text{for } k = 0 \text{ and } n = 0, \ldots, N - 1 \ (N \text{ even}). \tag{28}$$

The in-phase $I_S$ and quadrature $Q_S$ basis sets of the DFSWT are the counterparts of the cosinusoidal and sinusoidal basis sets of the DFT, respectively. The lowest (nonzero) frequency components of the DFSWT basis sets are the in-phase/quadrature square waves where only one cycle occurs during the sampling period. Successively, the higher frequency components increase linearly up to one wave cycle per two samples. Examples of the in-phase $I_S$ and quadrature $Q_S$ basis set components of the DFSWT are presented in Fig. 1.

Fig. 2 shows the elements for computing algorithmically the in-phase component $X(k)$ of the DFSWT on a set of $N = 2^m$ discrete data. From the figure, it can be seen that $(N/2 + 1)$ accumulations are performed in order to store the partial results on each iteration during the computation of an N-point DFSWT in-phase component $X(k)$. On each iteration the incoming data will be added to, or subtracted from, the stored value (input data × 1 or input data × −1, respectively). The selection between adding or subtracting is made by comparing the present output of an associated modulo-N counter against a reference value of $(N - 1)/2$.

Each sawtooth wave ($k$) generator ($k = 1, 2, \ldots, N/2$) is started at $N/4$ for generating the in-phase basis set $I_S$, and at 0 for generating the quadrature basis set $Q_S$. Each generator counts synchronously up to $N - 1$ at different rates depending on the basis set square wave to be obtained. If the output value of the sawtooth wave is less than the reference value $(N - 1)/2$, an addition is carried out; otherwise a subtraction takes place. This excludes the computation of the mean $X(0)$, which just adds up the discrete data $z(n)$ received on each iteration.
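The add-or-subtract procedure described in the last two paragraphs can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware design itself: it uses the generator start values given in the text (N/4 for the in-phase counters, 0 for the quadrature counters), a count step equal to k, and a hypothetical function name `dfswt_counters`; it covers k = 1, …, N/2 − 1 plus the mean X(0).

```python
def dfswt_counters(samples):
    """Accumulate X(0), X(k) and Y(k) with modulo-N sawtooth counters."""
    n_points = len(samples)
    half, quarter = n_points // 2, n_points // 4
    reference = (n_points - 1) / 2             # add below it, subtract above it
    ks = range(1, half)
    i_counters = {k: quarter for k in ks}      # in-phase generators start at N/4
    q_counters = {k: 0 for k in ks}            # quadrature generators start at 0
    x_acc = {k: 0 for k in ks}
    y_acc = {k: 0 for k in ks}
    mean_acc = 0                               # X(0): plain sum of the input
    for z in samples:                          # one iteration per input sample
        mean_acc += z
        for k in ks:
            x_acc[k] += z if i_counters[k] < reference else -z
            y_acc[k] += z if q_counters[k] < reference else -z
            i_counters[k] = (i_counters[k] + k) % n_points   # count step = k
            q_counters[k] = (q_counters[k] + k) % n_points
    return mean_acc, x_acc, y_acc


mean, x, y = dfswt_counters([3, -1, 4, 1, -5, 9, 2, -6])
print(mean, x[1], y[1])  # 7 -11 7
```

Only comparisons, additions and subtractions appear in the inner loop, which is the property that makes the scheme attractive for a parallel register-and-adder implementation.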

Fig. 1. In-phase and quadrature spectral components of the DFSWT for an 8-data sampling period.

    Fig. 2. Flow diagram of the algorithmic computation of the in-phase component in an N-point DFSWT.

The rule for selecting an addition or subtraction during the computation of the in-phase $X(k)$ and quadrature $Y(k)$ components of an 8-point DFSWT is illustrated graphically in Fig. 3. Here, if the value of the sawtooth wave is below the reference ($(N - 1)/2 = 3.5$), the input data must be added to the stored value; on the other hand, if the count is above the reference value, the input data must be subtracted from the stored value.

The elements involved in the algorithmic computation of the DFSWT allow the proposal of a highly parallel hardware architecture using basic functional blocks present in any digital system design, such as registers and adders.

3. DFSWT hardware processing unit implementation

The implementation of the hardware processing unit of the DFSWT requires 4 main elements, as shown in Fig. 4: the $I_S$ and $Q_S$ matrices, whose rows are made up of square waves toggling between the logic values of 1 and −1 at different rates according to the frequency they represent, and shifted 90 degrees from their counterparts in the opposite matrix; and the accumulator banks $A_I(0)$ to $A_I(N/2)$ and $A_Q(1)$ to $A_Q(N/2 - 1)$, which store the partial results of the in-phase $X(k)$ and quadrature $Y(k)$ components during an N-point DFSWT computation.

Fig. 3. Operation toggling for selecting an addition or subtraction during (a) in-phase ($I_S$) and (b) quadrature ($Q_S$) component computations of an 8-point DFSWT.

Each accumulator $A(k)$ in the accumulator banks $A_I(0)$ to $A_I(N/2)$ and $A_Q(1)$ to $A_Q(N/2 - 1)$ adds the input data $z(n)$ to, or subtracts it from, its stored value according to the input signal S/A, as shown in Fig. 4. Each subtraction/addition signal S/A corresponds to one row (square wave) in the in-phase $I_S$ and quadrature $Q_S$ basis sets. At the same time, each element of the row (each column of the matrix) represents the value of the square wave at $t = n$. As previously suggested, during the computation of an $N = 2^m$-point DFSWT, the S/A signals (the $I_S$ and $Q_S$ rows) controlling the subtraction/addition operation on each accumulator can be implemented by counters (see Figs. 2 and 3). Consequently, $2^{m-1}$ and $2^{m-2}$ modulo-N counters are necessary to obtain each element of $I_S$ and $Q_S$, respectively. The counters implementing the rows of $I_S$ are initialized to the value $2^{m-2}$, and the counters implementing the rows of $Q_S$ are initialized to zero. Each counter is incremented at a different rate according to the index of the row it represents.

Fig. 5 shows the implementation of the $Q_S$ rows for $k = 2^i$ ($i = 0, 1, \ldots, \log_2(N) - 1$) by a counter with a count step = 1. Note that this is a special-case counter, since it provides a control signal for system synchronization and reset (output Q6 of the counter), and since $Q_S = I_S$ for $k = N/2$, the highest frequency square wave in $Q_S$ and $I_S$ can be obtained from the output Q0 of the counter in Fig. 5. The same method is used for obtaining the remaining $Q_S$ rows but, unlike the counter in Fig. 5, not all of the counter outputs implementing the remaining $Q_S$ rows will be used. For instance, a counter with a count step = 3 implements the $Q_S$ rows of $k = 3 \times 2^i$ ($i = 0, 1, \ldots$; while $k < 32$) utilizing its outputs Q5, Q4, Q3, and Q2; a counter with a count step = 5 implements the $Q_S$ rows of $k = 5 \times 2^i$ ($i = 0, 1, \ldots$; while $k < 32$) utilizing its outputs Q5, Q4, and Q3.

Fig. 4. Block diagram of the DFSWT hardware processing unit.

Fig. 5. Counter implementing the synchronization and reset signal, and the $Q_S$ rows for $k = 2^i$ ($i = 0, 1, \ldots, \log_2(N) - 1$) and $N = 64$.

On the other hand, the counters implementing the $I_S$ rows, similarly to those implementing the $Q_S$ rows, require an output width of $\log_2(N) - 1$ bits, but due to the asymmetry among the square waves implementing the $I_S$ rows, each counter is started at $N/4$ and just the most significant bit (MSB) of its output is used, as shown in Fig. 6. For instance, a counter with a count step = 1 implements the $I_S$ row for $k = 1$ utilizing its output Q5, and a counter with a count step = 2 implements the $I_S$ row for $k = 2$ utilizing its output Q5; in general, a counter with a count step = $k$ ($k = 1, 2, \ldots, N/2 - 1$) implements the $I_S$ row for the specified $k$ utilizing its MSB output Q5. Hence, $N/2 - 1$ counters are required to generate each element of $I_S$.

The frequency $f_f(i)$ represented by each counter output $Q_i$ generating the elements of $I_S$ and $Q_S$ can be obtained as a function of the number of points $N$ in the transformation, the counter step, the counter-output index, and the sampling frequency $f_s$, as given in the equation

$$f_f(i) = \left(\frac{2^{\text{index}} \times \text{step}}{N}\right) f_s. \tag{29}$$

As described in this section, the DFSWT hardware processing unit consists fundamentally of a group of counters and accumulators organized in a highly parallel architecture; thus it fits well in the spatial computation engine contained in FPGA devices. The proposed architecture is divided into two sections, which just require a series of selective additions and subtractions to compute the in-phase $X(k)$ and quadrature $Y(k)$ components. Because of its straightforward implementation, the efficiency of the DFSWT hardware processing unit is guaranteed.
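The counter sharing described in this section, where one counter with an odd count step supplies several quadrature rows from different output bits, can be illustrated with the sketch below. It is a software model under assumptions: a 6-bit modulo-64 counter started at zero, outputs numbered Q5 (MSB) down to Q0, and a row value of +1 when the tapped bit is 0. The function name `q_rows_from_shared_counter` is hypothetical.

```python
def q_rows_from_shared_counter(step, n_points=64):
    """Tap the bits of one modulo-N counter to get the Q_S rows k = step * 2**i."""
    bits = n_points.bit_length() - 1           # 6 counter bits (Q5..Q0) for N = 64
    rows = {}
    for i in range(bits):
        k = step * 2 ** i
        if k > n_points // 2:                  # rows beyond N/2 are not needed
            break
        tap = bits - 1 - i                     # Q5 for i = 0, Q4 for i = 1, ...
        row = []
        for n in range(n_points):
            count = (step * n) % n_points      # counter value at sample n
            row.append(-1 if (count >> tap) & 1 else 1)
        rows[k] = row
    return rows


print(sorted(q_rows_from_shared_counter(3)))   # [3, 6, 12, 24]
print(sorted(q_rows_from_shared_counter(5)))   # [5, 10, 20]
```

Each tapped bit reproduces the comparison of the sawtooth k·n mod N against the reference (N − 1)/2, so a step-3 counter covers four rows and a step-5 counter covers three, matching the output lists quoted above.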

Fig. 6. Counter implementing $I_S$ rows for $k = 2^i$ ($i = 0, 1, \ldots, \log_2(N) - 1$) and $N = 64$.

Fig. 7. Experiment setup for cases of study.

4. Experiment setup

This section describes the experiment setup to test the effectiveness of the proposed DFSWT hardware processing unit in estimating the frequency of the main periodic component in two cases of study. In both cases, a time series is streamed into the DFSWT hardware processing unit implemented in a Xilinx Virtex-II device on the V2MB1000 development board from Memec [21] using Simulink and Xilinx System Generator. Fig. 7 shows the experiment setup for analyzing both cases of study. In this figure, the data acquisition system applies only to case of study 2, since in case of study 1 the analyzed noisy signal was emulated by software.

    4.1. Case of study 1: monochromatic signal

    Fig. 8. Monochromatic signal with 50% modulation.

    The first case of study considers a monochromatic signal with additive Poissonian noise that is obtained by implementing (30) in software, utilizing MatLab in the PC of Fig. 7. The resulting time series is stored in a MAT file and sent to the DFSWT hardware processing unit implemented in the Xilinx Virtex-II device on the V2MB1000 development board, through the Xilinx USB platform cable and the JTAG interface using Simulink and Xilinx System Generator. The in-phase $X(k)$ and quadrature $Y(k)$ components computed by the DFSWT hardware processing unit are retrieved and stored in a new MAT file using the same tools in the opposite direction, to compute the corresponding power spectrum later on.

    $$P\left\{ b + A\left[ 1 + \cos\left( \frac{2\pi c}{N-1}\, n + \phi \right) \right] \right\}, \quad n = 0, 1, 2, \ldots, (N-1). \tag{30}$$
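A time series of the form (30) can be emulated with a short Python sketch. This is not the authors' MatLab script: the Poisson sampler uses Knuth's multiplication method because the Python standard library has no Poisson generator, the function names and the seed are hypothetical, and interpreting the quoted phase value 5/17 as radians is an assumption. The remaining parameter values ($N = 64$, $b = 15$, $A = 15$, $c = 7$) are those reported for the signal of Fig. 8.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def noisy_monochromatic(n_points, background, amplitude, cycles, phase, seed=0):
    """Sample P{b + A[1 + cos(2*pi*c*n/(N-1) + phase)]} for n = 0 .. N-1."""
    rng = random.Random(seed)
    series = []
    for n in range(n_points):
        angle = 2.0 * math.pi * cycles * n / (n_points - 1) + phase
        mean = background + amplitude * (1.0 + math.cos(angle))
        series.append(poisson_sample(rng, mean))
    return series


# Parameter values quoted for Fig. 8; the phase unit (radians) and the seed
# are assumptions made for this sketch.
signal = noisy_monochromatic(64, background=15.0, amplitude=15.0,
                             cycles=7, phase=5.0 / 17.0, seed=1)
```

The noiseless mean ranges from $b = 15$ to $b + 2A = 45$, so individual Poisson draws outside that range are expected.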

    In (30), $P[\cdot]$ is a Poissonian random vector, $b$ is a constant background, $A$ is the amplitude of the embedded sinusoidal signal, and $c$ is the number of cycles fitted into the whole set of $N$ discrete data with a phase $\phi$. Fig. 8 shows a 50%-modulated monochromatic signal with a maximum value of 55 and a minimum value of 8.

    The monochromatic signal in Fig. 8 is composed of a sinusoid signal with an amplitude of $A = 15$, a phase $\phi = (5/17)$, and $c = 7$ cycles fitted into the 64 samples on a constant background $b = 15$.

    4.2. Case of study 2: weak square wave

    The second case of study considers a square wave of period 10 ns and peak-to-peak amplitude $A_{pp} = 3.0$ V, obtained from a square wave generator and modulated with white noise. The outcome, shown in Fig. 9, was sampled using a Lecroy LT372 500 MHz 4 Gs Oscilloscope with a sampling rate of 500 Msamples/s. The result of the sampling process is a discrete data file, which is imported into a PC to follow the analysis process employed in case of study 1 using MatLab. The discrete data are sent to the DFSWT hardware processing unit in the V2MB1000 development board through the Xilinx USB platform cable and the JTAG interface using Simulink and Xilinx System Generator. The in-phase $X(k)$ and quadrature $Y(k)$ components computed by the DFSWT hardware processing unit are retrieved and stored in a new MAT file using the same tools in the opposite direction, to compute the corresponding power spectrum later on.

    5. Results

    This section presents the results of the DFSWT FPGA implementation, its use in estimating the frequency of the main periodic component of two signals with low SNR, and finally its performance comparison against a highly optimized FFT core, obtaining favorable results in each case. The DFSWT hardware processing unit was described in VHDL and implemented in a Xilinx Virtex-II device using the Xilinx ISE 7.1i, SP 4 development suite. The results of a 32-point DFSWT and a 64-point DFSWT FPGA implementation are given in Table 1; see Section 5.3, where the performance comparison against the highly optimized FFT core is carried out.

    5.1. Frequency estimation in a monochromatic signal

    The time series information representing the monochromatic signal described in Section 4.1 is processed by the hardware implementation of the DFSWT, and the in-phase inphv and quadrature quphv components are retrieved using Simulink and Xilinx System Generator once more. Finally, the power spectrum pwsp of the DFSWT is computed using the equation

    $$[\mathrm{pwsp}(n)]^2 = [\mathrm{inphv}(n + 1)]^2 + [\mathrm{quphv}(n)]^2, \quad n = 1, 2, 3, \ldots, 32. \tag{31}$$
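The index offset in (31) is easy to get wrong when moving from the 1-based notation of the text to 0-based arrays. The minimal Python sketch below makes the offset explicit; the function name `power_spectrum_squared` and the toy input values are illustrative assumptions, not measured data.

```python
def power_spectrum_squared(inphv, quphv):
    """Return [pwsp(n)]^2 for n = 1 .. len(quphv), following Eq. (31).

    inphv[0] holds the dataset mean, so inphv is read one position ahead of
    quphv: inphv(n + 1) in the 1-based notation of the text is inphv[n] here,
    and quphv(n) is quphv[n - 1].
    """
    if len(inphv) < len(quphv) + 1:
        raise ValueError("inphv needs one more element than quphv")
    return [inphv[n] ** 2 + quphv[n - 1] ** 2 for n in range(1, len(quphv) + 1)]


# Toy values only: a mean of 10 followed by two in-phase terms.
example = power_spectrum_squared([10, 3, 0], [4, 5])
```

Here `example` evaluates to `[25, 25]`: the mean term at `inphv[0]` is skipped, so 3 pairs with 4 and 0 pairs with 5.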

    Fig. 9. Sampled noisy signal from a pulse generator.

    The accumulated magnitudes on each element of inphv and quphv are presented graphically in Figs. 10a and 10b, respectively. In both figures, the sampling frequency $F_s = 1/T_S$ is a normalized frequency. In Fig. 10a, the element at position $n = 1$ represents the magnitude of the dataset mean, producing a right shift of the remaining elements of inphv (i.e., the magnitude at position $n$ corresponds to $n - 1$ periods fitted into the whole 64-data set). In Fig. 10b, there is no shifting on the elements of quphv; therefore, the magnitude at position $n$ corresponds to the actual number of periods fitted into the 64-data set.

    Figs. 11a and 11b show, respectively, the power spectrum resulting from (31) by using the results of the DFSWT hardware processing unit and that obtained by applying the FFT to the monochromatic signal in Fig. 8.

    Since in Fig. 11a the component at position 7 has the largest magnitude, with a considerable difference with respect to the remaining components, it can be concluded that the main periodic component of the monochromatic signal described in Section 4.1 has an approximate periodicity of $(64\,T_S)/7$ s. Comparing this result against that of the FFT in Fig. 11b, where $F = (7/64) f_s$, it can be seen that the frequencies of the main periodic component detected in both cases match each other.

    5.2. Frequency estimation of a weak square wave

    The results of the in-phase and quadrature components from the DFSWT hardware processing unit analyzing the time series information representing the weak square wave described in Section 4.2 are presented in Figs. 12a and 12b, respectively. Similar to Fig. 10, the element at position $n = 1$ in Fig. 12a represents the magnitude of the dataset mean, which produces a right shift of the remaining elements of inphv; in Fig. 12b there is no shifting on the elements of quphv.

    The power spectrum obtained from (31) by using the results of the DFSWT hardware processing unit and that of the FFT applied to the weak square wave in Fig. 9 are presented in Figs. 13a and 13b, respectively.

    From Fig. 13, the DFSWT hardware processing unit, as well as the FFT, detects a main periodic component at position 13 of its spectrum. In Fig. 13a, the sampling period $T_s$ of the DFSWT power spectrum is 2 ns; hence the frequency of the main periodic component of the weak square wave in Fig. 9 is estimated at $F = \left(\frac{13}{64}\right)\left(\frac{1}{2 \times 10^{-9}}\right) \approx 101.5$ MHz. Similarly, the FFT power spectrum (Fig. 13b) estimates the main periodic component at $F = (13/64) f_s$, where $f_s = 500$ MHz. As in the previous example, the estimated frequencies match in both cases.
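The conversion from a spectral peak position to a physical frequency used in both cases of study can be written as a one-line helper. The Python sketch below is only a worked restatement of the arithmetic in the text; the function name `bin_to_frequency` is hypothetical.

```python
def bin_to_frequency(peak_bin, n_points, sampling_period):
    """Frequency of a component that fits peak_bin periods into n_points samples."""
    return peak_bin / (n_points * sampling_period)


# Weak square wave: peak at position 13 of a 64-point spectrum, Ts = 2 ns.
f_square = bin_to_frequency(13, 64, 2e-9)

# Monochromatic signal: peak at position 7, normalized sampling period Ts = 1.
f_mono = bin_to_frequency(7, 64, 1.0)
```

With these inputs `f_square` is about 101.56 MHz, consistent with the 100 MHz square wave (10 ns period) to within the 7.8 MHz bin spacing of a 64-point transform at 500 Msamples/s, and `f_mono` equals 7/64 of the normalized sampling frequency.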

    5.3. Performance comparison

    The performance analysis of the DFSWT shows that a 64-point DFSWT computation implemented using MatLab 7.2.0.232 (R2006a) in a 2.4 GHz Intel Pentium 4 processor takes around $1.4805 \times 10^{-3}$ s, while the DFSWT hardware processing unit, with an operating speed of 100 MHz, carries out the same computation in just $0.64 \times 10^{-6}$ s. These results show that the DFSWT hardware processing unit is 3 orders of magnitude faster than its software implementation for the case of analysis described here.

    Fig. 10. DFSWT (a) in-phase and (b) quadrature components of the modulated monochromatic signal from Fig. 8.

    To get a hardware performance indicator of the DFSWT FPGA implementation, a comparison against the highly optimized, commercially available Xilinx FFT v3.2 core [22] was carried out. The Xilinx FFT v3.2 core was chosen because its high configurability allows generating an FFT hardware processing unit with operational characteristics similar to those of the DFSWT hardware processing unit, for a fairer performance comparison. Table 2 gives the implementation figures for a 32-point and a 64-point DFSWT hardware processing unit, as well as for a 32-point and a 64-point Xilinx FFT v3.2 core in a Virtex-II device in a pipelined architecture, and compares their performance in terms of processing speed. Additionally, in order to emulate the serialized feeding provided by the Xilinx FFT v3.2 core, the implementation figures for a DFSWT hardware processing unit with an independent parallel-to-serial converter connected to its output (DFSWT_so) are included in this table, together with its corresponding processing speed. The reported processing speed in Table 2 was computed using (32), where $P_s$ represents the processing speed of a hardware processing unit (in samples per second, SPS) that operates at a maximal frequency ($f_{\max}$) and has a latency of $L$ clock cycles for computing an $N$-point transform.

    $$P_s = \frac{N}{L} f_{\max}. \tag{32}$$
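Equation (32) can be checked against the figures reported in Table 2 with a few lines of Python. The function name `processing_speed_msps` and the dictionary layout are illustrative choices; the $(N, L, f_{\max})$ triples are the values listed in Table 2.

```python
def processing_speed_msps(n_points, latency_cycles, fmax_mhz):
    """Eq. (32): Ps = (N / L) * fmax, in MSPS when fmax is given in MHz."""
    return n_points / latency_cycles * fmax_mhz


# (N, L, fmax in MHz) triples taken from Table 2.
table2 = {
    "DFSWT-32": (32, 32, 265),
    "DFSWT-64": (64, 64, 251),
    "DFSWT-so-32": (32, 40, 188),
    "DFSWT-so-64": (64, 80, 194),
    "Xilinx FFT-32": (32, 122, 233),
    "Xilinx FFT-64": (64, 192, 229),
}
speeds = {name: round(processing_speed_msps(*row), 1) for name, row in table2.items()}
```

Rounded to one decimal place, the computed speeds reproduce the $P_s$ column of Table 2 (265.0, 251.0, 150.4, 155.2, 61.1, and 76.3 MSPS).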

    Fig. 11. Power spectrum of the (a) DFSWT and (b) FFT applied to the monochromatic signal with 50% modulation from Fig. 8.

    Table 2 shows that a rapid increment in the number of output ports, derived from an increment of the input width and the highly parallel architecture of the DFSWT hardware processing unit, may limit the FPGA implementation of the algorithm as a stand-alone system. On the other hand, the variation of the input width barely affects the programmable logic used during its hardware implementation. The performance comparison of the DFSWT hardware processing unit against the Xilinx FFT v3.2 core brings out the following results:

    Although the FFT and DFSWT algorithms require $N \log_2(N)$ and $N \times N$ operations, respectively, the Xilinx FFT architecture in its lowest latency requires 192 clock cycles to perform a 64-point transform, which is three times the number of clock cycles required by the 64-point DFSWT hardware processing unit (64 clock cycles).

    The 64-point DFSWT hardware processing unit can reach operation frequencies up to 251 MHz, while the 64-point Xilinx FFT core reaches operation frequencies up to 229 MHz. Considering the maximum operation frequencies for both cores and their lowest latencies, the 64-point DFSWT hardware processing unit can be 3.3 times faster than the 64-point Xilinx FFT v3.2 core.

    The footprint of the 64-point DFSWT hardware processing unit is 654 slices, while the 64-point Xilinx FFT core uses 1014 slices and six 18 × 18-bit embedded multipliers; each of these multipliers is equivalent to 182 slices, for a total of 2106 slices. Therefore, the 64-point DFSWT hardware processing unit uses 69% less area than the 64-point Xilinx FFT v3.2 core.

    Finally, the implementation figures of the serial-output DFSWT hardware processing unit present a disadvantage in maximum operation frequency against those of the Xilinx FFT v3.2 core; this drawback results from including the independent parallel-to-serial converter at the output of the DFSWT hardware processing unit. Nevertheless, regardless of its lower operating frequency, the DFSWT_so processes twice the information processed by the Xilinx FFT v3.2 core due to the long latency of the latter.
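The ratios quoted in the comparison above follow from simple arithmetic on the Table 2 figures, which the following Python sketch reproduces. The variable and function names are illustrative; the slice equivalence of 182 slices per 18 × 18 multiplier is the value stated in the text.

```python
SLICES_PER_MULTIPLIER = 182  # slice equivalent of one 18x18 multiplier, as stated in the text


def equivalent_slices(slices, multipliers):
    """Total footprint in slices, counting embedded multipliers as slices."""
    return slices + multipliers * SLICES_PER_MULTIPLIER


fft_area = equivalent_slices(1014, 6)      # 64-point Xilinx FFT v3.2 core
dfswt_area = equivalent_slices(654, 0)     # 64-point DFSWT unit
area_saving = 1.0 - dfswt_area / fft_area  # fraction of area saved by the DFSWT

dfswt_speed = 64 / 64 * 251                # Ps in MSPS, from Eq. (32)
fft_speed = 64 / 192 * 229
dfswt_so_speed = 64 / 80 * 194
speed_ratio = dfswt_speed / fft_speed
serial_ratio = dfswt_so_speed / fft_speed
latency_ratio = 192 / 64
```

These expressions give an FFT footprint of 2106 equivalent slices, an area saving of about 69%, a speed ratio of about 3.3, a serial-output speed ratio of about 2.0, and a latency ratio of 3.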


    Fig. 12. DFSWT (a) in-phase and (b) quadrature components from the weak square wave in Fig. 9.

    6. Conclusions

    This paper presents a novel algorithm, the DFSWT, and its hardware processing unit to estimate the frequency of the main periodic component in a time series using just additions and subtractions. The obtained results show that the DFSWT is very efficient in carrying out this task for low SNR signals using just accumulative additions and subtractions. Because no complex computations like multiplication or trigonometric functions are required, the proposed hardware processing unit is a fast and efficient computation engine with a highly parallel architecture and low power consumption, which outperforms its corresponding software implementation by 3 orders of magnitude. Although computationally the DFSWT is not meant to replace the FFT, its usage could be advantageous in cases where just the frequency estimation of the main periodic component in a time series is required.

    The DFSWT improves the square wave transform proposed by Haweel and Alhasan by eliminating the seemingly random assignment of 1 and −1 induced by the Haweel and Alhasan square-wave basis set generation during its hardware implementation. The DFSWT has a more intuitive spectrum than the Walsh transform since its spectrum is a function of frequency rather than sequency. Furthermore, the DFSWT is simpler to compute than the FFT since it requires neither trigonometry nor complex arithmetic.


    Fig. 13. Power spectrum of (a) the DFSWT and (b) the FFT applied to the weak square wave in Fig. 9.

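    Table 2
    FPGA implementation figures for the hardware processing units treated in this paper

    | Core | Points N | Data width (in) | Data width (out) | Used slices | Used ports | Used 18 × 18 Mult | Latency L | fmax (MHz) | Ps (MSPS) |
    |---|---|---|---|---|---|---|---|---|---|
    | DFSWT | 32 | 8 | 13 | 310 | 426 | 0 | 32 | 265 | 265.0 |
    | DFSWT | 64 | 8 | 14 | 654 | 906 | 0 | 64 | 251 | 251.0 |
    | DFSWT-so | 32 | 8 | 13 | 542 | 63 | 0 | 40 | 188 | 150.4 |
    | DFSWT-so | 64 | 8 | 14 | 1144 | 67 | 0 | 80 | 194 | 155.2 |
    | Xilinx FFT | 32 | 8 | 14 | 770 | 63 | 6 | 122 | 233 | 61.1 |
    | Xilinx FFT | 64 | 8 | 15 | 1014 | 67 | 6 | 192 | 229 | 76.3 |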

    The performance comparison against the highly optimized, commercially available Xilinx FFT v3.2 core shows that the DFSWT hardware processing unit processes 175 MSPS more than the Xilinx FFT core, with a latency of at most one-third that of the FFT core. Regarding area utilization, the DFSWT hardware processing unit uses 69% less area than the Xilinx FFT core. All these make the DFSWT hardware processing unit highly suitable for applications where just the frequency of the main periodic component of a discrete signal with low SNR has to be estimated.


    Acknowledgment

    This project was partially sponsored by the National Council for Science and Technology of Mexico (CONACyT) through the scholarship 144966.

    References

    [1] T.W. Martin, W.G. Johnston, Disturbance accommodation control with fast Fourier transform frequency detection, in: IEEE Symp. Circuits Syst., 1997, pp. 509–513.

    [2] M.H. Sheu, H.E. Liao, S.S. Yang, A new VLSI design for adaptive frequency-detection based on the active oscillator, IEEE AP-ASIC (2000) 123–126.

    [3] A. Homs-Corbera, R. Jané, J.A. Fiz, J. Morera, Algorithm for time-frequency detection and analysis of wheezes, in: IEEE EMBS Conf., 2000, pp. 2977–2980.

    [4] M.P. Gough, A.M. Buckley, T.D. Carozzi, S.I. Klimov, V.E. Korepanov, N. Huber, G. Seferiadis, M. Pouchet, E. Chambers, Electron Correlation in Space Plasmas, URSI GA, New Delhi, 2005.

    [5] T.D. Carozzi, A.M. Buckley, M.P. Gough, E. Chambers, Detection of Weak Plasma Oscillations Using the Electron Autocorrelator on Cluster, URSI GA, New Delhi, 2005.

    [6] T. Ganesan, Complex sinusoid frequency estimation at very low SNR, IEEE ICASSP (1995) 1772–1775.

    [7] X. Yang, X.W. Cui, M.Q. Lu, Z.M. Feng, Carrier recovery using FFT and Kalman filter, IEEE ISPA (2003) 1094–1096.

    [8] G. Park, D. Heo, D. Hong, C. Kang, A new maximum Doppler frequency estimation algorithm in frequency domain, IEEE Trans. Consum. Electron. (2005) 442–448.

    [9] L.C. Palmer, Coarse frequency estimation using the discrete Fourier transform, IEEE Trans. Inform. Theory (1974) 104–109.

    [10] H. Kanai, N. Chubachi, H. Suzuki, A method to evaluate accuracy of FFT-based periodicity analysis for short length signal in low SNR, IEEE ICASSP (1992) 45–48.

    [11] P.M. Calvo, J.F. Sevillano, I. Velez, A. Irizar, Enhanced implementation of blind carrier frequency estimators for QPSK satellite receivers at low SNR, IEEE Trans. Consum. Electron. (2005) 442–448.

    [12] N. Ahmed, K.R. Rao, A.L. Abdussattar, BIFORE or Hadamard transform, IEEE Trans. Audio Electroacoust. 19 (3) (1971) 225–234.

    [13] K.G. Beauchamp, Walsh Functions and Their Applications, Academic Press, New York, 1975.

    [14] K.G. Beauchamp, Application of Walsh and Related Functions, Academic Press, New York, 1984.

    [15] S. Sukhsawas, K. Benkrid, A high-level implementation of a high performance pipeline FFT on Virtex-E FPGA, in: Annual Symposium on VLSI, IEEE Computer Society, 2004, pp. 229–232.

    [16] J.A. Vite-Frias, R. de J. Romero-Troncoso, A. Ordaz-Moreno, VHDL core for 1024-point Radix-4 FFT computation, in: IEEE ReConFig, 2005.

    [17] M. Benhamid, M. Othman, FPGA implementation of a canonical signed digit multiplier-less based FFT processor for wireless communication applications, IEEE International Conference on Semiconductor Electronics, ICSE2006, 2006, pp. 641–645.

    [18] H.L. Lin, H. Lin, Y.C. Chen, R.C. Chang, A novel pipelined fast Fourier transform architecture for double rate OFDM systems, IEEE SIPS (2004) 7–11.

    [19] T.I. Haweel, A.M. Alhasan, A simplified square wave transform for signal processing, Contempor. Math. (1995) 265–271.

    [20] J.G. Proakis, D.G. Manolakis, Digital Signal Processing Principles, Algorithms, and Applications, Prentice–Hall, New Jersey, 1996.

    [21] Memec Design, Virtex-II V2MB1000 Development Board User's Guide, Version 3.0, December 2002.

    [22] Xilinx Inc., Fast Fourier Transform v3.2, Xilinx LogiCore, DS260, August 31, 2005.

    E. Cabal-Yepez received his Ph.D. degree from the University of Sussex in the United Kingdom, and his B.E. and M.E. degrees from FIMEE-Universidad de Guanajuato, where he is currently working as a full-time professor and doing research work at the HSP Digital Group, which is focused on hardware signal processing on FPGA for applications in mechatronics. He can be contacted at [email protected].

    T.D. Carozzi is a research fellow at the Astronomy and Astrophysics Group of the University of Glasgow. He received his M.Sc. degree in engineering physics from Uppsala University, Sweden, and his Ph.D. degree from the Swedish Institute of Space Physics. He co-founded Red Snake Radio Technology AB, Stockholm, Sweden, where he was a senior researcher and with which he holds several international patents. He can be contacted at [email protected].

    R. de J. Romero-Troncoso received his Ph.D. degree with honours from the Universidad Autonoma de Queretaro, and his B.E. and M.Sc. degrees from FIMEE-Universidad de Guanajuato, where he is currently a full-time professor, coordinator of the master degree program in electrical engineering, and leader of the HSP Digital Group. He has supervised over 140 dissertation works and published 2 books and more than 60 journal and conference papers. He received the technologic innovation award ADIAT 2004 and the ReConFig 2005 award for digital systems groups. He can be contacted at [email protected].

    M. Paul Gough is Professor of space science at the University of Sussex. He received his B.Sc. degree with honours in 1967, his M.Sc. degree from the University of Leicester, and his Ph.D. degree from the University of Southampton. He is a fellow of the Institute of Electrical Engineers, a fellow of the Royal Astronomical Society, and a Chartered Engineer. He can be contacted at [email protected].

    N. Huber received his Ph.D. degree from the University of Sussex. He can be contacted at [email protected].