abstract introduction new recursive dft/idft architecture low computation cycle 1/2: chebyshev...


Upload: cameron-griffin

Post on 16-Jan-2016


ABSTRACT

Introduction NEW Recursive DFT/IDFT architecture

Low computation cycle: reduced to 1/2 by the Chebyshev polynomial, and to 2/N by the folded architecture

High speed: register-splitting and computation-sharing scheme

New Recursive Formula For DFT/IDFT

Challenges: High Performance and Area Saving

References

[1] R. D. J. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications. Norwood, MA: Artech House, 2000.

[2] K. Maharatna, E. Grass, and U. Jagdhold, “A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM,” IEEE J. Solid-State Circuits, vol. 39, no. 3, pp. 484-493, Mar. 2004.

[3] M. D. Felder, J. C. Mason, and B. L. Evans, “Efficient dual-tone multifrequency detection using the nonuniform discrete Fourier transform,” IEEE Signal Processing Lett., vol. 5, pp. 160-163, Jul. 1998.

[4] G. Goertzel, “An algorithm for the evaluation of finite trigonometric series,” American Math. Monthly, vol. 65, pp. 34-35, Jan. 1958.

[5] V. V. Cizek, “Recursive calculation of Fourier transform of discrete signal,” IEEE Int. Conf. Acoustics, Speech, and Signal Processing, May 1982, pp. 28-31.

[6] T. E. Curtis and M. J. Curtis, “Recursive implementation of prime radix and composite radix Fourier transforms,” IEE Colloquium on Signal Processing Applications of Finite Field Mathematics, Jun. 1989, pp. 2/1-2/9.

[7] L. D. Van and C. C. Yang, “High-speed area-efficient recursive DFT/IDFT architectures,” in Proc. IEEE Int. Symp. Circuits Syst., May 2004, vol. 3, pp. 357-360, Vancouver, Canada.

[8] C. P. Fan and G. A. Su, “Novel recursive discrete Fourier transform with compact architecture,” IEEE Asia-Pacific Conf. Circuits Syst., Dec. 2004, pp. 1081-1084, Tainan, Taiwan.

[9] H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus, “Real-valued fast Fourier transform algorithms,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 35, pp. 849-863, June 1987.

[10] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.

[11] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,” IEEE Acoust., Speech, Signal Processing Mag., pp. 4-19, Jul. 1989.

[12] J. Hunter and J. V. McCanny, “Discrete cosine transform generator for VLSI synthesis,” IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1998, pp. 2997-3000.

[13] L. P. Chau and W. C. Sin, “Recursive algorithm for the discrete cosine transform with general lengths,” Elect. Lett., vol. 30, no. 3, pp. 197-198, Feb. 1994.

[14] Z. Wang, G. A. Jullien, and W. C. Miller, “Recursive algorithms for the forward and inverse discrete cosine transform with arbitrary length,” IEEE Signal Processing Lett., vol. 1, no. 7, pp. 101-102, Jul. 1994.

[15] C. H. Chen, B. D. Liu, J. F. Yang, and J. L. Wang, “Efficient recursive structures for forward and inverse discrete cosine transform,” IEEE Trans. Signal Processing, vol. 52, pp. 2665-2669, Sep. 2004.

The Proposed folded type DFT/IDFT Architecture

Lan-Da Van, Yuan-Chu Yu*, Chun-Ming Huang, and Chin-Teng Lin*

National Chip Implementation Center (CIC), No. 1, Prosperity Rd. 1, Science-Based Industrial Park, Hsinchu, Taiwan, R.O.C.

*Dept. of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu, 300, Taiwan, R.O.C.

Cost & Speed

Achieve high performance (critical period Tm + 2Ta) and area saving (only 6 real multipliers)

The Proposed Recursive Processing Element (PE) Architecture

Regular structure constructed from N/2 PEs in parallel

No intermediate register bank needed

Computation cycles further reduced to N: N = (N^2/2) / (N/2)

Processor latency: 64 clocks (computation cycles)

Critical path: Tm + 2Ta

Conclusion

A new recursive DFT/IDFT architecture based on a hybrid of the Chebyshev polynomial and the register-splitting scheme is proposed. The proposed VLSI algorithms lead to the fewest computation cycles and a higher speed than the compared designs. The proposed folded recursive architecture has a regular organization and is therefore amenable to VLSI implementation.

Chip Characteristics of the Proposed Recursive PE

Two proposed designs

Core Type: N^2/2 computation cycles

Folded Architecture: N computation cycles

OFDM-based WLAN application.

Comparison Results

In this paper, we propose two low-computation-cycle, high-speed recursive DFT/IDFT architectures that adopt a hybrid of the Chebyshev polynomial and the register-splitting scheme. Compared with the conventional second-order DFT/IDFT architecture, the proposed core-type recursive architecture halves the number of computation cycles and has a shorter critical period. To further reduce the number of computation cycles, we develop the folded recursive DFT/IDFT architecture, based on the new core-type design, with the same operating frequency. Moreover, the derivation shows that the DFT and the IDFT can be performed with the same structure under different configurations.


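DFT definition:

\[ y[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \qquad W_N = e^{-j2\pi/N} \]

DCT and DST parts:

\[ y_{DCT}[k] = \sum_{n=0}^{N/2-1} \left( x[n] + W_N^{-k}\, x[N-1-n] \right) \cos\left( \frac{2\pi kn}{N} \right) \]

\[ y_{DST}[k] = \sum_{n=0}^{N/2-1} \left( x[n] - W_N^{-k}\, x[N-1-n] \right) \sin\left( \frac{2\pi kn}{N} \right) \]

\[ r_k[n] = x[n] + W_N^{-k}\, x[N-1-n] \]

\[ y_{DCT}[k] = \sum_{n=0}^{N/2-1} r_k[n] \cos\left( \frac{2\pi kn}{N} \right) = \sum_{n=0}^{N/2-1} r_k[N/2-1-n] \cos\left( \frac{2\pi k\,(N/2-1-n)}{N} \right) = (-1)^k\, g_{N/2-1}(k) \]

\[ g_{N/2-1}(k) = \sum_{n=0}^{N/2-1} r_k[N/2-1-n] \cos\left( \frac{2\pi k\,(n+1)}{N} \right), \qquad \theta_k = \frac{2\pi k}{N} \]

\[ g_i(k) = \sum_{n=0}^{i} r_k[i-n] \cos\left( (n+1)\theta_k \right) \]

Chebyshev polynomials:

\[ \cos(r\theta) = 2\cos\left((r-1)\theta\right)\cos\theta - \cos\left((r-2)\theta\right) \]

\[ \sin(r\theta) = 2\sin\left((r-1)\theta\right)\cos\theta - \sin\left((r-2)\theta\right) \]

\[ g_i(k) = 2\cos\theta_k \sum_{n=0}^{i} r_k[i-n] \cos(n\theta_k) - \sum_{n=0}^{i} r_k[i-n] \cos\left((n-1)\theta_k\right) \]

Recursive DFT transfer functions:

\[ \frac{g(k,z)}{r_k(z)} = \frac{\cos\theta_k - z^{-1}}{1 - 2\cos\theta_k\, z^{-1} + z^{-2}} \]

\[ \frac{h(k,z)}{s_k(z)} = \frac{\sin\theta_k}{1 - 2\cos\theta_k\, z^{-1} + z^{-2}} \]

Recursive IDFT transfer functions, with \(\theta_n = 2\pi n/N\):

\[ \frac{g(n,z)}{r_n(z)} = \frac{\cos\theta_n - z^{-1}}{1 - 2\cos\theta_n\, z^{-1} + z^{-2}} \]

\[ \frac{h(n,z)}{s_n(z)} = \frac{\sin\theta_n}{1 - 2\cos\theta_n\, z^{-1} + z^{-2}} \]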

The Core-Type DFT Architecture

Computation Sharing: common multiplier usage for cos θk

Register Splitting: reduce critical path

Notations 0 and 1: delay elements

With the minimum critical path: 10

With the fewest registers: 10

Legend: one-bit left shift; forward pipeline register

Com.       00         01         10        11
Period     2Tm+3Ta    2Tm+3Ta    Tm+2Ta    Tm+2Ta
# of Reg.  2          2          2         2
Opt.       No         No         Yes       Yes

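\[ y[k] = \sum_{n=0}^{N/2-1} x[n]\, W_N^{kn} + \sum_{n=0}^{N/2-1} x[N-1-n]\, W_N^{k(N-1-n)} = y_{DCT}[k] - j\, y_{DST}[k] \]

where y_DCT[k] and y_DST[k] are the cosine and sine parts of the sum, with \(W_N = e^{-j2\pi/N}\).

For the DCT part, replace “n” with “N/2 – 1 – n” and apply the Chebyshev polynomials.

Taking the Z-transform then gives the recursive DCT/DST form of the DFT; the IDFT follows by the same methodology. The results are the proposed recursive DFT formula and the proposed recursive IDFT formula.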
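The DST branch is only covered by the phrase “similar methodology”. As a worked sketch of that step, the same Chebyshev substitution can be applied to the sine sum. The symbol \(h_i(k)\) is assumed here by analogy with \(g_i(k)\), with \(s_k[n] = x[n] - W_N^{-k} x[N-1-n]\), \(\theta_k = 2\pi k/N\), and the convention \(W_N = e^{-j2\pi/N}\):

```latex
\begin{align*}
h_i(k) &= \sum_{n=0}^{i} s_k[i-n]\,\sin\big((n+1)\theta_k\big) \\
       &= 2\cos\theta_k \sum_{n=0}^{i} s_k[i-n]\,\sin(n\theta_k)
          - \sum_{n=0}^{i} s_k[i-n]\,\sin\big((n-1)\theta_k\big) \\
       &= 2\cos\theta_k\, h_{i-1}(k) + s_k[i]\,\sin\theta_k - h_{i-2}(k), \\[4pt]
\frac{h(k,z)}{s_k(z)} &= \frac{\sin\theta_k}{1 - 2\cos\theta_k\, z^{-1} + z^{-2}}, \\[4pt]
y_{DST}[k] &= (-1)^{k+1}\, h_{N/2-1}(k).
\end{align*}
```

In the second line the n = 0 term of the first sum vanishes because sin(0) = 0, and the n = 0 and n = 1 terms of the second sum contribute \(-s_k[i]\sin\theta_k\) and 0, which is why the DST numerator has no \(z^{-1}\) term.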

The Core-Type IDFT Architecture

[Timing diagram: clock cycles 0, 1, ..., N-1; input sequence x[n]; buffered sequences x[n'] and x[N-1-n'] for n' = 0, 1, ..., N/2-1; output sequence y[k] for k = 0, 1, ..., N-1]

Data Buffer: storage for 64 complex data words of 16-bit word length

Control Unit: clock-gated control, sequence controller, parameter controller

32 PEs

Pre-processing for sk and rk

Two PEs for DST and DCT

DFT Length (N)        64 points
Input Word Length     16 bits
Critical Delay Time   11.47 ns (over 80 MHz)
Active Chip Area      318 um x 414 um
Process Technology    TSMC 0.18 um CMOS

[Figure: recursive PE layout for DFT/IDFT architecture]

For IEEE 802.11a standard: 3.2 μs
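The chip figures can be checked against the 3.2 μs figure quoted for IEEE 802.11a with a few lines of arithmetic. The sketch below rests on two assumptions that the poster does not state outright: that both designs are clocked at the 11.47 ns critical delay, and that 3.2 μs is the time budget for one 64-point transform. The names `latency_ns`, `CRITICAL_DELAY_NS`, and `SYMBOL_BUDGET_NS` are illustrative.

```python
# Assumed clock period: the critical delay time listed for the recursive PE.
CRITICAL_DELAY_NS = 11.47
# Assumed time budget per transform: the 3.2 us quoted for IEEE 802.11a.
SYMBOL_BUDGET_NS = 3200.0
N = 64  # DFT length from the chip characteristics


def latency_ns(cycles, period_ns=CRITICAL_DELAY_NS):
    """Latency of a transform that needs `cycles` clock cycles."""
    return cycles * period_ns


core_cycles = N * N // 2   # core type: N^2/2 computation cycles
folded_cycles = N          # folded type: N computation cycles

for name, cycles in (("core", core_cycles), ("folded", folded_cycles)):
    t = latency_ns(cycles)
    fits = t <= SYMBOL_BUDGET_NS
    print(f"{name:6s}: {cycles:5d} cycles, {t:9.2f} ns, within budget: {fits}")
```

Under these assumptions the folded design needs about 0.73 μs per 64-point transform, well inside 3.2 μs, while a single core-type PE would need about 23.5 μs.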

Parameters | Second Order DFT/IDFT | V-Y’s Structure [7] (Core Type) | Y-C’s Structure [8] (FFR-DFT) | Proposed Work1 (Core Type) | Proposed Work2 (Folded Type)
# of Computation Cycles for Each y[k] or x[n] | N | N | N | N/2 | N/2
# of Computation Cycles for N-Point DFT/IDFT | N^2 | N^2 | N^2 | N^2/2 | N
Critical Period | Tm+3Ta | Tm+2Ta | 2Tm+5Ta | Tm+2Ta | Tm+2Ta
# of Real Multipliers | 6 | 4 | 6 | 6 | 3N
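To make the trade-off in the comparison concrete, the symbolic entries can be evaluated for a given transform length. This is a small sketch that only copies the cycle and multiplier expressions from the table; the function name `comparison_rows` and the dictionary layout are illustrative choices, not part of the original work.

```python
def comparison_rows(n_points):
    """Evaluate the table's cycle and real-multiplier expressions for N = n_points."""
    n = n_points
    return {
        "Second-order DFT/IDFT":    {"cycles": n * n,      "multipliers": 6},
        "V-Y [7] (core type)":      {"cycles": n * n,      "multipliers": 4},
        "Y-C [8] (FFR-DFT)":        {"cycles": n * n,      "multipliers": 6},
        "Proposed work 1 (core)":   {"cycles": n * n // 2, "multipliers": 6},
        "Proposed work 2 (folded)": {"cycles": n,          "multipliers": 3 * n},
    }


table = comparison_rows(64)
for name, row in table.items():
    print(f"{name:26s} cycles={row['cycles']:5d}  real multipliers={row['multipliers']:4d}")
```

For N = 64 the folded design uses 64 cycles and 192 real multipliers, against 2048 cycles and 6 real multipliers for the proposed core type: 32 times fewer cycles for 32 times more multipliers.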

Simulation Results

• Lower round-off error due to the fewest computation cycles
• Full 12-band signal averages
• AWGN channel

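\[ s_k[n] = x[n] - W_N^{-k}\, x[N-1-n] \]

Assume

\[
\begin{aligned}
g_i(k) &= \sum_{n=0}^{i} r_k[i-n] \left( 2\cos(n\theta_k)\cos\theta_k - \cos\left((n-1)\theta_k\right) \right) \\
&= 2\cos\theta_k\, r_k[i] + 2\cos\theta_k \sum_{n=0}^{i-1} r_k[i-1-n] \cos\left((n+1)\theta_k\right) \\
&\quad - r_k[i]\cos\theta_k - r_k[i-1] - \sum_{n=0}^{i-2} r_k[i-2-n] \cos\left((n+1)\theta_k\right) \\
&= r_k[i]\cos\theta_k - r_k[i-1] + 2\cos\theta_k\, g_{i-1}(k) - g_{i-2}(k)
\end{aligned}
\]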
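The recursion for g_i(k), together with the matching sine recursion h_i(k) = s_k[i] sin θk + 2 cos θk h_{i-1}(k) - h_{i-2}(k), can be exercised in software. The following is a floating-point sketch, not the hardware datapath: it assumes W_N = exp(-j2π/N), zero initial states, and the output combination y[k] = (-1)^k (g_{N/2-1}(k) + j h_{N/2-1}(k)), which follows from y[k] = y_DCT[k] - j y_DST[k] under that convention. The function names `recursive_dft` and `direct_dft` are illustrative.

```python
import cmath
import math


def direct_dft(x):
    """Reference N-point DFT evaluated straight from the definition."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def recursive_dft(x):
    """N-point DFT (N even) using N/2 second-order recursion steps per output."""
    N = len(x)
    if N % 2 != 0:
        raise ValueError("N must be even")
    y = []
    for k in range(N):
        theta = 2.0 * math.pi * k / N
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        w_inv = cmath.exp(2j * math.pi * k / N)  # W_N^{-k}
        g1 = g2 = 0j    # g_{i-1}(k), g_{i-2}(k)
        h1 = h2 = 0j    # h_{i-1}(k), h_{i-2}(k)
        r_prev = 0j     # r_k[i-1]
        for i in range(N // 2):
            # Pre-processing: fold the input around its midpoint.
            r = x[i] + w_inv * x[N - 1 - i]
            s = x[i] - w_inv * x[N - 1 - i]
            # Second-order recursions for the cosine and sine branches.
            g = r * cos_t - r_prev + 2.0 * cos_t * g1 - g2
            h = s * sin_t + 2.0 * cos_t * h1 - h2
            g2, g1 = g1, g
            h2, h1 = h1, h
            r_prev = r
        sign = -1.0 if k % 2 else 1.0
        y.append(sign * (g1 + 1j * h1))
    return y


x = [complex(n % 5, -(n % 3)) for n in range(64)]
max_err = max(abs(a - b) for a, b in zip(direct_dft(x), recursive_dft(x)))
print("max |direct - recursive| =", max_err)
```

Each output y[k] takes N/2 loop iterations, which matches the N/2 computation cycles per output listed for the proposed designs.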

6 Real Multipliers Critical Period: Tm + 2Ta

6 Real Multipliers Critical Period: Tm + 2Ta

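[Figure: block diagram of the folded DFT/IDFT architecture. Blocks: Input Unit; RDFT Unit (1), RDFT Unit (2), RDFT Unit (3), ..., RDFT Unit (N/2); Control Unit with clock-gated controller, sequence controller, and parameter controller; Output Unit. RDFT Unit detail: pre-processing that forms rk[n] and sk[n] from x[n] and x[N-1-n] with WN^-k, followed by the recursive PEs PE_Re and PE_Im. Signals: Clock, Start, Reset, Mode.]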

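[Figures: signal-flow graphs of the recursive PEs. Core-type IDFT PE: inputs rn[k] and sn[k]; coefficients 2cos θn, cos θn, sin θn, -1; delay elements z^-1; output factors 1/N, (-1)^n, and j; output x[n]. Core-type DFT PE: inputs rk[n] and sk[n]; coefficients 2cos θk, cos θk, sin θk, -1; delay elements z^-1; output factors (-1)^k and j; output y[k]. DCT PE: input rk[n]; coefficients 2cos θk and cos θk; delay elements z^-1; output factor (-1)^k; output yDCT[k]. DST PE: input sk[n]; coefficients 2cos θk and sin θk; delay elements z^-1; output factor (-1)^k; output yDST[k].]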

[Figure: simulation setup. Signal x[t]; transmitter with IFFT (floating point); AWGN channel; receiver with FFT (fixed point)]