abstract introduction new recursive dft/idft architecture low computation cycle 1/2: chebyshev...
TRANSCRIPT
ABSTRACT
Introduction NEW Recursive DFT/IDFT architecture
Low computation cycle 1/2: Chebyshev polynomial 2/N: Folded architecture
High speed Register-splitting and computation-sharing scheme
New Recursive Formula For DFT/IDFT
Challenges: High-Performance and Area-Save
References[1] R. D. J. van Nee and R. Prasad, OFDM for wireless multimedia communications. Norwood, MA: Artch House, 2000.
[2] K. Maharatna, E. Grass, and U. Jagdhold, “A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM,” IEEE Trans. of Solid-State Circuits, vol. 39, no. 3, pp. 484-493, Mar. 2004.
[3] M. D. Felder, J. C. Mason, and B. L. Evans, “Efficient dual-tone multifrequency detection using the nonuniform discrete Fourier transform,” IEEE Signal Processing Lett., vol. 5, pp. 160-163, Jul. 1998.
[4] G. Goertzel, “An algorithm for the evaluation of finite trigonometric series,” American Math. Monthly, vol. 65, pp. 34-35, Jan. 1958.
[5] V. V. Cizek, “Recursive calculation of Fourier transform of discrete signal,” IEEE Int. Conf. Acoustics, Speech, and Signal Processing, May 1982, pp. 28-31.
[6] T. E. Curtis and M. J. Curtis, “Recursive implementation of prime radix and composite radix Fourier transforms,” IEE Colloquium on Signal Processing Applications of Finite Field Mathematics, Jun. 1989, pp. 2/1-2/9.
[7] L. D. Van and C. C. Yang, "High-speed area-efficient recursive DFT/IDFT architectures," in Proc. IEEE Int. Symp. Circuits Syst., May 2004, vol. 3, pp. 357-360, Vancuover, Canada.
[8] C. P. Fan and G. A. Su, “Novel recursive discrete Fourier transform with compact architecture,” IEEE Asia-Pacific Conf. Circuits Syst., Dec. 2004, pp. 1081-1084, Tainan, Taiwan.
[9] H. V. Sorensen, D. L. Jones, M. T. Heideman, C S. Burrus, “Real-valued fast Fourier transform algorithms,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 35, pp. 849-863, June 1987.
[10] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[11] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,” IEEE Acoust., Speech, Signal Processing Mag., pp. 4-19, Jul. 1989.
[12] J. Hunter and J. V. McCanny, Discrete cosine transform generator for VLSI synthesis,” IEEE Int. Conf. Acoustics, Speech, and Signal Processing, 1998, pp. 2997-3000.
[13] L. P. Chau and W. C. Sin, “Recursive algorithm for the discrete cosine transform with general lengths,” Elect. Lett., vol. 30, no. 3, pp. 197-198, Feb. 1994.
[14] Z. Wang, G. A. Jullien and W. C. Miller, “Recursive algorithms for the forward and inverse discrete cosine transform with arbitrary length,” IEEE Signal Processing Lett., vol. 1, no. 7, pp. 101-102, Jul. 1994.
[15] C. H. Chen, B. D. Liu, J. F. Yang, and J. L. Wang, “Efficient Recursive Structures for forward and inverse discrete cosine transform,” IEEE Trans. Signal Processing, vol. 52, pp. 2665-2669, Sep. 2004.
The Proposed folded type DFT/IDFT Architecture
Lan-Da Van, Yuan-Chu Yu*, Chun-Ming Huang, and Chin-Teng Lin* National Chip Implementation Center (CIC),
No. 1, Prosperity Rd. 1, Science-Based Industrial Park, Hsinchu, Taiwan, R.O.C.Dept. of Electrical and Control Engineering, National Chiao-Tung University,
Hsinchu, 300, Taiwan, R.O.C.
Cost
&
Speed
AchieveAchieve
a high-performance(TTmm+2T+2Taa) and area-save (save the Real Multipliers to 6 ) The Proposed recursive
Processing Element(PE) Architecture
Regularity construct by the N/2 PEs in parallel No intermediate register bank needed Further reduce the computation cycle to N
N = (N2/2) / (N/2)
Processor latency: 64 clock
(Computation cycles) Critical Path: Tm+2Ta
Conclusion A new recursive DFT/IDFT architecture based on the hybrid of Chebyshev polynomial and register-splitting scheme is
proposed. The proposed VLSI algorithms lead to the fewest computation cycle and higher speed than others. The proposed folding recursive architecture with regular organization is certainly amenable to VLSI implementation.
Chip Characteristics of the Proposed Recursive PE
Proposed two designCore Type: N2/2Folded Architecture: N
OFDM-based WLAN application.
Comparisons Results
In this paper, we propose two low-computation cycle and high-speed recursive discrete DFT/IDFT architectures adopting the hybrid of Chebyshev polynomial and register-splitting scheme. The proposed core-type recursive architecture achieves half computation-cycle reduction as well as less critical period compared with the conventional second-order DFT/IDFT architecture. So as to further reduce the number of computation cycles, based on the new core-type design, we develop the folded recursive DFT/IDFT architecture with the same operating frequency. Moreover, from the derivation results, the operation of DFT/IDFT can be performed with the same structure under different configurations.
][krn
ncos2
][ksn
ncos
nsin
1
1z
1z
1z
1z
][nx
N
n)1(
j
knN
N
n
Wnxky
1
0
][][ NjN eW /2
)2
(cos]1[][][12/
0N
knnNxWnxky
N
n
kNDCT
)2
(sin]1[][][12/
0N
knnNxWnxky
N
n
kNDST
]1[][][ nNxWnxnr kNk
12/
0
)2
cos(][][N
nkDCT N
knnrky
12/
0
))12/(2
cos(]12/[N
nk N
nNknNr
kg Nk
12/1
12/
012/ )
12cos(]12/[)(
N
nkN N
nknNrkg
N
kk
2
i
nkki nnirkg
0
1cos ))2cos((cos))1cos((2)cos( rrr
))2sin((cos))1sin((2)sin( rrr
i
n
i
nkkkkk nnirnnir
0 0
))1cos((][coscos][2
21
1
cos21
cos
)(
),(
zz
z
zr
zkg
k
k
k
21cos21
sin
)(
),(
zzzs
zkh
k
k
k
21
1
cos21
cos
)(
),(
zz
z
zr
zng
n
n
n
21cos21
sin
)(
),(
zzzs
znh
n
n
n
The Core-Type DFT Architecture
Computation Sharing Common multiplier usage for cos θK
Register Splitting Reduce critical path Notations 0 and 1: delay elements With the minimum critical path: 10 With the fewest registers: 10 : one-bit left shit : the forward pipeline register
Com. 00 01 10 11
Period 22TTmm+3T+3Taa 22TTmm+3T+3Taa TTmm+2T+2Taa TTmm+2T+2Taa
# of Reg. 22 22 22 22
Opt. NoNo NoNo YesYes YesYes
kykyWnNxWnxky DSTDCT
N
n
nNkN
N
n
knN
12/
0
)1(12/
0
]1[][][
,where
,where
For the DCT, and replace “n” with “N/2 – 1 – n”
Take Z-transform,to get DCT/DST
transformation of the DFT/IDFT by the similar methodology
And apply the Chebyshev polynomials:
The Proposed Recursive DFT formula:
The Proposed Recursive IDFT formula:
The Core-Type IDFT Architecture
0 1 N/2-1 N/2 N/2+1 N-1 0 1 N/2-1
Clock x[n]
N/2 N/2+1 N-1 N-1 N-2 N/2
y[k] 0 1 N/2-1
N/2 N/2+1 N-1
N/2 N/2+1 N-1
N/2 N/2+1 N-1
N/2-1 N/2-2 0 0 1 N/2-1 N/2-1 N/2-2 0x[N-1-n']
x[n'] N-1
0
0
Data Buffer 64 16-bit word
length complex data storage
Control Unit Clock Gated Control Sequence Controller Parameter
Controller 32 PEs32 PEs
Pre-processing for Sk and rk
TWO PEs for DST and DCT
DFT Length(N) 64 64 pointspoints
Input Word Length 16 16 bitsbits
Critical Delay Time 11.47 11.47 ns(Over 80 MHz)ns(Over 80 MHz)
Active Chip Area 318 318 um x 414 umum x 414 um
Process Technology TSMC 0.18 um CMOSTSMC 0.18 um CMOS
Recursive PE layout for DFT/IDFT architectureFor IEEE 802.11a standard: 3.2 μs
Parameters Second Order DFT/IDFT
V-Y’s Structure [7] (Core Type)
Y-C’s Structure [8] (FFR-DFT)
Proposed Work1 (Core Type)
Proposed Work2 (Folded
Type)
# of Computation Cycles for Each y[k] or x[n]
N N N N/2 N/2
# of Computation Cycles for N-Point DFT/IDFT
N2 N2 N2 N2/2 N
Critical Period Tm+3Ta Tm+2Ta 2Tm+5Ta Tm+2Ta Tm+2Ta
# of Real Multipliers 6 4 6 6 3N
Simulation Results•Lower round of error due to the fewest computation cycle•Full 12 band signal averages•AWGN Channel
]1[][][ nNxWnxns kNk
Assume
i
n
kkkki nnnirkg0
1coscoscos2][)(
1
0
cos))1cos((]1[2cos][2i
nkkkkk nnirir
2
0
1cos]2[]1[cos][i
nkkkkk nniririr
kgkgirir iikkkk 21 )(cos2]1[cos][
1
6 Real Multipliers Critical Period: Tm + 2Ta
6 Real Multipliers Critical Period: Tm + 2Ta
+
PERe
WN-k
PEIm
Input Unit
N/2
N/2
RDFT Unit(1)
RDFT Unit(2)
RDFT Unit(3)
RDFT Unit(N/2)
ClockStartReset
RDFT Unit
x[n]
recursive PEPre-processing
Control Unit
Clock-gated
controller
Sequencecontroller
Parametercontroller
rk[n]x[N-1-n]
x[n]
sk[n]
x[n]
x[N-1-n]
-
Output
Unit
Mode
][krn
ncos2
][ksn
ncos
nsin
1
1z
1z
1z
1z
][nx
N
n)1(
j
1z
1z
1z
1z
][nrk
kcos2
][nsk
kcos
ksin
1
1z
1z
1z
1z
k)1(
][kyj
1z
1z
1z
1z
Z-1Z-1
][nrk
k)1(][kyDCT
kcos2
kcos
0
01z
1z
1z
1z
Z-1
kcos2
][nsk
k)1(
][kyDST
ksin1z
1z
1z
1z
ReceiverTransmitter
IFFT(floating point)
Channel AWGN
FFT(fixed point)
Signal x[t]