introduction to fft processors

51
Introduction to FFT Processors Chih-Wei Liu VLSI Signal Processing Lab Department of Electronics Engineering National Chiao-Tung University

Upload: others

Post on 19-Mar-2022

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to FFT Processors

Introduction to FFT Processors

Chih-Wei LiuVLSI Signal Processing LabDepartment of Electronics EngineeringNational Chiao-Tung University

Page 2: Introduction to FFT Processors

FFT DesignFFT

• Consists of a series of complex additions and complex multiplications

Algorithm• Cooley-Tukey decomposition for power of two

length FFT

Architecture• Systematic mapping procedure

Page 3: Introduction to FFT Processors

Algorithm LevelCooley-Tukey decomposition

Radix-2, decimation-in-frequency

12/

02/2/

1

0

)12(12

12/

02/2/

1

0

22

)(

)(

N

n

nkN

nNNnn

N

n

knNnk

N

n

nkNNnn

N

n

knNnk

WWxxWxA

WxxWxA

Variants based on CT algorithmFixed radix: Radix-2, Radix-4, Radix-8, Radix-22

Mixed radix: Split-radix, Radix-2/8, Radix-2/4/8Number of addition

• Same for any mixed-radix or fixed-radix algorithm.Number of multiplication

• Depends on the reduction of trivial multiplications.

WNn

A2k+1

A2k

xn+N/2

xn

-1

Hence, increase additions

Page 4: Introduction to FFT Processors

FFT AlgorithmsReview of Radix-2r algorithm

DIF(decimation in frequency) and DIT(decimation in time) versionRadix-2 algorithmRadix-4 and Radix-22 algorithmRadix-8 and Radix-23 algorithmSplit-radix 2/4 and Split-radix 2/8

Page 5: Introduction to FFT Processors

FFT Algorithms

1

0

1

0

2

)()()(N

n

knN

N

n

knN

jWnxenxkX

.1,...1,0, Nk

4/3NNW

8/7 NNW

0NW

8/NNW

4/NNW

8/3NNW

2/NNW

8/5NNW

)]()[(22*)(

)]()[(22*)(

)1(22

)1(22

1

38

18

8/78/3

8/58/

4/34/

2/0

abjabWjba

abjbaWjba

jWW

jWW

jWW

WW

NN

NN

NN

NN

NN

NN

NNN

DFT

Page 6: Introduction to FFT Processors

FFT AlgorithmsRadix-2 Algorithm

DIF Radix-2 Algorithm

12/

0 2

12/

0 2

)]2/()([)12(

)]2/()([)2(

N

n

nkN

nNl

N

n

nkNl

l

l

WWNnxnxkX

WNnxnxkX

.12/,,1,0 Nkl

Butterfly of Radix-2 Algorithm

DIF Form

Page 7: Introduction to FFT Processors

FFT AlgorithmsRadix-4 Algorithm

Radix-22 Algorithm

14

N

0k

nk

4N

nlN

l34

l24

l41

1WWW4N3nxW

2NnxW

4Nnxnxlk4X ])()()()([)(

112112`1

112121212

4

)2(14/

0

14

0 4

)2(364

244

24

121

)]}4/3()1()4/([)()1()]2/()1()({[

])4/3()2/()4/()([

)24(

nkN

llnN

lllN

n

l

N

k

nkN

llnN

llllll

WWNnxNnxjNnxnx

WWWNnxWNnxWNnxnx

llkX

;3,2,1,0l

;1,0, 21 ll

;14/~01 Nk

.14/~01 Nk

Page 8: Introduction to FFT Processors

FFT Algorithms

Butterfly of Radix-4 Algorithm

x(n)

x(n+N/4)

x(n+N/2)

x(n+3N/4)

a(n)

a(n+N/4)

a(n+N/2)

a(n+3N/4)

WN

0 n

WN

2 n

WN

1 n

WN

3 n

l = 0

l = 1

l = 2

l = 3

(Data Ordering: Digit Reversed)

Page 9: Introduction to FFT Processors

k 1k 0

0

1

2

3

0123012301230123

X(0)

x( , )k1 k0

X( , )k0 k1

X(4)X(8)X(12)X(1)X(5)X(9)X(13)X(2)X(6)X(10)X(14)X(3)X(7)X(11)X(15)

FFT Algorithms

Data Ordering of Radix-4 (N=16)

00 0001 0010 0011 00

00 0000 0100 1000 11…

……

……

……

..

……

……

……

….

0k 1k

Digit-reversed ordering

Page 10: Introduction to FFT Processors

x(n)

x(n+N/4)

x(n+N/2)

x(n+3N/4)

a(n)

a(n+N/4)

a(n+N/2)

a(n+3N/4)

l =01l =02

l =01

l =11

l =11

l =12

l =02

l =12

WN

0 n

WN

2 n

WN

1 n

WN

3 n

W4

1

FFT Algorithms

Butterfly of radix-22 Algorithm

(Data Ordering: Bit Reversed)

Page 11: Introduction to FFT Processors

FFT Algorithms

0000100001001100…

……

……

……

0000000100100011…

……

……

……

…0

0

0

0

0

0

0

01

1

1

1

1

1

1

1

1

1

1

1

0

0

0

0

0

0

0

1

1

1

)( 0123 kkkkx

3k 2k 1k)( 3210 kkkkX

X(0)

X(8)

X(12)X(4)

0k

X(2)X(10)

8421

X(6)X(14)X(1)X(9)

X(13)X(5)

X(3)X(11)

X(15)X(7)

Data Ordering of Radix- (N=16)22

Bit-reversed ordering

Page 12: Introduction to FFT Processors

nkN

nlN

llll

N

n

lll

m

nkN

nlN

lmN

n

m

N

n

nmNlkN

N

n

nlkN

WWWWNnxWNnxWNnxNnx

WNnxWNnxWNnxnx

WWWmNnx

WmNnxWnxlkX

8/842

44

18/

04

244

7

0 88

18/

0

7

0

18/

0

)8/)(8(1

0

)8(

}])8

7()8

5()8

3()8

([

])8

6()8

4()8

2()({[

])8

([

)8

()()8(

FFT AlgorithmsDIF Radix-8 Algorithm

;7,6,5,4,3,2,1,0l .18/~0 Nk

Page 13: Introduction to FFT Processors

nkN

lllnN

llllll

N

n

llll

nkN

nlN

llll

N

n

lll

WWWNnxWNnxWWNnxWNnx

NnxWNnxWWNnxWnx

WWWWNnxWNnxWNnxNnx

WNnxWNnxWNnxnx

8/)24(2

82422

18/

02422

8/842

44

18/

04

244

123121121

1121

}))]8

7()8

3(())8

5()8

([(

))]8

6()8

2(())8

4()({[(

}])8

7()8

5()8

3()8

([

])8

6()8

4()8

2()({[

FFT AlgorithmsDIF Radix-23 Algorithm

)248( 123 lllkX

;1,0,, 321 lll .18/~0 Nk

Page 14: Introduction to FFT Processors

FFT Algorithms

Butterfly of Radix-8 Algorithm

x ( n + 4 N /8 )

x ( n )

x ( n + N /8 )

x ( n + 2 N /8 )

x ( n + 3 N /8 )

x ( n + 5 N /8 )

x ( n + 6 N /8 )

x ( n + 7 N /8 )

l= 0

l= 1

l= 2

l= 3

l= 4

l= 5

l= 6

l= 7

W N

0 n

W N

1 n

W N

2 n

W N

3 n

W N

4 n

W N

5 n

W N

6 n

W N

7 n

Page 15: Introduction to FFT Processors

l = 01

l = 01

l = 01

l = 01

l = 11

l = 11

l = 11

l = 11

l = 02

l = 12

l = 02

l = 12

l = 02

l = 12

l = 02

l = 12

l = 03

l = 13

l = 03

l = 13

l = 03

l = 13

l = 03

l = 13

W N

0 n

W N

4 n

W N

2 n

W N

6 n

W N

1 n

W N

5 n

W N

3 n

W

x(n)

x(n+N/8)

x(n+2N/8)

x(n+3N/8)

x(n+4N/8)

x(n+5N/8)

x(n+6N/8)

x(n+7N/8)N

7n

W 4

1

W 4

1

W 8

0

W 8

2

W 8

1

W 8

3

FFT Algorithms

Butterfly of Radix-23 Algorithm

Page 16: Introduction to FFT Processors

nkN

nN

N

n

nkN

nN

N

n

nkN

N

n

WWNnxNnxjNnxnxkX

WWNnxNnxjNnxnxkX

WNnxnxkX

4314/

0

414/

0

2

12/

0

)]}4

3()4

([)4

2()({)34(

)]}4

3()4

([)4

2()({)14(

])4

2()([)2(

FFT Algorithms

DIF Split-Radix 2/4 Algorithm

k in X(2k) is from 0 to N/2-1, and in X(4k+1) and X(4k+3) are from 0 to N/4-1

Page 17: Introduction to FFT Processors

FFT Algorithms

Butterfly of Split-Radix 2/4 Algorithm

W 4

1

W N

n

W N

3n

x (n )

x (n+N /4)

x(n+2N /4)

x (n+3N /4)

Page 18: Introduction to FFT Processors

FFT AlgorithmsAdvantage of Radix-2/4 Algorithm

Low Computational ComplexityFlexible as radix-2 algorithmBit reversed output (when normally ordered input)

Page 19: Introduction to FFT Processors

nkN

nlN

llll

N

n

lll

nkN

N

n

WWWWNnxWNnxWNnxNnx

WNnxWNnxWNnxnxlkX

WNnxnxkX

8/842

44

18/

04

244

212/

0

}])8

7()8

5()8

3()8

([

])8

6()8

4()8

2()({[)8(

])4

2()([)2(

FFT Algorithms

DIF Split-Radix 2/8 Algorithm

7,5,3,1l

Page 20: Introduction to FFT Processors

x(n)

x(n+N /8)

x(n+2N /8)

x(n+3N /8)

x(n+4N /8)

x(n+5N /8)

x(n+6N /8)

x(n+7N /8)

-j

-j

W81

W83

FFT Algorithms

Butterfly of Split-Radix 2/8 Algorithm

Page 21: Introduction to FFT Processors

Multiplicative ComplexityTrivial multiplications in FFT

Multiplied by• Radix-2: ±1 removed• Radix-4: ±1 and ±j (partially) removed• Split-radix(2/4): ±1 and ±j removed• Radix-8: ±1, ±j, (1±j)/2 (partially) removed• Radix-2/8: ±1, ±j, (1±j)/2 removed

Page 22: Introduction to FFT Processors

Radix-4 Signal Flow Graph

Page 23: Introduction to FFT Processors

Split-Radix Signal Flow Graph

Page 24: Introduction to FFT Processors

Multiplicative ComplexityN Radix-2 Radix-4 Split-

RadixRadix-8 Const.

MulRadix-2/8

Const. Mul

8 2 3 2 0 2 0 2 16 10 8 8 6 4 4 6 32 34 31 26 20 8 16 14 64 98 76 72 48 32 44 38

128 258 215 186 152 64 120 94

256 642 492 456 376 128 308 214 512 1538 1239 1082 824 384 736 494

1024 3586 2732 2504 2104 768 1724 1126 2048 8194 6487 5690 4792 1536 3976 2494 4096 18434 13996 12744 10168 4096 8964 5494 8192 40962 32087 28218 23992 8192 19952 12046

How to obtain regular SR FFT architecture?

Page 25: Introduction to FFT Processors

Architecture LevelMapping procedure

Systolic array techniques• Operation scheduling, resource sharing

Pipeline architecture• One-dimensional linear array• Delay-feedback vs. Delay-commutator.

Single PE architecture• Shared-memory, Single Processing Element (PE)

Page 26: Introduction to FFT Processors

0

16w4

16w

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

Z-8 Z-4

Z-4A

Z-2

Z-2

SW1 SW2Z-1

Z-1

SW3

Stage 1 Stage 2 Stage 3 Stage 4

VerticalProjection

x(0)

x(1)

x(2)

x(3)

x(4)

x(5)

x(6)

x(7)

x(8)

x(9)

x(10)

x(11)

x(12)

x(13)

x(14)

x(15)

X(0)

X(8)

X(4)

X(12)

X(2)

X(10)

X(6)

X(14)

X(1)

X(9)

X(5)

X(13)

X(3)

X(11)

X(7)

X(15)

016w116w216w316w416w516w616w716w

016w216w416w616w

016w216w416w616w

0

16w4

16w

0

16w4

16w

0

16w4

16w

R2MDC Radix-2 Multi-Path Delay Commutator

Delay Commutator orDelay-Switch-Delay

Page 27: Introduction to FFT Processors

1st and 2nd stages in R2MDC (N=16)

)( nx)(1 nx

B

C

F

G

D

E

H

I

Z -8 Z -4

Z -4

A

SW1

Stage 1 Stage 2

7 6 5 4 3 2 1 015 14 13 12 11 10 9 8

3 2 1 0

11 10 9 8

7 6 5 415 14 13 12

(1)(2)

Input pairs : N/2

Stage 1 Stage 2

(3)

Input pairs : N/4

(4)

(5)

(6)

(7)

Input pairs : N/8

(8)

Input pairs : N/16

(9)

(10)

(11)

(12)

(13)

(14)

(15)

Stage 3 Stage 4

0

16w4

16w

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

-

x(0)

x(1)

x(2)

x(3)

x(4)

x(5)

x(6)

x(7)

x(8)

x(9)

x (10)

x (11)

x (12)

x (13)

x (14)

x (15)

X(0)

X(8)

X(4)

X(12)

X(2)

X(10)

X(6)

X(14)

X(1)

X(9)

X(5)

X(13)

X(3)

X(11)

X(7)

X(15)

016w116w216w316w416w516w616w716w

016w216w416w616w

016w216w416w616w

0

16w

4

16w

0

16w4

16w

016w

4

16w

8 3 2 1

12 11 10 9

7 6 515 14 13

0

49 8 3 2

13 12 11 10

7 615 14

1 0

5 411 10 9 8

15 14 13 12

3 2 1 0

7 6 5 4

Page 28: Introduction to FFT Processors

R4MDC Radix-4 Multi-Path Delay Commutator

0

0

00

0

0

0

0

0

0

00

0

0

00

0

0

0

00

1

2

30

2

4

6

0

3

6

9

x(0)

x(1)

x(2)

x(3)

x(4)

x(5)

x(6)

x(7)

x(8)

x(9)

x (10)

x (11)

x (12)

x (13)

x (14)

x (15)

X(0)

X(4)

X(8)

X(12)

X(1)

X(5)

X(9)

X(13)

X(2)

X(6)

X(10)

X(14)

X(3)

X(7)

X(11)

X(15)

Stage 1 Stage 2

12

8

4

BF4

3

2

1

COMMUTATOR

1

2

3

BF4

Coefficients Coefficients

COMMUTATOR

Stage 1 Stage 2

Inputs

A B

ControlControl

Page 29: Introduction to FFT Processors

RrMDC

Input stage

k th stage

stagesNr

r 1

stagesNr

r 2

Nr1

Computational Element

InputOutputs

Coefficients

1

1

krN

rr

Computational Element

COMMUTATOR

12

krN

rr

1

1kr

Nr

krN

rr 1

krN

rr 2

krN

r1

Outputsfrom

previousstage

Tonextstage

Coefficients

Commutator Control

(a)

(b)

Page 30: Introduction to FFT Processors

Delay Feedback R2SDF

R4SDF

R22SDF

Page 31: Introduction to FFT Processors

R2SDF(N=16) Radix-2 Single-Path Delay Feedback

Page 32: Introduction to FFT Processors

R2SDF (N=16) vs. R4SDF (N=128)

BF4 BF4 BF4 BF4

646464

161616

444

111

Page 33: Introduction to FFT Processors

Buffer Styles of pipeline architecture• R2 delay-commutator: inefficient (50%) MEM

usage. (R2MDC)

• R2 delay-feedback: 100% MEM usage.(R2SDF)

Page 34: Introduction to FFT Processors

single BF_PE radix-2 shared memory architecture

RAM

BF

1

BF_PE

Single PE Architecture

Page 35: Introduction to FFT Processors

Concluding RemarksThe Split-Radix algorithm has less computation complexity, comparing with the fixed Radix algorithm. However, its butterfly operation is irregular (L-shape).The processing speed of pipeline architecture is faster than single-PE architecture. However, the single PE architecture is the most area-efficient, especially for long length FFT/IFFT application.

Page 36: Introduction to FFT Processors

Review Traditional FFT DesignSteps

1. Given N-point FFT spec., choose fixed-radix algorithm2. Design radix-r butterfly, multiplier, etc.3. Cascade logrN stages to compute N point FFT.

Arbitrary radix can be used Base on Cooley-Tukey decomposition for any composite number

Page 37: Introduction to FFT Processors

Problem of Traditional Approach

Cannot drive architecture for mixed-Radix algorithmThe processing speed is no longer the critical issue any more nowadays.The chip area and the power consumption dominate the design quality. Re-configurable FFT/IFFT architecture design is necessary for various applications.

A length-scalable and latency-specified FFT/IFFT core is necessary.

Page 38: Introduction to FFT Processors

Proposed Solution

We implement FFT module by single PE architecture

Radix-rButterfly

Processing ElementReg Reg

Mutiple-portMemory

Pre-fetchbuffer

Page 39: Introduction to FFT Processors

Design IssuePerformance-enough, Chip area, power consumption.Scalable processing element.Limited Storage block(s).Efficient memory address generator.

Page 40: Introduction to FFT Processors

Algorithm Level

We adopt split-radix 2/4 algorithm to realize the FFT module.

kn

N

N

nNnn WXX 2/

12/

02/ )(2kA

kn

Nn

N

N

nNnNnNnn

knN

nN

N

nNnNnNnn

WWXjXXjX

WWXjXXjX

4/3

14/

04/32/4/

4/

14/

04/32/4/

)(

)(

34k

14k

A

A

Page 41: Introduction to FFT Processors

The Kernel of Processing Element

0A

1A

2A

3A

4A

5A

6A

7A

8A

10A

11A

12A

13A

14A

15A

9A

11111111 1

111

1111

11

11

11

11

1

1

1

1

1

1

1

1jjjj

j

j

j

08W1

8W0

8W3

8W

116W

016W

216W3

16W0

16W3

16W6

16W9

16W

jj

0X

1X

2X

3X

4X

5X

6X

7X

8X

10X

11X

12X

13X

14X

15X

9X

Page 42: Introduction to FFT Processors

Folded Butterfly UnitsComparing with Radix-2/Radix-22, it saves half memory access times.

Butterfly unit

Butterfly unit

MuxMux

MuxMux

Feedback path

Page 43: Introduction to FFT Processors

Storage Blocks

We use multiple single-port memory banks to replace the multi-port memory.The concept of conflict-free memory. (Vertex coloring problem)

x(0)

x(1)

x(2)

x(3)

x(4)

x(5)

x(6)

x(7)

0

4

26

3

7

5

1

x(0)

x(3)

x(5)

x(6)

x(1)

x(2)

x(4)

x(7)

Bank0 Bank1

Page 44: Introduction to FFT Processors

Scalable Memory Address GeneratorThere must exist a solution for such vertex coloring problem.The best solution --- The proposed Interleave Rotated Data Allocation (IRDA) algorithm.

RAM-0

Address Switcher (AS)

Address Generator(AG)for 64-length

AT AT AT AT

RAM-1 RAM-2 RAM-3 RAM-0

Rotator

Address Generator(AG)for 16-length

RAM-1 RAM-2 RAM-3

Page 45: Introduction to FFT Processors

The IRDA ConceptA conflict-free memory banks.Simple and length-scalable design.The circular shift rotator.

00 01 02 0307 04 05 0610 11 08 0913 14 15 1219 16 17 1822 23 20 2125 26 27 2428 29 30 3134 35 32 3337 38 39 3640 41 42 4347 44 45 4649 50 51 4852 53 54 5559 56 57 5862 63 60 61

RAM-A RAM-B RAM-C RAM-D

Page 46: Introduction to FFT Processors

Length-Scalable FFT/IFFT Core

Reg

Reg

Reg

Reg

Mux

Mux

Mux

Mux

Radix-2butterfly

processing elementRadix-2butterfly

processing element

Rotator

Mux

Mux

Mux

Mux

Rotator

Reg

Reg

Reg

Reg

Adder Reg

Addressgenerator

RAM-D

RAM-C

RAM-B

RAM-A

Page 47: Introduction to FFT Processors

Further Performance Improvement

Multiple PEs architecture.2 pipeline PEs, for example.

RAM 0 RAM 1 RAM 2 RAM 3

00 02 04 06

14 08 10 12

20 22 16 18

26 28 30 24

RAM 4 RAM 5 RAM 6 RAM 7

01 03 05 07

15 09 11 13

21 23 17 19

27 29 31 25

Group1 Group2

Page 48: Introduction to FFT Processors

The Cached-FFT Algorithm

Page 49: Introduction to FFT Processors

Overview1. Input data are loaded into an N-word main memory.2. C of the N words are loaded into the cache.3. As many butterflies as possible are computed using the data

in the cache.4. Processed data in the cache are flushed to main memory.5. Steps 2-4 are repeated until all N words have been processed

once.6. Steps 2-5 are repeated until the FFT has been completed.

Processor cache Main Memory

Page 50: Introduction to FFT Processors

Result 0x

15x7x11x3x13x5x9x1x14x6x10x2x

12x4x8x

0X

1X

2X

3X

4X

5X

6X

7X

8X

9X

10X

11X

12X

13X

14X

15X

W

W

W

W

W

W

W

W

WW

WW

WW

W

W

W

WWW

WWWW

WWWWWWWW

Page 51: Introduction to FFT Processors

N=64, E=2, Radix-2 Cached-FFT