
Customizing DSP algorithms does not always mean speed

A look at DFT / FFT issues

Frequency domain version of Lab. 1 FIR operations.

M. R. Smith, Electrical and Computer Engineering,
University of Calgary, Alberta, Canada
[email protected]


04/19/23ENCM515 -- Custom DSP -- not necessarily speed

Copyright [email protected] 2 / 37

Overview

Introduction
Industrial Example of DFT/FFT
  DFT -- FFT Theory
  Straight application
  Proper application
  “The KNOW-WHEN” application
Future Talks
  The implications on DSP processor architecture
  How are actual DSP processors optimized for FFT operations?


References

Work originally done for “Beta Monitors”, Calgary
Talk first given to AMD FAE Meeting, Santa Clara
Published in Microprocessors and Microsystems: “FFT - fRISCy Fourier Transforms”
Copy made available on the ENCM515 web-site


Testing and using DSP Algorithms

Typical testing pattern -- use something simple
  Simple test of algorithm correctness
  Time signal = sum of sinusoids
  In test: expect, and get, sharp peaks in spectrum

Algorithms used in my research
  DFT -- Discrete Fourier Transform
  FFT -- Fast Fourier Transform
  ARMA -- Autoregressive Moving Average
  Wavelet


Testing and using DSP Algorithms

Typical testing pattern
  Simple test of algorithm correctness
  Time signal = sum of sinusoids
  In test: expect, and get, sharp peaks in spectrum

IN REAL LIFE -- this is not a valid test, as the following example shows. Many people working in the field don’t get the best out of their algorithms because they don’t realize that.

DFT -- Discrete Fourier Transform
  Implemented directly: Order(N x N) operations
  Implemented by FFT: Order(N x log2 N) operations


Industrial Example -- Equipment


Industrial Problem -- Result


Planned Solution -- Theory

Unwanted “noise” on a data set can be removed if the “noise” has particular frequency characteristics
Improvement is obtained by
  transforming to the frequency domain,
  cutting out (filtering) the unwanted “noise”, and then
  inverse transforming to recover the original data form
Actually faster to operate in the frequency domain than the time domain (you can show the algorithms to be equivalent)
Frequency domain -- more memory needed


Planned Solution -- Visual Model


What algorithm could be used?

Time domain filtering
  40 -- 300 tap FIR -- same as in Lab. 1, 2 and 3
  N = size of the data (1000+ -- infinite)
  Complexity Order(N x Tap Length)
    1024 * 300 = about 300,000 operations

Frequency domain filtering
  N-sized DFT
  Complexity (forward plus inverse transform, N = 1024)
    Direct: Order(2 * N * N) = about 2,000,000 operations
    FFT: Order(2 * (N log2 N)) = about 20,000 operations
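The rounded figures above can be reproduced with a few lines of C. This is a sketch of the arithmetic only, assuming one "operation" per multiply-accumulate, a base-2 logarithm, and a factor of 2 for the forward plus inverse transform; the function names are illustrative.

```c
/* Time-domain FIR: one multiply-accumulate per tap per data point. */
unsigned long fir_ops(unsigned long n, unsigned long taps)
{
    return n * taps;
}

/* Direct DFT, forward plus inverse: 2 * N * N. */
unsigned long dft_pair_ops(unsigned long n)
{
    return 2UL * n * n;
}

/* FFT, forward plus inverse: 2 * N * log2(N), N assumed a power of two. */
unsigned long fft_pair_ops(unsigned long n)
{
    unsigned long log2n = 0;
    while ((1UL << (log2n + 1)) <= n) log2n++;   /* floor(log2(n)) */
    return 2UL * n * log2n;
}
```

For N = 1024 and 300 taps these give 307,200, 2,097,152 and 20,480 operations respectively.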


Direct DFT and FFT

Time savings -- Number of complex multiplications compared for DFT and FFT

N       DIRECT (DFT)    Radix 2 (FFT)    %Change
4       16              4                400%
32      1024            80               1300%
128     16384           448              3700%
1024    1048576         5120             20480%

Key issue -- How can you handle the memory accesses and operations associated with the complex multiplications of data and Fourier Coefficients? -- Data/Instruction Fetch Conflicts


Fast DFT algorithm implementation

DFT -- requires Order(N ^ 2) operations

FFT -- Divide and Conquer principle
  An N pt DFT can be decimated into two N/2 pt DFTs plus “some twiddling on N terms”
  Then each N/2 pt DFT becomes two N/4 pt DFTs “plus twiddling”
  Then each N/4 pt DFT becomes two N/8 pt DFTs, etc.
  Order(N x log N) PROVIDED you can handle bit reverse addressing efficiently. This is a crazy FFT addressing issue that must be handled when you store the data after doing the FFT algorithm.


FFT -- divide and conquer

Ability to do “complex” BUTTERFLY quickly is needed!


Bit reverse addressing ability -- KEY

INPUT ADDRESS    OUTPUT ADDRESS    NEED
000              000               000
100              001               100
010              010               010
110              011               110
001              100               001
101              101               101
011              110               011
111              111               111

Placing the array into the correct memory locations takes “time”
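In software, without hardware bit-reverse addressing, the reordering in the table above can be sketched like this. The function names are illustrative; a DSP processor with a bit-reversed addressing mode would avoid this explicit pass, which is the "time" cost the slide refers to.

```c
#include <stddef.h>

/* Reverse the lowest `bits` bits of index i (for 3 bits: 001 -> 100). */
size_t bit_reverse(size_t i, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* Reorder a complex array (separate real and imaginary parts) in place.
   Swapping only when j > i moves each pair exactly once. */
void bit_reverse_permute(double *x, double *y, size_t N, unsigned bits)
{
    for (size_t i = 0; i < N; i++) {
        size_t j = bit_reverse(i, bits);
        if (j > i) {
            double t = x[i]; x[i] = x[j]; x[j] = t;
            t = y[i]; y[i] = y[j]; y[j] = t;
        }
    }
}
```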


Algorithm -- Different forms

x, y = real/imaginary parts of the input -- one fetched on the J-Bus, the other on the K-Bus?
wr, wi = precalculated cosine/sine values -- J-Bus and K-Bus?
m = log2(N) where N is the number of points (power of 2)

n2 = N;
for (k = 0; k < m; k++) {                /* Outer loop: one pass per stage */
    n1 = n2;
    n2 = n2 / 2;
    ie = N / n1;                         /* Twiddle index step for this stage */
    ia = 0;                              /* 0-based index; assumes wr[k] = cos(2*PI*k/N), wi[k] = sin(2*PI*k/N) */
    for (j = 0; j < n2; j++) {           /* Middle loop */
        c = wr[ia];
        s = wi[ia];
        ia += ie;
        for (i = j; i < N; i += n1) {    /* Inner loop */
            l = i + n2;                  /* BUTTERFLY offset */
            xt = x[i] - x[l];            /* Common */
            yt = y[i] - y[l];
            x[i] += x[l];                /* Upper */
            y[i] += y[l];
            x[l] = c * xt + s * yt;      /* Lower */
            y[l] = c * yt - s * xt;
        }
    }
}


What processors can be used?

CISC
  Complex instruction set processor
  Basic and complex functions
  Control logic requires much real estate
  Many cycle instructions

DSP
  Digital signal processing chip
  Specifically designed for DSP
  Specialized resources provided
  Dual cycle instructions (many now one)

RISC
  Reduced instruction set processor
  Simple instructions done well
  Instructions complete in single cycle
  Intelligent compiler needed


Real life application of Theory

Take 360 data points
Pad to 512 with zeros to size of algorithm
  Everybody “knows” FFT is faster when you use “power of two” points
Use standard FFT algorithm
Zero unwanted “noise” components
Use standard inverse FFT
Transform “Angle” measurement to “Volume”
Area inside the hysteresis loop is associated with compressor efficiency


Frequency domain -- filtering

Distortions associated with “edge effects” mean that the frequency domain signal is not clean.
Last point and first point of data -- connected in discrete domain
“Cut” will remove more than just “resonance” components


Time Domain Result

Channel resonance -- old problem greatly reduced
New distortions evident at edges of data


Real Life versus Theory

Perfect data
  infinitely long
  perfectly sampled

Actual data
  Nyquist must be met (sample fast enough to cover signal and noise characteristics)
  finite length of the data manipulated

Can be analysed using Fourier Theory by treating as an infinitely long signal multiplied by a square window


Signal Characteristics -- Time/Frequency

[Figure: magnitude plot]


Windowing -- implied and deliberate

Windowing the data in the “TIME” domain spreads the “SPECTRUM”

MAIN LOBE -- width of main lobe determines resolution: how close two similar-sized peaks can be placed and still be separated

SIDE LOBES -- height of side lobes determines how close a small peak can be placed to a large peak and still be believed to be a “true peak” and not a “false” peak (side lobe)

Choose a window with the narrowest main lobe and smallest side lobes

MRI, seismic, telecommunications all have similar problems

This form of data distortion is often missed by naive users

KEY REFERENCE -- HARRIS -- Proc. IEEE, vol. 66, p. 51, 1978


Windowing occurs -- when?

ALL DATA ANYBODY GATHERS is always windowed
  NO EXCEPTIONS -- finite length in either time or frequency domain

DFT (and many other algorithms) treat data AS CYCLIC
  No problems if CYCLIC model results in continuous data across the cycles (Nth order continuity is needed -- amplitude continuous, slope continuous, 2nd derivative continuous)
  Discontinuities in data cause BIG problems in frequency domain -- in particular padding with zeros in order to use any DFT algorithm

Some diseases in magnetic resonance imaging (MRI) are mimicked by discontinuity artifacts


How to fix?

Choose a better window
Naturally window
  Take data in a way that the data goes more smoothly to zero at the ends, so that it meets the Nth order continuity requirements
Synchronously sample
  Very special case -- and possible for this data set
Use a different DSP algorithm approach
  Not always stable -- MA, AR, ARMA, Burg, wavelet etc.


Windows

W(m) = a0 + a1 cos(2 PI m / N) + a2 cos(4 PI m / N), 0 <= m < N

BEWARE: indexing with -N/2 <= m < N/2 flips the sign of a1

1) Normal (Rect. window): a0 = 1, a1 = 0, a2 = 0
   Rectangular window in time becomes a sinc function (with side lobes) in frequency
2) Simple (Rect + 2 sinusoids): a0 = 0.54, a1 = -0.46, a2 = 0
   Becomes the rectangular window's sinc function + two shifted sinc functions. Adjust position and amplitude to compensate for errors
3) Blackman-Harris 3 term -- optimized: a0 = 0.44959, a1 = -0.49364, a2 = 0.05677

[Figure: windows 1, 2 and 3]


Windowing -- 2 cycles

Remember to “window”, NOT cut out, the channel resonance in the Frequency Domain too!


Natural Window in time domain

1. Rearrange the way you sample so that data “naturally goes to same DC level” near ends

2. Remove DC offset then pad with zeros

Resolution between peaks in the frequency domain is a function of data length.

This example uses 2.5 cycles of the original data sequence
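Step 2 (remove the DC offset, then pad with zeros) can be sketched as below. Using the record mean as the "DC level" is an assumption made for this sketch, as is the function name; the talk does not say how the offset was estimated.

```c
#include <stddef.h>

/* Copy n_in samples into out[0..n_out-1] with the mean (taken here as
   the DC level) removed, then fill the remainder with zeros.
   Assumes n_out >= n_in > 0. Returns the DC level that was removed. */
double remove_dc_and_pad(const double *in, size_t n_in,
                         double *out, size_t n_out)
{
    double dc = 0.0;
    for (size_t n = 0; n < n_in; n++)
        dc += in[n];
    dc /= (double)n_in;

    for (size_t n = 0; n < n_in; n++)
        out[n] = in[n] - dc;       /* data now centred on zero */
    for (size_t n = n_in; n < n_out; n++)
        out[n] = 0.0;              /* padding joins on at the same level */
    return dc;
}
```

Removing the offset first means the padded zeros do not introduce an extra step at the join, although any mismatch in slope remains.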


Naturally window -- Match ends at “DC”

Not always possible with “real data”
Advantage -- no data distortion occurring when window gets applied. Actually does occur, but is hidden -- see later


Naturally windowed – frequency domain


Naturally windowed – time domain


Synchronously Sample the Data

As an engineer, you have to be able to reach back into your “ENCM and ENEL theory” and recognize when this sort of thing is possible and correct!

Not a solution for most data sets
There must be a “TRUE”, exact, cyclic property present in the original data set.
Algorithm must be applied “exactly correctly”
Windowing is still there!
All the windowing distortions are still present -- BUT!!!!!!


Synchronously Sample -- Time/Frequency

SAMPLED AT “ZEROS” IN WINDOW’S SPECTRUM

Have an “exact” number of cycles in the window


Synchronously Sample – Frequency domain


Synchronously Sample – Time domain


Synchronously Sample

Not possible for most situations
There is a “TRUE” cyclic property present in data
Don’t pad with zeros -- use 720 pt DFT
This industrial example
  360 points round the cycle
  Would a specialized FFT algorithm improve things? (2 x 2 x 3 x 3 x 2 x 5) -- speed much improved
  Implemented directly using a specialized 720 point DFT
  Customer satisfied with integer implementation on Z80
There are custom versions of FFT available for TigerSHARC
  Very highly parallel -- C:\ProgramFiles\Analog4.5\TS\Examples
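A specialized (mixed-radix) FFT is possible when the transform length factors into small primes. The factor list quoted above multiplies out to 360; a 720-point transform has one further factor of 2. The helper below is an illustrative sketch for checking a candidate length, not part of the original implementation.

```c
#include <stddef.h>

/* Factor n into primes, smallest first, writing at most `max` factors
   into out[]. Returns the number of factors written. */
size_t small_factors(unsigned n, unsigned *out, size_t max)
{
    size_t count = 0;
    unsigned p = 2;
    while (n > 1 && count < max) {
        if (n % p == 0) {
            out[count++] = p;   /* p divides n: record it and reduce n */
            n /= p;
        } else {
            p++;                /* try the next candidate divisor */
        }
    }
    return count;
}
```

Both 360 and 720 factor entirely into 2s, 3s and 5s, so each can be built from radix-2, radix-3 and radix-5 stages without padding to a power of two.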


This sort of customization -- NOT NORMALLY POSSIBLE

What are the characteristics of general DSP algorithms?

What needs to be present on a processor to meet those requirements?

Covered in earlier lecture
See IEEE Micro Magazine, Dec. 1992, “How RISCy is DSP”


Overview

Introduction
Industrial Example of DFT/FFT
  DFT -- FFT Theory
  Straight application
  Proper application
  “The KNOW-WHEN” application
Future talks
  The implications on DSP processor architecture
  How are actual DSP processors optimized for FFT operations?