Acceleration of Cooley-Tukey algorithm using Maxeler machine

Author: Nemanja Trifunović

Mentor: Professor dr. Veljko Milutinović

Post on 30-Dec-2015

Introduction

● Cooley-Tukey algorithm

  ○ Fast Fourier Transform

  ○ Divide and conquer

  ○ Uses: digital signal processing, telecommunications, analysis of sound signals, …

● Maxeler platform

  ○ Data flow (vs. control flow)

  ○ FPGA

Example of a Fourier transform.

(Source: https://en.wikipedia.org/wiki/File:Rectangular_function.svg; https://en.wikipedia.org/wiki/File:Sinc_function_(normalized).svg. The illustration is published under a Creative Commons license.)

Problem statement

Design and implementation of:

● The fastest possible system for calculating the Fast Fourier Transform on a Maxeler machine.

● A system that outperforms the existing solutions to this problem.

Problem statement

Benefits

● Higher speed of calculation.

● Lower power consumption.

● Lower space consumption.

Conditions

● Huge amounts of data.

• Benefits of calculating the Fast Fourier Transform with Maxeler machines

Conditions and assumptions

● Maxeler machine used:

  ○ Two Maxeler cards of type MAX3424A.

● In the experiments on multiprocessor systems, only one processor core was used.

Overview of existing solutions

● FFT algorithms: Prime-factor, Bruun’s, Rader’s, Winograd, Bluestein’s, …

● The time complexity: O(N log N).

● Performance comparison of publicly available implementations.

○ Matteo Frigo and Steven G. Johnson (from MIT)

Illustration of Matteo Frigo’s and Steven G. Johnson’s experiments. (Source: http://www.fftw.org/speed/Pentium4-3.60GHz-icc)

The proposed solution

● Parallelized radix-2 algorithm.

● Pipeline of depth O(log N), where N is the length of the input sequence.

● Latency is proportional to the depth of the pipeline.

● After the initial delay (latency), one result is produced every cycle.
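
The latency and throughput claims above can be illustrated with a small cycle-count model. This is only a sketch under simplifying assumptions that the slides do not state: exactly one pipeline stage per radix-2 level, one clock cycle per stage, and no data-transfer cost. The function names are hypothetical.

```python
def pipeline_cycles(n_points: int, n_transforms: int) -> int:
    """Cycles needed to produce n_transforms results from a full pipeline.

    Assumption (not from the slides): depth is exactly log2(n_points)
    stages and each stage takes one clock cycle.
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    if n_transforms < 1:
        raise ValueError("n_transforms must be positive")
    depth = n_points.bit_length() - 1  # log2 of a power of two
    # The first result appears after `depth` cycles (the latency);
    # every further result follows one cycle later.
    return depth + (n_transforms - 1)


def cycles_per_transform(n_points: int, n_transforms: int) -> float:
    """Average cycles per result; tends to 1 as n_transforms grows."""
    return pipeline_cycles(n_points, n_transforms) / n_transforms
```

In this model a single 64-point transform pays the full latency of 6 cycles, while a long run of consecutive transforms approaches one cycle per result, so the latency is amortized only over many consecutive calculations.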

Formal analysis

The radix-2 Cooley-Tukey algorithm operates as follows:

1. The input sequence is divided into two subsequences of equal length: the even-indexed elements form the first, and the odd-indexed elements form the second.

2. The DFTs of the two subsequences are calculated, and the DFT of the whole sequence is then calculated from them.
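
The two steps above can be sketched as a short recursive routine. This is a plain software illustration of the textbook radix-2 scheme, not the Maxeler implementation; the function name `fft_recursive` is hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def fft_recursive(x):
    """Radix-2 decimation-in-time FFT of a power-of-two-length sequence."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Step 1: split into even-indexed and odd-indexed subsequences.
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    # Step 2: combine the two half-size DFTs into the full DFT.
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

For example, `fft_recursive([1, 0, 0, 0])` returns four values equal to 1, the spectrum of a unit impulse.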

Formal analysis

A detailed derivation of the following formula is given in the paper.

● The DFT of the even subsequence is denoted by $E_k$,

● the DFT of the odd subsequence is denoted by $O_k$, and

● $e^{-2\pi i k/N}$ is denoted by $W_N^k$.
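
For reference, the standard radix-2 identity that uses these symbols follows from splitting the DFT sum into even-indexed and odd-indexed terms. The names $x_n$ for the input and $X_k$ for the DFT of the whole sequence are assumptions introduced here; the paper's own notation may differ.

```latex
\begin{aligned}
X_k &= \sum_{n=0}^{N-1} x_n \, e^{-2\pi i\, nk/N} \\
    &= \sum_{m=0}^{N/2-1} x_{2m} \, e^{-2\pi i\, mk/(N/2)}
     + e^{-2\pi i\, k/N} \sum_{m=0}^{N/2-1} x_{2m+1} \, e^{-2\pi i\, mk/(N/2)} \\
    &= E_k + W_N^k \, O_k ,
    \qquad k = 0, \dots, N/2 - 1 , \\
X_{k+N/2} &= E_k - W_N^k \, O_k .
\end{aligned}
```

The last line uses the facts that $E_k$ and $O_k$ are periodic in $k$ with period $N/2$ and that $W_N^{k+N/2} = -W_N^k$.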

Illustration of pipelined execution of the radix-2 algorithm.

Measurement and analysis of the performance of the proposed implementation

Types of performed experiments

● Calculation of the Fourier transform of 100, 1,000, 10,000, 1,000,000 and 10,000,000 consecutive input sequences of length 8, 16, 32 and 64 points.

● Maxeler implementation vs. reference CPU implementation

● Maxeler implementation vs. best publicly available implementations

Generated graphs:

● Maxeler vs best publicly available implementations of FFT algorithm.

● Run times, depending on the number of consecutive FFT calculations (for input sequences of length 8, 16, 32 and 64).

● Acceleration obtained using the Maxeler machine, compared to CPU execution, depending on the number of consecutive FFT calculations (for input sequences of length 8, 16, 32 and 64).

The average execution time, in seconds, of publicly available algorithms for calculating the FFT on different architectures, for an input sequence of 8 elements.

Acceleration of the Maxeler implementation compared to the CPU implementation, depending on the number of elements in the input sequence.

Computation time of consecutive fast Fourier transforms expressed in seconds depending on the number of consecutive calculations.

Acceleration of Maxeler implementation compared to CPU implementation depending on the number of consecutive calculations.

Analysis of scalability and bottlenecks of proposed solution

● Transfer of data to and from the Maxeler card

● Limited number of hardware resources on a single Maxeler card

● Limited number of Maxeler cards

Analysis of implementation

The Maxeler implementation of the Cooley-Tukey algorithm consists of:

1. Rearrangement of the input sequence in bit-reversed order, and

2. The radix-2 algorithm.
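
The two parts listed above can be sketched in software as a bit-reversal permutation followed by log2(N) levels of radix-2 butterflies. This is the textbook iterative formulation, given here as an assumption about the overall structure rather than as the actual kernel code; all function names are hypothetical.

```python
import cmath


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i, e.g. 0b001 -> 0b100 for bits=3."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_iterative(x):
    """Iterative radix-2 FFT: bit-reversed reordering, then butterfly levels."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Part 1: rearrange the input in bit-reversed order.
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    # Part 2: one butterfly level per power of two (log2(n) levels in total).
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= step
        size *= 2
    return a
```

For N = 8 the bit-reversed index order is 0, 4, 2, 6, 1, 5, 3, 7. After this reordering every level reads and writes neighbouring blocks in place, which is what makes a fixed chain of levels straightforward to lay out as a pipeline.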

Illustration of the kernel.

Implementation details

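● Two input and two output streams

● These streams are of type arrayType:

    // dfeFloat(8, 24): 8 exponent bits, 24 mantissa bits (single precision)
    DFEType floatType = dfeFloat(8, 24);
    // each stream element is an array of n such values
    DFEArrayType<DFEVar> arrayType =
        new DFEArrayType<DFEVar>(floatType, n);

● The factors $W_N^k$ are not calculated on the Maxeler machine

● Parameters:

  ○ N

  ○ first_level

  ○ last_level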
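
Since the factors are not computed on the Maxeler machine, one plausible arrangement is to precompute them in software and supply them to the kernel. The slides do not say how this is done, so the following is only a sketch of such a host-side table. The function names are hypothetical, and splitting the table into real and imaginary parts is an assumption.

```python
import cmath


def twiddle_table(n: int):
    """Return W_N^k = exp(-2*pi*i*k/N) for k = 0 .. N/2 - 1."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]


def split_real_imag(table):
    """Split complex factors into two float lists (assumed transfer format)."""
    return [w.real for w in table], [w.imag for w in table]
```

Only N/2 factors are needed because $W_N^{k+N/2} = -W_N^k$, so the second half of the table is the negation of the first.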

Conclusion

➔ It is shown that the proposed solution has the expected performance and that it works correctly.

➔ The performance of the proposed solution is better than the performance of any publicly available implementation of the Fast Fourier Transform.

➔ To achieve these speedups, the Fast Fourier Transform has to be calculated for many consecutive input sequences.

Q/A

Thank you for your attention