implementing the fft on gpus tips & trickscomputer-graphics.se/multicore/pdf/2013/12a fft on...
TRANSCRIPT
![Page 1: IMPLEMENTING THE FFT ON GPUs TIPS & TRICKScomputer-graphics.se/multicore/pdf/2013/12a FFT on GPU.pdf · Presentación de PowerPoint Author: IBM APTIVA Created Date: 12/6/2013 2:45:18](https://reader033.vdocuments.net/reader033/viewer/2022051902/5ff1ff7bddde8040e9615117/html5/thumbnails/1.jpg)
LINKÖPING UNIVERSITY
Linköping, 2013
IMPLEMENTING THE FFT ON GPUs
TIPS & TRICKS
Department of Electrical Engineering
Mario Garrido Gálvez
MARIO GARRIDO
Associate Professor at ES, ISY
PhD in Electrical Engineering (Spain).
Research background:
Optimized implementation of signal processing algorithms.
Transforms (FFT, STFT,…), statistical operations
(regressions, median filter,…).
Data management (matrix transposition, interleavers,…).
Hardware designer (FPGAs, ASICs,…).
A STORY ABOUT GPUs
< 2011
Mainly FFTs on FPGAs.
Hundreds of papers on the topic since the 1970s.
Isn't everything done already???
OPTIMIZE
OPTIMIZE
OPTIMIZE
DFT / FFT
Discrete Fourier Transform / Fast Fourier Transform.
The most widely used algorithm in signal processing:
- Audio and image processing.
- Medical applications: EEG, ECG.
- 3G, 4G.
- ADSL.
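For reference, the transform that the FFT computes can be written as follows. This is the standard textbook definition, with N the number of points, x[n] the input and X[k] the output; the slide itself does not show the formula.

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}nk}, \qquad k = 0, 1, \ldots, N-1$$

Evaluating this sum directly takes on the order of N² complex multiplications; an FFT produces the same X[k] in on the order of N log N operations.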
…,2011,…
FFT FPGA FFT FFT FPGA FPGA FFT FPGA
FPGA FFT FPGA FFT FFT FPGA FPGA FFT…
I should do something new!!
What about GPUs?…Shouldn’t it be the same…
FPGA vs GPU
FPGA: Altera Cyclone II
GPU: NVIDIA Fermi
…,2011,…
Started Master Thesis (Sreehari Ambuluri): FFTs on GPUs.
Read articles and a book on GPUs.
Asked Ingemar, Jens, Gabriel.
…,2012,…
Finished the Master Thesis.
The work is good.
Why not improve it and publish a paper?
Asked Ingemar, Jens and Gabriel for collaboration.
…,2013,…
…,2013 BEST PAPER AWARD
NVIDIA
NVIDIA WE
LEVEL OF ABSTRACTION
[Figure: the DESIGN PROCESS at two levels of abstraction.
High level of abstraction ("The compiler decides"): DESCRIPTION, then COMPILATION.
Low level of abstraction ("We decide"): OPTIMIZATION & DESCRIPTION, then COMPILATION.
Each column relates ALGORITHM to IMPLEMENTATION (marked "=" and "<").]
ABSTRACTION vs PERFORMANCE
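z = 15 * x;
z = (x<<3) + (x<<2) + (x<<1) + x;
z = (x<<4) - x;
[Figure: each LANGUAGE DESCRIPTION above next to its HARDWARE IMPLEMENTATION: a multiplier by 15; shifts by 3, 2 and 1 added together with x; a shift by 4 from which x is subtracted.]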
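The three descriptions on this slide compute the same value, which a short C check can confirm. This is a minimal sketch: the function names are made up for illustration, and x is assumed to be an unsigned 32-bit integer so that the shifts and the wrap-around are well defined.

```c
#include <stdint.h>

/* z = 15 * x: one general multiplication. */
static uint32_t times15_mul(uint32_t x)
{
    return 15u * x;
}

/* z = 8x + 4x + 2x + x: three shifts and three additions. */
static uint32_t times15_add(uint32_t x)
{
    return (x << 3) + (x << 2) + (x << 1) + x;
}

/* z = 16x - x: one shift and one subtraction. */
static uint32_t times15_sub(uint32_t x)
{
    return (x << 4) - x;
}
```

In software a compiler will often make this kind of substitution on its own. The choice matters more when the description is mapped directly to hardware, where the last form needs a single subtractor instead of a multiplier.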
UNDERSTANDING GPUs
1.- Performance is determined by computation time: the lower the computation time, the higher the performance. Try to simplify the operations in the algorithm.
2.- Transactions to global memory are very expensive. Try to avoid or minimize them, and use shared memory instead.
3.- Threads must be synchronized if we want to share information among them, unless they are in the same warp. Try to reduce the number of synchronization points.
4.- We have to calculate the index of the data processed by each thread. Try to minimize the number of index calculations.
5.- Threads process data in parallel, and synchronization is not possible until all the threads have finished their calculations. Balance the load among threads.
FFT FLOW GRAPH (RADIX-2)
[Figure: flow graph of a 16-point radix-2 FFT, from input x[n] to output X[k], drawn in four stages (STAGE 1 to STAGE 4). An N-point FFT has log_r N stages, where r is the radix; each stage consists of butterflies (radix-2 here) followed by rotations by powers of e^(-j2π/N).]
1. SIMPLIFY THE ALGORITHM
USE RADIX-2²
2. USE SHARED MEMORY
3. REDUCE SYNC. POINTS
USE WORD GROUPS
[Figure: 2-word group and 4-word group.]
4. REDUCE INDEX CALCULATIONS
USE CONSTANT GEOMETRY
[Figure: conventional flow graph compared with constant geometry.]
5. BALANCE LOAD AMONG THREADS
USE SCHEDULING
[Figure: unbalanced scheduling compared with balanced scheduling.]
CONCLUSIONS
Optimization:
- Depends on the details and the level of abstraction.
- Requires an in-depth understanding of what you are doing.
Teamwork makes a difference.
GPUs are fun.