The Fastest Convolution in the West
Malcolm Roberts and John C. Bowman
Aix-Marseille University, University of Alberta
CEMRACS, 2012-08-14
Convolutions
The convolution of the functions F and G is
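$$(F * G)(t) = \int_{-\infty}^{\infty} F(\tau)\, G(t - \tau)\, d\tau.$$
For example, if $F = G = \chi_{(-1,1)}(t)$, the indicator function of the interval $(-1,1)$, then $F * G$ is the triangle function
$$(F * G)(t) = \max(2 - |t|,\ 0).$$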
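As a quick numerical check of the triangle shape, the sketch below approximates the convolution integral with a midpoint Riemann sum. The function names `indicator` and `convolve_at`, the integration window $[-3, 3]$, and the number of subintervals are illustrative choices, not part of the talk.

```python
def indicator(t: float) -> float:
    """chi_(-1,1)(t): 1 on the open interval (-1, 1), 0 elsewhere."""
    return 1.0 if -1.0 < t < 1.0 else 0.0


def convolve_at(f, g, t: float, a: float = -3.0, b: float = 3.0, n: int = 30000) -> float:
    """Midpoint-rule approximation of (f * g)(t) = integral of f(tau) g(t - tau) over [a, b].

    The window [a, b] must contain the support of f; for chi_(-1,1), [-3, 3] is ample.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        tau = a + (i + 0.5) * h
        total += f(tau) * g(t - tau)
    return h * total


for t in (-2.5, -1.0, 0.0, 0.5, 1.5):
    numeric = convolve_at(indicator, indicator, t)
    triangle = max(2.0 - abs(t), 0.0)
    print(f"t = {t:5.2f}   numeric = {numeric:.3f}   triangle = {triangle:.3f}")
```

The jump discontinuities of the indicator limit the midpoint rule to an error of order $h$, which is ample for comparing against the exact triangle.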
Malcolm Roberts and John C. Bowman Aix-Marseille University, University of Alberta
Convolutions
- An out-of-focus image can be modeled as a convolution of the sharp image with a blur kernel.
- Image filtering.
- Digital signal processing.
- Correlation analysis.
- The Lucas–Lehmer primality test uses fast convolutions.
- Pseudospectral simulations of nonlinear PDEs.
Convolutions
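The convolution of $F = \{F_k\}_{k\in\mathbb{Z}}$ and $G = \{G_k\}_{k\in\mathbb{Z}}$ is denoted $F * G$, with
$$(F * G)_k = \sum_{\ell, m \in \mathbb{Z}} F_\ell G_m \delta_{k,\ell+m} = \sum_{\ell \in \mathbb{Z}} F_\ell G_{k-\ell}.$$
Properties:
- Commutativity: $F * G = G * F$.
- Associativity: $*(F, G, H) = (F * G) * H = F * (G * H)$, where the ternary convolution is
$$*(F, G, H)_k = \sum_{\ell_1, \ell_2, \ell_3 \in \mathbb{Z}} F_{\ell_1} G_{\ell_2} H_{\ell_3} \delta_{k,\ell_1+\ell_2+\ell_3}.$$
- Identity element: $F * \delta = \delta * F = F$, where $\delta_k = \delta_{k,0}$.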
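A minimal sketch of these definitions, assuming finitely supported sequences stored as Python dicts mapping index to value (the function names `convolve` and `ternary` are illustrative). The direct double sum costs $O(N^2)$ operations, and the assertions check the three properties on small integer data, where the arithmetic is exact.

```python
from collections import defaultdict


def convolve(F: dict, G: dict) -> dict:
    """(F * G)_k = sum_{l,m} F_l G_m delta_{k,l+m} for finitely supported sequences {index: value}."""
    out = defaultdict(complex)
    for l, f in F.items():
        for m, g in G.items():
            out[l + m] += f * g          # the Kronecker delta selects k = l + m
    return dict(out)


def ternary(F: dict, G: dict, H: dict) -> dict:
    """*(F, G, H)_k = sum over l1 + l2 + l3 = k of F_l1 G_l2 H_l3."""
    out = defaultdict(complex)
    for l1, f in F.items():
        for l2, g in G.items():
            for l3, h in H.items():
                out[l1 + l2 + l3] += f * g * h
    return dict(out)


F = {-1: 3, 0: 1, 1: 2}
G = {0: 4, 2: -1}
H = {-2: 5, 1: 1j}
delta = {0: 1}

assert convolve(F, G) == convolve(G, F)                                      # commutativity
assert ternary(F, G, H) == convolve(convolve(F, G), H) == convolve(F, convolve(G, H))
assert convolve(F, delta) == convolve(delta, F) == F                         # identity element
```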
Application: Correlation Analysis
- The cross-correlation of $F$ and $G$ is $F \star G$, with
$$(F \star G)_k = \sum_{\ell} F^*_\ell\, G_{k+\ell}.$$
- This can be computed as the convolution of $\{F^*_{-k}\}$ with $\{G_k\}$.
- Cross-correlation is useful in signal processing and data analysis.
- In this case the input data $\{F_k\}_{k=0}^{N-1}$ is non-centered.
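The reduction of correlation to convolution can be checked directly. This sketch again assumes dict-based sequences; `correlate` and `reflect_conjugate` are hypothetical helper names. Substituting $m = -\ell$ in the correlation sum gives $\sum_m F^*_{-m} G_{k-m}$, which is exactly a convolution with the reflected, conjugated sequence.

```python
def convolve(F: dict, G: dict) -> dict:
    """(F * G)_k = sum_l F_l G_{k-l} for finitely supported sequences {index: value}."""
    out = {}
    for l, f in F.items():
        for m, g in G.items():
            out[l + m] = out.get(l + m, 0) + f * g
    return out


def correlate(F: dict, G: dict) -> dict:
    """(F star G)_k = sum_l conj(F_l) G_{k+l}; a pair (l, m) contributes to k = m - l."""
    out = {}
    for l, f in F.items():
        for m, g in G.items():
            out[m - l] = out.get(m - l, 0) + f.conjugate() * g
    return out


def reflect_conjugate(F: dict) -> dict:
    """Return the sequence k -> conj(F_{-k})."""
    return {-k: v.conjugate() for k, v in F.items()}


# Non-centered input data {F_k}_{k=0}^{N-1}, {G_k}_{k=0}^{N-1} with N = 3.
F = {0: 1 + 2j, 1: -1j, 2: 3 + 0j}
G = {0: 2 + 0j, 1: 1 - 1j, 2: 0.5j}
assert correlate(F, G) == convolve(reflect_conjugate(F), G)
```

Note that the correlation of non-centered data has support on $k = -(N-1), \dots, N-1$, so its output range differs from that of the input.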
Non-centered data
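- Input data: $\{F_k\}_{k=0}^{N-1}$ and $\{G_k\}_{k=0}^{N-1}$.
- This produces non-centered convolutions:
$$(F * G)_k = \sum_{\ell=0}^{k} F_\ell G_{k-\ell}, \qquad k = 0, \dots, N-1.$$
- For non-centered data, $*(F, G, H) = F * (G * H) = (F * G) * H$: since all indices are non-negative, truncating an intermediate result to $k \le N-1$ never discards a term needed by the final output.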
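A short sketch confirming associativity for truncated non-centered data, using plain Python lists of length $N$ (function names are illustrative). Integer inputs keep the comparison exact.

```python
def conv_noncentered(F, G):
    """(F * G)_k = sum_{l=0}^{k} F_l G_{k-l}, k = 0, ..., N-1 (output has the input length N)."""
    N = len(F)
    return [sum(F[l] * G[k - l] for l in range(k + 1)) for k in range(N)]


def ternary_noncentered(F, G, H):
    """*(F, G, H)_k = sum over a + b + c = k with a, b, c >= 0, k = 0, ..., N-1."""
    N = len(F)
    out = [0] * N
    for a in range(N):
        for b in range(N - a):
            for c in range(N - a - b):
                out[a + b + c] += F[a] * G[b] * H[c]
    return out


F, G, H = [1, 2, 3, 4], [5, -1, 0, 2], [2, 0, 1, -3]
nested_right = conv_noncentered(F, conv_noncentered(G, H))
nested_left = conv_noncentered(conv_noncentered(F, G), H)
assert nested_right == nested_left == ternary_noncentered(F, G, H)
```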
Application: Pseudospectral simulations
- The incompressible 2D Navier–Stokes vorticity equation
$$\frac{\partial \omega}{\partial t} + (\mathbf{u}\cdot\nabla)\omega = \nu\nabla^2\omega$$
is Fourier-transformed into
$$\frac{\partial \omega_{\mathbf k}}{\partial t} = \sum_{\mathbf p,\mathbf q} \frac{\epsilon_{\mathbf k\mathbf p\mathbf q}}{q^2}\, \omega^*_{\mathbf p}\, \omega^*_{\mathbf q} - \nu k^2 \omega_{\mathbf k}, \qquad \epsilon_{\mathbf k\mathbf p\mathbf q} = (\hat{\mathbf z}\cdot\mathbf p\times\mathbf q)\, \delta_{\mathbf k+\mathbf p+\mathbf q,\mathbf 0}.$$
- The nonlinearity becomes a convolution:
$$(F * G)_{\mathbf k} = \sum_{\mathbf k_1,\mathbf k_2} F_{\mathbf k_1} G_{\mathbf k_2}\, \delta_{\mathbf k,\mathbf k_1+\mathbf k_2}.$$
Application: Pseudospectral simulations
- Input data $\{F_k\}_{k=-N+1}^{N-1}$ is centered.
- It is also Hermitian-symmetric: $F_{-k} = F_k^*$.
- Hermitian symmetry $\iff \mathcal{F}^{-1}[F] \in \mathbb{R}$.
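A small sketch of the equivalence, assuming the inverse transform is evaluated as the trigonometric sum $f(x) = \sum_k F_k e^{ikx}$ over the centered range (the helper names are hypothetical). Pairing $k$ with $-k$ gives $F_k e^{ikx} + (F_k e^{ikx})^*$, which is real; breaking the symmetry produces a non-zero imaginary part.

```python
import cmath
import math


def hermitian_from_nonnegative(values):
    """Centered sequence {F_k}_{k=-N+1}^{N-1} with F_{-k} = conj(F_k), built from F_0, ..., F_{N-1}.

    F_0 must be real for the symmetry F_0 = conj(F_0) to hold.
    """
    F = {k: complex(v) for k, v in enumerate(values)}
    F.update({-k: F[k].conjugate() for k in range(1, len(values))})
    return F


def synthesize(F, x):
    """Inverse transform evaluated at position x: f(x) = sum_k F_k e^{i k x}."""
    return sum(Fk * cmath.exp(1j * k * x) for k, Fk in F.items())


F = hermitian_from_nonnegative([1.5, 2 - 1j, 0.5 + 3j, -1 + 0.25j])   # N = 4
samples = [synthesize(F, 2 * math.pi * j / 16) for j in range(16)]
assert all(abs(s.imag) < 1e-12 for s in samples)          # real up to rounding

F_broken = dict(F)
F_broken[-1] = F[1]                                       # break the symmetry
assert abs(synthesize(F_broken, 1.0).imag) > 1e-3
```

In practice this is why only the $k \ge 0$ half of Hermitian data needs to be stored, and why real-to-complex FFTs can be used.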
Centered data
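- Input data: $\{F_k\}_{k=-N+1}^{N-1}$ and $\{G_k\}_{k=-N+1}^{N-1}$.
$$(F * G)_k = \sum_{\ell=\max(-N+1,\,k-N+1)}^{\min(N-1,\,k+N-1)} F_\ell G_{k-\ell}$$
- For Hermitian-symmetric data ($F_{-k} = F_k^*$), the result is also Hermitian-symmetric, so we need only compute the output for $k \ge 0$:
$$(F * G)_k = \sum_{\ell=k-N+1}^{N-1} F_\ell G_{k-\ell}.$$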
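A direct $O(N^2)$ sketch of the centered convolution, storing sequences as dicts over $-N+1, \dots, N-1$ (the helper names are hypothetical). It checks that for $k \ge 0$ the general limits reduce to $\ell = k-N+1, \dots, N-1$, and that Hermitian inputs give a Hermitian output.

```python
def conv_centered(F, G, N):
    """(F * G)_k for k = -N+1, ..., N-1, with l from max(-N+1, k-N+1) to min(N-1, k+N-1)."""
    return {
        k: sum(F[l] * G[k - l] for l in range(max(-N + 1, k - N + 1), min(N - 1, k + N - 1) + 1))
        for k in range(-N + 1, N)
    }


def hermitian_from_nonnegative(values):
    """Centered sequence with F_{-k} = conj(F_k), built from F_0 (real), F_1, ..., F_{N-1}."""
    F = {k: complex(v) for k, v in enumerate(values)}
    F.update({-k: F[k].conjugate() for k in range(1, len(values))})
    return F


N = 3
F = hermitian_from_nonnegative([2, 1 - 1j, 0.5j])
G = hermitian_from_nonnegative([1, 3 + 2j, -1 + 1j])
FG = conv_centered(F, G, N)

for k in range(N):
    # For k >= 0 the lower limit is k - N + 1 and the upper limit is N - 1.
    shortcut = sum(F[l] * G[k - l] for l in range(k - N + 1, N))
    assert abs(FG[k] - shortcut) < 1e-12
    # The output is Hermitian-symmetric too, so the k < 0 half is redundant.
    assert abs(FG[-k] - FG[k].conjugate()) < 1e-12
```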
Centered data
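Theorem. For centered data, $*(F, G, H) \ne F * (G * H) \ne (F * G) * H$.

Proof. Let $N = 2$, so all indices lie in $\{-1, 0, 1\}$. The triples $(a, b, c)$ contributing to $*(F_a, G_b, H_c)_1$, and the intermediate index $\ell$ of the corresponding term in $(F_a * (G_b * H_c)_\ell)_1$, are:

| $a$ | $b$ | $c$ | $\ell = b + c$ |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 0 | 1 | 1 |
| 1 | 1 | -1 | 0 |
| 1 | -1 | 1 | 0 |
| -1 | 1 | 1 | N/A |

The triple $(-1, 1, 1)$ appears in the ternary convolution but not in the nested one, because it would require $\ell = b + c = 2$, which lies outside the centered range $-N+1, \dots, N-1$.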
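The counterexample can be reproduced by brute force. This sketch assumes dict-based centered sequences and hypothetical helper names; with all-ones data the $k = 1$ output counts the contributing triples, and a second choice of $F$ shows that the two nestings also disagree with each other.

```python
from itertools import product


def conv_centered(F, G, N):
    """Centered convolution truncated to k = -N+1, ..., N-1 (indices outside that range are dropped)."""
    idx = range(-N + 1, N)
    return {k: sum(F[l] * G[k - l] for l in idx if -N < k - l < N) for k in idx}


def ternary_centered(F, G, H, N):
    """*(F, G, H)_k over all triples in the centered range with a + b + c = k."""
    idx = range(-N + 1, N)
    out = {k: 0 for k in idx}
    for a, b, c in product(idx, repeat=3):
        if -N < a + b + c < N:
            out[a + b + c] += F[a] * G[b] * H[c]
    return out


N = 2
ones = {k: 1 for k in range(-N + 1, N)}
ternary = ternary_centered(ones, ones, ones, N)[1]                  # counts all 6 triples
nested = conv_centered(ones, conv_centered(ones, ones, N), N)[1]    # (-1, 1, 1) needs l = 2: lost
print(ternary, nested)  # 6 5

F = {-1: 1, 0: 0, 1: 0}                                             # only F_{-1} is nonzero
right = conv_centered(F, conv_centered(ones, ones, N), N)[1]        # F * (G * H)
left = conv_centered(conv_centered(F, ones, N), ones, N)[1]         # (F * G) * H
print(right, left)  # 0 1
```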
FFT-based convolutions
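- The convolution sum involves $O(N^2)$ terms. Using FFTs, we can compute a convolution in $O(N \log N)$ operations.
- The inverse discrete Fourier transform (DFT) of $\{F_k\}_{k=0}^{N-1}$ is
$$f_n \doteq \mathcal{F}^{-1}[F]_n = \sum_{k=0}^{N-1} \zeta_N^{nk} F_k.$$
- $\zeta_N = e^{2\pi i/N}$ is the primitive $N$th root of unity; $\zeta_{aN}^{a} = \zeta_N$ and $\zeta_N^N = 1$.
- For $\{F_k\}_{k\in\mathbb{Z}}$, $\{G_k\}_{k\in\mathbb{Z}}$,
$$\mathcal{F}[F * G] = \mathcal{F}[F] \times \mathcal{F}[G].$$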
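A sketch of the transform pair in the sign convention above. The forward transform's $1/N$ normalization is an assumption chosen so that `dft` exactly inverts `idft`, and the naive $O(N^2)$ sums stand in for an FFT. It shows the convolution theorem in the form used in practice: multiplying inverse transforms pointwise and transforming back gives the cyclic convolution.

```python
import cmath


def idft(F):
    """Inverse DFT as defined on the slide: f_n = sum_k zeta_N^{n k} F_k, zeta_N = exp(2 pi i / N)."""
    N = len(F)
    return [sum(F[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) for n in range(N)]


def dft(f):
    """Forward DFT normalized as the exact inverse of idft: F_k = (1/N) sum_n zeta_N^{-n k} f_n."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N)) / N for k in range(N)]


def cyclic_direct(F, G):
    """Direct O(N^2) cyclic convolution: sum_q F_q G_{(k - q) mod N}."""
    N = len(F)
    return [sum(F[q] * G[(k - q) % N] for q in range(N)) for k in range(N)]


F = [1, 2, 3, 4]
G = [0.5, -1, 2, 1j]
via_transforms = dft([a * b for a, b in zip(idft(F), idft(G))])
assert all(abs(x - y) < 1e-9 for x, y in zip(via_transforms, cyclic_direct(F, G)))
```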
FFT-based convolutions
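- The discrete Fourier transform treats arrays as periodic.
- A naive application of the convolution theorem produces a cyclic convolution:
$$\{F *_N G\}_k \doteq \sum_{\kappa=0}^{N-1} F_{\kappa \bmod N}\, G_{(k-\kappa) \bmod N}.$$
- Compared with the linear convolution $\sum_{\kappa=0}^{k} F_\kappa G_{k-\kappa}$, the terms with $\kappa > k$ wrap around the end of the array; these extra terms are called aliases.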
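A small integer example makes the aliases explicit: the cyclic result equals the linear (non-centered) convolution plus the wrapped-around terms. The function names are illustrative.

```python
def linear(F, G):
    """Non-centered linear convolution: sum_{q=0}^{k} F_q G_{k-q}."""
    return [sum(F[q] * G[k - q] for q in range(k + 1)) for k in range(len(F))]


def cyclic(F, G):
    """Cyclic convolution F *_N G: sum_{q=0}^{N-1} F_q G_{(k-q) mod N}."""
    N = len(F)
    return [sum(F[q] * G[(k - q) % N] for q in range(N)) for k in range(N)]


def aliases(F, G):
    """Wrapped-around terms: for q > k, (k - q) mod N = k - q + N."""
    N = len(F)
    return [sum(F[q] * G[k - q + N] for q in range(k + 1, N)) for k in range(N)]


F, G = [1, 2, 3, 4], [5, 6, 7, 8]
print(linear(F, G))   # [5, 16, 34, 60]
print(cyclic(F, G))   # [66, 68, 66, 60]
print(aliases(F, G))  # [61, 52, 32, 0]
```

Only the last entry ($k = N-1$) is free of aliases, since no index can wrap around there.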
Dealiasing techniques
We compare three dealiasing techniques:
- Phase-shift dealiasing
- Explicit zero-padding
- Implicit zero-padding
Phase-shift dealiasing
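The $\Delta$-shifted inverse Fourier transform,
$$\mathcal{F}^{-1}_\Delta[F]_j \doteq \sum_{k=0}^{N-1} \zeta_N^{(j+\Delta)k} F_k,$$
produces a convolution with an aliasing error of opposite sign for $\Delta = 1/2$, since each aliased term picks up the factor $\zeta_N^{\Delta N} = e^{i\pi} = -1$:
$$\{F *_\Delta G\}_k = \mathcal{F}_\Delta\left[\mathcal{F}^{-1}_\Delta[F] \times \mathcal{F}^{-1}_\Delta[G]\right]_k = \sum_{\kappa=0}^{k} F_\kappa G_{k-\kappa} - \sum_{\kappa=k+1}^{N-1} F_\kappa G_{k-\kappa+N}.$$
One recovers the linear convolution by computing
$$F * G = \frac{1}{2}\left[(F *_N G) + (F *_\Delta G)\right].$$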
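A minimal sketch of phase-shift dealiasing with naive $O(N^2)$ transforms standing in for FFTs. The forward shifted transform, normalized by $1/N$ so that it inverts the shifted inverse transform, is an assumption; the function names are illustrative.

```python
import cmath


def idft_shift(F, delta=0.0):
    """Delta-shifted inverse DFT: f_j = sum_k zeta_N^{(j + delta) k} F_k."""
    N = len(F)
    return [sum(F[k] * cmath.exp(2j * cmath.pi * (j + delta) * k / N) for k in range(N))
            for j in range(N)]


def dft_shift(f, delta=0.0):
    """Inverse of idft_shift (normalization assumed): F_k = (1/N) sum_j zeta_N^{-(j + delta) k} f_j."""
    N = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * (j + delta) * k / N) for j in range(N)) / N
            for k in range(N)]


def shifted_convolution(F, G, delta):
    """F *_N G for delta = 0 (cyclic); F *_Delta G for delta = 1/2 (aliases change sign)."""
    f, g = idft_shift(F, delta), idft_shift(G, delta)
    return dft_shift([a * b for a, b in zip(f, g)], delta)


def conv_phase_shift(F, G):
    """Linear (non-centered) convolution as the average of the two aliased results."""
    cyclic = shifted_convolution(F, G, 0.0)
    shifted = shifted_convolution(F, G, 0.5)
    return [(a + b) / 2 for a, b in zip(cyclic, shifted)]


F, G = [1, 2, 3, 4], [5, 6, 7, 8]
print([round(z.real, 9) for z in conv_phase_shift(F, G)])  # [5.0, 16.0, 34.0, 60.0]
```

Each of the two cyclic convolutions needs its own pair of transformed arrays, which is where the $4N$ memory figure for this method comes from.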
Phase-shift dealiasing
Diagram: each of $\{F_k\}_{k=0}^{N-1}$ and $\{G_k\}_{k=0}^{N-1}$ is transformed with both $\mathcal{F}^{-1}$ and $\mathcal{F}^{-1}_\Delta$; the unshifted transforms yield $F *_N G$, the shifted transforms yield $F *_\Delta G$, and the two are averaged to give $F * G$.
Explicit zero-padding
Another option is to append zeros to the input arrays. For non-centered data, pad from length $N$ to length $2N$:
$$\{F_k\}_{k=0}^{2N-1} = (F_0, F_1, \dots, F_{N-2}, F_{N-1}, \underbrace{0, \dots, 0}_{N}).$$
Then, for $k = 0, \dots, N-1$,
$$(F *_{2N} G)_k = \sum_{\ell=0}^{2N-1} F_{\ell \bmod 2N}\, G_{(k-\ell) \bmod 2N} = \sum_{\ell=0}^{N-1} F_\ell\, G_{(k-\ell) \bmod 2N} = \sum_{\ell=0}^{k} F_\ell G_{k-\ell},$$
since for $\ell > k$ the index $(k-\ell) \bmod 2N = k - \ell + 2N \ge N+1$ falls in the zero padding.
Centered data is padded from length $2N - 1$ to length $3N$.
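A sketch of explicit zero-padding for non-centered data, again with naive $O(N^2)$ transforms in place of FFTs and an assumed $1/M$ normalization on the forward transform. The function names are illustrative.

```python
import cmath


def idft(F):
    """f_n = sum_k zeta_M^{n k} F_k for an array of length M."""
    M = len(F)
    return [sum(F[k] * cmath.exp(2j * cmath.pi * n * k / M) for k in range(M)) for n in range(M)]


def dft(f):
    """F_k = (1/M) sum_n zeta_M^{-n k} f_n (normalized inverse of idft)."""
    M = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * n * k / M) for n in range(M)) / M for k in range(M)]


def conv_explicit(F, G):
    """Non-centered convolution by explicit zero-padding from length N to 2N."""
    N = len(F)
    f = idft(list(F) + [0] * N)             # transform of (F_0, ..., F_{N-1}, 0, ..., 0)
    g = idft(list(G) + [0] * N)
    h = dft([a * b for a, b in zip(f, g)])  # cyclic convolution of length 2N: no wrap-around into k < N
    return h[:N]                            # keep k = 0, ..., N-1


F, G = [1, 2, 3, 4], [5, 6, 7, 8]
print([round(z.real, 9) for z in conv_explicit(F, G)])  # [5.0, 16.0, 34.0, 60.0]
```

The padded copies and their length-$2N$ transforms are the work memory that implicit padding avoids.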
Explicit zero-padding
Diagram: $\{F_k\}_{k=0}^{N-1}$ and $\{G_k\}_{k=0}^{N-1}$ are each extended by $N$ zeros, inverse-transformed to $\{f_n\}_{n=0}^{2N-1}$ and $\{g_n\}_{n=0}^{2N-1}$, multiplied pointwise to give $\{f_n g_n\}_{n=0}^{2N-1}$, and forward-transformed; the first $N$ outputs are $\{(F * G)_k\}_{k=0}^{N-1}$.
Implicit Zero-padding
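Implicit padding uses a separate work array to compute the DFT of the zero-padded data. The padded transform
$$f_x = \sum_{k=0}^{2N-1} \zeta_{2N}^{xk} F_k, \qquad F_k = 0 \text{ if } k \ge N,$$
is attained by computing, for $x = 0, \dots, N-1$,
$$f_{2x} = \sum_{k=0}^{N-1} \zeta_N^{xk} F_k$$
and
$$f_{2x+1} = \sum_{k=0}^{N-1} \zeta_N^{xk} \left(\zeta_{2N}^{k} F_k\right),$$
since $\zeta_{2N}^{2xk} = \zeta_N^{xk}$ and $\zeta_{2N}^{(2x+1)k} = \zeta_N^{xk}\,\zeta_{2N}^{k}$.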
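A sketch checking the even/odd split numerically: two length-$N$ transforms, one of them applied to the twisted input $\zeta_{2N}^k F_k$ held in a work array, reproduce all $2N$ values of the padded transform. Naive sums stand in for FFTs, and the helper names are hypothetical.

```python
import cmath


def zeta(n, p):
    """zeta_n^p = exp(2 pi i p / n)."""
    return cmath.exp(2j * cmath.pi * p / n)


def padded_idft(F):
    """Reference: f_x = sum_{k=0}^{2N-1} zeta_{2N}^{x k} F_k with F_k = 0 for k >= N (zero terms skipped)."""
    N = len(F)
    return [sum(zeta(2 * N, x * k) * F[k] for k in range(N)) for x in range(2 * N)]


def implicit_idft(F):
    """The same 2N values from two length-N transforms; `work` holds the twisted input."""
    N = len(F)
    work = [zeta(2 * N, k) * F[k] for k in range(N)]                           # zeta_{2N}^k F_k
    even = [sum(zeta(N, x * k) * F[k] for k in range(N)) for x in range(N)]    # f_{2x}
    odd = [sum(zeta(N, x * k) * work[k] for k in range(N)) for x in range(N)]  # f_{2x+1}
    return even, odd


F = [1, 2 - 1j, 0.5j, -3]
f = padded_idft(F)
even, odd = implicit_idft(F)
assert all(abs(a - b) < 1e-9 for a, b in zip(f[0::2], even))
assert all(abs(a - b) < 1e-9 for a, b in zip(f[1::2], odd))
```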
Implicit zero-padding
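Diagram: from $\{F_k\}_{k=0}^{N-1}$ and $\{G_k\}_{k=0}^{N-1}$, the even- and odd-indexed samples $\{f_n\}$ and $\{g_n\}$ (each of length $N$) are computed separately, multiplied pointwise to give the even and odd parts of $\{f_n g_n\}$, and combined by the forward transform into $\{(F * G)_k\}_{k=0}^{N-1}$.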
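Putting the pieces together gives a sketch of an implicitly padded non-centered convolution that uses only length-$N$ transforms. The forward length-$2N$ transform (normalized by $1/2N$, an assumed convention) is split into even and odd halves: $(F * G)_k = \frac{1}{2N}\left[\sum_x \zeta_N^{-xk} h_{2x} + \zeta_{2N}^{-k} \sum_x \zeta_N^{-xk} h_{2x+1}\right]$ with $h_x = f_x g_x$. The naive `dft_n` stands in for an FFT, so this illustrates the arithmetic, not the memory layout or speed of an optimized implementation.

```python
import cmath


def dft_n(values, sign):
    """Unnormalized length-N DFT with kernel exp(sign * 2 pi i n k / N) (naive stand-in for an FFT)."""
    N = len(values)
    return [sum(values[n] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def conv_implicit(F, G):
    """Non-centered convolution via implicit zero-padding: every transform has length N."""
    N = len(F)
    twist = [cmath.exp(1j * cmath.pi * k / N) for k in range(N)]   # zeta_{2N}^k
    # Even samples f_{2x}, g_{2x} and their pointwise product.
    even = [a * b for a, b in zip(dft_n(F, +1), dft_n(G, +1))]
    # Odd samples f_{2x+1}, g_{2x+1} come from the twisted inputs.
    odd = [a * b for a, b in zip(dft_n([t * x for t, x in zip(twist, F)], +1),
                                 dft_n([t * x for t, x in zip(twist, G)], +1))]
    # Forward transform of length 2N, split into even and odd halves.
    E, O = dft_n(even, -1), dft_n(odd, -1)
    return [(E[k] + O[k] / twist[k]) / (2 * N) for k in range(N)]


F, G = [1, 2, 3, 4], [5, 6, 7, 8]
print([round(z.real, 9) for z in conv_implicit(F, G)])  # [5.0, 16.0, 34.0, 60.0]
```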
Comparison of 1D methods
For non-centered data:

| | Phase-shift dealiasing | Explicit padding | Implicit padding |
|---|---|---|---|
| Memory | $4N$ | $4N$ | $4N$ |
| Complexity | $6KN\log N$ | $6KN\log N$ | $6KN\log N$ |

For centered Hermitian data:

| | Phase-shift dealiasing | Explicit padding | Implicit padding |
|---|---|---|---|
| Memory | $4N$ | $3N$ | $3N$ |
| Complexity | $6KN\log N$ | $\frac{9}{2}KN\log N$ | $\frac{9}{2}KN\log N$ |

Here $KN\log N$ denotes the cost of one complex FFT of length $N$.
Comparison of zero-padding methods
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: explicit, implicit.
Complex non-centered 1D convolution.
Comparison of zero-padding methods
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: explicit, implicit.
Hermitian-symmetric centered 1D convolution.
Comparison of zero-padding methods
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: explicit, implicit.
Hermitian-symmetric centered 1D ternary convolution.
Phase-shift dealiasing: multiple dimensions
A $d$-dimensional convolution requires computing $2^d$ cyclic convolutions with different shifts. For 3D pseudospectral simulations, one instead computes only
$$F *_N G \quad \text{and} \quad F *_\Delta G$$
with $\Delta = (\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2})$. This removes singly-aliased terms. Doubly and triply aliased terms are removed by setting to zero the terms with $|\mathbf k| \ge \frac{2\sqrt{2}}{3} N \approx 0.94N$.
Explicit Zero-padding: multiple dimensions
Multi-dimensional convolutions need to be padded in each dimension. Non-centered convolutions are padded from $N^d$ to $(2N)^d$; centered convolutions are padded from $(2N-1)^d$ to $(3N)^d$.
Some transforms are performed on arrays of zeros; these can be skipped, and the transform is then referred to as pruned.
Explicit Zero-padding: multiple dimensions
Diagram: the padded arrays $F$ and $G$ are transformed to $f$ and $g$, multiplied pointwise to give $fg$, and transformed back to obtain $F * G$.
Implicit Zero-padding: multiple dimensions
The 2D FFT-based convolution algorithm is:
$$\mathcal{F}^{-1}_y \to \mathcal{F}^{-1}_x \to \text{(multiply)} \to \mathcal{F}_x \to \mathcal{F}_y.$$
Note that
$$\mathcal{F}^{-1}_x \to \text{(multiply)} \to \mathcal{F}_x$$
is just a convolution in the $x$ direction. So the 2D convolution algorithm can be written
$$\mathcal{F}^{-1}_y \to (x\text{-convolution}) \to \mathcal{F}_y.$$
Since the implicitly dealiased convolution uses non-contiguous memory, we can re-use work arrays for sub-convolutions.
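A sketch of the $\mathcal{F}^{-1}_y \to (x\text{-convolution}) \to \mathcal{F}_y$ decomposition for non-centered 2D data stored as nested lists. For simplicity it pads the $y$ direction explicitly and uses a direct sum for the 1D $x$-convolutions; in the scheme described above, both would be implicitly dealiased. All names are hypothetical.

```python
import cmath


def conv1d(f, g):
    """Non-centered 1D convolution truncated to len(f) outputs (direct sum here)."""
    return [sum(f[a] * g[k - a] for a in range(k + 1)) for k in range(len(f))]


def conv2d_by_x_convolutions(F, G):
    """2D non-centered convolution organized as F^{-1}_y -> (x-convolution) -> F_y.

    F[a][b] holds the coefficient with x index a and y index b (both 0..N-1).
    The y transforms are zero-padded to length 2N so that nothing aliases in y.
    """
    N = len(F)
    M = 2 * N
    w = [cmath.exp(2j * cmath.pi * p / M) for p in range(M)]   # w[p] = zeta_{2N}^p

    def inverse_y(A):
        # f[a][y] = sum_b zeta_{2N}^{y b} A[a][b]; the padded entries b >= N are zero.
        return [[sum(A[a][b] * w[(y * b) % M] for b in range(N)) for y in range(M)]
                for a in range(N)]

    f, g = inverse_y(F), inverse_y(G)
    # One independent 1D convolution in x per y sample.
    h = [conv1d([f[a][y] for a in range(N)], [g[a][y] for a in range(N)]) for y in range(M)]
    # Forward y transform (normalized by 1/2N), keeping only l = 0..N-1.
    return [[sum(h[y][k] * w[(-y * l) % M] for y in range(M)) / M for l in range(N)]
            for k in range(N)]


def conv2d_direct(F, G):
    """Reference sum: (F * G)_{k,l} = sum_{a<=k, b<=l} F_{a,b} G_{k-a,l-b}."""
    N = len(F)
    return [[sum(F[a][b] * G[k - a][l - b] for a in range(k + 1) for b in range(l + 1))
             for l in range(N)] for k in range(N)]


F = [[1, 2, 0], [3, 4, 1j], [0, -1, 2]]
G = [[5, 6, 1], [7, 8, 0], [2j, 0, 1]]
fast, ref = conv2d_by_x_convolutions(F, G), conv2d_direct(F, G)
assert all(abs(fast[k][l] - ref[k][l]) < 1e-9 for k in range(3) for l in range(3))
```

Because each $x$-convolution touches only one $y$ sample, its work arrays can be reused for the next sample, or given one per thread.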
Implicit Zero-padding: multiple dimensions
Diagram: implicitly zero-padded computation of $F * G$ in multiple dimensions.
Comparison for Non-centered Convolutions

| Method | Complexity | Memory footprint |
|---|---|---|
| Explicit without pruning | $3 \cdot 2^d\, d\, KN^d \log N$ | $2^{d+1} N^d$ |
| Explicit with pruning | $6(2^d - 1)\, KN^d \log N$ | $2^{d+1} N^d$ |
| Implicit | $6(2^d - 1)\, KN^d \log N$ | $4N^d$ |
Comparison for Centered Convolutions

| Method | Complexity | Memory footprint |
|---|---|---|
| Phase-shift dealiasing | $3 \cdot 2^{2d-1}\, d\, KN^d \log N$ | $2^{2d} N^d$ |
| Partial phase-shift | $3 \cdot 2^d\, d\, KN^d \log N$ | $2^{d+1} N^d$ |
| Explicit | $\frac{3^{d+1}}{2}\, d\, KN^d \log N$ | $3^d N^d$ |
| Explicit with pruning | $\frac{9}{2}(3^d - 2^d)\, KN^d \log N$ | $3^d N^d$ |
| Implicit | $\frac{9}{2}(3^d - 2^d)\, KN^d \log N$ | $3 \cdot 2^{d-1} N^d$ |
Performance: multiple dimensions
Plot: time/(N^2 log2 N^2) (ns) versus N; curves: explicit, y-pruned, implicit.
Complex non-centered 2D convolution.
Performance: multiple dimensions
Plot: time/(N^2 log2 N^2) (ns) versus N; curves: explicit, y-pruned, implicit.
Hermitian-symmetric centered 2D convolution.
Performance: multiple dimensions
Plot: time/(N^2 log2 N^2) (ns) versus N; curves: explicit, y-pruned, implicit.
Hermitian-symmetric centered 2D ternary convolution.
Performance: multiple dimensions
Plot: time/(N^3 log2 N^3) (ns) versus N, for N from 10 to 10^2; curves: explicit, xz-pruned, implicit.
Complex non-centered 3D convolutions.
Multi-threaded convolutions
Implicit dealiasing has been implemented with multiple threads. Each sub-convolution requires its own work array. With $P$ processors, the memory increase is of order $PN^{d-1}$ for $d$-dimensional convolutions.
Implicit Zero-padding: multiple threads
Diagram: implicitly zero-padded computation of $F * G$ with multiple threads.
Implicit multi-threading performance
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: serial, 4 cores.
Non-centered 1D convolution.
Implicit multi-threading performance
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: serial, 4 cores.
Centered 1D convolution.
Implicit multi-threading performance
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: serial, 4 cores.
Centered 1D ternary convolution.
Implicit multi-threading performance
Plot: time/(N^2 log2 N^2) (ns) versus N; curves: serial, 4 cores.
Non-centered 2D convolution.
Implicit multi-threading performance
Plot: time/(N^2 log2 N^2) (ns) versus N; curves: serial, 4 cores.
Centered 2D convolution.
Implicit multi-threading performance
Plot: time/(N^2 log2 N^2) (ns) versus N, for N from 10^2 to 10^3; curves: serial, 4 cores.
Centered ternary 2D convolution.
Implicit multi-threading performance
Plot: time/(N^3 log2 N^3) (ns) versus N, for N from 10 to 10^2; curves: serial, 4 cores.
Non-centered 3D convolution.
Implicit multi-threading performance
Plot: time/(N^3 log2 N^3) (ns) versus N, for N from 10 to 10^2; curves: serial, 4 cores.
Centered 3D convolution.
Implicit multi-threading performance
- One-dimensional convolutions on four cores are about 2 times as fast as on one core.
- Two-dimensional convolutions on four cores are about 3 times as fast.
- Three-dimensional convolutions on four cores are about 3.5 times as fast.
Multiple threads: explicit vs. implicit
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: explicit, implicit.
Non-centered 1D convolution.
Multiple threads: explicit vs. implicit
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: explicit, implicit.
Centered 1D convolution.
Multiple threads: explicit vs. implicit
Plot: time/(N log2 N) (ns) versus N, for N from 10^2 to 10^6; curves: explicit, implicit.
Centered ternary 1D convolution.
Multiple threads: explicit vs. implicit
Plot: time/(N^2 log2 N^2) (ns) versus N; curves: explicit, implicit.
Non-centered 2D convolution.
Multiple threads: explicit vs. implicit
Plot: time/(N^2 log2 N^2) (ns) versus N; curves: explicit, implicit.
Centered 2D convolution.
Multiple threads: explicit vs. implicit
Plot: time/(N^2 log2 N^2) (ns) versus N, for N from 10^2 to 10^3; curves: explicit, implicit.
Centered ternary 2D convolution.
Multiple threads: explicit vs. implicit
Plot: time/(N^3 log2 N^3) (ns) versus N, for N from 10 to 10^2; curves: explicit, implicit.
Non-centered 3D convolution.
Summary of Results
- Implicit methods require much less work memory than explicit methods.
- The implicit method achieved a speedup of up to 3.5 on four cores, while the explicit method achieved a speedup of up to a factor of 3.
- The implicit method is around twice as fast as the explicit method for multidimensional convolutions.
Usage example
Computing the nonlinear source of the 2D incompressible Navier–Stokes equations in a vorticity formulation, which appears in Fourier space as
$$\sum_{\mathbf p} \frac{p_x k_y - p_y k_x}{|\mathbf k - \mathbf p|^2}\, \omega_{\mathbf p}\, \omega_{\mathbf k - \mathbf p},$$
is performed as follows:
$$\mathrm{conv2}\left(ik_x\omega,\ ik_y\omega,\ \frac{ik_y\omega}{k^2},\ -\frac{ik_x\omega}{k^2}\right).$$
One also has the option of passing work arrays to conv2, which can then be used elsewhere.
Conclusion
- Implicitly zero-padding multi-dimensional convolutions is faster and requires less memory than explicit padding.
- The algorithm has been successfully implemented on a shared-memory architecture with only a small increase in work memory.
- Convolution algorithms are available for complex non-centered data and centered Hermitian-symmetric data in 1D, 2D, and 3D.
- Ternary convolution algorithms are available for centered Hermitian-symmetric data in 1D and 2D.
Future work
- A distributed-memory implementation based on Open MPI.
- Improve multi-threaded parallelization.
- Convolutions on real data.
- Correlation routines.
- Auto-convolution/correlation routines.
Resources
FFTW++: http://fftwpp.sourceforge.net
Asymptote: http://asymptote.sourceforge.net
Malcolm Roberts: http://www.math.ualberta.ca/~mroberts