what is the wht anyway, and why are there so many ways to compute it? jeremy johnson 1, 2, 6, 24,...
TRANSCRIPT
![Page 1: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/1.jpg)
What is the WHT anyway, and why are there so many ways to
compute it?
Jeremy Johnson
1, 2, 6, 24, 112, 568, 3032, 16768,…
![Page 2: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/2.jpg)
Walsh-Hadamard Transform
• y = WHTN x, N = 2n
n
N WHTWHTWHT 22...
11
11WHT2
1111
1111
1111
1111
11
11
11
11
WHTWHTWHT 224
![Page 3: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/3.jpg)
WHT Algorithms
• Factor WHTN into a product of sparse structured matrices
• Compute: y = (M1 M2 … Mt)xyt = Mtx…
y2 = M2 y3
y = M1 y2
![Page 4: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/4.jpg)
Factoring the WHT Matrix
• AC DCD• A • A C) = (A C
• Im nmn
1100
1100
0011
0011
1010
0101
1010
0101
1111
1111
1111
1111
4WHT
WHT2 WHT2WHT2WHT2
![Page 5: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/5.jpg)
Recursive and Iterative Factorization
WHT8 WHT2WHT4
WHT2WHT2 I2WHT
WHT2WHT2 I2I2WHT
WHT2WHT2 I2I2WHT
WHT2WHT2 I2I2WHT
WHT2WHT2 I4WHT
![Page 6: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/6.jpg)
WHT8 =WHT2WHT2
I4WHT
11
11
11
11
11
11
11
11
11
11
1
11
11
11
11
11
11
11
11
11
11
11
11
11
11111111
11111111
11111111
11111111
11111111
11111111
11111111
11111111
![Page 7: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/7.jpg)
WHT Algorithms
• Recursive
• Iterative
• General
n
i
ini
N1
IWHTIWHT 2221
WHTIIWHTWHT 2 2/2/2 NNN
nnn t
t
i
nnnnnn tiii
1
1
where
,2222 IWHTIWHT 111
![Page 8: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/8.jpg)
WHT Implementation
• Definition/formula– N=N1* N2Nt Ni=2ni – x=WHTN*x x =(x(b),x(b+s),…x(b+(M-1)s))
• Implementation(nested loop) R=N; S=1; for i=t,…,1 R=R/Ni
for j=0,…,R-1 for k=0,…,S-1 S=S* Ni;
Mb,s
t
i 1
)
nn WHTWHT 222 II( n1+ ··· + ni-1 2ni+1+ ··· + nt
i
ii
i
i
NSkSjNN
NSkSjN xWHTx ,,
i
![Page 9: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/9.jpg)
Partition Trees
4
1 3
1 2
1 1
4
1 1 11
Right Recursive
Iterative
9
3 4 2
1 2 1
1 1
4
13
12
11
Left Recursive
4
2 2
1 1 1 1
Balanced
![Page 10: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/10.jpg)
Ordered Partitions
• There is a 1-1 mapping from ordered partitions of n onto (n-1)-bit binary numbers.
There are 2n-1 ordered partitions of n.
1|1 1|1 1 1 1|1 1 1+2+4+2 = 9
162 = 1 0 1 0 0 0 1 0
![Page 11: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/11.jpg)
Enumerating Partition Trees
3 3
12
00 01
3
12
11
01
3
21
10
3
1 2
1 1
10
3
11
11
1
![Page 12: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/12.jpg)
Counting Partition Trees
1,1
1,1 TTT 1
1
n
nn
nnnn t
tn
8.6),/(
)22(2
8811
))T(1(
)T()1(
2462
2/3
2
432
0
T
)T(
T)T(
n
z
zzzzzz
n
n
n
nn
z
zz
z
zzz
![Page 13: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/13.jpg)
WHT PackagePüschel & Johnson (ICASSP ’00)• Allows easy implementation of any of the possible WHT algorithms• Partition tree representation W(n)=small[n] | split[W(n1),…W(nt)]• Tools
– Measure runtime of any algorithm
– Measure hardware events (coupled with PCL)
– Search for good implementation
• Dynamic programming
• Evolutionary algorithm
![Page 14: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/14.jpg)
Histogram (n = 16, 10,000 samples)
•Wide range in performance despite equal number of arithmetic operations (n2n flops)
•Pentium III consumes more run time (more pipeline stages)
•Ultra SPARC II spans a larger range
![Page 15: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/15.jpg)
Operation Count
Theorem. Let WN be a WHT algorithm of size N. Then the number of floating point operations (flops) used by WN is Nlg(N).
Proof. By induction.
2222
)(2)(
11
1
nninnn
Nflopsnn
Nflops
nnn
WWt
ii
it
i
i
i
t
i
i
![Page 16: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/16.jpg)
Instruction Count Model
)()()A()IC(3
1
3
1AL nlninn
ll
ii
A(n) = number of calls to WHT procedure= number of instructions outside loopsAl(n) = Number of calls to base case of size l l = number of instructions in base case of size l
Li = number of iterations of outer (i=1), middle (i=2), and outer (i=3) loop i = number of instructions in outer (i=1), middle (i=2), and outer (i=3) loop body
![Page 17: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/17.jpg)
Small[1].file "s_1.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
.globl apply_small1
.type apply_small1,@function
apply_small1:
movl 8(%esp),%edx //load stride S to EDX
movl 12(%esp),%eax //load x array's base address to EAX
fldl (%eax) // st(0)=R7=x[0]
fldl (%eax,%edx,8) //st(0)=R6=x[S]
fld %st(1) //st(0)=R5=x[0]
fadd %st(1),%st // R5=x[0]+x[S]
fxch %st(2) //st(0)=R5=x[0],s(2)=R7=x[0]+x[S]
fsubp %st,%st(1) //st(0)=R6=x[S]-x[0] ?????
fxch %st(1) //st(0)=R6=x[0]+x[S],st(1)=R7=x[S]-x[0]
fstpl (%eax) //store x[0]=x[0]+x[S]
fstpl (%eax,%edx,8) //store x[0]=x[0]-x[S]
ret
![Page 18: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/18.jpg)
Recurrences
leaf a ,0)A(
... ),A(1)A(1
12
n
nnnn
n
nnnti
t
i
i
lnl
ln
llA leaves ofnumber where,)( 2
![Page 19: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/19.jpg)
Recurrences
leaf a ,0)(
... ,)()(
... ,)()(
... ),()(
L
2L2L
2L2L
L2L
11
23
1
...
122
11
11
11
n
nnnn
nnnn
nnnn
n
nnnnn
nnnnn
nntn
i
ti
t
i
ti
t
i
ti
t
i
ii
ii
i
![Page 20: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/20.jpg)
Histogram using Instruction Model (P3)
l = 12, l = 34, and l = 106 = 271 = 18, 2 = 18, and 1 = 20
![Page 21: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/21.jpg)
Algorithm Comparison
Recursive/Iterative Runtime
0.00E+002.00E-014.00E-016.00E-018.00E-011.00E+00
1.20E+001.40E+001.60E+001.80E+002.00E+00
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
WHT size(2̂ n)
ratio r1/i1
Rec &Bal/It Instruction Count
0
0.5
1
1.5
2
2.5
1 3 5 7 9 11 13 15 17 19
rr1/i1
lr1/i1
bal1/i1
Rec&It/Best Runtime
0.00E+00
2.00E+00
4.00E+00
6.00E+00
8.00E+00
1.00E+01
1.20E+01
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
WHT size(2̂ n)
ratio
r1/b
r3/b
i1/b
i3/b
b/b
Small/It Runtime
0.00E+002.00E+00
4.00E+006.00E+00
8.00E+001.00E+01
1.20E+01
1 2 3 4 5 6 7 8
WHT size(2̂ n)
ratio I_1/rt
r_1/rt
![Page 22: What is the WHT anyway, and why are there so many ways to compute it? Jeremy Johnson 1, 2, 6, 24, 112, 568, 3032, 16768,…](https://reader036.vdocuments.net/reader036/viewer/2022062801/56649e4d5503460f94b4377b/html5/thumbnails/22.jpg)
Dynamic Programming
minn1+… + nt= n
Cost(T T…
n1 nt
n),
where Tn is the optimial tree of size n.
This depends on the assumption that Cost only depends onthe size of a tree and not where it is located.(true for IC, but false for runtime).
For IC, the optimal tree is iterative with appropriate leaves.For runtime, DP is a good heuristic (used with binary trees).