cache oblivious algorithms zhang jiahui neel kamal
Cache Oblivious Algorithms
Zhang JiaHui, Neel Kamal
Introduction
- Cache Oblivious vs Cache Aware
- (Z, L) Ideal-Cache Model
- Large Integer Multiplication & RSA
- Dynamic Programming
  - Floyd All-Pair Shortest Paths
  - Longest Common Subsequence
- Cache-Behavior Simulator
- Experimental Results
Tall-cache assumption: $Z = \Omega(L^2)$
Large Integer Multiplication(1)
We have two large integers x and y; x has m digits and y has n digits.
If m > n, pad y with leading zeros; if n > m, pad x with leading zeros.
Suppose m > n; we now have two large integers, both of length m.
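Large Integer Multiplication(1)
[Figure: x is split into halves A and B, and y into halves C and D. The four partial products A x C, A x D, B x C and B x D are combined into the final result.]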
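The four-product scheme in the figure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `multiply` and the use of decimal strings are assumptions, and Python's built-in integers are used to combine the partial products.

```python
def multiply(x: str, y: str) -> int:
    """Multiply two non-negative decimal strings by four-way divide and conquer."""
    # Pad the shorter operand with leading zeros so both have the same length.
    m = max(len(x), len(y))
    x, y = x.zfill(m), y.zfill(m)
    if m == 1:
        return int(x) * int(y)            # single-digit base case
    half = m // 2                          # number of low-order digits
    a, b = x[:m - half], x[m - half:]      # x = A * 10^half + B
    c, d = y[:m - half], y[m - half:]      # y = C * 10^half + D
    ac = multiply(a, c)
    ad = multiply(a, d)
    bc = multiply(b, c)
    bd = multiply(b, d)
    # Shift each partial product into place and add them up.
    return ac * 10 ** (2 * half) + (ad + bc) * 10 ** half + bd
```

Each call issues four recursive calls on operands of about half the length, which is the shape of recursion the cache analysis of this method assumes.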
Large Integer Multiplication(1)
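Assume m > n.
CASE I: $m > Z/4$
$$Q(m) = \begin{cases} \Theta\!\left(\dfrac{4m}{L}\right) & \text{if } m \in \left(\dfrac{Z}{8}, \dfrac{Z}{4}\right] \\ 4\,Q\!\left(\dfrac{m}{2}\right) + O(1) & \text{otherwise} \end{cases}$$
After k steps, $\dfrac{m}{2^k} \in \left(\dfrac{Z}{8}, \dfrac{Z}{4}\right]$, so
$$Q(m) = 4^k\, Q\!\left(\frac{m}{2^k}\right) = 4^k \cdot \Theta\!\left(\frac{4m}{2^k L}\right) = \Theta\!\left(\frac{2^k \cdot 4m}{L}\right) = \Theta\!\left(\frac{16 m^2}{LZ}\right)$$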
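As a sanity check on the Case I bound, the recurrence can be evaluated numerically. The sketch below assumes the recurrence reads Q(m) = 4 Q(m/2) + c, with base case Q(m) = 4m/L once m <= Z/4. The function names and the constant `c` (standing in for the O(1) term, zero by default) are illustrative assumptions.

```python
def misses_padded(m, Z, L, c=0):
    """Evaluate Q(m) = 4*Q(m/2) + c with base case Q(m) = 4*m/L once m <= Z/4."""
    if m <= Z / 4:
        return 4 * m / L
    return 4 * misses_padded(m / 2, Z, L, c) + c


def closed_form(m, Z, L):
    """The bound 16*m^2 / (L*Z) from the Case I derivation."""
    return 16 * m * m / (L * Z)
```

When m is a power-of-two multiple of Z/4 and c = 0, the two functions agree exactly, for example at m = 800, Z = 400, L = 20. For other values of m they differ by at most a constant factor.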
Large Integer Multiplication(1)
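Assume m > n.
CASE II: $m \le Z/4$
$$Q(m) = \Theta\!\left(\frac{4m}{L}\right)$$
If n > m, the same analysis gives $Q(n) = \Theta\!\left(\dfrac{16 n^2}{LZ}\right)$ in CASE I and $Q(n) = \Theta\!\left(\dfrac{4n}{L}\right)$ in CASE II.
So, combining all the cases, we have
$$Q(m, n) = \Theta\!\left(\frac{m}{L} + \frac{n}{L} + \frac{m^2}{LZ} + \frac{n^2}{LZ}\right)$$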
Large Integer Multiplication(2)
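We do not pad the shorter integer with leading zeros.
CASE I: $m, n > Z$
$$Q(m,n) = \begin{cases} \Theta\!\left(\dfrac{m}{L} + \dfrac{n}{L} + \dfrac{m+n}{L}\right) & \text{if } m, n \in \left(\dfrac{Z}{2}, Z\right] \\ 2\,Q\!\left(\dfrac{m}{2}, n\right) + O(1) & \text{otherwise, if } m > n \\ 2\,Q\!\left(m, \dfrac{n}{2}\right) + O(1) & \text{otherwise} \end{cases}$$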
Large Integer Multiplication(2)
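After $k_1$ steps, $\dfrac{m}{2^{k_1}} \in \left(\dfrac{Z}{2}, Z\right]$.
After $k_2$ steps, $\dfrac{n}{2^{k_2}} \in \left(\dfrac{Z}{2}, Z\right]$.
$$Q(m,n) = 2^{k_1+k_2}\, Q\!\left(\frac{m}{2^{k_1}}, \frac{n}{2^{k_2}}\right) = 2^{k_1+k_2} \cdot \Theta\!\left(\frac{m/2^{k_1} + n/2^{k_2}}{L}\right) = \Theta\!\left(\frac{2^{k_2} m + 2^{k_1} n}{L}\right) = \Theta\!\left(\frac{mn}{LZ}\right)$$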
Large Integer Multiplication(2)
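CASE II
If $m \le Z$ (and $n > Z$):
$$Q(m,n) = \begin{cases} \Theta\!\left(\dfrac{m}{L} + \dfrac{n}{L} + \dfrac{m+n}{L}\right) & \text{if } n \in \left(\dfrac{Z}{2}, Z\right] \\ 2\,Q\!\left(m, \dfrac{n}{2}\right) + O(1) & \text{otherwise} \end{cases}$$
$$Q(m,n) = \Theta\!\left(\frac{n}{L} + \frac{mn}{LZ}\right)$$
If $n \le Z$ (and $m > Z$), symmetrically:
$$Q(m,n) = \Theta\!\left(\frac{m}{L} + \frac{mn}{LZ}\right)$$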
Large Integer Multiplication(2)
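CASE III: $m, n \le Z$
$$Q(m,n) = \Theta\!\left(\frac{m}{L} + \frac{n}{L} + \frac{m+n}{L}\right)$$
Combine all the cases:
$$Q(m,n) = \Theta\!\left(\frac{m}{L} + \frac{n}{L} + \frac{m+n}{L} + \frac{mn}{LZ}\right)$$
Total work: $O(mn)$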
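Method (2), which leaves the shorter integer unpadded and always halves the longer operand, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `multiply_unpadded` is a hypothetical name, operands are decimal strings, and Python integers are used to combine results.

```python
def multiply_unpadded(x: str, y: str) -> int:
    """Multiply decimal strings by repeatedly halving the longer operand."""
    m, n = len(x), len(y)
    if m == 1 and n == 1:
        return int(x) * int(y)                 # single-digit base case
    if m >= n:
        half = m // 2                           # low-order digits of x
        hi, lo = x[:m - half], x[m - half:]     # x = hi * 10^half + lo
        return multiply_unpadded(hi, y) * 10 ** half + multiply_unpadded(lo, y)
    half = n // 2                               # low-order digits of y
    hi, lo = y[:n - half], y[n - half:]         # y = hi * 10^half + lo
    return multiply_unpadded(x, hi) * 10 ** half + multiply_unpadded(x, lo)
```

Each call makes two recursive calls with one dimension halved, which matches the form 2Q(m/2, n) or 2Q(m, n/2) used in the recurrences for this method.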
Large Integer Multiplication & RSA
Summary of RSA
- n = pq, where p and q are distinct primes.
- φ = (p-1)(q-1).
- Choose e < n such that gcd(e, φ) = 1.
- d = e^-1 mod φ.
- Encryption: c = m^e mod n.
- Decryption: m = c^d mod n.
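The RSA steps above rely on repeated large-integer multiplication inside modular exponentiation. The sketch below uses deliberately tiny textbook-sized primes. The values p = 61, q = 53, e = 17 and the helper name `mod_pow` are illustrative assumptions, and the modular inverse via `pow(e, -1, phi)` requires Python 3.8 or later.

```python
from math import gcd


def mod_pow(base: int, exp: int, mod: int) -> int:
    """Square-and-multiply: every step is one multiplication followed by a reduction."""
    result = 1
    base %= mod
    while exp > 0:
        if exp & 1:
            result = (result * base) % mod
        base = (base * base) % mod
        exp >>= 1
    return result


def rsa_keys(p: int, q: int, e: int):
    """Return (n, e, d) for distinct primes p and q and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    assert gcd(e, phi) == 1, "e must be coprime to phi"
    d = pow(e, -1, phi)          # modular inverse of e (Python 3.8+)
    return n, e, d


n, e, d = rsa_keys(61, 53, 17)
message = 65
cipher = mod_pow(message, e, n)      # c = m^e mod n
recovered = mod_pow(cipher, d, n)    # m = c^d mod n
```

With real key sizes, the multiplications inside `mod_pow` operate on integers of hundreds of digits, which is where a cache-efficient multiplication routine would matter.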
All-pair shortest Paths
Floyd:
for k = 1 to n
    for i = 1 to n
        for j = 1 to n
            d[i][j][k] = min(d[i][j][k-1], d[i][k][k-1] + d[k][j][k-1])
Running over each k (1..n), the total work is $O(n^3)$.
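The triple loop can be written as runnable code as follows. This sketch uses the common in-place two-dimensional form instead of the three-index array d[i][j][k] from the slide, which is a swapped-in simplification. The name `floyd_warshall` and the adjacency-matrix input format are assumptions.

```python
INF = float("inf")


def floyd_warshall(w):
    """Return all-pairs shortest distances for an adjacency matrix w (INF = no edge)."""
    n = len(w)
    d = [row[:] for row in w]            # copy so the input is left untouched
    for k in range(n):                   # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                through_k = d[i][k] + d[k][j]
                if through_k < d[i][j]:
                    d[i][j] = through_k
    return d
```

For example, `floyd_warshall([[0, 4, 11], [6, 0, 2], [3, INF, 0]])` shortens the direct edge of weight 11 from vertex 0 to vertex 2 to a path of weight 6 through vertex 1.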
All-pair shortest Paths
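For each iteration of k:
CASE I: $n^2 > Z$
$$Q(n) = \begin{cases} \Theta\!\left(\dfrac{n^2}{L}\right) & \text{if } n^2 \in \left(\dfrac{Z}{4}, Z\right] \\ 4\,Q\!\left(\dfrac{n}{2}\right) + O(1) & \text{otherwise} \end{cases}$$
After k steps, $\left(\dfrac{n}{2^k}\right)^2 \in \left(\dfrac{Z}{4}, Z\right]$, so
$$Q(n) = 4^k\, Q\!\left(\frac{n}{2^k}\right) = 4^k \cdot \Theta\!\left(\frac{(n/2^k)^2}{L}\right) = \Theta\!\left(\frac{n^2}{L}\right)$$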
All-pair shortest Paths
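CASE II: $n^2 \le Z$
$$Q(n) = \Theta(n)$$
Combine the cases:
$$Q(n) = \Theta\!\left(n + \frac{n^2}{L}\right)$$
We have n iterations, so
$$Q_{total}(n) = \Theta\!\left(n^2 + \frac{n^3}{L}\right)$$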
Longest Common Subsequence
We have two long sequences x and y; x has length m and y has length n.
Find the longest common subsequence of x and y.
Dynamic Programming
Longest Common Subsequence
        y1 y2 y3 y4 y5 y6
        B  D  C  A  B  A
x1  A   0  0  0  1  1  1
x2  B   1  1  1  1  2  2
x3  C   1  1  2  2  2  2
x4  B   1  1  2  2  3  3
x5  D   1  2  2  2  3  3
x6  A   1  2  2  3  3  4
x7  B   1  2  2  3  4  4
if x[i] == y[j]
    c[i][j] = c[i-1][j-1] + 1
else
    c[i][j] = max(c[i-1][j], c[i][j-1])
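The recurrence translates directly into a bottom-up table fill. In this sketch the name `lcs_table` and the zero row and column at index 0 are assumptions, added so that c[i-1][j-1] is always defined.

```python
def lcs_table(x: str, y: str):
    """Return the (m+1) x (n+1) table c where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:             # strings are 0-indexed, the table is 1-indexed
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c
```

Calling `lcs_table("ABCBDAB", "BDCABA")` reproduces the table on this slide, with the answer 4 in the bottom-right cell.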
Longest Common Subsequence
Sub-table for the first four rows (x1..x4):
        B  D  C  A  B  A
x1  A   0  0  0  1  1  1
x2  B   1  1  1  1  2  2
x3  C   1  1  2  2  2  2
x4  B   1  1  2  2  3  3
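Longest Common Subsequence
CASE I: $m, n > Z$
$$Q(m,n) = \begin{cases} \Theta\!\left(\dfrac{mn}{L}\right) & \text{if } m, n \in \left(\dfrac{Z}{2}, Z\right] \\ 2\,Q\!\left(\dfrac{m}{2}, n\right) + O(1) & \text{otherwise, if } m > n \\ 2\,Q\!\left(m, \dfrac{n}{2}\right) + O(1) & \text{otherwise} \end{cases}$$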
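Longest Common Subsequence
Suppose:
After $k_1$ steps, $\dfrac{m}{2^{k_1}} \in \left(\dfrac{Z}{2}, Z\right]$.
After $k_2$ steps, $\dfrac{n}{2^{k_2}} \in \left(\dfrac{Z}{2}, Z\right]$.
$$Q(m,n) = 2^{k_1+k_2}\, Q\!\left(\frac{m}{2^{k_1}}, \frac{n}{2^{k_2}}\right) = 2^{k_1+k_2} \cdot \Theta\!\left(\frac{(m/2^{k_1})(n/2^{k_2})}{L}\right) = \Theta\!\left(\frac{mn}{L}\right)$$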
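Longest Common Subsequence
CASE II: $m \le Z$ (and $n > Z$)
$$Q(m,n) = \begin{cases} \Theta(1 + m) & \text{if } n \in \left(\dfrac{Z}{2}, Z\right] \\ 2\,Q\!\left(m, \dfrac{n}{2}\right) + O(1) & \text{otherwise} \end{cases}$$
$$Q(m,n) = \Theta\!\left(\frac{mn}{Z}\right)$$
In the case when $n \le Z$ (and $m > Z$):
$$Q(m,n) = \begin{cases} \Theta(1 + m) & \text{if } m \in \left(\dfrac{Z}{2}, Z\right] \\ 2\,Q\!\left(\dfrac{m}{2}, n\right) + O(1) & \text{otherwise} \end{cases}$$
$$Q(m,n) = \Theta(m)$$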
Longest Common Subsequence
CASE III: $m, n \le Z$
$$Q(m,n) = \Theta(1 + m)$$
Combine all 3 cases:
$$Q(m,n) = \Theta\!\left(1 + 2m + \frac{mn}{Z} + \frac{mn}{L}\right)$$
Total work: $O(mn)$
Cache Oblivious approaches for Dynamic Programming
Dynamic programming finds an optimal solution when sub-problems overlap.
Approaches:
- bottom-up (usually by iteration)
- top-down (by recursion), with a table to memoize earlier solutions
Can a divide-and-conquer method that builds the table recursively make the approach cache oblivious?
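One way to build a dynamic-programming table recursively is to split the larger dimension of the table in half and fill the first half before the second, so that every cell's dependencies are ready when it is computed. The sketch below applies this to the LCS table. It is a hypothetical illustration, not the authors' code: the names `fill` and `lcs_length_recursive` and the `base` cutoff are all assumptions.

```python
def fill(c, x, y, i0, i1, j0, j1, base=2):
    """Fill table cells c[i][j] for i in [i0, i1) and j in [j0, j1), recursively."""
    rows, cols = i1 - i0, j1 - j0
    if rows <= base and cols <= base:
        # Small block: apply the LCS recurrence directly.
        for i in range(i0, i1):
            for j in range(j0, j1):
                if x[i - 1] == y[j - 1]:
                    c[i][j] = c[i - 1][j - 1] + 1
                else:
                    c[i][j] = max(c[i - 1][j], c[i][j - 1])
    elif rows >= cols:
        mid = i0 + rows // 2               # split the rows: top half, then bottom half
        fill(c, x, y, i0, mid, j0, j1, base)
        fill(c, x, y, mid, i1, j0, j1, base)
    else:
        mid = j0 + cols // 2               # split the columns: left half, then right half
        fill(c, x, y, i0, i1, j0, mid, base)
        fill(c, x, y, i0, i1, mid, j1, base)


def lcs_length_recursive(x: str, y: str) -> int:
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    fill(c, x, y, 1, m + 1, 1, n + 1)
    return c[m][n]
```

The recursion never refers to Z or L. It halves the larger dimension at each level, which is the same shape as the two-way recurrences in the LCS analysis.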
Cache Simulator
With a tall-cache assumption: $Z = \Omega(L^2)$
A fully associative cache
Other Assumptions
• No temporary variables are put into the cache
• All input data is assumed to be already present in the cache
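A fully associative cache of Z words with lines of L words can be simulated in a few lines. The sketch below is not the authors' simulator. It assumes LRU replacement, which the slides do not specify, and it counts cold misses, whereas the slides assume the input is already in cache. The class name `CacheSim` is hypothetical.

```python
from collections import OrderedDict


class CacheSim:
    """Fully associative cache holding Z // L lines of L words each, with LRU eviction."""

    def __init__(self, Z: int, L: int):
        self.L = L
        self.capacity = Z // L           # number of lines that fit in the cache
        self.lines = OrderedDict()       # line number -> True, oldest entry first
        self.misses = 0

    def access(self, addr: int) -> None:
        line = addr // self.L            # which cache line this word belongs to
        if line in self.lines:
            self.lines.move_to_end(line)             # hit: mark as most recently used
        else:
            self.misses += 1                         # miss: load the line
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)       # evict the least recently used line


sim = CacheSim(Z=400, L=20)
for addr in range(1000):                 # one sequential scan over 1000 words
    sim.access(addr)
```

An algorithm would be instrumented by calling `access` on every array index it reads or writes, then reading `sim.misses` at the end.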
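Results
Summarizing the theoretical results:
Large Integer Multiplication: $Q(m,n) = \Theta\!\left(\dfrac{m}{L} + \dfrac{n}{L} + \dfrac{m+n}{L} + \dfrac{mn}{LZ}\right)$
All-pair shortest Paths: $Q(n) = \Theta\!\left(n + \dfrac{n^2}{L}\right)$
Longest Common Subsequence: $Q(m,n) = \Theta\!\left(1 + 2m + \dfrac{mn}{Z} + \dfrac{mn}{L}\right)$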
Results from Simulation
Target Machine:
    arch       : IA-64
    family     : Itanium 2
    CPU MHz    : 896.262997
    Cache size : 303312 KB
    OS         : Linux version 2.4.22
    gcc version 2.96 20000731
We will see that there is a very close match between the theoretical results and the simulation results.
Results
[Figure: Cache Oblivious Large Integer Multiplication. Number of cache misses (y-axis, 0 to 300000) against size of cache line (x-axis, 10 to 100). Size of integers: M = 1000, N = 1000. Data labels for line sizes 10, 20, ..., 100: 248115, 28904, 7097, 2712, 1089, 465, 270, 101, 89, 81.]
$Q(m,n) = \Theta\!\left(\dfrac{mn}{L^3}\right)$ (that is, $\dfrac{mn}{LZ}$ with $Z = L^2$)
Comparing Results
Case 1: L = 20, Z = 400; theoretical result = Θ(1000/8); simulator result = 28904; ratio = 0.0041
Case 2: L = 30, Z = 900; theoretical result = Θ(1000/27); simulator result = 7097; ratio = 0.0044
Case 3: L = 40, Z = 1600; theoretical result = Θ(1000/64); simulator result = 2712; ratio = 0.0052
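The theoretical values quoted in the three cases follow from the bound mn/(LZ) with M = N = 1000. A short calculation shows where 1000/8, 1000/27 and 1000/64 come from. The function name `theoretical_misses` is an assumption, and constant factors hidden by Θ are ignored.

```python
def theoretical_misses(m: int, n: int, L: int, Z: int) -> float:
    """Leading term m*n / (L*Z) of the large-integer-multiplication bound."""
    return m * n / (L * Z)


for L in (20, 30, 40):
    Z = L * L                            # the three cases all use Z = L^2
    print(L, Z, theoretical_misses(1000, 1000, L, Z))
```

With Z = L^2 the bound becomes mn/L^3, so doubling the line size reduces the predicted miss count by a factor of eight.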
More Results
[Figure: Cache Oblivious Longest Common Subsequence. Number of cache misses (y-axis, 0 to 2000000) against size of cache line (x-axis, 10 to 150). Size of sequences: M = 1000, N = 1000.]
$Q(m,n) = \Theta\!\left(\dfrac{mn}{L}\right)$
More Results
[Figure: Cache Oblivious Floyd Algorithm. Number of cache misses (y-axis, 0 to 700000) against size of cache line (x-axis, 20 to 100). Number of vertices: N = 100.]
$Q(n) = \Theta\!\left(n + \dfrac{n^2}{L}\right)$
Some More Work
We also implemented parallel solutions to each of these problems and tested their performance on Cilk.
Thank You