Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Post on 08-Jan-2018

Page 1: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Cache Oblivious Algorithms

Zhang JiaHui, Neel Kamal

Page 2: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Introduction

- Cache Oblivious vs Cache Aware
- (Z, L) Ideal-Cache Model, with $Z = L^2$
- Large Integer Multiplication & RSA
- Dynamic Programming
  - Floyd All-Pair Shortest Paths
  - Longest Common Sequence
- Cache-Behavior Simulator
- Experimental Results

Page 3: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication(1)

We have two large integers x and y; x has m digits and y has n digits.

- If m > n, append zeros to the left side of y until it has m digits.
- If n > m, append zeros to the left side of x until it has n digits.
- Suppose m > n: we now have two large integers, both of length m.

Page 4: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication(1)

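[Figure: x is split into halves A and B, and y into halves C and D. The four sub-products A x C, A x D, B x C and B x D are combined into the Final Result.]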
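The four-sub-product scheme on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `dc_multiply`, the use of decimal digit strings, and the use of Python's built-in integers to combine the partial results are all assumptions made for the example.

```python
def dc_multiply(x: str, y: str) -> int:
    """Multiply two non-negative decimal digit strings by divide and conquer."""
    # Pad the shorter number with zeros on the left so both have m digits.
    m = max(len(x), len(y))
    x, y = x.zfill(m), y.zfill(m)

    # Base case: single digits are multiplied directly.
    if m == 1:
        return int(x) * int(y)

    high = m // 2          # digits in the high halves A and C
    low = m - high         # digits in the low halves B and D
    a, b = x[:high], x[high:]
    c, d = y[:high], y[high:]

    # Four half-size sub-products, as in the A/B/C/D picture.
    ac = dc_multiply(a, c)
    ad = dc_multiply(a, d)
    bc = dc_multiply(b, c)
    bd = dc_multiply(b, d)

    # x = A*10^low + B and y = C*10^low + D, so
    # x*y = AC*10^(2*low) + (AD + BC)*10^low + BD.
    return ac * 10 ** (2 * low) + (ad + bc) * 10 ** low + bd


product = dc_multiply("1234", "5678")
```

Each call spawns four calls on inputs of about half the length, which is where a recurrence of the form Q(m) = 4Q(m/2) + ... comes from.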

Page 5: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication(1)

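m > n, CASE I: $m > \dfrac{Z}{4}$

$$
Q(m) =
\begin{cases}
\dfrac{4m}{L}, & \text{if } m \in \left(\dfrac{Z}{8}, \dfrac{Z}{4}\right] \\
4\,Q\!\left(\dfrac{m}{2}\right) + O(1), & \text{otherwise}
\end{cases}
$$

After k steps, $\dfrac{m}{2^k} \in \left(\dfrac{Z}{8}, \dfrac{Z}{4}\right]$. Taking $\dfrac{m}{2^k} = \dfrac{Z}{4}$ (so $2^k = \dfrac{4m}{Z}$) and dropping the $O(1)$ terms:

$$
Q(m) = 4^k\, Q\!\left(\frac{m}{2^k}\right) = 4^k \cdot \frac{4m}{2^k L} = \frac{16 m^2}{LZ}
$$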
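As a quick sanity check, the recurrence can be evaluated numerically and compared with the closed form. The sketch below assumes the recurrence on this slide reads Q(m) = 4Q(m/2) with base case 4m/L once m <= Z/4, and it drops the O(1) terms; the function names and the sample values of Z and L are hypothetical.

```python
def misses_recurrence(m: float, Z: int, L: int) -> float:
    """Evaluate Q(m) = 4*Q(m/2), with Q(m) = 4m/L once m <= Z/4 (O(1) terms dropped)."""
    if m <= Z / 4:
        return 4 * m / L
    return 4 * misses_recurrence(m / 2, Z, L)


def misses_closed_form(m: float, Z: int, L: int) -> float:
    """Closed form 16*m^2 / (L*Z), exact when m is (Z/4) times a power of two."""
    return 16 * m * m / (L * Z)


Z, L = 64, 8
m = 128  # (Z/4) * 2**3, so the recursion bottoms out exactly at m = Z/4
recurrence_value = misses_recurrence(m, Z, L)
closed_value = misses_closed_form(m, Z, L)
```

For values of m that are not exactly (Z/4) times a power of two, the two numbers differ by a constant factor, which is why the final bounds are stated up to constants.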

Page 6: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication(1)

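m > n, CASE II: $m \le \dfrac{Z}{4}$

$$
Q(m) = \frac{4m}{L}
$$

If n > m, the same analysis gives $Q(n) = \dfrac{16 n^2}{LZ}$ in CASE I and $Q(n) = \dfrac{4n}{L}$ in CASE II.

So, combining all the cases, we have

$$
Q(m, n) = \Theta\!\left(\frac{n^2}{LZ} + \frac{m^2}{LZ} + \frac{m}{L} + \frac{n}{L}\right)
$$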

Page 7: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication(2)

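We do not append zeros to the left-hand side of the shorter integer.

CASE I: $m, n > Z$

$$
Q(m, n) =
\begin{cases}
\dfrac{m}{L} + \dfrac{n}{L} + \dfrac{m+n}{L}, & \text{if } m, n \in \left(\dfrac{Z}{2}, Z\right] \\
2\,Q\!\left(\dfrac{m}{2}, n\right) + O(1), & \text{otherwise, if } m > n \\
2\,Q\!\left(m, \dfrac{n}{2}\right) + O(1), & \text{otherwise}
\end{cases}
$$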

Page 8: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication(2)

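After $k_1$ steps, $\dfrac{m}{2^{k_1}} \in \left(\dfrac{Z}{2}, Z\right]$.

After $k_2$ steps, $\dfrac{n}{2^{k_2}} \in \left(\dfrac{Z}{2}, Z\right]$.

Dropping the $O(1)$ terms, and using $2^{k_1} = \Theta(m/Z)$ and $2^{k_2} = \Theta(n/Z)$ in the last step:

$$
Q(m, n) = 2^{k_1} 2^{k_2}\, Q\!\left(\frac{m}{2^{k_1}}, \frac{n}{2^{k_2}}\right)
= 2^{k_1} 2^{k_2} \cdot \frac{2}{L}\left(\frac{m}{2^{k_1}} + \frac{n}{2^{k_2}}\right)
= \frac{2\left(2^{k_2} m + 2^{k_1} n\right)}{L}
= \Theta\!\left(\frac{mn}{LZ}\right)
$$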

Page 9: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication(2)

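CASE II

If $m \le Z$:

$$
Q(m, n) =
\begin{cases}
\dfrac{m}{L} + \dfrac{n}{L} + \dfrac{m+n}{L}, & \text{if } n \in \left(\dfrac{Z}{2}, Z\right] \\
2\,Q\!\left(m, \dfrac{n}{2}\right) + O(1), & \text{otherwise}
\end{cases}
$$

$$
Q(m, n) = \Theta\!\left(\frac{mn}{LZ} + \frac{n}{L}\right)
$$

Symmetrically, if $n \le Z$:

$$
Q(m, n) = \Theta\!\left(\frac{mn}{LZ} + \frac{m}{L}\right)
$$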

Page 10: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication(2)

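CASE III: $m, n \le Z$

$$
Q(m, n) = \Theta\!\left(\frac{m}{L} + \frac{n}{L} + \frac{m+n}{L}\right)
$$

Combine all the cases:

$$
Q(m, n) = \Theta\!\left(\frac{m}{L} + \frac{n}{L} + \frac{m+n}{L} + \frac{mn}{LZ}\right)
$$

Total work: $O(mn)$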

Page 11: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Large Integer Multiplication & RSA

Summary of RSA:

- n = pq, where p and q are distinct primes.
- φ = (p-1)(q-1)
- e < n such that gcd(e, φ) = 1
- d = e^-1 mod φ
- c = m^e mod n
- m = c^d mod n
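The RSA summary above can be traced with a toy example. This is only a sketch with tiny textbook-sized primes (p = 61, q = 53, e = 17 are illustrative values, not from the slides), and the helper name `rsa_keypair` is hypothetical. Real keys use primes hundreds of digits long, which is where large integer multiplication matters.

```python
from math import gcd


def rsa_keypair(p: int, q: int, e: int):
    """Return (n, e, d) for distinct primes p and q and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi")
    d = pow(e, -1, phi)  # modular inverse of e mod phi (Python 3.8+)
    return n, e, d


n, e, d = rsa_keypair(61, 53, 17)
message = 65
cipher = pow(message, e, n)      # c = m^e mod n
recovered = pow(cipher, d, n)    # m = c^d mod n
```

Both `pow` calls perform modular exponentiation by repeated multiplication of integers the size of n.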

Page 12: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

All-pair shortest Paths

Floyd

    for k = 1 to n
        for i = 1 to n
            for j = 1 to n
                d[i][j][k] = min(d[i][j][k-1], d[i][k][k-1] + d[k][j][k-1])

For each k (1..n), the two inner loops do $O(n^2)$ work, so the total work is $O(n^3)$.
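A runnable version of the Floyd triple loop might look like the sketch below. It assumes an adjacency matrix with `math.inf` for missing edges and, unlike the three-index d[i][j][k] on the slide, updates a single two-dimensional matrix in place, which is the usual space-saving variant. The function name and the sample graph are made up for the example.

```python
import math


def floyd(weights):
    """All-pairs shortest paths on an n x n weight matrix (math.inf = no edge)."""
    n = len(weights)
    d = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):               # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                through_k = d[i][k] + d[k][j]
                if through_k < d[i][j]:
                    d[i][j] = through_k
    return d


INF = math.inf
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
dist = floyd(graph)
```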

Page 13: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

All-pair shortest Paths

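For each iteration of k:

CASE I: $n > \sqrt{Z}$

$$
Q(n) =
\begin{cases}
\dfrac{n^2}{L}, & \text{if } n \in \left(\dfrac{\sqrt{Z}}{2}, \sqrt{Z}\right] \\
4\,Q\!\left(\dfrac{n}{2}\right) + O(1), & \text{otherwise}
\end{cases}
$$

After k recursive steps (here k is the recursion depth, not the loop index), $\dfrac{n}{2^k} \in \left(\dfrac{\sqrt{Z}}{2}, \sqrt{Z}\right]$, so, dropping the $O(1)$ terms:

$$
Q(n) = 4^k\, Q\!\left(\frac{n}{2^k}\right) = 4^k \cdot \frac{1}{L}\left(\frac{n}{2^k}\right)^2 = \frac{n^2}{L}
$$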

Page 14: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

All-pair shortest Paths

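CASE II: $n \le \sqrt{Z}$

$$
Q(n) = \Theta(n)
$$

Combine the cases:

$$
Q(n) = \Theta\!\left(n + \frac{n^2}{L}\right)
$$

We have n iterations, so

$$
Q_{total}(n) = \Theta\!\left(n^2 + \frac{n^3}{L}\right)
$$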

Page 15: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Longest Common Sequence

We have 2 long sequences x and y; x is of length m and y is of length n.

Goal: find the Longest Common Sequence (that is, the longest common subsequence) of x and y.

Approach: Dynamic Programming

Page 16: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Longest Common Sequence

            y1  y2  y3  y4  y5  y6
            B   D   C   A   B   A
    x1  A   0   0   0   1   1   1
    x2  B   1   1   1   1   2   2
    x3  C   1   1   2   2   2   2
    x4  B   1   1   2   2   3   3
    x5  D   1   2   2   2   3   3
    x6  A   1   2   2   3   3   4
    x7  B   1   2   2   3   4   4

    if x[i] == y[j]
        c[i][j] = c[i-1][j-1] + 1;
    else
        c[i][j] = max(c[i-1][j], c[i][j-1]);
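The recurrence above can be turned into a short bottom-up table fill. The sketch below is a plain row-by-row version, not the cache-oblivious variant analysed on the following slides. It uses 0-based strings with an extra row and column of zeros for the empty prefixes, and the function name `lcs_table` is hypothetical.

```python
def lcs_table(x: str, y: str):
    """Return the (m+1) x (n+1) table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay zero
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


table = lcs_table("ABCBDAB", "BDCABA")
lcs_length = table[7][6]
```

With x = ABCBDAB and y = BDCABA, rows 1 to 7 of `table` (ignoring column 0) correspond to rows x1 to x7 of the slide's table.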

Page 17: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Longest Common Sequence

[Figure: the table for x = ABCBDAB and y = BDCABA is shown again, followed by its first four rows (x1 to x4) on their own:]

            B   D   C   A   B   A
        A   0   0   0   1   1   1
        B   1   1   1   1   2   2
        C   1   1   2   2   2   2
        B   1   1   2   2   3   3

Page 18: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Longest Common Sequence

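CASE I: $m, n > Z$

$$
Q(m, n) =
\begin{cases}
\dfrac{mn}{L}, & \text{if } m, n \in \left(\dfrac{Z}{2}, Z\right] \\
2\,Q\!\left(\dfrac{m}{2}, n\right) + O(1), & \text{otherwise, if } m > n \\
2\,Q\!\left(m, \dfrac{n}{2}\right) + O(1), & \text{otherwise}
\end{cases}
$$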

Page 19: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Longest Common Sequence

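Suppose:

After $k_1$ steps, $\dfrac{m}{2^{k_1}} \in \left(\dfrac{Z}{2}, Z\right]$.

After $k_2$ steps, $\dfrac{n}{2^{k_2}} \in \left(\dfrac{Z}{2}, Z\right]$.

Dropping the $O(1)$ terms:

$$
Q(m, n) = 2^{k_1} 2^{k_2}\, Q\!\left(\frac{m}{2^{k_1}}, \frac{n}{2^{k_2}}\right)
= 2^{k_1} 2^{k_2} \cdot \frac{1}{L} \cdot \frac{m}{2^{k_1}} \cdot \frac{n}{2^{k_2}}
= \frac{mn}{L}
$$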

Page 20: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Longest Common Sequence

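CASE II: $m \le Z$

$$
Q(m, n) =
\begin{cases}
1 + m, & \text{if } n \in \left(\dfrac{Z}{2}, Z\right] \\
2\,Q\!\left(m, \dfrac{n}{2}\right) + O(1), & \text{otherwise}
\end{cases}
$$

$$
Q(m, n) = \Theta\!\left(\frac{mn}{Z}\right)
$$

In the case when $n \le Z$:

$$
Q(m, n) =
\begin{cases}
1 + m, & \text{if } m \in \left(\dfrac{Z}{2}, Z\right] \\
2\,Q\!\left(\dfrac{m}{2}, n\right) + O(1), & \text{otherwise}
\end{cases}
$$

$$
Q(m, n) = \Theta(m)
$$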

Page 21: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Longest Common Sequence

CASE III: $m, n \le Z$

$$
Q(m, n) = 1 + m
$$

Combine all 3 cases:

$$
Q(m, n) = \Theta\!\left(1 + 2m + \frac{mn}{Z} + \frac{mn}{L}\right)
$$

Total Work: $O(mn)$

Page 22: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Cache Oblivious approaches for Dynamic Programming

Dynamic Programming
- Used to find an optimal solution
- Sub-problems overlap

Approaches
- Bottom up
- Top down (usually by recursion), but with a table to memoize earlier solutions

Can a Divide and Conquer method that builds the table recursively make the approach cache oblivious?
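For comparison with a bottom-up table fill, the top-down approach with a memo table can be sketched as below. It uses the longest-common-sequence problem as the example and relies on `functools.lru_cache` as the memo table. The function name is hypothetical, and this is the plain memoized recursion, not a cache-oblivious divide-and-conquer scheme.

```python
from functools import lru_cache


def lcs_length_top_down(x: str, y: str) -> int:
    """Length of the longest common subsequence, computed top down with memoization."""

    @lru_cache(maxsize=None)          # the "table" that remembers earlier solutions
    def solve(i: int, j: int) -> int:
        if i == 0 or j == 0:          # an empty prefix has an empty LCS
            return 0
        if x[i - 1] == y[j - 1]:
            return solve(i - 1, j - 1) + 1
        return max(solve(i - 1, j), solve(i, j - 1))

    return solve(len(x), len(y))


top_down_length = lcs_length_top_down("ABCBDAB", "BDCABA")
```

Each (i, j) pair is solved once and then looked up, so the overlap between sub-problems costs no extra work. Python's default recursion limit restricts this version to short inputs.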

Page 23: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Cache Simulator

- A fully associative cache
- With a tall-cache assumption: $Z = L^2$

Other Assumptions
- No temporary variables are put into the cache.
- All input data is assumed to be already present in cache.
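A cache-behavior simulator of this kind can be very small. The sketch below models a fully associative cache of Z words with lines of L words and counts misses for a stream of word addresses. The LRU replacement policy, the class name `CacheSim` and the word-address interface are assumptions made for illustration; the slides do not say which replacement policy the authors' simulator used.

```python
from collections import OrderedDict


class CacheSim:
    """Fully associative cache of Z words with L-word lines and LRU replacement (assumed)."""

    def __init__(self, Z: int, L: int):
        self.L = L
        self.capacity = Z // L        # number of cache lines
        self.lines = OrderedDict()    # line number -> True, ordered oldest first
        self.misses = 0

    def access(self, address: int) -> None:
        line = address // self.L      # the line that holds this word
        if line in self.lines:
            self.lines.move_to_end(line)         # hit: mark as most recently used
            return
        self.misses += 1                         # miss: load the line
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)       # evict the least recently used line
        self.lines[line] = True


sim = CacheSim(Z=16, L=4)
for addr in range(32):                # sequential scan of 32 words
    sim.access(addr)
scan_misses = sim.misses
```

An algorithm under test would call `access` for every array element it reads or writes, and `misses` would then be compared against the theoretical Q.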

Page 24: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

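Results

Summarizing the theoretical results:

Large Integer Multiplication

$$
Q(m, n) = \Theta\!\left(\frac{m}{L} + \frac{n}{L} + \frac{m+n}{L} + \frac{mn}{LZ}\right)
$$

All-pair shortest Paths

$$
Q(n) = \Theta\!\left(n + \frac{n^2}{L}\right)
$$

Longest Common Sequence

$$
Q(m, n) = \Theta\!\left(1 + 2m + \frac{mn}{Z} + \frac{mn}{L}\right)
$$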

Page 25: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Results from Simulation

Target Machine:
- arch: IA-64
- family: Itanium 2
- CPU MHz: 896.262997
- Cache size: 303312 KB
- OS: Linux version 2.4.22
- gcc version 2.96 20000731

We will see that there is a very close match between the theoretical results and the simulation results.

Page 26: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Results

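[Figure: Cache Oblivious Large Integer Multiplication. Number of Cache Misses (y axis, 0 to 300000) versus Size of cache Line (x axis, 10 to 100). Size of Integer: M = 1000, N = 1000. Data labels recovered from the chart:]

    Size of cache Line    Number of Cache Misses
    10                    248115
    20                    28904
    30                    7097
    40                    2712
    50                    1089
    60                    465
    70                    270
    80                    101
    90                    89
    100                   81

$$
Q(m, n) = \Theta\!\left(\frac{mn}{L^3}\right)
$$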

Page 27: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Comparing Results

Case 1: L = 20, Z = 400
- Theoretical Result = Θ(1000/8)
- Simulator Result = 28904
- ratio = 0.0041

Case 2: L = 30, Z = 900
- Theoretical Result = Θ(1000/27)
- Simulator Result = 7097
- ratio = 0.0044

Case 3: L = 40, Z = 1600
- Theoretical Result = Θ(1000/64)
- Simulator Result = 2712
- ratio = 0.0052

Page 28: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

More Results

[Figure: Cache Oblivious Longest Common Sequence. Number of Cache Misses (y axis, 0 to 2000000) versus Size of Cache Line (x axis, 10 to 150). Size of Sequence: M = 1000, N = 1000.]

$$
Q(m, n) = \Theta\!\left(\frac{mn}{L}\right)
$$

Page 29: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

More Results

[Figure: Cache Oblivious Floyd Algorithm. Number of Cache Misses (y axis, 0 to 700000) versus Size of Cache Line (x axis, 20 to 100). Number of Vertices: N = 100.]

$$
Q(n) = \Theta\!\left(n + \frac{n^2}{L}\right)
$$

Page 30: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Some More Work

We also implemented parallel solutions to each of these problems and obtained test results for their performance on Cilk.

Page 31: Cache Oblivious Algorithms Zhang JiaHui Neel Kamal

Thank You