Introduction to Cache-Oblivious Algorithms

Introduction To Cache-Oblivious Algorithms by Christopher Gilbert
http://www.twitter.com/bigdatadev
http://www.github.com/bigdatadev
http://www.cjgilbert.me

Uploaded by christopher-gilbert on 02-Jul-2015


DESCRIPTION

In today's world, developers face the problem of writing high-performance algorithms that scale efficiently across a range of multi-core processors. Traditional blocked algorithms must be tuned to each processor's cache sizes, but cache-oblivious algorithms give developers new tools to tackle this emerging challenge. In this talk you will learn about the external memory model, the cache-oblivious model, and how to use these tools to create faster, scalable algorithms.

TRANSCRIPT

Page 1: Introduction to Cache-Oblivious Algorithms


Page 2: Introduction to Cache-Oblivious Algorithms

“A cache-oblivious algorithm is not oblivious to cache memory”

(However, it is oblivious to the cache's parameters: its size and its block, or line, size.)
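A classic illustration of this idea (not taken from the talk) is recursive matrix transposition. The hypothetical `transpose_rec` below keeps halving the larger dimension, so at some depth the blocks fit in whatever cache the machine has, yet no cache size or line size appears anywhere in the code. The leaf cutoff of 16 is an assumption that only amortises function-call overhead; it is not a tuned cache parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transposes the rows x cols block of `a` whose top-left corner is (r0, c0).
// `a` is n_rows x n_cols, `b` is n_cols x n_rows, both row-major.
// No cache size or line size appears: the recursion just halves the larger side.
void transpose_rec(const std::vector<double>& a, std::vector<double>& b,
                   std::size_t n_rows, std::size_t n_cols,
                   std::size_t r0, std::size_t c0,
                   std::size_t rows, std::size_t cols) {
    const std::size_t kLeaf = 16;  // hypothetical cutoff to amortise call overhead
    if (rows <= kLeaf && cols <= kLeaf) {
        for (std::size_t i = r0; i < r0 + rows; ++i)
            for (std::size_t j = c0; j < c0 + cols; ++j)
                b[j * n_rows + i] = a[i * n_cols + j];
    } else if (rows >= cols) {
        std::size_t half = rows / 2;  // split the taller side
        transpose_rec(a, b, n_rows, n_cols, r0, c0, half, cols);
        transpose_rec(a, b, n_rows, n_cols, r0 + half, c0, rows - half, cols);
    } else {
        std::size_t half = cols / 2;  // split the wider side
        transpose_rec(a, b, n_rows, n_cols, r0, c0, rows, half);
        transpose_rec(a, b, n_rows, n_cols, r0, c0 + half, rows, cols - half);
    }
}

std::vector<double> transpose(const std::vector<double>& a,
                              std::size_t n_rows, std::size_t n_cols) {
    assert(a.size() == n_rows * n_cols);
    std::vector<double> b(n_rows * n_cols);
    transpose_rec(a, b, n_rows, n_cols, 0, 0, n_rows, n_cols);
    return b;
}
```

The same code runs unchanged on a machine with a 32 KiB L1 or a 48 KiB L1; only the depth at which blocks start fitting in cache changes.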

Page 3: Introduction to Cache-Oblivious Algorithms

“Cache-oblivious algorithms are effective on any system, regardless of memory hierarchy”

Page 4: Introduction to Cache-Oblivious Algorithms

“Cache-oblivious algorithms do not improve computational complexity.”

(But they can still improve performance, by reducing memory transfers.)

Page 5: Introduction to Cache-Oblivious Algorithms

Figure: block diagram of a quad-core processor. Each of the four cores has its own L1 and L2 cache, and the cores share an L3 cache. The chip also contains processor graphics, a system agent & memory controller, and I/O.

Page 6: Introduction to Cache-Oblivious Algorithms

Figure: the memory hierarchy, from registers through the L1, L2 and L3 caches to main memory and disk. Latency increases toward disk; capacity decreases toward the registers.

Page 7: Introduction to Cache-Oblivious Algorithms

Tall Cache Assumption
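In the ideal-cache model, the tall-cache assumption says the cache holds at least roughly as many lines as each line holds words. Writing C for the cache size and B for the block size (the symbols used in the talk's memory-transfer estimates; reading them this way is an assumption), a sketch of the condition and a worked check for a hypothetical 32 KiB cache with 64-byte lines:

```latex
C = \Omega(B^{2}) \quad\Longleftrightarrow\quad \frac{C}{B} = \Omega(B)

% Example: C = 32768 bytes, B = 64 bytes
\frac{C}{B} = \frac{32768}{64} = 512 \text{ lines} \;\ge\; B = 64,
\qquad B^{2} = 4096 \le 32768 = C
```

One practical consequence: a B x B tile of data always fits in cache, which is what many cache-oblivious analyses rely on.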

Page 8: Introduction to Cache-Oblivious Algorithms

Fully Associative
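In a fully associative cache any memory block can be stored in any cache line, so whether an access misses depends only on which blocks are resident, never on address-to-set conflicts. A minimal simulator sketch, assuming LRU replacement (a stand-in, since the model itself assumes optimal replacement) and hypothetical class and method names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Fully associative cache simulator with LRU replacement.
// Any block may occupy any of the `lines` slots; only recency decides eviction.
class FullyAssociativeLRU {
public:
    FullyAssociativeLRU(std::size_t lines, std::size_t block_bytes)
        : lines_(lines), block_bytes_(block_bytes) {
        assert(lines >= 1 && block_bytes >= 1);
    }

    // Simulates one access to a byte address; returns true on a miss
    // (that is, on a memory transfer).
    bool access(std::uint64_t address) {
        std::uint64_t block = address / block_bytes_;
        auto it = where_.find(block);
        if (it != where_.end()) {
            // Hit: move the block to the most-recently-used position.
            order_.splice(order_.begin(), order_, it->second);
            return false;
        }
        ++misses_;
        if (order_.size() == lines_) {
            // Cache full: evict the least-recently-used block (back of list).
            where_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(block);
        where_[block] = order_.begin();
        return true;
    }

    std::size_t misses() const { return misses_; }

private:
    std::size_t lines_;
    std::size_t block_bytes_;
    std::size_t misses_ = 0;
    std::list<std::uint64_t> order_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> where_;
};
```

For example, with two 64-byte lines the byte addresses 0, 8, 64, 128, 0 produce four misses: 8 hits the block loaded by 0, and 128 evicts that least recently used block, so the final access to 0 misses again.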

Page 9: Introduction to Cache-Oblivious Algorithms

Perfect Eviction Policy
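The "perfect" policy of the ideal-cache model is offline-optimal replacement: on a miss, evict the block whose next use lies furthest in the future (Belady's rule). Hardware cannot see the future; the assumption is usually justified by a classic competitive-analysis result that LRU with twice the cache capacity incurs at most about twice as many misses as the optimal policy. A sketch that counts optimal misses for a trace of block numbers (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// Misses of an offline-optimal (Belady) cache holding `lines` blocks.
std::size_t optimal_misses(const std::vector<std::uint64_t>& trace, std::size_t lines) {
    assert(lines >= 1);
    const std::size_t never = std::numeric_limits<std::size_t>::max();

    // next_use[i] = position of the next access to trace[i] after i, or `never`.
    std::vector<std::size_t> next_use(trace.size(), never);
    std::unordered_map<std::uint64_t, std::size_t> seen_at;
    for (std::size_t i = trace.size(); i-- > 0;) {
        auto it = seen_at.find(trace[i]);
        if (it != seen_at.end()) next_use[i] = it->second;
        seen_at[trace[i]] = i;
    }

    std::size_t misses = 0;
    std::unordered_map<std::uint64_t, std::size_t> cached;        // block -> next use
    std::set<std::pair<std::size_t, std::uint64_t>> by_next_use;  // sorted by next use
    for (std::size_t i = 0; i < trace.size(); ++i) {
        std::uint64_t block = trace[i];
        auto it = cached.find(block);
        if (it != cached.end()) {
            by_next_use.erase(std::make_pair(it->second, block));  // hit
        } else {
            ++misses;
            if (cached.size() == lines) {
                // Evict the resident block needed furthest in the future.
                auto victim = std::prev(by_next_use.end());
                cached.erase(victim->second);
                by_next_use.erase(victim);
            }
        }
        cached[block] = next_use[i];
        by_next_use.insert(std::make_pair(next_use[i], block));
    }
    return misses;
}
```

On the trace 1, 2, 3, 1, 2 with two lines, the optimal policy misses 4 times (it evicts block 2 rather than block 1 when block 3 arrives), whereas LRU misses on all 5 accesses.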

Page 10: Introduction to Cache-Oblivious Algorithms

Estimating Memory Transfers

$MT(N) = \Theta(N/B)$, the cost of scanning N contiguous items when each block holds B items.
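To see where the scan bound comes from, a hypothetical helper can count the blocks a contiguous scan touches; in the ideal-cache model each touched block is one memory transfer. Block size is measured in bytes here, and the 16-byte item size in the example matches the talk's `item_t` (two `uint64_t` fields).

```cpp
#include <cassert>
#include <cstddef>

// Number of block_bytes-sized blocks touched when scanning n items of
// item_bytes each, starting at byte address `start`. Aligned scans touch
// ceil(n * item_bytes / block_bytes) blocks; a misaligned start can add one.
std::size_t scan_transfers(std::size_t start, std::size_t n,
                           std::size_t item_bytes, std::size_t block_bytes) {
    assert(item_bytes > 0 && block_bytes > 0);
    if (n == 0) return 0;
    std::size_t first = start / block_bytes;
    std::size_t last = (start + n * item_bytes - 1) / block_bytes;
    return last - first + 1;
}
```

Scanning 1000 16-byte items with 64-byte blocks touches 250 blocks from an aligned start and 251 if the scan starts 32 bytes into a block: in both cases about N/B.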

Page 11: Introduction to Cache-Oblivious Algorithms

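#include <cstddef>
#include <cstdint>

using std::size_t;
using std::uint64_t;

struct item_t {
    uint64_t x;
    uint64_t y;
};

// The pairwise test whose matches are counted; its exact form does not
// matter for the memory-access pattern.
bool predicate(item_t const* i1, item_t const* i2) {
    return ((i1->x + i2->y) * i2->x == (i1->y + i2->x) * i2->y);
}

// Counts the pairs in [begin1, end1) x [begin2, end2) that satisfy the predicate.
size_t kernel(item_t const* begin1, item_t const* end1,
              item_t const* begin2, item_t const* end2) {
    size_t count = 0;
    for (item_t const* pos1 = begin1; pos1 != end1; pos1++) {
        for (item_t const* pos2 = begin2; pos2 != end2; pos2++) {
            if (predicate(pos1, pos2))
                count += 1;
        }
    }
    return count;
}

// Naive version: each iteration compares one item against the entire array,
// so the whole array is streamed from memory once per item.
size_t simple_parallel(item_t const* begin, size_t count, unsigned thread_count) {
    size_t res = 0;
    #pragma omp parallel for reduction(+:res) num_threads(thread_count)
    for (int i = 0; i < (int)count; i += 1)  // signed index, as older OpenMP versions require
        res += kernel(begin + i, begin + i + 1, begin, begin + count);
    return res;
}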

Page 12: Introduction to Cache-Oblivious Algorithms

Memory Transfer Estimate

$MT(N) = \Theta(N^2/B)$
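A short derivation of this estimate, assuming B counts items per block and the array is much larger than the cache, so almost nothing survives in cache from one full scan to the next: `simple_parallel` calls `kernel` once per item, and each call scans all N items.

```latex
MT(N) = \sum_{i=0}^{N-1} \left[ \Theta\!\left(\frac{N}{B}\right) + O(1) \right]
      = \Theta\!\left(\frac{N^{2}}{B}\right)
```

The O(1) term covers loading the single item `begin + i`; it is dominated by the full scan once N is at least B.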

Page 13: Introduction to Cache-Oblivious Algorithms

Figure: chart of time (ms) against number of threads; the time axis runs from 0 to 30000 ms.

Page 14: Introduction to Cache-Oblivious Algorithms

Recursive Approach

Page 15: Introduction to Cache-Oblivious Algorithms

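#include <tbb/task.h>                  // legacy tbb::task API (TBB 2020 and earlier; removed in oneTBB)
#include <tbb/task_scheduler_init.h>

// Counts predicate matches in [begin1, begin1 + count1) x [begin2, begin2 + count2)
// by splitting both ranges in half until a sub-problem is small enough.
class coba_task : public tbb::task {
public:
    coba_task(item_t const* begin1, size_t count1,
              item_t const* begin2, size_t count2, size_t* result)
        : _begin1(begin1), _count1(count1),
          _begin2(begin2), _count2(count2), _result(result) {}

    tbb::task* execute() {
        if (_count1 + _count2 > 256) {
            // Children write into these locals. A TBB task is destroyed once it
            // has run, so reading a child's members after the wait would be a
            // use-after-free.
            size_t ra = 0, rb = 0, rc = 0, rd = 0;
            coba_task& a = *new(allocate_child()) coba_task(
                _begin1, _count1 / 2,
                _begin2, _count2 / 2, &ra);
            coba_task& b = *new(allocate_child()) coba_task(
                _begin1, _count1 / 2,
                _begin2 + _count2 / 2, _count2 - _count2 / 2, &rb);
            coba_task& c = *new(allocate_child()) coba_task(
                _begin1 + _count1 / 2, _count1 - _count1 / 2,
                _begin2 + _count2 / 2, _count2 - _count2 / 2, &rc);
            coba_task& d = *new(allocate_child()) coba_task(
                _begin1 + _count1 / 2, _count1 - _count1 / 2,
                _begin2, _count2 / 2, &rd);
            set_ref_count(5);              // four children plus one for the wait
            spawn(b);
            spawn(c);
            spawn(d);
            spawn_and_wait_for_all(a);     // spawn a, then wait for all four children
            *_result = ra + rb + rc + rd;
        } else {
            *_result = kernel(_begin1, _begin1 + _count1,
                              _begin2, _begin2 + _count2);
        }
        return NULL;
    }

private:
    item_t const* _begin1;
    size_t _count1;
    item_t const* _begin2;
    size_t _count2;
    size_t* _result;
};

size_t recursive_parallel(item_t const* begin, size_t count, unsigned thread_count) {
    tbb::task_scheduler_init init(static_cast<int>(thread_count));  // honour thread_count
    size_t result = 0;
    // Cover all count x count pairs, matching what simple_parallel computes.
    coba_task& a = *new(tbb::task::allocate_root()) coba_task(
        begin, count, begin, count, &result);
    tbb::task::spawn_root_and_wait(a);
    return result;
}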
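The TBB version needs a TBB installation to run. To check that the quadrant split visits every pair exactly once, here is a sequential sketch of the same recursion; `item_t` and `predicate` are copied from the talk, while `count_block`, `recursive_count`, `count_all_pairs` and `predicate_calls` are hypothetical names. It also counts predicate evaluations, which illustrates the point that cache-oblivious algorithms do not improve computational complexity: the recursion performs exactly the same N² evaluations as the nested loop, and only the order in which memory is touched changes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct item_t { std::uint64_t x; std::uint64_t y; };

bool predicate(item_t const* i1, item_t const* i2) {
    return ((i1->x + i2->y) * i2->x == (i1->y + i2->x) * i2->y);
}

static std::size_t predicate_calls = 0;  // instrumentation only

// Plain nested loop over [b1, b1 + n1) x [b2, b2 + n2).
std::size_t count_block(item_t const* b1, std::size_t n1,
                        item_t const* b2, std::size_t n2) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n1; ++i)
        for (std::size_t j = 0; j < n2; ++j) {
            ++predicate_calls;
            if (predicate(b1 + i, b2 + j)) count += 1;
        }
    return count;
}

// Same four-quadrant split and 256-item cutoff as coba_task, run sequentially.
std::size_t recursive_count(item_t const* b1, std::size_t n1,
                            item_t const* b2, std::size_t n2) {
    if (n1 + n2 <= 256) return count_block(b1, n1, b2, n2);
    std::size_t h1 = n1 / 2, h2 = n2 / 2;
    return recursive_count(b1, h1, b2, h2)
         + recursive_count(b1, h1, b2 + h2, n2 - h2)
         + recursive_count(b1 + h1, n1 - h1, b2 + h2, n2 - h2)
         + recursive_count(b1 + h1, n1 - h1, b2, h2);
}

// All ordered pairs of a vector, as simple_parallel counts them.
std::size_t count_all_pairs(const std::vector<item_t>& items) {
    return recursive_count(items.data(), items.size(), items.data(), items.size());
}
```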

Page 16: Introduction to Cache-Oblivious Algorithms

Revised Memory Transfer Estimate

$MT(N) = \Theta(N^2/(CB))$, where C is the cache size in items.
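A sketch of where this estimate comes from, assuming C is the cache capacity and B the block size (both in items) and N much larger than C. The recursion never sees C; the analysis simply looks at the level where both halved ranges together fit in cache, so each range holds about C/2 items. Loading such a subproblem costs Θ(C/B) transfers, after which all of its pairs are evaluated from cache:

```latex
\#\text{subproblems} = \Theta\!\left(\left(\frac{N}{C}\right)^{2}\right),
\qquad
MT(N) = \Theta\!\left(\left(\frac{N}{C}\right)^{2}\right)\cdot\Theta\!\left(\frac{C}{B}\right)
      = \Theta\!\left(\frac{N^{2}}{CB}\right)
```

This ignores the lower-order cost of the recursion above that level, and uses C at least B so that the extra partial block per range is absorbed into Θ(C/B).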

Page 17: Introduction to Cache-Oblivious Algorithms

Figure: chart of time (ms) against number of threads; the time axis runs from 0 to 30000 ms.

Page 18: Introduction to Cache-Oblivious Algorithms

Exploit both spatial and temporal locality.

Page 19: Introduction to Cache-Oblivious Algorithms

Use recursion.

Page 20: Introduction to Cache-Oblivious Algorithms

Optimise your memory transfers.