Introduction to Cache-Oblivious Algorithms

Post on 02-Jul-2015


DESCRIPTION

In today's world, developers face the problem of writing high-performance algorithms that scale efficiently across a range of multi-core processors. Traditional blocked algorithms must be tuned to each processor, but cache-oblivious algorithms give developers new tools to tackle this challenge. In this talk you will learn about the external memory model, the cache-oblivious model, and how to use these tools to create faster, more scalable algorithms.

TRANSCRIPT

Introduction To Cache-Oblivious Algorithms
by Christopher Gilbert

http://www.twitter.com/bigdatadev

http://www.github.com/bigdatadev

http://www.cjgilbert.me

“A cache-oblivious algorithm is not oblivious to cache memory”

(However, it is oblivious to the size of the cache)

“Cache-oblivious algorithms are effective on any system, regardless of memory hierarchy”

“Cache-oblivious algorithms do not improve computational complexity.”

(But they still improve performance)

[Diagram: a modern multi-core processor die — four cores, each with private L1 and L2 caches, a shared L3 cache, processor graphics, and a system agent with memory controller and I/O.]

[Diagram: the memory hierarchy — Register, L1 Cache, L2 Cache, L3 Cache, Memory, Disk — with latency increasing toward the bottom and capacity decreasing toward the top.]

Tall Cache Assumption

Fully Associative

Perfect Eviction Policy

MT(N) = Θ(N/B)
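As a concrete illustration of the Θ(N/B) scan bound, here is a minimal arithmetic sketch (the helper name mt_scan is mine, not from the talk): a linear scan of n contiguous elements, with b elements per cache line, touches ceil(n / b) lines.

```cpp
#include <cstddef>

// Estimated memory transfers for a linear scan of n contiguous
// elements, with b elements per cache line: ceil(n / b) lines touched.
std::size_t mt_scan(std::size_t n, std::size_t b) {
    return (n + b - 1) / b;
}
```

For example, scanning one million 16-byte items on a 64-byte cache line (b = 4) touches 250,000 lines.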

Estimating Memory Transfers

struct item_t {
    uint64_t x;
    uint64_t y;
};

bool predicate(item_t const* i1, item_t const* i2) {
    return ((i1->x + i2->y) * i2->x == (i1->y + i2->x) * i2->y);
}

size_t kernel(item_t const* begin1, item_t const* end1,
              item_t const* begin2, item_t const* end2) {
    size_t count = 0;
    for (item_t const* pos1 = begin1; pos1 != end1; pos1++) {
        for (item_t const* pos2 = begin2; pos2 != end2; pos2++) {
            if (predicate(pos1, pos2))
                count += 1;
        }
    }
    return count;
}

size_t simple_parallel(item_t const* begin, size_t count, unsigned thread_count) {
    size_t res = 0;
    #pragma omp parallel for reduction(+:res) num_threads(thread_count)
    for (int i = 0; i < (int)count; i += 1)
        res += kernel(begin + i, begin + i + 1, begin, begin + count);
    return res;
}
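For checking the parallel versions, a serial reference can be useful. This sketch (count_pairs is a hypothetical helper, not from the talk; item_t and predicate are restated so it compiles standalone) computes the same all-pairs count directly:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// item_t and predicate restated from the talk so the sketch is standalone.
struct item_t { std::uint64_t x; std::uint64_t y; };

bool predicate(item_t const* i1, item_t const* i2) {
    return (i1->x + i2->y) * i2->x == (i1->y + i2->x) * i2->y;
}

// Serial all-pairs count of the whole range against itself; this is
// exactly the work simple_parallel distributes across threads.
std::size_t count_pairs(std::vector<item_t> const& items) {
    std::size_t count = 0;
    for (item_t const& a : items)
        for (item_t const& b : items)
            if (predicate(&a, &b))
                count += 1;
    return count;
}
```

Two identical items {1, 1} satisfy the predicate in all four orderings, so count_pairs({{1, 1}, {1, 1}}) yields 4.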

Memory Transfer Estimate

MT(N) = Θ(N²/B)
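The estimate can be sanity-checked with a little arithmetic (mt_naive is my name for it, assuming neither range survives in cache between outer iterations): the inner range of n items is streamed once per outer item.

```cpp
// Transfer estimate for the naive nested loops: n passes over the
// inner range, each costing about n / b cache-line transfers, on the
// assumption that nothing is retained in cache between passes.
double mt_naive(double n, double b) {
    return n * (n / b);
}
```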

[Chart: execution time (ms) against number of threads for simple_parallel; the time axis runs from 0 to 30000 ms.]

Recursive Approach

// Uses the (legacy) TBB low-level task interface; the members and
// constructor, elided on the slide, are restated here.
class coba_task : public task {
public:
    coba_task(item_t const* begin1, size_t count1,
              item_t const* begin2, size_t count2)
        : _begin1(begin1), _count1(count1),
          _begin2(begin2), _count2(count2), _result(0) {}

    size_t result() const { return _result; }

    task* execute() {
        if (_count1 + _count2 > 256) {
            // Split both ranges in half, yielding four quadrant sub-tasks.
            coba_task& a = *new(allocate_child()) coba_task(
                _begin1, _count1 / 2,
                _begin2, _count2 / 2);
            coba_task& b = *new(allocate_child()) coba_task(
                _begin1, _count1 / 2,
                _begin2 + _count2 / 2, _count2 - _count2 / 2);
            coba_task& c = *new(allocate_child()) coba_task(
                _begin1 + _count1 / 2, _count1 - _count1 / 2,
                _begin2 + _count2 / 2, _count2 - _count2 / 2);
            coba_task& d = *new(allocate_child()) coba_task(
                _begin1 + _count1 / 2, _count1 - _count1 / 2,
                _begin2, _count2 / 2);
            set_ref_count(5);
            spawn(b);
            spawn(c);
            spawn(d);
            spawn_and_wait_for_all(a);
            _result = a.result() + b.result() + c.result() + d.result();
        } else {
            _result = kernel(_begin1, _begin1 + _count1,
                             _begin2, _begin2 + _count2);
        }
        return NULL;
    }

private:
    item_t const* _begin1;
    size_t _count1;
    item_t const* _begin2;
    size_t _count2;
    size_t _result;
};

// Note: thread_count is unused here; the TBB scheduler manages threads.
size_t recursive_parallel(item_t const* begin, size_t count, unsigned thread_count) {
    // The root task must span the full range against itself to count the
    // same pairs as simple_parallel (the slide's root covered only the
    // first half against the second half).
    coba_task& a = *new(task::allocate_root()) coba_task(
        begin, count, begin, count);
    task::spawn_root_and_wait(a);
    return a.result();
}
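The same four-way split can be sketched without TBB. Here coba_count is my serial rendering of coba_task's logic (same 256-item cutoff), which makes the recursion easy to test in isolation:

```cpp
#include <cstddef>
#include <cstdint>

struct item_t { std::uint64_t x; std::uint64_t y; };

bool predicate(item_t const* i1, item_t const* i2) {
    return (i1->x + i2->y) * i2->x == (i1->y + i2->x) * i2->y;
}

std::size_t kernel(item_t const* b1, item_t const* e1,
                   item_t const* b2, item_t const* e2) {
    std::size_t count = 0;
    for (item_t const* p1 = b1; p1 != e1; ++p1)
        for (item_t const* p2 = b2; p2 != e2; ++p2)
            if (predicate(p1, p2))
                ++count;
    return count;
}

// Serial version of the 4-way recursive split: once a sub-problem is
// small enough to fit in some level of cache, fall back to the plain
// kernel. The four quadrants partition the full cross product.
std::size_t coba_count(item_t const* b1, std::size_t n1,
                       item_t const* b2, std::size_t n2) {
    if (n1 + n2 <= 256)
        return kernel(b1, b1 + n1, b2, b2 + n2);
    std::size_t h1 = n1 / 2, h2 = n2 / 2;
    return coba_count(b1,      h1,      b2,      h2)
         + coba_count(b1,      h1,      b2 + h2, n2 - h2)
         + coba_count(b1 + h1, n1 - h1, b2 + h2, n2 - h2)
         + coba_count(b1 + h1, n1 - h1, b2,      h2);
}
```

Because the quadrants are disjoint and cover every pair, the result matches the naive nested loops regardless of where the recursion bottoms out.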

Revised Memory Transfer Estimate

MT(N) = Θ(N²/CB)
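A back-of-envelope version of the revised bound (mt_blocked is mine; C is read here as the cache capacity in items, which the slide does not spell out): once the recursion bottoms out in sub-problems of c items per side that fit in cache, there are (n/c)² of them, and each streams two runs of c contiguous items.

```cpp
// Transfer estimate for the recursive version: (n/c)^2 cache-sized
// sub-problems, each loading two runs of c contiguous items, i.e.
// 2c / b line transfers -- about 2 n^2 / (c b) in total.
double mt_blocked(double n, double c, double b) {
    return (n / c) * (n / c) * (2.0 * c / b);
}
```

Compared with the naive n²/b estimate for the same n, this is a factor of roughly c/2 fewer transfers, which is where the speedup comes from despite identical operation counts.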

[Chart: execution time (ms) against number of threads for recursive_parallel; the time axis runs from 0 to 30000 ms.]

Exploit both spatial and temporal locality.

Use recursion.

Optimise your memory transfers.
