the history of i/o modelssolves floydโs two-level storage problem (๐๐= 3๐ต๐ต)...
TRANSCRIPT
The History ofI/O Models
Erik Demaine
MASSACHUSETTSINSTITUTE OFTECHNOLOGY
Memory Hierarchies in Practice
CPU Registers Level 1Cache
Level 2Cache
MainMemory
โฆ
Disk
1MB
10GB
100KB100B
1TB
4 ms
14 ns
750 ps
Internet
1EB-1ZB
Outer Space< 1083
Models, Models, ModelsModel Year Blocking Caching Levels SimpleIdealized2-level 1972 โ โ 2 โRed-blue pebble 1981 โ โ 2 โโExternal memory 1987 โ โ 2 โHMM 1987 โ โ โ โBT 1987 ~ โ โ โโ(U)MH 1990 โ โ โ โCache oblivious 1999 โ โ 2โโ โ+
Idealized Two-Level Storage[Floyd โ Complexity of Computer Computations 1972]
PDP-11 (1970โ1990s)
Ken Thompson
Dennis Ritchie
2.4MB RK03 disks
8KB core memory
Idealized Two-Level Storage[Floyd โ Complexity of Computer Computations 1972]
โข RAM = blocks of โค ๐ต๐ต itemsโข Block operation: Read up to ๐ต๐ต items
from two blocks ๐๐, ๐๐ Write to third block ๐๐
โข Ignore item order within block CPU operations considered free
โข Items are indivisible
CPU
๐ต๐ต items
๐๐
๐๐
๐๐
Permutation Lower Bound[Floyd โ Complexity of Computer Computations 1972]
โข Theorem: Permuting ๐๐ items toโ๐๐ ๐ต๐ต (full) specified blocks needs
block operations, in average case Assuming ๐๐
๐ต๐ต> ๐ต๐ต (tall disk)
โข Simplified model: Move items instead of copy Equivalence: Follow itemโs path from start to finish
CPU
๐ต๐ต items
๐๐
๐๐
๐๐
ฮฉ๐๐๐ต๐ต
log๐ต๐ต
Permutation Lower Bound[Floyd โ Complexity of Computer Computations 1972]
โข Potential:
Maximized in target configurationof full blocks (๐๐๐๐๐๐=๐ต๐ต): ฮฆ = ๐๐ log๐ต๐ต
Random configuration with ๐๐๐ต๐ต
> ๐ต๐ตhas E ๐๐๐๐๐๐ = ๐๐(1) โ E ฮฆ = ๐๐(๐๐) Claim: Block operation increases ฮฆ by โค ๐ต๐ต
โ Number of block operations โฅ ๐๐ log ๐ต๐ตโ๐๐(๐๐)๐ต๐ต
๐ต๐ต items
๐๐
๐๐
๐๐
# items in block ๐๐ destined for block ๐๐
ฮฆ = ๏ฟฝ๐๐,๐๐
๐๐๐๐๐๐ log๐๐๐๐๐๐
Permutation Lower Bound[Floyd โ Complexity of Computer Computations 1972]
โข Potential:
Maximized in target configurationof full blocks (๐๐๐๐๐๐=๐ต๐ต): ฮฆ = ๐๐ log๐ต๐ต
Random configuration with ๐๐๐ต๐ต
> ๐ต๐ตhas E ๐๐๐๐๐๐ = ๐๐(1) โ E ฮฆ = ๐๐(๐๐) Claim: Block operation increases ฮฆ by โค ๐ต๐ต
oFact: ๐ฅ๐ฅ + ๐ฆ๐ฆ log ๐ฅ๐ฅ + ๐ฆ๐ฆ โค ๐ฅ๐ฅ log ๐ฅ๐ฅ + ๐ฆ๐ฆ log ๐ฆ๐ฆ + ๐ฅ๐ฅ + ๐ฆ๐ฆo So combining groups ๐ฅ๐ฅ & ๐ฆ๐ฆ increases ฮฆ by โค ๐ฅ๐ฅ + ๐ฆ๐ฆ
๐ต๐ต items
๐๐
๐๐
๐๐
# items in block ๐๐ destined for block ๐๐
ฮฆ = ๏ฟฝ๐๐,๐๐
๐๐๐๐๐๐ log๐๐๐๐๐๐
Permutation Bounds[Floyd โ Complexity of Computer Computations 1972]
โข Theorem: ฮฉ ๐๐๐ต๐ต
log๐ต๐ต Tight for ๐ต๐ต = ๐๐(1)
โข Theorem: ๐๐ ๐๐๐ต๐ต
log ๐๐๐ต๐ต
Similar to radix sort,where key = target block index
Accidental claim: tight for all ๐ต๐ต < ๐๐๐ต๐ต
We will see: tight for ๐ต๐ต > log ๐๐๐ต๐ต
๐ต๐ต items
๐๐
๐๐
๐๐
CPU
Idealized Two-Level Storage[Floyd โ Complexity of Computer Computations 1972]
โข External memory & word RAM: ๐ต๐ต items
๐๐
๐๐
๐๐
Idealized Two-Level Storage[Floyd โ Complexity of Computer Computations 1972]
โข External memory & word RAM:
โข Foreshadowing future models:
๐ต๐ต items
๐๐
๐๐
๐๐
CPU
Red-Blue Pebble Game[Hong & Kung โ STOC 1981]
CPU
๐๐ items
cache disk
TRS-80[1977โ1981]
VIC-20[1980โ1985]
Apple II[1977โ1993]
4KB RAM
5KB RAM
4KB RAM
Pebble Game[Hopcroft, Paul, Valiant โ J. ACM 1977]
โข View computation as DAGof data dependencies
โข Pebble = โin memoryโโข Moves: Place pebble on node
if all predecessors have a pebble Remove pebble from node
โข Goal: Pebbles on all output nodes Minimize maximum number of pebbles over time
inputs:
outputs:
Pebble Game[Hopcroft, Paul, Valiant โ J. ACM 1977]
โข Theorem: Any DAG can be โexecutedโ using ๐๐ ๐๐
log ๐๐maximum pebbles
โข Corollary:
DTIME ๐ก๐ก โ DSPACE๐ก๐ก
log ๐ก๐ก
inputs:
outputs:
Red-Blue Pebble Game[Hong & Kung โ STOC 1981]
โข Red pebble = โin cacheโโข Blue pebble = โon diskโโข Moves: Place red pebble on node if all
predecessors have red pebble Remove pebble from node Write: Red pebble โ blue pebble Read: Blue pebble โ red pebble
โข Goal: Blue inputs to blue outputs โค ๐๐ red pebbles at any time
inputs:
outputs:
minimize
Red-Blue Pebble Game[Hong & Kung โ STOC 1981]
โข Red pebble = โin cacheโโข Blue pebble = โon diskโ
inputs:
outputs:
CPU
๐๐ items
cache disk
minimize number ofcache โ disk I/Os
(memory transfers)
Red-Blue Pebble Game Results[Hong & Kung โ STOC 1981]
Computation DAG Memory Transfers Speedup
Fast Fourier Transform (FFT) ฮ ๐๐ log๐๐ ๐๐ ฮ log๐๐
Ordinary matrix-vector multiplication ฮ
๐๐2
๐๐ฮ ๐๐
Ordinary matrix-matrix multiplication ฮ
๐๐3
๐๐ฮ ๐๐
Odd-even transposition sort ฮ
๐๐2
๐๐ฮ ๐๐
๐๐ ร ๐๐ ร โฏร ๐๐ grid ฮฉ๐๐๐๐
๐๐1/(๐๐โ1) ฮ ๐๐1/(๐๐โ1)
๐๐
Comparison
CPU
๐๐ items
cache disk
CPU
๐ต๐ต items
Idealized two-level storage
[Floyd 1972]
Red-bluepebble game
[Hong & Kung 1981]
blocks but no cache cache but no blocks
I/O Model [Aggarwal & Vitter โ ICALP 1987, C. ACM 1988]
โข AKA: External Memory Model, Disk Access Modelโข Goal: Minimize number of I/Os (memory transfers)
CPU
๐ต๐ต items
๐๐๐ต๐ต
blocks
cache๐ต๐ต itemsdisk
cache and blocks
Scanning[Aggarwal & Vitter โ ICALP 1987, C. ACM 1988]
โข Visiting ๐๐ elements in order costs๐๐ 1 + ๐๐
๐ต๐ตmemory transfers
โข More generally, can run โค ๐๐๐ต๐ต
parallel scans, keeping 1 block per scan in cache E.g., merge ๐๐ ๐๐
๐ต๐ตlists of total size ๐๐
in ๐๐ 1 + ๐๐๐ต๐ต
memory transfers
๐ต๐ต items
Practical Scanning [Arge]
โข Does the ๐ต๐ต factor matter? Should I presort my linked list before traversal?
โข Example: ๐๐ = 256,000,000 ~ 1GB ๐ต๐ต = 8,000 ~ 32KB (small) 1ms disk access time (small)
๐๐ memory transfers take 256,000 sec โ 71 hours
๐๐๐ต๐ต
memory transfers take โ256 8 = ๐๐๐๐ seconds
Searching[Aggarwal & Vitter โ ICALP 1987, C. ACM 1988]
โข Finding an element ๐ฅ๐ฅ among ๐๐ items requires ฮ log๐ต๐ต+1 ๐๐ memory transfers
โข Lower bound: (comparison model) Each block reveals where ๐ฅ๐ฅ fits among ๐ต๐ต items โ Learn โค log ๐ต๐ต + 1 bits per read Need log ๐๐ + 1 bits
โข Upper bound: B-tree Insert & delete
in ๐๐ log๐ต๐ต+1 ๐๐
๐ต๐ต items
Sorting and Permutation[Aggarwal & Vitter โ ICALP 1987, C. ACM 1988]
โข Sorting bound: ฮ ๐๐๐ต๐ต
log โ๐๐ ๐ต๐ต๐๐๐ต๐ต
โข Permutation bound: ฮ min ๐๐, ๐๐๐ต๐ต
log โ๐๐ ๐ต๐ต๐๐๐ต๐ต
Either sort or use naรฏve RAM algorithm Solves Floydโs two-level storage problem (๐๐ = 3๐ต๐ต)
๐ต๐ต items
Sorting Lower Bound[Aggarwal & Vitter โ ICALP 1987, C. ACM 1988]
โข Sorting bound: ฮฉ ๐๐๐ต๐ต
log โ๐๐ ๐ต๐ต๐๐๐ต๐ต
Always keep cache sorted (free) Might as well presort each block Upon reading a block, learn how those ๐ต๐ต items fit
amongst ๐๐ items in cache
โ Learn lg ๐๐+๐ต๐ต๐ต๐ต โผ ๐ต๐ต lg ๐๐
๐ต๐ตbits
Need lg๐๐! โผ ๐๐ lg๐๐ bits Know ๐๐ lg๐ต๐ต bits from block presort
๐ต๐ต items
Sorting Upper Bound[Aggarwal & Vitter โ ICALP 1987, C. ACM 1988]
โข Sorting bound: ๐๐ ๐๐๐ต๐ต
log โ๐๐ ๐ต๐ต๐๐๐ต๐ต
๐๐ ๐๐๐ต๐ต
-way mergesort
๐๐ ๐๐ = ๐๐๐ต๐ต๐๐ ๏ฟฝ๐๐ ๐๐
๐ต๐ต+ ๐๐ 1 + ๐๐
๐ต๐ต
๐๐ ๐ต๐ต = ๐๐ 1๐ต๐ต items
๐๐/๐ต๐ต
1 1 1
๐๐/๐ต๐ต
๐๐/๐ต๐ต
๐๐/๐ต๐ต
log โ๐๐ ๐ต๐ต๐๐๐ต๐ต
levels
๐๐/๐ต๐ต
Distribution Sort[Aggarwal & Vitter โ ICALP 1987, C. ACM 1988]
โข โ๐๐ ๐ต๐ต-way quicksort
1. Find โ๐๐ ๐ต๐ต partition elements,roughly evenly spaced
2. Partition array into โ๐๐ ๐ต๐ต + 1 pieces
Scan: ๐๐ ๐๐๐ต๐ต
memory transfers
3. Recurse Same recurrence as mergesort
๐ต๐ต items
Distribution Sort Partitioning[Aggarwal & Vitter โ ICALP 1987, C. ACM 1988]
1. For first, second, โฆ interval of ๐๐ items: Sort in ๐๐ โ๐๐ ๐ต๐ต memory transfers Sample every 1
4โ๐๐ ๐ต๐ต th item
Total sample: 4๐๐/ โ๐๐ ๐ต๐ต items
2. For ๐๐ = 1,2, โฆ, โ๐๐ ๐ต๐ต: Run linear-time selection to find
sample element at โ๐๐ โ๐๐ ๐ต๐ต fraction
Cost: ๐๐ ๏ฟฝ๐๐โ๐๐ ๐ต๐ต
๐ต๐ต each
Total: ๐๐ โ๐๐ ๐ต๐ต memory transf.
๐ต๐ต items
Random vs. Sequential I/Os [Farach,Ferragina, Muthukrishnan โ FOCS 1998]
โข Sequential memory transfers are part ofbulk read/write of ฮ ๐๐ items
โข Random memory transfer otherwiseโข Sorting: 2-way mergesort achieves ๐๐ ๐๐
๐ต๐ตlog ๐๐
๐ต๐ตsequential
๐๐ ๐๐๐ต๐ต
log โ๐๐ ๐ต๐ต๐๐๐ต๐ต
random implies ฮฉ ๐๐๐ต๐ต
log ๐๐๐ต๐ต
total
โข Same trade-off forsuffix-tree construction
๐ต๐ต items
Hierarchical Memory Model (HMM) [Aggarwal, Alpern, Chandra, Snir โ STOC 1987]
โข Nonuniform-cost RAM: Accessing memory location ๐ฅ๐ฅ costs ๐๐ ๐ฅ๐ฅ = log ๐ฅ๐ฅ
โparticularly simple model of computation that mimics the behavior of a memory hierarchy consisting of increasingly larger amounts of slower memoryโ
1 1 2 4 8 2๐๐1 1 1 1 1CPU
Why ๐๐ ๐๐ = ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ๐ฅ ๐๐? [Mead & Conway 1980]
HMM Upper & Lower Bounds[Aggarwal, Alpern, Chandra, Snir โ STOC 1987]
Problem Time SlowdownSemiring matrix multiplication ฮ ๐๐3 ฮ(1)
Fast Fourier Transform ฮ(๐๐ log๐๐ log log๐๐) ฮ log log๐๐
Sorting ฮ(๐๐ log๐๐ log log๐๐) ฮ log log๐๐Scanning input (sum, max, DFS, planarity, etc.)
ฮ ๐๐ log๐๐ ฮ log๐๐
Binary search ฮ log2 ๐๐ ฮ log๐๐
๐๐๐๐๐๐๐๐(๐๐)[Aggarwal, Alpern, Chandra, Snir โ STOC 1987]
โข Say accessing memory location ๐ฅ๐ฅ costs ๐๐ ๐ฅ๐ฅโข Assume ๐๐ 2๐ฅ๐ฅ โค ๐๐ ๐๐(๐ฅ๐ฅ) for a constant ๐๐ > 0
(โpolynomially boundedโ)โข Write ๐๐ ๐ฅ๐ฅ = โ๐๐ ๐ค๐ค๐๐ โ [๐ฅ๐ฅ โฅ ๐ฅ๐ฅ๐๐? ]
(weighted sum of threshold functions)
๐ฅ๐ฅ0 ๐ฅ๐ฅ1 โ ๐ฅ๐ฅ0๐ค๐ค0CPU ๐ฅ๐ฅ2 โ ๐ฅ๐ฅ1
๐ค๐ค1 ๐ค๐ค2 โฆ0
๐ฅ๐ฅ๐๐
Uniform Optimality [Aggarwal, Alpern, Chandra, Snir โ STOC 1987]
โข Consider one term ๐๐๐๐ ๐ฅ๐ฅ = [๐ฅ๐ฅ โฅ ๐๐? ]โข Algorithm is uniformly optimal
if optimal on HMM๐๐๐๐ ๐ฅ๐ฅ for all ๐๐โข Implies optimality for all ๐๐ ๐ฅ๐ฅ
โCPU ๐๐ 10
๐๐
CPU
๐๐ items
cache disk red-bluepebble game!
๐๐๐๐๐๐๐๐๐ด๐ด ๐๐ Upper & Lower Bounds[Aggarwal, Alpern, Chandra, Snir โ STOC 1987]
Problem Time SpeedupSemiring matrix multiplication ฮ
๐๐3
๐๐ฮ ๐๐
Fast Fourier Transform ฮ ๐๐ log๐๐ ๐๐ ฮ log๐๐
Sorting ฮ ๐๐ log๐๐ ๐๐ ฮ log๐๐Scanning input (sum, max, DFS, planarity, etc.)
ฮ ๐๐ โ๐๐ 1 + 1/๐๐
Binary search ฮ log๐๐ โ log๐๐ 1 + 1/ log๐๐
upper bounds knownby Hong & Kung 1981
other bounds follow fromAggarwal & Vitter 1987
Implicit HMM Memory Management[Aggarwal, Alpern, Chandra, Snir โ STOC 1987]
โข Instead of algorithm explicitly moving data,use any conservative replacement strategy (e.g., FIFO or LRU) to evict from cache[Sleator & Tarjan โ C. ACM 1985]
โข ๐๐LRU ๐๐,๐๐ โค 2 โ ๐๐OPT ๐๐, โ๐๐ 2= ๐๐ ๐๐OPT ๐๐,๐๐ assuming ๐๐ 2๐ฅ๐ฅ โค ๐๐ ๐๐(๐ฅ๐ฅ)
โข Not uniform!
โCPU ๐๐ 10
Implicit HMM Memory Management[Aggarwal, Alpern, Chandra, Snir โ STOC 1987]
โข For general ๐๐, split memory into chunks at ๐ฅ๐ฅwhere ๐๐(๐ฅ๐ฅ) doubles (up to rounding)
๐ฅ๐ฅ0 ๐ฅ๐ฅ1 โ ๐ฅ๐ฅ0๐ค๐ค0CPU ๐ฅ๐ฅ2 โ ๐ฅ๐ฅ1
๐ค๐ค1 ๐ค๐ค2 โฆ0
Implicit HMM Memory Management[Aggarwal, Alpern, Chandra, Snir โ STOC 1987]
โข For general ๐๐, split memory into chunks at ๐ฅ๐ฅwhere ๐๐(๐ฅ๐ฅ) doubles (up to rounding)
โข LRU eviction from first chunk into second;LRU eviction from second chunk into third; etc.
โข ๐๐๐ฟ๐ฟ๐ฟ๐ฟ๐ฟ๐ฟ ๐๐ = ๐๐ ๐๐๐๐๐๐๐๐ ๐๐ + ๐๐ โ ๐๐ ๐๐ Like MTF
1 1 2 4 8 2๐๐1 1 1 1 1CPU
HMM with Block Transfer (BT) [Aggarwal, Chandra, Snir โ FOCS 1987]
โข Accessing memory location ๐ฅ๐ฅ costs ๐๐ ๐ฅ๐ฅโข Copying memory interval from ๐ฅ๐ฅ โ ๐ฟ๐ฟโฆ ๐ฅ๐ฅ
to ๐ฆ๐ฆ โ ๐ฟ๐ฟโฆ๐ฆ๐ฆ costs ๐๐ max ๐ฅ๐ฅ, ๐ฆ๐ฆ + ๐ฟ๐ฟ Models memory pipelining ~ block transfer Ignores block alignment, explicit levels, etc.
1 1 2 4 8 2๐๐1 1 1 1 1CPU
BT Results[Aggarwal, Chandra, Snir โ FOCS 1987]
Problem ๐๐ ๐ฅ๐ฅ = log ๐ฅ๐ฅ ๐๐ ๐ฅ๐ฅ = ๐ฅ๐ฅ๐ผ๐ผ, 0 < ๐ผ๐ผ < 1 ๐๐ ๐ฅ๐ฅ = ๐ฅ๐ฅ ๐๐ ๐ฅ๐ฅ = ๐ฅ๐ฅ๐ผ๐ผ,
๐ผ๐ผ > 1Dot product, merging lists ฮ ๐๐ logโ ๐๐ ฮ ๐๐ log log๐๐ ฮ(๐๐ log๐๐) ฮ ๐๐๐ผ๐ผ
Matrix mult. ฮ ๐๐3 ฮ ๐๐3 ฮ ๐๐3 ฮ ๐๐๐ผ๐ผ
if ๐ผ๐ผ > 1.5Fast Fourier Transform ฮ ๐๐ log๐๐ ฮ ๐๐ log๐๐ ฮ ๐๐ log2 ๐๐ ฮ ๐๐๐ผ๐ผ
Sorting ฮ ๐๐ log๐๐ ฮ ๐๐ log๐๐ ฮ ๐๐ log2 ๐๐ ฮ ๐๐๐ผ๐ผ
Binary search ฮ
log2 ๐๐log log๐๐ ฮ ๐๐๐ผ๐ผ ฮ ๐๐ ฮ ๐๐๐ผ๐ผ
Memory Hierarchy Model (MH) [Alpern, Carter, Feig, Selker โ FOCS 1990]
โข Multilevel version of external-memory modelโข ๐๐๐๐ โ ๐๐๐๐+1 transfers happen in blocks of size ๐ต๐ต๐๐
(subblocks of ๐๐๐๐+1), and take ๐ก๐ก๐๐ timeโข All levels can be actively transferring at once
๐๐0 ๐๐1 ๐๐2 ๐๐3 ๐๐40 ๐ก๐ก0 ๐ก๐ก1 ๐ก๐ก2 ๐ก๐ก3CPU
๐ต๐ต0 ๐ต๐ต1๐ต๐ต2
๐ต๐ต3๐ต๐ต4
โฆ
Uniform Memory Hierarchy (UMH) [Alpern, Carter, Feig, Selker โ FOCS 1990]
โข Fix aspect ratio ๐ผ๐ผ = ๐๐/๐ต๐ต๐ต๐ต
, block growth ๐ฝ๐ฝ = ๐ต๐ต๐๐+1๐ต๐ต๐๐
โข ๐ต๐ต๐๐ = ๐ฝ๐ฝ๐๐
โข ๐๐๐๐๐ต๐ต๐๐
= ๐ผ๐ผ โ ๐ฝ๐ฝ๐๐
โข ๐ก๐ก๐๐ = ๐ฝ๐ฝ๐๐ โ ๐๐(๐๐)
๐๐0 ๐๐1 ๐๐2 ๐๐3 ๐๐40 ๐ก๐ก0 ๐ก๐ก1 ๐ก๐ก2 ๐ก๐ก3CPU
๐ต๐ต0 ๐ต๐ต1๐ต๐ต2
๐ต๐ต3๐ต๐ต4
โฆ
2 parameters1 function
UMH Results[Alpern, Carter, Feig, Selker โ FOCS 1990]Problem Upper Bound Lower Bound
Matrix transpose๐๐ ๐๐ = 1 ๐๐ 1 +
1๐ฝ๐ฝ2
๐๐2 ฮฉ 1 +1๐ผ๐ผ๐ฝ๐ฝ4
๐๐2
Matrix mult.๐๐ ๐๐ = O ๐ฝ๐ฝ๐๐ ๐๐ 1 +
1๐ฝ๐ฝ3
๐๐3 ฮฉ 1 +1๐ฝ๐ฝ3
๐๐3
FFT๐๐ ๐๐ โค ๐๐ ๐๐ 1 ฮฉ 1
General approach: Divide & conquer
Cache-Oblivious Model [Frigo, Leiserson, Prokop, Ramachandran โ FOCS 1999]
โข Analyze RAM algorithm (not knowing ๐ต๐ต or ๐๐) on external-memory model Must work well for all ๐ต๐ต and ๐๐
CPU
๐ต๐ต items
๐๐๐ต๐ต
blocks
cache๐ต๐ต itemsdisk
Cache-Oblivious Model [Frigo,Leiserson, Prokop, Ramachandran โ FOCS 1999]
โข Automatic block transfers via LRU or FIFOโข Lose factor of 2 in ๐๐ and number of transfers Assume ๐๐ ๐ต๐ต, 2๐๐ โค ๐๐ ๐๐(๐ต๐ต,๐๐)
CPU
๐ต๐ต items
๐๐๐ต๐ต
blocks
cache๐ต๐ต itemsdisk
Cache-Oblivious Model [Frigo,Leiserson, Prokop, Ramachandran โ FOCS 1999]
โข Clean modelโข Adapts to changing ๐ต๐ต (e.g., disk tracks) and
changing ๐๐ (e.g., competing processes)โข Adapts to multilevel memory hierarchy (MH) Assuming inclusion
๐๐0 ๐๐1 ๐๐2 ๐๐3 ๐๐40 ๐ก๐ก0 ๐ก๐ก1 ๐ก๐ก2 ๐ก๐ก3CPU
๐ต๐ต0 ๐ต๐ต1๐ต๐ต2
๐ต๐ต3๐ต๐ต4
โฆ
Scanning [Frigo, Leiserson, Prokop, Ramachandran โ FOCS 1999]
โข Visiting ๐๐ elements in order costs๐๐ 1 + ๐๐
๐ต๐ตmemory transfers
โข More generally, can run ๐๐ 1 parallel scans Assume ๐๐ โฅ ๐๐ ๐ต๐ต for appropriate constant ๐๐ > 0
โข E.g., merge two lists in ๐๐ ๐๐๐ต๐ต
๐ต๐ต items
Searching [Prokop โ Meng 1999]
โvan Emde Boas layoutโ
Searching [Prokop โ Meng 1999]
โข height(โณ)
โฅ12
lg๐ต๐ต
โข โค 2 memory transfers per โณ
โข โค 4 log๐ต๐ต ๐๐ total
Cache-Oblivious Searching
โข lg ๐๐ + ๐๐ 1 log๐ต๐ต ๐๐ is optimal[Bender, Brodal, Fagerberg, Ge, He, Hu, Iacono,Lรณpez-Ortiz โ FOCS 2003]
โข Dynamic B-tree in ๐๐ log๐ต๐ต ๐๐ per operation[Bender, Demaine, Farach-Colton โ FOCS 2000][Bender, Duan, Iacono, Wu โ SODA 2002][Brodal,Fagerberg,Jacob โSODA 2002]
Cache-Oblivious Sorting
โข ๐๐ ๐๐๐ต๐ต
log โ๐๐ ๐ต๐ต๐๐๐ต๐ต
possible, assuming๐๐ โฅ ฮฉ ๐ต๐ต1+๐๐ (tall cache) Funnel sort:
mergesort analog Distribution sort[Frigo, Leiserson, Prokop,Ramachandran โ FOCS 1999;Brodal & Fagerberg โ ICALP 2002]
โข Impossible without tall-cache assumption[Brodal & Fagerberg โ STOC 2003]
http://courses.csail.mit.edu/6.851/
Models, Models, ModelsModel Year Blocking Caching Levels SimpleIdealized2-level 1972 โ โ 2 โRed-blue pebble 1981 โ โ 2 โโExternal memory 1987 โ โ 2 โHMM 1987 โ โ โ โBT 1987 ~ โ โ โโ(U)MH 1990 โ โ โ โCache oblivious 1999 โ โ 2โโ โ+