equivalence between priority queues and sorting in external memory

Equivalence Between Priority Queues and Sorting in External Memory. Zhewei Wei (Renmin University of China; MADALGO, Aarhus University) and Ke Yi (The Hong Kong University of Science and Technology)

Post on 06-Jan-2018

TRANSCRIPT

Page 1: Equivalence Between Priority Queues and Sorting in External Memory

Equivalence Between Priority Queues and Sorting in External Memory

Zhewei Wei
Renmin University of China
MADALGO, Aarhus University

Ke Yi
The Hong Kong University of Science and Technology

Page 2: Equivalence Between Priority Queues and Sorting in External Memory

Priority Queue

• Maintain a set of keys
• Support insertions, deletions, and findmin (deletemin)
• Fundamental data structure
• Used as a subroutine in greedy algorithms
  – Dijkstra’s single-source shortest path algorithm
  – Prim’s minimum spanning tree algorithm

Page 3: Equivalence Between Priority Queues and Sorting in External Memory

Sorting to Priority Queue

• A priority queue can do sorting
• Given N unsorted keys
  – Insert the keys into the priority queue
  – Perform N deletemin operations (find the minimum and delete it)
• If a priority queue supports insertion, deletion, and findmin in S(N) time, then the sorting algorithm runs in O(N*S(N)) time.
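The reduction from sorting to a priority queue can be sketched in a few lines. This is only an in-memory illustration that assumes Python's standard `heapq` binary heap as the priority queue; the function name `pq_sort` is hypothetical and does not come from the slides.

```python
import heapq

def pq_sort(keys):
    """Sort by inserting every key into a priority queue, then calling deletemin N times."""
    pq = []
    for k in keys:
        heapq.heappush(pq, k)  # one insertion per key
    # N deletemin operations return the keys in increasing order.
    return [heapq.heappop(pq) for _ in range(len(pq))]

print(pq_sort([5, 1, 4, 2, 3]))
```

With a binary heap each operation costs O(log N), so this instance of the reduction runs in O(N log N) time, matching the O(N*S(N)) bound with S(N) = log N.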

Page 4: Equivalence Between Priority Queues and Sorting in External Memory

Priority Queue to Sorting

• Thorup [2007]: sorting can do priority queue!

A sorting algorithm sorts N keys in N*S(N) time in the RAM model
-> (use the sorting algorithm as a black box) ->
A priority queue supports all operations in O(S(N)) time

• O(N loglog N) sorting -> O(loglog N) priority queue

• O() sorting -> O() priority queue

Page 5: Equivalence Between Priority Queues and Sorting in External Memory

The I/O Model [Aggarwal and Vitter 1988]

[Figure: CPU, memory of size M, and disk of unlimited size; data moves between memory and disk in blocks of size B]

• Complexity: # of block transfers (I/Os)
• CPU computations and memory accesses are free

Page 6: Equivalence Between Priority Queues and Sorting in External Memory

Cache-Oblivious Model

[Figure: CPU, memory of unknown size, and disk of unlimited size; the block size is also unknown]

• Optimal without knowledge of M and B
• Optimal for all M and B

Page 7: Equivalence Between Priority Queues and Sorting in External Memory

Sorting in the I/O Model

• Sorting bound: Sort(N) = Θ((N/B) * log_{M/B} N) I/Os
• Upper bound: external merge sort
• Lower bound: holds for the comparison model or under the indivisibility assumption (treat keys as atoms)
• Conjecture: the lower bound holds for B not too small, even without the indivisibility assumption
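To get a feel for the sorting bound, the sketch below counts block transfers for a textbook external merge sort. The variant is an assumption, not something specified in the slides: runs of M keys are formed in memory, then merged with fan-in M/B - 1, and every pass reads and writes each block once. The names `ceil_div` and `merge_sort_ios` are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def merge_sort_ios(n, m, b):
    """Block transfers of a textbook external merge sort (assumed variant).

    n keys, memory holding m keys, blocks holding b keys.
    """
    if m < 3 * b:
        raise ValueError("need at least two input blocks and one output block in memory")
    blocks = ceil_div(n, b)
    runs = ceil_div(n, m)   # run formation: sort memory-sized chunks
    fan_in = m // b - 1     # one memory block is reserved for output
    passes = 0
    while runs > 1:         # about log_{M/B}(N/M) merge passes
        runs = ceil_div(runs, fan_in)
        passes += 1
    return 2 * blocks * (1 + passes)  # each pass reads and writes every block

print(merge_sort_ios(1_000_000, 10_000, 100))
```

The count grows as (N/B) times the number of passes, which is the shape of the Sort(N) bound.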

Page 8: Equivalence Between Priority Queues and Sorting in External Memory

Priority Queue in External Memory

• I/O model
  – Buffer tree [Arge 1995]
  – M/B-ary heaps [Fadel et al. 1999]
  – Array heaps [Brodal and Katajainen 1998]
• O((1/B) * log_{M/B} N) amortized cost
• Tree-based: do not give any priority queue-to-sorting reduction

Page 9: Equivalence Between Priority Queues and Sorting in External Memory

Priority Queue in External Memory

• Cache-oblivious priority queue [Arge et al. 2002]
  – O((1/B) * log_{M/B} N) with the tall cache assumption M > B^2
  – Keys are moving around in loglog N levels

• Reduction: Given an external sorting algorithm that sorts N keys in N*S(N)/B I/Os, there is an external priority queue that supports all operations in O(S(N) loglog N / B) amortized I/Os

Page 10: Equivalence Between Priority Queues and Sorting in External Memory

Our Results

A sorting algorithm sorts N keys in N*S(N)/B I/Os in the I/O model
-> (use the sorting algorithm as a black box) ->
A priority queue supports all operations in (1/B) * Σ_{i≥0} S(B*log^(i)(N/B)) amortized I/Os

The sum expands to S(N) + S(B*log N) + S(B*loglog N) + …

• O(S(N)/B) for S(N) = Ω(2^(log* N)), or M = Ω(B*log^(c) N)
• Otherwise O((S(N) log* N)/B)
• No new bounds for external priority queues
• External priority queue lower bound -> external sorting lower bound
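The sum in the bound can be evaluated numerically for a given S. The sketch below is only illustrative and makes two assumptions that the slides do not state: logarithms are base 2, and the iteration stops once the iterated logarithm drops to 1 (the slides use an unspecified constant). The names `layer_sizes` and `amortized_ios` are hypothetical.

```python
import math

def layer_sizes(n, b):
    """Arguments B*log^(i)(N/B) for i = 0, 1, 2, ... while the iterated log exceeds 1."""
    sizes = []
    x = n / b
    while x > 1:
        sizes.append(b * x)  # i = 0 gives B * (N/B) = N
        x = math.log2(x)     # next iterated logarithm (base 2 is an assumption)
    return sizes

def amortized_ios(n, b, s):
    """Evaluate (1/B) * sum_i S(B*log^(i)(N/B)) for a per-key sorting cost function s."""
    return sum(s(size) for size in layer_sizes(n, b)) / b

print(layer_sizes(64 * 65536, 64))
print(amortized_ios(64 * 65536, 64, lambda size: 1.0))
```

The number of terms is about log* N, which is where the O((S(N) log* N)/B) bound comes from when every term is charged S(N).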

Page 11: Equivalence Between Priority Queues and Sorting in External Memory

Outline

• How Thorup did it (on a high level)

• How we extend it in external memory (on a high level)

• Open problems

Page 12: Equivalence Between Priority Queues and Sorting in External Memory

Thorup’s Reduction

• Word RAM model:
  – each word consists of w ≥ log N bits
  – constant number of registers, each with capacity for one word

• Atomic heap [Han 2004]: supports insertions, deletions, and predecessor queries in a set of O(log^2 N) size in constant time

Page 13: Equivalence Between Priority Queues and Sorting in External Memory

Thorup’s Reduction – O(S(N)*log N)

[Figure: O(log N) levels of geometrically increasing size: c keys (the head), 2c keys, …, N/4 keys, N/2 keys, N keys]

• Keep the min in the head
• Invariant: keys in a higher level are larger than keys in a lower level

Page 14: Equivalence Between Priority Queues and Sorting in External Memory

Thorup’s Reduction – O(S(N)*log N)

[Figure: O(log N) levels: c keys, 2c keys, …, N/4 keys, N/2 keys, N keys]

• Rebalance cost for level 2^j: 2^j * S(N)
• # of sorts in N updates: N/2^j
• Amortized cost in level 2^j: S(N)
• log N levels

Cost: O(S(N) * log N)
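The arithmetic behind this cost can be written out as a short worked equation. It only restates the quantities on the slide for the level holding 2^j keys over a sequence of N updates; no new bound is claimed.

```latex
\[
\underbrace{2^{j}\,S(N)}_{\text{cost of one rebalance}}
\cdot
\underbrace{\frac{N}{2^{j}}}_{\text{rebalances in } N \text{ updates}}
= N\,S(N)
\quad\Longrightarrow\quad
\text{amortized cost per update at this level} = S(N)
\]
\[
\underbrace{S(N) + S(N) + \cdots + S(N)}_{O(\log N)\ \text{levels}}
= O\bigl(S(N)\log N\bigr)
\]
```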

Page 15: Equivalence Between Priority Queues and Sorting in External Memory

Thorup’s Reduction

[Figure: O(log N) levels, each split into base sets of size log N: 1 base set, 2 base sets, …, N/(4 log N) base sets, N/(2 log N) base sets, N/log N base sets]

• Split/merge base sets: S(N) amortized
• Rebalancing level 2^j: 2^j * S(N)/log N
• # of rebalances in N updates: N/2^j
• Amortized cost for level 2^j: S(N)/log N
• O(log N) levels: O(S(N)) amortized cost

Page 16: Equivalence Between Priority Queues and Sorting in External Memory

Thorup’s Reduction

[Figure: the same levels of base sets, with an atomic heap of size log N at the head (O(1) cost)]

• Split/merge base sets: S(N) amortized
• Rebalancing level 2^j: 2^j * S(N)/log N
• # of rebalances in N updates: N/2^j
• Amortized cost for level 2^j: S(N)/log N
• O(S(N)) amortized cost

Page 17: Equivalence Between Priority Queues and Sorting in External Memory

Thorup’s Reduction

[Figure: atomic heap of size log N at the head (O(1) cost); levels with 1, 2, …, N/(4 log N), N/(2 log N), N/log N base sets; level buffers of size N/(4 log N), N/(2 log N), N/log N (O(S(N)) amortized cost)]

Amortized cost: O(S(N))

Page 18: Equivalence Between Priority Queues and Sorting in External Memory

Externalize Thorup’s Reduction

• Where does B come in?

• How to replace atomic heap?

• How to handle deletions in external memory?

Page 19: Equivalence Between Priority Queues and Sorting in External Memory

Where does B come in?

[Figure: buffer of size B*log N at the head; levels with 1, 2, …, N/(4B log N), N/(2B log N), N/(B log N) base sets, each of size B*log N; level buffers of size N/(4 log N), N/(2 log N), N/log N]

Page 20: Equivalence Between Priority Queues and Sorting in External Memory

I/O-efficient Flush Operation

[Figure: a buffer of size R above k substructures]

• Sort keys in buffer: O(R*S(R)/B)
• Distribute keys to k substructures: O(R/B + k)

Total I/O cost: O(R*S(N)/B + k)

• If k = O(R/B), the total flush cost is O(R*S(N)/B) and the amortized cost is O(S(N)/B)
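The two steps of a flush can be sketched in memory as follows. This is a simplified illustration under stated assumptions: Python's built-in `sorted` stands in for the black-box sorting algorithm, the k substructures are identified by k - 1 splitter keys, and the returned I/O figure is just the slide's R/B + k estimate for the distribution step. The name `flush` and its parameters are hypothetical.

```python
import bisect

def flush(buffer, splitters, b):
    """Sort the buffer, then route each key to one of k = len(splitters) + 1 substructures.

    Returns the k key lists and the distribution estimate ceil(R/B) + k.
    """
    keys = sorted(buffer)  # stands in for the black-box external sort
    parts = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # Keys arrive in sorted order, so each substructure receives one contiguous run.
        parts[bisect.bisect_right(splitters, key)].append(key)
    distribute_ios = -(-len(keys) // b) + len(parts)  # ceil(R/B) scan + one access per substructure
    return parts, distribute_ios

parts, ios = flush([9, 2, 7, 4, 1], splitters=[3, 6], b=2)
print(parts, ios)
```

Because the keys are sorted first, the distribution is a single scan, which is why the additive k term only matters when there are more substructures than buffer blocks.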

Page 21: Equivalence Between Priority Queues and Sorting in External Memory

Where does B come in?

[Figure: a level with 2^j/(B log N) base sets and a buffer of size 2^j/log N; the head has size B*log N]

If a level holds 2^j keys:
• Largest buffer size: 2^j/log N
• Largest # of base sets: 2^j/(B log N)
• Smallest base set (head) size: B*log N

Amortized I/O cost for flushing level buffers: O(S(N)/B)
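The reason these parameters give an O(S(N)/B) amortized flush cost can be checked against the flush bound O(R*S(N)/B + k), with R the buffer size and k the number of base sets. The short derivation below uses only the quantities listed above.

```latex
\[
R = \frac{2^{j}}{\log N},
\qquad
k = \frac{2^{j}}{B\log N} = \frac{R}{B}
\]
\[
O\!\left(\frac{R\,S(N)}{B} + k\right)
= O\!\left(\frac{R\,S(N)}{B} + \frac{R}{B}\right)
= O\!\left(\frac{R\,S(N)}{B}\right)
\quad\Longrightarrow\quad
\text{amortized cost per key} = O\!\left(\frac{S(N)}{B}\right)
\]
```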

Page 22: Equivalence Between Priority Queues and Sorting in External Memory

Replacing Atomic Heap

[Figure: a buffer of size B*log N at the head, flushed with R = B*log N and k = log N]

Page 23: Equivalence Between Priority Queues and Sorting in External Memory

Replacing Atomic Heap

[Figure: buffer of size B*log N; head of size O(B log N)]

Amortized I/O cost: O(S(N)/B)

Recursively build the structure in the head

Page 24: Equivalence Between Priority Queues and Sorting in External Memory

Recursively Build Layers

[Figure: O(log* N) layers holding N keys, B*log(N/B) keys, B*loglog(N/B) keys, …, 2^c*B keys, cB keys]

Levels rebalancing
- Move base sets around
- Redistribute buffer
- S(N)/(B log N) for one level
- S(N)/B for one layer
- S(N) log* N / B amortized I/O cost

Page 25: Equivalence Between Priority Queues and Sorting in External Memory

Recursively Build Layers

[Figure: O(log* N) layers holding N keys, B*log(N/B) keys, B*loglog(N/B) keys, …, 2^c*B keys, cB keys]

Layers rebalancing
- Rebuild the first (last) level
- S(N)/B for one layer
- S(N) log* N / B amortized I/O cost

Page 26: Equivalence Between Priority Queues and Sorting in External Memory

Recursively Build Layers

[Figure: O(log* N) layers holding N keys, B*log(N/B) keys, B*loglog(N/B) keys, …, 2^c*B keys, cB keys]

Page 27: Equivalence Between Priority Queues and Sorting in External Memory

Recursively Build Layers

[Figure: the same layers, with a memory buffer of size O(B) on top; it is flushed with R = B and k = log* N]

Page 28: Equivalence Between Priority Queues and Sorting in External Memory

Recursively Build Layers

[Figure: the same layers, with a memory buffer of size O(B) on top]

Amortized cost: log* N/B

I/O cost per update: O(S(N) log* N / B)

Page 29: Equivalence Between Priority Queues and Sorting in External Memory

Handle Deletions

• Following a pointer to perform a deletion takes 1 I/O per deletion

• Deleting signals: Delete x -> Insert (-, x)

• Perform the actual deletion afterwards
• Unlike the buffer tree, we don’t have access to the “leaves” (base sets)
• Invariant: only process deleting signals in the head
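The idea of deleting signals can be illustrated with an in-memory analogy. The sketch below is not the external structure from the talk: it assumes distinct keys, assumes a key is present when it is deleted, and uses Python's `heapq` in place of the layered structure. The class `SignalPQ` and the `SIGNAL`/`KEY` tags are hypothetical names.

```python
import heapq

SIGNAL, KEY = 0, 1  # a signal sorts just before the key it cancels

class SignalPQ:
    """Priority queue in which delete(x) inserts a signal instead of searching for x."""

    def __init__(self):
        self._heap = []

    def insert(self, x):
        heapq.heappush(self._heap, (x, KEY))

    def delete(self, x):
        # Delete x -> Insert (-, x): record a signal, perform the deletion later.
        heapq.heappush(self._heap, (x, SIGNAL))

    def deletemin(self):
        while self._heap:
            x, kind = heapq.heappop(self._heap)
            if kind == KEY:
                return x
            # The signal reached the front: its key, if present, is now the minimum.
            if self._heap and self._heap[0] == (x, KEY):
                heapq.heappop(self._heap)  # actual deletion happens here
        raise IndexError("deletemin from an empty priority queue")

pq = SignalPQ()
for x in (5, 3, 8):
    pq.insert(x)
pq.delete(3)
print(pq.deletemin(), pq.deletemin())
```

Signals are resolved only when they reach the front of the queue, which mirrors the invariant that deleting signals are processed only in the head.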

Page 30: Equivalence Between Priority Queues and Sorting in External Memory

Schedule

• Avoid repeated sorting
• If the head or memory buffer is unbalanced:
  – Flush stage: flush all overflowed buffers and rebalance all unbalanced base sets
  – Push stage: rebalance all overflowed layers and levels (expand)
  – Pull stage: deal with delete signals and rebalance all underflowed layers and levels (shrink)

Page 31: Equivalence Between Priority Queues and Sorting in External Memory

Open problems

• Optimal reduction?
  – Priority queue that supports insertions/deletions in O(1/B) I/O cost for a set of size O(B*log^(c) N)
  – New reduction framework

• Better (than loglog N) reduction in the cache-oblivious model?
  – Hard to do I/O-efficient flushing and rebalancing without knowing B

Page 32: Equivalence Between Priority Queues and Sorting in External Memory

Thank You!