1
Sorting
• We have actually already seen two efficient ways to sort:
2
A kind of “insertion” sort
• Insert the elements into a red-black tree one by one
• Traverse the tree in in-order and collect the keys
• Takes O(nlog(n)) time
3
Heapsort (Williams, Floyd, 1964)
• Put the elements in an array
• Make the array into a heap
• Do a deletemin and put the deleted element at the last position of the array
4
Quicksort (Hoare 1961)
5
quicksort
Input: an array A[p..r]

Quicksort(A, p, r)
  if (p < r)
    then q ← Partition(A, p, r)   // q is the position of the pivot element
         Quicksort(A, p, q-1)
         Quicksort(A, q+1, r)
6
Partition example: A = [2 8 7 1 3 5 6 4], p = 1, r = 8, pivot x = A[r] = 4
(array state after each iteration of the for loop; i and j as in Partition)

2 8 7 1 3 5 6 4    initially: i = p-1 = 0, j = p = 1
2 8 7 1 3 5 6 4    j=1: 2 ≤ 4, i=1, exchange A[1] ↔ A[1]
2 8 7 1 3 5 6 4    j=2: 8 > 4
2 8 7 1 3 5 6 4    j=3: 7 > 4
2 1 7 8 3 5 6 4    j=4: 1 ≤ 4, i=2, exchange A[2] ↔ A[4]
2 1 3 8 7 5 6 4    j=5: 3 ≤ 4, i=3, exchange A[3] ↔ A[5]
2 1 3 8 7 5 6 4    j=6: 5 > 4
2 1 3 8 7 5 6 4    j=7: 6 > 4
2 1 3 4 7 5 6 8    finally: exchange A[i+1] ↔ A[r], return i+1 = 4
8
A = [2 8 7 1 3 5 6 4], p = 1, r = 8

Partition(A, p, r)
  x ← A[r]
  i ← p-1
  for j ← p to r-1
    do if A[j] ≤ x
       then i ← i+1
            exchange A[i] ↔ A[j]
  exchange A[i+1] ↔ A[r]
  return i+1
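The pseudocode above translates directly to Python; this is a sketch with my own function names, using 0-based indices instead of the slides' 1-based ones:

```python
def partition(A, p, r):
    """Partition A[p..r] around the pivot x = A[r]; return the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):          # j = p .. r-1
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)     # q is the position of the pivot element
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

A = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(A, 0, len(A) - 1)
print(A)  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

On the slides' example this performs exactly the exchanges traced above, returning pivot position 3 (0-based) for the first call.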
9
Analysis
• Running time is proportional to the number of comparisons
• Each pair of elements is compared at most once, so the running time is O(n²)
• In fact, for each n there is an input of size n on which quicksort takes Ω(n²) time
10
But
• Assume that the split is even in each iteration
11
T(n) = 2T(n/2) + n
How do we solve divide-and-conquer recurrences like this? (read Chapter 4)
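A quick numeric check of the recurrence (assuming the base case T(1) = 1, which the slides leave implicit): for n a power of two, T(n) = 2T(n/2) + n unrolls to exactly n·log2(n) + n.

```python
import math

def T(n):
    # T(n) = 2*T(n/2) + n, with the assumed base case T(1) = 1
    return 1 if n == 1 else 2 * T(n // 2) + n

for k in (2, 5, 10):
    n = 2 ** k
    # the recursive value and the closed form n*log2(n) + n agree exactly
    print(n, T(n), n * int(math.log2(n)) + n)
```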
12
Recurrence tree

         n
  T(n/2)   T(n/2)
13
Recurrence tree

           n
     n/2       n/2
T(n/4) T(n/4) T(n/4) T(n/4)
14
Recurrence tree

           n
     n/2       n/2
T(n/4) T(n/4) T(n/4) T(n/4)

The tree has O(logn) levels.
At every level we do bn comparisons, so the total number of comparisons is O(nlogn).
17
Observations
• We can’t guarantee good splits
• But intuitively on random inputs we will get good splits
18
Randomized quicksort
• Use randomized-partition rather than partition
Randomized-partition(A, p, r)
  i ← random(p, r)
  exchange A[r] ↔ A[i]
  return Partition(A, p, r)
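A Python sketch of randomized quicksort as described (names are my own; `random.randint` plays the role of random(p, r), inclusive on both ends):

```python
import random

def partition(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    i = random.randint(p, r)       # pick a pivot position uniformly at random
    A[r], A[i] = A[i], A[r]        # move it to the end, then partition as before
    return partition(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)

A = [2, 8, 7, 1, 3, 5, 6, 4]
randomized_quicksort(A, 0, len(A) - 1)
print(A)  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```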
19
• On the same input we will get a different running time in each run!
• Look at the average, over the random choices, of these running times for one particular input
20
Expected # of comparisons
Let X be the # of comparisons
This is a random variable
We want to know E(X)
21
Expected # of comparisons
Let z1, z2, ..., zn be the elements in sorted order
Let Xij = 1 if zi is compared to zj, and 0 otherwise
So,

X = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Xij
22
E(X) = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} E(Xij) = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Pr{zi is compared to zj}, by linearity of expectation
23
Consider Zij ≡ {zi, zi+1, ..., zj}

Claim: zi and zj are compared ⇔ either zi or zj is the first pivot chosen from Zij

Proof, 3 cases for the first pivot chosen from Zij:
– the pivot is zi: zi and zj are compared on this partition, and never again
– the pivot is zj: the same
– the pivot is some zk with i < k < j: zi and zj are not compared on this partition; the partition separates them, so no future partition uses both
24
Pr{zi is compared to zj}
= Pr{zi or zj is the first pivot chosen from Zij}   (just explained)
= Pr{zi is the first pivot chosen from Zij} + Pr{zj is the first pivot chosen from Zij}   (mutually exclusive possibilities)
= 1/(j-i+1) + 1/(j-i+1) = 2/(j-i+1)
25
E(X) = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} 2/(j-i+1)

Simplify with a change of variable, k = j-i+1:

E(X) = Σ_{i=1}^{n-1} Σ_{k=2}^{n-i+1} 2/k

Simplify and overestimate, by adding terms:

E(X) ≤ Σ_{i=1}^{n-1} Σ_{k=1}^{n} 2/k = 2(n-1)·H_n = O(nlogn)
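The exact double sum Σ_{i<j} 2/(j-i+1) can be evaluated numerically and compared against the harmonic-series overestimate; a small sanity check (my own, not from the slides):

```python
def expected_comparisons(n):
    # E(X) = sum over 1 <= i < j <= n of 2/(j - i + 1)
    return sum(2.0 / (j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

n = 100
harmonic = sum(1.0 / k for k in range(1, n + 1))   # H_n
print(expected_comparisons(n))   # exact expected number of comparisons
print(2 * n * harmonic)          # the 2n*H_n = O(n log n) overestimate
```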
26
Lower bound for sorting in the comparison model
27
A lower bound
• Comparison model: we assume that the only operations from which we deduce order among keys are comparisons
• Then we prove that we need Ω(nlogn) comparisons in the worst case
Model the algorithm as a decision tree
Insertion sort

[Decision tree for insertion sort on three elements x, y, z: each internal node i:j compares the i-th and j-th input elements, with branches < and >; the leaves are the six possible output orders x y z, x z y, y x z, y z x, z x y, z y x.]
Quicksort

[Decision tree for quicksort on three elements x, y, z: internal nodes are comparisons such as 1:3, 2:3, 1:2, with branches < and >; the leaves are the six possible output orders.]
31
Important observations
• Every algorithm can be represented as a (binary) tree like this
• For every node v there is an input on which the algorithm reaches v
• The # of leaves is at least n! (each of the n! input orders must reach a different leaf)
32
Important observations
• Each path corresponds to a run on some input
• The worst case # of comparisons corresponds to the longest path
33
The lower bound
Let d be the length of the longest path

#leaves ≤ 2^d
n! ≤ #leaves

so n! ≤ 2^d, and therefore

log2(n!) ≤ d

Since log2(n!) = Θ(nlogn), we get d = Ω(nlogn).
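Concretely, log2(n!) gives a numeric lower bound on the worst-case number of comparisons; a few values (computed, not from the slides):

```python
from math import ceil, factorial, log2

# Any comparison sort needs at least ceil(log2(n!)) comparisons in the worst case
for n in range(2, 11):
    print(n, ceil(log2(factorial(n))))
```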
34
Lower bound for sorting
• Any sorting algorithm based on comparisons between elements requires Ω(n log n) comparisons.
35
Beating the lower bound
• We can beat the lower bound if we can deduce order relations between keys not by comparisons

Examples:
• Count sort
• Radix sort
36
Count sort
• Assume that keys are integers between 0 and k (here k = 5)

A: 2 3 0 5 3 5 0 2 5
37
Count sort
• Allocate a temporary array C of size k+1: cell x counts the # of keys = x

A: 2 3 0 5 3 5 0 2 5
C: 0 0 0 0 0 0
38
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 0 0 1 0 0 0
39
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 0 0 1 1 0 0
40
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 1 0 1 1 0 0
41
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 2 0 2 2 0 3
42
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 2 0 2 2 0 3
• Compute prefix sums of C: cell x holds the # of keys ≤ x (rather than =x)
43
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 2 2 4 6 6 9
• Compute prefix sums of C: cell x holds the # of keys ≤ x (rather than =x)
44
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 2 2 4 6 6 9

• Move items to output array

B: / / / / / / / / /
45
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 2 2 4 6 6 9
B: / / / / / / / / /
46
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 2 2 4 6 6 8
B: / / / / / / / / 5
47
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 2 2 3 6 6 8
B: / / / 2 / / / / 5
48
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 1 2 3 6 6 8
B: / 0 / 2 / / / / 5
49
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 1 2 3 6 6 7
B: / 0 / 2 / / / 5 5
50
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 1 2 3 5 6 7
B: / 0 / 2 / 3 / 5 5
51
Count sort
A: 2 3 0 5 3 5 0 2 5
C: 0 2 2 4 6 6
B: 0 0 2 2 3 3 5 5 5
52
Count sort
• Complexity: O(n+k)
• The sort is stable
• Note that count sort does not perform any comparison
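A Python sketch of count sort as traced above (names are my own); scanning the input from the end in the move phase is what makes the sort stable:

```python
def count_sort(A, k):
    """Stable count sort of integers in the range 0..k, in O(n + k) time."""
    n = len(A)
    C = [0] * (k + 1)
    for key in A:                  # C[x] = number of keys equal to x
        C[key] += 1
    for x in range(1, k + 1):      # prefix sums: C[x] = number of keys <= x
        C[x] += C[x - 1]
    B = [None] * n
    for key in reversed(A):        # scan from the end to keep the sort stable
        C[key] -= 1
        B[C[key]] = key
    return B

print(count_sort([2, 3, 0, 5, 3, 5, 0, 2, 5], 5))  # -> [0, 0, 2, 2, 3, 3, 5, 5, 5]
```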
53
Radix sort

• Say we have numbers with d digits, each between 0 and k
2 8 7 1
4 5 9 1
6 5 7 2
1 3 0 1
2 4 7 2
3 5 5 5
7 0 2 2
8 3 9 4
4 8 4 4
3 5 3 6
54
Radix sort

• Use a stable sort to sort by the least significant digit (e.g. count sort)
2 8 7 1
4 5 9 1
6 5 7 2
1 3 0 1
2 4 7 2
3 5 5 5
7 0 2 2
8 3 9 4
4 8 4 4
3 5 3 6
55
Radix sort

One stable pass per digit, least significant digit first; each column shows the array after one more pass:

input      pass 1     pass 2     pass 3     pass 4
2 8 7 1    2 8 7 1    1 3 0 1    7 0 2 2    1 3 0 1
4 5 9 1    4 5 9 1    7 0 2 2    1 3 0 1    2 4 7 2
6 5 7 2    1 3 0 1    3 5 3 6    8 3 9 4    2 8 7 1
1 3 0 1    6 5 7 2    4 8 4 4    2 4 7 2    3 5 3 6
2 4 7 2    2 4 7 2    3 5 5 5    3 5 3 6    3 5 5 5
3 5 5 5    7 0 2 2    2 8 7 1    3 5 5 5    4 5 9 1
7 0 2 2    8 3 9 4    6 5 7 2    6 5 7 2    4 8 4 4
8 3 9 4    4 8 4 4    2 4 7 2    4 5 9 1    6 5 7 2
4 8 4 4    3 5 5 5    4 5 9 1    4 8 4 4    7 0 2 2
3 5 3 6    3 5 3 6    8 3 9 4    2 8 7 1    8 3 9 4
62
Radix sort
• Complexity: O(d(n+k)) if we use count sort for each pass and have d digits, each between 0 and k
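A Python sketch of radix sort with a count-sort pass per digit (names and the base parameter are my own):

```python
def radix_sort(A, d, base=10):
    """Sort d-digit numbers by d stable count-sort passes, least significant digit first."""
    for pos in range(d):               # pos 0 = least significant digit
        div = base ** pos
        C = [0] * base
        for x in A:                    # count keys with each digit value
            C[(x // div) % base] += 1
        for v in range(1, base):       # prefix sums
            C[v] += C[v - 1]
        B = [None] * len(A)
        for x in reversed(A):          # reversed scan keeps each pass stable
            digit = (x // div) % base
            C[digit] -= 1
            B[C[digit]] = x
        A = B
    return A

A = [2871, 4591, 6572, 1301, 2472, 3555, 7022, 8394, 4844, 3536]
print(radix_sort(A, 4))
# -> [1301, 2472, 2871, 3536, 3555, 4591, 4844, 6572, 7022, 8394]
```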
64
Assume something about the input
• Random, “almost sorted”
• For such inputs we want to sort faster
65
Sorting an almost sorted input
• Suppose we know that the input is “almost” sorted
• Let I be the number of “inversions” in the input: the number of pairs (ai, aj) such that i < j and ai > aj
66
Example
1, 4, 5, 8, 3      I = 3

8, 7, 5, 3, 1      I = 10
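A brute-force inversion counter (my own helper, O(n²)) reproduces both counts:

```python
def inversions(a):
    # I = number of pairs (i, j) with i < j and a[i] > a[j]
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])

print(inversions([1, 4, 5, 8, 3]))   # -> 3
print(inversions([8, 7, 5, 3, 1]))   # -> 10
```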
67
Insertion sort
• Think of “insertion sort”
• How long does it take to insert ak?
• Time proportional to 1 plus the number of inversions (ai, ak) with i < k — let's call this number Ik
68
Analysis
The running time is:

Σ_{k=1}^{n} (1 + Ik) = O(n + I)
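An instrumented insertion sort (my own sketch) makes the bound concrete: the total number of shifts equals I, so the running time is proportional to n + I.

```python
def insertion_sort_with_shifts(a):
    a = list(a)
    shifts = 0                      # total shifts = total number of inversions I
    for k in range(1, len(a)):
        x = a[k]
        i = k - 1
        while i >= 0 and a[i] > x:  # each element passed over is one inversion (a_i, a_k)
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = x
    return a, shifts

print(insertion_sort_with_shifts([1, 4, 5, 8, 3]))  # -> ([1, 3, 4, 5, 8], 3)
```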
69
Thoughts
• When I = Ω(n²) the running time is Ω(n²)
• But we would like it to be O(nlog(n)) for any input, and faster when I is small
70
Finger red black trees
71
Finger tree

Take a regular search tree and reverse the direction of the pointers on the rightmost spine

We go up from the last leaf until we find the subtree containing the item, and then we descend into it
72
Finger trees

Say we search for a position at distance d from the end

Then we go up to height O(1+log(d)), so the search for the dth position from the end takes O(1+log(d)) time

Insertions and deletions still take O(log n) worst-case time, but only O(1+log(d)) amortized time
73
Back to sorting
• Suppose we implement the insertion sort using a finger search tree
• When we insert item k then d = O(Ik+1), so the insertion takes O(1+log(Ik+1)) amortized time
• Total time is bounded by O(Σk (1+log(Ik+1))) = O(n + n·log((I+n)/n)), since by the concavity of log the sum of the log terms is maximized when all the Ik are equal