
Page 1: Chap 7 Sorting

Chap 7

Sorting

Page 2: Chap 7 Sorting

Motivation of Sorting

• The term list here is a collection of records.

• Each record has one or more fields.

• Each record has a key to distinguish one record from another.

• For example, the phone directory is a list. Name, phone number, and even address can be the key, depending on the application or need.

Page 3: Chap 7 Sorting

Sorting

• Two ways to store a collection of records:
– Sequential
– Non-sequential

• Assume a sequential list f. To retrieve a record whose key is f[i].key from such a list, we can search in the following order: f[n].key, f[n-1].key, …, f[1].key => sequential search.

Page 4: Chap 7 Sorting

Example of An Element of A Search List

class Element
{
public:
    int getKey() const { return key; };
    void setKey(int k) { key = k; };
private:
    int key;
    // other fields …
};

Page 5: Chap 7 Sorting

Sequential Search

int SeqSearch(Element* f, const int n, const int k)
// Search a list f with key values f[1].key, …, f[n].key. Return i such
// that f[i].key == k. If there is no such record, return 0.
{
    int i = n;
    f[0].setKey(k);    // sentinel
    while (f[i].getKey() != k) i--;
    return i;
}

Page 6: Chap 7 Sorting

Sequential Search

• The number of comparisons for the record at position i is n – i + 1.

• The average number of comparisons for a successful search is

  (1/n) Σ_{i=1}^{n} (n – i + 1) = (n + 1)/2

• For the phone directory lookup, there should be a better way than this.
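As a quick sanity check of the average above, a short sketch (not part of the slides; the names are illustrative) can tally the comparison counts of SeqSearch directly:

```cpp
#include <cassert>

// Searching backward from f[n] to f[1], finding the record at position i
// costs n - i + 1 comparisons, as stated on the slide.
int comparisons(int n, int i) { return n - i + 1; }

// Average over all n equally likely successful searches:
// (1/n) * sum_{i=1}^{n} (n - i + 1), which should equal (n + 1)/2.
double averageComparisons(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) total += comparisons(n, i);
    return static_cast<double>(total) / n;
}
```

For n = 10 this gives 5.5, matching (n + 1)/2.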

Page 7: Chap 7 Sorting

Search

• A binary search only takes O(log n) time to search a sequential list with n records.

• In fact, if we look up a name starting with W in the phone directory, we search toward the end of the directory rather than the middle. This search method is based on an interpolation scheme.

• An interpolation scheme relies on an ordered list.

  To look up key k in the ordered list f[l], …, f[u], interpolation estimates the position

  i = ((k – f[l].key) / (f[u].key – f[l].key)) * n
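A minimal sketch of an interpolation search built on the formula above (not from the slides; it uses a 0-based int array and illustrative names, and probes iteratively rather than once):

```cpp
#include <cassert>

// Interpolation search on a sorted array keys[l..u] (inclusive, 0-based).
// Returns an index holding k, or -1 if k is absent.
int interpolationSearch(const int* keys, int l, int u, int k) {
    while (l <= u && k >= keys[l] && k <= keys[u]) {
        if (keys[u] == keys[l])                  // avoid division by zero
            return (keys[l] == k) ? l : -1;
        // Estimate the position of k by linear interpolation between
        // keys[l] and keys[u], as in the slide's formula.
        int i = l + (int)((double)(k - keys[l]) / (keys[u] - keys[l]) * (u - l));
        if (keys[i] == k) return i;
        if (keys[i] < k) l = i + 1;
        else u = i - 1;
    }
    return -1;
}
```

On keys drawn from a uniform distribution the expected number of probes is O(log log n), though the worst case degrades to O(n).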

Page 8: Chap 7 Sorting

Verifying Two Lists With Sequential Search

void Verify1(Element* F1, Element* F2, const int n, const int m)
// Compare two unordered lists F1 and F2 of size n and m, respectively.
{
    Boolean* marked = new Boolean[m + 1];
    for (int i = 1; i <= m; i++) marked[i] = FALSE;

    for (i = 1; i <= n; i++) {
        int j = SeqSearch(F2, m, F1[i].getKey());
        if (j == 0) cout << F1[i].getKey() << " not in F2" << endl;
        else {
            if (F1[i].other != F2[j].other)
                cout << "Discrepancy in " << F1[i].getKey() << ": "
                     << F1[i].other << " and " << F2[j].other << endl;
            marked[j] = TRUE;    // mark the record F2[j] as seen
        }
    }
    for (i = 1; i <= m; i++)
        if (!marked[i]) cout << F2[i].getKey() << " not in F1." << endl;
    delete [] marked;
}

O(mn)

Page 9: Chap 7 Sorting

Fast Verification of Two Lists

void Verify2(Element* F1, Element* F2, const int n, const int m)
// Same task as Verify1, but sort F1 and F2 first so that the keys are in
// increasing order. Assume the keys within each list are distinct.
{
    sort(F1, n); sort(F2, m);
    int i = 1; int j = 1;
    while ((i <= n) && (j <= m))
        switch (compare(F1[i].getKey(), F2[j].getKey())) {
        case '<': cout << F1[i].getKey() << " not in F2" << endl;
                  i++; break;
        case '=': if (F1[i].other != F2[j].other)
                      cout << "Discrepancy in " << F1[i].getKey() << ": "
                           << F1[i].other << " and " << F2[j].other << endl;
                  i++; j++; break;
        case '>': cout << F2[j].getKey() << " not in F1" << endl;
                  j++;
        }
    if (i <= n) PrintRest(F1, i, n, 1);        // print records i through n of F1
    else if (j <= m) PrintRest(F2, j, m, 2);   // print records j through m of F2
}

O(max{n log n, m log m})

Page 10: Chap 7 Sorting

Formal Description of Sorting

• Given a list of records (R1, R2, …, Rn), each record with a key Ki, the sorting problem is that of finding a permutation σ such that Kσ(i) ≤ Kσ(i+1), 1 ≤ i ≤ n – 1. The desired ordering is (Rσ(1), Rσ(2), …, Rσ(n)).

Page 11: Chap 7 Sorting

Formal Description of Sorting (Cont.)

• If a list has several identical key values, the permutation is not unique. Let σs be the permutation with the following properties:
(1) Kσ(i) ≤ Kσ(i+1), 1 ≤ i ≤ n – 1
(2) If i < j and Ki == Kj in the input list, then Ri precedes Rj in the sorted list.

• A sorting method that generates σs is stable.
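To make the stability property concrete, a small sketch (not from the slides; it uses the standard library's std::stable_sort rather than any algorithm in this chapter) shows records with equal keys keeping their input order:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A record with a key and one "other" field so that two records with the
// same key remain distinguishable after sorting.
struct Record { int key; int other; };

// A stable sort keeps the relative input order of records with equal keys,
// which is exactly property (2) of the permutation sigma_s.
std::vector<Record> stableSortByKey(std::vector<Record> v) {
    std::stable_sort(v.begin(), v.end(),
                     [](const Record& a, const Record& b) { return a.key < b.key; });
    return v;
}
```

With input keys 2, 1, 2, 1 the two records keyed 1 come out before the two keyed 2, and within each key the earlier input record still precedes the later one.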

Page 12: Chap 7 Sorting

Categories of Sorting Methods

• Internal Method: Methods to be used when the list to be sorted is small enough that the entire sort can be carried out in main memory.
– Insertion sort, quick sort, merge sort, heap sort, and radix sort.

• External Method: Methods to be used on larger lists.

Page 13: Chap 7 Sorting

Insert Into A Sorted List

void insert(const Element e, Element* list, int i)
// Insert element e with key e.key into the ordered sequence list[0], …, list[i] such that the
// resulting sequence is also ordered. Assume that e.key ≥ list[0].key. The array list must
// have space allocated for at least i + 2 elements.
{
    while (e.getKey() < list[i].getKey())
    {
        list[i+1] = list[i];
        i--;
    }
    list[i+1] = e;
}

O(i)

Page 14: Chap 7 Sorting

Insertion Sort

void InsertSort(Element* list, const int n)
// Sort list into nondecreasing order of key.
{
    list[0].setKey(MININT);    // sentinel
    for (int j = 2; j <= n; j++)
        insert(list[j], list, j-1);
}

O( Σ_{i=1}^{n-1} (i + 1) ) = O(n²)

Page 15: Chap 7 Sorting

Insertion Sort Example 1

• Record Ri is left out of order (LOO) iff Ri < max_{1 ≤ j < i} {Rj}.

• Example 7.1: Assume n = 5 and the input key sequence is 5, 4, 3, 2, 1.

j [1] [2] [3] [4] [5]

- 5 4 3 2 1

2 4 5 3 2 1

3 3 4 5 2 1

4 2 3 4 5 1

5 1 2 3 4 5

Page 16: Chap 7 Sorting

Insertion Sort Example 2

• Example 7.2: Assume n = 5 and the input key sequence is 2, 3, 4, 5, 1

j [1] [2] [3] [4] [5]

- 2 3 4 5 1

2 2 3 4 5 1

3 2 3 4 5 1

4 2 3 4 5 1

5 1 2 3 4 5

(Passes j = 2, 3, 4 each take O(1); pass j = 5 takes O(n).)

Page 17: Chap 7 Sorting

Insertion Sort Analysis

• If there are k LOO records in a list, the computing time for sorting the list via insertion sort is O((k+1)n).

• Therefore, if k << n, then insertion sort might be a good sorting choice.

Page 18: Chap 7 Sorting

Insertion Sort Variations

• Binary insertion sort: the number of comparisons in an insertion sort can be reduced if we replace the sequential search by a binary search. The number of record moves remains the same.

• List insertion sort: the elements of the list are represented as a linked list rather than an array. The number of record moves becomes zero because only the link fields require adjustment. However, we must retain the sequential search.
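A minimal sketch of the binary-insertion-sort variation (not code from the text; it uses a 0-based int array and illustrative names): the binary search cuts the comparisons to O(log i) per insertion, but the shifting loop still performs the same record moves.

```cpp
#include <cassert>

// Binary insertion sort on a 0-based int array: for each element, binary
// search locates its insert position, then records shift one by one.
void binaryInsertionSort(int* a, int n) {
    for (int j = 1; j < n; j++) {
        int e = a[j];
        // Binary search for the first position whose key exceeds e
        // (searching the upper bound keeps the sort stable).
        int lo = 0, hi = j;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] <= e) lo = mid + 1;
            else hi = mid;
        }
        // Same number of record moves as plain insertion sort.
        for (int k = j; k > lo; k--) a[k] = a[k-1];
        a[lo] = e;
    }
}
```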

Page 19: Chap 7 Sorting

Quick Sort

• Quick sort was developed by C. A. R. Hoare. The quick sort scheme has the best average behavior among the sorting methods.

• Quick sort differs from insertion sort in that the pivot key Ks(i) is placed at its correct spot with respect to the whole list: Kj ≤ Ks(i) for j < s(i) and Kj ≥ Ks(i) for j > s(i).

• Therefore, the sublists to the left of s(i) and to the right of s(i) can be sorted independently.

Page 20: Chap 7 Sorting

Quick Sort

void QuickSort(Element* list, const int left, const int right)
// Sort records list[left], …, list[right] into nondecreasing order on field key. Key pivot = list[left].key
// is arbitrarily chosen as the pivot key. Pointers i and j are used to partition the sublist so that at any
// time list[m].key ≤ pivot, m < i, and list[m].key ≥ pivot, m > j. It is assumed that
// list[left].key ≤ list[right+1].key.
{
    if (left < right) {
        int i = left, j = right + 1, pivot = list[left].getKey();
        do {
            do i++; while (list[i].getKey() < pivot);
            do j--; while (list[j].getKey() > pivot);
            if (i < j) InterChange(list, i, j);
        } while (i < j);
        InterChange(list, left, j);
        QuickSort(list, left, j-1);
        QuickSort(list, j+1, right);
    }
}

Page 21: Chap 7 Sorting

Quick Sort Example

• Example 7.3: The input list has 10 records with keys (26, 5, 37, 1, 61, 11, 59, 15, 48, 19).

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 Left Right

[26 5 37 1 61 11 59 15 48 19] 1 10

[11 5 19 1 15] 26 [59 61 48 37] 1 5

[1 5] 11 [19 15] 26 [59 61 48 37] 1 2

1 5 11 [19 15] 26 [59 61 48 37] 4 5

1 5 11 15 19 26 [59 61 48 37] 7 10

1 5 11 15 19 26 [48 37] 59 [61] 7 8

1 5 11 15 19 26 37 48 59 [61] 10 10

1 5 11 15 19 26 37 48 59 61

Page 22: Chap 7 Sorting

Quick Sort (Cont.)

• In QuickSort(), list[n+1] has been set to have a key at least as large as the remaining keys.

• Analysis of QuickSort
– The worst case is O(n²).
– If each time a record is correctly positioned, the sublist to its left is of the same size as the sublist to its right. Assume T(n) is the time taken to sort a list of size n:

  T(n) ≤ cn + 2T(n/2), for some constant c
       ≤ cn + 2(cn/2 + 2T(n/4))
       = 2cn + 4T(n/4)
       ⋮
       ≤ cn log₂n + nT(1) = O(n log n)

Page 23: Chap 7 Sorting

Lemma 7.1

• Lemma 7.1: Let Tavg(n) be the expected time for function QuickSort to sort a list with n records. Then there exists a constant k such that Tavg(n) ≤ kn logen for n ≥ 2.

Page 24: Chap 7 Sorting

Analysis of Quick Sort

• Unlike insertion sort (which only needs additional space for one record), quick sort needs stack space to implement the recursion.

• If the lists split evenly, the maximum recursion depth is log n, and the stack space required is O(log n).

• The worst case occurs when the list splits into a left sublist of size n – 1 and a right sublist of size 0 at each level of recursion. In this case the recursion depth is n, and the stack space is O(n).

• The worst-case stack space can be reduced by a factor of 4 by realizing that right sublists of size less than 2 need not be stacked. An asymptotic reduction in stack space can be achieved by sorting the smaller sublist first; in this case the additional stack space is at most O(log n).

Page 25: Chap 7 Sorting

Quick Sort Variations

• Quick sort using a median of three: Pick the median of the first, middle, and last keys in the current sublist as the pivot. Thus, pivot = median{Kl, K(l+r)/2, Kr}.
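A small sketch of the pivot-selection step described above (an illustrative helper, not code from the text): it returns the index of the median of the three candidate keys, which the partitioning step could then swap into position l.

```cpp
#include <cassert>

// Return the index of the median of keys[l], keys[(l+r)/2], keys[r],
// the median-of-three pivot choice for quick sort.
int medianOfThree(const int* keys, int l, int r) {
    int m = (l + r) / 2;
    int a = keys[l], b = keys[m], c = keys[r];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return m;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return l;
    return r;
}
```

This choice avoids the worst-case O(n²) behavior on already-sorted input, where picking the first key as pivot would produce maximally unbalanced splits.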

Page 26: Chap 7 Sorting

Decision Tree

• So far both insertion sort and quick sort have worst-case complexity of O(n²).

• If we restrict ourselves to sorting algorithms in which the only operations permitted on keys are comparisons and interchanges, then O(n log n) is the best possible time.

• This is shown by using a tree that describes the sorting process. Each vertex of the tree represents a key comparison, and the branches indicate the result. Such a tree is called a decision tree.

Page 27: Chap 7 Sorting

Decision Tree for Insertion Sort

[Figure: decision tree for insertion sort on three keys. The root compares K1 ≤ K2; each internal vertex is a comparison (K1 ≤ K2, K1 ≤ K3, or K2 ≤ K3) with branches labeled Yes and No, and the six leaves I–VI correspond to the six possible orderings of three distinct keys, each ending in a stop node.]

Page 28: Chap 7 Sorting

Decision Tree (Cont.)

• Theorem 7.1: Any decision tree that sorts n distinct elements has a height of at least log2(n!) + 1

• Corollary: Any algorithm that sorts only by comparisons must have a worst-case computing time of Ω(n log n)
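The corollary follows from the theorem by a counting argument (a sketch under the standard assumptions: n distinct keys, one permutation per leaf): a decision tree that sorts n distinct elements must have at least n! leaves, a binary tree of height h has at most 2^(h-1) leaves, and the factorial can be bounded from below by its largest n/2 factors.

```latex
2^{h-1} \ge n! \;\Longrightarrow\; h \ge \log_2(n!) + 1,
\qquad
\log_2(n!) \ge \log_2\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right)
            = \tfrac{n}{2}\,\log_2\tfrac{n}{2}
            = \Omega(n \log n).
```

Hence every comparison-based sort performs Ω(n log n) comparisons on some input.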

Page 29: Chap 7 Sorting

Simple Merge

void merge(Element* initList, Element* mergeList, const int l, const int m, const int n)
// Merge the sorted sublists initList[l], …, initList[m] and initList[m+1], …, initList[n]
// into mergeList[l], …, mergeList[n].
{
    int i1 = l, iResult = l, i2 = m + 1;
    for (; i1 <= m && i2 <= n; iResult++) {
        if (initList[i1].getKey() <= initList[i2].getKey()) {
            mergeList[iResult] = initList[i1];
            i1++;
        }
        else {
            mergeList[iResult] = initList[i2];
            i2++;
        }
    }
    if (i1 > m)    // copy the remainder of the second sublist
        for (int t = i2; t <= n; t++)
            mergeList[iResult + t - i2] = initList[t];
    else           // copy the remainder of the first sublist
        for (int t = i1; t <= m; t++)
            mergeList[iResult + t - i1] = initList[t];
}

O(n - l + 1)

Page 30: Chap 7 Sorting

Analysis of Simple Merge

• If an array is used, additional space for n – l +1 records is needed.

• If a linked list is used instead, then additional space for n – l + 1 links is needed.

Page 31: Chap 7 Sorting

O(1) Space Merge

• A second merge algorithm requires only O(1) additional space.

• Assume a total of n records are to be merged into one list, where n is a perfect square, and that the numbers of records in the left sublist and the right sublist are multiples of √n.

Page 32: Chap 7 Sorting

O(1) Space Merge Steps

Step 1: Identify the √n records with the largest keys. This is done by following right to left along the two lists to be merged.

Step 2: Exchange the records of the second list that were identified in Step 1 with those just to the left of those identified from the first list so that the records with the largest keys are contiguous.

Step 3: Swap the block of largest records with the leftmost block (unless it is already the leftmost block). Sort the rightmost block.

Step 4: Reorder the blocks, excluding the block of largest records, into nondecreasing order of the last key in each block.

Step 5: Perform as many merge substeps as needed to merge the blocks, other than the block with the largest keys.

Step 6: Sort the block with the largest keys.

Page 33: Chap 7 Sorting

O(1) Space Merge Example (First 8 Lines)

0 2 4 6 8 a c e g i j k l m n t w z|1 3 5 7 9 b d f h o p q r s u v x y

0 2 4 6 8 a c e g i j k l m n t w z 1 3 5 7 9 b d f h o p q r s u v x y

0 2 4 6 8 a|c e g i j k|u v x y w z|1 3 5 7 9 b|d f h o p q|r s l m n t

u v x y w z|c e g i j k|0 2 4 6 8 a|1 3 5 7 9 b|d f h o p q|l m n r s t

u v x y w z 0 2 4 6 8 a|1 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

0 v x y w z u 2 4 6 8 a|1 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

0 1 x y w z u 2 4 6 8 a|v 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

0 1 2 y w z u x 4 6 8 a|v 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

Page 34: Chap 7 Sorting

O(1) Space Merge Example (Last 8 Lines)

0 1 2 3 4 5 u x w 6 8 a|v y z 7 9 b|c e g i j k|d f h o p q|l m n r s t

0 1 2 3 4 5 6 7 8 u w a|v y z x 9 b|c e g i j k|d f h o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a w|v y z x u b|c e g i j k|d f h o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a w v y z x u b c e g i j k|d f h o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k v z u|y x w o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k v z u y x w o p q|l m n r s t

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q y x w|v z u r s t

0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t|v z u y x w

Page 35: Chap 7 Sorting

Analysis of O(1) Space Merge

• Steps 1 and 2 and the swapping of Step 3 each take O(√n) time and O(1) space.

• The sort of Step 3 can be done in O(n) time and O(1) space using an insertion sort.

• Step 4 can be done in O(n) time and O(1) space using a selection sort. (Selection sort sorts m records using O(m²) key comparisons and O(m) record moves, so it needs O(n) comparisons, and the time to move blocks is O(n).)

• If insertion sort is used in Step 4 instead, the time becomes O(n^1.5), since insertion sort needs O(m²) record moves (√n records per block × n record moves).

Page 36: Chap 7 Sorting

Analysis of O(1) Space Merge (Cont.)

• The total number of merge substeps is at most √n – 1. The total time for Step 5 is O(n).

• The sort of Step 6 can be done in O(n) time by using either a selection sort or an insertion sort.

• Therefore, the total time is O(n) and the additional space used is O(1).

Page 37: Chap 7 Sorting

Iterative Merge Sort

• Treat the input as n sorted lists, each of length 1.

• Lists are merged in pairs to obtain n/2 lists, each of size 2 (if n is odd, one list is of length 1).

• The n/2 lists are then merged by pairs, and so on until we are left with only one list.

Page 38: Chap 7 Sorting

Merge Tree

26 5 77 1 61 11 59 15 48 19

5 26 1 77 11 61 15 59 19 48

1 5 26 77 11 15 59 61 19 48

1 5 11 15 26 59 61 77 19 48

1 5 11 15 19 26 48 59 61 77

Page 39: Chap 7 Sorting

Iterative Merge Sort

void MergeSort(Element* list, const int n)
// Sort list into nondecreasing order of the keys list[1].key, …, list[n].key.
{
    Element* tempList = new Element[n+1];
    // l is the length of the sublists currently being merged.
    for (int l = 1; l < n; l *= 2)
    {
        MergePass(list, tempList, n, l);
        l *= 2;
        MergePass(tempList, list, n, l);    // interchange roles of list and tempList
    }
    delete [] tempList;
}

Page 40: Chap 7 Sorting

Merge Pass

void MergePass(Element* initList, Element* resultList, const int n, const int l)
// One pass of merge sort. Adjacent pairs of sublists of length l are merged
// from list initList to list resultList. n is the number of records in initList.
{
    int i;
    for (i = 1;
         i <= n - 2*l + 1;    // enough elements remaining to form two sublists of length l?
         i += 2*l)
        merge(initList, resultList, i, i + l - 1, i + 2*l - 1);

    // merge remaining lists of total length < 2*l
    if ((i + l - 1) < n) merge(initList, resultList, i, i + l - 1, n);
    else for (int t = i; t <= n; t++) resultList[t] = initList[t];
}

Page 41: Chap 7 Sorting

Analysis of MergeSort

• A total of ⌈log₂n⌉ passes are made over the data. Each pass of merge sort takes O(n) time.

• The total computing time is O(n log n).

Page 42: Chap 7 Sorting

Recursive Merge Sort

• Recursive merge sort divides the list to be sorted into two roughly equal parts:
– the left sublist [left : (left+right)/2]
– the right sublist [(left+right)/2 + 1 : right]

• These sublists are sorted recursively, and the sorted sublists are merged.

• To avoid copying, the use of a linked list (with integer links instead of pointers) for the sublists is desirable.

Page 43: Chap 7 Sorting

Sublist Partitioning For Recursive Merge Sort

26 5 77 1 61 11 59 15 48 19

5 26 | 1 61 | 11 59 | 19 48

5 26 77 | 11 15 59

1 5 26 61 77 | 11 15 19 48 59

1 5 11 15 19 26 48 59 61 77

Page 44: Chap 7 Sorting

Program 7.11 (Recursive Merge Sort )

class Element
{
private:
    int key;
    Field other;
    int link;
public:
    Element() { link = 0; };
};

int rMergeSort(Element* list, const int left, const int right)
// List (list[left], …, list[right]) is to be sorted on the field key.
// link is a link field in each record that is initially 0.
// list[0] is a record for intermediate results used only in ListMerge.
{
    if (left >= right) return left;
    int mid = (left + right)/2;
    return ListMerge(list, rMergeSort(list, left, mid),
                           rMergeSort(list, mid+1, right));
}

O(n log n)

Page 45: Chap 7 Sorting

Program 7.12 (Merging Linked Lists)

int ListMerge(Element* list, const int start1, const int start2)
{
    int iResult = 0;    // list[0] accumulates the head of the merged chain
    int i1, i2;
    for (i1 = start1, i2 = start2; i1 && i2; ) {
        if (list[i1].key <= list[i2].key) {
            list[iResult].link = i1;
            iResult = i1; i1 = list[i1].link;
        }
        else {
            list[iResult].link = i2;
            iResult = i2; i2 = list[i2].link;
        }
    }
    // attach the remainder
    if (i1 == 0) list[iResult].link = i2;
    else list[iResult].link = i1;
    return list[0].link;
}

Page 46: Chap 7 Sorting

Natural Merge Sort

• Natural merge sort takes advantage of the prevailing order within the list before performing merge sort.

• It runs an initial pass over the data to determine the sublists of records that are in order.

• Then it uses the sublists for the merge sort.
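The initial pass can be sketched as follows (an illustrative helper on a 0-based int array, not code from the text): one scan records where each maximal nondecreasing run starts, and those run boundaries then seed the merge passes.

```cpp
#include <cassert>
#include <vector>

// Scan once and record the start index of each maximal nondecreasing run.
// A new run starts wherever a key is smaller than its predecessor.
std::vector<int> findRuns(const int* keys, int n) {
    std::vector<int> starts;
    if (n == 0) return starts;
    starts.push_back(0);
    for (int i = 1; i < n; i++)
        if (keys[i] < keys[i-1]) starts.push_back(i);
    return starts;
}
```

For the keys 26, 5, 77, 1, 61, 11, 59, 15, 48, 19 this finds six runs, matching the example on the next slide, so the first merge pass already works on sublists longer than 1.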

Page 47: Chap 7 Sorting

Natural Merge Sort Example

26 5 77 1 61 11 59 15 48 19

5 26 77 | 1 11 59 61 | 15 19 48

1 5 11 26 59 61 77 | 15 19 48

1 5 11 15 19 26 48 59 61 77

Page 48: Chap 7 Sorting

Heap Sort

• Merge sort needs additional storage space proportional to the number of records in the file being sorted, even though its computing time is O(n log n).

• The O(1)-space merge needs only O(1) space, but the resulting sort is much slower.

• We will see that heap sort only requires a fixed amount of additional storage and achieves worst-case and average computing time O(n log n).

• Heap sort uses the max-heap structure.

Page 49: Chap 7 Sorting

Heap Sort (Cont.)

• For heap sort, first of all, the n records are inserted into an empty heap.

• Next, the records are extracted from the heap one at a time.

• With the use of a special function adjust(), we can create a heap of n records faster.

Page 50: Chap 7 Sorting

Program 7.13 (Adjusting A Max Heap)

void adjust(Element* tree, const int root, const int n)
// Adjust the binary tree with root root to satisfy the heap property. The left and right
// subtrees of root already satisfy the heap property. No node has index greater than n.
{
    Element e = tree[root];
    int k = e.getKey();
    int j;
    for (j = 2*root; j <= n; j *= 2)
    {
        if (j < n)
            if (tree[j].getKey() < tree[j+1].getKey()) j++;
        // compare max child with k. If k is max, then done.
        if (k >= tree[j].getKey()) break;
        tree[j/2] = tree[j];    // move jth record up the tree
    }
    tree[j/2] = e;
}

Page 51: Chap 7 Sorting

Program 7.14 (Heap Sort)

void HeapSort(Element* list, const int n)
// The list (list[1], …, list[n]) is sorted into nondecreasing order of the field key.
{
    for (int i = n/2; i >= 1; i--)    // convert list into a heap
        adjust(list, i, n);
    for (i = n-1; i >= 1; i--)        // sort list
    {
        Element t = list[i+1];        // interchange list[1] and list[i+1]
        list[i+1] = list[1];
        list[1] = t;
        adjust(list, 1, i);
    }
}

Heap creation: with k = ⌈log₂(n+1)⌉ levels, the adjust calls take at most

  Σ_{i=1}^{k} 2^(i-1) (k – i) ≤ n Σ_{i=1}^{k} i/2^i ≤ 2n = O(n)

time. Each of the n – 1 adjust calls in the sorting loop takes O(log n), so the total time is

O(n log n)

Page 52: Chap 7 Sorting

Converting An Array Into A Max Heap

(a) Input array (positions [1]–[10]): 26 5 77 1 61 11 59 15 48 19

(b) Initial heap (positions [1]–[10]): 77 61 59 48 19 11 26 15 1 5

Page 53: Chap 7 Sorting

Heap Sort Example

After the first extraction, heap size = 9, sorted = [77]:
heap (positions [1]–[9]): 61 48 59 15 19 11 26 5 1

After the second extraction, heap size = 8, sorted = [61, 77]:
heap (positions [1]–[8]): 59 48 26 15 19 11 1 5

Page 54: Chap 7 Sorting

Sorting Several Keys

• A list of records is said to be sorted with respect to the keys K¹, K², …, Kʳ iff for every pair of records i and j with i < j, (Ki¹, Ki², …, Kiʳ) ≤ (Kj¹, Kj², …, Kjʳ).

• The r-tuple (x1, x2, …, xr) is less than or equal to the r-tuple (y1, y2, …, yr) iff either xi = yi, 1 ≤ i ≤ j, and xj+1 < yj+1 for some j < r, or xi = yi, 1 ≤ i ≤ r.

• Two popular ways to sort on multiple keys:
– Sort on the most significant key into multiple piles. For each pile, sort on the second most significant key, and so on; then the piles are combined. This method is called most-significant-digit-first (MSD) sorting.
– The other way is to sort on the least significant digit first (LSD).

• Example, sorting a deck of cards: suit and face value.
– Spade > Heart > Diamond > Club

Page 55: Chap 7 Sorting

Sorting Several Keys (Cont.)

• LSD and MSD only define the order in which the keys are to be sorted; they do not specify how each key is sorted.

• Bin sort can be used for sorting on each key. The complexity of bin sort is O(n).

• LSD and MSD can be used even when there is only one key.
– E.g., if the keys are numeric, then each decimal digit may be regarded as a subkey => radix sort.

Page 56: Chap 7 Sorting

Radix Sort

• In a radix sort, we decompose the sort key using some radix r.
– The number of bins needed is r.

• Assume the records R1, R2, …, Rn are to be sorted using radix r. Each key has d digits in the range 0 to r – 1.

• Assume each record has a link field. The records in the same bin are linked together into a chain:
– f[i], 0 ≤ i < r (the pointer to the first record in bin i)
– e[i] (the pointer to the last record in bin i)
– Each chain operates as a queue.
– Each record is assumed to have an array key[d], 0 ≤ key[i] < r, 0 ≤ i < d.

Page 57: Chap 7 Sorting

Program 7.15 (LSD Radix Sort)

void RadixSort(Element* list, const int d, const int n)
{
    int e[radix], f[radix];                               // queue pointers
    for (int i = 1; i <= n; i++) list[i].link = i + 1;    // link into a chain starting at current
    list[n].link = 0;
    int current = 1;
    for (i = d - 1; i >= 0; i--)                          // sort on key key[i] -- d passes
    {
        for (int j = 0; j < radix; j++) f[j] = 0;         // initialize bins to empty queues -- O(r)
        for (; current; current = list[current].link)     // put records into queues -- O(n)
        {
            int k = list[current].key[i];
            if (f[k] == 0) f[k] = current;
            else list[e[k]].link = current;
            e[k] = current;
        }
        for (j = 0; f[j] == 0; j++);                      // find the first non-empty queue
        current = f[j]; int last = e[j];
        for (int k = j+1; k < radix; k++)                 // concatenate remaining queues
        {
            if (f[k]) { list[last].link = f[k]; last = e[k]; }
        }
        list[last].link = 0;
    }
}

O(d(n + r))

Page 58: Chap 7 Sorting

Radix Sort Example

Input (list[1] … list[10]): 179 208 306 93 859 984 55 9 271 33

Pass 1 (least significant digit) places the records into bins:
bin 1: 271    bin 3: 93, 33    bin 4: 984    bin 5: 55    bin 6: 306    bin 8: 208    bin 9: 179, 859, 9

Chain after pass 1 (list[1] … list[10]): 271 93 33 984 55 306 208 179 859 9

Page 59: Chap 7 Sorting

Radix Sort Example (Cont.)

Pass 2 (tens digit) on 271 93 33 984 55 306 208 179 859 9:
bin 0: 306, 208, 9    bin 3: 33    bin 5: 55, 859    bin 7: 271, 179    bin 8: 984    bin 9: 93

Chain after pass 2 (list[1] … list[10]): 306 208 9 33 55 859 271 179 984 93

Page 60: Chap 7 Sorting

Radix Sort Example (Cont.)

Pass 3 (hundreds digit) on 306 208 9 33 55 859 271 179 984 93:
bin 0: 9, 33, 55, 93    bin 1: 179    bin 2: 208, 271    bin 3: 306    bin 8: 859    bin 9: 984

Chain after pass 3 (list[1] … list[10]): 9 33 55 93 179 208 271 306 859 984

Page 61: Chap 7 Sorting

List And Table Sorts

• Apart from radix sort and recursive merge sort, all the sorting methods we have looked at so far require excessive data movement.

• When the amount of data to be sorted is large, data movement tends to slow down the process.

• It is desirable to modify sorting algorithms to minimize the data movement.

• Methods such as insertion sort or merge sort can be modified to work with a linked list rather than a sequential list. Instead of physical movement, an additional link field is used to reflect the change in the position of the record in the list.

Page 62: Chap 7 Sorting

Program 7.16 (Rearranging Records Using A Doubly Linked List)

void list1(Element* list, const int n, int first)
{
    int prev = 0;
    // convert the chain into a doubly linked list -- O(n)
    for (int current = first; current; current = list[current].link)
    {
        list[current].linkb = prev;
        prev = current;
    }
    for (int i = 1; i < n; i++)    // move list[first] to position i while maintaining the list
    {
        if (first != i) {
            if (list[i].link) list[list[i].link].linkb = first;
            list[list[i].linkb].link = first;
            Element a = list[first];
            list[first] = list[i];
            list[i] = a;
        }
        first = list[i].link;
    }
}

O(nm), assuming each record is m words

Page 63: Chap 7 Sorting

Example 7.9

Initial chain:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key   26  5 77  1 61 11 59 15 48  19
link   9  6  0  2  3  8  5 10  7   1

After conversion to a doubly linked list:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key   26  5 77  1 61 11 59 15 48  19
link   9  6  0  2  3  8  5 10  7   1
linkb 10  4  5  0  7  2  9  6  1   8

Page 64: Chap 7 Sorting

Example 7.9 (Cont.)

Configuration after the first iteration of the for loop of list1, first = 2:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 77 26 61 11 59 15 48  19
link   2  6  0  9  3  8  5 10  7   4
linkb  0  4  5 10  7  2  9  6  4   8

Configuration after the second iteration of the for loop of list1, first = 6:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 77 26 61 11 59 15 48  19
link   2  6  0  9  3  8  5 10  7   1
linkb  0  4  5 10  7  2  9  6  1   8

Page 65: Chap 7 Sorting

Example 7.9 (Cont.)

Configuration after the third iteration of the for loop of list1, first = 8:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 11 26 61 77 59 15 48  19
link   2  6  8  9  6  0  5 10  7   4
linkb  0  4  2 10  7  5  9  6  4   8

Configuration after the fourth iteration of the for loop of list1, first = 10:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 11 15 61 77 59 26 48  19
link   2  6  8 10  6  0  5  9  7   8
linkb  0  4  2  6  7  5  9 10  8   8

Page 66: Chap 7 Sorting

Example 7.10 (Rearranging Records Using Only One Link)

void list2(Element* list, const int n, int first)
// Same function as list1 except that a second link field linkb is not required.
{
    for (int i = 1; i < n; i++)
    {
        while (first < i) first = list[first].link;
        int q = list[first].link;    // list[q] is the next record in nondecreasing order
        if (first != i)
        // interchange list[i] and list[first], moving list[first] to its correct spot, as
        // list[first] has the ith smallest key. Also set a link from the old position of
        // list[i] to the new one.
        {
            Element t = list[i];
            list[i] = list[first];
            list[first] = t;
            list[i].link = first;
        }
        first = q;
    }
}

O(n) iterations; O(nm) total, assuming each record is m words

Page 67: Chap 7 Sorting

Example 7.10 (Rearranging Records Using Only One Link)

Configuration after the first iteration of the for loop of list2, first = 2:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 77 26 61 11 59 15 48  19
link   4  6  0  9  3  8  5 10  7   1

Configuration after the second iteration of the for loop of list2, first = 6 (unchanged, since R2 is already in place):

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 77 26 61 11 59 15 48  19
link   4  6  0  9  3  8  5 10  7   1

Page 68: Chap 7 Sorting

Example 7.10 (Rearranging Records Using Only One Link)

Configuration after the third iteration of the for loop of list2, first = 8:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 11 26 61 77 59 15 48  19
link   4  6  6  9  3  0  5 10  7   1

Configuration after the fourth iteration of the for loop of list2, first = 10:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 11 15 61 77 59 26 48  19
link   4  6  6  8  3  0  5  9  7   1

Page 69: Chap 7 Sorting

Example 7.10 (Rearranging Records Using Only One Link)

Configuration after the fifth iteration of the for loop of list2, first = 1:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 11 15 19 77 59 26 48  61
link   4  6  6  8 10  0  5  9  7   3

Configuration after the sixth iteration of the for loop of list2, first = 9:

i     R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
key    1  5 11 15 19 26 59 77 48  61
link   4  6  6  8  1  8  5  0  7   3

Page 70: Chap 7 Sorting

Table Sort

• The list-sort technique is not well suited for quick sort and heap sort.

• One can maintain an auxiliary table, t, with one entry per record. The entries serve as an indirect reference to the records.

• Initially, t[i] = i. When interchanges are required, only the table entries are exchanged.

• It may sometimes be necessary to physically rearrange the records according to the permutation specified by t.

Page 71: Chap 7 Sorting

Table Sort (Cont.)

• The function to rearrange records corresponding to the permutation t[1], t[2], …, t[n] can be considered an application of a theorem from mathematics:
– Every permutation is made up of disjoint cycles. The cycle for any element i is made up of i, t[i], t²[i], …, tᵏ[i], where tʲ[i] = t[tʲ⁻¹[i]], t⁰[i] = i, and tᵏ[i] = i.

Page 72: Chap 7 Sorting

Program 7.18 (Table Sort)

void table(Element* list, const int n, int* t)
{
    for (int i = 1; i < n; i++) {
        if (t[i] != i) {    // nontrivial cycle starting at i
            Element p = list[i]; int j = i;
            do {
                int k = t[j]; list[j] = list[k]; t[j] = j;
                j = k;
            } while (t[j] != i);
            list[j] = p;
            t[j] = j;
        }
    }
}

Page 73: Chap 7 Sorting

Table Sort Example

      R1 R2 R3 R4 R5 R6 R7 R8
key   35 14 12 42 26 50 31 18
t      3  2  8  5  7  1  4  6

Initial configuration

key   12 14 18 42 26 35 31 50
t      1  2  3  5  7  6  4  8

Configuration after rearrangement of the first cycle

key   12 14 18 26 31 35 42 50
t      1  2  3  4  5  6  7  8

Configuration after rearrangement of the second cycle

Page 74: Chap 7 Sorting

Analysis of Table Sort

• To rearrange a nontrivial cycle containing k distinct records, k + 1 record moves are needed. The total number of record moves is therefore

  Σ_{l, k_l > 0} (k_l + 1)

• Since the records on all nontrivial cycles must be different, Σ k_l ≤ n.

• The total number of record moves is maximum when Σ k_l = n and there are n/2 cycles.

• Assume that one record move costs O(m) time; the total computing time is then O(mn).

Page 75: Chap 7 Sorting

Summary Of Internal Sorting

• No one method is best for all conditions.
– Insertion sort is good when the list is already partially ordered, and it is best for small n.
– Merge sort has the best worst-case behavior but needs more storage than heap sort.
– Quick sort has the best average behavior, but its worst-case behavior is O(n²).
– The behavior of radix sort depends on the size of the keys and the choice of r.

Page 76: Chap 7 Sorting

External Sorting

• Some lists are too large to fit in the memory of a computer, so internal sorting is not possible.

• The records are stored on disk (tape, etc.). The system retrieves a block of data from the disk at a time; a block contains multiple records.

• The most popular method for sorting on external storage devices is merge sort:
– Segments of the input list are sorted.
– Sorted segments (called runs) are written onto external storage.
– Runs are merged until only one run is left.

Page 77: Chap 7 Sorting

Example 7.12

• Consider a computer capable of internally sorting 750 records that is used to sort 4500 records.

• Six runs are generated, each by sorting 750 records.

• Three 250-record blocks of internal memory are allocated for merging runs: two buffer the two input runs and the third holds the output.

• Three factors contribute to the read/write time of a disk:
– seek time
– latency time
– transmission time

Page 78: Chap 7 Sorting

Example 7.12 (Cont.)

run1 (1 – 750)

run2 (751 – 1500)

run3 (1501 – 2250)

run4 (2251 – 3000)

run5 (3001 – 3750)

run6 (3751 – 4500)

Page 79: Chap 7 Sorting

Example 7.12 (Cont.)

• tIO = ts + tl + trw

tIS = time to internally sort 750 records

ntm = time to merge n records from the input buffers to the output buffer

ts = maximum seek time

tl = maximum latency time

trw = time to read or write one block of 250 records

Page 80: Chap 7 Sorting

Example 7.12 (Cont.)

Operation                                                  Time

(1) Read 18 blocks of input (18tIO), internally sort       36tIO + 6tIS
    (6tIS), and write 18 blocks (18tIO)

(2) Merge runs 1 to 6 in pairs                             36tIO + 4500tm

(3) Merge two runs of 1500 records each, 12 blocks         24tIO + 3000tm

(4) Merge one run of 3000 records with one run of          36tIO + 4500tm
    1500 records

Total time                                                 132tIO + 12000tm + 6tIS
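The totals in the table can be checked mechanically. A small sketch that sums the coefficient of each cost term (tIO, tm, tIS) over the four operations (`Cost` and `totalCost` are illustrative names):

```cpp
#include <cassert>

// Coefficients of tIO, tm, and tIS for each operation in Example 7.12.
struct Cost { int io; int merge; int sort; };

Cost totalCost() {
    Cost ops[] = {
        {36, 0,    6},   // (1) read, internally sort, and write the six runs
        {36, 4500, 0},   // (2) merge runs 1 to 6 in pairs
        {24, 3000, 0},   // (3) merge two 1500-record runs (12 blocks)
        {36, 4500, 0},   // (4) merge the 3000- and 1500-record runs
    };
    Cost total = {0, 0, 0};
    for (const Cost& c : ops) {
        total.io    += c.io;
        total.merge += c.merge;
        total.sort  += c.sort;
    }
    return total;
}
```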

Page 81: Chap 7 Sorting

K-Way Merging

• Merging m runs via 2-way merging requires ⌈log2 m⌉ passes over the data.

• If we use a higher-order merge, the number of passes over the data is reduced.

• With a k-way merge on m runs, we need ⌈logk m⌉ passes.

• But is it always true that the higher the order of merging, the less computing time we will have?
– Not necessarily!
– k-1 comparisons are needed to determine the next record to output.
– If a loser tree is used to reduce the number of comparisons, we can achieve a total merge time of O(n log2 m).
– The data block size is reduced as k increases; smaller blocks mean more block reads, and hence more seek and latency overhead, per pass.
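The pass counts can be sketched directly: ⌈logk m⌉ is what repeated division of m runs into groups of k produces (`mergePasses` is an illustrative helper):

```cpp
#include <cassert>

// Number of passes a k-way merge makes over the data to reduce m
// initial runs to a single run: ceil(log_k m), by repeated division.
int mergePasses(int m, int k) {
    int passes = 0;
    while (m > 1) {
        m = (m + k - 1) / k;   // one pass merges runs in groups of k
        passes++;
    }
    return passes;
}
```

For the six runs of Example 7.12, a 2-way merge needs 3 passes while a 3-way merge needs only 2.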

Page 82: Chap 7 Sorting

Buffer Handling for Parallel Operation

• To achieve better performance, multiple input buffers and two output buffers are used to avoid idle time.

• Evenly distributing input buffers among all runs may still leave the merge idle. Buffers should be assigned dynamically to whichever run needs to retrieve more data, so that the computing process is never halted waiting for input.

• We should take advantage of task overlapping and keep computing process busy and avoid idle time.

Page 83: Chap 7 Sorting

Buffering Algorithm

Step 1: Input the first block of each of the k runs, setting up k linked queues, each having one block of data.

Step 2: Let LastKey[i] be the last key input from run i. Let NextRun be the run for which LastKey is minimum.

Step 3: Use a function kwaymerge to merge records from the k input queues into the output buffer.

Step 4: Wait for any ongoing disk I/O to complete.

Step 5: If an input buffer has been read, add it to the queue for the appropriate run.

Step 6: If LastKey[NextRun] != +infinity, initiate reading the next block from run NextRun into a free input buffer.

Step 7: Initiate the writing of the output buffer.

Step 8: If a record with key +infinity has not been merged into the output buffer, go back to Step 3. Otherwise, wait for the ongoing write to complete and then terminate.

Page 84: Chap 7 Sorting

Optimal Merging of Runs

[Figure: two merge trees for four runs of sizes 2, 4, 5, and 15.

(a) Skewed tree: merge 2 and 4 (result 6), then 6 and 5 (result 11), then 11 and 15 (result 26).
weighted external path length = 2*3 + 4*3 + 5*2 + 15*1 = 43

(b) Balanced tree: merge 2 and 4 (result 6), merge 5 and 15 (result 20), then 6 and 20 (result 26).
weighted external path length = 2*2 + 4*2 + 5*2 + 15*2 = 52]
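The optimal pattern can be found greedily: always merge the two smallest remaining runs. A sketch that returns the resulting total merge cost, which equals the weighted external path length of the merge tree (`optimalMergeCost` is an illustrative name):

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Repeatedly merge the two smallest runs; the accumulated merge costs
// equal the weighted external path length of the resulting merge tree.
long long optimalMergeCost(const std::vector<long long>& runSizes) {
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>>
        heap(runSizes.begin(), runSizes.end());
    long long total = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();
        long long b = heap.top(); heap.pop();
        total += a + b;            // records moved by this merge
        heap.push(a + b);          // the merged run goes back in
    }
    return total;
}
```

On the runs of sizes 2, 4, 5, and 15 above, this reproduces the better tree's weighted external path length of 43.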

Page 85: Chap 7 Sorting

Huffman Code

• Assume we want to obtain an optimal set of codes for messages M1, M2, …, Mn+1. Each code is a binary string that will be used for transmission of the corresponding message.

• At receiving end, a decode tree is used to decode the binary string and get back the message.

• A zero is interpreted as a left branch and a one as a right branch. These codes are called Huffman codes.

• The cost of decoding a code word is proportional to the number of bits in the code. This number is equal to the distance of the corresponding external node from the root node.

• If qi is the relative frequency with which message Mi will be transmitted, then the expected decoding time is

  Σ qi di, summed over 1 ≤ i ≤ n+1,

where di is the distance of the external node for message Mi from the root node.

Page 86: Chap 7 Sorting

Huffman Codes (Cont.)

• The expected decoding time is minimized by choosing code words resulting in a decode tree with minimal weighted external path length.

M1 M2

M3

M4

0

0

0

1

1

1

Page 87: Chap 7 Sorting

Huffman Function

class BinaryTree {
public:
  BinaryTree(BinaryTree bt1, BinaryTree bt2)
  {
    root = new BinaryTreeNode;
    root->LeftChild = bt1.root;
    root->RightChild = bt2.root;
    root->weight = bt1.root->weight + bt2.root->weight;
  }
private:
  BinaryTreeNode *root;
};

void huffman(List<BinaryTree> l)
// l is a list of single-node binary trees as described above
{
  int n = l.Size();                          // number of binary trees in l
  for (int i = 0; i < n-1; i++) {            // loop n-1 times
    BinaryTree first = l.DeleteMinWeight();  // tree with smallest weight
    BinaryTree second = l.DeleteMinWeight(); // tree with next smallest weight
    BinaryTree *bt = new BinaryTree(first, second);
    l.Insert(*bt);                           // reinsert the combined tree
  }
}

Time complexity: O(n log n) when l is maintained as a min-priority queue, since each of the n-1 iterations performs a constant number of O(log n) operations.
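The BinaryTree and List interfaces above are assumed by the text, not standard. A self-contained sketch of the same procedure using std::priority_queue over node pointers (Node, ByWeight, and this pointer-based huffman are illustrative; nodes are leaked for brevity):

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct Node {
    int weight;
    Node *left = nullptr, *right = nullptr;
};

struct ByWeight {
    bool operator()(const Node* a, const Node* b) const {
        return a->weight > b->weight;    // min-heap ordered by weight
    }
};

// Build a Huffman tree over the given leaf weights; returns the root.
// Each step combines the two smallest trees, exactly n-1 times.
Node* huffman(const std::vector<int>& weights) {
    std::priority_queue<Node*, std::vector<Node*>, ByWeight> heap;
    for (int w : weights) heap.push(new Node{w});
    while (heap.size() > 1) {
        Node* a = heap.top(); heap.pop();
        Node* b = heap.top(); heap.pop();
        heap.push(new Node{a->weight + b->weight, a, b});
    }
    return heap.top();
}
```

On the leaf weights 2, 3, 5, 7, 9, 13 of the next slide, the root weight comes out to 39 with subtrees of weight 16 and 23.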

Page 88: Chap 7 Sorting

Huffman Tree Example

[Figure: step-by-step Huffman tree construction for leaf weights 2, 3, 5, 7, 9, 13.

(a) Combine 2 and 3 into a tree of weight 5.
(b) Combine the two trees of weight 5 into a tree of weight 10.
(c) Combine 7 and 9 into a tree of weight 16.
(d) Combine 10 and 13 into a tree of weight 23.
(e) Combine 16 and 23 into the final tree of weight 39.]