
Page 1: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

CS221: Algorithms and Data Structures

Lecture #4Sorting Things Out

Steve Wolfman

2014W1


Page 2: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Quick Review of Sorts

• Insertion Sort: Keep a list of already sorted elements. One by one, insert new elements into the right place in the sorted list.

• Selection Sort: Repeatedly find the smallest (or largest) element and put it in the next slot (where it belongs in the final sorted list).

• Merge Sort: Divide the list in half, sort the halves, merge them back together. (Base case: length 1.)
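For concreteness, here is a minimal C++ sketch of the first two sorts (an illustration added for reference, not code from the slides):

#include <cstddef>
#include <utility>   // std::swap
#include <vector>

// Insertion sort: grow a sorted prefix; insert each new element by
// shifting larger elements one slot to the right.
void insertionSort(std::vector<int>& a) {
  for (std::size_t i = 1; i < a.size(); ++i) {
    int key = a[i];
    std::size_t j = i;
    while (j > 0 && a[j-1] > key) {
      a[j] = a[j-1];
      --j;
    }
    a[j] = key;
  }
}

// Selection sort: repeatedly find the smallest remaining element and
// swap it into the next slot of the sorted prefix.
void selectionSort(std::vector<int>& a) {
  for (std::size_t i = 0; i + 1 < a.size(); ++i) {
    std::size_t minIdx = i;
    for (std::size_t j = i + 1; j < a.size(); ++j)
      if (a[j] < a[minIdx]) minIdx = j;
    std::swap(a[i], a[minIdx]);
  }
}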

Page 3: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Insertion Sort Invariant

[Figure: an array divided by a line; elements to the left of the line are in sorted order, elements to the right are untouched. Each iteration moves the line right one element.]

Page 4: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Selection Sort Invariant

[Figure: an array divided by a line; to the left are the smallest, 2nd smallest, 3rd smallest, … in order, to the right the remainder in no particular order. Each iteration moves the line right one element.]

Page 5: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Today’s Outline

• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

Page 6: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Categorizing Sorting Algorithms

• Computational complexity
  – Average case behaviour: Why do we care?
  – Worst/best case behaviour: Why do we care? How often do we re-sort sorted, reverse sorted, or “almost” sorted (k swaps from sorted, where k << n) lists?
• Stability: What happens to elements with identical keys?
• Memory Usage: How much extra memory is used?

Page 7: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Comparing our “PQSort” Algorithms

• Computational complexity
  – Selection Sort: Always makes n passes with a “triangular” shape. Best/worst/average case: Θ(n²).
  – Insertion Sort: Always makes n passes, but if we’re lucky and search for the maximum from the right, only constant work is needed on each pass. Best case: Θ(n); worst/average case: Θ(n²).
  – Heap Sort: Always makes n passes needing O(lg n) on each pass. Best/worst/average case: Θ(n lg n).

Note: best cases assume distinct elements. With identical elements, Heap Sort can get Θ(n) performance.
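One way to read the “PQSort” label: each of these sorts inserts everything into some kind of priority queue and then repeatedly removes the minimum; backing the priority queue with a binary heap gives Heap Sort. A minimal sketch using std::priority_queue (an added illustration under that reading, not the course code, and without Heap Sort’s in-place trick):

#include <cstddef>
#include <functional>  // std::greater
#include <queue>
#include <vector>

// "PQSort" with a heap-backed priority queue: building the heap is
// O(n); the n removals cost O(n lg n) total, matching Heap Sort's
// best/worst/average case.
void heapPQSort(std::vector<int>& a) {
  std::priority_queue<int, std::vector<int>, std::greater<int>>
      pq(a.begin(), a.end());            // min-heap of all n elements
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = pq.top();                     // smallest remaining element
    pq.pop();
  }
}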

Page 8: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Insertion Sort Best Case

[Figure: an already-sorted array of 1–14; the “PQ” boundary between the sorted prefix and the remaining elements moves right by one element on each pass.]

If we search from the right: constant time per pass!

Page 9: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Comparing our “PQSort” Algorithms

• Stability
  – Selection: Unstable. (Say we swap in the “leftmost” smallest element, what about the element swapped out?)
  – Insertion: Easily made stable (when building from the left, find the rightmost slot for a new element).
  – Heap: Unstable.
• Memory use: All three are essentially “in-place” algorithms with small O(1) extra space requirements.
• Cache access: Not detailed in 221, but… algorithms that don’t “jump around” tend to perform better in modern memory systems. Which of these “jumps around”?

But note: there’s a trick to make any sort stable.
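The “trick” mentioned above is usually done by tagging: pair each element with its original index and break ties on the index, so even an unstable sort leaves equal keys in their input order. A sketch (an added illustration, at the cost of O(n) extra memory):

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Make any comparison sort behave stably: sort (key, original index)
// pairs; ties on the key are broken by the original index, which
// preserves the input order of equal keys even under an unstable sort
// (here std::sort, typically an introsort).
void sortStablyByTagging(std::vector<int>& a) {
  std::vector<std::pair<int, std::size_t>> tagged;
  tagged.reserve(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    tagged.push_back({a[i], i});
  std::sort(tagged.begin(), tagged.end());  // compares key first, then index
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = tagged[i].first;
}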

Page 10: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Comparison of growth...

[Figure: growth of n, n lg n, and n² (T(n) versus n), with reference point n = 100, T(n) = 100.]

Reminder: asymptotic differences matter, but a factor of lg n doesn’t matter as much as a factor of n.

Page 11: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Today’s Outline

• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

Page 12: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

MergeSort

Mergesort belongs to a class of algorithms known as “divide and conquer” algorithms (your recursion sense should be tingling here...).

The problem space is continually split in half, recursively applying the algorithm to each half until the base case is reached.


Page 13: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

MergeSort Algorithm

1. If the array has 0 or 1 elements, it’s sorted. Else…

2. Split the array into two halves

3. Sort each half recursively (i.e., using mergesort)

4. Merge the sorted halves to produce one sorted result:

1. Consider the two halves to be queues.

2. Repeatedly compare the fronts of the queues. Whichever is smaller (or, if one is empty, whichever is left), dequeue it and insert it into the result.

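Step 4 above can be rendered directly with queues, which makes the “dequeue whichever front is smaller” idea concrete (an added sketch; the course’s array-index versions follow on the next slides):

#include <queue>
#include <vector>

// Merge two sorted queues into one sorted vector by repeatedly taking
// whichever front is smaller; once one queue is empty, drain the other.
// Preferring the left queue on ties keeps the merge stable.
std::vector<int> mergeQueues(std::queue<int> left, std::queue<int> right) {
  std::vector<int> result;
  while (!left.empty() || !right.empty()) {
    if (right.empty() ||
        (!left.empty() && left.front() <= right.front())) {
      result.push_back(left.front());
      left.pop();
    } else {
      result.push_back(right.front());
      right.pop();
    }
  }
  return result;
}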

Page 14: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

MergeSort Performance Analysis

1. If the array has 0 or 1 elements, it’s sorted. Else…

2. Split the array into two halves

3. Sort each half recursively (i.e., using mergesort)

4. Merge the sorted halves to produce one sorted result:

1. Consider the two halves to be queues.

2. Repeatedly compare the fronts of the queues. Whichever is smaller (or, if one is empty, whichever is left), dequeue it and insert it into the result.

Costs per step: the base case is T(1) = 1; the two recursive sorts cost 2·T(n/2); the merge costs n. Together: T(n) = 2T(n/2) + n.

Page 15: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

MergeSort Performance Analysis

T(1) = 1

T(n) = 2T(n/2) + n
     = 4T(n/4) + 2(n/2) + n
     = 8T(n/8) + 4(n/4) + 2(n/2) + n
     = 8T(n/8) + n + n + n = 8T(n/8) + 3n
     = 2^i T(n/2^i) + i·n

Let i = lg n:

T(n) = n·T(1) + n lg n = n + n lg n ∈ Θ(n lg n)

We ignored floors/ceilings. To prove performance formally, we’d use this as a guess and prove it with floors/ceilings by induction.

Page 16: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Try it on this array!

3 -4 3 5 9 1 2 6

-4 1 2 3 3 5 6 9

Page 17: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Try it on this array!

3 -4 3 5 9 1 2 6

3 -4 3 5 9 1 2 6

3 -4 3 5 9 1 2 6

3 -4 3 5 9 1 2 6

-4 3 3 5 1 9 2 6

-4 3 3 5 1 2 6 9

-4 1 2 3 3 5 6 9

Where does the red 3 go?

Page 18: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Mergesort (by Jon Bentley):

void msort(int x[], int lo, int hi, int tmp[]) {
  if (lo >= hi) return;
  int mid = (lo+hi)/2;
  msort(x, lo, mid, tmp);
  msort(x, mid+1, hi, tmp);
  merge(x, lo, mid, hi, tmp);
}

void mergesort(int x[], int n) {
  int *tmp = new int[n];
  msort(x, 0, n-1, tmp);
  delete[] tmp;
}

Page 19: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Merge (by Jon Bentley):

void merge(int x[], int lo, int mid, int hi, int tmp[]) {
  int a = lo, b = mid+1;
  for (int k = lo; k <= hi; k++) {
    if (a <= mid && (b > hi || x[a] < x[b]))
      tmp[k] = x[a++];
    else
      tmp[k] = x[b++];
  }
  for (int k = lo; k <= hi; k++)
    x[k] = tmp[k];
}

Elegant & brilliant… but not how I’d write it.

Page 20: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Mergesort (by Steve):

void msort(int data[], int left, int right, int temp[]) {
  if (left >= right) return;
  int mid = (left+right)/2;
  msort(data, left, mid, temp);
  msort(data, mid+1, right, temp);   // right half starts at mid+1
  merge(data, left, mid, right, temp);
}

void mergesort(int data[], int n) {
  int *temp = new int[n];
  msort(data, 0, n-1, temp);
  delete[] temp;
}

Page 21: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Merge (by Steve):

void merge(int data[], int left, int mid, int right, int temp[]) {
  int frontL = left, frontR = mid+1;
  for (int dest = left; dest <= right; ++dest) {
    if (frontL > mid) {                        // left is empty?
      temp[dest] = data[frontR];
      ++frontR;
    } else if (frontR > right) {               // right is empty?
      temp[dest] = data[frontL];
      ++frontL;
    } else if (data[frontR] < data[frontL]) {  // right elt is smaller
      temp[dest] = data[frontR];
      ++frontR;
    } else {                                   // left elt is smaller (or equal)
      temp[dest] = data[frontL];
      ++frontL;
    }
  }
  for (int k = left; k <= right; ++k)
    data[k] = temp[k];
}

Page 22: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Today’s Outline

• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

Page 23: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort

In practice, one of the fastest sorting algorithms is Quicksort, developed in 1961 by C.A.R. Hoare.

Comparison-based: examines elements by comparing them to other elements

Divide-and-conquer: divides into “halves” (that may be very unequal) and recursively sorts


Page 24: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort algorithm

• Pick a pivot
• Reorder the list such that all elements < pivot are on the left, while all elements ≥ pivot are on the right
• Recursively sort each side

Are we missing a base case?

Page 25: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Partitioning

• The act of splitting up an array according to the pivot is called partitioning

• Consider the following:

-4 1 -3 | 2 | 3 5 4 7
(left partition | pivot | right partition)

Page 26: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort Visually

[Figure: the recursion of pivots: one pivot (P) at the top level, then a pivot in each half, then one in each quarter, and so on until everything is sorted.]

Page 27: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort (by Jon Bentley):

void qsort(int x[], int lo, int hi) {
  int i, p;
  if (lo >= hi) return;
  p = lo;
  for (i = lo+1; i <= hi; i++)
    if (x[i] < x[lo])
      swap(x[++p], x[i]);
  swap(x[lo], x[p]);
  qsort(x, lo, p-1);
  qsort(x, p+1, hi);
}

void quicksort(int x[], int n) {
  qsort(x, 0, n-1);
}

Elegant & brilliant… but still not how I’d write it.

Page 28: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort Example (using the intuitive algorithm, not Bentley’s: pick the first element as pivot, “scoot” elements to left/right)

2 -4 6 1 5 -3 3 7

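One way to implement that “scoot” partition is with a temporary array: keep the first element as the pivot, copy smaller elements in from the left end and larger ones in from the right end, and drop the pivot into the slot that remains (an added sketch of one reasonable interpretation, not the slides’ code):

#include <vector>

// Partition a[lo..hi] around the pivot a[lo] by "scooting" elements
// into a temporary array: < pivot fills from the left, >= pivot fills
// from the right; the pivot lands in the single slot left over.
// Returns the pivot's final index.
int scootPartition(int a[], int lo, int hi) {
  int pivot = a[lo];
  std::vector<int> tmp(hi - lo + 1);
  int left = 0, right = hi - lo;
  for (int i = lo + 1; i <= hi; ++i) {
    if (a[i] < pivot) tmp[left++]  = a[i];
    else              tmp[right--] = a[i];
  }
  tmp[left] = pivot;                        // here left == right
  for (int i = lo; i <= hi; ++i) a[i] = tmp[i - lo];
  return lo + left;
}

On the array above (2 -4 6 1 5 -3 3 7), the pivot 2 ends up at index 3, with {-4, 1, -3} to its left and {6, 5, 3, 7} (in some order) to its right.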

Page 29: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort: Complexity

• Recall that Quicksort is comparison based
  – Thus, the operations are comparisons
• In our partitioning task, we compared each element to the pivot
  – Thus, the total number of comparisons is N
  – As with MergeSort, if one of the partitions is about half (or any constant fraction of) the size of the array, complexity is Θ(n lg n).
• In the worst case, however, we end up with a partition with a 1 and n-1 split

Page 30: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort Visually: Worst case

[Figure: worst-case recursion: each pivot (P) splits off only a single element, so the remaining piece shrinks by one at every level.]

Page 31: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort: Worst Case

• In the overall worst case, this happens at every step…
  – Thus we have N comparisons in the first step
  – N-1 comparisons in the second step
  – N-2 comparisons in the third step
  – …
  – …or approximately n²

n + (n-1) + … + 2 + 1 = n(n+1)/2 = n²/2 + n/2 ∈ Θ(n²)

Page 32: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort: Average Case (Intuition)

• Clearly pivot choice is important
  – It has a direct impact on the performance of the sort
  – Hence, QuickSort is fragile, or at least “attackable”
• So how do we pick a good pivot?
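Two common answers (not spelled out on this slide): choose a random index, or take the median of the first, middle, and last elements (“median-of-three”). A sketch of both, as added illustrations:

#include <cstdlib>   // std::rand

// Random pivot: any index in [lo, hi] with equal probability, which
// gives the expected O(n lg n) behaviour argued on the next slides and
// defeats adversarial "attack" inputs in expectation.
int randomPivotIndex(int lo, int hi) {
  return lo + std::rand() % (hi - lo + 1);
}

// Median-of-three: the median of the first, middle, and last elements;
// a cheap heuristic that avoids the sorted/reverse-sorted worst cases
// of always picking the first element.
int medianOfThreeIndex(const int x[], int lo, int hi) {
  int mid = lo + (hi - lo) / 2;
  int a = x[lo], b = x[mid], c = x[hi];
  if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
  if ((b <= a && a <= c) || (c <= a && a <= b)) return lo;
  return hi;
}

Either chosen index would be swapped into position lo before running a partition like Bentley’s.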

Page 33: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort: Average Case (Intuition)

• Let’s assume that pivot choice is random
  – Half the time the pivot will be from the centre half of the array
  – Thus, at worst the split will be n/4 and 3n/4

Page 34: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

QuickSort: Average Case (Intuition)

• We can apply this to the notion of a good split
  – Every “good” split: 2 partitions of size n/4 and 3n/4
    • or divides N by 4/3
  – Hence, we make up to log4/3(N) “good” splits
• Expected # of partitions is at most 2·log4/3(N), i.e., O(lg N)
• Given N comparisons at each partitioning step, we have Θ(N lg N)

Page 35: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Quicksort Complexity: How does it compare?

N         Insertion Sort         Quicksort
10,000    4.1777 sec             0.05 sec
20,000    20.52 sec              0.11 sec
300,000   4666 sec (1.25 hrs)    2.15 sec

Page 36: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Today’s Outline

• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

Page 37: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

How Do Quick, Merge, Heap, Insertion, and Selection Sort Compare?

Complexity
  – Best case: Insert < Quick, Merge, Heap < Select
  – Average case: Quick, Merge, Heap < Insert, Select
  – Worst case: Merge, Heap < Quick, Insert, Select
  – Usually on “real” data (not asymptotic): Quick < Merge < Heap < I/S
  – On very short lists: quadratic sorts may have an advantage (so, some quick/merge implementations “bottom out” to these as base cases)

Some details depend on implementation! (E.g., an initial check whether the last elt of the left sublist is less than the first of the right can make merge’s best case linear; see the sketch below.)
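For concreteness, that parenthetical check might look like this when dropped into the msort from the “Mergesort (by Steve)” slide (an added sketch using the same names; merge is the routine from the slide that follows it):

// Declaration of the merge routine from the slides.
void merge(int data[], int left, int mid, int right, int temp[]);

// msort with the "already in order?" check: if the last element of the
// left half is <= the first of the right half, the halves are already
// in order and the O(n) merge can be skipped. On sorted input every
// merge is skipped, so the whole sort does only O(n) work over its
// O(n) recursive calls.
void msortWithCheck(int data[], int left, int right, int temp[]) {
  if (left >= right) return;
  int mid = (left + right) / 2;
  msortWithCheck(data, left, mid, temp);
  msortWithCheck(data, mid + 1, right, temp);
  if (data[mid] <= data[mid + 1]) return;   // halves already in order
  merge(data, left, mid, right, temp);
}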

Page 38: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

How Do Quick, Merge, Heap, Insertion, and Selection Sort Compare?

Stability
  – Easily Made Stable: Insert, Merge (prefer the “left” of the two sorted sublists on ties)
  – Unstable: Heap
  – Challenging to Make Stable: Quick, Select

• Memory use:
  – Insert, Select, Heap < Quick < Merge

How much stack space does recursive QuickSort use? In the worst case? Could we make it better?
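One standard answer to the stack-space question above: the second recursive call in Bentley’s qsort can be turned into a loop, and if we always recurse on the smaller partition the stack depth is O(lg n) even when the splits are lopsided. An added sketch (a hypothetical rewrite using the same partition as Bentley’s):

#include <utility>   // std::swap

// QuickSort with O(lg n) worst-case stack depth: recurse only on the
// smaller partition and loop on the larger one. Each recursive call
// handles at most half of the current range, so at most lg n calls
// can be active at once.
void qsortSmallerFirst(int x[], int lo, int hi) {
  while (lo < hi) {
    int p = lo;                              // partition around x[lo],
    for (int i = lo + 1; i <= hi; ++i)       // as in Bentley's qsort
      if (x[i] < x[lo])
        std::swap(x[++p], x[i]);
    std::swap(x[lo], x[p]);

    if (p - lo < hi - p) {                   // left side is smaller
      qsortSmallerFirst(x, lo, p - 1);
      lo = p + 1;                            // loop on the right side
    } else {                                 // right side is smaller (or equal)
      qsortSmallerFirst(x, p + 1, hi);
      hi = p - 1;                            // loop on the left side
    }
  }
}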

Page 39: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Today’s Outline

• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

Page 40: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Complexity of Sorting Using Comparisons as a Problem

Each comparison is a “choice point” in the algorithm. You can do one thing if the comparison is true and another if false. So, the whole algorithm is like a binary tree…

[Figure: a binary decision tree: each internal node is a comparison (x < y?, a < b?, a < d?, c < d?, z < c?, …) with yes/no branches, and each leaf is a sorted output.]

Page 41: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Complexity of Sorting Using Comparisons as a Problem

The algorithm spits out a (possibly different) sorted list at each leaf. What’s the maximum number of leaves?

[Same decision-tree figure as the previous slide.]

Page 42: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Complexity of Sorting Using Comparisons as a Problem

There are n! possible permutations of a sorted list (i.e., input orders for a given set of input elements). How deep must the tree be to distinguish those input orderings?

[Same decision-tree figure as the previous slides.]

Page 43: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Complexity of Sorting Using Comparisons as a Problem

If the tree is not at least lg(n!) deep, then there’s some pair of orderings I could feed the algorithm which the algorithm does not distinguish. So, it must not successfully sort one of those two orderings.

[Same decision-tree figure as the previous slides.]

Page 44: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Complexity of Sorting Using Comparisons as a Problem

QED: The complexity of sorting using comparisons is Ω(n lg n) in the worst case, regardless of algorithm! (The tree must be at least lg(n!) deep, and lg(n!) ≥ (n/2) lg(n/2), since the largest n/2 factors of n! are each at least n/2; that is Ω(n lg n).)

In general, we can lower-bound but not upper-bound the complexity of problems.

(Why not? Because I can give as crappy an algorithm as I please to solve any problem.)


Page 45: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Today’s Outline

• Categorizing/Comparing Sorting Algorithms
  – PQSorts as examples
• MergeSort
• QuickSort
• More Comparisons
• Complexity of Sorting

Page 46: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

To Do

• Read: Epp Section 9.5 and KW Section 10.1, 10.4, and 10.7-10.10


Page 47: CS221: Algorithms and  Data Structures Lecture #4 Sorting Things Out

Next Up

• B+-trees and Giant Branching Factors
