cs305/503, spring 2009 balancing and linear-time sorting michael barnathan

25
CS305/503, Spring 2009 Balancing and Linear- Time Sorting Michael Barnathan

Upload: julianna-barbra-parsons

Post on 25-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

CS305/503, Spring 2009Balancing and Linear-Time Sorting

Michael Barnathan

Page 2: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Here’s what we’ll be learning:

• Theory:– Rotations.– DSW Balancing.– Pigeonhole sort.– Radix sort.

Page 3: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

“Treesort”: An Aside.

• Just as with a priority queue, if you insert elements into a Binary Search Tree and traverse them (in inorder), you will get them back sorted.

• This is therefore a simple sorting algorithm.• There are some who call it “Tim” “Treesort”.• What is the complexity?

– We perform n insertions, then inorder traverse.• This algorithm has lousy caching behavior,

however, so it isn’t used much.

Page 4: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Binary Search Trees• Are excellent, fascinating, and very well studied data

structures.– They achieve log(n) performance for insertion, deletion, and

access.• The only linear operation is traversal, which is always going to be

linear (the goal is to visit each node).– Tree algorithms usually have straightforward recursive

structures (except for deletion).– Traversals give us a general tool to perform actions on all nodes

of the tree.• Have one major flaw:

– Degeneracy: the tree’s performance degrades to that of a linked list as balance is lost.

• Clearly, we want to prevent unbalancing.

Page 5: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

One-time Balancing Algorithm• There is an algorithm to balance trees called the “DSW

algorithm”.– It was discovered by Q. Stout and B. Warren based on

work by C. Day.– Reading their paper may be a good exercise to give you an

idea of what sorts of questions people try to answer when they create these algorithms.

• The idea behind this is simple:– Turn the tree into a linked list using left rotations.– Make as many right rotations down the list as are

necessary to turn the list into a balanced tree.• Two major questions:

– What are tree rotations?– How many rotations are necessary?

Page 6: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Illustrated Tree Rotations

5

• Rotations affect up to two levels of the tree.• Nodes get rotated up through the tree and to the other side of the parent.

– A second level of rotation is needed to deal with the orphaned children.• Left and right rotations are inverse operations.

4

Right Rotation About 1

1

2

3 4

Left Rotation About 2

3

5

2

1

Page 7: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Deeper Tree Rotations

5

• Rotations only affect two levels of a tree.– They are therefore O(1).– No matter how many nodes in a tree, a constant number

of swaps are made.• If a tree node two levels down has children, they

“come along for the ride” when the node is moved.

4

Right Rotation About 1

1

2

3 8

6 7

Page 8: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

The Code//Note that Java passes by value, so you will need to set node’s data rather than ref.void rightRotate(BinaryTree node) {

BinaryTree oldnode = (BinaryTree) (node.clone()); //clone() copies an Object.node.setTo(node.left); //Left moves to root.oldnode.left = node.right; //Left.right becomes right.left.node.right = oldnode; //Root moves to right.right.

}

void leftRotate(BinaryTree node) {//The exact opposite occurs.BinaryTree oldnode = (BinaryTree) (node.clone());node.setTo(node.right);oldnode.right = node.left;node.left = oldnode;

}

This takes slow and careful thought to wrap your mind around.Probably better to develop an intuitive sense from the picture.

Page 9: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

DSW Algorithm• To turn a tree into a linked list:

– Start at the root.– If the current node has a left child, rotate right.– Otherwise, set the current node to the right child.– Stop when the current node is null.

• To turn a linked list into a tree:– Find m, the (greatest power of 2 less than n+1) - 1.

• This is equivalent to 2floor(log (n+1)) - 1.– Start at the root.– Rotate left n - m times, moving to the new right child after each

rotation.– While m > 1:

• Divide m by 2 (integer division).• Start at the root.• Rotate left m times, moving to the new right child after each rotation.

Page 10: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

The Codevoid balance(BinaryTree root) {

tree_to_list(root);list_to_tree(root);

}

void tree_to_list(BinaryTree root) {//We could make this recursive, but that would require O(n) stack space.BinaryTree curnode = root;while (curnode != null) {

if (curnode.left != null)rightRotate(curnode);

elsecurnode = curnode.right;

}}

void list_to_tree(BinaryTree root) {int m = (int) (Math.pow(2.0, Math.floor(Math.log(n + 1) / Math.log(2))) - 1); //log2(a) = ln(a) / ln(2)BinaryTree curnode = root;for (int rotatecount = 0; rotatecount < n - m; rotatecount++) {

leftRotate(curnode);curnode = curnode.right;

}

while (m > 1) {m /= 2;curnode = root;for (int rotatecount = 0; rotatecount < m; rotatecount++) {

leftRotate(curnode);curnode = curnode.right;

}}

}

Page 11: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Example

* From http://courses.cs.vt.edu/~cs2604/fall05/mcpherson/note/BalancingTrees.ppt

tree_to_list

list_to_tree

Page 12: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Performance of DSW

• tree_to_list() visits every node and rotates some of them.

• list_to_tree() has a linear component at the top… and then halves m each time in the loop.

• Each iteration of the inner loop performs m operations.

• So we have T(n) = T(n / 2) + O(n).• This is linear, and so is the whole algorithm.

Page 13: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

When to Balance• If you use this algorithm, you are placed in charge of

keeping your trees balanced.• This brings up the obvious question: When should I do it?• And I’ll give an equally obvious answer: When it would be

faster to balance than not.• You can keep track of the depth of the highest and lowest

leaves in the tree and balance when they differ by too much.– How much is up to you, but 3 levels is probably not

unreasonable.• You can also keep a “balance factor” on each node that

compares the left and right subtrees, but we’ll get to that later when we discuss self-balancing trees.

• But this is all we’ll talk about on trees for now.

Page 14: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Intuitive Sorting

• We’ve discussed many sorting algorithms already, but we haven’t yet considered the most intuitive way to sort.– I have some cards here. Let’s discuss how we can

sort them by type.

Page 15: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

From Intuition to Algorithm• Hopefully, you suggested what I consider the most logical way to sort:

– Separate into piles.– Run through the piles in order and place them back together.

• There are 46 cards. How many times did you have to look at a card?– 46, give or take a few?

• If I had n cards, how many comparisons would you need to make?– n, give or take a few?– O(n), in fact.

• “Hey! Didn’t you say it was impossible to sort in O(n) time?”– Only if you compare elements to each other.– There was no comparison going on here.

• How much “extra space” did you need to sort the cards?– 5 piles, because there were 5 types of cards.– Let’s call the number of distinct types of cards “k”.

• Any extra time to merge them?– You just had to look at those 5 piles once to do it.

• This is definitely easy enough to formalize an algorithm from.

Page 16: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Pigeonhole Sort

• The strategy you just employed is known as pigeonhole sorting.

• It’s very easy to understand: you put each category of items into a “hole” (use a queue).

• You then take them back out “in order”.• The space cost is the number

of holes you need to fill.• Using queues, this is stable.

?

Page 17: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Pigeonhole Sort – The Algorithm//Say we’re still sorting cards, which have names.Card[] pigeonhole(Card[] arr) {

//LLs in Java implement queues.TreeMap<LinkedList<Card> > holes = new TreeMap<LinkedList<Card> >();

//This is how you do a “for each” loop in Java.for (Card item : arr) {

LinkedList<Card> hole = holes.get(item.name);if (hole == null) {

hole = new LinkedList<Card>();holes.put(item.name, hole);

}

hole.put(item);}

LinkedList<Card> sorted = new LinkedList<Card>();for (LinkedList<Card> hole : holes)

sorted.addAll(hole);

return sorted.toArray();}

Page 18: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Problems

• “Categories” need to exist for pigeonhole sort.• Continuous data, such as numbers, can be

converted to categories, but it’s tricky:– If I asked you to sort numbers from 1 to 1 million,

you would need a lot of space.

• You also need some sort of map structure to order the “holes”, unless your keys are numeric (in which case an array will work).

• Fortunately, there is a way to improve this…

Page 19: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Back to the Cards• Let’s sort a different set of cards by name.• No longer are we looking at only one letter; now we’re looking at the

whole word.• How did you compare them?

– First letter matches, go to the second, they match, go to the third, etc.?– That works well for you, but it requires remembering that previous letters

matched.– So if you’re sorting recipes and you encounter a dish called:

Lopado temakho selakho galeo kranio leipsano drim hypo trimmato silphio karabo melito katakekhy meno kikhl epi kossypho phatto perister alektryon opto kephallio kigklo peleio lagōio siraio baphē tragano pterýgōne…

– Then you may have problems.• You can still compare them this way – if you do it backwards.

– That is because this strategy is stable.• Remember: That means it preserves the existing ordering of elements with equal keys.

– You don’t need to remember anything if you sort by the last letter first, then by the second to last, then…, until you reach the first.

Page 20: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Radix Sort• This is the optimal sorting algorithm in terms of speed.

It is also a fairly simple algorithm.• Take the last digit of a number (or letter of a word) and

pigeonhole sort it.– We have 10 categories for base-10 numbers, 26 or 52 for

letters, 255 for the extended ASCII charset.– (But 65536 for Unicode characters!)

• Repeat for each additional digit, back to front.• If you have two keys that aren’t of equal lengths, pad

with 0s or null characters.– e.g.: 76, 124 -> 076, 124.– 0s and null chars will occupy the first bucket by nature.

Page 21: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Radix Sort Example

• 76, 24, 5, 412, 128, 300

• 300, 412, 24, 05, 76, 128

• 300, 005, 412, 024, 128, 076

• 5, 24, 76, 128, 300, 412.

300 412 24 5 76 128

0 1 2 3 4 5 6 7 8 9

30005 412

12824 76

0 1 2 3 4 5 6 7 8 9

005024076 128 300 412

0 1 2 3 4 5 6 7 8 9

Page 22: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Radix Sort Codeint[] radixsort(int[] arr) { return radixsort(arr, 10); }int[] radixsort(int[] arr, int base) {

//Determine the number of digits in the largest number.int maxdigits = 0;for (int aidx = 0; aidx < arr.length; aidx++)

maxdigits = Math.max(maxdigits, (int) (Math.log(arr[aidx]) / Math.log(base)) + 1);

LinkedList<int>[] holes = new LinkedList<int>[base];for (int baseinit = 0; baseinit < base; baseinit++)

holes[baseinit] = new LinkedList<int>(); //Initialize each linked list.

for (int digit = 1; digit <= maxdigits; digit++) {int curRadix = (int) (Math.pow(base, digit - 1)); //The “place” we’re considering (i.e., 1s, 10s, …)for (int itemidx = 0; itemidx < arr.length; itemidx++)

holes[arr[itemidx] % (curRadix * base) / curRadix].add(arr[itemidx]); //Extract the target digit.

//We populated the holes. Now replace the array with the holes’ contents in sorted order.LinkedList<int> sorted = new LinkedList<int>();for (int holeidx = 0; holeidx < holes.length; holeidx++) {

sorted.addAll(holes[holeidx]);holes[holeidx].clear();

}

arr = sorted.toArray(); //Replace the array and move to the next digit.}return arr;

}

Page 23: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Radix Sort Analysis• The inner loops run through the array from start to finish, adding

elements into their proper holes.• The outer loop runs in time proportional to the number of digits in the

largest number.– e.g., 3 digits in 400, 5 digits in 16384, … (base 10)– Or… 3 digits in 10, 5 in 17, 6 in 40… (base 2)– Or one digit if you choose base n.

• So why not always choose base n?– Because then it’s equivalent to pigeonhole sort and we run into the same

storage problems.– This is a classic time/space tradeoff: the greater the base, the more storage

you will need but the faster it will run.• The time complexity is O(nk), where k is the number of digits in the largest

element.• Radix sort is stable.

Page 24: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Why the runaround?• This is pretty simple, considering it’s the optimal

algorithm!• Why did we have to go through all of those other

algorithms?• Three reasons:

– Radix sort is not as general-purpose as Quicksort, Mergesort, and others.

• You can only use it easily when you confine yourself to a range of digits, strings, or bits.

– There isn’t much difference between k and log n. It’s usually just the log in another base.

– Learning is a journey, not a destination: other strategies teach you some very useful concepts.

Page 25: CS305/503, Spring 2009 Balancing and Linear-Time Sorting Michael Barnathan

Keep the balance…

• The lesson:– Good ideas need not be complex.

• Next class: Self-balancing trees and amortized analysis.

• Assignment 3 is posted on the website and is due Thursday 3/26.