
Page 1: Data Structures Brett Bernstein Lecture 13: VLA Trees and Binary …brettb/dsSum2016/Lecture13.pdf · 2016-07-15 · Data Structures Brett Bernstein Lecture 13: VLA Trees and Binary

Data Structures, Brett Bernstein

Lecture 13: AVL Trees and Binary Heaps

Review Exercises

1. (??) Interview question: Given an array, show how to shuffle it randomly so that any possible reordering is equally likely.

static void shuffle(int[] arr)

2. Write a function that, given the root of a binary search tree, returns the node with the largest value.

public static BSTNode<Integer> getLargest(BSTNode<Integer> root)

3. Explain how you would use a binary search tree to implement the Map ADT. We have included it below to remind you.

Map.java

    // Stores a mapping of keys to values
    public interface Map<K,V> {
        // Adds key to the map with the associated value. If key already
        // exists, the associated value is replaced.
        void put(K key, V value);
        // Gets the value for the given key, or null if it isn't found.
        V get(K key);
        // Returns true if the key is in the map, or false otherwise.
        boolean containsKey(K key);
        // Removes the key from the map
        void remove(K key);
        // Number of keys in the map
        int size();
    }

What requirements must be placed on the Map?

4. Consider the following binary search tree.

        10
       /  \
      5    15
     /    /  \
    1    12   18
             /
            16
              \
               17


Perform the following operations in order.

(a) Remove 15.

(b) Remove 10.

(c) Add 13.

(d) Add 8.

5. Suppose we begin with an empty BST. What order of add operations will yield the tallest possible tree?

Review Solutions

1. One method (popular in programs like Excel) is to generate a random double corresponding to each element of the array, and then sort the array by the corresponding doubles. Here is a different method that avoids sorting (used by Collections.shuffle).

RandomShuffle.java

    import java.util.Random;

    public class RandomShuffle {
        static void swap(int[] arr, int i, int j) {
            int tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
        }

        // Iterative implementation
        public static void shuffle(int[] arr) {
            Random ran = new Random();
            // Swap a uniformly random element of arr[0..i] into position i.
            for (int i = arr.length - 1; i >= 1; --i)
                swap(arr, i, ran.nextInt(i + 1));
        }

        // Recursive implementation
        public static void shuffleRec(int[] arr) {
            sRHelp(arr, arr.length, new Random());
        }

        // Shuffles the first len elements of arr.
        public static void sRHelp(int[] arr, int len, Random ran) {
            if (len <= 1) return;
            swap(arr, len - 1, ran.nextInt(len));
            sRHelp(arr, len - 1, ran);
        }
    }


Above we give an iterative and a recursive implementation. We first randomly choose one of the elements and swap it into the final position. Then we repeat the process on the first n − 1 elements (i.e., randomly choose the last element, and then randomly shuffle the first n − 1 elements).
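For comparison, the first method mentioned in this solution (attach a random double to each element and sort by those keys) might look like the sketch below. It is a hypothetical illustration, not code from the lecture: the class `SortShuffle` and method `sortShuffle` are made-up names. It costs Θ(n lg n) for the sort, versus Θ(n) for the swap-based version, and it relies on ties between random doubles being vanishingly unlikely.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Hypothetical helper illustrating the "sort by random keys" shuffle.
class SortShuffle {
    // Tags each position with a random double, then reorders arr by those tags.
    static void sortShuffle(int[] arr, Random ran) {
        int n = arr.length;
        double[] keys = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            keys[i] = ran.nextDouble();
            order[i] = i;
        }
        // Sort the positions by their random keys: Theta(n lg n) instead of Theta(n).
        Arrays.sort(order, Comparator.comparingDouble(i -> keys[i]));
        int[] copy = arr.clone();
        for (int i = 0; i < n; i++) {
            arr[i] = copy[order[i]];
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        sortShuffle(a, new Random());
        System.out.println(Arrays.toString(a)); // some permutation of 1..5
    }
}
```

The swap-based version above is usually preferred because it avoids both the sort and the extra arrays.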

2. Walk right from the root until there is no right child (this assumes root is not null):

    public static BSTNode<Integer> getLargest(BSTNode<Integer> root) {
        while (root.getRight() != null) root = root.getRight();
        return root;
    }

3. We require the keys of the Map to be Comparable or a Comparator to be provided. In each node, instead of simply storing a value, store a key-value pair (i.e., an entry). The BST will be ordered by the keys. Here put, get, containsKey, and remove are all Θ(h) in the worst case, and size can return a stored counter (we don't have containsValue above, which would always be Θ(n)). In a moment, we will show how to achieve Θ(lg n) height trees, which gives an efficient Map implementation without needing a hash function (but we need ordered keys).
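A minimal sketch of this idea, assuming `Comparable` keys and no rebalancing (so each operation is Θ(h)). The class `BSTMap` and its private helpers are hypothetical names; it mirrors the methods of the `Map` interface above but does not declare `implements Map<K,V>`, so that it stays self-contained.

```java
// Hypothetical sketch: a Map backed by an (unbalanced) BST ordered by key.
class BSTMap<K extends Comparable<K>, V> {
    private static class Node<K, V> {
        K key;
        V value;
        Node<K, V> left, right;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private Node<K, V> root;
    private int size;

    // Walk down by key; replace the value if the key already exists.
    public void put(K key, V value) { root = put(root, key, value); }

    private Node<K, V> put(Node<K, V> n, K key, V value) {
        if (n == null) { size++; return new Node<>(key, value); }
        int c = key.compareTo(n.key);
        if (c < 0) n.left = put(n.left, key, value);
        else if (c > 0) n.right = put(n.right, key, value);
        else n.value = value;
        return n;
    }

    // Returns the node holding key, or null if it is absent.
    private Node<K, V> find(K key) {
        Node<K, V> n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c == 0) return n;
            n = (c < 0) ? n.left : n.right;
        }
        return null;
    }

    public V get(K key) {
        Node<K, V> n = find(key);
        return (n == null) ? null : n.value;
    }

    public boolean containsKey(K key) { return find(key) != null; }

    public void remove(K key) { root = remove(root, key); }

    private Node<K, V> remove(Node<K, V> n, K key) {
        if (n == null) return null;
        int c = key.compareTo(n.key);
        if (c < 0) { n.left = remove(n.left, key); return n; }
        if (c > 0) { n.right = remove(n.right, key); return n; }
        size--;
        if (n.left == null) return n.right;
        if (n.right == null) return n.left;
        Node<K, V> s = n.right;            // successor: smallest key in right subtree
        while (s.left != null) s = s.left;
        n.key = s.key;
        n.value = s.value;
        n.right = removeMin(n.right);
        return n;
    }

    // Unlinks the smallest node of a non-empty subtree (size is adjusted by the caller).
    private Node<K, V> removeMin(Node<K, V> n) {
        if (n.left == null) return n.right;
        n.left = removeMin(n.left);
        return n;
    }

    public int size() { return size; }
}
```

Removing a key whose node has two children copies in the in-order successor's entry, the same step used when removing 15 and 10 in exercise 4.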

4. (a)

        10
       /  \
      5    16
     /    /  \
    1    12   18
             /
            17

(b)

        12
       /  \
      5    16
     /       \
    1         18
             /
            17

(c)

        12
       /  \
      5    16
     /    /  \
    1    13   18
             /
            17

(d)


        12
       /  \
      5    16
     / \  /  \
    1  8 13   18
             /
            17

5. Ascending or descending order (i.e., sorted or reverse sorted).
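To see answer 5 concretely, here is a small hypothetical demo (the class and method names are illustrative, not from the notes) that builds an ordinary unbalanced BST from two insertion orders and measures its height. It counts height in nodes, so an empty tree has height 0 and a single node height 1; that convention is an assumption of the sketch.

```java
// Hypothetical demo: height of an unbalanced BST for different insertion orders.
class TallestBST {
    static class Node {
        int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    // Plain BST insertion with no rebalancing.
    static Node insert(Node root, int v) {
        if (root == null) return new Node(v);
        if (v < root.val) root.left = insert(root.left, v);
        else root.right = insert(root.right, v);
        return root;
    }

    // Height counted in nodes: empty tree 0, single node 1.
    static int height(Node n) {
        return (n == null) ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    static int heightAfterInserting(int[] values) {
        Node root = null;
        for (int v : values) root = insert(root, v);
        return height(root);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterInserting(new int[]{1, 2, 3, 4, 5, 6, 7})); // 7 (a chain)
        System.out.println(heightAfterInserting(new int[]{4, 2, 6, 1, 3, 5, 7})); // 3 (perfectly balanced)
    }
}
```

Sorted input sends every new value to the right of the previous one, so the tree degenerates into a linked list of height n.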

Data Structure: AVL Tree

We have learned how to store values in a BST data structure that supports adding, removing, and searching in time Θ(h), where h is the height of the tree. We have also seen that in the worst case the height can be Θ(n), where n is the size of the tree. If we can keep the tree fairly balanced, then we can reduce the height to Θ(lg n) and obtain an efficient data structure.

To maintain a balanced structure we will use a technique invented by Adelson-Velskii and Landis (hence the name AVL tree). The idea is to keep a counter in every node (called a balance factor) that measures the difference between the heights of the left and right subtrees (right minus left). We add the following constraint to our BST:

• No balance factor will be greater than 1 or less than -1.

Any BST that satisfies this constraint will have Θ(lg n) height. Every time we add or remove a node we may violate this constraint. To return the tree to balance we will employ a technique called a rotation. Consider the following AVL tree:

            10(+1)
           /      \
       5(-1)      15(+1)
       /          /     \
    1(0)       12(0)    20(-1)
                        /
                     17(0)

Each node shows its value and balance factor. Now suppose we add 16 (just as we would in a BST).


            10(+2)
           /      \
       5(-1)      15(+2)
       /          /     \
    1(0)       12(0)    20(-2)
                        /
                     17(-1)
                     /
                  16(0)

As you can see, the tree now violates our constraint on the balance factors. To remedy the issue, we find the closest (lowest) ancestor of the newly added node that is out of balance. In this case, it is the node containing 20. We then perform a rotation so that 17 takes the place of 20, and 20 becomes the right child of 17. The resulting tree is

            10(+1)
           /      \
       5(-1)      15(+1)
       /          /     \
    1(0)       12(0)    17(0)
                        /   \
                     16(0)  20(0)
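The rotation just performed amounts to three pointer updates. The sketch below is a minimal illustration, assuming a bare hypothetical `Node` class with `val`, `left`, and `right` fields and a made-up `rotateRight` helper; it does not update balance factors, which a full AVL implementation must also do.

```java
// Hypothetical sketch of the single right rotation used in the example above.
class RotateRightDemo {
    static class Node {
        int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    // Rotates the subtree rooted at x to the right and returns the new subtree root.
    // x's left child y moves up; y's old right subtree becomes x's new left subtree.
    static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        y.right = x;
        return y;
    }

    public static void main(String[] args) {
        // The subtree rooted at 20 after adding 16: 20 -> left 17 -> left 16.
        Node n20 = new Node(20);
        n20.left = new Node(17);
        n20.left.left = new Node(16);

        Node n15 = new Node(15);
        n15.left = new Node(12);
        n15.right = n20;
        // The parent (15) must be re-linked to whatever rotateRight returns.
        n15.right = rotateRight(n15.right);

        System.out.println(n15.right.val + " " + n15.right.left.val + " " + n15.right.right.val); // 17 16 20
    }
}
```

Because the rotation returns the new subtree root, the caller is responsible for re-linking it under the old parent.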

Now the tree satisfies the constraint again. Let's generalize this example to all possible ways adding a node can unbalance the tree. First notice that if adding a value causes a node to violate the balance factor constraint, then its balance factor must already have been −1 or +1, since adding a single node can add at most 1 to the height of any subtree. Let's assume a node currently has balance factor −1 and some value X. We will model the possible ways (by adding values) in which the node containing X becomes the lowest node that violates the balance factor constraint.


              X(-1)
             /     \
          Y(0)     C:k
         /    \
       A:k    B:k

Above, Y < X and all of the labeled subtrees have the same height k ≥ 0. The subtree rooted at Y has height k + 1, so the subtree rooted at X has height k + 2. We first handle the case that a node is added to subtree A that increases A's height.

              X(-2)
             /     \
         Y(-1)     C:k
         /    \
        Z     B:k
       / \
     A1   A2

Here we assumed the value added was smaller than Y. Note that the subtree rooted at X now has height k + 3. We then zoom in on the subtree A by drawing its root Z and subtrees A1 and A2. The possible heights of A1 and A2 are as follows (this will not affect our course of action, though):

1. If k = 0, then A1 and A2 are both empty. Z has a balance factor of 0.

2. If k > 0 and the new value is smaller than Z, then A1 has height k and A2 has height k − 1. Z has a balance factor of −1.

3. If k > 0 and the new value is greater than Z, then A1 has height k − 1 and A2 has height k. Z has a balance factor of +1.

To fix this we perform a rotation that puts Y where X is:


              Y(0)
            /      \
           Z       X(0)
          / \      /   \
        A1   A2  B:k   C:k

The balance constraints are no longer violated. Note that the subtree rooted at Y has height k + 2, the same as the subtree rooted at X before a value was added. Thus all nodes above Y in the tree are now balanced as well. This is the so-called "left-left" case, since Z is the left child of Y, which is the left child of X. The other, "left-right", case occurs when we add a value that is larger than Y to our original tree (the picture with A, B, and C above).

              X(-2)
             /     \
         Y(+1)     C:k
         /    \
       A:k     W
              / \
            B1   B2

Here we have zoomed in on the subtree B. Cases similar to those above determine the heights of B1 and B2. Here we perform a rotation that puts W in place of Y:

              X(-2)
             /     \
            W      C:k
           / \
          Y   B2
         / \
       A:k  B1

This gives us a case like the "left-left" situation. Thus we perform a second rotation as above, putting W in place of X:


                W(0)
              /      \
             Y        X
            / \      / \
          A:k  B1  B2  C:k

Again we have restored balance, and the resulting tree has height k + 2, the same as before the add. Thus all ancestors of W are balanced too. Using the operations above and their reflections (the "right-right" and "right-left" cases) we can enforce the balance constraints on every add operation.
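Putting the four cases together, here is a minimal sketch of AVL insertion. It is not the notes' own code: instead of storing balance factors directly, each node stores its height (empty tree 0, single node 1, matching the k = 0 case above), and the balance factor is recomputed as height(right) − height(left). The class `AVLTree` and its helpers are hypothetical names, and duplicates are sent to the right subtree.

```java
// Hypothetical AVL insertion sketch. Nodes store heights; the balance factor is
// computed as height(right) - height(left), the "right minus left" convention.
class AVLTree {
    static class Node {
        int val;
        int height = 1;                 // a single node has height 1; empty has 0
        Node left, right;
        Node(int val) { this.val = val; }
    }

    Node root;

    static int height(Node n) { return (n == null) ? 0 : n.height; }
    static int balance(Node n) { return height(n.right) - height(n.left); }
    static void update(Node n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

    // Left child y moves up into x's place; y's old right subtree becomes x's left.
    static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        y.right = x;
        update(x);                      // x is now below y, so update it first
        update(y);
        return y;
    }

    // Mirror image of rotateRight.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);
        update(y);
        return y;
    }

    // Called on each node along the insertion path, bottom-up.
    static Node rebalance(Node n) {
        update(n);
        int bf = balance(n);
        if (bf < -1) {                  // left subtree too tall
            if (balance(n.left) > 0) {  // "left-right": first put W in Y's place
                n.left = rotateLeft(n.left);
            }
            return rotateRight(n);      // "left-left" rotation
        }
        if (bf > 1) {                   // right subtree too tall: mirrored cases
            if (balance(n.right) < 0) { // "right-left"
                n.right = rotateRight(n.right);
            }
            return rotateLeft(n);       // "right-right"
        }
        return n;                       // already balanced
    }

    void add(int v) { root = add(root, v); }

    static Node add(Node n, int v) {
        if (n == null) return new Node(v);
        if (v < n.val) n.left = add(n.left, v);
        else n.right = add(n.right, v);   // duplicates go right
        return rebalance(n);
    }
}
```

Storing heights rather than balance factors is a common variant: it costs an int per node but makes the balance factors after a rotation easy to recompute. Adding 10, 5, 15, 1, 12, 20, 17 and then 16 reproduces the example above.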

The TreeMap and TreeSet in Java use a related data structure called a red-black tree that uses a slightly different constraint for balance, but also uses rotations to enforce the constraint.

AVL Tree Exercises

1. Consider the following BST.

        10
       /  \
      5    15
       \   /
        6 12

(a) Add balance factors to all of the nodes.

(b) Show what happens when we add the node 13 (treating the tree as an AVL tree).

(c) Next add 7 to the tree (treating the tree as an AVL tree).

(d) (?) Next delete 5, 7, and 6 from the tree. Make sure to remedy any violated balance constraints.

2. Give a simple Θ(n lg n) sorting algorithm assuming you have access to an AVL Tree data structure.

3. (??) The AVL Tree deletion algorithm is similar to the addition algorithm, but can cause as many as Θ(h) rotations to occur, where h is the height of the tree. Can you explain why?

4. (?) Let minNodes(h) denote the minimum number of nodes you need to have an AVL Tree of height h. For h ≥ 2 give a recurrence relation satisfied by minNodes(h).


5. In the Priority Queue ADT, elements have an ordering (Comparable or Comparator). When you dequeue an element, instead of removing the earliest added element, you remove the earliest element in the given ordering. You can imagine a situation where you are ordering tasks to work on, but each task has a priority governing when you must work on it. It has the following operations:

(a) add: Adds an element to the Priority Queue

(b) dequeue: Removes the smallest element (with respect to the ordering)

Describe an efficient implementation of this ADT.

AVL Tree Solutions

1. (a)

          10(0)
         /     \
      5(+1)    15(-1)
         \      /
         6(0) 12(0)

(b) We perform the "left-right" sequence of 2 rotations after adding 13.

          10(0)
         /     \
      5(+1)    13(0)
         \     /   \
        6(0) 12(0) 15(0)

(c) After adding 7 we perform the "right-right" rotation, which is simply the reflection of the "left-left" rotation.


            10(0)
           /     \
       6(0)       13(0)
       /  \       /   \
    5(0)  7(0) 12(0)  15(0)

(d) After the removals the root is unbalanced. We will treat this like the "right-right" case and rotate 13 into the root.

          13(-1)
          /    \
      10(+1)   15(0)
          \
          12(0)

2. Add all of the elements to the AVL Tree, then perform an in-order traversal. If we want to handle duplicates, we can either extend our AVL Tree/BST to allow duplicates, or, instead of storing values, we can store lists of values that are equivalent with respect to the ordering. This method of sorting isn't used in practice.

3. Note that after an add operation, our rotations returned the affected subtree to the same height it had before the add operation. Thus no ancestors above the lowest unbalanced node had to be fixed. After a remove operation, the affected subtree may have a lower height than before, and thus ancestors of the lowest unbalanced node may need to be fixed as well.

4. minNodes(h) = 1 + minNodes(h − 1) + minNodes(h − 2). To see why, note that we must have a subtree of height h − 1 so that the whole tree has height h. Secondly, the smallest we can make the other subtree is height h − 2 due to the balance constraint. This can be used to show that minNodes(h) is at least the hth Fibonacci number, and the Fibonacci sequence grows exponentially. This in turn can be used to show that the height of an AVL tree is Θ(lg n).
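The recurrence is easy to tabulate. The sketch below is a hypothetical check (the class and method names are made up), assuming the height convention of the k = 0 case above: an empty tree has height 0 and a single node has height 1, so minNodes(0) = 0 and minNodes(1) = 1. Under that convention a short induction gives the closed form minNodes(h) = F(h + 2) − 1, where F(1) = F(2) = 1, since 1 + (F(h + 1) − 1) + (F(h) − 1) = F(h + 2) − 1; the code compares the two.

```java
// Hypothetical check of the minNodes recurrence. Heights follow the convention of
// the k = 0 case above: an empty tree has height 0 and a single node height 1.
class MinNodes {
    static long[] minNodesTable(int maxH) {
        long[] m = new long[Math.max(2, maxH + 1)];
        m[0] = 0;                                   // empty tree
        m[1] = 1;                                   // single node
        for (int h = 2; h <= maxH; h++) {
            // root + smallest tree of height h-1 + smallest allowed sibling of height h-2
            m[h] = 1 + m[h - 1] + m[h - 2];
        }
        return m;
    }

    // Fibonacci numbers with F(0) = 0, F(1) = F(2) = 1.
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long t = a + b;
            a = b;
            b = t;
        }
        return a;
    }

    public static void main(String[] args) {
        long[] m = minNodesTable(10);
        for (int h = 0; h <= 10; h++) {
            System.out.println("minNodes(" + h + ") = " + m[h] + ", F(h+2) - 1 = " + (fib(h + 2) - 1));
        }
    }
}
```

Since F(h) grows like φ^h with φ ≈ 1.618, any AVL tree with n nodes satisfies n ≥ minNodes(h), which forces h = O(lg n).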

5. Use an AVL Tree. Adds simply add nodes to the tree. To dequeue, we simply remove the smallest value. Both operations take Θ(lg n) time in the worst case.
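As a quick sketch of the same idea using a swapped-in balanced BST: Java's `TreeMap` is a red-black tree rather than an AVL tree, but its documentation guarantees logarithmic time for its basic operations, so the bounds are the same. The class `TreePriorityQueue` is a hypothetical name; it maps each distinct element to a count so that duplicates are allowed.

```java
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical priority queue over a balanced BST (java.util.TreeMap, a red-black tree).
class TreePriorityQueue<E extends Comparable<E>> {
    // Each distinct element maps to the number of copies currently queued.
    private final TreeMap<E, Integer> counts = new TreeMap<>();
    private int size;

    // Theta(lg n): insert the element or bump its count.
    public void add(E e) {
        counts.merge(e, 1, Integer::sum);
        size++;
    }

    // Removes and returns the smallest element; Theta(lg n).
    public E dequeue() {
        if (size == 0) throw new NoSuchElementException("empty priority queue");
        E smallest = counts.firstKey();
        int c = counts.get(smallest);
        if (c == 1) counts.remove(smallest);
        else counts.put(smallest, c - 1);
        size--;
        return smallest;
    }

    public int size() { return size; }
}
```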


Data Structure: (Binary) Min Heap

Above we saw how to implement a PriorityQueue using an AVL Tree. Here we will give a new data structure that can also be used to implement a PriorityQueue and is much more efficient in practice (lower constants hidden in the Θ-terms). We will allow duplicate values.

A min heap is a binary tree with the following two constraints:

1. Ordering: Every node is equal to or smaller than its children.

2. Completeness: Every level of the tree is full except possibly the last level. In the last level the nodes fill up the leftmost positions.

The second constraint sounds odd, but it will enable a very efficient implementation. Instead of using binary tree nodes to store our values, we will just use an array containing all of the elements in the order they would appear in a level-by-level (breadth-first) traversal.

Consider the following min heap:

          4
        /   \
       5     7
      / \   /
     9  12 9

We then store this in an ArrayList.

    value:  4   5   7   9  12   9
    index:  0   1   2   3   4   5

The nice thing about this format is that we can easily find the children and parent of any node. Suppose you are at the node with index k in the array.

1. Left child: 2k + 1

2. Right child: 2k + 2

3. Parent: (k − 1)/2 (Java integer division; gives 0 on root)

We will justify the left child formula; the rest follows from it. Note that the level of nodes at depth $d$ contains the indices $2^d - 1$ through $2^{d+1} - 2$. Thus the $k$th node in that level has index $2^d - 1 + (k - 1)$. Applying the left child formula, we obtain index

$$2\bigl(2^d - 1 + (k - 1)\bigr) + 1 = 2^{d+1} - 1 + 2(k - 1).$$

This is the index in level $d + 1$ just after the $2(k - 1)$ children of the nodes preceding our original node in level $d$.
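The three index formulas are one-liners. The sketch below (hypothetical class `HeapIndex`) prints the neighbors of every node in the example heap; note that Java's integer division truncates toward zero, so `parent(0)` evaluates to 0.

```java
// Hypothetical helpers for the array layout of a binary heap (0-based indices).
class HeapIndex {
    static int left(int k)   { return 2 * k + 1; }
    static int right(int k)  { return 2 * k + 2; }
    static int parent(int k) { return (k - 1) / 2; }   // Java truncates, so parent(0) == 0

    public static void main(String[] args) {
        int[] heap = {4, 5, 7, 9, 12, 9};               // the example heap above
        for (int k = 0; k < heap.length; k++) {
            StringBuilder sb = new StringBuilder("index " + k + " holds " + heap[k]);
            if (left(k) < heap.length)  sb.append(", left child ").append(heap[left(k)]);
            if (right(k) < heap.length) sb.append(", right child ").append(heap[right(k)]);
            if (k > 0) sb.append(", parent ").append(heap[parent(k)]);
            System.out.println(sb);
        }
    }
}
```

Following `left` repeatedly from the root visits indices 0, 1, 3, 7, ..., that is, $2^d - 1$, the first index of each level, as in the derivation above.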


The nice formulas above are made possible by the array storage format. Since our min heaps have the completeness property, the array format isn't wasteful.

What remains is to implement the two main operations of the min heap: add and removeMin. The key to all heap operations is to first guarantee that the completeness property holds. After the "shape" is correct, we then make a few updates to fix the ordering constraint. Suppose we want to add the node 1 to our heap above. We first add 1 in the next available spot in the lowest level:

          4
        /   \
       5     7
      / \   / \
     9  12 9   1

Next we need to fix the ordering constraint. We use an operation called "sift-up". Take the newly added node and compare it with its parent. If it is smaller, swap, and repeat the process on the parent. This is depicted below.

After swapping 1 with its parent 7:

          4
        /   \
       5     1
      / \   / \
     9  12 9   7

After swapping 1 with its parent 4:

          1
        /   \
       5     4
      / \   / \
     9  12 9   7

To remove the minimum we first swap the top value with the last value. Then we can safely remove the last value and maintain the shape. Finally, we correct the ordering constraint by checking if the root is bigger than its smallest child. If so, swap, and then repeat on the node you swapped with. This process is sometimes called "sift-down".
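The add and removeMin procedures can be sketched as follows. This is a hypothetical minimal implementation over an `ArrayList` (the names `MinHeap`, `siftDown`, and `toList` are illustrative). For removeMin it moves the last value straight into the root, which has the same effect as swapping it with the root and then removing the last slot.

```java
import java.util.ArrayList;
import java.util.NoSuchElementException;

// Hypothetical array-backed min heap following the add / removeMin description above.
class MinHeap<E extends Comparable<E>> {
    private final ArrayList<E> a = new ArrayList<>();

    public int size() { return a.size(); }

    // Place the value in the next free slot (keeps completeness), then sift it up.
    public void add(E e) {
        a.add(e);
        int k = a.size() - 1;
        while (k > 0) {
            int p = (k - 1) / 2;
            if (a.get(k).compareTo(a.get(p)) >= 0) break;   // ordering restored
            swap(k, p);
            k = p;
        }
    }

    // Move the last value to the root (keeps completeness), then sift it down.
    public E removeMin() {
        if (a.isEmpty()) throw new NoSuchElementException();
        E min = a.get(0);
        E last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            siftDown(0);
        }
        return min;
    }

    private void siftDown(int k) {
        int n = a.size();
        while (true) {
            int l = 2 * k + 1, r = 2 * k + 2, smallest = k;
            if (l < n && a.get(l).compareTo(a.get(smallest)) < 0) smallest = l;
            if (r < n && a.get(r).compareTo(a.get(smallest)) < 0) smallest = r;
            if (smallest == k) return;                     // no child is smaller
            swap(k, smallest);
            k = smallest;
        }
    }

    private void swap(int i, int j) {
        E t = a.get(i);
        a.set(i, a.get(j));
        a.set(j, t);
    }

    // Returns a copy of the backing array, for inspection.
    public ArrayList<E> toList() { return new ArrayList<>(a); }
}
```

Feeding it the values from exercise 1 below and replaying steps (b) through (e) reproduces the arrays in the solutions.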

Min Heap Exercises

1. Consider the following min heap.

                    5
                /       \
            9               6
          /   \           /   \
        14     11        7     8
       /  \   /  \
     15   16 12  15

(a) What is the index of 11 in the corresponding array?

(b) Add an 8 to the min heap.

(c) Then add a 1 to the min heap.

(d) Then removeMin from the min heap.


(e) Then removeMin from the min heap.

2. A max heap is just like a min heap, but each node must be larger than its children.Explain why a max heap data structure is unnecessary if you already have a min heap.

3. What are the worst-case runtimes of add and removeMin?

4. Assuming you have access to a min heap, show how to sort a list of Comparable values in worst-case time Θ(n lg n).

5. How long does it take to find a value in a min heap?

6. (?) Sometimes it is useful to remove an arbitrary element from a min heap given its index. Explain how to do this in worst-case Θ(lg n) time.

7. (??) Given an array of n comparable values, show how to turn it into a min heap. There is a Θ(n) worst-case implementation.

Min Heap Solutions

1. (a) 4

(b) No sifting is required.

                    5
                /       \
            9               6
          /   \           /   \
        14     11        7     8
       /  \   /  \      /
     15   16 12  15    8

(c) Here we must sift-up performing 3 swaps.

                    1
                /       \
            9               5
          /   \           /   \
        14     11        6     8
       /  \   /  \      / \
     15   16 12  15    8   7

(d) We swap the 7 into the root, remove the 1, and then sift-down (swap 7 with 5, then with 6).

                    5
                /       \
            9               6
          /   \           /   \
        14     11        7     8
       /  \   /  \      /
     15   16 12  15    8


(e) We swap the 5 with the 8, remove the 5, and then sift-down (swap 8 with 6, then with 7).

                    6
                /       \
            9               7
          /   \           /   \
        14     11        8     8
       /  \   /  \
     15   16 12  15

2. Just use a min heap but reverse the ordering.
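The reversed-ordering trick can be shown with `java.util.PriorityQueue`, which is a min-priority queue under its comparator. Passing `Collections.reverseOrder()` turns it into a max-priority queue. The class `MaxHeapDemo` and method `largestFirst` are hypothetical names for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class MaxHeapDemo {
    // A min-priority queue with the reversed ordering, so poll() yields the largest value.
    static List<Integer> largestFirst(int... values) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        for (int v : values) maxHeap.add(v);
        List<Integer> out = new ArrayList<>();
        while (!maxHeap.isEmpty()) out.add(maxHeap.poll());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(largestFirst(4, 12, 7, 9, 5)); // [12, 9, 7, 5, 4]
    }
}
```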

3. Both are Θ(lg n) since our tree is always well-balanced.

4. Add all elements to a min heap, and then repeatedly call removeMin to pull them out in order. This is called Heapsort (it is better than our AVL Tree sorting algorithm above due to the low constants on all heap operations).
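A short sketch of this version of Heapsort, using `java.util.PriorityQueue` as the min heap; its documentation describes it as based on a priority heap with O(log n) time for enqueuing and dequeuing. The class `HeapSortDemo` is a hypothetical name. Textbook Heapsort usually works in place on the array instead, but this follows the add-then-removeMin description directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class HeapSortDemo {
    // n adds followed by n removeMin calls (poll): Theta(n lg n) in the worst case.
    static <E extends Comparable<E>> List<E> heapSort(List<E> input) {
        PriorityQueue<E> heap = new PriorityQueue<>();
        for (E e : input) heap.add(e);
        List<E> out = new ArrayList<>(input.size());
        while (!heap.isEmpty()) out.add(heap.poll());   // poll() removes the minimum
        return out;
    }

    public static void main(String[] args) {
        System.out.println(heapSort(Arrays.asList(5, 3, 9, 1, 3))); // [1, 3, 3, 5, 9]
    }
}
```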

5. Θ(n) in the worst case (consider searching for a really large value that isn't in the heap), since the heap ordering property doesn't aid in searches like the BST property does.

6. We use the following steps.

(a) Swap the item to be removed with the last item and then remove the last item.

(b) The swap may have broken the ordering property so:

i. Check if the swapped item is smaller than its parent. If so, do the sift-up procedure on it.

ii. Otherwise, do the sift-down procedure on it.
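These steps can be sketched on a heap of `Integer`s stored in a `List`. The class `HeapRemoveAt` and its helpers are hypothetical names, and the sketch assumes 0 ≤ index < size. Only one of sift-up or sift-down can move the item, so the work is Θ(lg n) in the worst case.

```java
import java.util.List;

// Hypothetical sketch: remove the element at a given index of an array-backed
// min heap of Integers, following steps (a) and (b) above.
class HeapRemoveAt {
    static void removeAt(List<Integer> heap, int index) {
        int last = heap.size() - 1;
        swap(heap, index, last);            // (a) move the last item into the hole...
        heap.remove(last);                  // ...and drop the old last slot
        if (index == last) return;          // removed the last item itself: nothing to fix
        int parent = (index - 1) / 2;
        if (index > 0 && heap.get(index) < heap.get(parent)) siftUp(heap, index);   // (b) i
        else siftDown(heap, index);                                                  // (b) ii
    }

    static void siftUp(List<Integer> h, int k) {
        while (k > 0 && h.get(k) < h.get((k - 1) / 2)) {
            swap(h, k, (k - 1) / 2);
            k = (k - 1) / 2;
        }
    }

    static void siftDown(List<Integer> h, int k) {
        int n = h.size();
        while (true) {
            int l = 2 * k + 1, r = l + 1, s = k;
            if (l < n && h.get(l) < h.get(s)) s = l;
            if (r < n && h.get(r) < h.get(s)) s = r;
            if (s == k) return;             // no smaller child: ordering restored
            swap(h, k, s);
            k = s;
        }
    }

    static void swap(List<Integer> h, int i, int j) {
        Integer t = h.get(i);
        h.set(i, h.get(j));
        h.set(j, t);
    }
}
```

For example, removing index 4 from [1, 10, 2, 11, 12, 3, 4] moves 4 into the hole, where it is smaller than its parent 10, so it sifts up, leaving [1, 4, 2, 11, 10, 3].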

7. The slow method is to just call add n times, giving a worst-case Θ(n lg n) runtime. A better method is to loop backwards through the array and run the sift-down procedure on every value. To see why the runtime is Θ(n), we consider the work done by sift-down at every node. For simplicity, let's assume every level of the heap is full. Let the height of a node be the height of the subtree it is the root of. All nodes will require at most C steps to compare them with their children in the sift-down procedure. Nodes of height at least one will require at most an extra C steps, since they could undergo a swap. Nodes of height at least two will require an extra C steps on top of that, and so forth. But each time we increase the height we halve the number of nodes we consider. Since

$$Cn + \frac{Cn}{2} + \frac{Cn}{4} + \cdots = 2Cn = \Theta(n)$$

the result follows.
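The bottom-up construction can be sketched as below (hypothetical class `BuildHeap`). It loops backwards as described, but starts at index n/2 − 1, because every index from n/2 onward is a leaf, where sift-down does nothing. The `isMinHeap` helper is only there to check the result.

```java
import java.util.Arrays;

// Hypothetical sketch of the Theta(n) bottom-up heap construction described above.
class BuildHeap {
    static void buildMinHeap(int[] a) {
        // Loop backwards over the non-leaf positions, sifting each one down.
        for (int k = a.length / 2 - 1; k >= 0; k--) siftDown(a, k, a.length);
    }

    static void siftDown(int[] a, int k, int n) {
        while (true) {
            int l = 2 * k + 1, r = l + 1, s = k;
            if (l < n && a[l] < a[s]) s = l;
            if (r < n && a[r] < a[s]) s = r;
            if (s == k) return;             // both children are at least a[k]
            int t = a[k]; a[k] = a[s]; a[s] = t;
            k = s;
        }
    }

    // Checks the ordering constraint: every node is at most its children.
    static boolean isMinHeap(int[] a) {
        for (int k = 1; k < a.length; k++) if (a[(k - 1) / 2] > a[k]) return false;
        return true;
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6};
        buildMinHeap(a);
        System.out.println(Arrays.toString(a) + " " + isMinHeap(a));
    }
}
```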
