CMPSCI 187: Introduction to Programming with Data Structures
Lecture 21: Balanced Trees, AVL Trees, and m-way Trees
Binary Search Trees
What is the structure of the binary search tree I get in each case when I construct a tree from the keys:
  1, 2, 3, 4, 5, 6?
  6, 5, 4, 3, 2, 1?
  3, 2, 5, 1, 4, 6?
Balanced Trees
Consider adding 7 chars (a, b, c, d, e, f, and g) to an initially empty binary search tree in two ways:
  in order: a b c d e f g
  in 'random' order: d f b a e c g
[Figure: the in-order insertions produce a single chain a, b, c, d, e, f, g ("UNBALANCED", search = O(n)); the 'random' order produces a tree with root d, children b and f, and leaves a, c, e, g ("BALANCED", search = O(log n)).]
The order in which data values arrive can make a HUGE difference in what the tree looks like.
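The effect of arrival order can be checked with a few lines of Java. This is a minimal sketch, not the course's BinaryNode code: the class BstOrderDemo and its methods are names made up for illustration, and height is counted in nodes (an empty tree has height 0).

```java
// Plain (unbalanced) binary search tree of characters.
// BstOrderDemo, Node, insert, height and build are illustrative names.
class BstOrderDemo {
    static final class Node {
        final char key;
        Node left, right;
        Node(char key) { this.key = key; }
    }

    // Standard BST insertion with no rebalancing.
    static Node insert(Node root, char key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Height counted in nodes: empty tree = 0, single node = 1.
    static int height(Node root) {
        if (root == null) return 0;
        return 1 + Math.max(height(root.left), height(root.right));
    }

    // Inserts the characters of keys one by one, left to right.
    static Node build(String keys) {
        Node root = null;
        for (char c : keys.toCharArray()) root = insert(root, c);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(height(build("abcdefg"))); // sorted order: a chain, prints 7
        System.out.println(height(build("dfbaecg"))); // 'random' order: prints 3
    }
}
```

The same seven keys give a tree of height 7 or height 3 depending only on the order of insertion.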
Definition of Balanced
balance: a tree node attribute representing the difference in height between the node's subtrees.
The 'balance' attribute of every node in a balanced binary tree is -1, 0, or 1.
The definition is recursive and holds for all 'root' nodes and their left and right subtrees.
Achieving balance is important for minimizing the time required for search.
We will look at AVL trees but not in a lot of detail.
Schemes for Balancing Trees
AVL trees [Adelson-Velskii and Landis 1962]
  named for the initials of their Russian creators
  use rotations to ensure heights of child trees differ by at most 1
2-3 Trees [Hopcroft 1970]
  similar to 2-3-4 trees, but repairs have to move back up the tree
B-Trees [Bayer & McCreight 1972]
Red-Black Trees [Bayer 1972]
  not the original name
  red-black convention & relation to 2-3-4 trees [Guibas & Sedgewick 1978]
Splay Trees [Sleator & Tarjan 1983]
Skip Lists [Pugh 1990]
  developed at Cornell
AVL Trees
AVL trees are binary search trees with a balance condition.
First simple idea: left and right subtrees of the root must have the same height... but that is not good enough:
[Figure: a tree that is balanced at the root but not shallow.]
In an AVL tree, the heights of the left and right subtrees of every node can differ by at most 1.
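The difference between "balanced at the root" and "balanced at every node" can be made concrete. The sketch below is an illustration only: BalanceCheck and its Node class are hypothetical names, height is counted in nodes, and balance is taken as right height minus left height (the slides do not fix the sign at this point, so that choice is an assumption).

```java
// Checks the AVL condition on a plain binary tree.
// BalanceCheck and Node are illustrative names, not the course's classes.
class BalanceCheck {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Height counted in nodes: empty tree = 0.
    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Assumed sign convention: right height minus left height.
    static int balance(Node n) {
        return n == null ? 0 : height(n.right) - height(n.left);
    }

    // AVL condition: |balance| <= 1 at this node AND, recursively, at every node below it.
    static boolean isAvl(Node n) {
        if (n == null) return true;
        return Math.abs(balance(n)) <= 1 && isAvl(n.left) && isAvl(n.right);
    }

    public static void main(String[] args) {
        // Two chains of equal length hanging off the root:
        // the root's balance is 0, yet the tree is not shallow.
        Node leftChain = new Node(3, new Node(2, new Node(1, null, null), null), null);
        Node rightChain = new Node(5, null, new Node(6, null, new Node(7, null, null)));
        Node root = new Node(4, leftChain, rightChain);
        System.out.println(balance(root)); // prints 0
        System.out.println(isAvl(root));   // prints false
    }
}
```

This version recomputes heights on every call, which is fine for a check but too slow for a real AVL tree, where each node would store its height or balance.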
AVL Tree Example
[Figure: two binary search trees rooted at 10 with node heights annotated, one labeled "AVL Tree" and one labeled "Not an AVL Tree". Keys shown: 10, 5, 16, 2, 8, 12, 1.]
Search, insert, etc.: complexity = O(h), where h is the height of the tree.
Height of an AVL tree = O(log n), where n is the number of nodes.
Insertion in an AVL Tree
Insert 7
[Figure: the example tree (root 10) before and after inserting 7 below 8, with node heights annotated.]
Height of the tree rooted at 5 does not change. The tree is still balanced.
Insertion Example 2
Insert 0
[Figure: the example tree (root 10) before and after inserting 0, with node heights annotated.]
Inserting a node causes heights in the tree to change.
The tree is no longer balanced. Hmmmm...
When and Where do Heights Change?
[Figure: the example tree (root 10) with node 5 marked; the subtrees below its children are labeled T1, T2, T3, T4.]
Assume node 5 is the node where an imbalance occurs.
Therefore, the heights of its two subtrees differ by more than 1.
Insertions into T1-T4 could cause the imbalance (anywhere else?).
Therefore there are four cases to consider:
  Case 1: insertion into the left subtree of the left child of 5
  Case 2: insertion into the right subtree of the left child of 5
  Case 3: insertion into the left subtree of the right child of 5
  Case 4: insertion into the right subtree of the right child of 5
The first and fourth are symmetric; the second and third are symmetric.
Case 1 (right rotation): Single Rotation
[Figure: "Tree Unbalanced": root n2 with left child n1 and right subtree T3; n1 has subtrees T1 and T2. "Tree balanced": root n1 with left subtree T1 and right child n2; n2 has subtrees T2 and T3.]
1. Make n1 the root node
2. Make T2 the left child of n2
The binary search tree property holds in both trees.
Balance is achieved in the 'rotated' tree.
Rotation to the Right
Algorithm for rotation (toward the right):
1. Save value of root.left (temp = root.left)
2. Set root.left to value of root.left.right
3. Set temp.right to root
4. Set root to temp
Algorithm for rotation toward the left is similar - you do it.
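The four steps translate almost line for line into Java. The sketch below uses a bare Node class with public fields (an assumption made to keep the example short; the course's BinaryNode uses getter and setter methods), and shows one possible answer to the left-rotation exercise as the mirror image of the right rotation.

```java
// Single rotations on a minimal node type. Rotations and Node are illustrative names.
class Rotations {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Rotation toward the right. Returns the new subtree root;
    // the caller must store it where the old root was referenced (step 4).
    static Node rotateRight(Node root) {
        Node temp = root.left;   // 1. save root.left
        root.left = temp.right;  // 2. root.left takes over temp's right subtree
        temp.right = root;       // 3. old root becomes temp's right child
        return temp;             // 4. temp is the new root
    }

    // Mirror image: rotation toward the left.
    static Node rotateLeft(Node root) {
        Node temp = root.right;
        root.right = temp.left;
        temp.left = root;
        return temp;
    }
}
```

Returning the new root instead of assigning to a parameter matters in Java: assigning to the parameter root inside the method would not change the caller's reference.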
Case 4 (left rotation): Single Rotation
[Figure: "Tree Unbalanced": root n1 with left subtree T1 and right child n2; n2 has subtrees T2 and T3. "Tree balanced": root n2 with left child n1 and right subtree T3; n1 has subtrees T1 and T2.]
1. Make n2 the root node
2. Make T2 the right child of n1
The binary search tree property holds in both trees.
Balance is achieved in the 'rotated' tree.
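Case 2 (left-right): Trouble
[Figure: "Tree Unbalanced": root n2 with left child n1 and right subtree T3; n1 has subtrees T1 and T2, and T2 is the tall one. "Tree Still Unbalanced": after a single right rotation, root n1 with left subtree T1 and right child n2, which holds T2 and T3.]
A single rotation fails to fix this case.
Need a more complex manipulation of the tree.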
Case 2 (left-right): Double Rotation (two single rotations)
[Figure: "Original Tree": root n2 with left child n1 (subtrees T1, T2) and right subtree T3. "Tree Unbalanced": the same tree with the old T2 expanded into node n3 with subtrees T2 and T3; the right subtree of n2 is now labeled T4.]
Expand T2 in the original tree into a node and two subtrees.
We know that neither n1 nor n2 works as the root.
The solution is then two single rotations to get n3 at the root:
1. Rotate n3 into n1
2. Rotate n3 into n2
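Case 2 (left-right): Double Rotation (two single rotations)
[Figure: three stages. Start: root n2 with left child n1 and right subtree T4; n1 has left subtree T1 and right child n3 (subtrees T2, T3). After "Rotate n3 into n1": the left child of n2 is n3, whose left child is n1 (subtrees T1, T2) and whose right subtree is T3. After "Rotate n3 into n2": root n3 with children n1 (T1, T2) and n2 (T3, T4).]
The binary search tree property holds in both trees.
Balance is achieved in the 'rotated' tree.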
Case 3 (right-left): Double Rotation (two single rotations)
[Figure: the mirror image of Case 2, with subtrees T1-T4; after both rotations n3 is the root, with n1 and n2 as its children.]
1. Rotate n3 into n1
2. Rotate n3 into n2
The binary search tree property holds in both trees.
Balance is achieved in the 'rotated' tree.
Unbalanced Search Trees
Left-Left (parent balance -2, left child balance -1): rotate right around parent
Left-Right (parent balance -2, left child balance +1): rotate left around child, then rotate right around parent
Right-Left (parent balance +2, right child balance -1): rotate right around child, then rotate left around parent
Right-Right (parent balance +2, right child balance +1): rotate left around parent
Book Solution
Add a boolean flag to indicate a height increase
Add a +1/0/-1 balance indicator
A new AVL insertKey Method
Assume our BinaryNode has a fourth attribute called height.
  You make the modification to BinaryNode to accomplish this.
We'll write a new insertKeyNode method to handle balancing the tree after insertion:
  Insert pretty much as before.
  After insertion, call balancing routines, each of which implements one of our four cases:
    rotateRight, rotateLeft (cases 1 and 4)
    rotateLeftRight, rotateRightLeft (cases 2 and 3)
Our old method insertKey doesn't change.
More or less a skeleton - some details will be missing.
Case 1 (right): Single Rotation, rotateRight()
[Figure: before: root n2 with left child n1 (subtrees T1, T2) and right subtree T3. After: root n1 with left subtree T1 and right child n2 (subtrees T2, T3).]
n2 is the root of the tree; n1 is the left child of the root.
Detach and save T2
Make n1 reference n2 as right child
Attach T2 as left child of n2
[Update heights] - here or elsewhere
Return new root node
    public BinaryNode rotateRight(BinaryNode nodeN2) {
        BinaryNode nodeN1 = (BinaryNode) nodeN2.getLeftChild();
        nodeN2.setLeftChild(nodeN1.getRightChild()); // T2 becomes the left child of n2
        nodeN1.setRightChild(nodeN2);                // n2 becomes the right child of n1
        return nodeN1;                               // n1 is the new root
    } // end rotateRight
Case 4 (left) is very similar to this case.
Case 2: rotateLeftRight() (double rotation)
[Figure: the three stages of the double rotation: the starting tree, the tree after "Rotate n3 into n1", and the tree after "Rotate n3 into n2" with n3 at the root.]
    public BinaryNode rotateLeftRight(BinaryNode nodeN) {
        BinaryNode nodeC = (BinaryNode) nodeN.getLeftChild();
        nodeN.setLeftChild(rotateLeft(nodeC)); // rotate n3 into n1
        return rotateRight(nodeN);             // rotate n3 into n2
    } // end rotateLeftRight
Inserting a node in an AVL Tree (insertKeyNode())
    private BinaryNode rebalance(BinaryNode nodeN) {
        int heightDifference = getHeightDifference(nodeN);
        if (heightDifference > 1) {
            // left subtree is taller by more than 1, so addition was in left subtree
            if (getHeightDifference((BinaryNode) nodeN.getLeftChild()) > 0)
                // addition was in left subtree of left child
                nodeN = TreeRotations.rotateRight(nodeN);
            else
                // addition was in right subtree of left child
                nodeN = TreeRotations.rotateLeftRight(nodeN);
        } else if (heightDifference < -1) {
            // ... similar to above ... you do it
        }
        // else nodeN is balanced
        return nodeN;
    } // end rebalance
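For reference, here is one way the whole insertion could fit together, including the branch left as an exercise. It is a self-contained sketch under stated assumptions, not the course's solution: AvlSketch, its Node class with public fields, and the helper update are hypothetical stand-ins for BinaryNode and TreeRotations; duplicates are silently ignored; and heightDifference follows the rebalance skeleton (left height minus right height, so positive means left-heavy, the opposite sign from the -2/+2 table earlier in the lecture).

```java
// Minimal AVL insertion with a stored height per node.
class AvlSketch {
    static final class Node {
        final int key;
        int height = 1; // height in nodes of the subtree rooted here
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int height(Node n) { return n == null ? 0 : n.height; }

    // Recomputes a node's stored height from its children.
    static void update(Node n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

    // Positive: left subtree is taller. Negative: right subtree is taller.
    static int heightDifference(Node n) { return height(n.left) - height(n.right); }

    static Node rotateRight(Node n2) {
        Node n1 = n2.left;
        n2.left = n1.right;
        n1.right = n2;
        update(n2); // n2 is now below n1, so fix it first
        update(n1);
        return n1;
    }

    static Node rotateLeft(Node n1) {
        Node n2 = n1.right;
        n1.right = n2.left;
        n2.left = n1;
        update(n1);
        update(n2);
        return n2;
    }

    static Node rebalance(Node n) {
        update(n);
        int diff = heightDifference(n);
        if (diff > 1) {                                   // left subtree too tall
            if (heightDifference(n.left) < 0)             // case 2: left-right
                n.left = rotateLeft(n.left);
            return rotateRight(n);                        // case 1 (or second half of case 2)
        } else if (diff < -1) {                           // right subtree too tall
            if (heightDifference(n.right) > 0)            // case 3: right-left
                n.right = rotateRight(n.right);
            return rotateLeft(n);                         // case 4 (or second half of case 3)
        }
        return n;                                         // already balanced
    }

    // Ordinary BST insertion, rebalancing each node on the way back up.
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        else return n; // duplicate key: ignored (assumption)
        return rebalance(n);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 1; k <= 7; k++) root = insert(root, k);
        System.out.println(root.key + " " + root.height); // prints "4 3"
    }
}
```

Inserting 1 through 7 in sorted order, the worst case for a plain binary search tree, ends with 4 at the root and a height of 3.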
Removal from AVL Trees
Removal causes the same kind of imbalance problems.
Solution is basically the same as for insertion:
  Add a field called decrease to note height change, or use the height difference field in the node to compute it.
  Adjust the local node's balance.
  Rebalance as necessary.
  The balance changed, and balancing methods must set decrease appropriately or update the height field.
Actual removal is as for a binary search tree:
  Involves moving values, and
  Deleting a suitable leaf node
Performance of AVL Trees
Worst case height is about 1.44 log n (log base 2).
Thus, lookup, insert, and remove are all O(log n).
Empirical cost is 0.25 + log n comparisons to insert.
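The 1.44 factor can be seen numerically. The sparsest AVL tree of height h has a root, a sparsest subtree of height h-1, and a sparsest subtree of height h-2, so its node count follows a Fibonacci-like recurrence. The sketch below is a standard textbook argument rather than something taken from the slides; the class name AvlHeightBound is made up, and height is counted in nodes.

```java
// Fewest nodes an AVL tree of a given height can have, and the resulting height/log2(n) ratio.
class AvlHeightBound {
    // minNodes(0) = 0, minNodes(1) = 1, minNodes(h) = 1 + minNodes(h-1) + minNodes(h-2).
    static long minNodes(int height) {
        if (height == 0) return 0;
        long prev = 0, cur = 1;
        for (int h = 2; h <= height; h++) {
            long next = 1 + cur + prev;
            prev = cur;
            cur = next;
        }
        return cur;
    }

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        // For the sparsest trees, height / log2(n) creeps up toward roughly 1.44.
        for (int h = 5; h <= 30; h += 5) {
            long n = minNodes(h);
            System.out.printf("h=%d  n=%d  h/log2(n)=%.3f%n", h, n, h / log2(n));
        }
    }
}
```

Since these are the tallest possible AVL trees for their node counts, every other AVL tree has a smaller ratio.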
B-Trees
Not really binary trees at all.
Almost all file systems on almost all computers use B-Trees to keep track of which portions of which files are in which disk sectors.
The structure of choice for very large disk-resident databases.
Why??? Very important in computer science:
  Disk directories
  Disk-resident databases
  ...
B-Trees are an example of multiway trees.
In multiway trees, nodes can have multiple data elements (in contrast to one for a binary tree node).
Each node in a B-Tree can represent possibly many subtrees.
m-Way Trees
An m-way tree is a search tree in which each node can have from zero to m subtrees.
m is defined as the order of the tree.
In a nonempty m-way tree:
  Each node has 0 to m subtrees.
  Given a node with k <= m subtrees, the node contains k subtrees (some of which may be null) and k-1 data entries.
  The keys are ordered: key_1 <= key_2 <= key_3 <= ... <= key_(k-1).
  The key values in the first subtree are less than the key value in the first entry, etc.
m-way trees can still be unbalanced (but wait...).
See 2-3 and 2-3-4 trees in book...
An m-way tree
[Figure: "A 4-way Tree": one node with keys K1 K2 K3 and four subtrees holding, from left to right: keys < K1; K1 <= keys < K2; K2 <= keys < K3; keys >= K3.]
A binary search tree is an m-way tree of order 2, or a 2-way tree.
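Searching an m-way tree generalizes binary search tree lookup: scan the keys of the current node, and either find the target or descend into the one subtree whose key range can contain it. The sketch below assumes a node that keeps its keys and subtrees in two lists; MWayNode and its fields are illustrative names, not a class from the course.

```java
import java.util.ArrayList;
import java.util.List;

// An m-way search tree node: k-1 sorted keys and k subtrees (entries may be null).
class MWayNode {
    final List<Integer> keys = new ArrayList<>();
    final List<MWayNode> children = new ArrayList<>(); // empty list for a leaf

    // Iterative lookup starting at node.
    static boolean contains(MWayNode node, int target) {
        while (node != null) {
            int i = 0;
            // Skip every key smaller than the target.
            while (i < node.keys.size() && target > node.keys.get(i)) i++;
            if (i < node.keys.size() && target == node.keys.get(i)) return true;
            // children.get(i) holds the keys between keys[i-1] and keys[i].
            node = node.children.isEmpty() ? null : node.children.get(i);
        }
        return false;
    }
}
```

With large nodes, the linear scan over the keys could be replaced by a binary search inside the node; the descent logic stays the same.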
B-Trees
A B-Tree is an m-way tree with the following additional properties:
  The root is either a leaf or it has 2...m subtrees.
  All internal nodes have at least ceil(m/2) non-null subtrees and at most m non-null subtrees.
  All leaf nodes are at the same level; that is, the tree is perfectly balanced.
  A leaf node has at least ceil(m/2) - 1 and at most m-1 key entries.
There are four basic operations for B-Trees:
  insert (add)
  delete (remove)
  traverse
  search
A B-tree of Order 5* (m=5)
*Min # of subtrees is 3 and max is 5
*Min # of entries is 2 and max is 4
[Figure: a three-level B-tree of order 5 with root 42. Internal-node keys: 16 21 58 76 81 93. Leaf keys: 11 14 17 19 20 21 22 23 24 45 52 63 65 74 78 79 85 87 94 97. Callouts: "Root"; "Node with minimum entries (2)"; "Node with maximum entries (4)"; "Four keys, five subtrees".]
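The limits quoted for order 5 follow from rounding m/2 up. A small sketch of the arithmetic, assuming integer order m and using the hypothetical class name BTreeLimits:

```java
// Node-size limits for non-root nodes of a B-tree of order m.
class BTreeLimits {
    static int minSubtrees(int m) { return (m + 1) / 2; }          // ceil(m/2)
    static int maxSubtrees(int m) { return m; }
    static int minEntries(int m)  { return minSubtrees(m) - 1; }   // ceil(m/2) - 1
    static int maxEntries(int m)  { return m - 1; }

    public static void main(String[] args) {
        int m = 5;
        System.out.println("subtrees: " + minSubtrees(m) + " to " + maxSubtrees(m)); // 3 to 5
        System.out.println("entries:  " + minEntries(m) + " to " + maxEntries(m));   // 2 to 4
    }
}
```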
Insertion
B-tree insertion takes place at a leaf node.
Step 1: locate the leaf node for the data being inserted.
  If the node is not full (max no. of entries), then insert the data in sequence in the node.
When the leaf node is full, we have an overflow condition:
  Insert the element anyway (temporarily violate tree conditions).
  Split the node into two nodes.
  Each new node contains half the data.
  The middle entry is promoted to the parent (which may in turn become full!).
B-trees grow in a balanced fashion: bottom up!
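The "insert anyway, then split" procedure can be sketched compactly in Java. The code below is an illustration under assumptions, not the course's implementation: BTreeSketch, its list-based Node, and render are made-up names; keys equal to an existing key are sent to the right; and deletion is not handled. The key sequence in main is the one used in the lecture's worked example for order 5.

```java
import java.util.ArrayList;
import java.util.List;

// B-tree insertion by overflow-and-split.
class BTreeSketch {
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>(); // empty for a leaf
        boolean isLeaf() { return children.isEmpty(); }
    }

    final int order; // m: a node may legally hold at most m-1 keys
    Node root = new Node();

    BTreeSketch(int order) { this.order = order; }

    void insert(int key) {
        insertInto(root, key);
        if (root.keys.size() > order - 1) { // root overflowed: the tree grows upward
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
    }

    // Descend to the leaf, insert there "anyway", and repair overflow on the way back up.
    private void insertInto(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && key >= node.keys.get(i)) i++;
        if (node.isLeaf()) {
            node.keys.add(i, key);
            return;
        }
        Node child = node.children.get(i);
        insertInto(child, key);
        if (child.keys.size() > order - 1) splitChild(node, i);
    }

    // Splits the overflowing child at index i and promotes its middle key into parent.
    private void splitChild(Node parent, int i) {
        Node left = parent.children.get(i);
        int mid = left.keys.size() / 2;
        int median = left.keys.get(mid);
        Node right = new Node();
        right.keys.addAll(left.keys.subList(mid + 1, left.keys.size()));
        left.keys.subList(mid, left.keys.size()).clear(); // left keeps the keys before the median
        if (!left.isLeaf()) {
            right.children.addAll(left.children.subList(mid + 1, left.children.size()));
            left.children.subList(mid + 1, left.children.size()).clear();
        }
        parent.keys.add(i, median);
        parent.children.add(i + 1, right);
    }

    // Text form: the keys of a node, followed by its children in parentheses.
    static String render(Node n) {
        if (n.isLeaf()) return n.keys.toString();
        StringBuilder sb = new StringBuilder(n.keys.toString()).append("(");
        for (Node c : n.children) sb.append(render(c));
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        BTreeSketch tree = new BTreeSketch(5);
        int[] keys = {11, 21, 14, 78, 97, 85, 74, 63, 42, 45, 57, 20, 16, 19, 52, 30, 22};
        for (int k : keys) tree.insert(k);
        System.out.println(render(tree.root));
        // prints [42]([16, 21]([11, 14][19, 20][22, 30])[57, 78]([45, 52][63, 74][85, 97]))
    }
}
```

The only place the tree gains a level is in insert, when the root itself overflows, which is why all leaves stay at the same depth.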
Follow Through An Example
Given a B-Tree structure of order m=5, insert 11, 21, 14, 78, and 97.
Because the order is 5, a single node can contain a maximum of 4 (m-1) entries.
Step 1:
  11 causes the creation of a new node that becomes the root of the tree.
  As 21, 14, and 78 are inserted, they are just added (in order) to the root node (which is the only node in the tree at this point).
  Inserting 97 causes a problem, because the node where it should go (the root) is full.
[Figure: root node [11]; then root node [11 14 21 78].]
Inserting 97
When the root node is full (that is, the node where the current value should go):
  CHEAT! Insert 97 in the node anyway.
  [Figure: root [11 14 21 78] plus 97 gives [11 14 21 78 97]. Violation!]
  Now, because the node is larger than allowed, split it into two nodes.
  Propagate the median value (21) to the root node and insert it there (this causes the creation of a new root node in this case).
Creation of a new Root Node
[Figure: new root [21] with children [11 14] and [78 97].]
Tree grows 'from bottom up'.
Tree is always balanced.
Depending upon m (typically 100-1000), the tree is very shallow -> search is efficient.
Continuing the Example
Suppose I now add the following keys to the tree: 85, 74, 63, 42, 45, 57.
Inserting 85 then 74:
[Figure: root [21] with children [11 14] and [74 78 85 97].]
Now insert 63... what happens?
Example, cont'd.
63 causes the node to overflow - but add it anyway!
[Figure: root [21] with children [11 14] and [63 74 78 85 97].]
This node violates the B-tree conditions, so it must be split.
Split it up: [63 74 78 85 97]
Example: Splitting a node
[Figure: [63 74 78 85 97] split into [63 74] and [85 97] with 78 above them; the steps are labeled 1 to 4.]
1. Median value is to be sent to the parent node - 78 here.
2, 3. Create a temporary root node with one entry (78) and attach links to the right and left subtrees.
4. Insert this node into the node list of the parent.
Example: Tree after inserting 63
[Figure: root [21 78] with children [11 14], [63 74], and [85 97].]
Now insert 45 and 42.
Then insert 57.
Example: After adding 42, 45, and 57
[Figure: root [21 57 78] with children [11 14], [42 45], [63 74], and [85 97].]
Now add 20, 16, and 19.
Tree after inserting 20, 16, and 19
[Figure: root [16 21 57 78] with children [11 14], [19 20], [42 45], [63 74], and [85 97].]
Now insert 52, 30: the leaf [42 45] becomes [30 42 45 52].
Then 22: the leaf becomes [22 30 42 45 52]. VIOLATION: SPLIT
The Final Tree
[Figure: Yggdrasil, the World Tree]
The Final Tree
[Figure: root [42] with children [16 21] and [57 78]. Leaves under [16 21]: [11 14], [19 20], [22 30]. Leaves under [57 78]: [45 52], [63 74], [85 97].]
B-Tree node deletion is equally interesting.
All deletes take place at a leaf node (when not at a leaf, substitute data must be found).
Underflow can occur when the number of elements in a node falls below the allowed minimum.
May have to 'borrow' data from adjacent nodes and/or the parent.
A Typical B-Tree Node
Suppose we want to represent a node in an order m B-Tree.
  m data elements, m+1 subtrees
Suppose the class defining the tree node is IntBalancedSet:
    int[] data = new int[m + 1];   // +1 for the cheat
    int dataCount;                 // # of data elements in node
    IntBalancedSet[] subset = new IntBalancedSet[m + 2];
    int childCount;
Or we could use a Linked List for each node and alternate keys and trees. Or...
The Structure
[Figure: a node with dataCount = 2. The data array holds 6 and 15 in slots 0 and 1, followed by unused slots (?). The subset array holds the subtree references, followed by null entries.]
Smaller subsets:
  data elements < 6
  6 < data elements < 15 (or >= 6 if duplicates allowed)
  data elements > 15 (or >= 15 if duplicates allowed)
For data[i]: subset[i] is the left subtree and subset[i+1] is the right subtree.
Some Numbers
10^5 words in a dictionary
10^6 words in Moby Dick
10^9 Social Security Numbers
10^12 phone numbers in the world
10^15 people who ever lived
10^20 grains of sand in the world
10^25 manufactured bits of computer memory
10^79 electrons in the universe
With 1000-way branching, we could:
  Find any single bit of memory ever manufactured with fewer than 10 probes
  Find any single electron in the universe with at most 27 probes
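The probe counts are just logarithms base 1000. Because 1000 = 10^3, a collection of 10^e items needs about e/3 levels, rounded up. A tiny sketch of that arithmetic (ProbeCount is a made-up name, and a perfectly full 1000-way tree is assumed):

```java
// Number of levels of 1000-way branching needed to tell apart 10^exponent items.
class ProbeCount {
    // Smallest p with 1000^p >= 10^exponent, i.e. ceil(exponent / 3).
    static int probes(int exponent) {
        return (exponent + 2) / 3;
    }

    public static void main(String[] args) {
        System.out.println(probes(25)); // bits of memory: prints 9
        System.out.println(probes(79)); // electrons in the universe: prints 27
    }
}
```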
Why B-Trees are Important
Form the basis for almost every file indexing system: Unix, Windows, Mac OS.
For a file index, we cannot assume that the entire index will fit into memory (in fact, it can't by definition).
Therefore, the file index resides on the disk.
Big-O analysis assumes that all operations are equal - not true when disk I/O is involved:
  CPUs: ~400 million operations per second
  Disks take on the order of 2-10 milliseconds to access a block of data
  So we can do at most about 500 disk accesses per second (at 2 ms each).
  At the same time, we can do about 400 million CPU operations.
  BOTTOM LINE: disk accesses are VERY expensive (STILL!!!)
A Practical Example
Suppose we want to computerize driver’s license information for the state of Massachusetts.
Assume we have a key of 32 bytes (a name), a 1024 byte record of data, and about 20 million records.
Assume this does not fit into memory and that we have about 1/20 of the resources of the system (other people use it as well).
Thus, in one second we can perform 20 million operations or perform 25 disk accesses.
Analyze the performance of various tree representations.
A Practical Example
Unbalanced binary search tree: DISASTER
  Successful search takes ~1.38 log N disk accesses on average: ~36 disk accesses (or about 1-2 secs)
  Some accesses would take much longer.
  This is just to do the lookups to find our data record!
Red-Black Tree (haven't discussed)
  also log N, although the constant is a little better (~1 sec)
Can't do better than log N with binary trees.
Need to reduce the number of disk accesses to a small constant, like 3 or 4.
Answer is intuitive - if we have more branching, we have less height in the tree and hence fewer accesses.
  Complete binary tree has height that is roughly log_2 N
  Complete m-way tree has height that is roughly log_m N
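Plugging the numbers from this example into the log_m N estimate shows how quickly branching pays off. The sketch below assumes N = 20 million records, 25 disk accesses per second, and one disk access per tree level; TreeHeights and levels are illustrative names, and the result is a rough estimate for a complete tree, not a measurement.

```java
// Rough tree depth and lookup time for N records at different branching factors.
class TreeHeights {
    // Approximate number of levels of a complete m-way tree holding n keys: ceil(log_m n).
    static int levels(long n, int m) {
        return (int) Math.ceil(Math.log(n) / Math.log(m));
    }

    public static void main(String[] args) {
        long n = 20_000_000L;       // records in the driver's license example
        int accessesPerSecond = 25; // our share of the disk
        for (int m : new int[] {2, 100, 1000}) {
            int depth = levels(n, m);
            System.out.printf("m=%d: about %d levels, %.2f s per lookup%n",
                    m, depth, (double) depth / accessesPerSecond);
        }
    }
}
```

For these inputs the estimate is 25 levels for a binary tree, 4 levels for m = 100, and 3 levels for m = 1000, which matches the target of 3 or 4 disk accesses.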
Reminder
M-way trees are good for applications where the differences in access speeds are significant.
E.g. memory versus disk.
[Figure: core memory, circa 1960; 5MB disk, circa 1970]
A Bit of History