cmpsci 187 1 introduction to introduction to programming with data structures lecture 21 balanced...


Upload: adrian-long

Post on 03-Jan-2016


Page 1: CMPSCI 187

Introduction to Programming with Data Structures

Lecture 21: Balanced Trees, AVL Trees, and m-way Trees

Announcements:

Computer Science 187

Page 2: Binary Search Trees

What's the structure of the binary search tree I get in each case when I construct a tree from the keys:

1, 2, 3, 4, 5, 6?

6, 5, 4, 3, 2, 1?

3, 2, 5, 1, 4, 6?
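One way to explore the question is to build each tree and measure its height. The following is a minimal sketch; the `Node` type and the method names are illustrative assumptions, not the course's classes.

```java
public class BstShapeDemo {
    // Hypothetical minimal node type (an assumption, not the course's BinaryNode).
    public static final class Node {
        public final int key;
        public Node left, right;
        public Node(int key) { this.key = key; }
    }

    // Plain, unbalanced BST insertion; keys equal to a node's key go right.
    public static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Height counted in nodes: an empty tree has height 0, a single node has height 1.
    public static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Builds a tree from the keys in the given order and returns its height.
    public static int heightAfterInserting(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6)); // chain to the right
        System.out.println(heightAfterInserting(6, 5, 4, 3, 2, 1)); // chain to the left
        System.out.println(heightAfterInserting(3, 2, 5, 1, 4, 6)); // bushy tree
    }
}
```

The first two orders give a chain with one node per level; the third gives a tree only three levels deep.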

Page 3: Balanced Trees

Consider adding 7 chars (a, b, c, d, e, f, and g) to an initially empty binary search tree two ways:

in order: a b c d e f g

in 'random' order: d f b a e c g

[Figure: two trees. "UNBALANCED": inserting in order gives a chain a, b, c, d, e, f, g, each node the right child of the one before; Search = O(n). "BALANCED": inserting in 'random' order gives root d with children b and f and leaves a, c, e, g; Search = O(log n).]

The order in which data values arrive can make a HUGE difference in what the tree looks like.

Page 4: Definition of Balanced

balance: a tree node attribute representing the difference in height between the node's subtrees.

The 'balance' attribute for a balanced binary tree is -1, 0, or 1.

The definition is recursive and holds for all 'root' nodes and their left and right subtrees.

Achieving balance is important for minimizing the time required for search.

We will look at AVL trees but not in a lot of detail.
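The recursive definition can be sketched directly in code. This is an illustration under assumptions: the `Node` type is hypothetical, height is counted in nodes, and balance is taken as right height minus left height (the sign convention is a choice, not something this slide fixes).

```java
public class BalanceCheck {
    // Hypothetical immutable node type (an assumption, not the course's BinaryNode).
    public static final class Node {
        public final int key;
        public final Node left, right;
        public Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Height counted in nodes: an empty subtree has height 0.
    public static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Assumed convention: height of right subtree minus height of left subtree.
    public static int balance(Node n) {
        return height(n.right) - height(n.left);
    }

    // A tree is balanced when every node, not just the root, has balance -1, 0, or 1.
    public static boolean isBalanced(Node n) {
        if (n == null) return true;
        return Math.abs(balance(n)) <= 1 && isBalanced(n.left) && isBalanced(n.right);
    }

    public static void main(String[] args) {
        Node chain = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        Node bushy = new Node(2, new Node(1, null, null), new Node(3, null, null));
        System.out.println(balance(chain) + " " + isBalanced(chain));
        System.out.println(balance(bushy) + " " + isBalanced(bushy));
    }
}
```

Recomputing heights on every call is quadratic in the worst case; it keeps the sketch short, whereas a real implementation would normally cache the height or balance in each node.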

Page 5: Schemes for Balancing Trees

AVL trees [Adelson-Velskii and Landis 1962]
  named for the initials of their Russian creators
  use rotations to ensure heights of child trees differ by at most 1

2-3 trees [Hopcroft 1970]
  similar to 2-3-4 trees, but repairs have to move back up the tree

B-Trees [Bayer & McCreight 1972]

Red-Black Trees [Bayer 1972]
  not the original name
  red-black convention & relation to 2-3-4 trees [Guibas & Sedgewick 1978]

Splay Trees [Sleator & Tarjan 1983]

Skip Lists [Pugh 1990]

developed at Cornell

Page 6: AVL Trees

AVL trees are binary search trees with a balance condition.

First simple idea: left and right subtrees of the root must have the same height... but not good enough:

[Figure: a tree captioned "Balanced but not shallow".]

In an AVL tree, the heights of the left and right subtrees of every node can differ by at most 1.

Page 7: AVL Tree Example

[Figure: two binary search trees rooted at 10, with children 5 and 16, annotated with node heights; one is labeled "AVL Tree" and the other "Not an AVL Tree".]

Search, insert, etc.: complexity = O(h), where h is the height of the tree.

Height of an AVL tree = O(log n), where n is the number of nodes.

Page 8: Insertion in an AVL Tree

Insert 7

[Figure: the tree rooted at 10 before and after 7 is inserted below 8, annotated with node heights.]

Height of tree rooted at 5 does not change. Tree is still balanced.

Page 9: Insertion Example 2

Insert 0

[Figure: the tree rooted at 10 before and after 0 is inserted, annotated with node heights; the result is marked "hmmmm....".]

Inserting a node causes heights in the tree to change.

Tree is no longer balanced.

Page 10: When and Where do Heights Change?

[Figure: a tree with root 10, children 5 and 16, and nodes 2, 8, and 12 below them; T1, T2, T3, and T4 mark the subtrees where an insertion could land. Callouts on the figure:]

Assume this is the node where an imbalance occurs.

Insertions here could cause the imbalance. (anywhere else?)

Therefore, the heights of these subtrees differ by more than 1.

Therefore there are four cases to consider:

Case 1: Insertion into the left subtree of the left child of 5
Case 2: Insertion into the right subtree of the left child of 5
Case 3: Insertion into the left subtree of the right child of 5
Case 4: Insertion into the right subtree of the right child of 5

First and fourth are symmetric; second and third are symmetric.

Page 11: Case 1 (right rotation): Single Rotation

[Figure: "Tree Unbalanced": root n2 with left child n1 and right subtree T3; n1 has subtrees T1 and T2. "Tree balanced": root n1 with left subtree T1 and right child n2; n2 has subtrees T2 and T3.]

1. Make n1 the root node
2. Make T2 the left child of n2

Binary search tree property holds in both trees.

Balance is achieved in the 'rotated' tree.

Page 12: Rotation to the Right

Algorithm for rotation (toward the right):

1. Save value of root.left (temp = root.left)
2. Set root.left to value of root.left.right
3. Set temp.right to root
4. Set root to temp

Algorithm for rotation toward the left is similar - you do it.

[Figure: a tree with the root and temp references marked.]
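The four steps translate almost line for line into code, and the left rotation is the mirror image. The sketch below uses a hypothetical `Node` type with public fields (an assumption made to keep it short) and returns the new root instead of assigning to a `root` variable.

```java
public class RotationSketch {
    // Hypothetical minimal node type (an assumption, not the course's BinaryNode).
    public static final class Node {
        public final int key;
        public Node left, right;
        public Node(int key) { this.key = key; }
    }

    // Rotation toward the right, following steps 1 to 4 of the algorithm.
    public static Node rotateRight(Node root) {
        Node temp = root.left;     // 1. save root.left
        root.left = temp.right;    // 2. root.left takes over the old root.left.right
        temp.right = root;         // 3. old root becomes temp's right child
        return temp;               // 4. temp is the new root (caller stores it)
    }

    // Rotation toward the left: the same steps with left and right swapped.
    public static Node rotateLeft(Node root) {
        Node temp = root.right;
        root.right = temp.left;
        temp.left = root;
        return temp;
    }

    // In-order listing of keys; rotations must leave this unchanged.
    public static String inOrder(Node n) {
        return n == null ? "" : inOrder(n.left) + n.key + " " + inOrder(n.right);
    }

    public static void main(String[] args) {
        Node root = new Node(3);
        root.left = new Node(2);
        root.left.left = new Node(1);
        root = rotateRight(root);
        System.out.println(root.key + " : " + inOrder(root));
    }
}
```

Checking that the in-order listing is the same before and after a rotation is a quick way to confirm the binary search tree property survives.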

Page 13: Case 4 (left rotation): Single Rotation

[Figure: "Tree Unbalanced": root n1 with left subtree T1 and right child n2; n2 has subtrees T2 and T3. "Tree balanced": root n2 with left child n1 and right subtree T3; n1 has subtrees T1 and T2.]

1. Make n2 the root node
2. Make T2 the right child of n1

Binary search tree property holds in both trees.

Balance is achieved in the 'rotated' tree.

Page 14: Case 2 (left-right): Trouble

[Figure: "Tree Unbalanced": root n2 with left child n1 and right subtree T3; n1 has left subtree T1 and a taller right subtree T2. "Tree Still Unbalanced": after a single rotation, root n1 with left subtree T1 and right child n2; n2 has subtrees T2 and T3.]

Single rotation fails to fix this case.

Need a more complex manipulation of the tree.

Page 15: Case 2 (left-right): Double Rotation (two single rotations)

[Figure: "Original Tree": root n2 with left child n1 and right subtree T3; n1 has subtrees T1 and T2. "Tree Unbalanced": the old T2 is expanded into node n3 with subtrees T2 and T3, so n1 has left subtree T1 and right child n3, and the right subtree of n2 is now called T4.]

Expand T2 in the original tree into a node and two subtrees.

We know that neither n1 nor n2 works as the root.

Solution is then two single rotations to get n3 at the root:

1. Rotate n3 into n1
2. Rotate n3 into n2

Page 16: Case 2 (left-right): Double Rotation (two single rotations)

[Figure, three stages. Start: root n2 with left child n1 and right subtree T4; n1 has left subtree T1 and right child n3; n3 has subtrees T2 and T3. After "Rotate n3 into n1": root n2 with left child n3 and right subtree T4; n3 has left child n1 and right subtree T3; n1 has subtrees T1 and T2. After "Rotate n3 into n2": root n3 with children n1 and n2; n1 has subtrees T1 and T2; n2 has subtrees T3 and T4.]

Binary search tree property holds in both trees.

Balance is achieved in the 'rotated' tree.

Page 17: Case 3 (right-left): Double Rotation (two single rotations)

[Figure: the mirror image of Case 2. Before: n3, with subtrees T2 and T3, sits on the inside between n1 and n2, and T1 and T4 are the outer subtrees. After: root n3 with n1 and n2 as its children; T1, T2, T3, and T4 hang below them in left-to-right order.]

1. Rotate n3 into n1
2. Rotate n3 into n2

Binary search tree property holds in both trees.

Balance is achieved in the 'rotated' tree.

Page 18: Unbalanced Search Trees

Left-Left (parent balance is -2, left child balance is -1)
  Rotate right around parent

Left-Right (parent balance -2, left child balance +1)
  Rotate left around child
  Rotate right around parent

Right-Left (parent balance +2, right child balance -1)
  Rotate right around child
  Rotate left around parent

Right-Right (parent balance +2, right child balance +1)
  Rotate left around parent

Page 19: Book Solution

Add boolean flag to indicate height increase

Add +1/0/-1 balance indicator

Page 20: A new AVL insertKey Method

Assume our BinaryNode has a fourth attribute called height.
  You make the modification to BinaryNode to accomplish this.

We'll write a new insertKeyNode method to handle balancing the tree after insertion.
  Insert pretty much as before.
  After insertion, call balancing routines, each of which implements one of our four cases:
    rotateRight, rotateLeft (cases 1 and 4)
    rotateLeftRight, rotateRightLeft (cases 2 and 3)

Our old method insertKey doesn't change.

More or less a skeleton - some details will be missing.

Page 21: Case 1 (right): Single Rotation, rotateRight()

[Figure: before, root n2 with left child n1 and right subtree T3, where n1 has subtrees T1 and T2; after, root n1 with left subtree T1 and right child n2, where n2 has subtrees T2 and T3.]

n2 is the root of the tree.
n1 is the left child of the root.

Detach and save T2
Make n1 reference n2 as right child
Attach T2 as left child of n2
[Update heights] - here or elsewhere
Return new root node

    public BinaryNode rotateRight(BinaryNode nodeN2) {
        // n1 is the left child of the current root n2
        BinaryNode nodeN1 = (BinaryNode) nodeN2.getLeftChild();
        // T2 (n1's right subtree) becomes n2's left subtree
        nodeN2.setLeftChild(nodeN1.getRightChild());
        // n2 becomes n1's right child
        nodeN1.setRightChild(nodeN2);
        // n1 is the new root of this subtree
        return nodeN1;
    } // end rotateRight

Case 4 (left) is very similar to this case.

Page 22: Case 2: rotateLeftRight() (double rotation)

    public BinaryNode rotateLeftRight(BinaryNode nodeN) {
        BinaryNode nodeC = (BinaryNode) nodeN.getLeftChild();
        nodeN.setLeftChild(rotateLeft(nodeC));   // rotate n3 into n1
        return rotateRight(nodeN);               // rotate n3 into n2
    } // end rotateLeftRight

[Figure, three stages. Start: root n2 with left child n1 and right subtree T4; n1 has left subtree T1 and right child n3; n3 has subtrees T2 and T3. After "Rotate n3 into n1": n3 is the left child of n2, with left child n1 and right subtree T3. After "Rotate n3 into n2": root n3 with children n1 (subtrees T1, T2) and n2 (subtrees T3, T4).]

Page 23: Inserting a node in an AVL Tree (insertKeyNode())

    private BinaryNode rebalance(BinaryNode nodeN) {
        int heightDifference = getHeightDifference(nodeN);
        if (heightDifference > 1) {
            // left subtree is taller by more than 1, so addition was in left subtree
            if (getHeightDifference((BinaryNode) nodeN.getLeftChild()) > 0)
                // addition was in left subtree of left child
                nodeN = TreeRotations.rotateRight(nodeN);
            else
                // addition was in right subtree of left child
                nodeN = TreeRotations.rotateLeftRight(nodeN);
        } else if (heightDifference < -1) {
            // ... similar to above ... you do it
        }
        // else nodeN is balanced
        return nodeN;
    } // end rebalance
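For reference, here is one self-contained way the whole insert-then-rebalance cycle could look, including the mirror-image branch the slide leaves as an exercise. It is a sketch under assumptions: the `Node` type with a cached `height` field and all helper names are hypothetical, and the height difference is taken as left minus right, which is what the rebalance method on this slide appears to use.

```java
public class AvlSketch {
    // Hypothetical node with a cached height (an assumption, not the course's BinaryNode).
    public static final class Node {
        public final int key;
        public int height = 1;          // height in nodes; a leaf has height 1
        public Node left, right;
        public Node(int key) { this.key = key; }
    }

    static int height(Node n) { return n == null ? 0 : n.height; }

    static void updateHeight(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Left height minus right height (assumed convention).
    static int heightDifference(Node n) {
        return n == null ? 0 : height(n.left) - height(n.right);
    }

    static Node rotateRight(Node n2) {
        Node n1 = n2.left;
        n2.left = n1.right;
        n1.right = n2;
        updateHeight(n2);               // n2 is now the lower node, so fix it first
        updateHeight(n1);
        return n1;
    }

    static Node rotateLeft(Node n1) {
        Node n2 = n1.right;
        n1.right = n2.left;
        n2.left = n1;
        updateHeight(n1);
        updateHeight(n2);
        return n2;
    }

    static Node rebalance(Node n) {
        updateHeight(n);
        int diff = heightDifference(n);
        if (diff > 1) {                               // left side too tall
            if (heightDifference(n.left) < 0) {       // left-right case: rotate child first
                n.left = rotateLeft(n.left);
            }
            return rotateRight(n);
        }
        if (diff < -1) {                              // right side too tall (mirror image)
            if (heightDifference(n.right) > 0) {      // right-left case: rotate child first
                n.right = rotateRight(n.right);
            }
            return rotateLeft(n);
        }
        return n;                                     // already balanced
    }

    // Ordinary BST insertion, rebalancing each node on the way back up.
    public static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return rebalance(root);
    }

    public static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    public static void main(String[] args) {
        Node root = build(1, 2, 3, 4, 5, 6);
        System.out.println("root = " + root.key + ", height = " + root.height);
    }
}
```

Inserting 1 through 6 in ascending order, which produces a six-level chain in a plain binary search tree, here ends with 4 at the root and three levels.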

Page 24: Removal from AVL Trees

Removal causes the same kind of imbalance problems.

Solution is basically the same as for insertion:
  Add a field called decrease to note height change, or use the height difference field in the node to compute it.
  Adjust the local node's balance.
  Rebalance as necessary.
  The balance changed and balancing methods must set decrease appropriately or update the height field.

Actual removal is as for a binary search tree:
  Involves moving values, and
  Deleting a suitable leaf node

Page 25: Performance of AVL Trees

Worst case height is about 1.44 log n (log base 2).

Thus, lookup, insert, and remove are all O(log n).

Empirical cost is 0.25 + log n comparisons to insert.
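The 1.44 factor comes from asking how few nodes an AVL tree of a given height can have: the sparsest tree of height h has a root, a sparsest subtree of height h-1, and a sparsest subtree of height h-2. The sketch below tabulates that recurrence and compares h with 1.44 log2(n + 2); the class and method names are illustrative assumptions, and height is counted in nodes.

```java
public class AvlHeightBound {
    // Fewest nodes an AVL tree of the given height can contain:
    // minNodes(0) = 0, minNodes(1) = 1, minNodes(h) = minNodes(h-1) + minNodes(h-2) + 1.
    public static long minNodes(int height) {
        if (height == 0) return 0;
        long prev = 0, cur = 1;
        for (int h = 2; h <= height; h++) {
            long next = cur + prev + 1;
            prev = cur;
            cur = next;
        }
        return cur;
    }

    public static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    public static void main(String[] args) {
        for (int h = 1; h <= 20; h++) {
            long n = minNodes(h);
            System.out.println("height " + h + ": at least " + n
                    + " nodes, 1.44*log2(n+2) = " + 1.44 * log2(n + 2));
        }
    }
}
```

Because the recurrence grows like the Fibonacci numbers, the node count grows geometrically with height, which is what keeps the height logarithmic in n.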

Page 26: B-Trees

Not really binary trees at all.

Almost all file systems on almost all computers use B-Trees to keep track of which portions of which files are in which disk sectors.

The structure of choice for very large disk-resident databases. Why???

Very important in computer science:
  Disk directories
  Disk-resident databases
  ...

B-Trees are an example of multiway trees.

In multiway trees, nodes can have multiple data elements (in contrast to one for a binary tree node).

Each node in a B-Tree can represent possibly many subtrees.

Page 27: m-Way Trees

An m-way tree is a search tree in which each node can have from zero to m subtrees.

m is defined as the order of the tree.

In a nonempty m-way tree:
  Each node has 0 to m subtrees.
  Given a node with k <= m subtrees, the node contains k subtrees (some of which may be null) and k-1 data entries.
  The keys are ordered: key_1 <= key_2 <= key_3 <= ... <= key_(k-1).
  The key values in the first subtree are less than the key value in the first entry, etc.

m-way trees can still be unbalanced (but wait...)

See 2-3 and 2-3-4 trees in book....

Page 28: An m-way tree

A 4-way Tree

Keys:      K1 | K2 | K3

Subtrees:  Keys < K1 | K1 <= Keys < K2 | K2 <= Keys < K3 | Keys >= K3

A binary search tree is an m-way tree of order 2, or a 2-way tree.
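Searching an m-way node comes down to picking which subtree to follow. A minimal sketch, assuming the node's keys sit in a sorted `int` array (the array layout and the method name are assumptions for illustration):

```java
public class MWayNodeSearch {
    // keys[0..count-1] are sorted ascending. Returns the index i of the subtree to follow:
    // subtree 0 holds keys < keys[0], subtree i holds keys[i-1] <= k < keys[i],
    // and subtree count holds keys >= keys[count-1].
    public static int childIndex(int[] keys, int count, int target) {
        int i = 0;
        while (i < count && target >= keys[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30};   // K1, K2, K3 of a 4-way node
        for (int target : new int[] {5, 10, 15, 20, 30, 99}) {
            System.out.println(target + " -> subtree " + childIndex(keys, 3, target));
        }
    }
}
```

A linear scan is used for clarity; with hundreds of keys per node a binary search over the same array would be the usual choice.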

Page 29: B-Trees

A B-Tree is an m-way tree with the following additional properties:
  The root is either a leaf or it has 2...m subtrees.
  All internal nodes have at least ceil(m/2) non-null subtrees and at most m non-null subtrees.
  All leaf nodes are at the same level; that is, the tree is perfectly balanced.
  A leaf node has at least ceil(m/2) - 1 and at most m-1 key entries.

There are four basic operations for B-Trees:
  insert (add)
  delete (remove)
  traverse
  search

Page 30: A B-tree of Order 5* (m=5)

*Min # of subtrees is 3 and max is 5
*Min # of entries is 2 and max is 4

Root:     [42]
Level 2:  [16 21]   [58 76 81 93]
Leaves under [16 21]:        [11 14] [17 19 20] [21 22 23 24]
Leaves under [58 76 81 93]:  [45 52] [63 65 74] [78 79] [85 87] [94 97]

Callouts on the figure: "Node with minimum entries (2)", "Node with maximum entries (4)", "Four keys, five subtrees".
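The minimum and maximum counts for any order m follow from the B-Tree properties, with the lower bound taken as m/2 rounded up (which is what gives 3 for m = 5). A small sketch, with hypothetical method names:

```java
public class BTreeBounds {
    // Minimum subtrees of an internal (non-root) node: m/2 rounded up.
    public static int minSubtrees(int m) { return (m + 1) / 2; }

    // Maximum subtrees of any node: the order itself.
    public static int maxSubtrees(int m) { return m; }

    // A node with k subtrees holds k-1 entries, so the entry bounds are one less.
    public static int minEntries(int m) { return minSubtrees(m) - 1; }

    public static int maxEntries(int m) { return m - 1; }

    public static void main(String[] args) {
        int m = 5;
        System.out.println("subtrees: " + minSubtrees(m) + " to " + maxSubtrees(m));
        System.out.println("entries:  " + minEntries(m) + " to " + maxEntries(m));
    }
}
```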

Page 31: Insertion

B-tree insertion takes place at a leaf node.

Step 1: locate the leaf node for the data being inserted.
  If the node is not full (max no. of entries), then insert the data in sequence in the node.

When the leaf node is full, we have an overflow condition.
  Insert the element anyway (temporarily violate tree conditions)
  Split the node into two nodes
  Each new node contains half the data
  Middle entry is promoted to the parent (which may in turn become full!)

B-trees grow in a balanced fashion: bottom up!
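The split step on its own can be sketched on a sorted array of keys: everything left of the middle entry stays in one node, everything right of it goes to a new node, and the middle entry is handed to the parent. The `Split` holder class and method name below are assumptions for illustration.

```java
import java.util.Arrays;

public class NodeSplit {
    // Hypothetical holder for the three pieces produced by a split.
    public static final class Split {
        public final int[] left;
        public final int median;
        public final int[] right;
        Split(int[] left, int median, int[] right) {
            this.left = left;
            this.median = median;
            this.right = right;
        }
    }

    // overfull is a sorted key array that is one entry over the allowed maximum.
    public static Split split(int[] overfull) {
        int mid = overfull.length / 2;
        int[] left = Arrays.copyOfRange(overfull, 0, mid);
        int[] right = Arrays.copyOfRange(overfull, mid + 1, overfull.length);
        return new Split(left, overfull[mid], right);
    }

    public static void main(String[] args) {
        Split s = split(new int[] {11, 14, 21, 78, 97});
        System.out.println(Arrays.toString(s.left) + " " + s.median + " " + Arrays.toString(s.right));
    }
}
```

For an order-5 node holding five keys, each half keeps two entries, which is exactly the minimum a node is allowed to have.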

Page 32: Follow Through An Example

Given a B-Tree structure of order m=5, insert 11, 21, 14, 78, and 97.

Because the order is 5, a single node can contain a maximum of 4 (m-1) entries.

Step 1. 11 causes the creation of a new node that becomes the root of the tree.

  root: [11]

As 21, 14, and 78 are inserted, they are just added (in order) to the root node (which is the only node in the tree at this point).

  root: [11 14 21 78]

Inserting 97 causes a problem, because the node where it should go (the root) is full.

Page 33: Inserting 97

When the root node is full (that is, the node where the current value should go):

CHEAT! Insert 97 in the node anyway.

  root: [11 14 21 78]  + 97  ->  [11 14 21 78 97]   Violation!

Now, because the node is larger than allowed, split it into two nodes.

Propagate the median value (21) upward and insert it in the parent (this causes the creation of a new root node in this case).

Page 34: Creation of a new Root Node

  root:    [21]
  leaves:  [11 14] [78 97]

Tree grows 'from bottom up'.

Tree is always balanced.

Depending upon m (typically 100-1000), tree is very shallow -> search is efficient.

Page 35: Continuing the Example

Suppose I now add the following keys to the tree: 85, 74, 63, 42, 45, 57.

Inserting 85 (1) then 74 (2):

  root:    [21]
  leaves:  [11 14] [74 78 85 97]

Now insert 63... what happens?

Page 36: Example, cont'd.

63 (3) causes the node to overflow - but add it anyway!

  root:    [21]
  leaves:  [11 14] [63 74 78 85 97]

This node violates the B-tree conditions, so it must be split.

  [63 74 78 85 97]: split it up

Page 37: Example: Splitting a node

  [63 74]  78  [85 97]

1. Median value is to be sent to the parent node - 78 here
2, 3. Create a temporary root node with one entry (78) and attach links to the right and left subtrees
4. Insert this node into the nodelist of the parent

Page 38: Example: Tree after inserting 63

  root:    [21 78]
  leaves:  [11 14] [63 74] [85 97]

Now insert 45 and 42.

Then insert 57.

Page 39: Example: After adding 42, 45, and 57

  root:    [21 57 78]
  leaves:  [11 14] [42 45] [63 74] [85 97]

Now add 20, 16, and 19

Page 40: Tree after inserting 20, 16, and 19

  root:    [16 21 57 78]
  leaves:  [11 14] [19 20] [42 45] [63 74] [85 97]

Now insert 52, 30: the leaf [42 45] becomes [30 42 45 52].

Then 22: [22 30 42 45 52]   VIOLATION: SPLIT

Page 41: The Final Tree

Yggdrasil, the World Tree

Page 42: The Final Tree

  root:     [42]
  level 2:  [16 21] [57 78]
  leaves:   [11 14] [19 20] [22 30] [45 52] [63 74] [85 97]

B-Tree node deletion is equally interesting.

All deletes take place at a leaf node (when not at a leaf, substitute data must be found).

Underflow can occur when the number of elements in a node falls below the allowed minimum.

May have to 'borrow' data from adjacent nodes and/or the parent.
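The whole insertion example can be replayed with a short program. The following is a sketch under assumptions, not the book's or the course's implementation: nodes keep their keys and children in `ArrayList`s, a node is allowed to overflow by one key (the "cheat") before its parent splits it, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BTreeSketch {
    static final int ORDER = 5;              // m: maximum number of subtrees per node
    static final int MAX_KEYS = ORDER - 1;   // maximum number of entries per node

    public static final class Node {
        public final List<Integer> keys = new ArrayList<>();
        public final List<Node> children = new ArrayList<>();  // empty for a leaf
        boolean isLeaf() { return children.isEmpty(); }
    }

    public Node root = new Node();

    public void insert(int key) {
        insertInto(root, key);
        if (root.keys.size() > MAX_KEYS) {   // root overflowed: the tree grows upward
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
    }

    // Inserts below node. The node itself may be left one key over; its caller splits it.
    private static void insertInto(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && key >= node.keys.get(i)) i++;
        if (node.isLeaf()) {
            node.keys.add(i, key);           // the "cheat": may temporarily exceed MAX_KEYS
        } else {
            insertInto(node.children.get(i), key);
            if (node.children.get(i).keys.size() > MAX_KEYS) splitChild(node, i);
        }
    }

    // Splits parent's i-th child around its median and promotes the median into parent.
    private static void splitChild(Node parent, int i) {
        Node left = parent.children.get(i);
        int mid = left.keys.size() / 2;
        int median = left.keys.get(mid);
        Node right = new Node();
        right.keys.addAll(left.keys.subList(mid + 1, left.keys.size()));
        if (!left.isLeaf()) {
            right.children.addAll(left.children.subList(mid + 1, left.children.size()));
            left.children.subList(mid + 1, left.children.size()).clear();
        }
        left.keys.subList(mid, left.keys.size()).clear();
        parent.keys.add(i, median);
        parent.children.add(i + 1, right);
    }

    // One line of text per level, listing each node's keys from left to right.
    public String describe() {
        StringBuilder sb = new StringBuilder();
        List<Node> level = new ArrayList<>();
        level.add(root);
        while (!level.isEmpty()) {
            List<Node> next = new ArrayList<>();
            for (Node n : level) {
                sb.append(n.keys);
                next.addAll(n.children);
            }
            sb.append('\n');
            level = next;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BTreeSketch tree = new BTreeSketch();
        int[] keys = {11, 21, 14, 78, 97, 85, 74, 63, 45, 42, 57, 20, 16, 19, 52, 30, 22};
        for (int k : keys) tree.insert(k);
        System.out.print(tree.describe());
    }
}
```

Run on the keys of the example in the order the slides insert them, the sketch ends with 42 alone in the root, [16, 21] and [57, 78] on the second level, and six two-key leaves.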

Page 43: A Typical B-Tree Node

Suppose we want to represent a node in an order m B-Tree: m data elements, m+1 subtrees.

Suppose the class defining the tree node is IntBalancedSet.

    int[] data = new int[m + 1];      // +1 for the cheat
    int dataCount;                    // # of data elements in node
    IntBalancedSet[] subset = new IntBalancedSet[m + 2];
    int childCount;                   // # of subtrees in node

Or we could use a Linked List for each node and alternate keys and trees. Or...

Page 44: The Structure

[Figure: a node with dataCount = 2. The data array (indices 0, 1, 2, 3, ..., m-1, m) holds 6 and 15 in slots 0 and 1; the remaining slots are unused (?). The subset array refers to three smaller subsets, and its remaining slots are null.]

Smaller subsets:
  subset[0]: data elements < 6
  subset[1]: 6 < data elements < 15 (or >= 6 if duplicates allowed)
  subset[2]: data elements > 15 (or >= 15 if duplicates allowed)

For data[i]: subset[i] is the left subtree and subset[i+1] is the right subtree.

Page 45: Some Numbers

10^5 words in a dictionary
10^6 words in Moby Dick
10^9 Social Security Numbers
10^12 phone numbers in the world
10^15 people who ever lived
10^20 grains of sand in the world
10^25 manufactured bits of computer memory
10^79 electrons in the universe

With 1000-way branching, we could:
  Find every single bit of memory ever manufactured with fewer than 10 probes
  Find any single electron in the universe with no more than 27 probes
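The probe counts can be checked by asking how many levels of 1000-way branching are needed to reach 10^e items. A small sketch using `BigInteger` so that 10^79 does not overflow; the class and method names are illustrative assumptions.

```java
import java.math.BigInteger;

public class ProbeCount {
    // Smallest number of levels h such that branching^h >= 10^exponent.
    public static int probes(int branching, int exponent) {
        BigInteger target = BigInteger.TEN.pow(exponent);
        BigInteger reach = BigInteger.ONE;
        int h = 0;
        while (reach.compareTo(target) < 0) {
            reach = reach.multiply(BigInteger.valueOf(branching));
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println("10^25 bits of memory: " + probes(1000, 25) + " probes");
        System.out.println("10^79 electrons:      " + probes(1000, 79) + " probes");
    }
}
```

Each level of 1000-way branching covers three more decimal digits, so the counts are the exponents divided by 3 and rounded up.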

Page 46: Why B-Trees are Important

Form the basis for almost every file indexing system: Unix, Windows, Mac OS.

For a file index, cannot assume that the entire index will fit into memory (in fact, it can't by definition).

Therefore, the file index resides on the disk.

Big-O analysis assumes that all operations are equal - not true when disk I/O is involved:
  CPUs: ~400 million operations per second
  Disks take on the order of 2-10 milliseconds to access a block of data
  So we can do at most about 500 disk accesses per second.
  At the same time, we can do about 400 million CPU operations.

BOTTOM LINE: disk accesses are VERY expensive (STILL!!!)

Page 47: A Practical Example

Suppose we want to computerize driver’s license information for the state of Massachusetts.

Assume we have a key of 32 bytes (a name), a 1024 byte record of data, and about 20 million records.

Assume this does not fit into memory and that we have about 1/20 of the resources of the system (other people use it as well).

Thus, in one second we can perform 20 million operations or perform 25 disk accesses.

Analyze the performance of various tree representations.

Page 48: A Practical Example

Unbalanced binary search tree: DISASTER
  Successful search: ~1.38 log N disk accesses (average)
  ~36 disk accesses (or about 1-2 secs)
  Some accesses would take much longer.
  This is just to do the lookups to find our data record!

Red-Black Tree (haven't discussed)
  also log N, although the constant is a little better (~1 sec)

Can't do better than log N with binary trees.

Need to reduce the number of disk accesses to a small constant, like 3 or 4.

Answer is intuitive - if we have more branching, we have less height in the tree and hence fewer accesses.
  Complete binary tree has height that is roughly log_2 N
  Complete m-way tree has height that is roughly log_m N
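The arithmetic behind these estimates can be reproduced in a few lines. The record count (20 million) and the 25 disk accesses per second come from the example; the branching factor of 200 is purely an assumed value for illustration, and the class and method names are hypothetical.

```java
public class DiskAccessEstimate {
    public static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Levels needed so that branching^levels >= n, roughly the height of a complete m-way tree.
    public static int levels(long n, int branching) {
        int h = 0;
        long reach = 1;
        while (reach < n) {
            reach *= branching;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        long records = 20_000_000L;
        double accessesPerSecond = 25.0;

        double unbalanced = 1.38 * log2(records);   // average for an unbalanced BST
        double balanced = log2(records);            // balanced binary tree
        int multiway = levels(records, 200);        // 200-way branching is an assumption

        System.out.println("unbalanced BST: " + unbalanced + " accesses, "
                + unbalanced / accessesPerSecond + " s");
        System.out.println("balanced binary: " + balanced + " accesses, "
                + balanced / accessesPerSecond + " s");
        System.out.println("200-way tree: " + multiway + " accesses, "
                + multiway / accessesPerSecond + " s");
    }
}
```

With log2 of 20 million at about 24, the binary estimates land in the same range as the slide's figures of roughly one to two seconds per lookup, while a few hundred-way branching brings the count down to the "3 or 4" accesses the slide asks for.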

Page 49: Reminder

M-way trees are good for applications where the differences in access speeds are significant.

E.g. memory versus disk.

[Images: core memory, circa 1960; 5 MB disk, circa 1970]

Page 50: A Bit of History

(right)