2-3 Trees Extended tree. Tree in which all empty subtrees are replaced by new nodes that are called external nodes. Original nodes are called internal nodes.

Upload: milo-richardson

Post on 28-Dec-2015


2-3 Trees

• Extended tree. Tree in which all empty subtrees are replaced by new nodes that are called external nodes. Original nodes are called internal nodes.

[Figure: Extended Binary Tree, with external nodes and internal nodes labeled.]

2-3 Tree Definition

• Every internal node is either a 2-node or a 3-node.

• A 2-node has one key and 2 children/subtrees.

  All keys in left subtree are smaller than this key.
  All keys in right subtree are bigger than this key.

[Figure: a 2-node with key 8 and subtrees L and R.]

2-3 Tree Definition

• A 3-node has 2 keys and 3 children/subtrees; first key is smaller than second key.

  All keys in left subtree are smaller than first key.
  All keys in middle subtree are bigger than first key and smaller than second key.
  All keys in right subtree are bigger than second key.

[Figure: a 3-node with keys 1 and 3 and subtrees L, M, and R.]

• All external nodes are on the same level.

Example 2-3 Tree

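[Figure: example 2-3 tree. Root 8; second level 4 and (15 20); leaves (1 3), (5 6), 9, 17, (30 40). A legend marks which nodes are 2-nodes and which are 3-nodes.]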

Minimum # Of Pairs/Elements

• Happens when all internal nodes are 2-nodes.

• Number of nodes = $2^h - 1$, where h is tree height (excluding external nodes).

• Each node has 1 (key, value) pair.

• So, minimum # of pairs = $2^h - 1$.

Maximum # Of Pairs/Elements

• Happens when all internal nodes are 3-nodes.

• Full degree 3 tree.

• # of nodes = $1 + 3 + 3^2 + 3^3 + \dots + 3^{h-1} = (3^h - 1)/2$.

• Each node has 2 pairs.

• So, # of pairs = $3^h - 1$.

2-3 Tree Height Bounds

• $2^h - 1 \le n \le 3^h - 1$.

• $\log_3(n+1) \le h \le \log_2(n+1)$.
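
The bounds above can be checked numerically. The following is a minimal Python sketch; the function names `min_pairs`, `max_pairs`, and `height_bounds` are illustrative and do not come from the slides.

```python
def min_pairs(h):
    """Fewest pairs in a 2-3 tree of height h (every internal node is a 2-node)."""
    return 2**h - 1


def max_pairs(h):
    """Most pairs in a 2-3 tree of height h (every internal node is a 3-node)."""
    return 3**h - 1


def height_bounds(n):
    """Smallest and largest integer h with min_pairs(h) <= n <= max_pairs(h)."""
    lo = 0
    while max_pairs(lo) < n:        # h must be at least log3(n + 1)
        lo += 1
    hi = lo
    while min_pairs(hi + 1) <= n:   # h can be at most log2(n + 1)
        hi += 1
    return lo, hi


# The example tree in these slides holds 12 pairs and has height 3.
print(height_bounds(12))
```

For n = 12 both bounds round to 3, so any 2-3 tree with 12 pairs has height exactly 3.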

Node Structure

• 2-node uses LC, P1, and MC.

• 3-node uses all fields.

• May have optional parent field.

• Only internal nodes are represented!

LC P1 MC P2 RC
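
As a rough illustration, the node layout could be declared as below. This is a sketch, not the slides' implementation: the field names follow the slide, while the class name `Node23`, the use of `None` for external nodes, and the convention that `P2 is None` marks a 2-node are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Tuple

Pair = Tuple[Any, Any]  # (key, value)


@dataclass(eq=False)
class Node23:
    LC: Optional["Node23"] = None   # left child
    P1: Optional[Pair] = None       # first pair
    MC: Optional["Node23"] = None   # middle child (the right subtree of a 2-node)
    P2: Optional[Pair] = None       # second pair; None marks a 2-node (assumed convention)
    RC: Optional["Node23"] = None   # right child, used only by a 3-node
    parent: Optional["Node23"] = field(default=None, repr=False)  # optional parent field

    def is_2node(self) -> bool:
        return self.P2 is None

    def is_leaf(self) -> bool:
        # External nodes are not represented, so a leaf has no child objects.
        return self.LC is None


leaf = Node23(P1=(9, "nine"))                       # a 2-node leaf
three = Node23(P1=(15, "fifteen"), P2=(20, "twenty"))  # a 3-node
```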

Search

[Figure: the example 2-3 tree. Root 8; second level 4 and (15 20); leaves (1 3), (5 6), 9, 17, (30 40).]

External nodes not shown.
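
A search walks down from the root, choosing a subtree by comparing with the one or two keys in each node. The sketch below assumes a simplified node that stores keys only (no values) in a list, with an empty child list for a leaf; the class `Node` and function `search` are illustrative names.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # 1 key (2-node) or 2 keys (3-node), ascending
        self.children = children or []  # empty for a leaf; external nodes are implicit


def search(node, key):
    """Return True if key is in the 2-3 tree rooted at node."""
    while node is not None:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1                      # skip keys smaller than the search key
        if i < len(node.keys) and key == node.keys[i]:
            return True
        # Falling off a leaf means we reached an external node.
        node = node.children[i] if node.children else None
    return False


# The example tree: root 8; second level 4 and (15 20).
root = Node([8], [
    Node([4], [Node([1, 3]), Node([5, 6])]),
    Node([15, 20], [Node([9]), Node([17]), Node([30, 40])]),
])
```

Each level costs at most two comparisons, so the search time is proportional to the height.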

Insert

[Figure: the example 2-3 tree. Root 8; second level 4 and (15 20); leaves (1 3), (5 6), 9, 17, (30 40).]

• Insert pair with key = 16.

Insert

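[Figure: the tree after inserting 16; the leaf that held 17 is now the 3-node (16 17).]

• The new pair goes into the 2-node leaf 17: move P1 to P2.

• Insert the new pair as P1.

• Now insert a pair with key = 2.

• New pair goes into a 3-node.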

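Insert Into A Leaf 3-node

• Insert new pair so that the 3 keys are in ascending order.

• Move third key into a new 2-node.

• Insert second key and pointer to new 2-node into parent.

[Figure: leaf (1 3) receives 2 and becomes (1 2 3); it is split into 2-nodes 1 and 3, and 2 moves up to the parent.]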

Insert

[Figure: 2-3 tree. Root 8; second level 4 and (15 20); leaves (1 3), (5 6), 9, (16 17), (30 40).]

• Insert a pair with key = 2.

Insert

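[Figure: leaf (1 3) has been split into 2-nodes 1 and 3; key 2 is waiting to enter the parent 4. The rest of the tree is unchanged.]

• Insert a pair with key = 2 plus a pointer into parent.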

Insert

• Now, insert a pair with key = 18.

[Figure: 2-3 tree. Root 8; second level (2 4) and (15 20); leaves 1, 3, (5 6), 9, (16 17), (30 40).]

Insert Into A Leaf 3-node

• Insert new pair so that the 3 keys are in ascending order.

• Move third key into a new 2-node.

• Insert second key and pointer to new 2-node into parent.

[Figure: leaf (16 17) receives 18 and becomes (16 17 18); it is split into 2-nodes 16 and 18, and 17 moves up to the parent.]

Insert

• Insert a pair with key = 18.

[Figure: 2-3 tree. Root 8; second level (2 4) and (15 20); leaves 1, 3, (5 6), 9, (16 17), (30 40).]

Insert

• Insert a pair with key = 17 plus a pointer into parent.

[Figure: leaf (16 17) has been split into 2-nodes 16 and 18; key 17 is waiting to enter the parent (15 20).]

Insert Into A Nonleaf 3-node

• Insert new pair and pointer so that the 3 keys are in ascending order.

• Move third key and 3rd and 4th pointers into a new 2-node.

• Insert second key and pointer to new 2-node into parent.

[Figure: node (15 20) receives 17 and becomes (15 17 20); it is split into 2-nodes 15 and 20, and 17 moves up to the parent.]

Insert

• Insert a pair with key = 17 plus a pointer into parent.

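[Figure: node (15 20) has been split into 2-nodes 15 (children 9, 16) and 20 (children 18, (30 40)); key 17 is waiting to enter the root 8.]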

Insert

• Now, insert a pair with key = 7.

[Figure: 2-3 tree. Root (8 17); second level (2 4), 15, 20; leaves 1, 3, (5 6), 9, 16, 18, (30 40).]

Insert Into A Nonleaf 3-node

• Insert new pair and pointer so that the 3 keys are in ascending order.

• Move third key and 3rd and 4th pointers into a new 2-node.

• Insert second key and pointer to new 2-node into parent.

[Figure: node (5 6) receives 7 and becomes (5 6 7); it is split into 2-nodes 5 and 7, and 6 moves up to the parent.]

Insert

• Insert a pair with key = 6 plus a pointer into parent.

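[Figure: leaf (5 6) has been split into 2-nodes 5 and 7; key 6 is waiting to enter the parent (2 4). Root (8 17); second level (2 4), 15, 20.]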

Insert Into A Nonleaf 3-node

• Insert new pair and pointer so that the 3 keys are in ascending order.

• Move third key and 3rd and 4th pointers into a new 2-node.

• Insert second key and pointer to new 2-node into parent.

[Figure: node (2 4) receives 6 and becomes (2 4 6); it is split into 2-nodes 2 and 6, and 4 moves up to the parent.]

Insert

• Insert a pair with key = 4 plus a pointer into parent.

[Figure: node (2 4) has been split into 2-nodes 2 (children 1, 3) and 6 (children 5, 7); key 4 is waiting to enter the root (8 17).]

Insert

• Insert a pair with key = 8 plus a pointer into parent.

• There is no parent. So, create a new root.

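[Figure: the root (8 17) received 4 and has been split into 2-nodes 4 (children 2, 6) and 17 (children 15, 20); key 8 is waiting to enter a new root.]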

Insert

• Height increases by 1.

[Figure: final 2-3 tree. Root 8; second level 4 and 17; third level 2, 6, 15, 20; leaves 1, 3, 5, 7, 9, 16, 18, (30 40).]
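
The whole insertion walk-through (keys 16, 2, 18, 7) can be reproduced with a short recursive routine. This is a sketch under assumptions: nodes store keys only in Python lists, keys are distinct, and a split is reported to the caller as a `(middle_key, new_node)` tuple instead of through a parent field. The names `Node`, `_insert`, and `insert` are illustrative.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty for a leaf


def _insert(node, key):
    """Insert key below node. Return None, or (middle_key, new_node) if node split."""
    if not node.children:                # leaf: add the key in ascending order
        node.keys.append(key)
        node.keys.sort()
    else:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        split = _insert(node.children[i], key)
        if split is None:
            return None
        mid, right = split               # child split: take its middle key and new node
        node.keys.insert(i, mid)
        node.children.insert(i + 1, right)
    if len(node.keys) < 3:
        return None
    # Three keys: move the third key and the 3rd and 4th pointers into a new 2-node.
    new_node = Node([node.keys[2]], node.children[2:])
    mid = node.keys[1]
    node.keys = [node.keys[0]]
    node.children = node.children[:2]
    return mid, new_node                 # second key goes to the parent


def insert(root, key):
    split = _insert(root, key)
    if split is None:
        return root
    mid, right = split
    return Node([mid], [root, right])    # no parent: create a new root


root = Node([8], [
    Node([4], [Node([1, 3]), Node([5, 6])]),
    Node([15, 20], [Node([9]), Node([17]), Node([30, 40])]),
])
for k in (16, 2, 18, 7):
    root = insert(root, k)
```

After the four inserts the root is the 2-node 8 and the tree is one level taller, matching the last slide of the walk-through.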

Delete

• Delete the pair with key = 8.

• Transform deletion from interior into deletion from a leaf.

• Replace by largest in left subtree.

[Figure: 2-3 tree. Root 8; second level (2 4) and (15 20); leaves 1, 3, (5 6), 9, (16 17), (30 40).]

Delete From A Leaf

• Delete the pair with key = 16.

• 3-node becomes 2-node.

[Figure: 2-3 tree. Root 8; second level (2 4) and (15 20); leaves 1, 3, (5 6), 9, (16 17), (30 40).]

Delete From A Leaf

• Delete the pair with key = 17.

• Deletion from a 2-node.

• Check one sibling and determine if it is a 3-node.

• If so borrow a pair and a subtree via parent node.

[Figure: 2-3 tree. Root 8; second level (2 4) and (15 20); leaves 1, 3, (5 6), 9, 17, (30 40).]

Delete From A Leaf

• Delete the pair with key = 20.

• Deletion from a 2-node.

• Check one sibling and determine if it is a 3-node.

• If not, combine with sibling and parent pair.

[Figure: 2-3 tree. Root 8; second level (2 4) and (15 30); leaves 1, 3, (5 6), 9, 20, 40.]

Delete From A Leaf

• Delete the pair with key = 30.

• Deletion from a 3-node.

• 3-node becomes 2-node.

[Figure: 2-3 tree. Root 8; second level (2 4) and 15; leaves 1, 3, (5 6), 9, (30 40).]

Delete From A Leaf

• Delete the pair with key = 3.

• Deletion from a 2-node.

• Check one sibling and determine if it is a 3-node.

• If so borrow a pair and a subtree via parent node.

[Figure: 2-3 tree. Root 8; second level (2 4) and 15; leaves 1, 3, (5 6), 9, 40.]

Delete From A Leaf

• Delete the pair with key = 6.

• Deletion from a 2-node.

• Check one sibling and determine if it is a 3-node.

• If not, combine with sibling and parent pair.

[Figure: 2-3 tree. Root 8; second level (2 5) and 15; leaves 1, 4, 6, 9, 40.]

Delete From A Leaf

• Delete the pair with key = 40.

• Deletion from a 2-node.

• Check one sibling and determine if it is a 3-node.

• If not, combine with sibling and parent pair.

[Figure: 2-3 tree. Root 8; second level 2 and 15; leaves 1, (4 5), 9, 40.]

Delete From A Leaf

• Parent pair was from a 2-node.

• Check one sibling and determine if it is a 3-node.

• If not, combine with sibling and parent pair.

[Figure: root 8; second level 2 and a node left without a pair; leaves 1, (4 5), (9 15).]

Delete From A Leaf

• Parent pair was from a 2-node.

• Check one sibling and determine if it is a 3-node.

• No sibling, so must be the root.

• Discard root. Left child becomes new root.

[Figure: a root left without a pair, above node (2 8) with leaves 1, (4 5), (9 15).]

Delete From A Leaf

• Height reduces by 1.

[Figure: final 2-3 tree. Root (2 8); leaves 1, (4 5), (9 15).]

2-3-4 Trees

• Problems with 2-3 trees.

2-3 node structure: LC P1 MC P2 RC

• 2-nodes waste space.

• Overhead of moving a pair and pointers when changing between 2-node and 3-node use.

• Extend to 2-3-4 tree, which may be represented as a binary tree.

2-3-4 Tree Definition

• Every internal node is either a 2-, 3-, or 4-node.

• 2- and 3-nodes have same properties as in a 2-3 tree.

• A 4-node has 3 keys and 4 children; 1st key is smaller than 2nd key which is smaller than 3rd key.

  All keys in left subtree are smaller than 1st key.
  All keys in 2nd subtree are bigger than 1st key and smaller than 2nd key.
  All keys in 3rd subtree are bigger than 2nd key and smaller than 3rd key.
  All keys in right subtree are bigger than 3rd key.

• All external nodes are on the same level.

[Figure: a 4-node with keys 10, 30, 35; its four subtrees hold keys k < 10, 10 < k < 30, 30 < k < 35, and k > 35.]

Minimum # Of Pairs

• Happens when all internal nodes are 2-nodes.

• Number of nodes = $2^h - 1$, where h is tree height (excluding external nodes).

• Each node has 1 (key, value) pair.

• So, minimum # of pairs = $2^h - 1$.

Maximum # Of Pairs

• Happens when all internal nodes are 4-nodes.

• Full degree 4 tree.

• # of nodes = $1 + 4 + 4^2 + 4^3 + \dots + 4^{h-1} = (4^h - 1)/3$.

• Each node has 3 pairs.

• So, # of pairs = $4^h - 1$.

2-3-4 Tree Height Bounds

• $2^h - 1 \le n \le 4^h - 1$.

• $\log_4(n+1) \le h \le \log_2(n+1)$.

Node Structure

• 2-node uses LC, P1, and LMC.

• 3-node uses LC, P1, LMC, P2, and RMC.

• 4-node uses all fields.

• Optional parent field.

• Only internal nodes are represented!

LC P1 LMC P2 RMC P3 RC

Two-Pass Insert

• Move down from root to a leaf.

• Insert in leaf.

• If leaf now has 4 pairs, split as below.

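[Figure: overfull leaf (10 20 30 40) with pointers A, B, C, D, E is split into the 2-node 10 (pointers A, B) and the 3-node (30 40) (pointers C, D, E).]

• Insert 20 and pointer to new 3-node into parent, as was done for 2-3 trees.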

Two-Pass Delete

• Transform interior delete to leaf delete.

• Delete from a 3-node or 4-node leaf reduces leaf degree.

• Delete from a 2-node leaf.

  Check one sibling and determine if it is a 3- or 4-node.
  If so, borrow a pair and a subtree via parent node.
  If not, combine with sibling 2-node and in-between pair in parent. Continue up the tree if parent was a 2-node.

One-Pass Operations

• No bottom-to-top pass.

• Can pipeline inserts.

• Can pipeline deletes from leaf nodes.

Top-Down Insert

• Bottom-up pass is triggered when new pair is inserted into a 4-node leaf.

• Split 4-nodes on the way down so you never insert into a 4-node leaf!

• Look before you leap!

  If the node you are about to move to is a 4-node, split it into two 2-nodes.
  Then move to a 2-node.

Cases For 4-node Move

• The 4-node we attempt to move to may be:

  The root.
  Child of a 2-node.
  Child of a 3-node.

• It cannot be the child of a 4-node, because we will never be at a 4-node.

Root Is A 4-node

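• Height of tree increases by 1.

• Compare with y and then move to x or z.

[Figure: the 4-node root (x y z) with subtrees a, b, c, d is split into a new root y with 2-node children x (subtrees a, b) and z (subtrees c, d).]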

4-node Left Child Of 2-node

• No change in height of subtree.

• Compare with x and then move to w or y.

• 4-node right child of 2-node is similar.

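[Figure: 2-node z with 4-node left child (w x y) (subtrees a, b, c, d) and right subtree e becomes 3-node (x z) with children w (subtrees a, b), y (subtrees c, d), and e.]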

4-node Left Child Of 3-node

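• No change in height of subtree.

• Compare with w and then move to v or x.

• 4-node middle or right child of 3-node is similar.

[Figure: 3-node (y z) with 4-node left child (v w x) (subtrees a, b, c, d) and subtrees e, f becomes 4-node (w y z) with children v (subtrees a, b), x (subtrees c, d), e, and f.]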
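
The top-down strategy can be sketched in a few lines: split a 4-node root first, then split every 4-node child before stepping into it, so the leaf reached always has room. The code below is illustrative only; it assumes keys-only nodes with distinct keys, and the names `Node`, `split_child`, and `insert` are not from the slides.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for a leaf

    def is_4node(self):
        return len(self.keys) == 3


def split_child(parent, i):
    """Split the 4-node parent.children[i]; its middle key moves up into parent."""
    child = parent.children[i]
    x, y, z = child.keys
    right = Node([z], child.children[2:])
    child.keys, child.children = [x], child.children[:2]
    parent.keys.insert(i, y)
    parent.children.insert(i + 1, right)


def insert(root, key):
    """Single top-down pass; returns the (possibly new) root."""
    if root.is_4node():                  # root is a 4-node: height increases by 1
        root = Node([], [root])
        split_child(root, 0)
    node = root
    while node.children:
        i = sum(1 for k in node.keys if key > k)
        if node.children[i].is_4node():  # look before you leap
            split_child(node, i)
            if key > node.keys[i]:       # compare with the key that moved up
                i += 1
        node = node.children[i]
    node.keys.append(key)                # the leaf is a 2-node or a 3-node here
    node.keys.sort()
    return root


root = Node()
for k in range(1, 11):
    root = insert(root, k)
```

Because the parent of a node being split is never a 4-node, the key that moves up always fits and no bottom-to-top pass is needed.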

Top-Down Delete

• Bottom-up pass is triggered when deletion is from a 2-node leaf.

• Look before you leap!

  May start at a 2-node root but may not be at any other 2-node.
  If the node you are about to move to is a 2-node, make it a 3-node or 4-node.
  Then move to the 3-node or 4-node.

Cases For 2-node Move

• Moving to a 2-node root is permitted.

• No other move to a 2-node is permitted.

• Other attempts to move to a 2-node may be classified as below.

  The 2-node’s nearest sibling is also a 2-node.
  The 2-node’s nearest sibling is a 3-node.
  The 2-node’s nearest sibling is a 4-node.
  In each of the preceding cases, the 2-node’s parent may be a 2-node root, a 3-node, or a 4-node.

Root Is 2-node Leaf

• Delete root.

• Tree becomes empty.

[Figure: a tree consisting of the single 2-node x.]

Moving To 2-node Whose Nearest Sibling Is 2-node

• Current node is 2-node => at root.

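[Figure: the 2-node root y and its 2-node children x and z (subtrees a, b, c, d) are combined into the single 4-node (x y z).]

• Height decreases by 1.

• Reapply moving rules before you move down.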

Moving To 2-node Whose Nearest Sibling Is 2-node

• Current node is 3-node.

• Moving to w.

• No change in height of subtree.

• Moving to middle or right child is similar.

• Current node is 4-node is also similar.

[Figure: 3-node (x z) with 2-node children w and y (subtrees a, b, c, d) and third subtree e becomes 2-node z with 4-node child (w x y) and subtree e.]

Moving To 2-node Whose Nearest Sibling Is 3-node

• Current node is 2-node => at root.

• Moving to w.

• No change in height of tree.

• Moving to right child 2-node is similar.

[Figure: root and children with keys w, x, y, z and subtrees a to e, before and after w borrows a pair through the root.]

Moving To 2-node Whose Nearest Sibling Is 3-node

• No change in height of subtree.

• Moving to middle or right child 2-node is similar.

• Current node is 4-node is also similar.

• Current node is 3-node.

• Moving to v.

[Figure: 3-node parent and children with keys v, w, x, y, z and subtrees a to f, before and after v borrows a pair through the parent.]

Moving To 2-node Whose Nearest Sibling Is 4-node

• Current node is 2-node => at root.

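• Moving to u.

• No change in height of tree.

• Moving to right child 2-node is similar.

[Figure: root and children with keys u, v, w, x, y and subtrees a to f, before and after u borrows a pair from its 4-node sibling through the root.]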

Moving To 2-node Whose Nearest Sibling Is 4-node

• No change in height of subtree.

• Moving to middle or right child 2-node is similar.

• Current node is 4-node is also similar.

• Current node is 3-node.

• Moving to u.

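[Figure: 3-node parent and children with keys u, v, w, x, y, z and subtrees a to g, before and after u borrows a pair from its 4-node sibling through the parent.]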

Binary Tree Representation Of 2-3-4 Trees

• Problems with 2-3-4 trees.

• 2- and 3-nodes waste space.

• Overhead of moving pairs and pointers when changing among 2-, 3-, and 4-node use.

• Represented as a binary tree for improved space and time performance.

2-3-4 node structure: LC P1 LMC P2 RMC P3 RC

Representation Of 4-node

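[Figure: the 4-node (x y z) with subtrees a, b, c, d, and its binary form: y on top with children x (subtrees a, b) and z (subtrees c, d).]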

Representation Of 3-node

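[Figure: the 3-node (x y) with subtrees a, b, c, and its two binary forms: x on top with left subtree a and right child y (subtrees b, c), or y on top with left child x (subtrees a, b) and right subtree c.]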

Representation Of 2-node

[Figure: the 2-node x with subtrees a and b; its binary form is the same single node.]

Example

[Figure: example tree with keys 1, 3, 5, 7, 8, 10, 20, 25, 30, 35, 40, 45, 60.]

Properties Of Binary Tree Representation

• Nodes and edges are colored.

  The root is black.
  Nonroot black node has a black edge from its parent.
  Red node has a red edge from its parent.

• Can deduce edge color from node color and vice versa.

• Need to keep either edge or node colors, not both.
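
The mapping from 2-3-4 nodes to colored binary nodes can be sketched as below. This is an illustration under assumptions: node colors are stored (not edge colors), `None` stands for an external node, and for a 3-node the code always picks the orientation with the second key on top, which is only one of the two valid choices. The names `Node234`, `RBNode`, and `to_binary` are hypothetical.

```python
class Node234:
    def __init__(self, keys, children=None):
        self.keys = keys                 # 1, 2, or 3 keys in ascending order
        self.children = children or []   # empty for a leaf


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right


def to_binary(node):
    """Return the binary-tree form of a 2-3-4 (sub)tree; its top node is black."""
    if node is None:
        return None
    k = node.keys
    # A leaf has only external nodes below it, represented here by None.
    kids = [to_binary(c) for c in node.children] or [None] * (len(k) + 1)
    if len(k) == 1:                      # 2-node: one black node
        return RBNode(k[0], "black", kids[0], kids[1])
    if len(k) == 2:                      # 3-node: black node plus one red child
        red = RBNode(k[0], "red", kids[0], kids[1])
        return RBNode(k[1], "black", red, kids[2])
    # 4-node: middle key on top, first and third keys as red children
    return RBNode(k[1], "black",
                  RBNode(k[0], "red", kids[0], kids[1]),
                  RBNode(k[2], "red", kids[2], kids[3]))


tree = Node234([8], [Node234([1, 3, 5]), Node234([10, 20])])
rb = to_binary(tree)
```

Every 2-3-4 node contributes exactly one black node, which is why all root-to-external-node paths end up with the same number of black nodes.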

Red Black Trees

Colored Nodes Definition

• Binary search tree.

• Each node is colored red or black.

• Root and all external nodes are black.

• No root-to-external-node path has two consecutive red nodes.

• All root-to-external-node paths have the same number of black nodes.

Red Black Trees

Colored Edges Definition

• Binary search tree.

• Child pointers are colored red or black.

• Pointer to an external node is black.

• No root to external node path has two consecutive red pointers.

• Every root to external node path has the same number of black pointers.
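
The colored-nodes definition translates directly into a recursive check. The sketch below is illustrative: `RBNode`, `black_height`, and `is_red_black` are assumed names, colors are the strings "red" and "black", and `None` stands for a (black) external node.

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right


def black_height(node, lo=None, hi=None, parent_red=False):
    """Number of black nodes on each path from node to an external node
    (the external node included). Raises ValueError if a property fails."""
    if node is None:                     # external node, black by definition
        return 1
    if (lo is not None and node.key <= lo) or (hi is not None and node.key >= hi):
        raise ValueError("not a binary search tree")
    red = node.color == "red"
    if red and parent_red:
        raise ValueError("two consecutive red nodes")
    left = black_height(node.left, lo, node.key, red)
    right = black_height(node.right, node.key, hi, red)
    if left != right:
        raise ValueError("paths with different numbers of black nodes")
    return left + (0 if red else 1)


def is_red_black(root):
    if root is not None and root.color != "black":
        return False                     # the root must be black
    try:
        black_height(root)
        return True
    except ValueError:
        return False


good = RBNode(30, "black", RBNode(10, "red"), RBNode(35, "red"))
unbalanced = RBNode(30, "black", RBNode(10, "black"), None)
red_red = RBNode(30, "black", RBNode(10, "red", RBNode(5, "red")), None)
```

Here `good` is the binary form of a single 4-node with keys 10, 30, 35; the other two trees each break one property.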

2-3-4 & Red-Black Equivalence

[Figure: the example tree with keys 1, 3, 5, 7, 8, 10, 20, 25, 30, 35, 40, 45, 60, illustrating the equivalence.]

Red Black Tree

• The height of a red black tree that has n (internal) nodes is between $\log_2(n+1)$ and $2\log_2(n+1)$.

• C++ STL implementation

• java.util.TreeMap => red black tree

Top-Down Insert

• Mimic 2-3-4 top-down algorithm.

• Split 4-nodes on the way down.

Root Is A 4-node

[Figure: splitting a 4-node root (keys x, y, z; subtrees a, b, c, d), shown in 2-3-4 form and in red-black form.]

4-node Left Child Of 2-node

[Figure: splitting a 4-node (w x y) that is the left child of the 2-node z, shown in 2-3-4 form and in red-black form.]

4-node Left Child Of 3-node

[Figure: splitting a 4-node (v w x) that is the left child of the 3-node (y z), shown in 2-3-4 form and in red-black form (first of two slides).]

4-node Left Child Of 3-node

[Figure: splitting a 4-node (v w x) that is the left child of the 3-node (y z), shown in 2-3-4 form and in red-black form (second of two slides).]

4-node Middle Child Of 3-node

[Figure: splitting a 4-node (w x y) that is the middle child of the 3-node (v z), shown in 2-3-4 form and in red-black form (first of two slides).]

4-node Middle Child Of 3-node

[Figure: splitting a 4-node (w x y) that is the middle child of the 3-node (v z), shown in 2-3-4 form and in red-black form (second of two slides).]

4-node Right Child Of 3-node

• One orientation of 3-node requires color flip.

• Other orientation requires RR rotation.

Top-Down Delete

• Mimic 2-3-4 top-down delete.

• Color flip followed by possible rotation.

Red-Black Analysis

• Less memory required than by 2-3-4 representation.

• Less time required by 4-node splits when red-black representation is used.

• O(log n) rotations per insert/delete.

B-Trees

• Extension of 2-3 and 2-3-4 trees to higher degree trees.

• Used to represent very large dictionaries that reside on disk.

AVL Trees

• $n = 2^{30} \approx 10^9$.

• 30 <= height <= 43.

• When the AVL tree resides on a disk, up to 43 disk accesses are made for a search.

• This takes up to (approx) 4 seconds.

• Not acceptable.

Red-Black Trees

• $n = 2^{30} \approx 10^9$.

• 30 <= height <= 60.

• When the red-black tree resides on a disk, up to 60 disk accesses are made for a search.

• This takes up to (approx) 6 seconds.

• Not acceptable.

m-way Search Trees

• Each node has up to m – 1 pairs and m children.

• m = 2 => binary search tree.

4-Way Search Tree

[Figure: a node with keys 10, 30, 35; its four subtrees hold keys k < 10, 10 < k < 30, 30 < k < 35, and k > 35.]

Maximum # Of Pairs

• Happens when all internal nodes are m-nodes.

• Full degree m tree.

• # of nodes = $1 + m + m^2 + m^3 + \dots + m^{h-1} = (m^h - 1)/(m - 1)$.

• Each node has m – 1 pairs.

• So, # of pairs = $m^h - 1$.

Capacity Of m-Way Search Tree

        m = 2    m = 200
h = 3   7        $8 \times 10^6 - 1$
h = 5   31       $3.2 \times 10^{11} - 1$
h = 7   127      $1.28 \times 10^{16} - 1$
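
The table entries follow from counting the nodes level by level and multiplying by the pairs per node. A small Python sketch (the function name `max_pairs` is illustrative):

```python
def max_pairs(m, h):
    """Pairs in a full m-way search tree of height h."""
    nodes = sum(m**level for level in range(h))   # 1 + m + ... + m**(h-1)
    return nodes * (m - 1)                        # each node holds m - 1 pairs


for h in (3, 5, 7):
    print(h, max_pairs(2, h), max_pairs(200, h))
```

Python integers are exact, so the m = 200 column can be compared against the closed form m**h - 1 without rounding.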

Definition Of B-Tree

• Definition assumes external nodes (extended m-way search tree).

• B-tree of order m.

  m-way search tree.
  Not empty => root has at least 2 children.
  Remaining internal nodes (if any) have at least ceil(m/2) children.
  External (or failure) nodes on same level.

2-3 And 2-3-4 Trees

• 2-3 tree is B-tree of order 3.

• 2-3-4 tree is B-tree of order 4.

B-Trees Of Order 5 And 2

• B-tree of order 5 is 3-4-5 tree (root may be 2-node though).

• B-tree of order 2 is full binary tree.

Minimum # Of Pairs

• n = # of pairs.

• # of external nodes = n + 1.

• Height = h => external nodes on level h + 1.

level    # of nodes
1        1
2        >= 2
3        >= $2\lceil m/2 \rceil$
h + 1    >= $2\lceil m/2 \rceil^{h-1}$

$n + 1 \ge 2\lceil m/2 \rceil^{h-1}$, $h \ge 1$

Minimum # Of Pairs

• m = 200.

$n + 1 \ge 2\lceil m/2 \rceil^{h-1}$, $h \ge 1$

height   # of pairs
2        >= 199
3        >= 19,999
4        >= $2 \times 10^6 - 1$
5        >= $2 \times 10^8 - 1$

$h \le \log_{\lceil m/2 \rceil}[(n+1)/2] + 1$
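
The minimum-pairs bound and the resulting height limit can be evaluated exactly with integers. In this sketch the names `min_pairs` and `max_height` are illustrative, and `max_height` assumes n >= 1.

```python
def min_pairs(m, h):
    """Fewest pairs in a B-tree of order m and height h >= 1."""
    d = -(-m // 2)                # ceil(m/2) using integer arithmetic
    return 2 * d**(h - 1) - 1


def max_height(m, n):
    """Largest h >= 1 such that a B-tree of order m with n pairs can have height h."""
    h = 1
    while min_pairs(m, h + 1) <= n:
        h += 1
    return h


for h in (2, 3, 4, 5):
    print(h, min_pairs(200, h))
print(max_height(200, 10**9))
```

With m = 200, a dictionary of about a billion pairs fits in a tree of height at most 5.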

Choice Of m

• Worst-case search time.

  (time to fetch a node + time to search node) * height
  $(a + b m + c \log_2 m) \cdot h$, where a, b and c are constants.

[Figure: plot of search time against m; the m-axis is marked at 50 and 400.]

Bottom-Up Insert

[Figure: 2-3 tree. Root 8; second level 4 and (15 20); leaves (1 3), (5 6), 9, (16 17), (30 40).]

Insertion into a full leaf triggers bottom-up node splitting pass.

Split An Overfull Node

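• $a_i$ is a pointer to a subtree.

• $p_i$ is a dictionary pair.

Overfull node (the first field is the number of pairs in the node):

$m,\ a_0,\ p_1,\ a_1,\ p_2,\ a_2,\ \dots,\ p_m,\ a_m$

It is split into two nodes:

$\lceil m/2 \rceil - 1,\ a_0,\ p_1,\ a_1,\ p_2,\ a_2,\ \dots,\ p_{\lceil m/2 \rceil - 1},\ a_{\lceil m/2 \rceil - 1}$

$m - \lceil m/2 \rceil,\ a_{\lceil m/2 \rceil},\ p_{\lceil m/2 \rceil + 1},\ a_{\lceil m/2 \rceil + 1},\ \dots,\ p_m,\ a_m$

• $p_{\lceil m/2 \rceil}$ plus pointer to new node is inserted in parent.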
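
The index arithmetic of the split is easy to get wrong by one, so a small sketch may help. It assumes the pairs and pointers of the overfull node are held in Python lists, with `pairs[0]` standing for p_1 and `pointers[0]` for a_0; the function name `split_overfull` is illustrative.

```python
def split_overfull(pairs, pointers, m):
    """Split a node holding m pairs p_1..p_m and m + 1 pointers a_0..a_m."""
    assert len(pairs) == m and len(pointers) == m + 1
    d = -(-m // 2)                          # ceil(m/2)
    left = (pairs[:d - 1], pointers[:d])    # p_1..p_{d-1} and a_0..a_{d-1}
    middle = pairs[d - 1]                   # p_d goes to the parent
    right = (pairs[d:], pointers[d:])       # p_{d+1}..p_m and a_d..a_m
    return left, middle, right


# Order 3 (2-3 tree): the overfull node (15 17 20) from the insert example.
print(split_overfull([15, 17, 20], ["a0", "a1", "a2", "a3"], 3))
# Order 4 (2-3-4 tree): the overfull leaf (10 20 30 40) with pointers A..E.
print(split_overfull([10, 20, 30, 40], ["A", "B", "C", "D", "E"], 4))
```

For m = 3 and m = 4 this reproduces the splits shown earlier for 2-3 and 2-3-4 trees.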

Worst-Case Disk Accesses

[Figure: example tree with keys 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 17, 20, 30, 40.]

• Insert 2.

• Insert 18.

• Insert 14.

Worst-Case Disk Accesses

• Assume enough memory to hold all h nodes accessed on way down.

• h read accesses on way down.

• 2s+1 write accesses on way up, s = number of nodes that split.

• Total h+2s+1 disk accesses.

• Max is 3h+1.

Average Disk Accesses

• Start with empty B-tree.

• Insert n pairs.

• Resulting B-tree has p nodes.

• # splits <= p – 2, p > 2.

• # pairs >= 1 + (ceil(m/2) – 1)(p – 1).

• $s_{avg} \le (p - 2)/(1 + (\lceil m/2 \rceil - 1)(p - 1))$.

• So, $s_{avg} < 1/(\lceil m/2 \rceil - 1)$.

• m = 200 => $s_{avg} < 1/99$.

• Average disk accesses < h + 2/99 + 1 ~ h + 1.

• Nearly minimum.
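
The step from the ratio to the simpler bound can be spelled out as follows. This is a reconstruction of the omitted algebra, writing $c = \lceil m/2 \rceil$ and assuming $p > 2$ and $m \ge 3$ (so that $c \ge 2$):

```latex
\begin{align*}
s_{avg} &= \frac{\text{\# splits}}{\text{\# pairs}}
         \le \frac{p - 2}{1 + (c - 1)(p - 1)} \\
        &< \frac{p - 1}{(c - 1)(p - 1)}
         = \frac{1}{c - 1}.
\end{align*}
% The strict inequality holds because the numerator grows (p - 2 < p - 1)
% while the denominator shrinks (1 + (c-1)(p-1) > (c-1)(p-1)).
% With m = 200, c = 100 and s_avg < 1/99, so the average cost of an insert,
% h + 2 s_avg + 1 disk accesses, is below h + 2/99 + 1.
```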