Jeff Edmonds
York University, COSC 2011, Lecture 6
Balanced Trees
Dictionary/Map ADT; Binary Search Trees; Insertions and Deletions; AVL Trees; Rebalancing AVL Trees; Union-Find Partition; Heaps & Priority Queues; Communication & Huffman Codes; (Splay Trees)
Dictionary/Map ADT
Problem: Store values/data associated with keys.
Input: key-value pairs (k1,v1), (k2,v2), (k3,v3), (k4,v4).
Examples:
• key = word, value = definition
• key = social insurance number, value = person's data
Dictionary/Map ADT
Implementations:
                  Insert   Search
Unordered Array   O(1)     O(n)
[Figure: array slots 0-7 holding key-value pairs; (k5,v5) is appended at the end]
                  Insert   Search
Unordered Array   O(1)     O(n)
Ordered Array     O(n)     O(log n)
[Figure: sorted entries with keys 2, 4, 7, 9 in an array; inserting key 6 requires shifting]
Ordered Linked List: Insert O(n), Search O(n).
Inserting is O(1) if you have the spot, but O(n) to find the spot.
[Figure: doubly linked list with header/trailer sentinels; the nodes/positions hold the entries]
Binary Search Tree: Insert O(log n), Search O(log n).
[Figure: binary search tree with root 38 and nodes 25, 17, 4, 21, 31, 28, 35, 51, 42, 40, 49, 63, 55, 71]
Heaps: Insert O(log n), Search O(n), but finding the Max is O(1).
Heaps are good for Priority Queues.
Hash Tables (next): Insert O(1) on average, Search O(1) on average, O(n) in the worst case.
Hash Tables are very fast, but keys have no order.
Summary of implementations:

                     Unsorted  Sorted    Balanced   Splay Trees   Heap        Hash Tables
                     List      List      Trees                    (Priority   (Dictionary)
                                                                  Queue)
• Search             O(n)      O(log n)  O(log n)   O(log n)*     O(n)        O(1)
• Insert / Delete    O(1)      O(n)      O(log n)   O(log n)*     O(log n)    O(1)
• Find Max           O(n)      O(1)      O(log n)   O(log n)*     O(1)        O(n)
• Find Next in Order O(n)      O(1)      O(log n)   O(log n)*     O(n)        O(n) (Static)

* Splay Trees: worst case O(n), but O(log n) amortized and often better in practice.
I learned AVL trees from slides by Andy Mirzaian and James Elder and then reworked them.

From binary search to Binary Search Trees
Binary Search Tree
[Figure: binary search tree with root 38]
Invariant: all nodes in the left subtree ≤ any node ≤ all nodes in the right subtree.
Iterative Algorithm
Searching for key 17. Move down the tree.
Loop Invariant: If the key is contained in the original tree, then the key is contained in the sub-tree rooted at the current node.
[Figure: the search path for key 17 in the example tree]

Algorithm TreeSearch(k, v)
    v = T.root()
    loop
        if T.isExternal(v)
            return "not there"
        if k < key(v)
            v = T.left(v)
        else if k = key(v)
            return v
        else { k > key(v) }
            v = T.right(v)
    end loop
Recursive Algorithm
If the key is not at the root, ask a friend to look for it in the appropriate subtree.
[Figure: the search path for key 17 in the example tree]

Algorithm TreeSearch(k, v)
    if T.isExternal(v)
        return "not there"
    if k < key(v)
        return TreeSearch(k, T.left(v))
    else if k = key(v)
        return v
    else { k > key(v) }
        return TreeSearch(k, T.right(v))
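The recursive search above can be sketched directly in Python (a minimal sketch; the slides use pseudocode, so the `Node` class and field names here are assumptions, with `None` standing for an external position):

```python
class Node:
    """A binary search tree node; left/right are None at external positions."""
    def __init__(self, key, value=None):
        self.key, self.value = key, value
        self.left = self.right = None

def tree_search(v, k):
    """Recursive TreeSearch: return the node holding k, or None ('not there')."""
    if v is None:                      # reached an external position
        return None
    if k < v.key:
        return tree_search(v.left, k)
    elif k == v.key:
        return v
    else:                              # k > v.key
        return tree_search(v.right, k)
```

The loop invariant of the iterative version holds here too: each recursive call narrows the search to the one subtree that could contain the key.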
[Figure: BST with keys 1-12; v marks the node reached by the search, w the external position below it]
Insertions/Deletions
To insert(key, data):
    We search for the key.
    Not being there, we end up in an empty subtree.
    Insert the key there.
Example: Insert 10.
[Figure: 10 inserted at the external position reached by the failed search]
Insertions/Deletions
To Delete(keydel, data):
    If its node does not have two children,
    point its one child at its parent.
Example: Delete 4.
[Figure: node 4 spliced out; its one child is pointed at its parent]
Insertions/Deletions
To Delete(keydel, data), when its node has two children:
    find the next key in order, keynext:
        go right once, then left, left, left, … to an empty subtree;
    replace keydel with keynext;
    point keynext's one child at its parent.
Example: Delete 3.
[Figure: 3 replaced by its in-order successor keynext]
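The two insertion/deletion rules can be sketched as follows (a minimal sketch with `None` for empty subtrees; names are assumptions, and no rebalancing is done yet):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def insert(v, key):
    """Search for key; on reaching an empty subtree, insert the new leaf there."""
    if v is None:
        return Node(key)
    if key < v.key:
        v.left = insert(v.left, key)
    elif key > v.key:
        v.right = insert(v.right, key)
    return v

def delete(v, key):
    """Delete key: splice out a node with < 2 children; otherwise swap in
    keynext (right once, then left to an empty subtree) and delete that."""
    if v is None:
        return None
    if key < v.key:
        v.left = delete(v.left, key)
    elif key > v.key:
        v.right = delete(v.right, key)
    elif v.left is None:               # at most one child:
        return v.right                 # point it at the parent
    elif v.right is None:
        return v.left
    else:                              # two children: find keynext
        succ = v.right
        while succ.left is not None:
            succ = succ.left
        v.key = succ.key               # replace keydel with keynext
        v.right = delete(v.right, succ.key)
    return v
```

Both functions return the (possibly new) root of the subtree, so the caller relinks it, which matches "point its one child at its parent".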
Performance
find, insert and remove each take O(height) time.
In a balanced tree, the height is O(log n); in the worst case, it is O(n).
Thus it is worthwhile to balance the tree (next topic)!
The AVL tree is the first balanced binary search tree ever invented.
It is named after its two inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their 1962 paper "An algorithm for the organization of information."
AVL Trees
AVL Trees
AVL trees are "mostly" balanced.
height(v) = height of the subtree rooted at v.
balanceFactor(v) = height(rightChild(v)) - height(leftChild(v)).
The tree is said to be an AVL Tree if and only if for every node v, balanceFactor(v) ∈ {-1, 0, +1}, i.e., the heights of siblings differ by at most 1.
[Figure: AVL tree with root 44 and nodes 17, 78, 32, 50, 48, 62, 88; each node is labelled with its balanceFactor. At the root the subtree heights are 2 and 3, so balanceFactor = 2 - 3 = -1 as drawn.]
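The definitions above translate line-for-line into code (a sketch; the minimal `Node` class and the convention height(empty) = 0 are assumptions consistent with the figure):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(v):
    """Height of the subtree rooted at v; an empty subtree has height 0."""
    if v is None:
        return 0
    return 1 + max(height(v.left), height(v.right))

def balance_factor(v):
    """balanceFactor(v) = height(rightChild(v)) - height(leftChild(v))."""
    return height(v.right) - height(v.left)

def is_avl(v):
    """AVL iff every node's balance factor is in {-1, 0, +1}."""
    if v is None:
        return True
    return abs(balance_factor(v)) <= 1 and is_avl(v.left) and is_avl(v.right)
```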
Height of an AVL Tree
Claim: The height of an AVL tree storing n keys is O(log n).
Proof: Let N(h) be the minimum number of nodes of an AVL tree of height h.
Observe that N(0) = 0 and N(1) = 1.
For h ≥ 2, the minimal AVL tree contains the root node, one minimal AVL subtree of height h-1, and another of height h-2. (At least one subtree has height h-1, and the balance factor allows at most 1 = (h-1)-(h-2) difference.)
That is, N(h) = 1 + N(h-1) + N(h-2)
            > N(h-1) + N(h-2)
            ≥ Fibonacci(h) ≈ 1.62^h.
So n ≥ N(h) ≥ 1.62^h, giving h ≤ log(n)/log(1.62) ≈ 4.78 log(n).
Thus the height of an AVL tree is O(log n).
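The recurrence in the proof is easy to check numerically (a small sketch verifying that N(h) really does grow at least like 1.62^h, so h = O(log n)):

```python
def min_avl_nodes(h):
    """N(h): minimum number of nodes in an AVL tree of height h."""
    N = [0, 1]                                # N(0) = 0, N(1) = 1
    for k in range(2, h + 1):
        N.append(1 + N[k - 1] + N[k - 2])     # root + two minimal subtrees
    return N[h]
```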
Rebalancing
A rotation changes heights/balanceFactors: subtree [.., 5] rises by one, subtree [5, 10] keeps its height, and subtree [10, ..] lowers by one. It does not change the binary tree ordering.
[Figure: a rotation, with subtrees T1, T2, T3]
Rebalancing after an Insertion
Inserting new leaf 2 into the AVL tree may create an imbalance: node 7's subtrees now have heights 3 and 1, so |balanceFactor(7)| = 2 and the tree is no longer an AVL tree. Applying rotateR(7) rebalances it into an AVL tree again.
[Figure: tree with nodes 7, 4, 3, 8, 5 and new leaf 2; after rotateR(7), 4 is the root with children 3 (over 2) and 7 (over 5 and 8), and all balance factors are back in {-1, 0, +1}]
Rebalancing after an Insertion
Try another example: inserting new leaf 6 into the AVL tree. Again |balanceFactor| = 2 at node 7, but this time rotateR(7) does not help: in the result, node 4's subtrees have heights 1 and 3, so its balance factor is still ±2. Oops! Not an AVL tree.
[Figure: tree with nodes 7, 4, 3, 8, 5 and new leaf 6 under 5; after rotateR(7), node 4 has children 3 and 7 (over 5-6 and 8) and is still unbalanced]
Rebalancing after an Insertion
After an insertion, the lowest imbalanced node has balanceFactor ∈ {-2, +2}, while all other nodes still have balanceFactor ∈ {-1, 0, +1}. There are 6 cases. Two are easier, and half are symmetrically the same. This leaves two essentially different cases.
Rebalancing after an Insertion
[Figure: the four cases. In each, the imbalanced subtree has height h; the taller child has height h-1 and the other subtree h-3; below the taller child the subtrees have heights h-2 and h-3, where one of the lowest pair is h-3 and one is h-4. The imbalanced node has balanceFactor ±2 and its taller child has balanceFactor ±1.]
Rebalancing after an Insertion
Inserting new leaf 2 into the AVL tree may create an imbalance anywhere on the path from the leaf to the root, since the insertion increases heights along that path.
[Figure: the balanceFactors along the path become 0, +1, +1, +2. Problem!]
Rebalancing after an Insertion
The repair strategy is called trinode restructuring. Denote:
z = the lowest imbalanced node
y = the child of z with the highest subtree
x = the child of y with the highest subtree
[Figure: in the example, z = 7 (balanceFactor ±2), y = 4, x = 3]
The repair strategy called trinode restructuring.
Defn: h = height of z. At least one of z's subtrees, the one rooted at y, has height h-1; the other has height h-3. Every balanceFactor was in {-1, 0, 1} before the insertion and only got one worse with the insertion, so balanceFactor(z) ∈ {-2, 2}; by way of symmetry, assume one sign.
Cases on balanceFactor(y):
• x on the "outside" (y leans the same way as z): we do now.
• x on the "inside" (y leans the opposite way): we will do later.
• balanceFactor(y) = 0: handled the same way as the inside case.
Outside case: y's subtrees have heights h-2 and h-3. Apply the single rotation rotateR(z): y becomes the root of the subtree, with children x and z. The ordering T1 ≤ y ≤ T2 ≤ z ≤ T3 is preserved, this subtree is balanced, and its height returns to h-1.
[Figure: rotateR(z) on z, y, x with subtrees T1, T2, T3 at heights h-2, h-3, h-3]
Is the rest of the tree sufficiently balanced to make it an AVL tree?
• Before the insert it was.
• The insert made this subtree one higher.
• Our restructuring made it back to the original height.
• Hence the whole tree is an AVL Tree.
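The single rotation mirrors the picture exactly: rotateR(z) lifts y, hands T2 across, and preserves T1 ≤ y ≤ T2 ≤ z ≤ T3 (a sketch; the `Node` class and function names are assumptions):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(z):
    """rotateR(z): lift z's left child y; T2 moves from y's right to z's left."""
    y = z.left
    z.left = y.right          # T2 crosses over
    y.right = z
    return y                  # y is the new root of this subtree

def rotate_left(z):
    """Mirror image of rotate_right."""
    y = z.right
    z.right = y.left
    y.left = z
    return y
```

The double rotation of the inside case (coming up) is just two of these in a row: first rotate at y, then at z.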
Rebalancing after an Insertion
Try another example: inserting new leaf 6 into the AVL tree. The insertion increases heights along the path from the leaf to the root, and the balanceFactors along the path become 0, -1, +2, -1. Problem! And the single rotation rotateR(z) does not fix it: the result has a balance factor of 1 - 3 = -2. Oops! Not an AVL tree.
[Figure: tree 7, 4, 3, 8, 5 with new leaf 6 under 5; after rotateR(z) the subtree is still unbalanced]
The repair strategy called trinode restructuring. Denote again:
z = the lowest imbalanced node
y = the child of z with the highest subtree
x = the child of y with the highest subtree
In this second case y leans the opposite way from z, so x is on the "inside"; x's subtrees T2 and T3 have heights h-3, one of them possibly h-4.
Perform a double rotation: rotateL(y), then rotateR(z). This makes x the root of the subtree, with children y and z, preserving the ordering T1 ≤ y ≤ T2 ≤ x ≤ T3 ≤ z ≤ T4.
[Figure: after rotateL(y) and rotateR(z), x has height h-1 and y, z have height h-2, with the subtrees T1-T4 hanging at heights h-3 (one possibly h-4)]
• This subtree is balanced.
• And it is shorter by one, back to its height before the insertion.
• Hence the whole tree is an AVL Tree.
Rebalancing after an Insertion. Example: Insert 12.
Step 1.1: top-down search for 12 ends at an external node w.
Step 1.2: expand w and insert the new item in it.
Step 2.1: move up along the ancestral path of w, updating the ancestor heights, until an unbalanced node is found.
Step 2.2: trinode x, y, z discovered (it needs a double rotation).
Step 2.3: trinode restructured; balance restored. DONE!
[Figures: the example AVL tree, with node heights annotated, at each step; the imbalance appears above the new leaf 12, and the double rotation promotes 11, leaving 9 and 13 as its children]
Rebalancing after a deletion
Very similar to before.
Unfortunately, trinode restructuring may reduce the height of the subtree, causing another imbalance further up the tree.
Thus this search and repair process must in the worst case be repeated until we reach the root.
See text for implementation.
Running Times for AVL Trees
• A single restructure is O(1), using a linked-structure binary tree.
• find is O(log n): the height of the tree is O(log n), and no restructures are needed.
• insert is O(log n): the initial find is O(log n), and restructuring is O(1).
• remove is O(log n): the initial find is O(log n), and restructuring up the tree, maintaining heights, is O(log n).
Other Similar Balanced Trees
• Red-Black Trees: balanced because of rules about red and black nodes.
• (2,4) Trees: balanced by having between 2 and 4 children per node.
• Splay Trees: move recently used nodes to the root.
Union-Find Partition Structures
Andy Mirzaian, Last Update: Dec 4, 2014

Partitions with Union-Find Operations
makeSet(x): Create a singleton set containing the element x and return the position storing x in this set.
union(A, B): Return the set A ∪ B, destroying the old A and B.
find(p): Return the set containing the element at position p.
List-based Implementation
• Each set is stored in a sequence represented with a linked list.
• Each node should store an object containing the element and a reference to the set name.

Analysis of the List-based Representation
• When doing a union, always move elements from the smaller set to the larger set.
• Each time an element is moved, it goes to a set of size at least double its old set.
• Thus, an element can be moved at most O(log n) times.
• The total time needed to do n unions and finds is O(n log n).
Tree-based Implementation
Each set is stored as a rooted tree of its elements:
• Each element points to its parent.
• The root is the "name" of the set.
[Figure: the example sets "1", "2", and "5" drawn as rooted trees, e.g. 1 with children 4 and 7]
Union-Find Operations
• To do a union, simply make the root of one tree point to the root of the other.
• To do a find, follow set-name pointers from the starting node until reaching a node whose set-name pointer refers back to itself.
[Figure: the tree rooted at 5 merged into the tree rooted at 2]
Union-Find Heuristic 1
• Union by size: when performing a union, make the root of the smaller tree point to the root of the larger.
• This implies O(n log n) time for performing n union-find operations:
  - Each time we follow a pointer, we are going to a subtree of size at least double the size of the previous subtree.
  - Thus, we will follow at most O(log n) pointers for any find.
Union-Find Heuristic 2
• Path compression: after performing a find, compress all the pointers on the path just traversed so that they all point to the root.
• This implies O(n log* n) time for performing n union-find operations.
  - [The proof is somewhat involved and is covered in EECS 4101.]
[Figure: the find path is flattened so that every node on it points directly to the root]
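Both heuristics fit in a few lines (a sketch in Python; the course's implementation is in Java, and the class and method names here are assumptions):

```python
class UnionFind:
    """Tree-based union-find with union by size and path compression."""
    def __init__(self):
        self.parent = {}
        self.size = {}

    def make_set(self, x):
        self.parent[x] = x                 # a root points to itself
        self.size[x] = 1

    def find(self, x):
        root = x
        while self.parent[root] != root:   # follow pointers to the root
            root = self.parent[root]
        while self.parent[x] != root:      # path compression: repoint the
            self.parent[x], x = root, self.parent[x]   # traversed path
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # smaller tree under larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
```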
Java Implementation
[Figure: Java code for the tree-based union-find implementation]
Heaps, Heap Sort, & Priority Queues
J. W. J. Williams, 1964

Abstract Data Types
Restricted Data Structure: sometimes we limit what operations can be done, for efficiency and for understanding.
Stack: a list, but elements can only be pushed onto and popped from the top.
Queue: a list, but elements can only be added at the end and removed from the front. Important in handling jobs.
Priority Queue: the "highest priority" element is handled next.
Priority Queues
• Items arrive with a priority.
• The item removed is the one with the highest priority.

                Insert    Remove Max
Sorted List     O(n)      O(1)
Unsorted List   O(1)      O(n)
Heap            O(log n)  O(log n)
Heap Definition
• Completely balanced binary tree.
• The value of each node ≥ each of the node's children.
• The left or right child could be the larger.
The maximum is at the root. Where can 1 go? Where can 8 go? Where can 9 go?

Heap Data Structure
A completely balanced binary tree, implemented by an array.
Make Heap
Get help from friends: recursively make each of the two subtrees a heap, then fix the root.

Heapify
The maximum needs to be at the root. Where is the maximum? Find the maximum of the root and its two children and put it in place; repeat on the affected subtree. The displaced value (the 5 in the example) "bubbles" down until it finds its spot.
[Figure: the 5 bubbling down until the tree is a heap]
Running Time: O(log n), whether written iteratively or recursively.

Make Heap (recursive)
Running time: T(n) = 2T(n/2) + log(n) = Θ(n).
Make Heap (iterative)
Heapify the subtrees bottom-up: by the time a node is heapified, both of its subtrees are already heaps.
[Figure: the array being heapified from the last internal node back to the root]
Running Time: a node at depth i does log(n) - i work, and there are 2^i such nodes, so the total is Σᵢ 2^i · (log(n) - i) = Θ(n).
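The iterative bottom-up make-heap can be sketched on an array (a sketch; with 0-based indexing, assumed here, the children of index i sit at 2i+1 and 2i+2):

```python
def sift_down(a, i, n):
    """Heapify at i: the value bubbles down, swapping with its larger child."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:                       # found its spot
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def make_heap(a):
    """Bottom-up: heapify every subtree from the last internal node up; Theta(n)."""
    for i in range(len(a) // 2 - 1, -1, -1):
        sift_down(a, i, len(a))
```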
Heap Pop/Push/Changes
With Pop, a Priority Queue returns the highest priority data item (here 21). This is at the root.
But this is now the wrong shape! To keep the shape of the tree, the space deleted should be the last leaf. What do we do with the element that was there (here 3)? Move it to the root.
But now it is not a heap! The left and right subtrees still are heaps. The 3 "bubbles" down, the max of the root and its two children moving up, until it finds its spot.
Time = O(log n)
Heap Pop/Push/Changes
When inserting a new item (here 21), to keep the shape of the tree, the new space filled should be the next leaf. But now it is not a heap! The 21 "bubbles" up, the max of the new node and its parent moving up, until it finds its spot (here, below the 30).
Time = O(log n)
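These pop/push operations are exactly what Python's `heapq` module provides; it implements a min-heap on a plain list, so negating the keys gives the max-heap, highest-priority-first behaviour described above:

```python
import heapq

pq = []                                # the heap lives in a plain list
for priority in [3, 21, 8, 30]:
    heapq.heappush(pq, -priority)      # push: new item bubbles up, O(log n)

# pop: root returned; the last leaf is moved to the root and bubbled down,
# O(log n)
top = -heapq.heappop(pq)
```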
Adaptable Heap Pop/Push/Changes
Suppose some outside user knows about some data item c, remembers where it is in the heap, and changes its priority from 21 to 39. But now it is not a heap! The 39 "bubbles" down or up until it finds its spot. Time = O(log n).
Suppose some outside user also knows about data item f, and its location in the heap just changed. The Heap must be able to find this outside user and tell him it moved.

Heap Implementation
• A location-aware heap entry is an object storing: key, value, and the position of the entry in the underlying heap.
• In turn, each heap position stores an entry.
• Back pointers are updated during entry swaps.
[Figure: heap with location-aware entries (4,a), (2,d), (6,b), (8,g), (5,e), (9,c)]
Selection Sort
Loop invariant: the largest i values are sorted off to the side; the remaining values are unsorted.
Selection: repeatedly find the max of the remaining values (e.g. 6, 7, 8, 9 sorted; the max of 3, 4, 1, 5, 2 is selected next). The max is easier to find if the remaining values are kept in a heap.
[Figure: highway-exit analogy (Exit, 79 km, 75 km) for measuring the progress of the loop]
Heap Sort
Loop invariant: the largest i values are sorted off to the side; the remaining values are in a heap.

Heap Data Structure
The heap is a completely balanced binary tree stored in an array, so the heap and the growing sorted side can share one array.
[Figure: the example heap as a tree and as an array]

Each iteration pops the maximum and puts the next value where it belongs, on the boundary between the heap and the sorted side.
Heap Sort
[Figure: the array during heap sort; '?' marks the shrinking heap portion, and the tail is Sorted]
Running Time: n pops at O(log n) each, so O(n log n) total.
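Heap sort as described, sketched in place: build a heap in the array, then repeatedly swap the max onto the sorted side and re-heapify the shrunken heap (n pops at O(log n) each gives O(n log n)):

```python
def heap_sort(a):
    """In-place heap sort: a max-heap at the front, sorted values at the back."""
    def sift_down(i, n):
        while True:
            left, right, largest = 2 * i + 1, 2 * i + 2, i
            if left < n and a[left] > a[largest]:
                largest = left
            if right < n and a[right] > a[largest]:
                largest = right
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(len(a) // 2 - 1, -1, -1):   # make heap, Theta(n)
        sift_down(i, len(a))
    for end in range(len(a) - 1, 0, -1):       # n pops, O(log n) each
        a[0], a[end] = a[end], a[0]            # max joins the sorted side
        sift_down(0, end)                      # new root bubbles down
    return a
```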
Communication & Entropy
Lazare Carnot (1803): In thermodynamics, entropy is a measure of disorder. It is measured as the logarithm of the number of specific ways in which the micro world may be arranged, given the macro world. Tell Uncle Lazare the location and the velocity of each particle: the log of the number of possibilities equals the number of bits to communicate it (e.g. 01101000). Few bits needed = low entropy; lots of bits needed = high entropy.

Communication & Entropy
Claude Shannon (1948): Tell Uncle Shannon which toy you want. "Bla bla bla bla bla bla." No, please use the minimum number of bits to communicate it: 01101000. Great, but we need a code: receiving 011 01 1 …, oops, was that 011, or 01 then 1? The code must be decodable without ambiguity.
Communication & Entropy
Use a Huffman Code, described by a binary tree with edges labelled 0 and 1. I receive 00100: I follow the path and get the toy. For 001000101, I first get one toy, then I start over to get the next.
Objects that are more likely will have shorter codes. "I get it, I am likely to answer that toy, so you give it a 1-bit code."
Pi is the probability of the ith toy. Li is the length of the code for the ith toy.
The expected number of bits sent is Σᵢ pᵢ Lᵢ.
Example probabilities Pi: 0.495, 0.13, 0.11, 0.08, 0.05, 0.031, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01.
We choose the code lengths Lᵢ to minimize this; that minimum is called the Entropy of the distribution on toys.
Communication & Entropy
Ingredients:
• Instances: probabilities of objects ⟨p1, p2, …, pn⟩.
• Solutions: a Huffman code tree.
• Cost of Solution: the expected number of bits sent = Σᵢ pᵢ Lᵢ.
• Goal: given the probabilities, find a code with the minimum number of expected bits.
Communication & Entropy
Greedy Algorithm:
• Put the objects in a priority queue sorted by probabilities.
• Take the two objects with the smallest probabilities. They should have the longest codes.
• Put them in a little tree.
• Join them into one object, with the sum probability.
• Repeat.
[Figures: on probabilities 0.495, 0.13, 0.11, 0.08, 0.05, 0.03, 0.02, 0.02, 0.02, 0.02, 0.015, 0.01 the merges proceed: 0.01+0.015 → 0.025; 0.02+0.02 → 0.04; 0.02+0.02 → 0.04; 0.025+0.03 → 0.055; 0.04+0.04 → 0.08; 0.05+0.055 → 0.105; 0.08+0.08 → 0.16; 0.105+0.11 → 0.215; 0.13+0.16 → 0.29; 0.215+0.29 → 0.505; 0.495+0.505 → 1]
Communication & Entropy
Greedy Algorithm: done when one object (of probability 1) remains.
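The greedy algorithm maps directly onto a priority queue (a sketch using Python's `heapq`; a counter breaks probability ties so the tuples stay comparable, and the function name is an assumption):

```python
import heapq
from itertools import count

def huffman_code_lengths(probs):
    """Return {symbol: code length} for the Huffman tree over probs."""
    tiebreak = count()
    # queue entries: (probability, tiebreak, symbols underneath this subtree)
    heap = [(p, next(tiebreak), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    depth = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)   # two smallest probabilities
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:              # everything below the new join
            depth[s] += 1                    # gets one bit longer
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return depth
```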
Communication & Entropy
Claude Shannon (1948): Huffman's algorithm says how to choose the code lengths Lᵢ to minimize the expected number of bits sent, Σᵢ pᵢ Lᵢ. We want a nice equation for this number. What if we relax the condition that Lᵢ is an integer?
The sum is minimized by setting Lᵢ = log(1/pᵢ). Why? Suppose all toys had probability pᵢ = 0.031. Then there would be 1/pᵢ ≈ 32 toys, and the codes would have length Lᵢ = log(1/pᵢ) = 5.
This gives the expected number of bits H(p) = Σᵢ pᵢ log(1/pᵢ). (Entropy)
(The answer given by Huffman Codes is at most one bit longer.)
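H(p) is a one-line computation; for a distribution whose probabilities are powers of 1/2, the lengths Lᵢ = log(1/pᵢ) are integers and the expected number of bits equals the entropy exactly (a sketch; the example distribution here is an assumption, not the slide's toy distribution):

```python
from math import log2

def entropy(probs):
    """H(p) = sum_i p_i * log2(1 / p_i)."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Powers of 1/2: Huffman code lengths 1, 2, 3, 3 achieve the entropy exactly.
p = [0.5, 0.25, 0.125, 0.125]
expected_bits = sum(pi * li for pi, li in zip(p, [1, 2, 3, 3]))
```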
Communication & Entropy
Let X, Y, and Z be random variables, i.e., they take on random values according to some probability distributions; e.g. X = the toy chosen by the distribution above. Once the values are chosen, the expected number of bits needed to communicate the value of X is
H(X) = Σₓ pr(X=x) log(1/pr(X=x)).

Entropy
The Entropy H(X) is the expected number of bits to communicate the value of X. It can be drawn as the area of a circle.
H(XY) then is the expected number of bits to communicate the value of both X and Y.
If I tell you the value of Y, then H(X|Y) is the expected number of bits to communicate the value of X. Note that if X and Y are independent, then knowing Y does not help, and H(X|Y) = H(X).
I(X;Y) is the number of bits that are revealed about X by my telling you Y (or about Y by telling you X). Note that if X and Y are independent, then knowing Y does not help, and I(X;Y) = 0.
Splay Trees
A self-balancing BST, invented by Daniel Sleator and Robert Tarjan, that allows quick access to recently accessed elements.
Bad: worst case O(n). Good: average (amortized) case O(log n).
Splay trees often perform better than other BSTs in practice.
Splaying
Splaying is an operation performed on a node that iteratively moves the node to the root of the tree.
In splay trees, each BST operation (find, insert, remove) is augmented with a splay operation.
In this way, recently searched and inserted elements are near the top of the tree, for quick access.
3 Types of Splay Steps
Each splay operation on a node consists of a sequence of splay steps. Each splay step moves the node up toward the root by 1 or 2 levels. There are 3 types of step:
• Zig-Zig
• Zig-Zag
• Zig
These steps are iterated until the node is moved to the root.
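The three steps combine into a recursive splay (a minimal sketch; zig-zig and zig-zag each do two rotations, zig does one, and only the search path is touched. The `Node` class and names are assumptions):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    """Move the node with key (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                   # zig-zig (left-left)
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                 # zig-zag (left-right)
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)   # zig
    else:
        if root.right is None:
            return root
        if key > root.right.key:                  # zig-zig (right-right)
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        elif key < root.right.key:                # zig-zag (right-left)
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)   # zig
```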
Zig-Zig
Performed when the node x forms a linear chain with its parent and grandparent, i.e., right-right or left-left.
[Figure: zig-zig on grandparent z, parent y, node x with subtrees T1-T4; x becomes the subtree root, with y below it and z below y]
Zig-Zag
Performed when the node x forms a non-linear chain with its parent and grandparent, i.e., right-left or left-right.
[Figure: zig-zag on z, y, x with subtrees T1-T4; x becomes the subtree root, with children y and z]
Zig
Performed when the node x has no grandparent, i.e., its parent is the root.
[Figure: zig on x and its parent w; x becomes the root]
Splay Trees & Ordered Dictionaries
Which node is splayed after each operation?

method       splay node
find(k)      if the key is found, use that node; if the key is not found, use the parent of the external node where the search terminated
insert(k,v)  use the new node containing the entry inserted
remove(k)    use the parent of the internal node w that was actually removed from the tree (the parent of the node that the removed item was swapped with)
Recall BST Deletion
Now consider the case where the key k to be removed is stored at a node v whose children are both internal:
• we find the internal node w that follows v in an inorder traversal,
• we copy key(w) into node v,
• we remove node w and its left child z (which must be a leaf) by means of operation removeExternal(z).
Example: remove 3. Which node will be splayed?
[Figure: v = 3 with children 1 and 8; w = 5 is the inorder successor; after the removal, v holds 5]
Note on Deletion
The text (Goodrich, p. 463) uses a different convention for BST deletion in their splaying example: instead of deleting the leftmost internal node of the right subtree, they delete the rightmost internal node of the left subtree. We will stick with the convention of deleting the leftmost internal node of the right subtree (the node immediately following the element to be removed in an inorder traversal).
Performance
Worst case is O(n). Example: finding all elements in sorted order will make the tree a left linear chain of height n, with the smallest element at the bottom; a subsequent search for the smallest element will then be O(n).
Average case is O(log n); the proof uses amortized analysis, which we will not cover.
Operations on more frequently-accessed entries are faster. Given a sequence of m operations, the amortized running time to access entry i is O(log(m / f(i))), where f(i) is the number of times entry i is accessed.