balanced bst
DESCRIPTION
Balanced BST. Balanced BSTs guarantee O(logN) performance at all times the height or left and right sub-trees are about the same simple BST are O(N) in the worst case Categories of BSTs AVL , SPLAY trees: dynamic environment optimal trees : static environment. AVL Trees. - PowerPoint PPT PresentationTRANSCRIPT
E.G.M. Petrakis Trees in Main Memory 1
Balanced BST
Balanced BSTs guarantee O(logN) performance at all times the height or left and right sub-trees
are about the same simple BST are O(N) in the worst case
Categories of BSTs AVL, SPLAY trees: dynamic
environment optimal trees: static environment
E.G.M. Petrakis Trees in Main Memory 2
AVL Trees
AVL (Adelson, Lelskii, Landis): the height of the left and right subtrees differ by at most 1 the same for every subtree
Number of comparisons for membership operations best case: completely balanced worst case: 1.44 log(N+2) expected case: logN + .25 <= 1)log(N
E.G.M. Petrakis Trees in Main Memory 3
AVL Trees
“—” : completely balanced sub-tree “/” : the left sub-tree is 1 higher “\” : the right sub-tree is 1 higher
E.G.M. Petrakis Trees in Main Memory 4
AVL Trees
E.G.M. Petrakis Trees in Main Memory 5
“//” : the left sub-tree is 2 higher “\\” : the right sub-tree is 2 higher
critical node
Non AVL Trees
E.G.M. Petrakis Trees in Main Memory 6
Single Right Rotations
Insertions or deletions may result in non AVL trees => apply rotations to balance the tree
T3
T1 T2
Α
Β
T1
T3T2
Α
Β
E.G.M. Petrakis Trees in Main Memory 7
T1
Α
Β
Α
T3T2
ΑB
ΑΑT3
T1 T2
Single Left Rotations
E.G.M. Petrakis Trees in Main Memory 8
4
2
4
2
1
single right rotation
2
1 insert 1
4
8
insert 9
4
8
single left rotation
8
4
4
9
9
E.G.M. Petrakis Trees in Main Memory 9
Double Left Rotation
Composition of two single rotations (one right and one left rotation)
ΑΒ
T4
Α
A
C
T1
ΑB
T2 T3
oror
T4
Α
Β
C
Α
T1
oror
Τ2Τ2
T3
Α
A
T3
oror
ΑC
Τ1Τ1T4 T2
E.G.M. Petrakis Trees in Main Memory 10
Example of Double Left Rotation
4
2 8
7 9
6
\\
/
/
== ==
== ==
4
2 7
6 8
9
7
4 8
2 6 9
Critical node
insert 6
E.G.M. Petrakis Trees in Main Memory 11
Double Right Rotation
Composition of two single rotations (one left and one right rotation)
ΑB
ΑΑ
T4
T1
ΑΒ
T2 T3
oror
ΑC
ΑB
T4
T2
oror
A
T1
T3
Α
C
Β
T3
oror
ΑA
ΤΤ44T2T1
E.G.M. Petrakis Trees in Main Memory 12
Insertion (deletion) in AVL
1. Follow the search path to verify that the key is not already there
2. Insert (delete) the key 3. Retreat along the search path
and check the balance factor 4. Rebalance if necessary (see next)
E.G.M. Petrakis Trees in Main Memory 13
Rebalancing
For every node reached coming up from its left sub-tree after insertion readjust balance factor ‘=’ becomes ‘/’ => no operation ‘\’ becomes ‘=’ => no operation ‘/’ becomes ‘//’ => must be rebalanced!!
The “//” node becomes a “critical node” Only the path from the critical node to
the leaf has to be rebalanced!! Rotation is applied only at the critical
node!
E.G.M. Petrakis Trees in Main Memory 14
Rebalancing (cont.)
The balance factor of the critical node determines what rotation is to take place single or double
If the child and the grand child (inserted node) of the critical node are on the same direction (both “/”) => single rotation
Else => double rotation Rebalance similarly if coming up from the
right sub-tree (opposite signs)
E.G.M. Petrakis Trees in Main Memory 15
Performance
Performance of membership operations on AVL trees: easy for the worst case!
An AVL tree will never be more than 45% higher that its perfectly balanced counterpart (Adelson, Velskii, Landis):
log(N+1) <= hb(N) <=
l.4404log(N+2) – 0.302
E.G.M. Petrakis Trees in Main Memory 16
Worst case AVL
Sparse tree => each sub-tree has minimum number of nodes
E.G.M. Petrakis Trees in Main Memory 17
Fibonacci Trees
Th: tree of height h
Th has two sub-trees, one with height h-1 and one with height h-2 else it wouldn’t have minimum number of
nodes
T0 is the empty sub-tree (also Fibonacci)
T1 is the sub-tree with 1 node (also Fibonacci)
E.G.M. Petrakis Trees in Main Memory 18
Fibonacci Trees (cont.)
Average height Nh number of nodes of Th
Nh = Nh-1 + Nh-2 + 1
N0 = 1
N1 = 2
From which h <= 1.44 log(N+1)
12
51
5
12
51
5
1N
2h2h
h
E.G.M. Petrakis Trees in Main Memory 19
single rotation
77
88
77
88
99
insert 9
88
9977
More Examples
E.G.M. Petrakis Trees in Main Memory 20
double rotation
66
88
66
88
77insert 7
77
8866
Examples (cont.)
E.G.M. Petrakis Trees in Main Memory 21
double rotation
88
66
88
66
7insert 7
77
8866
Examples (cont.)
E.G.M. Petrakis Trees in Main Memory 22
delete 6
66
77
99
88
77
99
88
88
9977
single rotation
Examples (cont.)
E.G.M. Petrakis Trees in Main Memory 23
55
66
99
88
66
99
88
88
9966
delete 5
77 77 77
single rotation
Examples (cont.)
E.G.M. Petrakis Trees in Main Memory 24
55
66
88
66
88
77
8866
delete 5
77 77
double rotation
Examples (cont.)
E.G.M. Petrakis Trees in Main Memory 25
5
3
4
8
7 10
96 1111
2
delete 4
5
2
3
8
7 10
96 1111
1
1
delete 8
General Deletions
E.G.M. Petrakis Trees in Main Memory 26
delete 8 delete 5
5
2
3
7
6 10
9 1111
1
delete 6
5
2
3
10
7 11
9
1
General Deletions (cont.)
E.G.M. Petrakis Trees in Main Memory 27
delete 5
3
2 10
7 11
9
1
General Deletions (cont.)
E.G.M. Petrakis Trees in Main Memory 28
Self Organizing Search
Splay Trees: adapt to query patterns move to root (front): whenever a node is
accessed move it to the root using rotations
equivalent to move-to-front in arrays current node
insertions: inserted node deletions: father of deleted node or null if
this is the root membership: the last accessed node
E.G.M. Petrakis Trees in Main Memory 29
20
15 30
13
1014
8
Search(10)
first rotation
20
15 30
10
8 13
14
second rotation
20
3010
158
13
14
third rotation
10
8 20
3015
13
14
E.G.M. Petrakis Trees in Main Memory 30
Splay Cases
If the current node q has no grandfather but it has father p => only one single rotation
– two symmetric cases: L, R
qa
b c
q
a bc
p
p
E.G.M. Petrakis Trees in Main Memory 31
gp qp
q
a
b
c d
gp
p
a b
cd
LL
gp
pq
a
b cd
pq
a b c d
gpRL
LL symmetric of RRRL symmetric of RL
If p has also grandfather qp => 4 cases
E.G.M. Petrakis Trees in Main Memory 32
1
a q
8 j
i2
7
h6
g3
c 4
d5
e f
current node
1
α q
8 j
i2
7
h6
g
c
4
d
5
ef
3
RR LL
E.G.M. Petrakis Trees in Main Memory 33
1
a q
8 j
i2
7
6
g3
c
4
5
e
f
b
LR
1
α q
8
j
i
2
7
6
h
3
c
4
5
ef
dg
RL
a, b, c … are sub-trees
E.G.M. Petrakis Trees in Main Memory 34
1
a
q
8 j
i
2
7
6
h
3
c
4
5
e
f
b
g
d
E.G.M. Petrakis Trees in Main Memory 35
Splay Performance
Splay trees adapt to unknown or changing probability distributions
Splay trees do not guarantee logarithmic cost for each access AVL trees do! asymptotic cost close to the cost of the
optimal BST for unknown probability distributions
It can be shown that the “cost of m operations on an initially empty splay tree, where n are insertions is O(mlogn) in the worst case”
E.G.M. Petrakis Trees in Main Memory 36
Optimal BST
Static environment: no insertions or deletions
Keys are accessed with various frequencies
Have the most frequently accessed keys near the root
Application: a symbol table in main memory
E.G.M. Petrakis Trees in Main Memory 37
Searching
Given symbols a1 < a2 < ….< an and theirprobabilities: p1, p2, … pn minimize cost
Successful search Transform unsuccessful to
successful consider new symbols E1, E2, … En
- … α1 … α2 …αi … αi+1 ….…αn… αn+1
E0 E1 E2 Ei En
Ei= (αi , αi+1 ) E0= (- , α1 ) En= (αn , )
n
iii alevelpt
1
)(cos
E.G.M. Petrakis Trees in Main Memory 38
an
an-1
an-2EEii
failure nodefailure node
an-3
unsuccessful search for all values in Ei
terminates on the samefailure node (in fact, onenode higher)
Unsuccessful Search
E.G.M. Petrakis Trees in Main Memory 39
(a1, a2, a3) = (do, if, read)p i = q i = 1/7
readread
readread
readread ifif
ifif
ifif
dodo
dodo
dodo
cost = 15/7cost = 15/7
cost = 13/7cost = 13/7
cost = 15/7cost = 15/7
Optimal BST
if
Example
E.G.M. Petrakis Trees in Main Memory 40
read
if
do
if
readdo
cost = 15/7cost = 15/7 cost = 15/7cost = 15/7
E.G.M. Petrakis Trees in Main Memory 41
Search Cost
If pi is the probability to search for ai and qi is the probability to search in Ei then
n
1i
n
1iii 1qp
1}(Ei){levelqlevel(ai)pcostn
1i
n
1iii
successfulsearch
unsuccessfulsearch
E.G.M. Petrakis Trees in Main Memory 42
In a BST, a subtree has nodes that are consecutive in a sorted sequence of keys (e.g. [5,26])
10 25
13 24 26
12 14
20
25
13 24 26
14
5
Observation 1
E.G.M. Petrakis Trees in Main Memory 43
If Tij is a sub-tree of an optimal BST holding keys from i to j then Tij must be optimal among all
possible BSTs that store the same keys
optimality lemma: all sub-trees of an optimal BST are also optimal
Observation 2
E.G.M. Petrakis Trees in Main Memory 44
Optimal BST Construction
1) Construct all possible trees: NP-hard, there are trees!!
2) Dynamic programming solves the problem in polynomial time O(n3)
at each step, the algorithm finds and stores the optimal tree in each range of key values
increases the size of the range at each step until the whole range is obtained
n
2n
1n1
E.G.M. Petrakis Trees in Main Memory 45
Example (successful search only):
1) BSTs with 1 node
range 1 10 20 40cost= 0.3 0.2 0.1 0.4
2) BSTs with 2 nodes k=1-10 k=10-20 k=20-20
range 1-10 range 20-40range 10-20
1
10
10
20
20
40cost=0.3 1+0.2 2=0.7 cost=0.2+0.1 2=0.4
101
4020
2010
cost=0.2 1+0,3.2=0.8 cost=0.1+0.2 2=0.5
optimal
keys 1 10 20 40probabilities 0,3 0,2 0,1 0,4
cost=0.1+0.8=0.9
cost=0.4+0.2=0.6
optimal
optimal
E.G.M. Petrakis Trees in Main Memory 46
k=1-20range 1-20
1
20
10cost=0.1+2 0.3+3 0.2=1.3
1 20
10
cost=0.2+2(0.3+0.1)=1
1
10
20
k=10-40range 10-40
1040
20
cost=0.2+2 0.4+3 0.1=1.3
cost=0.3+2 0.2+3 0.1=1
20
10 40cost=0.1+2(0.2+0.4)=1.3
40
10
20
cost=0.4+2 0.2+30.1=1.1
optimal
3) BSTs with 3 nodes
E.G.M. Petrakis Trees in Main Memory 47
4) BSTs with 4 nodesrange 1-40
1
4010
20
cost=0.3+2 0.4+3 0.2+4 0.1=2.1
10
1 40
20
cost=0.2+2(0.3+0.4)+3 0.1=1.9
20
20
40
40
1
1
10
10
OPTIMAL BST
cost=0.1+2(0.3+0.4)+3 0.2=2.1
cost=0.4+2 0.3+3 0.2+4 0.1=2
E.G.M. Petrakis Trees in Main Memory 48
Complexity
Compute all optimal BSTs for all Cij, i,j=1,2..n
Let m=j-i: number of keys in range Cij
n-m-1 Cij’s must be computed The one with the minimum cost must be
found, this takes O(m(n-m-1)) time For all Cij’s it takes There is a better O(n2) algorithm by Knuth There is also a O(n) greedy algorithm
)O(n)m(nm 3
nm1
2
E.G.M. Petrakis Trees in Main Memory 49
Optimal BSTs
High probability keys should be near the root
But, the value of a key is also a factor
It may not be desirable to put the smallest or largest key close to the root => this may result in skinny trees (e.g., lists)