4. Trees
4.1 Preliminaries
4.2 Binary trees
4.3 Binary search trees
4.4 AVL trees
4.5 Splay trees
4.6 B-trees
Malek Mouhoub, CS340 Fall 2002 1
4.1 Preliminaries
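[Figure: a general tree with root A; nodes B, C, D, E, F, G at depth 1, H to N at depth 2, and P, Q at depth 3. The root, the leaves and the height (3) are labeled.]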
Terms
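• Child : a node directly below another node (its parent).
• Parent : the node directly above a given node; every node except the root has exactly one parent.
• Sibling : nodes that share a common parent.
• Leaf : a node with no children.
• Node depth : the number of links on the path from the root to the node.
• Tree height : the depth of the deepest node.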
Recursive view
A tree is either :
• Empty
• Contains a root and N subtrees (N ≥ 0).
Implementation of Trees
• The tree is a collection of nodes :
struct TreeNode
{
    Object element;
    TreeNode *firstChild;    // leftmost child
    TreeNode *nextSibling;   // next node with the same parent
};
• The tree stores a reference to the root node, which is the starting point.
Implementation of Trees
[Figure: the same tree (root A, height 3) stored with firstChild and nextSibling links.]
Tree Traversals
// List a directory in a hierarchical file system (preorder traversal)
void FileSystem::listAll( int depth = 0 ) const
{
    printName( depth );   // print the name of this entry, indented by depth
    if( isDirectory( ) )
        for each file c in this directory (for each child)
            c.listAll( depth + 1 );
}

// Calculate the size of a directory (postorder traversal)
int FileSystem::size( ) const
{
    int totalSize = sizeOfThisFile( );
    if( isDirectory( ) )
        for each file c in this directory (for each child)
            totalSize += c.size( );
    return totalSize;
}
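A minimal runnable sketch of both routines over the first-child/next-sibling representation. The Entry struct, its ownSize field and the addChild helper are hypothetical stand-ins for the slide's FileSystem class; listAll collects indented lines into a vector instead of printing, so the result can be checked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for FileSystem: each entry knows its own size and
// is linked into a first-child/next-sibling tree.
struct Entry
{
    std::string name;
    int ownSize;                        // size of this entry alone
    Entry *firstChild = nullptr;
    Entry *nextSibling = nullptr;

    Entry( const std::string & n, int s ) : name( n ), ownSize( s ) { }
};

// Appends child after the last existing child of parent.
void addChild( Entry *parent, Entry *child )
{
    if( parent->firstChild == nullptr )
    {
        parent->firstChild = child;
        return;
    }
    Entry *p = parent->firstChild;
    while( p->nextSibling != nullptr )
        p = p->nextSibling;
    p->nextSibling = child;
}

// Preorder (like listAll): the entry is recorded before its children,
// indented by two spaces per level of depth.
void listAll( const Entry *t, int depth, std::vector<std::string> & out )
{
    out.push_back( std::string( 2 * depth, ' ' ) + t->name );
    for( const Entry *c = t->firstChild; c != nullptr; c = c->nextSibling )
        listAll( c, depth + 1, out );
}

// Postorder (like size): the children's totals are computed before the
// entry's own total is known.
int size( const Entry *t )
{
    int total = t->ownSize;
    for( const Entry *c = t->firstChild; c != nullptr; c = c->nextSibling )
        total += size( c );
    return total;
}
```

Walking the sibling list replaces the pseudocode's "for each file c in this directory" loop.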
4.2 Binary Trees
Recursive view
A binary tree is either :
• Empty
• Contains a root and N binary subtrees (0 ≤ N ≤ 2).
Implementation
struct BinaryNode
{
    Object element;       // the data in the node
    BinaryNode *left;     // left child
    BinaryNode *right;    // right child
};
Expression trees
• Inorder traversal → infix notation.
• Postorder traversal → postfix notation.
• Preorder traversal → prefix notation.
[Figure: expression tree for (a + b * c) + ((d * e + f) * g).]
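The three traversals can be sketched as follows, assuming one-character tokens and binary operators only (ExprNode, infix, prefix and postfix are illustrative names). The infix version puts parentheses around every operator node, since a plain inorder walk loses the grouping.

```cpp
#include <cassert>
#include <string>

// Illustrative node: leaves are operands, internal nodes are binary operators.
struct ExprNode
{
    char token;
    ExprNode *left;
    ExprNode *right;
    ExprNode( char t, ExprNode *lt = nullptr, ExprNode *rt = nullptr )
      : token( t ), left( lt ), right( rt ) { }
};

// Inorder (left, root, right) gives infix; each operator is parenthesized.
std::string infix( const ExprNode *t )
{
    if( t->left == nullptr && t->right == nullptr )
        return std::string( 1, t->token );          // operand
    return "(" + infix( t->left ) + " " + t->token + " " + infix( t->right ) + ")";
}

// Preorder (root, left, right) gives prefix.
std::string prefix( const ExprNode *t )
{
    if( t == nullptr )
        return "";
    return std::string( 1, t->token ) + prefix( t->left ) + prefix( t->right );
}

// Postorder (left, right, root) gives postfix.
std::string postfix( const ExprNode *t )
{
    if( t == nullptr )
        return "";
    return postfix( t->left ) + postfix( t->right ) + t->token;
}
```

For the tree of (a + b * c) + ((d * e + f) * g), postfix returns abc*+de*f+g*+ and prefix returns ++a*bc*+*defg.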
Constructing an expression tree
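Converting a postfix expression into an expression tree :
• Same principle as evaluating an expression in postfix notation.
• Use a stack of pointers to trees : an operand is pushed as a one-node tree; an operator pops two trees and pushes a new tree with the operator at the root and the two popped trees as its subtrees.
For an infix expression :
Step 1 : Use the algorithm seen in chapter 3 to convert the expression into postfix notation.
Step 2 : Use the algorithm above to produce the corresponding expression tree.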
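A sketch of the stack-based construction, assuming one-character operands (letters or digits), binary operators, and blanks as separators. buildFromPostfix is a hypothetical name, and the nodes it allocates are never freed here.

```cpp
#include <cassert>
#include <cctype>
#include <stack>
#include <stdexcept>
#include <string>

struct ExprNode
{
    char token;
    ExprNode *left;
    ExprNode *right;
    ExprNode( char t, ExprNode *lt = nullptr, ExprNode *rt = nullptr )
      : token( t ), left( lt ), right( rt ) { }
};

// Letters and digits are operands; any other non-blank character is
// treated as a binary operator.
ExprNode *buildFromPostfix( const std::string & expr )
{
    std::stack<ExprNode *> trees;                   // the stack of pointers
    for( char ch : expr )
    {
        if( ch == ' ' )
            continue;
        if( std::isalnum( static_cast<unsigned char>( ch ) ) )
            trees.push( new ExprNode( ch ) );       // operand: one-node tree
        else
        {
            if( trees.size( ) < 2 )
                throw std::invalid_argument( "operator without two operands" );
            ExprNode *rt = trees.top( ); trees.pop( );   // popped first = right operand
            ExprNode *lt = trees.top( ); trees.pop( );
            trees.push( new ExprNode( ch, lt, rt ) );
        }
    }
    if( trees.size( ) != 1 )
        throw std::invalid_argument( "malformed postfix expression" );
    return trees.top( );
}
```

Given "a b + c d e + * *", the function returns a tree whose root is the last *, with a + b on its left and c * (d + e) on its right.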
4.3 Binary Search Trees : The Binary Search Tree ADT
Goals :
• Binary search supports Find in O(log N) worst-case time, but Insert and Remove are O(N).
• Would like to support all three operations in O(log N) worst-case time.
• Today’s result : can support all three operations in O(log N) average-case time.
• Later : do it in O(log N) worst-case time.
Basic Ideas :
• Use a “tree” to show the logic applied by a binary search.
Binary Search Expanded
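[Figure: binary search on the sorted array 1 4 8 12 15 19 22 drawn as a tree: 12 at the root, 4 and 19 below it, and 1, 8, 15, 22 as leaves.]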
Binary Search Tree Order
• Ordering property : for every node in the tree, all items in its left subtree are smaller than the node's item, and all items in its right subtree are larger (assume no duplicates).
[Figure: a node X whose left subtree holds items < X and whose right subtree holds items > X.]
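Checking the ordering property needs care: comparing each node only with its two children is not enough, because the property constrains whole subtrees. This sketch assumes int keys and no duplicates (Node and isBST are illustrative names) and passes down the bounds inherited from the ancestors.

```cpp
#include <cassert>

struct Node
{
    int key;
    Node *left;
    Node *right;
};

// Every key must lie strictly between the bounds set by its ancestors.
// lo / hi are null when there is no bound on that side.
bool isBST( const Node *t, const int *lo = nullptr, const int *hi = nullptr )
{
    if( t == nullptr )
        return true;                                 // the empty tree is ordered
    if( ( lo != nullptr && t->key <= *lo ) || ( hi != nullptr && t->key >= *hi ) )
        return false;
    // Going left, this key becomes the upper bound; going right, the lower bound.
    return isBST( t->left, lo, &t->key ) && isBST( t->right, &t->key, hi );
}
```

Changing 8 to 13 in the tree rooted at 12 keeps every parent/child pair ordered (13 > 4), yet breaks the property, because 13 sits in the left subtree of 12.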
Thinking Recursively
• Computing the height of a tree is complex without recursion.
• The height of a tree is one more than the maximum of the heights of its subtrees.
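[Figure: node X with a left subtree of height Hl and a right subtree of height Hr; the paths through X to each subtree have lengths Hl + 1 and Hr + 1.]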
Routine to Compute Height
• Handle the base case (empty tree).
• Use the previous observation for the other case.
int height( BinaryNode *T )
{
    if( T == NULL )
        return -1;   // empty tree, so that a leaf has height 0
    else
        return 1 + max( height( T->left ), height( T->right ) );   // std::max from <algorithm>
}
ADT Binary Search Tree
Structure BinarySearchTree is
Data : A set of N nodes (N ≥ 0).
Functions :
for every bstree ∈ BinarySearchTree, item ∈ Comparable
BinarySearchTree Create(item) ::= return a binary search tree containing item
Comparable findMax(bstree) ::= return the maximum of bstree
Comparable findMin(bstree) ::= return the minimum of bstree
Comparable find(bstree,item) ::= return the matching element if found, and ITEM_NOT_FOUND otherwise
Bool isEmpty(bstree) ::= return bstree == empty
BinarySearchTree insert(bstree,item) ::= add item to bstree and return the new tree
BinarySearchTree remove(bstree,item) ::= remove item from bstree and return the new tree
Implementation
template <class Comparable>
class BinarySearchTree;

template <class Comparable>
class BinaryNode
{
    Comparable element;
    BinaryNode *left;
    BinaryNode *right;

    BinaryNode( const Comparable & theElement, BinaryNode *lt, BinaryNode *rt )
      : element( theElement ), left( lt ), right( rt ) { }
    friend class BinarySearchTree<Comparable>;
};
template <class Comparable>
class BinarySearchTree
{
  public:
    explicit BinarySearchTree( const Comparable & notFound );
    BinarySearchTree( const BinarySearchTree & rhs );
    ~BinarySearchTree( );

    const Comparable & findMin( ) const;
    const Comparable & findMax( ) const;
    const Comparable & find( const Comparable & x ) const;
    bool isEmpty( ) const;
    void printTree( ) const;

    void makeEmpty( );
    void insert( const Comparable & x );
    void remove( const Comparable & x );

    const BinarySearchTree & operator=( const BinarySearchTree & rhs );
  private:
    BinaryNode<Comparable> *root;
    const Comparable ITEM_NOT_FOUND;

    const Comparable & elementAt( BinaryNode<Comparable> *t ) const;

    void insert( const Comparable & x, BinaryNode<Comparable> * & t ) const;
    void remove( const Comparable & x, BinaryNode<Comparable> * & t ) const;
    BinaryNode<Comparable> * findMin( BinaryNode<Comparable> *t ) const;
    BinaryNode<Comparable> * findMax( BinaryNode<Comparable> *t ) const;
    BinaryNode<Comparable> * find( const Comparable & x, BinaryNode<Comparable> *t ) const;
    void makeEmpty( BinaryNode<Comparable> * & t ) const;
    void printTree( BinaryNode<Comparable> *t ) const;
    BinaryNode<Comparable> * clone( BinaryNode<Comparable> *t ) const;
};
Searching : the find function
• Set the current node to the root.
• Repeat :
– If the current node is null, then the item is not found.
– If the item is smaller than what is stored in the current node, then branch left.
– If the item is larger than what is stored in the current node, then branch right.
– Otherwise, we have a match.
findMin and findMax
• For findMin, repeatedly branch left until you reach a node with no left child.
• For findMax, repeatedly branch right until you reach a node with no right child.
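A sketch of find, findMin and findMax as iterative loops over a bare int-keyed node. Node and these free functions are assumptions for illustration, not the class interface declared earlier.

```cpp
#include <cassert>

struct Node
{
    int key;
    Node *left;
    Node *right;
};

// Follows the search steps above; returns the matching node or nullptr.
const Node *find( int x, const Node *t )
{
    while( t != nullptr )
    {
        if( x < t->key )
            t = t->left;          // smaller: branch left
        else if( t->key < x )
            t = t->right;         // larger: branch right
        else
            return t;             // match
    }
    return nullptr;               // reached a null link: not found
}

// Precondition: t is not empty. The leftmost node holds the minimum.
const Node *findMin( const Node *t )
{
    assert( t != nullptr );
    while( t->left != nullptr )
        t = t->left;
    return t;
}

// Precondition: t is not empty. The rightmost node holds the maximum.
const Node *findMax( const Node *t )
{
    assert( t != nullptr );
    while( t->right != nullptr )
        t = t->right;
    return t;
}
```

Only the < operator is used on keys, matching the class code, which compares with operator< in both directions.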
Running Time
• The time to perform a search is proportional to the depth of the node that terminates the search.
• For a perfectly balanced tree, this depth is roughly log N.
Insertion
• A new item can be inserted by placing it at the location where an unsuccessful search for it terminates.
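[Figure: inserting the new item 18 into the tree with root 12 (children 4 and 19, leaves 1, 8, 15, 22): the search for 18 ends at the empty right child of 15, where the new node is placed.]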
Coding
• The simplest code uses recursion. Four cases :
Case 1 : If the tree is empty, create and return a one-node tree.
Case 2 : If the item matches the item in the root node, do nothing (no duplicates allowed).
Case 3 : If the item is less than the item in the root node, recursively insert it in the left subtree.
Case 4 : If the item is greater than the item in the root node, recursively insert it in the right subtree.
template <class Comparable>
void BinarySearchTree<Comparable>::
insert( const Comparable & x, BinaryNode<Comparable> * & t ) const
{
    if( t == NULL )
        t = new BinaryNode<Comparable>( x, NULL, NULL );
    else if( x < t->element )
        insert( x, t->left );
    else if( t->element < x )
        insert( x, t->right );
    else
        ;  // Duplicate; do nothing
}
Running Time
• Running time is still proportional to depth.
• Arbitrary insertion no longer guarantees that the depth of the tree is O(log N).
• However, it can be shown that the depth of the tree will be O(log N) on average, where “average” means that all possible insertion sequences are equally likely.
Worst case
• The most common insertion sequences produce the worst tree. Inserting items in sorted order is disastrous : the tree degenerates into a chain, and each operation costs O(N).
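[Figure: inserting A, B, C, D, E in sorted order produces a chain in which every node has only a right child.]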
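The degeneration is easy to observe. This sketch (hypothetical Node, insert, height and makeEmpty on int keys) inserts 1 to 100 in sorted order and measures the height with the convention used earlier (empty tree = −1).

```cpp
#include <algorithm>
#include <cassert>

struct Node
{
    int key;
    Node *left = nullptr;
    Node *right = nullptr;
    explicit Node( int k ) : key( k ) { }
};

// Unbalanced insertion: same four cases as the insert routine above.
void insert( int x, Node * & t )
{
    if( t == nullptr )
        t = new Node( x );
    else if( x < t->key )
        insert( x, t->left );
    else if( t->key < x )
        insert( x, t->right );
    // otherwise: duplicate, do nothing
}

// Height of the empty tree is -1, so a single node has height 0.
int height( const Node *t )
{
    return t == nullptr ? -1 : 1 + std::max( height( t->left ), height( t->right ) );
}

// Frees every node in postorder and leaves t empty.
void makeEmpty( Node * & t )
{
    if( t != nullptr )
    {
        makeEmpty( t->left );
        makeEmpty( t->right );
        delete t;
        t = nullptr;
    }
}
```

Sorted input yields height 99, a chain of 100 nodes, while the order 4 2 6 1 3 5 7 yields a perfectly balanced tree of height 2.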
Bottom Line
• Insertion and access will be logarithmic on average, but this is meaningful only if the insertion sequence is reasonably random.
• If the insertion sequence has some order in it, the binary search tree will degenerate.
• We need methods of maintaining balance in the tree that preserve a logarithmic worst-case bound.
• In the following, we will examine two candidates : the AVL tree and the splay tree.
Deletion
• The deletion algorithm is complicated. Basic problem : deleting a node can disconnect the tree.
• The standard algorithm breaks into three cases :
– The node to be deleted is a leaf.
– The node to be deleted has one child.
– The node to be deleted has two children.
Deletion of a Leaf
• Simplest case, because a leaf does not connect two parts of the tree. Wipe out the leaf node (its parent's pointer to it becomes null).
Deletion of a 1 child node
• Bypass the node : the parent of the deleted node gets a new child (the example below deletes 15).
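[Figure: deleting 15 from the tree with root 12 (children 4 and 19; 4 has children 1 and 8; 19 has children 15 and 22; 15 has the right child 18). Node 15 has one child, so 19's left link is redirected to 18.]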
• Note : a root with one child is a special case, because it does not have a parent.
• In this case, we simply obtain a new root.
Deletion of a node with two children
• Step 1 : replace the node's contents with the smallest key in its right subtree. The example below deletes 12.
[Figure: deleting the root 12, which has two children. The smallest key in its right subtree is 15, so 15 is copied into the root.]
Deletion of a node with two children
• Step 2 : delete the node used as the replacement.
[Figure: the original node 15, which has only the right child 18, is removed by bypassing it, so 18 becomes the left child of 19.]
• The second step in the two-child case is guaranteed to be simple because that node cannot have a left child (why?).
Implementation
• The simplest implementation is recursive.
• Some cases can be combined.
• The deletion algorithm can gradually destroy the balance of the tree because it is biased.
• In the two-child case it always shrinks the right subtree, potentially making the left subtree heavy after a large number of insertion/deletion pairs.
• In practice, this effect is not noticeable.
template <class Comparable>
void BinarySearchTree<Comparable>::
remove( const Comparable & x, BinaryNode<Comparable> * & t ) const
{
    if( t == NULL )
        return;   // Item not found; do nothing
    if( x < t->element )
        remove( x, t->left );
    else if( t->element < x )
        remove( x, t->right );
    else if( t->left != NULL && t->right != NULL ) // Two children
    {
        t->element = findMin( t->right )->element;
        remove( t->element, t->right );
    }
    else
    {
        BinaryNode<Comparable> *oldNode = t;
        t = ( t->left != NULL ) ? t->left : t->right;
        delete oldNode;
    }
}
Printing contents of the tree
• It is easy to print the entire tree in sorted order.
• Case 1 : if the tree is empty, print nothing.
• Case 2 : if the tree is not empty :
– Recursively print the left subtree.
– Print the item at the root.
– Recursively print the right subtree.
• This strategy is an inorder traversal.
• Running time is O(N) : constant work per item.
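A sketch of the inorder print for int keys, writing to any std::ostream so that the output can be captured; Node and this printTree signature are assumptions.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Node
{
    int key;
    Node *left;
    Node *right;
};

// Inorder traversal: every node is visited exactly once, so O(N).
void printTree( const Node *t, std::ostream & out )
{
    if( t == nullptr )
        return;                      // Case 1: empty tree, print nothing
    printTree( t->left, out );       // Case 2: left subtree first,
    out << t->key << ' ';            // then the root,
    printTree( t->right, out );      // then the right subtree
}
```

On the tree with root 12 used in the insertion examples (children 4 and 19, leaves 1, 8, 15, 22) it prints 1 4 8 12 15 19 22.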
4.4 AVL Trees
• The structure of the tree depends on the order in which keys are entered.
• Entering the letters A-G in alphabetical order will produce a degenerate tree that is nothing more than a linked list.
• The solution to this problem is to reorganize the nodes of the tree as new keys are received, maintaining a near-optimal tree structure.
• AVL trees are used for handling such reorganization.
• An AVL tree is a height-balanced 1-tree, or HB(1) tree : the maximum difference allowed between the heights of any two subtrees sharing a common root is one.
• The two features that make AVL trees important are :
– by setting a maximum allowable difference in the height of any two subtrees, AVL trees guarantee a minimum level of performance in searching; and
– maintaining a tree in AVL form as new nodes are inserted involves the use of one of a set of four possible rotations.
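The HB(1) condition can be checked in one postorder pass that returns each subtree's height. This sketch uses char keys to match the A-G example; Node, checkedHeight, isAVL and the −2 "violation" sentinel are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

struct Node
{
    char key;
    Node *left;
    Node *right;
};

// Returns the height of t (empty tree = -1), or -2 if some subtree of t
// violates the HB(1) condition. Each node is visited once: O(N).
int checkedHeight( const Node *t )
{
    if( t == nullptr )
        return -1;
    int hl = checkedHeight( t->left );
    int hr = checkedHeight( t->right );
    if( hl == -2 || hr == -2 || std::abs( hl - hr ) > 1 )
        return -2;
    return 1 + std::max( hl, hr );
}

bool isAVL( const Node *t )
{
    return checkedHeight( t ) != -2;
}
```

The chain produced by entering A, B, C in order fails the check (the root's subtrees have heights −1 and 1), while B with children A and C passes.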
AVL trees
The AVL technique keeps a tree height-balanced as follows (the balance factor of a node is the height of its left subtree minus the height of its right subtree) :
• Each time an insertion is made, do the following :
1. Let the node to be inserted travel down the appropriate branch, keeping track along the way of the deepest node on that branch that has a balance factor of +1 or −1 (this node is called the pivot).
2. At and below the pivot node, recompute all balance factors along the insertion path traced in step 1.
3. Determine whether the absolute value of the pivot node’s balance factor switched from 1 to 2.
4. If there was such a switch, perform a manipulation of tree pointers centered at the pivot node to bring the tree back into height balance. This operation is frequently referred to as an AVL rotation.
AVL rotations
4 cases to consider :
Case 1 : The insertion that unbalanced the tree occurred in the left subtree of the left child
of the pivot node.
Case 2 : The insertion that unbalanced the tree occurred in the right subtree of the right
child of the pivot node.
Case 3 : The insertion causing the imbalance occurred in the right subtree of the left child of
the pivot node.
Subcase 1 : Neither the pivot node nor its left child has a right child.
Subcase 2 : Insertion is in the left subtree of the right child of the left child of the pivot.
Subcase 3 : Insertion occurs in the right subtree of the right child of the left child of the
pivot.
Case 4 : The insertion that causes the imbalance in the tree is made in the left subtree of
the right child of the pivot node. Case 4 is to case 3 as case 2 is to case 1.
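A sketch of AVL insertion. It swaps in a common alternative to the pivot and balance-factor bookkeeping above: each node stores its height, and after a recursive insert the balance is checked on the way back up, so the first unbalanced node met plays the role of the pivot. The four imbalance situations map onto the four cases (single rotation for cases 1 and 2, double rotation for cases 3 and 4). All names are illustrative.

```cpp
#include <algorithm>
#include <cassert>

struct AvlNode
{
    int key;
    AvlNode *left = nullptr;
    AvlNode *right = nullptr;
    int height = 0;                          // a single node has height 0
    explicit AvlNode( int k ) : key( k ) { }
};

int height( const AvlNode *t ) { return t == nullptr ? -1 : t->height; }

void update( AvlNode *t )
{
    t->height = 1 + std::max( height( t->left ), height( t->right ) );
}

// Case 1: single rotation with the left child (k1 becomes the subtree root).
void rotateWithLeftChild( AvlNode * & k2 )
{
    AvlNode *k1 = k2->left;
    k2->left = k1->right;                    // old right subtree of k1 moves under k2
    k1->right = k2;
    update( k2 );
    update( k1 );
    k2 = k1;
}

// Case 2: mirror image of case 1.
void rotateWithRightChild( AvlNode * & k1 )
{
    AvlNode *k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    update( k1 );
    update( k2 );
    k1 = k2;
}

// Case 3: the right subtree of the left child grew; double rotation.
void doubleWithLeftChild( AvlNode * & k3 )
{
    rotateWithRightChild( k3->left );
    rotateWithLeftChild( k3 );
}

// Case 4: mirror image of case 3.
void doubleWithRightChild( AvlNode * & k1 )
{
    rotateWithLeftChild( k1->right );
    rotateWithRightChild( k1 );
}

void insert( int x, AvlNode * & t )
{
    if( t == nullptr )
        t = new AvlNode( x );
    else if( x < t->key )
    {
        insert( x, t->left );
        if( height( t->left ) - height( t->right ) == 2 )
        {
            if( x < t->left->key )
                rotateWithLeftChild( t );    // case 1
            else
                doubleWithLeftChild( t );    // case 3
        }
    }
    else if( t->key < x )
    {
        insert( x, t->right );
        if( height( t->right ) - height( t->left ) == 2 )
        {
            if( t->right->key < x )
                rotateWithRightChild( t );   // case 2
            else
                doubleWithRightChild( t );   // case 4
        }
    }
    // duplicates fall through unchanged
    update( t );
}
```

Inserting 1 to 7 in sorted order, the worst input for an ordinary binary search tree, yields a perfectly balanced tree with root 4 and height 2; inserting 3, 1, 2 triggers case 3 and leaves 2 at the root.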
Case 1
[Figure: Case 1. Before the insertion, pivot S has BF = +1 and its left child M has BF = 0; the right subtree of S has height h and the whole subtree has height h + 2. Inserting a new key into the left subtree of M makes the BFs +2 and +1. A single rotation makes M the subtree root, with S as its right child and the old right subtree of M as the new left subtree of S; both BFs become 0 and the height is back to h + 2.]
Case 2
[Figure: Case 2, the mirror image of Case 1. Pivot S (BF = −1) has right child V (BF = 0); the left subtree of S has height h and the whole subtree has height h + 2. Inserting a new key into the right subtree of V makes the BFs −2 and −1. A single rotation makes V the subtree root, with S as its left child and the old left subtree of V as the new right subtree of S; both BFs become 0 and the height is back to h + 2.]
Case 3 : subcase 1
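[Figure: Case 3, subcase 1. Pivot S (BF = 1) has only a left child M (BF = 0). Inserting the new key Q as the right child of M makes the BFs 2 and −1. After balancing, Q is the subtree root with children M and S, and all three have BF = 0.]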
Case 3 : subcase 2
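[Figure: Case 3, subcase 2. Pivot S (BF = +1) has left child M (BF = 0), whose right child is Q; the right subtree of S and the left subtree of M have height h, the subtrees of Q height h − 1, and the whole subtree height h + 2. The new key goes into the left subtree of Q, so S has BF = +2, M has BF = −1 and Q has BF = +1. Balancing makes Q the subtree root with M as its left child and S as its right child; the old left subtree of Q becomes the right subtree of M and the old right subtree of Q becomes the left subtree of S. Afterwards Q and M have BF = 0, S has BF = −1, and the height is back to h + 2.]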
Case 3 : subcase 3
[Figure: Case 3, subcase 3. Same starting shape as subcase 2, but the new key goes into the right subtree of Q, so S has BF = +2 and Q has BF = −1. The same rebalancing makes Q the subtree root with children M and S (old left subtree of Q under M, old right subtree of Q under S); Q and S end with BF = 0 and the height is back to h + 2.]
4.5 Splay Trees
• Think in terms of a sequence of operations instead of a single one.
• Goal : guarantee that any M consecutive tree operations starting from an empty tree take at most O(M log N) time.
• Idea : after a node is accessed, it is pushed to the root by a series of AVL tree rotations.
The wrong way
• Perform single rotations, bottom up.
• In the case of inserting the keys 1, 2, ..., N into an empty tree :
– O(N) time is required to build the tree.
– O(N²) time is required to access all the keys in order.
– After the keys are accessed, the tree reverts to its original state.
Splaying
• If the parent of the node to be pushed up is the root, perform a single rotation.
• Otherwise, perform a double rotation :
– 2 cases to consider (plus their mirror images).
Case 1
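[Figure: Case 1 (zig-zag). X is the right child of P, which is the left child of G, with subtrees A, B, C, D. After the double rotation, X is the subtree root with P as its left child and G as its right child, and A, B, C, D keep their left-to-right order.]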
Case 2
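[Figure: Case 2 (zig-zig). X is the left child of P, which is the left child of G. After the rotations, X is the subtree root, P is its right child and G is the right child of P; subtrees A, B, C, D keep their left-to-right order.]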
Advantage of the method
• When inserting items 1, 2, ..., N into an initially empty tree :
– O(N) time is required to insert all items (same as the 1st method).
– After accessing node 1 and pushing it to the root, only about N/2 units are required to access node 2 (instead of N − 2 for the 1st method).
– After accessing and pushing the other nodes, the depth becomes O(log N).
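A sketch of bottom-up splaying with parent pointers (SNode, rotateUp, splay, insertNoSplay and depth are hypothetical names). The loop applies a single rotation when the parent is the root, and otherwise one of the two double rotations: zig-zig when x and its parent are both left or both right children, zig-zag otherwise.

```cpp
#include <cassert>

struct SNode
{
    int key;
    SNode *left = nullptr;
    SNode *right = nullptr;
    SNode *parent = nullptr;
    explicit SNode( int k ) : key( k ) { }
};

// Single rotation: moves x above its parent, fixing parent pointers and
// updating root when x's parent was the root.
void rotateUp( SNode *x, SNode * & root )
{
    SNode *p = x->parent;
    SNode *g = p->parent;
    if( x == p->left )
    {
        p->left = x->right;
        if( x->right != nullptr )
            x->right->parent = p;
        x->right = p;
    }
    else
    {
        p->right = x->left;
        if( x->left != nullptr )
            x->left->parent = p;
        x->left = p;
    }
    p->parent = x;
    x->parent = g;
    if( g == nullptr )
        root = x;
    else if( g->left == p )
        g->left = x;
    else
        g->right = x;
}

// Bottom-up splay: pushes x to the root.
void splay( SNode *x, SNode * & root )
{
    while( x->parent != nullptr )
    {
        SNode *p = x->parent;
        SNode *g = p->parent;
        if( g == nullptr )
            rotateUp( x, root );                 // parent is the root
        else if( ( x == p->left ) == ( p == g->left ) )
        {                                        // zig-zig: same side
            rotateUp( p, root );                 // p above g first,
            rotateUp( x, root );                 // then x above p
        }
        else
        {                                        // zig-zag: x rises twice
            rotateUp( x, root );
            rotateUp( x, root );
        }
    }
}

// Plain BST insertion without splaying, used only to build a test tree.
SNode *insertNoSplay( int x, SNode * & root )
{
    SNode *parent = nullptr;
    SNode **link = &root;
    while( *link != nullptr )
    {
        parent = *link;
        link = ( x < parent->key ) ? &parent->left : &parent->right;
    }
    *link = new SNode( x );
    ( *link )->parent = parent;
    return *link;
}

// Number of links from x up to the root.
int depth( const SNode *x )
{
    int d = 0;
    for( ; x->parent != nullptr; x = x->parent )
        ++d;
    return d;
}
```

With N = 7, the keys 7, 6, ..., 1 inserted without splaying form a left chain; after splaying node 1, node 2 moves from depth 5 to depth 3, in line with the “about N/2” remark.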
4.6 B-Trees
Problem : Disk utilization of a binary search tree is extremely inefficient. A whole page (at least 512 bytes) is read just to get a key value and the addresses of the left and right subtrees.
First Solution : Divide a binary tree into pages and then store each page in a block of contiguous locations on disk. The number of seeks associated with any search will then be reduced.
• Drawback : this layout is hard to maintain when new items are inserted.
• Solution : use B-trees.
Paged Binary Trees
B-Trees
A B-tree of order M is an M-ary tree with the following properties :
1. The data items are stored at leaves.
2. The nonleaf nodes store up to M − 1 keys to guide the searching; key i represents the smallest key in subtree i + 1.
3. The root is either a leaf or has between two and M children.
4. All nonleaf nodes (except the root) have between ⌈M/2⌉ and M children.
5. All leaves are at the same depth and have between ⌈L/2⌉ and L data items, for some L.
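Property 2 tells a search which child to follow. A sketch, with childIndex as a hypothetical helper over the sorted keys of one internal node: translated to 0-based indices, keys[i] is the smallest key in child i + 1, so the child to follow is the number of keys that are ≤ x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the 0-based index of the child to descend into when searching for x.
std::size_t childIndex( const std::vector<int> & keys, int x )
{
    std::size_t i = 0;
    while( i < keys.size( ) && keys[ i ] <= x )
        ++i;
    return i;
}
```

A linear scan is shown for clarity; since the keys are sorted, std::upper_bound returns the same position in O(log M) comparisons. With the root keys 41 66 87 of the order-5 example, a search for 57 goes to child 1 (the subtree starting at 41).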
B-Tree of order 5
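Root : 41 66 87
Internal nodes : (8 18 26 35) (48 51 54) (72 78 83) (92 97)
Leaves : (2 4 6) (8 10 12 14 16) (18 20 22 24) (26 28 30 31 32) (35 36 37 38 39)
(41 42 44 46) (48 49 50) (51 52 53) (54 56 58 59)
(66 68 69 70) (72 73 74 76) (78 79 81) (83 84 85) (87 89 90) (92 93 95) (97 98 99)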
Calculating M, L and the level of the B-Tree
• Example : driving records for citizens in the state of Florida.
– 10,000,000 records.
– Record (data) : 256 bytes.
– Key (representing a name) : 32 bytes.
– Pointer (branch) : 4 bytes (address of a disk block).
• Node = disk block = 8,192 bytes.
• L = 8,192 / 256 = 32 records per leaf.
• (M − 1) keys + M pointers = 8,192 bytes :
32(M − 1) + 4M = 8,192 ⇒ 36M − 32 = 8,192 ⇒ M = 228 (rounding down).
• At most 625,000 leaves. Why?
• Leaves would be on level 4 in the worst case. Why?
• Worst-case number of accesses : approximately log_{M/2} N. Why?
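The arithmetic above can be reproduced with a short sketch. BTreeShape, computeShape and worstLeafLevel are hypothetical names, and counting the root as level 1 is an assumption. Leaves are taken to be only half full (property 5), which gives the largest leaf count, and internal nodes to branch the minimum number of ways (property 4), which gives the deepest possible leaf level.

```cpp
#include <cassert>

struct BTreeShape
{
    long L;          // records per leaf
    long M;          // children per internal node (order of the tree)
    long maxLeaves;  // leaves needed if every leaf is only half full
};

BTreeShape computeShape( long blockBytes, long recordBytes, long keyBytes,
                         long pointerBytes, long records )
{
    BTreeShape s;
    s.L = blockBytes / recordBytes;                                  // 8192 / 256 = 32
    // Largest M with keyBytes * (M - 1) + pointerBytes * M <= blockBytes.
    s.M = ( blockBytes + keyBytes ) / ( keyBytes + pointerBytes );   // 8224 / 36 -> 228
    long minPerLeaf = ( s.L + 1 ) / 2;                               // ceil(L/2) = 16
    s.maxLeaves = ( records + minPerLeaf - 1 ) / minPerLeaf;         // 625,000
    return s;
}

// Deepest level the leaves can reach (root = level 1). With minimum
// branching (2 at the root, ceil(M/2) elsewhere), level k >= 2 holds at
// least 2 * ceil(M/2)^(k-2) nodes, and that count cannot exceed the number
// of leaves. Assumes leaves >= 2.
long worstLeafLevel( long M, long leaves )
{
    long half = ( M + 1 ) / 2;        // ceil(M/2) = 114
    long level = 2;
    long minNodes = 2;                // fewest possible nodes on level 2
    while( minNodes * half <= leaves )
    {
        minNodes *= half;
        ++level;
    }
    return level;
}
```

For the Florida figures this gives L = 32, M = 228 and 625,000 leaves; level 4 needs at least 2 × 114² = 25,992 nodes, while level 5 would need 2 × 114³ = 2,963,088, more than the 625,000 available.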
Inserting item 57
[Figure: the order-5 B-tree above after inserting item 57. The leaf (54 56 58 59) has room, so it becomes (54 56 57 58 59); no other node changes.]
Inserting 55 : Split into 2 leaves
[Figure: inserting 55 overfills the leaf (54 56 57 58 59), so it is split into the two leaves (54 55 56) and (57 58 59); the parent (48 51 54) gains the key 57 and becomes (48 51 54 57).]
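A sketch of the leaf step of B-tree insertion, assuming L = 5 as the examples imply. insertIntoLeaf is a hypothetical name; adding the new leaf's first item to the parent, and splitting the parent if it overflows, is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts x into a sorted leaf that may hold at most L items. On overflow
// the leaf keeps its first ceil((L+1)/2) items and the rest move to a new
// right-hand leaf, which is returned; otherwise an empty vector is returned.
std::vector<int> insertIntoLeaf( std::vector<int> & leaf, int x, std::size_t L )
{
    leaf.insert( std::upper_bound( leaf.begin( ), leaf.end( ), x ), x );   // keep sorted
    if( leaf.size( ) <= L )
        return { };                                   // still fits: no split
    std::size_t keep = ( leaf.size( ) + 1 ) / 2;
    std::vector<int> right( leaf.begin( ) + static_cast<std::ptrdiff_t>( keep ), leaf.end( ) );
    leaf.resize( keep );
    return right;
}
```

Inserting 57 into (54 56 58 59) needs no split; inserting 55 afterwards splits the leaf into (54 55 56) and (57 58 59), and 57 is the key the parent must receive.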
Inserting 40 : split into 2 leaves + split of the parent
[Figure: inserting item 40 overfills the leaf (35 36 37 38 39), which is split into the two leaves (35 36 37) and (38 39 40). The parent (8 18 26 35) would then have six children, so it is split in turn into (8 18) and (35 38), and 26 moves up : the root becomes (26 41 66 87).]
Deleting 99
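[Figure: deleting 99 leaves (97 98) with too few items; its sibling (92 93 95) cannot spare one, so the two leaves are combined into (92 93 95 97 98). The parent is then left with only two children, so it adopts the leaf (83 84 85) from its neighbor : the internal nodes become (72 78) and (87 92), and the root becomes (26 41 66 83).]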