datastructuren - data structuresliacs.leidenuniv.nl/~hoogeboomhj/dat/ohp/dat-present-3.pdf · data...
Datastructuren
Datastructuren / Data Structures
Fenia Aivaloglou, Hendrik Jan Hoogeboom
Informatica – LIACS, Universiteit Leiden
autumn 2019
Table of Contents
3 Binary Search Trees
    Introduction
    BST use cases
    Constructing BSTs
    Analysis of trees
    ADT Set and Dictionary
Binary Search Trees
Introduction
binary search tree (BST) [1]
[Figure: a node with key K; its left subtree holds the keys < K, its right subtree the keys > K]
Definition
A binary search tree is a binary tree such that for each node:
all nodes in its left subtree have smaller values, and
all nodes in its right subtree have larger values
[1] BZB (the Dutch abbreviation), see Algoritmiek
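The definition constrains whole subtrees, not just the two children of a node, so a checker has to carry bounds down the tree. The following is a minimal sketch, assuming a hypothetical `Node` with `int` keys and a hypothetical helper name `isBST`; neither appears on the slides.

```cpp
// Hypothetical node type with int keys (an assumption for this sketch).
struct Node {
    int element;
    Node* left;
    Node* right;
};

// Returns true if every key in the subtree rooted at t lies strictly
// between *lo and *hi; a nullptr bound means "unbounded on that side".
bool isBST(const Node* t, const int* lo = nullptr, const int* hi = nullptr) {
    if (t == nullptr) return true;                        // empty tree is a BST
    if (lo != nullptr && !(*lo < t->element)) return false;
    if (hi != nullptr && !(t->element < *hi)) return false;
    // Keys on the left must stay below t->element, keys on the right above it.
    return isBST(t->left, lo, &t->element) &&
           isBST(t->right, &t->element, hi);
}
```

A tree with root 4, left child 3 and a key 5 below that 3 passes a children-only check but is rejected here, because 5 sits in the left subtree of 4.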
comparables
[Figure: example keys of three comparable types:
names: chico, harpo, groucho, gummo, marx, zeppo;
numbers: 4, 5, 11, 18, 25, 30;
dates: 11.6.1509, 28.5.1533, 30.5.1536, 6.1.1540, 28.7.1540, 12.7.1543]
binary search tree BST
worst-case search complexity (unsuccessful search):
linear tree: O(n)
optimal tree: O(log_2 n) (complete tree)
average-case behaviour: see later
BST with 31 most common English words
[Figure: BST containing the 31 most common English words; the top five frequencies are indicated: the 15568, of 9767, and 7638, to 5739, a 5074]
Inserted in BST by decreasing order of frequency
Successful search of BST requires 4.042 comparisons (on avg.)
balanced BST
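[Figure: BST over the same 31 words, in alphabetical (inorder) order: a, and, are, as, at, be, but, by, for, from, had, have, he, her, his, I, in, is, it, not, of, on, or, that, the, this, to, was, which, with, you; frequencies indicated: a 5074, and 7638, of 9767, the 15568, to 5739]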
Perfectly balanced BST
Successful search requires 4.393 comparisons (on avg.)
optimal BST
[Figure: optimal BST over the same 31 words; frequencies indicated: of 9767, the 15568, and 7638, to 5739, a 5074]
Optimal tree taking frequencies into account
Successful search requires 3.437 comparisons (on avg.)
source: Knuth TAoCP Vol.3 (Sorting and Searching)
BST use cases
search value
bool contains( const Comparable & x, Node *t ) const {
    if( t == nullptr )
        return false;                     // empty subtree: unsuccessful search
    else if( x < t->element )
        return contains( x, t->left );    // x can only be in the left subtree
    else if( t->element < x )
        return contains( x, t->right );   // x can only be in the right subtree
    else
        return true; // found
}
call with: contains(v,root);
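Each recursive call in the search above is the last action of the function, so the same search can be written as a loop. The following is a minimal sketch, assuming a hypothetical `Node` with `int` keys and a hypothetical function name `containsIter`; it is a variant for illustration, not the version from the slides.

```cpp
// Hypothetical node type with int keys (an assumption for this sketch).
struct Node {
    int element;
    Node* left;
    Node* right;
};

// Iterative variant of contains: walk down from the root and go left or
// right depending on the comparison with the current key.
bool containsIter(int x, const Node* t) {
    while (t != nullptr) {
        if (x < t->element)      t = t->left;
        else if (t->element < x) t = t->right;
        else                     return true;   // found
    }
    return false;   // fell off the tree: unsuccessful search
}
```

Like the recursive version, it only uses `<` on the keys and visits one node per level.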
find min/max value
BinaryNode * findMin( BinaryNode *t ) const {
    if( t == nullptr )
        return nullptr;
    if( t->left == nullptr )
        return t;
    return findMin( t->left );
}
BinaryNode * findMax( BinaryNode *t ) const {
    if( t != nullptr )
        while( t->right != nullptr )
            t = t->right;
    return t;
}
call with: findMin(root); and findMax(root);
inorder is sorted
[Figure: BST with keys 8, 11, 15, 20, 26, 33, 34, 42, 51, 57, 61; each node is labelled with its position 1..11 in the inorder traversal]
inorder: 8 11 15 20 26 33 34 42 51 57 61
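The claim that inorder traversal visits the keys in sorted order can be sketched as follows, assuming a hypothetical `Node` with `int` keys and a hypothetical `inorder` helper that collects the keys into a vector.

```cpp
#include <vector>

// Hypothetical node type with int keys (an assumption for this sketch).
struct Node {
    int element;
    Node* left;
    Node* right;
};

// Inorder traversal: left subtree, then the node itself, then right subtree.
// For a BST this appends the keys to `out` in increasing order.
void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out.push_back(t->element);
    inorder(t->right, out);
}
```

The traversal visits every node once, so producing the sorted sequence from an existing BST takes O(n) time.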
find k-th element
Augment each node with the size of its subtree
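[Figure: BST in which each node stores its key and the size of its subtree: 5 (1), 10 (3), 14 (1), 20 (6), 26 (1), 30 (2), 35 (11, the root), 39 (1), 45 (4), 51 (2), 56 (1)]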
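Let r = left->size + 1 (taking the size of an empty left subtree as 0); r is the rank of the current node within its own subtree
If k = r: stop! This node has the k-th item
If k < r: search the k-th item in the left subtree
If k > r: search the (k − r)-th item in the right subtree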
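The three cases above can be sketched as a loop. This is a minimal sketch, assuming a hypothetical `Node` that stores an `int` key and a `size` field, with hypothetical helper names `sizeOf` and `kth`.

```cpp
// Hypothetical size-augmented node (an assumption for this sketch).
struct Node {
    int element;
    int size;      // number of nodes in the subtree rooted at this node
    Node* left;
    Node* right;
};

// Size of a possibly empty subtree.
int sizeOf(const Node* t) { return t == nullptr ? 0 : t->size; }

// Returns the node holding the k-th smallest key (k is 1-based),
// or nullptr if k is out of range.
const Node* kth(const Node* t, int k) {
    while (t != nullptr) {
        int r = sizeOf(t->left) + 1;      // rank of t within its own subtree
        if (k == r) return t;             // this node has the k-th item
        if (k < r) {
            t = t->left;                  // k-th item of the left subtree
        } else {
            k -= r;                       // skip the left subtree and t itself
            t = t->right;                 // (k - r)-th item of the right subtree
        }
    }
    return nullptr;
}
```

On the tree from the figure, looking for k = 9 goes from the root 35 (r = 7) to the right with k = 2 and stops at 45 (r = 2). The search follows a single root-to-leaf path, so it costs O(height).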
counting items in [12, 52]
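[Figure: BST with keys 3, 6, 9, ..., 57, 60. Six nodes are marked X and four subtrees are annotated with the counts 1, 2, 4 and 1; together 6 + 1 + 2 + 4 + 1 = 14, the number of keys in [12, 52].]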
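With subtree sizes stored in the nodes, a range count can be computed from two rank queries, each following one root-to-leaf path. The following is a minimal sketch under that assumption; the `Node` layout and the names `build`, `countLess`, `countLessEq` and `countInRange` are hypothetical, and it is one possible formulation rather than necessarily the procedure drawn on the slide.

```cpp
#include <vector>

// Hypothetical size-augmented node (an assumption for this sketch).
struct Node {
    int element;
    int size;      // number of nodes in the subtree rooted at this node
    Node* left;
    Node* right;
};

int sizeOf(const Node* t) { return t == nullptr ? 0 : t->size; }

// Builds a balanced BST with correct sizes from the sorted keys in the
// half-open index range [lo, hi). Nodes are never freed, for brevity.
Node* build(const std::vector<int>& keys, int lo, int hi) {
    if (lo >= hi) return nullptr;
    int mid = lo + (hi - lo) / 2;
    Node* l = build(keys, lo, mid);
    Node* r = build(keys, mid + 1, hi);
    return new Node{keys[mid], hi - lo, l, r};
}

// Number of keys strictly smaller than x.
int countLess(const Node* t, int x) {
    int count = 0;
    while (t != nullptr) {
        if (t->element < x) {             // t and its whole left subtree are < x
            count += sizeOf(t->left) + 1;
            t = t->right;
        } else {
            t = t->left;
        }
    }
    return count;
}

// Number of keys smaller than or equal to x.
int countLessEq(const Node* t, int x) {
    int count = 0;
    while (t != nullptr) {
        if (x < t->element) {
            t = t->left;
        } else {                          // t and its whole left subtree are <= x
            count += sizeOf(t->left) + 1;
            t = t->right;
        }
    }
    return count;
}

// Number of keys k with lo <= k <= hi.
int countInRange(const Node* t, int lo, int hi) {
    return countLessEq(t, hi) - countLess(t, lo);
}
```

For the keys 3, 6, ..., 60 and the range [12, 52] this gives 17 − 3 = 14. Both rank queries cost O(height), independent of how many keys fall in the range.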
Constructing BSTs
insertion (implementation)
template<class T>
void Node<T>::insert(const T& el, Node<T> * & p) {
    if( p == nullptr ) {
        p = new Node{el, nullptr, nullptr};
    } else if (el < p->data) {
        insert(el, p->left);
    } else if (el > p->data) {
        insert(el, p->right);
    } else {
        ; // Duplicate; do nothing
    }
}
call with: insert(el,root);
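The shape of the resulting tree depends on the insertion order: sorted input produces the linear tree, a well-chosen order produces the complete tree. The following is a minimal sketch of that effect, assuming a non-template `Node` with `int` keys and hypothetical helpers `height` and `buildFrom`; nodes are never freed, for brevity.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical non-template node with int keys (an assumption for this sketch).
struct Node {
    int data;
    Node* left;
    Node* right;
};

// Same insertion scheme as above, specialised to int keys.
void insert(int el, Node*& p) {
    if (p == nullptr)       p = new Node{el, nullptr, nullptr};
    else if (el < p->data)  insert(el, p->left);
    else if (p->data < el)  insert(el, p->right);
    // else: duplicate, do nothing
}

// Height in edges on the longest root-to-leaf path; -1 for the empty tree.
int height(const Node* t) {
    if (t == nullptr) return -1;
    return 1 + std::max(height(t->left), height(t->right));
}

// Inserts the keys one by one, in the given order, into an empty tree.
Node* buildFrom(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) insert(k, root);
    return root;
}
```

Inserting 1, 2, ..., 7 in sorted order gives height 6 (a linear tree), while the order 4, 2, 6, 1, 3, 5, 7 gives height 2 (a complete tree).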
deletion “by copying”
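[Figure: two cases of deletion. (1) The node to delete (×), child of f, has a subtree T1 and an empty subtree Λ: f is linked directly to T1. (2) The node to delete (×) has two non-empty subtrees T1 and T2: the value of a node p that has an empty subtree Λ is copied into the node, after which the old node p is deleted as in case (1).]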
deletion (implementation)
void remove( const Comparable & x, Node * & t ) {
    if( t == nullptr ) return;                      // x not found: nothing to do
    if( x < t->element ) remove( x, t->left );
    else if( t->element < x ) remove( x, t->right );
    else if( t->left != nullptr && t->right != nullptr ) {
        // two children: copy the predecessor, then delete it from the left subtree
        Node *pred = findMax( t->left );
        t->element = pred->element;
        remove( t->element, t->left );
    }
    else {
        // at most one child: link that child (or nullptr) in place of t
        Node *oldNode = t;
        if( t->left != nullptr ) t = t->left;
        else t = t->right;
        delete oldNode;
    }
}
call with: remove(el,root);
Analysis of trees
counting trees
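[Figure: a root holding the i-th key, with a left subtree of i−1 nodes (counted by $B_{i-1}$) and a right subtree of n−i nodes (counted by $B_{n-i}$)]
Unlabeled n-node binary trees:
$B_n = \sum_{i=1}^{n} B_{i-1} \cdot B_{n-i}$ with $B_0 = 1$
nth Catalan number: $B_n = \frac{1}{n+1}\binom{2n}{n} = \frac{(2n)!}{(n+1)!\,n!} \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}$
This is also the number of BSTs with given values: there is a unique way to store the values in a given [unlabeled] tree.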
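The recurrence for the number of tree shapes can be evaluated bottom-up. The following is a minimal sketch with a hypothetical function name `countTrees`; it sums over the position i = 1..n of the root key, and the 64-bit counters overflow for n beyond roughly 35.

```cpp
#include <cstdint>
#include <vector>

// B[m] = number of binary tree shapes with m nodes, computed from
// B[m] = sum over i = 1..m of B[i-1] * B[m-i], with B[0] = 1.
std::vector<std::uint64_t> countTrees(int n) {
    std::vector<std::uint64_t> B(n + 1, 0);
    B[0] = 1;                               // the empty tree
    for (int m = 1; m <= n; ++m)
        for (int i = 1; i <= m; ++i)        // i-th smallest key in the root
            B[m] += B[i - 1] * B[m - i];    // left shapes times right shapes
    return B;
}
```

The first values are 1, 1, 2, 5, 14, 42, so there are 14 different BSTs on 4 keys.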
internal path length
[Figure: 6-node binary tree whose nodes are labelled with their depths: 0; 1, 1; 2, 2, 2]
ipl = 0 + 1 + 1 + 2 + 2 + 2 = 8
Path length of a node: # edges from root to node
Definition (Internal path length)
ipl = sum of the path lengths of all nodes
Avg # comparisons in successful search: ipl/n + 1
external path length
[Figure: the same 6-node tree (node depths 0; 1, 1; 2, 2, 2), extended with 7 external leaves at depths 3, 3, 3, 3, 2, 3, 3]
E = 3 + 3 + 3 + 3 + 2 + 3 + 3 = 20
Definition (External path length)
E = sum of all path lengths to the 'extended' leaves
Avg # comparisons in unsuccessful search: E/(n+1) (n + 1 leaves)
Relation to ipl: E = ipl + 2n (proof: induction)
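Both path lengths can be computed with one recursive pass each, which also gives a quick way to check E = ipl + 2n on an example. The following is a minimal sketch, assuming a hypothetical `Node` with `int` keys and hypothetical function names `ipl`, `epl` and `countNodes`.

```cpp
// Hypothetical node type with int keys (an assumption for this sketch).
struct Node {
    int element;
    Node* left;
    Node* right;
};

// Internal path length: sum of the depths of all nodes.
int ipl(const Node* t, int depth = 0) {
    if (t == nullptr) return 0;
    return depth + ipl(t->left, depth + 1) + ipl(t->right, depth + 1);
}

// External path length: sum of the depths of all null links,
// i.e. of the 'extended' leaves.
int epl(const Node* t, int depth = 0) {
    if (t == nullptr) return depth;
    return epl(t->left, depth + 1) + epl(t->right, depth + 1);
}

// Number of nodes n.
int countNodes(const Node* t) {
    if (t == nullptr) return 0;
    return 1 + countNodes(t->left) + countNodes(t->right);
}
```

For a 6-node tree with node depths 0, 1, 1, 2, 2, 2 these functions return ipl = 8 and E = 20 = 8 + 2 · 6.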
path length extremal trees
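optimal (balanced): h levels, $n = 2^h - 1$ nodes, $h = \lg(n+1)$
[Figure: complete tree with node depths 0; 1, 1; 2, 2, 2, 2]
$\mathrm{ipl} = \sum_{i=0}^{h-1} i \cdot 2^i$, $E = 2^h \cdot h$
$\Rightarrow \mathrm{ipl} = (n+1)\lg(n+1) - 2n$
$\mathrm{avg} = \frac{n+1}{n}\lg(n+1) - 1$
worst case (linear):
[Figure: linear tree with node depths 0, 1, 2, ..., 6]
$\mathrm{ipl} = \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}$
$E = \mathrm{ipl} + 2n = \frac{n(n+3)}{2}$
$\mathrm{avg} = \frac{n+1}{2}$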
average tree
intuition: more balance ⇒ more permutations yield that tree
example: 4-node BSTs
[Figure: seven 4-node BSTs, listed here by the insertion orders (permutations) that produce each tree]
1234: ipl = 6
1243: ipl = 6
1324, 1342: ipl = 5
1423: ipl = 6
1432: ipl = 6
2134, 2314, 2341: ipl = 4
2143, 2413, 2431: ipl = 4
14 BSTs (7 symmetric to above)
4! = 24 permutations
average ipl: $\frac{1}{24}(12 \times 4 + 4 \times 5 + 8 \times 6) = \frac{116}{24} = \frac{29}{6}$
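The average over all insertion orders can be checked by brute force for small n. The following is a minimal sketch with hypothetical names `iplOfInsertionOrder` and `totalIplOverPermutations`; it stores the tree in a vector with index links instead of pointers and assumes the keys are distinct.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical index-based node: left/right are positions in a vector,
// -1 means "no child" (an assumption for this sketch).
struct Node {
    int key;
    int left;
    int right;
};

// Internal path length of the BST obtained by inserting `keys` in order.
long iplOfInsertionOrder(const std::vector<int>& keys) {
    std::vector<Node> tree;
    long ipl = 0;
    for (int k : keys) {
        int depth = 0;
        if (!tree.empty()) {
            int cur = 0;                          // start at the root
            while (true) {
                ++depth;                          // depth of cur's child
                int& next = (k < tree[cur].key) ? tree[cur].left
                                                : tree[cur].right;
                if (next == -1) {                 // free slot: attach here
                    next = static_cast<int>(tree.size());
                    break;
                }
                cur = next;
            }
        }
        tree.push_back(Node{k, -1, -1});
        ipl += depth;
    }
    return ipl;
}

// Sum of the internal path lengths over all n! insertion orders of 1..n.
long totalIplOverPermutations(int n) {
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 1);       // 1, 2, ..., n (sorted)
    long total = 0;
    do {
        total += iplOfInsertionOrder(perm);
    } while (std::next_permutation(perm.begin(), perm.end()));
    return total;
}
```

For n = 4 the total is 116, which divided by 4! = 24 gives the average 29/6 computed above.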
average ipl BST
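$I_n$: average internal path length of a BST with n nodes
insert a permutation of 1, ..., n into a BST ⇒ tree structure
we average over permutations
[Figure: BST with root 5; left subtree with root 2, children 1 and 4, and 3 below 4; right subtree with root 6 and child 7]
the permutation determines the left & right subtrees
[Figure: a permutation with first element 5 splits into the subsequence 2 4 1 3 (left subtree) and the subsequence 6 7 (right subtree)]
any k can be root = first element
$I_n = (n-1) + \frac{1}{n}\sum_{k=1}^{n}(I_{k-1} + I_{n-k}) = (n-1) + \frac{2}{n}\sum_{k=0}^{n-1} I_k$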
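The recurrence can be evaluated numerically in the form I_n = (n − 1) + (2/n)(I_0 + ... + I_{n−1}) with I_0 = 0. The following is a minimal sketch with a hypothetical function name `averageIpl`, using floating point rather than exact fractions.

```cpp
#include <vector>

// Returns I[0..n], where I[m] is the average internal path length of a BST
// built by inserting a random permutation of m keys:
// I[m] = (m - 1) + (2 / m) * (I[0] + ... + I[m-1]), with I[0] = 0.
std::vector<double> averageIpl(int n) {
    std::vector<double> I(n + 1, 0.0);
    double prefix = 0.0;                      // running sum I[0] + ... + I[m-1]
    for (int m = 1; m <= n; ++m) {
        prefix += I[m - 1];
        I[m] = (m - 1) + 2.0 * prefix / m;
    }
    return I;
}
```

The first values are I_1 = 0, I_2 = 1, I_3 = 8/3 and I_4 = 29/6, the last one matching the average over the 24 permutations of four keys.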
telescope!
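$I_n$: average internal path length, n nodes
so $I_n = (n-1) + 2(I_0 + I_1 + \dots + I_{n-1})/n$
also $I_{n-1} = (n-2) + 2(I_0 + I_1 + \dots + I_{n-2})/(n-1)$
subtract: $n I_n - (n-1) I_{n-1} = 2n - 2 + 2 I_{n-1}$
thus $n I_n = (n+1) I_{n-1} + 2n - 2$
divide by $n(n+1)$ and repeat for smaller n:
$\frac{I_n}{n+1} = \frac{I_{n-1}}{n} + \frac{2}{n+1} - \frac{2}{n(n+1)}$
$\frac{I_{n-1}}{n} = \frac{I_{n-2}}{n-1} + \frac{2}{n} - \frac{2}{(n-1)n}$
. . .
$\frac{I_1}{2} = \frac{I_0}{1} + \frac{2}{2} - \frac{2}{1 \cdot 2}$
adding all lines (the terms $I_k/(k+1)$ cancel):
$\frac{I_n}{n+1} = \frac{I_0}{1} + O(\ln n) - \frac{2n}{n+1}$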
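The O(ln n) term can be made explicit. The following worked step is a sketch that assumes I_0 = 0 (the empty tree) and introduces the harmonic number H_m, a symbol not used on the slides.

```latex
\begin{aligned}
\frac{I_n}{n+1}
  &= I_0 + \sum_{k=1}^{n}\left(\frac{2}{k+1} - \frac{2}{k(k+1)}\right) \\
  &= 2\,(H_{n+1} - 1) - \frac{2n}{n+1},
     \qquad H_m = 1 + \tfrac{1}{2} + \dots + \tfrac{1}{m},\quad I_0 = 0 \\
I_n &= 2(n+1)(H_{n+1} - 1) - 2n \;\sim\; 2n\ln n \;\approx\; 1.39\, n \lg n
\end{aligned}
```

For n = 4 this gives 2 · 5 · (137/60 − 1) − 8 = 29/6, in agreement with the average over the 24 permutations of four keys. The average number of comparisons in a successful search, I_n/n + 1, therefore grows like 1.39 lg n, about 39% more than in the optimal tree.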
ADT Set and Dictionary
ADT Set
Initialize: construct an empty set.
IsEmpty: check whether the set is empty (∅, contains no elements).
Size: return the number of elements, the cardinality of the set.
IsElement(a): return whether a given object from the domain belongs to the set, a ∈ A.
Insert(a): add an element to the set (if it is not present), A ∪ {a}.
Delete(a): remove an element from the set (if it is present), A \ {a}.
An efficient implementation of ADT Set is possible with a BST.
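The ADT operations map directly onto the BST routines from the earlier slides. The following is a minimal sketch, assuming `int` elements and a hypothetical class name `BstSet`; it uses an unbalanced BST, so each operation costs O(height), which is O(n) in the worst case.

```cpp
#include <cstddef>

// Hypothetical ADT Set of ints on top of an (unbalanced) BST.
class BstSet {
public:
    BstSet() = default;                            // Initialize: empty set
    BstSet(const BstSet&) = delete;                // no copying in this sketch
    BstSet& operator=(const BstSet&) = delete;
    ~BstSet() { clear(root_); }

    bool isEmpty() const { return count_ == 0; }
    std::size_t size() const { return count_; }

    bool isElement(int a) const {
        const Node* t = root_;
        while (t != nullptr) {
            if (a < t->element)      t = t->left;
            else if (t->element < a) t = t->right;
            else                     return true;
        }
        return false;
    }

    void insert(int a) { insert(a, root_); }
    void remove(int a) { remove(a, root_); }       // "Delete" in the ADT

private:
    struct Node { int element; Node* left; Node* right; };
    Node* root_ = nullptr;
    std::size_t count_ = 0;

    void insert(int a, Node*& t) {
        if (t == nullptr) { t = new Node{a, nullptr, nullptr}; ++count_; }
        else if (a < t->element) insert(a, t->left);
        else if (t->element < a) insert(a, t->right);
        // else: already present, the set is unchanged
    }

    void remove(int a, Node*& t) {
        if (t == nullptr) return;                  // not present
        if (a < t->element)      remove(a, t->left);
        else if (t->element < a) remove(a, t->right);
        else if (t->left != nullptr && t->right != nullptr) {
            Node* pred = t->left;                  // predecessor: max of left subtree
            while (pred->right != nullptr) pred = pred->right;
            t->element = pred->element;            // deletion by copying
            remove(t->element, t->left);
        } else {
            Node* old = t;                         // at most one child
            t = (t->left != nullptr) ? t->left : t->right;
            delete old;
            --count_;
        }
    }

    static void clear(Node* t) {
        if (t == nullptr) return;
        clear(t->left);
        clear(t->right);
        delete t;
    }
};
```

The element count is kept in a separate member so that Size and IsEmpty take constant time instead of a traversal.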
end.