
Post on 08-Jul-2020


CS213d Data Structures and Algorithms
Binary Search Trees

Milind Sohoni, IIT Bombay and IIT Dharwad

April 11, 2017 1 / 28

A typical requirement

Set S ⊆ U, where U is a totally ordered set. Examples: strings under lexicographic order; integers under ≤.

Insert: Add an element x, i.e., S = S ∪ {x}.
Find: Answer whether x ∈ S.
Delete: Delete an element x, i.e., S = S − {x}.

Typical Implementations.

Sorted Array A. If |A| = n, then insert and delete take O(n) operations, but find takes O(log(n)) operations.
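The sorted-array costs above can be seen in a minimal sketch, assuming the set is kept in a `std::vector<int>`. The names `sorted_find`, `sorted_insert` and `sorted_erase` are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Find: binary search over the sorted array, O(log n) comparisons.
bool sorted_find(const std::vector<int>& A, int x) {
    return std::binary_search(A.begin(), A.end(), x);
}

// Insert: locating the slot is O(log n), but vector::insert shifts the
// tail of the array, so the whole operation is O(n).
void sorted_insert(std::vector<int>& A, int x) {
    A.insert(std::upper_bound(A.begin(), A.end(), x), x);
}

// Delete one copy of x if present; O(n) again because of the shift.
bool sorted_erase(std::vector<int>& A, int x) {
    auto it = std::lower_bound(A.begin(), A.end(), x);
    if (it == A.end() || *it != x) return false;
    A.erase(it);
    return true;
}
```

The search itself is cheap in all three functions; it is the shifting of elements that makes insert and delete linear.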

What is O(f(n))?

The actual number of operations T(n) satisfies c1 · f(n) ≤ T(n) ≤ c2 · f(n) for some constants c1, c2 > 0. Different programmers and different machines will yield different constants.

Do the constants matter? YES and NO. For small devices such as mobile phones and satellite systems, they matter. For most applications, and for large enough n, 100 · n << 5n log(n) << 0.01 · n².


Linked lists

[Figure: a linked list, head → 4 → 8 → 99]

Analysis

insert  O(n)
delete  O(n)
find    O(n)

Travel along the list to locate the right place. Once located, the operation is easy.

Even worse than a sorted array. Reason: inability to access elements at random.
Where are linked lists good? Addition/deletion at either end, i.e., queues and stacks.


Heaps

Analysis

insert  O(log(n))
delete  O(log(n))
find    O(n)

[Figure: a heap-tree with entries 1, 2, 3, 4, 39, 78, 5, 5]

Delete: Similar to delmin.

Find: Still hard.

A heap-tree makes insert and delete easy, but find is hard: the opposite of a sorted array.

Is there any structure which has the advantages of both? I.e., one that has an order but retains the ease of addition/deletion?


BST Definition

Binary Search Tree: A binary tree with entries in U such that the in-order traversal is in ascending order.

[Figure: two differently shaped binary search trees on the same entries; both have Inorder = [1 2 3 3 4 5 5 6 7 8 9]]

Note that the structure need not be a heap-tree.

Given a structure of size n and an ascending array of the same size, the location of each element gets fixed.
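The definition can be checked mechanically. The sketch below lists a tree in order and tests whether the listing is ascending. The `vertex` layout (fields `vdata` and `cindex[2]`) follows the names used in the insert code on a later slide, but the declaration itself is an assumption. `inorder` and `is_bst` are hypothetical helper names.

```cpp
#include <algorithm>
#include <vector>

// Assumed layout: cindex[0] is the left child, cindex[1] the right child.
struct vertex {
    int vdata;
    vertex* cindex[2];
};

// Append the in-order listing of the tree rooted at T to out.
void inorder(const vertex* T, std::vector<int>& out) {
    if (T == nullptr) return;
    inorder(T->cindex[0], out);
    out.push_back(T->vdata);
    inorder(T->cindex[1], out);
}

// A binary tree is a BST exactly when its in-order listing is ascending
// (non-decreasing here, since the slides allow duplicate entries).
bool is_bst(const vertex* T) {
    std::vector<int> keys;
    inorder(T, keys);
    return std::is_sorted(keys.begin(), keys.end());
}
```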


Finding recursively

[Figure: the two binary search trees from the previous slide, Inorder = [1 2 3 3 4 5 5 6 7 8 9]]

Great News: To find v, compare it with the root entry e(T).

v = e(T): Done.
v < e(T): Look for v in $T_L$; if there is none, answer NO.
v > e(T): Look for v in $T_R$; if there is none, answer NO.
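The three-way comparison above translates directly into a recursive function. This is a minimal sketch under the same assumed `vertex` layout as the insert code on a later slide; `bst_find` is a hypothetical name.

```cpp
// Assumed layout: cindex[0] is the left child, cindex[1] the right child.
struct vertex {
    int vdata;
    vertex* cindex[2];
};

// Return the vertex holding v, or nullptr ("NO") if v is not in the tree.
const vertex* bst_find(const vertex* T, int v) {
    if (T == nullptr) return nullptr;      // empty subtree: answer NO
    if (v == T->vdata) return T;           // v = e(T): done
    int side = (v < T->vdata) ? 0 : 1;     // 0: look in T_L, 1: look in T_R
    return bst_find(T->cindex[side], v);
}
```

Each call does one comparison and follows one pointer, so the number of operations is bounded by the height of the tree.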


Different

How many operations does it take (in terms of the number of elements)?

Operations: comparisons and pointer-chasing.

[Figure: the same two binary search trees, Inorder = [1 2 3 3 4 5 5 6 7 8 9]]

element  t1  t2    element  t1  t2
<1       4   4     2        3   3
4.5      4   4     5        1   1
5.5      4   2     7.5      3   5


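Find and insert

#include <cstddef>

// Assumed vertex layout (the declaration is not shown on the slide).
struct vertex { int vdata; vertex *cindex[2] = {NULL, NULL}; };
typedef vertex *pvertex;

void insert(pvertex T, int val)   // T must be non-NULL
{
  if (val < T->vdata)
  { if (T->cindex[0] != NULL)
      { insert(T->cindex[0], val); }
    else
      { pvertex w = new vertex; w->vdata = val;
        T->cindex[0] = w; }
  } // done with left
  if (val >= T->vdata)
  { if (T->cindex[1] != NULL)
      { insert(T->cindex[1], val); }
    else
      { pvertex w = new vertex; w->vdata = val;
        T->cindex[1] = w; }
  } // done with right
}

Go down recursively till you find a NULL.

Prepare a new node and insert it as a leaf.

Note that if equal, a duplicate node is created on the right.

Operations: bounded by the height of the tree.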


Insert Example

[Figure: the two trees with Inorder = [1 2 3 3 4 5 5 6 7 8 9], each shown with 7.5 inserted as a new leaf]

No control on balance. See, for example, what happens on inserting elements 7.6, 7.7, ...

Balance is clearly important to keep a good relationship between n and height.
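The loss of balance is easy to reproduce. In the sketch below, the same seven keys are inserted in two different orders and the heights are compared. The helper names (`bst_insert`, `height`, `build`) and the use of integer keys are assumptions for illustration. Height is counted in vertices, and nodes are never freed in this short sketch.

```cpp
#include <algorithm>

// Assumed layout: cindex[0] is the left child, cindex[1] the right child.
struct vertex {
    int vdata;
    vertex* cindex[2] = {nullptr, nullptr};
    explicit vertex(int v) : vdata(v) {}
};

// Plain BST insert, ties go right (as on the insert slide). Returns the root.
vertex* bst_insert(vertex* T, int val) {
    if (T == nullptr) return new vertex(val);
    int side = (val < T->vdata) ? 0 : 1;
    T->cindex[side] = bst_insert(T->cindex[side], val);
    return T;
}

// Number of vertices on the longest root-to-leaf path (empty tree: 0).
int height(const vertex* T) {
    if (T == nullptr) return 0;
    return 1 + std::max(height(T->cindex[0]), height(T->cindex[1]));
}

// Build a tree by inserting keys[0..n-1] in the given order.
vertex* build(const int* keys, int n) {
    vertex* T = nullptr;
    for (int i = 0; i < n; ++i) T = bst_insert(T, keys[i]);
    return T;
}
```

Inserting 1, 2, ..., 7 in ascending order yields a path of height 7, while the order 4, 2, 6, 1, 3, 5, 7 yields a complete tree of height 3.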


Locating and deleting an element

Step I: Locate x in T, as before.
Step II: Delete x from T to get T′. T′ is structurally different and depends on the location v ∈ T of x.
Case I: v is a leaf. Easy. T′ = T − v.

[Figure: Case I, deleting the leaf holding 4. Inorder = [1 2 3 3 4 5 5 6 7 8 9] becomes Inorder = [1 2 3 3 X 5 5 6 7 8 9]]

Note that inorder(T − v) = inorder(T )− x .


Case II

Case II: v has only one child w. Harder. Make the parent p of v point to w.

[Figure: Case II deletions in the tree with Inorder = [1 2 3 3 4 5 5 6 7 8 9]. Deleting 7 gives Inorder = [1 2 3 3 4 5 5 6 X 8 9]; deleting a 5 gives Inorder = [1 2 3 3 4 X 5 6 7 8 9]]


Case II, continued

Let us see the case when v is the right child of its parent p. The other case is similar.

In both cases, $w = p_{RL}$ or $w = p_{RR}$, we make w the new $p_R$ to get T′.

For $w = p_{RL}$, we see that
$\mathrm{inorder}_T(p) = \mathrm{inorder}(T_L) \cdot e(p) \cdot \mathrm{inorder}(T_w) \cdot e(v)$,
while for T′, with v removed, we now have
$\mathrm{inorder}_{T'}(p) = \mathrm{inorder}(T_L) \cdot e(p) \cdot \mathrm{inorder}(T_w)$.

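[Figure: before and after pictures for the two cases. Before: p has left subtree TL and right child v, and v has the single child w with subtree Tw. After: w, with its subtree Tw, is the right child of p]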

The case for w = pRR is similar.

Note that inorder(p) is a subsequence of inorder(T ).


Case III

v has both children. Hardest.

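[Figure: Case III deletions in the tree with Inorder = [1 2 3 3 4 5 5 6 7 8 9]. Deleting 8 gives Inorder = [1 2 3 3 4 5 5 6 7 X 9]; deleting a 3 gives Inorder = [1 2 3 X 4 5 5 6 7 8 9]. The removed locations are labelled Case I and Case II]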

Let w = next(v) (or prev(v), either is fine).

Copy y = e(w) into v . Delete location w .

For w , either Case I or II always applies.
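The three cases can be put together in one routine. The following is a minimal sketch, not the course's own code: it assumes the `vertex` layout from the insert slide, takes w = next(v) to be the leftmost vertex of the right subtree, and uses the hypothetical names `bst_insert`, `bst_erase` and `inorder`.

```cpp
#include <vector>

// Assumed layout: cindex[0] is the left child, cindex[1] the right child.
struct vertex {
    int vdata;
    vertex* cindex[2];
};

// Plain BST insert, ties go right. Returns the (possibly new) root.
vertex* bst_insert(vertex* T, int val) {
    if (T == nullptr) return new vertex{val, {nullptr, nullptr}};
    int side = (val < T->vdata) ? 0 : 1;
    T->cindex[side] = bst_insert(T->cindex[side], val);
    return T;
}

// Delete one occurrence of x from the tree rooted at T; returns the new root.
vertex* bst_erase(vertex* T, int x) {
    if (T == nullptr) return nullptr;                // x is not present
    if (x != T->vdata) {                             // Step I: locate x
        int side = (x < T->vdata) ? 0 : 1;
        T->cindex[side] = bst_erase(T->cindex[side], x);
        return T;
    }
    // Step II: T is the location v of x.
    if (T->cindex[0] == nullptr || T->cindex[1] == nullptr) {
        // Case I (leaf) and Case II (one child w): the parent is made to
        // point to w, which is nullptr when v is a leaf.
        vertex* w = (T->cindex[0] != nullptr) ? T->cindex[0] : T->cindex[1];
        delete T;
        return w;
    }
    // Case III: two children. w = next(v) is the leftmost vertex of T_R.
    vertex* w = T->cindex[1];
    while (w->cindex[0] != nullptr) w = w->cindex[0];
    T->vdata = w->vdata;                             // copy y = e(w) into v
    T->cindex[1] = bst_erase(T->cindex[1], w->vdata); // delete location w
    return T;
}

// Append the in-order listing of T to out.
void inorder(const vertex* T, std::vector<int>& out) {
    if (T == nullptr) return;
    inorder(T->cindex[0], out);
    out.push_back(T->vdata);
    inorder(T->cindex[1], out);
}
```

Since w is the leftmost vertex of the right subtree, it has no left child, so the recursive call that removes it always ends in Case I or Case II.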


Imbalance

Ideal case: Complete Binary Tree (CBT). Insert will take O(log(n)) time, but the CBT condition gets disturbed.

Worst case: Path. Insert will take O(n) time. The condition improves with more insertions.

Heap-tree: Insertion is not local if the heap-tree structure is to be maintained. What if a small number is to be inserted in a complete binary tree?

Is there a via media?

Adelson-Velski, Landis (AVL) condition: various examples

AVL Tree: A tree is an AVL tree-structure iff for all nodes v, $|\mathrm{height}(v_L) - \mathrm{height}(v_R)| \le 1$.

[Figure: example trees in which each vertex v is labelled good or bad, according to whether the heights hL and hR of its two subtrees satisfy the AVL condition]
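The good/bad labelling in the examples can be computed bottom-up. The sketch below assumes the `vertex` layout from the insert slide and counts height in vertices (empty tree: 0). `avl_height` and `is_avl` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdlib>

// Assumed layout: cindex[0] is the left child, cindex[1] the right child.
struct vertex {
    int vdata;
    vertex* cindex[2];
};

// Returns the height of T (empty tree: 0), or -1 if some vertex v in T
// is "bad", i.e. violates |height(v_L) - height(v_R)| <= 1.
int avl_height(const vertex* T) {
    if (T == nullptr) return 0;
    int hL = avl_height(T->cindex[0]);
    int hR = avl_height(T->cindex[1]);
    if (hL < 0 || hR < 0 || std::abs(hL - hR) > 1) return -1;
    return 1 + std::max(hL, hR);
}

// True iff every vertex of T is "good".
bool is_avl(const vertex* T) { return avl_height(T) >= 0; }
```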


AVL condition: maths

Theorem: There is a constant $1 < \alpha < 2$ such that, for any AVL tree $T_n$ of height $n$, the number of vertices $|T_n|$ in $T_n$ obeys $\alpha^n \le |T_n| \le 2^n$.

Proof: That $|T_n| \le 2^n$ is easy to show, since the maximum number of vertices a binary tree of height $n$ can have is $2^n$.

We prove the lower bound by induction. Since $T_n$ is of height $n$, we must have either (i) one of the children, say $T_L$, is such that $\mathrm{height}(T_L) = n-1$ and $\mathrm{height}(T_R) = n-2$, or (ii) both children have height $n-1$, i.e., $\mathrm{height}(T_L) = \mathrm{height}(T_R) = n-1$.

In case (i), we have

$$|T_n| = 1 + |T_L| + |T_R| \ge \alpha^{n-1} + \alpha^{n-2}$$


Continued...

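So is there an $\alpha$ such that

$$|T_n| \ge \alpha^{n-1} + \alpha^{n-2} \ge \alpha^n \quad \text{for all } n?$$

In case (ii) we get:

$$|T_n| \ge 2 \cdot \alpha^{n-1} \ge \alpha^n \quad \text{for all } n.$$

The second condition is satisfied if $\alpha < 2$. For the first condition, choose $\alpha = \frac{1+\sqrt{5}}{2} \approx 1.6$. Verify that $\alpha^2 = \alpha + 1$, so that $\alpha^{n-1} + \alpha^{n-2} = \alpha^{n-2}(\alpha + 1) = \alpha^n$.

Proved! Thus, we have shown that $1.6^n \le |T_n| \le 2^n$.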
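The lower bound can be checked numerically for small heights. The sketch below computes the fewest vertices an AVL tree of each height can have, using the extremal case (i) of the proof, and compares that count with $\alpha^h$. The slides do not fix a height convention; this sketch assumes a single vertex has height 0. `min_avl_sizes` and `lower_bound_holds` are hypothetical names.

```cpp
#include <cmath>
#include <vector>

// N[h] = fewest vertices in an AVL tree of height h, assuming a single
// vertex has height 0. The sparsest tree has subtrees of heights h-1 and
// h-2, giving N[h] = 1 + N[h-1] + N[h-2] with N[0] = 1 and N[1] = 2.
std::vector<long long> min_avl_sizes(int max_h) {
    std::vector<long long> N(max_h + 1);
    for (int h = 0; h <= max_h; ++h)
        N[h] = (h == 0) ? 1 : (h == 1) ? 2 : 1 + N[h - 1] + N[h - 2];
    return N;
}

// Check alpha^h <= N[h] for all 0 <= h <= max_h, alpha = (1 + sqrt(5)) / 2.
bool lower_bound_holds(int max_h) {
    const double alpha = (1.0 + std::sqrt(5.0)) / 2.0;
    std::vector<long long> N = min_avl_sizes(max_h);
    for (int h = 0; h <= max_h; ++h)
        if (std::pow(alpha, h) > static_cast<double>(N[h])) return false;
    return true;
}
```

Under this convention the minimum sizes begin 1, 2, 4, 7, 12, and they stay above $\alpha^h$, which is what gives the O(log(n)) bound on height.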


Conclude that...

If the AVL condition can be maintained, then insertion, find and delete can be done in O(log(n)) time.

But is this condition disturbed? YES.

Can it be recovered? YES: rotations.

[Figure: AVL trees with every vertex labelled good; after an insert or a delete, one vertex becomes bad]

Can we tell for which insertions and deletions we will lose the AVL condition? Some are shown. Any others?

At which vertex/node will this condition be lost?


Analysis: Insertion

[Figure: the two imbalances that an insert can cause at a vertex v with left child w, labelled LL imbalance and LR imbalance. Subtree heights are marked h, h+1 and h+2]


Rotation: The LL case

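[Figure: LL rotation. Before: x is the parent of v, v has left child w, w has subtrees A and B, and v has right subtree C. After: w is the child of x, with left subtree A and right child v, and v has subtrees B and C. Heights are marked h, h+1, h+2 and h+3]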

Vertex v is the first node where imbalance occurs. Vertex x is its parent; in the picture v is the left child of x, but it does not matter whether v is the left or the right child.

Note that the inorder listing of $(T(x))_L$ is AwBvC and is unchanged.

After the rotation, the height of $(T(x))_L$ is the same as it was before the insertion, so the effect does not percolate up.
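The LL rotation is a constant number of pointer updates. The following is a minimal sketch under the assumed `vertex` layout from the insert slide; `rotate_right` is a hypothetical name, and the bookkeeping of stored heights is left out.

```cpp
#include <cassert>

// Assumed layout: cindex[0] is the left child, cindex[1] the right child.
struct vertex {
    int vdata;
    vertex* cindex[2];
};

// Rotate right at v: the left child w becomes the root of the subtree.
// Before: v has left child w (subtrees A, B) and right subtree C.
// After:  w has left subtree A and right child v (subtrees B, C).
// The in-order listing A w B v C is unchanged. Returns the new subtree
// root, which the caller must store in the parent x (or as the tree root).
vertex* rotate_right(vertex* v) {
    vertex* w = v->cindex[0];
    assert(w != nullptr);          // an LL imbalance guarantees w exists
    v->cindex[0] = w->cindex[1];   // B moves under v
    w->cindex[1] = v;              // v becomes the right child of w
    return w;
}
```

Returning the new root, rather than updating x inside the function, means the same code works whether v is a left child, a right child, or the root.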


Rotation: LR case

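[Figure: LR case. The same single rotation applied when the tall subtree is B: the height at v is h+3 before the rotation and the height at w is still h+3 after it]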

Does Not Work!

What is to be done? Rotate at w, followed by a rotation at v.


What to do: Double Rotate

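[Figure: double rotation. The subtree B under w is opened up as B1 z B2 with root z. Rotating at w and then at v makes z the root of the subtree, with left child w (subtrees A, B1) and right child v (subtrees B2, C). The height below x goes from h+3 to h+2]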
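The double rotation is two single rotations in sequence. In this sketch, `rotate` and `rotate_left_right` are hypothetical names, the `vertex` layout is assumed from the insert slide, and stored heights are again left out.

```cpp
#include <cassert>

// Assumed layout: cindex[0] is the left child, cindex[1] the right child.
struct vertex {
    int vdata;
    vertex* cindex[2];
};

// Single rotation at v. side = 0 lifts the left child (rotate right),
// side = 1 lifts the right child (rotate left). Returns the new root.
vertex* rotate(vertex* v, int side) {
    vertex* c = v->cindex[side];
    assert(c != nullptr);
    v->cindex[side] = c->cindex[1 - side];
    c->cindex[1 - side] = v;
    return c;
}

// LR case: v has left child w, and w's right child z roots the tall
// subtree (B1 z B2). Rotate left at w, then right at v.
// In-order listing before and after: A w B1 z B2 v C.
// Precondition: v, its left child w, and w's right child z all exist.
vertex* rotate_left_right(vertex* v) {
    v->cindex[0] = rotate(v->cindex[0], 1);  // z replaces w under v
    return rotate(v, 0);                     // z becomes the subtree root
}
```

The mirror-image RL case would call `rotate(v->cindex[1], 0)` followed by `rotate(v, 1)`.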


Deletions

[Figure: a deletion example with vertices a, b, c, d, e and subtrees T2, T2, T3. A rotate left restores the balance locally, but the change in height has an impact above c]

As opposed to insert, delete may require re-balancing all the way up.

Also note that if, instead of T2, T2, T3, it was T2, T3, T2, then we would require a rotate right at d.


AVL Trees

Operation  Time         Remarks
Find       O(log(n))    Binary Search
Insert     O(log(n))    Binary Search
           O(constant)  Add Leaf
           O(constant)  Re-balance
Delete     O(log(n))    Binary Search
           O(constant)  Case I, II or III
           O(log(n))    Re-balance

Many other variations after AVL: 2-3 Trees, Red-Black Trees.


Time complexity

Behind each programming step is an actual device which doesthe computing.

This usually means some mathematical operations on operandsof some fixed size.

Fetching operands and depositing them in some fixed location.

It may also involve indirect addressing, where the actual address of the operand or the outcome needs a computation.


So far ...

Data Structure D: Sets S1, S2 etc., with some inherent relations.

Operations: Which D will support.

Applications : Places where D is used.

Performance: Number of operations that are needed to perform these operations.

Example: Queue.

Q ⊆ U, a subset of a universal set. Three operations: push, pop and isempty, plus a size query.

Various relations. Foremost: between two consecutive moments when the queue is empty, elements leave first-in, first-out.

Performance depends on implementation. Circular array: a constant amount of steps per operation.


Data-structure and addressing

There is an intimate relationship between the mechanism of access to an item, i.e., the hardware, and the relationships in the data structure.

The LHS is the actual or real operations that take place in the machine, while the RHS is the abstract operations required in the data structure.

Queue: With a linear array beginning at 0, push is constant time while pop takes time proportional to n, the number of elements in the queue.

A physically circular array makes both operations take a constant number of real operations (but the constant is bigger). The implementation is mathematically done by the mod operation.
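The circular array can be sketched as follows. This is an illustrative fixed-capacity version, not the course's implementation: the class name `CircularQueue` and its members are assumptions, while `push`, `pop`, `isempty` and `size` follow the operation names on the previous slide.

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity FIFO queue on a circular array.
class CircularQueue {
public:
    explicit CircularQueue(std::size_t capacity) : buf_(capacity) {}

    bool isempty() const { return count_ == 0; }
    std::size_t size() const { return count_; }

    // Returns false if the queue is full.
    bool push(int x) {
        if (count_ == buf_.size()) return false;
        buf_[(head_ + count_) % buf_.size()] = x;  // slot just past the tail
        ++count_;
        return true;
    }

    // Moves the oldest element into out; returns false if the queue is empty.
    bool pop(int& out) {
        if (count_ == 0) return false;
        out = buf_[head_];
        head_ = (head_ + 1) % buf_.size();         // wrap around with mod
        --count_;
        return true;
    }

private:
    std::vector<int> buf_;
    std::size_t head_ = 0;    // index of the oldest element
    std::size_t count_ = 0;   // number of stored elements
};
```

No element is ever shifted: push and pop each touch one slot and do one mod computation, which is the larger constant mentioned above.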


Thanks
