Searching: 1. Introduction 2. Sequential Search 3. Binary Search 4. Comparison Trees


Upload: roy-thompson

Post on 18-Jan-2018


DESCRIPTION

Search may succeed if there exists one record with the same key. There may be more than one record with the same key. In this case, return the first one. Search may fail if there is no record with the given key. Time complexity of the search algorithm: we count how many times one key is compared with another during a search. Internal searching: all the records are kept in high-speed memory. External searching: most of the records are kept in disk files.

TRANSCRIPT

Page 1

Searching

1. Introduction

2. Sequential Search

3. Binary Search

4. Comparison Trees

Page 2

The Problem

A pile of records

Keys

• We are given a pile of records, where each record is associated with one piece of information, which we shall call a key.

• We are given one key, the target, and are asked to search the pile to find the record (if any) whose key is the same as the target.

Page 3

• The search succeeds if there is at least one record whose key equals the target. If more than one record has that key, return the first one.

• The search fails if there is no record with the given key.

• Time complexity of the search algorithm: we count how many times one key is compared with another during a search.

• Internal searching: all the records are kept in high-speed memory.

• External searching: most of the records are kept in disk files.

Page 4

Assumptions

• Assume a class Record and a class Key;

• Every Record is associated with a key (of type Key), provided by:

– a key member;
– a conversion operation: operator Key() const;

• Key objects can be compared with the standard operators ==, !=, <, >, <=, >=.

• Records can be compared to each other or to keys by first converting them to their associated keys.

• Define a type Error_code: enum Error_code{success, fail, not_present};
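The assumptions above can be sketched in C++ roughly as follows. This is only an illustration: the int-valued key, the string payload, and the member names the_key(), data(), key_ and data_ are hypothetical choices, not taken from the slides.

```cpp
#include <string>

// Hypothetical Key: wraps an int (assumption; any ordered type would do).
class Key {
public:
    Key(int v = 0) : value_(v) {}
    int the_key() const { return value_; }
private:
    int value_;
};

// The six standard comparison operators on keys.
inline bool operator==(const Key &a, const Key &b) { return a.the_key() == b.the_key(); }
inline bool operator!=(const Key &a, const Key &b) { return !(a == b); }
inline bool operator< (const Key &a, const Key &b) { return a.the_key() < b.the_key(); }
inline bool operator> (const Key &a, const Key &b) { return b < a; }
inline bool operator<=(const Key &a, const Key &b) { return !(b < a); }
inline bool operator>=(const Key &a, const Key &b) { return !(a < b); }

// Hypothetical Record: a key member plus some payload.
class Record {
public:
    Record(int k = 0, const std::string &d = "") : key_(k), data_(d) {}
    operator Key() const { return key_; }   // conversion to the associated key
    const std::string &data() const { return data_; }
private:
    Key key_;
    std::string data_;
};

enum Error_code { success, fail, not_present };
```

Because of the conversion operator, an expression such as record == target or record1 < record2 compiles by first converting each Record to its Key.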

Page 5

Sequential Search

• Algorithm: the simplest way to search;

• Analysis: how much work needs to be done to search for a key;

• Test: empirical testing on sample data.

Input: a list a1, a2, …, an and the desired value.

Output: the location of the item if the search is successful, null otherwise.

Algorithm:

for each item ai in the list:
    if that item has the desired value, stop the search and return the item's location.
return null.
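A minimal C++ sketch of this algorithm, assuming a std::vector<int> in place of the list of records and reporting the result through an Error_code instead of null (both are simplifications made for this example):

```cpp
#include <cstddef>
#include <vector>

enum Error_code { success, fail, not_present };

// Scan from left to right. On success, position receives the index of the
// first entry equal to target; otherwise position is left unchanged.
Error_code sequential_search(const std::vector<int> &the_list,
                             int target, std::size_t &position)
{
    for (std::size_t i = 0; i < the_list.size(); ++i) {
        if (the_list[i] == target) {   // one key comparison per entry
            position = i;
            return success;
        }
    }
    return not_present;
}
```

Scanning from the front means that, when several entries match, the first one is reported, as required in the problem statement.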

Page 6

Analysis

How do we estimate the amount of work that a comparison-based search method will do?

• The total work is reflected by the number of comparisons of keys that the algorithm makes.

• We analyze both successful and unsuccessful searches.

Page 7

How many comparisons of keys does sequential search make when it is applied to a list of n entries?

Assume that the value being sought occurs exactly once in the list and that all orderings of the list are equally likely.
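Under these assumptions the counts work out as follows (a standard derivation, given here because the transcript does not include the result): the target is equally likely to sit in each of the n positions, and finding it in position i costs i comparisons, while an unsuccessful search inspects every entry.

```latex
\[
\text{successful search: } \frac{1}{n}\sum_{i=1}^{n} i
  = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2},
\qquad
\text{unsuccessful search: } n .
\]
```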

Page 8

Sequential search

• The method is simple;

• Efficient for short lists;

• The list does not have to be ordered;

• Disastrous for long lists;

• STL implementation: find and count.

Question: Suppose you have a list of elements of type Record and you want to find a record that satisfies a certain property. For example, you want to know whether there is a pair in a vector<pair<string,int> > whose first component is "Gao". How do you implement such a function that takes a predicate parameter? What is the counterpart in STL?

There are far more efficient methods for ordered lists.
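One possible answer to the question uses std::find_if, the predicate-taking counterpart of std::find. The sketch below is illustrative only; the helper names has_name, first_index and occurrences and the alias Entry are made up for this example.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;

// True if some pair in entries has name as its first component.
// std::find_if performs a sequential search driven by a predicate.
bool has_name(const std::vector<Entry> &entries, const std::string &name)
{
    auto it = std::find_if(entries.begin(), entries.end(),
                           [&name](const Entry &e) { return e.first == name; });
    return it != entries.end();
}

// Index of the first occurrence of value, or -1 if it is absent.
long first_index(const std::vector<int> &values, int value)
{
    auto it = std::find(values.begin(), values.end(), value);
    return it == values.end() ? -1 : static_cast<long>(it - values.begin());
}

// Number of entries equal to value.
long occurrences(const std::vector<int> &values, int value)
{
    return static_cast<long>(std::count(values.begin(), values.end(), value));
}
```

For example, has_name(entries, "Gao") answers the question posed above for a vector<pair<string,int> > named entries.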

Page 9

Binary Search

• The method requires that the list be ordered;

• The method:

– compare the key with the middle record in the list,
– then decide whether to continue the search in the first half or the second half, using the same method.

• The method reduces the number of comparisons dramatically.

Page 10

Ordered Lists

Ordered lists as an ADT

All the operations on lists apply without modification, except insert and replace, which must not disturb the order of the keys.

Question: How would you implement such a class by using inheritance?
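One way to approach the question is sketched below. The base class List here is a hypothetical, much simplified stand-in (int entries stored in a std::vector) for whatever list class the course uses; only insertion is shown, and replace would be guarded in the same way.

```cpp
#include <cstddef>
#include <vector>

enum Error_code { success, fail, not_present };

// Hypothetical minimal list class with int entries (assumption).
class List {
public:
    std::size_t size() const { return entries_.size(); }
    Error_code retrieve(std::size_t position, int &x) const {
        if (position >= entries_.size()) return fail;
        x = entries_[position];
        return success;
    }
    Error_code insert(std::size_t position, int x) {
        if (position > entries_.size()) return fail;
        entries_.insert(entries_.begin() + position, x);
        return success;
    }
protected:
    std::vector<int> entries_;
};

// Inherits size() and retrieve() unchanged; insertion is redefined so that
// the entries always stay in nondecreasing order.
class Ordered_list : public List {
public:
    // New operation: find the proper place for x, then insert it there.
    Error_code insert(int x) {
        std::size_t position = 0;
        while (position < entries_.size() && entries_[position] <= x) ++position;
        return List::insert(position, x);
    }
    // Redefined operation: refuse a position that would break the order.
    Error_code insert(std::size_t position, int x) {
        if (position > entries_.size()) return fail;
        if (position > 0 && entries_[position - 1] > x) return fail;
        if (position < entries_.size() && entries_[position] < x) return fail;
        return List::insert(position, x);
    }
};
```

The design choice is that clients can never put an Ordered_list into an unordered state, which is exactly the precondition binary search relies on.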

Page 11

Binary Search

Simple? However,

• The method dates back to 1946, but the first correct implementation appeared in 1962;

• Only five correct implementations out of twenty textbooks (from Kruse’s text);

• 90% of professional programmers fail to code it correctly.

Page 12

Making Implementation Correct

To make a correct implementation, we must:

• state exactly what our variables designate;

• state precisely what conditions must be true before and after each iteration of the loop;

• make sure the loop will terminate properly.

We will

The invariant

Page 13
Page 14

The Forgetful Version

• Forget the possibility that the key target might be found quickly and continue to subdivide the list until what remains has length 1.

• The recursive version takes the range as parameters.
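The code slide for this version is not part of the transcript, so the following is only a sketch of what such a function could look like. It borrows the name recursive_binary_1 used later in the slides, but works on a sorted std::vector<int> rather than an Ordered_list of records, which is an assumption made to keep the example self-contained.

```cpp
#include <vector>

enum Error_code { success, fail, not_present };

// Search the_list[bottom..top] (inclusive) for target.
// Invariant: if target is in the list at all, its first occurrence
// lies between bottom and top.
Error_code recursive_binary_1(const std::vector<int> &the_list, int target,
                              int bottom, int top, int &position)
{
    if (bottom < top) {
        int mid = bottom + (top - bottom) / 2;   // rounds down, so mid < top
        if (the_list[mid] < target)
            return recursive_binary_1(the_list, target, mid + 1, top, position);
        else
            return recursive_binary_1(the_list, target, bottom, mid, position);
    }
    if (top < bottom)
        return not_present;                      // empty range
    position = bottom;                           // exactly one entry remains
    return the_list[bottom] == target ? success : not_present;
}
```

Note that equality is tested only once, at the very end; until then each step makes a single "<" comparison and keeps the half that may contain the first occurrence.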

Page 15
Page 16

Correctness of the Algorithm

What do we need to verify or prove that our algorithm is correct?

1. If the target key is present in the list, the algorithm finds it, and the position returned is that of its first occurrence.

2. The algorithm always terminates.

Page 17
Page 18

Termination

The recursion continues only while top > bottom, so it suffices to prove that the size of the interval [bottom, top] strictly decreases with each call.

1. When bottom < top, we have bottom <= mid < top, because mid is the midpoint of bottom and top rounded down.

2. The recursive calls reduce the size of the interval to either top - (mid + 1) or mid - bottom, both of which are strictly less than top - bottom.

So the recursion terminates.

Page 19

Use of recursive_binary_1

If we are searching a whole list, we can wrap the above function in the following search function:

Error_code run_recursive_binary_1(const Ordered_list &the_list, const Key &target, int &position)
{
    return recursive_binary_1(the_list, target, 0, the_list.size() - 1, position);
}

Page 20

Nonrecursive Version

Because both recursive calls in the function recursive_binary_1 are tail calls, it is easy to eliminate them and obtain an iterative version.
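The iterative code slide is missing from the transcript. A sketch of how the tail calls might be replaced by a loop is given below; as an assumption for this example it searches a sorted std::vector<int>, and it reuses the name binary_search_1 that the later analysis slides refer to.

```cpp
#include <vector>

enum Error_code { success, fail, not_present };

// Forgetful binary search, iterative form. Each tail call of the recursive
// version becomes an update of bottom or top followed by another iteration.
Error_code binary_search_1(const std::vector<int> &the_list, int target,
                           int &position)
{
    int bottom = 0;
    int top = static_cast<int>(the_list.size()) - 1;
    while (bottom < top) {
        int mid = bottom + (top - bottom) / 2;   // bottom <= mid < top
        if (the_list[mid] < target)
            bottom = mid + 1;                    // target can only be to the right
        else
            top = mid;                           // first occurrence is at mid or left
    }
    if (top < bottom)
        return not_present;                      // the list was empty
    position = bottom;
    return the_list[position] == target ? success : not_present;
}
```

The interval [bottom, top] shrinks on every iteration, which is the same termination argument as for the recursive version.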

Page 21
Page 22

Binary search in STL

#include <algorithm>
bool binary_search(forward_iterator start, forward_iterator end, const TYPE& val);

Because both random_access_iterator and bidirectional_iterator are refinements of forward_iterator, binary_search can be used to search both vectors and lists.

However, if one wants the position of the record, one needs lower_bound (or upper_bound).

• Notice that binary_search uses the default comparison '<' to decide whether two values are equivalent: a and b are equivalent when neither a < b nor b < a holds. How about using a relation other than '<', for example '>'?

• binary_search has another version that allows you to provide your own order relation. The range must be sorted with respect to that same relation.
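A short sketch of these STL facilities follows. The helper names first_position, last_position and contains_descending are invented for this example; the std:: calls themselves are the standard ones from <algorithm> and <functional>.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Index of the first element equal to target in an ascending vector, or -1.
long first_position(const std::vector<int> &ascending, int target)
{
    auto it = std::lower_bound(ascending.begin(), ascending.end(), target);
    if (it == ascending.end() || *it != target) return -1;
    return static_cast<long>(it - ascending.begin());
}

// Index of the last element equal to target in an ascending vector, or -1.
long last_position(const std::vector<int> &ascending, int target)
{
    auto it = std::upper_bound(ascending.begin(), ascending.end(), target);
    if (it == ascending.begin() || *(it - 1) != target) return -1;
    return static_cast<long>(it - ascending.begin()) - 1;
}

// Membership test on a vector sorted in descending order: the same relation
// used for sorting (std::greater) must be passed to binary_search.
bool contains_descending(const std::vector<int> &descending, int target)
{
    return std::binary_search(descending.begin(), descending.end(), target,
                              std::greater<int>());
}
```

lower_bound returns the first position whose element is not less than the target, and upper_bound the first position whose element is greater, so together they bracket all occurrences of the target.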

Page 23

Comparison Trees

The comparison tree of an algorithm is obtained by tracing the action of the algorithm:

• A vertex (circle) represents a comparison of keys. Inside the circle we put the index of the key against which we are comparing the target key.

• Branches (lines) drawn down from the circle represent the possible outcomes of the comparison.

• Leaves (squares): When the algorithm terminates, we put either F (for failure) or the location where the target is found at the end of the appropriate branch.

Page 24

Comparison tree for sequential search on a list of n elements.

Finding the nth element takes n comparisons, and so does an unsuccessful search.

Page 25

Comparison Tree for Binary Search

At most five comparisons for both successful and unsuccessful searches.

The comparison tree for a list of 10 elements.

Page 26

Analysing Binary Search

• Searches terminate at leaves; half of the leaves are successful and half are unsuccessful;

• All leaves are on two adjacent levels;

• Worst case: the number of comparisons is the height of the tree;

• Average case?

Page 27

Comparison count

• The total number of comparisons (for successful and unsuccessful searches) is the external path length of the tree, that is,

$(4 \times 5) + (6 \times 4) + (4 \times 5) + (6 \times 4) = 88$

• Half the leaves correspond to successful searches and half to unsuccessful searches, so the average number of comparisons for both successful and unsuccessful searches is

$44/10 = 4.4$ (for $n = 10$)
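The count can also be checked empirically. The sketch below instruments a forgetful binary search (written here over a std::vector<int>, an assumption for this example; the function names are hypothetical) and adds up the key comparisons over all successful searches in a list of n distinct keys.

```cpp
#include <vector>

// Number of key comparisons the forgetful search makes for one target.
int comparisons_for(const std::vector<int> &the_list, int target)
{
    int count = 0;
    int bottom = 0;
    int top = static_cast<int>(the_list.size()) - 1;
    while (bottom < top) {
        int mid = bottom + (top - bottom) / 2;
        ++count;                                 // the "<" comparison
        if (the_list[mid] < target) bottom = mid + 1;
        else top = mid;
    }
    if (top >= bottom) ++count;                  // the final "==" comparison
    return count;
}

// Total comparisons over the n successful searches in the list 0, 1, ..., n-1.
int total_successful(int n)
{
    std::vector<int> the_list(n);
    for (int i = 0; i < n; ++i) the_list[i] = i;
    int total = 0;
    for (int i = 0; i < n; ++i) total += comparisons_for(the_list, i);
    return total;
}
```

For n = 10, total_successful(10) gives 44, in agreement with the average of 44/10 = 4.4 comparisons per successful search.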

Page 28

2-trees

A 2-tree is a tree where every vertex except the leaves has exactly two children.

Lemma. The number of vertices on each level of a 2-tree is at most twice the number on the level immediately above. Hence, in a 2-tree, the number of vertices on level $t$ is at most $2^t$ for $t \ge 0$.

Lemma. If a 2-tree has $k$ vertices on level $t$, then $t \ge \lg k$.

Page 29

Some Notations

$\lg x = \log_2 x$

$\ln x = \log_e x$

The floor of $x$: $\lfloor x \rfloor$, which is the largest integer less than or equal to $x$.

The ceiling of $x$: $\lceil x \rceil$, which is the smallest integer greater than or equal to $x$.

For integer $n$, we have

$\lfloor n/2 \rfloor + \lceil n/2 \rceil = n$

$(n-1)/2 \le \lfloor n/2 \rfloor \le n/2$

$n/2 \le \lceil n/2 \rceil \le (n+1)/2$

Page 30

Analysis of binary_search_1

Some facts:

• The last step is an equality check against the target, hence both successful and unsuccessful searches terminate at leaves, and there are $2n$ leaves;

• All the leaves are on the same level or on two adjacent levels (exercise);

• Let $t$ be the maximum level. Then $t = \lceil \lg 2n \rceil$: the number of leaves is at most the largest possible number of vertices on level $t$ and greater than the largest possible number on level $t-1$, so $2n \le 2^t$ and $2n > 2^{t-1}$ for 2-trees, hence $t \ge \lg 2n$ and $t < \lg 2n + 1$.

Page 31

• The maximum number of key comparisons is $t = \lceil \lg 2n \rceil$, approximately $\lg n + 1$;

• The average number of comparisons is about $\lg n + 1$ (compute the external path length and then the average number of comparisons).

Page 32

Lower Bounds

Binary search does far fewer comparisons than sequential search.

• Question: Is there any algorithm that, in the worst case and the average case, can find its target using significantly fewer comparisons of keys than binary search?

• The answer is NO, as long as the algorithm is based on comparisons of keys. This can be proven by examining the comparison tree.

Page 33

• The height of the tree equals the number of comparisons that the algorithm does in its worst case;

• The external path length of the tree divided by the number of possible outcomes equals the average number of comparisons done by the algorithm.

Figure: a comparison tree based on key comparisons. Each internal node represents one comparison; leaves represent search results.

Page 34

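For a 2-tree with n leaves, the minimum values of its height and external path length occur when all the leaves are on the same level or on two adjacent levels.

Otherwise, some leaves are on level $r$ and some on level $s$ with $r > s + 1$. Detach two sibling leaves on level $r$ from their parent and attach them as children of a leaf on level $s$. The resulting 2-tree $T'$ has the same number of leaves, and

$E(T') = E(T) - 2r + (r-1) - s + 2(s+1) = E(T) - r + s + 1 < E(T)$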

Page 35

Observations on 2-trees

Lemma. Let $T$ be a 2-tree with $k$ leaves. Then the height $h$ of $T$ satisfies $h \ge \lceil \lg k \rceil$ and the external path length $E(T)$ satisfies $E(T) \ge k \lg k$. The minimum values for $h$ and $E(T)$ occur when all the leaves of $T$ are on the same level or on two adjacent levels.

Lemma. Suppose that an algorithm uses comparisons of keys to search for a target in a list. If there are $k$ possible outcomes, then the algorithm must make at least $\lceil \lg k \rceil$ comparisons of keys in its worst case and at least $\lg k$ in its average case.

Lemma. binary_search_1 is optimal in the class of all algorithms that search an ordered list by making comparisons of keys. In both the average and worst cases, binary_search_1 achieves the optimal bound.

Page 36

Summary

1. What is the search problem?

2. Sequential search, the algorithm and its features;

3. Binary search, the algorithm and its features;

4. Analyzing search, comparison trees, external path length, and how the number of comparisons is computed from the trees;

5. Why binary search is optimal among search methods using key comparisons;

6. How algorithms can be verified or proved correct.

7. Learn to use STL implementations of sequential search and binary search.

Page 37

Programming Exercises

1. Find sequential search in STL and see how to use it.

2. Find binary search in STL and see how to use it.

3. Solve Sicily 1194 using binary search.

4. Can you find the last occurrence of the given target?

5. How would you implement a search function that takes a predicate as a parameter?