Data Structures (数据结构) Course 2: Searching

Uploaded by: kassia

Posted on 14-Jan-2016

TRANSCRIPT

Page 1: Data Structures( 数据结构 ) Course 2:Searching

Data Structures (数据结构)

Course 2: Searching

西南财经大学天府学院 (Tianfu College of Southwestern University of Finance and Economics)

Vocabulary

sequential search 顺序查找
element 元素
order 次序
binary search 二分查找
target 目标
algorithm 算法
array 数组
location 位置
object 对象, 目标
parameter 参数
index 下标, 索引, 指针
sentinel 哨兵
probability 概率
key 关键字
hash 散列, 杂凑
collision 冲突
cluster 聚集, 群集
synonym 同义语, 同义词
probe 探测
load factor 装填因子


Searching

One of the most common and time-consuming operations in computer science.

To find the location of a target among a list of objects.


Main contents (in Chapter 2)

List searching (two basic search algorithms):
• Sequential search (including three variations)
• Binary search

Hashed list searching: the key, through an algorithmic function, determines the location of the data

Collision resolution

The list search algorithms are discussed using an array structure.


2-1 List searches (working with arrays)

The algorithm used to search a list depends on the structure of the list.

Sequential search (works with any array) is used when:
• the list is not ordered
• the list is small
• the list is not searched often


Figure: Locating data in an unordered list

A[0]..A[11] = 4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10

Target given: 14. Location wanted: 3.


Search Concept

Target given: 14. Location wanted: 3.

A[0]..A[11] = 4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10

Index 0: 14 not equal 4, move on
Index 1: 14 not equal 21, move on
Index 2: 14 not equal 36, move on
Index 3: 14 equal 14, target found at index 3


Search Concept


Sequential search algorithms

The algorithm needs to tell the calling algorithm two things:
• Did it find the data it was looking for?
• If it did, at what index are the target data found?

It requires four parameters:
• The list we are searching
• An index to the last element in the list
• The target
• The address where the found element's index location is to be stored

It returns a Boolean (found or not found).


Sequential search algorithm

algorithm seqsearch (val list <array>, val last <index>, val target <keytype>, ref locn <index>)
Locate the target in an unordered list of elements.
   Pre   list must contain at least one element
         last is index to last element in the list
         target contains the data to be located
         locn is address of index in calling algorithm
   Post  if found: matching index stored in locn and found true
         if not found: last stored in locn and found false
   Return found <boolean>
   looker = 0
   loop (looker < last AND target not equal list[looker])
      looker = looker + 1
   end loop
   locn = looker
   if (target equal list[looker])
      found = true
   else
      found = false
   end if
   return found
end seqsearch
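To try the seqsearch algorithm in running code, here is a minimal Python sketch. It is an illustration, not the course's reference code. The name `seq_search` is hypothetical, and because Python has no reference parameters, the sketch returns a `(found, locn)` tuple instead of writing to `locn`.

```python
def seq_search(lst, last, target):
    """Sequential search of lst[0..last]; returns (found, locn).

    Mirrors seqsearch: if the target is not found, locn is last.
    """
    looker = 0
    # Stop at the last element or at the first match, whichever comes first.
    while looker < last and target != lst[looker]:
        looker += 1
    found = target == lst[looker]
    return found, looker


data = [4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10]
print(seq_search(data, len(data) - 1, 14))  # (True, 3)
print(seq_search(data, len(data) - 1, 99))  # (False, 11)
```

The loop needs two tests per element (end of list and match). The sentinel variation removes one of them.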


Variations on sequential searches

Sentinel search

Probability search

Ordered list search


Sentinel search

algorithm sentinelsearch (val list <array>, val last <index>, val target <keytype>, ref locn <index>)
Locate the target in an unordered list of elements.
   Pre   list must contain at least one element, and list[last + 1] must exist to hold the sentinel
         last is index to last element in the list
         target contains the data to be located
         locn is address of index in calling algorithm
   Post  if found: matching index stored in locn and found true
         if not found: last stored in locn and found false
   Return found <boolean>
   list[last + 1] = target
   looker = 0
   loop (target not equal list[looker])
      looker = looker + 1
   end loop
   if (looker <= last)
      found = true
      locn = looker
   else
      found = false
      locn = last
   end if
   return found
end sentinelsearch
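Here is a minimal Python sketch of the sentinel search, using the same conventions: a hypothetical name `sentinel_search` and a `(found, locn)` tuple as the return value. It assumes the list has a spare slot at index `last + 1`. Restoring that slot afterwards is a design choice of this sketch, not part of the pseudocode.

```python
def sentinel_search(lst, last, target):
    """Sentinel search of lst[0..last]; returns (found, locn).

    Assumes lst has a spare slot at index last + 1 for the sentinel.
    """
    saved = lst[last + 1]          # remember the spare slot's contents
    lst[last + 1] = target         # the sentinel guarantees the loop stops
    looker = 0
    while target != lst[looker]:   # only one test per iteration
        looker += 1
    lst[last + 1] = saved          # restore the spare slot
    if looker <= last:
        return True, looker
    return False, last


data = [4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10, None]  # None = spare slot
print(sentinel_search(data, 11, 14))  # (True, 3)
print(sentinel_search(data, 11, 99))  # (False, 11)
```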


Probability search

Locate the target in an unordered list, moving each found element one position toward the front.
   Pre   the same as sequential search
   Post  if found: matching index stored in locn, found true, and the element moved up in priority
         if not found: the same as sequential search (last stored in locn, found false)
   Return found <boolean>
   looker = 0
   loop (looker < last AND target not equal list[looker])
      looker = looker + 1
   end loop
   if (target equal list[looker])
      found = true
      if (looker > 0)
         temp = list[looker - 1]
         list[looker - 1] = list[looker]
         list[looker] = temp
         looker = looker - 1
      end if
   else
      found = false
   end if
   locn = looker
   return found
end probability search
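Here is a minimal Python sketch of the probability search above. The name `probability_search` is hypothetical, and the sketch returns `(found, locn)` as in the earlier sketches. Each successful search swaps the found element with its predecessor, so frequently requested elements drift toward the front of the list.

```python
def probability_search(lst, last, target):
    """Sequential search that moves a found element one place toward the front.

    Returns (found, locn); on success locn is the element's new index.
    """
    looker = 0
    while looker < last and target != lst[looker]:
        looker += 1
    if target != lst[looker]:
        return False, looker          # not found: locn is last, as in seqsearch
    if looker > 0:
        # Swap with the predecessor to raise the element's priority.
        lst[looker - 1], lst[looker] = lst[looker], lst[looker - 1]
        looker -= 1
    return True, looker


data = [4, 21, 36, 14, 62]
print(probability_search(data, 4, 14), data)  # (True, 2) [4, 21, 14, 36, 62]
```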


Ordered list search

Locate the target in a list ordered on the target's key.
   Pre   the same as sequential search
   Post  if found: the same as sequential search
         if not found: locn is the index of the first element > target, or locn equals last; found is false
   Return found <boolean>
   if (target <= list[last])
      looker = 0
      loop (target > list[looker])
         looker = looker + 1
      end loop
   else
      looker = last
   end if
   if (target equal list[looker])
      found = true
   else
      found = false
   end if
   locn = looker
   return found

Notes:
• It is not necessary to search to the end of the list: the search stops at the first element that is greater than or equal to the target.
• It is only suitable for small lists.
• It incorporates the sentinel idea: when target <= list[last], the last element stops the loop, so only one test is needed per iteration.
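Here is a minimal Python sketch of the ordered list search, assuming a list sorted in ascending order. The name `ordered_list_search` is hypothetical. When the target is no larger than the last element, `lst[last]` acts as a built-in sentinel.

```python
def ordered_list_search(lst, last, target):
    """Sequential search of an ascending list lst[0..last]; returns (found, locn).

    If not found, locn is the first element > target, or last when the
    target is larger than every element.
    """
    if target <= lst[last]:
        looker = 0
        while target > lst[looker]:   # lst[last] >= target stops the loop
            looker += 1
    else:
        looker = last
    return target == lst[looker], looker


data = [4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91]
print(ordered_list_search(data, 11, 22))  # (True, 6)
print(ordered_list_search(data, 11, 11))  # (False, 4)
print(ordered_list_search(data, 11, 95))  # (False, 11)
```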


Binary search

The sequential search algorithm is very slow
– but it is the only solution if the array is not sorted.

Binary search (ordered list)
– for large lists
– first sort
– then search


Binary search method

Suppose L is a sorted list and we are searching for a value X.

1. Compare X to the middle value (M) in L.
2. If X = M, we are done.
3. If X < M, we continue our search, but we can confine it to the first half of L and entirely ignore the second half of L.
4. If X > M, we continue, but confine ourselves to the second half of L.


Target found: target 22 is in the list

A[0]..A[11] = 4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91

first = 0, mid = 5, last = 11: 22 > 21, so first = 6
first = 6, mid = 8, last = 11: 22 < 62, so last = 7
first = 6, mid = 6, last = 7: 22 = 22, target found


Target not found: target 11 is not in the list

A[0]..A[11] = 4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91

first = 0, mid = 5, last = 11: 11 < 21, so last = 4
first = 0, mid = 2, last = 4: 11 > 8, so first = 3
first = 3, mid = 3, last = 4: 11 > 10, so first = 4
first = 4, mid = 4, last = 4: 11 < 14, so last = 3
first = 4, last = 3: first > last, so the function terminates


Binary search (ordered list)

algorithm binary_search (val list <array>, val end <index>, val target <keytype>, ref locn <index>)
Search an ordered list using binary search.
   Pre   list is ordered; it must contain at least one element
         end is index to the largest element in the list
         target is the value of element being sought
         locn is address of index in calling algorithm
   Post  found: locn assigned index to target element; found set true
         not found: locn = element below or above target; found set false
   Return found <boolean>
   first = 0
   last = end
   loop (first <= last)
      mid = (first + last) / 2        (integer division)
      if (target > list[mid])
         first = mid + 1              (look in upper half)
      else if (target < list[mid])
         last = mid - 1               (look in lower half)
      else
         first = last + 1             (found: force exit)
      end if
   end loop
   locn = mid
   if (target equal list[mid])
      found = true
   else
      found = false
   end if
   return found
end binary_search
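Here is a minimal Python sketch of binary_search, returning `(found, locn)` instead of using a reference parameter. It uses `break` in place of the pseudocode's `first = last + 1` to force the exit; the effect is the same. Running it on the list from the trace slides reproduces both traces.

```python
def binary_search(lst, end, target):
    """Binary search of the ascending list lst[0..end]; returns (found, locn)."""
    first, last = 0, end
    mid = 0
    while first <= last:
        mid = (first + last) // 2   # integer division, as in the pseudocode
        if target > lst[mid]:
            first = mid + 1         # look in the upper half
        elif target < lst[mid]:
            last = mid - 1          # look in the lower half
        else:
            break                   # found: force exit
    return target == lst[mid], mid


data = [4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91]
print(binary_search(data, 11, 22))  # (True, 6)
print(binary_search(data, 11, 11))  # (False, 4)
```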


Analyzing the efficiency

Sequential search, sentinel search, ordered list search: $O(n)$
Binary search: $O(\log_2 n)$

Comparison of binary and sequential searches (number of comparisons):

Size        Binary   Sequential (average)   Sequential (worst case)
16          4        8                      16
10,000      14       5,000                  10,000
1,000,000   20       500,000                1,000,000
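The figures in the table appear consistent with $\lceil \log_2 n \rceil$ comparisons for the binary search, $n/2$ for the average sequential search, and $n$ for the worst case. That reading is an assumption, since the slide does not state the formulas. The short Python sketch below regenerates the table under it (`binary_comparisons` is a hypothetical helper).

```python
import math


def binary_comparisons(n):
    """Approximate comparisons for a binary search of n elements: ceil(log2 n)."""
    return math.ceil(math.log2(n))


for n in (16, 10_000, 1_000_000):
    # size, binary, sequential average (n / 2), sequential worst case (n)
    print(n, binary_comparisons(n), n // 2, n)
```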


2-3 Hashed list searches

Hash function: key → location of data

Ideal search: we would know exactly where the data are and go directly there.

Goal of a hashed search: to find the data with only one test.

Use an array of data.

Hash algorithm: key → index of the array (address in the list)


Figure 2-6 Hash concept: a hash function maps each key to an address in the array [000]–[100]. The keys shown are 102002 (Vu nguyen), 107095 (John adams) and 111060 (Sarah trapp); the hash function produces the addresses 5, 100 and 002. The array also holds Harry lee.


Basic Concepts

Hash search: a search in which the key, through an algorithmic function, determines the location of the data.

We use a hashing algorithm to transform the key into the index that contains the data we need to locate (a key-to-address transformation).


Problem

Synonyms: a set of keys that hash to the same location.

Collision: occurs when a list contains two or more synonyms, so a key's home address is already occupied.

Home address: the address produced by the hashing algorithm.

Collision resolution: when two keys collide at a home address, one of the keys and its data are placed in another location.

Prime area: the memory that contains all of the home addresses.


Figure 2-7 The collision resolution concept (list positions [0] to [16]): keys A, B and C are hashed in that order (1. hash(A), 2. hash(B), 3. hash(C)). B and A collide at 8; C and B collide at 16. Each collision is resolved by placing the key in another location.


Locate an element in a hashed list

Use the same algorithm that was used to insert it into the list.

First hash the key and check the home address to see whether it contains the element.
If it does, the search is complete.
If not, use the collision resolution algorithm to determine the next location, and continue until we find the element or determine that it is not in the list.

Each calculation of an address and test for success is called a probe.


Hashing methods

Figure 2-8 Basic hashing techniques: direct, subtraction, modulo division, digit extraction, midsquare, folding, rotation, pseudorandom generation.


Direct method

The key is the address (each element has its own key, so there are no synonyms).

Example 1: total monthly sales by the days of the month. Create an array of 31 accumulators.

The accumulation code is:

dailySales[sale.day] = dailySales[sale.day] + sale.amount;
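The accumulation line can be tried out with the following Python sketch. The `Sale` record and the `record` function are hypothetical. The array has 32 slots rather than 31, an assumption made so that days 1 to 31 can be used directly as 0-based indexes (slot 0 is unused).

```python
from dataclasses import dataclass


@dataclass
class Sale:
    day: int        # day of the month, 1..31
    amount: float


# Slot 0 is unused so that sale.day can be used directly as the address.
daily_sales = [0.0] * 32


def record(sale):
    # Direct hashing: the key (the day) is the address; no synonyms are possible.
    daily_sales[sale.day] = daily_sales[sale.day] + sale.amount


for s in (Sale(1, 10.0), Sale(31, 5.5), Sale(1, 2.5)):
    record(s)
print(daily_sales[1], daily_sales[31])  # 12.5 5.5
```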


Example 2: a small company has fewer than 100 employees. Each employee number is between 1 and 100, so the employee number can be used directly as the address.

Figure 2-9 Direct hashing of employee numbers: [000] not used; [001] Harry lee; [002] Sarah trapp; [005] Vu nguyen; [100] John adams. The keys 005, 100 and 002 hash to the addresses 5, 100 and 2.


Subtraction method

• Keys are consecutive but do not start from 1, such as your student ID number; the address is obtained by subtracting a fixed number from the key.

Advantages:
• The hashing function is very simple.
• There are no collisions.

Disadvantage:
• Only for small lists.
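Here is a minimal Python sketch of subtraction hashing. The ID range 2015001 to 2015100 and the names `FIRST_ID` and `subtraction_hash` are hypothetical, chosen only to show consecutive keys that do not start from 1.

```python
FIRST_ID = 2015001   # hypothetical: student IDs run 2015001..2015100
LIST_SIZE = 100


def subtraction_hash(student_id):
    """Subtraction hashing: subtract a fixed number from the key."""
    return student_id - FIRST_ID


roster = [None] * LIST_SIZE
for sid, name in ((2015001, "first"), (2015042, "middle"), (2015100, "last")):
    roster[subtraction_hash(sid)] = name   # one key per address, no collisions
print(subtraction_hash(2015042))  # 41
```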


Notes:

1. Generally speaking, hashed lists require some empty elements to reduce the number of collisions.

2. The two methods above (direct and subtraction) are ideal, but their applications are very limited (consider keys such as ID card numbers).


Modulo-division method (division remainder)

This method divides the key by the array size and uses the remainder as the address.

The hashing algorithm is:

address = key modulo listsize

Note: a prime-number listsize produces fewer collisions.


Figure 2-10 Modulo-division hashing (listsize = 307): the keys 121267, 045128 and 379452 hash to the addresses 2, 306 and 0. The list [000]–[306] holds 379452 Marry Dodd at [000], 121267 Bryan Devaux at [002], 378845 John Carver, 160252 Tuan Ngo, and 045128 Shouli Feldman at [306].
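Here is a minimal Python sketch of modulo-division hashing with listsize = 307, checked against the keys in Figure 2-10. The name `modulo_division_hash` is hypothetical. Key 045128 is written as `45128` because Python integer literals cannot start with 0.

```python
def modulo_division_hash(key, list_size=307):
    """Modulo-division hashing: address = key modulo list_size."""
    return key % list_size


for key in (379452, 121267, 378845, 160252, 45128):
    print(f"{key:06d} -> {modulo_division_hash(key):03d}")
```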


Digit-extraction method

Selected digits are extracted from the key and used as the address.

Example: 6-digit employee numbers are mapped to 3-digit addresses by selecting the first, third and fourth digits:
379452 → 394
121267 → 112
378845 → 388
160252 → 102
045128 → 051
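Here is a minimal Python sketch of digit extraction that reproduces the example above. The function name and its `positions`/`width` parameters are hypothetical. Positions are 1-based, and the key is zero-padded so that a key like 045128 keeps its leading zero.

```python
def digit_extraction_hash(key, positions=(1, 3, 4), width=6):
    """Build an address from selected digits (1-based positions) of the key."""
    digits = f"{key:0{width}d}"   # keep leading zeros, e.g. 045128
    return int("".join(digits[p - 1] for p in positions))


for key in (379452, 121267, 378845, 160252, 45128):
    print(f"{key:06d} -> {digit_extraction_hash(key):03d}")
```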


Midsquare method

The key is squared and the address is selected from the middle of the squared number.

Limitation: the size of the key (squaring a long key can produce a number too large to handle).

Example with 4-digit keys: 9452 * 9452 = 89340304; the address is 3403.

Variation: select a portion of the key. Select digits 1-3 of each 6-digit key, square that portion, fill the result with leading zeros to 6 digits, and select digits 3-5 as the address:
379452: 379 * 379 = 143641 → 364
121267: 121 * 121 = 014641 → 464
378845: 378 * 378 = 142884 → 288
160252: 160 * 160 = 025600 → 560
045128: 045 * 045 = 002025 → 202
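Here is a minimal Python sketch of both midsquare versions. The names and parameters are hypothetical. For the 4-digit case it assumes "the middle" means the central 4 digits of the square padded to 8 digits.

```python
def midsquare_hash(key, key_width=4, addr_digits=4):
    """Square the key and take addr_digits digits from the middle of the square."""
    square = f"{key * key:0{2 * key_width}d}"   # pad the square to 2 * key_width digits
    start = (len(square) - addr_digits) // 2
    return int(square[start:start + addr_digits])


def midsquare_portion_hash(key):
    """Variation: square digits 1-3 of a 6-digit key, pad to 6 digits, keep digits 3-5."""
    part = int(f"{key:06d}"[:3])
    square = f"{part * part:06d}"
    return int(square[2:5])


print(midsquare_hash(9452))  # 3403
print([midsquare_portion_hash(k) for k in (379452, 121267, 378845, 160252, 45128)])
```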


Folding methods: fold shift and fold boundary

Figure 2-11 Hash fold examples (key 123456789, split into 123, 456, 789):
(a) Fold shift: 123 + 456 + 789 = 1368; the leading 1 is discarded, giving the address 368.
(b) Fold boundary: the digits of the left and right parts are reversed: 321 + 456 + 987 = 1764; the leading 1 is discarded, giving the address 764.
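Here is a minimal Python sketch of both folding methods. It assumes the key is given as a digit string of at least two parts, so leading zeros survive. The names `fold_shift` and `fold_boundary` are hypothetical. Discarding the carry digit is done with modulo $10^{part}$.

```python
def fold_shift(key: str, part: int = 3) -> int:
    """Split the key into address-sized parts and add them."""
    parts = [int(key[i:i + part]) for i in range(0, len(key), part)]
    return sum(parts) % 10 ** part   # discard digits beyond the address size


def fold_boundary(key: str, part: int = 3) -> int:
    """Like fold shift, but the left and right parts are reversed first."""
    chunks = [key[i:i + part] for i in range(0, len(key), part)]
    chunks[0] = chunks[0][::-1]      # reverse the left part
    chunks[-1] = chunks[-1][::-1]    # reverse the right part
    return sum(int(c) for c in chunks) % 10 ** part


print(fold_shift("123456789"), fold_boundary("123456789"))  # 368 764
```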


Rotation method: used in combination with other methods

Useful when keys are assigned serially.

Figure 2-12 Rotation hashing (the rightmost digit is rotated to the left):
Original key   Rotated key
600101         160010
600102         260010
600103         360010
600104         460010
600105         560010
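Here is a minimal Python sketch of rotation followed by modulo division. The list size 307 is an assumption borrowed from the earlier modulo-division example, and `rotate_right` is a hypothetical name. Without rotation, the serial keys 600101 to 600105 would land in the neighbouring addresses 223 to 227; after rotation they are spread out.

```python
def rotate_right(key: str) -> str:
    """Move the rightmost digit of the key to the leftmost position."""
    return key[-1] + key[:-1]


keys = ("600101", "600102", "600103", "600104", "600105")
for key in keys:
    rotated = rotate_right(key)
    # Combined with modulo division (list size 307, an assumption).
    print(key, "->", rotated, "->", int(rotated) % 307)
```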


Pseudorandom method

In this method, the key is used as the seed in a pseudorandom number generator, and the resulting random number is scaled into the possible address range using modulo division.

A common random generator is y = ax + c. For efficiency, the factors a and c should be prime numbers; for example, a = 17, c = 7.


Figure: Pseudorandom hashing (a = 17, c = 7, listsize = 307):
(17 * 121267 + 7) modulo 307 = 41, so 121267 Bryan Devaux is stored at [041]
(17 * 045128 + 7) modulo 307 = 297, so 045128 Shouli Feldman is stored at [297]
(17 * 379452 + 7) modulo 307 = 7, so 379452 Marry Dodd is stored at [007]
The list [000]–[306] also holds 378845 John Carver and 160252 Tuan Ngo.
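Here is a minimal Python sketch of pseudorandom hashing with a = 17, c = 7 and listsize = 307, checked against the three computations in the figure. The name `pseudorandom_hash` is hypothetical.

```python
def pseudorandom_hash(key, a=17, c=7, list_size=307):
    """Seed y = a*x + c with the key, then scale y into 0..list_size-1 by modulo division."""
    return (a * key + c) % list_size


for key in (121267, 45128, 379452):
    print(f"{key:06d} -> {pseudorandom_hash(key):03d}")
```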


Hash Algorithm

1. Convert the alphanumeric key into a number by adding the American Standard Code for Information Interchange (ASCII) value of each character to an accumulator.
2. Rotate the bits in the address to maximize the distribution of the values.
3. Take the absolute value of the address and map it into the address range.


Hash Algorithm

algorithm hash (val key <array>, val size <integer>, val maxAddr <integer>, ref addr <integer>)
This algorithm converts an alphanumeric key of size characters into an integral address.
   Pre   key is a key to be hashed
         size is the number of characters in the key
         maxAddr is the maximum possible address for the list
   Post  addr contains the hashed address
   looper = 0
   addr = 0
   (hash key)
   loop (looper < size)
      if (key[looper] not space)
         addr = addr + key[looper]
         rotate addr 12 bits right
      end if
      looper = looper + 1
   end loop
   (test for negative address)
   if (addr < 0)
      addr = absolute(addr)
   end if
   addr = addr modulo maxAddr
   return
end hash
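Here is a minimal Python sketch of the hash algorithm. Python integers are unbounded, so the sketch assumes a 32-bit two's-complement address word. That assumption is not in the slide, but it is what makes "rotate 12 bits right" and the negative-address test meaningful. The name `hash_key` is hypothetical.

```python
def hash_key(key: str, max_addr: int) -> int:
    """Hash an alphanumeric key into 0..max_addr-1 (assumes a 32-bit address word)."""
    WORD = 32
    MASK = (1 << WORD) - 1
    addr = 0
    for ch in key:
        if ch != " ":                                           # spaces are skipped
            addr = (addr + ord(ch)) & MASK                      # add the ASCII code
            addr = ((addr >> 12) | (addr << (WORD - 12))) & MASK  # rotate 12 bits right
    if addr >= 1 << (WORD - 1):        # reinterpret the bit pattern as signed
        addr -= 1 << WORD
    addr = abs(addr)                   # test for a negative address
    return addr % max_addr


print(hash_key("Harry lee", 307))
```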


2-4 Collision resolution

Except for the direct and subtraction methods, none of the hashing methods is a one-to-one mapping, so collisions cannot be avoided. There are several methods for resolving hashing collisions.

Figure 2-13 Collision resolution methods:
• Open addressing: linear probe, quadratic probe, pseudorandom, key offset
• Linked lists
• Buckets


Several concepts

Load factor
• There must be some empty elements in a list:

$$\text{load factor} = \frac{\text{number of filled elements}}{\text{total number of elements}} \times 100\% < 75\%$$

Clustering
• The tendency of data to group within the list (to build up unevenly across a hashed list).
• A high degree of clustering increases the number of probes needed to locate an element and reduces the processing efficiency of the list. There are two kinds:
  • Primary clustering: when data cluster around a home address.
  • Secondary clustering: when data become grouped along a collision path throughout a list.
• Hashing algorithms need to be designed to minimize clustering.
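Here is a minimal Python sketch of the load-factor formula. It assumes `None` marks an empty element, and the 20-element list is hypothetical.

```python
def load_factor(table):
    """Percentage of filled elements; None marks an empty element (an assumption)."""
    filled = sum(1 for slot in table if slot is not None)
    return filled / len(table) * 100


table = [None] * 20
for i in (0, 3, 7, 12, 15):
    table[i] = i
print(load_factor(table))        # 25.0
print(load_factor(table) < 75)   # True
```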


Open addressing

Resolves collisions in the prime area (the area that contains all of the home addresses). Methods:
• Linear probe
• Quadratic probe
• Double hashing: pseudorandom and key offset


Linear Probe

Figure 2-14 Linear probe collision resolution (listsize = 307): the list holds 379452 Marry Dodd at [000], 121267 Bryan Devaux at [002], 378845 John Carver, 160252 Tuan Ngo and 045128 Shouli Feldman at [306].
First insert: 070918 Sarah Trapp hashes to [001]; there is no collision.
Second insert: 166702 Harry eagle also hashes to [001], a collision; the probe adds 1 until it finds an empty element, and the record is stored at [003].


Linear probe

Variation: add 1, subtract 2, add 3, subtract 4, …

Advantage: simple to implement.

Disadvantages: first, it tends to produce primary clustering; second, it tends to make the search algorithm more complex.
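Here is a minimal Python sketch of linear probing with modulo-division home addresses (listsize = 307), replaying Figure 2-14. The function names are hypothetical. The search stops at the first empty element, which assumes that no records are ever deleted.

```python
LIST_SIZE = 307


def home_address(key):
    return key % LIST_SIZE                      # modulo-division hashing


def linear_probe_insert(table, key, data):
    """Store at the home address, or at the next empty element (add 1, wrapping)."""
    addr = home_address(key)
    for _ in range(LIST_SIZE):
        if table[addr] is None:
            table[addr] = (key, data)
            return addr
        addr = (addr + 1) % LIST_SIZE           # collision: probe the next element
    raise OverflowError("list is full")


def linear_probe_search(table, key):
    """Follow the same probe path; an empty element ends an unsuccessful search."""
    addr = home_address(key)
    for _ in range(LIST_SIZE):
        if table[addr] is None:
            return None
        if table[addr][0] == key:
            return addr
        addr = (addr + 1) % LIST_SIZE
    return None


table = [None] * LIST_SIZE
for key, name in [(379452, "Marry Dodd"), (121267, "Bryan Devaux"),
                  (378845, "John Carver"), (160252, "Tuan Ngo"),
                  (45128, "Shouli Feldman"), (70918, "Sarah Trapp"),
                  (166702, "Harry eagle")]:
    linear_probe_insert(table, key, name)
print(linear_probe_search(table, 166702))  # 3
```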


Quadratic probe

Aim: to eliminate primary clustering.

The increment is the collision probe number squared: for the first probe add 1², for the second probe add 2², and so on. The new address is taken modulo the list size.

Disadvantages:
1. The time required to square the probe number.
2. It is not possible to generate a new address for every element in the list.
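Here is a minimal Python sketch of a quadratic probe path. It assumes each increment $i^2$ is added to the home address, a common reading of "add 1², add 2², …"; the slide does not spell this out. The sketch also illustrates the second disadvantage: with the prime list size 307, only $(307 + 1)/2 = 154$ distinct addresses are ever reached. The function name is hypothetical.

```python
def quadratic_probe_sequence(home, list_size, probes):
    """Home address, then (home + i*i) % list_size for i = 1..probes."""
    return [(home + i * i) % list_size for i in range(probes + 1)]


print(quadratic_probe_sequence(1, 307, 4))  # [1, 2, 5, 10, 17]
# Disadvantage 2: many elements are never generated as a new address.
print(len(set(quadratic_probe_sequence(1, 307, 306))))  # 154
```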


Pseudorandom collision resolution

A form of double hashing: the address is rehashed. It uses a pseudorandom number to resolve the collision, using the collision address as a factor in the random-number calculation, such as:

new address = 3 * collision address + 5

Figure 2-15 shows this method resolving the collision from Figure 2-14.


Pseudorandom probe

Figure 2-15 Pseudorandom collision resolution (listsize = 307): the rest of the list is as in Figure 2-14 (379452 Marry Dodd, 121267 Bryan Devaux, 378845 John Carver, 160252 Tuan Ngo, 045128 Shouli Feldman).
First insert: 070918 Sarah Trapp hashes to [001]; there is no collision.
Second insert: 166702 Harry eagle also hashes to [001], a collision; the pseudorandom function y = 3x + 5 gives 3 * 1 + 5 = 8, and the record is stored at [008].
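Here is a minimal Python sketch of pseudorandom collision resolution with y = 3x + 5, replaying Figure 2-15. Taking the result modulo the list size is an assumption; the slide's formula omits it, but it keeps addresses in range. The function names are hypothetical. Because a rehash path can cycle without visiting every element, the sketch gives up after `LIST_SIZE` probes.

```python
LIST_SIZE = 307


def pseudorandom_next(collision_addr, a=3, c=5):
    """Rehash a collision address with y = a*x + c, kept in range by modulo LIST_SIZE."""
    return (a * collision_addr + c) % LIST_SIZE


def pseudorandom_insert(table, key, data):
    addr = key % LIST_SIZE                 # home address by modulo division
    for _ in range(LIST_SIZE):
        if table[addr] is None:
            table[addr] = (key, data)
            return addr
        addr = pseudorandom_next(addr)     # collision: rehash the address
    raise OverflowError("no empty element found on the probe path")


table = [None] * LIST_SIZE
for key, name in [(379452, "Marry Dodd"), (121267, "Bryan Devaux"),
                  (378845, "John Carver"), (160252, "Tuan Ngo"),
                  (45128, "Shouli Feldman"), (70918, "Sarah Trapp")]:
    pseudorandom_insert(table, key, name)
print(pseudorandom_insert(table, 166702, "Harry eagle"))  # 8
```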


Key offset

Another double-hashing method; it produces different collision paths for different keys. In its simplest version, key offset calculates the new address as:

offset = key / listsize (integer division)
address = ((offset + old address) modulo listsize)


Example: the key is 166702 and the list size is 307. Modulo division generates an address of 1. This key is a synonym of 070918 and produces a collision at 1. Using key offset to calculate the next address:

offset = 166702 / 307 = 543
address = ((543 + 001) modulo 307) = 237

If 237 were also a collision, repeat the process:

offset = 166702 / 307 = 543
address = ((543 + 237) modulo 307) = 166


To really see the effect of key offset, we need to calculate the paths of several different keys, all hashing to the same home address. Table 2-3 shows three keys that collide at address 001, with their next two collision probe addresses.

Table 2-3 Key offset
Key      Home address   Key offset   Probe 1   Probe 2
166702   001            543          237       166
572556   001            1865         024       047
067234   001            219          220       132

Note: each key resolves its collision at a different address for both the first and second probes.
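Here is a minimal Python sketch that computes key-offset probe paths, using integer division for the offset. The name `key_offset_path` is hypothetical. Its output matches Table 2-3 (067234 is written as `67234` because Python integer literals cannot start with 0).

```python
def key_offset_path(key, list_size=307, probes=2):
    """Home address followed by the key-offset rehash addresses."""
    addr = key % list_size
    offset = key // list_size          # integer division
    path = [addr]
    for _ in range(probes):
        addr = (offset + addr) % list_size
        path.append(addr)
    return path


for key in (166702, 572556, 67234):
    print(f"{key:06d}", key_offset_path(key))
```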


Linked list resolution

Linked list resolution eliminates a disadvantage of open addressing: each collision resolution increases the probability of future collisions.

A linked list is an ordered collection of data in which each element contains the location of the next element.


Figure 2-16 Linked list collision resolution: the prime area [000]–[306] holds 379452 Marry Dodd at [000], 070918 Sarah Trapp at [001], 121267 Bryan Devaux, …, 160252 Tuan Ngo and 045128 Shouli Feldman. The synonyms of 070918, namely 166702 Harry eagle and 572556 Chris Wallj, are chained from [001] by pointers.


Linked list resolution

Linked list resolution uses a separate area to store collisions and chains all synonyms together in a linked list.
It uses two storage areas: the prime area and the overflow area.
Each element in the prime area contains an additional field: a link head pointer.
The linked list data can be stored in any order, but the most common is key sequence.
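Here is a minimal Python sketch of linked list resolution, replaying Figure 2-16. The `Node` and `LinkedListHash` classes are hypothetical. Each prime-area node's `link` field plays the role of the link head pointer, and the overflow chain behind it is kept in key sequence.

```python
class Node:
    def __init__(self, key, data, link=None):
        self.key, self.data, self.link = key, data, link


class LinkedListHash:
    def __init__(self, size=307):
        self.size = size
        self.prime = [None] * size   # prime area: one node (or None) per home address

    def insert(self, key, data):
        addr = key % self.size
        home = self.prime[addr]
        if home is None:
            self.prime[addr] = Node(key, data)
            return
        # Collision: insert into the overflow chain after the home node, in key sequence.
        prev, cur = home, home.link
        while cur is not None and cur.key < key:
            prev, cur = cur, cur.link
        prev.link = Node(key, data, cur)

    def search(self, key):
        node = self.prime[key % self.size]
        while node is not None:
            if node.key == key:
                return node.data
            node = node.link
        return None

    def chain_keys(self, addr):
        keys, node = [], self.prime[addr]
        while node is not None:
            keys.append(node.key)
            node = node.link
        return keys


h = LinkedListHash()
for key, name in [(379452, "Marry Dodd"), (70918, "Sarah Trapp"),
                  (121267, "Bryan Devaux"), (572556, "Chris Wallj"),
                  (166702, "Harry eagle")]:
    h.insert(key, name)
print(h.chain_keys(1))  # [70918, 166702, 572556]
```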


Bucket hashing

Uses nodes (buckets) that accommodate multiple data occurrences; collisions are postponed until the bucket is full.

Figure 2-17 Bucket hashing:
Bucket 0 [000]: 379452 Marry Dodd
Bucket 1 [001]: 070918 Sarah Trapp, 166702 Harry eagle, 367173 Ann georgis
Bucket 2 [002]: 121267 Bryan Devaux, 572556 Chris wallj (bucket 1 is full, so a linear probe places 572556 here)
…
Bucket 306 [306]: 045128 Shouli Feldman


Two problems and combination approaches

First, bucket hashing uses significantly more space, because many of the buckets will be empty or partially empty.
Second, it does not completely resolve the collision problem: when a bucket is full, the collision must still be resolved, for example with a linear probe.
There are several approaches to resolving collisions, and implementations often combine multiple steps. Example: one large database hashes to a bucket; if the bucket is full, it uses a linear probe, and it then falls back to a linked-list overflow area.
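Here is a minimal Python sketch of bucket hashing with a linear probe to the next bucket when the home bucket is full. It assumes three records per bucket, as drawn in Figure 2-17, and modulo-division home addresses with 307 buckets. The function name is hypothetical.

```python
BUCKET_CAPACITY = 3   # assumption: three records per bucket, as drawn in Figure 2-17
LIST_SIZE = 307


def bucket_insert(buckets, key, data):
    """Hash to a bucket; if it is full, linear-probe to the next bucket with room."""
    addr = key % LIST_SIZE
    for _ in range(LIST_SIZE):
        if len(buckets[addr]) < BUCKET_CAPACITY:
            buckets[addr].append((key, data))
            return addr
        addr = (addr + 1) % LIST_SIZE
    raise OverflowError("every bucket is full")


buckets = [[] for _ in range(LIST_SIZE)]
for key, name in [(379452, "Marry Dodd"), (70918, "Sarah Trapp"),
                  (166702, "Harry eagle"), (367173, "Ann georgis"),
                  (121267, "Bryan Devaux"), (45128, "Shouli Feldman")]:
    bucket_insert(buckets, key, name)
print(bucket_insert(buckets, 572556, "Chris wallj"))  # 2
```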


Summary

Searching is the process of finding the location of a target among a list of objects.
There are two basic searching methods for arrays: sequential search and binary search.
The sequential search is normally used when a list is not sorted. It starts at the beginning of the list and searches until it finds the data or hits the end of the list.
One variation of the sequential search is the sentinel search. In this method, the conditions ending the search are reduced to only one by artificially inserting the target at the end of the list.
The second variation of the sequential search is called the probability search. In this method, the list is ordered with the most probable elements at the beginning of the list and the least probable at the end.


2-5 Summary (continued)

The sequential search can also be used to search a sorted list; in this case, we can terminate the search when the target is less than the current element.
If an array is sorted, we can use a more efficient algorithm called the binary search.
The binary search algorithm searches the list by first checking the middle element. If the target is not in the middle element, the algorithm eliminates the upper half or the lower half of the list, depending on the value of the middle element. The process continues until the target is found or the reduced list length becomes zero.
The efficiency of a sequential search is $O(n)$.
The efficiency of a binary search is $O(\log_2 n)$.


Summary (continued)

In a hashed search, the key, through an algorithmic transformation, determines the location of the data. It is a key-to-address transformation.
There are several hashing functions; we discussed direct, subtraction, modulo division, digit extraction, midsquare, folding, rotation, and pseudorandom generation.


Summary (continued)

In direct hashing, the key is the address without any algorithmic manipulation.
In subtraction hashing, the key is transformed to an address by subtracting a fixed number from it.
In modulo-division hashing, the key is divided by the list size, which is recommended to be a prime number, and the remainder is used as the address.
In digit-extraction hashing, selected digits are extracted from the key and used as an address.
In midsquare hashing, the key is squared and the address is selected from the middle of the result.
In fold shift hashing, the key is divided into parts whose sizes match the size of the required address; then the parts are added to obtain the address.


Summary (continued)

In fold boundary hashing, the key is divided into parts whose sizes match the size of the required address; then the left and right parts are reversed and added to the middle part to obtain the address.
In rotation hashing, the rightmost digit of the key is rotated to the left to determine an address. However, this method is usually used in combination with other methods.
In pseudorandom generation hashing, the key is used as the seed to generate a pseudorandom number. The result is then scaled to obtain the address.
Except in the direct and subtraction methods, collisions are unavoidable in hashing. Collisions occur when a new key is hashed to an address that is already occupied.


Summary (continued)

Clustering is the tendency of data to build up unevenly across a hashed list.

Primary clustering occurs when data build up around a home address. Secondary clustering occurs when data build up along a collision path in the list.

To solve a collision, a collision resolution method is used. Three general methods are used to resolve collisions: open addressing, linked lists, and buckets.

The open addressing method can be subdivided into linear probe, quadratic probe, pseudorandom rehashing, and key-offset rehashing.


Summary (continued)

In the linear probe method, when a collision occurs, the new data are stored in the next available address.
In the quadratic method, the increment is the collision probe number squared.
In the pseudorandom rehashing method, we use a random number generator to rehash the address.
In the key-offset rehashing method, we use an offset to rehash the address.


Summary (continued)

In the linked list technique, we use separate areas to store collisions and chain all synonyms together in a linked list.
In bucket hashing, we use a bucket that can accommodate multiple data occurrences.


Homework

1. Using the modulo-division method and linear probing, store the keys shown below in an array with 19 elements. How many collisions occurred? What is the load factor of the list after all keys have been inserted?

224562, 137456, 214562, 140145, 214567, 162145, 144467, 199645, 234534

2. Repeat the above problem using the digit-extraction method (first, third, and fifth digits) and quadratic probing.