23/05/2015 Data Structures: Random Access Files
TRANSCRIPT
18/04/23
Data Structures
Random Access Files
Learning Objectives
Explain Random Access Searches.
Explain the purpose and operation of Hashing Algorithms.
Access Methods to Data
Computers can store large volumes of data.
The difficulty is to be able to get it back.
In order to be able to retrieve data it must be stored in some sort of order.
There are a number of ways of arranging the data that will aid access under different circumstances.
Random Access
Random access is the ability to find (jump to) a file, program or specific data immediately, without having to go through other files or data first, as sequential access requires.
Think of the difference between finding and playing a song/track/movie on an old cassette or video tape versus a CD, DVD or mp3 player.
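The tape versus disc comparison can be sketched in a few lines. This is only an illustration, assuming fixed-length records held in an in-memory byte store; the names RECORD_SIZE, build_store, read_sequential and read_random are made up for this example and are not from the slides.

```python
import io

RECORD_SIZE = 8  # hypothetical fixed record length in bytes (an assumption)

def build_store(records):
    """Pack equal-length records one after another, like tracks on a disc."""
    packed = b"".join(r.ljust(RECORD_SIZE).encode("ascii") for r in records)
    return io.BytesIO(packed)

def read_sequential(store, index):
    """Tape-style access: read every record that comes before the wanted one."""
    store.seek(0)
    reads = 0
    data = b""
    for _ in range(index + 1):
        data = store.read(RECORD_SIZE)
        reads += 1
    return data.decode("ascii").rstrip(), reads

def read_random(store, index):
    """Disc-style access: jump straight to the record's position."""
    store.seek(index * RECORD_SIZE)
    return store.read(RECORD_SIZE).decode("ascii").rstrip(), 1

store = build_store(["track1", "track2", "track3", "track4"])
print(read_sequential(store, 3))  # ('track4', 4)
print(read_random(store, 3))      # ('track4', 1)
```

Both calls return the same record; the second value is the number of reads needed to reach it.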
Random Access File
Data is stored in no particular order.
A “hashing algorithm” is performed on the key field of the record to be stored or retrieved.
This results in a number (called the hashed location) which is used as the address to store or retrieve the record.
How this is done will be explained next.
Hashing using Modular Arithmetic
Maximum of 100 items of data, with a four-digit key: 1537 / 100 = 15, remainder 37.
1537 will be stored at location 37.
Same key for approximately 200 items of data: 37 * 3 = 111.
1537 will be stored at the hashed location 111.
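The arithmetic above can be checked with a short sketch. The function names hash_mod and hash_mod_scaled are hypothetical, and the factor of 3 is simply the multiplier used in the worked example.

```python
def hash_mod(key, table_size=100):
    """Hashed location = remainder after dividing the key by the table size."""
    return key % table_size

def hash_mod_scaled(key, table_size=100, factor=3):
    """Variant from the worked example: multiply the remainder by 3."""
    return hash_mod(key, table_size) * factor

quotient, remainder = divmod(1537, 100)
print(quotient, remainder)    # 15 37
print(hash_mod(1537))         # 37
print(hash_mod_scaled(1537))  # 111
```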
Hashing using Folding
The number 847377 could be split into 847 and 377.
If you add them together you get: 1224.
For a maximum of 100 items of data, you would take the last two digits: 24.
847377 will be stored at location 24.
Same number for approximately 200 items of data: 24 * 3 = 72.
847377 will be stored at the hashed location 72.
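A minimal sketch of the folding steps, assuming the key is split into three-digit parts and the last two digits of the sum are kept. The name hash_fold and its parameters are illustrative only.

```python
def hash_fold(key, part_len=3, digits_kept=2):
    """Split the key into parts, add the parts, keep the last few digits."""
    text = str(key)
    parts = [int(text[i:i + part_len]) for i in range(0, len(text), part_len)]
    total = sum(parts)                 # 847 + 377 = 1224
    return total % (10 ** digits_kept) # last two digits: 24

print(hash_fold(847377))      # 24
print(hash_fold(847377) * 3)  # 72, the scaled location from the worked example
```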
Clashes / Collisions
Some ID numbers will clash to the same address.
1537 / 100 = … remainder 37
1837 / 100 = … remainder 37
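The clash can be shown by grouping keys by their hashed location. This is a sketch using the modular hash with 100 locations; find_clashes is a made-up helper, and 2045 is an extra key added here purely for contrast.

```python
from collections import defaultdict

def find_clashes(keys, table_size=100):
    """Return every hashed location that more than one key maps to."""
    by_location = defaultdict(list)
    for key in keys:
        by_location[key % table_size].append(key)
    return {loc: ks for loc, ks in by_location.items() if len(ks) > 1}

# 2045 is a hypothetical key that does not clash with the other two.
print(find_clashes([1537, 1837, 2045]))  # {37: [1537, 1837]}
```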
Overcoming the problem of clashes / collisions:
1. Search serially
Search serially from the hashed location until an empty location is found.
Then insert the clashed record into this empty location.
[Diagram of memory: starting at the hashed location, search for the next free location; the clashing record is inserted there.]
Note: When trying to find the clashing record again its location is unknown (the computer only knows that it is somewhere after the hashed location).
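The serial search might look like the sketch below. It assumes a table of 100 locations, the modular hash from earlier, and that the search wraps round to the start of memory when it reaches the end (the slides do not say what happens there). The names insert and find are illustrative, and deletion is not handled.

```python
TABLE_SIZE = 100

def insert(table, key, record):
    """Store at the hashed location, or at the next free location after it."""
    start = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        loc = (start + step) % TABLE_SIZE  # wrap-around is an assumption
        if table[loc] is None:
            table[loc] = (key, record)
            return loc
    raise RuntimeError("table is full")

def find(table, key):
    """Search serially from the hashed location until the key or a gap is met."""
    start = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        loc = (start + step) % TABLE_SIZE
        slot = table[loc]
        if slot is None:
            return None  # an empty location means the key was never stored
        if slot[0] == key:
            return loc, slot[1]
    return None

table = [None] * TABLE_SIZE
print(insert(table, 1537, "first record"))   # 37
print(insert(table, 1837, "second record"))  # 38
print(find(table, 1837))                     # (38, 'second record')
```

Finding 1837 takes two comparisons here, because the search has to start at location 37 and move on.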
2. Memory bucket / Overflow Area
Reserve an “overflow area” of memory or “memory bucket” to place duplicates in serial form (one after the other).
Create a pointer to this “memory bucket” or “overflow area” from the hashed location.
[Diagram of memory: a pointer leads from the hashed location to the “memory bucket” or “overflow area”; the clashing record is inserted serially (one after the other) at the next free location in the “memory bucket”.]
Note: When trying to find the clashing record again, its exact location is unknown (the computer only knows that it is in the “memory bucket” somewhere).
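A rough sketch of the overflow area idea follows. It assumes one shared overflow list for the whole table, so the pointer from the hashed location is implicit rather than stored; the names make_store, insert and find are hypothetical.

```python
TABLE_SIZE = 100

def make_store():
    """Main memory of 100 locations plus one shared overflow area (assumed)."""
    return {"main": [None] * TABLE_SIZE, "overflow": []}

def insert(store, key, record):
    """Use the hashed location if free, otherwise append to the overflow area."""
    loc = key % TABLE_SIZE
    if store["main"][loc] is None:
        store["main"][loc] = (key, record)
        return ("main", loc)
    store["overflow"].append((key, record))
    return ("overflow", len(store["overflow"]) - 1)

def find(store, key):
    """Check the hashed location, then search the overflow area serially."""
    slot = store["main"][key % TABLE_SIZE]
    if slot is None:
        return None
    if slot[0] == key:
        return slot[1]
    for stored_key, record in store["overflow"]:
        if stored_key == key:
            return record
    return None

store = make_store()
print(insert(store, 1537, "first record"))   # ('main', 37)
print(insert(store, 1837, "second record"))  # ('overflow', 0)
print(find(store, 1837))                     # second record
```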
3. Linked List
Use the hashed location as the start of a linked list, search serially through the memory from this hashed location for the next free location and store the clashed record there.
Add a pointer at the hashed location that points to the new location used above.
Create a null pointer in the new location used above to signify the end of the list.
Subsequent clashes will simply extend this linked list.
[Diagram of memory: search from the hashed location for the next free location; the clashing record is inserted there with a null pointer (XX), and the hashed location holds a pointer to the clashing record.]
Note:
1. Subsequent clashes will simply extend this linked list.
2. When trying to find the clashing record again its exact location is known using this method.
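The linked list method could be sketched as below, assuming each location holds a key, a record and a pointer to the next location in the list (None standing in for the null pointer). The names insert and find, the wrap-around search and the key 2137 are assumptions made for this example.

```python
TABLE_SIZE = 100

def insert(table, key, record):
    """Store the record and link it to the end of the list for its hashed location."""
    home = key % TABLE_SIZE
    if table[home] is None:
        table[home] = [key, record, None]   # None is the null pointer
        return home
    tail = home
    while table[tail][2] is not None:       # follow pointers to the end of the list
        tail = table[tail][2]
    for step in range(1, TABLE_SIZE):       # serial search for the next free location
        loc = (home + step) % TABLE_SIZE    # wrap-around is an assumption
        if table[loc] is None:
            table[loc] = [key, record, None]
            table[tail][2] = loc            # pointer to the clashing record
            return loc
    raise RuntimeError("table is full")

def find(table, key):
    """Follow the pointers from the hashed location until the key is found."""
    loc = key % TABLE_SIZE
    while loc is not None and table[loc] is not None:
        if table[loc][0] == key:
            return loc, table[loc][1]
        loc = table[loc][2]
    return None

table = [None] * TABLE_SIZE
print(insert(table, 1537, "first record"))   # 37
print(insert(table, 1837, "second record"))  # 38
print(insert(table, 2137, "third record"))   # 39
print(find(table, 2137))                     # (39, 'third record')
```

Retrieval only visits locations on the list, rather than every location after the hashed one.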
In Summary:
Records in a random access file are accessed using a hashing algorithm by:
Reading the key field.
Applying a hashing algorithm to the key field to give the address of the data.
Looking for data at that address (whilst being aware of problems caused by clashes).
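The three steps can be tried on a real file. The sketch below assumes fixed-length records of a 4-character key plus 16 characters of data, 100 slots, and the modular hash; all names and sizes are hypothetical. It only detects a clash when reading and does not resolve it, so storing a clashing key would overwrite the earlier record.

```python
import os
import tempfile

SLOTS = 100
KEY_LEN = 4      # assumed width of the key field
DATA_LEN = 16    # assumed width of the data field
RECORD_SIZE = KEY_LEN + DATA_LEN

def create_file(path):
    """Create a file of blank fixed-length slots."""
    with open(path, "wb") as f:
        f.write(b" " * (SLOTS * RECORD_SIZE))

def store(path, key, data):
    """Hash the key field to an address and write the record there."""
    address = (key % SLOTS) * RECORD_SIZE
    record = f"{key:0{KEY_LEN}d}{data:<{DATA_LEN}}".encode("ascii")
    with open(path, "r+b") as f:
        f.seek(address)                  # jump straight to the hashed location
        f.write(record[:RECORD_SIZE])

def fetch(path, key):
    """Read the key field, hash it, and look for the data at that address."""
    address = (key % SLOTS) * RECORD_SIZE
    with open(path, "rb") as f:
        f.seek(address)
        raw = f.read(RECORD_SIZE).decode("ascii")
    stored_key, data = raw[:KEY_LEN], raw[KEY_LEN:].rstrip()
    if stored_key.strip() == "" or int(stored_key) != key:
        return None                      # empty slot, or a clash with another key
    return data

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "records.dat")
    create_file(path)
    store(path, 1537, "example data")
    print(fetch(path, 1537))  # example data
    print(fetch(path, 1837))  # None (1837 hashes to the same slot as 1537)
```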
Plenary
Explain Random Access Searches.
Plenary
Random Access: The data being searched for is used to give the address of where it is stored.
Plenary
What is the purpose and operation of Hashing Algorithms?
They allow the data being searched for in a random access file to be used to give the address of where it is stored.
This is done by carrying out some arithmetic on the data that is being searched for.