Extendible Hashing Vinayak Hegde Nandikal
INTRODUCTION
A file structure is a combination of representations for data in files and of operations for
accessing the data. A file structure application allows us to read, write, and modify data. It might
also support finding the data that matches some search criteria or reading through the data in
some particular order. An improvement in file structure design may make an application
hundreds of times faster. The details of the representation of data and the implementation of the
operations determine the efficiency of the file structure for a particular application.
The fundamental operations of file systems are: open, create, close, read, write, and seek.
Each of these operations involves the creation or use of a link between a physical file stored on a
secondary device and a logical file that represents a program’s more abstract view of the same
file. When the program describes an operation using the logical file name, the equivalent
physical operation gets performed on the corresponding physical file.
Disks are very slow compared to memory. On the other hand, disks provide enormous
capacity at much less cost than memory. They also keep the information stored on them when
they are turned off. The tension between a disk’s relatively slow access time and its enormous,
nonvolatile capacity is the driving force behind file structure design. Good file structure design
will give us access to all the capacity without making our applications spend a lot of time waiting
for the disk. A tremendous variety in the types of data and in the needs of applications makes file
structure design very important.
The problems that researchers struggle with reflect the same issues that one confronts in
addressing any substantial file design problem. Working through the approaches to major file
design issues teaches one a lot about how to approach new design problems. The goals of research
and development in file structures are:
1. Get the information with one access to the disk.
2. Structures that allow us to find the target information with as few accesses as possible.
3. File structures that group information so that we can get everything we need with only one trip
to the disk.
Dept of ISE 2007-08
SECTION 1
REQUIREMENTS SPECIFICATIONS
Requirements for Part 1:
In part 1, we are required to create a student record file. The record consists of the
following fields:
1. University Serial Number
2. Name
3. Address
4. Semester
5. Branch
There should be methods to initialize and assign a record. We should also be able to add
a new record, delete a record, and modify a record. The number of fields is fixed, but the lengths
of the fields are variable.
Requirements for Part 2:
In the second part, we need to develop a hashed index for the student record file
developed in Part 1. The key for the index is the student USN (University Serial Number). We
need to hash the keys and then store the key-reference pairs for further access. Once we develop
a hashed index, this index is used for the retrieval of records.
We need to provide the following functionalities:
1. Add a record.
2. Delete a record.
3. Modify a record.
Also, we need to demonstrate the doubling of the directory size and the space utilization
of the buckets.
Hardware Requirements:
PROCESSOR : Pentium Processor
PRIMARY MEMORY : 64 MB and above.
SECONDARY MEMORY : 1 GB and above.
Software Requirements:
PLATFORM: Microsoft Windows
COMPILER: Turbo C++
LANGUAGE USED: OOP with C++
External Interfaces
A GUI (Graphical User Interface) is provided as the user interface.
SECTION 2
INTRODUCTION TO FILE STRUCTURES
Different Types of Access Methods
Different types of access methods in file structure are:
Indexing
Cosequential processing model
AVL trees
B-trees
B+ trees
Hashing
Indexing:
Indexing is a way of structuring a file so that records can be found by key. This is an
alternative to sorting. Unlike sorting, indexing permits us to perform binary searches for keys in
variable-length record files. If the index can be held in memory, record addition, deletion, and
retrieval can be done much more quickly with an indexed, entry-sequenced file than with a
sorted file. Indexes can do much more than merely improve on access time: they can provide us
with new capabilities that are inconceivable with access methods based on sorted data records.
The most exciting new capability involves the use of multiple secondary indexes.
Cosequential processing model:
The cosequential processing model can be applied to problems that involve operations
such as matching and merging (and combinations of these) on two or more sorted input files. In
its most complete form, the model depends on certain assumptions about the data in the input
files. Given these assumptions, we can describe the processing components of the model and
define pure virtual functions that represent those components.
Cosequential operations involve the coordinated processing of two or more sequential
lists to produce a single output list. Sometimes the processing results in a merging, or union, of
the items in the input lists; sometimes the goal is a matching, or intersection, of the items in the
lists; and other times the operation is a combination of matching and merging. These kinds of
operations on sequential lists are the basis of a great deal of file processing.
AVL trees:
An AVL tree is a self-adjusting binary tree structure. It is height-balanced: the allowed
difference between the heights of any two subtrees is one.
The important feature of an AVL tree is that, by setting a maximum allowable difference
in the height of any two subtrees, AVL trees guarantee a minimum level of performance in searching.
B-trees:
B-trees are multilevel indexes that solve the problem of the linear cost of insertion and
deletion. This is what makes B-trees so good, and why they are now the standard way to
represent indexes. The solution is twofold. First, don't require that the index records be full.
Second, when a record overflows, split it into two records, each half full. Deletion takes a similar
strategy of merging two records into a single record when necessary.
B+ trees:
The disadvantage of the B-tree is that the file cannot be accessed sequentially with efficiency.
Adding a linked-list structure at the bottom level of the B-tree solves this problem. The combination
of a B-tree and a sequential linked list gives rise to the B+ tree.
Hashing:
Hashing is a good way of retrieving records in one access for files that do not change greatly
over time, but it does not work well with volatile, dynamic files. A hash function is like a black
box that produces an address every time a key is dropped in. Hashing is like indexing in that it
involves associating a key with a relative record address.
SECTION 3
WHY C++?
Object-oriented toolkit:
Making file structures usable in application development requires turning this conceptual
toolkit into application programming interfaces: collections of data types and operations that can
be used in applications. We have chosen to employ an object-oriented approach in which data types
and operations are presented in a unified fashion as class definitions.
C++ is used in the design of the file structure. C++ is an object-oriented programming
language. Object-oriented programming supports the integration of data content and behavior
into a single design. A C++ class definition contains both data and function members and allows
programmers to control precisely the manipulation of objects. These classes are also an extensive
presentation of the features of C++. These features include:
Class Definition
Constructors
Public and private sections
Operator overloading
The above features enhance the programmer's ability to control the behavior of objects.
SECTION 4
PROJECT PART I
Problem Definition:
Design a class called Student. Each object of this class represents information about a
single student. Members should be included for the student's USN (University Serial Number), Name,
Address, Semester, Branch, etc. Methods should be included for initialization, assignment, and
modification of values. Provide methods to write the member values to the output stream, suitably
formatted. Add methods to store objects as records in a file and load the objects from the
file using buffering; design a suitable IOBuffer class hierarchy. Add Pack and Unpack methods
to class Student. For all the mini-projects, assume a fixed-field, variable-length record with
delimiters as the record structure for the data file.
Specification And Design:
Part 1 of the project deals with creating a student record file. The record consists of
the following fields as data members.
1. University Serial Number.---->USN
2. Name ---->name
3. Address ---->addr
4. Branch ---->brch
5. Semester. ---->sem
We have provided the following member functions for the operations on the file.
1. Creating a record ---->insert()
2. Assigning a record. ---->assign()
3. Searching a record ---->search()
4. Deleting a record. ---->delet()
5. Modifying a record. ---->modify()
6. Displaying a record ---->display()
insert() function is used to insert the record of one student at a time.
Figure: flowchart for insert(). Accept the USN; if the USN is a duplicate, stop; otherwise accept the data and store it in data.dat.
assign() function is used to assign the default value to the data members. Here we
assigned the NULL value to all data members as the default value.
search() function is used to search for a record based on the key value (USN).
Figure: flowchart for search(). Accept the USN; read each record from data.dat and unpack its USN; compare it with the key entered; on a match, display the record; on a mismatch, continue until EOF.
delet() function is used to delete a student's record based on the key value.
Figure: flowchart for delet(). Accept the USN; read each record from data.dat and unpack its USN; compare it with the key entered; on a match, place * at the beginning of the record to indicate a deleted record; on a mismatch, continue until EOF.
modify() function is used to modify the record based on the key field entered.
Figure: flowchart for modify(). Accept the USN; read each record from data.dat and unpack its USN; compare it with the key entered; on a match, accept new values from the user and store the newly accepted data on disk; on a mismatch, continue until EOF.
display() function is used to display the records in the file.
Figure: flowchart for display(). Read and display each record until EOF.
Algorithm for Part 1:
The steps of insertion are as follows:
Accept the USN from the user.
Check for duplication; if a duplicate is found, display an error, else continue.
Accept the data from the user and check for constraints.
Using the pack() function, pack the data and put it in the buffer.
Using the write() function, write the packed data from the buffer to disk.
The steps of searching are as follows:
Accept the USN from the user.
Using the read() function, read the records from the disk into the buffer.
Using the unpack() function, unpack only the key and compare it with the key the user
has entered. If it matches, unpack the whole record and display it.
If the match does not occur, go to the next record until end of file.
The steps of deletion are as follows:
Accept the key value from the user.
Read the record into the buffer using read().
Unpack the USN from the buffer into RAM and compare the USN with the key entered.
If it matches, use a tombstone to indicate the record has been deleted.
If it does not match, go to the next record until end of file.
The steps of modification are as follows:
Accept the key value from the user.
Read the record into the buffer using read().
Unpack the key field from the buffer into RAM and compare it with the key entered.
If it matches,
Accept the new values from the user.
Write the packed data from the buffer to the disk.
If the key doesn't match, check the next record; repeat until end of file, then display an error
message.
The steps for displaying the records are as follows:
Read the first set of records from the disk into the buffer.
Unpack the records in the buffer and put them into RAM.
Display the record and repeat until the end of file.
We have provided the following buffer operations:
read()-from file to buffer
write()-from buffer to file
pack()-from RAM to buffer
unpack()-from buffer to RAM
Figure: pack(), unpack(), read(), and write() operations between RAM, the buffer, and the storage device.
Analysis And Design of Buffer Hierarchy:
The read and write file operations need a buffer, which is developed using a hierarchy of
classes. The highest class in the hierarchy is the class IOBuffer. Since we know the number of
fields and since the lengths of the fields are variable, we use the Delimited Text Buffer class.
Here, we write the length of the record first and then the record itself. The fields are separated
using a delimiter. There are methods that pack the fields into the buffer and there are methods
that unpack the fields from the buffer. The access to the records of the file is sequential. We
also provide for addition of records and deletion of records. The fields of records can be
assigned a specific value and records can also be modified. In general we have the following
hierarchy:
IO BUFFER
VARIABLE LENGTH BUFFER and FIXED LENGTH BUFFER
DELIMITED FIELD BUFFER, LENGTH FIELD BUFFER and FIXED FIELD BUFFER
The hierarchy is shown in the diagram.
Figure: Buffer Class Hierarchy
The field packing and unpacking operations, in their various forms, can be encapsulated
into C++ classes. The three different field representation strategies (delimited, length-based,
and fixed-length) are implemented in different classes. Class IOBuffer does not include any
implementation methods. It is an abstract class, and hence no object of it can be declared. All the
necessary read, write, pack, and unpack operations are provided in classes down the hierarchy.
Inheritance allows related classes to share members. We use this powerful mechanism
provided by C++ for buffering. Object-oriented design of classes guarantees that operations on
objects are performed correctly.
Figure: IOBUFFER holds the character array for the buffer; VARIABLE LENGTH BUFFER and FIXED LENGTH BUFFER add the read and write operations; DELIMITED FIELD BUFFER, LENGTH FIELD BUFFER, and FIXED FIELD BUFFER add the pack and unpack operations.
SECTION 5
PROJECT PART II
Problem Definition:
Develop a hashed index of the student record file with the USN as the key. Write a driver
program to create a hashed file from an existing student record file. Demonstrate the recursive
collapse of directory over more than one level.
1. Demonstrate doubling of the directory size
2. Display the space utilization for buckets and directory size.
Specification And Design:
The second part of the project deals with providing O(1) access to the records of the file.
For this, we need to develop an index to the file. The USN is used as the key. To provide O(1)
access we need to hash the index. There are two approaches to hashing.
1. Static hashing
2. Dynamic hashing.
Static hashing is very good for files which do not change frequently. But real-world files
change frequently, and the performance of static hashing deteriorates.
Dynamic hashing copes with this problem. In this approach, we hash the key and use
only a part of the hashed address. This is called the "use more as we need more"
approach.
We also use what are called "BUCKETS". Buckets are nothing but containers of
key-reference pairs. All the keys in a bucket have the same starting address. Once a bucket is full, we
split the bucket into two and distribute the keys among the buckets. To keep track of the
buckets, we develop another structure, a DIRECTORY. A directory maintains an array of the
bucket locations.
Thus, we hash a key and get a part of the hashed address depending on the population of
the records. Then we use this part of the hashed address as an index into the array of buckets and
find its location. We then directly seek to that location and get the record.
The main design issue here is whether we provide a static hashing that uses a prespecified
size of address space or a dynamic hashing. The dynamic hashing is very useful for files that
change frequently.
We have decided to implement extendible hashing, which uses a part of the hashed
address depending on the size of the file. This is called the use-more-as-you-need-more approach.
We do not hash the data file itself. Instead, we only hash the index. The index consists of
key-record address pairs.
Buckets are used to resolve the collision problem: here one address can hold more than one
record or index entry. We also use directories to keep track of the buckets. A bucket consists
of key-reference pairs. This means that the buffer class that needs to be used is the fixed-length
buffer. We keep the addresses of the buckets in memory using arrays.
Buckets are filled with key-reference pairs as and when the data records are inserted.
When a bucket gets filled, the bucket is split into two and the records are redistributed. This
means that we are using more of the hashed address as and when the file size increases. Also,
we keep track of deletions. A deletion may trigger the collapse of the directory, as fewer
buckets will be needed. Thus the hashing technique becomes truly dynamic.
Structure of the Project:
The project is basically required to do any operation based on hashing the primary key,
the USN. Hence it all begins by hashing the key into a valid address. The address points to a
directory entry. The directory consists of addresses of buckets. The bucket in turn contains the
address of the record in the STUDENT.DAT file.
The diagram below shows what our project does. The general steps are:
A given key is hashed to a directory address.
The directory cell contains the address of the bucket.
The bucket contains the address of the record in the student file.
Figure: Structure of the hashed index (KEY -> HASH -> DIRECTORY -> BUCKETS -> STUDENT FILE).
Creating the addresses:
The MakeAddress function extracts a portion of the full hashed address. This function also
reverses the order of the bits in the hashed address, making the lowest-order bit of the
hash address the highest-order bit of the value used in extendible hashing, because the least
significant bits tend to have more variation than the high-order bits.
The Hash function returns an integer hash value for the key, for a 15-bit address space.
Splitting in Buckets:
Method SPLIT of class Bucket divides keys between an existing bucket and a new
bucket. If necessary, it doubles the size of the directory to accommodate the new bucket.
Directory and Bucket Operations:
The INSERT method first searches for the key. SEARCH arranges for the CurrentBucket
member to contain the proper bucket for the key. The FIND method determines where the key
would be if it were in the structure.
Methods DoubleSize() and InsertBucket():
The Insert method manages record addition. If the key already exists, Insert returns
immediately. If the key does not exist, Insert calls Bucket::Insert for the bucket into which the
key is to be added. If the bucket is full, Bucket::Insert calls Split to handle the task of splitting
the bucket. If the directory needs to be larger, Split calls method Directory::DoubleSize to double
the directory size.
Finding Buddy Buckets:
The method works by checking to see whether it is possible for there to be a buddy
bucket. The next test compares the number of bits used by the bucket with the number of bits
used in the directory address space. A pair of buddy buckets is a pair of buckets that are
immediate descendants of the same node in the trie. This method returns the index of the buddy
bucket, or -1 if none is found.
Collapsing the Directory:
Method Directory::Collapse begins by making sure that we are not at the lower limit of
directory size. By treating the special case of a directory with a single cell here, at the start of the
function, we simplify subsequent processing: with the exception of this case, all directory sizes
are evenly divisible by 2. The test to see if the directory can be collapsed consists of examining
each pair of directory cells to see if they point to different buckets. As soon as we find such a
pair, we know we cannot collapse the directory, and the method returns.
Deletion operations:
We first find the key to be deleted. If we cannot find it, we return failure; if it is found, we call
Bucket::Remove to remove the key from the bucket, and return the value reported back from
that method.
Space utilization:
It is defined as the ratio of the actual number of records to the total number of records that
could be stored in the allocated space. The expected average utilization is about 69%. Space
utilization can be calculated using the formula:
Utilization = r / (b * N)
where r is the number of records,
b is the block (bucket) size, and
N is the number of blocks
Source Code
int MakeAddress (char *key, int depth)
{
int retval = 0;
int mask = 1;
int hashVal = Hash(key);
for ( int j = 0; j < depth; j++)
{
retval = retval << 1;
int lowbit = hashVal & mask;
retval = retval | lowbit;
hashVal = hashVal >> 1;
}
return retval;
}
int Hash (char * key)
{
int sum = 0;
int len = strlen(key);
if (len % 2 == 1) len++; // make len even
for (int j = 0; j < len; j += 2)
sum = (sum + 100 * key[j] + key[j+1]) % 19937;
return sum;
}
class Bucket: public TextIndex
{
protected:
Bucket (Directory & dir, int maxKeys = defaultMaxKeys);
int Insert (char * key, int recAddr);
int Remove (char * key);
Bucket * Split ();
int NewRange (int & newStart, int & newEnd);
int Redistribute (Bucket & newBucket);
int FindBuddy ();// find the bucket that is the buddy of this
int TryCombine (); // attempt to combine buckets
int Combine (Bucket * buddy, int buddyIndex);
int Depth;
int BucketAddr;
ostream & Print (ostream &);
friend class Directory;
friend class BucketBuffer;
};
class BucketBuffer: public TextIndexBuffer
{
public:
BucketBuffer (int keySize, int maxKeys);
int Pack (const Bucket & bucket);
int Unpack (Bucket & bucket);
};
class Directory
{
public:
Directory (int maxBucketKeys = -1);
Directory ();
int Open (char * name);
int Create (char * name);
int Close ();
int Insert (char * key, int recAddr);
int Remove (char * key);
int Search (char * key); // returns RecAddr for key
int ReSize (void);
int Reduction (void);
void spaceutil(char * myfile);
ostream & Print (ostream & stream);
protected:
int Depth;
int NumCells;
int * BucketAddr;
int DoubleSize ();
int Collapse ();
int InsertBucket (int bucketAddr, int first, int last);
int RemoveBucket (int bucketIndex, int depth);
int Find (char * key);
int StoreBucket (Bucket * bucket);
int LoadBucket (Bucket * bucket, int bucketAddr);
int MaxBucketKeys;
BufferFile * DirectoryFile;
LengthFieldBuffer * DirectoryBuffer;
Bucket * CurrentBucket;
BucketBuffer * theBucketBuffer;// buffer for buckets
BufferFile * BucketFile;
int Pack () const;
int Unpack ();
Bucket * PrintBucket;
friend class Bucket;
};
int Directory::Insert (char * key, int recAddr)
{
int found = Search (key);
if (found == -1) return CurrentBucket->Insert(key, recAddr);
return 0; // key already in directory
}
int Directory::Search (char * key)
{
int bucketAddr = Find(key);
LoadBucket (CurrentBucket, bucketAddr);
return CurrentBucket->Search(key);
}
Bucket * Bucket::Split ()
{
int newStart, newEnd;
if (Depth == Dir.Depth || Dir.NumCells==1)
{
doublesizetrue=1;
Dir.DoubleSize();
}
Bucket * newBucket = new Bucket (Dir, MaxKeys);
Dir.StoreBucket (newBucket);
NewRange (newStart, newEnd);
Dir.InsertBucket(newBucket->BucketAddr, newStart, newEnd);
Depth ++;
newBucket->Depth = Depth;
Redistribute (*newBucket);
Dir.StoreBucket (this);
Dir.StoreBucket (newBucket);
return newBucket;
}
int Directory:: DoubleSize ()
{
int newSize = 2 * NumCells;
int *newBucketAddr = new int[newSize];
for(int i=0;i<NumCells;i++)
{
newBucketAddr[2*i] = BucketAddr[i];
newBucketAddr[2*i+1] = BucketAddr[i];
}
delete [] BucketAddr;
BucketAddr = newBucketAddr;
Depth++;
NumCells = newSize;
return 1;
}
int Bucket::FindBuddy ()
{
if (Dir.Depth == 0) return -1;
if (Depth < Dir.Depth) return -1;
int sharedAddress = MakeAddress(Keys[0], Depth);
return sharedAddress ^ 1;
}
int Directory :: Collapse()
{
if (Depth == 0) return 0;
for (int i=0;i<NumCells;i+=2)
if(BucketAddr[i] != BucketAddr[i+1])
return 0;
int newSize = NumCells / 2;
int * newAddrs = new int [newSize];
for(int j =0; j<newSize;j++)
newAddrs[j] = BucketAddr[j*2];
delete [] BucketAddr;
BucketAddr = newAddrs;
Depth --;
collapsetrue=1;
NumCells = newSize;
return 1;
}
int Bucket::TryCombine ()
{
int result;
int buddyIndex = FindBuddy ();
if (buddyIndex == -1) return 0;
int buddyAddr = Dir.BucketAddr[buddyIndex];
Bucket * buddyBucket = new Bucket (Dir, MaxKeys);
Dir.LoadBucket (buddyBucket, buddyAddr);
if (NumKeys + buddyBucket->NumKeys > MaxKeys) return 0;
Combine (buddyBucket, buddyIndex);
result = Dir.Collapse ();
if (result) TryCombine();
return 1;
}
int Bucket::Remove (char * key)
{
int result = TextIndex::Remove (key);
if (!result) return 0;
TryCombine ();
Dir.StoreBucket(this);
return 1;
}
int Directory::Remove (char * key)
{
int bucketAddr = Find(key);
LoadBucket (CurrentBucket, bucketAddr);
return CurrentBucket -> Remove (key);
}
void Directory::spaceutil(char * myfile)
{
fstream file(myfile,ios::in);
float numrecs=0,util;
char ch;
while(1)
{
file>>ch;
if(file.fail())
break;
else if(ch=='#')
numrecs++;
}
file.close();
int cnt=1;
for(int i=0;i<NumCells-1;i++)//counts number of buckets
{
if(BucketAddr[i+1]==BucketAddr[i])
continue;
cnt++;
}
util=(numrecs/(cnt*4))*100;//utilization=r/(bN)
cout<<"\nRECORDS IN THE FILE = "<<numrecs<<"\n";
cout<<"\n\nBUCKETS USED BY THE RECORDS = "<<cnt;
cout<<"\n\n\nDIRECTORY SIZE IS = "<<NumCells;
cout<<"\n\n\nUTILIZATION OF SPACE = "<<util<<"%\n\n";
//estimated directory size in bytes: 0.98 * r^1.25 (Flajolet's formula with bucket size 4)
float x;
x=pow(numrecs,1.25);
x=x*0.98;
cout<<"\nESTIMATED SPACE USED BY THE DIRECTORY = "<<x<<" bytes";
}
void Insert(char *myfile)
{
Student s;
char str[30];
setcolor(BLACK);
settextstyle(2,0,5);
outtextxy(230,100,"ENTER USN NUMBER :");
strget(420,100,s.Usn,10);
strupr(s.Usn);
int res = Dir.Search(s.Usn);
if(res!=-1)
{
outtextxy(400,400,"This reg-no already exists!!!");
outtextxy(400,410,"Press Any Key....");
getch();
return;
}
if (s.Usn[0] == '\0')
{
outtextxy(400,400,"Enter a Valid Key!!!\a");
getch();
return;
}
if(!isdigit(s.Usn[0])||!isalpha(s.Usn[1])||!isalpha(s.Usn[2])||!isdigit(s.Usn[3])||!
isdigit(s.Usn[4])||!isalpha(s.Usn[5])||!isalpha(s.Usn[6])||!isdigit(s.Usn[7])||!isdigit(s.Usn[8])||!
isdigit(s.Usn[9]))
{
outtextxy(400,400,"Enter a Valid Key!!!\a");
getch();
return;
}
NAME: // label targeted by the goto below when the name is invalid
outtextxy(230,120,"ENTER NAME :");
strget(420,120,s.Name,20);
strupr(s.Name);
int re = Dir.Search(s.Name);
if(re!=-1)
{
outtextxy(400,220,"Name Duplication..!!!");
getch();
}
if (s.Name[0] == '\0')
{
outtextxy(400,400,"Enter a Valid NAME!!!\a");
getch();
}
if(!isalpha(s.Name[0]))
{
outtextxy(400,220,"Name contains non-alphabetic characters!!");
outtextxy(400,240,"Re-enter NAME");
getch();
goto NAME;
}
outtextxy(230,140,"ENTER ADDRESS :");
strget(420,140,s.Address,30);
strupr(s.Address);
outtextxy(230,160,"ENTER SEMESTER :");
strget(420,160,s.Semester,2);strupr(s.Semester);
if(atoi(s.Semester)>8)
{
outtextxy(400,400,"Invalid Semester!!!\a");
getch();
return;
}
outtextxy(230,180,"ENTER BRANCH :");
strget(420,180,s.Branch,5);strupr(s.Branch);
int flag=0;
for(int i=0;i<16;i++)
if(strcmp(s.Branch,s.Brlist[i])==0)
{
flag=1;
break;
}
if(flag==0)
{
outtextxy(400,400,"Invalid Branch!!!\a");
getch();
return;
}
outtextxy(230,200,"ENTER COLLEGE :");
strget(420,200,s.College,10);
strupr(s.College);
int recaddr=s.Append(myfile);
Dir.Insert(s.Usn,recaddr);
outtextxy(400,400,"Record Successfully Appended.");
getch();
if(doublesizetrue)
{
closegraph();
clrscr();
cprintf("The Directory Has Doubled");
doublesizetrue=0;
Dir.Print(cout);
}
}
void deleterecord (char *myfile)
{
Student s;
settextstyle(2,0,5);
outtextxy(50,50,"ENTER USN NUMBER : ");
strget(200,50,s.Usn,10);
strupr(s.Usn);
int addr=Dir.Search(s.Usn);
if(addr==-1)
{
outtextxy(300,300,"THE RECORD DOES NOT EXIST");
getch();
return;
}
fstream ofile(myfile,ios::in|ios::out);
ofile.seekp(addr,ios::beg);
ofile.write("*",1);
ofile.close();
Dir.Remove(s.Usn);
outtextxy(200,400,"THE RECORD IS DELETED SUCCESSFULLY");
compaction();
getch();
}
void display(char *myfile)
{
Student s;
setcolor(BLACK);
settextstyle(2,0,5);
outtextxy(50,50,"ENTER USN NUMBER : ");
strget(200,50,s.Usn,10);
strupr(s.Usn);
int addr;
if((addr = Dir.Search(s.Usn))==-1)
{
outtextxy(300,300,"Record not found!");
outtextxy(300,320,"Press Any Key..");
getch();
return;
}
DelimFieldBuffer :: SetDefaultDelim('|');
DelimFieldBuffer Buff;
fstream file(myfile,ios::in);
Buff.DRead(file,addr);
s.Unpack(Buff);
char str[100];
sprintf(str,"USN NO : %s",s.Usn);
outtextxy(100,100,str);
sprintf(str,"NAME : %s",s.Name);
outtextxy(100,120,str);
sprintf(str,"ADDRESS : %s",s.Address);
outtextxy(100,140,str);
sprintf(str,"SEMESTER : %s",s.Semester);
outtextxy(100,160,str);
sprintf(str,"BRANCH : %s",s.Branch);
outtextxy(100,180,str);
sprintf(str,"COLLEGE : %s",s.College);
outtextxy(100,200,str);
file.close();
}
SECTION 6
GUI DESIGN
SECTION 7
SNAPSHOTS
MAIN MENU
RECORD INSERTION
DISPLAYING ALL RECORDS
RECORD MODIFICATION
DISPLAYING A RECORD
SPACE UTILIZATION
DIRECTORY DISPLAY
SECTION 8
CONCLUSION AND FUTURE ENHANCEMENTS
Conclusion:
Hashing is a way of structuring a file so that records can be found by applying a hash
function that transforms a key into an address. This address is then used as the basis for insertion
and retrieval of records. More than one record can hash to the same address; this
phenomenon is called a collision. Extendible hashing provides O(1) performance since there is
no overflow. These access time values are truly independent of the size of the file.
Future Enhancement:
Instead of the given STUDENT class, the project can be made to handle a generic class
that accepts a class name as a parameter and can be used for different applications. Another class called
BUFFERFILE can be included, given that it contains a handle to the base class of the buffer class
hierarchy, i.e., IOBUFFER, and a handle to the file, for simultaneous manipulation of buffer and file
to support a purer form of OBJECT ORIENTATION.
Some of the possible improvements and new features that can be included are:
Improved User Interface with commercial level enhancements.
Support for remote administration of the system.
Support for simultaneous access and modification of the student file from different
systems.
Improved free space management for data files.
Implementation of other addressing techniques in addition to the present hashing
technique to analyze performance issues.
BIBLIOGRAPHY
TITLE AND AUTHOR
1) FILE STRUCTURES: AN OBJECT-ORIENTED APPROACH WITH C++ by MICHAEL J. FOLK, BILL ZOELLICK, and GREG RICCARDI
2) LET US C++ by YASHAVANT KANETKAR
3) THE COMPLETE REFERENCE C++ by HERBERT SCHILDT