
Upload: solomon-allen

Post on 13-Jan-2016


Chap 7. Indexing


Chapter Objectives(1)

Introduce concepts of indexing that have broad applications in the design of file systems

Introduce the use of a simple linear index to provide rapid access to records in an entry-sequenced, variable-length record file

Investigate the implementation of the use of indexes for file maintenance

Introduce the template features of C++ for object I/O

Describe the object-oriented approach to indexed sequential files


Chapter Objectives(2)

Describe the use of indexes to provide access to records by more than one key

Introduce the idea of an inverted list, illustrating Boolean operations on lists

Discuss when to bind an index key to an address in the data file

Introduce and investigate the implications of self-indexing files


Contents(1)

7.1 What is an Index?

7.2 A Simple Index for Entry-Sequenced Files

7.3 Using Template Classes in C++ for Object I/O

7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects

7.5 Indexes That Are Too Large to Hold in Memory


Contents(2)

7.6 Indexing to Provide Access by Multiple Keys

7.7 Retrieval Using Combinations of Secondary Keys

7.8 Improving the Secondary Index Structure: Inverted Lists

7.9 Selective Indexes

7.10 Binding


Overview: Index(1)

Index: a data structure that associates given key values with the corresponding record numbers

It is usually physically separate from the data file (unlike an indexed sequential file, where the index is tightly bound to the data)

Linear indexes (like the indexes found at the back of books)

Index records are ordered by key value, as in an ordered relative file

The best algorithm for finding a record with a specific key value is binary search

Adding an entry requires reorganizing the index to keep it sorted
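A linear index of this kind can be sketched as a sorted array of (key, record number) pairs. The names IndexEntry, lookup, and add below are illustrative assumptions, not part of the chapter's class library; the sketch only shows why lookup is a binary search and why addition forces a reorganization.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One index record: a key plus the record number it maps to (hypothetical type).
struct IndexEntry {
    std::string key;
    int recordNumber;
};

// Comparison used by std::lower_bound: is this entry's key before the search key?
inline bool keyLess(const IndexEntry& entry, const std::string& key) {
    return entry.key < key;
}

// Binary search over the sorted index.
// Returns the record number for key, or -1 if the key is absent.
int lookup(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(index.begin(), index.end(), key, keyLess);
    if (it == index.end() || it->key != key) return -1;
    return it->recordNumber;
}

// Addition must keep the index sorted: vector::insert shifts every later
// entry by one position, which is the reorganization cost of a linear index.
void add(std::vector<IndexEntry>& index, const std::string& key, int recordNumber) {
    auto it = std::lower_bound(index.begin(), index.end(), key, keyLess);
    index.insert(it, IndexEntry{key, recordNumber});
}
```

Lookup costs O(log n) comparisons, while each addition may move O(n) entries; that trade-off is what the tree indexes later in the chapter try to improve.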


Overview: Index(2)

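[Figure: the Index File holds the sorted keys k1 k2 k4 k5 k7 k9, and each index entry points to a record in the Data File. The Data File holds the keys k1 k2 k4 k5 k7 k9 with the records AAA ZZZ CCC XXX EEE FFF.]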


Overview: Index(3)

Tree Indexes (like those of indexed sequential files)

Hierarchical: each level, beginning with the root level, points to records at the next level

Leaves point only to the data file

Indexed Sequential File

Binary Tree Index

AVL Tree Index

B+ tree Index


Roles of Index?

Index: keys and reference fields

Fast Random Accesses

Uniform Access Speed

Allow users to impose order on a file without actually rearranging the file

Provide multiple access paths to a file

Give user keyed access to variable-length record files


A Simple Index(1)

Data file: entry-sequenced, variable-length records

Primary key: unique for each entry in the file

Searching a file by key is a common need, but binary search cannot be used on a variable-length record file (there is no way to compute where the middle record starts)

Solution: construct an index object for the file

Index object: key field + byte-offset field
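The idea of pairing each key with a byte offset can be sketched as follows. This is a simplified illustration, not the chapter's buffer classes: it assumes each record is a line of '|'-delimited text ending in a newline, and the function names primaryKey, buildIndex, and fetch are hypothetical.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Key = first two '|'-delimited fields joined, e.g. "LON|2312|..." -> "LON2312".
std::string primaryKey(const std::string& record) {
    std::size_t first = record.find('|');
    if (first == std::string::npos) return record;
    std::size_t second = record.find('|', first + 1);
    std::size_t len = (second == std::string::npos) ? std::string::npos
                                                    : second - first - 1;
    return record.substr(0, first) + record.substr(first + 1, len);
}

// One sequential pass over the data: remember where each record starts.
std::map<std::string, long> buildIndex(std::istream& data) {
    std::map<std::string, long> index;
    std::string record;
    while (true) {
        long offset = static_cast<long>(data.tellg()); // start of the next record
        if (!std::getline(data, record)) break;        // stop when the read fails
        index[primaryKey(record)] = offset;
    }
    return index;
}

// Keyed access: look up the offset, seek straight to it, read one record.
// Returns an empty string when the key is not in the index.
std::string fetch(std::istream& data, const std::map<std::string, long>& index,
                  const std::string& key) {
    auto it = index.find(key);
    if (it == index.end()) return "";
    data.clear();            // clear the end-of-file state left by earlier reads
    data.seekg(it->second);
    std::string record;
    std::getline(data, record);
    return record;
}
```

The records themselves never move; only the small, fixed-shape index entries are kept in key order.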


A Simple Index (2)

Index file (sorted by key)

Key          Reference field
ANG3795      167
COL31809     353
COL38358     211
DG139201     396
DG18807      256
FF245        442
LON2312      32
MER75016     300
RCA2626      77
WAR23699     132

Data file (entry-sequenced)

Address of record   Actual data record
32                  LON|2312|Romeo and Juliet|Prokofiev . . .
77                  RCA|2626|Quartet in C Sharp Minor . . .
132                 WAR|23699|Touchstone|Corea . . .
167                 ANG|3795|Symphony No. 9|Beethoven . . .
211                 COL|38358|Nebraska|Springsteen . . .
256                 DG|18807|Symphony No. 9|Beethoven . . .
300                 MER|75016|Coq d'or Suite|Rimsky . . .
353                 COL|31809|Symphony No. 9|Dvorak . . .
396                 DG|139201|Violin Concerto|Beethoven . . .
442                 FF|245|Good News|Sweet Honey In The . . .


A Simple Index (3)

Index file: fixed-size records, sorted by key

Data file: not sorted, because it is entry-sequenced

Record addition is quick (faster than with a sorted file)

The index can be kept in memory

Records are found more quickly through the index than through a sorted data file

Class TextIndex encapsulates the index data and index operations

Each index entry: key + reference field

Let’s See Figure 7.4

class TextIndex
{
public:
    TextIndex (int maxKeys = 100, int unique = 1);
    ~TextIndex ();
    int Insert (const char * key, int recAddr); // add to index
    int Remove (const char * key); // remove key from index
    int Search (const char * key) const; // search for key, return recAddr
    void Print (ostream &) const;
protected:
    int MaxKeys; // maximum num of entries
    int NumKeys; // actual num of entries
    char ** Keys; // array of key values
    int * RecAddrs; // array of record references
    int Find (const char * key) const;
    int Init (int maxKeys, int unique);
    int Unique; // if true --> each key must be unique
};


TextIndex::TextIndex

TextIndex:: TextIndex (int maxKeys, int unique)

: NumKeys (0), Keys(0), RecAddrs(0)

{Init (maxKeys, unique);}

TextIndex :: ~TextIndex ()

{delete[] Keys; delete[] RecAddrs;} // both were allocated with new[]


TextIndex::Init

int TextIndex :: Init (int maxKeys, int unique)

{

Unique = unique != 0;

if (maxKeys <= 0)

{

MaxKeys = 0;

return 0;

}

MaxKeys = maxKeys;

Keys = new char *[maxKeys];

RecAddrs = new int [maxKeys];

return 1;

}

TextIndex::Insert

int TextIndex :: Insert (const char * key, int recAddr)
{
    int i;
    int index = Find (key);
    if (Unique && index >= 0) return 0; // key already in
    if (NumKeys == MaxKeys) return 0; // no room for another key
    for (i = NumKeys-1; i >= 0; i--)
    {
        if (strcmp(key, Keys[i]) > 0) break; // insert into location i+1
        Keys[i+1] = Keys[i]; // shift larger keys up by one
        RecAddrs[i+1] = RecAddrs[i];
    }
    Keys[i+1] = strdup(key);
    RecAddrs[i+1] = recAddr;
    NumKeys ++;
    return 1;
}


TextIndex::Remove

int TextIndex :: Remove (const char * key)

{

int index = Find (key);

if (index < 0) return 0; // key not in index

for (int i = index; i < NumKeys - 1; i++) // shift later entries down without reading past the last key

{

Keys[i] = Keys[i+1];

RecAddrs[i] = RecAddrs[i+1];

}

NumKeys --;

return 1;

}


TextIndex::Search

int TextIndex :: Search (const char * key) const

{

int index = Find (key);

if (index < 0) return index;

return RecAddrs[index];

}


TextIndex::Find

int TextIndex :: Find (const char * key) const

{

for (int i = 0; i < NumKeys; i++)

if (strcmp(Keys[i], key)==0) return i;// key found

else if (strcmp(Keys[i], key)>0) return -1;// not found

return -1;// not found

}


Index Implementation

Page 706~709

G.1 Recording.h

G.2 Recording.cpp

G.3 Makerec.cpp

Page 710~712

G.4 Textind.h

G.5 Textind.cpp

IndexRecordingFile

int IndexRecordingFile (char * myfile, TextIndex & RecordingIndex)
{
    Recording rec; int recaddr, result;
    DelimFieldBuffer Buffer; // create a buffer
    BufferFile RecordingFile (Buffer);
    result = RecordingFile . Open (myfile, ios::in);
    if (!result)
    {
        cout << "Unable to open file " << myfile << endl;
        return 0;
    }
    while (1) // loop until the read fails
    {
        recaddr = RecordingFile . Read (); // read next record
        if (recaddr < 0) break;
        rec . Unpack (Buffer);
        RecordingIndex . Insert (rec.Key(), recaddr);
        cout << recaddr << '\t' << rec << endl;
    }
    RecordingIndex . Print (cout);
    result = RetrieveRecording (rec, "LON2312", RecordingIndex, RecordingFile);
    cout << "Found record: " << rec;
    return result; // report whether the retrieval succeeded
}


RetrieveRecording

int RetrieveRecording (Recording & recording, char * key,

TextIndex & RecordingIndex, BufferFile & RecordingFile)

// read and unpack the recording, return TRUE if succeeds

{ int result;

cout <<"Retrieve "<<key<<" at recaddr "<<RecordingIndex.Search(key)<<endl;

result = RecordingFile . Read (RecordingIndex.Search(key));

cout <<"read result: "<<result<<endl;

if (result == -1) return FALSE;

result = recording.Unpack (RecordingFile.GetBuffer());

return result;

}

Template Class for I/O Object(1)

Template Class RecordFile

we want to make the following code possible

– Person p; RecordFile pFile; pFile.Read(p);

– Recording r; RecordFile rFile; rFile.Read(r);

with an ordinary class, it is difficult to support files for different record types without having to modify the class

solution: a template class which is derived from BufferFile

– the actual declarations and calls

– RecordFile <Person> pFile; pFile.Read(p);

– RecordFile <Recording> rFile; rFile.Read(r);

Template Class for I/O Object(2)

Template Class RecordFile

template <class RecType>
class RecordFile : public BufferFile
{
public:
    int Read (RecType & record, int recaddr = -1);
    int Write (const RecType & record, int recaddr = -1);
    int Append (const RecType & record);
    RecordFile (IOBuffer & buffer) : BufferFile (buffer) {}
};
// The template parameter RecType must have the following methods
//   int Pack (IOBuffer &);   pack record into buffer
//   int Unpack (IOBuffer &); unpack record from buffer
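The point of the template is that one file class serves every record type that supplies Pack and Unpack. The sketch below illustrates that contract under simplifying assumptions: TextBuffer and SimpleRecordFile are hypothetical stand-ins (an in-memory vector replaces the real file, and a plain string replaces the IOBuffer classes), and the Person and Recording fields are chosen only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the IOBuffer classes: a string of packed bytes.
struct TextBuffer {
    std::string bytes;
};

// Works for any RecType that provides
//   int Pack(TextBuffer&) const;   and   int Unpack(const TextBuffer&);
template <class RecType>
class SimpleRecordFile {
public:
    // Pack and store the record; returns its record address (slot number), or -1.
    int Append(const RecType& record) {
        TextBuffer buffer;
        if (!record.Pack(buffer)) return -1;
        slots.push_back(buffer.bytes);
        return static_cast<int>(slots.size()) - 1;
    }
    // Unpack the record stored at recaddr; returns 1 on success, 0 on failure.
    int Read(RecType& record, int recaddr) const {
        if (recaddr < 0 || recaddr >= static_cast<int>(slots.size())) return 0;
        TextBuffer buffer{slots[recaddr]};
        return record.Unpack(buffer);
    }
private:
    std::vector<std::string> slots; // in-memory stand-in for the data file
};

struct Person {
    std::string last, first;
    int Pack(TextBuffer& b) const { b.bytes = last + '|' + first; return 1; }
    int Unpack(const TextBuffer& b) {
        std::size_t bar = b.bytes.find('|');
        if (bar == std::string::npos) return 0;
        last = b.bytes.substr(0, bar);
        first = b.bytes.substr(bar + 1);
        return 1;
    }
};

struct Recording {
    std::string label, idNumber;
    int Pack(TextBuffer& b) const { b.bytes = label + '|' + idNumber; return 1; }
    int Unpack(const TextBuffer& b) {
        std::size_t bar = b.bytes.find('|');
        if (bar == std::string::npos) return 0;
        label = b.bytes.substr(0, bar);
        idNumber = b.bytes.substr(bar + 1);
        return 1;
    }
};
```

SimpleRecordFile<Person> and SimpleRecordFile<Recording> are then two distinct classes generated from the same source, and neither record class needs to know about the other.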

Template Class for I/O Object(3)

Adding I/O to an existing class with RecordFile

add methods Pack and Unpack to class Recording

create a buffer object to use in the I/O

– DelimFieldBuffer Buffer;

declare an object of type RecordFile<Recording>

– RecordFile<Recording> rFile (Buffer);

Declaration and calls: directly open a file, then read and write objects of class Recording

Recording r1, r2;
rFile.Open("myfile");
rFile.Read(r1);
rFile.Write(r2);

Object-Oriented Approach to I/O

Class IndexedFile

adds indexed access to the sequential access provided by class RecordFile

extends RecordFile with Update, Append, and Read methods

– Update & Append: maintain a primary key index of the data file

– Read: supports access to an object by key

TextIndex + RecordFile ==> IndexedFile

Issues of IndexedFile

– how to make a persistent index of a file

– how to guarantee that the index accurately reflects the contents of the data file

Basic Operations of IndexedFile(1)

Create the original empty index and data files

Load the index file into memory

Rewrite the index file from memory

Add records to the data file and index

Delete records from the data file

Update records in the data file

Update the index to reflect changes in the data file

Retrieve records

Basic Operations of TextIndexedFile (1)

Creating the files

the index file and the data file are both created as empty files with header records

implementation (makeind.cpp in Appendix G): Create method in class BufferFile

Loading the index into memory

loading and storing objects is supported by the IOBuffer classes

a particular buffer class must be chosen for the index file (tindbuff.cpp in Appendix G)

– class TextIndexBuffer is defined as a derived class of FixedFieldBuffer to support reading and writing of index objects

Basic Operations of TextIndexedFile(2)

Rewriting the index file from memory

part of the Close operation on an IndexedFile

writes the index object back to the index file

the index should be protected against failure

write changes only when the file copy is out of date (use a status flag)

implementation: Rewind and Write operations of class BufferFile

Record Addition

add a new record to the data file

– using RecordFile<Recording>::Write

add an entry to the index

– using TextIndex::Insert

– requires rearranging the index; if the index is in memory, this needs no file access

Basic Operations of TextIndexedFile(3)

Record Deletion

data file: the records need not be moved

index: either really delete the entry or just mark it as deleted

– using TextIndex::Remove

Record Updating (2 categories)

the update changes the value of the key field

– handle it as a delete followed by an add

– may require reordering both the index and the data file

the update does not affect the key field

– no rearrangement of the index file

– may need to reorganize the data file (for example, when the updated record no longer fits in its old space)

Class TextIndexedFile(1)

Members

methods

– Create, Open, Close, Read (sequential & indexed), Append, and Update operations

protected members

– ensure the correlation between the index in memory (Index), the index file (IndexFile), and the data file (DataFile)

char* Key()

– the template parameter RecType must have the Key method

– used to extract the key value from the record

Class TextIndexedFile(2)

template <class RecType>
class TextIndexedFile
{
public:
    int Read (RecType & record); // read next record
    int Read (char * key, RecType & record); // read by key
    int Append (const RecType & record);
    int Update (char * oldKey, const RecType & record);
    int Create (char * name, int mode = ios::in|ios::out);
    int Open (char * name, int mode = ios::in|ios::out);
    int Close ();
    TextIndexedFile (IOBuffer & buffer, int keySize, int maxKeys = 100);
    ~TextIndexedFile (); // close and delete
protected:
    TextIndex Index;
    BufferFile IndexFile;
    TextIndexBuffer IndexBuffer;
    RecordFile<RecType> DataFile;
    char * FileName; // base file name for file
    int SetFileName (char * fName, char *& dFileName, char *& IdxFName);
};

TextIndexedFile Constructor / Destructor

template <class RecType>

TextIndexedFile<RecType>::TextIndexedFile (IOBuffer & buffer,

int keySize, int maxKeys) : DataFile(buffer), Index (maxKeys),

IndexBuffer(keySize, maxKeys),

IndexFile(IndexBuffer)

{

FileName = 0;

}

template <class RecType>

TextIndexedFile<RecType>::~TextIndexedFile (){ Close(); }

TextIndexedFile::Create

template <class RecType>
int TextIndexedFile<RecType>::Create (char * fileName, int mode)
// use fileName.dat and fileName.ind
{
    int result;
    char * dataFileName, * indexFileName;
    result = SetFileName (fileName, dataFileName, indexFileName);
    if (result <= 0) return 0; // SetFileName failed; the names are not valid
    cout << "file names " << dataFileName << " " << indexFileName << endl;
    result = DataFile.Create (dataFileName, mode);
    if (!result)
    {
        FileName = 0; // remove connection
        return 0;
    }
    result = IndexFile.Create (indexFileName, ios::out|ios::in);
    if (!result)
    {
        DataFile . Close (); // close the data file
        FileName = 0; // remove connection
        return 0;
    }
    return 1;
}

TextIndexedFile::Open

template <class RecType>
int TextIndexedFile<RecType>::Open (char * fileName, int mode)
// open data and index file and read index file
{
    int result;
    char * dataFileName, * indexFileName;
    result = SetFileName (fileName, dataFileName, indexFileName);
    if (!result) return 0;
    // open files
    result = DataFile.Open (dataFileName, mode);
    if (!result) { FileName = 0; return 0; }
    result = IndexFile.Open (indexFileName, ios::out);
    if (!result) { DataFile . Close (); FileName = 0; return 0; }
    // read index into memory
    result = IndexFile . Read ();
    if (result != -1)
    {
        result = IndexBuffer . Unpack (Index);
        if (result != -1) return 1;
    }
    DataFile . Close ();
    IndexFile . Close ();
    FileName = 0;
    return 0;
}


TextIndexedFile::Read

template <class RecType>

int TextIndexedFile<RecType>::Read (RecType & record)

{ return DataFile . Read (record, -1);} // -1 means read the next record sequentially

template <class RecType>

int TextIndexedFile<RecType>::Read (char * key, RecType & record)

{

int ref = Index.Search(key);

if (ref < 0) return -1;

int result = DataFile . Read (record, ref);

return result;

}


TextIndexedFile::Append

template <class RecType>

int TextIndexedFile<RecType>::Append (const RecType & record)

{

char * key = record.Key();

int ref = Index.Search(key);

if (ref != -1) // key already in file

return -1;

ref = DataFile . Append(record);

int result = Index . Insert (key, ref);

return ref;

}


TextIndexedFile::Close

template <class RecType>

int TextIndexedFile<RecType>::Close ()

{ int result;

if (!FileName) return 0; // already closed!

DataFile . Close();

IndexFile . Rewind();

IndexBuffer.Pack (Index);

result = IndexFile . Write ();

cout <<"result of index write: "<<result<<endl;

IndexFile . Close ();

FileName = 0;

return 1;

}


TextIndexBuffer

class TextIndexBuffer: public FixedFieldBuffer

{public:

TextIndexBuffer(int keySize, int maxKeys = 100,

int extraFields = 0, int extraSize=0);

// extraSize is included to allow derived classes to extend

// the buffer with extra fields.

// Required because the buffer size is exact.

int Pack (const TextIndex &);

int Unpack (TextIndex &);

void Print (ostream &) const;

protected:

int MaxKeys;

int KeySize;

char * Dummy; // space for dummy in pack and unpack

};


TextIndexBuffer::TextIndexBuffer

TextIndexBuffer::TextIndexBuffer (int keySize, int maxKeys, int extraFields, int extraSpace)

: FixedFieldBuffer (1+2*maxKeys+extraFields,

sizeof(int)+maxKeys*keySize+maxKeys*sizeof(int) + extraSpace)

// buffer fields consist of numKeys, actual number of keys

// Keys [maxKeys] key fields size = maxKeys * keySize

// RecAddrs [maxKeys] record address fields size = maxKeys*sizeof(int)

{

MaxKeys = maxKeys;

KeySize = keySize;

AddField (sizeof(int));

for (int i = 0; i < maxKeys; i++)

{

AddField (KeySize);

AddField (sizeof(int));

}

Dummy = new char[keySize+1];

}


TextIndexBuffer::Pack

int TextIndexBuffer::Pack (const TextIndex & index)

{

int result;

Clear ();

result = FixedFieldBuffer::Pack (&index.NumKeys);

for (int i = 0; i < index.NumKeys; i++)

{// note only pack the actual keys and recaddrs

result = result && FixedFieldBuffer::Pack (index.Keys[i]);

result = result && FixedFieldBuffer::Pack (&index.RecAddrs[i]);

}

for (int j = 0; j<index.MaxKeys-index.NumKeys; j++)

{// pack dummy values for other fields

result = result && FixedFieldBuffer::Pack (Dummy);

result = result && FixedFieldBuffer::Pack (Dummy);

}

return result;

}


TextIndexBuffer::Unpack

int TextIndexBuffer::Unpack(TextIndex & index)

{

int result;

result = FixedFieldBuffer::Unpack (&index.NumKeys);

for (int i = 0; i < index.NumKeys; i++)

{// note only unpack the actual keys and recaddrs

index.Keys[i] = new char[KeySize+1]; // room for a terminating null, as with Dummy

result = result && FixedFieldBuffer::Unpack (index.Keys[i]);

result = result && FixedFieldBuffer::Unpack (&index.RecAddrs[i]);

}

for (int j = 0; j<index.MaxKeys-index.NumKeys; j++)

{// unpack dummy values for other fields

result = result && FixedFieldBuffer::Unpack (Dummy);

result = result && FixedFieldBuffer::Unpack (Dummy);

}

return result;

}

IndexRecordingFile

int IndexRecordingFile (char * myfile, TextIndexedFile<Recording> & indexFile)
{
    Recording rec; int recaddr, result;
    DelimFieldBuffer Buffer; // create a buffer
    BufferFile RecFile (Buffer);
    result = RecFile . Open (myfile, ios::in);
    if (!result)
    {
        cout << "Unable to open file " << myfile << endl;
        return 0;
    }
    while (1) // loop until the read fails
    {
        recaddr = RecFile . Read (); // read next record
        if (recaddr < 0) break;
        rec . Unpack (Buffer);
        indexFile . Append (rec);
    }
    Recording rec1;
    result = indexFile . Read ("LON2312", rec1);
    cout << "Found record: " << rec1; // print the record that was read by key
    return result;
}


Enhancements to TextIndexedFile(1)

Support other types of keys

Restriction: the key type is restricted to string (char *)

Relaxation: support a template class SimpleIndex with a parameter for the key type

Support data object class hierarchies

Restriction: every object must be of the same type in RecordFile

Relaxation: the type hierarchy supports virtual pack methods


Enhancements to TextIndexedFile(2)

Support multirecord index files

Restriction: the entire index must fit in a single record

Relaxation: add protected methods Insert, Delete, and Search to manipulate the arrays of index objects

Optimization of operations

Obvious: use binary search in the Find method

Active: add a flag to the index object to avoid writing the index record back to the index file when it has not been changed
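The binary-search optimization can be sketched as a free function over a sorted array of C-string keys, which is the shape of the Keys array in TextIndex. FindBinary is a hypothetical name, not the textbook's code; it keeps the same contract as the linear Find (position of the key, or -1).

```cpp
#include <cstring>

// Binary search over keys[0..numKeys-1], which must be sorted by strcmp order.
// Returns the position of key, or -1 if the key is not present.
int FindBinary(const char* const* keys, int numKeys, const char* key) {
    int low = 0;
    int high = numKeys - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;      // midpoint without overflow
        int cmp = std::strcmp(keys[mid], key);
        if (cmp == 0) return mid;              // key found
        if (cmp < 0) low = mid + 1;            // key lies in the upper half
        else high = mid - 1;                   // key lies in the lower half
    }
    return -1;                                 // not found
}
```

With 100 keys (the default maxKeys), this needs at most 7 comparisons where the linear scan may need 100.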


Where are we going?

Plain Stream File

Persistency ==> Buffer support ==> BufferFile

<incremental approach>: deriving BufferFile using various other classes

Random Access ==> Index support ==> IndexedFile

<incremental approach>: deriving TextIndexedFile using RecordFile and TextIndex


Too Large Index(1)

The index is kept on secondary storage (large linear index)

Disadvantages

binary searching of the index requires several seeks (not substantially faster than binary searching a sorted file)

index rearrangement requires shifting or sorting records on secondary storage

Alternatives (to be considered later)

hashed organization

tree-structured index (e.g. B-tree)


Too Large Index (2)

Advantages over a data file sorted by key, even when the index is on secondary storage

binary search can be used (the index records are fixed-size even though the data records are variable-length)

sorting and maintaining the index is less expensive than sorting and maintaining the data file

if there are pinned records, the keys can be rearranged without moving the data records


Index by Multiple Keys(1)

DB-Schema = (ID-No, Title, Composer, Artist, Label)

Find the record with ID-No “COL38358” (primary key: ID-No)

Find all the recordings of “Beethoven” (secondary key: Composer)

Find all the recordings titled “Violin Concerto” (secondary key: Title)


Index by Multiple Keys(2)

Most people don’t want to search only by primary key

Secondary key

can be duplicated

Secondary key index

a search by secondary key must consult one additional index (the primary key index)

Composer index

Secondary key           Primary key
BEETHOVEN               ANG3795
BEETHOVEN               DG139201
BEETHOVEN               DG18807
BEETHOVEN               RCA2626
COREA                   WAR23699
DVORAK                  COL31809
PROKOFIEV               LON2312
RIMSKY-KORSAKOV         MER75016
SPRINGSTEEN             COL38358
SWEET HONEY IN THE R    FF245
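The two-step lookup (secondary key to primary keys, then primary key to record address) can be sketched with standard containers. The type aliases and the function findBySecondaryKey are illustrative assumptions; the chapter's own classes keep these indexes in sorted arrays.

```cpp
#include <map>
#include <string>
#include <vector>

// Primary index: primary key -> byte offset of the record in the data file.
using PrimaryIndex = std::map<std::string, int>;
// Secondary index: secondary key -> primary key (duplicates allowed).
using SecondaryIndex = std::multimap<std::string, std::string>;

// Returns the record addresses of every record with the given secondary key.
std::vector<int> findBySecondaryKey(const SecondaryIndex& secondary,
                                    const PrimaryIndex& primary,
                                    const std::string& key) {
    std::vector<int> addresses;
    auto range = secondary.equal_range(key);       // all entries for this key
    for (auto it = range.first; it != range.second; ++it) {
        auto hit = primary.find(it->second);       // the one additional lookup
        if (hit != primary.end())                  // skip stale references
            addresses.push_back(hit->second);
    }
    return addresses;
}
```

Because the secondary index stores primary keys rather than addresses, a reference to a deleted record simply fails the primary lookup and is skipped.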


Secondary Index:Basic Operations(1)

Record Addition

similar to adding to the primary index

the secondary key is stored in canonical form

– fixed length (so a long key may be truncated)

– the original name can be obtained from the data file

the index can contain duplicate keys

entries with the same secondary key are grouped together and ordered by primary key within the group


Secondary Index:Basic Operations (2)

Record Deletion (2 cases)

Secondary index references the record directly

– delete the entries from both the primary index and the secondary index

– rearrange both indexes

Secondary index references the primary key

– delete only the primary index entry

– leave the secondary index reference to the deleted record intact

– advantage: fast

– disadvantage: references to deleted records take up space


Secondary Index: Basic Operations (3)

Record Updating

the primary key index serves as a kind of protective buffer

Secondary index references the record directly

– update all files containing the record’s location

Secondary index references the primary key (1)

– the secondary index is affected only when the primary key or the secondary key is changed


Secondary Index: Basic Operations (4)

Secondary index references the primary key (2)

when the update changes the secondary key

– rearrange the secondary key index

when the update changes the primary key

– update all affected reference fields

– may require reordering the secondary index

when the update is confined to other fields

– the secondary key index is not affected


Retrieval of Records

Types

primary key access

secondary key access

combination of above

Combination of keys

easy with secondary key indexes: retrieve the list of primary keys for each secondary key

then combine the lists with Boolean operations (AND, OR)
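Combining secondary keys comes down to set operations on lists of primary keys. The sketch below assumes each list has already been retrieved from a secondary index and is sorted ascending; matchAll and matchAny are hypothetical names for the AND and OR cases.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using KeyList = std::vector<std::string>; // primary keys, sorted ascending

// AND: primary keys that appear in both lists.
KeyList matchAll(const KeyList& a, const KeyList& b) {
    KeyList result;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(result));
    return result;
}

// OR: primary keys that appear in either list (each key once).
KeyList matchAny(const KeyList& a, const KeyList& b) {
    KeyList result;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(result));
    return result;
}
```

For example, intersecting the composer list for BEETHOVEN with the title list for "Symphony No. 9" leaves only the recordings that satisfy both conditions, and only those records need to be read from the data file.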

Inverted Lists(1)

Inverted list

a secondary key leads to a set of one or more primary keys

Disadvantages of the secondary index structure

the index must be rearranged whenever a record is added

the secondary key is repeated in a separate entry for each duplicate

Solution A: an array of references

Solution B: a linked list of references


Array of References

Revised composer index

Secondary key           Set of primary key references
BEETHOVEN               ANG3795  DG139201  DG18807  RCA2626
COREA                   WAR23699
DVORAK                  COL31809
PROKOFIEV               LON2312
RIMSKY-KORSAKOV         MER75016
SPRINGSTEEN             COL38358
SWEET HONEY IN THE R    FF245

* no need to rearrange

* limited reference array

* internal fragmentation
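The three properties of the array-of-references layout show up directly in a small sketch. The capacity kMaxRefs = 4 and the names InvertedEntry and addReference are assumptions made for illustration only.

```cpp
#include <array>
#include <string>

constexpr int kMaxRefs = 4; // hypothetical fixed capacity of each reference array

// One record of the revised secondary index.
struct InvertedEntry {
    std::string secondaryKey;
    std::array<std::string, kMaxRefs> refs; // unused slots are wasted space
    int count = 0;                          // number of slots actually in use
};

// Adds primaryKey to the entry, keeping the references sorted.
// Returns false when the array is already full (the "limited array" problem).
bool addReference(InvertedEntry& entry, const std::string& primaryKey) {
    if (entry.count == kMaxRefs) return false;
    int i = entry.count - 1;
    while (i >= 0 && entry.refs[i] > primaryKey) {
        entry.refs[i + 1] = entry.refs[i];  // shift larger references up
        --i;
    }
    entry.refs[i + 1] = primaryKey;
    ++entry.count;
    return true;
}
```

Adding a recording by an existing composer touches only that composer's entry, so the index itself is not rearranged; the cost is the fixed capacity and the empty slots in entries with few references.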


Inverted Lists (2)

Guidelines for better solution

no reorganization when adding

no limitation for duplicate key

no internal fragmentation

Solution B: by Linking the list of references

A list of primary key references

each secondary index entry holds the secondary key field and the relative record number of the first corresponding primary key reference

e.g. PROKOFIEV -> ANG36193 -> LON2312

Page 59:

Linking List of References (1)

Improved revision of the composer index

Secondary Index file

RRN  Secondary key          RRN of first reference
0    BEETHOVEN              3
1    COREA                  2
2    DVORAK                 7
3    PROKOFIEV              10
4    RIMSKY-KORSAKOV        6
5    SPRINGSTEEN            4
6    SWEET HONEY IN THE R   9

Label ID List file

RRN  Label ID   RRN of next reference (-1 = end of list)
0    LON2312    -1
1    RCA2626    -1
2    WAR23699   -1
3    ANG3795    8
4    COL38358   -1
5    DG18807    1
6    MER75016   -1
7    COL31809   -1
8    DG139201   5
9    FF245      -1
10   ANG36193   0
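
Retrieving all references for one composer means following the chain of relative record numbers through the Label ID List file. The sketch below assumes the list file has been loaded into a vector indexed by RRN. The names `ListRecord`, `collectReferences`, and `sampleListFile` are hypothetical. The sample data follows the figure, with RRN 3 taken to be ANG3795, the first Beethoven reference on the array-of-references slide.

```cpp
#include <string>
#include <vector>

struct ListRecord {
    std::string labelId;  // primary key reference
    int next;             // RRN of the next reference, -1 at end of list
};

// Follows the chain that starts at firstRrn and returns the Label IDs
// in list order. Assumes every link is a valid RRN or -1.
std::vector<std::string> collectReferences(
        const std::vector<ListRecord>& listFile, int firstRrn) {
    std::vector<std::string> out;
    for (int rrn = firstRrn; rrn != -1; rrn = listFile[rrn].next) {
        out.push_back(listFile[rrn].labelId);
    }
    return out;
}

// The Label ID List file from the figure; vector position = RRN.
std::vector<ListRecord> sampleListFile() {
    return {
        {"LON2312", -1}, {"RCA2626", -1}, {"WAR23699", -1},
        {"ANG3795", 8},  {"COL38358", -1}, {"DG18807", 1},
        {"MER75016", -1}, {"COL31809", -1}, {"DG139201", 5},
        {"FF245", -1},   {"ANG36193", 0},
    };
}
```

Starting from RRN 3 (the BEETHOVEN entry) the chain visits records 3, 8, 5, and 1. Starting from RRN 10 (PROKOFIEV) it visits records 10 and 0.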

Page 60:

Linking List of References (2)

The primary key references in a separate, entry-sequenced file

Advantages

the secondary index file is rearranged only when a secondary key is added or changed

rearrangement is quick, because the secondary index file holds fewer and smaller records

less penalty associated with keeping the secondary index file on secondary storage (less need for sorting)

the Label ID List file never needs to be sorted

reusing the space of deleted records is easy
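
The following sketch shows why adding a reference does not rearrange anything: the new record is appended to the entry-sequenced list file and only one link changes. It makes two simplifying assumptions: a `std::map` stands in for the sorted secondary index file, and new references are linked at the head of their chain rather than in Label ID order. All names are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

struct ListRecord {
    std::string labelId;  // primary key reference
    int next;             // RRN of the next reference, -1 at end of list
};

struct InvertedList {
    // Secondary key -> RRN of the first reference (kept sorted by key).
    std::map<std::string, int> secondaryIndex;
    // Entry-sequenced list file: records are appended, never sorted.
    std::vector<ListRecord> listFile;

    // Appends the reference and links it at the head of its chain.
    // A new secondary key starts a chain that ends immediately (-1).
    void addReference(const std::string& secondaryKey,
                      const std::string& primaryKey) {
        auto it = secondaryIndex.find(secondaryKey);
        int oldHead = (it == secondaryIndex.end()) ? -1 : it->second;
        listFile.push_back({primaryKey, oldHead});
        secondaryIndex[secondaryKey] =
            static_cast<int>(listFile.size()) - 1;
    }
};
```

Only a new secondary key changes the set of index entries. A new reference for an existing key costs one append and one updated RRN.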

Page 61:

Linking List of References (3)

Disadvantage

references for the same secondary key may not be physically grouped

– lack of locality

– could involve a large amount of seeking

– solution: keep the Label ID List file in memory

– the same Label ID List file can hold the lists of a number of secondary index files

– if it is too large for memory, load only part of it

Page 62:

Selective Indexes

Selective Index: Index on a subset of records

A selective index contains entries for only part of the file

provide a selective view

useful when contents of a file fall into several categories

– e.g. 20 < Age < 30 and $1000 < Salary
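
A minimal sketch of building a selective index for the example condition above. The record layout `Employee` and the function names are hypothetical. The index maps a primary key to the record's relative record number and simply skips records that fail the condition.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record layout for the example.
struct Employee {
    std::string id;  // primary key
    int age;
    double salary;
};

// The slide's example condition: 20 < Age < 30 and $1000 < Salary.
bool selected(const Employee& e) {
    return e.age > 20 && e.age < 30 && e.salary > 1000.0;
}

// Indexes only the selected records: primary key -> RRN in the data file.
std::map<std::string, std::size_t> buildSelectiveIndex(
        const std::vector<Employee>& file) {
    std::map<std::string, std::size_t> index;
    for (std::size_t rrn = 0; rrn < file.size(); ++rrn) {
        if (selected(file[rrn])) index[file[rrn].id] = rrn;
    }
    return index;
}
```

Records outside the condition stay in the data file but are invisible through this index, which is what gives it a selective view.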

Page 63:

Index Binding(1)

When should a key be bound to the physical address of its associated record?

File construction time binding

(Tight, in-the-data binding)

tight binding & faster access

the case of the primary key

when a secondary key is also bound at that time

– simpler and faster retrieval

– reorganization of the data file results in modifications of all bound index files

Page 64:

Index Binding (2)

Postpone binding until a record is actually retrieved

(Retrieval-time binding)

minimal reorganization & safe approach

mostly for secondary key

Tight, in-the-data binding is good when

static, little or no changes

rapid performance during retrieval

mass-produced, read-only optical disk

Page 65:

Let’s Review (1)

7.1 What is an Index?

7.2 A Simple Index for Entry-Sequenced Files

7.3 Using Template Classes in C++ for Object I/O

7.4 Object-Oriented Support for Indexed, Entry-Sequenced Files of Data Objects

7.5 Indexes That Are Too Large to Hold in Memory

Page 66:

Let’s Review(2)

7.6 Indexing to Provide Access by Multiple Keys

7.7 Retrieval Using Combinations of Secondary Keys

7.8 Improving the Secondary Index Structure: Inverted Lists

7.9 Selective Indexes

7.10 Binding