
Post on 20-Dec-2015


Indexes on Sequential Files

Source: our textbook, slides by Hector Garcia-Molina


How to Represent a Relation

• Suppose we scatter its records arbitrarily among the blocks of the disk
• How to answer SELECT * FROM R?
• Scan every block:
– ridiculously slow
– would require lots of overhead info in each block and each record header


How to Represent a Relation

• Reserve some blocks for the relation
• No need to scan entire disk
• How to answer SELECT * FROM R WHERE cond?
• Scan all the records in the reserved blocks
• Still ridiculously slow


Indexes

• Use indexes (special data structures) that allow us to find all the records that satisfy a condition "efficiently"
• Possible data structures:
– simple indexes on sorted files
– secondary indexes on unsorted files
– B-trees
– hash tables


Sorted Files

• Sorted file: records (tuples) of the file (relation) are in sorted order of the field (attribute) of interest.
• This field might or might not be a key of the relation.
• This field is called the search key.
• A sorted file is also called a sequential file.


Index on Sequential File

• An index is another file containing key-pointer pairs of the form (K,a)
– K is a search key
– a is an address (pointer)
– The record at address a has search key K
• Particularly useful when the search key is the primary key of the relation


Dense Indexes

• An index with one entry for every key in the data file
• What's the point?
• Index is much smaller than data file when record contains much more than just the search key
• If index is small enough to fit in main memory, record with a certain search key can be found quickly: binary search in memory, followed by only one disk I/O
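The lookup described above can be sketched in a few lines. This is only an illustration under assumptions that are not on the slide: the names `dense_index`, `disk`, and `dense_lookup` are hypothetical, an address is modeled as a (block number, slot) pair, and a dictionary stands in for the disk.

```python
from bisect import bisect_left

# Hypothetical in-memory dense index: sorted (key, address) pairs, one per record.
# An address is modeled here as (block number, slot within the block).
dense_index = [(10, (0, 0)), (20, (0, 1)), (30, (1, 0)), (40, (1, 1))]

# Stand-in for the disk: block number -> list of records.
disk = {0: [{"key": 10}, {"key": 20}], 1: [{"key": 30}, {"key": 40}]}


def dense_lookup(key):
    """Binary-search the index in memory, then read exactly one data block."""
    keys = [k for k, _ in dense_index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None  # answered from the index alone, no data block read
    block_no, slot = dense_index[i][1]
    block = disk[block_no]  # the single "disk I/O" in this sketch
    return block[slot]
```

A call such as `dense_lookup(30)` touches one data block; a call for a key that is absent from the index touches none.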


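Example of a Dense Index

[Figure: a sequential file with two records per block, keys (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), and a dense index with four keys per block, (10, 20, 30, 40), (50, 60, 70, 80), (90, 100, 110, 120).]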


Some Numbers

• relation with 1,000,000 tuples
• block size is 4096 bytes
• 10 records per block
• thus 100,000 blocks, > 400 Mbytes
• key field is 30 bytes
• pointer is 8 bytes
• thus at least 100 key-pointer pairs per block
• thus dense index size is 10,000 blocks, about 40 Mbytes
• since log2(10,000) is about 13.3, a binary search of the index on disk takes at most about 14 disk I/O's for a search
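The arithmetic behind these numbers can be checked with a short script. The variable names are illustrative only; the inputs are the ones given on the slide.

```python
import math

tuples = 1_000_000
block_size = 4096          # bytes
records_per_block = 10
key_bytes, pointer_bytes = 30, 8

data_blocks = tuples // records_per_block                    # 100,000 blocks
data_bytes = data_blocks * block_size                        # 409,600,000 bytes, > 400 Mbytes

pairs_that_fit = block_size // (key_bytes + pointer_bytes)   # 107, so "at least 100"
pairs_per_block = 100                                        # rounded down, as on the slide

dense_index_blocks = tuples // pairs_per_block               # 10,000 blocks
dense_index_bytes = dense_index_blocks * block_size          # 40,960,000 bytes, about 40 Mbytes

log2_index_blocks = math.log2(dense_index_blocks)            # about 13.29
```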


Sparse Index

• Uses less space than a dense index
• Requires more time to find a record with a given key
• In a sparse index, there is just one (key,pointer) pair per data block.
• The key is for the first record in the block.


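Sparse Index Example

[Figure: a sequential file with two records per block, keys (10, 20), (30, 40), (50, 60), (70, 80), (90, 100), and a sparse index with four keys per block, (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230).]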


Using a Sparse Index

• To find the record with key K, search the index for the largest key ≤ K
– Use binary search to do this
• Retrieve the indicated data block
• Search the block for the record with key K
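A minimal sketch of these steps, assuming the data blocks of the sparse index example and an index whose position i stands for a pointer to data block i. The names `blocks`, `sparse_index`, and `sparse_lookup` are hypothetical.

```python
from bisect import bisect_right

# Data blocks of the sequential file (keys only), as in the sparse index example.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: the first key of each block; entry i "points" to block i.
sparse_index = [block[0] for block in blocks]


def sparse_lookup(key):
    """Return (block number, slot) of the record with this key, or None."""
    i = bisect_right(sparse_index, key) - 1  # largest index key <= key
    if i < 0:
        return None  # key is smaller than every key in the file
    block = blocks[i]  # retrieve the indicated data block
    if key in block:   # search inside the block
        return (i, block.index(key))
    return None
```

Note that, unlike the dense case, a miss such as `sparse_lookup(45)` can only be reported after the data block has been fetched.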


Comparing Sparse and Dense Indexes

• Sparse index uses much less space
– In the previous numeric example, sparse index size is now only 1000 index blocks, about 4 Mbytes
• Dense index, unlike sparse, lets us answer "is there a record with key K?" without having to retrieve a data block


Multiple Levels of Index

• Make an index for the index
• Can continue this idea for more levels, but usually only two levels in practice
• Second and higher level indexes must be sparse, otherwise no savings


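Two-Level Index Example

[Figure: a sequential file with two records per block, keys (10, 20), (30, 40), (50, 60), (70, 80), (90, 100); a first-level sparse index with blocks (10, 30, 50, 70), (90, 110, 130, 150), (170, 190, 210, 230); and a sparse second-level index with blocks (10, 90, 170, 250), (330, 410, 490, 570).]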


Numeric Example Again

Suppose we put a second-level index on the first-level sparse index

Since first-level index uses 1000 blocks and 100 key-pointer pairs fit per block, we need 10 blocks for second-level index

Very likely to keep the second-level index in memory

Thus search requires at most two disk I/O's (one for block of first-level index, one for data block)
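A two-level lookup can be sketched as follows, using the first-level index blocks of the two-level example. The names are hypothetical, and the sketch assumes every first-level index block is full, so that entry i of index block j points to data block j * 4 + i.

```python
from bisect import bisect_right

# First-level sparse index, stored in blocks of four keys (on disk in this sketch).
first_level = [[10, 30, 50, 70], [90, 110, 130, 150], [170, 190, 210, 230]]

# Second-level sparse index: first key of each first-level block (kept in memory).
second_level = [block[0] for block in first_level]


def find_data_block(key):
    """Return the number of the data block that may hold key, or None."""
    j = bisect_right(second_level, key) - 1  # in memory, no disk I/O
    if j < 0:
        return None
    index_block = first_level[j]             # disk I/O 1: one first-level index block
    i = bisect_right(index_block, key) - 1
    # Assumption: full index blocks, so position translates directly to a block number.
    return j * len(index_block) + i          # disk I/O 2 would read this data block
```

For key 100 the sketch lands on data block 4, which is the block holding (90, 100) in the example file.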


Duplicate Search Keys

What if more than one record has a given search key value? (Then the search key is not a key of the relation.)

Solution 1: Use a dense index and allow duplicate search keys in it.

To find all data records with search key K, follow all the pointers in the index with search key K
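A sketch of Solution 1, assuming the record keys of the duplicate-key example and modeling a pointer as the record's position in the file. The names `data`, `dense_index`, and `find_all` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Keys of the data records in file order (duplicates allowed).
data = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

# Dense index with duplicates: one (key, pointer) entry per data record.
dense_index = [(key, pos) for pos, key in enumerate(data)]


def find_all(key):
    """Follow every index pointer whose search key equals key."""
    keys = [k for k, _ in dense_index]
    lo = bisect_left(keys, key)    # first index entry with this key
    hi = bisect_right(keys, key)   # one past the last index entry with this key
    return [ptr for _, ptr in dense_index[lo:hi]]
```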


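Solution 1 Example

[Figure: a data file with blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45) and a dense index with blocks (10, 10, 10, 20), (20, 30, 30, 30), one index entry per data record, duplicates included.]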


Duplicate Search Keys with Dense Index

Solution 2: keep an index entry only for the first data record with each search key value (saves some space in the index)

To find all data records with search key K, follow the one pointer in the index and then move forward in the data file
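A sketch of Solution 2 on the same record keys, again modeling a pointer as a position in the file. The names `data`, `index`, and `find_all` are hypothetical.

```python
from bisect import bisect_left

# Keys of the data records in file order (duplicates allowed).
data = [10, 10, 10, 20, 20, 30, 30, 30, 40, 45]

# One index entry per distinct key, pointing to the first record with that key.
index = []
for pos, key in enumerate(data):
    if not index or index[-1][0] != key:
        index.append((key, pos))


def find_all(key):
    """Follow the one pointer for key, then move forward in the data file."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return []
    pos = index[i][1]
    hits = []
    while pos < len(data) and data[pos] == key:
        hits.append(pos)
        pos += 1
    return hits
```

The index shrinks from ten entries to five here, at the cost of a forward scan through the data file.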


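Solution 2 Example

[Figure: a data file with blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45) and a dense index block (10, 20, 30, 40), one index entry per distinct search key.]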


Duplicate Search Keys with Sparse Index

• Recall that index has an entry for just the first data record in each block
• To find all data records with key K:
– find last entry (E1) in index with key ≤ K
– move toward front of index until reaching entry (E2) with key < K
– check data blocks pointed to by entries from E2 to E1 for records with search key K
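The E1/E2 procedure can be sketched as below, assuming the data blocks of the duplicate-key example and an index whose position i stands for a pointer to data block i. The names are hypothetical, and stopping at the front of the index when no entry with a smaller key exists is an assumption of this sketch.

```python
from bisect import bisect_right

# Data blocks (keys only), with duplicate search keys spanning block boundaries.
blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]

# Sparse index: key of the first record in each block.
sparse_index = [block[0] for block in blocks]


def find_all(key):
    """Return (block, slot) for every record with this search key."""
    e1 = bisect_right(sparse_index, key) - 1  # last entry with key <= K
    if e1 < 0:
        return []
    e2 = e1
    # Move toward the front until an entry with key < K (or the front itself).
    while e2 > 0 and sparse_index[e2] >= key:
        e2 -= 1
    hits = []
    for b in range(e2, e1 + 1):               # check blocks from E2 to E1
        for slot, k in enumerate(blocks[b]):
            if k == key:
                hits.append((b, slot))
    return hits
```

Looking for 20 has to start one block before the entry for 20, because the first record with key 20 sits at the end of the block whose index key is 10.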


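Dupl. Keys w/ Sparse Index

[Figure: a data file with blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45) and a sparse index block (10, 10, 20, 30), one entry per data block holding the key of its first record.]

Careful if looking for 20 or 30!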


Variation on Previous Scheme

• Index entry for a data block holds smallest search key that is new (did not appear in a previous block)
• If there is no new search key in that block, then index entry holds the lone search key in the block
• To find all data records with key K:
– search index for first entry whose key is either K, or < K but next key is > K
– if a record with key K is in that block then scan forward from there
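A sketch of this variation, using the data blocks of the duplicate-key example. The function names are hypothetical, and treating the last index entry as a match when its key is < K (there is no next key to compare) is an assumption of this sketch.

```python
def build_variation_index(blocks):
    """One entry per block: the smallest key not seen in an earlier block,
    or the block's lone key when the block holds no new key."""
    index = []
    largest_seen = None  # the file is sorted, so this is the previous block's last key
    for block in blocks:
        new_keys = [k for k in block if largest_seen is None or k > largest_seen]
        index.append(new_keys[0] if new_keys else block[0])
        largest_seen = block[-1]
    return index


def find_all(blocks, index, key):
    """Find the starting block via the index, then scan forward."""
    start = None
    for i, k in enumerate(index):
        nxt = index[i + 1] if i + 1 < len(index) else None
        if k == key or (k < key and (nxt is None or nxt > key)):
            start = i
            break
    if start is None:
        return []
    hits = []
    for b in range(start, len(blocks)):
        if blocks[b][0] > key:  # past the last block that can hold key
            break
        hits += [(b, slot) for slot, k in enumerate(blocks[b]) if k == key]
    return hits


blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 45]]
index = build_variation_index(blocks)
```

With this index a search for 20 starts directly at the block where 20 first appears, with no backward scan through the index.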


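Variation Example

[Figure: a data file with blocks (10, 10), (10, 20), (20, 30), (30, 30), (40, 45) and an index block (10, 20, 30, 30).]

Annotation on the figure: "should this be 40?"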


Inserting and Deleting Data

Recall three main techniques:

• create/delete overflow blocks
– overflow blocks do not have entries in a sparse index
• may be able to insert new blocks in sequential order
– new block needs an entry in a sparse index
– changing an index can create same problems
• make room in a full block by sliding some data to an adjacent block; combine adjacent blocks if they get too empty


General Strategy

When data file changes, index must adapt

Details depend on whether index is sparse or dense and how data file modifications are implemented

Index file is itself sequential, so same strategies as for modifying data files can be applied to index files


Effects of Actions on Index

Action                        Dense Index   Sparse Index
Create empty overflow block   none          none
Delete empty overflow block   none          none
Create empty (main) block     none          insert
Delete empty (main) block     none          delete
Insert record                 insert        maybe update
Delete record                 delete        maybe update
Slide record                  update        maybe update


Explanations for Actions

• create/destroy empty overflow block has no effect on
– dense index since it refers to records
– sparse index since it refers to main records
• create/destroy empty main block:
– no effect on dense index as above
– insert/delete entry in sparse index
• insert/delete/slide record:
– insert/delete/update entry in dense index
– only change sparse index if affects first record in block
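The sparse-index rules for deleting a record can be sketched as follows. The function name and the list-based model are hypothetical; to keep the sketch short it finds the record by scanning the blocks directly, and it removes a main block as soon as it becomes empty.

```python
def delete_record(blocks, sparse_index, key):
    """Delete one record with this key; touch the sparse index only when needed.

    blocks[i] is a sorted list of keys and sparse_index[i] is the first key of
    blocks[i]. Returns True if a record was deleted.
    """
    for b, block in enumerate(blocks):
        if key in block:
            was_first = block[0] == key
            block.remove(key)
            if not block:
                # Main block became empty: delete the block and its index entry.
                del blocks[b]
                del sparse_index[b]
            elif was_first:
                # First record of the block changed: update the index entry.
                sparse_index[b] = block[0]
            # Otherwise the sparse index is unaffected.
            return True
    return False
```

As a usage example, starting from blocks (10, 20), (30, 40), (50, 60), (70, 80) with index (10, 30, 50, 70), deleting 40 leaves the index alone, deleting 30 instead changes the entry 30 to 40, and deleting both removes the entry.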


Deletion from sparse index

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150).]


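Deletion from sparse index

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150).]

– delete record 40 (not the first record in its block, so the sparse index is unchanged)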


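Deletion from sparse index

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150).]

– delete record 30 (record 40 becomes the first record in its block, so the index entry 30 is updated to 40)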


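Deletion from sparse index

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80); sparse index blocks (10, 30, 50, 70), (90, 110, 130, 150).]

– delete records 30 & 40 (the block becomes empty, so its index entry is deleted and the entries 50 and 70 slide forward)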


Deletion from dense index

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80); dense index blocks (10, 20, 30, 40), (50, 60, 70, 80).]


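Deletion from dense index

[Figure: sequential file blocks (10, 20), (30, 40), (50, 60), (70, 80); dense index blocks (10, 20, 30, 40), (50, 60, 70, 80).]

– delete record 30 (record 40 slides to the front of its block; the index entry for 30 is deleted and the entry for 40 slides into its place)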


Insertion, sparse index case

[Figure: sequential file blocks (10, 20), (30, free slot), (40, 50), (60, free slot); sparse index block (10, 30, 40, 60).]


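Insertion, sparse index case

[Figure: sequential file blocks (10, 20), (30, free slot), (40, 50), (60, free slot); sparse index block (10, 30, 40, 60). Record 34 goes into the free slot after 30.]

– insert record 34

• our lucky day! we have free space where we need it!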


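Insertion, sparse index case

[Figure: sequential file blocks (10, 20), (30, free slot), (40, 50), (60, free slot); sparse index block (10, 30, 40, 60). Record 15 takes the place of 20 in the first block, 20 moves into the second block ahead of 30, and the index entry 30 becomes 20.]

– insert record 15

• Illustrated: Immediate reorganization
• Variation:
– insert new block (chained file)
– update index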


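Insertion, sparse index case

[Figure: sequential file blocks (10, 20), (30, free slot), (40, 50), (60, free slot); sparse index block (10, 30, 40, 60). Record 25 is placed in an overflow block.]

– insert record 25

• overflow blocks (reorganize later...)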
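Two of the insertion cases above (free space in the right block, and an overflow block when the block is full) can be sketched as follows. The names and the block capacity of two are assumptions taken from the figure's layout; immediate reorganization is not modeled, and a key smaller than every index key is not handled specially.

```python
from bisect import bisect_right

CAPACITY = 2  # records per main block, as drawn in the figure (assumption)

blocks = [[10, 20], [30], [40, 50], [60]]
sparse_index = [10, 30, 40, 60]  # first key of each main block
overflow = {}  # main block number -> overflow records (no index entries)


def insert_record(key):
    """Insert into the main block chosen by the sparse index, else overflow."""
    b = max(bisect_right(sparse_index, key) - 1, 0)  # block with largest index key <= key
    if len(blocks[b]) < CAPACITY:
        blocks[b].append(key)
        blocks[b].sort()
        sparse_index[b] = blocks[b][0]  # changes only if key became the first record
        return "main"
    # Block is full: chain the record in an overflow block, reorganize later.
    overflow.setdefault(b, []).append(key)
    return "overflow"
```

Inserting 34 fills the free slot next to 30; inserting 25 finds the block (10, 20) full and lands in that block's overflow chain, leaving the sparse index untouched.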


Insertion, dense index case

• Similar

• Often more expensive . . .