1 overview of storage and indexing yanlei diao umass amherst feb 13, 2007 slides courtesy of r....

1

Overview of Storage and Indexing

Yanlei DiaoUMass AmherstFeb 13, 2007

Slides Courtesy of R. Ramakrishnan and J. Gehrke

2

DBMS Architecture

Disk Space Manager

DB

Access Methods

Buffer Manager

Query Parser

Query Rewriter

Query Optimizer

Query Executor

Lock Manager Log Manager

3

Data on External Storage Disks: Can retrieve random pages at (almost) fixed

cost But reading several consecutive pages is much faster than

reading them in random order. Tapes: Can only read pages in sequence

Slower but cheaper than disks; used for archival storage Page: Unit of information read from or written to disk

Size of page: DBMS parameter, 4KB, 8KB Disk space manager:

Abstraction: a collection of pages. Allocate/de-allocate a page. Read/write a page.

Page I/O: Pages read from disk and pages written to disk Dominant cost of database operations

4

Access Methods Access methods: routines to manage various

disk-based data structures. Files of records Various kinds of indexes

File of records: Important abstraction of external storage in a DBMS! Record id (rid) is sufficient to physically locate a record

Indexes: Auxiliary data structures Given a value in the index search key field(s), find the

record ids of records with this value

5

Buffer Management Architecture:

Data is read into memory for processing Data is written to disk for persistent storage

Buffer manager stages pages between external storage and main memory buffer pool.

Access method layer makes calls to the buffer manager.

6

Access Methods

Many alternatives exist, each ideal for some situations, and not so good for others: Heap (unorder) files: Suitable when typical

access is a file scan retrieving all records. Sorted Files: Best if records are retrieved in

order of the sorting attribute(s), or only a `range’ of records is needed.

Indexes: Data structures to organize records via trees or hashing.

• Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields

• Updates are much faster than in sorted files.

7

Indexes An index on a file speeds up selections on the

search key field(s) for the index. Any subset of the fields of a relation can be the search

key for an index on the relation. Search key is not the same as key (minimal set of fields

that uniquely identify a record in a relation).

An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k. Data entry (in an index) versus data record (in a file). Given a data entry k*, we can find record(s) with the

key k in at most one disk I/O. (Details soon …)

8

B+ Tree Indexes

Leaf pages contain data entries: • Leaf pages are chained using prev & next pointers• Data entries are sorted by the search key value

Non-leaf

Pages

Pages (Sorted by search key)

Leaf

9

B+ Tree Indexes

Non-leaf pages have index entries; only used to direct searches

P0 K 1 P 1 K 2 P 2 K m P m

index entry

Non-leaf

Pages

Pages (Sorted by search key)

Leaf

10

Example B+ Tree

Equality selection: find 28*? 29*? Range selection: find all > 15* and < 30* Insert/delete: Find the data entry in a leaf, then

change it. Need to adjust the parent sometimes. And the change sometimes bubbles up the tree.

2* 3*

Root

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

Entries <= 17 Entries > 17

Data entries in leaf nodes are sorted.

11

Hash-Based Indexes Good for equality selections. A hash index is a collection

of buckets. Bucket = primary page plus

zero or more overflow pages. Buckets contain data entries.

h

1 2 3 … … … … … … N-1

Search key value k

Hashing function h: h(k) = bucket of data entries that

may contain the search key value k. No need for “index entries” in this

scheme.

12

Alternatives for Data Entry k* in Index

In a data entry k* we can store: Alt1: Data record with key value k Alt2: <k, rid of data record with search key value

k> Alt3: <k, list of rids of data records with search

key k>

Choice of an alternative for data entries is orthogonal to an indexing technique used. Indexing techniques: B+ tree, hashing Every indexing technique can be combined with

any alternative.

13

Alternative 1 for Data Entries

Alternative 1: Index structure itself is a file organization for

data records (instead of a Heap or sorted file).

At most one index on a given collection of data records can use Alternative 1.

• Otherwise, data records are duplicated, leading to redundant storage and potential inconsistency.

If data records are very large, # of pages containing data entries is high.

• Size of the index structure to direct searches (e.g., non-leaf nodes in B+ tree) can also be large.

14

Alternatives 2, 3 for Data Entries

Alternatives 2 and 3: Data entries, <k, rid(s)>, typically much

smaller than data records. • Index structure used to direct search is much smaller

than with Alternative 1.• So, better than Alternative 1 with large data records,

especially if search keys are small. Alternative 3 more compact than Alternative 2

• In the presence of multiple K* entries! Alternative 3 leads to variable sized data

entries, even if search keys are of fixed length.• Variable sizes records/data entries are hard to

manage.

15

Index Classification

Primary index vs. secondary index: If search key contains the primary key, then

called primary index. Other indexes are called secondary indexes.

Unique index: Search key contains a candidate key. No data entries can have the same search key

value.

Recall: Key? Primary key? Candidate key?

16

Index Classification (Contd.) Clustered index: order of data records is the

same as, or `close to’, order of (sorted) data entries. Alternative 1 implies clustered Alternatives 2 and 3 are clustered only if data records

are sorted on the search key field. A data file can be clustered on at most one search key. Retrieving data records: fast, with one or a few I/Os.

Why? Unclustered index:

Multiple unclustered indexes on a data file. Retrieving data records: can be slow, toughing many

pages.

17

Clustered vs. Unclustered Indexes

Suppose that Alternative (2) is used for data entries, and data records are stored in a Heap file. To build a clustered index, first sort the Heap file (with some

free space on each page for future inserts). Overflow pages may be needed for inserts. (Thus, order of

data records is `close to’, but not identical to, the sort order.)

Index entries

Data entries

direct search for

(Index File)

(Data file)

Data Records

data entries

Data entries

Data Records

CLUSTERED UNCLUSTERED

18

Index Classification (Contd.) Dense index: contains (at least) one data entry

for every search key value that appears in a record in the indexed file Alternatives 1, 2, 3

Sparse index: contains one entry for each page of records in the indexed file. Sparse index has to be clustered! Alternative 1? Alternative 2? Alternative 3?

19

Cost Model for Our Analysis

We ignore CPU costs, for simplicity: B: The number of data pages

R: Number of records per page

D: (Average) time to read or write disk page

I/O cost: Measuring # of page I/O’s ignores pre-fetching of a sequence of pages; thus, approximate I/O cost.

Average-case analysis, based on simplistic assumptions. Good enough to show the overall trends!

20

Comparing File Organizations

Heap files (random order; insert at eof)

Sorted files, sorted on <age, sal> Clustered B+ tree file, Alternative (1),

search key <age, sal> Unclustered B + tree index on a heap

file, search key <age, sal> Unclustered hash index on a heap file,

search key <age, sal>

21

Operations to Compare

Scan: Fetch all records from disk Equality search: e.g. (age, sal) = (24, 50K) Range selection:

e.g. (age, sal) [(22, 20K), (30, 200K)] Insert a record Delete a record

22

Assumptions in Our Analysis Heap Files:

Equality selection on key; exactly one match. Sorted Files:

Files compacted after deletions. Indexes:

Alt (2), (3): size of data entry = 10% size of data record

Hash: No overflow chains.• 80% page occupancy => File size = 1.25 data size

B+Tree:• 67% occupancy (typical): implies file size = 1.5 data size• Balanced with fanout F (133 typical) at each non-leaf

level

23

Assumptions (contd.)

Scans: Leaf nodes of a tree-index are chained. Index data entries plus actual file scanned

for unclustered indexes. Range searches:

We use tree indexes to restrict the set of data records fetched, but ignore hash indexes.

24

(a) Scan (b) Equality (c ) Range (d) Insert (e) Delete

(1) Heap BD 0.5BD BD 2D Search +D

(2) Sorted BD Dlog 2B D(log 2 B + # pgs with match recs)

Search + BD

Search +BD

(3) Clustered

1.5BD Dlog F 1.5B D(log F 1.5B + # pages w. match recs)

Search + D

Search +D

(4) Unclust. Tree index

BD(R+0.15) D(1 + log F 0.15B)

D(log F 0.15B + # match recs)

Search + 2D

Search + 2D

(5) Unclust. Hash index

BD(R+0.125) 2D BD Search + 2D

Search + 2D

Cost of Operations

Several assumptions underlie these (rough) estimates!

25

(a) Scan (b) Equality (c ) Range (d) Insert (e) Delete

(1) Heap BD 0.5BD BD 2D Search +D

(2) Sorted BD Dlog 2B D(log 2 B + # pgs with match recs)

Search + BD

Search +BD

(3) Clustered

1.5BD Dlog F 1.5B D(log F 1.5B + # pages w. match recs)

Search + D

Search +D

(4) Unclust. Tree index

BD(R+0.15) D(1 + log F 0.15B)

D(log F 0.15B + # match recs)

Search + 2D

Search + 2D

(5) Unclust. Hash index

BD(R+0.125) 2D BD Search + 2D

Search + 2D

Cost of Operations

Several assumptions underlie these (rough) estimates!

# of leaf pages: B*0.1/67%=0.15B; Cost of index I/O: 0.15BD# of data entries on a leaf page: R*10*0.67=6.7R; Cost of data I/O: 0.15B*6.7R*D=BDRKey: unclustered B+tree, one I/O per data entry! # of data entry pages: B*0.1/80%=0.125B; Cost of index I/O: 0.125BD# of data entries on a hash page: R*10*0.80=8R; Cost of data I/O: 0.125B*8R*D=BDRKey: unclustered hash index, one I/O per data entry!

26

Summary

Many alternative access methods exist, each appropriate in some situation.

Index is a collection of data entries plus a way to quickly find entries with given key values.

If selection queries are frequent, building an index is important. Hash-based indexes only good for equality

search. Tree-based indexes best for range search;

also good for equality search.

27

Summary (Contd.)

Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs. Choice orthogonal to indexing technique used

to locate data entries with a given key value. Can have several indexes on a given file of

data records, each with a different search key.

Indexes can be classified as clustered vs. unclustered, primary vs. secondary, etc. Differences have important consequences for utility/performance.

28

Questions

1 overview of storage and indexing yanlei diao umass amherst feb 13, 2007 slides courtesy of r....

Documents

data record

search key leaf slide

data entry

index search key fields

processing data

random pages

index entry nonleaf

collection of pages