representing data elements

28
1 Representing Data Elements Fields, Records, Blocks Variable-length Data Modifying Records Source: our textbook

Upload: crevan

Post on 05-Jan-2016

35 views

Category:

Documents


1 download

DESCRIPTION

Representing Data Elements. Fields, Records, Blocks Variable-length Data Modifying Records. Source: our textbook. Overview. Attributes are represented by sequences of bytes, called fields Tuples are represented by collections of fields, called records - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Representing Data Elements

1

Representing Data Elements

Fields, Records, BlocksVariable-length DataModifying Records

Source: our textbook

Page 2: Representing Data Elements

2

Overview

Attributes are represented by sequences of bytes, called fields

Tuples are represented by collections of fields, called records

Relations are represented by collections of records, called files

Files are stored in blocks, using specialized data structures to support efficient modification and querying

Page 3: Representing Data Elements

3

Representing SQL Data Types

integers and reals: built-in CHAR(n): array of n bytes VARCHAR(n): array of n+1 bytes

(extra byte is either string length or null char)

dates and times: fixed length strings etc.

Page 4: Representing Data Elements

4

Representing Tuples

For now, assume all attributes (fields) are fixed length.

Concatenate the fields Store the offset of each field in

schema0 30 286 287 297

nameCHAR(30)30 bytes

addressVARCHAR(255)256 bytes

genderCHAR(1)1 byte

birthdateDATE10 bytes

Page 5: Representing Data Elements

5

More on Tuples

Due to hardware considerations, certain types of data need to start at addresses that are multiples of 4 or 8

Previous example becomes:0 32 288 292 304

nameCHAR(30)30 bytes+ 2

addressVARCHAR(255)256 bytes

genderCHAR(1)1 byte+ 3

birthdateDATE10 bytes+ 2

Page 6: Representing Data Elements

6

Record Headers

Often it is convenient to keep some "header" information in each record: a pointer to schema information

(attributes/fields, types, their order in the tuple, constraints)

length of the record/tuple timestamp of last modification

Page 7: Representing Data Elements

7

Packing Records into Blocks

Start with block header: timestamp of last modification/access offset of each record in the block, etc.

Follow with sequence of records May end with some unused space

headerrecord 1record 2 … record n-1record n

Page 8: Representing Data Elements

8

Representing Addresses

Often addresses (pointers) are part of records: the application data in object-oriented databases as part of indexes and other data structures

supporting the DBMS Every data item (block, record, etc.) has

two addresses: database address: address on the disk (typically 8-16 bytes) memory address, if the item is in memory

(typically 4 bytes)

Page 9: Representing Data Elements

9

Translation Table

Provides mapping from database addresses to memory addresses for all blocks currently in memory

Later we'll discuss how to implement it

Page 10: Representing Data Elements

10

Pointer Swizzling

When a block is moved from disk into main memory, change all the disk addresses that point to items in this block into main memory addresses.

Need a bit for each address to indicate if it is a disk address or a memory address.

Why? Faster to follow memory pointers (only uses a single machine instruction).

Page 11: Representing Data Elements

11

Example of Swizzling

Block 1

Block 2

Disk Main Memory

read intomain memory swizzled

unswizzled

Page 12: Representing Data Elements

12

Swizzling Policies

Automatic swizzling: as soon as block is brought into memory, swizzle all relevant pointers

Swizzling on demand: only swizzle a pointer if and when it is actually followed

No swizzling: always refer to translation table

Programmer control

Page 13: Representing Data Elements

13

Automatic Swizzling

Locating all pointers within a block: refer to the schema, which will indicate where

addresses are in the records for index structures, pointers are at known

locations Update translation table with memory

addresses of items in the block Update pointers in the block (in memory)

with memory addresses, when possible, as obtained from translation table

Page 14: Representing Data Elements

14

Unswizzling

When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses

Use translation table again Important to have an efficient data

structure for the translation table

Page 15: Representing Data Elements

15

Pinned Records and Blocks

A block in memory is pinned if it cannot be safely written back to disk

Indicate with a bit in the block header

Reasons for pinning: related to failure recovery (more later) because of pointer swizzling

If block B1 has swizzled pointer to an item in block B2, then B2 is pinned.

Page 16: Representing Data Elements

16

Unpinning a Block Consider each item in the block to be

unpinned Keep in the translation table the places

in memory holding swizzled pointers to that item (e.g., with a linked list)

Unswizzle those pointers: use translation table to replace the memory addresses with database (disk) addresses

Page 17: Representing Data Elements

17

Variable Length Data

Data items with varying size (e.g., if maximum size of a field is large but most of the time the values are small)

Variable-format records (e.g., NULLs method for representing a hierarchy of entity sets as relations)

Records that do not fit in a block (e.g., an MPEG of a movie)

Page 18: Representing Data Elements

18

Variable-Length Fields

Store the fixed-length fields before the variable-length fields in each record

Keep in the record header record length pointers to the beginnings of all the

variable-length fields Book discusses variations on this idea

Page 19: Representing Data Elements

19

Variable Length Fieldsotherheaderinfo

recordlength

to var lenfield 2

var lenfield 2

var lenfield 3

fixed lenfield 2

var lenfield 1

fixed lenfield 1

to var lenfield 3

Page 20: Representing Data Elements

20

Variable-Format Records

Represent by a sequence of tagged fields

Each tagged field contains name type length, if not deducible from the type value

Page 21: Representing Data Elements

21

Splitting Records Across Blocks

Called spanned records Useful when

record size exceeds block size putting an integral number of records in a

block wastes a lot of the block (e.g., record size is 51% of block size)

Each record or fragment header contains bit indicating if it is a fragment if fragment then pointers to previous and

next fragments of the record (i.e., a linked list)

Page 22: Representing Data Elements

22

Record Modification

Modifications to records: insert delete update

issues even with fixed-length records and fields

even more involved with variable-length data

Page 23: Representing Data Elements

23

Inserting New Records

If records need not be any particular order, then just find a block with enough empty space

Later we'll see how to keep track of all the tuples of a given relation

But what if blocks should be kept in a certain order, such as sorted on primary key?

Page 24: Representing Data Elements

24

Insertion in Order

If there is space in the block, then add the record(going right to left), add a pointer to it (going leftto right) and rearrange the pointers as needed.

record4

record3

record2

record1

unused

header

Page 25: Representing Data Elements

25

What if Block is Full?

Records are stored in several blocks, in sorted order

One approach: keep a linked list of "overflow" blocks for each block in the main sequence

Another approach is described in the book

Page 26: Representing Data Elements

26

Deleting Records

Try to reclaim space made available after a record is deleted

If using an offset table, then rearrange the records to fill in any hole that is left behind and adjust the pointers

Additional mechanisms are based on keeping a linked list of available space and compacting when possible

Page 27: Representing Data Elements

27

Tombstones

What about pointers to deleted records?

We place a tombstone in place of each deleted record

Tombstone is permanent Issue of where to place the tombstone Keep a tombstone bit in each record

header: if this is a tombstone, then no need to store additional data

Page 28: Representing Data Elements

28

Updating Records

For fixed-length records, there is no effect on the storage system

For variable-length records: if length increases, like insertion if length decreases, like deletion

except tombstones are not necessary