1 SECTIONS 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT

Post on 21-Dec-2015


SECTIONS 13.1 – 13.3
Sanuja Dabade & Eilbroun Benjamin
CS 257 – Dr. TY Lin

SECONDARY STORAGE MANAGEMENT

Presentation Outline

13.1 The Memory Hierarchy
    13.1.1 The Memory Hierarchy
    13.1.2 Transfer of Data Between Levels
    13.1.3 Volatile and Nonvolatile Storage
    13.1.4 Virtual Memory

13.2 Disks
    13.2.1 Mechanics of Disks
    13.2.2 The Disk Controller
    13.2.3 Disk Access Characteristics

Presentation Outline (con’t)

13.3 Accelerating Access to Secondary Storage
    13.3.1 The I/O Model of Computation
    13.3.2 Organizing Data by Cylinders
    13.3.3 Using Multiple Disks
    13.3.4 Mirroring Disks
    13.3.5 Disk Scheduling and the Elevator Algorithm
    13.3.6 Prefetching and Large-Scale Buffering

13.1.1 Memory Hierarchy

• Several components are available for data storage, each with a different capacity
• Cost per byte to store data also varies
• Devices with the smallest capacity offer the fastest speed at the highest cost per bit

Memory Hierarchy Diagram

[Figure: the memory hierarchy. Labels: Programs, DBMS; Main-Memory DBMS’s; Cache; Main Memory; Disk (as virtual memory; file system); Tertiary Storage.]

13.1.1 Memory Hierarchy

Cache
• Lowest level of the hierarchy
• Data items are copies of certain locations of main memory
• Sometimes values in cache are changed and the corresponding changes to main memory are delayed
• The machine looks for instructions, as well as the data for those instructions, in the cache
• Holds a limited amount of data

13.1.1 Memory Hierarchy (con’t)

• In a single-processor computer, there is no need to update the data in main memory immediately
• With multiple processors, data is updated in main memory immediately; this is called write-through

Main Memory

• Everything that happens in the computer (instruction execution, data manipulation) works on information that is resident in main memory
• Main memories are random access: any byte can be obtained in the same amount of time


Secondary storage

Used to store data and programs when they are not being processed

More permanent than main memory, as data and programs are retained when the power is turned off

E.g. magnetic disks, hard disks

Tertiary Storage

• Holds data volumes measured in terabytes
• Used for databases much larger than what can be stored on disk

13.1.2 Transfer of Data Between Levels

• Data moves between adjacent levels of the hierarchy
• At the secondary or tertiary levels, accessing the desired data or finding the desired place to store the data takes a lot of time
• Disk is organized into blocks
• Entire blocks are moved to and from a region of main memory called a buffer

13.1.2 Transfer of Data Between Levels (cont’d)

• A key technique for speeding up database operations is to arrange the data so that when one piece of data on a block is needed, it is likely that other data on the same block will be needed at the same time
• The same idea applies to other hierarchy levels

13.1.3 Volatile and Nonvolatile Storage

• A volatile device forgets the data stored on it after power-off
• A nonvolatile device holds its data for long periods, even when the device is turned off
• All secondary and tertiary devices are nonvolatile; main memory is volatile

13.1.4 Virtual Memory

• Typical software executes in virtual memory
• The address space is typically 32 bits, i.e. 2^32 bytes or 4 GB
• Transfer between memory and disk is in terms of blocks

13.2.1 Mechanics of Disks

• Use of secondary storage is one of the important characteristics of a DBMS
• A disk consists of 2 moving pieces:
  1. disk assembly
  2. head assembly
• The disk assembly consists of 1 or more platters
• Platters rotate around a central spindle
• Bits are stored on the upper and lower surfaces of the platters

13.2.1 Mechanics of Disks (con’t)

• Each disk surface is organized into tracks
• The tracks that are at a fixed radius from the center form one cylinder
• Tracks are organized into sectors
• Sectors are segments of the circle separated by gaps

13.2.2 Disk Controller

• One or more disks are controlled by a disk controller
• Disk controllers are capable of:
  – Controlling the mechanical actuator that moves the head assembly
  – Selecting the sector from among all those in the cylinder at which the heads are positioned
  – Transferring bits between the desired sector and main memory
  – Possibly buffering an entire track

13.2.3 Disk Access Characteristics

Accessing (reading/writing) a block requires 3 steps:
1. The disk controller positions the head assembly at the cylinder containing the track on which the block is located. The time this takes is the ‘seek time’.
2. The disk controller waits while the first sector of the block moves under the head. This wait is the ‘rotational latency’.
3. All the sectors and the gaps between them pass under the head while the disk controller reads or writes data in these sectors. This is the ‘transfer time’.
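The three components add up to the time for one block access. A minimal sketch, assuming the Megatron 747 figures used later in these slides (6.46 ms average seek, 8.33 ms per rotation); the 0.13 ms transfer time is an assumption chosen so that the total lands near the 10.76 ms average quoted in Ex 13.4, and the function name is hypothetical:

```python
# Assumed figures: average seek and rotation time come from the later examples;
# TRANSFER_MS is an assumption, not a value stated on these slides.
AVG_SEEK_MS = 6.46
ROTATION_MS = 8.33
TRANSFER_MS = 0.13


def block_access_ms(seek_ms=AVG_SEEK_MS, rotation_ms=ROTATION_MS,
                    transfer_ms=TRANSFER_MS):
    """Seek time + average rotational latency + transfer time, in ms."""
    rotational_latency_ms = rotation_ms / 2  # on average, half a rotation
    return seek_ms + rotational_latency_ms + transfer_ms


print(block_access_ms())
```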

13.3 Accelerating Access to Secondary Storage

Several approaches for more efficiently accessing data in secondary storage:
• Place blocks that are accessed together in the same cylinder.
• Divide the data among multiple disks.
• Mirror disks.
• Use disk-scheduling algorithms.
• Prefetch blocks into main memory.

Scheduling latency – added delay in accessing data caused by a disk-scheduling algorithm.

Throughput – the number of disk accesses per second that the system can accommodate.

13.3.1 The I/O Model of Computation

• The number of block accesses (disk I/O’s) is a good approximation of the running time of an algorithm, and it should be minimized.
• Ex 13.3: You want to have an index on R to identify the block on which the desired tuple appears, but not where on the block it resides.
  – For the Megatron 747 (M747) example, it takes 11 ms to read a 16K block.
  – A standard microprocessor can execute millions of instructions in 11 ms, making any delay in searching the block for the desired tuple negligible.

13.3.2 Organizing Data by Cylinders

• If we read all blocks on a single track or cylinder consecutively, then we can neglect all but the first seek time and the first rotational latency.
• Ex 13.4: We request 1024 blocks of the M747.
  – If data is randomly distributed, the average latency is 10.76 ms per block by Ex 13.2, making the total latency about 11 s.
  – If all blocks are stored consecutively on 1 cylinder:
    6.46 ms (1 average seek) + 8.33 ms (time per rotation) * 16 (# rotations) ≈ 139.7 ms
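The arithmetic of Ex 13.4 can be checked with a short sketch. The function names are hypothetical; the constants are the ones quoted on the slide:

```python
def random_read_ms(num_blocks, avg_access_ms=10.76):
    """Every block pays a full average access (seek + latency + transfer)."""
    return num_blocks * avg_access_ms


def cylinder_read_ms(num_rotations, avg_seek_ms=6.46, rotation_ms=8.33):
    """One average seek, then one full rotation per track that is read."""
    return avg_seek_ms + rotation_ms * num_rotations


print(random_read_ms(1024))   # roughly 11 seconds, expressed in ms
print(cylinder_read_ms(16))   # roughly 140 ms
```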

13.3.3 Using Multiple Disks

• If we have n disks, read/write performance can increase by up to a factor of n.
• Striping – distributing a relation across multiple disks following this pattern (Ri is the i-th block of relation R):
    Data on disk 1: R1, R(1+n), R(1+2n), …
    Data on disk 2: R2, R(2+n), R(2+2n), …
    …
    Data on disk n: Rn, R(n+n), R(n+2n), …
• Ex 13.5: We request 1024 blocks with n = 4.
    6.46 ms (1 average seek) + 8.33 ms (time per rotation) * (16/4) (# rotations) ≈ 39.8 ms
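A minimal sketch of the striping pattern and of the Ex 13.5 estimate. It assumes blocks and disks are both numbered from 1, as on the slide; the function names are hypothetical:

```python
def disk_for_block(block_number, n):
    """Disk (1..n) holding block R_i when blocks are striped round-robin."""
    return (block_number - 1) % n + 1


def striped_read_ms(num_rotations, n, avg_seek_ms=6.46, rotation_ms=8.33):
    """All n disks seek in parallel, then each reads 1/n of the tracks."""
    return avg_seek_ms + rotation_ms * (num_rotations / n)


# Disk 1 of 4 holds blocks 1, 5, 9, ...; disk 4 holds blocks 4, 8, 12, ...
print([b for b in range(1, 13) if disk_for_block(b, 4) == 1])
print(striped_read_ms(16, 4))
```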

13.3.4 Mirroring Disks

• Mirroring disks – having 2 or more disks hold identical copies of the data.
• Benefit 1: If n disks are mirrors of each other, the system can survive a crash of up to n-1 disks.
• Benefit 2: If we have n disks, read performance can increase by up to a factor of n.
• Performance increases further if the controller selects, for each read, the disk whose head is closest to the desired data block.

13.3.5 Disk Scheduling and the Elevator Algorithm

The disk controller runs this algorithm to select which of several requests to process first.

Pseudo code:

    requests[]   // array of all non-processed data requests

    upon receiving new data request:
        requests[].add(new request)

    while (requests[] is not empty)
        move head to next location
        if (head location is at data in requests[])
            retrieve data
            remove data from requests[]
        if (head reaches end)
            reverse head direction

13.3.5 Disk Scheduling and the Elevator Algorithm (con’t)

Events (the head starts at cylinder 8000):

    time    event
    0       Request data at 8000, 24000 and 56000
    4.3     Get data at 8000
    10      Request data at 16000
    13.6    Get data at 24000
    20      Request data at 64000
    26.9    Get data at 56000
    30      Request data at 40000
    34.2    Get data at 64000
    45.5    Get data at 40000
    56.8    Get data at 16000

13.3.5 Disk Scheduling and the Elevator Algorithm (con’t)

    Elevator Algorithm        FIFO Algorithm
    data      time            data      time
    8000      4.3             8000      4.3
    24000     13.6            24000     13.6
    56000     26.9            56000     26.9
    64000     34.2            16000     42.2
    40000     45.5            64000     59.5
    16000     56.8            40000     70.8
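The two completion-time tables can be reproduced with a small simulation. This is a sketch under assumptions that are not stated on the slides: a seek over d cylinders costs 1 + d/4000 ms (0 if the head does not move), rotational latency plus transfer adds 4.3 ms per request, and the elevator reverses as soon as no pending request lies ahead of the head. All names are hypothetical.

```python
LATENCY_PLUS_TRANSFER_MS = 4.3  # assumed: average rotational latency + transfer


def access_ms(distance):
    """Time to serve one request `distance` cylinders away (assumed seek model)."""
    seek = 0.0 if distance == 0 else 1.0 + distance / 4000.0
    return seek + LATENCY_PLUS_TRANSFER_MS


def simulate(requests, start_cylinder, policy):
    """requests: list of (arrival_ms, cylinder); policy: "elevator" or "fifo"."""
    future = sorted(requests)   # not yet arrived (ties ordered by cylinder)
    pending = []                # arrived but not yet served, in arrival order
    now, head, direction = 0.0, start_cylinder, +1
    served = []
    while future or pending:
        while future and future[0][0] <= now:
            pending.append(future.pop(0))
        if not pending:         # idle until the next request arrives
            now = future[0][0]
            continue
        if policy == "fifo":
            choice = pending[0]
        else:
            ahead = [r for r in pending if (r[1] - head) * direction >= 0]
            if not ahead:       # nothing left in this direction: reverse
                direction = -direction
                ahead = pending
            choice = min(ahead, key=lambda r: abs(r[1] - head))
        pending.remove(choice)
        now += access_ms(abs(choice[1] - head))
        head = choice[1]
        served.append((choice[1], round(now, 1)))
    return served


REQUESTS = [(0, 8000), (0, 24000), (0, 56000),
            (10, 16000), (20, 64000), (30, 40000)]
print(simulate(REQUESTS, 8000, "elevator"))
print(simulate(REQUESTS, 8000, "fifo"))
```

Under these assumptions, both policies serve the first three requests identically; they diverge once the request for cylinder 16000 is pending behind the head.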

13.3.6 Prefetching and Large-Scale Buffering

• If, at the application level, we can predict the order in which blocks will be requested, we can load them into main memory before they are needed.

13.4 Disk Failures

Ways in which disks fail and their mitigation

By Priya Gangaraju and Xiaqing He

Ways in which disks can fail:

• Intermittent failure
• Media decay
• Write failure
• Disk crash

Intermittent Failures

• A read or write operation on a sector fails on the first try but succeeds after repeated tries.
• The most common form of failure.
• Parity checks can be used to detect this kind of failure.

Media Decay

• A serious form of failure.
• One or more bits are permanently corrupted.
• It becomes impossible to read the sector correctly, even after many tries.
• The stable storage technique for organizing a disk is used to cope with this failure.

Write Failure

• An attempt to write a sector does not succeed.
• An attempt to retrieve the previously written sector is also unsuccessful.
• Possible reason: a power outage while the sector was being written.
• The stable storage technique can be used to cope with this.


Disk Crash

• Most serious form of disk failure.

• Entire disk becomes unreadable, suddenly and permanently.

• RAID techniques can be used for coping with disk crashes.

More on Intermittent Failures…

• A read fails intermittently when we try to read a sector but the correct content of that sector is not delivered to the disk controller.
• If the controller has a way to tell whether the sector is good or bad (checksums), it can reissue the read request when bad data is read.

More on Intermittent Failures…

• The controller can attempt to write a sector, but the contents of the sector are not what was intended.
• The only way to check this is to let the disk go around again and read the sector.
• One way to perform the check is to read the sector and compare it with the sector we intended to write.

Continued

• Instead of performing the complete comparison at the disk controller, a simpler way is to read the sector and see whether a good sector was read.
• If it is a good sector, then the write was correct; otherwise the write was unsuccessful and must be repeated.

Checksums

• A technique used to determine the good/bad status of a sector.
• Each sector has some additional bits, called the checksum, that are set depending on the values of the data bits in that sector.
• If the checksum is not correct on reading, then there was an error in reading.


More on Checksums…

There is a small chance that the block was not read correctly even if the checksum is proper.

The probability of correctness can be increased by using many checksum bits.

Checksum calculation…

• A simple checksum is based on the parity of all bits in the sector.
• If there is an odd number of 1’s among a collection of bits, the bits are said to have odd parity. A parity bit ‘1’ is added.
• If there is an even number of 1’s, the collection of bits is said to have even parity. A parity bit ‘0’ is added.

Continued

• The number of 1’s among a collection of bits and their parity bit is always even.
• During a write operation, the disk controller calculates the parity bit and appends it to the sequence of bits written in the sector.
• Every sector will have even parity.

Examples…

• The sequence of bits 01101000 has an odd number of 1’s, so the parity bit is 1. The sequence with the parity bit is 011010001.
• The sequence of bits 11101110 has an even number of 1’s, so it has even parity. With the parity bit 0, the sequence is 111011100.
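The parity rule above can be sketched in a few lines. The helper names are hypothetical, and bit sequences are represented as strings of ‘0’ and ‘1’ purely for readability:

```python
def with_parity(bits):
    """Append the parity bit that makes the total number of 1's even."""
    parity = bits.count("1") % 2
    return bits + str(parity)


def looks_good(sector_bits):
    """A sector (data bits + parity bit) is 'good' if it has even parity."""
    return sector_bits.count("1") % 2 == 0


print(with_parity("01101000"))
print(with_parity("11101110"))
print(looks_good("111010001"))  # one bit of 011010001 flipped
```

A single flipped bit always changes the parity, so `looks_good` reports it; two flipped bits cancel out and go unnoticed.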


Continued

Any one-bit error in reading or writing the bits results in a sequence of bits that has odd-parity.

The disk controller can count the number of 1’s and can determine if the sector has odd parity in the presence of an error.

Odds…

• More than one bit can be corrupted, in which case the error can go unnoticed.
• Increasing the number of parity bits increases the chance of detecting errors.
• In general, if there are n independent bits as checksum, the chance of missing an error is one in 2^n.


Stable Storage

Checksums can detect the error but cannot correct it.

Sometimes we overwrite the previous contents of a sector and yet cannot read the new contents correctly.

To deal with these problems, Stable Storage policy can be implemented on the disks.

Continued

• Sectors are paired, and each pair represents one sector-contents X.
• The left copy of the sector is denoted XL and the right copy XR.

Assumptions

• We assume that copies are written with a sufficient number of parity bits to make it very unlikely that a bad sector looks good when the parity checks are considered.
• Also, if the read function returns a good value w for either XL or XR, then it is assumed that w is the true value of X.

Stable-Storage Writing Policy:

1. Write the value of X into XL. Check that the value has status “good”; i.e., the parity-check bits are correct in the written copy. If not, repeat the write. If, after a set number of write attempts, we have not successfully written X in XL, assume that there is a media failure in this sector. A fix-up such as substituting a spare sector for XL must be adopted.

2. Repeat (1) for XR.

Stable-Storage Reading Policy:

• The policy is to alternate trying to read XL and XR until a good value is returned.
• If a good value is not returned after a prechosen number of tries, then it is assumed that X is truly unreadable.
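The writing and reading policies can be sketched against a toy in-memory model. Everything here is hypothetical and for illustration only: the `Sector` class, its `failed` flag (standing in for a media failure), the single parity bit used as checksum, and the retry limits.

```python
class Sector:
    """Hypothetical in-memory sector: data bits plus one parity bit."""

    def __init__(self):
        self.bits = None
        self.parity = None
        self.failed = False          # simulates a media failure

    def write(self, bits):
        if not self.failed:          # a failed sector silently ignores writes
            self.bits = bits
            self.parity = bits.count("1") % 2

    def read(self):
        """Return the bits if the sector has status good, else None."""
        if self.failed or self.bits is None:
            return None
        ok = self.bits.count("1") % 2 == self.parity
        return self.bits if ok else None


def write_copy(sector, value, max_tries=3):
    """Step 1 of the writing policy for one copy; returns the sector in use."""
    for _ in range(max_tries):
        sector.write(value)
        if sector.read() is not None:    # status good: parity check passes
            return sector
    spare = Sector()                     # assume media failure: use a spare
    spare.write(value)
    return spare


def stable_write(pair, value):
    pair[0] = write_copy(pair[0], value)     # XL first
    pair[1] = write_copy(pair[1], value)     # then XR


def stable_read(pair, max_tries=4):
    for attempt in range(max_tries):         # alternate XL, XR, XL, ...
        value = pair[attempt % 2].read()
        if value is not None:
            return value
    return None                              # X is assumed unreadable


pair = [Sector(), Sector()]
stable_write(pair, "0110")
pair[0].failed = True                        # XL decays after the write
print(stable_read(pair))                     # still served from XR
```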

Error-Handling Capabilities:

Media failures:
• If, after storing X in sectors XL and XR, one of them undergoes a media failure and becomes permanently unreadable, we can read from the other one.
• If both sectors have failed, then X cannot be read.
• The probability of both failing is extremely small.

Write failure:
• If there is a system failure (such as a power outage) while writing X, the X in main memory is lost and the copy of X being written will be erroneous.
• Half of the sector may be written with part of the new value of X, while the other half remains as it was.

The possible cases when the system becomes available:

1. The failure occurred while writing to XL. Then XL is considered bad. Since XR was never changed, its status is good. We can copy XR into XL, which restores the old value of X.

2. The failure occurred after XL was written. Then XL has status good and holds the new value, while XR either still holds the old value of X or has status bad. We can copy the new value of X from XL to XR.

Recovery from Disk Crashes

• To reduce the data loss caused by disk crashes, schemes that involve redundancy, extending the idea of parity checks or duplicate sectors, can be applied.
• The term used for these strategies is RAID, or Redundant Arrays of Independent Disks.
• In general, if the mean time to failure of disks is n years, then in any given year 1/nth of the surviving disks fail.

• Each of the RAID schemes has data disks and redundant disks.
• Data disks are one or more disks that hold the data.
• Redundant disks are one or more disks that hold information that is completely determined by the contents of the data disks.
• When any one of the disks crashes, the other disks can be used to restore the failed disk and avoid a permanent loss of information.

Content

1) Focus: how to recover from disk crashes. The common term is RAID, “Redundant Arrays of Independent Disks”.
2) Several schemes to recover from disk crashes:
   – Mirroring: RAID level 1
   – Parity checks: RAID 4
   – Improvement: RAID 5
   – RAID 6

1) Mirroring

The simplest scheme to recover from disk crashes.
How does mirroring work?
-- make two or more copies of the data on different disks
Benefits:
-- the data is preserved if one disk fails;
-- reads can be divided among the disks, giving access to several blocks at once

1) Mirroring (con’t)

When can data be lost with mirroring?
-- the only way data can be lost is if there is a second (mirror/redundant) disk crash while the first (data) disk crash is being repaired.

Probability. Suppose:
-- one disk: mean time to failure = 10 years;
-- with two disks, one of them fails on average once every 5 years;
-- replacing the failed disk takes 3 hours = 1/2920 year.
So:
-- the probability that the mirror disk fails during the repair = 1/10 * 1/2,920 = 1/29,200;
-- the probability of data loss with mirroring, per year = 1/5 * 1/29,200 = 1/146,000
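The same estimate can be reproduced exactly with rational arithmetic. A minimal sketch; the function name and the 8760-hour year are assumptions made for illustration:

```python
from fractions import Fraction


def mirror_data_loss_per_year(mttf_years, repair_hours, hours_per_year=8760):
    """Chance per year that both mirrored disks are down at the same time."""
    repair_fraction = Fraction(repair_hours, hours_per_year)  # 3 h = 1/2920 yr
    first_failure_rate = Fraction(2, mttf_years)   # one of the two disks fails
    second_during_repair = Fraction(1, mttf_years) * repair_fraction
    return first_failure_rate * second_during_repair


print(mirror_data_loss_per_year(10, 3))  # 1/146000
```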

2) Parity Blocks

Why change?
-- disadvantage of mirroring: it uses so many redundant disks
What’s new?
-- RAID level 4 uses only one redundant disk
How does this one redundant disk work?
-- modulo-2 sum;
-- the jth bit of the redundant disk is the modulo-2 sum of the jth bits of all the data disks.

Example:

2) Parity Blocks (con’t): example

Data disks:
    Disk1: 11110000
    Disk2: 10101010
    Disk3: 00111000

Redundant disk:
    Disk4: 01100010
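The modulo-2 sum is a bitwise XOR, so the redundant block in this example can be recomputed with a short sketch (the helper name is hypothetical; blocks are bit strings for readability). Applying the same function to all the surviving blocks also rebuilds a lost one:

```python
def parity_block(blocks):
    """Bitwise modulo-2 sum (XOR) of equal-length bit strings."""
    result = 0
    for block in blocks:
        result ^= int(block, 2)
    return format(result, "0{}b".format(len(blocks[0])))


data = ["11110000", "10101010", "00111000"]
redundant = parity_block(data)
print(redundant)

# If Disk2 is lost, XOR the remaining data disks with the redundant disk.
print(parity_block([data[0], data[2], redundant]))
```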

2) RAID 4 (con’t)

Reading
-- the same as reading blocks from any disk;
Writing
1) change the data disk;
2) change the corresponding block of the redundant disk;
Why?
-- the redundant disk holds the parity checks for the corresponding blocks of all the data disks

2) RAID 4 (con’t): writing

For a total of N data disks:
1) Naïve way:
-- read the N data disks and compute the modulo-2 sum of the corresponding blocks;
-- rewrite the redundant disk according to the modulo-2 sum of the data disks;
2) Better way:
-- take the modulo-2 sum of the old and new versions of the data block that was rewritten;
-- change the bits of the redundant disk in the positions where that modulo-2 sum has 1’s;

2) RAID 4 (con’t): writing, example

Data disks:
    Disk1: 11110000
    Disk2: 10101010 → 01100110
    Disk3: 00111000

To do: the modulo-2 sum of the old and new versions of disk 2 is 11001100, so we need to change positions 1, 2, 5 and 6 of the redundant disk.

Redundant disk:
    Disk4: 01100010 → 10101110
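The “better way” of updating the redundant disk can be sketched as two XORs. The helper names are hypothetical; the values are the ones from this example:

```python
def xor_bits(a, b):
    """Bitwise modulo-2 sum of two equal-length bit strings."""
    return format(int(a, 2) ^ int(b, 2), "0{}b".format(len(a)))


def update_parity(old_parity, old_data, new_data):
    """Flip exactly those parity bits where the data block changed."""
    changed = xor_bits(old_data, new_data)
    return xor_bits(old_parity, changed)


print(xor_bits("10101010", "01100110"))                    # changed positions
print(update_parity("01100010", "10101010", "01100110"))   # new redundant block
```

Only the rewritten data block and the redundant block are touched; the other data disks are never read.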

2) RAID 4 (con’t): failure recovery

Redundant disk crash:
-- swap in a new disk and recompute its contents from all the data disks;
One of the data disks crashes:
-- swap in a new disk;
-- recompute its contents from the other disks, including the data disks and the redundant disk;
How to recompute? (The rule is the same in both cases, which is what makes the next improvement possible.)
-- take the modulo-2 sum of all the corresponding bits of all the other disks

3) An Improvement: RAID 5

Why is an improvement needed?
-- Shortcoming of RAID level 4: it suffers from a bottleneck (every update of a data disk must also read and write the redundant disk);
Principle of RAID level 5 (RAID 5):
-- treat each disk as the redundant disk for some of the blocks;
Why is it feasible?
-- The rule of failure recovery is the same for the redundant disk and the data disks: “take the modulo-2 sum of all the corresponding bits of all the other disks”.
-- So there is no need to treat one disk as the redundant disk and the others as data disks.

3) RAID 5 (con’t)

How do we determine for which blocks each disk acts as the redundant disk?
-- if there are n+1 disks, numbered 0 to n, then we can treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1;

Example:

3) RAID 5 (con’t): example

n = 3 (four disks):
    The first disk, numbered 0, is redundant for cylinders 4, 8, 12, …;
    The second disk, numbered 1, is redundant for cylinders 1, 5, 9, …;
    The third disk, numbered 2, is redundant for cylinders 2, 6, 10, …;
    …

Suppose all 4 disks are equally likely to be written. For any one of the 4 disks, the probability of being involved in a write is:
    1/4 + 3/4 * 1/3 = 1/2
With m disks in total: 1/m + (m-1)/m * 1/(m-1) = 2/m
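A minimal sketch of the RAID 5 assignment rule and of the write-probability formula. The function names are hypothetical, and `num_disks` is the total number of disks (n+1 on the previous slide):

```python
from fractions import Fraction


def redundant_disk(cylinder, num_disks):
    """Disk (numbered from 0) that holds the parity for this cylinder."""
    return cylinder % num_disks


def write_probability(num_disks):
    """Chance a given disk is touched by a write to a random data block."""
    m = num_disks
    as_data_disk = Fraction(1, m)                          # it holds the block
    as_parity_disk = Fraction(m - 1, m) * Fraction(1, m - 1)
    return as_data_disk + as_parity_disk


print([c for c in range(1, 13) if redundant_disk(c, 4) == 0])
print(write_probability(4))
```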

4) Coping with multiple disk crashes

• RAID 6 – can deal with any number of disk crashes if enough redundant disks are used
• Example: a system of seven disks (four data disks, numbered 1-4, and three redundant disks, numbered 5-7)
• How to set up the 3*7 matrix? (Why 3 rows? Because there are 3 redundant disks.)
  1) every column is a combination of three 0’s and 1’s, other than all three 0’s;
  2) the column of each redundant disk has a single 1;
  3) the column of each data disk has at least two 1’s;

4) Coping with multiple disk crashes (con’t)

Reading:
-- read from the data disks and ignore the redundant disks
Writing:
-- change the data disk
-- change the corresponding bits of all the redundant disks

4) Coping with multiple disk crashes (con’t)

In a system that has 4 data disks and 3 redundant disks, how can up to 2 disk crashes be corrected?

Suppose disks a and b failed:
• Find some row r (in the 3*7 matrix) in which the columns for a and b are different (suppose a has 0 and b has 1);
• Compute the correct b by taking the modulo-2 sum of the corresponding bits from all the disks other than b that have 1 in row r;
• After getting the correct b, compute the correct a with all the other disks available;

Example:

4) Coping with multiple disk crashes (con’t): example

3*7 matrix

                   data disks    redundant disks
    disk number    1  2  3  4    5  6  7
    row 1          1  1  1  0    1  0  0
    row 2          1  1  0  1    0  1  0
    row 3          1  0  1  1    0  0  1

4) Coping with multiple disk crashes (con’t): example

First block of all the disks:

    disk   contents
    1)     11110000
    2)     10101010
    3)     00111000
    4)     01000001
    5)     01100010
    6)     00011011
    7)     10001001

4) Coping with multiple disk crashes (con’t): example

Two disks crash:

    disk   contents
    1)     11110000
    2)     ????????
    3)     00111000
    4)     01000001
    5)     ????????
    6)     00011011
    7)     10001001

4) Coping with multiple disk crashes (con’t): example

In the 3*7 matrix, row 2 has different values for disks 2 and 5: disk 2’s value is 1 and disk 5’s value is 0.

So: compute the first block of disk 2 as the modulo-2 sum of the corresponding bits of disks 1, 4 and 6;

then compute the first block of disk 5 as the modulo-2 sum of the corresponding bits of disks 1, 2 and 3.

    1) 11110000
    2) ???????? => 10101010
    3) 00111000
    4) 01000001
    5) ???????? => 01100010
    6) 00011011
    7) 10001001
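The two-disk recovery procedure can be sketched with the 3*7 matrix and the block contents of this example. The function names and the dictionary representation (disk number mapped to a bit string, or `None` for a crashed disk) are assumptions made for illustration:

```python
MATRIX = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]


def xor_all(blocks):
    """Bitwise modulo-2 sum of equal-length bit strings."""
    result = 0
    for block in blocks:
        result ^= int(block, 2)
    return format(result, "0{}b".format(len(blocks[0])))


def rebuild(disks, target, row):
    """Recompute `target` from the other disks that have a 1 in `row`."""
    members = [d for d in range(1, 8) if row[d - 1] == 1 and d != target]
    return xor_all([disks[d] for d in members])


def recover_two(disks, a, b):
    """Restore crashed disks a and b (1-based numbers) in place."""
    # Find a row where the columns differ; the disk with the 1 goes first.
    row = next(r for r in MATRIX if r[a - 1] != r[b - 1])
    first, second = (a, b) if row[a - 1] == 1 else (b, a)
    disks[first] = rebuild(disks, first, row)
    # Every other disk is now available, so any row containing `second` works.
    row = next(r for r in MATRIX if r[second - 1] == 1)
    disks[second] = rebuild(disks, second, row)
    return disks


disks = {1: "11110000", 2: None, 3: "00111000", 4: "01000001",
         5: None, 6: "00011011", 7: "10001001"}
recover_two(disks, 2, 5)
print(disks[2], disks[5])
```

A differing row always exists because no two columns of the matrix are identical.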

Summary

Disk failures and their mitigation:
1. Intermittent failure – checksums
2. Media decay – stable storage technique
3. Write failure – stable storage technique
4. Disk crashes – RAID techniques

• “How to recover from disk crashes”: by RAID

Material taken from

Disk Failures (Chapter 13.4.1 to 13.4.9), Database Systems – The Complete Book, Second Edition.

13.5 Arranging data on disk

Meghna Jain, ID-205, CS257

Prof: Dr. T.Y. Lin

• Data elements are represented as records, which are stored in consecutive bytes in the same disk block.

Basic layout techniques for storing data:

Fixed-Length Records

• Allocation criterion: data should start at a word boundary.

Fixed-length record header:
1. A pointer to the record schema.
2. The length of the record.
3. Timestamps to indicate when the record was last modified or last read.


Example

CREATE TABLE employee(
    name CHAR(30) PRIMARY KEY,
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE
);

The record should start at a word boundary and contains a header followed by the four fields name, address, gender and birthdate.
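The effect of the word-boundary rule on this record can be sketched as follows. The field sizes are assumptions for illustration (30 bytes for name, 256 for address, 1 for gender, 10 for birthdate), as are the 12-byte header and the 4-byte word; the function name `layout` is hypothetical.

```python
def layout(field_sizes, header=0, align=1):
    """Return ({field: offset}, total_length) for fields laid out in order.

    Each field (and the header) is padded up to a multiple of `align` bytes.
    """
    def round_up(n):
        return -(-n // align) * align

    offsets = {}
    pos = round_up(header)
    for name, size in field_sizes:
        offsets[name] = pos
        pos += round_up(size)
    return offsets, pos

# Assumed byte sizes for the employee record.
FIELDS = [("name", 30), ("address", 256), ("gender", 1), ("birthdate", 10)]

packed_offsets, packed_len = layout(FIELDS)                         # no header, no padding
aligned_offsets, aligned_len = layout(FIELDS, header=12, align=4)   # assumed header and word size

print(packed_offsets, packed_len)
print(aligned_offsets, aligned_len)
```

Without padding the fields start at offsets 0, 30, 286 and 287 and the record is 297 bytes long; with the assumed header and 4-byte alignment every field starts on a word boundary and the record grows to 316 bytes.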


Packing Fixed-Length Records into Blocks:

Records are stored in blocks on the disk, and a block is moved into main memory when we need to access or update its records.

A block header is written first, and it is followed by a series of records.


The block header contains the following information:

• Links to one or more blocks that are part of a network of blocks.
• Information about the role played by this block in such a network.
• Information about the relation to which the tuples in this block belong.
• A "directory" giving the offset of each record in the block.
• Timestamp(s) indicating the time of the block's last modification and/or access.


Example

Along with the header, we pack as many records as will fit into one block, as shown in the figure; the remaining space is unused.
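The packing arithmetic is a single integer division. The numbers below (a 4096-byte block, a 12-byte block header, 316-byte records) are assumptions chosen for illustration, not values given on the slide.

```python
def records_per_block(block_size: int, block_header: int, record_size: int):
    """Return (number of whole records that fit, unused bytes left over)."""
    usable = block_size - block_header
    count = usable // record_size
    return count, usable - count * record_size

count, unused = records_per_block(block_size=4096, block_header=12, record_size=316)
print(count, unused)  # 12 292
```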


13.6 REPRESENTING BLOCK AND RECORD ADDRESSES

Ramya Karri, CS257 Section 2, ID: 206


INTRODUCTION

Address of a block and of a record

In main memory:
• The address of a block is the virtual memory address of its first byte.
• The address of a record within the block is the virtual memory address of the first byte of the record.

In secondary memory:
• A sequence of bytes describes the location of the block in the overall system: the device ID for the disk, the cylinder number, etc.


ADDRESSES IN CLIENT-SERVER SYSTEMS

The addresses in the address space are represented in two ways:
• Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found.
• Logical addresses: arbitrary strings of bytes of some fixed length.

Physical address bits are used to indicate:
• Host to which the storage is attached
• Identifier for the disk
• Number of the cylinder
• Number of the track
• Offset of the beginning of the record


ADDRESSES IN CLIENT-SERVER SYSTEMS (CONTD..)

A map table relates logical addresses to physical addresses.

[Figure: map table with two columns, logical address and physical address]
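A map table can be pictured as a dictionary from logical to physical addresses. This is only a sketch: the `PhysicalAddress` fields follow the list of physical address components given earlier, and the host name, key bytes and helper names are invented for the example.

```python
from typing import NamedTuple

class PhysicalAddress(NamedTuple):
    host: str
    disk: int
    cylinder: int
    track: int
    offset: int

# Map table: fixed-length logical address -> physical address.
map_table = {}

def resolve(logical: bytes) -> PhysicalAddress:
    """Translate a logical address through the map table."""
    return map_table[logical]

def move_record(logical: bytes, new_location: PhysicalAddress) -> None:
    """Moving a record only changes its map-table entry; pointers keep the logical address."""
    map_table[logical] = new_location

map_table[b"\x00\x2a"] = PhysicalAddress("hostA", 1, 17, 3, 512)
before = resolve(b"\x00\x2a")
move_record(b"\x00\x2a", before._replace(track=4, offset=0))
after = resolve(b"\x00\x2a")
```

Every access pays one extra lookup, but a record can be moved without touching any pointer that refers to it.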


LOGICAL AND STRUCTURED ADDRESSES

Purpose of logical addresses? They give more flexibility when we
• move the record around within the block
• move the record to another block

They also give us the option of deciding what to do when a record is deleted.

[Figure: a block with a header, an offset table, unused space, and records 1 to 4]


POINTER SWIZZLING

Pointers are common in object-relational database systems.

It is important to understand how pointers are managed.

Every data item (block, record, etc.) has two addresses:
• database address: its address on disk
• memory address: its address in virtual memory, if the item is currently there


POINTER SWIZZLING (CONTD…)

Translation Table: Maps database address to memory address

All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table

[Figure: translation table with two columns, database address (Dbaddr) and memory address (Mem-addr)]


POINTER SWIZZLING (CONTD…)

A pointer consists of the following two fields:
• A bit indicating the type of address
• The database or memory address

Example 13.17

[Figure: Block 1 and Block 2 on disk; a copy of Block 1 in memory with one swizzled and one unswizzled pointer]


EXAMPLE 13.17

Block 1 has a record with pointers to a second record on the same block and to a record on another block, Block 2.

If Block 1 is copied to memory:
• The first pointer, which points within Block 1, can be swizzled so that it points directly to the memory address of the target record.
• Since Block 2 is not in memory, we cannot swizzle the second pointer.
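The example can be mimicked with a small sketch. The `Pointer` class, the `swizzle` helper and all numeric addresses are hypothetical; the translation table is modelled as a plain dictionary from database address to memory address.

```python
from dataclasses import dataclass

@dataclass
class Pointer:
    swizzled: bool   # the one-bit flag: True means `address` is a memory address
    address: int     # database address or memory address

def swizzle(ptr: Pointer, translation_table: dict) -> Pointer:
    """Swizzle the pointer if its target is in memory; otherwise leave it alone."""
    if not ptr.swizzled and ptr.address in translation_table:
        ptr.address = translation_table[ptr.address]
        ptr.swizzled = True
    return ptr

# Block 1 has been copied to memory: two of its records (made-up addresses).
translation_table = {1000: 50000, 1040: 50040}

same_block = Pointer(swizzled=False, address=1040)   # target is inside Block 1
other_block = Pointer(swizzled=False, address=2000)  # target is in Block 2, not in memory

swizzle(same_block, translation_table)
swizzle(other_block, translation_table)
```

After the two calls `same_block` holds memory address 50040, while `other_block` still holds database address 2000.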


POINTER SWIZZLING (CONTD…)

Three types of swizzling:
• Automatic swizzling: as soon as a block is brought into memory, swizzle all relevant pointers.
• Swizzling on demand: only swizzle a pointer if and when it is actually followed.
• No swizzling: pointers are never swizzled; they are accessed using the database address.


PROGRAMMER CONTROL OF SWIZZLING

Unswizzling
• When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses.
• Use the translation table again.
• It is important to have an efficient data structure for the translation table.


PINNED RECORDS AND BLOCKS

A block in memory is said to be pinned if it cannot be written back to disk safely.

If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned.

To unpin a block, we must unswizzle any pointers to it:
• Keep in the translation table the places in memory holding swizzled pointers to that item.
• Unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses).
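One way to picture the bookkeeping for pinning is a translation-table entry that remembers which pointers were swizzled to it. The classes and helper names below are hypothetical, and the addresses are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Pointer:
    swizzled: bool
    address: int

@dataclass
class Entry:
    """Translation-table entry for one item that is currently in memory."""
    db_addr: int
    mem_addr: int
    referrers: list = field(default_factory=list)  # swizzled pointers to this item

def swizzle(ptr: Pointer, entry: Entry) -> None:
    ptr.swizzled, ptr.address = True, entry.mem_addr
    entry.referrers.append(ptr)       # remember where the swizzled pointer lives

def is_pinned(entry: Entry) -> bool:
    return bool(entry.referrers)      # some swizzled pointer still refers to the item

def unpin(entry: Entry) -> None:
    for ptr in entry.referrers:       # unswizzle every pointer to the item
        ptr.swizzled, ptr.address = False, entry.db_addr
    entry.referrers.clear()

item_in_b2 = Entry(db_addr=2000, mem_addr=70000)
ptr_in_b1 = Pointer(swizzled=False, address=2000)
swizzle(ptr_in_b1, item_in_b2)
pinned_before = is_pinned(item_in_b2)
unpin(item_in_b2)
pinned_after = is_pinned(item_in_b2)
```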


Eswara Satya Pavan Rajesh Pinapala, CS 257, ID: 221


Topics
• Records with Variable Length Fields
• Records with Repeating Fields
• Variable Format Records
• Records that do not fit in a block
• BLOBS


Example

Fig 1: Movie star record with four fields: name at byte offset 0, address at 30, gender at 286, birth date at 287; the record ends at byte 297.


Records with Variable Length Fields

An effective way to represent variable-length records is as follows:

Fixed-length fields are kept ahead of the variable-length fields.

The record header contains
• the length of the record
• pointers to the beginning of all variable-length fields except the first one.


Records with Variable Length Fields

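Figure 2: A Movie Star record with name and address implemented as variable-length character strings. The header information (including the record length and a pointer to address) comes first, followed by the fixed-length fields birth date and gender, and then name and address.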


Records with Repeating Fields

A record contains a variable number of occurrences of a field F.

All occurrences of field F are grouped together, and the record header contains a pointer to the first occurrence of field F.

L bytes are devoted to one instance of field F.

Locating an occurrence of field F within the record:
• Add to the offset of field F the integer multiples of L: 0, L, 2L, 3L and so on.
• Stop upon reaching the offset of the field that follows F.
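The offset walk can be written as a one-line range. This sketch assumes the walk ends at the offset of whatever follows the group of F occurrences; the function name and the sample numbers are illustrative only.

```python
def occurrence_offsets(first_offset: int, end_offset: int, L: int) -> list:
    """Offsets of every occurrence of F: first_offset + 0, L, 2L, ... below end_offset."""
    return list(range(first_offset, end_offset, L))

# Example: F starts at byte 100, each occurrence takes L = 8 bytes,
# and the group of occurrences ends at byte 124.
print(occurrence_offsets(100, 124, 8))  # [100, 108, 116]
```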


Records with Repeating Fields

Figure 3: A record with a repeating group of references to movies. The header holds other header information, the record length, a pointer to address and a pointer to the movie pointers; it is followed by name, address and the pointers to movies.


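Records with Repeating Fields

Figure 4: Storing variable-length fields separately from the record. The record header information is followed by a pointer to name with the length of name, a pointer to address with the length of address, and a pointer to the movie references with the number of references; name and address themselves are stored outside the record.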


Records with Repeating Fields

Advantage

Keeping the record itself fixed-length allows records to be searched more efficiently, minimizes the overhead in block headers, and allows records to be moved within or among blocks with minimum effort.

Disadvantage

Storing variable-length components on another block increases the number of disk I/O’s needed to examine all components of a record.


Records with Repeating Fields

A compromise strategy is to allocate a fixed portion of the record for the repeating fields.

• If there are fewer repeating fields than the allocated space can hold, some of the space is unused.
• If there are more repeating fields than the allocated space can hold, the extra fields are stored in a different location, and a pointer to that location and a count of the additional occurrences are stored in the record.


Variable Format Records

Records that do not have a fixed schema.

Variable-format records are represented by a sequence of tagged fields.

Each tagged field consists of:
• Attribute or field name
• Type of the field
• Length of the field
• Value of the field

Why use tagged fields?
• Information-integration applications
• Records with a very flexible schema


Variable Format Records

Fig 5: A record with tagged fields. First field: N (code for name), S (code for string type), 14 (length), Clint Eastwood. Second field: R (code for restaurant owned), S (code for string type), 16 (length), Hog’s Breath Inn.
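A tagged-field record like the one in the figure can be sketched as a byte string. The concrete encoding here (one byte each for the name code, the type code and the length, followed by ASCII text) is an assumption made for this illustration, not a format given in the slides.

```python
def encode(fields) -> bytes:
    """Encode (name_code, type_code, value) triples as tagged fields."""
    out = bytearray()
    for name_code, type_code, value in fields:
        data = value.encode("ascii")
        out += name_code.encode("ascii") + type_code.encode("ascii")
        out.append(len(data))          # one-byte length, so values up to 255 bytes
        out += data
    return bytes(out)

def decode(record: bytes):
    """Walk the record tag by tag; no schema is needed to find the fields."""
    fields, pos = [], 0
    while pos < len(record):
        name_code = chr(record[pos])
        type_code = chr(record[pos + 1])
        length = record[pos + 2]
        value = record[pos + 3:pos + 3 + length].decode("ascii")
        fields.append((name_code, type_code, value))
        pos += 3 + length
    return fields

star = [("N", "S", "Clint Eastwood"), ("R", "S", "Hog's Breath Inn")]
record = encode(star)
print(record[2], record[19])  # the two length bytes: 14 16
```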


Records that do not fit in a block

When the length of a record is greater than the block size, the record is divided and placed into two or more blocks.

The portion of the record in each block is referred to as a RECORD FRAGMENT.

A record with two or more fragments is called a SPANNED RECORD.

A record that does not cross a block boundary is called an UNSPANNED RECORD.


Spanned Records

Spanned records require the following extra header information:

• A bit indicating whether or not it is a fragment

• A bit indicating whether it is the first or last fragment of a record

• Pointers to the next or previous fragment for the same record


Records that do not fit in a block

Figure 6: Storing spanned records across blocks. Block 1 holds a block header, record 1 and fragment 2-a of record 2; block 2 holds a block header, fragment 2-b and record 3. Each record or fragment has its own record header.


BLOBS

Large binary objects are called BLOBS

e.g. : audio files, video files

Storage of BLOBS

Retrieval of BLOBS


Record Modifications

Chapter 13

Section 13.8

Neha Samant, CS 257 (Section II), ID 222


Modification types

Insertion

Deletion

Update


Insertion


Insertion of records without order

Records can be placed in a block with empty space or in a new block.

Insertion of records in fixed order
• Space available in the block
• No space available in the block (outside the block)

Structured address
• Pointer to a record from outside the block.


Insertion in fixed order

Space available within the block
• Use an offset table in the header of each block, with pointers to the location of each record in the block.
• The records are slid within the block and the pointers in the offset table are adjusted.

[Figure: a block with a header, an offset table, unused space, and records 1 to 4]
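The sliding step can be sketched on a toy block. The assumptions are loud ones: records have a fixed, hypothetical size of 4 bytes, the whole record serves as its sort key, and the record area is a `bytearray` separate from the header. An offset-table slot never changes its position, so a pointer of the form (block, slot) stays valid while the byte offset stored in the slot is adjusted.

```python
RECORD_SIZE = 4  # hypothetical fixed record length in bytes

def insert_in_order(area: bytearray, offsets: list, record: bytes) -> int:
    """Insert `record` in sorted physical order; return its offset-table slot."""
    used = len(offsets) * RECORD_SIZE
    if used + RECORD_SIZE > len(area):
        raise ValueError("no space in this block; try a nearby or overflow block")
    # Find the first stored record that is not smaller than the new one.
    start = used
    for pos in range(0, used, RECORD_SIZE):
        if bytes(area[pos:pos + RECORD_SIZE]) >= record:
            start = pos
            break
    # Slide the later records one slot toward the unused space.
    area[start + RECORD_SIZE:used + RECORD_SIZE] = area[start:used]
    area[start:start + RECORD_SIZE] = record
    # Adjust the offset table for the records that were slid.
    for slot, off in enumerate(offsets):
        if off >= start:
            offsets[slot] = off + RECORD_SIZE
    offsets.append(start)
    return len(offsets) - 1

area, offsets = bytearray(16), []
for rec in (b"bbbb", b"dddd", b"cccc"):
    insert_in_order(area, offsets, rec)
print(bytes(area[:12]), offsets)  # b'bbbbccccdddd' [0, 8, 4]
```

Slot 1 still leads to `dddd` after the insert, even though that record moved from byte 4 to byte 8.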


Insertion in fixed order


No space available within the block (outside the block)

Find space on a “nearby” block.

• If no space is available on a block, look at the following block in the sorted order of blocks.

• If space is available in that block, move the highest records of block 1 to block 2 and slide the records around on both blocks.

Create an overflow block.
• Records can be stored in the overflow block.
• Each block has a place in its header for a pointer to an overflow block.
• The overflow block can point to a second overflow block, as shown below.

[Figure: Block B with a pointer to the overflow block for B]


Deletion

Recover space after deletion
• When using an offset table, the records can be slid around the block so that there is one unused region in the center that can be recovered.
• If we cannot slide records, an available-space list can be maintained in the block.
• The list head goes in the block header, and the available regions themselves hold the links in the list.


Deletion


Use of tombstone
• A tombstone is left in place of a deleted record so that pointers to the deleted record do not end up pointing to a new record.
• The tombstone is permanent until the entire database is reconstructed.
• Where a tombstone is placed depends on the nature of the record pointers.
• If pointers go to fixed locations from which the location of the record is found, then we put the tombstone in that fixed location. (See examples.)
• For example, if a map table is used to translate logical record addresses to physical addresses, the tombstone can go in the map table in place of the physical address.
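A tombstone kept in a map table can be sketched as follows. The marker value, the logical addresses and the helper names are all hypothetical; the point is only that the entry for a deleted record stays in the table and is never reused.

```python
TOMBSTONE = None  # hypothetical marker stored in place of a physical address

# Map table: logical record address -> physical address (block, offset).
map_table = {"L1": ("block 7", 0), "L2": ("block 7", 64)}

def delete(logical: str) -> None:
    """Delete a record: its map-table entry stays, holding only the tombstone."""
    map_table[logical] = TOMBSTONE

def follow(logical: str):
    """Follow a pointer; a tombstone tells us the record was deleted."""
    physical = map_table[logical]
    if physical is TOMBSTONE:
        return None
    return physical

delete("L1")
print(follow("L1"), follow("L2"))  # None ('block 7', 64)
```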


Deletion

Use of tombstone
• If we need to replace records by tombstones, place the bit that serves as the tombstone at the beginning of the record.
• This bit must remain at the position where the record used to begin; the subsequent bytes can be reused for another record.

[Figure: Record 1 and Record 2]

Record 1 can be replaced, but its tombstone remains; record 2 has no tombstone and can be seen when we follow a pointer to it.


Update

Fixed-length update
• No effect on the storage system, as the record occupies the same space as before the update.

Variable-length update
• Longer length
• Shorter length


Update

Variable-length update (longer length)

Stored on the same block:
• Sliding records
• Creation of an overflow block

Stored on another block:
• Move records around that block
• Create a new block for storing variable-length fields


Update

Variable-length update (shorter length)

Same as deletion:
• Recover space
• Consolidate space


B-Trees & Bitmap Indexes
14.2, 14.7

DATABASE SYSTEMS – The Complete Book

Presented by: Deepti Kundu, Maciej Kicinski

Under the supervision of: Dr. T.Y. Lin


B-Trees


Structure

• A B-tree is a balanced tree, meaning that all paths from the root to a leaf have the same length.
• There is a parameter n associated with each B-tree block. Each block has space for n search keys and n+1 pointers.
• The root may have as few as 1 key (2 pointers), but all other blocks must be at least half full.


Structure

• A typical node has n search keys and n+1 pointers.
• A typical interior node has pointers to nodes at the next level down; it holds keys but no data values.
• A typical leaf has pointers to records.


Application

• The search key of the B-tree is the primary key for the data file.
• The data file is sorted by its primary key.
• The data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree.


Lookup

At an interior node, choose the correct pointer to follow by comparing the node's keys with the search value.


Lookup

At a leaf node, find the key that matches the search value; the pointer associated with that key leads to the data.
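The two lookup rules can be put together in a short sketch. The `Node` class is hypothetical and simplified: an interior node with keys K1..Kn has n+1 child pointers, where child i holds the keys from K_i up to (but not including) K_(i+1), and a leaf pairs each key with a record pointer (the extra next-leaf pointer is left out).

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, pointers, leaf):
        self.keys = keys          # sorted search keys
        self.pointers = pointers  # children (interior) or record pointers (leaf)
        self.leaf = leaf

def lookup(node: Node, key):
    """Return the record pointer for `key`, or None if the key is absent."""
    while not node.leaf:
        # Number of keys <= search key gives the index of the child to follow.
        node = node.pointers[bisect_right(node.keys, key)]
    for k, record in zip(node.keys, node.pointers):
        if k == key:
            return record
    return None

leaf1 = Node([2, 3, 5], ["rec2", "rec3", "rec5"], leaf=True)
leaf2 = Node([7, 11], ["rec7", "rec11"], leaf=True)
leaf3 = Node([13, 17, 19], ["rec13", "rec17", "rec19"], leaf=True)
root = Node([7, 13], [leaf1, leaf2, leaf3], leaf=False)

print(lookup(root, 11), lookup(root, 13), lookup(root, 4))  # rec11 rec13 None
```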


Insertion

• When inserting, find the correct leaf node and put the key and the pointer to the data there.
• If the node is full, create a new node and split the keys between the two.
• Recursively move up: the parent needs a pointer to the new node, and if the parent is full, it is split in the same way.
• This can end with the creation of a new root node, if the current root was full.


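Deletion

• Perform a lookup to find the key to delete, and delete it from its leaf.
• If the node is then less than half full, either merge it with an adjacent sibling and recursively delete the corresponding key and pointer from the parent, or, if the sibling is too full to merge, move a key over from it and adjust the keys in the parent.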


Efficiency

• B-trees allow lookup, insertion, and deletion of records using very few disk I/Os.
• Each level of a B-tree requires one read; the pointer found there leads to the next read, down to the final one.


Efficiency

• Three levels are sufficient for most B-trees. With 255 pointers per block, three levels reach 255^3 leaf pointers, which is about 16.6 million.
• Disk I/Os can be reduced further by keeping the upper levels of the B-tree in main memory. Keeping the root block, with its 255 pointers, in memory reduces the reads to 2; it is even possible to keep the 255 second-level blocks in memory as well, reducing the reads to 1.
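The figures above follow from simple arithmetic. The sketch below assumes every block is completely full (255 pointers) and counts only reads of index blocks; the function names are made up for the example.

```python
def reachable_records(pointers_per_block: int, levels: int) -> int:
    """Leaf pointers reachable in a full B-tree with the given number of levels."""
    return pointers_per_block ** levels

def index_reads(levels: int, levels_in_memory: int) -> int:
    """Disk reads of index blocks per lookup when the top levels are cached."""
    return levels - levels_in_memory

print(reachable_records(255, 3))                        # 16581375
print([index_reads(3, cached) for cached in range(3)])  # [3, 2, 1]
```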


Bitmap Indexes


Definition

A bitmap index for a field F is a collection of bit-vectors of length n, one for each possible value that may appear in that field F.[1]


What does that mean?

Assume a relation R with 2 attributes, A and B. Attribute A is of type Integer and B is of type String.

R has 6 records, numbered 1 through 6, as shown.

Record  A   B
1       30  foo
2       30  bar
3       40  baz
4       50  foo
5       40  bar
6       30  baz


Example Continued…

A bitmap index for attribute B is:

Value  Vector
foo    100100
bar    010010
baz    001001

Record  A   B
1       30  foo
2       30  bar
3       40  baz
4       50  foo
5       40  bar
6       30  baz
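Building such an index takes one pass over the column. This is a minimal sketch that represents each bit-vector as a string of 0's and 1's; the function name `build_bitmap_index` is an assumption of the example.

```python
def build_bitmap_index(values):
    """Map each distinct value to a bit-vector with one bit per record."""
    n = len(values)
    index = {}
    for i, v in enumerate(values):
        # Create an all-zero vector the first time a value is seen, then set bit i.
        index.setdefault(v, ["0"] * n)[i] = "1"
    return {v: "".join(bits) for v, bits in index.items()}

column_b = ["foo", "bar", "baz", "foo", "bar", "baz"]
column_a = [30, 30, 40, 50, 40, 30]
print(build_bitmap_index(column_b))
# {'foo': '100100', 'bar': '010010', 'baz': '001001'}
print(build_bitmap_index(column_a))
# {30: '110001', 40: '001010', 50: '000100'}
```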


Where do we reach?

A bitmap index is a special kind of database index that uses bitmaps.[2]

Bitmap indexes have traditionally been considered to work well for data such as gender, which has a small number of distinct values, e.g., male and female, but many occurrences of those values.[2]


A little more…

A bitmap index for attribute A of relation R is:
• A collection of bit-vectors.
• The number of bit-vectors = the number of distinct values of A in R.
• The length of each bit-vector = the cardinality of R.
• The bit-vector for value v has 1 in position i if the ith record has v in attribute A, and 0 there if not.[3]
• Records are allocated permanent numbers.[3]
• There is a mapping between record numbers and record addresses.[3]


Motivation for Bitmap Indexes

• Very efficient when used for partial-match queries.[3]
• They offer the advantage of buckets [2]: we find the tuples with several specified attributes without first retrieving all the records that match in each of the attributes.
• They can also help answer range queries.[3]


Another Example

Multidimensional Array of multiple types

{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}

5 = 100010

79 = 010100

4 = 001000

6 = 000001

d = 101100

t = 010010

a = 000001


Example Continued…

{(5,d),(79,t),(4,d),(79,d),(5,t),(6,a)}

Searching for items is easy: just AND the bit-vectors of the requested values together.

To search for (5,d)

5 = 100010

d = 101100

100010 AND 101100 = 100000

The location of the record has been traced!
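The AND step maps directly onto integer bit operations. In this sketch the bit-vectors are the strings from the example, and the helper name `matching_records` is an assumption; record numbers are 1-based, as on the slides.

```python
def matching_records(*vectors: str):
    """AND the bit-vectors together; return (result vector, matching record numbers)."""
    n = len(vectors[0])
    combined = (1 << n) - 1            # start with all ones
    for v in vectors:
        combined &= int(v, 2)
    bits = format(combined, f"0{n}b")
    return bits, [i + 1 for i, b in enumerate(bits) if b == "1"]

# Search for (5, d): vector for 5 AND vector for d.
print(matching_records("100010", "101100"))  # ('100000', [1])
# Search for (79, d).
print(matching_records("010100", "101100"))  # ('000100', [4])
```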


Compressed Bitmaps

Assume:
• The number of records in R is n.
• Attribute A has m distinct values in R.

The size of a bitmap index on attribute A is m*n bits.

If m is large, then the fraction of 1’s in a bit-vector will be around 1/m, which is an opportunity to encode the vectors compactly.

A common encoding approach is called run-length encoding.[1]


Run-length encoding

Represents runs: a run is a sequence of i 0’s followed by a 1, and it is represented by some suitable binary encoding of the integer i.

A run of i 0’s followed by a 1 is encoded by:
• First computing how many bits are needed to represent i in binary; say k.
• Then representing the run by k-1 1’s and a single 0, followed by the k bits that represent i in binary.
• The encoding for i = 1 is 01 (k = 1).
• The encoding for i = 0 is 00 (k = 1).

We concatenate the codes for each run together, and the resulting sequence of bits is the encoding of the entire bit-vector.
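The encoding rule can be written out in a few lines. This is a sketch that works on bit strings; it assumes that trailing 0's after the last 1 are simply not encoded, which is a design choice of this example, and the function names are invented.

```python
def encode_run(i: int) -> str:
    """Encode a run of i 0's followed by a 1: k-1 ones, a zero, then i in k bits."""
    binary = format(i, "b")   # "0" for i = 0, so k = 1 in that case
    k = len(binary)
    return "1" * (k - 1) + "0" + binary

def encode_bitvector(bits: str) -> str:
    """Concatenate the codes of all runs in the bit-vector."""
    codes, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            codes.append(encode_run(run))
            run = 0
    return "".join(codes)     # any trailing 0's are dropped

print(encode_run(0), encode_run(1), encode_run(13))  # 00 01 11101101
print(encode_bitvector("0000000000000110001"))       # 11101101001011
```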


Understanding with an Example

Let us decode the sequence 11101101001011, starting at the beginning (leftmost bit):

• First run: the first 0 is at position 4, so k = 4. The next 4 bits are 1101, so the first integer is i = 13.
• Second run: the remaining bits are 001011. The first 0 is at position 1, so k = 1; the next bit is 0, so i = 0.
• Last run: the remaining bits are 1011. The first 0 is at position 2, so k = 2; the next 2 bits are 11, so i = 3.

The run lengths are thus 13, 0, 3; hence our bit-vector is: 0000000000000110001
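The decoding walk can be traced in code. This is a sketch over bit strings with invented function names; it assumes the input is a well-formed concatenation of run codes.

```python
def decode_runs(code: str) -> list:
    """Return the run lengths encoded in `code`."""
    runs, pos = [], 0
    while pos < len(code):
        k = 1
        while code[pos] == "1":   # k-1 leading ones
            k += 1
            pos += 1
        pos += 1                  # skip the single 0 that ends the length prefix
        runs.append(int(code[pos:pos + k], 2))
        pos += k
    return runs

def runs_to_bitvector(runs) -> str:
    """Each run of length i becomes i 0's followed by a 1."""
    return "".join("0" * i + "1" for i in runs)

runs = decode_runs("11101101001011")
print(runs)                     # [13, 0, 3]
print(runs_to_bitvector(runs))  # 0000000000000110001
```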


Managing Bitmap Indexes

1) How do you find the bit-vector for a specific value efficiently?

2) After selecting the results that match, how do you retrieve those records efficiently?

3) When data is changed, how do you alter the bitmap index?


1) Finding bit vectors

• Think of each bit-vector as a record whose key is the value that the bit-vector corresponds to.[1]
• Any secondary index technique will be efficient in retrieving the bit-vector for a value.[1]
• Create a secondary index with the attribute value as the search key [3]:
  • B-tree
  • Hash table


2) Finding Records

Create a secondary index with the record number as the search key.[3]

In other words, once you learn that you need record k, a secondary index that uses the record position k as its search key leads you to that record.[1]


3) Handling Modifications

Two things to remember:

Record numbers must remain fixed once assigned

Changes to data file require changes to bitmap index


Deletion
• A tombstone replaces the deleted record.
• The corresponding bit is set to 0.


Insertion
• The record is assigned the next record number.
• A bit of value 0 or 1 is appended to each bit-vector.
• If the new record contains a new value of the attribute, add one bit-vector.


Modification
• Change the bit corresponding to the old value of the modified record to 0.
• Change the bit corresponding to the new value of the modified record to 1.
• If the new value is a new value of A, then insert a new bit-vector.
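The three maintenance operations can be sketched on an uncompressed index. The `BitmapIndex` class is hypothetical; it keeps each bit-vector as a Python list of 0/1 values and uses 1-based record numbers that are never reassigned.

```python
class BitmapIndex:
    def __init__(self):
        self.vectors = {}  # attribute value -> list of bits, one per record number
        self.n = 0         # number of record numbers handed out so far

    def insert(self, value) -> int:
        """Append one bit to every vector; return the new record number."""
        if value not in self.vectors:
            self.vectors[value] = [0] * self.n      # new value: new all-zero vector
        for v, bits in self.vectors.items():
            bits.append(1 if v == value else 0)
        self.n += 1
        return self.n

    def delete(self, record_no: int, value) -> None:
        """The record number is retired (tombstone); its bit is set to 0."""
        self.vectors[value][record_no - 1] = 0

    def modify(self, record_no: int, old, new) -> None:
        self.vectors[old][record_no - 1] = 0
        if new not in self.vectors:
            self.vectors[new] = [0] * self.n
        self.vectors[new][record_no - 1] = 1

idx = BitmapIndex()
for a in [30, 30, 40, 50, 40, 30]:
    idx.insert(a)
idx.delete(2, 30)          # record 2 is deleted
idx.modify(3, 40, 60)      # record 3 changes from 40 to the new value 60
new_no = idx.insert(70)    # gets record number 7
```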


References

[1] Database Systems : The Complete Book - Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom

[2] http://en.wikipedia.org/wiki/Bitmap_index#Example

[3] faculty.kfupm.edu.sa/ICS/adam/ICS541/L10-md-bitmap-indexing.ppt

[4] http://csis.bits-pilani.ac.in/faculty/goel/Data%20Warehousing/Lecture%20Notes/Lecture%20%239%20-%20Bitmap%20Indexes%20in%20DW.doc (- a good doc file to read the concepts of bitmap indexes)


Query Execution

Chapter 15

Section 15.1

Presented by Khadke, Suvarna

CS 257 (Section II) Id 213


Agenda

• Query Processor and major parts of Query Processor
• Physical-Query-Plan Operators
• Scanning Tables
• Basic approaches to locate the tuples of a relation R
• Sorting While Scanning Tables
• Computation Model for Physical Operator
• I/O Cost for Scan Operators
• Iterators


Query Processor

• The group of components of a DBMS that converts user queries and data-modification commands into a sequence of database operations.
• It also executes those operations.
• It must supply the details of how the query is to be executed.


Major parts of Query processor

Query execution:
• The algorithms that manipulate the data of the database.
• Focus on the operations of extended relational algebra.


Outline of Query Compilation

Query compilation
• Parsing: a parse tree for the query is constructed.
• Query rewrite: the parse tree is converted to an initial query plan (an algebraic representation of the query), which is then transformed into an equivalent logical query plan that is expected to take less time to execute.
• Physical plan generation: the logical query plan is converted into a physical query plan by selecting an algorithm for each operator and an order of execution for the operators.


Physical-Query-Plan Operators
• Physical operators are implementations of the operators of relational algebra.
• There are also physical operators for tasks that are not relational-algebra operators, such as “scan”, which scans a table.


Scanning Tables

One of the basic thing we can do in a Physical query plan is to read the entire contents of a relation R.

Variation of this operator involves simple predicate

Read only those tuples of the relation R that satisfy the predicate.

158

Basic approaches to locate the tuples of a relation R

Table-scan
  Relation R is stored in secondary memory with its tuples arranged in blocks.
  The blocks can be fetched one by one.
Index-scan
  If there is an index on any attribute of R, the index can be used to get all the tuples of R.
  For example: a sparse index on R.

Sorting While Scanning Tables

There are a number of reasons to sort a relation:
  A query could include an ORDER BY clause, requiring that a relation be sorted.
  Some algorithms that implement relational-algebra operations require one or both arguments to be sorted relations.
The physical-query-plan operator sort-scan takes a relation R and the attributes on which the sort is to be made, and produces R in that sorted order.
Ways to implement sort-scan include scanning a B-tree index on the sort attribute a and running a multiway merge-sort.

Computation Model for Physical Operators

Physical-plan operators should be selected wisely; this is essential for a good query processor.
The "cost" of each operator is estimated by the number of disk I/O's it performs.
The cost of writing the final answer back to disk depends only on the size of the answer; it is added to the total cost of the query.

Parameters for Measuring Costs

B: the number of blocks needed to hold all the tuples of R, also written B(R).
T: the number of tuples in R, also written T(R).
V: the number of distinct values that appear in a column of a relation. V(R, a) is the number of distinct values of column a in R.

I/O Cost for Scan Operators

If relation R is clustered, the table-scan operator needs approximately B disk I/O's.
If R is not clustered, the number of disk I/O's required is generally much higher, up to T (one per tuple).
An index on R usually occupies many fewer than B(R) blocks.
Therefore a scan of the entire R, which takes at least B disk I/O's, requires many more I/O's than reading the entire index.

Iterators for Implementation of Physical Operators

Many physical operators can be implemented as an iterator.
The three methods forming the iterator for an operation are:
1. Open(): starts the process of getting tuples. It initializes any data structures needed to perform the operation.
2. GetNext(): returns the next tuple in the result and adjusts data structures as necessary to allow subsequent tuples to be obtained. If there are no more tuples to return, GetNext() returns a special value NotFound.
3. Close(): ends the iteration after all tuples have been obtained. It calls Close() on any arguments of the operator.
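
The three methods can be sketched for a table-scan as follows. This is a minimal illustration, not the slides' implementation: the class name TableScan, the helper run, the sentinel NOT_FOUND, and the in-memory list of blocks standing in for disk are all assumptions.

```python
NOT_FOUND = object()  # sentinel standing in for the slides' NotFound value


class TableScan:
    """Iterator over a relation stored as a list of blocks (lists of tuples)."""

    def __init__(self, blocks):
        self.blocks = blocks

    def open(self):
        # Initialize the state needed to walk the relation.
        self.block_no = 0
        self.tuple_no = 0

    def get_next(self):
        # Skip past exhausted (or empty) blocks until a tuple is found.
        while self.block_no < len(self.blocks):
            block = self.blocks[self.block_no]
            if self.tuple_no < len(block):
                t = block[self.tuple_no]
                self.tuple_no += 1
                return t
            self.block_no += 1
            self.tuple_no = 0
        return NOT_FOUND

    def close(self):
        # Nothing to release here; an operator with arguments would close them.
        pass


def run(it):
    """Drain an iterator using the Open/GetNext/Close protocol."""
    out = []
    it.open()
    while (t := it.get_next()) is not NOT_FOUND:
        out.append(t)
    it.close()
    return out


R = [[(1, "a"), (2, "b")], [], [(3, "c")]]
print(run(TableScan(R)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

A consumer only ever sees one tuple at a time, which is why operators built this way can be chained without materializing intermediate relations.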

Reference

Garcia-Molina, H., Ullman, J. D., & Widom, J. Database Systems: The Complete Book, 2nd Edition, 2008.

By Swathi Vegesna

At a glimpse

Introduction
Partitioning Relations by Hashing
Algorithm for Duplicate Elimination
Grouping and Aggregation
Union, Intersection, and Difference
Hash-Join Algorithm
Sort-based vs. Hash-based
Summary

Introduction

Hashing is used when the data is too big to fit in the main-memory buffers.
Hash all the tuples of the argument(s) using an appropriate hash key.
For all the common operations, there is a way to select the hash key so that all the tuples that need to be considered together when we perform the operation have the same hash value.
This reduces the size of the operand(s) by a factor equal to the number of buckets.

Partitioning Relations by Hashing

Algorithm:

initialize M-1 buckets using M-1 empty buffers;
FOR each block b of relation R DO BEGIN
    read block b into the Mth buffer;
    FOR each tuple t in b DO BEGIN
        IF the buffer for bucket h(t) has no room for t THEN
        BEGIN
            copy the buffer to disk;
            initialize a new empty block in that buffer;
        END;
        copy t to the buffer for bucket h(t);
    END;
END;
FOR each bucket DO
    IF the buffer for this bucket is not empty THEN
        write the buffer to disk;
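
The pseudocode above can be simulated in a few lines of Python. This is a sketch under stated assumptions: "disk" is an in-memory list, a block holds block_size tuples, and the function name partition and its parameters are hypothetical.

```python
def partition(blocks, M, h, block_size):
    """Hash the tuples of a relation into M-1 buckets.

    blocks: the relation as a list of blocks (lists of tuples).
    Returns (buckets, writes): buckets[i] is the list of blocks written
    to "disk" for bucket i, and writes counts the simulated block writes.
    """
    buffers = [[] for _ in range(M - 1)]   # one in-memory buffer per bucket
    buckets = [[] for _ in range(M - 1)]   # blocks already flushed to disk
    writes = 0
    for block in blocks:                   # the block is read into the M-th buffer
        for t in block:
            i = h(t) % (M - 1)
            if len(buffers[i]) == block_size:   # no room: flush, start a new block
                buckets[i].append(buffers[i])
                buffers[i] = []
                writes += 1
            buffers[i].append(t)
    for i, buf in enumerate(buffers):      # flush the non-empty leftovers
        if buf:
            buckets[i].append(buf)
            writes += 1
    return buckets, writes


R = [[1, 2], [3, 4], [5, 6], [7, 8]]       # 4 blocks of 2 integer "tuples"
buckets, writes = partition(R, M=3, h=lambda t: t, block_size=2)
print(buckets)  # [[[2, 4], [6, 8]], [[1, 3], [5, 7]]]
print(writes)   # 4
```

With an evenly distributing hash function the number of block writes stays close to B(R), which is where the "read once, write once" part of the 3*B(R) cost in the following slides comes from.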

Duplicate Elimination

For the operation δ(R), hash R to M-1 buckets. (Note that two copies of the same tuple t will hash to the same bucket.)
Do duplicate elimination on each bucket Ri independently, using the one-pass algorithm.
The result is the union of δ(Ri), where Ri is the portion of R that hashes to the i-th bucket.

Requirements

Number of disk I/O's: 3*B(R).
The two-pass, hash-based algorithm works only if B(R) < M(M-1).
For this to work, we need:
  the hash function h to distribute the tuples evenly among the buckets;
  each bucket Ri to fit in main memory (to allow the one-pass algorithm), i.e., approximately B(R) ≤ M².
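
A small simulation of the two-pass δ(R) and its requirements may help. The function names are hypothetical, everything is kept in memory, and Python's built-in hash stands in for h; the cost and size formulas are the ones stated on the slide.

```python
def hash_distinct(tuples, M, h=hash):
    """Two-pass, hash-based duplicate elimination δ(R), simulated in memory."""
    # Pass 1: partition R into M-1 buckets; equal tuples share a bucket.
    buckets = [[] for _ in range(M - 1)]
    for t in tuples:
        buckets[h(t) % (M - 1)].append(t)
    # Pass 2: run one-pass duplicate elimination on each bucket independently.
    result = []
    for bucket in buckets:
        seen = set()
        for t in bucket:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result


def two_pass_feasible(B, M):
    """Size condition from the slide: B(R) < M(M-1)."""
    return B < M * (M - 1)


def two_pass_cost(B):
    """Read R, write the buckets, read the buckets back: 3*B(R) disk I/O's."""
    return 3 * B


R = [("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)]
print(sorted(hash_distinct(R, M=4)))  # [('a', 1), ('b', 2), ('c', 3)]
print(two_pass_feasible(B=1000, M=101), two_pass_cost(1000))  # True 3000
```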

Grouping and Aggregation

Hash all the tuples of relation R to M-1 buckets, using a hash function that depends only on the grouping attributes. (Note: all tuples in the same group end up in the same bucket.)
Use the one-pass algorithm to process each bucket independently.
Uses 3*B(R) disk I/O's; requires approximately B(R) ≤ M².

Union, Intersection, and Difference

For a binary operation, we use the same hash function to hash the tuples of both arguments.
R ∪ S: hash both R and S to M-1 buckets each.
R ∩ S: hash R and S to 2(M-1) buckets in total (M-1 each).
R - S: hash R and S to 2(M-1) buckets in total (M-1 each).
Requires 3(B(R)+B(S)) disk I/O's.
The two-pass, hash-based algorithm requires approximately min(B(R), B(S)) ≤ M².

Hash-Join Algorithm

Use the same hash function for both relations; the hash function should depend only on the join attributes.
Hash R to M-1 buckets R_1, R_2, …, R_{M-1}.
Hash S to M-1 buckets S_1, S_2, …, S_{M-1}.
Do a one-pass join of R_i and S_i, for all i.
Requires 3*(B(R) + B(S)) disk I/O's and approximately min(B(R), B(S)) ≤ M².
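
The steps of the hash-join can be sketched as below. This is an in-memory illustration under assumptions: the schemas R(x, y) and S(y, z), the function name hash_join, and the use of Python's built-in hash are all hypothetical choices, and no real disk I/O takes place.

```python
from collections import defaultdict


def hash_join(R, S, M):
    """Two-pass hash join of R(x, y) and S(y, z) on the shared attribute y."""
    k = M - 1
    # Pass 1: partition both relations with the same hash function on y.
    r_buckets = [[] for _ in range(k)]
    s_buckets = [[] for _ in range(k)]
    for x, y in R:
        r_buckets[hash(y) % k].append((x, y))
    for y, z in S:
        s_buckets[hash(y) % k].append((y, z))
    # Pass 2: one-pass join of each pair of corresponding buckets.
    out = []
    for r_i, s_i in zip(r_buckets, s_buckets):
        table = defaultdict(list)          # in-memory lookup table on the S bucket
        for y, z in s_i:
            table[y].append(z)
        for x, y in r_i:
            for z in table.get(y, []):
                out.append((x, y, z))
    return out


R = [(1, "p"), (2, "q"), (3, "p")]
S = [("p", 10), ("q", 20), ("r", 30), ("p", 11)]
print(sorted(hash_join(R, S, M=4)))
# [(1, 'p', 10), (1, 'p', 11), (2, 'q', 20), (3, 'p', 10), (3, 'p', 11)]
```

Because tuples with equal join values always land in buckets with the same index, only corresponding bucket pairs ever need to be compared.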

Sort-based vs. Hash-based

For binary operations, the size requirement of hash-based algorithms depends only on the smaller of the two arguments, not on their sum.
Sort-based algorithms can produce output in sorted order, which can be helpful.
Hash-based algorithms depend on the buckets being of equal size.
Sort-based algorithms can experience reduced rotational latency or seek time.

Summary

Partitioning Relations by Hashing
Algorithm for Duplicate Elimination
Grouping and Aggregation
Union, Intersection, and Difference
Hash-Join Algorithm
Sort-based vs. Hash-based

Index-Based Algorithms

Chapter 15
Section 15.6

Presented by Fan Yang
CS 257
Class ID 218

Clustering and Nonclustering Indexes

Clustered relation: the tuples are packed into roughly as few blocks as can possibly hold those tuples.
Clustering index: an index on an attribute or attributes such that all the tuples with a fixed value for the search key of the index appear on roughly as few blocks as can hold them.

Clustering and Nonclustering Indexes

A relation that isn't clustered cannot have a clustering index.
A clustered relation can have nonclustering indexes.

Index-Based Selection

For a selection σC(R), suppose C is of the form a = v, where a is an attribute.
With a clustering index on R.a, the number of disk I/O's will be about B(R)/V(R,a).

Index-Based Selection

The actual number may be higher because:
1. the index is not kept entirely in main memory;
2. the tuples with a = v may be spread over more blocks;
3. the tuples may not be packed as tightly as possible into blocks.

Example

B(R) = 1000, T(R) = 20,000. Number of I/O's required:
1. R clustered, no index: 1000
2. R not clustered, no index: 20,000
3. V(R,a) = 100, clustering index: 1000/100 = 10
4. V(R,a) = 10, nonclustering index: 20,000/10 = 2,000
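
The four cases of the example follow from one small cost function. The sketch below is illustrative only; the function name selection_cost and its parameter names are assumptions, and index-lookup overhead is ignored, as on the slide.

```python
from math import ceil


def selection_cost(B, T, V=None, clustered_relation=True, index=None):
    """Approximate disk I/O's for a selection a = v on R.

    index is None, "clustering", or "nonclustering"; V is V(R, a).
    """
    if index == "clustering":
        return ceil(B / V)          # matching tuples are packed together
    if index == "nonclustering":
        return ceil(T / V)          # each matching tuple may be on its own block
    return B if clustered_relation else T   # no index: scan all of R


B, T = 1000, 20_000
print(selection_cost(B, T))                               # 1000
print(selection_cost(B, T, clustered_relation=False))     # 20000
print(selection_cost(B, T, V=100, index="clustering"))    # 10
print(selection_cost(B, T, V=10, index="nonclustering"))  # 2000
```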

Joining by Using an Index

Natural join R(X, Y) ⋈ S(Y, Z), with an index on Y for S.

Number of I/O's to read R
  Clustered: B(R)
  Not clustered: T(R)

Number of I/O's to fetch the matching tuples of S, summed over all tuples t of R
  Clustering index: T(R)B(S)/V(S,Y)
  Nonclustering index: T(R)T(S)/V(S,Y)

Example

R(X,Y) occupies 1000 blocks and S(Y,Z) occupies 500 blocks.
Assume 10 tuples in each block, so T(R) = 10,000 and T(S) = 5000.
V(S,Y) = 100.
If R is clustered and there is a clustering index on Y for S:
  the number of I/O's for R is 1000;
  the number of I/O's for S is 10,000 * 500/100 = 50,000.
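
The arithmetic of this example can be checked with a short function. This is a sketch: the name index_join_cost and its parameters are hypothetical, and the max(1, ...) floor (every lookup costs at least one I/O) is an added assumption that does not change the numbers here.

```python
def index_join_cost(B_R, T_R, B_S, T_S, V_SY, r_clustered, clustering_index):
    """Approximate I/O's for joining R with S using an index on Y for S.

    Returns (cost of reading R, cost of the index lookups into S).
    """
    read_r = B_R if r_clustered else T_R
    per_lookup = (B_S if clustering_index else T_S) / V_SY
    # Every tuple of R triggers one lookup; assume it costs at least one I/O.
    lookups = T_R * max(1, per_lookup)
    return read_r, int(lookups)


print(index_join_cost(1000, 10_000, 500, 5000, 100,
                      r_clustered=True, clustering_index=True))
# (1000, 50000)
print(index_join_cost(1000, 10_000, 500, 5000, 100,
                      r_clustered=True, clustering_index=False))
# (1000, 500000)
```

The second call shows how much a nonclustering index on S would cost for the same data: T(R)T(S)/V(S,Y) = 500,000 lookups' worth of I/O.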

Joins Using a Sorted Index

Natural join R(X, Y) ⋈ S(Y, Z) with a sorted index on Y for either R or S.
Extreme case: zig-zag join, where both relations have a sorted index on Y.
Example: relations R(X,Y) and S(Y,Z) with an index on Y for both relations.
  Search keys (Y-values) for R: 1, 3, 4, 4, 5, 6
  Search keys (Y-values) for S: 2, 2, 4, 6, 7, 8
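
The zig-zag idea can be sketched on the two key lists from the example. This is a simplified illustration: plain sorted Python lists stand in for the leaves of the two indexes, the function name zigzag_join is hypothetical, and the sketch advances one key at a time rather than using index lookups to skip ahead.

```python
def zigzag_join(r_keys, s_keys):
    """Merge two sorted lists of index keys; return each matching
    (position in R's index, position in S's index) pair.

    Only these positions would require fetching tuples from disk.
    """
    i = j = 0
    matches = []
    while i < len(r_keys) and j < len(s_keys):
        if r_keys[i] < s_keys[j]:
            i += 1                      # R's key is too small: advance in R
        elif r_keys[i] > s_keys[j]:
            j += 1                      # S's key is too small: advance in S
        else:
            y = r_keys[i]
            # Find the run of equal keys on each side and pair them up.
            i_end = i
            while i_end < len(r_keys) and r_keys[i_end] == y:
                i_end += 1
            j_end = j
            while j_end < len(s_keys) and s_keys[j_end] == y:
                j_end += 1
            matches += [(a, b) for a in range(i, i_end) for b in range(j, j_end)]
            i, j = i_end, j_end
    return matches


r_keys = [1, 3, 4, 4, 5, 6]
s_keys = [2, 2, 4, 6, 7, 8]
pairs = zigzag_join(r_keys, s_keys)
print([(r_keys[a], s_keys[b]) for a, b in pairs])  # [(4, 4), (4, 4), (6, 6)]
```

Tuples of R with keys 1, 3, and 5 and tuples of S with keys 2, 7, and 8 are never retrieved, which is the saving the zig-zag join aims for.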

The Query Compiler

16.1 Parsing and Preprocessing

Meghna Jain (205)
Dr. T. Y. Lin

Presentation Outline

16.1 Parsing and Preprocessing
  16.1.1 Syntax Analysis and Parse Tree
  16.1.2 A Grammar for Simple Subset of SQL
  16.1.3 The Preprocessor
  16.1.4 Processing Queries Involving Views

Query compilation is divided into three steps

1. Parsing: parse the SQL query into a parse tree.
2. Logical query plan: transform the parse tree into an expression tree of relational algebra.
3. Physical query plan: transform the logical query plan into a physical query plan, which specifies:
   . the operations performed
   . the order of the operations
   . the algorithm used for each operation
   . the way in which stored data is obtained and passed from one operation to another

From a query to a logical query plan

[Figure: Query → Parser → Preprocessor → Logical query plan generator → Query rewriter → Preferred logical query plan]

Syntax Analysis and Parse Tree

The parser takes the SQL query and converts it to a parse tree. Nodes of a parse tree:
1. Atoms: lexical elements such as keywords, constants, parentheses, operators, and other schema elements.
2. Syntactic categories: names for subparts that play a similar role in a query, such as <Query> and <Condition>.

Grammar for Simple Subset of SQL

<Query> ::= <SFW>
<Query> ::= ( <Query> )

<SFW> ::= SELECT <SelList> FROM <FromList> WHERE <Condition>

<SelList> ::= <Attribute> , <SelList>
<SelList> ::= <Attribute>

<FromList> ::= <Relation> , <FromList>
<FromList> ::= <Relation>

<Condition> ::= <Condition> AND <Condition>
<Condition> ::= <Tuple> IN <Query>
<Condition> ::= <Attribute> = <Attribute>
<Condition> ::= <Attribute> LIKE <Pattern>

<Tuple> ::= <Attribute>

Notation: atoms are constants, <syntactic categories> are variables, and ::= means "can be expressed as" (or "is defined as").

Query and Parse Tree

StarsIn(title, year, starName)
MovieStar(name, address, gender, birthdate)

Query: Give the titles of movies that have at least one star born in 1960.

SELECT title FROM StarsIn WHERE starName IN (
    SELECT name FROM MovieStar WHERE birthdate LIKE '%1960%'
);

Another equivalent query

SELECT title FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE '%1960%';

Parse Tree

<Query>
  <SFW>
    SELECT
    <SelList>
      <Attribute>
        title
    FROM
    <FromList>
      <RelName>
        StarsIn
      ,
      <FromList>
        <RelName>
          MovieStar
    WHERE
    <Condition>
      <Condition>
        <Attribute>
          starName
        =
        <Attribute>
          name
      AND
      <Condition>
        <Attribute>
          birthdate
        LIKE
        <Pattern>
          '%1960'

The Preprocessor

Functions of the preprocessor:
. If a relation used in the query is a virtual view, each use of this relation in the FROM-list must be replaced by a parse tree that describes the view.
. It is also responsible for semantic checking:
  1. Check relation uses: every relation mentioned in a FROM clause must be a relation or a view in the current schema.
  2. Check and resolve attribute uses: every attribute mentioned in the SELECT or WHERE clause must be an attribute of some relation in the current scope.
  3. Check types: all attributes must be of a type appropriate to their uses.

Preprocessing Queries Involving Views

When an operand in a query is a virtual view, the preprocessor needs to replace the operand by a piece of parse tree that represents how the view is constructed from base tables.

Base table: Movies(title, year, length, genre, studioName, producerC#)

View definition:
CREATE VIEW ParamountMovies AS
    SELECT title, year FROM Movies
    WHERE studioName = 'Paramount';

Example based on the view:
SELECT title FROM ParamountMovies WHERE year = 1979;

16.2 ALGEBRAIC LAWS FOR IMPROVING QUERY PLANS

Ramya Karri

ID: 206

Optimizing the Logical Query Plan

The translation rules converting a parse tree to a logical query tree do not always produce the best logical query tree.

It is often possible to optimize the logical query tree by applying relational algebra laws to convert the original tree into a more efficient logical query tree.

Optimizing a logical query tree using relational algebra laws is called heuristic optimization

Relational Algebra Laws

These laws often involve the properties of:

commutativity - operator can be applied to operands independent of order.

E.g. A + B = B + A - The “+” operator is commutative.

associativity - operator is independent of operand grouping.

E.g. A + (B + C) = (A + B) + C - The “+” operator is associative.

Associative and Commutative Operators

The relational algebra operators cross-product (×), join (⋈), union (∪), and intersection (∩) are all associative and commutative.

Commutative
R × S = S × R
R ⋈ S = S ⋈ R
R ∪ S = S ∪ R
R ∩ S = S ∩ R

Associative
(R × S) × T = R × (S × T)
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R ∪ S) ∪ T = R ∪ (S ∪ T)
(R ∩ S) ∩ T = R ∩ (S ∩ T)

Laws Involving Selection

Complex selections involving AND or OR can be broken into two or more selections (splitting laws):

σ_{C1 AND C2}(R) = σ_{C1}(σ_{C2}(R))
σ_{C1 OR C2}(R) = (σ_{C1}(R)) ∪S (σ_{C2}(R))

The OR law uses set union (∪S) and holds only when R is a set.

Example: R = {a,a,b,b,b,c} is a bag; p1 is satisfied by a and b, p2 is satisfied by b and c.
σ_{p1 OR p2}(R) = {a,a,b,b,b,c}
σ_{p1}(R) = {a,a,b,b,b}
σ_{p2}(R) = {b,b,b,c}
The bag union σ_{p1}(R) ∪B σ_{p2}(R) = {a,a,b,b,b,b,b,b,c} has too many copies of b, and the set union {a,b,c} has too few copies of a and b, so neither reproduces σ_{p1 OR p2}(R) when R is a bag.
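
The example can be replayed with Python's collections.Counter as a stand-in for a bag. This is a sketch under assumptions: bag union is taken to add multiplicities (Counter's + operator) and set union to drop duplicates; the helper select is hypothetical.

```python
from collections import Counter

R = Counter({"a": 2, "b": 3, "c": 1})        # the bag {a,a,b,b,b,c}
p1 = lambda t: t in ("a", "b")
p2 = lambda t: t in ("b", "c")


def select(pred, bag):
    """Bag selection: keep each tuple with its multiplicity if pred holds."""
    return Counter({t: n for t, n in bag.items() if pred(t)})


direct = select(lambda t: p1(t) or p2(t), R)
bag_union = select(p1, R) + select(p2, R)            # multiplicities add
set_union = set(select(p1, R)) | set(select(p2, R))  # duplicates dropped

print(sorted(direct.elements()))     # ['a', 'a', 'b', 'b', 'b', 'c']
print(sorted(bag_union.elements()))  # ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'c']
print(sorted(set_union))             # ['a', 'b', 'c']
```

Under these definitions the OR-splitting law reproduces the direct selection only when R contains no duplicates.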

Laws Involving Selection (Contd.)

For a union, the selection must be pushed to both arguments:
σC(R ∪ S) = σC(R) ∪ σC(S)

For a difference, the selection must be pushed to the first argument and optionally to the second:
σC(R - S) = σC(R) - S
σC(R - S) = σC(R) - σC(S)

Laws Involving Selection (Contd.)

For all other operators, the selection needs to be pushed to only one of the arguments.
For joins, it may not be possible to push the selection to both arguments, because one argument may lack the attributes the selection requires.
Assuming R has all the attributes mentioned in C:
σC(R × S) = σC(R) × S
σC(R ∩ S) = σC(R) ∩ S
σC(R ⋈ S) = σC(R) ⋈ S
σC(R ⋈D S) = σC(R) ⋈D S

Laws Involving Selection (Contd.)

Example: consider relations R(a,b) and S(b,c) and the expression
σ_{(a=1 OR a=3) AND b<c}(R ⋈ S)
= σ_{a=1 OR a=3}(σ_{b<c}(R ⋈ S))
= σ_{a=1 OR a=3}(R ⋈ σ_{b<c}(S))
= σ_{a=1 OR a=3}(R) ⋈ σ_{b<c}(S)

Laws Involving Projection

Like selections, it is also possible to push projections down the logical query tree. However, the performance gained is less than selections because projections just reduce the number of attributes instead of reducing the number of tuples.

Laws Involving Projection

Laws for pushing projections below products and joins:
πL(R × S) = πL(πM(R) × πN(S))
πL(R ⋈ S) = πL(πM(R) ⋈ πN(S))
πL(R ⋈D S) = πL(πM(R) ⋈D πN(S))
Here M and N are the attributes of R and S, respectively, that appear in L or are needed by the join.

Laws Involving Projection

Laws for pushing projections with set operations:
Projection can be performed entirely before a bag union.
πL(R ∪B S) = πL(R) ∪B πL(S)
Projection can be pushed below a selection as long as we keep all the attributes needed for the selection (M = L ∪ attr(C)).
πL(σC(R)) = πL(σC(πM(R)))

Laws Involving Join

We have previously seen these important rules about joins:

1. Joins are commutative and associative.

2. Selection can be distributed into joins.

3. Projection can be distributed into joins.

Laws Involving Duplicate Elimination

The duplicate elimination operator (δ) can be pushed through many operators, but not all of them.
Counterexample for bag union: R has two copies of a tuple t and S has one copy of t. Then
δ(R ∪B S) has one copy of t, but
δ(R) ∪B δ(S) has two copies of t.
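
The counterexample can be checked with collections.Counter standing in for bags. This is a sketch: delta is a hypothetical helper, Counter's + is used as bag union (multiplicities add) and Counter's & as bag intersection (minimum multiplicity).

```python
from collections import Counter


def delta(bag):
    """Duplicate elimination: one copy of every tuple that occurs in the bag."""
    return Counter({t: 1 for t, n in bag.items() if n > 0})


R = Counter({"t": 2})   # two copies of t
S = Counter({"t": 1})   # one copy of t

print(delta(R + S))           # Counter({'t': 1})
print(delta(R) + delta(S))    # Counter({'t': 2})
print(delta(R & S) == delta(R) & delta(S))  # True
```

The last line illustrates, for this one instance only, that pushing δ through bag intersection does not change the result.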

Laws Involving Duplicate Elimination

Laws for pushing the duplicate elimination operator (δ):
δ(R × S) = δ(R) × δ(S)
δ(R ⋈ S) = δ(R) ⋈ δ(S)
δ(R ⋈D S) = δ(R) ⋈D δ(S)
δ(σC(R)) = σC(δ(R))

Laws Involving Duplicate Elimination

The duplicate elimination operator (δ) can also be pushed through bag intersection, but in general not across bag union, bag difference, or projection.
δ(R ∩B S) = δ(R) ∩B δ(S)

Laws Involving Grouping

The laws for the grouping operator (γ) depend on the aggregate operators used, because some aggregate functions are unaffected by duplicates (MIN and MAX) while others are affected (SUM, COUNT, and AVG).
There is one general rule, however: grouping subsumes duplicate elimination.
δ(γL(R)) = γL(R)

The Query Compiler (16.3 & 16.4)

DATABASE SYSTEMS: The Complete Book

Presented by: Deepti Kundu, Maciej Kicinski
Under the supervision of: Dr. T. Y. Lin

Topics to be covered

From Parse to Logical Query Plans
  Conversion to Relational Algebra
  Removing Subqueries From Conditions
  Improving the Logical Query Plan
  Grouping Associative/Commutative Operators
Estimating the Cost of Operation
  Estimating Sizes of Intermediate Relations
  Estimating the Size of a Projection
  Estimating the Size of a Selection
  Estimating the Size of a Join
  Estimating Sizes for Other Operations

Review

[Figure: Query → Parser → Preprocessor (Section 16.1) → Logical query plan generator → Query rewriter (Section 16.3) → Preferred logical query plan]

Two steps to turn Parse tree into Preferred Logical Query Plan

1. Replace the nodes and structures of the parse tree, in appropriate groups, by an operator or operators of relational algebra.
2. Take the relational algebra expression and turn it into an expression that we expect can be converted to the most efficient physical query plan.

Reference Relations

StarsIn (movieTitle, movieYear, starName)
MovieStar (name, address, gender, birthdate)

Conversion to Relational Algebra

If we have a <Query> with a <Condition> that has no subqueries, then we may replace the entire construct (the select-list, from-list, and condition) by a relational-algebra expression.

The relational-algebra expression consists of the following, from bottom to top:
  The product of all the relations mentioned in the <FromList>, which is the argument of:
  A selection σC, where C is the <Condition> expression in the construct being replaced, which in turn is the argument of:
  A projection πL, where L is the list of attributes in the <SelList>.

A query: Example

SELECT movieTitle
FROM StarsIn, MovieStar
WHERE starName = name AND
      birthdate LIKE '%1960';

Translation to an algebraic expression tree

[Figure: π_movieTitle applied to σ_{starName = name AND birthdate LIKE '%1960'} applied to the product StarsIn × MovieStar]
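
The product, selection, and projection layers of the translated query can be evaluated directly on toy data. This is a sketch only: the rows below are invented for illustration, the helpers cross, select, and project are hypothetical, and LIKE '%1960' is approximated with a string suffix test.

```python
from itertools import product

# Tiny hypothetical instances of the reference relations.
StarsIn = [
    {"movieTitle": "M1", "movieYear": 1990, "starName": "Ann"},
    {"movieTitle": "M2", "movieYear": 1995, "starName": "Bob"},
]
MovieStar = [
    {"name": "Ann", "address": "x", "gender": "F", "birthdate": "05/03/1960"},
    {"name": "Bob", "address": "y", "gender": "M", "birthdate": "01/01/1970"},
]


def cross(*relations):
    """Product of the relations in the <FromList> (attribute names are disjoint)."""
    return [{k: v for row in combo for k, v in row.items()}
            for combo in product(*relations)]


def select(cond, rows):
    """Selection: keep the rows that satisfy the <Condition>."""
    return [row for row in rows if cond(row)]


def project(attrs, rows):
    """Bag projection: keep only the attributes in the <SelList>."""
    return [tuple(row[a] for a in attrs) for row in rows]


cond = lambda r: r["starName"] == r["name"] and r["birthdate"].endswith("1960")
result = project(["movieTitle"], select(cond, cross(StarsIn, MovieStar)))
print(result)  # [('M1',)]
```

The product here builds all four combinations before filtering, which is exactly the inefficiency the later query-rewriting steps aim to remove.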

Removing Subqueries From Conditions

For parse trees with a <Condition> that has a subquery, we introduce an intermediate operator: the two-argument selection.
It is intermediate between the syntactic categories of the parse tree and the relational-algebra operators that apply to relations.

Using a two-argument σ

π_movieTitle
  σ (two-argument)
    StarsIn
    <Condition>
      <Tuple>
        <Attribute>
          starName
      IN
      π_name
        σ_{birthdate LIKE '%1960'}
          MovieStar

Two-argument selection with condition involving IN

Suppose the two-argument selection has a relation R as its first argument and a <Condition> of the form t IN S as its second argument, where
  t is a tuple composed of some attributes of R, and
  S is an uncorrelated subquery.

Steps to be followed:
1. Replace the <Condition> by the tree that is the expression for S (δ is used to remove duplicates).
2. Replace the two-argument selection by a one-argument selection σC, where C equates each component of t to the corresponding attribute of S.
3. Give σC an argument that is the product of R and S.
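
The role of δ in step 1 can be seen on a small instance. This is a sketch with invented data: R plays the part of StarsIn, S is a subquery result that happens to contain a duplicate, and the function names are hypothetical. For brevity the rewritten form returns only R's attributes, as a final projection would.

```python
# Hypothetical data: R plays StarsIn, S is the result of the subquery.
R = [("M1", 1990, "Ann"), ("M2", 1995, "Bob"), ("M3", 1999, "Ann")]
S = [("Ann",), ("Ann",)]            # the subquery result contains a duplicate


def two_argument_selection(R, S, pos):
    """Direct meaning of 't IN S', with t the attribute of R at index pos."""
    return [r for r in R if (r[pos],) in S]


def rewritten(R, S, pos, use_delta=True):
    """Selection over the product of R and delta(S), equating t with S's attribute."""
    s_rows = list(dict.fromkeys(S)) if use_delta else S   # delta(S) keeps one copy
    return [r for r in R for s in s_rows if r[pos] == s[0]]


print(two_argument_selection(R, S, 2))   # [('M1', 1990, 'Ann'), ('M3', 1999, 'Ann')]
print(rewritten(R, S, 2))                # [('M1', 1990, 'Ann'), ('M3', 1999, 'Ann')]
print(len(rewritten(R, S, 2, use_delta=False)))  # 4
```

Without δ every matching tuple of R appears once per duplicate in S, which would change the answer of the original IN query.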

Two-argument selection with condition involving IN

[Figure: the two-argument σ with children R and <Condition> (t IN S) is rewritten as σC applied to the product R × δ(S).]

The effect

Improving the Logical Query Plan

Algebraic laws used to improve logical query plans:
  Selections can be pushed down the expression tree as far as they can go.
  Similarly, projections can be pushed down the tree, or new projections can be added.
  Duplicate eliminations can sometimes be removed, or moved to a more convenient position in the tree.
  Certain selections can be combined with a product below them to turn the pair of operations into an equijoin.

Grouping Associative/Commutative Operators

An operator that is both associative and commutative may be thought of as having any number of operands.
This lets us reorder the operands so that, for example, a multiway join is executed as a sequence of binary joins.
Executing the operations in the order suggested by the parse tree may be more time-consuming than another order.
For each portion of a subtree that consists of nodes with the same associative and commutative operator (natural join, union, and intersection), we group those nodes into a single node with many children.

The effect of query rewriting

π_movieTitle
  ⋈_{starName = name}
    StarsIn
    σ_{birthdate LIKE '%1960'}
      MovieStar

Final step in producing logical query plan

[Figure: an expression tree of nested binary operators over the relations R, S, T, U, V, W (left) is regrouped (right) so that each run of the same associative and commutative operator becomes a single node with many children.]

An Example to summarize

"Find movies where the average age of the stars was at most 40 when the movie was made."

SELECT DISTINCT m1.movieTitle, m1.movieYear
FROM StarsIn m1
WHERE m1.movieYear - 40 <= (
    SELECT AVG(birthdate)
    FROM StarsIn m2, MovieStar s
    WHERE m2.starName = s.name AND
          m1.movieTitle = m2.movieTitle AND
          m1.movieYear = m2.movieYear
);

16.4 Estimating the Cost of Operations

Estimating the Cost of Operations

After arriving at the logical query plan, we turn it into a physical plan.
We consider the possible physical plans and estimate their costs; this evaluation is known as cost-based enumeration.
The plan with the least estimated cost is passed to the query-execution engine.

Selection for each physical plan

For each physical plan we select:
  An order and grouping for associative-and-commutative operations like joins, unions, and intersections.
  An algorithm for each operator in the logical plan, for instance, deciding whether a nested-loop join or a hash-join should be used.
  Additional operators (scanning, sorting, etc.) that are needed for the physical plan but were not explicitly present in the logical plan.
  The way in which arguments are passed from one operator to the next.


Estimating Sizes of Intermediate Relations

Ideally, the rules for estimating the size of an intermediate relation:

1. Give accurate estimates.

2. Are easy to compute.

3. Are logically consistent; that is, the size estimate for an intermediate relation should not depend on how that relation is computed.


Estimating the Size of a Projection

We should treat a classical, duplicate-eliminating projection as a bag-projection.

The size of the result can be computed exactly.

There may be a reduction in size (due to eliminated components) or an increase in size (due to new components created as combinations of attributes).


Estimating the Size of a Selection

While performing a selection, we may reduce the number of tuples, but the size of each tuple remains the same.

For a selection of the form

S = σ_{A=c}(R)

where A is an attribute of R and c is a constant, the recommended estimate is

T(S) = T(R) / V(R, A)
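The estimate above can be sketched in a few lines of Python. The function name `estimate_selection_size` and the sample statistics are illustrative assumptions, not part of the slides; the formula itself is T(S) = T(R) / V(R, A).

```python
def estimate_selection_size(t_r: int, v_r_a: int) -> float:
    """Estimate T(S) for S = sigma_{A=c}(R) as T(R) / V(R, A)."""
    if v_r_a <= 0:
        raise ValueError("V(R, A) must be positive")
    return t_r / v_r_a


# Hypothetical statistics: 5000 tuples in R, 100 distinct values of A.
estimated = estimate_selection_size(5000, 100)
print(estimated)  # 50.0
```

The estimate assumes that the values of A are spread evenly over the tuples of R, so each value accounts for the same share of the relation.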


Estimating Sizes of Other Operations

- Union
- Intersection
- Difference
- Duplicate Elimination
- Grouping and Aggregation


Choosing an Order for Joins

Chapter 16.6, by: Chiu Luk, ID: 210


Introduction

This section focuses on a critical problem in cost-based optimization:

- Selecting an order for the natural join of three or more relations

Compared to other binary operations, joins take more time and therefore need effective optimization techniques.



Significance of Left and Right Join Arguments

The argument relations in a join determine the cost of the join.

The left argument of the join is:

- Called the build relation
- Assumed to be smaller
- Stored in main memory


Significance of Left and Right Join Arguments

The right argument of the join is:

- Called the probe relation
- Read a block at a time
- Its tuples are matched with those of the build relation

The join algorithms that distinguish between the arguments are:

- One-pass join
- Nested-loop join
- Index join
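The build/probe distinction can be illustrated with a minimal one-pass join sketch. The names `one_pass_join`, `R`, and `S_blocks` are hypothetical, relations are modelled as lists of tuples, and a "block" is simply a small list of tuples; real systems work on disk blocks and buffers.

```python
from collections import defaultdict


def one_pass_join(build, probe_blocks, key_build, key_probe):
    """Join a small in-memory build relation with a probe relation read block by block."""
    # Build phase: load the (smaller) left argument into an in-memory hash table.
    table = defaultdict(list)
    for row in build:
        table[row[key_build]].append(row)

    # Probe phase: read the right argument one block at a time and look up matches.
    result = []
    for block in probe_blocks:
        for row in block:
            for match in table.get(row[key_probe], []):
                result.append(match + row)
    return result


R = [(1, "a"), (2, "b")]                              # build relation
S_blocks = [[(1, "x"), (3, "y")], [(2, "z"), (1, "w")]]  # probe relation in two blocks
joined = one_pass_join(R, S_blocks, 0, 0)
print(joined)
```

Only the build relation has to fit in memory, which is why the smaller relation is placed on the left.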


Join Trees

Order of arguments is important for joining two relations

Left argument, since stored in main-memory, should be smaller

With two relations only two choices of join tree

With more than two relations, there are n! ways to order the arguments and therefore n! join trees for each tree shape, where n is the number of relations



Join Trees

The total number of tree shapes T(n) for n relations is given by the recurrence:

T(1) = 1
T(n) = Σ_{i=1}^{n-1} T(i) · T(n-i)

This gives T(1) = 1, T(2) = 1, T(3) = 2, T(4) = 5, etc.
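As a quick check of these values, here is a small sketch that assumes the standard recurrence T(1) = 1, T(n) = Σ T(i)·T(n-i) for i from 1 to n-1, which reproduces the numbers listed on the slide. The function names `tree_shapes` and `join_trees` are illustrative.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def tree_shapes(n: int) -> int:
    """Number of binary tree shapes with n leaves: T(1) = 1, T(n) = sum of T(i) * T(n - i)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return 1
    return sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, n))


def join_trees(n: int) -> int:
    """Tree shapes multiplied by the n! ways of assigning relations to the leaves."""
    return tree_shapes(n) * factorial(n)


print([tree_shapes(n) for n in range(1, 7)])  # [1, 1, 2, 5, 14, 42]
print(join_trees(4))                          # 120
```

The rapid growth of these counts is the reason optimizers restrict the search space rather than enumerate every tree.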


Left-Deep Join Trees

Consider 4 relations. Different ways to join them are as follows


In fig (a) all the right children are leaves. This is a left-deep tree

In fig (c) all the left children are leaves. This is a right-deep tree

Fig (b) is a bushy tree

Considering left-deep trees is advantageous for deciding join orders


Join order selection

- Join of A1 ⋈ A2 ⋈ A3 ⋈ ... ⋈ An
- Left-deep join trees
- Dynamic programming: the best plan is computed for each subset of relations

Best plan(A1, ..., An) = the minimum-cost plan among:

    Best plan(A2, ..., An) ⋈ A1
    Best plan(A1, A3, ..., An) ⋈ A2
    ...
    Best plan(A1, ..., An-1) ⋈ An
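The recurrence for the best left-deep plan can be sketched as follows. Everything about the cost model here is an assumption made for illustration: `estimate_size` multiplies relation sizes by hypothetical pairwise selectivities, and the cost of a plan is taken to be the sum of the sizes of its intermediate results. The function names and sample statistics are not from the slides.

```python
from itertools import combinations


def estimate_size(subset, sizes, selectivity):
    """Hypothetical size estimate: product of sizes times pairwise selectivities."""
    size = 1.0
    for name in subset:
        size *= sizes[name]
    for a, b in combinations(sorted(subset), 2):
        size *= selectivity.get(frozenset((a, b)), 1.0)
    return size


def best_left_deep_order(sizes, selectivity):
    """Return (cost, order) of the cheapest left-deep join order under the assumed cost model."""
    names = sorted(sizes)
    # best[subset] = (cost, order); a single relation costs nothing.
    best = {frozenset([n]): (0.0, (n,)) for n in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            candidates = []
            for last in combo:  # 'last' is the relation joined on the right
                rest = subset - {last}
                rest_cost, rest_order = best[rest]
                # The result of 'rest' is an intermediate relation only if it is itself a join.
                intermediate = estimate_size(rest, sizes, selectivity) if len(rest) > 1 else 0.0
                candidates.append((rest_cost + intermediate, rest_order + (last,)))
            best[subset] = min(candidates)
    return best[frozenset(names)]


sizes = {"R": 1000, "S": 100, "T": 10}
selectivity = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
cost, order = best_left_deep_order(sizes, selectivity)
print(cost, order)
```

Only the cheapest plan for each subset is kept in the table, which is what lets the search avoid re-examining every ordering.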


Dynamic Programming to Select a Join Order and Grouping

Three choices to pick an order for the join of many relations are:

- Consider all of the possible orders
- Consider a subset
- Use a heuristic to pick one

Dynamic programming is used to consider either all orders or a subset:

- Construct a table of costs based on relation size
- Remember only the minimum entry, which will be required to proceed


A Greedy Algorithm for Selecting a Join Order

- It is expensive to use an exhaustive method like dynamic programming
- A cheaper approach is to use a join-order heuristic for the query optimization
- The greedy algorithm is an example of such a heuristic
- Make one decision at a time about the order of joins and never backtrack on a decision once it is made
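A greedy join-order heuristic might look like the sketch below. It assumes a hypothetical size estimator (`estimate_size`, the product of relation sizes and pairwise selectivities): the heuristic starts with the pair whose estimated join is smallest, then repeatedly adds the relation that keeps the estimated result smallest, never revisiting earlier choices. The names and statistics are illustrative only.

```python
from itertools import combinations


def estimate_size(subset, sizes, selectivity):
    """Hypothetical size estimate: product of sizes times pairwise selectivities."""
    size = 1.0
    for name in subset:
        size *= sizes[name]
    for a, b in combinations(sorted(subset), 2):
        size *= selectivity.get(frozenset((a, b)), 1.0)
    return size


def greedy_join_order(sizes, selectivity):
    """Pick a left-deep join order one relation at a time, without backtracking."""
    names = sorted(sizes)
    if len(names) < 2:
        return names
    # Start with the pair whose join is estimated to be smallest.
    first_pair = min(combinations(names, 2),
                     key=lambda pair: estimate_size(pair, sizes, selectivity))
    order = list(first_pair)
    remaining = [n for n in names if n not in order]
    while remaining:
        # Add the relation that yields the smallest estimated intermediate result.
        nxt = min(remaining, key=lambda n: estimate_size(order + [n], sizes, selectivity))
        order.append(nxt)
        remaining.remove(nxt)
    return order


sizes = {"R": 1000, "S": 100, "T": 10}
selectivity = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
print(greedy_join_order(sizes, selectivity))  # ['S', 'T', 'R']
```

Because each step looks only at the next join, the result is not guaranteed to be the cheapest order overall.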


Completing the Physical-Query-Plan and Chapter 16 Summary (16.7-16.8)

CS257 Spring 2009, Professor Tsau Lin

Students: Suntorn Sae-Eung, Donavon Norwood


Outline

16.7 Completing the Physical-Query-Plan

I. Choosing a Selection Method
II. Choosing a Join Method
III. Pipelining Versus Materialization
IV. Pipelining Unary Operations
V. Pipelining Binary Operations
VI. Notation for Physical Query Plan
VII. Ordering the Physical Operations

16.8 Summary of Chapter 16


Before Completing the Physical-Query-Plan

Previously, a query has been:

- Parsed and preprocessed (16.1)
- Converted to logical query plans (16.3)
- Given estimated costs for its operations (16.4)
- Given plan costs through cost-based plan selection (16.5)
- Assessed for the cost of its join operations by choosing an order for joins (16.6)


16.7 Completing the Physical-Query-Plan

Three topics related to turning a logical plan into a complete physical plan:

1. Choosing physical implementations, such as selection and join methods

2. Decisions regarding intermediate results (materialized or pipelined)

3. Notation for physical-query-plan operators


I. Choosing a Selection Method (A)

Algorithms for each selection operator:

1. Can we use an existing index on an attribute? If yes, use an index-scan; otherwise, use a table-scan.
2. After retrieving all tuples that satisfy the condition in (1), filter them with the remaining selection conditions.


Choosing a Selection Method(A) (cont.)

Recall: cost of a query = number of disk I/O’s

How costs for various plans are estimated for a σC(R) operation:

1. Cost of the table-scan algorithm
   a) B(R) if R is clustered
   b) T(R) if R is not clustered

2. Cost of a plan picking an equality term (e.g. a = 10) with an index-scan
   a) B(R) / V(R, a) with a clustering index
   b) T(R) / V(R, a) with a nonclustering index

3. Cost of a plan picking an inequality term (e.g. b < 20) with an index-scan
   a) B(R) / 3 with a clustering index
   b) T(R) / 3 with a nonclustering index


Example

Selection: σ_{x=1 AND y=2 AND z<5}(R)

- The parameters of R(x, y, z) are: T(R) = 5000, B(R) = 200, V(R, x) = 100, and V(R, y) = 500
- Relation R is clustered
- x and y have nonclustering indexes; only the index on z is clustering


Example (cont.)

Selection options:

1. Table-scan, then filter on x, y, z. Cost is B(R) = 200, since R is clustered.

2. Use the index on x = 1, then filter on y, z. Cost is 50, since T(R) / V(R, x) = 5000/100 = 50 tuples and the index is not clustering.

3. Use the index on y = 2, then filter on x, z. Cost is 10, since T(R) / V(R, y) = 5000/500 = 10 tuples, using a nonclustering index.

4. Index-scan on the clustering index with z < 5, then filter on x, y. Cost is about B(R)/3 = 67.


Example (cont.)

Costs: option 1 = 200, option 2 = 50, option 3 = 10, option 4 = 67

The lowest cost is option 3. Therefore, the preferred physical plan

1. retrieves all tuples with y = 2,
2. then filters on the remaining two conditions (x, z).
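The comparison of the four options can be reproduced with a short sketch. The helper names (`table_scan_cost`, `equality_index_cost`, `inequality_index_cost`) are illustrative; the formulas and statistics are the ones given in this example, with fractional costs rounded up to whole I/O’s.

```python
import math

# Statistics from the example: T(R), B(R), and V(R, attribute).
T_R, B_R = 5000, 200
V = {"x": 100, "y": 500}


def table_scan_cost(blocks, tuples, clustered):
    """B(R) if R is clustered, otherwise T(R)."""
    return blocks if clustered else tuples


def equality_index_cost(blocks, tuples, distinct, clustering):
    """B(R)/V(R,a) with a clustering index, T(R)/V(R,a) otherwise."""
    return math.ceil((blocks if clustering else tuples) / distinct)


def inequality_index_cost(blocks, tuples, clustering):
    """B(R)/3 with a clustering index, T(R)/3 otherwise."""
    return math.ceil((blocks if clustering else tuples) / 3)


costs = {
    "table-scan": table_scan_cost(B_R, T_R, clustered=True),
    "index on x": equality_index_cost(B_R, T_R, V["x"], clustering=False),
    "index on y": equality_index_cost(B_R, T_R, V["y"], clustering=False),
    "index on z": inequality_index_cost(B_R, T_R, clustering=True),
}
best = min(costs, key=costs.get)
print(costs)
print(best)  # index on y
```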


II. Choosing a Join Method

Determine the costs associated with each join algorithm:

1. One-pass join or nested-loop join, provided enough buffers can be devoted to the join.

2. Sort-join is preferred when the attributes are pre-sorted, or when two or more joins are on the same attribute, such as

(R(a, b) ⋈ S(a, c)) ⋈ T(a, d)

where sorting R and S on a causes the result of R ⋈ S to be sorted on a, so it can be used directly in the next join.


Choosing a Join Method (cont.)

3. Index-join for a join with a high chance of using an index created on the join attribute, such as R(a, b) ⋈ S(b, c).

4. Hash-join is the best choice for unsorted or non-indexed relations that need a multipass join.


III. Pipelining Versus Materialization

Materialization (naïve way)

- Store the (intermediate) result of each operation on disk

Pipelining (more efficient way)

- Interleave the execution of several operations; the tuples produced by one operation are passed directly to the operation that uses them
- Store the (intermediate) result of each operation in a buffer, which is implemented in main memory


IV. Pipelining Unary Operations

Unary operations are either tuple-at-a-time or full-relation. Selection and projection are the best candidates for pipelining.

[Figure: R → input buffer → unary operation → output buffer → input buffer → unary operation → output buffer, using M-1 buffers]


Pipelining Unary Operations (cont.)

Pipelined unary operations are implemented by iterators
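One way to picture the iterator approach is with Python generators, where each operator pulls tuples from its input one at a time and nothing is written to disk. This is only a sketch: `table_scan`, `select`, and `project` are illustrative names, and relations are plain lists of tuples.

```python
def table_scan(relation):
    """Leaf operator: yield the tuples of a stored relation one at a time."""
    for row in relation:
        yield row


def select(rows, predicate):
    """Pipelined selection: pass along only the tuples that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Pipelined (bag) projection: keep only the listed column positions."""
    for row in rows:
        yield tuple(row[c] for c in columns)


R = [(1, 2, 3), (1, 5, 9), (4, 2, 1)]
# No intermediate relation is materialized; each tuple flows through both operators.
pipeline = project(select(table_scan(R), lambda r: r[0] == 1), [1, 2])
result = list(pipeline)
print(result)  # [(2, 3), (5, 9)]
```

A materialized plan would instead build the full output of the selection as a list (or a disk file) before the projection started.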


V. Pipelining Binary Operations

- Binary operations: ∪, ∩, −, ⋈, ×
- The results of binary operations can also be pipelined.
- Use one buffer to pass the result to its consumer, one block at a time.
- The extended example shows tradeoffs and opportunities.


Example

Consider a physical query plan for the expression

(R(w, x) ⋈ S(x, y)) ⋈ U(y, z)

Assumptions:

- R occupies 5,000 blocks; S and U each occupy 10,000 blocks.
- The intermediate result R ⋈ S occupies k blocks for some k.
- Both joins will be implemented as hash-joins, either one-pass or two-pass, depending on k.
- There are 101 buffers available.


Example (cont.)

First consider the join R ⋈ S; neither relation fits in the buffers.

- It needs a two-pass hash-join that partitions R into 100 buckets (the maximum possible); each bucket has 50 blocks.
- The second pass of the hash-join uses 51 buffers, leaving the remaining 50 buffers for joining the result of R ⋈ S with U.


Example (cont.)

Case 1: suppose k ≤ 49, so the result of R ⋈ S occupies at most 49 blocks.

Steps:

1. Pipeline R ⋈ S into 49 buffers
2. Organize them for lookup as a hash table
3. Use the one remaining buffer to read each block of U in turn
4. Execute the second join as a one-pass join.


Example (cont.)

The total number of I/O’s is 55,000:

- 45,000 for the two-pass hash-join of R and S
- 10,000 to read U for the one-pass hash-join of (R ⋈ S) ⋈ U.


Example (cont.)

Case 2: suppose k > 49 but k ≤ 5,000. We can still pipeline, but we need another strategy, in which the intermediate result is joined with U in a 50-bucket, two-pass hash-join. Steps are:

1. Before starting on R ⋈ S, hash U into 50 buckets of 200 blocks each.

2. Perform the two-pass hash-join of R and S using 51 buffers as in case 1, placing the results in the 50 remaining buffers to form 50 buckets for the join of R ⋈ S with U.

3. Finally, join R ⋈ S with U bucket by bucket.


Example (cont.)

The number of disk I/O’s is:

- 20,000 to read U and write its tuples into buckets
- 45,000 for the two-pass hash-join R ⋈ S
- k to write out the buckets of R ⋈ S
- k + 10,000 to read the buckets of R ⋈ S and U in the final join

The total cost is 75,000 + 2k.


Example (cont.)

Compare the increase in I/O’s between case 1 and case 2:

- k ≤ 49 (case 1): the number of disk I/O’s is 55,000
- 49 < k ≤ 5,000 (case 2):
  k = 50: I/O’s is 75,000 + (2*50) = 75,100
  k = 51: I/O’s is 75,000 + (2*51) = 75,102
  k = 52: I/O’s is 75,000 + (2*52) = 75,104

Notice: the number of I/O’s jumps discretely as k increases from 49 to 50.


Example (cont.)

Case 3: k > 5,000. We cannot perform a two-pass join in the 50 buffers available if the result of R ⋈ S is pipelined. Steps are:

1. Compute R ⋈ S using a two-pass join and store the result on disk.

2. Join the result of (1) with U, using a two-pass join.


Example (cont.)

The number of disk I/O’s is:

- 45,000 for the two-pass hash-join of R and S
- k to store R ⋈ S on disk
- 30,000 + 3k for the two-pass join of U with R ⋈ S

The total cost is 75,000 + 4k.


Example (cont.)

In summary, the cost of the physical plan is a function of the size of R ⋈ S.
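The three cases worked out above can be collected into one piecewise function. This sketch uses the totals from the example (55,000; 75,000 + 2k; 75,000 + 4k) and assumes the boundaries k ≤ 49, 49 < k ≤ 5,000, and k > 5,000; the name `plan_cost` is illustrative.

```python
def plan_cost(k: int) -> int:
    """Disk I/O's for (R join S) join U as a function of k, the size of R join S in blocks."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k <= 49:
        # Case 1: pipeline R join S into memory, one-pass join with U.
        return 55_000
    if k <= 5_000:
        # Case 2: pipeline into 50 buckets, two-pass hash-join with U.
        return 75_000 + 2 * k
    # Case 3: materialize R join S on disk, then two-pass join with U.
    return 75_000 + 4 * k


for k in (49, 50, 5_000, 5_001):
    print(k, plan_cost(k))
```

The jumps at k = 50 and k = 5,001 show how a small change in the estimated size of an intermediate result can change which physical plan is chosen.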


VI. Notation for Physical Query Plans

Several types of operators:

1. Operators for leaves
2. (Physical) operators for selection
3. (Physical) sort operators
4. Other relational-algebra operations

In practice, each DBMS uses its own internal notation for physical query plans.


Notation for Physical Query Plans (cont.)

1. Operators for leaves

A leaf operand R in the LQP tree is replaced by a scan operator:

- TableScan(R): read all blocks
- SortScan(R, L): read in order according to L
- IndexScan(R, C): scan using an index on attribute A, with a condition C of the form Aθc
- IndexScan(R, A): scan the entire relation using an index on attribute R.A. This behaves like TableScan but is more efficient if R is not clustered.


Notation for Physical Query Plans (cont.)

2. (Physical) operators for selection

The logical operator σC(R) is often combined with access methods.

- If σC(R) is replaced by Filter(C), and there is no index on R or on an attribute in condition C, use TableScan or SortScan(R, L) to access R.
- If condition C has the form Aθc AND D for some condition D, and there is an index on R.A, then we may use the operator IndexScan(R, Aθc) to access R and use Filter(D) in place of the selection σC(R).


Notation for Physical Query Plans (cont.)

3. (Physical) sort operators

- Sorting can occur at any point in the physical plan; for a stored relation we use the notation SortScan(R, L).
- It is common to use an explicit operator Sort(L) to sort a relation that is not stored.
- Sort can be applied at the top of the physical-query-plan tree if the result needs to be sorted because of an ORDER BY clause (τ).


Notation for Physical Query Plans (cont.)

4. Other relational-algebra operations

Descriptive text and symbols are used to indicate:

- The operation performed, e.g. join or grouping.
- Necessary parameters, e.g. the condition of a theta-join or the list of elements in a grouping.
- A general strategy for the algorithm, e.g. sort-based, hash-based, or index-based.
- A decision about the number of passes to be used, e.g. one-pass, two-pass, or multipass.
- An anticipated number of buffers the operation will require.


Notation for Physical Query Plans (cont.)

Example of a physical-query-plan

A physical-query-plan in example 16.36 for the case k > 5000:

- TableScan
- Two-pass hash join
- Materialize (double line)
- Store operator


Notation for Physical Query Plans (cont.)

Another example

A physical-query-plan in example 16.36 for the case k ≤ 49:

- TableScan (2)
- Two-pass hash join
- Pipelining
- Different buffer needs
- Store operator


Notation for Physical Query Plans (cont.)

A physical-query-plan in example 16.35:

- Use the index on condition y = 2 first
- Filter with the remaining conditions later on


VII. Ordering of Physical Operations

- The PQP is represented as a tree structure, which implies an order of operations.
- Still, the order of evaluation of interior nodes may not always be clear:
  - Iterators are used in a pipelined manner
  - The overlapping execution of various nodes makes a strict “ordering” meaningless.


Ordering of Physical Operations (cont.)

Three rules summarize the ordering of events in a PQP tree:

1. Break the tree into subtrees at each edge that represents materialization. Execute one subtree at a time.

2. Order the execution of the subtrees:
   - Bottom-up
   - Left-to-right

3. All nodes of each subtree are executed simultaneously.


Summary of Chapter 16

In this part of the presentation I will talk about the main topics of Chapter 16.


COMPILATION OF QUERIES

Compilation means turning a query into a physical query plan, which can be implemented by the query engine.

Steps of query compilation:

- Parsing
- Semantic checking
- Selection of the preferred logical query plan
- Generating the best physical plan


THE PARSER

- The first step of SQL query processing.
- Generates a parse tree.
- Nodes in the parse tree correspond to the SQL constructs.
- Similar to the compiler of a programming language.


VIEW EXPANSION

- A very critical part of query compilation.
- Expands the view references in the query tree to the actual view definition.
- Provides opportunities for query optimization.


SEMANTIC CHECKING

- Checks the semantics of a SQL query.
- Examines a parse tree.
- Checks:
  - Attributes
  - Relation names
  - Types
- Resolves attribute references.


CONVERSION TO A LOGICAL QUERY PLAN

Converts a semantically checked parse tree to an algebraic expression.

Conversion is straightforward, but subqueries need to be optimized.

The two-argument selection approach can be used.


ALGEBRAIC TRANSFORMATION

There are many different ways to transform a logical query plan into an actual plan using algebraic transformations.

The laws used for this transformation:

- Commutative and associative laws
- Laws involving selection
- Pushing selection
- Laws involving projection
- Laws about joins and products
- Laws involving duplicate elimination
- Laws involving grouping and aggregation


ESTIMATING SIZES OF RELATIONS

True running time is taken into consideration when selecting the best logical plan.

The two factors that most affect the estimated sizes of relations are:

- The size of each relation (number of tuples)
- The number of distinct values for each attribute of each relation

Histograms are used by some systems.


COST BASED OPTIMIZING

The best physical query plan is the least costly plan.

Factors that decide the cost of a query plan:

- The order and grouping of operations like joins, unions, and intersections.
- The join algorithm used, such as nested-loop join or hash-join.
- Scanning and sorting operations.
- Storing intermediate results.


PLAN ENUMERATION STRATEGIES

Common approaches for searching the space of physical plans for the best one:

- Dynamic programming: tabulate the best plan for each subexpression
- Selinger-style dynamic programming: record the sort order of the results as part of the table
- Greedy approaches: make a series of locally optimal decisions
- Branch-and-bound: start from a plan found by a heuristic and consider only plans that cost less than the best plan found so far


LEFT-DEEP JOIN TREES

Left-deep join trees are binary trees with a single spine down the left edge and with leaves as right children.

This strategy reduces the number of plans to be considered for the best physical plan.

Restrict the search to left-deep join trees when picking a grouping and order for the join of several relations.


PHYSICAL PLANS FOR SELECTION

A selection can be broken into an index-scan of the relation, followed by a filter operation.

The filter then examines the tuples retrieved by the index-scan.

It allows only those tuples to pass that meet the remaining portions of the selection condition.


PIPELINING VERSUS MATERIALIZING

- The flow of data between operators can be controlled to implement “pipelining”.
- Alternatively, intermediate results can be removed from main memory to save space for other operators. This technique is known as “materialization”.
- Both pipelining and materialization should be considered by the physical query plan generator.
- An operator always consumes the result of another operator; with pipelining, that result is passed through main memory.


Reference

[1] H. Garcia-Molina, J. Ullman, and J. Widom, “Database Systems: The Complete Book,” 2nd edition, pp. 897-913, Prentice Hall, New Jersey, 2008


Concurrency Control

Chiu Luk

CS257 Database Systems Principles, Spring 2009


Concurrency Control

Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without the concurrency violating the data integrity of a database.

Executed transactions should follow the ACID rules. The DBMS must guarantee that only serializable (unless serializability is intentionally relaxed) and recoverable schedules are generated.

It also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related database.


Issues with Concurrency: Example

Bank database: 3 accounts

Account balances: A = 500, B = 500, C = 500

Property: A + B + C = 1500

Money does not leave the system


Issues with Concurrency: Example

Transaction T1: Transfer 100 from A to B

Before: A = 500, B = 500, C = 500

Read (A, t)
t = t - 100
Write (A, t)
Read (B, t)
t = t + 100
Write (B, t)

After: A = 400, B = 600, C = 500


Issues with Concurrency: Example

Transaction T2: Transfer 100 from A to C

Read (A, s)
s = s - 100
Write (A, s)
Read (C, s)
s = s + 100
Write (C, s)


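Interleaved schedule of T1 and T2 in which both transactions read A before either one writes it:

Transaction T1: Read (A, t); t = t - 100; Write (A, t); Read (B, t); t = t + 100; Write (B, t)
Transaction T2: Read (A, s); s = s - 100; Write (A, s); Read (C, s); s = s + 100; Write (C, s)

Balances as the schedule proceeds:

A    B    C
500  500  500
400  500  500
400  500  500
400  500  600
400  600  600

400 + 600 + 600 = 1600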
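The anomaly can be reproduced with a small simulation. The exact interleaving on the slide is not fully recoverable, so the sketch below uses one interleaving that is consistent with the balances shown: both transactions read A before either writes it. The function names `interleaved` and `serial` are illustrative.

```python
def interleaved(db):
    """One interleaving of T1 and T2 in which both read A before either writes it."""
    db = dict(db)
    t = db["A"]        # T1: Read(A, t)
    s = db["A"]        # T2: Read(A, s), still sees the old value of A
    t -= 100           # T1: t = t - 100
    s -= 100           # T2: s = s - 100
    db["A"] = t        # T1: Write(A, t)
    db["A"] = s        # T2: Write(A, s), overwrites T1's update with the same value
    s = db["C"]        # T2: Read(C, s)
    s += 100
    db["C"] = s        # T2: Write(C, s)
    t = db["B"]        # T1: Read(B, t)
    t += 100
    db["B"] = t        # T1: Write(B, t)
    return db


def serial(db):
    """T1 followed by T2, with no interleaving."""
    db = dict(db)
    t = db["A"]; t -= 100; db["A"] = t
    t = db["B"]; t += 100; db["B"] = t
    s = db["A"]; s -= 100; db["A"] = s
    s = db["C"]; s += 100; db["C"] = s
    return db


initial = {"A": 500, "B": 500, "C": 500}
print(interleaved(initial), sum(interleaved(initial).values()))  # total is 1600
print(serial(initial), sum(serial(initial).values()))            # total is 1500
```

In the interleaved run, one of the two withdrawals from A is lost, so 100 appears from nowhere and the property A + B + C = 1500 is violated.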


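Interleaved schedule of T1 and T2 in which T2 reads A only after T1 has written it:

Transaction T1: Read (A, t); t = t - 100; Write (A, t); Read (B, t); t = t + 100; Write (B, t)
Transaction T2: Read (A, s); s = s - 100; Write (A, s); Read (C, s); s = s + 100; Write (C, s)

Balances as the schedule proceeds:

A    B    C
500  500  500
400  500  500
300  500  500
300  500  600
300  600  600

300 + 600 + 600 = 1500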


Scheduler


Serial and Serializable Schedules

In the field of databases, a schedule is a list of actions (i.e. reading, writing, aborting, committing) from a set of transactions. In this example, Schedule D consists of 3 transactions T1, T2, T3. The schedule describes the actions of the transactions as seen by the DBMS. T1 reads and writes to object X, then T2 reads and writes to object Y, and finally T3 reads and writes to object Z. This is an example of a serial schedule, because the actions of the 3 transactions are not interleaved.


Serial and Serializable Schedules

A schedule that is equivalent to a serial schedule has the serializability property. In schedule E, the order in which the actions of the transactions are executed is not the same as in D, but in the end, E gives the same result as D.


Conflict actions

Two or more actions are said to be in conflict if:

- The actions belong to different transactions.
- At least one of the actions is a write operation.
- The actions access the same object (read or write).

The following set of actions is conflicting:

T1:R(X), T2:W(X), T3:W(X)

While the following sets of actions are not:

T1:R(X), T2:R(X), T3:R(X)
T1:R(X), T2:W(Y), T3:R(X)
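The three conditions translate directly into a small check. This is a sketch under simple assumptions: an action is modelled as a hypothetical `Action` tuple of (transaction, operation, object), with operations written as "R" or "W".

```python
from typing import NamedTuple


class Action(NamedTuple):
    txn: str   # transaction name, e.g. "T1"
    op: str    # "R" for read, "W" for write
    obj: str   # object accessed, e.g. "X"


def in_conflict(a: Action, b: Action) -> bool:
    """Different transactions, same object, and at least one write."""
    return a.txn != b.txn and a.obj == b.obj and "W" in (a.op, b.op)


def has_conflict(actions) -> bool:
    """True if any pair of actions in the set is in conflict."""
    return any(in_conflict(a, b)
               for i, a in enumerate(actions)
               for b in actions[i + 1:])


conflicting = [Action("T1", "R", "X"), Action("T2", "W", "X"), Action("T3", "W", "X")]
all_reads = [Action("T1", "R", "X"), Action("T2", "R", "X"), Action("T3", "R", "X")]
other_object = [Action("T1", "R", "X"), Action("T2", "W", "Y"), Action("T3", "R", "X")]

print(has_conflict(conflicting))   # True
print(has_conflict(all_reads))     # False
print(has_conflict(other_object))  # False
```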


Conflict Serializable

A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to one or more serial schedules.

Another definition for conflict-serializability is that a schedule is conflict-serializable if and only if its precedence graph (serializability graph) is acyclic.

Which is conflict-equivalent to the serial schedule <T1,T2>, but not <T2,T1>.
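The precedence-graph test can be sketched as follows. A schedule is modelled as a list of (transaction, operation, object) tuples, which is an assumption of this sketch; an edge Ti → Tj is added whenever an action of Ti precedes a conflicting action of Tj, and the schedule is conflict-serializable when the graph has no cycle. The sample schedules are illustrative.

```python
def precedence_edges(schedule):
    """Edges (Ti, Tj) where an action of Ti precedes a conflicting action of Tj."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph is acyclic (checked by peeling off source nodes)."""
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    while nodes:
        # A source has no incoming edge from a node that is still in the graph.
        sources = {n for n in nodes
                   if not any(v == n and u in nodes for u, v in edges)}
        if not sources:
            return False  # every remaining node lies on or behind a cycle
        nodes -= sources
    return True


s1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T1", "R", "B"),
      ("T2", "W", "A"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
s2 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]

print(precedence_edges(s1), is_conflict_serializable(s1))  # {('T1', 'T2')} True
print(is_conflict_serializable(s2))                        # False
```

For `s1` the only edge is T1 → T2, so the schedule is conflict-equivalent to the serial order T1, T2 and to no other.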


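Serial Schedule

Transaction T1: Read (A, t); t = t - 100; Write (A, t); Read (B, t); t = t + 100; Write (B, t)
Transaction T2: Read (A, s); s = s - 100; Write (A, s); Read (C, s); s = s + 100; Write (C, s)

Balances (initially, after the first transaction, after the second transaction):

A    B    C
500  500  500
400  500  600
300  600  600

300 + 600 + 600 = 1500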


Serial Schedule (T2, T1)

T2: Read (A, s); s = s - 100; Write (A, s); Read (C, s); s = s + 100; Write (C, s)
T1: Read (A, t); t = t - 100; Write (A, t); Read (B, t); t = t + 100; Write (B, t)

            A     B     C
Initial     500   500   500
After T2    400   500   600
After T1    300   600   600

300 + 600 + 600 = 1500
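
The two serial orders can be replayed with a short Python sketch. The dictionary-based database and the function names are assumptions for illustration; the transaction bodies follow the Read/Write steps shown on the slides, with all three elements starting at 500.

```python
def t1(db):
    # T1 moves 100 from A to B.
    t = db["A"]
    t = t - 100
    db["A"] = t
    t = db["B"]
    t = t + 100
    db["B"] = t


def t2(db):
    # T2 moves 100 from A to C.
    s = db["A"]
    s = s - 100
    db["A"] = s
    s = db["C"]
    s = s + 100
    db["C"] = s


def run_serial(transactions):
    """Run the transactions one after another on a fresh database."""
    db = {"A": 500, "B": 500, "C": 500}
    for txn in transactions:
        txn(db)
    return db
```

Both `run_serial([t1, t2])` and `run_serial([t2, t1])` end in the same state, and the total A + B + C stays at 1500.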


Serial Schedule

S0 → T1 → S1 → T2 → S2 → … → Tn → Sn

Consistent States


Conflict Serializability

Two actions Ai and Aj executed on the same data object by Ti and Tj conflict if either one of them is a write operation.

Let Ai and Aj be consecutive non-conflicting actions that belong to different transactions. We can swap Ai and Aj without changing the result.

Two schedules are conflict equivalent if they can be turned one into the other by a sequence of non-conflicting swaps of adjacent actions.
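
The swapping idea can be sketched as a bubble-sort-style procedure in Python. This is an illustrative assumption, not the slides' own algorithm: actions are `(transaction, operation, object)` tuples, `order` is the target serial order, and the function returns `None` as soon as reaching that order would require swapping a conflicting pair.

```python
def conflicts(a, b):
    """Actions conflict if they come from different transactions,
    access the same object, and at least one is a write."""
    return a[0] != b[0] and a[2] == b[2] and "W" in (a[1], b[1])


def swap_to_serial(schedule, order):
    """Try to turn `schedule` into the serial schedule given by `order`
    using only swaps of adjacent non-conflicting actions."""
    rank = {txn: i for i, txn in enumerate(order)}
    s = list(schedule)
    changed = True
    while changed:
        changed = False
        for i in range(len(s) - 1):
            a, b = s[i], s[i + 1]
            if rank[a[0]] > rank[b[0]]:   # pair is out of serial order
                if conflicts(a, b):
                    return None           # this swap is not allowed
                s[i], s[i + 1] = b, a
                changed = True
    return s
```

Every out-of-order pair must eventually be swapped while adjacent, so a single conflicting out-of-order pair is enough to rule out the requested serial order.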


Conflict Serializability

T1 T2

R(A)

W(A)

R(A)

R(B)

W(A)

W(B)

R(B)

W(B)


Conflict Serializability

T1 T2

R(A)

W(A)

R(B)

R(A)

W(A)

W(B)

R(B)

W(B)


Conflict Serializability

T1 T2

R(A)

W(A)

R(A)

R(B)

W(B)

W(A)

R(B)

W(B)


Conflict Serializability

T1 T2

R(A)

W(A)

R(A)

W(B)

R(B)

W(A)

R(B)

W(B)

Serial Schedule




Concurrency Control (18.3-18.4)

CS257 Spring/2009
Professor: Tsau Lin

Students: Donavon Norwood, Suntorn Sae-Eung


INTRODUCTION

Enforcing serializability by locks
• Locks
• Locking scheduler
• Two-phase locking

Locking systems with several lock modes
• Shared and exclusive locks
• Compatibility matrices
• Upgrading/updating locks
• Incrementing locks


18.3 Locks

It works as follows:
• A request arrives from a transaction.
• The scheduler checks the lock table.
• The scheduler generates a serializable schedule of actions.


18.3.1 Consistency of transactions

Actions and locks must relate to each other:
• A transaction can read or write an element only if it holds a lock on that element and has not released the lock.
• Unlocking an element is compulsory.

Legality of schedules:
• No two transactions can acquire the lock on the same element without the prior one releasing it.


18.3.2 Locking scheduler

Grants a lock request only if doing so results in a legal schedule.

The lock table stores the information about current locks on the elements.


18.3.2 The locking scheduler (contd.)

A legal schedule of consistent transactions; unfortunately, it is not serializable.


18.3.2 The locking scheduler (contd.)

The locking scheduler delays requests that would result in an illegal schedule.


18.3.3 Two-phase locking (2PL)

Guarantees that a legal schedule of consistent transactions is conflict-serializable.

All lock requests precede all unlock requests.

The growing phase: Obtain all the locks; no unlocks are allowed.

The shrinking phase: Release all the locks; no new locks are allowed.
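
Whether a single transaction obeys the two-phase rule can be checked with a small Python sketch. The representation of a transaction as a list of `(kind, element)` pairs and the function name are assumptions made for this example.

```python
def is_two_phase(actions):
    """Return True if no lock request follows an unlock request.

    `actions` is the sequence of one transaction's steps, each a
    (kind, element) pair where kind is "lock", "unlock", "read" or "write".
    """
    shrinking = False  # becomes True after the first unlock
    for kind, _element in actions:
        if kind == "unlock":
            shrinking = True
        elif kind == "lock" and shrinking:
            return False  # a lock in the shrinking phase violates 2PL
    return True
```

For example, `[("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]` passes, while a sequence that locks B only after unlocking A does not.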


Working of Two-Phase locking

Assures serializability.

Two protocols for 2PL:
• Strict two-phase locking: a transaction holds all its exclusive locks until commit/abort.
• Rigorous two-phase locking: a transaction holds all its locks until commit/abort.

If a transaction Ti is not 2PL, it is possible to find a 2PL transaction Tj and a schedule S for Ti and Tj that is not conflict-serializable.


Failure of 2PL

2PL does not protect against deadlocks.


18.4 Locking Systems with Several Lock Modes

In 18.3, a transaction must lock a database element (X) whether it reads or writes it.
• There is no reason why several transactions could not read X at the same time, as long as none writes X.

Introduce two lock modes:
• Shared/Read lock (for reading)
• Exclusive/Write lock (for writing)


18.4.1 Shared & Exclusive Locks

Consistency of transactions
• Cannot write without holding an exclusive lock.
• Cannot read without holding some lock.
• A lock for writing is considered “stronger” than a lock for reading.

This basically works on 2 principles:
• A read action can only be preceded by a shared or an exclusive lock.
• A write action can only be preceded by an exclusive lock.

All locks need to be unlocked before commit.


18.4.1 Shared & Exclusive Locks (cont.)

Two-phase locking of transactions
• Locking must precede unlocking.

Notation:
• sli(X): Ti requests a shared lock on DB element X
• xli(X): Ti requests an exclusive lock on DB element X
• ui(X): Ti relinquishes whatever lock it holds on X


18.4.1 Shared & Exclusive Locks (cont.)

Legality of Schedules

An element may be locked exclusively by one transaction or by several in shared mode, but not both


18.4.2 Compatibility Matrices

A convenient way to describe lock-management policies:
• Rows correspond to a lock held on an element by another transaction.
• Columns correspond to the mode of lock requested.

Example:

                 Lock requested
                 S      X
Lock held   S    YES    NO
            X    NO     NO
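
A compatibility matrix maps directly onto a lookup table. The Python sketch below encodes the shared/exclusive matrix above; the names `COMPATIBLE` and `can_grant` are hypothetical and chosen only for this illustration.

```python
# Keys are (mode held by another transaction, mode requested).
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """A request is grantable only if it is compatible with every lock
    that other transactions currently hold on the element."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

With no locks held, any request is grantable; a shared request succeeds alongside other shared locks, and an exclusive request succeeds only when nothing else is held.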


18.4.3 Upgrading Locks

A transaction (T) taking a shared lock is friendly toward other transactions.

When T wants to read X and then write a new value of X:

• At first, T takes a shared lock on X.

• T performs operations on X (this may take a long time).

• When T is ready to write the new value, it “upgrades” the lock to an exclusive lock on X.

Transactions whose reads and writes cannot be predicted in advance can use upgrading locks.


18.4.3 Upgrading Locks (cont.)

Observe the example

T1 cannot take an exclusive lock on B until all locks on B are released.


18.4.3 Upgrading Locks (cont.)

Upgrading can easily cause a “deadlock”:
• Both transactions want to upgrade their locks on the same element.


18.4.4 Update locks

A third lock mode that resolves the deadlock problem of upgrading locks.
• Only an “update lock” can be upgraded to a write lock later.
• An update lock can be granted on X when there are already shared locks on X.
• Once there is an update lock on X, it prevents any additional locks of any kind on X, and it later changes to an exclusive lock.

Notation: uli(X)


18.4.4 Update locks (cont.)

• Compatibility matrix (asymmetric)

                 Lock requested
                 S      X      U
Lock held   S    YES    NO     YES
            X    NO     NO     NO
            U    NO     NO     NO


18.4.4 Update locks (cont.)

Example


18.4.5 Increment Locks

A kind of lock that is useful for transactions that increase or decrease a value, e.g., money transfers between bank accounts.

If 2 transactions (T1, T2) add constants to the same database element (X):
• It doesn’t matter which goes first, but no reads are allowed in between.

See the following exhibits.


18.4.5 Increment Locks (cont.)

CASE 1: A=5 → T1: INC (A,2) → A=7 → T2: INC (A,10) → A=17

CASE 2: A=5 → T2: INC (A,10) → A=15 → T1: INC (A,2) → A=17


18.4.5 Increment Locks (cont.)

What if T1: INC (A,2) and T2: INC (A,10) both start from A=5 and interleave?

T1 produces A=7 and T2 produces A=15, so the final value is 7 or 15: A != 17


18.4.5 Increment Locks (cont.)

INC (A, c):
• is the increment action that adds the constant ‘c’ to database element A
• stands for an atomic execution of READ(A,t); t=t+c; WRITE(A,t);

Notation:
• ili(X): action of Ti requesting an increment lock on X
• inci(X): action of Ti incrementing X by some constant; the value of the constant does not matter.
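
The difference between atomic increments and interleaved read/write steps can be replayed deterministically. The Python sketch below is only an illustration; the function names are assumptions, and the interleaving shown is one possible bad ordering, not the only one.

```python
def inc(db, element, c):
    """Atomic INC(element, c): READ, add c, WRITE with nothing in between."""
    db[element] = db[element] + c


def atomic_result(constants, start=5):
    """Apply atomic increments in the given order and return the final A."""
    db = {"A": start}
    for c in constants:
        inc(db, "A", c)
    return db["A"]


def interleaved_result(start=5):
    """Non-atomic version: both transactions read A before either writes."""
    db = {"A": start}
    t = db["A"]        # T1 reads A
    s = db["A"]        # T2 reads A
    db["A"] = t + 2    # T1 writes its result
    db["A"] = s + 10   # T2 overwrites it, losing T1's increment
    return db["A"]
```

Atomic increments commute, so both orders end at 17 when starting from 5, whereas the interleaved version ends at 15.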


18.4.5 Increment Locks (cont.)

• Compatibility matrix

                 Lock requested
                 S      X      I
Lock held   S    YES    NO     NO
            X    NO     NO     NO
            I    NO     NO     YES


18.4.5 Increment Locks (cont.)

Example




Concurrency Control

Chapter 18

Section 18.5

Presented by Khadke, Suvarna

CS 257 (Section II) Id 213


Overview

Assume knowledge of:
• Locks
• Two-phase locking
• Lock modes: shared, exclusive, update

A simple scheduler architecture based on the following principles:
• Insert lock actions into the stream of reads, writes, and other actions.
• Release locks when the transaction manager tells it that the transaction will commit or abort.


Scheduler That Inserts Lock Actions into the transactions request stream


Scheduler That Inserts Lock Actions

Actions requested by a transaction are generally transmitted through the scheduler and executed on the database. If transaction is delayed, waiting for a lock,

Part I: Takes the stream of requests generated by the transactions and inserts appropriate lock actions ahead of the db operations (read, write, or update).

Part II: Takes actions (a lock or db operation) from Part I and executes each appropriately.

Determine the transaction (T) that the action belongs to and the status of T (delayed or not). If T is not delayed, then:

1. A database access action is transmitted to the database and executed.


Scheduler That Inserts Lock Actions

2. If a lock action is received by Part II, it checks the lock table to see whether the lock can be granted:

   i. If granted, the lock table is modified to include the granted lock.

   ii. If not granted, the lock table is updated to record the requested lock, and Part II delays transaction T.

3. When T commits or aborts, Part I is notified by the transaction manager and releases all locks held by T. If any transactions are waiting for those locks, Part I notifies Part II.

4. When Part II is notified that a lock on some DB element is available, it determines the next transaction T’ to get the lock and lets it continue.


The Lock Table

A relation that associates database elements with locking information about that element

Implemented with a hash table using database elements as the hash key

Size is proportional to the number of locked elements only, not to the size of the entire database

DB element A → Lock information for A


Lock Table Entries Structure


Some sort of information found in a lock table entry: in the SXU scheme, the entry for a typical DB element A is a tuple with the following components:

1. Group mode: a summary of the most stringent conditions that a transaction requesting a new lock on A faces.
   - S: only shared locks are held
   - X: one exclusive lock and no other locks
   - U: one update lock and one or more shared locks
2. Waiting: whether at least one transaction is waiting for a lock on A
3. A list: the transactions that currently hold locks on A or are waiting for a lock on A


Handling Lock Requests

Suppose transaction T requests a lock on A

If there is no lock table entry for A, then there are no locks on A, so create the entry and grant the lock request

If the lock table entry for A exists, use the group mode to guide the decision about the lock request


Handling Lock Requests

If group mode is U (update) or X (exclusive):
• No other lock can be granted.
• Deny the lock request by T.
• Place an entry on the list saying T requests a lock, with Wait? = ‘yes’.

If group mode is S (shared):
• Another shared or update lock can be granted.
• Grant the request for an S or U lock.
• Create an entry for T on the list with Wait? = ‘no’.
• Change the group mode to U if the new lock is an update lock.
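
The request-handling rules can be sketched as a small Python class. This is an illustration under assumptions: the class name, the dictionary layout of an entry, and the method name are hypothetical, and the handling of an X request while the group mode is S (it must wait) is inferred from the S/X/U compatibility matrix rather than stated on this slide.

```python
class LockTable:
    """Hash table from database element to its lock information."""

    def __init__(self):
        # element -> {"group": mode, "waiting": bool, "list": [(txn, mode, wait)]}
        self.entries = {}

    def request(self, txn, element, mode):
        """Return True if the lock is granted, False if txn must wait."""
        entry = self.entries.get(element)
        if entry is None:
            # No entry means no locks on the element: create it and grant.
            self.entries[element] = {
                "group": mode,
                "waiting": False,
                "list": [(txn, mode, False)],
            }
            return True
        if entry["group"] in ("U", "X") or mode == "X":
            # Nothing can be granted on top of U or X; X needs the element alone.
            entry["list"].append((txn, mode, True))
            entry["waiting"] = True
            return False
        # Group mode is S and the request is S or U: grant it.
        entry["list"].append((txn, mode, False))
        if mode == "U":
            entry["group"] = "U"
        return True
```

Because the decision uses only the group mode, the scheduler does not have to scan the whole list of lock holders on every request.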


Handling Unlock Requests

Now suppose transaction T unlocks A:
• Delete T’s entry on the list for A.
• If T’s lock is not the same as the group mode, there is no need to change the group mode.
• Otherwise, check the entire list for the new group mode:
   - S: GM(S) or nothing
   - U: GM(S) or nothing
   - X: nothing


Handling Unlock Requests

If the value of waiting is “yes”, one or more locks need to be granted using one of the following approaches:
• First-Come-First-Served: Grant the lock to the longest-waiting request. No starvation (waiting forever for a lock).
• Priority to Shared Locks: Grant all S locks waiting, then one U lock. Grant an X lock if no others are waiting.
• Priority to Upgrading: If there is a U lock waiting to upgrade to an X lock, grant that first.




SECTION 18.7
THE TREE PROTOCOL

By: Saloni Tamotia (215)


BASICS

B-Trees
- Tree data structure that keeps data sorted
- Allow searches, insertion, and deletion
- Commonly used in database and file systems

Lock
- Enforces limits on access to resources
- Way of enforcing concurrency control

Lock Granularity
- Level and type of information that the lock protects


TREE PROTOCOL
- Kind of graph-based protocol
- Alternative to Two-Phase Locking (2PL)
- Database elements are disjoint pieces of data
- Nodes of the tree DO NOT form a hierarchy based on containment
- The way to get to a node is through its parent
- Example: B-Tree

ADVANTAGES OF TREE PROTOCOL

Unlocking takes less time as compared to 2PL

Freedom from deadlocks

18.7.1 MOTIVATION FOR TREE-BASED LOCKING

Consider B-Tree Index, treating individual nodes as lockable database elements.

Concurrent use of B-Tree is not possible with standard set of locks and 2PL.

Therefore, a protocol is needed which can assure serializability by allowing access to the elements all the way at the bottom of the tree even if the 2PL is violated.


18.7.1 MOTIVATION FOR TREE-BASED LOCKING (cont.)

Reason for “Concurrent use of B-Tree is not possible with standard set of locks and 2PL”:
- Every transaction must begin by locking the root node.
- 2PL transactions cannot unlock the root until all the required locks are acquired.


18.7.2 ACCESSING TREE STRUCTURED DATA

Assumptions:
- Only one kind of lock
- Consistent transactions
- Legal schedules
- No 2PL requirement on transactions


18.7.2 RULES FOR ACCESSING TREE STRUCTURED DATA

RULES:
- The first lock can be at any node.
- Subsequent locks may be acquired only if the transaction currently holds a lock on the parent node.
- Nodes may be unlocked at any time.
- No relocking of a node, even if the node’s parent is still locked.
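
The four rules can be checked for a single transaction with a short Python sketch. The representation is an assumption for illustration: `actions` is a list of `("lock" | "unlock", node)` pairs, and `parent` maps each node to its parent, with the root mapped to `None`. The sketch also rejects unlocking a node that is not held, which reflects the consistent-transaction assumption rather than a tree-protocol rule.

```python
def follows_tree_protocol(actions, parent):
    """Return True if one transaction's lock/unlock sequence obeys the tree protocol."""
    held = set()         # nodes currently locked by the transaction
    ever_locked = set()  # nodes the transaction has locked at any time
    for kind, node in actions:
        if kind == "lock":
            if node in ever_locked:
                return False  # rule: no relocking
            if ever_locked and parent.get(node) not in held:
                return False  # rule: parent must be locked for later locks
            held.add(node)
            ever_locked.add(node)
        elif kind == "unlock":
            if node not in held:
                return False  # consistency: cannot unlock what is not held
            held.discard(node)
    return True
```

Note that a transaction may release a parent as soon as it has locked the child it needs, which is what lets other transactions enter the tree early.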


18.7.3 WHY TREE PROTOCOL WORKS?

Tree protocol implies a serial order on transactions in the schedule.

Order of precedence:

Ti <S Tj
- If Ti locks the root before Tj, then Ti locks every node in common with Tj before Tj does.

ORDER OF PRECEDENCE


SECTION 18.8
Timestamps

By: Rupinder Singh (216)


What is Timestamping?

The scheduler assigns each transaction T a unique number, its timestamp TS(T).

Timestamps must be issued in ascending order, at the time when a transaction first notifies the scheduler that it is beginning.


Timestamp TS(T)

Two methods of generating timestamps:
• Use the value of the system clock as the timestamp.
• Use a logical counter that is incremented after a new timestamp has been assigned.

The scheduler maintains a table of currently active transactions and their timestamps, irrespective of the method used.

Timestamps for database element X and commit bit

RT(X): The read time of X, which is the highest timestamp of a transaction that has read X.

WT(X): The write time of X, which is the highest timestamp of a transaction that has written X.

C(X): The commit bit for X, which is true if and only if the most recent transaction to write X has already committed.


Physically Unrealizable Behavior

Read too late:
A transaction U that started after transaction T, but wrote a value for X before T reads X.

Figure: T start → U start → U writes X → T reads X


Physically Unrealizable Behavior

Write too late:
A transaction U that started after T, but read X before T got a chance to write X.

Figure: Transaction T tries to write too late (T start → U start → U reads X → T writes X)


Dirty Read

It is possible that after T reads the value of X written by U, transaction U will abort.

Figure: T could perform a dirty read if it reads X when shown (events: U start, T start, U writes X, T reads X, U aborts)


Rules for Timestamps-Based scheduling

1. Scheduler receives a request rT(X).

a) If TS(T) ≥ WT(X), the read is physically realizable.
   1. If C(X) is true, grant the request. If TS(T) > RT(X), set RT(X) := TS(T); otherwise do not change RT(X).
   2. If C(X) is false, delay T until C(X) becomes true or the transaction that wrote X aborts.

b) If TS(T) < WT(X), the read is physically unrealizable. Rollback T.


Rules for Timestamps-Based scheduling (Cont.)

2. Scheduler receives a request wT(X).

a) If TS(T) ≥ RT(X) and TS(T) ≥ WT(X), the write is physically realizable and must be performed.
   1. Write the new value for X,
   2. Set WT(X) := TS(T), and
   3. Set C(X) := false.

b) If TS(T) ≥ RT(X) but TS(T) < WT(X), then the write is physically realizable, but there is already a later value in X.
   1. If C(X) is true, then the previous writer of X is committed, and we ignore the write by T.
   2. If C(X) is false, we must delay T.

c) If TS(T) < RT(X), then the write is physically unrealizable, and T must be rolled back.

Rules for Timestamps-Based scheduling (Cont.)

3. Scheduler receives a request to commit T. It must find all the database elements X written by T and set C(X) := true. If any transactions are waiting for X to be committed, these transactions are allowed to proceed.

4. Scheduler receives a request to abort T or decides to rollback T, then any transaction that was waiting on an element X that T wrote must repeat its attempt to read or write.
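
The read, write, and commit rules can be condensed into a Python sketch that returns the scheduler's decision as a string. The `Element` class, its initial values (RT = WT = 0, C = true), and the function names are assumptions for illustration. In `on_commit`, "elements written by T" is approximated by WT(X) = TS(T), which only identifies elements whose most recent writer is T.

```python
from dataclasses import dataclass


@dataclass
class Element:
    rt: int = 0     # RT(X): highest timestamp that has read X
    wt: int = 0     # WT(X): highest timestamp that has written X
    c: bool = True  # C(X): commit bit of the most recent writer


def on_read(ts, x):
    """Rule 1: request rT(X) from a transaction with timestamp ts."""
    if ts < x.wt:
        return "rollback"   # read too late
    if not x.c:
        return "delay"      # wait for the writer to commit or abort
    x.rt = max(x.rt, ts)
    return "grant"


def on_write(ts, x):
    """Rule 2: request wT(X) from a transaction with timestamp ts."""
    if ts < x.rt:
        return "rollback"   # write too late
    if ts >= x.wt:
        x.wt = ts
        x.c = False
        return "grant"
    return "ignore" if x.c else "delay"  # a later value is already in X


def on_commit(ts, elements):
    """Rule 3: set the commit bit on elements last written by ts."""
    for x in elements:
        if x.wt == ts:
            x.c = True
```

A real scheduler would also keep a queue of delayed transactions and retry them on commit or abort (rules 3 and 4); that bookkeeping is omitted here.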


Multiversion Timestamps
• Multiversion schemes keep old versions of a data item to increase concurrency.
• Each successful write results in the creation of a new version of the data item written.
• Use timestamps to label versions.
• When a read(X) operation is issued, select an appropriate version of X based on the timestamp of the transaction, and return the value of the selected version.
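
Version selection can be sketched in Python as follows. The slides only say "an appropriate version"; this sketch assumes the usual choice, namely the version with the largest write timestamp that does not exceed the reader's timestamp. The class and method names are hypothetical.

```python
import bisect


class VersionedElement:
    """Keeps every written version of one data item, ordered by write timestamp."""

    def __init__(self, initial_value, ts=0):
        self.stamps = [ts]             # write timestamps, kept sorted
        self.values = [initial_value]  # values[i] was written at stamps[i]

    def write(self, ts, value):
        """Each successful write adds a new version labeled with ts."""
        i = bisect.bisect_right(self.stamps, ts)
        self.stamps.insert(i, ts)
        self.values.insert(i, value)

    def read(self, ts):
        """Return the latest version written at or before ts."""
        i = bisect.bisect_right(self.stamps, ts) - 1
        if i < 0:
            raise LookupError("no version old enough for this reader")
        return self.values[i]
```

A reader with an old timestamp therefore still finds the value it should have seen, instead of being rolled back for reading too late.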


Timestamps and Locking

Generally, timestamping performs better than locking in situations where:
• Most transactions are read-only.
• It is rare that concurrent transactions will try to read and write the same element.

In high-conflict situations, locking performs better than timestamps.


By Swathi Vegesna


At a Glance
• Introduction
• Validation based scheduling
• Validation based Scheduler
• Expected exceptions
• Validation rules
• Example
• Comparisons
• Summary
• References


Introduction

What is optimistic concurrency control?
• Timestamp-based scheduling and validation-based scheduling


Validation based scheduling
• The scheduler keeps a record of what the active transactions are doing.
• Executes in 3 phases:
  1. Read
  2. Validate
  3. Write


Validation based Scheduler
• Contains an assumed serial order of transactions.
• Maintains three sets:
  - START( ): set of T’s that have started but not completed validation.
  - VAL( ): set of T’s that have validated but not finished the writing phase.
  - FIN( ): set of T’s that have finished.


Expected exceptions

1. Suppose there is a transaction U, such that:
• U is in VAL or FIN; that is, U has validated.
• FIN(U) > START(T); that is, U did not finish before T started.
• RS(T) ∩ WS(U) ≠ φ; let it contain database element X.

2. Suppose there is a transaction U, such that:
• U is in VAL; that is, U has successfully validated.
• FIN(U) > VAL(T); that is, U did not finish before T entered its validation phase.
• WS(T) ∩ WS(U) ≠ φ; let X be in both write sets.


Validation rules
• Check that RS(T) ∩ WS(U) = φ for any previously validated U that did not finish before T started, i.e., FIN(U) > START(T).
• Check that WS(T) ∩ WS(U) = φ for any previously validated U that did not finish before T validated, i.e., FIN(U) > VAL(T).
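
The two checks can be written as a short Python sketch. The `Txn` record, the use of numeric times, and the convention that an unfinished transaction has `fin = infinity` are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    rs: frozenset          # read set RS(T)
    ws: frozenset          # write set WS(T)
    start: float           # START(T)
    val: float = math.inf  # VAL(T), set when validation succeeds
    fin: float = math.inf  # FIN(T), infinity while T has not finished


def validate(t, validated, now):
    """Apply both validation rules to t at time `now`.

    `validated` holds every transaction that validated earlier."""
    for u in validated:
        # Rule 1: U did not finish before T started.
        if u.fin > t.start and t.rs & u.ws:
            return False
        # Rule 2: U did not finish before T validates.
        if u.fin > now and t.ws & u.ws:
            return False
    t.val = now
    return True
```

A transaction that fails validation would be rolled back; one that passes is added to the validated set and proceeds to its write phase.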


Example


Solution

Validation of U:
Nothing to check

Validation of T:
WS(U) ∩ RS(T) = {D} ∩ {A,B} = φ
WS(U) ∩ WS(T) = {D} ∩ {A,C} = φ

Validation of V:
RS(V) ∩ WS(T) = {B} ∩ {A,C} = φ
WS(V) ∩ WS(T) = {D,E} ∩ {A,C} = φ
RS(V) ∩ WS(U) = {B} ∩ {D} = φ

Validation of W:
RS(W) ∩ WS(T) = {A,D} ∩ {A,C} = {A}
RS(W) ∩ WS(V) = {A,D} ∩ {D,E} = {D}
WS(W) ∩ WS(V) = {A,C} ∩ {D,E} = φ
(W is not validated)


Comparison

Concurrency control mechanisms compared by storage utilization and delays:

Locks
• Storage utilization: Space in the lock table is proportional to the number of database elements locked.
• Delays: Delays transactions but avoids rollbacks.

Timestamps
• Storage utilization: Space is needed for read and write times with every database element, whether or not it is currently accessed.
• Delays: Does not delay transactions but causes them to roll back unless interference is low.

Validation
• Storage utilization: Space is used for timestamps and read or write sets for each currently active transaction, plus a few more transactions that finished after some currently active transaction began.
• Delays: Does not delay transactions but causes them to roll back unless interference is low.


Summary
• Concurrency control by validation
• The three phases
• Validation Rules
• Comparison



21.1 Introduction to Information Integration

CS257 Fan Yang


Need for Information Integration
• All the data in the world could be put in a single database (ideal database system).
• In the real world, this is impossible for a single database:
  - databases are created independently
  - it is hard to design a database to support future use


University Database

• Registrar: to record students and grades
• Bursar: to record tuition payments by students
• Human Resources Department: to record employees
• Other departments…


Inconvenient

Record grades for students who pay tuition

Want to swim in the SJSU aquatic center for free during summer vacation?

(None of the cases above can be handled by a single one of the existing databases.)

Solution: one database


How to integrate

Start over
• Build one database that contains all the legacy databases; rewrite all the applications.
• Result: painful

Build a layer of abstraction (middleware)
• On top of all the legacy databases
• This layer is often defined by a collection of classes

BUT…


Heterogeneity Problem

What is the Heterogeneity Problem?
• Aardvark Automobile Co.
• 1000 dealers have 1000 databases
• To find a model at another dealer, can we use this command?

SELECT * FROM CARS WHERE MODEL='A6';


Type of Heterogeneity

• Communication Heterogeneity
• Query-Language Heterogeneity
• Schema Heterogeneity
• Data type difference
• Value Heterogeneity
• Semantic Heterogeneity


Conclusion

• One database system is perfect, but impossible
• Independent databases are inconvenient
• Integrate databases:
  1. start over
  2. middleware
• Heterogeneity problem

Chapter 21.2
Modes of Information Integration

ID: 219
Name: Qun Yu
Class: CS257 219 Spring 2009
Instructor: Dr. T.Y. Lin

Content Index

21.2 Modes of Information Integration
21.2.1 Federated Database Systems
21.2.2 Data Warehouses
21.2.3 Mediators

Federations

The simplest architecture for integrating several DBs
One-to-one connections between all pairs of DBs
If n DBs talk to each other, n(n-1) wrappers are needed
Good when communications between DBs are limited

Wrapper

Wrapper: software that translates incoming queries and outgoing answers. As a result, it allows information sources to conform to some shared schema.

Federations Diagram

[Diagram: four databases DB1, DB2, DB3, and DB4; each of the six pairs of databases is connected by 2 wrappers]

A federated collection of 4 DBs needs 12 components to translate queries from one to another.
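
The count of 12 follows from the n(n-1) formula: every ordered pair of databases needs its own translator. A minimal Python sketch, assuming hypothetical database names and a hypothetical helper `federation_wrappers`, enumerates those ordered pairs:

```python
from itertools import permutations

def federation_wrappers(dbs):
    """List every (from_db, to_db) pair that needs its own wrapper in a federation."""
    return list(permutations(dbs, 2))

dbs = ["DB1", "DB2", "DB3", "DB4"]   # hypothetical database names
wrappers = federation_wrappers(dbs)
n = len(dbs)
print(len(wrappers), n * (n - 1))    # both counts agree
for src, dst in wrappers:
    print(f"wrapper translating queries from {src} to {dst}")
```

The quadratic growth is why the federated architecture is attractive only when few pairs of databases actually need to communicate.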

Example

Car dealers want to share their inventory. Each dealer queries the other’s DB to find the needed car.

Dealer-1’s DB relation: NeededCars(model, color, autoTrans)
Dealer-2’s DB relations: Autos(serial, model, color)
                         Options(serial, option)

[Diagram: Dealer-1’s DB and Dealer-2’s DB, connected by a wrapper in each direction]

Example (cont’d)

Dealer 1 queries Dealer 2 for needed cars:

for (each tuple (:m, :c, :a) in NeededCars) {
    if (:a = TRUE) { /* automatic transmission wanted */
        SELECT Autos.serial
        FROM Autos, Options
        WHERE Autos.serial = Options.serial AND
              Options.option = 'autoTrans' AND
              Autos.model = :m AND
              Autos.color = :c;
    }
    else { /* automatic transmission not wanted */
        SELECT serial
        FROM Autos
        WHERE Autos.model = :m AND
              Autos.color = :c AND
              NOT EXISTS (
                  SELECT * FROM Options
                  WHERE serial = Autos.serial AND
                        option = 'autoTrans'
              );
    }
}
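
The embedded-SQL loop above can be tried out with a small runnable sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database standing in for Dealer 2, the sample rows are invented, and `find_cars` is a hypothetical name for the wrapper logic.

```python
import sqlite3

# In-memory stand-in for Dealer 2's database; the rows are made-up sample data.
dealer2 = sqlite3.connect(":memory:")
dealer2.executescript("""
    CREATE TABLE Autos(serial INTEGER, model TEXT, color TEXT);
    CREATE TABLE Options(serial INTEGER, option TEXT);
    INSERT INTO Autos VALUES (1, 'Gobi', 'blue'), (2, 'Gobi', 'blue'), (3, 'Gobi', 'red');
    INSERT INTO Options VALUES (1, 'autoTrans'), (3, 'autoTrans');
""")

WITH_AUTO = """
    SELECT Autos.serial FROM Autos, Options
    WHERE Autos.serial = Options.serial AND Options.option = 'autoTrans'
      AND Autos.model = ? AND Autos.color = ?
"""
WITHOUT_AUTO = """
    SELECT Autos.serial FROM Autos
    WHERE Autos.model = ? AND Autos.color = ?
      AND NOT EXISTS (SELECT * FROM Options
                      WHERE Options.serial = Autos.serial
                        AND Options.option = 'autoTrans')
"""

def find_cars(needed_cars):
    """For each NeededCars tuple (model, color, autoTrans), query Dealer 2's schema."""
    results = {}
    for model, color, auto_trans in needed_cars:
        sql = WITH_AUTO if auto_trans else WITHOUT_AUTO
        rows = dealer2.execute(sql, (model, color)).fetchall()
        results[(model, color, auto_trans)] = sorted(r[0] for r in rows)
    return results

needed = [("Gobi", "blue", True), ("Gobi", "blue", False)]
print(find_cars(needed))
```

The two query templates correspond to the two branches of the loop: one joins Autos with Options, the other excludes cars that have the autoTrans option.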


Data Warehouse

Sources are translated from their local schema to a global schema and copied to a central DB.

User transparent: user uses Data Warehouse just like an ordinary DB

User is not allowed to update Data Warehouse

Warehouse Diagram

[Diagram: Source 1 and Source 2 each feed an Extractor; a Combiner loads the extracted data into the Warehouse; the user sends a query to the Warehouse and receives the result]

Example

Construct a data warehouse from the source DBs of 2 car dealers:

Dealer-1’s schema: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
Dealer-2’s schema: Autos(serial, model, color)
                   Options(serial, option)
Warehouse’s schema: AutosWhse(serialNo, model, color, autoTrans, dealer)

Extractor: query to extract data from Dealer-1’s data:

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
    SELECT serialNo, model, color, autoTrans, 'dealer1'
    FROM Cars;

Example (cont’d)

Extractor: queries to extract data from Dealer-2’s data:

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
    SELECT Autos.serial, model, color, 'yes', 'dealer2'
    FROM Autos, Options
    WHERE Autos.serial = Options.serial AND
          option = 'autoTrans';

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
    SELECT serial, model, color, 'no', 'dealer2'
    FROM Autos
    WHERE NOT EXISTS (
        SELECT * FROM Options
        WHERE serial = Autos.serial AND
              option = 'autoTrans'
    );
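
The extractor queries can be exercised end to end with a small sketch. It assumes three separate in-memory SQLite databases (two sources and the warehouse), invented sample rows, and hypothetical helper names (`extract_dealer1`, `extract_dealer2`, `load`); because the sources and the warehouse are separate databases here, the INSERT ... SELECT is split into a SELECT at the source and an INSERT at the warehouse.

```python
import sqlite3

dealer1 = sqlite3.connect(":memory:")
dealer1.executescript("""
    CREATE TABLE Cars(serialNo INTEGER, model TEXT, color TEXT, autoTrans TEXT, cdPlayer TEXT);
    INSERT INTO Cars VALUES (100, 'Gobi', 'red', 'yes', 'no');
""")

dealer2 = sqlite3.connect(":memory:")
dealer2.executescript("""
    CREATE TABLE Autos(serial INTEGER, model TEXT, color TEXT);
    CREATE TABLE Options(serial INTEGER, option TEXT);
    INSERT INTO Autos VALUES (1, 'Gobi', 'blue'), (2, 'Gobi', 'red');
    INSERT INTO Options VALUES (1, 'autoTrans');
""")

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE AutosWhse(serialNo INTEGER, model TEXT, color TEXT, autoTrans TEXT, dealer TEXT)"
)

def extract_dealer1():
    """Dealer 1 already matches the warehouse schema; only the dealer column is added."""
    return dealer1.execute(
        "SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars"
    ).fetchall()

def extract_dealer2():
    """Dealer 2 stores options separately, so autoTrans is derived with two queries."""
    with_auto = dealer2.execute("""
        SELECT Autos.serial, model, color, 'yes', 'dealer2'
        FROM Autos, Options
        WHERE Autos.serial = Options.serial AND option = 'autoTrans'
    """).fetchall()
    without_auto = dealer2.execute("""
        SELECT serial, model, color, 'no', 'dealer2'
        FROM Autos
        WHERE NOT EXISTS (SELECT * FROM Options
                          WHERE Options.serial = Autos.serial AND option = 'autoTrans')
    """).fetchall()
    return with_auto + without_auto

def load(rows):
    """Combiner step: copy extracted rows into the warehouse relation."""
    warehouse.executemany("INSERT INTO AutosWhse VALUES (?, ?, ?, ?, ?)", rows)

load(extract_dealer1())
load(extract_dealer2())
rows = warehouse.execute("SELECT * FROM AutosWhse ORDER BY dealer, serialNo").fetchall()
for row in rows:
    print(row)
```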

Construct Data Warehouse

There are mainly 3 ways to construct the data in the warehouse:

1) Periodically reconstructed from the current data in the sources, once a night or at even longer intervals.

Advantages:
  simple algorithms.
Disadvantages:
  1) need to shut down the warehouse;
  2) data can become out of date.

Construct Data Warehouse (cont’d)

2) Updated periodically (e.g., each night), based on the changes made to the sources since the last update.

Advantages:
  involves smaller amounts of data (important when the warehouse is large and needs to be modified in a short period).
Disadvantages:
  1) the process to calculate changes to the warehouse is complex;
  2) data can become out of date.

Construct Data Warehouse (cont’d)

3) Changed immediately, in response to each change or a small set of changes at one or more of the sources.

Advantages:
  data won’t become out of date.
Disadvantages:
  requires too much communication and is therefore generally too expensive
  (practical for warehouses whose underlying sources change slowly).

Mediators

Virtual warehouse: supports a virtual view, or a collection of views, that integrates several sources.
Mediator doesn’t store any data.
Mediator’s tasks:
  1) receive the user’s query,
  2) send queries to wrappers,
  3) combine results from wrappers,
  4) send the final result to the user.

A Mediator Diagram

[Diagram: the user sends a query to the Mediator; the Mediator sends queries to two Wrappers, each of which queries its own source (Source 1, Source 2); results flow back along the same path to the user]

Example

Using the same data sources as the data warehouse example, the mediator integrates the same two dealers’ sources into a view with schema:

AutoMed(serialNo, model, color, autoTrans, dealer)

Suppose the user asks the query:

SELECT serialNo, model
FROM AutoMed
WHERE color = 'red';

Example (cont’d)

In this simple case, the mediator forwards the same query to each of the two wrappers.

Wrapper1: Cars(serialNo, model, color, autoTrans, cdPlayer, …)
SELECT serialNo, model
FROM Cars
WHERE color = 'red';

Wrapper2: Autos(serial, model, color); Options(serial, option)
SELECT serial, model
FROM Autos
WHERE color = 'red';

The mediator needs to interpret serial as serialNo, and then returns the union of these sets of data to the user.
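
The forwarding-and-union behavior can be sketched in a few lines. This is an illustration only: the two sources are in-memory SQLite databases with invented rows, and `wrapper1`, `wrapper2`, and `mediator` are hypothetical function names; the mediator itself stores no data.

```python
import sqlite3

def make_source(script):
    """Create an in-memory SQLite database standing in for one dealer's source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn

source1 = make_source("""
    CREATE TABLE Cars(serialNo INTEGER, model TEXT, color TEXT, autoTrans TEXT, cdPlayer TEXT);
    INSERT INTO Cars VALUES (100, 'Gobi', 'red', 'yes', 'no'), (101, 'Gobi', 'blue', 'no', 'no');
""")
source2 = make_source("""
    CREATE TABLE Autos(serial INTEGER, model TEXT, color TEXT);
    CREATE TABLE Options(serial INTEGER, option TEXT);
    INSERT INTO Autos VALUES (1, 'Gobi', 'blue'), (2, 'Gobi', 'red');
""")

def wrapper1(color):
    """Translate the mediator query into dealer 1's schema."""
    return source1.execute(
        "SELECT serialNo, model FROM Cars WHERE color = ?", (color,)
    ).fetchall()

def wrapper2(color):
    """Translate the mediator query into dealer 2's schema; serial plays the role of serialNo."""
    return source2.execute(
        "SELECT serial, model FROM Autos WHERE color = ?", (color,)
    ).fetchall()

def mediator(color):
    """Answer SELECT serialNo, model FROM AutoMed WHERE color = :color by union."""
    return sorted(set(wrapper1(color)) | set(wrapper2(color)))

print(mediator("red"))
```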

Example (cont’d)

The mediator may have different options for forwarding a user query. For example, the user asks whether there is a car of a specific model and color (e.g., “Gobi”, “blue”).

The mediator decides whether the 2nd query is needed based on the result of the 1st query. That is, if dealer-1 has the specific car, the mediator doesn’t have to query dealer-2.

SECTIONS 21.4 – 21.5
Sanuja Dabade & Eilbroun Benjamin
CS 257 – Dr. TY Lin

INFORMATION INTEGRATION

Presentation Outline

21.4 Capability Based Optimization
  21.4.1 The Problem of Limited Source Capabilities
  21.4.2 A Notation for Describing Source Capabilities
  21.4.3 Capability-Based Query-Plan Selection
  21.4.4 Adding Cost-Based Optimization
21.5 Optimizing Mediator Queries
  21.5.1 Simplified Adornment Notation
  21.5.2 Obtaining Answers for Subgoals
  21.5.3 The Chain Algorithm
  21.5.4 Incorporating Union Views at the Mediator

21.4 Capability Based Optimization

Introduction
Typical DBMS estimates the cost of each query plan and picks what it believes to be the best
Mediator: typically has little knowledge of how long its sources will take to answer
Optimization of mediator queries cannot rely on cost measure alone to select a query plan
Optimization by mediator follows capability based optimization

21.4.1 The Problem of Limited Source Capabilities

Many sources have only Web-based interfaces
Web sources usually allow querying through a query form
E.g. the Amazon.com interface allows us to query about books in many different ways.
But we cannot ask questions that are too general
  E.g. Select * from books;

21.4.1 The Problem of Limited Source Capabilities (con’t)

Reasons why a source may limit the ways in which queries can be asked:
  Earliest databases did not use a relational DBMS that supports SQL queries
  Indexes on a large database may make certain queries feasible, while others are too expensive to execute
  Security reasons
    E.g. a medical database may answer queries about averages, but won’t disclose details of a particular patient's information

21.4.2 A Notation for Describing Source Capabilities

For relational data, the legal forms of queries are described by adornments
Adornments: sequences of codes that represent the requirements for the attributes of the relation, in their standard order
  f (free): attribute can be specified or not
  b (bound): must specify a value for the attribute, but any value is allowed
  u (unspecified): not permitted to specify a value for the attribute

21.4.2 A Notation for Describing Source Capabilities (cont’d)

c[S] (choice from set S) means that a value must be specified and the value must be from the finite set S.
o[S] (optional from set S) means that we either do not specify a value or we specify a value from the finite set S.
A prime (e.g., f') specifies that an attribute is not a part of the output of the query.
A capabilities specification is a set of adornments.
A query must match one of the adornments in the source's capabilities specification.

21.4.2 A Notation for Describing Source Capabilities (cont’d)

E.g. Dealer 1 is a source of data in the form:
  Cars(serialNo, model, color, autoTrans, navi)
For a query form in which the user must specify a serial number (which is not part of the output) and may not specify any other attribute, the adornment is b'uuuu
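
A short sketch can make the adornment codes concrete. The representation is an assumption made for illustration: an adornment is a list of code strings such as "b'" or "c[autoTrans,navi]", a query is a list with one entry per attribute (None when no value is specified), and the function names are hypothetical.

```python
def position_ok(code, value):
    """Check one attribute; value is None when the query specifies nothing for it."""
    code = code.replace("'", "")     # a prime affects only the output, not what may be bound
    kind = code[0]
    choices = None
    if "[" in code:
        inner = code[code.index("[") + 1 : code.rindex("]")]
        choices = {c.strip() for c in inner.split(",")}
    if kind == "f":                  # free: specified or not
        return True
    if kind == "b":                  # bound: some value is required
        return value is not None
    if kind == "u":                  # unspecified: no value allowed
        return value is None
    if kind == "c":                  # choice: a value from S is required
        return value is not None and value in choices
    if kind == "o":                  # optional: nothing, or a value from S
        return value is None or value in choices
    raise ValueError(f"unknown adornment code: {code}")

def matches(adornment, bindings):
    """True if the query's bindings satisfy every code of one adornment."""
    return len(adornment) == len(bindings) and all(
        position_ok(code, value) for code, value in zip(adornment, bindings)
    )

def answerable(capabilities, bindings):
    """A query must match at least one adornment of the capabilities specification."""
    return any(matches(adornment, bindings) for adornment in capabilities)

cars_spec = [["b'", "u", "u", "u", "u"]]                  # the b'uuuu form for Cars
options_spec = [["b", "u"], ["u", "c[autoTrans,navi]"]]   # two forms for Options

print(answerable(cars_spec, ["12345", None, None, None, None]))
print(answerable(cars_spec, [None, "Gobi", None, None, None]))
print(answerable(options_spec, [None, "navi"]))
```

A query that supplies only a model fails against the b'uuuu specification, because the serial number is required and no other attribute may be specified.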

21.4.3 Capability-Based Query-Plan Selection

Given a query at the mediator, a capability-based query optimizer first considers what queries it can ask at the sources to help answer the query
The answers to those queries supply bindings that may make further source queries possible
The process is repeated until either:
  Enough queries are asked at the sources to resolve all the conditions of the mediator query, and therefore the query is answered. Such a plan is called feasible.
  We can construct no more valid forms of source queries, yet still cannot answer the mediator query. The mediator query is then impossible.

21.4.3 Capability-Based Query-Plan Selection (cont’d)

The simplest form of mediator query where we need to apply the above strategy is a join of relations
E.g. we have sources for dealer 2:
  Autos(serial, model, color)
  Options(serial, option)
Suppose that ubf is the sole adornment for Autos, and Options has two adornments, bu and uc[autoTrans, navi]
Query: find the serial numbers and colors of Gobi models with a navigation system


21.4.4 Adding Cost-Based Optimization

Mediator’s Query optimizer is not done when the capabilities of the sources are examined

Having found feasible plans, it must choose among them

Making an intelligent, cost based query optimization requires that the mediator knows a great deal about the costs of queries involved

Sources are independent of the mediator, so it is difficult to estimate the cost

21.5 Optimizing Mediator Queries

Chain algorithm: a greedy algorithm that finds a way to answer the query by sending a sequence of requests to its sources.
  Will always find a solution assuming at least one solution exists.
  The solution may not be optimal.

21.5.1 Simplified Adornment Notation

A query at the mediator is limited to b (bound) and f (free) adornments.
We use the following convention for describing adornments:
  name^adornments(attributes)
where:
  name is the name of the relation
  the adornments are written as a superscript on the name
  the number of adornments = the number of attributes

21.5.2 Obtaining Answers for Subgoals

Rules for subgoals and sources:
Suppose we have the following subgoal:
  R^x1x2…xn(a1, a2, …, an),
and a source adornment for R is y1y2…yn.
The adornment on the subgoal matches the adornment at the source if:
  If yi is b or c[S], then xi = b.
  If xi = f, then yi is not output restricted (not primed).
Note: if yi is f, u, or o[S], then xi may be either b or f.

21.5.3 The Chain Algorithm

Maintains 2 types of information:
  An adornment for each subgoal.
  A relation X that is the join of the relations for all the subgoals that have been resolved.
Initially, the adornment for a subgoal has b in a position iff the mediator query provides a constant binding for the corresponding argument of that subgoal.
Initially, X is a relation over no attributes, containing just an empty tuple.

21.5.3 The Chain Algorithm (con’t)

First, initialize the adornments of the subgoals and X.
Then, repeatedly select a subgoal that can be resolved. Let R^α(a1, a2, …, an) be the subgoal:

1. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X. Project X onto those of its variables that appear in R.

21.5.3 The Chain Algorithm (con’t)

2. For each tuple t in the projection of X, issue a query to the source as follows (β is a source adornment).
   If a component of β is b, then the corresponding component of α is b, and we can use the corresponding component of t for the source query.
   If a component of β is c[S], and the corresponding component of t is in S, then the corresponding component of α is b, and we can use the corresponding component of t for the source query.
   If a component of β is f, and the corresponding component of α is b, provide a constant value for the source query.

21.5.3 The Chain Algorithm (con’t)

   If a component of β is u, then provide no binding for this component in the source query.
   If a component of β is o[S], and the corresponding component of α is f, then treat it as if it were an f.
   If a component of β is o[S], and the corresponding component of α is b, then treat it as if it were c[S].

3. Every variable among a1, a2, …, an is now bound. For each remaining unresolved subgoal, change its adornment so any position holding one of these variables is b.

21.5.3 The Chain Algorithm (con’t)

4. Replace X with X ⋈ π_S(R), where S is all of the variables among a1, a2, …, an.
5. Project out of X all components that correspond to variables that do not appear in the head or in any unresolved subgoal.

If every subgoal is resolved, then X is the answer.
If some subgoal cannot be resolved, then the algorithm fails.

21.5.3 The Chain Algorithm Example

Mediator query:
  Q: Answer(c) ← R^bf(1,a) AND S^ff(a,b) AND T^ff(b,c)

Example:

Relation    R      S             T
Adornment   bf     c'[2,3,5]f    bu
Data        w x    x y           y z
            1 2    2 4           4 6
            1 3    3 5           5 7
            1 4                  5 8

21.5.3 The Chain Algorithm Example (con’t)

Initially, the adornments on the subgoals are the same as in Q, and X contains an empty tuple.
  S and T cannot be resolved because they each have ff adornments, but the sources require either a b or c in the first position.
  R(1,a) can be resolved because its adornment is matched by the source’s adornment.
Send R(w,x) with w=1 to the source to get the tuples of R in the table on the previous slide.

21.5.3 The Chain Algorithm Example (con’t)

Project the subgoal’s relation onto its second component, since only the second component of R(1,a) is a variable.
This is joined with X, resulting in X equaling this relation:

  a
  2
  3
  4

Change the adornment on S from ff to bf.

21.5.3 The Chain Algorithm Example (con’t)

Now we resolve S^bf(a,b):
  Project X onto a, resulting in X.
  Now, search S for tuples whose first attribute equals a value of a in X (no query is issued for a = 4, which is not in the choice set {2,3,5}):

  a b
  2 4
  3 5

  Join this relation with X, and remove a because it appears neither in the head nor in any unresolved subgoal:

  b
  4
  5

21.5.3 The Chain Algorithm Example (con’t)

Now we resolve T^bf(b,c):

  b c
  4 6
  5 7
  5 8

Join this relation with X and project onto the c attribute to get the relation for the head.
Solution is {(6), (7), (8)}.
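
The worked example can be reproduced with a compact sketch of the chain algorithm. It makes several simplifying assumptions that are not in the slides: each source has a single adornment, sources are simulated as in-memory lists of tuples (so a primed column is still visible to the join), choice sets contain integers, X is held as a list of variable-binding dictionaries, and all function names are hypothetical.

```python
def parse(code):
    """Split a source code such as "c'[2,3,5]" into (kind, primed, choice set)."""
    primed = "'" in code
    code = code.replace("'", "")
    choices = {int(v) for v in code[2:-1].split(",")} if "[" in code else None
    return code[0], primed, choices

def is_var(arg):
    return isinstance(arg, str)          # convention: strings are variables

def resolvable(args, codes, bound):
    """Does the subgoal's current b/f adornment match the source adornment?"""
    for arg, code in zip(args, codes):
        kind, primed, _ = parse(code)
        is_bound = not is_var(arg) or arg in bound
        if (kind in "bc" or primed) and not is_bound:
            return False
    return True

def ask_source(data, codes, args, binding):
    """Issue one (simulated) source query for one tuple of X."""
    wanted = []
    for arg, code in zip(args, codes):
        kind, _, choices = parse(code)
        value = binding.get(arg) if is_var(arg) else arg
        if kind == "o":                  # o[S]: like c[S] if bound, like f otherwise
            kind = "c" if value is not None else "f"
        if kind == "c" and value not in choices:
            return []                    # value outside S: no query is issued
        wanted.append(None if kind == "u" else value)
    return [row for row in data
            if all(w is None or w == r for w, r in zip(wanted, row))]

def extend(binding, args, row):
    """Join one tuple of X with one source row; None if they disagree."""
    new = dict(binding)
    for arg, value in zip(args, row):
        if is_var(arg):
            if new.setdefault(arg, value) != value:
                return None
        elif arg != value:
            return None
    return new

def chain(head, subgoals, sources):
    X, bound, unresolved = [{}], set(), list(subgoals)
    while unresolved:
        pick = next((g for g in unresolved
                     if resolvable(g[1], sources[g[0]][0], bound)), None)
        if pick is None:
            return None                  # no subgoal can be resolved: failure
        name, args = pick
        codes, data = sources[name]
        X = [new for b in X for row in ask_source(data, codes, args, b)
             if (new := extend(b, args, row)) is not None]
        unresolved.remove(pick)
        bound |= {a for a in args if is_var(a)}
        keep = set(head) | {a for _, g in unresolved for a in g if is_var(a)}
        X = [dict(t) for t in
             {tuple(sorted((k, v) for k, v in b.items() if k in keep)) for b in X}]
    return {tuple(b[v] for v in head) for b in X}

sources = {
    "R": (["b", "f"], [(1, 2), (1, 3), (1, 4)]),
    "S": (["c'[2,3,5]", "f"], [(2, 4), (3, 5)]),
    "T": (["b", "u"], [(4, 6), (5, 7), (5, 8)]),
}
query = [("R", (1, "a")), ("S", ("a", "b")), ("T", ("b", "c"))]
print(sorted(chain(("c",), query, sources)))
```

The greedy choice is the `next(...)` call: the first resolvable subgoal is taken, with no attempt to compare the cost of alternative orders.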

21.5.4 Incorporating Union Views at the Mediator

This implementation of the Chain Algorithm does not consider that several sources can contribute tuples to a relation.
If specific sources have tuples to contribute that other sources may not have, it adds complexity.
To resolve this, we can consult all sources, or make best efforts to return all the answers.

21.5.4 Incorporating Union Views at the Mediator (con’t)

Consulting All Sources
  We can only resolve a subgoal when each source for its relation has an adornment matched by the current adornment of the subgoal.
  Less practical because it makes queries harder to answer, and impossible if any source is down.
Best Efforts
  We need only 1 source with a matching adornment to resolve a subgoal.
  Need to modify the chain algorithm to revisit each subgoal when that subgoal has new bound requirements.


Presenter:

Namrata Buddhadev (104_224_21.6.1-21.6.7)

Professor:

Dr T Y Lin

Index

21.6 Local-as-View Mediators
21.6.1 Motivation for LAV Mediators
21.6.2 Terminology for LAV Mediators
21.6.3 Expanding Solutions
21.6.4 Containment of Conjunctive Queries
21.6.5 Why the Containment-Mapping Test Works
21.6.6 Finding Solutions to a Mediator Query
21.6.7 Why the LMSS Theorem Holds

Local-as-View Mediators

GAV (global-as-view) mediators: the global data is like a view; it doesn’t exist physically, but pieces of it are constructed by the mediator by asking queries of the sources.

LAV (local-as-view) mediators: we define global predicates at the mediator, but we do not define these predicates as views of the source data.

Instead, for each source we define expressions involving the global predicates that describe the tuples that source is able to produce. Queries are answered at the mediator by discovering all possible ways to construct the query using the views provided by the sources.

Motivation for LAV Mediators

LAV mediators help us to discover how and when to use a source in a given query.
Example: global predicate Par(c,p)
  With GAV, Par(c,p) gives information about child and parent, but cannot make use of a source that offers grandparent information.
  With LAV, Par(c,p) lets us describe and use sources of child-parent and even grandparent information.

Terminology for LAV Mediation

A form of logic serves as the language for defining views.
Datalog is used for both the mediator queries and the source views; each is a single Datalog rule, known as a conjunctive query.
LAV has global predicates, which are the subgoals of mediator queries.
Conjunctive queries define the views: each view has a unique view predicate in its head, has a body consisting of global predicates, and is associated with a particular source.

Example:

Par(c,p) is a global predicate. A view defined by a conjunctive query:
  1. V1(c,p) <- Par(c,p)
  2. Another source produces: V2(c,g) <- Par(c,p) AND Par(p,g)
A query at the mediator asks for great-grandparent facts:
  1. Q(w,z) <- Par(w,x) AND Par(x,y) AND Par(y,z)
  2. Or Q(w,z) <- V1(w,x) AND V2(x,z)
  3. Or Q(w,z) <- V2(w,y) AND V1(y,z)

Expanding Solutions

Given a query Q and a solution S with a subgoal V(a1,a2,..,an) [the ai's may be variables or constants, and some can be the same], let the view V be defined by V(b1,b2,..,bn) <- B, where B is the entire body [the bi's are distinct variables]. We can replace V(a1,..,an) in solution S by a version of body B that has the subgoals of B with variables possibly altered.

Rules:
1. Find the local variables of B: those that appear in the body but not in the head. Within a conjunctive query, a local variable can be replaced by any other variable that does not appear elsewhere in the conjunctive query.

2. If there are any local variables of B that also appear in S, replace each one by a distinct new variable that appears nowhere in the rule for V or in S.
3. In the body B, replace each bi by ai, for i=1,2,..,n.

Example:
  V(a,b,c,d) <- E(a,b,x,y) AND F(x,y,c,d)
  Suppose S contains the subgoal V(x,y,1,x).
  For V, x and y are local, so rename x, y -> e, f:
    V(a,b,c,d) <- E(a,b,e,f) AND F(e,f,c,d)
  Then substitute a, d -> x, b -> y, and c -> 1:
    V(x,y,1,x) expands to the two subgoals E(x,y,e,f) and F(e,f,1,x).
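
The expansion rules can be sketched as a small function. The data representation is an assumption for illustration: an atom is a (predicate, args) pair, a view maps its name to (head variables, body), and view bodies contain only variables. As a simplification, every local variable is renamed to a fresh name (here `_v0`, `_v1`, …) rather than only those that clash with S; `expand` and `fresh_names` are hypothetical names.

```python
def fresh_names(used):
    """Yield variable names _v0, _v1, ... that do not occur in `used`."""
    i = 0
    while True:
        name = f"_v{i}"
        if name not in used:
            yield name
        i += 1

def expand(solution_body, views):
    """Replace each view subgoal of a solution by the renamed body of its view."""
    used = {a for _, args in solution_body for a in args}
    fresh = fresh_names(used)
    expansion = []
    for pred, args in solution_body:
        head_vars, body = views[pred]
        rename = dict(zip(head_vars, args))       # rule 3: replace each bi by ai
        for _, body_args in body:
            for v in body_args:
                if v not in rename:               # rules 1-2: local variable gets a new name
                    rename[v] = next(fresh)
        expansion += [(p, tuple(rename[v] for v in body_args)) for p, body_args in body]
    return expansion

views = {
    "V": (("a", "b", "c", "d"),
          [("E", ("a", "b", "x", "y")), ("F", ("x", "y", "c", "d"))]),
    "V1": (("c", "p"), [("Par", ("c", "p"))]),
    "V2": (("c", "g"), [("Par", ("c", "p")), ("Par", ("p", "g"))]),
}

print(expand([("V", ("x", "y", 1, "x"))], views))
print(expand([("V1", ("w", "x")), ("V2", ("x", "z"))], views))
```

Up to the choice of new variable names (`_v0`, `_v1` instead of e, f), the first call produces the two subgoals derived in the example.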

Containment of Conjunctive Queries

For a conjunctive query S to be a solution to the mediator query Q, the expansion of S, say E, must produce only answers that Q produces; that is, E ⊆ Q.
A containment mapping from Q to E is a function Γ from the variables of Q to the variables and constants of E such that:
  1. If x is the ith argument of the head of Q, then Γ(x) is the ith argument of the head of E.
  2. Add to Γ the rule that Γ(c) = c for any constant c. If P(x1,x2,..,xn) is a subgoal of Q, then P(Γ(x1), Γ(x2),.., Γ(xn)) is a subgoal of E.

Example:

Queries:
  P1: H(x,y) <- A(x,z) AND A(z,y)
  P2: H(a,b) <- A(a,c) AND A(c,d) AND A(d,b)
Consider Γ(x)=a and Γ(y)=b, as required by the heads.
  1. The first subgoal A(x,z) can only map to A(a,c) of P2, so Γ(z) must be c.
  2. Since Γ(y)=b, the subgoal A(z,y) of P1 can only map to A(d,b) of P2, so Γ(z) must be d.
Γ(z) cannot be both c and d, so no containment mapping from P1 to P2 exists.
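
The containment-mapping test can be sketched as a brute-force search over all candidate mappings. The representation is assumed for illustration (a query is a pair of head arguments and a body of (predicate, args) atoms; strings are variables, anything else is a constant), and the exhaustive search is exponential in the number of variables, so it is only suitable for tiny examples like these.

```python
from itertools import product

def is_var(term):
    """Convention for this sketch: strings are variables, other values are constants."""
    return isinstance(term, str)

def containment_mapping(q, e):
    """Return a containment mapping from q to e as a dict, or None if none exists."""
    (q_head, q_body), (e_head, e_body) = q, e
    q_vars = sorted({t for _, args in q_body for t in args if is_var(t)}
                    | {t for t in q_head if is_var(t)})
    targets = sorted({t for _, args in e_body for t in args} | set(e_head), key=str)
    e_atoms = {(p, tuple(args)) for p, args in e_body}

    for image in product(targets, repeat=len(q_vars)):
        gamma = dict(zip(q_vars, image))

        def apply(term):
            return gamma[term] if is_var(term) else term   # constants map to themselves

        head_ok = tuple(apply(t) for t in q_head) == tuple(e_head)
        body_ok = all((p, tuple(apply(t) for t in args)) in e_atoms
                      for p, args in q_body)
        if head_ok and body_ok:
            return gamma
    return None

P1 = (("x", "y"), [("A", ("x", "z")), ("A", ("z", "y"))])
P2 = (("a", "b"), [("A", ("a", "c")), ("A", ("c", "d")), ("A", ("d", "b"))])
print(containment_mapping(P1, P2))

# Great-grandparent query and the expansion of Q(w,z) <- V1(w,x) AND V2(x,z).
Q = (("w", "z"), [("Par", ("w", "x")), ("Par", ("x", "y")), ("Par", ("y", "z"))])
E = (("w", "z"), [("Par", ("w", "x")), ("Par", ("x", "p")), ("Par", ("p", "z"))])
print(containment_mapping(Q, E))
```

The first call finds no mapping, in line with the P1/P2 example; the second finds the mapping that sends y to p and every other variable to itself.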

Complexity of the containment-mapping test: it is NP-complete to decide whether there is a containment mapping from one conjunctive query to another.

The importance of containment mappings is expressed by the theorem: if Q1 and Q2 are conjunctive queries, then Q2 ⊆ Q1 if and only if there is a containment mapping from Q1 to Q2.

Why the Containment-Mapping Test Works

Questions:
  1. If there is a containment mapping, why must there be a containment of conjunctive queries?
  2. If there is containment, why must there be a containment mapping?

Finding Solutions to a Mediator Query

Given a query Q, we look for solutions S whose expansion E is contained in Q.
“If a query Q has n subgoals, then any answer produced by any solution is also produced by a solution that has at most n subgoals.”
This is known as the LMSS Theorem.

Example:
  Q1: Q(w,z) <- Par(w,x) AND Par(x,y) AND Par(y,z)
  S1: Q(w,z) <- V1(w,x) AND V2(x,z)
  S2: Q(w,z) <- V1(w,x) AND V2(x,z) AND V1(t,u) AND V2(u,v)
S2 has four subgoals, more than the three of Q1, so by LMSS it is not needed. The expansions are:
  E1: Q(w,z) <- Par(w,x) AND Par(x,p) AND Par(p,z)
  E2: Q(w,z) <- Par(w,x) AND Par(x,p) AND Par(p,z) AND Par(t,u) AND Par(u,q) AND Par(q,v)
E2 ⊆ E1, using the containment mapping that sends each variable of E1 to the same variable in E2.

Why the LMSS Theorem Holds

Suppose query Q has n subgoals and solution S has more than n subgoals. The expansion E of S must be contained in Q, so there is a containment mapping from Q to E.
Let S’ be the solution obtained by removing from S all subgoals whose expansion is not the target of one of Q’s subgoals under the containment mapping.
Then E’ ⊆ Q, where E’ is the expansion of S’.
Also S ⊆ S’, by the identity mapping. Thus there is no need for solution S among the solutions to query Q.


Information Integration Entity Resolution – 21.7

Presented By: Deepti Bhardwaj

Roll No: 223_103

Contents

21.7 Entity Resolution
21.7.1 Deciding Whether Records Represent a Common Entity
21.7.2 Merging Similar Records
21.7.3 Useful Properties of Similarity and Merge Functions
21.7.4 The R-Swoosh Algorithm for ICAR Records
21.7.5 Other Approaches to Entity Resolution


Introduction

Determining whether two records or tuples do or do not represent the same person, organization, place or other entity is called ENTITY RESOLUTION.

Deciding Whether Records Represent a Common Entity

Two records represent the same individual if the two records have similar values for each of the fields associated with those records.
Requiring the values of corresponding fields to be identical is not adequate, for the following reasons:
  1. Misspellings
  2. Variant Names
  3. Misunderstanding of Names

Deciding Whether Records Represent a Common Entity (cont’d)

  4. Evolution of Values
  5. Abbreviations

Thus, when deciding whether two records represent the same entity, we need to look carefully at the kinds of discrepancies and use a test that measures the similarity of records.

Deciding Whether Records Represent a Common Entity - Edit Distance

The first approach to measuring the similarity of records is edit distance.
Values that are strings can be compared by counting the number of insertions and deletions of characters it takes to turn one string into another.
We then say that two records represent the same entity if their edit distance is below a given threshold.
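
The insertion-and-deletion count can be computed with a standard dynamic program. The sketch below is one way to do it; the function names and the example threshold are assumptions for illustration, not values from the slides.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))       # turning "" into b[:j] takes j insertions
    for i, ca in enumerate(a, start=1):
        cur = [i]                        # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1])  # matching characters cost nothing
            else:
                cur.append(1 + min(prev[j], cur[j - 1]))  # delete ca or insert cb
        prev = cur
    return prev[-1]

def same_entity(a, b, threshold):
    """Treat two string values as the same entity if their distance is below the threshold."""
    return edit_distance(a, b) < threshold

print(edit_distance("Smith", "Smythe"))          # delete i, insert y, insert e
print(same_entity("Smith", "Smythe", threshold=4))
```

Because substitutions are not allowed, changing one character costs 2 (a deletion plus an insertion), which is why "Smith" and "Smythe" are at distance 3 rather than 2.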

Deciding Whether Records Represent a Common Entity - Normalization

We can normalize records by replacing certain substrings by others. For instance, we can use a table of abbreviations and replace abbreviations by what they normally stand for.
Once records are normalized, we can use the edit distance to measure the difference between normalized values in the fields.

Merging Similar Records

Merging means replacing two records that are similar enough to merge by one single record that contains the information of both.
There are many possible merge rules:
  1. Set the field in which the records disagree to the empty string.
  2. (i) Merge by taking the union of the values in each field
     (ii) Declare two records similar if at least two of the three fields have a nonempty intersection.

Merging Similar Records (cont’d)

     Name    Address          Phone
1.   Susan   123 Oak St.      818-555-1234
2.   Susan   456 Maple St.    818-555-1234
3.   Susan   456 Maple St.    213-555-5678

After Merging

         Name    Address                         Phone
(1-2-3)  Susan   {123 Oak St., 456 Maple St.}    {818-555-1234, 213-555-5678}
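
The union merge rule and the two-of-three similarity rule can be tried on the three Susan records. In this sketch each field is held as a frozenset of values so that merged records look like the merged row above; the field names and the helper names (`record`, `similar`, `merge`) are assumptions for illustration.

```python
FIELDS = ("name", "address", "phone")

def record(name, address, phone):
    """Build a record whose fields are sets of values, so merged records fit the same shape."""
    return {"name": frozenset([name]),
            "address": frozenset([address]),
            "phone": frozenset([phone])}

def similar(r, s):
    """Similar if at least two of the three fields have a nonempty intersection."""
    return sum(1 for f in FIELDS if r[f] & s[f]) >= 2

def merge(r, s):
    """Merge by taking the union of the values in each field."""
    return {f: r[f] | s[f] for f in FIELDS}

r1 = record("Susan", "123 Oak St.", "818-555-1234")
r2 = record("Susan", "456 Maple St.", "818-555-1234")
r3 = record("Susan", "456 Maple St.", "213-555-5678")

print(similar(r1, r2), similar(r2, r3), similar(r1, r3))
merged = merge(merge(r1, r2), r3)
print(sorted(merged["address"]), sorted(merged["phone"]))
```

Records 1 and 3 agree only on the name, so they are not similar on their own; they end up in the same merged record because each is similar to record 2.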

Useful Properties of Similarity and Merge Functions

The following properties say that the merge operation is a semilattice:
  1. Idempotence: the merge of a record with itself should surely be that record.
  2. Commutativity: if we merge two records, the order in which we list them should not matter.
  3. Associativity: the order in which we group records for a merger should not matter.

Useful Properties of Similarity and Merge Functions (cont’d)

There are some other properties that we expect the similarity relationship to have:
  • Idempotence for similarity: a record is always similar to itself.
  • Commutativity of similarity: in deciding whether two records are similar, it does not matter in which order we list them.
  • Representability: if r is similar to some other record s, but s is instead merged with some other record t, then r remains similar to the merger of s and t and can be merged with that record.


R-swoosh Algorithm for ICAR Records

Input: A set of records I, a similarity function, and a merge function.
Output: A set of merged records O.
Method:

    O := emptyset;
    WHILE I is not empty DO BEGIN
        Let r be any record in I;
        Find, if possible, some record s in O that is similar to r;
        IF no record s exists THEN
            move r from I to O
        ELSE BEGIN
            delete r from I;
            delete s from O;
            add the merger of r and s to I;
        END;
    END;
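The pseudocode translates fairly directly into Python. The following is a minimal sketch, not a tuned implementation: lists stand in for the sets I and O, and the similarity and merge functions are passed in as parameters. The demonstration assumes set-valued records with the union merge rule, and the "Bob" record is a hypothetical value added to show a record that matches nothing.

```python
from typing import Callable, Iterable, List, TypeVar

R = TypeVar("R")

def r_swoosh(records: Iterable[R],
             similar: Callable[[R, R], bool],
             merge: Callable[[R, R], R]) -> List[R]:
    pending = list(records)  # the set I
    output: List[R] = []     # the set O
    while pending:
        r = pending.pop()    # let r be any record in I, and delete it from I
        # Find, if possible, some record s in O that is similar to r.
        match = next((s for s in output if similar(r, s)), None)
        if match is None:
            output.append(r)                 # move r from I to O
        else:
            output.remove(match)             # delete s from O
            pending.append(merge(r, match))  # add the merger of r and s to I
    return output

def rec(*fields):
    """Build a record as a tuple of one-element sets."""
    return tuple(frozenset([f]) for f in fields)

def similar(r, s):
    """At least two fields share a value."""
    return sum(1 for a, b in zip(r, s) if a & b) >= 2

def merge(r, s):
    """Field-wise union."""
    return tuple(a | b for a, b in zip(r, s))

data = [
    rec("Susan", "123 Oak St.", "818-555-1234"),
    rec("Susan", "456 Maple St.", "818-555-1234"),
    rec("Susan", "456 Maple St.", "213-555-5678"),
    rec("Bob", "9 Elm St.", "415-555-0000"),  # hypothetical unrelated record
]
result = r_swoosh(data, similar, merge)
```

Each merged record goes back into I and not straight into O, so it is compared again against everything already in O. The loop relies on the ICAR properties to guarantee that the result does not depend on the order in which records are picked.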


Other Approaches to Entity Resolution

The other approaches to entity resolution are:

• Non-ICAR Datasets

• Clustering

• Partitioning


Other Approaches to Entity Resolution - Non-ICAR Datasets

Non-ICAR Datasets: We can define a dominance relation r <= s, which means that record s contains all the information contained in record r.

If so, then we can eliminate record r from further consideration.
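One way to picture dominance is with set-valued records, where r <= s holds when every field of r is a subset of the same field of s. The slides define dominance only abstractly, so this representation and the names `dominated_by` and `remove_dominated` are assumptions made for illustration; the "Bob" record is hypothetical.

```python
def dominated_by(r, s):
    """r <= s: every field of s contains all the values in that field of r."""
    return all(a <= b for a, b in zip(r, s))

def remove_dominated(records):
    """Keep only records that no other distinct record dominates."""
    unique = list(dict.fromkeys(records))  # drop exact duplicates, keep order
    return [r for r in unique
            if not any(r != s and dominated_by(r, s) for s in unique)]

r1 = (frozenset({"Susan"}), frozenset({"123 Oak St."}), frozenset({"818-555-1234"}))
r12 = (frozenset({"Susan"}),
       frozenset({"123 Oak St.", "456 Maple St."}),
       frozenset({"818-555-1234"}))
bob = (frozenset({"Bob"}), frozenset({"9 Elm St."}), frozenset({"415-555-0000"}))  # hypothetical

kept = remove_dominated([r1, r12, bob])
```

Here `r1` is dropped because `r12` already contains all of its values, while `bob` survives because no other record dominates it.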


Other Approaches to Entity Resolution - Clustering

Clustering: Sometimes we group the records into clusters such that members of a cluster are in some sense similar to each other and members of different clusters are not similar.


Other Approaches to Entity Resolution - Partitioning

Partitioning: We can group the records, perhaps several times, into groups that are likely to contain similar records and look only within each group for pairs of similar records.
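A minimal sketch of partitioning follows. It groups records once per key function and collects candidate pairs only from within each group, so the expensive similarity test is never run across groups. The choice of keys (telephone area code and street address) and the fourth record are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, keys):
    """Return index pairs (i, j), i < j, of records that share a group
    under at least one of the key functions."""
    pairs = set()
    for key in keys:  # one grouping pass per key function
        groups = defaultdict(list)
        for idx, record in enumerate(records):
            groups[key(record)].append(idx)
        for members in groups.values():
            pairs.update(combinations(members, 2))  # pairs within a group only
    return pairs

records = [
    ("Susan", "123 Oak St.", "818-555-1234"),
    ("Susan", "456 Maple St.", "818-555-1234"),
    ("Susan", "456 Maple St.", "213-555-5678"),
    ("Bob", "9 Elm St.", "213-555-0000"),  # hypothetical extra record
]

def area_code(record):
    return record[2][:3]

def address(record):
    return record[1]

pairs = candidate_pairs(records, [area_code, address])
```

With these two groupings only three of the six possible pairs are examined. Grouping several times with different keys lowers the chance that a truly similar pair is never placed in the same group.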