IMS 4212: Database Implementation

Dr. Lawrence West, Management Dept., University of Central Florida

lwest@bus.ucf.edu

Posted on 03-Jan-2016

Physical Database Implementation—Topics

• Denormalization

• Partitioning Tables (relations)

• Parallel Processing & RAID

Denormalization

• Denormalizing is the process of rearranging attributes (and sometimes entities) to create tables that deliberately violate the rules of normalization

• We are trading off (again) storage efficiency and anomaly avoidance for better retrieval efficiency

• Denormalizing includes:

– Storing derived attributes explicitly

– Allowing partial and transitive dependencies (violating second, third, or Boyce-Codd normal form)

– Merging entities in 1:1 relationships

Denormalization (cont.)

• Derived attributes

– Storing derived attributes is one of the most common means of improving processing efficiency

– How many tables/row examinations are avoided by storing total grade points and total credit hours with the STUDENT entity?

– What new operations must be introduced to keep the data current?

– Explicitly storing derived attributes gives rise to new operational business rules to enforce accuracy

STUDENT (StudentID, LastName, FirstName, TotCrHr, TotGP)
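A minimal sketch of those new operational rules, using Python's built-in sqlite3 module. Only STUDENT and its TotCrHr/TotGP columns come from the slide; the COMPLETED_COURSE table, its CrHr and GradePts columns, the trigger names, and the sample rows are hypothetical. The triggers keep the stored totals current, so a GPA lookup reads one STUDENT row instead of every course the student has completed.

```python
import sqlite3

# In-memory database; everything except STUDENT's columns is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    StudentID INTEGER PRIMARY KEY,
    LastName  TEXT,
    FirstName TEXT,
    TotCrHr   INTEGER NOT NULL DEFAULT 0,  -- derived: total credit hours
    TotGP     REAL    NOT NULL DEFAULT 0   -- derived: total grade points
);

-- Hypothetical detail table the totals are derived from.
CREATE TABLE COMPLETED_COURSE (
    StudentID INTEGER NOT NULL REFERENCES STUDENT (StudentID),
    SectionID INTEGER NOT NULL,
    CrHr      INTEGER NOT NULL,
    GradePts  REAL    NOT NULL,            -- CrHr * grade value (A = 4, B = 3, ...)
    PRIMARY KEY (StudentID, SectionID)
);

-- New operational rules: every change to the detail rows adjusts the totals.
CREATE TRIGGER completed_ins AFTER INSERT ON COMPLETED_COURSE
BEGIN
    UPDATE STUDENT
       SET TotCrHr = TotCrHr + NEW.CrHr,
           TotGP   = TotGP + NEW.GradePts
     WHERE StudentID = NEW.StudentID;
END;

CREATE TRIGGER completed_del AFTER DELETE ON COMPLETED_COURSE
BEGIN
    UPDATE STUDENT
       SET TotCrHr = TotCrHr - OLD.CrHr,
           TotGP   = TotGP - OLD.GradePts
     WHERE StudentID = OLD.StudentID;
END;
""")

conn.execute("INSERT INTO STUDENT (StudentID, LastName, FirstName) VALUES (1, 'Doe', 'Jane')")
conn.executemany(
    "INSERT INTO COMPLETED_COURSE VALUES (?, ?, ?, ?)",
    [(1, 101, 3, 12.0),   # an A in a 3-credit course
     (1, 102, 4, 12.0)],  # a B in a 4-credit course
)

# GPA now comes from a single STUDENT row instead of a scan of the detail table.
tot_cr, tot_gp = conn.execute(
    "SELECT TotCrHr, TotGP FROM STUDENT WHERE StudentID = 1"
).fetchone()
print(tot_cr, tot_gp, round(tot_gp / tot_cr, 2))  # prints: 7 24.0 3.43
```

A complete design would also need an UPDATE trigger for grade changes; the same rules could instead be enforced in application code or stored procedures.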

Denormalization (cont.)

• 1:1 Relationships

– It may be possible to collapse data from one entity in a 1:1 relationship into the other.

– Usually the pervasive entity survives

– Alternately, both entities may be retained but the data from one may be copied into the other to avoid a table look-up

PROFESSOR (FacultyID, LastName, FirstName, Phone, Dept <FK>)

OFFICE (Building, Room, Window, Length, Width, LastPaint, FacultyID <FK>)

PROFESSOR Has OFFICE

OFFICE with PROFESSOR data copied in (Building, Room, Window, Length, Width, LastPaint, FacultyID <FK>, LastName, FirstName, Phone, Dept)
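A sketch of the second option (keep both tables, but copy PROFESSOR data into the office record), again with sqlite3. The OFFICE_DENORM table name and all sample values are made up for illustration; Window is quoted only as a precaution against keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROFESSOR (
    FacultyID INTEGER PRIMARY KEY,
    LastName  TEXT,
    FirstName TEXT,
    Phone     TEXT,
    Dept      TEXT
);
CREATE TABLE OFFICE (
    Building  TEXT,
    Room      TEXT,
    "Window"  INTEGER,
    Length    REAL,
    Width     REAL,
    LastPaint TEXT,
    FacultyID INTEGER UNIQUE REFERENCES PROFESSOR (FacultyID),  -- UNIQUE keeps it 1:1
    PRIMARY KEY (Building, Room)
);
-- Hypothetical sample rows.
INSERT INTO PROFESSOR VALUES (7, 'Doe', 'Jane', '555-0100', 'MGT');
INSERT INTO OFFICE VALUES ('BLDG1', '201', 1, 12, 10, '2015-06-01', 7);

-- Denormalized copy: professor attributes live in the office record,
-- so an office look-up no longer touches PROFESSOR.
CREATE TABLE OFFICE_DENORM AS
SELECT o.Building, o.Room, o."Window", o.Length, o.Width, o.LastPaint,
       o.FacultyID, p.LastName, p.FirstName, p.Phone, p.Dept
  FROM OFFICE AS o
  JOIN PROFESSOR AS p ON p.FacultyID = o.FacultyID;
""")

# Single-table read: who sits in BLDG1 room 201, and how to reach them.
row = conn.execute(
    "SELECT LastName, Phone FROM OFFICE_DENORM WHERE Building = 'BLDG1' AND Room = '201'"
).fetchone()
print(row)  # ('Doe', '555-0100')
```

CREATE TABLE ... AS SELECT copies the data but not the constraints, and the copy goes stale whenever a PROFESSOR row changes, so keeping it accurate needs an extra rule such as a refresh job or a trigger.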

Denormalization (cont.)

• 1:M Relationships

– You may consider moving or duplicating attributes from the “one” side of a 1:M relationship into the “many” side

– This will result in considerable data duplication

– Considerations

• There should be many records on the “one” side

• Frequent access should be directly into the “many” side

Denormalization (cont.)

• 1:M Relationships (cont.)

• A similar technique may be used for M:M relationships by collapsing or copying attributes into the associative entity between the two entities

ENROLLMENT (SectionID <FK1>, StudentID <FK2>, Grade <FK3>, LastName, FirstName)

STUDENT (StudentID, LastName, FirstName, ...)

STUDENT Has ENROLLMENT
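A sketch of copying STUDENT names into the associative ENROLLMENT entity so that a class roster needs no join. The trigger name, section number, and sample students are hypothetical, and Grade is stored here as plain text rather than as the <FK3> reference the diagram shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    StudentID INTEGER PRIMARY KEY,
    LastName  TEXT NOT NULL,
    FirstName TEXT NOT NULL
);
-- Associative entity with LastName and FirstName copied in from STUDENT.
CREATE TABLE ENROLLMENT (
    SectionID INTEGER NOT NULL,
    StudentID INTEGER NOT NULL REFERENCES STUDENT (StudentID),
    Grade     TEXT,              -- simplified: plain text, not a foreign key
    LastName  TEXT NOT NULL,     -- duplicated from STUDENT
    FirstName TEXT NOT NULL,     -- duplicated from STUDENT
    PRIMARY KEY (SectionID, StudentID)
);
-- Extra business rule: a name change must reach every duplicated copy.
CREATE TRIGGER student_name_sync AFTER UPDATE OF LastName, FirstName ON STUDENT
BEGIN
    UPDATE ENROLLMENT
       SET LastName = NEW.LastName, FirstName = NEW.FirstName
     WHERE StudentID = NEW.StudentID;
END;

INSERT INTO STUDENT VALUES (1, 'Doe', 'Jane'), (2, 'Roe', 'Rick');
-- Names are copied at enrollment time.
INSERT INTO ENROLLMENT (SectionID, StudentID, Grade, LastName, FirstName)
SELECT 501, StudentID, NULL, LastName, FirstName FROM STUDENT;
""")

conn.execute("UPDATE STUDENT SET LastName = 'Smith' WHERE StudentID = 1")

# A class roster now reads only ENROLLMENT; no join to STUDENT is needed.
roster = conn.execute(
    "SELECT LastName, FirstName FROM ENROLLMENT WHERE SectionID = 501 ORDER BY LastName"
).fetchall()
print(roster)  # [('Roe', 'Rick'), ('Smith', 'Jane')]
```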

Denormalization (cont.)

• The goal of denormalizing is to avoid accessing a (large) table during high-frequency, critical transactions

• Denormalizing usually requires additional business rules to guarantee that data remains accurate in the face of updates

Partitioning

• Partitioning entities divides one table into many

– Horizontal partitioning

• Each table has all fields from the original table

• Each table has a subset of records

– Vertical partitioning

• Each table has the PK of the original table

• Each table has all records

• Each table has a subset of fields

– May partition both vertically and horizontally

• Very powerful technique with historical data

Partitioning (cont.)

• Horizontal Partitioning

– How many records in the STUDENT table?

– How many of them are currently enrolled?

– How frequently do we need to access both current and former students in the same query or operation?

– It may make sense to partition tables based on a historical context

• Active records vs. archived records

– May also partition based on geographic considerations

– The whole table can be reconstructed using a UNION (or UNION ALL) query
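A sketch of active/archived horizontal partitioning with a view that rebuilds the whole table. The STUDENT_ACTIVE and STUDENT_ARCHIVE names, the LastTerm column, and the data are hypothetical. UNION ALL is used because the partitions are disjoint, so the duplicate-elimination step of a plain UNION would be wasted work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two horizontal partitions with identical columns (names are hypothetical).
CREATE TABLE STUDENT_ACTIVE  (StudentID INTEGER PRIMARY KEY, LastName TEXT, LastTerm TEXT);
CREATE TABLE STUDENT_ARCHIVE (StudentID INTEGER PRIMARY KEY, LastName TEXT, LastTerm TEXT);

INSERT INTO STUDENT_ACTIVE  VALUES (3, 'Lee', '2016S'), (4, 'Kim', '2016S');
INSERT INTO STUDENT_ARCHIVE VALUES (1, 'Doe', '2009F'), (2, 'Roe', '2011S');

-- The whole logical table, rebuilt on demand. UNION ALL skips duplicate
-- elimination, which is safe because each student lives in exactly one partition.
CREATE VIEW STUDENT_ALL AS
    SELECT StudentID, LastName, LastTerm FROM STUDENT_ACTIVE
    UNION ALL
    SELECT StudentID, LastName, LastTerm FROM STUDENT_ARCHIVE;
""")

# High-frequency work touches only the small active partition.
active = conn.execute("SELECT COUNT(*) FROM STUDENT_ACTIVE").fetchone()[0]
# Occasional historical queries use the reassembled view.
everyone = conn.execute("SELECT StudentID FROM STUDENT_ALL ORDER BY StudentID").fetchall()
print(active, everyone)  # 2 [(1,), (2,), (3,), (4,)]

# Archiving a graduate moves the row between partitions in one transaction.
with conn:
    conn.execute("INSERT INTO STUDENT_ARCHIVE SELECT * FROM STUDENT_ACTIVE WHERE StudentID = 4")
    conn.execute("DELETE FROM STUDENT_ACTIVE WHERE StudentID = 4")
```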

Partitioning (cont.)

• Vertical Partitioning

– Librarian, Registrar, Athletic Department, and Health Center may all need a different subset of fields from the STUDENT entity

– It may make sense to create separate tables containing the necessary attributes for each view

– Common PK creates 1:1 cardinality between all tables

– Whole logical record can be assembled using SQL when needed

– We are actually backing into a supertype/subtype relationship

STUDENT (StudentID, LastName, FirstName, MiddleInitial)

STUDENTADDRESS (StudentID, HomeStreet, HomeCity, HomeState, HomeCountry, HomeZip, LocalStreet, LocalCity, LocalState, LocalCountry, LocalZip, LocalPhone, E_Mail)

STUDENTE_MAIL (StudentID, E_Mail)

STUDENTHEALTHREC (StudentID, BloodType, Height, Weight)
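A sketch of how such vertical partitions might be queried: each office reads its own table, and the full logical record is reassembled by joining on the shared StudentID key. Column names follow the diagram (STUDENTADDRESS is left out for brevity); the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Vertical partitions of STUDENT; every table carries the same PK, StudentID.
CREATE TABLE STUDENT (
    StudentID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT, MiddleInitial TEXT
);
CREATE TABLE STUDENTE_MAIL (
    StudentID INTEGER PRIMARY KEY REFERENCES STUDENT (StudentID), E_Mail TEXT
);
CREATE TABLE STUDENTHEALTHREC (
    StudentID INTEGER PRIMARY KEY REFERENCES STUDENT (StudentID),
    BloodType TEXT, Height REAL, Weight REAL
);
INSERT INTO STUDENT VALUES (1, 'Doe', 'Jane', 'Q');
INSERT INTO STUDENTE_MAIL VALUES (1, 'jdoe@example.edu');
INSERT INTO STUDENTHEALTHREC VALUES (1, 'O+', 165, 60);
""")

# The Health Center reads only its own partition plus the student's name.
health = conn.execute("""
    SELECT s.LastName, h.BloodType
      FROM STUDENT AS s
      JOIN STUDENTHEALTHREC AS h ON h.StudentID = s.StudentID
""").fetchall()

# The whole logical record is reassembled by joining on the shared PK.
# LEFT JOIN keeps students who have no row in an optional partition.
full = conn.execute("""
    SELECT s.StudentID, s.LastName, e.E_Mail, h.BloodType, h.Height
      FROM STUDENT AS s
      LEFT JOIN STUDENTE_MAIL    AS e ON e.StudentID = s.StudentID
      LEFT JOIN STUDENTHEALTHREC AS h ON h.StudentID = s.StudentID
""").fetchall()
print(health, full)
```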

RAID Storage Devices

• In conventional drives, data is laid down sequentially along a track on the disk

– The read/write head must follow the track as the disk rotates to read the data

– Each read/write operation must finish before the next can begin

– A drive failure can result in loss of all data

RAID Storage Devices

• RAID stands for Redundant Array of Inexpensive (or Independent) Disks

– Multiple disks appear as a single logical drive to the computer

– May be implemented in hardware or software (OS)

• Various RAID levels provide for different levels of performance and redundancy

• Most RAID levels can rebuild an entire lost physical drive from redundant data (parity storage, or mirroring in RAID 1)

RAID Storage Devices—RAID 3

• Records are striped across multiple physical devices

– Part of each record is laid down across multiple physical drives

– Much faster read/write time, since the disk rotation needed to read a whole record/block is much shorter

– However, only one request can be serviced at a time

– Not commonly used in practice

• A single parity disk allows reconstruction of the data on a damaged drive

[Figure: RAID 3 layout. Image source: Wikipedia]
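A small simulation of RAID 3 striping and parity, assuming a hypothetical array of three data drives plus one dedicated parity drive and byte-level striping. Parity is the byte-wise XOR of the data strips, so XOR-ing the surviving strips with the parity strip recreates the strip of any single failed drive. Real arrays do this in the controller or the OS, not in application code.

```python
from functools import reduce

DATA_DRIVES = 3  # hypothetical array: 3 data drives + 1 dedicated parity drive

def xor_bytes(chunks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def stripe_record(record: bytes):
    """Split one record byte by byte across the data drives, RAID 3 style,
    and compute the strip stored on the parity drive."""
    padded = record + b"\x00" * (-len(record) % DATA_DRIVES)  # pad to a full stripe
    strips = [padded[i::DATA_DRIVES] for i in range(DATA_DRIVES)]
    return strips, xor_bytes(strips)

def rebuild(strips, parity, lost):
    """Recreate the failed drive's strip from the survivors plus parity."""
    survivors = [s for i, s in enumerate(strips) if i != lost]
    return xor_bytes(survivors + [parity])

def unstripe(strips):
    """Interleave the strips back into the original byte order."""
    return bytes(b for group in zip(*strips) for b in group)

strips, parity = stripe_record(b"STUDENT 0042")
recovered = rebuild(strips, parity, lost=1)  # pretend Drive 1 failed
print(recovered == strips[1], unstripe(strips).rstrip(b"\x00"))  # True b'STUDENT 0042'
```

Reading the record requires every data drive to deliver its strip, which is why RAID 3 serves only one request at a time.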

RAID Storage Devices—RAID 4

• Blocks are stored independently on the drives

– Block A1 can be serviced just by Drive 0

– Simultaneous requests for Blocks B2 and D3 can also be serviced, since those blocks reside on other drives

• A single parity drive enables recovery of lost data

• Write operations may be slower—simultaneous write operations to Drives 0-2 must wait on the parity calculation and writing on Drive 3

[Figure: RAID 4 layout. Image source: Wikipedia]
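A sketch of the RAID 4 behavior above, under stated assumptions: three data drives, one dedicated parity drive (Drive 3), a toy 4-byte block, and logical blocks assigned round-robin so that A1 lands on Drive 0, B2 on Drive 1, and D3 on Drive 2. A read touches one data drive; every write also rewrites the parity block on Drive 3 using the common read-modify-write rule (new parity = old parity XOR old data XOR new data), which is why Drive 3 becomes the write bottleneck. The class and helper names are hypothetical.

```python
from functools import reduce

DATA_DRIVES = 3   # Drives 0-2 hold data
PARITY_DRIVE = 3  # Drive 3 holds every parity block (RAID 4)
BLOCK = 4         # toy block size in bytes

def xor(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def locate(logical_block: int):
    """Map a logical block number to (drive, stripe); a block never spans drives."""
    return logical_block % DATA_DRIVES, logical_block // DATA_DRIVES

class Raid4:
    def __init__(self, stripes: int):
        empty = bytes(BLOCK)
        # drives[d][s] is the block stored on drive d in stripe s
        self.drives = [[empty] * stripes for _ in range(DATA_DRIVES + 1)]
        self.parity_writes = 0

    def read(self, n: int) -> bytes:
        drive, stripe = locate(n)
        return self.drives[drive][stripe]  # touches a single data drive

    def write(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK:
            raise ValueError("data must be exactly one block")
        drive, stripe = locate(n)
        old_data = self.drives[drive][stripe]
        old_parity = self.drives[PARITY_DRIVE][stripe]
        self.drives[drive][stripe] = data
        # Read-modify-write: new parity = old parity XOR old data XOR new data
        self.drives[PARITY_DRIVE][stripe] = xor(old_parity, old_data, data)
        self.parity_writes += 1  # every write also has to update Drive 3

array = Raid4(stripes=4)
for n, block in enumerate([b"A1..", b"A2..", b"A3..", b"B1..", b"B2.."]):
    array.write(n, block)

print(locate(0), locate(4), locate(11), array.parity_writes)  # (0, 0) (1, 1) (2, 3) 5
```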

RAID Storage Devices—RAID 5

• Similar to RAID 4, except that parity storage is distributed across multiple drives

– Rotating allocation

– Lessens the chance that writes on two drives will wait on parity updates on a single parity drive

[Figure: RAID 5 layout. Image source: Wikipedia]
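A sketch of one way rotating parity can be allocated, assuming a hypothetical 4-drive array in which the parity block moves one drive to the left on each stripe (controllers use several such layouts). The helper names are made up. The last lines show two small writes in different stripes touching disjoint drives; with RAID 4's single parity drive, both would have had to update Drive 3.

```python
DRIVES = 4  # hypothetical 4-drive RAID 5 array

def parity_drive(stripe: int) -> int:
    """Parity starts on the last drive and moves one drive left per stripe."""
    return (DRIVES - 1 - stripe) % DRIVES

def data_drives(stripe: int) -> list[int]:
    """The remaining drives in the stripe hold data blocks."""
    p = parity_drive(stripe)
    return [d for d in range(DRIVES) if d != p]

def drives_for_small_write(stripe: int, position: int) -> set[int]:
    """A one-block write updates that data block plus the stripe's parity block."""
    return {data_drives(stripe)[position], parity_drive(stripe)}

for s in range(DRIVES):
    print(f"stripe {s}: parity on drive {parity_drive(s)}, data on {data_drives(s)}")

# Two writes in different stripes touch disjoint drives, so neither waits on
# the other; with a dedicated parity drive, both would need Drive 3.
w1 = drives_for_small_write(stripe=0, position=0)  # data on 0, parity on 3
w2 = drives_for_small_write(stripe=1, position=1)  # data on 1, parity on 2
print(sorted(w1), sorted(w2), w1.isdisjoint(w2))   # [0, 3] [1, 2] True
```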

Parallel Processing

• More and more computers support parallel processing (multiple CPUs on the same computer)

• Some tasks can be split among multiple processors

• In an SQL SELECT query, the usual method requires the RDBMS to scan each record to determine whether it matches the WHERE clause or JOIN criteria

• In parallel processing, part of the table is passed to each processor, and the processors scan their parts at the same time

• Availability depends on hardware, OS, and RDBMS

[Figure: rows 1 to 9 of a table scanned by one processor vs. divided among processors P1, P2, and P3]
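A sketch of the idea using Python's concurrent.futures: the table is cut into one slice per worker, each worker applies the WHERE test to its slice, and the partial results are merged. The rows, the State column, and the WHERE condition are invented. ThreadPoolExecutor keeps the example portable, but in standard CPython the global interpreter lock prevents threads from speeding up CPU-bound scans, so a real speedup would need ProcessPoolExecutor or, more realistically, the RDBMS's own parallel query support.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical STUDENT rows: (StudentID, LastName, State)
rows = [(i, f"Name{i}", "GA" if i % 3 == 0 else "FL") for i in range(1, 10)]

def matches(row) -> bool:
    """Stand-in for the WHERE clause: WHERE State = 'GA'."""
    return row[2] == "GA"

def scan(partition):
    """Scan one slice of the table, keeping only matching rows."""
    return [row for row in partition if matches(row)]

def parallel_select(table, workers=3):
    size = -(-len(table) // workers)  # ceiling division: rows per worker
    partitions = [table[i:i + size] for i in range(0, len(table), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(scan, partitions))  # P1, P2, P3 each scan a slice
    return [row for part in partial for row in part]  # merge, preserving order

serial = scan(rows)               # one processor scans rows 1-9
parallel = parallel_select(rows)  # three workers scan rows 1-3, 4-6, 7-9
print(serial == parallel, [r[0] for r in parallel])  # True [3, 6, 9]
```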