IMS 4212: Database Implementation
Dr. Lawrence West, Management Dept., University of Central [email protected]
Physical Database Implementation—Topics
• Denormalization
• Partitioning Tables (relations)
• Parallel Processing & RAID
Denormalization
• Denormalizing is the process of reshuffling attributes and sometimes entities to create entities that violate the rules of normalization
• We are trading off (again) storage efficiency and anomaly avoidance for better retrieval efficiency
• Denormalizing includes:
– Storing derived attributes explicitly
– Allowing partial or transitive dependencies (violating second, third, or Boyce-Codd normal form)
– Merging entities in 1:1 relationships
Denormalization (cont.)
• Derived attributes
– Storing derived attributes is one of the most common means of improving processing efficiency
– How many tables/row examinations are avoided by storing total grade points and total credit hours with the STUDENT entity?
– What new operations must be introduced to keep the data current?
– Explicitly storing derived attributes gives rise to new operational business rules to enforce accuracy
STUDENT (StudentID, LastName, FirstName, TotCrHr, TotGP)
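One way to enforce the new operational rule is a trigger on the table that records individual grades. The sketch below uses Python's sqlite3 module. The ENROLLMENT columns CrHr and GradePoints, the trigger name, and the sample rows are assumptions made for illustration; they are not part of the original design.

```python
import sqlite3

def build_demo():
    """Create STUDENT with stored totals and a trigger that maintains them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE STUDENT (
            StudentID INTEGER PRIMARY KEY,
            LastName  TEXT,
            FirstName TEXT,
            TotCrHr   INTEGER NOT NULL DEFAULT 0,  -- derived: sum of credit hours
            TotGP     REAL    NOT NULL DEFAULT 0   -- derived: sum of grade points
        );
        -- Hypothetical detail table: one row per completed course.
        CREATE TABLE ENROLLMENT (
            StudentID   INTEGER REFERENCES STUDENT(StudentID),
            SectionID   INTEGER,
            CrHr        INTEGER,
            GradePoints REAL
        );
        -- Operational rule: every new grade row updates the stored totals.
        CREATE TRIGGER trg_enrollment_insert AFTER INSERT ON ENROLLMENT
        BEGIN
            UPDATE STUDENT
               SET TotCrHr = TotCrHr + NEW.CrHr,
                   TotGP   = TotGP   + NEW.GradePoints
             WHERE StudentID = NEW.StudentID;
        END;
    """)
    # Sample data, invented for the demonstration.
    conn.execute(
        "INSERT INTO STUDENT (StudentID, LastName, FirstName) VALUES (1, 'Doe', 'Pat')"
    )
    conn.executemany(
        "INSERT INTO ENROLLMENT VALUES (?, ?, ?, ?)",
        [(1, 101, 3, 12.0), (1, 102, 4, 12.0)],
    )
    return conn

def stored_totals(conn, student_id):
    """Single-row read: no ENROLLMENT rows are examined."""
    return conn.execute(
        "SELECT TotCrHr, TotGP FROM STUDENT WHERE StudentID = ?", (student_id,)
    ).fetchone()

def computed_totals(conn, student_id):
    """Normalized alternative: scan every ENROLLMENT row for the student."""
    return conn.execute(
        "SELECT SUM(CrHr), SUM(GradePoints) FROM ENROLLMENT WHERE StudentID = ?",
        (student_id,),
    ).fetchone()
```

A complete rule set would also need triggers for UPDATE and DELETE on ENROLLMENT; only the INSERT case is sketched here.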
Denormalization (cont.)
• 1:1 Relationships
– It may be possible to collapse data from one entity in a 1:1 relationship into the other.
– Usually the pervasive entity survives
– Alternatively, both entities may be retained but the data from one may be copied into the other to avoid a table look-up
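PROFESSOR (FacultyID, LastName, FirstName, Phone, Dept <FK>)
PROFESSOR Has OFFICE (1:1)
OFFICE (Building, Room, Window, Length, Width, LastPaint, FacultyID <FK>)
After copying the PROFESSOR data into OFFICE:
OFFICE (Building, Room, Window, Length, Width, LastPaint, FacultyID <FK>, LastName, FirstName, Phone, Dept)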
Denormalization (cont.)
• 1:M Relationships
– You may consider moving or duplicating attributes from the “one” side of a 1:M relationship into the “many” side
– This will result in considerable data duplication
– Considerations
• There should be many records on the “one” side
• Frequent access should be directly into the “many” side
Denormalization (cont.)
• 1:M Relationships (cont.)
– A similar technique may be used in a M:M relationship: collapse or copy attributes from one of the two entities into the associative entity between them
ENROLLMENT (SectionID <FK1>, StudentID <FK2>, Grade <FK3>, LastName, FirstName)
STUDENT (StudentID, LastName, FirstName, ...)
STUDENT Has ENROLLMENT
Denormalization (cont.)
• The goal of denormalizing is to avoid accessing a (large) table for high frequency critical transactions
• Denormalizing usually requires additional business rules to guarantee that data remains accurate in the face of updates
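As a rough illustration of such a business rule, the sketch below copies the student's name into ENROLLMENT and uses a trigger to push later name changes to every copy. It uses Python's sqlite3 module; the trigger name, the composite primary key, and the sample rows are assumptions for the example only.

```python
import sqlite3

def build_demo():
    """Create STUDENT and a denormalized ENROLLMENT in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE STUDENT (
            StudentID INTEGER PRIMARY KEY,
            LastName  TEXT,
            FirstName TEXT
        );
        CREATE TABLE ENROLLMENT (
            SectionID INTEGER,
            StudentID INTEGER REFERENCES STUDENT(StudentID),
            Grade     TEXT,
            LastName  TEXT,   -- duplicated from STUDENT
            FirstName TEXT,   -- duplicated from STUDENT
            PRIMARY KEY (SectionID, StudentID)
        );
        -- Business rule: a name change must reach every duplicated copy.
        CREATE TRIGGER trg_student_rename
        AFTER UPDATE OF LastName, FirstName ON STUDENT
        BEGIN
            UPDATE ENROLLMENT
               SET LastName = NEW.LastName, FirstName = NEW.FirstName
             WHERE StudentID = NEW.StudentID;
        END;
    """)
    # Sample data, invented for the demonstration.
    conn.execute("INSERT INTO STUDENT VALUES (1, 'Doe', 'Pat')")
    conn.execute("INSERT INTO ENROLLMENT VALUES (101, 1, 'A', 'Doe', 'Pat')")
    return conn

def class_roster(conn, section_id):
    """High-frequency query: reads ENROLLMENT alone, never STUDENT."""
    return conn.execute(
        "SELECT StudentID, LastName, FirstName FROM ENROLLMENT "
        "WHERE SectionID = ? ORDER BY LastName, FirstName",
        (section_id,),
    ).fetchall()
```

The roster query avoids the join to STUDENT, at the price of one extra write per enrollment row whenever a name changes.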
Partitioning
• Partitioning entities divides one table into many
– Horizontal partitioning
• Each table has all fields from the original table
• Each table has a subset of records
– Vertical partitioning
• Each table has the PK of the original table
• Each table has all records
• Each table has a subset of fields
– May partition both vertically and horizontally
• Very powerful technique with historical data
Partitioning (cont.)
• Horizontal Partitioning
– How many records in the STUDENT table?
– How many of them are currently enrolled?
– How frequently do we need to access both current and former students in the same query or operation?
– It may make sense to partition tables based on a historical context
• Active records vs. archived records
– May also partition based on geographic considerations
– Whole table can be reconstructed using UNION query
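A minimal sketch of the active/archive split, using Python's sqlite3 module. The table names STUDENT_ACTIVE and STUDENT_ARCHIVE, the view name STUDENT_ALL, and the sample rows are hypothetical. The view rebuilds the whole table with a UNION query.

```python
import sqlite3

COLUMNS = "StudentID, LastName, FirstName"

def build_demo():
    """Two partitions with identical columns plus a UNION view over both."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE STUDENT_ACTIVE  (StudentID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
        CREATE TABLE STUDENT_ARCHIVE (StudentID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
        -- Reconstructs the original, unpartitioned table on demand.
        CREATE VIEW STUDENT_ALL AS
            SELECT {COLUMNS} FROM STUDENT_ACTIVE
            UNION
            SELECT {COLUMNS} FROM STUDENT_ARCHIVE;
    """)
    # Sample data, invented for the demonstration.
    conn.executemany(
        "INSERT INTO STUDENT_ACTIVE VALUES (?, ?, ?)",
        [(1, "Doe", "Pat"), (2, "Roe", "Sam")],
    )
    conn.execute("INSERT INTO STUDENT_ARCHIVE VALUES (3, 'Poe', 'Lee')")
    return conn

def archive_student(conn, student_id):
    """Move one row from the active partition to the archive partition."""
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            f"INSERT INTO STUDENT_ARCHIVE SELECT {COLUMNS} "
            "FROM STUDENT_ACTIVE WHERE StudentID = ?",
            (student_id,),
        )
        conn.execute("DELETE FROM STUDENT_ACTIVE WHERE StudentID = ?", (student_id,))

def count_rows(conn, table):
    """Row count for a table or view (name is trusted, not user input)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Day-to-day queries touch only STUDENT_ACTIVE; the occasional query that needs former students as well reads STUDENT_ALL.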
Partitioning (cont.)
• Vertical Partitioning
– Librarian, Registrar, Athletic Department, and Health Center may all need a different subset of fields from the STUDENT entity
– It may make sense to create separate tables containing the necessary attributes for each view
– Common PK creates 1:1 cardinality between all tables
– Whole logical record can be assembled using SQL when needed
– We are actually backing into a supertype/subtype relationship
STUDENT (StudentID, LastName, FirstName, MiddleInitial)
STUDENTADDRESS (StudentID, HomeStreet, HomeCity, HomeState, HomeCountry, HomeZip, LocalStreet, LocalCity, LocalState, LocalCountry, LocalZip, LocalPhone, E_Mail)
STUDENTE_MAIL (StudentID, E_Mail)
STUDENTHEALTHREC (StudentID, BloodType, Height, Weight)
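The whole logical record can be assembled by joining the partitions on the shared PK. The sketch below does this with Python's sqlite3 module for three of the tables above; the column types, the sample rows, and the choice of LEFT JOIN (so that a student with no e-mail or health row still appears) are assumptions for illustration.

```python
import sqlite3

def build_demo():
    """Three vertical partitions of STUDENT sharing the StudentID key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE STUDENT (
            StudentID INTEGER PRIMARY KEY,
            LastName TEXT, FirstName TEXT, MiddleInitial TEXT
        );
        CREATE TABLE STUDENTE_MAIL (
            StudentID INTEGER PRIMARY KEY REFERENCES STUDENT(StudentID),
            E_Mail TEXT
        );
        CREATE TABLE STUDENTHEALTHREC (
            StudentID INTEGER PRIMARY KEY REFERENCES STUDENT(StudentID),
            BloodType TEXT, Height REAL, Weight REAL
        );
    """)
    # Sample data, invented for the demonstration.
    conn.execute("INSERT INTO STUDENT VALUES (1, 'Doe', 'Pat', 'Q')")
    conn.execute("INSERT INTO STUDENTE_MAIL VALUES (1, 'pat@example.edu')")
    conn.execute("INSERT INTO STUDENTHEALTHREC VALUES (1, 'O+', 170.0, 65.0)")
    conn.execute("INSERT INTO STUDENT VALUES (2, 'Roe', 'Sam', NULL)")
    return conn

def whole_record(conn, student_id):
    """Reassemble one logical STUDENT record from its partitions."""
    return conn.execute(
        """
        SELECT s.StudentID, s.LastName, s.FirstName, s.MiddleInitial,
               e.E_Mail, h.BloodType, h.Height, h.Weight
          FROM STUDENT AS s
          LEFT JOIN STUDENTE_MAIL    AS e ON e.StudentID = s.StudentID
          LEFT JOIN STUDENTHEALTHREC AS h ON h.StudentID = s.StudentID
         WHERE s.StudentID = ?
        """,
        (student_id,),
    ).fetchone()
```

Each department's routine queries read only its own narrow table; the join is paid only when the full record is needed.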
RAID Storage Devices
• In conventional drives data is laid down sequentially along a track in the disk
– Read/Write head must move along the track to read the data
– Each read/write operation must finish before the next can begin
– A drive failure can result in loss of all data
RAID Storage Devices
• RAID stands for Redundant Array of Inexpensive (or Independent) Disks
– Multiple disks appear as a single logical drive to the computer
– May be implemented in hardware or software (OS)
• Various RAID levels provide for different levels of performance and redundancy
• Most RAID levels enable the rebuilding of entire lost physical drives through parity storage
RAID Storage Devices: RAID 3
• Records are striped across multiple physical devices
– Part of each record is laid down across multiple physical drives
– Much faster Read/Write time since the disk rotation needed to read a whole record/block is much shorter
– However, only one request can be serviced at a time
– Not commonly used in practice
• A single parity disk allows reconstruction of data on damaged drives
* Image source: Wikipedia
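Single-parity RAID levels normally compute parity as the bitwise XOR of the data blocks in a stripe, which is what makes reconstruction possible. The sketch below shows the idea on small byte strings; the function names and the sample blocks are invented for illustration and do not correspond to any real RAID controller interface.

```python
from functools import reduce

def parity_block(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_block(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity.

    XOR-ing the parity with every surviving block cancels them out,
    leaving exactly the bytes of the lost block.
    """
    return parity_block(list(surviving_blocks) + [parity])

# Usage: three data drives and one parity drive; drive 1 fails.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = parity_block(data)
recovered = rebuild_block([data[0], data[2]], parity)
```

The same calculation works no matter which single drive is lost, but it cannot recover from two simultaneous failures.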
RAID Storage Devices: RAID 4
• Blocks are stored independently on the drives
– Block A1 can be serviced just by Drive 0
– Simultaneous requests for Blocks B2 or D3 can also be serviced
• A single parity drive enables recovery of lost data
• Write operations may be slower—simultaneous write operations to Drives 0-2 must wait on the parity calculation and writing on Drive 3
* Image source: Wikipedia
RAID Storage Devices: RAID 5
• Similar to RAID 4 except that parity storage is distributed across multiple drives
– Rotating allocation
– Lessens the chance that writes on two drives will wait on parity updates on a single parity drive
* Image source: Wikipedia
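The rotating allocation can be pictured with a small sketch. The rotation rule used here (parity starts on the last drive and moves one drive to the left with each stripe) is only one possible layout and is an assumption of this example; real controllers may rotate differently.

```python
def parity_drive(stripe, n_drives):
    """Drive index holding parity for a stripe, under the assumed rotation."""
    return (n_drives - 1 - stripe) % n_drives

def layout(n_stripes, n_drives):
    """Return one row per stripe, marking data blocks and the parity block."""
    rows = []
    for stripe in range(n_stripes):
        p = parity_drive(stripe, n_drives)
        letter = chr(ord("A") + stripe)  # stripe 0 -> A, stripe 1 -> B, ...
        row, block = [], 1
        for drive in range(n_drives):
            if drive == p:
                row.append("P")
            else:
                row.append(f"{letter}{block}")
                block += 1
        rows.append(row)
    return rows

# Usage: print a four-drive, four-stripe layout, one stripe per line.
for row in layout(4, 4):
    print(" ".join(f"{cell:>2}" for cell in row))
```

Because each stripe's parity lives on a different drive, writes to different stripes usually update parity on different drives instead of queuing on one.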
Parallel Processing
• More and more computers support parallel processing (multiple CPUs on the same computer)
• Some tasks can be split among multiple processors
• In an SQL SELECT query the usual method requires the RDBMS to scan each record to determine if it matches the WHERE clause or JOIN criteria
• In parallel processing, a different part of the table is passed to each processor, so the parts are scanned at the same time
• Availability depends on hardware, OS, and RDBMS
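The idea of dividing a scan can be sketched outside any RDBMS. The example below splits a list of rows into chunks and filters each chunk in a separate worker. Threads from Python's concurrent.futures stand in for processors here, which is an assumption of the sketch: a real RDBMS would schedule the work itself, and CPU-bound Python code would need processes rather than threads to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_chunk(rows, predicate):
    """Sequentially test every row in one chunk (the WHERE clause check)."""
    return [row for row in rows if predicate(row)]

def parallel_scan(rows, predicate, n_workers=3):
    """Split rows into up to n_workers chunks and scan the chunks concurrently."""
    chunk_size = max(1, -(-len(rows) // n_workers))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in chunk order, so row order is preserved.
        parts = list(pool.map(scan_chunk, chunks, [predicate] * len(chunks)))
    return [row for part in parts for row in part]

# Usage: nine records divided among three workers, three records each.
records = list(range(1, 10))
matches = parallel_scan(records, lambda r: r % 2 == 0)
```

With nine records and three workers, each worker examines three records instead of one processor examining all nine.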
[Figure: records 1 through 9 scanned by one processor, compared with the same records divided among processors P1, P2, and P3]