Chapter 2. Data Storage


Upload: burke-sanford

Post on 03-Jan-2016


Page 1: Chapter 2. Data Storage

Chapter 2 1

Chapter 2. Data Storage


Outline

• Memory hierarchy
• Hardware: Disks
• Access Times
• Example - Megatron 747
• Optimizations
• Disk failure
• RAIDs


[Figure: layers of a system, labeled Users, DBMS’s, Operating Systems, and Hardware - Data Storage]


The Memory Hierarchy

[Figure: the memory hierarchy, with labels Tertiary Storage, Disk (As Virtual Memory; File System), Main memory, Cache, DBMS, and Programs / Main-memory DBMS’s]


Cache

– The cache is an integrated circuit, or part of the processor’s chip
  • Holds data or machine instructions
  • Its contents are copies of main-memory locations
– If data being expelled from the cache has been modified, then the new value must be copied into main memory.
– Typical performance
  • Capacities up to a megabyte
  • Access time: 10 nanoseconds (10^-8 seconds)
  • Moving data between cache and main memory: 100 nanoseconds (10^-7 seconds)


Main Memory

• Everything that happens in the computer is resident in main memory

• Capacity: around 100 Mbyte to 10 Gbyte

• Random access
  – Typical access time is 10-100 nanoseconds


Virtual Memory

• Is a part of disk
• In a 32-bit address machine
  – Virtual memory grows up to 2^32 bytes (4 Gbytes)

• Data is moved between disk and main memory in entire blocks, which are also called pages in main memory

• Main-memory database systems


Secondary Storage (1)

• Slower, more capacious than main memory

• Random access
• Magnetic, optical, and magneto-optical disks
• Disk reads and writes are done by moving a chunk of bytes called a block (or page)
[Figure: file and buffer]


Secondary Storage (2)

• Accessing a block: 10-30 milliseconds

• Recently, one disk unit can store from 10 to 32 Gbytes of data

• A machine can have several disk units


Tertiary Storage (1)

• Developed to hold data volumes measured in terabytes
• Compared with secondary storage, it offers

– Higher read/write times

– Larger capacities and smaller cost per byte

• Not random access in general


Tertiary Storage (2)

• Kinds of tertiary storage devices
  – Ad-hoc tape storage
  – Optical-disk juke boxes: CD-ROMs
  – Tape silo: an automated version of ad-hoc tape storage
• Capacities
  – CD: 2/3 Gbytes, 2.3 Gbytes
  – Tapes: 50 Gbytes

• Access time: about 1000 times slower than secondary memory


Volatile and Nonvolatile

• Volatile vs. nonvolatile storage

• Flash memory
  – A form of main memory
  – Nonvolatile
  – Becoming economical
• RAM disk
  – A battery-backed main memory


Access Time vs. Capacity

[Figure: access time vs. capacity. X axis: access time, 10^x seconds, for x from 2 down to -9. Y axis: capacity, 10^y bytes, for y from 5 to 13. Regions plotted: Cache, Main, Secondary (with floppy disk and zip disk), Tertiary]


Moore’s Law

• Gordon Moore observed that the following improve by a factor of 2 about every 18 months
  – The speed of processors, i.e., the number of instructions executed per second, and the ratio of the speed to the cost of a processor
  – The cost of main memory per bit and the number of bits that can be put on one chip
  – The cost of disk per bit and the number of bytes that a disk can hold
• Not applicable to
  – Main-memory access time, disk access time


Disks

Terms: Platter, Head, Actuator, Cylinder, Track, Sector, Block, Gap

[Figure: a typical disk]


Disks: A Top View

• Cylinder, Track, Sector, Gap

• Gaps often represent about 10% of the total track

• An entire sector cannot be used if a portion of it is destroyed

• Typically a block consists of one or more sectors.

[Figure: top view of a disk surface]


The Disk Controller

[Figure: Processor, Main Memory, and Disk Controller attached to a Bus; the Disk Controller is connected to the Disks]

• Controls one or more disk drives

– controlling the mechanical actuator

– selecting a surface or a sector on that surface

– Transferring bits via a data bus


Disk Storage Characteristics (as of 1999)

• Rotation speed of the disk assembly
  – 5400 RPM (one rotation every 11 milliseconds)
• Number of platters per unit
  – Typical disk drive: 5 platters (10 surfaces)
  – Floppy/zip disk: 1 platter (2 surfaces)
• Number of tracks per surface
  – As many as 10,000 tracks
  – 3.5-inch diskette: 40 tracks
• Number of bytes per track
  – Common disk: 10^5 or more bytes
  – 3.5-inch diskette: 150K


Megatron 747 Disk (1)

• Characteristics
  – 4 platters (8 surfaces)
  – 8192 (2^13) tracks per surface
  – On average 256 (2^8) sectors per track
  – 512 (2^9) bytes per sector
  – Diameters of tracks
    • outermost track: 3.5 inches
    • innermost track: 1.5 inches
  – A track consists of two parts
    • gaps: 10%
    • data: 90%


Megatron 747 Disk (2)

• The capacity of the disk
  – 8 surfaces * 8192 tracks * 256 sectors * 512 bytes = 8 Gbytes
• A single track on average
  – 256 sectors * 512 bytes = 128 Kbytes = 1 Mbit
• A cylinder holds 1 Mbyte on average
• If a block is 4096 (2^12) bytes
  – A block uses 8 sectors (= 4096 bytes / 512 bytes)
  – A track consists of 32 blocks (= 256 sectors / 8)
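The capacity arithmetic above can be checked with a few lines of Python. This is only a sketch of the slide's calculation; the constant and variable names are chosen here for illustration.

```python
# Megatron 747 geometry, as given on the slides.
SURFACES = 8
TRACKS_PER_SURFACE = 8192
SECTORS_PER_TRACK = 256      # average over the disk
BYTES_PER_SECTOR = 512
BLOCK_BYTES = 4096

track_bytes = SECTORS_PER_TRACK * BYTES_PER_SECTOR      # one track
cylinder_bytes = SURFACES * track_bytes                 # one track per surface
disk_bytes = TRACKS_PER_SURFACE * cylinder_bytes        # whole disk
sectors_per_block = BLOCK_BYTES // BYTES_PER_SECTOR
blocks_per_track = SECTORS_PER_TRACK // sectors_per_block

print(f"track:    {track_bytes // 2**10} Kbytes")
print(f"cylinder: {cylinder_bytes // 2**20} Mbyte")
print(f"disk:     {disk_bytes // 2**30} Gbytes")
print(f"block:    {sectors_per_block} sectors, {blocks_per_track} blocks per track")
```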


Megatron 747 Disk (3)

– If each track had the same number of sectors (i.e., 256), the density of bits would be greater on the inner tracks than on the outer tracks
  • Length of the data portion of the outermost track
    – 0.9 * 3.5 * π ≒ 9.9 inches
    – 1 megabit / 9.9 ≒ 100,000 bits per inch
  • Length of the data portion of the innermost track
    – 0.9 * 1.5 * π ≒ 4.2 inches
    – 1 megabit / 4.2 ≒ 250,000 bits per inch
– Tracks in the Megatron 747 therefore have different numbers of sectors
  • outer: 320 sectors
  • middle: 256 sectors
  • inner: 192 sectors
  • The outermost track
    – 1,801,800 bits / 9.9 ≒ 182,000 bpi
  • The innermost track
    – 478,800 bits / 4.2 ≒ 114,000 bpi


The Latency of The Disk

• Disk access time
  – seek time
  – rotational delay
  – transfer time
  – others

[Figure: disk access time is the interval between the request "I want block X" and block X being in memory]


Seek Time

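• The time to position the head assembly at the proper cylinder
  – 0 (zero) if the heads are already at the proper cylinder
  – Otherwise, the time to move the heads to the proper cylinder

[Figure: seek time as a function of cylinders traveled (1 to Max); the time starts at x and reaches a value in the range 3x or 20x]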


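Rotational Latency

• The time for the disk to rotate until the first of the sectors containing the block reaches the head

• One rotation takes 10 ms, so the rotational latency is 5 ms on average.

[Figure: the head at one position on the track ("Head Here") and the wanted block at another ("Block I Want")]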


Transfer Time/Other delays

• Transfer time
  – the time to read or write the data on the appropriate disk surface
  – 10 Mbytes per second
• Other delays (neglected here)
  – time taken by the processor and disk controller
  – delays due to contention for the disk controller
  – other delays due to contention


Modifying Blocks

• Not possible to modify a block on disk directly

• Sequence of procedures
  – Read block (time: rt)
  – Modify in memory (time: mt)
  – Write block (time: wt)
  – Verify (time: vt) if appropriate
• Total time
  – rt + mt + wt + vt


Example 2.3 (1)

• Let us examine the time to read a 4096-byte block from the Megatron 747 disk

• Characteristics
  – 4 platters (8 surfaces), 1 surface = 8192 tracks
  – 1 track = 256 sectors, 1 sector = 512 bytes
  – Disk rotates at 3840 RPM, one rotation = 1/64 of a second
  – To move the head assembly
    • 1 ms (to start and stop) + 1 ms for every 500 cylinders
  – Heads move one track in 1.002 ms
  – To move heads from innermost to outermost track
    • 1 + (8192 / 500) = 17.4 ms


Example 2.3 (2)

• Minimum time (the best case)
  – No seek time, no rotational latency, only transfer time
  – Note: 1 track = 256 sectors, 1 sector = 512 bytes
  – 4096 bytes / 512 bytes = 8 sectors (including the 7 gaps between them)
  – gaps/sectors occupy 10%/90% of the track
  – A track has 256 gaps and 256 sectors
  – 36 * 7/256 + 324 * 8/256 = 11.109 degrees
  – (11.109/360)/64 = 4.8e-4 seconds ≒ 0.5 ms


Example 2.3 (3)

• Maximum time (the worst case)
  – full seek time and rotational latency, plus transfer time
  – full seek time: 17.4 ms
  – full rotational time: 1/64 of a second = 15.6 ms
  – transfer time: 0.5 ms
  – 17.4 + 15.6 + 0.5 = 33.5 ms


Example 2.3 (4)

• Average time
  – Transfer time: 0.5 ms
  – Average rotational time: half of a full rotation = 7.8 ms
  – Average seek time
    • average distance traveled = 1/3 of the disk = 2730 cylinders
    • 1 + 2730/500 = 6.5 ms
  – 0.5 + 7.8 + 6.5 = 14.8 ms

[Figure: average travel (0 to 4096 cylinders) as a function of the starting track (0 to 8192)]
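The best, worst, and average cases of Example 2.3 can be reproduced with a short Python sketch. The function names `seek_ms` and `transfer_ms` are chosen here for illustration. The slides round each term before adding, so the unrounded totals below differ from the slide figures in the second decimal place.

```python
ROTATION_MS = 1000 / 64        # 3840 RPM = 64 rotations per second
CYLINDERS = 8192
SECTORS_PER_TRACK = 256

def seek_ms(cylinders_traveled):
    """0 if the heads stay put, else 1 ms plus 1 ms per 500 cylinders."""
    if cylinders_traveled == 0:
        return 0.0
    return 1 + cylinders_traveled / 500

def transfer_ms(sectors=8):
    """Time for the sectors of one block, and the gaps between them, to pass the head."""
    gap_degrees = 36 / SECTORS_PER_TRACK       # gaps cover 10% of 360 degrees
    sector_degrees = 324 / SECTORS_PER_TRACK   # sectors cover the other 90%
    degrees = sectors * sector_degrees + (sectors - 1) * gap_degrees
    return degrees / 360 * ROTATION_MS

best = transfer_ms()
worst = seek_ms(CYLINDERS) + ROTATION_MS + transfer_ms()
average = seek_ms(CYLINDERS // 3) + ROTATION_MS / 2 + transfer_ms()

print(f"best    {best:.2f} ms")
print(f"worst   {worst:.2f} ms")
print(f"average {average:.2f} ms")
```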


RAM model vs. I/O model computation

• I/O model computation
  – Dominance of I/O cost

• Remember, 10^5 to 10^6 in-memory operations take the same time as one disk I/O

• Should minimize the number of block accesses

• Data Structure vs. File Processing


Using Secondary Storage Effectively

• In a general database
  – Whole databases are much too large to fit in main memory
  – Key parts of databases are buffered in main memory
  – Disk I/O’s occur frequently

• Main memory sorts (such as “Quick sort”) are inadequate


Merge Sort

Step    List 1        List 2        Output
start   1, 3, 4, 9    2, 5, 7, 8    none
1)      3, 4, 9       2, 5, 7, 8    1
2)      3, 4, 9       5, 7, 8       1,2
3)      4, 9          5, 7, 8       1,2,3
4)      9             5, 7, 8       1,2,3,4
5)      9             7, 8          1,2,3,4,5
6)      9             8             1,2,3,4,5,7
7)      9             none          1,2,3,4,5,7,8
8)      none          none          1,2,3,4,5,7,8,9
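The merge traced in the table can be written as a short Python function. This is a minimal sketch; the name `merge` is chosen here for illustration.

```python
def merge(list1, list2):
    """Merge two sorted lists by repeatedly taking the smaller front element."""
    output = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            output.append(list1[i])
            i += 1
        else:
            output.append(list2[j])
            j += 1
    # One list is exhausted; the rest of the other is already in order.
    output.extend(list1[i:])
    output.extend(list2[j:])
    return output

print(merge([1, 3, 4, 9], [2, 5, 7, 8]))
```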


Two-Phase, Multiway Merge-Sort (1)

• Phase 1

– Sort main-memory-sized pieces of the data

• Fill all available main memory with blocks

• Sort the records in main memory

• Write the sorted records


Two-Phase, Multiway Merge-Sort (2)

• Phase 2
  – Merge all the sorted sublists into a single sorted list
    • Find the smallest key among the first remaining elements of all the lists
    • Move the smallest element to the first available position of the output block
    • If the output block is full, write it to disk and reinitialize the same buffer
    • Repeat until all input blocks are exhausted.
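The two phases can be sketched in Python as an in-memory simulation. Everything here is an assumption made for illustration: Python lists stand in for disk, `memory_records` stands in for the amount of main memory, and a heap (`heapq`) is used to find the smallest first remaining element, which is one convenient way to do the selection the slide describes rather than the only one.

```python
import heapq

def two_phase_merge_sort(records, memory_records, block_records):
    """Simulate two-phase, multiway merge-sort on a list of keys."""
    # Phase 1: fill "main memory", sort it, write it out as a sorted sublist.
    sublists = [sorted(records[i:i + memory_records])
                for i in range(0, len(records), memory_records)]

    # Phase 2: the heap holds the first remaining element of every sublist.
    heap = [(sub[0], k, 0) for k, sub in enumerate(sublists)]
    heapq.heapify(heap)
    disk_output = []      # stands in for the output file on disk
    output_block = []     # the single output buffer
    while heap:
        key, k, pos = heapq.heappop(heap)       # smallest remaining key
        output_block.append(key)
        if len(output_block) == block_records:  # block full: write and reuse
            disk_output.extend(output_block)
            output_block = []
        if pos + 1 < len(sublists[k]):          # advance within sublist k
            heapq.heappush(heap, (sublists[k][pos + 1], k, pos + 1))
    disk_output.extend(output_block)            # flush the partial last block
    return disk_output

data = [9, 3, 7, 1, 8, 2, 5, 4, 6, 0]
print(two_phase_merge_sort(data, memory_records=4, block_records=3))
```

With `memory_records=4`, the ten keys form three sorted sublists in phase 1, which phase 2 merges in a single pass.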


Main-memory Organization

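[Figure: main-memory organization for the merge. Input buffers, one for each sorted list, each with a pointer to its first unchosen record; the smallest unchosen record is selected for output and moved to the output buffer]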


Merge Sort Example (1)

• Assumptions
  – 10,000,000 tuples, 1 tuple = 100 bytes
  – So, 1 Gbyte of data
  – 50 Mbytes of memory available
  – 4096-byte blocks, so each block contains 40 records
  – Total # of blocks: 250,000
  – # of blocks in main memory: 12,800 (= 50 * 2^20 / 2^12)
  – Number of sublists
    • 19 sublists (12,800 blocks each) + 1 sublist (6,800 blocks)
  – Each block read or write: 15 ms


Merge Sort Example (2)

• Computation
  – First phase
    • Read each of the 250,000 blocks once
    • Write 250,000 new blocks
    • Total time
      – (250,000 * 15 ms) * 2 = 7500 seconds = 125 minutes
  – Second phase
    • Similar to the first phase
    • Total time: 125 minutes
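The figures in this example follow directly from the stated assumptions, as the following Python sketch shows. The variable names are chosen here for illustration.

```python
TUPLES = 10_000_000
TUPLE_BYTES = 100
BLOCK_BYTES = 4096
MEMORY_BYTES = 50 * 2**20     # 50 Mbytes
IO_MS = 15                    # one block read or write

records_per_block = BLOCK_BYTES // TUPLE_BYTES      # whole records only
total_blocks = TUPLES // records_per_block
memory_blocks = MEMORY_BYTES // BLOCK_BYTES
full_sublists, last_sublist_blocks = divmod(total_blocks, memory_blocks)

# Each phase reads every block once and writes every block once.
phase_minutes = total_blocks * 2 * IO_MS / 1000 / 60

print(records_per_block, total_blocks, memory_blocks)
print(full_sublists, "full sublists plus one of", last_sublist_blocks, "blocks")
print(phase_minutes, "minutes per phase")
```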


Improving the Access Time of Secondary Storage

• Place blocks on the same cylinder

• Divide the data among several small disks

• Mirror disks

• Use a disk-scheduling algorithm

• Prefetch blocks to main memory in anticipation of their later use


Organizing Data by Cylinders

• Use several adjacent cylinders

• Read all the blocks on a single track or on a cylinder consecutively

• Neglect all but the first seek time and the first rotational latency


Example 2.9 (1)

• Recall Examples 2.3 and 2.7
• The original data may be stored on consecutive cylinders
• Total # of cylinders: 1000 (= 1 Gbyte / 1 Mbyte)
• Main memory can hold 50 cylinders (i.e., 50 Mbytes)
• To read 50 cylinders of data into main memory
  – 6.5 ms for average seek time
  – 49 ms for 49 one-cylinder seeks (1 ms each)
  – 6.4 seconds for transfer of 12,800 blocks
    • (12,800 * 0.5 ms) / 1000 = 6.4 seconds
  – So, 6.5 + 49 + 6,400 = 6455.5 ms


Example 2.9 (2)

• First phase
  – Read
    • (6.5 ms + 49 ms + 6.4 seconds) * 20 times = 2.15 minutes
  – Write: the same as reading
  – Total time: 4.3 minutes
• Second phase
  – Still takes about 125 minutes (WHY ?)
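The first-phase time with cylinder-organized data can be recomputed as follows. This is a sketch of the arithmetic in Example 2.9 only; the variable names are chosen here for illustration.

```python
AVG_SEEK_MS = 6.5           # one average seek to reach the first cylinder
ONE_CYLINDER_SEEK_MS = 1.0  # moving to the adjacent cylinder
TRANSFER_MS = 0.5           # per 4096-byte block
MEMORY_CYLINDERS = 50       # main memory holds 50 cylinders
MEMORY_BLOCKS = 12_800
MEMORY_FILLS = 20           # 1000 cylinders / 50 cylinders per fill

fill_ms = (AVG_SEEK_MS
           + (MEMORY_CYLINDERS - 1) * ONE_CYLINDER_SEEK_MS
           + MEMORY_BLOCKS * TRANSFER_MS)
read_minutes = fill_ms * MEMORY_FILLS / 60_000
phase1_minutes = 2 * read_minutes       # writing costs the same as reading

print(f"one fill: {fill_ms} ms")
print(f"read: {read_minutes:.2f} min, phase 1: {phase1_minutes:.2f} min")
```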


Using Multiple Disks in place of One

• Use several disks with independent heads

• Transfer data at a higher rate

• Roughly speaking, total time could be divided by the number of disks


Example 2.10 (1)

• Replace one 747 by four 737’s which have one platter and two surfaces

• Assumption– Divide the given records among the four disks

– Occupy 1000 adjacent cylinders on each disk

– Fill ¼ of main memory from each disk

– Recall previous examples

• Average seek time and rotational latency: 0

• Number of full memory blocks: 12,800

– ¼ memory size: 3,200 blocks


Example 2.10 (2)

• Computation
  – First phase
    • Transfer time: 3200 * 0.5 ms = 1.6 seconds
    • Read: (6.5 ms + 49 ms + 1.6 seconds) * 20 = 33 sec.
    • Write: similar to reading
    • Total time: about 1 minute


Example 2.10 (3)

• Second phase
  – Apply delicate techniques (?) to reduce disk I/O time
    • Start comparisons among the 20 lists as soon as the first element of each block appears in main memory
    • Use four output buffers
    • …
  – Total time: about 1 hour (?)


Mirroring Disks

• Two or more disks hold identical copies of the data

• The data survives a head crash on either disk

• If we make n copies of a disk, we can read any n blocks in parallel.

• Using mirror disks does not speed up writing, but neither does it slow writing down (to some extent)


Scheduling Requests by the Elevator Algorithm

• The disk controller chooses which of several requests to execute first, to increase throughput

• Elevator algorithm
  – Keep moving in the same direction, stopping at the next cylinder with blocks to access
  – When there are no requests ahead in the direction of travel, reverse direction


Example 2.11

Arrival times for six block-access requests

Cylinder of request    First time available (ms)
1000                   0
3000                   0
7000                   0
2000                   20
8000                   30
5000                   40

Finishing times for block accesses using the elevator algorithm

Cylinder of request    Time completed (ms)
1000                   8.3
3000                   21.6
7000                   38.9
8000                   50.2
5000                   65.5
2000                   80.8

Finishing times for block accesses using the first-come-first-served algorithm

Cylinder of request    Time completed (ms)
1000                   8.3
3000                   21.6
7000                   38.9
2000                   58.2
8000                   79.5
5000                   94.8
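Both schedules in this example can be reproduced with a small simulation. The sketch below rests on assumptions that the tables imply but the slide does not spell out: the heads start at cylinder 1000, times are in milliseconds, and every access costs the Megatron 747 seek time (1 ms plus 1 ms per 500 cylinders, or 0 if the heads do not move) plus 8.3 ms for average rotational latency and transfer. The function names are chosen here for illustration.

```python
ROTATE_AND_TRANSFER_MS = 8.3   # 7.8 ms average latency + 0.5 ms transfer

def access_ms(head, cylinder):
    """Seek from `head` to `cylinder`, then rotate and transfer one block."""
    distance = abs(cylinder - head)
    seek = 0.0 if distance == 0 else 1 + distance / 500
    return seek + ROTATE_AND_TRANSFER_MS

# (cylinder, arrival time in ms), in order of arrival
REQUESTS = [(1000, 0), (3000, 0), (7000, 0), (2000, 20), (8000, 30), (5000, 40)]

def first_come_first_served(requests, head=1000):
    now, finished = 0.0, []
    for cylinder, arrival in requests:
        now = max(now, arrival) + access_ms(head, cylinder)
        head = cylinder
        finished.append((cylinder, round(now, 1)))
    return finished

def elevator(requests, head=1000):
    pending, now, direction, finished = list(requests), 0.0, +1, []
    while pending:
        ready = [r for r in pending if r[1] <= now]
        if not ready:                        # idle until the next arrival
            now = min(r[1] for r in pending)
            continue
        ahead = [r for r in ready if (r[0] - head) * direction >= 0]
        if not ahead:                        # nothing ahead: reverse direction
            direction = -direction
            continue
        nearest = min(ahead, key=lambda r: abs(r[0] - head))
        now += access_ms(head, nearest[0])
        head = nearest[0]
        pending.remove(nearest)
        finished.append((nearest[0], round(now, 1)))
    return finished

print(elevator(REQUESTS))
print(first_come_first_served(REQUESTS))
```

The elevator schedule serves cylinder 8000 on the way up before turning back for 5000 and 2000, which is why it finishes earlier than first-come-first-served.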


Prefetching Data on Track- or Cylinder-sized Chunks

• Can we predict the order in which blocks will be requested from disk ?

• For example,
  – Devote two block buffers to each list being merged (when there is plenty of memory)
  – When a buffer is exhausted, switch to the other buffer for the same list


Single Buffering

• Single buffering

  1) Read B1 into the buffer
  2) Process the data in the buffer
  3) Read B2 into the buffer
  4) Process the data in the buffer
  ...

• Computation
  – P = time to process one block
  – R = time to read in one block
  – n = # of blocks
  – Single buffer time = n(P+R)


Single Buffering vs. Double Buffering

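[Figure: single vs. double buffering. The disk holds blocks A through G; memory holds two buffers, so one block is processed while the next is read (A is processed while B is read, then B while C is read)]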


Double Buffering

• Computation

– P = processing time/block

– R = IO time/block

– n = # of blocks

– Double buffering time: R + nP

– Single buffering time: n(R+P)
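The two formulas can be compared with a small Python sketch. The double-buffering figure R + nP holds when processing a block takes at least as long as reading one (P ≥ R); the function below uses the slower of the two per step so that it also covers P < R, which is a generalization added here and not stated on the slide. The function names are chosen for illustration.

```python
def single_buffer_time(n, P, R):
    """Read and process strictly alternate: n(P + R)."""
    return n * (P + R)

def double_buffer_time(n, P, R):
    """First read, then n-1 overlapped steps, then the last processing step.

    Each overlapped step takes max(P, R); with P >= R this equals R + nP.
    """
    return R + (n - 1) * max(P, R) + P

n, P, R = 10, 3, 2
print(single_buffer_time(n, P, R))   # n(P + R)
print(double_buffer_time(n, P, R))   # R + nP when P >= R
```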


Prefetching

• Combine prefetching with the cylinder-based strategy
  – Store the sorted sublists on whole, consecutive cylinders
  – Read whole tracks or cylinders whenever we need some records from a given list


Example 2.14 (1)

• Consider the second phase of the sort
• Have in main memory two track-sized buffers for each list
  – A track: 128KB
  – Total space requirement: 128KB * 20 lists * 2 = 5 Mbytes
  – Read all the blocks on 1000 cylinders (8000 tracks)
  – Computation
    • average seek time: 6.5 ms
    • time for the disk to rotate once: 15.6 ms
    • total time (for reading): (6.5 + 15.6) ms * 8000 = 2.95 minutes


Example 2.14 (2)

• Have in main memory two cylinder-sized buffers per sorted sublist
  – 1 cylinder = 8 tracks = 128K * 8 = 1 Mbyte
  – Use 40 buffers of a megabyte each
  – 50 megabytes of main memory are available
  – Need only do a seek once per cylinder
  – Read all the blocks on 1000 cylinders (8000 tracks)
  – Total time (for reading)
    • (6.5 + 8 * 15.6) ms * 1000 cylinders = 2.19 minutes
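The two reading times in Example 2.14 can be recomputed side by side. This is a sketch of the slide arithmetic only; the variable names are chosen here for illustration.

```python
AVG_SEEK_MS = 6.5
ROTATION_MS = 15.6          # one full rotation reads one whole track
TRACKS = 8000
CYLINDERS = 1000
TRACKS_PER_CYLINDER = 8

# Track-sized buffers: one seek and one rotation per track.
track_buffer_minutes = (AVG_SEEK_MS + ROTATION_MS) * TRACKS / 60_000

# Cylinder-sized buffers: one seek, then eight rotations, per cylinder.
cylinder_buffer_minutes = (
    (AVG_SEEK_MS + TRACKS_PER_CYLINDER * ROTATION_MS) * CYLINDERS / 60_000
)

print(f"track-sized buffers:    {track_buffer_minutes:.2f} minutes")
print(f"cylinder-sized buffers: {cylinder_buffer_minutes:.2f} minutes")
```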


Block Size Selection

• Big blocks amortize I/O cost

• Big blocks read in more useless data and take longer to read

• As memory prices drop, blocks get bigger…


Disk Failures

• Intermittent failure
  – An attempt to read or write a sector is unsuccessful, but with repeated tries we are able to read or write successfully.
• Media decay
  – A bit or bits are permanently corrupted, and the sector becomes unreadable.
• Write failure
  – We can neither write successfully nor retrieve the previously written sector.
• Disk crash
  – The disk becomes permanently unreadable.


Checksums (1)

• Each sector has additional bits, called the checksum, used to check read and write operations

• A read returns a pair (w, s)
  – w: the data that is read
  – s: a status bit

• A simple form of checksum: parity


Checksums (2)

• Example 1 (even parity)
  – The sequence of bits in a sector: 01101000
  – The parity bit is 1
  – Data becomes 011010001

• Example 2 (even parity)
  – The sequence of bits in a sector: 11101110
  – The parity bit is 0
  – Data becomes 111011100
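Both examples can be checked with a few lines of Python. This is a minimal sketch that represents a sector as a string of '0' and '1' characters; the function names are chosen here for illustration.

```python
def even_parity_bit(bits):
    """Return the bit that makes the total number of 1's even."""
    return str(bits.count("1") % 2)

def with_parity(bits):
    """Append the even-parity bit to the sector contents."""
    return bits + even_parity_bit(bits)

def parity_ok(stored):
    """A stored sector passes the check if it has an even number of 1's."""
    return stored.count("1") % 2 == 0

print(with_parity("01101000"))
print(with_parity("11101110"))
print(parity_ok("011010001"))   # unmodified sector
print(parity_ok("111010001"))   # one bit flipped
print(parity_ok("101010001"))   # two bits flipped
```

The last call shows the limit of a single parity bit: flipping two bits leaves the count of 1's even, so the error goes undetected.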


Checksums (3)

• An error may go undetected if more than one bit of the sector is corrupted

• If we use n independent bits as a checksum, then the chance of missing an error is only 1/2^n (WHY ?)


Stable Storage (1)

• How to correct errors ?

• Stable storage is a technique for organizing a disk so that media decay or failed writes do not result in permanent loss.
  – The general idea is that sectors are paired, and each pair represents one sector-contents X
  – The pair consists of the left (XL) and right (XR) copies


Stable Storage (2)

• Writing policy
  – Write the value of X into XL
    • if the status is good, the write of XL is done
    • if the status is bad, repeat the write
    • if it still fails after a number of tries, assume a media failure in the sector
  – Repeat the above scheme for XR

• Reading policy (to obtain the value of X)
  – Read XL
    • if status bad is returned, repeat the read
    • if status good is returned, take that value as X
  – If XL cannot be read, repeat the above with XR


Recovery from Disk Crashes

• Disk crash is fatal in mission-critical applications

• RAID (redundant arrays of independent disks)
  – Here, we discuss levels 1, 4, 5, and 6

– These RAID schemes also handle failures discussed previously


The Failure Model of Disks

• Mean time to failure represents the length of time by which 50% of a population of disks will have failed catastrophically.
  – For modern disks, it is about 10 years

[Figure: fraction of disks surviving as a function of time]


RAID Level 1

• To protect against data loss
  – Use mirroring disks

• The only way data can be lost is if there is a second disk crash while the first crash is being repaired.


How often will a data loss occur?

• Assume
  – The process of replacing the failed disk
    • takes 3 hours = 1/8 day = 1/2920 year
  – A failure rate of 10% per year for each disk (mean time to failure of 10 years)

• Probability that the mirror disk will fail during copying
  – (1/10) * (1/2920) = 1/29,200

• Mean time to a failure involving data loss
  – One of the two disks will fail once in 5 years on average
    • 5 * 29,200 = 146,000 years


RAID Level 4 (1)

• Use one redundant disk no matter how many data disks there are

• In the redundant disk, the ith block consists of parity checks for the ith blocks of all the data disks

• Use the modulo-2 sum (even parity)

Data disks
  disk 1: 11110000
  disk 2: 10101010
  disk 3: 00111000

Redundant disk
  disk 4: 01100010


The Algebra of Modulo-2 Sums

• The commutative law
  – x ⊕ y = y ⊕ x

• The associative law
  – x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z

• The all-0 vector of the appropriate length, Ō, is the identity for ⊕
  – x ⊕ Ō = Ō ⊕ x = x

• ⊕ is its own inverse
  – x ⊕ x = Ō
  – If x ⊕ y = z, then y = x ⊕ z


RAID Level 4: Reading (2)

• Read disks normally.
• We could also make use of the redundant disk!
  – Example
    • read disks 2, 3, and 4, and get the contents of disk 1 using the modulo-2 sum.

  disk 2: 10101010
  disk 3: 00111000
  disk 4: 01100010

  disk 1: 11110000
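The modulo-2 sum is a bitwise exclusive-or, so both computing the redundant block and reconstructing a data block from the others take one line each. The sketch below assumes blocks are represented as bit strings; `xor_blocks` is a helper name chosen here for illustration.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Modulo-2 sum (bitwise XOR) of equal-length bit strings."""
    width = len(blocks[0])
    value = reduce(lambda a, b: a ^ b, (int(block, 2) for block in blocks))
    return format(value, f"0{width}b")

disk1, disk2, disk3 = "11110000", "10101010", "00111000"

# The redundant block is the modulo-2 sum of the data blocks.
disk4 = xor_blocks(disk1, disk2, disk3)

# Any one block equals the modulo-2 sum of all the others.
recovered_disk1 = xor_blocks(disk2, disk3, disk4)

print(disk4)
print(recovered_disk1)
```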


RAID Level 4: Writing (3)

• When a block is written, we need to change the redundant disk

• Naïve approach
  – N-1 reads of the blocks not being rewritten
  – One write of the new block
  – One rewrite of the redundant disk
  – In total, N+1 disk I/O’s

• There is a better way to do that !


Writing Example (4)

• When disk 2 changes from 10101010 to 11001100

Before the write
  disk 1: 11110000
  disk 2: 10101010
  disk 3: 00111000
  disk 4: 01100010 (redundant)

Modulo-2 sum of old and new bits of disk 2: 01100110

After the write
  disk 1: 11110000
  disk 2: 11001100
  disk 3: 00111000
  disk 4: 00000100 (redundant)

Modulo-2 sum of the old redundant disk and the modulo-2 sum of disk 2’s old and new bits: 00000100
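The update in this example touches only disk 2 and the redundant disk, and can be sketched in Python as follows. Blocks are represented as bit strings, and `xor` is a helper name chosen here for illustration.

```python
def xor(a, b):
    """Modulo-2 sum of two equal-length bit strings."""
    return format(int(a, 2) ^ int(b, 2), f"0{len(a)}b")

old_disk2 = "10101010"
new_disk2 = "11001100"
old_redundant = "01100010"

# The bits that changed on disk 2 ...
change = xor(old_disk2, new_disk2)
# ... are exactly the bits that must flip on the redundant disk.
new_redundant = xor(old_redundant, change)

print(change)
print(new_redundant)

# Cross-check against recomputing parity from all three data disks.
full_recompute = xor(xor("11110000", new_disk2), "00111000")
print(full_recompute == new_redundant)
```

This takes two reads (old disk 2, old redundant block) and two writes, regardless of the number of data disks.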


RAID Level 4: Failure Recovery (5)

• Recomputing any missing data is simple, and does not depend on which disk (data or redundant) has failed.


RAID Level 5

• We could treat each disk as the redundant disk for some of the blocks
  – That is, we do not have to treat one disk as the redundant disk and the others as data disks

• When there are n+1 disks (disk 0 to disk n)
  – If i mod (n+1) = j, then we can treat the ith cylinder of disk j as redundant


Example 2.21 (1)

• Which blocks is each disk redundant for, with 4 disks (n = 3)?
  – Disk 0
    • redundant for blocks 4, 8, 12, …
  – Disk 1
    • redundant for blocks 1, 5, 9, …
  – Disk 2
    • redundant for blocks 2, 6, 10, …
  – Disk 3
    • redundant for blocks 3, 7, 11, …


Example 2.21 (2)

• The reading and writing load for each disk is the same
  – If all blocks are equally likely to be written
    • each disk has a 1/4 chance of holding the block being written
  – If not
    • each disk has a 1/3 chance of being the redundant disk for that block
  – Each of the four disks is involved in 1/2 of the writes
    • 1/4 + 3/4 * 1/3 = 1/2
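The RAID level 5 assignment rule and the write-load figure can both be checked in Python. This is a sketch under the slide's assumptions (4 disks, all blocks equally likely to be written); the function name `redundant_disk` is chosen here for illustration.

```python
from fractions import Fraction

def redundant_disk(i, n):
    """With n+1 disks numbered 0..n, block i is redundant on disk i mod (n+1)."""
    return i % (n + 1)

n = 3   # four disks
for disk in range(n + 1):
    blocks = [i for i in range(1, 13) if redundant_disk(i, n) == disk]
    print(f"disk {disk} is redundant for blocks {blocks}")

# A write involves a given disk if the disk holds the written block (1/4),
# or if it does not (3/4) but holds that block's redundant copy (1/3).
write_share = Fraction(1, 4) + Fraction(3, 4) * Fraction(1, 3)
print(write_share)
```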


RAID Level 6 (1)

• Designed to handle multiple disk crashes, whether of data or redundant disks

• Here we focus on a simple example, in which two simultaneous crashes are correctable and the strategy is based on a simple error-correcting code, the Hamming code

• Consider a system with seven disks
  – data disks: disks 1-4
  – redundant disks: disks 5-7


RAID Level 6 (2)

• The relationship between data and redundant disks

  – Note
    • the columns are every possible column of three 0’s and 1’s, except for the all-0 column
    • the columns for the redundant disks have a single 1
    • the columns for the data disks each have at least two 1’s

               Data         Redundant
Disk number    1  2  3  4   5  6  7
               1  1  1  0   1  0  0
               1  1  0  1   0  1  0
               1  0  1  1   0  0  1


RAID Level 6 (3)

• The disks with 1 in a row are treated as if they were the entire set of disks in a RAID level 4 scheme.
  – The bits of disk 5
    • are the modulo-2 sum of the bits of disks 1, 2, and 3
  – The bits of disk 6
    • are the modulo-2 sum of the bits of disks 1, 2, and 4
  – The bits of disk 7
    • are the modulo-2 sum of the bits of disks 1, 3, and 4

               Data         Redundant
Disk number    1  2  3  4   5  6  7
               1  1  1  0   1  0  0
               1  1  0  1   0  1  0
               1  0  1  1   0  0  1


RAID Level 6 – Read/Write

• Reading: Just read data from any data disk normally

• Writing
  – Need to recalculate several redundant disks


A Writing Example (1)

• Writing
  – Disk 2 is changed to 00001111
  – Corresponding redundant disks
    • disks 5 and 6
  – Using the modulo-2 sum
    • of old and new disk 2
    • of that sum and disk 5
    • of that sum and disk 6

Disk    Contents
1       11110000
2       10101010
3       00111000
4       01000001
5       01100010
6       00011011
7       10001001


A Writing Example (2)

Disk    Contents
1       11110000
2       00001111
3       00111000
4       01000001
5       11000111
6       10111110
7       10001001

  10101010 (old disk 2)
  00001111 (new disk 2)
  10100101 (modulo-2 sum)

  10100101 (modulo-2 sum)
  01100010 (disk 5)
  11000111 (new disk 5)

  10100101 (modulo-2 sum)
  00011011 (disk 6)
  10111110 (new disk 6)


RAID Level 6 – Failure Recovery

• Assume that disks a and b fail simultaneously

• Find a row r in which the columns of a and b are different
  – For example, a has 0 in row r, b has 1 in row r

• Compute the correct b by taking the modulo-2 sum of the corresponding bits from all the disks other than b that have 1 in row r.

• Then, compute the correct a


A Recovery Example

– Pick the second row
– Disk 2:
  • modulo-2 sum of disks 1, 4, and 6
  • 00001111
– Disk 5:
  • modulo-2 sum of disks 1, 2, and 3
  • 11000111

Disk    Contents
1       11110000
2       ????????
3       00111000
4       01000001
5       ????????
6       10111110
7       10001001
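The recovery procedure can be sketched in Python using the three-row matrix from the RAID level 6 slides. Disk contents are held as integers, and the names `ROWS`, `rebuild`, and `recover_two` are chosen here for illustration; this is a sketch of the slide's procedure, not a production implementation.

```python
# One row per parity group; column d-1 is 1 if disk d belongs to the group.
ROWS = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]

def rebuild(disks, row, target):
    """Modulo-2 sum of every disk in `row` other than `target`."""
    value = 0
    for d in range(1, 8):
        if row[d - 1] and d != target:
            value ^= disks[d]
    return value

def recover_two(disks, a, b):
    """Recover failed disks a and b from the five surviving disks."""
    disks = dict(disks)
    # Find a row in which the columns of a and b differ.
    row = next(r for r in ROWS if r[a - 1] != r[b - 1])
    first, second = (a, b) if row[a - 1] == 1 else (b, a)
    disks[first] = rebuild(disks, row, first)       # `second` is not in this row
    row2 = next(r for r in ROWS if r[second - 1])   # any row containing `second`
    disks[second] = rebuild(disks, row2, second)
    return disks

surviving = {1: 0b11110000, 3: 0b00111000, 4: 0b01000001,
             6: 0b10111110, 7: 0b10001001}
recovered = recover_two(surviving, 2, 5)
print(format(recovered[2], "08b"))
print(format(recovered[5], "08b"))
```

For disks 2 and 5, the first row does not help because both have a 1 there; the second row is the first in which their columns differ, matching the choice on the slide.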