CDA3101 Fall 2013
Computer Storage:
Practical Aspects
6, 13 November 2013
Copyright © 2011 Prabhat Mishra
Storage Systems

• Introduction
• Disk Storage
• Dependability and Reliability
• I/O Performance
• Server Computers
• Conclusion
Case for Storage

• Shift in focus from computation to communication and storage of information
  – “The Computing Revolution” (1960s to 1980s): IBM, Control Data Corp., Cray Research
  – “The Information Age” (1990 to today): Google, Yahoo, Amazon, …
• Storage emphasizes reliability and scalability as well as cost-performance
  – A program crash is frustrating; data loss is unacceptable, so dependability is the key concern
• Which software determines HW features?
  – Operating System for storage
  – Compiler for processor
Cost vs Access time in DRAM/Disk
DRAM is 100,000 times faster, and costs 30-150 times more per gigabyte.
Chapter 6 — Storage and Other I/O Topics — 5
Flash Storage (§6.4)

• Nonvolatile semiconductor storage
• 100×–1000× faster than disk
• Smaller, lower power, more robust
• But more $/GB (between disk and DRAM)
Hard Disk Drive
Seek Time is not Linear in Distance

• Rule of thumb: average seek time is the time to seek across 1/3 of the cylinders
  – Seek time is not linear in distance: the arm accelerates, coasts, decelerates, then waits for settle time
  – The average does not model real workloads well, because requests exhibit locality
• Example: servicing 4 reads at sectors 26, 100, 724, 9987 in request order requires 3 revolutions; serviced in sector order, just 3/4 of a revolution
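The 1/3 rule of thumb can be checked numerically: for requests uniformly distributed over the cylinders (i.e., no locality), the mean distance between two random cylinders is about N/3. A small sketch (function name and parameters are illustrative):

```python
import random

def mean_seek_distance(n_cylinders, trials=100_000, seed=0):
    """Estimate the mean distance between two uniformly random cylinders."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = rng.randrange(n_cylinders)
        b = rng.randrange(n_cylinders)
        total += abs(a - b)
    return total / trials

# Exact mean for N cylinders is (N**2 - 1) / (3 * N), i.e. roughly N / 3.
print(mean_seek_distance(10_000))  # close to 10_000 / 3, about 3333
```

Real workloads exhibit locality, so measured average seeks are typically shorter than this uniform-random estimate, which is the slide's point.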
Dependability

• Fault: failure of a component
  – May or may not lead to system failure
• Service accomplishment: service delivered as specified
• Service interruption: deviation from specified service
• A failure moves the system from accomplishment to interruption; restoration moves it back
Dependability Measures

• Reliability: mean time to failure (MTTF)
• Service interruption: mean time to repair (MTTR)
• Mean time between failures: MTBF = MTTF + MTTR
• Availability = MTTF / (MTTF + MTTR)
• Improving availability:
  – Increase MTTF: fault avoidance, fault tolerance, fault forecasting
  – Reduce MTTR: improved tools and processes for diagnosis and repair
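These measures are easy to compute; a minimal sketch (the MTTF/MTTR numbers below are made up for illustration):

```python
def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical disk: MTTF = 50,000 hours, MTTR = 5 hours.
print(f"{availability(50_000, 5):.6f}")  # 0.999900
```

Increasing MTTF or reducing MTTR both push availability toward 1, matching the two improvement strategies above.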
Disk Access Example

• Given: 512 B sector, 15,000 rpm, 4 ms average seek time, 100 MB/s transfer rate, 0.2 ms controller overhead, idle disk
• Average read time:
    4 ms seek time
  + ½ / (15,000/60) s = 2 ms rotational latency
  + 512 B / 100 MB/s = 0.005 ms transfer time
  + 0.2 ms controller delay
  = 6.2 ms
• If the actual average seek time is 1 ms, average read time = 3.2 ms
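The arithmetic above can be reproduced with a short helper (the function name and parameter choices are my own):

```python
def avg_read_time_ms(seek_ms, rpm, sector_bytes, rate_mb_per_s, controller_ms):
    """Average read = seek + half-rotation latency + transfer + controller."""
    rotational_ms = 0.5 * 60_000 / rpm                 # half a revolution, in ms
    transfer_ms = sector_bytes / (rate_mb_per_s * 1e6) * 1000
    return seek_ms + rotational_ms + transfer_ms + controller_ms

print(avg_read_time_ms(4, 15_000, 512, 100, 0.2))  # ~6.2 ms
print(avg_read_time_ms(1, 15_000, 512, 100, 0.2))  # ~3.2 ms
```

Note that seek time dominates; that is why the difference between the rated and actual average seek time changes the answer by 3 ms.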
Use Arrays of Small Disks?

• Can smaller disks (3.5” instead of the conventional 14”, 10”, and 5.25” designs) be used to close the performance gap between disks and CPUs?
• Conventional: 4 disk designs spanning low end to high end; disk array: a single 3.5” disk design
• Improves throughput; latency may not improve
Array Reliability

• MTTF of N disks = MTTF of 1 disk ÷ N
  – 50,000 hours ÷ 70 disks ≈ 700 hours
  – Disk system MTTF drops from 6 years to 1 month!
• Arrays (without redundancy) are too unreliable to be used
• Hot spares support reconstruction in parallel with access: very high media availability can be achieved
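The slide's numbers fall out of the simple serial-failure model (assuming independent, exponentially distributed disk failures):

```python
def array_mttf(disk_mttf_hours, n_disks):
    """Unprotected array: the first disk failure fails the whole array."""
    return disk_mttf_hours / n_disks

mttf = array_mttf(50_000, 70)
print(round(mttf))          # ~714 hours, matching the slide's "~700"
print(mttf / (24 * 30))     # about one month
```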
Redundant Arrays of (Inexpensive) Disks

• Files are “striped” across multiple disks
• Redundancy yields high data availability
  – Availability: service still provided to the user, even if some components have failed
• Disks will still fail
  – Contents are reconstructed from data redundantly stored in the array
  – Capacity penalty to store redundant information
  – Bandwidth penalty to update redundant information
RAID 1: Disk Mirroring/Shadowing

• Each disk is fully duplicated onto its “mirror” in a recovery group
• Very high availability can be achieved
• Bandwidth sacrifice on write: each logical write = two physical writes
• Reads may be optimized (served from either copy)
• Most expensive solution: 100% capacity overhead
RAID 10 vs RAID 01

• RAID 1+0 (striped mirrors): e.g., four pairs of mirrored disks, striped, holding four disks’ worth of data
• RAID 0+1 (mirrored stripes): e.g., a pair of four-disk stripes, one mirroring the other, holding four disks’ worth of data
RAID 2

• Memory-style error-correcting codes across disks
• Not used anymore; other RAID organizations are more attractive
RAID 3: Parity Disk

• A logical record is striped across the data disks as physical records
• The parity disk P contains the sum of the other disks per stripe, mod 2 (“parity”)
• If a disk fails, subtract P from the sum of the other disks to find the missing information
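A mod-2 sum is just bitwise XOR, so reconstruction is one XOR per byte. A sketch of the idea (the block contents are arbitrary):

```python
def parity(blocks):
    """Bytewise XOR ('sum mod 2') of equal-length blocks."""
    p = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            p[i] ^= byte
    return bytes(p)

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]  # three data disks' blocks
p = parity(data)                                # parity disk P

# Disk 1 fails: XOR the survivors with P to rebuild the lost block.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]
```

XOR is its own inverse, which is why "subtracting P from the sum of the other disks" is the same XOR operation as computing P in the first place.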
Inspiration for RAID 4

• RAID 3 relies on the parity disk to discover errors on a read
• But every sector has an error-detection field
• To catch errors on a read, rely on the per-sector error-detection field instead of the parity disk
• This allows independent reads to different disks simultaneously
RAID 4: High I/O Rate Parity

• Data blocks are striped across four data disks, with the parity for each stripe on a dedicated fifth disk; logical disk addresses increase left to right within each stripe:

    D0   D1   D2   D3   P
    D4   D5   D6   D7   P
    D8   D9   D10  D11  P
    D12  D13  D14  D15  P
    D16  D17  D18  D19  P
    D20  D21  D22  D23  P
    …

• Example: small reads of D0 and D5 proceed independently on different disks; a large write of D12–D15 writes a full stripe plus its parity block
Inspiration for RAID 5

• RAID 4 works well for small reads
• Small writes (write to one disk):
  – Option 1: read the other data disks, compute the new parity, and write it to the parity disk
  – Option 2: since P holds the old sum, compare the old data to the new data and add the difference to P
• Either way, small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk

    D0  D1  D2  D3  P
    D4  D5  D6  D7  P
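The two options yield the same parity, because XOR is its own inverse; a quick check (the block values are arbitrary):

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

stripe = [b"\x11", b"\x22", b"\x33", b"\x44"]   # D0..D3 on four data disks
old_p = xor(xor(stripe[0], stripe[1]), xor(stripe[2], stripe[3]))

new_d0 = b"\x99"

# Option 1: read the other three data disks, recompute parity from scratch.
p1 = xor(xor(new_d0, stripe[1]), xor(stripe[2], stripe[3]))

# Option 2: old parity XOR old data XOR new data (only 2 reads, 2 writes).
p2 = xor(xor(old_p, stripe[0]), new_d0)

assert p1 == p2
```

Option 2 is preferred for wide arrays: its cost is constant, while Option 1 grows with the number of data disks.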
RAID 5: Distributed Parity

• N + 1 disks; like RAID 4, but parity blocks are distributed across the disks
• Avoids the parity disk becoming a bottleneck
• Widely used
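One common placement convention (left-symmetric; real implementations vary, so this layout is an assumption for illustration) rotates the parity block one disk to the left on each successive stripe:

```python
def parity_disk(stripe, n_disks):
    """Left-symmetric RAID 5 layout: parity rotates left each stripe."""
    return (n_disks - 1 - stripe) % n_disks

# With 5 disks, parity lands on disks 4, 3, 2, 1, 0, then wraps to 4 again,
# so no single disk absorbs all the parity writes.
print([parity_disk(s, 5) for s in range(6)])  # [4, 3, 2, 1, 0, 4]
```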
RAID 6: Recovering from 2 Failures

• Why recover from more than 1 failure?
  – An operator may accidentally replace the wrong disk during a failure
  – Since disk bandwidth is growing more slowly than disk capacity, the MTTR of a disk is increasing, which raises the chance of a 2nd failure during the longer repair (a 500 GB SATA disk could take 3 hours to read sequentially)
  – Reading much more data during reconstruction raises the chance of an uncorrectable media failure, which would result in data loss
  – Growing numbers of disks, and use of ATA disks (slower and larger than SCSI disks)
RAID 6: Recovering from 2 Failures

• Network Appliance’s row-diagonal parity, or RAID-DP
• Like the standard RAID schemes, it uses redundant space based on a per-stripe parity calculation
• Since it protects against a double failure, it adds two check blocks per stripe of data
  – With p + 1 disks in total, p − 1 disks hold data
  – The row parity disk is just like in RAID 4: even parity across the other data blocks in its stripe
  – Each block of the diagonal parity disk contains the even parity of the blocks in the same diagonal
Example: p = 5

• Row-diagonal parity starts by recovering one of the 4 blocks on a failed disk using diagonal parity
• Since each diagonal misses one disk, and all diagonals miss a different disk, 2 diagonals are missing only 1 block
• Once the data for those blocks is recovered, the standard RAID recovery scheme can be used to recover two more blocks in the standard RAID 4 stripes
• The process continues until both failed disks are restored
I/O – Introduction

• I/O devices can be characterized by:
  – Behavior: input, output, storage
  – Partner: human or machine
  – Data rate: bytes/sec, transfers/sec
• I/O bus connections
I/O System Characteristics

• Dependability is important, particularly for storage devices
• Performance measures:
  – Latency (response time)
  – Throughput (bandwidth)
• Desktops & embedded systems: primary focus is response time & diversity of devices
• Servers: primary focus is throughput & expandability of devices
Typical x86 PC I/O System
I/O Register Mapping

• Memory-mapped I/O
  – Registers are addressed in the same space as memory
  – An address decoder distinguishes between them
  – The OS uses the address translation mechanism to make them accessible only to the kernel
• I/O instructions
  – Separate instructions to access I/O registers
  – Can only be executed in kernel mode
  – Example: x86
Polling

• Periodically check the I/O status register
  – If the device is ready, do the operation
  – If there is an error, take action
• Common in small or low-performance real-time embedded systems
  – Predictable timing, low hardware cost
• In other systems, it wastes CPU time
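The busy-wait structure can be sketched against a toy device model (MockDevice and its method names are inventions for illustration, not a real driver API):

```python
import time

class MockDevice:
    """Toy stand-in for a device: becomes ready after a fixed delay."""
    def __init__(self, ready_after_s):
        self._ready_at = time.monotonic() + ready_after_s
        self.data = 42                      # pretend data register

    def status_ready(self):                 # pretend status register
        return time.monotonic() >= self._ready_at

def poll_read(device, timeout_s=1.0, interval_s=0.001):
    """Spin on the status register, then read the data register."""
    deadline = time.monotonic() + timeout_s
    while not device.status_ready():        # CPU time spent just waiting
        if time.monotonic() > deadline:
            raise TimeoutError("device never became ready")
        time.sleep(interval_s)
    return device.data

print(poll_read(MockDevice(0.01)))          # 42
```

The `while` loop is where polling "wastes CPU time" on a general-purpose system, and also why its timing is predictable in a small embedded one.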
Interrupts

• When a device is ready or an error occurs, the controller interrupts the CPU
• An interrupt is like an exception
  – But not synchronized to instruction execution
  – Can invoke the handler between instructions
  – Cause information often identifies the interrupting device
• Priority interrupts
  – Devices needing more urgent attention get higher priority
  – Can interrupt the handler for a lower-priority interrupt
I/O Data Transfer

• Polling and interrupt-driven I/O
  – The CPU transfers data between memory and I/O data registers
  – Time-consuming for high-speed devices
• Direct memory access (DMA)
  – The OS provides the starting address in memory
  – The I/O controller transfers to/from memory autonomously
  – The controller interrupts on completion or error
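The division of labor in DMA can be sketched with a toy controller (class and method names are illustrative, not a real API):

```python
class MockDMAController:
    """Toy DMA model: given a start address, copies a block into memory
    on its own and then signals completion, standing in for an interrupt."""
    def __init__(self, memory):
        self.memory = memory

    def transfer(self, device_data, start_addr, on_complete):
        # The OS supplied only the start address; the per-byte copying
        # happens here, without the CPU touching each byte.
        self.memory[start_addr:start_addr + len(device_data)] = device_data
        on_complete(len(device_data))       # "interrupt on completion"

memory = bytearray(16)
completed = []
MockDMAController(memory).transfer(b"\xde\xad\xbe\xef", start_addr=4,
                                   on_complete=completed.append)
assert memory[4:8] == b"\xde\xad\xbe\xef"
assert completed == [4]
```

The key contrast with polling and interrupt-driven I/O: the CPU's involvement is limited to setting up the transfer and handling the completion signal.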
Server Computers

• Applications are increasingly run on servers
  – Web search, office apps, virtual worlds, …
• This requires large data center servers
  – Multiple processors, network connections, massive storage
  – Space and power constraints
• Server equipment is built for 19” racks, in multiples of 1.75” (1U) high
Rack-Mounted Servers
Sun Fire x4150 1U server
Sun Fire x4150 1U server

• 4 cores per processor
• 16 × 4 GB = 64 GB DRAM
Concluding Remarks

• I/O performance measures: throughput, response time
  – Dependability and cost are also important
• Buses are used to connect the CPU, memory, and I/O controllers
  – Polling, interrupts, DMA
• RAID improves performance and dependability

Please read Sections 6.1–6.10, P&H 4th Ed.
THINK: Weekend!!
“The best way to predict the future is to create it.” – Peter Drucker