1
Digital Systems Architecture
EECE 343-01 / EECE 292-02
Dependability and RAID
Dr. William H. Robinson
April 23, 2004
http://eecs.vanderbilt.edu/courses/eece343/
2
Topics
"KILLS – BUGS – DEAD!"
– TV commercial for RAID bug spray
• Administrative stuff
  – Turn in Reading Assignment #8
  – Homework Assignment #4 solutions posted
• More on magnetic storage devices
• Dependability defined
• RAID
• Queuing theory
3
Pentium 4 Implementation
• North Bridge
  – Connects CPU to memory & video
• South Bridge
  – Connects North Bridge to everything else
4
AMD’s Integrated Memory Controller
• http://www.pcstats.com/articleview.cfm?articleid=1469&page=5
• Reduces the time it takes the processor to access memory
  – Data need only travel between the processor and the physical memory
• Memory traffic need no longer run through the Northbridge
  – The Northbridge traditionally provides the memory controller with data, and removing this bottleneck speeds up overall operation
5
Motivation: Who Cares About I/O?
• CPU performance: ~60% per year
• I/O system performance is limited by mechanical delays (disk I/O): < 10% per year (I/Os per sec or MB per sec)
• Amdahl's Law: system speed-up is limited by the slowest part!
  – 10% I/O & 10x CPU ⇒ 5x performance (lose 50%)
  – 10% I/O & 100x CPU ⇒ 10x performance (lose 90%)
• I/O bottleneck:
  – Diminishing fraction of time in CPU
  – Diminishing value of faster CPUs
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
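As a quick check of the speed-up numbers above: with a fraction f = 0.1 of time in (unaccelerated) I/O and a CPU speedup of s, Amdahl's Law gives

\[
\text{Speedup} = \frac{1}{(1-f)/s + f}, \qquad
\frac{1}{0.9/10 + 0.1} \approx 5.3\times, \qquad
\frac{1}{0.9/100 + 0.1} \approx 9.2\times
\]

which the slide rounds to 5x and 10x.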
6
Big Picture: Who Cares About CPUs?
• Why is it still important to keep CPUs busy vs. I/O devices ("CPU time"), since CPUs are not costly?
  – Moore's Law leads both to large, fast CPUs and to very small, cheap CPUs
  – 2001 hypothesis: a 600 MHz PC is fast enough for office tools?
  – PC slowdown since "fast enough", unless games or new apps?
• People care more about storing and communicating information than about calculating
  – "Information Technology" vs. "Computer Science"
  – 1960s and 1980s: Computing Revolution
  – 1990s and 2000s: Information Age
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
7
I/O Systems
[Figure: typical I/O system organization. The processor and its cache attach to a memory-I/O bus; main memory and several I/O controllers (for disks, graphics, and a network) sit on the same bus, and the controllers signal the processor via interrupts.]
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
8
Storage Technology Drivers
• Driven by the prevailing computing paradigm
  – 1950s: migration from batch to on-line processing
  – 1990s: migration to ubiquitous computing
    • computers in phones, books, cars, video cameras, …
    • nationwide fiber optic network with wireless tails
• Effects on the storage industry
  – Embedded storage: smaller, cheaper, more reliable, lower power
  – Data utilities: high capacity, hierarchically managed storage
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
9
Disk Device Performance
• Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead
• Seek Time? Depends on the number of tracks the arm has to move and the seek speed of the disk
• Rotation Time? Depends on how fast the disk rotates and how far the sector is from the head
• Transfer Time? Depends on the data rate (bandwidth) of the disk (bit density) and the size of the request
[Figure: disk geometry. Platters on a spindle; inner and outer tracks divided into sectors; the head on an arm moved by the actuator, managed by the controller.]
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
10
Disk Device Performance
• Average distance of a sector from the head?
  – 1/2 the time of a rotation
  – 10,000 revolutions per minute ⇒ 166.67 rev/sec
  – 1 revolution = 1/166.67 sec ⇒ 6.00 milliseconds
  – 1/2 rotation (revolution) ⇒ 3.00 ms
• Average number of tracks the arm moves?
  – Sum all possible seek distances from all possible tracks / # possible
    • Assumes seek distances are random
  – Disk industry standard benchmark
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
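A minimal sketch of the latency model above. The 10,000 RPM figure comes from this slide; the seek time, transfer rate, and controller overhead are illustrative assumptions (an ~8 ms average seek and 10-40 MB/s transfer rates appear on a later slide):

```python
def disk_latency_ms(seek_ms, rpm, request_kb, rate_mb_s, overhead_ms):
    """Disk latency = seek + rotation + transfer + controller overhead."""
    rotation_ms = 0.5 * (60_000 / rpm)    # average: half a revolution
    transfer_ms = request_kb / rate_mb_s  # KB / (MB/s) comes out in ms
    return seek_ms + rotation_ms + transfer_ms + overhead_ms

# Example: 10,000 RPM disk, 8 ms average seek, 40 MB/s, 0.1 ms overhead
print(disk_latency_ms(seek_ms=8, rpm=10_000, request_kb=8, rate_mb_s=40,
                      overhead_ms=0.1))   # ~11.3 ms for an 8 KB request
```

Note how the mechanical terms (seek + rotation = 11 ms) dwarf the 0.2 ms transfer for a small request.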
11
Data Rate: Inner vs. Outer Tracks
• To keep things simple, disks originally kept the same number of sectors per track
  – Since the outer track is longer, it had lower Bits Per Inch (BPI)
• Competition ⇒ decided to keep BPI roughly the same for all tracks ("constant bit density")
  ⇒ More capacity per disk
  ⇒ More sectors per track toward the edge
  ⇒ Since the disk spins at constant speed, outer tracks have a faster data rate
• Bandwidth of the outer track is 1.7X that of the inner track!
  – The inner track has the highest density and the outer track the lowest, so density is not really constant
  – 2.1X length of track outer / inner, 1.7X bits outer / inner
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
12
Devices: Magnetic Disks
[Figure: disk geometry. Platters and heads; tracks divided into sectors; a cylinder is the set of tracks under the heads across all platters.]
• Purpose:
  – Long-term, nonvolatile storage
  – Large, inexpensive, slow level in the storage hierarchy
• Characteristics:
  – Seek time (~8 ms avg)
    • positional latency
    • rotational latency
  – Transfer rate: 10–40 MByte/sec, in blocks
  – Capacity: gigabytes; quadruples every 2 years (aerodynamics)
• Example: 7200 RPM = 120 RPS ⇒ ~8 ms per rev ⇒ avg rotational latency = 4 ms;
  128 sectors per track ⇒ ~0.0625 ms per sector; 1 KB per sector ⇒ ~16 MB/s
• Response time = Queue + Controller + Seek + Rotation + Transfer
  (Controller + Seek + Rotation + Transfer = service time)
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
13
Disk Performance Model / Trends
• Capacity: +100%/year (2X / 1.0 yrs)
• Transfer rate (BW): +40%/year (2X / 2.0 yrs)
• Rotation + seek time: –8%/year (1/2 in 10 yrs)
• MB/$: >100%/year (2X / <1.5 yrs), from fewer chips + areal density
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
14
Areal Density
• Bits recorded along a track
  – Metric is Bits Per Inch (BPI)
• Number of tracks per surface
  – Metric is Tracks Per Inch (TPI)
• Disk designs brag about bit density per unit area
  – Metric is Bits Per Square Inch, called Areal Density
  – Areal Density = BPI x TPI
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
15
Areal Density

  Year    Areal Density
  1973        1.7
  1979        7.7
  1989         63
  1997       3090
  2000      17100

[Chart: areal density vs. year, 1970-2000, log scale.]
– Areal Density = BPI x TPI
– The growth slope changed from ~30%/yr to ~60%/yr around 1991
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
16
Definitions
Examples of why precise definitions are so important for reliability:
• Is a programming mistake a fault, error, or failure?
  – Are we talking about the time it was designed or the time the program is run?
  – If the running program doesn’t exercise the mistake, is it still a fault/error/failure?
• If an alpha particle hits a DRAM memory cell, is it a fault/error/failure if it doesn’t change the value?
  – Is it a fault/error/failure if the memory doesn’t access the changed bit?
  – Did a fault/error/failure still occur if the memory had error correction and delivered the corrected value to the CPU?
Adapted from David Patterson’s CS 252 lecture notes. Copyright © 2001 UCB.
17
IFIP Standard Terminology
• Computer system dependability: the quality of delivered service such that reliance can be placed on that service
• Service is the observed actual behavior as perceived by other system(s) interacting with this system’s users
• Each module has an ideal specified behavior, where the service specification is an agreed description of the expected behavior
• A system failure occurs when the actual behavior deviates from the specified behavior
• A failure occurs because of an error, a defect in a module
• The cause of an error is a fault
• When a fault occurs, it creates a latent error, which becomes effective when it is activated
• When the error actually affects the delivered service, a failure occurs (the time from error to failure is the error latency)
Adapted from David Patterson’s CS 252 lecture notes. Copyright © 2001 UCB.
18
Fault vs. (Latent) Error vs. Failure
• A fault creates one or more latent errors
• Properties of errors:
  – A latent error becomes effective once activated
  – An error may cycle between its latent and effective states
  – An effective error often propagates from one component to another, thereby creating new errors
• An effective error is either a formerly-latent error in that component, or one that propagated from another error
• A component failure occurs when the error affects the delivered service
• These properties are recursive and apply to any component in the system
• An error is the manifestation in the system of a fault; a failure is the manifestation on the service of an error
Adapted from David Patterson’s CS 252 lecture notes. Copyright © 2001 UCB.
19
Fault vs. (Latent) Error vs. Failure
• An error is the manifestation in the system of a fault; a failure is the manifestation on the service of an error
• Is a programming mistake a fault, error, or failure?
  – Are we talking about the time it was designed or the time the program is run?
  – If the running program doesn’t exercise the mistake, is it still a fault/error/failure?
• A programming mistake is a fault
• The consequence is an error (or latent error) in the software
• Upon activation, the error becomes effective
• When this effective error produces erroneous data that affect the delivered service, a failure occurs
Adapted from David Patterson’s CS 252 lecture notes. Copyright © 2001 UCB.
20
Fault vs. (Latent) Error vs. Failure
• An error is the manifestation in the system of a fault; a failure is the manifestation on the service of an error
• If an alpha particle hits a DRAM memory cell, is it a fault/error/failure if it doesn’t change the value?
  – Is it a fault/error/failure if the memory doesn’t access the changed bit?
  – Did a fault/error/failure still occur if the memory had error correction and delivered the corrected value to the CPU?
• An alpha particle hitting a DRAM can be a fault
• If it changes the memory, it creates an error
• The error remains latent until the affected memory word is read
• If the affected word's error affects the delivered service, a failure occurs
Adapted from David Patterson’s CS 252 lecture notes. Copyright © 2001 UCB.
21
Fault Tolerance vs. Disaster Tolerance
• Fault tolerance (or more properly, error tolerance): masks local faults (prevents errors from becoming failures)
  – RAID disks
  – Uninterruptible power supplies
• Disaster tolerance: masks site errors (prevents site errors from causing service failures)
  – Protects against fire, flood, sabotage, …
  – Redundant system and service at a remote site
  – Uses design diversity
From Jim Gray’s talk at UC Berkeley on fault tolerance, 11/9/00
Adapted from David Patterson’s CS 252 lecture notes. Copyright © 2001 UCB.
22
Use Arrays of Small Disks?
Katz and Patterson asked in 1987: can smaller disks be used to close the gap in performance between disks and CPUs?
[Figure: conventional designs use 4 disk form factors (14", 10", 5.25", 3.5") spanning low end to high end; a disk array uses a single 3.5" disk design.]
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
23
Advantages of Small Form-Factor Disk Drives
Cost and environmental efficiencies:
• Low cost/MB
• High MB/volume
• High MB/watt
• Low cost/actuator
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
24
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)

             IBM 3390K     IBM 3.5" 0061   x70 (array of 70)
  Capacity   20 GBytes     320 MBytes      23 GBytes
  Volume     97 cu. ft.    0.1 cu. ft.     11 cu. ft.      (9X better)
  Power      3 KW          11 W            1 KW            (3X better)
  Data Rate  15 MB/s       1.5 MB/s        120 MB/s        (8X better)
  I/O Rate   600 I/Os/s    55 I/Os/s       3900 I/Os/s     (6X better)
  MTTF       250 KHrs      50 KHrs         ??? Hrs
  Cost       $250K         $2K             $150K

Disk arrays have the potential for large data and I/O rates, high MB per cu. ft., and high MB per KW, but what about reliability?
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
25
Array Reliability
• Reliability of N disks = reliability of 1 disk ÷ N
  – 50,000 hours ÷ 70 disks = ~700 hours
  – Disk system MTTF drops from 6 years to 1 month!
• Arrays (without redundancy) are too unreliable to be useful!
• Hot spares support reconstruction in parallel with access: very high media availability can be achieved
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
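A worked form of the estimate above (assuming independent, exponentially distributed disk failures, so array MTTF scales as 1/N):

\[
\text{MTTF}_{\text{array}} \approx \frac{\text{MTTF}_{\text{disk}}}{N}
= \frac{50{,}000\ \text{hours}}{70} \approx 714\ \text{hours} \approx 1\ \text{month}
\]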
26
Redundant Arrays of (Inexpensive) Disks
• Files are "striped" across multiple disks
• Redundancy yields high data availability
  – Availability: service is still provided to the user, even if some components have failed
• Disks will still fail
• Contents are reconstructed from data redundantly stored in the array
  ⇒ Capacity penalty to store redundant info
  ⇒ Bandwidth penalty to update redundant info
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
27
RAID Levels 0 Through 5
[Figure: data layouts for RAID levels 0 through 5; backup and parity drives are shaded.]
28
Redundant Arrays of Inexpensive Disks RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its "mirror" (its recovery group)
  – Very high availability can be achieved
• Bandwidth sacrifice on write:
  – Logical write = two physical writes
• Reads may be optimized
• Most expensive solution: 100% capacity overhead
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
29
Redundant Arrays of Inexpensive Disks RAID 2: Hamming Code
• Split each byte into a pair of 4-bit nibbles
  – Add a Hamming code to each
  – Parity bits are at positions 1, 2, 4
• Provides single-bit error correction
• Drawbacks
  – Requires all drives to be rotationally synchronized
  – Requires a large number of drives to reduce overhead
  – Controller complexity to calculate the checksum
See the sketch below for how the code itself works.
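A minimal sketch of the Hamming(7,4) code the slide describes, with parity bits at (1-indexed) positions 1, 2, and 4. This only illustrates encoding and single-bit correction of one nibble; it is not a model of a RAID 2 controller, which would spread these bits across drives:

```python
def hamming74_encode(nibble):
    """Encode 4 data bits [d1,d2,d3,d4] into 7 bits, parity at 1, 2, 4."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4          # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]   # positions 1..7

def hamming74_correct(code):
    """Locate and flip a single-bit error; the syndrome is its position."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity of positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity of positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity of positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s4       # 0 means no error detected
    if pos:
        c[pos - 1] ^= 1
    return c

code = hamming74_encode([1, 0, 1, 1])
corrupted = code[:]; corrupted[5] ^= 1    # flip one bit
assert hamming74_correct(corrupted) == code
```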
30
Redundant Arrays of Inexpensive Disks RAID 3: Parity Disk
[Figure: a logical record (e.g., 10010011 11001101 10010011 11001101) is striped as physical records across four data disks, plus a fifth parity disk P.]
• P contains the sum of the other disks per stripe, mod 2 ("parity")
• If a disk fails, subtract P from the sum of the other disks to find the missing information
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
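A minimal sketch of this parity scheme in code: the parity block is the byte-wise XOR (sum mod 2) of the data blocks, and XORing the parity with the surviving blocks recovers a lost one. The single-byte disk contents are just the illustrative bit patterns from the figure:

```python
from functools import reduce

def parity(blocks):
    """Byte-wise XOR ("sum mod 2") across all blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

stripe = [b"\x93", b"\xcd", b"\x93", b"\xcd"]   # 10010011, 11001101, ...
p = parity(stripe)

# Disk 2 fails: recover its contents from the parity and the survivors.
survivors = stripe[:2] + stripe[3:]
recovered = parity(survivors + [p])
assert recovered == stripe[2]
```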
31
RAID 3
• The sum is computed across the recovery group to protect against hard disk failures, and stored in the P disk
• Logically a single high-capacity, high-transfer-rate disk: good for large transfers
• Wider arrays reduce capacity costs but decrease availability
• 33% capacity cost for parity in this configuration
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
32
Inspiration for RAID 4
• RAID 3 relies on the parity disk to discover errors on a read
• But every sector has its own error detection field
• Rely on the sector's error detection field to catch errors on read, not on the parity disk
• This allows independent reads from different disks simultaneously
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
33
Redundant Arrays of Inexpensive Disks RAID 4: High I/O Rate Parity
Insides of 5 disks (disk columns; logical disk addresses increase down the columns; one parity block per stripe, on a dedicated parity disk):

  D0   D1   D2   D3   P
  D4   D5   D6   D7   P
  D8   D9   D10  D11  P
  D12  D13  D14  D15  P
  D16  D17  D18  D19  P
  D20  D21  D22  D23  P
  ...

Example: small read of D0 & D5, large write of D12-D15
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
34
Inspiration for RAID 5
• RAID 4 works well for small reads
• Small writes (write to one disk):
  – Option 1: read the other data disks, create the new sum, and write it to the parity disk
  – Option 2: since P has the old sum, compare the old data to the new data and add the difference to P
• Small writes are limited by the parity disk: writes to D0 and D5 must both also write to the P disk

  D0   D1   D2   D3   P
  D4   D5   D6   D7   P

Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
35
Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity
Independent writes are possible because of interleaved parity (disk columns; logical disk addresses increase down the columns; the parity block rotates across the disks):

  D0   D1   D2   D3   P
  D4   D5   D6   P    D7
  D8   D9   P    D10  D11
  D12  P    D13  D14  D15
  P    D16  D17  D18  D19
  D20  D21  D22  D23  P
  ...

Example: writes to D0 and D5 use disks 0, 1, 3, and 4
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
36
Problems of Disk Arrays: Small Writes
RAID-5 small write algorithm: 1 logical write = 2 physical reads + 2 physical writes.
To write new data D0' over D0 in the stripe D0 D1 D2 D3 P:
  (1. Read) old data D0
  (2. Read) old parity P
  Compute the new parity: P' = (D0 XOR D0') XOR P
  (3. Write) new data D0'
  (4. Write) new parity P'
The stripe becomes D0' D1 D2 D3 P'.
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
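A minimal sketch of this small-write update (Option 2 from the RAID 4 slide): the new parity is derived from the old data, new data, and old parity alone, so the other data disks are never read. Block values are illustrative:

```python
def raid5_small_write(old_data, new_data, old_parity):
    """New parity = (old_data XOR new_data) XOR old_parity.

    2 reads (old data, old parity) + 2 writes (new data, new parity),
    without touching the other data disks in the stripe.
    """
    delta = bytes(a ^ b for a, b in zip(old_data, new_data))
    return bytes(d ^ p for d, p in zip(delta, old_parity))

# Consistency check against recomputing parity over the full stripe:
d0, d1, d2, d3 = b"\x93", b"\xcd", b"\x93", b"\xcd"
p = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(d0, d1, d2, d3))
d0_new = b"\x0f"
p_new = raid5_small_write(d0, d0_new, p)
full = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(d0_new, d1, d2, d3))
assert p_new == full
```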
37
Berkeley History: RAID-I
• RAID-I (1989)
  – Consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks, and specialized disk striping software
• Today RAID is a $19 billion industry, and 80% of non-PC disks are sold in RAIDs
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
38
RAID Techniques Summary: Goal was performance, popularity due to reliability of storage
• Disk mirroring/shadowing (RAID 1)
  – Each disk is fully duplicated onto its "shadow"
  – Logical write = two physical writes
  – 100% capacity overhead
• Parity data bandwidth array (RAID 3)
  – Parity computed horizontally
  – Logically a single high-data-bandwidth disk
• High I/O rate parity array (RAID 5)
  – Interleaved parity blocks
  – Independent reads and writes
  – Logical write = 2 reads + 2 writes
[Figure: example bit patterns illustrating mirroring (RAID 1) and horizontal parity (RAID 3).]
Adapted from David Culler’s CS 252 lecture notes. Copyright © 2002 UCB.
39
Introduction to Queueing Theory
[Figure: a "black box" queueing system, with arrivals entering and departures leaving.]
• Queueing theory applies to long-term, steady-state behavior ⇒ arrival rate = departure rate
• Little’s Law: mean number of tasks in system = arrival rate x mean response time
  – Observed by many; Little was the first to prove it
  – Simple interpretation: you should see the same number of tasks in the queue when entering as when leaving
• Applies to any system in equilibrium, as long as nothing in the black box is creating or destroying tasks
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
40
A Little Queuing Theory: Use of Random Distributions
[Figure: queue + server ("system") between the processor (Proc), I/O controller (IOC), and device.]
• Server spends a variable amount of time with customers
  – Weighted mean: m1 = (f1 T1 + f2 T2 + ... + fn Tn)/F = Σ p(T)·T
  – Variance: σ² = (f1 T1² + f2 T2² + ... + fn Tn²)/F – m1² = Σ p(T)·T² – m1²
  – Squared coefficient of variance: C = σ²/m1²
    • Unitless measure (100 ms² vs. 0.1 s²)
• Exponential distribution, C = 1: most tasks short relative to the average, a few long; 90% < 2.3 x average, 63% < average
• Hypoexponential distribution, C < 1: most close to the average; C = 0.5 ⇒ 90% < 2.0 x average, only 57% < average
• Hyperexponential distribution, C > 1: further from the average; C = 2.0 ⇒ 90% < 2.8 x average, 69% < average
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
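A small sketch of these formulas, with a made-up discrete service-time distribution (the fi are observation counts and F is their total):

```python
def service_stats(freqs, times):
    """Weighted mean m1, variance, and squared coeff. of variance C."""
    F = sum(freqs)
    m1 = sum(f * t for f, t in zip(freqs, times)) / F
    var = sum(f * t * t for f, t in zip(freqs, times)) / F - m1 * m1
    return m1, var, var / (m1 * m1)

# e.g., 70 requests took 10 ms, 20 took 30 ms, 10 took 100 ms
m1, var, C = service_stats([70, 20, 10], [10.0, 30.0, 100.0])
print(f"m1 = {m1:.1f} ms, sigma^2 = {var:.1f} ms^2, C = {C:.2f}")
# m1 = 23.0 ms, sigma^2 = 721.0 ms^2, C = 1.36 (mildly hyperexponential)
```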
41
A Little Queuing Theory: Variable Service Time
[Figure: queue + server ("system") between the processor (Proc), I/O controller (IOC), and device.]
• Disk response times have C ≈ 1.5 (the majority of seeks are shorter than the average)
• Yet we usually pick C = 1.0 for simplicity
  – Memoryless, exponential distribution
  – Many complex systems are well described by the memoryless distribution!
• Another useful value is the average time you must wait for the server to complete its current task: m1(z)
  – Called the "average residual wait time"
  – Not just 1/2 x m1, because that doesn’t capture the variance
  – Can derive: m1(z) = 1/2 x m1 x (1 + C)
  – No variance ⇒ C = 0 ⇒ m1(z) = 1/2 x m1
  – Exponential ⇒ C = 1 ⇒ m1(z) = m1
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
42
A Little Queuing Theory: Average Wait Time
• Calculating the average wait time in queue, Tq:
  – All customers in line must complete; average service time m1 ≡ Tser = 1/µ
  – If something is at the server, it takes m1(z) on average to complete
  – Chance the server is busy = u = λ/µ, so the average residual delay is u x m1(z)

  Tq = u x m1(z) + Lq x Tser
  Tq = u x m1(z) + λ x Tq x Tser     (Little’s Law: Lq = λ x Tq)
  Tq = u x m1(z) + u x Tq            (definition of utilization: u = λ x Tser)
  Tq x (1 – u) = m1(z) x u
  Tq = m1(z) x u/(1 – u) = Tser x {1/2 x (1 + C)} x u/(1 – u)

• Notation:
  λ      average number of arriving customers/second
  Tser   average time to service a customer
  u      server utilization (0..1): u = λ x Tser
  Tq     average time/customer in queue
  Lq     average length of queue: Lq = λ x Tq
  m1(z)  average residual wait time = Tser x {1/2 x (1 + C)}
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
43
A Little Queuing Theory: M/G/1 and M/M/1
• Assumptions so far:
  – System in equilibrium
  – Times between successive arrivals are random
  – Server can start on the next customer immediately after the prior one finishes
  – No limit to the queue: works First-In-First-Out
  – Afterward, all customers in line must complete; each takes Tser on average
• This describes "memoryless" or Markovian request arrival (M for C = 1, exponentially random), General service distribution (no restrictions), 1 server: the M/G/1 queue
• When service times also have C = 1, it is an M/M/1 queue:
  Tq = Tser x u x (1 + C) / (2 x (1 – u)) = Tser x u / (1 – u)
  where Tser is the average time to service a customer, u = λ x Tser is the server utilization (0..1), and Tq is the average time/customer in queue
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
44
A Little Queuing Theory: An Example
• A processor sends 10 x 8 KB disk I/Os per second; requests & service are exponentially distributed; average disk service time = 20 ms
• On average, how utilized is the disk?
  – What is the number of requests in the queue?
  – What is the average time spent in the queue?
  – What is the average response time for a disk request?
• Solution:
  λ     average number of arriving customers/second = 10
  Tser  average time to service a customer = 20 ms (0.02 s)
  u     server utilization (0..1): u = λ x Tser = 10/s x 0.02 s = 0.2
  Tq    average time/customer in queue = Tser x u/(1 – u)
        = 20 x 0.2/(1 – 0.2) = 20 x 0.25 = 5 ms (0.005 s)
  Tsys  average time/customer in system: Tsys = Tq + Tser = 25 ms
  Lq    average length of queue: Lq = λ x Tq = 10/s x 0.005 s = 0.05 requests in queue
  Lsys  average # of tasks in system: Lsys = λ x Tsys = 10/s x 0.025 s = 0.25
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
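These steps are mechanical enough to script. A minimal sketch using the M/G/1 formula from the previous slide (valid only for u < 1; C = 1 reduces it to the M/M/1 case used here):

```python
def mg1(lam, t_ser, C=1.0):
    """M/G/1 queue metrics; lam in customers/s, t_ser in seconds."""
    u = lam * t_ser                            # server utilization
    t_q = t_ser * u * (1 + C) / (2 * (1 - u))  # avg time in queue
    t_sys = t_q + t_ser                        # avg response time
    return u, t_q, t_sys, lam * t_q, lam * t_sys   # ..., Lq, Lsys

u, t_q, t_sys, l_q, l_sys = mg1(lam=10, t_ser=0.020)  # the disk example
print(u, t_q, t_sys, l_q, l_sys)   # 0.2, 0.005 s, 0.025 s, 0.05, 0.25
```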
45
A Little Queuing Theory: Yet Another Example
• Processor accesses memory over a "network"
• DRAM service properties
  – Speed = 9 cycles + 2 cycles/word
  – With 8-word cache lines: Tser = 25 cycles, µ = 1/25 = 0.04 ops/cycle
  – Deterministic service time! (C = 0)
• Processor behavior
  – CPI = 1, 40% memory instructions, 7.5% cache misses
  – Rate: λ = 1 inst/cycle x 0.4 x 0.075 = 0.03 ops/cycle
• Solution:
  λ     average number of arriving customers/cycle = 0.03
  Tser  average time to service a customer = 25 cycles
  u     server utilization (0..1): u = λ x Tser = 0.03 x 25 = 0.75
  Tq    average time/customer in queue = Tser x u x (1 + C)/(2 x (1 – u))
        = (25 x 0.75 x 1/2)/(1 – 0.75) = 37.5 cycles
  Tsys  average time/customer in system: Tsys = Tq + Tser = 62.5 cycles
  Lq    average length of queue: Lq = λ x Tq = 0.03 x 37.5 = 1.125 requests in queue
  Lsys  average # of tasks in system: Lsys = λ x Tsys = 0.03 x 62.5 = 1.875
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
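The same `mg1` sketch from the previous example reproduces this one by setting C = 0 for the deterministic (M/D/1) service time:

```python
# Deterministic service: C = 0 halves the queueing delay vs. C = 1.
u, t_q, t_sys, l_q, l_sys = mg1(lam=0.03, t_ser=25, C=0.0)
print(u, t_q, t_sys, l_q, l_sys)   # 0.75, 37.5, 62.5, 1.125, 1.875 (cycles)
```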
46
Summary
• The disk industry is growing rapidly and improving:
  – Bandwidth: 40%/yr
  – Areal density: 60%/yr; $/MB improving even faster?
• Disk latency = queue + controller + seek + rotation + transfer
• The advertised average seek time benchmark is much greater than the average seek time in practice
Adapted from John Kubiatowicz’s CS 252 lecture notes. Copyright © 2003 UCB.
47
Things to Do
• Homework Assignment #4 posted with solutions
• Project #3 due on Monday