CS 61C: Great Ideas in Computer Architecture
Dependability – More on ECC, RAID
Vladimir Stojanovic & Nicholas Weaver
http://inst.eecs.berkeley.edu/~cs61c/
Hamming Distance 2: Detection
Detect Single Bit Errors
• No 1-bit error goes to another valid codeword
• ½ of the codewords are valid
[Figure: codeword space with the invalid codewords marked]
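Hamming Distance 3: Correction
Correct Single Bit Errors, Detect Double Bit Errors
• No 2-bit error goes to another valid codeword; a 1-bit error stays near its original codeword
• 1/4 of the codewords are valid
[Figure: words with one 1 are nearest 000; words with one 0 are nearest 111]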
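The distance-2 and distance-3 claims above can be checked by brute force. The sketch below is only an illustration: it assumes 3-bit words, an even-parity code as the distance-2 example, and the code {000, 111} as the distance-3 example. The function names are made up for this sketch.

```python
from itertools import combinations, product


def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(code: list[str]) -> int:
    """Smallest pairwise Hamming distance over all codewords of a code."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


def nearest_codeword(word: str, code: list[str]) -> str:
    """Valid codeword closest to the received word (first one wins on a tie)."""
    return min(code, key=lambda c: hamming_distance(word, c))


# All 8 possible 3-bit words.
all_words = ["".join(bits) for bits in product("01", repeat=3)]

# Assumed distance-2 example: words with an even number of 1s (half are valid).
even_parity_code = [w for w in all_words if w.count("1") % 2 == 0]

# Assumed distance-3 example: only 000 and 111 are valid (a quarter are valid).
repetition_code = ["000", "111"]

print(min_distance(even_parity_code), len(even_parity_code) / len(all_words))
print(min_distance(repetition_code), len(repetition_code) / len(all_words))
print(nearest_codeword("010", repetition_code))  # a word with one 1
print(nearest_codeword("101", repetition_code))  # a word with one 0
```

With distance 2, a single flipped bit always lands on an invalid word, so it is detected but cannot be attributed to a unique codeword. With distance 3, it still has a unique nearest codeword.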
Hamming Error Correcting Code
• Overhead involved in single error-correction code
• Let p be total number of parity bits and d number of data bits in p + d bit word
• If p error correction bits are to point to error bit (p + d cases) + indicate that no error exists (1 case), we need:
  $2^p \ge p + d + 1$, thus $p \ge \log_2(p + d + 1)$
  for large d, p approaches $\log_2(d)$
• 8 bits data => d = 8, $2^p \ge p + 8 + 1$ => $p \ge 4$
• 16b data => 5b parity, 32b data => 6b parity, 64b data => 7b parity
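The parity-bit counts above follow directly from the inequality. A minimal sketch that searches for the smallest p satisfying it (the function name is hypothetical):

```python
def min_parity_bits(d: int) -> int:
    """Smallest p such that 2**p >= p + d + 1 (single-error correction)."""
    p = 0
    while 2 ** p < p + d + 1:
        p += 1
    return p


# Reproduce the data widths listed on the slide.
for d in (8, 16, 32, 64):
    print(f"{d} data bits need {min_parity_bits(d)} parity bits")
```

For the widths on the slide this gives 4, 5, 6 and 7 parity bits.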
Hamming Single-Error Correction, Double-Error Detection (SEC/DED)
• Adding extra parity bit covering the entire word provides double error detection as well as single error correction
    1   2   3   4   5   6   7   8
    p1  p2  d1  p3  d2  d3  d4  p4
• Hamming parity bits H (p1 p2 p3) are computed (even parity as usual) plus the even parity over the entire word, p4:
    H = 0, p4 = 0: no error
    H ≠ 0, p4 = 1: correctable single error (odd parity if 1 error => p4 = 1)
    H ≠ 0, p4 = 0: double error occurred (even parity if 2 errors => p4 = 0)
    H = 0, p4 = 1: single error occurred in p4 bit, not in rest of word
Typical modern codes in DRAM memory systems: 64-bit data blocks (8 bytes) with 72-bit codewords (9 bytes).
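The four cases above can be sketched for the 8-bit layout p1 p2 d1 p3 d2 d3 d4 p4. This is an illustrative sketch, not a reference implementation. The function names and status strings are assumptions. Following the slide's case table, "p4 = 1" is taken to mean that the overall even-parity check fails.

```python
NO_ERROR = "no error"
SINGLE = "corrected single error"
DOUBLE = "double error detected"
P4_ONLY = "error in p4 only"


def encode_secded(d1: int, d2: int, d3: int, d4: int) -> list[int]:
    """Build the 8-bit word p1 p2 d1 p3 d2 d3 d4 p4 with even parity."""
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p3, d2, d3, d4]
    p4 = sum(word) % 2  # even parity over the entire word
    return word + [p4]


def decode_secded(word: list[int]):
    """Return (status, data bits or None) using the H / p4 case table."""
    w = list(word)
    # H: XOR of the 1-based positions (1..7) that hold a 1; zero if all checks pass.
    h = 0
    for pos in range(1, 8):
        if w[pos - 1]:
            h ^= pos
    overall = sum(w) % 2  # 1 means the whole-word parity check fails
    if h == 0 and overall == 0:
        status = NO_ERROR
    elif h != 0 and overall == 1:
        w[h - 1] ^= 1  # H points at the flipped position
        status = SINGLE
    elif h != 0 and overall == 0:
        return DOUBLE, None  # detected but not correctable
    else:
        w[7] ^= 1  # only the extra parity bit was hit
        status = P4_ONLY
    return status, [w[2], w[4], w[5], w[6]]


clean = encode_secded(1, 0, 1, 1)
one_flip = clean.copy()
one_flip[5] ^= 1
two_flips = clean.copy()
two_flips[0] ^= 1
two_flips[5] ^= 1
print(decode_secded(clean))
print(decode_secded(one_flip))
print(decode_secded(two_flips))
```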
Hamming Single Error Correction + Double Error Detection
Hamming Distance = 4
• 1-bit error (one 1): nearest 0000
• 1-bit error (one 0): nearest 1111
• 2-bit error (two 0s, two 1s): halfway between both
iClicker Question
The following word is received, encoded with Hamming code: 0 1 1 0 0 0 1
What is the corrected data bit sequence?
A. 1111
B. 0001
C. 1101
D. 1011
E. 1000
What if More Than 2-Bit Errors?
• Network transmissions, disks, distributed storage: common failure mode is bursts of bit errors, not just one or two bit errors
  – Contiguous sequence of B bits in which first, last and any number of intermediate bits are in error
  – Caused by impulse noise or by fading in wireless
  – Effect is greater at higher data rates
• Solve with Cyclic Redundancy Check (CRC), interleaving or other more advanced codes
iClicker Question
The following word is received, encoded with Hamming code: 0 1 1 0 0 0 1
check p1: 0 x 1 x 0 x 1, o.k.
check p2: x 1 1 x x 0 1, error in p2
check p4: x x x 0 0 0 1, error in p4
Error in location 2 + 4 = 6
Correct data: 1 0 1 1 (answer D)
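The three checks in the answer can be replayed mechanically. The sketch below assumes the usual 7-bit layout with parity bits at positions 1, 2 and 4 and data bits at positions 3, 5, 6 and 7. The function name is hypothetical.

```python
def correct_hamming7(bits: list[int]):
    """Return (error position or 0, corrected data bits) for a 7-bit Hamming word."""
    w = list(bits)
    # Each parity bit (keyed by its position) covers these 1-based positions.
    checks = {
        1: (1, 3, 5, 7),
        2: (2, 3, 6, 7),
        4: (4, 5, 6, 7),
    }
    error_pos = 0
    for parity_pos, covered in checks.items():
        if sum(w[i - 1] for i in covered) % 2 != 0:  # even parity violated
            error_pos += parity_pos
    if error_pos:
        w[error_pos - 1] ^= 1  # flip the bit the failed checks point at
    data = [w[i - 1] for i in (3, 5, 6, 7)]
    return error_pos, data


received = [0, 1, 1, 0, 0, 0, 1]
print(correct_hamming7(received))
```

The failing checks are p2 and p4, so the error position is 2 + 4 = 6 and the recovered data bits are 1 0 1 1.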
Can smaller disks be used to close gap in performance between disks and CPUs?
Arrays of Small Disks
• Conventional: 4 disk designs (14", 10", 5.25", 3.5"), low end to high end
• Disk Array: 1 disk design (3.5")
Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)

            IBM 3390K     IBM 3.5" 0061   x70             x70 vs. 3390K
Capacity    20 GBytes     320 MBytes      23 GBytes
Volume      97 cu. ft.    0.1 cu. ft.     11 cu. ft.      9X
Power       3 KW          11 W            1 KW            3X
Data Rate   15 MB/s       1.5 MB/s        120 MB/s        8X
I/O Rate    600 I/Os/s    55 I/Os/s       3900 I/Os/s     6X
MTTF        250 KHrs      50 KHrs         ??? Hrs
Cost        $250K         $2K             $150K

Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?
RAID: Redundant Arrays of (Inexpensive) Disks
• Files are "striped" across multiple disks
• Redundancy yields high data availability
  – Availability: service still provided to user, even if some components failed
• Disks will still fail
• Contents reconstructed from data redundantly stored in the array
  ⇒ Capacity penalty to store redundant info
  ⇒ Bandwidth penalty to update redundant info
Redundant Arrays of Inexpensive Disks
RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its “mirror”
  Very high availability can be achieved
• Writes limited by single-disk speed
• Reads may be optimized
Most expensive solution: 100% capacity overhead
[Figure: disks paired with their mirrors, labeled "recovery group"]
Redundant Array of Inexpensive Disks
RAID 3: Parity Disk
P contains sum of other disks per stripe mod 2 (“parity”)
If disk fails, subtract P from sum of other disks to find missing information
[Figure: logical record 100100111100110110010011... split into striped physical records (10100011, 11001101, 10100011, 11001101) on the data disks, plus parity disk P]
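Since a sum mod 2 is a bitwise XOR, and subtraction mod 2 is the same operation, the recovery rule above fits in a few lines. The block contents and the three-data-disk setup below are illustrative values, not taken from the slide.

```python
from functools import reduce


def parity(blocks: list[bytes]) -> bytes:
    """Bytewise XOR (sum mod 2) of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Illustrative one-byte blocks, one per data disk in the same stripe.
data = [bytes([0b10010011]), bytes([0b11001101]), bytes([0b10100011])]
p = parity(data)  # what the parity disk stores for this stripe

# Simulate losing data disk 1: XOR the surviving data blocks with P.
survivors = [data[0], data[2]]
rebuilt = parity(survivors + [p])

print(format(p[0], "08b"), rebuilt == data[1])
```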
Redundant Arrays of Inexpensive Disks
RAID 4: High I/O Rate Parity
Insides of 5 disks (disk columns; each row is a stripe; increasing logical disk address):

D0    D1    D2    D3    P
D4    D5    D6    D7    P
D8    D9    D10   D11   P
D12   D13   D14   D15   P
D16   D17   D18   D19   P
D20   D21   D22   D23   P
...

Example: small read D0 & D5, large write D12-D15
Inspiration for RAID 5
• RAID 4 works well for small reads
• Small writes (write to one disk):
  – Option 1: read other data disks, create new sum and write to Parity Disk
  – Option 2: since P has old sum, compare old data to new data, add the difference to P
• Small writes are limited by Parity Disk: Write to D0, D5 both also write to P disk

D0    D1    D2    D3    P
D4    D5    D6    D7    P
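The two small-write options above produce the same parity, which a short sketch can confirm. The block values and helper names are assumptions made for illustration. "Add the difference" is modeled as XOR, the mod-2 sum.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of any number of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def new_parity_option1(other_data: list[bytes], new_data: bytes) -> bytes:
    """Option 1: read the other data disks and recompute the sum from scratch."""
    return xor_blocks(new_data, *other_data)


def new_parity_option2(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Option 2: fold the old/new data difference into the old parity."""
    return xor_blocks(old_parity, old_data, new_data)


# Illustrative stripe D0..D3 and a new value for D0.
stripe = [b"\x12", b"\x34", b"\x56", b"\x78"]
old_parity = xor_blocks(*stripe)
new_d0 = b"\x9a"

option1 = new_parity_option1(stripe[1:], new_d0)
option2 = new_parity_option2(old_parity, stripe[0], new_d0)
print(option1 == option2)
```

In this five-disk example, option 1 reads three other data disks while option 2 reads only the old data and the old parity. Either way the dedicated parity disk must be written.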
RAID 5: High I/O Rate Interleaved Parity
Independent writes possible because of interleaved parity
Disk columns, increasing logical disk addresses:

D0    D1    D2    D3    P
D4    D5    D6    P     D7
D8    D9    P     D10   D11
D12   P     D13   D14   D15
P     D16   D17   D18   D19
D20   D21   D22   D23   P
...

Example: write to D0, D5 uses disks 0, 1, 3, 4
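The rotation in the layout above can be written as a small address mapping. This sketch assumes five disks numbered 0 to 4 from left to right, with the parity block moving one disk to the left on each successive stripe, as drawn. The function names are hypothetical.

```python
NUM_DISKS = 5  # assumed: disks numbered 0..4, left to right


def parity_disk(stripe: int) -> int:
    """Disk holding parity for a stripe: disk 4, then 3, 2, 1, 0, then 4 again."""
    return (NUM_DISKS - 1) - (stripe % NUM_DISKS)


def locate(block: int) -> tuple[int, int]:
    """Map logical data block Dn to (stripe, disk), skipping the parity slot."""
    stripe, index = divmod(block, NUM_DISKS - 1)
    p = parity_disk(stripe)
    disk = index if index < p else index + 1
    return stripe, disk


def disks_touched_by_write(block: int) -> set[int]:
    """A small write updates the data disk and that stripe's parity disk."""
    stripe, disk = locate(block)
    return {disk, parity_disk(stripe)}


print(disks_touched_by_write(0), disks_touched_by_write(5))
```

The writes to D0 and D5 touch disjoint pairs of disks, so they can proceed independently. With a dedicated parity disk both writes would share it.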
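Problems of Disk Arrays: Small Writes
RAID-5: Small Write Algorithm
1 Logical Write = 2 Physical Reads + 2 Physical Writes
Stripe before the write: D0 D1 D2 D3 P, with new data D0'
(1. Read) old data D0
(2. Read) old parity P
XOR new data D0' with old data D0, then XOR the result with old parity P to get new parity P'
(3. Write) new data D0'
(4. Write) new parity P'
Stripe after the write: D0' D1 D2 D3 P'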
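The count of two reads and two writes can be checked with a toy model that tallies physical I/Os. The `CountingDisk` class, the one-block disks and the byte values are invented for this sketch. A real array works on sectors and would also have to handle failures between the two writes.

```python
class CountingDisk:
    """Toy disk holding integer blocks and counting physical reads and writes."""

    def __init__(self, nblocks: int) -> None:
        self.blocks = [0] * nblocks
        self.reads = 0
        self.writes = 0

    def read(self, index: int) -> int:
        self.reads += 1
        return self.blocks[index]

    def write(self, index: int, value: int) -> None:
        self.writes += 1
        self.blocks[index] = value


def small_write(data_disk: CountingDisk, parity_disk: CountingDisk,
                stripe: int, new_data: int) -> None:
    """Update one data block and its parity without touching the other disks."""
    old_data = data_disk.read(stripe)        # 1. read old data
    old_parity = parity_disk.read(stripe)    # 2. read old parity
    new_parity = old_parity ^ old_data ^ new_data
    data_disk.write(stripe, new_data)        # 3. write new data
    parity_disk.write(stripe, new_parity)    # 4. write new parity


# Four data disks plus one parity disk, one stripe each (illustrative values).
data_disks = [CountingDisk(1) for _ in range(4)]
parity = CountingDisk(1)
for disk, value in zip(data_disks, [0x12, 0x34, 0x56, 0x78]):
    disk.blocks[0] = value  # preload directly, bypassing the counters
parity.blocks[0] = 0x12 ^ 0x34 ^ 0x56 ^ 0x78

small_write(data_disks[0], parity, 0, 0x9A)

all_disks = data_disks + [parity]
print(sum(d.reads for d in all_disks), sum(d.writes for d in all_disks))
```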
RAID-I
• RAID-I (1989)
  – Consisted of a Sun 4/280 workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks and specialized disk striping software
RAID II
• 1990-1993
• Early Network Attached Storage (NAS) System running a Log Structured File System (LFS)
• Impact:
  – $25 Billion/year in 2002
  – Over $150 Billion in RAID devices sold from 1990 to 2002
  – 200+ RAID companies (at the peak)
  – Software RAID a standard component of modern OSs