![Page 1: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/1.jpg)
Disks and RAID
CS 4410 Operating Systems
Spring 2017
Cornell University
Lorenzo Alvisi
Anne Bracy
See: Ch 12, 14.2 in OSPP textbook
The slides are the product of many rounds of teaching CS 4410 by Professors Sirer, Bracy, Agarwal, George, and Van Renesse.
![Page 2: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/2.jpg)
Storage Devices
Magnetic disks
• Storage that rarely becomes corrupted
• Large capacity at low cost
• Block-level random access
• Slow performance for random access
• Better performance for streaming access
Flash memory
• Storage that rarely becomes corrupted
• Capacity at intermediate cost (50x disk)
• Block-level random access
• Good performance for reads; worse for random writes
![Page 3: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/3.jpg)
THAT WAS THEN
• 13th September 1956
• The IBM RAMAC 350
• Total storage = 5 million characters (just under 5 MB)
http://royal.pingdom.com/2008/04/08/the-history-of-computer-data-storage-in-pictures/
THIS IS NOW
• 2.5-3.5” hard drive
• Example: 500GB Western Digital Scorpio Blue hard drive
Magnetic disks are 60 years old!
![Page 4: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/4.jpg)
Reading from a disk
[Figure: disk anatomy, labeled with track, sector, head, arm, arm assembly, platter, surfaces, motor, and spindle]
Must specify:
• cylinder # (distance from spindle)
• surface #
• sector #
• transfer size
• memory address
![Page 5: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/5.jpg)
Disk Tracks
[Figure: disk anatomy, labeled with track*, sector, head, arm, arm assembly, platter, surfaces, motor, and spindle]
Track: ~1 micron wide (1000 nm)
• Wavelength of light is ~0.5 micron
• Resolution of human eye: 50 microns
• 100K tracks on a typical 2.5” disk
Track length varies across disk
• Outside:
  • More sectors per track
  • Higher bandwidth
• Most of disk area in outer regions of disk
*not to scale: head is actually much bigger than a track
![Page 6: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/6.jpg)
Disk overheads
Disk Latency = Seek Time + Rotation Time + Transfer Time
• Seek: to get to the track (5-15 millisecs)
• Rotational Latency: to get to the sector (4-8 millisecs) (on average, only need to wait half a rotation)
• Transfer: get bits off the disk (25-50 microsecs)
[Figure: a track and a sector, annotated with seek time and rotational latency]
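As a rough worked example of the latency formula, the sketch below adds the three terms for one random access. The 7200 RPM spindle speed and the specific seek and transfer values are assumptions chosen for illustration (they fall inside the ranges quoted on the slide); `disk_latency_ms` is a hypothetical helper name.

```python
def disk_latency_ms(seek_ms, rpm, transfer_us):
    """Disk latency = seek time + rotation time + transfer time, in milliseconds."""
    full_rotation_ms = 60_000 / rpm          # one full rotation
    avg_rotation_ms = full_rotation_ms / 2   # on average, wait half a rotation
    return seek_ms + avg_rotation_ms + transfer_us / 1000

# Assumed values: 10 ms seek, 7200 RPM, 50 microsecond transfer.
latency = disk_latency_ms(seek_ms=10, rpm=7200, transfer_us=50)
print(f"{latency:.2f} ms")  # 14.22 ms
```

Seek and rotation dominate here; the transfer term is almost negligible for a single sector, which is why random access is so much slower than streaming access.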
![Page 7: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/7.jpg)
Hard Disks vs. RAM

| | Hard Disks | RAM |
|---|---|---|
| Smallest write | sector | word |
| Atomic write | sector | word |
| Random access | 5 ms | 10-1000 ns |
| Sequential access | 200 MB/s | 200-1000 MB/s |
| Cost | $50 / terabyte | $5 / gigabyte |
| Power reliance (survives power outage?) | Non-volatile (yes) | Volatile (no) |

![Page 8: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/8.jpg)
Disk Scheduling
Objective: minimize seek time
Context: a queue of cylinder numbers (#0-199)
Head pointer @ 53
Queue: 98, 183, 37, 122, 14, 124, 65, 67
Metric: how many cylinders traversed?
![Page 9: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/9.jpg)
Disk Scheduling: FIFO
• Schedule disk operations in order they arrive
• Downsides?
Head pointer @ 53
Queue: 98, 183, 37, 122, 14, 124, 65, 67
FIFO Schedule? Total head movement?
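One way to answer the question on this slide is to simulate it. The sketch below is a minimal illustration, with `fifo_schedule` as a hypothetical helper name; it uses the slide's metric of cylinders traversed.

```python
def fifo_schedule(head, queue):
    """Service requests in arrival order; return (order, cylinders traversed)."""
    total = 0
    position = head
    for cylinder in queue:
        total += abs(cylinder - position)  # distance of this seek
        position = cylinder
    return list(queue), total

order, total = fifo_schedule(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)  # [98, 183, 37, 122, 14, 124, 65, 67]
print(total)  # 640
```

The head swings back and forth across the disk (for example 183 to 37 to 122 to 14), which is the main downside of FIFO.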
![Page 10: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/10.jpg)
Disk Scheduling: Shortest Seek Time First
• Select request with minimum seek time from current head position
• A form of Shortest Job First (SJF) scheduling
• Not optimal: suppose cluster of requests at far end of disk ➜ starvation!
Head pointer @ 53
Queue: 98, 183, 37, 122, 14, 124, 65, 67
SSTF Schedule? Total head movement?
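A minimal sketch of SSTF on the slide's queue, assuming seek time is proportional to cylinder distance. `sstf_schedule` is a hypothetical helper name, and ties (none occur in this queue) go to the earlier request.

```python
def sstf_schedule(head, queue):
    """Repeatedly pick the pending request closest to the head."""
    pending = list(queue)
    order = []
    total = 0
    position = head
    while pending:
        nearest = min(pending, key=lambda c: abs(c - position))
        pending.remove(nearest)
        total += abs(nearest - position)
        position = nearest
        order.append(nearest)
    return order, total

order, total = sstf_schedule(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)  # [65, 67, 37, 14, 98, 122, 124, 183]
print(total)  # 236
```

Note that 183 is served last; if nearby requests kept arriving, it could wait indefinitely, which is the starvation problem noted above.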
![Page 11: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/11.jpg)
Disk Scheduling: SCAN
• Arm starts at one end of disk
• moves toward other end, servicing requests
• movement reversed @ end of disk
• repeat
• AKA elevator algorithm
Head pointer @ 53
Queue: 98, 183, 37, 122, 14, 124, 65, 67
SCAN Schedule? Total head movement?
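The answer for SCAN depends on which way the arm is initially moving, which the slide does not state. The sketch below takes the direction as a parameter and assumes the arm always travels to the end of the disk (cylinder 0 or 199) before reversing; `scan_schedule` is a hypothetical helper name.

```python
def scan_schedule(head, queue, max_cylinder=199, direction="down"):
    """Elevator algorithm: sweep to one end of the disk, then reverse."""
    lower = sorted((c for c in queue if c < head), reverse=True)  # served going down
    upper = sorted(c for c in queue if c >= head)                 # served going up
    if direction == "down":
        first, second, end = lower, upper, 0
    else:
        first, second, end = upper, lower, max_cylinder
    total = 0
    position = head
    for stop in first + [end] + second:  # 'end' is the turnaround point, not a request
        total += abs(stop - position)
        position = stop
    return first + second, total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan_schedule(53, queue, direction="down"))
# ([37, 14, 65, 67, 98, 122, 124, 183], 236)
print(scan_schedule(53, queue, direction="up"))
# ([65, 67, 98, 122, 124, 183, 37, 14], 331)
```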
![Page 12: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/12.jpg)
Disk Scheduling: C-SCAN
Head pointer @ 53
Queue: 98, 183, 37, 122, 14, 124, 65, 67
C-SCAN Schedule? Total head movement?
• Head moves from one end to other
  • servicing requests as it goes
  • reaches the end, returns to beginning
  • No requests serviced on return trip
• Treats cylinders as a circular list
  • wraps around from last to first
• More uniform wait time than SCAN
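A minimal sketch of C-SCAN on the slide's queue. Two assumptions are made here: the head sweeps toward higher cylinder numbers, and the return trip from cylinder 199 to cylinder 0 counts toward total head movement (some texts exclude it). `cscan_schedule` is a hypothetical helper name.

```python
def cscan_schedule(head, queue, max_cylinder=199):
    """Circular SCAN: service requests on the upward sweep only."""
    upper = sorted(c for c in queue if c >= head)  # served on the current sweep
    lower = sorted(c for c in queue if c < head)   # served after wrapping around
    total = 0
    position = head
    # Sweep to the last cylinder, jump back to cylinder 0, then sweep up again.
    for stop in upper + [max_cylinder, 0] + lower:
        total += abs(stop - position)
        position = stop
    return upper + lower, total

order, total = cscan_schedule(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(order)  # [65, 67, 98, 122, 124, 183, 14, 37]
print(total)  # 382
```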
![Page 13: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/13.jpg)
Solid State Drives (Flash)
Most SSDs based on NAND-flash
• retains its state for months to years without power
[Figure: Metal Oxide Semiconductor Field Effect Transistor (MOSFET) next to a Floating Gate MOSFET (FGMOS)]
https://flashdba.com/2015/01/09/understanding-flash-floating-gates-and-wear/
![Page 14: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/14.jpg)
NAND Flash
Charge is stored in Floating Gate (can have Single and Multi-Level Cells)
[Figure: Floating Gate MOSFET (FGMOS)]
https://flashdba.com/2015/01/09/understanding-flash-floating-gates-and-wear/
![Page 15: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/15.jpg)
Flash Operations
Erase block: sets each cell to “1”
• erase granularity = “erasure block” = 128-512 KB
• time: several ms
Write page: can only write erased pages
• write granularity = 1 page = 2-4 KBytes
• time: 10s of µs
Read page:
• read granularity = 1 page = 2-4 KBytes
• time: 10s of µs
![Page 16: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/16.jpg)
Flash Limitations
• can’t write 1 byte/word (must write whole blocks)
• limited # of erase cycles per block (memory wear)
  • 10^3-10^6 erases and the cell wears out
• reads can “disturb” nearby words and overwrite them with garbage
Lots of techniques to compensate:
• error correcting codes
• bad page/erasure block management
• wear leveling: trying to distribute erasures across the entire drive
![Page 17: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/17.jpg)
Flash Translation Layer
• Flash device firmware maps logical page # to a physical location
  – Garbage collect erasure block by copying live pages to new location, then erase
    • More efficient if blocks stored at same time are deleted at same time (e.g., keep blocks of a file together)
  – Wear-levelling: only write each physical page a limited number of times
  – Remap pages that no longer work (sector sparing)
• Transparent to the device user
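To make the logical-to-physical mapping concrete, here is a toy sketch of the remapping idea only. `ToyFTL` and its fields are hypothetical names invented for illustration; real firmware also performs garbage collection, wear leveling, and bad-page handling, none of which is modeled here.

```python
class ToyFTL:
    """Toy flash translation layer: every write goes to a fresh erased page."""

    def __init__(self, num_physical_pages):
        self.mapping = {}                            # logical page # -> physical page #
        self.free = list(range(num_physical_pages))  # erased pages, ready to write
        self.stale = set()                           # dead copies awaiting erase
        self.flash = {}                              # physical page # -> data

    def write(self, logical_page, data):
        if not self.free:
            raise RuntimeError("no erased pages left; garbage collection needed")
        physical = self.free.pop(0)
        old = self.mapping.get(logical_page)
        if old is not None:
            self.stale.add(old)                      # old copy cannot be overwritten in place
        self.flash[physical] = data
        self.mapping[logical_page] = physical

    def read(self, logical_page):
        return self.flash[self.mapping[logical_page]]

ftl = ToyFTL(num_physical_pages=4)
ftl.write(7, "v1")
ftl.write(7, "v2")   # same logical page, different physical page
print(ftl.read(7), ftl.mapping[7], ftl.stale)  # v2 1 {0}
```

The caller only ever sees logical page 7, which is the sense in which the layer is transparent to the device user.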
![Page 18: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/18.jpg)
SSD vs HDD

| | SSD | HDD |
|---|---|---|
| Cost | 10cts/gig | 6cts/gig |
| Power | 2-3W | 6-7W |
| Typical Capacity | 1TB | 2TB |
| Write Speed | 250MB/sec | 200MB/sec |
| Read Speed | 700MB/sec | 200MB/sec |

![Page 19: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/19.jpg)
What do we want?
Performance: keeping up with the CPU
• CPU 2x faster every 2 years (until recently)
• Disks 20x faster in 3 decades
What can we do to improve Disk Performance?
Hint #1: Disks did get cheaper in the past 3 decades…
Hint #2: When CPUs stopped getting faster, we also did this…
![Page 20: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/20.jpg)
RAID, Step 0: Striping
Redundant Array of Inexpensive Disks (RAID)
• In industry, “I” is for “Independent”
• The alternative is SLED, single large expensive disk
• RAID + RAID controller looks just like SLED to computer (yay, abstraction!)
GOALS:
1. Performance
• Parallelize individual requests
• Support parallel requests
TECHNIQUES:
0. Striping
![Page 21: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/21.jpg)
RAID-0
Files striped across disks
• Read: high throughput (parallel I/O)
• Write: best throughput
Downsides?

| Disk 0 | Disk 1 | Disk 2 | Disk 3 |
|---|---|---|---|
| D0 | D1 | D2 | D3 |
| D4 | D5 | D6 | D7 |
| D8 | D9 | D10 | D11 |
| D12 | D13 | D14 | D15 |

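The layout in the figure (D0, D4, D8, D12 on Disk 0, and so on) follows a round-robin rule. A minimal sketch of that mapping, with `locate_block` as a hypothetical helper name:

```python
def locate_block(block, num_disks=4):
    """Map a logical block number to (disk, offset on that disk) under striping."""
    return block % num_disks, block // num_disks

for block in (0, 5, 10, 13):
    disk, offset = locate_block(block)
    print(f"D{block} -> Disk {disk}, offset {offset}")
# D0 -> Disk 0, offset 0
# D5 -> Disk 1, offset 1
# D10 -> Disk 2, offset 2
# D13 -> Disk 1, offset 3
```

Consecutive blocks land on different disks, so a large sequential read can keep all four disks busy at once.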
![Page 22: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/22.jpg)
What could possibly go wrong?
Failure can occur for:
(1) Isolated Disk Sectors (1+ sectors down, rest OK)
• Permanent: physical malfunction (magnetic coating, scratches, contaminants)
• Transient: data corrupted but new data can be successfully written to / read from sector
(2) Entire Device Failure
• Damage to disk head, electronic failure, mechanical wearout
• Detected by device driver, accesses return error codes
• annual failure rates or Mean Time To Failure (MTTF)
![Page 23: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/23.jpg)
What do we also want?
Reliability: data fetched is what you stored
Availability: data is there when you want it
• More disks ➜ higher probability of some disk failing
• Striping reduces reliability
  • N disks: 1/Nth mean time between failures of 1 disk
What can we do to improve Disk Reliability?
Hint #1: When CPUs stopped being reliable, we also did this…
😞
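A quick numeric illustration of the 1/N rule, assuming disks fail independently at a constant rate. The 1,000,000-hour per-disk figure is a made-up example value, and `array_mttf_hours` is a hypothetical helper name.

```python
def array_mttf_hours(disk_mttf_hours, num_disks):
    """Approximate mean time until the first disk in the array fails."""
    return disk_mttf_hours / num_disks

# With striping and no redundancy, the first disk failure loses data.
for n in (1, 4, 100):
    print(n, array_mttf_hours(1_000_000, n))
# 1 1000000.0
# 4 250000.0
# 100 10000.0
```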
![Page 24: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/24.jpg)
RAID, Step 1: Mirroring
To improve reliability, add redundancy
GOALS:
1. Performance
• Parallelize individual requests
• Support parallel requests
2. Reliability
TECHNIQUES:
0. Striping
1. Mirroring
![Page 25: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/25.jpg)
RAID-1
Disks Mirrored: data written in 2 places
• Simple, expensive
• Example: Google File System replicated data on 3 disks, spread across multiple racks
• Reads: go to either disk ➜ 2x faster than SLED
• Write: replicate to every mirrored disk ➜ same speed as SLED
Full Disk Failure: use surviving disk
Bit Flip Error: Detect? Correct?
![Page 26: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/26.jpg)
RAID, Step 2: Parity
To recover from failures, add parity
• n-input XOR gives bit-level parity (1 = odd, 0 = even)
• 1101 ⊕ 1100 ⊕ 0110 = 0111 (parity block)
• Can reconstruct any missing block from the others
GOALS:
1. Performance
• Parallelize individual requests
• Support parallel requests
2. Reliability
TECHNIQUES:
0. Striping
1. Mirroring
2. Parity
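The parity example on this slide can be checked in a few lines. The sketch below treats each block as a 4-bit integer and shows that XOR-ing the surviving blocks with the parity block recovers a lost one; `parity` is a hypothetical helper name.

```python
from functools import reduce
from operator import xor

def parity(blocks):
    """Bitwise XOR of all blocks: each parity bit is 1 if an odd number of inputs have a 1."""
    return reduce(xor, blocks, 0)

data = [0b1101, 0b1100, 0b0110]
p = parity(data)
print(format(p, "04b"))  # 0111

# Suppose the middle block is lost: XOR the survivors with the parity block.
recovered = parity([data[0], data[2], p])
print(format(recovered, "04b"))  # 1100
```

This works because XOR-ing a value with itself cancels it out, so every surviving block drops out and only the missing block remains.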
![Page 27: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/27.jpg)
Lesser Loved RAIDs
RAID-2: bit-level striping with ECC codes
• 7 disk arms synchronized and move in unison
• Complicated controller (and hence very unpopular)
• Tolerates 1 error with no performance degradation
[Figure: 7 disks holding data bits b1-b4 and ECC bits p0-p2]
RAID-3: byte-level striping + parity disk
• read accesses all data disks
• write accesses all data disks + parity disk
• On disk failure: read parity disk, compute missing data
[Figure: Disk 0-3 hold byte 0-3; Disk 4 holds parity]
RAID-4: block-level striping + parity disk
+ better spatial locality for disk access
- parity disk is write bottleneck and wears out faster
[Figure: Disk 0-3 hold stripe 0-3; Disk 4 holds parity]
![Page 28: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/28.jpg)
A word about Granularity
Bit-level ➜ byte-level ➜ block level
• fine-grained: Stripe each file across all disks
  + high throughput for the file
  - wasted disk seek time
  - limits to transfer of 1 file at a time
• coarse-grained: Stripe each file over a few disks
  - limits throughput for 1 file
  + better use of spatial locality (for disk seek)
  + allows more parallel file access
![Page 29: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/29.jpg)
RAID 5: Rotating Parity w/ Striping
![Page 30: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/30.jpg)
RAID 5: Rotating Parity w/ Striping
• Write 1 block:
  – Read old data block
  – Read old parity block
  – Write new data block
  – Write new parity block (old data ⊕ old parity ⊕ new data)
• Write entire stripe:
  – Write data blocks and parity for each strip in stripe
  – Good write performance
• Read: go to correct disk, can outperform SLEDs and RAID-0
![Page 31: Disks and RAID - Cornell UniversityDisks and RAID CS 4410 Operang Systems Spring 2017 Lorenzo Alvisi Cornell University Anne Bracy See: Ch 12, 14.2 in OSPP textbook The slides are](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fcfa8abd75e011fde2f1ec3/html5/thumbnails/31.jpg)
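RAID 5: Write Example
• Write(D2, 0111)
  – Read old data block (1010)
  – Read old parity block (1001)
  – Write new data block (0111)
  – Write new parity block (old data ⊕ old parity ⊕ new data): 1010 ⊕ 1001 ⊕ 0111 = 0100

| | Disk 0 | Disk 1 | Disk 2 | Disk 3 | Disk 4 |
|---|---|---|---|---|---|
| Block | D0 | D1 | D2 | D3 | P0-3 |
| Before write | 0000 | 1111 | 1010 | 1100 | 1001 |
| After write | 0000 | 1111 | 0111 | 1100 | 0100 |
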
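The small-write shortcut can be checked against a full recomputation of the stripe's parity. The sketch below uses the block values from this example; `small_write_parity` is a hypothetical helper name.

```python
from functools import reduce
from operator import xor

def small_write_parity(old_data, old_parity, new_data):
    """New parity for a one-block write, using only the old data and old parity."""
    return old_data ^ old_parity ^ new_data

stripe = [0b0000, 0b1111, 0b1010, 0b1100]   # D0..D3 before the write
old_parity = reduce(xor, stripe)            # 0b1001, matching P0-3

new_parity = small_write_parity(stripe[2], old_parity, 0b0111)
stripe[2] = 0b0111                          # Write(D2, 0111)

# The shortcut agrees with XOR-ing the whole updated stripe.
assert new_parity == reduce(xor, stripe)
print(format(new_parity, "04b"))  # 0100
```

The shortcut touches only two disks (the data disk and the parity disk) no matter how wide the stripe is, at the cost of two reads before the two writes.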