Lecture 10: Disks & File Systems
CSE 120: Principles of Operating Systems
UC San Diego: Summer Session I, 2009
Frank Uyeda
Announcements
• Homework 2 is due now.
• Project 3 milestone Wednesday night.
• Project 2 bonus points
  – Fix your bugs from Project 2
  – Resubmit by the Project 3 deadline
  – Earn ½ credit back for all the things you fixed.
• Lab Hours:
  – Frank: tomorrow 4p-?, CSE basement
• Final Exam: 3p-6p on Saturday, August 1
• If you are lost, please come to Office Hours! You can make an appointment.
Review Question
• Which of the following scenarios is/are possible?
  – A) A PTE is valid in the TLB and valid in the page table
  – B) A PTE is valid in the TLB and invalid in the page table
  – C) A PTE is invalid in the TLB and valid in the page table
  – D) A PTE is invalid in the TLB and invalid in the page table
  – E) A PTE is not in the TLB and valid in the page table
  – F) A PTE is not in the TLB and invalid in the page table
Review: Demand Paging
• Memory Management Unit (MMU)
  – Hardware unit that translates a virtual address to a physical address
• Translation Table
  – Stored in main memory
• Translation Lookaside Buffer (TLB)
  – Hardware cache for the MMU's virtual-to-physical translation table
[Figure: the CPU issues a virtual address to the MMU (TLB plus translation table), which yields a physical address into memory; a page fault goes to disk. Page faults are handled silently by the OS.]
Review
• Page Sharing
  – Copy on write
• Page Replacement
  – Global vs. Local replacement
  – Algorithms:
    • Belady's Algorithm
    • FIFO
    • LRU
    • Clock (LRU approximation)
• Working Sets
  – Page Fault Frequency
  – Thrashing
Disks and File Systems
• First we'll discuss properties of physical disks
  – Structure
  – Performance
  – Scheduling
• Disk properties motivate how we build file systems on them
  – Files
  – Directories
  – Sharing
  – Protection
  – File System Layouts
  – File Buffer Cache
  – Read Ahead
Data Storage
1 msec = 1,000,000 nsec
Memory (DDR2): 2 GB: ~$30
Disk: 1.5 TB = ~$130 (source: Tigerdirect.com)
Note: image from Tanenbaum, MOS 3/e
Physical Disk Structure
• Disk components
  – Platters (2 surfaces)
  – Tracks
  – Sectors
  – Cylinders
  – Arm
  – Heads (1 per side)
• Logically, disk broken down into sectors
  – Addressed by cylinder, head, sector
[Figure: top view of a disk, labeling the arm, heads, platters, cylinder, track, and sector]
Disks and the OS
• Disks are messy and slow physical devices:
  – Disks just write to sectors, no notion of files or other logical partitions
  – Errors, bad blocks, missed seeks, etc.
  – Access times are many orders of magnitude slower than memory
• The OS hides much of this mess from higher level software
  – Hide low-level device control (initiate a disk read, etc.)
  – Present higher-level abstractions (files, databases, etc.)
Disk Interaction
• Specifying disk requests requires a lot of info:
  – Cylinder #, platter surface #, track #, sector #, transfer size…
• Older disks required the OS to specify all of this
  – The OS needed to know all disk parameters
• Modern disks are more complicated
  – Not all sectors are the same size, sectors are remapped, …
• Current disks provide a higher-level interface (SCSI)
  – The disk exports its data as a logical array of blocks [0…N]
• Disk maps logical blocks to cylinder/surface/track/sector
  – Only need to specify the logical block # to read/write
  – But now the disk parameters are hidden from the OS
Disk Parameters (2009)

Seagate Barracuda 7200.11
  Capacity                  1.5 TB
  Platters, Surfaces        4, 8
  Cache                     32 MB
  Transfer rate             62 MB/s (inner) to 120 MB/s (outer)
  Sector size               512 B
  Spindle speed             7200 RPM
  Random read seek time     ~8.5 msec
  Random write seek time    ~9.5 msec
  MTBF                      750,000 hours

Disk interface speeds
  SCSI                      5 MB/sec to 320 MB/sec
  ATA                       33 MB/sec to 100 MB/sec
  Serial ATA (SATA)         150 MB/sec to 300 MB/sec
  USB 2.0                   60 MB/sec
  Firewire                  50 MB/sec
Disk Performance
• Disk request performance depends upon...
  – I/O request overhead: issuing the command to the disk
    • Process file access traps into kernel, which needs to issue hw request
  – Seek: moving the disk arm to the correct cylinder
    • Depends on how fast the disk arm can move (increasing very slowly)
  – Rotation: waiting for the sector to rotate under the head
    • Depends upon rotation rate of disk (increasing, but slowly)
  – Transfer: transferring data from surface into disk controller electronics, sending it back to the host
    • Depends on density (increasing quickly)
    • Faster for tracks near the outer edge of the disk. Why?
• The OS tries to minimize the cost of all of these steps
  – Particularly seeks and rotation (why?)
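To see why seeks and rotation dominate, here is a rough back-of-the-envelope sketch in Python. It plugs in the 2009 Barracuda figures (8.5 ms read seek, 7200 RPM, 62 MB/s inner-track rate). The 4 KB request size, the function name, and the choice to ignore I/O request overhead are assumptions made for illustration.

```python
def avg_access_time_ms(seek_ms, rpm, transfer_mb_per_s, request_bytes):
    """Estimate one random disk access: seek + rotational delay + transfer."""
    # On average the target sector is half a revolution away from the head.
    rotation_ms = 0.5 * (60_000 / rpm)
    # Time to stream the requested bytes off the platter (1 MB = 10**6 bytes here).
    transfer_ms = request_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotation_ms + transfer_ms


# Assumed workload: one random 4 KB read on the inner tracks.
t = avg_access_time_ms(seek_ms=8.5, rpm=7200, transfer_mb_per_s=62, request_bytes=4096)
print(f"{t:.2f} ms per random 4 KB read")
```

Under these assumptions the transfer itself is well under a tenth of a millisecond, so almost all of the time goes to moving the arm and waiting for the platter.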
Disk Scheduling
• Because seeks are so expensive (milliseconds!), it helps to schedule disk requests that are queued waiting for the disk
  – FCFS/FIFO (do nothing)
    • Reasonable when load is low
    • Long waiting times for long request queues
  – SSTF (shortest seek time first)
    • Minimize arm movement (seek time), maximize request rate
    • Favors middle tracks
  – SCAN (elevator)
    • Service requests in one direction until done, then reverse
    • Discriminates against the highest and lowest tracks
  – C-SCAN
    • Like SCAN, but only go in one direction (typewriter)
    • Reduce variance in seek times
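The policies above can be compared with a small simulation. This is a minimal sketch that counts cylinders traveled. The request queue and starting head position are made-up example values, and the SCAN variant here reverses at the last pending request rather than at the disk edge.

```python
def fcfs(head, requests):
    """Service requests in arrival order; return total cylinders traveled."""
    total = 0
    for r in requests:
        total += abs(r - head)
        head = r
    return total


def sstf(head, requests):
    """Always service the pending request closest to the current head position."""
    pending = list(requests)
    total = 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total


def scan(head, requests):
    """Elevator: sweep upward through pending requests, then reverse."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    total = 0
    for r in up + down:
        total += abs(r - head)
        head = r
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]   # hypothetical cylinder numbers
start = 53                                     # hypothetical head position
for policy in (fcfs, sstf, scan):
    print(policy.__name__, policy(start, queue))
```

For this queue FCFS travels 640 cylinders, SSTF 236, and SCAN 299. SSTF wins on total movement here, but it can starve requests far from the head.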
Disk Scheduling (2)
• In general, unless there are request queues, disk scheduling does not have much impact
  – Important for servers, less so for PCs
• Modern disks often do the disk scheduling themselves
  – Disks know their layout better than OS, can optimize better
  – Ignores, undoes any scheduling done by OS
File Systems
• How do file systems fit in?
• Implement an abstraction (files) for secondary storage
• Organize files logically (directories)
• Permit sharing of data between processes, people, and machines
• Protect data from unwanted access (security)
Files
• A file is data with some properties
  – Contents, size, owner, last read/write time, protection, etc.
• A file can also have a type
  – Understood by the file system
    • Block, character, device, portal, link, etc.
  – Understood by other parts of the OS or runtime libraries
    • Executable, dll, source, object, text, etc.
• A file's type can be encoded in its name or contents
  – Windows encodes type in name
    • .com, .exe, .bat, .dll, .jpg, etc.
  – Unix encodes type in contents
    • Magic numbers, initial characters (e.g., #! for shell scripts)
Basic File Operations

Unix                      Windows NT
• creat(name)             • CreateFile(name, CREATE)
• open(name, how)         • CreateFile(name, OPEN)
• read(fd, buf, len)      • ReadFile(handle, …)
• write(fd, buf, len)     • WriteFile(handle, …)
• sync(fd)                • FlushFileBuffers(handle, …)
• seek(fd, pos)           • SetFilePointer(handle, …)
• close(fd)               • CloseHandle(handle, …)
• unlink(name)            • DeleteFile(name)
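As a quick illustration, Python's os module exposes thin wrappers over these calls, so the whole life cycle can be walked through in a few lines. The scratch file name is hypothetical. Note that the real names differ slightly from the slide's shorthand: os.fsync stands in for sync(fd) and os.lseek for seek(fd, pos).

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)  # creat + open
os.write(fd, b"hello, disk")                        # write(fd, buf, len)
os.fsync(fd)                                        # force buffered data to disk
os.lseek(fd, 7, os.SEEK_SET)                        # seek to byte offset 7
data = os.read(fd, 4)                               # read 4 bytes from there
os.close(fd)                                        # close(fd)
os.unlink(path)                                     # unlink(name): remove the name

print(data)
```

The descriptor returned by open is what lets read and write skip the path lookup on every call.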
Directories
• Directories serve two purposes
  – For users, they provide a structured way to organize files
  – For the file system, they provide a convenient naming interface that allows the implementation to separate logical file organization from physical file placement on the disk
    • Why might this help?
• Most file systems support multi-level directories
  – Naming hierarchies (/, /usr, /usr/local/, …)
• Most file systems support the notion of a current directory
  – Relative names specified with respect to current directory
  – Absolute names start from the root of directory tree
Directory Internals
• A directory is a list of entries
  – <name, location>
  – Name is just the name of the file or directory
  – Location depends upon how file is represented on disk
• List is usually unordered (effectively random)
  – Entries usually sorted by program that reads directory
• Directories typically stored in files
  – Only need to manage one kind of secondary storage unit
Path Name Translation
• Let's say you want to open “/one/two/three”
• What does the file system do?
  – Open directory “/” (well known, can always find)
  – Search for the entry “one”, get location of “one” (in dir entry)
  – Open directory “one”, search for “two”, get location of “two”
  – Open directory “two”, search for “three”, get location of “three”
  – Open file “three”
• Systems spend a lot of time walking directory paths
  – This is why open is separate from read/write
  – OS will cache prefix lookups for performance
    • /a/b, /a/bb, /a/bbb, etc., all share “/a” prefix
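The walk described above can be sketched over a toy in-memory file system. Everything here is hypothetical: the inodes table, the inode numbers, and the namei name (borrowed from Unix tradition). A real kernel reads directory blocks from disk at each step.

```python
ROOT_INODE = 0

# Toy "disk": inode number -> directory (dict of name -> inode) or file contents.
inodes = {
    0: {"one": 5},
    5: {"two": 9},
    9: {"three": 12},
    12: b"file contents",
}


def namei(path):
    """Translate an absolute path into (inode number, directories searched)."""
    current = ROOT_INODE
    searched = 0
    for name in (part for part in path.split("/") if part):
        directory = inodes[current]
        if not isinstance(directory, dict):
            raise NotADirectoryError(name)
        if name not in directory:
            raise FileNotFoundError(path)
        searched += 1               # one directory opened and searched
        current = directory[name]
    return current, searched


print(namei("/one/two/three"))
```

Each path component costs one directory search, which is the work that caching prefix lookups avoids.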
Storing Files
• Disk is partitioned into Blocks or Sectors
  – Modern disks have 512-byte sectors
  – File systems usually work in block sizes of 4 KB
• Files can span multiple blocks
  – File sizes may span multiple blocks, or may be small
• Things to consider
  – File access: is it random, sequential?
  – File size: how often does it grow/shrink?
• Sound familiar?
Disk Layout Strategies
Contiguous allocation
• Idea: Allocate space for file like done for contiguous memory organization
• Pros: Fast file access
• Cons: Fragmentation, needs compaction
  – What happens when you need to grow?
[Figure: a file mapped onto disk blocks]

Disk Layout Strategies
Linked Allocation
• Idea: Linked list of blocks, each pointing to next
• Pros: Easy to grow; fast sequential access
• Cons: Slow non-sequential access; what happens if you have one bad block?
[Figure: a file mapped onto disk blocks]

Disk Layout Strategies
Indexed Allocation
• Idea: Store ordered list of block pointers
• Pros: Good for random access, not bad for sequential
• Cons: Size limit, not as fast for sequential access
[Figure: a file mapped onto disk blocks]
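The random-access difference between linked and indexed allocation shows up when you count how many blocks must be read to locate the k-th block of a file. This is a toy sketch. The block numbers and function names are invented, and it assumes the index fits in a single block.

```python
# Linked allocation: each block stores the number of the next block (None at the end).
next_block = {7: 2, 2: 9, 9: 4, 4: None}
first_block = 7


def linked_lookup(k):
    """Return (disk block holding file block k, blocks read to find it)."""
    block, reads = first_block, 0
    for _ in range(k):
        block = next_block[block]   # must read a block to learn its successor
        reads += 1
    return block, reads


# Indexed allocation: one index block lists the file's data blocks in order.
index_block = [7, 2, 9, 4]


def indexed_lookup(k):
    """Return (disk block holding file block k, blocks read to find it)."""
    return index_block[k], 1        # a single read of the index block


print(linked_lookup(3), indexed_lookup(3))
```

The linked walk grows with k, while the indexed lookup stays constant as long as the index itself is small enough to read in one go.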
Unix Inodes
• Unix uses an indexed allocation structure
  – An inode (index node) stores both metadata and the pointers to disk blocks
    • Metadata is information about the file (protection, timestamp, length, ref count, etc.)
• Each inode contains 15 block pointers
  – First 12 are direct blocks (e.g., 4 KB disk blocks)
  – Then single, double, triple indirect blocks
[Figure: an inode holding metadata and block pointers 0, 1, …, 12, 13, 14 that point to disk data blocks]
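To make the 15-pointer layout concrete, the sketch below works out which kind of pointer covers a given block of a file. The function name is made up, and the defaults (4 KB blocks, 4-byte pointers, 12 direct pointers) follow the slide's example rather than any particular Unix.

```python
def pointer_level(block_index, block_size=4096, pointer_size=4, direct=12):
    """Return which kind of inode pointer covers the given file block index."""
    per_indirect = block_size // pointer_size   # pointers held by one indirect block
    if block_index < direct:
        return "direct"
    block_index -= direct
    for level, name in enumerate(("single", "double", "triple"), start=1):
        span = per_indirect ** level            # data blocks reachable at this level
        if block_index < span:
            return name + " indirect"
        block_index -= span
    raise ValueError("block index beyond maximum file size")


for index in (0, 11, 12, 1036, 1049612):
    print(index, pointer_level(index))
```

Small files never leave the direct pointers, so they pay no extra disk reads for indirection.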
Resolving File Location/Data
• Inodes describe where on disk the blocks for a file are placed
  – Unix inodes are not directories
  – Directories are represented internally as files
    • What does this mean for how inodes are stored?
• Directory entries map file names to inodes
  – Want to access “/foo”
[Figure: inode 0 (the inode for directory “/”) points to the first data block of directory file “/”, which holds the entries (foo, 18), (bar, 451), (baz, 123), …; inode 18 (the inode for file “/foo”) points to the first data block for file “/foo”, where the data for foo lives]
• Directory entries map file names to inodes
  – To open “/foo”, use Master Block to find “/” on disk
  – Open “/”, look for entry “foo”
  – This entry contains the disk block number for inode for “foo”
  – Read the inode “foo” into memory
  – The inode says where the first data block is on disk
  – Read first data block into memory to access data in file “foo”
That was a lot of work to read one file!
Improving Performance
• We understand how file systems are structured
  – Inodes, data blocks, files, directories, etc.
• Now we'll focus on how they perform
  – Where do we place data?
  – Are there any tricks we can play to mask latencies?
• Three case studies:
  – Berkeley Fast File System (FFS)
  – Log-Structured File System (LFS)
  – Redundant Array of Inexpensive Disks (RAID)
Berkeley Fast File System (FFS)
• The original Unix file system had a simple, straightforward implementation
  – Easy to implement and understand
  – But very poor utilization of disk bandwidth (lots of seeking)
• BSD Unix folks did a redesign (mid 80s) that they called the Fast File System (FFS)
  – Improved disk utilization, decreased response time
  – McKusick, Joy, Leffler, and Fabry
• Now the file system against which all other Unix file systems are compared
• Good example of being device-aware for performance
Data and Inode Placement Problem
• Original Unix FS had two placement problems:
• 1) Data blocks allocated randomly in aging file systems
  – Blocks for the same file allocated sequentially when FS is new
  – As FS “ages” and fills, need to allocate into blocks freed up when other files are deleted
  – Problem: Deleted files essentially randomly placed
  – So, blocks for new files become scattered across the disk
• 2) Inodes allocated far from blocks
  – All inodes at beginning of disk, far from data
  – Traversing file name paths, manipulating files, directories requires going back and forth from inodes to data blocks
• Both of these problems generate many long seeks
Data and Inode Placement Problem
[Figure: disk writes over time, with metadata at the start of the disk]
Over time, block placement gets scattered (“swiss cheese” effect)
Data and Inode Placement Problem
• 2) Inodes allocated far from blocks
  – All inodes at beginning of disk, far from data
  – Traversing file name paths, manipulating files, directories requires going back and forth from inodes to data blocks
• Remember accessing “/foo” example?
[Figure: on disk, inode 0 (directory “/”) and inode 18 (file “/foo”) sit far from the first data block of directory file “/” (entries foo, 18; bar, 451; baz, 123; …) and from the first data block for file “/foo”]
Cylinder Groups
• BSD FFS addressed both of these problems using the notion of a cylinder group
  – Disk partitioned into groups of cylinders
  – Data blocks in same file allocated in same cylinder
  – Files in same directory allocated in same cylinder
  – Inodes for files allocated in same cylinder as file data blocks
[Figure: the disk divided into cylinder groups, each holding metadata, inodes, data blocks, etc.]
Reduces number of seeks!
• Free space requirement
  – To be able to allocate according to cylinder groups, the disk must have free space scattered across cylinders
  – 10% of the disk is reserved just for this purpose
    • Only used by root, which is why it is possible for “df” to report >100%
Problems with Small Blocks
• Small blocks (1K) caused two problems:
  – Low bandwidth utilization
  – Small max file size (function of block size)
Maximum File Size: 1 KB Blocks
• Recall Unix inodes have:
  – 12 direct blocks
  – 1 single indirect block, 1 double indirect block, 1 triple indirect block
• How large can a file be with 1 KB blocks?
• Single indirect block:
  – Assuming 32-bit addresses, we have 4 bytes per block pointer, so 1 KB / 4 = 256 block pointers
  – So… 256 * 1 KB = 256 KB
• Double-indirect block:
  – 256 * 256 * 1 KB = 64 MB
• Triple indirect block:
  – 256 * 256 * 256 * 1 KB = 16 GB
• Total: ~16 GB
Problems with Small Blocks
• Fix using larger blocks (4K)
  – Very large files, only need two levels of indirection for supporting files of size 2^32
Maximum File Size: 4 KB Blocks
• Recall Unix inodes have:
  – 12 direct blocks
  – 1 single indirect block, 1 double indirect block, 1 triple indirect block
• How large can a file be with 4 KB blocks?
• Single indirect block:
  – Assuming 32-bit addresses, we have 4 bytes per block pointer, so 4 KB / 4 = 1024 block pointers
  – So… 1024 * 4 KB = 4 MB
• Double-indirect block:
  – 1024 * 1024 * 4 KB = 4 GB
• Triple indirect block:
  – 1024 * 1024 * 1024 * 4 KB = 4 TB
• Total: ~4 TB
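The arithmetic for both block sizes can be checked with a few lines of Python. This is a sketch under the same assumptions as the slides: 12 direct pointers, one single, one double, and one triple indirect block, and 4-byte block pointers. Every pointer at the bottom level addresses one full block, so with 4 KB blocks each term is multiplied by 4 KB. The function name is made up.

```python
def max_file_bytes(block_size, pointer_size=4, direct=12):
    """Largest file addressable by 12 direct + single/double/triple indirect pointers."""
    per_indirect = block_size // pointer_size   # pointers that fit in one block
    data_blocks = (
        direct
        + per_indirect          # single indirect
        + per_indirect ** 2     # double indirect
        + per_indirect ** 3     # triple indirect
    )
    return data_blocks * block_size


for size in (1024, 4096):
    print(f"{size} B blocks: {max_file_bytes(size) / 2**30:.1f} GiB")
```

The triple indirect term dominates: about 16 GB with 1 KB blocks and about 4 TB with 4 KB blocks. The double indirect level alone already reaches 4 GB (2^32 bytes) with 4 KB blocks.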
Problems with Small Blocks
• Fix using larger blocks (4K)
  – Why not just use all indirect blocks?
    • Over 65% of files are smaller than 4 KB (Tanenbaum, OSR 2006)
  – What's the problem with that?
Problems with Small Blocks
• Fix using larger blocks (4K)
  – Problem: internal fragmentation
  – Fix: Introduce “fragments” (1K pieces of a block can be used for other, small files)
Other Problems
• Problem: Media failures
  – If you lose the superblock, you lose everything
    • Or at least recovery is expensive
  – Solution: Replicate master block (superblock)
• Problem: reduced seeks, but even one is expensive
  – What if we can avoid going to disk at all?
• Next: other File System tricks
File Buffer Cache
• Applications exhibit significant locality for reading and writing files
• Idea: Cache file blocks in memory to capture locality
  – This is called the file buffer cache
  – Cache is system wide, used and shared by all processes
  – Reading from the cache makes a disk perform like memory
  – Even a 4 MB cache can be very effective
• Issues
  – The file buffer cache competes with VM (tradeoff here)
  – Like VM, it has limited size
  – Need replacement algorithms again (usually LRU used)
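A buffer cache with LRU replacement can be sketched in a few lines. This is an illustrative toy, not how any real kernel structures its cache. The BufferCache class, its read_block_from_disk callback, and the two-block capacity are all assumptions for the example.

```python
from collections import OrderedDict


class BufferCache:
    """Toy file buffer cache with LRU replacement."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self.read_block_from_disk = read_block_from_disk  # hypothetical slow path
        self.blocks = OrderedDict()                       # oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)             # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block_from_disk(block_no)        # go to "disk"
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)               # evict least recently used
        return data


cache = BufferCache(capacity=2, read_block_from_disk=lambda n: f"block-{n}")
for n in [1, 2, 1, 3, 2]:
    cache.read(n)
print(cache.hits, cache.misses, list(cache.blocks))
```

With only two slots, the reference string 1, 2, 1, 3, 2 gets a single hit, which shows how cache size and locality interact.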
Caching Writes
• Applications assume writes make it to disk
  – As a result, writes are often slow even with caching
• Several ways to compensate for this
  – “write-behind”
    • Maintain a queue of uncommitted blocks
    • Periodically flush the queue to disk
    • Unreliable
  – Non-volatile RAM (NVRAM)
    • As with write-behind, but maintain queue in NVRAM
    • Expensive
Read Ahead (Prefetching)
• Many file systems implement “read ahead”
  – FS predicts that the process will request next block
  – FS goes ahead and requests it from the disk…
  – …while the process is computing on previous block!
  – When the process requests block, it will be in cache
  – Complements the disk cache, which also is doing read ahead
• For sequentially accessed files can make big difference
  – Unless blocks for the file are scattered across the disk
  – File systems try to prevent that, though (during allocating)
• Unfortunately, this doesn't do anything for writes
  – What if we could make write-behind sequential as well?
Log-structured File System
• The Log-structured File System (LFS) was designed in response to two trends in workload and technology:
• 1) Disk bandwidth scaling significantly (40% a year)
  – Latency is not
• 2) Large main memories in machines
  – Large buffer caches
  – Absorb large fraction of read requests
  – Can use for writes as well
  – Coalesce small writes into large writes
• LFS takes advantage of both of these to increase FS performance
  – Rosenblum and Ousterhout (Berkeley, '91)
LFS: Approach
• Optimize for disk writes
  – Batch writes in disk cache
    • Utilize increase in disk throughput
  – Treat the disk as one big log for writes
    • No need to worry about special seeks or placement
  – All data in file system appended to log
    • Data blocks, metadata, inodes, etc.
LFS Challenges
• How do you locate data?
  – FFS places files in a particular location
  – LFS appends data to the end of the log
• How do you free data?
  – At some point, you can't “append” anymore
  – How do you track and recover stale blocks in the log?
LFS: Locating Data
• FFS uses inodes to locate data blocks
  – Inodes pre-allocated in each cylinder group
  – Directories contain locations of inodes
• LFS appends inodes and data (basically everything) to end of the log
  – Makes them hard to find
• Approach
  – Use another level of indirection: Inode maps
  – Inode maps map file #s to inode location
  – Location of inode map blocks kept in checkpoint region
  – Checkpoint region has a fixed location
  – Cache inode maps in memory for performance
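The inode-map indirection can be sketched with an append-only list standing in for the disk. This is a deliberately simplified model. Every name here is invented, each file has one data block, and the inode map lives only in memory (standing in for the cached copy) rather than being written to the log and located through a checkpoint region.

```python
log = []          # the "disk": an append-only sequence of records
inode_map = {}    # file number -> log address of that file's latest inode


def append(record):
    """Append a record to the end of the log and return its address."""
    log.append(record)
    return len(log) - 1


def write_file(file_no, data):
    """Write new data and a new inode to the log tail, then update the inode map."""
    data_addr = append(("data", data))
    inode_addr = append(("inode", file_no, [data_addr]))
    inode_map[file_no] = inode_addr


def read_file(file_no):
    """Follow inode map -> inode -> data blocks."""
    _, _, block_addrs = log[inode_map[file_no]]
    return b"".join(log[addr][1] for addr in block_addrs)


write_file(1, b"v1")
write_file(2, b"other")
write_file(1, b"v2")      # overwrite: old data and inode for file 1 become stale
print(read_file(1), inode_map, len(log))
```

Nothing is ever updated in place. The second write to file 1 leaves the records at log addresses 0 and 1 behind as stale blocks that something must eventually reclaim.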
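LFS: Example (inode maps)
[Figure: the disk as a log holding blocks of File 1 and File 2 with their inodes, written by a sequence of operations (write a file, modify file, write to file, …), plus the inode map and the checkpoint region]
Aren't reads still slow? Rely on buffer cache to store inode maps. Large buffer cache means don't need to worry about reads!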
LFS: Free Space Management
• LFS append-only quickly runs out of disk space
  – Need to recover deleted blocks
• Approach:
  – Fragment log into segments
  – Thread segments on disk
    • Segments can be anywhere
  – Reclaim space by cleaning segments
    • Read segment
    • Copy live data to end of log
    • Now have free segment you can reuse
• Cleaning is a big problem
  – Costly overhead
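The three cleaning steps (read segment, copy live data to the end of the log, reuse the segment) can be sketched as follows. The block labels, the liveness set, and the clean function are all hypothetical. A real cleaner must also work out which blocks are live and update the inodes and inode map to point at the copies.

```python
def clean(segment, is_live, log_tail):
    """Copy live blocks from `segment` to the log tail and return the emptied segment."""
    for block in segment:                     # step 1: read the segment
        if block is not None and is_live(block):
            log_tail.append(block)            # step 2: copy live data to end of log
    return [None] * len(segment)              # step 3: segment is now free for reuse


segment = ["a1", "b1", "a2", "c1"]   # hypothetical block labels
live = {"b1", "c1"}                  # "a1" and "a2" were overwritten or deleted
tail = ["d1"]                        # current end of the log

freed = clean(segment, live.__contains__, tail)
print(tail, freed)
```

Even in this toy the cost is visible: cleaning a half-live segment means rewriting two blocks just to recover two.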
LFS: Now
• Revolutionary (at the time) design concept that spurred a lot of debate and research in the area in the 90s
• Present-day file systems use soft updates or journaling, which seem to be due in large part to the concepts from LFS
Summary
• We've explained how file systems can be structured
  – Many techniques are similar to those in memory management
  – Unix-style: Inodes, data blocks, files, directories, etc.
• Performance of file systems highly dependent on disk technology
  – Seeks take a long time
  – Placement of data matters (swiss-cheese problem and seek avoidance)
• Berkeley Fast File System (FFS)
  – Cylinder groups (which files are likely to be accessed together)
  – Larger block sizes to increase throughput
• Log-Structured File System (LFS)
  – Optimize for writes (batch writes)
  – Rely on cache for reads (data placement practically ignored)
• Assorted other tricks
  – Pre-fetching (avoid extra fetches and put in buffer cache)
  – Delayed writes (like LFS; used in modern journaling file systems)
Next Time
• Read Chapter 11.9, 12.7, 15
• Check Web site for course announcements
  – http://www.cs.ucsd.edu/classes/su09/cse120
RAID
• Problem:
  – Disk drives fail frequently
  – Disks are SLOW (seek times & transfer rates)
• Idea: Use many disks in parallel to increase storage bandwidth, improve reliability
  – Files are striped across disks
  – Each stripe portion is read/written in parallel
  – Bandwidth increases with more disks
• Redundant Array of Inexpensive Disks (RAID)
  – A storage system, not a file system
  – Patterson, Katz, and Gibson (Berkeley, '88)
RAID Levels
• In marketing literature, you will see RAID systems advertised as supporting different “RAID Levels”
• Here are some common levels:
  – RAID 0: Striping
    • Good for random access (no reliability)
  – RAID 1: Mirroring
    • Two disks, write data to both (expensive, 1X storage overhead)
  – RAID 5: Floating Parity
    • Parity blocks for different stripes written to different disks
    • No single parity disk, hence no bottleneck at that disk
  – RAID “10”: Striping plus mirroring
    • Higher bandwidth, but still have large overhead
    • See this on UltraDMA PC RAID disk cards
RAID Challenges
• Small files (small writes less than a full stripe)
  – Need to read entire stripe, update with small write, then write entire stripe out to disks
• Reliability
  – More disks increases the chance of media failure (MTBF)
• Turn reliability problem into a feature
  – Use one disk to store parity data
    • XOR of all data blocks in stripe
  – Can recover any data block from all others + parity block
  – Hence “redundant” in name
  – Introduces overhead, but assuming disks are “inexpensive”
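The parity trick is short enough to show directly. This sketch XORs the blocks of one stripe byte by byte, then rebuilds a "lost" block from the survivors plus parity. The two-byte block contents and the xor_blocks helper are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread over three data disks (hypothetical contents).
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(stripe)                      # stored on the parity disk

# Disk 1 fails: XOR the surviving data blocks with the parity block.
recovered = xor_blocks([stripe[0], stripe[2], parity])
print(parity.hex(), recovered == stripe[1])
```

This works because XOR is its own inverse: XORing the parity with every surviving block cancels them out and leaves exactly the missing one. It also shows why only a single failed disk per stripe can be tolerated.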