Lecture 10: Disks & File Systems
CSE 120: Principles of Operating Systems
UC San Diego: Summer Session I, 2009
Frank Uyeda


Announcements

•  Homework 2 is due now.
•  Project 3 milestone Wednesday night.
•  Project 2 bonus points
   –  Fix your bugs from Project 2
   –  Resubmit by the Project 3 deadline
   –  Earn ½ credit back for all the things you fixed.

•  Lab Hours:
   –  Frank: tomorrow 4p-?, CSE basement
•  Final Exam: 3p-6p on Saturday, August 1
•  If you are lost, please come to Office Hours! You can make an appointment.

Review Question

•  Which of the following scenarios is/are possible?
   –  A) A PTE is valid in the TLB and valid in the page table
   –  B) A PTE is valid in the TLB and invalid in the page table
   –  C) A PTE is invalid in the TLB and valid in the page table
   –  D) A PTE is invalid in the TLB and invalid in the page table
   –  E) A PTE is not in the TLB and valid in the page table
   –  F) A PTE is not in the TLB and invalid in the page table

Review: Demand Paging

•  Memory Management Unit (MMU)
   –  Hardware unit that translates a virtual address to a physical address
•  Translation Table
   –  Stored in main memory
•  Translation Lookaside Buffer (TLB)
   –  Hardware cache for the MMU's virtual-to-physical translation table

[Figure: the CPU issues a virtual address; the MMU, using the TLB and the translation table, produces a physical address in memory; a page fault brings the page in from disk and is handled silently by the OS]

Review

•  Page Sharing
   –  Copy on write
•  Page Replacement
   –  Global vs. local replacement
   –  Algorithms:
      •  Belady’s Algorithm
      •  FIFO
      •  LRU
      •  Clock (LRU approximation)
•  Working Sets
   –  Page Fault Frequency
   –  Thrashing

Disks and File Systems

•  First we’ll discuss properties of physical disks
   –  Structure
   –  Performance
   –  Scheduling
•  Disk properties motivate how we build file systems on them
   –  Files
   –  Directories
   –  Sharing
   –  Protection
   –  File System Layouts
   –  File Buffer Cache
   –  Read Ahead

Data Storage

[Figure: data storage comparison; image from Tanenbaum, MOS 3/e]

1 msec = 1,000,000 nsec
Memory (DDR2): 2GB for ~$30
Disk: 1.5TB for ~$130 (source: tigerdirect.com)

Physical Disk Structure

•  Disk components
   –  Platters (2 surfaces)
   –  Tracks
   –  Sectors
   –  Cylinders
   –  Arm
   –  Heads (1 per side)
•  Logically, disk broken down into sectors
   –  Addressed by cylinder, head, sector

[Figure: side view of a disk showing the arm, heads, platters, and a cylinder; top view showing a track and a sector]

Disks and the OS

•  Disks are messy and slow physical devices:
   –  Disks just write to sectors, no notion of files or other logical partitions
   –  Errors, bad blocks, missed seeks, etc.
   –  Access times are many orders of magnitude slower than memory
•  The OS hides much of this mess from higher-level software
   –  Hide low-level device control (initiate a disk read, etc.)
   –  Present higher-level abstractions (files, databases, etc.)

Disk Interaction

•  Specifying disk requests requires a lot of info:
   –  Cylinder #, platter surface #, track #, sector #, transfer size…
•  Older disks required the OS to specify all of this
   –  The OS needed to know all disk parameters
•  Modern disks are more complicated
   –  Not all tracks hold the same number of sectors, sectors are remapped, …
•  Current disks provide a higher-level interface (SCSI)
   –  The disk exports its data as a logical array of blocks [0…N]
•  Disk maps logical blocks to cylinder/surface/track/sector
   –  Only need to specify the logical block # to read/write
   –  But now the disk parameters are hidden from the OS
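To make the mapping concrete, here is the classic conversion between a logical block number and a cylinder/head/sector triple. It is a sketch that assumes a hypothetical fixed geometry in which every track holds the same number of sectors; real modern disks do not work this way, which is one reason they hide the mapping from the OS. The struct and function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical fixed geometry: every track holds the same number of
 * sectors. Real modern disks use zoned recording and remap bad sectors. */
struct geometry {
    uint32_t heads;              /* surfaces (one head per surface) */
    uint32_t sectors_per_track;
};

struct chs {
    uint32_t cylinder, head, sector;   /* sectors are numbered from 1 */
};

/* Logical block address -> cylinder/head/sector. */
struct chs lba_to_chs(struct geometry g, uint64_t lba) {
    struct chs a;
    a.cylinder = (uint32_t)(lba / ((uint64_t)g.heads * g.sectors_per_track));
    a.head     = (uint32_t)((lba / g.sectors_per_track) % g.heads);
    a.sector   = (uint32_t)(lba % g.sectors_per_track) + 1;
    return a;
}

/* Inverse mapping: what an OS had to do itself with older disks. */
uint64_t chs_to_lba(struct geometry g, struct chs a) {
    return ((uint64_t)a.cylinder * g.heads + a.head) * g.sectors_per_track
           + (a.sector - 1);
}
```

With 8 heads and 63 sectors per track, for example, block 1000 lands on cylinder 1, head 7, sector 56, and converting back gives 1000 again.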

[Figure; source: pcguide.com]

Disk Parameters (2009)

Seagate Barracuda 7200.11
   Capacity                    1.5 TB
   Platters, surfaces          4, 8
   Cache                       32 MB
   Transfer rate               62 MB/s (inner) to 120 MB/s (outer)
   Sector size                 512 B
   Spindle speed               7200 RPM
   Random read seek time       ~8.5 msec
   Random write seek time      ~9.5 msec
   MTBF                        750,000 hours

Disk interface speeds
   SCSI                        5 MB/sec to 320 MB/sec
   ATA                         33 MB/sec to 100 MB/sec
   Serial ATA (SATA)           150 MB/sec to 300 MB/sec
   USB 2.0                     60 MB/sec
   Firewire                    50 MB/sec

Disk Performance

•  Disk request performance depends upon…
   –  I/O request overhead: issuing the command to the disk
      •  Process file access traps into kernel, which needs to issue hw request
   –  Seek: moving the disk arm to the correct cylinder
      •  Depends on how fast the disk arm can move (increasing very slowly)
   –  Rotation: waiting for the sector to rotate under the head
      •  Depends upon rotation rate of disk (increasing, but slowly)
   –  Transfer: transferring data from surface into disk controller electronics, sending it back to the host
      •  Depends on density (increasing quickly)
      •  Faster for tracks near the outer edge of the disk – why?
•  The OS tries to minimize the cost of all of these steps
   –  Particularly seeks and rotation (why?)
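The four costs above simply add up. The following sketch plugs in the Seagate Barracuda numbers from the earlier table, assuming zero command overhead, the quoted ~8.5 msec read seek, an average rotational latency of half a revolution, and the 120 MB/s outer-track transfer rate. The function names are made up for illustration.

```c
/* Back-of-the-envelope cost of one disk request:
 *   time = overhead + seek + rotational latency + transfer
 * Average rotational latency is modeled as half a revolution, and
 * 1 MB is taken to be 10^6 bytes. */
double avg_rotation_ms(double rpm) {
    return 0.5 * 60000.0 / rpm;                 /* 60,000 ms per minute */
}

double transfer_ms(double bytes, double mb_per_s) {
    return bytes / (mb_per_s * 1e6) * 1000.0;   /* seconds -> ms */
}

double access_ms(double overhead_ms, double seek_ms, double rpm,
                 double bytes, double mb_per_s) {
    return overhead_ms + seek_ms + avg_rotation_ms(rpm)
           + transfer_ms(bytes, mb_per_s);
}
```

Under these assumptions, `access_ms(0, 8.5, 7200, 4096, 120)` comes to about 12.7 msec for a 4KB read, of which the transfer itself is only about 0.03 msec. That is roughly 0.32 MB/s of useful throughput for random 4KB reads versus 120 MB/s when streaming, which is why seeks and rotation are the costs worth attacking.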

Disk Scheduling

•  Because seeks are so expensive (milliseconds!), it helps to schedule disk requests that are queued waiting for the disk
   –  FCFS/FIFO (do nothing)
      •  Reasonable when load is low
      •  Long waiting times for long request queues
   –  SSTF (shortest seek time first)
      •  Minimize arm movement (seek time), maximize request rate
      •  Favors middle tracks
   –  SCAN (elevator)
      •  Service requests in one direction until done, then reverse
      •  Discriminates against the highest and lowest tracks
   –  C-SCAN
      •  Like SCAN, but only go in one direction (typewriter)
      •  Reduce variance in seek times
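A quick way to compare these policies is to total the arm movement, in cylinders, for one request queue. This is a simulation sketch, not how any particular OS or drive implements scheduling: it ignores rotational position, assumes at most 64 queued requests, breaks SSTF ties in queue order, and for C-SCAN counts the return sweep to cylinder 0 as movement (some textbooks do not).

```c
#include <stdlib.h>

static int dist(int a, int b) { return a > b ? a - b : b - a; }

/* FCFS: serve requests in arrival order. */
int fcfs(const int *req, int n, int head) {
    int total = 0;
    for (int i = 0; i < n; i++) { total += dist(head, req[i]); head = req[i]; }
    return total;
}

/* SSTF: always serve the closest pending request (sketch: n <= 64). */
int sstf(const int *req, int n, int head) {
    int done[64] = {0};
    int total = 0;
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!done[i] && (best < 0 || dist(head, req[i]) < dist(head, req[best])))
                best = i;
        done[best] = 1;
        total += dist(head, req[best]);
        head = req[best];
    }
    return total;
}

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* SCAN moving toward higher cylinders first; C-SCAN if circular != 0.
 * The arm runs to max_cyl only if requests remain below the head. */
int scan(const int *req, int n, int head, int max_cyl, int circular) {
    int s[64];                                 /* sketch: n <= 64 */
    for (int i = 0; i < n; i++) s[i] = req[i];
    qsort(s, n, sizeof s[0], cmp_int);
    int first_up = 0;                          /* first request >= head */
    while (first_up < n && s[first_up] < head) first_up++;
    int total = 0, pos = head;
    for (int i = first_up; i < n; i++) { total += dist(pos, s[i]); pos = s[i]; }
    if (first_up == 0) return total;           /* nothing below the head */
    total += dist(pos, max_cyl); pos = max_cyl;   /* sweep to the edge */
    if (circular) {                            /* jump back to cylinder 0 */
        total += max_cyl; pos = 0;
        for (int i = 0; i < first_up; i++) { total += dist(pos, s[i]); pos = s[i]; }
    } else {                                   /* reverse direction */
        for (int i = first_up - 1; i >= 0; i--) { total += dist(pos, s[i]); pos = s[i]; }
    }
    return total;
}
```

For the queue 98, 183, 37, 122, 14, 124, 65, 67 with the head at cylinder 53 on a 200-cylinder disk (cylinders 0 to 199), this gives 640 cylinders of movement for FCFS, 236 for SSTF, 331 for SCAN, and 382 for C-SCAN.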

Disk Scheduling (2)

•  In general, unless there are request queues, disk scheduling does not have much impact
   –  Important for servers, less so for PCs
•  Modern disks often do the disk scheduling themselves
   –  Disks know their layout better than the OS, so they can optimize better
   –  Ignores or undoes any scheduling done by the OS

File Systems

•  How do file systems fit in?
•  Implement an abstraction (files) for secondary storage
•  Organize files logically (directories)
•  Permit sharing of data between processes, people, and machines
•  Protect data from unwanted access (security)

Files

•  A file is data with some properties
   –  Contents, size, owner, last read/write time, protection, etc.
•  A file can also have a type
   –  Understood by the file system
      •  Block, character, device, portal, link, etc.
   –  Understood by other parts of the OS or runtime libraries
      •  Executable, dll, source, object, text, etc.
•  A file’s type can be encoded in its name or contents
   –  Windows encodes type in name
      •  .com, .exe, .bat, .dll, .jpg, etc.
   –  Unix encodes type in contents
      •  Magic numbers, initial characters (e.g., #! for shell scripts)

Basic File Operations

Unix                          Windows NT
•  creat(name)                •  CreateFile(name, CREATE)
•  open(name, how)            •  CreateFile(name, OPEN)
•  read(fd, buf, len)         •  ReadFile(handle, …)
•  write(fd, buf, len)        •  WriteFile(handle, …)
•  fsync(fd)                  •  FlushFileBuffers(handle)
•  lseek(fd, pos, whence)     •  SetFilePointer(handle, …)
•  close(fd)                  •  CloseHandle(handle)
•  unlink(name)               •  DeleteFile(name)
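The Unix column maps directly onto POSIX calls, strung together below in a hypothetical helper (`file_round_trip` is an illustrative name, not a standard function). A few details of the real calls: `creat` also takes a permission mode, `fsync(fd)` flushes a single file while `sync()` takes no arguments, and seeking is done with `lseek(fd, offset, whence)`.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a file, write msg, force it to disk, then
 * reopen it, seek to offset, and read the rest back into out (which must
 * hold at least 1 byte). The file is unlinked at the end. Returns the
 * number of bytes read, or -1 on error. */
ssize_t file_round_trip(const char *path, const char *msg, off_t offset,
                        char *out, size_t outlen) {
    int fd = creat(path, 0644);                 /* creat(name) */
    if (fd < 0) return -1;
    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len     /* write(fd, buf, len) */
        || fsync(fd) != 0) {                    /* push it to the disk */
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);                                  /* close(fd) */

    fd = open(path, O_RDONLY);                  /* open(name, how) */
    if (fd < 0) { unlink(path); return -1; }
    ssize_t n = -1;
    if (lseek(fd, offset, SEEK_SET) >= 0)       /* seek to pos */
        n = read(fd, out, outlen - 1);          /* read(fd, buf, len) */
    close(fd);
    if (n >= 0) out[n] = '\0';
    unlink(path);                               /* unlink(name) */
    return n;
}
```

Writing "hello, disk" and seeking to offset 7 reads back the 4 bytes "disk". The path used in such a test (for example, somewhere under /tmp) is an assumption about the environment.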

Directories

•  Directories serve two purposes
   –  For users, they provide a structured way to organize files
   –  For the file system, they provide a convenient naming interface that allows the implementation to separate logical file organization from physical file placement on the disk
      •  Why might this help?
•  Most file systems support multi-level directories
   –  Naming hierarchies (/, /usr, /usr/local/, …)
•  Most file systems support the notion of a current directory
   –  Relative names specified with respect to current directory
   –  Absolute names start from the root of directory tree

Directory Internals

•  A directory is a list of entries
   –  <name, location>
   –  Name is just the name of the file or directory
   –  Location depends upon how file is represented on disk
•  List is usually unordered (effectively random)
   –  Entries usually sorted by program that reads directory
•  Directories typically stored in files
   –  Only need to manage one kind of secondary storage unit

Path Name Translation

•  Let’s say you want to open “/one/two/three”
•  What does the file system do?
   –  Open directory “/” (well known, can always find)
   –  Search for the entry “one”, get location of “one” (in dir entry)
   –  Open directory “one”, search for “two”, get location of “two”
   –  Open directory “two”, search for “three”, get location of “three”
   –  Open file “three”
•  Systems spend a lot of time walking directory paths
   –  This is why open is separate from read/write
   –  OS will cache prefix lookups for performance
      •  /a/b, /a/bb, /a/bbb, etc., all share “/a” prefix
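The directory walk above can be sketched as a loop that looks up one path component per directory. The in-memory inode table, entry layout, and function names here are hypothetical stand-ins; a real file system reads each directory's data blocks from disk, or from a cache of earlier lookups.

```c
#include <string.h>

/* Toy model: each inode is either a directory holding <name, inode#>
 * entries or a regular file. All structures are hypothetical. */
#define MAX_ENTRIES 8
#define NAME_LEN 16

struct dirent_toy { char name[NAME_LEN]; int ino; };
struct inode_toy {
    int is_dir;
    int nentries;
    struct dirent_toy entries[MAX_ENTRIES];
};

/* Search one (unordered) directory for a name; inode number or -1. */
static int dir_lookup(const struct inode_toy *dir, const char *name) {
    for (int i = 0; i < dir->nentries; i++)
        if (strcmp(dir->entries[i].name, name) == 0)
            return dir->entries[i].ino;
    return -1;
}

/* Translate an absolute path one component at a time, starting from the
 * well-known root (inode 0 here). Returns the inode number or -1. */
int resolve_path(const struct inode_toy *itable, const char *path) {
    if (path[0] != '/') return -1;              /* absolute paths only */
    int ino = 0;
    char comp[NAME_LEN];
    const char *p = path;
    while (*p) {
        while (*p == '/') p++;                  /* skip separators */
        if (!*p) break;
        size_t len = strcspn(p, "/");           /* length of component */
        if (len >= NAME_LEN || !itable[ino].is_dir) return -1;
        memcpy(comp, p, len);
        comp[len] = '\0';
        ino = dir_lookup(&itable[ino], comp);   /* one directory search */
        if (ino < 0) return -1;
        p += len;
    }
    return ino;
}
```

Resolving “/one/two/three” performs three directory searches (in “/”, “one”, and “two”), which is the repeated work that caching prefix lookups avoids.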

Storing Files

•  Disk is partitioned into blocks or sectors
   –  Modern disks have 512-byte sectors
   –  File systems usually work in block sizes of 4KB
•  Files can span multiple blocks
   –  Some files span many blocks; others are small
•  Things to consider
   –  File access: is it random or sequential?
   –  File size: how often does it grow/shrink?
•  Sound familiar? (Memory management faced the same questions.)

Disk Layout Strategies

Contiguous allocation
•  Idea: Allocate space for a file as is done for contiguous memory organization
•  Pros: Fast file access
•  Cons: Fragmentation, needs compaction
   –  What happens when you need to grow?

[Figure: a file's disk blocks under contiguous allocation]

Linked allocation
•  Idea: Linked list of blocks, each pointing to the next
•  Pros: Easy to grow; fast sequential access
•  Cons: Slow non-sequential access; what happens if you have one bad block?

[Figure: a file's disk blocks under linked allocation]

Indexed allocation
•  Idea: Store ordered list of block pointers
•  Pros: Good for random access, not bad for sequential
•  Cons: Size limit, not as fast for sequential access

[Figure: a file's disk blocks under indexed allocation]
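The practical difference between these strategies is how much work it takes to find logical block i of a file. In this sketch, which uses a hypothetical in-memory model rather than real disk reads, contiguous allocation needs one addition, linked allocation must follow i pointers (one disk read each), and indexed allocation needs a single lookup in the index block.

```c
/* next_ptr[b] plays the role of the "next" pointer stored inside disk
 * block b (linked allocation); index[] is the file's ordered list of
 * block pointers (indexed allocation). Hypothetical model. */
#define END_OF_FILE (-1)

/* Contiguous: block i is just start + i. */
long contiguous_lookup(long start_block, long i) {
    return start_block + i;
}

/* Linked: follow i pointers; *reads counts the blocks read on the way. */
long linked_lookup(const long *next_ptr, long first_block, long i,
                   long *reads) {
    long b = first_block;
    *reads = 0;
    while (i-- > 0 && b != END_OF_FILE) {
        b = next_ptr[b];           /* each hop costs one disk read */
        (*reads)++;
    }
    return b;
}

/* Indexed: one lookup in the index block, whatever i is. */
long indexed_lookup(const long *index, long nblocks, long i) {
    return (i >= 0 && i < nblocks) ? index[i] : END_OF_FILE;
}
```

For a file stored in blocks 5, 9, 2, 7, finding logical block 3 costs three pointer hops under linked allocation but a single array lookup under indexed allocation.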

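Unix Inodes

•  Unix uses an indexed allocation structure
   –  An inode (index node) stores both metadata and the pointers to disk blocks
      •  Metadata is information about the file (protection, timestamp, length, ref count, etc.)
•  Each inode contains 15 block pointers
   –  First 12 are direct blocks (e.g., 4KB disk blocks)
   –  Then single, double, triple indirect blocks

[Figure: an inode with metadata and block pointers 0 through 14; pointers 0 to 11 point directly at disk data blocks, while pointers 12, 13, and 14 lead to the data blocks through single, double, and triple indirect blocks]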
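With this layout, the number of indirect blocks that must be read to reach a given logical block depends only on the block's index. The sketch below computes that level, assuming 4-byte block pointers so that an indirect block holds block_size / 4 pointers (1024 for 4KB blocks); the function name is illustrative.

```c
#include <stdint.h>

#define NDIRECT 12

/* Levels of indirection needed to reach logical block n of a file:
 * 0 = direct, 1 = single, 2 = double, 3 = triple indirect, or -1 if n is
 * beyond the largest file the inode can describe. Each level costs one
 * extra block read unless the indirect block is already cached. */
int inode_level(uint64_t n, uint64_t ptrs_per_block) {
    uint64_t span = NDIRECT;           /* blocks reachable at this level */
    if (n < span) return 0;
    n -= span;
    span = ptrs_per_block;             /* single indirect */
    for (int level = 1; level <= 3; level++) {
        if (n < span) return level;
        n -= span;
        span *= ptrs_per_block;        /* each level is P times larger */
    }
    return -1;
}
```

With 1024 pointers per indirect block, blocks 0 to 11 need no extra reads, blocks 12 to 1035 need one, and the double-indirect range starts at block 1036.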

Resolving File Location/Data

•  Inodes describe where on disk the blocks for a file are placed
   –  Unix inodes are not directories
   –  Directories are represented internally as files
•  What does this mean for how inodes are stored?
•  Directory entries map file names to inodes
   –  Want to access “/foo”

[Figure: inode 0 (directory “/”) points to the first data block of the directory file “/”, which holds the entries (foo, 18), (bar, 451), (baz, 123), …; inode 18 (file “/foo”) points to the first data block of “/foo”, which holds the data for foo]


•  To open “/foo”:
   –  Use the Master Block to find “/” on disk
   –  Open “/”, look for entry “foo”
   –  This entry contains the disk block number for the inode for “foo”
   –  Read the inode “foo” into memory
   –  The inode says where the first data block is on disk
   –  Read first data block into memory to access data in file “foo”

That was a lot of work to read one file!

Improving Performance

•  We understand how file systems are structured
   –  Inodes, data blocks, files, directories, etc.
•  Now we’ll focus on how they perform
   –  Where do we place data?
   –  Are there any tricks we can play to mask latencies?
•  Three case studies:
   –  Berkeley Fast File System (FFS)
   –  Log-Structured File System (LFS)
   –  Redundant Array of Inexpensive Disks (RAID)

Berkeley Fast File System (FFS)

•  The original Unix file system had a simple, straightforward implementation
   –  Easy to implement and understand
   –  But very poor utilization of disk bandwidth (lots of seeking)
•  BSD Unix folks did a redesign (mid 80s) that they called the Fast File System (FFS)
   –  Improved disk utilization, decreased response time
   –  McKusick, Joy, Leffler, and Fabry
•  Now the file system against which all other Unix file systems are compared
•  Good example of being device-aware for performance

Data and Inode Placement Problem

•  Original Unix FS had two placement problems:
•  1) Data blocks allocated randomly in aging file systems
   –  Blocks for the same file allocated sequentially when FS is new
   –  As FS “ages” and fills, need to allocate into blocks freed up when other files are deleted
   –  Problem: Deleted files essentially randomly placed
   –  So, blocks for new files become scattered across the disk
•  2) Inodes allocated far from blocks
   –  All inodes at beginning of disk, far from data
   –  Traversing file name paths, manipulating files, directories requires going back and forth from inodes to data blocks
•  Both of these problems generate many long seeks

[Figure: disk writes and metadata laid out on the disk]

Over time, block placement gets scattered (“swiss cheese” effect).

•  Remember the “/foo” example? Each step goes back and forth between inodes at the beginning of the disk and the data blocks they point to.

[Figure: on disk, inode 0 for directory “/” points to the first data block of “/”, which holds (foo, 18), (bar, 451), (baz, 123), …; inode 18 for file “/foo” points to the first data block of “/foo”]


Cylinder Groups

•  BSD FFS addressed both of these problems using the notion of a cylinder group
   –  Disk partitioned into groups of cylinders
   –  Data blocks in same file allocated in same cylinder group
   –  Files in same directory allocated in same cylinder group
   –  Inodes for files allocated in same cylinder group as file data blocks

[Figure: each cylinder group holds its own metadata, inodes, data blocks, etc.]

Reduces number of seeks!


•  Free space requirement
   –  To be able to allocate according to cylinder groups, the disk must have free space scattered across cylinders
   –  10% of the disk is reserved just for this purpose
      •  Only usable by root; this is why “df” can report more than 100% usage

Problems with Small Blocks

•  Small blocks (1K) caused two problems:
   –  Low bandwidth utilization
   –  Small max file size (function of block size)

Maximum File Size: 1KB Blocks

•  Recall Unix inodes have:
   –  12 direct blocks
   –  1 single indirect block, 1 double indirect block, 1 triple indirect block
•  How large can a file be with 1KB blocks?
•  Single indirect block:
   –  Assuming 32-bit addresses, we have 4 bytes per block pointer, so 1KB / 4 = 256 blocks
   –  So… 256 * 1KB = 256KB
•  Double indirect block:
   –  256 * 256 * 1KB = 64MB
•  Triple indirect block:
   –  256 * 256 * 256 * 1KB = 16GB
•  Total: ~16GB

Problems with Small Blocks

•  Fix using larger blocks (4K)
   –  Very large files; only need two levels of indirection to support files of size 2^32 bytes

Maximum File Size: 4KB Blocks

•  Recall Unix inodes have:
   –  12 direct blocks
   –  1 single indirect block, 1 double indirect block, 1 triple indirect block
•  How large can a file be with 4KB blocks?
•  Single indirect block:
   –  Assuming 32-bit addresses, we have 4 bytes per block pointer, so 4KB / 4 = 1024 blocks
   –  So… 1024 * 4KB = 4MB
•  Double indirect block:
   –  1024 * 1024 * 4KB = 4GB
•  Triple indirect block:
   –  1024 * 1024 * 1024 * 4KB = 4TB
•  Total: ~4TB
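The arithmetic on the two maximum-file-size slides can be checked with a small function. It assumes the inode layout above (12 direct pointers plus one single, one double, and one triple indirect pointer) and 4-byte block pointers, and it includes the direct blocks that the rough totals leave out; other limits, such as the width of the inode's size field, are ignored.

```c
#include <stdint.h>

/* Largest file (in bytes) that 12 direct pointers plus single, double,
 * and triple indirect pointers can describe, with 4-byte pointers. */
uint64_t max_file_bytes(uint64_t block_size) {
    uint64_t p = block_size / 4;               /* pointers per indirect block */
    uint64_t blocks = 12 + p + p * p + p * p * p;
    return blocks * block_size;
}
```

This gives 17,247,252,480 bytes (about 16GB) for 1KB blocks and 4,402,345,721,856 bytes (about 4TB) for 4KB blocks. Quadrupling the block size grows the limit by roughly 256 times, because each block is 4 times larger and each of the three indirect levels holds 4 times as many pointers.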

Problems with Small Blocks

•  Why not just use all indirect blocks?
   –  Over 65% of files are smaller than 4KB (Tanenbaum, OSR 2006)
   –  What’s the problem with that?
•  Problem: internal fragmentation
   –  Fix: Introduce “fragments” (1K pieces of a block can be used for other, small files)

Other Problems

•  Problem: Media failures
   –  If you lose the superblock, you lose everything
      •  Or at least recovery is expensive
   –  Solution: Replicate master block (superblock)
•  Problem: reduced seeks, but even one is expensive
   –  What if we can avoid going to disk at all?
•  Next: other File System tricks

File Buffer Cache

•  Applications exhibit significant locality for reading and writing files
•  Idea: Cache file blocks in memory to capture locality
   –  This is called the file buffer cache
   –  Cache is system wide, used and shared by all processes
   –  Reading from the cache makes a disk perform like memory
   –  Even a 4MB cache can be very effective
•  Issues
   –  The file buffer cache competes with VM (tradeoff here)
   –  Like VM, it has limited size
   –  Need replacement algorithms again (usually LRU used)
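A file buffer cache with LRU replacement can be sketched in a few lines. This toy version tracks only which blocks are cached (no block contents), has a hypothetical four-slot capacity, and uses a logical clock with a linear search instead of the hash table and list a real kernel would use.

```c
#include <stdint.h>

#define CACHE_SLOTS 4

struct buf_cache {
    long block[CACHE_SLOTS];       /* cached disk block number, -1 = empty */
    uint64_t last_used[CACHE_SLOTS];
    uint64_t clock;                /* logical time of the latest access */
    long hits, misses;
};

void cache_init(struct buf_cache *c) {
    for (int i = 0; i < CACHE_SLOTS; i++) { c->block[i] = -1; c->last_used[i] = 0; }
    c->clock = 0; c->hits = 0; c->misses = 0;
}

/* Access block blk. Returns 1 on a hit, 0 on a miss (a disk read). */
int cache_access(struct buf_cache *c, long blk) {
    int victim = 0;
    c->clock++;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->block[i] == blk) {               /* hit: refresh recency */
            c->last_used[i] = c->clock;
            c->hits++;
            return 1;
        }
        if (c->last_used[i] < c->last_used[victim]) victim = i;
    }
    /* Miss: replace the least recently used slot. Empty slots have
     * last_used == 0, so they are filled before anything is evicted. */
    c->block[victim] = blk;
    c->last_used[victim] = c->clock;
    c->misses++;
    return 0;
}
```

For the reference string 1, 2, 3, 4, 1, 5, 2, 1, the second access to block 1 hits, block 5 evicts block 2 (the least recently used), the next access to block 2 evicts block 3, and the final access to block 1 hits again: 2 hits and 6 misses.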

Caching Writes

•  Applications assume writes make it to disk
   –  As a result, writes are often slow even with caching
•  Several ways to compensate for this
   –  “write-behind”
      •  Maintain a queue of uncommitted blocks
      •  Periodically flush the queue to disk
      •  Unreliable
   –  Non-volatile RAM (NVRAM)
      •  As with write-behind, but maintain queue in NVRAM
      •  Expensive
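Write-behind can be sketched as an in-memory queue that a periodic flush drains. In this hypothetical version, a second write to a block that is already queued is coalesced, and the flush copies block numbers into an array that stands in for the disk. Anything still in the queue at a crash is lost, which is the unreliability noted above.

```c
#include <stddef.h>

#define WB_MAX 16

struct wb_queue {
    long blocks[WB_MAX];   /* uncommitted (dirty) blocks, oldest first */
    size_t n;
};

/* Queue a dirty block and return immediately; a repeated write to a
 * queued block is coalesced. Returns 0 if queued, -1 if the queue is
 * full and must be flushed first. */
int wb_write(struct wb_queue *q, long blk) {
    for (size_t i = 0; i < q->n; i++)
        if (q->blocks[i] == blk) return 0;
    if (q->n == WB_MAX) return -1;
    q->blocks[q->n++] = blk;
    return 0;
}

/* Periodic flush: "write" each queued block (disk_log stands in for the
 * disk), then empty the queue. Returns how many blocks were written. */
size_t wb_flush(struct wb_queue *q, long *disk_log) {
    size_t written = q->n;
    for (size_t i = 0; i < q->n; i++)
        disk_log[i] = q->blocks[i];
    q->n = 0;
    return written;
}
```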

Read Ahead (Prefetching)

•  Many file systems implement “read ahead”
   –  FS predicts that the process will request next block
   –  FS goes ahead and requests it from the disk…
   –  …while the process is computing on previous block!
   –  When the process requests block, it will be in cache
   –  Complements the disk cache, which also is doing read ahead
•  For sequentially accessed files, can make a big difference
   –  Unless blocks for the file are scattered across the disk
   –  File systems try to prevent that, though (during allocation)
•  Unfortunately, this doesn’t do anything for writes
   –  What if we could make write-behind sequential as well?
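The simplest read-ahead heuristic treats two consecutive block numbers as a sign of sequential access. The sketch below, with hypothetical per-file state, returns the block the file system would prefetch next, or -1 when the access pattern does not look sequential.

```c
/* Hypothetical per-file read-ahead state: the last block read. */
struct ra_state { long last_block; };

/* Called on each read of block b. If b directly follows the previous
 * read, assume sequential access and return b + 1 as the block to
 * prefetch asynchronously; otherwise return -1 (no prefetch). */
long readahead_hint(struct ra_state *s, long b) {
    long hint = (b == s->last_block + 1) ? b + 1 : -1;
    s->last_block = b;
    return hint;
}
```

Starting from `{ -1 }`, reads of blocks 0, 1, 5, 6 produce hints 1, 2, -1, 7: the jump to block 5 breaks the sequential run, and the next consecutive read restarts prefetching.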

Log-structured File System

•  The Log-structured File System (LFS) was designed in response to two trends in workload and technology:
•  1) Disk bandwidth scaling significantly (40% a year)
   –  Latency is not
•  2) Large main memories in machines
   –  Large buffer caches
   –  Absorb large fraction of read requests
   –  Can use for writes as well
   –  Coalesce small writes into large writes
•  LFS takes advantage of both of these to increase FS performance
   –  Rosenblum and Ousterhout (Berkeley, ’91)

LFS: Approach

•  Optimize for disk writes
   –  Batch writes in disk cache
•  Utilize increase in disk throughput
   –  Treat the disk as one big log for writes
      •  No need to worry about special seeks or placement
   –  All data in file system appended to log
      •  Data blocks, metadata, inodes, etc.

LFS: Example

[Figure: File 1 and File 2 in the log on disk; each operation (write a file, modify file, write to file, …) appends new blocks and an inode to the end of the log]

LFS Challenges

•  How do you locate data?
   –  FFS places files in a particular location
   –  LFS appends data to the end of the log
•  How do you free data?
   –  At some point, you can’t “append” anymore
   –  How do you track and recover stale blocks in the log?

LFS: Locating Data

•  FFS uses inodes to locate data blocks
   –  Inodes pre-allocated in each cylinder group
   –  Directories contain locations of inodes
•  LFS appends inodes and data (basically everything) to end of the log
   –  Makes them hard to find
•  Approach
   –  Use another level of indirection: Inode maps
   –  Inode maps map file #s to inode location
   –  Location of inode map blocks kept in checkpoint region
   –  Checkpoint region has a fixed location
   –  Cache inode maps in memory for performance
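Here is a toy model of the inode map indirection. Every write appends a new data block and then a new copy of the inode to the log, and the in-memory inode map records where the newest inode lives, so a read goes inode map, then inode, then data. The model is hypothetical and heavily simplified: one data block per file, no segments, and no checkpoint region.

```c
#define LOG_MAX 64
#define MAX_INODES 8

enum rec_kind { DATA, INODE };

struct record {
    enum rec_kind kind;
    int ino;       /* owning file */
    int value;     /* DATA: file contents; INODE: log address of data */
};

struct lfs {
    struct record log[LOG_MAX];  /* the append-only "disk" */
    int tail;                    /* next free log address */
    int imap[MAX_INODES];        /* inode # -> log address of its inode */
};

void lfs_init(struct lfs *fs) {
    fs->tail = 0;
    for (int i = 0; i < MAX_INODES; i++) fs->imap[i] = -1;
}

static int lfs_append(struct lfs *fs, struct record r) {
    if (fs->tail == LOG_MAX) return -1;        /* log full: time to clean */
    fs->log[fs->tail] = r;
    return fs->tail++;
}

/* Overwrite file ino: never update in place, always append. */
int lfs_write(struct lfs *fs, int ino, int contents) {
    int data_addr = lfs_append(fs, (struct record){DATA, ino, contents});
    if (data_addr < 0) return -1;
    int inode_addr = lfs_append(fs, (struct record){INODE, ino, data_addr});
    if (inode_addr < 0) return -1;
    fs->imap[ino] = inode_addr;   /* older copies become garbage */
    return 0;
}

/* Read: inode map -> inode -> data. Returns -1 if the file is unknown. */
int lfs_read(const struct lfs *fs, int ino, int *contents) {
    int inode_addr = fs->imap[ino];
    if (inode_addr < 0) return -1;
    *contents = fs->log[fs->log[inode_addr].value].value;
    return 0;
}
```

Writing file 1, then file 2, then file 1 again fills log addresses 0 through 5; the inode map then points file 1 at address 5, and the stale copies at addresses 0 and 1 are exactly what cleaning must later reclaim.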

LFS: Example (inode maps)

[Figure: File 1 and File 2 in the log on disk, with inodes appended after each write or modification; the inode map locates the inodes and the checkpoint region locates the inode map]

Aren’t reads still slow? Rely on the buffer cache to store inode maps. A large buffer cache means we don’t need to worry about reads!

LFS: Free Space Management

•  LFS append-only quickly runs out of disk space
   –  Need to recover deleted blocks
•  Approach:
   –  Fragment log into segments
   –  Thread segments on disk
      •  Segments can be anywhere
   –  Reclaim space by cleaning segments
      •  Read segment
      •  Copy live data to end of log
      •  Now have free segment you can reuse
•  Cleaning is a big problem
   –  Costly overhead
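Cleaning one segment can be sketched as follows. A block is live only if its file still points at it; live blocks are copied to the tail of the log (with the pointer updated), and then the whole segment is free. This is a hypothetical model with one block per file and fixed-size segments, and it assumes the caller keeps the log tail in a different segment that still has room.

```c
#define SEG_SIZE 4
#define NSEGS 4
#define NFILES 8
#define FREE_SLOT (-1)

struct lfs_disk {
    int owner[NSEGS * SEG_SIZE];   /* file (ino) stored in each slot */
    int current_addr[NFILES];      /* live location of each file's block */
    int tail;                      /* next slot to append to */
};

static int is_live(const struct lfs_disk *d, int addr) {
    int ino = d->owner[addr];
    return ino != FREE_SLOT && d->current_addr[ino] == addr;
}

/* Read segment seg, copy its live blocks to the end of the log, and free
 * every slot in it. Returns the number of live blocks copied. */
int clean_segment(struct lfs_disk *d, int seg) {
    int copied = 0;
    for (int addr = seg * SEG_SIZE; addr < (seg + 1) * SEG_SIZE; addr++) {
        if (is_live(d, addr)) {
            int ino = d->owner[addr];
            d->owner[d->tail] = ino;           /* append live copy */
            d->current_addr[ino] = d->tail++;  /* repoint the file */
            copied++;
        }
        d->owner[addr] = FREE_SLOT;            /* dead or moved: reclaim */
    }
    return copied;
}
```

The cost the slide warns about shows up directly: every live block in a segment has to be read and rewritten before the segment can be reused, so cleaning mostly-live segments is expensive.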

LFS Example (cleaning)

[Figure: segments 1, 2, and 3 on disk, with dead regions marked]


LFS: Now

•  Revolutionary (at the time) design concept that spurred a lot of debate and research in the area in the 90s
•  Present-day file systems use soft updates or journaling, which seem to be due in large part to the concepts from LFS

Summary

•  We’ve explained how file systems can be structured
   –  Many techniques are similar to those in memory management
   –  Unix-style: Inodes, data blocks, files, directories, etc.
•  Performance of file systems highly dependent on disk technology
   –  Seeks take a long time
   –  Placement of data matters (swiss-cheese problem and seek avoidance)
•  Berkeley Fast File System (FFS)
   –  Cylinder groups (keep files that are likely to be accessed together close to each other)
   –  Larger block sizes to increase throughput
•  Log-Structured File System (LFS)
   –  Optimize for writes (batch writes)
   –  Rely on cache for reads (data placement practically ignored)
•  Assorted other tricks
   –  Pre-fetching (avoid extra fetches and put in buffer cache)
   –  Delayed writes (like LFS; used in modern journaling file systems)

Next Time

•  Read Chapter 11.9, 12.7, 15
•  Check website for course announcements
   –  http://www.cs.ucsd.edu/classes/su09/cse120

RAID

•  Problem:
   –  Disk drives fail frequently
   –  Disks are SLOW (seek times & transfer rates)
•  Idea: Use many disks in parallel to increase storage bandwidth, improve reliability
   –  Files are striped across disks
   –  Each stripe portion is read/written in parallel
   –  Bandwidth increases with more disks
•  Redundant Array of Inexpensive Disks (RAID)
   –  A storage system, not a file system
   –  Patterson, Katz, and Gibson (Berkeley, ’88)

RAID Levels

•  In marketing literature, you will see RAID systems advertised as supporting different “RAID Levels”
•  Here are some common levels:
   –  RAID 0: Striping
      •  Good for random access (no reliability)
   –  RAID 1: Mirroring
      •  Two disks, write data to both (expensive, 1X storage overhead)
   –  RAID 5: Floating Parity
      •  Parity blocks for different stripes written to different disks
      •  No single parity disk, hence no bottleneck at that disk
   –  RAID “10”: Striping plus mirroring
      •  Higher bandwidth, but still have large overhead
      •  See this on UltraDMA PC RAID disk cards
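Striping and floating parity are both address arithmetic. The sketch below maps a logical block to a disk and stripe for RAID 0, and for RAID 5, where each stripe holds n-1 data blocks plus one parity block whose disk rotates from stripe to stripe. The particular rotation (parity on disk `stripe % n`, data on the disks after it) is one illustrative choice; real arrays use several different layouts.

```c
/* Location of a block in an array of n disks. */
struct raid_loc { int disk; long stripe; };

/* RAID 0: deal blocks out round-robin across the disks. */
struct raid_loc raid0_map(long b, int n) {
    struct raid_loc l = { (int)(b % n), b / n };
    return l;
}

/* RAID 5: the parity disk rotates with the stripe number. */
int raid5_parity_disk(long stripe, int n) {
    return (int)(stripe % n);
}

struct raid_loc raid5_map(long b, int n) {
    long stripe = b / (n - 1);                 /* n-1 data blocks per stripe */
    int k = (int)(b % (n - 1));                /* position within the stripe */
    struct raid_loc l = { (raid5_parity_disk(stripe, n) + 1 + k) % n, stripe };
    return l;
}
```

With 4 disks, the parity for stripes 0, 1, 2, 3 sits on disks 0, 1, 2, 3 in turn, so parity updates are spread across all disks instead of piling up on one.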

RAID Challenges

•  Small files (small writes less than a full stripe)
   –  Need to read entire stripe, update with small write, then write entire segment out to disks
•  Reliability
   –  More disks increase the chance of a media failure (the array as a whole has a lower MTBF than a single disk)
•  Turn reliability problem into a feature
   –  Use one disk to store parity data
      •  XOR of all data blocks in stripe
   –  Can recover any data block from all others + parity block
   –  Hence “redundant” in name
   –  Introduces overhead, but assuming disks are “inexpensive”
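The parity trick works because XOR is its own inverse: XOR-ing the parity block with every surviving data block leaves exactly the missing one. A minimal sketch, with blocks modeled as byte arrays and illustrative function names:

```c
#include <stddef.h>
#include <stdint.h>

/* Parity block for one stripe: bytewise XOR of all ndata data blocks,
 * each len bytes long. */
void compute_parity(const uint8_t *data[], size_t ndata, size_t len,
                    uint8_t *parity) {
    for (size_t i = 0; i < len; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < ndata; d++) p ^= data[d][i];
        parity[i] = p;
    }
}

/* Rebuild data block `lost` from the surviving blocks plus parity.
 * data[lost] is never read, so it may point at garbage. */
void rebuild_block(const uint8_t *data[], size_t ndata, size_t lost,
                   const uint8_t *parity, size_t len, uint8_t *out) {
    for (size_t i = 0; i < len; i++) {
        uint8_t v = parity[i];
        for (size_t d = 0; d < ndata; d++)
            if (d != lost) v ^= data[d][i];    /* x ^ x == 0 cancels survivors */
        out[i] = v;
    }
}
```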