
Upload: homer-porter

Post on 26-Dec-2015


CHAPTER 12: FILE SYSTEM IMPLEMENTATION

  File System Structure
  File System Implementation
  Directory Implementation
  Allocation Methods
  Free-Space Management
  Efficiency and Performance
  Recovery
  Log-Structured File Systems
  NFS

FILE-SYSTEM STRUCTURE
  Disks provide the bulk of secondary storage on which a file system is maintained
    Can be written in place
    Sequential access and direct access
    I/O in blocks, not in bytes
  A file system allows the data on disks to be stored, located, and retrieved easily
    To define how the FS should look to the user
    To create algorithms and data structures to map the logical FS onto the physical secondary-storage devices.

File-System Structure: Layered file system

File-System Structure: Layered file system
  I/O control
    Consists of device drivers and interrupt handlers to transfer information between the main memory and the disk system.
    Input: retrieve drive 1, cylinder 73, track 2, sector 10
    Output: low-level, hardware-specific instructions that are used by the hardware controller
  Basic file system
    Issues generic commands to the appropriate device driver to read and write physical blocks on the disk
    Input: retrieve block 123
    Output: retrieve drive 1, cylinder 73, track 2, sector 10
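
The translation from a block number to a (drive, cylinder, track, sector) address can be sketched as below. This is only an illustration: the geometry constants and the name `block_to_address` are assumptions, not taken from the slides, so block 123 does not map to the slide's "cylinder 73, track 2, sector 10" under this made-up geometry.

```python
from typing import NamedTuple


class DiskAddress(NamedTuple):
    drive: int
    cylinder: int
    track: int
    sector: int


# Hypothetical disk geometry (assumed for illustration only).
TRACKS_PER_CYLINDER = 4
SECTORS_PER_TRACK = 16


def block_to_address(block: int, drive: int = 1) -> DiskAddress:
    """Map a linear block number to a cylinder/track/sector address."""
    sectors_per_cylinder = TRACKS_PER_CYLINDER * SECTORS_PER_TRACK
    cylinder, rest = divmod(block, sectors_per_cylinder)
    track, sector = divmod(rest, SECTORS_PER_TRACK)
    return DiskAddress(drive, cylinder, track, sector)


print(block_to_address(123))
```

With 64 sectors per cylinder, block 123 falls in cylinder 1, and the remaining 59 sectors land on track 3, sector 11.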

File-System Structure: Layered file system
  The file-organization module
    Knows about the files and their logical blocks, as well as physical blocks.
    To translate a file’s logical block addresses to its physical block addresses.
      Each file’s logical blocks are numbered from 0 (or 1) through N.
      Each file’s physical block addresses are different and are unique within a partition.
    Free-space manager:
      Tracks unallocated blocks
      And provides these blocks when requested.

File-System Structure: Layered file system
  The logical file system
    Manages metadata information (metadata vs. actual data)
    To manage the directory structure
    To manage the file structures via FCBs (file control blocks)
      An FCB contains information such as ownership, permissions, and location of the file contents
    Responsible for protection and security.

File-System Structure: Example FSes
  Windows:
    FAT (File Allocation Table) (12, 16, 32)
    NTFS (Windows NT File System)
  UNIX:
    UFS (Unix File System)
    ext2
    ext3
    more /proc/filesystems
    cd /usr/src/linux-2.4/fs # to explore for more.

FILE-SYSTEM IMPLEMENTATION

  Overview: Data structures for implementing a FS
    On-disk structures
    In-memory structures
  Partitions and mounting
  VFS

File-System Implementation: Overview
  On-disk structures
    Boot control block (boot block, partition boot sector)
      Used to boot an OS from that partition.
    Partition control block (super block, Master File Table)
      Number of blocks, block size, free-block count, free-block pointers, free FCB count, and FCB pointers
    Directory structure, used to organize the files
      Linear list / hash tables
    FCB (inode, vnode, Master File Table record)
      File permissions, ownership, size, location of the data blocks

File-System Implementation: Overview
  In-memory structures, used for both file-system management and performance improvement via caching
    In-memory partition table containing information about each mounted partition.
    In-memory directory structure containing the directory information of recently accessed directories.
    System-wide open-file table containing a copy of the FCB of each open file, as well as other information.
    Per-process open-file tables containing a pointer to the appropriate entry in the system-wide open-file table, as well as other information.

File-System Implementation: Overview
  To create a new file
    An application program calls the logical file system
    The logical file system
      To allocate a new FCB (see the next slide for FCB)
      To read in the appropriate directory
        • UNIX treats a directory exactly as a file.
        • Windows NT treats a directory as a record inside the MFT.
      To add a new entry
      To fill in the new entry with the filename and the new FCB
      To write it back to the disk

File-System Implementation: Overview: FCB

File-System Implementation: Overview
  To open a file
    To pass a file name to the logical file system
    To search the directory for the given file name
    To read in the file’s FCB
    To put the file’s FCB into the system-wide open-file table
    To add a new entry in the per-process open-file table, with a pointer to the entry in the system-wide open-file table and some other fields
    To return a pointer to the appropriate entry in the per-process open-file table (a file descriptor or file handle)

File-System Implementation: Overview
  To close a file
    To remove the entry in the per-process open-file table
    To decrement the system-wide open-file entry’s open count.
    If the open count is 0, copy the updated file information to the disk-based directory structure and delete this entry.
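
The open and close steps can be sketched with two tables, as below. This is a minimal model under stated assumptions: the class and field names (`OpenFileTables`, `SystemEntry`, `directory`) are hypothetical, a plain dictionary stands in for the on-disk directory, and an FCB is just a dictionary.

```python
from dataclasses import dataclass


@dataclass
class SystemEntry:
    fcb: dict            # in-memory copy of the file's FCB
    open_count: int = 0  # number of opens currently referring to this entry


class OpenFileTables:
    def __init__(self, directory):
        self.directory = directory  # file name -> FCB (stands in for the disk)
        self.system_wide = {}       # file name -> SystemEntry
        self.per_process = {}       # pid -> {file descriptor: file name}

    def open(self, pid, name):
        entry = self.system_wide.get(name)
        if entry is None:
            # First open: search the directory and read in the FCB.
            entry = SystemEntry(fcb=dict(self.directory[name]))
            self.system_wide[name] = entry
        entry.open_count += 1
        table = self.per_process.setdefault(pid, {})
        fd = 0
        while fd in table:          # lowest unused descriptor for this process
            fd += 1
        table[fd] = name
        return fd

    def close(self, pid, fd):
        name = self.per_process[pid].pop(fd)
        entry = self.system_wide[name]
        entry.open_count -= 1
        if entry.open_count == 0:
            # Last close: write the updated FCB back and drop the entry.
            self.directory[name] = entry.fcb
            del self.system_wide[name]
```

Two processes opening the same file share one system-wide entry; the entry is written back and removed only when the last of them closes the file.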

File-System Implementation: Overview

File-System Implementation: Overview
  To use the file system as an interface to other system aspects, such as networking.
  To use caches to speed up file operations
    BSD UNIX
  To use the file system for other purposes, such as a networking interface.

File-System Implementation: partitions and mounting

  Partitions vs. disks
    A disk can be sliced into multiple partitions
    A partition can span multiple disks
  Partitions can be either “raw” or “cooked”
    Raw partition:
      No file system
      Swap space, database
    Cooked partition:
      Has a file system
      Boot block can be used for selective booting.
      Super block

File-System Implementation: partitions and mounting

  To mount a partition before using its FS
    Manual mounting vs. automatic mounting
  How to mount a partition
    To read in the super block via its device driver
    To verify its consistency
    To repair it if necessary (fsck)
    To add an entry in the in-memory mount table structure.
  How to mount a partition for Windows
    To mount at boot
    To mount manually

File-System Implementation: partitions and mounting

  How to mount a partition for UNIX
    To mount a partition at a directory
    To add an entry in the mount table
    To let one field of the mount-table entry point to the super block of the FS on that device
    To set a flag in the in-memory copy of the inode for that directory, indicating that this directory is a mount point
    To set a field in the in-memory copy of the inode for that directory to point to an entry in the mount table, indicating which device is mounted there.

File-System Implementation: VFS
  How to support multiple FSes?
    How to integrate many FSes into a directory structure?
    How to seamlessly move among various FSes?
  To write directory and file routines for each type.
  To use VFS (VFS uses object-oriented techniques to simplify, organize, and modularize the implementation)
    Contributed by Sun Microsystems.

File-System Implementation: VFS
  Top layer: file-system interface
    Open, read, write, and close calls, and file descriptors
  The middle layer: VFS
    To separate FS-generic operations from their implementation by defining a clean VFS interface
    The VFS is based on a file representation, called a vnode, that contains a numerical designator for a network-wide unique file. (UNIX inodes are unique within only a single file system.)
  The bottom layer
    Various FS implementations
      Ext3
      NFS
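
The three layers can be illustrated with a small object-oriented sketch. Everything here is an assumption made for illustration: `FileSystem`, `LocalFS`, `RemoteFS`, and `VFS` are hypothetical names, only a `read` operation is modeled, and the "remote" class just looks up a dictionary where a real NFS client would contact a server.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """The generic operations every concrete file system must implement."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class LocalFS(FileSystem):
    """Stands in for a local file system such as ext3."""

    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]


class RemoteFS(FileSystem):
    """Stands in for a network file system such as NFS."""

    def __init__(self, server_files):
        self.server_files = server_files

    def read(self, path):
        return self.server_files[path]  # a real client would ask the server


class VFS:
    """Middle layer: picks the file system by mount point and delegates."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mount_point, fs):
        self.mounts[mount_point.rstrip("/")] = fs  # "/" is stored as ""

    def read(self, path):
        matches = [m for m in self.mounts
                   if path == m or path.startswith(m + "/")]
        if not matches:
            raise FileNotFoundError(path)
        best = max(matches, key=len)               # longest mount point wins
        return self.mounts[best].read(path[len(best):] or "/")
```

The caller only ever talks to `VFS.read`; which implementation serves the request depends solely on where the path is mounted.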

File-System Implementation: VFS

DIRECTORY IMPLEMENTATION
  Linear list
    To use a linear list of file names with pointers to the data blocks.
    To find a file
      Requires a linear search.
    To create a file
      To search the directory to make sure no existing file has the same file name.
      To add a new entry at the end of the directory.
    To delete a file
      To search the directory for the file.
      To remove the entry.
      To free the space allocated to this file.

Directory Implementation
  Linear list
    To reuse a directory entry (alternatives)
      To mark the entry as unused.
      To attach it to a list of free directory entries.
      To copy the last entry into the freed slot and decrease the length of the directory.
    Discussion:
      Simple to program, time-consuming to execute
      The searching is expensive
        Binary search
          • To keep a sorted list
          • To use a B-tree
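
A linear-list directory can be sketched as follows. The class name `LinearDirectory` and the `(name, first_block)` entry layout are assumptions for illustration; deletion uses the "copy the last entry into the freed slot" alternative, which is only one of the reuse strategies listed above.

```python
class LinearDirectory:
    def __init__(self):
        self.entries = []  # list of (file name, first data block)

    def find(self, name):
        """Linear search; returns the entry index or -1."""
        for i, (entry_name, _) in enumerate(self.entries):
            if entry_name == name:
                return i
        return -1

    def create(self, name, first_block):
        if self.find(name) != -1:        # no duplicate names allowed
            raise FileExistsError(name)
        self.entries.append((name, first_block))

    def delete(self, name):
        i = self.find(name)
        if i == -1:
            raise FileNotFoundError(name)
        self.entries[i] = self.entries[-1]  # move last entry into the hole
        self.entries.pop()                  # directory shrinks by one entry
```

Both `create` and `delete` pay for a full linear search, which is the cost the discussion above refers to.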

Directory Implementation
  Hash table
    To use a linear list to store the directory entries and a hash table to quickly find the directory entry for a given file name.
    No collisions allowed (each hash entry has a single value)
      Use a hash function to map a file name to a hash value.
      The hash function should be changed dynamically.
      Fastest.
    Collisions allowed (each hash entry has a list of multiple values)
      Use a hash function to map a file name to a hash value, use this value to index the hash table, and then search the list to find the directory entry.
      Faster.
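
The "collisions allowed" variant can be sketched with chaining. This is a simplified illustration: the name `HashedDirectory` is hypothetical, the entries are stored directly in the chains rather than in a separate linear list, and the toy hash (sum of the name's bytes) is chosen only so that collisions are easy to produce.

```python
class HashedDirectory:
    """Directory backed by a chained hash table."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, name):
        # Toy hash function: "ab" and "ba" deliberately collide.
        return self.buckets[sum(name.encode()) % len(self.buckets)]

    def add(self, name, fcb):
        bucket = self._bucket(name)
        if any(entry_name == name for entry_name, _ in bucket):
            raise FileExistsError(name)
        bucket.append((name, fcb))

    def lookup(self, name):
        # Only the short chain for this hash value is searched.
        for entry_name, fcb in self._bucket(name):
            if entry_name == name:
                return fcb
        raise FileNotFoundError(name)
```

A lookup touches one bucket instead of the whole directory, at the price of a short list search when names collide.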

ALLOCATION METHODS
  An allocation method refers to how disk blocks are allocated for files:
    Contiguous allocation
    Linked allocation
    Indexed allocation

Allocation Methods: Contiguous Allocation
  The contiguous-allocation method requires each file to occupy a set of contiguous blocks on the disk
  The directory entry for a file contains only its starting location (block #) and length (number of blocks) (see the next slide)
  Discussion:
    Supports both sequential access and direct access.
    Simple to implement.
    External fragmentation (the dynamic storage-allocation problem).
    How to specify an initial size for a file
      Underestimating its size
      Overestimating its size
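
Two points from this slide can be made concrete: direct access is a single addition, and external fragmentation can block an allocation even when enough blocks are free in total. The sketch below is illustrative only; `physical_block` and `first_fit` are hypothetical helper names, and first fit is just one possible placement strategy.

```python
def physical_block(start, length, logical):
    """Direct access under contiguous allocation: start + logical block."""
    if not 0 <= logical < length:
        raise IndexError(logical)
    return start + logical


def first_fit(free, length):
    """Return the start of the first run of `length` free blocks, or None."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == length:
            return i - length + 1
    return None


free_map = [True, False, True, True, False, True]
print(first_fit(free_map, 2))  # a hole of two blocks exists
print(first_fit(free_map, 3))  # four blocks are free, but no hole of three
```

The second call fails although four blocks are free, which is exactly the external-fragmentation problem.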

Allocation Methods: Contiguous Allocation

Allocation Methods: Contiguous Allocation
  Extent-based file system
    Extent-based file systems allocate disk blocks in extents.
    An extent is a contiguous chunk of disk blocks. Extents are allocated for file allocation. A file consists of one or more extents.
    Many newer file systems (e.g., Veritas File System) use a modified contiguous-allocation scheme.
  Contiguous allocation can be combined with other allocation methods.
    Contiguous allocation for small files
    Other allocation methods for large files.

Allocation Methods: Linked Allocation
  With linked allocation,
    Each file is a linked list of disk blocks;
    The disk blocks may be scattered anywhere on the disk.
    The directory contains a pointer to the first and last blocks of the file. (See the next slide)
  Disk block
    block = pointer + data
  An example linked file (see the next slide)

Allocation Methods: Linked Allocation

Allocation Methods: Linked Allocation
  Simple: need only the starting address
  Free-space management system: no waste of space
  No random access
  Pointers waste space: to use clusters rather than sectors
    To improve usage
    To speed up
  Poor reliability
    A single damaged pointer breaks the rest of the chain.

Allocation Methods: Linked Allocation
  FAT FS
    FAT (File Allocation Table), duplicated
    Supports direct access
    Cached
    Poor disk utilization
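
How a FAT supports direct access can be sketched by following the chain inside the table instead of reading each data block. The block numbers, the `EOF` marker value, and the helper names below are assumptions for illustration, not the on-disk FAT format.

```python
EOF = -1  # assumed end-of-chain marker


def file_blocks(fat, start):
    """Follow the FAT chain from `start` and list the file's blocks."""
    blocks = []
    block = start
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


def nth_block(fat, start, n):
    """Find logical block n by walking the table only; no data block is read."""
    block = start
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError(n)
    return block


# One file starting at block 217: 217 -> 618 -> 339 -> end of file.
fat = {217: 618, 618: 339, 339: EOF}
print(file_blocks(fat, 217))
print(nth_block(fat, 217, 2))
```

Because the whole chain lives in one table, keeping that table cached makes the walk cheap compared with plain linked allocation, where every hop is a disk read.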

Allocation Methods: Linked Allocation

Allocation Methods: Indexed Allocation
  Problems:
    External fragmentation and size declaration for contiguous allocation
    Direct access for linked allocation
  Indexed allocation
    Bringing all the pointers together into one location: the index block (see the next slide)
  Logical view: index table

Allocation Methods: Indexed Allocation

Allocation Methods: Indexed Allocation
  Need index block
  Access methods: sequential access and direct access.
  Mapping from logical to physical in a file of maximum size 256K words (or 1024 KB) with a block size of 512 words (2 KB): the file has at most 512 blocks, so we need only 1 block for the index table (512 × 2 KB = 1024 KB).
  Index blocks waste space
  How large should the index block be?
    Linked scheme
    Multilevel index
    Combined scheme

Allocation Methods: Indexed Allocation
  Linked scheme
    For a small file, one index block
    For a large file, more index blocks can be linked together.
  Multilevel index (1024 pointers per index block, 4 KB blocks)
    1-level index: 1024 × 4 KB = 4 MB
    2-level index: 1024 × 1024 × 4 KB = 4 GB
    3-level index: 1024 × 1024 × 1024 × 4 KB = 4 TB
    (similar to paging)
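
The multilevel sizes, and the way a logical block number is split into one index per level (much like a multilevel page table), can be checked with a short sketch. The 4 KB block size and 1024 pointers per index block are read off the slide's figures; the function names are hypothetical.

```python
BLOCK_SIZE = 4 * 1024   # bytes per block, from the slide's "4KB"
PTRS_PER_BLOCK = 1024   # pointers per index block, from the slide's "1024"


def max_file_size(levels):
    """Largest file (in bytes) addressable with `levels` index levels."""
    return PTRS_PER_BLOCK ** levels * BLOCK_SIZE


def index_path(logical_block, levels):
    """Split a logical block number into one index per level, outermost first."""
    path = []
    for _ in range(levels):
        logical_block, index = divmod(logical_block, PTRS_PER_BLOCK)
        path.append(index)
    if logical_block:
        raise ValueError("block number too large for this many levels")
    return path[::-1]


for levels in (1, 2, 3):
    print(levels, max_file_size(levels))
print(index_path(1025, 2))
```

With two levels, logical block 1025 is entry 1 of the outer index and entry 1 of the index block it points to.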

Allocation Methods: Indexed Allocation

Figure labels: outer index, index table, file

Allocation Methods: Indexed Allocation
  Combined scheme
    12 direct pointers
    1 single-indirect block
    1 double-indirect block
    1 triple-indirect block
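
Which pointer serves a given logical block under the combined scheme can be worked out as below. The block size (4 KB) and the number of pointers per indirect block (1024) are assumptions carried over from the multilevel example; `pointer_kind` is a hypothetical helper name.

```python
BLOCK_SIZE = 4096       # assumed block size in bytes
PTRS_PER_BLOCK = 1024   # assumed pointers per indirect block
DIRECT = 12             # direct pointers, from the slide

# Total number of data blocks reachable through all four kinds of pointer.
MAX_BLOCKS = DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK**2 + PTRS_PER_BLOCK**3


def pointer_kind(logical_block):
    """Return which kind of pointer reaches the given logical block."""
    ranges = [
        ("direct", DIRECT),
        ("single-indirect", PTRS_PER_BLOCK),
        ("double-indirect", PTRS_PER_BLOCK ** 2),
        ("triple-indirect", PTRS_PER_BLOCK ** 3),
    ]
    for kind, count in ranges:
        if logical_block < count:
            return kind
        logical_block -= count
    raise ValueError("file too large for this scheme")


print(pointer_kind(11), pointer_kind(12), pointer_kind(1036))
print(MAX_BLOCKS * BLOCK_SIZE)  # maximum file size in bytes
```

Small files are served entirely by the direct pointers, so they need no index block at all; only larger files pay for the extra levels of indirection.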

Allocation Methods: Indexed Allocation

Allocation Methods: Performance
  Two criteria:
    Storage utilization efficiency
    Data block access time
  Contiguous allocation: good for files of known size
  Linked allocation: good for storage utilization
  Indexed allocation: access time depends on index structure, file size, and block position
  Conclusion:
    Combining contiguous allocation and linked allocation (some OSes)
    Combining contiguous allocation and indexed allocation (Sun)

FREE-SPACE MANAGEMENT

  Bit vector
  Linked lists
  Grouping
  Counting

Free-Space Management: Bit vector
  The free-space list is implemented as a bit map or bit vector. Each block is represented by 1 bit.
    If the block is free, the bit is 1;
    if the block is allocated, the bit is 0.
  An example
    block:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    bit:    0  0  1  1  1  1  0  0  1  1  1  1  1  1  0  0

Free-Space Management: Bit vector
  Discussion:
    Simple to implement
    Efficient to find the first free block
    Easy to get contiguous files
    Block number calculation:
      (number of bits per word) × (number of 0-value words) + offset of first 1 bit
    Bit map requires extra space. Example:
      block size = $2^{12}$ bytes
      disk size = $2^{30}$ bytes (1 gigabyte)
      $n = 2^{30}/2^{12} = 2^{18}$ bits (or 32K bytes)
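
The block-number calculation and the space estimate can be checked with a short sketch. It assumes 8-bit words and that block 0 is the most significant bit of word 0; both are choices made here for readability, and the function names are hypothetical.

```python
BITS_PER_WORD = 8  # assumed word size, kept small for readability


def first_free_block(words):
    """Skip 0-valued words, then add the offset of the first 1 bit."""
    for i, word in enumerate(words):
        if word != 0:
            offset = BITS_PER_WORD - word.bit_length()  # counted from the MSB
            return BITS_PER_WORD * i + offset
    return None  # no free block


def bitmap_bytes(disk_bytes, block_bytes):
    """Size of the bit map in bytes: one bit per block."""
    return (disk_bytes // block_bytes) // 8


# The 16-block example bit vector: 0011110011111100
print(first_free_block([0b00111100, 0b11111100]))
print(bitmap_bytes(2**30, 2**12))
```

For the example vector the first free block is block 2, and a 1-gigabyte disk with 4 KB blocks needs a 32K-byte bit map, matching the figures above.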

Free-Space Management: Linked Lists

  Linked list (free list)
    Cannot get contiguous space easily
    No waste of space

Free-Space Management: Others
  Grouping
    To use linked blocks to store the addresses of free blocks
    To group n-1 free-block addresses into an address block
    To use the last address in an address block to point to the next address block.
    Easier to find a large number of free blocks.
  Counting
    Every entry is a pair (starting address, number of contiguous free blocks) rather than just an address
    The total list is smaller.
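
The counting representation can be illustrated by converting a bit vector into (starting address, count) pairs. The function name `to_runs` is hypothetical, and the input reuses the 16-block bit vector from the bit-vector example (1 = free).

```python
def to_runs(bits):
    """Convert a free-block bit vector (1 = free) into (start, count) pairs."""
    runs = []
    start = None
    for i, bit in enumerate(bits):
        if bit and start is None:
            start = i                          # a run of free blocks begins
        elif not bit and start is not None:
            runs.append((start, i - start))    # the run ended at block i - 1
            start = None
    if start is not None:                      # run reaches the end of the disk
        runs.append((start, len(bits) - start))
    return runs


bits = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
print(to_runs(bits))
```

Ten free blocks collapse into two entries, (2, 4) and (8, 6), which is why the list is shorter when free space tends to be contiguous.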

EFFICIENCY AND PERFORMANCE
  Efficiency
    Preallocating the i-nodes on a partition (UNIX FS)
    Different cluster sizes (BSD UNIX)
    File size
    To fix the parameters or to change them dynamically.

Efficiency and Performance
  Page cache
    I/O without a unified buffer cache
    I/O using a unified buffer cache
  Reads/writes
    Synchronous reads (initially) / asynchronous reads (later on)
    Asynchronous writes (normally) / synchronous writes (sometimes)
  RAM disks vs. OS caching.

Efficiency and Performance: I/O Without a Unified Buffer Cache

Efficiency and Performance: I/O Using a Unified Buffer Cache

Efficiency and Performance: Various Disk-Caching Locations

RECOVERY
  Consistency checking
    Metadata are more important than actual data
    UNIX caches directory entries for reads, but not data writes that result in space allocation or other metadata changes (these are done synchronously).
  Backup and restore
    Backup scheme
      Day 1: complete backup
      Day 2, 3, 4, …, n: incremental backup
    Save the backups in different places.

LOG STRUCTURED FILE SYSTEMS

  A file operation, such as file create, can involve many structural changes within the file system on the disk.
    Directory structures are modified
    FCBs are allocated
    Data blocks are allocated
    The free counts for all of these blocks are decreased, …
  A file operation can be interrupted, which can cause inconsistencies that are difficult to recover from
  Transaction (DBMS).

Log Structured File System
  Log-structured (or journaling) file systems record each update to the file system as a transaction.
  All transactions are written to a log. A transaction is considered committed once it is written to the log. However, the file system may not yet be updated.
  The transactions in the log are asynchronously written to the file system. When the file system is modified, the transaction is removed from the log.
  If the file system crashes, all remaining transactions in the log must still be performed.
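
The commit, apply, and recover cycle can be sketched in a few lines. This is a toy model under stated assumptions: `JournaledFS` and its method names are hypothetical, a dictionary stands in for the on-disk structures, and each transaction is a set of absolute key/value updates so that replaying it twice is harmless.

```python
class JournaledFS:
    def __init__(self):
        self.log = []   # committed transactions not yet applied
        self.disk = {}  # stands in for the on-disk file-system structures

    def commit(self, updates):
        """A transaction is committed once it has been written to the log."""
        self.log.append(dict(updates))

    def checkpoint(self):
        """Apply logged transactions in order, removing each once applied."""
        while self.log:
            self.disk.update(self.log[0])
            self.log.pop(0)

    def recover(self):
        """After a crash, every transaction still in the log is performed."""
        self.checkpoint()


fs = JournaledFS()
fs.commit({"dir/a.txt": "FCB 7", "free_blocks": 99})
# A crash here leaves the disk untouched but the log intact.
fs.recover()
print(fs.disk)
```

The on-disk structures change only when logged transactions are applied, so a crash between commit and apply loses nothing: recovery simply replays what is left in the log.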

NFS
  An implementation and a specification of a software system for accessing remote files across LANs (or WANs).
  Exports
  mount

Homework: 12.1, 12.4, 12.5, 12.6