THE DESIGN AND IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM M. Rosenblum and J. K. Ousterhout University of California, Berkeley

Upload: maxwell-rasmussen

Post on 13-Mar-2016

Page 1: THE DESIGN AND IMPLEMENTATION OF A LOG-STRUCTURED  FILE SYSTEM

THE DESIGN AND IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM

M. Rosenblum and J. K. Ousterhout

University of California, Berkeley

THE PAPER

• Presents a new file system architecture allowing mostly sequential writes

• Assumes most data will be in RAM cache
  – Settles for more complex, slower disk reads

• Describes a mechanism for reclaiming disk space
  – Essential part of paper

OVERVIEW

• Introduction
• Key ideas
• Data structures
• Simulation results
• Sprite implementation
• Conclusion

INTRODUCTION

• Processor speeds increase at an exponential rate

• Main memory sizes increase at an exponential rate

• Disk capacities are improving rapidly

• Disk access times have evolved much more slowly

Consequences

• Larger memory sizes mean larger caches
  – Caches will capture most read accesses
  – Disk traffic will be dominated by writes
  – Caches can act as write buffers, replacing many small writes by fewer bigger writes

• Key issue is to increase disk write performance by eliminating seeks

Workload considerations

• Disk system performance is strongly affected by workload

• Office and engineering workloads are dominated by accesses to small files
  – Many random disk accesses
  – File creation and deletion times dominated by directory and i-node updates
  – Hardest on file system

Limitations of existing file systems

• They spread information around the disk
  – I-nodes stored apart from data blocks
  – Less than 5% of disk bandwidth is used to access new data

• Use synchronous writes to update directories and i-nodes
  – Required for consistency
  – Less efficient than asynchronous writes

KEY IDEA

• Write all modifications to disk sequentially in a log-like structure

– Convert many small random writes into large sequential transfers

– Use file cache as write buffer
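
A minimal sketch of this idea, assuming a toy in-memory "disk". The names Log, WriteBuffer and BLOCKS_PER_SEGMENT are hypothetical and do not come from Sprite LFS. Small writes accumulate in the cache and reach the log as one large sequential transfer:

```python
from dataclasses import dataclass, field

BLOCKS_PER_SEGMENT = 4  # hypothetical value, kept tiny for illustration


@dataclass
class Log:
    """An append-only sequence of blocks standing in for the disk."""
    blocks: list = field(default_factory=list)
    sequential_writes: int = 0  # number of large transfers issued

    def append_segment(self, segment):
        # One large sequential transfer instead of many small random writes.
        self.blocks.extend(segment)
        self.sequential_writes += 1


@dataclass
class WriteBuffer:
    """File cache used as a write buffer in front of the log."""
    log: Log
    pending: list = field(default_factory=list)

    def write(self, file_id, block_no, data):
        # Modifications accumulate in memory; nothing is updated in place.
        self.pending.append((file_id, block_no, data))
        if len(self.pending) == BLOCKS_PER_SEGMENT:
            self.flush()

    def flush(self):
        if self.pending:
            self.log.append_segment(self.pending)
            self.pending = []


log = Log()
buf = WriteBuffer(log)
for i in range(8):
    buf.write(file_id=i % 3, block_no=i, data=f"v{i}")
print(log.sequential_writes, len(log.blocks))
```

Eight small writes to three different files end up as two sequential transfers, regardless of which file each block belongs to.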

Main advantages

• Replaces many small random writes by fewer sequential writes

• Faster recovery after a crash
  – All blocks that were recently written are at the tail end of log
  – No need to check whole file system for inconsistencies
    • Like UNIX and Windows 95/98 do

THE LOG

• Only structure on disk

• Contains i-nodes and data blocks

• Includes indexing information so that files can be read back from the log relatively efficiently

• Most reads will access data that are already in the cache

Disk layouts of LFS and UNIX

[Figure: disk layouts of LFS (a single log) and Unix FFS holding dir1/file1 and dir2/file2. Legend: Inode, Directory, Data, Inode map]

Index structures

• I-node map maintains the location of all i-node blocks
  – I-node map blocks are stored on the log
    • Along with data blocks and i-node blocks
  – Active blocks are cached in main memory

• A fixed checkpoint region on each disk contains the addresses of all i-node map blocks at checkpoint time
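
The chain of indirections can be sketched as follows. This is an illustrative model only: the dictionary layout, the addresses and the function names are assumptions, not the Sprite LFS on-disk format.

```python
# Hypothetical in-memory model: log address -> block contents.
log = {}

# Data blocks of file 7, and the i-node that points to them.
log[200] = "hello"
log[201] = "world"
log[120] = {"inum": 7, "data_addrs": [200, 201]}

# An i-node map block maps i-node numbers to i-node addresses on the log.
log[300] = {7: 120}

# The checkpoint region sits at a fixed location and lists the
# addresses of all i-node map blocks.
checkpoint_region = {"imap_block_addrs": [300]}


def load_inode_map(checkpoint, log):
    """Merge all i-node map blocks named by the checkpoint region."""
    imap = {}
    for addr in checkpoint["imap_block_addrs"]:
        imap.update(log[addr])
    return imap


def read_file(inum, imap, log):
    """Follow i-node map -> i-node -> data blocks."""
    inode = log[imap[inum]]
    return [log[addr] for addr in inode["data_addrs"]]


imap = load_inode_map(checkpoint_region, log)
contents = read_file(7, imap, log)
print(contents)
```

Once the i-node map is in memory, reading a file costs the same number of lookups as in a conventional Unix file system; only the i-node's location is no longer fixed.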

Accessing an i-node

[Figure: the checkpoint area (fixed location, but not up to date) points to i-node map blocks spread on the log, which in turn point to i-node blocks also spread on the log]

The way it works

[Figure: the checkpoint area (fixed location, but not up to date) points to i-node map blocks spread on the log, which in turn point to i-node blocks also spread on the log; active blocks of both kinds are cached in RAM]

Summary

Segments

• Must maintain large free extents for writing new data

• Disk is divided into large fixed-size extents called segments (512 kB in Sprite LFS)

• Segments are always written sequentially from one end to the other

• Old segments must be cleaned before they are reused

Segment usage table

• One entry per segment

• Contains
  – Number of free blocks in segment
  – Time of last write

• Used by the segment cleaner to decide which segments to clean first
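
A small sketch of such a table, under stated assumptions: the class and field names are hypothetical, and the value of 128 blocks per segment assumes 512 kB segments made of 4 kB blocks.

```python
from dataclasses import dataclass

BLOCKS_PER_SEGMENT = 128  # assumption: 512 kB segments of 4 kB blocks


@dataclass
class SegmentUsage:
    free_blocks: int = BLOCKS_PER_SEGMENT
    last_write: float = 0.0

    @property
    def utilization(self):
        # Fraction of the segment that still holds live data.
        return 1 - self.free_blocks / BLOCKS_PER_SEGMENT


class SegmentUsageTable:
    def __init__(self, num_segments):
        self.entries = [SegmentUsage() for _ in range(num_segments)]

    def record_write(self, seg, blocks, now):
        entry = self.entries[seg]
        entry.free_blocks -= blocks
        entry.last_write = now

    def record_dead(self, seg, blocks):
        # Blocks become free when their file is overwritten or deleted.
        self.entries[seg].free_blocks += blocks


table = SegmentUsageTable(num_segments=4)
table.record_write(seg=0, blocks=128, now=10.0)
table.record_dead(seg=0, blocks=32)
print(table.entries[0].utilization)
```

The two recorded fields are what a cleaning policy needs: utilization tells how much live data would have to be copied, and the time of last write gives a rough idea of how stable that data is.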

Segment cleaning (I)

• Old segments contain
  – live data
  – “dead data” belonging to files that were overwritten or deleted

• Segment cleaning involves writing out the live data

• A segment summary block identifies each piece of information in the segment

Segment cleaning (II)

• Segment cleaning process involves
  1. Reading a number of segments into memory
  2. Identifying the live data
  3. Writing them back to a smaller number of clean segments

• Key issue is where to write these live data
  – Want to avoid repeated moves of stable files
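
The three steps can be sketched as below. The function name and the is_live callback are hypothetical; a real cleaner would decide liveness from the segment summary block and the i-node map rather than from a callback.

```python
def clean(segments, is_live, blocks_per_segment):
    """Compact the live blocks of `segments` into as few segments as possible.

    segments: list of segments, each a list of block identifiers
    is_live:  callable telling whether a block is still live (hypothetical)
    """
    # Steps 1 and 2: read the segments and keep only the live blocks.
    live = [block for seg in segments for block in seg if is_live(block)]
    # Step 3: write the live blocks back, packed into full segments.
    return [live[i:i + blocks_per_segment]
            for i in range(0, len(live), blocks_per_segment)]


# "x" marks dead blocks in three mostly empty segments.
old = [["a1", "x", "a2", "x"], ["x", "b1", "x", "x"], ["c1", "x", "x", "x"]]
new = clean(old, is_live=lambda block: block != "x", blocks_per_segment=4)
print(new)
```

Three old segments are turned into one full segment, leaving two clean segments available for new data. This sketch keeps the live blocks in log order and so ignores the question of where stable files should go.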

Write cost

u = utilization
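
The write cost can be derived as follows, using the paper's definition: the total number of bytes moved to and from the disk, divided by the number of bytes of new data written. The derivation assumes that N segments of utilization u are read in full, that their N·u segments' worth of live data is written back, and that the reclaimed space receives N·(1 - u) segments' worth of new data:

```latex
\text{write cost}
  = \frac{\text{total bytes read and written}}{\text{new data written}}
  = \frac{N + N u + N (1 - u)}{N (1 - u)}
  = \frac{2}{1 - u},
  \qquad 0 < u < 1
```

At u = 0.8, for example, the write cost is 10. A segment with no live data (u = 0) does not need to be read at all, so its write cost is 1.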

Segment Cleaning Policies

• Greedy policy: always cleans the least-utilized segments
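
As a sketch, the greedy policy amounts to a sort on utilization. The function name and the sample values are hypothetical:

```python
def pick_greedy(utilizations, count):
    """Return the indices of the `count` least-utilized segments."""
    order = sorted(range(len(utilizations)), key=lambda i: utilizations[i])
    return order[:count]


utilizations = [0.90, 0.20, 0.75, 0.05, 0.60]
victims = pick_greedy(utilizations, count=2)
print(victims)
```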

Simulation results (I)

• Consider two file access patterns
  – Uniform
  – Hot-and-cold: (100 - x)% of the accesses involve x% of the files
    • 90% of the accesses involve 10% of the files (a rather crude model)
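
A possible way to generate the hot-and-cold pattern in a simulation. This is not the simulator used in the paper; the function name, the parameter names and the file count are assumptions.

```python
import random


def pick_file(num_files, rng, hot_fraction=0.10, hot_access=0.90):
    """Pick a file index under the hot-and-cold pattern.

    The first `hot_fraction` of the files receive `hot_access` of the
    accesses; the remaining files share the rest uniformly.
    """
    num_hot = max(1, round(num_files * hot_fraction))
    if rng.random() < hot_access:
        return rng.randrange(num_hot)          # hot group
    return rng.randrange(num_hot, num_files)   # cold group


rng = random.Random(0)
accesses = [pick_file(1000, rng) for _ in range(10_000)]
hot_share = sum(1 for f in accesses if f < 100) / len(accesses)
print(round(hot_share, 2))
```

The uniform pattern is the special case in which every access is simply rng.randrange(num_files).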

Greedy policy

[Figure: write cost under the greedy policy. Annotation: No variance = formula]

Key

• No variance displays write cost computed from formula assuming that all segments have the same utilization u (not true!)

• LFS uniform uses a greedy policy

• LFS hot-and-cold uses a greedy policy that sorts live blocks by age

• FFS improved is an estimation of the best possible FFS performance

Comments

• Write cost is very sensitive to disk utilization
  – Higher disk utilizations result in more frequent segment cleanings

• Free space in cold segments is more valuable than free space in hot segments
  – Value of a segment's free space depends on the stability of live blocks in segment

Copying live blocks

• Age sort:
  – Sorts the blocks by the time they were last modified
  – Groups blocks of similar age together into new segments

• Age of a block is a good predictor of its survival
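
Age sorting can be sketched in a few lines. The function name, the (block_id, last_modified) pair layout and the sample values are assumptions:

```python
def age_sort(live_blocks, blocks_per_segment):
    """Group live blocks into new segments by last-modified time.

    live_blocks: list of (block_id, last_modified) pairs
    """
    ordered = sorted(live_blocks, key=lambda block: block[1])  # oldest first
    return [[block_id for block_id, _ in ordered[i:i + blocks_per_segment]]
            for i in range(0, len(ordered), blocks_per_segment)]


live = [("a", 900), ("b", 10), ("c", 880), ("d", 25), ("e", 905), ("f", 12)]
segments = age_sort(live, blocks_per_segment=3)
print(segments)
```

The old blocks b, f and d land in one segment and the recently modified blocks c, a and e in another, so a segment tends to stay entirely live or to die all at once.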

Segment utilizations

Comments

• Locality causes the distribution to be more skewed towards the utilization at which cleaning occurs.

• Segments are cleaned at higher utilizations than they could be

Cost benefit policy

• Uses criterion
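
In the paper, the criterion is the ratio benefit / cost = (free space generated × age of data) / cost = ((1 - u) × age) / (1 + u), where u is the utilization of the segment and age is the age of its youngest block. The sketch below applies it; the function names and the sample segments are hypothetical.

```python
def cost_benefit(u, age):
    """Benefit-to-cost ratio of cleaning a segment.

    u:   utilization of the segment (fraction of live data, 0 <= u <= 1)
    age: age of the youngest data in the segment (any time unit)
    """
    # Cost: 1 to read the segment plus u to write back its live data.
    # Benefit: the (1 - u) of free space gained, weighted by data age.
    return (1 - u) * age / (1 + u)


def pick_cost_benefit(segments, count):
    """segments: list of (utilization, age); return the best `count` indices."""
    order = sorted(range(len(segments)),
                   key=lambda i: cost_benefit(*segments[i]),
                   reverse=True)
    return order[:count]


segments = [(0.75, 1000.0),  # cold segment, fairly full
            (0.15, 10.0),    # hot segment, nearly empty
            (0.50, 20.0)]    # hot segment, half full
victims = pick_cost_benefit(segments, count=1)
print(victims)
```

With these sample values the cold segment is chosen even though it is the fullest of the three, because its remaining live data is unlikely to change soon.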

Using a cost-benefit policy

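[Figure: segment utilizations under the cost-benefit policy, with cleaning points marked at 75% and 15% utilization]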

What happens

• Hot and cold segments are now cleaned at different utilization thresholds
  – 75% utilization for cold segments
  – 15% utilization for hot segments

• And it works much better!

Using a cost benefit policy

Comments

• Cost benefit policy works much better

Sprite LFS

• Outperforms current Unix file systems by an order of magnitude for writes to small files

• Matches or exceeds Unix performance for reads and large writes

• Even when segment cleaning overhead is included
  – Can use 70% of the disk bandwidth for writing
  – Unix file systems typically can use only 5-10%

Crash recovery (I)

• Uses checkpoints
  – Position in the log at which all file system structures are consistent and complete

• Sprite LFS performs checkpoints at periodic intervals or when the file system is unmounted or shut down

• Checkpoint region is then written to a special fixed position on disk; it contains the addresses of all blocks in the inode map and segment usage table

Crash recovery (II)

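[Figure: the log and the checkpoint area; the checkpoint area identifies the last checkpoint, and roll-forward covers the part of the log written after it]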

Crash recovery (III)

• Recovering to latest checkpoint would result in loss of too many recently written data blocks

• Sprite LFS also includes roll-forward
  – When system restarts after a crash, it scans through the log segments that were written after the last checkpoint
  – When summary block indicates presence of a new i-node, Sprite LFS updates the i-node map
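
Roll-forward can be sketched as a scan over the segment summaries written after the last checkpoint. The summary layout used here, a list of (kind, inum, address) entries, is a hypothetical simplification, and the sketch only covers the i-node map update described above.

```python
def roll_forward(imap_at_checkpoint, segments_after_checkpoint):
    """Rebuild the i-node map from segments written after the checkpoint.

    Each segment is modeled as a dict whose "summary" lists
    (kind, inum, address) entries; this layout is hypothetical.
    """
    imap = dict(imap_at_checkpoint)
    for segment in segments_after_checkpoint:  # scanned in log order
        for kind, inum, addr in segment["summary"]:
            if kind == "inode":
                # A newer copy of the i-node supersedes the checkpointed one.
                imap[inum] = addr
    return imap


checkpoint_imap = {1: 100, 2: 104}
tail = [
    {"summary": [("data", 2, 200), ("inode", 2, 201)]},
    {"summary": [("data", 3, 300), ("inode", 3, 301)]},
]
recovered = roll_forward(checkpoint_imap, tail)
print(recovered)
```

File 2, modified after the checkpoint, and file 3, created after it, are both recovered, while file 1 keeps its checkpointed i-node address.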

SUMMARY

• Log-structured file system
  – Writes much larger amounts of new data to disk per disk I/O
  – Uses most of the disk’s bandwidth

• Free space management done through dividing disk into fixed-size segments

• Lowest segment cleaning overhead achieved with cost-benefit policy

ACKNOWLEDGMENTS

• Most figures were lifted from a PowerPoint presentation of the same paper by Yongsuk Lee