Bridging the Information Gap in Storage Protocol Stacks
Timothy E. Denehy, Andrea C. Arpaci-Dusseau,
and Remzi H. Arpaci-Dusseau
University of Wisconsin, Madison
2 of 32
State of Affairs
[Diagram]
• File System: Namespace, Files, Metadata, Layout, Free Space
• Interface: Block Based, Read/Write
• Storage System: Parallelism, Redundancy
3 of 32
• Information gap may cause problems
  – Poor performance
    • Partial stripe write operations
  – Duplicated functionality
    • Logging in file system and storage system
  – Reduced functionality
    • Storage system lacks knowledge of files
• Time to re-examine the division of labor
Problem
4 of 32
• Enhance the storage interface
  – Expose performance and failure information
• Use information to provide new functionality
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
Our Approach
• Informed LFS (I·LFS)
• Exposed RAID (ERAID)
5 of 32
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy
• Conclusion
6 of 32
• Backwards compatibility
  – Block-based interface
  – Linear, concatenated address space
• Expose information to the file system above
  – Regions
  – Performance
  – Failure
• Allow file system to utilize semantic knowledge
ERAID Goals
7 of 32
• Region
  – Contiguous portion of the address space
• Regions can be added to expand the address space
• Region composition
  – RAID: One region for all disks
  – Exposed: Separate regions for each disk
  – Hybrid
ERAID Regions
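The concatenated address space described above can be sketched as a lookup from a linear block number to the region that contains it; the struct and function names here are illustrative assumptions, not ERAID's actual code.

```c
#include <stddef.h>

/* One contiguous region of the ERAID address space (fields are
 * assumptions for illustration, not ERAID's real layout). */
struct region {
    size_t start;    /* first block of the region */
    size_t nblocks;  /* region length in blocks */
};

/* Map a linear block number to the index of the region containing it,
 * or return -1 if the block lies beyond the current address space. */
static int region_lookup(const struct region *r, size_t nregions, size_t blk)
{
    for (size_t i = 0; i < nregions; i++)
        if (blk >= r[i].start && blk < r[i].start + r[i].nblocks)
            return (int)i;
    return -1;
}
```

With the "Exposed" composition (one region per disk), a lookup like this lets the file system direct a write to a specific disk, and expansion is just appending another region entry to the table.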
8 of 32
• Exposed on a per-region basis
• Queue length and throughput
• Reveals
  – Static disk heterogeneity
  – Dynamic performance and load fluctuations
ERAID Performance Information
9 of 32
• Exposed on a per-region basis
• Number of tolerable failures
• Reveals
  – Static differences in failure characteristics
  – Dynamic failures to the file system above
ERAID Failure Information
10 of 32
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy
• Conclusion
11 of 32
• Log-structured file system
  – Transforms all writes into large sequential writes
  – All data and metadata are written to a log
  – Log is a collection of segments
  – Segment table describes each segment
  – Cleaner process produces empty segments
• Why use LFS for an informed file system?
  – Write-anywhere design provides flexibility
  – Ideas applicable to other file systems
I·LFS Overview
12 of 32
• Goals
  – Improve performance, functionality, and manageability
  – Minimize system complexity
• Exploits ERAID information to provide
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy
I·LFS Overview
13 of 32
• NetBSD 1.5
• 1 GHz Intel Pentium III Xeon
• 128 MB RAM
• Four fast disks
  – Seagate Cheetah 36XL, 21.6 MB/s
• Four slow disks
  – Seagate Barracuda 4XL, 7.5 MB/s
I·LFS Experimental Platform
14 of 32
I·LFS Baseline Performance
• Four slow disks: 30 MB/s
• Four fast disks: 80 MB/s
15 of 32
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy
• Conclusion
16 of 32
• Goal: Expand storage incrementally
  – Capacity
  – Performance
• Ideal: Instant disk addition
  – Minimize downtime
  – Simplify administration
• I·LFS supports on-line addition of new disks
I·LFS On-line Expansion
17 of 32
• ERAID: Expandable address space
• Expansion is equivalent to adding empty segments
• Start with an oversized segment table
• Activate new portion of segment table
I·LFS On-line Expansion Details
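The oversized-table trick above can be sketched as follows: the segment table is allocated larger than the initial disk set, and adding a disk simply activates the next slice of entries as empty, writable segments. Names and sizes here are assumptions for illustration, not the I·LFS implementation.

```c
#include <stddef.h>

enum seg_state { SEG_INACTIVE = 0, SEG_EMPTY, SEG_LIVE };

#define MAX_SEGS 4096   /* table oversized at creation for future growth */

/* Oversized segment table: entries past `nactive` exist but are not
 * yet backed by any disk. */
struct seg_table {
    enum seg_state state[MAX_SEGS];
    size_t nactive;
};

/* Adding a disk activates the next slice of the table as empty,
 * immediately writable segments; no existing data is moved, which is
 * why expansion is effectively instant. Returns the new active count. */
static size_t expand(struct seg_table *t, size_t new_segs)
{
    if (new_segs > MAX_SEGS - t->nactive)
        new_segs = MAX_SEGS - t->nactive;
    for (size_t i = 0; i < new_segs; i++)
        t->state[t->nactive + i] = SEG_EMPTY;
    t->nactive += new_segs;
    return t->nactive;
}
```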
18 of 32
I·LFS On-line Expansion Experiment
• I·LFS immediately takes advantage of each extra disk
19 of 32
• Goal: Perform well on heterogeneous storage
  – Static performance differences
  – Dynamic performance fluctuations
• Ideal: Maximize throughput of the storage system
• I·LFS writes data proportionate to performance
I·LFS Dynamic Parallelism
20 of 32
• ERAID: Dynamic performance information
• Most file system routines are not changed
  – Aware of only the ERAID linear address space
  – Reduces file system complexity
• Segment selection routine
  – Aware of ERAID regions and performance
  – Chooses next segment based on current performance
I·LFS Dynamic Parallelism Details
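One way to realize a performance-aware segment selector like the one described above: score each region by its reported throughput discounted by queue length, and place the next segment in the best-scoring region. This is a sketch under assumed names, not the actual I·LFS routine.

```c
#include <stddef.h>

/* Per-region performance information as exposed by ERAID
 * (field names are assumptions for illustration). */
struct region_perf {
    double throughput;  /* recent MB/s observed for this region */
    int    queue_len;   /* requests currently outstanding */
};

/* Choose the region for the next segment: highest recent throughput
 * per queued request, so faster and less-loaded disks receive
 * proportionally more data. */
static size_t select_region(const struct region_perf *r, size_t n)
{
    size_t best = 0;
    double best_score = -1.0;
    for (size_t i = 0; i < n; i++) {
        double score = r[i].throughput / (1.0 + (double)r[i].queue_len);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

Because only this one routine looks at regions, the rest of the file system can keep seeing a plain linear address space, which is the complexity argument the slides make.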
21 of 32
I·LFS Static Parallelism Experiment
• Simple striping limited by the rate of the slowest disk
• I·LFS provides the full throughput of the system
22 of 32
I·LFS Dynamic Parallelism Experiment
• I·LFS adjusts to the performance fluctuation
23 of 32
• Goal: Offer new redundancy options to users
• Ideal: Range of mechanisms and granularities
• I·LFS provides mirrored per-file redundancy
I·LFS Flexible Redundancy
24 of 32
• ERAID: Region failure characteristics
• Use separate files for redundancy
  – Even inode N for original files
  – Odd inode N+1 for redundant files
  – Original and redundant data in different sets of regions
• Flexible data placement within the regions
• Use recursive vnode operations for redundant files
  – Leverage existing routines to reduce complexity
I·LFS Flexible Redundancy Details
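The even/odd pairing above means either member of a mirrored pair can be found from the other with simple arithmetic. A minimal sketch (type and function names are illustrative, not from the I·LFS source):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ilfs_ino;

/* Original files get even inode numbers; their mirrors get the next
 * odd number. */
static bool ino_is_original(ilfs_ino ino) { return (ino & 1) == 0; }

/* Map an inode to its partner: original -> mirror, mirror -> original. */
static ilfs_ino ino_mirror(ilfs_ino ino)
{
    return ino_is_original(ino) ? ino + 1 : ino - 1;
}
```

Since the mirror is an ordinary file, existing vnode operations can simply be invoked recursively on it, which is how the design keeps added complexity low.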
25 of 32
I·LFS Flexible Redundancy Experiment
• I·LFS provides a throughput and reliability tradeoff
26 of 32
• Goal: Avoid replication performance penalty
• Ideal: Replicate data immediately before failure
• I·LFS offers redundancy with delayed replication
• Avoids replication penalty for short-lived files
I·LFS Lazy Redundancy
27 of 32
• ERAID: Region failure characteristics
• Segments needing replication are flagged
• Cleaner acts as replicator
  – Locates flagged segments
  – Checks data liveness and lifetime
  – Generates redundant copies of files
I·LFS Lazy Redundancy
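The replicator pass described above might look like the following sketch: scan for flagged segments and copy only data that is still live and has survived a lifetime threshold, so short-lived files die before they are ever mirrored. Structure and field names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-segment replication state (illustrative, not I·LFS's layout). */
struct segment {
    bool needs_replica;  /* flagged at write time */
    bool live;           /* still referenced (cleaner's liveness check) */
    long age_secs;       /* time since the data was written */
};

/* One replicator pass: copy flagged segments whose data is still live
 * and older than the threshold; dead data is unflagged without any
 * copy, and young data is left for a later pass. Returns the number
 * of segments replicated. */
static int replicate_pass(struct segment *segs, size_t n, long threshold)
{
    int copied = 0;
    for (size_t i = 0; i < n; i++) {
        if (!segs[i].needs_replica)
            continue;
        if (!segs[i].live) {             /* file already deleted */
            segs[i].needs_replica = false;
            continue;
        }
        if (segs[i].age_secs < threshold)
            continue;                    /* too young; revisit later */
        /* ...generate the redundant copy via the per-file mirror path... */
        segs[i].needs_replica = false;
        copied++;
    }
    return copied;
}
```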
28 of 32
I·LFS Lazy Redundancy Experiment
• I·LFS avoids performance penalty for short-lived files
29 of 32
Outline
• ERAID Overview
• I·LFS Overview
• Functionality and Evaluation
  – On-line expansion
  – Dynamic parallelism
  – Flexible redundancy
  – Lazy redundancy
• Conclusion
30 of 32
Comparison with Traditional Systems
• On-line expansion
  – Yes
• Dynamic parallelism (heterogeneous storage)
  – Yes, but with duplicated functionality
• Flexible redundancy
  – No, the storage system is not aware of file composition
• Lazy redundancy
  – No, the storage system is not aware of file deletions
31 of 32
Conclusion
• Introduced ERAID and I·LFS
• Extra information enables new functionality
  – Difficult or impossible in traditional systems
• Minimal complexity
  – 19% increase in code size
• Time to re-examine the division of labor
32 of 32
Questions?
http://www.cs.wisc.edu/wind/