Post on 13-Jul-2020

ffsck: The Fast File System Checker

Ao Ma*+, Chris Dragga+, Andrea C. Arpaci-Dusseau+, Remzi H. Arpaci-Dusseau+

* Backup Recovery Systems Division, EMC

+ University of Wisconsin, Madison

Background

File system

• Data integrity is critically important

• Should be robust and reliable

Factors that can corrupt a file system

• Unclean shutdown / system crash

• File system bugs

• Hardware failures

Existing Solutions

• Journaling

• Copy-on-Write

• Soft updates

• Debugging

• Bug finding tools

• Checksums

• Scrubbing

Faults targeted: unclean shutdown, system crash, bugs, hardware failures

These solutions reduce the probability of faults but can’t protect against all of them

fsck: Last Resort To Repair

Approach

• Scan FS offline

• Check metadata redundancy info

• Restore a damaged FS to a usable state

Dilemma

• Fsck is slow, causing long downtime

• Unpredictable checking time

fsck Challenges

File systems are evolving

• Capacity is increasing

• More complexity (more bugs and hardware failures)

Impact on fsck

• Longer downtime

• Used more frequently

What we did

Fast file system checker (ffsck)

• 10 times faster with identical checking policy

A modified version of ext3 (rext3)

• 20% improvement in big writes

• 43% improvement in random reads

• 10% degradation in small reads

Outline

fsck analysis

• Checking approach

• Performance

• FS tradeoffs

ffsck & rext3

• Goals

• Novel features

Evaluation

E2fsck Checking Approach

Phase   Checking tasks
1       Scan inodes and indirect blocks in a logical order
2       Check each directory individually
3       Check directory connectivity
4       Check inode reference count and remove orphan inodes
5       Update the on-disk copies if necessary

Performance Analysis

750GB disk, 1GB memory, e2fsprogs 1.41.12, Linux 2.6.28

Initialize disk image by creating directories with small files (4KB-2MB)

Increase FS size by creating new files and appending data to files (4KB-1MB)

E2fsck doesn’t scale well

Phase 1 dominates the checking time

Total checking time by FS size (bars stacked by phase 1 to phase 5):

FS size   Total checking time (seconds)
150 GB    1666
300 GB    2554
450 GB    3398
600 GB    4176

Cumulative Time Spent on Reading Indirect Blocks and Inode Blocks

Indirect blocks are the bottleneck

[Figure: cumulative read time in milliseconds (0 to 500) against read block number (1 to 101), for inode blocks and indirect blocks]

File System Design Tradeoffs

Identical allocation for data and indirect blocks

• Pros: store them contiguously and facilitate sequential access

• Cons: metadata is scattered across the disk

Rely on tree structure to locate indirect blocks

• Pros: simple and straightforward

• Cons: imposes a strict ordering of access

Fsck: An Afterthought Design

Repairing capability is not prioritized

• File system has limited support for the checker

• Checker is developed as a peripheral addition, rather than as a tightly integrated component

Ffsck and Rext3

Prioritize fast repair when designing the FS

• Fast scan

• Robust checking performance

• Competitive FS performance

ffsck and rext3 are based on e2fsck and ext3

Basic Layout of Ext3

[Figure: the disk is divided into block groups (Block Group 0, …, Block Group i, …, Block Group n)]

Each block group: Super block | Group Descriptor | Data Bitmap | Inode Bitmap | Inode table | Data blocks

Overview of ffsck and rext3

New disk layout

Disk-order scan

Self-check and Cross-check

Fast recovery with bitmap snapshot

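Rext3 Disk Layout

Decouple allocation

Improve metadata density

• Indirect region holds indirect blocks and directory data blocks

Block group layout: Super block | Group Descriptor | Data Bitmap | Inode Bitmap | Inode table | Indirect region | Data blocks

New allocation example: the inode (Ino) sits in the inode table, the indirect blocks (Ind1, Ind 2) go to the indirect region, and the file data blocks (D1 …, Da, Db) go to the data blocks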
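The decoupled allocation on this slide can be pictured with a small sketch. It is only an illustration under assumed names: the `BlockGroup` class, its region sizes, and the block numbers are hypothetical and are not taken from rext3's source.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    """Toy block group with separate ranges for metadata and file data.

    All field names and sizes are hypothetical, chosen for illustration.
    """
    indirect_start: int   # first block number of the indirect region
    indirect_len: int     # number of blocks in the indirect region
    data_start: int       # first block number of the data blocks
    data_len: int         # number of data blocks
    next_indirect: int = 0
    next_data: int = 0

    def alloc(self, kind: str) -> int:
        """Allocate one block; indirect and directory blocks share a region."""
        if kind in ("indirect", "dir"):
            if self.next_indirect >= self.indirect_len:
                raise RuntimeError("indirect region is full")
            blk = self.indirect_start + self.next_indirect
            self.next_indirect += 1
        elif kind == "data":
            if self.next_data >= self.data_len:
                raise RuntimeError("data region is full")
            blk = self.data_start + self.next_data
            self.next_data += 1
        else:
            raise ValueError(f"unknown block kind: {kind}")
        return blk


group = BlockGroup(indirect_start=100, indirect_len=8,
                   data_start=108, data_len=1000)

# Allocation order loosely follows the slide's example: Ind1, D1, Ind2, Da, Db.
requests = [("Ind1", "indirect"), ("D1", "data"), ("Ind2", "indirect"),
            ("Da", "data"), ("Db", "data")]
layout = {name: group.alloc(kind) for name, kind in requests}
```

Even though the indirect blocks are requested in between data blocks, they end up adjacent to each other (100 and 101 here), which is the density property the slide is after.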

Rext3 Disk Layout

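Does decoupled allocation cause additional seeks?

• Mitigated by the disk track buffer

[Figure: track buffer illustration showing a platter with sectors 1 to 16, the spindle, the heads, and the direction of rotation]

Track buffer size keeps increasing: 16MB, 32MB, 64MB, 128MB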

Disk-order Scan

• Most efficient way to scan metadata

• Predictable scanning time

[Figure: each block group (block group 0, block group 1, …, block group n) consists of a metadata region (Super block, Group Descriptor, Data Bitmap, Inode Bitmap, Inode table, Indirect region) followed by a data region; the scan reads each metadata region and seeks past each data region: read, seek, read, seek, read, seek]
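A toy cost model can show why reading metadata in disk order helps. This is a sketch, not a model of a real disk: the block addresses are made up, and "cost" is simply the total distance the head would travel between consecutive reads.

```python
def seek_distance(order):
    """Sum of absolute LBA jumps when blocks are read in the given order."""
    return sum(abs(b - a) for a, b in zip(order, order[1:]))


# Hypothetical LBAs of metadata blocks, listed in the order a walk of the
# indirect tree might visit them (logical order).
logical_order = [900, 120, 4000, 130, 2500, 125]

# Disk-order scan: visit the same blocks in ascending LBA order.
disk_order = sorted(logical_order)

logical_cost = seek_distance(logical_order)
disk_cost = seek_distance(disk_order)
```

With these made-up addresses the logical order travels 13275 blocks while the disk order travels 3880, and the disk-order cost depends only on where the metadata lives, not on how files reference it.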

Memory Pressure

Disk-order scan accesses all the indirect blocks without using the indirect tree

• Can’t perform checking until all the related metadata are cached

• Impractical for large-scale FS
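A rough estimate gives a feel for the memory pressure. The numbers below are assumptions, not figures from the talk: 4 KB blocks, 4-byte block pointers, and every data block reached through a single-level indirect block (direct pointers and double-indirect blocks are ignored).

```python
BLOCK_SIZE = 4096                            # assumed block size in bytes
PTR_SIZE = 4                                 # assumed pointer size in bytes
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE      # 1024 pointers per indirect block
GIB = 1024 ** 3
MIB = 1024 ** 2


def indirect_bytes(data_bytes):
    """Bytes of indirect blocks needed to map data_bytes of file data."""
    data_blocks = data_bytes // BLOCK_SIZE
    indirect_blocks = -(-data_blocks // PTRS_PER_BLOCK)  # ceiling division
    return indirect_blocks * BLOCK_SIZE


needed = indirect_bytes(600 * GIB)
needed_mib = needed // MIB
```

Under these assumptions, caching every indirect block of a 600 GB file system takes about 600 MB, a large share of the 1GB of memory in the test machine, before any inodes or directory blocks are counted.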

Self-check, Cross-check

Separate self-check and cross-check

• Self-ID is added

• Finish most checks without referring to other metadata (self-check)

Self-check and discard

• Once self-check is performed, discard the fields that cross-check does not need

Example: Self-check & Discard

Indirect block, disk copy: ino, Blk 11, Blk 12, Blk …, Blk 94

Self-check: 1. block range check 2. bitmap

Indirect block, memory copy after self-check: ino, its own LBA, Blk_num = 84, last pointer offset = 83, last pointer: 94

Compression ratio is nearly 250:1
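The self-check-and-discard idea can be sketched as follows. The field names mirror the slide (ino, own LBA, Blk_num, last pointer offset, last pointer), but the function, the use of a Python set as the block bitmap, and the error handling are assumptions made for illustration; a real checker would record and repair problems rather than raise.

```python
from typing import List, NamedTuple, Set


class IndirectSummary(NamedTuple):
    """What stays in memory after self-check (fields follow the slide)."""
    ino: int
    lba: int
    blk_num: int
    last_offset: int
    last_pointer: int


def self_check_and_discard(ino: int, lba: int, pointers: List[int],
                           total_blocks: int, bitmap: Set[int]) -> IndirectSummary:
    """Check every pointer on its own, then keep only a small summary."""
    used = [(i, p) for i, p in enumerate(pointers) if p != 0]
    for _, p in used:
        if not 0 < p < total_blocks:          # 1. block range check
            raise ValueError(f"pointer {p} is out of range")
        if p in bitmap:                       # 2. bitmap check
            raise ValueError(f"block {p} is already claimed")
        bitmap.add(p)
    last_offset, last_pointer = used[-1] if used else (-1, 0)
    return IndirectSummary(ino, lba, len(used), last_offset, last_pointer)


# The slide's indirect block: pointers to blocks 11..94, rest unused.
# ino=7, lba=10 and total_blocks=1000 are made-up values.
pointers = list(range(11, 95)) + [0] * (1024 - 84)
claimed: Set[int] = set()
summary = self_check_and_discard(ino=7, lba=10, pointers=pointers,
                                 total_blocks=1000, bitmap=claimed)
```

The 1024-entry pointer array can be dropped once `summary` exists; only a handful of integers per indirect block stay resident for the later cross-check.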

Example: Cross-check of File Size

Inode: last pointer offset: 13, last pointer: 36

Double indirect block (LBA: 36): last pointer offset: 2, last pointer: 157

Indirect block (LBA: 157): last pointer offset: 12, last pointer: 950

1. Partially rebuild the tree structure

2. Calculate the file size using offset
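The file-size calculation from offsets can be worked through with the numbers on this slide. The sketch assumes the standard ext3 inode layout (slots 0 to 11 direct, slot 12 single indirect, slot 13 double indirect), 4 KB blocks with 1024 pointers per block, 0-based offsets, and a file without holes; the function name is hypothetical.

```python
BLOCK_SIZE = 4096        # assumed block size in bytes
PTRS_PER_BLOCK = 1024    # assumed: 4 KB block / 4-byte pointers
DIRECT_SLOTS = 12        # ext3 inode: slots 0-11 are direct pointers
DOUBLE_INDIRECT_SLOT = 13


def blocks_from_offsets(inode_last: int, dind_last: int, ind_last: int) -> int:
    """Data blocks in a hole-free file, given the 0-based last pointer offsets."""
    if inode_last != DOUBLE_INDIRECT_SLOT:
        raise ValueError("sketch only covers files ending under the double indirect block")
    single_indirect = PTRS_PER_BLOCK               # slot 12 is completely full
    full_under_double = dind_last * PTRS_PER_BLOCK  # indirect blocks before the last one
    tail = ind_last + 1                             # pointers used in the last indirect block
    return DIRECT_SLOTS + single_indirect + full_under_double + tail


# Offsets from the slide: inode 13, double indirect 2, indirect 12.
blocks = blocks_from_offsets(13, 2, 12)
max_size_bytes = blocks * BLOCK_SIZE
```

Here the file spans 12 + 1024 + 2 × 1024 + 13 = 3097 blocks, so the size stored in the inode should not exceed 3097 × 4096 bytes; only the last block on each level had to be linked back into a tree to get there.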

Fast Recovery with Bitmap Snapshot

e2fsck: costly double scan of inodes and indirect blocks

• Detect multiply-claimed blocks in 1st scan

• Detect their owners during 2nd scan

ffsck: 1 full scan + 1 partial rescan

• ffsck builds a list of bitmap snapshots to limit the rescan’s scope

Fast Recovery with Bitmap Snapshot

Create snapshot for each group of inodes

[Figure: block bitmap over blocks 1 to 7 with three snapshots (snapshot1, snapshot2, snapshot3), one per group of inodes; bit rows shown: 0000000, 1101000, 1111110, 1111111]

Only need to rescan the group of inodes for snapshot1
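One way to picture the snapshot mechanism is the sketch below. It assumes each snapshot records the blocks claimed by one group of inodes and that a global bitmap detects the second claim; the function name, the dictionary-based input, and the inode numbers are hypothetical, and the actual ffsck data structures may differ.

```python
def scan_with_snapshots(groups):
    """Single full scan over inode groups.

    groups: list of dicts mapping inode number -> list of claimed blocks.
    Returns (duplicate blocks, indexes of the groups that need a rescan).
    """
    global_bitmap = set()
    snapshots = []            # one set of claimed blocks per inode group
    duplicates = set()
    for group in groups:
        snapshot = set()
        for _ino, blocks in group.items():
            for blk in blocks:
                if blk in global_bitmap:
                    duplicates.add(blk)   # second claim found during the scan
                global_bitmap.add(blk)
                snapshot.add(blk)
        snapshots.append(snapshot)

    # The earliest snapshot containing a duplicate block identifies the
    # group that holds its first owner; only those groups are rescanned.
    rescan = set()
    for blk in duplicates:
        for index, snapshot in enumerate(snapshots):
            if blk in snapshot:
                rescan.add(index)
                break
    return duplicates, sorted(rescan)


# Three inode groups over blocks 1..7; block 2 is claimed twice.
groups = [
    {1: [1, 2], 2: [4]},      # group for snapshot1
    {3: [3, 5, 6]},           # group for snapshot2
    {4: [7, 2]},              # group for snapshot3 (re-claims block 2)
]
duplicates, rescan = scan_with_snapshots(groups)
```

The second claimant (inode 4) is already known when the duplicate is detected, so only group 0, the one behind snapshot1, has to be read again to find the first owner.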

Summary

Decouple allocation

• Improve metadata density

Disk-order scan

• Most efficient scan approach

Self-check & cross-check

• Avoid memory saturation

Fast recovery with bitmap snapshot

• One full scan + partial rescan

Checking Performance Comparison

Question 1: Will ffsck scale well?

Question 2: Can ffsck perform consistently as the FS ages?

Checking Time Comparison

Checking time (seconds) by FS size:

FS size   e2fsck   ffsck
150GB     1666     462
300GB     2554     464
450GB     3398     468
600GB     4176     471

750GB disk, 1GB memory, e2fsprogs 1.41.12, Linux 2.6.28

Initialize disk image by creating directories with small files (4KB-2MB)

Increase FS size by creating new files and appending data to files (4KB-1MB)

Ffsck checking time is determined when the file system is created
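The checking times on this slide can be turned into speedups with a few lines of arithmetic. The values are copied from the chart; pairing them with the four FS sizes in ascending order is an assumption about how the chart was laid out.

```python
sizes_gb = [150, 300, 450, 600]
e2fsck_s = [1666, 2554, 3398, 4176]   # e2fsck checking time, seconds
ffsck_s = [462, 464, 468, 471]        # ffsck checking time, seconds

# Speedup of ffsck over e2fsck at each FS size.
speedup = [round(e / f, 1) for e, f in zip(e2fsck_s, ffsck_s)]

# How much each checker slows down from the smallest to the largest FS.
e2fsck_growth = round(e2fsck_s[-1] / e2fsck_s[0], 2)
ffsck_growth = round(ffsck_s[-1] / ffsck_s[0], 2)
```

The speedup widens as the file system fills, because e2fsck's time grows about 2.5 times from 150GB to 600GB while ffsck's stays nearly flat.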

Checking Speed Comparison on Aging FS Image

Aging FS image by performing file creations, appends, truncations and deletions (750GB partition, roughly 95% utilization)

Checking speed (MB/s) by aging operations per group:

Operations/Group   e2fsck on ext3   ffsck on rext3
0                  6.23             61.89
250                5.87             62.31
500                5.43             61.17
750                5.29             60.94
1000               5.16             60.91

Optimal disk bandwidth: 102 MB/s

Much faster and more robust
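The aging results can be summarized with simple ratios. The numbers are read off the chart, assuming the five values of each series line up with 0, 250, 500, 750 and 1000 operations per group.

```python
optimal_mb_s = 102
e2fsck_mb_s = [6.23, 5.87, 5.43, 5.29, 5.16]       # e2fsck on ext3
ffsck_mb_s = [61.89, 62.31, 61.17, 60.94, 60.91]   # ffsck on rext3


def percent_drop(series):
    """Percentage lost between the first and last measurement."""
    return 100 * (series[0] - series[-1]) / series[0]


e2fsck_drop = round(percent_drop(e2fsck_mb_s), 1)
ffsck_drop = round(percent_drop(ffsck_mb_s), 1)

# Share of the optimal disk bandwidth ffsck still reaches on the most aged image.
ffsck_share = round(100 * ffsck_mb_s[-1] / optimal_mb_s, 1)
```

By this measure e2fsck loses roughly 17% of its checking speed as the image ages, ffsck loses under 2%, and ffsck stays at about 60% of the optimal disk bandwidth.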

File System Comparison

Question 1: Can rext3 compete with ext3 in sequential reads?

Question 2: What is its impact on sequential writes?

Questions 3 and 4: What about random reads and macro-benchmarks?

Sequential Read

Throughput (MB/sec) by file size:

File size   ext3   rext3
10KB        0.64   0.64
100KB       6.98   6.21
1MB         51.9   51.1
10MB        100    99.3
100MB       121    119
1GB         121    121

The disk track buffer allows rext3 to match ext3’s performance, except for small reads

8.4% penalty

Sequential Write

Throughput (MB/s) by file size:

File size   ext3   rext3
12KB        4.62   4.58
100KB       23.4   24.2
1MB         59.1   59.7
10MB        70.5   69.4
100MB       81     88.5
1GB         87.6   104

Indirect region aids ext3’s ordered journaling mechanism

rext3 gains: +9.3% at 100MB, +19% at 1GB

Random Read

27% - 43% improvement

Indirect region benefits from disk buffer

Randomly read 4KB blocks from a 2GB file

Throughput (MB/s) by read number:

Read number   ext3    rext3
128           0.329   0.47
256           0.352   0.48
512           0.398   0.505

Postmark

Time (seconds) by transaction number:

Transactions   ext3   rext3
1000           76     76
2000           165    160
4000           338    341
8000           719    710

Filebench

Workload      ext3   rext3
File Server   2.5    2.4
Web Server    3.43   3.23
Varmail       0.57   0.57

Competitive performance

Summary

Make fast repair a primary concern of FS design

FS provides direct support for the fast checker

Benefits:

• 10 times faster checking

• Big improvement for large writes and random reads

• Small penalty for small reads

Conclusion

• How to protect against corruption is well known

• FS repair is important but receives little attention

• Build the checker as an integral component rather than a peripheral addition

• ffsck is not a universal solution; other FSes may require other methods

Thanks!

Questions?

Wisconsin Institute on Software-defined Datacenters in Madison http://wisdom.cs.wisc.edu/
