the btrfs filesystem - oraclemason/presentation/btrfs-eu11.pdf · the btrfs filesystem jointly...

18
The Btrfs Filesystem Chris Mason

Upload: lynhan

Post on 15-Feb-2018

245 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

The BtrfsFilesystem

Chris Mason

Page 2: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

The Btrfs Filesystem

• Jointly developed by a number of companies

Oracle, Redhat, Fujitsu, Intel, SUSE, many others

• All data and metadata is written via copy-on-write

• CRCs maintained for all metadata and data

• Efficient writable snapshots

• Multi-device support

• Online resize and defrag

• Transparent compression

• Efficient storage for small files

• SSD optimizations and trim support

Page 3: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Btrfs Progress

• Extensive performance and stability fixes

• Significant code cleanups

• Efficient free space caching across reboots

• Delayed metadata insertion and deletion

• Background scrubbing

• New LZO compression mode

• New Snappy compression mode in development

• Batched discard (fitrim ioctl)

• Per-inode flags to control COW, compression

• Automatic file defrag option

• Focus on stability and performance

Page 4: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Logging Improvements

• Btrfs fsck log was rewriting some items over and over again

• New code from Fujitsu bumps the metadata generationnumbers inside a transaction

• Cuts down log traffic by 75%

• Will go into 3.2 merge window

Page 5: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Metadata Fragmentation

• Btrfs btree uses key ordering to group related items into thesame metadata block

• COW tends to fragment the btree over time

• Larger blocksizes lower metadata overhead and improveperformance

• Larger blocksizes provide very inexpensive btreedefragmentation

• Ex: Intel 120GB MLC drive:

4KB Random Reads – 78MB/s8KB Random Reads – 137MB/s16KB Random Reads – 186MB/s

• Code queued up for Linux 3.3 allows larger btree blocks

Page 6: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Scrub

• Btrfs CRCs allow us to verify data stored on disk

• CRC errors can be corrected by reading a good copy of theblock from another drive

• New scrubbing code scans the allocated data and metadatablocks (Arne Jansen)

• Any CRC errors are fixed during the scan if a second copyexists

• Will be extended to track and offline bad devices

• (Scrub Demo)

Page 7: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Discard/Trim

• Trim and discard notify storage that we’re done with a block

• Btrfs now supports both real-time trim and batched

• Real-time trims blocks as they are freed

• Batched trims all free space via an ioctl

Page 8: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Drive Swapping

• GSOC project

• Current raid rebuild works via the rebalance code

• Moves all extents into new locations as it rebuilds

• Drive swapping will replace an existing drive in place

• Uses extent-allocation map to limit the number of bytes read

• Can also restripe between different RAID levels

Page 9: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Efficient Backups

• Advanced btrfs send/receive tool in development (JanSchmidt)

• Transmits in a neutral format so corruptions are not duplicated

Page 10: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Embedded Systems

• Btrfs is fairly friendly to small machines• Btrfs is not quite as friendly to small disks

But this is getting better

• Btrfs works very well overall on low end flash

Page 11: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

RAID 5/6

• Initial implementation from Intel some time ago

• Merge pending completion of fsck work

• Will also add triple mirroring

• Mixed raid modes for metadata and data are included

Page 12: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

When Bad Things Happen to Good Data

• Beta filesystem recovery tool from Josef Bacik

Risk free – copies data out of the corrupt FS

• tree root history log to recover from many hardware errors

• New fsck releases on the way to repair in place

• git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git recovery-beta

• (demo)

Page 13: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Billions of Files?

• Dramatic differences in filesystem writeback patterns

• Sequential IO still matters on modern SSDs

• Btrfs COW allows flexible writeback patterns

• Ext4 and XFS tend to get stuck behind their logs

• Btrfs tends to produce more sequential writes and morerandom reads

Page 14: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

File Creation Benchmark Summary

0

20000

40000

60000

80000

100000

120000

140000

160000

180000

File

s/se

c

Btrfs SSDXFS SSDExt4 SSDBtrfsXFSExt4

• Btrfs duplicates metadataby default

2x the writes

• Btrfs stores the file namethree times

• Btrfs and XFS are CPUbound on SSD

Page 15: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

0 45 90 135 180 225 270 315 330

Time (seconds)

0

20

40

60

80

100

120

140

160

MB

/s

File CreationThroughput

BtrfsXFSExt4

Page 16: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

0 45 90 135 180 225 270 315

Time (seconds)

0

1500

3000

4500

6000

7500

9000

10500

12000

IO /

sec

IOPs

BtrfsXFSExt4

Page 17: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

IO Animations

• Ext4 is seeking between a large number of disk areas

• XFS is walking forward through a series of distinct disk areas

• Both XFS and Ext4 show heavy log activity

• Btrfs is doing sequential writes and some random reads

Page 18: The Btrfs Filesystem - Oraclemason/presentation/btrfs-eu11.pdf · The Btrfs Filesystem Jointly developed by a number of companies Oracle, Redhat, Fujitsu, Intel, SUSE, many others

Thank You!

• Chris Mason <[email protected]>

• http://btrfs.wiki.kernel.org