Local file systems update
DESCRIPTION
The slides give an overview of what is currently cooking in local Linux file systems and what has been done in the recent past. With btrfs getting stabilized, xfs gaining more traction, ext4 improvements, new storage capabilities and new file system requirements, we are in an exciting new era where it can be hard to keep track of recent development. This talk should give you a picture of where we are heading, get you familiar with the new features and capabilities, and give you an idea of how to use them correctly.
TRANSCRIPT
Local file systems update
Red Hat
Lukas Czerner
February 23, 2013
Copyright © 2013 Lukas Czerner, Red Hat. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the COPYING file.
Agenda
1 Linux file systems overview
2 Challenges we’re facing today
3 Xfs
4 Ext4
5 Btrfs
6 Questions ?
Part I
Linux kernel file systems overview
File systems in Linux kernel
Linux kernel has a number of file systems
  Cluster, network, local
  Special purpose file systems
  Virtual file systems
Close interaction with other Linux kernel subsystems
  Memory Management
  Block layer
  VFS - virtual file system switch
Optional stackable device drivers
  device mapper
  mdraid
[Figure: The Linux I/O Stack Diagram, version 0.1, 2012-03-06. Outlines the Linux I/O stack as of kernel version 3.3, from applications through the VFS, page cache, file systems, block I/O layer, I/O scheduler and SCSI layers down to the physical devices.]
Source: http://www.thomas-krenn.com/en/oss/linux-io-stack-diagram.html
Created by Werner Fischer and Georg Schönberger
License: CC-BY-SA 3.0, see http://creativecommons.org/licenses/by-sa/3.0/
Most active local file systems
File system   Commits   Developers   Active developers
Ext4              648          112                  13
Ext3              105           43                   2
Xfs               650           61                   8
Btrfs            1302          114                  21
Number of lines of code
[Figure: lines of code over time for Ext3, Ext4, Xfs and Btrfs. x-axis: date, 01/01/05 to 01/01/14; y-axis: lines of code, 10000 to 80000.]
Part II
Challenges we’re facing today
Scalability
Common hardware storage capacity increases
  You can buy a single 4TB drive for a reasonable price
  Bigger file system and file size
Common hardware computing power and parallelism increases
  More processes/threads accessing the file system
  Locking issues
I/O stack designed for high latency, low IOPS
  Problems solved in networking subsystem
Reliability
Scalability and Reliability are closely coupled problems
Being able to fix your file system
  In reasonable time
  With reasonable memory requirements
Detect errors before your application does
  Metadata checksumming
  Metadata should be self describing
  Online file system scrub
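The idea of checksummed, self-describing metadata can be illustrated with a minimal sketch. The block layout below (a magic value, the block's own number, and a CRC32) is invented for this example and is not the on-disk format of any real file system; `pack_block` and `verify_block` are hypothetical names.

```python
import struct
import zlib

# Hypothetical header layout, invented for this sketch:
# 4-byte magic, 8-byte block number, 4-byte CRC32, followed by the payload.
HEADER = struct.Struct("<4sQI")
MAGIC = b"META"


def pack_block(block_no: int, payload: bytes) -> bytes:
    """Build a block that records what it is, where it belongs, and a CRC."""
    # The CRC covers the header (with a zeroed CRC field) plus the payload.
    unsummed = HEADER.pack(MAGIC, block_no, 0) + payload
    return HEADER.pack(MAGIC, block_no, zlib.crc32(unsummed)) + payload


def verify_block(raw: bytes, expected_block_no: int) -> str:
    """Return "ok" or a short reason why the block cannot be trusted."""
    if len(raw) < HEADER.size:
        return "truncated"
    magic, block_no, crc = HEADER.unpack_from(raw)
    if magic != MAGIC:
        return "bad magic"
    if block_no != expected_block_no:
        # The data is intact but was read from (or written to) the wrong place.
        return "misplaced block"
    unsummed = HEADER.pack(magic, block_no, 0) + raw[HEADER.size:]
    if zlib.crc32(unsummed) != crc:
        return "bad checksum"
    return "ok"


if __name__ == "__main__":
    block = pack_block(7, b"inode table")
    flipped = block[:-1] + bytes([block[-1] ^ 1])  # flip one bit in the payload
    print(verify_block(block, 7))
    print(verify_block(block, 8))
    print(verify_block(flipped, 7))
```

The checksum catches corrupted contents, while the stored block number catches a block that is intact but sits in the wrong place, which a checksum alone would not notice.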
New types of storage
Non-volatile memory
  Wear levelling more-or-less solved in firmware
  Block layer has its IOPS limitations
  We can expect bigger erase blocks
Thinly provisioned storage
  Lying to users to get more from expensive storage
  Filesystems can throw away most of their locality optimizations
  Cut down performance
  Device mapper dm-thinp target
Hierarchical storage
  Hide inexpensive slow storage behind expensive fast storage
  Performance depends on working set size
  Improve performance
  Device mapper dm-cache target, bcache
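The claim that hierarchical storage performance depends on working set size can be seen in a small simulation. This is only a sketch: it assumes uniformly random reads and a plain LRU replacement policy, which is not necessarily what dm-cache or bcache use, and `hit_rate` is a hypothetical helper.

```python
import random
from collections import OrderedDict


def hit_rate(cache_blocks: int, working_set: int,
             accesses: int = 20000, seed: int = 0) -> float:
    """Fraction of reads served by the fast tier under uniform random access."""
    rng = random.Random(seed)
    cache = OrderedDict()  # block number -> None, least recently used first
    hits = 0
    for _ in range(accesses):
        block = rng.randrange(working_set)
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = None            # promote the block to the fast tier
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / accesses


if __name__ == "__main__":
    for working_set in (100, 500, 1000, 10000):
        rate = hit_rate(cache_blocks=128, working_set=working_set)
        print(f"working set {working_set:>6} blocks: hit rate {rate:.2f}")
```

As long as the working set fits into the fast tier, nearly every read is a hit; once it grows well beyond the cache size, most reads fall through to the slow device.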
Maintainability issues
More file systems with different use cases
  Multiple sets of incompatible user space applications
  Different sets of features and defaults
  Each file system has different management requirements
Requirements from different types of storage
  SSD
  Thin provisioning
  Bigger sector sizes
Deeper storage technology stack
  mdraid
  device mapper
  multipath
Having a centralized management tool is incredibly useful
Having a central source of information is a must
System Storage Manager http://storagemanager.sf.net
Part III
What’s new in xfs
Scalability improvements
Delayed logging
  Impressive improvements in metadata modification performance
  Single threaded workload still slower than ext4, but not much
  With more threads scales much better than ext4
  On-disk format change
XFS scales well up to hundreds of terabytes
  Allocation scalability
  Free space indexing
Locking optimization
Pretty much the best choice for beefy configurations with lots of storage
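One way to picture why delayed logging helps metadata-heavy workloads is a toy model in which changes are aggregated in memory and an object that is modified repeatedly is written to the journal only once per checkpoint. The classes and the workload below are invented for illustration and do not mirror the actual XFS implementation.

```python
class ImmediateLog:
    """Writes every logged change to the journal right away."""

    def __init__(self):
        self.journal_writes = 0

    def log_change(self, obj_id, state):
        self.journal_writes += 1

    def checkpoint(self):
        pass  # nothing is pending, every change has already been written


class DelayedLog:
    """Aggregates changes in memory and writes them out at checkpoint time."""

    def __init__(self):
        self.pending = {}        # object id -> latest logged state
        self.journal_writes = 0

    def log_change(self, obj_id, state):
        # A repeated change to the same object replaces the pending copy.
        self.pending[obj_id] = state

    def checkpoint(self):
        self.journal_writes += len(self.pending)
        self.pending.clear()


def create_files(log, count):
    """Each create touches the shared directory plus one new inode."""
    for n in range(count):
        log.log_change("dir", f"{n + 1} entries")
        log.log_change(f"inode-{n}", "allocated")
    log.checkpoint()
    return log.journal_writes


if __name__ == "__main__":
    print("immediate:", create_files(ImmediateLog(), 1000))
    print("delayed:  ", create_files(DelayedLog(), 1000))
```

In this model the directory is logged a thousand times but reaches the journal once, so the delayed variant writes roughly half as many object images for the same workload.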
Reliability improvements
Metadata checksumming
  CRC to detect errors
  Metadata verification as it is written to or read from disk
  On-disk format change
Future work
  Reverse mapping allocation tree
  Online transparent error correction
  Online metadata scrub
Part IV
What’s new in ext4
Scalability improvements
Based on very old architecture
  Free space tracked in bitmaps on disk
  Static metadata positions
  Limited size of allocation groups
  Limited file size (16TB)
  Advantages are resilient on-disk format and backwards and forward compatibility
Some improvements with bigalloc feature
  Group a number of blocks into clusters
  Cluster is now the smallest allocation unit
  Trade-off between performance and space utilization efficiency
Extent status tree for tracking delayed extents
  No longer need to scan page cache to find delalloc blocks
Scalability is very much limited by design, on-disk format and backwards compatibility
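The limits and the bigalloc trade-off above come down to simple arithmetic. The sketch below assumes a 4 KiB block size, 32-bit logical block numbers within a file, and one bitmap block per allocation group; `bigalloc_tradeoff` is a hypothetical helper, not part of any ext4 tool.

```python
BLOCK_SIZE = 4096  # bytes, assuming the common 4 KiB block size

# 32-bit logical block numbers within a file give the 16TB file size limit.
max_file_bytes = 2**32 * BLOCK_SIZE

# One bitmap block holds 8 * BLOCK_SIZE bits, one bit per block,
# which bounds the size of an allocation group.
blocks_per_group = 8 * BLOCK_SIZE
group_bytes = blocks_per_group * BLOCK_SIZE


def bigalloc_tradeoff(file_bytes: int, cluster_blocks: int):
    """Return (allocation units to track, bytes wasted) for one file."""
    cluster_bytes = cluster_blocks * BLOCK_SIZE
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters, clusters * cluster_bytes - file_bytes


if __name__ == "__main__":
    print("max file size (TiB):", max_file_bytes // 2**40)
    print("blocks per group:", blocks_per_group)
    print("group size (MiB):", group_bytes // 2**20)
    for cluster_blocks in (1, 16):
        print("cluster of", cluster_blocks, "blocks:")
        print("  10 KiB file:", bigalloc_tradeoff(10 * 1024, cluster_blocks))
        print("  1 GiB file: ", bigalloc_tradeoff(2**30, cluster_blocks))
```

With 16-block clusters a 1 GiB file needs 16 times fewer allocation units to track, while a 10 KiB file wastes most of its single 64 KiB cluster, which is the performance versus space efficiency trade-off named on the slide.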
Reliability improvements
Better memory utilization of user space tools
  No longer stores whole bitmaps - converted to extents
  Biggest advantage for e2fsck
Faster file system creation
  Inode table initialization postponed to kernel
  Huge time saver when creating bigger file systems
Metadata checksumming
  CRC to detect errors
  Not enabled by default
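The memory saving from converting bitmaps to extents can be sketched as a run-length conversion. This is a simplified illustration using a sorted Python list; it is not the data structure the user space tools actually use, and both function names are hypothetical.

```python
from bisect import bisect_right


def bitmap_to_extents(bits):
    """Convert a sequence of 0/1 flags into (start, length) runs of set bits."""
    extents = []
    start = None
    for i, bit in enumerate(bits):
        if bit and start is None:
            start = i                           # a new run begins
        elif not bit and start is not None:
            extents.append((start, i - start))  # the current run ends
            start = None
    if start is not None:
        extents.append((start, i + 1 - start))  # run reaches the end
    return extents


def is_set(extents, n):
    """Check whether bit n is set, using binary search over the sorted runs."""
    idx = bisect_right(extents, (n, float("inf"))) - 1
    if idx < 0:
        return False
    start, length = extents[idx]
    return start <= n < start + length


if __name__ == "__main__":
    # A mostly contiguous bitmap of one million bits collapses to two extents.
    bitmap = [1] * 600_000 + [0] * 100_000 + [1] * 300_000
    extents = bitmap_to_extents(bitmap)
    print(extents)
    print(is_set(extents, 650_000), is_set(extents, 700_000))
```

The benefit depends on how contiguous the allocations are: long runs compress to a handful of entries, while a heavily fragmented bitmap could take more memory as extents than as raw bits.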
Part V
What’s new in btrfs
Getting stabilized
Performance improvements are not where the focus is right now
Design specific performance problems
  Optimization needed in future
Still under heavy development
Not all features are yet ready or even implemented
File system stabilization takes a long time
Reliability in btrfs
Userspace tools not in very good shape
Fsck utility still not fully finished
Neither kernel nor userspace handles errors gracefully
Very good design to build on
  Metadata and data checksumming
  Back reference
  Online filesystem scrub
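How checksums and an online scrub fit together can be shown with a minimal sketch: every block is read, compared against its stored checksum, and rewritten from a good copy when one exists. The two-mirror layout and the `scrub` function are assumptions made for this example, and `zlib.crc32` stands in for whatever checksum the file system really uses.

```python
import zlib


def scrub(mirrors, checksums):
    """Verify every block on every mirror and repair bad copies where possible.

    mirrors   -- list of mirrors, each a list of block contents (bytes)
    checksums -- expected CRC32 of each block, stored separately from the data
    Returns (repaired, unrecoverable) lists of block numbers.
    """
    repaired, unrecoverable = [], []
    for block_no, expected in enumerate(checksums):
        # Look for any copy that still matches its checksum.
        good = next((m[block_no] for m in mirrors
                     if zlib.crc32(m[block_no]) == expected), None)
        if good is None:
            unrecoverable.append(block_no)
            continue
        fixed = False
        for mirror in mirrors:
            if zlib.crc32(mirror[block_no]) != expected:
                mirror[block_no] = good  # overwrite the bad copy
                fixed = True
        if fixed:
            repaired.append(block_no)
    return repaired, unrecoverable


if __name__ == "__main__":
    blocks = [b"aaaa", b"bbbb", b"cccc"]
    sums = [zlib.crc32(b) for b in blocks]
    mirror_a, mirror_b = list(blocks), list(blocks)
    mirror_a[1] = b"bbbX"                        # silent corruption on one mirror
    mirror_a[2], mirror_b[2] = b"cccX", b"cccY"  # both copies damaged
    print(scrub([mirror_a, mirror_b], sums))
    print(mirror_a[1])
```

Without redundancy a scrub can still report which blocks are damaged, but it can only repair them when a second copy passes verification.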
Resources
Linux Weekly News http://lwn.net
Kernel mailing lists http://vger.kernel.org
  linux-fsdevel
  linux-ext4
  linux-btrfs
  linux-xfs
Linux Kernel code http://kernel.org
Linux IO stack diagram http://www.thomas-krenn.com/en/oss/linuxiostack-diagram.html
The end. Thanks for listening.