file systems: why, how and where

Post on 22-Jan-2018


File Systems: Why, How and Where

Philip Derbeko, enSilo, 2017

The Tragedy of File Systems

• Scale and scalability
• Reliability
• Recovery
• Complexity
• Flexibility for developers

Challenges

1. Metadata performance
2. Reliability and recovery
3. Small files performance
4. Large files performance
5. Storage management

Components

• Block allocation
• Directory management
• File and directory operations
• Inode handling
• Transactions and journaling
• Superblock handling
• FS tree
• Other

Beginning – Sequential

Sequential File System

Header: name, length

DATA

Footer: name, CRC

UNTIL …
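A minimal sketch of the sequential layout on this slide: every file is one record made of a header (name, length), the data, and a footer (name, CRC), and a lookup scans from the start of the medium until the name matches. The byte layout, the field widths and the names `write_record` and `find_file` are assumptions made for illustration, not a real tape format.

```python
import io
import struct
import zlib


def write_record(stream, name: str, data: bytes) -> None:
    """Append one record: header (name, length), data, footer (name, CRC)."""
    raw_name = name.encode("utf-8")
    # Header: 2-byte name length, the name, then an 8-byte data length.
    stream.write(struct.pack(">H", len(raw_name)) + raw_name)
    stream.write(struct.pack(">Q", len(data)))
    stream.write(data)
    # Footer: the name again, then a CRC-32 of the data.
    stream.write(struct.pack(">H", len(raw_name)) + raw_name)
    stream.write(struct.pack(">I", zlib.crc32(data)))


def _read_name(stream):
    prefix = stream.read(2)
    if len(prefix) < 2:
        return None  # end of medium
    (size,) = struct.unpack(">H", prefix)
    return stream.read(size).decode("utf-8")


def find_file(stream, wanted: str):
    """Scan records sequentially until `wanted` is found; None if absent."""
    stream.seek(0)
    while True:
        name = _read_name(stream)
        if name is None:
            return None
        (length,) = struct.unpack(">Q", stream.read(8))
        data = stream.read(length)
        footer_name = _read_name(stream)
        (crc,) = struct.unpack(">I", stream.read(4))
        if footer_name != name or crc != zlib.crc32(data):
            raise ValueError(f"corrupt record: {name}")
        if name == wanted:
            return data


tape = io.BytesIO()
write_record(tape, "a.txt", b"hello")
write_record(tape, "b.txt", b"world!")
```

Every lookup costs a scan over all earlier records, which is acceptable on tape and is the main thing random-access file systems set out to avoid.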

Disk – The king of storage

Disk - Anatomy

You are here

Simplest possible FS – FAT
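The core idea of FAT is a table with one entry per cluster, where each entry names the next cluster of the same file. The sketch below assumes a toy in-memory table: the markers `FREE` and `EOC` and the helper names are hypothetical and do not match the on-disk FAT encoding.

```python
FREE = None  # toy marker for an unallocated cluster
EOC = -1     # toy end-of-chain marker


def cluster_chain(fat, first_cluster):
    """Follow the linked list of clusters that makes up one file."""
    chain, seen = [], set()
    cluster = first_cluster
    while cluster != EOC:
        if cluster in seen or fat[cluster] is FREE:
            raise ValueError("corrupt FAT chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def allocate(fat, count):
    """Link `count` free clusters into a chain and return the first one."""
    if count < 1:
        raise ValueError("count must be at least 1")
    free = [i for i, entry in enumerate(fat) if entry is FREE][:count]
    if len(free) < count:
        raise OSError("disk full")
    for current, following in zip(free, free[1:]):
        fat[current] = following
    fat[free[-1]] = EOC
    return free[0]


fat = [FREE] * 8
fat[2], fat[5], fat[3] = 5, 3, EOC   # one file stored in clusters 2 -> 5 -> 3
```

Reading the n-th cluster of a file means walking n links, so random access into large files gets slower as the file grows.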

ext2

How it is done

[Diagram: each open file points to a dentry, and each dentry points to an inode. The inode holds the file attributes, the direct block pointers, and pointers to indirect, double indirect and triple indirect blocks, which in turn lead to direct blocks.]
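To make the pointer tree concrete, the sketch below works out which level of the tree covers a given logical block of a file. It assumes the classic ext2 layout of 12 direct pointers followed by one indirect, one double indirect and one triple indirect pointer, with 4 KiB blocks and 4-byte block pointers as example parameters. The function names are hypothetical.

```python
N_DIRECT = 12  # direct block pointers in a classic ext2 inode


def locate_block(logical_block, block_size=4096, ptr_size=4):
    """Return (level, indexes) describing how a logical block is reached."""
    ptrs = block_size // ptr_size  # pointers that fit in one block
    if logical_block < N_DIRECT:
        return ("direct", (logical_block,))
    n = logical_block - N_DIRECT
    if n < ptrs:
        return ("indirect", (n,))
    n -= ptrs
    if n < ptrs ** 2:
        return ("double", divmod(n, ptrs))
    n -= ptrs ** 2
    if n < ptrs ** 3:
        top, rest = divmod(n, ptrs ** 2)
        return ("triple", (top, *divmod(rest, ptrs)))
    raise ValueError("block is beyond what the pointer tree can map")


def max_mappable_blocks(block_size=4096, ptr_size=4):
    """Number of blocks the pointer tree alone can address."""
    ptrs = block_size // ptr_size
    return N_DIRECT + ptrs + ptrs ** 2 + ptrs ** 3
```

Small files are reached straight from the inode, while a block deep inside a large file needs up to three extra block reads before the data itself. The figure from `max_mappable_blocks` covers only the pointer tree; other fields may impose a lower limit on real file sizes.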

Ext2 - fsck

• Unclean mount or mount counter
• Not everything can be solved
• Plan:
  • Superblock check
  • Free blocks
  • Inode sanity
  • Inode links
  • Duplicates
  • Bad blocks
  • Directory checks

SLOW!!!
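Two of the passes in the plan above (inode links, and free blocks with duplicates) can be sketched on a toy in-memory model. The dictionary layout and the function names are assumptions for illustration only, and the real fsck works on on-disk structures. The sketch also shows why the check is slow: each pass has to walk all of the metadata.

```python
from collections import Counter


def check_links(inodes, directories):
    """Compare each inode's stored link count with the directory entries.

    inodes: {ino: {"links": int, "blocks": [block, ...]}}
    directories: {dir_ino: {name: ino}}
    """
    refs = Counter(
        ino for entries in directories.values() for ino in entries.values()
    )
    problems = []
    for ino, inode in inodes.items():
        if refs[ino] != inode["links"]:
            problems.append(
                f"inode {ino}: link count {inode['links']}, found {refs[ino]}"
            )
    for ino in refs:
        if ino not in inodes:
            problems.append(f"dangling entry to inode {ino}")
    return problems


def check_blocks(inodes, bitmap):
    """Find duplicate block owners and disagreements with the block bitmap."""
    owners = Counter(b for inode in inodes.values() for b in inode["blocks"])
    problems = []
    for block, count in sorted(owners.items()):
        if count > 1:
            problems.append(f"block {block}: claimed {count} times")
        if not bitmap[block]:
            problems.append(f"block {block}: in use but marked free")
    for block, used in enumerate(bitmap):
        if used and block not in owners:
            problems.append(f"block {block}: marked used but unowned")
    return problems
```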

Other consistency options – Soft Updates

• Dependency rules:
  1. Never point to an uninitialized structure
  2. Never reuse a structure before nullifying the pointers to it
  3. Never reset an old pointer before the new one has been set
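The three rules amount to ordering constraints between disk writes. As a rough sketch, each write can be listed together with the writes that must reach the disk before it, and the flush order derived with a topological sort. The write labels and the helper `flush_order` are hypothetical, and real soft updates track such dependencies inside the buffer cache at a much finer granularity. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def flush_order(dependencies):
    """Return the writes in an order that respects every dependency.

    dependencies: {write: set of writes that must reach the disk first}
    """
    return list(TopologicalSorter(dependencies).static_order())


dependencies = {
    # Rule 1: initialize the inode before a directory entry points to it.
    "write dir entry -> inode 11": {"initialize inode 11"},
    # Rule 2: clear the pointer before the inode is reused.
    "reuse inode 12": {"clear dir entry -> inode 12"},
    # Rule 3 (rename): set the new pointer before resetting the old one.
    "clear old name of inode 13": {"write new name of inode 13"},
}

order = flush_order(dependencies)
```

If the system crashes between any two writes in such an order, the disk may leak a resource, but it never contains a pointer to garbage.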

Other consistency options – COW

ext3

1. Journaling
2. Online file system growth
3. Directory indexing (not really new: it was done for ext2 as well)

Ext3 - journaling: WriteBack

Journal: TxB | Inode | Bitmap | TxE

Data

Ext3 - journaling: Ordered

Journal: TxB | Inode | Bitmap | TxE

Data

Ext3 - journaling: Full Journal

Journal: TxB | Inode | Bitmap | Data | TxE

Data

Ext3 – Journal final comments

• Journal-assisted recovery: redo logging
• Commit batching
• Journal cleaning – mark the last checkpoint in the journal superblock
• Deletes and reuse
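Redo logging can be sketched in a few lines: recovery replays every transaction that has both a TxB and a TxE record after the last checkpoint, and discards a transaction whose TxE never reached the journal. The `Journal` class, the record tuples and the dictionary standing in for the disk are assumptions for illustration and do not reflect the ext3 on-disk journal format.

```python
TXB, TXE = "TxB", "TxE"


class Journal:
    def __init__(self):
        self.log = []        # append-only list of journal records
        self.checkpoint = 0  # index of the first record not yet checkpointed

    def commit(self, txid, updates):
        """Log a complete transaction: TxB, the block writes, then TxE."""
        self.log.append((TXB, txid))
        for block, value in updates.items():
            self.log.append(("write", txid, block, value))
        self.log.append((TXE, txid))


def recover(journal, disk):
    """Replay committed transactions onto `disk`; drop incomplete ones."""
    pending = {}
    for record in journal.log[journal.checkpoint:]:
        kind, txid = record[0], record[1]
        if kind == TXB:
            pending[txid] = []
        elif kind == "write":
            pending[txid].append(record[2:])
        elif kind == TXE:
            for block, value in pending.pop(txid):
                disk[block] = value
    # Everything replayable is now on disk, so move the checkpoint forward.
    journal.checkpoint = len(journal.log)
    return disk


journal = Journal()
journal.commit(1, {"inode 7": "I1", "bitmap": "B1"})
# Simulated crash: transaction 2 is cut off before its TxE is written.
journal.log.append((TXB, 2))
journal.log.append(("write", 2, "bitmap", "B2"))
```

Replaying is safe to repeat because each record carries the full new value of the block, so a crash during recovery only means running the same replay again.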

Ext4

• Backward and forward compatible – up to a certain point
• Scalability
• “Sequentiality” improvements:
  • Extent-based allocations
  • Journal checksum speed up
  • Delayed allocations
• Transparent encryption
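Extent-based allocation replaces one pointer per block with one record per contiguous run of blocks. The sketch below compresses a per-block list into extents and maps a logical block back to a physical one. The `Extent` tuple and the function names are hypothetical, and the real ext4 extent tree is more elaborate.

```python
from bisect import bisect_right
from typing import NamedTuple


class Extent(NamedTuple):
    logical: int   # first logical block covered
    physical: int  # first physical block on disk
    length: int    # number of contiguous blocks


def to_extents(physical_blocks):
    """Compress a per-block pointer list into runs of contiguous blocks."""
    extents = []
    for logical, physical in enumerate(physical_blocks):
        last = extents[-1] if extents else None
        if last is not None and last.physical + last.length == physical:
            extents[-1] = last._replace(length=last.length + 1)
        else:
            extents.append(Extent(logical, physical, 1))
    return extents


def lookup(extents, logical):
    """Map a logical block to its physical block via binary search."""
    i = bisect_right([e.logical for e in extents], logical) - 1
    if i >= 0:
        extent = extents[i]
        if logical < extent.logical + extent.length:
            return extent.physical + (logical - extent.logical)
    raise KeyError(logical)


blocks = [100, 101, 102, 103, 500, 501, 7]
extents = to_extents(blocks)
```

A file laid out sequentially collapses to a handful of extents, which is why extents and delayed allocation work well together: allocating late makes long contiguous runs more likely.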

Performance Optimizations

1. Synchronization of operations (the less, the better)
2. Locality of allocations
3. I/O scheduling
4. Scalability
5. Caching
6. Pre-fetching

New Sheriff in town

New Features

• Snapshotting
• Versioning
• Backups
• Deduplication
• Data and metadata checksums
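Deduplication and checksumming fit together naturally: if a block is stored under a digest of its content, identical blocks are stored once and every read can be verified. The `BlockStore` class below is a hypothetical in-memory sketch. SHA-256 is used only because it is available in the standard library and is not a statement about which checksum any particular file system uses.

```python
import hashlib


class BlockStore:
    """Toy content-addressed block store with reference counts."""

    def __init__(self):
        self.blocks = {}    # digest -> block data
        self.refcount = {}  # digest -> number of references

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data  # first copy: actually store it
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        data = self.blocks[digest]
        # Verify on every read so silent corruption is detected.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError(f"checksum mismatch for block {digest[:8]}")
        return data


store = BlockStore()
first = store.put(b"same content")
second = store.put(b"same content")  # deduplicated: no new block stored
```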

BTRFS (“Better FS”)


Newer is better?

FS Sizes

FS Patches (Linux 2.6, 5079 patches)

• Maintenance (45%)
• Bugs (35%) – constant bug fixing over the life of the FS
• Performance
• Reliability
• Features

FAST 2013 – “A Study of Linux File System Evolution”

Bug Consequences

• Corruption
• Crash
• Failure of operation
• Deadlock
• Hang
• Memory leak
• Other

FAST 2013 – “A Study of Linux File System Evolution”

38% of bugs are on failure paths


“Timeline” – facts should not mess up a story

Berkeley FFS

ext2

ext3

ReiserFS

ext4

ZFS

BTRFS

FAT

FAT32

NTFS

WinFS (dead)

ReFS

HFS

HFS+

APFS

What was not covered

• Shared, network, distributed and clustered file systems:
  • WAFL
  • AFS
  • GFS and DFS
  • WebDAV
• Volume management
• UnionFS (Knoppix CD+HDD, Docker layers)

The End

Keep in touch: philip@ensilo.com
