ZFS: The Last Word in File Systems - Is It?
Swaminathan Sundararaman, Sriram Subramanian
ZFS: Zettabyte File System
The last word in file systems
"We've rethought everything and rearchitected it," - Jeff Bonwick, Sun distinguished engineer and chief architect of ZFS. "We've thrown away 20 years of old technology that was based on assumptions no longer true today."
Our Goal
To uncover interesting policies of ZFS
Focus: how ZFS automatically chooses among multiple block sizes to match the workload
Policy and performance analysis of ZFS during synchronous workloads
Methodology
Semantic Block Analysis [Prabhakaran et al. ’05]
[Figure: system stack with labels Workload, Application, OS, File System, Pseudo Device Driver (Block Inference), Disk]
Preliminary Results
Naïve block allocation policy: does not work well for random workloads
Dynamically merges small block writes: suffers from Read-Modify-Write for some workloads
Poor ZFS Intent Log block allocation policy
Dynamically changes the block writing mechanism based on workload (under investigation)
Outline
Infrastructure
Block Classification Strategy
Policies: Block Allocation, Dynamic block resizing, ZFS Intent Log (ZIL)
Conclusion
Infrastructure
Pseudo Device Driver
Implemented a block driver using the Layered Driver Interface (LDI)
Ioctls to control collection of statistics
Issue: Solaris did not allow us to issue ioctls to pseudo block drivers
Solution: indirection. Wrote a dummy character driver and redirected the ioctl requests to our block device
Infrastructure (Contd.)
Selective classification
Log files for Offline block analysis
Negligible performance overheads
Asynchronously written to the log file
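The idea of keeping logging off the I/O path can be illustrated in user space. This is only a sketch of the general technique, not the authors' Solaris driver: the class name `AsyncBlockLog`, the record format, and the use of a Python thread are all assumptions made for illustration.

```python
import queue
import threading


class AsyncBlockLog:
    """Queue log records on the I/O path; a background thread writes them."""

    def __init__(self, path: str) -> None:
        self._records: queue.Queue = queue.Queue()
        self._file = open(path, "a", encoding="utf-8")
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def record(self, offset: int, size: int, kind: str) -> None:
        # Cheap on the hot path: only an in-memory enqueue, no disk I/O.
        self._records.put(f"{offset} {size} {kind}\n")

    def _drain(self) -> None:
        # Runs in the background thread and performs the actual file writes.
        while True:
            line = self._records.get()
            if line is None:  # sentinel: stop the writer
                break
            self._file.write(line)

    def close(self) -> None:
        # Flush everything queued so far, then stop the writer thread.
        self._records.put(None)
        self._writer.join()
        self._file.close()
```

The resulting log file can then be processed offline, which matches the split between collection and analysis described on this slide.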
Block Classification Strategy
Uber blocks: 1024-byte blocks, identified by their magic flag
Data blocks: identified by a special pattern
Pattern repeated at every 512-byte offset
Individual data blocks identified by sequentially increasing numbers
Block Classification Strategy (Contd.)
ZIL blocks
Identified by its Magic Flag
Meta-data blocks
Rest of the blocks
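The classification rules on these two slides can be sketched as a small offline classifier. This is a minimal sketch, not the authors' tool: the magic values are supplied by the caller, the marker `DATA_MARKER` and the name `classify_block` are hypothetical, and the slides do not say where in a block the magic flag sits, so the sketch assumes it is at the start.

```python
SECTOR = 512  # the slides say the data pattern repeats at every 512-byte offset

# Hypothetical marker written by the workload generator into each sector.
DATA_MARKER = b"DATABLK:"


def classify_block(block: bytes, uber_magic: bytes, zil_magic: bytes) -> str:
    """Classify one logged block as 'uber', 'zil', 'data', or 'metadata'."""
    # Assumption: the magic flag sits at the start of the block.
    if len(block) == 1024 and block.startswith(uber_magic):
        return "uber"
    if block.startswith(zil_magic):
        return "zil"
    # Data blocks: every 512-byte sector must begin with the marker.
    if block and len(block) % SECTOR == 0 and all(
        block[off:off + len(DATA_MARKER)] == DATA_MARKER
        for off in range(0, len(block), SECTOR)
    ):
        return "data"
    # Everything else is treated as metadata ("rest of the blocks").
    return "metadata"
```

Checking the marker in every sector, rather than only the first, lets the classifier recognize data blocks of any size that ZFS chooses to write.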
Sequential Write of 1GB file
Block size: 4K
ZFS caches small block writes
Large sequential 128k block writes
[Figure: Block Sizes. Y-axis: Block Size in KB (0 to 144)]
Random writes inside 4GB file
Block size: 4K
Large 128k block write for every small 4k write
[Figure: Block Sizes. Y-axis: Block Size in KB (0 to 144); x-axis: 0 to 11]
Random Writes of 4K blocks
[Figure: legend entries "Expected" and "ZFS"; axis label: Offset; x-axis: 0 to 9]
Offset (KB)   Block Size (KB)
36            40
36            40
20            40
84            88
0             88
20            88
52            88
16            88
4             88
Random Writes of 512 bytes
[Figure: y-axes: Block Size in KB and Offset in KB (0 to 160); x-axis: 0 to 8]
Offset (KB)   Block Size (KB)
0             0.5
16            16.5
64            64.5
32            32.5
150           128
128           128
127           127.5
Inference
Block Allocation
Purely based on file offsets
Block size is set to 128K for offsets >= 128k
Block size is a multiple of 512 bytes for offsets < 128k
NOT based on dynamic workload characteristics
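The inferred rule can be written down as a short function. This is a sketch of the behaviour reported in the measurements above, not ZFS source code: the name `inferred_block_size` is hypothetical, and rounding the end of the write (offset + length) up to 512 bytes is an assumption read off the reported numbers, for example a 4K write at offset 36K producing a 40K block.

```python
SECTOR = 512              # block sizes below 128K are multiples of 512 bytes
MAX_BLOCK = 128 * 1024    # block size is capped at 128K


def inferred_block_size(offset: int, length: int) -> int:
    """Block size in bytes predicted for a single write to a new file."""
    end = offset + length
    rounded = -(-end // SECTOR) * SECTOR  # round up to a multiple of 512
    return min(rounded, MAX_BLOCK)
```

Note that nothing in the function depends on the access pattern, which is the point of the slide: the size follows from the file offset alone.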
Small Sequential Writes of 4K
Workload: write a 4K block, sleep 10 sec, write the next block
[Figure: Block Size in KB (0 to 144) vs. File Size in KB (0 to 224); series: ZFS, Ideal]
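A workload of this shape (write a block, sleep, write the next block) could be generated roughly as follows. This is a sketch under assumptions: the marker format, the function names `stamped_block` and `sequential_writes`, and the choice of unbuffered appends are illustrative and not taken from the authors' harness. Each 512-byte sector carries the block's sequence number, in the spirit of the data-block pattern described earlier.

```python
import time

SECTOR = 512


def stamped_block(seq: int, size: int) -> bytes:
    """Build one data block whose every 512-byte sector carries its sequence number."""
    tag = f"DATABLK:{seq:012d}".encode()  # hypothetical marker format
    sector = tag + bytes(SECTOR - len(tag))
    return sector * (size // SECTOR)


def sequential_writes(path: str, count: int, size: int = 4096,
                      pause: float = 10.0) -> None:
    """Append `count` stamped blocks, pausing between consecutive writes."""
    # buffering=0 hands each write to the OS immediately instead of
    # letting Python batch several blocks into one larger write.
    with open(path, "ab", buffering=0) as f:
        for seq in range(count):
            f.write(stamped_block(seq, size))
            time.sleep(pause)
```

For example, `sequential_writes("/tank/testfile", count=56)` would grow a file to 224 KB in 4K steps; the path is a placeholder.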
Small Seq. Writes of 32 KBytes
[Figure: Block Size in KB (0 to 144) vs. File Size in KB (0 to 320); series: ZFS, Ideal]
Unmount after every write
[Figure: Block Size in KB (0 to 140); x-axis: 0 to 180; series: Append, Data Read from Disk]
Dynamic Resizing of Blocks
Applies while file size < 128k
Appending data to small files is inefficient
If the data is not in memory, a small append is converted to a Read-Modify-Write
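The cost of this Read-Modify-Write can be estimated with a simple model. This is a sketch under stated assumptions, not measured data: it assumes a file below 128K is stored as one block, that the whole block is read back when it is not cached, and that the whole enlarged block is rewritten. The name `append_io` is hypothetical.

```python
from typing import Tuple

SECTOR = 512
MAX_BLOCK = 128 * 1024


def round_up(n: int) -> int:
    """Round n up to the next multiple of 512 bytes."""
    return -(-n // SECTOR) * SECTOR


def append_io(file_size: int, append_len: int, cached: bool) -> Tuple[int, int]:
    """Return (bytes_read, bytes_written) for one append to a small file."""
    new_size = file_size + append_len
    if new_size > MAX_BLOCK:
        raise ValueError("sketch only covers files that stay within one 128K block")
    # Cold cache: the existing single block is read back before it is rewritten.
    bytes_read = 0 if cached else round_up(file_size)
    # The whole, larger block is written out again.
    bytes_written = round_up(new_size)
    return bytes_read, bytes_written
```

Under this model, growing a file in 4K steps with an unmount after every write reads and rewrites an ever larger block each time, so total I/O grows quadratically with the final file size rather than linearly.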
COW in ZFS
Copy-on-write design makes most disk writes sequential
Multiple block sizes, automatically chosen to match workload
ZIL Block Chaining
ZIL Block Allocation
[Figure: Block Size in KB (0 to 40); x-axis: 1 to 11; labels: 1024, 3072, 16K, 32K, 64K]
ZIL Block Allocation 33K
[Figure: Block Size in KB (0 to 40); x-axis: 1 to 12; label: 33K]
Conclusions
Block Allocation: purely based on file offsets, NOT based on dynamic workload characteristics
Dynamic Resizing of Blocks: applies while file size < 128k; appending data to small files is inefficient
ZFS Intent Log: internal fragmentation, bad block allocation policy, block chaining mechanism
Conclusion
ZFS: The last word in file systems? Might be the latest word, definitely not the last word!
Questions?