journaled file system (jfs) for linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · journaled...

36
Journaled File System (JFS) for Linux LinuxWorld Conference, New York 1/23/2003 Steve Best Steve Best [email protected] [email protected] Linux Technology Center - JFS for Linux Linux Technology Center - JFS for Linux http://oss.software.ibm.com/jfs http://oss.software.ibm.com/jfs IBM Austin IBM Austin

Upload: buinhan

Post on 23-May-2018

233 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Journaled File System (JFS) for LinuxLinuxWorld Conference, New York

1/23/2003

Steve BestSteve [email protected]@us.ibm.com

Linux Technology Center - JFS for LinuxLinux Technology Center - JFS for Linuxhttp://oss.software.ibm.com/jfshttp://oss.software.ibm.com/jfs

IBM AustinIBM Austin

Page 2: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Overview of Talk

Features of JFSWhy log/journalPerformance

JFS projectGPL LicensedSource of the portGoal to run on all architectures

(x86, PowerPC 32 & 64, S/390, ARM)Goal to get into kernel.org source 2.4.x & 2.5.xNew features being added

Other Journaling File SystemsExt3, ReiserFS, XFS

Page 3: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Virtual and Filesystem

Syscall

Application

LibC

Virtual File System (VFS)

NFSext2 JFS proc SMB

Blockdev Kernel Network

Page 4: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Journal File Systems

Ext3Compatible with Ext2Both meta-data & user data journalingBlock type journaling

ReiserFS New file layoutBalanced treesBlock type journaling

XFSPorted from IRIXTransaction type journaling

Page 5: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

JFS Team members

IBM:Barry Arndt ([email protected])Steve Best ([email protected]) Dave Kleikamp ([email protected])

Community:Christoph Hellwig ([email protected])....others

Page 6: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

The problem is that FS must update multiple structures during logical operation.

Using logical write file operation exampleit takes multiple media I/Os to accomplishif the crash happens between these I/Os the FS isn't in consistent state

Non-journaled FS have to examine all of the file system's meta-data using fsckJournaled file systems uses atomic transactions to keep track of meta-data changes.

replay log by applying log records for appropriate transactions

Why journal?

Page 7: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

JFS Port

Proven Journaling FS technology (10+ years in AIX)New "ground-up" scalable design started in 1995

Design goals: Performance, Robustness, SMPTeam members from original JFS

Designed/Developed this File System

JFS for Linux OS2 parent source baseOS/2 compatible option

Where has the source base shipped?OS/2 Warp Server for e-business 4/99OS/2 Warp Client (fixpack 10/00)AIX 5L called JFS2 4/01

Page 8: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Scalable 64-bit file system:File size max 4 PB w/ 4k block sizeMax aggregate 4 PB w/4k block size

Note: above values are limited by Linux I/O structures not being 64-bit in size.

2.4 LimitsSigned 32 bit 2^31 limit 1 TB max.2 TB limit is the max.

2.5 Limits16 TB limit caused by page cache as of 2.5.58

JFS Features

Page 9: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Journaling of meta-data onlyRestarts after crash immediatelyDesign included journaling from the startExtensive use of B+tree's throughout JFSExtent-based allocationUnicode (UTF16)Built to scale. In memory and on-disk data structures are designed to scale without practical limits.Designed to operate on SMP hardware, with code optimized for at least an 4-way SMP machine

JFS Features

Page 10: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Performance: An extent is a sequence of contiguous aggregate blocks allocated to JFS object.

JFS uses 24-bit value for the length of an extentExtent range in size from 1 to 2(24) -1 blocksMaximum extent is 512 * 2(24)-1 bytes (~8G)Maximum extent is 4k * 2(24)-1 bytes (~64G)

Note: these limits only apply to single extent; in no way limit the overall file size.

Extent-based addressing structuresProduces compact, efficient mapping logical offsets within files to physical addresses on diskB+tree populated with extent descriptors

JFS Features

Page 11: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Performance: B+tree use is extensive throughout JFS

File layout (inode containing the root of a B+tree which describes the extents containing user data)Reading and writing extentsTraversalDirectory entries sorted by nameDirectory Slot free list

JFS Features

Page 12: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Variable block size Block sizes 512*, 1024*, 2048*, 4096

Dynamic disk inode allocationAllocate/free disk inodes as required

Decouples disk inodes from fixed disk locations

Directory organization (methods)1st method stores up to 8 entries directly into directory's inode (used for small directories)2nd method B+tree keyed on name (used for larger directories)

JFS Features

Page 13: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Allocation GroupsPartitions the File System into regionsPrimary purpose of AGs is provide scalability & parallelism within the FS

JFS Features

Page 14: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Support for Sparse and Dense filesSparse files reduce blocks written to diskDense files disk allocation covers the complete file size

Capability to increase the file system size LVM or EVMS and then remount the FS

LVM -> Logical Volume Managerhttp://www.sistina.com/products_lvm_download.htm

EVMS -> Enterprise Volume Management Systemhttp://sourceforge.net/projects/evms/

Support on-line re-sizing (1.0.21)mount -o remount,resize /mount_point

JFS Features

Page 15: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Support for SnapshotUse LVM or EVMS

Setup the volume to use as the snapshotStop the File System operations (VFS operation)Take the snapshotRestart the File System operations (VFS operation)Mount the snapshot volumeCreate your backup using the snapshot volumeRemove the snapshot volume

JFS Features

Page 16: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Support for Extended Attributes (EA)Arbitrary name/value pairs that are associated with files or directoriesEA can be stored directly in the inode

Support for Access Control Lists (ACLs)Support more fine-grained permissionsStore ACLs as Extended Attributes

Extended Attributes and ACLshttp://acl.bestbits.at/

JFS Features

Page 17: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Journaling Basics

Reserve log spaceReserve log spaceReserve log spaceReserve log spaceAllocate transaction block, lock modify metadataAllocate transaction block, lock modify metadataAllocate transaction block, lock modify metadataAllocate transaction block, lock modify metadata

Metadata Buffers

On Disk LogStart End

Page 18: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Transaction CommitTransaction CommitTransaction CommitTransaction CommitCopy modified metadata into in memory log buffersCopy modified metadata into in memory log buffersCopy modified metadata into in memory log buffersCopy modified metadata into in memory log buffersPin buffers in memory and unlockPin buffers in memory and unlockPin buffers in memory and unlockPin buffers in memory and unlockTransaction is completeTransaction is completeTransaction is completeTransaction is complete

Journaling Basics

Metadata Buffers

On Disk LogStart End

In mem log buffers

Page 19: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Journaling Basics

Write in memory log out to log deviceTriggered by:

log buffer fullsynchronous transaction (O_SYNC write)sync activity

Metadata Buffers

On Disk LogStart End

In memory log buffers

Page 20: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Dirty metadata

disk space

Write metadata out to the diskTriggered by:

Flush activity Memory pressure log space pressure

Journaling Basics

Page 21: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Metadata write completes Removes metadata locks

Journaling Basics

Dirty metadata

disk space

Page 22: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Only meta-data changes:File creation (create)Linking (link)Making directory (mkdir)Making node (mknod)Removing file (unlink)Symbolic link (symlink)Set EA Truncate regular file Growing a file

What operations are logged

Page 23: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Brief explanation of the create transaction flow: tid = txBegin(dip->i_sb, 0);

tblk = tid_to_tblock(tid);tblk->xflag |= COMMIT_CREATE;tblk->ip = ip;iplist[0] = dip;iplist[1] = ip;

/* work is done to create file */

rc = txCommit(tid, 2, &iplist[0], 0);txEnd(tid);

Logging create example

Page 24: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Layout of Log

Circular link list of transaction "block"in memorywritten to disk

location of log is found by superblockLog file

create by mkfs.jfs (internal or external)Internal log sizedefault 0.4% of the aggregate sizemaximum size 32M (internal log)15G -> defaults 8192 aggregate blocksExternal log sizemaximum size 128M

Page 25: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Where is JFS today?Announced & Shipped 2/2/2000 at LinuxWorld NYC

What has been completed64 code drops so farJFS patch files to support multi-levels of the kernel

(2.4.3-2.4.x) kernel patch & utility patch fileCompletely independent of any kernel changes (easy

integration path) Release 1.0.0 (production) 6/2001Accepted by Alan Cox 2.4.18pre9-ac4 (2/14/02)Accepted by Linus for 2.5.6-pre2 (2/28/02)Accepted by Marcelo Tosatti 2-4.20-pre4(8/20/02)Release 1.1.1 12/17/2002

Page 26: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

JFS for Linux

Utility area:

jfs_mkfs -> Formatjfs_fsck -> Check and repair file system

- Replays the logjfs_defrag * -> Defragmentation of file system jfs_tune -> Configuration of the FSjfs_debugfs -> Peek and change JFS on-disk structuresjfs_logdump -> Service-only dumps contents of log filejfs_fscklog -> Service-only extract/display log from fsck

Page 27: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Distros

Distributions shipping JFSTurbolinux 7.0 Workstation (8/01) was 1stMandrake Linux 8.1, 8.2, 9.0SuSE Linux 7.3 , 8.0, 8.1, SLES 8.0Red Hat 7.3, 8.0Slackware 8.1United Linux 1.0others......

Page 28: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

JFS WIP

Near term:Performance improvements in FSAdding support for external log to be shared

by more than one FSAdding defragmentation of FSMount option for backup programs to restore

without journaling

Longer term:QuotaData Management API (DMAPI)

Page 29: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Performance improvements

Summary DataThreads JFS EXT3 JFS+Patch

1 14.04 12.24 14.212 0.57 13.29 14.814 1.25 14.32 15.446 1.36 13.37 15.66

Note: Data is throughput in MB/sec.10 % improvement over Ext3

tiobench showed sequential write problem

Page 30: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Performance improvements

1 2 4 6Number of threads

0

5

10

15

20

Thro

ughp

ut in

MB/

sec

JFS EXT3 JFS+dbAllocate patch

Effect of dbAllocate Patch on JFS Performancetiobench - Sequential Write

4-way 500 MHz, SCSI, Kernel 2.4.20-pre6

problem was solved:keeping current allocation group for current open filefixed 1.0.23 release

Page 31: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Journaling File Systems

ReiserFS 2.4.1

Ext3 2.4.15

JFS 2.5.6; 2.4.20

XFS 2.5.36; external patch for 2.4.x

www.kernel.org source tree

Page 32: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

File System & File Sizes

ReiserFS Ext3 XFS JFS

Max. files 4G 4G 4G 4GSubdirs/dir 65K 32K 4G 65KMax. filesize

16TB* 2TB 16TB* 16TB*

Max. FS size

16TB* 16TB 16TB* 16TB*

Notes:Block device limit in 2.4 was 2TBBlock device limit in 2.5 has been raised* Issue as of 2.5.48 is page cache has limit 16TB

Filesystems limits on 32-bit architectures

Page 33: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Journaling File Systems

Ext3 patcheson sourceforge as the ext3 module in the "gkernel" project

http://www.zipworld.com.au/~akpm/linux/ext3/

ReiserFS web pagehttp://www.namesys.com

XFS web pagehttp://oss.sgi.com/projects/xfs/

JFS web pagehttp://oss.software.ibm.com/jfs

Page 34: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Journaling File Systems Articles

"Journaled Filesystem" by Steve Best, David Gordon, and Ibrahim Haddad, Linux Journal January 2003

"Journaling File System" by Steve Best, Linux Magazine 10/2002http://www.linux-mag.com/2002-10/jfs_01.html

"Journaling Filesystems" by Moshe Bar, Linux Magazine 8/2000http://www.linux-mag.com/2000-08/journaling_01.html

"Journal File Systems" by Juan I. Santos Florido, Linux Gazette 7/2000http://www.linuxgazette.com/issue55/florido.html

"Journaling File Systems For Linux" by Moshe Bar, BYTE.com 5/2000http://www.byte.com/documents/s=365/byt20000524s0001/

Page 35: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

JFS Project urls

JFS Web pagehttp://oss.software.ibm.com/jfs

JFS Overview white paperhttp://www-106.ibm.com/developerworks/library/l-jfs.html

JFS Layout white paperhttp://www-106.ibm.com/developerworks/library/l-jfslayout/

JFS Log white paperhttp://www.usenix.org/publications/library/proceedings/als2000/best.html

JFS Mailing listhttp://oss.software.ibm.com/pipermail/jfs-discussion/

Page 36: Journaled File System (JFS) for Linuxjfs.sourceforge.net/project/pub/jfslwe0103.pdf · Journaled File System (JFS) for Linux LinuxWorld Conference, New York ... AIX 5L called JFS2

Questions..........