why we need ext4


Upload: haorobin-dong

Post on 16-Apr-2017


Why we need ext4?

Robin Dong

ext2 global layout

Image from: http://learn.akae.cn/media/ch29s02.html

Super Block (1 block)

GDT (multi blocks)

Block Bitmap (1 block)

Inode Bitmap (1 block)

Inode table (multi blocks)

The super block and GDT are vital, so other groups store copies of them. If the filesystem is created with sparse_super (the default), not every group holds a copy of the super block and GDT: only groups 0 and 1 and the powers of 3, 5, and 7 (3, 5, 7, 3^2, 5^2, 7^2, 3^3, 5^3, 7^3, ...) have one.
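The sparse_super rule can be checked with a few lines of Python. This is only a sketch of the rule as stated above, not code from e2fsprogs or the kernel; the function names are made up for illustration.

```python
def is_power_of(n, base):
    """Return True if n == base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def has_superblock_backup(group):
    """Hypothetical helper: does this group hold a super block/GDT copy
    under sparse_super (groups 0, 1 and powers of 3, 5, 7)?"""
    if group in (0, 1):
        return True
    return any(is_power_of(group, base) for base in (3, 5, 7))


# List the groups below 50 that carry a copy.
backup_groups = [g for g in range(50) if has_superblock_backup(g)]
print(backup_groups)
```

For a filesystem with 50 groups this lists groups 0, 1, 3, 5, 7, 9, 25, 27, and 49.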

There is a structure called the Reserved GDT, placed after the GDT and before the block bitmap; it is also a large file. It is used by the resize feature, which can expand the size of the whole filesystem.

ext2 file layout

Image from: http://e2fsprogs.sourceforge.net/ext2intro.html

ext2 directory layout

An ext2 directory is laid out just like a regular file, but the content of its data blocks is a sequence of struct ext2_dir_entry records.

Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html

ext2_dir_entry records are variable-length, so when a user looks up a file in a directory, ext2 has to compare file names one by one (it cannot use an algorithm such as binary search). If a directory holds a large number of files, lookups become inefficient.
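The linear scan can be sketched in Python. The record layout assumed here is the ext2_dir_entry_2 variant (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file_type, then the name, padded to 4 bytes); pack_entry and find_entry are illustrative helpers, not real ext2 functions.

```python
import struct

HEADER = "<IHBB"  # inode, rec_len, name_len, file_type (little-endian)


def pack_entry(inode, name, file_type=1):
    """Build one directory record, padded to a multiple of 4 bytes."""
    rec_len = (8 + len(name) + 3) & ~3
    header = struct.pack(HEADER, inode, rec_len, len(name), file_type)
    return header + name.ljust(rec_len - 8, b"\0")


def find_entry(block, wanted):
    """Walk the records one by one; return the inode number or None."""
    offset = 0
    while offset < len(block):
        inode, rec_len, name_len, _ = struct.unpack_from(HEADER, block, offset)
        if rec_len < 8:  # corrupt record, stop instead of looping forever
            break
        name = block[offset + 8:offset + 8 + name_len]
        if inode != 0 and name == wanted:  # inode 0 marks an unused record
            return inode
        offset += rec_len  # rec_len is the only way to reach the next record
    return None


block = pack_entry(2, b".") + pack_entry(2, b"..") + pack_entry(12, b"hello.txt")
print(find_entry(block, b"hello.txt"))
```

Because each record's position depends on the rec_len of the one before it, there is no way to jump to the middle of the block, which is why the cost grows with the number of entries.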

ext2 directory remove

Image from: http://blog.csdn.net/anghlq/archive/2011/05/17/6427052.aspx

ext2 directory pack

e2fsck -D

Optimize directories in filesystem. This option causes e2fsck to try to optimize all directories, either by reindexing them if the filesystem supports directory indexing, or by sorting and compressing directories for smaller directories, or for filesystems using traditional linear directories.

ext2 symlink

Regular symlink: the link path is stored in a data block

Fast symlink: the link path is stored in the inode itself (if the link path is shorter than 60 bytes, the size of the inode's i_block array)

Image from: http://www.pluto.it/files/journal/pj9811/e2fs.html

ext2 hard link

ext2 xattr

+------------------+
| header           |
| entry 1          | |
| entry 2          | | growing downwards
| entry 3          | v
| four null bytes  |
| . . .            |
| value 1          | ^
| value 3          | | growing upwards
| value 2          | |
+------------------+

ext2: badblock

e2fsck uses the badblocks program to detect bad blocks and marks them as used in the block bitmap.

If metadata lies in a bad block, e2fsck will try to allocate a new block for it.

Enhancements in ext3

Journal: ext3 can be seen as an ext2 filesystem with a journal file

dir_index: more efficient directory lookups

ext3: journal

An ext2 filesystem may be corrupted after an unclean reboot, such as a sudden power reset.

The journal keeps the filesystem consistent, or lets it be recovered at system boot.

Journal modes:

Writeback

Ordered

Journal

ext3: dir_index

Compute the hash value of the file name in the ext3_dir_entry

Find the dx_entry matching that hash value in the root block by binary search

Scan the leaf block entry by entry to find the ext3_dir_entry
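The three steps above can be sketched as a toy single-level index in Python. This is a simplified model, not the ext3 on-disk format: CRC32 stands in for the real ext3 name hash, leaves are plain lists, and hash collisions that straddle two leaves are ignored. All names here are hypothetical.

```python
import bisect
import zlib


def name_hash(name):
    # Stand-in only: ext3 uses its own name hashes, not CRC32.
    return zlib.crc32(name)


def build_index(names, leaf_size=4):
    """Sort names by hash and cut them into leaves; each index entry
    records the lowest hash stored in its leaf."""
    ordered = sorted(names, key=name_hash)
    leaves = [ordered[i:i + leaf_size] for i in range(0, len(ordered), leaf_size)]
    dx_hashes = [name_hash(leaf[0]) for leaf in leaves]
    if dx_hashes:
        dx_hashes[0] = 0  # the first leaf covers every hash from 0 upwards
    return dx_hashes, leaves


def lookup(name, dx_hashes, leaves):
    if not leaves:
        return False
    h = name_hash(name)                                # step 1: hash the name
    leaf = leaves[bisect.bisect_right(dx_hashes, h) - 1]  # step 2: binary search
    return any(entry == name for entry in leaf)        # step 3: linear scan


names = [b"file%02d" % i for i in range(20)]
dx_hashes, leaves = build_index(names)
print(lookup(b"file07", dx_hashes, leaves), lookup(b"absent", dx_hashes, leaves))
```

Only the final leaf is scanned linearly, so the cost of a lookup no longer grows with the total number of files in the directory.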

Advantage: dir_index has at most two levels of index, so finding a file in a directory needs to read at most 3 blocks. Imagine an ext3 filesystem with a 4K block size: a directory could contain about 5 million files (with 100-byte file names).

Disadvantage: when files are added to a directory the b-tree splits, but after files are deleted the b-tree does not merge. A directory that holds only a few files may therefore still occupy many blocks.
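The "about 5 million files" figure can be roughly reproduced. The sketch below assumes 8-byte index entries (hash plus block number), an 8-byte directory entry header, and ignores the small headers in index blocks; the half-full case is an assumption about how full leaves are after splits, not something stated on the slide.

```python
BLOCK_SIZE = 4096
DX_ENTRY_SIZE = 8    # assumed: 4-byte hash + 4-byte block number
DIRENT_HEADER = 8    # inode, rec_len, name_len, file_type
NAME_LEN = 100

# Two index levels, index block headers ignored for simplicity.
index_entries_per_block = BLOCK_SIZE // DX_ENTRY_SIZE
leaf_blocks = index_entries_per_block ** 2

# Each directory entry is padded to a multiple of 4 bytes.
rec_len = (DIRENT_HEADER + NAME_LEN + 3) & ~3
entries_per_leaf = BLOCK_SIZE // rec_len

max_entries = leaf_blocks * entries_per_leaf
half_full = max_entries // 2

print(max_entries, half_full)
```

With completely full leaves the upper bound is a little under 10 million entries; with leaves half full it is a little under 5 million, which is in line with the estimate above.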

ext3 xattr

Store xattrs inside the inode.

Less I/O, since no separate xattr block has to be read

mkfs.ext3 -I 256 /dev/sda

limits of ext2/ext3

Block size        Max file size   Max filesystem size
1KB               16GB            2TB
2KB               256GB           8TB
4KB               2TB             16TB
8KB (ppc arch)    2TB             32TB

Reading file data through indirect blocks costs extra I/O.
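The "Max file size" column can be derived from the indirect-block scheme. This sketch assumes 4-byte block pointers, 12 direct pointers plus single, double, and triple indirect blocks, and a 32-bit per-inode counter of 512-byte sectors that caps a file at 2TB; the function name is made up for illustration.

```python
def ext2_max_file_size(block_size):
    """Rough upper bound on file size in bytes for a given block size."""
    ptrs = block_size // 4                        # pointers per indirect block
    blocks = 12 + ptrs + ptrs ** 2 + ptrs ** 3    # direct + 3 indirect levels
    limit_by_block_map = blocks * block_size
    limit_by_sector_count = 2 ** 32 * 512         # 32-bit count of 512-byte sectors
    return min(limit_by_block_map, limit_by_sector_count)


for size in (1024, 2048, 4096, 8192):
    print(size, ext2_max_file_size(size) // 2 ** 30, "GB")
```

For 1KB and 2KB blocks the block map is the limit (about 16GB and 256GB); for 4KB and 8KB blocks the sector counter is, which is why both rows show 2TB.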

ext4

ext4 inherits all the features of ext2/ext3

Larger filesystem

Max file size: 16TB

Max filesystem size: 1EB (1048576TB)

ext4: meta_bg

Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/

Group Descriptor size is 64 bytes

Imagine an ext4 filesystem with block_size = 1K: 1K/64 = 16, so a meta group contains 16 groups. The meta-GDT (1 block) is stored in the first, second, and last group of each meta group:

Group 0, Group 1, Group 15

Group 16, Group 17, Group 31

Group 32, Group 33, Group 47

...
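The placement rule can be written down directly. This is a small sketch under the slide's assumptions (1K blocks, 64-byte descriptors) and the rule that descriptor copies live in the first, second, and last group of each meta group; meta_bg_descriptor_groups is a hypothetical helper name.

```python
def meta_bg_descriptor_groups(meta_group, block_size=1024, desc_size=64):
    """Return the three groups that hold the descriptor block of a meta group."""
    groups_per_meta = block_size // desc_size   # descriptors that fit in one block
    first = meta_group * groups_per_meta
    return (first, first + 1, first + groups_per_meta - 1)


placements = [meta_bg_descriptor_groups(i) for i in range(3)]
print(placements)
```

With a 4K block size the same rule gives 64 groups per meta group, so the copies for the first meta group would land in groups 0, 1, and 63.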

ext4: flex_bg

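Move the block bitmaps, inode bitmaps, and inode tables of several groups into Group 0

The positions of the super block and GDT still follow the sparse_super rule

Advantage: Groups 1, 2, and 3 no longer spend space on this metadata, which leaves larger contiguous data areas (especially useful for ext4 extents)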

ext4: uninit_bg

mkfs.ext4 -O uninit_bg

Create a filesystem without initializing all of the block groups. This feature also enables checksums and highest-inode-used statistics in each blockgroup. This feature can speed up filesystem creation time noticeably (if lazy_itable_init is enabled), and can also reduce e2fsck time dramatically.

When is a block group initialized?

When lazy_itable_init runs

ext4_new_inode

ext4_read_block_bitmap

ext4: extent

Image from: http://www.ibm.com/developerworks/cn/linux/l-cn-filesrc5/

A single ext4_extent can map up to 128MB of contiguous space (with 4KB blocks).

Example: a 300GB file on ext3 occupies about 300MB of metadata blocks, but on ext4 it occupies only 36KB.
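The 300MB and 36KB figures can be reproduced with a short calculation. The sketch assumes 4KB blocks, 4-byte block pointers on ext3, and for ext4 a 12-byte header and 12-byte entries in each extent tree block, with room for 4 entries in the inode itself. It also assumes the best case where every extent is a full 128MB; the function names are illustrative.

```python
import math

BLOCK_SIZE = 4096
GIB = 2 ** 30


def ext3_pointer_bytes(file_size):
    """Bytes of 4-byte block pointers needed to map every data block."""
    return (file_size // BLOCK_SIZE) * 4


def ext4_extent_tree_bytes(file_size, extent_bytes=128 * 2 ** 20):
    """Bytes of extent tree blocks outside the inode, best case."""
    entries = math.ceil(file_size / extent_bytes)   # number of extents
    per_block = (BLOCK_SIZE - 12) // 12             # entries per tree block
    root_entries = 4                                # entries that fit in the inode
    tree_blocks = 0
    while entries > root_entries:
        entries = math.ceil(entries / per_block)    # blocks needed at this level
        tree_blocks += entries
    return tree_blocks * BLOCK_SIZE


print(ext3_pointer_bytes(300 * GIB) // 2 ** 20, "MB")
print(ext4_extent_tree_bytes(300 * GIB) // 1024, "KB")
```

A 300GB file needs 2400 full-size extents, which fit in 8 leaf blocks plus 1 index block, hence 9 blocks of 4KB. A fragmented file would need more extents and a larger tree.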

ext4: delayed allocation

Block allocation is delayed until the data is actually about to be written to disk

This improves performance and reduces fragmentation by improving block allocation decisions based on the actual file size

Q & A

Thanks!