Commit 93e3270c authored by Jose R. Santos, committed by Theodore Ts'o

ext4: Documentation updates.



Some of the information in Documentation/filesystems/ext4.txt is out
of date and in need of an update.

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
parent 5f21b0e6
+62 −44
@@ -13,72 +13,89 @@ Mailing list: linux-ext4@vger.kernel.org
1. Quick usage instructions:
===========================

  - Grab updated e2fsprogs from
    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
    This is a patchset on top of e2fsprogs-1.39, which can be found at
  - Compile and install the latest version of e2fsprogs (as of this
    writing version 1.41) from:

    http://sourceforge.net/project/showfiles.php?group_id=2406
	
	or

    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

  - It's still mke2fs -j /dev/hda1
	or grab the latest git repository from:

    git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Create a new filesystem using the ext4dev filesystem type:

    	# mke2fs -t ext4dev /dev/hda1

  - mount /dev/hda1 /wherever -t ext4dev
    Or configure an existing ext3 filesystem to support extents and set
    the test_fs flag to indicate that it's ok for an in-development
    filesystem to touch this filesystem:

  - To enable extents,
	# tune2fs -O extents -E test_fs /dev/hda1

	mount /dev/hda1 /wherever -t ext4dev -o extents
    If the filesystem was created with 128 byte inodes, it can be
    converted to use 256 byte for greater efficiency via:

  - The filesystem is compatible with the ext3 driver until you add a file
    which has extents (ie: `mount -o extents', then create a file).
        # tune2fs -I 256 /dev/hda1

    NOTE: The "extents" mount flag is temporary.  It will soon go away and
    extents will be enabled by the "-o extents" flag to mke2fs or tune2fs
    (Note: we currently do not have tools to convert an ext4dev
    filesystem back to ext3; so please do not try this on production
    filesystems.)

  - Mounting:

	# mount -t ext4dev /dev/hda1 /wherever

  - When comparing performance with other filesystems, remember that
    ext3/4 by default offers higher data integrity guarantees than most.  So
    when comparing with a metadata-only journalling filesystem, use `mount -o
    data=writeback'.  And you might as well use `mount -o nobh' too along
    with it.  Making the journal larger than the mke2fs default often helps
    performance with metadata-intensive workloads.
    ext3/4 by default offers higher data integrity guarantees than most.
    So when comparing with a metadata-only journalling filesystem, such
    as ext3, use `mount -o data=writeback'.  And you might as well use
    `mount -o nobh' too along with it.  Making the journal larger than
    the mke2fs default often helps performance with metadata-intensive
    workloads.
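
The quick-usage steps above can be summarised as a dry-run sketch.  It
only prints the commands quoted in the text and does not run them.  The
helper function names are hypothetical, DEV and MNT default to the
example values from the text, and combining data=writeback and nobh in a
single -o list is an assumption about how one would apply both options
at once.

```shell
#!/bin/sh
# Dry-run sketch: prints the quick-usage commands instead of running them.
# The function names are hypothetical helpers, not part of e2fsprogs.

DEV="${DEV:-/dev/hda1}"     # example device from the text
MNT="${MNT:-/wherever}"     # example mount point from the text

# Path 1: create a new filesystem of type ext4dev, then mount it.
fresh_fs_cmds() {
    printf 'mke2fs -t ext4dev %s\n' "$DEV"
    printf 'mount -t ext4dev %s %s\n' "$DEV" "$MNT"
}

# Path 2: convert an existing ext3 filesystem, then mount it.
# The -I 256 step applies only if the filesystem has 128 byte inodes.
convert_ext3_cmds() {
    printf 'tune2fs -O extents -E test_fs %s\n' "$DEV"
    printf 'tune2fs -I 256 %s\n' "$DEV"
    printf 'mount -t ext4dev %s %s\n' "$DEV" "$MNT"
}

# Mount line for comparisons with metadata-only journalling filesystems.
# Assumption: both options are passed together in one -o list.
benchmark_mount_cmd() {
    printf 'mount -t ext4dev -o data=writeback,nobh %s %s\n' "$DEV" "$MNT"
}

fresh_fs_cmds
convert_ext3_cmds
benchmark_mount_cmd
```

Because there is no tool to convert an ext4dev filesystem back to ext3,
printing the commands first gives a chance to check the device name
before anything is changed.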

2. Features
===========

2.1 Currently available

* ability to use filesystems > 16TB
* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in the face of on-disk corruption due to magics,
  internal redundancy in tree

2.1 Previously available, soon to be enabled by default by "mkfs.ext4":

* dir_index and resize inode will be on by default
* large inodes will be used by default for fast EAs, nsec timestamps, etc
* improved file allocation (multi-block alloc, delayed alloc)
* fix 32000 subdirectory limit
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* Inode allocation using large virtual block groups via flex_bg

2.2 Candidate features for future inclusion

There are several under discussion, whether they all make it in is
partly a function of how much time everyone has to work on them:
* Online defrag (patches available but not well tested)
* reduced mke2fs time via lazy itable initialization in conjunction with
  the uninit_bg feature (capability to do this is available in e2fsprogs
  but a kernel thread to do lazy zeroing of unused inode table blocks
  after filesystem is first mounted is required for safety)

* improved file allocation (multi-block alloc, delayed alloc; basically done)
* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
* nsec timestamps for mtime, atime, ctime, create time (patch exists,
  needs some e2fsck work)
* inode version field on disk (NFSv4, Lustre; prototype exists)
* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
* journal checksumming for robustness, performance (prototype exists)
* persistent file preallocation (e.g. for streaming media, databases)
There are several others under discussion; whether they all make it in is
partly a function of how much time everyone has to work on them. Features like
metadata checksumming have been discussed and planned for a bit, but no patches
exist yet, so I'm not sure they're in the near-term roadmap.

Features like metadata checksumming have been discussed and planned for
a bit but no patches exist yet so I'm not sure they're in the near-term
roadmap.
The big performance win will come with mballoc, delalloc and flex_bg
grouping of bitmaps and inode tables.  Some test results available here:

The big performance win will come with mballoc and delalloc.  CFS has
been using mballoc for a few years already with Lustre, and IBM + Bull
did a lot of benchmarking on it.  The reason it isn't in the first set of
patches is partly a manageability issue, and partly because it doesn't
directly affect the on-disk format (outside of much better allocation)
so it isn't critical to get into the first round of changes.  I believe
Alex is working on a new set of patches right now.
 - http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
 - http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html

3. Options
==========
@@ -224,7 +241,7 @@ stripe=n Number of filesystem blocks that mballoc will try
			disks *  RAID chunk size in file system blocks.
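
The visible part of the stripe=n description gives the value as disks *
RAID chunk size in file system blocks.  A minimal arithmetic sketch of
that product follows.  The helper name stripe_blocks, its argument order
and the example array geometry are all hypothetical, and because the
hunk context is truncated, treating "disks" as the number of
data-bearing disks is an assumption.

```shell
#!/bin/sh
# stripe_blocks DISKS CHUNK_KIB BLOCK_BYTES
# Prints disks * (RAID chunk size expressed in filesystem blocks).
# Hypothetical helper; "disks" is assumed to mean data-bearing disks.
stripe_blocks() {
    disks=$1
    chunk_kib=$2
    block_bytes=$3
    # Convert the chunk size from KiB to filesystem blocks.
    chunk_blocks=$(( chunk_kib * 1024 / block_bytes ))
    echo $(( disks * chunk_blocks ))
}

# Hypothetical array: 4 disks, 64 KiB chunks, 4096-byte blocks.
stripe_blocks 4 64 4096    # prints 64
```

With those example numbers the chunk is 16 filesystem blocks, so the
result would be passed as stripe=64.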

Data Mode
---------
=========
There are 3 different data modes:

* writeback mode
@@ -256,7 +273,8 @@ kernel source: <file:fs/ext4/>
		<file:fs/jbd2/>

programs:	http://e2fsprogs.sourceforge.net/
		http://ext2resize.sourceforge.net

useful links:	http://fedoraproject.org/wiki/ext3-devel
		http://www.bullopensource.org/ext4/
		http://ext4.wiki.kernel.org/index.php/Main_Page
		http://fedoraproject.org/wiki/Features/Ext4