Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 (10f3e23f)

Documentation/conf.py

+1 −1

Original line number	Diff line number	Diff line
		@@ -34,7 +34,7 @@ needs_sphinx = '1.3'
		# Add any Sphinx extension module names here, as strings. They can be
		# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
		# ones.
		extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure']
		extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']

		# The name of the math extension changed on Sphinx 1.4
		if major == 1 and minor > 3:

Documentation/filesystems/ext4.txt→Documentation/filesystems/ext4/ext4.rst

+64 −78

		.. SPDX-License-Identifier: GPL-2.0

		Ext4 Filesystem
		===============
		========================
		General Information
		========================

		Ext4 is an advanced level of the ext3 filesystem which incorporates
		scalability and reliability enhancements for supporting large filesystems
		@@ -11,31 +13,24 @@ Mailing list: linux-ext4@vger.kernel.org
		Web site: http://ext4.wiki.kernel.org


		1. Quick usage instructions:
		===========================
		Quick usage instructions
		========================

		Note: More extensive information for getting started with ext4 can be
		found at the ext4 wiki site at the URL:
		http://ext4.wiki.kernel.org/index.php/Ext4_Howto

		- Compile and install the latest version of e2fsprogs (as of this
		writing version 1.41.3) from:
		- The latest version of e2fsprogs can be found at:

		http://sourceforge.net/project/showfiles.php?group_id=2406
		https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

		or

		https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
		http://sourceforge.net/project/showfiles.php?group_id=2406

		or grab the latest git repository from:

		git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

		- Note that it is highly important to install the mke2fs.conf file
		that comes with the e2fsprogs 1.41.x sources in /etc/mke2fs.conf. If
		you have edited the /etc/mke2fs.conf file installed on your system,
		you will need to merge your changes with the version from e2fsprogs
		1.41.x.
		https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

		- Create a new filesystem using the ext4 filesystem type:

		@@ -50,10 +45,6 @@ Note: More extensive information for getting started with ext4 can be

		# tune2fs -I 256 /dev/hda1

		(Note: we currently do not have tools to convert an ext4
		filesystem back to ext3; so please do not do try this on production
		filesystems.)

		- Mounting:

		# mount -t ext4 /dev/hda1 /wherever
		@@ -75,10 +66,11 @@ Note: More extensive information for getting started with ext4 can be
		the filesystem with a large journal can also be helpful for
		metadata-intensive workloads.

		2. Features
		===========
		Features
		========

		2.1 Currently available
		Currently Available
		-------------------

		* ability to use filesystems > 16TB (e2fsprogs support not available yet)
		* extent format reduces metadata overhead (RAM, IO for access, transactions)
		@@ -103,31 +95,15 @@ Note: More extensive information for getting started with ext4 can be
		[1] Filesystems with a block size of 1k may see a limit imposed by the
		directory hash tree having a maximum depth of two.

		2.2 Candidate features for future inclusion

		* online defrag (patches available but not well tested)
		* reduced mke2fs time via lazy itable initialization in conjunction with
		the uninit_bg feature (capability to do this is available in e2fsprogs
		but a kernel thread to do lazy zeroing of unused inode table blocks
		after filesystem is first mounted is required for safety)

		There are several others under discussion, whether they all make it in is
		partly a function of how much time everyone has to work on them. Features like
		metadata checksumming have been discussed and planned for a bit but no patches
		exist yet so I'm not sure they're in the near-term roadmap.

		The big performance win will come with mballoc, delalloc and flex_bg
		grouping of bitmaps and inode tables. Some test results available here:

		- http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-write-2.6.27-rc1.html
		- http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-readwrite-2.6.27-rc1.html

		3. Options
		==========
		Options
		=======

		When mounting an ext4 filesystem, the following options are accepted:
		(*) == default

		======================= =======================================================
		Mount Option Description
		======================= =======================================================
		ro Mount filesystem read only. Note that ext4 will
		replay the journal (and thus write to the
		partition) even when mounted "read only". The
		@@ -387,12 +363,14 @@ i_version Enable 64-bit inode version support. This option is
		dax Use direct access (no page cache). See
		Documentation/filesystems/dax.txt. Note that
		this option is incompatible with data=journal.
		======================= =======================================================

		Data Mode
		=========
		There are 3 different data modes:

		* writeback mode

		In data=writeback mode, ext4 does not journal data at all. This mode provides
		a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
		mode - metadata journaling. A crash+recovery can cause incorrect data to
		@@ -400,20 +378,23 @@ appear in files which were written shortly before the crash. This mode will
		typically provide the best ext4 performance.

		* ordered mode

		In data=ordered mode, ext4 only officially journals metadata, but it logically
		groups metadata information related to data changes with the data blocks into a
		single unit called a transaction. When it's time to write the new metadata
		out to disk, the associated data blocks are written first. In general,
		this mode performs slightly slower than writeback but significantly faster than journal mode.
		groups metadata information related to data changes with the data blocks into
		a single unit called a transaction. When it's time to write the new metadata
		out to disk, the associated data blocks are written first. In general, this
		mode performs slightly slower than writeback but significantly faster than
		journal mode.

		* journal mode

		data=journal mode provides full data and metadata journaling. All new data is
		written to the journal first, and then to its final location.
		In the event of a crash, the journal can be replayed, bringing both data and
		metadata into a consistent state. This mode is the slowest except when data
		needs to be read from and written to disk at the same time where it
		outperforms all others modes. Enabling this mode will disable delayed
		allocation and O_DIRECT support.
		written to the journal first, and then to its final location. In the event of
		a crash, the journal can be replayed, bringing both data and metadata into a
		consistent state. This mode is the slowest except when data needs to be read
		from and written to disk at the same time, where it outperforms all other
		modes. Enabling this mode will disable delayed allocation and O_DIRECT
		support.
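
As a rough illustration of how the three data modes are selected, the sketch below picks the ``data=`` value out of a comma-separated mount option string. The function name ``ext4_data_mode`` is hypothetical, and the fallback to ``ordered`` when no ``data=`` option is present is an assumption based on ordered being the usual default, which this excerpt does not show.

```python
# Hypothetical helper: not part of the kernel or e2fsprogs.
VALID_DATA_MODES = ("writeback", "ordered", "journal")


def ext4_data_mode(mount_options: str, default: str = "ordered") -> str:
    """Return the data= mode named in a comma-separated mount option string.

    Falls back to `default` (assumed to be "ordered") when no data= option
    is present, and rejects values other than the three documented modes.
    """
    for option in mount_options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "data":
            if value not in VALID_DATA_MODES:
                raise ValueError(f"unknown ext4 data mode: {value!r}")
            return value
    return default
```

For example, ``ext4_data_mode("rw,relatime,data=writeback")`` returns ``"writeback"``, which corresponds to mounting with ``-o data=writeback``.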

		/proc entries
		=============
		@@ -425,10 +406,12 @@ Information about mounted ext4 file systems can be found in
		in the table below.

		Files in /proc/fs/ext4/<devname>
		..............................................................................

		================ =======
		File Content
		================ =======
		mb_groups details of multiblock allocator buddy cache of free blocks
		..............................................................................
		================ =======

		/sys entries
		============
		@@ -439,11 +422,13 @@ Information about mounted ext4 file systems can be found in
		/sys/fs/ext4/dm-0). The files in each per-device directory are shown
		in the table below.

		Files in /sys/fs/ext4/<devname>
		Files in /sys/fs/ext4/<devname>:

		(see also Documentation/ABI/testing/sysfs-fs-ext4)
		..............................................................................
		File Content

		============================= =================================================
		File Content
		============================= =================================================
		delayed_allocation_blocks This file is read-only and shows the number of
		blocks that are dirty in the page cache, but
		which do not have their location in the
		@@ -508,7 +493,7 @@ Files in /sys/fs/ext4/<devname>
		in the file system. If there is not enough space
		for the reserved space when mounting the file
		system, the mount will _not_ fail.
		..............................................................................
		============================= =================================================
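
The per-device files listed in the table are plain text, so they can be read like any other file. The following is a minimal sketch, assuming a helper named ``read_ext4_attr`` (hypothetical) and a ``root`` parameter added only so the function can be pointed at a test directory instead of the real ``/sys/fs/ext4``.

```python
from pathlib import Path


def read_ext4_attr(devname: str, attr: str, root: str = "/sys/fs/ext4") -> str:
    """Read one per-device ext4 sysfs file and return its text, stripped.

    `devname` is the per-device directory (for example "dm-0") and `attr`
    is a file name from the table, such as "delayed_allocation_blocks".
    """
    return (Path(root) / devname / attr).read_text().strip()
```

On a system with an ext4 filesystem on ``dm-0``, ``int(read_ext4_attr("dm-0", "delayed_allocation_blocks"))`` would give the count of dirty page-cache blocks that have not yet been assigned a location.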

		Ioctls
		======
		@@ -518,8 +503,10 @@ through the system call interfaces. The list of all Ext4 specific ioctls are
		shown in the table below.

		Table of Ext4 specific ioctls
		..............................................................................

		============================= =================================================
		Ioctl Description
		============================= =================================================
		EXT4_IOC_GETFLAGS Get additional attributes associated with inode.
		The ioctl argument is an integer bitfield, with
		bit values described in ext4.h. This ioctl is an
		@@ -610,8 +597,7 @@ Table of Ext4 specific ioctls
		normal user by accident.
		The data blocks of the previous boot loader
		will be associated with the given inode.

		..............................................................................
		============================= =================================================

		References
		==========

Documentation/filesystems/ext4/index.rst

0 → 100644

+17 −0

		.. SPDX-License-Identifier: GPL-2.0

		===============
		ext4 Filesystem
		===============

		General usage and on-disk artifacts written by ext4. More documentation may
		be ported from the wiki as time permits. This should be considered the
		canonical source of information as the details here have been reviewed by
		the ext4 community.

		.. toctree::
		:maxdepth: 5
		:numbered:

		ext4
		ondisk/index

Documentation/filesystems/ext4/ondisk/about.rst

0 → 100644

+44 −0

		.. SPDX-License-Identifier: GPL-2.0

		About this Book
		===============

		This document attempts to describe the on-disk format for ext4
		filesystems. The same general ideas should apply to ext2/3 filesystems
		as well, though they do not support all the features that ext4 supports,
		and the fields will be shorter.

		NOTE: This is a work in progress, based on notes that the author
		(djwong) made while picking apart a filesystem by hand. The data
		structure definitions should be current as of Linux 4.18 and
		e2fsprogs-1.44. All comments and corrections are welcome, since there is
		undoubtedly plenty of lore that might not be reflected in freshly
		created demonstration filesystems.

		License
		-------
		This book is licensed under the terms of the GNU General Public License, v2.

		Terminology
		-----------

		ext4 divides a storage device into an array of logical blocks both to
		reduce bookkeeping overhead and to increase throughput by forcing larger
		transfer sizes. Generally, the block size will be 4KiB (the same size as
		pages on x86 and the block layer's default block size), though the
		actual size is calculated as 2 ^ (10 + ``sb.s_log_block_size``) bytes.
		Throughout this document, disk locations are given in terms of these
		logical blocks, not raw LBAs, and not 1024-byte blocks. For the sake of
		convenience, the logical block size will be referred to as
		``$block_size`` throughout the rest of the document.
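
The block size formula above can be sketched as a small function. The name ``block_size_bytes`` is chosen here for illustration; only the field ``sb.s_log_block_size`` and the formula come from the text.

```python
def block_size_bytes(s_log_block_size: int) -> int:
    """Return the logical block size: 2 ^ (10 + s_log_block_size) bytes."""
    if s_log_block_size < 0:
        raise ValueError("s_log_block_size must be non-negative")
    # Shifting 1 left by n is the same as computing 2 ** n.
    return 1 << (10 + s_log_block_size)
```

A superblock value of 0 gives 1024-byte blocks, and a value of 2 gives the common 4KiB block size.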

		When referenced in ``preformatted text`` blocks, ``sb`` refers to fields
		in the super block, and ``inode`` refers to fields in an inode table
		entry.

		Other References
		----------------

		Also see http://www.nongnu.org/ext2-doc/ for quite a collection of
		information about ext2/3. Here's another old reference:
		http://wiki.osdev.org/Ext2

Documentation/filesystems/ext4/ondisk/allocators.rst

0 → 100644

+56 −0

		.. SPDX-License-Identifier: GPL-2.0

		Block and Inode Allocation Policy
		---------------------------------

		ext4 recognizes (better than ext3, anyway) that data locality is
		generally a desirable quality of a filesystem. On a spinning disk,
		keeping related blocks near each other reduces the amount of movement
		that the head actuator and disk must perform to access a data block,
		thus speeding up disk IO. On an SSD there of course are no moving parts,
		but locality can increase the size of each transfer request while
		reducing the total number of requests. This locality may also have the
		effect of concentrating writes on a single erase block, which can speed
		up file rewrites significantly. Therefore, it is useful to reduce
		fragmentation whenever possible.

		The first tool that ext4 uses to combat fragmentation is the multi-block
		allocator. When a file is first created, the block allocator
		speculatively allocates 8KiB of disk space to the file on the assumption
		that the space will get written soon. When the file is closed, the
		unused speculative allocations are of course freed, but if the
		speculation is correct (typically the case for full writes of small
		files) then the file data gets written out in a single multi-block
		extent. A second related trick that ext4 uses is delayed allocation.
		Under this scheme, when a file needs more blocks to absorb file writes,
		the filesystem defers deciding the exact placement on the disk until all
		the dirty buffers are being written out to disk. By not committing to a
		particular placement until it's absolutely necessary (the commit timeout
		is hit, or sync() is called, or the kernel runs out of memory), the hope
		is that the filesystem can make better location decisions.

		The third trick that ext4 (and ext3) uses is that it tries to keep a
		file's data blocks in the same block group as its inode. This cuts down
		on the seek penalty when the filesystem first has to read a file's inode
		to learn where the file's data blocks live and then seek over to the
		file's data blocks to begin I/O operations.

		The fourth trick is that all the inodes in a directory are placed in the
		same block group as the directory, when feasible. The working assumption
		here is that all the files in a directory might be related, therefore it
		is useful to try to keep them all together.

		The fifth trick is that the disk volume is cut up into 128MB block
		groups; these mini-containers are used as outlined above to try to
		maintain data locality. However, there is a deliberate quirk -- when a
		directory is created in the root directory, the inode allocator scans
		the block groups and puts that directory into the least heavily loaded
		block group that it can find. This encourages directories to spread out
		over a disk; as the top-level directory/file blobs fill up one block
		group, the allocators simply move on to the next block group. Allegedly
		this scheme evens out the loading on the block groups, though the author
		suspects that the directories which are so unlucky as to land towards
		the end of a spinning drive get a raw deal performance-wise.
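
The 128MB block group size mentioned above can be reproduced with a little arithmetic. This sketch assumes, beyond what the text states, that a group's block bitmap occupies exactly one block, so a group covers 8 * ``$block_size`` blocks. The function names are illustrative only.

```python
def blocks_per_group(block_size: int) -> int:
    """Blocks tracked by a one-block bitmap: one bit per block."""
    return 8 * block_size


def group_size_bytes(block_size: int) -> int:
    """Total bytes covered by one block group under the same assumption."""
    return blocks_per_group(block_size) * block_size
```

With 4KiB blocks this gives 32768 blocks per group and 134217728 bytes, which is 128MiB. Under the same assumption, smaller block sizes give much smaller groups (8MiB for 1KiB blocks).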

		Of course if all of these mechanisms fail, one can always use e4defrag
		to defragment files.