Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 (fd048088)

Documentation/filesystems/ext4.txt

+10 −4

Original line number	Diff line number	Diff line
		@@ -32,9 +32,9 @@ Mailing list: linux-ext4@vger.kernel.org
		you will need to merge your changes with the version from e2fsprogs
		1.41.x.

		- Create a new filesystem using the ext4dev filesystem type:
		- Create a new filesystem using the ext4 filesystem type:

		# mke2fs -t ext4dev /dev/hda1
		# mke2fs -t ext4 /dev/hda1

		Or configure an existing ext3 filesystem to support extents and set
		the test_fs flag to indicate that it's ok for an in-development
		@@ -47,13 +47,13 @@ Mailing list: linux-ext4@vger.kernel.org

		# tune2fs -I 256 /dev/hda1

		(Note: we currently do not have tools to convert an ext4dev
		(Note: we currently do not have tools to convert an ext4
		filesystem back to ext3; so please do not try this on production
		filesystems.)

		- Mounting:

		# mount -t ext4dev /dev/hda1 /wherever
		# mount -t ext4 /dev/hda1 /wherever

		- When comparing performance with other filesystems, remember that
		ext3/4 by default offers higher data integrity guarantees than most.
		@@ -177,6 +177,11 @@ barrier=<0|1(*)> This enables/disables the use of write barriers in
		your disks are battery-backed in one way or another,
		disabling barriers may safely improve performance.

		inode_readahead=n This tuning parameter controls the maximum
		number of inode table blocks that ext4's inode
		table readahead algorithm will pre-read into
		the buffer cache. The default value is 32 blocks.

		orlov (*) This enables the new Orlov block allocator. It is
		enabled by default.

		@@ -252,6 +257,7 @@ stripe=n Number of filesystem blocks that mballoc will try
		delalloc (*) Deferring block allocation until write-out time.
		nodelalloc Disable delayed allocation. Blocks are allocated
		when data is copied from user to page cache.

		Data Mode
		=========
		There are 3 different data modes:

Documentation/filesystems/fiemap.txt

0 → 100644

+228 −0

		============
		Fiemap Ioctl
		============

		The fiemap ioctl is an efficient method for userspace to get file
		extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
		returns a list of extents.


		Request Basics
		--------------

		A fiemap request is encoded within struct fiemap:

		struct fiemap {
		__u64 fm_start; /* logical offset (inclusive) at
		* which to start mapping (in) */
		__u64 fm_length; /* logical length of mapping which
		* userspace cares about (in) */
		__u32 fm_flags; /* FIEMAP_FLAG_* flags for request (in/out) */
		__u32 fm_mapped_extents; /* number of extents that were
		* mapped (out) */
		__u32 fm_extent_count; /* size of fm_extents array (in) */
		__u32 fm_reserved;
		struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
		};


		fm_start and fm_length specify the logical range within the file
		which the process would like mappings for. Extents returned mirror
		those on disk - that is, the logical offset of the 1st returned extent
		may start before fm_start, and the range covered by the last returned
		extent may end after fm_start + fm_length. All offsets and lengths are
		in bytes.

		Certain flags to modify the way in which mappings are looked up can be
		set in fm_flags. If the kernel doesn't understand some particular
		flags, it will return EBADR and the contents of fm_flags will contain
		the set of flags which caused the error. If the kernel is compatible
		with all flags passed, the contents of fm_flags will be unmodified.
		It is up to userspace to determine whether rejection of a particular
		flag is fatal to its operation. This scheme is intended to allow the
		fiemap interface to grow in the future but without losing
		compatibility with old software.

		fm_extent_count specifies the number of elements in the fm_extents[] array
		that can be used to return extents. If fm_extent_count is zero, then the
		fm_extents[] array is ignored (no extents will be returned), and the
		fm_mapped_extents count will hold the number of extents needed in
		fm_extents[] to hold the file's current mapping. Note that there is
		nothing to prevent the file from changing between calls to FIEMAP.
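The counting mode described above can be sketched from userspace as follows. The ioctl request name FS_IOC_FIEMAP (from <linux/fs.h>) and the FIEMAP_MAX_OFFSET constant (from <linux/fiemap.h>) are not named in this text and are assumed to be provided by the installed kernel headers; count_file_extents() is a hypothetical helper name.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP (assumed from kernel headers) */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_MAX_OFFSET */

/*
 * Ask how many extents currently map the whole file open on fd.
 * With fm_extent_count == 0 no extents are copied out; only
 * fm_mapped_extents is filled in.
 *
 * Returns the extent count, or -1 with errno set if the ioctl fails
 * (for example because the filesystem does not implement fiemap).
 */
static long count_file_extents(int fd)
{
    struct fiemap fm;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;  /* cover the whole file */
    fm.fm_extent_count = 0;            /* count only */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        return -1;
    return (long)fm.fm_mapped_extents;
}
```

Since the file can change between calls, a caller that sizes a buffer from this count should still be prepared for the second call to need more room.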

		The following flags can be set in fm_flags:

		* FIEMAP_FLAG_SYNC
		If this flag is set, the kernel will sync the file before mapping extents.

		* FIEMAP_FLAG_XATTR
		If this flag is set, the extents returned will describe the inode's
		extended attribute lookup tree, instead of its data tree.


		Extent Mapping
		--------------

		Extent information is returned within the embedded fm_extents array
		which userspace must allocate along with the fiemap structure. The
		number of elements in the fm_extents[] array should be passed via
		fm_extent_count. The number of extents mapped by the kernel will be
		returned via fm_mapped_extents. If the number of fiemap_extent structures
		allocated is less than would be required to map the requested range,
		the maximum number of extents that can be mapped in the fm_extents[]
		array will be returned and fm_mapped_extents will be equal to
		fm_extent_count. In that case, the last extent in the array will not
		complete the requested range and will not have the FIEMAP_EXTENT_LAST
		flag set (see the next section on extent flags).
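When the array fills up before the range is complete, the caller can restart the request just past the last extent it received and stop once FIEMAP_EXTENT_LAST is seen. A hedged sketch of such a loop follows; the batch size, the callback type and the name walk_extents() are all illustrative assumptions, as is the FS_IOC_FIEMAP request name taken from <linux/fs.h>.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP (assumed from kernel headers) */
#include <linux/fiemap.h>

#define WALK_BATCH 32      /* hypothetical number of extents per ioctl */

/* Hypothetical callback, invoked once for every mapped extent. */
typedef void (*extent_cb)(const struct fiemap_extent *fe, void *arg);

/*
 * Visit every extent of the file open on fd, WALK_BATCH at a time.
 * Returns the number of extents visited, or -1 on failure.
 */
static long walk_extents(int fd, extent_cb cb, void *arg)
{
    size_t size = sizeof(struct fiemap) +
                  WALK_BATCH * sizeof(struct fiemap_extent);
    struct fiemap *fm = malloc(size);
    uint64_t next = 0;     /* logical offset at which to resume */
    long total = 0;
    int done = 0;

    if (fm == NULL)
        return -1;

    while (!done) {
        memset(fm, 0, size);
        fm->fm_start = next;
        fm->fm_length = FIEMAP_MAX_OFFSET - next;
        fm->fm_extent_count = WALK_BATCH;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            free(fm);
            return -1;
        }
        if (fm->fm_mapped_extents == 0)
            break;         /* nothing mapped at or after 'next' */

        for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
            const struct fiemap_extent *fe = &fm->fm_extents[i];

            if (cb != NULL)
                cb(fe, arg);
            total++;
            next = fe->fe_logical + fe->fe_length;
            if (fe->fe_flags & FIEMAP_EXTENT_LAST)
                done = 1;
        }
    }

    free(fm);
    return total;
}
```

The result is only a snapshot: extents may be added or removed between two ioctl calls, so the walk is not atomic.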

		Each extent is described by a single fiemap_extent structure as
		returned in fm_extents.

		struct fiemap_extent {
		__u64 fe_logical; /* logical offset in bytes for the start of
		* the extent */
		__u64 fe_physical; /* physical offset in bytes for the start
		* of the extent */
		__u64 fe_length; /* length in bytes for the extent */
		__u64 fe_reserved64[2];
		__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
		__u32 fe_reserved[3];
		};

		All offsets and lengths are in bytes and mirror those on disk. It is valid
		for an extent's logical offset to start before the request or its logical
		length to extend past the request. Unless FIEMAP_EXTENT_NOT_ALIGNED is
		returned, fe_logical, fe_physical, and fe_length will be aligned to the
		block size of the file system. With the exception of extents flagged as
		FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.

		The fe_flags field contains flags which describe the extent returned.
		A special flag, FIEMAP_EXTENT_LAST is always set on the last extent in
		the file so that the process making fiemap calls can determine when no
		more extents are available, without having to call the ioctl again.

		Some flags are intentionally vague and will always be set in the
		presence of other more specific flags. This way a program looking for
		a general property does not have to know all existing and future flags
		which imply that property.

		For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL
		are set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking
		for inline or tail-packed data can key on the specific flag. Software
		which simply cares not to try operating on non-aligned extents
		however, can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to
		worry about all present and future flags which might imply unaligned
		data. Note that the opposite is not true - it would be valid for
		FIEMAP_EXTENT_NOT_ALIGNED to appear alone.

		* FIEMAP_EXTENT_LAST
		This is the last extent in the file. A mapping attempt past this
		extent will return nothing.

		* FIEMAP_EXTENT_UNKNOWN
		The location of this extent is currently unknown. This may indicate
		the data is stored on an inaccessible volume or that no storage has
		been allocated for the file yet.

		* FIEMAP_EXTENT_DELALLOC
		- This will also set FIEMAP_EXTENT_UNKNOWN.
		Delayed allocation - while there is data for this extent, its
		physical location has not been allocated yet.

		* FIEMAP_EXTENT_ENCODED
		This extent does not consist of plain filesystem blocks but is
		encoded (e.g. encrypted or compressed). Reading the data in this
		extent via I/O to the block device will have undefined results.

		Note that it is always undefined to try to update the data
		in-place by writing to the indicated location without the
		assistance of the filesystem, or to access the data using the
		information returned by the FIEMAP interface while the filesystem
		is mounted. In other words, user applications may only read the
		extent data via I/O to the block device while the filesystem is
		unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
		clear; user applications must not try reading or writing to the
		filesystem via the block device under any other circumstances.

		* FIEMAP_EXTENT_DATA_ENCRYPTED
		- This will also set FIEMAP_EXTENT_ENCODED
		The data in this extent has been encrypted by the file system.

		* FIEMAP_EXTENT_NOT_ALIGNED
		Extent offsets and length are not guaranteed to be block aligned.

		* FIEMAP_EXTENT_DATA_INLINE
		This will also set FIEMAP_EXTENT_NOT_ALIGNED
		Data is located within a meta data block.

		* FIEMAP_EXTENT_DATA_TAIL
		This will also set FIEMAP_EXTENT_NOT_ALIGNED
		Data is packed into a block with data from other files.

		* FIEMAP_EXTENT_UNWRITTEN
		Unwritten extent - the extent is allocated but its data has not been
		initialized. This indicates the extent's data will be all zero if read
		through the filesystem but the contents are undefined if read directly from
		the device.

		* FIEMAP_EXTENT_MERGED
		This will be set when a file does not support extents, i.e., it uses a block
		based addressing scheme. Since returning an extent for each block back to
		userspace would be highly inefficient, the kernel will try to merge most
		adjacent blocks into 'extents'.


		VFS -> File System Implementation
		---------------------------------

		File systems wishing to support fiemap must implement a ->fiemap callback on
		their inode_operations structure. The fs ->fiemap call is responsible for
		defining its set of supported fiemap flags, and calling a helper function on
		each discovered extent:

		struct inode_operations {
		...

		int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
		u64 len);

		->fiemap is passed struct fiemap_extent_info which describes the
		fiemap request:

		struct fiemap_extent_info {
		unsigned int fi_flags; /* Flags as passed from user */
		unsigned int fi_extents_mapped; /* Number of mapped extents */
		unsigned int fi_extents_max; /* Size of fiemap_extent array */
		struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
		};

		It is intended that the file system should not need to access any of this
		structure directly.


		Flag checking should be done at the beginning of the ->fiemap callback via the
		fiemap_check_flags() helper:

		int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);

		The struct fieinfo should be passed in as received from ioctl_fiemap(). The
		set of fiemap flags which the fs understands should be passed via fs_flags. If
		fiemap_check_flags finds invalid user flags, it will place the bad values in
		fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR from
		fiemap_check_flags(), it should immediately exit, returning that error back to
		ioctl_fiemap().


		For each extent in the request range, the file system should call
		the helper function, fiemap_fill_next_extent():

		int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
		u64 phys, u64 len, u32 flags, u32 dev);

		fiemap_fill_next_extent() will use the passed values to populate the
		next free extent in the fm_extents array. 'General' extent flags will
		automatically be set from specific flags on behalf of the calling file
		system so that the userspace API is not broken.

		fiemap_fill_next_extent() returns 0 on success, and 1 when the
		user-supplied fm_extents array is full. If an error is encountered
		while copying the extent to user memory, -EFAULT will be returned.
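To make the helper's behaviour concrete, here is a small userspace model of it: general flags are derived from specific ones, the extent is stored in the next free slot, and 1 is returned once the array is full. It is a sketch of the documented semantics only. struct model_extent_buf and both function names are hypothetical, the dev argument is omitted, there is no copy to user memory (so no -EFAULT path), and returning 1 as soon as the last slot is filled is an assumption about when "full" is reported.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/fiemap.h>

/* Hypothetical stand-in for the array tracked by fiemap_extent_info. */
struct model_extent_buf {
    struct fiemap_extent *extents;  /* caller-provided array */
    unsigned int mapped;            /* slots used so far */
    unsigned int max;               /* total slots available */
};

/* Set the general flags implied by the specific flags listed earlier. */
static uint32_t add_general_flags(uint32_t flags)
{
    if (flags & FIEMAP_EXTENT_DELALLOC)
        flags |= FIEMAP_EXTENT_UNKNOWN;
    if (flags & FIEMAP_EXTENT_DATA_ENCRYPTED)
        flags |= FIEMAP_EXTENT_ENCODED;
    if (flags & (FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_DATA_TAIL))
        flags |= FIEMAP_EXTENT_NOT_ALIGNED;
    return flags;
}

/*
 * Store one extent in the next free slot.
 * Returns 0 if more extents can still be stored, 1 if the array is
 * (now, or already) full.
 */
static int model_fill_next_extent(struct model_extent_buf *buf,
                                  uint64_t logical, uint64_t phys,
                                  uint64_t len, uint32_t flags)
{
    assert(buf != NULL);

    if (buf->mapped >= buf->max)
        return 1;

    struct fiemap_extent *fe = &buf->extents[buf->mapped++];

    memset(fe, 0, sizeof(*fe));
    fe->fe_logical = logical;
    fe->fe_physical = phys;
    fe->fe_length = len;
    fe->fe_flags = add_general_flags(flags);

    return buf->mapped == buf->max ? 1 : 0;
}
```

In this model a file system loop would stop walking its extent tree as soon as the helper returns a non-zero value.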

Documentation/filesystems/proc.txt

+36 −37

		@@ -923,45 +923,44 @@ CPUs.
		The "procs_blocked" line gives the number of processes currently blocked,
		waiting for I/O to complete.


		1.9 Ext4 file system parameters
		------------------------------
		Ext4 file system have one directory per partition under /proc/fs/ext4/
		# ls /proc/fs/ext4/hdc/
		group_prealloc max_to_scan mb_groups mb_history min_to_scan order2_req
		stats stream_req

		mb_groups:
		This file gives the details of multiblock allocator buddy cache of free blocks

		mb_history:
		Multiblock allocation history.

		stats:
		This file indicate whether the multiblock allocator should start collecting
		statistics. The statistics are shown during unmount

		group_prealloc:
		The multiblock allocator normalize the block allocation request to
		group_prealloc filesystem blocks if we don't have strip value set.
		The stripe value can be specified at mount time or during mke2fs.

		max_to_scan:
		How long multiblock allocator can look for a best extent (in found extents)

		min_to_scan:
		How long multiblock allocator must look for a best extent

		order2_req:
		Multiblock allocator use 2^N search using buddies only for requests greater
		than or equal to order2_req. The request size is specfied in file system
		blocks. A value of 2 indicate only if the requests are greater than or equal
		to 4 blocks.

		stream_req:
		Files smaller than stream_req are served by the stream allocator, whose
		purpose is to pack requests as close each to other as possible to
		produce smooth I/O traffic. Avalue of 16 indicate that file smaller than 16
		filesystem block size will use group based preallocation.

		Information about mounted ext4 file systems can be found in
		/proc/fs/ext4. Each mounted filesystem will have a directory in
		/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
		/proc/fs/ext4/dm-0). The files in each per-device directory are shown
		in Table 1-10, below.

		Table 1-10: Files in /proc/fs/ext4/<devname>
		..............................................................................
		File Content
		mb_groups details of multiblock allocator buddy cache of free blocks
		mb_history multiblock allocation history
		stats controls whether the multiblock allocator should start
		collecting statistics, which are shown during the unmount
		group_prealloc the multiblock allocator will round up allocation
		requests to a multiple of this tuning parameter if the
		stripe size is not set in the ext4 superblock
		max_to_scan The maximum number of extents the multiblock allocator
		will search to find the best extent
		min_to_scan The minimum number of extents the multiblock allocator
		will search to find the best extent
		order2_req Tuning parameter which controls the minimum size for
		requests (as a power of 2) where the buddy cache is
		used
		stream_req Files which have fewer blocks than this tunable
		parameter will have their blocks allocated out of a
		block group specific preallocation pool, so that small
		files are packed closely together. Each large file
		will have its blocks allocated out of its own unique
		preallocation pool.
		inode_readahead Tuning parameter which controls the maximum number of
		inode table blocks that ext4's inode table readahead
		algorithm will pre-read into the buffer cache
		..............................................................................
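A tool that wants to inspect one of the tunables in Table 1-10 can build the per-device path and read a number from it. The sketch below assumes that the file holds a single decimal integer, which the table does not state, and both helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "/proc/fs/ext4/<devname>/<file>" in buf.
 * Returns 0 on success, -1 if a component contains '/' or the
 * result does not fit in buf.
 */
static int ext4_proc_path(char *buf, size_t size,
                          const char *devname, const char *file)
{
    int n;

    if (strchr(devname, '/') != NULL || strchr(file, '/') != NULL)
        return -1;

    n = snprintf(buf, size, "/proc/fs/ext4/%s/%s", devname, file);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Read one tunable as a long.
 * Assumption: the file contains a single decimal integer.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
static int read_ext4_tunable(const char *devname, const char *file,
                             long *value)
{
    char path[256];
    FILE *fp;
    int ok;

    if (ext4_proc_path(path, sizeof(path), devname, file) < 0)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

For example, read_ext4_tunable("hdc", "stream_req", &v) would try to read /proc/fs/ext4/hdc/stream_req and fail cleanly if no such ext4 filesystem is mounted.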


		------------------------------------------------------------------------------
		Summary

MAINTAINERS

+3 −2

		@@ -1659,9 +1659,10 @@ L: linux-ext4@vger.kernel.org
		S: Maintained

		EXT4 FILE SYSTEM
		P: Stephen Tweedie, Andrew Morton
		M: sct@redhat.com, akpm@linux-foundation.org, adilger@sun.com
		P: Theodore Ts'o
		M: tytso@mit.edu, adilger@sun.com
		L: linux-ext4@vger.kernel.org
		W: http://ext4.wiki.kernel.org
		S: Maintained

		F71805F HARDWARE MONITORING DRIVER

fs/Kconfig

+51 −37

		@@ -136,37 +136,51 @@ config EXT3_FS_SECURITY
		If you are not using a security module that requires using
		extended attributes for file security labels, say N.

		config EXT4DEV_FS
		tristate "Ext4dev/ext4 extended fs support development (EXPERIMENTAL)"
		depends on EXPERIMENTAL
		config EXT4_FS
		tristate "The Extended 4 (ext4) filesystem"
		select JBD2
		select CRC16
		help
		Ext4dev is a predecessor filesystem of the next generation
		extended fs ext4, based on ext3 filesystem code. It will be
		renamed ext4 fs later, once ext4dev is mature and stabilized.
		This is the next generation of the ext3 filesystem.

		Unlike the change from ext2 filesystem to ext3 filesystem,
		the on-disk format of ext4dev is not the same as ext3 any more:
		it is based on extent maps and it supports 48-bit physical block
		numbers. These combined on-disk format changes will allow
		ext4dev/ext4 to handle more than 16 TB filesystem volumes --
		a hard limit that ext3 cannot overcome without changing the
		on-disk format.

		Other than extent maps and 48-bit block numbers, ext4dev also is
		likely to have other new features such as persistent preallocation,
		high resolution time stamps, and larger file support etc. These
		features will be added to ext4dev gradually.
		the on-disk format of ext4 is not forwards compatible with
		ext3; it is based on extent maps and it supports 48-bit
		physical block numbers. The ext4 filesystem also supports delayed
		allocation, persistent preallocation, high resolution time stamps,
		and a number of other features to improve performance and speed
		up fsck time. For more information, please see the web pages at
		http://ext4.wiki.kernel.org.

		The ext4 filesystem will support mounting an ext3
		filesystem; while there will be some performance gains from
		the delayed allocation and inode table readahead, the best
		performance gains will require enabling ext4 features in the
		filesystem, or formatting a new filesystem as an ext4
		filesystem initially.

		To compile this file system support as a module, choose M here. The
		module will be called ext4dev.

		If unsure, say N.

		config EXT4DEV_FS_XATTR
		bool "Ext4dev extended attributes"
		depends on EXT4DEV_FS
		config EXT4DEV_COMPAT
		bool "Enable ext4dev compatibility"
		depends on EXT4_FS
		help
		Starting with 2.6.28, the name of the ext4 filesystem was
		renamed from ext4dev to ext4. Unfortunately there are some
		legacy userspace programs (such as klibc's fstype) that have
		"ext4dev" hardcoded.

		To enable backwards compatibility so that systems that are
		still expecting to mount ext4 filesystems using ext4dev,
		choose Y here. This feature will go away by 2.6.31, so
		please arrange to get your userspace programs fixed!

		config EXT4_FS_XATTR
		bool "Ext4 extended attributes"
		depends on EXT4_FS
		default y
		help
		Extended attributes are name:value pairs associated with inodes by
		@@ -175,11 +189,11 @@ config EXT4DEV_FS_XATTR

		If unsure, say N.

		You need this for POSIX ACL support on ext4dev/ext4.
		You need this for POSIX ACL support on ext4.

		config EXT4DEV_FS_POSIX_ACL
		bool "Ext4dev POSIX Access Control Lists"
		depends on EXT4DEV_FS_XATTR
		config EXT4_FS_POSIX_ACL
		bool "Ext4 POSIX Access Control Lists"
		depends on EXT4_FS_XATTR
		select FS_POSIX_ACL
		help
		POSIX Access Control Lists (ACLs) support permissions for users and
		@@ -190,14 +204,14 @@ config EXT4DEV_FS_POSIX_ACL

		If you don't know what Access Control Lists are, say N

		config EXT4DEV_FS_SECURITY
		bool "Ext4dev Security Labels"
		depends on EXT4DEV_FS_XATTR
		config EXT4_FS_SECURITY
		bool "Ext4 Security Labels"
		depends on EXT4_FS_XATTR
		help
		Security labels support alternative access control models
		implemented by security modules like SELinux. This option
		enables an extended attribute handler for file security
		labels in the ext4dev/ext4 filesystem.
		labels in the ext4 filesystem.

		If you are not using a security module that requires using
		extended attributes for file security labels, say N.
		@@ -240,22 +254,22 @@ config JBD2
		help
		This is a generic journaling layer for block devices that support
		both 32-bit and 64-bit block numbers. It is currently used by
		the ext4dev/ext4 filesystem, but it could also be used to add
		the ext4 filesystem, but it could also be used to add
		journal support to other file systems or block devices such
		as RAID or LVM.

		If you are using ext4dev/ext4, you need to say Y here. If you are not
		using ext4dev/ext4 then you will probably want to say N.
		If you are using ext4, you need to say Y here. If you are not
		using ext4 then you will probably want to say N.

		To compile this device as a module, choose M here. The module will be
		called jbd2. If you are compiling ext4dev/ext4 into the kernel,
		called jbd2. If you are compiling ext4 into the kernel,
		you cannot compile this code as a module.

		config JBD2_DEBUG
		bool "JBD2 (ext4dev/ext4) debugging support"
		bool "JBD2 (ext4) debugging support"
		depends on JBD2 && DEBUG_FS
		help
		If you are using the ext4dev/ext4 journaled file system (or
		If you are using the ext4 journaled file system (or
		potentially any other filesystem/device using JBD2), this option
		allows you to enable debugging output while the system is running,
		in order to help track down any problems you are having.
		@@ -270,9 +284,9 @@ config JBD2_DEBUG
		config FS_MBCACHE
		# Meta block cache for Extended Attributes (ext2/ext3/ext4)
		tristate
		depends on EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4DEV_FS_XATTR
		default y if EXT2_FS=y || EXT3_FS=y || EXT4DEV_FS=y
		default m if EXT2_FS=m || EXT3_FS=m || EXT4DEV_FS=m
		depends on EXT2_FS_XATTR || EXT3_FS_XATTR || EXT4_FS_XATTR
		default y if EXT2_FS=y || EXT3_FS=y || EXT4_FS=y
		default m if EXT2_FS=m || EXT3_FS=m || EXT4_FS=m

		config REISERFS_FS
		tristate "Reiserfs support"