Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 70cb0d02 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull ext4 updates from Ted Ts'o:
 "Added new ext4 debugging ioctls to allow userspace to get information
  about the state of the extent status cache.

  Dropped workaround for pre-1970 dates which were encoded incorrectly
  in pre-4.4 kernels. Since both the kernel correctly generates, and
  e2fsck detects and fixes this issue for the past four years, it'e time
  to drop the workaround. (Also, it's not like files with dates in the
  distant past were all that common in the first place.)

  A lot of miscellaneous bug fixes and cleanups, including some ext4
  Documentation fixes. Also included are two minor bug fixes in
  fs/unicode"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits)
  unicode: make array 'token' static const, makes object smaller
  unicode: Move static keyword to the front of declarations
  ext4: add missing bigalloc documentation.
  ext4: fix kernel oops caused by spurious casefold flag
  ext4: fix integer overflow when calculating commit interval
  ext4: use percpu_counters for extent_status cache hits/misses
  ext4: fix potential use after free after remounting with noblock_validity
  jbd2: add missing tracepoint for reserved handle
  ext4: fix punch hole for inline_data file systems
  ext4: rework reserved cluster accounting when invalidating pages
  ext4: documentation fixes
  ext4: treat buffers with write errors as containing valid data
  ext4: fix warning inside ext4_convert_unwritten_extents_endio
  ext4: set error return correctly when ext4_htree_store_dirent fails
  ext4: drop legacy pre-1970 encoding workaround
  ext4: add new ioctl EXT4_IOC_GET_ES_CACHE
  ext4: add a new ioctl EXT4_IOC_GETSTATE
  ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE
  jbd2: flush_descriptor(): Do not decrease buffer head's ref count
  ext4: remove unnecessary error check
  ...
parents 104c0d6b 040823b5
Loading
Loading
Loading
Loading
+22 −10
Original line number Diff line number Diff line
@@ -9,14 +9,26 @@ ext4 code is not prepared to handle the case where the block size
exceeds the page size. However, for a filesystem of mostly huge files,
it is desirable to be able to allocate disk blocks in units of multiple
blocks to reduce both fragmentation and metadata overhead. The
`bigalloc <Bigalloc>`__ feature provides exactly this ability. The
administrator can set a block cluster size at mkfs time (which is stored
in the s\_log\_cluster\_size field in the superblock); from then on, the
block bitmaps track clusters, not individual blocks. This means that
block groups can be several gigabytes in size (instead of just 128MiB);
however, the minimum allocation unit becomes a cluster, not a block,
even for directories. TaoBao had a patchset to extend the “use units of
clusters instead of blocks” to the extent tree, though it is not clear
where those patches went-- they eventually morphed into “extent tree v2”
but that code has not landed as of May 2015.
bigalloc feature provides exactly this ability.

The bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to
use clustered allocation, so that each bit in the ext4 block allocation
bitmap addresses a power of two number of blocks. For example, if the
file system is mainly going to be storing large files in the 4-32
megabyte range, it might make sense to set a cluster size of 1 megabyte.
This means that each bit in the block allocation bitmap now addresses
256 4k blocks. This shrinks the total size of the block allocation
bitmaps for a 2T file system from 64 megabytes to 256 kilobytes. It also
means that a block group addresses 32 gigabytes instead of 128 megabytes,
also shrinking the amount of file system overhead for metadata.

The administrator can set a block cluster size at mkfs time (which is
stored in the s\_log\_cluster\_size field in the superblock); from then
on, the block bitmaps track clusters, not individual blocks. This means
that block groups can be several gigabytes in size (instead of just
128MiB); however, the minimum allocation unit becomes a cluster, not a
block, even for directories. TaoBao had a patchset to extend the “use
units of clusters instead of blocks” to the extent tree, though it is
not clear where those patches went-- they eventually morphed into
“extent tree v2” but that code has not landed as of May 2015.
+5 −5
Original line number Diff line number Diff line
@@ -71,11 +71,11 @@ if the flex\_bg size is 4, then group 0 will contain (in order) the
superblock, group descriptors, data block bitmaps for groups 0-3, inode
bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining
space in group 0 is for file data. The effect of this is to group the
block metadata close together for faster loading, and to enable large
files to be continuous on disk. Backup copies of the superblock and
group descriptors are always at the beginning of block groups, even if
flex\_bg is enabled. The number of block groups that make up a flex\_bg
is given by 2 ^ ``sb.s_log_groups_per_flex``.
block group metadata close together for faster loading, and to enable
large files to be continuous on disk. Backup copies of the superblock
and group descriptors are always at the beginning of block groups, even
if flex\_bg is enabled. The number of block groups that make up a
flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``.

Meta Block Groups
-----------------
+3 −1
Original line number Diff line number Diff line
@@ -10,7 +10,9 @@ block groups. Block size is specified at mkfs time and typically is
4KiB. You may experience mounting problems if block size is greater than
page size (i.e. 64KiB blocks on a i386 which only has 4KiB memory
pages). By default a filesystem can contain 2^32 blocks; if the '64bit'
feature is enabled, then a filesystem can have 2^64 blocks.
feature is enabled, then a filesystem can have 2^64 blocks. The location
of structures is stored in terms of the block number the structure lives
in and not the absolute offset on disk.

For 32-bit filesystems, limits are as follows:

+1 −1
Original line number Diff line number Diff line
@@ -59,7 +59,7 @@ is at most 263 bytes long, though on disk you'll need to reference
     - File name.

Since file names cannot be longer than 255 bytes, the new directory
entry format shortens the rec\_len field and uses the space for a file
entry format shortens the name\_len field and uses the space for a file
type flag, probably to avoid having to load every inode during directory
tree traversal. This format is ``ext4_dir_entry_2``, which is at most
263 bytes long, though on disk you'll need to reference
+6 −3
Original line number Diff line number Diff line
@@ -99,9 +99,12 @@ The block group descriptor is laid out in ``struct ext4_group_desc``.
   * - 0x1E
     - \_\_le16
     - bg\_checksum
     - Group descriptor checksum; crc16(sb\_uuid+group+desc) if the
       RO\_COMPAT\_GDT\_CSUM feature is set, or crc32c(sb\_uuid+group\_desc) &
       0xFFFF if the RO\_COMPAT\_METADATA\_CSUM feature is set.
     - Group descriptor checksum; crc16(sb\_uuid+group\_num+bg\_desc) if the
       RO\_COMPAT\_GDT\_CSUM feature is set, or
       crc32c(sb\_uuid+group\_num+bg\_desc) & 0xFFFF if the
       RO\_COMPAT\_METADATA\_CSUM feature is set.  The bg\_checksum
       field in bg\_desc is skipped when calculating crc16 checksum,
       and set to zero if crc32c checksum is used.
   * -
     -
     -
Loading