Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 5abe3795 authored by Linus Torvalds's avatar Linus Torvalds
Browse files
Pull ext4 updates from Ted Ts'o:
 "Add as a feature case-insensitive directories (the casefold feature)
  using Unicode 12.1.

  Also, the usual largish number of cleanups and bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
  ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
  ext4: fix ext4_show_options for file systems w/o journal
  unicode: refactor the rule for regenerating utf8data.h
  docs: ext4.rst: document case-insensitive directories
  ext4: Support case-insensitive file name lookups
  ext4: include charset encoding information in the superblock
  MAINTAINERS: add Unicode subsystem entry
  unicode: update unicode database unicode version 12.1.0
  unicode: introduce test module for normalized utf8 implementation
  unicode: implement higher level API for string handling
  unicode: reduce the size of utf8data[]
  unicode: introduce code for UTF-8 normalization
  unicode: introduce UTF-8 character database
  ext4: actually request zeroing of inode table after grow
  ext4: cond_resched in work-heavy group loops
  ext4: fix use-after-free race with debug_want_extra_isize
  ext4: avoid drop reference to iloc.bh twice
  ext4: ignore e_value_offs for xattrs with value-in-ea-inode
  ext4: protect journal inode's blocks using block_validity
  ext4: use BUG() instead of BUG_ON(1)
  ...
parents e5fef2a9 db90f419
Loading
Loading
Loading
Loading
+38 −0
Original line number Diff line number Diff line
@@ -91,10 +91,48 @@ Currently Available
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
* Case-insensitive file name lookups

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

case-insensitive file name lookups
======================================================

The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem.  It is enabled by
flipping the +F inode attribute of an empty directory.  The
case-insensitive string match operation is only defined when we know how
text in encoded in a byte sequence.  For that reason, in order to enable
case-insensitive directories, the filesystem must have the
casefold feature, which stores the filesystem-wide encoding
model used.  By default, the charset adopted is the latest version of
Unicode (12.1.0, by the time of this writing), encoded in the UTF-8
form.  The comparison algorithm is implemented by normalizing the
strings to the Canonical decomposition form, as defined by Unicode,
followed by a byte per byte comparison.

The case-awareness is name-preserving on the disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written in the disk.  The Unicode normalization format used by the
kernel is thus an internal representation, and not exposed to the
userspace nor to the disk, with the important exception of disk hashes,
used on large case-insensitive directories with DX feature.  On DX
directories, the hash must be calculated using the casefolded version of
the filename, meaning that the normalization format used actually has an
impact on where the directory entry is stored.

When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings we need to address what happens when a program
tries to create a file with an invalid name.  The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which select its preferred behavior by enabling/disabling
the strict mode.  When Ext4 encounters one of those strings and the
filesystem did not require strict mode, it falls back to considering the
entire string as an opaque byte sequence, which still allows the user to
operate on that file, but the case-insensitive lookups won't work.

Options
=======

+2 −0
Original line number Diff line number Diff line
@@ -176,6 +176,7 @@ mkprep
mkregtable
mktables
mktree
mkutf8data
modpost
modules.builtin
modules.order
@@ -254,6 +255,7 @@ vsyscall_32.lds
wanxlfw.inc
uImage
unifdef
utf8data.h
wakeup.bin
wakeup.elf
wakeup.lds
+6 −0
Original line number Diff line number Diff line
@@ -15984,6 +15984,12 @@ F: drivers/uwb/
F:	include/linux/uwb.h
F:	include/linux/uwb/

UNICODE SUBSYSTEM:
M:	Gabriel Krisman Bertazi <krisman@collabora.com>
L:	linux-fsdevel@vger.kernel.org
S:	Supported
F:	fs/unicode/

UNICORE32 ARCHITECTURE:
M:	Guan Xuetao <gxt@pku.edu.cn>
W:	http://mprc.pku.edu.cn/~guanxuetao/linux
+1 −0
Original line number Diff line number Diff line
@@ -317,5 +317,6 @@ endif # NETWORK_FILESYSTEMS

source "fs/nls/Kconfig"
source "fs/dlm/Kconfig"
source "fs/unicode/Kconfig"

endmenu
+1 −0
Original line number Diff line number Diff line
@@ -92,6 +92,7 @@ obj-$(CONFIG_EXPORTFS) += exportfs/
obj-$(CONFIG_NFSD)		+= nfsd/
obj-$(CONFIG_LOCKD)		+= lockd/
obj-$(CONFIG_NLS)		+= nls/
obj-$(CONFIG_UNICODE)		+= unicode/
obj-$(CONFIG_SYSV_FS)		+= sysv/
obj-$(CONFIG_CIFS)		+= cifs/
obj-$(CONFIG_HPFS_FS)		+= hpfs/
Loading