
Commit 9bc9ccd7 authored by Linus Torvalds
Pull vfs updates from Al Viro:
 "All kinds of stuff this time around; some more notable parts:

   - RCU'd vfsmounts handling
   - new primitives for coredump handling
   - files_lock is gone
   - Bruce's delegations handling series
   - exportfs fixes

  plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
  ecryptfs: ->f_op is never NULL
  locks: break delegations on any attribute modification
  locks: break delegations on link
  locks: break delegations on rename
  locks: helper functions for delegation breaking
  locks: break delegations on unlink
  namei: minor vfs_unlink cleanup
  locks: implement delegations
  locks: introduce new FL_DELEG lock flag
  vfs: take i_mutex on renamed file
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: pull ext4's double-i_mutex-locking into common code
  exportfs: fix quadratic behavior in filehandle lookup
  exportfs: better variable name
  exportfs: move most of reconnect_path to helper function
  exportfs: eliminate unused "noprogress" counter
  exportfs: stop retrying once we race with rename/remove
  exportfs: clear DISCONNECTED on all parents sooner
  exportfs: more detailed comment for path_reconnect
  ...
parents f0230294 bdd35366
+22 −9
@@ -2,6 +2,10 @@
kinds of locks - per-inode (->i_mutex) and per-filesystem
(->s_vfs_rename_mutex).

+	When taking the i_mutex on multiple non-directory objects, we
+always acquire the locks in order by increasing address.  We'll call
+that "inode pointer" order in the following.
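
The address-ordering rule in the paragraph above can be illustrated with a small userspace sketch. This is not the kernel's code: `struct obj`, `lock_two()` and `unlock_two()` are hypothetical names, and pthread mutexes stand in for ->i_mutex. The only point is that two callers passing the same pair in opposite orders still take the locks in the same sequence, so they cannot deadlock against each other.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; only the mutex matters here. */
struct obj {
	pthread_mutex_t lock;
};

/*
 * Lock a and b in increasing address order and return the object
 * that was locked first.  If a == b the mutex is taken only once.
 */
static struct obj *lock_two(struct obj *a, struct obj *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct obj *tmp = a;
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(&a->lock);
	if (b != a)
		pthread_mutex_lock(&b->lock);
	return a;
}

/* Release both locks; the release order does not affect correctness. */
static void unlock_two(struct obj *a, struct obj *b)
{
	pthread_mutex_unlock(&a->lock);
	if (b != a)
		pthread_mutex_unlock(&b->lock);
}
```

Returning the first-locked object is purely a convenience for checking the order in a test; a real helper would have no reason to expose it.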

	For our purposes all operations fall in 5 classes:

1) read access.  Locking rules: caller locks directory we are accessing.
@@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
locks victim and calls the method.

4) rename() that is _not_ cross-directory.  Locking rules: caller locks
-the parent, finds source and target, if target already exists - locks it
-and then calls the method.
+the parent and finds source and target.  If target already exists, lock
+it.  If source is a non-directory, lock it.  If that means we need to
+lock both, lock them in inode pointer order.

5) link creation.  Locking rules:
	* lock parent
@@ -30,7 +35,9 @@ rules:
		fail with -ENOTEMPTY
	* if new parent is equal to or is a descendent of source
		fail with -ELOOP
-	* if target exists - lock it.
+	* If target exists, lock it.  If source is a non-directory, lock
+	  it.  In case that means we need to lock both source and target,
+	  do so in inode pointer order.
	* call the method.


@@ -56,9 +63,11 @@ objects - A < B iff A is an ancestor of B.
    renames will be blocked on filesystem lock and we don't start changing
    the order until we had acquired all locks).

-(3) any operation holds at most one lock on non-directory object and
-    that lock is acquired after all other locks.  (Proof: see descriptions
-    of operations).
+(3) locks on non-directory objects are acquired only after locks on
+    directory objects, and are acquired in inode pointer order.
+    (Proof: all operations but renames take lock on at most one
+    non-directory object, except renames, which take locks on source and
+    target in inode pointer order in the case they are not directories.)

	Now consider the minimal deadlock.  Each process is blocked on
attempt to acquire some lock and already holds at least one lock.  Let's
@@ -66,9 +75,13 @@ consider the set of contended locks. First of all, filesystem lock is
not contended, since any process blocked on it is not holding any locks.
Thus all processes are blocked on ->i_mutex.

-	Non-directory objects are not contended due to (3).  Thus link
-creation can't be a part of deadlock - it can't be blocked on source
-and it means that it doesn't hold any locks.
+	By (3), any process holding a non-directory lock can only be
+waiting on another non-directory lock with a larger address.  Therefore
+the process holding the "largest" such lock can always make progress, and
+non-directory objects are not included in the set of contended locks.
+
+	Thus link creation can't be a part of deadlock - it can't be
+blocked on source and it means that it doesn't hold any locks.

	Any contended object is either held by cross-directory rename or
has a child that is also contended.  Indeed, suppose that it is held by
+8 −0
@@ -455,3 +455,11 @@ in your dentry operations instead.
	vfs_follow_link has been removed.  Filesystems must use nd_set_link
	from ->follow_link for normal symlinks, or nd_jump_link for magic
	/proc/<pid> style links.
+--
+[mandatory]
+	iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
+	called with both ->i_lock and inode_hash_lock held; the former is *not*
+	taken anymore, so verify that your callbacks do not rely on it (none
+	of the in-tree instances did).  inode_hash_lock is still held,
+	of course, so they are still serialized wrt removal from inode hash,
+	as well as wrt set() callback of iget5_locked().
+1 −1
@@ -122,7 +122,7 @@ static inline int get_sigset_t(sigset_t *set,
	return 0;
}

-int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
+int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
{
	int err;

+4 −8
@@ -11,8 +11,7 @@ Elf64_Half elf_core_extra_phdrs(void)
	return GATE_EHDR->e_phnum;
}

-int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
-			       unsigned long limit)
+int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
{
	const struct elf_phdr *const gate_phdrs =
		(const struct elf_phdr *) (GATE_ADDR + GATE_EHDR->e_phoff);
@@ -35,15 +34,13 @@ int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
			phdr.p_offset += ofs;
		}
		phdr.p_paddr = 0; /* match other core phdrs */
-		*size += sizeof(phdr);
-		if (*size > limit || !dump_write(file, &phdr, sizeof(phdr)))
+		if (!dump_emit(cprm, &phdr, sizeof(phdr)))
			return 0;
	}
	return 1;
}

-int elf_core_write_extra_data(struct file *file, size_t *size,
-			      unsigned long limit)
+int elf_core_write_extra_data(struct coredump_params *cprm)
{
	const struct elf_phdr *const gate_phdrs =
		(const struct elf_phdr *) (GATE_ADDR + GATE_EHDR->e_phoff);
@@ -54,8 +51,7 @@ int elf_core_write_extra_data(struct file *file, size_t *size,
			void *addr = (void *)gate_phdrs[i].p_vaddr;
			size_t memsz = PAGE_ALIGN(gate_phdrs[i].p_memsz);

-			*size += memsz;
-			if (*size > limit || !dump_write(file, addr, memsz))
+			if (!dump_emit(cprm, addr, memsz))
				return 0;
			break;
		}
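
The hunks above replace open-coded "bump *size, compare against limit, then dump_write()" sequences with a single dump_emit(cprm, ...) call. A rough userspace analogue of that refactoring is sketched below. It is an illustration only and makes no claim about how the kernel's dump_emit() is implemented: `struct dump_params` and `emit()` are hypothetical names, and stdio stands in for the kernel file API. The idea shown is that the byte count and the limit travel with the parameter block, so callers no longer thread `size` and `limit` through every function.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a coredump parameter block. */
struct dump_params {
	FILE *file;	/* destination of the dump */
	size_t written;	/* bytes emitted so far; never exceeds limit */
	size_t limit;	/* maximum number of bytes allowed */
};

/*
 * Write nr bytes from addr, enforcing the size limit in one place.
 * Returns 1 on success and 0 on failure, the convention used by the
 * callers in the diff.
 */
static int emit(struct dump_params *p, const void *addr, size_t nr)
{
	if (nr > p->limit - p->written)
		return 0;	/* would exceed the limit */
	if (fwrite(addr, 1, nr, p->file) != nr)
		return 0;	/* short write */
	p->written += nr;
	return 1;
}
```

Unlike the removed code, this sketch checks the limit before updating the counter, so a rejected write leaves `written` unchanged; that is a design choice of the sketch, not something the diff states.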
+1 −1
@@ -105,7 +105,7 @@ restore_sigcontext (struct sigcontext __user *sc, struct sigscratch *scr)
}

int
-copy_siginfo_to_user (siginfo_t __user *to, siginfo_t *from)
+copy_siginfo_to_user (siginfo_t __user *to, const siginfo_t *from)
{
	if (!access_ok(VERIFY_WRITE, to, sizeof(siginfo_t)))
		return -EFAULT;