
Commit 267a4c76 authored by Hugh Dickins, committed by Linus Torvalds

tmpfs: fix shmem_evict_inode() warnings on i_blocks



Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly for a
few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).

(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently.  The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)
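
For concreteness, a minimal reproducer sketch along the lines described. The original syzkaller program is not reproduced here, so the mapping size, the fault-by-read, and the SIGBUS handling below are illustrative choices rather than the original's (which, as noted above, happened to fault via fstat):

/*
 * Hypothetical sketch: race faults on a sparse memfd mapping against
 * truncation below the fault address.  Build: gcc -O2 -pthread repro.c
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SIZE	(16UL << 20)	/* kept sparse: only ever faulted, never written */

static volatile char *map;
static int fd;
static sigjmp_buf env;

static void sigbus(int sig)
{
	siglongjmp(env, 1);	/* fault lost the race against truncation */
}

static void *faulter(void *arg)
{
	for (;;) {
		if (!sigsetjmp(env, 1))
			(void)map[SIZE - 1];	/* fault in a page near eof */
	}
	return NULL;
}

static void *truncator(void *arg)
{
	for (;;) {
		ftruncate(fd, 0);	/* truncate below the fault address */
		ftruncate(fd, SIZE);	/* restore size, object stays sparse */
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	fd = syscall(SYS_memfd_create, "repro", 0);	/* no glibc wrapper in that era */
	ftruncate(fd, SIZE);
	map = mmap(NULL, SIZE, PROT_READ, MAP_SHARED, fd, 0);

	signal(SIGBUS, sigbus);
	pthread_create(&t1, NULL, faulter, NULL);
	pthread_create(&t2, NULL, truncator, NULL);
	pause();	/* run for a few minutes; watch dmesg for the WARNING */
	return 0;
}

On an affected kernel the WARN_ON(inode->i_blocks) fires in shmem_evict_inode() once the last reference to the unlinked object is dropped.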

v3.7 commit 0f3c42f5 ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.

shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache().  delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books by
decrementing alloced itself, then our decrement of alloced takes it one
too low: leading to the WARNING when the object is finally evicted.
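
The arithmetic is easy to lose track of, so here is that interleaving as a userspace model: the kernel structures are reduced to three longs, and recalc() stands in for shmem_recalc_inode()'s rule, namely that any excess of alloced over swapped + nrpages is taken as pages reclaimed behind our back, with alloced (and i_blocks) shrunk by that excess. This is an illustration of the bookkeeping only, not kernel code:

#include <stdio.h>

static long alloced, swapped, nrpages;

/* recalc() models shmem_recalc_inode(): any excess of alloced over
 * swapped + nrpages is treated as pages reclaimed behind our back. */
static void recalc(void)
{
	long freed = alloced - swapped - nrpages;
	if (freed > 0)
		alloced -= freed;	/* i_blocks would shrink in step */
}

int main(void)
{
	/* Fault path has just allocated and exposed one page... */
	alloced = 1; swapped = 0; nrpages = 1;

	/* ...then finds it beyond eof.  Old error recovery: */
	nrpages--;	/* delete_from_page_cache() */
	recalc();	/* racing truncator balances: alloced 1 -> 0 */
	alloced--;	/* old "trunc:" decrements again: alloced now -1 */
	printf("old recovery: alloced=%ld (one too low)\n", alloced);

	/* Fixed error recovery: no explicit decrement, recalc only. */
	alloced = 1; swapped = 0; nrpages = 1;
	nrpages--;	/* delete_from_page_cache() */
	recalc();	/* whichever thread runs this first balances: 1 -> 0 */
	recalc();	/* any further recalc is a harmless no-op */
	printf("new recovery: alloced=%ld (in balance)\n", alloced);
	return 0;
}

The corrected recovery path in the patch below performs exactly the second sequence: ClearPageDirty() and delete_from_page_cache(), then a bare shmem_recalc_inode() under info->lock, with the old trunc: block's explicit alloced-- and i_blocks adjustment gone.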

Once the new page has been exposed in the page cache,
shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
the accounting right in all cases (and not fall through from "trunc:" to
"decused:").  Adjust that error recovery block; and the reinitialization
of info and sbinfo can be removed too.

While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after).  (Aside: why do
we shmem_recalc_inode() here in the swap path? Because its raison d'etre
is to cope with clean sparse shmem pages being reclaimed behind our
back: so here when swapping is a good place to look for that case.) But
I've not now managed to reproduce this bug, even without the patch.
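
The same three-long model, applied to the shmem_writepage() ordering (again an illustration, not kernel code): with swapped incremented before nrpages drops, a racing recalculation never sees a spurious excess.

#include <stdio.h>

static long alloced = 1, swapped = 0, nrpages = 1;

static void recalc(void)	/* models shmem_recalc_inode() */
{
	long freed = alloced - swapped - nrpages;
	if (freed > 0)
		alloced -= freed;
}

int main(void)
{
	/* New shmem_writepage() order, under info->lock: */
	recalc();	/* books in balance: no-op */
	swapped++;	/* page about to become swap-backed */

	/* ...lock dropped; only now does the page leave the cache. */
	nrpages--;	/* shmem_delete_from_page_cache() */
	recalc();	/* racer sees 1 - 1 - 0 == 0: still a no-op */

	printf("alloced=%ld swapped=%ld nrpages=%ld (balanced)\n",
	       alloced, swapped, nrpages);
	return 0;
}

Under the old order a racer's recalc() could land between the page-cache delete and swapped++, see an excess of one, and eat an alloced that the swap entry still needed.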

I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether.  Driven by this
incident, I do now have a patch to do so at last; but still want to sit
on it for a bit, there's a couple of questions yet to be resolved.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent dbe409e4
mm/shmem.c +14 −20
@@ -843,14 +843,14 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		list_add_tail(&info->swaplist, &shmem_swaplist);
 
 	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
-		swap_shmem_alloc(swap);
-		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
-
 		spin_lock(&info->lock);
-		info->swapped++;
 		shmem_recalc_inode(inode);
+		info->swapped++;
 		spin_unlock(&info->lock);
 
+		swap_shmem_alloc(swap);
+		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
+
 		mutex_unlock(&shmem_swaplist_mutex);
 		BUG_ON(page_mapped(page));
 		swap_writepage(page, wbc);
@@ -1078,7 +1078,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	if (sgp != SGP_WRITE && sgp != SGP_FALLOC &&
 	    ((loff_t)index << PAGE_CACHE_SHIFT) >= i_size_read(inode)) {
 		error = -EINVAL;
-		goto failed;
+		goto unlock;
 	}
 
 	if (page && sgp == SGP_WRITE)
@@ -1246,11 +1246,15 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	/* Perhaps the file has been truncated since we checked */
 	if (sgp != SGP_WRITE && sgp != SGP_FALLOC &&
 	    ((loff_t)index << PAGE_CACHE_SHIFT) >= i_size_read(inode)) {
+		if (alloced) {
+			ClearPageDirty(page);
+			delete_from_page_cache(page);
+			spin_lock(&info->lock);
+			shmem_recalc_inode(inode);
+			spin_unlock(&info->lock);
+		}
 		error = -EINVAL;
-		if (alloced)
-			goto trunc;
-		else
-			goto failed;
+		goto unlock;
 	}
 	*pagep = page;
 	return 0;
@@ -1258,23 +1262,13 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	/*
 	 * Error recovery.
 	 */
-trunc:
-	info = SHMEM_I(inode);
-	ClearPageDirty(page);
-	delete_from_page_cache(page);
-	spin_lock(&info->lock);
-	info->alloced--;
-	inode->i_blocks -= BLOCKS_PER_PAGE;
-	spin_unlock(&info->lock);
 decused:
-	sbinfo = SHMEM_SB(inode->i_sb);
 	if (sbinfo->max_blocks)
 		percpu_counter_add(&sbinfo->used_blocks, -1);
 unacct:
 	shmem_unacct_blocks(info->flags, 1);
 failed:
-	if (swap.val && error != -EINVAL &&
-	    !shmem_confirm_swap(mapping, index, swap))
+	if (swap.val && !shmem_confirm_swap(mapping, index, swap))
 		error = -EEXIST;
 unlock:
 	if (page) {