
Commit 32ee34ed authored by Linus Torvalds
Pull btrfs updates from David Sterba:
 "New features:

   - swapfile support - after a long time it's here, with some
     limitations where the COW design does not work well with the swap
     implementation (must be a nodatacow file, no compression, cannot
     be snapshotted, not possible on multiple devices, ...); this is
     the most restricted but working setup, and we'll try to improve it
     in the future (preparing such a file is sketched below)

   - metadata uuid - an optional incompat feature to assign a new
     filesystem UUID without overwriting all metadata blocks; the new
     UUID is stored only in the superblock (sketched below)

   - more balance messages are printed to the system log; the initial
     one is formatted like the command line that would be used to start
     that balance
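
A hedged sketch of preparing a swapfile under the constraints above (an
editorial illustration, not code from this pull; prepare_btrfs_swapfile
is a hypothetical userspace helper, and the file still needs mkswap(8)
before swapon will accept it):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/fs.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Hypothetical helper, not part of the kernel or btrfs-progs. */
    int prepare_btrfs_swapfile(const char *path, off_t size)
    {
        int flags = 0;
        int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);

        if (fd < 0)
            return -1;
        /* NOCOW (which also rules out compression) must be set while
         * the file is still empty. */
        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
            goto fail;
        flags |= FS_NOCOW_FL;
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
            goto fail;
        /* The swap extents must be fully allocated up front. */
        if (fallocate(fd, 0, 0, size) < 0)
            goto fail;
        close(fd);
        return 0;    /* then: mkswap <path> && swapon <path> */
    fail:
        close(fd);
        return -1;
    }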

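The metadata uuid selection rule is roughly the following (a sketch, not
the verbatim kernel code: with the incompat bit set, the superblock
carries a separate metadata_uuid that is stamped into tree blocks, while
the fsid stays user-visible):

    /* Sketch: which UUID is expected in metadata blocks. */
    static const u8 *effective_metadata_uuid(struct btrfs_super_block *sb)
    {
        if (btrfs_super_incompat_flags(sb) &
            BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
            return sb->metadata_uuid;
        return sb->fsid;
    }
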
  Fixes:

   - tag the pages of a snapshot to better separate pages that are
     involved in the snapshot (and need to get synced) from newly
     dirtied pages that could slow down or even livelock the snapshot
     operation (the tagging pattern is sketched below)

   - improved check of filesystem id associated with a device during
     scan to detect duplicate devices that could be mixed up during
     mount

   - fix device replace state transitions, e.g. when it ends up
     interrupted and a reboot tries to restart balance too, or when the
     start and cancel ioctls race

   - fix a crash due to a race when quotas are enabled during snapshot
     creation

   - GFP_NOFS/memalloc_nofs_* fixes due to GFP_KERNEL allocations in
     transaction context (the scoped NOFS pattern is sketched below)
   - fix fsync of files with multiple hard links in new directories

   - fix race of send with transaction commits that create snapshots
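
The snapshot tagging fix follows the tagged-writepages pattern of mm's
write_cache_pages(): pages that are dirty when the flush starts are
tagged in one pass, and only tagged pages are written back, so pages
dirtied afterwards cannot prolong the flush. A condensed sketch (names
from the mm API of this era; locking, error handling and the actual
writepage call omitted):

    #include <linux/fs.h>
    #include <linux/pagemap.h>
    #include <linux/pagevec.h>

    static void flush_snapshot_range(struct address_space *mapping,
                                     pgoff_t start, pgoff_t end)
    {
        struct pagevec pvec;
        pgoff_t index = start;

        /* Tag everything that is dirty right now... */
        tag_pages_for_writeback(mapping, start, end);

        pagevec_init(&pvec);
        /* ...and only write back pages carrying the TOWRITE tag. */
        while (index <= end &&
               pagevec_lookup_range_tag(&pvec, mapping, &index, end,
                                        PAGECACHE_TAG_TOWRITE)) {
            /* ->writepage() each page in pvec here */
            pagevec_release(&pvec);
        }
    }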

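The GFP_NOFS fixes use the scoped API instead of converting each call
site: inside the scope, GFP_KERNEL allocations implicitly drop __GFP_FS,
so direct reclaim cannot re-enter the filesystem while transaction state
is held. A minimal sketch of the pattern:

    #include <linux/sched/mm.h>
    #include <linux/slab.h>

    static void *alloc_in_transaction(size_t size)
    {
        unsigned int nofs_flag;
        void *p;

        nofs_flag = memalloc_nofs_save();
        p = kmalloc(size, GFP_KERNEL);  /* behaves as GFP_NOFS here */
        memalloc_nofs_restore(nofs_flag);
        return p;
    }
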
  Core changes:

   - cleanups:
      * further removals of now-dead fsync code
      * core function for finding free extent has been split and
        provides a base for further cleanups to make the logic more
        understandable
      * removed a lot of indirect callbacks for data and metadata inodes
      * simplified refcounting and locking for cloned extent buffers
      * removed redundant function arguments
      * defines converted to enums where appropriate

   - separate the reserve for delayed refs from the global reserve,
     update the logic to do less trickery and fewer ad-hoc heuristics,
     and move some related expensive operations out of transaction
     commit and file truncate

   - dev-replace switched from a custom locking scheme to a semaphore

   - remove the first phase of balance that tried to make some space
     for the relocation by calling shrink and grow; this did not work
     as expected and only introduced more error states due to potential
     resize failures, and removing it slightly improves the runtime as
     the chunks on all devices are no longer needlessly enumerated

   - clone and deduplication now use a generic helper that adds a few
     more checks that were missing from the original btrfs
     implementation of the ioctls (sketched below)"
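
The generic helper is generic_remap_file_range_prep() (visible in the
shortlog below). A hedged sketch of the call shape only; the real btrfs
entry point wraps this in its own locking and validation:

    #include <linux/fs.h>

    static loff_t demo_remap_file_range(struct file *src, loff_t pos_in,
                                        struct file *dst, loff_t pos_out,
                                        loff_t len, unsigned int remap_flags)
    {
        /* Checks offsets and lengths, handles the EOF block and, for
         * deduplication, compares the two ranges. */
        int ret = generic_remap_file_range_prep(src, pos_in, dst, pos_out,
                                                &len, remap_flags);
        if (ret < 0 || len == 0)
            return ret;

        /* filesystem-specific extent sharing would happen here */
        return len;
    }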

* tag 'for-4.21-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (125 commits)
  btrfs: Fix typos in comments and strings
  btrfs: improve error handling of btrfs_add_link
  Btrfs: use generic_remap_file_range_prep() for cloning and deduplication
  btrfs: Refactor main loop in extent_readpages
  btrfs: Remove 1st shrink/grow phase from balance
  Btrfs: send, fix race with transaction commits that create snapshots
  Btrfs: use nofs context when initializing security xattrs to avoid deadlock
  btrfs: run delayed items before dropping the snapshot
  btrfs: catch cow on deleting snapshots
  btrfs: extent-tree: cleanup one-shot usage of @blocksize in do_walk_down
  Btrfs: scrub, move setup of nofs contexts higher in the stack
  btrfs: scrub: move scrub_setup_ctx allocation out of device_list_mutex
  btrfs: scrub: pass fs_info to scrub_setup_ctx
  btrfs: fix truncate throttling
  btrfs: don't run delayed refs in the end transaction logic
  btrfs: rework btrfs_check_space_for_delayed_refs
  btrfs: add new flushing states for the delayed refs rsv
  btrfs: update may_commit_transaction to use the delayed refs rsv
  btrfs: introduce delayed_refs_rsv
  btrfs: only track ref_heads in delayed_ref_updates
  ...
parents 7bbbf2c2 52042d8e
fs/btrfs/backref.c: +2 −11
@@ -591,7 +591,7 @@ unode_aux_to_inode_list(struct ulist_node *node)
}

/*
- * We maintain three seperate rbtrees: one for direct refs, one for
+ * We maintain three separate rbtrees: one for direct refs, one for
 * indirect refs which have a key, and one for indirect refs which do not
 * have a key. Each tree does merge on insertion.
 *
@@ -695,7 +695,7 @@ static int resolve_indirect_refs(struct btrfs_fs_info *fs_info,
		}

		/*
-		 * Now it's a direct ref, put it in the the direct tree. We must
+		 * Now it's a direct ref, put it in the direct tree. We must
		 * do this last because the ref could be merged/freed here.
		 */
		prelim_ref_insert(fs_info, &preftrees->direct, ref, NULL);
@@ -2020,9 +2020,6 @@ static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
			ret = -ENOMEM;
			break;
		}
-		extent_buffer_get(eb);
-		btrfs_tree_read_lock(eb);
-		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
		btrfs_release_path(path);

		item = btrfs_item_nr(slot);
@@ -2042,7 +2039,6 @@ static int iterate_inode_refs(u64 inum, struct btrfs_root *fs_root,
			len = sizeof(*iref) + name_len;
			iref = (struct btrfs_inode_ref *)((char *)iref + len);
		}
-		btrfs_tree_read_unlock_blocking(eb);
		free_extent_buffer(eb);
	}

@@ -2083,10 +2079,6 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
			ret = -ENOMEM;
			break;
		}
-		extent_buffer_get(eb);
-
-		btrfs_tree_read_lock(eb);
-		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
		btrfs_release_path(path);

		item_size = btrfs_item_size_nr(eb, slot);
@@ -2107,7 +2099,6 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
			cur_offset += btrfs_inode_extref_name_len(eb, extref);
			cur_offset += sizeof(*extref);
		}
-		btrfs_tree_read_unlock_blocking(eb);
		free_extent_buffer(eb);

		offset++;
fs/btrfs/btrfs_inode.h: +13 −1
@@ -20,7 +20,7 @@
 * new data the application may have written before commit.
 */
enum {
-	BTRFS_INODE_ORDERED_DATA_CLOSE = 0,
+	BTRFS_INODE_ORDERED_DATA_CLOSE,
	BTRFS_INODE_DUMMY,
	BTRFS_INODE_IN_DEFRAG,
	BTRFS_INODE_HAS_ASYNC_EXTENT,
@@ -29,6 +29,7 @@ enum {
	BTRFS_INODE_IN_DELALLOC_LIST,
	BTRFS_INODE_READDIO_NEED_LOCK,
	BTRFS_INODE_HAS_PROPS,
+	BTRFS_INODE_SNAPSHOT_FLUSH,
};

/* in memory btrfs inode */
@@ -146,6 +147,12 @@ struct btrfs_inode {
	 */
	u64 last_unlink_trans;

+	/*
+	 * Track the transaction id of the last transaction used to create a
+	 * hard link for the inode. This is used by the log tree (fsync).
+	 */
+	u64 last_link_trans;
+
	/*
	 * Number of bytes outstanding that are going to need csums.  This is
	 * used in ENOSPC accounting.
@@ -253,6 +260,11 @@ static inline bool btrfs_is_free_space_inode(struct btrfs_inode *inode)
	return false;
}

+static inline bool is_data_inode(struct inode *inode)
+{
+	return btrfs_ino(BTRFS_I(inode)) != BTRFS_BTREE_INODE_OBJECTID;
+}
+
static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
						 int mod)
{
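
The runtime flags in the hunk above are bit numbers consumed by the
atomic bitops, not masks, so dropping the explicit "= 0" and letting
the enum auto-number is safe. A hypothetical caller, to make the usage
concrete:

    /* Sketch only: set_bit() treats the enum value as a bit index. */
    static inline void mark_snapshot_flush(struct btrfs_inode *inode)
    {
        set_bit(BTRFS_INODE_SNAPSHOT_FLUSH, &inode->runtime_flags);
    }
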
fs/btrfs/check-integrity.c: +12 −12
@@ -1202,24 +1202,24 @@ static void btrfsic_read_from_block_data(
	void *dstv, u32 offset, size_t len)
{
	size_t cur;
-	size_t offset_in_page;
+	size_t pgoff;
	char *kaddr;
	char *dst = (char *)dstv;
-	size_t start_offset = block_ctx->start & ((u64)PAGE_SIZE - 1);
+	size_t start_offset = offset_in_page(block_ctx->start);
	unsigned long i = (start_offset + offset) >> PAGE_SHIFT;

	WARN_ON(offset + len > block_ctx->len);
-	offset_in_page = (start_offset + offset) & (PAGE_SIZE - 1);
+	pgoff = offset_in_page(start_offset + offset);

	while (len > 0) {
-		cur = min(len, ((size_t)PAGE_SIZE - offset_in_page));
+		cur = min(len, ((size_t)PAGE_SIZE - pgoff));
		BUG_ON(i >= DIV_ROUND_UP(block_ctx->len, PAGE_SIZE));
		kaddr = block_ctx->datav[i];
-		memcpy(dst, kaddr + offset_in_page, cur);
+		memcpy(dst, kaddr + pgoff, cur);

		dst += cur;
		len -= cur;
-		offset_in_page = 0;
+		pgoff = 0;
		i++;
	}
}
@@ -1601,7 +1601,7 @@ static int btrfsic_read_block(struct btrfsic_state *state,
	BUG_ON(block_ctx->datav);
	BUG_ON(block_ctx->pagev);
	BUG_ON(block_ctx->mem_to_free);
-	if (block_ctx->dev_bytenr & ((u64)PAGE_SIZE - 1)) {
+	if (!PAGE_ALIGNED(block_ctx->dev_bytenr)) {
		pr_info("btrfsic: read_block() with unaligned bytenr %llu\n",
		       block_ctx->dev_bytenr);
		return -1;
@@ -1720,7 +1720,7 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
	num_pages = state->metablock_size >> PAGE_SHIFT;
	h = (struct btrfs_header *)datav[0];

-	if (memcmp(h->fsid, fs_info->fsid, BTRFS_FSID_SIZE))
+	if (memcmp(h->fsid, fs_info->fs_devices->fsid, BTRFS_FSID_SIZE))
		return 1;

	for (i = 0; i < num_pages; i++) {
@@ -1778,7 +1778,7 @@ static void btrfsic_process_written_block(struct btrfsic_dev_state *dev_state,
				return;
			}
			is_metadata = 1;
-			BUG_ON(BTRFS_SUPER_INFO_SIZE & (PAGE_SIZE - 1));
+			BUG_ON(!PAGE_ALIGNED(BTRFS_SUPER_INFO_SIZE));
			processed_len = BTRFS_SUPER_INFO_SIZE;
			if (state->print_mask &
			    BTRFSIC_PRINT_MASK_TREE_BEFORE_SB_WRITE) {
@@ -2327,7 +2327,7 @@ static int btrfsic_check_all_ref_blocks(struct btrfsic_state *state,
		 * write operations. Therefore it keeps the linkage
		 * information for a block until a block is
		 * rewritten. This can temporarily cause incorrect
-		 * and even circular linkage informations. This
+		 * and even circular linkage information. This
		 * causes no harm unless such blocks are referenced
		 * by the most recent super block.
		 */
@@ -2892,12 +2892,12 @@ int btrfsic_mount(struct btrfs_fs_info *fs_info,
	struct list_head *dev_head = &fs_devices->devices;
	struct btrfs_device *device;

-	if (fs_info->nodesize & ((u64)PAGE_SIZE - 1)) {
+	if (!PAGE_ALIGNED(fs_info->nodesize)) {
		pr_info("btrfsic: cannot handle nodesize %d not being a multiple of PAGE_SIZE %ld!\n",
		       fs_info->nodesize, PAGE_SIZE);
		return -1;
	}
-	if (fs_info->sectorsize & ((u64)PAGE_SIZE - 1)) {
+	if (!PAGE_ALIGNED(fs_info->sectorsize)) {
		pr_info("btrfsic: cannot handle sectorsize %d not being a multiple of PAGE_SIZE %ld!\n",
		       fs_info->sectorsize, PAGE_SIZE);
		return -1;
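
The conversions in this file swap open-coded mask arithmetic for the
mm.h helpers, which expand to the same checks (paraphrased from
include/linux/mm.h):

    #define offset_in_page(p)   ((unsigned long)(p) & ~PAGE_MASK)
    #define PAGE_ALIGNED(addr)  IS_ALIGNED((unsigned long)(addr), PAGE_SIZE)
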
fs/btrfs/compression.c: +11 −15
@@ -229,7 +229,6 @@ static noinline void end_compressed_writeback(struct inode *inode,
 */
static void end_compressed_bio_write(struct bio *bio)
{
-	struct extent_io_tree *tree;
	struct compressed_bio *cb = bio->bi_private;
	struct inode *inode;
	struct page *page;
@@ -248,14 +247,10 @@ static void end_compressed_bio_write(struct bio *bio)
	 * call back into the FS and do all the end_io operations
	 */
	inode = cb->inode;
-	tree = &BTRFS_I(inode)->io_tree;
	cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
-	tree->ops->writepage_end_io_hook(cb->compressed_pages[0],
-					 cb->start,
-					 cb->start + cb->len - 1,
-					 NULL,
-					 bio->bi_status ?
-					 BLK_STS_OK : BLK_STS_NOTSUPP);
+	btrfs_writepage_endio_finish_ordered(cb->compressed_pages[0],
+			cb->start, cb->start + cb->len - 1,
+			bio->bi_status ? BLK_STS_OK : BLK_STS_NOTSUPP);
	cb->compressed_pages[0]->mapping = NULL;

	end_compressed_writeback(inode, cb);
@@ -306,7 +301,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
	blk_status_t ret;
	int skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;

-	WARN_ON(start & ((u64)PAGE_SIZE - 1));
+	WARN_ON(!PAGE_ALIGNED(start));
	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
	if (!cb)
		return BLK_STS_RESOURCE;
@@ -337,7 +332,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
		page = compressed_pages[pg_index];
		page->mapping = inode->i_mapping;
		if (bio->bi_iter.bi_size)
-			submit = btrfs_merge_bio_hook(page, 0, PAGE_SIZE, bio, 0);
+			submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio,
+							  0);

		page->mapping = NULL;
		if (submit || bio_add_page(bio, page, PAGE_SIZE, 0) <
@@ -481,7 +477,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,

		if (page->index == end_index) {
			char *userpage;
-			size_t zero_offset = isize & (PAGE_SIZE - 1);
+			size_t zero_offset = offset_in_page(isize);

			if (zero_offset) {
				int zeros;
@@ -615,7 +611,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
		page->index = em_start >> PAGE_SHIFT;

		if (comp_bio->bi_iter.bi_size)
-			submit = btrfs_merge_bio_hook(page, 0, PAGE_SIZE,
-						      comp_bio, 0);
+			submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE,
+							  comp_bio, 0);

		page->mapping = NULL;
@@ -1207,7 +1203,7 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
/*
 * Shannon Entropy calculation
 *
- * Pure byte distribution analysis fails to determine compressiability of data.
+ * Pure byte distribution analysis fails to determine compressibility of data.
 * Try calculating entropy to estimate the average minimum number of bits
 * needed to encode the sampled data.
 *
@@ -1271,7 +1267,7 @@ static u8 get4bits(u64 num, int shift) {

/*
 * Use 4 bits as radix base
- * Use 16 u32 counters for calculating new possition in buf array
+ * Use 16 u32 counters for calculating new position in buf array
 *
 * @array     - array that will be sorted
 * @array_buf - buffer array to store sorting results
fs/btrfs/ctree.c: +32 −14
@@ -12,6 +12,7 @@
#include "transaction.h"
#include "print-tree.h"
#include "locking.h"
#include "volumes.h"

static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
		      *root, struct btrfs_path *path, int level);
@@ -224,7 +225,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
	else
		btrfs_set_header_owner(cow, new_root_objectid);

-	write_extent_buffer_fsid(cow, fs_info->fsid);
+	write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid);

	WARN_ON(btrfs_header_generation(buf) > trans->transid);
	if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
@@ -1050,7 +1051,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
	else
		btrfs_set_header_owner(cow, root->root_key.objectid);

-	write_extent_buffer_fsid(cow, fs_info->fsid);
+	write_extent_buffer_fsid(cow, fs_info->fs_devices->metadata_uuid);

	ret = update_ref_for_cow(trans, root, buf, cow, &last_ref);
	if (ret) {
@@ -1290,7 +1291,6 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
	btrfs_tree_read_unlock_blocking(eb);
	free_extent_buffer(eb);

-	extent_buffer_get(eb_rewin);
	btrfs_tree_read_lock(eb_rewin);
	__tree_mod_log_rewind(fs_info, eb_rewin, time_seq, tm);
	WARN_ON(btrfs_header_nritems(eb_rewin) >
@@ -1362,7 +1362,6 @@ get_old_root(struct btrfs_root *root, u64 time_seq)

	if (!eb)
		return NULL;
-	extent_buffer_get(eb);
	btrfs_tree_read_lock(eb);
	if (old_root) {
		btrfs_set_header_bytenr(eb, eb->start);
@@ -1415,7 +1414,7 @@ static inline int should_cow_block(struct btrfs_trans_handle *trans,
	 *
	 * What is forced COW:
	 *    when we create snapshot during committing the transaction,
-	 *    after we've finished coping src root, we must COW the shared
+	 *    after we've finished copying src root, we must COW the shared
	 *    block to ensure the metadata consistency.
	 */
	if (btrfs_header_generation(buf) == trans->transid &&
@@ -1441,6 +1440,10 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
	u64 search_start;
	int ret;

+	if (test_bit(BTRFS_ROOT_DELETING, &root->state))
+		btrfs_err(fs_info,
+			"COW'ing blocks on a fs root that's being dropped");
+
	if (trans->transaction != fs_info->running_transaction)
		WARN(1, KERN_CRIT "trans %llu running %llu\n",
		       trans->transid,
@@ -2584,14 +2587,27 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
	root_lock = BTRFS_READ_LOCK;

	if (p->search_commit_root) {
-		/* The commit roots are read only so we always do read locks */
-		if (p->need_commit_sem)
+		/*
+		 * The commit roots are read only so we always do read locks,
+		 * and we always must hold the commit_root_sem when doing
+		 * searches on them, the only exception is send where we don't
+		 * want to block transaction commits for a long time, so
+		 * we need to clone the commit root in order to avoid races
+		 * with transaction commits that create a snapshot of one of
+		 * the roots used by a send operation.
+		 */
+		if (p->need_commit_sem) {
			down_read(&fs_info->commit_root_sem);
-		b = root->commit_root;
-		extent_buffer_get(b);
+			b = btrfs_clone_extent_buffer(root->commit_root);
+			up_read(&fs_info->commit_root_sem);
+			if (!b)
+				return ERR_PTR(-ENOMEM);
+
+		} else {
+			b = root->commit_root;
+			extent_buffer_get(b);
+		}
		level = btrfs_header_level(b);
-		if (p->need_commit_sem)
-			up_read(&fs_info->commit_root_sem);
		/*
		 * Ensure that all callers have set skip_locking when
		 * p->search_commit_root = 1.
@@ -2717,6 +2733,10 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
again:
	prev_cmp = -1;
	b = btrfs_search_slot_get_root(root, p, write_lock_level);
+	if (IS_ERR(b)) {
+		ret = PTR_ERR(b);
+		goto done;
+	}

	while (b) {
		level = btrfs_header_level(b);
@@ -3751,7 +3771,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
		/* Key greater than all keys in the leaf, right neighbor has
		 * enough room for it and we're not emptying our leaf to delete
		 * it, therefore use right neighbor to insert the new item and
-		 * no need to touch/dirty our left leaft. */
+		 * no need to touch/dirty our left leaf. */
		btrfs_tree_unlock(left);
		free_extent_buffer(left);
		path->nodes[0] = right;
@@ -5390,7 +5410,6 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
		ret = -ENOMEM;
		goto out;
	}
-	extent_buffer_get(left_path->nodes[left_level]);

	right_level = btrfs_header_level(right_root->commit_root);
	right_root_level = right_level;
@@ -5401,7 +5420,6 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
		ret = -ENOMEM;
		goto out;
	}
-	extent_buffer_get(right_path->nodes[right_level]);
	up_read(&fs_info->commit_root_sem);

	if (left_level == 0)