
Commit 7a771cea authored by Linus Torvalds
Pull device mapper updates from Mike Snitzer:

 - Fix dm-raid transient device failure processing and other smaller
   tweaks.

 - Add journal support to the DM raid target to close the 'write hole'
   on raid 4/5/6.

 - Fix dm-cache corruption, due to rounding bug, when cache exceeds 2TB.

 - Add 'metadata2' feature to dm-cache to separate the dirty bitset out
   from other cache metadata. This improves speed of shutting down a
   large cache device (which implies writing out dirty bits).

 - Fix a memory leak during dm-stats data structure destruction.

 - Fix a DM multipath round-robin path selector performance regression
   that was caused by less precise balancing across all paths.

 - Lastly, introduce a DM core fix for a long-standing DM snapshot
   deadlock that is rooted in the complexity of the device stack used in
   conjunction with block core maintaining bios on current->bio_list to
   manage recursion in generic_make_request(). A more comprehensive fix
   to block core (and its hook in the cpu scheduler) would be wonderful
   but this DM-specific fix is pragmatic considering how difficult it
   has been to make progress on a generic fix.

* tag 'dm-4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits)
  dm: flush queued bios when process blocks to avoid deadlock
  dm round robin: revert "use percpu 'repeat_count' and 'current_path'"
  dm stats: fix a leaked s->histogram_boundaries array
  dm space map metadata: constify dm_space_map structures
  dm cache metadata: use cursor api in blocks_are_clean_separate_dirty()
  dm persistent data: add cursor skip functions to the cursor APIs
  dm cache metadata: use dm_bitset_new() to create the dirty bitset in format 2
  dm bitset: add dm_bitset_new()
  dm cache metadata: name the cache block that couldn't be loaded
  dm cache metadata: add "metadata2" feature
  dm cache metadata: use bitset cursor api to load discard bitset
  dm bitset: introduce cursor api
  dm btree: use GFP_NOFS in dm_btree_del()
  dm space map common: memcpy the disk root to ensure it's arch aligned
  dm block manager: add unlikely() annotations on dm_bufio error paths
  dm cache: fix corruption seen when using cache > 2TB
  dm raid: cleanup awkward branching in raid_message() option processing
  dm raid: use mddev rather than rdev->mddev
  dm raid: use read_disk_sb() throughout
  dm raid: add raid4/5/6 journaling support
  ...
parents e67bd12d d67a5f4b
+4 −0
@@ -207,6 +207,10 @@ Optional feature arguments are:
 		   block, then the cache block is invalidated.
 		   To enable passthrough mode the cache must be clean.

+   metadata2	: use version 2 of the metadata.  This stores the dirty bits
+                  in a separate btree, which improves speed of shutting
+		  down the cache.
+
 A policy called 'default' is always registered.  This is an alias for
 the policy we currently think is giving best all round performance.
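
The `metadata2` feature word documented in this hunk selects metadata format 2. As a rough illustration only, the userspace sketch below shows how such a word could be mapped to a version number and then gated on. It is loosely modeled on the `parse_features()` and `separate_dirty_bits()` changes elsewhere in this commit; every `demo_*` name is hypothetical and not kernel API, and the sketch recognises only `metadata2`, rejecting any other word.

```c
#include <stdbool.h>
#include <strings.h>	/* strcasecmp (POSIX) */

/* Hypothetical stand-in for the kernel's struct cache_features. */
struct demo_features {
	unsigned metadata_version;	/* 1: dirty flag kept with each mapping, 2: separate bitset */
};

/* Default to format 1, as init_features() does in this commit. */
static void demo_init_features(struct demo_features *cf)
{
	cf->metadata_version = 1;
}

/* Returns 0 if the word was recognised, -1 otherwise. */
static int demo_parse_feature(struct demo_features *cf, const char *arg)
{
	if (!strcasecmp(arg, "metadata2")) {
		cf->metadata_version = 2;
		return 0;
	}
	return -1;
}

/* Same test as separate_dirty_bits(): format 2 and later split the dirty bits out. */
static bool demo_separate_dirty_bits(const struct demo_features *cf)
{
	return cf->metadata_version >= 2;
}
```

Because the comparison is case-insensitive, `METADATA2` is accepted as well; a table that never passes the word keeps format 1.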


+17 −0
@@ -161,6 +161,15 @@ The target is named "raid" and it accepts the following parameters:
 		the RAID type (i.e. the allocation algorithm) as well, e.g.
 		changing from raid5_ls to raid5_n.

+	[journal_dev <dev>]
+		This option adds a journal device to raid4/5/6 raid sets and
+		uses it to close the 'write hole' caused by the non-atomic updates
+		to the component devices which can cause data loss during recovery.
+		The journal device is used as writethrough thus causing writes to
+		be throttled versus non-journaled raid4/5/6 sets.
+		Takeover/reshape is not possible with a raid4/5/6 journal device;
+		it has to be deconfigured before requesting these.
+
 <#raid_devs>: The number of devices composing the array.
 	Each device consists of two entries.  The first is the device
 	containing the metadata (if any); the second is the one containing the
@@ -245,6 +254,9 @@ recovery. Here is a fuller description of the individual fields:
 	<data_offset>   The current data offset to the start of the user data on
 			each component device of a raid set (see the respective
 			raid parameter to support out-of-place reshaping).
+	<journal_char>	'A' - active raid4/5/6 journal device.
+			'D' - dead journal device.
+			'-' - no journal device.


 Message Interface
@@ -314,3 +326,8 @@ Version History
 1.9.0   Add support for RAID level takeover/reshape/region size
 	and set size reduction.
 1.9.1   Fix activation of existing RAID 4/10 mapped devices
+1.9.2   Don't emit '- -' on the status table line in case the constructor
+	fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
+	'D' on the status line.  If '- -' is passed into the constructor, emit
+	'- -' on the table line and '-' as the status line health character.
+1.10.0  Add support for raid4/5/6 journal device
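
The `<journal_char>` status field added in this file takes one of three values. A small sketch of how a monitoring tool might decode it follows; the helper name is hypothetical and the strings are simply the descriptions from the documentation hunk, so this is an illustration rather than anything the dm-raid target itself provides.

```c
/* Hypothetical helper: decode the <journal_char> status field. */
static const char *demo_describe_journal_char(char c)
{
	switch (c) {
	case 'A':
		return "active raid4/5/6 journal device";
	case 'D':
		return "dead journal device";
	case '-':
		return "no journal device";
	default:
		/* Anything else is not documented for this field. */
		return "unknown";
	}
}
```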
+298 −55
@@ -25,7 +25,7 @@
  * defines a range of metadata versions that this module can handle.
  */
 #define MIN_CACHE_VERSION 1
-#define MAX_CACHE_VERSION 1
+#define MAX_CACHE_VERSION 2

 #define CACHE_METADATA_CACHE_SIZE 64

@@ -55,6 +55,7 @@ enum mapping_bits {

 	/*
 	 * The data on the cache is different from that on the origin.
+	 * This flag is only used by metadata format 1.
 	 */
 	M_DIRTY = 2
 };
@@ -93,12 +94,18 @@ struct cache_disk_superblock {
 	__le32 write_misses;

 	__le32 policy_version[CACHE_POLICY_VERSION_SIZE];
+
+	/*
+	 * Metadata format 2 fields.
+	 */
+	__le64 dirty_root;
 } __packed;

 struct dm_cache_metadata {
 	atomic_t ref_count;
 	struct list_head list;

+	unsigned version;
 	struct block_device *bdev;
 	struct dm_block_manager *bm;
 	struct dm_space_map *metadata_sm;
@@ -141,12 +148,19 @@ struct dm_cache_metadata {
 	 */
 	bool fail_io:1;

+	/*
+	 * Metadata format 2 fields.
+	 */
+	dm_block_t dirty_root;
+	struct dm_disk_bitset dirty_info;
+
 	/*
 	 * These structures are used when loading metadata.  They're too
 	 * big to put on the stack.
 	 */
 	struct dm_array_cursor mapping_cursor;
 	struct dm_array_cursor hint_cursor;
+	struct dm_bitset_cursor dirty_cursor;
 };

 /*-------------------------------------------------------------------
@@ -170,6 +184,7 @@ static void sb_prepare_for_write(struct dm_block_validator *v,
 static int check_metadata_version(struct cache_disk_superblock *disk_super)
 {
 	uint32_t metadata_version = le32_to_cpu(disk_super->version);
+
 	if (metadata_version < MIN_CACHE_VERSION || metadata_version > MAX_CACHE_VERSION) {
 		DMERR("Cache metadata version %u found, but only versions between %u and %u supported.",
 		      metadata_version, MIN_CACHE_VERSION, MAX_CACHE_VERSION);
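
With `MAX_CACHE_VERSION` raised to 2, the range check in `check_metadata_version()` now accepts both on-disk formats. A minimal userspace sketch of that range test, assuming the same bounds as the defines in this commit; the `DEMO_*` and `demo_*` names are hypothetical, and the `-1` return is a placeholder because the hunk does not show the kernel's actual error value.

```c
#define DEMO_MIN_CACHE_VERSION 1
#define DEMO_MAX_CACHE_VERSION 2

/* Returns 0 when the version is inside the supported range, -1 otherwise. */
static int demo_check_metadata_version(unsigned version)
{
	if (version < DEMO_MIN_CACHE_VERSION || version > DEMO_MAX_CACHE_VERSION)
		return -1;

	return 0;
}
```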
@@ -310,6 +325,11 @@ static void __copy_sm_root(struct dm_cache_metadata *cmd,
 	       sizeof(cmd->metadata_space_map_root));
 }

+static bool separate_dirty_bits(struct dm_cache_metadata *cmd)
+{
+	return cmd->version >= 2;
+}
+
 static int __write_initial_superblock(struct dm_cache_metadata *cmd)
 {
 	int r;
@@ -341,7 +361,7 @@ static int __write_initial_superblock(struct dm_cache_metadata *cmd)
 	disk_super->flags = 0;
 	memset(disk_super->uuid, 0, sizeof(disk_super->uuid));
 	disk_super->magic = cpu_to_le64(CACHE_SUPERBLOCK_MAGIC);
-	disk_super->version = cpu_to_le32(MAX_CACHE_VERSION);
+	disk_super->version = cpu_to_le32(cmd->version);
 	memset(disk_super->policy_name, 0, sizeof(disk_super->policy_name));
 	memset(disk_super->policy_version, 0, sizeof(disk_super->policy_version));
 	disk_super->policy_hint_size = 0;
@@ -362,6 +382,9 @@ static int __write_initial_superblock(struct dm_cache_metadata *cmd)
 	disk_super->write_hits = cpu_to_le32(0);
 	disk_super->write_misses = cpu_to_le32(0);

+	if (separate_dirty_bits(cmd))
+		disk_super->dirty_root = cpu_to_le64(cmd->dirty_root);
+
 	return dm_tm_commit(cmd->tm, sblock);
 }

@@ -382,6 +405,13 @@ static int __format_metadata(struct dm_cache_metadata *cmd)
 	if (r < 0)
 		goto bad;

+	if (separate_dirty_bits(cmd)) {
+		dm_disk_bitset_init(cmd->tm, &cmd->dirty_info);
+		r = dm_bitset_empty(&cmd->dirty_info, &cmd->dirty_root);
+		if (r < 0)
+			goto bad;
+	}
+
 	dm_disk_bitset_init(cmd->tm, &cmd->discard_info);
 	r = dm_bitset_empty(&cmd->discard_info, &cmd->discard_root);
 	if (r < 0)
@@ -407,9 +437,10 @@ static int __format_metadata(struct dm_cache_metadata *cmd)
 static int __check_incompat_features(struct cache_disk_superblock *disk_super,
 				     struct dm_cache_metadata *cmd)
 {
-	uint32_t features;
+	uint32_t incompat_flags, features;

-	features = le32_to_cpu(disk_super->incompat_flags) & ~DM_CACHE_FEATURE_INCOMPAT_SUPP;
+	incompat_flags = le32_to_cpu(disk_super->incompat_flags);
+	features = incompat_flags & ~DM_CACHE_FEATURE_INCOMPAT_SUPP;
 	if (features) {
 		DMERR("could not access metadata due to unsupported optional features (%lx).",
 		      (unsigned long)features);
@@ -470,6 +501,7 @@ static int __open_metadata(struct dm_cache_metadata *cmd)
 	}

 	__setup_mapping_info(cmd);
+	dm_disk_bitset_init(cmd->tm, &cmd->dirty_info);
 	dm_disk_bitset_init(cmd->tm, &cmd->discard_info);
 	sb_flags = le32_to_cpu(disk_super->flags);
 	cmd->clean_when_opened = test_bit(CLEAN_SHUTDOWN, &sb_flags);
@@ -548,6 +580,7 @@ static unsigned long clear_clean_shutdown(unsigned long flags)
 static void read_superblock_fields(struct dm_cache_metadata *cmd,
 				   struct cache_disk_superblock *disk_super)
 {
+	cmd->version = le32_to_cpu(disk_super->version);
 	cmd->flags = le32_to_cpu(disk_super->flags);
 	cmd->root = le64_to_cpu(disk_super->mapping_root);
 	cmd->hint_root = le64_to_cpu(disk_super->hint_root);
@@ -567,6 +600,9 @@ static void read_superblock_fields(struct dm_cache_metadata *cmd,
 	cmd->stats.write_hits = le32_to_cpu(disk_super->write_hits);
 	cmd->stats.write_misses = le32_to_cpu(disk_super->write_misses);

+	if (separate_dirty_bits(cmd))
+		cmd->dirty_root = le64_to_cpu(disk_super->dirty_root);
+
 	cmd->changed = false;
 }

@@ -625,6 +661,13 @@ static int __commit_transaction(struct dm_cache_metadata *cmd,
 	 */
 	BUILD_BUG_ON(sizeof(struct cache_disk_superblock) > 512);

+	if (separate_dirty_bits(cmd)) {
+		r = dm_bitset_flush(&cmd->dirty_info, cmd->dirty_root,
+				    &cmd->dirty_root);
+		if (r)
+			return r;
+	}
+
 	r = dm_bitset_flush(&cmd->discard_info, cmd->discard_root,
 			    &cmd->discard_root);
 	if (r)
@@ -649,6 +692,8 @@ static int __commit_transaction(struct dm_cache_metadata *cmd,
 		update_flags(disk_super, mutator);

 	disk_super->mapping_root = cpu_to_le64(cmd->root);
+	if (separate_dirty_bits(cmd))
+		disk_super->dirty_root = cpu_to_le64(cmd->dirty_root);
 	disk_super->hint_root = cpu_to_le64(cmd->hint_root);
 	disk_super->discard_root = cpu_to_le64(cmd->discard_root);
 	disk_super->discard_block_size = cpu_to_le64(cmd->discard_block_size);
@@ -698,7 +743,8 @@ static void unpack_value(__le64 value_le, dm_oblock_t *block, unsigned *flags)
 static struct dm_cache_metadata *metadata_open(struct block_device *bdev,
 					       sector_t data_block_size,
 					       bool may_format_device,
-					       size_t policy_hint_size)
+					       size_t policy_hint_size,
+					       unsigned metadata_version)
 {
 	int r;
 	struct dm_cache_metadata *cmd;
@@ -709,6 +755,7 @@ static struct dm_cache_metadata *metadata_open(struct block_device *bdev,
 		return ERR_PTR(-ENOMEM);
 	}

+	cmd->version = metadata_version;
 	atomic_set(&cmd->ref_count, 1);
 	init_rwsem(&cmd->root_lock);
 	cmd->bdev = bdev;
@@ -757,7 +804,8 @@ static struct dm_cache_metadata *lookup(struct block_device *bdev)
 static struct dm_cache_metadata *lookup_or_open(struct block_device *bdev,
 						sector_t data_block_size,
 						bool may_format_device,
-						size_t policy_hint_size)
+						size_t policy_hint_size,
+						unsigned metadata_version)
 {
 	struct dm_cache_metadata *cmd, *cmd2;

@@ -768,7 +816,8 @@ static struct dm_cache_metadata *lookup_or_open(struct block_device *bdev,
 	if (cmd)
 		return cmd;

-	cmd = metadata_open(bdev, data_block_size, may_format_device, policy_hint_size);
+	cmd = metadata_open(bdev, data_block_size, may_format_device,
+			    policy_hint_size, metadata_version);
 	if (!IS_ERR(cmd)) {
 		mutex_lock(&table_lock);
 		cmd2 = lookup(bdev);
@@ -800,10 +849,11 @@ static bool same_params(struct dm_cache_metadata *cmd, sector_t data_block_size)
 struct dm_cache_metadata *dm_cache_metadata_open(struct block_device *bdev,
 						 sector_t data_block_size,
 						 bool may_format_device,
-						 size_t policy_hint_size)
+						 size_t policy_hint_size,
+						 unsigned metadata_version)
 {
-	struct dm_cache_metadata *cmd = lookup_or_open(bdev, data_block_size,
-						       may_format_device, policy_hint_size);
+	struct dm_cache_metadata *cmd = lookup_or_open(bdev, data_block_size, may_format_device,
+						       policy_hint_size, metadata_version);

 	if (!IS_ERR(cmd) && !same_params(cmd, data_block_size)) {
 		dm_cache_metadata_close(cmd);
@@ -829,7 +879,7 @@ void dm_cache_metadata_close(struct dm_cache_metadata *cmd)
 /*
  * Checks that the given cache block is either unmapped or clean.
  */
-static int block_unmapped_or_clean(struct dm_cache_metadata *cmd, dm_cblock_t b,
+static int block_clean_combined_dirty(struct dm_cache_metadata *cmd, dm_cblock_t b,
 				      bool *result)
 {
 	int r;
@@ -838,10 +888,8 @@ static int block_unmapped_or_clean(struct dm_cache_metadata *cmd, dm_cblock_t b,
 	unsigned flags;

 	r = dm_array_get_value(&cmd->info, cmd->root, from_cblock(b), &value);
-	if (r) {
-		DMERR("block_unmapped_or_clean failed");
+	if (r)
 		return r;
-	}

 	unpack_value(value, &ob, &flags);
 	*result = !((flags & M_VALID) && (flags & M_DIRTY));
@@ -849,7 +897,7 @@ static int block_unmapped_or_clean(struct dm_cache_metadata *cmd, dm_cblock_t b,
 	return 0;
 }

-static int blocks_are_unmapped_or_clean(struct dm_cache_metadata *cmd,
+static int blocks_are_clean_combined_dirty(struct dm_cache_metadata *cmd,
 					   dm_cblock_t begin, dm_cblock_t end,
 					   bool *result)
 {
@@ -857,9 +905,11 @@ static int blocks_are_unmapped_or_clean(struct dm_cache_metadata *cmd,
 	*result = true;

 	while (begin != end) {
-		r = block_unmapped_or_clean(cmd, begin, result);
-		if (r)
+		r = block_clean_combined_dirty(cmd, begin, result);
+		if (r) {
+			DMERR("block_clean_combined_dirty failed");
 			return r;
+		}

 		if (!*result) {
 			DMERR("cache block %llu is dirty",
@@ -873,6 +923,67 @@ static int blocks_are_unmapped_or_clean(struct dm_cache_metadata *cmd,
 	return 0;
 }

+static int blocks_are_clean_separate_dirty(struct dm_cache_metadata *cmd,
+					   dm_cblock_t begin, dm_cblock_t end,
+					   bool *result)
+{
+	int r;
+	bool dirty_flag;
+	*result = true;
+
+	r = dm_bitset_cursor_begin(&cmd->dirty_info, cmd->dirty_root,
+				   from_cblock(begin), &cmd->dirty_cursor);
+	if (r) {
+		DMERR("%s: dm_bitset_cursor_begin for dirty failed", __func__);
+		return r;
+	}
+
+	r = dm_bitset_cursor_skip(&cmd->dirty_cursor, from_cblock(begin));
+	if (r) {
+		DMERR("%s: dm_bitset_cursor_skip for dirty failed", __func__);
+		dm_bitset_cursor_end(&cmd->dirty_cursor);
+		return r;
+	}
+
+	while (begin != end) {
+		/*
+		 * We assume that unmapped blocks have their dirty bit
+		 * cleared.
+		 */
+		dirty_flag = dm_bitset_cursor_get_value(&cmd->dirty_cursor);
+		if (dirty_flag) {
+			DMERR("%s: cache block %llu is dirty", __func__,
+			      (unsigned long long) from_cblock(begin));
+			dm_bitset_cursor_end(&cmd->dirty_cursor);
+			*result = false;
+			return 0;
+		}
+
+		r = dm_bitset_cursor_next(&cmd->dirty_cursor);
+		if (r) {
+			DMERR("%s: dm_bitset_cursor_next for dirty failed", __func__);
+			dm_bitset_cursor_end(&cmd->dirty_cursor);
+			return r;
+		}
+
+		begin = to_cblock(from_cblock(begin) + 1);
+	}
+
+	dm_bitset_cursor_end(&cmd->dirty_cursor);
+
+	return 0;
+}
+
+static int blocks_are_unmapped_or_clean(struct dm_cache_metadata *cmd,
+					dm_cblock_t begin, dm_cblock_t end,
+					bool *result)
+{
+	if (separate_dirty_bits(cmd))
+		return blocks_are_clean_separate_dirty(cmd, begin, end, result);
+	else
+		return blocks_are_clean_combined_dirty(cmd, begin, end, result);
+}
+
 static bool cmd_write_lock(struct dm_cache_metadata *cmd)
 {
 	down_write(&cmd->root_lock);
@@ -950,7 +1061,17 @@ int dm_cache_resize(struct dm_cache_metadata *cmd, dm_cblock_t new_cache_size)
 	r = dm_array_resize(&cmd->info, cmd->root, from_cblock(cmd->cache_blocks),
 			    from_cblock(new_cache_size),
 			    &null_mapping, &cmd->root);
-	if (!r)
+	if (r)
+		goto out;
+
+	if (separate_dirty_bits(cmd)) {
+		r = dm_bitset_resize(&cmd->dirty_info, cmd->dirty_root,
+				     from_cblock(cmd->cache_blocks), from_cblock(new_cache_size),
+				     false, &cmd->dirty_root);
+		if (r)
+			goto out;
+	}
+
 	cmd->cache_blocks = new_cache_size;
 	cmd->changed = true;

@@ -995,14 +1116,6 @@ static int __clear_discard(struct dm_cache_metadata *cmd, dm_dblock_t b)
 				   from_dblock(b), &cmd->discard_root);
 }

-static int __is_discarded(struct dm_cache_metadata *cmd, dm_dblock_t b,
-			  bool *is_discarded)
-{
-	return dm_bitset_test_bit(&cmd->discard_info, cmd->discard_root,
-				  from_dblock(b), &cmd->discard_root,
-				  is_discarded);
-}
-
 static int __discard(struct dm_cache_metadata *cmd,
 		     dm_dblock_t dblock, bool discard)
 {
@@ -1032,24 +1145,40 @@ static int __load_discards(struct dm_cache_metadata *cmd,
 			   load_discard_fn fn, void *context)
 {
 	int r = 0;
-	dm_block_t b;
-	bool discard;
+	uint32_t b;
+	struct dm_bitset_cursor c;

-	for (b = 0; b < from_dblock(cmd->discard_nr_blocks); b++) {
-		dm_dblock_t dblock = to_dblock(b);
+	if (from_dblock(cmd->discard_nr_blocks) == 0)
+		/* nothing to do */
+		return 0;

 	if (cmd->clean_when_opened) {
-			r = __is_discarded(cmd, dblock, &discard);
+		r = dm_bitset_flush(&cmd->discard_info, cmd->discard_root, &cmd->discard_root);
+		if (r)
+			return r;
+
+		r = dm_bitset_cursor_begin(&cmd->discard_info, cmd->discard_root,
+					   from_dblock(cmd->discard_nr_blocks), &c);
 		if (r)
 			return r;
-		} else
-			discard = false;

-		r = fn(context, cmd->discard_block_size, dblock, discard);
+		for (b = 0; b < from_dblock(cmd->discard_nr_blocks); b++) {
+			r = fn(context, cmd->discard_block_size, to_dblock(b),
+			       dm_bitset_cursor_get_value(&c));
 			if (r)
 				break;
 		}

+		dm_bitset_cursor_end(&c);
+
+	} else {
+		for (b = 0; b < from_dblock(cmd->discard_nr_blocks); b++) {
+			r = fn(context, cmd->discard_block_size, to_dblock(b), false);
+			if (r)
+				return r;
+		}
+	}
+
 	return r;
 }

@@ -1177,7 +1306,7 @@ static bool hints_array_available(struct dm_cache_metadata *cmd,
 		hints_array_initialized(cmd);
 }

-static int __load_mapping(struct dm_cache_metadata *cmd,
+static int __load_mapping_v1(struct dm_cache_metadata *cmd,
 			     uint64_t cb, bool hints_valid,
 			     struct dm_array_cursor *mapping_cursor,
 			     struct dm_array_cursor *hint_cursor,
@@ -1206,8 +1335,51 @@ static int __load_mapping(struct dm_cache_metadata *cmd,

 		r = fn(context, oblock, to_cblock(cb), flags & M_DIRTY,
 		       le32_to_cpu(hint), hints_valid);
-		if (r)
-			DMERR("policy couldn't load cblock");
+		if (r) {
+			DMERR("policy couldn't load cache block %llu",
+			      (unsigned long long) from_cblock(to_cblock(cb)));
+		}
+	}
+
+	return r;
+}
+
+static int __load_mapping_v2(struct dm_cache_metadata *cmd,
+			     uint64_t cb, bool hints_valid,
+			     struct dm_array_cursor *mapping_cursor,
+			     struct dm_array_cursor *hint_cursor,
+			     struct dm_bitset_cursor *dirty_cursor,
+			     load_mapping_fn fn, void *context)
+{
+	int r = 0;
+
+	__le64 mapping;
+	__le32 hint = 0;
+
+	__le64 *mapping_value_le;
+	__le32 *hint_value_le;
+
+	dm_oblock_t oblock;
+	unsigned flags;
+	bool dirty;
+
+	dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le);
+	memcpy(&mapping, mapping_value_le, sizeof(mapping));
+	unpack_value(mapping, &oblock, &flags);
+
+	if (flags & M_VALID) {
+		if (hints_valid) {
+			dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le);
+			memcpy(&hint, hint_value_le, sizeof(hint));
+		}
+
+		dirty = dm_bitset_cursor_get_value(dirty_cursor);
+		r = fn(context, oblock, to_cblock(cb), dirty,
+		       le32_to_cpu(hint), hints_valid);
+		if (r) {
+			DMERR("policy couldn't load cache block %llu",
+			      (unsigned long long) from_cblock(to_cblock(cb)));
+		}
 	}

 	return r;
@@ -1238,8 +1410,26 @@ static int __load_mappings(struct dm_cache_metadata *cmd,
 		}
 	}

+	if (separate_dirty_bits(cmd)) {
+		r = dm_bitset_cursor_begin(&cmd->dirty_info, cmd->dirty_root,
+					   from_cblock(cmd->cache_blocks),
+					   &cmd->dirty_cursor);
+		if (r) {
+			dm_array_cursor_end(&cmd->hint_cursor);
+			dm_array_cursor_end(&cmd->mapping_cursor);
+			return r;
+		}
+	}
+
 	for (cb = 0; ; cb++) {
-		r = __load_mapping(cmd, cb, hints_valid,
+		if (separate_dirty_bits(cmd))
+			r = __load_mapping_v2(cmd, cb, hints_valid,
+					      &cmd->mapping_cursor,
+					      &cmd->hint_cursor,
+					      &cmd->dirty_cursor,
+					      fn, context);
+		else
+			r = __load_mapping_v1(cmd, cb, hints_valid,
 					      &cmd->mapping_cursor, &cmd->hint_cursor,
 					      fn, context);
 		if (r)
@@ -1264,12 +1454,23 @@ static int __load_mappings(struct dm_cache_metadata *cmd,
 				goto out;
 			}
 		}
+
+		if (separate_dirty_bits(cmd)) {
+			r = dm_bitset_cursor_next(&cmd->dirty_cursor);
+			if (r) {
+				DMERR("dm_bitset_cursor_next for dirty failed");
+				goto out;
+			}
+		}
 	}
 out:
 	dm_array_cursor_end(&cmd->mapping_cursor);
 	if (hints_valid)
 		dm_array_cursor_end(&cmd->hint_cursor);

+	if (separate_dirty_bits(cmd))
+		dm_bitset_cursor_end(&cmd->dirty_cursor);
+
 	return r;
 }

@@ -1352,13 +1553,55 @@ static int __dirty(struct dm_cache_metadata *cmd, dm_cblock_t cblock, bool dirty

 }

-int dm_cache_set_dirty(struct dm_cache_metadata *cmd,
-		       dm_cblock_t cblock, bool dirty)
+static int __set_dirty_bits_v1(struct dm_cache_metadata *cmd, unsigned nr_bits, unsigned long *bits)
+{
+	int r;
+	unsigned i;
+	for (i = 0; i < nr_bits; i++) {
+		r = __dirty(cmd, to_cblock(i), test_bit(i, bits));
+		if (r)
+			return r;
+	}
+
+	return 0;
+}
+
+static int is_dirty_callback(uint32_t index, bool *value, void *context)
+{
+	unsigned long *bits = context;
+	*value = test_bit(index, bits);
+	return 0;
+}
+
+static int __set_dirty_bits_v2(struct dm_cache_metadata *cmd, unsigned nr_bits, unsigned long *bits)
+{
+	int r = 0;
+
+	/* nr_bits is really just a sanity check */
+	if (nr_bits != from_cblock(cmd->cache_blocks)) {
+		DMERR("dirty bitset is wrong size");
+		return -EINVAL;
+	}
+
+	r = dm_bitset_del(&cmd->dirty_info, cmd->dirty_root);
+	if (r)
+		return r;
+
+	cmd->changed = true;
+	return dm_bitset_new(&cmd->dirty_info, &cmd->dirty_root, nr_bits, is_dirty_callback, bits);
+}
+
+int dm_cache_set_dirty_bits(struct dm_cache_metadata *cmd,
+			    unsigned nr_bits,
+			    unsigned long *bits)
 {
 	int r;

 	WRITE_LOCK(cmd);
-	r = __dirty(cmd, cblock, dirty);
+	if (separate_dirty_bits(cmd))
+		r = __set_dirty_bits_v2(cmd, nr_bits, bits);
+	else
+		r = __set_dirty_bits_v1(cmd, nr_bits, bits);
 	WRITE_UNLOCK(cmd);

 	return r;
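
In the format 2 path above, `__set_dirty_bits_v2()` deletes the old on-disk bitset and rebuilds it in one pass, passing `dm_bitset_new()` a callback that reads each bit from the in-core bitmap. The following userspace sketch imitates only that callback pattern. It says nothing about how `dm_bitset_new()` is implemented internally, and all `demo_*` names are hypothetical stand-ins rather than persistent-data APIs.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for the kernel's test_bit(). */
static bool demo_test_bit(uint32_t index, const unsigned long *bits)
{
	return (bits[index / DEMO_BITS_PER_LONG] >> (index % DEMO_BITS_PER_LONG)) & 1UL;
}

/* Callback type: report the value of bit 'index' through 'value'. */
typedef int (*demo_bit_fn)(uint32_t index, bool *value, void *context);

/* Same shape as is_dirty_callback(): context is the in-core bitmap. */
static int demo_is_dirty_callback(uint32_t index, bool *value, void *context)
{
	unsigned long *bits = context;

	*value = demo_test_bit(index, bits);
	return 0;
}

/*
 * Build a packed bitset of 64-bit words with one callback call per bit.
 * 'out' must have room for (nr_bits + 63) / 64 words.
 */
static int demo_bitset_new(uint64_t *out, uint32_t nr_bits,
			   demo_bit_fn fn, void *context)
{
	uint32_t i;

	for (i = 0; i < (nr_bits + 63) / 64; i++)
		out[i] = 0;

	for (i = 0; i < nr_bits; i++) {
		bool value;
		int r = fn(i, &value, context);

		if (r)
			return r;
		if (value)
			out[i / 64] |= (uint64_t)1 << (i % 64);
	}

	return 0;
}
```

The design point the sketch tries to capture is that the producer of the bits (the callback) is decoupled from the consumer that lays them out, so the whole bitset can be written sequentially instead of being updated one mapping at a time as in the format 1 loop.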
+7 −4
@@ -45,18 +45,20 @@
  * As these various flags are defined they should be added to the
  * following masks.
  */
+
 #define DM_CACHE_FEATURE_COMPAT_SUPP	  0UL
 #define DM_CACHE_FEATURE_COMPAT_RO_SUPP	  0UL
 #define DM_CACHE_FEATURE_INCOMPAT_SUPP	  0UL

 /*
- * Reopens or creates a new, empty metadata volume.
- * Returns an ERR_PTR on failure.
+ * Reopens or creates a new, empty metadata volume.  Returns an ERR_PTR on
+ * failure.  If reopening then features must match.
  */
 struct dm_cache_metadata *dm_cache_metadata_open(struct block_device *bdev,
 						 sector_t data_block_size,
 						 bool may_format_device,
-						 size_t policy_hint_size);
+						 size_t policy_hint_size,
+						 unsigned metadata_version);

 void dm_cache_metadata_close(struct dm_cache_metadata *cmd);

@@ -91,7 +93,8 @@ int dm_cache_load_mappings(struct dm_cache_metadata *cmd,
 			   load_mapping_fn fn,
 			   void *context);

-int dm_cache_set_dirty(struct dm_cache_metadata *cmd, dm_cblock_t cblock, bool dirty);
+int dm_cache_set_dirty_bits(struct dm_cache_metadata *cmd,
+			    unsigned nr_bits, unsigned long *bits);

 struct dm_cache_statistics {
 	uint32_t read_hits;
+25 −19
@@ -179,6 +179,7 @@ enum cache_io_mode {
 struct cache_features {
 	enum cache_metadata_mode mode;
 	enum cache_io_mode io_mode;
+	unsigned metadata_version;
 };

 struct cache_stats {
@@ -248,7 +249,7 @@ struct cache {
 	/*
 	 * Fields for converting from sectors to blocks.
 	 */
-	uint32_t sectors_per_block;
+	sector_t sectors_per_block;
 	int sectors_per_block_shift;

 	spinlock_t lock;
@@ -2534,13 +2535,14 @@ static void init_features(struct cache_features *cf)
 {
 	cf->mode = CM_WRITE;
 	cf->io_mode = CM_IO_WRITEBACK;
+	cf->metadata_version = 1;
 }

 static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 			  char **error)
 {
 	static struct dm_arg _args[] = {
-		{0, 1, "Invalid number of cache feature arguments"},
+		{0, 2, "Invalid number of cache feature arguments"},
 	};

 	int r;
@@ -2566,6 +2568,9 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 		else if (!strcasecmp(arg, "passthrough"))
 			cf->io_mode = CM_IO_PASSTHROUGH;

+		else if (!strcasecmp(arg, "metadata2"))
+			cf->metadata_version = 2;
+
 		else {
 			*error = "Unrecognised cache feature requested";
 			return -EINVAL;
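
Note on the two parse_features() hunks above: raising the upper bound from 1 to 2 lets "metadata2" be combined with an io mode. A minimal userspace sketch of that behaviour follows, assuming a simplified stand-in for `struct cache_features`; `features_sketch` and `parse_features_sketch` are illustrative names only, and the real function reads its arguments through the DM argument helpers rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp() */

/* Simplified, hypothetical stand-in for struct cache_features. */
struct features_sketch {
	int passthrough;		/* 0 = writeback (the default) */
	unsigned metadata_version;	/* 1 by default, 2 with "metadata2" */
};

/* Returns 0 on success, -1 on an unknown feature or too many arguments. */
static int parse_features_sketch(struct features_sketch *f,
				 const char *const *argv, size_t argc)
{
	size_t i;

	assert(f != NULL);
	f->passthrough = 0;
	f->metadata_version = 1;

	if (argc > 2)	/* mirrors the {0, 2, ...} bound in the hunk */
		return -1;

	for (i = 0; i < argc; i++) {
		if (!strcasecmp(argv[i], "passthrough"))
			f->passthrough = 1;
		else if (!strcasecmp(argv[i], "metadata2"))
			f->metadata_version = 2;
		else
			return -1;
	}
	return 0;
}
```

Matching is case-insensitive, as in the hunk, and a table line without "metadata2" keeps metadata version 1.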
@@ -2820,7 +2825,8 @@ static int cache_create(struct cache_args *ca, struct cache **result)


 	cmd = dm_cache_metadata_open(cache->metadata_dev->bdev,
 				     ca->block_size, may_format,
-				     dm_cache_policy_get_hint_size(cache->policy));
+				     dm_cache_policy_get_hint_size(cache->policy),
+				     ca->features.metadata_version);
 	if (IS_ERR(cmd)) {
 		*error = "Error creating metadata object";
 		r = PTR_ERR(cmd);
@@ -3165,21 +3171,16 @@ static int cache_end_io(struct dm_target *ti, struct bio *bio, int error)


 static int write_dirty_bitset(struct cache *cache)
 {
-	unsigned i, r;
+	int r;

 	if (get_cache_mode(cache) >= CM_READ_ONLY)
 		return -EINVAL;

-	for (i = 0; i < from_cblock(cache->cache_size); i++) {
-		r = dm_cache_set_dirty(cache->cmd, to_cblock(i),
-				       is_dirty(cache, to_cblock(i)));
-		if (r) {
-			metadata_operation_failed(cache, "dm_cache_set_dirty", r);
-			return r;
-		}
-	}
+	r = dm_cache_set_dirty_bits(cache->cmd, from_cblock(cache->cache_size), cache->dirty_bitset);
+	if (r)
+		metadata_operation_failed(cache, "dm_cache_set_dirty_bits", r);

-	return 0;
+	return r;
 }

 static int write_discard_bitset(struct cache *cache)
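
Note on write_dirty_bitset() above: the caller now hands the whole in-core bitset to the metadata layer in one call instead of issuing one call per cache block. How dm_cache_set_dirty_bits() stores the bits is not shown in this section, so the sketch below models only the calling convention, assuming a plain `unsigned long` bitmap and a byte-per-block store; `test_bit_sketch` and `set_dirty_bits_sketch` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Userspace bit test on an unsigned long bitmap. */
static int test_bit_sketch(const unsigned long *bits, unsigned nr)
{
	return (bits[nr / BITS_PER_LONG_SKETCH] >>
		(nr % BITS_PER_LONG_SKETCH)) & 1UL;
}

/* Hypothetical stand-in for the metadata store: one byte per cache
 * block.  The whole bitmap is consumed in a single call. */
static int set_dirty_bits_sketch(unsigned char *store, unsigned nr_bits,
				 const unsigned long *bits)
{
	unsigned i;

	assert(store != NULL && bits != NULL);
	for (i = 0; i < nr_bits; i++)
		store[i] = (unsigned char)test_bit_sketch(bits, i);
	return 0;
}
```

Because the failure path no longer returns from inside a loop, the rewritten function can report the error once and return `r` directly, as the hunk does.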
@@ -3540,11 +3541,11 @@ static void cache_status(struct dm_target *ti, status_type_t type,


 		residency = policy_residency(cache->policy);

-		DMEMIT("%u %llu/%llu %u %llu/%llu %u %u %u %u %u %u %lu ",
+		DMEMIT("%u %llu/%llu %llu %llu/%llu %u %u %u %u %u %u %lu ",
 		       (unsigned)DM_CACHE_METADATA_BLOCK_SIZE,
 		       (unsigned long long)(nr_blocks_metadata - nr_free_blocks_metadata),
 		       (unsigned long long)nr_blocks_metadata,
-		       cache->sectors_per_block,
+		       (unsigned long long)cache->sectors_per_block,
 		       (unsigned long long) from_cblock(residency),
 		       (unsigned long long) from_cblock(cache->cache_size),
 		       (unsigned) atomic_read(&cache->stats.read_hit),
@@ -3555,14 +3556,19 @@ static void cache_status(struct dm_target *ti, status_type_t type,
 		       (unsigned) atomic_read(&cache->stats.promotion),
 		       (unsigned long) atomic_read(&cache->nr_dirty));

+		if (cache->features.metadata_version == 2)
+			DMEMIT("2 metadata2 ");
+		else
+			DMEMIT("1 ");
+
 		if (writethrough_mode(&cache->features))
-			DMEMIT("1 writethrough ");
+			DMEMIT("writethrough ");

 		else if (passthrough_mode(&cache->features))
-			DMEMIT("1 passthrough ");
+			DMEMIT("passthrough ");

 		else if (writeback_mode(&cache->features))
-			DMEMIT("1 writeback ");
+			DMEMIT("writeback ");

 		else {
 			DMERR("%s: internal error: unknown io mode: %d",
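
Note on the cache_status() hunk above: the feature count is now emitted before the io mode, so the count reflects whether "metadata2" is present. A small userspace sketch of the resulting feature fragment follows, assuming `snprintf` in place of DMEMIT; `emit_features_sketch` is a hypothetical name and the io mode is passed in as a string for brevity.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the feature-count prefix and io mode into buf, in the order
 * used by the hunk.  Returns the number of characters that the full
 * fragment needs, as snprintf() does. */
static int emit_features_sketch(char *buf, size_t len,
				unsigned metadata_version, const char *io_mode)
{
	assert(buf != NULL && io_mode != NULL);
	if (metadata_version == 2)
		return snprintf(buf, len, "2 metadata2 %s ", io_mode);
	return snprintf(buf, len, "1 %s ", io_mode);
}
```

With metadata version 2 and writeback mode the fragment reads `2 metadata2 writeback `; with version 1 it stays `1 writeback `, which matches the pre-patch output.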
@@ -3810,7 +3816,7 @@ static void cache_io_hints(struct dm_target *ti, struct queue_limits *limits)


 static struct target_type cache_target = {
 	.name = "cache",
-	.version = {1, 9, 0},
+	.version = {1, 10, 0},
 	.module = THIS_MODULE,
 	.ctr = cache_ctr,
 	.dtr = cache_dtr,