Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 0be600a5 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge tag 'for-4.16/dm-changes' of...

Merge tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - DM core fixes to ensure that bio submission follows a depth-first
   tree walk; this is critical to allow forward progress without the
   need to use the bioset's BIOSET_NEED_RESCUER.

 - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.

 - DM core cleanups and improvements to make bio-based DM more efficient
   (e.g. reduced memory footprint as well leveraging per-bio-data more).

 - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
   the more direct IO submission path in the block layer; this mode is
   used by DM multipath and also optimizes targets like DM thin-pool
   that stack directly on NVMe data device.

 - DM multipath improvements to factor out legacy SCSI-only (e.g.
   scsi_dh) code paths to allow for more optimized support for NVMe
   multipath.

 - A fix for DM multipath path selectors (service-time and queue-length)
   to select paths in a more balanced way; largely academic but doesn't
   hurt.

 - Numerous DM raid target fixes and improvements.

 - Add a new DM "unstriped" target that enables Intel to workaround
   firmware limitations in some NVMe drives that are striped internally
   (this target also works when stacked above the DM "striped" target).

 - Various Documentation fixes and improvements.

 - Misc cleanups and fixes across various DM infrastructure and targets
   (e.g. bufio, flakey, log-writes, snapshot).

* tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits)
  dm cache: Documentation: update default migration_throttling value
  dm mpath selector: more evenly distribute ties
  dm unstripe: fix target length versus number of stripes size check
  dm thin: fix trailing semicolon in __remap_and_issue_shared_cell
  dm table: fix NVMe bio-based dm_table_determine_type() validation
  dm: various cleanups to md->queue initialization code
  dm mpath: delay the retry of a request if the target responded as busy
  dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED
  dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure
  dm log writes: fix max length used for kstrndup
  dm: backfill missing calls to mutex_destroy()
  dm snapshot: use mutex instead of rw_semaphore
  dm flakey: check for null arg_name in parse_features()
  dm thin: extend thinpool status format string with omitted fields
  dm thin: fixes in thin-provisioning.txt
  dm thin: document representation of <highest mapped sector> when there is none
  dm thin: fix documentation relative to low water mark threshold
  dm cache: be consistent in specifying sectors and SI units in cache.txt
  dm cache: delete obsoleted paragraph in cache.txt
  dm cache: fix grammar in cache-policies.txt
  ...
parents 040639b7 9614e2ba
Loading
Loading
Loading
Loading
+2 −2
Original line number Diff line number Diff line
@@ -60,7 +60,7 @@ Memory usage:
The mq policy used a lot of memory; 88 bytes per cache block on a 64
bit machine.

smq uses 28bit indexes to implement it's data structures rather than
smq uses 28bit indexes to implement its data structures rather than
pointers.  It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
@@ -84,7 +84,7 @@ resulting in better promotion/demotion decisions.

Adaptability:
The mq policy maintained a hit count for each cache block.  For a
different block to get promoted to the cache it's hit count has to
different block to get promoted to the cache its hit count has to
exceed the lowest currently in the cache.  This meant it could take a
long time for the cache to adapt between varying IO patterns.

+2 −7
Original line number Diff line number Diff line
@@ -59,7 +59,7 @@ Fixed block size
The origin is divided up into blocks of a fixed size.  This block size
is configurable when you first create the cache.  Typically we've been
using block sizes of 256KB - 1024KB.  The block size must be between 64
(32KB) and 2097152 (1GB) and a multiple of 64 (32KB).
sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).

Having a fixed block size simplifies the target a lot.  But it is
something of a compromise.  For instance, a small part of a block may be
@@ -119,7 +119,7 @@ doing here to avoid migrating during those peak io moments.

For the time being, a message "migration_threshold <#sectors>"
can be used to set the maximum number of sectors being migrated,
the default being 204800 sectors (or 100MB).
the default being 2048 sectors (1MB).

Updating on-disk metadata
-------------------------
@@ -143,11 +143,6 @@ the policy how big this chunk is, but it should be kept small. Like the
dirty flags this data is lost if there's a crash so a safe fallback
value should always be possible.

For instance, the 'mq' policy, which is currently the default policy,
uses this facility to store the hit count of the cache blocks.  If
there's a crash this information will be lost, which means the cache
may be less efficient until those hit counts are regenerated.

Policy hints affect performance, not correctness.

Policy messaging
+4 −1
Original line number Diff line number Diff line
@@ -343,5 +343,8 @@ Version History
1.11.0  Fix table line argument order
	(wrong raid10_copies/raid10_format sequence)
1.11.1  Add raid4/5/6 journal write-back support via journal_mode option
1.12.1  fix for MD deadlock between mddev_suspend() and md_write_start() available
1.12.1  Fix for MD deadlock between mddev_suspend() and md_write_start() available
1.13.0  Fix dev_health status at end of "recover" (was 'a', now 'A')
1.13.1  Fix deadlock caused by early md_stop_writes().  Also fix size an
	state races.
1.13.2  Fix raid redundancy validation and avoid keeping raid set frozen
+4 −0
Original line number Diff line number Diff line
@@ -49,6 +49,10 @@ The difference between persistent and transient is with transient
snapshots less metadata must be saved on disk - they can be kept in
memory by the kernel.

When loading or unloading the snapshot target, the corresponding
snapshot-origin or snapshot-merge target must be suspended. A failure to
suspend the origin target could result in data corruption.


* snapshot-merge <origin> <COW device> <persistent> <chunksize>

+10 −4
Original line number Diff line number Diff line
@@ -112,9 +112,11 @@ $low_water_mark is expressed in blocks of size $data_block_size. If
free space on the data device drops below this level then a dm event
will be triggered which a userspace daemon should catch allowing it to
extend the pool device.  Only one such event will be sent.
Resuming a device with a new table itself triggers an event so the
userspace daemon can use this to detect a situation where a new table
already exceeds the threshold.

No special event is triggered if a just resumed device's free space is below
the low water mark. However, resuming a device always triggers an
event; a userspace daemon should verify that free space exceeds the low
water mark when handling this event.

A low water mark for the metadata device is maintained in the kernel and
will trigger a dm event if free space on the metadata device drops below
@@ -274,7 +276,8 @@ ii) Status

    <transaction id> <used metadata blocks>/<total metadata blocks>
    <used data blocks>/<total data blocks> <held metadata root>
    [no_]discard_passdown ro|rw
    ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
    needs_check|-

    transaction id:
	A 64-bit number used by userspace to help synchronise with metadata
@@ -394,3 +397,6 @@ ii) Status
	If the pool has encountered device errors and failed, the status
	will just contain the string 'Fail'.  The userspace recovery
	tools should then be used.

    In the case where <nr mapped sectors> is 0, there is no highest
    mapped sector and the value of <highest mapped sector> is unspecified.
Loading