
Commit 6597ac8a authored by Linus Torvalds
Pull device mapper updates from Mike Snitzer:

 - DM core cleanups:

     * blk-mq request-based DM no longer uses any mempools now that
       partial completions are no longer handled as part of cloned
       requests

 - DM raid cleanups and support for MD raid0

 - DM cache core advances and a new stochastic-multi-queue (smq) cache
   replacement policy

     * smq is the new default dm-cache policy

 - DM thinp cleanups and much more efficient large discard support

 - DM statistics support for request-based DM and nanosecond resolution
   timestamps

 - Fixes to DM stripe, DM log-writes, DM raid1 and DM crypt

* tag 'dm-4.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (39 commits)
  dm stats: add support for request-based DM devices
  dm stats: collect and report histogram of IO latencies
  dm stats: support precise timestamps
  dm stats: fix divide by zero if 'number_of_areas' arg is zero
  dm cache: switch the "default" cache replacement policy from mq to smq
  dm space map metadata: fix occasional leak of a metadata block on resize
  dm thin metadata: fix a race when entering fail mode
  dm thin: fail messages with EOPNOTSUPP when pool cannot handle messages
  dm thin: range discard support
  dm thin metadata: add dm_thin_remove_range()
  dm thin metadata: add dm_thin_find_mapped_range()
  dm btree: add dm_btree_remove_leaves()
  dm stats: Use kvfree() in dm_kvfree()
  dm cache: age and write back cache entries even without active IO
  dm cache: prefix all DMERR and DMINFO messages with cache device name
  dm cache: add fail io mode and needs_check flag
  dm cache: wake the worker thread every time we free a migration object
  dm cache: add stochastic-multi-queue (smq) policy
  dm cache: boost promotion of blocks that will be overwritten
  dm cache: defer whole cells
  ...
parents e4bc13ad e262f347
Documentation/device-mapper/cache-policies.txt +64 −3
@@ -25,10 +25,10 @@ trying to see when the io scheduler has let the ios run.
Overview of supplied cache replacement policies
===============================================

-multiqueue
-----------
+multiqueue (mq)
+---------------

-This policy is the default.
+This policy has been deprecated in favor of the smq policy (see below).

The multiqueue policy has three sets of 16 queues: one set for entries
waiting for the cache and another two for those in the cache (a set for
@@ -73,6 +73,67 @@ If you're trying to quickly warm a new cache device you may wish to
reduce these to encourage promotion.  Remember to switch them back to
their defaults after the cache fills though.

Stochastic multiqueue (smq)
---------------------------

This policy is the default.

The stochastic multi-queue (smq) policy addresses some of the problems
with the multiqueue (mq) policy.

The smq policy (vs mq) offers the promise of less memory utilization,
improved performance and increased adaptability in the face of changing
workloads.  SMQ also does not have any cumbersome tuning knobs.

Users may switch from "mq" to "smq" simply by appropriately reloading a
DM table that is using the cache target.  Doing so will cause all of the
mq policy's hints to be dropped.  Also, performance of the cache may
degrade slightly until smq recalculates the origin device's hotspots
that should be cached.
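
For example (an illustrative sequence, not taken from this commit; the
device name 'my-cache' and all table values are hypothetical), a cache
device can be switched to smq with no policy arguments by loading a new
table and resuming:

    dmsetup suspend my-cache
    dmsetup load my-cache --table '0 41943040 cache /dev/mapper/meta \
        /dev/mapper/ssd /dev/mapper/origin 512 1 writeback smq 0'
    dmsetup resume my-cache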

Memory usage:
The mq policy uses a lot of memory; 88 bytes per cache block on a
64-bit machine.

SMQ uses 28-bit indexes to implement its data structures rather than
pointers.  It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue, rather than a pre-cache, which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All of this means smq uses ~25 bytes per cache block.  Still a lot of
memory, but a substantial improvement nonetheless.
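
As a rough illustration of the indexing trick (a hypothetical sketch,
not the actual dm-cache-policy-smq structures; 'entry_pool', 'to_entry'
and 'to_index' are invented names), entries can live in one
preallocated array and link to each other through packed 28-bit
indexes instead of 64-bit pointers:

    #include <stdint.h>
    #include <stddef.h>

    #define NULL_INDEX ((1u << 28) - 1)    /* sentinel: "no entry" */

    /*
     * Links are 28-bit array indexes packed into 32-bit words, not
     * 8-byte pointers, so a doubly linked node costs 8 bytes instead
     * of 16.  The pool can hold at most (1 << 28) - 1 entries because
     * one index value is reserved as the NULL sentinel.
     */
    struct entry {
            uint32_t next : 28;
            uint32_t level : 4;     /* multiqueue level of this entry */
            uint32_t prev : 28;
            uint32_t dirty : 1;
            uint32_t unused : 3;
    };

    static struct entry *entry_pool;   /* one allocation at load time */

    static inline struct entry *to_entry(uint32_t i)
    {
            return i == NULL_INDEX ? NULL : entry_pool + i;
    }

    static inline uint32_t to_index(const struct entry *e)
    {
            return e ? (uint32_t)(e - entry_pool) : NULL_INDEX;
    }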

Level balancing:
MQ places entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This means the bottom
levels generally have the most entries, and the top ones have very
few.  Having unbalanced levels like this reduces the efficacy of the
multiqueue.

SMQ does not maintain a hit count; instead it swaps hit entries with
the least recently used entry from the level above.  The overall
ordering is a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.
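
A toy model of that swap (hypothetical code, much simplified from the
real policy): each level holds a fixed number of slots ordered from
most to least recently used, and a hit trades places with the LRU slot
one level up rather than incrementing a counter:

    #include <stdio.h>

    enum { NR_LEVELS = 4, PER_LEVEL = 4 };

    /*
     * levels[l][0] is the MRU slot, levels[l][PER_LEVEL - 1] the LRU
     * slot; per-level populations never change, so levels stay balanced.
     */
    static int levels[NR_LEVELS][PER_LEVEL];

    static void hit(int lvl, int pos)
    {
            int tmp;

            if (lvl == NR_LEVELS - 1)
                    return;            /* already in the top level */

            /* swap the hit block with the LRU block one level up */
            tmp = levels[lvl + 1][PER_LEVEL - 1];
            levels[lvl + 1][PER_LEVEL - 1] = levels[lvl][pos];
            levels[lvl][pos] = tmp;
    }

    int main(void)
    {
            int l, p;

            for (l = 0; l < NR_LEVELS; l++)     /* block ids 0..15 */
                    for (p = 0; p < PER_LEVEL; p++)
                            levels[l][p] = l * PER_LEVEL + p;

            hit(0, 2);  /* block 2 is hit: it moves up, block 7 drops */
            printf("level 1 LRU slot now holds block %d\n",
                   levels[1][PER_LEVEL - 1]);
            return 0;
    }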

Adaptability:
The MQ policy maintains a hit count for each cache block.  For a
different block to get promoted to the cache, its hit count has to
exceed the lowest count currently in the cache.  This means it can take a
long time for the cache to adapt between varying IO patterns.
Periodically degrading the hit counts could help with this, but I
haven't found a nice general solution.

SMQ doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote.  If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.

Performance:
Testing SMQ shows substantially better performance than MQ.

cleaner
-------

Documentation/device-mapper/cache.txt +7 −2
@@ -221,6 +221,7 @@ Status
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty> <#features> <features>*
<#core args> <core args>* <policy name> <#policy args> <policy args>*
<cache metadata mode>

metadata block size	 : Fixed block size for each metadata block in
			     sectors
@@ -251,8 +252,12 @@ core args : Key/value pairs for tuning the core
			     e.g. migration_threshold
policy name		 : Name of the policy
#policy args		 : Number of policy arguments to follow (must be even)
-policy args		 : Key/value pairs
-			     e.g. sequential_threshold
+policy args		 : Key/value pairs e.g. sequential_threshold
cache metadata mode      : ro if read-only, rw if read-write
	In serious cases where even a read-only mode is deemed unsafe
	no further I/O will be permitted and the status will just
	contain the string 'Fail'.  The userspace recovery tools
	should then be used.
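
For instance (a constructed illustration following the field order
above, not real kernel output), the tail of a status line for a
healthy writethrough cache running the smq policy might read:

    ... 1 writethrough 2 migration_threshold 2048 smq 0 rw

where the final 'rw' is the new cache metadata mode field.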

Messages
--------
Documentation/device-mapper/dm-raid.txt +2 −0
@@ -224,3 +224,5 @@ Version History
	New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
1.5.1   Add ability to restore transiently failed devices on resume.
1.5.2   'mismatch_cnt' is zero unless [last_]sync_action is "check".
1.6.0   Add discard support (and devices_handle_discard_safely module param).
1.7.0   Add support for MD RAID0 mappings.
Documentation/device-mapper/statistics.txt +37 −4
@@ -13,9 +13,14 @@ the range specified.
The I/O statistics counters for each step-sized area of a region are
in the same format as /sys/block/*/stat or /proc/diskstats (see:
Documentation/iostats.txt).  But two extra counters (12 and 13) are
-provided: total time spent reading and writing in milliseconds.  All
-these counters may be accessed by sending the @stats_print message to
-the appropriate DM device via dmsetup.
+provided: total time spent reading and writing.  When the histogram
+argument is used, a 14th parameter is reported that represents the
+histogram of latencies.  All these counters may be accessed by sending
+the @stats_print message to the appropriate DM device via dmsetup.
+
+The reported times are in milliseconds and the granularity depends on
+the kernel ticks.  When the option precise_timestamps is used, the
+reported times are in nanoseconds.

Each region has a corresponding unique identifier, which we call a
region_id, that is assigned when the region is created.	 The region_id
@@ -33,7 +38,9 @@ memory is used by reading
Messages
========

-    @stats_create <range> <step> [<program_id> [<aux_data>]]
+    @stats_create <range> <step>
+		[<number_of_optional_arguments> <optional_arguments>...]
+		[<program_id> [<aux_data>]]

	Create a new region and return the region_id.

@@ -48,6 +55,29 @@ Messages
	  "/<number_of_areas>" - the range is subdivided into the specified
				 number of areas.

	<number_of_optional_arguments>
	  The number of optional arguments

	<optional_arguments>
	  The following optional arguments are supported
	  precise_timestamps - use precise timer with nanosecond resolution
		instead of the "jiffies" variable.  When this argument is
		used, the resulting times are in nanoseconds instead of
		milliseconds.  Precise timestamps are a little bit slower
		to obtain than jiffies-based timestamps.
	  histogram:n1,n2,n3,n4,... - collect histogram of latencies.  The
		numbers n1, n2, etc are times that represent the boundaries
		of the histogram.  If precise_timestamps is not used, the
		times are in milliseconds, otherwise they are in
		nanoseconds.  For each range, the kernel will report the
		number of requests that completed within this range. For
		example, if we use "histogram:10,20,30", the kernel will
		report four numbers a:b:c:d. a is the number of requests
		that took 0-10 ms to complete, b is the number of requests
		that took 10-20 ms to complete, c is the number of requests
		that took 20-30 ms to complete and d is the number of
		requests that took more than 30 ms to complete.

	<program_id>
	  An optional parameter.  A name that uniquely identifies
	  the userspace owner of the range.  This groups ranges together
@@ -55,6 +85,9 @@ Messages
	  created and ignore those created by others.
	  The kernel returns this string back in the output of
	  @stats_list message, but it doesn't use it for anything else.
	  If the number of optional arguments is omitted, the program_id
	  must not be a number; otherwise it would be interpreted as the
	  number of optional arguments.

	<aux_data>
	  An optional parameter.  A word that provides auxiliary data
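
Putting the new @stats_create arguments together (an illustrative
invocation; the device name 'statdev' and the boundary values are
hypothetical), a region covering the whole device, split into 100
areas, with nanosecond timestamps and a four-bucket latency histogram,
could be created with:

    dmsetup message statdev 0 "@stats_create - /100 2 precise_timestamps histogram:5000000,10000000,20000000"

Because precise_timestamps is given, the histogram boundaries above
are in nanoseconds (5 ms, 10 ms and 20 ms).  The counters can then be
read back by sending "@stats_print <region_id>" through the same
dmsetup message interface.
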
drivers/md/Kconfig +12 −0
@@ -304,6 +304,18 @@ config DM_CACHE_MQ
         This is meant to be a general purpose policy.  It prioritises
         reads over writes.

config DM_CACHE_SMQ
       tristate "Stochastic MQ Cache Policy (EXPERIMENTAL)"
       depends on DM_CACHE
       default y
       ---help---
         A cache policy that uses a multiqueue ordered by recent hits
         to select which blocks should be promoted and demoted.
         This is meant to be a general purpose policy.  It prioritises
         reads over writes.  This SMQ policy (vs MQ) offers the promise
         of less memory utilization, improved performance and increased
         adaptability in the face of changing workloads.
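
For instance (assuming the standard Kconfig symbol-to-config naming;
the module name is an inference, since the Makefile is not shown in
this excerpt), building the policy as a module would appear in .config
as:

    CONFIG_DM_CACHE_SMQ=m

producing a dm-cache-smq module alongside the existing mq one, so both
policies can coexist and be selected per target via the DM table.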

config DM_CACHE_CLEANER
       tristate "Cleaner Cache Policy (EXPERIMENTAL)"
       depends on DM_CACHE