Merge branch 'md-next' into md-linus (e265eb3a) · Commits · e / devices / android_kernel_teracube_emerald

Documentation/admin-guide/md.rst

+29 −3

Original line number	Diff line number	Diff line
		@@ -401,7 +401,30 @@ All md devices contain:
		once the array becomes non-degraded, and this fact has been
		recorded in the metadata.

		consistency_policy
		This indicates how the array maintains consistency in case of unexpected
		shutdown. It can be:

		none
		Array has no redundancy information, e.g. raid0, linear.

		resync
		Full resync is performed and all redundancy is regenerated when the
		array is started after unclean shutdown.

		bitmap
		Resync assisted by a write-intent bitmap.

		journal
		For raid4/5/6, journal device is used to log transactions and replay
		after unclean shutdown.

		ppl
		For raid5 only, Partial Parity Log is used to close the write hole and
		eliminate resync.

		The accepted values when writing to this file are ``ppl`` and ``resync``,
		used to enable and disable PPL.


		As component devices are added to an md array, they appear in the ``md``
		@@ -563,6 +586,9 @@ Each directory contains:
		adds bad blocks without acknowledging them. This is largely
		for testing.

		ppl_sector, ppl_size
		Location and size (in sectors) of the space used for Partial Parity Log
		on this device.


		An active md device will also contain an entry for each active device
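
As an illustration of the consistency_policy values listed in the md.rst hunk above, the following minimal C sketch classifies a string as it might be read from that sysfs file and checks whether it is one of the two values the text says may be written back. The enum, the table and the function names (`parse_policy`, `policy_is_writable`) are hypothetical and are not part of md or mdadm; only the policy strings themselves come from the documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the string names are taken from md.rst. */
enum policy {
	POLICY_UNKNOWN,
	POLICY_NONE,
	POLICY_RESYNC,
	POLICY_BITMAP,
	POLICY_JOURNAL,
	POLICY_PPL,
};

static const struct {
	const char *name;
	enum policy value;
} policy_names[] = {
	{ "none",    POLICY_NONE },
	{ "resync",  POLICY_RESYNC },
	{ "bitmap",  POLICY_BITMAP },
	{ "journal", POLICY_JOURNAL },
	{ "ppl",     POLICY_PPL },
};

/* Map the text of a consistency_policy read to an enum value.
 * Anything from the first newline onwards is ignored. */
static enum policy parse_policy(const char *s)
{
	size_t len, i;

	assert(s != NULL);
	len = strcspn(s, "\n");
	for (i = 0; i < sizeof(policy_names) / sizeof(policy_names[0]); i++) {
		if (strlen(policy_names[i].name) == len &&
		    strncmp(s, policy_names[i].name, len) == 0)
			return policy_names[i].value;
	}
	return POLICY_UNKNOWN;
}

/* Per md.rst, only "ppl" and "resync" are accepted on write. */
static bool policy_is_writable(enum policy p)
{
	return p == POLICY_PPL || p == POLICY_RESYNC;
}
```

A caller would typically read the file contents into a buffer, pass the buffer to `parse_policy()`, and use `policy_is_writable()` before attempting to write a new value.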

Documentation/md/md-cluster.txt

+1 −1

		@@ -321,4 +321,4 @@ The algorithm is:

		There are some things which are not supported by cluster MD yet.

		- update size and change array_sectors.
		- change array_sectors.

Documentation/md/raid5-ppl.txt

0 → 100644

+44 −0

		Partial Parity Log

		Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
		addressed by PPL is that after a dirty shutdown, parity of a particular stripe
		may become inconsistent with data on other member disks. If the array is also
		in degraded state, there is no way to recalculate parity, because one of the
		disks is missing. This can lead to silent data corruption when rebuilding the
		array or using it as degraded - data calculated from parity for array blocks
		that have not been touched by a write request during the unclean shutdown can
		be incorrect. Such condition is known as the RAID5 Write Hole. Because of
		this, md by default does not allow starting a dirty degraded array.

		Partial parity for a write operation is the XOR of stripe data chunks not
		modified by this write. It is just enough data needed for recovering from the
		write hole. XORing partial parity with the modified chunks produces parity for
		the stripe, consistent with its state before the write operation, regardless of
		which chunk writes have completed. If one of the not modified data disks of
		this stripe is missing, this updated parity can be used to recover its
		contents. PPL recovery is also performed when starting an array after an
		unclean shutdown and all disks are available, eliminating the need to resync
		the array. Because of this, using write-intent bitmap and PPL together is not
		supported.

		When handling a write request PPL writes partial parity before new data and
		parity are dispatched to disks. PPL is a distributed log - it is stored on
		array member drives in the metadata area, on the parity drive of a particular
		stripe. It does not require a dedicated journaling drive. Write performance is
		reduced by up to 30%-40% but it scales with the number of drives in the array
		and the journaling drive does not become a bottleneck or a single point of
		failure.

		Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
		not a true journal. It does not protect from losing in-flight data, only from
		silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
		performed for this stripe (parity is not updated). So it is possible to have
		arbitrary data in the written part of a stripe if that disk is lost. In such
		case the behavior is the same as in plain raid5.

		PPL is available for md version-1 metadata and external (specifically IMSM)
		metadata arrays. It can be enabled using mdadm option --consistency-policy=ppl.

		Currently, volatile write-back cache should be disabled on all member drives
		when using PPL. Otherwise it cannot guarantee consistency in case of power
		failure.
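
To make the partial-parity arithmetic described in raid5-ppl.txt concrete, here is a minimal userspace sketch in C. It is not the md implementation: the chunk size, the in-memory stripe layout and the function names (`xor_into`, `partial_parity`, `parity_from_ppl`) are assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES 8	/* illustrative chunk size, not md's */

/* XOR src into dst, byte by byte. */
static void xor_into(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] ^= src[i];
}

/*
 * Partial parity: XOR of the data chunks NOT modified by the write.
 * modified[i] != 0 marks data chunk i as a target of the write.
 */
static void partial_parity(uint8_t *pp, uint8_t data[][CHUNK_BYTES],
			   const int *modified, size_t n_chunks)
{
	size_t i;

	assert(n_chunks > 0);
	memset(pp, 0, CHUNK_BYTES);
	for (i = 0; i < n_chunks; i++)
		if (!modified[i])
			xor_into(pp, data[i], CHUNK_BYTES);
}

/*
 * Recovery: XOR the logged partial parity with the modified chunks as
 * they are found on disk, whether or not their writes completed.
 * The result is parity consistent with the current stripe contents.
 */
static void parity_from_ppl(uint8_t *parity, const uint8_t *pp,
			    uint8_t data[][CHUNK_BYTES],
			    const int *modified, size_t n_chunks)
{
	size_t i;

	assert(n_chunks > 0);
	memcpy(parity, pp, CHUNK_BYTES);
	for (i = 0; i < n_chunks; i++)
		if (modified[i])
			xor_into(parity, data[i], CHUNK_BYTES);
}
```

In this sketch, `partial_parity()` stands in for what is logged before the write is dispatched, and `parity_from_ppl()` for what recovery computes afterwards. XORing the recovered parity with the surviving data chunks then yields the contents of a missing unmodified chunk, which is the property the text relies on.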

block/bio.c

+13 −48

		@@ -633,20 +633,21 @@ struct bio *bio_clone_fast(struct bio *bio, gfp_t gfp_mask, struct bio_set *bs)
		}
		EXPORT_SYMBOL(bio_clone_fast);

		static struct bio *__bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
		struct bio_set *bs, int offset,
		int size)
		/**
		* bio_clone_bioset - clone a bio
		* @bio_src: bio to clone
		* @gfp_mask: allocation priority
		* @bs: bio_set to allocate from
		*
		* Clone bio. Caller will own the returned bio, but not the actual data it
		* points to. Reference count of returned bio will be one.
		*/
		struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
		struct bio_set *bs)
		{
		struct bvec_iter iter;
		struct bio_vec bv;
		struct bio *bio;
		struct bvec_iter iter_src = bio_src->bi_iter;

		/* for supporting partial clone */
		if (offset || size != bio_src->bi_iter.bi_size) {
		bio_advance_iter(bio_src, &iter_src, offset);
		iter_src.bi_size = size;
		}

		/*
		* Pre immutable biovecs, __bio_clone() used to just do a memcpy from
		@@ -670,8 +671,7 @@ static struct bio *__bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
		* __bio_clone_fast() anyways.
		*/

		bio = bio_alloc_bioset(gfp_mask, __bio_segments(bio_src,
		&iter_src), bs);
		bio = bio_alloc_bioset(gfp_mask, bio_segments(bio_src), bs);
		if (!bio)
		return NULL;
		bio->bi_bdev = bio_src->bi_bdev;
		@@ -688,7 +688,7 @@ static struct bio *__bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
		bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
		break;
		default:
		__bio_for_each_segment(bv, bio_src, iter, iter_src)
		bio_for_each_segment(bv, bio_src, iter)
		bio->bi_io_vec[bio->bi_vcnt++] = bv;
		break;
		}
		@@ -707,43 +707,8 @@ static struct bio *__bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,

		return bio;
		}

		/**
		* bio_clone_bioset - clone a bio
		* @bio_src: bio to clone
		* @gfp_mask: allocation priority
		* @bs: bio_set to allocate from
		*
		* Clone bio. Caller will own the returned bio, but not the actual data it
		* points to. Reference count of returned bio will be one.
		*/
		struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t gfp_mask,
		struct bio_set *bs)
		{
		return __bio_clone_bioset(bio_src, gfp_mask, bs, 0,
		bio_src->bi_iter.bi_size);
		}
		EXPORT_SYMBOL(bio_clone_bioset);

		/**
		* bio_clone_bioset_partial - clone a partial bio
		* @bio_src: bio to clone
		* @gfp_mask: allocation priority
		* @bs: bio_set to allocate from
		* @offset: cloned starting from the offset
		* @size: size for the cloned bio
		*
		* Clone bio. Caller will own the returned bio, but not the actual data it
		* points to. Reference count of returned bio will be one.
		*/
		struct bio *bio_clone_bioset_partial(struct bio *bio_src, gfp_t gfp_mask,
		struct bio_set *bs, int offset,
		int size)
		{
		return __bio_clone_bioset(bio_src, gfp_mask, bs, offset, size);
		}
		EXPORT_SYMBOL(bio_clone_bioset_partial);

		/**
		* bio_add_pc_page - attempt to add page to bio
		* @q: the target queue

drivers/md/Makefile

+1 −1

		@@ -18,7 +18,7 @@ dm-cache-cleaner-y += dm-cache-policy-cleaner.o
		dm-era-y += dm-era-target.o
		dm-verity-y += dm-verity-target.o
		md-mod-y += md.o bitmap.o
		raid456-y += raid5.o raid5-cache.o
		raid456-y += raid5.o raid5-cache.o raid5-ppl.o

		# Note: link order is important. All raid personalities
		# and must come before md.o, as they each initialise