
Commit a7e546f1 authored by Linus Torvalds

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block-related fixes from Jens Axboe:

 - Improvements to the buffered and direct write IO plugging from
   Fengguang.

 - Abstract out the mapping of a bio in a request, and use that to
   provide a blk_bio_map_sg() helper.  Useful for mapping just a bio
   instead of a full request.

 - Regression fix from Hugh, fixing up a patch that went into the
   previous release cycle (and marked stable, too) attempting to prevent
   a loop in __getblk_slow().

 - Updates to discard requests, fixing up the sizing and how we align
   them.  Also a change to disallow merging of discard requests, since
   that doesn't really work properly yet.

 - A few drbd fixes.

 - Documentation updates.

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: replace __getblk_slow misfix by grow_dev_page fix
  drbd: Write all pages of the bitmap after an online resize
  drbd: Finish requests that completed while IO was frozen
  drbd: fix drbd wire compatibility for empty flushes
  Documentation: update tunable options in block/cfq-iosched.txt
  Documentation: update tunable options in block/queue-sysfs.txt
  Documentation: update missing index files in block/00-INDEX
  block: move down direct IO plugging
  block: remove plugging at buffered write time
  block: disable discard request merge temporarily
  bio: Fix potential memory leak in bio_find_or_create_slab()
  block: Don't use static to define "void *p" in show_partition_start()
  block: Add blk_bio_map_sg() helper
  block: Introduce __blk_segment_map_sg() helper
  fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices
  block: split discard into aligned requests
  block: reorganize rounding of max_discard_sectors
parents da31ce72 676ce6d5
Documentation/block/00-INDEX +8 −2
@@ -3,15 +3,21 @@
 biodoc.txt
 	- Notes on the Generic Block Layer Rewrite in Linux 2.5
 capability.txt
-	- Generic Block Device Capability (/sys/block/<disk>/capability)
+	- Generic Block Device Capability (/sys/block/<device>/capability)
+cfq-iosched.txt
+	- CFQ IO scheduler tunables
+data-integrity.txt
+	- Block data integrity
 deadline-iosched.txt
 	- Deadline IO scheduler tunables
 ioprio.txt
 	- Block io priorities (in CFQ scheduler)
+queue-sysfs.txt
+	- Queue's sysfs entries
 request.txt
 	- The members of struct request (in include/linux/blkdev.h)
 stat.txt
-	- Block layer statistics in /sys/block/<dev>/stat
+	- Block layer statistics in /sys/block/<device>/stat
 switching-sched.txt
 	- Switching I/O schedulers at runtime
 writeback_cache_control.txt
Documentation/block/cfq-iosched.txt +77 −0
CFQ (Complete Fairness Queueing)
===============================

The main aim of the CFQ scheduler is to provide a fair allocation of the disk
I/O bandwidth among all the processes that request I/O operations.

CFQ maintains a per-process queue for processes that issue synchronous
requests. In the case of asynchronous requests, the requests from all
processes are batched together according to the issuing process's I/O
priority.

CFQ ioscheduler tunables
========================

@@ -25,6 +36,72 @@ there are multiple spindles behind single LUN (Host based hardware RAID
controller or for storage arrays), setting slice_idle=0 might end up in better
throughput and acceptable latencies.

back_seek_max
-------------
This specifies, in Kbytes, the maximum "distance" for backward seeking.
The distance is the amount of space between the current head location and
the sectors that lie behind it.

This parameter allows the scheduler to anticipate requests in the "backward"
direction and consider them as being the "next" request if they are within
this distance of the current head location.

back_seek_penalty
-----------------
This parameter is used to compute the cost of backward seeking. If the
backward distance of a request is just 1/back_seek_penalty of its "front"
distance, the seek costs of the two requests are considered equivalent.

The scheduler will then not bias toward one or the other request (otherwise
it would bias toward the front request). The default value of
back_seek_penalty is 2.
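
To make the arithmetic concrete: with the default penalty of 2, a request
100 sectors behind the head is costed like a request 200 sectors ahead of
it. A minimal userspace C sketch of this heuristic (the helper name and
values are illustrative; in the kernel the comparison happens in CFQ's
request-choosing code):

#include <stdio.h>

typedef unsigned long long sector_t;

#define COST_INFINITY (~0ULL)	/* "never pick this request" */

static sector_t seek_cost(sector_t head, sector_t sector,
			  sector_t back_seek_max, unsigned int back_seek_penalty)
{
	if (sector >= head)			/* forward seek: plain distance */
		return sector - head;
	if (head - sector <= back_seek_max)	/* backward seek: penalized */
		return (head - sector) * back_seek_penalty;
	return COST_INFINITY;			/* too far behind the head */
}

int main(void)
{
	/* back_seek_max is in Kbytes in sysfs; assume it has already been
	 * converted to sectors. 100 back vs. 200 front cost the same. */
	printf("back 100: %llu, front 200: %llu\n",
	       seek_cost(1000, 900, 32768, 2),
	       seek_cost(1000, 1200, 32768, 2));
	return 0;
}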

fifo_expire_async
-----------------
This parameter is used to set the timeout of asynchronous requests. The
default value is 248ms.

fifo_expire_sync
----------------
This parameter is used to set the timeout of synchronous requests. The
default value is 124ms. To favor synchronous requests over asynchronous
ones, decrease this value relative to fifo_expire_async.

slice_async
-----------
This parameter is the same as slice_sync but for the asynchronous queue.
The default value is 40ms.

slice_async_rq
--------------
This parameter is used to limit the dispatching of asynchronous requests
to the device request queue during the queue's slice time. The maximum
number of requests that are allowed to be dispatched also depends upon the
IO priority. The default value is 2.

slice_sync
----------
When a queue is selected for execution, its IO requests are only executed
for a certain amount of time (time_slice) before switching to another
queue. This parameter is used to calculate the time slice of the
synchronous queue.

time_slice is computed using the following equation:
time_slice = slice_sync + (slice_sync/5 * (4 - prio)). To increase the
time_slice of the synchronous queue, increase the value of slice_sync. The
default value is 100ms.
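
As a quick check of the formula, a standalone C snippet using the default
slice_sync of 100ms (the loop simply walks the usual 0-7 priority range):

#include <stdio.h>

/* time_slice = slice_sync + (slice_sync/5 * (4 - prio)), as given above */
static int time_slice(int slice_sync, int prio)
{
	return slice_sync + (slice_sync / 5 * (4 - prio));
}

int main(void)
{
	/* With slice_sync = 100ms this prints 180ms for prio 0 down to
	 * 40ms for prio 7: higher priority, longer slice. */
	int prio;

	for (prio = 0; prio <= 7; prio++)
		printf("prio %d: %d ms\n", prio, time_slice(100, prio));
	return 0;
}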

quantum
-------
This specifies the number of requests dispatched to the device queue. During
a queue's time slice, a request will not be dispatched if the number of
requests in the device exceeds this parameter. This parameter is used for
synchronous requests.

For storage with several disks, this setting can limit the parallel
processing of requests. Therefore, increasing the value can improve
performance, although it can also cause the latency of some I/O to increase
because more requests are kept outstanding.

CFQ IOPS Mode for group scheduling
===================================
Basic CFQ design is to provide priority based time slices. Higher priority
Documentation/block/queue-sysfs.txt +64 −0
@@ -9,20 +9,71 @@ These files are the ones found in the /sys/block/xxx/queue/ directory.
Files denoted with a RO postfix are readonly and the RW postfix means
read-write.

add_random (RW)
----------------
This file allows one to turn off the disk entropy contribution. The default
value of this file is '1' (on).

discard_granularity (RO)
-----------------------
This shows the size of the device's internal allocation unit in bytes, if
reported by the device. A value of '0' means the device does not support
the discard functionality.

discard_max_bytes (RO)
----------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_bytes parameter is set by the device driver to the maximum
number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A discard_max_bytes
value of 0 means that the device does not support discard functionality.

discard_zeroes_data (RO)
------------------------
When read, this file will show whether discarded blocks are zeroed by the
device. If its value is '1' the blocks are zeroed; otherwise they are not.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

iostats (RW)
-------------
This file is used to control (on/off) the iostats accounting of the
disk.

logical_block_size (RO)
-----------------------
This is the logical block size of the device, in bytes.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
When read, this file shows the maximum number of integrity segments, as
set by the block layer, that a hardware controller can handle.

max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.

max_segments (RO)
-----------------
Maximum number of segments of the device.

max_segment_size (RO)
---------------------
Maximum segment size of the device.

minimum_io_size (RO)
--------------------
This is the smallest preferred io size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved with IO
@@ -45,11 +96,24 @@ per-block-cgroup request pool. IOW, if there are N block cgroups,
each request queue may have up to N request pools, each independently
regulated by nr_requests.

optimal_io_size (RO)
--------------------
This is the optimal io size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of device, in bytes.

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read-ahead for filesystems on this block
device.

rotational (RW)
---------------
This file is used to indicate whether the device is of rotational or
non-rotational type.

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
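
Since all of these attributes are plain text files under /sys, they are
easy to query from userspace. An illustrative C program (the device name
"sda" and the chosen attributes are just examples):

#include <stdio.h>

static int read_queue_attr(const char *disk, const char *attr,
			   char *buf, int len)
{
	char path[256];
	FILE *f;

	/* e.g. /sys/block/sda/queue/rotational */
	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}

int main(void)
{
	char buf[64];

	if (read_queue_attr("sda", "rotational", buf, sizeof(buf)) == 0)
		printf("rotational: %s", buf);	/* value includes a newline */
	if (read_queue_attr("sda", "max_sectors_kb", buf, sizeof(buf)) == 0)
		printf("max_sectors_kb: %s", buf);
	return 0;
}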
block/blk-lib.c +28 −13
@@ -44,6 +44,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
 	unsigned int max_discard_sectors;
+	unsigned int granularity, alignment, mask;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -54,18 +55,20 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
+	/* Zero-sector (unknown) and one-sector granularities are the same.  */
+	granularity = max(q->limits.discard_granularity >> 9, 1U);
+	mask = granularity - 1;
+	alignment = (bdev_discard_alignment(bdev) >> 9) & mask;
+
 	/*
 	 * Ensure that max_discard_sectors is of the proper
-	 * granularity
+	 * granularity, so that requests stay aligned after a split.
 	 */
 	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+	max_discard_sectors = round_down(max_discard_sectors, granularity);
 	if (unlikely(!max_discard_sectors)) {
 		/* Avoid infinite loop below. Being cautious never hurts. */
 		return -EOPNOTSUPP;
-	} else if (q->limits.discard_granularity) {
-		unsigned int disc_sects = q->limits.discard_granularity >> 9;
-
-		max_discard_sectors &= ~(disc_sects - 1);
 	}
 
 	if (flags & BLKDEV_DISCARD_SECURE) {
@@ -79,25 +82,37 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	bb.wait = &wait;
 
 	while (nr_sects) {
+		unsigned int req_sects;
+		sector_t end_sect;
+
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio) {
 			ret = -ENOMEM;
 			break;
 		}
 
+		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
+
+		/*
+		 * If splitting a request, and the next starting sector would be
+		 * misaligned, stop the discard at the previous aligned sector.
+		 */
+		end_sect = sector + req_sects;
+		if (req_sects < nr_sects && (end_sect & mask) != alignment) {
+			end_sect =
+				round_down(end_sect - alignment, granularity)
+				+ alignment;
+			req_sects = end_sect - sector;
+		}
+
 		bio->bi_sector = sector;
 		bio->bi_end_io = bio_batch_end_io;
 		bio->bi_bdev = bdev;
 		bio->bi_private = &bb;
 
-		if (nr_sects > max_discard_sectors) {
-			bio->bi_size = max_discard_sectors << 9;
-			nr_sects -= max_discard_sectors;
-			sector += max_discard_sectors;
-		} else {
-			bio->bi_size = nr_sects << 9;
-			nr_sects = 0;
-		}
+		bio->bi_size = req_sects << 9;
+		nr_sects -= req_sects;
+		sector = end_sect;
 
 		atomic_inc(&bb.done);
 		submit_bio(type, bio);
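
The splitting rule is easiest to see with concrete numbers. A standalone
sketch that mirrors the new loop, for a hypothetical device with a
power-of-two granularity of 8 sectors, an alignment offset of 2 sectors and
a 16-sector discard limit (userspace C; round_down_to() is hand-rolled here):

#include <stdio.h>

typedef unsigned long long sector_t;

static sector_t round_down_to(sector_t v, sector_t g)
{
	return v - (v % g);
}

int main(void)
{
	/* Hypothetical limits: granularity 8, alignment offset 2, max 16. */
	sector_t granularity = 8, alignment = 2, mask = granularity - 1;
	sector_t max_discard_sectors = 16;
	sector_t sector = 0, nr_sects = 40;	/* discard sectors [0, 40) */

	while (nr_sects) {
		sector_t req_sects = nr_sects < max_discard_sectors ?
				     nr_sects : max_discard_sectors;
		sector_t end_sect = sector + req_sects;

		/* Same rule as the kernel loop: when splitting, stop at the
		 * previous aligned sector if the next start would be
		 * misaligned. */
		if (req_sects < nr_sects && (end_sect & mask) != alignment) {
			end_sect = round_down_to(end_sect - alignment,
						 granularity) + alignment;
			req_sects = end_sect - sector;
		}

		/* Prints [0, 10), [10, 26), [26, 40): only the first chunk
		 * is shortened, after which every split lands on a
		 * granularity boundary plus the alignment offset. */
		printf("discard [%llu, %llu)\n", sector, end_sect);
		nr_sects -= req_sects;
		sector = end_sect;
	}
	return 0;
}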
block/blk-merge.c +82 −35
@@ -110,43 +110,28 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
 	return 0;
 }
 
-/*
- * map a request to scatterlist, return number of sg entries setup. Caller
- * must make sure sg can hold rq->nr_phys_segments entries
- */
-int blk_rq_map_sg(struct request_queue *q, struct request *rq,
-		  struct scatterlist *sglist)
+static void
+__blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
+		     struct scatterlist *sglist, struct bio_vec **bvprv,
+		     struct scatterlist **sg, int *nsegs, int *cluster)
 {
-	struct bio_vec *bvec, *bvprv;
-	struct req_iterator iter;
-	struct scatterlist *sg;
-	int nsegs, cluster;
-
-	nsegs = 0;
-	cluster = blk_queue_cluster(q);
-
-	/*
-	 * for each bio in rq
-	 */
-	bvprv = NULL;
-	sg = NULL;
-	rq_for_each_segment(bvec, rq, iter) {
+	int nbytes = bvec->bv_len;
 
-		if (bvprv && cluster) {
-			if (sg->length + nbytes > queue_max_segment_size(q))
+	if (*bvprv && *cluster) {
+		if ((*sg)->length + nbytes > queue_max_segment_size(q))
 			goto new_segment;
 
-			if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
+		if (!BIOVEC_PHYS_MERGEABLE(*bvprv, bvec))
 			goto new_segment;
-			if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
+		if (!BIOVEC_SEG_BOUNDARY(q, *bvprv, bvec))
 			goto new_segment;
 
-			sg->length += nbytes;
+		(*sg)->length += nbytes;
 	} else {
 new_segment:
-			if (!sg)
-				sg = sglist;
+		if (!*sg)
+			*sg = sglist;
 		else {
 			/*
 			 * If the driver previously mapped a shorter
@@ -158,14 +143,39 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 			 * termination bit to avoid doing a full
 			 * sg_init_table() in drivers for each command.
 			 */
-			sg->page_link &= ~0x02;
-			sg = sg_next(sg);
+			(*sg)->page_link &= ~0x02;
+			*sg = sg_next(*sg);
 		}
 
-			sg_set_page(sg, bvec->bv_page, nbytes, bvec->bv_offset);
-			nsegs++;
+		sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
+		(*nsegs)++;
 	}
-		bvprv = bvec;
+	*bvprv = bvec;
+}
+
+/*
+ * map a request to scatterlist, return number of sg entries setup. Caller
+ * must make sure sg can hold rq->nr_phys_segments entries
+ */
+int blk_rq_map_sg(struct request_queue *q, struct request *rq,
+		  struct scatterlist *sglist)
+{
+	struct bio_vec *bvec, *bvprv;
+	struct req_iterator iter;
+	struct scatterlist *sg;
+	int nsegs, cluster;
+
+	nsegs = 0;
+	cluster = blk_queue_cluster(q);
+
+	/*
+	 * for each bio in rq
+	 */
+	bvprv = NULL;
+	sg = NULL;
+	rq_for_each_segment(bvec, rq, iter) {
+		__blk_segment_map_sg(q, bvec, sglist, &bvprv, &sg,
+				     &nsegs, &cluster);
+	} /* segments in rq */
 
 
@@ -199,6 +209,43 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 }
 EXPORT_SYMBOL(blk_rq_map_sg);
 
+/**
+ * blk_bio_map_sg - map a bio to a scatterlist
+ * @q: request_queue in question
+ * @bio: bio being mapped
+ * @sglist: scatterlist being mapped
+ *
+ * Note:
+ *    Caller must make sure sg can hold bio->bi_phys_segments entries
+ *
+ * Will return the number of sg entries setup
+ */
+int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+		   struct scatterlist *sglist)
+{
+	struct bio_vec *bvec, *bvprv;
+	struct scatterlist *sg;
+	int nsegs, cluster;
+	unsigned long i;
+
+	nsegs = 0;
+	cluster = blk_queue_cluster(q);
+
+	bvprv = NULL;
+	sg = NULL;
+	bio_for_each_segment(bvec, bio, i) {
+		__blk_segment_map_sg(q, bvec, sglist, &bvprv, &sg,
+				     &nsegs, &cluster);
+	} /* segments in bio */
+
+	if (sg)
+		sg_mark_end(sg);
+
+	BUG_ON(bio->bi_phys_segments && nsegs > bio->bi_phys_segments);
+	return nsegs;
+}
+EXPORT_SYMBOL(blk_bio_map_sg);
+
 static inline int ll_new_hw_segment(struct request_queue *q,
 				    struct request *req,
 				    struct bio *bio)
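
Finally, a hypothetical driver fragment showing how the new helper might be
called. Only blk_bio_map_sg() comes from this patch; sg_init_table() and
bio_phys_segments() are pre-existing kernel APIs, and the wrapper itself is
illustrative:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/scatterlist.h>

/* Map a single bio (rather than a whole request) to a scatterlist. */
static int my_map_one_bio(struct request_queue *q, struct bio *bio,
			  struct scatterlist *sgl, unsigned int max_ents)
{
	/* Per the kernel-doc above, the caller's table must be able to
	 * hold bio->bi_phys_segments entries. */
	if (bio_phys_segments(q, bio) > max_ents)
		return -EINVAL;

	sg_init_table(sgl, max_ents);

	/* Returns the number of entries used; the helper terminates the
	 * list itself via sg_mark_end(). */
	return blk_bio_map_sg(q, bio, sgl);
}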