Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit d05d7f40 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:

   - the big change is the cleanup from Mike Christie, cleaning up our
     uses of command types and modified flags.  This is what will throw
     some merge conflicts

   - regression fix for the above for btrfs, from Vincent

   - following up to the above, better packing of struct request from
     Christoph

   - a 2038 fix for blktrace from Arnd

   - a few trivial/spelling fixes from Bart Van Assche

   - a front merge check fix from Damien, which could cause issues on
     SMR drives

   - Atari partition fix from Gabriel

   - convert cfq to highres timers, since jiffies isn't granular enough
     for some devices these days.  From Jan and Jeff

   - CFQ priority boost fix idle classes, from me

   - cleanup series from Ming, improving our bio/bvec iteration

   - a direct issue fix for blk-mq from Omar

   - fix for plug merging not involving the IO scheduler, like we do for
     other types of merges.  From Tahsin

   - expose DAX type internally and through sysfs.  From Toshi and Yigal

* 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits)
  block: Fix front merge check
  block: do not merge requests without consulting with io scheduler
  block: Fix spelling in a source code comment
  block: expose QUEUE_FLAG_DAX in sysfs
  block: add QUEUE_FLAG_DAX for devices to advertise their DAX support
  Btrfs: fix comparison in __btrfs_map_block()
  block: atari: Return early for unsupported sector size
  Doc: block: Fix a typo in queue-sysfs.txt
  cfq-iosched: Charge at least 1 jiffie instead of 1 ns
  cfq-iosched: Fix regression in bonnie++ rewrite performance
  cfq-iosched: Convert slice_resid from u64 to s64
  block: Convert fifo_time from ulong to u64
  blktrace: avoid using timespec
  block/blk-cgroup.c: Declare local symbols static
  block/bio-integrity.c: Add #include "blk.h"
  block/partition-generic.c: Remove a set-but-not-used variable
  block: bio: kill BIO_MAX_SIZE
  cfq-iosched: temporarily boost queue priority for idle classes
  block: drbd: avoid to use BIO_MAX_SIZE
  block: bio: remove BIO_MAX_SECTORS
  ...
parents 75a442ef 17007f39
Loading
Loading
Loading
Loading
+1 −1
Original line number Diff line number Diff line
@@ -53,7 +53,7 @@ disk.

logical_block_size (RO)
-----------------------
This is the logcal block size of the device, in bytes.
This is the logical block size of the device, in bytes.

max_hw_sectors_kb (RO)
----------------------
+14 −14
Original line number Diff line number Diff line
@@ -20,11 +20,11 @@ a forced cache flush, and the Force Unit Access (FUA) flag for requests.
Explicit cache flushes
----------------------

The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from
The REQ_PREFLUSH flag can be OR ed into the r/w flags of a bio submitted from
the filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started.  This explicitly
guarantees that previously completed write requests are on non-volatile
storage before the flagged bio starts. In addition the REQ_FLUSH flag can be
storage before the flagged bio starts. In addition the REQ_PREFLUSH flag can be
set on an otherwise empty bio structure, which causes only an explicit cache
flush without any dependent I/O.  It is recommend to use
the blkdev_issue_flush() helper for a pure cache flush.
@@ -41,21 +41,21 @@ signaled after the data has been committed to non-volatile storage.
Implementation details for filesystems
--------------------------------------

Filesystems can simply set the REQ_FLUSH and REQ_FUA bits and do not have to
Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have to
worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented.  The REQ_FLUSH and REQ_FUA flags
the Forced Unit Access is implemented.  The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.


Implementation details for make_request_fn based block drivers
--------------------------------------------------------------

These drivers will always see the REQ_FLUSH and REQ_FUA bits as they sit
These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface.  For remapping drivers the REQ_FUA
bits need to be propagated to underlying devices, and a global flush needs
to be implemented for bios with the REQ_FLUSH bit set.  For real device
drivers that do not have a volatile cache the REQ_FLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_FLUSH requests without
to be implemented for bios with the REQ_PREFLUSH bit set.  For real device
drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
data can be completed successfully without doing any work.  Drivers for
devices with volatile caches need to implement the support for these
flags themselves without any help from the block layer.
@@ -65,17 +65,17 @@ Implementation details for request_fn based block drivers
--------------------------------------------------------------

For devices that do not support volatile write caches there is no driver
support required, the block layer completes empty REQ_FLUSH requests before
entering the driver and strips off the REQ_FLUSH and REQ_FUA bits from
support required, the block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload.  For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing:

	blk_queue_write_cache(sdkp->disk->queue, true, false);

and handle empty REQ_FLUSH requests in its prep_fn/request_fn.  Note that
REQ_FLUSH requests with a payload are automatically turned into a sequence
of an empty REQ_FLUSH request followed by the actual write by the block
and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn.  Note that
REQ_PREFLUSH requests with a payload are automatically turned into a sequence
of an empty REQ_OP_FLUSH request followed by the actual write by the block
layer.  For devices that also support the FUA bit the block layer needs
to be told to pass through the REQ_FUA bit using:

@@ -83,4 +83,4 @@ to be told to pass through the REQ_FUA bit using:

and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn.  If the FUA bit is not natively supported the block
layer turns it into an empty REQ_FLUSH request after the actual write.
layer turns it into an empty REQ_OP_FLUSH request after the actual write.
+5 −5
Original line number Diff line number Diff line
@@ -14,14 +14,14 @@ Log Ordering

We log things in order of completion once we are sure the write is no longer in
cache.  This means that normal WRITE requests are not actually logged until the
next REQ_FLUSH request.  This is to make it easier for userspace to replay the
log in a way that correlates to what is on disk and not what is in cache, to
make it easier to detect improper waiting/flushing.
next REQ_PREFLUSH request.  This is to make it easier for userspace to replay
the log in a way that correlates to what is on disk and not what is in cache,
to make it easier to detect improper waiting/flushing.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_FLUSH request we splice this list onto the request and once
Once we see a REQ_PREFLUSH request we splice this list onto the request and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH.  Only
completed WRITEs, at the time the REQ_FLUSH is issued, are added in order to
completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
simulate the worst case scenario with regard to power failures.  Consider the
following example (W means write, C means complete):

+1 −1
Original line number Diff line number Diff line
@@ -1286,7 +1286,7 @@ static void do_ubd_request(struct request_queue *q)

		req = dev->request;

		if (req->cmd_flags & REQ_FLUSH) {
		if (req_op(req) == REQ_OP_FLUSH) {
			io_req = kmalloc(sizeof(struct io_thread_req),
					 GFP_ATOMIC);
			if (io_req == NULL) {
+1 −0
Original line number Diff line number Diff line
@@ -26,6 +26,7 @@
#include <linux/bio.h>
#include <linux/workqueue.h>
#include <linux/slab.h>
#include "blk.h"

#define BIP_INLINE_VECS	4

Loading