
Commit d7067512 authored by Josef Bacik, committed by Jens Axboe

block: introduce blk-iolatency io controller



Current IO controllers for the block layer are less than ideal for our
use case.  The io.max controller is great at hard limiting, but it is
not work conserving.  This patch introduces io.latency.  You provide a
latency target for your group and we monitor the IO in short windows to
make sure we are not exceeding those latency targets.  This makes use of
the rq-qos infrastructure and works much like the wbt stuff.  There are
a few differences from wbt:

 - It's bio based, so the latency covers the whole block layer in addition to
   the actual IO.
 - We will throttle all IO types that come in here if we need to.
 - We use the mean latency over the 100ms window.  This is because writes can
   be particularly fast, which could give us a false sense of the impact of
   other workloads on our protected workload.  (A sketch of this windowed
   check follows this list.)
 - By default there's no throttling: we set the queue_depth to INT_MAX so that
   we can have as many outstanding bios as we're allowed to.  Only at
   throttle time do we pay attention to the actual queue depth.
 - We backcharge cgroups for root cg issued IO and induce artificial
   delays in order to deal with cases like metadata-only or swap-heavy
   workloads.
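
Here is a minimal userspace sketch of that windowed check.  All names
(iolat_window, iolat_sample, iolat_check_window) are made up for
illustration; the real blk-iolatency.c tracks this per cgroup and picks
victims by comparing latency targets, which this deliberately leaves out.

	#include <limits.h>
	#include <stdint.h>

	/* Hypothetical per-window accounting; not the actual kernel types. */
	struct iolat_window {
		uint64_t total_lat_ns;		/* sum of completion latencies this window */
		uint64_t nr_samples;		/* completions seen this window */
		uint64_t target_ns;		/* the protected group's latency target */
		unsigned int queue_depth;	/* depth granted to noisier groups */
	};

	/* On bio completion: fold its latency into the current window. */
	static void iolat_sample(struct iolat_window *w, uint64_t lat_ns)
	{
		w->total_lat_ns += lat_ns;
		w->nr_samples++;
	}

	/*
	 * At the end of each 100ms window: compare the mean latency against
	 * the target and scale the queue depth allowed to groups with looser
	 * targets, never dropping below a single outstanding IO.
	 */
	static void iolat_check_window(struct iolat_window *w)
	{
		uint64_t mean = w->nr_samples ? w->total_lat_ns / w->nr_samples : 0;

		if (mean > w->target_ns && w->queue_depth > 1)
			w->queue_depth /= 2;	/* missing the target: throttle harder */
		else if (mean <= w->target_ns && w->queue_depth < INT_MAX / 2)
			w->queue_depth *= 2;	/* under the target: give depth back */

		w->total_lat_ns = 0;		/* start a fresh window */
		w->nr_samples = 0;
	}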

In testing this has worked out relatively well.  Protected workloads
will throttle noisy workloads down to 1 IO at a time if the noisy
workloads are doing normal IO on their own, or induce up to a 1-second
delay per syscall if they are doing a lot of root-issued IO
(metadata/swap IO).
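
A similarly hypothetical sketch of the backcharging path: root-issued IO
accrues a delay debt against the originating cgroup, which the task then
sleeps off at a safe point such as syscall return, capped at the one
second mentioned above.  The kernel does this through the blkcg delay
machinery; the names here are invented.

	#include <stdint.h>

	#define NSEC_PER_SEC 1000000000ULL

	/* Hypothetical per-cgroup debt for IO the root cg issued on its behalf. */
	struct cg_debt {
		uint64_t delay_ns;	/* backcharged delay not yet paid off */
	};

	/* Charge a root-issued bio's estimated cost back to the cgroup. */
	static void cg_backcharge(struct cg_debt *cg, uint64_t cost_ns)
	{
		cg->delay_ns += cost_ns;
	}

	/*
	 * At a safe point, decide how long the task should sleep: pay down
	 * the outstanding debt, but never more than one second per visit so
	 * the task keeps making forward progress.
	 */
	static uint64_t cg_pay_debt(struct cg_debt *cg)
	{
		uint64_t pay = cg->delay_ns;

		if (pay > NSEC_PER_SEC)
			pay = NSEC_PER_SEC;
		cg->delay_ns -= pay;
		return pay;		/* caller sleeps for this many nanoseconds */
	}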

Our testing has revolved mostly around our production web servers, where
we have hhvm (the web server application) in a protected group and
everything else in another group.  We see slightly higher requests per
second (RPS) on the test tier than on the control tier, and much more
stable RPS across all machines in the test tier.

Another test we run is a slow memory allocator in the unprotected group.
Before, this would eventually push us into swap and cause the whole box
to die without recovering at all.  With these patches we see slight RPS
drops (usually 10-15%) before the memory consumer is properly killed, and
things recover within seconds.
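
For completeness, a target is configured from userspace through the
cgroup2 io.latency file, which takes "MAJ:MIN target=<microseconds>".
A minimal sketch; the cgroup path "/sys/fs/cgroup/protected" and the
8:0 device numbers are placeholders.

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		/* Placeholder cgroup path and device; adjust for a real system. */
		const char *path = "/sys/fs/cgroup/protected/io.latency";
		const char *cfg = "8:0 target=10000\n";	/* 10ms target, in usec */
		int fd = open(path, O_WRONLY);

		if (fd < 0) {
			perror("open io.latency");
			return 1;
		}
		if (write(fd, cfg, strlen(cfg)) != (ssize_t)strlen(cfg)) {
			perror("write io.latency");
			close(fd);
			return 1;
		}
		close(fd);
		return 0;
	}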

Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
parent 67b42d0b
block/Kconfig +12 −0
@@ -149,6 +149,18 @@ config BLK_WBT
 	dynamically on an algorithm loosely based on CoDel, factoring in
 	the realtime performance of the disk.
 
+config BLK_CGROUP_IOLATENCY
+	bool "Enable support for latency based cgroup IO protection"
+	depends on BLK_CGROUP=y
+	default n
+	---help---
+	Enabling this option enables the .latency interface for IO throttling.
+	The IO controller will attempt to maintain average IO latencies below
+	the configured latency target, throttling anybody with a higher latency
+	target than the victimized group.
+
+	Note, this is an experimental interface and could be changed someday.
+
 config BLK_WBT_SQ
 	bool "Single queue writeback throttling"
 	default n
block/Makefile +1 −0
@@ -17,6 +17,7 @@ obj-$(CONFIG_BLK_DEV_BSG) += bsg.o
 obj-$(CONFIG_BLK_DEV_BSGLIB)	+= bsg-lib.o
 obj-$(CONFIG_BLK_CGROUP)	+= blk-cgroup.o
 obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
+obj-$(CONFIG_BLK_CGROUP_IOLATENCY)	+= blk-iolatency.o
 obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o
 obj-$(CONFIG_IOSCHED_DEADLINE)	+= deadline-iosched.o
 obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
block/blk-cgroup.c +8 −0
@@ -1238,6 +1238,14 @@ int blkcg_init_queue(struct request_queue *q)
 	if (preloaded)
 		radix_tree_preload_end();
 
+	ret = blk_iolatency_init(q);
+	if (ret) {
+		spin_lock_irq(q->queue_lock);
+		blkg_destroy_all(q);
+		spin_unlock_irq(q->queue_lock);
+		return ret;
+	}
+
 	ret = blk_throtl_init(q);
 	if (ret) {
 		spin_lock_irq(q->queue_lock);

block/blk-iolatency.c 0 → 100644 +930 −0
New file added; the 930-line diff preview is collapsed because it exceeds the size limit.

block/blk.h +6 −0
@@ -412,4 +412,10 @@ static inline void blk_queue_bounce(struct request_queue *q, struct bio **bio)
 
 extern void blk_drain_queue(struct request_queue *q);
 
+#ifdef CONFIG_BLK_CGROUP_IOLATENCY
+extern int blk_iolatency_init(struct request_queue *q);
+#else
+static inline int blk_iolatency_init(struct request_queue *q) { return 0; }
+#endif
+
 #endif /* BLK_INTERNAL_H */