Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit e9dd2b68 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
  cfq-iosched: Fix a gcc 4.5 warning and put some comments
  block: Turn bvec_k{un,}map_irq() into static inline functions
  block: fix accounting bug on cross partition merges
  block: Make the integrity mapped property a bio flag
  block: Fix double free in blk_integrity_unregister
  block: Ensure physical block size is unsigned int
  blkio-throttle: Fix possible multiplication overflow in iops calculations
  blkio-throttle: limit max iops value to UINT_MAX
  blkio-throttle: There is no need to convert jiffies to milli seconds
  blkio-throttle: Fix link failure failure on i386
  blkio: Recalculate the throttled bio dispatch time upon throttle limit change
  blkio: Add root group to td->tg_list
  blkio: deletion of a cgroup was causes oops
  blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
  block: set the bounce_pfn to the actual DMA limit rather than to max memory
  block: revert bad fix for memory hotplug causing bounces
  Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
  block: set the bounce_pfn to the actual DMA limit rather than to max memory
  block: Prevent hang_check firing during long I/O
  cfq: improve fsync performance for small files
  ...

Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h
parents 4f3a29da b4627321
Loading
Loading
Loading
Loading
+103 −3
Original line number Diff line number Diff line
@@ -8,12 +8,17 @@ both at leaf nodes as well as at intermediate nodes in a storage hierarchy.
Plan is to use the same cgroup based management interface for blkio controller
and based on user options switch IO policies in the background.

In the first phase, this patchset implements proportional weight time based
division of disk policy. It is implemented in CFQ. Hence this policy takes
effect only on leaf nodes when CFQ is being used.
Currently two IO control policies are implemented. First one is proportional
weight time based division of disk policy. It is implemented in CFQ. Hence
this policy takes effect only on leaf nodes when CFQ is being used. The second
one is throttling policy which can be used to specify upper IO rate limits
on devices. This policy is implemented in generic block layer and can be
used on leaf nodes as well as higher level logical devices like device mapper.

HOWTO
=====
Proportional Weight division of bandwidth
-----------------------------------------
You can do a very simple testing of running two dd threads in two different
cgroups. Here is what you can do.

@@ -55,6 +60,35 @@ cgroups. Here is what you can do.
  group dispatched to the disk. We provide fairness in terms of disk time, so
  ideally io.disk_time of cgroups should be in proportion to the weight.

Throttling/Upper Limit policy
-----------------------------
- Enable Block IO controller
	CONFIG_BLK_CGROUP=y

- Enable throttling in block layer
	CONFIG_BLK_DEV_THROTTLING=y

- Mount blkio controller
        mount -t cgroup -o blkio none /cgroup/blkio

- Specify a bandwidth rate on particular device for root group. The format
  for policy is "<major>:<minor>  <byes_per_second>".

        echo "8:16  1048576" > /cgroup/blkio/blkio.read_bps_device

  Above will put a limit of 1MB/second on reads happening for root group
  on device having major/minor number 8:16.

- Run dd to read a file and see if rate is throttled to 1MB/s or not.

		# dd if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
		# iflag=direct
        1024+0 records in
        1024+0 records out
        4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s

 Limits for writes can be put using blkio.write_bps_device file.

Various user visible config options
===================================
CONFIG_BLK_CGROUP
@@ -68,8 +102,13 @@ CONFIG_CFQ_GROUP_IOSCHED
	- Enables group scheduling in CFQ. Currently only 1 level of group
	  creation is allowed.

CONFIG_BLK_DEV_THROTTLING
	- Enable block device throttling support in block layer.

Details of cgroup files
=======================
Proportional weight policy files
--------------------------------
- blkio.weight
	- Specifies per cgroup weight. This is default weight of the group
	  on all the devices until and unless overridden by per device rule.
@@ -210,6 +249,67 @@ Details of cgroup files
	  and minor number of the device and third field specifies the number
	  of times a group was dequeued from a particular device.

Throttling/Upper limit policy files
-----------------------------------
- blkio.throttle.read_bps_device
	- Specifies upper limit on READ rate from the device. IO rate is
	  specified in bytes per second. Rules are per deivce. Following is
	  the format.

  echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.read_bps_device

- blkio.throttle.write_bps_device
	- Specifies upper limit on WRITE rate to the device. IO rate is
	  specified in bytes per second. Rules are per deivce. Following is
	  the format.

  echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.write_bps_device

- blkio.throttle.read_iops_device
	- Specifies upper limit on READ rate from the device. IO rate is
	  specified in IO per second. Rules are per deivce. Following is
	  the format.

  echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.read_iops_device

- blkio.throttle.write_iops_device
	- Specifies upper limit on WRITE rate to the device. IO rate is
	  specified in io per second. Rules are per deivce. Following is
	  the format.

  echo "<major>:<minor>  <rate_io_per_second>" > /cgrp/blkio.write_iops_device

Note: If both BW and IOPS rules are specified for a device, then IO is
      subjectd to both the constraints.

- blkio.throttle.io_serviced
	- Number of IOs (bio) completed to/from the disk by the group (as
	  seen by throttling policy). These are further divided by the type
	  of operation - read or write, sync or async. First two fields specify
	  the major and minor number of the device, third field specifies the
	  operation type and the fourth field specifies the number of IOs.

	  blkio.io_serviced does accounting as seen by CFQ and counts are in
	  number of requests (struct request). On the other hand,
	  blkio.throttle.io_serviced counts number of IO in terms of number
	  of bios as seen by throttling policy.  These bios can later be
	  merged by elevator and total number of requests completed can be
	  lesser.

- blkio.throttle.io_service_bytes
	- Number of bytes transferred to/from the disk by the group. These
	  are further divided by the type of operation - read or write, sync
	  or async. First two fields specify the major and minor number of the
	  device, third field specifies the operation type and the fourth field
	  specifies the number of bytes.

	  These numbers should roughly be same as blkio.io_service_bytes as
	  updated by CFQ. The difference between two is that
	  blkio.io_service_bytes will not be updated if CFQ is not operating
	  on request queue.

Common files among various policies
-----------------------------------
- blkio.reset_stats
	- Writing an int to this file will result in resetting all the stats
	  for that cgroup.
+12 −0
Original line number Diff line number Diff line
@@ -77,6 +77,18 @@ config BLK_DEV_INTEGRITY
	T10/SCSI Data Integrity Field or the T13/ATA External Path
	Protection.  If in doubt, say N.

config BLK_DEV_THROTTLING
	bool "Block layer bio throttling support"
	depends on BLK_CGROUP=y && EXPERIMENTAL
	default n
	---help---
	Block layer bio throttling support. It can be used to limit
	the IO rate to a device. IO rate policies are per cgroup and
	one needs to mount and use blkio cgroup controller for creating
	cgroups and specifying per device IO rate policies.

	See Documentation/cgroups/blkio-controller.txt for more information.

endif # BLOCK

config BLOCK_COMPAT
+1 −0
Original line number Diff line number Diff line
@@ -9,6 +9,7 @@ obj-$(CONFIG_BLOCK) := elevator.o blk-core.o blk-tag.o blk-sysfs.o \

obj-$(CONFIG_BLK_DEV_BSG)	+= bsg.o
obj-$(CONFIG_BLK_CGROUP)	+= blk-cgroup.o
obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o
obj-$(CONFIG_IOSCHED_DEADLINE)	+= deadline-iosched.o
obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
+646 −158

File changed.

Preview size limit exceeded, changes collapsed.

+82 −5
Original line number Diff line number Diff line
@@ -15,6 +15,14 @@

#include <linux/cgroup.h>

enum blkio_policy_id {
	BLKIO_POLICY_PROP = 0,		/* Proportional Bandwidth division */
	BLKIO_POLICY_THROTL,		/* Throttling */
};

/* Max limits for throttle policy */
#define THROTL_IOPS_MAX		UINT_MAX

#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)

#ifndef CONFIG_BLK_CGROUP
@@ -65,6 +73,35 @@ enum blkg_state_flags {
	BLKG_empty,
};

/* cgroup files owned by proportional weight policy */
enum blkcg_file_name_prop {
	BLKIO_PROP_weight = 1,
	BLKIO_PROP_weight_device,
	BLKIO_PROP_io_service_bytes,
	BLKIO_PROP_io_serviced,
	BLKIO_PROP_time,
	BLKIO_PROP_sectors,
	BLKIO_PROP_io_service_time,
	BLKIO_PROP_io_wait_time,
	BLKIO_PROP_io_merged,
	BLKIO_PROP_io_queued,
	BLKIO_PROP_avg_queue_size,
	BLKIO_PROP_group_wait_time,
	BLKIO_PROP_idle_time,
	BLKIO_PROP_empty_time,
	BLKIO_PROP_dequeue,
};

/* cgroup files owned by throttle policy */
enum blkcg_file_name_throtl {
	BLKIO_THROTL_read_bps_device,
	BLKIO_THROTL_write_bps_device,
	BLKIO_THROTL_read_iops_device,
	BLKIO_THROTL_write_iops_device,
	BLKIO_THROTL_io_service_bytes,
	BLKIO_THROTL_io_serviced,
};

struct blkio_cgroup {
	struct cgroup_subsys_state css;
	unsigned int weight;
@@ -112,6 +149,8 @@ struct blkio_group {
	char path[128];
	/* The device MKDEV(major, minor), this group has been created for */
	dev_t dev;
	/* policy which owns this blk group */
	enum blkio_policy_id plid;

	/* Need to serialize the stats in the case of reset/update */
	spinlock_t stats_lock;
@@ -121,24 +160,60 @@ struct blkio_group {
struct blkio_policy_node {
	struct list_head node;
	dev_t dev;
	/* This node belongs to max bw policy or porportional weight policy */
	enum blkio_policy_id plid;
	/* cgroup file to which this rule belongs to */
	int fileid;

	union {
		unsigned int weight;
		/*
		 * Rate read/write in terms of byptes per second
		 * Whether this rate represents read or write is determined
		 * by file type "fileid".
		 */
		u64 bps;
		unsigned int iops;
	} val;
};

extern unsigned int blkcg_get_weight(struct blkio_cgroup *blkcg,
				     dev_t dev);
extern uint64_t blkcg_get_read_bps(struct blkio_cgroup *blkcg,
				     dev_t dev);
extern uint64_t blkcg_get_write_bps(struct blkio_cgroup *blkcg,
				     dev_t dev);
extern unsigned int blkcg_get_read_iops(struct blkio_cgroup *blkcg,
				     dev_t dev);
extern unsigned int blkcg_get_write_iops(struct blkio_cgroup *blkcg,
				     dev_t dev);

typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
typedef void (blkio_update_group_weight_fn) (struct blkio_group *blkg,
						unsigned int weight);

typedef void (blkio_update_group_weight_fn) (void *key,
			struct blkio_group *blkg, unsigned int weight);
typedef void (blkio_update_group_read_bps_fn) (void * key,
			struct blkio_group *blkg, u64 read_bps);
typedef void (blkio_update_group_write_bps_fn) (void *key,
			struct blkio_group *blkg, u64 write_bps);
typedef void (blkio_update_group_read_iops_fn) (void *key,
			struct blkio_group *blkg, unsigned int read_iops);
typedef void (blkio_update_group_write_iops_fn) (void *key,
			struct blkio_group *blkg, unsigned int write_iops);

struct blkio_policy_ops {
	blkio_unlink_group_fn *blkio_unlink_group_fn;
	blkio_update_group_weight_fn *blkio_update_group_weight_fn;
	blkio_update_group_read_bps_fn *blkio_update_group_read_bps_fn;
	blkio_update_group_write_bps_fn *blkio_update_group_write_bps_fn;
	blkio_update_group_read_iops_fn *blkio_update_group_read_iops_fn;
	blkio_update_group_write_iops_fn *blkio_update_group_write_iops_fn;
};

struct blkio_policy_type {
	struct list_head list;
	struct blkio_policy_ops ops;
	enum blkio_policy_id plid;
};

/* Blkio controller policy registration */
@@ -212,7 +287,8 @@ static inline void blkiocg_set_start_empty_time(struct blkio_group *blkg) {}
extern struct blkio_cgroup blkio_root_cgroup;
extern struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup);
extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
			struct blkio_group *blkg, void *key, dev_t dev);
	struct blkio_group *blkg, void *key, dev_t dev,
	enum blkio_policy_id plid);
extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
						void *key);
@@ -234,7 +310,8 @@ static inline struct blkio_cgroup *
cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; }

static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
			struct blkio_group *blkg, void *key, dev_t dev) {}
		struct blkio_group *blkg, void *key, dev_t dev,
		enum blkio_policy_id plid) {}

static inline int
blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
Loading