Documentation/arm64/sve.txt  +16 −0

@@ -56,6 +56,18 @@ model features for SVE is included in Appendix A.
   is to connect to a target process first and then attempt a
   ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).

+* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
+  between userspace and the kernel, the register value is encoded in memory in
+  an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
+  byte offset i from the start of the memory representation.  This affects for
+  example the signal frame (struct sve_context) and ptrace interface
+  (struct user_sve_header) and associated data.
+
+  Beware that on big-endian systems this results in a different byte order than
+  for the FPSIMD V-registers, which are stored as single host-endian 128-bit
+  values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
+  byte offset i.  (struct fpsimd_context, struct user_fpsimd_state).
+
 2.  Vector length terminology
 -----------------------------

@@ -124,6 +136,10 @@ the SVE instruction set architecture.
   size and layout.  Macros SVE_SIG_* are defined [1] to facilitate access to
   the members.

+* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
+  layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
+  start of the register's representation in memory.
+
 * If the SVE context is too big to fit in sigcontext.__reserved[], then extra
   space is allocated on the stack, an extra_context record is written in
   __reserved[] referencing this space.  sve_context is then written in the
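To make the two layouts concrete, here is a minimal C sketch; it is illustrative
and not part of the patch.  The helper names sve_reg_byte() and fpsimd_reg_byte()
are hypothetical, and the buffers are assumed to be register images taken from
the structures named above (e.g. via ptrace(NT_ARM_SVE) or a signal frame):

	#include <stdint.h>
	#include <stddef.h>

	/*
	 * SVE scalable register image: endianness-invariant, so byte
	 * offset i always holds bits [(8 * i + 7) : (8 * i)] of the
	 * register.  Plain byte-indexed access works on both little-
	 * and big-endian kernels; no swapping is needed.
	 */
	static inline uint8_t sve_reg_byte(const uint8_t *reg, size_t i)
	{
		return reg[i];
	}

	/*
	 * Contrast: an FPSIMD V-register (one entry of
	 * struct user_fpsimd_state.vregs[]) is a single host-endian
	 * 128-bit value.  On a big-endian host, byte offset i holds
	 * bits [(127 - 8 * i) : (120 - 8 * i)], so recovering bits
	 * [(8 * i + 7) : (8 * i)] means indexing from the other end.
	 */
	static inline uint8_t fpsimd_reg_byte(const uint8_t *vreg, size_t i)
	{
	#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		return vreg[15 - i];
	#else
		return vreg[i];
	#endif
	}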
Documentation/block/switching-sched.txt  +8 −10

@@ -13,11 +13,9 @@ you can do so by typing:

 # mount none /sys -t sysfs

-As of the Linux 2.6.10 kernel, it is now possible to change the
-IO scheduler for a given block device on the fly (thus making it possible,
-for instance, to set the CFQ scheduler for the system default, but
-set a specific device to use the deadline or noop schedulers - which
-can improve that device's throughput).
+It is possible to change the IO scheduler for a given block device on
+the fly to select one of mq-deadline, none, bfq, or kyber schedulers -
+which can improve that device's throughput.

 To set a specific scheduler, simply do this:

@@ -30,8 +28,8 @@ The list of defined schedulers can be found by simply doing a "cat
 /sys/block/DEV/queue/scheduler" - the list of valid names will be displayed,
 with the currently selected scheduler in brackets:

-# cat /sys/block/hda/queue/scheduler
-noop deadline [cfq]
-# echo deadline > /sys/block/hda/queue/scheduler
-# cat /sys/block/hda/queue/scheduler
-noop [deadline] cfq
+# cat /sys/block/sda/queue/scheduler
+[mq-deadline] kyber bfq none
+# echo none >/sys/block/sda/queue/scheduler
+# cat /sys/block/sda/queue/scheduler
+[none] mq-deadline kyber bfq
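The shell commands above are the canonical interface.  For completeness, the
same switch can be driven programmatically with plain file I/O on the sysfs
node; a minimal C sketch, assuming a device named sda and root privileges
(illustrative only, not taken from the patch):

	#include <stdio.h>

	int main(void)
	{
		const char *path = "/sys/block/sda/queue/scheduler";
		char line[256];
		FILE *f;

		/* Select the "none" scheduler (requires root). */
		f = fopen(path, "w");
		if (!f) {
			perror(path);
			return 1;
		}
		fputs("none\n", f);
		fclose(f);

		/* Read back; expect "[none] mq-deadline kyber bfq". */
		f = fopen(path, "r");
		if (f && fgets(line, sizeof(line), f))
			fputs(line, stdout);
		if (f)
			fclose(f);
		return 0;
	}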
Documentation/cgroup-v1/blkio-controller.txt  +7 −89

@@ -8,61 +8,13 @@ both at leaf nodes as well as at intermediate nodes in a
 storage hierarchy. Plan is to use the same cgroup based management interface
 for blkio controller and based on user options switch IO policies in the
 background.

-Currently two IO control policies are implemented. First one is proportional
-weight time based division of disk policy. It is implemented in CFQ. Hence
-this policy takes effect only on leaf nodes when CFQ is being used. The second
-one is throttling policy which can be used to specify upper IO rate limits
-on devices. This policy is implemented in generic block layer and can be
-used on leaf nodes as well as higher level logical devices like device mapper.
+One IO control policy is throttling policy which can be used to
+specify upper IO rate limits on devices. This policy is implemented in
+generic block layer and can be used on leaf nodes as well as higher
+level logical devices like device mapper.

 HOWTO
 =====
-Proportional Weight division of bandwidth
------------------------------------------
-You can do a very simple testing of running two dd threads in two different
-cgroups. Here is what you can do.
-
-- Enable Block IO controller
-	CONFIG_BLK_CGROUP=y
-
-- Enable group scheduling in CFQ
-	CONFIG_CFQ_GROUP_IOSCHED=y
-
-- Compile and boot into kernel and mount IO controller (blkio); see
-  cgroups.txt, Why are cgroups needed?.
-
-	mount -t tmpfs cgroup_root /sys/fs/cgroup
-	mkdir /sys/fs/cgroup/blkio
-	mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
-
-- Create two cgroups
-	mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2
-
-- Set weights of group test1 and test2
-	echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight
-	echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight
-
-- Create two same size files (say 512MB each) on same disk (file1, file2) and
-  launch two dd threads in different cgroup to read those files.
-
-	sync
-	echo 3 > /proc/sys/vm/drop_caches
-
-	dd if=/mnt/sdb/zerofile1 of=/dev/null &
-	echo $! > /sys/fs/cgroup/blkio/test1/tasks
-	cat /sys/fs/cgroup/blkio/test1/tasks
-
-	dd if=/mnt/sdb/zerofile2 of=/dev/null &
-	echo $! > /sys/fs/cgroup/blkio/test2/tasks
-	cat /sys/fs/cgroup/blkio/test2/tasks
-
-- At macro level, first dd should finish first. To get more precise data, keep
-  on looking at (with the help of script), at blkio.disk_time and
-  blkio.disk_sectors files of both test1 and test2 groups. This will tell how
-  much disk time (in milliseconds), each group got and how many sectors each
-  group dispatched to the disk. We provide fairness in terms of disk time, so
-  ideally io.disk_time of cgroups should be in proportion to the weight.
-
 Throttling/Upper Limit policy
 -----------------------------
 - Enable Block IO controller

@@ -94,7 +46,7 @@ Throttling/Upper Limit policy

 Hierarchical Cgroups
 ====================

-Both CFQ and throttling implement hierarchy support; however,
+Throttling implements hierarchy support; however,
 throttling's hierarchy support is enabled iff "sane_behavior" is
 enabled from cgroup side, which currently is a development option and
 not publicly available.

@@ -107,9 +59,8 @@ If somebody created a hierarchy like as follows.
 			    |
 			  test3

-CFQ by default and throttling with "sane_behavior" will handle the
-hierarchy correctly.  For details on CFQ hierarchy support, refer to
-Documentation/block/cfq-iosched.txt.  For throttling, all limits apply
+Throttling with "sane_behavior" will handle the
+hierarchy correctly.  For throttling, all limits apply
 to the whole subtree while all statistics are local to the IOs
 directly generated by tasks in that cgroup.

@@ -130,10 +81,6 @@ CONFIG_DEBUG_BLK_CGROUP
 	- Debug help. Right now some additional stats file show up in cgroup
 	  if this option is enabled.

-CONFIG_CFQ_GROUP_IOSCHED
-	- Enables group scheduling in CFQ. Currently only 1 level of group
-	  creation is allowed.
-
 CONFIG_BLK_DEV_THROTTLING
 	- Enable block device throttling support in block layer.

@@ -344,32 +291,3 @@ Common files among various policies
 - blkio.reset_stats
 	- Writing an int to this file will result in resetting all the stats
 	  for that cgroup.
-
-CFQ sysfs tunable
-=================
-/sys/block/<disk>/queue/iosched/slice_idle
-------------------------------------------
-On a faster hardware CFQ can be slow, especially with sequential workload.
-This happens because CFQ idles on a single queue and single queue might not
-drive deeper request queue depths to keep the storage busy. In such scenarios
-one can try setting slice_idle=0 and that would switch CFQ to IOPS
-(IO operations per second) mode on NCQ supporting hardware.
-
-That means CFQ will not idle between cfq queues of a cfq group and hence be
-able to driver higher queue depth and achieve better throughput. That also
-means that cfq provides fairness among groups in terms of IOPS and not in
-terms of disk time.
-
-/sys/block/<disk>/queue/iosched/group_idle
-------------------------------------------
-If one disables idling on individual cfq queues and cfq service trees by
-setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
-on the group in an attempt to provide fairness among groups.
-
-By default group_idle is same as slice_idle and does not do anything if
-slice_idle is enabled.
-
-One can experience an overall throughput drop if you have created multiple
-groups and put applications in that group which are not driving enough
-IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
-on individual groups and throughput should improve.
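The surviving throttling policy is configured through the blkio.throttle.*
files described in the unchanged part of this document.  As a hedged
illustration, a C sketch that caps a cgroup's reads at 1 MB/s using
blkio.throttle.read_bps_device; the cgroup mount point and the 8:16 device
numbers are assumed examples, not values from the patch:

	#include <stdio.h>

	int main(void)
	{
		const char *path =
			"/sys/fs/cgroup/blkio/blkio.throttle.read_bps_device";
		FILE *f = fopen(path, "w");

		if (!f) {
			perror(path);
			return 1;
		}
		/* Format: "<major>:<minor> <bytes per second>". */
		fprintf(f, "8:16 1048576\n");
		return fclose(f) ? 1 : 0;
	}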
Documentation/cgroup-v1/hugetlb.txt  +13 −9

@@ -32,14 +32,18 @@ Brief summary of control files

 hugetlb.<hugepagesize>.usage_in_bytes	  # show current usage for "hugepagesize" hugetlb
 hugetlb.<hugepagesize>.failcnt		  # show the number of allocation failure due to HugeTLB limit

-For a system supporting two hugepage size (16M and 16G) the control
+For a system supporting three hugepage sizes (64k, 32M and 1G), the control
 files include:

-hugetlb.16GB.limit_in_bytes
-hugetlb.16GB.max_usage_in_bytes
-hugetlb.16GB.usage_in_bytes
-hugetlb.16GB.failcnt
-hugetlb.16MB.limit_in_bytes
-hugetlb.16MB.max_usage_in_bytes
-hugetlb.16MB.usage_in_bytes
-hugetlb.16MB.failcnt
+hugetlb.1GB.limit_in_bytes
+hugetlb.1GB.max_usage_in_bytes
+hugetlb.1GB.usage_in_bytes
+hugetlb.1GB.failcnt
+hugetlb.64KB.limit_in_bytes
+hugetlb.64KB.max_usage_in_bytes
+hugetlb.64KB.usage_in_bytes
+hugetlb.64KB.failcnt
+hugetlb.32MB.limit_in_bytes
+hugetlb.32MB.max_usage_in_bytes
+hugetlb.32MB.usage_in_bytes
+hugetlb.32MB.failcnt
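These control files follow the usual cgroup read/write convention.  A brief
C sketch, assuming a hugetlb controller mounted at /sys/fs/cgroup/hugetlb and
a child group named "app" (both illustrative), that sets a two-page limit for
the 1GB size and reads back current usage:

	#include <stdio.h>

	int main(void)
	{
		const char *base = "/sys/fs/cgroup/hugetlb/app";
		char path[256], buf[64];
		FILE *f;

		/* Limit the group to two 1GB hugepages (2 GiB). */
		snprintf(path, sizeof(path),
			 "%s/hugetlb.1GB.limit_in_bytes", base);
		f = fopen(path, "w");
		if (!f) {
			perror(path);
			return 1;
		}
		fprintf(f, "%llu\n", 2ULL << 30);
		fclose(f);

		/* Read back current usage for the same page size. */
		snprintf(path, sizeof(path),
			 "%s/hugetlb.1GB.usage_in_bytes", base);
		f = fopen(path, "r");
		if (f && fgets(buf, sizeof(buf), f))
			printf("usage: %s", buf);
		if (f)
			fclose(f);
		return 0;
	}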
Makefile  +1 −1

@@ -2,7 +2,7 @@
 VERSION = 5
 PATCHLEVEL = 2
 SUBLEVEL = 0
-EXTRAVERSION = -rc4
+EXTRAVERSION = -rc5
 NAME = Golden Lions

 # *DOCUMENTATION*