
Commit 0ff38490 authored by Christoph Lameter, committed by Linus Torvalds

[PATCH] zone_reclaim: dynamic slab reclaim



Currently one can enable slab reclaim by setting an explicit option in
/proc/sys/vm/zone_reclaim_mode.  Slab reclaim is then used as a last
resort if freeing unmapped file backed pages does not release enough
pages to allow a local allocation.

However, that means that the slab can grow excessively and that most memory
of a node may be used by slabs.  We have had a case where a machine with
46GB of memory was using 40-42GB for slab.  Zone reclaim was effective in
dealing with pagecache pages.  However, slab reclaim was only done during
global reclaim (which is a bit rare on NUMA systems).

This patch implements slab reclaim during zone reclaim.  Zone reclaim
occurs if there is a danger of an off node allocation.  At that point we

1. Shrink the per node page cache if the number of pagecache
   pages is more than min_unmapped_ratio percent of pages in a zone.

2. Shrink the slab cache if the number of the node's reclaimable slab pages
   (patch depends on earlier one that implements that counter)
   is more than min_slab_ratio (a new /proc/sys/vm tunable) percent of
   pages in a zone.

The shrinking of the slab cache is a bit problematic since it is not node
specific.  So we simply calculate what point in the slab we want to reach
(current per node slab use minus the number of pages that need to be
allocated) and then repeatedly run the global reclaim until that is
unsuccessful or we have reached the limit.  I hope we will have zone based
slab reclaim at some point which will make that easier.

The default for min_slab_ratio is 5%.

Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.

[akpm@osdl.org: cleanups]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent 972d1a7b
+20 −7
@@ -29,6 +29,7 @@ Currently, these files are in /proc/sys/vm:
- drop-caches
- zone_reclaim_mode
- min_unmapped_ratio
- min_slab_ratio
- panic_on_oom

==============================================================
@@ -138,7 +139,6 @@ This is value ORed together of
1	= Zone reclaim on
2	= Zone reclaim writes dirty pages out
4	= Zone reclaim swaps pages
8	= Also do a global slab reclaim pass

zone_reclaim_mode is set during bootup to 1 if it is determined that pages
from remote zones will cause a measurable performance reduction. The
@@ -162,18 +162,13 @@ Allowing regular swap effectively restricts allocations to the local
node unless explicitly overridden by memory policies or cpuset
configurations.

It may be advisable to allow slab reclaim if the system makes heavy
use of files and builds up large slab caches. However, the slab
shrink operation is global, may take a long time and free slabs
in all nodes of the system.

=============================================================

min_unmapped_ratio:

This is available only on NUMA kernels.

A percentage of the file backed pages in each zone.  Zone reclaim will only
A percentage of the total pages in each zone.  Zone reclaim will only
occur if more than this percentage of pages are file backed and unmapped.
This is to insure that a minimal amount of local pages is still available for
file I/O even if the node is overallocated.
@@ -182,6 +177,24 @@ The default is 1 percent.

=============================================================

min_slab_ratio:

This is available only on NUMA kernels.

A percentage of the total pages in each zone.  On Zone reclaim
(fallback from the local zone occurs) slabs will be reclaimed if more
than this percentage of pages in a zone are reclaimable slab pages.
This insures that the slab growth stays under control even in NUMA
systems that rarely perform global reclaim.

The default is 5 percent.

Note that slab reclaim is triggered in a per zone / node fashion.
The process of reclaiming slab memory is currently not node specific
and may not be fast.

=============================================================

panic_on_oom

This enables or disables panic on out-of-memory feature.  If this is set to 1,
+3 −0
@@ -171,6 +171,7 @@ struct zone {
	 * zone reclaim becomes active if more unmapped pages exist.
	 */
	unsigned long		min_unmapped_pages;
	unsigned long		min_slab_pages;
	struct per_cpu_pageset	*pageset[NR_CPUS];
#else
	struct per_cpu_pageset	pageset[NR_CPUS];
@@ -448,6 +449,8 @@ int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int, struct file
					void __user *, size_t *, loff_t *);
int sysctl_min_unmapped_ratio_sysctl_handler(struct ctl_table *, int,
			struct file *, void __user *, size_t *, loff_t *);
int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int,
			struct file *, void __user *, size_t *, loff_t *);

#include <linux/topology.h>
/* Returns the number of the current Node. */
+1 −0
@@ -193,6 +193,7 @@ extern long vm_total_pages;
#ifdef CONFIG_NUMA
extern int zone_reclaim_mode;
extern int sysctl_min_unmapped_ratio;
extern int sysctl_min_slab_ratio;
extern int zone_reclaim(struct zone *, gfp_t, unsigned int);
#else
#define zone_reclaim_mode 0
+1 −0
@@ -191,6 +191,7 @@ enum
	VM_MIN_UNMAPPED=32,	/* Set min percent of unmapped pages */
	VM_PANIC_ON_OOM=33,	/* panic at out-of-memory */
	VM_VDSO_ENABLED=34,	/* map VDSO into new processes? */
	VM_MIN_SLAB=35,		 /* Percent pages ignored by zone reclaim */
};


+11 −0
@@ -943,6 +943,17 @@ static ctl_table vm_table[] = {
		.extra1		= &zero,
		.extra2		= &one_hundred,
	},
	{
		.ctl_name	= VM_MIN_SLAB,
		.procname	= "min_slab_ratio",
		.data		= &sysctl_min_slab_ratio,
		.maxlen		= sizeof(sysctl_min_slab_ratio),
		.mode		= 0644,
		.proc_handler	= &sysctl_min_slab_ratio_sysctl_handler,
		.strategy	= &sysctl_intvec,
		.extra1		= &zero,
		.extra2		= &one_hundred,
	},
#endif
#ifdef CONFIG_X86_32
	{