
Commit 753ee728 authored by Martin Hicks, committed by Linus Torvalds

[PATCH] VM: early zone reclaim

This is the core of the (much simplified) early reclaim.  The goal of this
patch is to reclaim some easily-freed pages from a zone before falling back
onto another zone.

One of the major uses of this is NUMA machines.  With the default allocator
behavior the allocator would look for memory in another zone, which might be
off-node, before trying to reclaim from the current zone.

This adds a zone tuneable to enable early zone reclaim.  It is selected on a
per-zone basis and is turned on/off via syscall.

Adding some extra throttling on the reclaim was also required (patch
4/4).  Without it the machine would grind to a crawl when doing a "make -j"
kernel build.  Even with this patch the System Time is higher on
average, but it seems tolerable.  Here are some numbers for kernbench
runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:

                        wall  user   sys  %cpu  ctx sw.  sleeps
                        ----  ----   ---  ----  -------  ------
No patch                1009  1384   847   258   298170  504402
w/patch, no reclaim      880  1376   667   288   254064  396745
w/patch & reclaim       1079  1385   926   252   291625  548873

These numbers are the average of 2 runs of 3 "make -j" runs done right
after system boot.  Run-to-run variability for "make -j" is huge, so
these numbers aren't terribly useful except to see that with reclaim
the benchmark still finishes in a reasonable amount of time.

I also looked at the NUMA hit/miss stats for the "make -j" runs and the
reclaim doesn't make any difference when the machine is thrashing away.

Doing a "make -j8" on a single node that is filled with page cache pages
takes 700 seconds with reclaim turned on and 735 seconds without reclaim
(due to remote memory accesses).

The simple zone_reclaim syscall program is at
http://www.bork.org/~mort/sgi/zone_reclaim.c



Signed-off-by: Martin Hicks <mort@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent bfbb38fb
+1 −1
@@ -251,7 +251,7 @@ ENTRY(sys_call_table)
	.long sys_io_submit
	.long sys_io_cancel
	.long sys_fadvise64	/* 250 */
	.long sys_ni_syscall
	.long sys_set_zone_reclaim
	.long sys_exit_group
	.long sys_lookup_dcookie
	.long sys_epoll_create
+1 −1
@@ -1579,7 +1579,7 @@ sys_call_table:
	data8 sys_keyctl
	data8 sys_ni_syscall
	data8 sys_ni_syscall			// 1275
	data8 sys_ni_syscall
	data8 sys_set_zone_reclaim
	data8 sys_ni_syscall
	data8 sys_ni_syscall
	data8 sys_ni_syscall
+1 −1
@@ -256,7 +256,7 @@
#define __NR_io_submit		248
#define __NR_io_cancel		249
#define __NR_fadvise64		250

#define __NR_set_zone_reclaim	251
#define __NR_exit_group		252
#define __NR_lookup_dcookie	253
#define __NR_epoll_create	254
+1 −0
@@ -263,6 +263,7 @@
#define __NR_add_key			1271
#define __NR_request_key		1272
#define __NR_keyctl			1273
#define __NR_set_zone_reclaim		1276

#ifdef __KERNEL__

+6 −0
@@ -144,6 +144,12 @@ struct zone {
	unsigned long		pages_scanned;	   /* since last reclaim */
	int			all_unreclaimable; /* All pages pinned */

	/*
	 * Does the allocator try to reclaim pages from the zone as soon
	 * as it fails a watermark_ok() in __alloc_pages?
	 */
	int			reclaim_pages;

	/*
	 * prev_priority holds the scanning priority for this zone.  It is
	 * defined as the scanning priority at which we achieved our reclaim