Commit 34e55232 authored by KAMEZAWA Hiroyuki, committed by Linus Torvalds

mm: avoid false sharing of mm_counter

Per-mm statistics are shared by all threads of a process, so updating
them can become a cache-miss point in the page fault path.

This patch adds a per-thread cache for mm_counter.  RSS values are
counted in a struct in task_struct and synchronized with the mm's
counters on certain events.

In this patch, the event is the number of calls to handle_mm_fault:
the per-thread values are added to the mm's counters every 64 calls.
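
The batching scheme described above can be sketched in user space as
follows.  This is a minimal illustration, not the kernel code: the struct
names follow the diff, but SYNC_THRESHOLD, mm_init, sync_rss,
add_counter_fast, check_sync and read_counter are hypothetical names chosen
for this sketch, NR_MM_COUNTERS is set to an arbitrary 2, and C11 atomics
stand in for the kernel's atomic_long_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_MM_COUNTERS 2   /* assumption: arbitrary size for this sketch */
#define SYNC_THRESHOLD 64  /* "every 64 calls" from the commit message */

/* Shared per-mm counters: every write touches a contended cache line. */
struct mm_rss_stat {
	atomic_long count[NR_MM_COUNTERS];
};

/* Per-thread cached deltas, shaped like struct task_rss_stat in the patch. */
struct task_rss_stat {
	int events;                 /* updates seen since the last sync */
	int count[NR_MM_COUNTERS];  /* deltas not yet folded into the mm */
};

static void mm_init(struct mm_rss_stat *mm)
{
	for (int i = 0; i < NR_MM_COUNTERS; i++)
		atomic_init(&mm->count[i], 0);
}

/* Fold the thread's cached deltas into the shared counters. */
static void sync_rss(struct task_rss_stat *t, struct mm_rss_stat *mm)
{
	for (int i = 0; i < NR_MM_COUNTERS; i++) {
		if (t->count[i]) {
			atomic_fetch_add(&mm->count[i], t->count[i]);
			t->count[i] = 0;
		}
	}
	t->events = 0;
}

/* Fast path: only thread-local memory is written, no atomics. */
static void add_counter_fast(struct task_rss_stat *t, int member, int val)
{
	assert(member >= 0 && member < NR_MM_COUNTERS);
	t->count[member] += val;
}

/* Called once per simulated fault; syncs on every 64th call. */
static void check_sync(struct task_rss_stat *t, struct mm_rss_stat *mm)
{
	if (++t->events >= SYNC_THRESHOLD)
		sync_rss(t, mm);
}

/* Readers see only what has been synced, so the value may lag. */
static long read_counter(struct mm_rss_stat *mm, int member)
{
	return atomic_load(&mm->count[member]);
}
```

With one thread calling add_counter_fast(&t, 0, 1) followed by
check_sync(&t, &mm) in a loop, read_counter(&mm, 0) stays at 0 for the
first 63 iterations and becomes 64 after the 64th.  A reader can therefore
see a value that lags by up to 63 updates per thread, which is the
trade-off for keeping the shared cache line out of the fast path.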

 A rough estimate from a small benchmark with parallel threads (2 threads) shows
 [before]
     4.5 cache-miss/faults
 [after]
     4.0 cache-miss/faults
 In any case, mmap_sem becomes the most contended object as the number of threads grows.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent d559db08
+6 −0
@@ -188,6 +188,12 @@ memory usage. Its seven fields are explained in Table 1-3. The stat file
 contains details information about the process itself.  Its fields are
 explained in Table 1-4.
 
+(for SMP CONFIG users)
+For making accounting scalable, RSS related information are handled in
+asynchronous manner and the vaule may not be very precise. To see a precise
+snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
+It's slow but very precise.
+
 Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
 ..............................................................................
  Field                       Content
+1 −0
@@ -718,6 +718,7 @@ static int exec_mmap(struct mm_struct *mm)
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
 	old_mm = current->mm;
+	sync_mm_rss(tsk, old_mm);
 	mm_release(tsk, old_mm);
 
 	if (old_mm) {
+3 −5
@@ -873,7 +873,7 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
 /*
  * per-process(per-mm_struct) statistics.
  */
-#if USE_SPLIT_PTLOCKS
+#if defined(SPLIT_RSS_COUNTING)
 /*
  * The mm counters are not protected by its page_table_lock,
  * so must be incremented atomically.
@@ -883,10 +883,7 @@ static inline void set_mm_counter(struct mm_struct *mm, int member, long value)
 	atomic_long_set(&mm->rss_stat.count[member], value);
 }
 
-static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
-{
-	return (unsigned long)atomic_long_read(&mm->rss_stat.count[member]);
-}
+unsigned long get_mm_counter(struct mm_struct *mm, int member);
 
 static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
 {
@@ -974,6 +971,7 @@ static inline void setmax_mm_hiwater_rss(unsigned long *maxrss,
 		*maxrss = hiwater_rss;
 }
 
+void sync_mm_rss(struct task_struct *task, struct mm_struct *mm);
 
 /*
  * A callback you can register to apply pressure to ageable caches.
+6 −0
@@ -202,9 +202,15 @@ enum {
 };
 
 #if USE_SPLIT_PTLOCKS
+#define SPLIT_RSS_COUNTING
 struct mm_rss_stat {
 	atomic_long_t count[NR_MM_COUNTERS];
 };
+/* per-thread cached information, */
+struct task_rss_stat {
+	int events;	/* for synchronization threshold */
+	int count[NR_MM_COUNTERS];
+};
 #else  /* !USE_SPLIT_PTLOCKS */
 struct mm_rss_stat {
 	unsigned long count[NR_MM_COUNTERS];
+3 −1
@@ -1220,7 +1220,9 @@ struct task_struct {
 	struct plist_node pushable_tasks;
 
 	struct mm_struct *mm, *active_mm;
-
+#if defined(SPLIT_RSS_COUNTING)
+	struct task_rss_stat	rss_stat;
+#endif
 /* task state */
 	int exit_state;
 	int exit_code, exit_signal;