
Commit 0753ba01 authored by KOSAKI Motohiro, committed by Linus Torvalds

mm: revert "oom: move oom_adj value"



Commit 2ff05b2b (oom: move oom_adj value) moved the oom_adj value to
the mm_struct.  It was a very good first step toward sanitizing the OOM logic.

However, Paul Menage reported that the commit causes a regression in his job
scheduler: the current OOM logic can kill a process whose oom_adj is OOM_DISABLE.

Why? His program contains code similar to the following.

	...
	set_oom_adj(OOM_DISABLE); /* The job scheduler must never be OOM-killed */
	...
	if (vfork() == 0) {
		set_oom_adj(0); /* The spawned child may be OOM-killed */
		execve("foo-bar-cmd", argv, envp);
		_exit(127);     /* Reached only if execve() fails */
	}
	...

The vfork() parent and child share the same mm_struct, so the
set_oom_adj(0) above changes oom_adj not only for the vfork() child but
also for the vfork() parent.  As a result, the vfork() parent (the job
scheduler) lost its OOM immunity and was killed.
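To make the failure mode concrete, here is a minimal user-space model of the two designs. The structs `model_mm` and `model_task` and the helpers `model_vfork()` and `parent_adj_after_child_reset()` are hypothetical simplifications, not kernel code; vfork() is modeled only as "the child is a new task that shares the parent's mm pointer".

```c
#include <assert.h>

/* Special "never OOM-kill" value, as used in the commit message. */
#define OOM_DISABLE (-17)

/* Hypothetical, heavily simplified stand-ins for mm_struct and task_struct. */
struct model_mm {
	int oom_adj;            /* per-mm storage, as introduced by 2ff05b2b */
};

struct model_task {
	struct model_mm *mm;    /* vfork() parent and child point at one mm */
	int oomkilladj;         /* per-task storage, as restored by the revert */
};

/* Model vfork(): the child is a new task that shares the parent's mm. */
static struct model_task model_vfork(const struct model_task *parent)
{
	struct model_task child = *parent;  /* copies oomkilladj, shares mm */
	return child;
}

/*
 * Replay the job-scheduler sequence and return the parent's value at the
 * end. If per_mm is nonzero the value lives in the mm, otherwise in the task.
 */
static int parent_adj_after_child_reset(int per_mm)
{
	struct model_mm mm = { .oom_adj = 0 };
	struct model_task sched = { .mm = &mm, .oomkilladj = 0 };
	struct model_task child;

	/* Job scheduler: set_oom_adj(OOM_DISABLE) */
	if (per_mm)
		sched.mm->oom_adj = OOM_DISABLE;
	else
		sched.oomkilladj = OOM_DISABLE;

	child = model_vfork(&sched);
	assert(child.mm == sched.mm);       /* the mm really is shared */

	/* vfork() child: set_oom_adj(0) */
	if (per_mm)
		child.mm->oom_adj = 0;
	else
		child.oomkilladj = 0;

	return per_mm ? sched.mm->oom_adj : sched.oomkilladj;
}
```

With per-mm storage, `parent_adj_after_child_reset(1)` returns 0: the scheduler's OOM_DISABLE was overwritten through the shared mm. With per-task storage, `parent_adj_after_child_reset(0)` returns OOM_DISABLE (-17), which is the behavior the revert restores.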

The fork-setting-exec idiom is used very frequently in userland programs.
We must not break this assumption.

Therefore, this patch reverts commit 2ff05b2b and the related commits.

Reverted commit list
---------------------
- commit 2ff05b2b (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 81236810 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b (mm: copy over oom_adj value at fork time)

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 89a4eb4b
+5 −10
@@ -1167,13 +1167,11 @@ CHAPTER 3: PER-PROCESS PARAMETERS
 3.1 /proc/<pid>/oom_adj - Adjust the oom-killer score
 ------------------------------------------------------
 
-This file can be used to adjust the score used to select which processes should
-be killed in an out-of-memory situation.  The oom_adj value is a characteristic
-of the task's mm, so all threads that share an mm with pid will have the same
-oom_adj value.  A high value will increase the likelihood of this process being
-killed by the oom-killer.  Valid values are in the range -16 to +15 as
-explained below and a special value of -17, which disables oom-killing
-altogether for threads sharing pid's mm.
+This file can be used to adjust the score used to select which processes
+should be killed in an  out-of-memory  situation.  Giving it a high score will
+increase the likelihood of this process being killed by the oom-killer.  Valid
+values are in the range -16 to +15, plus the special value -17, which disables
+oom-killing altogether for this process.
 
 The process to be killed in an out-of-memory situation is selected among all others
 based on its badness score. This value equals the original memory size of the process
@@ -1187,9 +1185,6 @@ the parent's score if they do not share the same memory. Thus forking servers
 are the prime candidates to be killed. Having only one 'hungry' child will make
 parent less preferable than the child.
 
-/proc/<pid>/oom_adj cannot be changed for kthreads since they are immune from
-oom-killing already.
-
 /proc/<pid>/oom_score shows process' current badness score.
 
 The following heuristics are then applied:
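The restored documentation states the accepted values: -16 to +15, plus the special value -17 that disables OOM killing for the process. As an illustration of that rule, here is a small user-space sketch that validates a string before it is written to /proc/<pid>/oom_adj. The helper `parse_oom_adj()` is hypothetical, and the constant names mirror ones used by kernels of this era but are assumptions here; this is not the kernel's own parsing in oom_adjust_write().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Range taken from the documentation text; names are assumptions. */
#define OOM_DISABLE     (-17)  /* special value: never OOM-kill this process */
#define OOM_ADJUST_MIN  (-16)
#define OOM_ADJUST_MAX  15

/*
 * Parse a string as it might be written to /proc/<pid>/oom_adj
 * (for example "-17\n") and check it against the documented range.
 * Returns 0 and stores the value on success, -EINVAL otherwise.
 */
static int parse_oom_adj(const char *s, int *out)
{
	char *end;
	long v;

	assert(s != NULL && out != NULL);
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || end == s)
		return -EINVAL;          /* not a number */
	if (*end == '\n')
		end++;                   /* accept the newline that echo appends */
	if (*end != '\0')
		return -EINVAL;          /* trailing garbage */
	if (v != OOM_DISABLE && (v < OOM_ADJUST_MIN || v > OOM_ADJUST_MAX))
		return -EINVAL;          /* outside -16..+15 and not -17 */
	*out = (int)v;
	return 0;
}
```

Treating -17 as a separate case, rather than widening the range to -17..+15, keeps the special meaning of OOM_DISABLE explicit in the check.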
+3 −16
@@ -1003,12 +1003,7 @@ static ssize_t oom_adjust_read(struct file *file, char __user *buf,
 
 	if (!task)
 		return -ESRCH;
-	task_lock(task);
-	if (task->mm)
-		oom_adjust = task->mm->oom_adj;
-	else
-		oom_adjust = OOM_DISABLE;
-	task_unlock(task);
+	oom_adjust = task->oomkilladj;
 	put_task_struct(task);
 
 	len = snprintf(buffer, sizeof(buffer), "%i\n", oom_adjust);
@@ -1037,19 +1032,11 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
 	task = get_proc_task(file->f_path.dentry->d_inode);
 	if (!task)
 		return -ESRCH;
-	task_lock(task);
-	if (!task->mm) {
-		task_unlock(task);
-		put_task_struct(task);
-		return -EINVAL;
-	}
-	if (oom_adjust < task->mm->oom_adj && !capable(CAP_SYS_RESOURCE)) {
-		task_unlock(task);
+	if (oom_adjust < task->oomkilladj && !capable(CAP_SYS_RESOURCE)) {
 		put_task_struct(task);
 		return -EACCES;
 	}
-	task->mm->oom_adj = oom_adjust;
-	task_unlock(task);
+	task->oomkilladj = oom_adjust;
 	put_task_struct(task);
 	if (end - buffer == 0)
 		return -EIO;
+0 −2
@@ -240,8 +240,6 @@ struct mm_struct {
 
 	unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
 
-	s8 oom_adj;	/* OOM kill score adjustment (bit shift) */
-
 	cpumask_t cpu_vm_mask;
 
 	/* Architecture-specific MM context */
+1 −0
@@ -1198,6 +1198,7 @@ struct task_struct {
 	 * a short time
 	 */
 	unsigned char fpu_counter;
+	s8 oomkilladj; /* OOM kill score adjustment (bit shift). */
 #ifdef CONFIG_BLK_DEV_IO_TRACE
 	unsigned int btrace_seq;
 #endif
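The restored `oomkilladj` field is described as a "(bit shift)" adjustment. The sketch below shows one way such a shift can scale a badness score: each positive step doubles the score and each negative step halves it. `apply_oomkilladj()` is a hypothetical helper, not the kernel's badness() function; bumping a zero score to 1 before a left shift is a choice of this sketch, and tasks set to OOM_DISABLE are assumed to be filtered out by the caller before scoring.

```c
#include <assert.h>

/*
 * Scale a badness score by an oom_adj-style bit shift (hypothetical helper).
 * adj must be in the documented range -16..+15; OOM_DISABLE (-17) tasks
 * are assumed to have been skipped by the caller.
 */
static unsigned long apply_oomkilladj(unsigned long points, int adj)
{
	assert(adj >= -16 && adj <= 15);

	if (adj > 0) {
		if (points == 0)
			points = 1;     /* let a positive shift still raise the score */
		points <<= adj;         /* multiply by 2^adj */
	} else if (adj < 0) {
		points >>= -adj;        /* divide by 2^-adj, rounding down */
	}
	return points;
}
```

For example, a score of 100 becomes 400 with adj = 2 and 25 with adj = -2, so each step of oom_adj has a multiplicative rather than additive effect.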
+0 −1
@@ -426,7 +426,6 @@ static struct mm_struct * mm_init(struct mm_struct * mm, struct task_struct *p)
 	init_rwsem(&mm->mmap_sem);
 	INIT_LIST_HEAD(&mm->mmlist);
 	mm->flags = (current->mm) ? current->mm->flags : default_dump_filter;
-	mm->oom_adj = (current->mm) ? current->mm->oom_adj : 0;
 	mm->core_state = NULL;
 	mm->nr_ptes = 0;
 	set_mm_counter(mm, file_rss, 0);