
Commit 95913d97 authored by Peter Zijlstra, committed by Ingo Molnar

sched/core: Fix TASK_DEAD race in finish_task_switch()



So the problem this patch is trying to address is as follows:

        CPU0                            CPU1

        context_switch(A, B)
                                        ttwu(A)
                                          LOCK A->pi_lock
                                          A->on_cpu == 0
        finish_task_switch(A)
          prev_state = A->state  <-.
          WMB                      |
          A->on_cpu = 0;           |
          UNLOCK rq0->lock         |
                                   |    context_switch(C, A)
                                   `--  A->state = TASK_DEAD
          prev_state == TASK_DEAD
            put_task_struct(A)
                                        context_switch(A, C)
                                        finish_task_switch(A)
                                          A->state == TASK_DEAD
                                            put_task_struct(A)

The argument being that the WMB will allow the load of A->state on CPU0
to cross over and observe CPU1's store of A->state, which will then
result in a double-drop and use-after-free.
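The broken CPU0 sequence can be sketched with C11 atomics (a minimal illustration, not the kernel code; the struct and function names are hypothetical). Note that C11 has no exact equivalent of smp_wmb(), which is precisely the point: a store-store barrier between the load and the store does nothing for load->store ordering.

```c
#include <stdatomic.h>
#include <assert.h>

struct task { int state; _Atomic int on_cpu; };

/* Pre-fix CPU0 path. An smp_wmb() between the two statements orders
 * only earlier STORES against later STORES; the load of prev->state
 * can still be satisfied after the store to prev->on_cpu becomes
 * visible, so a waker may run prev, let it set TASK_DEAD, and this
 * late load then observes TASK_DEAD: the double-put window. */
static int finish_task_switch_buggy(struct task *prev) {
    int prev_state = prev->state;   /* plain load, may be reordered late */
    atomic_store_explicit(&prev->on_cpu, 0, memory_order_relaxed);
    return prev_state;
}
```

Single-threaded the function is trivially correct; the bug only exists as an ordering window visible to a concurrent waker.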

Now the comment states (and this was true once upon a long time ago)
that we need to observe A->state while holding rq->lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq->lock; it takes A->pi_lock these days.

We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.

The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
which avoids the MB on some archs, but not on important ones like ARM.
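The release-side discipline can be sketched with C11 atomics (an illustrative model, not the kernel code; names are hypothetical, and the kernel's waker actually pairs the release with a control dependency plus rmb in try_to_wake_up() rather than a full acquire load):

```c
#include <stdatomic.h>
#include <assert.h>

struct task { int state; _Atomic int on_cpu; };

/* CPU0 side of the fix: the release store keeps the earlier load of
 * prev->state (and every other prior access) ordered before the
 * moment prev->on_cpu == 0 becomes visible to a waker. */
static int finish_task_switch_fixed(struct task *prev) {
    int prev_state = prev->state;
    atomic_store_explicit(&prev->on_cpu, 0, memory_order_release);
    return prev_state;
}

/* Waker side, modeled as an acquire load: once the waker observes
 * on_cpu == 0 it may schedule prev and let it write TASK_DEAD,
 * knowing CPU0's read of prev->state has already happened. */
static int task_switched_out(struct task *prev) {
    return atomic_load_explicit(&prev->on_cpu, memory_order_acquire) == 0;
}
```

The release store is cheaper than a full smp_mb() on architectures whose stores are naturally ordered, which is why the patch prefers it over upgrading the WMB.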

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org> # v3.1+
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb ("sched: Remove rq->lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net


Signed-off-by: Ingo Molnar <mingo@kernel.org>
parent 049e6dde
kernel/sched/core.c (+5 −5)

@@ -2517,11 +2517,11 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	 * If a task dies, then it sets TASK_DEAD in tsk->state and calls
 	 * schedule one last time. The schedule call will never return, and
 	 * the scheduled task must drop that reference.
-	 * The test for TASK_DEAD must occur while the runqueue locks are
-	 * still held, otherwise prev could be scheduled on another cpu, die
-	 * there before we look at prev->state, and then the reference would
-	 * be dropped twice.
-	 *		Manfred Spraul <manfred@colorfullife.com>
+	 *
+	 * We must observe prev->state before clearing prev->on_cpu (in
+	 * finish_lock_switch), otherwise a concurrent wakeup can get prev
+	 * running on another CPU and we could race with its RUNNING -> DEAD
+	 * transition, resulting in a double drop.
 	 */
 	prev_state = prev->state;
 	vtime_task_switch(prev);
kernel/sched/sched.h (+3 −2)

@@ -1078,9 +1078,10 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
 	 * We must ensure this doesn't happen until the switch is completely
 	 * finished.
+	 *
+	 * Pairs with the control dependency and rmb in try_to_wake_up().
 	 */
-	smp_wmb();
-	prev->on_cpu = 0;
+	smp_store_release(&prev->on_cpu, 0);
 #endif
 #ifdef CONFIG_DEBUG_SPINLOCK
 	/* this is a valid case when another task releases the spinlock */