Commit eb283428 authored by Lai Jiangshan, committed by Tejun Heo

workqueue: fix possible pool stall bug in wq_unbind_fn()



Since multiple pools per cpu have been introduced, wq_unbind_fn() has
a subtle bug which may theoretically stall work item processing.  The
problem is two-fold.

* wq_unbind_fn() depends on the worker executing wq_unbind_fn() itself
  to start unbound chain execution, which worked fine when there was
  only a single pool.  With multiple pools, only the pool which is
  running wq_unbind_fn() - the highpri one - is guaranteed to have
  such kick-off.  The other pool could stall when its busy workers
  block.

* The current code sets WORKER_UNBOUND / POOL_DISASSOCIATED of
  the two pools in succession without initiating work execution
  in between.  Because setting the flags requires grabbing assoc_mutex,
  which is held while new workers are created, this could lead to
  stalls if a pool's manager is waiting for the previous pool's work
  items to release memory.  This is almost purely theoretical, though.

Update wq_unbind_fn() such that it sets WORKER_UNBOUND /
POOL_DISASSOCIATED, goes over schedule() and explicitly kicks off
execution for a pool and then moves on to the next one.

tj: Updated comments and description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
parent 6dbe51c2
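
The per-pool ordering described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct toy_pool`, `toy_need_more_worker()`, `toy_wake_up_worker()` and `toy_unbind()` are hypothetical stand-ins, locking is omitted, and the `schedule()` step is represented by a comment only. It shows why zeroing `nr_running` makes the "need more worker" condition true whenever work is pending, and that each pool gets its own kick before the loop moves on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's worker pool. */
struct toy_pool {
	int nr_running;     /* workers counted as running on the CPU */
	int nr_pending;     /* number of queued work items */
	int nr_idle;        /* idle workers that could be woken */
	bool disassociated; /* models POOL_DISASSOCIATED */
	int wakeups;        /* how often this pool was kicked */
};

/* Another worker is wanted when work is pending and nobody is running. */
static bool toy_need_more_worker(const struct toy_pool *pool)
{
	return pool->nr_pending > 0 && pool->nr_running == 0;
}

/* Wake one idle worker, if the pool has any. */
static void toy_wake_up_worker(struct toy_pool *pool)
{
	if (pool->nr_idle > 0) {
		pool->nr_idle--;
		pool->wakeups++;
	}
}

/* Handle one pool completely before moving on to the next one. */
static void toy_unbind(struct toy_pool *pools, int n)
{
	assert(pools != NULL && n >= 0);

	for (int i = 0; i < n; i++) {
		struct toy_pool *pool = &pools[i];

		pool->disassociated = true; /* 1. set the flags */
		/* 2. the real function calls schedule() at this point */
		pool->nr_running = 0;       /* 3. zap nr_running */
		toy_wake_up_worker(pool);   /* 4. kick off execution */
	}
}
```

In this model, calling `toy_unbind()` on two pools that each start with `nr_running == 1`, a non-empty worklist and one idle worker leaves both pools with `nr_running == 0`, `toy_need_more_worker()` returning true, and exactly one wake-up recorded per pool.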
+25 −19
@@ -3446,28 +3446,34 @@ static void wq_unbind_fn(struct work_struct *work)
 
 		spin_unlock_irq(&pool->lock);
 		mutex_unlock(&pool->assoc_mutex);
-	}
 
-	/*
-	 * Call schedule() so that we cross rq->lock and thus can guarantee
-	 * sched callbacks see the %WORKER_UNBOUND flag.  This is necessary
-	 * as scheduler callbacks may be invoked from other cpus.
-	 */
-	schedule();
+		/*
+		 * Call schedule() so that we cross rq->lock and thus can
+		 * guarantee sched callbacks see the %WORKER_UNBOUND flag.
+		 * This is necessary as scheduler callbacks may be invoked
+		 * from other cpus.
+		 */
+		schedule();
 
-	/*
-	 * Sched callbacks are disabled now.  Zap nr_running.  After this,
-	 * nr_running stays zero and need_more_worker() and keep_working()
-	 * are always true as long as the worklist is not empty.  Pools on
-	 * @cpu now behave as unbound (in terms of concurrency management)
-	 * pools which are served by workers tied to the CPU.
-	 *
-	 * On return from this function, the current worker would trigger
-	 * unbound chain execution of pending work items if other workers
-	 * didn't already.
-	 */
-	for_each_std_worker_pool(pool, cpu)
+		/*
+		 * Sched callbacks are disabled now.  Zap nr_running.
+		 * After this, nr_running stays zero and need_more_worker()
+		 * and keep_working() are always true as long as the
+		 * worklist is not empty.  This pool now behaves as an
+		 * unbound (in terms of concurrency management) pool which
+		 * are served by workers tied to the pool.
+		 */
 		atomic_set(&pool->nr_running, 0);
+
+		/*
+		 * With concurrency management just turned off, a busy
+		 * worker blocking could lead to lengthy stalls.  Kick off
+		 * unbound chain execution of currently pending work items.
+		 */
+		spin_lock_irq(&pool->lock);
+		wake_up_worker(pool);
+		spin_unlock_irq(&pool->lock);
+	}
 }
 
 /*