Commit b75a2bf8 authored Apr 27, 2016 by Linus Torvalds

Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:
 "So, it turns out we had a silly bug in the most fundamental part of
  workqueue for a very long time.  AFAICS, this dates back to pre-git
  era and has quite likely been there from the time workqueue was first
  introduced.

  A work item uses its PENDING bit to synchronize multiple queuers.
  Anyone who wins the PENDING bit owns the pending state of the work
  item.  Whether a queuer wins or loses the race, one thing should be
  guaranteed - there will be at least one execution of the work item
  after the queueing attempt - where "after" means that the execution
  instance would be able to see all the changes that the queuer has
  made prior to the queueing attempt.

  Unfortunately, we were missing a smp_mb() after clearing PENDING for
  execution, so nothing guaranteed visibility of the changes that a
  queueing loser has made, which manifested as a reproducible blk-mq
  stall.

  Lots of kudos to Roman for debugging the problem.  The patch for
  -stable is the minimal one.  For v4.7, Peter is working on a patch to
  make the code path slightly more efficient and less fragile"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix ghost PENDING flag while doing MQ IO
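
The ordering requirement described in the message can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions, not the kernel code: struct demo_work and the demo_* functions are hypothetical names, atomic_exchange() stands in for test_and_set_bit(PENDING), and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(). The explicit fence on the queuer side models the full ordering that the kernel's value-returning atomic bit operation is assumed to provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct work_struct; not the kernel type. */
struct demo_work {
	atomic_bool pending;         /* models the PENDING bit */
	atomic_int  event_indicated; /* state the work function consumes */
};

static void demo_init(struct demo_work *w)
{
	assert(w != NULL);
	atomic_init(&w->pending, false);
	atomic_init(&w->event_indicated, 0);
}

/*
 * Queuer side (CPU#0 in the diff comment): publish the event, then
 * try to win PENDING. Returns true if the caller won and would have
 * to queue the work; false if it lost and relies on the execution
 * that is already pending.
 */
static bool demo_queue_work(struct demo_work *w)
{
	assert(w != NULL);
	atomic_store_explicit(&w->event_indicated, 1, memory_order_relaxed);
	/* Full fence between the STORE and the PENDING test-and-set. */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_exchange(&w->pending, true);
}

/*
 * Worker side (CPU#1): clear PENDING, issue the full fence that the
 * fix adds, and only then read the state written by queuers.
 * Returns the value of event_indicated that the work body observed.
 */
static int demo_run_work(struct demo_work *w)
{
	assert(w != NULL);
	atomic_store_explicit(&w->pending, false, memory_order_relaxed);
	/* Models the added smp_mb(): the LOAD below may not move above
	 * the clear of PENDING. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&w->event_indicated, memory_order_relaxed);
}
```

With both fences in place, a queuer that loses the race (demo_queue_work() returns false) saw PENDING before the worker cleared it, so the worker's later load is guaranteed to observe event_indicated == 1. Dropping the fence in demo_run_work() reintroduces the store-buffering reordering that the commit fixes.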

parents 763cfc86 346c09f8

kernel/workqueue.c

+29 −0

		@@ -666,6 +666,35 @@ static void set_work_pool_and_clear_pending(struct work_struct *work,
		*/
		smp_wmb();
		set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
		/*
		* The following mb guarantees that previous clear of a PENDING bit
		* will not be reordered with any speculative LOADS or STORES from
		* work->current_func, which is executed afterwards. This possible
		* reordering can lead to a missed execution on attempt to queue
		* the same @work. E.g. consider this case:
		*
		* CPU#0                         CPU#1
		* ----------------------------  --------------------------------
		*
		* 1  STORE event_indicated
		* 2  queue_work_on() {
		* 3    test_and_set_bit(PENDING)
		* 4  }                          set_..._and_clear_pending() {
		* 5                               set_work_data() # clear bit
		* 6                               smp_mb()
		* 7                             work->current_func() {
		* 8                               LOAD event_indicated
		*                               }
		*
		* Without an explicit full barrier, the speculative LOAD on line 8
		* can be executed before CPU#0 does the STORE on line 1. If that
		* happens, CPU#0 observes that the PENDING bit is still set and
		* does not queue a new execution of @work, in the hope that CPU#1
		* will eventually finish the queued @work. Meanwhile CPU#1 does
		* not see that event_indicated is set, because the speculative
		* LOAD was executed before the actual STORE.
		*/
		smp_mb();
		}

		static void clear_work_data(struct work_struct *work)