Commit 475c5ee1 authored by Ingo Molnar

Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull RCU updates from Paul E. McKenney:

- Updates to use cond_resched() instead of cond_resched_rcu_qs()
  where feasible (currently everywhere except in kernel/rcu and
  in kernel/torture.c).  Also a couple of fixes to avoid sending
  IPIs to offline CPUs.

- Updates to simplify RCU's dyntick-idle handling.

- Updates to remove almost all uses of smp_read_barrier_depends()
  and read_barrier_depends().

- Miscellaneous fixes.

- Torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents 30a7acd5 1dfa55e0
+34 −15
@@ -1097,7 +1097,8 @@ will cause the CPU to disregard the values of its counters on
its next exit from idle.
Finally, the <tt>rcu_qs_ctr_snap</tt> field is used to detect
cases where a given operation has resulted in a quiescent state
-for all flavors of RCU, for example, <tt>cond_resched_rcu_qs()</tt>.
+for all flavors of RCU, for example, <tt>cond_resched()</tt>
+when RCU has indicated a need for quiescent states.

<h5>RCU Callback Handling</h5>

@@ -1182,8 +1183,8 @@ CPU (and from tracing) unless otherwise stated.
Its fields are as follows:

<pre>
-  1   int dynticks_nesting;
-  2   int dynticks_nmi_nesting;
+  1   long dynticks_nesting;
+  2   long dynticks_nmi_nesting;
  3   atomic_t dynticks;
  4   bool rcu_need_heavy_qs;
  5   unsigned long rcu_qs_ctr;
@@ -1191,15 +1192,31 @@ Its fields are as follows:
</pre>

<p>The <tt>-&gt;dynticks_nesting</tt> field counts the
-nesting depth of normal interrupts.
-In addition, this counter is incremented when exiting dyntick-idle
-mode and decremented when entering it.
+nesting depth of process execution, so that in normal circumstances
+this counter has value zero or one.
+NMIs, irqs, and tracers are counted by the <tt>-&gt;dynticks_nmi_nesting</tt>
+field.
+Because NMIs cannot be masked, changes to this variable have to be
+undertaken carefully using an algorithm provided by Andy Lutomirski.
+The initial transition from idle adds one, and nested transitions
+add two, so that a nesting level of five is represented by a
+<tt>-&gt;dynticks_nmi_nesting</tt> value of nine.
This counter can therefore be thought of as counting the number
of reasons why this CPU cannot be permitted to enter dyntick-idle
-mode, aside from non-maskable interrupts (NMIs).
-NMIs are counted by the <tt>-&gt;dynticks_nmi_nesting</tt>
-field, except that NMIs that interrupt non-dyntick-idle execution
-are not counted.
+mode, aside from process-level transitions.
+
+<p>However, it turns out that when running in non-idle kernel context,
+the Linux kernel is fully capable of entering interrupt handlers that
+never exit and perhaps also vice versa.
+Therefore, whenever the <tt>-&gt;dynticks_nesting</tt> field is
+incremented up from zero, the <tt>-&gt;dynticks_nmi_nesting</tt> field
+is set to a large positive number, and whenever the
+<tt>-&gt;dynticks_nesting</tt> field is decremented down to zero,
+the <tt>-&gt;dynticks_nmi_nesting</tt> field is set to zero.
+Assuming that the number of misnested interrupts is not sufficient
+to overflow the counter, this approach corrects the
+<tt>-&gt;dynticks_nmi_nesting</tt> field every time the corresponding
+CPU enters the idle loop from process context.

</p><p>The <tt>-&gt;dynticks</tt> field counts the corresponding
CPU's transitions to and from dyntick-idle mode, so that this counter
@@ -1231,14 +1248,16 @@ in response.
<tr><th>&nbsp;</th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
-	Why not just count all NMIs?
-	Wouldn't that be simpler and less error prone?
+	Why not simply combine the <tt>-&gt;dynticks_nesting</tt>
+	and <tt>-&gt;dynticks_nmi_nesting</tt> counters into a
+	single counter that just counts the number of reasons that
+	the corresponding CPU is non-idle?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
-	It seems simpler only until you think hard about how to go about
-	updating the <tt>rcu_dynticks</tt> structure's
-	<tt>-&gt;dynticks</tt> field.
+	Because this would fail in the presence of interrupts whose
+	handlers never return and of handlers that manage to return
+	from a made-up interrupt.
</font></td></tr>
<tr><td>&nbsp;</td></tr>
</table>
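
The counting rules described in the hunks above (the initial transition from idle adds one, nested transitions add two, and process-level transitions reset the counter) can be modeled in user space. The following is a minimal sketch under stated assumptions, not kernel code and not part of this commit: every `model_*` name and the `MODEL_IRQ_NONIDLE` constant are hypothetical, and the real update algorithm must additionally cope with NMIs arriving in the middle of an update.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "large positive number" (an assumption). */
#define MODEL_IRQ_NONIDLE (LONG_MAX / 2 + 1)

struct model_dynticks {
	long nesting;     /* models ->dynticks_nesting (process level) */
	long nmi_nesting; /* models ->dynticks_nmi_nesting (NMIs, irqs, tracers) */
	bool idle;        /* true while the modeled CPU is in dyntick-idle mode */
};

/* Interrupt or NMI entry: add one when leaving idle, two when nested. */
static void model_irq_enter(struct model_dynticks *d)
{
	long incby = 2;

	if (d->idle) {
		d->idle = false;
		incby = 1;
	}
	d->nmi_nesting += incby;
}

/* Interrupt or NMI exit: undo a nested entry, or return to idle. */
static void model_irq_exit(struct model_dynticks *d)
{
	assert(d->nmi_nesting > 0);
	if (d->nmi_nesting != 1) {
		d->nmi_nesting -= 2;
		return;
	}
	d->nmi_nesting = 0;
	d->idle = true;
}

/* Process-level exit from idle: ->dynticks_nesting goes from zero to one. */
static void model_idle_exit(struct model_dynticks *d)
{
	assert(d->nesting == 0);
	d->nesting = 1;
	d->nmi_nesting = MODEL_IRQ_NONIDLE;
	d->idle = false;
}

/* Process-level entry to idle: the reset corrects any misnested interrupts. */
static void model_idle_enter(struct model_dynticks *d)
{
	assert(d->nesting == 1);
	d->nesting = 0;
	d->nmi_nesting = 0;
	d->idle = true;
}

/* Counter value after `levels` nested interrupts taken from idle. */
static long model_nest_from_idle(int levels)
{
	struct model_dynticks d = { .nesting = 0, .nmi_nesting = 0, .idle = true };

	for (int i = 0; i < levels; i++)
		model_irq_enter(&d);
	return d.nmi_nesting;
}
```

In this model, `model_nest_from_idle(5)` returns 9, which matches the nesting-level-five example in the text. An interrupt entered from process context whose handler never returns leaves the counter at `MODEL_IRQ_NONIDLE + 2`, and the next `model_idle_enter()` resets it to zero.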
+4 −3
@@ -581,7 +581,8 @@ This guarantee was only partially premeditated.
DYNIX/ptx used an explicit memory barrier for publication, but had nothing
resembling <tt>rcu_dereference()</tt> for subscription, nor did it
have anything resembling the <tt>smp_read_barrier_depends()</tt>
-that was later subsumed into <tt>rcu_dereference()</tt>.
+that was later subsumed into <tt>rcu_dereference()</tt> and later
+still into <tt>READ_ONCE()</tt>.
The need for these operations made itself known quite suddenly at a
late-1990s meeting with the DEC Alpha architects, back in the days when
DEC was still a free-standing company.
@@ -2797,7 +2798,7 @@ RCU must avoid degrading real-time response for CPU-bound threads, whether
executing in usermode (which is one use case for
<tt>CONFIG_NO_HZ_FULL=y</tt>) or in the kernel.
That said, CPU-bound loops in the kernel must execute
-<tt>cond_resched_rcu_qs()</tt> at least once per few tens of milliseconds
+<tt>cond_resched()</tt> at least once per few tens of milliseconds
in order to avoid receiving an IPI from RCU.

<p>
@@ -3128,7 +3129,7 @@ The solution, in the form of
is to have implicit
read-side critical sections that are delimited by voluntary context
switches, that is, calls to <tt>schedule()</tt>,
-<tt>cond_resched_rcu_qs()</tt>, and
+<tt>cond_resched()</tt>, and
<tt>synchronize_rcu_tasks()</tt>.
In addition, transitions to and from userspace execution also delimit
tasks-RCU read-side critical sections.
+1 −5
@@ -122,11 +122,7 @@ o Be very careful about comparing pointers obtained from
		Note that if checks for being within an RCU read-side
		critical section are not required and the pointer is never
		dereferenced, rcu_access_pointer() should be used in place
-		of rcu_dereference(). The rcu_access_pointer() primitive
-		does not require an enclosing read-side critical section,
-		and also omits the smp_read_barrier_depends() included in
-		rcu_dereference(), which in turn should provide a small
-		performance gain in some CPUs (e.g., the DEC Alpha).
+		of rcu_dereference().

	o	The comparison is against a pointer that references memory
		that was initialized "a long time ago."  The reason
+4 −6
@@ -23,12 +23,10 @@ o A CPU looping with preemption disabled. This condition can
o	A CPU looping with bottom halves disabled.  This condition can
	result in RCU-sched and RCU-bh stalls.

-o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
-	kernel without invoking schedule().  Note that cond_resched()
-	does not necessarily prevent RCU CPU stall warnings.  Therefore,
-	if the looping in the kernel is really expected and desirable
-	behavior, you might need to replace some of the cond_resched()
-	calls with calls to cond_resched_rcu_qs().
+o	For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
+	without invoking schedule().  If the looping in the kernel is
+	really expected and desirable behavior, you might need to add
+	some calls to cond_resched().

o	Booting Linux using a console connection that is too slow to
	keep up with the boot-time console-message rate.  For example,
+1 −2
@@ -600,8 +600,7 @@ don't forget about them when submitting patches making use of RCU!]

	#define rcu_dereference(p) \
	({ \
-		typeof(p) _________p1 = p; \
-		smp_read_barrier_depends(); \
+		typeof(p) _________p1 = READ_ONCE(p); \
		(_________p1); \
	})
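
The simplified macro in the hunk above can be tried out in user space with a stand-in for `READ_ONCE()`. This is a sketch under stated assumptions, not the kernel's implementation: `MODEL_READ_ONCE()` is only a volatile load and so provides none of the dependency ordering that the commit text attributes to the kernel's `READ_ONCE()`, all `model_*` names are hypothetical, no read-side critical section or concurrent updater is modeled, and GNU C extensions (statement expressions and `__typeof__`, as accepted by GCC and Clang) are required.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: forces exactly one load through a volatile access. */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Same shape as the post-commit macro, using the stand-in above. */
#define model_rcu_dereference(p) \
({ \
	__typeof__(p) _________p1 = MODEL_READ_ONCE(p); \
	(_________p1); \
})

struct model_foo {
	int a;
};

/* Hypothetical RCU-protected pointer; NULL until something is published. */
static struct model_foo *model_gp;

/* Fetch the pointer once, then use only the local copy. */
static int model_read_a(int fallback)
{
	struct model_foo *p = model_rcu_dereference(model_gp);

	return p ? p->a : fallback;
}
```

Fetching the pointer once into a local variable means the NULL check and the dereference both see the same value, even if `model_gp` were reassigned in between.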
