Commit be77f87c authored by Paul E. McKenney

Merge branches 'cbnum.2013.06.10a', 'doc.2013.06.10a', 'fixes.2013.06.10a', 'srcu.2013.06.10a' and 'tiny.2013.06.10a' into HEAD

cbnum.2013.06.10a: Apply simplifications stemming from the new callback
	numbering.

doc.2013.06.10a: Documentation updates.

fixes.2013.06.10a: Miscellaneous fixes.

srcu.2013.06.10a: Updates to SRCU.

tiny.2013.06.10a: Eliminate TINY_PREEMPT_RCU.
+0 −6
@@ -354,12 +354,6 @@ over a rather long period of time, but improvements are always welcome!
	using RCU rather than SRCU, because RCU is almost always faster
	and easier to use than is SRCU.

	If you need to enter your read-side critical section in a
	hardirq or exception handler, and then exit that same read-side
	critical section in the task that was interrupted, then you need
	to use srcu_read_lock_raw() and srcu_read_unlock_raw(), which avoid
	the lockdep checking that would otherwise make this practice illegal.

	Also unlike other forms of RCU, explicit initialization
	and cleanup is required via init_srcu_struct() and
	cleanup_srcu_struct().	These are passed a "struct srcu_struct"
+0 −6
@@ -182,12 +182,6 @@ torture_type The type of RCU to test, with string values as follows:
		"srcu_expedited": srcu_read_lock(), srcu_read_unlock() and
			synchronize_srcu_expedited().

		"srcu_raw": srcu_read_lock_raw(), srcu_read_unlock_raw(),
			and call_srcu().

		"srcu_raw_sync": srcu_read_lock_raw(), srcu_read_unlock_raw(),
			and synchronize_srcu().

		"sched": preempt_disable(), preempt_enable(), and
			call_rcu_sched().

+4 −96
@@ -530,113 +530,21 @@ o "nos" counts the number of times we balked for other
	reasons, e.g., the grace period ended first.


CONFIG_TINY_RCU and CONFIG_TINY_PREEMPT_RCU debugfs Files and Formats
CONFIG_TINY_RCU debugfs Files and Formats

These implementations of RCU provide a single debugfs file under the
top-level directory RCU, namely rcu/rcudata, which displays fields in
rcu_bh_ctrlblk, rcu_sched_ctrlblk and, for CONFIG_TINY_PREEMPT_RCU,
rcu_preempt_ctrlblk.
rcu_bh_ctrlblk and rcu_sched_ctrlblk.

The output of "cat rcu/rcudata" is as follows:

rcu_preempt: qlen=24 gp=1097669 g197/p197/c197 tasks=...
             ttb=. btg=no ntb=184 neb=0 nnb=183 j=01f7 bt=0274
             normal balk: nt=1097669 gt=0 bt=371 b=0 ny=25073378 nos=0
             exp balk: bt=0 nos=0
rcu_sched: qlen: 0
rcu_bh: qlen: 0

This is split into rcu_preempt, rcu_sched, and rcu_bh sections, with the
rcu_preempt section appearing only in CONFIG_TINY_PREEMPT_RCU builds.
The last three lines of the rcu_preempt section appear only in
CONFIG_RCU_BOOST kernel builds.  The fields are as follows:
This is split into rcu_sched and rcu_bh sections.  The field is as
follows:

o	"qlen" is the number of RCU callbacks currently waiting either
	for an RCU grace period or waiting to be invoked.  This is the
	only field present for rcu_sched and rcu_bh, due to the
	short-circuiting of grace period in those two cases.

o	"gp" is the number of grace periods that have completed.

o	"g197/p197/c197" displays the grace-period state, with the
	"g" number being the number of grace periods that have started
	(mod 256), the "p" number being the number of grace periods
	that the CPU has responded to (also mod 256), and the "c"
	number being the number of grace periods that have completed
	(once again mod 256).

	Why have both "gp" and "g"?  Because the data flowing into
	"gp" is only present in a CONFIG_RCU_TRACE kernel.

o	"tasks" is a set of bits.  The first bit is "T" if there are
	currently tasks that have recently blocked within an RCU
	read-side critical section, the second bit is "N" if any of the
	aforementioned tasks are blocking the current RCU grace period,
	and the third bit is "E" if any of the aforementioned tasks are
	blocking the current expedited grace period.  Each bit is "."
	if the corresponding condition does not hold.

o	"ttb" is a single bit.  It is "B" if any of the blocked tasks
	need to be priority boosted and "." otherwise.

o	"btg" indicates whether boosting has been carried out during
	the current grace period, with "exp" indicating that boosting
	is in progress for an expedited grace period, "no" indicating
	that boosting has not yet started for a normal grace period,
	"begun" indicating that boosting has begun for a normal grace
	period, and "done" indicating that boosting has completed for
	a normal grace period.

o	"ntb" is the total number of tasks subjected to RCU priority boosting
	periods since boot.

o	"neb" is the number of expedited grace periods that have had
	to resort to RCU priority boosting since boot.

o	"nnb" is the number of normal grace periods that have had
	to resort to RCU priority boosting since boot.

o	"j" is the low-order 16 bits of the jiffies counter in hexadecimal.

o	"bt" is the low-order 16 bits of the value that the jiffies counter
	will have at the next time that boosting is scheduled to begin.

o	In the line beginning with "normal balk", the fields are as follows:

	o	"nt" is the number of times that the system balked from
		boosting because there were no blocked tasks to boost.
		Note that the system will balk from boosting even if the
		grace period is overdue when the currently running task
		is looping within an RCU read-side critical section.
		There is no point in boosting in this case, because
		boosting a running task won't make it run any faster.

	o	"gt" is the number of times that the system balked
		from boosting because, although there were blocked tasks,
		none of them were preventing the current grace period
		from completing.

	o	"bt" is the number of times that the system balked
		from boosting because boosting was already in progress.

	o	"b" is the number of times that the system balked from
		boosting because boosting had already completed for
		the grace period in question.

	o	"ny" is the number of times that the system balked from
		boosting because it was not yet time to start boosting
		the grace period in question.

	o	"nos" is the number of times that the system balked from
		boosting for inexplicable ("not otherwise specified")
		reasons.  This can actually happen due to races involving
		increments of the jiffies counter.

o	In the line beginning with "exp balk", the fields are as follows:

	o	"bt" is the number of times that the system balked from
		boosting because there were no blocked tasks to boost.

	o	"nos" is the number of times that the system balked from
		 boosting for inexplicable ("not otherwise specified")
		 reasons.
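
With only the "qlen" field remaining, the rcu_sched and rcu_bh lines in
the sample output above are simple to read from a program.  The following
is a minimal userspace sketch, not kernel code: parse_qlen_line() and
RCU_NAME_MAX are hypothetical names invented for illustration, and the
sketch assumes the "name: qlen: N" layout shown in the sample output.

```c
#include <stdio.h>

#define RCU_NAME_MAX 32	/* hypothetical bound on the flavor name */

/*
 * Parse one line of the form "rcu_sched: qlen: 0".
 * On success, store the flavor name and queue length and return 0.
 * Return -1 if the line does not have that form, for example an
 * old-style rcu_preempt line that uses "qlen=24" instead.
 */
int parse_qlen_line(const char *line, char name[RCU_NAME_MAX], long *qlen)
{
	/* %31[^:] reads the flavor name up to, but excluding, the colon. */
	if (sscanf(line, " %31[^:]: qlen: %ld", name, qlen) != 2)
		return -1;
	return 0;
}
```

A caller would read rcu/rcudata line by line (for example with fgets())
and pass each line to parse_qlen_line(), skipping lines for which it
returns -1.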
+7 −15
@@ -842,9 +842,7 @@ SRCU: Critical sections Grace period Barrier

	srcu_read_lock		synchronize_srcu	srcu_barrier
	srcu_read_unlock	call_srcu
	srcu_read_lock_raw	synchronize_srcu_expedited
	srcu_read_unlock_raw
	srcu_dereference
	srcu_dereference	synchronize_srcu_expedited

SRCU:	Initialization/cleanup
	init_srcu_struct
@@ -865,38 +863,32 @@ list can be helpful:

a.	Will readers need to block?  If so, you need SRCU.

b.	Is it necessary to start a read-side critical section in a
	hardirq handler or exception handler, and then to complete
	this read-side critical section in the task that was
	interrupted?  If so, you need SRCU's srcu_read_lock_raw() and
	srcu_read_unlock_raw() primitives.

c.	What about the -rt patchset?  If readers would need to block
b.	What about the -rt patchset?  If readers would need to block
	in a non-rt kernel, you need SRCU.  If readers would block
	in a -rt kernel, but not in a non-rt kernel, SRCU is not
	necessary.

d.	Do you need to treat NMI handlers, hardirq handlers,
c.	Do you need to treat NMI handlers, hardirq handlers,
	and code segments with preemption disabled (whether
	via preempt_disable(), local_irq_save(), local_bh_disable(),
	or some other mechanism) as if they were explicit RCU readers?
	If so, RCU-sched is the only choice that will work for you.

e.	Do you need RCU grace periods to complete even in the face
d.	Do you need RCU grace periods to complete even in the face
	of softirq monopolization of one or more of the CPUs?  For
	example, is your code subject to network-based denial-of-service
	attacks?  If so, you need RCU-bh.

f.	Is your workload too update-intensive for normal use of
e.	Is your workload too update-intensive for normal use of
	RCU, but inappropriate for other synchronization mechanisms?
	If so, consider SLAB_DESTROY_BY_RCU.  But please be careful!

g.	Do you need read-side critical sections that are respected
f.	Do you need read-side critical sections that are respected
	even though they are in the middle of the idle loop, during
	user-mode execution, or on an offlined CPU?  If so, SRCU is the
	only choice that will work for you.

h.	Otherwise, use RCU.
g.	Otherwise, use RCU.

Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.
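
The relettered list (a through g) can be read as a first-match decision
procedure.  The following userspace sketch encodes it that way.  Every
identifier in it is hypothetical and exists only for illustration, and
treating the list as strictly ordered is an assumption of the sketch,
not something the list itself promises.

```c
#include <stdbool.h>

/* Hypothetical names: none of these identifiers exist in the kernel. */
enum rcu_choice {
	CHOICE_SRCU,
	CHOICE_RCU_SCHED,
	CHOICE_RCU_BH,
	CHOICE_SLAB_DESTROY_BY_RCU,
	CHOICE_RCU,
};

struct rcu_needs {
	bool readers_block_in_non_rt;		/* items a and b */
	bool irq_and_preempt_off_are_readers;	/* item c */
	bool softirq_monopolization;		/* item d */
	bool too_update_intensive;		/* item e */
	bool readers_in_idle_user_or_offline;	/* item f */
};

/* Walk the list in order and return the first choice that applies. */
enum rcu_choice choose_rcu_flavor(const struct rcu_needs *n)
{
	if (n->readers_block_in_non_rt)
		return CHOICE_SRCU;
	if (n->irq_and_preempt_off_are_readers)
		return CHOICE_RCU_SCHED;
	if (n->softirq_monopolization)
		return CHOICE_RCU_BH;
	if (n->too_update_intensive)
		return CHOICE_SLAB_DESTROY_BY_RCU;	/* "please be careful!" */
	if (n->readers_in_idle_user_or_offline)
		return CHOICE_SRCU;
	return CHOICE_RCU;				/* item g */
}
```

Readers that block only in a -rt kernel leave readers_block_in_non_rt
false, so item b does not select SRCU for them.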
+47 −0
@@ -157,6 +157,53 @@ RCU_SOFTIRQ: Do at least one of the following:
		calls and by forcing both kernel threads and interrupts
		to execute elsewhere.

Name: kworker/%u:%d%s (cpu, id, priority)
Purpose: Execute workqueue requests
To reduce its OS jitter, do any of the following:
1.	Run your workload at a real-time priority, which will allow
	preempting the kworker daemons.
2.	Do any of the following needed to avoid jitter that your
	application cannot tolerate:
	a.	Build your kernel with CONFIG_SLUB=y rather than
		CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
		use of each CPU's workqueues to run its cache_reap()
		function.
	b.	Avoid using oprofile, thus avoiding OS jitter from
		wq_sync_buffer().
	c.	Limit your CPU frequency so that a CPU-frequency
		governor is not required, possibly enlisting the aid of
		special heatsinks or other cooling technologies.  If done
		correctly, and if your CPU architecture permits, you should
		be able to build your kernel with CONFIG_CPU_FREQ=n to
		avoid the CPU-frequency governor periodically running
		on each CPU, including cs_dbs_timer() and od_dbs_timer().
		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	d.	It is not possible to entirely get rid of OS jitter
		from vmstat_update() on CONFIG_SMP=y systems, but you
		can decrease its frequency by writing a large value to
		/proc/sys/vm/stat_interval.  The default value is HZ,
		for an interval of one second.  Of course, larger values
		will make your virtual-memory statistics update more
		slowly.  Of course, you can also run your workload at
		a real-time priority, thus preempting vmstat_update().
	e.	If running on high-end powerpc servers, build with
		CONFIG_PPC_RTAS_DAEMON=n.  This prevents the RTAS
		daemon from running on each CPU every second or so.
		(This will require editing Kconfig files and will defeat
		this platform's RAS functionality.)  This avoids jitter
		due to the rtas_event_scan() function.
		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	f.	If running on Cell Processor, build your kernel with
		CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
		spu_gov_work().
		WARNING:  Please check your CPU specifications to
		make sure that this is safe on your particular system.
	g.	If running on PowerMAC, build your kernel with
		CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
		avoiding OS jitter from rackmeter_do_timer().
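
Item 2d above amounts to writing a number to a file.  The following
userspace sketch shows one way to do so from C.  write_stat_interval()
is a hypothetical helper, and the sketch assumes that the value written
is interpreted in seconds, as the one-second default suggests; please
check the sysctl documentation for your kernel before relying on that.
Writing to /proc/sys/vm/stat_interval normally requires root.

```c
#include <stdio.h>

/*
 * Write an interval to the given file, normally
 * /proc/sys/vm/stat_interval.  The path is a parameter so that the
 * helper can be exercised against an ordinary file.
 * Returns 0 on success and -1 on any open, write, or close failure.
 */
int write_stat_interval(const char *path, unsigned int seconds)
{
	FILE *fp = fopen(path, "w");
	int ret = 0;

	if (!fp)
		return -1;
	if (fprintf(fp, "%u\n", seconds) < 0)
		ret = -1;
	/* A failed write may only be reported when the stream is closed. */
	if (fclose(fp) != 0)
		ret = -1;
	return ret;
}
```

For example, write_stat_interval("/proc/sys/vm/stat_interval", 60)
requests a much less frequent vmstat_update(), at the cost of staler
virtual-memory statistics.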

Name: rcuc/%u
Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
To reduce its OS jitter, do at least one of the following: