Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (d6dd50e0)

Documentation/RCU/stallwarn.txt

+24 −9

		@@ -56,8 +56,20 @@ RCU_STALL_RAT_DELAY
		two jiffies. (This is a cpp macro, not a kernel configuration
		parameter.)

		When a CPU detects that it is stalling, it will print a message similar
		to the following:
		rcupdate.rcu_task_stall_timeout

		This boot/sysfs parameter controls the RCU-tasks stall warning
		interval. A value of zero or less suppresses RCU-tasks stall
		warnings. A positive value sets the stall-warning interval
		in jiffies. An RCU-tasks stall warning starts with the line:

		INFO: rcu_tasks detected stalls on tasks:

		And continues with the output of sched_show_task() for each
		task stalling the current RCU-tasks grace period.
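The sign convention described above can be sketched in userspace C. This is only an illustration: both helper names are hypothetical, and the tick rate is passed in as a parameter because the real value depends on the kernel's CONFIG_HZ setting.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not kernel code): a value of zero or less
 * suppresses RCU-tasks stall warnings, a positive value enables them.
 */
static bool rcu_task_stall_warnings_enabled(long timeout_jiffies)
{
	return timeout_jiffies > 0;
}

/*
 * Hypothetical helper: convert the stall-warning interval from jiffies
 * to milliseconds. Returns -1 when warnings are suppressed.
 * Assumes hz > 0; hz stands in for the kernel's tick rate.
 */
static long rcu_task_stall_interval_ms(long timeout_jiffies, long hz)
{
	if (!rcu_task_stall_warnings_enabled(timeout_jiffies))
		return -1;
	return timeout_jiffies * 1000 / hz;
}
```

For example, under the assumption of a 250 Hz tick, a setting of 2500 jiffies corresponds to a ten-second interval.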

		For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
		it will print a message similar to the following:

		INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies)

		@@ -174,8 +186,12 @@ o A CPU looping with preemption disabled. This condition can
		o A CPU looping with bottom halves disabled. This condition can
		result in RCU-sched and RCU-bh stalls.

		o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
		without invoking schedule().
		o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the
		kernel without invoking schedule(). Note that cond_resched()
		does not necessarily prevent RCU CPU stall warnings. Therefore,
		if the looping in the kernel is really expected and desirable
		behavior, you might need to replace some of the cond_resched()
		calls with calls to cond_resched_rcu_qs().

		o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
		happen to preempt a low-priority task in the middle of an RCU
		@@ -208,11 +224,10 @@ o A hardware failure. This is quite unlikely, but has occurred
		This resulted in a series of RCU CPU stall warnings, eventually
		leading to the realization that the CPU had failed.

		The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning.
		SRCU does not have its own CPU stall warnings, but its calls to
		synchronize_sched() will result in RCU-sched detecting RCU-sched-related
		CPU stalls. Please note that RCU only detects CPU stalls when there is
		a grace period in progress. No grace period, no CPU stall warnings.
		The RCU, RCU-sched, RCU-bh, and RCU-tasks implementations have CPU stall
		warnings. Note that SRCU does -not- have CPU stall warnings. Please note
		that RCU only detects CPU stalls when there is a grace period in progress.
		No grace period, no CPU stall warnings.

		To diagnose the cause of the stall, inspect the stack traces.
		The offending function will usually be near the top of the stack.

Documentation/kernel-parameters.txt

+67 −1

		@@ -1723,6 +1723,49 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
		lockd.nlm_udpport=M [NFS] Assign UDP port.
		Format: <integer>

		locktorture.nreaders_stress= [KNL]
		Set the number of locking read-acquisition kthreads.
		Defaults to being automatically set based on the
		number of online CPUs.

		locktorture.nwriters_stress= [KNL]
		Set the number of locking write-acquisition kthreads.

		locktorture.onoff_holdoff= [KNL]
		Set time (s) after boot for CPU-hotplug testing.

		locktorture.onoff_interval= [KNL]
		Set time (s) between CPU-hotplug operations, or
		zero to disable CPU-hotplug testing.

		locktorture.shuffle_interval= [KNL]
		Set task-shuffle interval (jiffies). Shuffling
		tasks allows some CPUs to go into dyntick-idle
		mode during the locktorture test.

		locktorture.shutdown_secs= [KNL]
		Set time (s) after boot for system shutdown. This
		is useful for hands-off automated testing.

		locktorture.stat_interval= [KNL]
		Time (s) between statistics printk()s.

		locktorture.stutter= [KNL]
		Time (s) to stutter testing, for example,
		specifying five seconds causes the test to run for
		five seconds, wait for five seconds, and so on.
		This tests the locking primitive's ability to
		transition abruptly to and from idle.

		locktorture.torture_runnable= [BOOT]
		Start locktorture running at boot time.

		locktorture.torture_type= [KNL]
		Specify the locking implementation to test.

		locktorture.verbose= [KNL]
		Enable additional printk() statements.

		logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver
		Format: <irq>

		@@ -2900,6 +2943,24 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
		Lazy RCU callbacks are those which RCU can
		prove do nothing more than free memory.

		rcutorture.cbflood_inter_holdoff= [KNL]
		Set holdoff time (jiffies) between successive
		callback-flood tests.

		rcutorture.cbflood_intra_holdoff= [KNL]
		Set holdoff time (jiffies) between successive
		bursts of callbacks within a given callback-flood
		test.

		rcutorture.cbflood_n_burst= [KNL]
		Set the number of bursts making up a given
		callback-flood test. Set this to zero to
		disable callback-flood testing.

		rcutorture.cbflood_n_per_burst= [KNL]
		Set the number of callbacks to be registered
		in a given burst of a callback-flood test.

		rcutorture.fqs_duration= [KNL]
		Set duration of force_quiescent_state bursts.

		@@ -2939,7 +3000,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
		Set time (s) between CPU-hotplug operations, or
		zero to disable CPU-hotplug testing.

		rcutorture.rcutorture_runnable= [BOOT]
		rcutorture.torture_runnable= [BOOT]
		Start rcutorture running at boot time.

		rcutorture.shuffle_interval= [KNL]
		@@ -3001,6 +3062,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
		rcupdate.rcu_cpu_stall_timeout= [KNL]
		Set timeout for RCU CPU stall warning messages.

		rcupdate.rcu_task_stall_timeout= [KNL]
		Set timeout in jiffies for RCU task stall warning
		messages. Disable with a value less than or equal
		to zero.

		rdinit= [KNL]
		Format: <full_path>
		Run specified binary instead of /init from the ramdisk,

Documentation/locking/locktorture.txt

0 → 100644

+147 −0

		Kernel Lock Torture Test Operation

		CONFIG_LOCK_TORTURE_TEST

		The CONFIG_LOCK_TORTURE_TEST config option provides a kernel module
		that runs torture tests on core kernel locking primitives. The kernel
		module, 'locktorture', may be built after the fact on the running
		kernel to be tested, if desired. The tests periodically output status
		messages via printk(), which can be examined via the dmesg command (perhaps
		grepping for "torture"). The test is started when the module is loaded,
		and stops when the module is unloaded. This program is based on how RCU
		is tortured, via rcutorture.

		This torture test consists of creating a number of kernel threads which
		acquire the lock and hold it for a specific amount of time, thus simulating
		different critical region behaviors. The amount of contention on the lock
		can be varied by enlarging this critical region hold time, by creating
		more kthreads, or both.


		MODULE PARAMETERS

		This module has the following parameters:


		Locktorture-specific

		nwriters_stress Number of kernel threads that will stress exclusive lock
		ownership (writers). The default value is twice the number
		of online CPUs.

		nreaders_stress Number of kernel threads that will stress shared lock
		ownership (readers). The default is the same as the number of
		writer threads. If the user did not specify nwriters_stress, then
		both readers and writers will be the number of online CPUs.

		torture_type Type of lock to torture. By default, only spinlocks will
		be tortured. This module can torture the following locks,
		with string values as follows:

		o "lock_busted": Simulates a buggy lock implementation.

		o "spin_lock": spin_lock() and spin_unlock() pairs.

		o "spin_lock_irq": spin_lock_irq() and spin_unlock_irq()
		pairs.

		o "rw_lock": read/write lock() and unlock() rwlock pairs.

		o "rw_lock_irq": read/write lock_irq() and unlock_irq()
		rwlock pairs.

		o "mutex_lock": mutex_lock() and mutex_unlock() pairs.

		o "rwsem_lock": read/write down() and up() semaphore pairs.

		torture_runnable Start locktorture at boot time in the case where the
		module is built into the kernel, otherwise wait for
		torture_runnable to be set via sysfs before starting.
		By default it will begin once the module is loaded.


		Torture-framework (RCU + locking)

		shutdown_secs The number of seconds to run the test before terminating
		the test and powering off the system. The default is
		zero, which disables test termination and system shutdown.
		This capability is useful for automated testing.

		onoff_interval The number of seconds between each attempt to execute a
		randomly selected CPU-hotplug operation. Defaults
		to zero, which disables CPU hotplugging. In
		CONFIG_HOTPLUG_CPU=n kernels, locktorture will silently
		refuse to do any CPU-hotplug operations regardless of
		what value is specified for onoff_interval.

		onoff_holdoff The number of seconds to wait until starting CPU-hotplug
		operations. This would normally only be used when
		locktorture was built into the kernel and started
		automatically at boot time, in which case it is useful
		in order to avoid confusing boot-time code with CPUs
		coming and going. This parameter is only useful if
		CONFIG_HOTPLUG_CPU is enabled.

		stat_interval Number of seconds between statistics-related printk()s.
		By default, locktorture will report stats every 60 seconds.
		Setting the interval to zero causes the statistics to
		be printed -only- when the module is unloaded.

		stutter The length of time to run the test before pausing for this
		same period of time. Defaults to "stutter=5", so as
		to run and pause for (roughly) five-second intervals.
		Specifying "stutter=0" causes the test to run continuously
		without pausing, which is the old default behavior.
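The run/pause cadence described for "stutter" can be modeled with a few lines of userspace C. This is a simplified sketch under the assumption that the test alternates between equal run and pause windows starting with a run window; the function name is hypothetical and this is not how the torture framework itself is implemented.

```c
#include <stdbool.h>

/*
 * Simplified model of the stutter parameter (not kernel code).
 * elapsed_s: seconds since the test started.
 * stutter_s: value of the stutter parameter, in seconds.
 * Returns true while the test would be running, false while paused.
 */
static bool stutter_test_is_running(unsigned long elapsed_s,
				    unsigned long stutter_s)
{
	if (stutter_s == 0)
		return true;	/* stutter=0: run continuously, never pause */

	/* Even-numbered windows run, odd-numbered windows pause. */
	return (elapsed_s / stutter_s) % 2 == 0;
}
```

With the default "stutter=5", this model runs during seconds 0 through 4, pauses during seconds 5 through 9, and so on.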

		shuffle_interval The number of seconds to keep the test threads affinitied
		to a particular subset of the CPUs, defaults to 3 seconds.
		Used in conjunction with test_no_idle_hz.

		verbose Enable verbose debugging printing, via printk(). Enabled
		by default. This extra information is mostly related to
		high-level errors and reports from the main 'torture'
		framework.


		STATISTICS

		Statistics are printed in the following format:

		spin_lock-torture: Writes: Total: 93746064 Max/Min: 0/0 Fail: 0
		(A)                (B)     (C)             (D)          (E)

		(A): Lock type that is being tortured -- torture_type parameter.

		(B): Number of writer lock acquisitions. If dealing with a read/write primitive
		a second "Reads" statistics line is printed.

		(C): Number of times the lock was acquired.

		(D): Min and max number of times threads failed to acquire the lock.

		(E): true/false values if there were errors acquiring the lock. This should
		-only- be positive if there is a bug in the locking primitive's
		implementation. Otherwise a lock should never fail (i.e., spin_lock()).
		Of course, the same applies for (C), above. A dummy example of this is
		the "lock_busted" type.
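A script that checks these fields automatically might pull them apart along the following lines. This is a minimal userspace sketch, assuming the line has exactly the layout shown above with any dmesg timestamp prefix already stripped; the structure and function names are made up for the example.

```c
#include <stdio.h>

/* Fields of one statistics line, labeled as in the (A)-(E) key. */
struct torture_stats {
	char tag[32];		/* (A) e.g. "spin_lock-torture" */
	char kind[16];		/* (B) "Writes" or "Reads" */
	long long total;	/* (C) value after "Total:" */
	long long max;		/* (D) first value after "Max/Min:" */
	long long min;		/* (D) second value after "Max/Min:" */
	int fail;		/* (E) value after "Fail:" */
};

/*
 * Hypothetical helper: parse one statistics line into *st.
 * Returns 0 on success, -1 if the line does not match the layout.
 */
static int parse_torture_stats(const char *line, struct torture_stats *st)
{
	int n = sscanf(line,
		       "%31[^:]: %15[^:]: Total: %lld Max/Min: %lld/%lld Fail: %d",
		       st->tag, st->kind, &st->total,
		       &st->max, &st->min, &st->fail);

	return n == 6 ? 0 : -1;
}
```

A caller could then flag any line whose fail field is nonzero instead of inspecting the output by hand.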

		USAGE

		The following script may be used to torture locks:

		#!/bin/sh

		modprobe locktorture
		sleep 3600
		rmmod locktorture
		dmesg | grep torture:

		The output can be manually inspected for the error flag of "!!!".
		One could of course create a more elaborate script that automatically
		checked for such errors. The "rmmod" command forces a "SUCCESS",
		"FAILURE", or "RCU_HOTPLUG" indication to be printk()ed. The first
		two are self-explanatory, while the last indicates that while there
		were no locking failures, CPU-hotplug problems were detected.

		Also see: Documentation/RCU/torture.txt

Documentation/memory-barriers.txt

+66 −62

		@@ -574,30 +574,14 @@ However, stores are not speculated. This means that ordering -is- provided
		in the following example:

		q = ACCESS_ONCE(a);
		if (ACCESS_ONCE(q)) {
		ACCESS_ONCE(b) = p;
		}

		Please note that ACCESS_ONCE() is not optional! Without the ACCESS_ONCE(),
		the compiler is within its rights to transform this example:

		q = a;
		if (q) {
		b = p; /* BUG: Compiler can reorder!!! */
		do_something();
		} else {
		b = p; /* BUG: Compiler can reorder!!! */
		do_something_else();
		ACCESS_ONCE(b) = p;
		}

		into this, which of course defeats the ordering:

		b = p;
		q = a;
		if (q)
		do_something();
		else
		do_something_else();
		Please note that ACCESS_ONCE() is not optional! Without the
		ACCESS_ONCE(), the compiler might combine the load from 'a' with other
		loads from 'a', and the store to 'b' with other stores to 'b', with
		possibly highly counterintuitive effects on ordering.

		Worse yet, if the compiler is able to prove (say) that the value of
		variable 'a' is always non-zero, it would be well within its rights
		@@ -605,11 +589,12 @@ to optimize the original example by eliminating the "if" statement
		as follows:

		q = a;
		b = p; /* BUG: Compiler can reorder!!! */
		do_something();
		b = p; /* BUG: Compiler and CPU can both reorder!!! */

		The solution is again ACCESS_ONCE() and barrier(), which preserves the
		ordering between the load from variable 'a' and the store to variable 'b':
		So don't leave out the ACCESS_ONCE().

		It is tempting to try to enforce ordering on identical stores on both
		branches of the "if" statement as follows:

		q = ACCESS_ONCE(a);
		if (q) {
		@@ -622,18 +607,11 @@ ordering between the load from variable 'a' and the store to variable 'b':
		do_something_else();
		}

		The initial ACCESS_ONCE() is required to prevent the compiler from
		proving the value of 'a', and the pair of barrier() invocations are
		required to prevent the compiler from pulling the two identical stores
		to 'b' out from the legs of the "if" statement.

		It is important to note that control dependencies absolutely require a
		conditional. For example, the following "optimized" version of
		the above example breaks ordering, which is why the barrier() invocations
		are absolutely required if you have identical stores in both legs of
		the "if" statement:
		Unfortunately, current compilers will transform this as follows at high
		optimization levels:

		q = ACCESS_ONCE(a);
		barrier();
		ACCESS_ONCE(b) = p; /* BUG: No ordering vs. load from a!!! */
		if (q) {
		/* ACCESS_ONCE(b) = p; -- moved up, BUG!!! */
		@@ -643,21 +621,36 @@ the "if" statement:
		do_something_else();
		}

		It is of course legal for the prior load to be part of the conditional,
		for example, as follows:
		Now there is no conditional between the load from 'a' and the store to
		'b', which means that the CPU is within its rights to reorder them:
		The conditional is absolutely required, and must be present in the
		assembly code even after all compiler optimizations have been applied.
		Therefore, if you need ordering in this example, you need explicit
		memory barriers, for example, smp_store_release():

		if (ACCESS_ONCE(a) > 0) {
		barrier();
		ACCESS_ONCE(b) = q / 2;
		q = ACCESS_ONCE(a);
		if (q) {
		smp_store_release(&b, p);
		do_something();
		} else {
		barrier();
		ACCESS_ONCE(b) = q / 3;
		smp_store_release(&b, p);
		do_something_else();
		}

		This will again ensure that the load from variable 'a' is ordered before the
		stores to variable 'b'.
		In contrast, without explicit memory barriers, two-legged-if control
		ordering is guaranteed only when the stores differ, for example:

		q = ACCESS_ONCE(a);
		if (q) {
		ACCESS_ONCE(b) = p;
		do_something();
		} else {
		ACCESS_ONCE(b) = r;
		do_something_else();
		}

		The initial ACCESS_ONCE() is still required to prevent the compiler from
		proving the value of 'a'.
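For readers who want to experiment outside the kernel, the smp_store_release() form of the two-legged "if" can be approximated with C11 atomics. This is a userspace sketch only: the relaxed load stands in for ACCESS_ONCE(), the release store stands in for smp_store_release(), and the function name and return values are invented for the example. C11 does not promise ordering from a control dependency alone, which is why the sketch relies on the release store.

```c
#include <stdatomic.h>

/*
 * Userspace analog (not kernel code) of:
 *
 *	q = ACCESS_ONCE(a);
 *	if (q) { smp_store_release(&b, p); do_something(); }
 *	else   { smp_store_release(&b, p); do_something_else(); }
 *
 * Returns 1 if the "do_something()" leg was taken, 0 otherwise.
 */
static int store_after_load(atomic_int *a, atomic_int *b, int p)
{
	/* Relaxed load: plays the role of ACCESS_ONCE(a). */
	int q = atomic_load_explicit(a, memory_order_relaxed);

	if (q) {
		/* Release store: the prior load is ordered before it. */
		atomic_store_explicit(b, p, memory_order_release);
		return 1;
	}
	/* Identical store on the other leg, still a release store. */
	atomic_store_explicit(b, p, memory_order_release);
	return 0;
}
```

Even if a compiler merges the two identical stores, the merged store keeps its release semantics, so the load from 'a' remains ordered before the store to 'b'.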

		In addition, you need to be careful what you do with the local variable 'q',
		otherwise the compiler might be able to guess the value and again remove
		@@ -665,12 +658,10 @@ the needed conditional. For example:

		q = ACCESS_ONCE(a);
		if (q % MAX) {
		barrier();
		ACCESS_ONCE(b) = p;
		do_something();
		} else {
		barrier();
		ACCESS_ONCE(b) = p;
		ACCESS_ONCE(b) = r;
		do_something_else();
		}

		@@ -682,9 +673,12 @@ transform the above code into the following:
		ACCESS_ONCE(b) = p;
		do_something_else();

		This transformation loses the ordering between the load from variable 'a'
		and the store to variable 'b'. If you are relying on this ordering, you
		should do something like the following:
		Given this transformation, the CPU is not required to respect the ordering
		between the load from variable 'a' and the store to variable 'b'. It is
		tempting to add a barrier(), but this does not help. The conditional
		is gone, and the barrier won't bring it back. Therefore, if you are
		relying on this ordering, you should make sure that MAX is greater than
		one, perhaps as follows:

		q = ACCESS_ONCE(a);
		BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
		@@ -692,35 +686,45 @@ should do something like the following:
		ACCESS_ONCE(b) = p;
		do_something();
		} else {
		ACCESS_ONCE(b) = p;
		ACCESS_ONCE(b) = r;
		do_something_else();
		}

		Please note once again that the stores to 'b' differ. If they were
		identical, as noted earlier, the compiler could pull this store outside
		of the 'if' statement.

		Finally, control dependencies do -not- provide transitivity. This is
		demonstrated by two related examples:
		demonstrated by two related examples, with the initial values of
		x and y both being zero:

		CPU 0                     CPU 1
		=====================     =====================
		r1 = ACCESS_ONCE(x);      r2 = ACCESS_ONCE(y);
		if (r1 >= 0)              if (r2 >= 0)
		if (r1 > 0)               if (r2 > 0)
		  ACCESS_ONCE(y) = 1;       ACCESS_ONCE(x) = 1;

		assert(!(r1 == 1 && r2 == 1));

		The above two-CPU example will never trigger the assert(). However,
		if control dependencies guaranteed transitivity (which they do not),
		then adding the following two CPUs would guarantee a related assertion:
		then adding the following CPU would guarantee a related assertion:

		CPU 2                     CPU 3
		=====================     =====================
		ACCESS_ONCE(x) = 2;       ACCESS_ONCE(y) = 2;
		CPU 2
		=====================
		ACCESS_ONCE(x) = 2;

		assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */

		assert(!(r1 == 2 && r2 == 2 && x == 1 && y == 1)); /* FAILS!!! */
		But because control dependencies do -not- provide transitivity, the above
		assertion can fail after the combined three-CPU example completes. If you
		need the three-CPU example to provide ordering, you will need smp_mb()
		between the loads and stores in the CPU 0 and CPU 1 code fragments,
		that is, just before or just after the "if" statements.

		But because control dependencies do -not- provide transitivity, the
		above assertion can fail after the combined four-CPU example completes.
		If you need the four-CPU example to provide ordering, you will need
		smp_mb() between the loads and stores in the CPU 0 and CPU 1 code fragments.
		These two examples are the LB and WWC litmus tests from this paper:
		http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
		site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
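The CPU 0 and CPU 1 fragments, with a full barrier placed just before each "if" as suggested above, can be tried in userspace roughly as follows. This is a sketch under stated assumptions: POSIX threads stand in for CPUs, relaxed C11 atomics stand in for ACCESS_ONCE(), atomic_thread_fence(memory_order_seq_cst) is only an approximation of smp_mb(), and all function names are invented for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;		/* shared variables, initially zero */
static int r1, r2;		/* per-"CPU" results, read after join */

static void *cpu0(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	if (r1 > 0)
		atomic_store_explicit(&y, 1, memory_order_relaxed);
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stands in for smp_mb() */
	if (r2 > 0)
		atomic_store_explicit(&x, 1, memory_order_relaxed);
	return NULL;
}

/*
 * Run both fragments once. Returns 1 if the outcome r1 == 1 && r2 == 1
 * was observed, 0 if not, and -1 if a thread could not be created.
 */
static int lb_forbidden_outcome_seen(void)
{
	pthread_t t0, t1;

	atomic_store(&x, 0);
	atomic_store(&y, 0);
	if (pthread_create(&t0, NULL, cpu0, NULL))
		return -1;
	if (pthread_create(&t1, NULL, cpu1, NULL)) {
		pthread_join(t0, NULL);
		return -1;
	}
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return r1 == 1 && r2 == 1;
}
```

Depending on the toolchain, building this may require the -pthread option. A harness would call lb_forbidden_outcome_seen() many times and report any run that returns 1.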

		In summary:

fs/file.c

+1 −1

		@@ -367,7 +367,7 @@ static struct fdtable *close_files(struct files_struct * files)
		struct file * file = xchg(&fdt->fd[i], NULL);
		if (file) {
		filp_close(file, files);
		cond_resched();
		cond_resched_rcu_qs();
		}
		}
		i++;