Documentation/memory-barriers.txt: Document ACCESS_ONCE() (692118da)

Documentation/memory-barriers.txt

+271 −35

		@@ -231,37 +231,8 @@ And there are a number of things that _must_ or _must_not_ be assumed:
		(*) It _must_not_ be assumed that the compiler will do what you want with
		memory references that are not protected by ACCESS_ONCE(). Without
		ACCESS_ONCE(), the compiler is within its rights to do all sorts
		-of "creative" transformations:

		-(-) Repeat the load, possibly getting a different value on the second
		-and subsequent loads. This is especially prone to happen when
		-register pressure is high.

		-(-) Merge adjacent loads and stores to the same location. The most
		-familiar example is the transformation from:

		-while (a)
		-	do_something();

		-to something like:

		-if (a)
		-	for (;;)
		-		do_something();

		-Using ACCESS_ONCE() as follows prevents this sort of optimization:

		-while (ACCESS_ONCE(a))
		-	do_something();

		-(-) "Store tearing", where a single store in the source code is split
		-into smaller stores in the object code. Note that gcc really
		-will do this on some architectures when storing certain constants.
		-It can be cheaper to do a series of immediate stores than to
		-form the constant in a register and then to store that register.

		-(-) "Load tearing", which splits loads in a manner analogous to
		-store tearing.
		of "creative" transformations, which are covered in the Compiler
		Barrier section.

		(*) It _must_not_ be assumed that independent loads and stores will be issued
		in the order given. This means that for:
		@@ -749,7 +720,8 @@ In summary:

		(*) Control dependencies require that the compiler avoid reordering the
		dependency into nonexistence. Careful use of ACCESS_ONCE() or
		-barrier() can help to preserve your control dependency.
		barrier() can help to preserve your control dependency. Please
		see the Compiler Barrier section for more information.

		(*) Control dependencies do -not- provide transitivity. If you
		need transitivity, use smp_mb().
		@@ -1248,12 +1220,276 @@ compiler from moving the memory accesses either side of it to the other side:
		barrier();

		This is a general barrier -- there are no read-read or write-write variants
		-of barrier(). Howevever, ACCESS_ONCE() can be thought of as a weak form
		of barrier(). However, ACCESS_ONCE() can be thought of as a weak form
		of barrier() that affects only the specific accesses flagged by the
		ACCESS_ONCE().

		-The compiler barrier has no direct effect on the CPU, which may then reorder
		-things however it wishes.
		The barrier() function has the following effects:

		(*) Prevents the compiler from reordering accesses following the
		barrier() to precede any accesses preceding the barrier().
		One example use for this property is to ease communication between
		interrupt-handler code and the code that was interrupted.

		(*) Within a loop, forces the compiler to load the variables used
		in that loop's conditional on each pass through that loop.
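
The second effect can be tried in user space. The following is a minimal sketch, not kernel code: it assumes GCC or Clang, defines barrier() as an empty asm statement with a "memory" clobber, and uses a signal handler as a stand-in for an interrupt handler. The names done, on_signal() and spin_until_done() are hypothetical.

```c
#include <signal.h>

/* Compiler-only barrier for GCC/Clang: emits no instructions, but the
 * "memory" clobber makes the compiler discard values cached in registers. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int done;	/* deliberately a plain, non-volatile variable */

/* Stand-in for an interrupt handler. */
static void on_signal(int sig)
{
	(void)sig;
	done = 1;
}

/* Hypothetical helper: spins until the handler sets 'done' and returns
 * the number of passes through the loop. */
int spin_until_done(void)
{
	int spins = 0;

	done = 0;
	signal(SIGINT, on_signal);
	while (!done) {
		if (++spins == 3)
			raise(SIGINT);	/* handler runs before raise() returns */
		barrier();		/* 'done' is reloaded on the next pass */
	}
	return spins;
}
```

Calling spin_until_done() returns once the handler has run. In this sketch the call to raise() would by itself force most compilers to reload 'done', so barrier() matters most when the loop body contains nothing the compiler cannot see through.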

		The ACCESS_ONCE() macro can prevent any number of optimizations that,
		while perfectly safe in single-threaded code, can be fatal in concurrent
		code. Here are some examples of these sorts of optimizations:

		(*) The compiler is within its rights to merge successive loads from
		the same variable. Such merging can cause the compiler to "optimize"
		the following code:

		while (tmp = a)
			do_something_with(tmp);

		into the following code, which, although in some sense legitimate
		for single-threaded code, is almost certainly not what the developer
		intended:

		if (tmp = a)
			for (;;)
				do_something_with(tmp);

		Use ACCESS_ONCE() to prevent the compiler from doing this to you:

		while (tmp = ACCESS_ONCE(a))
			do_something_with(tmp);

		(*) The compiler is within its rights to reload a variable, for example,
		in cases where high register pressure prevents the compiler from
		keeping all data of interest in registers. The compiler might
		therefore optimize the variable 'tmp' out of our previous example:

		while (tmp = a)
			do_something_with(tmp);

		This could result in the following code, which is perfectly safe in
		single-threaded code, but can be fatal in concurrent code:

		while (a)
			do_something_with(a);

		For example, the optimized version of this code could result in
		passing a zero to do_something_with() in the case where the variable
		'a' was modified by some other CPU between the "while" statement and
		the call to do_something_with().

		Again, use ACCESS_ONCE() to prevent the compiler from doing this:

		while (tmp = ACCESS_ONCE(a))
			do_something_with(tmp);

		Note that if the compiler runs short of registers, it might save
		tmp onto the stack. The overhead of this saving and later restoring
		is why compilers reload variables. Doing so is perfectly safe for
		single-threaded code, so you need to tell the compiler about cases
		where it is not safe.

		(*) The compiler is within its rights to omit a load entirely if it knows
		what the value will be. For example, if the compiler can prove that
		the value of variable 'a' is always zero, it can optimize this code:

		while (tmp = a)
			do_something_with(tmp);

		into this:

		do { } while (0);

		This transformation is a win for single-threaded code because it gets
		rid of a load and a branch. The problem is that the compiler will
		carry out its proof assuming that the current CPU is the only one
		updating variable 'a'. If variable 'a' is shared, then the compiler's
		proof will be erroneous. Use ACCESS_ONCE() to tell the compiler
		that it doesn't know as much as it thinks it does:

		while (tmp = ACCESS_ONCE(a))
			do_something_with(tmp);

		But please note that the compiler is also closely watching what you
		do with the value after the ACCESS_ONCE(). For example, suppose you
		do the following and MAX is a preprocessor macro with the value 1:

		while ((tmp = ACCESS_ONCE(a)) % MAX)
			do_something_with(tmp);

		Then the compiler knows that the result of the "%" operator applied
		to MAX will always be zero, again allowing the compiler to optimize
		the code into near-nonexistence. (It will still load from the
		variable 'a'.)

		(*) Similarly, the compiler is within its rights to omit a store entirely
		if it knows that the variable already has the value being stored.
		Again, the compiler assumes that the current CPU is the only one
		storing into the variable, which can cause the compiler to do the
		wrong thing for shared variables. For example, suppose you have
		the following:

		a = 0;
		/* Code that does not store to variable a. */
		a = 0;

		The compiler sees that the value of variable 'a' is already zero, so
		it might well omit the second store. This would come as a fatal
		surprise if some other CPU might have stored to variable 'a' in the
		meantime.

		Use ACCESS_ONCE() to prevent the compiler from making this sort of
		wrong guess:

		ACCESS_ONCE(a) = 0;
		/* Code that does not store to variable a. */
		ACCESS_ONCE(a) = 0;

		(*) The compiler is within its rights to reorder memory accesses unless
		you tell it not to. For example, consider the following interaction
		between process-level code and an interrupt handler:

		void process_level(void)
		{
			msg = get_message();
			flag = true;
		}

		void interrupt_handler(void)
		{
			if (flag)
				process_message(msg);
		}

		There is nothing to prevent the compiler from transforming
		process_level() to the following. In fact, this might well be a
		win for single-threaded code:

		void process_level(void)
		{
			flag = true;
			msg = get_message();
		}

		If the interrupt occurs between these two statements, then
		interrupt_handler() might be passed a garbled msg. Use ACCESS_ONCE()
		to prevent this as follows:

		void process_level(void)
		{
			ACCESS_ONCE(msg) = get_message();
			ACCESS_ONCE(flag) = true;
		}

		void interrupt_handler(void)
		{
			if (ACCESS_ONCE(flag))
				process_message(ACCESS_ONCE(msg));
		}

		Note that the ACCESS_ONCE() wrappers in interrupt_handler()
		are needed if this interrupt handler can itself be interrupted
		by something that also accesses 'flag' and 'msg', for example,
		a nested interrupt or an NMI. Otherwise, ACCESS_ONCE() is not
		needed in interrupt_handler() other than for documentation purposes.
		(Note also that nested interrupts do not typically occur in modern
		Linux kernels; in fact, if an interrupt handler returns with
		interrupts enabled, you will get a WARN_ONCE() splat.)

		You should assume that the compiler can move ACCESS_ONCE() past
		code not containing ACCESS_ONCE(), barrier(), or similar primitives.

		This effect could also be achieved using barrier(), but ACCESS_ONCE()
		is more selective: With ACCESS_ONCE(), the compiler need only forget
		the contents of the indicated memory locations, while with barrier()
		the compiler must discard the value of all memory locations that
		it has currently cached in any machine registers. Of course,
		the compiler must also respect the order in which the ACCESS_ONCE()s
		occur, though the CPU need not do so.

		(*) The compiler is within its rights to invent stores to a variable,
		as in the following example:

		if (a)
			b = a;
		else
			b = 42;

		The compiler might save a branch by optimizing this as follows:

		b = 42;
		if (a)
			b = a;

		In single-threaded code, this is not only safe, but also saves
		a branch. Unfortunately, in concurrent code, this optimization
		could cause some other CPU to see a spurious value of 42 -- even
		if variable 'a' was never zero -- when loading variable 'b'.
		Use ACCESS_ONCE() to prevent this as follows:

		if (a)
			ACCESS_ONCE(b) = a;
		else
			ACCESS_ONCE(b) = 42;

		The compiler can also invent loads. These are usually less
		damaging, but they can result in cache-line bouncing and thus in
		poor performance and scalability. Use ACCESS_ONCE() to prevent
		invented loads.

		(*) For aligned memory locations whose size allows them to be accessed
		with a single memory-reference instruction, ACCESS_ONCE() prevents
		"load tearing" and "store tearing," in which a single large access
		is replaced by multiple smaller accesses. For example, given an
		architecture having
		16-bit store instructions with 7-bit immediate fields, the compiler
		might be tempted to use two 16-bit store-immediate instructions to
		implement the following 32-bit store:

		p = 0x00010002;

		Please note that GCC really does use this sort of optimization,
		which is not surprising given that it would likely take more
		than two instructions to build the constant and then store it.
		This optimization can therefore be a win in single-threaded code.
		In fact, a recent bug (since fixed) caused GCC to incorrectly use
		this optimization in a volatile store. In the absence of such bugs,
		use of ACCESS_ONCE() prevents store tearing in the following example:

		ACCESS_ONCE(p) = 0x00010002;

		Use of packed structures can also result in load and store tearing,
		as in this example:

		struct __attribute__((__packed__)) foo {
			short a;
			int b;
			short c;
		};
		struct foo foo1, foo2;
		...

		foo2.a = foo1.a;
		foo2.b = foo1.b;
		foo2.c = foo1.c;

		Because there are no ACCESS_ONCE() wrappers and no volatile markings,
		the compiler would be well within its rights to implement these three
		assignment statements as a pair of 32-bit loads followed by a pair
		of 32-bit stores. This would result in load tearing on 'foo1.b'
		and store tearing on 'foo2.b'. ACCESS_ONCE() again prevents tearing
		in this example:

		foo2.a = foo1.a;
		ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b);
		foo2.c = foo1.c;
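
The process_level()/interrupt_handler() item in the list above can be approximated in user space. The following sketch makes several assumptions that are not part of the original text: a signal handler stands in for the interrupt handler, get_message() and process_message() are trivial stand-ins, ACCESS_ONCE() is defined locally as a volatile cast using the GCC/Clang __typeof__ extension, and 'handled' and run_demo() are hypothetical names added for illustration.

```c
#include <signal.h>
#include <stdbool.h>

/* Local definition for this sketch: a volatile cast on the argument. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int msg;
static bool flag;
static int handled = -1;	/* hypothetical: what the handler processed */

/* Stand-ins for the functions named in the text. */
static int get_message(void) { return 42; }
static void process_message(int m) { handled = m; }

/* Plays the role of the interrupt handler. */
static void interrupt_handler(int sig)
{
	(void)sig;
	if (ACCESS_ONCE(flag))
		process_message(ACCESS_ONCE(msg));
}

/* The two volatile stores may not be reordered by the compiler. */
static void process_level(void)
{
	ACCESS_ONCE(msg) = get_message();
	ACCESS_ONCE(flag) = true;
}

/* Hypothetical driver: publish a message, then "interrupt" ourselves. */
int run_demo(void)
{
	signal(SIGINT, interrupt_handler);
	process_level();
	raise(SIGINT);		/* handler runs before raise() returns */
	return handled;
}
```

Because the signal is delivered on the same thread, only compiler ordering is at stake here, which matches the interrupt scenario in the text. Communication between CPUs would additionally need CPU memory barriers.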

		All that aside, it is never necessary to use ACCESS_ONCE() on a variable
		that has been marked volatile. For example, because 'jiffies' is marked
		volatile, it is never necessary to say ACCESS_ONCE(jiffies). The reason
		for this is that ACCESS_ONCE() is implemented as a volatile cast, which
		has no effect when its argument is already marked volatile.
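
As an illustration of the volatile-cast form described above, here is a minimal sketch. It assumes GCC or Clang for the __typeof__ extension. The variables 'a' and 'ticks' (a stand-in for something like 'jiffies') and the helper functions are hypothetical.

```c
/* Volatile cast: the access goes through a volatile-qualified lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int a;			/* ordinary shared variable */
static volatile int ticks;	/* already volatile, like 'jiffies' */

/* Emitted as written: the load is not merged, repeated, or omitted. */
int load_a_once(void)
{
	return ACCESS_ONCE(a);
}

/* Emitted as written: the store is not dropped or invented. */
void store_a_once(int v)
{
	ACCESS_ONCE(a) = v;
}

void set_ticks(int v)
{
	ticks = v;
}

/* A plain access suffices here: wrapping 'ticks' in ACCESS_ONCE() would
 * only cast an already-volatile object to the same volatile type. */
int load_ticks(void)
{
	return ticks;
}
```

The cast affects only the single access it wraps, so other accesses to 'a' elsewhere in the program remain ordinary unless they are wrapped as well.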

		Please note that these compiler barriers have no direct effect on the CPU,
		which may then reorder things however it wishes.


		CPU MEMORY BARRIERS