Documentation/RCU/Design/Requirements/Requirements.html +72 −1

@@ -2129,6 +2129,8 @@ Some of the relevant points of interest are as follows:
 <li> <a href="#Hotplug CPU">Hotplug CPU</a>.
 <li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
 <li> <a href="#Tracing and RCU">Tracing and RCU</a>.
+<li> <a href="#Accesses to User Memory and RCU">
+	Accesses to User Memory and RCU</a>.
 <li> <a href="#Energy Efficiency">Energy Efficiency</a>.
 <li> <a href="#Scheduling-Clock Interrupts and RCU">
 	Scheduling-Clock Interrupts and RCU</a>.
@@ -2512,7 +2514,7 @@ disabled across the entire RCU read-side critical section.
 <p>
 It is possible to use tracing on RCU code, but tracing itself
 uses RCU.
-For this reason, <tt>rcu_dereference_raw_notrace()</tt>
+For this reason, <tt>rcu_dereference_raw_check()</tt>
 is provided for use by tracing, which avoids the destructive
 recursion that could otherwise ensue.
 This API is also used by virtualization in some architectures,
@@ -2521,6 +2523,75 @@ cannot be used.
 The tracing folks both located the requirement and provided the
 needed fix, so this surprise requirement was relatively painless.
 
+<h3><a name="Accesses to User Memory and RCU">
+Accesses to User Memory and RCU</a></h3>
+
+<p>
+The kernel needs to access user-space memory, for example, to access
+data referenced by system-call parameters.
+The <tt>get_user()</tt> macro does this job.
+
+<p>
+However, user-space memory might well be paged out, which means
+that <tt>get_user()</tt> might well page-fault and thus block while
+waiting for the resulting I/O to complete.
+It would be a very bad thing for the compiler to reorder
+a <tt>get_user()</tt> invocation into an RCU read-side critical
+section.
+For example, suppose that the source code looked like this:
+
+<blockquote>
+<pre>
+ 1 rcu_read_lock();
+ 2 p = rcu_dereference(gp);
+ 3 v = p->value;
+ 4 rcu_read_unlock();
+ 5 get_user(user_v, user_p);
+ 6 do_something_with(v, user_v);
+</pre>
+</blockquote>
+
+<p>
+The compiler must not be permitted to transform this source code into
+the following:
+
+<blockquote>
+<pre>
+ 1 rcu_read_lock();
+ 2 p = rcu_dereference(gp);
+ 3 get_user(user_v, user_p); // BUG: POSSIBLE PAGE FAULT!!!
+ 4 v = p->value;
+ 5 rcu_read_unlock();
+ 6 do_something_with(v, user_v);
+</pre>
+</blockquote>
+
+<p>
+If the compiler did make this transformation in a
+<tt>CONFIG_PREEMPT=n</tt> kernel build, and if <tt>get_user()</tt> did
+page fault, the result would be a quiescent state in the middle
+of an RCU read-side critical section.
+This misplaced quiescent state could result in line 4 being
+a use-after-free access, which could be bad for your kernel's
+actuarial statistics.
+Similar examples can be constructed with the call to <tt>get_user()</tt>
+preceding the <tt>rcu_read_lock()</tt>.
+
+<p>
+Unfortunately, <tt>get_user()</tt> doesn't have any particular
+ordering properties, and in some architectures the underlying <tt>asm</tt>
+isn't even marked <tt>volatile</tt>.
+And even if it was marked <tt>volatile</tt>, the above access to
+<tt>p->value</tt> is not volatile, so the compiler would not have any
+reason to keep those two accesses in order.
+
+<p>
+Therefore, the Linux-kernel definitions of <tt>rcu_read_lock()</tt>
+and <tt>rcu_read_unlock()</tt> must act as compiler barriers,
+at least for outermost instances of <tt>rcu_read_lock()</tt> and
+<tt>rcu_read_unlock()</tt> within a nested set of RCU read-side critical
+sections.
+
 <h3><a name="Energy Efficiency">Energy Efficiency</a></h3>
 
 <p>
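The requirement added to Requirements.html above, that rcu_read_lock() and rcu_read_unlock() act as compiler barriers, can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel's implementation: it assumes GCC or Clang (for the empty `asm volatile` with a `"memory"` clobber, the usual idiom behind a compiler-only barrier), it models a non-preemptible build in which the read-side primitives do nothing at run time except constrain the compiler, and the names `model_rcu_read_lock()`, `model_rcu_read_unlock()`, `model_get_user()`, `reader()`, `gp`, and `struct foo` are hypothetical stand-ins.

```c
#include <assert.h>

/* Compiler-only barrier (GCC/Clang): emits no instructions, but the
 * "memory" clobber stops the compiler from moving loads or stores
 * across it. */
#define barrier() __asm__ __volatile__("" : : : "memory")

struct foo { int value; };   /* hypothetical RCU-protected type */
static struct foo *gp;       /* hypothetical RCU-protected pointer */
static int model_nesting;    /* hypothetical read-side nesting depth */

/* Hypothetical model of rcu_read_lock() in a CONFIG_PREEMPT=n build. */
static inline void model_rcu_read_lock(void)
{
	model_nesting++;
	barrier();  /* later accesses may not move above this point */
}

/* Hypothetical model of rcu_read_unlock(). */
static inline void model_rcu_read_unlock(void)
{
	barrier();  /* earlier accesses may not move below this point */
	assert(model_nesting > 0);
	model_nesting--;
}

/* Hypothetical stand-in for get_user(): a plain, non-volatile load,
 * which by itself imposes no ordering at all. */
static int model_get_user(const int *user_p)
{
	return *user_p;
}

/* The reader from the example.  The barrier in model_rcu_read_unlock()
 * means the compiler cannot hoist the model_get_user() load into the
 * critical section, nor sink the p->value load out of it. */
static int reader(const int *user_p, int *user_v)
{
	struct foo *p;
	int v;

	model_rcu_read_lock();
	p = __atomic_load_n(&gp, __ATOMIC_CONSUME); /* rcu_dereference() stand-in */
	v = p->value;
	model_rcu_read_unlock();
	*user_v = model_get_user(user_p);
	return v;
}
```

This model places a barrier at every nesting level, not only the outermost one, which is more than the stated requirement demands but keeps the sketch simple.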
Documentation/RCU/stallwarn.txt +6 −0

@@ -57,6 +57,12 @@ o	A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
 	CONFIG_PREEMPT_RCU case, you might see stall-warning
 	messages.
 
+	You can use the rcutree.kthread_prio kernel boot parameter to
+	increase the scheduling priority of RCU's kthreads, which can
+	help avoid this problem.  However, please note that doing this
+	can increase your system's context-switch rate and thus degrade
+	performance.
+
 o	A periodic interrupt whose handler takes longer than the time
 	interval between successive pairs of interrupts.  This can
 	prevent RCU's kthreads and softirq handlers from running.
Documentation/admin-guide/kernel-parameters.txt +11 −6

@@ -3837,12 +3837,13 @@
 			RCU_BOOST is not set, valid values are 0-99 and
 			the default is zero (non-realtime operation).
 
-	rcutree.rcu_nocb_leader_stride= [KNL]
-			Set the number of NOCB kthread groups, which
-			defaults to the square root of the number of
-			CPUs.  Larger numbers reduces the wakeup overhead
-			on the per-CPU grace-period kthreads, but increases
-			that same overhead on each group's leader.
+	rcutree.rcu_nocb_gp_stride= [KNL]
+			Set the number of NOCB callback kthreads in
+			each group, which defaults to the square root
+			of the number of CPUs.  Larger numbers reduce
+			the wakeup overhead on the global grace-period
+			kthread, but increases that same overhead on
+			each group's NOCB grace-period kthread.
 
 	rcutree.qhimark= [KNL]
 			Set threshold of queued RCU callbacks beyond which
@@ -4047,6 +4048,10 @@
 	rcutorture.verbose= [KNL]
 			Enable additional printk() statements.
 
+	rcupdate.rcu_cpu_stall_ftrace_dump= [KNL]
+			Dump ftrace buffer after reporting RCU CPU
+			stall warning.
+
 	rcupdate.rcu_cpu_stall_suppress= [KNL]
 			Suppress RCU CPU stall warning messages.
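The new rcutree.rcu_nocb_gp_stride text says the parameter sets how many NOCB callback kthreads share each group and that it defaults to the square root of the number of CPUs. The C sketch below illustrates only that arithmetic, under stated assumptions: every CPU is a no-CBs CPU, CPU IDs are contiguous from zero, groups are formed from consecutive CPU IDs, and the square root is rounded down. The helpers `model_int_sqrt()`, `model_default_stride()`, `model_gp_kthread_cpu()`, and `model_nr_groups()` are hypothetical and do not correspond to kernel functions.

```c
#include <assert.h>

/* Floor of the square root of n, by binary search (hypothetical helper). */
static unsigned int model_int_sqrt(unsigned int n)
{
	unsigned long long lo = 0, hi = n;   /* answer lies in [lo, hi] */

	while (lo < hi) {
		unsigned long long mid = (lo + hi + 1) / 2;

		if (mid * mid <= n)
			lo = mid;        /* mid is small enough: answer >= mid */
		else
			hi = mid - 1;    /* mid is too big: answer < mid */
	}
	return (unsigned int)lo;
}

/* Assumed default stride when the boot parameter is not given:
 * rounded-down square root of the CPU count, but never less than 1. */
static unsigned int model_default_stride(unsigned int nr_cpus)
{
	unsigned int s = model_int_sqrt(nr_cpus);

	return s ? s : 1;
}

/* First CPU of the group containing @cpu; under the assumptions above,
 * that CPU's NOCB grace-period kthread is taken to serve the group. */
static unsigned int model_gp_kthread_cpu(unsigned int cpu, unsigned int stride)
{
	assert(stride > 0);
	return cpu - cpu % stride;
}

/* Number of groups, and hence of NOCB grace-period kthreads. */
static unsigned int model_nr_groups(unsigned int nr_cpus, unsigned int stride)
{
	assert(stride > 0);
	return (nr_cpus + stride - 1) / stride;
}
```

With 64 CPUs, for example, the default stride is 8, giving 8 groups of 8 CPUs. Raising the stride to 16 yields 4 groups, so the global grace-period kthread has fewer group kthreads to wake while each group's NOCB grace-period kthread has more callback kthreads to wake, which is the tradeoff the parameter description states.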
MAINTAINERS +1 −1

@@ -9326,7 +9326,7 @@ F:	drivers/misc/lkdtm/*
 
 LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
 M:	Alan Stern <stern@rowland.harvard.edu>
-M:	Andrea Parri <andrea.parri@amarulasolutions.com>
+M:	Andrea Parri <parri.andrea@gmail.com>
 M:	Will Deacon <will@kernel.org>
 M:	Peter Zijlstra <peterz@infradead.org>
 M:	Boqun Feng <boqun.feng@gmail.com>
arch/arm/kernel/smp.c +2 −4

@@ -264,15 +264,13 @@ int __cpu_disable(void)
 	return 0;
 }
 
-static DECLARE_COMPLETION(cpu_died);
-
 /*
  * called on the thread which is asking for a CPU to be shutdown -
  * waits until shutdown has completed, or it is timed out.
  */
 void __cpu_die(unsigned int cpu)
 {
-	if (!wait_for_completion_timeout(&cpu_died, msecs_to_jiffies(5000))) {
+	if (!cpu_wait_death(cpu, 5)) {
 		pr_err("CPU%u: cpu didn't die\n", cpu);
 		return;
 	}
@@ -319,7 +317,7 @@ void arch_cpu_idle_dead(void)
 	 * this returns, power and/or clocks can be removed at any point
 	 * from this CPU and its cache by platform_cpu_kill().
 	 */
-	complete(&cpu_died);
+	(void)cpu_report_death();
 
 	/*
 	 * Ensure that the cache lines associated with that completion are
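The smp.c hunks above replace a single file-scope completion with the cpu_wait_death()/cpu_report_death() pair, with the surviving CPU waiting up to 5 seconds for the outgoing CPU. The sketch below is a simplified user-space model of that kind of handshake, not the kernel's code: each CPU has an atomic state word; the outgoing CPU moves it to a dead state with compare-and-swap, and the surviving CPU polls with a growing sleep until it sees that state or its time budget runs out, in which case it marks the CPU as broken so that a late report can be detected. The state names, `MODEL_NR_CPUS`, the millisecond polling, and all function names are hypothetical; the code assumes C11 `<stdatomic.h>` and POSIX `nanosleep()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-CPU states, loosely following a hotplug handshake. */
enum model_state { M_ONLINE, M_DEAD, M_POST_DEAD, M_BROKEN, M_DEAD_FROZEN };

#define MODEL_NR_CPUS 4
static _Atomic int model_hotplug_state[MODEL_NR_CPUS]; /* zero: M_ONLINE */

static void model_sleep_ms(int ms)
{
	struct timespec ts = { .tv_sec = ms / 1000,
			       .tv_nsec = (long)(ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Outgoing CPU: report death.  Returns true if the survivor was still
 * waiting, false if it had already given up and marked this CPU broken. */
static bool model_cpu_report_death(int cpu)
{
	int old, new_state;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	old = atomic_load(&model_hotplug_state[cpu]);
	do {
		new_state = (old == M_BROKEN) ? M_DEAD_FROZEN : M_DEAD;
	} while (!atomic_compare_exchange_weak(&model_hotplug_state[cpu],
					       &old, new_state));
	return new_state == M_DEAD;
}

/* Surviving CPU: wait up to @seconds for @cpu to report death. */
static bool model_cpu_wait_death(int cpu, int seconds)
{
	int ms_left = seconds * 1000;
	int nap = 1;                    /* current sleep length in ms */
	int old;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	while (atomic_load(&model_hotplug_state[cpu]) != M_DEAD && ms_left > 0) {
		model_sleep_ms(nap);
		ms_left -= nap;
		nap = (nap * 11 + 9) / 10; /* multiply by 1.1, rounding up */
	}

	old = atomic_load(&model_hotplug_state[cpu]);
	for (;;) {
		if (old == M_DEAD) {
			atomic_store(&model_hotplug_state[cpu], M_POST_DEAD);
			return true;
		}
		/* Timed out: mark broken unless the state changed under us,
		 * in which case the failed CAS reloads @old and we re-check. */
		if (atomic_compare_exchange_strong(&model_hotplug_state[cpu],
						   &old, M_BROKEN))
			return false;
	}
}
```

In the modeled scenario, `model_cpu_report_death()` runs on the outgoing CPU while `model_cpu_wait_death()` runs concurrently on the survivor; calling them one after the other in a single thread exercises the same state transitions. Because the state word is per CPU in this model, a report from one CPU cannot satisfy a wait for another.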