Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit fe4ba3c3 authored by Chris Metcalf's avatar Chris Metcalf Committed by Linus Torvalds
Browse files

watchdog: add watchdog_cpumask sysctl to assist nohz



Change the default behavior of watchdog so it only runs on the
housekeeping cores when nohz_full is enabled at build and boot time.
Allow modifying the set of cores the watchdog is currently running on
with a new kernel.watchdog_cpumask sysctl.

In the current system, the watchdog subsystem runs a periodic timer that
schedules the watchdog kthread to run.  However, nohz_full cores are
designed to allow userspace application code running on those cores to
have 100% access to the CPU.  So the watchdog system prevents the
nohz_full application code from being able to run the way it wants to,
thus the motivation to suppress the watchdog on nohz_full cores, which
this patchset provides by default.

However, if we disable the watchdog globally, then the housekeeping
cores can't benefit from the watchdog functionality.  So we allow
disabling it only on some cores.  See Documentation/lockup-watchdogs.txt
for more information.

[jhubbard@nvidia.com: fix a watchdog crash in some configurations]
Signed-off-by: default avatarChris Metcalf <cmetcalf@ezchip.com>
Acked-by: default avatarDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
parent b5242e98
Loading
Loading
Loading
Loading
+18 −0
Original line number Diff line number Diff line
@@ -61,3 +61,21 @@ As explained above, a kernel knob is provided that allows
administrators to configure the period of the hrtimer and the perf
event. The right value for a particular environment is a trade-off
between fast response to lockups and detection overhead.

By default, the watchdog runs on all online cores.  However, on a
kernel configured with NO_HZ_FULL, by default the watchdog runs only
on the housekeeping cores, not the cores specified in the "nohz_full"
boot argument.  If we allowed the watchdog to run by default on
the "nohz_full" cores, we would have to run timer ticks to activate
the scheduler, which would prevent the "nohz_full" functionality
from protecting the user code on those cores from the kernel.
Of course, disabling it by default on the nohz_full cores means that
when those cores do enter the kernel, by default we will not be
able to detect if they lock up.  However, allowing the watchdog
to continue to run on the housekeeping (non-tickless) cores means
that we will continue to detect lockups properly on those cores.

In either case, the set of cores excluded from running the watchdog
may be adjusted via the kernel.watchdog_cpumask sysctl.  For
nohz_full cores, this may be useful for debugging a case where the
kernel seems to be hanging on the nohz_full cores.
+21 −0
Original line number Diff line number Diff line
@@ -923,6 +923,27 @@ and nmi_watchdog.

==============================================================

watchdog_cpumask:

This value can be used to control on which cpus the watchdog may run.
The default cpumask is all possible cores, but if NO_HZ_FULL is
enabled in the kernel config, and cores are specified with the
nohz_full= boot argument, those cores are excluded by default.
Offline cores can be included in this mask, and if the core is later
brought online, the watchdog will be started based on the mask value.

Typically this value would only be touched in the nohz_full case
to re-enable cores that by default were not running the watchdog,
if a kernel lockup was suspected on those cores.

The argument value is the standard cpulist format for cpumasks,
so for example to enable the watchdog on cores 0, 2, 3, and 4 you
might say:

  echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask

==============================================================

watchdog_thresh:

This value can be used to control the frequency of hrtimer and NMI
+3 −0
Original line number Diff line number Diff line
@@ -67,6 +67,7 @@ extern int nmi_watchdog_enabled;
extern int soft_watchdog_enabled;
extern int watchdog_user_enabled;
extern int watchdog_thresh;
extern unsigned long *watchdog_cpumask_bits;
extern int sysctl_softlockup_all_cpu_backtrace;
struct ctl_table;
extern int proc_watchdog(struct ctl_table *, int ,
@@ -77,6 +78,8 @@ extern int proc_soft_watchdog(struct ctl_table *, int ,
			      void __user *, size_t *, loff_t *);
extern int proc_watchdog_thresh(struct ctl_table *, int ,
				void __user *, size_t *, loff_t *);
extern int proc_watchdog_cpumask(struct ctl_table *, int,
				 void __user *, size_t *, loff_t *);
#endif

#ifdef CONFIG_HAVE_ACPI_APEI_NMI
+1 −0
Original line number Diff line number Diff line
@@ -338,6 +338,7 @@ EXPORT_SYMBOL_GPL(smpboot_unregister_percpu_thread);
 *
 * The cpumask field in the smp_hotplug_thread must not be updated directly
 * by the client, but only by calling this function.
 * This function can only be called on a registered smp_hotplug_thread.
 */
int smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
					 const struct cpumask *new)
+7 −0
Original line number Diff line number Diff line
@@ -871,6 +871,13 @@ static struct ctl_table kern_table[] = {
		.extra1		= &zero,
		.extra2		= &one,
	},
	{
		.procname	= "watchdog_cpumask",
		.data		= &watchdog_cpumask_bits,
		.maxlen		= NR_CPUS,
		.mode		= 0644,
		.proc_handler	= proc_watchdog_cpumask,
	},
	{
		.procname	= "softlockup_panic",
		.data		= &softlockup_panic,
Loading