Commit 67701ae9 authored by Jack F Vogel, committed by Linus Torvalds

[PATCH] check nmi watchdog is broken



A bug report against an xSeries system recently noted that the
check_nmi_watchdog() test was failing.

I have been investigating it and found that, on both i386 and x86_64, the
recent change to make the routine use cpu_callin_map has uncovered a
problem.  Prior to that change, on an SMP box, the test passed trivially
because all CPUs were found to be not yet online.  Now that the callin_map
lets it discover them, it goes on to test their counters, which have not
yet begun to increment, so it announces that a CPU is stuck and bails
out.
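The failure mode described above can be modelled in a few lines of userspace C. This is a hypothetical sketch, not the kernel routine: first_stuck_cpu(), its arrays, and the "count did not advance" criterion are assumptions chosen for illustration, with online[] standing in for the role cpu_callin_map plays in the real code. It shows why skipping not-yet-online CPUs made the old test pass trivially, while checking them before their counters start makes it report a stuck CPU.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the watchdog self-test (not kernel code).
 * prev[] and now[] hold per-CPU NMI counts sampled before and after a
 * short delay; online[] marks the CPUs the test looks at, the role that
 * cpu_callin_map plays in the patched routine.
 * Returns the first checked CPU whose count did not advance, or -1.
 */
static int first_stuck_cpu(const unsigned int *prev, const unsigned int *now,
                           const bool *online, int ncpus)
{
    assert(prev && now && online);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (!online[cpu])
            continue;       /* skipped CPUs can never fail the test */
        if (now[cpu] == prev[cpu])
            return cpu;     /* no NMIs seen during the delay: "stuck" */
    }
    return -1;
}
```

For example, with prev = {0, 0}, a sample of {10, 0} and only CPU 0 marked online, the function returns -1 (the old, trivial pass). Marking both CPUs online makes it return 1, and resampling later as {20, 10}, once CPU 1 has taken NMIs, returns -1 again.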

On all the systems I have access to for testing, the failure announcement
is also bogus: by the time you can log in and check /proc/interrupts, the
NMI count is happily incrementing on all CPUs.  It's just that the test is
being done too early.
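The manual check mentioned above (watching the NMI row of /proc/interrupts grow) can also be scripted. A minimal sketch, assuming the usual row layout of an "NMI:" label followed by one decimal count per CPU column and optional trailing text; parse_nmi_counts() and all_nmi_counts_advanced() are hypothetical helpers that work on text already read from the file, so reading /proc/interrupts twice a few seconds apart is left to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse the per-CPU counts from the "NMI:" row of
 * text in /proc/interrupts format. Stores at most max counts and returns
 * how many were stored, or -1 if no NMI row is present.
 */
static int parse_nmi_counts(const char *text, unsigned long *counts, int max)
{
    assert(text && counts);
    const char *line = text;

    while (line && *line) {
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++;                        /* rows may be indented */
        if (strncmp(p, "NMI:", 4) == 0) {
            int n = 0;
            p += 4;
            while (n < max) {
                char *end;
                while (*p == ' ' || *p == '\t')
                    p++;                /* skip blanks, but not the newline */
                if (!isdigit((unsigned char)*p))
                    break;              /* end of row or trailing description */
                counts[n++] = strtoul(p, &end, 10);
                p = end;
            }
            return n;
        }
        line = strchr(line, '\n');      /* advance to the next row */
        if (line)
            line++;
    }
    return -1;
}

/* Hypothetical helper: nonzero if every CPU's count grew between snapshots. */
static int all_nmi_counts_advanced(const unsigned long *before,
                                   const unsigned long *after, int ncpus)
{
    for (int i = 0; i < ncpus; i++)
        if (after[i] <= before[i])
            return 0;
    return 1;
}
```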

I have tried moving the call to the test around a bit, and it was always
too early.  I finally hit on this proposed solution: it delays the routine
via a late_initcall(), which seems like the right solution to me.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
parent fd51f666
+0 −2
@@ -1265,8 +1265,6 @@ int __init APIC_init_uniprocessor (void)
 
 	setup_local_APIC();
 
-	if (nmi_watchdog == NMI_LOCAL_APIC)
-		check_nmi_watchdog();
 #ifdef CONFIG_X86_IO_APIC
 	if (smp_found_config)
 		if (!skip_ioapic_setup && nr_ioapics)
+0 −2
@@ -2175,7 +2175,6 @@ static inline void check_timer(void)
 				disable_8259A_irq(0);
 				setup_nmi();
 				enable_8259A_irq(0);
-				check_nmi_watchdog();
 			}
 			return;
 		}
@@ -2198,7 +2197,6 @@ static inline void check_timer(void)
 				add_pin_to_irq(0, 0, pin2);
 			if (nmi_watchdog == NMI_IO_APIC) {
 				setup_nmi();
-				check_nmi_watchdog();
 			}
 			return;
 		}
+7 −4
@@ -102,20 +102,21 @@ int nmi_active;
 	(P4_CCCR_OVF_PMI0|P4_CCCR_THRESHOLD(15)|P4_CCCR_COMPLEMENT|	\
 	 P4_CCCR_COMPARE|P4_CCCR_REQUIRED|P4_CCCR_ESCR_SELECT(4)|P4_CCCR_ENABLE)
 
-int __init check_nmi_watchdog (void)
+static int __init check_nmi_watchdog(void)
 {
 	unsigned int prev_nmi_count[NR_CPUS];
 	int cpu;
 
-	printk(KERN_INFO "testing NMI watchdog ... ");
+	if (nmi_watchdog == NMI_NONE)
+		return 0;
+
+	printk(KERN_INFO "Testing NMI watchdog ... ");
 
 	for (cpu = 0; cpu < NR_CPUS; cpu++)
 		prev_nmi_count[cpu] = per_cpu(irq_stat, cpu).__nmi_count;
 	local_irq_enable();
 	mdelay((10*1000)/nmi_hz); // wait 10 ticks
 
-	/* FIXME: Only boot CPU is online at this stage.  Check CPUs
-           as they come up. */
 	for (cpu = 0; cpu < NR_CPUS; cpu++) {
 #ifdef CONFIG_SMP
 		/* Check cpu_callin_map here because that is set
@@ -139,6 +140,8 @@ int __init check_nmi_watchdog (void)
 
 	return 0;
 }
+/* This needs to happen later in boot so counters are working */
+late_initcall(check_nmi_watchdog);
 
 static int __init setup_nmi_watchdog(char *str)
 {
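Two details of the reworked function are worth spelling out. Because late_initcall() invokes it on every boot, it now returns early when nmi_watchdog is NMI_NONE; the deleted call sites used to perform their own nmi_watchdog checks instead. And the busy-wait of (10*1000)/nmi_hz milliseconds spans ten watchdog periods, since nmi_hz is the NMI rate per second. The sketch below restates both points with hypothetical names (enum wd_mode, ten_tick_wait_ms(), planned_wait_ms()); it is illustrative, not the kernel's definitions.

```c
#include <assert.h>

/* Illustrative stand-ins for the watchdog modes; not the kernel's values. */
enum wd_mode { WD_NONE, WD_IO_APIC, WD_LOCAL_APIC };

/*
 * Ten periods of an nmi_hz-Hz interrupt, in ms, using the same integer
 * expression as the patch: it rounds down and is 0 above 10000 Hz.
 */
static unsigned int ten_tick_wait_ms(unsigned int nmi_hz)
{
    assert(nmi_hz > 0);
    return (10 * 1000) / nmi_hz;
}

/*
 * Mirrors the new early exit: returns -1 when the test is skipped,
 * otherwise the delay in ms the test would spin for.
 */
static int planned_wait_ms(enum wd_mode mode, unsigned int nmi_hz)
{
    if (mode == WD_NONE)
        return -1;      /* watchdog disabled: nothing to test */
    return (int)ten_tick_wait_ms(nmi_hz);
}
```

For instance, at nmi_hz = 1000 the wait is 10 ms, at 250 it is 40 ms, and at 300 the integer division gives 33 ms.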
+0 −3
@@ -1089,9 +1089,6 @@ static void __init smp_boot_cpus(unsigned int max_cpus)
 		}
 	}
 
-	if (nmi_watchdog == NMI_LOCAL_APIC)
-		check_nmi_watchdog();
-
 	smpboot_setup_io_apic();
 
 	setup_boot_APIC_clock();
+0 −2
@@ -1607,7 +1607,6 @@ static inline void check_timer(void)
 				disable_8259A_irq(0);
 				setup_nmi();
 				enable_8259A_irq(0);
-				check_nmi_watchdog();
 			}
 			return;
 		}
@@ -1627,7 +1626,6 @@ static inline void check_timer(void)
 			nmi_watchdog_default();
 			if (nmi_watchdog == NMI_IO_APIC) {
 				setup_nmi();
-				check_nmi_watchdog();
 			}
 			return;
 		}