Commit a2ee2981 authored by Linus Torvalds
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits)
  x86, mce: Add boot options for corrected errors
  x86, mce: Fix mce printing
  x86, mce: fix for mce counters
  x86, mce: support action-optional machine checks
  x86, mce: define MCE_VECTOR
  x86, mce: rename mce_notify_user to mce_notify_irq
  x86: fix panic with interrupts off (needed for MCE)
  x86, mce: export MCE severities coverage via debugfs
  x86, mce: implement new status bits
  x86, mce: print header/footer only once for multiple MCEs
  x86, mce: default to panic timeout for machine checks
  x86, mce: improve mce_get_rip
  x86, mce: make non Monarch panic message "Fatal machine check" too
  x86, mce: switch x86 machine check handler to Monarch election.
  x86, mce: implement panic synchronization
  x86, mce: implement bootstrapping for machine check wakeups
  x86, mce: check early in exception handler if panic is needed
  x86, mce: add table driven machine check grading
  x86, mce: remove TSC print heuristic
  x86, mce: log corrected errors when panicing
  ...
parents 7603ef03 0d595972
+15 −0
@@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version
o  oprofile               0.9                     # oprofiled --version
o  udev                   081                     # udevinfo -V
o  grub                   0.93                    # grub --version
o  mcelog		  0.6

Kernel compilation
==================
@@ -276,6 +277,16 @@ before running exportfs or mountd. It is recommended that all NFS
services be protected from the internet-at-large by a firewall where
that is possible.

mcelog
------

In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility
as a regular cronjob similar to the x86-64 kernel to process and log
machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check
events are errors reported by the CPU. Processing them is strongly encouraged.
All x86-64 kernels since 2.6.4 require the mcelog utility to
process machine checks.

Getting updated software
========================

@@ -365,6 +376,10 @@ FUSE
----
o <http://sourceforge.net/projects/fuse>

mcelog
------
o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/mcelog/>

Networking
**********

+10 −0
@@ -437,3 +437,13 @@ Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate
	driver but this caused driver conflicts.
Who:	Jean Delvare <khali@linux-fr.org>
	Krzysztof Helt <krzysztof.h1@wp.pl>

----------------------------

What:	CONFIG_X86_OLD_MCE
When:	2.6.32
Why:	Remove the old legacy 32bit machine check code. This has been
	superseded by the newer machine check code from the 64bit port,
	but the old version has been kept around for easier testing. Note this
	doesn't impact the old P5 and WinChip machine check handlers.
Who:	Andi Kleen <andi@firstfloor.org>
+37 −7
@@ -5,21 +5,51 @@ only the AMD64 specific ones are listed here.

Machine check

   mce=off disable machine check
   mce=bootlog Enable logging of machine checks left over from booting.
   Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.

   mce=off
		Disable machine check
   mce=no_cmci
		Disable CMCI (Corrected Machine Check Interrupt), which
		Intel processors support.  Disabling it is usually not
		recommended, but it might be handy if your hardware
		is misbehaving.
		Note that you'll get more problems without CMCI than with
		it due to the shared banks, i.e. you might get duplicated
		error logs.
   mce=dont_log_ce
		Don't log corrected errors.  All events reported
		as corrected are silently cleared by the OS.
		This option is useful if you have no interest in
		corrected errors.
   mce=ignore_ce
		Disable features for corrected errors, e.g. polling timer
		and CMCI.  Events reported as corrected are not cleared
		by the OS and remain in the error banks.
		Disabling these features is usually not recommended.  However,
		if there is an agent checking/clearing corrected errors
		(e.g. BIOS or hardware monitoring applications) that conflicts
		with the OS's error handling, and you cannot deactivate the
		agent, then this option will help.
   mce=bootlog
		Enable logging of machine checks left over from booting.
		Disabled by default on AMD because some BIOS leave bogus ones.
		If your BIOS doesn't do that it's a good idea to enable though
		to make sure you log even machine check events that result
		in a reboot. On Intel systems it is enabled by default.
   mce=nobootlog
		Disable boot machine check logging.
   mce=tolerancelevel (number)
   mce=tolerancelevel[,monarchtimeout] (number,number)
		tolerance levels:
		0: always panic on uncorrected errors, log corrected errors
		1: panic or SIGBUS on uncorrected errors, log corrected errors
		2: SIGBUS or log uncorrected errors, log corrected errors
		3: never panic or SIGBUS, log all errors (for testing only)
		Default is 1
		Can also be set using sysfs, which is preferable.
		monarchtimeout:
		Sets the time in us to wait for other CPUs on machine checks. 0
		to disable.

   nomce (for compatibility with i386): same as mce=off
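
As a reading aid for the option list above, the following Python sketch
classifies the text after "mce=" the way the list describes it. It is not
the kernel's parser: the helper name parse_mce_option and the returned
dictionary layout are hypothetical, the keyword set simply mirrors the
options listed above, and rejecting tolerance levels outside 0..3 is a
choice of this sketch rather than documented kernel behaviour.

```python
# Keywords taken from the option list above.
KEYWORDS = {"off", "no_cmci", "dont_log_ce", "ignore_ce", "bootlog", "nobootlog"}


def parse_mce_option(value):
    """Classify the text after 'mce=' (hypothetical helper, not kernel code)."""
    if value in KEYWORDS:
        return {"keyword": value}

    # Otherwise expect tolerancelevel[,monarchtimeout], both decimal numbers.
    parts = value.split(",")
    if not 1 <= len(parts) <= 2 or not all(p.isdigit() for p in parts):
        raise ValueError("unrecognized mce= value: %r" % value)

    level = int(parts[0])
    if level > 3:
        # Assumption of this sketch: only the documented levels 0..3 are accepted.
        raise ValueError("tolerance level must be 0..3, got %d" % level)

    result = {"tolerant": level}
    if len(parts) == 2:
        # monarchtimeout is given in microseconds; 0 disables the wait.
        result["monarch_timeout_us"] = int(parts[1])
    return result
```

For example, parse_mce_option("1,500") yields a tolerance level of 1 and a
monarch timeout of 500 us, while parse_mce_option("no_cmci") is reported as
a plain keyword.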

+7 −1
@@ -41,7 +41,9 @@ check_interval
	the polling interval.  When the poller stops finding MCEs, it
	triggers an exponential backoff (poll less often) on the polling
	interval. The check_interval variable is both the initial and
	maximum polling interval.
	maximum polling interval. 0 means no polling for corrected machine
	check errors (but some corrected errors might be still reported
	in other ways)

tolerant
	Tolerance level. When a machine check exception occurs for a non
@@ -67,6 +69,10 @@ trigger
	Program to run when a machine check event is detected.
	This is an alternative to running mcelog regularly from cron
	and allows events to be detected faster.
monarch_timeout
	How long to wait for the other CPUs to machine check too on an
	exception. 0 to disable waiting for other CPUs.
	Unit: us
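
The tunables described above are plain text files, so they can be inspected
with a few lines of Python. The sketch below is only an illustration: the
helper name read_mce_tunables is hypothetical, and the directory passed to it
is an assumption (on kernels of this era the entries are expected under
/sys/devices/system/machinecheck/machinecheckX/, which this excerpt does not
state). Entries that a given kernel does not provide are reported as None.

```python
import os

# Entry names taken from the descriptions above.
TUNABLES = ("check_interval", "tolerant", "trigger", "monarch_timeout")


def read_mce_tunables(base):
    """Return {name: text or None} for the machine check tunables in 'base'.

    'base' is the sysfs directory of one CPU's machinecheck device; the
    exact path is an assumption and may differ between kernel versions.
    """
    values = {}
    for name in TUNABLES:
        path = os.path.join(base, name)
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except OSError:
            # Missing or unreadable on this kernel or for this user.
            values[name] = None
    return values
```

A call such as
read_mce_tunables("/sys/devices/system/machinecheck/machinecheck0") would
then return the current settings for CPU 0, assuming that path exists.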

TBD document entries for AMD threshold interrupt configuration

+41 −4
@@ -789,10 +789,26 @@ config X86_MCE
	  to disable it.  MCE support simply ignores non-MCE processors like
	  the 386 and 486, so nearly everyone can say Y here.

config X86_OLD_MCE
	depends on X86_32 && X86_MCE
	bool "Use legacy machine check code (will go away)"
	default n
	select X86_ANCIENT_MCE
	---help---
	  Use the old i386 machine check code. This is merely intended for
	  testing in a transition period. Try this if you run into any machine
	  check related software problems, but report the problem to
	  linux-kernel.  When in doubt say no.

config X86_NEW_MCE
	depends on X86_MCE
	bool
	default y if (!X86_OLD_MCE && X86_32) || X86_64
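
The interplay of the two entries above can be spelled out as a truth
function. This Python sketch is only an illustration of the dependency and
default expressions quoted above, not part of Kconfig; the function name
new_mce_enabled and its boolean parameters are hypothetical stand-ins for
the corresponding CONFIG symbols.

```python
def new_mce_enabled(x86_mce, x86_32, x86_64, x86_old_mce):
    """Evaluate X86_NEW_MCE for one configuration (illustrative only)."""
    if not x86_mce:
        # 'depends on X86_MCE' is not met, so the symbol stays off.
        return False
    # X86_OLD_MCE can only be set when X86_32 && X86_MCE holds.
    old = x86_old_mce and x86_32
    # default y if (!X86_OLD_MCE && X86_32) || X86_64
    return (not old and x86_32) or x86_64
```

Since X86_NEW_MCE has no prompt, its value follows the default: every 64-bit
build with X86_MCE gets the new code, and a 32-bit build gets it unless the
legacy option is selected.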

config X86_MCE_INTEL
	def_bool y
	prompt "Intel MCE features"
	depends on X86_64 && X86_MCE && X86_LOCAL_APIC
	depends on X86_NEW_MCE && X86_LOCAL_APIC
	---help---
	   Additional support for intel specific MCE features such as
	   the thermal monitor.
@@ -800,19 +816,36 @@ config X86_MCE_INTEL
config X86_MCE_AMD
	def_bool y
	prompt "AMD MCE features"
	depends on X86_64 && X86_MCE && X86_LOCAL_APIC
	depends on X86_NEW_MCE && X86_LOCAL_APIC
	---help---
	   Additional support for AMD specific MCE features such as
	   the DRAM Error Threshold.

config X86_ANCIENT_MCE
	def_bool n
	depends on X86_32
	prompt "Support for old Pentium 5 / WinChip machine checks"
	---help---
	  Include support for machine check handling on old Pentium 5 or WinChip
	  systems. These typically need to be enabled explicitly on the command
	  line.

config X86_MCE_THRESHOLD
	depends on X86_MCE_AMD || X86_MCE_INTEL
	bool
	default y

config X86_MCE_INJECT
	depends on X86_NEW_MCE
	tristate "Machine check injector support"
	---help---
	  Provide support for injecting machine checks for testing purposes.
	  If you don't know what a machine check is and you don't do kernel
	  QA it is safe to say n.

config X86_MCE_NONFATAL
	tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4"
	depends on X86_32 && X86_MCE
	depends on X86_OLD_MCE
	---help---
	  Enabling this feature starts a timer that triggers every 5 seconds which
	  will look at the machine check registers to see if anything happened.
@@ -825,11 +858,15 @@ config X86_MCE_NONFATAL

config X86_MCE_P4THERMAL
	bool "check for P4 thermal throttling interrupt."
	depends on X86_32 && X86_MCE && (X86_UP_APIC || SMP)
	depends on X86_OLD_MCE && X86_MCE && (X86_UP_APIC || SMP)
	---help---
	  Enabling this feature will cause a message to be printed when the P4
	  enters thermal throttling.

config X86_THERMAL_VECTOR
	def_bool y
	depends on X86_MCE_P4THERMAL || X86_MCE_INTEL

config VM86
	bool "Enable VM86 support" if EMBEDDED
	default y