
Commit fa45a45c authored by Ingo Molnar

Merge tag 'ras_for_3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras



Pull RAS updates from Borislav Petkov:

 "- Enable AMD thresholding IRQ by default if supported. (Aravind Gopalakrishnan)

  - Unify mce_panic() message pattern. (Derek Che)

  - A bit more involved simplification of the CMCI logic after yet another
    report about race condition with the adaptive logic. (Borislav Petkov)

  - ACPI APEI EINJ fleshing out of the user documentation. (Borislav Petkov)

  - Minor cleanup. (Jan Beulich)"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
parents e07e0d4c d79f931f
+122 −74
			APEI Error INJection
			~~~~~~~~~~~~~~~~~~~~

EINJ provides a hardware error injection mechanism
It is very useful for debugging and testing of other APEI and RAS features.
EINJ provides a hardware error injection mechanism. It is very useful
for debugging and testing APEI and RAS features in general.

To use EINJ, make sure the following are enabled in your kernel
You need to check whether your BIOS supports EINJ first. For that, look
for early boot messages similar to this one:

ACPI: EINJ 0x000000007370A000 000150 (v01 INTEL           00000001 INTL 00000001)

which shows that the BIOS is exposing an EINJ table - it is the
mechanism through which the injection is done.

Alternatively, look in /sys/firmware/acpi/tables for an "EINJ" file,
which is a different representation of the same thing.

It doesn't necessarily mean that EINJ is not supported if those above
don't exist: before you give up, go into BIOS setup to see if the BIOS
has an option to enable error injection. Look for something called WHEA
or similar. Often, you need to enable an ACPI5 support option prior, in
order to see the APEI,EINJ,... functionality supported and exposed by
the BIOS menu.

To use EINJ, make sure the following options are enabled in your kernel
configuration:

CONFIG_DEBUG_FS
CONFIG_ACPI_APEI
CONFIG_ACPI_APEI_EINJ

The user interface of EINJ is debug file system, under the
directory apei/einj. The following files are provided.
The EINJ user interface is in <debugfs mount point>/apei/einj.

The following files belong to it:

- available_error_type
  Reading this file returns the error injection capability of the
  platform, that is, which error types are supported. The error type
  definition is as follow, the left field is the error type value, the
  right field is error description.

  This file shows which error types are supported:

  Error Type Value	Error Description
  ================	=================
  0x00000001		Processor Correctable
  0x00000002		Processor Uncorrectable non-fatal
  0x00000004		Processor Uncorrectable fatal
@@ -33,97 +52,126 @@ directory apei/einj. The following files are provided.
  0x00000400		Platform Uncorrectable non-fatal
  0x00000800		Platform Uncorrectable fatal

  The format of file contents are as above, except there are only the
  available error type lines.
  The format of the file contents is as above, except that only the
  available error types are present.

- error_type
  This file is used to set the error type value. The error type value
  is defined in "available_error_type" description.

  Set the value of the error type being injected. Possible error types
  are defined in the file available_error_type above.

- error_inject
  Write any integer to this file to trigger the error
  injection. Before this, please specify all necessary error
  parameters.

  Write any integer to this file to trigger the error injection. Make
  sure you have specified all necessary error parameters, i.e. this
  write should be the last step when injecting errors.

- flags
  Present for kernel version 3.13 and above. Used to specify which
  of param{1..4} are valid and should be used by BIOS during injection.
  Value is a bitmask as specified in ACPI5.0 spec for the

  Present for kernel versions 3.13 and above. Used to specify which
  of param{1..4} are valid and should be used by the firmware during
  injection. Value is a bitmask as specified in ACPI5.0 spec for the
  SET_ERROR_TYPE_WITH_ADDRESS data structure:
	Bit 0 - Processor APIC field valid (see param3 below)
	Bit 1 - Memory address and mask valid (param1 and param2)
	Bit 2 - PCIe (seg,bus,dev,fn) valid (param4 below)
  If set to zero, legacy behaviour is used where the type of injection
  specifies just one bit set, and param1 is multiplexed.

	Bit 0 - Processor APIC field valid (see param3 below).
	Bit 1 - Memory address and mask valid (param1 and param2).
	Bit 2 - PCIe (seg,bus,dev,fn) valid (see param4 below).

  If set to zero, legacy behavior is mimicked where the type of
  injection specifies just one bit set, and param1 is multiplexed.

- param1
  This file is used to set the first error parameter value. Effect of
  parameter depends on error_type specified. For example, if error
  type is memory related type, the param1 should be a valid physical
  memory address. [Unless "flag" is set - see above]

  This file is used to set the first error parameter value. Its effect
  depends on the error type specified in error_type. For example, if
  the error type is memory-related, param1 should be a valid physical
  memory address. [Unless "flags" is set - see above]

- param2
  This file is used to set the second error parameter value. Effect of
  parameter depends on error_type specified. For example, if error
  type is memory related type, the param2 should be a physical memory
  address mask. Linux requires page or narrower granularity, say,
  0xfffffffffffff000.

  Same use as param1 above. For example, if error type is of memory
  related type, then param2 should be a physical memory address mask.
  Linux requires page or narrower granularity, say, 0xfffffffffffff000.

- param3
  Used when the 0x1 bit is set in "flag" to specify the APIC id

  Used when the 0x1 bit is set in "flags" to specify the APIC id.

- param4
  Used when the 0x4 bit is set in "flag" to specify target PCIe device
  Used when the 0x4 bit is set in "flags" to specify the target PCIe
  device.

- notrigger
  The EINJ mechanism is a two step process. First inject the error, then
  perform some actions to trigger it. Setting "notrigger" to 1 skips the
  trigger phase, which *may* allow the user to cause the error in some other
  context by a simple access to the cpu, memory location, or device that is
  the target of the error injection. Whether this actually works depends
  on what operations the BIOS actually includes in the trigger phase.

BIOS versions based in the ACPI 4.0 specification have limited options
to control where the errors are injected.  Your BIOS may support an
extension (enabled with the param_extension=1 module parameter, or
boot command line einj.param_extension=1). This allows the address
and mask for memory injections to be specified by the param1 and
param2 files in apei/einj.

BIOS versions using the ACPI 5.0 specification have more control over
the target of the injection. For processor related errors (type 0x1,
0x2 and 0x4) the APICID of the target should be provided using the
param1 file in apei/einj. For memory errors (type 0x8, 0x10 and 0x20)
the address is set using param1 with a mask in param2 (0x0 is equivalent
to all ones). For PCI express errors (type 0x40, 0x80 and 0x100) the
segment, bus, device and function are specified using param1:

  The error injection mechanism is a two-step process. First inject the
  error, then perform some actions to trigger it. Setting "notrigger"
  to 1 skips the trigger phase, which *may* allow the user to cause the
  error in some other context by a simple access to the CPU, memory
  location, or device that is the target of the error injection. Whether
  this actually works depends on what operations the BIOS actually
  includes in the trigger phase.
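
The param2 mask described in the file list above can be illustrated with a
small C sketch. It assumes the mask selects the address bits that must match
param1, which is one reading of "anywhere in this page"; the helper names
einj_addr_mask() and einj_addr_matches() are hypothetical and do not exist in
the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a mask that keeps every address bit above
 * the low `shift` bits. shift = 12 gives page granularity with 4 KiB pages.
 */
static uint64_t einj_addr_mask(unsigned int shift)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	return shift >= 64 ? 0 : ~UINT64_C(0) << shift;
}

/*
 * Assumed interpretation: an address is a candidate for the injection
 * when it agrees with the base address on all bits set in the mask.
 */
static bool einj_addr_matches(uint64_t addr, uint64_t base, uint64_t mask)
{
	return (addr & mask) == (base & mask);
}
```

With shift = 12 the first helper yields 0xfffffffffffff000, the value used for
param2 elsewhere in this document.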

BIOS versions based on the ACPI 4.0 specification have limited options
in controlling where the errors are injected. Your BIOS may support an
extension (enabled with the param_extension=1 module parameter, or boot
command line einj.param_extension=1). This allows the address and mask
for memory injections to be specified by the param1 and param2 files in
apei/einj.

BIOS versions based on the ACPI 5.0 specification have more control over
the target of the injection. For processor-related errors (type 0x1, 0x2
and 0x4), you can set flags to 0x3 (param3 for bit 0, and param1 and
param2 for bit 1) so that you have more information added to the error
signature being injected. The actual data passed is this:

	memory_address = param1;
	memory_address_range = param2;
	apicid = param3;
	pcie_sbdf = param4;
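
As a rough illustration of how the flags bits relate to the four fields above,
the following C sketch picks out the fields that a given flags value marks as
valid. The enum, struct and function names are made up for this example, and
the "ignore fields whose bit is clear" behavior is an assumption about what
the firmware does, not something taken from einj.c.

```c
#include <assert.h>
#include <stdint.h>

/* Bit meanings as listed for the flags file; the names are hypothetical. */
enum einj_flag_sketch {
	EINJ_SKETCH_APICID    = 1u << 0, /* param3 holds a valid APIC id */
	EINJ_SKETCH_MEM       = 1u << 1, /* param1/param2 hold address and mask */
	EINJ_SKETCH_PCIE_SBDF = 1u << 2, /* param4 holds a valid PCIe target */
};

struct einj_params_sketch {
	uint64_t param1, param2, param3, param4;
};

struct einj_target_sketch {
	uint64_t memory_address;
	uint64_t memory_address_range;
	uint32_t apicid;
	uint32_t pcie_sbdf;
};

/* Copy only the fields that `flags` declares valid; the rest stay zero. */
static struct einj_target_sketch
einj_select_sketch(uint32_t flags, struct einj_params_sketch p)
{
	struct einj_target_sketch t = { 0 };

	if (flags & EINJ_SKETCH_APICID)
		t.apicid = (uint32_t)p.param3;
	if (flags & EINJ_SKETCH_MEM) {
		t.memory_address = p.param1;
		t.memory_address_range = p.param2;
	}
	if (flags & EINJ_SKETCH_PCIE_SBDF)
		t.pcie_sbdf = (uint32_t)p.param4;

	return t;
}
```

For a processor error with flags set to 0x3, this keeps the APIC id and the
memory address and mask, and leaves the PCIe field at zero.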

For memory errors (type 0x8, 0x10 and 0x20) the address is set using
param1 with a mask in param2 (0x0 is equivalent to all ones). For PCI
express errors (type 0x40, 0x80 and 0x100) the segment, bus, device and
function are specified using param1:

         31     24 23    16 15    11 10      8  7        0
	+-------------------------------------------------+
	| segment |   bus  | device | function | reserved |
	+-------------------------------------------------+
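
The bit layout in the diagram above can be packed as follows. This is a
minimal sketch; einj_pack_sbdf() is a hypothetical helper name, and the field
widths are taken directly from the diagram (8-bit segment, 8-bit bus, 5-bit
device, 3-bit function, low 8 bits reserved).

```c
#include <assert.h>
#include <stdint.h>

/* Pack segment/bus/device/function into the 32-bit layout shown above. */
static uint32_t einj_pack_sbdf(uint8_t segment, uint8_t bus,
			       uint8_t device, uint8_t function)
{
	return ((uint32_t)segment << 24) |		/* bits 31..24 */
	       ((uint32_t)bus << 16) |			/* bits 23..16 */
	       ((uint32_t)(device & 0x1f) << 11) |	/* bits 15..11 */
	       ((uint32_t)(function & 0x07) << 8);	/* bits 10..8  */
	/* bits 7..0 are reserved and left as zero */
}
```

For example, segment 0, bus 1, device 2, function 3 packs to 0x00011300.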

An ACPI 5.0 BIOS may also allow vendor specific errors to be injected.
Anyway, you get the idea; if in doubt, just take a look at the code in
drivers/acpi/apei/einj.c.

An ACPI 5.0 BIOS may also allow vendor-specific errors to be injected.
In this case a file named vendor will contain identifying information
from the BIOS that hopefully will allow an application wishing to use
the vendor specific extension to tell that they are running on a BIOS
the vendor-specific extension to tell that they are running on a BIOS
that supports it. All vendor extensions have the 0x80000000 bit set in
error_type. A file vendor_flags controls the interpretation of param1
and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
documentation for details (and expect changes to this API if vendors'
creativity in using this feature expands beyond our expectations).
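
The error type groupings mentioned in this document (processor 0x1/0x2/0x4,
memory 0x8/0x10/0x20, PCI express 0x40/0x80/0x100, and the vendor bit
0x80000000) can be sketched as a small classifier. einj_type_class() is a
hypothetical helper for illustration only; types outside these groups are
simply reported as "other".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Classify an error_type value using only the groups named in the text. */
static const char *einj_type_class(uint32_t type)
{
	if (type & 0x80000000u)		/* vendor extensions set the top bit */
		return "vendor";
	if (type & 0x007u)		/* 0x1, 0x2, 0x4 */
		return "processor";
	if (type & 0x038u)		/* 0x8, 0x10, 0x20 */
		return "memory";
	if (type & 0x1c0u)		/* 0x40, 0x80, 0x100 */
		return "pcie";
	return "other";
}
```

A tool could use such a check to decide which of param1..param4 it needs to
fill in before writing to error_inject.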

Example:

An error injection example:

# cd /sys/kernel/debug/apei/einj
# cat available_error_type		# See which errors can be injected
0x00000002	Processor Uncorrectable non-fatal
0x00000008	Memory Correctable
0x00000010	Memory Uncorrectable non-fatal
# echo 0x12345000 > param1		# Set memory address for injection
# echo 0xfffffffffffff000 > param2	# Mask - anywhere in this page
# echo $((-1 << 12)) > param2		# Mask 0xfffffffffffff000 - anywhere in this page
# echo 0x8 > error_type			# Choose correctable memory error
# echo 1 > error_inject			# Inject now

You should see something like this in dmesg:

[22715.830801] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR
[22715.834759] EDAC sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
[22715.834759] EDAC sbridge MC3: TSC 0
[22715.834759] EDAC sbridge MC3: ADDR 12345000 EDAC sbridge MC3: MISC 144780c86
[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)

For more information about EINJ, please refer to ACPI specification
version 4.0, section 17.5 and ACPI 5.0, section 18.6.
+4 −4
@@ -183,11 +183,11 @@ typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);

enum mcp_flags {
	MCP_TIMESTAMP = (1 << 0),	/* log time stamp */
	MCP_UC = (1 << 1),		/* log uncorrected errors */
	MCP_DONTLOG = (1 << 2),		/* only clear, don't log */
	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
	MCP_UC		= BIT(1),	/* log uncorrected errors */
	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
};
void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);

int mce_notify_irq(void);

+5 −4
@@ -14,6 +14,7 @@ enum severity_level {
};

#define ATTR_LEN		16
#define INITIAL_CHECK_INTERVAL	5 * 60 /* 5 minutes */

/* One object for each MCE bank, shared by all CPUs */
struct mce_bank {
@@ -30,13 +31,13 @@ extern struct mce_bank *mce_banks;
extern mce_banks_t mce_banks_ce_disabled;

#ifdef CONFIG_X86_MCE_INTEL
unsigned long mce_intel_adjust_timer(unsigned long interval);
void mce_intel_cmci_poll(void);
unsigned long cmci_intel_adjust_timer(unsigned long interval);
bool mce_intel_cmci_poll(void);
void mce_intel_hcpu_update(unsigned long cpu);
void cmci_disable_bank(int bank);
#else
# define mce_intel_adjust_timer mce_adjust_timer_default
static inline void mce_intel_cmci_poll(void) { }
# define cmci_intel_adjust_timer mce_adjust_timer_default
static inline bool mce_intel_cmci_poll(void) { return false; }
static inline void mce_intel_hcpu_update(unsigned long cpu) { }
static inline void cmci_disable_bank(int bank) { }
#endif
+47 −43
@@ -88,9 +88,6 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
static DEFINE_PER_CPU(struct mce, mces_seen);
static int			cpu_missing;

/* CMCI storm detection filter */
static DEFINE_PER_CPU(unsigned long, mce_polled_error);

/*
 * MCA banks polled by the period polling timer for corrected events.
 * With Intel CMCI, this only has MCA banks which do not support CMCI (if any).
@@ -624,8 +621,9 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
 * is already totally * confused. In this case it's likely it will
 * not fully execute the machine check handler either.
 */
void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
	bool error_logged = false;
	struct mce m;
	int severity;
	int i;
@@ -648,7 +646,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
		if (!(m.status & MCI_STATUS_VAL))
			continue;

		this_cpu_write(mce_polled_error, 1);

		/*
		 * Uncorrected or signalled events are handled by the exception
		 * handler when it is enabled, so don't process those here.
@@ -681,8 +679,10 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
		 * Don't get the IP here because it's unlikely to
		 * have anything to do with the actual error location.
		 */
		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce)
		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce) {
			error_logged = true;
			mce_log(&m);
		}

		/*
		 * Clear state for this bank.
@@ -696,6 +696,8 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
	 */

	sync_core();

	return error_logged;
}
EXPORT_SYMBOL_GPL(machine_check_poll);

@@ -815,7 +817,7 @@ static void mce_reign(void)
	 * other CPUs.
	 */
	if (m && global_worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
		mce_panic("Fatal Machine check", m, msg);
		mce_panic("Fatal machine check", m, msg);

	/*
	 * For UC somewhere we let the CPU who detects it handle it.
@@ -828,7 +830,7 @@ static void mce_reign(void)
	 * source or one CPU is hung. Panic.
	 */
	if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
		mce_panic("Machine check from unknown source", NULL, NULL);
		mce_panic("Fatal machine check from unknown source", NULL, NULL);

	/*
	 * Now clear all the mces_seen so that they don't reappear on
@@ -1260,7 +1262,7 @@ void mce_log_therm_throt_event(__u64 status)
 * poller finds an MCE, poll 2x faster.  When the poller finds no more
 * errors, poll 2x slower (up to check_interval seconds).
 */
static unsigned long check_interval = 5 * 60; /* 5 minutes */
static unsigned long check_interval = INITIAL_CHECK_INTERVAL;

static DEFINE_PER_CPU(unsigned long, mce_next_interval); /* in jiffies */
static DEFINE_PER_CPU(struct timer_list, mce_timer);
@@ -1270,49 +1272,57 @@ static unsigned long mce_adjust_timer_default(unsigned long interval)
	return interval;
}

static unsigned long (*mce_adjust_timer)(unsigned long interval) =
	mce_adjust_timer_default;
static unsigned long (*mce_adjust_timer)(unsigned long interval) = mce_adjust_timer_default;

static int cmc_error_seen(void)
static void __restart_timer(struct timer_list *t, unsigned long interval)
{
	unsigned long *v = this_cpu_ptr(&mce_polled_error);
	unsigned long when = jiffies + interval;
	unsigned long flags;

	return test_and_clear_bit(0, v);
	local_irq_save(flags);

	if (timer_pending(t)) {
		if (time_before(when, t->expires))
			mod_timer_pinned(t, when);
	} else {
		t->expires = round_jiffies(when);
		add_timer_on(t, smp_processor_id());
	}

	local_irq_restore(flags);
}

static void mce_timer_fn(unsigned long data)
{
	struct timer_list *t = this_cpu_ptr(&mce_timer);
	int cpu = smp_processor_id();
	unsigned long iv;
	int notify;

	WARN_ON(smp_processor_id() != data);
	WARN_ON(cpu != data);

	iv = __this_cpu_read(mce_next_interval);

	if (mce_available(this_cpu_ptr(&cpu_info))) {
		machine_check_poll(MCP_TIMESTAMP,
				this_cpu_ptr(&mce_poll_banks));
		mce_intel_cmci_poll();
		machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));

		if (mce_intel_cmci_poll()) {
			iv = mce_adjust_timer(iv);
			goto done;
		}
	}

	/*
	 * Alert userspace if needed.  If we logged an MCE, reduce the
	 * polling interval, otherwise increase the polling interval.
	 * Alert userspace if needed. If we logged an MCE, reduce the polling
	 * interval, otherwise increase the polling interval.
	 */
	iv = __this_cpu_read(mce_next_interval);
	notify = mce_notify_irq();
	notify |= cmc_error_seen();
	if (notify) {
	if (mce_notify_irq())
		iv = max(iv / 2, (unsigned long) HZ/100);
	} else {
	else
		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
		iv = mce_adjust_timer(iv);
	}

done:
	__this_cpu_write(mce_next_interval, iv);
	/* Might have become 0 after CMCI storm subsided */
	if (iv) {
		t->expires = jiffies + iv;
		add_timer_on(t, smp_processor_id());
	}
	__restart_timer(t, iv);
}
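
The interval adjustment in mce_timer_fn() above (halve on a logged error,
otherwise double up to the configured maximum) can be modelled in isolation.
This is a simplified user-space sketch: next_poll_interval() is a hypothetical
name, hz and max_iv are passed in as plain parameters, and the
round_jiffies_relative() rounding of the upper bound is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the next polling interval in jiffies.
 * error_seen: an error was reported since the last poll.
 * hz:         timer ticks per second.
 * max_iv:     upper bound, i.e. check_interval * HZ in the code above.
 */
static unsigned long next_poll_interval(unsigned long iv, bool error_seen,
					unsigned long hz, unsigned long max_iv)
{
	unsigned long min_iv = hz / 100;	/* never poll faster than 10ms */

	if (error_seen) {
		unsigned long half = iv / 2;

		return half > min_iv ? half : min_iv;
	}

	unsigned long twice = iv * 2;

	return twice < max_iv ? twice : max_iv;
}
```

Repeated errors therefore drive the interval down towards HZ/100, and quiet
periods let it grow back to the maximum.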

/*
@@ -1321,16 +1331,10 @@ static void mce_timer_fn(unsigned long data)
void mce_timer_kick(unsigned long interval)
{
	struct timer_list *t = this_cpu_ptr(&mce_timer);
	unsigned long when = jiffies + interval;
	unsigned long iv = __this_cpu_read(mce_next_interval);

	if (timer_pending(t)) {
		if (time_before(when, t->expires))
			mod_timer_pinned(t, when);
	} else {
		t->expires = round_jiffies(when);
		add_timer_on(t, smp_processor_id());
	}
	__restart_timer(t, interval);

	if (interval < iv)
		__this_cpu_write(mce_next_interval, interval);
}
@@ -1631,7 +1635,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
	switch (c->x86_vendor) {
	case X86_VENDOR_INTEL:
		mce_intel_feature_init(c);
		mce_adjust_timer = mce_intel_adjust_timer;
		mce_adjust_timer = cmci_intel_adjust_timer;
		break;
	case X86_VENDOR_AMD:
		mce_amd_feature_init(c);
+8 −3
@@ -79,7 +79,7 @@ static inline bool is_shared_bank(int bank)
	return (bank == 4);
}

static const char * const bank4_names(struct threshold_block *b)
static const char *bank4_names(const struct threshold_block *b)
{
	switch (b->address) {
	/* MSR4_MISC0 */
@@ -250,6 +250,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
			if (!b.interrupt_capable)
				goto init;

			b.interrupt_enable = 1;
			new	= (high & MASK_LVTOFF_HI) >> 20;
			offset  = setup_APIC_mce(offset, new);

@@ -322,6 +323,8 @@ static void amd_threshold_interrupt(void)
log:
	mce_setup(&m);
	rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
	if (!(m.status & MCI_STATUS_VAL))
		return;
	m.misc = ((u64)high << 32) | low;
	m.bank = bank;
	mce_log(&m);
@@ -497,10 +500,12 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
	b->interrupt_capable	= lvt_interrupt_supported(bank, high);
	b->threshold_limit	= THRESHOLD_MAX;

	if (b->interrupt_capable)
	if (b->interrupt_capable) {
		threshold_ktype.default_attrs[2] = &interrupt_enable.attr;
	else
		b->interrupt_enable = 1;
	} else {
		threshold_ktype.default_attrs[2] = NULL;
	}

	INIT_LIST_HEAD(&b->miscj);
