Merge tag 'ras_for_3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras (fa45a45c)

Documentation/acpi/apei/einj.txt

+122 −74

Original line number	Diff line number	Diff line
		APEI Error INJection
		~~~~~~~~~~~~~~~~~~~~

		EINJ provides a hardware error injection mechanism
		It is very useful for debugging and testing of other APEI and RAS features.
		EINJ provides a hardware error injection mechanism. It is very useful
		for debugging and testing APEI and RAS features in general.

		To use EINJ, make sure the following are enabled in your kernel
		You need to check whether your BIOS supports EINJ first. For that, look
		for early boot messages similar to this one:

		ACPI: EINJ 0x000000007370A000 000150 (v01 INTEL 00000001 INTL 00000001)

		which shows that the BIOS is exposing an EINJ table - it is the
		mechanism through which the injection is done.

		Alternatively, look in /sys/firmware/acpi/tables for an "EINJ" file,
		which is a different representation of the same thing.

		It doesn't necessarily mean that EINJ is not supported if those above
		don't exist: before you give up, go into BIOS setup to see if the BIOS
		has an option to enable error injection. Look for something called WHEA
		or similar. Often, you need to enable an ACPI5 support option prior, in
		order to see the APEI,EINJ,... functionality supported and exposed by
		the BIOS menu.

		To use EINJ, make sure the following are options enabled in your kernel
		configuration:

		CONFIG_DEBUG_FS
		CONFIG_ACPI_APEI
		CONFIG_ACPI_APEI_EINJ

		The user interface of EINJ is debug file system, under the
		directory apei/einj. The following files are provided.
		The EINJ user interface is in <debugfs mount point>/apei/einj.

		The following files belong to it:

		- available_error_type
		Reading this file returns the error injection capability of the
		platform, that is, which error types are supported. The error type
		definition is as follow, the left field is the error type value, the
		right field is error description.

		This file shows which error types are supported:

		Error Type Value Error Description
		================ =================
		0x00000001 Processor Correctable
		0x00000002 Processor Uncorrectable non-fatal
		0x00000004 Processor Uncorrectable fatal
		@@ -33,97 +52,126 @@ directory apei/einj. The following files are provided.
		0x00000400 Platform Uncorrectable non-fatal
		0x00000800 Platform Uncorrectable fatal

		The format of file contents are as above, except there are only the
		available error type lines.
		The format of the file contents are as above, except present are only
		the available error types.

		- error_type
		This file is used to set the error type value. The error type value
		is defined in "available_error_type" description.

		Set the value of the error type being injected. Possible error types
		are defined in the file available_error_type above.

		- error_inject
		Write any integer to this file to trigger the error
		injection. Before this, please specify all necessary error
		parameters.

		Write any integer to this file to trigger the error injection. Make
		sure you have specified all necessary error parameters, i.e. this
		write should be the last step when injecting errors.

		- flags
		Present for kernel version 3.13 and above. Used to specify which
		of param{1..4} are valid and should be used by BIOS during injection.
		Value is a bitmask as specified in ACPI5.0 spec for the

		Present for kernel versions 3.13 and above. Used to specify which
		of param{1..4} are valid and should be used by the firmware during
		injection. Value is a bitmask as specified in ACPI5.0 spec for the
		SET_ERROR_TYPE_WITH_ADDRESS data structure:
		Bit 0 - Processor APIC field valid (see param3 below)
		Bit 1 - Memory address and mask valid (param1 and param2)
		Bit 2 - PCIe (seg,bus,dev,fn) valid (param4 below)
		If set to zero, legacy behaviour is used where the type of injection
		specifies just one bit set, and param1 is multiplexed.

		Bit 0 - Processor APIC field valid (see param3 below).
		Bit 1 - Memory address and mask valid (param1 and param2).
		Bit 2 - PCIe (seg,bus,dev,fn) valid (see param4 below).

		If set to zero, legacy behavior is mimicked where the type of
		injection specifies just one bit set, and param1 is multiplexed.

		- param1
		This file is used to set the first error parameter value. Effect of
		parameter depends on error_type specified. For example, if error
		type is memory related type, the param1 should be a valid physical
		memory address. [Unless "flag" is set - see above]

		This file is used to set the first error parameter value. Its effect
		depends on the error type specified in error_type. For example, if
		error type is memory related type, the param1 should be a valid
		physical memory address. [Unless "flag" is set - see above]

		- param2
		This file is used to set the second error parameter value. Effect of
		parameter depends on error_type specified. For example, if error
		type is memory related type, the param2 should be a physical memory
		address mask. Linux requires page or narrower granularity, say,
		0xfffffffffffff000.

		Same use as param1 above. For example, if error type is of memory
		related type, then param2 should be a physical memory address mask.
		Linux requires page or narrower granularity, say, 0xfffffffffffff000.

		- param3
		Used when the 0x1 bit is set in "flag" to specify the APIC id

		Used when the 0x1 bit is set in "flags" to specify the APIC id

		- param4
		Used when the 0x4 bit is set in "flag" to specify target PCIe device
		Used when the 0x4 bit is set in "flags" to specify target PCIe device

		- notrigger
		The EINJ mechanism is a two step process. First inject the error, then
		perform some actions to trigger it. Setting "notrigger" to 1 skips the
		trigger phase, which may allow the user to cause the error in some other
		context by a simple access to the cpu, memory location, or device that is
		the target of the error injection. Whether this actually works depends
		on what operations the BIOS actually includes in the trigger phase.

		BIOS versions based in the ACPI 4.0 specification have limited options
		to control where the errors are injected. Your BIOS may support an
		extension (enabled with the param_extension=1 module parameter, or
		boot command line einj.param_extension=1). This allows the address
		and mask for memory injections to be specified by the param1 and
		param2 files in apei/einj.

		BIOS versions using the ACPI 5.0 specification have more control over
		the target of the injection. For processor related errors (type 0x1,
		0x2 and 0x4) the APICID of the target should be provided using the
		param1 file in apei/einj. For memory errors (type 0x8, 0x10 and 0x20)
		the address is set using param1 with a mask in param2 (0x0 is equivalent
		to all ones). For PCI express errors (type 0x40, 0x80 and 0x100) the
		segment, bus, device and function are specified using param1:

		The error injection mechanism is a two-step process. First inject the
		error, then perform some actions to trigger it. Setting "notrigger"
		to 1 skips the trigger phase, which may allow the user to cause the
		error in some other context by a simple access to the CPU, memory
		location, or device that is the target of the error injection. Whether
		this actually works depends on what operations the BIOS actually
		includes in the trigger phase.

		BIOS versions based on the ACPI 4.0 specification have limited options
		in controlling where the errors are injected. Your BIOS may support an
		extension (enabled with the param_extension=1 module parameter, or boot
		command line einj.param_extension=1). This allows the address and mask
		for memory injections to be specified by the param1 and param2 files in
		apei/einj.

		BIOS versions based on the ACPI 5.0 specification have more control over
		the target of the injection. For processor-related errors (type 0x1, 0x2
		and 0x4), you can set flags to 0x3 (param3 for bit 0, and param1 and
		param2 for bit 1) so that you have more information added to the error
		signature being injected. The actual data passed is this:

		memory_address = param1;
		memory_address_range = param2;
		apicid = param3;
		pcie_sbdf = param4;

		For memory errors (type 0x8, 0x10 and 0x20) the address is set using
		param1 with a mask in param2 (0x0 is equivalent to all ones). For PCI
		express errors (type 0x40, 0x80 and 0x100) the segment, bus, device and
		function are specified using param1:

		 31     24 23    16 15    11 10      8  7        0
		+-------------------------------------------------+
		| segment |   bus  | device | function | reserved |
		+-------------------------------------------------+

		An ACPI 5.0 BIOS may also allow vendor specific errors to be injected.
		Anyway, you get the idea, if there's doubt just take a look at the code
		in drivers/acpi/apei/einj.c.

		An ACPI 5.0 BIOS may also allow vendor-specific errors to be injected.
		In this case a file named vendor will contain identifying information
		from the BIOS that hopefully will allow an application wishing to use
		the vendor specific extension to tell that they are running on a BIOS
		the vendor-specific extension to tell that they are running on a BIOS
		that supports it. All vendor extensions have the 0x80000000 bit set in
		error_type. A file vendor_flags controls the interpretation of param1
		and param2 (1 = PROCESSOR, 2 = MEMORY, 4 = PCI). See your BIOS vendor
		documentation for details (and expect changes to this API if vendors
		creativity in using this feature expands beyond our expectations).

		Example:

		An error injection example:

		# cd /sys/kernel/debug/apei/einj
		# cat available_error_type # See which errors can be injected
		0x00000002 Processor Uncorrectable non-fatal
		0x00000008 Memory Correctable
		0x00000010 Memory Uncorrectable non-fatal
		# echo 0x12345000 > param1 # Set memory address for injection
		# echo 0xfffffffffffff000 > param2 # Mask - anywhere in this page
		# echo $((-1 << 12)) > param2 # Mask 0xfffffffffffff000 - anywhere in this page
		# echo 0x8 > error_type # Choose correctable memory error
		# echo 1 > error_inject # Inject now

		You should see something like this in dmesg:

		[22715.830801] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR
		[22715.834759] EDAC sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
		[22715.834759] EDAC sbridge MC3: TSC 0
		[22715.834759] EDAC sbridge MC3: ADDR 12345000 EDAC sbridge MC3: MISC 144780c86
		[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
		[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)

		For more information about EINJ, please refer to ACPI specification
		version 4.0, section 17.5 and ACPI 5.0, section 18.6.
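
As a minimal sketch of the value layout described in the einj.txt text above, the following C helper packs a PCIe segment, bus, device and function into the 32-bit layout shown in the diagram, and names the three `flags` bits from the list. The macro names `EINJ_FLAG_*` and the function `einj_pack_sbdf` are hypothetical and exist only for this illustration; they are not taken from drivers/acpi/apei/einj.c.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the "flags" file, as listed in the text (names are hypothetical). */
#define EINJ_FLAG_APICID  (1u << 0)  /* param3 holds an APIC id */
#define EINJ_FLAG_MEMORY  (1u << 1)  /* param1/param2 hold address and mask */
#define EINJ_FLAG_PCIE    (1u << 2)  /* param4 holds the PCIe (seg,bus,dev,fn) */

/*
 * Hypothetical helper: pack segment/bus/device/function using the
 * layout from the diagram (segment 31:24, bus 23:16, device 15:11,
 * function 10:8, bits 7:0 reserved and left zero).
 */
static uint32_t einj_pack_sbdf(uint32_t seg, uint32_t bus,
                               uint32_t dev, uint32_t fn)
{
	/* Reject values that do not fit their bit fields. */
	assert(seg <= 0xff && bus <= 0xff && dev <= 0x1f && fn <= 0x7);
	return (seg << 24) | (bus << 16) | (dev << 11) | (fn << 8);
}
```

For example, segment 0, bus 1, device 2, function 3 packs to 0x00011300, and `EINJ_FLAG_APICID | EINJ_FLAG_MEMORY` gives the 0x3 value the text mentions for processor-related injections. The resulting numbers would then be written to the corresponding debugfs files by hand or by a script.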

arch/x86/include/asm/mce.h

+4 −4

		@@ -183,11 +183,11 @@ typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
		DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);

		enum mcp_flags {
		MCP_TIMESTAMP = (1 << 0), /* log time stamp */
		MCP_UC = (1 << 1), /* log uncorrected errors */
		MCP_DONTLOG = (1 << 2), /* only clear, don't log */
		MCP_TIMESTAMP = BIT(0), /* log time stamp */
		MCP_UC = BIT(1), /* log uncorrected errors */
		MCP_DONTLOG = BIT(2), /* only clear, don't log */
		};
		void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
		bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b);

		int mce_notify_irq(void);

arch/x86/kernel/cpu/mcheck/mce-internal.h

+5 −4

		@@ -14,6 +14,7 @@ enum severity_level {
		};

		#define ATTR_LEN 16
		#define INITIAL_CHECK_INTERVAL 5 * 60 /* 5 minutes */

		/* One object for each MCE bank, shared by all CPUs */
		struct mce_bank {
		@@ -30,13 +31,13 @@ extern struct mce_bank *mce_banks;
		extern mce_banks_t mce_banks_ce_disabled;

		#ifdef CONFIG_X86_MCE_INTEL
		unsigned long mce_intel_adjust_timer(unsigned long interval);
		void mce_intel_cmci_poll(void);
		unsigned long cmci_intel_adjust_timer(unsigned long interval);
		bool mce_intel_cmci_poll(void);
		void mce_intel_hcpu_update(unsigned long cpu);
		void cmci_disable_bank(int bank);
		#else
		# define mce_intel_adjust_timer mce_adjust_timer_default
		static inline void mce_intel_cmci_poll(void) { }
		# define cmci_intel_adjust_timer mce_adjust_timer_default
		static inline bool mce_intel_cmci_poll(void) { return false; }
		static inline void mce_intel_hcpu_update(unsigned long cpu) { }
		static inline void cmci_disable_bank(int bank) { }
		#endif
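
A side note on the `INITIAL_CHECK_INTERVAL` definition in the hunk above: the macro body `5 * 60` is not parenthesized. That is harmless in a plain initializer such as `check_interval = INITIAL_CHECK_INTERVAL`, but it would change the result if the macro were ever used inside a larger expression. The sketch below only demonstrates that precedence effect with two illustrative macros (`INTERVAL_BARE` and `INTERVAL_PAREN` are hypothetical names); it makes no claim about how the kernel actually uses the macro.

```c
/* Same body as in the hunk above: no parentheses around the expression. */
#define INTERVAL_BARE   5 * 60
/* Defensive variant (an assumption of this sketch, not part of the commit). */
#define INTERVAL_PAREN  (5 * 60)

/* Expands to x / 5 * 60, i.e. (x / 5) * 60. */
static unsigned long div_bare(unsigned long x)
{
	return x / INTERVAL_BARE;
}

/* Expands to x / (5 * 60), the intended division by 300. */
static unsigned long div_paren(unsigned long x)
{
	return x / INTERVAL_PAREN;
}
```

With x = 600, `div_bare` yields 7200 while `div_paren` yields 2.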

arch/x86/kernel/cpu/mcheck/mce.c

+47 −43

		@@ -88,9 +88,6 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
		static DEFINE_PER_CPU(struct mce, mces_seen);
		static int cpu_missing;

		/* CMCI storm detection filter */
		static DEFINE_PER_CPU(unsigned long, mce_polled_error);

		/*
		* MCA banks polled by the period polling timer for corrected events.
		* With Intel CMCI, this only has MCA banks which do not support CMCI (if any).
		@@ -624,8 +621,9 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
		* is already totally * confused. In this case it's likely it will
		* not fully execute the machine check handler either.
		*/
		void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
		bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
		{
		bool error_logged = false;
		struct mce m;
		int severity;
		int i;
		@@ -648,7 +646,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
		if (!(m.status & MCI_STATUS_VAL))
		continue;

		this_cpu_write(mce_polled_error, 1);

		/*
		* Uncorrected or signalled events are handled by the exception
		* handler when it is enabled, so don't process those here.
		@@ -681,8 +679,10 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
		* Don't get the IP here because it's unlikely to
		* have anything to do with the actual error location.
		*/
		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce)
		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce) {
		error_logged = true;
		mce_log(&m);
		}

		/*
		* Clear state for this bank.
		@@ -696,6 +696,8 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
		*/

		sync_core();

		return error_logged;
		}
		EXPORT_SYMBOL_GPL(machine_check_poll);

		@@ -815,7 +817,7 @@ static void mce_reign(void)
		* other CPUs.
		*/
		if (m && global_worst >= MCE_PANIC_SEVERITY && mca_cfg.tolerant < 3)
		mce_panic("Fatal Machine check", m, msg);
		mce_panic("Fatal machine check", m, msg);

		/*
		* For UC somewhere we let the CPU who detects it handle it.
		@@ -828,7 +830,7 @@ static void mce_reign(void)
		* source or one CPU is hung. Panic.
		*/
		if (global_worst <= MCE_KEEP_SEVERITY && mca_cfg.tolerant < 3)
		mce_panic("Machine check from unknown source", NULL, NULL);
		mce_panic("Fatal machine check from unknown source", NULL, NULL);

		/*
		* Now clear all the mces_seen so that they don't reappear on
		@@ -1260,7 +1262,7 @@ void mce_log_therm_throt_event(__u64 status)
		* poller finds an MCE, poll 2x faster. When the poller finds no more
		* errors, poll 2x slower (up to check_interval seconds).
		*/
		static unsigned long check_interval = 5 * 60; /* 5 minutes */
		static unsigned long check_interval = INITIAL_CHECK_INTERVAL;

		static DEFINE_PER_CPU(unsigned long, mce_next_interval); /* in jiffies */
		static DEFINE_PER_CPU(struct timer_list, mce_timer);
		@@ -1270,49 +1272,57 @@ static unsigned long mce_adjust_timer_default(unsigned long interval)
		return interval;
		}

		static unsigned long (*mce_adjust_timer)(unsigned long interval) =
		mce_adjust_timer_default;
		static unsigned long (*mce_adjust_timer)(unsigned long interval) = mce_adjust_timer_default;

		static int cmc_error_seen(void)
		static void __restart_timer(struct timer_list *t, unsigned long interval)
		{
		unsigned long *v = this_cpu_ptr(&mce_polled_error);
		unsigned long when = jiffies + interval;
		unsigned long flags;

		return test_and_clear_bit(0, v);
		local_irq_save(flags);

		if (timer_pending(t)) {
		if (time_before(when, t->expires))
		mod_timer_pinned(t, when);
		} else {
		t->expires = round_jiffies(when);
		add_timer_on(t, smp_processor_id());
		}

		local_irq_restore(flags);
		}

		static void mce_timer_fn(unsigned long data)
		{
		struct timer_list *t = this_cpu_ptr(&mce_timer);
		int cpu = smp_processor_id();
		unsigned long iv;
		int notify;

		WARN_ON(smp_processor_id() != data);
		WARN_ON(cpu != data);

		iv = __this_cpu_read(mce_next_interval);

		if (mce_available(this_cpu_ptr(&cpu_info))) {
		machine_check_poll(MCP_TIMESTAMP,
		this_cpu_ptr(&mce_poll_banks));
		mce_intel_cmci_poll();
		machine_check_poll(MCP_TIMESTAMP, this_cpu_ptr(&mce_poll_banks));

		if (mce_intel_cmci_poll()) {
		iv = mce_adjust_timer(iv);
		goto done;
		}
		}

		/*
		* Alert userspace if needed. If we logged an MCE, reduce the
		* polling interval, otherwise increase the polling interval.
		* Alert userspace if needed. If we logged an MCE, reduce the polling
		* interval, otherwise increase the polling interval.
		*/
		iv = __this_cpu_read(mce_next_interval);
		notify = mce_notify_irq();
		notify |= cmc_error_seen();
		if (notify) {
		if (mce_notify_irq())
		iv = max(iv / 2, (unsigned long) HZ/100);
		} else {
		else
		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
		iv = mce_adjust_timer(iv);
		}

		done:
		__this_cpu_write(mce_next_interval, iv);
		/* Might have become 0 after CMCI storm subsided */
		if (iv) {
		t->expires = jiffies + iv;
		add_timer_on(t, smp_processor_id());
		}
		__restart_timer(t, iv);
		}

		/*
		@@ -1321,16 +1331,10 @@ static void mce_timer_fn(unsigned long data)
		void mce_timer_kick(unsigned long interval)
		{
		struct timer_list *t = this_cpu_ptr(&mce_timer);
		unsigned long when = jiffies + interval;
		unsigned long iv = __this_cpu_read(mce_next_interval);

		if (timer_pending(t)) {
		if (time_before(when, t->expires))
		mod_timer_pinned(t, when);
		} else {
		t->expires = round_jiffies(when);
		add_timer_on(t, smp_processor_id());
		}
		__restart_timer(t, interval);

		if (interval < iv)
		__this_cpu_write(mce_next_interval, interval);
		}
		@@ -1631,7 +1635,7 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
		switch (c->x86_vendor) {
		case X86_VENDOR_INTEL:
		mce_intel_feature_init(c);
		mce_adjust_timer = mce_intel_adjust_timer;
		mce_adjust_timer = cmci_intel_adjust_timer;
		break;
		case X86_VENDOR_AMD:
		mce_amd_feature_init(c);
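
The interval adjustment in `mce_timer_fn` above (halve the polling interval when an event was seen, otherwise double it, within fixed bounds) can be sketched in isolation as follows. This is a simplified userspace illustration, not the kernel code: `SKETCH_HZ`, `SKETCH_MAX_SECS` and `next_interval` are hypothetical names, the tick rate is an assumed value, and the jiffies rounding and timer re-arming done by the real function are left out.

```c
#include <stdbool.h>

#define SKETCH_HZ        1000UL        /* assumed tick rate, for illustration only */
#define SKETCH_MAX_SECS  (5UL * 60UL)  /* mirrors the 5-minute check_interval */

/*
 * Return the next polling interval in ticks: poll twice as fast after
 * an error (never below HZ/100), twice as slow otherwise (never above
 * the maximum interval).
 */
static unsigned long next_interval(unsigned long iv, bool error_seen)
{
	unsigned long floor = SKETCH_HZ / 100;
	unsigned long ceiling = SKETCH_MAX_SECS * SKETCH_HZ;

	if (error_seen) {
		iv /= 2;
		return iv > floor ? iv : floor;
	}

	iv *= 2;
	return iv < ceiling ? iv : ceiling;
}
```

Starting from 1000 ticks, an error moves the interval to 500 and a quiet poll moves it to 2000; the clamps keep it between 10 and 300000 ticks under the assumed tick rate.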

arch/x86/kernel/cpu/mcheck/mce_amd.c

+8 −3

		@@ -79,7 +79,7 @@ static inline bool is_shared_bank(int bank)
		return (bank == 4);
		}

		static const char * const bank4_names(struct threshold_block *b)
		static const char *bank4_names(const struct threshold_block *b)
		{
		switch (b->address) {
		/* MSR4_MISC0 */
		@@ -250,6 +250,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
		if (!b.interrupt_capable)
		goto init;

		b.interrupt_enable = 1;
		new = (high & MASK_LVTOFF_HI) >> 20;
		offset = setup_APIC_mce(offset, new);

		@@ -322,6 +323,8 @@ static void amd_threshold_interrupt(void)
		log:
		mce_setup(&m);
		rdmsrl(MSR_IA32_MCx_STATUS(bank), m.status);
		if (!(m.status & MCI_STATUS_VAL))
		return;
		m.misc = ((u64)high << 32) | low;
		m.bank = bank;
		mce_log(&m);
		@@ -497,10 +500,12 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
		b->interrupt_capable = lvt_interrupt_supported(bank, high);
		b->threshold_limit = THRESHOLD_MAX;

		if (b->interrupt_capable)
		if (b->interrupt_capable) {
		threshold_ktype.default_attrs[2] = &interrupt_enable.attr;
		else
		b->interrupt_enable = 1;
		} else {
		threshold_ktype.default_attrs[2] = NULL;
		}

		INIT_LIST_HEAD(&b->miscj);
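
The check added to `amd_threshold_interrupt` above returns early when the bank status does not have its valid bit set, so nothing is logged for an empty bank. A minimal standalone sketch of that test follows; `SKETCH_STATUS_VAL` and `should_log` are hypothetical names, with the bit position chosen to match `MCI_STATUS_VAL` (bit 63) in the x86 headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Valid bit of an MCA status register (bit 63); name is local to this sketch. */
#define SKETCH_STATUS_VAL (1ULL << 63)

/* Log only if the status register actually holds a valid error record. */
static bool should_log(uint64_t status)
{
	return (status & SKETCH_STATUS_VAL) != 0;
}
```

The status value 8c00004000010090 from the dmesg example in einj.txt has bit 63 set and would therefore be logged, whereas a status of 0 would be skipped.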