
Commit 8da8533d authored by Linus Torvalds
Pull EDAC patches from Mauro Carvalho Chehab:

 - the second part of the EDAC rework:
    - Add the sysfs nodes that export the real memory layout, instead
      of the fake one (needed to properly represent Intel memory
      controllers since 2002)
    - convert EDAC MC to use "struct device" instead of creating the
      sysfs nodes via the kobj API
    - add a tracepoint to represent memory errors

 - some cleanup patches

 - some fixes at i5000, i5400 and EDAC core

 - a new EDAC driver for Calxeda.

* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (33 commits)
  edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
  edac: allow specifying the error count with fake_inject
  edac: add support for Calxeda highbank L2 cache ecc
  edac: add support for Calxeda highbank memory controller
  edac: create top-level debugfs directory
  sb_edac: properly handle error count
  i7core_edac: properly handle error count
  edac: edac_mc_handle_error(): add an error_count parameter
  edac: remove arch-specific parameter for the error handler
  amd64_edac: Don't pass driver name as an error parameter
  edac_mc: check for allocation failure in edac_mc_alloc()
  edac: Increase version to 3.0.0
  edac_mc: Cleanup per-dimm_info debug messages
  edac: Convert debugfX to edac_dbg(X,
  edac: Use more normal debugging macro style
  edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
  Edac: Add ABI Documentation for the new device nodes
  edac: move documentation ABI to ABI/testing/sysfs-devices-edac
  i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
  edac: change the mem allocation scheme to make Documentation/kobject.txt happy
  ...
parents f50f118c c2078e4c
+140 −0
What:		/sys/devices/system/edac/mc/mc*/reset_counters
Date:		January 2006
Contact:	linux-edac@vger.kernel.org
Description:	This write-only control file will zero all the statistical
		counters for UE and CE errors on the given memory controller.
		Zeroing the counters will also reset the timer indicating how
		long it has been since the counters were last reset. This is useful for
		computing errors/time.  Since the counters are always reset
		at driver initialization time, no module/kernel parameter
		is available.

What:		/sys/devices/system/edac/mc/mc*/seconds_since_reset
Date:		January 2006
Contact:	linux-edac@vger.kernel.org
Description:	This attribute file displays how many seconds have elapsed
		since the last counter reset. This can be used with the error
		counters to measure error rates.
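
		As an illustration only, the rate computation described above
		could be sketched in userspace as follows. The helper names
		(read_counter, errors_per_hour) are hypothetical and not part
		of any EDAC tool; the sketch assumes each attribute file holds
		a single decimal integer.

```python
from pathlib import Path


def read_counter(mc_dir: Path, name: str) -> int:
    """Read one integer attribute (e.g. 'ce_count') from an mcX directory."""
    return int((mc_dir / name).read_text().strip())


def errors_per_hour(mc_dir: Path, counter: str = "ce_count") -> float:
    """Average error rate since the last counter reset, in errors per hour."""
    count = read_counter(mc_dir, counter)
    seconds = read_counter(mc_dir, "seconds_since_reset")
    if seconds <= 0:
        # Counters were just reset, so no meaningful rate exists yet.
        return 0.0
    return count * 3600.0 / seconds
```

		A call such as
		errors_per_hour(Path("/sys/devices/system/edac/mc/mc0"))
		would then report the correctable error rate for mc0.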

What:		/sys/devices/system/edac/mc/mc*/mc_name
Date:		January 2006
Contact:	linux-edac@vger.kernel.org
Description:	This attribute file displays the type of memory controller
		that is being utilized.

What:		/sys/devices/system/edac/mc/mc*/size_mb
Date:		January 2006
Contact:	linux-edac@vger.kernel.org
Description:	This attribute file displays the amount of memory, in
		megabytes, that this memory controller manages.

What:		/sys/devices/system/edac/mc/mc*/ue_count
Date:		January 2006
Contact:	linux-edac@vger.kernel.org
Description:	This attribute file displays the total count of uncorrectable
		errors that have occurred on this memory controller. If
		panic_on_ue is set, this counter will not have a chance to
		increment, since EDAC will panic the system.

What:		/sys/devices/system/edac/mc/mc*/ue_noinfo_count
Date:		January 2006
Contact:	linux-edac@vger.kernel.org
Description:	This attribute file displays the number of UEs that have
		occurred on this memory controller with no information as to
		which DIMM slot is having errors.

What:		/sys/devices/system/edac/mc/mc*/ce_count
Date:		January 2006
Contact:	linux-edac@vger.kernel.org
Description:	This attribute file displays the total count of correctable
		errors that have occurred on this memory controller. This
		count is very important to examine. CEs provide early
		indications that a DIMM is beginning to fail. This count
		field should be monitored for non-zero values, and any such
		values should be reported to the system administrator.

What:		/sys/devices/system/edac/mc/mc*/ce_noinfo_count
Date:		January 2006
Contact:	linux-edac@vger.kernel.org
Description:	This attribute file displays the number of CEs that
		have occurred on this memory controller with no
		information as to which DIMM slot is having errors. Memory is
		handicapped, but operational, yet no information is available
		to indicate which slot the failing memory is in. This count
		field should also be monitored for non-zero values.
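
		A monitoring script along the lines suggested above might look
		like the following sketch. The function name and the COUNTERS
		tuple are hypothetical; the sketch assumes the counters are
		plain decimal integers and silently skips attributes that a
		given driver does not expose.

```python
from pathlib import Path

# Counter attributes documented in this file.
COUNTERS = ("ce_count", "ce_noinfo_count", "ue_count", "ue_noinfo_count")


def nonzero_counters(edac_root: Path) -> dict:
    """Map 'mcX/<counter>' to its value for every counter that is non-zero."""
    found = {}
    for mc_dir in sorted(edac_root.glob("mc[0-9]*")):
        for name in COUNTERS:
            attr = mc_dir / name
            if not attr.is_file():
                continue  # attribute not present for this controller
            value = int(attr.read_text().strip())
            if value != 0:
                found[f"{mc_dir.name}/{name}"] = value
    return found
```

		Calling nonzero_counters(Path("/sys/devices/system/edac/mc"))
		returns an empty dict on a healthy system; anything else is
		worth reporting to the administrator.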

What:		/sys/devices/system/edac/mc/mc*/sdram_scrub_rate
Date:		February 2007
Contact:	linux-edac@vger.kernel.org
Description:	Read/Write attribute file that controls memory scrubbing.
		The scrubbing rate used by the memory controller is set by
		writing a minimum bandwidth in bytes/sec to the attribute file.
		The rate will be translated to an internal value that gives at
		least the specified rate.
		Reading the file will return the actual scrubbing rate employed.
		If configuration fails or memory scrubbing is not implemented,
		the value of the attribute file will be -1.
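
		The write-then-read-back behaviour described above could be
		exercised with a sketch like this one. Both function names are
		hypothetical. Treating a read failure the same as -1 is a
		defensive assumption, not something this entry specifies.

```python
from pathlib import Path
from typing import Optional


def read_scrub_rate(mc_dir: Path) -> Optional[int]:
    """Return the scrub rate in bytes/sec, or None if unsupported or failed."""
    try:
        value = int((mc_dir / "sdram_scrub_rate").read_text().strip())
    except (OSError, ValueError):
        return None  # assumption: treat unreadable values like -1
    return None if value < 0 else value


def request_scrub_rate(mc_dir: Path, min_bytes_per_sec: int) -> Optional[int]:
    """Write a minimum bandwidth, then read back the rate actually in use."""
    (mc_dir / "sdram_scrub_rate").write_text(f"{min_bytes_per_sec}\n")
    return read_scrub_rate(mc_dir)
```

		On real hardware the value read back may be higher than the
		value requested, since the controller picks an internal rate
		that gives at least the specified bandwidth.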

What:		/sys/devices/system/edac/mc/mc*/max_location
Date:		April 2012
Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
		linux-edac@vger.kernel.org
Description:	This attribute file displays the information about the last
		available memory slot in this memory controller. It is used by
		userspace tools in order to display the memory filling layout.

What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/size
Date:		April 2012
Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
		linux-edac@vger.kernel.org
Description:	This attribute file will display the size of the DIMM or rank.
		For dimm*/size, this is the size, in MB, of the DIMM memory
		stick. For rank*/size, this is the size, in MB, of one rank
		of the DIMM memory stick. On single rank memories (1R), this
		is also the total size of the DIMM. On dual rank (2R) memories,
		this is half the total size of the DIMM.
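
		As a rough sketch, a userspace tool could collect the per-DIMM
		or per-rank sizes like this. The function name is hypothetical,
		and the sketch assumes each size file holds a single decimal
		integer in MB.

```python
import re
from pathlib import Path


def slot_sizes_mb(mc_dir: Path) -> dict:
    """Map each dimmX or rankX directory name to its size in MB."""
    sizes = {}
    for entry in sorted(mc_dir.iterdir()):
        # Only dimmN/rankN directories carry the 'size' attribute of interest.
        if not re.fullmatch(r"(dimm|rank)\d+", entry.name):
            continue
        size_file = entry / "size"
        if size_file.is_file():
            sizes[entry.name] = int(size_file.read_text().strip())
    return sizes
```

		Summing the returned values gives the memory populated under
		that controller as seen through the dimm or rank nodes.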

What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_dev_type
Date:		April 2012
Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
		linux-edac@vger.kernel.org
Description:	This attribute file will display what type of DRAM device is
		being utilized on this DIMM (x1, x2, x4, x8, ...).

What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_edac_mode
Date:		April 2012
Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
		linux-edac@vger.kernel.org
Description:	This attribute file will display what type of Error detection
		and correction is being utilized. For example: S4ECD4ED would
		mean a Chipkill with x4 DRAM.

What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_label
Date:		April 2012
Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
		linux-edac@vger.kernel.org
Description:	This control file allows this DIMM to have a label assigned
		to it. With this label in the module, when errors occur
		the output can provide the DIMM label in the system log.
		This becomes vital for panic events to isolate the
		cause of the UE event.
		DIMM Labels must be assigned after booting, with information
		that correctly identifies the physical slot with its
		silk screen label. This information is currently very
		motherboard specific and determination of this information
		must occur in userland at this time.
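
		Assigning labels from userland after boot could be sketched as
		follows. The function name and the label strings in the usage
		note are hypothetical; the mapping from slot directory to silk
		screen label has to come from motherboard-specific knowledge.

```python
from pathlib import Path


def apply_labels(mc_dir: Path, labels: dict) -> list:
    """Write each label to <mc_dir>/<slot>/dimm_label; return skipped slots."""
    skipped = []
    for slot, label in labels.items():
        slot_dir = mc_dir / slot
        if not slot_dir.is_dir():
            skipped.append(slot)  # slot not exposed by this controller
            continue
        (slot_dir / "dimm_label").write_text(label)
    return skipped
```

		For example, apply_labels(Path("/sys/devices/system/edac/mc/mc0"),
		{"dimm0": "CPU1_DIMM_A1"}) would label the first DIMM and return
		an empty list if the dimm0 directory exists.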

What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_location
Date:		April 2012
Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
		linux-edac@vger.kernel.org
Description:	This attribute file will display the location (csrow/channel,
		branch/channel/slot or channel/slot) of the dimm or rank.

What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_mem_type
Date:		April 2012
Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
		linux-edac@vger.kernel.org
Description:	This attribute file will display what type of memory is
		currently on this DIMM or rank. Normally, either buffered or
		unbuffered memory (for example, Unbuffered-DDR3).
+15 −0
Calxeda Highbank L2 cache ECC

Properties:
- compatible : Should be "calxeda,hb-sregs-l2-ecc"
- reg : Address and size for ECC error interrupt clear registers.
- interrupts : Should be single bit error interrupt, then double bit error
	interrupt.

Example:

	sregs@fff3c200 {
		compatible = "calxeda,hb-sregs-l2-ecc";
		reg = <0xfff3c200 0x100>;
		interrupts = <0 71 4  0 72 4>;
	};
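
To make the ordering of the interrupts property concrete, the sketch below
splits the flat cell list from the example into one tuple per interrupt.
It assumes the interrupt parent uses three cells per interrupt specifier,
which this binding does not state; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_interrupts(cells: Sequence[int],
                     cells_per_irq: int = 3) -> List[Tuple[int, ...]]:
    """Group a flat 'interrupts' cell list into one tuple per interrupt."""
    if cells_per_irq < 1 or len(cells) % cells_per_irq != 0:
        raise ValueError("cell count is not a multiple of cells_per_irq")
    return [tuple(cells[i:i + cells_per_irq])
            for i in range(0, len(cells), cells_per_irq)]


# Cell values taken from the example node above.
l2_ecc = split_interrupts([0, 71, 4, 0, 72, 4])

# Binding order: single bit error interrupt first, then double bit.
single_bit_irq, double_bit_irq = l2_ecc
```

With the example values, the single bit error interrupt is the specifier
containing 71 and the double bit error interrupt is the one containing 72.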
+14 −0
Calxeda DDR memory controller

Properties:
- compatible : Should be "calxeda,hb-ddr-ctrl"
- reg : Address and size for DDR controller registers.
- interrupts : Interrupt for DDR controller.

Example:

	memory-controller@fff00000 {
		compatible = "calxeda,hb-ddr-ctrl";
		reg = <0xfff00000 0x1000>;
		interrupts = <0 91 4>;
	};
+8 −104
@@ -232,116 +232,20 @@ EDAC control and attribute files.


In 'mcX' directories are EDAC control and attribute files for
this 'X' instance of the memory controllers:


Counter reset control file:

	'reset_counters'

	This write-only control file will zero all the statistical counters
	for UE and CE errors.  Zeroing the counters will also reset the timer
	indicating how long since the last counter zero.  This is useful
	for computing errors/time.  Since the counters are always reset at
	driver initialization time, no module/kernel parameter is available.

	RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/reset_counters

		This resets the counters on memory controller 0


Seconds since last counter reset control file:

	'seconds_since_reset'

	This attribute file displays how many seconds have elapsed since the
	last counter reset. This can be used with the error counters to
	measure error rates.



Memory Controller name attribute file:

	'mc_name'

	This attribute file displays the type of memory controller
	that is being utilized.


Total memory managed by this memory controller attribute file:

	'size_mb'

	This attribute file displays the amount of memory, in megabytes,
	that this instance of memory controller manages.


Total Uncorrectable Errors count attribute file:

	'ue_count'

	This attribute file displays the total count of uncorrectable
	errors that have occurred on this memory controller. If panic_on_ue
	is set this counter will not have a chance to increment,
	since EDAC will panic the system.


Total UE count that had no information attribute file:

	'ue_noinfo_count'

	This attribute file displays the number of UEs that have occurred
	with no information as to which DIMM slot is having errors.


Total Correctable Errors count attribute file:

	'ce_count'

	This attribute file displays the total count of correctable
	errors that have occurred on this memory controller. This
	count is very important to examine. CEs provide early
	indications that a DIMM is beginning to fail. This count
	field should be monitored for non-zero values, and any such
	values should be reported to the system administrator.


Total CE count that had no information attribute file:

	'ce_noinfo_count'

	This attribute file displays the number of CEs that
	have occurred with no information as to which DIMM slot
	is having errors. Memory is handicapped, but operational,
	yet no information is available to indicate which slot
	the failing memory is in. This count field should also
	be monitored for non-zero values.

Device Symlink:

	'device'

	Symlink to the memory controller device.

Sdram memory scrubbing rate:

	'sdram_scrub_rate'

	Read/Write attribute file that controls memory scrubbing. The scrubbing
	rate is set by writing a minimum bandwidth in bytes/sec to the attribute
	file. The rate will be translated to an internal value that gives at
	least the specified rate.

	Reading the file will return the actual scrubbing rate employed.

	If configuration fails or memory scrubbing is not implemented, accessing
	that attribute will fail.
this 'X' instance of the memory controllers.

For a description of the sysfs API, please see:
	Documentation/ABI/testing/sysfs-devices-edac


============================================================================
'csrowX' DIRECTORIES

When CONFIG_EDAC_LEGACY_SYSFS is enabled, the sysfs will contain the
csrowX directories. As this API doesn't work properly for Rambus, FB-DIMMs
and modern Intel Memory Controllers, this is being deprecated in favor
of dimmX directories.

In the 'csrowX' directories are EDAC control and attribute files for
this 'X' instance of csrow:

+12 −0
@@ -130,6 +130,12 @@
			clocks = <&eclk>;
		};

		memory-controller@fff00000 {
			compatible = "calxeda,hb-ddr-ctrl";
			reg = <0xfff00000 0x1000>;
			interrupts = <0 91 4>;
		};

		ipc@fff20000 {
			compatible = "arm,pl320", "arm,primecell";
			reg = <0xfff20000 0x1000>;
@@ -275,6 +281,12 @@
			};
		};

		sregs@fff3c200 {
			compatible = "calxeda,hb-sregs-l2-ecc";
			reg = <0xfff3c200 0x100>;
			interrupts = <0 71 4  0 72 4>;
		};

		dma@fff3d000 {
			compatible = "arm,pl330", "arm,primecell";
			reg = <0xfff3d000 0x1000>;