
Commit 414f827c authored by Linus Torvalds
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
  [PATCH] x86-64: Remove mk_pte_phys()
  [PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
  [PATCH] i386: fix 32-bit ioctls on x64_32
  [PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
  [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
  [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
  [PATCH] i386: Move mce_disabled to asm/mce.h
  [PATCH] i386: paravirt unhandled fallthrough
  [PATCH] x86_64: Wire up compat epoll_pwait
  [PATCH] x86: Don't require the vDSO for handling a.out signals
  [PATCH] i386: Fix Cyrix MediaGX detection
  [PATCH] i386: Fix warning in cpu initialization
  [PATCH] i386: Fix warning in microcode.c
  [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
  [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
  [PATCH] i386: Remove fastcall in paravirt.[ch]
  [PATCH] x86-64: Fix wrong gcc check in bitops.h
  [PATCH] x86-64: survive having no irq mapping for a vector
  [PATCH] i386: geode configuration fixes
  [PATCH] i386: add option to show more code in oops reports
  ...
parents 86a71dbd 126b1922
+8 −0
@@ -104,6 +104,9 @@ loader, and have no meaning to the kernel directly.
Do not modify the syntax of boot loader parameters without extreme
need or coordination with <Documentation/i386/boot.txt>.

There are also arch-specific kernel-parameters not documented here.
See for example <Documentation/x86_64/boot-options.txt>.

Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
a trailing = on the name of any parameter states that that parameter will
be entered as an environment variable, whereas its absence indicates that
@@ -361,6 +364,11 @@ and is between 256 and 4096 characters. It is defined in the file
			clocksource is not available, it defaults to PIT.
			Format: { pit | tsc | cyclone | pmtmr }

	code_bytes	[IA32] How many bytes of object code to print in an
			oops report.
			Range: 0 - 8192
			Default: 64

	disable_8254_timer
	enable_8254_timer
			[IA32/X86_64] Disable/Enable interrupt 0 timer routing
+83 −49
@@ -180,40 +180,81 @@ PCI
  pci=lastbus=NUMBER	       Scan upto NUMBER busses, no matter what the mptable says.
  pci=noacpi		Don't use ACPI to set up PCI interrupt routing.

IOMMU

 iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge]
         [,forcesac][,fullflush][,nomerge][,noaperture][,calgary]
   size  set size of iommu (in bytes)
   noagp don't initialize the AGP driver and use full aperture.
   off   don't use the IOMMU
   leak  turn on simple iommu leak tracing (only when CONFIG_IOMMU_LEAK is on)
   memaper[=order] allocate an own aperture over RAM with size 32MB^order.
   noforce don't force IOMMU usage. Default.
   force  Force IOMMU.
   merge  Do SG merging. Implies force (experimental)
   nomerge Don't do SG merging.
   forcesac For SAC mode for masks <40bits  (experimental)
   fullflush Flush IOMMU on each allocation (default)
   nofullflush Don't use IOMMU fullflush
   allowed  overwrite iommu off workarounds for specific chipsets.
   soft	 Use software bounce buffering (default for Intel machines)
   noaperture Don't touch the aperture for AGP.
   allowdac Allow DMA >4GB
	    When off all DMA over >4GB is forced through an IOMMU or bounce
	    buffering.
   nodac    Forbid DMA >4GB
   panic    Always panic when IOMMU overflows
IOMMU (input/output memory management unit)

 Currently four x86-64 PCI-DMA mapping implementations exist:

   1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all
      (e.g. because you have < 3 GB memory).
      Kernel boot message: "PCI-DMA: Disabling IOMMU"

   2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU.
      Kernel boot message: "PCI-DMA: using GART IOMMU"

   3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
      e.g. if there is no hardware IOMMU in the system and it is needed because
      you have >3GB memory or told the kernel to use it (iommu=soft).
      Kernel boot message: "PCI-DMA: Using software bounce buffering
      for IO (SWIOTLB)"

   4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM
      pSeries and xSeries servers. This hardware IOMMU supports DMA address
      mapping with memory protection, etc.
      Kernel boot message: "PCI-DMA: Using Calgary IOMMU"

 iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>]
	[,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge]
	[,noaperture][,calgary]

  General iommu options:
    off                Don't initialize and use any kind of IOMMU.
    noforce            Don't force hardware IOMMU usage when it is not needed.
                       (default).
    force              Force the use of the hardware IOMMU even when it is
                       not actually needed (e.g. because < 3 GB memory).
    soft               Use software bounce buffering (SWIOTLB) (default for
                       Intel machines). This can be used to prevent the usage
                       of an available hardware IOMMU.

  iommu options only relevant to the AMD GART hardware IOMMU:
    <size>             Set the size of the remapping area in bytes.
    allowed            Overwrite iommu off workarounds for specific chipsets.
    fullflush          Flush IOMMU on each allocation (default).
    nofullflush        Don't use IOMMU fullflush.
    leak               Turn on simple iommu leak tracing (only when
                       CONFIG_IOMMU_LEAK is on). Default number of leak pages
                       is 20.
    memaper[=<order>]  Allocate an own aperture over RAM with size 32MB<<order.
                       (default: order=1, i.e. 64MB)
    merge              Do scatter-gather (SG) merging. Implies "force"
                       (experimental).
    nomerge            Don't do scatter-gather (SG) merging.
    noaperture         Ask the IOMMU not to touch the aperture for AGP.
    forcesac           Force single-address cycle (SAC) mode for masks <40bits
                       (experimental).
    noagp              Don't initialize the AGP driver and use full aperture.
    allowdac           Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
                       DAC is used with 32-bit PCI to push a 64-bit address in
                       two cycles. When off all DMA over >4GB is forced through
                       an IOMMU or software bounce buffering.
    nodac              Forbid DAC mode, i.e. DMA >4GB.
    panic              Always panic when IOMMU overflows.
    calgary            Use the Calgary IOMMU if it is available

  swiotlb=pages[,force]

  pages  Prereserve that many 128K pages for the software IO bounce buffering.
  iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
  implementation:
    swiotlb=<pages>[,force]
    <pages>            Prereserve that many 128K pages for the software IO
                       bounce buffering.
    force              Force all IO through the software TLB.

  Settings for the IBM Calgary hardware IOMMU currently found in IBM
  pSeries and xSeries machines:

    calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
    calgary=[translate_empty_slots]
    calgary=[disable=<PCI bus number>]
    panic              Always panic when IOMMU overflows

    64k,...,8M - Set the size of each PCI slot's translation table
    when using the Calgary IOMMU. This is the size of the translation
@@ -239,7 +280,7 @@ Debugging
		This will also cause panics on machine check exceptions.
		Useful together with panic=30 to trigger a reboot.

  kstack=N   Print that many words from the kernel stack in oops dumps.
  kstack=N	Print N words from the kernel stack in oops dumps.

  pagefaulttrace  Dump all page faults. Only useful for extreme debugging
		and will create a lot of output.
@@ -251,15 +292,8 @@ Debugging
		newfallback: use new unwinder but fall back to old if it gets
			stuck (default)

  call_trace=[old|both|newfallback|new]
		old: use old inexact backtracer
		new: use new exact dwarf2 unwinder
 		both: print entries from both
		newfallback: use new unwinder but fall back to old if it gets
			stuck (default)

Misc
Miscellaneous

  noreplacement  Don't replace instructions with more appropriate ones
		 for the CPU. This may be useful on asymmetric MP systems
		 where some CPU have less capabilities than the others.
		 where some CPUs have less capabilities than others.
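
The memaper[=<order>] entry in the hunk above gives the aperture size as
32MB<<order, with a default of order=1 (64MB). The following is a minimal
Python sketch of that arithmetic only. The function name gart_aperture_bytes
is hypothetical and is not taken from the kernel source, and any upper limit
the kernel places on <order> is not modelled here.

```python
MIB = 1024 * 1024


def gart_aperture_bytes(order: int = 1) -> int:
    """Aperture size for memaper=<order>, i.e. 32MB << order.

    The default of order=1 follows the documented default (64MB).
    Hypothetical helper for illustration; not kernel code.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return (32 * MIB) << order


if __name__ == "__main__":
    # Print the aperture size for the first few orders.
    for order in range(4):
        print(f"memaper={order}: {gart_aperture_bytes(order) // MIB} MB")
```

Each increment of <order> doubles the aperture, so small orders are usually
enough to see how the setting scales.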
+1 −1
@@ -2,7 +2,7 @@ Firmware support for CPU hotplug under Linux/x86-64
---------------------------------------------------

Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to
know in advance boot time the maximum number of CPUs that could be plugged
know in advance of boot time the maximum number of CPUs that could be plugged
into the system. ACPI 3.0 currently has no official way to supply
this information from the firmware to the operating system.

+13 −13
@@ -9,9 +9,9 @@ zombie. While the thread is in user space the kernel stack is empty
except for the thread_info structure at the bottom.

In addition to the per thread stacks, there are specialized stacks
associated with each cpu.  These stacks are only used while the kernel
is in control on that cpu, when a cpu returns to user space the
specialized stacks contain no useful data.  The main cpu stacks is
associated with each CPU.  These stacks are only used while the kernel
is in control on that CPU; when a CPU returns to user space the
specialized stacks contain no useful data.  The main CPU stacks are:

* Interrupt stack.  IRQSTACKSIZE

@@ -32,17 +32,17 @@ x86_64 also has a feature which is not available on i386, the ability
to automatically switch to a new stack for designated events such as
double fault or NMI, which makes it easier to handle these unusual
events on x86_64.  This feature is called the Interrupt Stack Table
(IST).  There can be up to 7 IST entries per cpu. The IST code is an
index into the Task State Segment (TSS), the IST entries in the TSS
point to dedicated stacks, each stack can be a different size.
(IST).  There can be up to 7 IST entries per CPU. The IST code is an
index into the Task State Segment (TSS). The IST entries in the TSS
point to dedicated stacks; each stack can be a different size.

An IST is selected by an non-zero value in the IST field of an
An IST is selected by a non-zero value in the IST field of an
interrupt-gate descriptor.  When an interrupt occurs and the hardware
loads such a descriptor, the hardware automatically sets the new stack
pointer based on the IST value, then invokes the interrupt handler.  If
software wants to allow nested IST interrupts then the handler must
adjust the IST values on entry to and exit from the interrupt handler.
(this is occasionally done, e.g. for debug exceptions)
(This is occasionally done, e.g. for debug exceptions.)

Events with different IST codes (i.e. with different stacks) can be
nested.  For example, a debug interrupt can safely be interrupted by an
@@ -58,17 +58,17 @@ The currently assigned IST stacks are :-

  Used for interrupt 12 - Stack Fault Exception (#SS).

  This allows to recover from invalid stack segments. Rarely
  This allows the CPU to recover from invalid stack segments. Rarely
  happens.

* DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).

  Used for interrupt 8 - Double Fault Exception (#DF).

  Invoked when handling a exception causes another exception. Happens
  when the kernel is very confused (e.g. kernel stack pointer corrupt)
  Using a separate stack allows to recover from it well enough in many
  cases to still output an oops.
  Invoked when handling one exception causes another exception. Happens
  when the kernel is very confused (e.g. kernel stack pointer corrupt).
  Using a separate stack allows the kernel to recover from it well enough
  in many cases to still output an oops.

* NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).

+70 −0

Configurable sysfs parameters for the x86-64 machine check code.

Machine checks report internal hardware error conditions detected
by the CPU. Uncorrected errors typically cause a machine check
(often with panic), corrected ones cause a machine check log entry.

Machine checks are organized in banks (normally associated with
a hardware subsystem) and subevents in a bank. The exact meaning
of the banks and subevent is CPU specific.

mcelog knows how to decode them.

When you see the "Machine check errors logged" message in the system
log then mcelog should run to collect and decode machine check entries
from /dev/mcelog. Normally mcelog should be run regularly from a cronjob.

Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN
(N = CPU number)

The directory contains some configurable entries:

Entries:

bankNctl
(N bank number)
	64bit Hex bitmask enabling/disabling specific subevents for bank N
	When a bit in the bitmask is zero then the respective
	subevent will not be reported.
	By default all events are enabled.
	Note that BIOS maintain another mask to disable specific events
	per bank.  This is not visible here

The following entries appear for each CPU, but they are truly shared
between all CPUs.

check_interval
	How often to poll for corrected machine check errors, in seconds
	(Note output is hexadecimal). Default 5 minutes.

tolerant
	Tolerance level. When a machine check exception occurs for a non
	corrected machine check the kernel can take different actions.
	Since machine check exceptions can happen any time it is sometimes
	risky for the kernel to kill a process because it defies
	normal kernel locking rules. The tolerance level configures
	how hard the kernel tries to recover even at some risk of deadlock.

	0: always panic,
	1: panic if deadlock possible,
	2: try to avoid panic,
   	3: never panic or exit (for testing only)

	Default: 1

	Note this only makes a difference if the CPU allows recovery
	from a machine check exception. Current x86 CPUs generally do not.

trigger
	Program to run when a machine check event is detected.
	This is an alternative to running mcelog regularly from cron
	and allows events to be detected faster.

TBD document entries for AMD threshold interrupt configuration

For more details about the x86 machine check architecture
see the Intel and AMD architecture manuals from their developer websites.

For more details about the architecture see
http://one.firstfloor.org/~andi/mce.pdf
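
The sysfs layout described above can be illustrated with a short Python
sketch. It only builds the documented paths and decodes hexadecimal values.
All helper names (mce_entry, bank_ctl, parse_hex, subevent_enabled) are
hypothetical. Whether the kernel prints a leading "0x" is not stated in the
text, so the parser accepts both forms. It is also an assumption that bit
numbering in the bankNctl mask starts at 0 for the least significant bit.

```python
from pathlib import Path

# Directory named in the documentation; one subdirectory per CPU.
MCE_ROOT = Path("/sys/devices/system/machinecheck")

# Tolerance levels as listed for the "tolerant" entry (default: 1).
TOLERANT_LEVELS = {
    0: "always panic",
    1: "panic if deadlock possible",
    2: "try to avoid panic",
    3: "never panic or exit (for testing only)",
}


def mce_entry(cpu: int, name: str, root: Path = MCE_ROOT) -> Path:
    """Path of a per-CPU entry, e.g. machinecheck0/check_interval."""
    return root / f"machinecheck{cpu}" / name


def bank_ctl(cpu: int, bank: int, root: Path = MCE_ROOT) -> Path:
    """Path of the bankNctl bitmask for one bank on one CPU."""
    return mce_entry(cpu, f"bank{bank}ctl", root)


def parse_hex(text: str) -> int:
    """Decode a hexadecimal sysfs value, with or without a 0x prefix."""
    return int(text.strip(), 16)


def subevent_enabled(mask: int, bit: int) -> bool:
    """A zero bit in the mask means the subevent is not reported."""
    return bool((mask >> bit) & 1)


if __name__ == "__main__":
    print(bank_ctl(0, 4))
    # The default check_interval of 5 minutes is 300 seconds, hex 12c.
    print(parse_hex("12c"), "seconds")
    print(TOLERANT_LEVELS[1])
```

On a machine that exposes these files, mce_entry(0, "check_interval").read_text()
could be passed to parse_hex; the sketch itself does not touch the filesystem.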