Commit 414f827c authored by Linus Torvalds
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (94 commits)
  [PATCH] x86-64: Remove mk_pte_phys()
  [PATCH] i386: Fix broken CONFIG_COMPAT_VDSO on i386
  [PATCH] i386: fix 32-bit ioctls on x64_32
  [PATCH] x86: Unify pcspeaker platform device code between i386/x86-64
  [PATCH] i386: Remove extern declaration from mm/discontig.c, put in header.
  [PATCH] i386: Rename cpu_gdt_descr and remove extern declaration from smpboot.c
  [PATCH] i386: Move mce_disabled to asm/mce.h
  [PATCH] i386: paravirt unhandled fallthrough
  [PATCH] x86_64: Wire up compat epoll_pwait
  [PATCH] x86: Don't require the vDSO for handling a.out signals
  [PATCH] i386: Fix Cyrix MediaGX detection
  [PATCH] i386: Fix warning in cpu initialization
  [PATCH] i386: Fix warning in microcode.c
  [PATCH] x86: Enable NMI watchdog for AMD Family 0x10 CPUs
  [PATCH] x86: Add new CPUID bits for AMD Family 10 CPUs in /proc/cpuinfo
  [PATCH] i386: Remove fastcall in paravirt.[ch]
  [PATCH] x86-64: Fix wrong gcc check in bitops.h
  [PATCH] x86-64: survive having no irq mapping for a vector
  [PATCH] i386: geode configuration fixes
  [PATCH] i386: add option to show more code in oops reports
  ...
parents 86a71dbd 126b1922
+8 −0
@@ -104,6 +104,9 @@ loader, and have no meaning to the kernel directly.
 Do not modify the syntax of boot loader parameters without extreme
 need or coordination with <Documentation/i386/boot.txt>.
 
+There are also arch-specific kernel-parameters not documented here.
+See for example <Documentation/x86_64/boot-options.txt>.
+
 Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
 a trailing = on the name of any parameter states that that parameter will
 be entered as an environment variable, whereas its absence indicates that
@@ -361,6 +364,11 @@ and is between 256 and 4096 characters. It is defined in the file
 			clocksource is not available, it defaults to PIT.
 			Format: { pit | tsc | cyclone | pmtmr }
 
+	code_bytes	[IA32] How many bytes of object code to print in an
+			oops report.
+			Range: 0 - 8192
+			Default: 64
+
 	disable_8254_timer
 	enable_8254_timer
 			[IA32/X86_64] Disable/Enable interrupt 0 timer routing
+83 −49
@@ -180,40 +180,81 @@ PCI
   pci=lastbus=NUMBER	       Scan upto NUMBER busses, no matter what the mptable says.
   pci=noacpi		Don't use ACPI to set up PCI interrupt routing.
 
-IOMMU
+IOMMU (input/output memory management unit)
 
- iommu=[size][,noagp][,off][,force][,noforce][,leak][,memaper[=order]][,merge]
-         [,forcesac][,fullflush][,nomerge][,noaperture][,calgary]
-   size  set size of iommu (in bytes)
-   noagp don't initialize the AGP driver and use full aperture.
-   off   don't use the IOMMU
-   leak  turn on simple iommu leak tracing (only when CONFIG_IOMMU_LEAK is on)
-   memaper[=order] allocate an own aperture over RAM with size 32MB^order.
-   noforce don't force IOMMU usage. Default.
-   force  Force IOMMU.
-   merge  Do SG merging. Implies force (experimental)
-   nomerge Don't do SG merging.
-   forcesac For SAC mode for masks <40bits  (experimental)
-   fullflush Flush IOMMU on each allocation (default)
-   nofullflush Don't use IOMMU fullflush
-   allowed  overwrite iommu off workarounds for specific chipsets.
-   soft	 Use software bounce buffering (default for Intel machines)
-   noaperture Don't touch the aperture for AGP.
-   allowdac Allow DMA >4GB
-	    When off all DMA over >4GB is forced through an IOMMU or bounce
-	    buffering.
-   nodac    Forbid DMA >4GB
-   panic    Always panic when IOMMU overflows
+ Currently four x86-64 PCI-DMA mapping implementations exist:
+
+   1. <arch/x86_64/kernel/pci-nommu.c>: use no hardware/software IOMMU at all
+      (e.g. because you have < 3 GB memory).
+      Kernel boot message: "PCI-DMA: Disabling IOMMU"
+
+   2. <arch/x86_64/kernel/pci-gart.c>: AMD GART based hardware IOMMU.
+      Kernel boot message: "PCI-DMA: using GART IOMMU"
+
+   3. <arch/x86_64/kernel/pci-swiotlb.c> : Software IOMMU implementation. Used
+      e.g. if there is no hardware IOMMU in the system and it is needed because
+      you have >3GB memory or told the kernel to use it (iommu=soft).
+      Kernel boot message: "PCI-DMA: Using software bounce buffering
+      for IO (SWIOTLB)"
+
+   4. <arch/x86_64/pci-calgary.c> : IBM Calgary hardware IOMMU. Used in IBM
+      pSeries and xSeries servers. This hardware IOMMU supports DMA address
+      mapping with memory protection, etc.
+      Kernel boot message: "PCI-DMA: Using Calgary IOMMU"
+
+ iommu=[<size>][,noagp][,off][,force][,noforce][,leak[=<nr_of_leak_pages>]
+	[,memaper[=<order>]][,merge][,forcesac][,fullflush][,nomerge]
+	[,noaperture][,calgary]
+
+  General iommu options:
+    off                Don't initialize and use any kind of IOMMU.
+    noforce            Don't force hardware IOMMU usage when it is not needed.
+                       (default).
+    force              Force the use of the hardware IOMMU even when it is
+                       not actually needed (e.g. because < 3 GB memory).
+    soft               Use software bounce buffering (SWIOTLB) (default for
+                       Intel machines). This can be used to prevent the usage
+                       of an available hardware IOMMU.
+
+  iommu options only relevant to the AMD GART hardware IOMMU:
+    <size>             Set the size of the remapping area in bytes.
+    allowed            Overwrite iommu off workarounds for specific chipsets.
+    fullflush          Flush IOMMU on each allocation (default).
+    nofullflush        Don't use IOMMU fullflush.
+    leak               Turn on simple iommu leak tracing (only when
+                       CONFIG_IOMMU_LEAK is on). Default number of leak pages
+                       is 20.
+    memaper[=<order>]  Allocate an own aperture over RAM with size 32MB<<order.
+                       (default: order=1, i.e. 64MB)
+    merge              Do scatter-gather (SG) merging. Implies "force"
+                       (experimental).
+    nomerge            Don't do scatter-gather (SG) merging.
+    noaperture         Ask the IOMMU not to touch the aperture for AGP.
+    forcesac           Force single-address cycle (SAC) mode for masks <40bits
+                       (experimental).
+    noagp              Don't initialize the AGP driver and use full aperture.
+    allowdac           Allow double-address cycle (DAC) mode, i.e. DMA >4GB.
+                       DAC is used with 32-bit PCI to push a 64-bit address in
+                       two cycles. When off all DMA over >4GB is forced through
+                       an IOMMU or software bounce buffering.
+    nodac              Forbid DAC mode, i.e. DMA >4GB.
+    panic              Always panic when IOMMU overflows.
     calgary            Use the Calgary IOMMU if it is available
 
-  swiotlb=pages[,force]
-
-  pages  Prereserve that many 128K pages for the software IO bounce buffering.
+  iommu options only relevant to the software bounce buffering (SWIOTLB) IOMMU
+  implementation:
+    swiotlb=<pages>[,force]
+    <pages>            Prereserve that many 128K pages for the software IO
+                       bounce buffering.
     force              Force all IO through the software TLB.
 
+  Settings for the IBM Calgary hardware IOMMU currently found in IBM
+  pSeries and xSeries machines:
+
     calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
     calgary=[translate_empty_slots]
     calgary=[disable=<PCI bus number>]
+    panic              Always panic when IOMMU overflows
 
     64k,...,8M - Set the size of each PCI slot's translation table
     when using the Calgary IOMMU. This is the size of the translation
@@ -239,7 +280,7 @@ Debugging
 		This will also cause panics on machine check exceptions.
 		Useful together with panic=30 to trigger a reboot.
 
-  kstack=N   Print that many words from the kernel stack in oops dumps.
+  kstack=N	Print N words from the kernel stack in oops dumps.
 
   pagefaulttrace  Dump all page faults. Only useful for extreme debugging
 		and will create a lot of output.
@@ -251,15 +292,8 @@ Debugging
 		newfallback: use new unwinder but fall back to old if it gets
 			stuck (default)
 
-  call_trace=[old|both|newfallback|new]
-		old: use old inexact backtracer
-		new: use new exact dwarf2 unwinder
- 		both: print entries from both
-		newfallback: use new unwinder but fall back to old if it gets
-			stuck (default)
-
-Misc
+Miscellaneous
 
   noreplacement  Don't replace instructions with more appropriate ones
 		 for the CPU. This may be useful on asymmetric MP systems
-		 where some CPU have less capabilities than the others.
+		 where some CPUs have fewer capabilities than others.
+1 −1
@@ -2,7 +2,7 @@ Firmware support for CPU hotplug under Linux/x86-64
 ---------------------------------------------------
 
 Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to
-know in advance boot time the maximum number of CPUs that could be plugged
+know in advance of boot time the maximum number of CPUs that could be plugged
 into the system. ACPI 3.0 currently has no official way to supply
 this information from the firmware to the operating system.
 
+13 −13
@@ -9,9 +9,9 @@ zombie. While the thread is in user space the kernel stack is empty
 except for the thread_info structure at the bottom.
 
 In addition to the per thread stacks, there are specialized stacks
-associated with each cpu.  These stacks are only used while the kernel
-is in control on that cpu, when a cpu returns to user space the
-specialized stacks contain no useful data.  The main cpu stacks is
+associated with each CPU.  These stacks are only used while the kernel
+is in control on that CPU; when a CPU returns to user space the
+specialized stacks contain no useful data.  The main CPU stacks are:
 
 * Interrupt stack.  IRQSTACKSIZE
 
@@ -32,17 +32,17 @@ x86_64 also has a feature which is not available on i386, the ability
 to automatically switch to a new stack for designated events such as
 double fault or NMI, which makes it easier to handle these unusual
 events on x86_64.  This feature is called the Interrupt Stack Table
-(IST).  There can be up to 7 IST entries per cpu. The IST code is an
-index into the Task State Segment (TSS), the IST entries in the TSS
-point to dedicated stacks, each stack can be a different size.
+(IST).  There can be up to 7 IST entries per CPU. The IST code is an
+index into the Task State Segment (TSS). The IST entries in the TSS
+point to dedicated stacks; each stack can be a different size.
 
-An IST is selected by an non-zero value in the IST field of an
+An IST is selected by a non-zero value in the IST field of an
 interrupt-gate descriptor.  When an interrupt occurs and the hardware
 loads such a descriptor, the hardware automatically sets the new stack
 pointer based on the IST value, then invokes the interrupt handler.  If
 software wants to allow nested IST interrupts then the handler must
 adjust the IST values on entry to and exit from the interrupt handler.
-(this is occasionally done, e.g. for debug exceptions)
+(This is occasionally done, e.g. for debug exceptions.)
 
 Events with different IST codes (i.e. with different stacks) can be
 nested.  For example, a debug interrupt can safely be interrupted by an
@@ -58,17 +58,17 @@ The currently assigned IST stacks are :-
 
   Used for interrupt 12 - Stack Fault Exception (#SS).
 
-  This allows to recover from invalid stack segments. Rarely
+  This allows the CPU to recover from invalid stack segments. Rarely
   happens.
 
 * DOUBLEFAULT_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
 
   Used for interrupt 8 - Double Fault Exception (#DF).
 
-  Invoked when handling a exception causes another exception. Happens
-  when the kernel is very confused (e.g. kernel stack pointer corrupt)
-  Using a separate stack allows to recover from it well enough in many
-  cases to still output an oops.
+  Invoked when handling one exception causes another exception. Happens
+  when the kernel is very confused (e.g. kernel stack pointer corrupt).
+  Using a separate stack allows the kernel to recover from it well enough
+  in many cases to still output an oops.
 
 * NMI_STACK.  EXCEPTION_STKSZ (PAGE_SIZE).
 
+70 −0

Configurable sysfs parameters for the x86-64 machine check code.

Machine checks report internal hardware error conditions detected
by the CPU. Uncorrected errors typically cause a machine check
(often with panic), corrected ones cause a machine check log entry.

Machine checks are organized in banks (normally associated with
a hardware subsystem) and subevents in a bank. The exact meaning
of the banks and subevents is CPU-specific.

mcelog knows how to decode them.

When you see the "Machine check errors logged" message in the system
log, mcelog should be run to collect and decode the machine check entries
from /dev/mcelog. Normally mcelog is run regularly from a cron job.

Each CPU has a directory /sys/devices/system/machinecheck/machinecheckN
(N = CPU number).

The directory contains the following configurable entries:

bankNctl
(N = bank number)
	64-bit hex bitmask enabling/disabling specific subevents for bank N.
	When a bit in the bitmask is zero, the respective
	subevent will not be reported.
	By default all events are enabled.
	Note that the BIOS maintains another mask to disable specific events
	per bank. This mask is not visible here.

The following entries appear for each CPU, but they are truly shared
between all CPUs.

check_interval
	How often to poll for corrected machine check errors, in seconds
	(note: the output is hexadecimal). Default: 5 minutes.

tolerant
	Tolerance level. When a machine check exception occurs for an
	uncorrected machine check, the kernel can take different actions.
	Since machine check exceptions can happen at any time, it is sometimes
	risky for the kernel to kill a process, because the exception defies
	normal kernel locking rules. The tolerance level configures
	how hard the kernel tries to recover even at some risk of deadlock.

	0: always panic,
	1: panic if deadlock possible,
	2: try to avoid panic,
	3: never panic or exit (for testing only)

	Default: 1

	Note this only makes a difference if the CPU allows recovery
	from a machine check exception. Current x86 CPUs generally do not.

trigger
	Program to run when a machine check event is detected.
	This is an alternative to running mcelog regularly from cron
	and allows events to be detected faster.

TBD: document entries for AMD threshold interrupt configuration.

For more details about the x86 machine check architecture
see the Intel and AMD architecture manuals from their developer websites.

For more details about the architecture, see
http://one.firstfloor.org/~andi/mce.pdf