
Commit 4368c4bc authored by Linus Torvalds

Merge branch 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull pti updates from Thomas Gleixner:
 "The performance deterioration department is not proud at all to
  present yet another set of speculation fences to mitigate the next
  chapter in the 'what could possibly go wrong' story.

  The new vulnerability belongs to the Spectre class and affects GS
  based data accesses and has therefore been dubbed 'Grand Schemozzle'
  for secret communication purposes. It's officially listed as
  CVE-2019-1125.

  Conditional branches in the entry paths which contain a SWAPGS
  instruction (interrupts and exceptions) can be mis-speculated which
  results in speculative accesses with a wrong GS base.

  This can happen on entry from user mode through a mis-speculated
  branch which takes the entry from kernel mode path and therefore does
  not execute the SWAPGS instruction. The following speculative accesses
  are done with user GS base.

  On entry from kernel mode the mis-speculated branch executes the
  SWAPGS instruction in the entry from user mode path which has the same
  effect that the following GS based accesses are done with user GS
  base.

  If there is a disclosure gadget available in these code paths the
  mis-speculated data access can be leaked through the usual side
  channels.

  The entry from user mode issue affects all CPUs which have speculative
  execution. The entry from kernel mode issue affects only Intel CPUs
  which can speculate through SWAPGS. On CPUs from other vendors SWAPGS
  has semantics which prevent that.

  SMAP mitigates both problems but only when the CPU is not affected by
  the Meltdown vulnerability.

  The mitigation is to issue LFENCE instructions in the entry from
  kernel mode path for all affected CPUs and on the affected Intel CPUs
  also in the entry from user mode path unless PTI is enabled because
  the CR3 write is serializing.

  The fences are as usual enabled conditionally and can be completely
  disabled on the kernel command line. The Spectre V1 documentation is
  updated accordingly.

  A big "Thank You!" goes to Josh for doing the heavy lifting for this
  round of hardware misfeature 'repair'. Of course also "Thank You!" to
  everybody else who contributed in one way or the other"

* 'x86/grand-schemozzle' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: Add swapgs description to the Spectre v1 documentation
  x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS
  x86/entry/64: Use JMP instead of JMPQ
  x86/speculation: Enable Spectre v1 swapgs mitigations
  x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
parents 0eb0ce0a 4c920576
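
The fence selection rules stated in the pull message above can be modelled in a few lines of C. This is only a sketch of the described policy, not the kernel's implementation: `struct cpu_model`, `struct swapgs_fences` and `select_swapgs_fences()` are hypothetical names invented for illustration, and the model assumes every CPU it is given has speculative execution.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the CPU properties named in the message. */
struct cpu_model {
	bool has_smap;          /* SMAP is available */
	bool meltdown_affected; /* CPU is affected by Meltdown */
	bool swapgs_bug;        /* CPU can speculate through SWAPGS */
	bool pti_enabled;       /* PTI active, so user entry writes CR3 */
	bool mitigations_off;   /* fences disabled on the command line */
};

/* Which of the two conditional LFENCEs end up enabled. */
struct swapgs_fences {
	bool user_entry;   /* models X86_FEATURE_FENCE_SWAPGS_USER */
	bool kernel_entry; /* models X86_FEATURE_FENCE_SWAPGS_KERNEL */
};

static struct swapgs_fences select_swapgs_fences(const struct cpu_model *cpu)
{
	struct swapgs_fences f = { false, false };

	if (cpu->mitigations_off)
		return f;

	/* SMAP mitigates both problems, but only without Meltdown. */
	if (cpu->has_smap && !cpu->meltdown_affected)
		return f;

	/* The kernel entry path is fenced on all remaining CPUs. */
	f.kernel_entry = true;

	/*
	 * The user entry path is fenced only where SWAPGS itself can be
	 * speculated through, and only when no serializing CR3 write
	 * (PTI) already follows the SWAPGS.
	 */
	if (cpu->swapgs_bug && !cpu->pti_enabled)
		f.user_entry = true;

	/* In this model the user fence never appears without the kernel fence. */
	assert(!f.user_entry || f.kernel_entry);
	return f;
}
```

In this model a Meltdown-affected CPU with the SWAPGS issue and PTI enabled gets only the kernel entry fence, because the CR3 write already serializes the user entry path.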
+80 −8
@@ -41,10 +41,11 @@ Related CVEs

 The following CVE entries describe Spectre variants:

-   =============   =======================  =================
+   =============   =======================  ==========================
    CVE-2017-5753   Bounds check bypass      Spectre variant 1
    CVE-2017-5715   Branch target injection  Spectre variant 2
-   =============   =======================  =================
+   CVE-2019-1125   Spectre v1 swapgs        Spectre variant 1 (swapgs)
+   =============   =======================  ==========================

 Problem
 -------
@@ -78,6 +79,13 @@ There are some extensions of Spectre variant 1 attacks for reading data
 over the network, see :ref:`[12] <spec_ref12>`. However such attacks
 are difficult, low bandwidth, fragile, and are considered low risk.

+Note that, despite "Bounds Check Bypass" name, Spectre variant 1 is not
+only about user-controlled array bounds checks.  It can affect any
+conditional checks.  The kernel entry code interrupt, exception, and NMI
+handlers all have conditional swapgs checks.  Those may be problematic
+in the context of Spectre v1, as kernel code can speculatively run with
+a user GS.
+
 Spectre variant 2 (Branch Target Injection)
 -------------------------------------------

@@ -132,6 +140,9 @@ not cover all possible attack vectors.
 1. A user process attacking the kernel
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+Spectre variant 1
+~~~~~~~~~~~~~~~~~
+
    The attacker passes a parameter to the kernel via a register or
    via a known address in memory during a syscall. Such parameter may
    be used later by the kernel as an index to an array or to derive
@@ -144,7 +155,40 @@ not cover all possible attack vectors.
    potentially be influenced for Spectre attacks, new "nospec" accessor
    macros are used to prevent speculative loading of data.

-   Spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
+Spectre variant 1 (swapgs)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+   An attacker can train the branch predictor to speculatively skip the
+   swapgs path for an interrupt or exception.  If they initialize
+   the GS register to a user-space value, if the swapgs is speculatively
+   skipped, subsequent GS-related percpu accesses in the speculation
+   window will be done with the attacker-controlled GS value.  This
+   could cause privileged memory to be accessed and leaked.
+
+   For example:
+
+   ::
+
+     if (coming from user space)
+         swapgs
+     mov %gs:<percpu_offset>, %reg
+     mov (%reg), %reg1
+
+   When coming from user space, the CPU can speculatively skip the
+   swapgs, and then do a speculative percpu load using the user GS
+   value.  So the user can speculatively force a read of any kernel
+   value.  If a gadget exists which uses the percpu value as an address
+   in another load/store, then the contents of the kernel value may
+   become visible via an L1 side channel attack.
+
+   A similar attack exists when coming from kernel space.  The CPU can
+   speculatively do the swapgs, causing the user GS to get used for the
+   rest of the speculative window.
+
+Spectre variant 2
+~~~~~~~~~~~~~~~~~
+
+   A spectre variant 2 attacker can :ref:`poison <poison_btb>` the branch
    target buffer (BTB) before issuing syscall to launch an attack.
    After entering the kernel, the kernel could use the poisoned branch
    target buffer on indirect jump and jump to gadget code in speculative
@@ -280,11 +324,18 @@ The sysfs file showing Spectre variant 1 mitigation status is:

 The possible values in this file are:

-  =======================================  =================================
-  'Mitigation: __user pointer sanitation'  Protection in kernel on a case by
-                                           case base with explicit pointer
-                                           sanitation.
-  =======================================  =================================
+  .. list-table::
+
+     * - 'Not affected'
+       - The processor is not vulnerable.
+     * - 'Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers'
+       - The swapgs protections are disabled; otherwise it has
+         protection in the kernel on a case by case base with explicit
+         pointer sanitation and usercopy LFENCE barriers.
+     * - 'Mitigation: usercopy/swapgs barriers and __user pointer sanitization'
+       - Protection in the kernel on a case by case base with explicit
+         pointer sanitation, usercopy LFENCE barriers, and swapgs LFENCE
+         barriers.

 However, the protections are put in place on a case by case basis,
 and there is no guarantee that all possible attack vectors for Spectre
@@ -366,12 +417,27 @@ Turning on mitigation for Spectre variant 1 and Spectre variant 2
 1. Kernel mitigation
 ^^^^^^^^^^^^^^^^^^^^

+Spectre variant 1
+~~~~~~~~~~~~~~~~~
+
    For the Spectre variant 1, vulnerable kernel code (as determined
    by code audit or scanning tools) is annotated on a case by case
    basis to use nospec accessor macros for bounds clipping :ref:`[2]
    <spec_ref2>` to avoid any usable disclosure gadgets. However, it may
    not cover all attack vectors for Spectre variant 1.

+   Copy-from-user code has an LFENCE barrier to prevent the access_ok()
+   check from being mis-speculated.  The barrier is done by the
+   barrier_nospec() macro.
+
+   For the swapgs variant of Spectre variant 1, LFENCE barriers are
+   added to interrupt, exception and NMI entry where needed.  These
+   barriers are done by the FENCE_SWAPGS_KERNEL_ENTRY and
+   FENCE_SWAPGS_USER_ENTRY macros.
+
+Spectre variant 2
+~~~~~~~~~~~~~~~~~
+
    For Spectre variant 2 mitigation, the compiler turns indirect calls or
    jumps in the kernel into equivalent return trampolines (retpolines)
    :ref:`[3] <spec_ref3>` :ref:`[9] <spec_ref9>` to go to the target
@@ -473,6 +539,12 @@ Mitigation control on the kernel command line
 Spectre variant 2 mitigation can be disabled or force enabled at the
 kernel command line.

+	nospectre_v1
+
+		[X86,PPC] Disable mitigations for Spectre Variant 1
+		(bounds check bypass). With this option data leaks are
+		possible in the system.
+
 	nospectre_v2

 		[X86] Disable all mitigations for the Spectre variant 2
+4 −4
@@ -2604,7 +2604,7 @@
 				expose users to several CPU vulnerabilities.
 				Equivalent to: nopti [X86,PPC]
 					       kpti=0 [ARM64]
-					       nospectre_v1 [PPC]
+					       nospectre_v1 [X86,PPC]
 					       nobp=0 [S390]
 					       nospectre_v2 [X86,PPC,S390,ARM64]
 					       spectre_v2_user=off [X86]
@@ -2965,9 +2965,9 @@
 			nosmt=force: Force disable SMT, cannot be undone
 				     via the sysfs control file.

-	nospectre_v1	[PPC] Disable mitigations for Spectre Variant 1 (bounds
-			check bypass). With this option data leaks are possible
-			in the system.
+	nospectre_v1	[X86,PPC] Disable mitigations for Spectre Variant 1
+			(bounds check bypass). With this option data leaks are
+			possible in the system.

 	nospectre_v2	[X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
 			the Spectre variant 2 (indirect branch prediction)
+17 −0
@@ -314,6 +314,23 @@ For 32-bit we have the following conventions - kernel is built with

 #endif

+/*
+ * Mitigate Spectre v1 for conditional swapgs code paths.
+ *
+ * FENCE_SWAPGS_USER_ENTRY is used in the user entry swapgs code path, to
+ * prevent a speculative swapgs when coming from kernel space.
+ *
+ * FENCE_SWAPGS_KERNEL_ENTRY is used in the kernel entry non-swapgs code path,
+ * to prevent the swapgs from getting speculatively skipped when coming from
+ * user space.
+ */
+.macro FENCE_SWAPGS_USER_ENTRY
+	ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_USER
+.endm
+.macro FENCE_SWAPGS_KERNEL_ENTRY
+	ALTERNATIVE "", "lfence", X86_FEATURE_FENCE_SWAPGS_KERNEL
+.endm
+
 .macro STACKLEAK_ERASE_NOCLOBBER
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	PUSH_AND_CLEAR_REGS
+18 −3
@@ -519,7 +519,7 @@ ENTRY(interrupt_entry)
 	testb	$3, CS-ORIG_RAX+8(%rsp)
 	jz	1f
 	SWAPGS
-
+	FENCE_SWAPGS_USER_ENTRY
 	/*
 	 * Switch to the thread stack. The IRET frame and orig_ax are
 	 * on the stack, as well as the return address. RDI..R12 are
@@ -549,8 +549,10 @@ ENTRY(interrupt_entry)
 	UNWIND_HINT_FUNC

 	movq	(%rdi), %rdi
+	jmp	2f
 1:
-
+	FENCE_SWAPGS_KERNEL_ENTRY
+2:
 	PUSH_AND_CLEAR_REGS save_ret=1
 	ENCODE_FRAME_POINTER 8

@@ -1238,6 +1240,13 @@ ENTRY(paranoid_entry)
 	 */
 	SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14

+	/*
+	 * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
+	 * unconditional CR3 write, even in the PTI case.  So do an lfence
+	 * to prevent GS speculation, regardless of whether PTI is enabled.
+	 */
+	FENCE_SWAPGS_KERNEL_ENTRY
+
 	ret
 END(paranoid_entry)

@@ -1288,6 +1297,7 @@ ENTRY(error_entry)
 	 * from user mode due to an IRET fault.
 	 */
 	SWAPGS
+	FENCE_SWAPGS_USER_ENTRY
 	/* We have user CR3.  Change to kernel CR3. */
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax

@@ -1301,6 +1311,8 @@ ENTRY(error_entry)
 	pushq	%r12
 	ret

+.Lerror_entry_done_lfence:
+	FENCE_SWAPGS_KERNEL_ENTRY
 .Lerror_entry_done:
 	ret

@@ -1318,7 +1330,7 @@ ENTRY(error_entry)
 	cmpq	%rax, RIP+8(%rsp)
 	je	.Lbstep_iret
 	cmpq	$.Lgs_change, RIP+8(%rsp)
-	jne	.Lerror_entry_done
+	jne	.Lerror_entry_done_lfence

 	/*
 	 * hack: .Lgs_change can fail with user gsbase.  If this happens, fix up
@@ -1326,6 +1338,7 @@ ENTRY(error_entry)
 	 * .Lgs_change's error handler with kernel gsbase.
 	 */
 	SWAPGS
+	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax
 	jmp .Lerror_entry_done

@@ -1340,6 +1353,7 @@ ENTRY(error_entry)
 	 * gsbase and CR3.  Switch to kernel gsbase and CR3:
 	 */
 	SWAPGS
+	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rax

 	/*
@@ -1431,6 +1445,7 @@ ENTRY(nmi)

 	swapgs
 	cld
+	FENCE_SWAPGS_USER_ENTRY
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+3 −0
@@ -281,6 +281,8 @@
 #define X86_FEATURE_CQM_OCCUP_LLC	(11*32+ 1) /* LLC occupancy monitoring */
 #define X86_FEATURE_CQM_MBM_TOTAL	(11*32+ 2) /* LLC Total MBM monitoring */
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
+#define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
+#define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */

 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -394,5 +396,6 @@
 #define X86_BUG_L1TF			X86_BUG(18) /* CPU is affected by L1 Terminal Fault */
 #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
 #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the  MSDBS variant of BUG_MDS */
+#define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */

 #endif /* _ASM_X86_CPUFEATURES_H */
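
The two new feature flags in the hunk above follow the same `word*32 + bit` numbering as the surrounding defines. The sketch below shows how such a number splits into a word index and a bit position. The helpers `feature_word()`, `feature_bit()`, `cap_set()` and `cap_test()` are hypothetical illustrations operating on a plain array of 32-bit words, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the diff: both flags live in word 11. */
#define X86_FEATURE_FENCE_SWAPGS_USER	(11*32 + 4)
#define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32 + 5)

/* Index of the 32-bit capability word that holds the feature. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Bit position of the feature inside its word. */
static unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Set a feature bit in a caller-supplied array of nwords capability words. */
static void cap_set(uint32_t *caps, unsigned int nwords, unsigned int feature)
{
	assert(feature_word(feature) < nwords);
	caps[feature_word(feature)] |= UINT32_C(1) << feature_bit(feature);
}

/* Return 1 if the feature bit is set, 0 otherwise. */
static int cap_test(const uint32_t *caps, unsigned int nwords, unsigned int feature)
{
	assert(feature_word(feature) < nwords);
	return (caps[feature_word(feature)] >> feature_bit(feature)) & 1u;
}
```

Because the two fences are separate bits, each entry path can be patched independently, which matches the two distinct ALTERNATIVE macros in the commit.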