Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 35277995 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull spectre/meltdown updates from Thomas Gleixner:
 "The next round of updates related to melted spectrum:

   - The initial set of spectre V1 mitigations:

       - Array index speculation blocker and its usage for syscall,
         fdtable and the n180211 driver.

       - Speculation barrier and its usage in user access functions

   - Make indirect calls in KVM speculation safe

   - Blacklisting of known to be broken microcodes so IPBP/IBSR are not
     touched.

   - The initial IBPB support and its usage in context switch

   - The exposure of the new speculation MSRs to KVM guests.

   - A fix for a regression in x86/32 related to the cpu entry area

   - Proper whitelisting for known to be safe CPUs from the mitigations.

   - objtool fixes to deal proper with retpolines and alternatives

   - Exclude __init functions from retpolines which speeds up the boot
     process.

   - Removal of the syscall64 fast path and related cleanups and
     simplifications

   - Removal of the unpatched paravirt mode which is yet another source
     of indirect unproteced calls.

   - A new and undisputed version of the module mismatch warning

   - A couple of cleanup and correctness fixes all over the place

  Yet another step towards full mitigation. There are a few things still
  missing like the RBS underflow mitigation for Skylake and other small
  details, but that's being worked on.

  That said, I'm taking a belated christmas vacation for a week and hope
  that everything is magically solved when I'm back on Feb 12th"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
  KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
  KVM/x86: Add IBPB support
  KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
  x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
  x86/pti: Mark constant arrays as __initconst
  x86/spectre: Simplify spectre_v2 command line parsing
  x86/retpoline: Avoid retpolines for built-in __init functions
  x86/kvm: Update spectre-v1 mitigation
  KVM: VMX: make MSR bitmaps per-VCPU
  x86/paravirt: Remove 'noreplace-paravirt' cmdline option
  x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
  x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
  x86/spectre: Report get_user mitigation for spectre_v1
  nl80211: Sanitize array index in parse_txq_params
  vfs, fdtable: Prevent bounds-check bypass via speculative execution
  x86/syscall: Sanitize syscall table de-references under speculation
  x86/get_user: Use pointer masking to limit speculation
  ...
parents 0a646e9c b2ac58f9
Loading
Loading
Loading
Loading
+0 −2
Original line number Diff line number Diff line
@@ -2758,8 +2758,6 @@
	norandmaps	Don't use address space randomization.  Equivalent to
			echo 0 > /proc/sys/kernel/randomize_va_space

	noreplace-paravirt	[X86,IA-64,PV_OPS] Don't patch paravirt_ops

	noreplace-smp	[X86-32,SMP] Don't replace SMP instructions
			with UP alternatives

+90 −0
Original line number Diff line number Diff line
This document explains potential effects of speculation, and how undesirable
effects can be mitigated portably using common APIs.

===========
Speculation
===========

To improve performance and minimize average latencies, many contemporary CPUs
employ speculative execution techniques such as branch prediction, performing
work which may be discarded at a later stage.

Typically speculative execution cannot be observed from architectural state,
such as the contents of registers. However, in some cases it is possible to
observe its impact on microarchitectural state, such as the presence or
absence of data in caches. Such state may form side-channels which can be
observed to extract secret information.

For example, in the presence of branch prediction, it is possible for bounds
checks to be ignored by code which is speculatively executed. Consider the
following code:

	int load_array(int *array, unsigned int index)
	{
		if (index >= MAX_ARRAY_ELEMS)
			return 0;
		else
			return array[index];
	}

Which, on arm64, may be compiled to an assembly sequence such as:

	CMP	<index>, #MAX_ARRAY_ELEMS
	B.LT	less
	MOV	<returnval>, #0
	RET
  less:
	LDR	<returnval>, [<array>, <index>]
	RET

It is possible that a CPU mis-predicts the conditional branch, and
speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This
value will subsequently be discarded, but the speculated load may affect
microarchitectural state which can be subsequently measured.

More complex sequences involving multiple dependent memory accesses may
result in sensitive information being leaked. Consider the following
code, building on the prior example:

	int load_dependent_arrays(int *arr1, int *arr2, int index)
	{
		int val1, val2,

		val1 = load_array(arr1, index);
		val2 = load_array(arr2, val1);

		return val2;
	}

Under speculation, the first call to load_array() may return the value
of an out-of-bounds address, while the second call will influence
microarchitectural state dependent on this value. This may provide an
arbitrary read primitive.

====================================
Mitigating speculation side-channels
====================================

The kernel provides a generic API to ensure that bounds checks are
respected even under speculation. Architectures which are affected by
speculation-based side-channels are expected to implement these
primitives.

The array_index_nospec() helper in <linux/nospec.h> can be used to
prevent information from being leaked via side-channels.

A call to array_index_nospec(index, size) returns a sanitized index
value that is bounded to [0, size) even under cpu speculation
conditions.

This can be used to protect the earlier load_array() example:

	int load_array(int *array, unsigned int index)
	{
		if (index >= MAX_ARRAY_ELEMS)
			return 0;
		else {
			index = array_index_nospec(index, MAX_ARRAY_ELEMS);
			return array[index];
		}
	}
+6 −3
Original line number Diff line number Diff line
@@ -21,6 +21,7 @@
#include <linux/export.h>
#include <linux/context_tracking.h>
#include <linux/user-return-notifier.h>
#include <linux/nospec.h>
#include <linux/uprobes.h>
#include <linux/livepatch.h>
#include <linux/syscalls.h>
@@ -206,7 +207,7 @@ __visible inline void prepare_exit_to_usermode(struct pt_regs *regs)
	 * special case only applies after poking regs and before the
	 * very next return to user mode.
	 */
	current->thread.status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
	ti->status &= ~(TS_COMPAT|TS_I386_REGS_POKED);
#endif

	user_enter_irqoff();
@@ -282,7 +283,8 @@ __visible void do_syscall_64(struct pt_regs *regs)
	 * regs->orig_ax, which changes the behavior of some syscalls.
	 */
	if (likely((nr & __SYSCALL_MASK) < NR_syscalls)) {
		regs->ax = sys_call_table[nr & __SYSCALL_MASK](
		nr = array_index_nospec(nr & __SYSCALL_MASK, NR_syscalls);
		regs->ax = sys_call_table[nr](
			regs->di, regs->si, regs->dx,
			regs->r10, regs->r8, regs->r9);
	}
@@ -304,7 +306,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
	unsigned int nr = (unsigned int)regs->orig_ax;

#ifdef CONFIG_IA32_EMULATION
	current->thread.status |= TS_COMPAT;
	ti->status |= TS_COMPAT;
#endif

	if (READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY) {
@@ -318,6 +320,7 @@ static __always_inline void do_syscall_32_irqs_on(struct pt_regs *regs)
	}

	if (likely(nr < IA32_NR_syscalls)) {
		nr = array_index_nospec(nr, IA32_NR_syscalls);
		/*
		 * It's possible that a 32-bit syscall implementation
		 * takes a 64-bit parameter but nonetheless assumes that
+7 −120
Original line number Diff line number Diff line
@@ -236,91 +236,20 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
	pushq	%r9				/* pt_regs->r9 */
	pushq	%r10				/* pt_regs->r10 */
	pushq	%r11				/* pt_regs->r11 */
	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
	UNWIND_HINT_REGS extra=0

	TRACE_IRQS_OFF

	/*
	 * If we need to do entry work or if we guess we'll need to do
	 * exit work, go straight to the slow path.
	 */
	movq	PER_CPU_VAR(current_task), %r11
	testl	$_TIF_WORK_SYSCALL_ENTRY|_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
	jnz	entry_SYSCALL64_slow_path

entry_SYSCALL_64_fastpath:
	/*
	 * Easy case: enable interrupts and issue the syscall.  If the syscall
	 * needs pt_regs, we'll call a stub that disables interrupts again
	 * and jumps to the slow path.
	 */
	TRACE_IRQS_ON
	ENABLE_INTERRUPTS(CLBR_NONE)
#if __SYSCALL_MASK == ~0
	cmpq	$__NR_syscall_max, %rax
#else
	andl	$__SYSCALL_MASK, %eax
	cmpl	$__NR_syscall_max, %eax
#endif
	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
	movq	%r10, %rcx

	/*
	 * This call instruction is handled specially in stub_ptregs_64.
	 * It might end up jumping to the slow path.  If it jumps, RAX
	 * and all argument registers are clobbered.
	 */
#ifdef CONFIG_RETPOLINE
	movq	sys_call_table(, %rax, 8), %rax
	call	__x86_indirect_thunk_rax
#else
	call	*sys_call_table(, %rax, 8)
#endif
.Lentry_SYSCALL_64_after_fastpath_call:

	movq	%rax, RAX(%rsp)
1:
	pushq	%rbx				/* pt_regs->rbx */
	pushq	%rbp				/* pt_regs->rbp */
	pushq	%r12				/* pt_regs->r12 */
	pushq	%r13				/* pt_regs->r13 */
	pushq	%r14				/* pt_regs->r14 */
	pushq	%r15				/* pt_regs->r15 */
	UNWIND_HINT_REGS

	/*
	 * If we get here, then we know that pt_regs is clean for SYSRET64.
	 * If we see that no exit work is required (which we are required
	 * to check with IRQs off), then we can go straight to SYSRET64.
	 */
	DISABLE_INTERRUPTS(CLBR_ANY)
	TRACE_IRQS_OFF
	movq	PER_CPU_VAR(current_task), %r11
	testl	$_TIF_ALLWORK_MASK, TASK_TI_flags(%r11)
	jnz	1f

	LOCKDEP_SYS_EXIT
	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
	movq	RIP(%rsp), %rcx
	movq	EFLAGS(%rsp), %r11
	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
	UNWIND_HINT_EMPTY
	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret

1:
	/*
	 * The fast path looked good when we started, but something changed
	 * along the way and we need to switch to the slow path.  Calling
	 * raise(3) will trigger this, for example.  IRQs are off.
	 */
	TRACE_IRQS_ON
	ENABLE_INTERRUPTS(CLBR_ANY)
	SAVE_EXTRA_REGS
	movq	%rsp, %rdi
	call	syscall_return_slowpath	/* returns with IRQs disabled */
	jmp	return_from_SYSCALL_64

entry_SYSCALL64_slow_path:
	/* IRQs are off. */
	SAVE_EXTRA_REGS
	movq	%rsp, %rdi
	call	do_syscall_64		/* returns with IRQs disabled */

return_from_SYSCALL_64:
	TRACE_IRQS_IRETQ		/* we're about to change IF */

	/*
@@ -393,7 +322,6 @@ syscall_return_via_sysret:
	/* rcx and r11 are already restored (see code above) */
	UNWIND_HINT_EMPTY
	POP_EXTRA_REGS
.Lpop_c_regs_except_rcx_r11_and_sysret:
	popq	%rsi	/* skip r11 */
	popq	%r10
	popq	%r9
@@ -424,47 +352,6 @@ syscall_return_via_sysret:
	USERGS_SYSRET64
END(entry_SYSCALL_64)

ENTRY(stub_ptregs_64)
	/*
	 * Syscalls marked as needing ptregs land here.
	 * If we are on the fast path, we need to save the extra regs,
	 * which we achieve by trying again on the slow path.  If we are on
	 * the slow path, the extra regs are already saved.
	 *
	 * RAX stores a pointer to the C function implementing the syscall.
	 * IRQs are on.
	 */
	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
	jne	1f

	/*
	 * Called from fast path -- disable IRQs again, pop return address
	 * and jump to slow path
	 */
	DISABLE_INTERRUPTS(CLBR_ANY)
	TRACE_IRQS_OFF
	popq	%rax
	UNWIND_HINT_REGS extra=0
	jmp	entry_SYSCALL64_slow_path

1:
	JMP_NOSPEC %rax				/* Called from C */
END(stub_ptregs_64)

.macro ptregs_stub func
ENTRY(ptregs_\func)
	UNWIND_HINT_FUNC
	leaq	\func(%rip), %rax
	jmp	stub_ptregs_64
END(ptregs_\func)
.endm

/* Instantiate ptregs_stub for each ptregs-using syscall */
#define __SYSCALL_64_QUAL_(sym)
#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
#include <asm/syscalls_64.h>

/*
 * %rdi: prev task
 * %rsi: next task
+2 −5
Original line number Diff line number Diff line
@@ -7,14 +7,11 @@
#include <asm/asm-offsets.h>
#include <asm/syscall.h>

#define __SYSCALL_64_QUAL_(sym) sym
#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym

#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
#include <asm/syscalls_64.h>
#undef __SYSCALL_64

#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
#define __SYSCALL_64(nr, sym, qual) [nr] = sym,

extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);

Loading