Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 7453311d authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 asm changes from Ingo Molnar:
 "The main changes in this cycle were the x86/entry and sysret
  enhancements from Andy Lutomirski, see merge commits 772a9aca and
  b57c0b51 for details"

[ Exectutive summary: IST exceptions that interrupt user space will run
  on the regular kernel stack instead of the IST stack.  Which
  simplifies things particularly on return to user space.

  The sysret cleanup ends up simplifying the logic on when we can use
  sysret vs when we have to use iret.                - Linus ]

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64, entry: Remove the syscall exit audit and schedule optimizations
  x86_64, entry: Use sysret to return to userspace when possible
  x86, traps: Fix ist_enter from userspace
  x86, vdso: teach 'make clean' remove vdso64 binaries
  x86_64 entry: Fix RCX for ptraced syscalls
  x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user
  x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET
  x86: entry_64.S: delete unused code
  x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
  x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic
  x86: Clean up current_stack_pointer
  x86, traps: Track entry into and exit from IST context
  x86, entry: Switch stacks on a paranoid entry from userspace
parents 9d43bade b57c0b51
Loading
Loading
Loading
Loading
+12 −6
Original line number Diff line number Diff line
@@ -78,9 +78,6 @@ The expensive (paranoid) way is to read back the MSR_GS_BASE value
	xorl %ebx,%ebx
1:	ret

and the whole paranoid non-paranoid macro complexity is about whether
to suffer that RDMSR cost.

If we are at an interrupt or user-trap/gate-alike boundary then we can
use the faster check: the stack will be a reliable indicator of
whether SWAPGS was already done: if we see that we are a secondary
@@ -93,6 +90,15 @@ which might have triggered right after a normal entry wrote CS to the
stack but before we executed SWAPGS, then the only safe way to check
for GS is the slower method: the RDMSR.

So we try only to mark those entry methods 'paranoid' that absolutely
need the more expensive check for the GS base - and we generate all
'normal' entry points with the regular (faster) entry macros.
Therefore, super-atomic entries (except NMI, which is handled separately)
must use idtentry with paranoid=1 to handle gsbase correctly.  This
triggers three main behavior changes:

 - Interrupt entry will use the slower gsbase check.
 - Interrupt entry from user mode will switch off the IST stack.
 - Interrupt exit to kernel mode will not attempt to reschedule.

We try to only use IST entries and the paranoid entry code for vectors
that absolutely need the more expensive check for the GS base - and we
generate all 'normal' entry points with the regular (faster) paranoid=0
variant.
+5 −3
Original line number Diff line number Diff line
@@ -40,9 +40,11 @@ An IST is selected by a non-zero value in the IST field of an
interrupt-gate descriptor.  When an interrupt occurs and the hardware
loads such a descriptor, the hardware automatically sets the new stack
pointer based on the IST value, then invokes the interrupt handler.  If
software wants to allow nested IST interrupts then the handler must
adjust the IST values on entry to and exit from the interrupt handler.
(This is occasionally done, e.g. for debug exceptions.)
the interrupt came from user mode, then the interrupt handler prologue
will switch back to the per-thread stack.  If software wants to allow
nested IST interrupts then the handler must adjust the IST values on
entry to and exit from the interrupt handler.  (This is occasionally
done, e.g. for debug exceptions.)

Events with different IST codes (i.e. with different stacks) can be
nested.  For example, a debug interrupt can safely be interrupted by an
+2 −2
Original line number Diff line number Diff line
@@ -179,8 +179,8 @@ sysenter_dispatch:
sysexit_from_sys_call:
	andl    $~TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
	/* clear IF, that popfq doesn't enable interrupts early */
	andl  $~0x200,EFLAGS-R11(%rsp) 
	movl	RIP-R11(%rsp),%edx		/* User %eip */
	andl	$~0x200,EFLAGS-ARGOFFSET(%rsp)
	movl	RIP-ARGOFFSET(%rsp),%edx		/* User %eip */
	CFI_REGISTER rip,rdx
	RESTORE_ARGS 0,24,0,0,0,0
	xorq	%r8,%r8
+0 −1
Original line number Diff line number Diff line
@@ -83,7 +83,6 @@ For 32-bit we have the following conventions - kernel is built with
#define SS		160

#define ARGOFFSET	R11
#define SWFRAME		ORIG_RAX

	.macro SAVE_ARGS addskip=0, save_rcx=1, save_r891011=1, rax_enosys=0
	subq  $9*8+\addskip, %rsp
+0 −1
Original line number Diff line number Diff line
@@ -190,7 +190,6 @@ enum mcp_flags {
void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);

int mce_notify_irq(void);
void mce_notify_process(void);

DECLARE_PER_CPU(struct mce, injectm);

Loading