
Commit 2a74dbb9 authored by Linus Torvalds
Pull security subsystem updates from James Morris:
 "A quiet cycle for the security subsystem with just a few maintenance
  updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  Smack: create a sysfs mount point for smackfs
  Smack: use select not depends in Kconfig
  Yama: remove locking from delete path
  Yama: add RCU to drop read locking
  drivers/char/tpm: remove tasklet and cleanup
  KEYS: Use keyring_alloc() to create special keyrings
  KEYS: Reduce initial permissions on keys
  KEYS: Make the session and process keyrings per-thread
  seccomp: Make syscall skipping and nr changes more consistent
  key: Fix resource leak
  keys: Fix unreachable code
  KEYS: Add payload preparsing opportunity prior to key instantiate or update
parents 770b6cb4 e9307237
+68 −6
@@ -95,12 +95,15 @@ SECCOMP_RET_KILL:

SECCOMP_RET_TRAP:
	Results in the kernel sending a SIGSYS signal to the triggering
	task without executing the system call.  The kernel will
	rollback the register state to just before the system call
	entry such that a signal handler in the task will be able to
	inspect the ucontext_t->uc_mcontext registers and emulate
	system call success or failure upon return from the signal
	handler.
	task without executing the system call.  siginfo->si_call_addr
	will show the address of the system call instruction, and
	siginfo->si_syscall and siginfo->si_arch will indicate which
	syscall was attempted.  The program counter will be as though
	the syscall happened (i.e. it will not point to the syscall
	instruction).  The return value register will contain an arch-
	dependent value -- if resuming execution, set it to something
	sensible.  (The architecture dependency is because replacing
	it with -ENOSYS could overwrite some useful information.)

	The SECCOMP_RET_DATA portion of the return value will be passed
	as si_errno.
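
	The handler-side flow described above can be sketched from userspace
	(a minimal sketch, not part of this patch; it assumes x86-64 Linux
	with seccomp filter support, and handle_sigsys/trapped_getpid are
	names invented here):

	```c
	#define _GNU_SOURCE
	#include <assert.h>
	#include <linux/filter.h>
	#include <linux/seccomp.h>
	#include <signal.h>
	#include <stddef.h>
	#include <string.h>
	#include <sys/prctl.h>
	#include <sys/syscall.h>
	#include <ucontext.h>
	#include <unistd.h>

	/* SIGSYS handler: inspect siginfo as described above and emulate a
	 * successful getpid() by writing the saved return-value register. */
	static void handle_sigsys(int sig, siginfo_t *info, void *ucontext)
	{
		ucontext_t *ctx = ucontext;

		(void)sig;
		if (info->si_syscall == SYS_getpid)
			ctx->uc_mcontext.gregs[REG_RAX] = 4242; /* emulated result */
	}

	/* Install a filter returning SECCOMP_RET_TRAP for getpid and
	 * SECCOMP_RET_ALLOW for everything else, then call getpid. */
	static long trapped_getpid(void)
	{
		struct sock_filter filter[] = {
			BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
				 offsetof(struct seccomp_data, nr)),
			BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getpid, 0, 1),
			BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
			BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
		};
		struct sock_fprog prog = {
			.len = sizeof(filter) / sizeof(filter[0]),
			.filter = filter,
		};
		struct sigaction sa;

		memset(&sa, 0, sizeof(sa));
		sa.sa_sigaction = handle_sigsys;
		sa.sa_flags = SA_SIGINFO;
		sigaction(SIGSYS, &sa, NULL);

		/* Required so an unprivileged process may install a filter. */
		prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
		prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);

		return syscall(SYS_getpid); /* trapped, emulated by the handler */
	}
	```

	Per the text above, on handler entry the return register holds an
	arch-dependent value, so the handler overwrites it with something
	sensible before resuming.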
@@ -123,6 +126,18 @@ SECCOMP_RET_TRACE:
	the BPF program return value will be available to the tracer
	via PTRACE_GETEVENTMSG.

	The tracer can skip the system call by changing the syscall number
	to -1.  Alternatively, the tracer can change the system call
	requested by changing the system call to a valid syscall number.  If
	the tracer asks to skip the system call, then the system call will
	appear to return the value that the tracer puts in the return value
	register.

	The seccomp check will not be run again after the tracer is
	notified.  (This means that seccomp-based sandboxes MUST NOT
	allow use of ptrace, even of other sandboxed processes, without
	extreme care; ptracers can use this mechanism to escape.)
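
	The skip mechanism works the same way at ordinary ptrace syscall
	stops; the sketch below (hypothetical, not this patch's code) uses
	plain PTRACE_SYSCALL rather than SECCOMP_RET_TRACE events, but
	exercises the same "change the syscall number to -1, then supply a
	return value" protocol.  Assumes x86-64 Linux; traced_getpid_result
	is a name invented here:

	```c
	#define _GNU_SOURCE
	#include <assert.h>
	#include <signal.h>
	#include <sys/ptrace.h>
	#include <sys/reg.h>	/* ORIG_RAX, RAX register indices (x86-64) */
	#include <sys/syscall.h>
	#include <sys/wait.h>
	#include <unistd.h>

	/* Trace a child; at its getpid() entry stop, rewrite the syscall
	 * number to -1 so the kernel skips the call, then plant 777 in the
	 * return-value register at the exit stop. */
	static int traced_getpid_result(void)
	{
		int status;
		int skipped = 0;
		pid_t child = fork();

		if (child == 0) {
			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
			raise(SIGSTOP);		/* let the parent attach */
			/* exit 0 only if the skipped call returned 777 */
			_exit(syscall(SYS_getpid) == 777 ? 0 : 1);
		}

		waitpid(child, &status, 0);	/* initial SIGSTOP */
		for (;;) {
			ptrace(PTRACE_SYSCALL, child, NULL, NULL);
			waitpid(child, &status, 0);
			if (WIFEXITED(status))
				break;
			long nr = ptrace(PTRACE_PEEKUSER, child,
					 sizeof(long) * ORIG_RAX, NULL);
			if (nr == SYS_getpid && !skipped) {
				/* entry stop: -1 makes the kernel skip it */
				ptrace(PTRACE_POKEUSER, child,
				       sizeof(long) * ORIG_RAX, (void *)-1L);
				ptrace(PTRACE_SYSCALL, child, NULL, NULL);
				waitpid(child, &status, 0);	/* exit stop */
				ptrace(PTRACE_POKEUSER, child,
				       sizeof(long) * RAX, (void *)777L);
				skipped = 1;
			}
		}
		return WEXITSTATUS(status);
	}
	```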

SECCOMP_RET_ALLOW:
	Results in the system call being executed.

@@ -161,3 +176,50 @@ architecture supports both ptrace_event and seccomp, it will be able to
support seccomp filter with minor fixup: SIGSYS support and seccomp return
value checking.  Then it must just add CONFIG_HAVE_ARCH_SECCOMP_FILTER
to its arch-specific Kconfig.



Caveats
-------

The vDSO can cause some system calls to run entirely in userspace,
leading to surprises when you run programs on different machines that
fall back to real syscalls.  To minimize these surprises on x86, make
sure you test with
/sys/devices/system/clocksource/clocksource0/current_clocksource set to
something like acpi_pm.
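
A test harness can first confirm which clocksource is active before
drawing conclusions.  This is a hypothetical userspace helper (not from
the patch); it returns -1 where the sysfs file is unavailable:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the active clocksource (e.g. "tsc" or "acpi_pm") into buf.
 * Returns the string length, or -1 if the sysfs file is unavailable
 * (non-x86 machines, restricted containers, ...). */
static int read_current_clocksource(char *buf, size_t len)
{
	FILE *f = fopen("/sys/devices/system/clocksource/"
			"clocksource0/current_clocksource", "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* strip trailing newline */
	return (int)strlen(buf);
}
```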

On x86-64, vsyscall emulation is enabled by default.  (vsyscalls are
legacy variants on vDSO calls.)  Currently, emulated vsyscalls will
honor seccomp, with a few oddities:

- A return value of SECCOMP_RET_TRAP will set a si_call_addr pointing to
  the vsyscall entry for the given call and not the address after the
  'syscall' instruction.  Any code which wants to restart the call
  should be aware that (a) a ret instruction has been emulated and (b)
  trying to resume the syscall will again trigger the standard vsyscall
  emulation security checks, making resuming the syscall mostly
  pointless.

- A return value of SECCOMP_RET_TRACE will signal the tracer as usual,
  but the syscall may not be changed to another system call using the
  orig_rax register.  It may only be changed to -1 in order to skip the
  currently emulated call.  Any other change MAY terminate the process.
  The rip value seen by the tracer will be the syscall entry address;
  this is different from normal behavior.  The tracer MUST NOT modify
  rip or rsp.  (Do not rely on other changes terminating the process.
  They might work.  For example, on some kernels, choosing a syscall
  that only exists in future kernels will be correctly emulated (by
  returning -ENOSYS).)

To detect this quirky behavior, check for addr & ~0x0C00 ==
0xFFFFFFFFFF600000.  (For SECCOMP_RET_TRACE, use rip.  For
SECCOMP_RET_TRAP, use siginfo->si_call_addr.)  Do not check any other
condition: future kernels may improve vsyscall emulation and current
kernels in vsyscall=native mode will behave differently, but the
instructions at 0xF...F600{0,4,8,C}00 will not be system calls in these
cases.

Note that modern systems are unlikely to use vsyscalls at all -- they
are a legacy feature and they are considerably slower than standard
syscalls.  New code will use the vDSO, and vDSO-issued system calls
are indistinguishable from normal system calls.
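
The detection check above can be written as a small predicate (a
sketch; is_vsyscall_entry is a name invented here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The rule quoted above: the legacy vsyscall entries sit at
 * 0xffffffffff600000 + {0x000, 0x400, 0x800}; masking bits 10-11
 * folds them (and the reserved 0xc00 slot) onto the base address. */
static bool is_vsyscall_entry(uint64_t addr)
{
	return (addr & ~(uint64_t)0x0C00) == 0xFFFFFFFFFF600000ULL;
}
```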
+17 −0
@@ -994,6 +994,23 @@ payload contents" for more information.
    reference pointer if successful.


(*) A keyring can be created by:

	struct key *keyring_alloc(const char *description, uid_t uid, gid_t gid,
				  const struct cred *cred,
				  key_perm_t perm,
				  unsigned long flags,
				  struct key *dest);

    This creates a keyring with the given attributes and returns it.  If dest
    is not NULL, the new keyring will be linked into the keyring to which it
    points.  No permission checks are made upon the destination keyring.

    Error EDQUOT can be returned if the keyring would overload the quota (pass
    KEY_ALLOC_NOT_IN_QUOTA in flags if the keyring shouldn't be accounted
    towards the user's quota).  Error ENOMEM can also be returned.


(*) To check the validity of a key, this function can be called:

	int validate_key(struct key *key);
+59 −51
@@ -145,19 +145,6 @@ static int addr_to_vsyscall_nr(unsigned long addr)
	return nr;
}

#ifdef CONFIG_SECCOMP
static int vsyscall_seccomp(struct task_struct *tsk, int syscall_nr)
{
	if (!seccomp_mode(&tsk->seccomp))
		return 0;
	task_pt_regs(tsk)->orig_ax = syscall_nr;
	task_pt_regs(tsk)->ax = syscall_nr;
	return __secure_computing(syscall_nr);
}
#else
#define vsyscall_seccomp(_tsk, _nr) 0
#endif

static bool write_ok_or_segv(unsigned long ptr, size_t size)
{
	/*
@@ -190,10 +177,9 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
{
	struct task_struct *tsk;
	unsigned long caller;
	int vsyscall_nr;
	int vsyscall_nr, syscall_nr, tmp;
	int prev_sig_on_uaccess_error;
	long ret;
	int skip;

	/*
	 * No point in checking CS -- the only way to get here is a user mode
@@ -225,56 +211,84 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
	}

	tsk = current;
	/*
	 * With a real vsyscall, page faults cause SIGSEGV.  We want to
	 * preserve that behavior to make writing exploits harder.
	 */
	prev_sig_on_uaccess_error = current_thread_info()->sig_on_uaccess_error;
	current_thread_info()->sig_on_uaccess_error = 1;

	/*
	 * Check for access_ok violations and find the syscall nr.
	 *
	 * NULL is a valid user pointer (in the access_ok sense) on 32-bit and
	 * 64-bit, so we don't need to special-case it here.  For all the
	 * vsyscalls, NULL means "don't write anything" not "write it at
	 * address 0".
	 */
	ret = -EFAULT;
	skip = 0;
	switch (vsyscall_nr) {
	case 0:
		skip = vsyscall_seccomp(tsk, __NR_gettimeofday);
		if (skip)
			break;

		if (!write_ok_or_segv(regs->di, sizeof(struct timeval)) ||
		    !write_ok_or_segv(regs->si, sizeof(struct timezone)))
			break;
		    !write_ok_or_segv(regs->si, sizeof(struct timezone))) {
			ret = -EFAULT;
			goto check_fault;
		}

		ret = sys_gettimeofday(
			(struct timeval __user *)regs->di,
			(struct timezone __user *)regs->si);
		syscall_nr = __NR_gettimeofday;
		break;

	case 1:
		skip = vsyscall_seccomp(tsk, __NR_time);
		if (skip)
			break;
		if (!write_ok_or_segv(regs->di, sizeof(time_t))) {
			ret = -EFAULT;
			goto check_fault;
		}

		if (!write_ok_or_segv(regs->di, sizeof(time_t)))
		syscall_nr = __NR_time;
		break;

		ret = sys_time((time_t __user *)regs->di);
	case 2:
		if (!write_ok_or_segv(regs->di, sizeof(unsigned)) ||
		    !write_ok_or_segv(regs->si, sizeof(unsigned))) {
			ret = -EFAULT;
			goto check_fault;
		}

		syscall_nr = __NR_getcpu;
		break;
	}

	case 2:
		skip = vsyscall_seccomp(tsk, __NR_getcpu);
		if (skip)
	/*
	 * Handle seccomp.  regs->ip must be the original value.
	 * See seccomp_send_sigsys and Documentation/prctl/seccomp_filter.txt.
	 *
	 * We could optimize the seccomp disabled case, but performance
	 * here doesn't matter.
	 */
	regs->orig_ax = syscall_nr;
	regs->ax = -ENOSYS;
	tmp = secure_computing(syscall_nr);
	if ((!tmp && regs->orig_ax != syscall_nr) || regs->ip != address) {
		warn_bad_vsyscall(KERN_DEBUG, regs,
				  "seccomp tried to change syscall nr or ip");
		do_exit(SIGSYS);
	}
	if (tmp)
		goto do_ret;  /* skip requested */

	/*
	 * With a real vsyscall, page faults cause SIGSEGV.  We want to
	 * preserve that behavior to make writing exploits harder.
	 */
	prev_sig_on_uaccess_error = current_thread_info()->sig_on_uaccess_error;
	current_thread_info()->sig_on_uaccess_error = 1;

	ret = -EFAULT;
	switch (vsyscall_nr) {
	case 0:
		ret = sys_gettimeofday(
			(struct timeval __user *)regs->di,
			(struct timezone __user *)regs->si);
		break;

		if (!write_ok_or_segv(regs->di, sizeof(unsigned)) ||
		    !write_ok_or_segv(regs->si, sizeof(unsigned)))
	case 1:
		ret = sys_time((time_t __user *)regs->di);
		break;

	case 2:
		ret = sys_getcpu((unsigned __user *)regs->di,
				 (unsigned __user *)regs->si,
				 NULL);
@@ -283,12 +297,7 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)

	current_thread_info()->sig_on_uaccess_error = prev_sig_on_uaccess_error;

	if (skip) {
		if ((long)regs->ax <= 0L) /* seccomp errno emulation */
			goto do_ret;
		goto done; /* seccomp trace/trap */
	}

check_fault:
	if (ret == -EFAULT) {
		/* Bad news -- userspace fed a bad pointer to a vsyscall. */
		warn_bad_vsyscall(KERN_INFO, regs,
@@ -311,7 +320,6 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address)
	/* Emulate a ret instruction. */
	regs->ip = caller;
	regs->sp += 8;
done:
	return true;

sigsegv:
+28 −53
@@ -38,8 +38,6 @@ static struct vio_device_id tpm_ibmvtpm_device_table[] = {
};
MODULE_DEVICE_TABLE(vio, tpm_ibmvtpm_device_table);

DECLARE_WAIT_QUEUE_HEAD(wq);

/**
 * ibmvtpm_send_crq - Send a CRQ request
 * @vdev:	vio device struct
@@ -83,6 +81,7 @@ static int tpm_ibmvtpm_recv(struct tpm_chip *chip, u8 *buf, size_t count)
{
	struct ibmvtpm_dev *ibmvtpm;
	u16 len;
	int sig;

	ibmvtpm = (struct ibmvtpm_dev *)chip->vendor.data;

@@ -91,22 +90,23 @@ static int tpm_ibmvtpm_recv(struct tpm_chip *chip, u8 *buf, size_t count)
		return 0;
	}

	wait_event_interruptible(wq, ibmvtpm->crq_res.len != 0);
	sig = wait_event_interruptible(ibmvtpm->wq, ibmvtpm->res_len != 0);
	if (sig)
		return -EINTR;

	len = ibmvtpm->res_len;

	if (count < ibmvtpm->crq_res.len) {
	if (count < len) {
		dev_err(ibmvtpm->dev,
			"Invalid size in recv: count=%ld, crq_size=%d\n",
			count, ibmvtpm->crq_res.len);
			count, len);
		return -EIO;
	}

	spin_lock(&ibmvtpm->rtce_lock);
	memcpy((void *)buf, (void *)ibmvtpm->rtce_buf, ibmvtpm->crq_res.len);
	memset(ibmvtpm->rtce_buf, 0, ibmvtpm->crq_res.len);
	ibmvtpm->crq_res.valid = 0;
	ibmvtpm->crq_res.msg = 0;
	len = ibmvtpm->crq_res.len;
	ibmvtpm->crq_res.len = 0;
	memcpy((void *)buf, (void *)ibmvtpm->rtce_buf, len);
	memset(ibmvtpm->rtce_buf, 0, len);
	ibmvtpm->res_len = 0;
	spin_unlock(&ibmvtpm->rtce_lock);
	return len;
}
@@ -273,7 +273,6 @@ static int tpm_ibmvtpm_remove(struct vio_dev *vdev)
	int rc = 0;

	free_irq(vdev->irq, ibmvtpm);
	tasklet_kill(&ibmvtpm->tasklet);

	do {
		if (rc)
@@ -372,7 +371,6 @@ static int ibmvtpm_reset_crq(struct ibmvtpm_dev *ibmvtpm)
static int tpm_ibmvtpm_resume(struct device *dev)
{
	struct ibmvtpm_dev *ibmvtpm = ibmvtpm_get_data(dev);
	unsigned long flags;
	int rc = 0;

	do {
@@ -387,10 +385,11 @@ static int tpm_ibmvtpm_resume(struct device *dev)
		return rc;
	}

	spin_lock_irqsave(&ibmvtpm->lock, flags);
	vio_disable_interrupts(ibmvtpm->vdev);
	tasklet_schedule(&ibmvtpm->tasklet);
	spin_unlock_irqrestore(&ibmvtpm->lock, flags);
	rc = vio_enable_interrupts(ibmvtpm->vdev);
	if (rc) {
		dev_err(dev, "Error vio_enable_interrupts rc=%d\n", rc);
		return rc;
	}

	rc = ibmvtpm_crq_send_init(ibmvtpm);
	if (rc)
@@ -467,7 +466,7 @@ static struct ibmvtpm_crq *ibmvtpm_crq_get_next(struct ibmvtpm_dev *ibmvtpm)
	if (crq->valid & VTPM_MSG_RES) {
		if (++crq_q->index == crq_q->num_entry)
			crq_q->index = 0;
		rmb();
		smp_rmb();
	} else
		crq = NULL;
	return crq;
@@ -535,11 +534,9 @@ static void ibmvtpm_crq_process(struct ibmvtpm_crq *crq,
			ibmvtpm->vtpm_version = crq->data;
			return;
		case VTPM_TPM_COMMAND_RES:
			ibmvtpm->crq_res.valid = crq->valid;
			ibmvtpm->crq_res.msg = crq->msg;
			ibmvtpm->crq_res.len = crq->len;
			ibmvtpm->crq_res.data = crq->data;
			wake_up_interruptible(&wq);
			/* len of the data in rtce buffer */
			ibmvtpm->res_len = crq->len;
			wake_up_interruptible(&ibmvtpm->wq);
			return;
		default:
			return;
@@ -559,38 +556,19 @@ static void ibmvtpm_crq_process(struct ibmvtpm_crq *crq,
static irqreturn_t ibmvtpm_interrupt(int irq, void *vtpm_instance)
{
	struct ibmvtpm_dev *ibmvtpm = (struct ibmvtpm_dev *) vtpm_instance;
	unsigned long flags;

	spin_lock_irqsave(&ibmvtpm->lock, flags);
	vio_disable_interrupts(ibmvtpm->vdev);
	tasklet_schedule(&ibmvtpm->tasklet);
	spin_unlock_irqrestore(&ibmvtpm->lock, flags);

	return IRQ_HANDLED;
}

/**
 * ibmvtpm_tasklet - Interrupt handler tasklet
 * @data:	ibm vtpm device struct
 *
 * Returns:
 *	Nothing
 **/
static void ibmvtpm_tasklet(void *data)
{
	struct ibmvtpm_dev *ibmvtpm = data;
	struct ibmvtpm_crq *crq;
	unsigned long flags;

	spin_lock_irqsave(&ibmvtpm->lock, flags);
	/* while loop is needed for initial setup (get version and
	 * get rtce_size). There should be only one tpm request at any
	 * given time.
	 */
	while ((crq = ibmvtpm_crq_get_next(ibmvtpm)) != NULL) {
		ibmvtpm_crq_process(crq, ibmvtpm);
		crq->valid = 0;
		wmb();
		smp_wmb();
	}

	vio_enable_interrupts(ibmvtpm->vdev);
	spin_unlock_irqrestore(&ibmvtpm->lock, flags);
	return IRQ_HANDLED;
}

/**
@@ -650,9 +628,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
		goto reg_crq_cleanup;
	}

	tasklet_init(&ibmvtpm->tasklet, (void *)ibmvtpm_tasklet,
		     (unsigned long)ibmvtpm);

	rc = request_irq(vio_dev->irq, ibmvtpm_interrupt, 0,
			 tpm_ibmvtpm_driver_name, ibmvtpm);
	if (rc) {
@@ -666,13 +641,14 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,
		goto init_irq_cleanup;
	}

	init_waitqueue_head(&ibmvtpm->wq);

	crq_q->index = 0;

	ibmvtpm->dev = dev;
	ibmvtpm->vdev = vio_dev;
	chip->vendor.data = (void *)ibmvtpm;

	spin_lock_init(&ibmvtpm->lock);
	spin_lock_init(&ibmvtpm->rtce_lock);

	rc = ibmvtpm_crq_send_init(ibmvtpm);
@@ -689,7 +665,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev,

	return rc;
init_irq_cleanup:
	tasklet_kill(&ibmvtpm->tasklet);
	do {
		rc1 = plpar_hcall_norets(H_FREE_CRQ, vio_dev->unit_address);
	} while (rc1 == H_BUSY || H_IS_LONG_BUSY(rc1));
+2 −3
@@ -38,13 +38,12 @@ struct ibmvtpm_dev {
	struct vio_dev *vdev;
	struct ibmvtpm_crq_queue crq_queue;
	dma_addr_t crq_dma_handle;
	spinlock_t lock;
	struct tasklet_struct tasklet;
	u32 rtce_size;
	void __iomem *rtce_buf;
	dma_addr_t rtce_dma_handle;
	spinlock_t rtce_lock;
	struct ibmvtpm_crq crq_res;
	wait_queue_head_t wq;
	u16 res_len;
	u32 vtpm_version;
};
