Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security (cb60e3e6)

Documentation/prctl/seccomp_filter.txt

0 → 100644

+163 −0

		SECure COMPuting with filters
		=============================

		Introduction
		------------

		A large number of system calls are exposed to every userland process
		with many of them going unused for the entire lifetime of the process.
		As system calls change and mature, bugs are found and eradicated. A
		certain subset of userland applications benefit by having a reduced set
		of available system calls. The resulting set reduces the total kernel
		surface exposed to the application. System call filtering is meant for
		use with those applications.

		Seccomp filtering provides a means for a process to specify a filter for
		incoming system calls. The filter is expressed as a Berkeley Packet
		Filter (BPF) program, as with socket filters, except that the data
		operated on is related to the system call being made: system call
		number and the system call arguments. This allows for expressive
		filtering of system calls using a filter program language with a long
		history of being exposed to userland and a straightforward data set.

		Additionally, BPF makes it impossible for users of seccomp to fall prey
		to time-of-check-time-of-use (TOCTOU) attacks that are common in system
		call interposition frameworks. BPF programs may not dereference
		pointers, which constrains all filters to evaluating the system
		call arguments directly.

		What it isn't
		-------------

		System call filtering isn't a sandbox. It provides a clearly defined
		mechanism for minimizing the exposed kernel surface. It is meant to be
		a tool for sandbox developers to use. Beyond that, policy for logical
		behavior and information flow should be managed with a combination of
		other system hardening techniques and, potentially, an LSM of your
		choosing. Expressive, dynamic filters provide further options down this
		path (avoiding pathological sizes or selecting which of the multiplexed
		system calls in socketcall() is allowed, for instance) which could be
		construed, incorrectly, as a more complete sandboxing solution.

		Usage
		-----

		An additional seccomp mode is added and is enabled using the same
		prctl(2) call as the strict seccomp. If the architecture has
		CONFIG_HAVE_ARCH_SECCOMP_FILTER, then filters may be added as below:

		PR_SET_SECCOMP:
		Now takes an additional argument which specifies a new filter
		using a BPF program.
		The BPF program will be executed over struct seccomp_data
		reflecting the system call number, arguments, and other
		metadata. The BPF program must then return one of the
		acceptable values to inform the kernel which action should be
		taken.

		Usage:
		prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);

		The 'prog' argument is a pointer to a struct sock_fprog which
		will contain the filter program. If the program is invalid, the
		call will return -1 and set errno to EINVAL.

		If fork/clone and execve are allowed by @prog, any child
		processes will be constrained to the same filters and system
		call ABI as the parent.

		Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
		run with CAP_SYS_ADMIN privileges in its namespace. If these are not
		true, -EACCES will be returned. This requirement ensures that filter
		programs cannot be applied to child processes with greater privileges
		than the task that installed them.

		Additionally, if prctl(2) is allowed by the attached filter,
		additional filters may be layered on which will increase evaluation
		time, but allow for further decreasing the attack surface during
		execution of a process.

		The above call returns 0 on success and non-zero on error.
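
The sequence described in this section (set no_new_privs, then attach the
filter) might look like the sketch below. The helper name
install_seccomp_filter is hypothetical and not part of any kernel or libc
API; it assumes the caller has already built a struct sock_fprog.

```c
#include <linux/filter.h>   /* struct sock_fprog */
#include <linux/seccomp.h>  /* SECCOMP_MODE_FILTER */
#include <sys/prctl.h>      /* prctl(), PR_SET_NO_NEW_PRIVS, PR_SET_SECCOMP */

/*
 * Hypothetical helper: attach 'prog' to the calling task.
 * Returns 0 on success, -1 on error with errno set by prctl(2).
 */
int install_seccomp_filter(struct sock_fprog *prog)
{
	/* Needed unless the task has CAP_SYS_ADMIN in its namespace. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;

	/* The kernel validates the program before attaching it. */
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog) != 0)
		return -1;

	return 0;
}
```

Setting no_new_privs first means the call does not depend on the task
holding CAP_SYS_ADMIN. Note that no_new_privs cannot be cleared once set.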

		Return values
		-------------
		A seccomp filter may return any of the following values. If multiple
		filters exist, the return value for the evaluation of a given system
		call will always be the one with the highest precedence. (For example,
		SECCOMP_RET_KILL will always take precedence.)

		In precedence order, they are:

		SECCOMP_RET_KILL:
		Results in the task exiting immediately without executing the
		system call. The exit status of the task (status & 0x7f) will
		be SIGSYS, not SIGKILL.

		SECCOMP_RET_TRAP:
		Results in the kernel sending a SIGSYS signal to the triggering
		task without executing the system call. The kernel will
		roll back the register state to just before the system call
		entry such that a signal handler in the task will be able to
		inspect the ucontext_t->uc_mcontext registers and emulate
		system call success or failure upon return from the signal
		handler.

		The SECCOMP_RET_DATA portion of the return value will be passed
		as si_errno.

		SIGSYS triggered by seccomp will have a si_code of SYS_SECCOMP.

		SECCOMP_RET_ERRNO:
		Results in the lower 16 bits of the return value being passed
		to userland as the errno without executing the system call.

		SECCOMP_RET_TRACE:
		When returned, this value will cause the kernel to attempt to
		notify a ptrace()-based tracer prior to executing the system
		call. If there is no tracer present, -ENOSYS is returned to
		userland and the system call is not executed.

		A tracer will be notified if it requests PTRACE_O_TRACESECCOMP
		using ptrace(PTRACE_SETOPTIONS). The tracer will be notified
		of a PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA portion of
		the BPF program return value will be available to the tracer
		via PTRACE_GETEVENTMSG.

		SECCOMP_RET_ALLOW:
		Results in the system call being executed.

		If multiple filters exist, the return value for the evaluation of a
		given system call will always be the one with the highest precedence.

		Precedence is only determined using the SECCOMP_RET_ACTION mask. When
		multiple filters return values of the same precedence, only the
		SECCOMP_RET_DATA from the most recently installed filter will be
		returned.
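
The precedence rule can be modelled in user space as a small pure function.
This is only a sketch: seccomp_combine is a hypothetical name, and the
comparison assumes that the SECCOMP_RET_* action values in
<linux/seccomp.h> are numerically ordered so that a smaller action value
means higher precedence, matching the list above.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>  /* SECCOMP_RET_* and SECCOMP_RET_ACTION */

/*
 * Hypothetical model: 'rets' holds the values returned by each attached
 * filter, ordered from the most recently installed filter to the oldest.
 * Returns the value that decides the fate of the system call.
 */
uint32_t seccomp_combine(const uint32_t *rets, size_t n)
{
	uint32_t best = SECCOMP_RET_ALLOW;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Only the action bits take part in the comparison. */
		if ((rets[i] & SECCOMP_RET_ACTION) <
		    (best & SECCOMP_RET_ACTION))
			best = rets[i];
	}
	return best;
}
```

Because the comparison is strict, a later entry of equal precedence never
replaces an earlier one, so the SECCOMP_RET_DATA of the most recently
installed filter is the one kept.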

		Pitfalls
		--------

		The biggest pitfall to avoid during use is filtering on system call
		number without checking the architecture value. Why? On any
		architecture that supports multiple system call invocation conventions,
		the system call numbers may vary based on the specific invocation. If
		the numbers in the different calling conventions overlap, then checks in
		the filters may be abused. Always check the arch value!
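
A filter that checks the architecture before looking at the system call
number could be laid out as in the sketch below. The names demo_filter,
demo_prog and DEMO_AUDIT_ARCH are hypothetical, the choice of getppid and
EPERM is arbitrary, and only x86-64 and arm64 are covered here as an
assumption of the example.

```c
#include <errno.h>          /* EPERM */
#include <stddef.h>         /* offsetof() */
#include <linux/audit.h>    /* AUDIT_ARCH_* */
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */
#include <sys/syscall.h>    /* __NR_getppid */

#if defined(__x86_64__)
#define DEMO_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define DEMO_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

struct sock_filter demo_filter[] = {
	/* [0] Load the architecture token. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
	/* [1] Expected arch: skip the kill. Anything else: fall through. */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DEMO_AUDIT_ARCH, 1, 0),
	/* [2] Unexpected calling convention. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
	/* [3] Only now is the system call number meaningful. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* [4] getppid: fall through to [5]. Anything else: jump to [6]. */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
	/* [5] Fail the call with errno = EPERM. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	/* [6] Everything else is allowed. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

struct sock_fprog demo_prog = {
	.len = sizeof(demo_filter) / sizeof(demo_filter[0]),
	.filter = demo_filter,
};
```

A pointer to demo_prog is what would be passed as the 'prog' argument of
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog).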

		Example
		-------

		The samples/seccomp/ directory contains both an x86-specific example
		and a more generic example of a higher level macro interface for BPF
		program generation.



		Adding architecture support
		---------------------------

		See arch/Kconfig for the authoritative requirements. In general, if an
		architecture supports both ptrace_event and seccomp, it will be able to
		support seccomp filter with minor fixup: SIGSYS support and seccomp return
		value checking. Then it must just add CONFIG_HAVE_ARCH_SECCOMP_FILTER
		to its arch-specific Kconfig.

Documentation/security/Smack.txt

+164 −40

		@@ -15,7 +15,7 @@ at hand.

		Smack consists of three major components:
		- The kernel
		- A start-up script and a few modified applications
		- Basic utilities, which are helpful but not required
		- Configuration data

		The kernel component of Smack is implemented as a Linux
		@@ -23,37 +23,28 @@ Security Modules (LSM) module. It requires netlabel and
		works best with file systems that support extended attributes,
		although xattr support is not strictly required.
		It is safe to run a Smack kernel under a "vanilla" distribution.

		Smack kernels use the CIPSO IP option. Some network
		configurations are intolerant of IP options and can impede
		access to systems that use them as Smack does.

		The startup script etc-init.d-smack should be installed
		in /etc/init.d/smack and should be invoked early in the
		start-up process. On Fedora rc5.d/S02smack is recommended.
		This script ensures that certain devices have the correct
		Smack attributes and loads the Smack configuration if
		any is defined. This script invokes two programs that
		ensure configuration data is properly formatted. These
		programs are /usr/sbin/smackload and /usr/sbin/smackcipso.
		The system will run just fine without these programs,
		but it will be difficult to set access rules properly.

		A version of "ls" that provides a "-M" option to display
		Smack labels on long listing is available.
		The current git repositories for Smack user space are:

		A hacked version of sshd that allows network logins by users
		with specific Smack labels is available. This version does
		not work for scp. You must set the /etc/ssh/sshd_config
		line:
		UsePrivilegeSeparation no
		git@gitorious.org:meego-platform-security/smackutil.git
		git@gitorious.org:meego-platform-security/libsmack.git

		The format of /etc/smack/usr is:
		These should make and install on most modern distributions.
		There are three commands included in smackutil:

		username smack
		smackload - properly formats data for writing to /smack/load
		smackcipso - properly formats data for writing to /smack/cipso
		chsmack - display or set Smack extended attribute values

		In keeping with the intent of Smack, configuration data is
		minimal and not strictly required. The most important
		configuration step is mounting the smackfs pseudo filesystem.
		If smackutil is installed the startup script will take care
		of this, but it can be done manually as well.

		Add this line to /etc/fstab:

		@@ -61,19 +52,148 @@ Add this line to /etc/fstab:

		and create the /smack directory for mounting.

		Smack uses extended attributes (xattrs) to store file labels.
		The command to set a Smack label on a file is:
		Smack uses extended attributes (xattrs) to store labels on filesystem
		objects. The attributes are stored in the extended attribute security
		name space. A process must have CAP_MAC_ADMIN to change any of these
		attributes.

		The extended attributes that Smack uses are:

		SMACK64
		Used to make access control decisions. In almost all cases
		the label given to a new filesystem object will be the label
		of the process that created it.
		SMACK64EXEC
		The Smack label of a process that execs a program file with
		this attribute set will run with this attribute's value.
		SMACK64MMAP
		Don't allow the file to be mmapped by a process whose Smack
		label does not allow all of the access permitted to a process
		with the label contained in this attribute. This is a very
		specific use case for shared libraries.
		SMACK64TRANSMUTE
		Can only have the value "TRUE". If this attribute is present
		on a directory when an object is created in the directory and
		the Smack rule (more below) that permitted the write access
		to the directory includes the transmute ("t") mode the object
		gets the label of the directory instead of the label of the
		creating process. If the object being created is a directory
		the SMACK64TRANSMUTE attribute is set as well.
		SMACK64IPIN
		This attribute is only available on file descriptors for sockets.
		Use the Smack label in this attribute for access control
		decisions on packets being delivered to this socket.
		SMACK64IPOUT
		This attribute is only available on file descriptors for sockets.
		Use the Smack label in this attribute for access control
		decisions on packets coming from this socket.

		There are multiple ways to set a Smack label on a file:

		# attr -S -s SMACK64 -V "value" path
		# chsmack -a value path

		NOTE: Smack labels are limited to 23 characters. The attr command
		does not enforce this restriction and can be used to set
		invalid Smack labels on files.
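
A program that sets the label itself can enforce the length limit before
touching the file. The sketch below is one way to do it with setxattr(2);
smack_set_label and SMACK_LABEL_MAX are hypothetical names, and the
attribute name "security.SMACK64" is derived from the statement that the
attributes live in the security name space.

```c
#include <errno.h>
#include <string.h>
#include <sys/xattr.h>      /* setxattr() */

#define SMACK_LABEL_MAX 23  /* limit stated in the note above */

/*
 * Hypothetical helper: set the SMACK64 attribute on 'path'.
 * Returns 0 on success, -1 on error with errno set. The caller is
 * expected to hold CAP_MAC_ADMIN.
 */
int smack_set_label(const char *path, const char *label)
{
	size_t len = strlen(label);

	/* Refuse labels that the attr command would let through. */
	if (len == 0 || len > SMACK_LABEL_MAX) {
		errno = EINVAL;
		return -1;
	}
	return setxattr(path, "security.SMACK64", label, len, 0);
}
```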

		If you don't do anything special all users will get the floor ("_")
		label when they log in. If you do want to log in via the hacked ssh
		at other labels use the attr command to set the smack value on the
		home directory and its contents.
		A process can see the smack label it is running with by
		reading /proc/self/attr/current. A process with CAP_MAC_ADMIN
		can set the process smack by writing there.
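
Reading the current label from a program takes only a few lines. This is a
sketch: read_smack_label is a hypothetical helper, and the path is passed
in by the caller (normally /proc/self/attr/current) so that the function
itself makes no assumption about where procfs is mounted.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the label found at 'attr_path' into 'buf'
 * as a NUL-terminated string. Returns 0 on success, -1 on error.
 */
int read_smack_label(const char *attr_path, char *buf, size_t size)
{
	FILE *f;
	size_t n;
	int failed;

	if (size == 0)
		return -1;

	f = fopen(attr_path, "r");
	if (f == NULL)
		return -1;

	n = fread(buf, 1, size - 1, f);
	failed = ferror(f);
	fclose(f);
	if (failed)
		return -1;

	buf[n] = '\0';
	buf[strcspn(buf, "\n")] = '\0';  /* drop a trailing newline, if any */
	return 0;
}
```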

		Most Smack configuration is accomplished by writing to files
		in the smackfs filesystem. This pseudo-filesystem is usually
		mounted on /smack.

		access
		This interface reports whether a subject with the specified
		Smack label has a particular access to an object with a
		specified Smack label. Write a fixed format access rule to
		this file. The next read will indicate whether the access
		would be permitted. The text will be either "1" indicating
		access, or "0" indicating denial.
		access2
		This interface reports whether a subject with the specified
		Smack label has a particular access to an object with a
		specified Smack label. Write a long format access rule to
		this file. The next read will indicate whether the access
		would be permitted. The text will be either "1" indicating
		access, or "0" indicating denial.
		ambient
		This contains the Smack label applied to unlabeled network
		packets.
		cipso
		This interface allows a specific CIPSO header to be assigned
		to a Smack label. The format accepted on write is:
		"%24s%4d%4d"["%4d"]...
		The first string is a fixed Smack label. The first number is
		the level to use. The second number is the number of categories.
		The following numbers are the categories.
		"level-3-cats-5-19 3 2 5 19"
		cipso2
		This interface allows a specific CIPSO header to be assigned
		to a Smack label. The format accepted on write is:
		"%s%4d%4d"["%4d"]...
		The first string is a long Smack label. The first number is
		the level to use. The second number is the number of categories.
		The following numbers are the categories.
		"level-3-cats-5-19 3 2 5 19"
		direct
		This contains the CIPSO level used for Smack direct label
		representation in network packets.
		doi
		This contains the CIPSO domain of interpretation used in
		network packets.
		load
		This interface allows access control rules in addition to
		the system defined rules to be specified. The format accepted
		on write is:
		"%24s%24s%5s"
		where the first string is the subject label, the second the
		object label, and the third the requested access. The access
		string may contain only the characters "rwxat-", and specifies
		which sort of access is allowed. The "-" is a placeholder for
		permissions that are not allowed. The string "r-x--" would
		specify read and execute access. Labels are limited to 23
		characters in length.
		load2
		This interface allows access control rules in addition to
		the system defined rules to be specified. The format accepted
		on write is:
		"%s %s %s"
		where the first string is the subject label, the second the
		object label, and the third the requested access. The access
		string may contain only the characters "rwxat-", and specifies
		which sort of access is allowed. The "-" is a placeholder for
		permissions that are not allowed. The string "r-x--" would
		specify read and execute access.
		load-self
		This interface allows process specific access rules to be
		defined. These rules are only consulted if access would
		otherwise be permitted, and are intended to provide additional
		restrictions on the process. The format is the same as for
		the load interface.
		load-self2
		This interface allows process specific access rules to be
		defined. These rules are only consulted if access would
		otherwise be permitted, and are intended to provide additional
		restrictions on the process. The format is the same as for
		the load2 interface.
		logging
		This contains the Smack logging state.
		mapped
		This contains the CIPSO level used for Smack mapped label
		representation in network packets.
		netlabel
		This interface allows specific internet addresses to be
		treated as single label hosts. Packets are sent to single
		label hosts without CIPSO headers, but only from processes
		that have Smack write access to the host label. All packets
		received from single label hosts are given the specified
		label. The format accepted on write is:
		"%d.%d.%d.%d label" or "%d.%d.%d.%d/%d label".
		onlycap
		This contains the label processes must have for CAP_MAC_ADMIN
		and CAP_MAC_OVERRIDE to be effective. If this file is empty
		these capabilities are effective for processes with any
		label. The value is set by writing the desired label to the
		file or cleared by writing "-" to the file.
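
As an illustration of the long rule format accepted by load2
("%s %s %s"), a rule string can be assembled as in the sketch below. The
helper name smack_format_rule2 is hypothetical; writing the resulting
string to the load2 file under the smackfs mount point is left to the
caller.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build a load2 rule "subject object access" in
 * 'buf'. Returns the length of the rule, or -1 if it does not fit.
 */
int smack_format_rule2(char *buf, size_t size, const char *subject,
		       const char *object, const char *access)
{
	int n = snprintf(buf, size, "%s %s %s", subject, object, access);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

For example, subject "TopSecret", object "Secret" and access "r-x--"
(labels chosen only for illustration) yield the rule
"TopSecret Secret r-x--".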

		You can add access rules in /etc/smack/accesses. They take the form:

		@@ -83,10 +203,6 @@ access is a combination of the letters rwxa which specify the
		kind of access permitted a subject with subjectlabel on an
		object with objectlabel. If there is no rule no access is allowed.

		A process can see the smack label it is running with by
		reading /proc/self/attr/current. A privileged process can
		set the process smack by writing there.

		Look for additional programs on http://schaufler-ca.com

		From the Smack Whitepaper:
		@@ -186,7 +302,7 @@ team. Smack labels are unstructured, case sensitive, and the only operation
		ever performed on them is comparison for equality. Smack labels cannot
		contain unprintable characters, the "/" (slash), the "\" (backslash), the "'"
		(quote) and '"' (double-quote) characters.
		Smack labels cannot begin with a '-', which is reserved for special options.
		Smack labels cannot begin with a '-'. This is reserved for special options.
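
The character rules above can be checked before a label is handed to the
kernel. The sketch below uses a hypothetical helper name,
smack_label_chars_ok; it also rejects spaces, which this document
disallows in labels elsewhere, and it does not check label length.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if 'label' uses only characters
 * permitted in a Smack label and does not start with '-'.
 */
bool smack_label_chars_ok(const char *label)
{
	const unsigned char *p;

	if (label[0] == '\0' || label[0] == '-')
		return false;

	for (p = (const unsigned char *)label; *p != '\0'; p++) {
		/* isgraph() is false for unprintable characters and space. */
		if (!isgraph(*p))
			return false;
		/* Slash, backslash, quote and double-quote are forbidden. */
		if (strchr("/\\'\"", *p) != NULL)
			return false;
	}
	return true;
}
```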

		There are some predefined labels:

		@@ -194,7 +310,7 @@ There are some predefined labels:
		^ Pronounced "hat", a single circumflex character.
		* Pronounced "star", a single asterisk character.
		? Pronounced "huh", a single question mark character.
		@ Pronounced "Internet", a single at sign character.
		@ Pronounced "web", a single at sign character.

		Every task on a Smack system is assigned a label. System tasks, such as
		init(8) and system daemons, are run with the floor ("_") label. User tasks
		@@ -246,13 +362,14 @@ The format of an access rule is:

		Where subject-label is the Smack label of the task, object-label is the Smack
		label of the thing being accessed, and access is a string specifying the sort
		of access allowed. The Smack labels are limited to 23 characters. The access
		specification is searched for letters that describe access modes:
		of access allowed. The access specification is searched for letters that
		describe access modes:

		a: indicates that append access should be granted.
		r: indicates that read access should be granted.
		w: indicates that write access should be granted.
		x: indicates that execute access should be granted.
		t: indicates that the rule requests transmutation.

		Uppercase values for the specification letters are allowed as well.
		Access mode specifications can be in any order. Examples of acceptable rules
		@@ -273,7 +390,7 @@ Examples of unacceptable rules are:

		Spaces are not allowed in labels. Since a subject always has access to files
		with the same label specifying a rule for that case is pointless. Only
		valid letters (rwxaRWXA) and the dash ('-') character are allowed in
		valid letters (rwxatRWXAT) and the dash ('-') character are allowed in
		access specifications. The dash is a placeholder, so "a-r" is the same
		as "ar". A lone dash is used to specify that no access should be allowed.
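
The rules for access specifications (any order, either case, dash as a
placeholder) can be expressed as a small parser. This is a sketch only:
smack_parse_access and the ACC_* bit values are hypothetical and do not
correspond to the kernel's internal constants.

```c
#include <ctype.h>

/* Hypothetical bit values, chosen for this example only. */
enum {
	ACC_READ      = 1,
	ACC_WRITE     = 2,
	ACC_EXEC      = 4,
	ACC_APPEND    = 8,
	ACC_TRANSMUTE = 16,
};

/*
 * Hypothetical helper: turn an access specification into a bit mask.
 * Returns the mask (0 for a lone dash) or -1 if the specification is
 * empty or contains a character other than rwxat, RWXAT or '-'.
 */
int smack_parse_access(const char *spec)
{
	int mask = 0;

	if (*spec == '\0')
		return -1;

	for (; *spec != '\0'; spec++) {
		switch (tolower((unsigned char)*spec)) {
		case 'r': mask |= ACC_READ; break;
		case 'w': mask |= ACC_WRITE; break;
		case 'x': mask |= ACC_EXEC; break;
		case 'a': mask |= ACC_APPEND; break;
		case 't': mask |= ACC_TRANSMUTE; break;
		case '-': break;  /* placeholder, grants nothing */
		default:  return -1;
		}
	}
	return mask;
}
```

With this parser "a-r" and "ar" produce the same mask, and a lone dash
produces an empty mask, that is, no access.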

		@@ -297,6 +414,13 @@ but not any of its attributes by the circumstance of having read access to the
		containing directory but not to the differently labeled file. This is an
		artifact of the file name being data in the directory, not a part of the file.

		If a directory is marked as transmuting (SMACK64TRANSMUTE=TRUE) and the
		access rule that allows a process to create an object in that directory
		includes 't' access the label assigned to the new object will be that
		of the directory, not the creating process. This makes it much easier
		for two processes with different labels to share data without granting
		access to all of their files.

		IPC objects, message queues, semaphore sets, and memory segments exist in flat
		namespaces and access requests are only required to match the object in
		question.

Documentation/security/Yama.txt

+9 −1

		@@ -34,7 +34,7 @@ parent to a child process (i.e. direct "gdb EXE" and "strace EXE" still
		work), or with CAP_SYS_PTRACE (i.e. "gdb --pid=PID", and "strace -p PID"
		still work as root).

		For software that has defined application-specific relationships
		In mode 1, software that has defined application-specific relationships
		between a debugging process and its inferior (crash handlers, etc),
		prctl(PR_SET_PTRACER, pid, ...) can be used. An inferior can declare which
		other process (and its descendants) are allowed to call PTRACE_ATTACH
		@@ -46,6 +46,8 @@ restrictions, it can call prctl(PR_SET_PTRACER, PR_SET_PTRACER_ANY, ...)
		so that any otherwise allowed process (even those in external pid namespaces)
		may attach.

		These restrictions do not change how ptrace via PTRACE_TRACEME operates.

		The sysctl settings are:

		0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other
		@@ -60,6 +62,12 @@ The sysctl settings are:
		inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare
		an allowed debugger PID to call PTRACE_ATTACH on the inferior.

		2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace
		with PTRACE_ATTACH.

		3 - no attach: no processes may use ptrace with PTRACE_ATTACH. Once set,
		this sysctl cannot be changed to a lower value.
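
The four settings can be summarised as a decision function. The sketch
below is a simplified user-space model, not the kernel's implementation:
the struct, its field names and yama_attach_allowed are hypothetical, and
the classic ptrace checks that still apply in every mode are deliberately
left out.

```c
#include <stdbool.h>

/* Hypothetical description of one PTRACE_ATTACH attempt. */
struct attach_request {
	bool target_is_descendant;      /* inferior descends from the tracer */
	bool target_declared_tracer;    /* inferior used PR_SET_PTRACER for it */
	bool tracer_has_cap_sys_ptrace; /* tracer holds CAP_SYS_PTRACE */
};

/*
 * Hypothetical model: would Yama let this PTRACE_ATTACH through under
 * the given sysctl setting?
 */
bool yama_attach_allowed(int ptrace_scope, const struct attach_request *req)
{
	switch (ptrace_scope) {
	case 0:  /* classic: Yama adds no restriction */
		return true;
	case 1:  /* restricted: relationship, declaration or capability */
		return req->target_is_descendant ||
		       req->target_declared_tracer ||
		       req->tracer_has_cap_sys_ptrace;
	case 2:  /* admin-only attach */
		return req->tracer_has_cap_sys_ptrace;
	default: /* 3, no attach (unknown values are treated the same) */
		return false;
	}
}
```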

		The original children-only logic was based on the restrictions in grsecurity.

		==============================================================

Documentation/security/keys.txt

+17 −0

		@@ -805,6 +805,23 @@ The keyctl syscall functions are:
		kernel and resumes executing userspace.


		(*) Invalidate a key.

		long keyctl(KEYCTL_INVALIDATE, key_serial_t key);

		This function marks a key as being invalidated and then wakes up the
		garbage collector. The garbage collector immediately removes invalidated
		keys from all keyrings and deletes the key when its reference count
		reaches zero.

		Keys that are marked invalidated become invisible to normal key operations
		immediately, though they are still visible in /proc/keys until deleted
		(they're marked with an 'i' flag).

		A process must have search permission on the key for this function to be
		successful.
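
Without a keyutils wrapper, the call can be issued through syscall(2), as
in the sketch below. The helper name invalidate_key is hypothetical, and
the example assumes that <linux/keyctl.h> on the build system already
defines KEYCTL_INVALIDATE.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <linux/keyctl.h>   /* KEYCTL_INVALIDATE */
#include <sys/syscall.h>    /* __NR_keyctl */
#include <unistd.h>         /* syscall() */

/*
 * Hypothetical helper: invalidate the key with serial number 'key'.
 * Returns 0 on success, -1 on error with errno set. The caller needs
 * search permission on the key.
 */
long invalidate_key(int32_t key)
{
	return syscall(__NR_keyctl, KEYCTL_INVALIDATE, key);
}
```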


		===============
		KERNEL SERVICES
		===============

MAINTAINERS

+2 −1

		@@ -1733,6 +1733,7 @@ S: Supported
		F: include/linux/capability.h
		F: security/capability.c
		F: security/commoncap.c
		F: kernel/capability.c

		CELL BROADBAND ENGINE ARCHITECTURE
		M: Arnd Bergmann <arnd@arndb.de>
		@@ -5950,7 +5951,7 @@ SECURITY SUBSYSTEM
		M: James Morris <james.l.morris@oracle.com>
		L: linux-security-module@vger.kernel.org (suggested Cc:)
		T: git git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git
		W: http://security.wiki.kernel.org/
		W: http://kernsec.org/
		S: Supported
		F: security/