Donate to e Foundation | Murena handsets with /e/OS | Own a part of Murena! Learn more

Commit 93c26d7d authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull protection keys syscall interface from Thomas Gleixner:
 "This is the final step of Protection Keys support which adds the
  syscalls so user space can actually allocate keys and protect memory
  areas with them. Details and usage examples can be found in the
  documentation.

  The mm side of this has been acked by Mel"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pkeys: Update documentation
  x86/mm/pkeys: Do not skip PKRU register if debug registers are not used
  x86/pkeys: Fix pkeys build breakage for some non-x86 arches
  x86/pkeys: Add self-tests
  x86/pkeys: Allow configuration of init_pkru
  x86/pkeys: Default to a restrictive init PKRU
  pkeys: Add details of system call use to Documentation/
  generic syscalls: Wire up memory protection keys syscalls
  x86: Wire up protection keys system calls
  x86/pkeys: Allocation/free syscalls
  x86/pkeys: Make mprotect_key() mask off additional vm_flags
  mm: Implement new pkey_mprotect() system call
  x86/pkeys: Add fault handling for PF_PK page fault bit
parents 5fa0eb0b 6679dac5
Loading
Loading
Loading
Loading
+5 −0
Original line number Diff line number Diff line
@@ -1666,6 +1666,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.

	initrd=		[BOOT] Specify the location of the initial ramdisk

	init_pkru=	[x86] Specify the default memory protection keys rights
			register contents for all processes.  0x55555554 by
			default (disallow access to all but pkey 0).  Can
			override in debugfs after boot.

	inport.irq=	[HW] Inport (ATI XL and Microsoft) busmouse driver
			Format: <irq>

+64 −6
Original line number Diff line number Diff line
@@ -18,10 +18,68 @@ even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.

=========================== Config Option ===========================
=========================== Syscalls ===========================

This config option adds approximately 1.5kb of text. and 50 bytes of
data to the executable.  A workload which does large O_DIRECT reads
of holes in XFS files was run to exercise get_user_pages_fast().  No
performance delta was observed with the config option
enabled or disabled.
There are 3 system calls which directly interact with pkeys:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key.  In this example WRPKRU is wrapped by a C function
called pkey_set().

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DENY_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access:

	pkey_set(pkey, 0); // clear PKEY_DENY_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DENY_WRITE); // set PKEY_DENY_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use:

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
 An example implementation can be found in
 tools/testing/selftests/x86/protection_keys.c)

=========================== Behavior ===========================

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this:

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this:

	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like:

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read():

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
+5 −0
Original line number Diff line number Diff line
@@ -78,4 +78,9 @@
#define MAP_HUGE_SHIFT	26
#define MAP_HUGE_MASK	0x3f

#define PKEY_DISABLE_ACCESS	0x1
#define PKEY_DISABLE_WRITE	0x2
#define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
				 PKEY_DISABLE_WRITE)

#endif /* __ALPHA_MMAN_H__ */
+5 −0
Original line number Diff line number Diff line
@@ -105,4 +105,9 @@
#define MAP_HUGE_SHIFT	26
#define MAP_HUGE_MASK	0x3f

#define PKEY_DISABLE_ACCESS	0x1
#define PKEY_DISABLE_WRITE	0x2
#define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
				 PKEY_DISABLE_WRITE)

#endif /* _ASM_MMAN_H */
+5 −0
Original line number Diff line number Diff line
@@ -75,4 +75,9 @@
#define MAP_HUGE_SHIFT	26
#define MAP_HUGE_MASK	0x3f

#define PKEY_DISABLE_ACCESS	0x1
#define PKEY_DISABLE_WRITE	0x2
#define PKEY_ACCESS_MASK	(PKEY_DISABLE_ACCESS |\
				 PKEY_DISABLE_WRITE)

#endif /* __PARISC_MMAN_H__ */
Loading