[PATCH 00/10] [v6] System Calls for Memory Protection Keys

From: Dave Hansen
Date: Fri Jul 29 2016 - 12:30:28 EST

This set takes all of the feedback on the last version into
account and simplifies the ABI. It adds one feature: restrictive
'init_pkru' support. I realize it's during the merge window, but
I'm posting so folks who aren't busy with merge window activities
can take a look.

Barring any new issues, I think this is ready to be applied once
4.9 material is being queued.

Folks wishing to run this code can do so on any processor with
the new PKU support in qemu >=2.6. Just boot with -cpu
qemu64,+pku,+xsave, and make sure to apply this patch[1] to qemu.

Changes from v5:
* Removed pkey_set/get() system calls to simplify ABI
* Added 'init_pkru' support to ensure we have a restrictive
PKRU by default.
* Requisite changes to selftests, plus some bugfixes around
stdio in signal handlers


Memory Protection Keys for User pages (pkeys) is a CPU feature
which will first appear on Skylake Servers, but will also be
supported on future non-server parts. It provides a mechanism
for enforcing page-based protections, but without requiring
modification of the page tables when an application wishes to
change permissions.

Among other things, this feature was designed to help fix a class
of bugs in long-running applications where data corruption is
detected long after it occurs. Applications today either live
with the corruption or eat a huge performance penalty from
calling mprotect() frequently. The developers of these
applications are already running this code and are very eager to
see this feature merged and picked up in future distributions
where their customers can use it.

Patches to implement execute-only mapping support using pkeys
were merged into 4.6. But, to do anything more useful with
pkeys, an application needs to be able to set the pkey field in
the PTE (obviously has to be done in-kernel) and make changes to
the "rights" register (using unprivileged instructions).
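To make the "rights" register concrete, here is a minimal sketch of the
bit arithmetic. It assumes the PKRU layout of two bits per key
(access-disable in the low bit, write-disable in the high bit) and
16 keys. The helper names are hypothetical and are not taken from the
patches. It also assumes that a "restrictive" init PKRU means access is
disabled for every key except key 0. A real application would load the
computed value into the register with WRPKRU; this sketch only computes
values, so it runs on hardware without PKU.

```c
#include <stdint.h>

#define PKRU_AD_BIT        0x1u /* access-disable bit within a key's field */
#define PKRU_WD_BIT        0x2u /* write-disable bit within a key's field */
#define PKRU_BITS_PER_PKEY 2    /* assumed layout: 2 bits per key */
#define NR_PKEYS           16   /* assumed number of keys */

/*
 * Hypothetical helper: return 'pkru' with the two rights bits for
 * 'pkey' replaced by 'rights'. The caller would write the result
 * to the register with WRPKRU.
 */
uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
	unsigned int shift = (unsigned int)pkey * PKRU_BITS_PER_PKEY;

	pkru &= ~(0x3u << shift);         /* clear the old rights */
	pkru |= (rights & 0x3u) << shift; /* install the new rights */
	return pkru;
}

/*
 * Hypothetical helper: build a restrictive value that leaves key 0
 * unrestricted and disables access for keys 1..15.
 */
uint32_t pkru_restrictive_default(void)
{
	uint32_t pkru = 0;
	int pkey;

	for (pkey = 1; pkey < NR_PKEYS; pkey++)
		pkru = pkru_set_rights(pkru, pkey, PKRU_AD_BIT);
	return pkru;
}
```

For example, write-protecting key 1 starting from an all-zero register
sets only bit 3, since key 1's field occupies bits 2-3.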

An application also needs to have an allocator for the keys
themselves. If two different parts of an application both want
to protect their data with pkeys, they first need to know which
key to use for their individual purposes.

This set introduces 3 system calls:

sys_pkey_mprotect(): apply a pkey to a range of memory (patches #1-3)
sys_pkey_alloc(): ask the kernel for a free pkey (patch #4)
sys_pkey_free(): the reverse of alloc (patch #4)
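As a rough illustration of how the three calls fit together, here is a
userspace sketch that uses raw syscall(2). It assumes the installed
kernel headers define SYS_pkey_alloc, SYS_pkey_mprotect and
SYS_pkey_free, and that the arguments are (flags, init_val),
(addr, len, prot, pkey) and (pkey) respectively. The function name
pkey_demo() and its return convention are made up for this example.
Any pkey_alloc() failure is treated as "pkeys unavailable", so it
degrades gracefully on kernels or CPUs without support.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical demo: tag one anonymous page with a freshly allocated
 * pkey, then release everything.
 *
 * Returns 0 if the page was tagged, 1 if no pkey could be allocated
 * (for example no kernel or CPU support), -1 on any other failure.
 */
int pkey_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	long pkey;
	int ret = 0;
	void *buf;

	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* flags = 0, initial rights = 0 (no restrictions on the new key) */
	pkey = syscall(SYS_pkey_alloc, 0UL, 0UL);
	if (pkey < 0) {
		fprintf(stderr, "pkey_alloc: %s\n", strerror(errno));
		munmap(buf, (size_t)page);
		return 1;
	}

	/* Same as mprotect(), but also sets the pkey for the range. */
	if (syscall(SYS_pkey_mprotect, buf, (size_t)page,
		    (unsigned long)(PROT_READ | PROT_WRITE), pkey) != 0) {
		fprintf(stderr, "pkey_mprotect: %s\n", strerror(errno));
		ret = -1;
	}

	/* Unmap before freeing so no mapping still carries the key. */
	munmap(buf, (size_t)page);
	if (syscall(SYS_pkey_free, pkey) != 0)
		ret = -1;

	return ret;
}
```

Once the page is tagged, access to it is governed by that key's bits in
the rights register, so permissions can be flipped without another
system call.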

I have manpages written for these syscalls, and have had multiple
rounds of reviews on the manpages list. I have not revised them
to remove pkey_get/set(), but will once this is merged in -tip.

This set is also available here:

git://git.kernel.org/pub/scm/linux/kernel/git/daveh/x86-pkeys.git pkeys-v040

I've written a set of unit tests for these interfaces, which is
available as the last patch in the series and integrated in to

=== diffstat ===

Dave Hansen (10):
x86, pkeys: add fault handling for PF_PK page fault bit
mm: implement new pkey_mprotect() system call
x86, pkeys: make mprotect_key() mask off additional vm_flags
x86, pkeys: allocation/free syscalls
x86: wire up protection keys system calls
generic syscalls: wire up memory protection keys syscalls
pkeys: add details of system call use to Documentation/
x86, pkeys: default to a restrictive init PKRU
x86, pkeys: allow configuration of init_pkru
x86, pkeys: add self-tests

Documentation/kernel-parameters.txt | 5 +
Documentation/x86/protection-keys.txt | 63 +
arch/alpha/include/uapi/asm/mman.h | 5 +
arch/mips/include/uapi/asm/mman.h | 5 +
arch/parisc/include/uapi/asm/mman.h | 5 +
arch/x86/entry/syscalls/syscall_32.tbl | 5 +
arch/x86/entry/syscalls/syscall_64.tbl | 5 +
arch/x86/include/asm/mmu.h | 8 +
arch/x86/include/asm/mmu_context.h | 25 +-
arch/x86/include/asm/pkeys.h | 73 +-
arch/x86/kernel/fpu/core.c | 4 +
arch/x86/kernel/fpu/xstate.c | 5 +-
arch/x86/mm/fault.c | 9 +
arch/x86/mm/pkeys.c | 143 +-
arch/xtensa/include/uapi/asm/mman.h | 5 +
include/linux/pkeys.h | 41 +-
include/linux/syscalls.h | 8 +
include/uapi/asm-generic/mman-common.h | 5 +
include/uapi/asm-generic/unistd.h | 12 +-
mm/mprotect.c | 90 +-
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/pkey-helpers.h | 219 +++
tools/testing/selftests/x86/protection_keys.c | 1411 +++++++++++++++++
23 files changed, 2116 insertions(+), 38 deletions(-)

=== changelog ===

Changes from v5:
* remove sys_pkey_get/set() to simplify the ABI. There was
concern they could not be easily vsyscall-accelerated.
* Added 'init_pkru' support to ensure we have a restrictive
PKRU by default.
* Requisite changes to selftests, plus some bugfixes around
stdio in signal handlers

Changes from v4:
* removed validate_pkey(). It was redundant with the work we do
in mm_pkey_alloc() and all of the mm_pkey_is_allocated() checks.
* reorder patches to wait to wire up any syscalls until the end.
* make allocation map functions explicitly use unsigned masks
* some tweaks to changelog (and associated manpages)

Changes from v3:
* added generic syscalls declarations to include/linux/syscalls.h
to fix arm64 compile issue.

Changes from v2:
* selftest updates:
* formatting changes like what Ingo asked for with MPX
* actually call WRPKRU in __wrpkru()
* once __wrpkru() was fixed, revealed a bug in the ptrace
test where we were testing against the wrong pointer during
the "baseline" test
* Man-pages that match this set are here:

Changes from v1:
* updates to alloc/free patch description calling out that
"in-use" pkeys may still be pkey_free()'d successfully.
* Fixed a bug in the selftest where the 'flags' argument was
not passed to pkey_get().
* Added all syscalls to generic syscalls header
* Added extra checking to selftests so it doesn't fall over
when 1G pages are made the hugetlbfs default.

1. http://lists.nongnu.org/archive/html/qemu-devel/2016-07/msg04774.html

Cc: linux-api@xxxxxxxxxxxxxxx
Cc: linux-arch@xxxxxxxxxxxxxxx
Cc: linux-mm@xxxxxxxxx
Cc: x86@xxxxxxxxxx
Cc: torvalds@xxxxxxxxxxxxxxxxxxxx
Cc: akpm@xxxxxxxxxxxxxxxxxxxx
Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: mgorman@xxxxxxxxxxxxxxxxxxx
Cc: Dave Hansen (Intel) <dave.hansen@xxxxxxxxx>