[PATCH v5 0/5] userfaultfd: add /dev/userfaultfd for fine grained access control

From: Axel Rasmussen
Date: Mon Aug 08 2022 - 13:56:30 EST


This series is based on torvalds/master.

The series is split up like so:
- Patch 1 is a simple fixup which we should take in any case (even by itself).
- Patches 2-5 add the feature, configurable selftest support, and docs.

Why not ...?
============

- Why not /proc/[pid]/userfaultfd? Two main points (additional discussion [1]):

- /proc/[pid]/* files are all owned by the user/group of the process, and
they don't really support chmod/chown. So, without extending procfs it
doesn't solve the problem this series is trying to solve.

- The main argument *for* this was to support creating UFFDs for remote
processes. But, that use case clearly calls for CAP_SYS_PTRACE, so to
support this we could just use the UFFD syscall as-is.

- Why not use a syscall? Access to syscalls is generally controlled by
capabilities. We don't have a capability which is used for userfaultfd access
without also granting more / other permissions as well, and adding a new
capability was rejected [2].

- It's possible a LSM could be used to control access instead, but I have
some concerns. I don't think this approach would be as easy to use,
particularly if we were to try to solve this with something heavyweight
like SELinux. Maybe we could pursue adding a new LSM specifically for
this user case, but it may be too narrow of a case to justify that.

Changelog
=========

v4->v5:
- Call userfaultfd_syscall_allowed() directly in the syscall, so we don't
have to plumb a flag into new_userfaultfd(). [Nadav]
- Refactored run_vmtests.sh to loop over UFFD test mods. [Nadav]
- Reworded cover letter.
- Picked up some Acked-by's.

v3->v4:
- Picked up an Acked-by on 5/5.
- Updated cover letter to cover "why not ...".
- Refactored userfaultfd_allowed() into userfaultfd_syscall_allowed(). [Peter]
- Removed obsolete comment from a previous version. [Peter]
- Refactored userfaultfd_open() in selftest. [Peter]
- Reworded admin-guide documentation. [Mike, Peter]
- Squashed 2 commits adding /dev/userfaultfd to selftest and making selftest
configurable. [Peter]
- Added "syscall" test modifier (the default behavior) to selftest. [Peter]

v2->v3:
- Rebased onto linux-next/akpm-base, in order to be based on top of the
run_vmtests.sh refactor which was merged previously.
- Picked up some Reviewed-by's.
- Fixed ioctl definition (_IO instead of _IOWR), and stopped using
compat_ptr_ioctl since it is unneeded for ioctls which don't take a pointer.
- Removed the "handle_kernel_faults" bool, simplifying the code. The result is
logically equivalent, but simpler.
- Fixed userfaultfd selftest so it returns KSFT_SKIP appropriately.
- Reworded documentation per Shuah's feedback on v2.
- Improved example usage for userfaultfd selftest.

v1->v2:
- Add documentation update.
- Test *both* userfaultfd(2) and /dev/userfaultfd via the selftest.

[1]: https://patchwork.kernel.org/project/linux-mm/cover/20220719195628.3415852-1-axelrasmussen@xxxxxxxxxx/
[2]: https://lore.kernel.org/lkml/686276b9-4530-2045-6bd8-170e5943abe4@xxxxxxxxxxxxxxxx/T/

Axel Rasmussen (5):
selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh
userfaultfd: add /dev/userfaultfd for fine grained access control
userfaultfd: selftests: modify selftest to use /dev/userfaultfd
userfaultfd: update documentation to describe /dev/userfaultfd
selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh

Documentation/admin-guide/mm/userfaultfd.rst | 41 ++++++++++-
Documentation/admin-guide/sysctl/vm.rst | 3 +
fs/userfaultfd.c | 73 +++++++++++++++-----
include/uapi/linux/userfaultfd.h | 4 ++
tools/testing/selftests/vm/run_vmtests.sh | 15 ++--
tools/testing/selftests/vm/userfaultfd.c | 69 +++++++++++++++---
6 files changed, 172 insertions(+), 33 deletions(-)

--
2.37.1.559.g78731f0fdb-goog