[RFC PATCH v4 0/2] Providing mount in memfd_restricted() syscall

From: Ackerley Tng
Date: Mon Apr 10 2023 - 21:29:43 EST


Hello,

This patchset builds upon the memfd_restricted() system call that was
discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch
series, at
https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@xxxxxxxxxxxxxxx/T/

The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/restrictedmem-provide-mount-fd-rfc-v4

This patchset proposes a modification to the memfd_restricted() syscall
that allows userspace to provide a mount, on which the restrictedmem
file will be created and returned by memfd_restricted().

Providing a mount lets userspace control various memory binding
policies via tmpfs mount options, such as the Transparent HugePage
memory allocation policy through 'huge=always/never' and the NUMA
memory allocation policy through 'mpol=local/bind:*'.
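
As a rough illustration of the above, the following C sketch mounts a
tmpfs with the chosen options, opens the root of that mount, and passes
the resulting fd to memfd_restricted(). It is not taken from the
patches: build_tmpfs_opts() and restrictedmem_on_tmpfs() are
hypothetical helper names, the syscall number and the value of
MEMFD_RSTD_USERMNT are supplied by the caller because they are only
defined in the patched tree, and the (flags, mount_fd) argument order
is an assumption about the proposed interface.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: build a tmpfs option string such as
 * "huge=always,mpol=bind:0". Either policy may be NULL to leave it at
 * the tmpfs default. Returns the string length, or -1 if it does not
 * fit in buf.
 */
static int build_tmpfs_opts(char *buf, size_t len,
			    const char *huge, const char *mpol)
{
	int n;

	if (huge && mpol)
		n = snprintf(buf, len, "huge=%s,mpol=%s", huge, mpol);
	else if (huge)
		n = snprintf(buf, len, "huge=%s", huge);
	else if (mpol)
		n = snprintf(buf, len, "mpol=%s", mpol);
	else
		n = snprintf(buf, len, "%s", "");

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Hypothetical helper: mount tmpfs at mnt_path with opts, then ask for
 * a restrictedmem file on that mount.
 *
 * ASSUMPTIONS: nr_memfd_restricted and usermnt_flag must come from the
 * patched kernel's headers, and the syscall is assumed to take
 * (flags, mount_fd). Returns the restrictedmem fd, or -1 with errno
 * set. The tmpfs mount is left in place; the caller unmounts it.
 */
static int restrictedmem_on_tmpfs(const char *mnt_path, const char *opts,
				  long nr_memfd_restricted,
				  unsigned int usermnt_flag)
{
	int mfd, rfd, saved_errno;

	if (mount("tmpfs", mnt_path, "tmpfs", 0, opts) < 0)
		return -1;

	/* The fd must refer to the root of the tmpfs mount. */
	mfd = open(mnt_path, O_RDONLY | O_DIRECTORY);
	if (mfd < 0)
		return -1;

	rfd = (int)syscall(nr_memfd_restricted, usermnt_flag, mfd);

	/* Preserve the syscall's errno across close(). */
	saved_errno = errno;
	close(mfd);
	errno = saved_errno;

	return rfd;
}
```

On a kernel carrying this series, a privileged caller might build
"huge=always,mpol=bind:0" with build_tmpfs_opts() and pass it to
restrictedmem_on_tmpfs() together with the syscall number and
MEMFD_RSTD_USERMNT value from the patched headers.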

Changes since RFCv3:
+ Added check that a provided bind mount is a bind mount of the whole
filesystem
+ Removed inappropriate check on fd’s permissions as Christian
suggested
+ Renamed RMFD_USERMNT to MEMFD_RSTD_USERMNT as David suggested
+ Added selftest to check that only bind mounts of the whole
filesystem are accepted

Changes since RFCv2:
+ Tightened semantics to accept only fds of the root of a tmpfs mount,
as Christian suggested
+ Added permissions check on the inode represented by the fd to guard
against creation of restrictedmem files on read-only tmpfs
filesystems or mounts
+ Renamed RMFD_TMPFILE to RMFD_USERMNT to better reflect that userspace
provides the mount on which the restrictedmem file is created
+ Updated selftests for tighter semantics and added selftests to check
for permissions
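
The tightened semantics above are enforced by the kernel. A caller that
wants an earlier error could pre-check the fd from userspace; the
sketch below uses fstatfs() and TMPFS_MAGIC to test only whether the fd
is on tmpfs. fd_is_on_tmpfs() is a hypothetical helper that is not part
of the patches, and it does not verify that the fd is the root of the
mount or that the mount is writable.

```c
#include <linux/magic.h>
#include <sys/vfs.h>

/*
 * Hypothetical userspace pre-check, not the kernel's own validation.
 * Returns 1 if fd lives on a tmpfs filesystem, 0 if it does not, and
 * -1 (with errno set by fstatfs()) on error.
 */
static int fd_is_on_tmpfs(int fd)
{
	struct statfs sfs;

	if (fstatfs(fd, &sfs) < 0)
		return -1;

	return sfs.f_type == TMPFS_MAGIC;
}
```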

Changes since RFCv1:
+ Use an fd to represent the mount instead of a path string, as Kirill
suggested. I believe using fds aligns this syscall interface more
closely with other syscalls like fsopen(), fsconfig(), and fsmount(),
which also use and pass around fds
+ Remove unused variable char *orig_shmem_enabled from selftests
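
To illustrate how an fd can stand in for a path, here is a minimal
sketch that obtains an fd for a detached mount using fsopen(),
fsconfig(), and fsmount() through raw syscall(2) invocations. It is an
illustration only, not code from this series: detached_mount_fd() is a
hypothetical helper, it sets a single string option, and it assumes
kernel headers recent enough to define SYS_fsopen and the FSCONFIG_*
constants.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a new filesystem context of type fstype,
 * set one string option (for example key "huge", value "always"), and
 * return an fd for the resulting detached mount.
 * Returns -1 with errno set on failure.
 */
static int detached_mount_fd(const char *fstype, const char *key,
			     const char *value)
{
	int fsfd, mfd, saved_errno;

	fsfd = (int)syscall(SYS_fsopen, fstype, FSOPEN_CLOEXEC);
	if (fsfd < 0)
		return -1;

	/* Configure the option, then create the superblock. */
	if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING, key, value, 0) < 0 ||
	    syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0) {
		saved_errno = errno;
		close(fsfd);
		errno = saved_errno;
		return -1;
	}

	/* The mount is not attached anywhere; it is referenced by fd only. */
	mfd = (int)syscall(SYS_fsmount, fsfd, FSMOUNT_CLOEXEC, 0);

	saved_errno = errno;
	close(fsfd);
	errno = saved_errno;

	return mfd;
}
```

A privileged caller could pass the fd returned by
detached_mount_fd("tmpfs", "huge", "always") wherever a mount fd is
expected, without the mount ever appearing in the filesystem tree.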

Dependencies:
+ Chao’s work on UPM, at
https://github.com/chao-p/linux/commits/privmem-v11.5

Links to earlier patch series:
+ RFC v3: https://lore.kernel.org/lkml/cover.1680306489.git.ackerleytng@xxxxxxxxxx/T/
+ RFC v2: https://lore.kernel.org/lkml/cover.1679428901.git.ackerleytng@xxxxxxxxxx/T/
+ RFC v1: https://lore.kernel.org/lkml/cover.1676507663.git.ackerleytng@xxxxxxxxxx/T/

Ackerley Tng (2):
  mm: restrictedmem: Allow userspace to specify mount for
    memfd_restricted
  selftests: restrictedmem: Check memfd_restricted()'s handling of
    provided userspace mount

 include/linux/syscalls.h                    |   2 +-
 include/uapi/linux/restrictedmem.h          |   8 +
 mm/restrictedmem.c                          |  73 ++-
 tools/testing/selftests/mm/.gitignore       |   1 +
 tools/testing/selftests/mm/Makefile         |   1 +
 .../selftests/mm/memfd_restricted_usermnt.c | 529 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh   |   3 +
 7 files changed, 611 insertions(+), 6 deletions(-)
 create mode 100644 include/uapi/linux/restrictedmem.h
 create mode 100644 tools/testing/selftests/mm/memfd_restricted_usermnt.c

--
2.40.0.577.gac1e443424-goog