[PATCH] prctl: require checkpoint_restore_ns_capable for PR_SET_MM_MAP
From: Qi Tang
Date: Thu Apr 02 2026 - 07:16:04 EST
prctl_set_mm_map() allows modifying all mm_struct boundaries and
the saved auxv vector. The individual field path (PR_SET_MM_START_CODE
etc.) correctly requires CAP_SYS_RESOURCE, but the PR_SET_MM_MAP path
dispatches before this check and has no capability requirement of its
own when exe_fd is -1.

This means any unprivileged user on a CONFIG_CHECKPOINT_RESTORE kernel
(nearly all distros) can rewrite mm boundaries including start_brk, brk,
arg_start/end, env_start/end and saved_auxv. Consequences include:

- SELinux PROCESS__EXECHEAP bypass via start_brk manipulation
- procfs info disclosure by pointing arg/env ranges at other memory
- auxv poisoning (AT_SYSINFO_EHDR, AT_BASE, AT_ENTRY)

The original commit f606b77f1a9e ("prctl: PR_SET_MM -- introduce
PR_SET_MM_MAP operation") states "we require the caller to be at least
user-namespace root user", but this was never enforced in the code.

Add a checkpoint_restore_ns_capable() check at the top of
prctl_set_mm_map(), after the PR_SET_MM_MAP_SIZE early return. This
requires CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN in the caller's
user namespace, matching the stated design intent and the existing
check for exe_fd changes.

Fixes: f606b77f1a9e ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
Cc: stable@xxxxxxxxxxxxxxx
Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
Signed-off-by: Qi Tang <tpluszz77@xxxxxxxxx>
---
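Note for testers (not part of the commit message): the rule added by this
patch, "CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN", can be approximated from
userspace by inspecting the CapEff mask in /proc/self/status. The sketch
below is only an illustration under that assumption. The helper names
mask_allows_set_mm_map() and read_cap_eff() are hypothetical and do not
exist in the kernel or in libc, and the real check is done by the kernel
against the caller's user namespace (and may additionally be vetoed by an
LSM), so treat the result as a hint rather than a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Capability numbers as defined in include/uapi/linux/capability.h. */
#define CAP_NR_SYS_ADMIN          21
#define CAP_NR_CHECKPOINT_RESTORE 40

/*
 * Hypothetical helper: applies the "either capability is enough" rule
 * described in the commit message to an effective-capability bitmask.
 */
static bool mask_allows_set_mm_map(uint64_t cap_eff)
{
	return (cap_eff & (UINT64_C(1) << CAP_NR_CHECKPOINT_RESTORE)) ||
	       (cap_eff & (UINT64_C(1) << CAP_NR_SYS_ADMIN));
}

/*
 * Hypothetical helper: reads the CapEff line from /proc/self/status.
 * Returns 0 and stores the mask in *out on success, -1 otherwise.
 */
static int read_cap_eff(uint64_t *out)
{
	char line[256];
	int ret = -1;
	FILE *f;

	assert(out != NULL);

	f = fopen("/proc/self/status", "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		unsigned long long v;

		/* Non-matching lines fail on the literal prefix and yield 0. */
		if (sscanf(line, "CapEff: %llx", &v) == 1) {
			*out = (uint64_t)v;
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}
```

A caller would run read_cap_eff() and pass the mask to
mask_allows_set_mm_map(). A mask containing only CAP_SYS_RESOURCE (bit 24)
yields false, which reflects that the PR_SET_MM_MAP path is gated by a
different capability than the individual-field path.
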
kernel/sys.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sys.c b/kernel/sys.c
index c86eba9aa7e9..2b8c57f23a35 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2071,6 +2071,9 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 		return put_user((unsigned int)sizeof(prctl_map),
 				(unsigned int __user *)addr);
 
+	if (!checkpoint_restore_ns_capable(current_user_ns()))
+		return -EPERM;
+
 	if (data_size != sizeof(prctl_map))
 		return -EINVAL;
 
--
2.43.0