Re: [PATCH] prctl: require checkpoint_restore_ns_capable for PR_SET_MM_MAP
From: Lorenzo Stoakes (Oracle)
Date: Thu Apr 02 2026 - 09:06:52 EST
On Thu, Apr 02, 2026 at 07:13:32PM +0800, Qi Tang wrote:
> prctl_set_mm_map() allows modifying all mm_struct boundaries and
> the saved auxv vector. The individual field path (PR_SET_MM_START_CODE
> etc.) correctly requires CAP_SYS_RESOURCE, but the PR_SET_MM_MAP path
> dispatches before this check and has no capability requirement of its
> own when exe_fd is -1.
>
> This means any unprivileged user on a CONFIG_CHECKPOINT_RESTORE kernel
> (nearly all distros) can rewrite mm boundaries including start_brk, brk,
> arg_start/end, env_start/end and saved_auxv. Consequences include:
>
> - SELinux PROCESS__EXECHEAP bypass via start_brk manipulation
> - procfs info disclosure by pointing arg/env ranges at other memory
> - auxv poisoning (AT_SYSINFO_EHDR, AT_BASE, AT_ENTRY)
>
> The original commit f606b77f1a9e ("prctl: PR_SET_MM -- introduce
> PR_SET_MM_MAP operation") states "we require the caller to be at least
> user-namespace root user", but this was never enforced in the code.
>
> Add a checkpoint_restore_ns_capable() check at the top of
> prctl_set_mm_map(), after the PR_SET_MM_MAP_SIZE early return. This
> requires CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN in the caller's
> user namespace, matching the stated design intent and the existing
> check for exe_fd changes.
>
> Fixes: f606b77f1a9e ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
We've had a gaping security hole since 2014 and nobody noticed? I find it
hard to believe.
> Cc: stable@xxxxxxxxxxxxxxx
> Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> Signed-off-by: Qi Tang <tpluszz77@xxxxxxxxx>
> ---
> kernel/sys.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index c86eba9aa7e9..2b8c57f23a35 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2071,6 +2071,9 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> return put_user((unsigned int)sizeof(prctl_map),
> (unsigned int __user *)addr);
>
> + if (!checkpoint_restore_ns_capable(current_user_ns()))
> + return -EPERM;
Hmm there is already:
	if (prctl_map.exe_fd != (u32)-1) {
		/*
		 * Check if the current user is checkpoint/restore capable.
		 * At the time of this writing, it checks for CAP_SYS_ADMIN
		 * or CAP_CHECKPOINT_RESTORE.
		 * Note that a user with access to ptrace can masquerade an
		 * arbitrary program as any executable, even setuid ones.
		 * This may have implications in the tomoyo subsystem.
		 */
		if (!checkpoint_restore_ns_capable(current_user_ns()))
			return -EPERM;
And you're proposing _adding_ this check on top of that? Seems super
redundant.
But also, this seems super-specific, buuut... then again, there's an #ifdef
CONFIG_CHECKPOINT_RESTORE around this. Ugh.
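To make the overlap between the two checks concrete, here is a rough userspace truth table, not kernel code. The names gate(), capable and patched are made up for illustration; capable stands in for the result of checkpoint_restore_ns_capable(current_user_ns()).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the permission gate only. 'patched' selects
 * whether the proposed top-level check is present.
 */
static int gate(bool capable, uint32_t exe_fd, bool patched)
{
	/* The check the patch adds: applies to every PR_SET_MM_MAP call. */
	if (patched && !capable)
		return -EPERM;

	/* The existing check: only reached when an exe_fd is supplied. */
	if (exe_fd != (uint32_t)-1 && !capable)
		return -EPERM;

	return 0;
}
```

In this model the inner check can never fire once the outer one is present, and the only behavioural difference between the two variants is the exe_fd == -1 case for a caller without the capability.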
I _hate_ this interface. HATE HATE HATE it.
Anyway, does updating _your own_ auxv really require elevated permissions
like this?
I don't think so? Couldn't you go and manipulate that anyway without
elevated anything?
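As a rough illustration of what I mean, a process can already scribble on the auxv copy that sits on its own initial stack with plain stores. This is a minimal sketch assuming Linux with glibc (ElfW() comes from <link.h>); stack_auxv() and poke_auxv() are made-up helper names. Note it only touches the userspace copy; as far as I'm aware /proc/<pid>/auxv is served from mm->saved_auxv, which this does not change.

```c
#include <elf.h>
#include <link.h>      /* ElfW(), glibc-specific */
#include <stddef.h>
#include <sys/auxv.h>  /* getauxval(), used in the usage note */

extern char **environ;

/*
 * Locate the auxv copy on the initial stack: it follows envp's
 * terminating NULL. Assumes environ still points at the original
 * envp block, i.e. nothing has reallocated it via setenv()/putenv().
 */
static ElfW(auxv_t) *stack_auxv(void)
{
	char **p = environ;

	while (*p)
		p++;
	return (ElfW(auxv_t) *)(p + 1);
}

/*
 * Overwrite the value of the first entry of the given type.
 * Returns 0 on success, -1 if no such entry exists.
 */
static int poke_auxv(unsigned long type, unsigned long val)
{
	for (ElfW(auxv_t) *a = stack_auxv(); a->a_type != AT_NULL; a++) {
		if (a->a_type == type) {
			a->a_un.a_val = val;
			return 0;
		}
	}
	return -1;
}
```

Something like poke_auxv(AT_PAGESZ, getauxval(AT_PAGESZ)) rewrites an entry in place with its current value, with no capability involved. Whether later getauxval() calls observe a changed value depends on the libc.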
> +
> if (data_size != sizeof(prctl_map))
> return -EINVAL;
>
> --
> 2.43.0
>
This all seems unnecessary and, in fact, would surely break userspace? Am I
missing something here?
Thanks, Lorenzo