Re: [PATCH] prctl: require checkpoint_restore_ns_capable for PR_SET_MM_MAP

From: Andrew Morton

Date: Thu Apr 02 2026 - 13:50:05 EST


On Thu, 2 Apr 2026 19:13:32 +0800 Qi Tang <tpluszz77@xxxxxxxxx> wrote:

> prctl_set_mm_map() allows modifying all mm_struct boundaries and
> the saved auxv vector. The individual field path (PR_SET_MM_START_CODE
> etc.) correctly requires CAP_SYS_RESOURCE, but the PR_SET_MM_MAP path
> dispatches before this check and has no capability requirement of its
> own when exe_fd is -1.
>
> This means any unprivileged user on a CONFIG_CHECKPOINT_RESTORE kernel
> (nearly all distros) can rewrite mm boundaries including start_brk, brk,
> arg_start/end, env_start/end and saved_auxv. Consequences include:
>
> - SELinux PROCESS__EXECHEAP bypass via start_brk manipulation
> - procfs info disclosure by pointing arg/env ranges at other memory
> - auxv poisoning (AT_SYSINFO_EHDR, AT_BASE, AT_ENTRY)
>
> The original commit f606b77f1a9e ("prctl: PR_SET_MM -- introduce
> PR_SET_MM_MAP operation") states "we require the caller to be at least
> user-namespace root user", but this was never enforced in the code.
>
> Add a checkpoint_restore_ns_capable() check at the top of
> prctl_set_mm_map(), after the PR_SET_MM_MAP_SIZE early return. This
> requires CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN in the caller's
> user namespace, matching the stated design intent and the existing
> check for exe_fd changes.
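
For readers following along, the ordering described above can be
sketched as a small userspace model. This is not the patch itself:
the enum, the struct and the model_* functions are hypothetical
stand-ins, and the -EPERM return value is an assumption. Only the
order of the checks (size query first, capability check second) and
the "CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN" rule come from the
description.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two prctl sub-operations discussed above. */
enum mm_map_opt { OPT_MM_MAP, OPT_MM_MAP_SIZE };

/* Hypothetical model of the caller's capabilities in its user namespace. */
struct caller {
	bool cap_checkpoint_restore;
	bool cap_sys_admin;
};

/* Models checkpoint_restore_ns_capable(): either capability is sufficient. */
static bool model_cr_ns_capable(const struct caller *c)
{
	return c->cap_checkpoint_restore || c->cap_sys_admin;
}

/*
 * Models the proposed ordering in prctl_set_mm_map(): the size query
 * returns early and stays unprivileged, everything else is gated on
 * the capability check. The -EPERM value is an assumption.
 */
static int model_prctl_set_mm_map(enum mm_map_opt opt, const struct caller *c)
{
	assert(c != NULL);

	if (opt == OPT_MM_MAP_SIZE)
		return 0;	/* early return, no capability needed */

	if (!model_cr_ns_capable(c))
		return -EPERM;	/* new check: unprivileged callers stop here */

	return 0;		/* map validation and the mm update would follow */
}
```

In this model a caller with neither capability can still query the
map size but gets -EPERM for the map update, while a caller holding
either capability passes the gate.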

Thanks.

AI review claims to have found a couple of things:
https://sashiko.dev/#/patchset/20260402111332.55957-1-tpluszz77@xxxxxxxxx