Re: [PATCH] prctl: require checkpoint_restore_ns_capable for PR_SET_MM_MAP

From: David Hildenbrand (Arm)

Date: Thu Apr 02 2026 - 09:55:42 EST


On 4/2/26 15:06, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Apr 02, 2026 at 07:13:32PM +0800, Qi Tang wrote:
>> prctl_set_mm_map() allows modifying all mm_struct boundaries and
>> the saved auxv vector. The individual field path (PR_SET_MM_START_CODE
>> etc.) correctly requires CAP_SYS_RESOURCE, but the PR_SET_MM_MAP path
>> dispatches before this check and has no capability requirement of its
>> own when exe_fd is -1.
>>
>> This means any unprivileged user on a CONFIG_CHECKPOINT_RESTORE kernel
>> (nearly all distros) can rewrite mm boundaries including start_brk, brk,
>> arg_start/end, env_start/end and saved_auxv. Consequences include:
>>
>> - SELinux PROCESS__EXECHEAP bypass via start_brk manipulation
>> - procfs info disclosure by pointing arg/env ranges at other memory
>> - auxv poisoning (AT_SYSINFO_EHDR, AT_BASE, AT_ENTRY)
>>
>> The original commit f606b77f1a9e ("prctl: PR_SET_MM -- introduce
>> PR_SET_MM_MAP operation") states "we require the caller to be at least
>> user-namespace root user", but this was never enforced in the code.
>>
>> Add a checkpoint_restore_ns_capable() check at the top of
>> prctl_set_mm_map(), after the PR_SET_MM_MAP_SIZE early return. This
>> requires CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN in the caller's
>> user namespace, matching the stated design intent and the existing
>> check for exe_fd changes.
>>
>> Fixes: f606b77f1a9e ("prctl: PR_SET_MM -- introduce PR_SET_MM_MAP operation")
>
> We've had a gaping security hole since 2014 and nobody noticed? I find it
> hard to believe.
>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
>> Signed-off-by: Qi Tang <tpluszz77@xxxxxxxxx>
>> ---
>> kernel/sys.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index c86eba9aa7e9..2b8c57f23a35 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -2071,6 +2071,9 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>> return put_user((unsigned int)sizeof(prctl_map),
>> (unsigned int __user *)addr);
>>
>> + if (!checkpoint_restore_ns_capable(current_user_ns()))
>> + return -EPERM;
>
> Hmm there is already:
>
> if (prctl_map.exe_fd != (u32)-1) {
> /*
> * Check if the current user is checkpoint/restore capable.
> * At the time of this writing, it checks for CAP_SYS_ADMIN
> * or CAP_CHECKPOINT_RESTORE.
> * Note that a user with access to ptrace can masquerade an
> * arbitrary program as any executable, even setuid ones.
> * This may have implications in the tomoyo subsystem.
> */
> if (!checkpoint_restore_ns_capable(current_user_ns()))
> return -EPERM;
>
> And you're proposing _adding_ this check on top of that? Seems super
> redundant.

Yes, should be moved.
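
To spell out what "moved" would mean here, below is a rough userspace model, not kernel code. The function names, the NO_EXE_FD macro and the boolean "capable" argument (standing in for the result of checkpoint_restore_ns_capable()) are all made up for illustration. It assumes the check is hoisted so that it runs for every PR_SET_MM_MAP request rather than only when exe_fd is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for "no exe_fd supplied" in struct prctl_mm_map. */
#define NO_EXE_FD ((uint32_t)-1)

/*
 * Model of the current behaviour: the capability is only consulted
 * when the caller also asks to change the exe link.
 */
static int set_mm_map_current(bool capable, uint32_t exe_fd)
{
	if (exe_fd != NO_EXE_FD && !capable)
		return -EPERM;
	return 0;
}

/*
 * Model of the check moved to the top: every PR_SET_MM_MAP request is
 * gated, so the exe_fd branch no longer needs its own copy.
 */
static int set_mm_map_moved(bool capable, uint32_t exe_fd)
{
	if (!capable)
		return -EPERM;
	(void)exe_fd;	/* exe link handling would follow here */
	return 0;
}
```

With exe_fd left at -1, the first model lets an incapable caller through and the second returns -EPERM, which is the behaviour change under discussion.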

>
> but also, this seems super-specific buuut... Then again #ifdef
> CONFIG_CHECKPOINT_RESTORE around this. Ugh.
>
> I _hate_ this interface. HATE HATE HATE it.
>
> Anyway, does updating _your own_ auxv really require elevated permissions
> like this?
>
> I don't think so? Couldn't you go and manipulate that anyway without
> elevated anything?

Hard to believe ...

I was wondering whether this could break some users. At least the CRIU
documentation states:

This option tells *criu* to accept the limitations when running
as non-root. Running as non-root requires *criu* at least to have
*CAP_SYS_ADMIN* or *CAP_CHECKPOINT_RESTORE*. For details about
running *criu* as non-root please consult the *NON-ROOT* section.

I mean, the check makes sense given that prctl_set_mm() rejects all
these operations without CAP_SYS_RESOURCE.


CAP_CHECKPOINT_RESTORE did not exist until

commit 124ea650d3072b005457faed69909221c2905a1f
Author: Adrian Reber <areber@xxxxxxxxxx>
Date: Sun Jul 19 12:04:11 2020 +0200

capabilities: Introduce CAP_CHECKPOINT_RESTORE

So at the time PR_SET_MM_MAP was added there simply was no such capability.

Likely, now that we have it, we should indeed use it.
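
For anyone wanting to see whether an existing user would be affected, here is a minimal sketch of a userspace check, under loud assumptions: it only inspects a "CapEff:" line as found in /proc/<pid>/status, and it does not model the user namespace part of checkpoint_restore_ns_capable(). The helper names are hypothetical; the bit numbers are the uapi values for CAP_SYS_ADMIN (21) and CAP_CHECKPOINT_RESTORE (40).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Capability bit numbers from include/uapi/linux/capability.h. */
#define CAP_SYS_ADMIN_NR          21
#define CAP_CHECKPOINT_RESTORE_NR 40

/* True if the effective mask holds either capability the check accepts. */
static bool mask_is_cr_capable(uint64_t eff)
{
	return (eff & (UINT64_C(1) << CAP_SYS_ADMIN_NR)) ||
	       (eff & (UINT64_C(1) << CAP_CHECKPOINT_RESTORE_NR));
}

/* Parse one "CapEff:\t<hex>" line; returns false for any other line. */
static bool parse_capeff(const char *line, uint64_t *eff)
{
	return sscanf(line, "CapEff: %" SCNx64, eff) == 1;
}
```

A task holding only CAP_SYS_RESOURCE (bit 24) would pass the individual-field path today but fail this test, which is the kind of user the change could break.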

--
Cheers,

David