Re: [PATCH] prctl: require checkpoint_restore_ns_capable for PR_SET_MM_MAP

From: David Hildenbrand (Arm)

Date: Thu Apr 02 2026 - 10:29:04 EST


On 4/2/26 16:21, Lorenzo Stoakes (Oracle) wrote:
> On Thu, Apr 02, 2026 at 03:55:27PM +0200, David Hildenbrand (Arm) wrote:
>> On 4/2/26 15:06, Lorenzo Stoakes (Oracle) wrote:
>>>
>>> We've had a gaping security hole since 2014 and nobody noticed? I find it
>>> hard to believe.
>>>
>>>
>>> Hmm there is already:
>>>
>>> if (prctl_map.exe_fd != (u32)-1) {
>>> /*
>>> * Check if the current user is checkpoint/restore capable.
>>> * At the time of this writing, it checks for CAP_SYS_ADMIN
>>> * or CAP_CHECKPOINT_RESTORE.
>>> * Note that a user with access to ptrace can masquerade an
>>> * arbitrary program as any executable, even setuid ones.
>>> * This may have implications in the tomoyo subsystem.
>>> */
>>> if (!checkpoint_restore_ns_capable(current_user_ns()))
>>> return -EPERM;
>>>
>>> And you're proposing _adding_ this check on top of that? Seems super
>>> redundant.
>>
>> Yes, should be moved.
>
> Well, I don't think this patch should be applied at all...
>

I mean a v2 would have to do that. Whether we would merge that is
another discussion :)

>>
>>>
>>> but also, this seems super-specific buuut... Then again #ifdef
>>> CONFIG_CHECKPOINT_RESTORE around this. Ugh.
>>>
>>> I _hate_ this inteface. HATE HATE HATE it.
>>>
>>> Anyway, does updating _your own_ auxv really require elevated permissions
>>> like this?
>>>
>>> I don't think so? Couldn't you go and manipulate that anyway without
>>> elevated anything?
>>
>> Hard to believe ...
>>
>> I was wondering whether this could break some users. At least CRIU doc
>> states:
>>
>> This option tells *criu* to accept the limitations when running
>> as non-root. Running as non-root requires *criu* at least to have
>> *CAP_SYS_ADMIN* or *CAP_CHECKPOINT_RESTORE*. For details about
>> running *criu* as non-root please consult the *NON-ROOT* section.
>
> Hmm. I wonder if we don't have more users than that though? Hard to rule out
> some weird program somewhere using it for some strange reason.

See my LXC example. My gut feeling is that there are more users.

Which then raises the question why this is still protected by that
kconfig option.

Something is off here, maybe :)

>
> Commit ebd6de681238 ("prctl: Allow local CAP_CHECKPOINT_RESTORE to change
> /proc/self/exe") explicitly _only_ restricted the exe link.
>
> So maybe these comment is in reference to _other_ operations other than non-exe
> changing PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE?
>
>>
>> I mean, the check makes sense given that prctl_set_mm() rejects all
>> these operations without CAP_SYS_RESOURCE.
>
> Hmm but the CAP_SYS_RESOURCE check is only applicable to commands other than
> PR_SET_MM_MAP or PR_SET_MM_MAP_SIZE?
>
> #ifdef CONFIG_CHECKPOINT_RESTORE
> if (opt == PR_SET_MM_MAP || opt == PR_SET_MM_MAP_SIZE)
> return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
> #endif
>
> if (!capable(CAP_SYS_RESOURCE))
> return -EPERM;
>
> ... rest ...

My point is that you can perform all these modifications without
CAP_SYS_RESOURCE through prctl_set_mm_map().

Like PR_SET_MM_AUXV.

It's all very inconsistent, that's what I am saying.

>
>>
>>
>> CAP_CHECKPOINT_RESTORE was not introduced before
>>
>> commit 124ea650d3072b005457faed69909221c2905a1f
>> Author: Adrian Reber <areber@xxxxxxxxxx>
>> Date: Sun Jul 19 12:04:11 2020 +0200
>>
>> capabilities: Introduce CAP_CHECKPOINT_RESTORE
>>
>> So at the time PR_SET_MM_MAP was added there simply was no such capability.
>>
>> Likely, now that we have it, we should indeed use it.
>
> But we did start using it in the exec_fd != -1 case?

The existing ns check was replaced at some point, yes.

>
> Hmm actually sorry it does more than just manipulating auxv, you can change a
> bunch of mm->... stuff.
>
> But if it's your process does it really matter? You can manipulate memory all
> over the place in your process...

Well, I am wondering why e.g., PR_SET_MM_AUXV etc requires CAP_SYS_RESOURCE.

PR_SET_MM_EXE_FILE I understand. The other not.

Extremely inconsistent.

--
Cheers,

David