Re: [PATCH] prctl: require checkpoint_restore_ns_capable for PR_SET_MM_MAP
From: Qi Tang
Date: Thu Apr 02 2026 - 23:55:06 EST

On Thu, Apr 2, 2026, Andrei Vagin wrote:
> A approach is to eliminate CAP_SYS_RESOURCE check but pass all
> new values in one bundle, which would allow the kernel to make
> more intensive test for sanity of values and same time allow us
> to support checkpoint/restore of user namespaces.
>
> The initial implementation of PR_SET_MM_MAP didn't have the
> capability check.

This clears up the history. The two paths have different
permission models by design, not by accident.

On Thu, Apr 2, 2026, Lorenzo Stoakes wrote:
> But if it's your process does it really matter? You can
> manipulate memory all over the place in your process...

I went back and checked each impact I claimed. The SELinux
execheap bypass does not work because file_map_prot_check()
still enforces PROCESS__EXECMEM on anonymous mappings
regardless of start_brk. The procfs paths use
access_remote_vm(), which safely returns zero for unmapped
addresses. auxv only affects the process itself. So yes,
it doesn't really matter.

I should have verified these claims more carefully before
sending the patch. Lesson learned.

Please drop this patch.

That said, the man page still documents PR_SET_MM as requiring
CAP_SYS_RESOURCE, and the individual field path enforces it
while the MAP path does not. Might be worth a man-pages fix
or a code comment to make the intent explicit, but that's a
separate cleanup.
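
For anyone who wants to see the asymmetry from userspace, here is a
minimal sketch rather than a definitive test. The helper names
probe_field_path() and probe_map_size() are made up for illustration.
It assumes the individual-field path checks CAP_SYS_RESOURCE before it
validates the address, and that PR_SET_MM_MAP_SIZE is only handled when
the kernel is built with CONFIG_CHECKPOINT_RESTORE. The field probe
passes a deliberately invalid address so that it cannot change any mm
state, whatever privileges the caller has.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Probe the individual-field path (PR_SET_MM_START_BRK) with an
 * address that can never be valid, so the call always fails and
 * nothing in the mm is modified.
 *
 * Assumption: an unprivileged caller gets -EPERM from the
 * CAP_SYS_RESOURCE check, while a privileged caller gets past it
 * and fails address validation with -EINVAL.
 *
 * Returns 0 on (unexpected) success, -errno on failure.
 */
int probe_field_path(void)
{
    unsigned long bad_addr = ~0UL; /* above any user address */

    if (prctl(PR_SET_MM, PR_SET_MM_START_BRK, bad_addr, 0UL, 0UL) == 0)
        return 0;
    return -errno;
}

/*
 * Query the size of struct prctl_mm_map via PR_SET_MM_MAP_SIZE.
 * This goes through the MAP path, which does not require
 * CAP_SYS_RESOURCE. The query only writes to *size.
 *
 * Returns 0 on success with *size filled in, -errno on failure
 * (for example when the kernel does not handle the MAP options).
 */
int probe_map_size(unsigned int *size)
{
    *size = 0;
    if (prctl(PR_SET_MM, PR_SET_MM_MAP_SIZE, (unsigned long)size,
              0UL, 0UL) == 0)
        return 0;
    return -errno;
}
```

Run as an ordinary user on a kernel that handles the MAP options, I
would expect probe_field_path() to return -EPERM while probe_map_size()
succeeds, which is the documented-versus-actual gap described above.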

Thanks everyone for the thorough discussion.

Qi Tang