In order to protect against ptrace(2) and similar attacks on container
runtimes when they join namespaces, many runtimes set mm->dumpable to
SUID_DUMP_DISABLE. However, doing this means that attempting to set up
an unprivileged user namespace will fail because an unprivileged process
can no longer access /proc/self/{setgroups,{uid,gid}_map} for the
container process (which has the same uid as the runtime process).

Fix this by changing pid_getattr to *also* change the owner of regular
files that have a mode of 0644 (when the process is not dumpable). This
ensures that the important /proc/[pid]/... files mentioned above are
properly accessible by a container runtime in a rootless container
context.

The most blatant issue is that a non-dumpable process in a rootless
container context is unable to open /proc/self/setgroups, because it
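doesn't own the file.

    #define _GNU_SOURCE /* for unshare(2) and CLONE_NEWUSER */
    #include <fcntl.h>
    #include <sched.h>
    #include <stdlib.h>
    #include <sys/prctl.h>

    int main(void)
    {
        /* Mark the process non-dumpable, as container runtimes do. */
        prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
        unshare(CLONE_NEWUSER);

        /* This will fail. */
        int fd = open("/proc/self/setgroups", O_WRONLY);
        if (fd < 0)
            abort();

        return 0;
    }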
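To see the ownership change behind that failure, something like the following might help. It is only a rough sketch: owner_of(), report_owner() and show_owner_change() are hypothetical helpers made up for illustration, not part of the patch. The idea is to stat(2) a /proc file before and after clearing the dumpable flag. On a kernel without the fix, the second report would be expected to show uid 0 rather than the caller's euid.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store the owner uid of path in *uid.
 * Returns 0 on success, -1 if stat(2) fails. */
int owner_of(const char *path, uid_t *uid)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    *uid = st.st_uid;
    return 0;
}

/* Hypothetical helper: print the owner of path next to our own euid. */
void report_owner(const char *when, const char *path)
{
    uid_t owner;

    if (owner_of(path, &owner) < 0) {
        perror(path);
        return;
    }
    printf("%s: %s is owned by uid %lu (our euid is %lu)\n",
           when, path, (unsigned long)owner, (unsigned long)geteuid());
}

/* Hypothetical helper: report the owner, clear the dumpable flag, and
 * report again. Returns the result of prctl(2). */
int show_owner_change(const char *path)
{
    int ret;

    report_owner("dumpable", path);
    ret = prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
    report_owner("non-dumpable", path);
    return ret;
}
```

Calling show_owner_change("/proc/self/setgroups") from main() as an unprivileged user should make the difference visible. The exact uid reported depends on the kernel version and on whether the caller is already inside a user namespace.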
I do agree that failing to open anything in /proc/self/ is more than
unexpected! I cannot judge the patch, but my gut feeling tells me that
the fix should be somewhere in the open handler.