In order to protect against ptrace(2) and similar attacks on container
runtimes when they join namespaces, many runtimes set mm->dumpable to
SUID_DUMP_DISABLE. However, doing this means that attempting to set up
an unprivileged user namespace will fail because an unprivileged process
can no longer access /proc/self/{setgroups,{uid,gid}_map} for the
container process (which is the same uid as the runtime process).
Fix this by changing pid_getattr to *also* change the owner of regular
files that have a mode of 0644 (when the process is not dumpable). This
ensures that the important /proc/[pid]/... files mentioned above are
properly accessible by a container runtime in a rootless container
context.
The most blantant issue is that a non-dumpable process in a rootless
container context is unable to open /proc/self/setgroups, because it
doesn't own the file.
This changes a lot more than just setgroups, doesn't it? This bypasses
the task_dumpable check for all kinds of things.
Though, I expect the
has_pid_permissions() check to be the harder one to pass. Why does
has_pid_permissions() succeed in the case you've given?