From: Michael Kerrisk (man-pages)
Date: Sun Aug 08 2021 - 05:10:08 EST

Hello Serge,

Your commit:

commit db2e718a47984b9d71ed890eb2ea36ecf150de18
Author: Serge E. Hallyn <serge@xxxxxxxxxx>
Date: Tue Apr 20 08:43:34 2021 -0500

capabilities: require CAP_SETFCAP to map uid 0

added a new requirement when updating the UID map of a user namespace
with a value of '0 0 *'.

Kir sent a patch to briefly document this change, but I think much more
should be written. I've attempted to do so. Could you tell me whether the
following text (to be added in user_namespaces(7)) is accurate please:

In order for a process to write to the /proc/[pid]/uid_map
(/proc/[pid]/gid_map) file, all of the following requirements must
be met:


4. If updating /proc/[pid]/uid_map to create a mapping that maps
UID 0 in the parent namespace, then one of the following must
be true:

* if the writing process is in the parent user namespace, then it
must have the CAP_SETFCAP capability in that user namespace;

* if the writing process is in the child user namespace, then
the process that created the user namespace must have had
the CAP_SETFCAP capability when the namespace was created.

This rule has been in place since Linux 5.12. It eliminates an
earlier security bug whereby a UID 0 process that lacks the
CAP_SETFCAP capability, which is needed to create a binary with
namespaced file capabilities (as described in capabilities(7)),
could nevertheless create such a binary, by the following steps:

* Create a new user namespace with the identity mapping (i.e.,
UID 0 in the new user namespace maps to UID 0 in the parent
namespace), so that UID 0 in both namespaces is equivalent
to the same root user ID.

* Since the child process has the CAP_SETFCAP capability, it
could create a binary with namespaced file capabilities that
would then be effective in the parent user namespace (because
the root user IDs are the same in the two namespaces).
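
To make the two cases in requirement 4 easier to check, here is a minimal
Python sketch of the decision as the text above describes it. It is only a
model and does not touch the kernel or /proc. The names MapLine,
parse_map_line, maps_parent_uid_zero and uid_map_write_allowed are
hypothetical and exist only for this illustration. It assumes the usual
three-field map line format (first ID inside the namespace, first ID in the
parent namespace, length), and it deliberately ignores every requirement
other than number 4.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MapLine:
    inside_start: int   # first ID inside the (child) user namespace
    outside_start: int  # first ID in the parent user namespace
    length: int         # number of consecutive IDs covered by this line


def parse_map_line(line: str) -> MapLine:
    """Parse one 'inside outside length' line as written to uid_map."""
    inside, outside, length = (int(field) for field in line.split())
    if inside < 0 or outside < 0 or length <= 0:
        raise ValueError(f"invalid map line: {line!r}")
    return MapLine(inside, outside, length)


def maps_parent_uid_zero(entry: MapLine) -> bool:
    """True if the line maps UID 0 of the parent namespace.

    IDs are non-negative, so the parent-side range can contain 0
    only when it starts at 0.
    """
    return entry.outside_start == 0


def uid_map_write_allowed(
    lines: Iterable[str],
    writer_in_parent_ns: bool,
    writer_has_setfcap: bool,
    creator_had_setfcap: bool,
) -> bool:
    """Model of requirement 4 only (hypothetical helper, not a kernel API)."""
    if not any(maps_parent_uid_zero(parse_map_line(l)) for l in lines):
        # Parent UID 0 is not mapped, so requirement 4 does not apply.
        return True
    if writer_in_parent_ns:
        # Case 1: the writer needs CAP_SETFCAP in the parent namespace.
        return writer_has_setfcap
    # Case 2: the writer is in the child namespace; what matters is whether
    # the creator had CAP_SETFCAP when the namespace was created.
    return creator_had_setfcap
```

For example, uid_map_write_allowed(["0 0 1"], True, False, True) is False
under this model, because a writer in the parent namespace is judged on its
own CAP_SETFCAP, not on the creator's.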




