Re: [PATCH] namei: permit linking with CAP_FOWNER in userns

From: Dirk Steinmetz
Date: Wed Oct 28 2015 - 11:07:42 EST

On Tue, 27 Oct 2015 20:28:02 +0000, Serge Hallyn wrote:
> Quoting Dirk Steinmetz (public@xxxxxxxxxxxxxxxxxx):
> > On Tue, 27 Oct 2015 09:33:44 -0500, Seth Forshee wrote:
> > > I did want to point what seems to be an inconsistency in how
> > > capabilities in user namespaces are handled with respect to inodes. When
> > > I started looking at this my initial thought was to replace
> > > capable(CAP_FOWNER) with capable_wrt_inode_uidgid(inode, CAP_FOWNER). On
> > > the face of it this should be equivalent to what's done here, but it
> > > turns out that capable_wrt_inode_uidgid requires that the inode's uid
> > > and gid are both mapped into the namespace whereas
> > > inode_owner_or_capable only requires the uid be mapped. I'm not sure how
> > > significant that is, but it seems a bit odd.
> >
> > I agree that this seems odd. I've chosen inode_owner_or_capable over
> > capable_wrt_inode_uidgid(inode, CAP_FOWNER) as it seemed consistent:
> > a privileged user (with CAP_SETUID) can impersonate the owner UID and thus
> > bypass the check completely; this also matches the documented behavior of
> > CAP_FOWNER: "Bypass permission checks on operations that normally require
> > the filesystem UID of the process to match the UID of the file".
> >
> > However, thinking about it I get the feeling that checking the gid seems
> > reasonable as well. This is, however, independently of user namespaces.
> > Consider the following scenario in any namespace, including the init one:
> > - A file has the setgid and user/group executable bits set, and is owned
> > by user:group.
> > - The user 'user' is not in the group 'group', and does not have any
> > capabilities.
> > - The user 'user' hardlinks the file. The permission check will succeed,
> > as the user is the owner of the file.
> > - The file is replaced with a newer version (for example fixing a security
> > issue)
> > - Now user can still use the hardlink-pinned version to execute the file
> > as 'user:group' (and for example exploit the security issue).
> > I would have expected the user to not be able to hardlink, as he lacks
> > CAP_FSETID, and thus is not allowed to chmod, change or move the file
> > without loosing the setgid bit. So it is impossible for him to make a non-
> > hardlink copy with the setgid bit set -- why should he be able to make a
> > hardlinked one?
> Yeah, this sounds sensible. It allows a user without access to 'disk',
> for instance, to become that group.
> > It seems to me as if may_linkat would additionally require a check
> > verifying that either
> > - not both setgid and group executable bit set
> > - fsgid == owner gid
> > - capable_wrt_inode_uidgid(CAP_FSETID) -- or CAP_FOWNER, depending on
> > whether to adapt chmod's behavior or keeping everything hardlink-
> > related in CAP_FOWNER; I don't feel qualified enough to pick ;)
> In particular just changing it is not ok since people who are using file
> capabilities to grant what they currently need would be stuck with a
> mysterious new failure.

Is there any use case (besides exploiting hardlinks with malicious intent)
that would be broken when changing this? There are some (imho) rather
unlikely conditions to be met in order to observe changed behavior:
- a user owns an executable setgid-file belonging to a group he is not in
- the user does not have CAP_FSETID (or CAP_FOWNER, depending on which one
is chosen to be required)
- the user is for some legitimate reason supposed to hardlink the file
If these conditions are not met in practice, the change would not break
anything. In that case, it would be imho better to not provide
backward-compatibility to reduce complexity in these checks. Else, I'd
propose adding a new possible value '2' for
/proc/sys/fs/protected_hardlinks, while keeping '1' for the current

I can provide an initial draft for either of the options, but would like
recommendations to which of the two ways to take (or is there a third
one?), as well as comments on the new condition itself: may_linkat would
block hardlinks when all of the following conditions are met:
- sysctrl_protected_hardlinks is enabled or 2 (depending on way)
- inode uid != fsuid and no CAP_FOWNER (for userns: with mapping on uid),
while the hardlink source is not a regular file, is a setuid-executable
or is not accessible for reading and writing
- inode gid not fsgid or in supplemental gids and no CAP_FSETID (for
userns: with mapping on gid -- not sure whether the uid is relevant?),
while the hardlink source is a setgid-executable (with group executable
bit set)

If anyone else wants to fix the issue, thats fine with me as well.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at