Re: [PATCH v2] userns: Disallow setgroups unless the gid_map writer is privileged

From: Andy Lutomirski
Date: Tue Dec 02 2014 - 15:13:59 EST

On Tue, Dec 2, 2014 at 11:45 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>> On Tue, Dec 2, 2014 at 4:09 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>>>> Classic unix permission checks have an interesting feature. The
>>>> group permissions for a file can be set to less than the other
>>>> permissions on a file. Occasionally this is used deliberately to
>>>> give a certain group of users fewer permissions than the default.
>>>> User namespaces break this usage. Groups set in rgid or egid are
>>>> unaffected because an unprivileged user namespace creator can only
>>>> map a single group, so setresgid inside and outside the namespace
>>>> have the same effect. However, an unprivileged user namespace
>>>> creator can currently use setgroups(2) to drop all supplementary
>>>> groups, so, if a supplementary group denies access to some resource,
>>>> user namespaces can be used to bypass that restriction.
>>>> To fix this issue, this introduces a new user namespace flag
>>>> USERNS_SETGROUPS_ALLOWED. If that flag is not set, then
>>>> setgroups(2) will fail regardless of the caller's capabilities.
>>>> USERNS_SETGROUPS_ALLOWED is cleared in a new user namespace. By
>>>> default, if the writer of gid_map has CAP_SETGID in the parent
>>>> userns and the parent userns has USERNS_SETGROUPS_ALLOWED, then the
>>>> USERNS_SETGROUPS_ALLOWED will be set in the child. If the writer is
>>>> not so privileged, then writing to gid_map will fail unless the
>>>> writer adds "setgroups deny" to gid_map, in which case the check is
>>>> skipped but USERNS_SETGROUPS_ALLOWED will remain cleared.
>>>> The full semantics are:
>>>> If "setgroups allow" is present or no explicit "setgroups" setting
>>>> is written to gid_map, then writing to gid_map will fail with -EPERM
>>>> unless the opener and writer have CAP_SETGID in the parent namespace
>>>> and the parent namespace has USERNS_SETGROUPS_ALLOWED.
>>>> If "setgroups deny" is present, then writing gid_map will work as
>>>> before, but USERNS_SETGROUPS_ALLOWED will remain cleared. This will
>>>> result in processes in the userns that have CAP_SETGID to be
>>>> nontheless unable to use setgroups(2). If this breaks something
>>>> inside the userns, then this is okay -- the userns creator
>>>> specifically requested this behavior.
>>> I think we need to do this but I also think setgroups allow/deny
>>> should be a separate knob than the uid/gid mapping.
>> Yeah. It should be readable, too.
>>> If for no other reason than you missed at least two implementations of
>>> setgroups, in your implementation.
>> I clearly didn't grep hard enough. Grr.
>>>> While it could be safe to set USERNS_SETGROUPS_ALLOWED if the user
>>>> namespace creator has no supplementary groups, doing so could be
>>>> surprising and could have unpleasant interactions with setns(2).
>>>> Any application that uses newgidmap(1) should be unaffected by this
>>>> fix, but unprivileged users that create user namespaces to
>>>> manipulate mounts or sandbox themselves will break until they start
>>>> using "setgroups deny".
>>>> This should fix CVE-2014-8989.
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>>>> ---
>>>> Unlike v1, this *will* break things like Sandstorm. Fixing them will be
>>>> easy. I agree that this will result in better long-term semantics, but
>>>> I'm not so happy about breaking working software.
>>> I know what you mean. One of the pieces of software broken by all of
>>> this is my test to verify the remount semantics. Which makes all of
>>> this very unfortunate.
>>>> If this is unpalatable, here's a different option: get rid of all these
>>>> permission checks and just change setgroups. Specifically, make it so
>>>> that setgroups(2) in a userns will succeed but will silently refuse to
>>>> remove unmapped groups.
>>> Nope silently refusing to remove unmapped groups is not enough. I can
>>> make any gid in my supplemental groups my egid, it takes a sgid helper
>>> application but I don't need any special privileges to create that.
>>> Once that group is my egid I can map it. Which means I could drop
>>> any one group of my choosing without privielges. Which out and out
>>> breaks negative groups :(
>> Whoops, right. And you can, indeed, have egid match one of your
>> supplementary groups.
>>> I got to looking and I have a significant piece of code that all of this
>>> breaks.
>>> tools/testing/selftests/mount/unprivileged-remount-test.c
>>> So I am extra motivated to figure out at find a way to preserve most of
>>> the existing functionality. My regression tests won't pass until I can
>>> find something pallateable.
>>> It is very annoying that every option I have considered so far breaks
>>> something useful.
>>> Having a write once setgroups disable, and the allowing unprivileged
>>> mappings after that seems the most palatable option I have seen,
>>> semantically. Which means existing software that doesn't care about
>>> setgroups can just add the disable code and then work otherwise
>>> unmodified.
>>> The other option that I have played with is forcing a set of groups
>>> in setgroups if your user namespace was created without privilege,
>>> that winds up requiring that verify you don't have any other
>>> supplementary groups, and is generally messy whichever way I look at it.
>> How bad would the automatic selection of setgroups behavior really be?
>> Suppose we have /proc/self/userns_setgroups_mode that can be "allow",
>> "deny", or "auto". It starts out as "auto" (or "deny" if it's set to
>> "deny" in the parent). Once any of the maps have been set,
>> userns_options becomes readonly. If you try to write to gid_map when
>> setgroups == auto, then it switches to "allow" or "deny" depending on
>> whether the writer has privilege.
>> This is nasty magical behavior, but it should DTRT for existing users,
>> and everyone can be updated to set the value explicitly.
> Rarely is everything updated unless there is a requirement for an
> update.
> For my code that cares an update is necessary anyway as it contains
> a gratuitous setgroups(0, NULL).
> Since we have to break applications breaking them loud and clear and
> letting them set the flat to recover (if possible) seems the best we can
> do. That at least allows someone to ask if they depend on setgroups or
> init_groups.

Fair enough.

Any thoughts on what the API should be for v3?

>> FWIW, it might also make sense to move all of this stuff into
>> /proc/PID/userns. There may be races in which a setuid or otherwise
>> privileged helper pokes at more than one userns file but actually
>> modifies different namespaces each time. I don't know whether these
>> races matter. uid_map, gid_map, and projid_map could be symlinks.
> I don't see how moving these files as removing any races.

It helps if you use openat to open the userns directory and of the
/proc infrastructure is smart enough to make that work.

Admittedly, I don't actually see a dangerous race right now.


> Eric

Andy Lutomirski
AMA Capital Management, LLC
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at