Re: [PATCH 0/2] sysctl: allow CLONE_NEWUSER to be disabled

From: Austin S. Hemmelgarn
Date: Tue Jan 26 2016 - 09:47:36 EST


On 2016-01-26 09:38, Josh Boyer wrote:
On Mon, Jan 25, 2016 at 11:57 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
Kees Cook <keescook@xxxxxxxxxxxx> writes:

On Mon, Jan 25, 2016 at 11:33 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
Kees Cook <keescook@xxxxxxxxxxxx> writes:

Well, I don't know about less weird, but it would leave an unneeded
hole in the permission checks.

To be clear the current patch has my:

Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

The code is buggy, and poorly thought through. Your lack of interest in
fixing the bugs in your patch is distressing.

I'm not sure where you see me having a "lack of interest". The
existing cap-checking sysctls have a corner-case bug, which is
orthogonal to this change.

That certainly doesn't sound like you have any plans to change anything
there.

So broken code, not willing to fix. No. We are not merging this sysctl.

I think you're jumping to conclusions. :)

I think I am the maintainer.

What you are proposing is very much something that is only of interest to
people who are not using user namespaces. It is fatally flawed as
a way to avoid new attack surfaces for people who don't care as the
sysctl leaves user namespaces enabled by default. It is fatally flawed
as remediation to recommend to people to change if a new user namespace
related bug is discovered. Any running process that happens to be
created while user namespace creation was enabled will continue to
exist. Effectively a reboot will be required as part of a mitigation.
Many sysadmins will get that wrong.

I can't possibly see your sysctl as proposed achieving its goals. A
person has to be entirely too aware of subtlety and nuance to use it
effectively.

What you're saying is true for the "oh crap" case of a new userns
related CVE being found. However, there is the case where sysadmins
know for a fact that a set of machines should not allow user
namespaces to be enabled. Currently they have 2 choices, 1) use their
distro kernel as-is, which may not meet their goal of having userns
disabled, or 2) rebuild their kernel to disable it, which may
invalidate any support contracts they have.

I tend to agree with you on the lack of value around runtime
mitigation, but allowing an admin to toggle this as a blatant on/off
switch on reboot does have value.

This feature is already implemented by two distros, and likely wanted
by others. We cannot ignore that. The sysctl default doesn't change
the existing behavior, so this doesn't get in your way at all. Can you
please respond to my earlier email where I rebutted each of your
arguments against it? Just saying "no" and putting words in my mouth
isn't very productive.

Calling people who make mistakes insane is not a rebuttal. In security,
usability matters, and your sysctl has low usability.

Further you seem to have missed something crucial in your understanding.
As was explained earlier, the sysctl was added to Ubuntu to allow early
adopters to experiment, not as a long term way of managing user
namespaces.


What sounds like a generally useful feature that would cover your use
case and many others is a per user limit on the number of user
namespaces users may create.

Where that number may be zero? I don't see how that is really any
better than a sysctl. Could you elaborate?
It's a better option because it allows finer-grained configuration. Take for example a single-user desktop system with some network daemons. On such a system, the actual login used for the graphical environment by the user should be allowed at least a few user namespaces, because some software depends on them for security (Chrome for example, as well as some distros' build systems), but system users should be limited to at most one if they need it, and ideally zero, so that remote exploits couldn't give access to a user namespace.

Conversely, on a server system, it's not unreasonable to completely disable user namespaces for almost everything, except for giving one to services that use them properly for sandboxing.

I will state though that I only feel this is a better solution given that two criteria are met:
1. You can set 0 as the limit.
2. You can configure this without needing some special software (this in particular means that seccomp is not an option).
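
A minimal sketch of the per-user limit policy described above, modelled in userspace Python purely for illustration. This is not kernel code and not an existing interface: the class name UsernsLimiter, its methods, the default limit of 0, and the example UIDs are all assumptions made up for this sketch. It only shows the accounting: each user has a cap on live user namespaces, and a cap of 0 means creation is always refused.

```python
# Hypothetical userspace model of a per-user user-namespace limit.
# None of these names exist in the kernel; they only illustrate the policy.

DEFAULT_LIMIT = 0  # assumption: users without an explicit entry get none


class UsernsLimiter:
    def __init__(self, limits, default=DEFAULT_LIMIT):
        self.limits = dict(limits)  # uid -> maximum live user namespaces
        self.default = default
        self.in_use = {}            # uid -> currently live user namespaces

    def try_create(self, uid):
        """Return True and charge the user if one more namespace is allowed."""
        limit = self.limits.get(uid, self.default)
        used = self.in_use.get(uid, 0)
        if used >= limit:           # a limit of 0 always ends up here
            return False
        self.in_use[uid] = used + 1
        return True

    def release(self, uid):
        """Uncharge one namespace when it is destroyed."""
        if self.in_use.get(uid, 0) > 0:
            self.in_use[uid] -= 1


# Desktop-style policy: the login user (uid 1000 here, as an example) gets a
# few namespaces, a network daemon (uid 33 here, as an example) gets none.
limiter = UsernsLimiter({1000: 4, 33: 0})
print(limiter.try_create(1000))  # allowed, below the cap
print(limiter.try_create(33))    # refused, cap is zero
```

Setting the default to 0 and listing only the users that need namespaces gives the server-style policy; this covers criterion 1, while criterion 2 depends on how such a limit would actually be exposed for configuration, which this sketch does not address.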