Re: [kernel-hardening] Re: [PATCH 0/2] sysctl: allow CLONE_NEWUSER to be disabled

From: Austin S. Hemmelgarn
Date: Tue Jan 26 2016 - 15:11:28 EST

On 2016-01-26 14:56, Josh Boyer wrote:
On Tue, Jan 26, 2016 at 12:20 PM, Serge Hallyn <serge.hallyn@xxxxxxxxxx> wrote:
Quoting Josh Boyer (jwboyer@xxxxxxxxxxxxxxxxx):
On Tue, Jan 26, 2016 at 9:46 AM, Austin S. Hemmelgarn
<ahferroin7@xxxxxxxxx> wrote:
On 2016-01-26 09:38, Josh Boyer wrote:

On Mon, Jan 25, 2016 at 11:57 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:

Kees Cook <keescook@xxxxxxxxxxxx> writes:

On Mon, Jan 25, 2016 at 11:33 AM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:

Kees Cook <keescook@xxxxxxxxxxxx> writes:

Well, I don't know about less weird, but it would leave an unneeded
hole in the permission checks.

To be clear the current patch has my:

Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

The code is buggy, and poorly thought through. Your lack of interest
in fixing the bugs in your patch is distressing.

I'm not sure where you see me having a "lack of interest". The
existing cap-checking sysctls have a corner-case bug, which is
orthogonal to this change.

That certainly doesn't sound like you have any plans to change anything

So broken code, not willing to fix. No. We are not merging this

I think you're jumping to conclusions. :)

I think I am the maintainer.

What you are proposing is very much something that is only of interest to
people who are not using user namespaces. It is fatally flawed as
a way to avoid new attack surfaces for people who don't care as the
sysctl leaves user namespaces enabled by default. It is fatally flawed
as remediation to recommend to people to change if a new user namespace
related bug is discovered. Any running process that happens to be
created while user namespace creation was enabled will continue to
exist. Effectively a reboot will be required as part of a mitigation.
Many sysadmins will get that wrong.

I can't possibly see your sysctl as proposed achieving its goals. A
person has to be entirely too aware of subtlety and nuance to use it

What you're saying is true for the "oh crap" case of a new userns
related CVE being found. However, there is the case where sysadmins
know for a fact that a set of machines should not allow user
namespaces to be enabled. Currently they have 2 choices, 1) use their
distro kernel as-is, which may not meet their goal of having userns
disabled, or 2) rebuild their kernel to disable it, which may
invalidate any support contracts they have.

I tend to agree with you on the lack of value around runtime
mitigation, but allowing an admin to toggle this as a blatant on/off
switch on reboot does have value.

This feature is already implemented by two distros, and likely wanted
by others. We cannot ignore that. The sysctl default doesn't change
the existing behavior, so this doesn't get in your way at all. Can you
please respond to my earlier email where I rebutted each of your
arguments against it? Just saying "no" and putting words in my mouth
isn't very productive.

Calling people who make mistakes insane is not a rebuttal. In security
usability matters, and your sysctl has low usability.

Further you seem to have missed something crucial in your understanding.
As was explained earlier the sysctl was added to ubuntu to allow early
adopters to experiment not as a long term way of managing user

What sounds like a generally useful feature that would cover your use
case and many others is a per user limit on the number of user
namespaces users may create.

Where that number may be zero? I don't see how that is really any
better than a sysctl. Could you elaborate?

It's a better option because it would allow better configurability. Take for
example a single user desktop system with some network daemons. On such a
system, the actual login used for the graphical environment by the user
should be allowed at least a few user namespaces, because some software
depends on them for security (Chrome for example, as well as some distros'
build systems), but system users should be limited to at most one if they
need it, and ideally zero, so that remote exploits couldn't give access to a
user namespace.

Conversely, on a server system, it's not unreasonable to completely disable
user namespaces for almost everything, except for giving one to services
that use them properly for sand-boxing.

OK, so better granularity. Fine.

I will state though that I only feel this is a better solution given that
two criteria are met:
1. You can set 0 as the limit.
2. You can configure this without needing some special software (this in
particular means that seccomp is not an option).

I'd have to add 3. You can set a global default for all users that can
be overridden on a per user basis.

Otherwise you play whack-a-mole with every new user or daemon that
adds its own uid.

Given that you want per-user, does a per-uid rlimit, which could be -1
(unlimited) by default, inherited for all uids mapped into a namespace
owned by the uid, and which can be set (only reduced) by pam on login,
make sense?
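
As a way of reasoning about the criteria listed above (a limit of 0, a
global default, per-user overrides, and a limit that an unprivileged
caller may only reduce), here is a toy user-space model. It is only a
sketch: the class UsernsLimits and its methods are hypothetical names
made up for illustration, and nothing here corresponds to an existing
kernel interface.

```python
UNLIMITED = -1  # mirrors the "-1 (unlimited)" default suggested above


class UsernsLimits:
    """Toy model of a per-uid cap on user namespace creation (hypothetical)."""

    def __init__(self, default=UNLIMITED):
        self.default = default  # global default applied to every uid
        self.overrides = {}     # per-uid limits that replace the default
        self.in_use = {}        # namespaces currently charged to each uid

    def limit_for(self, uid):
        """Return the effective limit: the override if set, else the default."""
        return self.overrides.get(uid, self.default)

    def set_limit(self, uid, new_limit, privileged=False):
        """Set a per-uid limit; unprivileged callers may only reduce it."""
        if new_limit < 0 and new_limit != UNLIMITED:
            raise ValueError("limit must be >= 0 or UNLIMITED")
        current = self.limit_for(uid)
        raising = current != UNLIMITED and (
            new_limit == UNLIMITED or new_limit > current
        )
        if raising and not privileged:
            raise PermissionError("limit can only be reduced")
        self.overrides[uid] = new_limit

    def try_create(self, uid):
        """Charge one namespace to uid; return False if the limit is reached."""
        limit = self.limit_for(uid)
        used = self.in_use.get(uid, 0)
        if limit != UNLIMITED and used >= limit:
            return False
        self.in_use[uid] = used + 1
        return True
```

With a default of 0 this behaves like the big on/off switch, while an
administrator can still grant a few namespaces to a single login, for
example set_limit(1000, 2, privileged=True).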

To clarify, I don't actively want per-user. Eric suggested it and I'm
thinking through it from a theoretical perspective. I'd likely be
fine with a big ban-hammer that sysadmins can set, but that doesn't
mean I'm opposed to something more flexible if it makes sense.

I'm not sure if rlimit makes sense in the way you describe it. I
don't care about uids within an existing user namespace really. That
is icing on the cake. I was looking for something that would disallow
uids to create user namespaces to begin with (so inheritance wouldn't
matter). If rlimit is that mechanism, then I guess. Seems like an
odd fit though, particularly if you tie it to pam.

The PAM connection would be a side effect of it being an rlimit, not a
goal in itself. PAM's limit handling sees little use on smaller systems
because Linux is rarely used as a time-sharing system these days, but
part of the purpose of PAM was to be able to set rlimits and similar
things on new sessions before the user could do anything. This is still
used today, and /etc/security/limits.conf can be found on any modern
Linux system which uses PAM.
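
To illustrate the mechanism with an rlimit that exists today, here is a
minimal sketch that caps RLIMIT_NOFILE in a child between fork() and
exec(), which is roughly the point at which a PAM module would apply
limits to a new session. The helper name run_with_nofile_limit is made
up for this example, and RLIMIT_NOFILE is only a stand-in, since no
user namespace rlimit exists.

```python
import resource
import subprocess
import sys


def run_with_nofile_limit(limit):
    """Run a child Python process with RLIMIT_NOFILE capped at `limit`.

    Returns the soft limit as observed from inside the child.
    """

    def apply_limit():
        # Runs in the child after fork() and before exec().
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if hard == resource.RLIM_INFINITY:
            new_hard = limit
        else:
            # An unprivileged process may lower its hard limit, not raise it.
            new_hard = min(limit, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_hard, new_hard))

    child_code = (
        "import resource; "
        "print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

The child and anything it later spawns inherit the lowered limit, and
without privilege they cannot raise the hard limit again. That
inheritance is the property that makes an rlimit usable for "set once
at login" policy.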