Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

From: Serge E. Hallyn
Date: Mon Nov 06 2017 - 10:03:11 EST


Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@xxxxxxxxxx):
> On Sat, Nov 4, 2017 at 4:53 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> >
> > Quoting Mahesh Bandewar (mahesh@xxxxxxxxxxxx):
> > > Init-user-ns is always uncontrolled, and a process that has SYS_ADMIN
> > > and belongs to an uncontrolled user-ns can create another (child)
> > > user-namespace that is uncontrolled. Any other process (one that either
> > > does not have SYS_ADMIN or belongs to a controlled user-ns) can only
> > > create a user-ns that is controlled.
> >
> > That's a huge change though. It means that any system that previously
> > used unprivileged containers will need new privileged code (which always
> > risks more privilege leaks through the new code) to re-enable what was
> > possible without privilege before. That's a regression.
> >
> I wouldn't call it a regression, since the existing behavior is
> preserved as long as the default mask is not altered, i.e. an
> uncontrolled process can create a user-ns and have full control inside
> that user-ns. The only difference is that if, for example, 'something'
> comes up which makes a specific capability expose ring-0, the admin can
> quickly remove the capability in question from the mask, so that no
> untrusted code can exploit that capability until either the kernel is

Oh, sorry, I misread then, and missed that step. I thought the default
with this patchset was that there were no capabilities exposed to user
namespaces.

> patched or workloads are sanitized keeping in mind what was
> discovered. (I have given some real-life examples of recently published
> vulnerabilities involving CAP_NET_RAW in the cover letter.)
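A rough model of the semantics described above, as a userspace C sketch. It is not the patch's code: struct model_userns, model_create_child(), model_effective_caps() and the global controlled_caps_mask are hypothetical names, and applying the mask as a plain AND on the capability set is an assumption about where the restriction takes effect. Only the CAP_* bit numbers are taken from <linux/capability.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capability bit numbers, mirroring <linux/capability.h>. */
#define MODEL_CAP_NET_RAW   13
#define MODEL_CAP_SYS_ADMIN 21
#define MODEL_CAP_BIT(c)    (UINT64_C(1) << (c))
#define MODEL_CAP_ALL       (~UINT64_C(0))

/* Hypothetical user namespace: only the property under discussion. */
struct model_userns {
	const struct model_userns *parent; /* NULL for init-user-ns */
	bool controlled;
};

/* Hypothetical admin-tunable mask. The default exposes every
 * capability, so nothing changes until an admin clears a bit. */
static uint64_t controlled_caps_mask = MODEL_CAP_ALL;

/* Creation rule: the child is uncontrolled only if the creator holds
 * CAP_SYS_ADMIN and itself lives in an uncontrolled namespace. */
struct model_userns model_create_child(const struct model_userns *creator_ns,
				       uint64_t creator_caps)
{
	assert(creator_ns != NULL);
	bool has_sys_admin =
		(creator_caps & MODEL_CAP_BIT(MODEL_CAP_SYS_ADMIN)) != 0;
	struct model_userns child = {
		.parent = creator_ns,
		.controlled = creator_ns->controlled || !has_sys_admin,
	};
	return child;
}

/* Capabilities usable inside ns (assumed semantics): unfiltered in an
 * uncontrolled namespace, filtered through the mask in a controlled one. */
uint64_t model_effective_caps(const struct model_userns *ns, uint64_t caps)
{
	return ns->controlled ? (caps & controlled_caps_mask) : caps;
}
```

With the mask at its default (all bits set), every namespace keeps its full capability set, which is the "existing behavior is preserved" point. Clearing the CAP_NET_RAW bit removes that capability only inside controlled namespaces; init-user-ns and uncontrolled children keep it.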
>
> > I'm very much interested in what you want to do, but it seems like
> > it would be worth starting with some automated code analysis that shows
> > exactly what code becomes accessible to unprivileged users with user
> > namespaces which was not accessible to unprivileged users before. Then we
> > can reason about classifying that code and perhaps limiting access to
> > some of it.
> I would like to look at this as 'a tool' available to admins, who
> can quickly bring a possible compromise under control, probably at
> the cost of some loss of functionality, until the kernel is patched
> and the mask is restored to its default value.

The thing that makes me hesitate with this set is that it is a
permanent new feature to address what (I hope) is a temporary
problem. What would you think about doing this as a stackable
(Yama-style) LSM?
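One way to picture the stackable-LSM suggestion is as an extra capable() hook that can only deny, run in sequence with the existing checks. This is a toy userspace model, not the kernel's LSM API: struct toy_ns, toy_capable_hook, toy_controlled_mask and the toy_* functions are hypothetical, and the baseline hook simply allows everything to keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a user namespace. */
struct toy_ns {
	bool controlled;
};

/* Hypothetical hook type shaped like a capable() check:
 * 0 allows, a negative errno denies. Not the kernel's signature. */
typedef int (*toy_capable_hook)(const struct toy_ns *ns, int cap);

/* Hypothetical mask; all bits set means nothing is restricted. */
static uint64_t toy_controlled_mask = ~UINT64_C(0);

/* Stand-in for the baseline capability check; always allows here. */
int toy_baseline_capable(const struct toy_ns *ns, int cap)
{
	(void)ns;
	(void)cap;
	return 0;
}

/* The stacked restriction: deny masked-out capabilities in controlled
 * namespaces, otherwise leave the decision to the other hooks. */
int toy_mask_capable(const struct toy_ns *ns, int cap)
{
	if (ns->controlled && !(toy_controlled_mask & (UINT64_C(1) << cap)))
		return -EPERM;
	return 0;
}

/* Run every hook in order; the first denial wins. */
int toy_security_capable(const toy_capable_hook *hooks, size_t nhooks,
			 const struct toy_ns *ns, int cap)
{
	assert(cap >= 0 && cap < 64);
	for (size_t i = 0; i < nhooks; i++) {
		int rc = hooks[i](ns, cap);
		if (rc)
			return rc;
	}
	return 0;
}
```

Because the restriction sits in its own hook, leaving it out of the stack (passing only the baseline hook) restores the previous behavior without touching core user-namespace code, which is what makes an LSM attractive for a problem that is hopefully temporary.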

> I'm not sure if automated tools could discover anything since these
> changes should not alter behavior in any way.

Seems like there are two naive ways to do it, the first being to just
look at all code under ns_capable() plus code called from there. It
seems like looking at the result of that could be fruitful.
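The first of those naive approaches could look roughly like the following over-approximation: treat every function that contains an ns_capable() check as a root and mark everything transitively callable from it. This is a hypothetical C sketch over a toy call graph (struct call_graph, MAX_FNS and mark_ns_capable_reachable() are invented for illustration). A real analysis would need a call graph extracted from the kernel source, and, like this sketch, it would also count code in a root function that runs before the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FNS 16

/* Hypothetical toy call graph: calls[i][j] is true if function i calls j. */
struct call_graph {
	size_t n;
	bool calls[MAX_FNS][MAX_FNS];
	bool checks_ns_capable[MAX_FNS]; /* body contains an ns_capable() check */
};

/* Mark every function reachable from a function that performs an
 * ns_capable() check (the checking function included), using an
 * iterative depth-first search. Returns the number of marked functions. */
size_t mark_ns_capable_reachable(const struct call_graph *g,
				 bool reachable[MAX_FNS])
{
	size_t stack[MAX_FNS]; /* each function is pushed at most once */
	size_t top = 0, count = 0;

	assert(g->n <= MAX_FNS);
	for (size_t i = 0; i < g->n; i++) {
		reachable[i] = g->checks_ns_capable[i];
		if (reachable[i]) {
			stack[top++] = i;
			count++;
		}
	}
	while (top > 0) {
		size_t f = stack[--top];
		for (size_t callee = 0; callee < g->n; callee++) {
			if (g->calls[f][callee] && !reachable[callee]) {
				reachable[callee] = true;
				stack[top++] = callee;
				count++;
			}
		}
	}
	return count;
}
```

The marked set is the candidate list of code a namespace root can reach, which would then be classified by hand.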

-serge