Re: [PATCH v2] xattr: Enable security.capability in user namespaces

From: Serge E. Hallyn
Date: Thu Jul 13 2017 - 17:14:08 EST

Quoting Theodore Ts'o (tytso@xxxxxxx):
> On Thu, Jul 13, 2017 at 07:11:36AM -0500, Eric W. Biederman wrote:
> > The concise summary:
> >
> > Today we have the xattr security.capability that holds a set of
> > capabilities that an application gains when executed. AKA setuid root exec
> > without actually being setuid root.
> >
> > User namespaces have the concept of capabilities that are not global but
> > are limited to their user namespace. We do not currently have
> > filesystem support for this concept.
>
> So correct me if I am wrong; in general, there will only be one
> variant of the form:
> It's not like there will be:
> Except.... if you have a distribution root directory which is shared
> by many containers, you would need to put the xattrs in the overlay
> inodes.

Is that a problem? Essentially, people who would try to do the
above also want to use the 'shiftfs' stackable filesystem, which would
presumably eventually do this for you.

> Worse, each time you launch a new container, with a new
> subuid allocation, you will have to iterate over all files with
> capabilities and do a copy-up operation on the xattrs in overlayfs.
> So that's actually a bit of a disaster.

Only if you create the container rootfs as a copy.

Note that generally they would want to walk the fs in that case anyway, to chown
the files into the container. And said chown would clear out any existing file
capabilities (and suid/sgid bits).
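
For anyone wanting to see what such a walk would lose, here is a minimal
sketch that decodes the raw value of an existing security.capability xattr.
It assumes the revision 2 on-disk layout from linux/capability.h (a
little-endian magic word followed by two permitted/inheritable pairs); the
helper name decode_filecaps_v2 is made up for illustration and is not part of
any patch in this thread.

```python
import struct

# Constants as defined in linux/capability.h for the revision 2 format.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001


def decode_filecaps_v2(raw: bytes) -> dict:
    """Decode a revision 2 security.capability value (hypothetical helper)."""
    if len(raw) != 20:
        raise ValueError("expected 20 bytes for a revision 2 value")
    # Layout: magic_etc, then (permitted, inheritable) for the low and
    # high 32 capability bits, all little-endian 32-bit words.
    magic, p_lo, i_lo, p_hi, i_hi = struct.unpack("<5I", raw)
    if (magic & VFS_CAP_REVISION_MASK) != VFS_CAP_REVISION_2:
        raise ValueError("not a revision 2 value")
    return {
        "effective": bool(magic & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": p_lo | (p_hi << 32),
        "inheritable": i_lo | (i_hi << 32),
    }
```

On Linux the raw bytes could be fetched with os.getxattr(path,
"security.capability") before the chown pass, so the caps can be recorded
and reapplied afterwards if that is what the container setup wants.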

On the other hand, unprivileged lxc containers are created by
untarring the distro image straight into the mapped user namespace.
So no chowning is needed, and - once we have this properly supported -
the filecaps should be automatically written correctly for the container.
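
To illustrate why no chown pass is needed in that case, here is a rough
sketch of the id translation a user namespace applies, using the three-column
format of /proc/PID/uid_map (id inside the namespace, id outside, length).
The function names and the sample mapping are assumptions for illustration
only, not how lxc or the kernel implement it.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    inside_start: int   # first id as seen inside the namespace
    outside_start: int  # corresponding id in the parent namespace
    count: int          # number of consecutive ids covered


def parse_uid_map(text: str) -> List[Extent]:
    """Parse the contents of a uid_map file into extents."""
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        extents.append(Extent(*(int(f) for f in fields)))
    return extents


def to_outside(extents: List[Extent], inside_id: int) -> Optional[int]:
    """Translate an id inside the namespace to the id stored on disk."""
    for e in extents:
        if e.inside_start <= inside_id < e.inside_start + e.count:
            return e.outside_start + (inside_id - e.inside_start)
    return None  # unmapped inside the namespace


# Sample mapping (an assumption): container root is host uid 100000.
sample = parse_uid_map("0 100000 65536\n")
```

With that sample mapping, a file that tar creates as uid 0 inside the
namespace is stored as uid 100000 outside, so the ownership is already what
the container expects.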

> So for distribution overlays, you will need to do things a different
> way, which is to map the distro subdirectory so you know that the
> capability with the global uid 0 should be used for the container
> "root" uid, right?
> So this hack of using is *only* useful when the
> subcontainer root wants to create the privileged executable. You
> still have to do things the other way.
> So can we make perhaps the assertion that *either*:
> exists, *or*
> exists, but never both? And there BAR is exclusive to only one
> instance?

I think that's fine.