Re: [PATCH 1/1] simplified security.nscapability xattr

From: Serge E. Hallyn
Date: Mon May 16 2016 - 17:48:41 EST


On Mon, May 16, 2016 at 04:15:23PM -0500, Serge E. Hallyn wrote:
> Quoting Serge E. Hallyn (serge@xxxxxxxxxx):
> ...
> > There's a problem though. The above suffices to prevent an unprivileged user
> > in a user_ns from unsharing a user_ns to write a file capability and exploit
> > that capability in the ns where he is unprivileged. With one exception, which
> > is the case where the unprivileged user is mapped to the same kuid which
> > created the namespace. So if uid 1000 on the host creates a namespace
> > where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace
> > can create a new user_ns, write the xattr, and exploit it from the
> > parent namespace. This is not an uncommon case. I'm not sure what to do about
> > it.
>
> Ok I think I've convinced myself that requiring a kuid 0 in the container
> and storing that in the security.nscapability is the best solution. The DAC
> objection is imo not really valid - we don't have to give uid 0 in the
> container any special privilege, we just require that the ns have a uid 0
> mapping. I have not been able to think of any other reliable way to verify
> that the writer of the capability is authorized to grant privilege to the
> file when executed by current.
>
> I'm going to proceed with another POC based on the following design:
>
> 1. no new syscalls at the moment. You can choose to set/query
> security.nscapability, but can also just set security.capability from
> a user_ns and have the kernel transparently set a security.nscapability
> entry for you.
>
> 2. For now just a single security.nscapability entry, but in a format
> that makes turning it into an array a trivial change.
>
> 3. When running file foo which has a security.nscapability for kuid 100000,
> then any namespace where kuid 100000 is root - or which has an ancestor ns where
> that is the case - will run the file with the listed capabilities.
>
> 4. When doing getxattr of security.capability from a user_ns, if there is a
> security.capability entry, that will be returned; else if there is a valid
> security.nscapability for your ns, that will be returned.
>
> 5. when doing a setxattr of security.capability from a user_ns, if there is
> a security.nscapability entry, you get EBUSY; else a security.nscapability
> with your root kuid will be written provided that (a) you are privileged
> over your namespace, (b) you are privileged over your root uid, (c) the
> file owner maps into your namespace.

Stéphane pointed out this isn't quite right. The EBUSY will happen if
a security.nscapability is defined with a kuid over which the writer is
not privileged; otherwise the write overwrites the existing entry. EBUSY
will also happen if security.capability is set.
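
Roughly, the corrected check for item 5 would look like the sketch below.
This is a userspace model, not the actual patch: struct setcap_ctx and
nscap_set_via_capability() are hypothetical names, the booleans stand in
for the real kernel checks, the order of the checks is an assumption, and
-EPERM for a failed (a)/(b)/(c) check is borrowed from item 7 since item 5
does not name an error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state consulted on setxattr(security.capability)
 * from inside a user_ns. Each flag stands in for a real kernel check. */
struct setcap_ctx {
	bool has_capability;        /* security.capability is already set */
	bool has_nscapability;      /* a security.nscapability entry exists */
	bool privileged_over_entry; /* writer is privileged over that entry's kuid */
	bool privileged_over_ns;    /* (a) privileged over own namespace */
	bool privileged_over_root;  /* (b) privileged over own root uid */
	bool owner_maps;            /* (c) file owner maps into the namespace */
};

/* Returns 0 if a security.nscapability with the writer's root kuid may be
 * written (overwriting an existing entry the writer is privileged over),
 * or a negative errno. */
static int nscap_set_via_capability(const struct setcap_ctx *c)
{
	if (c->has_capability)
		return -EBUSY;
	if (c->has_nscapability && !c->privileged_over_entry)
		return -EBUSY;
	/* Assumption: -EPERM here, by analogy with item 7. */
	if (!c->privileged_over_ns || !c->privileged_over_root || !c->owner_maps)
		return -EPERM;
	return 0;
}
```

So an existing entry only blocks the write when the writer has no privilege
over the kuid stored in it.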

> 6. when doing a getxattr of security.nscapability, the entry will be shown
> with kuid mapped into your namespace or -1 if the uid does not map into
> your ns.
>
> 7. when doing a setxattr of security.nscapability, if an entry exists, you
> get -EBUSY; if you are not privileged over your ns, your root uid, and
> the file owner, then you get -EPERM; the xattr includes a uid field, which
> must be either 0 or a value valid in your ns. The value will be converted
> to a kuid and stored on disk. (Seth, I'm not sure offhand how that should
> mesh with your patches, we can talk about it after I send the next patch,
> which I'm quite certain will handle it wrongly)
>
> 8. If a security.capability exists, it will override any security.nscapability
> at execve() (so, inverse of my previous two patches).
>
> -serge
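
For reference, items 3 and 8 taken together amount to roughly the following
at execve() time. Again this is only a userspace model under stated
assumptions: struct ns_model, nscap_applies() and exec_cap_source() are
hypothetical names, a namespace is reduced to its root kuid plus a parent
pointer, and kuids are plain unsigned ints.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace: the kuid that uid 0 maps to
 * (if there is such a mapping) and a link to the parent namespace. */
struct ns_model {
	const struct ns_model *parent;
	unsigned int root_kuid;
	bool has_root; /* false if the ns has no uid 0 mapping */
};

enum cap_source {
	CAP_SRC_NONE,
	CAP_SRC_CAPABILITY,   /* security.capability */
	CAP_SRC_NSCAPABILITY, /* security.nscapability */
};

/* Item 3: the entry applies if its kuid is root in this namespace or in
 * any ancestor namespace. */
static bool nscap_applies(const struct ns_model *ns, unsigned int kuid)
{
	for (; ns != NULL; ns = ns->parent)
		if (ns->has_root && ns->root_kuid == kuid)
			return true;
	return false;
}

/* Item 8: security.capability, when present, overrides any
 * security.nscapability. */
static enum cap_source exec_cap_source(bool has_cap, bool has_nscap,
				       unsigned int nscap_kuid,
				       const struct ns_model *ns)
{
	if (has_cap)
		return CAP_SRC_CAPABILITY;
	if (has_nscap && nscap_applies(ns, nscap_kuid))
		return CAP_SRC_NSCAPABILITY;
	return CAP_SRC_NONE;
}
```

With an entry for kuid 100000, a namespace nested under one where kuid
100000 is root picks up the nscapability; an unrelated namespace gets
nothing.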