On 06/20/2017 01:42 AM, Amir Goldstein wrote:
> On Tue, Jun 20, 2017 at 12:34 AM, Eric W. Biederman wrote:
>> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
>>> Quoting Stefan Berger (stefanb@xxxxxxxxxxxxxxxxxx):
>>>> On 06/14/2017 11:05 PM, Serge E. Hallyn wrote:
>>>>> On Wed, Jun 14, 2017 at 08:27:40AM -0400, Stefan Berger wrote:
>>>>>> On 06/13/2017 07:55 PM, Serge E. Hallyn wrote:
>>>>>>> Quoting Stefan Berger (stefanb@xxxxxxxxxxxxxxxxxx):
>>>>>>>> If all extended
>>>>>>>> attributes were to support this model, maybe the 'uid' could be
>>>>>>>> associated with the 'name' of the xattr rather than its 'value' (not
>>>>>>>> sure whether that's possible).
>>>>>>> Right, I missed that in your original email when I saw it this morning.
>>>>>>> It's not what my patch does, but it's an interesting idea. Do you have
>>>>>>> a patch to that effect? We might even be able to generalize that to
>>>>>> No, I don't have a patch. It may not be possible to implement it.
>>>>>> The xattr_handlers take the name of the xattr as input to get().
>>>>> That may be ok though. Assume the host created a container with
>>>>> 100000 as the uid for root, which created a container with 130000 as
>>>>> uid for root. If root in the nested container tries to read the
>>>>> xattr, the kernel can check for security.foo for uid 130000 first, then
>>>>> security.foo for uid 100000, then plain security.foo. Or, it can do a
>>>>> listxattr and look for those. Am I overlooking one?
>>>>>> So one could try to encode the mapped uid in the name. However, that
>>>>> I thought that's exactly what you were suggesting in your original
>>>>>> could lead to problems with stale xattrs in a shared filesystem over
>>>>>> time unless one could limit the number of xattrs with the same
>>>>>> prefix, e.g., security.capability*. So I doubt that it would work.
>>>>> Hm. Yeah. But really how many setups are there like that? I.e. if
>>>>> you launch a regular docker or lxd container, the image doesn't do a
>>>>> bind mount of a shared image, it layers something above it or does a
>>>>> copy. What setups do you know of where multiple containers in different
>>>>> user namespaces mount the same filesystem shared and writeable?
>>>> I think I have something now that accommodates userns access to
>>> Thanks!
>>>> Encoding of uid is in the attribute name now as follows:
>>>> 1) The 'plain' security.capability is only r/w accessible from the
>>>> 2) When userns reads/writes 'security.capability' it will read/write
>>>> security.capability@uid=<uid> instead, with uid being the uid of
>>>> root, e.g. 1000.
>>>> 3) When listing xattrs for userns the host's security.capability is
>>>> filtered out to avoid read failures of 'security.capability' if
>>>> security.capability@uid=<uid> is read but not there. (see 1) and 2))
>>>> 4) security.capability* may all be read from anywhere
>>>> 5) security.capability@uid=<uid> may be read or written directly
>>>> from a userns if <uid> matches the uid of root (current_uid())
>>> This looks very close to what we want. One exception - we do want
>>> to support root in a user namespace being able to write
>>> security.capability@uid=<x> where <x> is a valid uid mapped in its
>>> namespace. In that case the name should be rewritten to be
>>> security.capability@uid=<y> where y is the unmapped kuid.val.
>>>
>>> So far my patch hasn't yet hit Linus' tree. Given that, would you
>>> mind taking a look and seeing what you think of this approach? If
>>> we may decide to go this route, we probably should stop my patch
>>> from hitting Linus' tree before we have to continue supporting it.
>> Agreed. I will take a look. I also want to see how all of this works
>> in the context of stackable filesystems. As that is the one case that
>> looked like it could be a problem case in your current patchset.
> Apropos stackable filesystems [cc some overlayfs folks], is there any
> way that parts of this work could be generalized towards ns aware
I am at least removing all string comparisons with xattr names from the
core code and moving the enabled xattr names into a list. For the
security.* extended attribute names we would enumerate the enabled ones
in that list, only security.capability for now. I am not sure how the
trusted.* space works.
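
To make the list-based idea concrete, here is a rough userspace model in Python of the naming rules quoted above. It is only a sketch and not the kernel patch: `ENABLED_XATTRS`, `on_disk_name`, `visible_names` and the `uid_map` dictionary are hypothetical names made up for illustration, and the handling of tagged names follows Serge's requested exception (rewrite the namespace uid to the unmapped host uid) rather than rule 5 as originally written.

```python
from typing import Dict, List, Optional

# Hypothetical list of enabled xattr names; only security.capability for now.
ENABLED_XATTRS: List[str] = ["security.capability"]
UID_TAG = "@uid="


def on_disk_name(name: str, ns_root_kuid: Optional[int],
                 uid_map: Optional[Dict[int, int]] = None) -> str:
    """Return the xattr name to use on disk for an access to `name`.

    ns_root_kuid: host-side uid of root in the caller's user namespace,
                  or None when the caller is on the host (assumption).
    uid_map:      namespace uid -> host uid, a stand-in for the kernel's
                  uid mapping (assumption).
    """
    if ns_root_kuid is None:
        return name  # the host sees names unchanged
    if name in ENABLED_XATTRS:
        # Rule 2: a plain name used inside a userns is redirected to the
        # name tagged with the uid of that namespace's root.
        return f"{name}{UID_TAG}{ns_root_kuid}"
    base, sep, uid_text = name.partition(UID_TAG)
    if sep and base in ENABLED_XATTRS and uid_text.isdigit():
        # Serge's exception: <x> must be a uid mapped in the namespace and
        # is rewritten to the unmapped host uid <y>.
        ns_uid = int(uid_text)
        if uid_map is None or ns_uid not in uid_map:
            raise PermissionError(f"uid {ns_uid} is not mapped in this namespace")
        return f"{base}{UID_TAG}{uid_map[ns_uid]}"
    return name  # names not in the enabled list are left alone


def visible_names(names: List[str], ns_root_kuid: Optional[int]) -> List[str]:
    """Rule 3: hide the host's plain enabled names when listing in a userns."""
    if ns_root_kuid is None:
        return list(names)
    return [n for n in names if n not in ENABLED_XATTRS]
```

With a namespace whose root maps to host uid 100000, `on_disk_name("security.capability", 100000)` yields `security.capability@uid=100000`, and the only string matching left is the membership test against the list.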