Re: [PATCH v2] xattr: Enable security.capability in user namespaces

From: Stefan Berger
Date: Wed Jul 12 2017 - 20:44:57 EST

On 07/12/2017 07:13 PM, Eric W. Biederman wrote:
"Serge E. Hallyn" <serge@xxxxxxxxxx> writes:

Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
Stefan Berger <stefanb@xxxxxxxxxxxxxxxxxx> writes:
Signed-off-by: Stefan Berger <stefanb@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Serge Hallyn <serge@xxxxxxxxxx>
Reviewed-by: Serge Hallyn <serge@xxxxxxxxxx>
It doesn't look like this is coming through Serge so I don't see how
the Signed-off-by tag is legitimate.
This is mostly explained by the fact that there have been a *lot* of
changes, many of them discussed in private emails.

From the replies to this it doesn't look like Serge has reviewed this
version either.

I am disappointed that all of my concerns about technical feasibility
remain unaddressed.
Can you re-state those, or give a link to them?
Well, I only posted about one substantive comment on the last round,
so it should be easy to find. That said:

The big question is how does this interact with filesystems'
xattr implementations?

There is the potential that we create many more security xattrs this
way. How does that scale? With more names etc.

It doesn't scale. Shared filesystems are a problem if many containers use them.

'man listxattr' also mentions this under BUGS:

" As noted in xattr(7), the VFS imposes a limit of 64 kB on the size of
the extended attribute name list returned by listxattr(2). If the
total size of attribute names attached to a file exceeds this limit,
it is no longer possible to retrieve the list of attribute names."
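
To put a rough number on that limit: the list returned by listxattr(2) is
the attribute names one after another, each terminated by a NUL byte, so
its size is the sum of (name length + 1). The following is only a
back-of-the-envelope sketch in plain sh. The helper names
name_list_bytes and max_names_within are made up for illustration, the
"user.foo<N>" naming is just an example, and it only counts name-list
bytes; it says nothing about any per-filesystem limits on xattr storage.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Bytes listxattr(2) would need for the names "<prefix>0" .. "<prefix>(count-1)".
# Each name in the returned list is followed by one NUL byte.
name_list_bytes() {
    nlb_prefix=$1
    nlb_count=$2
    nlb_total=0
    nlb_i=0
    while [ "$nlb_i" -lt "$nlb_count" ]; do
        nlb_name="${nlb_prefix}${nlb_i}"
        nlb_total=$((nlb_total + ${#nlb_name} + 1))
        nlb_i=$((nlb_i + 1))
    done
    echo "$nlb_total"
}

# Largest number of names "<prefix>0", "<prefix>1", ... whose list still
# fits within `limit` bytes.
max_names_within() {
    mnw_prefix=$1
    mnw_limit=$2
    mnw_total=0
    mnw_i=0
    while :; do
        mnw_name="${mnw_prefix}${mnw_i}"
        mnw_next=$((mnw_total + ${#mnw_name} + 1))
        if [ "$mnw_next" -gt "$mnw_limit" ]; then
            break
        fi
        mnw_total=$mnw_next
        mnw_i=$((mnw_i + 1))
    done
    echo "$mnw_i"
}

# Name-list bytes for 1000 attributes named user.foo0 .. user.foo999
name_list_bytes user.foo 1000

# How many such names fit under the 64 kB (65536 byte) VFS list limit
max_names_within user.foo 65536
```

Longer names (for example ones that carry a uid) eat into the budget
proportionally faster, since every byte of every name counts.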

A simple test on ext4:

#> touch foo
#> for ((i = 0; i < 200; i++)); do setfattr -n user.foo${i} -v hello foo; done

user.foo126 was the last one created...

The larger the xattr values, the sooner the limit is reached: writing
'hellohello' only gets as far as 'user.foo112'. One could try to encode
the data more efficiently or, as Serge did, put the uid on the xattr
value side, but either way it won't scale due to that VFS limit.


What happens if we have one xattr per uid for 1000+ uids?

How does this interact with filesystems' optimization of xattr names?
Some filesystems optimize the xattr names and don't store the
entire thing.

I'd really like to get to a point where unprivileged containers can start
using filecaps - at this point if that means having an extra temporary
file format based on my earlier patchset while we hash this out, that
actually seems worthwhile. But it would of course be ideal if we could
do the name based caps right in the first place.
This whole new version has set my review back to square one