Re: [PATCH] Introduce v3 namespaced file capabilities
From: Eric W. Biederman
Date: Thu Apr 27 2017 - 12:27:18 EST
ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
>
>> Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
>>>
>>> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
>>>
>>> > diff --git a/fs/xattr.c b/fs/xattr.c
>>> > index 7e3317c..75cc65a 100644
>>> > --- a/fs/xattr.c
>>> > +++ b/fs/xattr.c
>>> > @@ -170,12 +170,29 @@ int __vfs_setxattr_noperm(struct dentry *dentry, const char *name,
>>> > const void *value, size_t size, int flags)
>>> > {
>>> > struct inode *inode = dentry->d_inode;
>>> > - int error = -EAGAIN;
>>> > + int error;
>>> > + void *wvalue = NULL;
>>> > + size_t wsize = 0;
>>> > int issec = !strncmp(name, XATTR_SECURITY_PREFIX,
>>> > XATTR_SECURITY_PREFIX_LEN);
>>> >
>>> > - if (issec)
>>> > + if (issec) {
>>> > inode->i_flags &= ~S_NOSEC;
>>> > +
>>> > + if (!strcmp(name, "security.capability")) {
>>> > + error = cap_setxattr_convert_nscap(dentry, value, size,
>>> > + &wvalue, &wsize);
>>> > + if (error < 0)
>>> > + return error;
>>> > + if (wvalue) {
>>> > + value = wvalue;
>>> > + size = wsize;
>>> > + }
>>> > + }
>>> > + }
>>> > +
>>> > + error = -EAGAIN;
>>> > +
>>>
>>> Why is the conversion in __vfs_setxattr_noperm and not in setattr as
>>> was done for posix_acl_fix_xattr_from_user?
>>
>> I think I was thinking I wanted to catch all the vfs_setxattr operations,
>> but I don't think that's right. Moving to setxattr seems right. I'll
>> look around a bit more.
>
> Thanks. This is one of these little details that we want a good answer
> to why there. If you can document that in your patch description when
> you resend I would appreciate it.
Ok. Grrr.
Looking at this a little more getting it correct where we call the
conversion operation is critical.
I believe the current placement of cap_setxattr_convert_nscap is
actively wrong. In particular unless I am misleading something this
will trigger multiple conversions when setting one of these attributes
on overlayfs.
The stragey I adopted for for posix acls is:
On a write from userspace convert from current_user_ns() to &init_user_ns.
On a write to the filesystem convert from &init_user_ns to fs_user_ns.
On a read from the filesystem convert from fs_user_ns to &init_user_ns
On a read from the kernel to userspace convert from &init_user_ns
to current_user_ns().
Overall a good strategy but no one we can trivially adopt for the
capability xattr as the second write to filesystem method does not
appear to actually exist for anything except for posix acls.
I need to think a little more about how we want to accomplish this for
the capability xattr. My apoligies for leading you down a path that has
all of these bumps and then being sufficiently distracted not to help
you through this maze.
The only easy solution I can see is to just always keep things in
&init_user_ns inside the kernel. That works until we bring fuse or
other unprivileged mounts onboard that have storage outside of the
kernel.
Seth and I will have to rework that for fuse support but that sounds
better than not letting such an issue prevent us from merging the code.
Eric