Re: [PATCH v2 0/3] fuse: Add support for mounts from pid/user namespaces

From: Seth Forshee
Date: Sat Sep 27 2014 - 00:25:18 EST


On Fri, Sep 26, 2014 at 06:41:33PM -0700, Eric W. Biederman wrote:
> >> I am on the fence about what to do when a uid from the filesystem server
> >> or for other filesystems the on-disk data structures does not map, but
> >> make_bad_inode is simpler in conception. So make_bad_inode seems like
> >> a good place to start. For fuse especially this isn't hard because
> >> the make_bad_inode calls are already there to handle a change in i_mode.
> >
> > I agree that if we're unsure then make_bad_inode is a more logical place
> > to start, since it's easier to loosen the restrictions later than to
> > tighten them. I've got an initiail implementation that I'm in the
> > process of testing. If you want to take a look I've pushed it to:
> >
> > git://kernel.ubuntu.com/sforshee/linux.git fuse-userns
>
> Thanks. If I can scrape together enough focus I will look at it.
>
> As a second best thing here are my prototype from when I was looking at
> performing this fuse conversion last. Given that you had missed
> checking the from_kuid permission checks, it might be worth comparing
> and seeing if there is something else in these patches that would be
> worth including.

I think all of the from_kuid checks have been there for at least the
last two rounds of patches, but let me know if you see one I missed.

Looking at your patches, I see a lot that looks familiar. Seems like a
good sign :-)

I do see a few things I missed though. I've pointed those out below, and
in those cases I'm updating my patches. There are also some other
differences which I don't believe require any changes on my part, and
I've provided explanations for those. I also asked a few questions.

Miklos, are you satisfied with Eric's explanations about using a single
namespace and the permissions checks on uids? If so I should be able to
send updated patches early next week, otherwise let's continue trying to
resolve the points of disagreement.

> -static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> +static int fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> struct kstat *stat)
> {
> unsigned int blkbits;
> @@ -905,8 +905,12 @@ static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
> stat->ino = attr->ino;
> stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
> stat->nlink = attr->nlink;
> - stat->uid = make_kuid(&init_user_ns, attr->uid);
> - stat->gid = make_kgid(&init_user_ns, attr->gid);
> + stat->uid = make_kuid(fc->user_ns, attr->uid);
> + if (!uid_valid(stat->uid))
> + return -EOVERFLOW;
> + stat->gid = make_kgid(fc->user_ns, attr->gid);
> + if (!gid_valid(stat->gid))
> + return -EOVERFLOW;

If we were to go with the invalid id approach these checks shouldn't be
necessary, and they'll get converted to the overflow ids for userspace.
In my make_bad_inode patches I'm doing this check when we update the
inode and returning -EINVAL if it doesn't map. Then I changed
fuse_fillattr to just copy them from the inode, since fuse always
updates the inode right before calling fuse_fillattr anyway and that
makes it clear why we don't need to check the results of the conversion.

> static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
> @@ -973,7 +978,8 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
> attr_timeout(&outarg),
> attr_version);
> if (stat)
> - fuse_fillattr(inode, &outarg.attr, stat);
> + err = fuse_fillattr(inode, &outarg.attr,
> + stat);
> }
> }
> return err;

So right before calling fuse_fillattr here fuse_change_attributes is
called to update the inode, so it appears your patches would have
allowed setting inode->i_iuid to INVALID_UID?

> @@ -624,6 +626,8 @@ void fuse_conn_put(struct fuse_conn *fc)
> if (atomic_dec_and_test(&fc->count)) {
> if (fc->destroy_req)
> fuse_request_free(fc->destroy_req);
> + put_user_ns(fc->user_ns);
> + fc->user_ns = NULL;
> fc->release(fc);
> }
> }

I did this in fuse_free_con, but now looking again this is a bug for
cuse since it uses a different release method. I've updated my tree to
do it here instead.

> @@ -1033,6 +1037,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
> sb->s_maxbytes = MAX_LFS_FILESIZE;
> sb->s_time_gran = 1;
> sb->s_export_op = &fuse_export_operations;
> + sb->s_user_ns = get_user_ns(current_user_ns());
>
> file = fget(d.fd);
> err = -EINVAL;
> @@ -1149,6 +1154,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
> }
>
> kill_anon_super(sb);
> + put_user_ns(sb->s_user_ns);
> }

What's the point of s_user_ns? It doesn't seem to be used anywhere.

> From 9bdaa744858d296f361a92c8940c33f878aec169 Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Date: Thu, 4 Oct 2012 13:42:34 -0700
> Subject: [PATCH 2/4] fuse: Teach fuse how to handle the pid namespace.

<snip>

> -static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
> +static int convert_fuse_file_lock(struct fuse_conn *fc,
> + const struct fuse_file_lock *ffl,
> struct file_lock *fl)
> {
> switch (ffl->type) {
> @@ -2145,7 +2146,9 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
>
> fl->fl_start = ffl->start;
> fl->fl_end = ffl->end;
> - fl->fl_pid = ffl->pid;
> + rcu_read_lock();
> + fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
> + rcu_read_unlock();
> break;
>
> default:
> @@ -2156,7 +2159,7 @@ static int convert_fuse_file_lock(const struct fuse_file_lock *ffl,
> }
>
> static void fuse_lk_fill(struct fuse_req *req, struct file *file,
> - const struct file_lock *fl, int opcode, pid_t pid,
> + const struct file_lock *fl, int opcode, struct pid *pid,
> int flock)
> {
> struct inode *inode = file_inode(file);
> @@ -2169,7 +2172,7 @@ static void fuse_lk_fill(struct fuse_req *req, struct file *file,
> arg->lk.start = fl->fl_start;
> arg->lk.end = fl->fl_end;
> arg->lk.type = fl->fl_type;
> - arg->lk.pid = pid;
> + arg->lk.pid = pid_nr_ns(pid, fc->pid_ns);
> if (flock)
> arg->lk_flags |= FUSE_LK_FLOCK;
> req->in.h.opcode = opcode;
> @@ -2191,7 +2194,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - fuse_lk_fill(req, file, fl, FUSE_GETLK, 0, 0);
> + fuse_lk_fill(req, file, fl, FUSE_GETLK, NULL, 0);
> req->out.numargs = 1;
> req->out.args[0].size = sizeof(outarg);
> req->out.args[0].value = &outarg;
> @@ -2199,7 +2202,7 @@ static int fuse_getlk(struct file *file, struct file_lock *fl)
> err = req->out.h.error;
> fuse_put_request(fc, req);
> if (!err)
> - err = convert_fuse_file_lock(&outarg.lk, fl);
> + err = convert_fuse_file_lock(fc, &outarg.lk, fl);
>
> return err;
> }
> @@ -2210,7 +2213,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
> struct fuse_conn *fc = get_fuse_conn(inode);
> struct fuse_req *req;
> int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
> - pid_t pid = fl->fl_type != F_UNLCK ? current->tgid : 0;
> + struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
> int err;
>
> if (fl->fl_lmops && fl->fl_lmops->lm_grant) {

D'oh, I did totally miss this file locking stuff. Your changes look
good, so I'll pretty much take them verbatim.

> @@ -628,6 +630,8 @@ void fuse_conn_put(struct fuse_conn *fc)
> fuse_request_free(fc->destroy_req);
> put_user_ns(fc->user_ns);
> fc->user_ns = NULL;
> + put_pid_ns(fc->pid_ns);
> + fc->pid_ns = NULL;
> fc->release(fc);
> }
> }

And I had a leak here for cuse as with the userns.

> From 55226d169826abd110d9bc60a6b079f6be3f6a46 Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Date: Fri, 5 Oct 2012 10:18:28 -0700
> Subject: [PATCH 3/4] userns: fuse unprivileged mount suport
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> ---
> fs/fuse/inode.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 5284d7fda269..75f5326868e0 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1164,7 +1164,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
> static struct file_system_type fuse_fs_type = {
> .owner = THIS_MODULE,
> .name = "fuse",
> - .fs_flags = FS_HAS_SUBTYPE,
> + .fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
> .mount = fuse_mount,
> .kill_sb = fuse_kill_sb_anon,
> };

I have the order of my patches switched, then this one squashed in with
the one which adds userns support to fuse.

> From 6ae88ecfe4e8c8998478932ca225d1d9753b6c4b Mon Sep 17 00:00:00 2001
> From: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> Date: Fri, 5 Oct 2012 14:33:36 -0700
> Subject: [PATCH 4/4] fuse: Only allow read/writing user xattrs
>
> In the context of unprivileged mounts supporting anything except
> xattrs with the "user." prefix seems foolish. Return -EOPNOSUPP
> for all other types of xattrs.
>
> Cc: Miklos Szeredi <miklos@xxxxxxxxxx>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> ---
> fs/fuse/dir.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index d74c75a057cd..d84f5b819fab 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -13,6 +13,7 @@
> #include <linux/sched.h>
> #include <linux/namei.h>
> #include <linux/slab.h>
> +#include <linux/xattr.h>
>
> static bool fuse_use_readdirplus(struct inode *dir, struct dir_context *ctx)
> {
> @@ -1868,6 +1869,9 @@ static int fuse_setxattr(struct dentry *entry, const char *name,
> if (fc->no_setxattr)
> return -EOPNOTSUPP;
>
> + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> + return -EOPNOTSUPP;
> +
> req = fuse_get_req_nopages(fc);
> if (IS_ERR(req))
> return PTR_ERR(req);
> @@ -1911,6 +1915,9 @@ static ssize_t fuse_getxattr(struct dentry *entry, const char *name,
> if (fc->no_getxattr)
> return -EOPNOTSUPP;
>
> + if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
> + return -EOPNOTSUPP;
> +
> req = fuse_get_req_nopages(fc);
> if (IS_ERR(req))
> return PTR_ERR(req);

This I hadn't considered, but it seems reasonable. Two questions though.

Why shouldn't we let root-owned mounts support namespaces other than
"user."?

Thinking ahead to (hopefully) allowing other filesystems to be mounted
from user namespaces, should this be enforced by the vfs instead?

Thanks,
Seth

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/