Re: [PATCH 08/11] fuse: Support fuse filesystems outside of init_user_ns
From: Miklos Szeredi
Date: Tue Feb 13 2018 - 05:20:16 EST
On Mon, Feb 12, 2018 at 5:35 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Miklos Szeredi <mszeredi@xxxxxxxxxx> writes:
>
>> On Fri, Dec 22, 2017 at 3:32 PM, Dongsu Park <dongsu@xxxxxxxxxx> wrote:
>>> From: Seth Forshee <seth.forshee@xxxxxxxxxxxxx>
>>>
>>> In order to support mounts from namespaces other than
>>> init_user_ns, fuse must translate uids and gids to/from the
>>> userns of the process servicing requests on /dev/fuse. This
>>> patch does that, with a couple of restrictions on the namespace:
>>>
>>> - The userns for the fuse connection is fixed to the namespace
>>> from which /dev/fuse is opened.
>>>
>>> - The namespace must be the same as s_user_ns.
>>>
>>> These restrictions simplify the implementation by avoiding the
>>> need to pass around userns references and by allowing fuse to
>>> rely on the checks in inode_change_ok for ownership changes.
>>> Either restriction could be relaxed in the future if needed.
>>
>> Can we not introduce potential userspace interface regressions?
>>
>> The issue with pid namespaces fixed in commit 5d6d3a301c4e ("fuse:
>> allow server to run in different pid_ns") will probably bite us here
>> as well.
>
> Maybe, but unlike the pid namespace no one has been able to mount
> fuse outside of init_user_ns so we are much less exposed. I agree we
> should be careful.
Have to wrap my head around all the rules here.
There's the may_mount() one:
ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN)
Um, first of all, why isn't it checking current->cred->user_ns?
Ah, there it is in sget():
ns_capable(user_ns, CAP_SYS_ADMIN)
I get the plain capable(CAP_SYS_ADMIN) check in sget_userns() if fs
doesn't have FS_USERNS_MOUNT. This is the one that prevents fuse
mounts from being created when (current->cred->user_ns !=
&init_user_ns).
Maybe there's a logic to this web of namespaces, but I don't yet see
it. Is it documented somewhere?
>> We basically need two modes of operation:
>>
>> a) old, backward compatible (not introducing any new failure mores),
>> created with privileged mount
>> b) new, non-backward compatible, created with unprivileged mount
>>
>> Technically there would still be a risk from breaking userspace, since
>> we are using the same entry point for both, but let's hope that no
>> practical problems come from that.
>
> Answering from a 10,000 foot perspective:
>
> There are two cases. Requests to read/write the filesystem from outside
> of s_user_ns. These run no risk of breaking userspace as this mode has
> not been implemented before.
This comes from the fact that (s_user_ns == &init_user_ns) and all
user namespaces are "inside" init_user_ns, right?
One question: why does current code use the from_[ug]id_munged()
variant, when the conversion can never fail. Or can it?
> Restrictions at mount time to ensure we are not dealing with a crazy mix
> of namespaces. This has a small chance of breaking someone's crazy
> setup.
>
>
> Dropping requests to read/write the filesystem when the requester does
> not map into s_user_ns should not be a problem to enable universally. If
> s_user_ns is init_user_ns everything maps so there is no restriction.
>
>
>
> What we can do if we want to ensure maximum backwards compatibility
> is if the fuse filesystem is mounted in init_user_ns but if device for
> the communication channel is opened in some other user namespace we
> can just force the communication channel to operate in init_user_ns.
>
> That will be 100% backwards compatible in all cases and as far as I can
> see remove the need for having different ``modes'' of operation.
Okay.
Thanks,
Miklos