Re: [PATCH v4 03/21] fs: Allow sysfs and cgroupfs to share super blocks between user namespaces

From: Seth Forshee
Date: Tue May 17 2016 - 19:58:51 EST


On Tue, May 17, 2016 at 05:39:33PM -0500, Eric W. Biederman wrote:
> Seth Forshee <seth.forshee@xxxxxxxxxxxxx> writes:
>
> > Both of these filesystems already have use cases for mounting the
> > same super block from multiple user namespaces. For sysfs this
> > happens when using criu for snapshotting a container, where sysfs
> > is mnounted in the containers network ns but the hosts user ns.
> > The cgroup filesystem shares the same super block for all mounts
> > of the same hierarchy regardless of the namespace.
> >
> > As a result, the restriction on mounting a super block from a
> > single user namespace creates regressions for existing uses of
> > these filesystems. For these specific filesystems this
> > restriction isn't really necessary since the backing store is
> > objects in kernel memory and thus the ids assigned from inodes
> > is not subject to translation relative to s_user_ns.
> >
> > Add a new filesystem flag, FS_USERNS_SHARE_SB, which when set
> > causes sget_userns() to skip the check of s_user_ns. Set this
> > flag for the sysfs and cgroup filesystems to fix the
> > regressions.
>
> So this one needs to be sget_userns(..., &init_user_ns, ...).
> And not a new special case.

This is actually what I wanted to do, but based on a previous discussion
where I had suggested doing this (for a different reason) I came away
thinking you did not want it that way. So I'm happy with that change.

But if we do that it violates some of the assumptions of the patch to
rework MNT_NODEV on your testing branch (and also those behind patch 2
in this series). Something will need to be changed there to prevent a
regression in mount behavior when a user ns tries to mount without
MNT_NODEV when the mount inherited from its parent has it set.

> Apologies for not catching this earlier.

Actually this is a more recent patch, so you possibly hadn't seen it
before.

> I am looking at folding all of this into the patch that introduces
> sget_userns so that even bisects won't have regresssions.

That's fine with me.

Thanks,
Seth