Re: [PATCH] fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT

From: Christian Brauner

Date: Thu Jan 29 2026 - 10:49:52 EST

On Thu, Jan 29, 2026 at 09:36:54AM -0500, Jeff Layton wrote:
> On Wed, 2024-07-24 at 09:53 -0500, Seth Forshee (DigitalOcean) wrote:
> > Christian noticed that it is possible for a privileged user to mount
> > most filesystems with a non-initial user namespace in sb->s_user_ns.
> > When fsopen() is called in a non-init namespace the caller's namespace
> > is recorded in fs_context->user_ns. If the returned file descriptor is
> > then passed to a process privileged in init_user_ns, that process can
> > call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock
> > with sb->s_user_ns set to the namespace of the process which called
> > fsopen().
> >
> > This is problematic. We cannot assume that any filesystem which does not
> > set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in
> > mind, increasing the risk for bugs and security issues.
> >
> > Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is
> > not set for the filesystem and a non-initial user namespace will be
> > used. sget() does not need to be updated as it always uses the user
> > namespace of the current context, or the initial user namespace if
> > SB_SUBMOUNT is set.
> >
> > Fixes: cb50b348c71f ("convenience helpers: vfs_get_super() and sget_fc()")
> > Reported-by: Christian Brauner <brauner@xxxxxxxxxx>
> > Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@xxxxxxxxxx>
> > ---
> > fs/super.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/fs/super.c b/fs/super.c
> > index 095ba793e10c..d681fb7698d8 100644
> > --- a/fs/super.c
> > +++ b/fs/super.c
> > @@ -736,6 +736,17 @@ struct super_block *sget_fc(struct fs_context *fc,
> >  	struct user_namespace *user_ns = fc->global ? &init_user_ns : fc->user_ns;
> >  	int err;
> >
> > +	/*
> > +	 * Never allow s_user_ns != &init_user_ns when FS_USERNS_MOUNT is
> > +	 * not set, as the filesystem is likely unprepared to handle it.
> > +	 * This can happen when fsconfig() is called from init_user_ns with
> > +	 * an fs_fd opened in another user namespace.
> > +	 */
> > +	if (user_ns != &init_user_ns && !(fc->fs_type->fs_flags & FS_USERNS_MOUNT)) {
> > +		errorfc(fc, "mounting from non-initial user namespace is not allowed");
> > +		return ERR_PTR(-EPERM);
> > +	}
> > +
> >  retry:
> >  	spin_lock(&sb_lock);
> >  	if (test) {
> >
> > ---
> > base-commit: 256abd8e550ce977b728be79a74e1729438b4948
> > change-id: 20240723-s_user_ns-fix-b00c31de1cb8
> >
> > Best regards,
>
> I sent an incorrect RFC patch for this yesterday, but this patch breaks

Oh? I did not see it.

> NFS mounting in containers for us, as the prohibited activity is
> exactly the process we use to do those mounts.
>
> We basically have a task in the container do an fsopen() and then pass
> the fd to a daemon in the init namespace via unix socket. The daemon
> vets the NFS mount parameters (ensuring that the mount options are
> sane, and that we trust the server), and then does the mount inside the
> container.

The mountfsd model - kinda.
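
For anyone following along who has not seen the fd handoff step, here is a
minimal userspace sketch of passing a descriptor over a unix socket with
SCM_RIGHTS. It is only an illustration: send_fd(), recv_fd() and
fd_roundtrip_ok() are made-up names, and the demo passes a pipe fd within one
process rather than an fsopen() fd between a container task and a daemon.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected unix socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must accompany the cmsg */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Demo: pass the write end of a pipe across a socketpair and use it. */
static int fd_roundtrip_ok(void)
{
	int sv[2], pfd[2], got;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) || pipe(pfd))
		return 0;
	if (send_fd(sv[0], pfd[1]))
		return 0;
	got = recv_fd(sv[1]);
	if (got < 0)
		return 0;
	assert(got != pfd[1]);	/* the receiver gets a new descriptor number */
	if (write(got, "x", 1) != 1 || read(pfd[0], &c, 1) != 1)
		return 0;
	close(got);
	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return c == 'x';
}
```

In the setup described above, the sender would be the container task holding
the fsopen() fd and the receiver would be the daemon in the init namespace,
which then vets the parameters before calling fsconfig() on the received fd.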

>
> We don't want to set FS_USERNS_MOUNT on NFS, because that would give
> the container carte blanche to mount anything it likes, even a
> malicious server. Do we need to split that flag into two? Maybe
> FS_USERNS_SAFE and FS_USERNS_MOUNT?

I think you can simply add FS_USERNS_DELEGATABLE and raise it for nfs.
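
To make the idea concrete, here is a rough userspace model of how the
sget_fc() check from the patch might look with such a flag. This is a sketch
under assumptions, not a proposed kernel patch: the structs are cut-down
stand-ins for the kernel ones, sget_fc_userns_check() is a made-up name, and
both flag values are placeholders chosen for the model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder bit values for this model only. */
#define FS_USERNS_MOUNT		(1 << 3)
#define FS_USERNS_DELEGATABLE	(1 << 20)	/* hypothetical new flag */

/* Cut-down stand-ins for the kernel structures. */
struct user_namespace { int unused; };
static struct user_namespace init_user_ns;

struct file_system_type { int fs_flags; };

struct fs_context {
	const struct file_system_type *fs_type;
	struct user_namespace *user_ns;
	bool global;
};

/*
 * Model of the check: a non-initial s_user_ns is only accepted when the
 * filesystem either allows userns mounts outright or has opted in to
 * delegated mounting. Returns 0 if allowed, -EPERM otherwise.
 */
static int sget_fc_userns_check(const struct fs_context *fc)
{
	const struct user_namespace *user_ns =
		fc->global ? &init_user_ns : fc->user_ns;
	int allowed = FS_USERNS_MOUNT | FS_USERNS_DELEGATABLE;

	if (user_ns != &init_user_ns && !(fc->fs_type->fs_flags & allowed))
		return -EPERM;
	return 0;
}
```

The model only covers the superblock-creation check. Whether the caller may
create the superblock at all would still be decided separately, so a
filesystem with only the delegatable flag would not become mountable by the
container on its own.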