Re: fsconfig parsing bugs
From: Christian Brauner
Date: Wed Apr 05 2023 - 10:12:12 EST
On Thu, Sep 01, 2022 at 04:50:28PM +0200, Christian Brauner wrote:
> On Wed, Aug 31, 2022 at 04:12:21PM -0700, Seth Jenkins wrote:
> > The codebase-wide refactor efforts to using the latest fs mounting API
> > (with support for fsopen/fsconfig/fsmount etc.) have introduced some
> > bugs into mount configuration parsing in several parse_param handlers,
> > most notably shmem_parse_one() which can be accessed from a userns.
> > There are several cases where the following code pattern is used:
> >
> > ctx->value = <expression>
> > if(ctx->value is invalid)
> > goto fail;
> > ctx->seen |= SHMEM_SEEN_X;
> > break;
> >
> > However, this coding pattern does not work in the case where multiple
> > fsconfig calls are made. For example, if I were to call fsconfig with
> > the key "nr_blocks" twice, the first time with a valid value, and the
> > second time with an invalid value, the invalid value will be persisted
> > and used upon creation of the mount for the value of ctx->blocks, and
> > consequently for sbinfo->max_blocks.
> >
> > This code pattern is used for Opt_nr_blocks, Opt_nr_inodes, Opt_uid,
> > Opt_gid and Opt_huge. Probably the proper thing to do is to check for
> > validity before assigning the value to the shmem_options struct in the
> > fs_context.
> >
> > We also see this code pattern replicated throughout other filesystems
> > for uid/gid resolution, including hugetlbfs, FUSE, ntfs3 and ffs.
> >
> > The other outstanding issue I noticed comes from the fact that
> > fsconfig syscalls may occur in a different userns than that which
> > called fsopen. That means that resolving the uid/gid via
> > current_user_ns() can save a kuid that isn't mapped in the associated
> > namespace when the filesystem is finally mounted. This means that it
> > is possible for an unprivileged user to create files owned by any
> > group in a tmpfs mount (since we can set the SUID bit on the tmpfs
> > directory), or a tmpfs that is owned by any user, including the root
> > group/user. This is probably outside the original intention of this
> > code.
> >
> > The fix for this bug is not quite so simple as the others. The options
> > that I've assessed are:
> >
> > - Resolve the kuid/kgid via the fs_context namespace - this does
> > however mean that any task outside the fsopen'ing userns that tries to
> > set the uid/gid of a tmpfs will have to know that the uid/gid will be
> > resolved by a different namespace than that which the current task is
> > in. It also subtly changes the behavior of this specific subsystem in
> > a userland visible way.
> > - Globally disallow fsconfig calls originating from outside the
> > fs_context userns - This is a more robust solution that would prevent
> > any similar bugs, but it may impinge on valid mount use-cases. It's
> > the best from a security standpoint and if it's determined that it was
> > not in the original intention to be juggling user/mount namespaces
> > this way, it's probably the ideal solution.
> > - Throw EINVAL if the kuid specified cannot be mapped in the mounting
> > userns (and/or potentially in the fs_context userns) - This is
> > probably the solution that remains most faithful to all potential
> > use-cases, but it doesn't reduce the potential for variants in the
> > future in other parts of the codebase and it also introduces some
> > slight derivative logic bug risk.
> > - Don't resolve the uid/gid specified in fsconfig at all, and resolve
> > it during mount-time when calling an associated fill_super. This is
> > precedented and used in other parts of the codebase, but specificity
> > is lost in the final error case since an end-user cannot easily
> > attribute a mount failure to an unmappable uid.
> >
> > I've also attached a PoC for this bug that demonstrates that an
> > unprivileged user can create files/directories with root uid/gid's.
> > There is no deadline for this issue as we can't see any obvious way to
> > cross a privilege boundary with this.
> >
> > Thanks in advance!
>
> I'm involved in 2 large projects that make use of the new mount api LXC
> and CRIU. None of them call fsconfig() outside of the target user
> namespace. util-linux mount(2) does not yet use the new mount api and so
> can't be affected either but will in maybe even the next release.
> Additionally, glibc 2.36 is the first glibc with support for the new
> mount api which just released. So all users before that users would have
> to write their own system call wrappers so I think we have some liberty
> here.
>
> I think this is too much of a restriction to require that fsopen() and
> fsconfig() userns must match in order to set options. It is pretty handy
> to be able to set mount options outside of fc->user_ns. And we'd
> definitely want to make use of this in the future.
>
> So ideally, we just switch all filesystems that are mountable in userns
> over to use fc->user_ns. There's not really a big regression risk here
> because it's not used in userns workloads widely today. Taking a close
> look, the affected filesystems are devpts and tmpfs. Having them rely on
> fc->user_ns aligns them with how fuse does it today.
Seth, did you plan on sending a patch for this? I'd like to get this
sorted now that more projects are starting to make use of the new mount
api.