Re: [RFC] Null Namespaces

From: David Laight

Date: Fri Jun 26 2026 - 04:36:36 EST

On Thu, 25 Jun 2026 16:09:58 -0700
Andy Lutomirski <luto@xxxxxxxxxx> wrote:

> On Thu, Jun 25, 2026 at 2:53 PM John Ericson <mail@xxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Jun 25, 2026, at 5:00 PM, H. Peter Anvin wrote:
> > > On 2026-06-24 16:12, Al Viro wrote:
> > > > On Wed, Jun 24, 2026 at 06:51:47PM -0400, John Ericson wrote:
> > > >
> > > >> #### Null mount namespace
> > > >>
> > > >> - requires:
> > > >>
> > > >> - null root file system: absolute paths don't work.
> > > >>
> > > >> - null current working directory: relative paths with traditional,
> > > >> non-`*at` system calls (and `*at` ones using `AT_FDCWD`) don't work.
> > > >>
> > > >> - All operations relating to the "ambient" mount tree don't work.
> > > >>
> > > >> - `*at` operations with a file descriptor do work.
> > > >
> > > > Huh? The last bit looks contradicts the previous one - if you have
> > > > an opened directory in a mount from some namespace, those `*at` operations
> > > > with that descriptor *will* be seeing the mount tree of that namespace,
> > > > whatever the hell is "ambient" supposed to mean. Either that, or you
> > > > will be exposing whatever's overmounted in that mount, which is a huge
> > > > can of worms.
> > >
> > > It seems to me that this is really no different *in practice* to having an
> > > empty mount namespace, no? You might still be able to stat("/") and get a
> > > d--------- result, but how does that actually affect anything?
> >
> > The argument against just having an empty, immutable root directory and
> > calling it a day is the tie-in with a new process-spawning API discussed
> > near the bottom of my original email. I want to have nice secure
> > defaults, rather than forcing the programmer to remember to unshare, but
> > I also don't want to degrade performance by speculatively creating new
> > empty mount namespaces that might just be thrown away. Null fields alone
> > get us both --- security and good performance.
>
> This seems like a false dichotomy. There's such thing as a singleton.
>
> In fact, we have this spiffy nullfs_fs_get_tree. It seems relatively
> straightforward to have an API to get an fd to the singleton nullfs,
> and the default for a newly spawned process could even be to have cwd
> pointing at nullfs.
>
> root is still harder, because of the shadowing issue. I think I
> proposed, ages ago, relaxing the chroot rules so that, at least under
> certain circumstances (e.g. the task is not already chrooted) an
> unprivileged task could chroot. chrooting to nullfs seems like a
> somewhat useful operation.
>
> I can imagine more complex schemes to allow even a chrooted process to
> safely start acting as though their root is nullfs, but that would be
> potentially fairly nasty. *Maybe* everything would work if there was
> a root-for-dotdot and a separate root-for-absolute-paths, and
> nameidata->root could point to the former, but I'm certainly not
> willing to say that I think this would work with any confidence at
> all.

You'd also need to sort out the 'pwd' mess.
The kernel inode always has its real parent, inside a chroot the scan stops
when the inode is the same as that of the base of the chroot.
But faf about with namespaces (IIRC I was doing an unshare to get out of
a network namespace) and that comparison can fail (if the chroot base isn't
a mount point) - so "../.." can go all the way back to the real root rather
than stopping at the base of the chroot (as you would expect).

David

>
> --Andy
>