Re: [RFC] Null Namespaces

From: Andy Lutomirski

Date: Thu Jun 25 2026 - 11:52:46 EST


On Wed, Jun 24, 2026 at 8:41 PM John Ericson <mail@xxxxxxxxxxxxxx> wrote:
>
> Ah, I started replying to your first email, but this is better, this
> gets to the heart of the matter. Please don't mind me responding to your
> two questions in reverse.
>
> On Wed, Jun 24, 2026, at 9:10 PM, Al Viro wrote:
> > What's the fundamental difference between CWD and any open descriptor
> > for a directory? Why does it make sense to ban the former, but allow
> > the equivalents done via the latter?
>
> Yes! These two notions are very close --- but that's the *problem*, not
> a reason to not care about the existence of the CWD and root FS. I want
> to get rid of CWD in my processes not because it is fundamentally
> different (it isn't), but because it is superfluous.
>
> If one is capability-minded like me, it's a bad mistake that we ever had
> this "working directory" notion to begin with, and yet another example
> of the folks at Bell Labs sticking something in the kernel that was
> really only needed by the shell, and that could have just been done in
> userland.
>
> The current working directory, roughly, is *just* some global state
> holding a directory file descriptor. But I don't want that global state.
> If I am writing my userland program (that is not a shell), I would not
> create the global variable. I do not appreciate the fact that the kernel
> foists that state upon me whether I like it or not.
>
> Now obviously we cannot have a giant breaking change removing the notion
> of a current working directory altogether. But we can allow individual
> processes which don't want it to opt out, and that is what nulling out
> these fields (and updating the path resolution code to cope with that)
> allows.
>
> There is no loss of expressive power doing this, because one can (and
> should!) just use the `*at` and file descriptors. But there is, however,
> the imposition of discipline. The programmer (or coding agent) is
> encouraged to do everything with file descriptors rather than path
> concatenations etc., because they need to use `*at` anyways, and then
> voilà, without browbeating anyone in security seminars or code review, a
> bunch of TOCTOU issues disappear simply because doing the right thing is
> now the path of least resistance.
>
> > Please, start with explaining what, in your opinion, a mount namespace
> > _is_, and where does "mount X is attached at path P relative to mount
> > Y" belong.
>
> Let's take a pathological example:
>
> - Process A has `/foo` bind-mounted at `/bar/foo`
>
> - Process B has `/bar` without that bind mount, and `/foo` mounted at
> `/baz/foo`, as is possible because it is in a different mount
> namespace.
>
> If A opens `/bar/foo`, and sends it over (via socket) to B, and then B
> does `openat(recv_fd, "..")`, B will get `/bar`, not `/baz`. This is
> because `..` is resolved according to the mount referenced in the open
> file. (This is, by the way, very good! Directory file descriptors would
> be perilous to use if this were not the case!)
>
> The moral of the story is that "mount X is attached at path P relative
> to mount Y" is information accessed in the mounts themselves (maybe via
> their containing mount namespace, per the `mnt_ns` field, or maybe not,
> I am not sure, but it is immaterial). In contrast, the mount namespace
> of the *opening* task (`current->nsproxy->mnt_ns`, and current is B)
> doesn't matter at all for this purpose.

It's sort of a combination -- read the data structures :) Other than
the propagation part, they're really not that bad.

In any event, I think this discussion is sort of immaterial to the
proposed API change. No one is about to remove the concept of a mount
namespace. But maybe it makes sense to have a way to have a task that
doesn't actually belong to a mount namespace. A mount namespace is
certainly going to exist.

There will definitely be subtleties. For example, what happens if a
task with "no mount namespace" tries to do OPEN_TREE_CLONE? In some
logical sense it ought to work but it ought to be impossible to
actually mount the resulting tree anywhere, but this risks running
afoul of all kinds of checks. Maybe you get a whole new mount
namespace (that does not become your current mnt_ns) if you
OPEN_TREE_CLONE?

This stuff is complex and it probably makes more sense to keep changes simple.