Re: [PATCH 0/2] mount: add OPEN_TREE_NAMESPACE
From: Rob Landley
Date: Wed Jan 21 2026 - 16:25:25 EST
You want rootfs to be a NULLFS instead of ramfs. You don't seem to want it to
actually _be_ a filesystem. Even with your "fix", containers could communicate
with each _other_ through it if it becomes accessible. If a container can get
access to an empty initramfs and write into it, it can ask/answer the question
"Are there any other containers on this machine running stux24" and then coordinate.
Or you could just make the ROOT= codepath remount the empty initramfs -o ro like some switch_root implementations do. If the PID 1 you launch isn't in initramfs, don't leave initramfs writeable. That seems unlikely to break userspace.
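For illustration, the userspace form of that read-only remount is a single mount(2) call. This is a minimal sketch, not the kernel's ROOT= code path: the function name remount_readonly is made up for the example, and the caller is assumed to hold CAP_SYS_ADMIN in the relevant namespace.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: remount an existing mount point read-only.
 * Returns 0 on success or a negative errno value on failure.
 * The kernel refuses with EBUSY if files are still open for writing.
 */
int remount_readonly(const char *target)
{
    /* source, fstype and data are ignored for a plain MS_REMOUNT */
    if (mount(NULL, target, NULL, MS_REMOUNT | MS_RDONLY, NULL) < 0)
        return -errno;
    return 0;
}
```

An init that has already moved on to the real root would call this on the old initramfs mount before exec'ing PID 1's successor.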
(Having permissions to remount initramfs but _not_ having already "cracked root" seems... a bit funky? You have waaaaay more faith in security modules than I do...)
I think this new OPEN_TREE_NAMESPACE is nifty, but I don't think the
path that gives it sensible behavior should be conditional like this.
Either make it *always* mount on top of nullfs (regardless of boot
options) or find some way to have it actually be the root. I assume
the latter is challenging for some reason.
I think that's the plan. I suggested the same to Christian last week,
and he was amenable to removing the option and just always doing a
nullfs_rootfs mount.
Since 2013, initramfs might be ramfs or tmpfs depending on circumstances. Adding a third option for it to be nullfs when there's no cpio.gz extracted into it seems reasonable. (You can always mount a tmpfs _over_ it if you need that later, it's writeable so a PID 1 launched in it has workspace.)
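A minimal sketch of the "mount a tmpfs over it" fallback, from userspace. The helper name mount_tmpfs_over and the size option in the usage note are assumptions for the example, not anything proposed in the patch series.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: cover an existing directory with a fresh tmpfs so a
 * process running there gets writeable scratch space.
 * opts is a tmpfs option string such as "size=16m", or NULL for defaults.
 * Returns 0 on success or a negative errno value on failure.
 */
int mount_tmpfs_over(const char *target, const char *opts)
{
    if (mount("tmpfs", target, "tmpfs", MS_NOSUID | MS_NODEV, opts) < 0)
        return -errno;
    return 0;
}
```

For example, mount_tmpfs_over("/", "size=16m") would be the call a PID 1 launched in an empty root might make, assuming it has the privileges to mount.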
That said, if you are changing the semantics, right now we switch_root from initramfs instead of pivot_root because initramfs couldn't be unmounted. With this change would pivot_root become the mechanism for initramfs too? (If the cpio.gz recipient wasn't actually rootfs but was an overmount the way ROOT= does it.)
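The difference between the two mechanisms can be sketched from userspace as follows. This is a simplified illustration under stated assumptions: the function names are made up, new_root is assumed to already be a mount point, and a real switch_root also deletes the old initramfs contents to free memory, which is omitted here. The pivot_root(".", ".") idiom is the one described in the pivot_root(2) man page, which also documents that pivot_root fails with EINVAL while the current root is on rootfs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * switch_root style: rootfs cannot be unmounted, so move new_root on top
 * of "/" and chroot into it. The old root stays underneath, hidden.
 */
int enter_root_by_move(const char *new_root)
{
    if (chdir(new_root) < 0)
        return -errno;
    if (mount(".", "/", NULL, MS_MOVE, NULL) < 0)
        return -errno;
    if (chroot(".") < 0)
        return -errno;
    if (chdir("/") < 0)
        return -errno;
    return 0;
}

/*
 * pivot_root style: make new_root the root mount, with the old root
 * stacked at the same place, then lazily detach the old root.
 * glibc has no pivot_root wrapper, so the raw syscall is used.
 */
int enter_root_by_pivot(const char *new_root)
{
    if (chdir(new_root) < 0)
        return -errno;
    if (syscall(SYS_pivot_root, ".", ".") < 0)
        return -errno;
    if (umount2(".", MNT_DETACH) < 0)
        return -errno;
    if (chdir("/") < 0)
        return -errno;
    return 0;
}
```

If the cpio.gz were extracted into an overmount rather than into rootfs itself, the second variant is the one that could in principle apply to initramfs as well.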
Aside: it would be nice if inaccessible mount points could automatically be garbage collected. There's already some "lazy umount" plumbing that does that when explicitly requested to, but last I checked there were cases that didn't get caught. It's been a while though, might already have been fixed. Presumably initramfs would always get pinned because it's PID 0's / reference...
Also, could you guys make CONFIG_DEVTMPFS_MOUNT work with initramfs? I've posted patches for that on and off since 2017, most recent one's probably https://landley.net/bin/mkroot/0.8.13/linux-patches/0003-Wire-up-CONFIG_DEVTMPFS_MOUNT-to-initramfs.patch (tested on a 6.17 kernel).
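For context, this is roughly what an init running from initramfs has to do by hand today, since the automatic mount is not applied there. It is a userspace sketch only and says nothing about how the linked patch implements it; the helper name mount_devtmpfs is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: mount devtmpfs at the given directory
 * (normally "/dev") so device nodes exist before anything else runs.
 * Returns 0 on success or a negative errno value on failure.
 */
int mount_devtmpfs(const char *target)
{
    if (mount("devtmpfs", target, "devtmpfs", 0, NULL) < 0)
        return -errno;
    return 0;
}
```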
We think that older runtimes should still "just work" with this scheme.
Out of an abundance of caution, we _might_ want a command-line option
to make it go back to the old way, in case we find some userland stuff that
doesn't like this for some reason, but hopefully we won't even need
that.
I assume it will break stuff, but I also assume the systems it breaks will never upgrade to a 7.x kernel because the kernel itself would consume all available memory before launching PID 1.
Rob