Re: [RFC PATCH 1/5] misc: introduce FDBox

From: Christian Brauner
Date: Sun Mar 09 2025 - 08:03:50 EST


On Sat, Mar 08, 2025 at 12:10:12AM +0000, Pratyush Yadav wrote:
> Hi Christian,
>
> Thanks for the review!

No worries, I'm not trying to be polemic. It's just that this whole
proposed concept is pretty light on thinking through its possible
implications.

> > This use-case is covered with systemd's fdstore and it's available to
> > unprivileged userspace. Stashing arbitrary file descriptors in the
> > kernel in this way isn't a good idea.
>
> For one, it can't be arbitrary FDs, but only explicitly enabled ones.
> Beyond that, while not intended, there is no way to stop userspace from
> using it as a stash. Stashing FDs is a needed operation for this to
> work, and there is no way to guarantee in advance that userspace will
> actually use it for KHO, and not just stash it to grab back later.

As written it can't ever function as a generic file descriptor store.

It only allows fully privileged processes to stash file descriptors,
which makes it useless for generic userspace. A generic fdstore should
have a model that makes it usable by unprivileged processes; it should
probably also be multi-instance and work easily with namespaces. This
doesn't, and hitching it on devtmpfs and character devices is guaranteed
not to work well for such use-cases.

It also has major security issues and implications. Any file you stash
in there will have the credentials of the opener attached to it. So if
someone stashes anything in there, you need permission mechanisms that
ensure that Joe Random can't use FDBOX_GET_FD to pull out a file for,
e.g., someone else's cgroup and happily migrate processes under the
opener's credentials, or mess around with some random executing binary.

So you need a model of who is allowed to pull out what file descriptors
from a file descriptor stash. What are the semantics for that? What's
the security model for that? What are possible corner cases?

For systemd's userspace fdstore that's covered by policy: it can quite
easily implement what fds it accepts. For the kernel it's a lot more
complicated.
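To make the contrast concrete, here is a minimal sketch of a userspace fd
store, assuming Linux and Python 3.9+ (for socket.send_fds/recv_fds). It
is not systemd's actual protocol (services hand fds to systemd via
sd_notify messages carrying FDSTORE=1); ToyFdStore and its rules are
hypothetical. The point is only that policy lives in userspace: the store
can refuse whole classes of files at stash time and decide per requester
who gets what back, using the sender's credentials from SO_PEERCRED.

```python
import os
import socket
import stat
import struct

def sender_uid(sock):
    """Peer uid on a Unix socket via SO_PEERCRED (Linux).

    Note: these are the peer's credentials from connect()/socketpair() time,
    not per-message credentials.
    """
    creds = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                            struct.calcsize("3i"))
    _pid, uid, _gid = struct.unpack("3i", creds)
    return uid

class ToyFdStore:
    """Hypothetical userspace fd store with an explicit acceptance policy."""

    def __init__(self):
        self._fds = {}  # name -> (owner uid, fd)

    def receive(self, sock):
        """Accept one named fd sent over SCM_RIGHTS; return True if stored."""
        msg, fds, _flags, _addr = socket.recv_fds(sock, 256, 1)
        if not fds:
            return False
        fd, name = fds[0], msg.decode()
        # Policy: refuse character devices (e.g. the store's own /dev node)
        # and refuse to silently replace an existing entry.
        if stat.S_ISCHR(os.fstat(fd).st_mode) or name in self._fds:
            os.close(fd)
            return False
        self._fds[name] = (sender_uid(sock), fd)
        return True

    def get(self, name, requester_uid):
        """Hand out a duplicate only to the uid that stashed the file."""
        owner, fd = self._fds[name]
        if requester_uid != owner:
            raise PermissionError(name)
        return os.dup(fd)
```

Usage: stash a pipe from one end of a socketpair and read through the
fd handed back; a /dev/null descriptor is rejected by the same rule.

```python
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
store = ToyFdStore()
r, w = os.pipe()
socket.send_fds(a, [b"pipe"], [r])
store.receive(b)
fd = store.get("pipe", os.geteuid())
```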

If someone puts file descriptors in there for a bunch of files opened
in different mount namespaces, then this will pin said mount namespaces.
Normally, when the last process in a mount namespace exits, the mount
namespace is cleaned up; with the fdbox holding files from it, it would
stay pinned. That's not wrong, but the implications of this need to be
spelled out.

What if someone puts a file descriptor from devtmpfs or for /dev/fdbox
into an fdbox? Even if that's blocked, what happens if someone creates a
detached bind-mount of a /dev/fdbox mount and mounts it into a different
mount namespace and then puts a file descriptor for that mount namespace
into the fdbox? Tons of other scenarios come to mind, and that's before
networking is brought into the mix.

It's not done by just letting the kernel stash some files, getting
them out again later somehow, and then seeing whether that's useful for
other stuff in the future. A generic, globally usable fdstore is not
happening without a clear and detailed analysis of what the semantics
are going to be.

So either that work is done right from the start, or stashing files
goes out the window and the KHO part is instead implemented in a way
where, during a KHO dump, relevant userspace is notified that it must
now serialize its state into the serialization stash. And no files are
actually kept in there at all.