Re: [RFC PATCH 1/5] misc: introduce FDBox
From: Christian Brauner
Date: Tue Mar 18 2025 - 10:26:22 EST
On Mon, Mar 17, 2025 at 01:59:05PM -0300, Jason Gunthorpe wrote:
> On Sun, Mar 09, 2025 at 01:03:31PM +0100, Christian Brauner wrote:
>
> > So either that work is done right from the start or that stashing files
> > goes out the window and instead that KHO part is implemented in a way
> > where during a KHO dump relevant userspace is notified that they must
> > now serialize their state into the serialization stash. And no files are
> > actually kept in there at all.
>
> Let's ignore memfd/shmem for a moment..
>
> It is not userspace state that is being serialized, it is *kernel*
> state inside device drivers like VFIO/iommufd/kvm/etc that is being
> serialized to the KHO.
>
> The file descriptor is simply the handle to the kernel state. It is
> not a "file" in any normal filesystem sense, it is just a uAPI handle
> for a char dev that is used with IOCTL.
>
> When KHO is triggered, whatever is contained inside the FD is
> serialized into the KHO.
>
> So we need:
> 1) A way to register FDs to be serialized. For instance, not every
> VFIO FD should be retained.
> 2) A way for the kexecing kernel to make callbacks to the char dev
> owner (probably via struct file operations) to perform the
> serialization
> 3) A way for the new kernel to ask the char dev owner to create a new
> struct file out of the serialized data. Probably allowed to happen
> only once, i.e. you can't clone these things. This is not the same
> as just opening an empty char device, it would also fill the char
> device with whatever data was serialized.
> 4) A way to get the struct file into a process fd number so userspace
> can route it to the right place.
>
> It is not really a stash, it is not keeping files, it is hardwired to

Right now, as written, it is keeping references to files in these fdboxes
and is thus functioning both as a crippled, highly privileged fdstore and
as a serialization mechanism. Please get rid of the fdstore bits and
implement it so that it serializes files without stashing references to
live files. As it stands, those references can be pulled out and
installed into the caller's fdtable again at arbitrary points in time
before the fdbox is "sealed".

> KHO to drive its serialize/deserialize mechanism around char devs in
> a very limited way.
>
> If you have that then feeding an anonymous memfd/guestmemfd through
> the same machinery is a fairly small and logical step.
>
> Jason
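
For illustration only, the flow in the quoted list (a serialize callback,
a one-shot restore) combined with the request above to keep no references
to live files can be modelled in plain userspace C. This is a sketch under
stated assumptions, not the FDBox or KHO interface: struct handle,
struct handle_ops, struct record, preserve() and restore_once() are all
hypothetical names. It serializes at the call site, which sidesteps the
question of how registration would track a handle, and it does not model
installing the restored file into a process fd number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOB_MAX 64

struct handle;

/* Hypothetical per-driver callbacks, standing in for points 2) and 3). */
struct handle_ops {
	size_t (*serialize)(const struct handle *h, unsigned char *buf, size_t cap);
	bool (*restore)(struct handle *h, const unsigned char *buf, size_t len);
};

/* Stand-in for the kernel state that sits behind a char dev FD. */
struct handle {
	const struct handle_ops *ops;
	int state;
};

/* Holds serialized bytes only, never a pointer to a live handle. */
struct record {
	unsigned char blob[BLOB_MAX];
	size_t len;
	bool consumed;
};

/* Toy driver: its whole state is one int. */
static size_t counter_serialize(const struct handle *h, unsigned char *buf, size_t cap)
{
	if (cap < sizeof(h->state))
		return 0;
	memcpy(buf, &h->state, sizeof(h->state));
	return sizeof(h->state);
}

static bool counter_restore(struct handle *h, const unsigned char *buf, size_t len)
{
	if (len != sizeof(h->state))
		return false;
	memcpy(&h->state, buf, sizeof(h->state));
	return true;
}

const struct handle_ops counter_ops = {
	.serialize = counter_serialize,
	.restore = counter_restore,
};

/* Copy the handle's state into the record; the handle is not retained. */
bool preserve(struct record *r, const struct handle *h)
{
	assert(r && h && h->ops && h->ops->serialize);
	r->len = h->ops->serialize(h, r->blob, sizeof(r->blob));
	r->consumed = false;
	return r->len > 0;
}

/* Rebuild a handle from the record; succeeds at most once per record. */
bool restore_once(struct record *r, const struct handle_ops *ops, struct handle *out)
{
	assert(r && ops && ops->restore && out);
	if (r->consumed || r->len == 0)
		return false;
	if (!ops->restore(out, r->blob, r->len))
		return false;
	out->ops = ops;
	r->consumed = true;
	return true;
}
```

Because the record carries bytes rather than a file reference, later
changes to the original handle do not leak into it, and there is nothing
that could be pulled back out and installed into an fdtable before the
handover. The consumed flag models the "only once" rule from point 3).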