Re: [RFC PATCH 1/5] misc: introduce FDBox
From: Jason Gunthorpe
Date: Mon Mar 31 2025 - 11:56:21 EST
On Wed, Mar 26, 2025 at 10:40:29PM +0000, Pratyush Yadav wrote:
> Ideally, kho_preserve_folio() should be similar to freeing the folio,
> except that it doesn't go to buddy for re-allocation. In that case,
> re-using those pages should not be a problem as long as the driver made
> sure the page was properly "freed", and there are no stale references to
> it. They should be doing that anyway since they should make sure the
> file doesn't change after it has been serialized.
I don't know if this is a good idea; it seems to make error recovery
much more complex.
> > Then you have the issue that I don't actually imagine shutting down
> > something like iommufd, I was intending to leave it frozen in place
> > with all its allocations and so on. If you try to de-serialize you
> > can't de-serialize into the thing that is frozen, you'd create a new
> > one from empty. Now you have two things pointing at the same stuff,
> > what a mess.
>
> What do you mean by "frozen in place"? Isn't that the same as being
> serialized?
I mean all the memory and internal state is still there; it is just
not changing. It is not the same as being serialized, as the
de-serialized versions of everything would still exist in parallel.
> Considering that we want to make sure a file is not opened by any
> process before we serialize it, what do we get by keeping the struct
> file around (assuming we can safely deserialize it without going
> through kexec)?
We do a lot less work.
Having serialize reliably put the entire system into a fully
post-live-update state, including dependent things like the
iommufd/vfio attachment and the IOMMU driver, is very hard. This stuff
is quite complex.
I imagine instead we have three data states:
- Fully operating
- Frozen, with all preserved memory logged in KHO
- Post-live-update, where there are hints scattered around the drivers
about what is in the KHO
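The three states above can be read as a small state machine. This is a minimal userspace sketch only; `enum lu_state`, `lu_transition_ok()` and `lu_set_state()` are hypothetical names, not kernel APIs. The post-live-update to fully operating transition is an assumption (once drivers have consumed their KHO hints); the thread does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-object live-update states (illustrative names only). */
enum lu_state {
	LU_OPERATING,   /* fully operating, FD usable */
	LU_FROZEN,      /* state not changing, preserved memory logged in KHO */
	LU_POST_UPDATE, /* after kexec: drivers hold hints into the KHO data */
};

/* Returns true if the transition is one the three-state model allows. */
static bool lu_transition_ok(enum lu_state from, enum lu_state to)
{
	switch (from) {
	case LU_OPERATING:
		return to == LU_FROZEN;
	case LU_FROZEN:
		/* Abort (discard the KHO record) or proceed across kexec. */
		return to == LU_OPERATING || to == LU_POST_UPDATE;
	case LU_POST_UPDATE:
		/* Assumption: hints consumed, object becomes fully operating. */
		return to == LU_OPERATING;
	}
	return false;
}

/* Moves *cur to next, asserting that the transition is permitted. */
static void lu_set_state(enum lu_state *cur, enum lu_state next)
{
	assert(lu_transition_ok(*cur, next));
	*cur = next;
}
```

The point of the model is that frozen is not a one-way door: going back to fully operating is a legal, cheap transition.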
From an error perspective, going from frozen back to fully operating
should just be a matter of throwing away the KHO record and allowing
use of the FD again. That is super simple and makes error recovery
during the micro-steps of KHO simple and safe.
If you imagine that KHO is destructive, then every failure point needs
to unwind the partial destruction, which is a total nightmare to code :\
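To illustrate why a non-destructive freeze keeps error recovery simple, here is a minimal userspace sketch. `struct kho_record`, `struct obj`, `freeze_and_log()` and `freeze_all()` are hypothetical stand-ins, not the real KHO API; the record only logs what would be preserved and never tears down the live objects, so a failure at any micro-step is handled by discarding the record and thawing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 8

/* Hypothetical stand-in for a KHO record: logs ids, destroys nothing. */
struct kho_record {
	size_t nr_logged;
	int logged_ids[MAX_OBJS];
};

struct obj {
	int id;
	bool frozen;      /* live state kept intact, just not changing */
	bool fail_on_log; /* test hook: simulate a failing micro-step */
};

/* One micro-step: freeze an object and log it. Non-destructive. */
static int freeze_and_log(struct obj *o, struct kho_record *rec)
{
	o->frozen = true;
	if (o->fail_on_log)
		return -1;
	rec->logged_ids[rec->nr_logged++] = o->id;
	return 0;
}

/* Freeze all objects; on any failure, throw away the record and thaw. */
static int freeze_all(struct obj *objs, size_t n, struct kho_record *rec)
{
	assert(n <= MAX_OBJS);
	rec->nr_logged = 0;
	for (size_t i = 0; i < n; i++) {
		if (freeze_and_log(&objs[i], rec) == 0)
			continue;
		/* Nothing was destroyed, so recovery is: drop the log, thaw. */
		rec->nr_logged = 0;
		for (size_t j = 0; j <= i; j++)
			objs[j].frozen = false;
		return -1;
	}
	return 0;
}
```

With a destructive design, the failure branch would instead need a per-step undo for whatever earlier steps had already torn down.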
> Main idea is for logical grouping and dependency management. If some FDs
> have a dependency between them, grouping them in different boxes makes
> it easy to let userspace choose the order of operations, but still have
> a way to make sure all dependencies are met when the FDs are serialized.
> Similarly, on the deserialize side, this ensures that all dependent FDs
> are deserialized together.
That seems overcomplicated to me. Userspace should write the FDs in
the required order, which should be a topological sort of the
required dependencies. The kernel should just validate that this was done.
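A minimal userspace sketch of the "kernel just validates" idea: each written FD carries the ids of the FDs it depends on, and the order is accepted only if every dependency was written earlier. `struct fd_entry` and `validate_order()` are hypothetical names, not an existing interface; the check is O(n^2), which is assumed acceptable for a small number of FDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FDS  16
#define MAX_DEPS 4

/* Hypothetical per-FD record as userspace might submit it. */
struct fd_entry {
	int id;             /* token identifying this FD */
	size_t nr_deps;
	int deps[MAX_DEPS]; /* ids of FDs this one depends on */
};

/* True if an entry with this id appears at an index below upto. */
static bool seen_before(const struct fd_entry *e, size_t upto, int id)
{
	for (size_t i = 0; i < upto; i++)
		if (e[i].id == id)
			return true;
	return false;
}

/* Accept the order only if it is a topological sort: every dependency of
 * entry i appears at some index < i. The kernel computes no order itself. */
static bool validate_order(const struct fd_entry *e, size_t n)
{
	assert(n <= MAX_FDS);
	for (size_t i = 0; i < n; i++) {
		assert(e[i].nr_deps <= MAX_DEPS);
		if (seen_before(e, i, e[i].id))
			return false; /* same FD written twice */
		for (size_t d = 0; d < e[i].nr_deps; d++)
			if (!seen_before(e, i, e[i].deps[d]))
				return false; /* dependency not yet written (or cycle) */
	}
	return true;
}
```

Because a dependency must precede its dependent, a cycle (including an FD depending on itself) can never pass the check.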
Jason