Re: [RFC PATCH 13/15] iommufd: Persist iommu domains for live update
From: Samiullah Khawaja
Date: Thu Oct 02 2025 - 14:08:38 EST
On Thu, Oct 2, 2025 at 10:37 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Thu, Oct 02, 2025 at 10:03:05AM -0700, Samiullah Khawaja wrote:
> > > I think the simplest thing is the domain exists forever until
> > > userspace attaches an iommufd, takes ownership of it and frees it.
> > > Nothing to do with finish.
> >
> > Hmm.. I think this is tricky. There needs to be a way to clean up and
> > discard the old state if the userspace doesn't need it.
>
> Why?
>
> Isn't "userspace doesn't need it" some extremely weird unused corner
> case?
>
> This should not be automatic or divorced from userspace, if the
> operator would like to switch something out of LUO then they should
> have userspace that co-ordinates this. Receive the iommufd, close it,
> install a normal kernel driver.
>
> Why make special code in the kernel to sequence this automatically?
>
> > session manager (VMM or LUOD) decides that the finish needs to happen
> > and the iommufd (or the underlying HWPTs) are not restored, it means
> > that LUOD has decided that the VM is not going to come up and the
> > preserved state and resources (domain, device, memory) need to be
> > freed/released.
>
> I've been assuming if luo fails so catastrophically the whole node
> would reboot to recover.
>
> Is there really a case where you might say a kexec happens and a
> single VM out of many doesn't survive? Seems weird..
>
> So to repeat above, if this is something people want then the
> userspace should complete luo restoring the failed vm and then turn
> around and free up all the resources. Why should the kernel
> automatically do the same operations?
>
> Maybe userspace needs some contingency flow where there is a dedicated
> reaper program for a luo session. The VMM crashes during restore, OK,
> we pass the luo FD to a reaper and it cleans up the objects in the
> session and closes it.
These are all great points. I agree, it makes sense. It keeps the
FINISH lightweight and makes the domain ownership model very clean. I
will further discuss the memfd dependency scenario in the other
thread.
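
For the reaper contingency, the handoff itself should not need anything
new from the kernel. A minimal sketch, assuming the LUO session FD is an
ordinary file descriptor that can be passed over an AF_UNIX socket with
SCM_RIGHTS (that assumption, and the helper names send_fd()/recv_fd(),
are mine and not part of this series):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: the VMM (or its supervisor) hands an open FD to
 * the reaper over a connected AF_UNIX socket. Returns 0 on success.
 */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* SCM_RIGHTS needs at least one byte of payload */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: the reaper receives the FD. Returns the new FD,
 * or -1 on error. The reaper would then clean up the session objects
 * through whatever interface the session exposes (not modelled here)
 * and finally close() the FD.
 */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The point is only that the cleanup sequencing lives in userspace: the
kernel keeps the objects alive for as long as some process holds the
session FD.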
>
> > > Maybe the HWPT has to be auto-created inside the iommufd as soon as it
> > > is attached. The "restore" ioctl would just return back the ID of this
> > > already created HWPT.
> >
> > Once we return the ID, do we make this HWPT mutable? Or is this
> > re-created HWPT just a handle to keep the domain ownership?
>
> That's a bigger question..
>
> To start with, I was imagining that the restored iommu_domain was
> immutable, e.g. it does not have map and unmap operations. It never
> becomes mutable.
>
> As I outlined this special luo immutable domain is then attached
> during early boot, which should be a NOP, and gets turned into a HWPT
> during iommufd restoration. The only thing userspace should be able to
> do with that HWPT handle is destroy it after replacing it.
Okay, this is great. An immutable HWPT associated with the restored
iommu_domain confirms my intuition that this is just a handle to the
underlying domain. The user can destroy it when it is replaced, or
when iommufd is closed without HWPT replacement.
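
To check my understanding of the lifecycle, here is a small userspace
model of it. This is only a sketch: every name below (model_domain,
model_hwpt, model_hwpt_*) is hypothetical and none of it is iommufd
code. It assumes the restored HWPT rejects map/unmap, can only be
destroyed once nothing is attached to it, and that "replace" moves one
device attachment from the restored HWPT to a normal one.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not kernel API. */
struct model_domain {
	bool restored;		/* preserved across kexec, hence immutable */
	int attach_count;	/* devices currently attached */
};

struct model_hwpt {
	struct model_domain *domain;
	bool destroyed;
};

/* map is refused on a restored (immutable) domain. */
int model_hwpt_map(struct model_hwpt *hwpt, unsigned long iova, size_t len)
{
	(void)iova;
	(void)len;
	if (hwpt->destroyed)
		return -ENOENT;
	if (hwpt->domain->restored)
		return -EOPNOTSUPP;
	return 0;
}

/* unmap is refused on a restored domain for the same reason. */
int model_hwpt_unmap(struct model_hwpt *hwpt, unsigned long iova, size_t len)
{
	(void)iova;
	(void)len;
	if (hwpt->destroyed)
		return -ENOENT;
	if (hwpt->domain->restored)
		return -EOPNOTSUPP;
	return 0;
}

/* Move one device attachment from the old HWPT to its replacement. */
int model_hwpt_replace(struct model_hwpt *old, struct model_hwpt *repl)
{
	if (old->destroyed || repl->destroyed)
		return -ENOENT;
	if (old->domain->attach_count == 0)
		return -EINVAL;
	old->domain->attach_count--;
	repl->domain->attach_count++;
	return 0;
}

/* Destroy is the only operation a restored HWPT supports, once unused. */
int model_hwpt_destroy(struct model_hwpt *hwpt)
{
	if (hwpt->destroyed)
		return -ENOENT;
	if (hwpt->domain->attach_count > 0)
		return -EBUSY;
	hwpt->destroyed = true;
	return 0;
}

/* Closing the iommufd without a replace: detach everything, then destroy. */
void model_iommufd_close(struct model_hwpt *hwpt)
{
	if (hwpt->destroyed)
		return;
	hwpt->domain->attach_count = 0;
	hwpt->destroyed = true;
}
```

In this model the restored HWPT is never anything more than an
ownership handle: the only state transition it allows is towards
destruction.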
>
> Jason