Re: [PATCH 00/21] vfio/pci: Base support to preserve a VFIO device file across Live Update

From: David Matlack
Date: Tue Dec 02 2025 - 16:30:19 EST


On Tue, Dec 2, 2025 at 6:10 AM Pratyush Yadav <pratyush@xxxxxxxxxx> wrote:
>
> On Mon, Dec 01 2025, Pasha Tatashin wrote:
>
> > On Wed, Nov 26, 2025 at 2:36 PM David Matlack <dmatlack@xxxxxxxxxx> wrote:
> [...]
> >> FLB Locking
> >>
> >> I don't see a way to properly synchronize pci_flb_finish() with
> >> pci_liveupdate_incoming_is_preserved() since the incoming FLB mutex is
> >> dropped by liveupdate_flb_get_incoming() when it returns the pointer
> >> to the object, and taking pci_flb_incoming_lock in pci_flb_finish()
> >> could result in a deadlock due to reversing the lock ordering.
>
> My mental model for FLB is that it is a dependency for files, so it
> should always be created (aka prepare) before _any_ of the files, and
> always destroyed (aka finish) after _all_ of the files.
>
> By the time the FLB is being finished, all the files for that FLB should
> also be finished, so there should no longer be a user of the FLB.
>
> Once all of the files are finished, it should be LUO's responsibility to
> make sure liveupdate_flb_get_incoming() returns an error _before_ it
> starts doing the FLB finish. And in FLB finish you should not be needing
> to take any locks.
>
> Why do you want to use the FLB when it is being finished?

The next patch looks at the PCI FLB any time a device is probed, which
could race with the last device file being finished and cause the FLB
to be freed.

However, it looks like I am going to drop that patch. But the PCI FLB
is still used in PATCH 08 [1] whenever userspace opens a VFIO cdev or
issues the VFIO_GROUP_GET_DEVICE_FD ioctl, to check if the underlying
PCI device was preserved. Offline, Jason suggested decoupling those
checks from the FLB, so I'll look into doing that in the next version.

[1] https://lore.kernel.org/kvm/20251126193608.2678510-9-dmatlack@xxxxxxxxxx/
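To make the race concrete, here is a minimal userspace model of the
_lock/_unlock approach Pasha proposes in this thread. It is a sketch
under stated assumptions, not the real LUO or PCI code: struct flb,
flb_lock_incoming(), flb_unlock_incoming(), flb_finish() and
dev_is_preserved() are all hypothetical names, and a pthread mutex
stands in for the incoming FLB mutex. The idea is that the
"is this device preserved?" lookup runs entirely under the same lock
that finish takes, so there is no second subsystem lock whose ordering
could be reversed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of an incoming FLB; not the real LUO structure. */
struct flb {
	pthread_mutex_t lock;   /* stands in for the incoming FLB mutex */
	bool finished;          /* set once finish has run */
	unsigned int *ids;      /* hypothetical payload: preserved device IDs */
	size_t nr_ids;
};

/* Hypothetical _lock API: returns 0 with flb->lock held, or -ENOENT. */
static int flb_lock_incoming(struct flb *flb)
{
	pthread_mutex_lock(&flb->lock);
	if (flb->finished) {
		pthread_mutex_unlock(&flb->lock);
		return -ENOENT;
	}
	return 0;
}

static void flb_unlock_incoming(struct flb *flb)
{
	pthread_mutex_unlock(&flb->lock);
}

/* The whole lookup runs under the FLB lock; no extra subsystem lock. */
static bool dev_is_preserved(struct flb *flb, unsigned int id)
{
	bool found = false;

	if (flb_lock_incoming(flb))
		return false;   /* FLB already finished: nothing preserved */
	for (size_t i = 0; i < flb->nr_ids; i++) {
		if (flb->ids[i] == id) {
			found = true;
			break;
		}
	}
	flb_unlock_incoming(flb);
	return found;
}

/* Finish frees the payload under the same lock and marks the FLB done. */
static void flb_finish(struct flb *flb)
{
	pthread_mutex_lock(&flb->lock);
	free(flb->ids);
	flb->ids = NULL;
	flb->nr_ids = 0;
	flb->finished = true;
	pthread_mutex_unlock(&flb->lock);
}
```

A concurrent checker either sees the payload before it is freed or
sees the FLB as finished; it can never observe a freed pointer.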

>
> >
> > I will re-introduce _lock/_unlock API to solve this issue.
> >
> >>
> >> FLB Retrieving
> >>
> >> The first patch of this series includes a fix to prevent an FLB from
> >> being retrieved again after it is finished. I am wondering if this is
> >> the right approach or if subsystems are expected to stop calling
> >> liveupdate_flb_get_incoming() after an FLB is finished.
>
> IMO once the FLB is finished, LUO should make sure it cannot be
> retrieved, mainly so subsystem code is simpler and less bug-prone.

+1, and I think Pasha is going to do that in the next version of FLB.
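A rough sketch of the semantics Pratyush describes, where LUO itself
refuses retrieval once finish begins, so the subsystem's finish
callback needs no locks. This is an assumption-laden userspace model,
not the actual FLB implementation: struct flb, flb_get_incoming(),
flb_finish() and the example count_finish() callback are hypothetical.
It also relies on the invariant from this thread that every file using
the FLB is finished before the FLB itself, so no caller still holds a
pointer obtained earlier.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical LUO-side FLB state; names are illustrative only. */
struct flb {
	pthread_mutex_t lock;       /* protects finished and obj */
	bool finished;
	void *obj;                  /* subsystem-owned object */
	void (*finish)(void *obj);  /* subsystem finish callback */
};

/* Returns 0 and stores the object, or -ENOENT once the FLB is finished. */
static int flb_get_incoming(struct flb *flb, void **objp)
{
	int ret = 0;

	pthread_mutex_lock(&flb->lock);
	if (flb->finished)
		ret = -ENOENT;
	else
		*objp = flb->obj;
	pthread_mutex_unlock(&flb->lock);
	return ret;
}

/*
 * Called by LUO after the last file using this FLB has been finished.
 * The FLB is marked finished first, so later lookups fail; the
 * subsystem callback then runs without taking any locks of its own.
 */
static void flb_finish(struct flb *flb)
{
	void *obj;

	pthread_mutex_lock(&flb->lock);
	flb->finished = true;
	obj = flb->obj;
	flb->obj = NULL;
	pthread_mutex_unlock(&flb->lock);

	flb->finish(obj);
}

/* Example subsystem callback (hypothetical): just counts invocations. */
static int finish_calls;

static void count_finish(void *obj)
{
	(void)obj;
	finish_calls++;
}
```

With this split, subsystem code only has to handle an error return
from the lookup, rather than coordinate its own lock against LUO's.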