Re: [PATCH v6 03/12] PCI: liveupdate: Track incoming preserved PCI devices
From: David Matlack
Date: Mon Jun 29 2026 - 14:49:35 EST
On Thu, Jun 25, 2026 at 7:35 AM Pratyush Yadav <pratyush@xxxxxxxxxx> wrote:
>
> Hi David,
>
> On Mon, Jun 15 2026, David Matlack wrote:
>
> > On 2026-06-14 01:38 PM, Pasha Tatashin wrote:
> >> On Fri, 22 May 2026 20:24:01 +0000, David Matlack <dmatlack@xxxxxxxxxx> wrote:
> [...]
> >> > + }
> >> > +
> >> > + pci_info(dev, "Device was preserved by previous kernel across Live Update\n");
> >> > + dev->liveupdate.incoming = dev_ser;
> >> > +
> >> > + /*
> >> > + * Hold the ref on the incoming FLB until pci_liveupdate_finish() so
> >> > + * that dev->liveupdate.incoming does not get freed while it is in use.
> >> > + */
> >>
> >> How would that work? If finish is not called FLB stays around until the
> >> next reboot.
> >
> > True... I think if the PCI core trusts drivers to call
> > pci_liveupdate_finish() then we don't need to hold onto the incoming
> > reference here.
>
> That was my point when I was arguing against refcounts on outgoing FLBs.
> This is very easy to abuse, especially when we are talking about device
> drivers. And this refcounting mechanism makes the FLB no longer
> file-lifecycle-bound, since now it is entirely up to drivers to decide
> the lifecycle of this data.
The PCI core holds a reference to the incoming FLB for as long as it
maintains a pointer to that FLB in struct pci_dev
(dev->liveupdate.incoming). The lifetime of that pointer is aligned
with the lifetime of the file as long as the driver calls
pci_liveupdate_finish() in its file finish() callback.
If there is a bug in the driver that causes it to not call
pci_liveupdate_finish() then the FLB will leak past the file yes. But
the alternative would be to leak a pointer to freed memory in
dev->liveupdate.incoming, which could lead to UAF.
Leaking the FLB seems safer than UAF, which is why I went for the
refcounting approach.
Another approach entirely would be to drop the
dev->liveupdate.incoming and do the xarray lookup everytime instead.
>
> I have been thinking about this a bit more in the last couple days, and
> I wonder if we are doing this right. Here's an idea I have been thinking
> of.
>
> We should make live update a first class citizen in PCI. Instead of
> patching in liveupdate via the liveupdate.incoming field, and letting
> drivers figure out when to use it, we should separate out probe and
> retrieve paths entirely.
>
> Probe and retrieve are fundamentally different operations. While they
> may share some common initialization logic for the _software_ state, how
> they interface with the hardware is completely different. I think mixing
> the two will result in driver code being more spaghetti by having
> liveupdate checks sprayed out all over.
We are only planning on supporting Live Update for VFIO drivers for
the forseeable future. The VFIO work during probe is almost entirely
software state setup. The only hardware logic we need to "if" out in
the vfio-pci driver's probe() is putting the device into a low-power
mode via the runtime power manager. So I don't think we will get any
benefit from this approach, and it would be a lot more intrusive to
both the PCI core driver framework, and VFIO itself, to support this.
> This series doesn't add support for any drivers, but looking at some of
> the code we have downstream, I see this problem. The liveupdate code is
> all over the place in the driver and it is very hard to wrap one's head
> around how the device is actually retrieved.
You can find the vfio-pci driver changes here:
https://lore.kernel.org/kvm/20260511234802.2280368-1-vipinsh@xxxxxxxxxx/
Let's keep the discussion focused on upstream VFIO drivers since that
is all we are planning to support right now due to LUO's requirement
of file-based preservation. The downstream driver changes we are
carrying is not reflective of what we want to support upstream.
> So I think PCI core should track preserved devices, and if the device is
> preserved, it should skip the probe and wait for retrieve. Retrieve does
> the full initialization of the device. This fits in with the LUO model
> as well. You can make retrieve a callback of struct pci_driver and do
> some wrappers to talk with LUO, so device drivers don't directly
> interface with LUO at all.
>
> We should do similar things on the shutdown path. Shutdown is a
> fundamentally different operation from freeze, and so we should separate
> them out as well.
This is speculative. In practice, we haven't needed to change VFIO's
shutdown() or probe() functions so far. The only change I anticipate
needing is skipping runtime power management "put" during probe() I
mentioned above.
If we actually made retrieve() a first-class callback and used that
instead of probe(), VFIO would internally just call its probe()
function because that would be the cleanest way to set up all the
software state it needs to manage the device.
> This solves the lifetime problem as well. When PCI core is initializing,
> it knows for sure that no retrievals are going to happen. That's because
> none of the drivers have registered yet. So it can safely access the FLB
> and initialize its state. After that, drivers can register themselves
> and start accepting retrieve() calls. Once the last driver goes away,
> the FLB is freed automatically.
It's not so simple. The PCI core does not really initialize itself.
Scanning devices gets triggered externally, e.g. by ACPI, device
trees, runtime hotplug events, etc., and that is when the PCI core
gets notified about a device. None of this is synchronized with "when
drivers have registered", which I assume you are referring to
registering with LUO.
>
> I am sorry for suggesting a big refactor at v6, but the early versions
> looked good to me at the time, and I only thought more deeply about this
> when trying to figure out how we can make the lifetimes cleaner.
>
> What do you think? Does this make sense?
>
> --
> Regards,
> Pratyush Yadav