Re: [PATCH v6 03/12] PCI: liveupdate: Track incoming preserved PCI devices

From: Samiullah Khawaja

Date: Tue Jun 16 2026 - 18:38:17 EST


On Tue, Jun 16, 2026 at 03:20:33PM -0700, David Matlack wrote:
On Tue, Jun 16, 2026 at 1:09 PM Samiullah Khawaja <skhawaja@xxxxxxxxxx> wrote:


[snip]


Hmm.. This is interesting, so the KHO state is freed and it cannot be
reused. I see you already pointed out that we are putting an LUO policy
to say that the retry is not allowed.

But what should be the behaviour of liveupdate in this regard? Let the
system boot in a normal way? This might break other subsystems as they
might depend on PCIe restoring state properly. Also I think some of the
PCIe state, like device-id, BAR addresses, ACLs etc, might be used as
source of truth by other components.

For example, let's say FLB retrieve() of PCIe fails, but succeeds for
VFIO/IOMMU, now VFIO/IOMMU are restoring state of a device that is not
restored/preserved?

Should this be considered fatal?

If PCI FLB retrieve fails then there are certain things that cannot be
guaranteed, such as BDF (B specifically) remaining constant. This
could lead to memory corruption as the IOMMU may have live
translations in place for those specific RequesterIDs. And, in the
future, preserved devices may be doing P2P which depends on BARs not
moving. If the PCI core cannot retrieve the FLB saved by the previous
kernel, it cannot make these guarantees.
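
To illustrate the bus-number concern, here is a minimal userspace sketch of
how a Requester ID is composed from bus, device and function (bus in bits
15:8, device in bits 7:3, function in bits 2:0). The helper names
example_devfn() and example_requester_id() are hypothetical and are not part
of the patch; they only mirror the standard encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack device (5 bits) and function (3 bits) into devfn. */
static inline uint8_t example_devfn(uint8_t slot, uint8_t func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)((slot << 3) | func);
}

/* Hypothetical helper: Requester ID = bus in bits 15:8, devfn in bits 7:0. */
static inline uint16_t example_requester_id(uint8_t bus, uint8_t dfn)
{
	return (uint16_t)((bus << 8) | dfn);
}
```

If the same device is re-enumerated on bus 0x3b instead of 0x3a, its
Requester ID changes from 0x3a00 to 0x3b00, so any IOMMU translations still
keyed on the old ID no longer refer to that device.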

Yes, this is what I was worried about.

So yeah I think you're right that PCI core should treat an FLB retrieve
failure as fatal and just panic.

This sounds great.

> }
>
> static void pci_flb_finish(struct liveupdate_flb_op_args *args)
> {
>- kho_restore_free(args->obj);
>+ struct pci_flb_incoming *incoming = args->obj;
>+
>+ xa_destroy(&incoming->xa);
>+ kho_restore_free(incoming->ser);
>+ kfree(incoming);
> }
>
> static struct liveupdate_flb_ops pci_liveupdate_flb_ops = {
>@@ -270,6 +335,91 @@ void pci_liveupdate_unpreserve(struct pci_dev *dev)
> }
> EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);
>
>+static struct pci_flb_incoming *pci_liveupdate_flb_get_incoming(void)
>+{
>+ struct pci_flb_incoming *incoming = NULL;
>+ int ret;
>+
>+ ret = liveupdate_flb_get_incoming(&pci_liveupdate_flb, (void **)&incoming);
>+
>+ /* Live Update is not enabled. */
>+ if (ret == -EOPNOTSUPP)
>+ return NULL;
>+
>+ /* Live Update is enabled, but there is no incoming FLB data. */
>+ if (ret == -ENODATA)
>+ return NULL;
>+
>+ /*
>+ * Live Update is enabled and there is incoming FLB data, but none of it
>+ * matches pci_liveupdate_flb.compatible.
>+ *
>+ * This could mean that no PCI FLB data was passed by the previous
>+ * kernel, but it could also mean the previous kernel used a different
>+ * compatibility string (i.e. a different ABI).
>+ */
>+ if (ret == -ENOENT) {
>+ pr_info_once("No incoming FLB matched %s\n", pci_liveupdate_flb.compatible);
>+ return NULL;
>+ }
>+
>+ /*
>+ * There is incoming FLB data that matches pci_liveupdate_flb.compatible
>+ * but it cannot be retrieved.
>+ */
>+ if (ret) {
>+ WARN_ONCE(ret, "Failed to retrieve incoming FLB data\n");

I think this should probably be considered fatal as mentioned above, or
the caller of this function should get an error so it can fail. I think
retrieval of preserved state should generally not fail unless there is
memory corruption or the ABI is incompatible.

Yeah. I think I will just call panic() here to cover all cases.

We have an LUO-specific panic macro/function that you can use:

luo_restore_fail()
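
For reference, the error classification discussed here could be sketched
roughly as below. This is a standalone userspace model, not the patch itself:
the enum and the function example_classify_incoming() are hypothetical names,
and the FATAL case only marks where the real code would call the LUO panic
helper (its arguments are deliberately not guessed at here).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of looking up the incoming PCI FLB. */
enum example_incoming_action {
	EXAMPLE_INCOMING_USE,	/* data retrieved, use it */
	EXAMPLE_INCOMING_NONE,	/* nothing preserved, boot normally */
	EXAMPLE_INCOMING_FATAL,	/* matching data exists but cannot be retrieved */
};

/* Map the return value of the FLB lookup to an action. */
static enum example_incoming_action example_classify_incoming(int ret)
{
	/* Kernel convention: 0 on success, negative errno on failure. */
	assert(ret <= 0);

	switch (ret) {
	case 0:
		return EXAMPLE_INCOMING_USE;
	case -EOPNOTSUPP:	/* Live Update is not enabled */
	case -ENODATA:		/* enabled, but no incoming FLB data */
	case -ENOENT:		/* no FLB matched the compatible string */
		return EXAMPLE_INCOMING_NONE;
	default:
		/* The real code would call the LUO panic helper here. */
		return EXAMPLE_INCOMING_FATAL;
	}
}
```

With this split, only the "data exists but cannot be retrieved" case is
fatal; the three expected "nothing to restore" cases still return NULL.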

>+ return NULL;
>+ }
>+
>+ return incoming;
>+}
>+

[snip]
>+
>+static inline bool pci_liveupdate_is_incoming(struct pci_dev *dev)
>+{
>+ return false;
>+}
> #endif
>
> #endif /* LINUX_PCI_LIVEUPDATE_H */
>--
>2.54.0.746.g67dd491aae-goog
>

Sami