Re: [PATCH v6 09/12] PCI: liveupdate: Inherit ARI Forwarding Enable on preserved bridges
From: David Matlack
Date: Tue Jun 30 2026 - 18:37:24 EST
On Mon, Jun 8, 2026 at 11:19 AM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Mon, Jun 08, 2026 at 11:33:07AM +0000, Pranjal Shrivastava wrote:
> > On Fri, May 22, 2026 at 08:24:07PM +0000, David Matlack wrote:
> > > Inherit the ARI Forwarding Enable on preserved bridges and update
> > > pci_dev->ari_enabled accordingly during a Live Update. This ensures that
> > > the preserved devices on the bridge's secondary bus can be identified
> > > with the same expanded 8-bit function number after a Live Update.
> > >
> > > Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
> > > ---
> > > drivers/pci/liveupdate.c | 18 ++++++++++++++++++
> > > drivers/pci/liveupdate.h | 6 ++++++
> > > drivers/pci/pci.c | 8 +++++++-
> > > 3 files changed, 31 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> > > index a93b7ef065f2..701276ef6cfb 100644
> > > --- a/drivers/pci/liveupdate.c
> > > +++ b/drivers/pci/liveupdate.c
> > > @@ -128,6 +128,10 @@
> > > * way after Live Update and ensures that IOMMU groups do not change. Note
> > > * that a device will use its inherited ACS flags for the lifetime of its
> > > * struct pci_dev (i.e. even after pci_liveupdate_finish()).
> > > + *
> > > + * * The PCI core inherits ARI Forwarding Enable on all bridges with downstream
> > > + * preserved devices to ensure that all preserved devices on the bridge's
> > > + * secondary bus are addressable after the Live Update.
> > > */
> > >
> > > #define pr_fmt(fmt) "PCI: liveupdate: " fmt
> > > @@ -756,6 +760,20 @@ int pci_liveupdate_enable_acs(struct pci_dev *dev)
> > > return 0;
> > > }
> > >
> > > +int pci_liveupdate_configure_ari(struct pci_dev *dev)
> > > +{
> > > + u16 val;
> > > +
> > > + guard(rwsem_read)(&pci_liveupdate.rwsem);
> > > +
> > > + if (!dev->liveupdate.incoming)
> > > + return -EINVAL;
> > > +
> > > + pcie_capability_read_word(dev, PCI_EXP_DEVCTL2, &val);
> >
> > Again, I might be thinking out loud here, but since these are
> > hot-pluggable devices, with some FW / SW running on them, I'm a little
> > worried while assuming the HW registers can be trusted across a kexec.

This is, in part [*], why I explicitly read the capability rather than
caching what the previous kernel thinks it was in struct pci_dev_ser.
In other words, the kernel does not assume that hardware registers
remain unchanged across kexec. It reads them after kexec to determine
what state the device is in and inherits that.

[*] The other reason is just to keep the ABI as minimal as possible.

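To make the "read, then inherit" idea concrete, here is a rough
user-space sketch. It is not the patch itself: struct fake_bridge and
inherit_ari() are hypothetical names for illustration, and the register
is just a simulated field. The only value taken from the kernel is the
ARI Forwarding Enable bit (PCI_EXP_DEVCTL2_ARI, 0x0020).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as PCI_EXP_DEVCTL2_ARI in include/uapi/linux/pci_regs.h. */
#define DEVCTL2_ARI 0x0020

/* Hypothetical stand-in for the bits of struct pci_dev that matter here. */
struct fake_bridge {
	uint16_t devctl2;	/* simulated Device Control 2 register */
	bool ari_enabled;	/* software view, like pci_dev->ari_enabled */
};

/*
 * Derive the software state from what the (simulated) hardware reports
 * now, rather than from a value serialized by the previous kernel.
 */
static void inherit_ari(struct fake_bridge *bridge)
{
	assert(bridge != NULL);
	bridge->ari_enabled = (bridge->devctl2 & DEVCTL2_ARI) != 0;
}
```

If the bridge was reset during the blackout and the bit reads back as 0,
the sketch ends up with ari_enabled == false, which mirrors the behavior
described above.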
> > Say, if the bridge experiences a reset (e.g. link drop etc) during the
> > kexec blackout, the PCI_EXP_DEVCTL2 register could revert to its default
> > state, meaning the ARI bit will be 0.
>
> This does seem like something to be concerned about, but realistically
> I think if you get a PCIe error I'm not sure the incoming kernel is
> equipped to handle it at all :\
>
> Just resuming the driver is going to fail too, I don't know how VFIO
> can learn and forward the event, and so on..
>
> But maybe it is worth being a little more defensive here

If this were to happen, the kernel would see that ARI is disabled on
the bridge after kexec and proceed with that state (ARI disabled).
Devices with devfn 8 and above on the bus would not be available to use
or restore (the kernel would not discover them during scanning). It
should be fairly obvious to the user that something went wrong (the
devices they are trying to restore do not even exist) and they can
reboot at that point to recover.

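As a rough illustration of why devfn 8+ disappears, here is a small
user-space sketch. devfn_reachable() is a hypothetical helper, not a
kernel function, and it assumes the bus sits below a PCIe downstream
port where only device 0 is scanned when ARI is off. The SLOT() macro
uses the same bit split as the kernel's PCI_SLOT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same split as the kernel's PCI_SLOT(): upper 5 bits of devfn. */
#define SLOT(devfn) (((devfn) >> 3) & 0x1f)

/*
 * With ARI, all 8 bits of devfn are a function number, so every value
 * is addressable. Without ARI, devfn is 5 bits of device plus 3 bits of
 * function, and (by the assumption above) only device 0 is scanned, so
 * only devfn 0-7 can be found.
 */
static bool devfn_reachable(uint8_t devfn, bool ari_enabled)
{
	if (ari_enabled)
		return true;
	return SLOT(devfn) == 0;
}
```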

This should be extremely rare though, as it requires a PCIe error that
affects the bridge to race with kexec.

This seems like a reasonable way to handle this scenario, at least in
the initial version. We can always look at handling PCIe errors during
kexec more gracefully for Live Update if and when there is value.