Re: [PATCH v9 3/9] PCI: Avoid saving config space state in reset path

From: Alex Williamson

Date: Fri Feb 20 2026 - 15:53:21 EST


On Thu, 19 Feb 2026 10:06:05 -0800
Farhan Ali <alifm@xxxxxxxxxxxxx> wrote:

> On 2/18/2026 4:20 PM, Bjorn Helgaas wrote:
> > On Wed, Feb 18, 2026 at 01:48:57PM -0800, Farhan Ali wrote:
> >> On 2/18/2026 11:35 AM, Bjorn Helgaas wrote:
> >>> On Wed, Feb 18, 2026 at 12:02:01PM -0700, Keith Busch wrote:
> >>>> On Tue, Feb 17, 2026 at 11:55:43AM -0800, Farhan Ali wrote:
> >>>>> Yes I think you are right, with this change the PCI Command
> >>>>> register gets restored to state at enumeration. So we will
> >>>>> lose the updated state after pci_clear_master() and
> >>>>> pci_enable_device(). I think we can update the vfio driver to
> >>>>> call pci_save_state() after pci_enable_device()?
> >>>> Either that, or move the pci_enable_device() call to after the
> >>>> function reset.
> >>> I kind of like the latter idea because it seems a little simpler
> >>> for the rule of thumb to be that a reset done by the PCI core
> >>> returns the device to the same state as when the driver first
> >>> probed the device. Drivers would generally not use
> >>> pci_save_state() at all, and they could share some initialization
> >>> logic between probe and post-reset recovery.
> >> I think the vfio-pci driver was intentionally doing the
> >> pci_enable_device() before doing the reset. As per commit
> >> 9a92c5091a42 ("vfio-pci: Enable device before attempting reset") it
> >> was done to handle devices using PM reset, that were getting
> >> incorrectly identified not supporting PM reset due to current state
> >> of the device not being D0. It looks like pci_pm_reset() still
> >> returns -EINVAL if current power state is not D0. So I think we
> >> can't move pci_enable_device() after reset. Unless we want to update
> >> pci_pm_reset() to not use cached value of current_state and read it
> >> directly from register?
> > Devices are generally disabled at .probe() time, so that will be the
> > default saved state. But every driver will expect the device to be
> > enabled after the reset. Skipping the save state at reset time seems
> > like it would need a lot of work first and maybe it wouldn't ever be
> > practical. It wasn't really thought out; I was just hoping we could
> > simplify the save-state model and maybe unify driver reset and error
> > recovery paths. I think we need to drop this patch at least for now.
>
> Yeah, I agree this patch might be too disruptive for drivers. In that
> case would my previous version [1] to at least prevent saving state in
> case of an error be acceptable? Or is there another approach we should
> consider?
>
> [1] https://lore.kernel.org/all/20260122194437.1903-4-alifm@xxxxxxxxxxxxx/
>
> >
> > 9a92c5091a42 ("vfio-pci: Enable device before attempting reset") was
> > mostly done to make pci_pm_reset() work, which requires the device to
> > be in D0. The main purpose of pci_enable_device() is to make device
> > BARs accessible; it *does* also put the device in D0 because BARs are
> > only accessible in D0, but pci_pm_reset() itself doesn't need the
> > BARs.
> >
> > Other reset methods, e.g., FLR, don't seem to require the device to be
> > in D0, so I'm not sure why pci_pm_reset() requires that. I think the
> > critical piece is the D3->D0 transition, and maybe we could arrange
> > for that to happen even if the device is already in D1/D2/D3hot or
> > even D3cold.
>
> Looking at the PCI spec (v6.1) I didn't see any requirement for the
> device to be in D0 state to perform a power state change. So I think we
> should be able to transition from D1/D2/D3hot to D0. But IIUC if a
> device is in D3cold, then won't register reads/writes fail till power is
> available to the device?

Yes, config space could be inaccessible in D3cold. IIRC, 9a92c5091a42
was specifically addressing that devices are typically provided to the
driver in the PCI_UNKNOWN state and at the time vfio-pci wasn't
changing that in the .probe function, like most drivers would, so we
needed to adjust the ordering of enabling the device versus calling
the reset function.

Now that we've gained power management in vfio-pci, that's no longer an
issue, but pci_pm_reset() does still require the device to arrive in
D0. Accepting devices arriving in D3cold or D3hot (with NoSoftReset-)
might avoid a power state bounce in some circumstances, but would not
have solved the original 9a92c5091a42 scenario where the device was in
PCI_UNKNOWN power state.
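
To make that concrete, here is a rough userspace model, not kernel
code. Every name in it (pm_model_dev, PM_MODEL_*, pm_model_reset_*) is
made up for illustration, and the "relaxed" variant is only a
hypothetical reading of the suggestion to accept D3 states:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cached power states discussed above. */
enum pm_model_state {
	PM_MODEL_D0,
	PM_MODEL_D3HOT,
	PM_MODEL_D3COLD,
	PM_MODEL_UNKNOWN,	/* state before any driver has set it */
};

struct pm_model_dev {
	enum pm_model_state current_state;	/* cached, not read from hardware */
	bool no_soft_reset;			/* models NoSoftReset+ */
};

/* Models the current rule: refuse unless the cached state is D0. */
int pm_model_reset_strict(const struct pm_model_dev *dev)
{
	assert(dev != NULL);
	if (dev->no_soft_reset)
		return -ENOTTY;
	if (dev->current_state != PM_MODEL_D0)
		return -EINVAL;
	return 0;
}

/* Hypothetical relaxation: also accept devices arriving in a known D3 state. */
int pm_model_reset_relaxed(const struct pm_model_dev *dev)
{
	assert(dev != NULL);
	if (dev->no_soft_reset)
		return -ENOTTY;
	switch (dev->current_state) {
	case PM_MODEL_D0:
	case PM_MODEL_D3HOT:
	case PM_MODEL_D3COLD:
		return 0;
	default:
		return -EINVAL;	/* an unknown cached state is still rejected */
	}
}
```

In this model a device in PM_MODEL_UNKNOWN gets -EINVAL from both
variants, which is the point: relaxing the D0 check alone would not
have covered the case that commit was fixing.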

Sorry I missed my opportunity to reply to the suggestion for this
approach in the previous revision. I'm not sure if anything
specifically breaks with this approach to restore the initial device
state, but it's certainly not the contract I currently expect as a
user of the reset-function interfaces. I think that contract is
"reset the internal state of the device while saving and restoring
current config space". If we stray from that, what's the expectation
for things like resizable BARs? I don't think we want to reprovision
resources as a result of reset.
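
As an illustration of the difference between the two contracts, a
small userspace model follows. It is only a sketch: the ctr_model_*
names are invented here, the "reset" simply zeroes a 64-byte header,
and nothing in it is the actual PCI core implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTR_MODEL_DWORDS	16	/* 64-byte header as 32-bit words */
#define CTR_MODEL_CMD_DWORD	1	/* dword holding the Command register (0x04) */
#define CTR_MODEL_CMD_MEMORY	0x2u	/* Memory Space Enable bit */

struct ctr_model_dev {
	uint32_t cfg[CTR_MODEL_DWORDS];		/* live config space */
	uint32_t saved[CTR_MODEL_DWORDS];	/* saved-state buffer */
};

void ctr_model_save_state(struct ctr_model_dev *dev)
{
	assert(dev != NULL);
	memcpy(dev->saved, dev->cfg, sizeof(dev->cfg));
}

void ctr_model_restore_state(struct ctr_model_dev *dev)
{
	assert(dev != NULL);
	memcpy(dev->cfg, dev->saved, sizeof(dev->cfg));
}

/* Model of the reset itself: live config space goes back to zeroes. */
void ctr_model_hw_reset(struct ctr_model_dev *dev)
{
	assert(dev != NULL);
	memset(dev->cfg, 0, sizeof(dev->cfg));
}

/* Contract I expect: save current state, reset, restore it. */
void ctr_model_reset_save_current(struct ctr_model_dev *dev)
{
	ctr_model_save_state(dev);
	ctr_model_hw_reset(dev);
	ctr_model_restore_state(dev);
}

/* Approach in this patch: no save at reset time, restore the earlier snapshot. */
void ctr_model_reset_restore_initial(struct ctr_model_dev *dev)
{
	ctr_model_hw_reset(dev);
	ctr_model_restore_state(dev);
}
```

If the snapshot is taken while the device is still disabled and the
driver then sets Memory Space Enable, the second helper brings the
device back with that bit clear, while the first one preserves it.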

Here we seem to be worried about a specific, testable scenario where
config space might be inaccessible after an error, yet we're applying
the workaround regardless of whether that specific scenario is present.
I don't see that a "test if config space is accessible and stuff the
original save state into the buffer rather than creating an invalid
save state" approach should be so complex as to require this
simplification and its associated risk. Thanks,

Alex
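
P.S. A rough userspace sketch of the "test if config space is
accessible" idea, in case it helps. It assumes an inaccessible device
reads back as all ones, and every acc_model_* name is hypothetical;
this is a model of the logic, not a proposed kernel patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ACC_MODEL_DWORDS	16		/* 64-byte header as 32-bit words */
#define ACC_MODEL_ALL_ONES	0xffffffffu	/* what a failed read returns here */

struct acc_model_dev {
	uint32_t cfg[ACC_MODEL_DWORDS];		/* live config space */
	uint32_t saved[ACC_MODEL_DWORDS];	/* saved-state buffer */
	bool responding;			/* false: config space inaccessible */
};

/* Config read that returns all ones when the device does not respond. */
uint32_t acc_model_read(const struct acc_model_dev *dev, int dword)
{
	assert(dev != NULL && dword >= 0 && dword < ACC_MODEL_DWORDS);
	return dev->responding ? dev->cfg[dword] : ACC_MODEL_ALL_ONES;
}

/* Dword 0 holds vendor/device ID; all ones is not a valid value there. */
bool acc_model_accessible(const struct acc_model_dev *dev)
{
	return acc_model_read(dev, 0) != ACC_MODEL_ALL_ONES;
}

/*
 * Take a fresh snapshot only if the device answers. Otherwise leave the
 * earlier snapshot in the buffer instead of filling it with all ones.
 * Returns true if a new snapshot was taken.
 */
bool acc_model_save_if_accessible(struct acc_model_dev *dev)
{
	int i;

	if (!acc_model_accessible(dev))
		return false;

	for (i = 0; i < ACC_MODEL_DWORDS; i++)
		dev->saved[i] = acc_model_read(dev, i);
	return true;
}
```

The caller can then restore from the buffer either way: it holds the
current state when the device was reachable and the last good state
when it was not.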