Re: [PATCH 0/4] PCI: Introduce pci_dev_suspend_retention_supported() API

From: Manivannan Sadhasivam

Date: Sat Apr 18 2026 - 01:16:21 EST


On Fri, Apr 17, 2026 at 05:29:04PM -0500, Bjorn Helgaas wrote:
> On Fri, Apr 17, 2026 at 04:34:53PM +0530, Manivannan Sadhasivam wrote:
> > On Thu, Apr 16, 2026 at 02:11:11PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Apr 14, 2026 at 09:29:38PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > > This series introduces a new PCI API
> > > > pci_dev_suspend_retention_supported() to let the client drivers
> > > > know whether they can expect context retention across
> > > > suspend/resume or not and uses it in the NVMe PCI host driver.
> > > >
> > > > This new API is targeted to abstract the PCI power management
> > > > details away from the client drivers. This is needed because
> > > > client drivers like NVMe make use of APIs such as
> > > > pm_suspend_via_firmware() and decide to keep the device in low
> > > > power mode if this API returns 'false'. But some platforms may
> > > > have other limitations like in the case of Qcom, where if the RC
> > > > driver removes the resource vote to allow the SoC to enter low
> > > > power mode, it cannot reliably exit the L1ss state when the
> > > > endpoint asserts CLKREQ#. So in this case also, the client
> > > > drivers cannot keep the device in low power state during suspend
> > > > and expect context retention.
> > >
> > > I don't know what pm_suspend_via_firmware() means. The kernel-doc
> > > says "platform firmware is going to be invoked at the end of the
> > > system-wide power management transition," but that doesn't say
> > > anything about what firmware might do or what it means to drivers.
> >
> > It's hard to predict what the firmware might do after it gains
> > control from the OS. But as far as the API goes, it just expects the
> > drivers to save the context and reset the device so that the
> > firmware could do anything it want.
>
> I don't see anything about the driver needing to reset the device.
> (Kernel-doc says "driver *may* need to reset it" but no hint about how
> to know.)
>
> Adding something like "device internal state is not preserved" would
> go a long ways here.
>

IIUC, 'may' is used in the description because not every firmware
implementation will power off or otherwise touch the device. But a driver that
is supposed to work with all firmware implementations, like a NIC/storage client
driver, should save the internal state and prepare for a possible power loss.
This is what the NVMe driver currently does.

> > > Based on d916b1be94b6 ("nvme-pci: use host managed power state for
> > > suspend"), which used it in nvme_suspend(), I guess the assumption
> > > is that pm_suspend_via_firmware() means the device might be put in
> > > D3cold and lose all its internal state, and conversely,
> > > !pm_suspend_via_firmware() means the device will *never* be put in
> > > a low-power state that loses internal state.
> >
> > Yes, that's the assumption. Though, the firmware might not do D3Cold
> > at all, but the drivers should be prepared for that to be compatible
> > with all firmware implementations.
>
> I don't think it's useful for a driver to know "firmware might not do
> D3cold". What could a driver do with that? Unless the driver *knows*
> internal state will be preserved, it must act as though the state is
> lost.

A driver doesn't need to know whether the device will actually be put into
D3cold. But it does need to know whether that is a possibility, because AFAIK
there is no way for the OS to query what the firmware is going to do at the end
of suspend. So, to be on the conservative side, this API tells the client
drivers: 'hey, firmware is going to be invoked at the end of suspend and it may
do something to the device state, like putting the device into D3cold or
something else. So be prepared for that.'

And 'be prepared' means saving the context and resetting the device.
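
To make the rule concrete, here is a rough userspace sketch of the decision as
I understand it. This is not kernel code: the enum, the function name and the
'via_firmware' parameter are made up for illustration, with 'via_firmware'
standing in for whatever pm_suspend_via_firmware() returns during suspend.

```c
#include <stdbool.h>

/* Hypothetical model of the choice a client driver makes at suspend time. */
enum suspend_action {
	KEEP_LOW_POWER,	/* leave the device in a low-power state, context retained */
	SAVE_AND_RESET,	/* save the context and reset the device */
};

/*
 * via_firmware: assumed stand-in for the return value of
 * pm_suspend_via_firmware(). If firmware is going to be invoked at the end
 * of suspend, the driver cannot know what happens to the device, so it
 * takes the conservative path.
 */
enum suspend_action choose_suspend_action(bool via_firmware)
{
	return via_firmware ? SAVE_AND_RESET : KEEP_LOW_POWER;
}
```

The point is only that the driver acts on the possibility of state loss, not on
knowledge of what the firmware will actually do.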

@Rafael: Please correct me if my above understanding is wrong.

- Mani

--
மணிவண்ணன் சதாசிவம்