Re: [PATCH 0/4] PCI: Introduce pci_dev_suspend_retention_supported() API
From: Bjorn Helgaas
Date: Fri Apr 17 2026 - 18:31:40 EST

On Fri, Apr 17, 2026 at 04:34:53PM +0530, Manivannan Sadhasivam wrote:
> On Thu, Apr 16, 2026 at 02:11:11PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 14, 2026 at 09:29:38PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > This series introduces a new PCI API
> > > pci_dev_suspend_retention_supported() to let the client drivers
> > > know whether they can expect context retention across
> > > suspend/resume or not and uses it in the NVMe PCI host driver.
> > >
> > > This new API is targeted to abstract the PCI power management
> > > details away from the client drivers. This is needed because
> > > client drivers like NVMe make use of APIs such as
> > > pm_suspend_via_firmware() and decide to keep the device in low
> > > power mode if this API returns 'false'. But some platforms may
> > > have other limitations like in the case of Qcom, where if the RC
> > > driver removes the resource vote to allow the SoC to enter low
> > > power mode, it cannot reliably exit the L1ss state when the
> > > endpoint asserts CLKREQ#. So in this case also, the client
> > > drivers cannot keep the device in low power state during suspend
> > > and expect context retention.
> >
> > I don't know what pm_suspend_via_firmware() means. The kernel-doc
> > says "platform firmware is going to be invoked at the end of the
> > system-wide power management transition," but that doesn't say
> > anything about what firmware might do or what it means to drivers.
>
> It's hard to predict what the firmware might do after it gains
> control from the OS. But as far as the API goes, it just expects the
> drivers to save the context and reset the device so that the
> firmware could do anything it wants.

I don't see anything about the driver needing to reset the device.
(Kernel-doc says "driver *may* need to reset it" but gives no hint
about how to know when.)

Adding something like "device internal state is not preserved" to the
kernel-doc would go a long way here.

> > Based on d916b1be94b6 ("nvme-pci: use host managed power state for
> > suspend"), which used it in nvme_suspend(), I guess the assumption
> > is that pm_suspend_via_firmware() means the device might be put in
> > D3cold and lose all its internal state, and conversely,
> > !pm_suspend_via_firmware() means the device will *never* be put in
> > a low-power state that loses internal state.
>
> Yes, that's the assumption. Though, the firmware might not do D3cold
> at all, but the drivers should be prepared for that to be compatible
> with all firmware implementations.

I don't think it's useful for a driver to know "firmware might not do
D3cold". What could a driver do with that? Unless the driver *knows*
internal state will be preserved, it must act as though the state is
lost.
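
To make that rule concrete, here is a minimal standalone sketch in plain
C. It is not kernel code and not what this series implements; the enum
and function names are hypothetical and exist only for illustration.
The point is that "unknown" and "lost" must lead to the same driver
action, and only an explicit guarantee allows the low-power path.

```c
/*
 * Illustrative model only: these names are hypothetical and are not
 * part of the kernel or of this patch series.
 */

/* What the driver has been told about state retention across suspend. */
enum retention_knowledge {
	RETENTION_UNKNOWN,	/* e.g. "firmware might not do D3cold" */
	RETENTION_LOST,		/* device state will not be preserved */
	RETENTION_GUARANTEED,	/* device state is known to survive */
};

/* What the driver does with the device at suspend time. */
enum suspend_action {
	SUSPEND_SAVE_AND_SHUT_DOWN,	/* assume internal state is lost */
	SUSPEND_KEEP_LOW_POWER,		/* rely on context retention */
};

/*
 * Anything short of a guarantee is handled as if the state is lost,
 * so RETENTION_UNKNOWN gives the driver nothing it can act on.
 */
enum suspend_action choose_suspend_action(enum retention_knowledge k)
{
	if (k == RETENTION_GUARANTEED)
		return SUSPEND_KEEP_LOW_POWER;

	return SUSPEND_SAVE_AND_SHUT_DOWN;
}
```

In this model RETENTION_UNKNOWN and RETENTION_LOST are
indistinguishable to the caller, which is why only a positive
guarantee is worth exposing to drivers.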