Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms

From: Manivannan Sadhasivam

Date: Thu Feb 26 2026 - 06:17:34 EST


On Thu, Feb 19, 2026 at 05:42:37PM +0000, Jon Hunter wrote:
> Hi Mani,
>
> On 16/02/2026 14:35, Jon Hunter wrote:
>
> ...
>
> > > Krishna posted the series a couple of weeks before but forgot to CC you:
> > > https://lore.kernel.org/linux-pci/20260128-d3cold-v1-0-dd8f3f0ce824@xxxxxxxxxxxxxxxx/
> > >
> > > You are expected to use the helper pci_host_common_can_enter_d3cold()
> > > in the suspend path.
>
>
> I have been playing around with this, but so far I have not got anything
> to work. Right now I have just made the following change (note that this
> is based upon Manikanta's fixes series [0]) ...
>
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> index 9883d14f7f97..9f88e4c1db08 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -2311,6 +2311,7 @@ static int tegra_pcie_dw_suspend_late(struct device *dev)
>  static int tegra_pcie_dw_suspend_noirq(struct device *dev)
>  {
>  	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> +	struct dw_pcie *pci = &pcie->pci;
>  	if (pcie->of_data->mode == DW_PCIE_EP_TYPE)
>  		return 0;
> @@ -2318,6 +2319,9 @@ static int tegra_pcie_dw_suspend_noirq(struct device *dev)
>  	if (!pcie->link_state)
>  		return 0;
> +	if (!pci_host_common_can_enter_d3cold(pci->pp.bridge))
> +		return 0;
> +
>  	tegra_pcie_dw_pme_turnoff(pcie);
>  	tegra_pcie_unconfig_controller(pcie);
>
>
> At first I was thinking that if we are not actually suspending the
> controller we can skip the configuration of the controller in the
> resume. However, if we skip configuring the controller in the resume
> then the device does not resume at all.

Does 'device' mean the host here?

> So right now I have the
> above, but clearly this is not sufficient. The device resumes but
> the NVMe is not working ...
>
> nvme nvme0: ctrl state 1 is not RESETTING
> nvme nvme0: Disabling device after reset failure: -19
> nvme nvme0: Ignoring bogus Namespace Identifiers
> Aborting journal on device nvme0n1p1-8.
> nvme0n1: detected capacity change from 0 to 976773168
> EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1613: inode #18622533: comm (t-helper): reading directory lblock 0
> Buffer I/O error on dev nvme0n1p1, logical block 60850176, lost sync page write
> Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> JBD2: I/O error when updating journal superblock for nvme0n1p1-8.
> EXT4-fs (nvme0n1p1): I/O error while writing superblock
> EXT4-fs error (device nvme0n1p1): ext4_journal_check_start:86: comm rs:main Q:Reg: Detected aborted journal
> Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> EXT4-fs (nvme0n1p1): I/O error while writing superblock
> EXT4-fs (nvme0n1p1): Remounting filesystem read-only
> EXT4-fs (nvme0n1p1): shut down requested (2)
>
> Is the above what you were thinking? Anything else I am missing?
>

I can't say for certain what is going wrong. If the controller driver's suspend
is skipped, then ideally the controller and the NVMe device should stay powered
on during suspend. But if the platform pulls the plug at the end of suspend
(firmware, GDSC, or some other entity), then all the context would be lost. That
might explain the failure, because both the controller driver and the NVMe
driver would expect the RC and the NVMe device to be active.

You can try commenting out the PM callbacks entirely:
// .pm = &tegra_pcie_dw_pm_ops

If the host itself doesn't resume, then that confirms that some other entity is
pulling the plug (which is common on ARM platforms). In that case, we have to
let the NVMe driver know about it so that it can shut down the controller.
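
To make the reasoning above concrete, here is a toy user-space model of the
three outcomes. It is only a sketch of the hypothesis, not kernel code and not
measured behaviour: every name in it (hw_state, resume_outcome, model_resume)
is made up for illustration, and it assumes the controller driver's suspend
path was skipped, so nothing tells the NVMe driver that power may go away.

```c
#include <stdbool.h>

/* Hypothetical outcome labels; they do not correspond to any kernel enum. */
enum resume_outcome {
	RESUME_OK,              /* host and endpoint both come back */
	RESUME_ENDPOINT_BROKEN, /* host resumes, endpoint (e.g. NVMe) has lost state */
	RESUME_HOST_DEAD,       /* the root complex itself does not come back */
};

/* Hypothetical record of which hardware context survived the cycle. */
struct hw_state {
	bool rc_ctx; /* root complex register context is valid */
	bool ep_ctx; /* endpoint controller context is valid */
};

/*
 * Model one suspend/resume cycle in which the controller driver's suspend
 * path was skipped.
 *
 * platform_cuts_power:       firmware, a power domain, or some other entity
 *                            removes power at the end of suspend anyway.
 * rc_reconfigured_on_resume: the controller driver still reprograms the
 *                            root complex in its resume path.
 */
static enum resume_outcome model_resume(bool platform_cuts_power,
					bool rc_reconfigured_on_resume)
{
	struct hw_state hw = { .rc_ctx = true, .ep_ctx = true };

	/* Power removal wipes both contexts, whatever software expected. */
	if (platform_cuts_power) {
		hw.rc_ctx = false;
		hw.ep_ctx = false;
	}

	/* Reprogramming restores only the root complex, not the endpoint. */
	if (rc_reconfigured_on_resume)
		hw.rc_ctx = true;

	if (!hw.rc_ctx)
		return RESUME_HOST_DEAD;
	if (!hw.ep_ctx)
		return RESUME_ENDPOINT_BROKEN;
	return RESUME_OK;
}
```

In this model, model_resume(true, true) gives RESUME_ENDPOINT_BROKEN, which is
the shape of the NVMe failure in your log, and model_resume(true, false) gives
RESUME_HOST_DEAD, which is what the experiment without the PM callbacks would
show if something else is cutting power. If power is retained, both variants
give RESUME_OK.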

- Mani

--
மணிவண்ணன் சதாசிவம்