Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
From: Manivannan Sadhasivam
Date: Thu Feb 26 2026 - 06:08:53 EST
On Thu, Feb 26, 2026 at 10:34:18AM +0000, Jon Hunter wrote:
> Hi Mani, Bjorn,
>
> On 19/02/2026 17:42, Jon Hunter wrote:
> > Hi Mani,
> >
> > On 16/02/2026 14:35, Jon Hunter wrote:
> >
> > ...
> >
> > > > Krishna posted the series a couple of weeks ago but forgot to CC you:
> > > > https://lore.kernel.org/linux-pci/20260128-d3cold-v1-0-dd8f3f0ce824@xxxxxxxxxxxxxxxx/
> > > >
> > > > You are expected to use the helper pci_host_common_can_enter_d3cold()
> > > > in the suspend path.
> >
> >
> > I have been playing around with this, but so far I have not got anything
> > to work. Right now I have just made the following change (note that this
> > is based upon Manikanta's fixes series [0]) ...
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> > index 9883d14f7f97..9f88e4c1db08 100644
> > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > @@ -2311,6 +2311,7 @@ static int tegra_pcie_dw_suspend_late(struct device *dev)
> >  static int tegra_pcie_dw_suspend_noirq(struct device *dev)
> >  {
> >  	struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> > +	struct dw_pcie *pci = &pcie->pci;
> >  
> >  	if (pcie->of_data->mode == DW_PCIE_EP_TYPE)
> >  		return 0;
> > @@ -2318,6 +2319,9 @@ static int tegra_pcie_dw_suspend_noirq(struct device *dev)
> >  	if (!pcie->link_state)
> >  		return 0;
> >  
> > +	if (!pci_host_common_can_enter_d3cold(pci->pp.bridge))
> > +		return 0;
> > +
> >  	tegra_pcie_dw_pme_turnoff(pcie);
> >  	tegra_pcie_unconfig_controller(pcie);
> >
> >
> > At first I was thinking that if we are not actually suspending the
> > controller, we can skip configuring the controller on resume.
> > However, if we skip configuring the controller on resume, the device
> > does not resume at all. So right now I have the above, but clearly
> > this is not sufficient. The device resumes, but the NVMe is not
> > working ...
> >
> > nvme nvme0: ctrl state 1 is not RESETTING
> > nvme nvme0: Disabling device after reset failure: -19
> > nvme nvme0: Ignoring bogus Namespace Identifiers
> > Aborting journal on device nvme0n1p1-8.
> > nvme0n1: detected capacity change from 0 to 976773168
> > EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1613: inode #18622533: comm (t-helper): reading directory lblock 0
> > Buffer I/O error on dev nvme0n1p1, logical block 60850176, lost sync page write
> > Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> > JBD2: I/O error when updating journal superblock for nvme0n1p1-8.
> > EXT4-fs (nvme0n1p1): I/O error while writing superblock
> > EXT4-fs error (device nvme0n1p1): ext4_journal_check_start:86: comm rs:main Q:Reg: Detected aborted journal
> > Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> > EXT4-fs (nvme0n1p1): I/O error while writing superblock
> > EXT4-fs (nvme0n1p1): Remounting filesystem read-only
> > EXT4-fs (nvme0n1p1): shut down requested (2)
> >
> > Is the above what you were thinking? Anything else I am missing?
>
> So NVMe is still broken for us and, I admit, I don't fully understand the
> issue. However, it seems to me that this change is not working as intended
> for all device-tree platforms. So for now, would it be acceptable to add a
> callback function that lets drivers such as the Tegra194 PCIe driver opt
> out of this? This would at least allow NVMe to work as it did before.
>
Since we know that ASPM is the issue on your platform and the failure also
confirms that ASPM was never enabled before, I'd suggest disabling ASPM for the
Root Port as a workaround:
```
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index 06571d806ab3..f504b4ffbcb6 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -2499,6 +2499,13 @@ module_platform_driver(tegra_pcie_dw_driver);
 
 MODULE_DEVICE_TABLE(of, tegra_pcie_dw_of_match);
 
+static void tegra_pcie_quirk_disable_aspm(struct pci_dev *dev)
+{
+	pcie_aspm_remove_cap(dev, PCI_EXP_LNKCAP_ASPM_L1 |
+			     PCI_EXP_LNKCAP_ASPM_L0S);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, tegra_pcie_quirk_disable_aspm);
+
 MODULE_AUTHOR("Vidya Sagar <vidyas@xxxxxxxxxx>");
 MODULE_DESCRIPTION("NVIDIA PCIe host controller driver");
 MODULE_LICENSE("GPL v2");
```
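Removing the capability from the Root Port alone should be enough, because the ASPM core only treats L0s/L1 as usable on a link when both link partners advertise them. Below is a simplified userspace model of that idea, not kernel code. `struct model_dev`, `model_remove_cap()` and `model_link_aspm()` are hypothetical names, and `model_remove_cap()` only approximates the effect of pcie_aspm_remove_cap(), not its actual implementation. The bit values are the ASPM Support bits from include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* ASPM Support bits of the Link Capabilities register (values as in pci_regs.h) */
#define PCI_EXP_LNKCAP_ASPM_L0S 0x00000400u
#define PCI_EXP_LNKCAP_ASPM_L1  0x00000800u

/* Hypothetical stand-in for struct pci_dev: only the ASPM states it advertises */
struct model_dev {
	uint32_t aspm_cap;
};

/* Approximates the quirk's effect: treat the given states as unsupported */
static void model_remove_cap(struct model_dev *dev, uint32_t lnkcap)
{
	dev->aspm_cap &= ~lnkcap;
}

/* A state is usable on the link only if both ends (Root Port and endpoint) support it */
static uint32_t model_link_aspm(const struct model_dev *upstream,
				const struct model_dev *downstream)
{
	return upstream->aspm_cap & downstream->aspm_cap;
}
```

For example, a Root Port advertising L0s|L1 and an NVMe advertising only L1 give an L1-capable link. After model_remove_cap() is applied to the Root Port, the link has no usable ASPM states, whatever the NVMe advertises.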
You can use specific Root Port IDs or PCI_ANY_ID, depending on the impact. We can
also work on fixing the actual issue in parallel.
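On the "impact" point: with PCI_ANY_ID as the device ID, the fixup matches every NVIDIA function the kernel enumerates, not only the Tegra Root Ports. The sketch below is a simplified userspace model of vendor/device fixup matching. It ignores the class matching that real PCI fixups also support. `struct model_fixup`, `struct model_id` and `model_fixup_matches()` are hypothetical names, and MODEL_RP_DEVICE_ID / MODEL_OTHER_DEVICE_ID are placeholder IDs, not real Tegra device IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_ANY_ID (~0)               /* same wildcard definition as the kernel's */
#define PCI_VENDOR_ID_NVIDIA 0x10de

/* Hypothetical placeholder device IDs, for illustration only */
#define MODEL_RP_DEVICE_ID    0x1001  /* "a Tegra Root Port" */
#define MODEL_OTHER_DEVICE_ID 0x2002  /* "some other NVIDIA function" */

/* Simplified fixup entry: vendor/device only, PCI_ANY_ID acts as a wildcard */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
};

/* Simplified device identity */
struct model_id {
	uint16_t vendor;
	uint16_t device;
};

/* True if the fixup would run for this device (wildcards compared as u16) */
static bool model_fixup_matches(const struct model_fixup *f,
				const struct model_id *dev)
{
	return (f->vendor == dev->vendor || f->vendor == (uint16_t)PCI_ANY_ID) &&
	       (f->device == dev->device || f->device == (uint16_t)PCI_ANY_ID);
}
```

A fixup declared as { PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID } matches both placeholder devices. One declared with MODEL_RP_DEVICE_ID matches only the Root Port, so listing the affected Root Port IDs limits the workaround to the links that need it.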
- Mani
--
மணிவண்ணன் சதாசிவம்