Re: [PATCH v2 0/2] PCI/ASPM: Enable ASPM and Clock PM by default on devicetree platforms
From: Manivannan Sadhasivam
Date: Tue Nov 11 2025 - 02:25:22 EST
On Tue, Nov 11, 2025 at 03:51:03AM -0300, Val Packett wrote:
>
> On 11/8/25 1:18 PM, Dmitry Baryshkov wrote:
> > On Mon, Sep 22, 2025 at 09:46:43PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > > Hi,
> > >
> > > This series is one of the 'let's bite the bullet' kind, where we have decided to
> > > enable all ASPM and Clock PM states by default on devicetree platforms [1]. The
> > > reason why devicetree platforms were chosen because, it will be of minimal
> > > impact compared to the ACPI platforms. So seemed ideal to test the waters.
> > >
> > > This series is tested on Lenovo Thinkpad T14s based on Snapdragon X1 SoC. All
> > > supported ASPM states are getting enabled for both the NVMe and WLAN devices by
> > > default.
> > > [..]
> > The series breaks the DRM CI on DB820C board (apq8096, PCIe network
> > card, NFS root). The board resets randomly after some time ([1]).
>
> Is that reset.. due to the watchdog resetting a hard-frozen system?
>
> Me and a bunch of other people in the #aarch64-laptops irc/matrix room have
> been experiencing these random hard freezes with ASPM enabled for the NVMe
> SSD, on Hamoa (and Purwa too I think) devices.
>
Interesting! ASPM has been tested and found to work on Hamoa and other Qcom
chipsets as well, except Makena-based chipsets, which don't support L0s due to
incorrect PHY settings. APQ8096 might be an exception since it is a really old
target, and I'm checking internally on its ASPM support.

> Totally unpredictable, could be after 4 minutes or 4 days of uptime.
> Panic-indicator LED not blinking, no reaction to magic SysRq, display image
> frozen, just a complete hang until the watchdog does the reset.
>
I have a KIOXIA SSD on my T14s. I do see some random hangs, but I thought those
predated the ASPM enablement, as I saw them earlier as well. But even before this
series, we had ASPM enabled for SSDs on Qcom targets (or rather, for devices that
get enumerated during the initial bus scan), so it might be that the SSD doesn't
support ASPM well enough.

But I'm clueless as to why it results in a hang. What I know is that on ARM
platforms we get SError aborts and other crazy bus/NOC issues if the device
doesn't respond to a PCIe read request. So the hang could be due to one of those
issues.

> I have confirmed with a modified (to accept args) enable-aspm.sh script[1]
> that disabling ASPM *only* for the SSD, while keeping it *on* for the WiFi
> adapter, is enough to keep the system stable (got to about a month of uptime
> in that state).
>
So this confirms that the controller supports it, and the device (SSD) might be
at fault here.

> If you have reproduced the same issue on an entirely different SoC, it's
> probably a general driver issue.
>
> Please, please help us debug this using your internal secret debug equipment
> :)
>
Starting from v6.18-rc3, we only enable L0s and L1 by default on all devicetree
platforms. Are you still seeing the hangs after -rc3? If so, could you please
share the SSD model from the output of 'lspci -nn'?
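
To see which states actually ended up enabled per device, something like the
following might help. It is only a minimal sketch: it assumes a kernel built with
CONFIG_PCIEASPM that exposes the per-link attributes under
/sys/bus/pci/devices/<BDF>/link/, and the function names (read_link_states,
dump_all) are made up for illustration.

```python
from pathlib import Path

# Per-link ASPM / Clock PM attribute names documented in the kernel's
# sysfs-bus-pci ABI file. Not every device exposes all of them.
LINK_ATTRS = (
    "l0s_aspm", "l1_aspm",
    "l1_1_aspm", "l1_2_aspm",
    "l1_1_pcipm", "l1_2_pcipm",
    "clkpm",
)


def read_link_states(device_dir):
    """Return {attribute: enabled} for the attributes found in device_dir/link.

    Hypothetical helper: attributes that are missing are simply skipped.
    """
    link = Path(device_dir) / "link"
    states = {}
    for name in LINK_ATTRS:
        attr = link / name
        if attr.is_file():
            # Each attribute reads back as "1" (enabled) or "0" (disabled).
            states[name] = attr.read_text().strip() == "1"
    return states


def dump_all(root="/sys/bus/pci/devices"):
    """Print the enabled states for every device that has link attributes."""
    root_path = Path(root)
    if not root_path.is_dir():
        return
    for dev in sorted(root_path.iterdir()):
        states = read_link_states(dev)
        if states:
            enabled = [name for name, on in states.items() if on]
            print(dev.name, "enabled:", ", ".join(enabled) or "none")
```

Calling dump_all() prints one line per device that has a link directory. The
same attributes should also be writable as root (writing 0 disables that state
for the link), which could be a way to keep ASPM off for the SSD alone.
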
Apologies for the inconvenience!
- Mani
--
மணிவண்ணன் சதாசிவம்