Re: crash during resume of PCIe bridge from v5.17 to next-20260130 (v5.16 works)

From: Christian König

Date: Mon Feb 02 2026 - 05:40:42 EST


On 2/1/26 17:42, Thomas Gleixner wrote:
> On Sun, Feb 01 2026 at 01:36, Bert Karwatzki wrote:
>> I found the error: the commit
>> ("drm/amd: Check if ASPM is enabled from PCIe subsystem")
>> has been applied twice, first as cba07cce39ac and a second time
>> as 7294863a6f01 after it had been superseded by commit
>> 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device").
>> This effectively disables ASPM globally after the built-in GPU (which does not
>> support ASPM) is probed. This is the reason for the crashes and the device-loss
>> errors, which on average occur after ~1000 resumes of the discrete GPU.
>
> Wow. Nice detective work...

Good catch, indeed.

But it is not clear to me why disabling ASPM causes trouble; usually it is the other way around, with enabled ASPM being the source of problems.

Regards,
Christian.