Re: crash during resume of PCIe bridge from v5.17 to next-20260130 (v5.16 works)
From: Christian König
Date: Mon Feb 02 2026 - 05:40:42 EST
On 2/1/26 17:42, Thomas Gleixner wrote:
> On Sun, Feb 01 2026 at 01:36, Bert Karwatzki wrote:
>> I found the error, the commit
>> ("drm/amd: Check if ASPM is enabled from PCIe subsystem")
>> has been applied twice, first as cba07cce39ac and a second time
>> as 7294863a6f01 after it had been superseded by commit
>> 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
>> This effectively disables ASPM globally after the built-in GPU (which does not
>> support ASPM) is probed. This is the reason for the crashes and device-loss
>> errors, which on average occur after ~1000 resumes of the discrete GPU.
>
> Wow. Nice detective work...
Good catch, indeed.
But it is not clear to me why disabling ASPM causes trouble; usually it is the other way around.
Regards,
Christian.