Re: [PATCH] PCI/ASPM: Enable L0s/L1 for removable devices when BIOS didn't configure ASPM

From: Mario Limonciello

Date: Tue May 05 2026 - 23:36:55 EST

On 5/5/26 16:42, Bjorn Helgaas wrote:
[+cc Mika]

On Tue, May 05, 2026 at 11:08:14AM -0500, Mario Limonciello wrote:
On 5/5/26 11:05, Bjorn Helgaas wrote:
On Mon, May 04, 2026 at 05:52:46PM -0500, Mario Limonciello wrote:
When comparing lspci output between Windows and Linux for hotplugged
Thunderbolt 5 eGPU devices, Windows enables ASPM L1 but Linux doesn't:

Windows: LnkCtl: ASPM L1 Enabled
Linux: LnkCtl: ASPM Disabled

This difference in ASPM configuration can cause behavioral differences
between the two operating systems for the same hardware.
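
For reference, the lspci lines above decode the ASPM Control field, which
is bits 1:0 of the PCIe Link Control register. A minimal sketch of that
decoding follows; the helper name decode_aspm and the sample register
values are made up for illustration and are not taken from the bug report.

```python
# ASPM Control occupies bits 1:0 of the PCIe Link Control register.
# The strings mirror the wording lspci prints after "LnkCtl:".
ASPM_STATES = {
    0b00: "ASPM Disabled",
    0b01: "ASPM L0s Enabled",
    0b10: "ASPM L1 Enabled",
    0b11: "ASPM L0s L1 Enabled",
}


def decode_aspm(lnkctl: int) -> str:
    """Return the ASPM state encoded in a raw 16-bit Link Control value."""
    return ASPM_STATES[lnkctl & 0x3]


# Hypothetical register values, chosen only to exercise the two cases above.
print("Windows: LnkCtl:", decode_aspm(0x0042))
print("Linux:   LnkCtl:", decode_aspm(0x0040))
```

Assuming pciutils is installed, the raw value can likely be read with
something like "setpci -s <device> CAP_EXP+10.w", since Link Control sits
at offset 0x10 in the PCI Express capability.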

A tangent, not a comment on the patch itself, but what sort of
behavioral differences are these? If ASPM is working correctly, the
only differences *should* be in power consumption and performance.

This originally stemmed from a significant performance difference that was
observed between Windows and Linux with eGPUs. The link in the patch points
at that bug if you want to look more closely at it.

Hmm. The bug (https://bugzilla.kernel.org/show_bug.cgi?id=221319)
reports "instant reboot", which is definitely a behavioral difference.
But AFAICS this patch would just fix something noticed along the way
but not the reboot itself.

To avoid confusion, I would use "performance difference" or "power
difference" when describing this patch.

There is a lot of traffic in that bug and in similar eGPU bugs. Some
people have narrowed the instant reboot down to using NVIDIA's GSP. The
performance difference looks tangential to the reboot (or maybe it's
part of the cause; I don't actually know).

The reboots /seem/ to be caused by sync floods. I originally hypothesized
that those came from Linux using AER while Windows doesn't (potentially
leading to a flood of errors in Linux), but turning off AER from the
kernel command line didn't change that.


I was hopeful that aligning ASPM would align the behavior, but alas it
didn't.

It was still a difference, though, and I figured we should discuss
whether Linux should be changed to be consistent.

Definitely. I hope we can at least enable L1.1. L1.2 is a whole
'nother issue.

Yup.