Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
From: Maciej W. Rozycki
Date: Thu Feb 26 2026 - 17:02:46 EST

On Tue, 24 Feb 2026, Matthew W Carlis wrote:

> > I argue that by applying this change the issues with NVMe hot-plug will
> > be sorted while keeping the configuration working that
> > pcie_failed_link_retrain() is needed for. Win-win.
>
> I don't think that what you are saying is true; there is invariably going to be
> some other consequence of this change. It's hard to believe there can be any
> changes to the PCI drivers that won't break something.

You're being sarcastic, aren't you?

While I sympathise with your feeling, may I pretty please ask you to at
the very least give my fix a try in your test environment?

> > I note that active links are unaffected, so to say it's meddling with the
> > link on every device is I think a bit of an overstatement, and reports of
> > issues are from a few people only...
>
> There is no discrimination about which device it can be invoked on.
> I'm looking at a fleet of millions of hot-pluggable devices. I don't really
> know if it matters how many people report an issue; I think what probably
> matters is making the right change. Initially, were there any other reports
> of the quirk helping with other devices besides the Delock 41433?

No reports that I know of. Please bear in mind that the failure mode is
such that you need enough knowledge of PCIe internals and the spec to
actually realise there is periodic link training activity taking place.

In the absence of the quirk, for the average user there's just no
communication, as with a dead downstream device (and the upstream device
is sound, as anything else plugged in, including but not limited to NVMe
storage, works just fine). In the presence of the quirk the downstream
device just works, and I expect hardly anyone can be bothered to report
seeing "broken device, retraining non-functional downstream link at
2.5GT/s" in the log. It's only cases like yours that bring attention to
the message.

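For illustration only, here is a minimal sketch of how a raw Link Status
register value could be decoded to spot the symptom described above. The
bit masks follow the PCIe spec as mirrored in Linux's
include/uapi/linux/pci_regs.h; the helper names link_looks_stuck() and
link_speed_name() are hypothetical, and the "LBMS set while DLLLA clear"
test is an assumption about what a repeatedly retraining link looks like,
not a copy of what pcie_failed_link_retrain() does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status register bits (PCIe spec; see include/uapi/linux/pci_regs.h). */
#define PCI_EXP_LNKSTA_CLS    0x000f  /* Current Link Speed */
#define PCI_EXP_LNKSTA_LT     0x0800  /* Link Training */
#define PCI_EXP_LNKSTA_DLLLA  0x2000  /* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS   0x4000  /* Link Bandwidth Management Status */

/*
 * Hypothetical helper: report a link that has seen bandwidth management
 * activity (LBMS set) yet is not active at the data link layer (DLLLA
 * clear).  This is an assumed heuristic for "keeps training, never comes up".
 */
bool link_looks_stuck(uint16_t lnksta)
{
    return (lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
           PCI_EXP_LNKSTA_LBMS;
}

/* Hypothetical helper: map the Current Link Speed field to a label. */
const char *link_speed_name(uint16_t lnksta)
{
    switch (lnksta & PCI_EXP_LNKSTA_CLS) {
    case 1: return "2.5GT/s";
    case 2: return "5GT/s";
    case 3: return "8GT/s";
    case 4: return "16GT/s";
    case 5: return "32GT/s";
    case 6: return "64GT/s";
    default: return "unknown";
    }
}
```

A raw value to feed into these helpers would have to come from the Link
Status register of the port in question, e.g. as read with a tool such as
setpci; an average user is unlikely to ever look there.
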
> > What outcome would you envisage had I taken the approach from this update
> > right away with the original change? My only fault was I have no use(*)
> > for PCIe hot-plug and did not predict the impact there.
>
> What I'm seeing now is an overall confusion about whether a link failed to train
> to Gen 1, or was recovered by the quirk, or recovered on its own, etc. In my systems
> I would prefer to NEVER invoke the quirk under any circumstances because I expect
> my devices to work. With the quirk it becomes less clear what the cause
> of a link issue might have been, or whether it was even a real link issue in the
> first place or some weird timing.

I can see your point.

However, from your description I infer this is about a test environment, a
development lab so to speak. And you are a highly skilled professional
who has access to measurement, test, and hardware debug equipment, and are
therefore able to figure out stuff. Conversely, the vast majority of
Linux deployments are in the field, where no sophisticated equipment is
available and the operator, if any, may have basic technical skills only.

I have been taught that in the field it is more desirable for equipment
to operate according to expectations rather than to strictly follow the
relevant specifications and consequently fail to operate. And the quirk I
have come up with just follows this principle, letting unqualified people
use their equipment (this is similar to Postel's law, if you know what I
mean).

I realise that in the lab you want strict compliance, as this will verify
interoperation of the devices you design.

So I think we have conflicting objectives here, and I can only offer a
sysfs setting that will switch between the modes according to the specific
user's needs, as the intent is not something the kernel can figure out by
itself.

Please mind, however, that throughout this week and the next I'm away on
holiday (a proper one, as in alpine skiing), so my availability to respond
or work on stuff is limited. I'd appreciate it if you gave my fix a try
meanwhile.

  Maciej