Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining

From: Maciej W. Rozycki

Date: Mon Feb 23 2026 - 18:18:44 EST


On Mon, 23 Feb 2026, Bjorn Helgaas wrote:

> > > > Can we reconsider my patch that restricts the link retrain mechanism
> > > > to the specific device that created the work-around?
> > > > https://lore.kernel.org/all/20250702052430.13716-1-mattc@xxxxxxxxxxxxxxx/
> > >
> > > I think we already at least potentially meddle with the link on every
> > > device, and it definitely makes me nervous. I would like it much
> > > better if it's possible to limit it to devices with known defects.
> > >
> > > I'll defer these for now and we can see if a consensus emerges.
> >
> > As I say it's logically impossible to figure out whether or not to
> > apply such a workaround where the culprit is the downstream device,
> > because until you've succeeded establishing a link you have no way
> > to figure out what the downstream device actually is.
>
> IIUC Matthew [1] and Alok [2] have reported issues that only happen
> when we run pcie_failed_link_retrain(). The issues seem to be with
> NVMe devices, but I don't see a root cause or a solution (other than
> skipping pcie_failed_link_retrain()).

I argue that applying this change will sort out the NVMe hot-plug issues
while keeping working the configurations that pcie_failed_link_retrain()
is needed for. Win-win.

I note that active links are unaffected, so saying that we meddle with the
link on every device is, I think, a bit of an overstatement. Reports of
issues have come from a few people only: Matthew insists on dropping the
pcie_failed_link_retrain() code altogether, while Alok is happy with the
solution proposed.

Shall we try this change and drop the whole pcie_failed_link_retrain()
stuff after all if it turns out to cause issues yet again?

What outcome would you envisage had I taken the approach from this update
right away with the original change? My only fault was that I have no
use(*) for PCIe hot-plug and did not predict the impact there. But no one
can predict everything, and I think dropping a solution rather than fixing
it, just because it wasn't perfect right from the beginning, would be
unfair.

As to the root cause, I suppose the hardware people who see the problem in
the lab could say more.

I can only guess that LBMS gets set in the course of device removal,
perhaps due to contacts bouncing. It's not clear to me whether this is
even compliant, as the spec is explicit that LBMS is only allowed to be
set if the port has not transitioned through the DL_Down status, which
the loss of electrical connection (circuit open) would seem (?) to imply.

Then again, at the x4 width that NVMe tends to imply, the lanes may each
disconnect at a different time, so the loss of connection on one lane
would not imply the loss of the link, and the setting of LBMS could
therefore be legitimate before the device has gone for good.
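To make the guess concrete, here is a toy user-space model (the helper,
its name, and the lane-count parameters are all mine, not anything from
the patch): LBMS set by a width downgrade while at least one lane is
still connected would not involve a DL_Down transition, and so would be
permitted by the spec; LBMS left set after all lanes have opened would
not.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the speculation above.  On removal of an x4
 * device the lanes may open at different times; the link (and the DL
 * state) only goes down once all lanes are gone.  A width change with
 * at least one lane still up is a bandwidth change without a DL_Down
 * transition, which the spec permits to set LBMS.
 */
static bool lbms_may_be_legit(int lanes_connected, int max_width)
{
	/* Fewer lanes than negotiated, but link still up: legitimate. */
	return lanes_connected > 0 && lanes_connected < max_width;
}
```

Under this model, an x4 link that has dropped to two connected lanes may
legitimately report LBMS, while a fully disconnected one may not.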

At insertion the link first has to run stably at 2.5GT/s before switching
to a higher speed, so even more so LBMS is not supposed to be set.

As I say it's only a guess and I may be missing something. I can't claim
any familiarity with the PCIe physical layer, though I can take some time
and study it.

In any case the unconditional unclamping and retraining actions proposed
are expected to clean up the state, whether it is compliant or not.
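As a rough user-space sketch of what "unclamping and retraining" amounts
to at the register level (the struct and helper are mine; the bit values
match include/uapi/linux/pci_regs.h): lift any 2.5GT/s Target Link Speed
clamp left behind in Link Control 2, clear the stale LBMS indication,
and request link retraining via Link Control.

```c
#include <assert.h>
#include <stdint.h>

/* PCIe capability register bits, as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCTL2_TLS		0x000f	/* Target Link Speed */
#define PCI_EXP_LNKCTL2_TLS_2_5GT	0x0001	/* clamped to 2.5GT/s */
#define PCI_EXP_LNKCTL_RL		0x0020	/* Retrain Link */
#define PCI_EXP_LNKSTA_LBMS		0x4000	/* LBMS */

/* Hypothetical mock of a downstream port's config registers. */
struct port {
	uint16_t lnkctl;
	uint16_t lnkctl2;
	uint16_t lnksta;
	uint16_t lnkcap2_speeds;	/* supported-speeds vector, bits 7:1 */
};

/* Sketch of the unconditional clean-up proposed above. */
static void unclamp_and_retrain(struct port *p)
{
	if ((p->lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
		/* Raise the target back to the port's maximum speed:
		 * the TLS encoding equals the highest set bit's index
		 * in the Link Capabilities 2 supported-speeds vector. */
		uint16_t max = 0;

		for (uint16_t s = p->lnkcap2_speeds >> 1; s; s >>= 1)
			max++;
		p->lnkctl2 = (p->lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) | max;
	}
	p->lnksta &= ~PCI_EXP_LNKSTA_LBMS;	/* RW1C in real hardware */
	p->lnkctl |= PCI_EXP_LNKCTL_RL;		/* request retraining */
}
```

With the clamp lifted and LBMS cleared, a subsequently inserted device
trains normally, while the workaround's clamp can still be applied where
it is actually needed.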

(*) Except for ExpressCard in my laptop, but that's another story.

Maciej