Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing

From: Jiwei
Date: Mon Jan 13 2025 - 07:51:10 EST




On 1/12/25 00:00, Maciej W. Rozycki wrote:
> On Fri, 10 Jan 2025, Jiwei Sun wrote:
>
>> In order to fix the issue, don't do the retraining work except ASMedia
>> ASM2824.
>
> I yet need to go through all of your submission in detail, but this
> assumption defeats the purpose of the workaround, as the current
> understanding of the origin of the training failure and the reason to
> retrain by hand with the speed limited to 2.5GT/s is the *downstream*
> device rather than the ASMedia ASM2824 switch.
>
> It is also why the quirk has been wired to run everywhere rather than
> having been keyed by VID:DID, and the VID:DID of the switch is only
> listed, conservatively, because it seems safe with the switch to lift the
> speed restriction once the link has successfully completed training.
>
> Overall I think we need to get your problem sorted differently, because I
> suppose in principle your hot-plug scenario could also happen with the
> ASMedia ASM2824 switch as the upstream device and your NVMe storage
> element as the downstream device. Perhaps the speed restriction could be
> always lifted, and then the bandwidth controller infrastructure used for
> that, so that it doesn't have to happen within `pcie_failed_link_retrain'?

According to our test, the following modification can fix the issue in our
test machine.

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 02d2e16672a8..9ca051b86878 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -97,10 +97,6 @@ static bool pcie_lbms_seen(struct pci_dev *dev, u16 lnksta)
*/
int pcie_failed_link_retrain(struct pci_dev *dev) {
- static const struct pci_device_id ids[] = {
- { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
- {}
- };
u16 lnksta, lnkctl2;
int ret = -ENOTTY;

@@ -128,8 +124,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
}

if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
- (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
- pci_match_id(ids, dev)) {
+ (lnkctl2 & PCI_EXP_LNKCTL2_TLS) ==
+ PCI_EXP_LNKCTL2_TLS_2_5GT) {
u32 lnkcap;

pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");

But I don't know if the above modification will have any other negative effects
on other devices. Could you please share your thoughts?

Thanks,
Regards,
Jiwei

>
> Maciej