[PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing

From: Jiwei Sun
Date: Fri Jan 10 2025 - 08:45:16 EST


From: Jiwei Sun <sunjw10@xxxxxxxxxx>

When we run a quick hot-add/hot-remove test (within 1 second) with a PCIe
Gen 5 NVMe disk, there is a possibility that the PCIe bridge link speed
drops from 32GT/s to 2.5GT/s:

pcieport 10002:00:04.0: pciehp: Slot(75): Link Down
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
...
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 10002:00:04.0: pciehp: Slot(75): No link
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): Link Up
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pcieport 10002:00:04.0: pciehp: Slot(75): No device found
pcieport 10002:00:04.0: pciehp: Slot(75): Card present
pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link)

When an NVMe disk is hot-removed, the pciehp interrupt is triggered and
the pciehp_ist kernel thread is woken up; pcie_failed_link_retrain() is
then called, as shown in the following call trace:

irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link
irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace>
=> [FTRACE TRAMPOLINE]
=> pcie_failed_link_retrain
=> pcie_wait_for_link
=> pciehp_check_link_status
=> pciehp_enable_slot
=> pciehp_handle_presence_or_link_change
=> pciehp_ist
=> irq_thread_fn
=> irq_thread
=> kthread
=> ret_from_fork
=> ret_from_fork_asm

According to our investigation, the issue is caused by the following
scenario:

 NVMe disk            pciehp hardirq      pciehp irq kernel thread
 hot-remove           top-half
 ======================================================================
                      pciehp hardirq
                      is triggered
                      CPU handles the
                      pciehp hardirq
                                          pciehp irq kthread is
                                          woken up
                                          pciehp_ist
                                            ...
                                            pcie_failed_link_retrain
                                              read PCI_EXP_LNKCTL2 register
                                              read PCI_EXP_LNKSTA register
 NVMe disk is
 hot-added before
 pcie_retrain_link()
 is called
                                              set target speed to 2_5GT
                                              pcie_bwctrl_change_speed
                                              pcie_retrain_link
                                              : the retraining succeeds
                                                because the card is present
                                                again, but pci_match_id()
                                                returns NULL, so the
                                                restriction is never removed
                                                and the Target Link Speed
                                                field of the Link Control 2
                                                register stays at 0x1
                                                (2.5GT/s).

To fix the issue, skip the retraining workaround for any device other
than the ASMedia ASM2824 by checking pci_match_id() at the beginning of
pcie_failed_link_retrain().

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Adrian Huang <ahuang12@xxxxxxxxxx>
Signed-off-by: Jiwei Sun <sunjw10@xxxxxxxxxx>
---
drivers/pci/quirks.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 605628c810a5..ff04ebd9ae16 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
u16 lnksta, lnkctl2;
int ret = -ENOTTY;

+ if (!pci_match_id(ids, dev))
+ return 0;
+
if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
!pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
return ret;
@@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
}

if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
- (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
- pci_match_id(ids, dev)) {
+ (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
u32 lnkcap;

pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
--
2.34.1