Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during hotplug testing
From: Ilpo Järvinen
Date: Mon Jan 13 2025 - 10:15:58 EST
On Fri, 10 Jan 2025, Jiwei Sun wrote:
> From: Jiwei Sun <sunjw10@xxxxxxxxxx>
>
> When we do the quick hot-add/hot-remove test (within 1 second) with a PCIE
> Gen 5 NVMe disk, there is a possibility that the PCIe bridge will decrease
> to 2.5GT/s from 32GT/s
>
> pcieport 10002:00:04.0: pciehp: Slot(75): Link Down
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> ...
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: broken device, retraining non-functional downstream link at 2.5GT/s
> pcieport 10002:00:04.0: pciehp: Slot(75): No link
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): Link Up
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pcieport 10002:00:04.0: pciehp: Slot(75): No device found
> pcieport 10002:00:04.0: pciehp: Slot(75): Card present
> pci 10002:02:00.0: [144d:a826] type 00 class 0x010802 PCIe Endpoint
> pci 10002:02:00.0: BAR 0 [mem 0x00000000-0x00007fff 64bit]
> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x00007fff 64bit]
> pci 10002:02:00.0: VF BAR 0 [mem 0x00000000-0x001fffff 64bit]: contains BAR 0 for 64 VFs
> pci 10002:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 10002:00:04.0 (capable of 126.028 Gb/s with 32.0 GT/s PCIe x4 link)
>
> If a NVMe disk is hot removed, the pciehp interrupt will be triggered, and
> the kernel thread pciehp_ist will be woken up, the
> pcie_failed_link_retrain() will be called as the following call trace.
>
> irq/87-pciehp-2524 [121] ..... 152046.006765: pcie_failed_link_retrain <-pcie_wait_for_link
> irq/87-pciehp-2524 [121] ..... 152046.006782: <stack trace>
> => [FTRACE TRAMPOLINE]
> => pcie_failed_link_retrain
> => pcie_wait_for_link
> => pciehp_check_link_status
> => pciehp_enable_slot
> => pciehp_handle_presence_or_link_change
> => pciehp_ist
> => irq_thread_fn
> => irq_thread
> => kthread
> => ret_from_fork
> => ret_from_fork_asm
>
> Accorind to investigation, the issue is caused by the following scenerios,
>
> NVMe disk pciehp hardirq
> hot-remove top-half pciehp irq kernel thread
> ======================================================================
> pciehp hardirq
> will be triggered
> cpu handle pciehp
> hardirq
> pciehp irq kthread will
> be woken up
> pciehp_ist
> ...
> pcie_failed_link_retrain
> read PCI_EXP_LNKCTL2 register
> read PCI_EXP_LNKSTA register
> If NVMe disk
> hot-add before
> calling pcie_retrain_link()
> set target speed to 2_5GT
This assumes LBMS has been seen but DLLLA isn't? Why is that?
> pcie_bwctrl_change_speed
> pcie_retrain_link
> : the retrain work will be
> successful, because
> pci_match_id() will be
> 0 in
> pcie_failed_link_retrain()
There's no pci_match_id() in pcie_retrain_link() ?? What does that : mean?
I think the nesting level is wrong in your flow description?
I don't understand how retrain success relates to the pci_match_id() as
there are two different steps in pcie_failed_link_retrain().
In step 1, pcie_failed_link_retrain() sets speed to 2.5GT/s if DLLLA=0 and
LBMS has been seen. Why is that condition happening in your case? You
didn't explain LBMS (nor DLLLA) in the above sequence so it's hard to
follow what is going on here. LBMS in particular is of high interest here
because I'm trying to understand if something should clear it on the
hotplug side (there's already one call to clear it in remove_board()).
In step 2 (pcie_set_target_speed() in step 1 succeeded),
pcie_failed_link_retrain() attempts to restore >2.5GT/s speed, this only
occurs when pci_match_id() matches. I guess you're trying to say that step
2 is not taken because pci_match_id() is not matching but the wording
above is very confusing.
Overall, I failed to understand the scenario here fully despite trying to
think it through over these few days.
> the target link speed
> field of the Link Control
> 2 Register will keep 0x1.
>
> In order to fix the issue, don't do the retraining work except ASMedia
> ASM2824.
>
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Reported-by: Adrian Huang <ahuang12@xxxxxxxxxx>
> Signed-off-by: Jiwei Sun <sunjw10@xxxxxxxxxx>
> ---
> drivers/pci/quirks.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 605628c810a5..ff04ebd9ae16 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -104,6 +104,9 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
> u16 lnksta, lnkctl2;
> int ret = -ENOTTY;
>
> + if (!pci_match_id(ids, dev))
> + return 0;
> +
> if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
> !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
> return ret;
> @@ -129,8 +132,7 @@ int pcie_failed_link_retrain(struct pci_dev *dev)
> }
>
> if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> - (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> - pci_match_id(ids, dev)) {
> + (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
> u32 lnkcap;
>
> pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
>
--
i.