Re: [PATCH v9 6/9] PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller
From: Niklas Schnelle
Date: Fri Dec 06 2024 - 13:12:51 EST
On Fri, 2024-10-18 at 17:47 +0300, Ilpo Järvinen wrote:
> This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove
> bandwidth notification"). An upcoming commit extends this driver,
> building a PCIe bandwidth controller on top of it.
>
> The PCIe bandwidth notifications were first added in the commit
> e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> notification") but later had to be removed. The significant changes
> compared with the old bandwidth notification driver include:
>
> 1) Don't print the notifications into kernel log, just keep the Link
> Speed cached in struct pci_bus updated. While somewhat unfortunate,
> the log spam was the source of complaints that eventually led to
> the removal of the bandwidth notifications driver (see the links
> below for further information).
>
> 2) Besides the Link Bandwidth Management Interrupt, enable also Link
> Autonomous Bandwidth Interrupt to cover the other source of
> bandwidth changes.
>
> 3) Use threaded IRQ with IRQF_ONESHOT to handle Bandwidth Notification
> Interrupts to address the problem fixed in the commit 3e82a7f9031f
> ("PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked").
>
> 4) Handle Link Speed updates robustly. Refresh the cached Link Speed
> when enabling Bandwidth Notification Interrupts, and solve the race
> between Link Speed read and LBMS/LABS update in
> pcie_bwnotif_irq_thread().
>
> 5) Use concurrency safe LNKCTL RMW operations.
>
> 6) The driver is now called PCIe bwctrl (bandwidth controller) instead
> of just bandwidth notifications because of increased scope and
> functionality within the driver.
>
> 7) Coexist with the Target Link Speed quirk in
> pcie_failed_link_retrain(). Provide LBMS counting API for it.
>
> 8) Tweaks to variable/function names for consistency and length
> reasons.
>
> Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus
> to keep track of PCIe Link Speed changes.
>
> Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@xxxxxxxxxx/
> Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@xxxxxxxxx/
> Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@xxxxxxxxxx/
> Suggested-by: Lukas Wunner <lukas@xxxxxxxxx> # Building bwctrl on top of bwnotif
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> ---
Hi Ilpo,
I bisected a v6.13-rc1 boot hang on my personal workstation to this
patch. Sadly I don't have many details, such as a panic message, because
the boot hangs before any kernel messages appear, or at least they're not
visible long enough to read. I haven't yet looked into the code as I
wanted to raise awareness first. Since the commit doesn't revert cleanly
on v6.13-rc1 I also haven't tried that yet.
Here are some details on my system:
- AMD Ryzen 9 3900X
- ASRock X570 Creator Motherboard
- Radeon RX 5600 XT
- Intel JHL7540 Thunderbolt 3 USB Controller (only USB 2 plugged)
- Intel 82599 10 Gigabit NIC with SR-IOV enabled with 2 VFs
- Intel I211 Gigabit NIC
- Intel Wi-Fi 6 AX200
- Aquantia AQtion AQC107 NIC
If you have patches or things to try just ask.
Thanks,
Niklas