Re: [PATCH v9 6/9] PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller

From: Lukas Wunner
Date: Sat Dec 07 2024 - 11:31:38 EST


[cc += Mika, start of thread:
https://lore.kernel.org/all/db8e457fcd155436449b035e8791a8241b0df400.camel@xxxxxxxxxx/
]

On Sat, Dec 07, 2024 at 12:06:49AM +0100, Niklas Schnelle wrote:
> > > On Fri, 2024-12-06 at 19:12 +0100, Niklas Schnelle wrote:
> > > > I bisected a v6.13-rc1 boot hang on my personal workstation to this
> > > > patch. Sadly I don't have much details like a panic or so because the
> > > > boot hangs before any kernel messages, or at least they're not visible
> > > > long enough to see. I haven't yet looked into the code as I wanted to
> > > > raise awareness first. Since the commit doesn't revert cleanly on
> > > > v6.13-rc1 I also haven't tried that yet.
> > > >
> > > > Here are some details on my system:
> > > > - AMD Ryzen 9 3900X
> > > > - ASRock X570 Creator Motherboard
> > > > - Radeon RX 5600 XT
> > > > - Intel JHL7540 Thunderbolt 3 USB Controller (only USB 2 plugged)
> > > > - Intel 82599 10 Gigabit NIC with SR-IOV enabled with 2 VFs
> > > > - Intel n I211 Gigabit NIC
> > > > - Intel Wi-Fi 6 AX200
> > > > - Aquantia AQtion AQC107 NIC
>
> Ok did some fiddeling and it's the thunderbolt ports. The below diff
> works around the issue. That said I guess for a proper fix this would
> should get filtered by the port service matching? Also as can be seen
> in lspci the port still claims to support bandwidth management so maybe
> other thunderbolt ports actually do.
[...]
> --- a/drivers/pci/pcie/bwctrl.c
> +++ b/drivers/pci/pcie/bwctrl.c
> @@ -294,6 +294,9 @@ static int pcie_bwnotif_probe(struct pcie_device *srv)
> struct pci_dev *port = srv->port;
> int ret;
>
> + if (srv->port->is_thunderbolt)
> + return -EOPNOTSUPP;
> +

Thanks for reporting and analyzing this.

The PCIe bandwidth controller is only instantiated on Downstream Ports.
Per the spec, Thunderbolt PCIe Downstream Ports are just tunnel endpoints
with a fake Max Link Speed of 2.5 GT/s (USB4 v2 sec 11.2.1):

"Max Link Speed field in the Link Capabilities Register set to 0001b
(data rate of 2.5 GT/s only).
Note: These settings do not represent actual throughput.
Throughput is implementation specific and based on the USB4 Fabric
performance."

So your patch does make sense in so far as the link speed of Thunderbolt
PCIe Downstream Ports is fixed to 2.5 GT/s and cannot be throttled because
that's already the lowest possible PCIe speed. The actual speed is
determined by the Thunderbolt links.

The check for the is_thunderbolt bit should be moved to the if-clause in
get_port_device_capability() which sets the PCIE_PORT_SERVICE_BWCTRL bit
in the services mask.

Alternatively, it may be worth considering not to instantiate the
bandwidth controller if the only link speed supported is 2.5 GT/s.

We should try to find out what actually causes the boot hang
(some interrupt storm maybe?), but that can hopefully be done
internally at Intel if the boot hang is reproducible.

Thanks,

Lukas