Re: [PATCH v5 7/9] PCI: PCI: Add pcie_link_is_active() to determine if the PCIe link is active

From: Krishna Chaitanya Chundru
Date: Mon Apr 14 2025 - 00:21:55 EST




On 4/13/2025 10:44 PM, Lukas Wunner wrote:
On Sat, Apr 12, 2025 at 05:52:56AM +0200, Lukas Wunner wrote:
On Sat, Apr 12, 2025 at 07:19:56AM +0530, Krishna Chaitanya Chundru wrote:
Introduce a common API to check if the PCIe link is active, replacing
duplicate code in multiple locations.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@xxxxxxxxxxxxxxxx>

Reviewed-by: Lukas Wunner <lukas@xxxxxxxxx>

Looking at this with a fresh pair of eyeballs, I realize there's an issue
here, so unfortunately I have to retract the Reviewed-by:

pcie_link_is_active() differs from the existing pciehp_check_link_active()
in that it returns 0 not only if the link is down, but also if the
Config Space read returns with an error.

In particular, if Config Space of a hotplug bridge is inaccessible,
0 is returned instead of -ENODEV with this patch. That can happen if
the hotplug bridge itself has been hot-removed, which is common with
Thunderbolt, but also on servers with nested PCIe switches.

The existing invocations of pciehp_check_link_active() do the right
thing if the hotplug bridge has been hot-removed, but after this patch
they no longer do. For example in this hunk ...

--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -584,7 +557,7 @@ static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
* Synthesize it to ensure that it is acted on.
*/
down_read_nested(&ctrl->reset_lock, ctrl->depth);
- if (!pciehp_check_link_active(ctrl))
+ if (!pcie_link_is_active(ctrl_dev(ctrl)))
pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
up_read(&ctrl->reset_lock);
}

... pciehp_request() will be called if the hotplug bridge was
hot-removed, which isn't the right thing to do. The current
behavior is to do nothing.

I realize I steered you in the wrong direction because in my
review of your v4 I asked why pcie_link_is_active() doesn't
return a bool:

https://lore.kernel.org/all/Z72TRBvpzizcgm9S@xxxxxxxxx/

So I sincerely apologize for that! You actually did the right
thing in v4 by returning a negative int if the Config Space read
returned an error.


Ok, No problem. I will revert v4 patch i.e returning int instead of
bool. I will wait for few days for any other new review comments.

- Krishna Chaitanya.
Thanks,

Lukas