Re: [PATCH v5 7/9] PCI: PCI: Add pcie_link_is_active() to determine if the PCIe link is active
From: Lukas Wunner
Date: Sun Apr 13 2025 - 13:15:01 EST
On Sat, Apr 12, 2025 at 05:52:56AM +0200, Lukas Wunner wrote:
> On Sat, Apr 12, 2025 at 07:19:56AM +0530, Krishna Chaitanya Chundru wrote:
> > Introduce a common API to check if the PCIe link is active, replacing
> > duplicate code in multiple locations.
> >
> > Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@xxxxxxxxxxxxxxxx>
>
> Reviewed-by: Lukas Wunner <lukas@xxxxxxxxx>
Looking at this with a fresh pair of eyeballs, I realize there's an issue
here, so unfortunately I have to retract the Reviewed-by:
pcie_link_is_active() differs from the existing pciehp_check_link_active()
in that it returns 0 not only if the link is down, but also if the
Config Space read returns with an error.
In particular, if Config Space of a hotplug bridge is inaccessible,
0 is returned instead of -ENODEV with this patch. That can happen if
the hotplug bridge itself has been hot-removed, which is common with
Thunderbolt, but also on servers with nested PCIe switches.
The existing invocations of pciehp_check_link_active() do the right
thing if the hotplug bridge has been hot-removed, but after this patch
they no longer do. For example in this hunk ...
> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -584,7 +557,7 @@ static void pciehp_ignore_dpc_link_change(struct controller *ctrl,
> > * Synthesize it to ensure that it is acted on.
> > */
> > down_read_nested(&ctrl->reset_lock, ctrl->depth);
> > - if (!pciehp_check_link_active(ctrl))
> > + if (!pcie_link_is_active(ctrl_dev(ctrl)))
> > pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
> > up_read(&ctrl->reset_lock);
> > }
... pciehp_request() will be called if the hotplug bridge was
hot-removed, which isn't the right thing to do. The current
behavior is to do nothing.
I realize I steered you in the wrong direction because in my
review of your v4 I asked why pcie_link_is_active() doesn't
return a bool:
https://lore.kernel.org/all/Z72TRBvpzizcgm9S@xxxxxxxxx/
So I sincerely apologize for that! You actually did the right
thing in v4 by returning a negative int if the Config Space read
returned an error.
Thanks,
Lukas