Re: [PATCH 1/1] PCI: armada8k: Add link-down handle
From: Manivannan Sadhasivam
Date: Fri Feb 07 2025 - 12:58:14 EST
+ Niklas (who was interested in link down handling)
On Sat, Feb 01, 2025 at 11:05:56PM +0000, Wilson Ding wrote:
> > On November 13, 2024 3:02:55 AM GMT+05:30, Bjorn Helgaas
> > <mailto:helgaas@xxxxxxxxxx> wrote:
> > >In subject:
> > >
> > > PCI: armada8k: Add link-down handling
> > >
> > >On Mon, Nov 11, 2024 at 10:48:13PM -0800, Jenishkumar Maheshbhai
> > Patel wrote:
> > >> In PCIE ISR routine caused by RST_LINK_DOWN we schedule work to
> > >> handle the link-down procedure.
> > >> Link-down procedure will:
> > >> 1. Remove PCIe bus
> > >> 2. Reset the MAC
> > >> 3. Reconfigure link back up
> > >> 4. Rescan PCIe bus
> > >
> > >s/PCIE/PCIe/
> > >
> > >Rewrap to fill 75 columns.
> > >
> > >I assume this basically removes a Root Port (and the hierarchy below
> > >it) if the link goes down, and then resets the MAC and tries to bring
> > >up the link and enumerate the hierarchy again.
> > >
> > >No other drivers do this, so why does armada8k need it? Is this to
> > >work around some unreliable link?
> >
> > Certainly Qcom IPs have this same feature and I was also looking to implement
> > it. But the link down should not be handled by this in the controller driver.
> >
> > Instead, it should be tied to bus reset in the core and the reset should be done
> > through a callback implemented in the controller drivers. This way, the reset
> > cannot happen in the back of PCI core and client drivers.
> >
> > That said, the Link down IRQ received by this driver should also be propagated
> > back to the PCI core and the core should then call the callback to reset the bus
> > that I mentioned above.
> >
>
> It's more than a work-around for the unreliable link. A few customers may have
> such application - independent power supply to the device with dedicated reset
> GPIO to #PRST. In this way, the power cycle and warm reset of RC and EP won't
> have impact on each other. However, it may lead into the PCI driver not aware
> of the link down when an unexpected power down or reset occurs on the device.
> We cannot assume the link will be recovered soon. The worse thing is the driver
> may continue access to the device, which may hang the bus. Since the device
> is no longer present on the bus, it's better to remove it. Besides, in order to bring
> up the link, the only way is to reset the MAC, which starts over the state machine
> of LTSSM.
>
> Well, we also noticed that there is no other driver that did this. I agree it is not
> necessary if the power cycle or warm reset of the device is done gracefully. The
> user can remove the device prior to the power cycle/reset. And do the rescan
> after the link is recovered. However, the unexpected power down is still possible.
> Please enlighten me if there is any better approach to handle such unexpected
> link down.
>
There is no issue in retraining the link. My concern is that, the retrain should
not happen autonomously in the controller driver. PCI core should be made
informed of it. More below.
> In the meanwhile, I am quite interested the callback implementation suggested
> by Mani. But I am sure if we have such infrastructure right there. Mani, could
> you please elaborate a bit more, or is there any examples in the existing code
> and patches.
>
There is no such implementation available right now. This idea is on my mind for
quite some time, but never had time to do it. Maybe this gives me motivation to
do so.
Niklas: Did you get a chance to look into this? Else, let me know. I'll take a
stab at it.
- Mani
--
மணிவண்ணன் சதாசிவம்