Re: [PATCH 1/2] PCI/portdrv: Add necessary delay for disabling hotplug events

From: Feng Tang
Date: Wed Feb 05 2025 - 22:19:13 EST


Hi Sathyanarayanan,

On Wed, Feb 05, 2025 at 10:26:59AM -0800, Sathyanarayanan Kuppuswamy wrote:
>
> On 2/3/25 9:37 PM, Feng Tang wrote:
> > According to PCIe 6.1 spec, section 6.7.3.2, software need to wait at
> > least 1 second for the command-complete event, before resending the cmd
> > or sending a new cmd.
> >
> > Currently get_port_device_capability() sends slot control cmd to disable
> > PCIe hotplug interrupts without waiting for its completion and there was
> > real problem reported for the lack of waiting.
>
> Can you include the error log associated with this issue? What is the
> actual issue you are seeing and in which hardware?

For this one, we don't have a specific log, as the issue was raised by a
firmware developer; see https://lore.kernel.org/lkml/Z6LRAozZm1UfgjqT@U-2FWC9VHC-2323.local/

While debugging a PCI hotplug problem, they found their state machine
corrupted and traced it back to the OS. They didn't expect to receive
2 link control commands at almost the same time, which doesn't comply with
the PCIe spec. Handling one command normally takes some time in the BIOS,
though not as long as 1 second. The HW is an ARM server.

I will try to add this information to the commit log in the next version.

>
> >
> > Add the necessary wait to comply with PCIe spec. The waiting logic refers
> > existing pcie_poll_cmd().
> >
> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxxxxxxxxxx>
> > ---
> > drivers/pci/pci.h | 2 ++
> > drivers/pci/pcie/portdrv.c | 33 +++++++++++++++++++++++++++++++--
> > 2 files changed, 33 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 01e51db8d285..c1e234d1b81d 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -759,12 +759,14 @@ static inline void pcie_ecrc_get_policy(char *str) { }
> > #ifdef CONFIG_PCIEPORTBUS
> > void pcie_reset_lbms_count(struct pci_dev *port);
> > int pcie_lbms_count(struct pci_dev *port, unsigned long *val);
> > +void pcie_disable_hp_interrupts_early(struct pci_dev *dev);
> > #else
> > static inline void pcie_reset_lbms_count(struct pci_dev *port) {}
> > static inline int pcie_lbms_count(struct pci_dev *port, unsigned long *val)
> > {
> > return -EOPNOTSUPP;
> > }
> > +static inline void pcie_disable_hp_interrupts_early(struct pci_dev *dev) {}
> > #endif
> > struct pci_dev_reset_methods {
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index 02e73099bad0..16010973bfe2 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -18,6 +18,7 @@
> > #include <linux/string.h>
> > #include <linux/slab.h>
> > #include <linux/aer.h>
> > +#include <linux/delay.h>
> > #include "../pci.h"
> > #include "portdrv.h"
> > @@ -205,6 +206,35 @@ static int pcie_init_service_irqs(struct pci_dev *dev, int *irqs, int mask)
> > return 0;
> > }
> > +static int pcie_wait_sltctl_cmd_raw(struct pci_dev *pdev)
> > +{
> > + u16 slot_status;
> > + /* 1000 ms, according toPCIe spec 6.1, section 6.7.3.2 */
> > + int timeout = 1000;
> > +
> > + do {
> > + pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
> > + if (slot_status & PCI_EXP_SLTSTA_CC) {
> > + pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> > + PCI_EXP_SLTSTA_CC);
> > + return 0;
> > + }
> > + msleep(10);
> > + timeout -= 10;
> > + } while (timeout);
> > +
> > + /* Timeout */
> > + return -1;
> > +}
>
> May be this logic can be simplified using readl_poll_timeout()?

This seems to be exactly what I needed :) Many thanks for the suggestion!
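
A rough userspace sketch of the poll-with-timeout pattern, only to illustrate
the idea and not the actual patch. Everything named fake_* and
wait_cmd_complete() is hypothetical, and the sleep is simulated by counting
milliseconds. One caveat for the kernel version: readl_poll_timeout() polls
an MMIO address with readl(), while Slot Status is read through
pcie_capability_read_word(), so the generic read_poll_timeout() from
<linux/iopoll.h> plus a small wrapper returning the register value would
probably be the closer fit. That part is an assumption and is not shown here.

```c
#include <errno.h>
#include <stdint.h>

/* Command Completed bit in Slot Status (same value as PCI_EXP_SLTSTA_CC). */
#define SLTSTA_CC 0x0010

/*
 * Hypothetical stand-in for a hotplug port: the Command Completed bit
 * shows up once the register has been read `ready_after` times.
 */
struct fake_port {
	int reads;
	int ready_after;
	uint16_t sltsta;
};

/* Simulated Slot Status read. */
static uint16_t fake_read_sltsta(struct fake_port *p)
{
	if (++p->reads >= p->ready_after)
		p->sltsta |= SLTSTA_CC;
	return p->sltsta;
}

/* Simulated write-1-to-clear of the Command Completed bit. */
static void fake_clear_cc(struct fake_port *p)
{
	p->sltsta &= (uint16_t)~SLTSTA_CC;
}

/*
 * Poll until Command Completed is set or timeout_ms has elapsed.
 * Returns 0 on completion, -ETIMEDOUT otherwise. The register is read
 * once more after the budget is used up, so a completion that arrives
 * during the last sleep is not missed.
 */
static int wait_cmd_complete(struct fake_port *p, int step_ms, int timeout_ms)
{
	int waited_ms = 0;

	for (;;) {
		if (fake_read_sltsta(p) & SLTSTA_CC) {
			fake_clear_cc(p);
			return 0;
		}
		if (waited_ms >= timeout_ms)
			return -ETIMEDOUT;
		waited_ms += step_ms;	/* real code would sleep here */
	}
}
```

With a 10 ms step and a 1000 ms budget, a port that completes on the third
read returns 0 right away, and a port that never completes returns
-ETIMEDOUT instead of the bare -1 in the current patch.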

- Feng