Re: [PATCH v2] PCI: imx6: Add suspend/resume support for i.MX6QDL

From: Stefan Eichenberger
Date: Mon Oct 14 2024 - 04:48:57 EST


Hi Mani,

Thanks a lot for the comments.

On Sat, Oct 12, 2024 at 09:43:15AM +0530, Manivannan Sadhasivam wrote:
> On Wed, Oct 09, 2024 at 03:14:05PM +0200, Stefan Eichenberger wrote:
> > From: Stefan Eichenberger <stefan.eichenberger@xxxxxxxxxxx>
> >
> > The suspend/resume support is broken on the i.MX6QDL platform. This
>
> You mean the 'system suspend/resume'?
>
> > patch resets the link upon resuming to recover functionality. It shares
> > most of the sequences with other i.MX devices but does not touch the
> > critical registers, which might break PCIe. This patch addresses the
> > same issue as the following downstream commit:
> > https://github.com/nxp-imx/linux-imx/commit/4e92355e1f79d225ea842511fcfd42b343b32995
> > In comparison this patch will also reset the device if possible. Without
> > this patch suspend/resume will not work if a PCIe device is connected.
> > The kernel will hang on resume and print an error:
> > ath10k_pci 0000:01:00.0: Unable to change power state from D3hot to D0, device inaccessible
>
> Looks like the device is turned off during suspend.

Yes, I don't have a deep understanding about PCIe to be honest. However,
my understanding is that the PCIe devices will always try to suspend and
there is no way from the PCIe host controller to prevent this. Or am I
wrong? Currently suspend is not implemented for the i.MX6Dual/Quad
variant in the pci-imx6 driver and the device will still be turned off
by the PCI subsystem. On the other side I don't think it will fix
anything if I can prevent suspend for the devices because the
communication will still fail after resume.

>
> > 8<--- cut here ---
> > Unhandled fault: imprecise external abort (0x1406) at 0x0106f944
> >
> > Signed-off-by: Stefan Eichenberger <stefan.eichenberger@xxxxxxxxxxx>
> > ---
> > v1 -> v2: Share most code with other i.MX platforms and set suspend
> > support flag for i.MX6QDL. Version 1 can be found here:
> > https://lore.kernel.org/all/20240819090428.17349-1-eichest@xxxxxxxxx/
> >
> > drivers/pci/controller/dwc/pci-imx6.c | 44 +++++++++++++++++++++++++--
> > 1 file changed, 41 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
> > index 808d1f1054173..f33bef0aa1071 100644
> > --- a/drivers/pci/controller/dwc/pci-imx6.c
> > +++ b/drivers/pci/controller/dwc/pci-imx6.c
> > @@ -1238,8 +1238,23 @@ static int imx_pcie_suspend_noirq(struct device *dev)
> >
> > imx_pcie_msi_save_restore(imx_pcie, true);
> > imx_pcie_pm_turnoff(imx_pcie);
> > - imx_pcie_stop_link(imx_pcie->pci);
> > - imx_pcie_host_exit(pp);
> > + /*
> > + * Do not turn off the PCIe controller because of ERR003756, ERR004490, ERR005188,
> > + * they all document issues with LLTSSM and the PCIe controller which
>
> LTSSM
>
> But LTSSM is for the PCIe link state, not sure how it impacts controller state.
> Can you share the link to those erratums?

The erratas are all from this document:
https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf

Only the i.MX 6Dual/6Quad variant is affected but not the newer Plus
variants.

>
> > + * does not come out of reset properly. Therefore, try to keep the controller enabled
> > + * and only reset the link. However, the reference clock still needs to be turned off,
>
> You are resetting the *device* below, not the link.

Thanks, I will fix this comment.

> > + * else the controller will freeze on resume.
> > + */
>
> Please use 80 columns for comments. Exception is for the code.

Thanks, I will fix this.

> > + if (imx_pcie->drvdata->variant == IMX6Q) {
> > + /* Reset the PCIe device */
> > + gpiod_set_value_cansleep(imx_pcie->reset_gpiod, 1);
> > +
> > + if (imx_pcie->drvdata->enable_ref_clk)
> > + imx_pcie->drvdata->enable_ref_clk(imx_pcie, false);
> > + } else {
> > + imx_pcie_stop_link(imx_pcie->pci);
> > + imx_pcie_host_exit(pp);
> > + }
> >
> > return 0;
> > }
> > @@ -1253,6 +1268,28 @@ static int imx_pcie_resume_noirq(struct device *dev)
> > if (!(imx_pcie->drvdata->flags & IMX_PCIE_FLAG_SUPPORTS_SUSPEND))
> > return 0;
> >
> > + /*
> > + * Even though the i.MX6Q does not support proper suspend/resume, we
> > + * need to reset the link after resume or the memory mapped PCIe I/O
> > + * space will be inaccessible. This will cause the system to freeze.
> > + */

Thanks, I will fix this.

>
> > + if (imx_pcie->drvdata->variant == IMX6Q) {
> > + if (imx_pcie->drvdata->enable_ref_clk)
> > + imx_pcie->drvdata->enable_ref_clk(imx_pcie, true);
> > +
> > + imx_pcie_deassert_core_reset(imx_pcie);
>
> There is no corresponding imx_pcie_assert_core_reset() in suspend.

Thanks, I will try to fix this similar to what they do in the host_init.

> > +
> > + /*
> > + * Setup the root complex again and enable msi. Without this PCIe will
> > + * not work in msi mode and drivers will crash if they try to access
> > + * the device memory area
> > + */
>
> This indicates that the controller state is not preserved. I think we need a bit
> more understanding on what is going on during system suspend on this platform.

Yes I fully agree, but unfortunately I don't really know how to fix it
properly. I just tried to fix the driver by searching for an absolute
minimum that is required to make suspend/resume work when a PCIe device
is connected. As an alternative the downstream patch would do something
similar but I thought also resetting the device would be a better
solution (to stay closer to what the other variants do):
https://github.com/nxp-imx/linux-imx/commit/4e92355e1f79d225ea842511fcfd42b343b32995
I just now saw that I didn't mention ERR005723 at all, sorry for that.
If this would be a more acceptable solution I would do some more tests
with this patch in our setup.

Regards,
Stefan