Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
From: Mika Westerberg
Date: Wed Sep 04 2024 - 08:10:44 EST
Hi,
On Tue, Sep 03, 2024 at 01:32:30PM -0500, Mario Limonciello wrote:
> On 9/3/2024 13:25, Bjorn Helgaas wrote:
> > On Tue, Sep 03, 2024 at 12:31:00PM -0500, Mario Limonciello wrote:
> > > On 9/3/2024 12:11, Bjorn Helgaas wrote:
> > > ...
> >
> > > > 8) The USB4 stack sees the device and assumes it is in D0, but it
> > > > seems to still be in D3cold. What is this based on? Is there a
> > > > config read that returns ~0 data when it shouldn't?
> > >
> > > Yes there is. From earlier in the thread I have a [log] I shared.
> > >
> > > The message emitted is from ring_interrupt_active():
> > >
> > > "thunderbolt 0000:e5:00.5: interrupt for TX ring 0 is already enabled"
> >
> > Right, that's in the cover letter, but I can't tell from this what the
> > ioread32(ring->nhi->iobase + reg) returned. It looks like this is an
> > MMIO read of BAR 0, not a config read.
> >
>
> Yeah. I suppose another way to approach this problem is to make something
> else in the call chain poll PCI_PM_CTRL.
>
> Polling at the start of nhi_runtime_resume() should also work. For the
> "normal" scenario it would just be a single read to PCI_PM_CTRL.
>
> Mika, thoughts?
I'm starting to wonder if we are looking at the correct place ;-) This
reminds me that our PCIe SV people recently reported a couple of Linux
related issues which they recommended to fix, and these are on my list
but I'll share them because maybe they are related?
First problem, and actually a PCI spec violation, is that Linux does not
clear Bus Master, MMIO and IO space enables when it programs the device
to D3 on runtime suspend path. It does so on system sleep path though.
Something like below (untested) should do that:
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index f412ef73a6e4..79a566376301 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1332,6 +1332,7 @@ static int pci_pm_runtime_suspend(struct device *dev)
if (!pci_dev->state_saved) {
pci_save_state(pci_dev);
+ pci_pm_default_suspend(pci_dev);
pci_finish_runtime_suspend(pci_dev);
}
The second thing is that Thunderbolt driver, for historical reasons,
leaves the MSI enabled when entering D3. This too might be related. I
think we can unconditionally disable it so below hack should do that
(untested as well). I wonder if you could try if any of these or both
can help here? Both of these issues can result unwanted events during D3
entry as far as I understand.
diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index dc1f456736dc..73b815fbbceb 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -659,12 +659,11 @@ struct tb_ctl *tb_ctl_alloc(struct tb_nhi *nhi, int index, int timeout_msec,
if (!ctl->frame_pool)
goto err;
- ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, RING_FLAG_NO_SUSPEND);
+ ctl->tx = tb_ring_alloc_tx(nhi, 0, 10, 0);
if (!ctl->tx)
goto err;
- ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, RING_FLAG_NO_SUSPEND, 0, 0xffff,
- 0xffff, NULL, NULL);
+ ctl->rx = tb_ring_alloc_rx(nhi, 0, 10, 0, 0, 0xffff, 0xffff, NULL, NULL);
if (!ctl->rx)
goto err;