Re: [PATCH v5 0/5] Verify devices transition from D3cold to D0

From: Mario Limonciello
Date: Wed Dec 04 2024 - 22:44:18 EST


On 12/4/2024 17:45, Bjorn Helgaas wrote:
On Wed, Dec 04, 2024 at 11:30:51AM -0600, Mario Limonciello wrote:
On 8/23/2024 10:40, Mario Limonciello wrote:
From: Mario Limonciello <mario.limonciello@xxxxxxx>

Gary has reported that when a dock is plugged into a system at the same
time the autosuspend delay has tripped that the USB4 stack malfunctions.

Messages show up like this:

```
thunderbolt 0000:e5:00.6: ring_interrupt_active: interrupt for TX ring 0 is already enabled
```

Furthermore the USB4 router is non-functional at this point.

Those messages happen because the device is still in D3cold at the time
that the PCI core handed control back to the USB4 connection manager
(thunderbolt).

The issue is that it takes time for a device to enter D3cold and do a
conventional reset, and then more time for it to exit D3cold.

This appears not to be a new problem; previously there were very similar
reports from Ryzen XHCI controllers. Quirks were added for those.
Furthermore; adding extra logging it's apparent that other PCI devices
in the system can take more than 10ms to recover from D3cold as well.

This series add a wait into pci_power_up() specifically for D3cold exit and
then drops the quirks that were previously used for the Ryzen XHCI controllers.

Mario Limonciello (5):
PCI: Use an enum for reset type in pci_dev_wait()
PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
PCI: Verify functions currently in D3cold have entered D0
PCI: Allow Ryzen XHCI controllers into D3cold and drop delays
PCI: Drop Radeon quirk for Macbook Pro 8.2

drivers/pci/pci-driver.c | 2 +-
drivers/pci/pci.c | 70 +++++++++++++++++++++++++++----------
drivers/pci/pci.h | 13 ++++++-
drivers/pci/pcie/dpc.c | 2 +-
drivers/pci/quirks.c | 25 -------------
drivers/usb/host/xhci-pci.c | 11 ------
6 files changed, 66 insertions(+), 57 deletions(-)

Bjorn,

This series has stalled a while.

Mika and I went back and forth and I think are generally in agreement so I
think it's waiting on your feedback.

Can you take another look?

The alternative is to add some more piles of quirks, but I'm hoping that we
can go this direction and drop a bunch of the old ones instead.

LMK if you want me to rebase it on 6.13-rc1 and resend a v6.

I'm still stuck on patch 2/5 because I'm not aware of any spec
language about polling PCI_PM_CTRL to wait for a power state
transition, so it seems really ad hoc.

I'm not really sure how to overcome this. If I rebase everything I'll give specs another read through in case I missed anything, but I suspect you know these specs better than anyoe on this list.

Is it worth raising this to PCI-SIG to discuss?
Did you perhaps already do that?


If you do rebase to v6.13-rc1, in the 2/5 commit log,
s/evices/devices/.

I guess that whole patch and commit log needs updating since the RRS
code was added to pci_dev_wait() in the interim, so the "device that
has gone through a reset may return a value in PCI_COMMAND but that
doesn't mean it's finished transitioning to D0" doesn't directly apply
anymore.

Bjorn