Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()

From: Mario Limonciello
Date: Fri Sep 13 2024 - 00:12:34 EST


On 9/10/2024 04:13, Mika Westerberg wrote:
Hi,

On Mon, Sep 09, 2024 at 03:40:54PM -0500, Mario Limonciello wrote:
Few additional side paths here, though. This is supposed to work so that
once the host router sleep bit is set the driver is supposed to allow
the domain to enter sleep (e.g it should not be waken up before it is
fully transitioned). That's what we do:

1. All tunneled PCIe Root/Downstream ports are in D3.
2. All tunneled USB 3.x ports are in U3.
3. No DisplayPort is tunneled.
4. Thunderbolt driver enables wakes.
5. Thunderbolt driver writes sleep ready bit of the host router.
6. Thunderbolt driver runtime suspend is complete.
7. ACPI method is called (_PS3 or _PR3.OFF) that will trigger the "Sleep
Event".

If between 5 and 7 there is device connected, it should not "abort" the
sequence. Unfortunately this is not explict in the USB4 spec but the
connection manager guide has similar note. Even if the connect happens
there the "Sleep Event" should happen but after that it can trigger
normal wakeup which will then bring everything back.

Would it be possible to enable tracing around these steps so that we
could see if there is hotplug notification somewhere there that is not
expected? Here are instructions how to get pretty accurate trace:

https://github.com/intel/tbtools?tab=readme-ov-file#tracing

Please also take full dmesg.

Sure, here is the dmesg with tracing enabled:

https://gist.github.com/superm1/5186e0023c8a5d2ecd75c50fd2168308

Thanks! I meant also enable control channel tracing as explained in the
above linked page so we could maybe see the hotplug packet coming from
the lane adapter too. Anyhow,

[ 560.789681] thunderbolt 0-2: device disconnected
[ 560.789771] thunderbolt 0000:e5:00.5: bandwidth consumption changed, re-calculating estimated bandwidth
[ 560.789775] thunderbolt 0000:e5:00.5: bandwidth re-calculation done
[ 560.789778] thunderbolt 0000:e5:00.5: looking for DP IN <-> DP OUT pairs:
[ 560.789800] thunderbolt 0000:e5:00.5: 0:6: DP IN available
[ 560.789803] thunderbolt 0000:e5:00.5: 0:6: no suitable DP OUT adapter available, not tunneling
[ 560.790484] thunderbolt 0000:e5:00.5: 0:7: DP IN available
[ 560.790487] thunderbolt 0000:e5:00.5: 0:7: no suitable DP OUT adapter available, not tunneling
...
[ 578.677640] thunderbolt 0000:e5:00.5: nhi_runtime_suspend() - enter, pdev->current_state = 0
[ 578.677648] thunderbolt 0000:e5:00.5: 0: suspending switch
[ 578.677653] thunderbolt 0000:e5:00.5: 0: enabling wakeup: 0x3f
[ 578.678248] thunderbolt 0000:e5:00.5: stopping RX ring 0
[ 578.678256] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 3 (0x9 -> 0x1)
[ 578.678272] thunderbolt 0000:e5:00.5: stopping TX ring 0
[ 578.678277] thunderbolt 0000:e5:00.5: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
[ 578.678287] thunderbolt 0000:e5:00.5: control channel stopped
...
pci_pm_runtime_resume()
pci_pm_default_resume_early(pci_dev);
pci_pm_power_up_and_verify_state(pci_dev);
pci_power_up(pci_dev);
platform_pci_set_power_state(dev, PCI_D0);
// here we should see acpi_pci_set_power_state() printing "xxx: power
// state changed by ACPI to D0" but we don't.
//
// also pdev->current_state is set to PCI_D0
pci_update_current_state(pci_dev, PCI_D0);
// this one reads it back, it should be 0..
pm->runtime_resume(dev);
nhi_runtime_resume(dev)
[ 578.773503] thunderbolt 0000:e5:00.5: nhi_runtime_resume() - enter, pdev->current_state = 3

// .. but it is not. It seems to be powered off, D3cold.

[ 578.773516] thunderbolt 0000:e5:00.5: control channel starting...
[ 578.773518] thunderbolt 0000:e5:00.5: starting TX ring 0
[ 578.773524] thunderbolt 0000:e5:00.5: enabling interrupt at register 0x38200 bit 0 (0xffffffff -> 0xffffffff)

// As here too.

It would be interesting to see what ACPI does here, I mean enabling
dynamic debugging of acpi_device_set_power() and friends.

Here is a log with dyndbg turned up.
https://gist.github.com/superm1/2f7ca6123431365d11e9bc3cc9329c14

You can see firmware node \_SB_.PCI0.BUSC.NHI0 is physical device 0000:e5:00.5.

One suspect is
that when the runtime suspends happens, the power resource or _PS3()
gets called to put the device into D3cold and it has some sort of
Sleep() inside. This allows the OS to relese the CPU to do other things
too according to ACPI spec (and Linux does this) so now when you plug in
the device the GPE triggers wake and Linux starts processing that one.
We see the above runtime resume but then the Sleep() in the _PS3()
completes and cuts the power from the NHI while we are in the middle of
resuming it already (or something along those lines).

We saw similar "context switch" happening when PCIe L2 failed, BIOS
implemented the timeout using Sleep() that allowed Linux to process
other things which resulted unexpected wake (not the same as here
though).

So one thing to check too is how the ACPI methods get called and
especially if they somehow end up messing with each other.

At least to me it looks pretty straightforward that each D0 sequence is going through and nothing looked out of order:

acpi_pci_set_power_state()
->acpi_device_set_power()
->->acpi_dev_pm_explicit_set()


There is a Sleep() call in M232() which is a SMI used in both _PS0 and _PS3, but as far as I can tell it's not actually called as the case always has zero as an argument.

Here's some snippets of ASL from the NHI0 device to see.
https://gist.github.com/superm1/32ef1f822a6220a41b19574024717f79

I feel your theory is right about it being an aborted D3cold request, but I don't believe it's from concurrency issue of _PS3. I feel there isn't any guard rails either in Linux or the AML for ensuring that
the transition actually finished.