Re: [BUG] ASUS ProArt PX13 HN7306WU: amd_pmc s2idle S0ix corrupts AMD 1022:150b root port, NVIDIA dGPU returns header type 7f
From: Mario Limonciello
Date: Fri Apr 03 2026 - 14:41:33 EST
On 4/3/26 1:04 PM, Bjorn Helgaas wrote:
> [+cc Lukas, pciehp expert, beginning of thread with full dmesg/lspci at
> https://lore.kernel.org/all/CADj6jrgK+sRXoNaYH90Rdc8DYEFK2iSF4vkJC=KE4UaZ73y67A@xxxxxxxxxxxxxx]
>
> On Fri, Apr 03, 2026 at 11:48:15AM -0500, Mario Limonciello wrote:
>> On 4/3/26 11:19 AM, Joyful Lee wrote:
>>> On Fri, Apr 3, 2026 at 10:25 AM Mario Limonciello
>>> <mario.limonciello@xxxxxxx> wrote:
>>>> That's really unfortunate to hear. If I were in your shoes and it
>>>> wasn't solved by the end of the return period, I would return the
>>>> machine and buy one from a vendor that tests Linux, fixes BIOS issues
>>>> for it, and supports it.
>>>>
>>>> I'm not going to pick favorites, but Dell, Framework, HP, and Lenovo
>>>> all have offerings where they do this.
>>>>
>>>> By chance, does the BIOS offer access to "AMD PBS" or "AMD CBS" menus?
>>>> If so, there may be an option nestled in there for dGPU D3 behavior.
>>> I got it working. My kernel, which I build up from defconfig, was
>>> missing CONFIG_HOTPLUG_PCI_PCIE. It makes sense, but I wish there had
>>> been more evidence pointing me to this option. At least we can close
>>> the loop here in case anyone else runs into this problem. As a side
>>> benefit, enabling this option also got the dGPU to enter D3cold,
>>> whereas before the lowest it would reach was D3hot.
>> That's great news! I'll also add a check to amd-debug-tools to flag
>> this and help anyone else in the future.
> That is indeed great news.
>
> But as you point out, it doesn't close the issue. Somebody else is
> going to trip over the same issue. Most likely they will not report
> it and will have no idea how to fix it. Even if they do report it,
> we'll have to go through this whole debug process again.
>
> The kernel should work correctly (possibly with increased power
> consumption or some other non-functional issue) regardless of whether
> CONFIG_HOTPLUG_PCI_PCIE is enabled.
I do hope that as part of this we can reconsider why CONFIG_HOTPLUG_PCI_PCIE isn't in defconfig in the first place.

defconfig doesn't work out of the box on any of my hardware, and figuring out what to add to it is too much work, so I always start from a distro config and trim it down for my own use.

But if we could actually make defconfig *usable* for general-purpose kernel users, maybe more people would use it.
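For reference, a check along the lines of the one proposed for amd-debug-tools might look roughly like this. It is a minimal sketch, not the actual amd-debug-tools code: the function names are hypothetical, and it assumes the running kernel's config is exposed either as /proc/config.gz (CONFIG_IKCONFIG_PROC=y) or as a distro-style /boot/config-<release> file.

```python
import gzip
import os
from typing import Dict, Optional


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse .config-style text into {symbol: value}.

    '# CONFIG_FOO is not set' lines are recorded as 'n'.
    """
    options: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def read_running_config() -> Optional[str]:
    """Return the running kernel's config text, or None if not found.

    Assumption: /proc/config.gz (CONFIG_IKCONFIG_PROC) or
    /boot/config-<release> exists; no other locations are searched.
    """
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    path = f"/boot/config-{os.uname().release}"
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


def pciehp_missing(config_text: str) -> bool:
    """True unless CONFIG_HOTPLUG_PCI_PCIE=y appears in the config."""
    # pciehp is a bool option (built in or absent), so only "=y" counts.
    return parse_kconfig(config_text).get("CONFIG_HOTPLUG_PCI_PCIE") != "y"


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found; cannot check for pciehp")
    elif pciehp_missing(text):
        print("CONFIG_HOTPLUG_PCI_PCIE is not enabled; "
              "dGPU resume from s2idle has been seen to break without it")
    else:
        print("CONFIG_HOTPLUG_PCI_PCIE=y")
```

A hand-installed kernel may have neither config file, in which case a check like this can only report that it doesn't know.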
> How can we make Linux smart enough that if we're lacking pciehp or
> whatever is necessary, we automatically avoid s2idle or S0ix or
> whatever causes this problem?
I suppose we /could/ have CONFIG_AMD_PMC depend on CONFIG_HOTPLUG_PCI_PCIE, but that feels like putting super glue on a wound before we know why this happens.
> Here are the obvious clues in dmesg during resume from S0ix:
>
>   pci 0000:00:03.1: [1022:150b] type 01 class 0x060400 PCIe Root Port
>   pci 0000:00:03.1: PCI bridge to [bus c4]
>   pci 0000:c4:00.0: [10de:28a1] type 00 class 0x030000 PCIe Legacy Endpoint
>   pci 0000:c4:00.1: [10de:22be] type 00 class 0x040300 PCIe Endpoint
>   pci 0000:c4:00.1: extending delay after power-on from D3hot to 20 msec
>   pci 0000:c4:00.1: D0 power state depends on 0000:c4:00.0
>   pci 0000:c4:00.0: Unable to change power state from D0 to D0, device inaccessible
>   snd_hda_intel 0000:c4:00.1: Unable to change power state from D3hot to D0, device inaccessible
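As a rough illustration of how a tool could pick this symptom out of a resume log automatically, here is a small Python sketch that collects the devices reporting "device inaccessible" on a power state change. The regular expression is an assumption based only on the message format quoted in this thread (plus an optional dmesg timestamp); other kernel versions may word the message differently.

```python
import re
from typing import List, Tuple

# Built from the message format seen in the resume log in this thread,
# optionally preceded by a "[  123.456789] " dmesg timestamp.
INACCESSIBLE_RE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?"
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to change power state from (?P<src>D\w+) to (?P<dst>D\w+), "
    r"device inaccessible$"
)


def inaccessible_devices(log: str) -> List[Tuple[str, str, str]]:
    """Return (bdf, from_state, to_state) for each 'device inaccessible' line."""
    found = []
    for line in log.splitlines():
        m = INACCESSIBLE_RE.match(line.strip())
        if m:
            found.append((m["bdf"], m["src"], m["dst"]))
    return found
```

Run over the log quoted here, it would flag 0000:c4:00.0 (D0 to D0) and 0000:c4:00.1 (D3hot to D0).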
> It looks like something is wrong with the 00:03.1 Root Port config
> space after S0ix, e.g., the HwInit Port Number is nonsensical:
>
>   00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo GPP Bridge
>   - LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM not supported
>   + LnkCap: Port #247, Speed 16GT/s, Width x8, ASPM not supported
>
> but the c4:00 device below it seems completely inaccessible; maybe the
> link is down or the endpoint is in D3cold, so config reads return ~0:
>
>   c4:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD107M
>   + !!! Unknown header type 7f
>   + Interrupt: pin ? routed to IRQ 255
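To make the two symptoms concrete: lspci masks bit 7 (the multi-function flag) off the Header Type register, so an all-ones read shows up as "header type 7f", and the LnkCap Port Number is bits 31:24 of the Link Capabilities register (0xf7 = 247). The Python sketch here reads a device's config space from sysfs and decodes both. It is a hypothetical helper, not part of lspci or any existing tool, and it assumes root privileges, since unprivileged reads of the sysfs config file return only the first 64 bytes.

```python
from typing import Optional

PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10
PCI_HEADER_TYPE = 0x0E
PCI_CAPABILITY_LIST = 0x34
PCI_CAP_ID_EXP = 0x10
PCI_EXP_LNKCAP = 0x0C  # offset of Link Capabilities within the PCIe capability


def read_config(bdf: str) -> bytes:
    """Read the first 256 bytes of config space, e.g. bdf='0000:00:03.1'.

    Needs root: unprivileged reads of this file return only 64 bytes.
    """
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(256)


def looks_inaccessible(cfg: bytes) -> bool:
    """A Vendor ID of 0xffff means the read completed as all ones."""
    return len(cfg) < 2 or int.from_bytes(cfg[0:2], "little") == 0xFFFF


def lspci_header_type(cfg: bytes) -> int:
    """Header Type as lspci reports it: bit 7 (multi-function) masked off."""
    return cfg[PCI_HEADER_TYPE] & 0x7F


def find_pcie_cap(cfg: bytes) -> Optional[int]:
    """Walk the capability list and return the PCIe capability offset."""
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    for _ in range(48):  # bound the walk in case the list is corrupt or loops
        if ptr < 0x40 or ptr + 1 >= len(cfg):
            return None
        if cfg[ptr] == PCI_CAP_ID_EXP:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def lnkcap_port_number(cfg: bytes) -> Optional[int]:
    """Port Number field, bits 31:24 of Link Capabilities."""
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_LNKCAP + 4 > len(cfg):
        return None
    off = cap + PCI_EXP_LNKCAP
    return int.from_bytes(cfg[off:off + 4], "little") >> 24
```

Comparing lnkcap_port_number(read_config("0000:00:03.1")) before and after a suspend cycle would show whether the Port Number changes the way the lspci diff does.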
> I don't know what pciehp is doing that avoids this issue.
> Understanding that seems like the first step in avoiding or fixing the
> problem.
If I had to guess, what's happening here is that the firmware* has never been tested with the PcieHotplug _OSC negotiation failing, and it implicitly assumes that negotiation succeeds. All Windows testing has it in place, and our internal Linux testing has always been on kernels with it enabled too (see my defconfig comment above).
> Joyful, could you collect a dmesg log with pciehp enabled and with
> this kernel parameter (the quotes are a required part of the
> parameter):
>
>   dyndbg="file drivers/pci/* +p"
* I don't know if this is an ASUS firmware or AMD (AGESA) firmware issue.