Re: [BUG] ASUS ProArt PX13 HN7306WU: amd_pmc s2idle S0ix corrupts AMD 1022:150b root port, NVIDIA dGPU returns header type 7f

From: Bjorn Helgaas

Date: Fri Apr 03 2026 - 14:04:34 EST

[+cc Lukas, pciehp expert, beginning of thread with full dmesg/lspci at
https://lore.kernel.org/all/CADj6jrgK+sRXoNaYH90Rdc8DYEFK2iSF4vkJC=KE4UaZ73y67A@xxxxxxxxxxxxxx]

On Fri, Apr 03, 2026 at 11:48:15AM -0500, Mario Limonciello wrote:
> On 4/3/26 11:19 AM, Joyful Lee wrote:
> > On Fri, Apr 3, 2026 at 10:25 AM Mario Limonciello
> > <mario.limonciello@xxxxxxx> wrote:
> > > That's really unfortunate to hear. If I was in your shoes - If not
> > > solved by the end of the return period I would return the machine and
> > > purchase one from a vendor that has been testing, fixing BIOS issues and
> > > supporting Linux.
> > >
> > > I'm not going to pick favorites, but Dell, Framework, HP, and Lenovo all
> > > have offerings that they do this.
> > >
> > > By chance does the BIOS offer access to "AMD PBS" or "AMD CBS" menus?
> > > If so, there may be an option nestled in there for dGPU D3 behavior.
> >
> > I got it working. My kernel, which I build up from defconfig, was
> > missing CONFIG_HOTPLUG_PCI_PCIE. It makes sense, but I wish there was
> > more evidence that pointed me to this option. At least we can close the
> > loop here in case anyone else runs into this problem. As a side benefit,
> > enabling this option also got the dGPU to enter D3cold where before the
> > lowest it would get is D3hot.
>
> That's great news! I'll add a check to flag this in amd-debug-tools too to
> help anyone else in the future.

That is indeed great news.

But as you point out, it doesn't close the issue. Somebody else is
going to trip over the same problem. Most likely they will not report
it and will have no idea how to fix it. Even if they do report it,
we'll have to go through this whole debug process again.

The kernel should work correctly (possibly with increased power
consumption or some other non-functional issue) regardless of whether
CONFIG_HOTPLUG_PCI_PCIE is enabled.
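
Until the kernel copes with this on its own, a userspace check can at
least flag the missing option. What follows is only a minimal Python
sketch of such a check, not the amd-debug-tools implementation. The
helper names are hypothetical, and it assumes the running kernel's
config is available at /proc/config.gz (which needs
CONFIG_IKCONFIG_PROC) or at /boot/config-<release>:

```python
import gzip
import os
import platform


def config_option_enabled(config_text, option):
    """Return True if `option` is set to y (or m) in kernel config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" are comments.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value in ("y", "m")
    return False


def read_running_kernel_config():
    """Return the running kernel's config text, or None if unavailable."""
    # Assumed locations; a distro may keep the config somewhere else.
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        try:
            with opener(path, "rt") as f:
                return f.read()
        except OSError:
            continue
    return None


if __name__ == "__main__":
    text = read_running_kernel_config()
    if text is None:
        print("kernel config not found, cannot check")
    elif config_option_enabled(text, "CONFIG_HOTPLUG_PCI_PCIE"):
        print("CONFIG_HOTPLUG_PCI_PCIE is enabled")
    else:
        print("CONFIG_HOTPLUG_PCI_PCIE is NOT enabled")
```

This only reports the build-time option. It says nothing about why
pciehp makes a difference on this machine.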

How can we make Linux smart enough that if we're lacking pciehp or
whatever is necessary, we automatically avoid s2idle or S0ix or
whatever causes this problem?

Here are the obvious clues in dmesg during resume from S0ix:

pci 0000:00:03.1: [1022:150b] type 01 class 0x060400 PCIe Root Port
pci 0000:00:03.1: PCI bridge to [bus c4]
pci 0000:c4:00.0: [10de:28a1] type 00 class 0x030000 PCIe Legacy Endpoint
pci 0000:c4:00.1: [10de:22be] type 00 class 0x040300 PCIe Endpoint
pci 0000:c4:00.1: extending delay after power-on from D3hot to 20 msec
pci 0000:c4:00.1: D0 power state depends on 0000:c4:00.0
pci 0000:c4:00.0: Unable to change power state from D0 to D0, device inaccessible
snd_hda_intel 0000:c4:00.1: Unable to change power state from D3hot to D0, device inaccessible

It looks like something is wrong with the 00:03.1 Root Port config
space after S0ix, e.g., the HwInit Port Number is nonsensical:

00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Strix/Strix Halo GPP Bridge
- LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM not supported
+ LnkCap: Port #247, Speed 16GT/s, Width x8, ASPM not supported

but the c4:00 device below it seems completely inaccessible; maybe the
link is down or the endpoint is in D3cold so config reads return ~0:

c4:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD107M
+ !!! Unknown header type 7f
+ Interrupt: pin ? routed to IRQ 255
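
For reference, the arithmetic behind those two lspci lines can be
sketched in Python. The field layouts are from the PCI/PCIe specs
(Header Type at config offset 0x0e with bit 7 as the multi-function
flag, Port Number in Link Capabilities bits 31:24). The function names
are made up for illustration, and the LnkCap value below is synthetic:
only its top byte is taken from the "Port #247" output above.

```python
def decode_header_type(raw_byte):
    """Split the Header Type byte into (layout, multi_function)."""
    layout = raw_byte & 0x7F          # bits 6:0, what lspci reports
    multi_function = bool(raw_byte & 0x80)
    return layout, multi_function


def lnkcap_port_number(lnkcap):
    """Extract the Port Number field (bits 31:24) from Link Capabilities."""
    return (lnkcap >> 24) & 0xFF


# A config read that fails returns all ones, so the Header Type byte
# reads as 0xff. Masking off bit 7 leaves 0x7f, which is not a valid
# layout, hence "Unknown header type 7f".
layout, _ = decode_header_type(0xFF)
print(f"header type layout: {layout:#04x}")

# Synthetic LnkCap with only the Port Number field filled in.
synthetic_lnkcap = 247 << 24
print(f"port number: {lnkcap_port_number(synthetic_lnkcap)} "
      f"({lnkcap_port_number(synthetic_lnkcap):#04x})")
```

Note that 247 is 0xf7, not 0xff, so the Root Port's LnkCap is not
simply reading back as all ones the way the endpoint's config space
appears to be.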

I don't know what pciehp is doing that avoids this issue.
Understanding that seems like the first step in avoiding or fixing the
problem.

Joyful, could you collect a dmesg log with pciehp enabled and with
this kernel parameter (the quotes are a required part of the
parameter):

dyndbg="file drivers/pci/* +p"