Re: Fwd: Regression: Kernel 6.4 rc1 and higher causes Steam Deck to fail to wake from suspend (bisected)

From: Bjorn Helgaas
Date: Wed Apr 10 2024 - 16:59:31 EST


On Wed, Apr 10, 2024 at 02:20:31PM +0800, Kai-Heng Feng wrote:
> On Sat, Mar 30, 2024 at 9:47 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Wed, Nov 01, 2023 at 06:45:41AM -0500, Bjorn Helgaas wrote:
> > > On Tue, Oct 31, 2023 at 03:21:20PM +0700, Bagas Sanjaya wrote:
> > > > I notice a regression report on Bugzilla [1]. Quoting from it:
> > > >
> > > > > On Kernel 6.4 rc1 and higher if you put the Steam Deck into
> > > > > suspend then press the power button again it will not wake up.
> > > > >
> > > > > I don't have a clue as to -why- this commit breaks wake from
> > > > > suspend on steam deck, but it does. Bisected to:
> > > > >
> > > > > ```
> > > > > 1ad11eafc63ac16e667853bee4273879226d2d1b is the first bad commit
> > > > > commit 1ad11eafc63ac16e667853bee4273879226d2d1b
> > > > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> ...

> > silverspring attached lspci output and a dmesg log from v6.8 to the
> > bugzilla and also noted that "pci=noaer" works around the problem.
> >
> > The problem commit is 1ad11eafc63a ("nvme-pci: drop redundant
> > pci_enable_pcie_error_reporting()")
> > (https://git.kernel.org/linus/1ad11eafc63a)
> >
> > 1ad11eafc63a removed pci_disable_pcie_error_reporting() from the
> > nvme_suspend() path, so we now leave the PCIe Device Control error
> > enables set when we didn't before. My theory is that the PCIe link
> > goes down during suspend, which causes an error interrupt, and the
> > interrupt causes a problem on Steam Deck. Maybe there's some BIOS
> > connection.
> >
> > "pci=noaer" would work around this because those error enables would
> > never be set in the first place.
> >
> > I asked reporters to test the debug patches below to disable those
> > error interrupts during suspend.
> >
> > I don't think this would be the *right* fix; if we need to do this, I
> > think it should be done by the PCI core, not by individual drivers.
> > Kai-Heng has been suggesting this for a while for a different
> > scenario.
>
> Should I send the patch to mailing list again to stir more discussion?

Yes, please. Include the folks from this thread, too, and the Steam
Deck bugzilla link since we have more problem reports now.

Bjorn