Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

From: Karol Herbst
Date: Mon Oct 21 2019 - 12:41:07 EST


On Mon, Oct 21, 2019 at 5:46 PM Mika Westerberg
<mika.westerberg@xxxxxxxxx> wrote:
>
> On Mon, Oct 21, 2019 at 04:49:09PM +0200, Karol Herbst wrote:
> > On Mon, Oct 21, 2019 at 4:09 PM Mika Westerberg
> > <mika.westerberg@xxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 21, 2019 at 03:54:09PM +0200, Karol Herbst wrote:
> > > > > I really would like to provide you more information about such a
> > > > > workaround but I'm not aware of any ;-) I have not seen any issues like
> > > > > this when D3cold is properly implemented in the platform. That's why
> > > > > I'm a bit skeptical that this has anything to do with specific Intel PCIe
> > > > > ports. More likely it is some power sequence in the _ON/_OFF() methods
> > > > > that is run differently on Windows.
> > > >
> > > > yeah.. maybe. I really don't know what's the actual root cause. I just
> > > > know that with this workaround it works perfectly fine on my and some
> > > > other systems it was tested on. Do you know who would be best to
> > > > approach to get proper documentation about those methods and what are
> > > > the actual prerequisites of those methods?
> > >
> > > Those should be documented in the ACPI spec. Chapter 7 should explain
> > > power resources and the device power methods in detail.
> >
> > either I looked up the wrong spec or the documentation isn't really
> > saying much there.
>
> Well it explains those methods, _PSx, _PRx and _ON()/_OFF(). In the case
> of a PCIe device you also want to check the PCIe spec. PCIe 5.0 section
> 5.8 "PCI Function Power State Transitions" has a picture of the supported
> power state transitions, and there we can find that a function must be in
> D3hot before it can be transitioned into D3cold. So if the _OFF(), for
> example, blindly assumes that the device is in D0 when it is called, it
> is a bug in the BIOS.
>
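
To make the ordering constraint quoted above concrete, here is a minimal
sketch in plain C. It is not kernel code and not taken from any ACPI table:
the enum, sketch_power_off() and sketch_demo() are hypothetical names used
only for illustration. It models a power-off step that refuses to go to
D3cold unless the function is already in D3hot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device power states, for illustration only. */
enum sketch_state { SK_D0, SK_D3HOT, SK_D3COLD };

/*
 * Hypothetical stand-in for an _OFF()-style step: power is only removed
 * when the function has already been placed in D3hot. Any other starting
 * state is rejected and left unchanged.
 */
static bool sketch_power_off(enum sketch_state *st)
{
	if (*st != SK_D3HOT)
		return false;	/* e.g. still in D0: not a valid entry to D3cold */
	*st = SK_D3COLD;
	return true;
}

/* Small self-check exercising both the rejected and the accepted path. */
static void sketch_demo(void)
{
	enum sketch_state st = SK_D0;

	assert(!sketch_power_off(&st));	/* D0 -> D3cold is refused */
	assert(st == SK_D0);

	st = SK_D3HOT;			/* caller enters D3hot first */
	assert(sketch_power_off(&st));	/* D3hot -> D3cold is accepted */
	assert(st == SK_D3COLD);
}
```

An _OFF() that skips the equivalent of the state check and assumes D0 would
be the kind of firmware bug described above.
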
> BTW, where can I find acpidump of such system?

I am sure it's uploaded somewhere already. But it's not an issue of
just one system. It's essentially hitting every single laptop with a
Skylake or Kaby Lake CPU + Nvidia GPU. I haven't seen any system where
it's actually working right now (and we have been pestering Nvidia about
this issue for over a year already with no solution).

I've attached an acpidump from my system.

Attachment: xps_9560.tar.xz
Description: application/xz