Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

From: Karol Herbst
Date: Tue Dec 10 2019 - 15:50:17 EST


On Tue, Dec 10, 2019 at 8:58 PM Dave Airlie <airlied@xxxxxxxxx> wrote:
>
> On Mon, 9 Dec 2019 at 21:39, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >
> > On Mon, Dec 9, 2019 at 12:17 PM Karol Herbst <kherbst@xxxxxxxxxx> wrote:
> > >
> > > anybody any other ideas?
> >
> > Not yet, but I'm trying to collect some more information.
> >
> > > It seems that both patches don't really fix
> > > the issue and I have no idea left on my side to try out. The only
> > > thing left I could do to further investigate would be to reverse
> > > engineer the Nvidia driver as they support runpm on Turing+ GPUs now,
> > > but I've heard users having similar issues to the one Lyude told us
> > > about... and I couldn't verify that the patches help there either in a
> > > reliable way.
> >
> > It looks like the newer (8+) versions of Windows expect the GPU driver
> > to prepare the GPU for power removal in some specific way and the
> > latter fails if the GPU has not been prepared as expected.
> >
> > Because testing indicates that the Windows 7 path in the platform
> > firmware works, it may be worth trying to do what it does to the PCIe
> > link before invoking the _OFF method for the power resource
> > controlling the GPU power.
> >
>
> Remember the pre Win8 path required calling a DSM method to actually
> power the card down, I think by the time we reach these methods in
> those cases the card is already gone.
>
> Dave.
>

The point was that the firmware seems to do more in the legacy paths
and maybe we just have to do those things inside the driver instead
when using the new method. Also the _DSM call just wraps around the
interfaces on newer firmware anyway. The OS check is usually what
makes the difference. I might be wrong about the _DSM call just
wrapping though, but I think I saw it at least in some firmware at
some point.