Re: [REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM

From: Paul Menzel
Date: Fri Mar 18 2022 - 01:43:28 EST


Dear Thorsten, dear James,


On 17.03.22 at 13:54, Thorsten Leemhuis wrote:
On 13.03.22 19:33, James Turner wrote:

My understanding at this point is that the root problem is probably
not in the Linux kernel but rather something else (e.g. the machine
firmware or AMD Windows driver) and that the change in f9b7f3703ff9
("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)") simply
exposed the underlying problem.

FWIW: that in the end is irrelevant when it comes to the Linux kernel's
'no regressions' rule. For details see:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/admin-guide/reporting-regressions.rst
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/Documentation/process/handling-regressions.rst

That being said: sometimes for the greater good it's better to not
insist on that. And I guess that might be the case here.

But who decides that? Running stuff in a virtual machine is not that uncommon.

Should the commit be reverted, and re-added with a more elaborate commit message documenting the downsides?

Could the user be notified somehow? Can PCI passthrough and a loaded amdgpu driver be detected, so Linux warns about this?

Also, should this be documented in the code?
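
On the detection question above, here is a rough user-space sketch, not a kernel patch. It assumes the usual sysfs layout: /sys/module/amdgpu exists while the module is loaded, and devices bound for passthrough appear as entries under /sys/bus/pci/drivers/vfio-pci/. The function names and the `sysfs` parameter are made up for illustration, and whether an in-kernel warning could rely on the same condition is an open question.

```python
import os

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID used by AMD/ATI devices


def amdgpu_loaded(sysfs="/sys"):
    """Return True if sysfs has a module directory for amdgpu."""
    return os.path.isdir(os.path.join(sysfs, "module", "amdgpu"))


def amd_devices_on_vfio(sysfs="/sys"):
    """List PCI addresses of AMD devices currently bound to vfio-pci."""
    driver_dir = os.path.join(sysfs, "bus", "pci", "drivers", "vfio-pci")
    if not os.path.isdir(driver_dir):
        return []  # vfio-pci is not loaded, so nothing is passed through
    found = []
    for name in sorted(os.listdir(driver_dir)):
        # Device entries are links to PCI device directories, which hold a
        # "vendor" file; control files such as bind/unbind do not.
        vendor_file = os.path.join(driver_dir, name, "vendor")
        if not os.path.isfile(vendor_file):
            continue
        with open(vendor_file) as f:
            if f.read().strip().lower() == AMD_VENDOR_ID:
                found.append(name)
    return found


if __name__ == "__main__":
    devices = amd_devices_on_vfio()
    if amdgpu_loaded() and devices:
        print("amdgpu is loaded while AMD devices are bound to vfio-pci:",
              ", ".join(devices))
```

This only reports that both conditions hold at the same time. It cannot tell whether the guest actually sees the too-low frequency limit.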

I'm not sure where to go from here. This issue isn't much of a concern
for me anymore, since blacklisting `amdgpu` works for my machine. At
this point, my understanding is that the root problem needs to be fixed
in AMD's Windows GPU driver or Dell's firmware, not the Linux kernel. If
any of the AMD developers on this thread would like to forward it to the
AMD Windows driver team, I'd be happy to work with AMD to fix the issue
properly.

(Thorsten, your mailer mangled the quote somehow, so I reformatted it. That is too bad, as this message is shown when clicking on the link *marked invalid* on the regzbot web page [1]. The link is a very nice feature.)

In that case I'll drop it from the list of regressions, unless what I
wrote above makes you change your mind.

#regzbot invalid: firmware issue exposed by kernel change, user seems to
be happy with a workaround

Thx everyone who participated in handling this.

Should the regression issue be re-opened until the questions above are answered and a more user-friendly solution is found?


Kind regards,

Paul


[1]: https://linux-regtracking.leemhuis.info/regzbot/resolved/