Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling

From: Borislav Petkov

Date: Thu Mar 12 2026 - 12:16:24 EST

On Thu, Mar 12, 2026 at 04:11:10PM +0100, William Roche wrote:
> From the kernel point of view (regardless of whether it is running on bare metal
> or in a VM), access to these registers is provided by the platform:
> either the hardware or the emulation framework.

Except the emulation doesn't emulate the platform properly. We test on real
hw. If your hypervisor doesn't do that properly then that's not really
upstream kernel's problem.

> Errors are injected into VMs by the hypervisor when real memory hardware
> errors occur on the system that impact the VM address space.

And?

Why?

What's the recovery action scenario for having errors injected into guests?
Where is that documented? Why does the upstream kernel need to care?

Basically I'm asking you for the use case in order to determine whether that
use case is valid for the *upstream* kernel to support.

> This is not only a test, this is a real-life mechanism. With the fix
> 7cb735d7c0cb that has been integrated, VM kernels running on AMD now crash
> on Deferred errors, whereas they used to be able to deal with them before this
> commit.

Because we don't know of your use case. So when we do upstream development, how
can we test your case?

Before that, is that case even worth testing?

I hope I'm making sense here. The MCA and other low-level hw code works on
bare metal as that's its main target. If it is supposed to work in VMs, then
there better be a proper use case which we are willing to support and we can
*actually* *test*.

If not, you can keep this "fix" in your guest kernels and everyone's happy.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette