Re: [PATCH 4/7] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum

From: Jürgen Groß

Date: Tue Oct 07 2025 - 09:31:45 EST


On 02.10.25 17:06, Dave Hansen wrote:
> On 10/2/25 00:46, Juergen Gross wrote:
>> So let's compare the 2 cases with kdump enabled and disabled in your
>> scenario (crash of the host OS):
>>
>> kdump enabled: No dump can be produced due to the #MC and the system is
>> rebooted.
>>
>> kdump disabled: No dump is produced and the system is rebooted after the
>> crash.
>>
>> What is the main concern with kdump enabled? I don't see any
>> disadvantage with enabling it, just the advantage that in many cases
>> a dump will be written.
>
> The disadvantage is that a kernel bug from long ago results in a machine
> check. Machine checks are generally indicative of bad hardware. So the
> disadvantage is that someone mistakes the long ago kernel bug for bad
> hardware.
>
> There are two ways of looking at this:
>
> 1. A theoretically fragile kdump is better than no kdump at all. All of
>    the stars would have to align for kdump to _fail_ and we don't think
>    that's going to happen often enough to matter.
> 2. kdump happens after kernel bugs. The machine checks happen because of
>    kernel bugs. It's not a big stretch to think that, at scale, kdump is
>    going to run into these #MCs on a regular basis.
>
> Does that capture the two perspectives fairly?

Basically yes.

If we can't come to an agreement that kdump should be allowed in spite of
a potential #MC, maybe we could disable kdump only if TDX guests have been
active on the machine before? Disabling kdump on a distro kernel just because
TDX was enabled, without anyone having actually used TDX, would be hard to
justify.
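To make the proposal concrete, here is a minimal userspace sketch of the policy, not kernel code and not taken from the patch series. All names (tdx_guest_was_active, note_tdx_guest_created(), crash_kexec_allowed()) are hypothetical, and where such a check would really live (at kexec load time or at crash time) is not settled in this thread. The flag is sticky (never cleared) to match "have been active on the machine before".

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical sticky flag: set when the first TDX guest is created and
 * never cleared afterwards, i.e. "TDX guests have been active before".
 */
static atomic_bool tdx_guest_was_active = false;

/* Hypothetical hook, assumed to be called whenever a TDX guest is created. */
void note_tdx_guest_created(void)
{
	atomic_store(&tdx_guest_was_active, true);
}

/*
 * Hypothetical policy check: on platforms with the partial write erratum,
 * refuse the crash kexec only if a TDX guest has run since boot. Merely
 * having TDX enabled in the kernel does not disable kdump.
 */
bool crash_kexec_allowed(bool has_partial_write_erratum)
{
	if (!has_partial_write_erratum)
		return true;
	return !atomic_load(&tdx_guest_was_active);
}
```

With this shape, a distro kernel with TDX enabled keeps kdump on affected hardware until someone actually starts a TDX guest.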


Juergen
