Re: [PATCH V2 2/2] x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
From: Edgecombe, Rick P
Date: Mon Jul 07 2025 - 14:16:13 EST
On Mon, 2025-07-07 at 07:32 -0700, Dave Hansen wrote:
> On 7/7/25 00:15, Chao Gao wrote:
> > > Why should this specific kind of freeing (TDX private memory being freed
> > > back to the host) operation be different from any other kind of free?
> > To limit the impact of software bugs (e.g., TDX module bugs) to TDX guests
> > rather than affecting the entire kernel.
>
> It's one thing if the TDX module is so constantly buggy that we're
> getting tons of kernel crash reports that we track back to the TDX module.
Even if this happens, I think it would be good to limit kernel-side safety code
to finding TDX module bugs, not working around them.
>
> It's quite another thing to add kernel complexity to preemptively lessen
> the chance of a theoretical TDX bug.
It would also lessen the chance of catching the bug and fixing it in the TDX
module. Otherwise we end up with a "works by accident" solution that causes
crashes for unknown reasons if anyone later removes the defensive code.
This pattern of adding defensive protections against TDX module bugs came up in
the TDX huge pages patches as well. Let's make this type of reasoning the norm.