Re: [PATCH v15 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum

From: Huang, Kai
Date: Mon Dec 04 2023 - 18:41:31 EST


On Mon, 2023-12-04 at 23:24 +0000, Huang, Kai wrote:
> In the long term, if we go with this design, there might be other problems when
> other kernel components use TDX.  For example, the VT-d driver will need to be
> changed to support TDX-IO, and it will need to enable the TDX module much earlier
> than KVM to do some initialization.  It might need to do some TDX work (e.g.,
> cleanup) while KVM is unloaded.  I am not super familiar with TDX-IO, but it looks
> like we might have some problems here if we go with such a design.

Perhaps I shouldn't use a future feature as an argument. E.g., with multiple TDX
users we would likely need a refcount to decide whether TDX can truly be shut down.
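To make the refcount idea concrete, here is a minimal userspace sketch. It assumes TDX could be shut down once unused, and tdx_get(), tdx_put() and the hypothetical_* helpers are made-up names, not existing kernel APIs. The first user initializes the module and only the last user shuts it down, so KVM unloading no longer implies TDX going away:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for initializing / shutting down the TDX module. */
static bool tdx_module_alive;

static int hypothetical_tdx_module_init(void)
{
	tdx_module_alive = true;
	return 0;
}

static void hypothetical_tdx_module_shutdown(void)
{
	tdx_module_alive = false;
}

static pthread_mutex_t tdx_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int tdx_users;	/* number of kernel components using TDX */

/* Hypothetical: called by each TDX user (KVM, a TDX-IO driver, ...). */
int tdx_get(void)
{
	int ret = 0;

	pthread_mutex_lock(&tdx_lock);
	if (tdx_users == 0)
		ret = hypothetical_tdx_module_init();	/* first user */
	if (!ret)
		tdx_users++;
	pthread_mutex_unlock(&tdx_lock);
	return ret;
}

/* Hypothetical: only the last user actually shuts TDX down. */
void tdx_put(void)
{
	pthread_mutex_lock(&tdx_lock);
	assert(tdx_users > 0);	/* an unbalanced put is a caller bug */
	if (tdx_users > 0 && --tdx_users == 0)
		hypothetical_tdx_module_shutdown();
	pthread_mutex_unlock(&tdx_lock);
}
```

With the VT-d driver and KVM both holding a reference, KVM's tdx_put() on unload leaves the module alive. Only the VT-d driver's final tdx_put() tears it down.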

And VMX on/off will also need to be moved out of KVM for this work.

But the point is that it's better not to assume how these kernel components will
use VMX on/off. E.g., a component may simply turn on VMX, do a SEAMCALL, and then
turn VMX off immediately, while the TDX module stays alive the whole time.
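A toy model of that "VMX on only around the SEAMCALL" pattern is sketched below. The hypothetical_* stubs and tdx_user_call() are made-up names that only track state; in the kernel, VMXON, SEAMCALL and VMXOFF are privileged instructions. SEAMCALL is only valid in VMX operation, so the user brackets the call. It leaves VMX on if it was already on, e.g. because KVM had enabled it. The module's state is independent of any CPU's VMX state:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: this CPU's VMX state vs. the (global) TDX module state. */
static bool vmx_on;
static bool tdx_module_alive = true;

static void hypothetical_vmxon(void)  { vmx_on = true; }
static void hypothetical_vmxoff(void) { vmx_on = false; }

/* Hypothetical stub: SEAMCALL only works in VMX operation. */
static int hypothetical_seamcall(unsigned long leaf)
{
	(void)leaf;
	if (!vmx_on || !tdx_module_alive)
		return -1;
	return 0;
}

/* A TDX user that keeps VMX on only for the duration of one SEAMCALL. */
int tdx_user_call(unsigned long leaf)
{
	bool was_on = vmx_on;
	int ret;

	if (!was_on)
		hypothetical_vmxon();
	ret = hypothetical_seamcall(leaf);
	if (!was_on)
		hypothetical_vmxoff();	/* restore the caller's VMX state */
	return ret;
}
```

After tdx_user_call() returns, VMX is off again on this CPU but tdx_module_alive is unchanged. This is why tying the module's lifetime to VMX on/off (or to KVM being loaded) would be the wrong assumption.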

Keeping VMX on blocks INIT. I guess that's another reason we prefer to turn VMX
on only when needed:

/*
 * Disable virtualization, i.e. VMX or SVM, to ensure INIT is recognized during
 * reboot.  VMX blocks INIT if the CPU is post-VMXON, and SVM blocks INIT if
 * GIF=0, i.e. if the crash occurred between CLGI and STGI.
 */
void cpu_emergency_disable_virtualization(void)
{
...
}