Re: [PATCH v15 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum

From: Huang, Kai
Date: Mon Dec 04 2023 - 18:24:49 EST


On Mon, 2023-12-04 at 14:04 -0800, Hansen, Dave wrote:
> On 12/4/23 13:00, Huang, Kai wrote:
> > > tl;dr: I think even looking at a #MC on the PAMT after the kvm module is
> > > removed is a fool's errand.
> > Sorry I wasn't clear enough. KVM actually turns off VMX when it destroys the
> > last VM, so the KVM module doesn't need to be removed to turn off VMX. I used
> > "KVM can be unloaded" as an example to explain the PAMT can be working when VMX
> > is off.
>
> Can't we just fix this by having KVM do an "extra" hardware_enable_all()
> before initializing the TDX module?  
>

Yes, KVM needs to do hardware_enable_all() anyway before initializing the TDX
module.

I believe you mean we can keep VMX enabled after initializing the TDX module,
i.e., not call hardware_disable_all() after that, so that kvm_usage_count
remains non-zero even when the last VM is destroyed?

The current behaviour, where KVM only enables VMX when there is an active VM,
exists because it (or the kernel) wants to allow a third-party VMX module (yes,
VirtualBox) to be loaded and run while the KVM module is loaded. Only one of
them can actually use the VMX hardware at a time, but both can be loaded.

In ancient times KVM used to enable VMX immediately when it was loaded, but
later it was changed to only enable VMX when there is an active VM, for the
above reason.

See commit 10474ae8945ce ("KVM: Activate Virtualization On Demand").
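
To make the reference-counting point concrete, here is a toy user-space model
in C. It is only a sketch of the idea, not the kernel code: the model_* names
and the two variables are made up for illustration, and real KVM does
considerably more (per-CPU enabling, locking, error handling).

```c
#include <stdbool.h>

/* Toy model only: stand-ins for the state discussed in this thread. */
static int usage_count;	/* plays the role of kvm_usage_count */
static bool vmx_on;	/* plays the role of "VMX is enabled" */

/* Take a reference; the first user turns VMX on. */
static void model_hardware_enable_all(void)
{
	if (usage_count++ == 0)
		vmx_on = true;
}

/* Drop a reference; the last user turns VMX off. */
static void model_hardware_disable_all(void)
{
	if (usage_count > 0 && --usage_count == 0)
		vmx_on = false;
}
```

In this model, if TDX module initialization takes one reference and never
drops it, destroying the last VM brings the count down to 1 rather than 0, so
VMX stays on. The cost is the one described above: a third-party VMX user can
no longer get the hardware while KVM is loaded.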

> It's not wrong to say that TDX is a
> KVM user. If KVM wants 'kvm_usage_count' to go back to 0, it can shut
> down the TDX module. Then there's no PAMT to worry about.
>
> The shutdown would be something like:
>
> 1. TDX module shutdown
> 2. Deallocate/Convert PAMT
> 3. vmxoff
>
> Then, no SEAMCALL failure because of vmxoff can cause a PAMT-induced #MC
> to be missed.

The limitation is that once the TDX module is shut down, it cannot be
initialized again unless it is updated at runtime.
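
As a rough illustration of that ordering and of the one-shot limitation, here
is a toy state machine in C. Everything in it is hypothetical (the enum, the
model_* helpers, the -1 return value); it only encodes the three steps quoted
above plus the "no re-init after shutdown" rule, and it ignores the runtime
update case.

```c
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum model_tdx_state { MODEL_TDX_UNINIT, MODEL_TDX_READY, MODEL_TDX_SHUTDOWN };

static enum model_tdx_state tdx_state = MODEL_TDX_UNINIT;
static bool pamt_allocated;
static bool vmx_on;

/* Initialization needs VMX on and only works from the pristine state. */
static int model_tdx_init(void)
{
	if (!vmx_on || tdx_state != MODEL_TDX_UNINIT)
		return -1;
	pamt_allocated = true;
	tdx_state = MODEL_TDX_READY;
	return 0;
}

/* Teardown in the order proposed above, so VMX goes off last. */
static int model_tdx_teardown(void)
{
	if (tdx_state != MODEL_TDX_READY)
		return -1;
	tdx_state = MODEL_TDX_SHUTDOWN;	/* 1. TDX module shutdown */
	pamt_allocated = false;		/* 2. deallocate/convert PAMT */
	vmx_on = false;			/* 3. vmxoff */
	return 0;
}
```

Once model_tdx_teardown() has run, a later model_tdx_init() fails even with
VMX turned back on, which is the problem for any user that wants TDX again
after KVM has let the usage count drop to 0.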

In the long term, if we go with this design, there might be other problems
when other kernel components use TDX. For example, the VT-d driver will need
to be changed to support TDX-IO, and it will need to enable the TDX module
much earlier than KVM to do some initialization. It might also need to do some
TDX work (e.g., cleanup) while KVM is unloaded. I am not super familiar with
TDX-IO, but it looks like we might have some problems here if we go with such
a design.