Re: [PATCH v15 22/23] x86/mce: Improve error log of kernel space TDX #MC due to erratum

From: Huang, Kai
Date: Mon Dec 04 2023 - 18:57:02 EST


On Mon, 2023-12-04 at 15:39 -0800, Dave Hansen wrote:
> On 12/4/23 15:24, Huang, Kai wrote:
> > On Mon, 2023-12-04 at 14:04 -0800, Hansen, Dave wrote:
> ...
> > In the old days KVM used to enable VMX immediately when it was loaded, but
> > later it was changed to enable VMX only when there is an active VM, because
> > of the above reason.
> >
> > See commit 10474ae8945ce ("KVM: Activate Virtualization On Demand").
>
> Fine. This doesn't need to change ... until you load TDX. Once you
> initialize the TDX module, no more out-of-tree VMMs for you.
>
> That doesn't seem too insane. This is yet *ANOTHER* reason that doing
> dynamic TDX module initialization is a good idea.

I have no objection to this.

>
> > > It's not wrong to say that TDX is a
> > > KVM user. If KVM wants 'kvm_usage_count' to go back to 0, it can shut
> > > down the TDX module. Then there's no PAMT to worry about.
> > >
> > > The shutdown would be something like:
> > >
> > > 1. TDX module shutdown
> > > 2. Deallocate/Convert PAMT
> > > 3. vmxoff
> > >
> > > Then, no SEAMCALL failure because of vmxoff can cause a PAMT-induced #MC
> > > to be missed.
> >
> > The limitation is that once the TDX module is shut down, it cannot be
> > initialized again unless it is updated at runtime.
> >
> > In the long term, if we go with this design there might be other problems
> > when other kernel components are using TDX. For example, the VT-d driver
> > will need to be changed to support TDX-IO, and it will need to enable the
> > TDX module much earlier than KVM to do some initialization. It might need to
> > do some TDX work (e.g., cleanup) while KVM is unloaded. I am not super
> > familiar with TDX-IO, but it looks like we might have some problems here if
> > we go with such a design.
>
> The burden for who does vmxon will simply need to change from KVM itself
> to some common code that KVM depends on. Probably not dissimilar to
> those nutty (sorry folks, just calling it as I see 'em) multi-KVM module
> patches that are floating around.
>

Right, we will need to move VMX on/off out of KVM for that purpose. I think the
point is that it's better not to assume how these kernel components will use
VMX on/off. E.g., a component may simply turn on VMX, do a SEAMCALL, and then
turn off VMX immediately, while the TDX module stays alive all the time.

Or we require that they all: 1) enable VMX; 2) enable/use TDX; 3) disable TDX
when it is no longer needed; 4) disable VMX.
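
To make the difference between the two options concrete, here is a rough
user-space model in C. It is only a sketch: every name in it (vmx_get(),
vmx_put(), seamcall(), tdx_user_start(), and so on) is hypothetical and
does not exist in the kernel, and the "hardware" is just a few static
variables. It assumes VMX on/off has already been moved into refcounted
common code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not real kernel symbols. */
static int  vmx_users;         /* refcount held by common VMX code   */
static bool vmx_on;            /* models being in VMX operation      */
static int  tdx_users;         /* refcount used by option 2 only     */
static bool tdx_module_alive;  /* models an initialized TDX module   */

/* Common code: the first user does "vmxon", the last does "vmxoff". */
static void vmx_get(void)
{
	if (vmx_users++ == 0)
		vmx_on = true;
}

static void vmx_put(void)
{
	assert(vmx_users > 0);
	if (--vmx_users == 0)
		vmx_on = false;
}

/* A SEAMCALL fails unless VMX is on and the module is initialized. */
static int seamcall(void)
{
	return (vmx_on && tdx_module_alive) ? 0 : -1;
}

/* Module init itself needs VMX only for the duration of the call. */
static void tdx_module_init(void)
{
	vmx_get();
	tdx_module_alive = true;
	vmx_put();
}

/*
 * Option 1: turn on VMX, do the SEAMCALL, turn off VMX immediately.
 * The TDX module stays alive across calls.
 */
static int seamcall_bracketed(void)
{
	int ret;

	vmx_get();
	ret = seamcall();
	vmx_put();
	return ret;
}

/*
 * Option 2: strict lifecycle. Enable VMX, enable TDX, and later
 * disable TDX (last user shuts the module down) before disabling VMX.
 */
static void tdx_user_start(void)
{
	vmx_get();
	if (tdx_users++ == 0)
		tdx_module_alive = true;
}

static void tdx_user_stop(void)
{
	assert(tdx_users > 0);
	if (--tdx_users == 0)
		tdx_module_alive = false;  /* PAMT would be converted here */
	vmx_put();
}
```

In this model, option 1 leaves the module (and so the PAMT) alive while
VMX is off, which is the window being discussed above; option 2 closes
that window at the cost of tearing the module down with the last user.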

But I don't have a strong opinion here either.