Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module
From: Sean Christopherson
Date: Tue Apr 23 2024 - 11:15:54 EST
On Tue, Apr 23, 2024, Kai Huang wrote:
> On Tue, 2024-04-23 at 13:34 +1200, Kai Huang wrote:
> > >
> > > > > And the intent isn't to catch every possible problem. As with many sanity checks,
> > > > > the intent is to detect the most likely failure mode to make triaging and debugging
> > > > > issues a bit easier.
> > > >
> > > > The SEAMCALL will literally return a unique error code to indicate CPU
> > > > isn't in post-VMXON, or tdx_cpu_enable() hasn't been done. I think the
> > > > error code is already clear to pinpoint the problem (due to these pre-
> > > > SEAMCALL-condition not being met).
> > >
> > > No, SEAMCALL #UDs if the CPU isn't post-VMXON. I.e. the CPU doesn't make it to
> > > the TDX Module to provide a unique error code, all KVM will see is a #UD.
> >
> > #UD is handled by the SEAMCALL assembly code. Please see TDX_MODULE_CALL
> > assembly macro:
Right, but that doesn't say why the #UD occurred. The macro dresses it up in
TDX_SW_ERROR so that KVM only needs a single parser, but at the end of the day
KVM is still only going to see that SEAMCALL hit a #UD.
> > > There is no reason to rely on the caller to take cpu_hotplug_lock, and definitely
> > > no reason to rely on the caller to invoke tdx_cpu_enable() separately from invoking
> > > tdx_enable(). I suspect they got that way because of KVM's unnecessarily complex
> > > code, e.g. if KVM is already doing on_each_cpu() to do VMXON, then it's easy enough
> > > to also do TDH_SYS_LP_INIT, so why do two IPIs?
> >
> > The main reason is we relaxed the TDH.SYS.LP.INIT to be called _after_ TDX
> > module initialization.
> >
> > Previously, the TDH.SYS.LP.INIT must be done on *ALL* CPUs that the
> > platform has (i.e., cpu_present_mask) right after TDH.SYS.INIT and before
> > any other SEAMCALLs. This didn't quite work with (kernel software) CPU
> > hotplug, and it had problem dealing with things like SMT disable
> > mitigation:
> >
> > https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#mf42fa2d68d6b98edcc2aae11dba3c2487caf3b8f
> >
> > So the x86 maintainers requested to change this. The original proposal
> > was to eliminate the entire TDH.SYS.INIT and TDH.SYS.LP.INIT:
> >
> > https://lore.kernel.org/lkml/529a22d05e21b9218dc3f29c17ac5a176334cac1.camel@xxxxxxxxx/T/#m78c0c48078f231e92ea1b87a69bac38564d46469
> >
> > But somehow it wasn't feasible, and the result was we relaxed to allow
> > TDH.SYS.LP.INIT to be called after module initialization.
> >
> > So we need a separate tdx_cpu_enable() for that.
No, you don't, at least not given the TDX patches I'm looking at. Allowing
TDH.SYS.LP.INIT after module initialization makes sense because otherwise the
kernel would need to online all possible CPUs before initializing TDX. But that
doesn't mean that the kernel needs to, or should, punt TDH.SYS.LP.INIT to KVM.
AFAICT, KVM is NOT doing TDH.SYS.LP.INIT when a CPU is onlined, only when KVM
is loaded, which means that tdx_enable() can process all online CPUs just as
easily as KVM.
Presumably that approach relies on something blocking onlining CPUs when TDX is
active. And if that's not the case, the proposed patches are buggy.
> Btw, the ideal (or probably the final) plan is to handle tdx_cpu_enable()
> in TDX's own CPU hotplug callback in the core-kernel and hide it from all
> other in-kernel TDX users.
>
> Specifically:
>
> 1) that callback, e.g., tdx_online_cpu() will be placed _before_ any in-
> kernel TDX users like KVM's callback.
> 2) In tdx_online_cpu(), we do VMXON + tdx_cpu_enable() + VMXOFF, and
> return error in case of any error to prevent that cpu from going online.
>
> That makes sure that, if TDX is supported by the platform, we basically
> guarantees all online CPUs are ready to issue SEAMCALL (of course, the in-
> kernel TDX user still needs to do VMXON for it, but that's TDX user's
> responsibility).
>
> But that obviously needs to move VMXON to the core-kernel.
It doesn't strictly have to be core kernel per se, just in code that sits below
KVM, e.g. in a seperate module called VAC[*] ;-)
[*] https://lore.kernel.org/all/ZW6FRBnOwYV-UCkY@xxxxxxxxxx
> Currently, export tdx_cpu_enable() as a separate API and require KVM to
> call it explicitly is a temporary solution.
>
> That being said, we could do tdx_cpu_enable() inside tdx_enable(), but I
> don't see it's a better idea.
It simplifies the API surface for enabling TDX and eliminates an export.