Re: [PATCH 3/3] KVM: VMX: Initialize TDX during KVM module load
From: Huang, Kai
Date: Wed Nov 06 2024 - 05:52:55 EST
On Thu, 2024-10-31 at 13:22 -0700, Sean Christopherson wrote:
> On Thu, Oct 31, 2024, Kai Huang wrote:
> > On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> > > > +void __init tdx_bringup(void)
> > > > +{
> > > > + enable_tdx = enable_tdx && !__tdx_bringup();
> > >
> > > Ah. I don't love this approach because it mixes "failure" due to an unsupported
> > > configuration, with failure due to unexpected issues. E.g. if enabling virtualization
> > > fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.
> >
> > Thanks for the comments.
> >
> > I see your point. However for "enabling virtualization failure" kvm_init() will
> > also try to do (default behaviour), so if it fails it will result in module
> > loading failure eventually. So while I guess it would be slightly better to
> > make module loading fail if "enabling virtualization fails" in TDX, it is a nit
> > issue to me.
> >
> > I think "enabling virtualization failure" is the only "unexpected issue" that
> > should result in module loading failure. For any other TDX-specific
> > initialization failure (e.g., any memory allocation in future patches) it's
> > better to only disable TDX.
>
> I disagree. The platform owner wants TDX to be enabled, KVM shouldn't silently
> disable TDX because of a transient, unrelated failure.
>
> If TDX _can't_ be supported, e.g. because EPT or MMIO SPTE caching was explicitly
> disable, then that's different. And that's the general pattern throughout KVM.
> If a requested feature isn't supported, then KVM continues on updates the module
> param accordingly. But if something outright fails during setup, KVM aborts the
> entire sequence.
>
> > So I can change to "make loading KVM-the-module fail if enabling virtualization
> > fails in TDX", but I want to confirm this is what you want?
>
> I would prefer the logic to be: reject loading kvm-intel.ko if an operation that
> would normally succeed, fails.
I looked at the final tdx.c that in our development branch [*], and below is the
list of the things that need to be done to init TDX (the code in
__tdx_bringup()), and my thinking of whether to fail loading the module or just
disable TDX:
1) Early dependency check fails. Those include: tdp_mmu_enabled,
enable_mmio_caching, X86_FEATURE_MOVDIR64B check and check the presence of
TSX_CTL uret MSR.
For those we can disable TDX only but continue to load module.
2) Enable virtualization fails.
For this we fail to load module (as you suggested).
3) Fail to register TDX cpuhp to do tdx_cpu_enable() and handle cpu hotplug.
For this we only disable TDX but continue to load module. The reason is I think
this is similar to enable a specific KVM feature but the hardware doesn't
support it. We can go further to check the return value of tdx_cpu_enable() to
distinguish cases like "module not loaded" and "unexpected error", but I really
don't want to go that far.
4) tdx_enable() fails.
Ditto to 3).
5) tdx_get_sysinfo() fails.
This is a kernel bug since tdx_get_sysinfo() should always return valid TDX
sysinfo structure pointer after tdx_enable() is done successfully. Currently we
just WARN() if the returned pointer is NULL and disable TDX only. I think it's
also fine.
6) TDX global metadata check fails, e.g., MAX_VCPUS etc.
Ditto to 3). For this we disable TDX only.
To summarize, if you agree with above, then only "enable virtualization failure"
results in module loading failure.
And I am not sure whether it's worth to distinguish those cases. For the
concern of "KVM shouldn't silently disable TDX because of a transient, unrelated
failure", we can explicitly print error message to tell user why TDX is
disabled.
Any comments?
[*]
https://github.com/intel/tdx/blob/tdx_kvm_dev-2024-10-25.1-host-metadata-v6-rebase/arch/x86/kvm/vmx/tdx.c