Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module
From: Huang, Kai
Date: Mon Apr 22 2024 - 18:47:43 EST
On Mon, 2024-04-22 at 09:54 -0700, Sean Christopherson wrote:
> On Mon, Apr 22, 2024, Kai Huang wrote:
> > On Fri, 2024-04-19 at 10:23 -0700, Sean Christopherson wrote:
> > > On Fri, Apr 19, 2024, Kai Huang wrote:
> > > And tdx_enable() should also do its best to verify that the caller is post-VMXON:
> > >
> > > if (WARN_ON_ONCE(!(__read_cr4() & X86_CR4_VMXE)))
> > > return -EINVAL;
> >
> > This won't be helpful, or at least isn't sufficient.
> >
> > tdx_enable() can SEAMCALLs on all online CPUs, so checking "the caller is
> > post-VMXON" isn't enough. It needs "checking all online CPUs are in post-
> > VMXON with tdx_cpu_enable() having been done".
>
> I'm suggesting adding it in the responding code that does that actual SEAMCALL.
The thing is to check you will need to do additional things like making
sure no scheduling would happen during "check + make SEAMCALL". Doesn't
seem worth to do for me.
The intent of tdx_enable() was the caller should make sure no new CPU
should come online (well this can be relaxed if we move
cpuhp_setup_state() to hardware_enable_all()), and all existing online
CPUs are in post-VMXON with tdx_cpu_enable() been done.
I think, if we ever need any check, the latter seems to be more
reasonable.
But if we allow new CPU to become online during tdx_enable() (with your
enhancement to the hardware enabling), then I don't know how to make such
check at the beginning of tdx_enable(), because do
on_each_cpu(check_seamcall_precondition, NULL, 1);
cannot catch any new CPU during tdx_enable().
>
> And the intent isn't to catch every possible problem. As with many sanity checks,
> the intent is to detect the most likely failure mode to make triaging and debugging
> issues a bit easier.
The SEAMCALL will literally return a unique error code to indicate CPU
isn't in post-VMXON, or tdx_cpu_enable() hasn't been done. I think the
error code is already clear to pinpoint the problem (due to these pre-
SEAMCALL-condition not being met).
>
> > I didn't add such check because it's not mandatory, i.e., the later
> > SEAMCALL will catch such violation.
>
> Yeah, a sanity check definitely isn't manadatory, but I do think it would be
> useful and worthwhile. The code in question is relatively unique (enables VMX
> at module load) and a rare operation, i.e. the cost of sanity checking CR4.VMXE
> is meaningless. If we do end up with a bug where a CPU fails to do VMXON, this
> sanity check would give a decent chance of a precise report, whereas #UD on a
> SEAMCALL will be less clearcut.
If VMXON fails for any CPU then cpuhp_setup_state() will fail, preventing
KVM to be loaded.
And if it fails during kvm_online_cpu(), the new CPU will fail to online.
>
> > Btw, I noticed there's another problem, that is currently tdx_cpu_enable()
> > actually requires IRQ being disabled. Again it was implemented based on
> > it would be invoked via both on_each_cpu() and kvm_online_cpu().
> >
> > It also also implemented with consideration that it could be called by
> > multiple in-kernel TDX users in parallel via both SMP call and in normal
> > context, so it was implemented to simply request the caller to make sure
> > it is called with IRQ disabled so it can be IRQ safe (it uses a percpu
> > variable to track whether TDH.SYS.LP.INIT has been done for local cpu
> > similar to the hardware_enabled percpu variable).
>
> Is this is an actual problem, or is it just something that would need to be
> updated in the TDX code to handle the change in direction?
For now this isn't, because KVM is the solo user, and in KVM
hardware_enable_all() and kvm_online_cpu() uses kvm_lock mutex to make
hardware_enable_nolock() IPI safe.
I am not sure how TDX/SEAMCALL will be used in TDX Connect.
However I needed to consider KVM as a user, so I decided to just make it
must be called with IRQ disabled so I could know it is IRQ safe.
Back to the current tdx_enable() and tdx_cpu_enable(), my personal
preference is, of course, to keep the existing way, that is:
During module load:
cpus_read_lock();
tdx_enable();
cpus_read_unlock();
and in kvm_online_cpu():
local_irq_save();
tdx_cpu_enable();
local_irq_restore();
But given KVM is the solo user now, I am also fine to change if you
believe this is not acceptable.