Re: [PATCH v19 103/130] KVM: TDX: Handle EXIT_REASON_OTHER_SMI with MSMI

From: Isaku Yamahata
Date: Wed Apr 03 2024 - 18:23:46 EST


On Mon, Apr 01, 2024 at 05:14:02PM +0800,
Chao Gao <chao.gao@xxxxxxxxx> wrote:

> On Mon, Feb 26, 2024 at 12:26:45AM -0800, isaku.yamahata@xxxxxxxxx wrote:
> >From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> >
> >When BIOS eMCA MCE-SMI morphing is enabled, the #MC is morphed to MSMI
> >(Machine Check System Management Interrupt). Then the SMI causes TD exit
> >with the read reason of EXIT_REASON_OTHER_SMI with MSMI bit set in the exit
> >qualification to KVM instead of EXIT_REASON_EXCEPTION_NMI with MC
> >exception.
> >
> >Handle EXIT_REASON_OTHER_SMI with MSMI bit set in the exit qualification as
> >MCE(Machine Check Exception) happened during TD guest running.
> >
> >Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> >---
> > arch/x86/kvm/vmx/tdx.c | 40 ++++++++++++++++++++++++++++++++++---
> > arch/x86/kvm/vmx/tdx_arch.h | 2 ++
> > 2 files changed, 39 insertions(+), 3 deletions(-)
> >
> >diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> >index bdd74682b474..117c2315f087 100644
> >--- a/arch/x86/kvm/vmx/tdx.c
> >+++ b/arch/x86/kvm/vmx/tdx.c
> >@@ -916,6 +916,30 @@ void tdx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
> > tdexit_intr_info(vcpu));
> > else if (exit_reason == EXIT_REASON_EXCEPTION_NMI)
> > vmx_handle_exception_irqoff(vcpu, tdexit_intr_info(vcpu));
> >+ else if (unlikely(tdx->exit_reason.non_recoverable ||
> >+ tdx->exit_reason.error)) {
>
> why not just:
> else if (tdx->exit_reason.basic == EXIT_REASON_OTHER_SMI) {
>
>
> i.e., does EXIT_REASON_OTHER_SMI imply exit_reason.non_recoverable or
> exit_reason.error?

Yes, this should be refined.


> >+ /*
> >+ * The only reason it gets EXIT_REASON_OTHER_SMI is there is an
> >+ * #MSMI(Machine Check System Management Interrupt) with
> >+ * exit_qualification bit 0 set in TD guest.
> >+ * The #MSMI is delivered right after SEAMCALL returns,
> >+ * and an #MC is delivered to host kernel after SMI handler
> >+ * returns.
> >+ *
> >+ * The #MC right after SEAMCALL is fixed up and skipped in #MC
>
> Looks fixing up and skipping #MC on the first instruction after TD-exit is
> missing in v19?

Right. We removed it as MSMI will provides if #MC happened in SEAM or not.


>
> >+ * handler because it's an #MC happens in TD guest we cannot
> >+ * handle it with host's context.
> >+ *
> >+ * Call KVM's machine check handler explicitly here.
> >+ */
> >+ if (tdx->exit_reason.basic == EXIT_REASON_OTHER_SMI) {
> >+ unsigned long exit_qual;
> >+
> >+ exit_qual = tdexit_exit_qual(vcpu);
> >+ if (exit_qual & TD_EXIT_OTHER_SMI_IS_MSMI)
>
> >+ kvm_machine_check();
> >+ }
> >+ }
> > }
> >
> > static int tdx_handle_exception(struct kvm_vcpu *vcpu)
> >@@ -1381,6 +1405,11 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
> > exit_reason.full, exit_reason.basic,
> > to_kvm_tdx(vcpu->kvm)->hkid,
> > set_hkid_to_hpa(0, to_kvm_tdx(vcpu->kvm)->hkid));
> >+
> >+ /*
> >+ * tdx_handle_exit_irqoff() handled EXIT_REASON_OTHER_SMI. It
> >+ * must be handled before enabling preemption because it's #MC.
> >+ */
>
> Then EXIT_REASON_OTHER_SMI is handled, why still go to unhandled_exit?

Let me update the comment.
exit_irqoff() doesn't return value to tell vcpu_run loop to continue or exit to
user-space. As the guest is dead, we'd like to exit to the user-space.


> > goto unhandled_exit;
> > }
> >
> >@@ -1419,9 +1448,14 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
> > return tdx_handle_ept_misconfig(vcpu);
> > case EXIT_REASON_OTHER_SMI:
> > /*
> >- * If reach here, it's not a Machine Check System Management
> >- * Interrupt(MSMI). #SMI is delivered and handled right after
> >- * SEAMRET, nothing needs to be done in KVM.
> >+ * Unlike VMX, all the SMI in SEAM non-root mode (i.e. when
> >+ * TD guest vcpu is running) will cause TD exit to TDX module,
> >+ * then SEAMRET to KVM. Once it exits to KVM, SMI is delivered
> >+ * and handled right away.
> >+ *
> >+ * - If it's an Machine Check System Management Interrupt
> >+ * (MSMI), it's handled above due to non_recoverable bit set.
> >+ * - If it's not an MSMI, don't need to do anything here.
>
> This corrects a comment added in patch 100. Maybe we can just merge patch 100 into
> this one?

Yes. Will do.

> > */
> > return 1;
> > default:
> >diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
> >index efc3c61c14ab..87ef22e9cd49 100644
> >--- a/arch/x86/kvm/vmx/tdx_arch.h
> >+++ b/arch/x86/kvm/vmx/tdx_arch.h
> >@@ -42,6 +42,8 @@
> > #define TDH_VP_WR 43
> > #define TDH_SYS_LP_SHUTDOWN 44
> >
> >+#define TD_EXIT_OTHER_SMI_IS_MSMI BIT(1)
> >+
> > /* TDX control structure (TDR/TDCS/TDVPS) field access codes */
> > #define TDX_NON_ARCH BIT_ULL(63)
> > #define TDX_CLASS_SHIFT 56
> >--
> >2.25.1
> >
> >
>

--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>