Re: [PATCH] x86/tdx, KVM: fix HKID leak when kexec is initiated with active TDs

From: Sean Christopherson

Date: Wed Apr 22 2026 - 09:20:41 EST

On Wed, Apr 22, 2026, Robert Nowicki wrote:
> When kexec is initiated while TDs are running, vCPU threads can be
> mid-TDH.VP.ENTER on other CPUs when tdx_shutdown() fires. The TDX
> module rejects TDH.MNG.VPFLUSHDONE for a VP in RUNNING state, leaving
> the HKID in a leaked state:
>
> kvm_intel: tdh_mng_vpflushdone() failed. HKID 33 is leaked.
>
> Fix this by introducing a quiescing flag set at the start of
> tdx_shutdown(). KVM's tdx_vcpu_run() checks the flag and returns
> EXIT_FASTPATH_NONE before attempting TDH.VP.ENTER. After setting the
> flag, tdx_shutdown() calls on_each_cpu(tdx_seam_sync) with wait=1 to
> ensure any CPU currently inside TDH.VP.ENTER has exited SEAM before
> tdx_sys_disable() is called.
>
> Fixes: 58171ae22e11 ("x86/tdx: Disable the TDX module during kexec and kdump")

Please don't post seemingly standalone patches for code that hasn't yet been
merged; it's quite confusing.

> u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
> u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
> @@ -206,6 +207,7 @@ static inline u32 tdx_get_nr_guest_keyids(void) { return 0; }
> static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
> static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; }
> static inline void tdx_sys_disable(void) { }
> +static inline bool tdx_kexec_quiescing(void) { return false; }
> #endif /* CONFIG_INTEL_TDX_HOST */
>
> #endif /* !__ASSEMBLER__ */
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 50a5cfdbd33e..2d658db7700d 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1053,6 +1053,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> struct vcpu_tdx *tdx = to_tdx(vcpu);
> struct vcpu_vt *vt = to_vt(vcpu);
>
> + if (unlikely(tdx_kexec_quiescing()))

Requiring KVM to check a global on every entry is pretty ugly, especially since
this is for a very rare scenario (in terms of number of entries). And forcing
KVM to do a CALL+RET to check an almost-never-set flag is especially ugly.

Why not handle this entirely in tdx_shutdown_cpu()? E.g. have the last CPU through
disable TDX, and hold all the CPUs hostage until that's done. It's not the prettiest
thing in the world, but it's entirely self-contained.

static void tdx_shutdown_cpu(void *__nr_cpus_remaining)
{
	atomic_t *nr_cpus_remaining = __nr_cpus_remaining;

	/* The last CPU to arrive disables TDX, then releases everyone. */
	if (!atomic_add_unless(nr_cpus_remaining, -1, 1)) {
		tdx_sys_disable();
		atomic_set(nr_cpus_remaining, 0);
	}

	x86_virt_put_ref(X86_FEATURE_VMX);

	/* Hold this CPU hostage until the count has been zeroed. */
	while (atomic_read(nr_cpus_remaining))
		cpu_relax();
}

static void tdx_shutdown(void *ign)
{
	atomic_t nr_cpus_remaining = ATOMIC_INIT(num_online_cpus());

	on_each_cpu(tdx_shutdown_cpu, &nr_cpus_remaining, 1);
}
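
For anyone who wants to poke at the "last one through does the work, everyone
else waits" pattern outside the kernel, below is a rough userspace sketch using
C11 atomics and pthreads. It is an illustration only, not kernel code: the
names fake_sys_disable(), dec_unless_one(), shutdown_thread() and
run_shutdown() are hypothetical, threads stand in for CPUs, and sched_yield()
stands in for cpu_relax(). The wait loop spins while the count is non-zero,
i.e. until the last thread has zeroed it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_THREADS 4	/* hypothetical stand-in for num_online_cpus() */

static atomic_int nr_remaining = NR_THREADS;
static atomic_int nr_disable_calls;
static atomic_int disabled;

/* Hypothetical stand-in for tdx_sys_disable(); it only records the call. */
static void fake_sys_disable(void)
{
	atomic_fetch_add(&nr_disable_calls, 1);
	atomic_store(&disabled, 1);
}

/*
 * Userspace approximation of atomic_add_unless(v, -1, 1): decrement *v
 * unless it is 1.  Returns 1 if the decrement happened, 0 otherwise.
 */
static int dec_unless_one(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 1) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return 1;
	}
	return 0;
}

static void *shutdown_thread(void *unused)
{
	(void)unused;

	/* Only the last arrival sees 1, does the work and zeroes the count. */
	if (!dec_unless_one(&nr_remaining)) {
		fake_sys_disable();
		atomic_store(&nr_remaining, 0);
	}

	/* Everyone is held here until the count has been zeroed. */
	while (atomic_load(&nr_remaining))
		sched_yield();

	/* Nobody may be released before the disable has completed. */
	assert(atomic_load(&disabled));
	return NULL;
}

/* Runs the pattern once; returns how often fake_sys_disable() was called. */
static int run_shutdown(void)
{
	pthread_t threads[NR_THREADS];
	int i;

	for (i = 0; i < NR_THREADS; i++) {
		if (pthread_create(&threads[i], NULL, shutdown_thread, NULL))
			abort();
	}
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&nr_disable_calls);
}
```

Calling run_shutdown() once from main() (built with -pthread) should report a
single call to fake_sys_disable(), regardless of the order in which the threads
arrive. The sketch says nothing about whether spinning inside an IPI handler is
acceptable in the real shutdown path; that part still needs review.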