Re: [PATCH v3 07/26] x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs

From: Chao Gao

Date: Tue Feb 03 2026 - 07:23:17 EST


>>> I'd be shocked if this is the one and only place in the whole kernel
>>> that can unceremoniously zap VMX state.
>>>
>>> I'd *bet* that you don't really need to do the vmptrld and that KVM can
>>> figure it out because it can vmptrld on demand anyway. Something along
>>> the lines of:
>>>
>>>	local_irq_disable();
>>>	list_for_each(handwaving...)
>>>		vmcs_clear();
>>>	ret = seamldr_prerr(fn, args);
>>>	local_irq_enable();
>>>
>>> Basically, zap this CPU's vmcs state and then make KVM reload it at some
>>> later time.
>>
>> The idea is feasible. But just calling vmcs_clear() won't work. We need to
>> reset all the tracking state associated with each VMCS. We should call
>> vmclear_local_loaded_vmcss() instead, similar to what's done before VMXOFF.
>>
>>>
>>> I'm sure Sean and Paolo will tell me if I'm crazy.
>>
>> To me, this approach needs more work since we need to either move
>> vmclear_local_loaded_vmcss() to the kernel or allow KVM to register a callback.
>>
>> I don't think it's as straightforward as just doing the save/restore.
>
>Could you please just do me a favor and spend 20 minutes to see what
>this looks like in practice and if the KVM folks hate it?

Sure. KVM caches the current VMCS in a per-CPU variable and only executes
vmptrld when the VMCS being loaded differs from the cached one. See
vmx_vcpu_load_vmcs() in arch/x86/kvm/vmx/vmx.c:

	prev = per_cpu(current_vmcs, cpu);
	if (prev != vmx->loaded_vmcs->vmcs) {
		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
		vmcs_load(vmx->loaded_vmcs->vmcs);
	}

Resetting current_vmcs to NULL after each P-SEAMLDR call forces KVM to execute
vmptrld on the next VMCS load. So, we can implement seamldr_call() as:

static int seamldr_call(u64 fn, struct tdx_module_args *args)
{
	int ret;

	WARN_ON_ONCE(!is_seamldr_call(fn));

	/*
	 * Serialize P-SEAMLDR calls since only a single CPU is allowed to
	 * interact with P-SEAMLDR at a time.
	 *
	 * P-SEAMLDR calls invalidate the current VMCS. Exclude KVM access to
	 * the VMCS by disabling interrupts. This is not safe against VMCS use
	 * in NMIs, but there are none of those today.
	 *
	 * Set the per-CPU current_vmcs cache to NULL to force KVM to reload
	 * the VMCS.
	 */
	guard(raw_spinlock_irqsave)(&seamldr_lock);
	ret = seamcall_prerr(fn, args);
	this_cpu_write(current_vmcs, NULL);

	return ret;
}

This requires moving the per-CPU current_vmcs from KVM to the kernel, which
should be trivial with Sean's VMXON series.

I tested this. Without the this_cpu_write(), vmread/vmwrite errors occur after
TDX Module updates; with it, no errors occur.