Re: [PATCH v5 1/4] KVM: TDX: Explicitly set user-return MSRs that *may* be clobbered by the TDX-Module

From: Xiaoyao Li

Date: Wed Nov 05 2025 - 04:17:03 EST


On 11/5/2025 9:52 AM, Yan Zhao wrote:
On Tue, Nov 04, 2025 at 09:55:54AM -0800, Sean Christopherson wrote:
On Tue, Nov 04, 2025, Yan Zhao wrote:
On Tue, Nov 04, 2025 at 04:40:44PM +0800, Xiaoyao Li wrote:
On 11/4/2025 3:06 PM, Yan Zhao wrote:
Another nit:
Remove the tdx_user_return_msr_update_cache() in the comment of __tdx_bringup().

Or could we just invoke tdx_user_return_msr_update_cache() in
tdx_prepare_switch_to_guest()?

No. It lacks the WRMSR operation to update the hardware value, which is the
key point of this patch.
As noted in [1], I don't think the WRMSR operation to update the hardware value
is necessary. The value will be updated to the guest value soon anyway if
tdh_vp_enter() succeeds; otherwise the hardware value remains either the host
value or the default value.

As explained in the original thread:

: > If the MSR's do not get clobbered, does it matter whether or not they get
: > restored.
:
: It matters because KVM needs to know the actual value in hardware. If KVM thinks
: an MSR is 'X', but it's actually 'Y', then KVM could fail to write the correct
: value into hardware when returning to userspace and/or when running a different
: vCPU.

I.e. updating the cache effectively corrupts state if the TDX-Module doesn't
clobber MSRs as expected, i.e. if the current value is preserved in hardware.
I'm not against this patch, but I think the above explanation is not that
convincing (or is somewhat confusing).


By "if the TDX-Module doesn't clobber MSRs as expected",
- if it occurs due to tdh_vp_enter() failure, I think it's fine.
Though KVM thinks the MSR is 'X', the actual value in hardware should be
either 'Y' (the host value) or 'X' (the expected clobbered value).
It's benign to preserve the value 'Y', no?

For example, after tdh_vp_enter() failure, the state becomes

.curr == 'X'
hardware == 'Y'

and the TD vcpu thread is preempted, with the pcpu then scheduled to run another VM's vcpu: a normal VMX vcpu that happens to use the value 'X' for this MSR. So in

vmx_prepare_switch_to_guest()
-> kvm_set_user_return_msr()

it will skip the WRMSR because written_value == .curr == 'X', even though the hardware value is 'Y'. KVM then fails to load the expected value 'X' for the VMX vcpu.
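
For illustration only, the sequence above can be replayed with a small userspace model. This is a sketch, not the kernel code: struct msr_slot, update_cache_only(), set_user_return_msr() and replay_stale_cache() are hypothetical names, the "WRMSR" is a plain store, and the skip-if-unchanged check is assumed to behave as described above for kvm_set_user_return_msr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one user-return MSR slot on one pCPU. */
struct msr_slot {
	uint64_t hw;	/* value actually held by the (simulated) MSR */
	uint64_t curr;	/* value KVM believes is in hardware */
};

/* Cache-only update: records the expected value without touching hardware. */
static void update_cache_only(struct msr_slot *s, uint64_t expected)
{
	s->curr = expected;
}

/*
 * Models the skip-if-unchanged check (an assumption about the real helper).
 * Returns true if a simulated WRMSR was issued.
 */
static bool set_user_return_msr(struct msr_slot *s, uint64_t value)
{
	if (value == s->curr)
		return false;	/* cache claims hardware already holds 'value' */
	s->hw = value;		/* stands in for WRMSR */
	s->curr = value;
	return true;
}

/* Replays the scenario; returns the value the VMX vCPU actually runs with. */
uint64_t replay_stale_cache(void)
{
	struct msr_slot s = { .hw = 'Y', .curr = 'Y' };
	bool wrote;

	/* TDX path assumes the MSR will be clobbered to 'X'. */
	update_cache_only(&s, 'X');

	/* tdh_vp_enter() fails: no clobber, hardware still holds 'Y'. */

	/* The VMX vCPU on the same pCPU asks for 'X'. */
	wrote = set_user_return_msr(&s, 'X');
	assert(!wrote);		/* the write is skipped: cache and request match */

	return s.hw;
}
```

In this model replay_stale_cache() returns 'Y' rather than the requested 'X', which is the mismatch described above. Replacing update_cache_only() with a call that also stores to hw would keep the cache and the simulated hardware in sync.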

- if it occurs due to TDX module bugs, e.g., if after a successful
tdh_vp_enter() and VM exit, the TDX module clobbers the MSR to 'Z', while
the host value for the MSR is 'Y' and KVM thinks the actual value is 'X'.
Then the hardware state will be incorrect after returning to userspace if
'X' == 'Y'. But this patch can't guard against this condition either, right?
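
The second case can be sketched the same way. This is a hypothetical userspace model, not kernel code: struct msr_slot, on_user_return() and replay_unexpected_clobber() are made-up names, and the restore rule (write the host value back only when the cached value differs from it) is an assumption about how the return-to-userspace path decides whether a WRMSR is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of one user-return MSR slot on one pCPU. */
struct msr_slot {
	uint64_t hw;	/* value actually held by the (simulated) MSR */
	uint64_t host;	/* value the host expects after returning to userspace */
	uint64_t curr;	/* value KVM believes is in hardware */
};

/* Assumed restore rule: touch hardware only if the cache differs from host. */
static void on_user_return(struct msr_slot *s)
{
	if (s->curr != s->host) {
		s->hw = s->host;	/* stands in for WRMSR */
		s->curr = s->host;
	}
}

/*
 * KVM believes the MSR holds x, the host value is y, and a buggy module
 * has left z in hardware. Returns the hardware value seen by userspace.
 */
uint64_t replay_unexpected_clobber(uint64_t x, uint64_t y, uint64_t z)
{
	struct msr_slot s = { .hw = z, .host = y, .curr = x };

	on_user_return(&s);
	assert(s.curr == s.host);	/* the cache always ends up at the host value */

	return s.hw;
}
```

With x == y the model leaves z in hardware, matching the concern above; with x != y the restore fires and the host value is written back, so the unexpected clobber is hidden.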


But I think invoking tdx_user_return_msr_update_cache() in
tdx_prepare_switch_to_guest() is better than in
tdx_prepare_switch_to_host().

[1] https://lore.kernel.org/kvm/aQhJol0CvT6bNCJQ@xxxxxxxxxxxxxxxxxxxxxxxxx/