Re: [PATCH 7/7] KVM: TDX: Add TSX_CTRL msr into uret_msrs list

From: Xiaoyao Li
Date: Thu Dec 05 2024 - 22:37:50 EST


On 12/6/2024 1:31 AM, Adrian Hunter wrote:
On 4/12/24 17:33, Xiaoyao Li wrote:
On 12/4/2024 7:55 PM, Adrian Hunter wrote:
On 4/12/24 13:13, Chao Gao wrote:
On Wed, Dec 04, 2024 at 08:57:23AM +0200, Adrian Hunter wrote:
On 4/12/24 08:37, Chao Gao wrote:
On Wed, Dec 04, 2024 at 08:18:32AM +0200, Adrian Hunter wrote:
On 4/12/24 03:25, Chao Gao wrote:
+#define TDX_FEATURE_TSX (__feature_bit(X86_FEATURE_HLE) | __feature_bit(X86_FEATURE_RTM))
+
+static bool has_tsx(const struct kvm_cpuid_entry2 *entry)
+{
+    return entry->function == 7 && entry->index == 0 &&
+           (entry->ebx & TDX_FEATURE_TSX);
+}
+
+static void clear_tsx(struct kvm_cpuid_entry2 *entry)
+{
+    entry->ebx &= ~TDX_FEATURE_TSX;
+}
+
+static bool has_waitpkg(const struct kvm_cpuid_entry2 *entry)
+{
+    return entry->function == 7 && entry->index == 0 &&
+           (entry->ecx & __feature_bit(X86_FEATURE_WAITPKG));
+}
+
+static void clear_waitpkg(struct kvm_cpuid_entry2 *entry)
+{
+    entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG);
+}
+
+static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
+{
+    if (has_tsx(entry))
+        clear_tsx(entry);
+
+    if (has_waitpkg(entry))
+        clear_waitpkg(entry);
+}
+
+static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry)
+{
+    return has_tsx(entry) || has_waitpkg(entry);
+}

No need to check TSX/WAITPKG explicitly because setup_tdparams_cpuids() already
ensures that unconfigurable bits are not set by userspace.

Aren't they configurable?

They are cleared from the configurable bitmap by tdx_clear_unsupported_cpuid(),
so they are not configurable from a userspace perspective. Did I miss anything?
KVM should check user inputs against its adjusted configurable bitmap, right?
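
The check being described can be sketched in plain C as below. This is a simplified illustration, not the patch's code: the sketch_* names are hypothetical, and a single EBX value stands in for a full kvm_cpuid_entry2. The bit positions are those of CPUID.(EAX=7,ECX=0):EBX, where HLE is bit 4 and RTM is bit 11.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX bit positions for the TSX features. */
#define SKETCH_HLE_BIT  (1u << 4)
#define SKETCH_RTM_BIT  (1u << 11)
#define SKETCH_TSX_MASK (SKETCH_HLE_BIT | SKETCH_RTM_BIT)

/* Hypothetical helper: drop TSX from the mask the TDX module reports. */
static uint32_t sketch_adjust_configurable_ebx(uint32_t module_configurable)
{
    return module_configurable & ~SKETCH_TSX_MASK;
}

/* Hypothetical helper: userspace may only set bits that are configurable. */
static bool sketch_user_ebx_is_valid(uint32_t user_ebx, uint32_t configurable)
{
    return (user_ebx & ~configurable) == 0;
}
```

Under this model, a request that sets RTM is rejected by KVM itself, because the bit was removed from the adjusted mask before the comparison.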

Maybe I misunderstand but we rely on the TDX module to reject
invalid configuration.  We don't check exactly what is configurable
for the TDX Module.

Ok, this is what I missed. I thought KVM validated user input and masked
out all unsupported features. Sorry about that.


TSX and WAITPKG are not invalid for the TDX Module, but KVM
must either support them by restoring their MSRs, or disallow
them.  This patch disallows them for now.

Yes, I agree. But what if a new feature (supported by a future TDX module) also
needs KVM to restore some MSRs? Current KVM will allow it to be exposed (since
only TSX/WAITPKG are checked), and then some MSRs may get corrupted. I don't think
this is a good design. Current KVM should work with future TDX modules.

With respect to CPUID, I gather this kind of thing has been
discussed, such as here:

    https://lore.kernel.org/all/ZhVsHVqaff7AKagu@xxxxxxxxxx/

and Rick and Xiaoyao are working on something.

In general, I would expect a new TDX Module would advertise support for
new features, but KVM would have to opt in to use them.


There were discussions [1] on whether KVM should gatekeep the configurable/supported CPUIDs for TDX. I stand with Sean that KVM needs to do so.

Regarding KVM opting in to new features: KVM gatekeeping the CPUID bits that userspace can set is exactly opt-in behavior. That is, a given KVM only allows a CPUID set {S} to be configured by userspace. If a new TDX module supports a new feature X, KVM has to opt in to X by adding X to {S} before userspace is allowed to configure X.
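
A minimal sketch of the allow-list idea, assuming one 32-bit register per leaf. The function name and the kvm_allowed parameter (playing the role of {S}) are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical allow-list model: 'kvm_allowed' plays the role of {S}.
 * A bit is configurable by userspace only if the TDX module reports it
 * as configurable AND this KVM has opted in to it.
 */
static uint32_t optin_configurable(uint32_t module_configurable,
                                   uint32_t kvm_allowed)
{
    return module_configurable & kvm_allowed;
}
```

With this shape, a bit X newly advertised by a future TDX module stays non-configurable until X is added to kvm_allowed, whereas a deny-list that only clears TSX and WAITPKG would let X through.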

Besides, I find that the current interface between KVM and userspace lacks a way to tell userspace which bits are not supported by KVM. KVM_TDX_CAPABILITIES.cpuid doesn't work because it represents the configurable CPUIDs, not the supported CPUIDs (I think we might rename it to configurable_cpuid to better reflect its meaning). So userspace itself has to hardcode that TSX and WAITPKG are not supported.

I don't follow why hardcoding would be necessary.

If the leaf is represented in KVM_TDX_CAPABILITIES.cpuid, and
the bits are 0 there, why would userspace try to set them to 1?

Userspace not setting a bit to 1 in kvm_tdx_init_vm.cpuid doesn't mean that userspace wants the bit to be 0.

Note, KVM_TDX_CAPABILITIES.cpuid reports the configurable bits. A value of 0 for a bit in KVM_TDX_CAPABILITIES.cpuid means the bit is not configurable; it does not mean the bit is unsupported.

For kvm_tdx_init_vm.cpuid,
- if the corresponding bit is reported as 1 in KVM_TDX_CAPABILITIES.cpuid, then a value 0 in kvm_tdx_init_vm.cpuid means userspace wants to configure it as 0.
- if the corresponding bit is reported as 0 in KVM_TDX_CAPABILITIES.cpuid, then userspace has to pass a value 0 in kvm_tdx_init_vm.cpuid. But it doesn't mean the value of the bit will be 0.

E.g., the X2APIC bit is 0 in KVM_TDX_CAPABILITIES.cpuid, and it's also 0 in kvm_tdx_init_vm.cpuid, but the TD guest sees a value of 1. In QEMU's view, the X2APIC bit is kept as 1, and QEMU filters out the X2APIC bit when calling KVM_TDX_INIT_VM because X2APIC is not configurable.
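
The two cases and the X2APIC example can be modeled roughly as follows. This is only a sketch: the function names are hypothetical, 'fixed1' is an assumed name for bits the TDX module forces to 1, and one ECX value stands in for the whole leaf. X2APIC is CPUID.(EAX=1):ECX bit 21.

```c
#include <stdint.h>

#define SKETCH_X2APIC_BIT (1u << 21)  /* CPUID.(EAX=1):ECX bit 21 */

/* What userspace may pass in kvm_tdx_init_vm.cpuid: configurable bits only. */
static uint32_t initvm_filter(uint32_t userspace_view, uint32_t configurable)
{
    return userspace_view & configurable;
}

/*
 * Value the TD guest sees under this model: configurable bits follow the
 * userspace input; the rest come from 'fixed1' (assumed name for bits the
 * TDX module forces to 1). Non-configurable bits not in fixed1 read as 0.
 */
static uint32_t guest_visible(uint32_t init_vm_value, uint32_t configurable,
                              uint32_t fixed1)
{
    return (init_vm_value & configurable) | (fixed1 & ~configurable);
}
```

Here a 0 passed for X2APIC is not a request for 0: the bit is filtered out only because it is not configurable, and the guest still sees 1.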

So when it comes to TSX and WAITPKG, QEMU also needs an interface that tells it they are unsupported. Without KVM reporting fixed0 bits, QEMU has to hardcode them itself, as in [1]. The problem with hardcoding is that it will conflict once a future KVM allows them to be configurable.

In the future, if we have an interface from KVM to report the fixed0 and fixed1 bits (on top of the proposal [2]), userspace can drop the hardcoded list it maintains. At that point, KVM can ensure there is no conflict by removing the bits from the fixed0/fixed1 arrays when it allows them to be configurable.
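
As a rough illustration of how userspace could consume such a report: the struct and function names below are hypothetical, since no such KVM interface exists today, and one 32-bit register stands in for a full leaf. WAITPKG (CPUID.(EAX=7,ECX=0):ECX bit 5) serves as the example bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-register report; no such KVM interface exists today. */
struct cpuid_bits_report {
    uint32_t configurable; /* userspace chooses the value */
    uint32_t fixed0;       /* always 0 for the guest */
    uint32_t fixed1;       /* always 1 for the guest */
};

/* A wanted guest-visible value is achievable iff it respects the fixed bits. */
static bool request_is_satisfiable(const struct cpuid_bits_report *r,
                                   uint32_t wanted)
{
    bool wants_fixed0_set = (wanted & r->fixed0) != 0;
    bool wants_fixed1_clear = (~wanted & r->fixed1) != 0;

    return !wants_fixed0_set && !wants_fixed1_clear;
}

/* Model of KVM later allowing 'bits' to be configured: they leave fixed0/fixed1. */
static void make_configurable(struct cpuid_bits_report *r, uint32_t bits)
{
    r->fixed0 &= ~bits;
    r->fixed1 &= ~bits;
    r->configurable |= bits;
}
```

Because the bits move out of fixed0 in the same step that makes them configurable, userspace that relies on the report rather than a hardcoded list would not see a conflict.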

[1] https://lore.kernel.org/qemu-devel/20241105062408.3533704-49-xiaoyao.li@xxxxxxxxx/
[2] https://lore.kernel.org/all/43b26df1-4c27-41ff-a482-e258f872cc31@xxxxxxxxx/


[1] https://lore.kernel.org/all/ZuM12EFbOXmpHHVQ@xxxxxxxxxx/