[PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is trustworthy

From: Vitaly Kuznetsov
Date: Tue Nov 05 2019 - 11:17:45 EST


Virtualized guests may pick a different strategy to mitigate hardware
vulnerabilities when it comes to hyper-threading: disable SMT completely,
use core scheduling, or, for example, opt in for STIBP. Making the
decision, however, requires an extra bit of information which is currently
missing: does the topology the guest sees match the hardware, or is it
'fake', so that two vCPUs which look like different cores from the guest's
perspective can actually be scheduled on the same physical core? Disabling
SMT or doing core scheduling only makes sense when the topology is
trustworthy.

Add two bits to KVM: the KVM_FEATURE_TRUSTWORTHY_SMT feature bit, which
means that the KVM_HINTS_TRUSTWORTHY_SMT hint is valid, and the
KVM_HINTS_TRUSTWORTHY_SMT hint itself, which answers the question of
whether the exposed SMT topology is actually trustworthy. It would, of
course, be possible to get away with a single bit (e.g.
'KVM_FEATURE_FAKE_SMT') without losing backwards compatibility, but the
current approach looks more straightforward.
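
For illustration only (not part of this patch), here is a minimal sketch of
how a guest could combine the two bits. It assumes the register values were
read from the KVM_CPUID_FEATURES leaf (0x40000001), with feature bits in EAX
and hints in EDX, as in the cpuid.c hunk below. The enum and the function
name are hypothetical.

```c
#include <stdint.h>

/* Bit numbers proposed by this patch (CPUID leaf 0x40000001). */
#define KVM_FEATURE_TRUSTWORTHY_SMT	14	/* reported in EAX */
#define KVM_HINTS_TRUSTWORTHY_SMT	1	/* reported in EDX */

/* Hypothetical names, used only for this sketch. */
enum smt_topology {
	SMT_TOPOLOGY_UNKNOWN,		/* host does not report the hint */
	SMT_TOPOLOGY_UNTRUSTED,		/* hint is valid and clear */
	SMT_TOPOLOGY_TRUSTWORTHY,	/* hint is valid and set */
};

static enum smt_topology classify_smt_topology(uint32_t eax, uint32_t edx)
{
	/* Without the feature bit the hint bit carries no meaning. */
	if (!(eax & (UINT32_C(1) << KVM_FEATURE_TRUSTWORTHY_SMT)))
		return SMT_TOPOLOGY_UNKNOWN;

	if (edx & (UINT32_C(1) << KVM_HINTS_TRUSTWORTHY_SMT))
		return SMT_TOPOLOGY_TRUSTWORTHY;

	return SMT_TOPOLOGY_UNTRUSTED;
}
```

A guest would presumably consider disabling SMT or using core scheduling
only in the SMT_TOPOLOGY_TRUSTWORTHY case and fall back to something like
STIBP otherwise. In a Linux guest the two tests would likely go through
kvm_para_has_feature() and kvm_para_has_hint() rather than raw register
values.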

There were some offline discussions on whether this new feature bit should
be complemented with a 're-enlightenment' mechanism for live migration (so
that it can change during the guest's lifetime) but it doesn't seem to be
very practical: what is a sane guest supposed to do if it's told that the
SMT topology is about to become fake, other than kill itself? Also, it
seems to make little sense to do e.g. CPU pinning on the source but not on
the destination.

There is also one additional piece of information missing. A VM can be
sharing physical cores with other VMs (or other userspace tasks on the
host), so does KVM_FEATURE_TRUSTWORTHY_SMT imply that this is not the
case? It is unclear if this changes anything and it can probably be left
out of scope (just don't do that).

Similar to the existing 'NoNonArchitecturalCoreSharing' Hyper-V
enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
!cpu_smt_possible(). KVM userspace is thus supposed to pass it through to
the guest's CPUID when it is '1' (meaning no SMT on the host at all) or do
some extra work (like CPU pinning and exposing the correct topology)
before passing '1' to the guest.
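
As a rough sketch of the userspace policy described above (again not part
of this patch): the helper name and its two boolean parameters are
hypothetical, and 'supported_edx' is assumed to be the EDX value KVM reports
for the 0x40000001 leaf, e.g. via KVM_GET_SUPPORTED_CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hint bit number proposed by this patch (EDX of CPUID leaf 0x40000001). */
#define KVM_HINTS_TRUSTWORTHY_SMT	1

/*
 * Hypothetical helper: compute the hints value to expose to the guest.
 * The hint is kept when KVM already set it (no SMT possible on the host)
 * and is added only when userspace pinned the vCPUs and exposed a topology
 * which matches the host's.
 */
static uint32_t build_guest_hints(uint32_t supported_edx, bool vcpus_pinned,
				  bool topology_matches_host)
{
	const uint32_t hint = UINT32_C(1) << KVM_HINTS_TRUSTWORTHY_SMT;
	uint32_t edx = supported_edx & ~hint;	/* other hints are untouched */

	if ((supported_edx & hint) || (vcpus_pinned && topology_matches_host))
		edx |= hint;

	return edx;
}
```

Pinning alone is deliberately not sufficient in this sketch: the guest can
only rely on the hint if the exposed sibling relationships are correct too.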

Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
Documentation/virt/kvm/cpuid.rst | 27 +++++++++++++++++++--------
arch/x86/include/uapi/asm/kvm_para.h | 2 ++
arch/x86/kvm/cpuid.c | 7 ++++++-
3 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..64b94103fc90 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD 13 guest checks this feature bit
before using paravirtualized
sched yield.

+KVM_FEATURE_TRUSTWORTHY_SMT 14 set when host supports 'SMT
+ topology is trustworthy' hint
+ (KVM_HINTS_TRUSTWORTHY_SMT).
+
KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side
per-cpu warps are expected in
kvmclock
@@ -97,11 +101,18 @@ KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24 host will warn if no guest-side

Where ``flag`` here is defined as below:

-================== ============ =================================
-flag value meaning
-================== ============ =================================
-KVM_HINTS_REALTIME 0 guest checks this feature bit to
- determine that vCPUs are never
- preempted for an unlimited time
- allowing optimizations
-================== ============ =================================
+================================= =========== =================================
+flag value meaning
+================================= =========== =================================
+KVM_HINTS_REALTIME 0 guest checks this feature bit to
+ determine that vCPUs are never
+ preempted for an unlimited time
+ allowing optimizations
+
+KVM_HINTS_TRUSTWORTHY_SMT 1 the bit is set when the exposed
+ SMT topology is trustworthy; this
+ means that two guest vCPUs will
+ never share a physical core
+ unless they are exposed as SMT
+ threads.
+================================= =========== =================================
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..183239d5dfad 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,8 +31,10 @@
#define KVM_FEATURE_PV_SEND_IPI 11
#define KVM_FEATURE_POLL_CONTROL 12
#define KVM_FEATURE_PV_SCHED_YIELD 13
+#define KVM_FEATURE_TRUSTWORTHY_SMT 14

#define KVM_HINTS_REALTIME 0
+#define KVM_HINTS_TRUSTWORTHY_SMT 1

/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f68c0c753c38..dab527a7081f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
(1 << KVM_FEATURE_PV_SEND_IPI) |
(1 << KVM_FEATURE_POLL_CONTROL) |
- (1 << KVM_FEATURE_PV_SCHED_YIELD);
+ (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+ (1 << KVM_FEATURE_TRUSTWORTHY_SMT);

if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
@@ -720,6 +721,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
+
+ if (!cpu_smt_possible())
+ entry->edx |= (1 << KVM_HINTS_TRUSTWORTHY_SMT);
+
break;
case 0x80000000:
entry->eax = min(entry->eax, 0x8000001f);
--
2.20.1