Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs

From: Yang, Weijiang
Date: Mon Jun 26 2023 - 10:07:56 EST



On 6/24/2023 7:53 AM, Sean Christopherson wrote:
On Thu, May 11, 2023, Yang Weijiang wrote:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c872a5aafa50..0ccaa467d7d3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2093,6 +2093,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL3_SSP:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ kvm_get_xsave_msr(msr_info);
+ break;
Please put as much MSR handling in x86.c as possible. We quite obviously know
that AMD support is coming along, there's no reason to duplicate all of this code.
And unless I'm missing something, John's series misses several #GP checks, e.g.
for MSR_IA32_S_CET reserved bits, which means that providing a common implementation
would actually fix bugs.

OK, I will move the common parts to x86.c.


For MSRs that require vendor input and/or handling, please follow what was
recently done for MSR_IA32_CR_PAT, where the common bits are handled in common
code, and vendor code does its updates.

The divergent alignment between AMD and Intel could get annoying, but I'm sure
we can figure out a solution.
Got it, will refer to the PAT handling.
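To make the PAT-style split concrete, here is a small userspace sketch. It is not KVM code. A common helper does the architectural #GP checks and stores the value, and a vendor wrapper calls it first and only then touches vendor state. struct vcpu_model, common_set_s_cet() and vendor_set_s_cet() are hypothetical names. The reserved mask assumes that S_CET bits 9:6 are reserved, which matches the GENMASK(9, 6) check in the patch.

```c
#include <stdint.h>

/* Hypothetical, simplified vCPU model for illustration; not struct kvm_vcpu. */
struct vcpu_model {
	uint64_t s_cet;        /* value owned by common (x86.c-like) code */
	uint64_t vendor_copy;  /* stand-in for a vendor field, e.g. in the VMCS */
	int vendor_writes;     /* counts vendor-side updates */
};

/* Assumption: U_CET/S_CET bits 9:6 are reserved. */
#define CET_US_RESERVED_BITS (0xfULL << 6)

/* Common code: all architectural checks; nonzero return means "inject #GP". */
static int common_set_s_cet(struct vcpu_model *v, uint64_t data)
{
	if (data & CET_US_RESERVED_BITS)
		return 1;
	v->s_cet = data;
	return 0;
}

/* Vendor code: let common code validate and store, then do vendor-only work. */
static int vendor_set_s_cet(struct vcpu_model *v, uint64_t data)
{
	int ret = common_set_s_cet(v, data);

	if (!ret) {
		v->vendor_copy = data;
		v->vendor_writes++;
	}
	return ret;
}
```

With this split, an AMD wrapper would reuse the same common helper, so a check such as the S_CET reserved bits is written only once.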

case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
break;
@@ -2405,6 +2411,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL3_SSP:
+ if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
+ return 1;
+ if (is_noncanonical_address(data, vcpu))
+ return 1;
+ if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
+ return 1;
+ if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))
Please #define reserved bits, ideally using the inverse of the valid masks. And
for SSP, it might be better to do IS_ALIGNED(data, 8) (or 4, pending my question
about the SDM's wording).

OK.
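One way to express the suggestion to define reserved bits as the inverse of the valid masks, sketched as standalone C. GENMASK_ULL() and IS_ALIGNED() are local re-implementations of the kernel macros, included so the snippet compiles in userspace. CET_US_VALID_BITS, CET_US_RESERVED_BITS, cet_ctl_valid() and ssp_valid() are hypothetical names. The valid-bit layout assumes the SDM's U_CET/S_CET definition (bits 5:0 are enables, 11:10 are SUPPRESS/TRACKER, 63:12 are the legacy bitmap base), and the alignment is 8 pending the 4-vs-8 question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's GENMASK_ULL() and IS_ALIGNED() helpers. */
#define GENMASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define IS_ALIGNED(x, a)  (((x) & ((uint64_t)(a) - 1)) == 0)

/* Hypothetical names; valid bits are 5:0 and 63:10 per the assumed layout. */
#define CET_US_VALID_BITS    (GENMASK_ULL(5, 0) | GENMASK_ULL(63, 10))
#define CET_US_RESERVED_BITS (~CET_US_VALID_BITS)   /* works out to bits 9:6 */

/* U_CET/S_CET write check: reject any reserved bit. */
static bool cet_ctl_valid(uint64_t data)
{
	return !(data & CET_US_RESERVED_BITS);
}

/* SSP write check: require 8-byte alignment (assumed; 4 if the SDM says so). */
static bool ssp_valid(uint64_t data)
{
	return IS_ALIGNED(data, 8);
}
```

Deriving the reserved mask from the valid mask keeps the two in sync if more control bits are defined later.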


Side topic, what on earth does the SDM mean by this?!?

The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
(hardware requires bits 1:0 to be 0).

I know Intel retroactively changed the alignment requirements, but the above
is nonsensical. If ucode prevents writing bits 2:0, who cares what hardware
requires?

I'm puzzled by that wording too ;-/


+ return 1;
+ kvm_set_xsave_msr(msr_info);
+ break;
case MSR_IA32_PERF_CAPABILITIES:
if (data && !vcpu_to_pmu(vcpu)->version)
return 1;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b6eec9143129..2e3a39c9297c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13630,6 +13630,26 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
}
EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
+bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
+{
+ if (!kvm_cet_user_supported())
This feels wrong. KVM should differentiate between SHSTK and IBT in the host.
E.g. if running in a VM with SHSTK but not IBT, or vice versa, KVM should allow
writes to non-existent MSRs.

I don't quite follow. In this case, on whose behalf is KVM acting: the guest or userspace?

I.e. this looks wrong:

/*
* If SHSTK and IBT are available in KVM, clear CET user bit in
* kvm_caps.supported_xss so that kvm_cet_user_supported() returns
* false when called.
*/
if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
!kvm_cpu_cap_has(X86_FEATURE_IBT))
kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;

The comment is wrong; it should say "are not available in KVM". My intent is that if neither feature is available in KVM, the precondition bit is cleared so that all dependent checks fail quickly.


and by extension, all dependent code is also wrong. IIRC, there's a virtualization
hole, but I don't see any reason why KVM has to make the hole even bigger.

Do you mean the issue that SHSTK and IBT share the same control MSRs, i.e., U_CET/S_CET?
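A minimal sketch of the intent described in the reply above: clear the CET user bit only when KVM supports neither feature. It assumes XFEATURE_CET_USER is XSAVE state component 11, and it uses a hypothetical struct caps_model in place of kvm_cpu_cap_has() and kvm_caps. The separate shstk/ibt flags are kept on purpose so that per-MSR checks can still tell the two features apart, which is the distinction raised in the review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: XFEATURE_CET_USER is XSAVE state component 11. */
#define XFEATURE_MASK_CET_USER (1ULL << 11)

/* Hypothetical capability snapshot standing in for kvm_cpu_cap_has()/kvm_caps. */
struct caps_model {
	bool shstk;
	bool ibt;
	uint64_t supported_xss;
};

/*
 * If SHSTK and IBT are both unavailable in KVM, clear the CET user bit so
 * that every check gated on it fails fast. If either feature is available,
 * the bit stays set and finer-grained checks must use the per-feature flags.
 */
static void trim_cet_xss(struct caps_model *c)
{
	if (!c->shstk && !c->ibt)
		c->supported_xss &= ~XFEATURE_MASK_CET_USER;
}
```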


+ return false;
+
+ if (msr->host_initiated)
+ return true;
+
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
+ return false;
+
+ if (msr->index == MSR_IA32_PL3_SSP &&
+ !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))
I probably asked this long ago, but if I did, I've since forgotten. Is it really just
PL3_SSP that depends on SHSTK? I would expect all shadow stack MSRs to depend
on SHSTK.

All of PL{0,1,2,3}_SSP plus the INT_SSP_TAB MSR depend on SHSTK. In patch 21, I added more MSRs to this helper.
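A sketch of the per-MSR dependency rule described above: the shared control MSRs are accessible if the guest has either SHSTK or IBT, while PL0-PL3_SSP and INT_SSP_TAB require SHSTK. The MSR indices are the architectural ones. struct guest_feat and cet_msr_accessible() are hypothetical, and the host-initiated early return mirrors the patch, which performs its KVM-level support check before that point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CET MSR indices. */
#define MSR_IA32_U_CET       0x000006a0
#define MSR_IA32_S_CET       0x000006a2
#define MSR_IA32_PL0_SSP     0x000006a4
#define MSR_IA32_PL1_SSP     0x000006a5
#define MSR_IA32_PL2_SSP     0x000006a6
#define MSR_IA32_PL3_SSP     0x000006a7
#define MSR_IA32_INT_SSP_TAB 0x000006a8

/* Hypothetical stand-in for the guest CPUID bits. */
struct guest_feat {
	bool shstk;
	bool ibt;
};

/* Per-MSR rule; assumes the KVM-level support check has already passed. */
static bool cet_msr_accessible(const struct guest_feat *g, uint32_t index,
			       bool host_initiated)
{
	if (host_initiated)
		return true;

	switch (index) {
	case MSR_IA32_U_CET:
	case MSR_IA32_S_CET:
		/* Shared control MSRs: either feature makes them exist. */
		return g->shstk || g->ibt;
	case MSR_IA32_PL0_SSP:
	case MSR_IA32_PL1_SSP:
	case MSR_IA32_PL2_SSP:
	case MSR_IA32_PL3_SSP:
	case MSR_IA32_INT_SSP_TAB:
		/* Shadow-stack-only MSRs. */
		return g->shstk;
	default:
		return false;
	}
}
```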

@@ -546,5 +557,25 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
unsigned int port, void *data, unsigned int count,
int in);
+bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr);
+
+/*
+ * We've already loaded guest MSRs in __msr_io() after check the MSR index.
Please avoid pronouns

OK.

+ * In case vcpu has been preempted, we need to disable preemption, check
vCPU. And this doesn't make any sense. The "vCPU" being preempted doesn't matter,
it's KVM, i.e. the task that's accessing vCPU state that cares about preemption.
I *think* what you're trying to say is that preemption needs to be disabled to
ensure that the guest values are resident.

Sorry, the comment is broken. What I meant is that between kvm_load_guest_fpu() and the point where this helper is used, the vCPU task could have been preempted, so we need to disable preemption and reload the FPU state with fpregs_lock_and_load() before accessing the MSR.


+ * and reload the guest fpu states before read/write xsaves-managed MSRs.
+ */
+static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
+{
+ fpregs_lock_and_load();
KVM already has helpers that do exactly this, and they have far better names for
KVM: kvm_fpu_get() and kvm_fpu_put(). Can you convert kvm_fpu_get() to
fpregs_lock_and_load() and use those instead? And if the extra consistency checks
in fpregs_lock_and_load() fire, we definitely want to know, as it means we probably
have bugs in KVM.

Do you want me to run some experiments to check whether the WARN() in fpregs_lock_and_load() is triggered?

If no WARN() triggers, should I then replace fpregs_lock_and_load()/fpregs_unlock() with kvm_fpu_get()/kvm_fpu_put()?
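A toy userspace model of why the guest state must be locked and resident before an XSAVE-managed MSR is accessed, and of the get/access/put shape that kvm_fpu_get()/kvm_fpu_put() would give kvm_get_xsave_msr(). Everything here is hypothetical: regs_u_cet stands in for the live register, guest_buf_u_cet for the saved guest fpstate, need_load for TIF_NEED_FPU_LOAD, and locked for disabled preemption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not kernel code. */
struct fpu_model {
	uint64_t regs_u_cet;      /* live register copy */
	uint64_t guest_buf_u_cet; /* saved guest copy in memory */
	bool need_load;           /* like TIF_NEED_FPU_LOAD: regs are stale */
	bool locked;              /* like "preemption disabled" */
};

/* Preemption while unlocked: save guest state, then another task clobbers regs. */
static void model_preempt(struct fpu_model *f, uint64_t other_task_value)
{
	if (f->locked)
		return;                       /* cannot be preempted while locked */
	if (!f->need_load)
		f->guest_buf_u_cet = f->regs_u_cet;
	f->regs_u_cet = other_task_value;
	f->need_load = true;
}

/* Analogue of fpregs_lock_and_load() / kvm_fpu_get(). */
static void model_fpu_get(struct fpu_model *f)
{
	f->locked = true;
	if (f->need_load) {
		f->regs_u_cet = f->guest_buf_u_cet;
		f->need_load = false;
	}
}

/* Analogue of fpregs_unlock() / kvm_fpu_put(). */
static void model_fpu_put(struct fpu_model *f)
{
	f->locked = false;
}

/* Analogue of kvm_get_xsave_msr(): read regs only while locked and loaded. */
static uint64_t model_get_xsave_msr(struct fpu_model *f)
{
	uint64_t val;

	model_fpu_get(f);
	assert(f->locked && !f->need_load);
	val = f->regs_u_cet;              /* stands in for the real MSR read */
	model_fpu_put(f);
	return val;
}
```

If the task is preempted after kvm_load_guest_fpu(), the live register may hold another task's value. The model reloads the saved guest value under the lock before reading, which is the job fpregs_lock_and_load() does in the real helper.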