Re: [PATCH v3 13/21] KVM:VMX: Emulate reads and writes to CET MSRs

From: Sean Christopherson
Date: Fri Jun 23 2023 - 19:53:54 EST


On Thu, May 11, 2023, Yang Weijiang wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c872a5aafa50..0ccaa467d7d3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2093,6 +2093,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
> break;
> + case MSR_IA32_U_CET:
> + case MSR_IA32_PL3_SSP:
> + if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> + return 1;
> + kvm_get_xsave_msr(msr_info);
> + break;

Please put as much MSR handling in x86.c as possible. We quite obviously know
that AMD support is coming along, there's no reason to duplicate all of this code.
And unless I'm missing something, John's series misses several #GP checks, e.g.
for MSR_IA32_S_CET reserved bits, which means that providing a common implementation
would actually fix bugs.

For MSRs that require vendor input and/or handling, please follow what was
recently done for MSR_IA32_CR_PAT, where the common bits are handled in common
code, and vendor code does its updates.

The divergent alignment between AMD and Intel could get annoying, but I'm sure
we can figure out a solution.
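
To illustrate the split, here is a rough user-space model, not KVM code. All names below (struct vcpu_model, common_set_msr(), vendor_set_msr()) are made up for illustration, and it assumes S_CET has the same reserved bits 9:6 that the patch checks for U_CET. Common code owns the #GP checks, vendor code only does its own update afterwards:

```c
#include <stdint.h>

#define MSR_IA32_S_CET		0x6a2
/* Assumption: S_CET shares the reserved bits 9:6 the patch checks for U_CET. */
#define CET_RESERVED_BITS	0x3c0ULL

/* Hypothetical stand-in for per-vendor state, e.g. a VMCS or VMCB field. */
struct vcpu_model {
	uint64_t vendor_s_cet;
};

/* Common code: vendor-agnostic checks.  Returns 1 to signal #GP, 0 on success. */
static int common_set_msr(uint32_t index, uint64_t data)
{
	switch (index) {
	case MSR_IA32_S_CET:
		return (data & CET_RESERVED_BITS) ? 1 : 0;
	default:
		return 1;	/* unknown MSR in this model */
	}
}

/* Vendor code: defer to the common checks, then do the vendor-specific update. */
static int vendor_set_msr(struct vcpu_model *vcpu, uint32_t index, uint64_t data)
{
	int ret = common_set_msr(index, data);

	if (ret)
		return ret;

	if (index == MSR_IA32_S_CET)
		vcpu->vendor_s_cet = data;

	return 0;
}
```

With this shape, a second vendor gets the reserved-bit check for free by calling the same common helper.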

> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> break;
> @@ -2405,6 +2411,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> else
> vmx->pt_desc.guest.addr_a[index / 2] = data;
> break;
> + case MSR_IA32_U_CET:
> + case MSR_IA32_PL3_SSP:
> + if (!kvm_cet_is_msr_accessible(vcpu, msr_info))
> + return 1;
> + if (is_noncanonical_address(data, vcpu))
> + return 1;
> + if (msr_index == MSR_IA32_U_CET && (data & GENMASK(9, 6)))
> + return 1;
> + if (msr_index == MSR_IA32_PL3_SSP && (data & GENMASK(2, 0)))

Please #define reserved bits, ideally using the inverse of the valid masks. And
for SSP, it might be better to do IS_ALIGNED(data, 8) (or 4, pending my question
about the SDM's wording).
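
Something along these lines, as a user-space sketch only. The macro and function names are hypothetical, GENMASK_ULL() and IS_ALIGNED() are local stand-ins for the kernel macros, and the valid mask is simply "everything except bits 9:6", derived from the check in the patch. The 8-byte alignment is likewise an assumption pending the SDM question:

```c
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's GENMASK_ULL() and IS_ALIGNED(). */
#define GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))
#define IS_ALIGNED(x, a)	(((x) & ((uint64_t)(a) - 1)) == 0)

/*
 * Hypothetical names.  The valid bits are everything except 9:6, i.e. the
 * inverse of the GENMASK(9, 6) check in the patch.
 */
#define CET_U_CET_VALID_BITS	(GENMASK_ULL(5, 0) | GENMASK_ULL(63, 10))
#define CET_U_CET_RESERVED_BITS	(~CET_U_CET_VALID_BITS)

/* Assumption: 8-byte alignment, matching the patch's GENMASK(2, 0) check. */
#define CET_SSP_ALIGNMENT	8

/* True if no reserved bit is set in a value destined for U_CET. */
static bool cet_u_cet_is_valid(uint64_t data)
{
	return !(data & CET_U_CET_RESERVED_BITS);
}

/* True if a value destined for an SSP MSR is suitably aligned. */
static bool cet_ssp_is_valid(uint64_t data)
{
	return IS_ALIGNED(data, CET_SSP_ALIGNMENT);
}
```

If the required alignment turns out to be 4, only CET_SSP_ALIGNMENT would need to change.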

Side topic, what on earth does the SDM mean by this?!?

The linear address written must be aligned to 8 bytes and bits 2:0 must be 0
(hardware requires bits 1:0 to be 0).

I know Intel retroactively changed the alignment requirements, but the above
is nonsensical. If ucode prevents writing bits 2:0, who cares what hardware
requires?

> + return 1;
> + kvm_set_xsave_msr(msr_info);
> + break;
> case MSR_IA32_PERF_CAPABILITIES:
> if (data && !vcpu_to_pmu(vcpu)->version)
> return 1;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b6eec9143129..2e3a39c9297c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13630,6 +13630,26 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> }
> EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>
> +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr)
> +{
> + if (!kvm_cet_user_supported())

This feels wrong. KVM should differentiate between SHSTK and IBT in the host.
E.g. if running in a VM with SHSTK but not IBT, or vice versa, KVM shouldn't allow
writes to non-existent MSRs. I.e. this looks wrong:

/*
* If SHSTK and IBT are not available in KVM, clear CET user bit in
* kvm_caps.supported_xss so that kvm_cet_user_supported() returns
* false when called.
*/
if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
!kvm_cpu_cap_has(X86_FEATURE_IBT))
kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_USER;

and by extension, all dependent code is also wrong. IIRC, there's a virtualization
hole, but I don't see any reason why KVM has to make the hole even bigger.

> + return false;
> +
> + if (msr->host_initiated)
> + return true;
> +
> + if (!guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_IBT))
> + return false;
> +
> + if (msr->index == MSR_IA32_PL3_SSP &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_SHSTK))

I probably asked this long ago, but if I did, I've since forgotten. Is it really
just PL3_SSP that depends on SHSTK? I would expect all shadow stack MSRs to depend
on SHSTK.
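
To make that expectation concrete, a user-space sketch. struct cet_caps and cet_msr_exists() are hypothetical, and the rule that every SSP MSR requires SHSTK is my expectation, not something confirmed in this thread:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_U_CET		0x6a0
#define MSR_IA32_S_CET		0x6a2
#define MSR_IA32_PL0_SSP	0x6a4
#define MSR_IA32_PL1_SSP	0x6a5
#define MSR_IA32_PL2_SSP	0x6a6
#define MSR_IA32_PL3_SSP	0x6a7
#define MSR_IA32_INT_SSP_TAB	0x6a8

/* Hypothetical: which CET features are exposed (host caps or guest CPUID). */
struct cet_caps {
	bool shstk;
	bool ibt;
};

/*
 * Assumption: the control MSRs exist if either feature is present, while
 * all shadow stack pointer MSRs exist only with SHSTK.
 */
static bool cet_msr_exists(const struct cet_caps *caps, uint32_t index)
{
	switch (index) {
	case MSR_IA32_U_CET:
	case MSR_IA32_S_CET:
		return caps->shstk || caps->ibt;
	case MSR_IA32_PL0_SSP:
	case MSR_IA32_PL1_SSP:
	case MSR_IA32_PL2_SSP:
	case MSR_IA32_PL3_SSP:
	case MSR_IA32_INT_SSP_TAB:
		return caps->shstk;
	default:
		return false;
	}
}
```

The same per-feature check would also cover the IBT-only or SHSTK-only host case, since it never treats the two features as a single unit.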

> @@ -546,5 +557,25 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
> int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> unsigned int port, void *data, unsigned int count,
> int in);
> +bool kvm_cet_is_msr_accessible(struct kvm_vcpu *vcpu, struct msr_data *msr);
> +
> +/*
> + * We've already loaded guest MSRs in __msr_io() after check the MSR index.

Please avoid pronouns.

> + * In case vcpu has been preempted, we need to disable preemption, check

vCPU. And this doesn't make any sense. The "vCPU" being preempted doesn't matter,
it's KVM, i.e. the task that's accessing vCPU state that cares about preemption.
I *think* what you're trying to say is that preemption needs to be disabled to
ensure that the guest values are resident.

> + * and reload the guest fpu states before read/write xsaves-managed MSRs.
> + */
> +static inline void kvm_get_xsave_msr(struct msr_data *msr_info)
> +{
> + fpregs_lock_and_load();

KVM already has helpers that do exactly this, and they have far better names for
KVM: kvm_fpu_get() and kvm_fpu_put(). Can you convert kvm_fpu_get() to
fpregs_lock_and_load() and use those instead? And if the extra consistency checks
in fpregs_lock_and_load() fire, we definitely want to know, as it means we probably
have bugs in KVM.
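
Roughly the shape I have in mind, as a user-space model only. fpregs_lock_and_load() and fpregs_unlock() are stubbed here so the sketch builds outside the kernel, read_hw_msr() and kvm_read_xsave_msr() are hypothetical names, and the counter and flag exist purely to make the pairing visible:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stubs standing in for kernel primitives; the real ones live in the FPU core. */
static int fpregs_lock_depth;
static bool guest_fpu_loaded;
static uint64_t fake_msr_value = 0x1000;

static void fpregs_lock_and_load(void)
{
	fpregs_lock_depth++;		/* models disabling preemption */
	guest_fpu_loaded = true;	/* models making guest state resident */
}

static void fpregs_unlock(void)
{
	fpregs_lock_depth--;
}

/* Hypothetical stand-in for reading the hardware MSR. */
static uint64_t read_hw_msr(uint32_t index)
{
	(void)index;
	return fake_msr_value;
}

/* kvm_fpu_get() converted to use fpregs_lock_and_load(), as suggested above. */
static void kvm_fpu_get(void)
{
	fpregs_lock_and_load();
}

static void kvm_fpu_put(void)
{
	fpregs_unlock();
}

/* Hypothetical helper: read an XSAVES-managed MSR with guest state resident. */
static uint64_t kvm_read_xsave_msr(uint32_t index)
{
	uint64_t data;

	kvm_fpu_get();
	data = read_hw_msr(index);
	kvm_fpu_put();

	return data;
}
```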