Re: [PATCH 0/3] KVM: x86: guest interface for SEV live migration

From: Paolo Bonzini
Date: Tue Apr 20 2021 - 14:39:58 EST


On 20/04/21 19:31, Sean Christopherson wrote:
+	case KVM_HC_PAGE_ENC_STATUS: {
+		u64 gpa = a0, npages = a1, enc = a2;
+
+		ret = -KVM_ENOSYS;
+		if (!vcpu->kvm->arch.hypercall_exit_enabled)

I don't follow, why does the hypercall need to be gated by a capability? What
would break if this were changed to?

if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))

The problem is that it's valid to take the output of KVM_GET_SUPPORTED_CPUID and pass it unmodified to KVM_SET_CPUID2. For this reason, features that are conditional on other ioctls, or that require some kind of userspace support, must not be reported by KVM_GET_SUPPORTED_CPUID. For example:

- TSC_DEADLINE, because it is only implemented after KVM_CREATE_IRQCHIP (or after KVM_ENABLE_CAP of KVM_CAP_IRQCHIP_SPLIT)

- MONITOR, because it only makes sense if userspace enables KVM_CAP_X86_DISABLE_EXITS

X2APIC is reported even though it shouldn't be. Too late to fix that, I think.
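
To make the rule concrete, here is a rough userspace-side sketch (not code from this series). It assumes a VMM that starts from a copy of the supported CPUID and advertises the feature only when it knows it can handle the resulting exit. The struct is a simplified stand-in for struct kvm_cpuid_entry2, and the DEMO_ names, including the feature bit position, are placeholders made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct kvm_cpuid_entry2 (only the fields used here). */
struct demo_cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* KVM paravirt feature leaf. */
#define DEMO_KVM_CPUID_FEATURES 0x40000001u
/* Placeholder bit position, chosen for illustration only. */
#define DEMO_FEATURE_HC_PAGE_ENC_STATUS (1u << 16)

/*
 * Advertise the feature to the guest only if the VMM itself handles the
 * hypercall exit. Returns true if the paravirt feature leaf was found.
 */
bool demo_prepare_guest_cpuid(struct demo_cpuid_entry *entries, size_t n,
			      bool vmm_handles_hypercall_exit)
{
	assert(entries != NULL);

	for (size_t i = 0; i < n; i++) {
		struct demo_cpuid_entry *e = &entries[i];

		if (e->function != DEMO_KVM_CPUID_FEATURES)
			continue;

		/* Never trust the incoming bit; decide based on VMM support. */
		e->eax &= ~DEMO_FEATURE_HC_PAGE_ENC_STATUS;
		if (vmm_handles_hypercall_exit)
			e->eax |= DEMO_FEATURE_HC_PAGE_ENC_STATUS;
		return true;
	}
	return false;
}
```

The opt-in lives in userspace here, so a VMM that blindly forwards the supported CPUID never ends up exposing a feature it cannot service.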

In this particular case, if userspace sets the bit in CPUID2 but doesn't handle KVM_EXIT_HYPERCALL, the guest will probably trigger some kind of assertion failure as soon as it invokes the HC_PAGE_ENC_STATUS hypercall.
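
For illustration, a minimal model of what "handling KVM_EXIT_HYPERCALL" could look like on the userspace side. This is a sketch under assumptions, not the actual VMM code: the struct mirrors only the nr/args/ret fields of the hypercall exit, the hypercall number and error value are placeholders, and the per-page table is a toy stand-in for whatever a real VMM would use to track encryption status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT          12
#define DEMO_MAX_GFNS            64  /* toy guest size, illustration only */
#define DEMO_HC_PAGE_ENC_STATUS  12  /* placeholder hypercall number */
#define DEMO_HC_ERROR            ((uint64_t)-1) /* placeholder error value */

/* Simplified stand-in for the hypercall member of struct kvm_run. */
struct demo_hypercall_exit {
	uint64_t nr;
	uint64_t args[6];
	uint64_t ret;
};

/* Toy per-page encryption status table. */
bool demo_page_encrypted[DEMO_MAX_GFNS];

/*
 * Returns true if the exit was recognised. hc->ret carries the value
 * that would be handed back to the guest.
 */
bool demo_handle_hypercall_exit(struct demo_hypercall_exit *hc)
{
	assert(hc != NULL);

	if (hc->nr != DEMO_HC_PAGE_ENC_STATUS)
		return false; /* a VMM hitting this is the failure mode above */

	uint64_t gfn = hc->args[0] >> DEMO_PAGE_SHIFT;
	uint64_t npages = hc->args[1];
	bool enc = hc->args[2] != 0;

	/* Reject ranges outside the toy table, without overflowing. */
	if (gfn >= DEMO_MAX_GFNS || npages > DEMO_MAX_GFNS - gfn) {
		hc->ret = DEMO_HC_ERROR;
		return true;
	}

	for (uint64_t i = 0; i < npages; i++)
		demo_page_encrypted[gfn + i] = enc;

	hc->ret = 0;
	return true;
}
```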

(I should document that; Jim has asked many times for documentation around KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST.)

Paolo

+			break;
+
+		if (!PAGE_ALIGNED(gpa) || !npages ||
+		    gpa_to_gfn(gpa) + npages <= gpa_to_gfn(gpa)) {
+			ret = -EINVAL;
+			break;
+		}
+
+		vcpu->run->exit_reason = KVM_EXIT_HYPERCALL;
+		vcpu->run->hypercall.nr = KVM_HC_PAGE_ENC_STATUS;
+		vcpu->run->hypercall.args[0] = gpa;
+		vcpu->run->hypercall.args[1] = npages;
+		vcpu->run->hypercall.args[2] = enc;
+		vcpu->run->hypercall.longmode = op_64_bit;
+		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
+		return 0;
+	}
 	default:
 		ret = -KVM_ENOSYS;
 		break;

...

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 590cc811c99a..d696a9f13e33 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3258,6 +3258,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.msr_kvm_poll_control = data;
 		break;
+	case MSR_KVM_MIGRATION_CONTROL:
+		if (data & ~KVM_PAGE_ENC_STATUS_UPTODATE)
+			return 1;
+
+		if (data && !guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))

Why let the guest write '0'? Letting the guest do WRMSR but not RDMSR is
bizarre.

Because it was the simplest way to write the code, but returning 0 unconditionally from RDMSR is actually simpler.
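
One possible reading of that, as a standalone model rather than the real handlers: WRMSR keeps the checks from the patch (reserved bits rejected, nonzero values only with the feature), and RDMSR always succeeds with 0. The DEMO_ names are made up for this sketch, the "up to date" flag is assumed to be bit 0, and the return convention (0 for success, 1 for a fault) copies kvm_set_msr_common/kvm_get_msr_common.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to be bit 0 for the purpose of this sketch. */
#define DEMO_PAGE_ENC_STATUS_UPTODATE 1ull

/* Model of WRMSR: returns 0 on success, 1 if the write should fault. */
int demo_set_migration_control(bool guest_has_feature, uint64_t data)
{
	/* Any bit other than the single defined flag is reserved. */
	if (data & ~DEMO_PAGE_ENC_STATUS_UPTODATE)
		return 1;

	/* Writing 0 is always accepted; nonzero needs the feature. */
	if (data && !guest_has_feature)
		return 1;

	return 0;
}

/* Model of RDMSR: unconditionally succeeds and reads back as 0. */
int demo_get_migration_control(uint64_t *data)
{
	assert(data != NULL);
	*data = 0;
	return 0;
}
```

With this shape a guest that can write 0 can also read the MSR, which removes the WRMSR-without-RDMSR asymmetry.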

Paolo

+			return 1;
+		break;
+
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -3549,6 +3557,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!guest_pv_has(vcpu, KVM_FEATURE_ASYNC_PF))
 			return 1;
+		msr_info->data = 0;
+		break;
+	case MSR_KVM_MIGRATION_CONTROL:
+		if (!guest_pv_has(vcpu, KVM_FEATURE_HC_PAGE_ENC_STATUS))
+			return 1;
+
 		msr_info->data = 0;
 		break;
 	case MSR_KVM_STEAL_TIME:
--
2.26.2