Re: [PATCH v3 2/2] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

From: Steve Rutherford
Date: Mon May 03 2021 - 19:23:16 EST


On Sat, May 1, 2021 at 2:01 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 30/04/21 22:10, Sean Christopherson wrote:
> > On Thu, Apr 29, 2021, Paolo Bonzini wrote:
> >> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> >> index 57fc4090031a..cf1b0b2099b0 100644
> >> --- a/Documentation/virt/kvm/msr.rst
> >> +++ b/Documentation/virt/kvm/msr.rst
> >> @@ -383,5 +383,10 @@ MSR_KVM_MIGRATION_CONTROL:
> >> data:
> >> This MSR is available if KVM_FEATURE_MIGRATION_CONTROL is present in
> >> CPUID. Bit 0 represents whether live migration of the guest is allowed.
> >> +
> >> When a guest is started, bit 0 will be 1 if the guest has encrypted
> >> - memory and 0 if the guest does not have encrypted memory.
> >> + memory and 0 if the guest does not have encrypted memory. If the
> >> + guest is communicating page encryption status to the host using the
> >> + ``KVM_HC_PAGE_ENC_STATUS`` hypercall, it can set bit 0 in this MSR to
> >> + allow live migration of the guest. The MSR is read-only if
> >> + ``KVM_FEATURE_HC_PAGE_STATUS`` is not advertised to the guest.
> >
> > I still don't get the desire to tie MSR_KVM_MIGRATION_CONTROL to PAGE_ENC_STATUS
> > in any way shape or form. I can understand making it read-only or dropping
> > writes if it's not intercepted by userspace, but making it read-only for
> > non-encrypted guests makes it useful only for encrypted guests, which defeats
> > the purpose of genericizing the MSR.
>
> Yeah, I see your point. On the other hand by making it unconditionally
> writable we must implement the writability in KVM, because a read-only
> implementation would not comply with the spec.
>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index e9c40be9235c..0c2524bbaa84 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3279,6 +3279,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >> if (!guest_pv_has(vcpu, KVM_FEATURE_MIGRATION_CONTROL))
> >> return 1;
> >>
> >> + /*
> >> + * This implementation is only good if userspace has *not*
> >> + * enabled KVM_FEATURE_HC_PAGE_ENC_STATUS. If userspace
> >> + * enables KVM_FEATURE_HC_PAGE_ENC_STATUS it must set up an
> >> + * MSR filter in order to accept writes that change bit 0.
> >> + */
> >> if (data != !static_call(kvm_x86_has_encrypted_memory)(vcpu->kvm))
> >> return 1;
> >
> > This behavior doesn't match the documentation.
> >
> > a. The MSR is not read-only for legacy guests since they can write '0'.
> > b. The MSR is not read-only if KVM_FEATURE_HC_PAGE_STATUS isn't advertised,
> > a guest with encrypted memory can write '1' regardless of whether userspace
> > has enabled KVM_FEATURE_HC_PAGE_STATUS.
>
> Right, I should have said "not changeable" rather than "read-only".
>
> > c. The MSR is never fully writable, e.g. a guest with encrypted memory can set
> > bit 0, but not clear it. This doesn't seem intentional?
>
> It is intentional, clearing it would mean preserving the value in the
> kernel so that userspace can read it.
>
> So... I don't know, all in all having both the separate CPUID and the
> userspace implementation reeks of overengineering. It should be either
> of these:
>
> - separate CPUID bit, MSR unconditionally writable and implemented in
> KVM. Userspace is expected to ignore the MSR value for encrypted guests
> unless KVM_FEATURE_HC_PAGE_STATUS is exposed. Userspace should respect
> it even for unencrypted guests (not a migration-DoS vector, because
> userspace can just not expose the feature).
>
> - make it completely independent from migration, i.e. it's just a facet
> of MSR_KVM_PAGE_ENC_STATUS saying whether the bitmap is up-to-date. It
> would use CPUID bit as the encryption status bitmap and have no code at
> all in KVM (userspace needs to set up the filter and implement everything).

As far as I know, because of MSR filtering, the only "code" that needs
to be in KVM for MSR handling is a #define reserving the PV feature
number and a #define for the MSR number.

Arguably, you don't even need to add the new PV bits to the supported
CPUID, since MSR filtering is really what determines whether kernel
support is present.
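
To make the filtering point concrete, here is a toy model of a write
filter, not KVM code. It assumes the semantics of KVM's MSR filter
bitmaps (a set bit means the access is handled in the kernel, a clear
bit means it is denied and can be bounced to userspace). Every name in
it (the example_* helpers, the struct, and EXAMPLE_MSR_MIGRATION_CONTROL
with its value) is hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MSR index, for illustration only. */
#define EXAMPLE_MSR_MIGRATION_CONTROL 0x4b564d08u

/* Toy model of one write-filter range: bit i covers MSR (base + i). */
struct example_msr_range {
	uint32_t base;      /* first MSR index covered */
	uint32_t nmsrs;     /* number of MSRs covered, at most 64 here */
	uint8_t  bitmap[8]; /* 1 = handled in kernel, 0 = denied */
};

/* Start with every MSR in the range handled in the kernel. */
static inline void example_range_init(struct example_msr_range *r,
				      uint32_t base, uint32_t nmsrs)
{
	assert(nmsrs <= 8 * sizeof(r->bitmap));
	r->base = base;
	r->nmsrs = nmsrs;
	memset(r->bitmap, 0xff, sizeof(r->bitmap));
}

/* Clear the bit for one MSR; returns false if it is outside the range. */
static inline bool example_range_deny(struct example_msr_range *r,
				      uint32_t msr)
{
	uint32_t i;

	if (msr < r->base || msr - r->base >= r->nmsrs)
		return false;
	i = msr - r->base;
	r->bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
	return true;
}

/* True if a guest write to this MSR would be bounced to userspace. */
static inline bool example_write_goes_to_userspace(
	const struct example_msr_range *r, uint32_t msr)
{
	uint32_t i;

	if (msr < r->base || msr - r->base >= r->nmsrs)
		return false; /* outside the range: default handling */
	i = msr - r->base;
	return !(r->bitmap[i / 8] & (1u << (i % 8)));
}
```

In this model, userspace denies writes to the one MSR it wants to own
and handles the resulting exits itself; the kernel side needs no
handler for that MSR at all.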

>
> At this point I very much prefer the latter, which is basically Ashish's
> earlier patch.

The minor distinction would be that if you expose the CPUID bit to the
guest, you are planning to intercept the MSR with filters, so no
handler code is needed in the kernel.

Steve
>
> Paolo