Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

From: Ashish Kalra
Date: Fri Apr 02 2021 - 07:09:37 EST


Hello Steve,

On Thu, Apr 01, 2021 at 06:40:06PM -0700, Steve Rutherford wrote:
> On Fri, Mar 19, 2021 at 11:00 AM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> >
> > On Thu, Mar 11, 2021 at 12:48:07PM -0800, Steve Rutherford wrote:
> > > On Thu, Mar 11, 2021 at 10:15 AM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> > > >
> > > > On Wed, Mar 03, 2021 at 06:54:41PM +0000, Will Deacon wrote:
> > > > > [+Marc]
> > > > >
> > > > > On Tue, Mar 02, 2021 at 02:55:43PM +0000, Ashish Kalra wrote:
> > > > > > On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
> > > > > > > On Fri, Feb 26, 2021, Ashish Kalra wrote:
> > > > > > > > On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> > > > > > > > > On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra <ashish.kalra@xxxxxxx> wrote:
> > > > > > > > > Thanks for grabbing the data!
> > > > > > > > >
> > > > > > > > > I am fine with both paths. Sean has stated an explicit desire for
> > > > > > > > > hypercall exiting, so I think that would be the current consensus.
> > > > > > >
> > > > > > > Yep, though it'd be good to get Paolo's input, too.
> > > > > > >
> > > > > > > > > If we want to do hypercall exiting, this should be in a follow-up
> > > > > > > > > series where we implement something more generic, e.g. a hypercall
> > > > > > > > > exiting bitmap or hypercall exit list. If we are taking the hypercall
> > > > > > > > > exit route, we can drop the kvm side of the hypercall.
> > > > > > >
> > > > > > > I don't think this is a good candidate for arbitrary hypercall interception. Or
> > > > > > > rather, I think hypercall interception should be an orthogonal implementation.
> > > > > > >
> > > > > > > The guest, including guest firmware, needs to be aware that the hypercall is
> > > > > > > supported, and the ABI needs to be well-defined. Relying on userspace VMMs to
> > > > > > > implement a common ABI is an unnecessary risk.
> > > > > > >
> > > > > > > We could make KVM's default behavior be a nop, i.e. have KVM enforce the ABI but
> > > > > > > require further VMM intervention. But, I just don't see the point, it would
> > > > > > > save only a few lines of code. It would also limit what KVM could do in the
> > > > > > > future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to userspace,
> > > > > > > then mandatory interception would essentially make it impossible for KVM to do
> > > > > > > bookkeeping while still honoring the interception request.
> > > > > > >
> > > > > > > However, I do think it would make sense to have the userspace exit be a generic
> > > > > > > exit type. But hey, we already have the necessary ABI defined for that! It's
> > > > > > > just not used anywhere.
> > > > > > >
> > > > > > > /* KVM_EXIT_HYPERCALL */
> > > > > > > struct {
> > > > > > > __u64 nr;
> > > > > > > __u64 args[6];
> > > > > > > __u64 ret;
> > > > > > > __u32 longmode;
> > > > > > > __u32 pad;
> > > > > > > } hypercall;
> > > > > > >
> > > > > > >
> > > > > > > > > Userspace could also handle the MSR using MSR filters (would need to
> > > > > > > > > confirm that). Then userspace could also be in control of the cpuid bit.
> > > > > > >
> > > > > > > An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
> > > > > > > The data limitation could be fudged by shoving data into non-standard GPRs, but
> > > > > > > that will result in truly heinous guest code, and extensibility issues.
> > > > > > >
> >
> > We may also need to pass-through the MSR to userspace, as it is a part of this
> > complete host (userspace/kernel), OVMF and guest kernel negotiation of
> > the SEV live migration feature.
> >
> > Host (userspace/kernel) advertises it's support for SEV live migration
> > feature via the CPUID bits, which is queried by OVMF and which in turn
> > adds a new UEFI runtime variable to indicate support for SEV live
> > migration, which is later queried during guest kernel boot and
> > accordingly the guest does a wrmrsl() to custom MSR to complete SEV
> > live migration negotiation and enable it.
> >
> > Now, the GET_SHARED_REGION_LIST ioctl returns error, until this MSR write
> > enables SEV live migration, hence, preventing userspace to start live
> > migration before the feature support has been negotiated and enabled on
> > all the three components - host, guest OVMF and kernel.
> >
> > But, now with this ioctl not existing anymore, we will need to
> > pass-through the MSR to userspace too, for it to only initiate live
> > migration once the feature negotiation has been completed.
>
> I can't tell if you were waiting for feedback on this before posting
> the follow-up patch series.

Actually, i am going to post the follow-up patch series upstream early
next week.

I have already added support for MSR handling and exit to userspace, the
current implementation looks like this :

The custom MSR is hooked both in svm_get_msr() and svm_set_msr():

@@ -2800,6 +2800,17 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_F10H_DECFG:
msr_info->data = svm->msr_decfg;
break;
+ case MSR_KVM_SEV_LIVE_MIGRATION:
+ if (!sev_guest(vcpu->kvm))
+ return 1;
+
+ if (!guest_cpuid_has(vcpu, KVM_FEATURE_SEV_LIVE_MIGRATION))
+ return 1;
+
+ /*
+ * Let userspace handle the MSR using MSR filters.
+ */
+ return KVM_MSR_RET_FILTERED;

So there is special check added for both sev_guest() and requisite CPUID
bit set in guest CPUID, if either fails this will signal #GP to guest.

Otherwise, it returns MSR_FILTER return code, which will allow userspace
to use msr intercepts to handle the reads and writes via userspace exits
using KVM_EXIT_X86_RDMSR/KVM_EXIT_X86_WRMSR.

Let me know if you have any feedback/comments on the above handling.

Thanks,
Ashish

>
> Here are a few options:
> 1) Add the MSR explicitly to the list of custom kvm MSRs, but don't
> have it hooked up anywhere. The expectation would be for the VMM to
> use msr intercepts to handle the reads and writes. If that seems
> weird, have svm_set_msr (or whatever) explicitly ignore it.
> 2) Add a getter and setter for the MSR. Only allow guests to use it if
> they are sev_guests with the requisite CPUID bit set.
>
> I think I prefer the former, and it should work fine from my
> understanding of the msr intercepts implementation. I'm also open to
> other ideas. You could also have the MSR write trigger a KVM_EXIT of
> the same type as the hypercall, but have it just say "the msr value
> changed to XYZ", but that design sounds awkward.
>
> Thanks,
> Steve