Re: [PATCH v4 2/2] KVM: nSVM: implement ondemand allocation of the nested state

From: Maxim Levitsky
Date: Mon Sep 21 2020 - 04:57:41 EST


On Mon, 2020-09-21 at 10:53 +0300, Maxim Levitsky wrote:
> On Sun, 2020-09-20 at 18:42 +0200, Paolo Bonzini wrote:
> > On 20/09/20 18:16, Sean Christopherson wrote:
> > > > Maxim, your previous version was adding some error handling to
> > > > kvm_x86_ops.set_efer. I don't remember what was the issue; did you have
> > > > any problems propagating all the errors up to KVM_SET_SREGS (easy),
> > > > kvm_set_msr (harder) etc.?
> > > I objected to letting .set_efer() return a fault.
> >
> > So did I, and that's why we get KVM_REQ_OUT_OF_MEMORY. But it was more
> > of an "it's ugly and it ought not to fail" thing than something I could
> > pinpoint.
> >
> > It looks like we agree, but still we have to choose the lesser evil?
> >
> > Paolo
> >
> > > A relatively minor issue is
> > > the code in vmx_set_efer() that handles lack of EFER because technically KVM
> > > can emulate EFER.SCE+SYSCALL without supporting EFER in hardware. Returning
> > > success/'0' would avoid that particular issue. My primary concern is that I'd
> > > prefer not to add another case where KVM can potentially ignore a fault
> > > indicated by a helper, a la vmx_set_cr4().
>
> The thing is that kvm_emulate_wrmsr injects #GP when kvm_set_msr returns any non zero value,
> and returns 1 which means keep on going if I understand correctly (0 is userspace exit,
> negative value would be a return to userspace with an error)
>
> So the question is if we have other wrmsr handlers which return negative value, and would
> be affected by changing kvm_emulate_wrmsr to pass through the error value.
> I am checking the code now.
>
> I do agree now that this is the *correct* solution to this problem.
>
> Best regards,
> Maxim Levitsky


So those are results of my analysis:

WRMSR called functions that return negative value (I could have missed something,
but I double checked the wrmsr code in both SVM and VMX, and in the common x86 code):

vmx_set_vmx_msr - this is only called from userspace (msr_info->host_initiated == true),
so this can be left as is

xen_hvm_config - this code should probably return 1 in some cases, but in one case,
it legit does memory allocation like I do, and failure should probably kill the guest
as well (but I can keep it as is if we are afraid that new behavier will not be
backward compatible)

What do you think about this (only compile tested since I don't have any xen setups):

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 36e963dc1da61..66a57c5b14dfd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2695,24 +2695,19 @@ static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
u32 page_num = data & ~PAGE_MASK;
u64 page_addr = data & PAGE_MASK;
u8 *page;
- int r;

- r = -E2BIG;
if (page_num >= blob_size)
- goto out;
- r = -ENOMEM;
+ return 1;
+
page = memdup_user(blob_addr + (page_num * PAGE_SIZE), PAGE_SIZE);
- if (IS_ERR(page)) {
- r = PTR_ERR(page);
- goto out;
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+
+ if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE)) {
+ kfree(page);
+ return 1;
}
- if (kvm_vcpu_write_guest(vcpu, page_addr, page, PAGE_SIZE))
- goto out_free;
- r = 0;
-out_free:
- kfree(page);
-out:
- return r;
+ return 0;
}


The msr write itself can be reached from the guest through two functions,
from kvm_emulate_wrmsr which is called in wrmsr interception from both VMX and SVM,
and from em_wrmsr which is called in unlikely case the emulator needs to emulate a wrmsr.

Both should be changed to inject #GP only on positive return value and pass the error
otherwise.

Sounds reasonable? If you agree I'll post the patches implementing this.

Best regards,
Maxim Levitsky