Re: [PATCH 35/43] KVM: x86: Move setting of sregs during vCPU RESET/INIT to common x86

From: Sean Christopherson
Date: Tue May 18 2021 - 17:45:34 EST


On Mon, May 17, 2021, Reiji Watanabe wrote:
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1204,12 +1204,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> > init_sys_seg(&save->ldtr, SEG_TYPE_LDT);
> > init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16);
> >
> > - svm_set_cr0(vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);
> > - svm_set_cr4(vcpu, 0);
> > - svm_set_efer(vcpu, 0);
> > - kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
> > - vcpu->arch.regs[VCPU_REGS_RIP] = 0x0000fff0;
>
> Reviewed-by: Reiji Watanabe <reijiw@xxxxxxxxxx>
>
> Those your vCPU RESET/INIT changes look great.
>
> I think the change in init_vmcb() basically assumes that the
> function is called from kvm_vcpu_reset(via svm_vcpu_reset()).
> Although shutdown_interception() directly calls init_mcb(),
> I would think the change doesn't matter for the shutdown
> interception case.
>
> IMHO it would be a bit misleading that a function named 'init_vmcb',
> which is called from other than kvm_vcpu_reset (svm_vcpu_reset()),
> only partially resets the vmcb (probably just to me though).

It's not just you, that code is funky. If I could go back in time, I would lobby
to not automatically init the VMCB and instead put the vCPU into
KVM_MP_STATE_UNINITIALIZED and force userspace to explicitly INIT or RESET the
vCPU. Even better would be to add KVM_MP_STATE_SHUTDOWN, since technically NMI
can break shutdown (and SMI on Intel CPUs).

Anyways, that ship has sailed, but we might be able to get away with replacing
init_vmcb() with kvm_vcpu_reset(vcpu, true), i.e. effecting a full INIT. That
would ensure the VMCB is consistent with KVM's software model, which I'm guessing
is not true with the direct init_vmcb() call. It would also have some connection
to reality since there exist bare metal platforms that automatically INIT the CPU
if it hits shutdown (maybe only for the BSP?).

Side topic, the NMI thing got me looking through init_vmcb() to see how it
handles the IDT and GDT, and surprise, surprise, it fails to zero IDTR.base and
GDTR.base. I'll add a patch to fix that, and maybe try to consolidate the VMX
and SVM segmentation logic.

> So, I personally think it would be better if its name or comment
> can give some more specific information about the assumption.
>
> BTW, it looks like two lines of "vcpu->arch.hflags = 0;"
> can be also removed from the init_vmcb() as well.

Ya, I'll add that to the pile.

Thanks!