Re: [PATCH Part2 RFC v4 40/40] KVM: SVM: Support SEV-SNP AP Creation NAE event

From: Sean Christopherson
Date: Wed Jul 21 2021 - 15:53:04 EST


On Wed, Jul 21, 2021, Tom Lendacky wrote:
> On 7/20/21 7:01 PM, Sean Christopherson wrote:
> > On Wed, Jul 07, 2021, Brijesh Singh wrote:
> >> From: Tom Lendacky <thomas.lendacky@xxxxxxx>
> >> +
> >> + svm->snp_vmsa_pfn = pfn;
> >> +
> >> + /* Use the new VMSA in the sev_es_init_vmcb() path */
> >> + svm->vmsa_pa = pfn_to_hpa(pfn);
> >> + svm->vmcb->control.vmsa_pa = svm->vmsa_pa;
> >> +
> >> + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> >> + } else {
> >> + vcpu->arch.pv.pv_unhalted = false;
> >
> > Shouldn't the RUNNABLE path also clear pv_unhalted?
>
> If anything it should set it, since it will be "unhalted" now. But, I
> looked through the code to try and understand if there was a need to set
> it and didn't really see any reason. It is only ever set (at least
> directly) in one place and is cleared everywhere else. It was odd to me.

pv_unhalted is specifically used for a "magic" IPI (KVM hijacked a defunct
IPI type) in the context of PV spinlocks. The idea is that a vCPU that's releasing
a spinlock can kick the next vCPU in the queue, and KVM will directly yield to the
vCPU being kicked so that the guest can efficiently make forward progress.

So it's not wrong to leave pv_unhalted as is, but it's odd to clear it in the
DESTROY case but not in the CREATE_INIT case. It should be a moot point, as a sane
implementation should make it impossible to get to CREATE with pv_unhalted=1.
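
To make the kick semantics concrete, here is a rough toy model, not KVM's actual
code. Every name prefixed with toy_ is made up for illustration, and the "clear
on both paths" helper is just the symmetric behavior I'd expect, assuming the AP
update is the only place that touches the flag outside of the kick/wake pair.

```c
#include <stdbool.h>

/* Toy stand-ins; these are NOT the kernel's definitions. */
enum toy_mp_state { TOY_RUNNABLE, TOY_HALTED, TOY_UNINITIALIZED };

struct toy_vcpu {
	enum toy_mp_state mp_state;
	bool pv_unhalted;	/* set by a PV "kick", consumed on wakeup */
};

/* A lock releaser kicks the next waiter; only a halted target is flagged. */
static void toy_pv_kick(struct toy_vcpu *target)
{
	if (target->mp_state == TOY_HALTED)
		target->pv_unhalted = true;
}

/* Wakeup consumes a pending kick. Returns true if the vCPU may run. */
static bool toy_try_wake(struct toy_vcpu *v)
{
	if (v->mp_state == TOY_HALTED && v->pv_unhalted) {
		v->pv_unhalted = false;
		v->mp_state = TOY_RUNNABLE;
	}
	return v->mp_state == TOY_RUNNABLE;
}

/* Hypothetical AP state update that clears the flag on both paths. */
static void toy_ap_update(struct toy_vcpu *v, bool create)
{
	v->pv_unhalted = false;
	v->mp_state = create ? TOY_RUNNABLE : TOY_UNINITIALIZED;
}
```

With the flag cleared unconditionally, a stale kick can't survive either a
create or a destroy, which is the symmetry in question.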

> >> + vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED;
> >
> > What happens if userspace calls kvm_arch_vcpu_ioctl_set_mpstate, or even worse
> > the guest sends INIT-SIPI? Unless I'm mistaken, either case will cause KVM to
> > run the vCPU with vmcb->control.vmsa_pa==0.
>
> Using the INVALID_PAGE value now (and even when it was 0), you'll get a
> VMRUN failure.
>
> The AP CREATE_ON_INIT is meant to be used with INIT-SIPI, so if the guest
> hasn't done the right thing, then it will fail on VMRUN.
>
> >
> > My initial reaction is that the "offline" case needs a new mp_state, or maybe
> > just use KVM_MP_STATE_STOPPED.
>
> I'll look at KVM_MP_STATE_STOPPED. Qemu doesn't reference that state at
> all in the i386 support, though, which is why I went with UNINITIALIZED.

Ya, it'd effectively be a new feature. My concern with UNINITIALIZED is that it
would be impossible for KVM to differentiate between "never run" and "destroyed and
may have an invalid VMSA" without looking at the VMSA.
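
A minimal sketch of what a distinct state would buy, under the assumption that a
destroyed AP must be re-created before it can be started again. The toy_ names
are hypothetical and the real policy would certainly be more involved; the only
point is that the decision is made from the state alone, without peeking at the
VMSA.

```c
#include <errno.h>

/* Toy states; the real KVM_MP_STATE_* values live in the KVM uapi headers. */
enum toy_mp_state {
	TOY_UNINITIALIZED,	/* AP has never run */
	TOY_STOPPED,		/* AP was destroyed, VMSA may be invalid */
	TOY_RUNNABLE,
};

/*
 * Hypothetical INIT-SIPI handler. Returns 0 if the AP was started, or
 * -EINVAL if the AP is not in a state from which it can be started.
 */
static int toy_handle_sipi(enum toy_mp_state *state)
{
	switch (*state) {
	case TOY_UNINITIALIZED:
		*state = TOY_RUNNABLE;
		return 0;
	case TOY_STOPPED:	/* destroyed: refuse to run a stale VMSA */
	case TOY_RUNNABLE:	/* already running: nothing to start */
		return -EINVAL;
	}
	return -EINVAL;
}
```

If "destroyed" were folded into TOY_UNINITIALIZED, the first case would start
the AP in both situations.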

> >> + mutex_lock(&target_svm->snp_vmsa_mutex);
> >
> > This seems like it's missing a big pile of sanity checks. E.g. KVM should reject
> > SVM_VMGEXIT_AP_CREATE if the target vCPU is already "created", including the case
> > where it was "created_on_init" but hasn't yet received INIT-SIPI.
>
> Why? If the guest wants to call it multiple times I guess I don't see a
> reason that it would need to call DESTROY first and then CREATE. I don't
> know why a guest would want to do that, but I don't think we should
> prevent it.

Because "creating" a vCPU that already exists is nonsensical. Ditto for
onlining a vCPU that is already onlined. E.g. from the guest's perspective, I
would fully expect a SVM_VMGEXIT_AP_CREATE to fail, not effectively send the vCPU
to an arbitrary state.

Any ambiguity as to the legality of CREATE/DESTROY absolutely needs to be clarified
in the GHCB spec.
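
Roughly the kind of sanity checking I have in mind, as a standalone sketch. The
states, request names, and toy_ap_request() are all hypothetical, and rejecting
DESTROY of an already-offline AP is my assumption of the strict policy, not
something the GHCB currently spells out.

```c
#include <errno.h>

/* Hypothetical AP lifecycle states tracked per target vCPU. */
enum toy_ap_state { TOY_AP_OFFLINE, TOY_AP_CREATE_ON_INIT, TOY_AP_ONLINE };

/* Hypothetical request codes, ordered like the "request < DESTROY" test. */
enum toy_ap_req { TOY_REQ_CREATE_ON_INIT, TOY_REQ_CREATE, TOY_REQ_DESTROY };

/* Returns 0 on success, -EINVAL if the request is illegal in this state. */
static int toy_ap_request(enum toy_ap_state *state, enum toy_ap_req req)
{
	switch (req) {
	case TOY_REQ_CREATE_ON_INIT:
	case TOY_REQ_CREATE:
		/* Reject "creating" an AP that is created or pending INIT-SIPI. */
		if (*state != TOY_AP_OFFLINE)
			return -EINVAL;
		*state = (req == TOY_REQ_CREATE) ? TOY_AP_ONLINE
						 : TOY_AP_CREATE_ON_INIT;
		return 0;
	case TOY_REQ_DESTROY:
		/* Assumed policy: destroying an offline AP is also an error. */
		if (*state == TOY_AP_OFFLINE)
			return -EINVAL;
		*state = TOY_AP_OFFLINE;
		return 0;
	}
	return -EINVAL;
}
```

A guest that really wants to replace a VMSA would then have to DESTROY first,
which at least makes the transition explicit.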

> >> +
> >> + target_svm->snp_vmsa_gpa = 0;
> >> + target_svm->snp_vmsa_update_on_init = false;
> >> +
> >> + /* Interrupt injection mode shouldn't change for AP creation */
> >> + if (request < SVM_VMGEXIT_AP_DESTROY) {
> >> + u64 sev_features;
> >> +
> >> + sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
> >> + sev_features ^= sev->sev_features;
> >> + if (sev_features & SVM_SEV_FEATURES_INT_INJ_MODES) {
> >
> > Why is only INT_INJ_MODES checked? The new comment in sev_es_sync_vmsa() explicitly
> > states that sev_features are the same for all vCPUs, but that's not enforced here.
> > At a bare minimum I would expect this to sanity check SVM_SEV_FEATURES_SNP_ACTIVE.
>
> That's because we can't really enforce it. The SEV_FEATURES value is part
> of the VMSA, which the hypervisor has no insight into (it's encrypted).
>
> The interrupt injection mechanism was specifically requested as a sanity
> check type of thing during the GHCB review, and as there were no
> objections, it was added (see the end of section 4.1.9).
>
> I can definitely add the check for the SNP_ACTIVE bit, but it isn't required.

I'm confused. If we've no insight into what the guest is actually using, what's
the point of the INT_INJ_MODES check?
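
For reference, the XOR-and-mask pattern in the hunk above boils down to the
sketch below. The TOY_ macros and their bit positions are assumptions for
illustration only, and toy_features_mismatch() is a made-up helper. Note that it
can only compare the value the guest hands over in RAX against the VM-wide
value; it says nothing about what is actually in the encrypted VMSA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, assumed for this sketch only. */
#define TOY_FEAT_SNP_ACTIVE	(UINT64_C(1) << 0)
#define TOY_FEAT_RESTRICTED_INJ	(UINT64_C(1) << 3)
#define TOY_FEAT_ALTERNATE_INJ	(UINT64_C(1) << 4)
#define TOY_FEAT_INT_INJ_MODES	(TOY_FEAT_RESTRICTED_INJ | TOY_FEAT_ALTERNATE_INJ)

/*
 * XOR leaves a 1 in every bit position where the two values differ; the
 * mask then selects which of those differences are treated as errors.
 */
static bool toy_features_mismatch(uint64_t requested, uint64_t vm_wide,
				  uint64_t mask)
{
	return ((requested ^ vm_wide) & mask) != 0;
}
```

Widening the mask to include TOY_FEAT_SNP_ACTIVE is a one-line change, e.g.
toy_features_mismatch(rax, vm, TOY_FEAT_INT_INJ_MODES | TOY_FEAT_SNP_ACTIVE).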