Re: [RFC PATCH 2/3] nSVM: introduce smv->nested.save to cache save area fields

From: Sean Christopherson
Date: Tue Sep 28 2021 - 18:24:41 EST


On Tue, Sep 28, 2021, Paolo Bonzini wrote:
> On 12/09/21 12:39, Maxim Levitsky wrote:
> > On Fri, 2021-09-03 at 12:20 +0200, Emanuele Giuseppe Esposito wrote:
> > > This is useful in next patch, to avoid having temporary
> > > copies of vmcb12 registers and passing them manually.
> >
> > This is NOT what I had in mind, but I do like that idea very much,
> > IMHO this is much better than what I had in mind!
> >
> > The only thing that I would change is that I woudn't reuse 'struct vmcb_save_area'
> > for the copy, as this both wastes space (minor issue),
> > and introduces a chance of someone later using non copied
> > fields from it (can cause a bug later on).
> >
> > I would just define a new struct for that (but keep same names
> > for readability)
> >
> > Maybe something like 'struct vmcb_save_area_cached'?
>
> I agree, I like this too. However, it needs a comment that this new struct
> is not kept up-to-date, and is only valid until enter_svm_guest_mode.
>
> I might even propose a
>
> #ifdef CONFIG_DEBUG_KERNEL
> memset(&svm->nested.save, 0xaf, sizeof(svm->nested.save));
> #endif
>
> but there are no uses of CONFIG_DEBUG_KERNEL in all of Linux so it's
> probably not the way one should use that symbol. Can anybody think of a
> similar alternative? Or should the memset simply be unconditional?

I still think this doesn't go far enough to prevent TOCTOU bugs, and in general
KVM lacks a coherent design/approach in this area. Case in point, the next patch
fails to handle at least one, probably more, TOCTOU bugs. CR3 is checked using
KVM's copy (svm->nested.save)

/*
* These checks are also performed by KVM_SET_SREGS,
* except that EFER.LMA is not checked by SVM against
* CR0.PG && EFER.LME.
*/
if ((save->efer & EFER_LME) && (save->cr0 & X86_CR0_PG)) {
if (CC(!(save->cr4 & X86_CR4_PAE)) ||
CC(!(save->cr0 & X86_CR0_PE)) ||
CC(kvm_vcpu_is_illegal_gpa(vcpu, save->cr3)))
return false;
}

but KVM prepares vmcb02 and the MMU using the CR3 value directly from vmcb12.

nested_vmcb02_prepare_control(svm);
nested_vmcb02_prepare_save(svm, vmcb12);

ret = nested_svm_load_cr3(&svm->vcpu, vmcb12->save.cr3,
nested_npt_enabled(svm), true);

I assume there is similar badness in nested_vmcb02_prepare_save()'s usage of
vmcb12, but even if there isn't, IMO that it's even a question/possibility means
KVM is broken. I.e. KVM should fully unmap L1's vmcb12 before doing _anything_.
(1) map, (2) copy, (3) unmap, (4) check, (5) consume. Yes, there are performance
gains to be had (or lost), but we need a fully correct/functional baseline before
we start worrying about performance. E.g. the copy at step (2) can be optimized
to copy only data that is not marked cleaned, but first we should have a version
of KVM that has no optimizations and just copies the entire vmcb (or at least the
chunks KVM consumes).

On a related topic, this would be a good opportunity to resolve the naming
discrepancies between VMX and SVM. VMX generally refers to vmcs12 as KVM's copy
of L1's VMCS, whereas SVM generally refers to vmcb12 as the "direct" mapping of
L1's VMCB. I'd prefer to go with VMX's terminology, i.e. rework nSVM to refer to
the copy as vmcb12, but I'm more than a bit biased since I've spent so much time
in nVMX,