Re: [PATCH 2/4] KVM: x86/mmu: Defer "full" MMU setup until after vendor hardware_setup()

From: Sean Christopherson
Date: Mon Jun 27 2022 - 11:41:00 EST


On Sat, Jun 25, 2022, David Matlack wrote:
> On Fri, Jun 24, 2022 at 11:27:33PM +0000, Sean Christopherson wrote:
> > Alternatively, the setup could be done in kvm_configure_mmu(), but that
> > would require vendor code to call e.g. kvm_unconfigure_mmu() in teardown
> > and error paths, i.e. doesn't actually save code and is arguably uglier.
> [...]
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 17ac30b9e22c..ceb81e04aea3 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -6673,10 +6673,8 @@ void kvm_mmu_x86_module_init(void)
> > * loaded as many of the masks/values may be modified by VMX or SVM, i.e. need
> > * to be reset when a potentially different vendor module is loaded.
> > */
> > -int kvm_mmu_vendor_module_init(void)
> > +void kvm_mmu_vendor_module_init(void)
> > {
> > - int ret = -ENOMEM;
> > -
> > /*
> > * MMU roles use union aliasing which is, generally speaking, an
> > * undefined behavior. However, we supposedly know how compilers behave
> > @@ -6687,7 +6685,13 @@ int kvm_mmu_vendor_module_init(void)
> > BUILD_BUG_ON(sizeof(union kvm_mmu_extended_role) != sizeof(u32));
> > BUILD_BUG_ON(sizeof(union kvm_cpu_role) != sizeof(u64));
> >
> > + /* Reset the PTE masks before the vendor module's hardware setup. */
> > kvm_mmu_reset_all_pte_masks();
> > +}
> > +
> > +int kvm_mmu_hardware_setup(void)
> > +{
>
> Instead of putting this code in a new function and calling it after
> hardware_setup(), we could put it in kvm_configure_mmu().

Ya, I noted that as an alternative in the changelog but obviously opted not to
do the allocation in kvm_configure_mmu(). I view kvm_configure_mmu() as a
necessary evil. Ideally vendor code wouldn't call into the MMU during
initialization, and common x86 would fully dictate the order of calls so that
MMU setup happens at a well-defined point. We could force that, but it'd
require something gross like filling a struct passed into
ops->hardware_setup(), and probably would be less robust (more likely to omit
a "required" field).

In other words, I like the explicit kvm_mmu_hardware_setup() call from common x86,
e.g. to show that vendor code needs to do setup before the MMU, and so that MMU
setup isn't buried in a somewhat arbitrary location in vendor hardware setup.
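
I.e. with this patch, the intent is for the flow in common x86 to read
something like (rough sketch, not the literal code):

	int kvm_arch_hardware_setup(void *opaque)
	{
		struct kvm_x86_init_ops *ops = opaque;
		int r;

		/* Vendor setup runs first and calls kvm_configure_mmu(). */
		r = ops->hardware_setup();
		if (r)
			return r;

		/* MMU setup consumes whatever vendor code configured. */
		return kvm_mmu_hardware_setup();
	}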

I'm not dead set against handling this in kvm_configure_mmu() (though I'd probably
vote to rename it to kvm_mmu_hardware_setup()) if anyone has a super strong opinion.

> This will result in a larger patch diff, but it eliminates a subtle
> and non-trivial-to-verify dependency ordering between

Verification is "trivial" in that this WARN will fire if the order is swapped:

	if (WARN_ON_ONCE(!nr_sptes_per_pte_list))
		return -EIO;
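
I.e. kvm_mmu_hardware_setup() bails before creating the cache if vendor code
hasn't called kvm_configure_mmu(), roughly (sketch; pte_list_desc_size() is a
stand-in for however the real patch computes the cache size):

	int kvm_mmu_hardware_setup(void)
	{
		/* Set by kvm_configure_mmu() from vendor hardware setup. */
		if (WARN_ON_ONCE(!nr_sptes_per_pte_list))
			return -EIO;

		pte_list_desc_cache = kmem_cache_create("pte_list_desc",
							pte_list_desc_size(),
							0, SLAB_ACCOUNT, NULL);
		if (!pte_list_desc_cache)
			return -ENOMEM;

		return 0;
	}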

> kvm_configure_mmu() and kvm_mmu_hardware_setup(), and it will co-locate
> the initialization of nr_sptes_per_pte_list and the code that uses it to
> create pte_list_desc_cache in a single function.