Re: [PATCH 06/43] KVM: x86: Properly reset MMU context at vCPU RESET/INIT

From: Sean Christopherson
Date: Wed May 19 2021 - 13:16:29 EST


On Tue, May 18, 2021, Reiji Watanabe wrote:
> > > > + if (kvm_cr0_mmu_role_changed(old_cr0, kvm_read_cr0(vcpu)) ||
> > > > + kvm_cr4_mmu_role_changed(old_cr4, kvm_read_cr4(vcpu)))
> > > > + kvm_mmu_reset_context(vcpu);
> > > > }
> > >
> > > I'm wondering if kvm_vcpu_reset() should call kvm_mmu_reset_context()
> > > for a change in EFER.NX as well.
> >
> > Oooh. So there _should_ be no need. Paging has to be enabled for EFER.NX to
> > be relevant, and INIT toggles CR0.PG 1=>0 if paging was enabled and so is
> > guaranteed to trigger a context reset. And we do want to skip the context reset,
> > e.g. INIT-SIPI-SIPI when the vCPU has paging disabled should continue using the
> > same MMU.
> >
> > But, kvm_calc_mmu_role_common() neglects to ignore NX if CR0.PG=0, and so the
> > MMU role will be stale if INIT clears EFER.NX without forcing a context reset.
> > However, that's benign from a functionality perspective because the context
> > itself correctly incorporates CR0.PG, it's only the role that's borked. I.e.
> > KVM will fail to reuse a page/context due to the spurious role.nxe, but the
> > permission checks are always be correct.
> >
> > I'll add a comment here and send a patch to fix the role calculation.
>
> Thank you so much for the explanation !
> I understand your intention and why it would be benign.
>
> Then, I'm wondering if kvm_cr4_mmu_role_changed() needs to be
> called here. Looking at the Intel SDM, in my understanding,
> all the bits kvm_cr4_mmu_role_changed() checks are relevant
> only if paging is enabled. (Or is my understanding incorrect ??)

Duh, yes. And it goes even beyond that, CR0.WP is only relevant if CR0.PG=1,
i.e. INIT with CR0.PG=0 and CR0.WP=1 will incorrectly trigger a MMU reset with
the current logic.

Sadly, simply omitting the CR4 check puts us in an awkward situation where, due
to the MMU role CR4 calculations not accounting for CR0.PG=0, KVM will run with
a stale role.

The other consideration is that kvm_post_set_cr4() and kvm_post_set_cr0() should
also skip kvm_mmu_reset_context() if CR0.PG=0, but again that requires fixing
the role calculations first (or at the same time).

I think I'll throw in those cleanups to the beginning of this series. The result
is going to be disgustingly long, but I really don't want to introduce code that
knowingly leaves KVM in an inconsistent state, nor do I want to add useless
checks on CR4 and EFER.