Re: [PATCH 11/15] KVM: x86/MMU: Refactor vmx_get_mt_mask
From: Sean Christopherson
Date: Mon Nov 22 2021 - 13:47:11 EST
On Mon, Nov 22, 2021, Ben Gardon wrote:
> On Fri, Nov 19, 2021 at 1:03 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> > On 11/18/21 16:30, Sean Christopherson wrote:
> > > If we really want to make this state per-vCPU, KVM would need to incorporate the
> > > CR0.CD and MTRR settings in kvm_mmu_page_role. For MTRRs in particular, the worst
> > > case scenario is that every vCPU has different MTRR settings, which means that
> > > kvm_mmu_page_role would need to be expanded by 10 bits in order to track every
> > > possible vcpu_idx (currently capped at 1024).
> > Yes, that's insanity. I was also a bit skeptical about Ben's try_get_mt_mask callback,
> > but this would be much much worse.
> Yeah, the implementation of that felt a bit kludgy to me too, but
> refactoring the handling of all those CR bits was way more complex
> than I wanted to handle in this patch set.
> I'd love to see some of those CR0 / MTRR settings be set on a VM basis
> and enforced as uniform across vCPUs.
Architecturally, we can't do that. Even a perfectly well-behaved guest will have
(small) periods where the BSP has different settings than APs. And it's technically
legal to have non-uniform MTRR and CR0.CD/NW configurations, even though no modern
BIOS/kernel does that. Except for non-coherent DMA, it's a moot point because KVM
can simply ignore guest cacheability settings.
> Looking up vCPU 0 and basing things on that feels extra hacky though,
> especially if we're still not asserting uniformity of settings across
> vCPUs.
IMO, it's marginally less hacky than what KVM has today as it allows KVM's behavior
to be clearly and sanely stated, e.g. KVM uses vCPU0's cacheability settings when
mapping non-coherent DMA. Compare that with today's behavior where the cacheability
settings depend on which vCPU first faulted in the address for a given MMU role and
instance of the associated root, and whether other vCPUs share an MMU role/root.
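FWIW, a rough sketch of what "use vCPU0's settings" could look like as a policy.
This is toy code, not the actual KVM implementation: every name below (toy_vm,
toy_vcpu, toy_get_mem_type, the mem_type enum and its values) is made up for
illustration, and the MTRR state is collapsed to a single type per vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory types; NOT the real VMX EPT memtype encodings. */
enum mem_type { MT_UC, MT_WB };

/* Toy per-vCPU cacheability state (assumption: one MTRR type for all memory). */
struct toy_vcpu {
	bool cr0_cd;			/* guest CR0.CD */
	enum mem_type mtrr_type;	/* guest MTRR-derived type */
};

/* Toy VM; the array size is arbitrary. */
struct toy_vm {
	bool has_noncoherent_dma;
	size_t nr_vcpus;
	struct toy_vcpu vcpus[4];
};

/* Guest-requested type for one vCPU: CR0.CD forces UC in this toy model. */
static enum mem_type toy_vcpu_mem_type(const struct toy_vcpu *vcpu)
{
	return vcpu->cr0_cd ? MT_UC : vcpu->mtrr_type;
}

/*
 * VM-wide policy: ignore guest cacheability entirely unless the VM has
 * non-coherent DMA, in which case honor vCPU0's settings and nobody else's.
 */
static enum mem_type toy_get_mem_type(const struct toy_vm *vm)
{
	assert(vm->nr_vcpus > 0);

	if (!vm->has_noncoherent_dma)
		return MT_WB;

	return toy_vcpu_mem_type(&vm->vcpus[0]);
}
```

The point of the sketch is only that the result is a function of the VM and
vCPU0, i.e. it doesn't depend on which vCPU happened to fault in the address
first or on how MMU roles/roots end up being shared.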
> If we need to track that state to accurately virtualize the hardware
> though, that would be unfortunate.