Re: [PATCH v3 06/12] KVM: x86: don't disable APICv memslot when inhibited

From: Sean Christopherson
Date: Mon Aug 09 2021 - 15:15:01 EST


On Mon, Aug 09, 2021, Maxim Levitsky wrote:
> On Tue, 2021-08-03 at 10:44 +0200, Paolo Bonzini wrote:
> > Reviewing this patch and the next one together.
> >
> > On 02/08/21 20:33, Maxim Levitsky wrote:
> > > +static int avic_alloc_access_page(struct kvm *kvm)
> > > {
> > > void __user *ret;
> > > int r = 0;
> > >
> > > mutex_lock(&kvm->slots_lock);
> > > +
> > > + if (kvm->arch.apic_access_memslot_enabled)
> > > goto out;
> >
> > This variable is overloaded between "is access enabled" and "is the
> > memslot allocated". I think you should check
> > kvm->arch.apicv_inhibit_reasons instead in kvm_faultin_pfn.
> >
> >
> > > + if (!activate)
> > > + kvm_zap_gfn_range(kvm, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE),
> > > + gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE));
> > > +
> >
> > Off by one, the last argument of kvm_zap_gfn_range is inclusive:
>
> Actually is it?

Nope. The actual implementation is exclusive for both legacy and TDP MMU. And
as you covered below, the fixed and variable MTRR helpers provide exclusive
start+end, so there's no functional bug. The "0 - ~0" use case is irrelevant
because there can't be physical memory at -4096.
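A minimal userspace sketch of the exclusive-end convention, under simplified assumptions. `struct slot`, `var_mtrr_end()` and `pages_to_zap()` are hypothetical stand-ins, not KVM code. The first function repeats the quoted var_mtrr_range() arithmetic, assuming the mask already has every bit above the physical-address width set. The second repeats the memslot clamping step quoted from kvm_zap_gfn_range. Both treat the end as one past the last unit covered.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Hypothetical, simplified memslot: only the fields the clamp needs. */
struct slot {
	gfn_t base_gfn;
	uint64_t npages;
};

/*
 * Same arithmetic as the quoted var_mtrr_range(): returns the GPA one byte
 * past the last byte the MTRR covers, i.e. an exclusive end. Assumes the
 * mask has every bit above the physical-address width already set.
 */
static uint64_t var_mtrr_end(uint64_t base, uint64_t mask)
{
	uint64_t start = base & PAGE_MASK;

	return (start | ~(mask & PAGE_MASK)) + 1;
}

/*
 * Clamp the half-open range [gfn_start, gfn_end) to one memslot, as the
 * quoted loop in kvm_zap_gfn_range does, and return the page count.
 */
static uint64_t pages_to_zap(const struct slot *slot, gfn_t gfn_start,
			     gfn_t gfn_end)
{
	gfn_t slot_end = slot->base_gfn + slot->npages;
	gfn_t start = gfn_start > slot->base_gfn ? gfn_start : slot->base_gfn;
	gfn_t end = gfn_end < slot_end ? gfn_end : slot_end;

	if (start >= end)
		return 0;	/* empty intersection: nothing is zapped */
	return end - start;
}
```

For a one-page slot at the APIC GFN (0xfee00), the end used in the patch, gpa_to_gfn(APIC_DEFAULT_PHYS_BASE + PAGE_SIZE) = 0xfee01, yields exactly one page, while start == end yields none. That is consistent with the end being exclusive rather than inclusive.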

The ~0ull case can be fixed by adding a helper to get the max GFN possible, e.g.
by stealing this code from kvm_tdp_mmu_put_root():

gfn_t max_gfn = 1ULL << (shadow_phys_bits - PAGE_SHIFT);

and maybe add a comment saying it intentionally ignores guest.MAXPHYADDR (from
CPUID) so that the helper can be used even when CPUID is being modified.
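A possible shape for such a helper, as a userspace sketch. The name `max_host_gfn()` is hypothetical, and `shadow_phys_bits` is modelled as a plain global with an example value of 46. The returned GFN is one past the highest host-mappable GFN, so it can serve directly as an exclusive end in place of ~0ull.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

#define PAGE_SHIFT 12

/* Stand-in for the host physical address width; 46 is only an example. */
static unsigned int shadow_phys_bits = 46;

/*
 * Hypothetical helper: the first GFN the host cannot map, usable as an
 * exclusive end. It intentionally ignores guest.MAXPHYADDR (from CPUID)
 * so it can be used even while CPUID is being modified.
 */
static gfn_t max_host_gfn(void)
{
	/* Keep the shift amount within range for a 64-bit value. */
	assert(shadow_phys_bits > PAGE_SHIFT && shadow_phys_bits <= 64);

	return 1ULL << (shadow_phys_bits - PAGE_SHIFT);
}
```

A "zap everything" caller would then pass 0 and max_host_gfn() instead of 0 and ~0ull, keeping every caller on the same exclusive-end convention.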

> There are 3 uses of this function.
> Two of them (kvm_post_set_cr0 and one case in update_mtrr) use 0,~0ULL which is indeed inclusive,
> but for variable mtrrs I see that in var_mtrr_range this code:
>
> *end = (*start | ~mask) + 1;
>
> and the *end is passed to kvm_zap_gfn_range.
>
>
> Another thing I noticed is that I added calls to kvm_inc_notifier_count/kvm_dec_notifier_count
> in kvm_zap_gfn_range, but these do seem to have non-inclusive ends, so
> sadly I need to fix them if this is the case.
> This depends on mmu_notifier_ops and it is not documented well.
>
> However, at least mmu_notifier_retry_hva does assume a non-inclusive range, since it checks
>
>
> hva >= kvm->mmu_notifier_range_start &&
> hva < kvm->mmu_notifier_range_end
>
>
> Also, looking at the algorithm of kvm_zap_gfn_range:
> suppose that gfn_start == gfn_end and we have a memslot with one page at gfn_start.
>
> Then:
>
>
> start = max(gfn_start, memslot->base_gfn); // start = memslot->base_gfn
> end = min(gfn_end, memslot->base_gfn + memslot->npages); // end = memslot->base_gfn
>
> if (start >= end)
> continue;
>
> In this case it seems that it will do nothing. So I suspect that kvm_zap_gfn_range
> actually needs a non-inclusive range, but because of how it has been used,
> this didn't cause trouble.
>
> Another thing I found in kvm_zap_gfn_range:
>
> kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
>
> But kvm_flush_remote_tlbs_with_address expects (struct kvm *kvm, u64 start_gfn, u64 pages)

Heh, surprise, surprise, a rare path with no architecturally visible effects is
busted :-)
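Given the (kvm, start_gfn, pages) convention quoted above, the fix is presumably to pass a page count rather than the end GFN. A userspace sketch follows. `stub_flush_remote_tlbs_with_address()` and `flush_zapped_range()` are hypothetical names. The stub only records its arguments, and `struct kvm` is left opaque.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t gfn_t;

struct kvm;	/* opaque here; never dereferenced */

/* Last flush request, recorded so the arguments can be checked. */
static uint64_t last_start_gfn, last_pages;

/* Stub with the (kvm, start_gfn, pages) convention quoted above. */
static void stub_flush_remote_tlbs_with_address(struct kvm *kvm,
						uint64_t start_gfn,
						uint64_t pages)
{
	(void)kvm;
	last_start_gfn = start_gfn;
	last_pages = pages;
}

/* Convert the half-open range [gfn_start, gfn_end) into (start, count). */
static void flush_zapped_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
{
	assert(gfn_end >= gfn_start);
	stub_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end - gfn_start);
}
```

With the APIC page range [0xfee00, 0xfee01), this requests a flush of one page starting at 0xfee00. Passing gfn_end directly would instead request 0xfee01 pages.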

> kvm_flush_remote_tlbs_with_address is also for some reason called twice with
> the same parameters.

It's called twice in the current code because mmu_lock is dropped between handling
the current MMU and the legacy MMU.

> Could you help with that? Am I missing something?