Re: [PATCH] KVM: MMU: make PTE_PREFETCH_NUM tunable

From: Sergey Senozhatsky
Date: Wed Oct 13 2021 - 01:52:29 EST


On (21/10/12 09:50), David Matlack wrote:
> On Tue, Oct 12, 2021 at 2:16 AM Sergey Senozhatsky
> <senozhatsky@xxxxxxxxxxxx> wrote:
> >
> > Turn PTE_PREFETCH_NUM into a module parameter, so that it
> > can be tuned per-VM.
>
> Module parameters do not allow per-VM tuning; they affect every VM on
> the machine.
>
> If you want per-VM tuning you could introduce a VM ioctl.

ACK.

> > ---
> > arch/x86/kvm/mmu/mmu.c | 31 ++++++++++++++++++++++---------
>
> Please also update the shadow paging prefetching code in
> arch/x86/kvm/mmu/paging_tmpl.h, unless there is a good reason to
> diverge.

ACK.

> > @@ -732,7 +734,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
> >
> > /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
> > r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
> > - 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
> > + 1 + PT64_ROOT_MAX_LEVEL + pte_prefetch_num);
>
> There is a sampling problem. What happens if the user changes
> pte_prefetch_num while a fault is being handled?

Good catch.

> > @@ -2753,20 +2755,29 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
> > struct kvm_mmu_page *sp,
> > u64 *start, u64 *end)
> > {
> > - struct page *pages[PTE_PREFETCH_NUM];
> > + struct page **pages;
> > struct kvm_memory_slot *slot;
> > unsigned int access = sp->role.access;
> > int i, ret;
> > gfn_t gfn;
> >
> > + pages = kmalloc_array(pte_prefetch_num, sizeof(struct page *),
> > + GFP_KERNEL);
>
> This code runs with the MMU lock held. In general we avoid doing any
> dynamic memory allocation while the MMU lock is held; that's why the
> memory caches exist. You can avoid
> allocating under a lock by allocating the prefetch array when the vCPU
> is first initialized. This would also solve the module parameter
> sampling problem because you can read it once and store it in struct
> kvm_vcpu.

I'll do per-vCPU pre-allocation, thanks. GFP_KERNEL allocation is less of
a problem while holding kvm->mmu_lock for read, but more so while holding
it for write.

> > static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
> > @@ -2785,10 +2798,10 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
> >
> > WARN_ON(!sp->role.direct);
> >
> > - i = (sptep - sp->spt) & ~(PTE_PREFETCH_NUM - 1);
> > + i = (sptep - sp->spt) & ~(pte_prefetch_num - 1);
>
> This code assumes pte_prefetch_num is a power of 2, which is now no
> longer guaranteed to be true.

It does. I can check that the value is a power of 2 in the ioctl handler.