Re: [PATCH] KVM: x86/mmu: Complete prefetch for trailing SPTEs for direct, legacy MMU

From: Lai Jiangshan
Date: Wed Aug 25 2021 - 18:49:52 EST


On Thu, Aug 19, 2021 at 7:57 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> Make a final call to direct_pte_prefetch_many() if there are "trailing"
> SPTEs to prefetch, i.e. SPTEs for GFNs following the faulting GFN. The
> call to direct_pte_prefetch_many() in the loop only handles the case
> where there are !PRESENT SPTEs preceding a PRESENT SPTE.
>
> E.g. if the faulting GFN is a multiple of 8 (the prefetch size) and all
> SPTEs for the following GFNs are !PRESENT, the loop will terminate with
> "start = sptep+1" and not prefetch any SPTEs.
>
> Prefetching trailing SPTEs as intended can drastically reduce the number
> of guest page faults, e.g. accessing the first byte of every 4kb page in
> a 6gb chunk of virtual memory, in a VM with 8gb of preallocated memory,
> the number of pf_fixed events observed in L0 drops from ~1.75M to <0.27M.
>
> Note, this only affects memory that is backed by 4kb pages as KVM doesn't
> prefetch when installing hugepages. Shadow paging prefetching is not
> affected as it does not batch the prefetches due to the need to process
> the corresponding guest PTE. The TDP MMU is not affected because it
> doesn't have prefetching, yet...
>
> Fixes: 957ed9effd80 ("KVM: MMU: prefetch ptes when intercepted guest #PF")
> Cc: Sergey Senozhatsky <senozhatsky@xxxxxxxxxx>
> Cc: Ben Gardon <bgardon@xxxxxxxxxx>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>
> Cc'd Ben as this highlights a potential gap with the TDP MMU, which lacks
> prefetching of any sort. For large VMs, which are likely backed by
> hugepages anyways, this is a non-issue as the benefits of holding mmu_lock
> for read likely masks the cost of taking more VM-Exits. But VMs with a
> small number of vCPUs won't benefit as much from parallel page faults,
> e.g. there's no benefit at all if there's a single vCPU.
>
> arch/x86/kvm/mmu/mmu.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a272ccbddfa1..daf7df35f788 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2818,11 +2818,13 @@ static void __direct_pte_prefetch(struct kvm_vcpu *vcpu,
> if (!start)
> continue;
> if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
> - break;
> + return;
> start = NULL;
> } else if (!start)
> start = spte;
> }
> + if (start)
> + direct_pte_prefetch_many(vcpu, sp, start, spte);
> }


Reviewed-by: Lai Jiangshan <jiangshanlai@xxxxxxxxx>

>
> static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>