Re: [PATCH 00/19] mm: Support huge pfnmaps
From: Oliver Upton
Date: Wed Aug 14 2024 - 19:28:14 EST
On Wed, Aug 14, 2024 at 01:54:04PM -0700, Sean Christopherson wrote:
> TL;DR: it's probably worth looking at mmu_stress_test (was: max_guest_memory_test)
> on arm64, specifically the mprotect() testcase[1], as performance is significantly
> worse compared to x86,
Sharing what we discussed offline:
Sean was using a machine w/o FEAT_FWB for this test, so the increased
runtime on arm64 is likely explained by the CMOs we're doing when
creating or invalidating a stage-2 PTE.
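
For context, here's a rough sketch (hand-written for this mail, not the
actual kvm/pgtable code) of where the extra cost comes from: without
FEAT_FWB, KVM can't force stage-2 accesses to be cacheable, so it has to
clean the data cache to the PoC (a CMO) for every page it maps, whereas
with FWB that maintenance is skipped entirely.

	/*
	 * Hypothetical helper, roughly what the !FWB path pays per page
	 * mapped at stage-2. Needs <asm/cpufeature.h> and
	 * <asm/cacheflush.h>.
	 */
	static void example_prep_stage2_page(void *va, size_t size)
	{
		/* FWB forces cacheable stage-2 attributes, no CMO needed */
		if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
			dcache_clean_inval_poc((unsigned long)va,
					       (unsigned long)va + size);

		/* ... then install the stage-2 PTE ... */
	}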
Using a machine w/ FEAT_FWB would be better for making this sort of
cross-architecture comparison. Beyond CMOs, we do have some
> and there might be bugs lurking in the mmu_notifier flows.
Impossible! :)
> Jumping back to mmap_lock, adding a lock, vma_lookup(), and unlock in x86's page
> fault path for valid VMAs does introduce a performance regression, but only ~30%,
> not the ~6x jump from x86 to arm64. So that too makes it unlikely that taking
> mmap_lock is the main problem, though it's still good justification for avoiding
> mmap_lock in the page fault path.
I'm curious how much of that 30% in a microbenchmark would translate to
real-world performance, since it isn't *that* egregious. We also have
other uses for getting at the VMA beyond mapping granularity (MTE and
the VFIO Normal-NC hint) that'd require some attention too.
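
To make that second point concrete, here's a rough sketch (hypothetical
code, not a quote of kvm/mmu.c) of the per-VMA information the arm64
fault path wants, which is also roughly the lock/lookup/unlock pattern
Sean measured on x86:

	/* Needs <linux/mm.h> and <linux/kvm_host.h>; error handling trimmed. */
	static int example_stage2_vma_checks(struct kvm *kvm, unsigned long hva,
					     bool *mte_allowed, bool *allow_normal_nc)
	{
		struct vm_area_struct *vma;

		mmap_read_lock(kvm->mm);
		vma = vma_lookup(kvm->mm, hva);
		if (!vma) {
			mmap_read_unlock(kvm->mm);
			return -EFAULT;
		}

		/* MTE: only map tags into the guest if the VMA allows them */
		*mte_allowed = !!(vma->vm_flags & VM_MTE_ALLOWED);

		/* VFIO hint: may this pfnmap be Normal-NC instead of Device? */
		*allow_normal_nc = !!(vma->vm_flags & VM_ALLOW_ANY_UNCACHED);

		mmap_read_unlock(kvm->mm);
		return 0;
	}

i.e. anything that drops mmap_lock from the fault path needs some other
way of getting at these per-VMA bits.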
--
Thanks,
Oliver