Re: [PATCH v7 00/18] mm: multi-gen LRU: Walk secondary MMU page tables while aging

From: James Houghton
Date: Mon Oct 14 2024 - 20:08:05 EST


On Mon, Oct 14, 2024 at 4:22 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Sep 26, 2024, James Houghton wrote:
> > This patchset makes it possible for MGLRU to consult secondary MMUs
> > while doing aging, not just during eviction. This allows for more
> > accurate reclaim decisions, which is especially important for proactive
> > reclaim.
>
> ...
>
> > James Houghton (14):
> > KVM: Remove kvm_handle_hva_range helper functions
> > KVM: Add lockless memslot walk to KVM
> > KVM: x86/mmu: Factor out spte atomic bit clearing routine
> > KVM: x86/mmu: Relax locking for kvm_test_age_gfn and kvm_age_gfn
> > KVM: x86/mmu: Rearrange kvm_{test_,}age_gfn
> > KVM: x86/mmu: Only check gfn age in shadow MMU if
> > indirect_shadow_pages > 0
> > mm: Add missing mmu_notifier_clear_young for !MMU_NOTIFIER
> > mm: Add has_fast_aging to struct mmu_notifier
> > mm: Add fast_only bool to test_young and clear_young MMU notifiers
>
> Per offline discussions, there's a non-zero chance that fast_only won't be needed,
> because it may be preferable to incorporate secondary MMUs into MGLRU, even if
> they don't support "fast" aging.
>
> What's the status on that front? Even if the status is "TBD", it'd be very helpful
> to let others know, so that they don't spend time reviewing code that might be
> completely thrown away.

The fast_only MMU notifier changes will probably be removed in v8.

ChromeOS folks found that the way MGLRU *currently* interacts with KVM
is problematic. That is, today, with the MM_WALK MGLRU capability
enabled, normal PTEs have their Accessed bits cleared both via the
page table scan during aging and via the rmap walk upon attempted
eviction, whereas KVM SPTEs only have their Accessed bits cleared via
the rmap walk at eviction time. So KVM SPTEs have their Accessed bits
cleared less frequently than normal PTEs, and the pages they map
therefore appear younger than they really are.
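
Spelled out a bit more concretely (a rough sketch only, not the actual
mm/vmscan.c / mm/rmap.c call sites), the asymmetry is between the
primary-MMU-only young clearing done by the aging walk and the _notify
variant used by the eviction-time rmap walk:

#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/pgtable.h>

/*
 * MGLRU aging walk (MM_WALK): clears young on the primary PTE only,
 * so KVM SPTEs keep their Accessed bits.
 */
static int aging_walk_clear_young(struct vm_area_struct *vma,
				  unsigned long addr, pte_t *pte)
{
	return ptep_test_and_clear_young(vma, addr, pte);
}

/*
 * Eviction-time rmap walk: the _notify variant also invokes the
 * clear_young MMU notifier, so this is currently the only point at
 * which KVM SPTEs get aged.
 */
static int eviction_rmap_clear_young(struct vm_area_struct *vma,
				     unsigned long addr, pte_t *pte)
{
	return ptep_clear_young_notify(vma, addr, pte);
}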

It turns out that this causes tab-open latency regressions on ChromeOS
when a significant amount of memory is being used by a VM. IIUC, the
fix for this is to have MGLRU age SPTEs as often as it ages normal
PTEs; i.e., it should call the clear_young/test_young MMU notifiers
each time it clears Accessed bits on PTEs. The final patch in this
series sort of does this, but instead of calling the new fast_only
notifier, we need to call the normal test/clear_young() notifiers
regardless of how fast they are.
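
In other words, very roughly something like the following in the aging
walk (a hypothetical helper just to illustrate the direction, not the
actual v8 patch):

#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/pgtable.h>

/*
 * Hypothetical helper: whenever the aging walk clears the Accessed bit
 * on a primary PTE, also call the normal (not fast_only) clear_young
 * notifier so that secondary MMU mappings (e.g. KVM SPTEs) age at the
 * same rate as the primary PTEs.
 */
static int lru_gen_clear_young(struct vm_area_struct *vma,
			       unsigned long addr, pte_t *pte)
{
	int young = ptep_test_and_clear_young(vma, addr, pte);

	young |= mmu_notifier_clear_young(vma->vm_mm, addr,
					  addr + PAGE_SIZE);
	return young;
}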

This also means that the MGLRU changes no longer depend on the KVM
optimizations, as they can be motivated independently.

Yu, have I gotten anything wrong here? Do you have any more details to share?