Re: [RFC PATCH v2 6/9] mm: provide anon locality evidence for zswap large swapin
From: Fujunjie
Date: Sun May 31 2026 - 09:13:52 EST
On 5/30/2026 3:22 AM, Nhat Pham wrote:
> On Fri, May 29, 2026 at 5:19 AM fujunjie <fujunjie1@xxxxxx> wrote:
>>
>> The common zswap large-swapin policy needs locality evidence from
>> callers before it can admit a large folio. For anonymous faults, provide
>> that evidence from existing VMA hints and from the PTE young state left
>> by earlier zswap-backed large swapins.
>>
>> Keep non-faulting PTEs old when mapping a speculative all-zswap large
>> folio. A later fault can then require a dense young previous range before
>> admitting another large swapin without adding VMA state.
>
> Makes sense to me.
>
>>
>> This also removes the old zswap-enabled guard from the THP swapin
>> candidate scan. The common swapin path now classifies the backend range
>> and falls back to order-0 for mixed zswap/disk ranges or races.
>>
>> Signed-off-by: fujunjie <fujunjie1@xxxxxx>
>> ---
>> mm/memory.c | 234 +++++++++++++++++++++++++++++++++++++++++++-----
>> mm/swap.h | 6 ++
>> mm/swap_state.c | 15 ++++
>> 3 files changed, 235 insertions(+), 20 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 92a82008d583..7bbb89632000 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4556,6 +4556,35 @@ static void memcg1_swapin_retry_folio(struct folio *folio,
>> folio_unlock(folio);
>> }
>>
>> +static void set_swapin_ptes(struct vm_area_struct *vma,
>> + unsigned long address, pte_t *ptep, pte_t pte,
>> + unsigned int nr_pages, unsigned int fault_pte_idx,
>> + bool fault_only_young)
>> +{
>> + struct mm_struct *mm = vma->vm_mm;
>> + pte_t old_pte;
>> +
>> + if (!fault_only_young || nr_pages == 1) {
>> + set_ptes(mm, address, ptep, pte, nr_pages);
>> + return;
>> + }
>> +
>> + old_pte = pte_mkold(pte);
>> + if (fault_pte_idx)
>> + set_ptes(mm, address, ptep, old_pte, fault_pte_idx);
>> +
>> + set_pte_at(mm, address + fault_pte_idx * PAGE_SIZE,
>> + ptep + fault_pte_idx,
>> + pte_mkyoung(pte_advance_pfn(pte, fault_pte_idx)));
>
> Hmm, does this mean that without THP swapin, the faulting PTE is not
> marked young, but it is marked young if there is a THP swapin. That's
> a behavioral change right? Would this throw off other heuristics
> relying on this bit, or any justification that this is fine?

Thanks.

The intent was not to make the faulting PTE behave differently from the
normal swapin path. In do_swap_page() we first build the PTE with:

pte = mk_pte(page, vma->vm_page_prot);

On the common architectures I checked, the normal user pgprot already
contains the accessed/young bit. For example, arm64 PAGE_SHARED/PAGE_READONLY
are based on _PAGE_DEFAULT, which includes PTE_AF, and the x86 user page
protections also include the accessed bit (_PAGE_ACCESSED). So in practice
the faulting PTE is already young after mk_pte() there, which means the
default path already marks it young.

What I actually wanted here is only to keep the speculative neighbouring PTEs
old. A large zswapin may install PTEs for pages that did not fault, and those
should not all look accessed just because mk_pte() produced a young PTE.

However, the explicit pte_mkyoung() on the faulting PTE makes it look as if
THP swapin adds new behavior. I will rework this in a way that is less
ambiguous.