Re: [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios

From: Kairui Song

Date: Thu Jun 25 2026 - 10:42:46 EST

On Thu, Jun 25, 2026 at 05:14:45AM +0800, Barry Song wrote:
> On Wed, Jun 24, 2026 at 11:20 PM David Hildenbrand (Arm)
> <david@xxxxxxxxxx> wrote:
> >
> > On 6/24/26 01:16, Barry Song (Xiaomi) wrote:
> > > Originally, we unconditionally called lru_add_drain() for write
> > > swap-in page faults. This might drop the reference held by the per-CPU
> > > LRU cache if the folio happened to reside there. However, there was no
> > > guarantee that the folio was actually cached on the current CPU.
> > >
> > > Now that lru_add_drain() has been removed, we have lost one
> > > opportunity to drop a reference held by the LRU cache. We could
> > > instead incorporate that possibility into the condition evaluated by
> > > should_try_to_free_swap().
> > >
> > > Suggested-by: Kairui Song <ryncsn@xxxxxxxxx>
> > > Signed-off-by: Barry Song (Xiaomi) <baohua@xxxxxxxxxx>
> > > ---
> > > mm/memory.c | 5 ++++-
> > > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 2983a6baf474..14577c67c61a 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -5087,8 +5087,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > > * Remove the swap entry and conditionally try to free up the swapcache.
> > > * Do it after mapping, so raced page faults will likely see the folio
> > > * in swap cache and wait on the folio lock.
> > > + * Assume non-LRU folios may be queued in the LRU cache, which contributes
> > > + * an additional reference to the folio.
> > > */
> > > - if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
> > > + if (should_try_to_free_swap(si, folio, vma, nr_pages +
> > > + !folio_test_lru(folio), vmf->flags))
> > > folio_free_swap(folio);
> > >
> > > folio_unlock(folio);
> >
> > Hm, in wp_can_reuse_anon_folio() we'll try dropping the swapcache ourselves.
> >
> > So I wonder if we still need that handling ("If we want to map a page that's in
> > the swapcache writable, we ...") at all?
> >
> >
> > Ahh, I see the problem now:
> >
> > commit 4b34f1d82c6549837b2061096dea249e881a4495
> > Author: Kairui Song <kasong@xxxxxxxxxxx>
> > Date: Sat Dec 20 03:43:35 2025 +0800
> >
> > mm, swap: free the swap cache after folio is mapped
> >
> > Currently, we remove the folio from the swap cache and free the swap cache
> > before mapping the PTE. To reduce repeated faults due to parallel swapins
> > of the same PTE, change it to remove the folio from the swap cache after
> > it is mapped. So new faults from the swap PTE will be much more likely to
> > see the folio in the swap cache and wait on it.
> >
> > This does not eliminate all swapin races: an ongoing swapin fault may
> > still see an empty swap cache. That's harmless, as the PTE is changed
> > before the swap cache is cleared, so it will just return and not trigger
> > any repeated faults. This does help to reduce the chance.
> >
> >
> > That changed that behavior such that we *must* now always fallback to do_wp_page().
> >
> > What a mess (I didn't ack)

That's awkward, my bad, terribly sorry about this :(

I sincerely apologize for the oversight and the trouble this has caused.

I haven't seen any performance regression in any workload recently
though, or any correctness issue, perhaps the round trip of
do_wp_page wasn't that bad. It should still catch the reuse folios,
just more costly than doing things in-place.

I think we should restore the original check first. We might also want to
avoid dropping the swap cache if the folio will not be reused, which
was discussed here:

https://lore.kernel.org/linux-mm/CAMgjq7BDfvNXdWH0cqarsujjUn3i3tDDhDkmSg01TR4h-tDorQ@xxxxxxxxxxxxxx/

Maybe extracting some common part into a helper can help make this
cleaner.

>
> So it seems we need a patch before patch 4 to fix the missed
> swapcache freeing on write faults, because the write bit has already
> been cleared by the time the folio is mapped and reused?
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 14577c67c61a..e8fc5a86f6ea 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4752,6 +4752,7 @@ static void check_swap_exclusive(struct folio
> *folio, swp_entry_t entry,
> */
> vm_fault_t do_swap_page(struct vm_fault *vmf)
> {
> + enum fault_flag fault_flags = vmf->flags;
> struct vm_area_struct *vma = vmf->vma;
> struct folio *swapcache = NULL, *folio;
> struct page *page;
> @@ -5091,7 +5092,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> * an additional reference to the folio.
> */
> if (should_try_to_free_swap(si, folio, vma, nr_pages +
> - !folio_test_lru(folio), vmf->flags))
> + !folio_test_lru(folio), fault_flags))
> folio_free_swap(folio);
>
> folio_unlock(folio);
>
>
> Best Regards
> Barry

Hi Barry,

The problem is more than that, the `exclusive || folio_ref_count(folio) == 1`
in do_swap_page is also ineffective now.

But that earlier commit 4b34f1d82c65 is kind of meaningless after recent
refactor of swapin. It was trying to avoid redundant folio allocation
and thrashing from concurrent faulting process of the same PTE, but recent
swap table rework already gated that with unified folio allocation. We
are freeing the swap entry first here, that is effective enough so that
commit is no longer needed.

So perhaps we can first just revert it, how do you think?: