Re: [PATCH v2 4/4] mm: try to free swapcache for non-LRU folios

From: Barry Song

Date: Mon Jun 29 2026 - 19:59:27 EST

On Sat, Jun 27, 2026 at 12:36 AM David Hildenbrand (Arm)
<david@xxxxxxxxxx> wrote:
>
> >>> That changed that behavior such that we *must* now always fallback to do_wp_page().
> >>>
> >>> What a mess (I didn't ack)
> >
> > That's awkward, my bad, terribly sorry about this :(
> >
> > I sincerely apologize for the oversight and the trouble this has caused.
> >
>
> All good, I was just surprised to see a previous optimization partially
> reverted without a clear reasoning :)
>
> Because it should have removed the handling in should_try_to_free_swap() as well.
>
> It's good that we are discussing it now!
>
> > I haven't seen any performance regression in any workload recently
> > though, or any correctness issue, perhaps the round trip of
> > do_wp_page wasn't that bad. It should still catch the reuse folios,
> > just more costly than doing things in-place.
>
> Right, do_wp_page() handles it, after the page was mapped. It adds some overhead,
> but fortunately no TLB flush if we're just upgrading write permissions.
>
> The optimization dates back to pre PageAnonExclusive handling.
>
> >
> > I think we should restore the original check first. We might also want to
> > avoid dropping the swap cache if the folio will not be reused, which
> > was discussed here:
> >
> > https://lore.kernel.org/linux-mm/CAMgjq7BDfvNXdWH0cqarsujjUn3i3tDDhDkmSg01TR4h-tDorQ@xxxxxxxxxxxxxx/
> >
> > Maybe extracting some common part into a helper can help make this
> > cleaner.
> >
> >>
> >> So it seems we need a patch before patch 4 to fix the missed
> >> swapcache freeing on write faults, because the write bit has already
> >> been cleared by the time the folio is mapped and reused?
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 14577c67c61a..e8fc5a86f6ea 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -4752,6 +4752,7 @@ static void check_swap_exclusive(struct folio
> >> *folio, swp_entry_t entry,
> >> */
> >> vm_fault_t do_swap_page(struct vm_fault *vmf)
> >> {
> >> + enum fault_flag fault_flags = vmf->flags;
> >> struct vm_area_struct *vma = vmf->vma;
> >> struct folio *swapcache = NULL, *folio;
> >> struct page *page;
> >> @@ -5091,7 +5092,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >> * an additional reference to the folio.
> >> */
> >> if (should_try_to_free_swap(si, folio, vma, nr_pages +
> >> - !folio_test_lru(folio), vmf->flags))
> >> + !folio_test_lru(folio), fault_flags))
> >> folio_free_swap(folio);
> >>
> >> folio_unlock(folio);
> >>
> >>
> >> Best Regards
> >> Barry
> >
> > Hi Barry,
> >
> > The problem is more than that, the `exclusive || folio_ref_count(folio) == 1`
> > in do_swap_page is also ineffective now.
>
> Exactly.
>
> If the roundtrip through do_wp_page() is good enough today, we can just do
>
> @@ -4512,7 +4516,6 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
> static inline bool should_try_to_free_swap(struct swap_info_struct *si,
> struct folio *folio,
> struct vm_area_struct *vma,
> - unsigned int extra_refs,
> unsigned int fault_flags)
> {
> if (!folio_test_swapcache(folio))
> @@ -4528,14 +4531,7 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
> if (mem_cgroup_swap_full(folio) || (vma->vm_flags & VM_LOCKED) ||
> folio_test_mlocked(folio))
> return true;
> - /*
> - * If we want to map a page that's in the swapcache writable, we
> - * have to detect via the refcount if we're really the exclusive
> - * user. Try freeing the swapcache to get rid of the swapcache
> - * reference only in case it's likely that we'll be the exclusive user.
> - */
> - return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
> - folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
> + return false;
> }
>
> static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
> @@ -5095,7 +5091,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> * Do it after mapping, so raced page faults will likely see the folio
> * in swap cache and wait on the folio lock.
> */
> - if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
> + if (should_try_to_free_swap(si, folio, vma, vmf->flags))
> folio_free_swap(folio);
>
> folio_unlock(folio);
>
> But then, one question is whether we'd actually want to try removing the swapcache when
> we mapped the page writable (iow: exclusive)?

I guess this question comes from my earlier commit c18160dba5ff
("mm: swap: reuse exclusive folio directly instead of wp page faults"),
where the folio is reused even for READ faults. In that case, we
would miss do_wp_page(), which could later drop the swapcache?
Also, again, nobody has reported any regression for this.

Holding swapcache for a clean folio for non-sync swap I/O has the
benefit of avoiding a potential pageout(). Now, reuse even for read
faults and always dropping swapcache seems to somewhat defeat that
benefit. On the other hand, we always drop swapcache for sync I/O
to avoid the copy in zRAM or elsewhere consuming memory, so it seems
safe enough to always enable reuse in do_swap_page() for sync I/O.

>
>
> @@ -4512,7 +4516,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
> static inline bool should_try_to_free_swap(struct swap_info_struct *si,
> struct folio *folio,
> struct vm_area_struct *vma,
> - unsigned int extra_refs,
> + bool exclusive,
> unsigned int fault_flags)
> {
> if (!folio_test_swapcache(folio))
> @@ -4529,13 +4533,11 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
> folio_test_mlocked(folio))
> return true;
> /*
> - * If we want to map a page that's in the swapcache writable, we
> - * have to detect via the refcount if we're really the exclusive
> - * user. Try freeing the swapcache to get rid of the swapcache
> - * reference only in case it's likely that we'll be the exclusive user.
> + * We have an exclusive page that was mapped writable or will soon
> + * be mapped writable (as we are in a write fault). Let's just try
> + * to reclaim swap immediately.
> */
> - return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
> - folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
> + return (fault_flags & FAULT_FLAG_WRITE) && exclusive;

I assume you mean (fault_flags & FAULT_FLAG_WRITE) || exclusive, or
we can just remove (fault_flags & FAULT_FLAG_WRITE) and use
"exclusive" instead since we are always using do_wp_page() now,
and FAULT_FLAG_WRITE in fault_flags could have been cleared
by the reuse of do_swap_page().

Best Regards
Barry