Re: [RFC PATCH 3/3] mm: entirely remove lru_add_drain in do_swap_page

From: Kairui Song

Date: Thu Jun 18 2026 - 03:31:37 EST

On Wed, Jun 17, 2026 at 5:55 PM Barry Song <baohua@xxxxxxxxxx> wrote:
>
> On Wed, Jun 17, 2026 at 1:38 PM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> >
> > On Thu, Jun 11, 2026 at 6:52 PM Barry Song (Xiaomi) <baohua@xxxxxxxxxx> wrote:
> > >
> > > We are doing a lot of redundant lru_add_drain() calls in
> > > do_swap_page(), especially for synchronous I/O devices. For
> > > example, the test program below currently ends up draining
> > > lru_cache 100% of the time:
> > >
> > > int main(int argc, char *argv[])
> > > {
> > > int i;
> > > #define SIZE 100*1024*1024
> > > while(1) {
> > > volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
> > > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > >
> > > for (int i = 0; i < SIZE/sizeof(int); i++)
> > > p[i] = i%64;
> > > madvise((void *)p, SIZE, MADV_PAGEOUT);
> > > for (int i = 0; i < SIZE/sizeof(int); i++)
> > > p[i] = i%64;
> > > munmap(p, SIZE);
> > > }
> > > return 0;
> > > }
> > >
> > > Folio reuse now relies primarily on the exclusive hint, making
> > > lru_cache draining to drop the refcount in lru_cache largely
> > > irrelevant.
> > >
> > > Signed-off-by: Barry Song (Xiaomi) <baohua@xxxxxxxxxx>
> > > ---
> > > mm/memory.c | 10 ----------
> > > 1 file changed, 10 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index ce8ef27e7a54..b5a78670bcc8 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -4903,16 +4903,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > > } else if (folio != swapcache)
> > > page = folio_page(folio, 0);
> > >
> > > - /*
> > > - * If we want to map a page that's in the swapcache writable, we
> > > - * have to detect via the refcount if we're really the exclusive
> > > - * owner. Try removing the extra reference from the local LRU
> > > - * caches if required.
> > > - */
> > > - if ((vmf->flags & FAULT_FLAG_WRITE) &&
> > > - !folio_test_ksm(folio) && !folio_test_lru(folio))
> > > - lru_add_drain();
> > > -
> > > folio_throttle_swaprate(folio, GFP_KERNEL);
> > >
> > > /*
> > > --
> > > 2.39.3 (Apple Git-146)
> > >
> > >
> >
> > There is a ref check in should_try_to_free_swap though, perhaps we
> > should also improve that part too or we may have more more folios
> > stuck in swap cache after this change?
>
> Good catch. We could assume that a !LRU folio has an extra reference held
> by lru_cache. Does the below make sense to you?
>
> diff --git a/mm/memory.c b/mm/memory.c
> index ff338c2abe92..237220ae1572 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5094,8 +5094,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> * Remove the swap entry and conditionally try to free up the swapcache.
> * Do it after mapping, so raced page faults will likely see the folio
> * in swap cache and wait on the folio lock.
> + * Assume that lru_cache contributes an extra reference for non-LRU
> + * folios.
> */
> - if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
> + if (should_try_to_free_swap(si, folio, vma, nr_pages +
> + !folio_test_lru(folio), vmf->flags))
> folio_free_swap(folio);
>
> folio_unlock(folio);

Right, this seems better. I've been thinking if we can just remove
that ref check in should_try_to_free_swap and always return true to
reclaim the cache for write faults. However, that seems like a bad
idea after all since shared clean swapin folios on non-sync devices
might benefit significantly from being cached, and such cases are
unlikely to be very rare.

Just one thing: I think maybe we can have a more common helper to be
shared by `wp_can_reuse_anon_folio` and `should_try_to_free_swap`.
Since the cache reclaim in `should_try_to_free_swap` is meant to help
`wp_can_reuse_anon_folio` (IIUC), keeping them more consistent seems
easier to maintain and understand. Trivial though.

It's fine this way too, it's technically correct and this is only a
heuristic check which doesn't effect correctness.