Re: [PATCH] mm/swap_state: remove unnecessary lru_add_drain() from readahead

From: Barry Song

Date: Tue Jun 09 2026 - 04:04:19 EST

On Mon, Jun 8, 2026 at 10:33 PM Usama Arif <usama.arif@xxxxxxxxx> wrote:
>
> swap_cluster_readahead() and swap_vma_readahead() end the readahead
> loop with an explicit lru_add_drain() call. That drain is a leftover
> from 2.6.12 era code and serves no functional purpose for the callers:
>
> - do_swap_page() ignores LRU residency for the readahead folios;
> it only needs the target folio it called swapin_readahead() for,
> and if the write-fault path needs the target folio on the LRU to count
> references accurately, it runs its own lru_add_drain() at the
> wp_can_reuse_anon_folio() and do_swap_page() sites.

Right. I can see the following in do_swap_page():

	/*
	 * If we want to map a page that's in the swapcache writable, we
	 * have to detect via the refcount if we're really the exclusive
	 * owner. Try removing the extra reference from the local LRU
	 * caches if required.
	 */
	if ((vmf->flags & FAULT_FLAG_WRITE) &&
	    !folio_test_ksm(folio) && !folio_test_lru(folio))
		lru_add_drain();

and the following in wp_can_reuse_anon_folio():

	if (!folio_test_lru(folio))
		/*
		 * We cannot easily detect+handle references from
		 * remote LRU caches or references to LRU folios.
		 */
		lru_add_drain();
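
To spell out why both sites drain before looking at the refcount, here is a toy userspace model, not kernel code. Every name in it (toy_folio, toy_lru_add_drain, toy_can_reuse and so on) is made up for illustration, and it assumes the simplified rule that a folio sitting in the local LRU cache carries one extra reference which the drain drops:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
struct toy_folio {
	int refcount;		/* total references held on the folio */
	bool on_lru;		/* already moved to the LRU list */
	bool in_local_batch;	/* still queued in this CPU's LRU cache */
	bool in_swapcache;	/* swap cache holds one reference */
};

/* Assumed behaviour: draining moves the folio to the LRU and drops
 * the extra reference that the local cache was holding. */
static void toy_lru_add_drain(struct toy_folio *f)
{
	if (!f->in_local_batch)
		return;
	f->in_local_batch = false;
	f->on_lru = true;
	f->refcount--;
	assert(f->refcount >= 1);
}

/* Exclusivity test with no drain: the cached reference hides the owner. */
static bool toy_exclusive_no_drain(const struct toy_folio *f)
{
	return f->refcount == 1 + (f->in_swapcache ? 1 : 0);
}

/* Drain first when the folio is not on the LRU, then test the refcount. */
static bool toy_can_reuse(struct toy_folio *f)
{
	if (!f->on_lru)
		toy_lru_add_drain(f);
	return toy_exclusive_no_drain(f);
}
```

In this model a folio with one mapping reference, one swap cache reference and one local-cache reference looks shared until the drain runs, which is the case both quoted snippets handle on their own.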

>
> - shmem_swapin_cluster() immediately locks the returned folio, waits
> for writeback, then operates on it - LRU residency of either the target
> or the readahead folios is irrelevant.
>
> - try_to_unuse() likewise locks the folio and calls unuse_pte() without
> depending on LRU presence.
>
> Folios newly added to the swap cache by the readahead loop sit in
> the per-CPU LRU folio_batch and will be drained naturally as the
> batch fills (FOLIO_BATCH_SIZE), by the next reclaim/compaction
> lru_add_drain_all() and so on. The unconditional drain only
> synchronously flushes a partial batch and forces contention on
> lruvec_lock.
>
> On a 176-CPU production host running a memory-pressured workload, this
> path was observed to call folio_batch_move_lru() from
> swap_cluster_readahead() ~28K/min, a very large source of LRU lock
> traffic.
>

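FWIW, the batching argument above can be sketched with a toy userspace model, not kernel code. All names are hypothetical, the batch capacity of 31 and the 8 folios per readahead call are assumptions picked for illustration, and each flush is modelled as exactly one lock acquisition, which is simpler than what the kernel really does:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names and sizes are assumptions, not kernel values. */
#define TOY_BATCH_SIZE 31

struct toy_batch {
	size_t nr;		/* folios currently queued */
	unsigned long locks;	/* modelled lruvec lock acquisitions */
};

/* Flush the queued folios; a non-empty flush takes the lock once. */
static void toy_drain(struct toy_batch *b)
{
	if (b->nr == 0)
		return;
	b->locks++;
	b->nr = 0;
}

/* Queue one folio and flush only when the batch becomes full. */
static void toy_add(struct toy_batch *b)
{
	assert(b->nr < TOY_BATCH_SIZE);
	b->nr++;
	if (b->nr == TOY_BATCH_SIZE)
		toy_drain(b);
}

/*
 * Run 'calls' readahead calls that each queue 'per_call' folios and
 * return how often the lock was taken. With drain_each_call set, every
 * call ends with an unconditional flush of the partial batch.
 */
static unsigned long toy_simulate(unsigned long calls, size_t per_call,
				  int drain_each_call)
{
	struct toy_batch b = { 0, 0 };
	unsigned long i;
	size_t j;

	for (i = 0; i < calls; i++) {
		for (j = 0; j < per_call; j++)
			toy_add(&b);
		if (drain_each_call)
			toy_drain(&b);
	}
	return b.locks;
}
```

With these assumed numbers, toy_simulate(1000, 8, 1) takes the lock 1000 times while toy_simulate(1000, 8, 0) takes it 258 times. That only counts lock acquisitions in a model; it says nothing about real workload behaviour.
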
Do we see a workload improvement? If so, could you include the data?

Thanks
Barry