Re: [PATCH v2] mm: annotate data-race in cpu_needs_drain() and need_mlock_drain()

From: Pedro Falcato

Date: Thu Jun 25 2026 - 05:31:50 EST


On Thu, Jun 25, 2026 at 02:51:53PM +0800, Xuewen Wang wrote:
> KCSAN reports a data-race when cpu_needs_drain() reads another CPU's
> per-cpu folio_batch->nr without locking, while the owning CPU writes
> to it via folio_batch_add(). The same race exists in need_mlock_drain()
> which is called from cpu_needs_drain().
>
> Reading a slightly stale value is harmless -- cpu_needs_drain() only
> decides whether to schedule a drain, and the next iteration of
> __lru_add_drain_all() will re-check.
>
> All other callers of folio_batch_count() either use stack variables or
> access their own CPU's per-cpu data where no race exists, so
> data_race() is added at the call sites rather than in
> folio_batch_count() itself to avoid suppressing KCSAN warnings for
> future callers that may have real bugs.
>
> Signed-off-by: Xuewen Wang <wangxuewen@xxxxxxxxxx>
> ---
> Changes in v2:
> - Use data_race() instead of READ_ONCE() in folio_batch_count(), as
> suggested by Lorenzo. READ_ONCE() is unnecessary for a single-byte
> read and imposes overhead on all callers, most of which have no race.
> - Move the annotation from folio_batch_count() to the actual call sites
> (cpu_needs_drain() and need_mlock_drain()) where the cross-CPU race
> occurs, rather than affecting all callers.
> - Add need_mlock_drain() which has the same cross-CPU race.
> - Add comments explaining why the data race is safe.
> v1:
> https://lore.kernel.org/all/20260624092606.1083449-1-wangxuewen@xxxxxxxxxx/
> ---
> mm/mlock.c | 2 +-
> mm/swap.c | 12 ++++++------
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 8c227fefa2df..fbdb5018e2c3 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -232,7 +232,7 @@ void mlock_drain_remote(int cpu)
>
>  bool need_mlock_drain(int cpu)
>  {
> -	return folio_batch_count(&per_cpu(mlock_fbatch.fbatch, cpu));
> +	return data_race(folio_batch_count(&per_cpu(mlock_fbatch.fbatch, cpu)));
>  }
>
> /**
> diff --git a/mm/swap.c b/mm/swap.c
> index 588f50d8f1a8..d046428caed6 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -828,12 +828,12 @@ static bool cpu_needs_drain(unsigned int cpu)
>  	struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);
>
>  	/* Check these in order of likelihood that they're not zero */
> -	return folio_batch_count(&fbatches->lru_add) ||
> -		folio_batch_count(&fbatches->lru_move_tail) ||
> -		folio_batch_count(&fbatches->lru_deactivate_file) ||
> -		folio_batch_count(&fbatches->lru_deactivate) ||
> -		folio_batch_count(&fbatches->lru_lazyfree) ||
> -		folio_batch_count(&fbatches->lru_activate) ||
> +	return data_race(folio_batch_count(&fbatches->lru_add)) ||
> +		data_race(folio_batch_count(&fbatches->lru_move_tail)) ||
> +		data_race(folio_batch_count(&fbatches->lru_deactivate_file)) ||
> +		data_race(folio_batch_count(&fbatches->lru_deactivate)) ||
> +		data_race(folio_batch_count(&fbatches->lru_lazyfree)) ||
> +		data_race(folio_batch_count(&fbatches->lru_activate)) ||
>  		need_mlock_drain(cpu) ||
>  		has_bh_in_lru(cpu, NULL);
>  }

eww.

How about:

static bool cpu_needs_drain(unsigned int cpu)
{
	struct cpu_fbatches *fbatches = &per_cpu(cpu_fbatches, cpu);

	/* Check these in order of likelihood that they're not zero */
	return data_race(
		folio_batch_count(&fbatches->lru_add) ||
		folio_batch_count(&fbatches->lru_move_tail) ||
		folio_batch_count(&fbatches->lru_deactivate_file) ||
		folio_batch_count(&fbatches->lru_deactivate) ||
		folio_batch_count(&fbatches->lru_lazyfree) ||
		folio_batch_count(&fbatches->lru_activate) ||
		need_mlock_drain(cpu)) ||
		has_bh_in_lru(cpu, NULL);
}

this should work equally well, while being far more aesthetically pleasing :)
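
To illustrate why the two forms should behave the same, here is a rough
userspace sketch (not kernel code). It assumes data_race() has the shape
"disable checking, evaluate the expression, re-enable checking"; the stub
macro, struct batch, batch_count() and all the counters are hypothetical
stand-ins made up for this example, and it needs GNU C statement
expressions (gcc or clang).

```c
#include <stdbool.h>

/* Hypothetical bookkeeping, not kernel symbols. */
static int race_check_disabled;	/* non-zero while inside the stub macro */
static int annotations_entered;	/* how many times the stub macro ran */
static int total_reads;		/* all reads of a batch count */
static int unchecked_reads;	/* reads done while checking was disabled */

/*
 * Stub with the assumed shape of data_race(): disable checking,
 * evaluate the expression once, re-enable checking, yield the value.
 * __typeof__ does not evaluate its operand here.
 */
#define data_race(expr)				\
({						\
	race_check_disabled++;			\
	annotations_entered++;			\
	__typeof__(expr) __v = (expr);		\
	race_check_disabled--;			\
	__v;					\
})

/* Stand-in for a folio_batch: only the one-byte count matters here. */
struct batch {
	unsigned char nr;
};

/* Stand-in for folio_batch_count(); records whether the read was covered. */
static unsigned int batch_count(const struct batch *b)
{
	total_reads++;
	if (race_check_disabled)
		unchecked_reads++;
	return b->nr;
}

/* v2 style: one annotation per read. */
static bool needs_drain_per_call(const struct batch *a,
				 const struct batch *b,
				 const struct batch *c)
{
	return data_race(batch_count(a)) ||
		data_race(batch_count(b)) ||
		data_race(batch_count(c));
}

/* Suggested style: a single annotation around the whole chain. */
static bool needs_drain_wrapped(const struct batch *a,
				const struct batch *b,
				const struct batch *c)
{
	return data_race(
		batch_count(a) ||
		batch_count(b) ||
		batch_count(c));
}
```

In this sketch both helpers return the same result, stop at the first
non-zero count (the || short-circuit is untouched), and perform every read
with checking disabled. The only difference is that the wrapped form enters
the annotation once instead of once per read.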

> --
> 2.25.1
>

--
Pedro