Re: [PATCH v1] mm: fix data-race in folio_batch_count()
From: Lorenzo Stoakes
Date: Wed Jun 24 2026 - 10:24:06 EST
On Wed, Jun 24, 2026 at 05:26:06PM +0800, Xuewen Wang wrote:
> KCSAN reports:
>
> BUG: KCSAN: data-race in __lru_add_drain_all / folio_batch_add_and_move
Where? A syzbot report? A local run? Please specify.
>
> write to 0xffff98fe74c015f8 of 1 bytes by task 45153 on cpu 2:
> folio_batch_add+0x30/0xe0
>
> read to 0xffff98fe74c015f8 of 1 bytes by task 45175 on cpu 0:
> folio_batch_count+0x0/0x10
> cpu_needs_drain+0x253/0x430
>
> The write side is a per-cpu local operation (folio_batch_add on the
> CPU that owns the per-cpu batch), while cpu_needs_drain() reads
> another CPU's per-cpu batch without locking. Reading a slightly stale
> value is harmless -- it only determines whether to schedule a drain,
Then why are we adding a READ_ONCE() in such a core helper?
> and a subsequent check will catch it.
Where? Which check? Be specific.
>
> Use READ_ONCE() to annotate the read and prevent load tearing, which
> also suppresses the KCSAN warning.
Tearing on a single byte? Which architecture tears a single byte?
I think you're actually more concerned about the value being optimised out on
assumption of it not being updated elsewhere right?
But acutally you're not, because everybody else uses a stack variable or _their
own_ per-CPU value?
Only cpu_needs_drain() is the odd one out right?
>
> Signed-off-by: Xuewen Wang <wangxuewen@xxxxxxxxxx>
> ---
> include/linux/folio_batch.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/folio_batch.h b/include/linux/folio_batch.h
> index b45946adc50b..1e31e058e19d 100644
> --- a/include/linux/folio_batch.h
> +++ b/include/linux/folio_batch.h
> @@ -53,7 +53,7 @@ static inline void folio_batch_reinit(struct folio_batch *fbatch)
>
> static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
> {
> - return fbatch->nr;
> + return READ_ONCE(fbatch->nr);
This isn't free, you're breaking optimisations here by doing that...
It feels like the wrong level of abstraction, but actually I think every other
case is either stack or per-CPU _on its own CPU_ (please check), in which case
we _can_ suppress the check here but I think best done with data_race().
And see the kernel bug bot report, you need to add:
#include <linux/compiler.h>
Too for that.
> }
>
> static inline unsigned int folio_batch_space(const struct folio_batch *fbatch)
> --
> 2.25.1
>
Thanks, Lorenzo