Re: [PATCH v1] mm: fix data-race in folio_batch_count()

From: Xuewen Wang

Date: Thu Jun 25 2026 - 03:23:23 EST

Hi Lorenzo,

Thanks for the review.

This was not reported by syzbot. We found it via internal KCSAN
fuzzing on our downstream v6.6.89 kernel.

I will send v2 shortly with the following changes:

- Use data_race() instead of READ_ONCE() in folio_batch_count(),
as you suggested. READ_ONCE() is unnecessary for a single-byte
read and would impose overhead on all callers.
- Move the annotation from folio_batch_count() to the actual call
sites (cpu_needs_drain() and need_mlock_drain()) where the
cross-CPU race occurs, rather than affecting all callers.
- Also add data_race() to need_mlock_drain() which has the same
cross-CPU race.
- Add comments explaining why the data race is safe.

Looking forward to your review and suggestions on v2.

Regards,
Xuewen Wang

在 2026/6/24 22:23, Lorenzo Stoakes 写道:
> On Wed, Jun 24, 2026 at 05:26:06PM +0800, Xuewen Wang wrote:
>> KCSAN reports:
>>
>> BUG: KCSAN: data-race in __lru_add_drain_all / folio_batch_add_and_move
>
> Where? A syzbot report? A local run? Please specify.
>
>>
>> write to 0xffff98fe74c015f8 of 1 bytes by task 45153 on cpu 2:
>> folio_batch_add+0x30/0xe0
>>
>> read to 0xffff98fe74c015f8 of 1 bytes by task 45175 on cpu 0:
>> folio_batch_count+0x0/0x10
>> cpu_needs_drain+0x253/0x430
>>
>> The write side is a per-cpu local operation (folio_batch_add on the
>> CPU that owns the per-cpu batch), while cpu_needs_drain() reads
>> another CPU's per-cpu batch without locking. Reading a slightly stale
>> value is harmless -- it only determines whether to schedule a drain,
>
> Then why are we adding a READ_ONCE() in such a core helper?
>
>> and a subsequent check will catch it.
>
> Where? Which check? Be specific.
>
>>
>> Use READ_ONCE() to annotate the read and prevent load tearing, which
>> also suppresses the KCSAN warning.
>
> Tearing on a single byte? Which architecture tears a single byte?
>
> I think you're actually more concerned about the value being optimised out on
> assumption of it not being updated elsewhere right?
>
> But acutally you're not, because everybody else uses a stack variable or _their
> own_ per-CPU value?
>
> Only cpu_needs_drain() is the odd one out right?
>
>>
>> Signed-off-by: Xuewen Wang <wangxuewen@xxxxxxxxxx>
>> ---
>> include/linux/folio_batch.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/folio_batch.h b/include/linux/folio_batch.h
>> index b45946adc50b..1e31e058e19d 100644
>> --- a/include/linux/folio_batch.h
>> +++ b/include/linux/folio_batch.h
>> @@ -53,7 +53,7 @@ static inline void folio_batch_reinit(struct folio_batch *fbatch)
>>
>> static inline unsigned int folio_batch_count(const struct folio_batch *fbatch)
>> {
>> - return fbatch->nr;
>> + return READ_ONCE(fbatch->nr);
>
> This isn't free, you're breaking optimisations here by doing that...
>
> It feels like the wrong level of abstraction, but actually I think every other
> case is either stack or per-CPU _on its own CPU_ (please check), in which case
> we _can_ suppress the check here but I think best done with data_race().
>
> And see the kernel bug bot report, you need to add:
>
> #include <linux/compiler.h>
>
> Too for that.
>
>> }
>>
>> static inline unsigned int folio_batch_space(const struct folio_batch *fbatch)
>> --
>> 2.25.1
>>
>
> Thanks, Lorenzo