Re: [PATCH v2] md/raid5: protect bitmap batch counters aka seq_flush/seq_write consistency
From: Chen Cheng
Date: Sun Jun 21 2026 - 21:53:02 EST
在 2026/6/21 05:13, yu kuai 写道:
> Hi,
>
> 在 2026/6/17 22:28, Chen Cheng 写道:
>> From: Chen Cheng <chencheng@xxxxxxxxx>
>>
>> kcsan detect race :
>> - raid5d() closes the current bitmap batch by updating
>> conf->seq_flush under conf->device_lock.
>> - __add_stripe_bio() read conf->seq_flush without that
>> lock when assigning sh->bm_seq.
>>
>> so, protect seq_flush/seq_write consistency for multiple CPUs by
>> READ_ONCE()/WRITE_ONCE() under the path without held device_lock.
>>
>> re-explain the stripe batch sequence number update flow:
>> 1. sh->bm_seq declare which batch number the stripe belongs to
>> when perform bitmap-related write.
>> ==> bm_seq = seq_flush+1
>>
>> 2. stripe be handled,
>> * if sh->bm_seq - conf->seq_write > 0, means the
>> batch stripes **newer than** the last written
>> batch, it cannot proceed yet, queued on bitmap_list.
>> * otherwise , has already proceed.
>>
>> 3. raid5d() `++seq_flush` to closes the current batch, means
>> * no more stripes join that old batch
>> * just-closed batch ready to write-out to disk
>>
>> 4. raid5d() calls bitmap hooks unplug() or writeout, then,
>> `++seq_write` to the same as bm_seq.
>>
>> - seq_flush - for producer, to close batches.
>> - seq_write - for consumer, the checkpoint number.
>>
>> the report:
>> ====================================
>> BUG: KCSAN: data-race in __add_stripe_bio / raid5d
>>
>> write to 0xffff88ba5625d470 of 4 bytes by task 82401 on cpu 0:
>> raid5d+0x1d9/0xba0
>> [.....]
>>
>> read to 0xffff88ba5625d470 of 4 bytes by task 82421 on cpu 8:
>> __add_stripe_bio+0x332/0x400
>> raid5_make_request+0x6ac/0x2930
>> md_handle_request+0x4a2/0xa40
>> md_submit_bio+0x109/0x1a0
>> __submit_bio+0x2ec/0x390
>> [.....]
>>
>> v1 -> v2:
>> - remove WRITE_ONCE(conf->seq_write) in held device_lock path.
>> - remove READ_ONCE(conf->seq_flush) in held device_lock path.
>>
>> Signed-off-by: Chen Cheng <chencheng@xxxxxxxxx>
>> ---
>> drivers/md/raid5.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index a320b71d7117..b2c5a1930841 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3536,11 +3536,11 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
>> pr_debug("added bi b#%llu to stripe s#%llu, disk %d, logical %llu\n",
>> (*bip)->bi_iter.bi_sector, sh->sector, dd_idx,
>> sh->dev[dd_idx].sector);
>>
>> if (conf->mddev->bitmap && firstwrite && !sh->batch_head) {
>> - sh->bm_seq = conf->seq_flush+1;
>> + sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
>> set_bit(STRIPE_BIT_DELAY, &sh->state);
>> }
>> }
>>
>> /*
>> @@ -5827,11 +5827,11 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
>> md_write_inc(mddev, bi);
>> sh->overwrite_disks++;
>> }
>> spin_unlock_irq(&sh->stripe_lock);
>> if (conf->mddev->bitmap) {
>> - sh->bm_seq = conf->seq_flush + 1;
>> + sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;
>> set_bit(STRIPE_BIT_DELAY, &sh->state);
>> }
>>
>> set_bit(STRIPE_HANDLE, &sh->state);
>> clear_bit(STRIPE_DELAYED, &sh->state);
>> @@ -6877,16 +6877,17 @@ static void raid5d(struct md_thread *thread)
>> clear_bit(R5_DID_ALLOC, &conf->cache_state);
>>
>> if (
>> !list_empty(&conf->bitmap_list)) {
>> /* Now is a good time to flush some bitmap updates */
>> - conf->seq_flush++;
>> + int seq = conf->seq_flush + 1;
>
> checkpatch should warn this about missing blank line after declaration.
ok.
>
>> + WRITE_ONCE(conf->seq_flush, seq);
>
> WRITE_ONCE(a, a + 1) is safe, you can see lots of examples.
Gotta re-use `conf->seq_flush` snapshot aka. `seq` below..
>
>> spin_unlock_irq(&conf->device_lock);
>> if (md_bitmap_enabled(mddev, true))
>> mddev->bitmap_ops->unplug(mddev, true);
>> spin_lock_irq(&conf->device_lock);
>> - conf->seq_write = conf->seq_flush;
>> + conf->seq_write = seq;
>> activate_bit_delay(conf, conf->temp_inactive_list);
>> }
>> raid5_activate_delayed(conf);
>>
>> while ((bio = remove_bio_from_retry(conf, &offset))) {
>