Re: [PATCH] md/raid5: protect batch_head->bm_seq updates
From: Chen Cheng
Date: Thu Jun 18 2026 - 08:30:20 EST
On 2026/6/18 20:15, Paul Menzel wrote:
Dear Chen,
On 18.06.26 at 13:26, Chen Cheng wrote:
On 2026/6/18 18:36, Paul Menzel wrote:
On 18.06.26 at 08:55, Chen Cheng wrote:
From: Chen Cheng <chencheng@xxxxxxxxx>
bm_seq means "stripe delay to flush until bm_seq <= seq_write".
do_release_stripe() keeps STRIPE_BIT_DELAY stripes on bitmap_list
when bm_seq >= seq_write.
after raid5d() flushes bitmap update and ++seq_write, and
active_bit_delay() retry to release delayed stripes.
the stripe batch head must carry the newest bm_seq among all
member stripes, because the whole batch later released according
to the batch head state and bm_seq.
race scenario:
===================
1. cpu0 - sh0->bm_seq=101; cpu1 - sh1->bm_seq=102;
2. both cpu0 and cpu1 read batch_head->bm_seq = 100;
3. cpu1 write 102, and cpu0 overwrite with 101;
the point is, if the head has a lower bm_seq than one of its
members, the whole batch could be released before that
member's bitmap is flushed.
and the on-disk bitmap not record sh1's changes.
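The race scenario above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct batch_head`, `seq_after()`, `head_update_locked()` and `replay_lost_update()` are hypothetical names, the pthread mutex stands in for whatever lock the patch actually takes, and the check-then-store shape of the racy update is assumed from steps 1-3 rather than copied from raid5.c.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the batch head, not the kernel's stripe_head. */
struct batch_head {
	pthread_mutex_t lock;	/* assumed lock; the real patch may use another */
	int32_t bm_seq;
};

/* Wrap-safe "a is newer than b", using a signed difference. */
static int seq_after(int32_t a, int32_t b)
{
	return (int32_t)((uint32_t)a - (uint32_t)b) > 0;
}

/*
 * Replays steps 1-3 deterministically: both CPUs sample the head
 * value 100 before either stores, then cpu1 stores 102 and cpu0
 * overwrites it with 101.
 */
static int32_t replay_lost_update(void)
{
	int32_t head = 100;
	int32_t seen_by_cpu0 = head;
	int32_t seen_by_cpu1 = head;

	if (seq_after(102, seen_by_cpu1))
		head = 102;	/* cpu1 */
	if (seq_after(101, seen_by_cpu0))
		head = 101;	/* cpu0 clobbers the newer value */
	return head;
}

/* Compare and store under one lock, so the head only moves forward. */
static void head_update_locked(struct batch_head *h, int32_t seq)
{
	pthread_mutex_lock(&h->lock);
	if (seq_after(seq, h->bm_seq))
		h->bm_seq = seq;
	pthread_mutex_unlock(&h->lock);
}
```

With the locked variant, the head ends up at 102 regardless of which CPU stores first, whereas the replayed interleaving leaves it at 101.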
It’s a little hard to read. Could you please improve the wording of the
last paragraph, and maybe also start each sentence with a capital
letter. Maybe also use 75 characters per line.
Do you have a reproducer by any chance?
Thanks for the review. I will follow your advice.
Thank you for reading my comments.
Actually, I have some reproducers that trigger KCSAN reports in RAID-5,
but none for this issue. It was reported by the sashiko-review bot, and
I think it is a real risk.
Maybe also mention sashiko-review.
I will try to write a reproducer for this case later, after I have
figured out the other KCSAN reports.
A reproducer is not required. I was just interested in how the issue was found. So don’t spend too much time on it, or any at all.
Well, I think a good reproducer has to:
- Exercise as many code paths concurrently as possible,
  e.g. reshape running concurrently with normal I/O, etc.
Optionally:
- Run some sanitizer,
- Inject some faults, e.g. bad blocks, write errors, disk drops...
- Provide some background I/O pressure,
- Combine several of the items described above.
[…]
Kind regards,
Paul