Re: [PATCH md-6.12 0/7] md: enhance faulty checking for blocked handling

From: Yu Kuai
Date: Thu Oct 10 2024 - 08:38:55 EST


Hi,

On 2024/10/09 15:14, Mariusz Tkaczyk wrote:
On Fri, 30 Aug 2024 15:27:14 +0800
Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

The lifetime of badblocks:

- an IO error occurs, md decides to record badblocks and sets sb_flags;
- a write IO finds that the rdev has badblocks that are not yet
acknowledged, so this IO is blocked;
- the daemon finds sb_flags set, updates the superblock and flushes badblocks;
- the write IO continues;

The main idea is that badblocks are set in memory first. Until they are
acknowledged, new write requests must be blocked to prevent reading old
data after a power failure. This behaviour is not necessary if the rdev
is faulty in the first place.
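
For illustration only, the lifetime above can be modelled in a few lines of
plain userspace C. This is a simplified sketch, not code from the patchset:
struct rdev_sketch and the three helpers are hypothetical names, and the two
booleans stand in for the kernel's per-rdev flag bits. It assumes a faulty
rdev never records new badblocks and never makes a writer wait.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-device state; not the kernel's struct md_rdev. */
struct rdev_sketch {
	bool faulty;            /* device has already failed */
	bool unacked_badblocks; /* badblocks set in memory, superblock not yet updated */
};

/* IO error path: record badblocks in memory, unless the device is already faulty. */
static void record_badblocks(struct rdev_sketch *rdev)
{
	if (rdev->faulty)
		return;
	rdev->unacked_badblocks = true;
}

/* Daemon path: superblock updated and badblocks flushed, so they are acknowledged. */
static void ack_badblocks(struct rdev_sketch *rdev)
{
	rdev->unacked_badblocks = false;
}

/* Write path: must a new write wait on this device before being issued? */
static bool write_must_wait(const struct rdev_sketch *rdev)
{
	/* A faulty device is not written to anymore, so there is nothing to wait for. */
	if (rdev->faulty)
		return false;
	return rdev->unacked_badblocks;
}
```

In this model a healthy rdev blocks writers between record_badblocks() and
ack_badblocks(), while a faulty rdev never blocks them.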

Yu Kuai (7):
md: add a new helper rdev_blocked()
md: don't wait faulty rdev in md_wait_for_blocked_rdev()
md: don't record new badblocks for faulty rdev
md/raid1: factor out helper to handle blocked rdev from
raid1_write_request()
md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
md/raid5: don't set Faulty rdev for blocked_rdev

drivers/md/md.c | 8 +++--
drivers/md/md.h | 24 +++++++++++++++
drivers/md/raid1.c | 75 +++++++++++++++++++++++----------------------
drivers/md/raid10.c | 40 +++++++++++-------------
drivers/md/raid5.c | 13 ++++----
5 files changed, 92 insertions(+), 68 deletions(-)



Hi,
We tested this patchset.

mdmon rework:
https://github.com/md-raid-utilities/mdadm/pull/66

Kernel build torvalds/linux.git master:
commit e32cde8d2bd7d251a8f9b434143977ddf13dcec6

I applied this patchset on top of that.

My tests proved that:
- If only the mdmon PR is applied, hangs are reproducible.
- If only this patchset is applied, hangs are reproducible.
- If both the kernel patchset and the mdmon rework are applied, hangs are
not reproducible (at least so far).

It was a tricky topic (I had to deal with weird issues related to shared
descriptors in mdmon).

Most importantly, no regression was detected.

Good to hear that, I'll send a V2 then. Normally this set would land in
v6.13, because it doesn't look like a fix in the kernel. :)

Thanks,
Kuai


Thanks,
Mariusz
