Re: [PATCH v2 2/3] md/raid1,raid10: fix error-path detection with md_cloned_bio()
From: Xiao Ni
Date: Tue May 19 2026 - 04:18:47 EST
On Fri, May 1, 2026 at 7:48 PM Abd-Alrhman Masalkhi
<abd.masalkhi@xxxxxxxxx> wrote:
>
> Detect the error path using md_cloned_bio() instead of relying
> on r1_bio in raid1 or r10_bio->read_slot in raid10, which may be
> NULL or -1 after splitting and resubmitting a failed bio.
>
> As a result, the error path may not be recognized and memory
> allocations can incorrectly use GFP_NOIO instead of
> (GFP_NOIO | __GFP_HIGH), which can lead to a deadlock under
> memory pressure.
>
> Fixes: 689389a06ce7 ("md/raid1: simplify handle_read_error().")
> Fixes: 545250f24809 ("md/raid10: simplify handle_read_error()")
> Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@xxxxxxxxx>
> ---
> This patch depends on patch 1.
>
> Changes in v2:
> - New patch.
> ---
> drivers/md/raid1.c | 13 ++++++++++---
> drivers/md/raid10.c | 20 ++++++++++++++------
> 2 files changed, 24 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index cc9914bd15c1..c52ecd38c163 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1321,11 +1321,18 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
> bool r1bio_existed = !!r1_bio;
>
> /*
> - * If r1_bio is set, we are blocking the raid1d thread
> - * so there is a tiny risk of deadlock. So ask for
> + * An md cloned bio indicates we are in the error path.
> + * This is more reliable than checking r1_bio, which might
> + * be NULL even in the error path if a failed bio was split.
> + */
> + bool err_path = md_cloned_bio(mddev, bio);
> +
> + /*
> + * If we are in the error path, we are blocking the raid1d
> + * thread so there is a tiny risk of deadlock. So ask for
> * emergency memory if needed.
> */
> - gfp_t gfp = r1_bio ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
> + gfp_t gfp = err_path ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
Hi
This patch looks good to me.
Reviewed-by: Xiao Ni <xiao@xxxxxxxxxx>
>
> /*
> * Still need barrier for READ in case that whole
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3a591e60a144..8c6fc398260e 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1155,7 +1155,20 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
> char b[BDEVNAME_SIZE];
> int slot = r10_bio->read_slot;
> struct md_rdev *err_rdev = NULL;
> - gfp_t gfp = GFP_NOIO;
> +
> + /*
> + * An md cloned bio indicates we are in the error path.
> + * This is more reliable than checking slot, which might
> + * be -1 even in the error path if a failed bio was split.
> + */
> + bool err_path = md_cloned_bio(mddev, bio);
> +
> + /*
> + * If we are in the error path, we are blocking the raid10d
> + * thread so there is a tiny risk of deadlock. So ask for
> + * emergency memory if needed.
> + */
> + gfp_t gfp = err_path ? (GFP_NOIO | __GFP_HIGH) : GFP_NOIO;
>
> if (slot >= 0 && r10_bio->devs[slot].rdev) {
> /*
> @@ -1166,11 +1179,6 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
> * we lose the device name in error messages.
> */
> int disk;
> - /*
> - * As we are blocking raid10, it is a little safer to
> - * use __GFP_HIGH.
> - */
> - gfp = GFP_NOIO | __GFP_HIGH;
>
> disk = r10_bio->devs[slot].devnum;
> err_rdev = conf->mirrors[disk].rdev;
> --
> 2.43.0
>
>
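
For readers following the thread, the effect of the change can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: every sketch_* name, the flag values, and the md_cloned field are hypothetical stand-ins, and the real md_cloned_bio() comes from patch 1 of this series, whose implementation is not shown here. The old raid10 check is also simplified to "slot >= 0", ignoring the rdev test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for illustration only; NOT the kernel's gfp bits. */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_NOIO 0x1u
#define SKETCH_GFP_HIGH 0x2u

/* Hypothetical bio: records only whether md cloned it. */
struct sketch_bio {
	bool md_cloned;
};

/* Hypothetical stand-in for md_cloned_bio(mddev, bio) from patch 1. */
bool sketch_md_cloned_bio(const struct sketch_bio *bio)
{
	return bio->md_cloned;
}

/* Old raid1 check: error path inferred from a non-NULL r1_bio. */
sketch_gfp_t sketch_old_gfp_raid1(const void *r1_bio)
{
	return r1_bio ? (SKETCH_GFP_NOIO | SKETCH_GFP_HIGH) : SKETCH_GFP_NOIO;
}

/* Old raid10 check, simplified: error path inferred from slot >= 0. */
sketch_gfp_t sketch_old_gfp_raid10(int slot)
{
	return slot >= 0 ? (SKETCH_GFP_NOIO | SKETCH_GFP_HIGH) : SKETCH_GFP_NOIO;
}

/* New check: error path inferred from the bio itself. */
sketch_gfp_t sketch_new_gfp(const struct sketch_bio *bio)
{
	bool err_path = sketch_md_cloned_bio(bio);

	return err_path ? (SKETCH_GFP_NOIO | SKETCH_GFP_HIGH) : SKETCH_GFP_NOIO;
}
```

In this model, a failed bio that was split and resubmitted arrives with r1_bio == NULL (raid1) or slot == -1 (raid10), so both old helpers return the plain flag, while sketch_new_gfp() still returns the emergency-memory variant because the bio is marked as cloned.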