Re: [PATCH v4 3/3] md/raid10: bound reused r10bio devs[] walks by used_nr_devs

From: yu kuai

Date: Sat Jun 20 2026 - 16:04:48 EST


Hi,

On 2026/6/3 11:59, Chen Cheng wrote:
> From: Chen Cheng <chencheng@xxxxxxxxx>
>
> After reshape changes raid_disks, an in-flight r10bio from the old geometry
> can still be completed or freed later. In that case, using the current
> geometry to walk r10_bio->devs[] is unsafe. A failure was reproduced with a
> simple write workload while reshaping a raid10 array from 4 disks to 5 disks.
> e.g.:
>
> mdadm -C /dev/md777 -l10 -n4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
> mkfs.ext4 /dev/md777
> mount /dev/md777 /mnt/test
> fsstress -d /mnt/test -n 24000 -p 8 -l 24 &
> mdadm /dev/md777 --add /dev/sde
> mdadm --grow /dev/md777 --raid-devices=5 \
> --backup-file=/tmp/md-reshape-backup
>
> the sequence above can trigger:
>
> BUG: KASAN: slab-out-of-bounds in free_r10bio+0x1c4/0x260 [raid10]
> Read of size 8 at addr ffff00008c2dfac8 by task ksoftirqd/0/15
> free_r10bio
> raid_end_bio_io
> one_write_done
> raid10_end_write_request
>
> The buggy object was 200 bytes long, which matches an r10bio with space for
> only four devs[] entries. However, put_all_bios() and find_bio_disk() walk
> r10_bio->devs[] using the current conf->geo.raid_disks value. Once reshape
> switches conf->geo.raid_disks from 4 to 5, an old 4-slot r10bio can be
> completed or freed as if it had 5 slots, and the walk overruns devs[4]. The
I don't understand. Is this still possible after patch 1?
> same stale-width mismatch can also surface during a 5-disk to 4-disk reshape.
>
> Track the number of valid devs[] entries in each reused r10bio with
> used_nr_devs. Initialize it whenever an r10bio is prepared for regular I/O,
> discard, or resync/recovery/reshape work, and use it to bound devs[] walks
> in put_all_bios() and find_bio_disk().
>
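For reference, the width mismatch described above can be sketched in user
space roughly as follows. This is only an illustration under stated
assumptions, not the kernel code: demo_dev, demo_r10bio, demo_alloc() and
demo_find_slot() are hypothetical stand-ins, and only the used_nr_devs field
name is taken from the patch. The point is that the walk is bounded by the
width recorded in the object, not by whatever the geometry is at completion
time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; not the kernel's struct r10bio. */
struct demo_dev {
	void *bio;
	void *repl_bio;
};

struct demo_r10bio {
	unsigned int used_nr_devs;	/* slots valid for this use of the object */
	struct demo_dev devs[];		/* sized once, at allocation time */
};

/* Allocate an object sized for the geometry in force at allocation time. */
static struct demo_r10bio *demo_alloc(unsigned int raid_disks)
{
	struct demo_r10bio *r;

	r = calloc(1, sizeof(*r) + raid_disks * sizeof(r->devs[0]));
	if (r)
		r->used_nr_devs = raid_disks;
	return r;
}

/*
 * Return the slot that holds @bio, or -1 if no slot does.
 * The loop bound comes from the object itself, so a later change of the
 * array geometry cannot push the walk past the allocated devs[] entries.
 */
static int demo_find_slot(const struct demo_r10bio *r, const void *bio)
{
	unsigned int slot;

	assert(r != NULL);
	for (slot = 0; slot < r->used_nr_devs; slot++) {
		if (r->devs[slot].bio == bio || r->devs[slot].repl_bio == bio)
			return (int)slot;
	}
	return -1;
}
```

With an object allocated for 4 slots, a lookup for a bio that is not present
stops after slot 3 and returns -1, even if the "current" geometry has since
grown to 5.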
> Signed-off-by: Chen Cheng <chencheng@xxxxxxxxx>
> ---
> drivers/md/raid10.c | 8 ++++++--
> drivers/md/raid10.h | 2 ++
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 5eca34432e63..f134b93fd593 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -273,11 +273,11 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
>
> static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio)
> {
> int i;
>
> - for (i = 0; i < conf->geo.raid_disks; i++) {
> + for (i = 0; i < r10_bio->used_nr_devs; i++) {
> struct bio **bio = & r10_bio->devs[i].bio;
> if (!BIO_SPECIAL(*bio))
> bio_put(*bio);
> *bio = NULL;
> bio = &r10_bio->devs[i].repl_bio;
> @@ -370,11 +370,11 @@ static int find_bio_disk(struct r10conf *conf, struct r10bio *r10_bio,
> struct bio *bio, int *slotp, int *replp)
> {
> int slot;
> int repl = 0;
>
> - for (slot = 0; slot < conf->geo.raid_disks; slot++) {
> + for (slot = 0; slot < r10_bio->used_nr_devs; slot++) {
> if (r10_bio->devs[slot].bio == bio)
> break;
> if (r10_bio->devs[slot].repl_bio == bio) {
> repl = 1;
> break;
> @@ -1561,10 +1561,11 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
>
> r10_bio->mddev = mddev;
> r10_bio->sector = bio->bi_iter.bi_sector;
> r10_bio->state = 0;
> r10_bio->read_slot = -1;
> + r10_bio->used_nr_devs = conf->geo.raid_disks;
> memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) *
> conf->geo.raid_disks);
>
> if (bio_data_dir(bio) == READ)
> raid10_read_request(mddev, bio, r10_bio);
> @@ -1749,10 +1750,11 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
> r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
> r10_bio->mddev = mddev;
> r10_bio->state = 0;
> r10_bio->sectors = 0;
> r10_bio->read_slot = -1;
> + r10_bio->used_nr_devs = geo->raid_disks;
> memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * geo->raid_disks);
> wait_blocked_dev(mddev, r10_bio);
>
> /*
> * For far layout it needs more than one r10bio to cover all regions.
> @@ -3083,10 +3085,12 @@ static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
> test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
> nalloc = conf->copies; /* resync */
> else
> nalloc = 2; /* recovery */
>
> + r10bio->used_nr_devs = nalloc;
> +
> for (i = 0; i < nalloc; i++) {
> bio = r10bio->devs[i].bio;
> rp = bio->bi_private;
> bio_reset(bio, NULL, 0);
> bio->bi_private = rp;
> diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
> index b711626a5db7..4751119f9770 100644
> --- a/drivers/md/raid10.h
> +++ b/drivers/md/raid10.h
> @@ -125,10 +125,12 @@ struct r10bio {
> struct bio *master_bio;
> /*
> * if the IO is in READ direction, then this is where we read
> */
> int read_slot;
> + /* Used to bound devs[] walks when the object is reused. */
> + unsigned int used_nr_devs;
>
> struct list_head retry_list;
> /*
> * if the IO is in WRITE direction, then multiple bios are used,
> * one for each copy.

--
Thanks,
Kuai