Re: question about bd_inode hashing against device_add() // Re: [PATCH 03/11] block: call bdev_add later in device_add_disk

From: Gao Xiang

Date: Tue Nov 04 2025 - 22:04:11 EST


Hi Christoph,

On 2025/10/31 22:44, Gao Xiang wrote:


..

I just spent some time reproducing this with dynamic loop devices,
and it's actually easy once an msleep() is inserted artificially;
the diff is below:

diff --git a/block/bdev.c b/block/bdev.c
index 810707cca970..a4273b5ad456 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -821,7 +821,7 @@ struct block_device *blkdev_get_no_open(dev_t dev, bool autoload)
      struct inode *inode;

      inode = ilookup(blockdev_superblock, dev);
-    if (!inode && autoload && IS_ENABLED(CONFIG_BLOCK_LEGACY_AUTOLOAD)) {
+    if (0) {
          blk_request_module(dev);
          inode = ilookup(blockdev_superblock, dev);
          if (inode)
diff --git a/block/genhd.c b/block/genhd.c
index 9bbc38d12792..3c9116fdc1ce 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -428,6 +428,8 @@ static void add_disk_final(struct gendisk *disk)
      set_bit(GD_ADDED, &disk->state);
  }

+#include <linux/delay.h>
+
  static int __add_disk(struct device *parent, struct gendisk *disk,
                const struct attribute_group **groups,
                struct fwnode_handle *fwnode)
@@ -497,6 +499,9 @@ static int __add_disk(struct device *parent, struct gendisk *disk,
      if (ret)
          goto out_free_ext_minor;

+    if (disk->major == LOOP_MAJOR)
+        msleep(2500);           // delay 2.5s for all loops
+

Yes, so you need to watch for the uevent to happen, THEN it is safe to
access the block device.  Doing it before then isn't a good idea :)

But, if you think this is an issue, do you have a patch that passes your
testing to fix it?
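
For readers following the uevent suggestion above, here is a minimal userspace sketch of what "watch for the uevent" could look like, assuming a Linux system and the kernel's NETLINK_KOBJECT_UEVENT broadcast (multicast group 1). The helper names uevent_has() and wait_for_block_add() are hypothetical and do not come from this thread; most tools would use libudev rather than a raw socket.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/*
 * A kernel uevent datagram is a run of NUL-terminated strings:
 * "add@/devices/...", then KEY=value pairs such as ACTION=add,
 * SUBSYSTEM=block and DEVNAME=loop0.
 * Hypothetical helper: return 1 if the exact pair key=value is present.
 */
int uevent_has(const char *buf, size_t len, const char *key,
	       const char *value)
{
	size_t klen = strlen(key);
	size_t vlen = strlen(value);
	size_t off = 0;

	while (off < len) {
		const char *s = buf + off;
		const char *end = memchr(s, '\0', len - off);
		size_t slen = end ? (size_t)(end - s) : len - off;

		if (slen == klen + 1 + vlen && !memcmp(s, key, klen) &&
		    s[klen] == '=' && !memcmp(s + klen + 1, value, vlen))
			return 1;
		off += slen + 1;	/* skip the string and its NUL */
	}
	return 0;
}

/*
 * Hypothetical helper: block until the kernel announces ACTION=add for
 * the block device `devname` (e.g. "loop0").  Returns 0 on success and
 * -1 on a socket error.
 */
int wait_for_block_add(const char *devname)
{
	struct sockaddr_nl addr;
	char buf[8192];
	int fd, ret = -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1;	/* group 1: uevents sent by the kernel */

	fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), 0);

		if (n < 0)
			goto out;
		if (uevent_has(buf, (size_t)n, "ACTION", "add") &&
		    uevent_has(buf, (size_t)n, "SUBSYSTEM", "block") &&
		    uevent_has(buf, (size_t)n, "DEVNAME", devname)) {
			ret = 0;
			goto out;
		}
	}
out:
	close(fd);
	return ret;
}
```

A caller would invoke wait_for_block_add("loop0") before opening /dev/loop0. Note that this sketch opens the socket inside the helper, so an event sent before the call is missed; a real tool would bind the socket first and only then trigger device creation.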

I just raised it to gather some ideas; this change is
buried in a code refactor, and honestly I need to
look into the codebase and the related patchsets first.

I currently have dozens of other development tasks
on hand. If it's really a regression, I do hope
Christoph or other folks who are familiar with the code
could try to address this.

I've provided a way to reproduce it:
https://lore.kernel.org/linux-block/ec8b1c76-c211-49a5-a056-6a147faddd3b@xxxxxxxxxxxxxxxxx

As the author of these gendisk/bdev enhancement commits, what's
your opinion on this?

In other words, do you think it's a regression, or just a behavior
change but not a regression? Also, a minor confirmation:
if it is a regression on your side, would you like to address it?

Due to further code changes, I have put together a temporary
workaround for our 6.6 kernels, shown below (I don't think it's clean,
but we will do more testing). Given limited time, I currently cannot
come up with a cleaner solution or track this until an upstream fix
lands.

Thanks,
Gao Xiang

block/blk.h | 1 +
block/genhd.c | 18 ++++++++++++++++--
block/partitions/core.c | 6 +++++-
3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 475bbb40bb83..4410ae9da378 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -419,6 +419,7 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
#endif /* CONFIG_BLK_DEV_ZONED */
struct block_device *bdev_alloc(struct gendisk *disk, u8 partno);
+void bdev_inode_failed(struct block_device *bdev);
void bdev_add(struct block_device *bdev, dev_t dev);
int blk_alloc_ext_minor(void);
diff --git a/block/genhd.c b/block/genhd.c
index 039e7c17523b..cb4313a7c618 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -383,6 +383,14 @@ int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode)
return ret;
}
+void bdev_inode_failed(struct block_device *bdev)
+{
+ struct inode *inode = bdev->bd_inode;
+
+ make_bad_inode(inode);
+ unlock_new_inode(inode);
+}
+
/**
* device_add_disk - add disk information to kernel list
* @parent: parent device for the disk
@@ -452,8 +460,12 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
ddev->parent = parent;
ddev->groups = groups;
dev_set_name(ddev, "%s", disk->disk_name);
- if (!(disk->flags & GENHD_FL_HIDDEN))
+ if (!(disk->flags & GENHD_FL_HIDDEN)) {
ddev->devt = MKDEV(disk->major, disk->first_minor);
+ disk->part0->bd_inode->i_state |= I_NEW;
+ bdev_add(disk->part0, ddev->devt);
+ }
+
ret = device_add(ddev);
if (ret)
goto out_free_ext_minor;
@@ -505,7 +517,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
if (get_capacity(disk) && disk_has_partscan(disk))
set_bit(GD_NEED_PART_SCAN, &disk->state);
- bdev_add(disk->part0, ddev->devt);
+ unlock_new_inode(disk->part0->bd_inode);
if (get_capacity(disk))
disk_scan_partitions(disk, BLK_OPEN_READ);
@@ -546,6 +558,8 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
out_device_del:
device_del(ddev);
out_free_ext_minor:
+ if (!(disk->flags & GENHD_FL_HIDDEN))
+ bdev_inode_failed(disk->part0);
if (disk->major == BLOCK_EXT_MAJOR)
blk_free_ext_minor(disk->first_minor);
out_exit_elevator:
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 549ce89a657b..c69e369955b9 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -376,6 +376,9 @@ static struct block_device *add_partition(struct gendisk *disk, int partno,
goto out_put;
}
+ bdev->bd_inode->i_state |= I_NEW;
+ bdev_add(bdev, devt);
+
/* delay uevent until 'holders' subdir is created */
dev_set_uevent_suppress(pdev, 1);
err = device_add(pdev);
@@ -398,7 +401,7 @@ static struct block_device *add_partition(struct gendisk *disk, int partno,
err = xa_insert(&disk->part_tbl, partno, bdev, GFP_KERNEL);
if (err)
goto out_del;
- bdev_add(bdev, devt);
+ unlock_new_inode(bdev->bd_inode);
/* suppress uevent if the disk suppresses it */
if (!dev_get_uevent_suppress(ddev))
@@ -409,6 +412,7 @@ static struct block_device *add_partition(struct gendisk *disk, int partno,
kobject_put(bdev->bd_holder_dir);
device_del(pdev);
out_put:
+ bdev_inode_failed(bdev);
put_device(pdev);
return ERR_PTR(err);
out_put_disk:
--