Re: question about bd_inode hashing against device_add() // Re: [PATCH 03/11] block: call bdev_add later in device_add_disk

From: Gao Xiang

Date: Fri Oct 31 2025 - 10:40:56 EST

On 2025/10/31 22:31, Greg Kroah-Hartman wrote:

On Fri, Oct 31, 2025 at 06:12:05PM +0800, Gao Xiang wrote:

Hi Greg,

On 2025/10/31 17:58, Greg Kroah-Hartman wrote:

On Fri, Oct 31, 2025 at 05:54:10PM +0800, Gao Xiang wrote:

On 2025/10/31 17:45, Christoph Hellwig wrote:

On Fri, Oct 31, 2025 at 05:36:45PM +0800, Gao Xiang wrote:

Right, sorry, yes: disk_uevent(KOBJ_ADD) is at the end.

Do you see that earlier, or do you have
code busy polling for a node?

Personally I think it will break many userspace programs
(although I also don't think it's a correct expectation.)

We've had this behavior for a few years, and this is the first report
I've seen.

After rechecking internally, the userspace program logic is:
- stat /dev/vdX;
- if it exists, mount it directly;
- if it does not exist, listen for the disk add uevent instead.

Previously, for devtmpfs blkdev files, this stat/mount
assumption was always valid.
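
The three steps above could be sketched roughly as below. This is
only an illustration, not the actual program: plan_for(), its
stat_fn parameter and the returned strings are made-up names, and
the real mount and uevent handling are left out.

```python
import os
import stat


def plan_for(path, stat_fn=os.stat):
    """Decide what to do for a device path (hypothetical helper).

    Returns "mount" if a block device node exists at path, or
    "wait-for-uevent" if nothing is there yet.
    """
    try:
        st = stat_fn(path)
    except FileNotFoundError:
        # No node yet: fall back to listening for the add uevent.
        return "wait-for-uevent"
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} exists but is not a block device")
    # The node exists, so the program assumes it can mount right away.
    return "mount"
```

The "mount" branch is exactly where the assumption sits: the node
being visible is taken to mean the device can be opened.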

That assumption doesn't seem wrong.

;-) I thought UNIX mknod doesn't imply the device is
ready or valid in any case (dev files in devtmpfs might
be an exception, but I didn't find any formal wording)...
so a uevent is clearly the right way, but..

Yes, anyone can do a mknod and attempt to open a device that isn't
present.

When devtmpfs creates the device node, it should be there. Unless it
gets removed, and then added back, so you could race with userspace, but
that's not normal.

But why does the device node
get created earlier? My assumption was that it would only be
created by the KOBJ_ADD uevent. Adding the device model maintainers
as my little dig through the core drivers/base/ code doesn't find
anything to the contrary, but maybe I don't fully understand it.

AFAIK, device_add() is what triggers devtmpfs file
creation, and the issue can be observed by frequently
hotplugging a device in a VM and mounting it. Currently
I don't have a time slot to build an easy reproducer,
but I think it's a real issue anyway.

As I say above, that's not normal, and you have to be root to do this,

Just thinking out loud: if I were a random reporter, I
could only report the original symptom because we hit
it. Everyone has their own internal work, or perhaps
limited kernel expertise, so in any case there should
be no expectation to rush someone into building a
clean reproducer.

Nevertheless, I will spend time on a reproducer, and
I think it could simply add an artificial delay right
after device_add(). I could try anyway, but no rush.

so I don't understand what you are trying to prevent happening? What is

The original report was
https://lore.kernel.org/r/43375218-2a80-4a7a-b8bb-465f6419b595@xxxxxxxxxxxxxxxxx/

So you see cases where the device node is present, you try to open it,
but yet there is no real block device behind it at all?

Roughly yes. Block devices have a pseudo filesystem. Briefly:
the block device is registered with device_add(), so the
devtmpfs file is visible at that point, but bdev_add() has not
been called yet, so, for example, a mount going through
bdev_file_open_by_dev() cannot find the bdev and returns ENXIO.
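
To make the window concrete, here is a toy model of that ordering.
It is plain Python, not kernel code: the ToyDisk class and its
methods are invented for illustration and only borrow the names of
the kernel steps mentioned in this thread.

```python
import errno


class ToyDisk:
    """Toy model of the add sequence discussed here (not kernel code)."""

    def __init__(self):
        self.node_visible = False  # set by the device_add() step
        self.bdev_hashed = False   # set by the bdev_add() step
        self.uevents = []          # filled by the disk_uevent() step

    def device_add(self):
        # devtmpfs node becomes visible to userspace here.
        self.node_visible = True

    def bdev_add(self):
        # Only now can an open by device number find the bdev.
        self.bdev_hashed = True

    def disk_uevent(self):
        # KOBJ_ADD is sent at the very end.
        self.uevents.append("add")

    def open(self):
        if not self.node_visible:
            raise OSError(errno.ENOENT, "no device node")
        if not self.bdev_hashed:
            raise OSError(errno.ENXIO, "node exists, bdev not found")
        return "opened"
```

In this model, an open() issued between device_add() and bdev_add()
sees the node but fails with ENXIO, which matches the symptom.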

the bug and why is it just showing up now (i.e. what changed to cause
it?)

I don't know. I think it's just because 6.6 is a relatively
new kernel, and most userspace programs have retry logic
that covers this up.
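
For reference, the kind of retry logic I mean might look like the
sketch below. It is an assumption about what such programs do, not
code from any particular one: open_when_ready() and its parameters
are hypothetical, and the attempt count and delay are arbitrary.

```python
import errno
import os
import time


def open_when_ready(path, attempts=50, delay=0.01,
                    opener=os.open, sleep=time.sleep):
    """Open a device node, retrying while the open fails with ENXIO.

    Other errors (including ENOENT) are raised immediately, so the
    caller can still fall back to waiting for a uevent.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return opener(path, os.O_RDONLY)
        except OSError as exc:
            # Give up on unrelated errors or on the last attempt.
            if exc.errno != errno.ENXIO or attempt == attempts - 1:
                raise
        sleep(delay)
```

A caller would use the returned file descriptor as usual and close
it with os.close(). With such a loop the short ENXIO window is
never noticed, which is why I guess it went unreported.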

6.6 has been out for 2 years now, this is a long time in kernel
development cycles for things to just start showing up now.

I think in most cases devices are added during boot, so
it's hard to hit, but under hotplug stress it can
honestly be observed easily.

Thanks,
Gao Xiang

thanks,

greg k-h