Re: Race-free block device opening

From: James Bottomley
Date: Wed Apr 27 2022 - 09:29:32 EST


On Tue, 2022-04-26 at 14:12 -0400, Demi Marie Obenour wrote:
> Right now, opening block devices in a race-free way is incredibly
> hard.

Could you be more specific about the race you're having problems
with? What is racing against what?

> The only reasonable approach I know of is sd_device_new_from_path() +
> sd_device_open(), and is only available in systemd git main. It also
> requires waiting on systemd-udev to have processed udev rules, which
> can be a bottleneck.

This doesn't actually seem to be in my copy of systemd.

> There are better approaches in various special cases, such as using
> device-mapper ioctls to check that the device one has opened still
> has the name and/or UUID one expects. However, none of them works
> for a plain call to open(2).

Just so we're clear: if you call open() on, say, /dev/sdb1 and something
happens to hot unplug and then replug a different device under that
node, the file descriptor you got at open does *not* point to the new
device. It points to a dead device responder that errors everything.

The point being that once you open() something, the file descriptor is
guaranteed to point to the same device (or to return errors).
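
As a rough illustration of that guarantee, here is a minimal sketch that
uses ordinary files as a stand-in for device nodes (an analogy only: it
does not reproduce hot unplug or the dead-device behaviour, and the file
names and the helper fd_survives_path_replacement() are hypothetical).
It opens a path, swaps a different file in under the same name, and then
reads through the old descriptor and through the path.

```python
import os
import tempfile


def fd_survives_path_replacement():
    """Open a path, replace what the name refers to, then read both ways.

    Returns (bytes read via the old fd, bytes read via the path).
    """
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical stand-in for a node such as /dev/sdb1.
        node = os.path.join(d, "node")
        with open(node, "wb") as f:
            f.write(b"old device")

        fd = os.open(node, os.O_RDONLY)
        try:
            # Put a different file under the same name.
            replacement = os.path.join(d, "replacement")
            with open(replacement, "wb") as f:
                f.write(b"new device")
            os.replace(replacement, node)

            # The descriptor still refers to the object opened earlier.
            via_fd = os.pread(fd, 64, 0)
            # A fresh lookup of the name finds the replacement.
            with open(node, "rb") as f:
                via_path = f.read()
            return via_fd, via_path
        finally:
            os.close(fd)
```

The descriptor keeps referring to whatever was opened, while the name is
free to be rebound; only a new open() of the path sees the replacement.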

> A much better approach would be for udev to point its symlinks at
> "/dev/disk/by-diskseq/$DISKSEQ" for non-partition disk devices, or at
> "/dev/disk/by-diskseq/${DISKSEQ}p${PARTITION}" for partitions. A
> filesystem would then be mounted at "/dev/disk/by-diskseq" that
> provides for race-free opening of these paths. This could be
> implemented in userspace using FUSE, either with difficulty using the
> current kernel API, or easily and efficiently using a new kernel API
> for opening a block device by diskseq + partition. However, I think
> this should be handled by the Linux kernel itself.
>
> What would be necessary to get this into the kernel? I would like to
> implement this, but I don’t have the time to do so anytime soon. Is
> anyone else interested in taking this on? I suspect the kernel code
> needed to implement this would be quite a bit smaller than the FUSE
> implementation.

So it sounds like the problem is you want to be sure that the device
doesn't change after you've called libblkid to identify it but before
you call open? If that's so, the way you do this in userspace is to
call libblkid again after the open. If the before and after IDs match,
you're as sure as you can be that the open was of the right device.
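
A minimal sketch of that identify, open, identify-again pattern might
look like the following. It does not call libblkid: the identify
callback, first_bytes_id(), open_verified() and DeviceChangedError are
all hypothetical names introduced here, and the toy identifier simply
reads the first bytes through the descriptor where a real caller would
probe the open descriptor for a UUID or label.

```python
import os


class DeviceChangedError(Exception):
    """Raised when the opened object does not carry the expected identity."""


def first_bytes_id(fd):
    # Toy stand-in for a real probe (assumption: a real caller would ask
    # libblkid for a UUID or label through the open descriptor instead).
    return os.pread(fd, 16, 0)


def open_verified(path, expected_id, identify, flags=os.O_RDONLY):
    """Open path, then re-identify through the fd and compare.

    expected_id is the identity obtained before the open. The check runs
    on the descriptor, so it describes what was actually opened rather
    than whatever the path names by now.
    """
    fd = os.open(path, flags)
    try:
        actual_id = identify(fd)
    except BaseException:
        os.close(fd)
        raise
    if actual_id != expected_id:
        os.close(fd)
        raise DeviceChangedError(
            f"{path}: expected {expected_id!r}, found {actual_id!r}"
        )
    return fd
```

On a mismatch the caller can re-run the lookup and retry; on success the
returned descriptor is the one that was verified, so later I/O should go
through it rather than through the path.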

James
