Re: [GIT PULL] vfs fixes

From: Christian Brauner
Date: Wed Mar 20 2024 - 06:21:23 EST


> > Again, this comment (and the previous email) is more based on "this
> > does not feel right to me" than anything else.
> >
> > That code just makes my skin itch. I can't say it's _wrong_, but it
> > just FeelsWrongToMe(tm).
>
> So, initially I think the holder ops were intended to be generic by
> Christoph but I agree that it's probably not needed. I just didn't
> massage that code yet. Now on my todo for this cycle!

So, the block holder ops will gain additional implementers in the block
layer that will implement their own separate ops. So I trust the block
layer with this.

The holder is used to determine whether a block device can be reopened.
For both internal opens (mounting, log device initialization) and
userspace opens we compare the holders of the block device. We have
allowed opening the same block device exclusively with different flags
for quite some time. So there can be multiple files open to the same
block device and the holder is used as proof that it can be reopened.
Always using the file as the holder would still mean that we have to
compare file->private_data to determine whether the block device can be
reopened. So it won't get us as much as we'd want.
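
To make the holder comparison concrete, here is a rough userspace model
of it. This is only a sketch and not the kernel code: struct model_bdev,
model_open_excl() and model_release_excl() are made-up names, and the
real logic has more cases (partitions, holder ops, locking).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct block_device. */
struct model_bdev {
	void *holder;	/* NULL when nobody holds the device exclusively */
	int holders;	/* number of exclusive opens sharing that holder */
};

/*
 * An exclusive open succeeds if the device is unclaimed or if the
 * caller presents the same holder that already claimed it.
 */
static bool model_open_excl(struct model_bdev *bdev, void *holder)
{
	assert(bdev != NULL && holder != NULL);

	if (bdev->holder != NULL && bdev->holder != holder)
		return false;	/* claimed by somebody else */

	bdev->holder = holder;
	bdev->holders++;
	return true;
}

/* Drop one exclusive open; the last one clears the holder. */
static void model_release_excl(struct model_bdev *bdev)
{
	assert(bdev != NULL && bdev->holders > 0);

	if (--bdev->holders == 0)
		bdev->holder = NULL;
}
```

In this model two opens with the same holder pointer both succeed,
while an open with a different holder fails until the last matching
open is released. If the file itself were the holder, the two opens
would carry different pointers and the comparison would have to look
one level further, which is the file->private_data point above.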

The holder has to remain valid because the block layer has ioctl
operations such as device removal in the case of nbd, or suspend and
resume as used by things like cryptsetup. In all such cases we go from
an arbitrary block device to an arbitrary holder and then inform the
holder about the operation by calling the appropriate callback. So we
would still have to guarantee the validity of the holder in
file->private_data.
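
The device-to-holder direction can be sketched in the same spirit. All
names below (sketch_bdev, sketch_holder_ops, sketch_fs,
sketch_bdev_removed()) are made up for illustration and the ops table
is reduced to a single callback; treat it as a model of the idea, not
of the actual interface.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_bdev;

/* Hypothetical ops table with a single notification callback. */
struct sketch_holder_ops {
	void (*mark_dead)(struct sketch_bdev *bdev);
};

struct sketch_bdev {
	void *holder;				/* opaque to the device side */
	const struct sketch_holder_ops *ops;	/* how to talk to the holder */
};

/* Hypothetical holder, standing in for something like a mounted fs. */
struct sketch_fs {
	bool dead;
};

/* The callback dereferences the holder, so it must still be valid. */
static void sketch_fs_mark_dead(struct sketch_bdev *bdev)
{
	struct sketch_fs *fs = bdev->holder;

	fs->dead = true;
}

static const struct sketch_holder_ops sketch_fs_ops = {
	.mark_dead = sketch_fs_mark_dead,
};

/* Device-side event; returns true if a holder was notified. */
static bool sketch_bdev_removed(struct sketch_bdev *bdev)
{
	if (bdev->holder == NULL || bdev->ops == NULL ||
	    bdev->ops->mark_dead == NULL)
		return false;

	bdev->ops->mark_dead(bdev);
	return true;
}
```

The device side never knows what the holder is. It only has the
pointer and the ops, which is why the pointer has to stay valid for as
long as such an event can be delivered.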

There are also two internal codepaths where the block device is
temporarily marked as being in the process of being claimed. This will
cause actual openers to wait until bd_holder is really set or aborted
but not fail the actual open. This has traditionally been the case in
the loop code and during user initiated and internally triggered
partition scanning. That could be reworked but would be pretty ugly.
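
The "claim in progress" state can be modelled as a second pointer next
to the holder. Again this is a simplified sketch with made-up names
(claim_bdev, claim_begin(), claim_finish(), claim_abort()); instead of
actually sleeping, the model returns CLAIM_WAIT where a real opener
would wait and retry.

```c
#include <stddef.h>

enum claim_result { CLAIM_OK, CLAIM_WAIT, CLAIM_BUSY };

/* Hypothetical model of a device with a settled and a pending claim. */
struct claim_bdev {
	void *holder;	/* settled exclusive holder, or NULL */
	void *claiming;	/* claim currently in progress, or NULL */
};

/* Start a claim: fail on a foreign holder, wait on a pending claim. */
static enum claim_result claim_begin(struct claim_bdev *bdev, void *holder)
{
	if (bdev->holder != NULL && bdev->holder != holder)
		return CLAIM_BUSY;	/* really held by somebody else */
	if (bdev->claiming != NULL)
		return CLAIM_WAIT;	/* a real opener would sleep and retry */

	bdev->claiming = holder;
	return CLAIM_OK;
}

/* The pending claim becomes the real holder. */
static void claim_finish(struct claim_bdev *bdev)
{
	bdev->holder = bdev->claiming;
	bdev->claiming = NULL;
}

/* The pending claim is dropped; waiters may retry. */
static void claim_abort(struct claim_bdev *bdev)
{
	bdev->claiming = NULL;
}
```

An opener that arrives while a temporary claim is pending gets
CLAIM_WAIT rather than CLAIM_BUSY, and it can succeed once that claim
is aborted. Only a settled holder that differs from the caller's
holder makes the open fail.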

We'll continue considering additional cleanups and, by the next merge
window at the latest, I'll give you a detailed write-up of what happened.