Re: [PATCH v3 07/19] VFS: Add LOOKUP_SHARED flag.

From: NeilBrown

Date: Mon Apr 27 2026 - 19:52:11 EST

On Mon, 27 Apr 2026, Amir Goldstein wrote:
> On Mon, Apr 27, 2026 at 10:47 AM NeilBrown <neilb@xxxxxxxxxxx> wrote:
> >
> > On Mon, 27 Apr 2026, Amir Goldstein wrote:
> > > On Mon, Apr 27, 2026 at 6:06 AM NeilBrown <neilb@xxxxxxxxxxx> wrote:
> > > >
> > > > From: NeilBrown <neil@xxxxxxxxxx>
> > > >
> > > > Some ->lookup handlers will need to drop and retake the parent lock, so
> > > > they can safely use d_alloc_parallel().
> > > >
> > > > ->lookup can be called with the parent lock either exclusive or shared.
> > > >
> > > > A new flag, LOOKUP_SHARED, tells ->lookup how the parent is locked.
> > > >
> > > > This is rather ugly, but will be gone soon after we move
> > > > d_alloc_parallel() out of the directory lock as ->lookup() will *always*
> > > > called with a shared lock on the parent.
> > >
> > > Neil,
> > >
> > > Forgive me for being skeptical about the *always* part.
> >
> > Scepticism should always be encouraged.
> >
> > >
> > > How long ago did we add ->iterate_shared()?
> > >
> > > It's true that Linus eventually got rid of ->iterate(), but we did not
> > > get rid of the assumption that iterate_shared() might be upgraded
> > > to exclusive lock.
> > >
> > > The obvious reason is that *someone* needs to do this work for
> > > old filesystems, which are also hard to test and nobody wants to
> > > touch them.
> > >
> > > I have nothing against this patch, but I think it is more realistic
> > > to state that LOOKUP_SHARED is here to stay, so if you think it
> > > is too ugly, maybe there is something to be done about it.
> > > Personally, I do not see the ugliness though.
> > >
> > > Am I misjudging the situation of shared lookup wrt old filesystems?
> >
> > Yes.
> > All filesystems must support ->lookup with a shared lock because it is
> > already the case that with a simple lookup only a shared lock is provided.
> > A filesystem *could* examine the lookup_flags and deduce if the lock is
> > actually exclusive (e.g. if LOOKUP_CREATE is set) and misbehave
> > accordingly, but a vanishing small number of filesystems (NFS and ....
> > I can't think of any others) ever look at the lookup_flags and I'm
> > certain none do anything so ... creative.
> >
> > So I'm certain that it is safe from the filesystem's perspective to always
> > call with only a shared lock. All that is needed is to change the VFS
> > to only ever do that. That means separating the lock for lookup from
> > the lock for create/remove/move in directory ops, and one way to view
> > the current patch set (the complete one, not just this initial set) is
> > that it does exactly that.
> >
>
> I see.
> I will remain skeptical, because there is always *something* with some
> special filesystems - think fuse servers (not under your control) which
> have grown to rely on the kernel to serialize atomic_open() on the directory -
> but the level of skepticism is lower now ;)

atomic_open is different from lookup!
atomic_open already gets an exclusive lock if LOOKUP_CREATE, otherwise a
shared lock. That won't change.

So if an atomic_open handler wants to drop and retake the directory
lock, it can switch on LOOKUP_CREATE. If it wants to call d_add_ci()
from atomic_open (which is credible in the non-CREATE case) it would
need to be careful.
But no current users of d_add_ci() (xfs and ntfs) support atomic_open.

This proposed practice of dropping and retaking the directory lock is a
transitional arrangement only.
Eventually we will get to:

inode_lock_shared()
->lookup()
inode_unlock_shared()

and that will change to simple

->lookup()

with

foo_lookup()
{
inode_lock_shared(parent)

do the lookup

inode_unlock_shared(parent)
}

and then if "do the lookup" calls d_add_ci() or otherwise needs to drop
the lock, the code that needs to be unlocked gets moved after the
inode_unlock_shared() and we won't need to take the lock again.

So ultimately the fs will be completely responsible for any
directory-wide locks that it might want to take.

fuse would be well advised to keep the directory exclusive locked across
upcalls unless the server has advised that the lock isn't needed.

Thanks,
NeilBrown

>
> I mean it would suck pretty badly if locking order would be fs_type dependapt,
> but honestly, I do not think we will be able to avoid that, so we need to
> make sure that humans and lockdep are able to understand the different
> scopes of vfs locking at least per filesystem type.
>
> Thanks,
> Amir.
>