Re: [PATCH v3 04/19] VFS: use wait_var_event for waiting in d_alloc_parallel()

From: NeilBrown

Date: Thu Apr 30 2026 - 21:21:01 EST


On Fri, 01 May 2026, NeilBrown wrote:
> On Thu, 30 Apr 2026, Al Viro wrote:
> > On Wed, Apr 29, 2026 at 06:26:26AM +0100, Al Viro wrote:
> >
> > > with obvious adjustments in end_dir_add(). That's it. Outside of fs/dcache.c,
> > > same as in the patch you've posted, modulo renaming you've suggested for new flag.
> >
> > Something like patch below (on top of -rc1, completely untested). I've lifted
> > the wakeup part out of end_dir_add() into its callers - less confusing that way.
> > Note that in __d_move() the dentry you've ended up passing to end_dir_add() was
> > *NOT* the one added - it was the one replaced with existing one spliced in its place.
>
> I saw this comment the first time I read this email, but I didn't
> process it properly. That code is wrong. It only makes sense to
> __d_wake_in_lookup_waiters() a dentry that we know was in-lookup, and in
> d_move, that is target.
> This can only happen (I think) in nfs where nfs_lookup() skips the lookup
> for LOOKUP_RENAME_TARGET and leaves the dentry in-lookup. Other threads
> looking up that name will then block.
> After the rename completes that in-lookup dentry will now be unhashed
> but we need to wake it up so other threads can continue (and repeat the
> lookup).
>
> So we need
>
> __d_wake_in_lookup_waiters(target);
>
> in d_move. target, not dentry.
>
> Thanks for flagging this,
>
> Also my testing has hit a problem with some sort of deadlock in the nfs
> server (so accessing an XFS filesystem). They are trying to unlink a
> file and are waiting in d_alloc_parallel() under reconnect_path.
> This is running generic/467.
>
> So better hold off this patchset until I have that understood.

The two problems are actually one.

__d_move() is called in d_splice_alias() with the target dentry often
being in-lookup. reconnect_path() does exactly this and is expected to
find an existing dentry for a directory and to splice that dentry to the
in-lookup dentry it has.
So the wakeup of the wrong dentry in __d_move() is causing the deadlock
in nfsd.
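
To make the failure mode concrete, here is a toy user-space model (not
kernel code, and not the patch itself). Every name in it (toy_dentry,
toy_wake_in_lookup_waiters, toy_splice) is made up for illustration; it
only assumes what is described above: the target is the in-lookup dentry
with a waiter, and the wakeup clears the in-lookup state of whichever
dentry it is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the state the wakeup touches. */
struct toy_dentry {
	bool in_lookup;   /* models the in-lookup state */
	int  nr_waiters;  /* threads blocked waiting on this dentry */
};

/* Models the wakeup: clear in-lookup, release this dentry's waiters.
 * Returns how many waiters were released. */
int toy_wake_in_lookup_waiters(struct toy_dentry *d)
{
	assert(d != NULL);
	int woken = d->nr_waiters;

	d->in_lookup = false;
	d->nr_waiters = 0;
	return woken;
}

/* Models the splice case: 'target' is the in-lookup dentry, 'dentry' is
 * the existing one spliced into its place. 'wake_target' selects which
 * of the two gets the wakeup. Returns the waiters still blocked. */
int toy_splice(struct toy_dentry *dentry, struct toy_dentry *target,
	       bool wake_target)
{
	assert(dentry != NULL && target != NULL);
	toy_wake_in_lookup_waiters(wake_target ? target : dentry);
	return dentry->nr_waiters + target->nr_waiters;
}
```

With one waiter on the in-lookup target, waking 'dentry' leaves that
waiter blocked, while waking 'target' releases it. That is the shape of
the hang, under the assumptions stated above.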

I'll resend that short series after some testing.

Thanks,
NeilBrown