Re: [PATCH v3 04/19] VFS: use wait_var_event for waiting in d_alloc_parallel()

From: Al Viro

Date: Thu Apr 30 2026 - 21:11:40 EST


On Fri, May 01, 2026 at 09:51:28AM +1000, NeilBrown wrote:

> I saw this comment the first time I read this email, but I didn't
> process it properly. That code is wrong.

One in mainline isn't - d_wait comes from target, as it ought to.

> It only makes sense to
> __d_wake_in_lookup_waiters() a dentry that we know was in-lookup, and in
> d_move, that is target.
> This can only happen (I think) in nfs where nfs_lookup() skips the lookup
> for LOOKUP_RENAME_TARGET and leaves the dentry in-lookup. Other threads
> looking up that name will then block.
> After the rename completes that in-lookup dentry will now be unhashed
> but we need to wake it up so other threads can continue (and repeat the
> lookup).
>
> So we need
>
> __d_wake_in_lookup_waiters(target);
>
> in d_move. target, not dentry.

Yep.

> Thanks for flagging this,
>
> Also my testing has hit a problem with some sort of deadlock in the nfs
> server (so accessing and XFS filesystem). They are tring to unlink a
> file and are waiting in d_alloc_parallel() under reconnect_path.
> This is running generic/467.
>
> So better hold off this patchset until I have that understood.

Let's deal with d_alloc_parallel() first; it doesn't have to be tied
into the rest of patchset. Does the variant I've posted + s/dentry/target/ in that
call of __d_wake_in_lookup_waiters() trigger any problems in your testing?

If it doesn't, let's get that part out of the way - it makes sense on its own
and getting it into -next (I'm sitting on a bunch of fs/dcache.c patches, and
it would fit there nicely) would be a good idea.

FWIW, your "noblock" variant is a misnomer - it *is* a trylock analogue of
d_alloc_parallel(), all right, but it might very well block; on allocations,
if nothing else, and there's a chance of having that dput(dentry) in "wouldblock"
case coming right after the sucker ceased to be in-lookup and dropping the sole
remaining reference. Which may block on real IO, final dput() being what it is...

And I really dislike the "drop and regain a caller-held lock" games - we'd been
there many times and it had ended up with race galore again and again; see
https://lore.kernel.org/all/20250623213747.GJ1880847@ZenIV/ for one recent
example...