Re: [PATCH v3 20/21] __dentry_kill(): new locking scheme

From: Al Viro

Date: Wed Jan 21 2026 - 16:54:12 EST


On Tue, Jul 08, 2025 at 06:45:14AM +0200, Max Kellermann wrote:

> I believe the busy-wait was accidental.
> I've been trying to make you aware that this is effectively a
> busy-wait, one that can take a long time burning CPU cycles, but I
> have a feeling I can't reach you.
>
> Al, please confirm that it was your intention to busy-wait until dying
> dentries disappear!

It's not so much an intention as having nothing good to wait on.

Theoretically, there's a way to deal with that - dentry in the middle
of stuck iput() from dentry_unlink_inode() from __dentry_kill() is
guaranteed to be
* negative
* unhashed
* not in-lookup

What we could do is add an hlist_head aliased with ->d_alias, ->d_rcu
and ->d_in_lookup_hash. Then select_collect2() running into a dentry
with negative refcount would set _that_ as victim and bugger off, same
as we do for ones on shrink lists.
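To illustrate the aliasing: a userspace sketch with simplified stand-ins for the kernel list types (struct sketch_dentry, the two union names and aliased_size_growth() are made up for this example; d_new_field is just the placeholder name used in this mail). Presumably the new head can share storage because a negative dentry has no use for ->d_alias and one that is not in-lookup has no use for ->d_in_lookup_hash. All the sketch checks is that the extra member costs no space:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; layouts are assumptions. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };
struct hlist_bl_node { struct hlist_bl_node *next, **pprev; };
struct rcu_head { struct rcu_head *next; void (*func)(struct rcu_head *); };

/* The members the mail names as already sharing storage. */
union d_u_without {
	struct hlist_node d_alias;
	struct hlist_bl_node d_in_lookup_hash;
	struct rcu_head d_rcu;
};

/* Same union with the proposed waiter list head added. */
union d_u_with {
	struct hlist_node d_alias;
	struct hlist_bl_node d_in_lookup_hash;
	struct rcu_head d_rcu;
	struct hlist_head d_new_field;	/* placeholder name */
};

/* Hypothetical dentry stand-in, only here to show where the union sits. */
struct sketch_dentry {
	int d_count;			/* stand-in for ->d_lockref.count */
	union d_u_with d_u;
};

_Static_assert(sizeof(union d_u_with) == sizeof(union d_u_without),
	       "aliased list head must not grow the union");

/* Returns how many bytes the extra union member adds. */
size_t aliased_size_growth(void)
{
	return sizeof(union d_u_with) - sizeof(union d_u_without);
}
```

The hlist_head is a single pointer, smaller than any of the members it overlays, so the union keeps its size.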

shrink_dcache_parent() would do this:
	if (data.victim) {
		struct dentry *v = data.victim;

		spin_lock(&v->d_lock);
		if (v->d_lockref.count < 0 &&
		    !(v->d_flags & DCACHE_DENTRY_KILLED)) {
			/* being killed, not yet past dentry_unlist(): sleep on it */
			init_completion(&data.completion);
			hlist_add_head(&data.node, &v->d_new_field);
			spin_unlock(&v->d_lock);
			rcu_read_unlock();
			wait_for_completion(&data.completion);
		} else if (!lock_for_kill(v)) {
			spin_unlock(&v->d_lock);
			rcu_read_unlock();
		} else {
			shrink_kill(v);
		}
	}

and dentry_unlist() -
	dentry->d_flags |= DCACHE_DENTRY_KILLED;
	/* wake everyone who queued themselves in shrink_dcache_parent() */
	while (unlikely(dentry->d_new_field.first)) {
		struct select_data *p;

		p = hlist_entry(dentry->d_new_field.first,
				struct select_data,
				node);
		hlist_del_init(&p->node);
		complete(&p->completion);
	}
	...

AFAICS, that ought to be safe and would guarantee progress on each
iteration in shrink_dcache_parent() (note that finding negative
refcount and seeing that it had already been past dentry_unlist()
would mean falling through to lock_for_kill() and instantly
failing there; in any case, that dentry definitely won't be
found on any subsequent d_walk(), so we still get progress there).
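
For illustration only, a userspace sketch of the same handshake, with pthread stand-ins for struct completion and ->d_lock. Everything named here (fake_dentry, waiter, wait_for_kill(), finish_kill(), demo()) is made up for the example and is not kernel code; it assumes the waker holds the lock while it sets the flag and drains the list. A waiter either sees the flag and skips the wait, or gets on the list before the flag is set and is woken exactly once:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct waiter {
	struct waiter *next;
	struct completion completion;
};

struct fake_dentry {
	pthread_mutex_t d_lock;		/* stand-in for ->d_lock */
	bool killed;			/* stand-in for DCACHE_DENTRY_KILLED */
	struct waiter *waiters;		/* stand-in for the proposed list head */
};

/* Shrinker side: sleep until the kill is done; returns true if we slept. */
static bool wait_for_kill(struct fake_dentry *d)
{
	struct waiter w;

	pthread_mutex_lock(&d->d_lock);
	if (d->killed) {		/* already past the unlist step */
		pthread_mutex_unlock(&d->d_lock);
		return false;
	}
	init_completion(&w.completion);
	w.next = d->waiters;
	d->waiters = &w;
	pthread_mutex_unlock(&d->d_lock);
	wait_for_completion(&w.completion);
	pthread_cond_destroy(&w.completion.cond);
	pthread_mutex_destroy(&w.completion.lock);
	return true;
}

/* Killer side: set the flag, then wake every queued waiter. */
static int finish_kill(struct fake_dentry *d)
{
	int woken = 0;

	pthread_mutex_lock(&d->d_lock);
	d->killed = true;
	while (d->waiters) {
		struct waiter *w = d->waiters;

		d->waiters = w->next;	/* unlink first; w may vanish once woken */
		complete(&w->completion);
		woken++;
	}
	pthread_mutex_unlock(&d->d_lock);
	return woken;
}

static void *shrinker_thread(void *arg)
{
	return wait_for_kill(arg) ? arg : NULL;	/* non-NULL means "slept" */
}

/* Returns true if every thread that slept was woken exactly once. */
bool demo(void)
{
	struct fake_dentry d = { .killed = false, .waiters = NULL };
	pthread_t t[4];
	int slept = 0, woken;
	void *ret;

	pthread_mutex_init(&d.d_lock, NULL);
	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, shrinker_thread, &d);
	woken = finish_kill(&d);
	for (int i = 0; i < 4; i++) {
		pthread_join(t[i], &ret);
		slept += ret != NULL;
	}
	pthread_mutex_destroy(&d.d_lock);
	return slept == woken && d.waiters == NULL;
}
```

Calling demo() from a main() built with -pthread should return true however the threads race: one that arrives late sees the flag and never sleeps, which is the userspace counterpart of falling through to lock_for_kill() and failing there.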

Comments?