Re: [GIT PULL] fuse update for 6.19
From: Miklos Szeredi
Date: Mon Apr 13 2026 - 11:32:35 EST
On Mon, 13 Apr 2026 at 02:25, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Sun, Apr 12, 2026 at 07:16:45AM +0100, Al Viro wrote:
>
> > Looking at that thing again, I really hate how subtle it is ;-/
> > Consider the lockless case in your ->d_release() and try to write down
> > the reasons why it's safe. Among the other things, kfree_rcu() is there
> > to protect RCU callers of ->d_revalidate(); fair enough, but it quietly
> > doubles as delaying that freeing past the scope of dentry_hash[].lock in
> > which you'd done RB_CLEAR_NODE(&fd->node) that had pushed ->d_release()
> > to lockless path. Another side of that fun is the proof that if
> > fuse_dentry_tree_work() sees a dentry, it won't get freed under us.
> > Locked case of ->d_release() is easy; proving that the lockless one
> > is OK without explicit barriers is more interesting. Basically, all
> > insertions are ordered wrt ->d_release() (on ->d_lock), so if the value
> > we are observing in RB_EMPTY_NODE() has come from those we will hit the
> > locked case. If the value we observe has come from RB_CLEAR_NODE() in
> > earlier fuse_dentry_tree_work() (these are ordered on dentry_hash[].lock,
> > wrt each other and insertions), there mustn't have been any subsequent
> > insertions, or ->d_release() would've observed the effect of those.
> > If the value has come from the _same_ fuse_dentry_tree_work(), the
> > implicit barrier in spin_lock() would've ordered that store wrt beginning
> > of the scope, and since dentry_free() is ordered wrt ->d_release()
> > the callback of call_rcu() in there wouldn't run until the end of
> > the scope in question. So I think it's OK, but having it done without
> > a single comment either on barriers or on memory safety... Ouch.
> >
> > Note that we *can* run into a dentry getting killed just after
> > RB_CLEAR_NODE() in there; it's just that store to ->d_flags of dying
> > or killed dentry is safe under ->d_lock and d_dispose_if_unused() is
> > a no-op for dying and killed ones - it's not just "If dentry has no
> > external references, move it to shrink list" as your comment in
> > fs/dcache.c says. And in your case it very much does have an
> > external reference - it's just that it's an equivalent of RCU one,
> > with all the joy that inflicts upon the user.
>
> FWIW, here's what I've ended up with yesterday:
>
> /*
> * ->d_release() is ordered (on ->d_lock) after everything done
> * while holding counting references, including all calls of
> * fuse_dentry_add_tree_node(). Freeing a dentry is RCU-delayed
> * and scheduling it is ordered after ->d_release().
> *
> * In ->d_release() we see the initialization and all stores done by
> * fuse_dentry_add_tree_node(); since fuse_dentry_tree_work() is
> * ordered wrt fuse_dentry_add_tree_node() (on dentry_hash[].lock)
> * we also see all stores from fuse_dentry_tree_work() except possibly
> * the last one.
> *
> * If the value fuse_dentry_release() observes in
> * if (!RB_EMPTY_NODE(&fd->node))
> * has come from initialization, the node had never been inserted
> * into rbtree and fuse_dentry_tree_work() won't see it at all.
> *
> * If the value we observe there comes from the last store in
> * fuse_dentry_add_tree_node(), RB_EMPTY_NODE(&fd->node) will be
> * false and we are going to enter a dentry_hash[].lock scope,
> * providing an ordering wrt fuse_dentry_tree_work() which either
> * won't find fd in rbtree, or will complete before we
> * get to even scheduling dentry freeing.
> *
> * If the value we observe comes from fuse_dentry_tree_work(), we will
> * be ordered past the beginning of the dentry_hash[].lock scope that
> * had happened in. Since call_rcu() that schedules freeing the dentry
> * is ordered past the return from ->d_release(), freeing won't happen
> * until the end of that scope. Since there can be no subsequent calls
> * of fuse_dentry_add_tree_node(), any subsequent calls of
> * fuse_dentry_tree_work() won't see the node at all.
> *
> * Either way, if fuse_dentry_tree_work() sees a node, we are
> * guaranteed that node->dentry won't get freed under it.
This is indeed subtle.
I think the best way to un-subtle it would be to just remove the lockless
check and call fuse_dentry_tree_del_node() unconditionally, which
would make the ordering explicit on dentry_hash[].lock.
> *
> * Freeing the node itself is also RCU-delayed and scheduled in the
> * very end of FUSE ->d_release(); similar considerations show that
> * if fuse_dentry_tree_work() sees a node, we are guaranteed that
> * the node won't get freed under us either.
> *
> * While it is possible to run into a dying (or killed) dentry in
> * fuse_dentry_tree_work(), d_dispose_if_unused() is safe to be
> * used for those - it's a no-op in such case.
> */
>
> Is the above sane? If it is, something similar would be worth having
> written down somewhere in fs/fuse - it's not trivial and it's not hard
> to break accidentally...
Yeah, thanks for the analysis.
>
> As for the primitive itself...
>
> /**
> * __move_to_shrink_list - try to place a dentry into a shrink list
> * @dentry: dentry to try putting into shrink list
> * @list: the list to put @dentry into.
> * Returns: %true if @dentry has been placed into @list, %false otherwise
> *
> * If @dentry is idle and not already included in a shrink list, move
> * it into @list and return %true; otherwise do nothing and return %false.
> *
> * Caller must be holding @dentry->d_lock. There must have been no calls of
> * dentry_free(@dentry) prior to the beginning of the RCU read-side critical
> * section in which __move_to_shrink_list(@dentry, @list) is called.
> *
> * @list should be thread-private and eventually emptied by passing it to
> * shrink_dentry_list().
> */
> bool __move_to_shrink_list(struct dentry *dentry, struct list_head *list)
> 	__must_hold(&dentry->d_lock)
> {
> 	if (likely(!dentry->d_lockref.count &&
> 		   !(dentry->d_flags & DCACHE_SHRINK_LIST))) {
> 		if (dentry->d_flags & DCACHE_LRU_LIST)
> 			d_lru_del(dentry);
> 		d_shrink_add(dentry, list);
> 		return true;
> 	}
> 	return false;
> }
> EXPORT_SYMBOL(__move_to_shrink_list);
>
> With something like
> 	spin_lock(&fd->dentry->d_lock);
> 	/* If dentry is still referenced, let next dput release it */
> 	fd->dentry->d_flags |= DCACHE_OP_DELETE;
> 	__move_to_shrink_list(fd->dentry, &dispose);
> 	spin_unlock(&fd->dentry->d_lock);
> in fuse_dentry_tree_work() (along with the explanations of memory safety,
> that is)
Does this make any difference besides not having to lock d_lock twice?
Not against this change, just trying to understand the full implications.
Thanks,
Miklos