Re: [PATCH v5 4/4] fs: allow lockless ->i_count bumps as long as it does not transition 0->1

From: Mateusz Guzik

Date: Sat Apr 18 2026 - 08:33:11 EST


On Wed, Apr 8, 2026 at 7:01 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Tue 31-03-26 18:08:51, Mateusz Guzik wrote:
> > With this change only 0->1 and 1->0 transitions need the lock.
> >
> > I verified all places which look at the refcount either only care about
> > it staying 0 (and have the lock enforce it) or don't hold the inode lock
> > to begin with (making the above change irrelevant to their correcness or
> > lack thereof).
> >
> > I also confirmed nfs and btrfs like to call into these a lot and now
> > avoid the lock in the common case, shaving off some atomics.
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
>
> Nice!
>
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 9ceab142896f..b63450ebb85c 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -2033,6 +2033,10 @@ void d_instantiate_new(struct dentry *entry, struct inode *inode)
> > __d_instantiate(entry, inode);
> > spin_unlock(&entry->d_lock);
> > WARN_ON(!(inode_state_read(inode) & I_NEW));
> > + /*
> > + * Paired with igrab_try_lockless()
> > + */
> > + smp_wmb();
> > inode_state_clear(inode, I_NEW | I_CREATING);
> > inode_wake_up_bit(inode, __I_NEW);
> > spin_unlock(&inode->i_lock);
>
> It's a bit sad you're reintroducing part of the barries you've removed in
> a27628f43634 ("fs: rework I_NEW handling to operate without fences") but I
> guess it's worth it.
>

Note the most problematic full fence is still gone, so not a big deal imo.

> > +/*
> > + * igrab_try_lockless - special inode refcount acquire primitive for the inode hash
> > + * (don't use elsewhere!)
>
> Why is this special for inode hash? Because of I_NEW & I_CREATING checks?
> Then I'd call this function igrab_from_hash() or something like that and
> integrate there also all the i_state checking and waiting common between
> find_inode() and find_inode_fast().
>

I was looking at deduplicating find_inode stuff and doing other clean
ups (notably pulling in waiting on I_NEW inodes into it), I don't like
the idea of making churn changes in this area with this patchset
though -- the hash code has proven itself to be weirdly eroror-prone,
so I want to keep everything incremental and submit further rework
later.




>
> Honza
>
> > + *
> > + * It provides lockless refcount acquire in the common case of no problematic
> > + * flags being set and the count being > 0.
> > + *
> > + * There are 4 state flags to worry about and the routine makes sure to not bump the
> > + * ref if any of them is present.
> > + *
> > + * I_NEW and I_CREATING can only legally get set *before* the inode becomes visible
> > + * during lookup. Thus if the flags are not spotted, they are guaranteed to not be
> > + * a factor. However, we need an acquire fence before returning the inode just
> > + * in case we raced against clearing the state to make sure our consumer picks up
> > + * any other changes made prior. atomic_add_unless provides a full fence, which
> > + * takes care of it.
> > + *
> > + * I_FREEING and I_WILL_FREE can only legally get set if ->i_count == 0 and it is
> > + * illegal to bump the ref if either is present. Consequently if atomic_add_unless
> > + * managed to replace a non-0 value with a bigger one, we have a guarantee neither
> > + * of these flags is set. Note this means explicitly checking of these flags below
> > + * is not necessary, it is only done because it does not cost anything on top of the
> > + * load which already needs to be done to handle the other flags.
> > + */
> > +static bool igrab_try_lockless(struct inode *inode)
> > +{
> > + if (inode_state_read_once(inode) & (I_NEW | I_CREATING | I_FREEING | I_WILL_FREE))
> > + return false;
> > + /*
> > + * Paired with routines clearing I_NEW
> > + */
> > + if (atomic_add_unless(&inode->i_count, 1, 0)) {
> > + VFS_BUG_ON_INODE(inode_state_read_once(inode) & (I_FREEING | I_WILL_FREE), inode);
> > + return true;
> > + }
> > + return false;
> > +}
> > +
> > /**
> > * ilookup5_nowait - search for an inode in the inode cache
> > * @sb: super block of file system to search
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 07363fce4406..119e0a3d2f42 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2234,8 +2234,8 @@ static inline int icount_read_once(const struct inode *inode)
> > }
> >
> > /*
> > - * returns the refcount on the inode. The lock guarantees no new references
> > - * are added, but references can be dropped as long as the result is > 0.
> > + * returns the refcount on the inode. The lock guarantees no 0->1 or 1->0 transitions
> > + * of the count are going to take place, otherwise it changes arbitrarily.
> > */
> > static inline int icount_read(const struct inode *inode)
> > {
> > --
> > 2.48.1
> >
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR