Re: [PATCH v5 4/4] fs: allow lockless ->i_count bumps as long as it does not transition 0->1
From: Jan Kara
Date: Wed Apr 08 2026 - 13:01:57 EST
On Tue 31-03-26 18:08:51, Mateusz Guzik wrote:
> With this change only 0->1 and 1->0 transitions need the lock.
>
> I verified all places which look at the refcount either only care about
> it staying 0 (and have the lock enforce it) or don't hold the inode lock
> to begin with (making the above change irrelevant to their correcness or
> lack thereof).
>
> I also confirmed nfs and btrfs like to call into these a lot and now
> avoid the lock in the common case, shaving off some atomics.
>
> Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
Nice!
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 9ceab142896f..b63450ebb85c 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2033,6 +2033,10 @@ void d_instantiate_new(struct dentry *entry, struct inode *inode)
> __d_instantiate(entry, inode);
> spin_unlock(&entry->d_lock);
> WARN_ON(!(inode_state_read(inode) & I_NEW));
> + /*
> + * Paired with igrab_try_lockless()
> + */
> + smp_wmb();
> inode_state_clear(inode, I_NEW | I_CREATING);
> inode_wake_up_bit(inode, __I_NEW);
> spin_unlock(&inode->i_lock);
It's a bit sad you're reintroducing part of the barries you've removed in
a27628f43634 ("fs: rework I_NEW handling to operate without fences") but I
guess it's worth it.
> +/*
> + * igrab_try_lockless - special inode refcount acquire primitive for the inode hash
> + * (don't use elsewhere!)
Why is this special for inode hash? Because of I_NEW & I_CREATING checks?
Then I'd call this function igrab_from_hash() or something like that and
integrate there also all the i_state checking and waiting common between
find_inode() and find_inode_fast().
Otherwise the patch looks good.
Honza
> + *
> + * It provides lockless refcount acquire in the common case of no problematic
> + * flags being set and the count being > 0.
> + *
> + * There are 4 state flags to worry about and the routine makes sure to not bump the
> + * ref if any of them is present.
> + *
> + * I_NEW and I_CREATING can only legally get set *before* the inode becomes visible
> + * during lookup. Thus if the flags are not spotted, they are guaranteed to not be
> + * a factor. However, we need an acquire fence before returning the inode just
> + * in case we raced against clearing the state to make sure our consumer picks up
> + * any other changes made prior. atomic_add_unless provides a full fence, which
> + * takes care of it.
> + *
> + * I_FREEING and I_WILL_FREE can only legally get set if ->i_count == 0 and it is
> + * illegal to bump the ref if either is present. Consequently if atomic_add_unless
> + * managed to replace a non-0 value with a bigger one, we have a guarantee neither
> + * of these flags is set. Note this means explicitly checking of these flags below
> + * is not necessary, it is only done because it does not cost anything on top of the
> + * load which already needs to be done to handle the other flags.
> + */
> +static bool igrab_try_lockless(struct inode *inode)
> +{
> + if (inode_state_read_once(inode) & (I_NEW | I_CREATING | I_FREEING | I_WILL_FREE))
> + return false;
> + /*
> + * Paired with routines clearing I_NEW
> + */
> + if (atomic_add_unless(&inode->i_count, 1, 0)) {
> + VFS_BUG_ON_INODE(inode_state_read_once(inode) & (I_FREEING | I_WILL_FREE), inode);
> + return true;
> + }
> + return false;
> +}
> +
> /**
> * ilookup5_nowait - search for an inode in the inode cache
> * @sb: super block of file system to search
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 07363fce4406..119e0a3d2f42 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2234,8 +2234,8 @@ static inline int icount_read_once(const struct inode *inode)
> }
>
> /*
> - * returns the refcount on the inode. The lock guarantees no new references
> - * are added, but references can be dropped as long as the result is > 0.
> + * returns the refcount on the inode. The lock guarantees no 0->1 or 1->0 transitions
> + * of the count are going to take place, otherwise it changes arbitrarily.
> */
> static inline int icount_read(const struct inode *inode)
> {
> --
> 2.48.1
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR