Re: Re: Re: Re: [PATCH] fuse: fix inode initialization race

From: Joanne Koong

Date: Thu Mar 26 2026 - 14:49:12 EST

On Thu, Mar 26, 2026 at 11:11 AM Horst Birthelmer <horst@xxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 26, 2026 at 11:00:54AM -0700, Joanne Koong wrote:
> > On Thu, Mar 26, 2026 at 10:54 AM Horst Birthelmer <horst@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Mar 26, 2026 at 09:43:00AM -0700, Joanne Koong wrote:
> > > > On Thu, Mar 26, 2026 at 8:48 AM Horst Birthelmer <horst@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Mar 26, 2026 at 04:19:24PM +0100, Miklos Szeredi wrote:
> > > > > > On Thu, 26 Mar 2026 at 16:13, Bernd Schubert <bernd@xxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 3/26/26 15:26, Christian Brauner wrote:
> > > > > > > > On Wed, Mar 25, 2026 at 08:54:57AM +0100, Bernd Schubert wrote:
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On 3/18/26 14:43, Horst Birthelmer wrote:
> > > > > > > >>> From: Horst Birthelmer <hbirthelmer@xxxxxxx>
> > > > > >
> > > > > > > >>> fi->attr_version = atomic64_inc_return(&fc->attr_version);
> > > > > > > >>> + wake_up_all(&fc->attr_version_waitq);
> > > > > > > >>> fi->i_time = attr_valid;
> > > > > > >
> > > > > > >
> > > > > > > While I'm looking at this again, wouldn't it make sense to make this
> > > > > > > conditional? Because we wake this queue on every attr change for every
> > > > > > > inode. And the conditional in fuse_iget() based on I_NEW?
> > > > > >
> > > > > > Right, should only wake if fi->attr_version old value was zero.
> > > > > >
> > > > > > BTW I have a hunch that there are better solutions, but it's simple
> > > > > > enough as a stopgap measure.
> > > > >
> > > > > OK, I'll send a new version.
> > > > >
> > > > > Just out of curiosity, what would be a better solution?
> > > >
> > > > I'm probably missing something here but why can't we just call the
> > > >
> > > > fi = get_fuse_inode(inode);
> > > > spin_lock(&fi->lock);
> > > > fi->nlookup++;
> > > > spin_unlock(&fi->lock);
> > > > fuse_change_attributes_i(inode, attr, NULL, attr_valid, attr_version,
> > > > evict_ctr);
> > > >
> > > > logic before releasing the inode lock (the unlock_new_inode() call) in
> > > > fuse_iget() to address the race? unlock_new_inode() clears I_NEW so
> > > > fuse_reverse_inval_inode()'s fuse_ilookup() would only get the inode
> > > > after the attributes initialization has finished.
> > > >
> > > > As I understand it, fuse_change_attributes_i() would be pretty
> > > > straightforward / fast for I_NEW inodes, as it doesn't send any
> > > > synchronous requests and for the I_NEW case the
> > > > invalidate_inode_pages2() and truncate_pagecache() calls would get
> > > > skipped. (truncate_pagecache() getting skipped because inode->i_size
> > > > is already attr->size from fuse_init_inode(), so "oldsize !=
> > > > attr->size" is never true; and invalidate_inode_pages2() getting
> > > > skipped because "oldsize != attr->size" is never true and "if
> > > > (!timespec64_equal(&old_mtime, &new_mtime))" is never true because
> > > > fuse_init_inode() initialized the inode's mtime to attr->mtime).
> > >
> > > You understand the pretty well, I think.
> > > The problem I have there is that fuse_change_attributes_i() takes
> > > its own lock.
> > > That would be a pretty big operation to split that function.
> >
> > I believe fuse_change_attribtues_i() takes the fi lock, not the inode
> > lock, so this should be fine.
> >
>
> Yes, I got confused there, sorry.
> Still, a pretty big change for a corner case. Don't you think?

I don't think it's a big change. It would just be this:

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 84f78fb89d35..bd0ec405823e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -470,6 +470,7 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
struct inode *inode;
struct fuse_inode *fi;
struct fuse_conn *fc = get_fuse_conn_super(sb);
+ bool is_new_inode = false;

/*
* Auto mount points get their node id from the submount root, which is
@@ -505,13 +506,13 @@ struct inode *fuse_iget(struct super_block *sb,
u64 nodeid,
if (!inode)
return NULL;

- if ((inode_state_read_once(inode) & I_NEW)) {
+ is_new_inode = inode_state_read_once(inode) & I_NEW;
+ if (is_new_inode) {
inode->i_flags |= S_NOATIME;
if (!fc->writeback_cache || !S_ISREG(attr->mode))
inode->i_flags |= S_NOCMTIME;
inode->i_generation = generation;
fuse_init_inode(inode, attr, fc);
- unlock_new_inode(inode);
} else if (fuse_stale_inode(inode, generation, attr)) {
/* nodeid was reused, any I/O on the old inode should fail */
fuse_make_bad(inode);
@@ -528,6 +529,9 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
done:
fuse_change_attributes_i(inode, attr, NULL, attr_valid, attr_version,
evict_ctr);
+ if (is_new_inode)
+ unlock_new_inode(inode);
+
return inode;
}

which imo is pretty minimal.

>
> What would be the advantage to the current situation with the requested
> changes from Miklos and Bernd, of course? So that we only do the
> wakeup_all() call only if we have initialized a new inode?

I think this is the proper fix because it eliminates the race
altogether and it to me is a lot cleaner than trying to work around
the race with the waitqueue/wakeup stuff.

Thanks,
Joanne

>
> > Thanks,
> > Joanne
> > >
> > > Is that required for this small (as Miklos put it, rare) case?
> > >
> > > If you guys want me to do that, I will.
> > >
> > > Thanks,
> > > Horst