Re: [PATCH] fuse: fix race between inode/dentry invalidation and readdir

From: Joanne Koong

Date: Mon Apr 27 2026 - 08:54:34 EST

On Mon, Apr 27, 2026 at 10:23 AM Luis Henriques <luis@xxxxxxxxxx> wrote:
>
> Hi Joanne!
>
> On Fri, Apr 24 2026, Joanne Koong wrote:
>
> > On Fri, Apr 24, 2026 at 6:53 AM Luis Henriques <luis@xxxxxxxxxx> wrote:
> >>
> >> When there's a readdir in progress, doing a FUSE_NOTIFY_INVAL_{INODE,ENTRY}
> >> on an inode or dentry may result in stale directory info being cached. This
> >> is because the invalidation does not reset the readdir cache.
> >>
> >> This patch fixes this issue by adding a call to fuse_rdc_reset() (modified
> >> to include the required locking) to these two operations, allowing the
> >> readdir cache to be invalidated while it's being filled-in.
> >
> > Hi Luis,
> >
> > Just curious, are you hitting this issue in practice or is this mostly
> > theoretical?
> >
> > afaict for fuse_notify_inval_entry(), it calls into
> > fuse_reverse_inval_entry() -> fuse_dir_changed(parent), which calls
> > inode_maybe_inc_iversion(). afaict, this actually increments i_version
> > (since I_VERSION_QUERIED flag was set when the cache's iversion was
> > initialized with inode_query_iversion() in fuse_readdir_cached()),
> > which means the next readdir call will detect this version change and
> > call fuse_rdc_reset() (in fuse_readdir_cached()). I'm not sure I see
> > how this leads to stale directory info lingering in the cache after a
> > concurrent fuse_notify_inval_entry()?
> >
> > For the fuse_notify_inval_inode() case, which I'm assuming is the case
> > you're running into where the directory is the inode being
> > invalidated, I see the call to fuse_reverse_inval_inode() which calls
> > invalidate_inode_pages2_range() if the offset was non-negative, which
> > will invalidate the readdir cache's pages, which means the next
> > readdir call will already call fuse_rdc_reset() when it detects the
> > missing page in the cache (in fuse_readdir_cached()). So I'm not
> > really seeing how this can happen either for the
> > fuse_notify_inval_inode() case? Unless you are passing a negative
> > offset, but as I understand it, passing a negative offset is used only
> > if the server wants attributes invalidated [1], not any data.
> >
> > afaics, the only stale directory info returned would be in the case
> > of a concurrent readdir that has already passed the pos == 0
> > iversion/mtime check when the invalidation arrives, but that seems
> > like a server synchronization issue, eg if the server wants uptodate
> > data when doing a concurrent readdir and invalidation, they have to
> > order that themselves. Any fresh lookup after that though, I think
> > would always return fresh/non-stale data for the reasons mentioned
> > above.
> >
> > Does this align with what you're seeing in the code or am I missing
> > something here?

Hi Luis,

>
> First of all, thanks a lot for looking into this and for doing such a
> great description of the issue.
>
> So, I did have a report regarding a possible race between a readdir and
> invalidation when using keep_cache and cache_readdir. But, unfortunately,
> I don't have a lot of information regarding the actual issue, and it isn't
> something reproducible.
>
> Then, looking at the code (and, for full-disclosure, I've also looked at a
> claude analysis that was handed over to me) I could see a race that I'm
> trying to fix with this patch. But I believe it's the race that you
> describe above as a server synchronisation problem. For example, with a
> NOTIFY_INVAL_INODE operation, when fuse_reverse_inval_inode() is called
> while fuse_add_dirent_to_cache() is being executed in parallel, the
> iversion/mtime update could be missed.
>
> It is possible to hit this small race by instrumenting the code, and I
> could occasionally (and momentarily) see stale data while running readdir
> in such an instrumented testing environment. Do you think that's something
> inherent to the usage of the INVAL_INODE op, and this race will need to be
> handled by user-space?

imo yes, that is not a bug in the kernel and userspace is responsible
for synchronizing/coordinating that. I think the kernel is just
responsible for ensuring that any subsequent readdirs are not stale,
but afaict the existing code handles that.

>
> In fact, the report I got seemed to indicate that the issue was not going
> away with a fresh lookup (though an 'echo 1 > /proc/sys/vm/drop_caches'
> would fix it). But maybe that's another indication that this is a problem
> in the user-space server.

that seems weird to me, maybe there's something else at play here in
addition to the concurrent race? Is there a repro for where the stale
data survives a fresh lookup?

Thanks,
Joanne

>
> Cheers,

> --
> Luís