Re: [PATCH] fuse: fix race between inode/dentry invalidation and readdir

From: Joanne Koong

Date: Fri Apr 24 2026 - 15:35:44 EST


On Fri, Apr 24, 2026 at 6:53 AM Luis Henriques <luis@xxxxxxxxxx> wrote:
>
> When there's a readdir in progress, doing a FUSE_NOTIFY_INVAL_{INODE,ENTRY}
> on an inode or dentry may result in stale directory info being cached. This
> is because the invalidation does not reset the readdir cache.
>
> This patch fixes this issue by adding a call to fuse_rdc_reset() (modified
> to include the required locking) to these two operations, allowing the
> readdir cache to be invalidated while it's being filled-in.

Hi Luis,

Just curious, are you hitting this issue in practice or is this mostly
theoretical?

afaict, fuse_notify_inval_entry() calls into
fuse_reverse_inval_entry() -> fuse_dir_changed(parent), which calls
inode_maybe_inc_iversion(). That does actually increment i_version
(since the I_VERSION_QUERIED flag was set when the cache's iversion was
initialized with inode_query_iversion() in fuse_readdir_cached()),
which means the next readdir call starting at offset 0 will detect the
version change and call fuse_rdc_reset() (in fuse_readdir_cached()).
I'm not sure I see how this leads to stale directory info lingering in
the cache after a concurrent fuse_notify_inval_entry()?
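To make that argument concrete, here's a toy user-space model in C of the i_version handshake as I read it. All toy_* names are hypothetical and this is not kernel code; it ignores locking, mtime and the page cache, and pretends a single uncached readdir fills the whole cache:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy directory: only the state the i_version argument needs. */
struct toy_dir {
	uint64_t i_version;    /* stands in for the inode's i_version */
	bool queried;          /* stands in for I_VERSION_QUERIED */
	bool rdc_cached;       /* stands in for fi->rdc.cached */
	uint64_t rdc_iversion; /* snapshot taken when the cache was started */
	unsigned int resets;   /* number of toy fuse_rdc_reset() calls */
};

/* Like inode_query_iversion(): read the counter and mark it queried. */
static uint64_t toy_query_iversion(struct toy_dir *d)
{
	d->queried = true;
	return d->i_version;
}

/* Like inode_maybe_inc_iversion(inode, false): bump only if queried. */
static bool toy_maybe_inc_iversion(struct toy_dir *d)
{
	if (!d->queried)
		return false;
	d->i_version++;
	d->queried = false;
	return true;
}

/* Entry invalidation ends up in fuse_dir_changed(parent). */
static void toy_inval_entry(struct toy_dir *parent)
{
	toy_maybe_inc_iversion(parent);
}

/* Toy fuse_rdc_reset(): drop the cached directory contents. */
static void toy_rdc_reset(struct toy_dir *d)
{
	assert(d->rdc_cached); /* toy invariant: only reset an existing cache */
	d->rdc_cached = false;
	d->resets++;
}

/*
 * Toy readdir at pos == 0: returns true if served from the cache,
 * false if the cache had to be (re)started and refilled.
 */
static bool toy_readdir_pos0(struct toy_dir *d)
{
	if (d->rdc_cached && d->i_version != d->rdc_iversion)
		toy_rdc_reset(d);                /* directory changed: reset */
	if (d->rdc_cached)
		return true;
	d->rdc_iversion = toy_query_iversion(d); /* also sets "queried" */
	d->rdc_cached = true;                    /* pretend it got filled */
	return false;
}
```

The queried flag is the important bit: without a prior toy_query_iversion(), toy_inval_entry() leaves i_version unchanged, which is why it matters that the cache snapshot is taken with inode_query_iversion().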

For the fuse_notify_inval_inode() case, which I'm assuming is the one
you're running into (with the directory being the inode that's
invalidated), fuse_reverse_inval_inode() calls
invalidate_inode_pages2_range() if the offset is non-negative. That
invalidates the readdir cache's pages, so the next readdir call will
call fuse_rdc_reset() when it detects the missing page in the cache
(in fuse_readdir_cached()). So I'm not really seeing how this can
happen for the fuse_notify_inval_inode() case either, unless you are
passing a negative offset; but as I understand it, a negative offset is
only used if the server wants attributes invalidated [1], not any data.
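Similarly, a toy model of the missing-page path. Again the toy_* names are hypothetical and this is not kernel code; it assumes the invalidated byte range covers the cached readdir page, and it ignores locking, i_version and the mtime check done at pos == 0:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_rdc {
	bool cached;         /* stands in for fi->rdc.cached */
	bool page_present;   /* is the readdir page still in the page cache? */
	unsigned int resets; /* number of toy fuse_rdc_reset() calls */
};

/* Toy fuse_rdc_reset(): drop the cached directory contents. */
static void toy_rdc_reset(struct toy_rdc *c)
{
	assert(c->cached); /* toy invariant: only reset an existing cache */
	c->cached = false;
	c->resets++;
}

/* Toy FUSE_NOTIFY_INVAL_INODE: pages are dropped only for off >= 0. */
static void toy_inval_inode(struct toy_rdc *c, long long off)
{
	if (off >= 0)
		c->page_present = false; /* like invalidate_inode_pages2_range() */
	/* off < 0: attributes only, the readdir page is left alone */
}

/* Returns true if the readdir was served from the cached page. */
static bool toy_readdir_cached(struct toy_rdc *c)
{
	if (c->cached && !c->page_present)
		toy_rdc_reset(c);  /* page gone missing: cache is useless */
	if (c->cached)
		return true;
	c->cached = true;          /* refill from the server (not modeled) */
	c->page_present = true;
	return false;
}
```

In the toy, a negative offset leaves the page alone, so the existing cache keeps being served.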

afaics, the only stale directory info that could be returned is for a
concurrent readdir that has already passed the pos == 0 iversion/mtime
check when the invalidation arrives, but that seems like a server
synchronization issue: if the server wants up-to-date data when doing a
concurrent readdir and invalidation, it has to order those itself. Any
fresh lookup after that, though, I think would always return fresh,
non-stale data for the reasons mentioned above.
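And a toy model of that window. The toy_* names are hypothetical and this is not kernel code; it only models the i_version comparison that fuse_readdir_cached() does at pos == 0, and ignores the queried-flag bookkeeping, mtime and the per-file cache version:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_window {
	uint64_t i_version;    /* bumped when the directory is invalidated */
	uint64_t rdc_iversion; /* snapshot taken when the cache was filled */
	unsigned int resets;   /* number of toy fuse_rdc_reset() calls */
};

/*
 * One getdents() step at directory offset @pos.  Returns false if this
 * step reset the cache, true if it kept using the cached contents.
 */
static bool toy_readdir_step(struct toy_window *w, long long pos)
{
	assert(pos >= 0); /* toy: directory offsets are never negative */
	if (pos == 0 && w->i_version != w->rdc_iversion) {
		w->resets++;                    /* toy fuse_rdc_reset() */
		w->rdc_iversion = w->i_version; /* cache refilled from server */
		return false;
	}
	return true; /* pos > 0 never looks at i_version */
}
```

A stream already past offset 0 keeps reading the old cache after the bump, while the next readdir that starts at 0 notices the change and resets.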

Does this align with what you're seeing in the code or am I missing
something here?

>
> Assisted-by: Claude:claude-opus-4-5
> Signed-off-by: Luis Henriques <luis@xxxxxxxxxx>
> ---
>  fs/fuse/dir.c     |  5 +++--
>  fs/fuse/fuse_i.h  | 13 +++++++++++++
>  fs/fuse/inode.c   |  1 +
>  fs/fuse/readdir.c |  6 +++---
> 4 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 7ac6b232ef12..6e5851de3613 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1615,6 +1615,7 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
>  	if (!(flags & FUSE_EXPIRE_ONLY))
>  		d_invalidate(entry);
>  	fuse_invalidate_entry_cache(entry);
> +	fuse_rdc_reset(entry->d_inode);

Hmm... I think this resets the child's readdir cache but it's the
parent's readdir cache that would have to be invalidated, so would
this have to be fuse_rdc_reset(parent)?

Thanks,
Joanne

[1] https://libfuse.github.io/doxygen/fuse__lowlevel_8h.html#a9cb974af9745294ff446d11cba2422f1

>
>  	if (child_nodeid != 0) {
>  		inode_lock(d_inode(entry));
> @@ -1637,7 +1638,7 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
>  		dont_mount(entry);
>  		clear_nlink(d_inode(entry));
>  		err = 0;
> - badentry:
> +badentry:
>  		inode_unlock(d_inode(entry));
>  		if (!err)
>  			d_delete(entry);