Re: [PATCH] NFS: ensure nfs_safe_remove() atomic nlink drop

From: Trond Myklebust
Date: Mon Nov 17 2025 - 17:42:04 EST


On Mon, 2025-11-17 at 13:57 -0500, Aiden Lambert wrote:
> On Mon, Nov 17, 2025 at 01:24:54PM -0500, Trond Myklebust wrote:
> > On Mon, 2025-11-17 at 13:03 -0500, Aiden Lambert wrote:
> > > A race condition occurs when both unlink() and link() are running
> > > concurrently on the same inode, and the nlink count from the nfs
> > > server
> > > received in link()->nfs_do_access() clobbers the nlink count of
> > > the
> > > inode in nfs_safe_remove() after the "remove" RPC is made to the
> > > server
> > > but before we decrement the link count. If the nlink value from
> > > nfs_do_access() reflects the decremented nlink of the "remove"
> > > RPC, a
> > > double decrement occurs, which can lead to the dropping of the
> > > client
> > > side inode, causing the link call to return ENOENT. To fix this,
> > > we
> > > record an expected nlink value before the "remove" RPC and
> > > compare it
> > > with the value afterwards---if these two are the same, the drop
> > > is
> > > performed. Note that this does not take into account nlink values
> > > that
> > > are a result of multi-client (un)link operations as these are not
> > > guaranteed to be atomic by the NFS spec.
> >
> >
> > Why do we end up running nfs_do_access() at all in the above test?
> > That
> > sounds like a bug. We shouldn't ever need to validate if we can
> > create
> > or delete things using ACCESS. That just ends up producing an
> > unnecessary TOCTOU race.
> >
> >
>
> It seems that the call chain is
> 1. vfs_link()
> 2. may_create()
> 3. inode_permission()/nfs_permission() which fails to get a cached
> value as the cache is
> invalidated by the (un)link operations
> 4. nfs_do_access()

I'm still confused. Why wouldn't the S_IFDIR case in nfs_permission()
be filtering away the call to nfs_do_access()?

Either way, I think this cannot be fixed without treating it as a
generic attribute race. Your patch is sort of correct, but there are
loopholes that it won't catch either, so let's just fix it the way that
we fix all attribute races: by using the generation counter to tell us
when such a race may have occurred.

8<---------------------------------------