Re: [PATCH] NFS: ensure nfs_safe_remove() atomic nlink drop

From: Aiden Lambert
Date: Mon Nov 17 2025 - 14:02:43 EST


On Mon, Nov 17, 2025 at 01:24:54PM -0500, Trond Myklebust wrote:
> On Mon, 2025-11-17 at 13:03 -0500, Aiden Lambert wrote:
> > A race condition occurs when both unlink() and link() are running
> > concurrently on the same inode, and the nlink count from the nfs
> > server
> > received in link()->nfs_do_access() clobbers the nlink count of the
> > inode in nfs_safe_remove() after the "remove" RPC is made to the
> > server
> > but before we decrement the link count. If the nlink value from
> > nfs_do_access() reflects the decremented nlink of the "remove" RPC, a
> > double decrement occurs, which can lead to the dropping of the client
> > side inode, causing the link call to return ENOENT. To fix this, we
> > record an expected nlink value before the "remove" RPC and compare it
> > with the value afterwards---if these two are the same, the drop is
> > performed. Note that this does not take into account nlink values
> > that
> > are a result of multi-client (un)link operations as these are not
> > guaranteed to be atomic by the NFS spec.
>
>
> Why do we end up running nfs_do_access() at all in the above test? That
> sounds like a bug. We shouldn't ever need to validate if we can create
> or delete things using ACCESS. That just ends up producing an
> unnecessary TOCTOU race.
>
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx

It seems that the call chain is
1. vfs_link()
2. may_create()
3. inode_permission()/nfs_permission() which fails to get a cached value as the cache is
invalidated by the (un)link operations
4. nfs_do_access()

My initial thought for the fix was to elevate the inode rwsem to
exclusive write access in nfs_do_access if the thread does not already
have it, but I figured this design of not locking was a conscious one.

A second thought was to add a new lock around all operations talking to
nfsd and their corresponding client side metadata updates, but that may
not be the way to go.

I have a proof of concept script that catches the symptoms of this race
if you would like me to upload it.