Re: [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files
From: Chuck Lever
Date: Wed Jun 03 2026 - 13:36:38 EST
On Tue, Jun 2, 2026, at 9:23 AM, Jeff Layton wrote:
> Take a net-namespace reference in nfsd_file_dispose_list_delayed()
> (get_net) and release it in nfsd_file_free() (put_net), so that
> nf_net is always valid for files that the GC or shrinker has isolated
> from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
> see.
>
> Without this, nf_net can dangle for in-flight files whose net namespace
> is torn down concurrently, causing a use-after-free when
> nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).
>
> Only files entering the delayed-dispose path need the net reference;
> files freed synchronously (non-GC stateful opens, purge, direct put)
> are always freed while the net namespace is still alive, so they skip
> get_net()/put_net() entirely. A new NFSD_FILE_NET_HELD flag tracks
> whether a given nfsd_file holds a net reference.
>
> Because nfsd_file_free() may now call put_net(nf->nf_net), the old
> nfsd_file_put_local() pattern of returning nf->nf_net after
> nfsd_file_put() is unsafe -- put_net() could theoretically drop the
> last net namespace reference, leaving the returned pointer stale.
> Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
> itself, before the nfsd_file_put() that may trigger nfsd_file_free().
> The function now returns void and the caller no longer needs to handle
> the net reference.
>
> Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
> Assisted-by: Claude:claude-opus-4-6
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> ---
> fs/nfsd/filecache.c | 34 ++++++++++++++++++++++++++--------
> fs/nfsd/filecache.h | 3 ++-
> fs/nfsd/localio.c | 4 ++--
> include/linux/nfslocalio.h | 9 ++-------
> 4 files changed, 32 insertions(+), 18 deletions(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 03f01a0beced..957fe57be063 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
> if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
> return;
>
> + if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
> + put_net(nf->nf_net);
> +
> call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
> }
>
> @@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
> }
>
> /**
> - * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
> + * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
> * @pnf: nfsd_file of which to put the reference
> *
> - * First save the associated net to return to caller, then put
> - * the reference of the nfsd_file.
> + * Drops both the nfsd_file reference and the associated nfsd_net
> + * reference. The nfsd_net ref is released before the file ref so
> + * that put_net() inside nfsd_file_free() cannot drop the last net
> + * namespace reference while the caller still needs it.
> */
> -struct net *
> +void
> nfsd_file_put_local(struct nfsd_file __rcu **pnf)
> {
> struct nfsd_file *nf;
> - struct net *net = NULL;
>
> nf = unrcu_pointer(xchg(pnf, NULL));
> if (nf) {
> - net = nf->nf_net;
> + struct net *net = nf->nf_net;
> +
> + rcu_read_lock();
> + nfsd_net_put(net);
> + rcu_read_unlock();
> nfsd_file_put(nf);
> }
> - return net;
> }
>
> /**
> @@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> while (!list_empty(dispose)) {
> struct nfsd_file *nf = list_first_entry(dispose,
> struct nfsd_file, nf_gc);
> - struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> + struct nfsd_net *nn;
> struct svc_serv *serv;
>
> + /*
> + * Pin the net namespace so nf_net stays valid while the
> + * file sits on the per-net dispose list. Callers (the GC
> + * worker, shrinker, and fsnotify callbacks) always run
> + * before nfsd_net_exit(), so nf_net is still live here.
> + * The matching put_net() is in nfsd_file_free().
> + */
> + get_net(nf->nf_net);
> + set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
> +
> + nn = net_generic(nf->nf_net, nfsd_net_id);
> spin_lock(&nn->fcache_dispose_lock);
> list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
> spin_unlock(&nn->fcache_dispose_lock);
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index 683b6437cacc..7ae3c0ea0a2a 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -45,6 +45,7 @@ struct nfsd_file {
> #define NFSD_FILE_REFERENCED (2)
> #define NFSD_FILE_GC (3)
> #define NFSD_FILE_RECENT (4)
> +#define NFSD_FILE_NET_HELD (5)
> unsigned long nf_flags;
> refcount_t nf_ref;
> unsigned char nf_may;
> @@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
> int nfsd_file_cache_start_net(struct net *net);
> void nfsd_file_cache_shutdown_net(struct net *net);
> void nfsd_file_put(struct nfsd_file *nf);
> -struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
> +void nfsd_file_put_local(struct nfsd_file __rcu **nf);
> struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> struct file *nfsd_file_file(struct nfsd_file *nf);
> void nfsd_file_close_inode_sync(struct inode *inode);
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> index c3eb0557b3e1..e3295bae75a4 100644
> --- a/fs/nfsd/localio.c
> +++ b/fs/nfsd/localio.c
> @@ -40,8 +40,8 @@
> * avoid all the NFS overhead with reads, writes and commits.
> *
> * On successful return, returned nfsd_file will have its nf_net member
> - * set. Caller (NFS client) is responsible for calling nfsd_net_put and
> - * nfsd_file_put (via nfs_to_nfsd_file_put_local).
> + * set. Caller (NFS client) is responsible for calling nfsd_file_put
> + * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net
> ref.
> */
> static struct nfsd_file *
> nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index 3d91043254e6..7267a69092d1 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -62,7 +62,7 @@ struct nfsd_localio_operations {
> const struct nfs_fh *,
> struct nfsd_file __rcu **pnf,
> const fmode_t);
> - struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
> + void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
> struct file *(*nfsd_file_file)(struct nfsd_file *);
> void (*nfsd_file_dio_alignment)(struct nfsd_file *,
> u32 *, u32 *, u32 *);
> @@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct
> nfsd_file __rcu **localio)
> * must prevent nfsd shutdown from completing as nfs_close_local_fh()
> * does by blocking the nfs_uuid from being finally put.
> */
> - struct net *net;
> -
> - net = nfs_to->nfsd_file_put_local(localio);
> -
> - if (net)
> - nfs_to_nfsd_net_put(net);
> + nfs_to->nfsd_file_put_local(localio);
> }
>
> #else /* CONFIG_NFS_LOCALIO */
>
> --
> 2.54.0
It seems that all of the LLM reviewers have difficulty with this patch.
This is a consolidated review of the issue from Claude and Codex:
> The reordering in nfsd_file_put_local() -- nfsd_net_put() before
> nfsd_file_put() -- introduces a shutdown race.
>
> The nfsd_net_ref percpu refcount is taken only by LOCALIO
> (nfsd_open_local_fh() and nfs_open_local_fh()). The drain wait in
> nfsd_shutdown_net() (wait_for_completion(&nn->nfsd_net_free_done))
> is what holds off percpu_ref_exit() and nfsd_shutdown_generic() --
> and through the latter, nfsd_file_cache_shutdown(), which runs
> rcu_barrier() and then destroys nfsd_file_slab, nfsd_file_mark_slab,
> the fsnotify groups, and the rhltable.
>
> Per-I/O references are not covered by the nfs_uuid handshake. Each
> pgio call takes its own nfsd_file ref plus a paired nfsd_net ref
> (fs/nfs/pagelist.c, nfs_local_open_fh), stores it in iocb->localio,
> and releases it at completion through nfsd_file_put_local(). An
> iocb is not on nfs_uuid->files, so nfs_localio_invalidate_clients()
> does not wait for it; only the drain wait does. Meanwhile
> __nfsd_file_cache_purge() has already unhashed the nfsd_file but
> cannot free it (the iocb ref keeps refcount elevated in
> nfsd_file_cond_queue()).
>
> So with one I/O in flight when the server is stopped: the shutdown
> thread parks at the drain wait; the I/O completion thread enters
> nfsd_file_put_local() and drops the last nfsd_net ref, which runs
> complete() before nfsd_file_put() has executed. The shutdown thread
> then proceeds through nfsd_file_cache_shutdown() concurrently with
> the final nfsd_file_free(): the call_rcu() is queued after the
> rcu_barrier(), so nfsd_file_slab_free() does kmem_cache_free() into
> a destroyed cache, and nfsd_file_mark_put() runs against a destroyed
> fsnotify group. kmem_cache_destroy() also fires "objects remaining"
> because the nfsd_file is still allocated.
>
> The old ordering was the mechanism that prevented this: the caller
> held its paired nfsd_net ref across nfsd_file_put(), and percpu_ref
> guarantees the release callback runs only after every ref is
> dropped, so global teardown strictly followed the file free and the
> rcu_barrier() flushed its call_rcu().
>
> The hazard the commit message cites for the reorder cannot occur on
> this path: NFSD_FILE_NET_HELD is set only in
> nfsd_file_dispose_list_delayed(), reachable only through
> refcount_dec_if_one() in nfsd_file_lru_cb(), i.e. at refcount == 1.
> A file with an outstanding LOCALIO reference has refcount >= 2, so
> a file whose final put arrives via nfsd_file_put_local() never has
> NET_HELD set and its nfsd_file_free() never calls put_net().
>
> Suggest keeping the void API but restoring the put order:
>
> nf = unrcu_pointer(xchg(pnf, NULL));
> if (nf) {
> struct net *net = nf->nf_net;
>
> nfsd_file_put(nf);
> rcu_read_lock();
> nfsd_net_put(net);
> rcu_read_unlock();
> }
>
> with the kdoc comment and the commit message paragraph about the
> old ordering being unsafe adjusted to match.
--
Chuck Lever