Re: [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files

From: Jeff Layton

Date: Wed Jun 03 2026 - 15:17:18 EST


On Wed, 2026-06-03 at 11:20 -0700, Chuck Lever wrote:
>
> On Wed, Jun 3, 2026, at 10:50 AM, Jeff Layton wrote:
> > On Wed, 2026-06-03 at 10:33 -0700, Chuck Lever wrote:
> > >
> > > On Tue, Jun 2, 2026, at 9:23 AM, Jeff Layton wrote:
> > > > Take a net-namespace reference in nfsd_file_dispose_list_delayed()
> > > > (get_net) and release it in nfsd_file_free() (put_net), so that
> > > > nf_net is always valid for files that the GC or shrinker has isolated
> > > > from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
> > > > see.
> > > >
> > > > Without this, nf_net can dangle for in-flight files whose net namespace
> > > > is torn down concurrently, causing a use-after-free when
> > > > nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).
> > > >
> > > > Only files entering the delayed-dispose path need the net reference;
> > > > files freed synchronously (non-GC stateful opens, purge, direct put)
> > > > are always freed while the net namespace is still alive, so they skip
> > > > get_net()/put_net() entirely. A new NFSD_FILE_NET_HELD flag tracks
> > > > whether a given nfsd_file holds a net reference.
> > > >
> > > > Because nfsd_file_free() may now call put_net(nf->nf_net), the old
> > > > nfsd_file_put_local() pattern of returning nf->nf_net after
> > > > nfsd_file_put() is unsafe -- put_net() could theoretically drop the
> > > > last net namespace reference, leaving the returned pointer stale.
> > > > Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
> > > > itself, before the nfsd_file_put() that may trigger nfsd_file_free().
> > > > The function now returns void and the caller no longer needs to handle
> > > > the net reference.
> > > >
> > > > Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
> > > > Assisted-by: Claude:claude-opus-4-6
> > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > ---
> > > > fs/nfsd/filecache.c | 34 ++++++++++++++++++++++++++--------
> > > > fs/nfsd/filecache.h | 3 ++-
> > > > fs/nfsd/localio.c | 4 ++--
> > > > include/linux/nfslocalio.h | 9 ++-------
> > > > 4 files changed, 32 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index 03f01a0beced..957fe57be063 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
> > > > if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
> > > > return;
> > > >
> > > > + if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
> > > > + put_net(nf->nf_net);
> > > > +
> > > > call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
> > > > }
> > > >
> > > > @@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
> > > > }
> > > >
> > > > /**
> > > > - * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
> > > > + * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
> > > > * @pnf: nfsd_file of which to put the reference
> > > > *
> > > > - * First save the associated net to return to caller, then put
> > > > - * the reference of the nfsd_file.
> > > > + * Drops both the nfsd_file reference and the associated nfsd_net
> > > > + * reference. The nfsd_net ref is released before the file ref so
> > > > + * that put_net() inside nfsd_file_free() cannot drop the last net
> > > > + * namespace reference while the caller still needs it.
> > > > */
> > > > -struct net *
> > > > +void
> > > > nfsd_file_put_local(struct nfsd_file __rcu **pnf)
> > > > {
> > > > struct nfsd_file *nf;
> > > > - struct net *net = NULL;
> > > >
> > > > nf = unrcu_pointer(xchg(pnf, NULL));
> > > > if (nf) {
> > > > - net = nf->nf_net;
> > > > + struct net *net = nf->nf_net;
> > > > +
> > > > + rcu_read_lock();
> > > > + nfsd_net_put(net);
> > > > + rcu_read_unlock();
> > > > nfsd_file_put(nf);
> > > > }
> > > > - return net;
> > > > }
> > > >
> > > > /**
> > > > @@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> > > > while (!list_empty(dispose)) {
> > > > struct nfsd_file *nf = list_first_entry(dispose,
> > > > struct nfsd_file, nf_gc);
> > > > - struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> > > > + struct nfsd_net *nn;
> > > > struct svc_serv *serv;
> > > >
> > > > + /*
> > > > + * Pin the net namespace so nf_net stays valid while the
> > > > + * file sits on the per-net dispose list. Callers (the GC
> > > > + * worker, shrinker, and fsnotify callbacks) always run
> > > > + * before nfsd_net_exit(), so nf_net is still live here.
> > > > + * The matching put_net() is in nfsd_file_free().
> > > > + */
> > > > + get_net(nf->nf_net);
> > > > + set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
> > > > +
> > > > + nn = net_generic(nf->nf_net, nfsd_net_id);
> > > > spin_lock(&nn->fcache_dispose_lock);
> > > > list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
> > > > spin_unlock(&nn->fcache_dispose_lock);
> > > > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > > > index 683b6437cacc..7ae3c0ea0a2a 100644
> > > > --- a/fs/nfsd/filecache.h
> > > > +++ b/fs/nfsd/filecache.h
> > > > @@ -45,6 +45,7 @@ struct nfsd_file {
> > > > #define NFSD_FILE_REFERENCED (2)
> > > > #define NFSD_FILE_GC (3)
> > > > #define NFSD_FILE_RECENT (4)
> > > > +#define NFSD_FILE_NET_HELD (5)
> > > > unsigned long nf_flags;
> > > > refcount_t nf_ref;
> > > > unsigned char nf_may;
> > > > @@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
> > > > int nfsd_file_cache_start_net(struct net *net);
> > > > void nfsd_file_cache_shutdown_net(struct net *net);
> > > > void nfsd_file_put(struct nfsd_file *nf);
> > > > -struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
> > > > +void nfsd_file_put_local(struct nfsd_file __rcu **nf);
> > > > struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> > > > struct file *nfsd_file_file(struct nfsd_file *nf);
> > > > void nfsd_file_close_inode_sync(struct inode *inode);
> > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > index c3eb0557b3e1..e3295bae75a4 100644
> > > > --- a/fs/nfsd/localio.c
> > > > +++ b/fs/nfsd/localio.c
> > > > @@ -40,8 +40,8 @@
> > > > * avoid all the NFS overhead with reads, writes and commits.
> > > > *
> > > > * On successful return, returned nfsd_file will have its nf_net member
> > > > - * set. Caller (NFS client) is responsible for calling nfsd_net_put and
> > > > - * nfsd_file_put (via nfs_to_nfsd_file_put_local).
> > > > + * set. Caller (NFS client) is responsible for calling nfsd_file_put
> > > > + * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net
> > > > ref.
> > > > */
> > > > static struct nfsd_file *
> > > > nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
> > > > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> > > > index 3d91043254e6..7267a69092d1 100644
> > > > --- a/include/linux/nfslocalio.h
> > > > +++ b/include/linux/nfslocalio.h
> > > > @@ -62,7 +62,7 @@ struct nfsd_localio_operations {
> > > > const struct nfs_fh *,
> > > > struct nfsd_file __rcu **pnf,
> > > > const fmode_t);
> > > > - struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
> > > > + void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
> > > > struct file *(*nfsd_file_file)(struct nfsd_file *);
> > > > void (*nfsd_file_dio_alignment)(struct nfsd_file *,
> > > > u32 *, u32 *, u32 *);
> > > > @@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct
> > > > nfsd_file __rcu **localio)
> > > > * must prevent nfsd shutdown from completing as nfs_close_local_fh()
> > > > * does by blocking the nfs_uuid from being finally put.
> > > > */
> > > > - struct net *net;
> > > > -
> > > > - net = nfs_to->nfsd_file_put_local(localio);
> > > > -
> > > > - if (net)
> > > > - nfs_to_nfsd_net_put(net);
> > > > + nfs_to->nfsd_file_put_local(localio);
> > > > }
> > > >
> > > > #else /* CONFIG_NFS_LOCALIO */
> > > >
> > > > --
> > > > 2.54.0
> > >
> > > It seems that all of the LLM reviewers have difficulty with this patch.
> > > This is a consolidated review of the issue from Claude and Codex:
> > >
> > > > The reordering in nfsd_file_put_local() -- nfsd_net_put() before
> > > > nfsd_file_put() -- introduces a shutdown race.
> > > >
> > > > The nfsd_net_ref percpu refcount is taken only by LOCALIO
> > > > (nfsd_open_local_fh() and nfs_open_local_fh()). The drain wait in
> > > > nfsd_shutdown_net() (wait_for_completion(&nn->nfsd_net_free_done))
> > > > is what holds off percpu_ref_exit() and nfsd_shutdown_generic() --
> > > > and through the latter, nfsd_file_cache_shutdown(), which runs
> > > > rcu_barrier() and then destroys nfsd_file_slab, nfsd_file_mark_slab,
> > > > the fsnotify groups, and the rhltable.
> > > >
> > > > Per-I/O references are not covered by the nfs_uuid handshake. Each
> > > > pgio call takes its own nfsd_file ref plus a paired nfsd_net ref
> > > > (fs/nfs/pagelist.c, nfs_local_open_fh), stores it in iocb->localio,
> > > > and releases it at completion through nfsd_file_put_local(). An
> > > > iocb is not on nfs_uuid->files, so nfs_localio_invalidate_clients()
> > > > does not wait for it; only the drain wait does. Meanwhile
> > > > __nfsd_file_cache_purge() has already unhashed the nfsd_file but
> > > > cannot free it (the iocb ref keeps refcount elevated in
> > > > nfsd_file_cond_queue()).
> > > >
> > > > So with one I/O in flight when the server is stopped: the shutdown
> > > > thread parks at the drain wait; the I/O completion thread enters
> > > > nfsd_file_put_local() and drops the last nfsd_net ref, which runs
> > > > complete() before nfsd_file_put() has executed. The shutdown thread
> > > > then proceeds through nfsd_file_cache_shutdown() concurrently with
> > > > the final nfsd_file_free(): the call_rcu() is queued after the
> > > > rcu_barrier(), so nfsd_file_slab_free() does kmem_cache_free() into
> > > > a destroyed cache, and nfsd_file_mark_put() runs against a destroyed
> > > > fsnotify group. kmem_cache_destroy() also fires "objects remaining"
> > > > because the nfsd_file is still allocated.
> > > >
> > > > The old ordering was the mechanism that prevented this: the caller
> > > > held its paired nfsd_net ref across nfsd_file_put(), and percpu_ref
> > > > guarantees the release callback runs only after every ref is
> > > > dropped, so global teardown strictly followed the file free and the
> > > > rcu_barrier() flushed its call_rcu().
> > > >
> > > > The hazard the commit message cites for the reorder cannot occur on
> > > > this path: NFSD_FILE_NET_HELD is set only in
> > > > nfsd_file_dispose_list_delayed(), reachable only through
> > > > refcount_dec_if_one() in nfsd_file_lru_cb(), i.e. at refcount == 1.
> > > > A file with an outstanding LOCALIO reference has refcount >= 2, so
> > > > a file whose final put arrives via nfsd_file_put_local() never has
> > > > NET_HELD set and its nfsd_file_free() never calls put_net().
> > > >
> > > > Suggest keeping the void API but restoring the put order:
> > > >
> > > > nf = unrcu_pointer(xchg(pnf, NULL));
> > > > if (nf) {
> > > > struct net *net = nf->nf_net;
> > > >
> > > > nfsd_file_put(nf);
> > > > rcu_read_lock();
> > > > nfsd_net_put(net);
> > > > rcu_read_unlock();
> > > > }
> > > >
> > > > with the kdoc comment and the commit message paragraph about the
> > > > old ordering being unsafe adjusted to match.
> > >
> >
> >
> > I had claude review this and it says:
> >
> > ● This is the same concern I just addressed for the previous patch's
> > Finding 3, restated as a
> > critical bug. The answer is the same: this is a false positive.
> >
> > The reviewer's scenario requires:
> >
> > 1. The global shrinker unhashes and isolates an nfsd_file onto a
> > local dispose list
> > 2. The net namespace teardown completes and struct net is freed
> > 3. The shrinker resumes and calls get_net() on freed memory
>
> That rebuttal answers a different finding. The three-step scenario it
> refutes is the one sashiko raised against the get_net() placement in
> nfsd_file_dispose_list_delayed(). The review quoted above is about
> the put order in nfsd_file_put_local() and does not involve struct
> net's refcount at any step. Two new issues appear with 8/9:
>
> 1. Put order in nfsd_file_put_local()
>
> The question is narrow: after an I/O completion thread executes
> nfsd_net_put() -- dropping the last nfsd_net_ref and running
> complete(&nn->nfsd_net_free_done) -- what prevents nfsd_shutdown_net()
> from continuing through percpu_ref_exit() and nfsd_shutdown_generic()
> into nfsd_file_cache_shutdown() before that same thread executes the
> nfsd_file_put() on the next line?
>
> In the current code the answer is the ref itself: the caller holds it
> across nfsd_file_put(), and percpu_ref runs the release callback only
> after every ref drops, so global teardown strictly follows the file
> free and the rcu_barrier() in nfsd_file_cache_shutdown() flushes the
> call_rcu() that nfsd_file_free() queued. The patch removes that
> ordering and installs nothing in its place.
>
> The nfs_uuid handshake does not cover this path: each pgio holds its
> own nfsd_file + nfsd_net ref pair (fs/nfs/pagelist.c, stored in
> iocb->localio), released at I/O completion through
> nfsd_file_put_local(), and an iocb is not on nfs_uuid->files. The
> purge has already unhashed the file but cannot free it
> (nfsd_file_cond_queue() sees the elevated refcount), so the
> completion thread's put is the final one and its nfsd_file_free()
> races kmem_cache_destroy(nfsd_file_slab).
>
> The stale-net hazard the commit message cites cannot occur on this
> path: NFSD_FILE_NET_HELD is set only via refcount_dec_if_one() in
> nfsd_file_lru_cb(), i.e. at refcount == 1, and a file with an
> outstanding LOCALIO reference has refcount >= 2. So the fix is to
> keep the void API but put the file ref first, then the net ref.
>
> 2. get_net() placement in nfsd_file_dispose_list_delayed()
>
> The rebuttal's struct-net argument does not hold for the files in
> question: they are no longer on the LRU (nfsd_file_lru_cb() unhashed
> and isolated them onto the worker's private list), and an nfsd_file
> holds no net reference -- that absence is what this patch is fixing.
>
> The commit message itself states the premise: the purge cannot see
> these files and nf_net can dangle. With a second nfsd-serving netns
> keeping nfsd_users > 0, nothing quiesces the shrinker or the
> laundrette during per-net teardown (cancel_delayed_work_sync() and
> shrinker_free() run only in global shutdown).
>
> A worker preempted between isolating a file and calling
> dispose_list_delayed() can resume after cleanup_net() has freed the
> namespace, at which point get_net(nf->nf_net), net_generic(), and
> the fcache_dispose_lock all touch freed memory. The get_net() sits
> at the consuming end of the window it is meant to close. v1 took
> the reference in nfsd_file_alloc(), the one place guaranteed to run
> while the net is alive; the v2 plan of taking it at alloc time for
> GC-capable files only would address Al's cacheline concern without
> reopening the window.
>
>
> As a process note, I'll note that I haven't found any non-pre-
> existing issues with the other patches on this series.
>

Ok, I'm convinced, and the fix looks fairly straightforward.

If you're ok with merging the rest of the set, I'll plan to resend this
one separately.

Thanks,
--
Jeff Layton <jlayton@xxxxxxxxxx>