Re: [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files
From: Chuck Lever
Date: Wed Jun 03 2026 - 14:25:27 EST
On Wed, Jun 3, 2026, at 10:50 AM, Jeff Layton wrote:
> On Wed, 2026-06-03 at 10:33 -0700, Chuck Lever wrote:
>>
>> On Tue, Jun 2, 2026, at 9:23 AM, Jeff Layton wrote:
>> > Take a net-namespace reference in nfsd_file_dispose_list_delayed()
>> > (get_net) and release it in nfsd_file_free() (put_net), so that
>> > nf_net is always valid for files that the GC or shrinker has isolated
>> > from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
>> > see.
>> >
>> > Without this, nf_net can dangle for in-flight files whose net namespace
>> > is torn down concurrently, causing a use-after-free when
>> > nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).
>> >
>> > Only files entering the delayed-dispose path need the net reference;
>> > files freed synchronously (non-GC stateful opens, purge, direct put)
>> > are always freed while the net namespace is still alive, so they skip
>> > get_net()/put_net() entirely. A new NFSD_FILE_NET_HELD flag tracks
>> > whether a given nfsd_file holds a net reference.
>> >
>> > Because nfsd_file_free() may now call put_net(nf->nf_net), the old
>> > nfsd_file_put_local() pattern of returning nf->nf_net after
>> > nfsd_file_put() is unsafe -- put_net() could theoretically drop the
>> > last net namespace reference, leaving the returned pointer stale.
>> > Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
>> > itself, before the nfsd_file_put() that may trigger nfsd_file_free().
>> > The function now returns void and the caller no longer needs to handle
>> > the net reference.
>> >
>> > Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
>> > Assisted-by: Claude:claude-opus-4-6
>> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
>> > ---
>> > fs/nfsd/filecache.c | 34 ++++++++++++++++++++++++++--------
>> > fs/nfsd/filecache.h | 3 ++-
>> > fs/nfsd/localio.c | 4 ++--
>> > include/linux/nfslocalio.h | 9 ++-------
>> > 4 files changed, 32 insertions(+), 18 deletions(-)
>> >
>> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> > index 03f01a0beced..957fe57be063 100644
>> > --- a/fs/nfsd/filecache.c
>> > +++ b/fs/nfsd/filecache.c
>> > @@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
>> > if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
>> > return;
>> >
>> > + if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
>> > + put_net(nf->nf_net);
>> > +
>> > call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
>> > }
>> >
>> > @@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
>> > }
>> >
>> > /**
>> > - * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
>> > + * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
>> > * @pnf: nfsd_file of which to put the reference
>> > *
>> > - * First save the associated net to return to caller, then put
>> > - * the reference of the nfsd_file.
>> > + * Drops both the nfsd_file reference and the associated nfsd_net
>> > + * reference. The nfsd_net ref is released before the file ref so
>> > + * that put_net() inside nfsd_file_free() cannot drop the last net
>> > + * namespace reference while the caller still needs it.
>> > */
>> > -struct net *
>> > +void
>> > nfsd_file_put_local(struct nfsd_file __rcu **pnf)
>> > {
>> > struct nfsd_file *nf;
>> > - struct net *net = NULL;
>> >
>> > nf = unrcu_pointer(xchg(pnf, NULL));
>> > if (nf) {
>> > - net = nf->nf_net;
>> > + struct net *net = nf->nf_net;
>> > +
>> > + rcu_read_lock();
>> > + nfsd_net_put(net);
>> > + rcu_read_unlock();
>> > nfsd_file_put(nf);
>> > }
>> > - return net;
>> > }
>> >
>> > /**
>> > @@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
>> > while (!list_empty(dispose)) {
>> > struct nfsd_file *nf = list_first_entry(dispose,
>> > struct nfsd_file, nf_gc);
>> > - struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
>> > + struct nfsd_net *nn;
>> > struct svc_serv *serv;
>> >
>> > + /*
>> > + * Pin the net namespace so nf_net stays valid while the
>> > + * file sits on the per-net dispose list. Callers (the GC
>> > + * worker, shrinker, and fsnotify callbacks) always run
>> > + * before nfsd_net_exit(), so nf_net is still live here.
>> > + * The matching put_net() is in nfsd_file_free().
>> > + */
>> > + get_net(nf->nf_net);
>> > + set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
>> > +
>> > + nn = net_generic(nf->nf_net, nfsd_net_id);
>> > spin_lock(&nn->fcache_dispose_lock);
>> > list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
>> > spin_unlock(&nn->fcache_dispose_lock);
>> > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
>> > index 683b6437cacc..7ae3c0ea0a2a 100644
>> > --- a/fs/nfsd/filecache.h
>> > +++ b/fs/nfsd/filecache.h
>> > @@ -45,6 +45,7 @@ struct nfsd_file {
>> > #define NFSD_FILE_REFERENCED (2)
>> > #define NFSD_FILE_GC (3)
>> > #define NFSD_FILE_RECENT (4)
>> > +#define NFSD_FILE_NET_HELD (5)
>> > unsigned long nf_flags;
>> > refcount_t nf_ref;
>> > unsigned char nf_may;
>> > @@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
>> > int nfsd_file_cache_start_net(struct net *net);
>> > void nfsd_file_cache_shutdown_net(struct net *net);
>> > void nfsd_file_put(struct nfsd_file *nf);
>> > -struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
>> > +void nfsd_file_put_local(struct nfsd_file __rcu **nf);
>> > struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
>> > struct file *nfsd_file_file(struct nfsd_file *nf);
>> > void nfsd_file_close_inode_sync(struct inode *inode);
>> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
>> > index c3eb0557b3e1..e3295bae75a4 100644
>> > --- a/fs/nfsd/localio.c
>> > +++ b/fs/nfsd/localio.c
>> > @@ -40,8 +40,8 @@
>> > * avoid all the NFS overhead with reads, writes and commits.
>> > *
>> > * On successful return, returned nfsd_file will have its nf_net member
>> > - * set. Caller (NFS client) is responsible for calling nfsd_net_put and
>> > - * nfsd_file_put (via nfs_to_nfsd_file_put_local).
>> > + * set. Caller (NFS client) is responsible for calling nfsd_file_put
>> > + * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net
>> > ref.
>> > */
>> > static struct nfsd_file *
>> > nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
>> > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
>> > index 3d91043254e6..7267a69092d1 100644
>> > --- a/include/linux/nfslocalio.h
>> > +++ b/include/linux/nfslocalio.h
>> > @@ -62,7 +62,7 @@ struct nfsd_localio_operations {
>> > const struct nfs_fh *,
>> > struct nfsd_file __rcu **pnf,
>> > const fmode_t);
>> > - struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
>> > + void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
>> > struct file *(*nfsd_file_file)(struct nfsd_file *);
>> > void (*nfsd_file_dio_alignment)(struct nfsd_file *,
>> > u32 *, u32 *, u32 *);
>> > @@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct
>> > nfsd_file __rcu **localio)
>> > * must prevent nfsd shutdown from completing as nfs_close_local_fh()
>> > * does by blocking the nfs_uuid from being finally put.
>> > */
>> > - struct net *net;
>> > -
>> > - net = nfs_to->nfsd_file_put_local(localio);
>> > -
>> > - if (net)
>> > - nfs_to_nfsd_net_put(net);
>> > + nfs_to->nfsd_file_put_local(localio);
>> > }
>> >
>> > #else /* CONFIG_NFS_LOCALIO */
>> >
>> > --
>> > 2.54.0
>>
>> It seems that all of the LLM reviewers have difficulty with this patch.
>> This is a consolidated review of the issue from Claude and Codex:
>>
>> > The reordering in nfsd_file_put_local() -- nfsd_net_put() before
>> > nfsd_file_put() -- introduces a shutdown race.
>> >
>> > The nfsd_net_ref percpu refcount is taken only by LOCALIO
>> > (nfsd_open_local_fh() and nfs_open_local_fh()). The drain wait in
>> > nfsd_shutdown_net() (wait_for_completion(&nn->nfsd_net_free_done))
>> > is what holds off percpu_ref_exit() and nfsd_shutdown_generic() --
>> > and through the latter, nfsd_file_cache_shutdown(), which runs
>> > rcu_barrier() and then destroys nfsd_file_slab, nfsd_file_mark_slab,
>> > the fsnotify groups, and the rhltable.
>> >
>> > Per-I/O references are not covered by the nfs_uuid handshake. Each
>> > pgio call takes its own nfsd_file ref plus a paired nfsd_net ref
>> > (fs/nfs/pagelist.c, nfs_local_open_fh), stores it in iocb->localio,
>> > and releases it at completion through nfsd_file_put_local(). An
>> > iocb is not on nfs_uuid->files, so nfs_localio_invalidate_clients()
>> > does not wait for it; only the drain wait does. Meanwhile
>> > __nfsd_file_cache_purge() has already unhashed the nfsd_file but
>> > cannot free it (the iocb ref keeps refcount elevated in
>> > nfsd_file_cond_queue()).
>> >
>> > So with one I/O in flight when the server is stopped: the shutdown
>> > thread parks at the drain wait; the I/O completion thread enters
>> > nfsd_file_put_local() and drops the last nfsd_net ref, which runs
>> > complete() before nfsd_file_put() has executed. The shutdown thread
>> > then proceeds through nfsd_file_cache_shutdown() concurrently with
>> > the final nfsd_file_free(): the call_rcu() is queued after the
>> > rcu_barrier(), so nfsd_file_slab_free() does kmem_cache_free() into
>> > a destroyed cache, and nfsd_file_mark_put() runs against a destroyed
>> > fsnotify group. kmem_cache_destroy() also fires "objects remaining"
>> > because the nfsd_file is still allocated.
>> >
>> > The old ordering was the mechanism that prevented this: the caller
>> > held its paired nfsd_net ref across nfsd_file_put(), and percpu_ref
>> > guarantees the release callback runs only after every ref is
>> > dropped, so global teardown strictly followed the file free and the
>> > rcu_barrier() flushed its call_rcu().
>> >
>> > The hazard the commit message cites for the reorder cannot occur on
>> > this path: NFSD_FILE_NET_HELD is set only in
>> > nfsd_file_dispose_list_delayed(), reachable only through
>> > refcount_dec_if_one() in nfsd_file_lru_cb(), i.e. at refcount == 1.
>> > A file with an outstanding LOCALIO reference has refcount >= 2, so
>> > a file whose final put arrives via nfsd_file_put_local() never has
>> > NET_HELD set and its nfsd_file_free() never calls put_net().
>> >
>> > Suggest keeping the void API but restoring the put order:
>> >
>> > nf = unrcu_pointer(xchg(pnf, NULL));
>> > if (nf) {
>> > struct net *net = nf->nf_net;
>> >
>> > nfsd_file_put(nf);
>> > rcu_read_lock();
>> > nfsd_net_put(net);
>> > rcu_read_unlock();
>> > }
>> >
>> > with the kdoc comment and the commit message paragraph about the
>> > old ordering being unsafe adjusted to match.
>>
>
>
> I had claude review this and it says:
>
> ● This is the same concern I just addressed for the previous patch's
> Finding 3, restated as a
> critical bug. The answer is the same: this is a false positive.
>
> The reviewer's scenario requires:
>
> 1. The global shrinker unhashes and isolates an nfsd_file onto a
> local dispose list
> 2. The net namespace teardown completes and struct net is freed
> 3. The shrinker resumes and calls get_net() on freed memory
That rebuttal answers a different finding. The three-step scenario it
refutes is the one sashiko raised against the get_net() placement in
nfsd_file_dispose_list_delayed(). The review quoted above is about
the put order in nfsd_file_put_local() and does not involve struct
net's refcount at any step. Two new issues appear with 8/9:
1. Put order in nfsd_file_put_local()
The question is narrow: after an I/O completion thread executes
nfsd_net_put() -- dropping the last nfsd_net_ref and running
complete(&nn->nfsd_net_free_done) -- what prevents nfsd_shutdown_net()
from continuing through percpu_ref_exit() and nfsd_shutdown_generic()
into nfsd_file_cache_shutdown() before that same thread executes the
nfsd_file_put() on the next line?
In the current code the answer is the ref itself: the caller holds it
across nfsd_file_put(), and percpu_ref runs the release callback only
after every ref drops, so global teardown strictly follows the file
free and the rcu_barrier() in nfsd_file_cache_shutdown() flushes the
call_rcu() that nfsd_file_free() queued. The patch removes that
ordering and installs nothing in its place.
The nfs_uuid handshake does not cover this path: each pgio holds its
own nfsd_file + nfsd_net ref pair (fs/nfs/pagelist.c, stored in
iocb->localio), released at I/O completion through
nfsd_file_put_local(), and an iocb is not on nfs_uuid->files. The
purge has already unhashed the file but cannot free it
(nfsd_file_cond_queue() sees the elevated refcount), so the
completion thread's put is the final one and its nfsd_file_free()
races kmem_cache_destroy(nfsd_file_slab).
The stale-net hazard the commit message cites cannot occur on this
path: NFSD_FILE_NET_HELD is set only via refcount_dec_if_one() in
nfsd_file_lru_cb(), i.e. at refcount == 1, and a file with an
outstanding LOCALIO reference has refcount >= 2. So the fix is to
keep the void API but put the file ref first, then the net ref.
2. get_net() placement in nfsd_file_dispose_list_delayed()
The rebuttal's struct-net argument does not hold for the files in
question: they are no longer on the LRU (nfsd_file_lru_cb() unhashed
and isolated them onto the worker's private list), and an nfsd_file
holds no net reference -- that absence is what this patch is fixing.
The commit message itself states the premise: the purge cannot see
these files and nf_net can dangle. With a second nfsd-serving netns
keeping nfsd_users > 0, nothing quiesces the shrinker or the
laundrette during per-net teardown (cancel_delayed_work_sync() and
shrinker_free() run only in global shutdown).
A worker preempted between isolating a file and calling
dispose_list_delayed() can resume after cleanup_net() has freed the
namespace, at which point get_net(nf->nf_net), net_generic(), and
the fcache_dispose_lock all touch freed memory. The get_net() sits
at the consuming end of the window it is meant to close. v1 took
the reference in nfsd_file_alloc(), the one place guaranteed to run
while the net is alive; the v2 plan of taking it at alloc time for
GC-capable files only would address Al's cacheline concern without
reopening the window.
As a process note, I'll note that I haven't found any non-pre-
existing issues with the other patches on this series.
--
Chuck Lever