[PATCH 03/10] nfsd: serialize nfsd4_end_grace() with atomic test-and-set
From: Jeff Layton
Date: Thu May 28 2026 - 18:01:02 EST
From: Chris Mason <clm@xxxxxxxx>
nfsd4_end_grace() guards its drain path with a plain bool:
if (nn->grace_ended)
return;
nn->grace_ended = true;
The read and the write are independent, and nothing in struct
nfsd_net serializes them. At least two contexts can reach this
code with no lock held:
laundromat path
laundry_wq kworker
nfs4_laundromat()
nfsd4_end_grace()
RECLAIM_COMPLETE path
nfsd compound kthread
nfsd4_reclaim_complete()
inc_reclaim_complete()
nfsd4_end_grace()
Both callers can observe grace_ended == false on different CPUs,
both store true, and both proceed into nfsd4_record_grace_done(),
which invokes the active client_tracking_ops->grace_done callback.
For tracking ops that drain reclaim_str_hashtbl (legacy_tracking_ops
via nfsd4_recdir_purge_old, and the cld v1+ ops via
nfsd4_cld_grace_done), grace_done calls nfs4_release_reclaim(),
which walks every bucket of reclaim_str_hashtbl with no lock and
calls nfs4_remove_reclaim_record() (list_del + kfree) on each
entry. Two concurrent walkers corrupt the list and double-free
every nfs4_client_reclaim. A concurrent nfsd4_find_reclaim_client()
iterating the same bucket reads through freed memory.
A third call site exists in nfs4_state_start_net() on the
skip_grace startup path, but it runs under nfsd_mutex before any
client has connected and before the laundromat's first delayed
work fires, so it cannot race with the two callers above.
Fix by replacing the read/write pair with try_cmpxchg() so exactly
one caller transitions grace_ended from false to true and proceeds
into the drain; the loser returns immediately. bool supports
1-byte cmpxchg on all supported architectures, and no lock
ordering changes are needed.
Fixes: 362063a595be ("nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld")
Assisted-by: kres:claude-opus-4-7
Signed-off-by: Chris Mason <clm@xxxxxxxx>
---
fs/nfsd/nfs4state.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f4d12dbcf97b..dc4ac541436f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7022,12 +7022,23 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
static void
nfsd4_end_grace(struct nfsd_net *nn)
{
- /* do nothing if grace period already ended */
- if (nn->grace_ended)
+ bool expected = false;
+
+ /*
+ * nfsd4_end_grace() can be entered concurrently from the
+ * laundromat workqueue and from an nfsd compound thread
+ * handling RECLAIM_COMPLETE. Without serialization, both
+ * callers can observe grace_ended==false and proceed into
+ * nfsd4_record_grace_done(). For tracking ops whose
+ * grace_done drains reclaim_str_hashtbl, that results in
+ * list corruption and a double free of every
+ * nfs4_client_reclaim entry. Use an atomic test-and-set so
+ * exactly one caller proceeds.
+ */
+ if (!try_cmpxchg(&nn->grace_ended, &expected, true))
return;
trace_nfsd_grace_complete(nn);
- nn->grace_ended = true;
/*
* If the server goes down again right now, an NFSv4
* client will still be allowed to reclaim after it comes back up,
--
2.54.0