Re: [PATCH 03/10] nfsd: serialize nfsd4_end_grace() with atomic test-and-set

From: Chuck Lever

Date: Fri May 29 2026 - 12:29:30 EST


On 5/29/26 11:57 AM, Jeff Layton wrote:
> On Fri, 2026-05-29 at 11:38 -0400, Chuck Lever wrote:
>>
>> On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote:
>>> From: Chris Mason <clm@xxxxxxxx>
>>>
>>> nfsd4_end_grace() guards its drain path with a plain bool:
>>>
>>> 	if (nn->grace_ended)
>>> 		return;
>>> 	nn->grace_ended = true;
>>>
>>> The read and the write are independent, and nothing in struct
>>> nfsd_net serializes them. At least two contexts can reach this
>>> code with no lock held:
>>>
>>>   laundromat path
>>>     laundry_wq kworker
>>>       nfs4_laundromat()
>>>         nfsd4_end_grace()
>>>
>>>   RECLAIM_COMPLETE path
>>>     nfsd compound kthread
>>>       nfsd4_reclaim_complete()
>>>         inc_reclaim_complete()
>>>           nfsd4_end_grace()
>>>
>>> Both callers can observe grace_ended == false on different CPUs,
>>> both store true, and both proceed into nfsd4_record_grace_done(),
>>> which invokes the active client_tracking_ops->grace_done callback.
>>> For tracking ops that drain reclaim_str_hashtbl (legacy_tracking_ops
>>> via nfsd4_recdir_purge_old, and the cld v1+ ops via
>>> nfsd4_cld_grace_done), grace_done calls nfs4_release_reclaim(),
>>> which walks every bucket of reclaim_str_hashtbl with no lock and
>>> calls nfs4_remove_reclaim_record() (list_del + kfree) on each
>>> entry. Two concurrent walkers corrupt the list and double-free
>>> every nfs4_client_reclaim. A concurrent nfsd4_find_reclaim_client()
>>> iterating the same bucket reads through freed memory.
>>>
>>> A third call site exists in nfs4_state_start_net() on the
>>> skip_grace startup path, but it runs under nfsd_mutex before any
>>> client has connected and before the laundromat's first delayed
>>> work fires, so it cannot race with the two callers above.
>>>
>>> Fix by replacing the read/write pair with try_cmpxchg() so exactly
>>> one caller transitions grace_ended from false to true and proceeds
>>> into the drain; the loser returns immediately. bool supports
>>> 1-byte cmpxchg on all supported architectures, and no lock
>>> ordering changes are needed.
>>>
>>> Fixes: 362063a595be ("nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld")
>>> Assisted-by: kres:claude-opus-4-7
>>> Signed-off-by: Chris Mason <clm@xxxxxxxx>
>>> ---
>>> fs/nfsd/nfs4state.c | 17 ++++++++++++++---
>>> 1 file changed, 14 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index f4d12dbcf97b..dc4ac541436f 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -7022,12 +7022,23 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> static void
>>> nfsd4_end_grace(struct nfsd_net *nn)
>>> {
>>> - /* do nothing if grace period already ended */
>>> - if (nn->grace_ended)
>>> + bool expected = false;
>>> +
>>> + /*
>>> + * nfsd4_end_grace() can be entered concurrently from the
>>> + * laundromat workqueue and from an nfsd compound thread
>>> + * handling RECLAIM_COMPLETE. Without serialization, both
>>> + * callers can observe grace_ended==false and proceed into
>>> + * nfsd4_record_grace_done(). For tracking ops whose
>>> + * grace_done drains reclaim_str_hashtbl, that results in
>>> + * list corruption and a double free of every
>>> + * nfs4_client_reclaim entry. Use an atomic test-and-set so
>>> + * exactly one caller proceeds.
>>> + */
>>> + if (!try_cmpxchg(&nn->grace_ended, &expected, true))
>>> return;
>>>
>>> trace_nfsd_grace_complete(nn);
>>> - nn->grace_ended = true;
>>> /*
>>> * If the server goes down again right now, an NFSv4
>>> * client will still be allowed to reclaim after it comes back up,
>>>
>>> --
>>> 2.54.0
>>
>> Seems like the usual idiom for something like this is an atomic
>> bit op. Perhaps try_cmpxchg on a boolean variable is not going
>> to behave as you expect on every hardware platform.
>
> We just need a single flag here though. try_cmpxchg() had better work
> the same way on every platform or a lot of stuff is FUBAR. Where
> wouldn't it?

Codex suggests that on Hexagon, cmpxchg accesses more than just that
boolean.
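
To make the comparison concrete, here is a rough userspace C11 sketch of
the two idioms under discussion. It is not kernel code: struct grace_state,
GRACE_ENDED_BIT, and the drains counter are hypothetical stand-ins, and
atomic_fetch_or() on a whole word only approximates what a kernel bit op
such as test_and_set_bit() would do on an unsigned long flags field.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct nfsd_net. */
struct grace_state {
	atomic_bool grace_ended;	/* flag used by the cmpxchg variant */
	atomic_uint flags;		/* word-sized flags for the bit-op variant */
	int drains;			/* counts how often the drain path ran */
};

#define GRACE_ENDED_BIT	0u	/* hypothetical bit number */

/* cmpxchg style: only the caller that flips false -> true proceeds. */
static bool end_grace_cmpxchg(struct grace_state *gs)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&gs->grace_ended, &expected, true))
		return false;	/* someone else already ended grace */

	gs->drains++;		/* stands in for the drain work */
	return true;
}

/*
 * Bit-op style: atomically set the bit in a full word and look at the
 * previous value. Only the caller that saw the bit clear proceeds.
 */
static bool end_grace_bitop(struct grace_state *gs)
{
	unsigned int mask = 1u << GRACE_ENDED_BIT;
	unsigned int old = atomic_fetch_or(&gs->flags, mask);

	if (old & mask)
		return false;	/* bit was already set */

	gs->drains++;		/* stands in for the drain work */
	return true;
}
```

Both variants let exactly one caller through. The difference is that the
bit-op form always operates on a naturally aligned word, so it does not
depend on how an architecture implements a 1-byte cmpxchg.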


--
Chuck Lever