Re: [PATCH 04/10] nfsd: dedup nfs4_client_to_reclaim inserts
From: Chuck Lever
Date: Fri May 29 2026 - 12:23:33 EST
On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote:
> From: Chris Mason <clm@xxxxxxxx>
>
> nfs4_client_to_reclaim() unconditionally allocates a new
> nfs4_client_reclaim, prepends it to reclaim_str_hashtbl[], and bumps
> reclaim_str_hashtbl_size with no check for an existing entry for the
> same client name. After a reboot with a populated recovery directory
> that inflates the counter by one for every client that reclaims:
>
> boot: load_recdir()
> nfs4_client_to_reclaim(name) /* entry #1, size++ */
>
> grace: RECLAIM_COMPLETE
> __nfsd4_create_reclaim_record_grace()
> nfs4_client_to_reclaim(name) /* entry #2, size++ */
>
> inc_reclaim_complete() ends the grace period early only when
>
> atomic_inc_return(&nn->nr_reclaim_complete) ==
> nn->reclaim_str_hashtbl_size
>
> With reclaim_str_hashtbl_size at 2N and nr_reclaim_complete capped at
> N, the equality never holds and the fast end-of-grace path is dead.
> The grace period always runs out the full 90-second laundromat timer,
> and the shadow entry left in the hash table carries a dangling cr_clp
> for any reader that walks it.
>
> Fix nfs4_client_to_reclaim() to compute strhashval first, look the
> name up with nfsd4_find_reclaim_client(), and on a hit fold the new
> princhash into the existing record (if it lacks one) and return that
> record without allocating or touching reclaim_str_hashtbl_size. On
> kmemdup() failure during the fold-in, return NULL so
> __cld_pipe_inprogress_downcall() surfaces -EFAULT to nfsdcld, matching
> the miss-path contract.
>
> Because the fold-in writes cr_princhash.data and cr_princhash.len on
> a record that is already linked into reclaim_str_hashtbl[], pair the
> two stores with smp_store_release() on .len after WRITE_ONCE() on
> .data, and have nfsd4_cld_check_v2() read .len with smp_load_acquire()
> before READ_ONCE() on .data, so a concurrent principal-hash check
> cannot observe a torn (data, len) pair.
>
> Fixes: 362063a595be ("nfsd: keep a tally of RECLAIM_COMPLETE operations
> when using nfsdcld")
> Assisted-by: kres:claude-opus-4-7
> Signed-off-by: Chris Mason <clm@xxxxxxxx>
The motivating example in the commit message is built on the legacy
recovery-directory path, but the bug this patch fixes is nfsdcld-only,
which the Fixes: tag already reflects.
Could you recast the example around the path the patch actually
repairs -- a boot-time cld enumeration record followed by an in-grace
RECLAIM_COMPLETE downcall for the same cl_name -- where the records
are keyed consistently by cl_name and the dedup works? The code
change itself looks correct.
--
Chuck Lever