Re: [PATCH] nfsd: decrease sc_count directly if fail to queue dl_recall

From: Jeff Layton
Date: Thu Apr 10 2025 - 06:23:32 EST


On Thu, 2025-04-10 at 09:57 +0800, Li Lingfeng wrote:
> A deadlock warning occurred when invoking nfs4_put_stid following a failed
> dl_recall queue operation:
> T1 T2
> nfs4_laundromat
> nfs4_get_client_reaplist
> nfs4_anylock_blockers
> __break_lease
> spin_lock // ctx->flc_lock
> spin_lock // clp->cl_lock
> nfs4_lockowner_has_blockers
> locks_owner_has_blockers
> spin_lock // flctx->flc_lock
> nfsd_break_deleg_cb
> nfsd_break_one_deleg
> nfs4_put_stid
> refcount_dec_and_lock
> spin_lock // clp->cl_lock
>

Was this warning generated via static analysis? Given that the refcount
shouldn't go to 0 with the nfs4_put_stid() call in this function, I'm
wondering how the deadlock could occur?

> When a file is opened, an nfs4_delegation is allocated with sc_count
> initialized to 1, and the file_lease holds a reference to the delegation.
> The file_lease is then associated with the file through kernel_setlease.
>
> The disassociation is performed in nfsd4_delegreturn via the following
> call chain:
> nfsd4_delegreturn --> destroy_delegation --> destroy_unhashed_deleg -->
> nfs4_unlock_deleg_lease --> kernel_setlease --> generic_delete_lease
> The corresponding sc_count reference will be released after this
> disassociation.
>
> Since nfsd_break_one_deleg executes while holding the flc_lock, the
> disassociation process becomes blocked when attempting to acquire flc_lock
> in generic_delete_lease. This means:
> 1) sc_count in nfsd_break_one_deleg will not be decremented to 0;
> 2) The nfs4_put_stid called by nfsd_break_one_deleg will not attempt to
> acquire cl_lock;
> 3) Consequently, no deadlock condition is created.
>
> Given that sc_count in nfsd_break_one_deleg remains non-zero, we can
> safely perform refcount_dec on sc_count directly. This approach
> effectively avoids triggering deadlock warnings.
>
> Fixes: 230ca758453c ("nfsd: put dl_stid if fail to queue dl_recall")
> Signed-off-by: Li Lingfeng <lilingfeng3@xxxxxxxxxx>
> ---
> fs/nfsd/nfs4state.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 2041268b398a..59a693f22452 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -5430,7 +5430,7 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
> queued = nfsd4_run_cb(&dp->dl_recall);
> WARN_ON_ONCE(!queued);
> if (!queued)
> - nfs4_put_stid(&dp->dl_stid);
> + refcount_dec(&dp->dl_stid.sc_count);
> }
>
> /* Called from break_lease() with flc_lock held. */

Ok. I think you're right that it's safe to just decrement it since we
are able to do an unconditional increment just above it. It would be
nice to update the comment above this with an explanation so we're not
scratching our heads over this in a couple of years.

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>