Re: [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period
From: Zhihao Cheng
Date: Wed Apr 22 2026 - 02:56:39 EST
On 2026/4/22 14:44, Zhihao Cheng wrote:
Add lilingfeng3@xxxxxxxxxx
An NFS server restart can cause the client to enter an infinite loop during
state recovery. The state manager gets stuck processing
NFS4CLNT_RECLAIM_NOGRACE, with the server returning NFS4ERR_GRACE for every
file it tries to recover. The problem was reported in [1].
Trigger sequence:
1. Client opens 2 files. After server reboot, client enters
nfs4_do_reclaim(RECLAIM_REBOOT). Server misses grace period and returns
NFS4ERR_NO_GRACE, causing client to set NFS4CLNT_RECLAIM_NOGRACE.
2. Client enters nfs4_do_reclaim(RECLAIM_NOGRACE) to recover first file.
Server reboots again, open request returns NFS4ERR_BADSESSION, client
sets NFS4CLNT_SESSION_RESET.
3. nfs4_reset_session calls nfs4_proc_create_session, which fails with
ETIMEDOUT due to a network failure. nfs4_handle_reclaim_lease_error sets
NFS4CLNT_LEASE_EXPIRED but does NOT set NFS4CLNT_RECLAIM_REBOOT.
4. When nfs4_reclaim_lease runs, NFS4CLNT_RECLAIM_NOGRACE is already set,
so it skips setting NFS4CLNT_RECLAIM_REBOOT (the bug, introduced by
commit b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")).
5. Server never receives RECLAIM_COMPLETE, so cl_flags lacks
NFSD4_CLIENT_RECLAIM_COMPLETE. For every subsequent file, the server
returns nfserr_grace, causing an infinite retry loop.
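To illustrate step 5, here is a minimal userspace sketch of the server-side
condition as described above. It is not the nfsd code: every sketch_* name
is hypothetical, and the rule "a non-reclaim OPEN is refused until
RECLAIM_COMPLETE has been seen" is an assumption taken from this description.

```c
#include <stdbool.h>

/* Hypothetical status codes standing in for nfs_ok and nfserr_grace. */
enum sketch_status { SKETCH_OK, SKETCH_GRACE };

/* Stand-in for the per-client NFSD4_CLIENT_RECLAIM_COMPLETE flag. */
struct sketch_server_client {
	bool reclaim_complete;
};

/* Assumed rule: a non-reclaim OPEN is refused until RECLAIM_COMPLETE. */
static enum sketch_status sketch_open(const struct sketch_server_client *c,
				      bool is_reclaim_open)
{
	if (!c->reclaim_complete && !is_reclaim_open)
		return SKETCH_GRACE;
	return SKETCH_OK;
}

/* Marks the client as done reclaiming, as RECLAIM_COMPLETE would. */
static void sketch_reclaim_complete(struct sketch_server_client *c)
{
	c->reclaim_complete = true;
}

/*
 * Retries a non-reclaim OPEN up to max_attempts times.
 * Returns the attempt that succeeded, or -1 if every attempt hit GRACE.
 */
static int sketch_nograce_open(struct sketch_server_client *c, int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (sketch_open(c, false) == SKETCH_OK)
			return attempt;
	}
	return -1;
}
```

Without a call to sketch_reclaim_complete(), sketch_nograce_open() never
succeeds no matter how many attempts are allowed, which mirrors the retry
loop seen on the client.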
Fix it by setting NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease if
NFS4CLNT_SERVER_SCOPE_MISMATCH is not set, so that the client sends
RECLAIM_COMPLETE to the server first, allowing subsequent nograce
recovery to proceed.
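The flag decision before and after this change can be sketched in userspace
as follows. The SKETCH_* bits and sketch_* helpers are hypothetical stand-ins
for the cl_state bits, and nfs4_state_start_reclaim_nograce() is reduced to
setting the NOGRACE bit, which is a simplification.

```c
/* Hypothetical stand-ins for the cl_state bits named in this patch. */
enum {
	SKETCH_RECLAIM_REBOOT        = 1u << 0,
	SKETCH_RECLAIM_NOGRACE       = 1u << 1,
	SKETCH_SERVER_SCOPE_MISMATCH = 1u << 2,
};

/* Pre-patch: RECLAIM_REBOOT is skipped whenever RECLAIM_NOGRACE is set. */
static unsigned int sketch_reclaim_lease_old(unsigned int state)
{
	if (state & SKETCH_SERVER_SCOPE_MISMATCH) {
		state &= ~SKETCH_SERVER_SCOPE_MISMATCH;
		/* simplified nfs4_state_start_reclaim_nograce() */
		state |= SKETCH_RECLAIM_NOGRACE;
	}
	if (!(state & SKETCH_RECLAIM_NOGRACE))
		state |= SKETCH_RECLAIM_REBOOT;
	return state;
}

/* Post-patch: RECLAIM_REBOOT is set unless a scope mismatch was seen. */
static unsigned int sketch_reclaim_lease_new(unsigned int state)
{
	if (state & SKETCH_SERVER_SCOPE_MISMATCH) {
		state &= ~SKETCH_SERVER_SCOPE_MISMATCH;
		/* simplified nfs4_state_start_reclaim_nograce() */
		state |= SKETCH_RECLAIM_NOGRACE;
	} else {
		state |= SKETCH_RECLAIM_REBOOT;
	}
	return state;
}
```

Starting from a state with only SKETCH_RECLAIM_NOGRACE set (the situation in
step 4), the old helper leaves SKETCH_RECLAIM_REBOOT clear, while the new one
sets it. The two helpers agree when no bit is set and when only the scope
mismatch bit is set.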
A reproducer is available in [2].
[1] https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@xxxxxxxxxx/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=221399
Fixes: b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")
Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Li Lingfeng <lilingfeng3@xxxxxxxxxx>
Closes: https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@xxxxxxxxxx/
Signed-off-by: Zhihao Cheng <chengzhihao1@xxxxxxxxxx>
---
fs/nfs/nfs4state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 305a772e5497..817327e73d88 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2012,7 +2012,7 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
 		return nfs4_handle_reclaim_lease_error(clp, status);
 	if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH, &clp->cl_state))
 		nfs4_state_start_reclaim_nograce(clp);
-	if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
+	else
 		set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
 	clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
 	clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);