Re: [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period

From: Zhihao Cheng

Date: Tue May 05 2026 - 21:26:24 EST


在 2026/4/23 17:05, Zhihao Cheng 写道:
在 2026/4/22 20:38, Trond Myklebust 写道:
On Wed, 2026-04-22 at 14:55 +0800, Zhihao Cheng wrote:
在 2026/4/22 14:44, Zhihao Cheng 写道:
Add lilingfeng3@xxxxxxxxxx
NFS server restart causes client to enter an infinite loop during
state
recovery. The state manager gets stuck in NFS4CLNT_RECLAIM_NOGRACE
processing,
with the server repeatedly returning NFS4ERR_GRACE for each file
iteration.
This problem is reported in [1].

Trigger sequence:
   1. Client opens 2 files. After server reboot, client enters
      nfs4_do_reclaim(RECLAIM_REBOOT). Server misses grace period
and returns
      NFS4ERR_NO_GRACE, causing client to set
NFS4CLNT_RECLAIM_NOGRACE.
   2. Client enters nfs4_do_reclaim(RECLAIM_NOGRACE) to recover
first file.
      Server reboots again, open request returns NFS4ERR_BADSESSION,
client
      sets NFS4CLNT_SESSION_RESET.
   3. nfs4_reset_session calls nfs4_proc_create_session which fails
with
      ETIMEDOUT due to network¹ÊÕÏ, nfs4_handle_reclaim_lease_error
sets
      NFS4CLNT_LEASE_EXPIRED but does NOT set
NFS4CLNT_RECLAIM_REBOOT.
   4. When nfs4_reclaim_lease runs, because NFS4CLNT_RECLAIM_NOGRACE
is already
      set, it skips setting NFS4CLNT_RECLAIM_REBOOT (the bug,
modified by
      commit b42353ff8d346 ("NFSv4.1: Clean up
nfs4_reclaim_lease")).
   5. Server never receives RECLAIM_COMPLETE, so cl_flags lacks
      NFSD4_CLIENT_RECLAIM_COMPLETE. When processing subsequent
files,
      server always returns nfserr_grace, causing infinite retry
loop.

Fix it by setting NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease if
NFS4CLNT_SERVER_SCOPE_MISMATCH is not set, so that the client sends
RECLAIM_COMPLETE to the server first, allowing subsequent nograce
recovery to proceed.

Fetch a reproducer in [2].

[1]
https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@xxxxxxxxxx/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=221399

Fixes: b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")
Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Li Lingfeng <lilingfeng3@xxxxxxxxxx>
Closes:
https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@xxxxxxxxxx/
Signed-off-by: Zhihao Cheng <chengzhihao1@xxxxxxxxxx>
---
   fs/nfs/nfs4state.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 305a772e5497..817327e73d88 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2012,7 +2012,7 @@ static int nfs4_reclaim_lease(struct
nfs_client *clp)
           return nfs4_handle_reclaim_lease_error(clp,
status);
       if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH,
&clp->cl_state))
           nfs4_state_start_reclaim_nograce(clp);
-    if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
+    else
           set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
       clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
       clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);


This will cause the client to try to do reboot recovery in a situation
where it isn't allowed to do so by the spec. We should never be setting
NFS4CLNT_RECLAIM_REBOOT if NFS4CLNT_RECLAIM_NOGRACE is already set.
Hi Trond, the client can only do reclaim reboot during the grace period(after server reboot), according to [1], any reclaim type messages should be rejected by server for NOGRACE state, am I understanding right?
[1] https://datatracker.ietf.org/doc/html/rfc5661#section-8.4.2.1

One solution would be to just immediately call
nfs4_state_end_reclaim_reboot() if NFS4CLNT_RECLAIM_NOGRACE is set.

friendly ping
I tried with your suggestion, and the problem still happens:
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5044bb4c870f..5b6cb3f7ff2e 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2014,6 +2014,8 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
                nfs4_state_start_reclaim_nograce(clp);
        if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
                set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
+       else
+               nfs4_state_end_reclaim_reboot(clp);
        clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
        clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
        return 0;

The nfs4_state_end_reclaim_reboot sends RECLAIM_COMPLETE only if the clp has NFS4CLNT_RECLAIM_REBOOT flag, but it does not.

The root cause is that the client has identified an incorrect server status after the server's second reboot(since step 2). I think the client should do reclaim reboot every time the server restarts, so we should let client know the server reboots again, can we take following methods?
1. Clear NFS4CLNT_RECLAIM_NOGRACE flag if client knows that the server restart while doing nfs4_do_reclaim(NFS4CLNT_RECLAIM_NOGRACE). Client could identify the server state by NFS4ERR_BADSESSION retcode:
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5044bb4c870f..3f73dc3919a2 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1875,6 +1875,9 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
                         clp->cl_hostname, lost_locks);
                 set_bit(ops->owner_flag_bit, &sp->so_flags);
                 nfs4_put_state_owner(sp);
+                if (ops == clp->cl_mvops->nograce_recovery_ops &&
+                    status == -NFS4ERR_BADSESSION)
+                    clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state);
                 status = nfs4_recovery_handle_error(clp, status);
                 nfs4_free_state_owners(&freeme);
                 return (status != 0) ? status : -EAGAIN;

2. Client knows the server restarts by error code NFS4ERR_STALE_CLIENTID(nfs4_proc_create_session -> nfs4_handle_reclaim_lease_error), but nfs4_reset_session won't retry if the error code is ETIMEDOUT(eg. network failure in step 3), we should let client retry nfs4_reset_session if network failure.
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5044bb4c870f..62fd57e602d6 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2459,6 +2459,8 @@ static int nfs4_reset_session(struct nfs_client *clp)
     if (status) {
         dprintk("%s: session reset failed with status %d for server %s!\n",
             __func__, status, clp->cl_hostname);
+        if (status == -ETIMEDOUT)
+            goto out;
         status = nfs4_handle_reclaim_lease_error(clp, status);
         goto out;
     }
@@ -2552,6 +2554,8 @@ static void nfs4_state_manager(struct nfs_client *clp)
         if (test_and_clear_bit(NFS4CLNT_SESSION_RESET, &clp->cl_state)) {
             section = "reset session";
             status = nfs4_reset_session(clp);
+            if (status == -ETIMEDOUT)
+                continue;
             if (test_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state))
                 continue;
             if (status < 0)
3. Conditionaly set NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease, if the client has ever been recovered failed by nfs4_do_reclaim(NFS4CLNT_RECLAIM_NOGRACE)->BAD_SESSION, mark clp with NFS4CLNT_RECLAIM_REBOOT.
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5044bb4c870f..82eb650cc135 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1875,6 +1875,9 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
                         clp->cl_hostname, lost_locks);
                 set_bit(ops->owner_flag_bit, &sp->so_flags);
                 nfs4_put_state_owner(sp);
+                if (ops == clp->cl_mvops->nograce_recovery_ops &&
+                    status == -NFS4ERR_BADSESSION)
+                    __set_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags);
                 status = nfs4_recovery_handle_error(clp, status);
                 nfs4_free_state_owners(&freeme);
                 return (status != 0) ? status : -EAGAIN;
@@ -2012,7 +2015,7 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
         return nfs4_handle_reclaim_lease_error(clp, status);
     if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH, &clp->cl_state))
         nfs4_state_start_reclaim_nograce(clp);
-    if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
+    if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state) || test_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags))
         set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
     clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
     clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
@@ -2624,6 +2627,7 @@ static void nfs4_state_manager(struct nfs_client *clp)
                 continue;
             if (status < 0)
                 goto out_error;
+            clear_bit(NFS_CS_NOGRACE_REBOOT, &clp->cl_flags);
             clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state);
         }

diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 4daee27fa5eb..8d36a727513c 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -51,6 +51,7 @@ struct nfs_client {
 #define NFS_CS_REUSEPORT    8        /* - reuse src port on reconnect */
 #define NFS_CS_PNFS        9        /* - Server used for pnfs */
 #define NFS_CS_NETUNREACH_FATAL    10        /* - ENETUNREACH errors are fatal */
+#define NFS_CS_NOGRACE_REBOOT    11        /* - Server reboot during nograce recovery */
     struct sockaddr_storage    cl_addr;    /* server identifier */
     size_t            cl_addrlen;
     char *            cl_hostname;    /* hostname of server */

.