On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
Recently, we ran into an NFS problem on 5.10. tcpdump shows a huge
number of TEST_STATEID requests following every non-privileged
request. The client holds more than 400,000 delegations (I read the
delegation list from /proc/kcore).

At first I thought the state manager was spending most of its time in
nfs_server_reap_expired_delegations. However, all of the delegations
are in the NFS_DELEGATION_REVOKED state except 6 that are in
NFS_DELEGATION_REFERENCED (I read this from /proc/kcore too).

I analyzed the NFS code and found that if the
NFSPROC4_CLNT_DELEGRETURN procedure hits ETIMEDOUT, the delegation is
marked as NFS4ERR_DELEG_REVOKED and the client never tries to return
it again. The NFS server then keeps the revoked delegation in
clp->cl_revoked forever, so every subsequent SEQUENCE response carries
the RECALLABLE_STATE_REVOKED flag, and the client sends a
TEST_STATEID request for every non-revoked delegation.

This can only be resolved by restarting the NFS server.

I suspect that ETIMEDOUT in the NFSPROC4_CLNT_DELEGRETURN procedure is
not the only case that can cause this endless stream of TEST_STATEID
requests after every non-privileged request.

I would appreciate any advice from the NFS experts on this problem.

You have the following options:
1. Don't ever use "soft" or "softerr" on the NFS client.
2. Reboot your server every now and again.
3. Change the server code to not bother caching revoked state. Doing
so is rather pointless, since there is nothing a client can do
differently when presented with NFS4ERR_DELEG_REVOKED vs.
NFS4ERR_BAD_STATEID.
4. Change the server code to garbage collect revoked stateids after
a while.
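
For illustration only, option 4 could look roughly like the sketch
below. It is plain userspace C, not nfsd code: struct revoked_entry,
struct revoked_list, revoked_add() and revoked_reap() are all
hypothetical names, the list is only loosely modelled on
clp->cl_revoked, and the age threshold is an assumption. A real
implementation would also need proper locking and something to call
the reaper periodically.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for one revoked stateid kept by the server. */
struct revoked_entry {
    unsigned int id;              /* illustrative stateid value */
    time_t revoked_at;            /* time at which it was revoked */
    struct revoked_entry *next;
};

/* Hypothetical per-client list, loosely modelled on clp->cl_revoked. */
struct revoked_list {
    struct revoked_entry *head;
    size_t count;
};

/* Record a revoked stateid. Returns 0 on success, -1 if malloc fails. */
static int revoked_add(struct revoked_list *list, unsigned int id, time_t now)
{
    struct revoked_entry *e;

    assert(list != NULL);
    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->id = id;
    e->revoked_at = now;
    e->next = list->head;
    list->head = e;
    list->count++;
    return 0;
}

/*
 * Free every entry that has been revoked for more than max_age seconds.
 * Returns the number of entries reaped.
 */
static size_t revoked_reap(struct revoked_list *list, time_t now, time_t max_age)
{
    struct revoked_entry **pp;
    size_t reaped = 0;

    assert(list != NULL);
    pp = &list->head;
    while (*pp) {
        struct revoked_entry *e = *pp;

        if (now - e->revoked_at > max_age) {
            *pp = e->next;        /* unlink before freeing */
            free(e);
            list->count--;
            reaped++;
        } else {
            pp = &e->next;
        }
    }
    return reaped;
}
```

Once the list is empty, the server would presumably have no reason to
keep setting RECALLABLE_STATE_REVOKED for that client, which is what
would stop the TEST_STATEID storm described above.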