Question about "Not Applicable" status for [PATCH v2] NFSv4.1/pNFS: fix LAYOUTCOMMIT retry loop on OLD_STATEID
From: Lei Yin
Date: Mon Apr 27 2026 - 00:53:15 EST
Hi,
Sorry for the confusion in the previous submissions. Due to an editing
mistake, the first two versions of this patch were not sent as one
proper series.
My patch "[PATCH v2] NFSv4.1/pNFS: fix LAYOUTCOMMIT retry loop on
OLD_STATEID" was marked as Not Applicable. I would like to ask for
clarification on the reason.
This patch is intended to handle the case where LAYOUTCOMMIT fails with
NFS4ERR_OLD_STATEID in nfs4_layoutcommit_done(). The change refreshes
data->args.stateid via nfs4_layout_refresh_old_stateid(), updates the
layout stateid in the inode layout header when appropriate, and restarts
the RPC only if the refresh succeeds.
The purpose is to avoid retrying LAYOUTCOMMIT indefinitely with the same
stale stateid after OLD_STATEID.
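To illustrate the intended behaviour, here is a userspace toy model in C.
It is not the patch and not kernel code: the toy_* names, the simulated
server, and the plain seqid comparison (no wraparound handling) are
assumptions made for illustration only. The attempt cap exists solely so
that the unpatched case terminates in the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a layout stateid: only the sequence number matters here. */
struct toy_stateid {
	uint32_t seqid;
};

enum toy_status { TOY_OK = 0, TOY_OLD_STATEID = 1 };

/* Simulated server: accepts only the stateid carrying its current seqid. */
static enum toy_status toy_server_layoutcommit(const struct toy_stateid *sent,
					       uint32_t server_seqid)
{
	return sent->seqid == server_seqid ? TOY_OK : TOY_OLD_STATEID;
}

/*
 * Hypothetical refresh helper: copy the client's cached stateid into the
 * RPC arguments if it is newer. Returns true only if the arguments changed.
 * (Simplified: a real comparison would have to cope with seqid wraparound.)
 */
static bool toy_refresh_old_stateid(struct toy_stateid *args,
				    const struct toy_stateid *cached)
{
	if (cached->seqid > args->seqid) {
		*args = *cached;
		return true;
	}
	return false;
}

/*
 * Returns the number of attempts needed to succeed, or -1 if the call
 * gave up. With refresh_on_old == false the same stale stateid is resent
 * until max_attempts is exhausted, which models the unbounded retry loop.
 */
static int toy_layoutcommit(struct toy_stateid args,
			    const struct toy_stateid *cached,
			    uint32_t server_seqid,
			    bool refresh_on_old, int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (toy_server_layoutcommit(&args, server_seqid) == TOY_OK)
			return attempt;
		if (!refresh_on_old)
			continue;	/* unpatched: retry with the same stateid */
		if (!toy_refresh_old_stateid(&args, cached))
			return -1;	/* nothing newer to send: stop retrying */
	}
	return -1;
}
```

In this model, a call that starts with seqid 1 while both the cached and
the server seqid are 3 succeeds on the second attempt when the refresh is
enabled, and never succeeds when it is disabled.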
The issue was reproduced on NFSv4.2. The most reliable way I found to
reproduce it is:
1. Run a workload with relatively high concurrent I/O on the client.
2. Kill the client-side I/O process with kill -9 while those I/Os are still
in flight.
3. In that situation, there is roughly a 50% chance that a subsequent
LAYOUTCOMMIT is sent with an old stateid.
4. Since the LAYOUTCOMMIT completion path (nfs4_layoutcommit_done()) does
not handle NFS4ERR_OLD_STATEID, the same stale stateid may be sent again
on every retry.
5. This can lead to an infinite retry loop, after which operations on the
affected file appear to hang.
With a plain kill (SIGTERM) instead of kill -9, the problem is much harder
to reproduce. Even so, it can still be observed occasionally under
sufficient concurrency and stress.
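Steps 1 and 2 could be approximated with a small program along the
following lines. This is only a sketch: the writer count, the per-writer
4 KiB offsets, the run time, and the file path are assumptions chosen for
illustration, not the workload used in the original testing, and whether
it triggers the problem depends on the server and mount in use.

```c
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Child: keep rewriting its own 4 KiB slot of the file until it is killed. */
static void writer_loop(const char *path, int idx)
{
	char buf[4096];
	int fd = open(path, O_WRONLY | O_CREAT, 0644);

	if (fd < 0)
		_exit(1);
	memset(buf, 'a' + (idx % 26), sizeof(buf));
	for (;;) {
		if (pwrite(fd, buf, sizeof(buf),
			   (off_t)idx * (off_t)sizeof(buf)) < 0)
			_exit(1);
	}
}

/*
 * Fork nwriters children that write concurrently to path, let them run
 * for run_ms milliseconds, then SIGKILL all of them while I/O is still
 * in flight. Returns how many children were terminated by SIGKILL, or -1
 * if the pid array could not be allocated.
 */
static int stress_and_kill(const char *path, int nwriters, unsigned int run_ms)
{
	struct timespec ts = { run_ms / 1000, (long)(run_ms % 1000) * 1000000L };
	pid_t *pids = calloc((size_t)nwriters, sizeof(*pids));
	int started = 0, killed = 0;

	if (!pids)
		return -1;
	for (int i = 0; i < nwriters; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0)
			writer_loop(path, i);	/* never returns */
		pids[started++] = pid;
	}
	nanosleep(&ts, NULL);
	for (int i = 0; i < started; i++)
		kill(pids[i], SIGKILL);
	for (int i = 0; i < started; i++) {
		int status;

		if (waitpid(pids[i], &status, 0) == pids[i] &&
		    WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
			killed++;
	}
	free(pids);
	return killed;
}
```

A caller would pass a file on the pNFS mount, for example
stress_and_kill(path, 32, 2000), and then check whether later operations
on that file still make progress.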
So my understanding of the bug is:
- kill -9 makes the stale stateid window much easier to hit;
- ordinary concurrency/stress testing can still trigger it occasionally;
- because LAYOUTCOMMIT does not recover from OLD_STATEID here, the RPC
can loop indefinitely with the stale stateid;
- once this happens, operations on the corresponding file may stop
making progress.
Could you please let me know whether the Not Applicable status means:
1. an equivalent fix is already present in the target tree,
2. the patch was sent against the wrong tree or branch, or
3. there is some issue with the problem analysis or the proposed fix?
If needed, I can resend the patch against the appropriate branch or adjust
the description accordingly.
Thanks,
Lei Yin