Re: [PATCH] NFSv4: clear exception state on successful mkdir retry
From: Thorsten Leemhuis
Date: Wed Jun 10 2026 - 12:19:17 EST
On 6/10/26 16:28, Anna Schumaker wrote:
> On Tue, Jun 9, 2026, at 6:05 AM, Thorsten Leemhuis wrote:
>> On 5/13/26 09:18, Thorsten Leemhuis wrote:
>>> [top-posting to facilitate processing]
>>>
>>> @NFSv4 maintainers, just wondering, did this patch maybe fall
>>> through the cracks? It fixes a regression, that's why it's on my
>>> radar. Or was there some progress and I missed it?
>
> The patch is in my linux-next branch here:
> https://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commit;h=238e9b51aa29f48b6243212a3b75c8e48d6b96fd
>
> It'll be included when the merge window opens this weekend.
Great, thx Anna. Was a bit confused why I could not see it in -next 90
minutes ago (that's where I checked yesterday before prodding, too), but
it turned up there in the new -next release that was published a few
minutes ago. :-D
Ciao, Thorsten
>> Still no progress afaics. Feels like I'm missing something obvious
>> or like I'm totally off track.
>>
>> Igor, Neil, is that the case? Or are you also waiting for the fix
>> to make progress?
>>
>> Ciao, Thorsten
>>
>>> On 4/29/26 12:49, Igor Raits wrote:
>>>> After a server returns NFS4ERR_DELAY for an NFSv4 CREATE
>>>> issued by mkdir(2), the client correctly waits and retries.
>>>> When the retry succeeds, however, mkdir(2) can still surface
>>>> -EEXIST to userspace even though the directory was just created
>>>> on the server.
>>>>
>>>> Reproducer (random 16-hex names so collisions are not the
>>>> cause) against an in-kernel Linux nfsd; reproduces under both
>>>> NFSv4.0 and NFSv4.2:
>>>>
>>>>   N=2000000; base=/var/gdc/export
>>>>   for ((i=1; i<=N; i++)); do
>>>>     d=$base/$(openssl rand -hex 8)
>>>>     mkdir "$d" 2>/dev/null || echo "$(date +%T) failed loop=$i $d"
>>>>     rmdir "$d" 2>/dev/null
>>>>   done
>>>>
>>>> Failures cluster at the cadence at which the server-side
>>>> auth/export cache refresh path causes nfsd to return
>>>> NFS4ERR_DELAY for CREATE.
>>>>
>>>> A wire trace of one failure (the three CREATE RPCs all come
>>>> from a single mkdir(2), generated by the do-while in
>>>> nfs4_proc_mkdir()):
>>>>
>>>>   client -> server  CREATE name=...  -> NFS4ERR_DELAY
>>>>   ~100 ms later
>>>>   client -> server  CREATE name=...  -> NFS4_OK (dir created)
>>>>   ~80 us later
>>>>   client -> server  CREATE name=...  -> NFS4ERR_EXIST (correct)
>>>>
>>>> Since commit dd862da61e91 ("nfs: fix incorrect handling of
>>>> large-number NFS errors in nfs4_do_mkdir()"),
>>>> nfs4_handle_exception() is called only when _nfs4_proc_mkdir()
>>>> returned an error. That gate breaks retry-state hygiene:
>>>> nfs4_do_handle_exception() resets
>>>> exception.{delay,recovering,retry} to 0 on entry, so calling it
>>>> on success is what previously cleared the retry flag set by the
>>>> preceding NFS4ERR_DELAY iteration. With the gate in place,
>>>> exception.retry stays at 1 after the successful retry, the loop
>>>> runs once more, and the resulting CREATE for an
>>>> already-created name yields NFS4ERR_EXIST -> -EEXIST to
>>>> userspace.
>>>>
>>>> Drop the conditional and call nfs4_handle_exception()
>>>> unconditionally, matching every other do-while in
>>>> fs/nfs/nfs4proc.c (nfs4_proc_symlink(), nfs4_proc_link(), etc.).
>>>> The dentry/status separation introduced by that commit is
>>>> preserved.
>>>>
>>>> Fixes: dd862da61e91 ("nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir()")
>>>> Reported-and-tested-by: Jan Čípa <jan.cipa@xxxxxxxxxxxx>
>>>> Closes: https://lore.kernel.org/linux-nfs/CA+9S74hSp_tJu2Ffe2BPNC2T25gfkhgjjDkdgSsF5c2rnJq_wA@xxxxxxxxxxxxxx/
>>>> Reviewed-by: NeilBrown <neil@xxxxxxxxxx>
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> Signed-off-by: Igor Raits <igor.raits@xxxxxxxxx>
>>>> ---
>>>>  fs/nfs/nfs4proc.c | 5 ++---
>>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index a0885ae55abc..ffd14141ea1d 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -5393,10 +5393,9 @@ static struct dentry *nfs4_proc_mkdir(struct inode *dir, struct dentry *dentry,
>>>>  	do {
>>>>  		alias = _nfs4_proc_mkdir(dir, dentry, sattr, label, &err);
>>>>  		trace_nfs4_mkdir(dir, &dentry->d_name, err);
>>>> +		err = nfs4_handle_exception(NFS_SERVER(dir), err, &exception);
>>>>  		if (err)
>>>> -			alias = ERR_PTR(nfs4_handle_exception(NFS_SERVER(dir),
>>>> -							      err,
>>>> -							      &exception));
>>>> +			alias = ERR_PTR(err);
>>>>  	} while (exception.retry);
>>>>  	nfs4_label_release_security(label);
>>>>
>>>
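
For readers following the quoted explanation, the stale retry flag can be
illustrated with a small userspace model. This is only a sketch under stated
assumptions, not the kernel code: model_create(), model_handle_exception(),
model_mkdir(), struct model_exception and the OK/ERR_* constants are all
hypothetical stand-ins, and the "server" simply answers DELAY once, then
creates the directory, then reports EXIST.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NFS4_OK / NFS4ERR_DELAY / NFS4ERR_EXIST. */
enum { OK = 0, ERR_DELAY = -1, ERR_EXIST = -2 };

/* Hypothetical, reduced model of the exception state (retry flag only). */
struct model_exception { int retry; };

static bool dir_exists;   /* model server state */
static int calls;         /* number of CREATE calls issued */

/* Model server: first CREATE is delayed, the next creates the directory,
 * any later CREATE for the same name reports that it already exists. */
static int model_create(void)
{
	calls++;
	assert(calls < 10);   /* guard: the model must never loop unboundedly */
	if (calls == 1)
		return ERR_DELAY;
	if (dir_exists)
		return ERR_EXIST;
	dir_exists = true;
	return OK;
}

/* Model handler: clears retry on entry; on DELAY it requests a retry and
 * reports the error as handled (0). Other values pass through unchanged. */
static int model_handle_exception(int err, struct model_exception *exc)
{
	exc->retry = 0;
	if (err == ERR_DELAY) {
		exc->retry = 1;
		return 0;
	}
	return err;
}

/* gated == true models calling the handler only on error;
 * gated == false models calling it unconditionally. */
static int model_mkdir(bool gated)
{
	struct model_exception exc = { 0 };
	int err;

	dir_exists = false;
	calls = 0;
	do {
		err = model_create();
		if (!gated || err)
			err = model_handle_exception(err, &exc);
	} while (exc.retry);
	return err;
}
```

In this model, model_mkdir(true) issues three CREATE calls and returns
ERR_EXIST, because the successful second call never reaches the handler and
retry stays at 1. model_mkdir(false) issues two calls and returns OK.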