Re: [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ?
From: zhangjian (CG)
Date: Thu Mar 12 2026 - 23:22:22 EST
On 3/12/2026 9:09 PM, Trond Myklebust wrote:
> On Thu, 2026-03-12 at 12:19 +0800, zhangjian (CG) wrote:
>>
>>
>> On 3/6/2026 12:49 PM, Trond Myklebust wrote:
>>> On Fri, 2026-03-06 at 10:46 +0800, zhangjian (CG) wrote:
>>>> Hi experts on NFS:
>>>>
>>>> Recently we hit the following problem:
>>>> 1. NFS waits for sunrpc.
>>>> 2. Sunrpc sends the OPEN message and puts the rpc task on the
>>>> sunrpc pending queue.
>>>> 3. The server never replies, and since NFS_CS_NO_RETRANS_TIMEOUT
>>>> is forced and the connection is ESTABLISHED, the task will never
>>>> be retransmitted.
>>>> This causes processes waiting on this file to hang forever.
>>>> I know that using "umount -f" to kill the rpc task works, and the
>>>> root of the problem most likely lies in the network layer. But
>>>> should nfs retransmit after waiting for so long?
>>>>
>>>> Hoping for a reply. Thanks.
>>>>
>>>> Zhangjian
>>>>
>>> Please read the NFSv4 spec. It very clearly states that the client
>>> should never retransmit unless the connection breaks.
>>>
>>
>> The NFSv4 spec says the client should never retransmit, but it does
>> not say the client needs to wait forever. Maybe sunrpc should return
>> -ETIMEDOUT to nfs, and nfs should return an error rather than
>> retransmit.
>
> You are 100% free to use the existing 'soft' or 'softerr' mount options
> if you have applications that can parse those (non-POSIX) errors.
I have already mounted with the soft, retrans and timeo options. The
connection is in the ESTABLISHED state, but since
NFS_CS_NO_RETRANS_TIMEOUT is set, the OPEN rpc task never returns
-ETIMEDOUT. Any operation waiting for the seqid then hangs. The soft
option has no effect while the connection is good.
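To make the behaviour described above concrete, here is a small userspace
model of the decision taken when a request times out. It is only a sketch,
not the actual sunrpc code: the enum, the struct and the function name
on_major_timeout() are hypothetical and exist purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names are kernel API. */
enum timeout_action {
	KEEP_WAITING,	/* task stays on the pending queue, no error is returned */
	FAIL_ETIMEDOUT,	/* task completes with a timeout error */
	RETRANSMIT,	/* request is sent again */
};

struct task_model {
	bool soft;			/* mounted with 'soft' or 'softerr' */
	bool no_retrans_timeout;	/* NFS_CS_NO_RETRANS_TIMEOUT in effect */
	bool connected;			/* transport is in the ESTABLISHED state */
};

/* Decide what happens to a sent request once its timeout expires. */
enum timeout_action on_major_timeout(const struct task_model *t)
{
	/* The case in this thread: the flag plus a healthy connection
	 * means the request is neither failed nor retransmitted. */
	if (t->no_retrans_timeout && t->connected)
		return KEEP_WAITING;

	/* Otherwise a soft mount gives up with an error ... */
	if (t->soft)
		return FAIL_ETIMEDOUT;

	/* ... and a hard mount keeps retrying. */
	return RETRANSMIT;
}
```

In this model, soft only changes the outcome once connected is false,
which matches what we observe: with the connection ESTABLISHED, the OPEN
task stays in KEEP_WAITING regardless of timeo and retrans.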
> Note however that there is no way to tell the server that you are
> 'cancelling' an RPC call, so it will hold onto that slot until it is
> done executing the call (see RFC8881, Section 2.10.6.1.). So you are
> eventually going to run out of usable slots, and the system will gum up
> anyway.
A client hanging for this long is arguably more serious than running
out of slots. Even reconnecting automatically would be better than this.
>
> The default mount option is 'hard', because those are the only
> semantics that are compatible with POSIX and NFSv4.x.
>