Re: [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ?

From: Trond Myklebust

Date: Thu Mar 12 2026 - 09:09:41 EST


On Thu, 2026-03-12 at 12:19 +0800, zhangjian (CG) wrote:
>
>
> On 3/6/2026 12:49 PM, Trond Myklebust wrote:
> > On Fri, 2026-03-06 at 10:46 +0800, zhangjian (CG) wrote:
> > > Hi experts on NFS:
> > >
> > > Recently we hit an error:
> > > 1. NFS waits for sunrpc.
> > > 2. sunrpc sends the OPEN request and puts the RPC task on the
> > >    sunrpc pending queue.
> > > 3. The server never replies, and since NFS_CS_NO_RETRANS_TIMEOUT is
> > >    forced and the connection is ESTABLISHED, the task is never
> > >    retransmitted.
> > > This causes processes waiting on this file to hang forever.
> > > I know that using "umount -f" to kill the RPC task works, and the
> > > root cause most likely lies in the network layer. But should NFS
> > > retransmit the request after waiting this long?
> > >
> > > Looking forward to your reply. Thanks.
> > >
> > > Zhangjian
> > >
> > Please read the NFSv4 spec. It very clearly states that the client
> > should never retransmit unless the connection breaks.
> >
>
> The NFSv4 spec says the client should never retransmit, but it does not
> say the client needs to wait forever. Maybe sunrpc should return
> -ETIMEDOUT to NFS, and NFS should return an error rather than
> retransmit.

You are 100% free to use the existing 'soft' or 'softerr' mount options
if you have applications that can parse those (non-POSIX) errors.
Note, however, that there is no way to tell the server that you are
'cancelling' an RPC call, so it will hold onto that slot until it is
done executing the call (see RFC8881, Section 2.10.6.1). So you are
eventually going to run out of usable slots, and the system will gum up
anyway.
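
A toy model may make the slot argument concrete. The Python sketch below is a hypothetical simulation, not Linux client code: SlotTable, alloc, reply_received and soft_timeouts_against_stuck_server are invented names. It assumes, as the paragraph above says, that a slot stays in use until the server finishes the call and replies. Each call to a server that never answers "times out" on the client, as it would on a soft mount, but its slot is never released.

```python
from dataclasses import dataclass, field


@dataclass
class SlotTable:
    """Toy model of a session slot table (hypothetical, not kernel code).

    A slot is busy from the moment a request is sent on it until the
    server's reply arrives. A client-side timeout does not free it,
    because the client cannot tell the server to cancel the call.
    """
    nslots: int
    busy: set = field(default_factory=set)

    def alloc(self):
        """Return a free slot id, or None if every slot is still in use."""
        for slot in range(self.nslots):
            if slot not in self.busy:
                self.busy.add(slot)
                return slot
        return None

    def reply_received(self, slot):
        """Only a reply from the server releases the slot."""
        self.busy.discard(slot)


def soft_timeouts_against_stuck_server(nslots, ncalls):
    """Send ncalls to a server that never replies.

    Each call that gets a slot 'times out' on the client (as with a soft
    mount) but keeps its slot, since reply_received() is never called.
    Returns one result string per call.
    """
    table = SlotTable(nslots)
    results = []
    for _ in range(ncalls):
        slot = table.alloc()
        results.append("timed out" if slot is not None else "no free slot")
    return results
```

In this model, soft_timeouts_against_stuck_server(4, 6) returns four "timed out" results followed by two "no free slot" results: returning errors to the application early does not stop the client from eventually running out of usable slots.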

The default mount option is 'hard', because those are the only
semantics that are compatible with POSIX and NFSv4.x.
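
An application that opts into 'soft' or 'softerr' has to recognise these errors itself. The sketch below assumes the behaviour documented in nfs(5): a request that times out on a 'softerr' mount fails with ETIMEDOUT, while on a 'soft' mount it fails with EIO, which cannot be told apart from other I/O errors. classify_nfs_error and read_or_classify are hypothetical helper names, not part of any NFS API.

```python
import errno
import os


def classify_nfs_error(err):
    """Map an errno from a failed NFS operation to a coarse category.

    Assumes nfs(5) semantics: 'softerr' mounts report a timed-out request
    as ETIMEDOUT, 'soft' mounts as EIO. EIO is ambiguous because it is also
    used for genuine I/O failures.
    """
    if err == errno.ETIMEDOUT:
        return "timeout"
    if err == errno.EIO:
        return "io-error-or-soft-timeout"
    return "other"


def read_or_classify(fd, size):
    """Return (data, None) on success, or (None, category) for
    timeout-like errors. Any other OSError is re-raised unchanged.

    On a 'hard' mount (the default) a stuck request makes os.read()
    block rather than fail with a timeout errno.
    """
    try:
        return os.read(fd, size), None
    except OSError as exc:
        category = classify_nfs_error(exc.errno)
        if category == "other":
            raise
        return None, category
```

Re-raising everything that is not timeout-like keeps ordinary failures such as ENOENT or EACCES on their usual error path, so only the non-POSIX timeout cases get special handling.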

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx