Re: [Question]nfs: should nfs timeout even with NFS_CS_NO_RETRANS_TIMEOUT ?
From: Trond Myklebust
Date: Fri Mar 13 2026 - 11:23:03 EST
On Fri, 2026-03-13 at 11:22 +0800, zhangjian (CG) wrote:
>
> On 3/12/2026 9:09 PM, Trond Myklebust wrote:
> > On Thu, 2026-03-12 at 12:19 +0800, zhangjian (CG) wrote:
> > >
> > >
> > > On 3/6/2026 12:49 PM, Trond Myklebust wrote:
> > > > On Fri, 2026-03-06 at 10:46 +0800, zhangjian (CG) wrote:
> > > > > Hi experts on NFS:
> > > > >
> > > > > Recently we meet an error:
> > > > > 1.Nfs wait for sunrpc
> > > > > 2.Sunrpc send OPEN message and hang the rpc task onto sunrpc
> > > > > pending
> > > > > queue.
> > > > > 3.Server never reply, and since NFS_CS_NO_RETRANS_TIMEOUT is
> > > > > forced
> > > > > and
> > > > > connection is ESTABLISHED, task will never be retransmitted.
> > > > > This cause procedures waiting on this file hang forever.
> > > > > I know using "umount -f " to kill rpc task works. And the key
> > > > > to
> > > > > the
> > > > > problem most likely lies in the network layer. But should nfs
> > > > > retransmit
> > > > > it after waiting for so long?
> > > > >
> > > > > Wish for reply. Thanks
> > > > >
> > > > > Zhangjian
> > > > >
> > > > Please read the NFSv4 spec. It very clearly states that the
> > > > client
> > > > should never retransmit unless the connection breaks.
> > > >
> > >
> > > NFSv4 spec said client should never retransmit, but not said
> > > client
> > > need
> > > to wait forever. Maybe sunrpc should tell nfs -ETIMEOUT and nfs
> > > return
> > > ERROR rather than retransmit.
> >
> > You are 100% free to use the existing 'soft' or 'softerr' mount
> > options
> > if you have applications that can parse those (non-POSIX) errors.
>
> I have already mounted with soft,retrans,timeo options. The
> connection
> is in established state. But since NFS_CS_NO_RETRANS_TIMEOUT is set.
> The
> OPEN rpctask will not return -ETIMEOUT. Any operation waiting for the
> seqid will hang. The soft don't works when connection is good.
>
> > Note however that there is no way to tell the server that you are
> > 'cancelling' an RPC call, so it will hold onto that slot until it
> > is
> > done executing the call (see RFC8881, Section 2.10.6.1.). So you
> > are
> > eventually going to run out of usable slots, and the system will
> > gum up
> > anyway.
>
> Maybe client hanging for so long is more serious than running out of
> client slot. Even auto-reconnecting is better than this.
We do not ever "fix" broken servers by hacking the client.
I suggest that either you fix your server, or that you replace it with
one that isn't broken.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx