Re: [PATCH] NFS: Retry the CLOSE if the embedded GETATTR is rejected with ERR_STALE

From: Anchal Agarwal
Date: Thu Nov 19 2020 - 14:25:07 EST


On Wed, Nov 18, 2020 at 10:13:16PM +0000, Trond Myklebust wrote:
> On Wed, 2020-11-18 at 21:29 +0000, Anchal Agarwal wrote:
> > On Wed, Nov 18, 2020 at 03:17:20AM +0000, Trond Myklebust wrote:
> > > On Wed, 2020-11-18 at 00:24 +0000, Anchal Agarwal wrote:
> > > > If our CLOSE RPC call is rejected with an ERR_STALE error, then we
> > > > should remove the GETATTR call from the compound RPC and retry.
> > > > This could happen in a scenario where two clients try to access
> > > > the same file. One client opens the file and the other client removes
> > > > the file while it's opened by the first client. When the first client
> > > > attempts to close the file, the server returns ESTALE and the file ends
> > > > up being leaked on the server. This depends on how the NFS server is
> > > > configured and is not reproducible when running against nfsd.
> > >
> > > That would be a seriously broken server. If you return NFS4ERR_STALE to
> > > the client, you cannot expect any further interaction with that file
> > > from the client. It won't try to send CLOSE or DELEGRETURN or any other
> > > stateful operation.
> > >
> > In this scenario, the setup we have at EFS is distributed: multiple
> > clients are connected to multiple servers with a common filesystem.
> > So the above scenario leads to leaked open file handles on the client
> > that tries to close the deleted file. My view was that, in that case,
> > the client could retry the CLOSE without the GETATTR in the close
> > sequence, with nothing to do on the server side.
>
>
> If you send the client an NFS4ERR_STALE, you are telling it that its
> access to the file has been revoked. That is not a temporary error, it
> is a fatal one. The client is not responsible for cleaning up any
> state.
>
Ok, I get what you are saying. As I understand it, NFS4ERR_STALE is not a
valid error to send to the client on a CLOSE call; it's the server that is
doing something fatally wrong, and it should either clean up its own state
or not allow this scenario to happen in the first place.
Thanks for bearing with me.

--
Anchal
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>