Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access

From: Rick Macklem

Date: Fri May 29 2026 - 19:20:55 EST


On Fri, May 29, 2026 at 9:22 AM Benjamin Coddington
<ben.coddington@xxxxxxxxxxxxxxx> wrote:
>
> On 29 May 2026, at 10:00, Trond Myklebust wrote:
> >
> > Hmmm... I'd argue that while recalling the delegation in this case is
> > mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
> >
> > Furthermore, I'd argue that if the holder of a write delegation is just
> > changing the mode, then that should never result in a delegation recall
> > for a well written NFSv4.1 server. The reason is this does not impact
> > the client's ability to cache data, metadata or lock state. It only
> > impacts its ability to rely on previously cached access data when
> > handling new opens.
> >
> > One can argue whether or not it's needed for a uid or gid change by
> > said holder of the delegation, but there too I'd say the right
> > behaviour is to err on the side of not recalling.
> > The exception might be if this is an attribute delegation, and the
> > result will be that the cred attached to the delegation will no longer
> > be able to issue a SETATTR to update the atime/mtime on delegation
> > return.
> >
> > So yes to pre-emptive invalidation of the access cache, but I'm very
> > sceptical to actually pre-emptively returning the delegation or even
> > the layouts.
>
> The latency I was chasing comes from the server electing to recall on these
> SETATTRs.
>
> You're right that for v4.1 the mode change doesn't need to trigger a recall,
> and that the only client-side exposure is stale cached access on subsequent
> opens. It's already covered in the SETATTR reply path changing mode/uid/gid
> where it sets NFS_INO_INVALID_ACCESS in nfs_update_inode(), and a fresh open
> over the delegation still goes through nfs_may_open() -> nfs_do_access(), so
> it revalidates with the server rather than trusting the cached result.
>
> The server isn't recalling these for cache coherence. When the change
> removes the holder's write, a later SETATTR(size) under the delegation
> carries the delegation stateid, not an open stateid
> (nfs4_copy_delegation_stateid()), and the client can reopen O_WRONLY from
> the delegation with no OPEN on the wire — so the server can't tell whether
> the holder still has the file open for write, nor apply the usual
> "open-for-write overrides current mode" check for the truncate. The recall
> forces a DELEGRETURN + CLAIM_DELEGATE_CUR reclaim that re-establishes a real
> (or absent) open stateid, letting the server decide the truncate correctly.

I will note that, since your server allows WRITEs based on the
delegation stateid,
it could allow SETATTR(size) based on a delegation stateid as well.
(SETATTR(size)
is just a weird variant of WRITE. is it not.) Now, how to implement
this in the server
would be up to you guys.

I suppose the disadvantage of doing the DELEGRETURN pre-emptively will never
allow server implementations to figure out how to allow it without a
CB_RECALL, etc.

I do plan on patching the FreeBSD server to allow READ, WRITE and SETATTR(size)
based on a write delegation stateid (and READ based on a read
delegation stateid).
Of course it will take a while for that to find its way into the
world. The FreeBSD
server will be doing the CB_RECALL in the meantime.

Useful discussion. Thanks, rick

>
> For write-retaining SETATTRs none of that applies and the server shouldn't
> recall — which is where the latency I was chasing actually came from, and
> the better place to remove it.
>
> Thanks for the review.
>
> Ben
>