Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access
From: Rick Macklem
Date: Fri May 29 2026 - 12:21:20 EST
On Fri, May 29, 2026 at 7:06 AM Trond Myklebust <trondmy@xxxxxxxxxx> wrote:
>
> CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If you are unsure, forward the message to ITHelp@xxxxxxxxxxx for review.
>
>
> On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote:
> > A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a
> > later
> > open(O_WRONLY) from the cached delegation (can_open_delegated())
> > without
> > sending an OPEN to the server. That cached "open for write" assertion
> > is
> > only valid while the delegation holder still has write access. A
> > SETATTR
> > that changes mode, owner, or group can revoke that access -- after
> > which an
> > open served from the delegation would bypass an access check the
> > server
> > would now fail, and, against a server that recalls the delegation on
> > such a
> > change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry
> > round
> > trip.
> >
> > Before issuing such a SETATTR, check whether the proposed
> > mode/owner/group
> > would remove write access for the delegation's owning credential,
> > judged by
> > the resulting POSIX mode bits. If so, return the delegation first:
> > the
> > return is synchronous and flushes modified data, so the SETATTR
> > proceeds on
> > an open or special stateid and the next open revalidates access with
> > the
> > server. Permission changes that keep the holder's write access leave
> > the
> > delegation in place.
> >
> > Only the mode bits and the holder's fsuid/fsgid are consulted. An
> > NFSv4 ACL
> > cannot be evaluated by the client, a privileged caller may retain
> > access the
> > bits deny, and supplementary group membership is not checked, so the
> > test is
> > necessarily approximate -- but an inexact answer costs at most an
> > unnecessary delegation return or a fall back to the server's recall,
> > never
> > incorrect access.
> >
> > RFC 8881 Section 10.4.4 permits a client to return a delegation
> > voluntarily,
> > performing the same pre-return state updates (data flush, pending
> > truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit
> > c01d36457dcc
> > ("NFSv4: Don't return the delegation when not needed by NFSv4.x
> > (x>0)")
> > stopped returning write delegations on SETATTR for NFSv4.1+, since
> > the
> > server can identify the delegation holder from the SEQUENCE clientid
> > and
> > need not recall. That holds for changes that do not affect the
> > holder's
> > access; restore a return only for the narrow case where the holder's
> > own
> > write access is removed.
>
> Hmmm... I'd argue that while recalling the delegation in this case is
> mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
>
> Furthermore, I'd argue that if the holder of a write delegation is just
> changing the mode, then that should never result in a delegation recall
> for a well written NFSv4.1 server. The reason is this does not impact
> the client's ability to cache data, metadata or lock state. It only
> impacts its ability to rely on previously cached access data when
> handling new opens.
I'm not sure I completely agree with this statement. The case I would
be concerned about is delayed writes sitting in the client.
Maybe an NFSv4.1/4.2 server should always allow writes from a
client that holds a write delegation for the file, but I don't think that
is spelled out in RFC8881 (I'm never sure, given that monstrous
document) and I'll admit that the FreeBSD server
does not do that. The FreeBSD server currently does always allow the
owner of the file to do writes, but does not do the same w.r.t. write
delegation held by the client. (I'll think about adding that override,
because it does seem reasonable.)
What does the Linux knfsd currently do w.r.t. allowing writes
from a client that holds a write delegation?
Certainly setting mode bits won't be a problem and clearing
owner mode bits isn't a problem for the FreeBSD server.
>
> One can argue whether or not it's needed for a uid or gid change by
> said holder of the delegation, but there too I'd say the right
> behaviour is to err on the side of not recalling.
I might argue that whether or not clearing mode bits requires a
delegation recall should be left up to the server.
> The exception might be if this is an attribute delegation, and the
> result will be that the cred attached to the delegation will no longer
> be able to issue a SETATTR to update the atime/mtime on delegation
> return.
Lost me. What's an attribute delegation?
rick
>
> So yes to pre-emptive invalidation of the access cache, but I'm very
> sceptical to actually pre-emptively returning the delegation or even
> the layouts.
>
> >
> > Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxxxxxxx>
> > ---
> > fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++-
> > --
> > 1 file changed, 62 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index a9b8d482d289..e4b7322bf75c 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server
> > *server, struct nfs_fh *fhandle,
> > return err;
> > }
> >
> > -/*
> > +/*
> > + * Would applying @sattr (which changes mode, owner, and/or group)
> > remove the
> > + * write access of a held write delegation's owning credential, as
> > judged by
> > + * the resulting file mode bits?
> > + *
> > + * Such a change makes the delegation's cached "open for write"
> > assertion
> > + * stale: a later open(O_WRONLY) could be served from the delegation
> > without
> > + * the server getting a chance to deny it. Only the mode bits and
> > the
> > + * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the
> > client cannot
> > + * evaluate locally), a privileged caller, or supplementary group
> > membership
> > + * may make the answer imprecise, but the cost is at most an
> > unnecessary
> > + * delegation return or a fall back to the server's recall -- never
> > incorrect
> > + * access.
> > + */
> > +static bool nfs4_setattr_removes_write(struct inode *inode, struct
> > iattr *sattr)
> > +{
> > + struct nfs_delegation *delegation;
> > + const struct cred *cred;
> > + umode_t mode = inode->i_mode;
> > + kuid_t uid = inode->i_uid;
> > + kgid_t gid = inode->i_gid;
> > + bool ret = false;
> > +
> > + delegation = nfs4_get_valid_delegation(inode);
> > + if (!delegation)
> > + return false;
> > + if (!(delegation->type & FMODE_WRITE))
> > + goto out;
> > + cred = delegation->cred;
> > +
> > + if (sattr->ia_valid & ATTR_MODE)
> > + mode = sattr->ia_mode;
> > + if (sattr->ia_valid & ATTR_UID)
> > + uid = sattr->ia_uid;
> > + if (sattr->ia_valid & ATTR_GID)
> > + gid = sattr->ia_gid;
> > +
> > + if (uid_eq(uid, cred->fsuid))
> > + ret = !(mode & S_IWUSR);
> > + else if (gid_eq(gid, cred->fsgid))
> > + ret = !(mode & S_IWGRP);
> > + else
> > + ret = !(mode & S_IWOTH);
> > +out:
> > + nfs_put_delegation(delegation);
> > + return ret;
> > +}
> > +
> > +/*
> > * The file is not closed if it is opened due to the a request to
> > change
> > * the size of the file. The open call will not be needed once the
> > * VFS layer lookup-intents are implemented.
> > @@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry,
> > struct nfs_fattr *fattr,
> > cred = ctx->cred;
> > }
> >
> > - /* Return any delegations if we're going to change ACLs */
> > - if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
> > - nfs4_inode_make_writeable(inode);
> > + /*
> > + * A change to mode, owner, or group that removes the write
> > + * delegation holder's own write access makes the
> > delegation's cached
> > + * "open for write" stale; return it so a later open()
> > revalidates
> > + * access with the server. A change that keeps write access
> > leaves
> > + * the delegation in place.
> > + */
> > + if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) {
> > + if (nfs4_setattr_removes_write(inode, sattr))
> > + nfs4_inode_return_delegation(inode);
> > + else
> > + nfs4_inode_make_writeable(inode);
> > + }
> >
> > status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx,
> > NULL);
> > if (status == 0) {
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx
>