Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access

From: Benjamin Coddington

Date: Fri May 29 2026 - 15:49:03 EST


On 29 May 2026, at 11:27, Rick Macklem wrote:

> On Fri, May 29, 2026 at 7:06 AM Trond Myklebust <trondmy@xxxxxxxxxx> wrote:
>>
>> On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote:
>>> A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a later
>>> open(O_WRONLY) from the cached delegation (can_open_delegated()) without
>>> sending an OPEN to the server. That cached "open for write" assertion is
>>> only valid while the delegation holder still has write access. A SETATTR
>>> that changes mode, owner, or group can revoke that access -- after which an
>>> open served from the delegation would bypass an access check the server
>>> would now fail, and, against a server that recalls the delegation on such a
>>> change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry round
>>> trip.
>>>
>>> Before issuing such a SETATTR, check whether the proposed mode/owner/group
>>> would remove write access for the delegation's owning credential, judged by
>>> the resulting POSIX mode bits. If so, return the delegation first: the
>>> return is synchronous and flushes modified data, so the SETATTR proceeds on
>>> an open or special stateid and the next open revalidates access with the
>>> server. Permission changes that keep the holder's write access leave the
>>> delegation in place.
>>>
>>> Only the mode bits and the holder's fsuid/fsgid are consulted. An NFSv4 ACL
>>> cannot be evaluated by the client, a privileged caller may retain access the
>>> bits deny, and supplementary group membership is not checked, so the test is
>>> necessarily approximate -- but an inexact answer costs at most an
>>> unnecessary delegation return or a fall back to the server's recall, never
>>> incorrect access.
>>>
>>> RFC 8881 Section 10.4.4 permits a client to return a delegation voluntarily,
>>> performing the same pre-return state updates (data flush, pending
>>> truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit c01d36457dcc
>>> ("NFSv4: Don't return the delegation when not needed by NFSv4.x (x>0)")
>>> stopped returning write delegations on SETATTR for NFSv4.1+, since the
>>> server can identify the delegation holder from the SEQUENCE clientid and
>>> need not recall. That holds for changes that do not affect the holder's
>>> access; restore a return only for the narrow case where the holder's own
>>> write access is removed.
>>
>> Hmmm... I'd argue that while recalling the delegation in this case is
>> mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
>>
>> Furthermore, I'd argue that if the holder of a write delegation is just
>> changing the mode, then that should never result in a delegation recall
>> for a well written NFSv4.1 server. The reason is this does not impact
>> the client's ability to cache data, metadata or lock state. It only
>> impacts its ability to rely on previously cached access data when
>> handling new opens.

> I'm not sure I completely agree with this statement. The case I would
> be concerned about is delayed writes sitting in the client.

Those stay safe on Linux. The client flushes cached writes on the delegation
(or open) stateid, and knfsd authorizes a WRITE from the stateid's granted
access, not from a re-check of the current mode — nfsd4_write() just rides
the already-open, write-capable nfsd_file resolved from the stateid, so a
later mode change doesn't block the holder's writes. And if the server does
recall, DELEGRETURN flushes the dirty data before the delegation goes back.
The only way to lose data is a server that neither honors the holder's
stateid writes nor recalls.

> Maybe an NFSv4.1/4.2 server should always allow writes from a
> client that holds a write delegation for the file, but I don't think that
> is spelled out in RFC8881 (I'm never sure, given that monstrous
> document) and I'll admit that the FreeBSD server
> does not do that. The FreeBSD server currently does always allow the
> owner of the file to do writes, but does not do the same w.r.t. write
> delegation held by the client. (I'll think about adding that override,
> because it does seem reasonable.)
>
> What does the Linux knfsd currently do w.r.t. allowing writes
> from a client that holds a write delegation?

It allows them. WRITE resolves the delegation (or open) stateid to an open
write-capable nfsd_file and writes through it; there's no per-WRITE mode
check in nfsd4_write() -> nfsd_vfs_write(). So knfsd effectively already
does the override you're considering for FreeBSD, just implicitly — write
authorization comes from the open/deleg stateid, the same principle as the
truncate-via-stateid case.
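
To make the principle concrete, here is a toy userspace model, not knfsd
code. Every name in it (model_file, model_stateid, model_open,
model_write_allowed) is made up for illustration, and it only looks at the
owner mode bits. It shows the one point above: the mode is consulted when
the stateid is granted, and a later WRITE is authorized from the access
recorded in the stateid.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical access bits recorded in a stateid (illustration only). */
#define MODEL_ACCESS_READ  0x1u
#define MODEL_ACCESS_WRITE 0x2u

/* Toy file: only the mode matters for this sketch. */
struct model_file {
	mode_t mode;
};

/* Toy stateid: remembers the file and the access granted at open time. */
struct model_stateid {
	struct model_file *file;
	unsigned int granted;
};

/*
 * Grant-time check. The mode is consulted here, once. Only the owner
 * bits are examined, which is a simplification for the sketch.
 */
static bool model_open(struct model_file *f, unsigned int want,
		       struct model_stateid *sid)
{
	if ((want & MODEL_ACCESS_WRITE) && !(f->mode & S_IWUSR))
		return false;
	if ((want & MODEL_ACCESS_READ) && !(f->mode & S_IRUSR))
		return false;
	sid->file = f;
	sid->granted = want;
	return true;
}

/*
 * WRITE-time check. Authorization comes from what the stateid was
 * granted; the current mode of the file is deliberately not re-read.
 */
static bool model_write_allowed(const struct model_stateid *sid)
{
	return (sid->granted & MODEL_ACCESS_WRITE) != 0;
}
```

In this model, a chmod to 0400 after a successful write open leaves
model_write_allowed() true for the existing stateid, while a fresh
model_open() for write fails.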

> Certainly setting mode bits won't be a problem and clearing
> owner mode bits isn't a problem for the FreeBSD server.
>
> Oh, and one more quirky corner..
> If the server provided a non-empty ACE for the permissions
> field for the write delegation, these SETATTR changes either
> require the server to recall the delegation or the client to
> invalidate use of this ACE.
>
> I'd suggest that the client invalidate use of the ACE (if it
> ever uses it?) and leave delegation recall up to the server.

The Linux client never uses it — decode_rw_delegation() parses the
delegation's permissions ACE and discards it (decode_ace() is passed a NULL
sink), and the client always does its own ACCESS. So there's nothing to
invalidate on our side, and agreed: the recall decision belongs to the
server.

...

>> The exception might be if this is an attribute delegation, and the
>> result will be that the cred attached to the delegation will no longer
>> be able to issue a SETATTR to update the atime/mtime on delegation
>> return.
> Lost me. What's an attribute delegation?

I'll let Trond give the canonical definition, but as I understand it: the
*_ATTRS_DELEG delegation variants additionally delegate attribute authority
— notably the timestamps — so the client can hold and modify them and write
them back at DELEGRETURN. That return-time SETATTR is what Trond meant about
the holder's cred needing to retain access.
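
For reference, the mode-bit test from the patch description quoted above can
be sketched roughly as below. This is a userspace illustration under stated
assumptions, not the patch itself: the struct and function names
(model_perm, model_cred, model_holder_may_write, model_setattr_drops_write)
are hypothetical, and as in the description it looks only at the resulting
mode bits and the holder's fsuid/fsgid, with no ACLs, no privilege
override, and no supplementary groups.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical: the file's mode/owner/group as they would be after SETATTR. */
struct model_perm {
	mode_t mode;
	uid_t uid;
	gid_t gid;
};

/* Hypothetical: the delegation's owning credential. */
struct model_cred {
	uid_t fsuid;
	gid_t fsgid;
};

/*
 * POSIX class selection: owner, else group, else other. Exactly one
 * class applies, so an owner without S_IWUSR is denied even if the
 * group or other bits would allow the write.
 */
static bool model_holder_may_write(const struct model_perm *p,
				   const struct model_cred *c)
{
	if (c->fsuid == p->uid)
		return (p->mode & S_IWUSR) != 0;
	if (c->fsgid == p->gid)
		return (p->mode & S_IWGRP) != 0;
	return (p->mode & S_IWOTH) != 0;
}

/*
 * True when the proposed attributes would leave the holder without
 * write access, i.e. when the delegation would be returned first.
 * Approximate by design: see the caveats in the lead-in.
 */
static bool model_setattr_drops_write(const struct model_perm *proposed,
				      const struct model_cred *c)
{
	return !model_holder_may_write(proposed, c);
}
```

The caller would fill model_perm with the current attributes overlaid by
whichever of mode, owner, and group the SETATTR changes, then return the
delegation before the SETATTR only when the function reports true.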

Ben