Re: [PATCH v2 2/2] nfs: update inode ctime after removexattr operation

From: Thomas Haynes

Date: Tue Mar 31 2026 - 14:47:29 EST

On Tue, Mar 31, 2026 at 07:42:56AM -0800, Jeff Layton wrote:
> On Fri, 2026-03-27 at 11:22 -0700, Thomas Haynes wrote:
> > On Fri, Mar 27, 2026 at 12:59:54PM -0800, Jeff Layton wrote:
> > > On Fri, 2026-03-27 at 12:20 -0400, Olga Kornievskaia wrote:
> > > > On Fri, Mar 27, 2026 at 11:50 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, 2026-03-27 at 11:11 -0400, Olga Kornievskaia wrote:
> > > > > > On Tue, Mar 24, 2026 at 1:32 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > xfstest generic/728 fails with delegated timestamps. The client does a
> > > > > > > removexattr and then a stat to test the ctime, which doesn't change. The
> > > > > > > stat() doesn't trigger a GETATTR because of the delegated timestamps, so
> > > > > > > it relies on the cached ctime, which is wrong.
> > > > > > >
> > > > > > > The setxattr compound has a trailing GETATTR, which ensures that its
> > > > > > > ctime gets updated. Follow the same strategy with removexattr.
> > > > > >
> > > > > > This approach relies on the fact that the server the serves delegated
> > > > > > attributes would update change_attr on operations which might now
> > > > > > necessarily happen (ie, linux server does not update change_attribute
> > > > > > on writes or clone). I propose an alternative fix for the failing
> > > > > > generic/728.
> > > > > >
> > > > > > diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> > > > > > index 7b3ca68fb4bb..ede1835a45b3 100644
> > > > > > --- a/fs/nfs/nfs42proc.c
> > > > > > +++ b/fs/nfs/nfs42proc.c
> > > > > > @@ -1389,7 +1389,13 @@ static int _nfs42_proc_removexattr(struct inode
> > > > > > *inode, const char *name)
> > > > > > &res.seq_res, 1);
> > > > > > trace_nfs4_removexattr(inode, name, ret);
> > > > > > if (!ret)
> > > > > > - nfs4_update_changeattr(inode, &res.cinfo, timestamp, 0);
> > > > > > + if (nfs_have_delegated_attributes(inode)) {
> > > > > > + nfs_update_delegated_mtime(inode);
> > > > > > + spin_lock(&inode->i_lock);
> > > > > > + nfs_set_cache_invalid(inode, NFS_INO_INVALID_BLOCKS);
> > > > > > + spin_unlock(&inode->i_lock);
> > > > > > + } else
> > > > > > + nfs4_update_changeattr(inode, &res.cinfo, timestamp, 0);
> > > > > >
> > > > > > return ret;
> > > > > > }
> > > > > >
> > > > >
> > > > > What's the advantage of doing it this way?
> > > > >
> > > > > You just sent a REMOVEXATTR operation to the server that will change
> > > > > the mtime there. The server has the most up-to-date version of the
> > > > > mtime and ctime at that point.
> > > >
> > > > In presence of delegated attributes, Is the server required to update
> > > > its mtime/ctime on an operation? As I mentioned, the linux server does
> > > > not update its ctime/mtime for WRITE, CLONE, COPY.
> > > >
> > > > Is possible that
> > > > some implementations might be different and also do not update the
> > > > ctime/mtime on REMOVEXATTR?
> > > >
> > > > Therefore I was suggesting that the patch
> > > > relies on the fact that it would receive an updated value. Of course
> > > > perhaps all implementations are done the same as the linux server and
> > > > my point is moot. I didn't see anything in the spec that clarifies
> > > > what the server supposed to do (and client rely on).
> > > >
> > >
> > > (cc'ing Tom)
> > >
> > > That is a very good point.
> > >
> > > My interpretation was that delegated timestamps generally covered
> > > writes, but SETATTR style operations that do anything beyond only
> > > changing the mtime can't be cached.
> > >
> > > We probably need some delstid spec clarification: for what operations
> > > is the server required to disable timestamp updates when a write
> > > delegation is outstanding?
> > >
> > > In the case of nfsd, we disable timestamp updates for WRITE/COPY/CLONE
> > > but not SETATTR/SETXATTR/REMOVEXATTR.
> > >
> > > How does the Hammerspace anvil behave? Does it disable c/mtime updates
> > > for writes when there is an outstanding timestamp delegation like we're
> > > doing in nfsd? If so, does it do the same for
> > > SETATTR/SETXATTR/REMOVEXATTR operations as well?
> >
> > Jeff,
> >
> > I think the right way to look at this is closer to how size is
> > handled under delegation in RFC8881, rather than as a per-op rule.
> >
> > In our implementation, because we are acting as an MDS and data I/O
> > goes to DSes, we already treat size as effectively delegated when
> > a write layout is outstanding. The MDS does not maintain authoritative
> > size locally in that case. We may refresh size/timestamps internally
> > (e.g., on GETATTR by querying DSes), but we don’t treat that as
> > overriding the delegated authority.
> >
> > For timestamps, our behavior is effectively the same model. When
> > the client holds the relevant delegation, the server does not
> > consider itself authoritative for ctime/mtime. If current values
> > are needed, we can obtain them from the client (e.g., via CB_GETATTR),
> > and the client must present the delegation stateid to demonstrate
> > that authority. So the authority follows the delegation, not the
> > specific operation.
> >
> > That said, I don’t think we’ve fully resolved the semantics for all
> > metadata-style ops either. WRITE and SETATTR are clear in our model,
> > but for things like CLONE/COPY/SETXATTR/REMOVEXATTR, we’ve likely
> > been relying on assumptions rather than a fully consistent rule.
> > I.e., CLONE and COPY we just pass through to the DS and we don't
> > implement SETXATTR/REMOVEXATTR.
> >
> > So the spec question, as I see it, is not whether REMOVEXATTR (or
> > any particular op) should update ctime/mtime, but whether delegated
> > timestamps are meant to follow the same attribute-authority model
> > as delegated size in RFC8881. If so, then we expect that the server
> > should query the client via CB_GETATTR to return updated ctime/mtime
> > after such operations while the delegation is outstanding.
> >
>
> The dilemma we have is: because we _do_ allow local processes to stat()
> files that have an outstanding write delegation, we can never allow the
> ctime in particular to roll backward (modulo clock jumps).

I agree we do not want visible ctime rollback, but I do not see how
that can be guaranteed from delegated timestamps alone when the
authoritative timestamp may be generated on a different node with
a different clock and the object may change during the CB_GETATTR
window. That seems to require either monotonic clamping of exposed
ctime, or treating change_attr rather than ctime as the real
serialization signal.

>
> If we're dealing with changes that have been cached in the client and
> are being lazily flushed out, then we can't update the timestamp when
> that operation occurs. The time of the RPC to flush the changes will
> almost certainly be later than the cached timestamps on the client that
> will eventually be set, so when the client comes back we'd end up
> violating the rollback rule.
>
> Our only option is to freeze timestamp updates on anything that might
> represent such an operation. So far, we only do that on WRITE and COPY
> operations -- in general, operations that require an open file, since
> FMODE_NOCMTIME is attached to the file.
>
> Some SETATTRs that only update the mtime and atime can be cached on the
> client by virtue of the fact that it's authoritative for timestamps.
> There are some exceptions though:
>
> - atime-only updates can't be cached since the ctime won't change with
> a timestamp update if the mtime didn't change
>
> - if you set the mtime to a time that is later than the time you got
> the delegation from the server, but earlier than the current time, you
> can't cache that. The ctime would be later than the mtime in that case,
> and we don't have a mechanism to handle that in a delegated timestamp
> SETATTR.
>
> I don't see how you could reasonably buffer a SETXATTR or REMOVEXATTR
> operation to be sent later. These need to be done synchronously since
> they could always fail for some reason and we don't have a mechanism at
> the syscall layer to handle a deferred error. They also only update the
> ctime and not the mtime, and we have no mechanism to do that with
> delegated timestamps.
>
> Based on that, I think the client and server both need to ignore the
> timestamp delegation on a SETXATTR or REMOVEXATTR. The server should
> update the ctime and the client needs to send a trailing GETATTR on the
> REMOVEXATTR compound in order to get it and the change attr.

One concern I have with a per-op formulation is extensibility. If
delegated timestamp behavior is defined by enumerating specific
operations, then every new operation added to the protocol creates
a fresh ambiguity until the spec is updated again. It seems better
to define the behavior in terms of operation properties - e.g., whether
the operation is synchronously visible, can be deferred/cached at
the client, and whether it affects only ctime versus mtime/atime -
so future operations can be classified without reopening the base
rule.

I.e., I can't tell if you want me to update the spec with
guidance per-op or you are just documenting what you did.

>
> Exactly what this patch does, fwiw...
> --
> Jeff Layton <jlayton@xxxxxxxxxx>