Re: [PATCH v8 0/5] fs: multigrain timestamps for XFS's change_cookie

From: Jeff Layton
Date: Wed Sep 27 2023 - 06:26:41 EST


On Wed, 2023-09-27 at 09:33 +1000, Dave Chinner wrote:
> On Tue, Sep 26, 2023 at 07:31:55AM -0400, Jeff Layton wrote:
> > On Tue, 2023-09-26 at 08:32 +1000, Dave Chinner wrote:
> > > We also must not lose sight of the fact that the lazytime mount
> > > option makes atime updates on XFS behave exactly as the nfsd/NFS
> > > client application wants. That is, XFS will do in-memory atime
> > > updates unless the atime update also sets S_VERSION to explicitly
> > > bump the i_version counter if required. That leads to another
> > > potential nfsd specific solution without requiring filesystems to
> > > change on disk formats: the nfsd explicitly asks operations for lazy
> > > atime updates...
> > >
> >
> > Not exactly. The problem with XFS's i_version is that it also gets
> > bumped on atime updates. lazytime reduces the number of atime updates
> > to ~1/day. To be exactly what nfsd wants, you'd need to make that 0.
>
> As long as there are future modifications going to those files,
> lazytime completely elides the visibility of atime updates as they
> get silently aggregated into future modifications and so there are
> 0 i_version changes as a result of pure atime updates in those cases.
>
> If there are no future modifications, then just like relatime, there
> is a timestamp update every 24hrs. That's no big deal, nobody is
> complaining about this being a problem.
>

Right. The main issue here is that (with relatime) we'll still end up
with a cache invalidation once every 24 hours for any r/o files that
have been accessed. It's not a _huge_ problem on most workloads; it's
just not ideal.
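
For illustration, here is a minimal userspace sketch of the relatime decision, loosely modeled on the checks in the kernel's relatime_need_update(). It shows why a file that is only ever read still gets one persistent atime update per 24 hours (and, on XFS, one i_version bump with it). The names struct model_inode, relatime_needs_update() and RELATIME_WINDOW are hypothetical; the real code works on struct timespec64 values and also checks the mount flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified inode: timestamps are whole seconds. */
struct model_inode {
	time_t atime;
	time_t mtime;
	time_t ctime;
};

#define RELATIME_WINDOW (24 * 60 * 60) /* one day, in seconds */

/*
 * Returns true if a read at time 'now' should persist a new atime
 * under relatime-style rules. Simplified model, not kernel code.
 */
bool relatime_needs_update(const struct model_inode *inode, time_t now)
{
	assert(inode != NULL);

	/* Data was modified at or after the last recorded access. */
	if (inode->mtime >= inode->atime)
		return true;
	/* Metadata changed at or after the last recorded access. */
	if (inode->ctime >= inode->atime)
		return true;
	/* The last recorded access is at least a day old. */
	if (now - inode->atime >= RELATIME_WINDOW)
		return true;
	return false;
}
```

The first two rules are the "persistent atime update after modification" heuristic; the third is what produces the once-a-day update, and hence the once-a-day client cache invalidation, for files that are only read.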

> It's the "persistent atime update after modification" heuristic
> implemented by relatime that is causing all the problems here. If
> that behaviour is elided on the server side, then most of the client
> side invalidation problems with these workloads go away.
>
> IOWs, nfsd needs direct control over how atime updates should be
> treated by the VFS/filesystem (i.e. as pure in-memory updates)
> rather than leaving it to some heuristic that may do the exact
> opposite of what the nfsd application needs.
>
> That's the point I was making: we have emerging requirements for
> per-operation timestamp update behaviour control with io_uring and
> other non-blocking applications. The nfsd application also has
> specific semantics it wants the VFS/filesystem to implement
> (non-persistent atime unless something else changes)....
>
> My point is that we've now failed a couple of times to implement
> what NFSD requires by trying to change VFS and/or filesystem
> infrastructure to provide i_version or ctime semantics the nfsd
> requires. That's a fairly good sign that we might not be approaching
> this problem from the right direction, and so doubling down and
> considering changing the timestamp infrastructure from the ground up
> just to solve a relatively niche, filesystem-specific issue doesn't
> seem like the best approach.
>
> OTOH, having the application actually tell the timestamp updates
> exactly what semantics it needs (non-blocking, persistent vs in
> memory, etc) will allow the VFS and filesystems to do the right
> thing for the application without having to worry about general
> heuristics that sometimes do exactly the wrong thing....
>

I'm a little unclear on exactly what you're proposing here, but I think
that's overstating what's needed. nfsd's needs are pretty simple: it
wants a change attribute that changes any time the ctime would change.

btrfs, ext4 and tmpfs have this. xfs does not because its change
attribute changes when the atime changes as well. With the right mount
options (e.g. lazytime), that problem can be mitigated to some degree,
but it's still not ideal.

We have a couple of options: try to make the ctime behave the way we
need, or just implement a proper change attribute in xfs (which involves
revving the on-disk format).
--
Jeff Layton <jlayton@xxxxxxxxxx>