Re: [PATCH v3 1/7] iversion: update comments with info about atime updates

From: Trond Myklebust
Date: Tue Aug 30 2022 - 11:43:27 EST


On Tue, 2022-08-30 at 11:17 -0400, J. Bruce Fields wrote:
> On Tue, Aug 30, 2022 at 02:58:27PM +0000, Trond Myklebust wrote:
> > On Tue, 2022-08-30 at 10:44 -0400, J. Bruce Fields wrote:
> > > On Tue, Aug 30, 2022 at 09:50:02AM -0400, Jeff Layton wrote:
> > > > On Tue, 2022-08-30 at 09:24 -0400, J. Bruce Fields wrote:
> > > > > On Tue, Aug 30, 2022 at 07:40:02AM -0400, Jeff Layton wrote:
> > > > > > Yes, saying only that it must be different is intentional.
> > > > > > What we really want is for consumers to treat this as an
> > > > > > opaque value for the most part [1]. Therefore an
> > > > > > implementation based on hashing would conform to the spec,
> > > > > > I'd think, as long as all of the relevant info is part of
> > > > > > the hash.
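
(Purely as an illustration of the "opaque hash of the relevant info"
idea, a user-space sketch could look like the following; the helper
names are made up and this is not code from any tree.)

	#define _GNU_SOURCE
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/stat.h>

	/* FNV-1a style mixing of one 64-bit word into the hash */
	static uint64_t mix64(uint64_t h, uint64_t v)
	{
		h ^= v;
		h *= 0x100000001b3ULL;		/* FNV 64-bit prime */
		return h;
	}

	/*
	 * "Opaque" change attribute: hash together everything that must
	 * make the value differ after a change.  This satisfies "must be
	 * different", but gives the client no ordering information.
	 */
	static uint64_t hashed_change_attr(const struct stat *st)
	{
		uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

		h = mix64(h, (uint64_t)st->st_size);
		h = mix64(h, (uint64_t)st->st_mtim.tv_sec);
		h = mix64(h, (uint64_t)st->st_mtim.tv_nsec);
		h = mix64(h, (uint64_t)st->st_ctim.tv_sec);
		h = mix64(h, (uint64_t)st->st_ctim.tv_nsec);
		return h;
	}

	int main(int argc, char **argv)
	{
		struct stat st;

		if (argc < 2 || stat(argv[1], &st) != 0)
			return 1;
		printf("change attr: %016llx\n",
		       (unsigned long long)hashed_change_attr(&st));
		return 0;
	}

Any change to the hashed fields is very likely to produce a different
value, but comparing two values says nothing about which state is
newer, which is exactly the limitation raised below.
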
> > > > >
> > > > > It'd conform, but it might not be as useful as an increasing
> > > > > value.
> > > > >
> > > > > E.g. a client can use that to work out which of a series of
> > > > > reordered write replies is the most recent, and I seem to
> > > > > recall that can prevent unnecessary invalidations in some
> > > > > cases.
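
(A rough sketch of the optimisation an ordered change attribute enables
on reordered replies; hypothetical client-side logic, not the actual
Linux client code.)

	#include <stdbool.h>
	#include <stdint.h>

	/* Hypothetical per-inode client cache state */
	struct cached_inode {
		uint64_t change_attr;	/* last change attribute seen */
		bool	 valid;		/* cached data believed up to date */
	};

	/*
	 * With a monotonically increasing change attribute, a reply that
	 * carries an older value than the one we already hold can simply
	 * be ignored: it was reordered on the wire, not evidence that the
	 * file changed again.  With an opaque attribute, any unexpected
	 * value forces an invalidation.
	 */
	static void update_from_reply(struct cached_inode *ci,
				      uint64_t reply_change,
				      bool change_is_monotonic)
	{
		if (reply_change == ci->change_attr)
			return;		/* nothing new */

		if (change_is_monotonic && reply_change < ci->change_attr)
			return;		/* stale reordered reply: keep cache */

		ci->change_attr = reply_change;
		ci->valid = false;	/* file changed: revalidate caches */
	}
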
> > > > >
> > > >
> > > > That's a good point; the Linux client does this. That said,
> > > > NFSv4 has a way for the server to advertise its change
> > > > attribute behavior [1] (though nfsd hasn't implemented this
> > > > yet).
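
(For reference, the advertisement in question is the change_attr_type
attribute from NFSv4.2. From memory of RFC 7862 and
include/linux/nfs4.h, the possible values are roughly the following;
the RFC is authoritative.)

	/*
	 * change_attr_type, as defined by NFSv4.2 (RFC 7862), lets the
	 * server tell the client how its change attribute behaves.
	 */
	enum nfs4_change_attr_type {
		NFS4_CHANGE_TYPE_IS_MONOTONIC_INCR		= 0,	/* strictly increasing */
		NFS4_CHANGE_TYPE_IS_VERSION_COUNTER		= 1,	/* +1 per change, incl. pNFS writes */
		NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS	= 2,	/* +1 per change, except pNFS */
		NFS4_CHANGE_TYPE_IS_TIME_METADATA		= 3,	/* ctime used as the attribute */
		NFS4_CHANGE_TYPE_IS_UNDEFINED			= 4,	/* no guarantees: treat as opaque */
	};
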
> > >
> > > It was implemented and reverted.  The issue was that I thought
> > > nfsd should mix in the ctime to prevent the change attribute
> > > going backwards on reboot (see
> > > fs/nfsd/nfsfh.h:nfsd4_change_attribute()), but Trond was
> > > concerned about the possibility of time going backwards.  See
> > > 1631087ba872 "Revert "nfsd4: support change_attr_type
> > > attribute"".  There's some mailing list discussion about that
> > > which I'm not turning up right now.
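
(From memory, the reverted scheme mixed ctime with the i_version
counter roughly along the lines of the simplified sketch below, so that
the value keeps growing across a reboot that resets the counter; this
is not the exact fs/nfsd/nfsfh.h code.)

	#include <stdint.h>
	#include <time.h>

	/*
	 * Simplified sketch of a "ctime + version counter" change
	 * attribute: the ctime seconds form the high bits, the low 30
	 * bits carry the nanoseconds, and the in-memory version counter
	 * is added on top.  As long as the clock never goes backwards,
	 * the result keeps increasing across a reboot even though the
	 * counter itself restarts.  If the clock does go backwards, so
	 * can the change attribute, which was the objection.
	 */
	static uint64_t ctime_based_change_attr(const struct timespec *ts,
						uint64_t version_counter)
	{
		uint64_t chattr;

		chattr = (uint64_t)ts->tv_sec;
		chattr <<= 30;			/* room for ~1e9 nanoseconds */
		chattr += (uint64_t)ts->tv_nsec;
		chattr += version_counter;
		return chattr;
	}
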
>
> https://lore.kernel.org/linux-nfs/a6294c25cb5eb98193f609a52aa8f4b5d4e81279.camel@xxxxxxxxxxxxxxx/
> is what I was thinking of but it isn't actually that interesting.
>
> > My main concern was that some filesystems (e.g. ext3) were failing
> > to provide sufficient timestamp resolution to actually label the
> > resulting 'change attribute' as being updated monotonically. If the
> > time stamp doesn't change when the file data or metadata are
> > changed, then the client has to perform extra checks to try to
> > figure out whether or not its caches are up to date.
>
> That's a different issue from the one you were raising in that
> discussion.
>
> > > Did NFSv4 add change_attr_type because some implementations
> > > needed the unordered case, or because they realized ordering was
> > > useful but wanted to keep backwards compatibility?  I don't know
> > > which it was.
> >
> > We implemented it because, as implied above, knowledge of whether
> > or not the change attribute behaves monotonically, or strictly
> > monotonically, enables a number of optimisations.
>
> Of course, but my question was about the value of the old behavior,
> not about the value of the monotonic behavior.
>
> Put differently, if we could redesign the protocol from scratch,
> would we actually have included the option of non-monotonic behavior?
>

If we could design the filesystems from scratch, we probably would not.
The protocol ended up the way it is because people were trying to make
it as easy as possible to implement.

So if we could design the filesystem from scratch, we would probably
have designed it along the lines of what AFS does: each explicit change
is accompanied by a single bump of the change attribute, so that
clients can not only decide the order of the resulting changes, but
also whether they have missed a change (one that might have been made
by a different client).
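
(Roughly what the strict one-bump-per-change behaviour buys a client
beyond ordering, as a hypothetical sketch; this is not AFS or Linux
client code.)

	#include <stdbool.h>
	#include <stdint.h>

	/* What a client can conclude when every change bumps the counter
	 * by exactly one. */
	enum change_verdict {
		CHANGE_NONE,		/* nothing happened */
		CHANGE_OURS_ONLY,	/* only the change we just made */
		CHANGE_MISSED,		/* someone else changed the file too */
	};

	static enum change_verdict classify_change(uint64_t cached,
						   uint64_t server,
						   bool we_just_wrote)
	{
		if (server == cached)
			return CHANGE_NONE;
		if (we_just_wrote && server == cached + 1)
			return CHANGE_OURS_ONLY;	/* cache stays valid */
		return CHANGE_MISSED;			/* must revalidate */
	}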

However, that would be a requirement that is likely to be very specific
to distributed caches (and hence distributed filesystems). I doubt
there are many user space applications that would need that level of
precision. Maybe MPI, but that's the only candidate I can think of for
now.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx