Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field

From: Trond Myklebust
Date: Thu Sep 08 2022 - 08:41:00 EST


On Thu, 2022-09-08 at 07:37 -0400, Jeff Layton wrote:
> On Thu, 2022-09-08 at 00:41 +0000, Trond Myklebust wrote:
> > On Thu, 2022-09-08 at 10:31 +1000, NeilBrown wrote:
> > > On Wed, 07 Sep 2022, Trond Myklebust wrote:
> > > > On Wed, 2022-09-07 at 09:12 -0400, Jeff Layton wrote:
> > > > > On Wed, 2022-09-07 at 08:52 -0400, J. Bruce Fields wrote:
> > > > > > On Wed, Sep 07, 2022 at 08:47:20AM -0400, Jeff Layton wrote:
> > > > > > > On Wed, 2022-09-07 at 21:37 +1000, NeilBrown wrote:
> > > > > > > > On Wed, 07 Sep 2022, Jeff Layton wrote:
> > > > > > > > > +The change to \fIstatx.stx_ino_version\fP is not atomic with respect to the
> > > > > > > > > +other changes in the inode. On a write, for instance, the i_version is usually
> > > > > > > > > +incremented before the data is copied into the pagecache. Therefore it is
> > > > > > > > > +possible to see a new i_version value while a read still shows the old data.
> > > > > > > >
> > > > > > > > Doesn't that make the value useless?
> > > > > > > >
> > > > > > >
> > > > > > > No, I don't think so. It's only really useful for comparing to an older
> > > > > > > sample anyway. If you do "statx; read; statx" and the value hasn't
> > > > > > > changed, then you know that things are stable.
> > > > > >
> > > > > > I don't see how that helps.  It's still possible to get:
> > > > > >
> > > > > >                 reader          writer
> > > > > >                 ------          ------
> > > > > >                                 i_version++
> > > > > >                 statx
> > > > > >                 read
> > > > > >                 statx
> > > > > >                                 update page cache
> > > > > >
> > > > > > right?
> > > > > >
> > > > >
> > > > > Yeah, I suppose so -- the statx wouldn't necessitate any locking. In
> > > > > that case, maybe this is useless then other than for testing purposes
> > > > > and userland NFS servers.
> > > > >
> > > > > Would it be better to not consume a statx field with this if so? What
> > > > > could we use as an alternate interface? ioctl? Some sort of global
> > > > > virtual xattr? It does need to be something per-inode.
> > > >
> > > > I don't see how a non-atomic change attribute is remotely useful even
> > > > for NFS.
> > > >
> > > > The main problem is not so much the above (although NFS clients are
> > > > vulnerable to that too) but the behaviour w.r.t. directory changes.
> > > >
> > > > If the server can't guarantee that file/directory/... creation and
> > > > unlink are atomically recorded with change attribute updates, then the
> > > > client has to always assume that the server is lying, and that it has
> > > > to revalidate all its caches anyway. Cue endless readdir/lookup/getattr
> > > > requests after each and every directory modification in order to check
> > > > that some other client didn't also sneak in a change of their own.
> > >
> > > NFS re-export doesn't support atomic change attributes on directories.
> > > Do we see the endless revalidate requests after directory modification
> > > in that situation?  Just curious.
> >
> > Why wouldn't NFS re-export be capable of supporting atomic change
> > attributes in those cases, provided that the server does? It seems to
> > me that is just a question of providing the correct information w.r.t.
> > atomicity to knfsd.
> >
> > ...but yes, a quick glance at nfs4_update_changeattr_locked(), and what
> > happens when !cinfo->atomic should tell you all you need to know.
>
> The main reason we disabled atomic change attribute updates was that
> getattr calls on NFS can be pretty expensive. By setting the NOWCC flag,
> we can avoid those for WCC info, but at the expense of the client having
> to do more revalidation on its own.

While providing WCC attributes on regular files is typically expensive,
since it may involve needing to flush out I/O, doing so for directories
tends to be a lot less so. The main reason is that all directory
operations are synchronous in NFS, and typically do return at least the
change attribute when they are modifying the directory contents.

So yes, when we re-export NFS as NFSv3, we do want to skip returning
WCC attributes for the file. However, we usually do our best to return
post-op attributes for the directory.
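
As a rough illustration of what a client can do with the change attribute
that comes back from a directory operation, here is a minimal sketch. It is
loosely modelled on the NFSv4 change_info4 triple (atomic, before, after)
and is not the kernel's nfs4_update_changeattr_locked(); DirCache and
apply_change_info() are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ChangeInfo:
    """Loosely modelled on NFSv4 change_info4 (assumption: simplified)."""
    atomic: bool   # server claims before/after bracket exactly our operation
    before: int    # directory change attribute before the operation
    after: int     # directory change attribute after the operation


@dataclass
class DirCache:
    """Hypothetical client-side cache state for one directory."""
    change_attr: int
    valid: bool = True


def apply_change_info(cache: DirCache, cinfo: ChangeInfo) -> bool:
    """Update the cache after one of our own directory operations.

    Returns True if the cached directory contents can be kept, and
    False if the client has to revalidate them.
    """
    # The cache survives only if the server vouches for atomicity and the
    # "before" value matches what we had cached, i.e. nobody else changed
    # the directory between our last look and this operation.
    keep = cache.valid and cinfo.atomic and cinfo.before == cache.change_attr
    cache.change_attr = cinfo.after
    cache.valid = keep
    return keep


if __name__ == "__main__":
    cache = DirCache(change_attr=10)
    print(apply_change_info(cache, ChangeInfo(True, 10, 11)))   # cache kept
    print(apply_change_info(cache, ChangeInfo(False, 11, 12)))  # must revalidate
```

In this model a non-atomic reply always forces revalidation, even when the
"before" value matches, which is the extra client-side work mentioned above.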

Atomicity is a different matter though. Right now the NFS client does
set EXPORT_OP_NOATOMIC_ATTR, but we could find ways to work around that
for the NFSv4 change attribute at least, if we wanted to.
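
To make the non-atomic window from earlier in the thread concrete, here is
a toy model of the "statx; read; statx" interleaving. It assumes a write
that bumps i_version first and updates the data in a separate step. Nothing
here calls the real statx(2); ToyInode, write_steps() and sample() are
hypothetical names used only for illustration.

```python
class ToyInode:
    """In-memory stand-in for an inode: a version counter plus file data."""

    def __init__(self, data: bytes):
        self.i_version = 1
        self.data = data


def write_steps(inode: ToyInode, new_data: bytes):
    """A non-atomic write, split into the two steps from the diagram."""
    inode.i_version += 1
    yield "i_version++"
    inode.data = new_data
    yield "update page cache"


def sample(inode: ToyInode):
    """Reader side: version, data, version again (statx; read; statx)."""
    v1 = inode.i_version
    data = inode.data
    v2 = inode.i_version
    return v1, data, v2


def run_interleaving():
    """Replay the reader/writer schedule from the quoted diagram."""
    inode = ToyInode(b"old")
    writer = write_steps(inode, b"new")
    next(writer)                   # writer: i_version++
    v1, data, v2 = sample(inode)   # reader: statx; read; statx
    next(writer)                   # writer: update page cache
    return v1, data, v2, inode.data


if __name__ == "__main__":
    v1, data, v2, final = run_interleaving()
    print("versions equal:", v1 == v2)
    print("data read:", data, "data now:", final)
```

The reader sees the same version on both samples and still ends up holding
data that no longer matches the file, while a later statx would report no
change at all.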

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx