Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field

From: Jeff Layton
Date: Wed Sep 07 2022 - 10:44:16 EST


On Wed, 2022-09-07 at 15:51 +0200, Jan Kara wrote:
> On Wed 07-09-22 09:12:34, Jeff Layton wrote:
> > On Wed, 2022-09-07 at 08:52 -0400, J. Bruce Fields wrote:
> > > On Wed, Sep 07, 2022 at 08:47:20AM -0400, Jeff Layton wrote:
> > > > On Wed, 2022-09-07 at 21:37 +1000, NeilBrown wrote:
> > > > > On Wed, 07 Sep 2022, Jeff Layton wrote:
> > > > > > +The change to \fIstatx.stx_ino_version\fP is not atomic with respect to the
> > > > > > +other changes in the inode. On a write, for instance, the i_version it usually
> > > > > > +incremented before the data is copied into the pagecache. Therefore it is
> > > > > > +possible to see a new i_version value while a read still shows the old data.
> > > > >
> > > > > Doesn't that make the value useless?
> > > > >
> > > >
> > > > No, I don't think so. It's only really useful for comparing to an older
> > > > sample anyway. If you do "statx; read; statx" and the value hasn't
> > > > changed, then you know that things are stable.
> > >
> > > I don't see how that helps. It's still possible to get:
> > >
> > > reader writer
> > > ------ ------
> > > i_version++
> > > statx
> > > read
> > > statx
> > > update page cache
> > >
> > > right?
> > >
> >
> > Yeah, I suppose so -- the statx wouldn't necessitate any locking. In
> > that case, maybe this is useless then other than for testing purposes
> > and userland NFS servers.
> >
> > Would it be better to not consume a statx field with this if so? What
> > could we use as an alternate interface? ioctl? Some sort of global
> > virtual xattr? It does need to be something per-inode.
>
> I was thinking how hard would it be to increment i_version after updating
> data but it will be rather hairy. In particular because of stuff like
> IOCB_NOWAIT support which needs to bail if i_version update is needed. So
> yeah, I don't think there's an easy way how to provide useful i_version for
> general purpose use.
>

Yeah, it does look ugly.

Another idea might be to just take the i_rwsem for read in the statx
codepath when STATX_INO_VERSION has been requested. xfs, ext4 and btrfs
hold the i_rwsem exclusively over their buffered write ops. Doing that
should be enough to prevent the race above, I think. The ext4 DAX path
also looks ok there.

The ext4 DIO write implementation seems to take the i_rwsem for read
though unless the size is changing or the write is unaligned. So a
i_rwsem readlock would probably not be enough to guard against changes
there. Maybe we can just say if you're doing DIO, then don't expect real
atomicity wrt i_version?

knfsd seems to already hold i_rwsem when doing directory morphing
operations (where it fetches the pre and post attrs), but it doesn't
take it when calling nfsd4_encode_fattr (which is used to fill out
GETATTR and READDIR replies, etc.). We'd probably have to start taking
it in those codepaths too.

We should also bear in mind that from userland, doing a read of a normal
file and fetching the i_version takes two different syscalls. I'm not
sure we need things to be truly "atomic", per-se. Whether and how we can
exploit that fact, I'm not sure.
--
Jeff Layton <jlayton@xxxxxxxxxx>