Re: [PATCH v3 1/7] iversion: update comments with info about atime updates
From: Jeff Layton
Date: Mon Aug 29 2022 - 06:39:13 EST
On Mon, 2022-08-29 at 17:56 +1000, Dave Chinner wrote:
> On Fri, Aug 26, 2022 at 05:46:57PM -0400, Jeff Layton wrote:
> > The i_version field in the kernel has had different semantics over
> > the decades, but we're now proposing to expose it to userland via
> > statx. This means that we need a clear, consistent definition of
> > what it means and when it should change.
> >
> > Update the comments in iversion.h to describe how a conformant
> > i_version implementation is expected to behave. This definition
> > suits the current users of i_version (NFSv4 and IMA), but is
> > loose enough to allow for a wide range of possible implementations.
> >
> > Cc: Colin Walters <walters@xxxxxxxxxx>
> > Cc: NeilBrown <neilb@xxxxxxx>
> > Cc: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx>
> > Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> > Link: https://lore.kernel.org/linux-xfs/166086932784.5425.17134712694961326033@xxxxxxxxxxxxxxxxxxxxx/#t
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> > include/linux/iversion.h | 23 +++++++++++++++++++++--
> > 1 file changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index 3bfebde5a1a6..45e93e1b4edc 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -9,8 +9,19 @@
> > * ---------------------------
> > * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> > * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> > - * appear different to observers if there was a change to the inode's data or
> > - * metadata since it was last queried.
> > + * appear different to observers if there was an explicit change to the inode's
> > + * data or metadata since it was last queried.
> > + *
> > + * An explicit change is one that would ordinarily result in a change to the
> > + * inode status change time (aka ctime). The version must appear to change, even
> > + * if the ctime does not (since the whole point is to avoid missing updates due
> > + * to timestamp granularity). If POSIX mandates that the ctime must change due
> > + * to an operation, then the i_version counter must be incremented as well.
> > + *
> > + * A conformant implementation is allowed to increment the counter in other
> > + * cases, but this is not optimal. NFSv4 and IMA both use this value to determine
> > + * whether caches are up to date. Spurious increments can cause false cache
> > + * invalidations.
>
> "not optimal", but never-the-less allowed - that's "unspecified
> behaviour" if I've ever seen it. How is userspace supposed to
> know/deal with this?
>
> Indeed, this loophole clause doesn't exist in the man pages that
> define what statx.stx_ino_version means. The man pages explicitly
> define that stx_ino_version only ever changes when stx_ctime
> changes.
>
We can fix the manpage to make this clearer.

> IOWs, the behaviour userspace developers are going to expect *does
> not include* stx_ino_version changing more often than ctime is
> changed. Hence a kernel iversion implementation that bumps the
> counter more often than ctime changes *is not conformant with the
> statx version counter specification*. IOWs, we can't export such
> behaviour to userspace *ever* - it is a non-conformant
> implementation.
>
Nonsense. The statx version counter specification is *whatever we decide
to make it*. If we define it to allow for spurious version bumps, then
these implementations would be conformant.

Given that you can't tell what or how much changed in the inode whenever
the value changes, allowing it to be bumped on non-observable changes is
ok and the counter is still useful. When you see it change you need to
go stat/read/getxattr etc, to see what actually happened anyway.

Most applications won't be interested in every possible explicit change
that can happen to an inode. It's likely these applications would check
the parts of the inode they're interested in, and then go back to
waiting for the next bump if the change wasn't significant to them.

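
The consumer pattern described above can be sketched roughly as follows.
This is an illustration only, not code from the patch: struct inode_view,
struct cached_view and revalidate() are hypothetical names, and the fields
stand in for whatever a real caller would fetch via stat/read/getxattr.
The point it tries to show is that a spurious bump costs one extra
comparison pass, not a wrong answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of what a consumer can currently observe. */
struct inode_view {
	uint64_t version;	/* change counter, i_version-like */
	uint64_t size;
	uint64_t mtime_ns;
};

/* What the consumer remembered from its previous look at the inode. */
struct cached_view {
	uint64_t version;
	uint64_t size;
	uint64_t mtime_ns;
};

enum revalidate_result {
	CACHE_VALID,	/* counter unchanged, nothing to do */
	CACHE_SPURIOUS,	/* counter moved, but no field of interest did */
	CACHE_STALE,	/* counter moved and a field of interest changed */
};

/*
 * Compare the remembered counter with the current one. Only when it
 * differs do we look at the fields this consumer cares about, then
 * refresh the cached copy so the next call starts from the new state.
 */
static enum revalidate_result revalidate(struct cached_view *c,
					 const struct inode_view *now)
{
	bool changed;

	assert(c != NULL && now != NULL);

	if (now->version == c->version)
		return CACHE_VALID;

	changed = now->size != c->size || now->mtime_ns != c->mtime_ns;

	c->version = now->version;
	c->size = now->size;
	c->mtime_ns = now->mtime_ns;

	return changed ? CACHE_STALE : CACHE_SPURIOUS;
}
```

A caller would drop its cached data only on CACHE_STALE and simply go
back to waiting on CACHE_SPURIOUS.
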
> Hence I think anything that bumps iversion outside the bounds of the
> statx definition should be declared as such:
>
> "Non-conformant iversion implementations:
> - MUST NOT be exported by statx() to userspace
> - MUST be -tolerated- by kernel internal applications that
> use iversion for their own purposes."
>
I think this is more strict than is needed. An implementation that bumps
this value more often than is necessary is still useful. It's not
_ideal_, but it still meets the needs of NFSv4, IMA and other potential
users of it. After all, this is basically the definition of i_version
today and it's still useful, even if atime update i_version bumps are
currently harmful for performance.
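
To make the two candidate definitions concrete, here is a small sketch
that checks a trace of operations against each. It is purely
illustrative: struct inode_event, conforms_loose() and conforms_strict()
are made-up names, and "ctime_must_change" is an assumption standing in
for "POSIX mandates a ctime update for this operation".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One operation on an inode, reduced to the two facts that matter here. */
struct inode_event {
	bool ctime_must_change;	/* operation requires a ctime update */
	bool version_bumped;	/* implementation incremented the counter */
};

/*
 * Loose definition (the one proposed in the comment update): every
 * ctime-changing operation must bump the counter, extra bumps are
 * tolerated.
 */
static bool conforms_loose(const struct inode_event *ev, size_t n)
{
	assert(ev != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (ev[i].ctime_must_change && !ev[i].version_bumped)
			return false;
	return true;
}

/*
 * Strict definition (the counter changes only when ctime changes):
 * a bump without a ctime change is also a violation.
 */
static bool conforms_strict(const struct inode_event *ev, size_t n)
{
	assert(ev != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (ev[i].ctime_must_change != ev[i].version_bumped)
			return false;
	return true;
}
```

Under these assumptions, a trace containing a write (ctime change, bump)
followed by an atime-only update that also bumps the counter passes
conforms_loose() but fails conforms_strict(), which is the case under
discussion. A missed bump on a ctime-changing operation fails both.
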
--
Jeff Layton <jlayton@xxxxxxxxxx>