Re: [PATCH v8 RESEND 2/8] fs: clarify when the i_version counter must be updated
From: Jan Kara
Date: Wed Jan 25 2023 - 11:06:48 EST
On Tue 24-01-23 14:30:19, Jeff Layton wrote:
> The i_version field in the kernel has had different semantics over
> the decades, but NFSv4 has certain expectations. Update the comments
> in iversion.h to describe when the i_version must change.
>
> Cc: Colin Walters <walters@xxxxxxxxxx>
> Cc: NeilBrown <neilb@xxxxxxx>
> Cc: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx>
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
Looks good to me. But one note below:
> diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> index 6755d8b4f20b..fced8115a5f4 100644
> --- a/include/linux/iversion.h
> +++ b/include/linux/iversion.h
> @@ -9,8 +9,25 @@
> * ---------------------------
> * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> - * appear different to observers if there was a change to the inode's data or
> - * metadata since it was last queried.
> + * appear larger to observers if there was an explicit change to the inode's
> + * data or metadata since it was last queried.
> + *
> + * An explicit change is one that would ordinarily result in a change to the
> + * inode status change time (aka ctime). i_version must appear to change, even
> + * if the ctime does not (since the whole point is to avoid missing updates due
> + * to timestamp granularity). If POSIX or other relevant spec mandates that the
> + * ctime must change due to an operation, then the i_version counter must be
> + * incremented as well.
> + *
> + * Making the i_version update completely atomic with the operation itself would
> + * be prohibitively expensive. Traditionally the kernel has updated the times on
> + * directories after an operation that changes its contents. For regular files,
> + * the ctime is usually updated before the data is copied into the cache for a
> + * write. This means that there is a window of time when an observer can
> + * associate a new timestamp with old file contents. Since the purpose of the
> + * i_version is to allow for better cache coherency, the i_version must always
> + * be updated after the results of the operation are visible. Updating it before
> + * and after a change is also permitted.
This sounds good but it is not the case for any of the current filesystems, is
it? Perhaps the documentation should mention this so that people are not
confused?
Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR