Re: [PATCH v5 6/9] xfs: switch to multigrain timestamps

From: Jeff Layton
Date: Thu Jul 11 2024 - 15:41:15 EST

Next message: Eric Chan: "Re: [PATCH 2/3] kunit: Rename KUNIT_ASSERT_FAILURE to KUNIT_ASSERT for readability"
Previous message: Eric Chan: "[PATCH v2 3/3] kunit: Introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros"
In reply to: Darrick J. Wong: "Re: [PATCH v5 6/9] xfs: switch to multigrain timestamps"
Next in thread: Jeff Layton: "[PATCH v5 7/9] ext4: switch to multigrain timestamps"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, 2024-07-11 at 12:14 -0700, Darrick J. Wong wrote:
> On Thu, Jul 11, 2024 at 11:58:59AM -0400, Jeff Layton wrote:
> > On Thu, 2024-07-11 at 08:09 -0700, Darrick J. Wong wrote:
> > > On Thu, Jul 11, 2024 at 07:08:10AM -0400, Jeff Layton wrote:
> > > > Enable multigrain timestamps, which should ensure that there is an
> > > > apparent change to the timestamp whenever it has been written after
> > > > being actively observed via getattr.
> > > >
> > > > Also, anytime the mtime changes, the ctime must also change, and those
> > > > are now the only two options for xfs_trans_ichgtime. Have that function
> > > > unconditionally bump the ctime, and ASSERT that XFS_ICHGTIME_CHG is
> > > > always set.
> > > >
> > > > Finally, stop setting STATX_CHANGE_COOKIE in getattr, since the ctime
> > > > should give us better semantics now.
> > >
> > > Following up on "As long as the fs isn't touching i_ctime_nsec directly,
> > > you shouldn't need to worry about this" from:
> > > https://lore.kernel.org/linux-xfs/cae5c28f172ac57b7eaaa98a00b23f342f01ba64.camel@xxxxxxxxxx/
> > >
> > > xfs /does/ touch i_ctime_nsec directly when it's writing inodes to disk.
> > > From xfs_inode_to_disk, see:
> > >
> > > to->di_ctime = xfs_inode_to_disk_ts(ip, inode_get_ctime(inode));
> > >
> > > AFAICT, inode_get_ctime itself remains unchanged, and still returns
> > > inode->__i_ctime, right? In which case it's returning a raw timespec64,
> > > which can include the QUERIED flag in tv_nsec, right?
> > >
> >
> > No, in the first patch in the series, inode_get_ctime becomes this:
> >
> > #define I_CTIME_QUERIED ((u32)BIT(31))
> >
> > static inline time64_t inode_get_ctime_sec(const struct inode *inode)
> > {
> > return inode->i_ctime_sec;
> > }
> >
> > static inline long inode_get_ctime_nsec(const struct inode *inode)
> > {
> > return inode->i_ctime_nsec & ~I_CTIME_QUERIED;
> > }
> >
> > static inline struct timespec64 inode_get_ctime(const struct inode *inode)
> > {
> > struct timespec64 ts = { .tv_sec = inode_get_ctime_sec(inode),
> > .tv_nsec = inode_get_ctime_nsec(inode) };
> >
> > return ts;
> > }
>
> Doh! I forgot that this has already been soaking in the vfs tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/include/linux/fs.h?h=next-20240711&id=3aa63a569c64e708df547a8913c84e64a06e7853
>
> > ...which should ensure that you never store the QUERIED bit.
>
> So yep, we're fine here. Sorry about the noise; this was the very
> subtle clue in the diff that the change had already been applied:
>
> static inline struct timespec64 inode_get_ctime(const struct inode *inode)
> @@ -1626,13 +1637,7 @@ static inline struct timespec64 inode_get_ctime(const struct inode *inode)
> return ts;
> }
>
> (Doh doh doh doh doh...)
>
> > > Now let's look at the consumer:
> > >
> > > static inline xfs_timestamp_t
> > > xfs_inode_to_disk_ts(
> > > struct xfs_inode *ip,
> > > const struct timespec64 tv)
> > > {
> > > struct xfs_legacy_timestamp *lts;
> > > xfs_timestamp_t ts;
> > >
> > > if (xfs_inode_has_bigtime(ip))
> > > return cpu_to_be64(xfs_inode_encode_bigtime(tv));
> > >
> > > lts = (struct xfs_legacy_timestamp *)&ts;
> > > lts->t_sec = cpu_to_be32(tv.tv_sec);
> > > lts->t_nsec = cpu_to_be32(tv.tv_nsec);
> > >
> > > return ts;
> > > }
> > >
> > > For the !bigtime case (aka before we added y2038 support) the queried
> > > flag gets encoded into the tv_nsec field since xfs doesn't filter the
> > > queried flag.
> > >
> > > For the bigtime case, the timespec is turned into an absolute nsec count
> > > since the xfs epoch (which is the minimum timestamp possible under the
> > > old encoding scheme):
> > >
> > > static inline uint64_t xfs_inode_encode_bigtime(struct timespec64 tv)
> > > {
> > > return xfs_unix_to_bigtime(tv.tv_sec) * NSEC_PER_SEC + tv.tv_nsec;
> > > }
> > >
> > > Here we'd also be mixing in the QUERIED flag, only now we've encoded a
> > > time that's a second in the future. I think the solution is to add a:
> > >
> > > static inline struct timespec64
> > > inode_peek_ctime(const struct inode *inode)
> > > {
> > > return (struct timespec64){
> > > .tv_sec = inode->__i_ctime.tv_sec,
> > > .tv_nsec = inode->__i_ctime.tv_nsec & ~I_CTIME_QUERIED,
> > > };
> > > }
> > >
> > > similar to what inode_peek_iversion does for iversion; and then
> > > xfs_inode_to_disk can do:
> > >
> > > to->di_ctime = xfs_inode_to_disk_ts(ip, inode_peek_ctime(inode));
> > >
> > > which would prevent I_CTIME_QUERIED from going out to disk.
> > >
> > > At load time, xfs_inode_from_disk uses inode_set_ctime_to_ts so I think
> > > xfs won't accidentally introduce QUERIED when it's loading an inode from
> > > disk.
> > >
> > >
> >
> > Also already done in this patchset:
> >
> > struct timespec64 inode_set_ctime_to_ts(struct inode *inode, struct timespec64 ts)
> > {
> > inode->i_ctime_sec = ts.tv_sec;
> > inode->i_ctime_nsec = ts.tv_nsec & ~I_CTIME_QUERIED;
> > trace_inode_set_ctime_to_ts(inode, &ts);
> > return ts;
> > }
> > EXPORT_SYMBOL(inode_set_ctime_to_ts);
> >
> > Basically, we never want to store or fetch the QUERIED flag from disk,
> > and since it's in an unused bit, we can just universally mask it off
> > when dealing with "external" users of it.
> >
> > One caveat -- I am using the sign bit for the QUERIED flag, so I'm
> > assuming that no one should ever pass inode_set_ctime_to_ts a negative
> > tv_nsec value.
> >
> > Maybe I should add a WARN_ON_ONCE here to check for that? It seems
> > nonsensical, but you never know...
>
> Well in theory filesystems should validate incoming timestamps and
> reject tv_nsec with the high bit set, but I'd bet there's a filesystem
> out there that allows negative nanoseconds, even if the kernel will
> never pass it such a thing. ;)
>

Hmm, in that case, we probably should normalize the timestamp in this
function before setting the ctime to it. That way we can ensure that
the bit will be meaningless when we use it. I think the kernel has a
way to do that. I'll take a look tomorrow.

Thanks!

> > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > ---
> > > > fs/xfs/libxfs/xfs_trans_inode.c | 6 +++---
> > > > fs/xfs/xfs_iops.c | 10 +++-------
> > > > fs/xfs/xfs_super.c | 2 +-
> > > > 3 files changed, 7 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> > > > index 69fc5b981352..1f3639bbf5f0 100644
> > > > --- a/fs/xfs/libxfs/xfs_trans_inode.c
> > > > +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> > > > @@ -62,12 +62,12 @@ xfs_trans_ichgtime(
> > > > ASSERT(tp);
> > > > xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
> > > >
> > > > - tv = current_time(inode);
> > > > + /* If the mtime changes, then ctime must also change */
> > > > + ASSERT(flags & XFS_ICHGTIME_CHG);
> > > >
> > > > + tv = inode_set_ctime_current(inode);
> > > > if (flags & XFS_ICHGTIME_MOD)
> > > > inode_set_mtime_to_ts(inode, tv);
> > > > - if (flags & XFS_ICHGTIME_CHG)
> > > > - inode_set_ctime_to_ts(inode, tv);
> > > > if (flags & XFS_ICHGTIME_CREATE)
> > > > ip->i_crtime = tv;
>
> And as I mentioned elsewhere in this thread, 6.11 contains a change to
> make it so that xfs_trans_ichgtime can set the access time. That breaks
> the old assertion that XFS_ICHGTIME_CHG is always set, but I think we
> can work around that easily.
>
> if (flags & XFS_ICHGTIME_CHG)
> tv = inode_set_ctime_current(inode);
> else
> tv = current_time(inode);
>
> --D
>
> > > > }
> > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > index a00dcbc77e12..d25872f818fa 100644
> > > > --- a/fs/xfs/xfs_iops.c
> > > > +++ b/fs/xfs/xfs_iops.c
> > > > @@ -592,8 +592,9 @@ xfs_vn_getattr(
> > > > stat->gid = vfsgid_into_kgid(vfsgid);
> > > > stat->ino = ip->i_ino;
> > > > stat->atime = inode_get_atime(inode);
> > > > - stat->mtime = inode_get_mtime(inode);
> > > > - stat->ctime = inode_get_ctime(inode);
> > > > +
> > > > + fill_mg_cmtime(stat, request_mask, inode);
> > > > +
> > > > stat->blocks = XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
> > > >
> > > > if (xfs_has_v3inodes(mp)) {
> > > > @@ -603,11 +604,6 @@ xfs_vn_getattr(
> > > > }
> > > > }
> > > >
> > > > - if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) {
> > > > - stat->change_cookie = inode_query_iversion(inode);
> > > > - stat->result_mask |= STATX_CHANGE_COOKIE;
> > > > - }
> > > > -
> > > > /*
> > > > * Note: If you add another clause to set an attribute flag, please
> > > > * update attributes_mask below.
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index 27e9f749c4c7..210481b03fdb 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -2052,7 +2052,7 @@ static struct file_system_type xfs_fs_type = {
> > > > .init_fs_context = xfs_init_fs_context,
> > > > .parameters = xfs_fs_parameters,
> > > > .kill_sb = xfs_kill_sb,
> > > > - .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
> > > > + .fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
> > > > };
> > > > MODULE_ALIAS_FS("xfs");
> > > >
> > > >
> > > > --
> > > > 2.45.2
> > > >
> >
> > --
> > Jeff Layton <jlayton@xxxxxxxxxx>

--
Jeff Layton <jlayton@xxxxxxxxxx>

Next message: Eric Chan: "Re: [PATCH 2/3] kunit: Rename KUNIT_ASSERT_FAILURE to KUNIT_ASSERT for readability"
Previous message: Eric Chan: "[PATCH v2 3/3] kunit: Introduce KUNIT_ASSERT_MEMEQ and KUNIT_ASSERT_MEMNEQ macros"
In reply to: Darrick J. Wong: "Re: [PATCH v5 6/9] xfs: switch to multigrain timestamps"
Next in thread: Jeff Layton: "[PATCH v5 7/9] ext4: switch to multigrain timestamps"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]