Re: [PATCH RFC] fs: turn inode->i_ctime into a ktime_t
From: Jeff Layton
Date: Sun May 26 2024 - 11:55:37 EST
On Sun, 2024-05-26 at 11:31 -0400, Theodore Ts'o wrote:
> On Sun, May 26, 2024 at 08:20:16AM -0400, Jeff Layton wrote:
> >
> > Switch the __i_ctime fields to a single ktime_t. Move the i_generation
> > down above i_fsnotify_mask and then move the i_version into the
> > resulting 8 byte hole. This shrinks struct inode by 8 bytes total, and
> > should improve the cache footprint as the i_version and __i_ctime are
> > usually updated together.
>
> So first of all, this patch is a bit confusing because the patch
> doens't change __i_ctime, but rather i_ctime_sec and i_ctime_nsec, and
> Linus's tree doesn't have those fields. That's because the base
> commit in the patch, a6f48ee9b741, isn't in Linus's tree, and
> apparently this patch is dependent on "fs: switch timespec64 fields in
> inode to discrete integers"[1].
>
> [1] https://lore.kernel.org/all/20240517-amtime-v1-1-7b804ca4be8f@xxxxxxxxxx/
>
My bad. I should have mentioned that in the changelog, and I'll clean
up the title.
> > The one downside I can see to switching to a ktime_t is that if someone
> > has a filesystem with files on it that has ctimes outside the ktime_t
> > range (before ~1678 AD or after ~2262 AD), we won't be able to display
> > them properly in stat() without some special treatment. I'm operating
> > under the assumption that this is not a practical problem.
>
> There are two additional possible problems with this. The first is
> that if there is a maliciously fuzzed file system with timestamp
> outside of ctimes outside of the ktime_t range, this will almost
> certainly trigger UBSAN warnings, which will result in Syzkaller
> security bugs reported to file system developers. This can be fixed
> by explicitly and clamping ranges whenever converting to ktime_t in
> include/linux/fs.h, but that leads to another problem.
>
There should be no undefined behavior since this is using ktime_set.
> The second problem is if the file system converts their on-disk inode
> to the in-memory struct inode, and then converts it from the in-memory
> to the on-disk inode format, and the timestamp is outside of the
> ktime_t range, this could result the on-disk inode having its ctime
> field getting corrupted. Now, *most* of the time, whenver the inode
> needs to be written back to disk, the ctime field will be changed
> anyway. However, if there are changes that don't result
> userspace-visible changes, but involves internal file system changes
> (for example, in case of an online repair or defrag, or a COW snap),
> where the file system doesn't set ctime --- and in it's possible that
> this might result in the ctime field messed.
>
>
> We could argue that ctime fields outside of the ktime_t range should
> never, ever happen (except for malicious fuzz testing by systems like
> syzkaller), and so it could be argued that we don't care(tm), but it's
> worth at least a mention in the comments and commit description. Of
> course, a file system which *did* care could work around the problem
> by having their own copy of ctime in the file-specific inode, but this
> would come at the cost of space saving benefits of this commit.
>
My assumption was that we'd not overwrite an on-disk inode unless we
were going to stamp the ctime as well. That's not 100% true (as you
point out). I guess we have to decide whether we care about preserving
the results with this sort of intentional fuzz testing.
>
> I'd suspect what I'd do is to add a manual check for an out of range
> ctime on-disk, log a warning, and then clamp the ctime to the maximum
> ktime_t value, which is what would be returned by stat(2), and then
> write that clamped value back to disk if the ctime field doesn't get
> set to the current time before it needs to be written back to disk.
>
ktime_set does this:
if (unlikely(secs >= KTIME_SEC_MAX))
return KTIME_MAX;
So, this should already clamp the value to KTIME_MAX, at least as long
as the seconds value is normalized when called. We certainly could have
this do a pr_warn or pr_notice too if the value comes back as
KTIME_MAX.
Thanks for taking a look, Ted!
--
Jeff Layton <jlayton@xxxxxxxxxx>