Re: [PATCH v7 12/13] ext4: switch to multigrain timestamps

From: Chuck Lever III
Date: Wed Sep 20 2023 - 10:02:31 EST

> On Sep 20, 2023, at 7:48 AM, Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
>>>> While we initially thought we can do this unconditionally it turns out
>>>> that this might break existing workloads that rely on timestamps in very
>>>> specific ways and we always knew this was a possibility. Move
>>>> multi-grain timestamps behind a vfs mount option.
>>>
>>> Surely this is a safe choice as it moves the responsibility to the sysadmin
>>> and the cases where finegrained timestamps are required. But I kind of
>>> wonder how is the sysadmin going to decide whether mgtime is safe for his
>>> system or not? Because the possible breakage needn't be obvious at the
>>> first sight...
>>>
>>
>> That's the main reason I really didn't want to go with a mount option.
>> Documenting that may be difficult. While there is some pessimism around
>> it, I may still take a stab at just advancing the coarse clock whenever
>> we fetch a fine-grained timestamp. It'd be nice to remove this option in
>> the future if that turns out to be feasible.
>>
>>> If I were a sysadmin, I'd rather opt for something like
>>> finegrained timestamps + lazytime (if I needed the finegrained timestamps
>>> functionality). That should avoid the IO overhead of finegrained timestamps
>>> as well and I'd know I can have problems with timestamps only after a
>>> system crash.
>>
>>> I've just got another idea how we could solve the problem: Couldn't we
>>> always just report coarsegrained timestamp to userspace and provide access
>>> to finegrained value only to NFS which should know what it's doing?
>>>
>>
>> I think that'd be hard. First of all, where would we store the second
>> timestamp? We can't just truncate the fine-grained ones to come up with
>> a coarse-grained one. It might also be confusing having nfsd and local
>> filesystems present different attributes.
>
> As far as I can tell we have two options. The first one is to make this
> into a mount option which I really think isn't a big deal and lets us
> avoid this whole problem while allowing filesystems exposed via NFS to
> make use of this feature for change tracking.

A mount option isn't hard to implement, but I think it would be a
mistake.

As Jan pointed out, the two alternative compromises are often very
difficult to choose between. Tossing this decision to administrators
doesn't seem like a responsible way to handle a question that might
result in, at the least, unexpected behavior, and at worst, data
corruption.

Plus, on Linux, files are often accessed locally on NFS servers as
well as remotely -- how does the server's administrator pick the
correct setting in that case?


> The second option is that we turn off fine-grained timestamps for v6.6
> and you get to explore other options.

You could put it behind an EXPERIMENTAL Kconfig option so that the
code stays in and can be used by the brave or foolish while it is
still being refined.


> It isn't a big deal; regressions like this were always to be expected, but
> v6.6 needs to stabilize so anything that requires more significant work
> is not an option.


--
Chuck Lever