Re: Apparent backward time travel in timestamps on file creation

From: Bob Peterson
Date: Fri Mar 31 2017 - 08:35:33 EST


----- Original Message -----
| On Thu, Mar 30, 2017 at 1:13 PM, David Howells <dhowells@xxxxxxxxxx> wrote:
| > Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
| >
| >> The error bar can be huge, for the simple reason that the filesystem
| >> you are testing may not be sharing a clock with the CPU at _all_.
| >>
| >> IOW, think network filesystems.
| >
| > Can't I just not do the tests when the filesystem is a network fs? I don't
| > think it should be a problem for disk filesystems on network-attached
| > storage.
|
| So I actually think that the whole "check timestamps" would be
| interesting as a test across a lot of filesystems - including very
| much network filesystems - but I think such a test should be largely
| informational rather than about correctness.
|
| Now, there definitely are correctness issues too wrt file timestamps
| (ie the whole "writes should update mtime" kind of testing), and I
| think many of those could be extended to check relative timestamps on
| the same filesystem. For example, if you write to file A first, and to
| file B second, it would certainly be odd and interesting if file B now
| has a modification time that is before file A.

This can happen, and not just on network file systems. It is also a
concern for GFS2, where the nodes share storage. We like to think NTP
keeps the clocks relatively sane, but we've still had problems in the
past where time discrepancies caused confusion:

File X is created on node 1, but due to clock drift, node 2 sees that
file as having been created in the future, etc.

It's even more worrisome outside the kernel, where software (in the
past, for example, parts of the cluster infrastructure) would calculate
negative time values, interpret them as a "nearly infinite amount of
time" having passed, and then have various watchdogs nuke nodes.
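To illustrate how a negative interval turns into a "nearly infinite" one,
here is a minimal sketch, assuming a watchdog that stores timestamps as
plain seconds and subtracts them. The names `watchdog_expired_unsigned`,
`watchdog_expired_signed` and `WATCHDOG_TIMEOUT_SECS` are hypothetical and
not taken from any real cluster code. With unsigned arithmetic, a clock
stepped backwards makes `now - last` wrap around to a value near
UINT64_MAX, so the watchdog fires immediately.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical watchdog timeout in seconds (an assumption for illustration). */
#define WATCHDOG_TIMEOUT_SECS 30u

/*
 * Buggy: elapsed time computed with unsigned arithmetic. If the clock was
 * stepped backwards (now < last), the subtraction wraps around to a value
 * near UINT64_MAX, which looks like a "nearly infinite" interval.
 */
static bool watchdog_expired_unsigned(uint64_t now, uint64_t last)
{
    uint64_t elapsed = now - last;
    return elapsed > WATCHDOG_TIMEOUT_SECS;
}

/*
 * Safer: compute a signed difference and treat a negative interval
 * (the clock went backwards) as "no time has passed" instead of expiry.
 */
static bool watchdog_expired_signed(int64_t now, int64_t last)
{
    int64_t elapsed = now - last;
    if (elapsed < 0)
        return false; /* clock stepped backwards: don't fence the node */
    return elapsed > (int64_t)WATCHDOG_TIMEOUT_SECS;
}
```

For example, with `now = 100` and `last = 105`, the unsigned version
reports expiry while the signed version does not. Clamping negative
intervals is only one possible policy; measuring intervals with a
monotonic clock such as CLOCK_MONOTONIC avoids the problem entirely,
since it is not affected by wall-clock steps.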

I remember that, a long time ago, someone was up in arms about weird
effects they were seeing; it boiled down to them not using any time
sync, and one of their cluster nodes had the wrong month, or some such.

Regards,

Bob Peterson
Red Hat File Systems