Re: [GIT PULL] y2038 changes for vfs
From: Arnd Bergmann
Date: Wed May 25 2016 - 12:03:27 EST
On Tuesday, May 24, 2016 3:23:39 PM CEST Linus Torvalds wrote:
> On Tue, May 24, 2016 at 1:11 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > The following changes since commit bf16200689118d19de1b8d2a3c314fc21f5dc7bb:
> >
> > Linux 4.6-rc3 (2016-04-10 17:58:30 -0700)
> >
> > are available in the git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git tags/y2038-4.7
>
> The more I look at this, the less I like it.
>
> There doesn't even seem to be any *point* to the preparatory patches.
> I'm not seeing what any of those patches actually help prepare. The
> two new superblock fields that it adds, for example, should likely
> never be touched directly by any code in the first place, so adding
> them only encourages people to add more "preparatory" patches to
> filesystems that simply don't seem sensible.
The minimum and maximum times in the superblock are needed to
implement a proper policy for dealing with limited file systems:
Today, using 'utimes' on a file to set a date in the distant
future or past usually results in a silent overflow with a more
or less random time. We want to replace that with a sane behavior
and either fail with an error for out of range times or truncate
the the earliest or latest times that are supported by the
respective file system. Whatever we decide to do there however
requires knowing those times, and storing them in the superblock
seems much easier than adding code to each file system for checking
them.
The other point is being able to refuse writable mounts to file
systems that are close to overflowing. If someone wants to ship
a system that has longterm support beyond 2038, then any file system
that cannot store later time stamps must already not be mountable
so we can prevent it from behaving inconsistently later.
Again, the specific policy and how it's controlled (mount option,
compile-time, or sysctl to refuse the mount, or simply printing
a warning) is to be decided later, but we have to know the limits
at mount time in order to do it.
> It's not clear we want a
> seconds-based interface there, when in many ways ktime_t is much
> *much* preferable for internal kernel representations for the next
> hundred years or so.
>
> For example, preparing to replace CURRENT_TIME_SEC with
> current_fs_time_sec() is going to be a huge big patch replacing every
> single user *anyway* due to the addition of the superblock parameter.
Adding current_fs_time_sec() instead of just using current_fs_time()
is just to preserve the existing performance behavior. Reading
the current seconds value is a bit faster than reading
seconds+nanoseconds (it doesn't require computing the exact
nanosecond value), and much faster than computing the seconds
from a ktime_t. Note that most file systems don't do sub-second
timestamps at all.
> And since we'd have to change the type in the inode, that will be a
> flag-day anyway.
It's not really a flag day, after the initial per-filesystem
patches, a fairly manageable patch changes the VFS data structures
and the few things that cannot be done separately before:
https://github.com/deepa-hub/vfs/commit/09323fb679c27694475d9c12992782dcba69c493
> So I'm not seeing real advantages to the prep-work. What does it
> actually *help*? I'd have seen more point to it if it had actually
> converted all the existing CURRENT_TIME_SEC cases, and basically said
> "the code is now syntactically ready to start using per-sb limits".
>
> I don't much see the point of a preparatory patch that just paves the
> way for a hundred other small pointless one-liner patches, when it
> shouldn't be a problem to just do it in one go.
>
> Just as an example: code that does
>
> dir->i_mtime = dir->i_ctime = CURRENT_TIME_SEC;
>
> could pretty mechanically be converted to
>
> dir->i_mtime = dir->i_ctime = current_fs_time(sb);
>
> and there really is only about a hundred of those. THAT would be a
> preparatory patch that actually adds value.
>
> IOW, do it as one single patch that gets rid of a bad interface, not
> as "one pointless preparatory patch that than makes it possible to
> make a hundred other pointless patches to use it".
>
> It's not like it's hard to compile-test the pretty mechanical
> conversion. There are no architecture-specific users, so I suspect
> that a trivial "make allmodconfig" build will catch all the cases.
>
> Why drag something like this out, in other words?
The vfs_time_to_timespec/timespec_to_vfs_time accessors and the
s_time_min/s_time_max patch are really the ones that make most
sense doing per file system. These are still all really simple
patches, but it seemed logical to keep all three together and then
go through each file system one by one. The hard part here is
really catching the attention of the file system maintainers,
not doing the patches.
Arnd