Re: [Y2038] [RFC 02/15] vfs: Change all structures to support 64 bit time

From: Arnd Bergmann
Date: Thu Jan 14 2016 - 11:53:31 EST


On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
> On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
> > On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote:
> > > On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
> > > > > On Jan 11, 2016, at 04:33, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > >> On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
> >
> > 2. How to achieve a seamless transition?
> > Is the inode_timespec solution agreed upon to achieve 1a?
>
> No. Just convert direct to timespec64.

The hard part here is how to split that change into logical patches
per file system. We have already discussed all sorts of ways to
do that, but there is no ideal solution, as you usually end up
either having some really large patches, or you have to modify
the same lines multiple times.

The most promising approaches are:

a) In Deepa's current patch set, some infrastructure is introduced
first by changing the type from timespec to an identical
inode_timespec, which lets us convert one file system at a time
to inode_timespec and then change the type once they are all
done. The downside is that every file system has to be touched
twice before we end up with timespec64 everywhere.
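A minimal userspace sketch of approach a). It assumes simplified stand-ins for the kernel types: `struct timespec_legacy` models the old timespec with a 32-bit `tv_sec` (as on 32-bit architectures), `struct timespec64_demo` models timespec64, and `DEMO_INODE_TIMESPEC_IS_64` is a hypothetical switch for the final type flip. Only the name `inode_timespec` comes from the patch set; everything else is illustrative.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct timespec_legacy { int32_t tv_sec; long tv_nsec; };  /* old timespec on 32-bit */
struct timespec64_demo { int64_t tv_sec; long tv_nsec; };  /* timespec64 */

/*
 * Step 1: inode_timespec starts out as an identical alias of the old
 * type, so converting a file system to the new name changes nothing for
 * the compiler and can be done one file system at a time.
 */
#ifndef DEMO_INODE_TIMESPEC_IS_64
typedef struct timespec_legacy inode_timespec;
#else
/* Final step, once every file system uses inode_timespec: flip the alias. */
typedef struct timespec64_demo inode_timespec;
#endif

/* A converted file system only ever refers to inode_timespec. */
struct demo_inode {
	inode_timespec i_mtime;
};

static inline void demo_set_mtime(struct demo_inode *inode, inode_timespec ts)
{
	inode->i_mtime = ts;
}
```

Each file system is first rewritten to say `inode_timespec` and later rewritten again to say `timespec64`, which is where the "touched twice" cost comes from.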

b) A variation of that which I would prefer is to first add a smaller
set of infrastructure, so we can change one file system at a time
to timespec64 while leaving the common structures on timespec
until all file systems are converted. The downside is that we need
some conversion macros when accessing the times in the inode.
When the common code is changed, those accessor macros turn into
trivial assignments that can either be removed later or be changed
in the same patch.
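A rough sketch of the accessor idea in b), again with hypothetical userspace stand-ins for the kernel types. `demo_inode_get_mtime()` and `demo_inode_set_mtime()` are illustrative names, not existing kernel helpers: a converted file system goes through them with timespec64 values while the common inode still stores the old type.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct timespec_legacy { int32_t tv_sec; long tv_nsec; };
struct timespec64_demo { int64_t tv_sec; long tv_nsec; };

/* The common inode keeps the old type until every file system is converted. */
struct demo_inode {
	struct timespec_legacy i_mtime;
};

/* Hypothetical accessors: converted file systems never touch i_mtime directly. */
static inline struct timespec64_demo demo_inode_get_mtime(const struct demo_inode *inode)
{
	struct timespec64_demo ts = { inode->i_mtime.tv_sec, inode->i_mtime.tv_nsec };
	return ts;
}

static inline void demo_inode_set_mtime(struct demo_inode *inode, struct timespec64_demo ts)
{
	/* Still narrows to 32 bits here until the common code is switched over. */
	inode->i_mtime.tv_sec = (int32_t)ts.tv_sec;
	inode->i_mtime.tv_nsec = ts.tv_nsec;
}
```

Once `i_mtime` itself becomes a timespec64, both helper bodies reduce to plain assignments, at which point they can be dropped.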

c) The opposite direction from b) is to change the common code
first. Any direct assignment between a timespec in a file system
and the timespec64 in the inode/iattr/kstat/etc then needs a
conversion helper so we can build cleanly, and afterwards we go
through one file system at a time, removing those helpers again
while changing the file system's internal structures from
timespec to timespec64.
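For c), the temporary helpers sit at the boundary between already-converted common code and a file system's still-unconverted private structures. This sketch assumes a hypothetical `struct demo_iattr` for the VFS side and `struct demo_fs_inode_info` for a file system's private inode; `demo_timespec64_to_legacy()` plays the role of a conversion helper, similar in spirit to the kernel's `timespec64_to_timespec()`.

```c
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct timespec_legacy { int32_t tv_sec; long tv_nsec; };
struct timespec64_demo { int64_t tv_sec; long tv_nsec; };

/* Common code has already been switched to timespec64. */
struct demo_iattr {
	struct timespec64_demo ia_mtime;
};

/* The file system's private inode still uses the old type. */
struct demo_fs_inode_info {
	struct timespec_legacy mtime;
};

/* Temporary conversion helper, needed only so the tree builds cleanly. */
static inline struct timespec_legacy demo_timespec64_to_legacy(struct timespec64_demo ts)
{
	struct timespec_legacy out = { (int32_t)ts.tv_sec, ts.tv_nsec };
	return out;
}

static void demo_fs_setattr(struct demo_fs_inode_info *fi, const struct demo_iattr *attr)
{
	/* Removed again when this file system's structures move to timespec64. */
	fi->mtime = demo_timespec64_to_legacy(attr->ia_mtime);
}
```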

> > An alternate approach is included in the cover letter.
> > 3. policy for handling out of range timestamps:
> > There was no conclusion on this from the previous series as noted in the
> > cover letter.
> > a. sysadmin through sysctl (Arnd's suggestion)
> > b. have default vfs handlers with an option for individual fs to override.
> > c. clamp and ignore
>
> I think it's a mix - if the timestamps come in from userspace,
> fail with ERANGE. That could be controlled by sysctl via the VFS
> part of the ->setattr operation, or in each of the individual FS
> implementations. If they come from the kernel (e.g. atime update) then
> the generic behaviour is to warn and continue; filesystems can
> otherwise select their own policy for kernel updates via
> ->update_time.

I'd prefer not to leave it to the individual file system
implementations, so that we get consistent behavior. Normally you
either care about correct time stamps, or you care about
interoperability and don't want errors returned here.

It could be done per mount, but that seems overly complicated
for rather little to be gained.
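A minimal sketch of the policy discussed above, done once in common code rather than per file system: timestamps supplied by userspace that the file system cannot store are rejected with ERANGE, while kernel-generated ones (such as atime updates) trigger a warning and are clamped. The `demo_sb` limits and the `from_user` flag are hypothetical, and the sysctl that might select between behaviours is not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct timespec64_demo { int64_t tv_sec; long tv_nsec; };  /* stand-in for timespec64 */

/* Hypothetical per-superblock range of representable seconds. */
struct demo_sb {
	int64_t time_min;
	int64_t time_max;
};

/*
 * Returns 0 if *ts may be used (possibly after clamping), or -ERANGE if a
 * userspace-supplied timestamp cannot be stored by this file system.
 */
static int demo_check_time(const struct demo_sb *sb, struct timespec64_demo *ts,
			   bool from_user)
{
	if (ts->tv_sec >= sb->time_min && ts->tv_sec <= sb->time_max)
		return 0;
	if (from_user)
		return -ERANGE;	/* e.g. utimensat() with an unrepresentable time */

	/* Kernel-generated time: warn, clamp the seconds and carry on. */
	fprintf(stderr, "warning: timestamp %lld out of range, clamping\n",
		(long long)ts->tv_sec);
	ts->tv_sec = ts->tv_sec < sb->time_min ? sb->time_min : sb->time_max;
	return 0;
}
```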

> > d. disable expired fs at compile time (Arnd's suggestion)
>
> Not really an option, because it means we can't use filesystems that
> interop with other systems (e.g. cameras, etc) because they won't
> support y2038k timestamps for a long time, if ever (e.g. vfat).

Let me clarify what my idea is here: I want a global kernel option
that disables all code that has known y2038 issues. Anyone building
an embedded system that must keep working beyond 2038 would enable
it, and that should disable all of those things, including file
systems, drivers and system calls, so we can reasonably assume that
everything that works today with that kernel build will keep working
in the future and not break in random ways.

For a file system, this can be done in a number of ways:

* Most file systems today interpret the time as an unsigned 32-bit
number (as opposed to signed, as ext3, xfs and a few others do),
so as long as we use timespec64 in the syscalls, we are ok until
2106.

* Some legacy file systems (maybe hfs) can remain disabled, as
nobody cares about them any more.

* If we still care about them (e.g. ext2), we can restrict them to
read-only mode. In ext4, this would mean forbidding write access
to file systems that don't have the extended inode format enabled.
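The practical difference between the signed and unsigned on-disk interpretations from the first bullet can be shown with two hypothetical decoders for a raw 32-bit seconds field. A signed reading runs out at 2147483647 (2038-01-19 03:14:07 UTC), and the next second wraps to 1901-12-13 20:45:52 UTC; an unsigned reading lasts until 4294967295 (2106-02-07 06:28:15 UTC) but cannot express times before 1970.

```c
#include <stdint.h>

/* Signed interpretation (ext3, xfs and a few others, per the text above). */
static int64_t demo_decode_signed32(uint32_t raw)
{
	/* Two's-complement reinterpretation without relying on an
	 * implementation-defined narrowing cast. */
	return raw <= INT32_MAX ? (int64_t)raw : (int64_t)raw - INT64_C(4294967296);
}

/* Unsigned interpretation: never negative, so pre-1970 times are lost. */
static int64_t demo_decode_unsigned32(uint32_t raw)
{
	return (int64_t)raw;
}
```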

Normal users who don't care about breakage in 2038 obviously won't
set the option, and keep the same level of backwards compatibility
as today.
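A rough sketch of how the proposed global option could turn the read-only rule for ext4 into a mount-time check. `DEMO_Y2038_SAFE_ONLY` is a hypothetical stand-in for the kernel option (no such option is named in this thread), and `has_extended_timestamps` stands for an inode format with room for wider timestamps; both are assumptions for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical switch standing in for the proposed global kernel option. */
#ifndef DEMO_Y2038_SAFE_ONLY
#define DEMO_Y2038_SAFE_ONLY 1
#endif

struct demo_fs {
	bool has_extended_timestamps;	/* on-disk inodes can store post-2038 times */
};

/*
 * Returns 0 if the mount may proceed, or -EROFS if this build only allows
 * a read-only mount of a file system that cannot store post-2038 times.
 */
static int demo_check_mount(const struct demo_fs *fs, bool read_only)
{
#if DEMO_Y2038_SAFE_ONLY
	if (!fs->has_extended_timestamps && !read_only)
		return -EROFS;
#else
	(void)fs;
	(void)read_only;
#endif
	return 0;
}
```

A kernel built without the option compiles the check away, so normal users see no change in behaviour.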

> > > > The problem really is that
> > > > there is more than one way of updating these attributes (timestamps in
> > > > this particular case). The side effect of this is that we don't always
> > > > call timespec_trunc() before assigning timestamps which can lead to
> > > > inconsistencies between on disk and in memory inode timestamps.
> > >
> > > That's a problem that can be fixed independently of y2038 support.
> > > Indeed, we can be quite lazy about updating timestamps - by intent
> > > and design we usually have different timestamps in memory compared
> > > to on disk, which is one of the reasons why there are so many
> > > different ways to change and update timestamps....
> >
> > This has nothing to do with lazy updates.
> > This is about writing wrong granularities and non clamped values to
> > in-memory inode.
>
> Which really shouldn't happen because we should be clamping and/or
> truncating timestamps at the creation/entry point into the
> VFS/filesystem.
>
> e.g. current_fs_time(sb) is how filesystems grab the current kernel
> time for timestamp updates. Add an equivalent current_fs_time64(sb)
> that returns timespec64 and does clamping and limit warnings, and now
> you have a simple vehicle for converting the VFS and filesystems to
> support y2038k clean date formats.

I think the current patch series does this already.
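A sketch of the `current_fs_time64(sb)` idea as described in the quoted text: take the current time as a timespec64, truncate it to the file system's granularity and clamp it to the on-disk range, warning when clamping. This is a userspace model, not the kernel function: the superblock fields are hypothetical, the truncation follows the rules of the `timespec_trunc()` helper mentioned earlier in the thread, and the current time is passed in as `now` instead of being read from a clock so the result is deterministic.

```c
#include <stdint.h>
#include <stdio.h>

#define DEMO_NSEC_PER_SEC 1000000000L

struct timespec64_demo { int64_t tv_sec; long tv_nsec; };  /* stand-in for timespec64 */

/* Hypothetical superblock fields. */
struct demo_sb {
	long s_time_gran;	/* granularity in ns: 1 = ns, 1000000000 = seconds */
	int64_t s_time_min;	/* earliest representable second */
	int64_t s_time_max;	/* latest representable second */
};

static struct timespec64_demo demo_current_fs_time64(const struct demo_sb *sb,
						     struct timespec64_demo now)
{
	/* Truncate to the file system's granularity, as timespec_trunc() does. */
	if (sb->s_time_gran == DEMO_NSEC_PER_SEC)
		now.tv_nsec = 0;
	else if (sb->s_time_gran > 1 && sb->s_time_gran < DEMO_NSEC_PER_SEC)
		now.tv_nsec -= now.tv_nsec % sb->s_time_gran;

	/* Clamp the seconds to the on-disk range, and warn when that happens. */
	if (now.tv_sec < sb->s_time_min || now.tv_sec > sb->s_time_max) {
		fprintf(stderr, "warning: time %lld outside fs range, clamping\n",
			(long long)now.tv_sec);
		now.tv_sec = now.tv_sec < sb->s_time_min ? sb->s_time_min : sb->s_time_max;
	}
	return now;
}
```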

> If there are places where filesystems are receiving or using
> unchecked timestamps then those are bugs that need fixing. Those
> need to be in separate patches to y2038k support...

Fair enough, but that probably means that patch series will have to
come first. This will also reduce the number of places in which a
separate type conversion function needs to be added.

Arnd