Re: [PATCH] rtc: Add an option to invalidate dates in 2038
From: Arnd Bergmann
Date: Sat Feb 20 2016 - 17:18:17 EST

On Saturday 20 February 2016 21:47:15 Alexandre Belloni wrote:
>
> Actually, I'm not trying to solve the 2038 issue.
>
> But in the current state on 32 bit platforms, while the kernel is able
> to handle a 64bit date, userspace is not. The main issue is that
> distributions use HCTOSYS so if the RTC is set to a date after 2038
> (which we know is currently bogus), the kernel will set a system time to
> that date.
>
> This results in a system that fails when using timerfd: the timerfd will
> always fire immediately (until, as some people pointed out, we have
> relative timers).
>
> This is known to break systemd [1] but I have had reports for other
> failing applications.
>
> I understand this is a workaround and I plan to switch the default to n
> once libc and critical userspace are able to handle 64 bit time.
>
> The other way of solving that is to get back to a 32 bit time_t
> internally until we switch the whole userspace to a 64 bit time_t but I
> don't think this is practical.
>
> [1] https://github.com/systemd/systemd/issues/1143
>
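
As a rough illustration of the timerfd failure described in the quoted
text, here is a minimal sketch in C. It assumes the kernel compares an
absolute expiry against its 64-bit clock while user space can only
supply a 32-bit time_t. The helper name abs_timer_already_expired() is
hypothetical; it is not a kernel or libc function, and the real timerfd
code path is more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model (not kernel code): an absolute CLOCK_REALTIME
 * timer has already expired when its expiry is not after "now".
 * The expiry comes from user space with a 32-bit time_t, so it can
 * never exceed INT32_MAX (2038-01-19 03:14:07 UTC).
 */
static bool abs_timer_already_expired(int64_t kernel_now, int32_t user_expiry)
{
	/* Widen the 32-bit value before comparing against the 64-bit clock. */
	return (int64_t)user_expiry <= kernel_now;
}
```

With kernel_now set to 2208988800 (2040-01-01 00:00:00 UTC), the helper
returns true for every possible user_expiry, including INT32_MAX, which
matches the "always fire immediately" behaviour. With a 2016 clock value
such as 1456006697, an expiry of INT32_MAX is still in the future.
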
I think in both cases you introduce a new 2038 problem though:
as long as you have a kernel that tries to support an old
32-bit systemd build, the kernel becomes incompatible with RTC
times beyond 2038, even on 64-bit systems and 32-bit systems
that have a fixed system call table and fixed user space.
This is bad because it means we still have to break systemd
eventually in order to fix the 2038 overflow.

The plan to revert this after glibc has been converted is
problematic because a lot of 32-bit distros will likely never
recompile with 64-bit time_t in order to avoid breaking
backwards compatibility. While we could require that user
space and kernel must match here (either support 64-bit time_t
everywhere or nowhere), that makes it much harder to deal
with the migration, and it has always been a strict requirement
that none of the changes for y2038 compatibility break existing
user space (which of course is what happened for RTC and what
we need to fix here).

Has the problem of random RTC times been observed on more than
one RTC driver yet? Maybe we can just apply your workaround
to that one driver that saw it instead.

Have you figured out whether there is a pattern in the reported
times? Is it just completely random or could we perhaps
detect an RTC that reports an invalid time other than by
looking at the year?

	Arnd