Re: Bug: Status/Summary of slashdot leap-second crash on new years2008-2009

From: David Newall
Date: Sat Jan 10 2009 - 04:46:33 EST


Alan Cox wrote:
>> Is there a mktime() in the kernel? Isn't it pure user-space? Mktime
>> does appear to know all about leap seconds (assuming they're in zoneinfo.)
>>
>
> The GPL goes to great trouble to ensure you get the kernel source code.
> Why not use it.
>

Okay. I'm not sure how long you have realised that two, completely
different mktimes have been confused with each other, but surely longer
than me.

The kernel mktime, as far as I can tell without spending a week on it,
is used only on some platforms, during startup to read and set real time
clocks and alarms. Where it matters is RTC ioctls which, tragically, I
think, are passed a struct rtc_time. They should be passed a time_t or
struct timeval because these are what's used everywhere else that I can
think of, between kernel and user-space.

Ideally, struct rtc_time should be deprecated in favour of time_t.
Changes to user-space programs should be trivial; probably, they
currently look like

{ struct rtc_time *rt = gmtime(&t); ioctl(fd, RTC_xxx, rt); }


This is not going to happen without a huge song and dance, which I
certainly don't have the energy for. I think it makes no practical
difference, and only affects how tidy the kernel looks. Assuming
leap-seconds are properly configured if zoneinfo, user-space programs
which use gmtime() to set the RTC will run it fast by the current number
of leap-seconds. However the RTC will continue to advance by one second
per second, and being that fast, mktime will produce the correct time_t
when RTC is read back. This will eventually cause a real problem with
RTCs that handle leap years. When we have 4 years worth of
leap-seconds, or maybe its 96 years worth, the RTC will be set for a
leap-year when it is not, or vice versa. That's a long time away.

It's something of a farce that some systems crashed at the leap-second
because no adjustment was needed. Trying to turn two seconds into one
was a mistake. I gather the mistake is in the NTP client, which should
just ignore the LEAP-SECOND bit, and use the leap-second information
from zoneinfo to convert from the NTP timebase to Linux's.

>> showing "#15 0xffffffff8104ec16 in ntp_leap_second (timer=<value
>> optimized out>) at kernel/time/ntp.c:143". That would be kernel code to
>> process leap seconds from NTP broadcasts, I think. That code needs to
>> be removed.
>>
>
> I suggest you read that code and understand it.
>

Well, there's rather a lot wrong with it, isn't there? All of the stuff
that tries to handle leap seconds is wrong; that goes for timex.h, too.
The kernel needs to do nothing special to handle leap-seconds; they're
just seconds, like every other one.

For the third time, this code has to come out.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/