Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

From: John Stultz
Date: Thu Jun 04 2015 - 20:08:19 EST


On Wed, Jun 3, 2015 at 11:48 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> * John Stultz <john.stultz@xxxxxxxxxx> wrote:
>>
>> But more importantly, this change to the read path prevents timers that may be
>> expired before update_wall_time timer runs (most likely on other cpus) from
>> being expired early. Since the time read that is used by the hrtimer expiration
>> logic is adjusted properly right on that edge.
>
> But with leap second smearing we'd have the same benefits and some more.

I'm not sure how this follows. Leap smearing is a different behavior
and I'd like to support it (as a separate clockid) as well, but adding
that support doesn't change that the existing behavior applying the
leapsecond to UTC/CLOCK_REALTIME via a timer results in behavior that
isn't strictly correct.

>> > Especially as second smearing appears to be the way superior future method of
>> > handling leap seconds.
>>
>> So here the problem is it depends on the user. For probably most users, who
>> really don't care, the leap-smear is ideal behavior for CLOCK_REALTIME (I think
>> leap-smears causing any change to other clockids would be surprising). However,
>> there are some users who expect posix UTC leapsecond behavior. Either because
>> they're positioning telescopes doing things that do depend on strict solar time,
>> or because they are required (in some cases by law) to use UTC.
>
> That usecase is perfectly OK: they can use the old leap second logic.

Which again, the current leap second logic has slight behavioral
issues the patch I'm proposing is trying to fix.


> What I argue is that we should not add significant complexity to logic that is
> fragile already as-is, and which runs at most only once per year.
>
>> I don't think we can just abandon/break those users, for leap-smearing. So I
>> don't know if we can get away from that complexity.
>
> That's a false dichotomy - I'm not suggesting to 'abandon' them.
>
> I'm suggesting one of these options:
>
> - Keep the current leap second code as-is, because it's proven. Concentrate on
> leap second smearing instead that is superior and which might emerge as a
> future standard.

I understand that you're saying "focus on the future", which is good
advice. But if the objection is complexity, adding leap-smearing
isn't going to reduce complexity.

I'd also argue the current leap second code isn't quite "proven". Or
at least it didn't prove itself well last time around. Those major
issues have been addressed, but the strict correctness issue where the
leap second is done at timer time, instead of the second edge is not
currently addressed.

> - or make leap second insertion much more obvious and easier to verify (i.e. not
> a +100 LOC delta!)

I can work on simplifying and compressing some of the state to try to
reduce the line count and make it easier to verify. But again,
applying the leap-second at the exact second edge means we have to
mimic the state transition in the various read paths, which is going
to still be a non-trivial patch.


> - or make leap second handling a part of some other existing facility that is
> used much more frequently: for example have you considered making it a
> special, kernel-internal case of settimeofday()? If settimeofday was
> implemented in a fashion to set the time at a given 'time edge', then
> thousands of systems would test that facility day in and out. Leap second
> insertion would simply be a special settimeofday() call that sets the time
> back by one second at midnight June 30 or so. Normal settimeofday() would be a
> call that sets the time back (or forward) at the next seconds edge right now -
> but it would fundamentally use the same facility.
>
> and yes, this variant would necessarily complicate settimeofday(), but that's
> OK, as it's used so frequently so we have adequate testing of it.

I will have to think more about this, but initially it doesn't seem to
make much sense to me (again, I still worry that the specifics of the
issue aren't clear). settimeofday() calls happen immediately, and
under the timekeeping write lock. The issues here with leap seconds is
that we are supposed to make a second repeat exactly on the second
boundary. Since nothing asynchronous can be reliably done at a
specific time, the only way to have correct behavior is to handle the
leap adjustment in the read path, basically checking if we have
crossed that edge, and if so, setting the time we return back by one
second. For the adjtime case its more complicated because we also have
to return other state data (leap-in-progress TIME_OOP flag, or after
the leapsecond is over, the TIME_WAIT flag) which we have to adjust on
that edge as well.

The larger point of finding a way to ensure the code paths are tested
well in an everyday fashion is a good one. I'm just not sure how to
make that happen.

(One could argue having ntp apply the leapsecond from userspace via
settimeofday() rather then having the kernel do it would be more along
this line of though, but again the problem there is userspace can't
ensure something happens at a specific time, so it has the same issues
of being "late" in applying the leap adjustment that the kernel
currently has).

> A third facility would effectiely be merged with this as well: when NTP
> adjusts large offsets (when running with -g, etc.) it will use settimeofday(),
> right?

So.. it depends, a method has been added to adjtimex to allow clock
syncing applications to inject time offsets, rather then trying to set
the time absolutely to correct for error (since the calculation of
setting the time absolutely is inherently racy). But again, those
mechanisms happen immediately and don't have the same behavioral
expectation that leap seconds have.

> I don't think we've exhausted all of these options.

So again, lots of good things to think about, but I'm still not seeing
too much in the way of actionable suggestions yet.

The leap-smear work is interesting, but also I see it as independent
to the issue brought up by Prarit.

In the mean-time, to try to address the complexity concern of this
issue, I am looking at trying to simplify some of the NTP state
handling, per Richard's comment, as well as moving that global state
into the timekeeper so we don't have to copy it over so often. This
will hopefully simplify my proposed change a bit (although it will
make it hell for folks trying to backport a fix).

Further changes to avoid duplicate state in the timekeeping structure
(as well as duplicate accessors) will further reduce the footprint of
the change (though I could see Thomas or you objecting to those on
performance grounds).

I can see about also removing support for leap-second removals
(STA_DEL) in the core code, which might reduce the proposed patch by a
few lines.

I'm hoping after that it will appear simple enough. :)

Thanks again for the review and thoughts!
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/