Re: [RFC][PATCH 4/4] time: Do leapsecond adjustment in gettime fastpaths

From: Richard Cochran
Date: Sun May 31 2015 - 12:05:57 EST


On Fri, May 29, 2015 at 01:24:28PM -0700, John Stultz wrote:
> Apologies to Richard Cochran, who pushed for such a change
> years ago, which I resisted due to the concerns about the
> performance overhead.

For the record, I got the idea from Michel Hack of IBM.

> While I suspect this isn't extremely critical, folks who
> care about strict leap-second correctness will likely
> want to watch this, and it will likely be a -stable candidate.

I think this is a step in the right direction. If the 'next_leap_sec'
is made available to the vdso, then the 1-10 ms time error could also
be prevented there.
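Here is a rough userspace sketch of what that could look like. It assumes the kernel exported next_leap_sec together with the direction returned by get_leap_state(), and refreshed both whenever the leap state changes. struct leap_snapshot and leap_adjust_sec() are hypothetical names, not existing vDSO interfaces. A real fast path would also have to read the snapshot under the same sequence counter that protects the rest of the vDSO data.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical leap-second state exported to the vDSO data page;
 * not an existing kernel structure. */
struct leap_snapshot {
	int64_t next_leap_sec; /* INT64_MAX when no leap second is pending */
	int dir;               /* -1 insert, +1 delete, 0 none (get_leap_state() convention) */
};

/* Correct a raw CLOCK_REALTIME seconds value for a pending leap second
 * that the timekeeping tick has not applied yet. */
static int64_t leap_adjust_sec(int64_t raw_sec, const struct leap_snapshot *s)
{
	assert(s->dir >= -1 && s->dir <= 1);
	if (raw_sec >= s->next_leap_sec)
		return raw_sec + s->dir;
	return raw_sec;
}
```

For an insertion at midnight (next_leap_sec = 86400 on a toy timeline), a raw reading of 86400 comes back as 86399, i.e. the repeated second standing in for 23:59:60, until the tick moves the kernel into TIME_OOP and the snapshot is updated.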

I have some comments below, but even as is, feel free to add my ack.

Acked-by: Richard Cochran <richardcochran@xxxxxxxxx>

> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 472591e..6e15fbb 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c

> @@ -359,6 +364,33 @@ u64 ntp_tick_length(void)
> return tick_length;
> }
>
> +/**
> + * get_leap_state - Returns the NTP leap state
> + * @next_leap_sec: Next leapsecond in time64_t
> + * @next_leap_ktime: Next leapsecond in ktime_t
> + *
> + * Provides NTP leapsecond state. Returns direction
> + * of the leapsecond adjustment as an integer.
> + */
> +int get_leap_state(time64_t *next_leap_sec, ktime_t *next_leap_ktime)
> +{
> + int dir;
> +
> + if ((time_state == TIME_INS) && (time_status & STA_INS)) {

This can be reduced to just one test on (time_state == TIME_INS).
If user space clears STA_INS, then you can cancel the leap second
immediately.
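A minimal sketch of the single-test version, assuming __do_adjtimex() is changed so that clearing STA_INS (or STA_DEL) moves time_state back to TIME_OK right away. Kernel types are replaced by userspace stand-ins (int64_t for time64_t, a local enum for the TIME_* states), the ktime_t output is dropped for brevity, and get_leap_state_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TIME64_MAX INT64_MAX /* stand-in for the kernel constant */

/* Stand-ins for the kernel's TIME_* NTP states. */
enum ntp_state { TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT };

/* Hypothetical copies of the NTP globals in kernel/time/ntp.c. */
static enum ntp_state time_state = TIME_OK;
static int64_t ntp_next_leap_sec = TIME64_MAX;

/* One test per state: time_state alone says whether a leap is pending. */
static int get_leap_state_sketch(int64_t *next_leap_sec)
{
	assert(next_leap_sec != NULL);
	switch (time_state) {
	case TIME_INS:                  /* leap second to be inserted */
		*next_leap_sec = ntp_next_leap_sec;
		return -1;
	case TIME_DEL:                  /* leap second to be deleted */
		*next_leap_sec = ntp_next_leap_sec;
		return 1;
	default:                        /* nothing pending */
		*next_leap_sec = TIME64_MAX;
		return 0;
	}
}
```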

> + dir = -1;
> + *next_leap_sec = ntp_next_leap_sec;
> + *next_leap_ktime = ktime_set(ntp_next_leap_sec, 0);
> + } else if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
> + dir = 1;
> + *next_leap_sec = ntp_next_leap_sec;
> + *next_leap_ktime = ktime_set(ntp_next_leap_sec, 0);
> + } else {
> + dir = 0;
> + *next_leap_sec = TIME64_MAX;
> + next_leap_ktime->tv64 = KTIME_MAX;
> + }
> + return dir;
> +}
>
> /*
> * this routine handles the overflow of the microsecond field

> @@ -382,15 +414,21 @@ int second_overflow(unsigned long secs)
> */
> switch (time_state) {
> case TIME_OK:
> - if (time_status & STA_INS)
> + if (time_status & STA_INS) {

The user sets STA_INS via adjtimex, but we don't change to TIME_INS
until the next tick. Why not change immediately? Then this function
would only need to check for TIME_INS && (secs % 86400 == 0) and the
very unlikely TIME_DEL.
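Roughly what the state machine could shrink to under that change, as a userspace sketch. It assumes adjtimex() sets TIME_INS or TIME_DEL itself. It leaves out the TIME_WAIT to TIME_OK transition, the printk, and the ntp_next_leap_sec bookkeeping, and it uses stand-ins for the kernel types. second_overflow_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_DAY 86400

/* Stand-ins for the kernel's TIME_* NTP states. */
enum ntp_state { TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT };

/* Hypothetical copy of the NTP state; adjtimex() is assumed to set
 * TIME_INS or TIME_DEL directly when user space sets STA_INS/STA_DEL. */
static enum ntp_state time_state = TIME_OK;

/* Called once per second with the new seconds value; returns the
 * adjustment to apply to the clock (-1 insert, +1 delete, 0 none). */
static int second_overflow_sketch(int64_t secs)
{
	int leap = 0;

	assert(secs >= 0);
	switch (time_state) {
	case TIME_INS:
		if (secs % SECS_PER_DAY == 0) {       /* midnight: step back, 23:59:60 */
			leap = -1;
			time_state = TIME_OOP;
		}
		break;
	case TIME_DEL:                                /* very unlikely in practice */
		if ((secs + 1) % SECS_PER_DAY == 0) { /* 23:59:59: skip it */
			leap = 1;
			time_state = TIME_WAIT;
		}
		break;
	case TIME_OOP:                                /* leap second is over */
		time_state = TIME_WAIT;
		break;
	default:                                      /* TIME_OK, TIME_WAIT: omitted here */
		break;
	}
	return leap;
}
```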

> time_state = TIME_INS;
> - else if (time_status & STA_DEL)
> + ntp_next_leap_sec = secs + SECS_PER_DAY -
> + (secs % SECS_PER_DAY);
> + } else if (time_status & STA_DEL) {
> time_state = TIME_DEL;
> + ntp_next_leap_sec = secs + SECS_PER_DAY -
> + ((secs+1) % SECS_PER_DAY);
> + }
> break;
> case TIME_INS:
> - if (!(time_status & STA_INS))
> + if (!(time_status & STA_INS)) {
> + ntp_next_leap_sec = TIME64_MAX;
> time_state = TIME_OK;
> - else if (secs % 86400 == 0) {
> + } else if (secs % SECS_PER_DAY == 0) {
> leap = -1;
> time_state = TIME_OOP;
> printk_deferred(KERN_NOTICE

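The two ntp_next_leap_sec expressions in the hunk above put an insertion at the next midnight after secs, and a deletion at the next 23:59:59 after secs, which is the second that gets skipped. A small standalone check of that arithmetic follows. next_leap_insert() and next_leap_delete() are hypothetical helper names, and secs is assumed non-negative, since % on a negative value would give a negative remainder.

```c
#include <assert.h>
#include <stdint.h>

#define SECS_PER_DAY 86400

/* First midnight (UTC day boundary) strictly after secs: an inserted
 * leap second is applied when the clock reaches this value. */
static int64_t next_leap_insert(int64_t secs)
{
	assert(secs >= 0);
	return secs + SECS_PER_DAY - (secs % SECS_PER_DAY);
}

/* First 23:59:59 strictly after secs: for a deleted leap second this
 * is the second that is skipped. */
static int64_t next_leap_delete(int64_t secs)
{
	assert(secs >= 0);
	return secs + SECS_PER_DAY - ((secs + 1) % SECS_PER_DAY);
}
```

Both results are strictly greater than secs, so a deletion requested while the clock is already in 23:59:59 is scheduled for the following day.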
> @@ -711,6 +752,24 @@ int __do_adjtimex(struct timex *txc, struct timespec64 *ts, s32 *time_tai)
> if (!(time_status & STA_NANO))
> txc->time.tv_usec /= NSEC_PER_USEC;
>
> + /* Handle leapsec adjustments */

This block and its comment rather confused me. What this code
actually does is fix up the time value (and the TAI offset and return
code) reported to the caller of adjtimex, but only in the 1-10
millisecond window between the leap second boundary and the tick that
applies the leap second.
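As I read it, the block boils down to the following userspace sketch. It assumes the single time_state test suggested for get_leap_state(), passes the state and times in as parameters instead of using the ntp.c globals, and uses a hypothetical struct adjtimex_reply for the txc->time.tv_sec, txc->tai and result values. leap_fixup() is likewise a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's TIME_* NTP states. */
enum ntp_state { TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT };

/* Values reported back to the adjtimex() caller (hypothetical struct). */
struct adjtimex_reply {
	int64_t tv_sec; /* txc->time.tv_sec */
	int32_t tai;    /* txc->tai */
	int result;     /* return value of adjtimex() */
};

/* Fix up the reply when the clock has crossed the leap-second boundary
 * (now_sec >= next_leap_sec) but second_overflow() has not run yet. */
static void leap_fixup(enum ntp_state state, int64_t now_sec,
		       int64_t next_leap_sec, struct adjtimex_reply *r)
{
	assert(r != NULL);
	if (now_sec < next_leap_sec)
		return;                 /* outside the window: nothing to do */
	switch (state) {
	case TIME_INS:                  /* midnight reached, 23:59:60 not yet inserted */
		r->result = TIME_OOP;
		r->tai++;
		r->tv_sec--;
		break;
	case TIME_DEL:                  /* 23:59:59 reached, not yet skipped */
		r->result = TIME_WAIT;
		r->tai--;
		r->tv_sec++;
		break;
	case TIME_OOP:                  /* inserted second over, tick not yet run */
		if (now_sec == next_leap_sec)
			r->result = TIME_WAIT;
		break;
	default:
		break;
	}
}
```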

> + if (unlikely(ts->tv_sec >= ntp_next_leap_sec)) {
> + if ((time_state == TIME_INS) && (time_status & STA_INS)) {
> + result = TIME_OOP;
> + txc->tai++;
> + txc->time.tv_sec--;
> + }
> + if ((time_state == TIME_DEL) && (time_status & STA_DEL)) {
> + result = TIME_WAIT;
> + txc->tai--;
> + txc->time.tv_sec++;
> + }
> + if ((time_state == TIME_OOP) &&
> + (ts->tv_sec == ntp_next_leap_sec)) {
> + result = TIME_WAIT;
> + }
> + }
> +
> return result;
> }

Thanks,
Richard