Leap Second, nohz, and ABSOLUTE timers problem

From: Prarit Bhargava
Date: Wed May 27 2015 - 10:55:13 EST


John,

We've identified an issue with ABSOLUTE timers and the leap second that
affects, AFAICT, all kernels.

This is the scenario. Suppose you have a userspace program that has an
ABSOLUTE timer that will expire @ midnight (00:00:00) UTC. The timer expiry
should occur at the end of the leap second event but it does not always occur
at the end. In fact it may occur during the leap second event and effectively
one second too early.
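
To make the scenario concrete: the expiry such a program passes to
timer_settime() is simply the next multiple of 86400 in CLOCK_REALTIME
seconds. A minimal sketch of that arithmetic follows. The names
next_utc_midnight() and SECS_PER_DAY are made up for illustration, and the
sketch assumes a non-negative time_t.

```c
#include <time.h>

#define SECS_PER_DAY 86400	/* illustrative name, not from the reproducer */

/*
 * Hypothetical helper: return the first UTC midnight strictly after t.
 * Assumes t >= 0 so that the % operator behaves as expected.
 */
time_t next_utc_midnight(time_t t)
{
	return t + (SECS_PER_DAY - (t % SECS_PER_DAY));
}
```

For t = 1434326395 this yields 1434326400. With a leap second inserted, the
clock reaches 1434326400 once before the step back and once after the
repeated second 1434326399, and the timer is expected at the second of those.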

In a NOHZ kernel, timers are not necessarily aligned to the ticks, and the
following happens in userspace:

Events:
+++ = second++
--- = second-- (leap second inserted)
EVENT | TIME | Times in the same usecond
| 1434326399.995481 |...................................
[... events deleted ...]
| 1434326399.999993 |........
| 1434326399.999994 |....................................
| 1434326399.999995 |....................................
| 1434326399.999996 |...................................
| 1434326399.999997 |....................................
| 1434326399.999998 |....................................
| 1434326399.999999 |................................... <---- no ticks yet ...
+++ | 1434326400.000007 |timer.................. <---- ABS timer fires at wrong time
| 1434326400.000008 |....................................
| 1434326400.000009 |.....................................
| 1434326400.000010 |....................................
| 1434326400.000011 |....................................
| 1434326400.000012 |.....................................
[... events deleted ...]
| 1434326400.000483 |....................................
| 1434326400.000484 |.....................................
| 1434326400.000485 |....................................
| 1434326400.000486 |.....................................
| 1434326400.000487 |..........
--- | 1434326399.000522 |...... <---- now a tick occurs, and leap second happens
| 1434326399.000523 |.............................
| 1434326399.000524 |...............................
[... events deleted ...]
| 1434326399.000875 |..........................
| 1434326399.000876 |..........................
| 1434326399.000877 |.........
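
The gaps in the trace above can be checked with a little timeval arithmetic.
This is only a sketch: tv_diff_usec() and clock_stepped_back() are
hypothetical helpers, not part of the reproducer below.

```c
#include <sys/time.h>

/* Hypothetical helper: signed difference a - b in microseconds. */
long long tv_diff_usec(struct timeval a, struct timeval b)
{
	return (long long)(a.tv_sec - b.tv_sec) * 1000000LL +
	       (long long)(a.tv_usec - b.tv_usec);
}

/*
 * Hypothetical helper: nonzero if CLOCK_REALTIME moved backwards between
 * two consecutive samples, which is how the leap second insertion shows
 * up in the trace.
 */
int clock_stepped_back(struct timeval prev, struct timeval cur)
{
	return tv_diff_usec(cur, prev) < 0;
}
```

Applied to the samples above, the timer sample at 1434326400.000007 is 480
microseconds older than the last sample before the step (1434326400.000487),
and the first sample after the step (1434326399.000522) is 999485
microseconds "earlier" than the timer sample. So the timer fired roughly
half a millisecond before the tick applied the leap second.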

With NO_HZ, I'm not sure there is a way to sync CLOCK_REALTIME & the ticks.
The best suggestion I can come up with is to brute-force the ticks as follows:

(This assumes TIME_INS is set and was tested)

23:59:58.XXX
23:59:59.XXX disable NO_HZ for the next 3 seconds. This will cause a tick every
1/CONFIG_HZ seconds (in our case every millisecond).
23:59:59.999 Next tick will be the leap second event (i.e., check for TIME_INS).
Check to see which ABSOLUTE timers will expire at 00:00:00.000 and change them to
RELATIVE timers that expire in 2 seconds. (probably easier said than done :) )
00:00:00.000 leap second event
23:59:59.000 No timer expiry expected because of change made previously
00:00:00.000 Timers expire
00:00:00.XXX continue on ...
00:00:01.000 re-enable NO_HZ

There is a small chance that someone adds a timer in the window between
23:59:59.999 and 00:00:00.000, but maybe we delay that one too.
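
The ABSOLUTE-to-RELATIVE conversion in the timeline above comes down to
adding the inserted second to the remaining delay. The following is a rough
userspace model of that arithmetic only, not kernel code and not a proposed
patch. abs_to_rel_across_leap(), leap_edge and leap_pending are hypothetical
names, with leap_edge standing for the 00:00:00 boundary in CLOCK_REALTIME
seconds and leap_pending for "TIME_INS is set".

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: turn an absolute CLOCK_REALTIME expiry into a
 * relative delay. If a leap second insertion is still pending and the
 * expiry lies at or after the leap edge, one extra second is added,
 * because the second before the edge will be repeated.
 */
struct timespec abs_to_rel_across_leap(struct timespec now,
				       struct timespec expiry,
				       time_t leap_edge, int leap_pending)
{
	struct timespec rel;

	rel.tv_sec = expiry.tv_sec - now.tv_sec;
	rel.tv_nsec = expiry.tv_nsec - now.tv_nsec;
	if (rel.tv_nsec < 0) {
		rel.tv_sec--;
		rel.tv_nsec += NSEC_PER_SEC;
	}

	/* Expiry already in the past: fire immediately. */
	if (rel.tv_sec < 0) {
		rel.tv_sec = 0;
		rel.tv_nsec = 0;
		return rel;
	}

	/* Account for the second that is about to be inserted. */
	if (leap_pending && now.tv_sec < leap_edge &&
	    expiry.tv_sec >= leap_edge)
		rel.tv_sec++;

	return rel;
}
```

At 23:59:59.000 with a timer set for 00:00:00.000 this gives 2 seconds, which
matches the "expire in 2 seconds" step. At 23:59:59.999 it gives 1.001
seconds, so the exact value depends on when the conversion is done.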

Here's the reproducer which can be compiled with

gcc -o reproducer reproducer.c -lrt


/*
 * By Daniel Bristot de Oliveira (bristot@xxxxxxxxxx)
 *
 * Licensed under the GPLv2.
 *
 * Based on leap-a-day.c:
 *   Leap second stress test
 *   by: John Stultz (john.stultz@xxxxxxxxxx)
 *   (C) Copyright IBM 2012
 *   (C) Copyright 2013, 2015 Linaro Limited
 *   Licensed under the GPLv2
 *
 * And on customer's reproducer.
 */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <sys/timex.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <errno.h>

/*
 * From leap-a-day.c
 */
const char *time_state_str(int state)
{
	switch (state) {
	case TIME_OK:	return "TIME_OK";
	case TIME_INS:	return "TIME_INS";
	case TIME_DEL:	return "TIME_DEL";
	case TIME_OOP:	return "TIME_OOP";
	case TIME_WAIT:	return "TIME_WAIT";
	case TIME_BAD:	return "TIME_BAD";
	}
	return "ERROR";
}

/*
 * From leap-a-day.c
 */
int clear_time_state(void)
{
	struct timex tx;
	int ret;

	/*
	 * We have to call adjtimex() twice here, as kernels
	 * prior to 6b1859dba01c7 (included in 3.5 and
	 * -stable), had an issue with the state machine
	 * and wouldn't clear the STA_INS/DEL flag directly.
	 */
	tx.modes = ADJ_STATUS;
	tx.status = STA_PLL;
	ret = adjtimex(&tx);

	/* Clear maxerror, as it can cause UNSYNC to be set */
	tx.modes = ADJ_MAXERROR;
	tx.maxerror = 0;
	ret = adjtimex(&tx);

	/* Clear the status */
	tx.modes = ADJ_STATUS;
	tx.status = 0;
	ret = adjtimex(&tx);

	return ret;
}

/*
 * Based on leap-a-day.c
 */
void insert_a_leap_in_seconds(unsigned long second)
{
	int ret;
	struct timespec ts;
	struct timex tx;
	struct timeval tv;
	time_t next_leap;

	/* Get the current time */
	clock_gettime(CLOCK_REALTIME, &ts);

	/* Calculate the next possible leap second 23:59:60 GMT */
	next_leap = ts.tv_sec;
	next_leap += 86400 - (next_leap % 86400);

	/* Step the clock to 'second' seconds before the leap */
	tv.tv_sec = next_leap - second;
	tv.tv_usec = 0;

	settimeofday(&tv, NULL);
	printf("Setting time to %s", ctime(&tv.tv_sec));

	/* Reset NTP time state */
	clear_time_state();

	/* Set the leap second insert flag */
	tx.modes = ADJ_STATUS;

	/* insert */
	tx.status = STA_INS;

	ret = adjtimex(&tx);
	if (ret < 0) {
		/* adjtimex() returns -1 and sets errno on failure */
		printf("Error: Problem setting STA_INS!: %s\n",
		       strerror(errno));
		exit(1);
	}

	/* Validate STA_INS was set */
	tx.modes = 0;
	ret = adjtimex(&tx);
	if (tx.status != STA_INS) {
		printf("Error: STA_INS not set!: %s\n",
		       time_state_str(ret));
		exit(1);
	}

	printf("Scheduling leap second for %s", ctime(&next_leap));
}



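/*
 * Timer functions: set the timer and its handler.
 *
 * timer_occurrence states:
 *   0 = Did not fire
 *   1 = Fired, set by sig_handler()
 *   2 = Already recorded by main()
 *
 * It is written from a signal handler, hence volatile sig_atomic_t.
 */
volatile sig_atomic_t timer_occurrence;

void sig_handler(int sig)
{
	timer_occurrence = 1;
}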


void set_timer_in_seconds(unsigned long seconds)
{
	timer_t timerid;
	struct itimerspec its;
	struct timespec ts;
	int retval;

	signal(SIGALRM, sig_handler);

	retval = clock_gettime(CLOCK_REALTIME, &ts);
	if (retval)
		goto out_error;

	/*
	 * 'seconds' from now (main() passes 5) is 00:00:00 (next day).
	 */
	its.it_value.tv_sec = ts.tv_sec + seconds;
	its.it_value.tv_nsec = 0;

	/* One-shot timer: no interval */
	its.it_interval.tv_sec = 0;
	its.it_interval.tv_nsec = 0;

	retval = timer_create(CLOCK_REALTIME, NULL, &timerid);
	if (retval)
		goto out_error;

	/*
	 * Set TIMER_ABSTIME, so it is not 5 seconds from now,
	 * it is 00:00:00.
	 */
	retval = timer_settime(timerid, TIMER_ABSTIME, &its, NULL);
	if (retval)
		goto out_error;

	printf("Timer set to %ld.%09ld\n",
	       (long)its.it_value.tv_sec, (long)its.it_value.tv_nsec);
	return;

out_error:
	printf("Error setting timer: %s\n", strerror(errno));
	exit(1);
}


#define BUFFER_SIZE 200000

/*
 * Based on customer's reproducer.
 */
void print_buffer(struct timeval *tvb, unsigned long index,
		  unsigned long startindex, unsigned long is_full,
		  unsigned long timer_index)
{
	unsigned long i;
	unsigned long current;
	unsigned long first_usec = 1;
	struct timeval tv_aux, tv;

	tv_aux = tvb[startindex];

	printf("Events: \n\t+++ = second++ \n\t--- = second-- (leap second inserted)\n");
	printf("EVENT | TIME | Times in the same usecond\n");
	for (i = 0;
	     i < (is_full ? BUFFER_SIZE - 1 : index - startindex); ++i) {

		current = (startindex + i) % BUFFER_SIZE;

		tv = tvb[current];

		/* Same microsecond as the previous sample: just add a dot */
		if (tv.tv_sec == tv_aux.tv_sec) {
			if (tv.tv_usec == tv_aux.tv_usec) {
				if (!first_usec)
					printf(".");
				goto jump_print;
			}
		}

		if (!first_usec)
			printf("\n");

		first_usec = 0;

		/* Mark a change of second, forwards or backwards */
		if (tv_aux.tv_sec != tv.tv_sec) {
			if (tv.tv_sec < tv_aux.tv_sec)
				printf("%s", "--- | ");
			else
				printf("%s", "+++ | ");
		} else
			printf(" | ");

		printf("%ld.%06ld |", (long)tv.tv_sec, (long)tv.tv_usec);

jump_print:
		if ((current == timer_index) && (timer_occurrence == 2))
			printf("timer");

		tv_aux = tv;
	}
	printf("\n");
}

/*
 * Based on customer's reproducer
 */

int main(int argc, char **argv)
{
	char step_back = 0;
	char is_full = 0;
	unsigned long startindex = 0;
	unsigned long index = 0;
	unsigned long samples_after_step_back = BUFFER_SIZE / 20;
	unsigned long after_step_back = 0;
	unsigned long timer_index = -1;
	unsigned long i;

	/* static: several MB, too large to rely on the stack */
	static struct timeval tvb[BUFFER_SIZE];
	struct timeval tv_aux, tv, start_time;

	/* Timer has not fired yet. */
	timer_occurrence = 0;

	/*
	 * Insert a leap second, and set the time to 5 seconds before it
	 */
	insert_a_leap_in_seconds(5);

	/*
	 * Set the timer at ABSTIME 5 seconds ahead:
	 * 00:00:00 of the next day.
	 * This "5 seconds" is ambiguous, but the timer is
	 * created to fire at ABSTIME 00:00:00 of the next day.
	 */
	set_timer_in_seconds(5);

	gettimeofday(&start_time, 0);

	/* Seed the previous sample so the first comparison is defined. */
	tv_aux = start_time;


	for (;;) {

		gettimeofday(&tv, 0);

		/* Give up after about 7 seconds. */
		if (start_time.tv_sec + 6 < tv.tv_sec)
			break;

		/*
		 * Check if the time went back: the leap second occurred.
		 */
		if (tv.tv_sec < tv_aux.tv_sec ||
		    (tv.tv_sec == tv_aux.tv_sec && tv.tv_usec < tv_aux.tv_usec))
			step_back = 1;

		/* Save it in the buffer. */
		i = index % BUFFER_SIZE;

		tvb[i] = tv;

		/* Check if the buffer is full. */
		if (!is_full) {
			if (index == BUFFER_SIZE - 1)
				is_full = 1;
		}

		index = (index + 1) % BUFFER_SIZE;

		/* If the buffer is full, push the start index. */
		if (is_full)
			startindex = (startindex + 1) % BUFFER_SIZE;

		/*
		 * If the time stepped back, keep saving samples after the
		 * leap second so that more events can be analysed.
		 */
		if (step_back) {
			++after_step_back;
			if (after_step_back >= samples_after_step_back)
				break;
		}

		/*
		 * Check if the timer has fired, and save its index position.
		 */
		if (timer_occurrence == 1) {
			timer_index = index;
			timer_occurrence = 2;
		}

		tv_aux = tv;
	}

	if (step_back) {
		printf("reproduced! search for --- in the buffer, ");

		if (timer_occurrence == 2)
			printf("timer also occurred, ");
		printf("printing buffer...\n");
		print_buffer(tvb, index, startindex,
			     is_full, timer_index);
	}
	else
		printf("did not reproduce :-(\n");

	exit(0);
}

P.