Time precision, adjtime(x) vs. gettimeofday

From: Benjamin Herrenschmidt
Date: Wed Oct 08 2003 - 08:53:20 EST


Hi !

While fixing problems experienced by some scientific users who
found out that gettimeofday() could sometimes run backward, I
found a nasty issue. I don't know whether we can fix it at all,
or whether it's even worth bothering.

So the problem is with any arch (ppc, x86, ...) that uses a HW
timer (like the CPU timebase on PPC) to provide better-than-jiffy
precision in do_gettimeofday().

The problem is that the offset added to the xtime value (typically
the current HW timer value minus the HW timer value at the last
timer interrupt, scaled to usec) uses a scaling factor which has
been calibrated once, and doesn't take into account the adjustments
made to the xtime increment by the adjtime/adjtimex algorithm.

That means that if, for example, adjtimex was called with a factor
that is trying to slow you down a bit, and you call gettimeofday
right before the end of a jiffy period, you may calculate an offset
based on the HW timer that is higher than what will actually be
added to xtime on the next interrupt.

So you can end up returning non-monotonic values from gettimeofday().
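
To make the effect concrete, here is a small user-space model of it
(not kernel code). Everything in it is an assumption made for the
illustration: the struct and function names are made up, the HW timer
is taken to run at exactly 1 cycle per usec, HZ is 100, and adjtimex
is assumed to have shortened the per-tick increment from 10000 to
9990 usec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a clock with a fixed, once-calibrated scale. */
struct sim_clock {
    int64_t  xtime_usec;      /* wall time as of the last timer tick */
    int64_t  tick_usec;       /* what each tick really adds (after adjtimex) */
    uint64_t last_stamp;      /* HW timer value at the last timer tick */
    uint64_t cycles_per_usec; /* calibrated once, never re-adjusted */
};

/* xtime plus the HW timer delta, scaled with the fixed factor. */
static int64_t sim_gettimeofday(const struct sim_clock *c, uint64_t hw_now)
{
    assert(c->cycles_per_usec != 0);
    return c->xtime_usec +
           (int64_t)((hw_now - c->last_stamp) / c->cycles_per_usec);
}

/* Timer interrupt: add the adjusted tick and restamp the HW timer. */
static void sim_timer_tick(struct sim_clock *c, uint64_t hw_now)
{
    c->xtime_usec += c->tick_usec;
    c->last_stamp = hw_now;
}

/* Returns by how many usec the clock steps back across one tick. */
static int64_t sim_backward_step(void)
{
    struct sim_clock c = { 0, 9990, 0, 1 };
    int64_t before = sim_gettimeofday(&c, 9999);  /* just before the tick */

    sim_timer_tick(&c, 10000);
    return before - sim_gettimeofday(&c, 10000);  /* just after the tick */
}
```

With these numbers the read just before the tick returns 9999 and the
read just after it returns 9990, so sim_backward_step() reports a
9 usec step backward.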

I don't see a way to fix this that wouldn't bloat do_gettimeofday(),
unless we can, at jiffy interrupt time, pre-calculate a scaling
factor for the next jiffy and just apply it to the HW timer value
on subsequent calls to do_gettimeofday(). But that option would need
a better understanding of the adjtime(x) algorithm than what I have
at this point.
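
A rough user-space sketch of that pre-calculation idea, under the same
made-up model (HW timer deltas in cycles, times in usec). The names,
the 32.32 fixed-point format, and the assumption that the tick handler
already knows how many usec the next tick will add are all mine; how
to get that number out of the real adjtime(x) code is exactly the part
I haven't worked out. It also does nothing about late or lost ticks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the scale is recomputed for every jiffy. */
struct scaled_clock {
    int64_t  xtime_usec;   /* wall time as of the last timer tick */
    uint64_t last_stamp;   /* HW timer value at the last timer tick */
    uint64_t scale;        /* usec per cycle, 32.32 fixed point */
    int64_t  pending_usec; /* what the next tick will add to xtime */
};

/* Derive the scale for the coming jiffy from the planned increment. */
static void scaled_clock_plan(struct scaled_clock *c,
                              int64_t next_tick_usec,
                              uint64_t cycles_per_tick)
{
    assert(cycles_per_tick != 0 && next_tick_usec >= 0);
    c->pending_usec = next_tick_usec;
    /* Rounds down, so a full jiffy never scales past pending_usec. */
    c->scale = ((uint64_t)next_tick_usec << 32) / cycles_per_tick;
}

/* Timer interrupt: apply the planned increment, then plan the next. */
static void scaled_timer_tick(struct scaled_clock *c, uint64_t hw_now,
                              int64_t next_tick_usec,
                              uint64_t cycles_per_tick)
{
    c->xtime_usec += c->pending_usec;
    c->last_stamp = hw_now;
    scaled_clock_plan(c, next_tick_usec, cycles_per_tick);
}

/* Fast path: one multiply and one shift, no extra state written. */
static int64_t scaled_gettimeofday(const struct scaled_clock *c,
                                   uint64_t hw_now)
{
    uint64_t delta = hw_now - c->last_stamp;

    return c->xtime_usec + (int64_t)((delta * c->scale) >> 32);
}
```

With 10000 cycles per tick and a planned increment of 9990 usec, a
read at cycle 9999 gives 9989 and the first read after the tick gives
9990, so the step backward from the earlier case goes away, at least
in this model.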

Storing the last returned value to make sure we don't return a value
that is lower will defeat the read_lock/write_lock mechanism, forcing
us to take the write_lock(), and thus screwing up scalability.

Any idea ?

Note: In addition to the above, there seems to be a race on x86 2.4
(only, 2.6 doesn't have it) due to the fact that the actual xtime
increase is done from a bottom half. The HW timer "last stamp" is
stored from the HW interrupt, but xtime is only updated in the BH, so
if gettimeofday is called in between those two, you'll end up using
the "new" "last stamp" with the old xtime, thus returning an
incorrect value. A fix we use on PPC is to use

jiffies - wall_jiffies

as an additional correction.
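
A small user-space model of that window and of the correction. Only
the names jiffies and wall_jiffies come from the kernel; the struct,
the functions, the 1 cycle per usec timer and the fixed 10000 usec
per jiffy are assumptions made to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_JIFFY 10000 /* assumes HZ = 100 */

/* Hypothetical model of a 2.4-style split tick: HW interrupt + BH. */
struct bh_clock {
    unsigned long jiffies;      /* bumped in the HW timer interrupt */
    unsigned long wall_jiffies; /* jiffies already folded into xtime */
    int64_t       xtime_usec;   /* only updated from the bottom half */
    uint64_t      last_stamp;   /* HW timer value, stored in the IRQ */
};

/* HW interrupt: count the jiffy and restamp, xtime is not touched. */
static void bh_hw_interrupt(struct bh_clock *c, uint64_t hw_now)
{
    c->jiffies++;
    c->last_stamp = hw_now;
}

/* Bottom half: fold every pending jiffy into xtime. */
static void bh_bottom_half(struct bh_clock *c)
{
    unsigned long pending = c->jiffies - c->wall_jiffies;

    c->xtime_usec += (int64_t)pending * USEC_PER_JIFFY;
    c->wall_jiffies += pending;
}

/* apply_correction selects whether jiffies - wall_jiffies is added. */
static int64_t bh_gettimeofday(const struct bh_clock *c, uint64_t hw_now,
                               int apply_correction)
{
    int64_t usec = c->xtime_usec + (int64_t)(hw_now - c->last_stamp);

    if (apply_correction)
        usec += (int64_t)(c->jiffies - c->wall_jiffies) * USEC_PER_JIFFY;
    return usec;
}
```

If the HW interrupt fires at cycle 10000 and gettimeofday is called at
cycle 10005 before the BH has run, the uncorrected read returns 5
while the corrected one returns 10005. Once the BH has run, both
return 10005.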

Ben.

