Re: [RFC][PATCH] timekeeping: Keep xtime_nsec remainder separatefrom ntp_error

From: john stultz
Date: Wed Dec 08 2010 - 15:00:09 EST


On Wed, 2010-12-08 at 13:30 -0500, Steven Rostedt wrote:
> While doing my end of year unlikely() cleanup, running the annotate
> branch profiler, I came across this:
>
> correct incorrect % Function File Line
> ------- --------- - -------- ---- ----
> 122588 65653641 99 timekeeping_adjust timekeeping.c 664
> 167493 14584927 98 timekeeping_adjust timekeeping.c 658
>
> This shows that the following likely()'s are wrong most of the time:
>
> if (error > interval) {
> error >>= 2;
> if (likely(error <= interval))
> adj = 1;
> else
> adj = timekeeping_bigadjust(error, &interval, &offset);
> } else if (error < -interval) {
> error >>= 2;
> if (likely(error >= -interval)) {
>
> Talking about this with John Stultz, we both agreed that the annotations
> should be correct and is not the issue, but something else is going
> wrong.
>
> Adding in trace_printks(), I saw that the adj values that were added to
> the "mult" multiplier were sometimes quite large. The time intervals
> never got down into a small error, but instead was making large
> oscillations, both positive and negative to where it should be.
>
> John noticed that if he removed the commit:
>
> commit 5cd1c9c5cf30d4b33df3d3f74d8142f278d536b7
> timekeeping: fix rounding problem during clock update
>
> that the problem would go away and we would get back into a tight
> oscillation that would stay within the fast path (and the likely()'s
> were again likely).
>
> What the above commit did was to fix a bug that caused time to go
> backward a nanosec due to the truncating of the xtime_nsec shifted into
> the xtime.tv_nsec. The fix for that bug (and what that commit did) was
> to always round up one. It added +1 to the xtime.tv_nsec after it did
> the conversion, and then took the difference between this shifted and
> the xtime_nsec and stored that into the ntp_error.
>
> The ntp_error is used to control the frequency, and this constant adding
> of the shift remainder would cause the large oscillation.
>
> This patch instead adds another field to the timekeeping structure that
> stores the remainder separately. On re-entry into update_wall_time(),
> the remainder is added back onto the xtime_nsec after it is set to the
> xtime.tv_nsec and restoring its original value.
>
> This handles the rounding problem that the original commit addressed but
> does not cause the large oscillation that it caused.

Hey Steven!

Thanks for the great analysis and tooling to help find these unexpected
behaviors!

Sadly, I believe your proposed change can still cause minor nsec
inconsistencies from gettimeofday/vgettimeofday. In fact, the previous
implementation where the nsec inconsistency error was observed preserved
the sub-nanosecond remainder in xtime_nsec.

I suspect we may need to still round up and store the error, but tweak
the adjustment code to handle the larger error per-iteration then it was
originally designed for (note: the current code is still functioning
properly, its just not often hitting the expected trivial case).

The only alternative would be to integrate the sub-ns remainder into the
gettime caclculation (including reworking all the vsyscall
implementations to utilize it as well).

thanks
-john

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/