[RFC][PATCH] timekeeping: Keep xtime_nsec remainder separate fromntp_error

From: Steven Rostedt
Date: Wed Dec 08 2010 - 13:31:04 EST


While doing my end of year unlikely() cleanup, running the annotate
branch profiler, I came across this:

correct incorrect % Function File Line
------- --------- - -------- ---- ----
122588 65653641 99 timekeeping_adjust timekeeping.c 664
167493 14584927 98 timekeeping_adjust timekeeping.c 658

This shows that the following likely()'s are wrong most of the time:

if (error > interval) {
error >>= 2;
if (likely(error <= interval))
adj = 1;
else
adj = timekeeping_bigadjust(error, &interval, &offset);
} else if (error < -interval) {
error >>= 2;
if (likely(error >= -interval)) {

Talking about this with John Stultz, we both agreed that the annotations
should be correct and is not the issue, but something else is going
wrong.

Adding in trace_printks(), I saw that the adj values that were added to
the "mult" multiplier were sometimes quite large. The time intervals
never got down into a small error, but instead was making large
oscillations, both positive and negative to where it should be.

John noticed that if he removed the commit:

commit 5cd1c9c5cf30d4b33df3d3f74d8142f278d536b7
timekeeping: fix rounding problem during clock update

that the problem would go away and we would get back into a tight
oscillation that would stay within the fast path (and the likely()'s
were again likely).

What the above commit did was to fix a bug that caused time to go
backward a nanosec due to the truncating of the xtime_nsec shifted into
the xtime.tv_nsec. The fix for that bug (and what that commit did) was
to always round up one. It added +1 to the xtime.tv_nsec after it did
the conversion, and then took the difference between this shifted and
the xtime_nsec and stored that into the ntp_error.

The ntp_error is used to control the frequency, and this constant adding
of the shift remainder would cause the large oscillation.

This patch instead adds another field to the timekeeping structure that
stores the remainder separately. On re-entry into update_wall_time(),
the remainder is added back onto the xtime_nsec after it is set to the
xtime.tv_nsec and restoring its original value.

This handles the rounding problem that the original commit addressed but
does not cause the large oscillation that it caused.

The new results of the branch annotation profiler is now:

correct incorrect % Function File Line
------- --------- - -------- ---- ----
736971 129 0 timekeeping_adjust timekeeping.c 685
736943 115 0 timekeeping_adjust timekeeping.c 676


Cc: Roman Zippel <zippel@xxxxxxxxxxxxxx>
Cc: John Stultz <johnstul@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>

Index: linux-compile.git/kernel/time/timekeeping.c
===================================================================
--- linux-compile.git.orig/kernel/time/timekeeping.c
+++ linux-compile.git/kernel/time/timekeeping.c
@@ -37,6 +37,10 @@ struct timekeeper {

/* Clock shifted nano seconds remainder not stored in xtime.tv_nsec. */
u64 xtime_nsec;
+
+ /* remainder in xtime subtraction */
+ u64 xtime_nsec_rem;
+
/* Difference between accumulated time and NTP time in ntp
* shifted nano seconds. */
s64 ntp_error;
@@ -84,6 +88,7 @@ static void timekeeper_setup_internals(s
((u64) interval * clock->mult) >> clock->shift;

timekeeper.xtime_nsec = 0;
+ timekeeper.xtime_nsec_rem = 0;
timekeeper.shift = clock->shift;

timekeeper.ntp_error = 0;
@@ -751,6 +756,12 @@ void update_wall_time(void)
timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;

/*
+ * Add back the remainder that was left over when adding +1 to
+ * xtime.tv_nsec;
+ */
+ timekeeper.xtime_nsec += timekeeper.xtime_nsec_rem;
+
+ /*
* With NO_HZ we may have to accumulate many cycle_intervals
* (think "ticks") worth of time at once. To do this efficiently,
* we calculate the largest doubling multiple of cycle_intervals
@@ -797,12 +808,11 @@ void update_wall_time(void)

/*
* Store full nanoseconds into xtime after rounding it up and
- * add the remainder to the error difference.
+ * store the remainder to update xtime_nsec on the next iteration.
*/
xtime.tv_nsec = ((s64) timekeeper.xtime_nsec >> timekeeper.shift) + 1;
timekeeper.xtime_nsec -= (s64) xtime.tv_nsec << timekeeper.shift;
- timekeeper.ntp_error += timekeeper.xtime_nsec <<
- timekeeper.ntp_error_shift;
+ timekeeper.xtime_nsec_rem = timekeeper.xtime_nsec;

/*
* Finally, make sure that after the rounding


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/