Re: [PATCH RFC] timekeeping: Fix clock stability with nohz

From: Miroslav Lichvar
Date: Fri Dec 06 2013 - 09:26:14 EST


On Mon, Dec 02, 2013 at 08:03:17PM -0800, John Stultz wrote:
> On 12/02/2013 04:53 PM, John Stultz wrote:
> Finally found a config to get it working (disabling kernel debugging
> seems to work), and am currently trying to fixup the missing symbols
> (although I'm getting segfaults from various inline cli's :)

Patches are welcome :).

> Very cool simulator, by the way. Do you plan to have a git repo at some
> point for it?

It's now at https://github.com/mlichvar/linux-tktest

I'm considering including it in https://github.com/mlichvar/clknetsim
as an optional replacement for the somewhat idealized clock that is
currently implemented there. This would allow us to see the whole
picture with applications controlling the clock.

> See the patch below. I'm doing some actual testing with it to see if its
> maybe too dampened.

It seems to fix the stability problem, which is good, but the response
now seems to be very slow. In the simulated test with a 10 Hz clock
update rate, it takes about 1000 updates (100 seconds!) for the loop
to converge to the correct frequency.

With the current tktest code from git:
n: 30, slope: 1.00 (1.00 GHz), dev: 3.1 ns, max: 3.6 ns, freq: -100.43404 ppm

Here the frequency is still off by 0.43 ppm, even after the 20
skipped updates.

When the sampling interval is changed to 100*50 ticks:
n: 30, slope: 1.00 (1.00 GHz), dev: 2146.9 ns, max: 5446.5 ns, freq: -100.07928 ppm

Only when the warmup period is extended to 100*1000 ticks does it
produce a nice fit:
n: 30, slope: 1.00 (1.00 GHz), dev: 7.3 ns, max: 12.2 ns, freq: -100.00004 ppm
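
As a rough cross-check of the three "freq:" values above, here is a
minimal sketch in C that turns each reading into a residual frequency
error and the time error it would accumulate per second. It assumes the
simulated clock's true offset is exactly -100 ppm (implied by the
"off by 0.43 ppm" remark, not stated explicitly); the helper names are
made up for illustration and are not part of tktest.

```c
#include <stdio.h>

/* Assumption: the true offset of the simulated clock is -100 ppm, as
 * implied by "off by 0.43 ppm" for a reading of -100.43404 ppm. */
#define TARGET_FREQ_PPM (-100.0)

/* Residual frequency error of a fit, in ppm (hypothetical helper). */
static double residual_ppm(double measured_ppm)
{
    return measured_ppm - TARGET_FREQ_PPM;
}

/* Time error accumulated per elapsed second: 1 ppm equals 1000 ns/s. */
static double drift_ns_per_sec(double error_ppm)
{
    return error_ppm * 1000.0;
}

/* Print the residual and the per-second drift for one "freq:" value. */
static void report(const char *label, double measured_ppm)
{
    double err = residual_ppm(measured_ppm);

    printf("%-20s residual %+.5f ppm, drift %+.2f ns/s\n",
           label, err, drift_ns_per_sec(err));
}
```

Calling report("default", -100.43404) gives a residual of about
-0.43 ppm, i.e. roughly 434 ns of drift per second, while the
-100.00004 ppm fit corresponds to only about 0.04 ns per second.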

This graph shows the value of tk->mult as it changes with clock
updates:
http://mlichvar.fedorapeople.org/tmp/tk_test1.png

When the TSC frequency is set to 100 MHz, it becomes more pronounced:
http://mlichvar.fedorapeople.org/tmp/tk_test2.png

I'm worried about the artifacts in the response. Is that a bug?

> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1068,7 +1068,7 @@ static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
>          * here. This is tuned so that an error of about 1 msec is adjusted
>          * within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
>          */
> -       error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
> +       error2 = tk->ntp_error >> (NTP_SCALE_SHIFT/2);
>         error2 = abs(error2);
>         for (look_ahead = 0; error2 > 0; look_ahead++)
>                 error2 >>= 2;
>
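
To see how much the one-line change alters the damping, here is a
standalone sketch in C that reproduces only the quoted shift and loop
and compares the resulting look_ahead before and after the patch. It
takes NTP_SCALE_SHIFT as 32 (its value in include/linux/timex.h) and
assumes SHIFT_HZ = 10, which corresponds to HZ=1000; the helper names
are invented for illustration, and the rest of
timekeeping_bigadjust() is not modelled.

```c
#include <stdint.h>

#define NTP_SCALE_SHIFT 32  /* as defined in include/linux/timex.h */

/* Shift applied to ntp_error before the patch; shift_hz is SHIFT_HZ. */
static int old_shift(int shift_hz)
{
    return NTP_SCALE_SHIFT + 22 - 2 * shift_hz;
}

/* Shift applied to ntp_error with the patch; independent of HZ. */
static int new_shift(void)
{
    return NTP_SCALE_SHIFT / 2;
}

/* Mirror of the quoted loop: count the 2-bit shifts needed to clear
 * the scaled-down error. ntp_error is in nsec << NTP_SCALE_SHIFT. */
static int look_ahead_for(int64_t ntp_error, int shift)
{
    int64_t error2 = ntp_error >> shift;
    int look_ahead;

    if (error2 < 0)
        error2 = -error2;
    for (look_ahead = 0; error2 > 0; look_ahead++)
        error2 >>= 2;
    return look_ahead;
}
```

For an error of 1 msec (1000000 nsec shifted left by NTP_SCALE_SHIFT)
and the assumed SHIFT_HZ of 10, the old shift is 34 and gives a
look_ahead of 9, while the new shift is 16 and gives 18. A larger
look_ahead spreads the correction over more ticks, which would be
consistent with the slower response seen in the simulation.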

--
Miroslav Lichvar