Re: [PATCH RFC] timekeeping: Fix clock stability with nohz
From: John Stultz
Date: Mon Dec 02 2013 - 19:54:11 EST
On 11/20/2013 10:39 AM, Miroslav Lichvar wrote:
> On Mon, Nov 18, 2013 at 12:46:00PM -0800, John Stultz wrote:
>>> In a simulation with 1GHz TSC clock and 10Hz clock update the maximum
>>> error went down from 4.7 microseconds to 5.5 nanoseconds. With 1Hz
>>> update the maximum error went down from 480 microseconds to 55
>>> nanoseconds.
>> Curious what sort of a environment you're using for simulation (as well
>> as the real test below)?
> I compile the kernel timekeeping.c file into a test program where a
> fake TSC is provided to the kernel code and I can easily control the
> tick length and timekeeper updates. The program collects pairs of
> TSC/getnstimeofday() values and then it runs linear regression through
> the points to see the frequency offset and how stable the clock was.
>
> http://mlichvar.fedorapeople.org/tmp/tk_test.tar.gz
So.. this doesn't build for me.
timekeeping.o: In function `change_clocksource':
timekeeping.c:(.text+0x535): undefined reference to `lock_acquire'
timekeeping.c:(.text+0x588): undefined reference to `lock_release'
timekeeping.o: In function `__getnstimeofday':
timekeeping.c:(.text+0x675): undefined reference to `trace_hardirqs_off'
timekeeping.c:(.text+0x69b): undefined reference to `lock_acquire'
timekeeping.c:(.text+0x6b0): undefined reference to `lock_release'
timekeeping.c:(.text+0x6c2): undefined reference to `trace_hardirqs_on'
timekeeping.c:(.text+0x77c): undefined reference to `trace_hardirqs_off'
Do you need a special .config in your kernel source? Changing lockdep
and irq tracer configs don't seem to fix things for me.
>> But this is all very interesting! Thanks again for digging into this
>> issue and sending the patch!
> Thanks for looking into it. I agree this is a rather radical change
> and it needs to be done very carefully. At this point, I'd like to
> know if you think there are no fundamental problems in the design and
> whether it would be an acceptable replacement.
>
> A different approach to fix this problem would be controlling the
> maximum idle interval from the tick adjusting code so that the NTP
> error corrections can be accurate. That would further increase the
> complexity of the code and increase the interrupt rate.
So I'm still trying to break apart the larger change you've made into
smaller functional steps.
The main two differences I see are:
1) Calculating the mult value directly from the ntp_tick_length() value
(via the division) each update cycle.
2) Drastically simplifying the ntp_error correction to a 0/1 multiplier
adjustment (assuming the calculated mult will be slightly slow, due to
truncating the remainder in the division) based on the sign of the error.
(and I'm ignoring the previously mentioned accounting changes that
appear to be improperly dropped ;)
This makes me suspect the main issue is we're over-correcting in the
timekeeping_bigadjust() logic with nohz and I'm curious if instead we
could either limit timekeeping_bigadjust() adjustment to achieve the
same stability? In particular, bigadjust()'s exponential adjustment of
mult seems problematic.
Am I missing or glossing over anything else that is key to your changes?
thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/