Re: [PATCH] correct inconsistent ntp interval/tick_length usage

From: Roman Zippel
Date: Tue Feb 12 2008 - 09:45:45 EST

Next message: Peter Zijlstra: "Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startupand is unkillable [still broken in 2.6.25-rc1]"
Previous message: Martin K. Petersen: "[PATCH] pata_cs5536 Fix secondary port configuration"
In reply to: john stultz: "Re: [PATCH] correct inconsistent ntp interval/tick_length usage"
Next in thread: john stultz: "Re: [PATCH] correct inconsistent ntp interval/tick_length usage"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

On Mon, 11 Feb 2008, john stultz wrote:

> > I don't want to just send a patch, I want you to understand why your
> > approach is wrong.
>
> With all due respect, it also keeps the critique in one direction and
> makes your review less collaborative and more confrontational then I
> suspect (or maybe just hope) you intend.

I don't think that's necessarily a contradiction, if we keep it to
confronting the problem. A simple patch wouldn't have provided any further
understanding of the problem compared to what I already said. You would
have seen what the patch does (which I described already differently), but
not really why it does that.

In this sense I prefer to force the confrontation of the problem. I'm
afraid a working patch would encourage to simply ignore the problem, as
your problem at hand would be solved without completely understanding it.
The point is that I'd really like you to understand the problem, so I'm
not the only one who understands this code :) and in the end it might
allow better collaboration to further improve this code.

To make it very clear this is just about understanding the problem, I
don't want to force a specific solution (which a patch would practically
do). If we both understand the problem, we can also discuss the solution
and maybe we find something better, but maybe I'm also totally wrong,
which would be a little embarrassing :), but that would be fine too. There
may be better ways to go about this problem, but IMO it would still be
better than just ignoring the problem and force it with a patch.

> This fine grained error accounting is where the bug I'm trying to
> address is cropping up from. In order to have the comparison we need to
> have two values:
> A: The clocksource's notion of how long the fixed interval is.
> B: NTP's notion of how long the fixed interval is.
>
> When no NTP adjustment is being made, these two values should be equal,
> but currently they are not. This is what causes the 280ppm error seen on
> my test system.
>
> Part A is calculated in the following fashion:
> #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ)
>
> Which is then functionally shifted up by TICK_LENGTH_SHIFT, but for the
> course of this discussion, lets ignore that.
>
> Part B is calculated in ntp_update_frequency() as:
>
> u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ)
> << TICK_LENGTH_SHIFT;
> second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT;
> second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC);
>
> tick_length_base = second_length;
> do_div(tick_length_base, NTP_INTERVAL_FREQ);
>
>
> If we're assuming there is no NTP adjustment, and avoiding the
> TICK_LENGTH_SHIFT, this can be shorted to:
> B = ((TICK_USEC * NSEC_PER_USEC * USER_HZ)
> + CLOCK_TICK_ADJUST)/NTP_ITNERVAL_FREQ
>
>
> The A vs B comparison can be shortened further to:
> NSEC_PER_SEC != (TICK_USEC * NSEC_PER_USEC * USER_HZ)
> + CLOCK_TICK_ADJUST
>
> So now on to what seems to be the point of contention:
> If A != B, which side do we fix?
>
>
> My patches fix the A side so that it matches B, which on its face isn't
> terribly complicated, but you seem to be suggesting we fix the B side
> instead (Now I'm assuming here, because there's no patch. So I can only
> speak to your emails, which were not clear to me).

If we go from your base assumption above "there is no NTP adjustment", I
would actually agree with you and it wouldn't matter much on which side
to correct the equation.

The question is now what happens, if there are NTP adjustments, i.e. when
the time_freq value isn't zero. Based on this initialization we tell the
NTP daemon the base frequency, although not directly but it knows the
length freq1 == 1 sec. If the clock now needs adjustment, the NTP daemon
tells the kernel via time_freq how to change the speed so that freq1 ==
1 sec + time_freq.

The problem is now that by using CLOCK_TICK_ADJUST we are cheating and we
don't tell the NTP daemon the real frequency. We define 1 sec as freq2 +
tick_adjust and with the NTP adjustment we have freq2 + tick_adj == 1 sec
+ time_freq. Above initialization now calcalutes the base time length for
an update interval of freq2 / NTP_INTERVAL_FREQ, this means the requested
time_freq adjustment isn't applied to (freq2 + tick_adj) cycles but to
freq2 cycles, so this finally means any adjustment made by the NTP daemon
is slightly off.

To demonstrate this let's look at some real values and let's use the PIT
for it (as this is where this originated and on which CLOCK_TICK_ADJUST is
based on). With freq1=1193182 and HZ=1000 we program the timer with 1193
cycles and the actual update frequency is freq2=1193000. To adjust for
this difference we change the length of a timer tick:

(NSEC_PER_SEC + CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ

or

(10^9 nsec - 152533 nsec) / 1000

This way every 1000 interrupts or 1193000 cycles the clock is advanced by
999847467 nsec, so that we advance time according to freq1, but any
further adjustments aren't applied to an interval of what the NTP daemon
thinks is 1 sec but 999847467 nsec instead.

In consequence this means, if we want to improve timekeeping, we first set
the (update_cycles * NTP_INTERVAL_FREQ) interval as close as possible to
the real frequency. It doesn't has to be perfect as we usually don't know
the real frequency with 100% certainty anyway. Second, we drop the tick
adjustment if possible and leave the adjustments to the NTP daemon and as
long as the drift is within the 500ppm limit it has no problem to manage
this.

> However, what happens if a system uses the PIT or jiffies clocksource,
> which do suffer from the HZ / ACTHZ error resolved by the patch linked
> to above? Since those clocksources are so course, we still need to take
> that error into account. So in that regard the assumptions are still
> valid, no?
>
> You've seemingly suggested we set CLOCK_TICK_ADJUST to zero in the
> CONFIG_NO_HZ situation, however note that a CONFIG_NO_HZ kernel should
> be able to boot and run on a box that only supports the jiffies or pit
> clocksource (just not be able to switching into NO_HZ mode). Further,
> how does that resolve the issue w/ the tsc/hpet/acpi_pm clocksources
> that still exist when CONFIG_NO_HZ=n?
>
> I just don't see how this will work.

Look at your Part A, with NO_HZ the update interval is much larger, so any
rounding error becomes practically irrelevant. In the above PIT example
the update interval wouldn't be 1193 cycles but 596591 cycles instead.

I hope this helps to understand what I suggested. For NO_HZ we can just
drop this correction without problems. In the other case it's a little
more complicated, but we basically have to check the update interval and
if (interval * NTP_INTERVAL_FREQ) is within the 500ppm limit there isn't
much we really have to do.
This correction was only introduced to accommodate clocks which exceed
this 500ppm limit for whatever reason, so the NTP daemon isn't confused.
The best solution would be really to limit this correction to these
clocks.

bye, Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Peter Zijlstra: "Re: 2.6.24-git2: Oracle 11g VKTM process enters R state on startupand is unkillable [still broken in 2.6.25-rc1]"
Previous message: Martin K. Petersen: "[PATCH] pata_cs5536 Fix secondary port configuration"
In reply to: john stultz: "Re: [PATCH] correct inconsistent ntp interval/tick_length usage"
Next in thread: john stultz: "Re: [PATCH] correct inconsistent ntp interval/tick_length usage"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]