Re: [PATCH v6 3/7] timekeeping: Account for clocksource tick quantisation via NTP

From: John Stultz

Date: Thu Jun 18 2026 - 21:26:13 EST


On Sun, Jun 14, 2026 at 7:40 AM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
>
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> cycle_interval is an integer number of counter cycles per NTP interval,
> so the real time it represents differs from the nominal
> NTP_INTERVAL_LENGTH by up to half a counter period. For coarse
> clocksources this is significant: the 3.579545 MHz ACPI PM timer at
> HZ=1000 rounds 3579.545 cycles up to 3580, making each tick 1.000127 ms
> (+127 PPM).
>
> Commit a386b5af8edd ("time: Compensate for rounding on odd-frequency
> clocksources") introduced xtime_remainder to compensate for exactly
> this, citing the same 127 PPM ACPI PM example. The compensation is
> correct and necessary, but it was applied inside the timekeeping
> accumulation in timekeeping.c: subtracted in the mult computation in
> timekeeping_adjust() and folded into the ntp_error update in
> logarithmic_accumulation(). That keeps the base rate correct and leaves
> NTP its full symmetric +/-MAXFREQ range rather than +373/-627 PPM, but
> the NTP code in ntp.c never sees it: tick_length is computed without the
> correction, so ntp.c's notion of how long a tick is disagrees with the
> rate timekeeping actually produces.
>
> Make the offset an explicit part of the NTP tick_length instead. Add
> ntp_data::cs_tick_adj, a fixed per-second addend that
> ntp_update_frequency() includes alongside ntp_tick_adj and time_freq.
> tk_setup_internals() computes it from the difference between the real
> cycle_interval duration and the nominal interval, stores it in the
> timekeeper, and hands it to NTP through a new argument to ntp_clear() --
> which already recomputes the frequency and is invoked after every
> clocksource (re)configuration. timekeeping_init() now uses TK_UPDATE_ALL
> for this; clearing NTP there is otherwise redundant since ntp_init() has
> just initialised it.
>
> ntp.c now computes the true tick rate, giving a single source of truth.
> Like ntp_tick_adj, cs_tick_adj stays internal to the kernel: userspace
> still sees the nominal 1.000000 ms tick via adjtimex and is unaware of
> the addends. timekeeping_adjust() and logarithmic_accumulation() use
> ntp_tick / xtime_interval directly, and xtime_remainder is removed.
>
> The base-rate arithmetic is unchanged: ntp_tick becomes
> xtime_interval << ntp_error_shift, so the mult division yields the same
> base mult and the ntp_error accumulation still nets to zero per tick.
>
> Beyond the cleanup of treating all the tick_length contributions
> (nominal interval, ntp_tick_adj, cs_tick_adj, time_freq) consistently
> as addends in one place, it also prepares for feed-forward discipline:
> a future timekeeping_set_reference() will set tick_length to track an
> absolute external reference such as a vmclock, and that path needs
> ntp.c to own a tick_length that already reflects the clocksource
> quantisation, with no hidden correction applied elsewhere.
>

Thanks so much for taking another swing at this! This resolves my
earlier conceptual concerns.

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index d847bba0481b..bdafd599413d 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -366,7 +366,6 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
>
> /* Go back from cycles -> shifted ns */
> tk->xtime_interval = interval * clock->mult;
> - tk->xtime_remainder = ntpinterval - tk->xtime_interval;
> tk->raw_interval = interval * clock->mult;
>
> /* if changing clocks, convert xtime_nsec shift units */
> @@ -386,7 +385,19 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
>
> tk->ntp_error = 0;
> tk->ntp_error_shift = NTP_SCALE_SHIFT - clock->shift;
> - tk->ntp_tick = ntpinterval << tk->ntp_error_shift;
> +
> + /*
> + * cycle_interval is a whole number of counter cycles, so the real
> + * time it represents differs from the nominal NTP_INTERVAL_LENGTH by
> + * up to half a counter period (e.g. +127 PPM on the 3.579545 MHz ACPI
> + * PM timer at HZ=1000). Record this fixed per-tick offset, scaled up
> + * to a per-second value to match the ntp_update_frequency() addends,
> + * so it can be handed to NTP via ntp_clear() and reflected explicitly
> + * in tick_length rather than applied behind NTP's back.
> + */
> + tk->cs_tick_adj = (((s64)tk->xtime_interval - (s64)ntpinterval) <<
> + tk->ntp_error_shift) * NTP_INTERVAL_FREQ;
> + tk->ntp_tick = (u64)tk->xtime_interval << tk->ntp_error_shift;
>

So I think it ends up being correct, but this bit setting ntp_tick
(using xtime_interval instead of ntpinterval) /looks/ wrong at first
glance.

Further ntpinterval is now basically only used in the cs_tick_adj but
is set a fair bit earlier in this function. I wonder if getting rid of
ntpinterval and open coding it to set cs_tick_adj might make it easier
to understand (though obviously not trivial) the math? Maybe breaking
it into two (or more) lines similar to how we set ntp_error in
logarithmic_accumulation?

Otherwise looks good!

thanks
-john