Re: [PATCH v6 4/7] timekeeping: Drive time_offset skew via per-tick ntp_error transfer

From: John Stultz

Date: Thu Jun 18 2026 - 23:41:14 EST

On Sun, Jun 14, 2026 at 7:40 AM David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
>
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> Instead of inflating tick_length to effect the time_offset slew,
> transfer the intended per-tick skew into ntp_error to achieve the
> desired rate.

Thanks again for sending this out! Apologies for my slow reviews!

I'm wondering, maybe could you better express the problem with the
current code that your patch improves? This feels a bit opaque to me
initially.

In my (admittedly goldfish-in-the-fog) sense of current code, we
adjust consume tick_offset in chunks added to tick_length which then
gets copied to tk->ntp_tick, which then modifies ntp_error.

It seems you're switching it to be a bit more direct, but still
introduces a new skew_delta value, which consumes from time_offset,
and is then the skew_delta gets pulled out of ntp_error. So while
looking at the code I can see you're handling things more carefully,
its not immediately clear to what end. This isn't a critique of your
code here, just in how the motivation for it is explained.

> In second_overflow(), calculate skew_delta which is the per-tick slew
> rate, in the same units as time_offset: (ns << NTP_SCALE_SHIFT) / HZ.
>
> In logarithmic_accumulation(), drain up to 'skew_delta' time units from
> time_offset into ntp_error to drive the overall effective rate.
>
> In timekeeping_adjust(), take skew_delta into account when calculating
> 'mult', such that the available choices (mult, mult+1) bracket the
> overall effective rate including the skew — otherwise the delta would
> just build up in ntp_error.
>
> This give behaviour equivalent to the old tick_length += delta approach
> but with exact per-tick accounting of the time_offset actually imparted
> to the clock, which was previously *extremely* approximate.

Clarifying this approximateness would probably help the reader.

> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> Assisted-by: Kiro:claude-opus-4.8
> ---
> include/linux/timekeeper_internal.h | 1 +
> kernel/time/ntp.c | 70 ++++++++++++++++++++++++++---
> kernel/time/ntp_internal.h | 2 +
> kernel/time/timekeeping.c | 43 ++++++++++++++++--
> 4 files changed, 107 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
> index ec81587a1400..fb37a736ec1c 100644
> --- a/include/linux/timekeeper_internal.h
> +++ b/include/linux/timekeeper_internal.h
> @@ -189,6 +189,7 @@ struct timekeeper {
> u32 ntp_err_mult;
> s64 cs_tick_adj;
> u32 skip_second_overflow;
> + s64 skew_delta;
> s32 tai_offset;
> };
>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 3fad82c47c4c..2082f8316b94 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -31,6 +31,9 @@
> * @time_state: State of the clock synchronization
> * @time_status: Clock status bits
> * @time_offset: Time adjustment in nanoseconds
> + * @skew_delta: Per-tick phase slew rate for the coming second, in
> + * @time_offset units (shifted-ns / HZ). Set by
> + * second_overflow().
> * @time_constant: PLL time constant
> * @time_maxerror: Maximum error in microseconds holding the NTP sync distance
> * (NTP dispersion + delay / 2)
> @@ -67,6 +70,7 @@ struct ntp_data {
> int time_state;
> int time_status;
> s64 time_offset;
> + s64 skew_delta;
> long time_constant;
> long time_maxerror;
> long time_esterror;
> @@ -349,6 +353,7 @@ static void __ntp_clear(struct ntp_data *ntpdata)
>
> ntpdata->tick_length = ntpdata->tick_length_base;
> ntpdata->time_offset = 0;
> + ntpdata->skew_delta = 0;
>
> ntpdata->ntp_next_leap_sec = TIME64_MAX;
> /* Clear PPS state variables */
> @@ -385,6 +390,37 @@ u64 ntp_tick_length(unsigned int tkid)
> return tk_ntp_data[tkid].tick_length;
> }
>
> +s64 ntp_get_skew_delta(unsigned int tkid)
> +{
> + return tk_ntp_data[tkid].skew_delta;
> +}
> +
> +/* Sign of @x as +1 or -1 (zero counts as positive; callers pass nonzero). */
> +static inline int signof(s64 x)
> +{
> + return x < 0 ? -1 : 1;
> +}
> +
> +s64 ntp_drain_time_offset(unsigned int tkid, s64 amount)
> +{
> + struct ntp_data *ntpdata = &tk_ntp_data[tkid];
> +
> + /* Only drain if amount and time_offset have the same sign */
> + if (!amount || signof(amount) != signof(ntpdata->time_offset))
> + return amount;
> +
> + /* Clamp: don't overshoot zero */
> + if (abs(amount) > abs(ntpdata->time_offset)) {
> + s64 undrained = amount - ntpdata->time_offset;
> +
> + ntpdata->time_offset = 0;
> + return undrained;
> + }
> +
> + ntpdata->time_offset -= amount;
> + return 0;
> +}
> +
> /**
> * ntp_get_next_leap - Returns the next leapsecond in CLOCK_REALTIME ktime_t
> * @tkid: Timekeeper ID
> @@ -419,7 +455,6 @@ ktime_t ntp_get_next_leap(unsigned int tkid)
> int second_overflow(unsigned int tkid, time64_t secs)
> {
> struct ntp_data *ntpdata = &tk_ntp_data[tkid];
> - s64 delta;
> int leap = 0;
> s32 rem;
>
> @@ -481,13 +516,38 @@ int second_overflow(unsigned int tkid, time64_t secs)
> /* Compute the phase adjustment for the next second */
> ntpdata->tick_length = ntpdata->tick_length_base;
>
> - delta = ntp_offset_chunk(ntpdata, ntpdata->time_offset);
> - ntpdata->time_offset -= delta;
> - ntpdata->tick_length += delta;
> -
> /* Check PPS signal */
> pps_dec_valid(ntpdata);
>
> + /*
> + * Set the per-tick skew rate for the next second. This is in
> + * the same units as time_offset: (ns << NTP_SCALE_SHIFT) / HZ.
> + * If the result is so low that the skew imparted would round
> + * to zero, pass the bare minimum ±1 to ensure that it *does*
> + * actually drain completely to zero. It won't overshoot because
> + * logarithmic_accumulation() only drains what it can from
> + * time_offset and the rest ends up in ntp_error which drives
> + * the selection of 'mult' immediately each tick.
> + */
> + if (ntpdata->time_offset) {
> + s64 off_chunk = ntp_offset_chunk(ntpdata, ntpdata->time_offset);
> +
> + /*
> + * Once the exponential chunk rounds to zero, deliver the last
> + * remaining offset this second so it converges to zero instead
> + * of stalling just above it.
> + */
> + if (!off_chunk)
> + off_chunk = ntpdata->time_offset;
> +
> + /* Reduce to per-tick, then floor. */
> + ntpdata->skew_delta = div_s64(off_chunk, NTP_INTERVAL_FREQ);
> + if (!ntpdata->skew_delta)
> + ntpdata->skew_delta = (off_chunk > 0) ? 1 : -1;

Maybe would using the signof() function you added above make sense here?

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index bdafd599413d..5fd06d94c4b5 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -2430,17 +2431,26 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
> static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
> {
> u64 ntp_tl = ntp_tick_length(tk->id);
> + s64 skew = ntp_get_skew_delta(tk->id);
> u32 mult;
>
> /*
> - * Determine the multiplier from the current NTP tick length.
> - * Avoid expensive division when the tick length doesn't change.
> + * Determine the multiplier from the current NTP tick length plus
> + * skew_delta. The skew biases mult so that ±1 dithering can deliver
> + * the time_offset slew rate. Recompute when either changes.
> */
> - if (likely(tk->ntp_tick == ntp_tl)) {
> + if (likely(tk->ntp_tick == ntp_tl && tk->skew_delta == skew)) {
> + /* Revert to the base mult rate. */
> mult = tk->tkr_mono.mult - tk->ntp_err_mult;
> } else {
> tk->ntp_tick = ntp_tl;
> - mult = div64_u64(tk->ntp_tick >> tk->ntp_error_shift,
> + tk->skew_delta = skew;
> + /*
> + * skew_delta is stored pre-divided by HZ (matching time_offset);
> + * scale it back up to the full per-tick rate for the mult bias.
> + */
> + skew *= NTP_INTERVAL_FREQ;
> + mult = div64_u64((tk->ntp_tick + skew) >> tk->ntp_error_shift,
> tk->cycle_interval);
> }

So not really an issue, but just to make sure I understand: a side
effect here is we may be doing the division more frequently? I guess
potentially not much, since skew is /HZ so hopefully it lines up to
the second_overflow, but it seems like since xtime_interval won't
necessarily match NTP_INTERVAL_FREQ, you'd potentially have one
"drained" iteration before the next second_overflow?

> @@ -2568,6 +2578,31 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
> tk->ntp_error += tk->ntp_tick << shift;
> tk->ntp_error -= tk->xtime_interval << (tk->ntp_error_shift + shift);
>
> + /*
> + * The above accounting of ntp_error includes the part of clock
> + * skew which was *intentional*, imparted through deliberately
> + * adjusting 'mult' in timekeeping_adjust() taking skew_delta
> + * into account.
> + *
> + * Drain the intentional skew from time_offset, and readjust
> + * ntp_error by the amount that *could* actually be drained.
> + * This ensures that any *overshoot* is correctly left in
> + * ntp_error and will be correctly compensated for over time.
> + */
> + if (tk->skew_delta) {
> + /*
> + * skew_delta is stored pre-divided by HZ, matching time_offset,
> + * so drain it directly. Fold the amount actually drained back
> + * into ntp_error in full clock units (× NTP_INTERVAL_FREQ); any
> + * undrainable overshoot is left in ntp_error to be compensated
> + * by the dithering over subsequent ticks.
> + */
> + s64 drain = tk->skew_delta << shift;
> + s64 unclaimed = ntp_drain_time_offset(tk->id, drain);

bikeshed nit: drain / unclaimed feels like mixing metaphors?
consumed / remaining, maybe?

> +
> + tk->ntp_error += (drain - unclaimed) * NTP_INTERVAL_FREQ;
> + }
> +
> return offset;
> }

I haven't tested with it, but other than the nits this looks ok to me.

thanks!
-john