Re: [RFC PATCH 1/2] time/timekeeping: Fix possible inconsistencies in _COARSE clockids

From: Thomas Gleixner
Date: Sat Mar 15 2025 - 15:23:39 EST


On Fri, Mar 14 2025 at 17:37, John Stultz wrote:
> Now, by design, this negative adjustment should be fine, because
> the logic run from timekeeping_adjust() is done after we
> accumulate approx mult*interval_cycles into xtime_nsec.
> The accumulated (mult*interval_cycles) will be larger then the
> (mult_adj*offset) value subtracted from xtime_nsec, and both
> operations are done together under the tk_core.lock, so the net
> change to xtime_nsec should always be positive.

/should/is/

We better are confident about that :)

> However, do_adjtimex() calls into timekeeping_advance() as well,
> since we want to apply the ntp freq adjustment immediately.
> In this case, we don't return early when the offset is smaller
> then interval_cycles, so we don't end up accumulating any time
> into xtime_nsec. But we do go on to call timekeeping_adjust(),
> which modifies the mult value, and subtracts from xtime_nsec
> to correct for the new mult value.

We don't do anything. :)

> Here because we did not accumulate anything, we have a window
> where the _COARSE clockids that don't utilize the mult*offset
> value, can see an inconsistency.
>
> So to fix this, rework the timekeeping_advance() logic a bit
> so that when we are called from do_adjtimex() and the offset
> is smaller then cycle_interval, that we call
> timekeeping_forward(), to first accumulate the sub-interval
> time into xtime_nsec. Then with no unaccumulated cycles in
> offset, we can do the mult adjustment without worry of the
> subtraction having an impact.

It's a smart solution. I briefly pondered something similar, but I'm not
really fond of the fact, that it causes a clock_was_set() event for no
good reason.

clock_was_set() means that there is a time jump. But that's absolutely
not the case with do_adjtimex() changing the frequency for quick
adjustments. That does not affect continuity at all.

That event causes avoidable overhead in the kernel, but it's also
exposed to user space via timerfd TFD_TIMER_CANCEL_ON_SET.

I have no really strong opinion about that, but the route through
clock_was_set() triggers my semantical mismatch sensors :)

> NOTE: This was implemented as a potential alternative to
> Thomas' approach here:
> https://lore.kernel.org/lkml/87cyej5rid.ffs@tglx/
>
> And similarly, it needs some additional review and testing,
> as it was developed while packing for conference travel.

We can debate that next week over your favourite beverage :)

Have a safe trip!

Thanks,

tglx