Re: [PATCH] ntp: Make sure RTC is synchronized when time goes backwards

From: Thomas Gleixner
Date: Sat Sep 07 2024 - 17:43:06 EST


On Sat, Sep 07 2024 at 21:09, Benjamin ROBIN wrote:
> The "sync_hw_clock" is normally called every 11 minutes when time is

s/The "sync_hw_clock"/sync_hw_clock()/

See: https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#function-references-in-changelogs

> synchronized. This issue is that this periodic timer uses the REALTIME
> clock, so when time moves backwards (the NTP server jumps into the past),
> the next call to "sync_hw_clock" could be realized after a very long

s/the next .../the timer expires late./

And then please explain what the consequence is when it expires
late. See the changelog section of the above link.

> period.

Cute.

> A normal NTP server should not jump in the past like that, but it is
> possible... Another way to reproduce this issue is using phc2sys to
> synchronize the REALTIME clock with for example an IRIG timecode with
> the source always starting at the same date (not synchronized).
>
> This patch cancels the periodic timer on a time jump (ADJ_SETOFFSET).

s/This patch cancels/Cancel/

For explanation:
# git grep 'This patch' Documentation/process

> The timer will be relaunched at the end of "do_adjtimex" if NTP is still
> considered synced. Otherwise the timer will be relaunched later when NTP
> is synced. This way, when the time is synchronized again, the RTC is
> updated after less than 2 seconds.
>
> Signed-off-by: Benjamin ROBIN <dev@xxxxxxxxxxxxx>
> ---
> kernel/time/ntp.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index 8d2dd214ec68..5c8dd92cf012 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -751,6 +751,9 @@ static inline void process_adjtimex_modes(const struct __kernel_timex *txc,
>
> if (txc->modes & (ADJ_TICK|ADJ_FREQUENCY|ADJ_OFFSET))
> ntp_update_frequency();
> +
> + if (txc->modes & ADJ_SETOFFSET)
> + hrtimer_cancel(&sync_hrtimer);

Did you test this with lockdep enabled?

The caller holds timekeeping_lock and has the time keeper sequence write
held. There is an existing dependency chain which is invers. Assume the
sync_hrtimer is queued on a different CPU, CPU1 in this example:

CPU 0 CPU1

lock(&timekeeper_lock); lock_hrtimer_base(CPU1);

write_seqcount_begin(&tk_core.seq); <- Makes tk_core.seq odd

__do_adjtimex()
process_adjtimex_modes() base->get_time()
hrtimer_cancel() do {
hrtimer_try_to_cancel() seq = read_seqcount_begin(&tk_core.seq);
lock_hrtimer_base(CPU1); ^^^
^^^ deadlock Spin waits for tk_core.seq
to become even

You can do that cancel only outside of timekeeper_lock:

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2426,6 +2426,7 @@ int do_adjtimex(struct __kernel_timex *t
{
struct timekeeper *tk = &tk_core.timekeeper;
struct audit_ntp_data ad;
+ bool offset_set = false;
bool clock_set = false;
struct timespec64 ts;
unsigned long flags;
@@ -2448,6 +2449,7 @@ int do_adjtimex(struct __kernel_timex *t
if (ret)
return ret;

+ offset_set = delta.tv_sec < 0;
audit_tk_injoffset(delta);
}

@@ -2481,7 +2483,7 @@ int do_adjtimex(struct __kernel_timex *t
if (clock_set)
clock_was_set(CLOCK_REALTIME);

- ntp_notify_cmos_timer();
+ ntp_notify_cmos_timer(offset_set);

return ret;
}

Now you can fix that up in ntp_notify_cmos_timer() which is outside of
the timekeeper_lock held region for the very same reason and it's the
proper place to do that.

Thanks,

tglx