Re: [PATCH v10 2/7] time: sync read_boot_clock64() with persistent clock
From: Pavel Tatashin
Date: Tue Jun 19 2018 - 17:26:42 EST
On Tue, Jun 19, 2018 at 5:17 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Fri, 15 Jun 2018, Pavel Tatashin wrote:
>
> > read_boot_clock64() returns a boot start timestamp from epoch. Some arches
> > may need to access the persistent clock interface in order to calculate the
> > epoch offset. The resolution of the persistent clock, however, might be
> > low. Therefore, in order to avoid time discrepancies a new argument 'now'
> > is added to read_boot_clock64() parameters. Arch may decide to use it
> > instead of accessing persistent clock again.
>
> I kinda know what you are trying to say, but it's all but obvious.
>
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 4786df904c22..fb6898fab374 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -1502,9 +1502,13 @@ void __weak read_persistent_clock64(struct timespec64 *ts64)
> > * Function to read the exact time the system has been started.
> > * Returns a timespec64 with tv_sec=0 and tv_nsec=0 if unsupported.
> > *
> > + * Argument 'now' contains time from persistent clock to calculate offset from
> > + * epoch. May contain zeros if persist ant clock is not available.
>
> What's a 'persist ant' ?
persistent, but I think the spell checker decided that I was writing
about some ants. :)
>
> > + *
> > * XXX - Do be sure to remove it once all arches implement it.
> > */
> > -void __weak read_boot_clock64(struct timespec64 *ts)
> > +void __weak __init read_boot_clock64(struct timespec64 *now,
> > +                                     struct timespec64 *ts)
> > {
> > ts->tv_sec = 0;
> > ts->tv_nsec = 0;
> > @@ -1535,7 +1539,7 @@ void __init timekeeping_init(void)
> > } else if (now.tv_sec || now.tv_nsec)
> > persistent_clock_exists = true;
> >
> > - read_boot_clock64(&boot);
> > + read_boot_clock64(&now, &boot);
>
> This interface was already bolted on and this extension just makes it
> worse. If 'now' is invalid then you need the same checks as after
> read_persistent_clock() replicated along with conditionals of some sort.
>
> 'boot' time is used to adjust the wall to monotonic offset. So looking at
> the math behind all of that:
>
>      read_persistent_clock(&now);
>      read_boot_clock(&boot);
>
>      tk_set_xtime(tk, now)
>              tk->xtime_sec  = now.sec;
>              tk->xtime_nsec = now.nsec;
>
>      tk_set_wall_to_mono(tk, -boot);
>              tk->wall_to_mono = -boot;
>              tk->offs_real    = boot;
>
>      timekeeping_update(tk)
>              tk->tkr_mono.base = tk->xtime + tk->wall_to_mono;
>
> tkr_mono.base is guaranteed to be >= 0. So you need to make sure that
>
> tk->xtime + tk->wall_to_mono >= 0
>
> Yet another check which you need to do in that magic function. That's just
> wrong. If this grows more users then they all have to copy the same sanity
> checks over and over and half of them will be wrong.
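To make that concrete with toy numbers (a userspace sketch, not kernel
code): if the persistent clock says now = 1000s but a bogus boot clock
reports boot = 2000s, the monotonic base goes negative:

    #include <stdio.h>

    int main(void)
    {
            long long now_sec  = 1000;      /* read_persistent_clock() */
            long long boot_sec = 2000;      /* bogus read_boot_clock() */

            long long xtime        = now_sec;
            long long wall_to_mono = -boot_sec;

            /* tkr_mono.base must stay >= 0; this prints base = -1000 */
            printf("base = %lld\n", xtime + wall_to_mono);
            return 0;
    }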
>
> Fortunately there is only a single real user of read_boot_clock() in the
> tree: S390. By virtue of being S390 there is no worry about any sanity
> checks. It just works.
>
> ARM has the function, but not a single subarch actually implements it, so
> this is easy to fix by removing the function.
>
> So the right thing to do is the following:
>
> Provide a new function:
>
> void __weak read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
>                                                  ktime_t *boot_offset)
> {
>         read_persistent_clock64(wall_time);
> }
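(Note the weak default fills in only the wall time; the caller
pre-initializes *boot_offset to 0 - as in the timekeeping_init() sketch
below - so arches without an override simply get boot time == wall time.)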
>
> Then add the new function to S390:
>
> void read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
>                                           ktime_t *boot_offset)
> {
>         unsigned char clk[STORE_CLOCK_EXT_SIZE];
>         struct timespec64 boot_time, off;
>         __u64 delta;
>
>         read_persistent_clock64(wall_time);
>
>         delta = initial_leap_seconds + TOD_UNIX_EPOCH;
>         memcpy(clk, tod_clock_base, STORE_CLOCK_EXT_SIZE);
>         *(__u64 *) &clk[1] -= delta;
>         if (*(__u64 *) &clk[1] > delta)
>                 clk[0]--;
>         ext_to_timespec64(clk, &boot_time);
>
>         off = timespec64_sub(*wall_time, boot_time);
>         *boot_offset = timespec64_to_ns(&off);
> }
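The clk[0]-- borrow is the subtle part, so here is a little userspace demo
of the same multiword subtraction (illustrative only, ignoring s390 byte
order):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint8_t  top   = 1;     /* stands in for clk[0] */
            uint64_t low   = 5;     /* stands in for the __u64 at &clk[1] */
            uint64_t delta = 16;

            low -= delta;           /* wraps around: 5 - 16 */
            if (low > delta)        /* wrapped result is huge */
                    top--;          /* borrow from the top byte */

            printf("top = %u, low = %#llx\n", (unsigned)top,
                   (unsigned long long)low);
            return 0;
    }

(The '> delta' test presumably relies on sane TOD values never landing near
the wrap boundary; the exact borrow condition would be '> ~0ULL - delta'.)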
>
> Then rework timekeeping_init():
>
> timekeeping_init()
>         struct timespec64 wall_time, wall_to_mono, offs;
>         ktime_t boot_offset = 0;
>
>         read_persistent_wall_and_boot_offset(&wall_time, &boot_offset);
>
>         if (!valid(wall_time))
>                 boot_offset = wall_time.tv_sec = wall_time.tv_nsec = 0;
>         else
>                 persistent_clock = true;
>
>         if (boot_offset > timespec64_to_ns(&wall_time))
>                 offs.tv_sec = offs.tv_nsec = 0;
>         else
>                 offs = ns_to_timespec64(boot_offset);
>
>         wall_to_mono = timespec64_sub(offs, wall_time);
>         tk_set_wall_to_mono(tk, wall_to_mono);
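A quick check of that clamping with toy numbers (userspace sketch, plain
64-bit nanoseconds instead of timespec64):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            int64_t wall_ns  = 3600LL * 1000000000;      /* 1h after the epoch */
            int64_t boot_off = 2 * 3600LL * 1000000000;  /* bogus: 2h since boot */

            if (boot_off > wall_ns)
                    boot_off = 0;   /* clamp instead of going negative */

            int64_t wall_to_mono = boot_off - wall_ns;

            /* base = wall_ns + wall_to_mono = boot_off, always >= 0 */
            printf("wall_to_mono = %lld, base = %lld\n",
                   (long long)wall_to_mono,
                   (long long)(wall_ns + wall_to_mono));
            return 0;
    }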
>
>
> After that remove the read_boot_clock64() leftovers all over the place. And
> then the x86 implementation boils down to:
>
> void read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
>                                           ktime_t *boot_offset)
> {
>         x86_platform.get_wallclock(wall_time);
>         *boot_offset = sched_clock();
> }
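FWIW, the same estimation trick can be tried from userspace, with
CLOCK_MONOTONIC standing in for sched_clock() (so time spent in suspend is
ignored):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
            struct timespec wall, mono;

            clock_gettime(CLOCK_REALTIME, &wall);   /* ~ get_wallclock() */
            clock_gettime(CLOCK_MONOTONIC, &mono);  /* ~ sched_clock() */

            /* wall clock value at (approximately) boot */
            printf("boot ~= %lld s after the epoch\n",
                   (long long)(wall.tv_sec - mono.tv_sec));
            return 0;
    }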
>
> And doing it this way - by adding the extra math to the only architecture
> on the planet which has sane timers - is the right thing to do, because all
> other architectures will have to fall back to tricks similar to x86:
> clocksource/sched_clock() based guesstimation of the time when the machine
> actually reached the kernel.
>
> Hmm?
Sounds good, I will redo this and the next patch with your suggestions.
Thank you,
Pavel