Re: [PATCH v2] timekeeping: move multigrain timestamp floor handling into timekeeper
From: Jeff Layton
Date: Thu Sep 12 2024 - 16:18:57 EST
On Thu, 2024-09-12 at 13:11 -0700, John Stultz wrote:
> On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > The kernel test robot reported a performance hit in some will-it-scale
> > tests due to the multigrain timestamp patches. My own testing showed
> > about a 7% drop in performance on the pipe1_threads test, and the data
> > showed that coarse_ctime() was slowing down current_time().
>
> So, you provided some useful detail about why coarse_ctime() was slow
> in your reply earlier, but it would be good to preserve that in the
> commit message here.
>
>
> > Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> > two new public interfaces: The first fills a timespec64 with the later
> > of the coarse-grained clock and the floor time, and the second gets a
> > fine-grained time and tries to swap it into the floor and fills a
> > timespec64 with the result.
> >
> > The first function returns an opaque cookie that is suitable for passing
> > to the second, which will use it as the "old" value in the cmpxchg.
>
> The cookie usage isn't totally clear to me right off. It feels a bit
> more subtle then I'd expect.
>
>
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 5391e4167d60..bb039c9d525e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
> > .base[1] = FAST_TK_INIT,
> > };
> >
> > +/*
> > + * This represents the latest fine-grained time that we have handed out as a
> > + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> > + * realtime clock on an as-needed basis.
> > + */
> > +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> > +
>
> So I do really like this general approach of having an internal floor
> value combined with special coarse/fine grained assessors that work
> with the floor, so we're not impacting the normal hotpath logic
> (basically I was writing up a suggestion to this effect to the thread
> with Arnd when I realized you had follow up patch in my inbox).
>
>
> > static inline void tk_normalize_xtime(struct timekeeper *tk)
> > {
> > while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> > @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> > }
> > EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
> >
> > +/**
> > + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> > + * @ts: timespec64 to be filled
> > + *
> > + * Adjust floor to realtime and compare it to the coarse time. Fill
> > + * @ts with the latest one. Returns opaque cookie suitable to pass
> > + * to ktime_get_real_ts64_mg.
> > + */
> > +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> > +{
> > + struct timekeeper *tk = &tk_core.timekeeper;
> > + u64 floor = atomic64_read(&mg_floor);
> > + ktime_t f_real, offset, coarse;
> > + unsigned int seq;
> > +
> > + WARN_ON(timekeeping_suspended);
> > +
> > + do {
> > + seq = read_seqcount_begin(&tk_core.seq);
> > + *ts = tk_xtime(tk);
> > + offset = *offsets[TK_OFFS_REAL];
> > + } while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > + coarse = timespec64_to_ktime(*ts);
> > + f_real = ktime_add(floor, offset);
> > + if (ktime_after(f_real, coarse))
> > + *ts = ktime_to_timespec64(f_real);
> > + return floor;
> > +}
> > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
>
> Generally this looks ok to me.
>
>
> > +/**
> > + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> > + * @ts: pointer to the timespec to be set
> > + * @cookie: opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> > + *
> > + * Get a current monotonic fine-grained time value and attempt to swap
> > + * it into the floor using @cookie as the "old" value. @ts will be
> > + * filled with the resulting floor value, regardless of the outcome of
> > + * the swap.
> > + */
>
> Again this cookie argument usage and the behavior of this function
> isn't very clear to me.
>
> > +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> > +{
> > + struct timekeeper *tk = &tk_core.timekeeper;
> > + ktime_t offset, mono, old = (ktime_t)cookie;
> > + unsigned int seq;
> > + u64 nsecs;
> > +
> > + WARN_ON(timekeeping_suspended);
> > +
> > + do {
> > + seq = read_seqcount_begin(&tk_core.seq);
> > +
> > + ts->tv_sec = tk->xtime_sec;
> > + mono = tk->tkr_mono.base;
> > + nsecs = timekeeping_get_ns(&tk->tkr_mono);
> > + offset = *offsets[TK_OFFS_REAL];
> > + } while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > + mono = ktime_add_ns(mono, nsecs);
> > + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> > + ts->tv_nsec = 0;
> > + timespec64_add_ns(ts, nsecs);
> > + } else {
> > + *ts = ktime_to_timespec64(ktime_add(old, offset));
> > + }
> > +
> > +}
> > +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
>
>
> So initially I was expecting this to look something like (sorry for
> the whitespace damage here):
> {
> do {
> seq = read_seqcount_begin(&tk_core.seq);
> ts->tv_sec = tk->xtime_sec;
> mono = tk->tkr_mono.base;
> nsecs = timekeeping_get_ns(&tk->tkr_mono);
> offset = *offsets[TK_OFFS_REAL];
> } while (read_seqcount_retry(&tk_core.seq, seq));
>
> mono = ktime_add_ns(mono, nsecs);
> do {
> old = atomic64_read(&mg_floor);
> if (floor >= mono)
> break;
> } while(!atomic64_try_cmpxchg(&mg_floor, old, mono);
> ts->tv_nsec = 0;
> timespec64_add_ns(ts, nsecs);
> }
>
> Where you read the tk data, atomically update the floor (assuming it's
> not in the future) and then return the finegrained value, not needing
> to manage a cookie value.
>
> But instead, it seems like if something has happened since the cookie
> value was saved (another cpu getting a fine grained timestamp), your
> ktime_get_real_ts64_mg() will fall back to returning the same coarse
> grained time saved to the cookie, as if no time had past?
>
> It seems like that could cause problems:
>
> cpu1 cpu2
> --------------------------------------------------------------------------
> t2a = ktime_get_coarse_real_ts64_mg
> t1a = ktime_get_coarse_real_ts64_mg()
> t1b = ktime_get_real_ts64_mg(t1a)
>
> t2b = ktime_get_real_ts64_mg(t2a)
>
> Where t2b will seem to be before t1b, even though it happened afterwards.
>
Ahh no, the subtle thing about atomic64_try_cmpxchg is that it
overwrites "old" with the value that was currently there in the event
that the cmp fails.
So, the try_cmpxchg there will either swap the new value into place, or
if it was updated in the meantime, "old" will now refer to the value
that's currently in the floor word. Either is fine in this case, so we
don't need to retry anything.
--
Jeff Layton <jlayton@xxxxxxxxxx>