Re: [PATCH v7 01/11] timekeeping: move multigrain timestamp floor handling into timekeeper

From: Jeff Layton
Date: Fri Sep 13 2024 - 15:07:20 EST


On Fri, 2024-09-13 at 11:59 -0700, John Stultz wrote:
> On Fri, Sep 13, 2024 at 6:54 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > For multigrain timestamps, we must keep track of the latest timestamp
> > that has ever been handed out, and never hand out a coarse time below
> > that value.
> >
> > Add a static singleton atomic64_t into timekeeper.c that we can use to
> > keep track of the latest fine-grained time ever handed out. This is
>
> Maybe drop "ever" and add "handed out through a specific interface",
> as timestamps can be accessed in a lot of ways that don't keep track
> of what was returned.
>

Will do. I'll make it clear that this only applies to the *_mg
interfaces.

>
> > tracked as a monotonic ktime_t value to ensure that it isn't affected by
> > clock jumps.
> >
> > Add two new public interfaces:
> >
> > - ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the
> > coarse-grained clock and the floor time
> >
> > - ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries
> > to swap it into the floor. A timespec64 is filled with the result.
> >
> > Since the floor is global, we take great pains to avoid updating it
> > unless it's absolutely necessary. If we do the cmpxchg and find that the
> > value has been updated since we fetched it, then we discard the
> > fine-grained time that was fetched in favor of the recent update.
> >
> > To maximize the window of this occurring when multiple tasks are racing
> > to update the floor, ktime_get_coarse_real_ts64_mg returns a cookie
> > value that represents the state of the floor tracking word, and
> > ktime_get_real_ts64_mg accepts a cookie value that it uses as the "old"
> > value when calling cmpxchg().
> >
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> > include/linux/timekeeping.h | 4 +++
> > kernel/time/timekeeping.c | 81 +++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 85 insertions(+)
> >
> > diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> > index fc12a9ba2c88..cf2293158c65 100644
> > --- a/include/linux/timekeeping.h
> > +++ b/include/linux/timekeeping.h
> > @@ -45,6 +45,10 @@ extern void ktime_get_real_ts64(struct timespec64 *tv);
> > extern void ktime_get_coarse_ts64(struct timespec64 *ts);
> > extern void ktime_get_coarse_real_ts64(struct timespec64 *ts);
> >
> > +/* Multigrain timestamp interfaces */
> > +extern u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts);
> > +extern void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie);
> > +
> > void getboottime64(struct timespec64 *ts);
> >
> > /*
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 5391e4167d60..ee11006a224f 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
> > .base[1] = FAST_TK_INIT,
> > };
> >
> > +/*
> > + * This represents the latest fine-grained time that we have handed out as a
> > + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> > + * realtime clock on an as-needed basis.
> > + */
> > +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> > +
> > static inline void tk_normalize_xtime(struct timekeeper *tk)
> > {
> > while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> > @@ -2394,6 +2401,80 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> > }
> > EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
> >
> > +/**
> > + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> > + * @ts: timespec64 to be filled
> > + *
> > + * Adjust floor to realtime and compare it to the coarse time. Fill
> > + * @ts with the latest one. Returns opaque cookie suitable for passing
> > + * to ktime_get_real_ts64_mg().
> > + */
> > +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> > +{
> > +	struct timekeeper *tk = &tk_core.timekeeper;
> > +	u64 floor = atomic64_read(&mg_floor);
> > +	ktime_t f_real, offset, coarse;
> > +	unsigned int seq;
> > +
> > +	WARN_ON(timekeeping_suspended);
> > +
> > +	do {
> > +		seq = read_seqcount_begin(&tk_core.seq);
> > +		*ts = tk_xtime(tk);
> > +		offset = *offsets[TK_OFFS_REAL];
> > +	} while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +	coarse = timespec64_to_ktime(*ts);
> > +	f_real = ktime_add(floor, offset);
> > +	if (ktime_after(f_real, coarse))
> > +		*ts = ktime_to_timespec64(f_real);
> > +	return floor;
> > +}
> > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> > +
> > +/**
> > + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> > + * @ts: pointer to the timespec to be set
> > + * @cookie: opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> > + *
> > + * Get a current monotonic fine-grained time value and attempt to swap
> > + * it into the floor using @cookie as the "old" value. @ts will be
> > + * filled with the resulting floor value, regardless of the outcome of
> > + * the swap.
>
> I'd add more detail here to clarify that this can return a coarse
> floor value if the cookie is stale.
>

Sure, or I can just drop the cookie, if that's better.

> > +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> > +{
> > +	struct timekeeper *tk = &tk_core.timekeeper;
> > +	ktime_t offset, mono, old = (ktime_t)cookie;
> > +	unsigned int seq;
> > +	u64 nsecs;
> > +
> > +	WARN_ON(timekeeping_suspended);
> > +
> > +	do {
> > +		seq = read_seqcount_begin(&tk_core.seq);
> > +
> > +		ts->tv_sec = tk->xtime_sec;
> > +		mono = tk->tkr_mono.base;
> > +		nsecs = timekeeping_get_ns(&tk->tkr_mono);
> > +		offset = *offsets[TK_OFFS_REAL];
> > +	} while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +	mono = ktime_add_ns(mono, nsecs);
> > +
> > +	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> > +		ts->tv_nsec = 0;
> > +		timespec64_add_ns(ts, nsecs);
> > +	} else {
> > +		/*
> > +		 * Something has changed mg_floor since "old" was
> > +		 * fetched. That value is just as valid, so accept it.
> > +		 */
>
> Mostly because I embarrassingly tripped over this in front of
> everyone, I might suggest:
> /*
> * mg_floor was updated since the cookie was fetched, so the
> * try_cmpxchg failed. However, try_cmpxchg updated old
> * with the current mg_floor, so use that to return the current
> * coarse floor value.
> */
>
> :)

Will do. I've already added some comments to that effect that should
help clarify things.

> Additionally, for these two new interfaces, since they are so
> specifically tuned to this particular need in the vfs, it might be
> good to add a comment in the kerneldoc here that they are special
> case interfaces for the vfs and should be avoided outside that space.
>
> That probably would alleviate my main worries, and we can polish the
> details around cookie or no cookie later if needed.
>

Will do.

Thanks for the review!
--
Jeff Layton <jlayton@xxxxxxxxxx>