Re: [PATCH v2] timekeeping: move multigrain timestamp floor handling into timekeeper
From: John Stultz
Date: Thu Sep 12 2024 - 16:11:56 EST
On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> The kernel test robot reported a performance hit in some will-it-scale
> tests due to the multigrain timestamp patches. My own testing showed
> about a 7% drop in performance on the pipe1_threads test, and the data
> showed that coarse_ctime() was slowing down current_time().
So, you provided some useful detail about why coarse_ctime() was slow
in your reply earlier, but it would be good to preserve that in the
commit message here.
> Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> two new public interfaces: The first fills a timespec64 with the later
> of the coarse-grained clock and the floor time, and the second gets a
> fine-grained time and tries to swap it into the floor and fills a
> timespec64 with the result.
>
> The first function returns an opaque cookie that is suitable for passing
> to the second, which will use it as the "old" value in the cmpxchg.
The cookie usage isn't totally clear to me right off. It feels a bit
more subtle then I'd expect.
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5391e4167d60..bb039c9d525e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
> .base[1] = FAST_TK_INIT,
> };
>
> +/*
> + * This represents the latest fine-grained time that we have handed out as a
> + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> + * realtime clock on an as-needed basis.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +
So I do really like this general approach of having an internal floor
value combined with special coarse/fine grained assessors that work
with the floor, so we're not impacting the normal hotpath logic
(basically I was writing up a suggestion to this effect to the thread
with Arnd when I realized you had follow up patch in my inbox).
> static inline void tk_normalize_xtime(struct timekeeper *tk)
> {
> while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> }
> EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>
> +/**
> + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Adjust floor to realtime and compare it to the coarse time. Fill
> + * @ts with the latest one. Returns opaque cookie suitable to pass
> + * to ktime_get_real_ts64_mg.
> + */
> +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + u64 floor = atomic64_read(&mg_floor);
> + ktime_t f_real, offset, coarse;
> + unsigned int seq;
> +
> + WARN_ON(timekeeping_suspended);
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> + *ts = tk_xtime(tk);
> + offset = *offsets[TK_OFFS_REAL];
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + coarse = timespec64_to_ktime(*ts);
> + f_real = ktime_add(floor, offset);
> + if (ktime_after(f_real, coarse))
> + *ts = ktime_to_timespec64(f_real);
> + return floor;
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
Generally this looks ok to me.
> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts: pointer to the timespec to be set
> + * @cookie: opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> + *
> + * Get a current monotonic fine-grained time value and attempt to swap
> + * it into the floor using @cookie as the "old" value. @ts will be
> + * filled with the resulting floor value, regardless of the outcome of
> + * the swap.
> + */
Again this cookie argument usage and the behavior of this function
isn't very clear to me.
> +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + ktime_t offset, mono, old = (ktime_t)cookie;
> + unsigned int seq;
> + u64 nsecs;
> +
> + WARN_ON(timekeeping_suspended);
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> +
> + ts->tv_sec = tk->xtime_sec;
> + mono = tk->tkr_mono.base;
> + nsecs = timekeeping_get_ns(&tk->tkr_mono);
> + offset = *offsets[TK_OFFS_REAL];
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + mono = ktime_add_ns(mono, nsecs);
> + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> + ts->tv_nsec = 0;
> + timespec64_add_ns(ts, nsecs);
> + } else {
> + *ts = ktime_to_timespec64(ktime_add(old, offset));
> + }
> +
> +}
> +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
So initially I was expecting this to look something like (sorry for
the whitespace damage here):
{
do {
seq = read_seqcount_begin(&tk_core.seq);
ts->tv_sec = tk->xtime_sec;
mono = tk->tkr_mono.base;
nsecs = timekeeping_get_ns(&tk->tkr_mono);
offset = *offsets[TK_OFFS_REAL];
} while (read_seqcount_retry(&tk_core.seq, seq));
mono = ktime_add_ns(mono, nsecs);
do {
old = atomic64_read(&mg_floor);
if (floor >= mono)
break;
} while(!atomic64_try_cmpxchg(&mg_floor, old, mono);
ts->tv_nsec = 0;
timespec64_add_ns(ts, nsecs);
}
Where you read the tk data, atomically update the floor (assuming it's
not in the future) and then return the finegrained value, not needing
to manage a cookie value.
But instead, it seems like if something has happened since the cookie
value was saved (another cpu getting a fine grained timestamp), your
ktime_get_real_ts64_mg() will fall back to returning the same coarse
grained time saved to the cookie, as if no time had past?
It seems like that could cause problems:
cpu1 cpu2
--------------------------------------------------------------------------
t2a = ktime_get_coarse_real_ts64_mg
t1a = ktime_get_coarse_real_ts64_mg()
t1b = ktime_get_real_ts64_mg(t1a)
t2b = ktime_get_real_ts64_mg(t2a)
Where t2b will seem to be before t1b, even though it happened afterwards.
thanks
-john