Re: [PATCH v2] timekeeping: move multigrain timestamp floor handling into timekeeper

From: Jan Kara
Date: Fri Sep 13 2024 - 07:26:18 EST


On Thu 12-09-24 14:02:52, Jeff Layton wrote:
> The kernel test robot reported a performance hit in some will-it-scale
> tests due to the multigrain timestamp patches. My own testing showed
> about a 7% drop in performance on the pipe1_threads test, and the data
> showed that coarse_ctime() was slowing down current_time().
>
> Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> two new public interfaces: The first fills a timespec64 with the later
> of the coarse-grained clock and the floor time, and the second gets a
> fine-grained time and tries to swap it into the floor and fills a
> timespec64 with the result.
>
> The first function returns an opaque cookie that is suitable for passing
> to the second, which will use it as the "old" value in the cmpxchg.
>
> With this patch on top of the multigrain series, the will-it-scale
> pipe1_threads microbenchmark shows these averages on my test rig:
>
> v6.11-rc7: 103561295 (baseline)
> v6.11-rc7 + mgtime + this: 101357203 (~2% performance drop)
>
> Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> Closes: https://lore.kernel.org/oe-lkp/202409091303.31b2b713-oliver.sang@xxxxxxxxx
> Suggested-by: Arnd Bergmann <arnd@xxxxxxxxxx>
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>

One question regarding the cookie handling as well :)

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5391e4167d60..bb039c9d525e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
> .base[1] = FAST_TK_INIT,
> };
>
> +/*
> + * This represents the latest fine-grained time that we have handed out as a
> + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> + * realtime clock on an as-needed basis.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +
> static inline void tk_normalize_xtime(struct timekeeper *tk)
> {
> while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> }
> EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>
> +/**
> + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Adjust floor to realtime and compare it to the coarse time. Fill
> + * @ts with the latest one. Returns opaque cookie suitable to pass
> + * to ktime_get_real_ts64_mg.
> + */
> +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + u64 floor = atomic64_read(&mg_floor);
> + ktime_t f_real, offset, coarse;
> + unsigned int seq;
> +
> + WARN_ON(timekeeping_suspended);
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> + *ts = tk_xtime(tk);
> + offset = *offsets[TK_OFFS_REAL];
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + coarse = timespec64_to_ktime(*ts);
> + f_real = ktime_add(floor, offset);
> + if (ktime_after(f_real, coarse))
> + *ts = ktime_to_timespec64(f_real);
> + return floor;
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> +
> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts: pointer to the timespec to be set
> + * @cookie: opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> + *
> + * Get a current monotonic fine-grained time value and attempt to swap
> + * it into the floor using @cookie as the "old" value. @ts will be
> + * filled with the resulting floor value, regardless of the outcome of
> + * the swap.
> + */
> +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + ktime_t offset, mono, old = (ktime_t)cookie;
> + unsigned int seq;
> + u64 nsecs;

So what would be the difference if we did instead:

old = atomic64_read(&mg_floor);

and not bother with the cookie? AFAIU this could result in somewhat more
updates to mg_floor (the contention on the mg_floor cacheline would be the
same but there would be more invalidates of the cacheline). OTOH these
updates can happen only if max(current_coarse_time, mg_floor) ==
inode->i_ctime which is presumably rare? What is your concern that I'm
missing?

Honza
> +
> + WARN_ON(timekeeping_suspended);
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> +
> + ts->tv_sec = tk->xtime_sec;
> + mono = tk->tkr_mono.base;
> + nsecs = timekeeping_get_ns(&tk->tkr_mono);
> + offset = *offsets[TK_OFFS_REAL];
> + } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> + mono = ktime_add_ns(mono, nsecs);
> + if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> + ts->tv_nsec = 0;
> + timespec64_add_ns(ts, nsecs);
> + } else {
> + *ts = ktime_to_timespec64(ktime_add(old, offset));
> + }
> +
> +}
> +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
> +
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR