Re: Question about ktime_get_mono_fast_ns() non-monotonic behavior

From: Yosry Ahmed
Date: Fri Oct 14 2022 - 00:37:04 EST


On Thu, Oct 13, 2022 at 9:13 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
>
> On Thu, Oct 13, 2022 at 8:47 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > On Thu, Oct 13, 2022 at 8:42 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Oct 13, 2022 at 8:26 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > > > On Thu, Oct 13, 2022 at 7:39 PM John Stultz <jstultz@xxxxxxxxxx> wrote:
> > > > > On Mon, Sep 26, 2022 at 2:18 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > I have a question about ktime_get_mono_fast_ns(), which is used by the
> > > > > > BPF helper bpf_ktime_get_ns() among other use cases. The comment above
> > > > > > this function specifies that there are cases where the observed clock
> > > > > > would not be monotonic.
> > > > > >
> > > > > > I had 2 beginner questions:
> > > > >
> > > > > Thinking about this a bit more, I have my own "beginner question": Why
> > > > > does bpf_ktime_get_ns() need to use the ktime_get_mono_fast_ns()
> > > > > accessor instead of ktime_get_ns()?
> > > > >
> > > > > I don't know enough about the contexts that bpf logic can run, so it's
> > > > > not clear to me and it's not obviously commented either.
> > > >
> > > > I am not the best person to answer this question (the BPF list is
> > > > CC'd, it's full of more knowledgeable people).
> > > >
> > > > My understanding is that because BPF programs can basically be run in
> > > > any context (because they can attach to almost all functions /
> > > > tracepoints in the kernel), the time accessor needs to be safe in all
> > > > contexts.
> > >
> > > Ah. Ok, the tracepoint connection is indeed likely the case. Thanks
> > > for clarifying.
> > >
> > > > Now that I know that ktime_get_mono_fast_ns() can drift significantly,
> > > > I am wondering why we don't just read sched_clock(). Can the
> > > > difference between sched_clock() on different cpus be even higher than
> > > > the potential drift from ktime_get_mono_fast_ns()?
> > >
> > > sched_clock is also lock free and so I think it's possible to have
> > > inconsistencies.
> >
> > Right, I am just trying to figure out which is worse,
> > ktime_get_mono_fast_ns() or sched_clock(). It appears to me that both
> > can be inconsistent, but at least AFAICT sched_clock() can only be
> > inconsistent if read across different cpus, right? It should also be
> > faster (at least in my experimentation).
> >
> > I am wondering if there is a bound on the inconsistency we might
> > observe from sched_clock() if we read it across different cpus, and if
> > there is, how does it compare to ktime_get_mono_fast_ns() in that
> > regard.
>
> Again, I think ktime_get_raw_fast_ns() (so CLOCK_MONOTONIC_RAW) is
> likely to be closer to sched_clock() as neither of them are NTP
> adjusted.
> (Which also likely makes them unusable for the case where timestamps
> are compared with userland CLOCK_MONOTONIC timestamps).
>
> So folks might need a new bpf interface for that.
>
> Also I think folks would want to avoid exporting sched_clock
> timestamps out to userland as they aren't connected to a well defined
> clockid, and may have odd behavior around suspend/resume, etc.

I think I should have described my use case long ago, sorry :) My use
case does not involve exporting any timestamps. It involves using BPF
programs to measure the duration of events, so running one BPF program
before and one after, then subtracting the acquired timestamps.

What I am looking for is a clock that is fast enough for hot paths,
consistent enough across cpus (the two BPF programs may end up running
on different cpus), and safe to read from any context, to satisfy the
general BPF restriction mentioned above.
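
To make the subtraction step concrete, here is a rough user-space
sketch, not actual BPF code. The helper names read_clock_ns() and
elapsed_ns() are made up for illustration, and clock_gettime() with a
clockid only stands in for whatever timestamp the BPF helper would
return. The clamp covers the case where the second read appears to be
earlier than the first, e.g. because the two reads happened on
different cpus:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: read the given clock in nanoseconds.
 * Stand-in for the timestamp a BPF program would obtain; returns 0
 * if the clock cannot be read.
 */
static uint64_t read_clock_ns(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: duration between two timestamps.
 * If the end timestamp is smaller than the start (the clock looked
 * non-monotonic between the two reads), report 0 instead of letting
 * the unsigned subtraction wrap around to a huge value.
 */
static uint64_t elapsed_ns(uint64_t start_ns, uint64_t end_ns)
{
	return end_ns >= start_ns ? end_ns - start_ns : 0;
}
```

The "before" program would store read_clock_ns(CLOCK_MONOTONIC) and the
"after" program would pass that stored value and a fresh read to
elapsed_ns(). Clamping hides the inconsistency rather than bounding it,
so it does not answer the question of which clock is more consistent.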

>
> thanks
> -john