Re: [PATCH v4 4/6] sched_ext: Implement scx_bpf_now_ns()

From: Changwoo Min
Date: Thu Dec 12 2024 - 20:42:20 EST


Hello,

On 24. 12. 11. 17:14, Tejun Heo wrote:
Hello,

I'd roll the preceding two patches into this one.
Sure. I will merge patches 2, 3, 4 into one.

On Mon, Dec 09, 2024 at 03:15:29PM +0900, Changwoo Min wrote:
...
3) Monotonically non-decreasing clock for the same CPU: scx_bpf_now_ns()
guarantees the clock never goes backward when comparing them in the same
CPU. On the other hand, when comparing clocks in different CPUs, there
is no such guarantee -- the clock can go backward. It provides a
monotonically *non-decreasing* clock so that it would provide the same
clock values in two different scx_bpf_now_ns() calls in the same CPU
during the same period of when the rq clock is valid.

We probably should provide helpers to calculate deltas between timestamps
and use them consitently in SCX scheds. e.g. ops.runnable() and
ops.running() can run on different CPUs and it'd be useful and common to
calculate the delta between the two points in time.

If I understand correctly, it should be something similar to
jiffies_delta_to_msecs(). Regarding the API name, what about
scx_time_delta(s64 time_delta) and/or scx_time_diff(u64 time_a,
u64 time_b)?

+ if (!(rq->scx.flags & SCX_RQ_CLK_VALID) ||
+ (rq->scx.prev_clock >= clock)) {

The clocks usually start at zero but it'd still be a good idea to use
time_after64() and friends when comparing the ordering between timestamps.

Sure. I will update the code as suggested.


+ /*
+ * If the rq clock is invalid or goes backward,
+ * start a new rq clock period with a fresh sched_clock_cpu().
+ *
+ * The cached rq clock can go backward because there is a
+ * race with a timer interrupt. Suppose that a timer interrupt

This is not limited to timer interrupts, right? This kfunc can be called
from anywhere including tracepoints for code running in IRQ
Yup, you are right. I will update the comments.



+ * occurred while running scx_bpf_now_ns() *after* reading the
+ * rq clock and *before* comparing the if condition. The timer
+ * interrupt will eventually call a BPF scheduler's ops.tick(),
+ * and the BPF scheduler can call scx_bpf_now_ns(). Since the
+ * scheduler core updates the rq clock before calling
+ * ops.tick(), the scx_bpf_now_ns() call will get the fresh
+ * clock. After handling the timer interrupt, the interrupted

This might be easier to explain with two column table explaning what each
party is doing in what order.
I will beautify the text for readability.


+ * scx_bpf_now_ns() will be resumed, so the if condition will
+ * be compared. In this case, the clock, which was read before
+ * the timer interrupt, will be the same as rq->scx.prev_clock.
+ * When such a case is detected, start a new rq clock period
+ * with a fresh sched_clock_cpu().
+ */
+ clock = sched_clock_cpu(cpu_of(rq));
+ scx_rq_clock_update(rq, clock);

Hmmm... what happens if e.g. a timer ends up performing multiple operations
each going through rq pin/unpin?

That should be okay. After multiple rq pin/unpin operations, the
resumed scx_bpf_now_ns() will found that the prev_clock is
greater (not equal) than the current clock, so it will get the
fresh clock.

Thanks!
Changwoo Min