Hello,Sure. I will merge patches 2, 3, 4 into one.
I'd roll the preceding two patches into this one.
On Mon, Dec 09, 2024 at 03:15:29PM +0900, Changwoo Min wrote:
...
3) Monotonically non-decreasing clock for the same CPU: scx_bpf_now_ns()
guarantees the clock never goes backward when comparing them in the same
CPU. On the other hand, when comparing clocks in different CPUs, there
is no such guarantee -- the clock can go backward. It provides a
monotonically *non-decreasing* clock so that it would provide the same
clock values in two different scx_bpf_now_ns() calls in the same CPU
during the same period of when the rq clock is valid.
We probably should provide helpers to calculate deltas between timestamps
and use them consitently in SCX scheds. e.g. ops.runnable() and
ops.running() can run on different CPUs and it'd be useful and common to
calculate the delta between the two points in time.
+ if (!(rq->scx.flags & SCX_RQ_CLK_VALID) ||
+ (rq->scx.prev_clock >= clock)) {
The clocks usually start at zero but it'd still be a good idea to use
time_after64() and friends when comparing the ordering between timestamps.
Yup, you are right. I will update the comments.
+ /*
+ * If the rq clock is invalid or goes backward,
+ * start a new rq clock period with a fresh sched_clock_cpu().
+ *
+ * The cached rq clock can go backward because there is a
+ * race with a timer interrupt. Suppose that a timer interrupt
This is not limited to timer interrupts, right? This kfunc can be called
from anywhere including tracepoints for code running in IRQ
I will beautify the text for readability.
+ * occurred while running scx_bpf_now_ns() *after* reading the
+ * rq clock and *before* comparing the if condition. The timer
+ * interrupt will eventually call a BPF scheduler's ops.tick(),
+ * and the BPF scheduler can call scx_bpf_now_ns(). Since the
+ * scheduler core updates the rq clock before calling
+ * ops.tick(), the scx_bpf_now_ns() call will get the fresh
+ * clock. After handling the timer interrupt, the interrupted
This might be easier to explain with two column table explaning what each
party is doing in what order.
+ * scx_bpf_now_ns() will be resumed, so the if condition will
+ * be compared. In this case, the clock, which was read before
+ * the timer interrupt, will be the same as rq->scx.prev_clock.
+ * When such a case is detected, start a new rq clock period
+ * with a fresh sched_clock_cpu().
+ */
+ clock = sched_clock_cpu(cpu_of(rq));
+ scx_rq_clock_update(rq, clock);
Hmmm... what happens if e.g. a timer ends up performing multiple operations
each going through rq pin/unpin?