Re: [PATCH 2/5] sched_ext: Manage the validity of scx_rq_clock
From: Changwoo Min
Date: Mon Nov 18 2024 - 20:23:38 EST
Hello,
Thank you for the prompt feedback. I hope the following answers
can clarify most of your doubts.
On 24. 11. 18. 18:41, Peter Zijlstra wrote:
On Mon, Nov 18, 2024 at 12:46:32AM +0900, Changwoo Min wrote:
The main reason to keep the second copy (rq->scx.clock) is that
a BPF scheduler can call scx_bpf_clock_get_ns() at almost any
time in any context, including any of sched_ext operations, BPF
timer callbacks, BPF syscalls, kprobes, and so on.
If it's going to be a BPF wide thing, why is it presented as part of
sched_ext ? That makes no sense.
There is a confusion here. scx_bpf_clock_get_ns() is for BPF
schedulers, not for random BPF programs. In almost all cases, it
will be used in the shced_ext operations, such as ops.running()
and ops.stopping(), to implement scheduling policies. However, if
BPF schedulers use other BPF features, such as BPF timer,
scx_bpf_clock_get_ns() also can be used. For example, scx_lavd uses
a BPF timer for periodic background processing; scx_lavd and
scx_flash use kprobe to trace futex system calls. Also, since
scx_bpf_clock_get_ns() relies on rq lock, it is not meaningful
outside of the BPF schedulers. Hence, it should be a part of
sched_ext.
Another approach would be to extend struct sched_clock_data (in
kernel/sched/clock.c) to store the update flag
(SCX_RQ_CLK_UPDATED). This would be the best regarding the number
of cache line accesses. However, that would be an overkill since
now sched_clock_data stores the sched_ext-specific data.
I thought it would be better to keep sched_ext specific data in
one place, struct scx_rq, for managibility.
What's the purpose of that flag? Why can't BPF use sched_clock_local()
and call it a day?
Let's suppose the following timeline:
T1. rq_lock(rq)
T2. update_rq_clock(rq)
T3. a sched_ext BPF operation
T4. rq_unlock(rq)
T5. a sched_ext BPF operation
T6. rq_lock(rq)
T7. update_rq_clock(rq)
For [T2, T4), we consider that rq clock is valid
(SCX_RQ_CLK_UPDATED is set), so scx_bpf_clock_get_ns calls during
[T2, T4) (including T3) will return the rq clock updated at T2.
Let's think about what we should do for the duration [T4, T7)
when a BPF scheduler can still call scx_bpf_clock_get_ns (T5).
During that duration, we consider the rq clock is invalid
(SCX_RQ_CLK_UPDATED is unset). So when calling
scx_bpf_clock_get_ns at T5, we call sched_clock() to get the
fresh clock.
I think the term `UPDATED` was misleading. I will change it to
`VALID` in the next version.
Growing sched_clock_data shouldn't be a problem, it's only 24 bytes, so
we have plenty free bytes there.
Alright. I will change the current implementation and extend
`struct sched_clock_data` to store the `VALID` flag in the next
version.