Re: [PATCH 2/5] sched_ext: Manage the validity of scx_rq_clock

From: Changwoo Min
Date: Tue Nov 26 2024 - 19:42:08 EST

Next message: Masahiro Yamada: "Re: [PATCH v2] [MIPS] Place __kernel_entry at the beginning of text section"
Previous message: Jakob Hauser: "[PATCH] iio: magnetometer: yas530: Use signed integer type for clamp limits"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hello,

On 24. 11. 20. 00:57, Changwoo Min wrote:

Hello,

On 24. 11. 19. 17:17, Peter Zijlstra wrote:

On Tue, Nov 19, 2024 at 10:19:44AM +0900, Changwoo Min wrote:

Let's suppose the following timeline:

T1. rq_lock(rq)
T2. update_rq_clock(rq)
T3. a sched_ext BPF operation
T4. rq_unlock(rq)
T5. a sched_ext BPF operation
T6. rq_lock(rq)
T7. update_rq_clock(rq)

For [T2, T4), we consider that rq clock is valid
(SCX_RQ_CLK_UPDATED is set), so scx_bpf_clock_get_ns calls during
[T2, T4) (including T3) will return the rq clock updated at T2.
Let's think about what we should do for the duration [T4, T7)
when a BPF scheduler can still call scx_bpf_clock_get_ns (T5).
During that duration, we consider the rq clock is invalid
(SCX_RQ_CLK_UPDATED is unset). So when calling
scx_bpf_clock_get_ns at T5, we call sched_clock() to get the
fresh clock.

So the question then becomes, what is T5 doing and is it 'right' for it
to get a fresh clock value.

Please give an example of T5 -- I really don't know this BPF crap much
-- and reason about how the clock should behave.

Here is one example. `scx_central` uses a BPF timer for
preemptive scheduling. In every msec, the timer callback checks
if the currently running tasks exceed their timeslice. At the
beginning of the BPF timer callback (central_timerfn in
scx_central.bpf.c), scx_central gets the current time. When the
BPF timer callback runs, the rq clock could be invalid, the same
as T5. In this case, it is reasonable to return a fresh clock
value rather than returning the old one (T2).

Besides this timer example, scx_bpf_clock_get_ns() can be called
any callbacks defined in `struct sched_ext_ops`. Some callbacks
can be called without holding a rq lock (e.g., ops.cpu_online,
ops.cgroup_init). In these cases, it is reasonable to reutrn a
fresh clock value rather returning the old one.

I wonder if the above example is sufficient for you. If you need
more examples or clarification, please let me know.

Regarding the my following my comment in the previous email, ...

On 24. 11. 19. 10:19, Changwoo Min wrote:

Growing sched_clock_data shouldn't be a problem, it's only 24 bytes,
so we have plenty free bytes there.

Alright. I will change the current implementation and extend
`struct sched_clock_data` to store the `VALID` flag in the next
version.

I found `struct sched_clock_data` is defined only when
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, so I think extending
`struct sched_clock_data` is not an approach approach. Extending
`struct scx_rq` seems the best option after opting out
sched_clock_data. I will make sure the cached clock value and
flag in the scx_rq are in the same cache line to minimize the
cache misses. What do you think?

Thanks!
Changwoo Min

Next message: Masahiro Yamada: "Re: [PATCH v2] [MIPS] Place __kernel_entry at the beginning of text section"
Previous message: Jakob Hauser: "[PATCH] iio: magnetometer: yas530: Use signed integer type for clamp limits"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]