Re: [PATCH 2/5] sched_ext: Manage the validity of scx_rq_clock

From: Peter Zijlstra
Date: Tue Nov 19 2024 - 03:17:54 EST


On Tue, Nov 19, 2024 at 10:19:44AM +0900, Changwoo Min wrote:

> > What's the purpose of that flag? Why can't BPF use sched_clock_local()
> > and call it a day?
>
> Let's suppose the following timeline:
>
> T1. rq_lock(rq)
> T2. update_rq_clock(rq)
> T3. a sched_ext BPF operation
> T4. rq_unlock(rq)
> T5. a sched_ext BPF operation
> T6. rq_lock(rq)
> T7. update_rq_clock(rq)
>
> For [T2, T4), we consider that rq clock is valid
> (SCX_RQ_CLK_UPDATED is set), so scx_bpf_clock_get_ns calls during
> [T2, T4) (including T3) will return the rq clock updated at T2.
> Let's think about what we should do for the duration [T4, T7)
> when a BPF scheduler can still call scx_bpf_clock_get_ns (T5).
> During that duration, we consider the rq clock is invalid
> (SCX_RQ_CLK_UPDATED is unset). So when calling
> scx_bpf_clock_get_ns at T5, we call sched_clock() to get the
> fresh clock.
>
> I think the term `UPDATED` was misleading. I will change it to
> `VALID` in the next version.
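
To check that I read the timeline right, here it is as a userspace toy in
C. Every model_* name and fake_now are made up for illustration; this is
not the patch and not kernel code, it only mimics the valid/invalid
behaviour described above, with fake_now standing in for sched_clock().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: none of these names are kernel symbols. */
struct model_rq {
	uint64_t clock;       /* value latched by the last update */
	bool     clock_valid; /* models SCX_RQ_CLK_UPDATED (to be renamed VALID) */
};

static uint64_t fake_now; /* stand-in for sched_clock() */

static uint64_t model_sched_clock(void)
{
	return fake_now;
}

/* T2/T7: latch the clock and mark it valid. */
static void model_update_rq_clock(struct model_rq *rq)
{
	rq->clock = model_sched_clock();
	rq->clock_valid = true;
}

/* T4: dropping the lock invalidates the latched clock. */
static void model_rq_unlock(struct model_rq *rq)
{
	rq->clock_valid = false;
}

/* T3/T5: latched value while valid, fresh read otherwise. */
static uint64_t model_clock_get_ns(const struct model_rq *rq)
{
	return rq->clock_valid ? rq->clock : model_sched_clock();
}

/* Walk T1..T7 with made-up timestamps; report what T3 and T5 observe. */
uint64_t model_run_timeline(uint64_t *t3, uint64_t *t5)
{
	struct model_rq rq = { 0, false };

	fake_now = 100;
	model_update_rq_clock(&rq);    /* T1, T2 */
	fake_now = 150;
	*t3 = model_clock_get_ns(&rq); /* T3: still the T2 value */
	model_rq_unlock(&rq);          /* T4 */
	fake_now = 200;
	*t5 = model_clock_get_ns(&rq); /* T5: fresh read */
	fake_now = 250;
	model_update_rq_clock(&rq);    /* T6, T7 */
	return model_clock_get_ns(&rq);
}
```

In this toy T3 sees the value latched at T2 and T5 sees whatever the fresh
read returns at that moment.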

So the reason rq->clock is tied to rq->lock is to ensure a scheduling
operation happens at a single point in time.

Take re-nice: you dequeue the task, you modify its properties
(weight) and then you requeue it. If time were passing 'normally' the
task would lose the time between dequeue and enqueue -- this is not
right.
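
Roughly, as a userspace toy in C. All names here are invented and the
bookkeeping is far simpler than anything in the kernel (the weights are
arbitrary example values); it only shows why dequeue and enqueue want to
see the same clock value.

```c
#include <stdint.h>

/* Toy model only: these names are invented, not kernel symbols. */
struct model_task {
	uint64_t stamp;        /* clock value at the last accounting point */
	uint64_t accounted_ns; /* time the task has been charged for so far */
	unsigned int weight;
};

/* Settle the accounting up to 'now' and take the task off the queue. */
static void model_dequeue(struct model_task *p, uint64_t now)
{
	p->accounted_ns += now - p->stamp;
	p->stamp = now;
}

/* Put the task back; accounting restarts from 'now'. */
static void model_enqueue(struct model_task *p, uint64_t now)
{
	p->stamp = now;
}

/*
 * Re-nice with the clock reading t_deq at dequeue and t_enq at enqueue.
 * Returns the nanoseconds that passed between the two and were charged
 * nowhere. The task starts at time 0 in this model.
 */
uint64_t model_renice_lost_ns(uint64_t t_deq, uint64_t t_enq)
{
	struct model_task p = { .stamp = 0, .accounted_ns = 0, .weight = 1024 };

	model_dequeue(&p, t_deq);
	p.weight = 820;           /* the actual property change */
	model_enqueue(&p, t_enq);

	/* clock time covered since 0, minus what was actually accounted */
	return p.stamp - p.accounted_ns;
}
```

With one clock value for the whole operation, say
model_renice_lost_ns(1000, 1000), nothing is lost. With a fresh read at
each step, say model_renice_lost_ns(1000, 1007), the 7ns in between
vanish from the task's accounting.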

The only obvious exception here is a migration.

So the question then becomes, what is T5 doing and is it 'right' for it
to get a fresh clock value.

Please give an example of T5 -- I really don't know this BPF crap much
-- and reason about how the clock should behave.