Re: [PATCH sched_ext/for-6.12-fixes] Disable SM_IDLE/rq empty path when scx_enabled

From: K Prateek Nayak
Date: Tue Sep 24 2024 - 23:18:55 EST


Hello Tejun,

On 9/25/2024 3:51 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 24, 2024 at 09:10:02AM +0530, K Prateek Nayak wrote:
>>>  	prev_state = READ_ONCE(prev->__state);
>>>  	if (sched_mode == SM_IDLE) {
>>> -		if (!rq->nr_running) {
>>> +		/* SCX must consult the BPF scheduler to tell if rq is empty */
>>
>> I was wondering if sched_ext case could simply do:
>>
>> 	if (scx_enabled())
>> 		prev_balance(rq, prev, rf);
>>
>> and use "rq->scx.flags" to skip balancing in balance_scx() later when
>> __pick_next_task() calls prev_balance() but (and please correct me if
>> I'm wrong here) balance_scx() calls balance_one() which can call
>> consume_dispatch_q() to pick a task from global / user-defined dispatch
>> queue, and in doing so, it does not update "rq->nr_running".
>
> Hmm... would that be a meaningful optimization? prev_balance() calls into
> SCX's dispatch path and there can be quite a bit going on there. I'm not
> sure whether it'd be worth much to save a trip through __pick_next_task().

Probably not worth it, given that balance_scx() is indeed very complex and
can release and re-acquire the rq lock. (I don't believe that should be a
problem in the SM_IDLE path, but given the complexity, I could easily have
missed something again :)
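To make concrete why the quoted diff stops trusting rq->nr_running alone, here is a minimal user-space sketch. It is a toy model under simplifying assumptions, not kernel code: struct toy_rq, struct toy_dsq, toy_park_on_shared() and the sm_idle_skip_pick_*() helpers are hypothetical names. It assumes, as discussed in this thread, that a task accounted in one CPU's rq->nr_running can sit on a shared (global or user-defined) DSQ that another, idle CPU with nr_running == 0 could consume from. In that case the old SM_IDLE shortcut keeps the idle CPU idling, while gating the shortcut on scx_enabled() sends it down the pick/balance path.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy user-space model, NOT kernel code. Names are hypothetical; only the
 * SM_IDLE decision mirrors the quoted diff.
 */
struct toy_rq {
	unsigned int nr_running;	/* tasks accounted to this CPU */
	bool curr_is_idle;		/* is the idle task currently running? */
};

/* A shared (global or user-defined) DSQ that any CPU may consume from. */
struct toy_dsq {
	unsigned int nr;		/* tasks queued on it */
};

/*
 * Pre-patch SM_IDLE shortcut: with nothing accounted to this rq, keep
 * running the idle task and skip the pick/balance path entirely.
 */
static bool sm_idle_skip_pick_old(const struct toy_rq *rq)
{
	assert(rq->curr_is_idle);	/* SM_IDLE is entered from the idle task */
	return rq->nr_running == 0;
}

/*
 * Patched shape: with SCX enabled, nr_running == 0 does not prove there is
 * nothing to run, so always fall through to the pick/balance path.
 */
static bool sm_idle_skip_pick_new(const struct toy_rq *rq, bool scx_enabled)
{
	assert(rq->curr_is_idle);
	return rq->nr_running == 0 && !scx_enabled;
}

/*
 * A task was enqueued on @owner's rq and the BPF scheduler put it on a
 * shared DSQ: @owner accounts for it, but @shared is where it waits.
 */
static void toy_park_on_shared(struct toy_rq *owner, struct toy_dsq *shared)
{
	owner->nr_running++;
	shared->nr++;
}
```

With CPU0 owning a task parked on the global DSQ and CPU1 idle, sm_idle_skip_pick_old(&cpu1) returns true (CPU1 would keep idling although work is waiting), whereas sm_idle_skip_pick_new(&cpu1, true) returns false, so CPU1 reaches the balance path where it could consume the task.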


>> I could only see add_nr_running() being called from enqueue_task_scx(),
>> and that happens even before the ext core calls do_enqueue_task(), which
>> hooks into the BPF layer that decides where the task actually goes.
>>
>> Is my understanding correct that whichever CPU is the initial target of
>> the enqueue_task_scx() callback is the one that accounts the enqueue in
>> "rq->nr_running" until the task is dequeued, or did I miss something?
>
> Whenever a task is dispatched to a CPU's local DSQ, including from
> balance_one(), and the task is not already on that CPU,
> move_remote_task_to_local_dsq() is called, which migrates the task to the
> target CPU by deactivating and then re-activating it. As deactivating and
> re-activating involve dequeueing and re-enqueueing, rq->nr_running gets
> updated accordingly.

Ah! I gave up too soon going down the call chain. Thank you for
clarifying.
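The accounting described above can be sketched as follows, again as a toy user-space model rather than kernel code. The toy_* names are hypothetical stand-ins for add_nr_running(), the dequeue/enqueue pair and move_remote_task_to_local_dsq(); only the bookkeeping is modeled, with no locking, DSQ ordering or BPF callbacks. What it shows is that rq->nr_running follows the task: the CPU that first received enqueue_task_scx() holds the count only until the task is moved, at which point the source rq is decremented and the destination rq incremented.

```c
#include <assert.h>

/*
 * Toy user-space model, NOT kernel code: hypothetical stand-ins for
 * rq->nr_running and move_remote_task_to_local_dsq().
 */
struct toy_rq {
	int cpu;
	unsigned int nr_running;	/* tasks accounted to this CPU */
	unsigned int local_dsq;		/* tasks on this CPU's local DSQ */
};

struct toy_task {
	int cpu;			/* rq the task is accounted to */
};

static void toy_add_nr_running(struct toy_rq *rq)
{
	rq->nr_running++;
}

static void toy_sub_nr_running(struct toy_rq *rq)
{
	assert(rq->nr_running > 0);	/* must have been accounted here */
	rq->nr_running--;
}

/* enqueue_task_scx()-like step: account the task to the rq it lands on. */
static void toy_enqueue(struct toy_rq *rq, struct toy_task *p)
{
	p->cpu = rq->cpu;
	toy_add_nr_running(rq);
}

/* Dequeue-like step: undo the accounting on the task's current rq. */
static void toy_dequeue(struct toy_rq *rq, struct toy_task *p)
{
	assert(p->cpu == rq->cpu);
	toy_sub_nr_running(rq);
}

/*
 * Moving a task to a remote CPU's local DSQ: deactivate on the source
 * (dequeue), then activate on the destination (enqueue), so the
 * nr_running count moves together with the task.
 */
static void toy_move_to_remote_local_dsq(struct toy_rq *src,
					 struct toy_rq *dst,
					 struct toy_task *p)
{
	assert(src != dst);
	toy_dequeue(src, p);
	toy_enqueue(dst, p);
	dst->local_dsq++;
}
```

Enqueuing a task on CPU0 and then moving it to CPU1's local DSQ leaves cpu0.nr_running at 0 and cpu1.nr_running at 1, which matches the dequeue/re-enqueue behaviour described above.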


Thanks.


--
Thanks and Regards,
Prateek