Re: [PATCH v8] sched_ext: idle: Refresh idle masks during idle-to-idle transitions

From: Tejun Heo
Date: Fri Jan 10 2025 - 17:44:38 EST


On Fri, Jan 10, 2025 at 11:16:31PM +0100, Andrea Righi wrote:
> With the consolidation of put_prev_task/set_next_task(), see
> commit 436f3eed5c69 ("sched: Combine the last put_prev_task() and the
> first set_next_task()"), we are now skipping the transition between
> these two functions when the previous and the next tasks are the same.
>
> As a result, the scx idle state of a CPU is updated only when
> transitioning to or from the idle thread. While this is generally
> correct, it can lead to uneven and inefficient core utilization in
> certain scenarios [1].
>
> A typical scenario involves proactive wake-ups: scx_bpf_pick_idle_cpu()
> selects and marks an idle CPU as busy, followed by a wake-up via
> scx_bpf_kick_cpu(), without dispatching any tasks. In this case, the CPU
> continues running the idle thread, returns to idle, but remains marked
> as busy, preventing it from being selected again as an idle CPU (until a
> task eventually runs on it and releases the CPU).
>
> For example, running a workload that uses 20% of each CPU, combined with
> an scx scheduler using proactive wake-ups, results in the following core
> utilization:
>
> CPU 0: 25.7%
> CPU 1: 29.3%
> CPU 2: 26.5%
> CPU 3: 25.5%
> CPU 4: 0.0%
> CPU 5: 25.5%
> CPU 6: 0.0%
> CPU 7: 10.5%
>
> To address this, refresh the idle state also in pick_task_idle(), during
> idle-to-idle transitions, but only trigger ops.update_idle() on actual
> state changes to prevent unnecessary updates to the scx scheduler and
> maintain balanced state transitions.
>
> With this change in place, the core utilization in the previous example
> becomes the following:
>
> CPU 0: 18.8%
> CPU 1: 19.4%
> CPU 2: 18.0%
> CPU 3: 18.7%
> CPU 4: 19.3%
> CPU 5: 18.9%
> CPU 6: 18.7%
> CPU 7: 19.3%
>
> [1] https://github.com/sched-ext/scx/pull/1139
>
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Andrea Righi <arighi@xxxxxxxxxx>
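
For readers of the archive, the idea in the "To address this" paragraph
above can be sketched in plain userspace C. This is only an illustrative
model, not the patch itself: idle_mask, notified_idle, notify_count,
pick_idle_cpu() and refresh_idle() are hypothetical names, with
pick_idle_cpu() loosely standing in for scx_bpf_pick_idle_cpu() and
notify_count for calls to ops.update_idle(). It assumes a single bitmask
and no concurrency.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

static unsigned int idle_mask;      /* bit set => CPU can be picked as idle */
static bool notified_idle[NR_CPUS]; /* last state reported to the scheduler */
int notify_count;                   /* stand-in for ops.update_idle() calls */

/* Hypothetical model of picking an idle CPU: claim it by clearing its bit. */
int pick_idle_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (idle_mask & (1u << cpu)) {
			idle_mask &= ~(1u << cpu);
			return cpu;
		}
	}
	return -1;
}

/*
 * Always refresh the mask, including on idle-to-idle transitions, but
 * notify the scheduler only when the reported state actually changes.
 */
void refresh_idle(int cpu, bool idle)
{
	if (idle)
		idle_mask |= 1u << cpu;
	else
		idle_mask &= ~(1u << cpu);

	if (notified_idle[cpu] != idle) {
		notified_idle[cpu] = idle;
		notify_count++;
	}
}

/* Proactive wake-up with nothing dispatched: the CPU goes idle-to-idle. */
int demo_kick_without_dispatch(void)
{
	refresh_idle(0, true);          /* task -> idle: one notification */
	int cpu = pick_idle_cpu();      /* CPU 0 is claimed, marked busy  */
	assert(cpu == 0);
	refresh_idle(cpu, true);        /* idle -> idle: mask refreshed   */
	return pick_idle_cpu();         /* CPU 0 is selectable again      */
}
```

In this model, dropping the second refresh_idle() call leaves CPU 0
marked busy after the kick, so the final pick_idle_cpu() would return -1,
which is the stuck state the commit message describes.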

Applied to sched_ext/for-6.13-fixes.

Thanks.

--
tejun