Re: [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Kuba Piecuch
Date: Thu Jun 25 2026 - 05:48:29 EST

Hi Tejun,

On Wed Jun 24, 2026 at 11:55 PM UTC, Tejun Heo wrote:
> put_prev_task_scx() warns when a runnable task drops to a lower
> sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
> otherwise keep it running.
>
> Under core scheduling that assumption is wrong: a forced-idle SMT sibling
> reschedules through the core_pick fast path in pick_next_task(), which
> skips balance() for the CPU, so balance_one() never runs and a runnable

Nit: balance_one() is no longer called from balance(); it's called from the
pick path. So IMO it should read "... skips pick_task_scx() for the CPU, ...".

> task can drop to idle with ENQ_LAST unset. Skip the warning when core
> scheduling is enabled.
>
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> ---
> kernel/sched/ext/ext.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
> index 9c9cb9d08bca..503c4d2105ee 100644
> --- a/kernel/sched/ext/ext.c
> +++ b/kernel/sched/ext/ext.c
> @@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
> * which should trigger an explicit follow-up scheduling event.
> */
> if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
> - WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
> + /* core-sched can force cpu idle while @p is runnable */
> + if (!sched_core_enabled(rq))

Is there a more precise check we could do to determine whether this switch is
due to core-sched forcing the CPU idle? I was thinking about
rq->core->core_forceidle_count, but IIUC that's the core-wide number of CPUs
forced idle, so it's not a reliable signal about any particular CPU.

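To illustrate why a core-wide count is too coarse, here is a userspace toy
model. It is not kernel code: the toy_* names are made up, and I'm assuming
the count tracks siblings that idle despite having runnable tasks.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define TOY_SMT_WIDTH 2

struct toy_cpu {
	int nr_running;   /* runnable tasks queued on this CPU */
	bool picked_idle; /* the core-wide pick chose idle for this CPU */
};

struct toy_core {
	struct toy_cpu cpu[TOY_SMT_WIDTH];
	int forceidle_count; /* core-wide, in the spirit of core_forceidle_count */
};

/* Recompute the core-wide count: siblings that idle despite runnable work. */
static void toy_update_forceidle(struct toy_core *core)
{
	core->forceidle_count = 0;
	for (int i = 0; i < TOY_SMT_WIDTH; i++) {
		struct toy_cpu *c = &core->cpu[i];

		if (c->picked_idle && c->nr_running > 0)
			core->forceidle_count++;
	}
}

/* Core-wide signal: true for every sibling once any one is forced idle. */
static bool toy_core_has_forceidle(const struct toy_core *core)
{
	return core->forceidle_count > 0;
}

/* Hypothetical per-CPU signal: this CPU idles despite runnable work. */
static bool toy_cpu_forced_idle(const struct toy_core *core, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_SMT_WIDTH);
	return core->cpu[cpu].picked_idle && core->cpu[cpu].nr_running > 0;
}
```

With sibling 0 forced idle and sibling 1 still running a task, the core-wide
check is true for both CPUs, while the per-CPU check is true only for
sibling 0. Whether an equivalent per-rq condition is readily available in
put_prev_task_scx() is exactly the part I'm unsure about.
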
> + WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
> do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
> } else {
> do_enqueue_task(rq, p, 0, -1);

This patch made me think a bit about core-sched interactions, and I have
a concern about IMMED tasks staying on the local DSQ when the CPU is forced
idle.

Consider a CPU that is running an SCX task when a remote CPU enqueues an IMMED
task on its local DSQ. If the CPU is then forced idle while the IMMED task is
still on the local DSQ, I wasn't able to quickly convince myself that the
IMMED task will be reenqueued.

Looks like we might need a call to schedule_reenq_local() somewhere in here
(in a separate patch, of course). WDYT?

Thanks,
Kuba