Re: [PATCH v2] sched_ext: Honor SCX_OPS_ALWAYS_ENQ_IMMED on framework-internal goto-local paths
From: Andrea Righi
Date: Mon Apr 20 2026 - 01:23:50 EST
Hi zhidao,
On Mon, Apr 20, 2026 at 11:56:46AM +0800, zhidao su wrote:
> SCX_OPS_ALWAYS_ENQ_IMMED promises that SCX_ENQ_IMMED is set on all local
> DSQ enqueues. scx_vet_enq_flags() enforces this for BPF kfunc callers
> (scx_bpf_dsq_insert, scx_bpf_dsq_move_*), but the framework-internal
> goto-local paths in do_enqueue_task() — PF_EXITING, migration-disabled,
> and !scx_rq_online fallbacks — bypass scx_vet_enq_flags() entirely.
>
> When a scheduler sets SCX_OPS_ALWAYS_ENQ_IMMED, tasks hitting these
> goto-local paths arrive at dispatch_enqueue() without SCX_ENQ_IMMED in
> enq_flags, violating the flag's documented semantics.
>
> This can be observed with trace_printk instrumentation at the enqueue:
> label while running a multi-threaded fork-exit workload under a scheduler
> with SCX_OPS_ALWAYS_ENQ_IMMED:
>
> Before (scx_simple + ALWAYS_ENQ_IMMED, 2 CPUs, mmap-contention exit):
> 95 PF_EXITING local enqueues, 95/95 IMMED=0 ALW=1 <-- bug
>
> After:
> 1030 PF_EXITING local enqueues, 1030/1030 IMMED=1 ALW=1 <-- fixed
>
> Fix by checking SCX_OPS_ALWAYS_ENQ_IMMED at the enqueue: label and
> setting SCX_ENQ_IMMED when dispatching to a local DSQ. This mirrors
> what scx_vet_enq_flags() does for BPF callers.
>
> Fixes: 3229ac4a5ef5 ("sched_ext: Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag")
> Signed-off-by: zhidao su <suzhidao@xxxxxxxxxx>
> ---
> v2: Resend to correct a submission error in v1 where unrelated files
> were accidentally included in the patch. The code change is
> identical; only kernel/sched/ext.c is modified. Apologies for
> the noise.
> ---
> kernel/sched/ext.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 9628c64e5592..0758f5e5a8f0 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -1859,7 +1859,9 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
> * Clear persistent TASK_IMMED for fresh enqueues, see dsq_inc_nr().
> * Note that exiting and migration-disabled tasks that skip
> * ops.enqueue() below will lose IMMED protection unless
> - * %SCX_OPS_ENQ_EXITING / %SCX_OPS_ENQ_MIGRATION_DISABLED are set.
> + * %SCX_OPS_ENQ_EXITING / %SCX_OPS_ENQ_MIGRATION_DISABLED are set,
> + * or %SCX_OPS_ALWAYS_ENQ_IMMED is enabled (which re-applies IMMED
> + * at the enqueue: label below).
> */
> p->scx.flags &= ~SCX_TASK_IMMED;
>
> @@ -1949,6 +1951,17 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
> */
> touch_core_sched(rq, p);
> refill_task_slice_dfl(sch, p);
> +
> + /*
> + * Honor %SCX_OPS_ALWAYS_ENQ_IMMED for framework-internal local DSQ
> + * enqueues (PF_EXITING, migration-disabled, !online fallbacks).
> + * scx_vet_enq_flags() already handles this for BPF kfunc callers,
> + * but the goto-local paths above bypass it.
> + */
> + if ((sch->ops.flags & SCX_OPS_ALWAYS_ENQ_IMMED) &&
> + dsq == &rq->scx.local_dsq)
> + enq_flags |= SCX_ENQ_IMMED;
> +
I'm not sure this should be applied across all fallback cases. It's probably
safer to avoid triggering re-enqueues for these internal events, especially
since the BPF scheduler has no visibility into them.
If we do this, we should also update the %SCX_ENQ_IMMED documentation, which says:
* Exiting and migration-disabled tasks bypass ops.enqueue() and
* are placed directly on a local DSQ without IMMED protection
* unless %SCX_OPS_ENQ_EXITING and %SCX_OPS_ENQ_MIGRATION_DISABLED
* are set respectively.
But again, do we actually want to do this? If %SCX_OPS_ENQ_EXITING and
%SCX_OPS_ENQ_MIGRATION_DISABLED aren't set, these cases are handled internally
by the sched_ext core, so triggering a re-enqueue seems unnecessary; the BPF
scheduler wouldn't have visibility into such events anyway.
> dispatch_enqueue(sch, rq, dsq, p, enq_flags);
> }
>
> --
> 2.43.0
>
Thanks,
-Andrea