Re: [PATCH 2/3] sched_ext: Implement SCX_ENQ_IMMED
From: Tejun Heo
Date: Fri Mar 13 2026 - 06:40:54 EST
Hello,
On Mon, Mar 09, 2026 at 06:35:37PM +0100, Andrea Righi wrote:
> > diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> > index f8df73044515..cd4272117be4 100644
> > --- a/kernel/sched/ext_internal.h
> > +++ b/kernel/sched/ext_internal.h
> > @@ -31,6 +31,8 @@ enum scx_consts {
> > SCX_BYPASS_LB_MIN_DELTA_DIV = 4,
> > SCX_BYPASS_LB_BATCH = 256,
> >
> > + SCX_REENQ_LOCAL_MAX_REPEAT = 256,
>
> That's a lot of re-enqueues. What if we simply ignore SCX_ENQ_IMMED when
> SCX_ENQ_REENQ is set?
It's meant to be a safety mechanism against system lockup, not a workaround
for BPF scheduler misbehavior.
> This would solve the SCX_OPS_ALWAYS_ENQ_IMMED issue and naturally limit the
> loop to a single retry:
> - first attempt (IMMED) fails -> task re-enqueued with REENQ flag,
> - second attempt sees REENQ -> ignores IMMED check -> queues normally on
> local DSQ.
>
> This approach seems more robust and would avoid the latency overhead of
> repeated failures (the re-enqueues were actually the reason of the latency
> issues that I was experiencing). If I don't use SCX_OPS_ALWAYS_ENQ_IMMED
> and I selectively use SCX_ENQ_IMMED with just one retry I can actually see
> some small, but consistent, benefits with scx_cosmos running some latency
> benchmarks.
The intention is for IMMED to guarantee immediate execution - if IMMED is
set, the task will either get on the CPU or get re-enqueued. In the v2
patchset, this behavior is extended to staying on the CPU. If an IMMED task
is preempted for whatever reason, it gets fully reenqueued instead of e.g.
being silently put back on the local DSQ. The goal is to give the BPF
controller full latency control.
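
To make the intended semantics concrete, here is a minimal user-space
sketch of the decision described above. It is a toy model, not kernel code:
the enum, the function and its parameters are all hypothetical names made
up for illustration, and the real logic has more cases than this.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a local-DSQ dispatch in this toy model. */
enum toy_outcome {
	TOY_RUNS_NOW,           /* the task gets on the CPU right away */
	TOY_WAITS_ON_LOCAL_DSQ, /* the task sits behind whatever holds the CPU */
	TOY_REENQUEUED,         /* the task goes back to the BPF scheduler */
};

/*
 * Assumption: "cpu_can_run_now" stands in for whatever condition the kernel
 * uses to decide that the task can start executing immediately.
 */
static enum toy_outcome toy_local_dispatch(bool cpu_can_run_now, bool immed)
{
	if (cpu_can_run_now)
		return TOY_RUNS_NOW;

	/* With IMMED, never wait silently: hand the task back instead. */
	return immed ? TOY_REENQUEUED : TOY_WAITS_ON_LOCAL_DSQ;
}
```

The point of the model is that an IMMED dispatch has only two outcomes,
running now or being re-enqueued, so the BPF scheduler always learns about
the failure.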
I don't think it makes sense to paper over IMMED failures. The BPF scheduler
shouldn't be doing that in the first place. If the CPU is not available and
the scheduler keeps requesting IMMED dispatch of a task on that CPU, the
scheduler is buggy. Is cosmos doing DSQ_LOCAL dispatch on single-CPU bound
tasks? If so, it shouldn't use ALWAYS_IMMED. Instead, it should only mark
dispatches that know the target CPU to be available (IOW, claimed idle) with
SCX_ENQ_IMMED. I don't think that's too much of a burden.
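
As a rough illustration of "only mark IMMED when the CPU was claimed idle",
here is a self-contained user-space sketch. Everything prefixed with toy_ or
TOY_ is a hypothetical stand-in (including the flag value and the idle
bitmap), so treat it as the shape of the logic rather than actual sched_ext
API usage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real flag is defined by the kernel patch. */
#define TOY_ENQ_IMMED	(1ULL << 0)
#define TOY_NR_CPUS	4

/* Toy idle state, one entry per CPU. */
static bool toy_cpu_idle[TOY_NR_CPUS];

/* Mark a CPU idle or busy in the toy model. */
static void toy_set_idle(int cpu, bool idle)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_cpu_idle[cpu] = idle;
}

/* Claim @cpu if it is idle: succeeds at most once per idle period. */
static bool toy_claim_idle(int cpu)
{
	if (cpu < 0 || cpu >= TOY_NR_CPUS || !toy_cpu_idle[cpu])
		return false;
	toy_cpu_idle[cpu] = false;
	return true;
}

/*
 * Add the IMMED flag only when the target CPU was successfully claimed
 * idle; otherwise leave the caller's flags untouched.
 */
static uint64_t toy_pick_enq_flags(int cpu, uint64_t enq_flags)
{
	if (toy_claim_idle(cpu))
		return enq_flags | TOY_ENQ_IMMED;
	return enq_flags;
}
```

With this shape, a second dispatch to the same CPU before it goes idle again
does not get IMMED, so it can't end up in a re-enqueue loop against a busy
CPU.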
Thanks.
--
tejun