Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
From: Andrea Righi
Date: Thu Feb 12 2026 - 05:23:18 EST
On Wed, Feb 11, 2026 at 12:37:13PM -1000, Tejun Heo wrote:
> Hello,
>
> On Wed, Feb 11, 2026 at 11:34:54PM +0100, Andrea Righi wrote:
> > > The end result is about the same because whenever we migrate we're sending
> > > it to the local DSQ of the destination CPU, so whether we generate the event
> > > on deactivation of the source CPU or activation on the destination doesn't
> > > make a *whole* lot of difference. However, conceptually, migrations are
> > > internal events. There isn't anything actionable for the BPF scheduler. The
> > > reason why ops.dequeue() should be emitted is not because the task is
> > > changing CPUs (which caused the deactivation) but the fact that it ends up
> > > in a local DSQ afterwards. I think it'll be cleaner both conceptually and
> > > code-wise to emit ops.dequeue() only from dispatch_enqueue() and dequeue
> > > paths.
> >
> > Does this include core scheduler migrations or just SCX-initiated
> > migrations (move_remote_task_to_local_dsq())?
> >
> > Because with core scheduler migrations we trigger ops.enqueue(), so we
> > should also trigger ops.dequeue(). Or we need to send the task straight to
> > local to prevent calling ops.enqueue().
>
> I'm a bit lost. Can you elaborate on core scheduler migrations triggering
> ops.enqueue()?
Alright, let me elaborate on this with a (slightly) fresher brain.
We have two main classes of migrations:
1) Internal SCX-initiated migrations: e.g.,
dispatch_to_local_dsq() -> move_remote_task_to_local_dsq(), or
consume_remote_task() -> move_remote_task_to_local_dsq(), these
are completely internal to SCX and shouldn't trigger
ops.dequeue/enqueue()
2) Core scheduler migrations
- CPU affinity: sched_setaffinity(), cpuset/cgroup mask changes, etc.:
affine_move_task() -> move_queued_task() migrates the task, and we trigger
ops.dequeue(SCX_DEQ_SCHED_CHANGE) on the source and ops.enqueue() on
the target.
- Core scheduling (CONFIG_SCHED_CORE): tasks moved between runqueues
via move_queued_task_locked() to satisfy the core cookie
- NUMA balancing: migrate_task_to() can move an SCX task to another CPU
- CPU hotplug: on CPU down, runnable tasks are pushed off via
__balance_push_cpu_stop() -> __migrate_task()
If we want to skip ops.dequeue() only for internal SCX migrations (and
maybe also for NUMA and hotplug?), then only checking
task_on_rq_migrating(p) is not enough, because that's true for every
migration listed above and we'd skip all of them.
So, we need a way to mark "this migration is internal to SCX", like a new
SCX_TASK_MIGRATING_INTERNAL flag?
The alternative is to always trigger ops.dequeue/enqueue() on every
migration (no flag): even for internal SCX migrations the BPF scheduler
could use the callbacks to track task movements, even though there's
nothing actionable it can do with them. That way we don't need the
additional flag.
Does one of these directions fit better with what you have in mind?
Thanks,
-Andrea