Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

From: Andrea Righi

Date: Thu Feb 12 2026 - 17:30:27 EST

On Thu, Feb 12, 2026 at 08:35:55AM -1000, Tejun Heo wrote:
> Hello, Andrea.
>
> On Thu, Feb 12, 2026 at 07:14:13PM +0100, Andrea Righi wrote:
> ...
> > In ops.enqueue() the BPF scheduler doesn't necessarily pick a target CPU:
> > it can put the task on an arbitrary DSQ or even in some internal BPF data
> > structures. The task is still associated with a runqueue, but only to
> > satisfy a kernel requirement. For sched_ext that association isn't
> > meaningful, because the task isn't really "on" that CPU (in fact,
> > ops.dispatch() can do a "last minute" migration).
>
> Yes.
>
> > Therefore, keeping accurate per-CPU information from the kernel's
> > perspective doesn't buy us much, given that the BPF scheduler can keep
> > tasks in its own queues or structures.
> >
> > Accurate PELT is still doable: the BPF scheduler can track where it puts
> > each task in its own state and update runnable load when it places the
> > task in a DSQ / data structure and when the task leaves (dequeue). And it
> > can use ops.running() / ops.stopping() for utilization.
>
> And the BPF sched might choose to do load aggregation at a different level
> too - e.g. maybe per-CPU load metric doesn't make sense given the machine
> and scheduler and only per-LLC level aggregation would be meaningful, which
> would be true for multiple of the current SCX schedulers given the per-LLC
> DSQ usage.
>
> > And with proper ops.dequeue() semantics, PELT can be driven by the BPF
> > scheduler's own placement and the scx callbacks, not by the specific rq a
> > task is on.
> >
> > If all of the above makes sense for everyone, I agree that we don't need to
> > notify all the internal migrations.
>
> Yeah, I think we're on the same page. BTW, I wonder whether we could use
> p->scx.sticky_cpu to detect internal migrations. It's only used for internal
> migrations, so maybe it can be used for detection.

Perfect. And yes, I think if we set p->scx.sticky_cpu before
deactivate_task() in move_remote_task_to_local_dsq(), then in ops_dequeue()
we should be able to catch internal migrations by checking
task_on_rq_migrating(p) && p->scx.sticky_cpu >= 0.
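
To make the idea concrete, here is a rough userspace model of that check.
This is not kernel code and not the actual patch: struct model_task and the
model_*() helpers are hypothetical names made up for illustration, the
on_rq values are assumed to match kernel/sched/sched.h, and the sketch
assumes that deactivate_task() without DEQUEUE_SLEEP leaves the task in the
"migrating" state.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the p->on_rq states in kernel/sched/sched.h. */
#define TASK_ON_RQ_QUEUED	1
#define TASK_ON_RQ_MIGRATING	2

/* Hypothetical stand-in for the two task fields the check looks at. */
struct model_task {
	int on_rq;	/* models p->on_rq */
	int sticky_cpu;	/* models p->scx.sticky_cpu, -1 when not set */
};

/* Models task_on_rq_migrating(p). */
static bool model_task_on_rq_migrating(const struct model_task *p)
{
	return p->on_rq == TASK_ON_RQ_MIGRATING;
}

/*
 * Models the proposed test in ops_dequeue(): a migration is treated as
 * internal only if the task is migrating AND sticky_cpu was set first.
 */
static bool model_is_internal_migration(const struct model_task *p)
{
	return model_task_on_rq_migrating(p) && p->sticky_cpu >= 0;
}

/*
 * Models the proposed ordering in move_remote_task_to_local_dsq():
 * sticky_cpu is written before the step that stands in for
 * deactivate_task(), so the dequeue path already sees it.
 */
static void model_begin_internal_move(struct model_task *p, int dst_cpu)
{
	p->sticky_cpu = dst_cpu;		/* set first */
	p->on_rq = TASK_ON_RQ_MIGRATING;	/* stand-in for deactivate_task() */
}
```

In this model, a queued task with sticky_cpu == -1 is not flagged, a task
that went through model_begin_internal_move() is flagged, and a task that is
migrating for some other reason (sticky_cpu still -1) is not flagged. The
last case is the one that should still be reported to the BPF scheduler.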

I'll run some tests with that.

Thanks,
-Andrea