Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics

From: Andrea Righi

Date: Sat Feb 14 2026 - 14:32:41 EST


On Sat, Feb 14, 2026 at 07:56:12AM -1000, Tejun Heo wrote:
> Hello, Andrea.
>
> On Sat, Feb 14, 2026 at 11:16:34AM +0100, Andrea Righi wrote:
> > I ran more tests and I don't think we can simply rely on p->scx.sticky_cpu.
> >
> > In particular, I don't see how to handle this scenario using only
> > p->scx.sticky_cpu: a task starts an internal migration, a sched_change
> > occurs, and ops.dequeue() gets skipped because p->scx.sticky_cpu >= 0.
>
> Oh, that shouldn't happen, so move_remote_task_to_local_dsq() does the
> following:
>
> 	deactivate_task(src_rq, p, 0);
> 	set_task_cpu(p, cpu_of(dst_rq));
> 	p->scx.sticky_cpu = cpu_of(dst_rq);
>
> 	raw_spin_rq_unlock(src_rq);
> 	raw_spin_rq_lock(dst_rq);
> 	...
> 	activate_task(dst_rq, p, 0);
>
> It *looks* like something can get in while the locks are switched; however,
> the above deactivate_task() does WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING)
> and task_rq_lock() does the following:
>
> 	for (;;) {
> 		raw_spin_lock_irqsave(&p->pi_lock, rf->flags);
> 		rq = task_rq(p);
> 		raw_spin_rq_lock(rq);
> 		/*
> 		 *	move_queued_task()		task_rq_lock()
> 		 *
> 		 *	ACQUIRE (rq->lock)
> 		 *	[S] ->on_rq = MIGRATING		[L] rq = task_rq()
> 		 *	WMB (__set_task_cpu())		ACQUIRE (rq->lock);
> 		 *	[S] ->cpu = new_cpu		[L] task_rq()
> 		 *					[L] ->on_rq
> 		 *	RELEASE (rq->lock)
> 		 *
> 		 * If we observe the old CPU in task_rq_lock(), the acquire of
> 		 * the old rq->lock will fully serialize against the stores.
> 		 *
> 		 * If we observe the new CPU in task_rq_lock(), the address
> 		 * dependency headed by '[L] rq = task_rq()' and the acquire
> 		 * will pair with the WMB to ensure we then also see migrating.
> 		 */
> 		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
> 			rq_pin_lock(rq, rf);
> 			return rq;
> 		}
> 		raw_spin_rq_unlock(rq);
> 		raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
>
> 		while (unlikely(task_on_rq_migrating(p)))
> 			cpu_relax();
> 	}
>
> IOW, TASK_ON_RQ_MIGRATING works like a separate lock that protects the task
> while it's switching RQs, so any operation that uses task_rq_lock(),
> which includes any property change, can't get in between.

Yeah, that makes sense, so the scenario I thought was happening can't
actually happen. Then I guess I'm either missing some ops.dequeue() events
or there's a race somewhere, because I can see tasks being enqueued without
a corresponding ops.dequeue(). I'll add some debugging and keep
investigating.

Thanks!
-Andrea