Re: [PATCH 01/10] sched/core: Skip migration disabled tasks in proxy execution

From: K Prateek Nayak

Date: Wed May 06 2026 - 23:35:25 EST

Hello John, Andrea,

(Full disclaimer: I haven't looked at the entire series)

On 5/7/2026 2:39 AM, John Stultz wrote:
>> + /*
>> + * Tasks pinned to a single CPU (per-CPU kthreads via
>> + * kthread_bind(), tasks under migrate_disable()) cannot
>> + * be moved to @owner_cpu. proxy_migrate_task() uses
>> + * __set_task_cpu() which would silently violate the
>> + * pinning and leave the task to run on a CPU outside
>> + * its cpus_ptr once it is unblocked. Stay on this CPU
>> + * via force_return; the owner running elsewhere will
>> + * wake @p back up when the mutex becomes available.
>> + */
>> + if (p->nr_cpus_allowed == 1 || is_migration_disabled(p))
>> + goto force_return;
>> goto migrate_task;
>
> Hey Andrea!
> I'm excited to see this series! Thanks for your efforts here!
>
> Though I'm a bit confused on this patch. I see the patch changes it
> so we don't proxy-migrate pinned/migration-disabled tasks, but I'm
> not sure I understand why.
>
> We only proxy-migrate blocked_on tasks, which don't run on the cpu
> they are migrated to (they are only migrated to be used as a donor).
> That's why we have the proxy_force_return() function to return-migrate
> them back when they do become runnable.

I agree this shouldn't be a problem from core perspective but there
are some interesting sched-ext interactions possible. More on that
below:

>
> Could you provide some more details about what motivated this change
> (ie: how you tripped a problem that it resolved?).

I think ops.enqueue() always assumes that the task being enqueued is
runnable on task_cpu(). When the sched-ext layer then tries to
dispatch this task to the local DSQ, the ext core complains and marks
the sched-ext scheduler as buggy.
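To make this concrete, here is a rough userspace model of the affinity
requirement on local dispatch. It is not ext.c code: struct model_task,
local_dispatch_ok_today() and local_dispatch_ok_proxy() are made-up
names, the CPU mask is simplified to a 64-bit bitmap, and the second
predicate encodes the relaxation proposed in the flow further below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for task_struct; not the kernel layout. */
struct model_task {
	uint64_t cpus_allowed;	/* bit N set => task may run on CPU N */
	int cpu;		/* models task_cpu(p) */
	bool is_blocked;	/* blocked on a mutex, i.e. only usable as a donor */
};

static bool cpu_allowed(const struct model_task *p, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return p->cpus_allowed & (UINT64_C(1) << cpu);
}

/* Simplified model of today's rule: local dispatch must respect affinity. */
static bool local_dispatch_ok_today(const struct model_task *p, int cpu)
{
	return cpu_allowed(p, cpu);
}

/*
 * Proposed relaxation: a blocked donor may sit on the local DSQ of the CPU
 * proxy_migrate_task() moved it to, even outside its affinity, because it
 * never executes there; only the lock owner runs on its behalf.
 */
static bool local_dispatch_ok_proxy(const struct model_task *p, int cpu)
{
	return cpu_allowed(p, cpu) || (p->is_blocked && cpu == p->cpu);
}
```

A donor pinned to CPU0 but proxy-migrated to CPU1 fails the first check
and passes the second; a task that is not blocked gets no exemption.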

With sched-ext, even the lock owner's CPU is a bit complicated: the
owner might be associated with a CPU but actually sit on a custom DSQ.
After moving the donor to the owner's CPU, we need the sched-ext
scheduler to guarantee that the owner runs there; otherwise there is
no point in doing a proxy.
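A minimal sketch of that condition, assuming (my simplification) that
"guaranteed to run there" can only be claimed when the owner is already
queued on the local DSQ of the CPU the donor is moved to. enum
owner_loc, struct model_owner and proxy_migration_useful() are
hypothetical names, not ext.c types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: where the BPF scheduler currently holds the lock owner. */
enum owner_loc {
	OWNER_LOCAL_DSQ,	/* queued on a CPU's local DSQ */
	OWNER_CUSTOM_DSQ,	/* parked on a scheduler-defined DSQ */
};

struct model_owner {
	enum owner_loc loc;
	int cpu;		/* CPU the owner is associated with */
};

/*
 * Moving the donor to @target_cpu only helps if the owner will actually be
 * picked on @target_cpu. From a custom DSQ the BPF scheduler may consume
 * the owner on any CPU, so nothing guarantees that.
 */
static bool proxy_migration_useful(const struct model_owner *owner, int target_cpu)
{
	assert(target_cpu >= 0);
	switch (owner->loc) {
	case OWNER_LOCAL_DSQ:
		return owner->cpu == target_cpu;
	case OWNER_CUSTOM_DSQ:
	default:
		return false;
	}
}
```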

The scx flow should look something like this (please correct me if
I'm wrong):

CPU0: donor         CPU1: owner
===========         ===========

/* Donor is retained on rq */
put_prev_task_scx()
  ops.stopping()
ops.dispatch() /* May be skipped if SCX_OPS_ENQ_LAST is not set */
do_pick_task_scx()
  next = donor;
find_proxy_task()
  proxy_migrate_task()
    ops.dequeue()
    ================>
                    /*
                     * Moves to owner CPU (may be outside of the affinity list).
                     * ops.enqueue() still happens on CPU0 but I've shown it
                     * here to depict that the context has moved to the
                     * owner's CPU.
                     */
                    ops.enqueue()
                      scx_bpf_dsq_insert()
                      /*
                       * !!! Cannot dispatch to local CPU; outside affinity !!!
                       *
                       * We need to allow local dispatch outside affinity iff:
                       *
                       *   p->is_blocked && cpu == task_cpu(p)
                       *
                       * Since enqueue_task_scx() holds the task's rq_lock, the
                       * is_blocked indicator should be stable during a dispatch.
                       */
                    ops.dispatch()
                    do_pick_task_scx()
                    set_next_task_scx()
                      ops.running(donor)
                    find_proxy_task()
                      next = owner
                    /*
                     * !!! Owner starts running without any notification. !!!
                     *
                     * If the owner blocks, dequeue_task_scx() is executed first
                     * and the sched-ext scheduler sees:
                     *
                     *   ops.stopping(owner)
                     *
                     * which leads to some asymmetry.
                     *
                     * XXX: Below is how I imagine the flow should continue.
                     */
                    ops.quiescent(owner) /* Core is taking back control of owner's running */
                    /* Runs owner */
                    ops.runnable(owner)  /* Core is giving back control to ext layer */
                    ops.stopping(donor); /* Accounting symmetry for donor */
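One way to sanity-check the proposed ordering is to replay the callbacks
the BPF scheduler would see and verify that every task's running() and
stopping() pair up in order. This is a toy: enum cb, struct event and
running_stopping_paired() are hypothetical, pids 0 and 1 stand for the
donor and the owner, and the traces are simplified from the flow above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of the ops.* callbacks a BPF scheduler receives. */
enum cb { CB_RUNNABLE, CB_RUNNING, CB_STOPPING, CB_QUIESCENT };

struct event {
	enum cb cb;
	int pid;	/* toy task id: 0 = donor, 1 = owner */
};

#define NR_TASKS 2

/*
 * Walks a trace in order. Returns false if a task gets stopping() while the
 * scheduler has not seen it running(), gets running() twice in a row, or is
 * still "running" at the end of the trace.
 */
static bool running_stopping_paired(const struct event *ev, size_t n)
{
	bool running[NR_TASKS] = { false };

	for (size_t i = 0; i < n; i++) {
		int pid = ev[i].pid;

		assert(pid >= 0 && pid < NR_TASKS);
		if (ev[i].cb == CB_RUNNING) {
			if (running[pid])
				return false;	/* running() without stopping() */
			running[pid] = true;
		} else if (ev[i].cb == CB_STOPPING) {
			if (!running[pid])
				return false;	/* stopping() without running() */
			running[pid] = false;
		}
		/* runnable()/quiescent() do not affect this pairing check. */
	}
	for (int pid = 0; pid < NR_TASKS; pid++)
		if (running[pid])
			return false;
	return true;
}
```

The owner-blocks trace (running(donor), stopping(owner), quiescent(owner),
stopping(donor)) is rejected at stopping(owner), while the proposed trace
(running(donor), quiescent(owner), runnable(owner), stopping(donor))
passes.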

I think dequeue_task_scx() should check task_current_donor() before
calling ops.stopping(); otherwise we get this asymmetry. The donor will
be put back via put_prev_task_scx() anyway, and since it hasn't
actually run, it cannot block itself, so there should be no dependency
on dequeue_task_scx() for donors.
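A toy model of that check (not the real dequeue_task_scx(); struct
model_rq and both helpers are hypothetical, and I'm assuming today's
condition boils down to task_current(), i.e. rq->curr == p). Under proxy
execution rq->curr is the owner while rq->donor is the task that got
ops.running(), so keying ops.stopping() on the donor keeps
running()/stopping() paired.

```c
#include <assert.h>
#include <stdbool.h>

struct model_task { int pid; };

/* Hypothetical subset of struct rq under proxy execution. */
struct model_rq {
	const struct model_task *curr;	/* execution context: the lock owner */
	const struct model_task *donor;	/* scheduling context: got ops.running() */
};

/* Assumed current rule: ops.stopping() if @p is what executes on @rq. */
static bool stopping_on_dequeue_today(const struct model_rq *rq,
				      const struct model_task *p)
{
	assert(rq && p);
	return rq->curr == p;
}

/* Proposed rule: only the donor, which received ops.running(), stops. */
static bool stopping_on_dequeue_proposed(const struct model_rq *rq,
					 const struct model_task *p)
{
	assert(rq && p);
	return rq->donor == p;
}
```

Without proxy execution curr == donor, so both predicates agree; only the
proxy case changes.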

With the quiescent() + runnable() scheme, sched-ext schedulers need to
be made aware that a task can go quiescent() and then back to
runnable() while still SCX_TASK_QUEUED, or the ext core has to spoof
the full sequence:

dequeue(SLEEP) -> quiescent() -> /* Run owner */ -> runnable() -> select_cpu() -> enqueue()

Also, since the mutex owner can block, the sched-ext scheduler needs to
be aware that it can see dequeue() -> quiescent() without a stopping()
in between if we want to keep symmetry.

There might be more issues there that I'm missing.

--
Thanks and Regards,
Prateek