Re: [PATCH 01/10] sched/core: Skip migration disabled tasks in proxy execution

From: K Prateek Nayak

Date: Thu May 07 2026 - 11:54:43 EST

Hello Andrea,

On 5/7/2026 3:43 PM, Andrea Righi wrote:
>>>> scx flow should look something like (please correct me if I'm
>>>> wrong):
>>>>
>>>> CPU0: donor CPU1: owner
>>>> =========== ===========
>>>>
>>>> /* Donor is retained on rq*/
>>>> put_prev_task_scx()
>>>> ops.stopping()
>>>> ops.dispatch() /* May be skipped if SCX_OPS_ENQ_LAST is not set */
>>>> do_pick_task_scx()
>>>> next = donor;
>>>> find_proxy_task()
>>>> proxy_migrate_task()
>>>> ops.dequeue()
>>>> ======================> /*
>>
>> At this point I mean ^
>>
>>>> * Moves to owner CPU (May be outside of affinity list)
>>>> * ops.enqueue() still happens on CPU0 but I've shown it
>>>> * here to depict the context has moved to owner's CPU.
>>>> */
>>>> ops.enqueue()
>>>> scx_bpf_dsq_insert()
>>>> /*
>>>> * !!! Cannot dispatch to local CPU; Outside affinity !!!
>>>> *
>>>> * We need to allow local dispatch outside affinity iff:
>>>> *
>>>> * p->is_blocked && cpu == task_cpu(p)
>>>> *
>>>> * Since enqueue_task_scx() holds the task's rq_lock, the
>>>> * is_blocked indicator should be stable during a dispatch.
>>>> */
>>>> ops.dispatch()
>>>> do_pick_task_scx()
>>>> set_next_task_scx()
>>>> ops.running(donor)
>>>> find_proxy_task()
>>>> next = owner
>>>> /*
>>>> * !!! Owner starts running without any notification. !!!
>>>> *
>>>> * If owner blocks, dequeue_task_scx() is executed first and
>>>> * the sched-ext scheduler sees:
>>>> *
>>>> * ops.stopping(owner)
>>>> *
>>>> * which leads to some asymmetry.
>>>> *
>>>> * XXX: Below is how I imagine the flow should continue.
>>>> */
>>>> ops.quiescent(owner) /* Core is taking back control of owner's running */
>>>> /* Runs owner */
>>>> ops.runnable(owner) /* Core is giving back control to ext layer */
>>>> ops.stopping(donor); /* Accounting symmetry for donor */
>>>
>>> I think the order of operations should be the following:
>>>
>>> ops.runnable(donor)
>>> -> ops.enqueue(donor)
>>> -> donor becomes curr
>>> -> ops.running(donor) /* set_next_task_scx(donor); !task_is_blocked(donor) */
>>> -> donor executes
>>> -> donor blocks on mutex (proxy: stays on_rq; task_is_blocked(donor) true)
>>> -> __schedule()
>>> -> pick_next -> proxy-exec selects owner as next
>>> -> put_prev_task_scx(donor)
>>> -> ops.stopping(donor)
>>> -> dispatch_enqueue(local_dsq) /* blocked donor: ext core parks on local DSQ */
>>> -> set_next_task_scx(owner)
>>> -> ops.running(owner)
>>
>> So ext will just switch the context back to owner? But how does this
>> happen with the changes in your series?
>>
>> Based on my understanding, this happens:
>>
>> -> pick_next -> sched-ext returns donor as next
>> /* prev's context is put back */
>> -> set_next_task_scx(donor)
>> -> ops.running(donor)
>>
>> /* In core.c */
>>
>> /* next = donor */
>> if (next->blocked_on) /* true since we have blocked donor */
>> next = find_proxy_task(); /* Returns owner */
>>
>> /* next = owner; */
>> /* Starts running owner */
>>
>> How does ext core swap back the owner context here? Am I missing
>> something? find_proxy_task() doesn't call put_prev_set_next_task() so
>> I'm at a loss how we get to set_next_task_scx(owner).
>
> The sequence should be the following:

Still a bit confused! Hope you can bear with me for just a little
bit longer :-)

>
> - pick_next_task(rq, rq->donor, &rf) returns donor (because we parked it on the local DSQ)

So put_prev_set_next_task() happens as a part of pick_next_task().

When we pick the donor, we have already called set_next_task(donor)
on it before returning it from pick_next_task().

"owner" is still not known at this point ...

> - in __schedule() (still holding rq->lock), proxy sees next->blocked_on and does:
> - next = find_proxy_task(rq, next, &rf); -> returns owner (or triggers migration / retries)
> - Only after that, __schedule() reaches the point where it performs the switch
> (put_prev_set_next_task(rq, prev, next) via the pick path). At that point,

... and we don't do put_prev_set_next_task(donor, owner) after
(or within) find_proxy_task() as far as I'm aware. The "donor"
remains as the task on which we last called put_prev_task().

If you are referring to the bits in your Patch 2, the calls to
put_prev_task() and set_next_task() are done on the same "donor"
task. They are there purely to add a balance callback if we had
skipped migrating away the prev task due to proxy.

AFAICT, nothing does a set_next_task(owner) after
pick_next_task() in __schedule() unless I'm grossly mistaken.

Sorry if I have overlooked it, but could you please point out
where this happens? I don't see any scx-specific calls in the
context switch path either, other than sched_ext_dead() called
for a dead task.
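
To make sure we are talking about the same thing, here is a toy
userspace model of the flow as I read it. This is only a sketch:
every toy_* name is made up for illustration and none of it is the
kernel's actual code. In particular, the real find_proxy_task() also
deals with migration and retries, which I've left out, and I'm
assuming the put_prev/set_next pair happens inside the pick.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; these are NOT the kernel's definitions. */
struct toy_task {
	const char *name;
	struct toy_task *blocked_on_owner; /* stand-in for blocked_on -> lock owner */
	int put_prev_calls;                /* times the class saw put_prev_task() */
	int set_next_calls;                /* times the class saw set_next_task() */
};

struct toy_rq {
	struct toy_task *donor; /* task the scheduling class picked */
	struct toy_task *curr;  /* task whose context actually runs */
};

static void toy_put_prev_task(struct toy_task *p) { p->put_prev_calls++; }
static void toy_set_next_task(struct toy_task *p) { p->set_next_calls++; }

/* The pick path does put_prev/set_next itself, before the owner is known. */
static struct toy_task *toy_pick_next_task(struct toy_task *prev,
					   struct toy_task *picked)
{
	assert(prev && picked);
	toy_put_prev_task(prev);
	toy_set_next_task(picked);
	return picked;
}

/* Follow the blocked-on chain to the runnable lock owner. */
static struct toy_task *toy_find_proxy_task(struct toy_task *donor)
{
	struct toy_task *p = donor;

	while (p->blocked_on_owner)
		p = p->blocked_on_owner;
	return p;
}

/* The part of the schedule path under discussion, as I understand it. */
static void toy_schedule(struct toy_rq *rq, struct toy_task *prev,
			 struct toy_task *picked)
{
	struct toy_task *next = toy_pick_next_task(prev, picked);

	rq->donor = next;
	if (next->blocked_on_owner)
		next = toy_find_proxy_task(next); /* no set_next_task(owner) here */
	rq->curr = next;
}
```

If a blocked donor is picked in this model, rq->curr ends up being
the owner while the only set_next_task() the class ever saw was for
the donor. That is the gap I'm asking about.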

> next is already the owner, so:
> - put_prev_task_scx(prev=donor) (or whatever prev is)
> - set_next_task_scx(next=owner)
>
> And parking the blocked donor on rq->scx.local_dsq makes it the obvious
> candidate for pick_next_task_scx() on that CPU.
>
> So the donor isn't "moved" by find_proxy_task() in the DSQ sense, rather:
> - SCX picks the donor token
> - proxy-exec replaces the picked task with the lock owner (or triggers
> migration/return paths)
>

--
Thanks and Regards,
Prateek