Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return

From: Madadi Vineeth Reddy
Date: Wed Apr 09 2025 - 02:04:58 EST


On 07/04/25 19:16, hupu wrote:
> The (p->wake_cpu != cpu_of(rq)) check in proxy_needs_return() is unsafe
> during early wakeup phase. When called via ttwu_runnable() path:
>
> |-- try_to_wake_up
> |-- ttwu_runnable
> |-- proxy_needs_return //we are here
> |-- select_task_rq
> |-- set_task_cpu //set p->wake_cpu here
> |-- ttwu_queue
>
> The p->wake_cpu at this point reflects the CPU where donor last ran before
> blocking, not the target migration CPU. During blocking period:
> 1. CPU affinity may have been changed by other threads
> 2. Proxy migrations might have altered the effective wake_cpu
> 3. set_task_cpu() hasn't updated wake_cpu yet in this code path
>
> This makes the wake_cpu vs current CPU comparison meaningless and potentially
> dangerous. Rely on find_proxy_task()'s later migration logic to handle CPU
> placement based on up-to-date affinity and scheduler state.
>
> Signed-off-by: hupu <hupu.gm@xxxxxxxxx>
> ---
> kernel/sched/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3c4ef4c71cfd..ca4ca739eb85 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4047,7 +4047,7 @@ static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
>
> raw_spin_lock(&p->blocked_lock);
> if (__get_task_blocked_on(p) && p->blocked_on_state & BO_NEEDS_RETURN) {
> - if (!task_current(rq, p) && (p->wake_cpu != cpu_of(rq))) {
> + if (!task_current(rq, p)) {
> if (task_current_donor(rq, p)) {
> put_prev_task(rq, p);
> rq_set_donor(rq, rq->idle);

Which tree is this change based on? I don't see `proxy_needs_return` in tip/sched/core.

Thanks,
Madadi Vineeth Reddy