Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return
From: John Stultz
Date: Wed Apr 09 2025 - 19:35:44 EST
Hey hupu!
Thanks so much for taking the time to review the full proxy patch
series and to submit this change!
On Mon, Apr 7, 2025 at 6:47 AM hupu <hupu.gm@xxxxxxxxx> wrote:
>
> The (p->wake_cpu != cpu_of(rq)) check in proxy_needs_return() is unsafe
> during early wakeup phase. When called via ttwu_runnable() path:
>
> |-- try_to_wake_up
> |-- ttwu_runnable
> |-- proxy_needs_return //we are here
> |-- select_task_rq
> |-- set_task_cpu //set p->wake_cpu here
> |-- ttwu_queue
>
> The p->wake_cpu at this point reflects the CPU where donor last ran before
> blocking, not the target migration CPU. During blocking period:
> 1. CPU affinity may have been changed by other threads
> 2. Proxy migrations might have altered the effective wake_cpu
> 3. set_task_cpu() hasn't updated wake_cpu yet in this code path
A few nits here:
1) We do reassign wake_cpu __set_cpus_allowed_ptr_locked() if necessary.
2) Proxy migrations use proxy_set_task_cpu() which preserve the wake_cpu
And I'm not sure I quite understand the concern of #3 above. Could you
clarify further?
> This makes the wake_cpu vs current CPU comparison meaningless and potentially
> dangerous. Rely on find_proxy_task()'s later migration logic to handle CPU
> placement based on up-to-date affinity and scheduler state.
The task_cpu serialization rules are a bit more subtle then I'd like,
so I'll grant there likely could be bugs lurking there, but I'm not
totally sure I understand the case you're thinking of here.
As you noted, in proxy_needs_return() we use the wake_cpu to tell us
if the task is still on the rq where it was blocked, with the hope
that we can short-cut the deactiave&wakeup logic and instead just mark
it BO_RUNNABLE and return from ttwu.
I'm not sure how the "find_proxy_task()'s later migration logic" gets
involved here. Instead, it seems your change would just force it so we
always do the deactivate&wakeup (still using the task->wake_cpu as the
guide for select_task_rq()).
This would be a more conservative approach, since select_task_rq would
handle finding an appropriate rq if wake_cpu was not correct, so I
don't disagree with your suggestion. In fact, in getting the v16
series ready, I've reworked the find_proxy_tasks()'s "need_return:"
logic to do the deactivate/wakeup step because otherwise I was hitting
cases where we would try to do the manual return-migration off to a
offline cpu during cpu-hotplug testing. So a similar approach might
be helpful here to avoid wakeups on offlined cpus (though I need to
look a bit more at the cpu hotplug code to understand how we'd hit the
task->on_rq case and have task_cpu() return an offlined rq).
But I'm a little hesitant to drop the optimization, so if you have a
more specific case that would be problematic, I'd appreciate you
sharing.
thanks
-john