Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return

From: John Stultz
Date: Wed Apr 09 2025 - 19:35:44 EST

Next message: Jason Gunthorpe: "Re: [PATCH fwctl] fwctl: fix repeated device word in log message"
Previous message: Nelson, Shannon: "Re: [PATCH net 3/6] pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result"
In reply to: hupu: "Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return"
Next in thread: hupu: "Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hey hupu!
Thanks so much for taking the time to review the full proxy patch
series and to submit this change!

On Mon, Apr 7, 2025 at 6:47 AM hupu <hupu.gm@xxxxxxxxx> wrote:
>
> The (p->wake_cpu != cpu_of(rq)) check in proxy_needs_return() is unsafe
> during early wakeup phase. When called via ttwu_runnable() path:
>
> |-- try_to_wake_up
> |-- ttwu_runnable
> |-- proxy_needs_return //we are here
> |-- select_task_rq
> |-- set_task_cpu //set p->wake_cpu here
> |-- ttwu_queue
>
> The p->wake_cpu at this point reflects the CPU where donor last ran before
> blocking, not the target migration CPU. During blocking period:
> 1. CPU affinity may have been changed by other threads
> 2. Proxy migrations might have altered the effective wake_cpu
> 3. set_task_cpu() hasn't updated wake_cpu yet in this code path

A few nits here:
1) We do reassign wake_cpu __set_cpus_allowed_ptr_locked() if necessary.
2) Proxy migrations use proxy_set_task_cpu() which preserve the wake_cpu

And I'm not sure I quite understand the concern of #3 above. Could you
clarify further?

> This makes the wake_cpu vs current CPU comparison meaningless and potentially
> dangerous. Rely on find_proxy_task()'s later migration logic to handle CPU
> placement based on up-to-date affinity and scheduler state.

The task_cpu serialization rules are a bit more subtle then I'd like,
so I'll grant there likely could be bugs lurking there, but I'm not
totally sure I understand the case you're thinking of here.

As you noted, in proxy_needs_return() we use the wake_cpu to tell us
if the task is still on the rq where it was blocked, with the hope
that we can short-cut the deactiave&wakeup logic and instead just mark
it BO_RUNNABLE and return from ttwu.

I'm not sure how the "find_proxy_task()'s later migration logic" gets
involved here. Instead, it seems your change would just force it so we
always do the deactivate&wakeup (still using the task->wake_cpu as the
guide for select_task_rq()).

This would be a more conservative approach, since select_task_rq would
handle finding an appropriate rq if wake_cpu was not correct, so I
don't disagree with your suggestion. In fact, in getting the v16
series ready, I've reworked the find_proxy_tasks()'s "need_return:"
logic to do the deactivate/wakeup step because otherwise I was hitting
cases where we would try to do the manual return-migration off to a
offline cpu during cpu-hotplug testing. So a similar approach might
be helpful here to avoid wakeups on offlined cpus (though I need to
look a bit more at the cpu hotplug code to understand how we'd hit the
task->on_rq case and have task_cpu() return an offlined rq).

But I'm a little hesitant to drop the optimization, so if you have a
more specific case that would be problematic, I'd appreciate you
sharing.

thanks
-john

Next message: Jason Gunthorpe: "Re: [PATCH fwctl] fwctl: fix repeated device word in log message"
Previous message: Nelson, Shannon: "Re: [PATCH net 3/6] pds_core: handle unsupported PDS_CORE_CMD_FW_CONTROL result"
In reply to: hupu: "Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return"
Next in thread: hupu: "Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]