Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return

From: hupu
Date: Thu Apr 10 2025 - 03:19:39 EST


Hi John,
Thank you for your feedback, I appreciate you taking the time to
review my proposal.

On Thu, Apr 10, 2025 at 7:35 AM John Stultz <jstultz@xxxxxxxxxx> wrote:
>
> Hey hupu!
> Thanks so much for taking the time to review the full proxy patch
> series and to submit this change!
>
> On Mon, Apr 7, 2025 at 6:47 AM hupu <hupu.gm@xxxxxxxxx> wrote:
> >
> > The (p->wake_cpu != cpu_of(rq)) check in proxy_needs_return() is unsafe
> > during early wakeup phase. When called via ttwu_runnable() path:
> >
> > |-- try_to_wake_up
> > |-- ttwu_runnable
> > |-- proxy_needs_return //we are here
> > |-- select_task_rq
> > |-- set_task_cpu //set p->wake_cpu here
> > |-- ttwu_queue
> >
> > The p->wake_cpu at this point reflects the CPU where donor last ran before
> > blocking, not the target migration CPU. During blocking period:
> > 1. CPU affinity may have been changed by other threads
> > 2. Proxy migrations might have altered the effective wake_cpu
> > 3. set_task_cpu() hasn't updated wake_cpu yet in this code path
>
> A few nits here:
> 1) We do reassign wake_cpu __set_cpus_allowed_ptr_locked() if necessary.
> 2) Proxy migrations use proxy_set_task_cpu() which preserve the wake_cpu
>
> And I'm not sure I quite understand the concern of #3 above. Could you
> clarify further?

Thank you for pointing this out! You are correct that
`__set_cpus_allowed_ptr_locked()` updates `wake_cpu` when necessary,
which keeps the `p->wake_cpu` check in `proxy_needs_return()` valid and
avoids placing the donor task on an incorrect CPU. I apologize for
failing to account for this behavior in my initial submission.

However, I still believe this logic can lead to problems, which I will
explain below.

In `__set_cpus_allowed_ptr_locked()`, `wake_cpu` is updated by using
`cpumask_any_and_distribute()` to pick an online CPU from
`ctx->new_mask`. I believe this selection criterion is overly
simplistic: CPU selection is inherently complex and varies with the
task's scheduling class. CFS tasks need load balancing across
sched_domains to be taken into account, while RT tasks try to avoid
stacking multiple RT threads on the same CPU to minimize contention.
Simply picking a CPU from `ctx->new_mask` and assigning it to
`p->wake_cpu` is therefore not enough; the decision is better delegated
to `select_task_rq()`.

Additionally, on derivative platforms such as Android, this may cause
performance problems. Vendors often hook `select_task_rq()` to
customize CPU selection for critical tasks, such as foreground or
high-priority threads. On ARM big.LITTLE systems, for instance, vendors
steer such tasks onto the more powerful cores to reduce latency. If
`wake_cpu` is simply assigned some online CPU from `new_mask` (quite
possibly a little core, given their smaller ID values), donor tasks may
wake up on CPUs that cannot meet their performance or load-balancing
needs. Worse, vendors cannot intervene in this choice through their
`select_task_rq()` hooks.

>
> As you noted, in proxy_needs_return() we use the wake_cpu to tell us
> if the task is still on the rq where it was blocked, with the hope
> that we can short-cut the deactiave&wakeup logic and instead just mark
> it BO_RUNNABLE and return from ttwu.

I understand and agree that using `wake_cpu` can effectively shortcut
the deactivate and wakeup logic. However, my concern is that it may
not always meet the donor task's performance requirements and may
disrupt load balancing across CPUs.

To address these concerns, I suggest that `proxy_needs_return()` should
return false for donor tasks unless the task is already running on a
CPU. This guarantees that the `set_task_cpu()` logic in
`try_to_wake_up()` is triggered, so `select_task_rq()` can make a more
informed CPU selection, and it preserves the ability of vendors on
derivative platforms to customize CPU selection for critical tasks via
their `select_task_rq()` hooks.

These are my considerations for the current patch proposal. I am
uncertain if this design is reasonable and would appreciate the
opportunity to discuss it further with you.

Best regards,
hupu