Re: [PATCH RFC] sched_ext: Choose prev_cpu if idle and cache affine without WF_SYNC

From: Tejun Heo
Date: Mon Mar 17 2025 - 13:14:38 EST


Hello, Joel.

On Mon, Mar 17, 2025 at 04:28:02AM -0400, Joel Fernandes wrote:
> Consider that the previous CPU is cache-affine to the waker's CPU and
> is idle. Currently, scx's default select function only selects the
> previous CPU in this case if a WF_SYNC request is also made to wake up
> on the waker's CPU.
>
> This means that, without WF_SYNC, an idle previous CPU that is
> cache-affine to the waker is not considered. This seems extreme. WF_SYNC
> is not normally passed to the wakeup path outside of some IPC drivers,
> but it is very possible that the task is cache-hot on the previous CPU
> and shares cache with the waker CPU. Let's avoid too many migrations and
> select the previous CPU in such cases.

Hmm... if !WF_SYNC:

1. With SMT, if prev_cpu's core is fully idle, pick prev_cpu. If not, try
to pick an idle core in widening scopes (LLC, then NUMA node, then any CPU
the task is allowed on).

2. If no idle core is found, pick prev_cpu if it is idle. If not, search
for an idle CPU in the same widening scopes.

So, it is considering prev_cpu, right? I think it's preferring idle cores a
bit too much, though: it probably doesn't make sense to cross the NUMA
boundary for an idle core if there is an idle CPU in prev_cpu's node, at
least.
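To make the !WF_SYNC ordering concrete, here is a minimal user-space model in C. Everything in it is hypothetical: a 16-CPU box with 2-way SMT (core = cpu / 2), 4 CPUs per LLC and 8 per NUMA node, a plain idle[] array instead of the kernel's idle cpumasks, and helper names (pick_idle(), widen(), select_current(), select_node_first()) that do not exist in kernel/sched/ext.c. Unlike the real code, it does not claim the chosen CPU via test_and_clear_cpu_idle(). select_current() follows the order listed above; select_node_first() is just one possible way to stay inside prev_cpu's node while anything there is idle, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical topology: 2-way SMT, 4 CPUs per LLC, 8 CPUs per NUMA node. */
#define NR_CPUS 16

enum scope { SCOPE_LLC, SCOPE_NODE, SCOPE_ALL };

static bool idle[NR_CPUS]; /* stand-in for the kernel's idle cpumask */

static int core_of(int cpu) { return cpu / 2; }
static int llc_of(int cpu)  { return cpu / 4; }
static int node_of(int cpu) { return cpu / 8; }

/* A core is fully idle only if both SMT siblings are idle. */
static bool core_idle(int cpu)
{
	int first = core_of(cpu) * 2;

	return idle[first] && idle[first + 1];
}

static bool in_scope(int cpu, int prev, enum scope s)
{
	if (s == SCOPE_LLC)
		return llc_of(cpu) == llc_of(prev);
	if (s == SCOPE_NODE)
		return node_of(cpu) == node_of(prev);
	return true; /* SCOPE_ALL */
}

/* First idle CPU (or CPU on a fully idle core) within @s of @prev, or -1. */
static int pick_idle(int prev, enum scope s, bool want_core)
{
	assert(prev >= 0 && prev < NR_CPUS);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (in_scope(cpu, prev, s) &&
		    (want_core ? core_idle(cpu) : idle[cpu]))
			return cpu;
	return -1;
}

/* Widen the search from prev's LLC up to and including @max. */
static int widen(int prev, enum scope max, bool want_core)
{
	for (int s = SCOPE_LLC; s <= (int)max; s++) {
		int cpu = pick_idle(prev, (enum scope)s, want_core);
		if (cpu >= 0)
			return cpu;
	}
	return -1;
}

/* Current !WF_SYNC order: an idle core anywhere beats an idle prev_cpu. */
static int select_current(int prev, bool smt)
{
	int cpu;

	if (smt) {
		if (core_idle(prev))
			return prev;
		if ((cpu = widen(prev, SCOPE_ALL, true)) >= 0)
			return cpu; /* may cross NUMA even if prev is idle */
	}
	if (idle[prev])
		return prev;
	if ((cpu = widen(prev, SCOPE_ALL, false)) >= 0)
		return cpu;
	return prev; /* nothing idle: stay on prev_cpu */
}

/* Variant: exhaust prev_cpu's node before looking at remote nodes. */
static int select_node_first(int prev, bool smt)
{
	int cpu;

	if (smt) {
		if (core_idle(prev))
			return prev;
		if ((cpu = widen(prev, SCOPE_NODE, true)) >= 0)
			return cpu;
	}
	if (idle[prev])
		return prev;
	if ((cpu = widen(prev, SCOPE_NODE, false)) >= 0)
		return cpu;
	/* Nothing idle in prev's node: only now look at other nodes. */
	if (smt && (cpu = pick_idle(prev, SCOPE_ALL, true)) >= 0)
		return cpu;
	if ((cpu = pick_idle(prev, SCOPE_ALL, false)) >= 0)
		return cpu;
	return prev;
}

static void set_idle(int from, int to, bool v)
{
	for (int c = from; c < to; c++)
		idle[c] = v;
}
```

With only prev_cpu (CPU 3) idle in node 0 and all of node 1 idle, select_current(3, true) returns CPU 8 on the remote node, while select_node_first(3, true) keeps the task on CPU 3.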

Isn't the cpus_share_cache() block in the WF_SYNC path mostly about not
waker-affining when the wakee's prev_cpu is idle and shares a cache with
the waker, since waker-affining is then likely to be worse?
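A rough user-space model of that WF_SYNC block, again with hypothetical names: share_cache() stands in for cpus_share_cache() (same LLC, 4 CPUs per LLC assumed), and the waker_queue_empty flag stands in for the kernel's check that the waker's local DSQ is empty. The real code applies further conditions (e.g. that the wakee's affinity allows the waker's CPU), which this sketch omits. A return of -1 means "fall through to the idle-core / idle-CPU search". The first branch is the check the patch would also apply without WF_SYNC.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 16

static bool idle[NR_CPUS]; /* stand-in for the kernel's idle cpumask */

/* Hypothetical topology: 4 CPUs per LLC. */
static int llc_of(int cpu) { return cpu / 4; }

/* Stand-in for cpus_share_cache(). */
static bool share_cache(int a, int b) { return llc_of(a) == llc_of(b); }

static bool any_idle(void)
{
	for (int c = 0; c < NR_CPUS; c++)
		if (idle[c])
			return true;
	return false;
}

/* WF_SYNC part of the wakeup path: chosen CPU, or -1 to keep searching. */
static int sync_wakeup_pick(int waker, int prev, bool waker_queue_empty)
{
	assert(waker >= 0 && waker < NR_CPUS && prev >= 0 && prev < NR_CPUS);

	/*
	 * prev_cpu is idle and shares the waker's LLC: moving the wakee to
	 * the waker's CPU would migrate it without a cache benefit.
	 */
	if (share_cache(waker, prev) && idle[prev])
		return prev;

	/* Otherwise waker-affine if the waker's queue is empty and some CPU is idle. */
	if (waker_queue_empty && any_idle())
		return waker;

	return -1;
}
```

For example, with only CPU 1 idle, a sync wakeup from CPU 0 with prev_cpu 1 stays on CPU 1; with only CPU 5 idle (a different LLC), the same wakeup with prev_cpu 5 is pulled to CPU 0.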

Thanks.

--
tejun