Re: [PATCH RFC] sched_ext: Choose prev_cpu if idle and cache affine without WF_SYNC
From: Joel Fernandes
Date: Mon Mar 17 2025 - 18:45:10 EST
On 3/17/2025 6:20 PM, Andrea Righi wrote:
> Hi Joel,
>
> On Mon, Mar 17, 2025 at 04:28:02AM -0400, Joel Fernandes wrote:
>> Consider the case where the previous CPU is cache affine to the waker's
>> CPU and is idle. Currently, scx's default select function only selects
>> the previous CPU in this case if a WF_SYNC wakeup onto the waker's CPU
>> is also requested.
>>
>> This means that, without WF_SYNC, an idle previous CPU that is cache
>> affine to the waker is not considered at all. This seems extreme.
>> WF_SYNC is not normally passed on the wakeup path outside of some IPC
>> drivers, but it is very possible that the task is cache hot on the
>> previous CPU and shares a cache with the waker's CPU. Let's avoid
>> unnecessary migrations and select the previous CPU in such cases.
>>
>> This change is consistent with the fair scheduler's behavior as well. In
>> select_idle_sibling(), the previous CPU is selected if it is cache
>> affine with the target. This is done regardless of WF_SYNC and before
>> any scanning for fully idle cores is done.
>>
>> One difference still exists between SCX and CFS in this regard: in CFS,
>> we first check whether the target CPU is idle before checking whether
>> the previous CPU is idle. However, that is a matter of choice, and that
>> behavior could also be unified in the future.
>>
>> Signed-off-by: Joel Fernandes <joelagnelf@xxxxxxxxxx>
>> ---
>> kernel/sched/ext.c | 24 +++++++++++-------------
>> 1 file changed, 11 insertions(+), 13 deletions(-)
>>
>> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
>> index 5a81d9a1e31f..3b1a489e2aaf 100644
>> --- a/kernel/sched/ext.c
>> +++ b/kernel/sched/ext.c
>> @@ -3479,7 +3479,7 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>> {
>> const struct cpumask *llc_cpus = NULL;
>> const struct cpumask *numa_cpus = NULL;
>> - s32 cpu;
>> + s32 cpu = smp_processor_id();
>>
>> *found = false;
>>
>> @@ -3507,22 +3507,20 @@ static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
>> llc_cpus = llc_span(prev_cpu);
>> }
>>
>> + /*
>> + * If the waker's CPU is cache affine and prev_cpu is idle, then avoid
>> + * a migration.
>> + */
>> + if (cpus_share_cache(cpu, prev_cpu) &&
>> + test_and_clear_cpu_idle(prev_cpu)) {
>> + cpu = prev_cpu;
>> + goto cpu_found;
>> + }
>> +
>
> One potential issue that I see is that when SMT is enabled, you may end
> up using prev_cpu even when its sibling is busy. Maybe we should check
> whether prev_cpu is set in the SMT idle cpumask.
Hmm, so you're suggesting selecting the previous CPU only if it is part of
a fully idle core *and* shares a cache with the waker.
That does sound reasonable to me, thanks. I'll look into it some more.
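
Something along these lines is what I have in mind (a completely untested
sketch on top of this patch; it assumes sched_smt_active() and the existing
idle_masks.smt cpumask are the right way to test for a fully idle core
here):

	/*
	 * If the waker's CPU shares a cache with prev_cpu and prev_cpu is
	 * part of a fully idle core (or SMT is disabled), avoid a migration.
	 */
	if (cpus_share_cache(cpu, prev_cpu) &&
	    (!sched_smt_active() ||
	     cpumask_test_cpu(prev_cpu, idle_masks.smt)) &&
	    test_and_clear_cpu_idle(prev_cpu)) {
		cpu = prev_cpu;
		goto cpu_found;
	}

That way prev_cpu is only preferred when its sibling is idle as well, and
we fall back to the existing idle-core/idle-CPU scanning otherwise.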
- Joel