Re: [PATCH v2] sched/fair: Prefer cache locality for EAS wakeup
From: Dietmar Eggemann
Date: Thu Nov 13 2025 - 09:55:04 EST
On 13.11.25 01:26, Shubhang Kaushik OS wrote:
>> From your previous answer on v1, I don't think that you use
>> heterogeneous system so eas will not be enabled in your case and even
>> when used find_energy_efficient_cpu() will be called before
>
> I agree that the EAS centric approach in the current patch is misplaced for our homogeneous systems.
>
>> Otherwise you might want to check in wake_affine() where we decide
>> between local cpu and previous cpu which one should be the target.
>> This can have an impact especially if there are not in the same LLC
>
> While wake_affine() modifications seem logical, I see that they cause performance regressions across the board due to the inherent trade-offs in altering that critical initial decision point.
Which testcases are you running on your Altra box? I assume it's a
single NUMA node (80 CPUs).
For us, 'perf bench sched messaging` w/o CONFIG_SCHED_CLUSTER, so only
PKG SD (i.e. sis() only returns prev or this CPU) gives better results
then w/ CONFIG_SCHED_CLUSTER.
> We might need to solve the non-idle fallback within `select_idle_sibling` to ring fence the impact for preserving locality effectively.
IMHO, the scheduler only cares about shared LLC (and shared L2 with
CONFIG_SCHED_CLUSTER). Can you check:
$ cat /sys/devices/system/cpu/cpu0/cache/index*/{type,shared_cpu_map}
Data
Instruction
Unified
Unified <-- (1)
00000000,00000000,00000000,00000000,00000001
00000000,00000000,00000000,00000000,00000001
00000000,00000000,00000000,00000000,00000001
CPU mask > 00000000,00000000,00000000,00000000,00000001 <-- (1)
Does (1) exists? IMHO it doesn't.
I assume your machine is quite unique here. IIRC, you configure 2 CPUs
groups in your ACPI pptt which then form a 2 CPUs cluster_cpumask and
since your core_mask (in cpu_coregrop_mask()) has only 1 CPU, it gets
set to the cluster_cpumask so at the end you have a 2 CPU MC SD and no
CLS SD plus an 80 CPU PKG SD.
This CLS->MC propagation is somehow important since only then you get a
valid 'sd = rcu_dereference(per_cpu(sd_llc, target))' in sis() so you
not just return target (prev or this CPU).
But I can imagine that your MC cpumask is way too small for the SIS_UTIL
based selection of an idle CPU.
[...]