Re: [PATCH v4 1/2] sched/fair: Check a task has a fitting cpu when updating misfit

From: Dietmar Eggemann
Date: Mon Feb 05 2024 - 16:34:02 EST


On 26/01/2024 02:46, Qais Yousef wrote:
> On 01/25/24 18:40, Vincent Guittot wrote:
>> On Wed, 24 Jan 2024 at 23:30, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>>>
>>> On 01/23/24 09:26, Vincent Guittot wrote:
>>>> On Fri, 5 Jan 2024 at 23:20, Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
>>>>>
>>>>> From: Qais Yousef <qais.yousef@xxxxxxx>

[...]

>>> And to be honest, I am not sure if flattening of the topology matters
>>> either, since I first noticed this on Juno, which doesn't have a flat
>>> topology.
>>>
>>> FWIW, I can still reproduce this, but with a different setup now. On an M1
>>> mac mini, if I spawn a busy task affined to the littles and then expand the
>>> mask to include a single big core, I see big delays (>500ms) without the
>>> patch. But with the patch it moves within a few ms. The delay without the
>>> patch is too large and I can't explain it. So the worry here is that, in
>>> general, misfit migration does not happen fast enough due to these fake
>>> misfit cases.
>>
>> I tried a similar scenario on RB5 but I don't see any difference with
>> your patch. And that could be me not testing it correctly...
>>
>> I set the affinity of an always-running task to cpu[0-3] for a few
>> seconds, then extend it to [0-3,7], and the time to migrate is almost
>> the same.
>
> That matches what I do.
>
> I write a trace_marker when I change affinity to help see when it should move.
>
>>
>> I'm using tip/sched/core + [0]
>>
>> [0] https://lore.kernel.org/all/20240108134843.429769-1-vincent.guittot@xxxxxxxxxx/
>
> I tried on a pinebook pro, which has a rk3399, and I can't reproduce it there
> either.
>
> On the M1 I get two sched domains, MC and DIE. But the pine64 has only MC.
> Could this be the difference, given that load balancing depends on the sched
> domains?
>
> It seems we flatten topologies but not sched domains. I see all cpus shown as
> core_siblings. The DT for apple silicon sets clusters in the cpu-map - so the
> flattened-topology code seems to detect the LLC correctly but still leaves
> the sched domains unflattened. Is this a bug? I thought we would end up with
> one sched domain.

IMHO, if you have a cpu-map entry with > 1 cluster in your dtb, you end
up with MC and PKG (former DIE) Sched Domain (SD) levels. And misfit load
balancing potentially takes longer on PKG than on MC.
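
To make that concrete, here is a minimal cpu-map sketch (node names and
phandles are illustrative, not from any real dtb) of the two-cluster case:
each cluster node below produces its own MC domain, and the level spanning
both clusters becomes PKG.

```dts
/* Hypothetical two-cluster cpu-map: yields MC + PKG SD levels. */
cpus {
	cpu-map {
		cluster0 {
			core0 { cpu = <&CPU_BIG0>; };
			core1 { cpu = <&CPU_BIG1>; };
		};
		cluster1 {
			core0 { cpu = <&CPU_LITTLE0>; };
			core1 { cpu = <&CPU_LITTLE1>; };
		};
	};
};
```

With a single cluster node (as in the Juno change below), only the MC level
remains.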

(1) Vanilla Juno-r0 [L b b L L L]

root@juno:~# echo 1 > /sys/kernel/debug/sched/verbose
root@juno:~# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
MC
PKG

root@juno:~# cat /proc/schedstat | head -5 | grep ^[cd]
cpu0 0 0 0 0 0 0 2441100800 251426780 6694
domain0 39 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 3f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
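
The second field of each domain line is the domain's cpumask in hex. A quick
sketch of decoding the two masks from the output above (six CPUs assumed, as
on Juno-r0):

```shell
# Decode a schedstat domain cpumask (hex, no 0x prefix) into a CPU list.
decode() {
	mask=$((0x$1)); cpus=""
	for cpu in 0 1 2 3 4 5; do
		if [ $(( (mask >> cpu) & 1 )) -eq 1 ]; then
			cpus="$cpus $cpu"
		fi
	done
	echo "mask $1 -> CPUs:$cpus"
}
decode 39   # MC of cpu0: cpu0 plus the other A53s -> 0 3 4 5
decode 3f   # PKG: all six CPUs -> 0 1 2 3 4 5
```

So domain0 (MC) of cpu0 spans only the A53 cluster, while domain1 (PKG) spans
the whole system.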

(2) flattened topology (including SDs):

Remove cluster1 from the cpu-map and the A57_L2 cache entry.

--- a/arch/arm64/boot/dts/arm/juno.dts
+++ b/arch/arm64/boot/dts/arm/juno.dts
@@ -44,19 +44,16 @@ core0 {
core1 {
cpu = <&A57_1>;
};
- };
-
- cluster1 {
- core0 {
+ core2 {
cpu = <&A53_0>;
};
- core1 {
+ core3 {
cpu = <&A53_1>;
};
- core2 {
+ core4 {
cpu = <&A53_2>;
};
- core3 {
+ core5 {
cpu = <&A53_3>;
};
};
@@ -95,7 +92,7 @@ A57_0: cpu@0 {
d-cache-size = <0x8000>;
d-cache-line-size = <64>;
d-cache-sets = <256>;
- next-level-cache = <&A57_L2>;
+ next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
capacity-dmips-mhz = <1024>;
@@ -113,7 +110,7 @@ A57_1: cpu@1 {
d-cache-size = <0x8000>;
d-cache-line-size = <64>;
d-cache-sets = <256>;
- next-level-cache = <&A57_L2>;
+ next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
capacity-dmips-mhz = <1024>;
@@ -192,15 +189,6 @@ A53_3: cpu@103 {
dynamic-power-coefficient = <140>;
};

- A57_L2: l2-cache0 {
- compatible = "cache";
- cache-unified;
- cache-size = <0x200000>;
- cache-line-size = <64>;
- cache-sets = <2048>;
- cache-level = <2>;
- };
-
A53_L2: l2-cache1 {
compatible = "cache";
cache-unified;

root@juno:~# echo 1 > /sys/kernel/debug/sched/verbose
root@juno:~# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
MC

root@juno:~# cat /proc/schedstat | head -4 | grep ^[cd]
cpu0 0 0 0 0 0 0 2378087600 310618500 8152
domain0 3f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

[...]