Re: [PATCH V2] arm64: add SCHED_CLUSTER's dependency on ACPI

From: Dietmar Eggemann
Date: Tue Apr 26 2022 - 15:15:29 EST


On 26/04/2022 15:25, Sudeep Holla wrote:
> On Tue, Apr 26, 2022 at 06:52:34AM +0000, 王擎 wrote:
>>
>>>>>>>> From: Wang Qing <wangqing@xxxxxxxx>
>>>>>>>>
>>>>>>>> The cluster sched_domain is configured from cpu_topology[].cluster_sibling,
>>>>>>>> which is set from cluster_id, and cluster_id can only be obtained from ACPI.
>>>>>>>>
>>>>>>>> If the system does not enable ACPI, cluster_id is always -1, so even
>>>>>>>> enabling SCHED_CLUSTER has no effect, which is misleading.
>>>>>>>>
>>>>>>>> So we add SCHED_CLUSTER's dependency on ACPI here.
>>>>>>>>
>>>>>>>
>>>>>>> Any reason why this can't be extended to support DT-based systems via
>>>>>>> cpu-map in the device tree? IMO we have almost everything w.r.t. topology
>>>>>>> in DT, and there is no reason for this feature to diverge between ACPI and DT.
>>>>>>>
>>>>>> That's the problem: we parse out the "cluster" info according to the
>>>>>> description in cpu-map, but assign it to package_id, which is used to
>>>>>> configure the MC sched domain, not the cluster sched domain.
>>>>>>
>>>>>
>>>>> Right, we haven't updated the code after updating the bindings to match
>>>>> ACPI sockets, which are the physical package boundaries. Clusters are not
>>>>> physical package boundaries, and the current topology code is not 100%
>>>>> aligned with the bindings after commit 849b384f92bc ("Documentation: DT:
>>>>> arm: add support for sockets defining package boundaries").
>>>>
>>>> I see, but that commit was a long time ago; why hasn't it been used widely?
>>>> Maybe I can help with it if needed.
>>>>
>>>
>>> I assume no one cared or had a requirement for the same.
>>
>> It took me a while to find the root cause of why enabling SCHED_CLUSTER
>> didn't work.
>>
>> We should add SCHED_CLUSTER's dependency until the DT side is implemented.
>> Otherwise, everyone who doesn't have ACPI but uses SCHED_CLUSTER
>> will hit this problem.
>>
>
> I am fine with that, or with marking it broken for DT, but ideally I wouldn't
> want to create an unnecessary dependency on ACPI or DT when both support
> the feature.

IMHO trying to introduce SCHED_COMPLEX for DT next to the Linux-wide
available SCHED_CLUSTER (used only for ACPI right now) is the wrong approach.

_If_ asymmetric sub-clustering of CPUs underneath the LLC (L3) makes any
sense on Armv9 single-DSU systems like:

        .---------------.
CPU     |0 1 2 3 4 5 6 7|
        +---------------+
uarch   |l l l l m m m b| (so called tri-gear: little, medium, big)
        +---------------+
  L2    |   |   | | | | |
        +---------------+
  L3    |<--         -->|
        +---------------+
        |<-- cluster -->|
        +---------------+
        |<--   DSU   -->|
        '---------------'

(and I'm not saying here that it does!) then the existing SCHED_CLUSTER
should be used in DT as well, since it provides exactly the same
functionality for the task scheduler no matter whether it's set up from
ACPI or DT.

parse_cluster() -> parse_core() should be changed to be able to decode
both IDs (package_id and cluster_id) in this case (a rough sketch follows
the cpu-map example below).

DT's cpu-map would have to be changed to encode 2-level setups.

For a system like the one above it should look like:

cpu-map {
    cluster0 {
        foo0 {
            core0 {
                cpu = <&cpu0>;
            };
            core1 {
                cpu = <&cpu1>;
            };
        };
        foo2 {
            core2 {
                cpu = <&cpu2>;
            };
            core3 {
                cpu = <&cpu3>;
            };
        };
    };
    cluster1 {
        core0 {
            cpu = <&cpu4>;
        };
        core1 {
            cpu = <&cpu5>;
        };
        core2 {
            cpu = <&cpu6>;
        };
    };
    cluster2 {
        core0 {
            cpu = <&cpu7>;
        };
    };
};
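
Very roughly, and only as a sketch of the idea (the extra
package_id/cluster_id parameters, the simplified leaf/thread handling and
the assumption that the 2nd level keeps the clusterN naming from the
cpu-map binding - fooN above is just a placeholder - are all mine, not the
current arch_topology.c code), parse_cluster()/parse_core() could then do
something like:

static int __init parse_core(struct device_node *core, int package_id,
                             int cluster_id, int core_id)
{
        /* get_cpu_for_node() is the existing helper in arch_topology.c */
        int cpu = get_cpu_for_node(core);

        if (cpu < 0)
                return -EINVAL;

        cpu_topology[cpu].package_id = package_id;
        cpu_topology[cpu].cluster_id = cluster_id;  /* no longer always -1 */
        cpu_topology[cpu].core_id    = core_id;

        return 0;
}

static int __init parse_cluster(struct device_node *cluster, int package_id,
                                int cluster_id, int depth)
{
        struct device_node *c;
        int core_id = 0;
        char name[20];
        int i, ret;

        /* 1st-level clusterN -> package_id, 2nd-level -> cluster_id */
        for (i = 0; ; i++) {
                snprintf(name, sizeof(name), "cluster%d", i);
                c = of_get_child_by_name(cluster, name);
                if (!c)
                        break;
                ret = parse_cluster(c, depth ? package_id : i,
                                    depth ? i : -1, depth + 1);
                of_node_put(c);
                if (ret)
                        return ret;
        }

        /* leaf: assign the ids collected so far to each coreN */
        for (i = 0; ; i++) {
                snprintf(name, sizeof(name), "core%d", i);
                c = of_get_child_by_name(cluster, name);
                if (!c)
                        break;
                ret = parse_core(c, package_id, cluster_id, core_id++);
                of_node_put(c);
                if (ret)
                        return ret;
        }

        return 0;
}

parse_dt_topology() would then start the walk with something like
parse_cluster(map, -1, -1, 0). With the cpu-map above, the littles would
get cluster_id 0/1 while the medium and big CPUs stay at -1, i.e. only the
littles would get an extra CLS level underneath MC.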

I pimped my Hikey 960 to look like one of those Armv9 4-3-1 systems with
L2 complexes on the LITTLEs, and I get:

# cat /sys/kernel/debug/sched/domains/cpu*/domain*/name
CLS
MC
DIE

CLS
MC
DIE

CLS
MC
DIE

CLS
MC
DIE

MC
DIE

MC
DIE

MC
DIE

DIE

# cat /proc/schedstat | awk '{print $1 " " $2 }' | grep ^[cd]

cpu0 0
domain0 03
domain1 0f
domain2 ff
cpu1 0
domain0 03
domain1 0f
domain2 ff
cpu2 0
domain0 0c
domain1 0f
domain2 ff
cpu3 0
domain0 0c
domain1 0f
domain2 ff
cpu4 0
domain0 70
domain1 ff
cpu5 0
domain0 70
domain1 ff
cpu6 0
domain0 70
domain1 ff
cpu7 0
domain0 ff

Like I mentioned earlier, I'm not sure if this additional complexity
makes sense on mobile systems running EAS (since only CFS load-balancing
on little CPUs would be affected).

But my hunch is that this setup is what you want for your system. If we
could agree on this one, that would already be some progress towards
seeing the entire story here.