RE: [PATCH V2] sched: topology: make cache topology separate from cpu topology

From: 王擎
Date: Sun Mar 13 2022 - 22:13:15 EST



>> From: Wang Qing <wangqing@xxxxxxxx>
>>
>> On some architectures (e.g. ARM64), caches are implemented as below:
>> SD(Level 1):          ************ DIE ************
>> SD(Level 0):          **** MC ****    **** MC *****
>> cluster:              **cluster 0**   **cluster 1**
>> cores:                0   1   2   3   4   5   6   7
>> cache(Level 1):       C   C   C   C   C   C   C   C
>> cache(Level 2):       **C**   **C**   **C**   **C**
>> cache(Level 3):       *******shared Level 3********
>> sd_llc_id(current):   0   0   0   0   4   4   4   4
>> sd_llc_id(should be): 0   0   2   2   4   4   6   6
>
>Should cluster 0 and 1 span the same cpu mask as the MCs? Based on how
>you describe the cache above, it seems like what you are looking for
>would be:
>
>(SD DIE level removed in favor of the same span MC)
>SD(Level 1):          ************ MC  ************
>SD(Level 0):          *CLS0*  *CLS1*  *CLS2*  *CLS3* (CONFIG_SCHED_CLUSTER)
>cores:                0   1   2   3   4   5   6   7
>cache(Level 1):       C   C   C   C   C   C   C   C
>cache(Level 2):       **C**   **C**   **C**   **C**
>cache(Level 3):       *******shared Level 3********
>
>Provided cpu_coregroup_mask and cpu_clustergroup_mask return the
>corresponding cpumasks, this should work with the default sched domain
>topology.
>
>It looks to me like the lack of nested cluster support in
>parse_cluster() in drivers/base/arch_topology.c is what needs to be
>updated to accomplish the above. With cpu_topology[cpu].cluster_sibling and
>core_sibling updated to reflect the topology you describe, the rest of
>the sched domains construction would work with the default sched domain
>topology.
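A minimal sketch of the suggestion above, with cpumasks reduced to plain bitmasks. It assumes a hypothetical, simplified per-CPU record that is only loosely modelled on the cluster_id/package_id and cluster_sibling/core_sibling fields of cpu_topology (include/linux/arch_topology.h). It is not the kernel's code. If each L2 pair gets its own cluster id inside a single package, the cluster masks cover two CPUs each and the core (package) masks cover all eight, matching the CLS/MC diagram.

```c
#define NR_CPUS 8

/* Hypothetical, simplified per-CPU topology record (not the kernel struct). */
struct topo {
	int cluster_id;
	int package_id;
	unsigned int cluster_sibling; /* bit n set: CPU n is in the same cluster */
	unsigned int core_sibling;    /* bit n set: CPU n is in the same package */
};

/* Fill both sibling masks by comparing ids pairwise across all CPUs. */
static void build_siblings(struct topo t[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		t[cpu].cluster_sibling = 0;
		t[cpu].core_sibling = 0;
		for (int other = 0; other < NR_CPUS; other++) {
			if (t[other].package_id != t[cpu].package_id)
				continue;
			t[cpu].core_sibling |= 1u << other;
			if (t[other].cluster_id == t[cpu].cluster_id)
				t[cpu].cluster_sibling |= 1u << other;
		}
	}
}

/* Topology from the diagram: one package, one cluster per shared L2 pair. */
static void init_diagram_topology(struct topo t[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		t[cpu].cluster_id = cpu / 2;
		t[cpu].package_id = 0;
	}
	build_siblings(t);
}
```

With this input, CPU 0's cluster mask is 0x03 and CPU 7's is 0xc0, while every core mask is 0xff.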

A complex (core[0-1]) looks like a nested cluster, but it is not exactly one:
the cores in it only share the L2 cache.
parse_cluster() only parses the CPU topology; it does not parse the cache
topology, even when the cache topology is described in the DT.

>I'm not very familiar with DT, especially the cpu-map. Does your DT
>reflect the topology you want to build?

The DT looks like:
cpu-map {
	cluster0 {
		core0 {
			cpu = <&cpu0>;
		};
		core1 {
			cpu = <&cpu1>;
		};
		core2 {
			cpu = <&cpu2>;
		};
		core3 {
			cpu = <&cpu3>;
		};
		doe_dvfs_cl0: doe {
		};
	};

	cluster1 {
		core0 {
			cpu = <&cpu4>;
		};
		core1 {
			cpu = <&cpu5>;
		};
		core2 {
			cpu = <&cpu6>;
		};
		doe_dvfs_cl1: doe {
		};
	};
};

cpus {
	cpu0: cpu@100 {
		next-level-cache = <&L2_1>;
		L2_1: l2-cache {
			compatible = "cache";
			next-level-cache = <&L3_1>;
		};
		L3_1: l3-cache {
			compatible = "cache";
		};
	};

	cpu1: cpu@101 {
		next-level-cache = <&L2_1>;
	};

	cpu2: cpu@102 {
		next-level-cache = <&L2_2>;
		L2_2: l2-cache {
			compatible = "cache";
			next-level-cache = <&L3_1>;
		};
	};

	cpu3: cpu@103 {
		next-level-cache = <&L2_2>;
	};

	cpu4: cpu@100 {
		next-level-cache = <&L2_3>;
		L2_3: l2-cache {
			compatible = "cache";
			next-level-cache = <&L3_1>;
		};
	};

	cpu5: cpu@101 {
		next-level-cache = <&L2_3>;
	};

	cpu6: cpu@102 {
		next-level-cache = <&L2_4>;
		L2_4: l2-cache {
			compatible = "cache";
			next-level-cache = <&L3_1>;
		};
	};

	cpu7: cpu@200 {
		next-level-cache = <&L2_4>;
	};
};
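The L2 sharing in the DT above can be recovered from the next-level-cache references alone. The following is a minimal sketch under these assumptions. The phandle graph is hard-coded as a per-CPU table of hypothetical L2 indices; in the kernel this would come from the DT or cacheinfo, not a static table. Every helper name is hypothetical. CPUs that reference the same L2 node form one sharing mask. Taking the first CPU of each mask, in the spirit of cpumask_first(), reproduces the "should be" sd_llc_id row from the patch description.

```c
#define NR_CPUS 8

/*
 * next-level-cache of each cpuN in the DT above:
 * L2_1 for cpu0/1, L2_2 for cpu2/3, L2_3 for cpu4/5, L2_4 for cpu6/7.
 * The values 1..4 are hypothetical stand-ins for those phandles.
 */
static const int cpu_l2[NR_CPUS] = { 1, 1, 2, 2, 3, 3, 4, 4 };

/* Bitmask of CPUs whose next-level-cache is the same L2 node as @cpu. */
static unsigned int l2_shared_mask(int cpu)
{
	unsigned int mask = 0;

	for (int other = 0; other < NR_CPUS; other++)
		if (cpu_l2[other] == cpu_l2[cpu])
			mask |= 1u << other;
	return mask;
}

/* Lowest CPU set in @mask; returns NR_CPUS for an empty mask. */
static int first_cpu(unsigned int mask)
{
	int cpu = 0;

	while (cpu < NR_CPUS && !(mask & (1u << cpu)))
		cpu++;
	return cpu;
}

/* sd_llc_id if the LLC domain were taken to be the L2 sharing group. */
static int l2_llc_id(int cpu)
{
	return first_cpu(l2_shared_mask(cpu));
}
```

For CPUs 0 to 7 this yields 0 0 2 2 4 4 6 6.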

Thanks,
Wang

>
>
>--
>Darren Hart
>Ampere Computing / OS and Kernel