Re: [PATCH v3 15/16] arch_topology: Set cluster identifier in each core/thread from /cpu-map

From: Dietmar Eggemann
Date: Fri Jun 03 2022 - 08:30:22 EST


On 25/05/2022 10:14, Sudeep Holla wrote:
> Let us set the cluster identifier as parsed from the device tree
> cluster nodes within /cpu-map.
>
> We don't support nesting of clusters yet as there are no real hardware
> to support clusters of clusters.
>
> Signed-off-by: Sudeep Holla <sudeep.holla@xxxxxxx>
> ---
> drivers/base/arch_topology.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index b8f0d72908c8..5f4f148a7769 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -492,7 +492,7 @@ static int __init get_cpu_for_node(struct device_node *node)
> }
>
> static int __init parse_core(struct device_node *core, int package_id,
> - int core_id)
> + int cluster_id, int core_id)
> {
> char name[20];
> bool leaf = true;
> @@ -508,6 +508,7 @@ static int __init parse_core(struct device_node *core, int package_id,
> cpu = get_cpu_for_node(t);
> if (cpu >= 0) {
> cpu_topology[cpu].package_id = package_id;
> + cpu_topology[cpu].cluster_id = cluster_id;
> cpu_topology[cpu].core_id = core_id;
> cpu_topology[cpu].thread_id = i;
> } else if (cpu != -ENODEV) {
> @@ -529,6 +530,7 @@ static int __init parse_core(struct device_node *core, int package_id,
> }
>
> cpu_topology[cpu].package_id = package_id;
> + cpu_topology[cpu].cluster_id = cluster_id;

I'm still not convinced that this is the right thing to do. Let's take
the juno board as an example here. And I guess our assumption should be
that we want to make CONFIG_SCHED_CLUSTER a default option, like
CONFIG_SCHED_MC is, simply to avoid an unmanageable zoo of
config-option combinations.

(1) Scheduler Domains (SDs) w/o CONFIG_SCHED_CLUSTER:

MC <-- !!!
DIE

(2) SDs w/ CONFIG_SCHED_CLUSTER:

CLS <-- !!!
DIE
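
Just to recap where those level names come from; roughly, from
default_topology[] in kernel/sched/topology.c (quoting from memory, so
modulo details):

    static struct sched_domain_topology_level default_topology[] = {
    #ifdef CONFIG_SCHED_SMT
            { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
    #endif
    #ifdef CONFIG_SCHED_CLUSTER
            { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
    #endif
    #ifdef CONFIG_SCHED_MC
            { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
    #endif
            { cpu_cpu_mask, SD_INIT_NAME(DIE) },
            { NULL, },
    };

So CLS sits below MC, and a level only survives if its cpumask (or its
flags) differ from its neighbour's.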

In (2) MC gets degenerated in sd_parent_degenerate() since the CLS and
MC cpumasks are equal and MC does not have any additional flags
compared to CLS.
I'm not convinced that we can change the degeneration rules so that MC
stays and CLS gets removed instead, without breaking other scheduler
scenarios.
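
The relevant check is sd_parent_degenerate() in kernel/sched/topology.c;
roughly, from memory:

    static int sd_parent_degenerate(struct sched_domain *sd,
                                    struct sched_domain *parent)
    {
            unsigned long cflags = sd->flags, pflags = parent->flags;

            if (sd_degenerate(parent))
                    return 1;

            /*
             * The parent (MC) is folded if it spans the same CPUs as
             * its child (CLS) ...
             */
            if (!cpumask_equal(sched_domain_span(sd),
                               sched_domain_span(parent)))
                    return 0;

            /* Flags needing groups don't count if only 1 group in parent */
            if (parent->groups == parent->groups->next)
                    pflags &= ~SD_DEGENERATE_GROUPS_MASK;

            /* ... and carries no additional flags. */
            if (~cflags & pflags)
                    return 0;

            return 1;
    }

IIRC cpu_cluster_flags() and cpu_core_flags() both boil down to
SD_SHARE_PKG_RESOURCES here, so with equal spans `~cflags & pflags` is
0 and the parent (MC) is the one which gets folded.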

Even though MC and CLS do the same thing right now from the perspective
of the scheduler, we should still see MC, and not CLS, under (2). CLS
only makes sense longer term if the scheduler also makes use of it
(next to MC), in the wakeup path for instance. Especially when that
happens, a platform should always construct the same scheduler domain
hierarchy, no matter which CONFIG_SCHED_XXX options are enabled.


You can see this in update_siblings_masks():

if (last_level_cache_is_shared)
set llc_sibling

if (cpuid_topo->package_id != cpu_topo->package_id)
continue

set core_sibling

If llc cache and socket boundaries are congruent, llc_sibling and
core_sibling are the same.

if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
continue

set cluster_sibling

Now we potentially set cluster siblings as well. Since everything sits
in socket 0 by default and we use the existing juno.dts, the cluster
nodes end up being congruent to the llc cache cpumasks too.
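
(All the fields involved, roughly as in include/linux/arch_topology.h
with this series applied:

    struct cpu_topology {
            int thread_id;
            int core_id;
            int cluster_id;            /* set from cpu-map by this patch */
            int package_id;
            int llc_id;
            cpumask_t thread_sibling;
            cpumask_t core_sibling;    /* CPUs with the same package_id */
            cpumask_t cluster_sibling; /* CPUs with the same cluster_id */
            cpumask_t llc_sibling;     /* CPUs sharing the last level cache */
    };
)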

The problem is that we encode `llc cache` and `DT cluster nodes` as the
same thing in juno.dts. `cluster0/1` are congruent with the llc
information, although they should actually be `socket0/1` right now.
But we can't set up a cpu-map with a `socketX` containing `coreY`
directly. And then we use llc_sibling and cluster_sibling in two
different SD cpumask functions (cpu_coregroup_mask() and
cpu_clustergroup_mask()).
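
This is roughly what the two SD cpumask functions do today
(drivers/base/arch_topology.c, again from memory):

    const struct cpumask *cpu_coregroup_mask(int cpu)
    {
            const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));

            /* Find the smaller of NUMA, core or LLC siblings */
            if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
                    /* not numa in package, lets use the package siblings */
                    core_mask = &cpu_topology[cpu].core_sibling;
            }
            if (cpu_topology[cpu].llc_id != -1) {
                    if (cpumask_subset(&cpu_topology[cpu].llc_sibling,
                                       core_mask))
                            core_mask = &cpu_topology[cpu].llc_sibling;
            }

            return core_mask;
    }

    const struct cpumask *cpu_clustergroup_mask(int cpu)
    {
            return &cpu_topology[cpu].cluster_sibling;
    }

So with the current juno.dts both end up describing the very same two
CPU groupings.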

Remember, CONFIG_SCHED_CLUSTER was introduced for ACPI/PPTT, where the
cluster cpumask is a subset of the CONFIG_SCHED_MC cpumask.

---

IMHO we could probably just introduce your changes w/o setting the
`cpu-map cluster nodes` in DT for now. We would just have to make sure
that for all affected `*.dts` files the `llc cache` info can take over
the old role of the `cluster nodes`. In this case e.g. Juno ends up
with MC, DIE no matter whether CONFIG_SCHED_CLUSTER is set or not.

[...]