Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
From: Barry Song
Date: Thu Mar 03 2022 - 00:36:47 EST
On Thu, Mar 3, 2022 at 3:22 PM Darren Hart
<darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > Control Unit, but have no shared CPU-side last level cache.
> > >
> > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > >
> > > As a result, build_sched_domain() will BUG() once per CPU with:
> > >
> > > BUG: arch topology borken
> > > the CLS domain not a subset of the MC domain
> > >
> > > The MC level cpumask is then extended to that of the CLS child, and is
> > > later removed entirely as redundant. This sched domain topology is an
> > > improvement over previous topologies, or those built without
> > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > With the current scheduler model and heuristics, this is a desirable
> > > default topology for Ampere Altra and Altra Max systems.
> > >
> > > Introduce an alternate sched domain topology for arm64 without the MC
> > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > >
> > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > detect the cluster level being above the cpu llc level. This is
> > > consistent with other architectures and provides a readily extensible
> > > mechanism for other alternate topologies.
> > >
> > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > >
> > > For CPU0:
> > >
> > >   CONFIG_SCHED_CLUSTER=y
> > >     CLS  [0-1]
> > >     DIE  [0-79]
> > >     NUMA [0-159]
> > >
> > >   CONFIG_SCHED_CLUSTER is not set
> > >     DIE  [0-79]
> > >     NUMA [0-159]
> > >
> > > Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> > > Cc: Will Deacon <will@xxxxxxxxxx>
> > > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> > > Cc: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
> > > Cc: Valentin Schneider <valentin.schneider@xxxxxxx>
> > > Cc: D. Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>
> > > Cc: Ilkka Koskinen <ilkka@xxxxxxxxxxxxxxxxxxxxxx>
> > > Cc: <stable@xxxxxxxxxxxxxxx> # 5.16.x
> > > Signed-off-by: Darren Hart <darren@xxxxxxxxxxxxxxxxxxxxxx>
> > > ---
> > > arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > > 1 file changed, 28 insertions(+)
> > >
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index 27df5c1e6baa..3597e75645e1 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > >  	}
> > >  }
> > >
> > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > +#ifdef CONFIG_SCHED_SMT
> > > +	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > +#endif
> > > +
> > > +#ifdef CONFIG_SCHED_CLUSTER
> > > +	{ cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > +#endif
> > > +
> > > +	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > +	{ NULL, },
> > > +};
> > > +
> > > +static void __init update_sched_domain_topology(void)
> > > +{
> > > +	int cpu;
> > > +
> > > +	for_each_possible_cpu(cpu) {
> > > +		if (cpu_topology[cpu].llc_id != -1 &&
> >
> > Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
> > by ACPI system and llc_id == -1 for others like DT based system
> >
> > > +		    cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > +			return;
> > > +	}
>
> Hi Vincent,
>
> I did not have a non-acpi system to test, no. You're right of course,
> llc_id is only set by ACPI systems on arm64. We could wrap this in a
> CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
>
> +	for_each_possible_cpu(cpu) {
> +		if (cpu_topology[cpu].llc_id == -1 ||
> +		    cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> +			return;
> +	}
>
> Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
> arm64 systems who can test and verify this behaves as intended. I will ask
> around tomorrow as well to see what I may have access to.
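
To make the difference between the two loop conditions concrete, here is a
rough user-space sketch, not kernel code. The struct and the function names
are made up for illustration, and llc_weight simply stands in for
cpumask_weight(&cpu_topology[cpu].llc_sibling). It assumes, as discussed
above, that llc_id stays at -1 on systems where the LLC was not discovered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two cpu_topology[] fields the loop reads. */
struct cpu_llc_info {
	int llc_id;              /* -1 when no LLC was discovered */
	unsigned int llc_weight; /* models cpumask_weight(&llc_sibling) */
};

/*
 * Condition from the posted patch: keep the default topology only if some
 * CPU has a known LLC that it shares with at least one other CPU.
 */
bool no_mc_topology_v1(const struct cpu_llc_info *cpus, size_t n)
{
	assert(cpus != NULL);
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].llc_id != -1 && cpus[i].llc_weight > 1)
			return false; /* default topology stays in place */
	}
	return true; /* the no-MC topology would be selected */
}

/*
 * Revised condition: additionally keep the default topology whenever a
 * CPU's LLC is unknown (llc_id == -1).
 */
bool no_mc_topology_v2(const struct cpu_llc_info *cpus, size_t n)
{
	assert(cpus != NULL);
	for (size_t i = 0; i < n; i++) {
		if (cpus[i].llc_id == -1 || cpus[i].llc_weight > 1)
			return false; /* default topology stays in place */
	}
	return true; /* the no-MC topology would be selected */
}
```

With every llc_id at -1, the first variant never bails out and would switch
such a system to the no-MC topology, while the revised variant leaves the
default topology alone. Both variants agree when each CPU has a known,
unshared LLC.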
I wonder if we can fix it with the following change instead:
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 976154140f0b..551655ccd0eb 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -627,6 +627,13 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 		if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
 			core_mask = &cpu_topology[cpu].llc_sibling;
 	}
+	/*
+	 * Some machines have no LLC but have clusters, we let MC = CLUSTER
+	 * as MC should always be after CLUSTER. But anyway, the MC domain
+	 * will be removed
+	 */
+	if (cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
+		core_mask = &cpu_topology[cpu].cluster_sibling;
 	return core_mask;
 }
This should keep all kinds of topologies happy, both symmetric and asymmetric.
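
A toy user-space model of the idea, in case it helps the discussion. This is
only a sketch: it uses a plain 64-bit word instead of struct cpumask, the
names (mask_t, struct cpu_masks, model_coregroup_mask) are invented for
illustration, and it starts from the package span rather than reproducing the
NUMA handling in cpu_coregroup_mask().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mask_t; /* bit n set == CPU n is in the mask */

/* Models cpumask_subset(a, b): true if every CPU in a is also in b. */
bool mask_subset(mask_t a, mask_t b)
{
	return (a & ~b) == 0;
}

/* Hypothetical per-CPU sibling masks, loosely mirroring cpu_topology[]. */
struct cpu_masks {
	mask_t core_sibling;    /* CPUs in the same package */
	mask_t llc_sibling;     /* CPUs sharing the last level cache */
	mask_t cluster_sibling; /* CPUs in the same cluster */
	int llc_id;             /* -1 when no LLC was discovered */
};

mask_t model_coregroup_mask(const struct cpu_masks *t)
{
	mask_t core_mask;

	assert(t != 0);
	core_mask = t->core_sibling;

	/* Existing behaviour: shrink MC to the LLC siblings when known. */
	if (t->llc_id != -1 && mask_subset(t->llc_sibling, core_mask))
		core_mask = t->llc_sibling;

	/* Proposed step: never let MC be smaller than the cluster. */
	if (mask_subset(core_mask, t->cluster_sibling))
		core_mask = t->cluster_sibling;

	return core_mask;
}
```

For a CPU whose LLC siblings are just itself (mask 0x01) inside a two-CPU
cluster (mask 0x03), the model returns 0x03, so MC equals CLS and the subset
check in the sched domain build no longer trips. With an LLC shared by four
CPUs (mask 0x0f) and the same two-CPU cluster, the mask stays at 0x0f.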
>
> Thanks,
>
> > > +
> > > +	pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > +	set_sched_topology(arm64_no_mc_topology);
> > > +}
> > > +
> > >  void __init smp_cpus_done(unsigned int max_cpus)
> > >  {
> > >  	pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > >  	hyp_mode_check();
> > >  	apply_alternatives_all();
> > >  	mark_linear_text_alias_ro();
> > > +	update_sched_domain_topology();
> > >  }
> > >
> > > void __init smp_prepare_boot_cpu(void)
> > > --
> > > 2.31.1
> > >
>
> --
> Darren Hart
> Ampere Computing / OS and Kernel
Thanks
Barry