RE: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

From: vkilari
Date: Sun Feb 25 2018 - 01:18:32 EST


Hi,

> -----Original Message-----
> From: linux-arm-kernel
[mailto:linux-arm-kernel-bounces@xxxxxxxxxxxxxxxxxxx]
> On Behalf Of Xiongfeng Wang
> Sent: Saturday, February 24, 2018 8:36 AM
> To: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>; Jeremy Linton
> <jeremy.linton@xxxxxxx>
> Cc: mark.rutland@xxxxxxx; Jonathan.Zhang@xxxxxxxxxx;
> Jayachandran.Nair@xxxxxxxxxx; catalin.marinas@xxxxxxx; Juri Lelli
> <juri.lelli@xxxxxxx>; gregkh@xxxxxxxxxxxxxxxxxxx; jhugo@xxxxxxxxxxxxxx;
> rjw@xxxxxxxxxxxxx; linux-pm@xxxxxxxxxxxxxxx; will.deacon@xxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; morten.rasmussen@xxxxxxx; linux-
> acpi@xxxxxxxxxxxxxxx; viresh.kumar@xxxxxxxxxx; hanjun.guo@xxxxxxxxxx;
> sudeep.holla@xxxxxxx; austinwc@xxxxxxxxxxxxxx; vkilari@xxxxxxxxxxxxxx;
> ahs3@xxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; lenb@xxxxxxxxxx
> Subject: Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU
> topology
>
>
> Hi,
> On 2018/2/23 19:02, Lorenzo Pieralisi wrote:
> > On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> >> Hi,
> >>
> >> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >>> Hi Jeremy,
> >>>
> >>> I have tested the patch with the newest UEFI. It prints the below
error:
> >>>
> >>> [ 4.017371] BUG: arch topology borken
> >>> [ 4.021069] BUG: arch topology borken
> >>> [ 4.024764] BUG: arch topology borken
> >>> [ 4.028460] BUG: arch topology borken
> >>> [ 4.032153] BUG: arch topology borken
> >>> [ 4.035849] BUG: arch topology borken
> >>> [ 4.039543] BUG: arch topology borken
> >>> [ 4.043239] BUG: arch topology borken
> >>> [ 4.046932] BUG: arch topology borken
> >>> [ 4.050629] BUG: arch topology borken
> >>> [ 4.054322] BUG: arch topology borken
> >>>
> >>> I checked the code and found that the newest UEFI set PPTT
> >>> physical_package_flag on a physical package node and the NUMA domain
> (SRAT domains) starts from the layer of DIE. (The topology of our board is
core-
> >cluster->die->package).
> >>
> >> I commented about that on the EDK2 mailing list. While the current
> >> spec doesn't explicitly ban having the flag set multiple times
> >> between the leaf and the root I consider it a "bug" and there is an
> >> effort to clarify the spec and the use of that flag.
> >>>
> >>> When the kernel starts to build sched_domain, the multi-core
> >>> sched_domain contains all the cores within a package, and the lowest
> >>> NUMA sched_domain contains all the cores within a die. But the kernel
> requires that the multi-core sched_domain should be a subset of the lowest
> NUMA sched_domain, so the BUG info is printed.
> >>
> >> Right. I've mentioned this problem a couple of times.
> >>
> >> At at the moment, the spec isn't clear about how the proximity domain
> >> is detected/located within the PPTT topology (a node with a 1:1
> >> correspondence isn't even required). As you can see from this patch
> >> set, we are making the general assumption that the proximity domains
> >> are at the same level as the physical socket. This isn't ideal for
> >> NUMA topologies, like the D05, that don't align with the physical
socket.
> >>
> >> There are efforts underway to clarify and expand upon the
> >> specification to deal with this general problem. The simple solution
> >> is another flag (say PPTT_PROXIMITY_DOMAIN which would map to the D05
> >> die) which could be used to find nodes with 1:1 correspondence. At
> >> that point we could add a fairly trivial patch to correct just the
> >> scheduler topology without affecting the rest of the system topology
code.
> >
> > I think Morten asked already but isn't this the same end result we end
> > up having if we remove the DIE level if NUMA-within-package is
> > detected (instead of using the default_topology[]) and we create our
> > own ARM64 domain hierarchy (with DIE level removed) through
> > set_sched_topology() accordingly ?

To overcome this, on x86 as well the DIE level is removed when
NUMA-within-package is detected with this patch
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/ar
ch/x86/kernel/smpboot.c?h=v4.16-rc2&id=8f37961cf22304fb286c7604d3a7f6104dcc1
283

Solving with PPTT would be clean approach instead of overriding
default_topology[]

> >
> > Put it differently: do we really need to rely on another PPTT flag to
> > collect this information ?
> >
> > I can't merge code that breaks a platform with legitimate firmware
> > bindings.
>
> I think we really need another PPTT flag, from which we can get
information
> about how to build a multi-core sched_domain. I think only cache-sharing
> information is not enough to get information about how to build a
multi-core
> shced_domain.
>
> How about this? We assume the upper layer of the lowest layer to be multi-
> core layer.
> After that flag is added into ACPI specs, we add another patch to adapt to
the
> change.
>
> Thanks,
> Xiongfeng
>
> >
> > Thanks,
> > Lorenzo
> >
> >>
> >>>
> >>> If we modify the UEFI to make NUMA sched_domain start from the layer
> >>> of package, then all the topology information within the package
> >>> will be discarded. I think we need to build the multi-core
sched_domain
> using the cores within the cluster instead of the cores within the
package. I
> think that's what 'multi-core' means. Multi cores form a cluster. I guess.
> >>> If we build the multi-core sched_domain using the cores within a
> >>> cluster, I think we need to add fields in struct cpu_topology to
record which
> cores are in each cluster.
> >>
> >> The problem is that there isn't a generic way to identify which level
> >> of cache sharing is the "correct" top layer MC domain. For one system
> >> cluster might be appropriate, for another it might be the highest
> >> caching level within a socket, for another is might be a something in
> >> between or a group of clusters or LLCs..
> >>
> >> Hence the effort to standardize/guarantee a PPTT node that exactly
> >> matches a SRAT domain. With that, each SOC/system provider has
> >> clearly defined method for communicating where they want the proximity
> domain information to begin.
> >>
> >> Thanks,
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Xiongfeng
> >>>
> >>> On 2018/1/13 8:59, Jeremy Linton wrote:
> >>>> Propagate the topology information from the PPTT tree to the
> >>>> cpu_topology array. We can get the thread id, core_id and
> >>>> cluster_id by assuming certain levels of the PPTT tree correspond
> >>>> to those concepts. The package_id is flagged in the tree and can be
> >>>> found by calling find_acpi_cpu_topology_package() which terminates
> >>>> its search when it finds an ACPI node flagged as the physical
> >>>> package. If the tree doesn't contain enough levels to represent all
> >>>> of the requested levels then the root node will be returned for all
> >>>> subsequent levels.
> >>>>
> >>>> Cc: Juri Lelli <juri.lelli@xxxxxxx>
> >>>> Signed-off-by: Jeremy Linton <jeremy.linton@xxxxxxx>
> >>>> ---
> >>>> arch/arm64/kernel/topology.c | 46
> >>>> +++++++++++++++++++++++++++++++++++++++++++-
> >>>> 1 file changed, 45 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/arm64/kernel/topology.c
> >>>> b/arch/arm64/kernel/topology.c index 7b06e263fdd1..ce8ec7fd6b32
> >>>> 100644
> >>>> --- a/arch/arm64/kernel/topology.c
> >>>> +++ b/arch/arm64/kernel/topology.c
> >>>> @@ -11,6 +11,7 @@
> >>>> * for more details.
> >>>> */
> >>>> +#include <linux/acpi.h>
> >>>> #include <linux/arch_topology.h>
> >>>> #include <linux/cpu.h>
> >>>> #include <linux/cpumask.h>
> >>>> @@ -22,6 +23,7 @@
> >>>> #include <linux/sched.h>
> >>>> #include <linux/sched/topology.h>
> >>>> #include <linux/slab.h>
> >>>> +#include <linux/smp.h>
> >>>> #include <linux/string.h>
> >>>> #include <asm/cpu.h>
> >>>> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
> >>>> }
> >>>> }
> >>>> +#ifdef CONFIG_ACPI
> >>>> +/*
> >>>> + * Propagate the topology information of the
> >>>> +processor_topology_node tree to the
> >>>> + * cpu_topology array.
> >>>> + */
> >>>> +static int __init parse_acpi_topology(void) {
> >>>> + bool is_threaded;
> >>>> + int cpu, topology_id;
> >>>> +
> >>>> + is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> >>>> +
> >>>> + for_each_possible_cpu(cpu) {
> >>>> + topology_id = find_acpi_cpu_topology(cpu, 0);
> >>>> + if (topology_id < 0)
> >>>> + return topology_id;
> >>>> +
> >>>> + if (is_threaded) {
> >>>> + cpu_topology[cpu].thread_id = topology_id;
> >>>> + topology_id = find_acpi_cpu_topology(cpu,
1);
> >>>> + cpu_topology[cpu].core_id = topology_id;
> >>>> + topology_id =
find_acpi_cpu_topology_package(cpu);
> >>>> + cpu_topology[cpu].package_id = topology_id;
> >>>> + } else {
> >>>> + cpu_topology[cpu].thread_id = -1;
> >>>> + cpu_topology[cpu].core_id = topology_id;
> >>>> + topology_id =
find_acpi_cpu_topology_package(cpu);
> >>>> + cpu_topology[cpu].package_id = topology_id;
> >>>> + }
> >>>> + }
> >>>> +
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +#else
> >>>> +static inline int __init parse_acpi_topology(void) {
> >>>> + return -EINVAL;
> >>>> +}
> >>>> +#endif
> >>>> void __init init_cpu_topology(void) { @@ -309,6 +351,8 @@ void
> >>>> __init init_cpu_topology(void)
> >>>> * Discard anything that was parsed if we hit an error so we
> >>>> * don't use partial information.
> >>>> */
> >>>> - if (of_have_populated_dt() && parse_dt_topology())
> >>>> + if ((!acpi_disabled) && parse_acpi_topology())
> >>>> + reset_cpu_topology();
> >>>> + else if (of_have_populated_dt() && parse_dt_topology())
> >>>> reset_cpu_topology();
> >>>> }
> >>>>
> >>>
> >>
> >
> > .
> >
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel