Re: [PATCH v3 6/7] arm64: topology: Enable ACPI/PPTT based CPU topology.

From: Jeremy Linton
Date: Mon Oct 23 2017 - 17:26:56 EST


Hi,

On 10/20/2017 02:55 PM, Jeffrey Hugo wrote:
On 10/20/2017 10:14 AM, Jeremy Linton wrote:
Hi,

On 10/20/2017 04:14 AM, Lorenzo Pieralisi wrote:
On Thu, Oct 19, 2017 at 11:13:27AM -0500, Jeremy Linton wrote:
On 10/19/2017 10:56 AM, Lorenzo Pieralisi wrote:
On Thu, Oct 12, 2017 at 02:48:55PM -0500, Jeremy Linton wrote:
Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread id, core_id and
cluster_id by assuming certain levels of the PPTT tree correspond
to those concepts. The package_id is flagged in the tree and can be
found by passing an arbitrarily large level to setup_acpi_cpu_topology()
which terminates its search when it finds an ACPI node flagged
as the physical package. If the tree doesn't contain enough
levels to represent all of thread/core/cod/package then the package
id will be used for the missing levels.
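
For illustration only, the lookup described above might be sketched in plain userspace C as follows. This is not the code from the patch: struct pptt_node and topology_id_at_level() are hypothetical names, and a real PPTT walk works on ACPI table offsets rather than pointers. The sketch only shows why an arbitrarily large level ends at the package node, and why a shallow tree yields the package id for the missing levels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a PPTT processor node. */
struct pptt_node {
	int id;                   /* illustrative identifier for this node */
	int is_package;           /* nonzero if flagged as the physical package */
	struct pptt_node *parent; /* NULL at the root of the tree */
};

/*
 * Walk up to 'level' steps from a leaf node. The walk stops early at the
 * node flagged as the physical package, or at the root, so any level
 * deeper than the tree resolves to the package (or root) id.
 */
static int topology_id_at_level(const struct pptt_node *leaf, int level)
{
	const struct pptt_node *node = leaf;

	assert(leaf != NULL);
	while (level > 0 && !node->is_package && node->parent != NULL) {
		node = node->parent;
		level--;
	}
	return node->id;
}
```

With a core -> cluster -> package tree, levels 0, 1 and 2 give the core, cluster and package ids, and any larger level still gives the package id. With a core attached directly to the package, level 1 already gives the package id.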

Since server/ACPI machines are more likely to be multisocket and NUMA,

I think this stuff is vague enough already so to start with I would drop
patch 4 and 5 and stop assuming what machines are more likely to ship
with ACPI than DT.

I am just saying, for the umpteenth time, that these levels have no
architectural meaning _whatsoever_, level is a hierarchy concept
with no architectural meaning attached.

?

Did anyone say anything about that? No, I think the only thing being
guaranteed here is that the kernel's physical_id maps to an ACPI
defined socket. Which seems to be the mindset of pretty much the
entire !arm64 community meaning they are optimizing their software
and the kernel with that concept in mind.

Are you denying the existence of non-uniformity between threads
running on different physical sockets?

No, I have not explained my POV clearly, apologies.

AFAIK, the kernel currently deals with 2 (3 - if SMT) topology layers.

1) thread
2) core
3) package

What I wanted to say is that, to simplify this series, you do not need
to introduce the COD topology level, since it is just another arbitrary
topology level (i.e. there is no way you can pinpoint which level
corresponds to COD with PPTT - or DT for the sake of this discussion)
that would not be used in the kernel (apart from big.LITTLE cpufreq
driver and PSCI checker whose usage of topology_physical_package_id() is
questionable anyway).

Oh! But I'm at a loss as to what to do with those two users if I set the node that has the physical socket flag set as the "cluster_id" in the topology.

Granted, this being ACPI, I don't expect the cpufreq driver to be active (given CPPC), and the PSCI checker might be ignored? Even so, it's a bit of a misnomer for what is actually happening. Are we good with this?



PPTT allows you to define what level corresponds to a package, use
it to initialize the package topology level (that on ARM internal
variables we call cluster) and be done with it.

I do not think that adding another topology level improves anything as
far as ACPI topology detection is concerned, you are not able to use it
in the scheduler or from userspace to group CPUs anyway.

Correct, and AFAIK, after having poked a bit at the scheduler, it's sort of redundant, as the generic cache sharing levels are more useful anyway.

What do you mean, it can't be used? We expect a followup series which uses PPTT to define scheduling domains/groups.

The scheduler supports 4 types of levels, with an arbitrary number of instances of each - NUMA, DIE (package, usually not used with NUMA), MC (multicore, typically cores which share resources like cache), SMT (threads).

It turns out to be pretty easy to map individual PPTT "levels" to MC layers simply by creating a custom sched_domain_topology_level and populating it with an equal number of MC layers. The only thing that changes is the "mask" portion of each entry.

Whether that is good/bad vs just using a topology like:

static struct sched_domain_topology_level arm64_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
	{ cpu_cluster_mask, cpu_core_flags, SD_INIT_NAME(CLU) },
#ifdef CONFIG_SCHED_MC
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};

and using it on a successful ACPI/PPTT parse, along with a new cpu_cluster_mask, isn't clear to me either, particularly if one goes in and starts changing the "cpu_core_flags" to cpu_smt_flags for starters.
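
As a rough userspace sketch of the first option (one MC entry per PPTT level, with only the mask differing), something like the following could work. Everything here is an assumption for illustration: struct topo_level is a cut-down stand-in for sched_domain_topology_level, level_mask[][] stands for masks a PPTT parser would have filled in, and build_mc_table() is a hypothetical helper. Because a mask callback only receives a CPU number, the sketch uses one small accessor per level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEVELS 4
#define NR_CPUS    8

typedef uint64_t (*mask_fn)(int cpu);

/* Cut-down stand-in for struct sched_domain_topology_level. */
struct topo_level {
	mask_fn mask;     /* CPUs sharing this level with 'cpu' */
	const char *name; /* every generated entry is named "MC" */
};

/* level_mask[l][cpu]: bitmask of CPUs sharing the level-l ancestor. */
static uint64_t level_mask[MAX_LEVELS][NR_CPUS];

static uint64_t mask_l0(int cpu) { return level_mask[0][cpu]; }
static uint64_t mask_l1(int cpu) { return level_mask[1][cpu]; }
static uint64_t mask_l2(int cpu) { return level_mask[2][cpu]; }
static uint64_t mask_l3(int cpu) { return level_mask[3][cpu]; }

static const mask_fn mask_fns[MAX_LEVELS] = {
	mask_l0, mask_l1, mask_l2, mask_l3
};

/*
 * Fill 'tbl' with 'nlevels' MC entries followed by a NULL terminator.
 * Returns the number of MC entries written, or -1 if they do not fit.
 */
static int build_mc_table(struct topo_level *tbl, size_t cap, int nlevels)
{
	int i;

	assert(tbl != NULL);
	if (nlevels < 0 || nlevels > MAX_LEVELS || cap < (size_t)nlevels + 1)
		return -1;
	for (i = 0; i < nlevels; i++) {
		tbl[i].mask = mask_fns[i]; /* the only per-entry difference */
		tbl[i].name = "MC";
	}
	tbl[nlevels].mask = NULL;
	tbl[nlevels].name = NULL;
	return nlevels;
}
```

In the kernel, each entry would presumably also carry cpu_core_flags and the finished table would be handed to set_sched_topology(); neither is modelled here.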


But as mentioned I think this is a follow on patch which meshes with patches 4/5 here.




Our particular platform has a single socket/package, with multiple "clusters", each cluster consisting of multiple cores that share caches. We represent all of this in PPTT, and expect it to be used. Leaf nodes are cores. The level above is the cluster. The top level is the package. We expect eventually (and understand that Jeremy is not tackling this with his current series) that clusters get represented as MC so that migrated processes prefer their cache-shared siblings, and the entire package is represented by DIE.

This will have to come from PPTT since you can't use core_siblings to derive this. Additionally, if we had multiple layers of clustering, we would expect each layer to be represented by MC. Topology.c has none of this support today.

PPTT can refer to SLIT/SRAT to determine if a hierarchy level corresponds to the "Cluster-on-Die" concept of other architectures (which end up as NUMA nodes in NUMA scheduling domains).

What PPTT will have to do is parse the tree(s), determine what each level is - SMT, MC, NUMA, DIE - and then use set_sched_topology() so that the scheduler can build up groups/domains appropriately.
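
To make the "determine what each level is" step concrete, here is a deliberately small userspace sketch. It is an assumption throughout: struct level_info, its three flags and classify_level() are hypothetical, and the order of the checks (thread leaf, then package, then NUMA match, else MC) is just one plausible policy, not something the series or the spec defines.

```c
#include <assert.h>
#include <stddef.h>

enum sched_level_kind { LEVEL_SMT, LEVEL_MC, LEVEL_NUMA, LEVEL_DIE };

/* Hypothetical per-level summary produced by a PPTT parse. */
struct level_info {
	int is_thread_leaf;    /* nodes at this level are hardware threads */
	int is_package;        /* nodes carry the physical package flag */
	int matches_numa_node; /* CPU set coincides with an SRAT domain */
};

/* Map one PPTT level onto one of the four scheduler level types. */
static enum sched_level_kind classify_level(const struct level_info *li)
{
	assert(li != NULL);
	if (li->is_thread_leaf)
		return LEVEL_SMT;
	if (li->is_package)
		return LEVEL_DIE;
	if (li->matches_numa_node)
		return LEVEL_NUMA;
	return LEVEL_MC; /* any other grouping is treated as a cache-sharing cluster */
}
```

On the platform described above (cores, then clusters, then a single package, no SMT) this would give MC for the cluster level and DIE for the package level.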


Jeremy, we've tested v3 on our platform. The topology part works as expected, we no longer see lstopo reporting sockets where there are none, but the scheduling groups are broken (expected). Caches still don't work right (no sizes reported, and the sched caches are not attributed to the cores). We will likely have additional comments as we delve into it.


Does this answer your question?
Yes, other than what to do with the two drivers.


Thanks,
Lorenzo