Re: [RFC PATCH] topology: Represent clusters of CPUs within a die.
From: Sudeep Holla
Date: Mon Oct 19 2020 - 06:01:38 EST
+Morten
On Fri, Oct 16, 2020 at 11:27:02PM +0800, Jonathan Cameron wrote:
> Both ACPI and DT provide the ability to describe additional layers of
> topology between that of individual cores and higher level constructs
> such as the level at which the last level cache is shared.
> In ACPI this can be represented in PPTT as a Processor Hierarchy
> Node Structure [1] that is the parent of the CPU cores and in turn
> has a parent Processor Hierarchy Nodes Structure representing
> a higher level of topology.
>
> For example Kunpeng 920 has clusters of 4 CPUs. These do not share
> any cache resources, but the interconnect topology is such that
> the cost to transfer ownership of a cacheline between CPUs within
> a cluster is lower than between CPUs in different clusters on the same
> die. Hence, it can make sense to deliberately schedule threads
> sharing data to a single cluster.
>
This is very SoC specific and hard to generalise, hence LLC is chosen
as one of the main factor to decide.
Are there any scheduler topology related changes to achieve the same ?
If so, we need their opinion on that.
> This patch simply exposes this information to userspace libraries
> like hwloc by providing cluster_cpus and related sysfs attributes.
> PoC of HWLOC support at [2].
>
OK, cluster is too Arm specific with no definition for it whatsoever.
How do you plan to support clusters of clusters or higher level of
hierarchy present in PPTT.
> Note this patch only handle the ACPI case.
>
If we decide to add this to sysfs, I prefer to keep DT implementation
also in sync. The bindings are in sync, just matter of implementation.
> Special consideration is needed for SMT processors, where it is
> necessary to move 2 levels up the hierarchy from the leaf nodes
> (thus skipping the processor core level).
>
> Currently the ID provided is the offset of the Processor
> Hierarchy Nodes Structure within PPTT. Whilst this is unique
> it is not terribly elegant so alternative suggestions welcome.
>
That is firmware choice. May be your firmware just fills in mandatory
fields and doesn't care about anything and everything optional. We do
check for valid UID fields and if that is not set, offset is used as
unique ID in kernel implementation. So if you enhance the firmware, the
kernel sysfs will become elegant as you expect 😉
> Note that arm64 / ACPI does not provide any means of identifying
> a die level in the topology but that may be unrelate to the cluster
> level.
>
Indeed. It can be cluster of clusters on some platform. If we need that
info, we should add it. My assumption was that generally each die forms
a NUMA node and hence the information is available there. I may be wrong.
> RFC questions:
> 1) Naming
> 2) Related to naming, do we want to represent all potential levels,
> or this enough? On Kunpeng920, the next level up from cluster happens
> to be covered by llc cache sharing, but in theory more than one
> level of cluster description might be needed by some future system.
That is my question above. Can't recall the terminology used in ACPI
PPTT, but IIRC "cluster" is not used to keep it generic. May be we need
to do the same here as the term "cluster" is ill-defined on Arm and I
would avoid using it if possible.
> 3) Do we need DT code in place? I'm not sure any DT based ARM64
> systems would have enough complexity for this to be useful.
I prefer to keep the implementation in sync
> 4) Other architectures? Is this useful on x86 for example?
>
AMD had multi-die within socket IIRC. IIUC, cpuid will provide the info
and nothing more is needed from ACPI ? I may be wrong, just guessing/asking.
--
Regards,
Sudeep