On Wed, Nov 07, 2018 at 04:31:34AM +0200, Nick Kossifidis wrote:
Mark and Sundeep, thanks a lot for your feedback. I guess you've convinced
me that having a device tree binding for the scheduler is not a correct
approach. It's not a device, after all, and I agree that the device tree
shouldn't become an OS configuration file.
Good to hear.
Regarding multiple levels of shared resources, my point is that since
cpu-map doesn't contain any information about what is shared among the
cluster/core members, it's not easy to do any further translation. Last
time I checked, the arm code that uses cpu-map only defines one domain
for SMT, one for MC, and then everything else is ignored. No matter how
many clusters have been defined, anything above the core level is
treated the same (and then I guess you started talking about adding
"packages" on the representation side).
While cpu-map doesn't contain that information today, we can *add* that
information to the cpu-map binding if necessary.
The reason I proposed a binding for the scheduler directly is not only
that it's simpler and closer to what really happens in the code; it also
makes more sense to me than the combination of cpu-map with all the
related mappings, e.g. for NUMA, caches, power domains, etc.

However, you are right that we could definitely augment cpu-map to
support what I'm describing and clean things up, and since you are open
to improving it, here is a proposal that I hope you find interesting:
First, let's get rid of the <thread> nodes; they don't make sense:
thread0 {
    cpu = <&CPU0>;
};
A thread node can't have more than one cpu entry, and any properties
should be on the cpu node itself, so it doesn't and can't add any more
information. We could just have an array of cpu phandles on the <core>
node; it's much cleaner this way:
core0 {
    members = <&CPU0>, <&CPU1>;
};
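(For reference, the existing cpu-map binding in
Documentation/devicetree/bindings/arm/topology.txt would describe the
same two harts with nested thread nodes, roughly as below; the labels
and nesting depth are only illustrative:)

cpu-map {
    cluster0 {
        core0 {
            thread0 {
                cpu = <&CPU0>;
            };
            thread1 {
                cpu = <&CPU1>;
            };
        };
    };
};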
Hold on. Rather than reinventing things from first principles, can we
please discuss what you want to *achieve*, i.e. what information you
need?
Having a node is not a significant cost, and there are reasons we may
want thread nodes. For example, it means that we can always refer to any
level of topology with a phandle, and we might want to describe
thread-affine devices in future.
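Purely as an illustration, and not an existing binding: if a thread node
carries a label, a future thread-affine device could reference that
level of the topology directly, e.g.:

cpu-map {
    cluster0 {
        core0 {
            core0_thread0: thread0 {
                cpu = <&CPU0>;
            };
        };
    };
};

/* hypothetical consumer, shown only to illustrate the phandle use */
thread-affine-device {
    thread-affinity = <&core0_thread0>;
};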
There are a tonne of existing bindings that are ugly, but re-inventing
them for taste reasons alone is more costly to the ecosystem than simply
using the existing bindings. We avoid re-inventing bindings unless there
is a functional problem, e.g. cases which they cannot possibly describe.
Then let's allow the cluster and core nodes to accept attributes that
are common to the cpus they contain. Right now this is considered
invalid.

For power domains we have a generic binding described in
Documentation/devicetree/bindings/power/power_domain.txt
which basically says that we need to put a

    power-domains = <power domain specifiers>

attribute on each of the cpu nodes.
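Today that looks roughly like the following on every cpu node (the
&cluster_pd provider label is just illustrative):

cpu@0 {
    device_type = "cpu";
    compatible = "riscv";
    reg = <0>;
    power-domains = <&cluster_pd>;
};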
FWIW, given this is arguably topological, I'm not personally averse to
describing this in the cpu-map, if that actually gains us more than the
complexity required to support it.
Given we don't do this for device power domains, I suspect that it's
simpler to stick with the existing binding.
The same happens with the capacity binding specified for arm in
Documentation/devicetree/bindings/arm/cpu-capacity.txt
which says we should add the capacity-dmips-mhz property on each of the
cpu nodes.
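i.e. per cpu node, something like this (the value is only an
illustrative benchmark-derived number):

cpu@0 {
    device_type = "cpu";
    compatible = "arm,cortex-a53";
    reg = <0x0>;
    capacity-dmips-mhz = <578>;
};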
The cpu-map was intended to expose topological details, and this isn't
really a topological property. For example, Arm DynamIQ systems can have
heterogeneous CPUs within clusters.
I do not think it's worth moving this, tbh.
The same also happens with the generic numa binding in
Documentation/devicetree/bindings/numa.txt
which says we should add the numa-node-id property on each of the cpu
nodes.
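i.e. something like this (the node id is illustrative):

cpu@0 {
    device_type = "cpu";
    compatible = "riscv";
    reg = <0>;
    numa-node-id = <0>;
};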
Is there a strong gain from moving this?
[...]
Finally, from the examples above I'd like to stress that the distinction
between a cluster and a core doesn't make much sense, and it also makes
the representation more complicated. To begin with, what would you call
the setup on the HiFive Unleashed? A cluster of 4 cores that share the
same L3 cache?
Not knowing much about the hardware, I can't really say.
I'm not sure I follow why the distinction between a cluster and a core
is nonsensical. A cluster is always a collection of cores.
A hart could be a core in its own right, or it could be a thread under a
core, which shares functional units with other harts within that core.
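To make that concrete, a minimal cpu-map sketch of both cases (labels
are illustrative and don't describe any particular SoC):

cpu-map {
    cluster0 {
        /* a hart that is a core in its own right */
        core0 {
            cpu = <&CPU0>;
        };
        /* two harts that are threads sharing one core */
        core1 {
            thread0 {
                cpu = <&CPU1>;
            };
            thread1 {
                cpu = <&CPU2>;
            };
        };
    };
};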
Arguably, we could have mandated that the topology always needed to
describe down to a thread, even if a core only had a single thread. That
ship has sailed, however.