Re: [PATCH] arm64: of: handle multiple threads in ARM cpu node

From: Mark Rutland
Date: Fri Jan 10 2025 - 12:26:10 EST


On Fri, Jan 10, 2025 at 05:02:11PM +0000, Alireza Sanaee wrote:
> On Fri, 10 Jan 2025 16:23:00 +0000
> Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> Hi Mark,
>
> Thanks for the prompt feedback.
>
> Please look inline.
>
> > On Fri, Jan 10, 2025 at 04:10:57PM +0000, Alireza Sanaee wrote:
> > > Update `of_parse_and_init_cpus` to parse the reg property of a CPU
> > > node as an array, as per the spec for SMT threads.
> > >
> > > Spec v0.4 Section 3.8.1:
> >
> > Which spec, and why do we care?
>
> For the spec, this is what I looked into:
> https://github.com/devicetree-org/devicetree-specification/releases/download/v0.4/devicetree-specification-v0.4.pdf
> Section 3.8.1
>
> Sorry I didn't put the link in there.

Ok, so that's "The devicetree specification v0.4 from ${URL}", rather
than "Spec v0.4".

> One limitation with the existing approach is that it is not really
> possible to describe shared caches for SMT cores, as they will be seen
> as separate CPU cores in the device tree. Is there any way to do so?

Can't the existing cache bindings handle that? e.g. give both threads a
next-level-cache pointing to the shared L1?
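
As a rough, untested sketch of that idea (the node names, labels, reg
values and cache properties below are made up for illustration, and this
does not assert that the kernel's cacheinfo code copes with a standalone
L1 cache node today):

```dts
/* Illustrative fragment only: labels, reg values and the l1_shared node
 * are hypothetical. Each thread keeps its own node under /cpus and both
 * point at the same cache node via next-level-cache.
 */
cpus {
	#address-cells = <1>;
	#size-cells = <0>;

	cpu0: cpu@0 {
		device_type = "cpu";
		compatible = "arm,armv8";
		reg = <0x0>;
		next-level-cache = <&l1_shared>;
	};

	cpu1: cpu@1 {
		device_type = "cpu";
		compatible = "arm,armv8";
		reg = <0x1>;
		next-level-cache = <&l1_shared>;
	};

	/* Shared by both threads of the (hypothetical) SMT core. */
	l1_shared: l1-cache {
		compatible = "cache";
		cache-level = <1>;
		cache-unified;
	};
};
```

The point is only that each thread remains a distinct node with its own
phandle, while the sharing is expressed through the common cache node.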

> More discussion about sharing caches for threads is here:
> https://lore.kernel.org/kvm/20241219083237.265419-1-zhao1.liu@xxxxxxxxx/

In that thread Rob refers to earlier discussions, so I don't think
that thread alone has enough context.

> > > The value of reg is a <prop-encoded-**array**> that defines a unique
> > > CPU/thread id for the CPU/threads represented by the CPU node.
> > > **If a CPU supports more than one thread (i.e. multiple streams of
> > > execution) the reg property is an array with 1 element per
> > > thread**. The address-cells on the /cpus node specifies how many
> > > cells each element of the array takes. Software can determine the
> > > number of threads by dividing the size of reg by the parent node's
> > > address-cells.
> >
> > We already have systems where each thread gets a unique CPU node under
> > /cpus, so we can't rely on this to determine the topology.
>
> I assume we can generate unique values even in the reg array, but that
> probably makes things more complicated.

The other bindings use phandles to refer to threads, and phandles point
to nodes in the dt, so it's necessary for threads to be given separate
nodes.

Note that the CPU topology bindings use that to describe threads, see

Documentation/devicetree/bindings/cpu/cpu-topology.txt
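
For illustration, a minimal fragment in the style of that binding (the
CPU0/CPU1 labels are hypothetical and would each need to name a separate
cpu node under /cpus):

```dts
/* Illustrative only: CPU0 and CPU1 are assumed labels on two distinct
 * cpu nodes. Each thread node refers to its CPU node by phandle, which
 * is not possible if both threads are folded into one node's reg array.
 */
cpu-map {
	cluster0 {
		core0 {
			thread0 {
				cpu = <&CPU0>;
			};
			thread1 {
				cpu = <&CPU1>;
			};
		};
	};
};
```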

> > Further, there are bindings which rely on being able to address each
> > CPU/thread with a unique phandle (e.g. for affinity of PMU
> > interrupts), which this would break.

> > Regardless, as above I do not think this is a good idea. While it
> > allows the DT to be written in a marginally simpler way, it makes
> > things more complicated for the kernel and is incompatible with
> > bindings that we already support.
> >
> > If anything "the spec" should be relaxed here.
>
> Hi Rob,
>
> If this approach is too disruptive, then shall we fall back to the
> approach where we share L1 via a next-level-cache entry?

Ah, was that previously discussed, and were there any concerns against
that approach?

To be clear, my main concern here is that threads remain represented as
distinct nodes under /cpus; I'm not wedded to the precise solution for
representing shared caches.

Mark.