Re: [PATCH] arm64: of: handle multiple threads in ARM cpu node

From: Alireza Sanaee
Date: Mon Jan 13 2025 - 06:59:09 EST


On Fri, 10 Jan 2025 17:25:50 +0000
Mark Rutland <mark.rutland@xxxxxxx> wrote:

Hi Mark,

Just resending, but without the screenshot mistakenly attached to
the other email. Sorry about that.

> On Fri, Jan 10, 2025 at 05:02:11PM +0000, Alireza Sanaee wrote:
> > On Fri, 10 Jan 2025 16:23:00 +0000
> > Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > Hi Mark,
> >
> > Thanks for the prompt feedback.
> >
> > Please look inline.
> >
> > > On Fri, Jan 10, 2025 at 04:10:57PM +0000, Alireza Sanaee wrote:
> > > > Update `of_parse_and_init_cpus` to parse the reg property of the
> > > > CPU node as an array, as per the spec, for SMT threads.
> > > >
> > > > Spec v0.4 Section 3.8.1:
> > >
> > > Which spec, and why do we care?
> >
> > For the spec, this is what I looked
> > into https://github.com/devicetree-org/devicetree-specification/releases/download/v0.4/devicetree-specification-v0.4.pdf
> > Section 3.8.1
> >
> > Sorry I didn't put the link in there.
>
> Ok, so that's "The devicetree specification v0.4 from ${URL}", rather
> than "Spec v0.4".

:) sure, I will be more precise in my future correspondence.

>
> > One limitation with the existing approach is that it is not really
> > possible to describe shared caches for SMT cores as they will be
> > seen as separate CPU cores in the device tree. Is there any way to
> > do so?
>
> Can't the existing cache bindings handle that? e.g. give both threads
> a next-level-cache pointing to the shared L1?

Unfortunately not as it stands. I tested this recently; there is some
legwork needed even to enable that, and it does not work right now.
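
For concreteness, the shape I tried was roughly the following (node
names and the compatible string are just illustrative):

        cpus {
                #address-cells = <1>;
                #size-cells = <0>;

                cpu@0 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a65";
                        reg = <0x0>;
                        next-level-cache = <&L1_0>;
                };

                cpu@1 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a65";
                        reg = <0x1>;
                        next-level-cache = <&L1_0>;
                };

                L1_0: l1-cache0 {
                        compatible = "cache";
                        cache-level = <1>;
                        cache-unified;
                };
        };

i.e. the threads keep their own cpu nodes and the shared L1 becomes a
standalone cache node. As far as I can tell, the kernel's cacheinfo
code expects the first-level cache to be described by properties inside
the cpu node itself, which is where the legwork comes in.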

>
> > More discussion over sharing caches for threads
> > here https://lore.kernel.org/kvm/20241219083237.265419-1-zhao1.liu@xxxxxxxxx/
>
> In that thread Rob refers to earlier discussions, so I don't think
> that thread alone has enough context.

https://lore.kernel.org/linux-devicetree/CAL_JsqLGEvGBQ0W_B6+5cME1UEhuKXadBB-6=GoN1tmavw9K_w@xxxxxxxxxxxxxx/

This was the earlier discussion, where Rob pointed me towards
investigating this approach (this patch).
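
For context, under that reading of the spec a two-thread core collapses
into a single cpu node, roughly like this (compatible string and values
are just illustrative):

        cpus {
                #address-cells = <1>;
                #size-cells = <0>;

                cpu@0 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a65";
                        reg = <0x0 0x1>;        /* one element per thread */
                };
        };

with the thread count recovered as (size of reg) / (#address-cells),
per the spec text quoted below.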

>
> > > > The value of reg is a <prop-encoded-**array**> that defines a
> > > > unique CPU/thread id for the CPU/threads represented by the CPU
> > > > node. **If a CPU supports more than one thread (i.e. multiple
> > > > streams of execution) the reg property is an array with 1
> > > > element per thread**. The #address-cells on the /cpus node
> > > > specifies how many cells each element of the array takes.
> > > > Software can determine the number of threads by dividing the
> > > > size of reg by the parent node's #address-cells.
> > >
> > > We already have systems where each thread gets a unique CPU node
> > > under /cpus, so we can't rely on this to determine the topology.
> >
> > I assume we could still generate unique values in the reg array,
> > but that probably makes things more complicated.
>
> The other bindings use phandles to refer to threads, and phandles
> point to nodes in the dt, so it's necessary for threads to be given
> separate nodes.
>
> Note that the CPU topology bindings use that to describe threads, see
>
> Documentation/devicetree/bindings/cpu/cpu-topology.txt

Noted. Makes sense.
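
So for a two-thread core that binding keeps one node per thread and
describes the topology separately, along these lines (a sketch based on
that document):

        cpus {
                cpu-map {
                        socket0 {
                                cluster0 {
                                        core0 {
                                                thread0 {
                                                        cpu = <&CPU0>;
                                                };
                                                thread1 {
                                                        cpu = <&CPU1>;
                                                };
                                        };
                                };
                        };
                };
        };

And those per-thread phandles are exactly what bindings like the Arm
PMU's interrupt-affinity property rely on, as you say.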

>
> > > Further, there are bindings which rely on being able to address
> > > each CPU/thread with a unique phandle (e.g. for affinity of PMU
> > > interrupts), which this would break.
>
> > > Regardless, as above I do not think this is a good idea. While it
> > > allows the DT to be written in a marginally simpler way, it makes
> > > things more complicated for the kernel and is incompatible with
> > > bindings that we already support.
> > >
> > > If anything, "the spec" should be relaxed here.
> >
> > Hi Rob,
> >
> > If this approach is too disruptive, then shall we fall back to the
> > approach where we share the L1 via the next-level-cache entry?
>
> Ah, was that previously discussed, and were there any concerns raised
> against that approach?
>
> To be clear, my main concern here is that threads remain represented
> as distinct nodes under /cpus; I'm not wedded to the precise solution
> for representing shared caches.

The shared-L1 shape I sketched above was basically what came to mind as
a non-invasive preliminary solution. That said, there has been no
discussion yet of the downsides or advantages of having a separate node
layer for the L1 cache.

But if it sounds reasonable, I can look into it.

>
> Mark.
>
>

Thanks,
Alireza