Re: [PATCH v3 15/21] sched/cache: Disable cache aware scheduling for processes with high thread counts

In reply to: Madadi Vineeth Reddy: "Re: [PATCH v3 15/21] sched/cache: Disable cache aware scheduling for processes with high thread counts"

From: Peter Zijlstra

Date: Fri Feb 20 2026 - 04:54:23 EST

On Fri, Feb 20, 2026 at 12:10:21PM +0530, Madadi Vineeth Reddy wrote:
> Hi Peter,
>
> On 19/02/26 22:25, Peter Zijlstra wrote:
> > On Wed, Feb 18, 2026 at 11:24:05PM +0530, Madadi Vineeth Reddy wrote:
> >> Is there a way to make this useful for architectures with small LLC
> >> sizes? One possible approach we were exploring is to have LLC at a
> >> hemisphere level that comprise multiple SMT4 cores.
> >
> > Is this hemisphere an actual physical cache level, or would that be
> > artificial?
>
> It's artificial. There is no cache being shared at this level but this is
> still the level where some amount of cache-snooping takes place and it is
> relatively faster to access the data from the caches of the cores
> within this domain.
>
> We verified with this producer consumer workload where the producer
> and consumer threads placed in the same hemisphere showed measurably
> better latency compared to cross-hemisphere placement.

So I just read the Power10 Wikipedia entry; that seems to suggest there
actually is a significant L3 at the hemisphere level.

That thing states that Power10 has:

- 16 cores in two hemispheres of 8 cores each.
- each core has 2M L2 cache
- each hemi has 64M of L3 cache

Then there appears to be a 'funny' in that there's always one 'dead'
core, so you end up with 8+7, and the small hemi loses an 8M L3 slice
due to that.
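
To spell out the arithmetic, here is a minimal sketch in C. It assumes the
figures quoted above from the Wikipedia entry are right, and that each dead
core costs its hemisphere exactly one 8M L3 slice. The macro and function
names are made up for illustration; none of this is kernel code.

```c
/* Figures as quoted from the Wikipedia entry; not checked against IBM docs. */
#define CORES_PER_HEMI	8
#define L2_PER_CORE_MB	2
#define L3_PER_HEMI_MB	64
#define L3_SLICE_MB	8	/* assumed: slice lost along with each dead core */

/* L3 (in MB) left in a hemisphere that has 'dead' disabled cores. */
static int hemi_l3_mb(int dead)
{
	return L3_PER_HEMI_MB - dead * L3_SLICE_MB;
}

/* Aggregate L2 (in MB) across the active cores of a hemisphere. */
static int hemi_l2_mb(int dead)
{
	return (CORES_PER_HEMI - dead) * L2_PER_CORE_MB;
}
```

Under those assumptions hemi_l3_mb(0) is 64 for the full hemi and
hemi_l3_mb(1) is 56 for the small one, so 120M of L3 across the chip.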

Now, I'm just reading a Wiki page written by a random person on the
interweb, so perhaps this is wrong (in which case I would suggest you
get someone from IBM to go and edit that page and provide references),
or there has been a miscommunication somewhere else, and perhaps there
really is L3 at the hemi level, and arch/powerpc/ 'forgot' to expose
that :-)
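
One way to see what the kernel currently exposes is to walk the cacheinfo
files in sysfs. A rough userspace sketch follows, assuming the usual
/sys/devices/system/cpu/cpuN/cache/indexM layout with 'level', 'type',
'size' and 'shared_cpu_list' attributes; the helper names are hypothetical
and the output on Power10 is not something I have checked.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one attribute file from a cache index directory into buf and
 * strip the trailing newline. Returns 0 on success, -1 on failure.
 */
static int read_cache_attr(const char *dir, const char *attr,
			   char *buf, size_t len)
{
	char path[256];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Print level, type, size and sharing list for each cache index of
 * 'cpu'. Returns the number of cache indices found.
 */
static int dump_cpu_caches(int cpu)
{
	char dir[128], level[16], type[32], size[32], shared[256];
	int idx, found = 0;

	for (idx = 0; ; idx++) {
		snprintf(dir, sizeof(dir),
			 "/sys/devices/system/cpu/cpu%d/cache/index%d",
			 cpu, idx);
		if (read_cache_attr(dir, "level", level, sizeof(level)))
			break;	/* no more cache indices for this CPU */

		/* Missing optional attributes are shown as "?". */
		if (read_cache_attr(dir, "type", type, sizeof(type)))
			strcpy(type, "?");
		if (read_cache_attr(dir, "size", size, sizeof(size)))
			strcpy(size, "?");
		if (read_cache_attr(dir, "shared_cpu_list", shared,
				    sizeof(shared)))
			strcpy(shared, "?");

		printf("cpu%d L%s %s size=%s shared_with=%s\n",
		       cpu, level, type, size, shared);
		found++;
	}
	return found;
}
```

Calling dump_cpu_caches(0) from a small main() and looking at which CPUs
share the L3 entry would show whether anything at the hemi level is
visible to the scheduler's topology code.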