Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared

From: Peter Zijlstra
Date: Thu May 12 2016 - 07:34:19 EST


On Thu, May 12, 2016 at 09:07:52PM +1000, Michael Neuling wrote:
> On Thu, 2016-05-12 at 07:07 +0200, Peter Zijlstra wrote:

> > But as per the above, Power7 and Power8 have explicit logic to share the
> > per-core L3 with the other cores.
> >
> > How effective is that? From some of the slides/documents I've looked at,
> > the L3s are connected with a high-speed fabric, which suggests that
> > cross-core sharing should be fairly efficient.
>
> I'm not sure.  I thought it was mostly private but if another core was
> sleeping or not experiencing much cache pressure, another core could use it
> for some things. But I'm fuzzy on the exact properties, sorry.

Right; I'm going by bits and pieces found on the tubes, so I'm just
guessing ;-)

But it sounds like these L3s are nowhere close to what Intel does with
their L3, where each core has an L3 slice, and slices are connected on a
ring to form a unified/shared cache across all cores.

http://www.realworldtech.com/sandy-bridge/8/

> > In which case it would make sense to treat/model the combined L3 as a
> > single large LLC covering all cores.
>
> Are you thinking it would be much cheaper to migrate a task to another core
> inside this chip, than to off chip?

Basically; and if so, whether it's cheap enough to shoot a task to an
idle core to avoid queueing. Assuming there still is some cache residency
on the old core, the inter-core fill should be much cheaper than fetching
it off package (either remote cache or DRAM).

Or at least, so goes my reasoning based on my google results.
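
To make the trade-off above concrete, here is a toy cost model in C. It is
only a sketch of the reasoning, not scheduler code: the function names, the
per-line fill latencies and the idea of counting resident cache lines are all
assumptions made up for illustration, not measured Power7/Power8 numbers and
not anything the kernel actually tracks this way.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimated cost of re-warming 'lines' cache lines
 * on the destination core, at 'per_line_ns' nanoseconds per line.
 */
static uint64_t refill_cost_ns(uint64_t lines, uint64_t per_line_ns)
{
	return lines * per_line_ns;
}

/*
 * Hypothetical decision: moving the task to an idle core is worth it
 * when refilling its still-resident working set costs less than the
 * time it would otherwise spend waiting in the local runqueue.
 */
static bool migrate_beats_queueing(uint64_t resident_lines,
				   uint64_t per_line_ns,
				   uint64_t expected_queue_wait_ns)
{
	return refill_cost_ns(resident_lines, per_line_ns) <
	       expected_queue_wait_ns;
}
```

With a made-up 40ns inter-core fill and a 120ns off-package fill, a task with
1000 resident lines facing a 50us queue wait would be worth migrating within
the chip (40us refill) but not off package (120us refill). Whether the real
ratio looks anything like that is exactly the open question.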