Re: [PATCH] arm64: dts: sdm845: Add CPU topology

From: Morten Rasmussen
Date: Thu Jun 06 2019 - 06:54:29 EST


On Thu, Jun 06, 2019 at 10:44:58AM +0200, Vincent Guittot wrote:
> On Thu, 6 Jun 2019 at 10:34, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
> >
> > On 6/6/19 10:20 AM, Vincent Guittot wrote:
> > > On Thu, 6 Jun 2019 at 09:49, Quentin Perret <quentin.perret@xxxxxxx> wrote:
> > >>
> > >> Hi Vincent,
> > >>
> > >> On Thursday 06 Jun 2019 at 09:05:16 (+0200), Vincent Guittot wrote:
> > >>> Hi Quentin,
> > >>>
> > >>> On Wed, 5 Jun 2019 at 19:21, Quentin Perret <quentin.perret@xxxxxxx> wrote:
> > >>>>
> > >>>> On Friday 17 May 2019 at 14:55:19 (-0700), Stephen Boyd wrote:
> > >>>>> Quoting Amit Kucheria (2019-05-16 04:54:45)
> > >>>>>> (cc'ing Andy's correct email address)
> > >>>>>>
> > >>>>>> On Wed, May 15, 2019 at 2:46 AM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
> > >>>>>>>
> > >>>>>>> Quoting Amit Kucheria (2019-05-13 04:54:12)
> > >>>>>>>> On Mon, May 13, 2019 at 4:31 PM Amit Kucheria <amit.kucheria@xxxxxxxxxx> wrote:
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Jan 15, 2019 at 12:13 AM Matthias Kaehlcke <mka@xxxxxxxxxxxx> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> The 8 CPU cores of the SDM845 are organized in two clusters of 4 big
> > >>>>>>>>>> ("gold") and 4 little ("silver") cores. Add a cpu-map node to the DT
> > >>>>>>>>>> that describes this topology.
> > >>>>>>>>>
> > >>>>>>>>> This is partly true. There are two groups of gold and silver cores,
> > >>>>>>>>> but AFAICT they are in a single cluster, not two separate ones. SDM845
> > >>>>>>>>> is one of the early examples of ARM's DynamIQ architecture.
> > >>>>>>>>>
> > >>>>>>>>>> Signed-off-by: Matthias Kaehlcke <mka@xxxxxxxxxxxx>
> > >>>>>>>>>
> > >>>>>>>>> I noticed that this patch sneaked through for this merge window but
> > >>>>>>>>> perhaps we can whip up a quick fix for -rc2?
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> And please find attached a patch to fix this up. Andy, since this
> > >>>>>>>> hasn't landed yet (can we still squash this into the original patch?),
> > >>>>>>>> I couldn't add a Fixes tag.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> I had the same concern. Thanks for catching this. I suspect this must
> > >>>>>>> cause some problem for IPA given that it can't discern between the big
> > >>>>>>> and little "power clusters"?
> > >>>>>>
> > >>>>>> Both EAS and IPA, I believe. It influences the scheduler's view of
> > >>>>>> the topology.
> > >>>>>
> > >>>>> And EAS and IPA are OK with the real topology? I'm just curious if
> > >>>>> changing the topology to reflect reality will be a problem for those
> > >>>>> two.
> > >>>>
> > >>>> FWIW, neither EAS nor IPA depends on this. Not the upstream version of
> > >>>> EAS at least (which is used in recent Android kernels -- 4.19+).
> > >>>>
> > >>>> But doing this is still required for other things in the scheduler (the
> > >>>> so-called 'capacity-awareness' code). So until we have a better
> > >>>> solution, this patch is doing the right thing.
> > >>>
> > >>> I'm not sure I follow what you mean.
> > >>> Which so-called 'capacity-awareness' code are you speaking about, and
> > >>> what is the problem?
> > >>
> > >> I'm talking about the wake-up path. ATM select_idle_sibling() is totally
> > >> unaware of capacity differences. In its current form, this function
> > >> basically assumes that all CPUs in a given sd_llc have the same
> > >> capacity, which would be wrong if we had a single MC level for SDM845.
> > >> So, until select_idle_sibling() is 'fixed' to be capacity-aware, we need
> > >> two levels of sd for asymmetric systems (including DynamIQ) so the
> > >> wake_cap() story actually works.
> > >>
> > >> I hope that clarifies it :)
> > >
> > > hmm... does this justify this wrong topology?

No, it doesn't. It relies heavily on how nested clusters are interpreted
too, so it is quite fragile.

> > > select_idle_sibling() is called only when the system is overloaded and
> > > the scheduler disables the EAS path.
> > > In this case, the scheduler either looks for an idle cpu or spreads
> > > the load evenly.
> > > This is maybe not always optimal and should probably be fixed, but it
> > > doesn't justify a wrong topology description IMHO.
> >
> > The big/little cluster detection in wake_cap() doesn't work anymore with
> > DynamIQ w/o the Phantom (DIE) domain. So the decision between sis() and
> > the slow path is IMHO broken.
>
> That's probably not the right thread to discuss this further, but I'm
> not sure I understand why wake_cap() doesn't work, as it compares the
> capacity_orig of the local cpu and prev cpu, which are the same whatever
> the sched domain.

We have had this discussion a couple of times over the last couple of
years. The story, IIRC, is that when we introduced capacity awareness in
the wake-up path (wake_cap()) we realised (I think it was actually you)
that we could use select_idle_sibling() in cases where we know that the
search space is limited to cpus with sufficient capacity so we didn't
have to take the long route through find_idlest_cpu(). Back then, big
and little were grouped by clusters so it was "safe" to use
select_idle_sibling() on cpu or prev_cpu if they have sufficient
capacity.
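For illustration, a rough userspace model of a wake_cap()-style check, loosely based on kernel/sched/fair.c around v5.2 but not the kernel code itself. The capacity table, the 1280/1024 margin and every *_model name are assumptions made for the sketch. The point it shows: the check only looks at cpu and prev_cpu, never at the rest of the LLC span that select_idle_sibling() goes on to scan.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; names and numbers are hypothetical. */
#define NR_CPUS_MODEL 8
#define CAPACITY_MARGIN 1280 /* util must stay below ~80% of capacity */

/* Hypothetical original capacities (0..1024 scale): 4 little, 4 big. */
static const long capacity_orig_model[NR_CPUS_MODEL] = {
	400, 400, 400, 400, 1024, 1024, 1024, 1024,
};

static long capacity_orig_of_model(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return capacity_orig_model[cpu];
}

/* Largest original capacity in the (single) root domain. */
static long max_cpu_capacity_model(void)
{
	long max = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (capacity_orig_model[cpu] > max)
			max = capacity_orig_model[cpu];
	return max;
}

/* True if a task with this utilization fits, margin included. */
static bool fits_capacity_model(long util, long capacity)
{
	return capacity * 1024 > util * CAPACITY_MARGIN;
}

/*
 * Returns true when the fast path (select_idle_sibling()) should be
 * skipped in favour of the slow path (find_idlest_cpu()).
 * Only cpu and prev_cpu are inspected.
 */
static bool wake_cap_model(long task_util, int cpu, int prev_cpu)
{
	long a = capacity_orig_of_model(cpu);
	long b = capacity_orig_of_model(prev_cpu);
	long min_cap = a < b ? a : b;
	long max_cap = max_cpu_capacity_model();

	/* Min capacity within 12.5% of max: treat as symmetric. */
	if (max_cap - min_cap < (max_cap >> 3))
		return false;

	return !fits_capacity_model(task_util, min_cap);
}
```

With a utilization of 600, waking on big CPU 4 with prev_cpu on big CPU 5 keeps the fast path, while pairing little CPU 0 with CPU 4 forces the slow path. Whether the fast path is then "safe" depends entirely on which CPUs share the LLC with the target.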

With DynamIQ the true topology on many systems is just one cluster, and
hence using select_idle_sibling() there means the search space includes
all cpu types, which isn't "safe" if you have a task requiring more
capacity than some of the cpus in the system can offer. We need to use
the find_idlest_cpu() path in more cases than we do today.
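A second sketch, under similar assumptions (hypothetical capacities, a heavily simplified scan order, invented *_model names), shows why the search space matters. A capacity-unaware, select_idle_sibling()-like scan is harmless when the LLC spans only the big CPUs, but over a single DynamIQ LLC it can place a big task on an idle little CPU. The capacity-filtered variant only illustrates the idea; it is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; names and numbers are hypothetical. */
#define NR_CPUS_MODEL 8
#define CAPACITY_MARGIN 1280

/* Hypothetical capacities: CPUs 0-3 little, CPUs 4-7 big. */
static const long cap_model[NR_CPUS_MODEL] = {
	400, 400, 400, 400, 1024, 1024, 1024, 1024,
};

/* Contiguous range of CPUs sharing the LLC with the target. */
struct llc_span {
	int first;
	int last; /* inclusive */
};

static bool fits_model(long util, long capacity)
{
	return capacity * 1024 > util * CAPACITY_MARGIN;
}

/* Capacity-unaware scan, in the spirit of select_idle_sibling(). */
static int sis_model(const bool idle[], struct llc_span llc, int target)
{
	assert(target >= llc.first && target <= llc.last);
	if (idle[target])
		return target;
	for (int cpu = llc.first; cpu <= llc.last; cpu++)
		if (idle[cpu])
			return cpu;
	return target;
}

/* Same scan, but skipping idle CPUs the task would not fit on. */
static int sis_capacity_aware_model(const bool idle[], struct llc_span llc,
				    int target, long util)
{
	assert(target >= llc.first && target <= llc.last);
	if (idle[target] && fits_model(util, cap_model[target]))
		return target;
	for (int cpu = llc.first; cpu <= llc.last; cpu++)
		if (idle[cpu] && fits_model(util, cap_model[cpu]))
			return cpu;
	return target;
}
```

With target CPU 4 busy, CPUs 0 and 6 idle and a utilization of 600, the plain scan over a big-only span {4..7} (the two-level "phantom" topology) returns 6. The same scan over the single-cluster span {0..7} returns little CPU 0, where the task does not fit. The filtered scan over {0..7} returns 6 again.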

All the code is there I think, we just have to tweak some conditions. I
can try to come up with a simple fix we can discuss and refine as
necessary.

Morten