Re: [PATCH v4 1/3] sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag

From: Beata Michalska
Date: Tue May 18 2021 - 11:48:08 EST


On Tue, May 18, 2021 at 05:28:11PM +0200, Vincent Guittot wrote:
> On Tue, 18 May 2021 at 17:09, Beata Michalska <beata.michalska@xxxxxxx> wrote:
> >
> > On Tue, May 18, 2021 at 04:53:09PM +0200, Vincent Guittot wrote:
> > > On Tue, 18 May 2021 at 16:27, Beata Michalska <beata.michalska@xxxxxxx> wrote:
> > > >
> > > > On Tue, May 18, 2021 at 03:39:27PM +0200, Vincent Guittot wrote:
> > > > > On Mon, 17 May 2021 at 10:24, Beata Michalska <beata.michalska@xxxxxxx> wrote:
> > > > > >
> > > > > > Introduce a new sched_domain topology flag, complementary to
> > > > > > SD_ASYM_CPUCAPACITY, to distinguish between sched_domains where any CPU
> > > > > > capacity asymmetry is detected (SD_ASYM_CPUCAPACITY) and ones where
> > > > > > a full range of CPU capacities is visible to all domain members
> > > > > > (SD_ASYM_CPUCAPACITY_FULL).
> > > > >
> > > > > I'm not sure about what you want to detect:
> > > > >
> > > > > Is it a sched_domain level with a full range of cpu capacity, i.e.
> > > > > with at least 1 min capacity and 1 max capacity ?
> > > > > or do you want to get at least 1 cpu of each capacity ?
> > > > That would be at least one CPU of each available capacity within a given
> > > > domain, so the full -set- of available capacities within a domain.
> > >
> > > Would be good to add the precision.
> > Will do.
> > >
> > > Although I'm not sure if that's the best policy compared to only
> > > getting the range which would be far simpler to implement.
> > > Do you have some topology example ?
> >
> > An example from the second patch in the series:
> >
> > DIE      [                                ]
> > MC       [                       ][       ]
> >
> > CPU       [0] [1] [2] [3] [4] [5]  [6] [7]
> > Capacity  |.....| |.....| |.....|  |.....|
> >              L       M       B        B
>
> The one above, which is described in your patchset, works with the range policy.
Yep, but that is just one variation among all the possibilities...
>
> >
> > Where:
> > arch_scale_cpu_capacity(L) = 512
> > arch_scale_cpu_capacity(M) = 871
> > arch_scale_cpu_capacity(B) = 1024
> >
> > which could also look like:
> >
> > DIE      [                                        ]
> > MC       [                       ][               ]
> >
> > CPU       [0] [1] [2] [3] [4] [5]  [6] [7] [8] [9]
> > Capacity  |.....| |.....| |.....|  |.....| |.....|
> >              L       M       B        L       B
>
> I know that the HW guys can come up with crazy ideas, but they would
> probably add M instead of L with B in the 2nd cluster as a performance
> boost at the cost of powering up another "cluster", in which case
> the range policy works as well.
>
> >
> > Considering only the range would mean losing the 2 (M) CPUs out of sight
> > for feec (find_energy_efficient_cpu()) in some cases.
>
> Is it realistic? Considering all the code and complexity added by
> patch 2, will we really use it in the end?
>
I completely agree that the first approach was slightly... blown out of
proportion, but with Peter's idea the complexity has dropped significantly.
With the range being considered, we are back to per-domain tracking of available
capacities (min/max), plus additional cycles spent comparing capacities.
Unless I fail to see the simplicity of that approach?
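
To make the difference between the two policies concrete, here is a minimal
userspace sketch. It is not the kernel implementation: the array names, the
helper names and the assumed MC spans (CPUs 0-5 for the first MC domain, the
remaining CPUs for the second) are hypothetical and only illustrate the two
example topologies discussed above, using the capacity values quoted in the
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Capacity values quoted in the thread for the L, M and B CPUs. */
enum { CAP_L = 512, CAP_M = 871, CAP_B = 1024 };

/* First topology, CPUs 0-7 (hypothetical names, assumed MC spans). */
static const unsigned long topo1_sys[] = {
	CAP_L, CAP_L, CAP_M, CAP_M, CAP_B, CAP_B, CAP_B, CAP_B,
};
static const unsigned long topo1_mc_a[] = {
	CAP_L, CAP_L, CAP_M, CAP_M, CAP_B, CAP_B,
};
static const unsigned long topo1_mc_b[] = { CAP_B, CAP_B };

/* Second topology, CPUs 0-9: the extra cluster pairs L with B. */
static const unsigned long topo2_sys[] = {
	CAP_L, CAP_L, CAP_M, CAP_M, CAP_B, CAP_B, CAP_L, CAP_L, CAP_B, CAP_B,
};
static const unsigned long topo2_mc_a[] = {
	CAP_L, CAP_L, CAP_M, CAP_M, CAP_B, CAP_B,
};
static const unsigned long topo2_mc_b[] = { CAP_L, CAP_L, CAP_B, CAP_B };

/* Does caps[0..n) contain the capacity value cap? */
static bool has_capacity(const unsigned long *caps, size_t n, unsigned long cap)
{
	for (size_t i = 0; i < n; i++)
		if (caps[i] == cap)
			return true;
	return false;
}

/* "Range" policy: the domain sees the system-wide min and max capacity. */
static bool domain_spans_range(const unsigned long *dom, size_t ndom,
			       const unsigned long *sys, size_t nsys)
{
	unsigned long min, max;

	assert(nsys > 0);
	min = max = sys[0];
	for (size_t i = 1; i < nsys; i++) {
		if (sys[i] < min)
			min = sys[i];
		if (sys[i] > max)
			max = sys[i];
	}
	return has_capacity(dom, ndom, min) && has_capacity(dom, ndom, max);
}

/* "Full set" policy: the domain sees every capacity value in the system. */
static bool domain_spans_full_set(const unsigned long *dom, size_t ndom,
				  const unsigned long *sys, size_t nsys)
{
	for (size_t i = 0; i < nsys; i++)
		if (!has_capacity(dom, ndom, sys[i]))
			return false;
	return true;
}
```

Under these assumptions both policies agree on every domain of the first
topology. On the second one, the MC domain covering CPUs 6-9 passes the range
check (it sees both 512 and 1024) but fails the full-set check, because the
871 capacity is not visible from it.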

---
BR
B.
> Regards,
> Vincent
> >
> > ---
> > BR.
> > B
> > >
> > > >
> > > > ---
> > > > BR
> > > > B.
> > > > >
> > > > >
> > > > > >
> > > > > > > With the distinction between full and partial CPU capacity asymmetry
> > > > > > > brought in by the newly introduced flag, the scope of the original
> > > > > > > SD_ASYM_CPUCAPACITY flag shifts. It still maintains the existing
> > > > > > > behaviour when asymmetry is detected on a given sched domain, allowing
> > > > > > > misfit migrations within sched domains that do not observe the full range
> > > > > > > of CPU capacities but still have members with different capacity
> > > > > > > values. It does, however, lose its meaning when it comes to the lowest CPU
> > > > > > > asymmetry sched_domain level per-cpu pointer, which is now to be
> > > > > > > denoted by the SD_ASYM_CPUCAPACITY_FULL flag.
> > > > > >
> > > > > > Signed-off-by: Beata Michalska <beata.michalska@xxxxxxx>
> > > > > > Reviewed-by: Valentin Schneider <valentin.schneider@xxxxxxx>
> > > > > > ---
> > > > > > include/linux/sched/sd_flags.h | 10 ++++++++++
> > > > > > 1 file changed, 10 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> > > > > > index 34b21e9..57bde66 100644
> > > > > > --- a/include/linux/sched/sd_flags.h
> > > > > > +++ b/include/linux/sched/sd_flags.h
> > > > > > @@ -91,6 +91,16 @@ SD_FLAG(SD_WAKE_AFFINE, SDF_SHARED_CHILD)
> > > > > > >  SD_FLAG(SD_ASYM_CPUCAPACITY, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> > > > > > >
> > > > > > >  /*
> > > > > > > + * Domain members have different CPU capacities spanning all unique CPU
> > > > > > > + * capacity values.
> > > > > > > + *
> > > > > > > + * SHARED_PARENT: Set from the topmost domain down to the first domain where
> > > > > > > + *                all available CPU capacities are visible
> > > > > > > + * NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
> > > > > > > + */
> > > > > > > +SD_FLAG(SD_ASYM_CPUCAPACITY_FULL, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> > > > > > > +
> > > > > > > +/*
> > > > > > >   * Domain members share CPU capacity (i.e. SMT)
> > > > > > >   *
> > > > > > >   * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
> > > > > > --
> > > > > > 2.7.4
> > > > > >