Re: [PATCH v3 08/10] sched/topology: Remove SHARED_CHILD from ASYM_PACKING

From: Ionela Voinescu
Date: Mon Mar 06 2023 - 08:10:54 EST


Hey,

On Sunday 05 Mar 2023 at 11:08:11 (-0800), Ricardo Neri wrote:
> On Fri, Mar 03, 2023 at 11:29:52AM +0000, Ionela Voinescu wrote:
> > Hi Ricardo,
>
> Hi Ionela!
>
> >
> > On Monday 06 Feb 2023 at 20:58:36 (-0800), Ricardo Neri wrote:
> > > Only x86 and Power7 use ASYM_PACKING. They use it differently.
> > >
> > > Power7 has cores of equal priority, but the SMT siblings of a core have
> > > different priorities. Parent scheduling domains do not need (nor have) the
> > > ASYM_PACKING flag. SHARED_CHILD is not needed. Using SHARED_PARENT would
> > > cause the topology debug code to complain.
> > >
> > > X86 has cores of different priority, but all the SMT siblings of the core
> > > have equal priority. It needs ASYM_PACKING at the MC level, but not at the
> > > SMT level (it also needs it at upper levels if they have scheduling groups
> > > of different priority). Removing ASYM_PACKING from the SMT domain causes
> > > the topology debug code to complain.
> > >
> > > Remove SHARED_CHILD for now. We still need a topology check that satisfies
> > > both architectures.
> > >
> > > Cc: Ben Segall <bsegall@xxxxxxxxxx>
> > > Cc: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> > > Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> > > Cc: Len Brown <len.brown@xxxxxxxxx>
> > > Cc: Mel Gorman <mgorman@xxxxxxx>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > > Cc: Tim C. Chen <tim.c.chen@xxxxxxxxx>
> > > Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
> > > Cc: x86@xxxxxxxxxx
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > > Suggested-by: Valentin Schneider <vschneid@xxxxxxxxxx>
> > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > > ---
> > > Changes since v2:
> > > * Introduced this patch.
> > >
> > > Changes since v1:
> > > * N/A
> > > ---
> > > include/linux/sched/sd_flags.h | 5 +----
> > > 1 file changed, 1 insertion(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> > > index 57bde66d95f7..800238854ba5 100644
> > > --- a/include/linux/sched/sd_flags.h
> > > +++ b/include/linux/sched/sd_flags.h
> > > @@ -132,12 +132,9 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> > > /*
> > > * Place busy tasks earlier in the domain
> > > *
> > > - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> > > - * up, but currently assumed to be set from the base domain
> > > - * upwards (see update_top_cache_domain()).
> > > * NEEDS_GROUPS: Load balancing flag.
> > > */
> > > -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> > > +SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
> >
> > While this silences the warning one would have gotten when removing
> > SD_ASYM_PACKING from SMT level, it will still result in sd_asym_packing
> > being NULL for these systems, which breaks nohz balance. That is because
> > highest_flag_domain() still stops searching at the first level without
> > the flag set, in this case SMT, even if levels above have the flag set.
>
> You are absolutely right! This how this whole discussion started. It
> slipped my mind.
>
> >
> > Maybe highest_flag_domain() should be changed to take into account the
> > metadata flags?
>
> What about the patch below? Search will stop if the flag has
> SDF_SHARED_CHILD as it does today. Otherwise it will search all the
> domains.
>
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1773,6 +1773,12 @@ queue_balance_callback(struct rq *rq,
> for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
> __sd; __sd = __sd->parent)
>
> +#define SD_FLAG(name, mflags) (name * !!((mflags) & SDF_SHARED_CHILD)) |
> +static const unsigned int SD_SHARED_CHILD_MASK =
> +#include <linux/sched/sd_flags.h>
> +0;
> +#undef SD_FLAG
> +
> /**
> * highest_flag_domain - Return highest sched_domain containing flag.
> * @cpu: The CPU whose highest level of sched domain is to
> @@ -1781,15 +1787,19 @@ queue_balance_callback(struct rq *rq,
> * for the given CPU.
> *
> * Returns the highest sched_domain of a CPU which contains the given flag.
> - */
> +*/
^^^
likely an unintended change
> static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
> {
> struct sched_domain *sd, *hsd = NULL;
>
> for_each_domain(cpu, sd) {
> - if (!(sd->flags & flag))
> + if (sd->flags & flag) {
> + hsd = sd;
> + continue;
> + }
> +

There might be room for a comment here:
/*
* If the flag is not set and is known to be shared with lower
* domains, stop the search, as it won't be found further up.
*/
> + if (flag & SD_SHARED_CHILD_MASK)
> break;
> - hsd = sd;
> }
>
> return hsd;

It looks nice and sane to me - I've not compiled or tested it :).

Thanks,
Ionela.

>
> >
> > Thanks,
> > Ionela.
> >
> > >
> > > /*
> > > * Prefer to place tasks in a sibling domain
> > > --
> > > 2.25.1
> > >
> > >