Re: [PATCH v3 08/10] sched/topology: Remove SHARED_CHILD from ASYM_PACKING

From: Ricardo Neri
Date: Sun Mar 05 2023 - 13:58:41 EST


On Fri, Mar 03, 2023 at 11:29:52AM +0000, Ionela Voinescu wrote:
> Hi Ricardo,

Hi Ionela!

>
> On Monday 06 Feb 2023 at 20:58:36 (-0800), Ricardo Neri wrote:
> > Only x86 and Power7 use ASYM_PACKING. They use it differently.
> >
> > Power7 has cores of equal priority, but the SMT siblings of a core have
> > different priorities. Parent scheduling domains do not need (nor have) the
> > ASYM_PACKING flag. SHARED_CHILD is not needed. Using SHARED_PARENT would
> > cause the topology debug code to complain.
> >
> > X86 has cores of different priority, but all the SMT siblings of the core
> > have equal priority. It needs ASYM_PACKING at the MC level, but not at the
> > SMT level (it also needs it at upper levels if they have scheduling groups
> > of different priority). Removing ASYM_PACKING from the SMT domain causes
> > the topology debug code to complain.
> >
> > Remove SHARED_CHILD for now. We still need a topology check that satisfies
> > both architectures.
> >
> > Cc: Ben Segall <bsegall@xxxxxxxxxx>
> > Cc: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> > Cc: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
> > Cc: Len Brown <len.brown@xxxxxxxxx>
> > Cc: Mel Gorman <mgorman@xxxxxxx>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Cc: Tim C. Chen <tim.c.chen@xxxxxxxxx>
> > Cc: Valentin Schneider <vschneid@xxxxxxxxxx>
> > Cc: x86@xxxxxxxxxx
> > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > Suggested-by: Valentin Schneider <vschneid@xxxxxxxxxx>
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@xxxxxxxxxxxxxxx>
> > ---
> > Changes since v2:
> > * Introduced this patch.
> >
> > Changes since v1:
> > * N/A
> > ---
> > include/linux/sched/sd_flags.h | 5 +----
> > 1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
> > index 57bde66d95f7..800238854ba5 100644
> > --- a/include/linux/sched/sd_flags.h
> > +++ b/include/linux/sched/sd_flags.h
> > @@ -132,12 +132,9 @@ SD_FLAG(SD_SERIALIZE, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
> >  /*
> >   * Place busy tasks earlier in the domain
> >   *
> > - * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> > - *               up, but currently assumed to be set from the base domain
> > - *               upwards (see update_top_cache_domain()).
> >   * NEEDS_GROUPS: Load balancing flag.
> >   */
> > -SD_FLAG(SD_ASYM_PACKING, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
> > +SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
>
> While this silences the warning one would have gotten when removing
> SD_ASYM_PACKING from SMT level, it will still result in sd_asym_packing
> being NULL for these systems, which breaks nohz balance. That is because
> highest_flag_domain() still stops searching at the first level without
> the flag set, in this case SMT, even if levels above have the flag set.

You are absolutely right! This is how this whole discussion started. It
slipped my mind.
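
To make the failure concrete, here is a small user-space sketch of the
current search. It is only an illustration: struct demo_domain and
DEMO_ASYM_PACKING are made-up stand-ins for struct sched_domain and
SD_ASYM_PACKING, and the loop merely mirrors the shape of
highest_flag_domain() as it is today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SD_ASYM_PACKING; the value is arbitrary. */
#define DEMO_ASYM_PACKING 0x1u

/* Hypothetical stand-in for struct sched_domain. */
struct demo_domain {
	unsigned int flags;
	struct demo_domain *parent;	/* next level up, NULL at the top */
};

/*
 * Same loop shape as the current highest_flag_domain(): walk up from the
 * base domain and stop at the first level that lacks the flag.
 */
struct demo_domain *demo_highest_flag_domain_old(struct demo_domain *base,
						 unsigned int flag)
{
	struct demo_domain *sd, *hsd = NULL;

	for (sd = base; sd; sd = sd->parent) {
		if (!(sd->flags & flag))
			break;
		hsd = sd;
	}

	return hsd;
}

/* x86-like chain: SMT without the flag, MC above it with the flag. */
void demo_old_search_self_check(void)
{
	struct demo_domain mc  = { DEMO_ASYM_PACKING, NULL };
	struct demo_domain smt = { 0, &mc };

	/* The walk starts at SMT, stops there, and never reaches MC. */
	assert(demo_highest_flag_domain_old(&smt, DEMO_ASYM_PACKING) == NULL);

	/* Starting directly at MC finds it. */
	assert(demo_highest_flag_domain_old(&mc, DEMO_ASYM_PACKING) == &mc);
}
```

With SMT as the base domain the result is NULL, which is the
sd_asym_packing == NULL case you describe.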

>
> Maybe highest_flag_domain() should be changed to take into account the
> metadata flags?

What about the patch below? If the flag has SDF_SHARED_CHILD, the search
still stops at the first domain without the flag, as it does today.
Otherwise it walks all the domains and returns the highest one that has
the flag set.

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1773,6 +1773,12 @@ queue_balance_callback(struct rq *rq,
 	for (__sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); \
 			__sd; __sd = __sd->parent)

+#define SD_FLAG(name, mflags) (name * !!((mflags) & SDF_SHARED_CHILD)) |
+static const unsigned int SD_SHARED_CHILD_MASK =
+#include <linux/sched/sd_flags.h>
+0;
+#undef SD_FLAG
+
 /**
  * highest_flag_domain - Return highest sched_domain containing flag.
  * @cpu:	The CPU whose highest level of sched domain is to
@@ -1781,15 +1787,19 @@ queue_balance_callback(struct rq *rq,
  *		for the given CPU.
  *
  * Returns the highest sched_domain of a CPU which contains the given flag.
- */
+*/
 static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
 {
 	struct sched_domain *sd, *hsd = NULL;

 	for_each_domain(cpu, sd) {
-		if (!(sd->flags & flag))
+		if (sd->flags & flag) {
+			hsd = sd;
+			continue;
+		}
+
+		if (flag & SD_SHARED_CHILD_MASK)
 			break;
-		hsd = sd;
 	}

 	return hsd;
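
For anyone who wants to poke at the semantics outside the kernel, here is
a rough user-space sketch of the same idea. Everything prefixed TOY_ or
toy_ is hypothetical: the flag values are arbitrary, and a list macro
stands in for including <linux/sched/sd_flags.h>, so treat it as an
illustration of the technique rather than the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metaflags; values are arbitrary. */
#define TOY_SDF_SHARED_CHILD 0x1u
#define TOY_SDF_NEEDS_GROUPS 0x4u

/* Hypothetical domain flags; values are arbitrary. */
enum {
	TOY_SD_SHARE_CPUCAPACITY = 0x1,
	TOY_SD_ASYM_PACKING      = 0x2,
};

/* Stand-in for sd_flags.h: one X(name, metaflags) entry per flag. */
#define TOY_SD_FLAG_LIST(X)						\
	X(TOY_SD_SHARE_CPUCAPACITY,					\
	  TOY_SDF_SHARED_CHILD | TOY_SDF_NEEDS_GROUPS)			\
	X(TOY_SD_ASYM_PACKING, TOY_SDF_NEEDS_GROUPS)

/* Each entry expands to "flag |" or "0 |"; the trailing 0 ends the chain. */
#define TOY_MASK_TERM(name, mflags) \
	((name) * !!((mflags) & TOY_SDF_SHARED_CHILD)) |
static const unsigned int toy_shared_child_mask =
	TOY_SD_FLAG_LIST(TOY_MASK_TERM) 0;

struct toy_sd {
	unsigned int flags;
	struct toy_sd *parent;	/* next level up, NULL at the top */
};

/* Stop early only for flags that carry the SHARED_CHILD metaflag. */
struct toy_sd *toy_highest_flag_domain(struct toy_sd *base, unsigned int flag)
{
	struct toy_sd *sd, *hsd = NULL;

	for (sd = base; sd; sd = sd->parent) {
		if (sd->flags & flag) {
			hsd = sd;
			continue;
		}

		if (flag & toy_shared_child_mask)
			break;
	}

	return hsd;
}

void toy_new_search_self_check(void)
{
	struct toy_sd top = { 0, NULL };
	struct toy_sd mc  = { TOY_SD_ASYM_PACKING, &top };
	struct toy_sd smt = { TOY_SD_SHARE_CPUCAPACITY, &mc };

	/* Only the SHARED_CHILD flag ends up in the mask. */
	assert(toy_shared_child_mask == TOY_SD_SHARE_CPUCAPACITY);

	/* No SHARED_CHILD: the walk skips SMT and finds MC. */
	assert(toy_highest_flag_domain(&smt, TOY_SD_ASYM_PACKING) == &mc);

	/* SHARED_CHILD: the walk stops at the first level without the flag. */
	assert(toy_highest_flag_domain(&smt, TOY_SD_SHARE_CPUCAPACITY) == &smt);
	assert(toy_highest_flag_domain(&mc, TOY_SD_SHARE_CPUCAPACITY) == NULL);
}
```

In this toy chain the ASYM_PACKING lookup that starts at SMT returns the
MC level, while the lookup for the SHARED_CHILD flag behaves as before.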

>
> Thanks,
> Ionela.
>
> >
> > /*
> > * Prefer to place tasks in a sibling domain
> > --
> > 2.25.1
> >
> >