Re: [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group

From: Ricardo Neri
Date: Wed Oct 05 2022 - 19:32:20 EST


On Tue, Sep 27, 2022 at 01:01:32PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:51PM -0700, Ricardo Neri wrote:
> > update_sd_pick_busiest() keeps on selecting as the busiest group scheduling
> > groups of identical priority. Since both groups have the same priority,
> > either group is a good choice. The classes of tasks in the scheduling
> > groups can break this tie.
> >
> > Pick as busiest the scheduling group that yields a higher task-class
> > performance score after load balancing.
>
> > +/**
> > + * sched_asym_class_pick - Select a sched group based on classes of tasks
> > + * @a: A scheduling group
> > + * @b: A second scheduling group
> > + * @a_stats: Load balancing statistics of @a
> > + * @b_stats: Load balancing statistics of @b
> > + *
> > + * Returns: true if @a has the same priority and @a has classes of tasks that
> > + * yield higher overall throughput after load balance. Returns false otherwise.
> > + */
> > +static bool sched_asym_class_pick(struct sched_group *a,
> > + struct sched_group *b,
> > + struct sg_lb_stats *a_stats,
> > + struct sg_lb_stats *b_stats)
> > +{
> > + /*
> > + * Only use the class-specific preference selection if both sched
> > + * groups have the same priority.
> > + */
> > + if (arch_asym_cpu_priority(a->asym_prefer_cpu) !=
> > + arch_asym_cpu_priority(b->asym_prefer_cpu))
> > + return false;
> > +
> > + return sched_asym_class_prefer(a_stats, b_stats);
> > +}
> > +
> > #else /* CONFIG_SCHED_TASK_CLASSES */
> > static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
> > struct rq *rq)
>
> > @@ -9049,6 +9111,12 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > /* Prefer to move from lowest priority CPU's work */
> > if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu))
> > return false;
> > +
> > + /* @sg and @sds::busiest have the same priority. */
> > + if (sched_asym_class_pick(sds->busiest, sg, &sds->busiest_stat, sgs))
> > + return false;
> > +
> > + /* @sg has lower priority than @sds::busiest. */
> > break;
> >
> > case group_misfit_task:
>
> So why does only this one instance of asym_prefer() require tie
> breaking?

This is the only place in which two sched groups with running tasks and of
equal priority are compared.

In all other places sched_asym_prefer() is used to compare the destination
CPU with others. Since asym_packing is done only when the destination CPU is
idle, there is no need to break this tie.

>
> I must also re-iterate how much I hate having two different means of
> dealing with big-little topologies.

Yes, that is true. We discussed the challenges of using EAS on x86 during
this year's Linux Plumbers Conference. For the discussion at hand, I think
that the most relevant part is the difficulty to assess CPU capacity
on Intel processors given the presence of SMT, a large range of turbo
frequencies and hardware-controlled frequency scaling.

We are looking into these challenges.

This patchset focuses mainly on the infrastructure to support IPC classes of
tasks in the scheduler. We use it on this tie break when using asym_packing,
but it could be used in capacity computations.

>
> And while looking through this, I must ask about the comment that goes
> with sched_set_itmt_core_prio() vs the sg->asym_prefer_cpu assignment in
> init_sched_groups_capacity(), what-up ?!

Are you referring to this comment?

"No need to rebuild sched domain after updating
the CPU priorities. The sched domains have no
dependency on CPU priorities"

If yes, then it looks wrong to me. Sched domains are rebuilt after updating
priorities.

Thanks and BR,
Ricardo