Re: [PATCH v4 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
From: Vincent Guittot
Date: Fri Aug 27 2021 - 11:17:37 EST
On Fri, 27 Aug 2021 at 16:50, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Aug 27, 2021 at 12:13:42PM +0200, Vincent Guittot wrote:
> > > +/**
> > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > + * @dst_cpu: Destination CPU of the load balancing
> > > + * @sds: Load-balancing data with statistics of the local group
> > > + * @sgs: Load-balancing statistics of the candidate busiest group
> > > + * @sg: The candidate busiet group
> > > + *
> > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > + * if @dst_cpu can pull tasks. If @dst_cpu does not have SMT siblings, it can
> > > + * pull tasks if two or more of the SMT siblings of @sg are busy. If only one
> > > + * CPU in @sg is busy, pull tasks only if @dst_cpu has higher priority.
> > > + *
> > > + * If both @dst_cpu and @sg have SMT siblings, even the number of idle CPUs
> > > + * between @sds::local and @sg. Thus, pull tasks from @sg if the difference
> > > + * between the number of busy CPUs is 2 or more. If the difference is of 1,
> > > + * only pull if @dst_cpu has higher priority. If @sg does not have SMT siblings
> > > + * only pull tasks if all of the SMT siblings of @dst_cpu are idle and @sg
> > > + * has lower priority.
> > > + */
> > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > + struct sg_lb_stats *sgs,
> > > + struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > + bool local_is_smt, sg_is_smt;
> > > + int sg_busy_cpus;
> > > +
> > > + local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > + sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > +
> > > + sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > +
> > > + if (!local_is_smt) {
> > > + /*
> > > + * If we are here, @dst_cpu is idle and does not have SMT
> > > + * siblings. Pull tasks if candidate group has two or more
> > > + * busy CPUs.
> > > + */
> > > + if (sg_is_smt && sg_busy_cpus >= 2)
> > > + return true;
> > > +
> > > + /*
> > > + * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > + * siblings and only one is busy. In such case, @dst_cpu
> > > + * can help if it has higher priority and is idle.
> > > + */
> > > + return !sds->local_stat.group_util &&
> >
> > sds->local_stat.group_util can't be used to decide if a CPU or group
> > of CPUs is idle. util_avg is usually not null when a CPU becomes idle
> > and you can have to wait more than 300ms before it becomes Null
> > At the opposite, the utilization of a CPU can be null but a task with
> > null utilization has just woken up on it.
> > Utilization is used to reflect the average work of the CPU or group of
> > CPUs but not the current state
>
> If you want immediate idle, sgs->nr_running == 0 or sgs->idle_cpus ==
> sgs->group_weight come to mind.
yes, I have the same in mind
>
> > > + sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > > + }
> > > +
> > > + /* @dst_cpu has SMT siblings. */
> > > +
> > > + if (sg_is_smt) {
> > > + int local_busy_cpus = sds->local->group_weight -
> > > + sds->local_stat.idle_cpus;
> > > + int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> > > +
> > > + /* Local can always help to even the number busy CPUs. */
> >
> > default behavior of the load balance already tries to even the number
a> > of idle CPUs.
>
> Right, but I suppose this is because we're trapped here and have to deal
> with the SMT-SMT case too. Ricardo, can you clarify?
IIUC, this function is used in sg_lb_stats to set
sgs->group_asym_packing which is then used to set the group state to
group_asym_packing and force asym migration.
But if we only want to even the number of busy CPUs between the group,
we should not need to set the group state to group_asym_packing
>
> > > + if (busy_cpus_delta >= 2)
> > > + return true;
> > > +
> > > + if (busy_cpus_delta == 1)
> > > + return sched_asym_prefer(dst_cpu,
> > > + sg->asym_prefer_cpu);
> > > +
> > > + return false;
> > > + }
> > > +
> > > + /*
> > > + * @sg does not have SMT siblings. Ensure that @sds::local does not end
> > > + * up with more than one busy SMT sibling and only pull tasks if there
> > > + * are not busy CPUs. As CPUs move in and out of idle state frequently,
> > > + * also check the group utilization to smoother the decision.
> > > + */
> > > + if (!sds->local_stat.group_util)
> >
> > same comment as above about the meaning of group_util == 0
> >
> > > + return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > > +
> > > + return false;
> > > +#else
> > > + /* Always return false so that callers deal with non-SMT cases. */
> > > + return false;
> > > +#endif
> > > +}
> > > +
> > > static inline bool
> > > sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs,
> > > struct sched_group *group)
> > > {
> > > + /* Only do SMT checks if either local or candidate have SMT siblings */
> > > + if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> > > + (group->flags & SD_SHARE_CPUCAPACITY))
> > > + return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> > > +
> > > return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
> > > }