Re: [PATCH 2/3] sched/fair: Generalize misfit lb by adding a misfit reason

From: Qais Yousef
Date: Thu Aug 01 2024 - 08:21:33 EST


On 07/29/24 18:47, Xuewen Yan wrote:
> Hi Qais
>
> On Thu, Jul 25, 2024 at 5:35 AM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > Hi Xuewen
> >
> > On 07/17/24 16:26, Xuewen Yan wrote:
> > > Hi Qais
> > >
> > > On Sat, Dec 9, 2023 at 9:19 AM Qais Yousef <qyousef@xxxxxxxxxxx> wrote:
> >
> > > > @@ -11008,6 +11025,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > > > * average load.
> > > > */
> > > > if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> > > > + rq->misfit_reason == MISFIT_PERF &&
> > >
> > > In Android, I found this would cause a task to change CPUs in a loop.
> > > Maybe this should be removed, because for CPUs of the same capacity we
> > > should skip this cpu when nr_running=1.
> >
> > Could you explain a bit more? Are you saying this is changing the behavior for
> > some use case? The check will ensure this path is only triggered for misfit
> > upmigration, which AFAICT is the only reason why this path was added.
> >
> > The problem is that to implement another misfit reason, the check for
> > capacity_greater() is not true except for MISFIT_PERF. For MISFIT_POWER, we
> > want the CPU to be smaller.
>
> Sorry, it was my mistake.

Np, it's always good to hear back in case there's a problem :)

> After debugging, I found that there was a problem with my handling of
> MISFIT_PERF.
> But it is true that due to the influence of rt and irq load,
> capacity_greater() sometimes does cause some confusion.
> Sometimes we find that due to the different capacities between small
> cores, a misfit task will migrate several times between small cores,
> for example:
> If capacity_cpu3 > capacity_cpu2 > capacity_cpu1 > capacity_cpu0,
> the misfit task may migrate as follows: cpu0->cpu1->cpu2->cpu3.
> I don't know if this migration is really necessary, but it does cause
> me some confusion.

The extra migrations should be cheap in theory.

But have you verified that the load_balance type is misfit, and not regular load
balance trying to distribute load on the little cores? I think it is harmless if
it is caused by misfit, but yes, it looks unnecessary to me too.

I'd love to remove this 5% magic margin, but I have no idea how yet.
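
For reference, a rough standalone sketch of how the margin can produce the
cpu0->cpu1->cpu2->cpu3 chain described above. The capacity_greater() macro is
reproduced as I remember it from kernel/sched/fair.c; count_misfit_hops() and
the capacity values used with it are hypothetical and only there for
illustration. This is not how the load balancer actually picks a destination.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Margin check as I recall it from kernel/sched/fair.c:
 * true only if cap1 is noticeably (~5%) greater than cap2.
 */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/*
 * Hypothetical helper, not kernel code: walk the CPUs in index order and
 * count how many times a misfit task would hop, assuming every CPU whose
 * capacity beats the current one by the margin is accepted as a destination.
 */
static size_t count_misfit_hops(const unsigned long *cap, size_t nr_cpus)
{
	size_t cur = 0, hops = 0;

	for (size_t i = 1; i < nr_cpus; i++) {
		if (capacity_greater(cap[i], cap[cur])) {
			cur = i;	/* task moves to the slightly bigger CPU */
			hops++;
		}
	}

	return hops;
}
```

With made-up little-core capacities of 300, 330, 360 and 400 (e.g. skewed by
rt/irq pressure), each step clears the ~5% margin and the helper reports three
hops. With 400, 405, 410 and 415 no step clears it and the task stays put.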