Re: [PATCH] sched/fair: Optimize select_idle_cpu
From: Peter Zijlstra
Date: Fri Dec 13 2019 - 07:09:45 EST
On Fri, Dec 13, 2019 at 11:28:00AM +0000, Valentin Schneider wrote:
> On 13/12/2019 09:57, chengjian (D) wrote:
> >
> > In select_idle_smt() we have:
> >
> > /*
> >  * Scan the local SMT mask for idle CPUs.
> >  */
> > static int select_idle_smt(struct task_struct *p, int target)
> > {
> > 	int cpu, si_cpu = -1;
> >
> > 	if (!static_branch_likely(&sched_smt_present))
> > 		return -1;
> >
> > 	for_each_cpu(cpu, cpu_smt_mask(target)) {
> > 		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
> > 			continue;
> > 		if (available_idle_cpu(cpu))
> > 			return cpu;
> > 		if (si_cpu == -1 && sched_idle_cpu(cpu))
> > 			si_cpu = cpu;
> > 	}
> >
> > 	return si_cpu;
> > }
> >
> > Why don't we do the same thing in this function? Although
> > cpu_smt_mask() often contains only a few CPUs, it would be better to
> > apply the 'p->cpus_ptr' filter first.
> >
>
> Like you said, the gains here would probably be small - the highest SMT
> count I'm aware of is SMT8 (POWER9). Still, if we end up with both
> select_idle_core() and select_idle_cpu() using that pattern, it would make
> sense IMO to align select_idle_smt() with those.
The added cpumask_and() operation would also have a cost, and I really
don't see that paying off here.
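
To make the trade-off concrete, adopting the select_idle_core() pattern
here would look roughly like this (a sketch, not a proposal; it reuses
the select_idle_mask per-cpu scratch mask that select_idle_core()
already uses):

	static int select_idle_smt(struct task_struct *p, int target)
	{
		struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
		int cpu, si_cpu = -1;

		if (!static_branch_likely(&sched_smt_present))
			return -1;

		/* The up-front mask operation whose cost is in question. */
		cpumask_and(cpus, cpu_smt_mask(target), p->cpus_ptr);

		for_each_cpu(cpu, cpus) {
			if (available_idle_cpu(cpu))
				return cpu;
			if (si_cpu == -1 && sched_idle_cpu(cpu))
				si_cpu = cpu;
		}

		return si_cpu;
	}

An SMT mask holds at most a handful of CPUs, so the per-iteration
cpumask_test_cpu() calls this saves are cheaper than one cpumask_and()
over the full-width masks.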
The other sites have the problem that we combine an iteration limit with
affinity constraints. This loop doesn't do that and therefore doesn't
suffer the problem.
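
For comparison, the select_idle_cpu() scan loop (from memory, so details
may differ from the current tree) is shaped like:

	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
		if (!--nr)			/* iteration budget */
			return si_cpu;
		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
			continue;
		if (available_idle_cpu(cpu))
			break;
		if (si_cpu == -1 && sched_idle_cpu(cpu))
			si_cpu = cpu;
	}

There the 'nr' budget is burned even on CPUs the task isn't allowed to
run on, so filtering with cpumask_and() up front can genuinely buy
something; select_idle_smt() has no such budget.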