Re: [PATCH v5 2/5] sched/fair: Limited scan for idle cores when overloaded

From: Tim Chen
Date: Wed Sep 14 2022 - 18:25:33 EST


On Fri, 2022-09-09 at 13:53 +0800, Abel Wu wrote:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5af9bf246274..7abe188a1533 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6437,26 +6437,42 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> time = cpu_clock(this);
> }
>
> - if (sched_feat(SIS_UTIL) && !has_idle_core) {
> + if (sched_feat(SIS_UTIL)) {
> sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
> if (sd_share) {
> /* because !--nr is the condition to stop scan */
> nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
> - /* overloaded LLC is unlikely to have idle cpu/core */
> - if (nr == 1)
> +
> + /*
> + * An overloaded LLC is unlikely to have idle cpus.
> + * But if the has_idle_core hint is true, a limited
> + * speculative scan might help without incurring
> + * much overhead.
> + */
> + if (has_idle_core)
> + nr = nr > 1 ? INT_MAX : 3;

The choice of nr is a very abrupt function of utilization when has_idle_core==true:
it is either feast or famine. Why is such a choice better than a smoother
reduction of nr vs utilization? I agree that we want to scan more aggressively than
in the !has_idle_core case, but it is not obvious why the above works better than
something like nr = nr*2+1.

Tim

> + else if (nr == 1)
> return -1;
> }
> }
>
> for_each_cpu_wrap(cpu, cpus, target + 1) {
> + /*
> + * This might get the has_idle_cores hint cleared for a
> + * partial scan for idle cores, but the hint is probably
> + * wrong anyway. More importantly, not clearing the hint
> + * may result in excessive partial scans for idle cores,
> + * introducing non-negligible overhead.
> + */
> + if (!--nr)
> + break;
> +
> if (has_idle_core) {
> i = select_idle_core(p, cpu, cpus, &idle_cpu);
> if ((unsigned int)i < nr_cpumask_bits)
> return i;
>
> } else {
> - if (!--nr)
> - return -1;
> idle_cpu = __select_idle_cpu(cpu, p);
> if ((unsigned int)idle_cpu < nr_cpumask_bits)
> break;