Re: [PATCH v5 2/5] sched/fair: Limited scan for idle cores when overloaded

From: Abel Wu
Date: Wed Sep 14 2022 - 23:09:00 EST


Hi Tim, thanks for reviewing!

On 9/15/22 6:25 AM, Tim Chen wrote:
On Fri, 2022-09-09 at 13:53 +0800, Abel Wu wrote:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5af9bf246274..7abe188a1533 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6437,26 +6437,42 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 		time = cpu_clock(this);
 	}
-	if (sched_feat(SIS_UTIL) && !has_idle_core) {
+	if (sched_feat(SIS_UTIL)) {
 		sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
 		if (sd_share) {
 			/* because !--nr is the condition to stop scan */
 			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
-			/* overloaded LLC is unlikely to have idle cpu/core */
-			if (nr == 1)
+
+			/*
+			 * Overloaded LLC is unlikely to have idle cpus.
+			 * But if has_idle_core hint is true, a limited
+			 * speculative scan might help without incurring
+			 * much overhead.
+			 */
+			if (has_idle_core)
+				nr = nr > 1 ? INT_MAX : 3;

The choice of nr is a very abrupt function of utilization when has_idle_core == true:
it is either feast or famine. Why is such a choice better than a smoother
reduction of nr vs. utilization? I agree that we want to scan more aggressively than
in the !has_idle_core case, but it is not obvious why the above works better than
something like nr = nr*2+1.

This has been discussed with Mel, and he suggested doing the simple things
first before scaling the depth.

https://lore.kernel.org/all/20220906095717.maao4qtel4fhbmfq@xxxxxxxxxxxxxxxxxxx/
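
To make the comparison concrete, here is a minimal userspace sketch of the two policies for the has_idle_core case. The helper names scan_depth_patch() and scan_depth_scaled() are hypothetical and not part of the patch, and the overflow clamp in the scaled variant is my own assumption. As in the hunk above, nr stands for nr_idle_scan + 1, so nr == 1 means the LLC is considered overloaded.

```c
#include <limits.h>

/*
 * Depth used by this patch when has_idle_core is set:
 * overloaded LLC (nr == 1) gets a short speculative scan of 3,
 * anything else scans without a limit.
 */
int scan_depth_patch(int nr)
{
	return nr > 1 ? INT_MAX : 3;
}

/*
 * The smoother alternative from the review, nr * 2 + 1.
 * The clamp to INT_MAX is an assumption added here to avoid
 * signed overflow; it was not part of the suggestion.
 */
int scan_depth_scaled(int nr)
{
	if (nr > (INT_MAX - 1) / 2)
		return INT_MAX;
	return nr * 2 + 1;
}
```

Both give a depth of 3 when nr == 1. They only differ once nr > 1, where the patch removes the limit entirely and the scaled version grows the depth linearly with nr.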

Thanks and BR,
Abel