Re: [PATCH v3 15/21] sched/cache: Disable cache aware scheduling for processes with high thread counts

From: Chen, Yu C

Date: Thu Feb 19 2026 - 09:40:11 EST

Hi Vineeth,

On 2/19/2026 10:28 AM, Madadi Vineeth Reddy wrote:

On 19/02/26 03:14, Tim Chen wrote:

On Wed, 2026-02-18 at 23:24 +0530, Madadi Vineeth Reddy wrote:

On 11/02/26 03:48, Tim Chen wrote:

From: Chen Yu <yu.c.chen@xxxxxxxxx>

[ .. snip ..]

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d1145997b88d..86b6b08e7e1e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1223,6 +1223,19 @@ static inline bool valid_llc_buf(struct sched_domain *sd,
 	return valid_llc_id(id);
 }
+static bool exceed_llc_nr(struct mm_struct *mm, int cpu)
+{
+	int smt_nr = 1;
+
+#ifdef CONFIG_SCHED_SMT
+	if (sched_smt_active())
+		smt_nr = cpumask_weight(cpu_smt_mask(cpu));
+#endif
+
+	return !fits_capacity((mm->sc_stat.nr_running_avg * smt_nr),
+			per_cpu(sd_llc_size, cpu));

On Power10/Power11 with SMT4 and LLC size of 4, this check
effectively disables cache-aware scheduling for any process.

There are 4 cores per LLC, with 4 SMT threads per core? In that case, once we
have more than 4 running threads and there's another idle LLC available, it
seems like putting the additional thread on a different LLC is the
right thing to do, as threads sharing a core will usually be much
slower.

But when the number of threads is under 4, we should still be
doing aggregation.

Perhaps I am misunderstanding your topology.

There is only one core per LLC, and the LLC size is 4 CPUs.
So mm->sc_stat.nr_running_avg can't be >= 1 if
cache aware scheduling is to stay enabled.

There is a scale factor in the final step that can be tuned from
user space:

	exceeded = !fits_capacity((mm->sc_stat.nr_running_avg * smt_nr),
				  (scale * per_cpu(sd_llc_size, cpu)));

So if the user increases llc_aggr_tolerance via debugfs,
cache aware aggregation remains enabled. Or are you suggesting
that the nr_running check and the RSS check be tuned via separate
debugfs knobs?
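
To make the arithmetic concrete, here is a minimal userspace sketch of the
check for the SMT4, LLC-size-4 case. It is only an illustration, not the
patch code: fits_capacity() mirrors the kernel macro's 1280/1024 margin
(roughly 80% of max), the per-mm and per-CPU values are passed in as plain
parameters, and "scale" is a stand-in for whatever value is derived from
llc_aggr_tolerance (the derivation is not shown in this thread, so treating
it as a simple integer multiplier is an assumption).

```c
#include <assert.h>
#include <stdbool.h>

/* Same margin as the kernel's fits_capacity(): cap must stay below ~80% of max. */
static bool fits_capacity(unsigned long cap, unsigned long max)
{
	return cap * 1280 < max * 1024;
}

/*
 * Userspace model of the quoted check. All inputs are plain parameters;
 * "scale" is a hypothetical integer multiplier standing in for the value
 * derived from llc_aggr_tolerance.
 */
static bool exceed_llc_nr(unsigned long nr_running_avg, unsigned long smt_nr,
			  unsigned long llc_size, unsigned long scale)
{
	assert(smt_nr > 0 && llc_size > 0);

	return !fits_capacity(nr_running_avg * smt_nr, scale * llc_size);
}
```

With smt_nr = 4 and llc_size = 4, exceed_llc_nr(1, 4, 4, 1) is true because
4 * 1280 is not below 4 * 1024, so a single running thread already trips the
check. With scale = 2, exceed_llc_nr(1, 4, 4, 2) is false, and the check only
trips again at nr_running_avg = 2.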

thanks,
Chenyu