Re: [PATCH v2 04/23] sched/cache: Make LLC id continuous

From: Chen, Yu C
Date: Tue Dec 16 2025 - 00:34:08 EST


On 12/16/2025 4:49 AM, Tim Chen wrote:
On Tue, 2025-12-09 at 12:58 +0100, Peter Zijlstra wrote:
On Wed, Dec 03, 2025 at 03:07:23PM -0800, Tim Chen wrote:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 710ed9943d27..0a3918269906 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1210,10 +1210,17 @@ __read_mostly unsigned int llc_imb_pct = 20;
static int llc_id(int cpu)
{
+ int llc;
+
if (cpu < 0)
return -1;
+ llc = per_cpu(sd_llc_id, cpu);
+ /* avoid race with cpu hotplug */
+ if (unlikely(llc >= max_llcs))
+ return -1;
+
+ return llc;
}
void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)

@@ -668,6 +670,55 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
DEFINE_STATIC_KEY_FALSE(sched_cluster_active);
+/*
+ * Assign continuous llc id for the CPU, and return
+ * the assigned llc id.
+ */
+static int update_llc_id(struct sched_domain *sd,
+ int cpu)
+{
+ int id = per_cpu(sd_llc_id, cpu), i;
+
+ if (id >= 0)
+ return id;
+
+ if (sd) {
+ /* Look for any assigned id and reuse it.*/
+ for_each_cpu(i, sched_domain_span(sd)) {
+ id = per_cpu(sd_llc_id, i);
+
+ if (id >= 0) {
+ per_cpu(sd_llc_id, cpu) = id;
+ return id;
+ }
+ }
+ }
+
+ /*
+ * When 1. there is no id assigned to this LLC domain,
+ * or 2. the sd is NULL, we reach here.
+ * Consider the following scenario,
+ * CPU0~CPU95 are in the node0, CPU96~CPU191 are
+ * in the node1. During bootup, maxcpus=96 is
+ * appended.
+ * case 1: When running cpu_attach_domain(CPU24)
+ * during boot up, CPU24 is the first CPU in its
+ * non-NULL LLC domain. However,
+ * its corresponding llc id has not been assigned yet.
+ *
+ * case 2: After boot up, the CPU100 is brought up
+ * via sysfs manually. As a result, CPU100 has only a
+ * Numa domain attached, because CPU100 is the only CPU
+ * of a sched domain, all its bottom domains are degenerated.
+ * The LLC domain pointer sd is NULL for CPU100.
+ *
+ * For both cases, we want to increase the number of LLCs.
+ */
+ per_cpu(sd_llc_id, cpu) = max_llcs++;
+
+ return per_cpu(sd_llc_id, cpu);
+}

I'm not sure I follow. So partition_sched_domains() first calls
detach_destroy_domains() on the old set, and then build_sched_domains()
on the new set.

Do detach_destroy_domain() will do:

cpu_attach_domain(NULL,..);

That is, it will explicitly attach the NULL sched_domain to a CPU. At
which point I feel update_llc_id() should be returning -1, no?

Then later, build_sched_domains() will set a !NULL sched_domain, at
which point update_llc_id() can set a real value.

This should then also get rid of that weird max_llcs check in llc_id(),
right?

The check for max_llcs was intended to prevent out-of-bounds access
to rq->nr_pref_llc[] at multiple points in the code.
Since dst_llc = llc_id(env->dst_cpu); — and while the LLC ID for the
CPU is updated in update_llc_id(), this update occurs before we reallocate
the nr_pref_llc buffer — dst_llc may end up exceeding the bounds of the
original nr_pref_llc buffer.

For this reason, we added a check if (dst_llc > max_llc) in llc_id()
when attempting to access rq->nr_pref_llc[dst_llc].

However, I agree that the max_llc check seems to not properly integrated
into the current patch: it should instead be placed in the 7th patch, as
this would better illustrate the rationale for the max_llc check here:
sched/cache: Introduce per runqueue task LLC preference counter

In the 7th patch, we actually increment new_max_llcs rather than
max_llcs — meaning max_llcs always represents the "old" number of LLCs.
As a result, there is a race window between extending the rq->nr_pref_llc
buffer and updating max_llcs.


@@ -714,7 +827,7 @@ static int update_llc_id(struct sched_domain *sd,
*
* For both cases, we want to increase the number of LLCs.
*/
- per_cpu(sd_llc_id, cpu) = max_llcs++;
+ per_cpu(sd_llc_id, cpu) = new_max_llcs++;

return per_cpu(sd_llc_id, cpu);
}


Thanks for pointing this out. Yes, we should take care of the
attachment of NULL sd. Will update the code accordingly.


My understanding is that, if the sd is NULL, it is either because invoked
by detach_destroy_domain() for the old set, or by case 2 mentioned in above comments:
Say, CPU0-CPU95 are online during bootup, the boot command line is maxcpus=96.
Later after bootup, the user wants to bring up CPU100, the LLC domain for
CPU100 is NULL in this case(due to sd generation), and a new LLC should be
detected.

That is to say, when we reach update_llc_id(), there could be 2 reasons
for NULL sd. For the detach_destroy_domain() case, update_llc_id()
should return a valid id without increasing the max_llcs, because of
if (id >= 0)
return id;
And for the latter, the max_llcs should be increased.
Let me double check on this.

thanks,
Chenyu


Tim