Re: [PATCH v2 04/23] sched/cache: Make LLC id continuous

From: Tim Chen
Date: Tue Dec 16 2025 - 14:53:35 EST


On Tue, 2025-12-16 at 13:31 +0800, Chen, Yu C wrote:
> On 12/16/2025 4:49 AM, Tim Chen wrote:
> > On Tue, 2025-12-09 at 12:58 +0100, Peter Zijlstra wrote:
> > > On Wed, Dec 03, 2025 at 03:07:23PM -0800, Tim Chen wrote:
> > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 710ed9943d27..0a3918269906 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -1210,10 +1210,17 @@ __read_mostly unsigned int llc_imb_pct = 20;
> > > >
> > > > static int llc_id(int cpu)
> > > > {
> > > > + int llc;
> > > > +
> > > > if (cpu < 0)
> > > > return -1;
> > > >
> > > > + llc = per_cpu(sd_llc_id, cpu);
> > > > + /* avoid race with cpu hotplug */
> > > > + if (unlikely(llc >= max_llcs))
> > > > + return -1;
> > > > +
> > > > + return llc;
> > > > }
> > > >
> > > > void mm_init_sched(struct mm_struct *mm, struct mm_sched __percpu *_pcpu_sched)
> > >
> > > > @@ -668,6 +670,55 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
> > > > DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
> > > > DEFINE_STATIC_KEY_FALSE(sched_cluster_active);
> > > >
> > > > +/*
> > > > + * Assign continuous llc id for the CPU, and return
> > > > + * the assigned llc id.
> > > > + */
> > > > +static int update_llc_id(struct sched_domain *sd,
> > > > + int cpu)
> > > > +{
> > > > + int id = per_cpu(sd_llc_id, cpu), i;
> > > > +
> > > > + if (id >= 0)
> > > > + return id;
> > > > +
> > > > + if (sd) {
> > > > + /* Look for any assigned id and reuse it.*/
> > > > + for_each_cpu(i, sched_domain_span(sd)) {
> > > > + id = per_cpu(sd_llc_id, i);
> > > > +
> > > > + if (id >= 0) {
> > > > + per_cpu(sd_llc_id, cpu) = id;
> > > > + return id;
> > > > + }
> > > > + }
> > > > + }
> > > > +
> > > > + /*
> > > > + * When 1. there is no id assigned to this LLC domain,
> > > > + * or 2. the sd is NULL, we reach here.
> > > > + * Consider the following scenario,
> > > > + * CPU0~CPU95 are in the node0, CPU96~CPU191 are
> > > > + * in the node1. During bootup, maxcpus=96 is
> > > > + * appended.
> > > > + * case 1: When running cpu_attach_domain(CPU24)
> > > > + * during boot up, CPU24 is the first CPU in its
> > > > + * non-NULL LLC domain. However,
> > > > + * its corresponding llc id has not been assigned yet.
> > > > + *
> > > > + * case 2: After boot up, the CPU100 is brought up
> > > > + * via sysfs manually. As a result, CPU100 has only a
> > > > + * Numa domain attached, because CPU100 is the only CPU
> > > > + * of a sched domain, all its bottom domains are degenerated.
> > > > + * The LLC domain pointer sd is NULL for CPU100.
> > > > + *
> > > > + * For both cases, we want to increase the number of LLCs.
> > > > + */
> > > > + per_cpu(sd_llc_id, cpu) = max_llcs++;
> > > > +
> > > > + return per_cpu(sd_llc_id, cpu);
> > > > +}
> > >
> > > I'm not sure I follow. So partition_sched_domains() first calls
> > > detach_destroy_domains() on the old set, and then build_sched_domains()
> > > on the new set.
> > >
> > > Do detach_destroy_domain() will do:
> > >
> > > cpu_attach_domain(NULL,..);
> > >
> > > That is, it will explicitly attach the NULL sched_domain to a CPU. At
> > > which point I feel update_llc_id() should be returning -1, no?
> > >
> > > Then later, build_sched_domains() will set a !NULL sched_domain, at
> > > which point update_llc_id() can set a real value.
> > >
> > > This should then also get rid of that weird max_llcs check in llc_id(),
> > > right?
>
> The check for max_llcs was intended to prevent out-of-bounds access
> to rq->nr_pref_llc[] at multiple points in the code.
> Since dst_llc = llc_id(env->dst_cpu); — and while the LLC ID for the
> CPU is updated in update_llc_id(), this update occurs before we reallocate
> the nr_pref_llc buffer — dst_llc may end up exceeding the bounds of the
> original nr_pref_llc buffer.
>
> For this reason, we added a check if (dst_llc > max_llc) in llc_id()
> when attempting to access rq->nr_pref_llc[dst_llc].
>
> However, I agree that the max_llc check seems to not properly integrated
> into the current patch: it should instead be placed in the 7th patch, as
> this would better illustrate the rationale for the max_llc check here:
> sched/cache: Introduce per runqueue task LLC preference counter
>
> In the 7th patch, we actually increment new_max_llcs rather than
> max_llcs — meaning max_llcs always represents the "old" number of LLCs.
> As a result, there is a race window between extending the rq->nr_pref_llc
> buffer and updating max_llcs.
>
>
> @@ -714,7 +827,7 @@ static int update_llc_id(struct sched_domain *sd,
> *
> * For both cases, we want to increase the number of LLCs.
> */
> - per_cpu(sd_llc_id, cpu) = max_llcs++;
> + per_cpu(sd_llc_id, cpu) = new_max_llcs++;
>
> return per_cpu(sd_llc_id, cpu);
> }
>
>
> > Thanks for pointing this out. Yes, we should take care of the
> > attachment of NULL sd. Will update the code accordingly.
> >
>
> My understanding is that, if the sd is NULL, it is either because invoked
> by detach_destroy_domain() for the old set, or by case 2 mentioned in
> above comments:
> Say, CPU0-CPU95 are online during bootup, the boot command line is
> maxcpus=96.
> Later after bootup, the user wants to bring up CPU100, the LLC domain for
> CPU100 is NULL in this case(due to sd generation), and a new LLC should be
> detected.
>
> That is to say, when we reach update_llc_id(), there could be 2 reasons
> for NULL sd. For the detach_destroy_domain() case, update_llc_id()
> should return a valid id without increasing the max_llcs, because of
> if (id >= 0)
> return id;
> And for the latter, the max_llcs should be increased.
> Let me double check on this.

The issue is we could offline all CPUs in a LLC and online them later.
In the current code, we will assign their ids all to -1. So on attach
of CPUs again, we'll be assigning a new LLC. I think the proper thing
to do is not to assign llc id of the offlined cpu (the case where sd == NULL)
and keep the original llc id assigned. Then we should be okay and not
increase max_llcs.

Tim

>
> thanks,
> Chenyu
>
>
> > Tim