Re: [PATCH v3 04/21] sched/cache: Make LLC id continuous

From: Tim Chen

Date: Tue Feb 17 2026 - 18:12:56 EST


On Tue, 2026-02-17 at 13:39 +0530, K Prateek Nayak wrote:
> Hello Chenyu,
>
>

[...snip...]


> > > >    */
> > > >   DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
> > > >   DEFINE_PER_CPU(int, sd_llc_size);
> > > > -DEFINE_PER_CPU(int, sd_llc_id);
> > > > +DEFINE_PER_CPU(int, sd_llc_id) = -1;
> > > >   DEFINE_PER_CPU(int, sd_share_id);
> > > >   DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> > > >   DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> > > > @@ -684,7 +685,6 @@ static void update_top_cache_domain(int cpu)
> > > >         rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
> > > >       per_cpu(sd_llc_size, cpu) = size;
> > > > -    per_cpu(sd_llc_id, cpu) = id;
> > > >       rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
> > > >         sd = lowest_flag_domain(cpu, SD_CLUSTER);
> > > > @@ -2567,10 +2567,18 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> > > >         /* Set up domains for CPUs specified by the cpu_map: */
> > > >       for_each_cpu(i, cpu_map) {
> > > > -        struct sched_domain_topology_level *tl;
> > > > +        struct sched_domain_topology_level *tl, *tl_llc = NULL;
> > > > +        int lid;
> > > >             sd = NULL;
> > > >           for_each_sd_topology(tl) {
> > > > +            int flags = 0;
> > > > +
> > > > +            if (tl->sd_flags)
> > > > +                flags = (*tl->sd_flags)();
> > > > +
> > > > +            if (flags & SD_SHARE_LLC)
> > > > +                tl_llc = tl;
> > >
> > > nit. This loop breaks out when sched_domain_span(sd) covers the entire
> > > cpu_map and it might have not reached the topmost SD_SHARE_LLC domain
> > > yet. Is that cause for any concern?
> > >
> >
> > Could you please elaborate a little more on this? If it covers the
> > entire cpu_map shouldn't it stop going up to its parent domain?
> > Do you mean, sd_llc_1 and its parent sd_llc_2 could cover the same cpu_map,
> > and we should let tl_llc be assigned to sd_llc_2 (with sd_llc_1 being degenerated)?
>
> I'm not sure if this is technically possible but assume following
> topology:
>
> [ LLC: 8-15 ]
> [ SMT: 8,9 ][ SMT: 10,11 ] ... [ SMT: 14,15 ]
>
> and the following series of events:
>
> o All CPUs in the LLC are offline to begin with (a maxcpus=1-like scenario).
>
> o CPUs 10-15 are onlined first.
>
> o CPU8 is put in a separate root partition and brought online.
> (XXX: I'm not 100% sure if this is possible in this order)
>
> o build_sched_domains() will bail out at SMT domain since the cpumap
> is covered by tl->mask() and tl_llc = tl_smt.
>
> o llc_id calculation uses the tl_smt->mask() which will not contain
> CPUs 10-15 and CPU8 will get a unique LLC id even though there are
> other online CPUs in the LLC with a different llc_id (!!!)
>
>
> Instead, if we traversed to tl_mc, we would have seen all the online
> CPUs in the MC and reused the llc_id from them. Might not be an issue on
> its own but if this root partition is removed later, CPU8 will continue
> to have the unique llc_id even after merging into the same MC domain.

There is really no reason to reuse the llc_id as far as cache-aware scheduling
goes in its v3 revision (see my reply to Madadi on this patch).

I am thinking that if we simply rebuild the LLC ids from scratch across sched
domain rebuilds, that is probably the cleanest solution. There could be some
races in cpus_share_cache() as llc_id gets reassigned for some CPUs when they
come online/offline, but we already have similar races in the current mainline
code. The worst it can do is some temporarily sub-optimal task placement.

Thoughts?

Tim

>
> [..snip..]
>
> > >
> > > It doesn't compact tl_max_llcs, but it should promote reuse of llc_id if
> > > all CPUs of an LLC go offline. I know it is a ridiculous scenario but it
> > > is possible nonetheless.
> > >
> > > I'll let Peter and Valentin be the judge of additional space and
> > > complexity needed for these bits :-)
> > >
> >
> > Smart approach! Dynamically reallocating the llc_id should be feasible,
> > as it releases the llc_id when the last CPU of that LLC is offlined. My
> > only concern is data synchronization issues arising from the reuse of
> > llc_id during load balancing - I’ll audit the logic to check for any race
> > conditions. Alternatively, what if we introduce a tl->static_mask? It would
> > be similar to tl->mask, but would not remove CPUs from static_mask when they
> > are offlined. This way, we can always find and reuse the llc_id of CPUs in
> > that LLC (even if all CPUs in the LLC have been offlined at some point,
> > provided they were once online), and we would thus maintain a static llc_id.
>
> That is possible but it would require a larger arch/ wide audit to add
> support for. Might be less complex to handle in the generic layer but
> again I'll let Peter and Valentin comment on this part :-)
>
> >
> > Anyway, let me do some testing on your proposal as well as the static_mask
> > approach, and I'll reply to this thread later. Thanks for the insights!
>
> Thanks a ton! Much appreciated.