Re: [PATCH v2 04/23] sched/cache: Make LLC id continuous

From: Chen, Yu C
Date: Wed Dec 24 2025 - 04:56:45 EST


On 12/24/2025 4:19 PM, K Prateek Nayak wrote:
Hello Chenyu,

On 12/24/2025 12:38 PM, Chen, Yu C wrote:
Hello Prateek,

On 12/23/2025 1:31 PM, K Prateek Nayak wrote:
Hello Tim, Chenyu,

On 12/4/2025 4:37 AM, Tim Chen wrote:

[snip]

I'm OK with replacing the domain based cpumask by the topology_level
mask, just wondering whether re-using the llc_id would increase
the risk of a race condition - a CPU could have different llc_ids
before and after an offline/online cycle. Can we assign/reserve a
"static" llc_id for each CPU, whether it is online or offline? In
this way, we would not need to worry about data synchronization
when using llc_id(). For example, I can think of adjusting the data
in the percpu nr_pref_llc[max_llcs] on every CPU whenever a CPU
goes offline or comes online.

So I was thinking of expanding the rq->nr_pref_llc[] array if
max_llc increases but leaving it as is if the number of LLCs
decreases. That way we don't have to worry about dereferencing
past the array boundary.
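A minimal userspace model of that grow-only behavior (all names here
are hypothetical; in the kernel the array would live on struct rq and
be sized by the real max_llcs):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace model of the grow-only idea: the per-rq nr_pref_llc[]
 * array is only ever expanded when the number of LLCs grows, never
 * shrunk, so a stale llc_id can never index past the end.
 * All names are hypothetical, not the actual kernel code.
 */
struct llc_stats {
	int nr_llcs;      /* current allocated length */
	int *nr_pref_llc; /* per-LLC preference counters */
};

/* Grow the array to at least new_nr entries; never shrink. */
static int llc_stats_resize(struct llc_stats *s, int new_nr)
{
	int *tmp;

	if (new_nr <= s->nr_llcs)
		return 0; /* shrinking request: keep the old size */

	tmp = realloc(s->nr_pref_llc, new_nr * sizeof(*tmp));
	if (!tmp)
		return -1;

	/* Zero only the newly added tail; old counters survive. */
	memset(tmp + s->nr_llcs, 0,
	       (new_nr - s->nr_llcs) * sizeof(*tmp));
	s->nr_pref_llc = tmp;
	s->nr_llcs = new_nr;
	return 0;
}
```

The kernel version would additionally need to serialize the resize
against readers, which is where the RCU wrapper below comes in.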


Sure, we can do it this way.

We can also have a wrapper like:

struct nr_llc_stats {
	int		nr_llcs;
	struct rcu_head	rcu;
	int		*nr_pref_llc;
};

And re-allocate and attach it in rq_attach_root() during sd
rebuild. That way, the RCU read side can always grab a reference
to it, enqueue/dequeue don't need to care since it cannot change
while rq_lock is held, and the partition code can use call_rcu()
to free the old ones up.
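Roughly, the attach/free flow could look like the following
(uncompiled sketch against kernel APIs; the field names, the
flexible-array layout, and the rq->llc_stats hook point are all
assumptions for illustration, not the actual patch):

```c
/*
 * Uncompiled sketch of the wrapper above; nr_pref_llc[] is shown
 * as a flexible array so the whole thing is one allocation. Names
 * and hook points are assumptions, not the actual patch.
 */
struct nr_llc_stats {
	int		nr_llcs;
	struct rcu_head	rcu;
	int		nr_pref_llc[];	/* sized by nr_llcs */
};

static void nr_llc_stats_free_rcu(struct rcu_head *head)
{
	kfree(container_of(head, struct nr_llc_stats, rcu));
}

/* Called during sd rebuild, e.g. from rq_attach_root(). */
static void rq_attach_llc_stats(struct rq *rq, struct nr_llc_stats *new)
{
	struct nr_llc_stats *old;

	/* "new" was allocated by the caller outside the rq lock. */
	old = rcu_dereference_protected(rq->llc_stats,
					lockdep_is_held(__rq_lockp(rq)));
	rcu_assign_pointer(rq->llc_stats, new);
	if (old)
		call_rcu(&old->rcu, nr_llc_stats_free_rcu);
}

/* RCU read side, e.g. in load balancing: */
rcu_read_lock();
stats = rcu_dereference(rq->llc_stats);
if (stats && llc_id < stats->nr_llcs)
	nr = stats->nr_pref_llc[llc_id];
rcu_read_unlock();
```

Enqueue/dequeue under rq_lock can use the pointer directly since
the swap above also happens under rq_lock, and readers bounds-check
against the snapshot's own nr_llcs, so a concurrent repartition
can never cause an out-of-bounds access.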


OK, I'll proceed in this direction (Peter also suggested something
like this for the domain).


          cpuset_update_active_cpus();
      } else {

[snip]

AFAICT, "sd_llc_id" isn't compared across different partitions, so
having CPUs that are actually associated with the same physical LLC
but are in different partitions share the same "sd_llc_id" shouldn't
be a problem.

Thoughts?


This means cpus_share_resources(int this_cpu, int that_cpu)

Actually I was about to say cpus_share_cache().

 should only be invoked when this_cpu and that_cpu belong to the same
partition. That way we do not alter the semantics of
cpus_share_resources(). We can audit the places where
cpus_share_resources() is used.

The only case I can think of is a task waking up after
partitioning whose wake CPU from a different partition is
mistaken to share the LLC with the current CPU - but the task
cannot actually run on that old CPU and will have to take the
select_fallback_rq() path if prev_cpu was selected during
wake_affine().


OK, makes sense.
Actually, prev_cpu might not be chosen by select_task_rq_fair()->
select_idle_sibling(), because the select_idle_sibling() fast path
is only expected to be triggered when prev_cpu and the current CPU
are in the same domain in select_task_rq_fair():

	cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))
	sd = NULL; //wake affine

If the current CPU and prev_cpu are in different partitions, they
are not in the same domain.

I don't think it will be a common enough occurrence to cause an
issue, and even without that, wake_affine() could still pick
prev_cpu if the current CPU is busy, or via wake_affine_weight().


I realized that sched_cache has added cpus_share_cache() calls in
several places, most of which are related to load balancing; that
should not be a problem if an llc_id is shared among partitions.
I'll double-check.

thanks,
Chenyu