Re: [PATCH v2 02/23] sched/cache: Record per-LLC utilization to guide cache-aware scheduling decisions

From: Chen, Yu C
Date: Wed Dec 10 2025 - 18:58:37 EST


On 12/11/2025 12:13 AM, Peter Zijlstra wrote:
On Wed, Dec 10, 2025 at 11:02:39PM +0900, Chen, Yu C wrote:
On 12/9/2025 8:21 PM, Peter Zijlstra wrote:
On Wed, Dec 03, 2025 at 03:07:21PM -0800, Tim Chen wrote:

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index bbcfdf12aa6e..0ba4697d74ba 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -68,6 +68,10 @@ struct sched_domain_shared {
atomic_t nr_busy_cpus;
int has_idle_cores;
int nr_idle_scan;
+#ifdef CONFIG_SCHED_CACHE
+ unsigned long util_avg;
+ unsigned long capacity ____cacheline_aligned_in_smp;

This cacheline annotation confuses me, see below.

+#endif
};
struct sched_domain {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cb82f558dc5b..b9f336300f14 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9622,6 +9622,29 @@ static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_
return 0;
}
+#ifdef CONFIG_SCHED_CACHE
+/* Called from load balancing paths with rcu_read_lock held */
+static __maybe_unused bool get_llc_stats(int cpu, unsigned long *util,
+ unsigned long *cap)
+{
+ struct sched_domain_shared *sd_share;
+
+ sd_share = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+ if (!sd_share)
+ return false;
+
+ *util = READ_ONCE(sd_share->util_avg);
+ *cap = READ_ONCE(sd_share->capacity);

You placed capacity on a separate line, forcing the above to be 2
distinct lines. That seems... sub-optimal?


The reason capacity was placed in a separate cache line
is that writes to capacity are not very frequent(cpu hotplug
should not happen too-frequently), while writes to util_avg
tend to be relatively frequent(changes frequently).
If capacity and util_avg were placed in the same cache line,
I’m thinking writes to util_avg might invalidate the entire
cache line. This could potentially cause cache misses when
capacity is read elsewhere, which might lead to false sharing?

But its introduced here and already read/written together. Is this not
premature optimization?

I see, since they are read together, there could be a pre-load of
adjacent cachelines I suppose, I'll remove this cache alignment and
check if there is any performance impact.

Thanks,
Chenyu