RE: [PATCH v2 1/4] sched/rt: Optimize cpupri_vec layout to mitigate cache line contention

From: Deng, Pan

Date: Thu Apr 02 2026 - 06:38:15 EST


> + atomic_t counts[CPUPRI_COUNT_ARRAY_SIZE];
> +
> + /*
> + * Padding to separate count and mask vectors.
> + *
> + * Prevents false sharing between:
> + * - counts[] (read-write, hot path in cpupri_set)
> + * - masks[] (read-mostly, accessed in cpupri_find)
> + */
> + char padding[CPUPRI_VEC_PADDING];
> +
> + /*
> + * CPU mask vector.
> + *
> + * Either stores:
> + * - Pointers to dynamically allocated cpumasks (read-mostly after init)
> + * - Inline cpumasks (if !CPUMASK_OFFSTACK)
> + */
> + cpumask_var_t masks[CPUPRI_NR_PRIORITIES];
> };
>
> struct cpupri {
> - struct cpupri_vec pri_to_cpu[CPUPRI_NR_PRIORITIES];
> + /*
> + * Priority-to-CPU mapping.
> + *
> + * Single cpupri_vec structure containing all counts and masks,
> + * rather than 101 separate cpupri_vec elements. This reduces
> + * memory overhead from ~26 to ~21 cachelines.
> + */
> + struct cpupri_vec pri_to_cpu;
> int *cpu_to_pri;
> };
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 475bb5998295..2263237cdeb0 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1014,7 +1014,7 @@ struct root_domain {
> * one runnable RT task.
> */
> cpumask_var_t rto_mask;
> - struct cpupri cpupri;
> + struct cpupri cpupri ____cacheline_aligned;
>
> /*
> * NULL-terminated list of performance domains intersecting with the

Peter and Steven,

We are weighing two approaches here:

1. Cache-line alignment (____cacheline_aligned on struct cpupri): simple to
implement, but increases memory usage.
2. Separating the counts and masks vectors, with padding after counts[0]:
smaller memory footprint, at the cost of slightly higher complexity.

What is your opinion? Thanks a lot!