Re: [PATCH v3 03/21] sched/cache: Introduce helper functions to enforce LLC migration policy

From: Madadi Vineeth Reddy

Date: Sat Feb 14 2026 - 11:14:40 EST


On 11/02/26 03:48, Tim Chen wrote:
> From: Chen Yu <yu.c.chen@xxxxxxxxx>
>
> Cache-aware scheduling aggregates threads onto their preferred LLC,
> mainly through load balancing. When the preferred LLC becomes
> saturated, more threads are still placed there, increasing latency.
> A mechanism is needed to limit aggregation so that the preferred LLC
> does not become overloaded.
>
> Introduce helper functions can_migrate_llc() and
> can_migrate_llc_task() to enforce the LLC migration policy:
>
> 1. Aggregate a task to its preferred LLC if both source and
> destination LLCs are not too busy, or if doing so will not
> leave the preferred LLC much more imbalanced than the
> non-preferred one (>20% utilization difference, a little
> higher than imbalance_pct(17%) of the LLC domain as hysteresis).
> 2. Allow moving a task from overloaded preferred LLC to a non
> preferred LLC if this will not cause the non preferred LLC
> to become too imbalanced to cause a later migration back.
> 3. If both LLCs are too busy, let generic load balancing
> spread the tasks.
>
> Further (hysteresis) action could be taken in the future to prevent tasks
> from being migrated into and out of the preferred LLC frequently (back and
> forth): the threshold for migrating a task out of its preferred LLC should
> be higher than that for migrating it into the LLC.
>
> Co-developed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
> ---
>
> Notes:
> v2->v3:
> No change.
>
> kernel/sched/fair.c | 153 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 153 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index dfeb107f2cfd..bf5f39a01017 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9689,6 +9689,27 @@ static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_
> }
>
> #ifdef CONFIG_SCHED_CACHE
> +/*
> + * The margin used when comparing LLC utilization with CPU capacity.
> + * It determines the LLC load level where active LLC aggregation is
> + * done.
> + * Derived from fits_capacity().
> + *
> + * (default: ~50%)
> + */
> +#define fits_llc_capacity(util, max) \
> + ((util) * 2 < (max))
> +
> +/*
> + * The margin used when comparing utilization:
> + * is 'util1' noticeably greater than 'util2'?
> + * Derived from capacity_greater().
> + * Bias is in percentage.
> + */
> +/* Allows dst util to be bigger than src util by up to bias percent */
> +#define util_greater(util1, util2) \
> + ((util1) * 100 > (util2) * 120)
> +
> /* Called from load balancing paths with rcu_read_lock held */
> static __maybe_unused bool get_llc_stats(int cpu, unsigned long *util,
> unsigned long *cap)
> @@ -9704,6 +9725,138 @@ static __maybe_unused bool get_llc_stats(int cpu, unsigned long *util,
>
> return true;
> }
> +
> +/*
> + * Decision matrix based on LLC utilization, used to decide
> + * whether task aggregation across LLCs can be done.
> + *
> + * By default, 50% is the threshold for treating the LLC
> + * as busy. The reason for choosing 50% is to avoid saturation
> + * of SMT-2, and it is also a safe cutoff for other SMT-n
> + * platforms.
> + *
> + * 20% is the utilization imbalance percentage to decide
> + * if the preferred LLC is busier than the non-preferred LLC.
> + * 20 is a little higher than the LLC domain's imbalance_pct
> + * 17. The hysteresis is used to avoid task bouncing between the
> + * preferred LLC and the non-preferred LLC.
> + *
> + * 1. moving towards the preferred LLC, dst is the preferred
> + * LLC, src is not.
> + *
> + * src \ dst 30% 40% 50% 60%
> + * 30% Y Y Y N
> + * 40% Y Y Y Y
> + * 50% Y Y G G
> + * 60% Y Y G G
> + *

According to this matrix (which I assume shows utilization after migration),
G is expected for src=50% and dst=50%. However, the code performs the "both
busy" check before adjusting src_util and dst_util:

	if (!fits_llc_capacity(dst_util, dst_cap) &&
	    !fits_llc_capacity(src_util, src_cap))
		return mig_unrestricted;

	src_util = src_util - tsk_util;
	dst_util = dst_util + tsk_util;

For example, with a 10% task migrating from src_util=60% to dst_util=40%:

- The check evaluates !fits(40) && !fits(60) = false && true = false
- So it doesn't return mig_unrestricted
- After adjustment: src=50%, dst=50%
- Falls through to return mig_llc (Y)

But the matrix indicates 50%/50% should be G, not Y.

Moving this check after the utilization adjustment would make it consistent
with the documented matrix.

Thanks,
Vineeth

> + * 2. moving out of the preferred LLC, src is the preferred
> + * LLC, dst is not:
> + *
> + * src \ dst 30% 40% 50% 60%
> + * 30% N N N N
> + * 40% N N N N
> + * 50% N N G G
> + * 60% Y N G G
> + *
> + * src : src_util
> + * dst : dst_util
> + * Y : Yes, migrate
> + * N : No, do not migrate
> + * G : let the Generic load balance to even the load.
> + *
> + * The intention is that if both LLCs are quite busy, cache aware
> + * load balance should not be performed, and generic load balance
> + * should take effect. However, if one is busy and the other is not,
> + * the preferred LLC capacity(50%) and imbalance criteria(20%) should
> + * be considered to determine whether LLC aggregation should be
> + * performed to bias the load towards the preferred LLC.
> + */
> +
> +/* Migration decision; the three states are mutually exclusive. */
> +enum llc_mig {
> + mig_forbid = 0, /* N: Don't migrate task, respect LLC preference */
> + mig_llc, /* Y: Do LLC preference based migration */
> + mig_unrestricted /* G: Don't restrict generic load balance migration */
> +};
> +
> +/*
> + * Check if task can be moved from the source LLC to the
> + * destination LLC without breaking cache-aware preference.
> + * src_cpu and dst_cpu are arbitrary CPUs within the source
> + * and destination LLCs, respectively.
> + */
> +static enum llc_mig can_migrate_llc(int src_cpu, int dst_cpu,
> + unsigned long tsk_util,
> + bool to_pref)
> +{
> + unsigned long src_util, dst_util, src_cap, dst_cap;
> +
> + if (!get_llc_stats(src_cpu, &src_util, &src_cap) ||
> + !get_llc_stats(dst_cpu, &dst_util, &dst_cap))
> + return mig_unrestricted;
> +
> + if (!fits_llc_capacity(dst_util, dst_cap) &&
> + !fits_llc_capacity(src_util, src_cap))
> + return mig_unrestricted;
> +
> + src_util = src_util < tsk_util ? 0 : src_util - tsk_util;
> + dst_util = dst_util + tsk_util;
> + if (to_pref) {
> + /*
> + * Don't migrate if we will get preferred LLC too
> + * heavily loaded and if the dest is much busier
> + * than the src, in which case migration will
> + * increase the imbalance too much.
> + */
> + if (!fits_llc_capacity(dst_util, dst_cap) &&
> + util_greater(dst_util, src_util))
> + return mig_forbid;
> + } else {
> + /*
> + * Don't migrate if we will leave the preferred LLC
> + * too idle, or if this migration would leave the
> + * non-preferred LLC within sysctl_aggr_imb percent
> + * of the preferred LLC, triggering a later migration
> + * back to the preferred LLC.
> + */
> + if (fits_llc_capacity(src_util, src_cap) ||
> + !util_greater(src_util, dst_util))
> + return mig_forbid;
> + }
> + return mig_llc;
> +}
> +
> +/*
> + * Check if task p can migrate from source LLC to
> + * destination LLC in terms of cache aware load balance.
> + */
> +static __maybe_unused enum llc_mig can_migrate_llc_task(int src_cpu, int dst_cpu,
> + struct task_struct *p)
> +{
> + struct mm_struct *mm;
> + bool to_pref;
> + int cpu;
> +
> + mm = p->mm;
> + if (!mm)
> + return mig_unrestricted;
> +
> + cpu = mm->sc_stat.cpu;
> + if (cpu < 0 || cpus_share_cache(src_cpu, dst_cpu))
> + return mig_unrestricted;
> +
> + if (cpus_share_cache(dst_cpu, cpu))
> + to_pref = true;
> + else if (cpus_share_cache(src_cpu, cpu))
> + to_pref = false;
> + else
> + return mig_unrestricted;
> +
> + return can_migrate_llc(src_cpu, dst_cpu,
> + task_util(p), to_pref);
> +}
> +
> #else
> static inline bool get_llc_stats(int cpu, unsigned long *util,
> unsigned long *cap)