Re: [PATCH 15/19] sched/fair: Respect LLC preference in task migration and detach

From: Chen, Yu C

Date: Wed Oct 29 2025 - 10:24:03 EST

On 10/29/2025 11:54 AM, K Prateek Nayak wrote:

[snip]

@@ -10227,6 +10233,20 @@ static int detach_tasks(struct lb_env *env)
          if (env->imbalance <= 0)
              break;
+#ifdef CONFIG_SCHED_CACHE
+        /*
+         * Don't detach more tasks if the remaining tasks want
+         * to stay. We know the remaining tasks all prefer the
+         * current LLC, because after order_tasks_by_llc(), the
+         * tasks that prefer the current LLC are at the tail of
+         * the list. The inhibition of detachment is to avoid too
+         * many tasks being migrated out of the preferred LLC.
+         */
+        if (sched_cache_enabled() && detached && p->preferred_llc != -1 &&
+            llc_id(env->src_cpu) == p->preferred_llc)
+            break;

In all cases? Should we check can_migrate_llc() with respect to the util
already migrated and then decide whether to move the preferred-LLC tasks
or not?

Before this "stop detaching tasks" check, we call can_migrate_task(p) to
determine whether the detached p is being dequeued from its preferred LLC;
can_migrate_task() uses can_migrate_llc_task() -> can_migrate_llc() to
carry out that check. That is to say, we stop further detaching only after
some tasks have already been detached.
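
To make the ordering argument concrete, here is a minimal userspace sketch
of the loop. It is not the kernel code: struct toy_task and
toy_detach_tasks() are hypothetical names, the can_migrate_task() gate and
all other checks in detach_tasks() are left out, and it assumes the new
check runs right after p has been detached (the excerpt does not show the
surrounding lines).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the task fields the hunk looks at. */
struct toy_task {
	int preferred_llc;	/* -1 means "no preference" */
	long load;		/* subtracted from the imbalance on detach */
};

/*
 * Toy model of the detach loop. The caller is assumed to pass tasks
 * already ordered so that those preferring src_llc sit at the tail,
 * which is what the comment says order_tasks_by_llc() guarantees.
 * Returns the number of detached tasks; *remaining gets what is left
 * of the imbalance.
 */
static int toy_detach_tasks(const struct toy_task *tasks, size_t n,
			    int src_llc, long imbalance, long *remaining)
{
	int detached = 0;

	assert(remaining != NULL);

	for (size_t i = 0; i < n; i++) {
		const struct toy_task *p = &tasks[i];

		/* "Detach" p: account for its load. */
		imbalance -= p->load;
		detached++;

		if (imbalance <= 0)
			break;

		/*
		 * Same condition as in the hunk: p preferred the source
		 * LLC, so everything behind it in the list does too.
		 */
		if (detached && p->preferred_llc != -1 &&
		    p->preferred_llc == src_llc)
			break;
	}

	*remaining = imbalance;
	return detached;
}
```

With four tasks of load 10 where only the last two prefer the source LLC,
and an imbalance of 100, the sketch detaches three tasks and stops with 70
of the imbalance left over.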

Perhaps disallow it the first time if "nr_balance_failed" is 0 but
subsequent failed attempts should perhaps explore breaking the preferred
llc restriction if there is an imbalance and we are under
"mig_unrestricted" conditions.

I suppose you are suggesting that the threshold for stopping task detachment
should be higher. With the above can_migrate_llc() check, I suppose we have
raised the threshold for stopping "task detachment"?

Say the LLC is under heavy load and we only have overloaded groups.
can_migrate_llc() would return "mig_unrestricted" since
fits_llc_capacity() would return false.

Since we are under "migrate_load", sched_balance_find_src_rq() has
returned the CPU with the highest load, which could very well be the
CPU with a large number of preferred LLC tasks.

sched_cache_enabled() is still true and when detach_tasks() reaches
one of these preferred llc tasks (which comes at the very end of the
tasks list), we break out even if env->imbalance > 0 leaving
potential imbalance for the "migrate_load" case.

Instead, we can account for the util moved out of the src_llc and
after accounting for it, check if can_migrate_llc() would return
"mig_forbid" for the src llc.

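One possible shape of that suggestion, as a userspace sketch only. The
names toy_task, toy_src_llc_forbids() and toy_detach_with_util() are
hypothetical, and the "forbid" rule used here (source LLC utilization at or
below a fit threshold once the moved util is subtracted) is an assumption
standing in for whatever can_migrate_llc() really decides; the thread does
not show that logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task fields used below. */
struct toy_task {
	int preferred_llc;	/* -1 means "no preference" */
	long load;		/* subtracted from the imbalance on detach */
	long util;		/* utilization leaving the source LLC */
};

/*
 * ASSUMPTION: stand-in for "can_migrate_llc() would return mig_forbid".
 * Here the source LLC forbids further moves once its utilization,
 * after accounting for what already left, fits under the threshold.
 */
static bool toy_src_llc_forbids(long src_util_after, long fit_threshold)
{
	return src_util_after <= fit_threshold;
}

/*
 * Like the plain loop, but a preferred-LLC task only stops detachment
 * when the source LLC would forbid more migration after the util moved
 * so far has been accounted for.
 */
static int toy_detach_with_util(const struct toy_task *tasks, size_t n,
				int src_llc, long imbalance,
				long src_llc_util, long fit_threshold,
				long *remaining)
{
	long moved_util = 0;
	int detached = 0;

	assert(remaining != NULL);

	for (size_t i = 0; i < n; i++) {
		const struct toy_task *p = &tasks[i];

		imbalance -= p->load;
		moved_util += p->util;
		detached++;

		if (imbalance <= 0)
			break;

		if (p->preferred_llc != -1 && p->preferred_llc == src_llc &&
		    toy_src_llc_forbids(src_llc_util - moved_util,
					fit_threshold))
			break;
	}

	*remaining = imbalance;
	return detached;
}
```

In this model a heavily overloaded source LLC keeps giving up preferred
tasks until the imbalance is gone or the list is empty, while a lightly
overloaded one stops as soon as the moved util brings it back under the
threshold.
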
I see your point, the original decision matrix intends to
spread the tasks when both LLCs are overloaded.
(src is the preferred LLC, dst is non-preferred LLC)

src \ dst   30%   40%   50%   60%
30%          N     N     N     N
40%          N     N     N     N
50%          N     N     G     G
60%          Y     N     G     G

src : src_util
dst : dst_util
Y : Yes, migrate
N : No, do not migrate
G : let the Generic load balance to even the load.
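
For reference, the matrix above written out as a lookup table. This is
only a transcription of the sampled points (30%, 40%, 50%, 60%); the names
toy_decision, toy_matrix and toy_lookup() are made up for illustration, and
the kernel does not implement the decision as a table.

```c
#include <assert.h>

enum toy_decision {
	TOY_UNKNOWN = -1,	/* not one of the sampled points */
	TOY_NO,			/* N: do not migrate */
	TOY_YES,		/* Y: migrate */
	TOY_GENERIC,		/* G: leave it to generic load balance */
};

/* Rows are src_util, columns are dst_util, both at 30/40/50/60%. */
static const enum toy_decision toy_matrix[4][4] = {
	/* src 30% */ { TOY_NO,  TOY_NO, TOY_NO,      TOY_NO      },
	/* src 40% */ { TOY_NO,  TOY_NO, TOY_NO,      TOY_NO      },
	/* src 50% */ { TOY_NO,  TOY_NO, TOY_GENERIC, TOY_GENERIC },
	/* src 60% */ { TOY_YES, TOY_NO, TOY_GENERIC, TOY_GENERIC },
};

/* Map 30/40/50/60 to 0..3, anything else to -1. */
static int toy_index(int pct)
{
	if (pct < 30 || pct > 60 || pct % 10 != 0)
		return -1;
	return (pct - 30) / 10;
}

static enum toy_decision toy_lookup(int src_pct, int dst_pct)
{
	int s = toy_index(src_pct);
	int d = toy_index(dst_pct);

	if (s < 0 || d < 0)
		return TOY_UNKNOWN;

	assert(s < 4 && d < 4);
	return toy_matrix[s][d];
}
```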

I suppose the code breaks this rule here in order to, as Tim mentioned
in another thread, inhibit tasks bouncing between LLCs.

thanks,
Chenyu